MobileHumanPose: Toward Real-Time 3D Human Pose Estimation in Mobile Devices (CVPRW 2021)

Overview

GitHub code for "MobileHumanPose: Toward Real-Time 3D Human Pose Estimation in Mobile Devices".

Introduction

This repo is the official PyTorch implementation of "MobileHumanPose: Toward Real-Time 3D Human Pose Estimation in Mobile Devices" (CVPRW 2021).

Dependencies

This code is tested on Ubuntu 16.04 with CUDA 11.2, using two NVIDIA RTX or V100 GPUs.

Python 3.6.5 with virtualenv is used for development.

Directory

Root

${ROOT} is structured as below.

${ROOT}
|-- data
|-- demo
|-- common
|-- main
|-- tool
|-- vis
`-- output
  • data contains data loading code and soft links to the images and annotations directories.
  • demo contains demo code.
  • common contains the core code of the 3D multi-person pose estimation system; the custom backbones are also implemented there.
  • main contains high-level code for training and testing the network.
  • tool contains data pre-processing code. You don't have to run it; pre-processed data is provided below.
  • vis contains scripts for 3D visualization.
  • output contains logs, trained models, visualized outputs, and test results.

Data

You need to follow the directory structure of the data as below; a soft-link sketch for populating it follows the tree.

${POSE_ROOT}
|-- data
|   |-- Human36M
|   |   |-- bbox_root
|   |   |   |-- bbox_root_human36m_output.json
|   |   |-- images
|   |   |-- annotations
|   |-- MPII
|   |   |-- images
|   |   |-- annotations
|   |-- MSCOCO
|   |   |-- bbox_root
|   |   |   |-- bbox_root_coco_output.json
|   |   |-- images
|   |   |   |-- train2017
|   |   |   |-- val2017
|   |   |-- annotations
|   |-- MuCo
|   |   |-- data
|   |   |   |-- augmented_set
|   |   |   |-- unaugmented_set
|   |   |   |-- MuCo-3DHP.json
|   |-- MuPoTS
|   |   |-- bbox_root
|   |   |   |-- bbox_mupots_output.json
|   |   |-- data
|   |   |   |-- MultiPersonTestSet
|   |   |   |-- MuPoTS-3D.json
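
If your datasets already live elsewhere on disk, one way to satisfy this layout is to soft-link them in. The snippet below is a minimal sketch; every source path on the right is a placeholder you would replace with your own location, not a path shipped with this repo.

import os

# Minimal sketch: soft-link existing dataset folders into the layout above.
links = {
    'data/Human36M/images': '/path/to/Human36M/images',
    'data/Human36M/annotations': '/path/to/Human36M/annotations',
    'data/MSCOCO/images': '/path/to/coco/images',
    'data/MSCOCO/annotations': '/path/to/coco/annotations',
}

for dst, src in links.items():
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if not os.path.islink(dst) and not os.path.exists(dst):
        os.symlink(src, dst)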

Output

You need to follow the directory structure of the output folder as below.

${POSE_ROOT}
|-- output
|   |-- log
|   |-- model_dump
|   |-- result
|   `-- vis
  • Creating the output folder as a soft link rather than a regular folder is recommended, because the outputs can take up a large amount of storage.
  • log folder contains training log files.
  • model_dump folder contains saved checkpoints for each epoch.
  • result folder contains final estimation files generated in the testing stage.
  • vis folder contains visualized results.

3D visualization

  • Run $DB_NAME_img_name.py to get image file names in .txt format.
  • Place your test result files (preds_2d_kpt_$DB_NAME.mat, preds_3d_kpt_$DB_NAME.mat) in the single or multi folder.
  • Run draw_3Dpose_$DB_NAME.m

Running 3DMPPE_POSENET

Requirements

cd main
pip install -r requirements.txt

Setup Training

  • In main/config.py, you can change model settings such as the datasets to use, the network backbone, and the input size; a hedged example of typical settings is sketched below.
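
For reference, here is a hedged excerpt of the kind of settings exposed in main/config.py. The variable names follow the config snippets quoted in the comments further below; the values shown are examples, and the exact defaults in the repo may differ.

## dataset (example values, not necessarily the repo defaults)
trainset_3d = ['Human36M']          # 3D-annotated training sets, e.g. ['Human36M'] or ['MuCo']
trainset_2d = ['MPII']              # 2D-annotated training sets, e.g. ['MPII'] or ['MSCOCO']
testset = 'Human36M'                # evaluation set, e.g. 'Human36M' or 'MuPoTS'

## input, output
input_shape = (256, 256)
output_shape = (input_shape[0]//8, input_shape[1]//8)
depth_dim = 32
bbox_3d_shape = (2000, 2000, 2000)  # depth, height, width

## training
lr = 1e-3
lr_dec_epoch = [17, 21]
lr_dec_factor = 10
end_epoch = 25
batch_size = 24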

Train

In the main folder, run

python train.py --gpu 0-1 --backbone LPSKI

to train the network on GPUs 0 and 1.

If you want to resume a previous experiment, run

python train.py --gpu 0-1 --backbone LPSKI --continue

--gpu 0,1 can be used instead of --gpu 0-1.

Test

Place the trained model in output/model_dump/.

In the main folder, run

python test.py --gpu 0-1 --test_epoch 20-21 --backbone LPSKI

to test the network on GPUs 0 and 1 with the models saved at the 20th and 21st epochs. --gpu 0,1 can be used instead of --gpu 0-1. For the backbone, you can choose any key of BACKBONE_DICT = { 'LPRES': LpNetResConcat, 'LPSKI': LpNetSkiConcat, 'LPWO': LpNetWoConcat }.
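
Both the range and comma forms of the id flags select the same GPUs/epochs; a minimal sketch of how such a flag could be expanded (an illustration, not the repo's actual parser) is:

def expand_ids(spec):
    # '0-1' -> [0, 1]; '0,1' -> [0, 1]; '20-21' -> [20, 21]
    if '-' in spec:
        start, end = map(int, spec.split('-'))
        return list(range(start, end + 1))
    return [int(x) for x in spec.split(',')]

assert expand_ids('0-1') == expand_ids('0,1') == [0, 1]
assert expand_ids('20-21') == [20, 21]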

Human3.6M dataset using protocol 1

For evaluation, you can run test.py, or use the evaluation code under the Human36M data directory.

Human3.6M dataset using protocol 2

For evaluation, you can run test.py, or use the evaluation code under the Human36M data directory.

MuPoTS-3D dataset

For evaluation, run test.py. After that, move data/MuPoTS/mpii_mupots_multiperson_eval.m into data/MuPoTS/data. Also move the test result files (preds_2d_kpt_mupots.mat and preds_3d_kpt_mupots.mat) into data/MuPoTS/data. Then run mpii_mupots_multiperson_eval.m with your evaluation mode arguments.

TFLite inference

For inference on mobile devices, we converted the PyTorch implementation to ONNX and then to TFLite, and tested the result on-device. The official demo app is available here.
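
As a rough illustration of the first conversion step (PyTorch to ONNX), a hedged sketch is below; the model, input size, and opset are placeholders, and the ONNX-to-TFLite step (e.g. via an ONNX-to-TensorFlow converter) is not shown because the exact toolchain behind the demo app is not documented in this README.

import torch

def export_to_onnx(model, onnx_path='mobile_human_pose.onnx', input_size=(256, 256)):
    # Export a trained PyTorch model to ONNX; the 256x256 input matches the
    # training examples above, but is an assumption here.
    model.eval()
    dummy = torch.randn(1, 3, input_size[0], input_size[1])
    torch.onnx.export(
        model, dummy, onnx_path,
        input_names=['input'], output_names=['output'],
        opset_version=11,
    )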

Reference

The training section of this repo is based on the following paper and its GitHub repository:

@InProceedings{Moon_2019_ICCV_3DMPPE,
  author = {Moon, Gyeongsik and Chang, Juyong and Lee, Kyoung Mu},
  title = {Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2019}
}

Comments
  • some questions about config

    Hi Author,

    I have some questions about the settings in config.py. a. The default input size is 256x256, and there's an assertion in the input size check: assert input_size[1] in [256]. Does this mean other input sizes, such as 320x320, cannot be used?

    b. The default output shape is divided by 8: output_shape = (input_shape[0]//8, input_shape[1]//8). Can it be changed, e.g. divided by 4 instead? Meanwhile, the default depth_dim is set to 32. Is there any requirement relating depth_dim and output_shape? Since I notice there are some reshape operations in soft_max, I am a little confused; based on the reshape operations, it looks like depth_dim and output_shape need to be the same value.

    Wait for your reply, big thanks.

    opened by liamsun2019 7
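
    For readers with the same question about the reshape in soft_max: below is a minimal sketch of a 3DMPPE-style soft-argmax. It is an assumption about how this repo's head works, not its verbatim code, but it shows that depth_dim and output_shape enter the reshape independently; the constraint is that the heatmap tensor has joint_num * depth_dim channels at spatial resolution output_shape.

    import torch
    import torch.nn.functional as F

    def soft_argmax(heatmaps, joint_num, depth_dim, out_h, out_w):
        # heatmaps: (N, joint_num * depth_dim, out_h, out_w)
        n = heatmaps.shape[0]
        heatmaps = heatmaps.reshape(n, joint_num, depth_dim * out_h * out_w)
        heatmaps = F.softmax(heatmaps, dim=2)                    # per-joint softmax over the 3D volume
        heatmaps = heatmaps.reshape(n, joint_num, depth_dim, out_h, out_w)

        accu_x = heatmaps.sum(dim=(2, 3))                        # (N, J, out_w)
        accu_y = heatmaps.sum(dim=(2, 4))                        # (N, J, out_h)
        accu_z = heatmaps.sum(dim=(3, 4))                        # (N, J, depth_dim)

        rng = lambda k: torch.arange(1, k + 1, dtype=heatmaps.dtype, device=heatmaps.device)
        x = (accu_x * rng(out_w)).sum(dim=2, keepdim=True)
        y = (accu_y * rng(out_h)).sum(dim=2, keepdim=True)
        z = (accu_z * rng(depth_dim)).sum(dim=2, keepdim=True)
        return torch.cat((x, y, z), dim=2)                       # (N, joint_num, 3)
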
  • "loss_coord" does not going down

    Hi, I really appreciate your efforts on this repo. I'm following your work to build a model for my own "toy" project, but I can't reach the same metrics as you did. The loss_coord keeps fluctuating around 0.4-0.5 even though the learning rate has decayed to 1e-5. These are my settings:

    ## input, output
    input_shape = (256, 256) 
    output_shape = (input_shape[0]//8, input_shape[1]//8)
    width_multiplier = 1.0
    depth_dim = 32
    bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
    pixel_mean = (0.485, 0.456, 0.406)
    pixel_std = (0.229, 0.224, 0.225)
    
    ## training config
    embedding_size = 2048
    lr_dec_epoch = [17, 21]
    end_epoch = 25
    lr = 1e-3
    lr_dec_factor = 10
    batch_size = 24
    

    Thanks

    opened by Secondgrade 6
  • why the output of the network is 64*64*(18*64)

    Thanks for your great work! But I'm confused about the output of the network. Your paper says the output is 64*64*1152, i.e. output_res*output_res*(joint_num*64). What does this 64 mean, and what does it stand for?

    opened by SatMa34 5
  • about training/test dataset

    Hi Author,

    Thanks for your excellent work. The default settings are defined in config.py as follows:

    trainset_3d = ['MuCo']
    trainset_2d = ['MSCOCO']

    testset = 'MuPoTS'

    Based on your paper, there appear to be two different training/test configurations: one is Human3.6M+MPII, the other is MuCo+COCO. Which setting is better, or are they just two alternatives, either of which is applicable? Thanks for your time.

    opened by liamsun2019 3
  • Pretrained model is not there yet?

    Dear SangbumChoi,

    The repository says there is a pretrained model of your work, but I cannot find it (I'm new to the GitHub community). Could you please point me in the right direction, or share it with me if it is not there? I would like to test it out and see how it works.

    I managed to find an ONNX version of your work at: https://github.com/PINTO0309/PINTO_model_zoo/blob/main/156_MobileHumanPose/download_mobile_human_pose_working_well.sh but I'm not sure whether it behaves the same as your model, and I would like to run it using PyTorch. :)

    Thank you for your work!

    opened by AivisStud 3
  • What is  target_vis = target['vis'], target_have_depth = target['have_depth'] in model.py?

    I thought that only the 3D joint coordinates x, y, z are needed to train the model on a 3D dataset, but when I analyzed model.py, I found target_vis and target_have_depth in the loss function.

    loss_coord = torch.abs(coord - target_coord) * target_vis
    loss_coord = (loss_coord[:,:,0] + loss_coord[:,:,1] + loss_coord[:,:,2] * target_have_depth)/3.
    return loss_coord
    

    Can you explain what they are?

    opened by tkddnjs98 2
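
    A hedged reading of those two tensors, based on how 3DMPPE-style training typically mixes 3D and 2D datasets (an assumption, not documentation of this repo): target_vis masks joints without annotation so they contribute no loss, and target_have_depth is 1 for samples from 3D datasets and 0 for 2D-only datasets such as MPII or MSCOCO, so the z term is dropped when no ground-truth depth exists. A commented sketch of the quoted loss:

    import torch

    def coord_l1_loss(coord, target_coord, target_vis, target_have_depth):
        # Shapes here are illustrative: coord, target_coord (N, J, 3);
        # target_vis (N, J, 1); target_have_depth (N, 1).
        loss = torch.abs(coord - target_coord) * target_vis        # drop unannotated joints
        loss = (loss[:, :, 0] + loss[:, :, 1]
                + loss[:, :, 2] * target_have_depth) / 3.          # drop z for 2D-only samples
        return loss
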
  • backbone object has no attribute 'init_weight' when training

    When trying each of the backbones I am getting the following error (using LPRES as example, but same with all networks):

    ============================================================
    LPRES BackBone Generated
    ============================================================
    Traceback (most recent call last):
      File "main/train.py", line 84, in <module>
        main()
      File "main/train.py", line 39, in main
        trainer._make_model()
      File "main/../common/base.py", line 125, in _make_model
        model = get_pose_net(self.backbone, True, self.joint_num)
      File "main/model.py", line 83, in get_pose_net
        model.backbone.init_weight()
      File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
        raise AttributeError("'{}' object has no attribute '{}'".format(
    AttributeError: 'LpNetResConcat' object has no attribute 'init_weight'
    

    Looking at the commit history, the problem appeared at commit bdbc6c14a58d857cab7b654fd02a356ae4b9c6c7.

    bug 
    opened by Itay2805 2
  • Small mistake/change in model?

    Hi SangbumChoi,

    I was trying to understand your code a little bit and got a little confused (I'm new to this, so I might be wrong). I was checking your inverted residual block configuration; according to your paper, you modified the first 4 layers. Here is what I noticed:

    1st: At the first bottleneck you increase the output channels to 64. Looking at the MobileNetV2 structure, shouldn't it go down to 24 and then go up?

    2nd: Bottleneck layers 4 and 5 have swapped strides. Is this a typo, or a new modification since your paper came out?

    thank you for your good work!

    bug 
    opened by AivisStud 1
  • issue while onnx to tensorrt conversion

    ERROR:EngineBuilder:In node 143 (importResize): UNSUPPORTED_NODE: Assertion failed: scales.is_weights() && "Resize scales must be an initializer!"

    opened by rohanpawar294 1
  • train with custom data

    Hi, thanks for sharing this great work. If I use custom data, what pre-processing should I do? For example, normalizing the 3D coordinate data into [2000, 2000, 2000] and the 2D data into [256, 256]?

    opened by ggfresh 1
  • Training on custom dataset

    Hello, I'm trying to train this model on my own dataset, which I'm converting to COCO format. However, the dataset is problematic because it only includes [x, y, z] and no bounding boxes. It has one person per image with [x, y, z(depth)] coordinates. Can I use it as is, or should I create a bounding box per image?

    opened by tkddnjs98 0
  • MPJPE too high for protocol 2

    Hi

    Thank you for providing the training, eval, and data preparation scripts. I followed the readme and set up the data and all the scripts in the correct locations as indicated by the file structure in the readme. I kept the exact same configuration as config.py and ran the training script for the same number of epochs. The train datasets were Human3.6M and MPII, the test set was Human3.6M, and the protocol followed was the default in the code, which is 2.

    When I used the saved checkpoints to run inference, I get: Protocol 2 error (MPJPE) >> tot: 67.92 Directions: 60.23 Discussion: 72.39 Eating: 60.08 Greeting: 62.74 Phoning: 66.70 Posing: 59.33 Purchases: 61.72 Sitting: 81.54 SittingDown: 90.77 Smoking: 67.13 Photo: 80.52 Waiting: 63.39 Walking: 51.68 WalkDog: 71.55 WalkTogether: 58.84

    which is obviously much higher than expected. Just as a note, I was getting an error while using broadcast in model.py (module 'torch.nn.parallel' has no attribute 'comm'), so I changed the original script from

    accu_x = accu_x * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[1]+1).type(torch.cuda.FloatTensor), devices=[accu_x.device.index])[0]
    accu_y = accu_y * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[0]+1).type(torch.cuda.FloatTensor), devices=[accu_y.device.index])[0]
    accu_z = accu_z * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.depth_dim+1).type(torch.cuda.FloatTensor), devices=[accu_z.device.index])[0]
    

    TO:

    accu_x = accu_x * torch.arange(1, cfg.output_shape[1] + 1).type(torch.cuda.FloatTensor)
    accu_y = accu_y * torch.arange(1, cfg.output_shape[0] + 1).type(torch.cuda.FloatTensor)
    accu_z = accu_z * torch.arange(1, cfg.depth_dim + 1).type(torch.cuda.FloatTensor)
    

    I did not change anything else, including the batch size, and again used the same config file. If you can help me root-cause the issue it would be very helpful. Thanks

    bug 
    opened by baishali1986 4
  • Difference between results from inference and the paper

    First, thanks for your great work.

    I trained the model using the script 'python train.py --gpu 0-1 --backbone LPSKI' with the Human3.6M and MPII datasets. The protocol is 1 and the number of training epochs was 25.

    I tested the model with test.py and the result is as below: Protocol 1 error (PA MPJPE) >> tot: 42.72 Directions: 37.63 Discussion: 39.01 Eating: 45.51 Greeting: 43.06 Phoning: 41.33 Posing: 41.10 Purchases: 35.78 Sitting: 43.50 SittingDown: 57.36 Smoking: 47.08 Photo: 51.04 Waiting: 38.32 Walking: 30.94 WalkDog: 46.21 WalkTogether: 38.39

    I found the average MPJPE of Protocol 1 in the paper is 35.2, which is different from my result. Did I miss something needed to reproduce it, such as other settings in config.py?

    Also, my training time was 16 hours with an RTX 2080, while the training time in the paper is 3 days with 2 RTX Titans. So I also wonder what causes the time difference between my run and the paper.

    bug enhancement 
    opened by unoShin 24
Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos Introduction This repo is official PyTorch implementatio

Gyeongsik Moon 29 Sep 24, 2022
Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices, ACM Multimedia 2021

Codes for ECBSR Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices Xindong Zhang, Hui Zeng, Lei Zhang ACM Multimedia 202

xindong zhang 236 Dec 26, 2022
Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation Code repository for the paper: PoseAug: A Differentiable Pose Augme

Pyjcsx 328 Dec 17, 2022
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors, CVPR 2021

Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors Human POSEitioning System (H

Aymen Mir 66 Dec 21, 2022
Python and C++ implementation of "MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation". Accepted at LXCV @ CVPR 2021.

MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation This is a PyTorch and LibTorch implementation of MarkerPose: a

Jhacson Meza 47 Nov 18, 2022
This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

HRNet 367 Dec 27, 2022
This repository is the offical Pytorch implementation of ContextPose: Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021).

Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021) Introduction This repository is the offical Pytorch implementation of

null 37 Nov 21, 2022
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

Leo Xiao 3.9k Jan 5, 2023
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

Ce Zheng 363 Dec 28, 2022
This is an official implementation for "Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation".

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation This repo is the official implementation of Exploiting Temporal Con

Vegetabird 241 Jan 7, 2023
Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su [arXiv] [project site] DistDe

Meta Research 122 Dec 13, 2022
PyTorch implementation for 3D human pose estimation

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach This repository is the PyTorch implementation for the network presented in:

Xingyi Zhou 579 Dec 22, 2022
WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

null 368 Dec 26, 2022
Demo for Real-time RGBD-based Extended Body Pose Estimation paper

Real-time RGBD-based Extended Body Pose Estimation This repository is a real-time demo for our paper that was published at WACV 2021 conference The ou

Renat Bashirov 118 Dec 26, 2022
Real-time pose estimation accelerated with NVIDIA TensorRT

trt_pose Want to detect hand poses? Check out the new trt_pose_hand project for real-time hand pose and gesture recognition! trt_pose is aimed at enab

NVIDIA AI IOT 803 Jan 6, 2023
Official Pytorch implementation of "Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video", CVPR 2021

TCMR: Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video Qualtitative result Paper teaser video Introduction This r

Hongsuk Choi 215 Jan 6, 2023
The official repo for CVPR2021——ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search.

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search [paper] Introduction This is the official implementation of ViPNAS: Efficient V

Lumin 42 Sep 26, 2022
The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

PIC4SeRCentre 20 Jan 3, 2023
Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Human Pose Regression with Residual Log-likelihood Estimation [Paper] [arXiv] [Project Page] Human Pose Regression with Residual Log-likelihood Estima

JeffLi 347 Dec 24, 2022