This repo is the official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices (CVPRW 2021).

Choi Sang Bum

Last update: Jan 5, 2023

Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices"

Introduction

This repo is the official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices (CVPRW 2021).

Dependencies

This code was tested on Ubuntu 16.04 with CUDA 11.2 and two NVIDIA RTX or V100 GPUs.

Python 3.6.5 with virtualenv was used for development.

Running 3DMPPE_POSENET

Requirements

cd main
pip install -r requirements.txt

Setup Training

In main/config.py, you can change the model settings, including the dataset to use, the network backbone, and the input size.

Train

In the main folder, run

python train.py --gpu 0-1 --backbone LPSKI

to train the network on GPUs 0 and 1.

If you want to continue a previous experiment, run

python train.py --gpu 0-1 --backbone LPSKI --continue

--gpu 0,1 can be used instead of --gpu 0-1.
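As an illustration of why the two --gpu forms are interchangeable, here is a minimal sketch of how a range such as 0-1 could be expanded into the comma form. The function name expand_gpu_ids is hypothetical and is not taken from train.py; the inclusive range follows the statement above that 0-1 means GPUs 0 and 1.

```python
def expand_gpu_ids(spec: str) -> str:
    """Expand a range such as "0-1" into the comma form "0,1".

    A spec that is already comma-separated is returned unchanged.
    Hypothetical helper for illustration; not the repo's own parser.
    """
    if "-" in spec:
        first, last = (int(part) for part in spec.split("-"))
        # The range is inclusive, so "0-1" means GPUs 0 and 1.
        return ",".join(str(i) for i in range(first, last + 1))
    return spec


print(expand_gpu_ids("0-1"))  # 0,1
print(expand_gpu_ids("0,1"))  # 0,1
print(expand_gpu_ids("0-3"))  # 0,1,2,3
```

A string in this comma form is what is usually assigned to the CUDA_VISIBLE_DEVICES environment variable, although whether train.py does so is an assumption here.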

Test

Place the trained model in output/model_dump/.

In the main folder, run

python test.py --gpu 0-1 --test_epoch 20-21 --backbone LPSKI

to test the network on GPUs 0 and 1 with the models from the 20th and 21st epochs. --gpu 0,1 can be used instead of --gpu 0-1. For the backbone, you can choose any key of BACKBONE_DICT = { 'LPRES':LpNetResConcat, 'LPSKI':LpNetSkiConcat, 'LPWO':LpNetWoConcat }
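The sketch below shows one way the --backbone value could be resolved against BACKBONE_DICT, with a readable error for an unknown key. The three classes are empty placeholders standing in for the repo's network definitions, and select_backbone is a hypothetical helper, not a function from the codebase.

```python
# Placeholder classes: in the repo these are the actual backbone networks.
class LpNetResConcat:
    pass


class LpNetSkiConcat:
    pass


class LpNetWoConcat:
    pass


BACKBONE_DICT = {
    "LPRES": LpNetResConcat,
    "LPSKI": LpNetSkiConcat,
    "LPWO": LpNetWoConcat,
}


def select_backbone(name: str):
    """Return the backbone class registered under `name` (hypothetical helper)."""
    try:
        return BACKBONE_DICT[name]
    except KeyError:
        valid = ", ".join(sorted(BACKBONE_DICT))
        raise ValueError(f"unknown backbone {name!r}; choose one of: {valid}") from None


print(select_backbone("LPSKI").__name__)  # LpNetSkiConcat
```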

Human3.6M dataset using protocol 1

For the evaluation, you can run test.py or use the evaluation code in Human36M.

Human3.6M dataset using protocol 2

For the evaluation, you can run test.py or use the evaluation code in Human36M.

MuPoTS-3D dataset

For the evaluation, run test.py. After that, move data/MuPoTS/mpii_mupots_multiperson_eval.m into data/MuPoTS/data. Also move the test result files (preds_2d_kpt_mupots.mat and preds_3d_kpt_mupots.mat) into data/MuPoTS/data. Then run mpii_mupots_multiperson_eval.m with your evaluation mode arguments.

TFLite inference

For inference on mobile devices, we converted the PyTorch implementation to ONNX and then to TFLite, and tested the result on mobile devices. An official demo app is available here.

Reference

Where this repo comes from: the training section is based on the following paper and GitHub repository.

PyTorch implementation of Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image (ICCV 2019).
Flexible and simple code.
Compatibility for most of the publicly available 2D and 3D, single and multi-person pose estimation datasets including Human3.6M, MPII, MS COCO 2017, MuCo-3DHP and MuPoTS-3D.
Human pose estimation visualization code.

@InProceedings{Moon_2019_ICCV_3DMPPE,
  author = {Moon, Gyeongsik and Chang, Juyong and Lee, Kyoung Mu},
  title = {Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image},
  booktitle = {The IEEE Conference on International Conference on Computer Vision (ICCV)},
  year = {2019}
}

Comments

some questions about config

Hi Author,

I have some questions about the settings in config.py.

a. The default input size is 256x256, and there is an assertion in the input size check: assert input_size[1] in [256]. Does this mean other input sizes, such as 320x320, cannot be used?

b. By default the output shape is the input shape divided by 8: output_shape = (input_shape[0]//8, input_shape[1]//8). Can it be changed, e.g. divided by 4 instead? Meanwhile, the default depth_dim is set to 32. Is there any requirement relating depth_dim and output_shape? I noticed some reshape operations in soft_max, so I am a little confused. Based on those reshape operations, it looks as if depth_dim and output_shape need to have the same value.

Looking forward to your reply. Many thanks.

opened by liamsun2019 7
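To make the reshape question concrete, here is a minimal NumPy sketch of a soft-argmax over a per-joint 3D heatmap volume. It is written from the general technique, not copied from model.py, and the function name and argument layout are assumptions. In this sketch the channel dimension is joint_num * depth_dim, and nothing requires depth_dim to equal the spatial output size. Whether the repo's network head accepts other values is a separate question that this sketch does not answer.

```python
import numpy as np


def soft_argmax(heatmaps, joint_num, depth_dim, output_shape):
    """Map (batch, joint_num * depth_dim, H, W) heatmaps to (batch, joint_num, 3) coords.

    Illustrative sketch; returned coordinates are 0-based (x, y, z) indices.
    """
    height, width = output_shape
    batch = heatmaps.shape[0]

    # Flatten each joint's D*H*W volume and apply a softmax over it.
    flat = heatmaps.reshape(batch, joint_num, depth_dim * height * width)
    flat = flat - flat.max(axis=2, keepdims=True)  # numerical stability
    prob = np.exp(flat)
    prob /= prob.sum(axis=2, keepdims=True)
    volume = prob.reshape(batch, joint_num, depth_dim, height, width)

    # Marginal distributions along each axis.
    p_x = volume.sum(axis=(2, 3))  # (batch, joint_num, W)
    p_y = volume.sum(axis=(2, 4))  # (batch, joint_num, H)
    p_z = volume.sum(axis=(3, 4))  # (batch, joint_num, D)

    # Expected index along each axis.
    x = (p_x * np.arange(width)).sum(axis=2)
    y = (p_y * np.arange(height)).sum(axis=2)
    z = (p_z * np.arange(depth_dim)).sum(axis=2)
    return np.stack([x, y, z], axis=2)


# depth_dim (16) deliberately differs from the spatial size (32x32).
joint_num, depth_dim, output_shape = 2, 16, (32, 32)
heat = np.zeros((1, joint_num * depth_dim, *output_shape))
heat[0, 0 * depth_dim + 5, 10, 20] = 50.0  # joint 0: sharp peak at z=5, y=10, x=20
coords = soft_argmax(heat, joint_num, depth_dim, output_shape)
print(coords.shape)  # (1, 2, 3)
print(np.round(coords[0, 0], 3))  # approximately x=20, y=10, z=5
```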

"loss_coord" does not go down

Hi, I really appreciate your efforts on this repo. I'm following your work to generate a model for my own "toy" project, but I can't reach the same metrics as you did. The loss_coord keeps fluctuating around 0.4-0.5 even though the learning rate has decayed to 1e-5. These are my settings:

## input, output
input_shape = (256, 256) 
output_shape = (input_shape[0]//8, input_shape[1]//8)
width_multiplier = 1.0
depth_dim = 32
bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
pixel_mean = (0.485, 0.456, 0.406)
pixel_std = (0.229, 0.224, 0.225)

## training config
embedding_size = 2048
lr_dec_epoch = [17, 21]
end_epoch = 25
lr = 1e-3
lr_dec_factor = 10
batch_size = 24

Thanks

opened by Secondgrade 6

why the output of the network is 64*64*(18*64)

Thanks for your great work! But I'm confused about the output of the network. Your paper says the output is 64*64*1152, which means (output_res*output_res*(joint_num*64)). What does this 64 mean, and what does it stand for?

opened by SatMa34 5

about training/test dataset

Hi Author,

Thanks for your excellent work. The default settings are defined in config.py as follows:

trainset_3d = ['MuCo']
trainset_2d = ['MSCOCO']
testset = 'MuPoTS'

Based on your paper, there appear to be two different training/test configurations: one is Human3.6M+MPII, the other is MuCo+MSCOCO. Is one of them better, or are they just two alternatives, either of which can be used? Thanks for your time.

opened by liamsun2019 3

Pretrained model is not there yet?

Dear SangbumChoi,

The repository says that there is a pretrained model of your work, but I cannot find it (I'm new to the GitHub community). Could you please point me to where I can find it or, if it is not there, share it with me? I would like to test it out and see how it works.

I managed to find an ONNX version of your work at: https://github.com/PINTO0309/PINTO_model_zoo/blob/main/156_MobileHumanPose/download_mobile_human_pose_working_well.sh but I'm not sure whether it behaves like your model, and I would like to run it using PyTorch. :)

Thank you for your work!

opened by AivisStud 3

What is target_vis = target['vis'], target_have_depth = target['have_depth'] in model.py?

I thought that only the 3D joint coordinates x, y, z are needed to train a model on a 3D dataset. But when I analyzed model.py, I found that target_vis and target_have_depth are also used in the loss function.

loss_coord = torch.abs(coord - target_coord) * target_vis
loss_coord = (loss_coord[:,:,0] + loss_coord[:,:,1] + loss_coord[:,:,2] * target_have_depth)/3.
return loss_coord

Can you explain what they are?
opened by tkddnjs98 2
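The effect of the two masks in the loss quoted above can be seen in a small NumPy re-creation. The tensor shapes and the interpretation are assumptions rather than something the repo states here: target_vis is taken to be a per-joint flag of shape (batch, joints, 1) that zeroes out joints without a usable annotation, and target_have_depth a per-sample flag of shape (batch, 1) that drops the z term for samples from 2D-only datasets.

```python
import numpy as np


def coord_loss(coord, target_coord, target_vis, target_have_depth):
    """NumPy version of the quoted L1 loss (shapes are assumptions).

    coord, target_coord: (batch, joints, 3)
    target_vis:          (batch, joints, 1), 1 if the joint is annotated/visible
    target_have_depth:   (batch, 1), 1 if the sample has depth (z) labels
    """
    loss = np.abs(coord - target_coord) * target_vis
    # x and y always contribute; z contributes only when depth labels exist.
    loss = (loss[:, :, 0] + loss[:, :, 1] + loss[:, :, 2] * target_have_depth) / 3.0
    return loss


pred = np.zeros((2, 2, 3))
target = np.ones((2, 2, 3))  # every axis is off by exactly 1
vis = np.array([[[1.0], [0.0]],   # sample 0: second joint not visible
                [[1.0], [1.0]]])  # sample 1: both joints visible
have_depth = np.array([[1.0],     # sample 0: 3D dataset
                       [0.0]])    # sample 1: 2D dataset, no z supervision
loss = coord_loss(pred, target, vis, have_depth)
print(loss)
```

Under these assumptions, the invisible joint contributes zero loss, and the 2D sample is penalised on x and y only, which is what lets 2D and 3D datasets be mixed in one batch.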

backbone object has no attribute 'init_weight' when training

When trying each of the backbones I am getting the following error (using LPRES as example, but same with all networks):

============================================================
LPRES BackBone Generated
============================================================
Traceback (most recent call last):
  File "main/train.py", line 84, in <module>
    main()
  File "main/train.py", line 39, in main
    trainer._make_model()
  File "main/../common/base.py", line 125, in _make_model
    model = get_pose_net(self.backbone, True, self.joint_num)
  File "main/model.py", line 83, in get_pose_net
    model.backbone.init_weight()
  File "/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LpNetResConcat' object has no attribute 'init_weight'

Looking at the commit history, the problem was introduced in commit bdbc6c14a58d857cab7b654fd02a356ae4b9c6c7

bug

opened by Itay2805 2

Small mistake/change in model?

Hi SangbumChoi,

I was trying to understand your code and got a little confused (I'm new to this, so I might be wrong). I was checking your inverted residual block configuration. According to your paper, you modified the first 4 layers. Here is what I noticed:

1st. At the first bottleneck you increase the channel output to 64. Looking at the MobileNetV2 structure, shouldn't it go down to 24 and then go up?

2nd. Bottleneck layers 4 and 5 have swapped strides. Is this a typo, or a new modification since your paper came out?

thank you for your good work!
bug

opened by AivisStud 1

issue while onnx to tensorrt conversion

ERROR:EngineBuilder:In node 143 (importResize): UNSUPPORTED_NODE: Assertion failed: scales.is_weights() && "Resize scales must be an initializer!"

opened by rohanpawar294 1

train with custom data

Hi, thanks for sharing this great work. If I use custom data, what pre-processing should I do? For example, should I normalize the 3D coordinate data into [2000,2000,2000] and the 2D data into [256, 256]?

opened by ggfresh 1

Training on custom dataset

Hello, I'm trying to train this model on my own dataset, and I'm trying to convert the dataset to COCO format. However, the dataset has a problem: it only includes [x, y, z] and no bounding box. It has one person per image with [x, y, z (depth)] coordinates. Can I use it, or should I create a bounding box for each image?

opened by tkddnjs98 0
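If a bounding box is needed and each image contains a single person, one common workaround is to derive the box from the extent of the 2D keypoints. The sketch below is a hypothetical helper, not part of this repo. It assumes x and y are pixel coordinates, that a COCO-style [x, y, width, height] box is wanted, and that a 20% margin is acceptable. All three are assumptions to adjust for the dataset at hand.

```python
import numpy as np


def bbox_from_keypoints(keypoints_xy, image_width, image_height, margin=0.2):
    """Return a COCO-style [x, y, w, h] box enclosing the keypoints.

    keypoints_xy: (joints, 2) pixel coordinates. `margin` enlarges the tight
    box by that fraction of its size, split evenly between both sides.
    """
    pts = np.asarray(keypoints_xy, dtype=float)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)

    pad_x = (x_max - x_min) * margin / 2.0
    pad_y = (y_max - y_min) * margin / 2.0

    # Clip the padded box to the image bounds.
    x0 = max(0.0, x_min - pad_x)
    y0 = max(0.0, y_min - pad_y)
    x1 = min(image_width - 1.0, x_max + pad_x)
    y1 = min(image_height - 1.0, y_max + pad_y)
    return [x0, y0, x1 - x0, y1 - y0]


keypoints = [[100, 50], [200, 250], [150, 150]]
box = bbox_from_keypoints(keypoints, image_width=640, image_height=480)
print(box)  # [90.0, 30.0, 120.0, 240.0]
```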

MPJPE too high for protocol 2

Hi

Thank you for providing the training, eval and data preparation scripts. I followed the readme and set up the data and all the scripts in the correct locations, as indicated by the file structure in the readme. I kept exactly the same configuration as in config.py and ran the training script for the same number of epochs. The train dataset was Human3.6M and MPII and the test set was Human3.6M. The protocol was the default in the code, which is 2.

When I used the saved checkpoints to run inference, I got:

Protocol 2 error (MPJPE) >> tot: 67.92 Directions: 60.23 Discussion: 72.39 Eating: 60.08 Greeting: 62.74 Phoning: 66.70 Posing: 59.33 Purchases: 61.72 Sitting: 81.54 SittingDown: 90.77 Smoking: 67.13 Photo: 80.52 Waiting: 63.39 Walking: 51.68 WalkDog: 71.55 WalkTogether: 58.84

which is obviously much higher than expected. As a note, I was getting an error while using broadcast in model.py: module 'torch.nn.parallel' has no attribute 'comm'. So I changed the original script from

accu_x = accu_x * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[1]+1).type(torch.cuda.FloatTensor), devices=[accu_x.device.index])[0]
accu_y = accu_y * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[0]+1).type(torch.cuda.FloatTensor), devices=[accu_y.device.index])[0]
accu_z = accu_z * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.depth_dim+1).type(torch.cuda.FloatTensor), devices=[accu_z.device.index])[0]

to:

accu_x = accu_x * torch.arange(1, cfg.output_shape[1] + 1).type(torch.cuda.FloatTensor)
accu_y = accu_y * torch.arange(1, cfg.output_shape[0] + 1).type(torch.cuda.FloatTensor)
accu_z = accu_z * torch.arange(1, cfg.depth_dim + 1).type(torch.cuda.FloatTensor)

I did not change anything else, including the batch size, and I used the same config file. If you could help me find the root cause of the issue, it would be very helpful. Thanks
bug
opened by baishali1986 4
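For readers comparing numbers like the ones above, here is a minimal sketch of how MPJPE (mean per-joint position error) is commonly computed: the mean Euclidean distance between predicted and ground-truth joints after both skeletons are made root-relative. The function name, the root joint index and the root alignment step are assumptions about a typical evaluation, not a copy of the repo's Human36M evaluation code, and the Procrustes alignment used for PA MPJPE is not included.

```python
import numpy as np


def mpjpe(pred, gt, root_index=0):
    """Mean per-joint position error for (frames, joints, 3) arrays.

    Both skeletons are translated so their root joint sits at the origin
    before distances are measured (a common convention, assumed here).
    The result is in the same unit as the inputs, typically millimetres.
    """
    pred = pred - pred[:, root_index:root_index + 1]
    gt = gt - gt[:, root_index:root_index + 1]
    return np.linalg.norm(pred - gt, axis=2).mean()


gt = np.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 1000.0]]])
pred = np.array([[[10.0, 10.0, 10.0], [13.0, 14.0, 1010.0]]])
# After root alignment, the second joint is off by (3, 4, 0), a distance of 5.
print(mpjpe(pred, gt))  # 2.5
```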

Difference between results from inference and the paper

First, thanks for your great work.

I trained the model using the script 'python train.py --gpu 0-1 --backbone LPSKI' with the Human3.6M and MPII datasets. The protocol was 1 and the model was trained for 25 epochs.

I tested the model with test.py, and the result is as follows:

Protocol 1 error (PA MPJPE) >> tot: 42.72 Directions: 37.63 Discussion: 39.01 Eating: 45.51 Greeting: 43.06 Phoning: 41.33 Posing: 41.10 Purchases: 35.78 Sitting: 43.50 SittingDown: 57.36 Smoking: 47.08 Photo: 51.04 Waiting: 38.32 Walking: 30.94 WalkDog: 46.21 WalkTogether: 38.39

The average MPJPE for Protocol 1 in the paper is 35.2, which differs from my result. Did I miss something needed to get the right result, such as other settings in config.py?

Also, my training time was 16 hours with an RTX 2080, while the training time in the paper is 3 days with 2 RTX Titans. So I also wonder what causes the time difference between my result and the paper.
bug enhancement

opened by unoShin 24

Owner

Choi Sang Bum

Deep Learning will be implemented inside Mobile danielsejong55@gmail.com

GitHub

Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos Introduction This repo is official PyTorch implementatio

29 Sep 24, 2022

Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices, ACM Multimedia 2021

Codes for ECBSR Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices Xindong Zhang, Hui Zeng, Lei Zhang ACM Multimedia 202

236 Dec 26, 2022

Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation Code repository for the paper: PoseAug: A Differentiable Pose Augme

328 Dec 17, 2022

Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors, CVPR 2021

Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors Human POSEitioning System (H

66 Dec 21, 2022

Python and C++ implementation of "MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation". Accepted at LXCV @ CVPR 2021.

MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation This is a PyTorch and LibTorch implementation of MarkerPose: a

47 Nov 18, 2022

This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

367 Dec 27, 2022

This repository is the offical Pytorch implementation of ContextPose: Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021).

Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021) Introduction This repository is the offical Pytorch implementation of

37 Nov 21, 2022

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

3.9k Jan 5, 2023

The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

363 Dec 28, 2022

This is an official implementation for "Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation".

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation This repo is the official implementation of Exploiting Temporal Con

241 Jan 7, 2023

Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su [arXiv] [project site] DistDe

122 Dec 13, 2022

PyTorch implementation for 3D human pose estimation

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach This repository is the PyTorch implementation for the network presented in:

579 Dec 22, 2022

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

368 Dec 26, 2022

Demo for Real-time RGBD-based Extended Body Pose Estimation paper

Real-time RGBD-based Extended Body Pose Estimation This repository is a real-time demo for our paper that was published at WACV 2021 conference The ou

118 Dec 26, 2022

Real-time pose estimation accelerated with NVIDIA TensorRT

trt_pose Want to detect hand poses? Check out the new trt_pose_hand project for real-time hand pose and gesture recognition! trt_pose is aimed at enab

803 Jan 6, 2023

Official Pytorch implementation of "Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video", CVPR 2021

TCMR: Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video Qualtitative result Paper teaser video Introduction This r

215 Jan 6, 2023

The official repo for CVPR2021——ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search.

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search [paper] Introduction This is the official implementation of ViPNAS: Efficient V

42 Sep 26, 2022

The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

20 Jan 3, 2023

Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Human Pose Regression with Residual Log-likelihood Estimation [Paper] [arXiv] [Project Page] Human Pose Regression with Residual Log-likelihood Estima

347 Dec 24, 2022
