MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Overview

This repo is the official PyTorch implementation of "MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation" by Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, and Luc Van Gool.

Dependencies

  • CUDA 11.1
  • Python 3.6
  • PyTorch 1.7.1

Dataset setup

Please download the dataset from the Human3.6M website and refer to VideoPose3D to set it up in the './dataset' directory.

${POSE_ROOT}/
|-- dataset
|   |-- data_3d_h36m.npz
|   |-- data_2d_h36m_cpn_ft_h36m_dbb.npz
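
Before training or testing, it can help to confirm that both archives sit where the layout above expects them. The following is a minimal sketch and not part of the repository: the helper name `missing_dataset_files` is hypothetical, and using the current directory as `${POSE_ROOT}` is an assumption. Only the two file names come from the layout above.

```python
from pathlib import Path

# File names taken from the directory layout above.
EXPECTED_FILES = (
    "data_3d_h36m.npz",
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",
)


def missing_dataset_files(pose_root="."):
    """Return the expected dataset files that are absent under <pose_root>/dataset.

    Hypothetical helper for illustration; it is not part of the MHFormer code.
    """
    dataset_dir = Path(pose_root) / "dataset"
    return [name for name in EXPECTED_FILES if not (dataset_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from ${POSE_ROOT}.
    missing = missing_dataset_files(".")
    if missing:
        print("Missing files in ./dataset:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only looks at file presence; it does not validate the contents of the archives.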

Download pretrained model

The pretrained model can be found on Google Drive; please download it and put it in the './checkpoint' directory.

Test the model

To test the pretrained model on Human3.6M:

python main.py --reload --previous_dir 'checkpoint/pretrained'

Here, we compare MHFormer with recent state-of-the-art methods on the Human3.6M dataset. The evaluation metric is Mean Per Joint Position Error (MPJPE) in mm.

Models        MPJPE (mm)
VideoPose3D   46.8
PoseFormer    44.3
MHFormer      43.0
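
For readers new to the metric, MPJPE is the mean Euclidean distance between each predicted joint and its ground-truth counterpart. The sketch below illustrates that definition for a single pose in plain Python. It is not the repository's evaluation code, the function name `mpjpe` is chosen here for illustration, and a real evaluation would also average over all frames and would normally be vectorized with NumPy or PyTorch.

```python
import math


def _distance(p, t):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)))


def mpjpe(predicted, target):
    """Mean per-joint Euclidean distance for one pose.

    predicted, target: sequences of (x, y, z) joint positions, in the same
    joint order and the same unit (mm for the numbers reported above).
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("pose lists must be non-empty and of equal length")
    return sum(_distance(p, t) for p, t in zip(predicted, target)) / len(predicted)


# Toy two-joint pose: the first joint is off by 3 mm, the second by 5 mm.
pred = [(0.0, 0.0, 3.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, gt))  # 4.0
```

Lower is better, so in the comparison above a 43.0 mm error means predicted joints lie on average 43.0 mm from the ground truth.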

Train the model

To train on Human3.6M:

python main.py --train

Citation

If you find our work useful in your research, please consider citing:

@article{li2021mhformer,
  title={MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation},
  author={Li, Wenhao and Liu, Hong and Tang, Hao and Wang, Pichao and Van Gool, Luc},
  journal={arXiv preprint},
  year={2021}
}

Acknowledgement

Our code is extended from the following repositories. We thank the authors for releasing their code.

Comments
  • About inference in real time

    Hi author,

    Thanks for such excellent work. I did some training and testing based on your paper and code, and the results are good. I am now curious about real-time inference. My intention is to estimate the 3D coordinates while playing back a video. According to your strategy and demo code, estimating the pose of a center frame needs the 2D poses before and after it, which means the 3D pose of a given frame cannot be obtained until the 2D poses after it have been calculated. I have no idea how to handle such a case. I don't think it's a good idea to just pad with dummy data such as zeros. I would appreciate any suggestions. Thanks.

    opened by liamsun2019 9
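
One possible workaround for the question above, offered as an assumption rather than as the authors' answer, is to fill the unseen future half of the window by repeating the newest available frame instead of zeros. The sketch below only builds such a window. The name `causal_window` is hypothetical, frames are treated as opaque objects, and accuracy is likely to be lower than with a true centered window.

```python
def causal_window(frames_so_far, receptive_field):
    """Build a fixed-length window centered on the newest frame.

    frames_so_far: list of per-frame 2D poses seen so far (oldest first).
    receptive_field: odd number of frames the model expects (e.g. 81 or 351).

    The unseen future half is filled by repeating the newest frame, and a
    history that is still too short is filled by repeating the oldest frame.
    """
    if not frames_so_far:
        raise ValueError("need at least one frame")
    if receptive_field % 2 == 0:
        raise ValueError("receptive_field must be odd")
    half = receptive_field // 2
    past = frames_so_far[-(half + 1):]             # history plus the current frame
    left_pad = [past[0]] * (half + 1 - len(past))  # replicate the oldest frame
    right_pad = [past[-1]] * half                  # replicate the current frame
    return left_pad + past + right_pad


# Integers stand in for 2D poses here.
print(causal_window([0, 1, 2], 5))  # [0, 1, 2, 2, 2]
```

The alternative is to keep the centered window and accept a display latency of half the receptive field, which avoids any padding of future frames.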
  • bottom-up or top-down

    Hi author,

    It looks like this is a top-down model that needs an extra detector. A naive question: for single-person 3D pose estimation, is it possible to use the raw image/frame directly as the model input, so as to omit the detector and speed up inference? Looking forward to your feedback, thanks.

    opened by liamsun2019 5
  • In the wild inference?

    How do you guys handle in the wild inference since you only release models that have been trained on cpn_ft_detections that are already in line with the h36m skeleton?

    opened by alecda573 5
  • Reproducing the results

    Excuse me, I cannot reproduce your results. For 81 frames I can only achieve 44.83 mm MPJPE, and for 351 frames just 43.17 mm. Could you tell me the likely reason or how to reproduce the reported results? It would help if you could share the training log. Thanks a lot!

    opened by funnypig521 4
  • Can you share the training and testing codes of the 3dhp dataset?

    I used the script provided by the P-STMO model (https://github.com/paTRICK-swk/P-STMO) to train and test MHFormer on the 3DHP dataset. The average PCK and AUC I obtained were 20 and the MPJPE was 300, so the results look very unreliable.

    opened by ClimberY 3
  • error in vis.py

    python demo/vis.py --video sample_video.mp4

    Generating 2D pose... 100%|█████████████████████████████████████████| 291/291 [00:19<00:00, 14.91it/s]
    Generating 2D pose successful!
    ['checkpoint/pretrained/351/model_4294.pth']

    Generating 3D pose... 0%| | 0/291 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "demo/vis.py", line 279, in <module>
        get_pose3D(video_path, output_dir)
      File "demo/vis.py", line 225, in get_pose3D
        show3Dpose( post_out, ax)
      File "demo/vis.py", line 72, in show3Dpose
        ax.set_aspect('equal')
      File "anaconda3/envs/lili/lib/python3.8/site-packages/mpl_toolkits/mplot3d/axes3d.py", line 323, in set_aspect
        raise NotImplementedError(
    NotImplementedError: Axes3D currently only supports the aspect argument 'auto'. You passed in 'equal'.

    opened by henbucuoshanghai 3
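
The error above comes from Matplotlib itself: some releases reject `set_aspect('equal')` on 3D axes. A possible local workaround, sketched here as an assumption and not as the maintainers' fix, is to fall back to `set_box_aspect`, which Matplotlib provides on 3D axes from version 3.3. The helper name `set_equal_aspect_3d` is hypothetical. Note that a cubic box aspect is not identical to equal data scaling, so the plot may look slightly different.

```python
def set_equal_aspect_3d(ax):
    """Request equal axis scaling on a 3D axes, with a fallback.

    Tries ax.set_aspect('equal') first. Matplotlib versions whose Axes3D
    rejects it raise NotImplementedError, in which case a cubic box aspect
    is set instead via ax.set_box_aspect (available from Matplotlib 3.3).
    """
    try:
        ax.set_aspect("equal")
    except NotImplementedError:
        ax.set_box_aspect((1, 1, 1))
```

In `demo/vis.py` this would replace the direct `ax.set_aspect('equal')` call shown in the traceback. On Matplotlib versions older than 3.3 the fallback itself is unavailable, so upgrading or downgrading Matplotlib is the other option.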
  • Is it possible to use YOLOv7 ?

    Great work! I am studying your research closely and trying to apply it in practice.

    My questions are:

    1. Does the 'input_2D' folder inside the 'output' folder already contain the result of 3D human pose estimation (that is, the output of the main work described in your paper)? I would like to use it to refine the motion of virtual characters.
    2. The demo currently relies on the somewhat dated YOLOv3 and the large HRNet. Is it possible to replace them with the newer YOLOv7 and Lite-HRNet to speed up pose recognition, and eventually achieve real-time, multi-person recognition?

    If so, could you explain how to make the change? For example, how do I generate the "YOLOv7.weights" file, and can I download Lite-HRNet directly from its GitHub repository and use it as a replacement?

    Please forgive me if these questions are too basic; I am a beginner in computer vision. Thank you for your patience.

    opened by Arkitect-z 2
  • question about marking pictures in dataset

    Hi, I finished reading your paper, which is very impressive, and I have a question. It seems that the Human3.6M and MPI-INF-3DHP datasets are used for evaluation, which I understand to mean they are test sets. Is the training set then the npz files on your Google Drive, and did you annotate it yourselves? From other papers (e.g. BlazePalm) I learned that a widely used approach is to generate a keypoint-annotated dataset with 3D software that gives accurate joint coordinates, train an initial model on it, and then iterate by annotating other datasets.

    opened by WaterS-MoYu 2
  • UnboundLocalError: local variable 'bboxs_pre' referenced before assignment

    Thank you for the amazing work!

    Traceback (most recent call last):
      File "C:/Users/92336/Desktop/humanposeestimation/MHFormer-main/demo/vis.py", line 278, in <module>
        get_pose2D(video_path, output_dir)
      File "C:/Users/92336/Desktop/humanposeestimation/MHFormer-main/demo/vis.py", line 94, in get_pose2D
        keypoints, scores = hrnet_pose(video_path, det_dim=416, num_peroson=1, gen_output=True)
      File "C:\Users\92336\Desktop\humanposeestimation\MHFormer-main\demo\lib\hrnet\gen_kpts.py", line 118, in gen_video_kpts
        bboxs = bboxs_pre
    UnboundLocalError: local variable 'bboxs_pre' referenced before assignment
    

    I am having this issue while trying to run inference. Any fixes for this? @Vegetebird

    opened by anas-zafar 2
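
The traceback above suggests that `bboxs_pre` is read before any frame has assigned it, which could happen if, for example, no person is detected in the first frame. That reading is an assumption, since the surrounding code of `gen_kpts.py` is not shown here. The sketch below illustrates the general pattern of initializing the carried-over value up front. The generator `boxes_with_fallback` is hypothetical and is not the repository's code.

```python
def boxes_with_fallback(detections_per_frame):
    """Yield one box list per frame, reusing the previous boxes on a miss.

    detections_per_frame: iterable where each item is a list of boxes, or
    None / empty when the detector found nobody in that frame.
    Frames before the first successful detection yield None, so the caller
    can skip them explicitly instead of crashing.
    """
    previous = None  # defined before the loop, so it can never be unbound
    for boxes in detections_per_frame:
        if boxes:
            previous = boxes
        yield previous


# Frame 0 has no detection, frame 2 reuses the boxes from frame 1.
print(list(boxes_with_fallback([None, ["box_a"], [], ["box_b"]])))
# [None, ['box_a'], ['box_a'], ['box_b']]
```

Whether skipping the leading frames is acceptable depends on the video; the point is only that the fallback variable needs a defined value before it is first read.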
  • License?

    Hey there! Amazing work on this new paper. Can you all specify a license for your work? I'm the founder of NatML and I would like to bring this model to NatML Hub.

    Also, do you accept pull requests? If you all provide an open-source license for your work, I would like to add a link to the NatML implementation of your model in the README. A lot of augmented reality developers in Unity will find this work very useful. Thank you!

    opened by olokobayusuf 2
  • Something about vis.py

    Hi author, I'm running vis.py and I want to get the three-dimensional coordinates x, y, z. I see both post_out and output_3D in the code. Which of them holds these coordinates, and what is the difference between the two?

    opened by Ared521 1
  • Add Docker environment and web demo

    Hey @Vegetebird! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier. To make the model work in Cog, a few edits have been made (for example, parsing args from command line using argparser wouldn't work so we manually set the default args in set_default_options() whenever use_cog= True).

    This also means we can make a web demo where other people can try out your model! View it here: https://replicate.com/vegetebird/human-pose-estimation

    We've added some examples to the web demo; please click the black "Claim this model" button and claim your page here so you own it and can edit it.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    opened by vccheng2001 0
Owner
Vegetabird
Vegetabird also wants to fly!
Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation Code repository for the paper: PoseAug: A Differentiable Pose Augme

Pyjcsx 328 Dec 17, 2022
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors, CVPR 2021

Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors Human POSEitioning System (H

Aymen Mir 66 Dec 21, 2022
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

SimplePose Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, a

Jia Li 256 Dec 24, 2022
This is an official implementation for "Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation".

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation This repo is the official implementation of Exploiting Temporal Con

Vegetabird 241 Jan 7, 2023
[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation Getting Started Our codes are implemented and tested with pyth

ZiNiU WaN 176 Dec 15, 2022
Towards Multi-Camera 3D Human Pose Estimation in Wild Environment

PanopticStudio Toolbox This repository has a toolbox to download, process, and visualize the Panoptic Studio (Panoptic) data. Note: Sep-21-2020: Curre

null 335 Jan 9, 2023
PoseViz – Multi-person, multi-camera 3D human pose visualization tool built using Mayavi.

PoseViz – 3D Human Pose Visualizer Multi-person, multi-camera 3D human pose visualization tool built using Mayavi. As used in MeTRAbs visualizations.

István Sárándi 79 Dec 30, 2022
SE3 Pose Interp - Interpolate camera pose or trajectory in SE3, pose interpolation, trajectory interpolation

SE3 Pose Interpolation Pose estimated from SLAM system are always discrete, and

Ran Cheng 4 Dec 15, 2022
Code for "Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo"

Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo This repository includes the source code for our CVPR 2021 paper on multi-view mult

Jiahao Lin 66 Jan 4, 2023
《Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement》(ECCV 2020)

Unsupervised 3D Human Pose Representation [Paper] The implementation of our paper Unsupervised 3D Human Pose Representation with Viewpoint and Pose Di

null 42 Nov 24, 2022
This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

SO-Pose This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation This paper is basically an

shangbuhuan 52 Nov 25, 2022
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

Leo Xiao 3.9k Jan 5, 2023
Human head pose estimation using Keras over TensorFlow.

RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild.

Rafael Berral Soler 71 Jan 5, 2023
Deep Dual Consecutive Network for Human Pose Estimation (CVPR2021)

Deep Dual Consecutive Network for Human Pose Estimation (CVPR2021) Introduction This is the official code of Deep Dual Consecutive Network for Human P

null 295 Dec 29, 2022
Bottom-up Human Pose Estimation

Introduction This is the official code of Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation. This paper has been accepted to CVPR2

null 108 Dec 1, 2022
This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

HRNet 367 Dec 27, 2022
HPRNet: Hierarchical Point Regression for Whole-Body Human Pose Estimation

HPRNet: Hierarchical Point Regression for Whole-Body Human Pose Estimation Official PyTorch implementation of HPRNet. HPRNet: Hierarchical Point Regre

Nermin Samet 53 Dec 4, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation mode

Aiden Nibali 36 Oct 30, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation models. It contains 17 different amateur subjects performing 30 sports-related actions each, for a total of 510 action clips.

Aiden Nibali 25 Jun 20, 2021