Direct Multi-view Multi-person 3D Human Pose Estimation

Overview

Implementation of NeurIPS-2021 paper: Direct Multi-view Multi-person 3D Human Pose Estimation

[paper] [video-YouTube, video-Bilibili] [slides]

This is the official implementation of our NeurIPS-2021 work: Multi-view Pose Transformer (MvP). MvP is a simple algorithm that directly regresses multi-person 3D human pose from multi-view images.

Framework

[Figure: MvP framework overview]

Example Result

[Figure: example multi-view multi-person 3D pose result]

Reference

@article{wang2021mvp,
  title={Direct Multi-view Multi-person 3D Human Pose Estimation},
  author={Tao Wang and Jianfeng Zhang and Yujun Cai and Shuicheng Yan and Jiashi Feng},
  journal={Advances in Neural Information Processing Systems},
  year={2021}
}

1. Installation

  1. Set the project root directory as ${POSE_ROOT}.
  2. Install all required Python packages (listed in requirements.txt).
  3. Compile the deformable operations used by projective attention:
cd ./models/ops
sh ./make.sh
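
For reference, a minimal end-to-end setup might look like the sketch below. The conda environment name and Python version are assumptions (adjust them to your own setup); everything else follows the steps above.

# minimal setup sketch -- assumed environment, not an official install script
conda create -n mvp python=3.6 -y
conda activate mvp
cd ${POSE_ROOT}
pip install -r requirements.txt   # step 2: install the required Python packages
cd ./models/ops
sh ./make.sh                      # step 3: compile deformable ops for projective attention
cd ${POSE_ROOT}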

2. Data and Pre-trained Model Preparation

2.1 CMU Panoptic

Please follow VoxelPose to download the CMU Panoptic Dataset and PoseResNet-50 pre-trained model.

The directory tree should look like this:

${POSE_ROOT}
|-- models
|   |-- pose_resnet50_panoptic.pth.tar
|-- data
|   |-- panoptic
|   |   |-- 160224_haggling1
|   |   |   |-- hdImgs
|   |   |   |-- hdvideos
|   |   |   |-- hdPose3d_stage1_coco19
|   |   |   |-- calibration_160224_haggling1.json
|   |   |-- 160226_haggling1
|   |   |-- ...

2.2 Shelf/Campus

Please follow VoxelPose to download the Shelf/Campus Dataset.

Due to the limited and incomplete annotations of these two datasets, we use pseudo ground-truth 3D poses generated by VoxelPose to train the model. We expect MvP would perform much better with real ground-truth pose data.

Please use VoxelPose or other methods to generate pseudo ground truth for the training set, or use our generated pseudo GT: psudo_gt_shelf, psudo_gt_campus, psudo_gt_campus_fix_gtmorethanpred.

Due to the small dataset sizes, we fine-tune the Panoptic pre-trained model on Shelf and Campus. Download the MvP models pre-trained on Panoptic: model_best_5view, model_best_3view_horizontal_view, or model_best_3view_2horizon_1lookdown.

The directory tree should look like this:

${POSE_ROOT}
|-- models
|   |-- model_best_5view.pth.tar
|   |-- model_best_3view_horizontal_view.pth.tar
|   |-- model_best_3view_2horizon_1lookdown.pth.tar
|-- data
|   |-- Shelf
|   |   |-- Camera0
|   |   |-- ...
|   |   |-- Camera4
|   |   |-- actorsGT.mat
|   |   |-- calibration_shelf.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_shelf.pickle
|   |-- CampusSeq1
|   |   |-- Camera0
|   |   |-- Camera1
|   |   |-- Camera2
|   |   |-- actorsGT.mat
|   |   |-- calibration_campus.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_campus.pickle
|   |   |   |-- voxelpose_pesudo_gt_campus_fix_gtmorethanpred_case.pickle

2.3 Human3.6M dataset

Please follow CHUNYUWANG/H36M-Toolbox to prepare the data.

2.4 Full Directory Tree

The full data and pre-trained model directory tree should look like this. To reproduce the main MvP results and the ablation studies, you only need to download the Panoptic dataset and the PoseResNet-50 model:

${POSE_ROOT}
|-- models
|   |-- pose_resnet50_panoptic.pth.tar
|   |-- model_best_5view.pth.tar
|   |-- model_best_3view_horizontal_view.pth.tar
|   |-- model_best_3view_2horizon_1lookdown.pth.tar
|-- data
|   |-- pesudo_gt
|   |   |-- voxelpose_pesudo_gt_shelf.pickle
|   |   |-- voxelpose_pesudo_gt_campus.pickle
|   |   |-- voxelpose_pesudo_gt_campus_fix_gtmorethanpred_case.pickle
|   |-- panoptic
|   |   |-- 160224_haggling1
|   |   |   |-- hdImgs
|   |   |   |-- hdvideos
|   |   |   |-- hdPose3d_stage1_coco19
|   |   |   |-- calibration_160224_haggling1.json
|   |   |-- 160226_haggling1
|   |   |-- ...
|   |-- Shelf
|   |   |-- Camera0
|   |   |-- ...
|   |   |-- Camera4
|   |   |-- actorsGT.mat
|   |   |-- calibration_shelf.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_shelf.pickle
|   |-- CampusSeq1
|   |   |-- Camera0
|   |   |-- Camera1
|   |   |-- Camera2
|   |   |-- actorsGT.mat
|   |   |-- calibration_campus.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_campus.pickle
|   |   |   |-- voxelpose_pesudo_gt_campus_fix_gtmorethanpred_case.pickle
|   |-- HM36

3. Training and Evaluation

The evaluation result is printed after every epoch; the best result can be found in the log.

3.1 CMU Panoptic dataset

We train and validate on the five selected camera views. We trained our models on 8 GPUs with batch_size=1 per GPU. Note that the total number of iterations per epoch should be 3205; if not, please check your data.

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/panoptic/best_model_config.yaml
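
If you have fewer than 8 GPUs, a reasonable variant is to lower --nproc_per_node to the number of available GPUs; note that this changes the effective batch size, so the reproduced numbers may differ slightly. For example, on a 4-GPU machine:

python -m torch.distributed.launch --nproc_per_node=4 --use_env run/train_3d.py --cfg configs/panoptic/best_model_config.yaml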

Pre-trained models

Datasets | AP25 | AP50 | AP100 | AP150 | MPJPE (mm) | pth
Panoptic | 92.3 | 96.6 | 97.5  | 97.7  | 15.8       | here

3.1.1 Ablation Experiments

You can find several ablation experiment configs under ./configs/panoptic/, for example, removing RayConv:

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/panoptic/ablation_remove_rayconv.yaml

3.2 Shelf/Campus datasets

As Shelf/Campus are very small datasets with incomplete annotations, we fine-tune the pre-trained MvP with pseudo ground-truth 3D poses extracted by VoxelPose. We expect that more accurate GT would help MvP achieve much higher performance.

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/shelf/mvp_shelf.yaml

Pre-trained models

Datasets | Actor 1 | Actor 2 | Actor 3 | Average | pth
Shelf    | 99.3    | 95.1    | 97.8    | 97.4    | here
Campus   | 98.2    | 94.1    | 97.4    | 96.6    | here

3.3 Human3.6M dataset

MvP also applies to the simpler single-person setting, with datasets such as Human3.6M (full support to come):

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/h36m/mvp_h36m.yaml

4. Evaluation Only

To evaluate a trained model, pass the config file and the model checkpoint (.pth) path:

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/validate_3d.py --cfg xxx --model_path xxx
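
For example, to evaluate the 5-view Panoptic checkpoint from Section 2 with its training config (the paths shown are illustrative and assume the directory tree above):

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/validate_3d.py --cfg configs/panoptic/best_model_config.yaml --model_path models/model_best_5view.pth.tar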

LICENSE

This repo is under the Apache-2.0 license. For commercial use, please contact the authors.

Comments
  • Error when trying to train


    I am getting the following error when I try to run training. How should I proceed in order to solve it?

    (mvp) jpsml@jpsml-ubuntu:~/mvp$ python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/campus/mvp_campus.yaml


    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


    Each of the 8 launched processes prints the same traceback:

    Traceback (most recent call last):
      File "run/train_3d.py", line 34, in <module>
        import dataset
      File "/home/jpsml/mvp/run/../lib/dataset/__init__.py", line 20, in <module>
        from dataset.h36m import H36M as h36m
      File "/home/jpsml/mvp/run/../lib/dataset/h36m.py", line 30, in <module>
        from lib.utils.cameras_cpu import camera_to_world_frame, project_pose
    ModuleNotFoundError: No module named 'lib'

    The launcher then exits with:

    Traceback (most recent call last):
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
        main()
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
        cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/jpsml/anaconda3/envs/mvp/bin/python', '-u', 'run/train_3d.py', '--cfg', 'configs/campus/mvp_campus.yaml']' returned non-zero exit status 1.

    opened by jpsml 6
  • Questions about batch size larger than 1


    When I run the validation code with the given pretrained checkpoint on the Panoptic dataset, I found that there is something wrong with batch sizes larger than 1. For example, when batch size=1, the precision can be reproduced.

    When batch size=2, it seems that the model fails to predict correctly.

    Does anyone get a similar problem? Or can you give some advice about the reason for this problem? Thanks!

    opened by wangjiongw 4
  • Single camera results


    I have a question about your work "Direct Multi-view Multi-person 3D Pose Estimation" (NeurIPS 2021). Your multi-view performance on the Panoptic dataset is much better than VoxelPose's. However, why is it not as good as VoxelPose in the single-view setting? Your MPJPE is 93.8mm while VoxelPose's MPJPE is 66.95mm. And of course, I can't reproduce their results. Could you help me with this problem? Thanks

    opened by xiaochehe 4
  • Error during training in evaluation


    Hi, I encountered the following error in the evaluation after training the first epoch. Could you help find out the problem? Thanks in advance.

    INFO:core.function:Test: [200/323]      Time: 0.178s (0.291s)   Speed: 28.1 samples/s   Data: 0.000s (0.055s)   Memory 465635328.0
    Traceback (most recent call last):
      File "run/train_3d.py", line 334, in <module>
        main()
      File "run/train_3d.py", line 260, in main
        final_output_dir, thr, num_views=num_views)
      File "/mnt/lustre/liqikai.vendor/open_mmlab/pose3d/mvp/lib/core/function.py", line 161, in validate_3d
        for i, (inputs, meta) in enumerate(loader):
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
        data = self._next_data()
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
        return self._process_data(data)
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
        data.reraise()
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    ValueError: Caught ValueError in DataLoader worker process 1.
    Original Traceback (most recent call last):
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/mnt/lustre/liqikai.vendor/open_mmlab/pose3d/mvp/lib/dataset/panoptic.py", line 257, in __getitem__
        i, m = super().__getitem__(self.num_views * idx + k)
    ValueError: too many values to unpack (expected 2)
    
    
    opened by liqikai9 2
  • Question about inference speed


    Hello, can I use your model in my configuration, say 3-4 cameras, by using my cameras' params? Have you tested the method for its inference speed, and if so, how fast is it? Thanks

    opened by gpastal24 1
  • Confused about num_feature_levels and use_feat_level


    In best_model_config.yaml, num_feature_levels: 1. In config.py, config.DECODER.use_feat_level = [0, 1, 2].

    I am confused: if use_feat_level = [0, 1, 2], should num_feature_levels be equal to 3? Thanks

    opened by guoguangchao 1
  • Experiments about Human3.6M


    Thanks for your excellent work. I noticed that you showed results on the Human3.6M dataset and compared with VoxelPose on the same dataset, but the related files cannot be found. Could you please share the trained checkpoint or training configuration? And how long did you train MvP and VoxelPose, respectively, to get the final results? Thanks for your help!

    opened by wangjiongw 0
  • How to visualize the results?


    Thank you very much for your perfect work. I think the visualizations look cool. I would like to ask how to visualize the results like yours.

    Thanks again.

    opened by xjchao7 0
  • Question about the initialization of sampling_offsets


    Dear authors,

    In lib/models/ops/modules/projattn.py, I noticed that the weights of self.sampling_offsets are set to constant 0, and the bias has no gradient backpropagation (lines 94-105).

    https://github.com/sail-sg/mvp/blob/80eecd012f51f49da357e337716d40a6398d520d/lib/models/ops/modules/projattn.py#L94

    In my opinion, if the weights are set to 0 and the bias has no gradient, the sampled offsets will always be the same across different training samples. But it seems the offset points are informatively selected according to Figure 5 in your paper.

    On the other hand, in the provided pretrained model, the weights and the bias differ from their initialized values. Could you please tell me what the final initialization method of self.sampling_offsets is? Thank you very much!

    opened by Mayy1994 0
  • About Human3.6M dataset


    When I prepared to run experiments on the Human3.6M dataset, I found that the __getitem__ function calls the function of the class it inherits from, which is JointsDataset, but JointsDataset returns only 2 items (images and meta info), while Human3.6M requires 5 items. Is there any reference for using the Human3.6M dataset? Thanks

    opened by wangjiongw 2
  • About the campus pre-trained weights


    Hi, I was trying to run a quick evaluation with the provided pre-trained model for the campus dataset. However, it seems that the pre-trained weights (d1_384_85.2.pth.tar) do not match the model. Can you help to double-check the provided pre-trained weight file? Thank you so much.

    opened by wqyin 0
  • About the MvP-Dense Attention module


    In your paper, you mention that you replaced the projective attention with a dense attention module; here are the results:

    [Screenshot: MvP-Dense results table from the paper]

    I wonder how you ran this experiment. How can I modify your code to run it? Which module should I modify?

    opened by liqikai9 10