Direct Multi-view Multi-person 3D Human Pose Estimation

Overview

Implementation of NeurIPS-2021 paper: Direct Multi-view Multi-person 3D Human Pose Estimation

[paper] [video-YouTube, video-Bilibili] [slides]

This is the official implementation of our NeurIPS-2021 work: Multi-view Pose Transformer (MvP). MvP is a simple algorithm that directly regresses multi-person 3D human pose from multi-view images.

Framework

(figure: MvP framework overview)

Example Result

(figure: example multi-view multi-person pose estimation result)

Reference

@article{wang2021mvp,
  title={Direct Multi-view Multi-person 3D Human Pose Estimation},
  author={Tao Wang and Jianfeng Zhang and Yujun Cai and Shuicheng Yan and Jiashi Feng},
  journal={Advances in Neural Information Processing Systems},
  year={2021}
}

1. Installation

  1. Set the project root directory as ${POSE_ROOT}.
  2. Install all the required Python packages (listed in requirements.txt).
  3. Compile the deformable operation used by projective attention (the steps are combined into one command sequence below):
cd ./models/ops
sh ./make.sh
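
Putting the three steps together, a typical setup sequence looks like the following. This is only a sketch: the ${POSE_ROOT} path is an example and pip is assumed as the package manager.

export POSE_ROOT=/path/to/mvp
cd ${POSE_ROOT}
pip install -r requirements.txt
cd ./models/ops
sh ./make.sh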

2. Data and Pre-trained Model Preparation

2.1 CMU Panoptic

Please follow VoxelPose to download the CMU Panoptic dataset and the PoseResNet-50 pre-trained model.

The directory tree should look like this:

${POSE_ROOT}
|-- models
|   |-- pose_resnet50_panoptic.pth.tar
|-- data
|   |-- panoptic
|   |   |-- 160224_haggling1
|   |   |   |-- hdImgs
|   |   |   |-- hdvideos
|   |   |   |-- hdPose3d_stage1_coco19
|   |   |   |-- calibration_160224_haggling1.json
|   |   |-- 160226_haggling1
|   |   |-- ...
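
A quick way to sanity-check this layout is a small script like the sketch below. POSE_ROOT is read from an environment variable purely for convenience, and the listed paths are just the examples from the tree above; extend them to whatever sequences you downloaded.

import os

pose_root = os.environ.get("POSE_ROOT", ".")
# Example paths mirroring the directory tree in this README.
expected = [
    "models/pose_resnet50_panoptic.pth.tar",
    "data/panoptic/160224_haggling1/hdImgs",
    "data/panoptic/160224_haggling1/hdPose3d_stage1_coco19",
    "data/panoptic/160224_haggling1/calibration_160224_haggling1.json",
]
for rel in expected:
    path = os.path.join(pose_root, rel)
    print(("OK   " if os.path.exists(path) else "MISS ") + path)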

2.2 Shelf/Campus

Please follow VoxelPose to download the Shelf/Campus Dataset.

Due to the limited and incomplete annotations of these two datasets, we train the model with pseudo ground-truth 3D poses generated by VoxelPose. We expect MvP would perform much better with accurate ground-truth pose data.

Please use VoxelPose or another method to generate pseudo ground truth for the training set. You can also use our generated pseudo GT: psudo_gt_shelf, psudo_gt_campus, psudo_gt_campus_fix_gtmorethanpred.
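
If you want to inspect a downloaded pseudo-GT file before training, a minimal sketch is shown below. The exact schema of the pickle is an assumption here (the script only loads and summarizes it); adjust the path to whichever file you downloaded.

import pickle

# Path follows the directory tree shown later in this section; change it as needed.
with open("data/Shelf/pesudo_gt/voxelpose_pesudo_gt_shelf.pickle", "rb") as f:
    pseudo_gt = pickle.load(f)

print(type(pseudo_gt))
if isinstance(pseudo_gt, dict):
    first_key = next(iter(pseudo_gt))  # e.g. a frame or sequence identifier
    print(first_key, type(pseudo_gt[first_key]))
else:
    print(len(pseudo_gt))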

Due to the small dataset size, we fine-tune the Panoptic pre-trained model on Shelf and Campus. Download the MvP model pre-trained on Panoptic from model_best_5view, model_best_3view_horizontal_view, or model_best_3view_2horizon_1lookdown.

The directory tree should look like this:

${POSE_ROOT}
|-- models
|   |-- model_best_5view.pth.tar
|   |-- model_best_3view_horizontal_view.pth.tar
|   |-- model_best_3view_2horizon_1lookdown.pth.tar
|-- data
|   |-- Shelf
|   |   |-- Camera0
|   |   |-- ...
|   |   |-- Camera4
|   |   |-- actorsGT.mat
|   |   |-- calibration_shelf.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_shelf.pickle
|   |-- CampusSeq1
|   |   |-- Camera0
|   |   |-- Camera1
|   |   |-- Camera2
|   |   |-- actorsGT.mat
|   |   |-- calibration_campus.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_campus.pickle
|   |   |   |-- voxelpose_pesudo_gt_campus_fix_gtmorethanpred_case.pickle

2.3 Human3.6M dataset

Please follow CHUNYUWANG/H36M-Toolbox to prepare the data.

2.4 Full Directory Tree

The data and pre-trained model directory tree should look like this. To reproduce the main MvP results and the ablation studies, you only need to download the Panoptic dataset and PoseResNet-50:

${POSE_ROOT}
|-- models
|   |-- pose_resnet50_panoptic.pth.tar
|   |-- model_best_5view.pth.tar
|   |-- model_best_3view_horizontal_view.pth.tar
|   |-- model_best_3view_2horizon_1lookdown.pth.tar
|-- data
|   |-- pesudo_gt
|   |   |-- voxelpose_pesudo_gt_shelf.pickle
|   |   |-- voxelpose_pesudo_gt_campus.pickle
|   |   |-- voxelpose_pesudo_gt_campus_fix_gtmorethanpred_case.pickle
|   |-- panoptic
|   |   |-- 160224_haggling1
|   |   |   |-- hdImgs
|   |   |   |-- hdvideos
|   |   |   |-- hdPose3d_stage1_coco19
|   |   |   |-- calibration_160224_haggling1.json
|   |   |-- 160226_haggling1
|   |   |-- ...
|   |-- Shelf
|   |   |-- Camera0
|   |   |-- ...
|   |   |-- Camera4
|   |   |-- actorsGT.mat
|   |   |-- calibration_shelf.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_shelf.pickle
|   |-- CampusSeq1
|   |   |-- Camera0
|   |   |-- Camera1
|   |   |-- Camera2
|   |   |-- actorsGT.mat
|   |   |-- calibration_campus.json
|   |   |-- pesudo_gt
|   |   |   |-- voxelpose_pesudo_gt_campus.pickle
|   |   |   |-- voxelpose_pesudo_gt_campus_fix_gtmorethanpred_case.pickle
|   |-- HM36

3. Training and Evaluation

The evaluation result is printed after every epoch; the best result can be found in the log.

3.1 CMU Panoptic dataset

We train and validate on the five selected camera views. We trained our models on 8 GPUs with batch_size=1 per GPU; the total number of iterations per epoch should be 3205. If it is not, please check your data.

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/panoptic/best_model_config.yaml
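
As a quick sanity check on the iteration count, the relation below is what a per-GPU data split implies; the total sample count is derived from the numbers above rather than measured.

num_gpus = 8
batch_size_per_gpu = 1
iters_per_epoch = 3205  # value stated above
# Each iteration consumes num_gpus * batch_size_per_gpu samples in total,
# so the iteration count implies roughly this many training samples:
implied_total_samples = iters_per_epoch * num_gpus * batch_size_per_gpu
print(implied_total_samples)  # 25640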

Pre-trained models

Datasets   AP25   AP50   AP100   AP150   MPJPE   pth
Panoptic   92.3   96.6   97.5    97.7    15.8    here

3.1.1 Ablation Experiments

You can find several ablation experiment configs under ./configs/panoptic/, for example, removing RayConv:

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/panoptic/ablation_remove_rayconv.yaml

3.2 Shelf/Campus datasets

As Shelf/Campus are very small datasets with incomplete annotations, we fine-tune the pre-trained MvP with pseudo ground-truth 3D poses extracted with VoxelPose. We expect more accurate GT would help MvP achieve much higher performance.

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/shelf/mvp_shelf.yaml
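
The Campus counterpart is launched the same way with its own config. The config path below is the one that appears in the issue reports later on this page; verify it exists in your checkout:

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/campus/mvp_campus.yaml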

Pre-trained models

Datasets   Actor 1   Actor 2   Actor 3   Average   pth
Shelf      99.3      95.1      97.8      97.4      here
Campus     98.2      94.1      97.4      96.6      here

3.3 Human3.6M dataset

MvP also applies to the simpler single-person setting, with datasets like Human3.6M (support to come):

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/h36m/mvp_h36m.yaml

4. Evaluation Only

To evaluate a trained model, pass the config file and the model checkpoint path:

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/validate_3d.py --cfg xxx --model_path xxx
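
For example, to evaluate the 5-view Panoptic checkpoint from Section 2.2 (pairing this config with this checkpoint is an assumption; substitute your own paths):

python -m torch.distributed.launch --nproc_per_node=8 --use_env run/validate_3d.py --cfg configs/panoptic/best_model_config.yaml --model_path models/model_best_5view.pth.tar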

LICENSE

This repo is under the Apache-2.0 license. For commercial use, please contact the authors.

Comments
  • Error when trying to train

    I am getting the following error when I try to run training. How should I proceed in order to solve it?

    (mvp) jpsml@jpsml-ubuntu:~/mvp$ python -m torch.distributed.launch --nproc_per_node=8 --use_env run/train_3d.py --cfg configs/campus/mvp_campus.yaml


    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


    Traceback (most recent call last):
      File "run/train_3d.py", line 34, in <module>
        import dataset
      File "/home/jpsml/mvp/run/../lib/dataset/__init__.py", line 20, in <module>
        from dataset.h36m import H36M as h36m
      File "/home/jpsml/mvp/run/../lib/dataset/h36m.py", line 30, in <module>
        from lib.utils.cameras_cpu import camera_to_world_frame, project_pose
    ModuleNotFoundError: No module named 'lib'

    (the same traceback is repeated once for each of the 8 launched processes)

    Traceback (most recent call last):
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
        main()
      File "/home/jpsml/anaconda3/envs/mvp/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
        cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/jpsml/anaconda3/envs/mvp/bin/python', '-u', 'run/train_3d.py', '--cfg', 'configs/campus/mvp_campus.yaml']' returned non-zero exit status 1.

    opened by jpsml 6
  • Questions about batch size larger than 1

    When I run the validation code with the given pretrained checkpoint on the Panoptic dataset, I found that something goes wrong with batch size larger than 1. For example, when batch size=1, the precision can be reproduced (screenshot attached).

    When batch size=2, it seems that the model fails to predict correctly (screenshot attached).

    Does anyone get a similar problem? Or can you please give some advice about the reason for this problem? Thanks!

    opened by wangjiongw 4
  • Single camera results

    I have a question about your work, "Direct Multi-view Multi-person 3D Pose Estimation" (NeurIPS 2021). Your multi-view performance on the Panoptic dataset is much better than VoxelPose's. However, why is it not as good as VoxelPose in the single-view setting? Your MPJPE is 93.8mm while VoxelPose's MPJPE is 66.95mm. And of course, I can't reproduce their results. Could you help me with this problem? Thanks

    opened by xiaochehe 4
  • Error during training in evaluation

    Hi, I encountered the following error in the evaluation after training the first epoch. Could you help find out the problem? Thanks in advance.

    INFO:core.function:Test: [200/323]      Time: 0.178s (0.291s)   Speed: 28.1 samples/s   Data: 0.000s (0.055s)   Memory 465635328.0
    Traceback (most recent call last):
      File "run/train_3d.py", line 334, in <module>
        main()
      File "run/train_3d.py", line 260, in main
        final_output_dir, thr, num_views=num_views)
      File "/mnt/lustre/liqikai.vendor/open_mmlab/pose3d/mvp/lib/core/function.py", line 161, in validate_3d
        for i, (inputs, meta) in enumerate(loader):
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
        data = self._next_data()
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
        return self._process_data(data)
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
        data.reraise()
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    ValueError: Caught ValueError in DataLoader worker process 1.
    Original Traceback (most recent call last):
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/mnt/lustre/liqikai.vendor/anaconda3/envs/pt180cu111py37mmcv1317/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/mnt/lustre/liqikai.vendor/open_mmlab/pose3d/mvp/lib/dataset/panoptic.py", line 257, in __getitem__
        i, m = super().__getitem__(self.num_views * idx + k)
    ValueError: too many values to unpack (expected 2)
    
    
    opened by liqikai9 2
  • Question about inference speed

    Hello, can I use your model in my configuration, say 3-4 cameras, using my cameras' parameters? Have you tested the method's inference speed, and if so, how fast is it? Thanks

    opened by gpastal24 1
  • Confused about num_feature_levels and use_feat_level

    In best_model_config.yaml, num_feature_levels is 1. In config.py, config.DECODER.use_feat_level = [0, 1, 2].

    I am confused: if use_feat_level = [0, 1, 2], shouldn't num_feature_levels be equal to 3? Thanks

    opened by guoguangchao 1
  • Experiments about Human3.6M

    Thanks for your excellent work. I noticed that you showed results on the Human3.6M dataset and compared with VoxelPose on the same dataset, but the related files cannot be found. Could you please share the trained checkpoint or training configuration? And how long did you train MvP and VoxelPose, respectively, to get the final results? Thanks for your help!

    opened by wangjiongw 0
  • How to visualize the results?

    Thank you very much for your great work. I think the visualizations look cool. I would like to ask how to visualize the results as you did.

    Thanks again.

    opened by xjchao7 0
  • Question about the initialization of sampling_offsets

    Dear authors,

    In lib/models/ops/modules/projattn.py, I noticed that the weights of self.sampling_offsets are set to constant 0, and the bias has no gradient backpropagation (lines 94-105).

    https://github.com/sail-sg/mvp/blob/80eecd012f51f49da357e337716d40a6398d520d/lib/models/ops/modules/projattn.py#L94

    In my opinion, if the weights are set to 0 and the bias has no gradient, the sampled offsets will always be the same across different training samples. But it seems the offset points are informatively selected according to Figure 5 in your paper.

    On the other hand, in the provided pretrained model, the weights and the bias are different from their initialized values. Could you please tell me what the final initialization method of self.sampling_offsets is? Thank you very much!

    opened by Mayy1994 0
  • About Human3.6M dataset

    When I prepared to run experiments on the Human3.6M dataset, I found that the __getitem__ function calls the corresponding function of the class it inherits from, JointDataset, but JointDataset returns only 2 items (images and meta info), while Human3.6M requires 5 items. Is there any reference for using the Human3.6M dataset? Thanks

    opened by wangjiongw 2
  • About the campus pre-trained weights

    Hi, I was trying to run a quick evaluation with the provided pre-trained model for the campus dataset. However, it seems that the pre-trained weights (d1_384_85.2.pth.tar) do not match the model. Can you help to double-check the provided pre-trained weight file? Thank you so much.

    opened by wqyin 0
  • About the MvP-Dense Attention module

    In your paper, you mention that you replaced the projective attention with a dense attention module; here are the results:

    (screenshot: MvP-Dense results table)

    I wonder how you ran this experiment. How can I modify your code to run it? Which module should I modify?

    opened by liqikai9 10
Owner: Sea AI Lab