Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Davis Rempe

Last update: Dec 24, 2022

Overview

HuMoR: 3D Human Motion Model for Robust Pose Estimation (ICCV 2021)

This is the official implementation for the ICCV 2021 paper. For more information, see the project webpage.

Environment Setup

Note: This code was developed on Ubuntu 16.04/18.04 with Python 3.7, CUDA 10.1 and PyTorch 1.6.0. Later versions should work, but have not been tested.

Create and activate a virtual environment to work in, e.g. using Conda:

conda create -n humor_env python=3.7
conda activate humor_env

Install CUDA and PyTorch 1.6. For CUDA 10.1, this would look like:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch

Install the remaining requirements with pip:

pip install -r requirements.txt

You must also have ffmpeg installed on your system to save visualizations.
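
Before moving on, it can help to confirm that the environment roughly matches the versions noted above. The following is a minimal sketch and is not part of the repository: the helper name check_environment is hypothetical, and it only reports what it finds (Python version, whether ffmpeg is on the PATH, and the PyTorch/CUDA status if PyTorch is installed) rather than enforcing anything.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Collect basic facts about the current environment (hypothetical helper)."""
    report = {
        # Python version as a (major, minor) tuple, e.g. (3, 7).
        "python": tuple(sys.version_info[:2]),
        # Full path to the ffmpeg executable if it is on PATH, otherwise None.
        "ffmpeg": shutil.which("ffmpeg"),
        "torch": None,
        "cuda_available": None,
    }
    # Import torch only if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

A value of None for ffmpeg or torch means that component was not found in the active environment.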

Downloads & External Dependencies

This codebase relies on various external downloads in order to run for certain modes of operation. Here we briefly overview each and what they are used for. Detailed setup instructions are linked in other READMEs.

Body Model and Pose Prior

Detailed instructions to install SMPL+H and VPoser are in this documentation.

SMPL+H is used for the pose/shape body model. Downloading this model is necessary for all uses of this codebase.
VPoser is used as a pose prior only during the initialization phase of fitting, so it's only needed if you are using the test-time optimization functionality of this codebase.

Datasets

Detailed instructions to install, configure, and process each dataset are in this documentation.

AMASS motion capture data is used to train and evaluate (e.g. randomly sample) the HuMoR motion model and for fitting to 3D data like noisy joints and partial keypoints.
i3DB contains RGB videos with heavy occlusions and is only used in the paper to evaluate test-time fitting to 2D joints.
PROX contains RGB-D videos and is only used in the paper to evaluate test-time fitting to 2D joints and 3D point clouds.

Pretrained Models

Pretrained model checkpoints are available for HuMoR, HuMoR-Qual, and the initial state Gaussian mixture. To download (~215 MB), from the repo root run bash get_ckpt.sh.

OpenPose

OpenPose is used to detect 2D joints for fitting to arbitrary RGB videos. If you will be running test-time optimization on the demo video or your own videos, you must install OpenPose. To clone and build, please follow the OpenPose README in their repo.

Optimization in run_fitting.py assumes OpenPose is installed at ./external/openpose by default - if you install elsewhere, please pass in the location using the --openpose flag.
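
If OpenPose lives somewhere other than the default location, the override can be scripted. The sketch below is only an illustration under stated assumptions: build_fitting_command is a hypothetical helper, and it assumes that run_fitting.py accepts extra flags placed after the @ configuration file. Only the script path, the default OpenPose location, and the --openpose flag come from this README.

```python
import os
import sys

# Default OpenPose location assumed by run_fitting.py, per this README.
DEFAULT_OPENPOSE = "./external/openpose"


def build_fitting_command(config, openpose_dir=DEFAULT_OPENPOSE):
    """Build the argument list for a fitting run (hypothetical helper).

    The --openpose flag is appended only when the install location differs
    from the default, so the default case matches the documented command.
    """
    cmd = [sys.executable, "humor/fitting/run_fitting.py", "@" + config]
    if os.path.normpath(openpose_dir) != os.path.normpath(DEFAULT_OPENPOSE):
        cmd += ["--openpose", openpose_dir]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; the config path is a placeholder.
    print(build_fitting_command("./configs/my_config.cfg", "/opt/openpose"))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repo root once the remaining dependencies are in place.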

Fitting to RGB Videos (Test-Time Optimization)

To run motion/shape estimation on an arbitrary RGB video, you must have SMPL+H, VPoser, OpenPose, and a pretrained HuMoR model as detailed above. We have included a demo video in this repo along with a few example configurations to get started.

Note: if running on your own video, make sure the camera is not moving and the person is not interacting with uneven terrain in the scene (we assume a single ground plane). Also, only one person will be reconstructed.

To run the optimization on the demo video use:

python humor/fitting/run_fitting.py @./configs/fit_rgb_demo_no_split.cfg

This configuration optimizes over the entire video (~3 sec) at once (i.e. over all frames). If your video is longer than 2-3 sec, it is recommended to instead use the settings in ./configs/fit_rgb_demo_use_split.cfg which adds the --rgb-seq-len, --rgb-overlap-len, and --rgb-overlap-consist-weight arguments. Using this configuration, the input video is split into multiple overlapping sub-sequences and optimized in a batched fashion (with consistency losses between sub-sequences). This increases efficiency, and lessens the need to tune parameters based on video length. Note the larger the batch size, the better the results will be.
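
To make the idea of overlapping sub-sequences concrete, here is a minimal sketch of one way to split a video into windows. It is not the repository's implementation: split_into_windows is a hypothetical function, and it assumes that consecutive windows share exactly overlap_len frames and that the last window may be shorter. The repository's own interval arithmetic may differ.

```python
def split_into_windows(num_frames, seq_len, overlap_len):
    """Return [start, end) frame intervals that cover num_frames frames.

    Assumption of this sketch: consecutive windows share overlap_len frames,
    so each new window starts (seq_len - overlap_len) frames after the last.
    """
    if not 0 <= overlap_len < seq_len:
        raise ValueError("overlap_len must satisfy 0 <= overlap_len < seq_len")
    if num_frames <= 0:
        return []
    stride = seq_len - overlap_len
    windows = []
    start = 0
    while True:
        # Clamp the final window so it never runs past the end of the video.
        end = min(start + seq_len, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    print(split_into_windows(8, 5, 2))  # [(0, 5), (3, 8)]
```

Each window can then be treated as one element of the batch, with the shared frames being where a consistency loss would apply.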

If known, it's highly recommended to pass in camera intrinsics using the --rgb-intrinsics flag. See ./configs/intrinsics_default.json for an example of what this looks like. If intrinsics are not given, default focal lengths are used.
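
To illustrate why intrinsics matter, the sketch below projects a camera-space 3D point to pixel coordinates with the standard pinhole model. It is a generic example, not code from this repository: project_point and the parameter names fx, fy, cx, cy are hypothetical and do not describe the schema of ./configs/intrinsics_default.json.

```python
def project_point(point_3d, fx, fy, cx, cy):
    """Project a camera-space 3D point to pixel coordinates (pinhole model).

    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    point = (0.5, 0.0, 2.0)
    # The same 3D point lands on different pixels under different focal lengths.
    print(project_point(point, 1000.0, 1000.0, 640.0, 360.0))
    print(project_point(point, 500.0, 500.0, 640.0, 360.0))
```

Because the fitting compares projected joints against detected 2D joints, a focal length that is far from the true one changes where every 3D joint lands in the image.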

Finally, this demo does not use PlaneRCNN to initialize the ground as described in the paper. Instead, it roughly initializes the ground at y = 0.5 (with camera up-axis -y). We found this to be sufficient and often better than using PlaneRCNN. If you want to use PlaneRCNN instead, set up a separate environment, follow their install instructions, then run their method with the following command, where example_image_dir contains a single frame from your video and the camera parameters:

python evaluate.py --methods=f --suffix=warping_refine --dataset=inference --customDataFolder=example_image_dir

The results directory can be passed into our optimization using the --rgb-planercnn-res flag.
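
The ground convention mentioned above (floor initialized at y = 0.5, camera up-axis -y) means that the floor sits below the camera and that height above the floor grows as y decreases. The following is a minimal sketch under the assumption of an exactly axis-aligned floor; height_above_ground is a hypothetical name, and this README does not describe how the repository parameterizes the plane internally.

```python
GROUND_Y = 0.5              # initial floor position along y, from this README
UP_AXIS = (0.0, -1.0, 0.0)  # camera up-axis is -y


def height_above_ground(point, ground_y=GROUND_Y):
    """Signed distance of a camera-space point from the floor along UP_AXIS."""
    # Vector from a point on the floor, (0, ground_y, 0), to the query point.
    offset = (point[0], point[1] - ground_y, point[2])
    return sum(u * o for u, o in zip(UP_AXIS, offset))


if __name__ == "__main__":
    print(height_above_ground((0.0, 0.5, 3.0)))   # a point on the floor
    print(height_above_ground((0.0, -0.5, 2.0)))  # a point one unit above it
```

A negative return value would indicate a point below the floor, which is the kind of configuration a ground-plane constraint discourages.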

Visualizing RGB Results

The optimization is performed in 3 stages, with stages 1 & 2 being initialization using a pose prior and smoothing (i.e. the VPoser-t baseline) and stage 3 being the full optimization with the HuMoR motion prior. So for the demo, the final output for the full sequence will be saved in ./out/rgb_demo_no_split/results_out/final_results/stage3_results.npz. To visualize results from the fitting use something like:

python humor/fitting/viz_fitting_rgb.py  --results ./out/rgb_demo_no_split/results_out --out ./out/rgb_demo_no_split/viz_out --viz-prior-frame

By default, this will visualize the final full video result along with each sub-sequence separately (if applicable). Please use --help to see the many additional visualization options. This code is also useful to see how to load in and use the results for other tasks, if desired.
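
For a quick look at what a results file contains before writing any downstream code, a sketch like the following may help. It is not part of the repository: summarize_npz is a hypothetical helper, it makes no assumption about which arrays are stored, and it assumes the file holds plain numeric arrays (object arrays would need allow_pickle=True in np.load).

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}


if __name__ == "__main__":
    # Output path of the demo run, as given in this README.
    results = "./out/rgb_demo_no_split/results_out/final_results/stage3_results.npz"
    if os.path.exists(results):
        for name, shape in summarize_npz(results).items():
            print(name, shape)
    else:
        print("No results found at", results)
```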

Fitting on Specific Datasets

Next, we detail how to run and evaluate the test-time optimization on the various datasets presented in the paper. In all these examples, the default batch size is quite small to accommodate smaller GPUs, but it should be increased depending on your system.

AMASS 3D Data

There are multiple settings possible for fitting to 3D data (e.g. noisy joints, partial keypoints, etc.), which can be specified using configuration flags. For example, to fit to partial upper-body 3D keypoints sampled from AMASS data, run:

python humor/fitting/run_fitting.py @./configs/fit_amass_keypts.cfg

Optimization results can be visualized using

python humor/fitting/eval_fitting_3d.py --results ./out/amass_verts_upper_fitting/results_out --out ./out/amass_verts_upper_fitting/eval_out  --qual --viz-stages --viz-observation

and evaluation metrics computed with

python humor/fitting/eval_fitting_3d.py --results ./out/amass_verts_upper_fitting/results_out --out ./out/amass_verts_upper_fitting/eval_out  --quant --quant-stages

The most relevant quantitative results will be written to eval_out/eval_quant/compare_mean.csv.
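
To inspect the metrics file from Python, a minimal sketch using only the standard library is shown below. read_metrics is a hypothetical helper, the column layout of compare_mean.csv is not documented here so the rows are returned as-is, and the full path is inferred by joining the --out directory from the command above with the relative path in the previous sentence.

```python
import csv
import os


def read_metrics(csv_path):
    """Read a CSV file into a list of non-empty rows (lists of strings)."""
    with open(csv_path, newline="") as f:
        return [row for row in csv.reader(f) if row]


if __name__ == "__main__":
    # Inferred location: the --out directory plus eval_quant/compare_mean.csv.
    path = "./out/amass_verts_upper_fitting/eval_out/eval_quant/compare_mean.csv"
    if os.path.exists(path):
        for row in read_metrics(path):
            print(row)
    else:
        print("No metrics found at", path)
```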

i3DB RGB Data

The i3DB dataset contains RGB videos with many occlusions along with annotated 3D joints for evaluation. To run test-time optimization on the full dataset, use:

python humor/fitting/run_fitting.py @./configs/fit_imapper.cfg

Results can be visualized using the same script as in the demo:

python humor/fitting/viz_fitting_rgb.py  --results ./out/imapper_fitting/results_out --out ./out/imapper_fitting/viz_out --viz-prior-frame

Quantitative evaluation (comparing to results after each optimization stage) can be run with:

python humor/fitting/eval_fitting_2d.py --results ./out/imapper_fitting/results_out --dataset iMapper --imapper-floors ./data/iMapper/i3DB/floors --out ./out/imapper_fitting/eval_out --quant --quant-stages

The final quantitative results will be written to eval_out/eval_quant/compare_mean.csv.

PROX RGB/RGB-D Data

PROX contains RGB-D data, so it supports fitting either to 2D joints alone or to 2D joints plus a 3D point cloud. The commands for each are quite similar and differ only in the configuration file. To run on the full RGB-D data, use:

python humor/fitting/run_fitting.py @./configs/fit_proxd.cfg

Visualization must add the --flip-img flag to align with the original PROX videos:

python humor/fitting/viz_fitting_rgb.py  --results ./out/proxd_fitting/results_out --out ./out/proxd_fitting/viz_out --viz-prior-frame --flip-img

Quantitative evaluation (of plausibility metrics) for full RGB-D data uses

python humor/fitting/eval_fitting_2d.py --results ./out/proxd_fitting/results_out --dataset PROXD --prox-floors ./data/prox/qualitative/floors --out ./out/proxd_fitting/eval_out --quant --quant-stages

and for just RGB data is slightly different:

python humor/fitting/eval_fitting_2d.py --results ./out/prox_fitting/results_out --dataset PROX --prox-floors ./data/prox/qualitative/floors --out ./out/prox_fitting/eval_out --quant --quant-stages

Training & Testing Motion Model

There are two versions of our model: HuMoR and HuMoR-Qual. HuMoR is the main model presented in the paper and is best suited for test-time optimization. HuMoR-Qual is a slight variation on HuMoR that gives more stable and qualitatively superior results for random motion generation (see the paper for details).

Below we describe how to train and test HuMoR, but the exact same commands are used for HuMoR-Qual with a different configuration file at each step (see all provided configs).

Training HuMoR

To train HuMoR from scratch, make sure you have the processed version of the AMASS dataset at ./data/amass_processed and run:

python humor/train/train_humor.py @./configs/train_humor.cfg

The default batch size is meant for a 16 GB GPU.

Testing HuMoR

After training HuMoR or downloading the pretrained checkpoints, we can evaluate the model in multiple ways.

To compute single-step losses (the exact same as during training) over the entire test set run:

python humor/test/test_humor.py @./configs/test_humor.cfg

To randomly sample a motion sequence and save a video visualization, run:

python humor/test/test_humor.py @./configs/test_humor_sampling.cfg

If you'd rather visualize the sampling results in an interactive viewer, use:

python humor/test/test_humor.py @./configs/test_humor_sampling_debug.cfg

Try adding --viz-pred-joints, --viz-smpl-joints, or --viz-contacts to the end of the command to visualize more outputs, or increasing the value of --eval-num-samples to sample the model multiple times from the same initial state. --help can always be used to see all flags and their descriptions.

Training Initial State GMM

Test-time optimization also uses a Gaussian mixture model (GMM) prior over the initial state of the sequence. The pretrained model can be downloaded above, but if you wish to train from scratch, run:

python humor/train/train_state_prior.py --data ./data/amass_processed --out ./out/init_state_prior_gmm --gmm-comps 12
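
For intuition about what a Gaussian mixture prior evaluates, the sketch below computes the log-likelihood of a point under a mixture. It is not the code in train_state_prior.py: the function names are hypothetical, and diagonal covariances are assumed purely for brevity (the repository's GMM may use full covariances). Note that a log-density can be positive when a component is narrow, so a negative log-likelihood loss built on it can be negative.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log of the weighted sum of component densities, via log-sum-exp."""
    terms = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the largest term before exponentiating for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


if __name__ == "__main__":
    # One wide and one narrow 1D component, evaluated at the shared mean.
    print(gmm_log_likelihood([0.0], [0.5, 0.5], [[0.0], [0.0]], [[1.0], [0.01]]))
```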

Citation

If you found this code or paper useful, please consider citing:

@inproceedings{rempe2021humor,
    author={Rempe, Davis and Birdal, Tolga and Hertzmann, Aaron and Yang, Jimei and Sridhar, Srinath and Guibas, Leonidas J.},
    title={HuMoR: 3D Human Motion Model for Robust Pose Estimation},
    booktitle={International Conference on Computer Vision (ICCV)},
    year={2021}
}

Questions?

If you run into any problems or have questions, please create an issue or contact Davis (first author) via email.

Comments

Performance with low fps

Hi Davis,

Thanks for this great work. I tried running this on a video sampled at 2fps and noticed a reduction in the quality of the reconstruction. Is this expected? Do you have suggestions on improving the results at low fps? Here is a sample result at 30fps vs 2fps

https://user-images.githubusercontent.com/35488694/138021386-2a396a89-f5f7-48dd-a92c-981b40ea64b5.mp4

https://user-images.githubusercontent.com/35488694/138021421-0e7218ac-35f2-4bd9-884d-24f5dff9379c.mp4

Thank you

opened by micaeltchapmi 6
Add hand pose optimization

Hi @davrempe, nice work! As mentioned in the README, the current scripts can return a smooth sequence. I want to optimize body and hand pose together; could you give me some advice on the optimization pipeline?

opened by NewCoderQ 5
Error in fitting the i3DB RGB Data

Hi, Davis, thanks for sharing with us your excellent work.

I followed the instructions in README.md to run python humor/fitting/run_fitting.py @./configs/fit_imapper.cfg in order to fit the whole iMapper dataset. At first, the test-time optimization process went well, but after a long time, the process terminated and reported an error. I noticed that the loss was abnormally large (the loss soared in the 30th iteration in Stage 3, and after a while, the process died). I repeated the optimization several times, but no process survived. The total run time varied. Some lasted 3h, and some lasted 6h. I would appreciate it if you would give me some advice. Looking for your early reply.

opened by tinatiansjz 3
ValueError: need at least one array to stack

run the demo...

Traceback (most recent call last):
  File "humor/fitting/run_fitting.py", line 439, in <module>
    main(args, config_file)
  File "humor/fitting/run_fitting.py", line 175, in main
    video_name=vid_name
  File "/home/junwei/zjw/humor/humor/fitting/../datasets/rgb_dataset.py", line 59, in __init__
    self.data_dict, self.seq_intervals = self.load_data()
  File "/home/junwei/zjw/humor/humor/fitting/../datasets/rgb_dataset.py", line 146, in load_data
    joint2d_data = np.stack(keyp_frames, axis=0) # T x J x 3 (x,y,conf)
  File "<__array_function__ internals>", line 6, in stack
  File "/home/junwei/anaconda3/envs/humor_env/lib/python3.7/site-packages/numpy/core/shape_base.py", line 422, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

What happened?

opened by MisterJunwei 3
Is it possible to capture multiple people? I found that they were all individual reconstructions
This is a great project, thanks for open-sourcing it. I have two questions:

Can it run in real time?

Is it applicable to multi-person scenarios? Looking forward to your reply.
opened by zhanghongyong123456 3
why the prior loss can be negative

Hello, thank you for this great work! In stage 3, I found that the motion prior and init motion prior losses can be negative. As they are log-likelihoods, is this normal?

opened by AIML 2
List index out of range

Hi, before my question, thanks for the great work!

While running fitting to RGB videos (test time optimization), I get the error of:

======= iter 69 =======
Optimized sequence 0 in 461.702792 s
Saving final results...
Traceback (most recent call last):
  File "humor/fitting/run_fitting.py", line 439, in <module>
    main(args, config_file)
  File "humor/fitting/run_fitting.py", line 433, in main
    body_model_path, num_betas, use_joints2d)
  File "/home/humor/humor/fitting/../fitting/fitting_utils.py", line 434, in save_rgb_stitched_result
    concat_cam_res[k] = torch.cat([concat_cam_res[k], cur_stage3_res[k][seq_overlaps[res_idx]:]], dim=0)
IndexError: list index out of range

I used the base command that you guys have provided:

python humor/fitting/run_fitting.py @./configs/fit_rgb_demo_no_split.cfg

I can't seem to find why, can you help?

Thanks

opened by yc4ny 2
AttributeError: type object 'SMPL' has no attribute 'SHAPE_SPACE_DIM'

Thank you for your perfect work. When I run the optimization on the demo video, I met the following problem:

I have downloaded the smplh model from the website. Could you please give me some advice?

Thx!

opened by Xianjin111 2
Can we change batch-size and num-iters in test-time optimization?

I want to run the test-time optimization faster on my machine. Can we change batch-size to 32 and reduce num-iters to 3 8 7 in configs/fit_imapper.cfg?

opened by hmchuong 2
can I get the 3d keypoints from the final outputs?

Hi davrempe, thanks for generously sharing this wonderful work. I have tested the model on my own video, and the results are excellent except for the time it takes. I wonder if I can get the absolute 3D coordinates of the keypoints from the output? Thanks.

opened by visonpon 2
the design of overlap_len seems wrong

When I build an RGBVideoDataset from a video, I find that the design of overlap_len seems wrong. Assume that seq_len is 5 and overlap_len is 2. The first subseq's seq_interval should be (0,5) and the second subseq's seq_interval should be (3,8). Only then is the length of the overlap 2, that is, the overlapping part is frames 3 and 4. However, your code's result is actually (0,5) and (2,7), so the length of the overlap is 3, that is, the overlapping part is frames 2, 3 and 4. 3 is not equal to 2! So, are you sure that the variable overlap_len stands for the length of the overlap?

opened by HospitableHost 1
Performance with video captured by mobile phone

Hi Davis, thanks for this great work. I have tested the model with the demo video and the performance is great. However, when I test it with a video captured by a mobile phone, the mesh jitters strangely between frames, and the pose and shape also look strange. I have checked that the 2D points produced by OpenPose are correct. The video I used and the result are shown below. I am wondering if this is caused by the camera intrinsics, since I use the default intrinsics, or if something went wrong when I used ./configs/fit_rgb_demo_use_split.cfg during the fitting process.

https://user-images.githubusercontent.com/73419275/209795187-1db739b9-5069-4553-859c-8893bda6792d.mp4

opened by henrycjh 2
Extracting fbx file of animated mesh?

At first, I want to congratulate the creators on their excellent work! I find it very useful and interesting! I managed to run Humor on RGB videos and images and I wonder what are the steps for the creation of an .fbx file of the generated animated mesh (e.g. to use it in Blender or Unity environments). I would appreciate any help/ideas.

Thank you in advance!

opened by DoraPist 2
bug when run rgb_use_spli

Your code does not work when running "python humor/fitting/run_fitting.py @./configs/fit_rgb_demo_use_split.cfg". It fails with "UnboundLocalError: local variable 'body_model_path' referenced before assignment". After I fixed this bug, there is another one: "IndexError: list index out of range".

opened by HospitableHost 5
i3DB results

Hello, thank you for this great work! I have a question about the reproducibility of the results on the i3DB dataset: when I use your code, the final global joint error is 33.5cm and I get 34.3 for VPoser-t (these are 28.15 and 31.59, respectively, in the paper). Am I missing something, or were your testing settings different from those in the code? By the way, is it expected to obtain only a 1cm gap between VPoser-t and HuMoR, given that the CVAE prior seems to be crucial for good predictions? Thank you!

opened by g-fiche 4
pytorch 3x3 matmul problems with batch size 3

What is the issue this line is referring to https://github.com/davrempe/humor/blob/b86c2d9faf7abd497749621821a5d46211304d62/humor/fitting/run_fitting.py#L276?

Thanks a ton!

opened by ZhengyiLuo 0

Owner

Davis Rempe

GitHub

code for ICCV 2021 paper 'Generalized Source-free Domain Adaptation'

G-SFDA Code (based on pytorch 1.3) for our ICCV 2021 paper 'Generalized Source-free Domain Adaptation'. [project] [paper]. Dataset preparing Download

84 Dec 26, 2022

Code for ICCV 2021 paper: ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators..

ARAPReg Code for ICCV 2021 paper: ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators.. Installation The cod

132 Nov 28, 2022

Code for the ICCV 2021 paper "Pixel Difference Networks for Efficient Edge Detection" (Oral).

Pixel Difference Convolution This repository contains the PyTorch implementation for "Pixel Difference Networks for Efficient Edge Detection" by Zhuo

236 Dec 21, 2022

Sync2Gen Code for ICCV 2021 paper: Scene Synthesis via Uncertainty-Driven Attribute Synchronization

Sync2Gen Code for ICCV 2021 paper: Scene Synthesis via Uncertainty-Driven Attribute Synchronization 0. Environment Environment: python 3.6 and cuda 10

62 Dec 30, 2022

Code for the paper "Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds" (ICCV 2021)

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds This is the official code implementation for the paper "Spatio-temporal Se

63 Jan 5, 2023

Code release for ICCV 2021 paper "Anticipative Video Transformer"

Anticipative Video Transformer Ranked first in the Action Anticipation task of the CVPR 2021 EPIC-Kitchens Challenge! (entry: AVT-FB-UT) [project page

123 Dec 13, 2022

Official code release for ICCV 2021 paper SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes.

235 Dec 26, 2022

Demo code for ICCV 2021 paper "Sensor-Guided Optical Flow"

Sensor-Guided Optical Flow Demo code for "Sensor-Guided Optical Flow", ICCV 2021 This code is provided to replicate results with flow hints obtained f

10 Mar 16, 2022

Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation.

Unified-EPT Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation. Installation Linux, CUDA>=10.0,

29 Aug 23, 2022

Starter code for the ICCV 2021 paper, 'Detecting Invisible People'

Detecting Invisible People [ICCV 2021 Paper] [Website] Tarasha Khurana, Achal Dave, Deva Ramanan Introduction This repository contains code for Detect

28 Sep 16, 2022

Official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Vision Transformer with Progressive Sampling This is the official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

123 Jan 1, 2023

PyTorch implementation of paper: AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer, ICCV 2021.

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer [Paper] [PyTorch Implementation] [Paddle Implementation] Overview This reposit

148 Dec 30, 2022

Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction [Paper] [PaddlePaddle Implementation] Homepage of paper: Paint Transformer: Fee

442 Dec 16, 2022

Official Repository for the ICCV 2021 paper "PixelSynth: Generating a 3D-Consistent Experience from a Single Image"

PixelSynth: Generating a 3D-Consistent Experience from a Single Image (ICCV 2021) Chris Rockwell, David F. Fouhey, and Justin Johnson [Project Website

95 Nov 22, 2022

Official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence".

The DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergence, and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by that the cross-attention in DETR relies highly on the content embeddings and that the spatial embeddings make minor contributions, increasing the need for high-quality content embeddings and thus increasing the training difficulty.

281 Dec 30, 2022
