Efficient 3D human pose estimation in video using 2D keypoint trajectories

Overview

3D human pose estimation in video with temporal convolutions and semi-supervised training

This is the implementation of the approach described in the paper:

Dario Pavllo, Christoph Feichtenhofer, David Grangier, and Michael Auli. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

More demos are available at https://dariopavllo.github.io/VideoPose3D

Results on Human3.6M

Under Protocol 1 (mean per-joint position error) and Protocol 2 (mean per-joint position error after rigid alignment).

2D Detections | BBoxes       | Blocks | Receptive Field | Error (P1) | Error (P2)
CPN           | Mask R-CNN   | 4      | 243 frames      | 46.8 mm    | 36.5 mm
CPN           | Ground truth | 4      | 243 frames      | 47.1 mm    | 36.8 mm
CPN           | Ground truth | 3      | 81 frames       | 47.7 mm    | 37.2 mm
CPN           | Ground truth | 2      | 27 frames       | 48.8 mm    | 38.0 mm
Mask R-CNN    | Mask R-CNN   | 4      | 243 frames      | 51.6 mm    | 40.3 mm
Ground truth  | --           | 4      | 243 frames      | 37.2 mm    | 27.2 mm

Quick start

To get started as quickly as possible, follow the instructions in this section. This should allow you to train a model from scratch, test our pretrained models, and produce basic visualizations. For more detailed instructions, please refer to DOCUMENTATION.md.

Dependencies

Make sure you have the following dependencies installed before proceeding:

  • Python 3+ distribution
  • PyTorch >= 0.4.0

Optional:

  • Matplotlib, if you want to visualize predictions. Additionally, you need ffmpeg to export MP4 videos, and imagemagick to export GIFs.
  • MATLAB, if you want to experiment with HumanEva-I (you need this to convert the dataset).

Dataset setup

You can find the instructions for setting up the Human3.6M and HumanEva-I datasets in DATASETS.md. For this short guide, we focus on Human3.6M. You are not required to set up HumanEva-I unless you want to experiment with it.

In order to proceed, you must also copy CPN detections (for Human3.6M) and/or Mask R-CNN detections (for HumanEva).
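Once the files are in the data/ directory, a quick sanity check is to load them and inspect their contents. The snippet below is a sketch: the key names follow what prepare_data_h36m.py writes and what run.py reads, and the file name assumes the fine-tuned CPN detections.

import numpy as np

# 3D ground truth produced by prepare_data_h36m.py
poses_3d = np.load('data/data_3d_h36m.npz', allow_pickle=True)['positions_3d'].item()

# 2D detections (here: fine-tuned CPN), read the same way run.py does
detections = np.load('data/data_2d_h36m_cpn_ft_h36m_dbb.npz', allow_pickle=True)
metadata = detections['metadata'].item()
keypoints = detections['positions_2d'].item()

subject = next(iter(keypoints))
action = next(iter(keypoints[subject]))
print(sorted(poses_3d.keys()))              # subjects, e.g. S1, S5, ..., S11
print(keypoints[subject][action][0].shape)  # (num_frames, 17, 2) for camera 0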

Evaluating our pretrained models

The pretrained models can be downloaded from AWS. Put pretrained_h36m_cpn.bin (for Human3.6M) and/or pretrained_humaneva15_detectron.bin (for HumanEva) in the checkpoint/ directory (create it if it does not exist).

mkdir checkpoint
cd checkpoint
wget https://dl.fbaipublicfiles.com/video-pose-3d/pretrained_h36m_cpn.bin
wget https://dl.fbaipublicfiles.com/video-pose-3d/pretrained_humaneva15_detectron.bin
cd ..

These models allow you to reproduce our top-performing baselines, which are:

  • 46.8 mm for Human3.6M, using fine-tuned CPN detections, bounding boxes from Mask R-CNN, and an architecture with a receptive field of 243 frames.
  • 33.0 mm for HumanEva-I (on 3 actions), using pretrained Mask R-CNN detections, and an architecture with a receptive field of 27 frames. This is the multi-action model trained on 3 actions (Walk, Jog, Box).

To test on Human3.6M, run:

python run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin

To test on HumanEva, run:

python run.py -d humaneva15 -k detectron_pt_coco -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -a Walk,Jog,Box --by-subject -c checkpoint --evaluate pretrained_humaneva15_detectron.bin

DOCUMENTATION.md provides a precise description of all command-line arguments.

Inference in the wild

We have introduced an experimental feature to run our model on custom videos. See INFERENCE.md for more details.

Training from scratch

If you want to reproduce the results of our pretrained models, run the following commands.

For Human3.6M:

python run.py -e 80 -k cpn_ft_h36m_dbb -arc 3,3,3,3,3

By default the application runs in training mode. This will train a new model for 80 epochs, using fine-tuned CPN detections. Expect a training time of 24 hours on a high-end Pascal GPU. If you feel that this is too much, or your GPU is not powerful enough, you can train a model with a smaller receptive field, e.g.

  • -arc 3,3,3,3 (81 frames) should require 11 hours and achieve 47.7 mm.
  • -arc 3,3,3 (27 frames) should require 6 hours and achieve 48.8 mm.

You could also lower the number of epochs from 80 to 60 with a negligible impact on the result.
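As a rule of thumb, the receptive field is the product of the filter widths you pass to -arc, which is where the 243-, 81- and 27-frame figures above come from. A tiny helper (illustrative, not part of the codebase) makes the relationship explicit:

from functools import reduce
from operator import mul

def receptive_field(filter_widths):
    # Each temporal block multiplies the receptive field by its filter width
    return reduce(mul, filter_widths, 1)

print(receptive_field([3, 3, 3, 3, 3]))  # 243 frames
print(receptive_field([3, 3, 3, 3]))     # 81 frames
print(receptive_field([3, 3, 3]))        # 27 frames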

For HumanEva:

python run.py -d humaneva15 -k detectron_pt_coco -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -b 128 -e 1000 -lrd 0.996 -a Walk,Jog,Box --by-subject

This will train for 1000 epochs, using Mask R-CNN detections and evaluating each subject separately. Since HumanEva is much smaller than Human3.6M, training should require about 50 minutes.

Semi-supervised training

To perform semi-supervised training, you just need to add the --subjects-unlabeled argument. In the example below, we use ground-truth 2D poses as input, and train supervised on just 10% of Subject 1 (specified by --subset 0.1). The remaining subjects are treated as unlabeled data and are used for semi-supervision.

python run.py -k gt --subjects-train S1 --subset 0.1 --subjects-unlabeled S5,S6,S7,S8 -e 200 -lrd 0.98 -arc 3,3,3 --warmup 5 -b 64

This should give you an error of around 65.2 mm. By contrast, if we only train with supervision

python run.py -k gt --subjects-train S1 --subset 0.1 -e 200 -lrd 0.98 -arc 3,3,3 -b 64

we get around 80.7 mm, which is significantly higher.

Visualization

If you have the original Human3.6M videos, you can generate nice visualizations of the model predictions. For instance:

python run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin --render --viz-subject S11 --viz-action Walking --viz-camera 0 --viz-video "/path/to/videos/S11/Videos/Walking.54138969.mp4" --viz-output output.gif --viz-size 3 --viz-downsample 2 --viz-limit 60

The script can also export MP4 videos, and supports a variety of parameters (e.g. downsampling/FPS, size, bitrate). See DOCUMENTATION.md for more details.

License

This work is licensed under CC BY-NC. See LICENSE for details. Third-party datasets are subject to their respective licenses. If you use our code/models in your research, please cite our paper:

@inproceedings{pavllo:videopose3d:2019,
  title={3D human pose estimation in video with temporal convolutions and semi-supervised training},
  author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}
Comments
  • 3D trajectory in the wild

    Again, thank you @dariopavllo and team for the outstanding work! And thank you all for the great discussions that have resolved many of the community's issues.

    Now it seems a lot of people have figured out how to run it in the wild with decent efficiency.

    The major practical bottleneck is that the predicted skeleton is pinned by the hip to the world center, so the skeleton never moves across the scene, which limits its practicality a lot.

    Now, in order to have the skeleton move across the scene, you need your own 3D trajectory model (as @dariopavllo has pointed out many times). And in order to train your own trajectory model for the hip, you need the original Human3.6M dataset. If you don't have it (and the original Human3.6M owners don't seem to be responding to requests any more), you don't have that option.

    So I was wondering if:

    • anyone solved 3D trajectories issue somehow

    • or have their own pretrained 3d trajectory model to share

    • or if the original contributors are planning to add a pretrained 3D trajectory model in the near future.

    That would benefit us all a lot! Keep up the great work!

    opened by slava-smirnov 13
  •  An error occurred while transforming the Human3.6M preprocessed dataset.

    prepare_data_h36m.py:66: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
      with h5py.File(f) as hf:
    prepare_data_h36m.py:67: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
      positions = hf['3D_positions'].value.reshape(32, 3, -1).transpose(2, 0, 1)

    Can you provide the following files? data_2d_h36m_gt.npz, data_2d_h36m_sh_ft_h36m.npz, data_2d_h36m_sh_pt_mpii.npz, data_3d_h36m.npz

    opened by xiao19971225 13
  • 2D input format dependency

    Hi, could you provide any specifics about the expected 2D keypoint input? Specifically, does the network expect a vector of keypoints with the joints in a specific order? Looking at the test data you supplied, the order stays consistent - i.e. left arm (11,12,13) etc... Perhaps the input keypoints also have to be resized?

    I tried feeding keypoints generated with CPN and AlphaPose on my own dataset, but I seem to be getting a random set of 3D points.
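    For reference, a minimal sketch of the expected input layout, assuming a (frames, 17, 2) array of pixel coordinates in the same joint order as the detections the checkpoint was trained on, normalized with normalize_screen_coordinates from common/camera.py (the file name and resolution below are hypothetical):

    import numpy as np
    from common.camera import normalize_screen_coordinates

    # Hypothetical custom detections: (num_frames, 17, 2) pixel coordinates,
    # joints ordered exactly like the 2D detections used for training
    keypoints = np.load('my_keypoints.npy')
    width, height = 1920, 1080  # resolution of the source video

    # Map x to [-1, 1] and scale y by the same factor, as run.py does before inference
    keypoints[..., :2] = normalize_screen_coordinates(keypoints[..., :2], w=width, h=height)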

    opened by edrdos101 7
  • VideoPose3D on Android

    How would one go about running this model for use in the wild on an Android device? I imagine it would be a similar process to the one described here: https://pytorch.org/mobile/android/ . I just wanted to know how feasible this would be, or if it has already been done. I'd imagine I would have to use a more lightweight 2D pose estimator than Detectron or Detectron2 to keep the runtime reasonable.

    Thanks.

    opened by aloyspx 6
  • The detail of producing bounding box

    Hi, thanks for your code. According to your paper, there are two ways to provide bounding boxes to CPN: one is to use Mask R-CNN to extract them from the original image, the other is to use the ground truth. I have 3 questions:

    1. Is the Mask R-CNN fine-tuned on the Human3.6M dataset, or taken directly from the Model Zoo, where it is only trained on COCO?
    2. Which backbone is used in Mask R-CNN for extracting the bounding boxes, and which for directly producing the 2D keypoints?
    3. Why do the Mask R-CNN bounding boxes even perform a little better than the ground truth? It seems a little confusing. Thanks a lot!
    opened by DeepRunner 6
  • How to normalize cameras for testing a wild video?

    Hi, I was testing a wild video using the keypoints extracted with Detectron. In run.py, the evaluation path contains a lot of camera normalization code (specific to the 4 H3.6M subjects). My doubt is: if we are testing a wild video like you did in your demo, how do we do that?

    What I understand is that I should load the keypoints from the saved .npz file and evaluate directly after loading the model, skipping everything else in run.py.

    opened by abhikhanna30 5
  • Question on semi-supervised training

    I have some questions on the semi-supervised training.

    (1) When training from COCO 2D to H36M 3D (using detectron_pt_coco as the 2D input), it seems that the joint orders of the two 17-joint skeletons are different. But in the code, after projecting the H36M 3D points back onto the image, it seems that you directly use reconstruction_semi = projection_func(predicted_semi + predicted_traj_cat[split_idx:], cam_semi); loss_reconstruction = mpjpe(reconstruction_semi, target_semi) to compute the 2D loss. It looks like target_semi has not been transformed so that the different orders match. Is this a bug, or am I missing something?

    (2) I used the trajectory model you provided in #145, and I found that its trajectory and 3D pose are good enough to give a close 2D reprojection. But I am not sure how you trained it. Did you train with --subjects-train on all 5 subjects, or with some of them in --subjects-train and some in --subjects-unlabeled? I also found that we can train the trajectory supervised only on --subjects-train by setting --warmup equal to the number of epochs. Did you train the trajectory only with supervision, or also use --subjects-unlabeled in the process? Which would be better if we want the trajectory and reprojection to be as good as possible? Finally, what level of traj_valid and 2d_valid loss did you end up with?

    (3) Does the batch size have a big impact on semi-supervised training? I see you set -b 64 in the instructions, but in the supervised setting the default is 1024.

    (4) For the 2D input in semi-supervised training, I found that you use target = inputs_2d_cat[:split_idx, pad:-pad, :, :2].contiguous() if pad > 0 to reduce the input to the single output frame. But is there a bug here, since causal_shift also affects which frame is matched? Should it be target = inputs_2d_cat[:split_idx, pad+causal_shift:, :, :2].contiguous() in the causal model?

    Thanks so much for your help.

    opened by lawy623 4
  • Why need to "qinverse" from the world coordinate to the camera coordinate

    Why need to "qinverse" from the world coordinate to the camera coordinate

    Hi, I really like your work and thank you for providing the source code.

    I have a question about the conversion from the world coordinate to the camera coordinate. in common/camera.py

    def world_to_camera(X, R, t):
        Rt = wrap(qinverse, R) # Invert rotation
        return wrap(qrot, np.tile(Rt, (*X.shape[:-1], 1)), X - t) # Rotate and translate
    

    Why does it need to invert the quaternions first and then convert from the world coordinates to the camera coordinates? Can you tell me what the meaning of inverting the quaternions is?

    Best regards. Fabro
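    One way to see it, with a self-contained numpy sketch rather than the repository's own qrot/qinverse helpers: the stored quaternion R rotates camera coordinates into world coordinates (its counterpart camera_to_world in the same file applies it directly), so the world-to-camera direction has to remove the translation and apply the inverse rotation, i.e. x_cam = R^-1 * (x_world - t):

    import numpy as np

    def q_conjugate(q):
        # The inverse of a unit quaternion (w, x, y, z) is its conjugate
        return q * np.array([1.0, -1.0, -1.0, -1.0])

    def q_rotate(q, v):
        # Rotate a 3-vector v by a unit quaternion q = (w, x, y, z)
        w, r = q[0], q[1:]
        return v + 2.0 * np.cross(r, np.cross(r, v) + w * v)

    def world_to_camera_point(x_world, R, t):
        # Undo the translation, then undo the camera-to-world rotation
        return q_rotate(q_conjugate(R), x_world - t)

    def camera_to_world_point(x_cam, R, t):
        # Apply the camera-to-world rotation, then translate into the world frame
        return q_rotate(R, x_cam) + t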

    opened by fabro66 4
  • hip fix

    Hi, I got 3D pose results on a wild video and found that the center hip point is fixed: when a person sits down, the hip point stays the same, like this:

    [image: test]

    I think maybe it comes from the training data preparation. Could you please tell me what the function of inputs_3d[:, :, 0] = 0 is?
    https://github.com/facebookresearch/VideoPose3D/blob/master/run.py#L312 Why does the center hip coordinate need to be set to (0,0,0)? Is the aim to fix the hip point?

    Appreciate your reply. Thank you in advance.
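    For reference, a sketch of the mechanics under the reading given in the paper: zeroing joint 0 makes the training target root-relative, so the pose network only learns offsets from the hip; a separately trained trajectory network has to supply the global hip position, which is added back to let the skeleton move through the scene. Variable names below are illustrative, not the repository's:

    import numpy as np

    frames, joints = 100, 17
    predicted_pose_rel = np.zeros((frames, joints, 3))  # hip-relative output of the pose model
    predicted_trajectory = np.zeros((frames, 1, 3))     # global hip position from a trajectory model

    # run.py zeroes joint 0 during training (inputs_3d[:, :, 0] = 0), so predictions
    # are relative to the hip; broadcasting the trajectory back on places the skeleton:
    absolute_pose = predicted_pose_rel + predicted_trajectory
    print(absolute_pose.shape)  # (100, 17, 3)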

    opened by lxy5513 4
  • Causality and video padding

    Hi and thanks a lot for releasing your code!

    I have a question regarding the padding of videos in the case of causal convolutions. Sorry if that's answered elsewhere or in the paper, it's not very clear to me :)

    Looking at figure 9a of the paper (symmetric case), it's clear that for the first and last frames of the video (depending on the receptive field), the network "sees" duplicate frames due to padding. In the example in 9a, the most extreme cases for symmetric convolutions are the first 2 and last 2 frames, where you get only duplicated frames to the left and to the right respectively.

    While in the extreme cases for symmetric convs the samples still include some non-duplicate frames (future or past depending on position in the video), for causal convolutions we have more extreme cases, where for the first 2 frames the network sees samples that are pretty much only padding, i.e. the same frame duplicated receptive_field -1 times.

    So my questions for the causal case are:

    • Is the assumption that the network is robust enough to differentiate between these cases (all duplicated past frames vs. actual past frames)?
    • Did you run any experiments for causal convolutions where you discard these extreme cases and never present them as samples to the network? If yes, was there any change in performance? I guess it's hard to evaluate whole videos in that case, since you can only get predictions for frames where frame_id >= receptive_field if you discard all cases with padding.

    Hope my questions make sense :D

    Thanks!
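    For intuition, this is roughly what the evaluation-time padding looks like, sketched with numpy's edge padding rather than the repository's code: with a receptive field of R, the causal sample that predicts the very first frame contains that frame repeated R times.

    import numpy as np

    receptive_field = 27
    pad = receptive_field - 1

    frames = np.arange(10)                   # stand-in for per-frame 2D poses
    causal_input = np.pad(frames, (pad, 0), mode='edge')

    # The window that predicts frame 0 contains frame 0 repeated 27 times:
    print(causal_input[:receptive_field])    # [0 0 0 ... 0]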

    opened by achigeor 4
  • Failed to reproduce result reported on the paper

    I ran the following command to test your model with ground-truth 2D keypoint inputs: python run.py -k gt -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin However, the program prints Protocol #1 (MPJPE) action-wise average: 42.6 mm, which is worse than the reported result (37.2 mm). Is the released model only intended to reproduce the result with 2D keypoints from CPN? In other words, were the results in Table 3 of the paper obtained by training separately on each source of 2D keypoints, rather than with a unified model for all of them?

    opened by JiangWenPL 4
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing the input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 1
  • Any way to correct either Detectron2 or VideoPose3D data for anatomical consistency?

    I use VideoPose3D in a scientific context and am therefore looking for precision above all. I evaluate videos after recording, so performance is not an issue. Some elements seem to contribute to inconsistent data. E.g. a pair of loose, striped pants seems to confuse Detectron2 and therefore VP3D. Unfortunately, we cannot tell our participants what to wear. So one way to solve the issue would be to correct the data for anatomic constraints (e.g. fixed distances between two joints where applicable) or correction for impossible jumps. Are there any correction algorithms, either after Detectron2 and before VideoPose3D - or even after VideoPose3D analysis?
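    One lightweight post-processing idea, sketched below as a heuristic and not as part of VideoPose3D or Detectron2: re-impose constant bone lengths after the fact by rescaling each bone to its median length over the clip. The parent table assumes a typical 17-joint Human3.6M-style skeleton and would need to be adapted to your joint ordering.

    import numpy as np

    # parent index per joint (-1 marks the root); assumed 17-joint H3.6M-style ordering
    PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]

    def enforce_bone_lengths(poses):
        # poses: (frames, 17, 3); rebuild each bone with its median length over the clip,
        # keeping the original bone directions and chaining from the corrected parent
        out = poses.copy()
        for j, p in enumerate(PARENTS):
            if p < 0:
                continue
            bones = poses[:, j] - poses[:, p]
            lengths = np.linalg.norm(bones, axis=-1, keepdims=True)
            scale = np.median(lengths) / np.maximum(lengths, 1e-9)
            out[:, j] = out[:, p] + bones * scale
        return out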

    opened by c-hoffmann 0
  • Visualization: RAM overflow for longer videos

    I tried the visualization for different videos and noticed that the shorter ones worked fine, while the longer ones ran into problems. I was able to observe from the ffmpeg output that a "rawvideo" stream was created, which grew very quickly. My system has 32 GB of RAM, which allows me to handle about 5 minutes of video (H.264, 960x1080 @ 29.97 fps), but it fills up my RAM almost completely. Anything beyond that leads to a RAM overflow, which can easily be observed with system monitor tools. On the console, the error messages are "Broken pipe" and "Conversion failed!". The code that seems responsible is connected to the read_video function.

    Unfortunately, my Python is a bit rusty and I was not able to hack a fix together quickly. I assume that the issue could be resolved by handling each frame of the video file separately instead of loading them all into RAM as RGB24/rawvideo. Alternatively, loading the whole video into RAM as compressed images and converting single frames only for the output image might be feasible.

    This might be connected to some of the issues in #108 , #152 and #204. If necessary, I can provide console output, however this is easily reproducible since ffmpeg shows the rising output size and system monitor tools can be used to monitor RAM usage as well.

    Fixing this would be highly appreciated, as we are using VP3D for running scientific research. Thank you for your work!
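    For anyone hitting this, a minimal sketch of the frame-by-frame approach suggested above; it is independent of the repository's read_video, and the width/height would have to be probed first (e.g. with ffprobe):

    import subprocess
    import numpy as np

    def iter_video_frames(path, width, height):
        # Yield RGB24 frames one at a time instead of buffering the whole rawvideo stream
        frame_bytes = width * height * 3
        cmd = ['ffmpeg', '-i', path, '-f', 'image2pipe',
               '-pix_fmt', 'rgb24', '-vcodec', 'rawvideo', '-']
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL, bufsize=frame_bytes)
        try:
            while True:
                buf = proc.stdout.read(frame_bytes)
                if len(buf) < frame_bytes:
                    break
                yield np.frombuffer(buf, dtype=np.uint8).reshape(height, width, 3)
        finally:
            proc.stdout.close()
            proc.wait()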

    opened by c-hoffmann 0
  • visualization.py has error?why?

    python run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin --render --viz-subject S11 --viz-action Walking --viz-camera 0 --viz-video "1.mp4" --viz-output output.mp4 --viz-size 3 --viz-downsample 2 --viz-limit 60

    frame= 689 fps=226 q=-0.0 Lsize= 4185675kB time=00:00:22.90 bitrate=1496984.0kbits/s speed=7.52x
    video:4185675kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
    VideoPose3D/common/visualization.py:198: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
      fig.tight_layout()

    I changed nothing. Why? Thanks.

    opened by henbucuoshanghai 0
  • 2d ground truth and data_2d_h36m_cpn_ft_h36m_dbb.npz

    Hello! :D

    Thank you for the code! I could not quite understand how the code extracts ground-truth 2D points from the 3D data. The tutorial suggested using the cpn_ft_h36m_dbb specification, but this requires data_2d_h36m_cpn_ft_h36m_dbb.npz. I followed the code but could not find any way to create or obtain this file.

    opened by RHnejad 1
Owner
Meta Research