Efficient 3D human pose estimation in video using 2D keypoint trajectories

Meta Research

Last update: Dec 29, 2022

Overview

3D human pose estimation in video with temporal convolutions and semi-supervised training

This is the implementation of the approach described in the paper:

Dario Pavllo, Christoph Feichtenhofer, David Grangier, and Michael Auli. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

More demos are available at https://dariopavllo.github.io/VideoPose3D

Results on Human3.6M

Under Protocol 1 (mean per-joint position error) and Protocol 2 (mean per-joint position error after rigid alignment).

2D Detections	BBoxes	Blocks	Receptive Field	Error (P1)	Error (P2)
CPN	Mask R-CNN	4	243 frames	46.8 mm	36.5 mm
CPN	Ground truth	4	243 frames	47.1 mm	36.8 mm
CPN	Ground truth	3	81 frames	47.7 mm	37.2 mm
CPN	Ground truth	2	27 frames	48.8 mm	38.0 mm
Mask R-CNN	Mask R-CNN	4	243 frames	51.6 mm	40.3 mm
Ground truth	--	4	243 frames	37.2 mm	27.2 mm
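Both protocols can be sketched in a few lines of NumPy. This is a minimal sketch, not the repository's evaluation code. It assumes predictions and ground truth are float arrays of shape (frames, joints, 3) in millimetres. It also assumes Protocol 2 aligns each frame with a Procrustes fit (rotation, uniform scale and translation), which is the usual P-MPJPE convention. For a strictly rigid fit, drop the scale factor.

```python
import numpy as np

def mpjpe(pred, target):
    """Protocol 1: mean Euclidean distance between predicted and true joints."""
    return np.mean(np.linalg.norm(pred - target, axis=-1))

def p_mpjpe(pred, target):
    """Protocol 2 (sketch): MPJPE after aligning each predicted frame to the
    ground truth with rotation, uniform scale and translation.
    pred, target: float arrays of shape (N, J, 3)."""
    mu_x = target.mean(axis=1, keepdims=True)
    mu_y = pred.mean(axis=1, keepdims=True)
    x0 = target - mu_x
    y0 = pred - mu_y
    norm_x = np.sqrt(np.sum(x0 ** 2, axis=(1, 2), keepdims=True))
    norm_y = np.sqrt(np.sum(y0 ** 2, axis=(1, 2), keepdims=True))
    x0 /= norm_x
    y0 /= norm_y
    # Orthogonal Procrustes: SVD of the 3x3 cross-covariance per frame
    U, s, Vt = np.linalg.svd(np.matmul(x0.transpose(0, 2, 1), y0))
    V = Vt.transpose(0, 2, 1)
    R = np.matmul(V, U.transpose(0, 2, 1))
    # Flip the last axis where needed so R is a proper rotation, not a reflection
    sign_det = np.sign(np.expand_dims(np.linalg.det(R), axis=1))
    V[:, :, -1] *= sign_det
    s[:, -1] *= sign_det.flatten()
    R = np.matmul(V, U.transpose(0, 2, 1))
    # Optimal scale in original units, then translation
    a = np.expand_dims(np.sum(s, axis=1, keepdims=True), axis=2) * norm_x / norm_y
    t = mu_x - a * np.matmul(mu_y, R)
    aligned = a * np.matmul(pred, R) + t
    return mpjpe(aligned, target)
```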

Quick start

To get started as quickly as possible, follow the instructions in this section. This should allow you to train a model from scratch, test our pretrained models, and produce basic visualizations. For more detailed instructions, please refer to DOCUMENTATION.md.

Dependencies

Make sure you have the following dependencies installed before proceeding:

Python 3+ distribution
PyTorch >= 0.4.0

Optional:

Matplotlib, if you want to visualize predictions. Additionally, you need ffmpeg to export MP4 videos, and imagemagick to export GIFs.
MATLAB, if you want to experiment with HumanEva-I (you need this to convert the dataset).

Dataset setup

You can find the instructions for setting up the Human3.6M and HumanEva-I datasets in DATASETS.md. For this short guide, we focus on Human3.6M. You are not required to set up HumanEva unless you want to experiment with it.

In order to proceed, you must also copy CPN detections (for Human3.6M) and/or Mask R-CNN detections (for HumanEva).

Evaluating our pretrained models

The pretrained models can be downloaded from AWS. Put pretrained_h36m_cpn.bin (for Human3.6M) and/or pretrained_humaneva15_detectron.bin (for HumanEva) in the checkpoint/ directory (create it if it does not exist).

mkdir checkpoint
cd checkpoint
wget https://dl.fbaipublicfiles.com/video-pose-3d/pretrained_h36m_cpn.bin
wget https://dl.fbaipublicfiles.com/video-pose-3d/pretrained_humaneva15_detectron.bin
cd ..

These models allow you to reproduce our top-performing baselines, which are:

46.8 mm for Human3.6M, using fine-tuned CPN detections, bounding boxes from Mask R-CNN, and an architecture with a receptive field of 243 frames.
33.0 mm for HumanEva-I (on 3 actions), using pretrained Mask R-CNN detections, and an architecture with a receptive field of 27 frames. This is the multi-action model trained on 3 actions (Walk, Jog, Box).

To test on Human3.6M, run:

python run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin

To test on HumanEva, run:

python run.py -d humaneva15 -k detectron_pt_coco -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -a Walk,Jog,Box --by-subject -c checkpoint --evaluate pretrained_humaneva15_detectron.bin

DOCUMENTATION.md provides a precise description of all command-line arguments.

Inference in the wild

We have introduced an experimental feature to run our model on custom videos. See INFERENCE.md for more details.

Training from scratch

If you want to reproduce the results of our pretrained models, run the following commands.

For Human3.6M:

python run.py -e 80 -k cpn_ft_h36m_dbb -arc 3,3,3,3,3

By default the application runs in training mode. This will train a new model for 80 epochs, using fine-tuned CPN detections. Expect a training time of 24 hours on a high-end Pascal GPU. If you feel that this is too much, or your GPU is not powerful enough, you can train a model with a smaller receptive field, e.g.

-arc 3,3,3,3 (81 frames) should require 11 hours and achieve 47.7 mm.
-arc 3,3,3 (27 frames) should require 6 hours and achieve 48.8 mm.
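The receptive fields quoted above follow from the -arc filter widths. The sketch below rests on two assumptions. First, as in the paper's dilated architecture, each layer's dilation is the product of the widths before it. Second, the Blocks column in the results table counts the layers after the first. The function name temporal_layout is hypothetical.

```python
from math import prod

def temporal_layout(arc):
    """Return per-layer dilations and the receptive field for an -arc string.

    Assumption: layer i uses dilation = product of the filter widths before it
    (1, 3, 9, ...), so consecutive layers tile the input without gaps."""
    widths = [int(w) for w in arc.split(",")]
    dilations, d = [], 1
    for w in widths:
        dilations.append(d)
        d *= w
    # A layer of width w and dilation d widens the field by (w - 1) * d frames
    receptive_field = 1 + sum((w - 1) * dil for w, dil in zip(widths, dilations))
    assert receptive_field == prod(widths)  # the sum telescopes to the product
    return dilations, receptive_field

for arc in ["3,3,3", "3,3,3,3", "3,3,3,3,3"]:
    dilations, rf = temporal_layout(arc)
    blocks = len(dilations) - 1  # first layer assumed to be the input layer
    context = (rf - 1) // 2      # frames of context per side (symmetric case)
    print(arc, dilations, rf, blocks, context)
# 3,3,3 [1, 3, 9] 27 2 13
# 3,3,3,3 [1, 3, 9, 27] 81 3 40
# 3,3,3,3,3 [1, 3, 9, 27, 81] 243 4 121
```

Because every extra width-3 layer triples the receptive field, each step down in -arc cuts the temporal context by a factor of three. This fits the shorter training times listed above.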

You could also lower the number of epochs from 80 to 60 with a negligible impact on the result.

For HumanEva:

python run.py -d humaneva15 -k detectron_pt_coco -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -b 128 -e 1000 -lrd 0.996 -a Walk,Jog,Box --by-subject

This will train for 1000 epochs, using Mask R-CNN detections and evaluating each subject separately. Since HumanEva is much smaller than Human3.6M, training should require about 50 minutes.

Semi-supervised training

To perform semi-supervised training, you just need to add the --subjects-unlabeled argument. In the example below, we use ground-truth 2D poses as input, and train supervised on just 10% of Subject 1 (specified by --subset 0.1). The remaining subjects are treated as unlabeled data and are used for semi-supervision.

python run.py -k gt --subjects-train S1 --subset 0.1 --subjects-unlabeled S5,S6,S7,S8 -e 200 -lrd 0.98 -arc 3,3,3 --warmup 5 -b 64

This should give you an error of around 65.2 mm. By contrast, if we train with supervision only,

python run.py -k gt --subjects-train S1 --subset 0.1 -e 200 -lrd 0.98 -arc 3,3,3 -b 64

we get around 80.7 mm, which is significantly higher.
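Semi-supervision reprojects the predicted 3D poses of unlabeled videos onto the image and compares them with the input 2D keypoints. This is why a trajectory (the root position in camera space) is predicted alongside the root-relative pose. The sketch below is minimal: it assumes an ideal pinhole camera, ignores lens distortion, and works in pixel coordinates, while the repository may use normalized coordinates and a fuller camera model. Both function names are hypothetical.

```python
import numpy as np

def project_pinhole(X, f, c):
    """Project camera-space 3D points (..., J, 3) to pixels (..., J, 2).
    f: focal lengths (fx, fy); c: principal point (cx, cy). No distortion."""
    xy = X[..., :2] / X[..., 2:3]  # perspective division by depth
    return xy * np.asarray(f) + np.asarray(c)

def reconstruction_loss(pose_rel, trajectory, keypoints_2d, f, c):
    """Mean 2D distance between reprojected predictions and input keypoints.
    pose_rel: root-relative 3D poses (N, J, 3)
    trajectory: predicted root position in camera space (N, 1, 3)
    keypoints_2d: observed 2D keypoints (N, J, 2), same joint order as pose_rel"""
    reprojected = project_pinhole(pose_rel + trajectory, f, c)
    return np.mean(np.linalg.norm(reprojected - keypoints_2d, axis=-1))
```

Without the trajectory term, the root-relative pose would be projected around the camera centre and could not match the 2D keypoints. The 2D keypoints must also share the joint order of the 3D skeleton.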

Visualization

If you have the original Human3.6M videos, you can generate nice visualizations of the model predictions. For instance:

python run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin --render --viz-subject S11 --viz-action Walking --viz-camera 0 --viz-video "/path/to/videos/S11/Videos/Walking.54138969.mp4" --viz-output output.gif --viz-size 3 --viz-downsample 2 --viz-limit 60

The script can also export MP4 videos, and supports a variety of parameters (e.g. downsampling/FPS, size, bitrate). See DOCUMENTATION.md for more details.

License

This work is licensed under CC BY-NC. See LICENSE for details. Third-party datasets are subject to their respective licenses. If you use our code/models in your research, please cite our paper:

@inproceedings{pavllo:videopose3d:2019,
  title={3D human pose estimation in video with temporal convolutions and semi-supervised training},
  author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Comments

3D trajectory in the wild
Again, thank you @dariopavllo and team for the outstanding work! Thank you all for the great discussions that solved many of the community's issues.

It now seems that many people have figured out how to run it on in-the-wild videos with decent efficiency.

The major practical bottleneck is that the predicted skeleton is fixed by the hip at the world center, so the skeleton doesn't move across the scene, which limits its practicality a lot.

To have your skeleton move across the scene, you need your own 3D trajectory model (as @dariopavllo has pointed out many times). To train your own trajectory model for the hip, you need the original H3.6M dataset. If you don't have it (the original H3.6M owners no longer seem to respond to requests), you don't have that option.

So I was wondering if:

anyone solved 3D trajectories issue somehow

or have their own pretrained 3d trajectory model to share

or whether the original contributors are planning to add a pretrained 3D trajectory model in the near future.

That would benefit us all a lot! Keep up the great work.
opened by slava-smirnov 13
An error occurred while transforming the Human3.6M preprocessed dataset.

prepare_data_h36m.py:66: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  with h5py.File(f) as hf:
prepare_data_h36m.py:67: H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.
  positions = hf['3D_positions'].value.reshape(32, 3, -1).transpose(2, 0, 1)

Can you provide the following files?

data_2d_h36m_gt.npz
data_2d_h36m_sh_ft_h36m.npz
data_2d_h36m_sh_pt_mpii.npz
data_3d_h36m.npz

opened by xiao19971225 13
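The two messages in the log above are deprecation warnings, not errors, and each names its own fix. The sketch below applies both fixes to the quoted lines: it opens the file explicitly read-only and replaces .value with dataset[()]. The function name load_positions is hypothetical. The dataset name and reshape are copied from the log.

```python
import h5py

def load_positions(path):
    """Read '3D_positions' the way the quoted lines do, without the deprecated calls."""
    # Explicit read-only mode silences the default-file-mode warning
    with h5py.File(path, "r") as hf:
        # dataset[()] reads the whole dataset into a NumPy array (replaces .value)
        data = hf["3D_positions"][()]
    # Same reshaping as the quoted line: (32 joints, 3 coords, frames) -> (frames, 32, 3)
    return data.reshape(32, 3, -1).transpose(2, 0, 1)
```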
2D input format dependency

Hi, could you provide any specifics about the expected 2D keypoint input? Specifically, does the network expect a vector of keypoints with joints in a specific order? Looking at the test data you supplied, the order stays consistent, e.g. left arm (11, 12, 13), etc. Perhaps the input keypoints also have to be resized?

I tried feeding keypoints generated with CPN and AlphaPose on my own dataset, but I seem to be getting a random set of 3D points.

opened by edrdos101 7
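As far as we can tell, the repository normalizes 2D inputs rather than feeding raw pixels. common/camera.py contains a normalize_screen_coordinates helper that maps the image width to [-1, 1] while preserving the aspect ratio. The sketch below re-implements that mapping under this assumption, so check it against your checkout. It does not address joint order, which must also match the layout the model was trained on.

```python
import numpy as np

def normalize_screen_coordinates(X, w, h):
    """Map pixel coordinates (..., 2) so that x in [0, w] becomes [-1, 1]
    and y is scaled by the same factor (aspect ratio preserved)."""
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1.0, h / w])
```

For a 1000x500 image, the centre pixel (500, 250) maps to (0, 0) and the bottom-right corner maps to (1, 0.5).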
VideoPose3D on Android

How would one go about running this model in the wild on an Android device? I imagine it would be a process similar to the one described here: https://pytorch.org/mobile/android/. I just wanted to know how feasible this would be, or whether it has already been done. I imagine I would have to use a more lightweight 2D pose estimator than Detectron or Detectron2 to keep the runtime reasonable.

Thanks.

opened by aloyspx 6
The detail of producing bounding box
Hi, thanks for your code. According to your paper, there are two ways to provide bounding boxes to CPN: one is to extract them from the original image with Mask R-CNN, the other is to use the ground truth. I have 3 questions:

Is the Mask R-CNN fine-tuned on the Human3.6M dataset, or taken directly from the Model Zoo, where it is trained only on COCO?

Which backbone is used in Mask R-CNN for extracting bounding boxes, and which one for directly producing 2D keypoints?

Why do the bounding boxes from Mask R-CNN perform slightly better than the ground truth? It seems a little confusing. Thanks a lot!
opened by DeepRunner 6
How to normalize cameras for testing a wild video?

Hi, I was testing a wild video using keypoints extracted with Detectron. In run.py, the evaluation path contains a lot of camera normalization code (specific to the h36m subjects). My question is: if we test on a wild video, as you did in your demo, how do we do that?

What I understand is that I should load the keypoints from the saved .npz file and evaluate directly after loading the model, skipping everything else in run.py.

opened by abhikhanna30 5
Question on semi-supervised training

I have some questions on the semi-supervised training.

(1) When I train from COCO 2D to H36M 3D (using detectron_pt_coco as 2D input), it seems that the order of the 17 joints differs between the two skeletons. But in the code, after projecting the H36M 3D points back onto the image, it seems that you directly use reconstruction_semi = projection_func(predicted_semi + predicted_traj_cat[split_idx:], cam_semi); loss_reconstruction = mpjpe(reconstruction_semi, target_semi) to calculate the 2D loss. It seems that target_semi has not been transformed so that the different orders match. Is this a bug, or am I missing something?

(2) I used the trajectory model you provided in #145, and found that its trajectory and 3D pose are good enough to give a close 2D reprojection. But I am not sure how you trained it. Did you train with --subjects-train on all 5 subjects, or with some subjects in --subjects-train and others in --subjects-unlabeled? I also found that we can train the trajectory supervised only on --subjects-train by setting --warmup equal to the number of epochs. Did you train the trajectory with supervision only, or also use --subjects-unlabeled in the process? Which would be better if we want a better trajectory and reprojection? Finally, what traj_valid and 2d_valid losses did you reach at the end?

(3) Does the batch size have a big impact on semi-supervised training? I see you set -b 64 in the instructions, but for supervised training the default is 1024.

(4) For the 2D input in semi-supervised training, I found that you use target = inputs_2d_cat[:split_idx, pad:-pad, :, :2].contiguous() if pad>0 to crop the input to the single-frame output. But isn't this a bug, since causal_shift also affects which frame to match? Should it be target = inputs_2d_cat[:split_idx, pad+causal_shift:, :, :2].contiguous() for the causal model?

Thanks so much for your help.

opened by lawy623 4
Why is "qinverse" needed to convert from world coordinates to camera coordinates?
Hi, I really like your work and thank you for providing the source code.

I have a question about the conversion from world coordinates to camera coordinates in common/camera.py:

def world_to_camera(X, R, t):
    Rt = wrap(qinverse, R)  # Invert rotation
    return wrap(qrot, np.tile(Rt, (*X.shape[:-1], 1)), X - t)  # Rotate and translate

Why is it necessary to invert the quaternion first and then convert the world coordinates to camera coordinates? Can you tell me what inverting the quaternion means here?

Best regards. Fabro
opened by fabro66 4
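The snippet quoted in this thread becomes clearer if the camera rotation R is read as the camera's orientation in world space, so that camera_to_world(X) = qrot(R, X) + t. Converting world points back to the camera frame then needs the inverse rotation. The NumPy sketch below re-implements qrot, qinverse and both conversions under that assumption, with unit quaternions stored as (w, x, y, z). It mirrors the names in the snippet but is not the repository's code.

```python
import numpy as np

def qinverse(q):
    """Inverse of a unit quaternion (w, x, y, z): its conjugate."""
    return np.concatenate([q[..., :1], -q[..., 1:]], axis=-1)

def qrot(q, v):
    """Rotate vectors v (..., 3) by a single unit quaternion q of shape (4,)."""
    w, u = q[0], q[1:]
    uv = np.cross(u, v)
    uuv = np.cross(u, uv)
    return v + 2.0 * (w * uv + uuv)

def camera_to_world(X, R, t):
    # R: camera orientation in the world frame, t: camera position in the world
    return qrot(R, X) + t

def world_to_camera(X, R, t):
    # Undo the translation first, then apply the inverse rotation
    return qrot(qinverse(R), X - t)
```

Inverting a unit quaternion only negates its vector part, so the inverse is cheap. The round trip world_to_camera(camera_to_world(X, R, t), R, t) returns X.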
hip fix

Hi, I ran the model on a wild video and found that the center hip point stays fixed: when a person sits down, the hip point does not move.

I think it may be due to how the training data is prepared. Could you please explain what inputs_3d[:, :, 0] = 0 does?
https://github.com/facebookresearch/VideoPose3D/blob/master/run.py#L312 Why is the center hip coordinate set to (0, 0, 0)? Is this meant to fix the hip point?

I appreciate your reply. Thank you in advance.

opened by lxy5513 4
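The line asked about in this thread makes the 3D targets root-relative: the hip joint is the origin of every pose, so the pose model never sees global motion. Below is a minimal sketch of that split, assuming joint 0 is the hip; the helper names are hypothetical. Recovering global motion means adding back a trajectory, which is the job of the trajectory model discussed earlier in these comments.

```python
import numpy as np

def split_root_relative(positions, root=0):
    """Split absolute poses (N, J, 3) into a root trajectory (N, 1, 3)
    and root-relative poses whose root joint is exactly zero."""
    trajectory = positions[:, root:root + 1].copy()
    relative = positions - trajectory
    return relative, trajectory

def recombine(relative, trajectory):
    """Add a (predicted or ground-truth) trajectory back to root-relative poses."""
    return relative + trajectory
```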
Causality and video padding
Hi and thanks a lot for releasing your code!

I have a question regarding the padding of videos in the case of causal convolutions. Sorry if that's answered elsewhere or in the paper, it's not very clear to me :)

Looking at figure 9a of the paper (symmetric case), it's clear that for the first and last frames of the video (depending on the receptive field), the network "sees" duplicate frames due to padding. In the example in 9a, the most extreme cases for symmetric convolutions are the first 2 and last 2 frames, where the duplicate frames appear only on the left and only on the right, respectively.

While in the extreme cases for symmetric convs the samples still include some non-duplicate frames (future or past depending on position in the video), for causal convolutions we have more extreme cases, where for the first 2 frames the network sees samples that are pretty much only padding, i.e. the same frame duplicated receptive_field -1 times.

So my questions for the causal case are:

Is the assumption that the network is robust enough to differentiate between these cases (all duplicated past frames vs. actual past frames)?

Did you run any experiments for causal convolutions where you discard these extreme cases and never present them as samples to the network? If yes, was there any change in performance? I guess it's hard to evaluate whole videos in that case, since you can only get predictions for frames where frame_id >= receptive_field if you discard all cases with padding.

Hope my questions make sense :D

Thanks!
opened by achigeor 4
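The padding cases described in this thread can be enumerated directly. The sketch assumes edge (replicate) padding, which is how we understand the repository's batch generators to work. It puts (receptive_field - 1) // 2 frames on each side in the symmetric case and all the padding on the left in the causal case. padded_windows is a hypothetical helper.

```python
import numpy as np

def padded_windows(num_frames, receptive_field, causal=False):
    """For each output frame, return the indices of the input frames the
    network sees after edge padding (repeated indices are duplicate frames)."""
    pad = (receptive_field - 1) // 2
    left, right = (2 * pad, 0) if causal else (pad, pad)
    idx = np.pad(np.arange(num_frames), (left, right), mode="edge")
    return np.stack([idx[t:t + receptive_field] for t in range(num_frames)])
```

With 5 frames and a receptive field of 3, the symmetric windows start with [0, 0, 1]. The causal windows start with [0, 0, 0], the same frame repeated for the whole receptive field, as described above. In the causal case, the first 2 * pad output frames always contain padding.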
Failed to reproduce result reported on the paper

I ran the following command to test your model with ground-truth 2D keypoint inputs:

python run.py -k gt -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin

However, the program prints Protocol #1 (MPJPE) action-wise average: 42.6 mm, which is worse than the reported result (37.2 mm). Is the released model only intended to reproduce the result with 2D keypoints from CPN? In other words, are the results in Table 3 of the paper obtained by training separately on each source of 2D keypoints, rather than with a unified model?

opened by JiangWenPL 4
CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via a pull request. The patch checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

opened by TrellixVulnTeam 1
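Here is a minimal sketch of the kind of check the patch describes. It rejects any member whose resolved path would land outside the destination directory, before calling extractall(). This version does not inspect symbolic or hard links. On Python 3.12 and later, and in some earlier security releases, tarfile's built-in extraction filters (e.g. extractall(path, filter="data")) are the more complete option.

```python
import os
import tarfile

def is_within_directory(directory, target):
    """True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory

def safe_extractall(tar: tarfile.TarFile, path="."):
    """Refuse to extract if any member would escape the destination directory."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Attempted path traversal in tar file: {member.name}")
    tar.extractall(path)
```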
Any way to correct either Detectron2 or VideoPose3D data for anatomical consistency?

I use VideoPose3D in a scientific context and am therefore looking for precision above all. I evaluate videos after recording, so performance is not an issue. Some elements seem to contribute to inconsistent data. For example, a pair of loose, striped pants seems to confuse Detectron2 and therefore VP3D. Unfortunately, we cannot tell our participants what to wear. One way to solve the issue would be to correct the data for anatomical constraints (e.g. fixed distances between two joints where applicable) or to correct impossible jumps. Are there any correction algorithms, either after Detectron2 and before VideoPose3D, or after the VideoPose3D analysis?

opened by c-hoffmann 0
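One simple post-processing option, not part of VideoPose3D, follows from the fact that a real person's bone lengths are constant. Each bone in the 3D output can be rescaled to its median length over the clip while keeping its direction. Below is a minimal sketch, assuming a hypothetical parents list (parent index per joint, -1 for the root) ordered so that every parent comes before its children. It does not address impossible jumps.

```python
import numpy as np

def enforce_bone_lengths(poses, parents):
    """Rescale every bone to its median length over the clip.
    poses: (N, J, 3); parents[j] < j for every non-root joint (assumption)."""
    poses = np.asarray(poses, dtype=float)
    out = poses.copy()
    for j, p in enumerate(parents):
        if p < 0:
            continue  # the root joint keeps its original position
        bone = poses[:, j] - poses[:, p]  # original bone vectors, (N, 3)
        length = np.linalg.norm(bone, axis=-1, keepdims=True)
        target = np.median(length)  # one length per bone for the whole clip
        direction = bone / np.maximum(length, 1e-8)
        # Re-attach the child to its already-corrected parent
        out[:, j] = out[:, p] + direction * target
    return out
```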
Visualization: RAM overflow for longer videos

I tried the visualization on different videos and noticed that the shorter ones worked fine, while the longer ones ran into problems. I was able to observe from the ffmpeg output that a "rawvideo" stream was created which grew very quickly. My system has 32GB RAM, which allows me to handle about 5 minutes of video (H.264, 960x1080 @ 29.97 fps), but it fills up my RAM almost completely. Anything beyond that leads to a RAM overflow, which can easily be observed with system monitor tools. On the console, the error message is "Broken Pipe" and "Conversion failed!". The code that seems responsible is connected to the read_video function.

Unfortunately, my python is a bit rusty and I was not able to hack a fix together quickly. I assume that the issue could be resolved by handling each frame of the videofile separately instead of loading them all into RAM as RGB24/rawvideo. Alternatively, loading the whole video into RAM as compressed images and converting single frames only for the output image might be feasible.

This might be connected to some of the issues in #108 , #152 and #204. If necessary, I can provide console output, however this is easily reproducible since ffmpeg shows the rising output size and system monitor tools can be used to monitor RAM usage as well.

Fixing this would be highly appreciated, as we are using VP3D for running scientific research. Thank you for your work!

opened by c-hoffmann 0
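Here is a sketch of the per-frame approach suggested above. It reads ffmpeg's rawvideo output one frame at a time from a pipe instead of collecting the whole stream in memory. The frame width and height are assumed to be known in advance (e.g. from ffprobe), and the function names are hypothetical. This is not a drop-in replacement for the repository's read_video.

```python
import subprocess
import numpy as np

def iter_frames(stream, width, height):
    """Yield (height, width, 3) uint8 RGB frames from a raw rgb24 byte stream,
    holding only one frame in memory at a time."""
    frame_bytes = width * height * 3
    while True:
        data = stream.read(frame_bytes)
        if len(data) < frame_bytes:  # end of stream (or a truncated last frame)
            return
        yield np.frombuffer(data, dtype=np.uint8).reshape(height, width, 3)

def iter_video_frames(path, width, height):
    """Decode a video with ffmpeg and stream its frames one by one."""
    cmd = ["ffmpeg", "-i", path, "-f", "rawvideo", "-pix_fmt", "rgb24", "-"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL) as proc:
        yield from iter_frames(proc.stdout, width, height)
```

Frames from np.frombuffer are read-only views of the bytes just read. Call .copy() on a frame if it has to be modified in place.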
visualization.py error: why?

python run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin --render --viz-subject S11 --viz-action Walking --viz-camera 0 --viz-video "1.mp4" --viz-output output.mp4 --viz-size 3 --viz-downsample 2 --viz-limit 60

frame= 689 fps=226 q=-0.0 Lsize= 4185675kB time=00:00:22.90 bitrate=1496984.0kbits/s speed=7.52x
video:4185675kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000% VideoPose3D/common/visualization.py:198: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
fig.tight_layout()

I didn't change anything. Why? Thanks.

opened by henbucuoshanghai 0
2d ground truth and data_2d_h36m_cpn_ft_h36m_dbb.npz

Hello! :D

Thank you for the code! I could not quite understand how the code extracts ground truth 2D points from 3D. The tutorial suggested using cpn_ft_h36m_dbb specification but this requires data_2d_h36m_cpn_ft_h36m_dbb.npz. I followed the code but I could not find any way to create or obtain this file.

opened by RHnejad 1
