Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

Jia Li

Last update: Dec 24, 2022

Related tags

Deep Learning training tutorial heatmap pytorch distributed apex bottom-up pose-estimation multi-gpus mixed-precision multi-person focal-l2-loss

Overview

SimplePose

Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, accepted by AAAI-2020.

This repo also serves as Part B of our paper "Multi-Person Pose Estimation Based on Gaussian Response Heatmaps" (under review). Part A is available at this link.

Update

A faster project is to be released.

Introduction

A bottom-up approach for the problem of multi-person pose estimation.

Training
Evaluation
Demo

Project Features

Implements the models in PyTorch with automatic mixed precision (using NVIDIA Apex).
Supports training on multiple GPUs (over 90% GPU utilization on each card).
Fast data preparation and augmentation during training (about 40 samples per second in a single CPU process, and much more when wrapped by the DataLoader class).
Focal L2 loss.
Multi-scale supervision.
This project can also serve as a detailed hands-on example for PyTorch beginners.

Prepare

Install packages:

Python 3.6, PyTorch > 1.0, NVIDIA Apex, and other required packages.
Download the COCO dataset.
Download the pre-trained models (for the default configuration, download the model snapshot saved at epoch 52, provided below).

Download Link: BaiduCloud

Alternatively, for the default configuration only, download the pre-trained model (without the optimizer checkpoint) via Google Drive.
Change the paths in the code according to your environment.

Run a Demo

python demo_image.py

Inference Speed

The speed of our system was tested on the MS-COCO test-dev dataset.

Inference speed of our 4-stage IMHN with 512 × 512 input on a single RTX 2080 Ti GPU: 38.5 FPS (100% GPU utilization).
Processing speed of the keypoint assignment algorithm, implemented in pure Python as a single process, on an Intel Xeon E5-2620 CPU: 5.2 FPS (not yet well optimized).
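The two figures above are measured separately. As a rough back-of-the-envelope sketch, assuming the network and the keypoint assignment run one after the other on each frame (no pipelining) and ignoring image loading and pre-processing, the per-frame latencies simply add up:

```python
def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a stage running at `fps`."""
    return 1000.0 / fps


def sequential_fps(*stage_fps: float) -> float:
    """End-to-end FPS when every frame passes through all stages in sequence."""
    return 1000.0 / sum(latency_ms(f) for f in stage_fps)


network_fps = 38.5     # 4-stage IMHN, 512 x 512 input, single RTX 2080 Ti (README figure)
assignment_fps = 5.2   # pure-Python keypoint assignment on a Xeon E5-2620 (README figure)

print(f"network:    {latency_ms(network_fps):.1f} ms/frame")     # 26.0
print(f"assignment: {latency_ms(assignment_fps):.1f} ms/frame")  # 192.3
print(f"end-to-end: {sequential_fps(network_fps, assignment_fps):.2f} FPS")  # 4.58
```

Under this assumption the pure-Python assignment step accounts for roughly 88% of the per-frame time, which is why the C++ post-processing rewrite mentioned in this README matters for end-to-end speed.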

Evaluation Steps

The evaluation code is currently written in pure Python and does not use multiprocessing.

python evaluate.py

Results on MSCOCO 2017 test-dev subset (focal L2 loss with gamma=2):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.685
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.867
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.749
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.719
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.728
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.892
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.782
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.784
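The demo output shown in the Comments section reports a detected person as a list of 17 (x, y) points plus a score. To evaluate with the official COCO tools, detections are usually written as a JSON list in the COCO keypoint-results format and loaded with pycocotools (`COCO.loadRes`, then `COCOeval(coco_gt, coco_dt, "keypoints")`). A minimal sketch of that conversion, assuming the 17 points are already in COCO keypoint order; `to_coco_keypoint_result` and the image id are hypothetical, and this is not the repo's evaluate.py:

```python
import json
from typing import List, Sequence, Tuple

COCO_PERSON_CATEGORY_ID = 1  # category id of "person" in COCO


def to_coco_keypoint_result(image_id: int,
                            keypoints: Sequence[Tuple[float, float]],
                            score: float) -> dict:
    """Convert one detected person into a COCO keypoint-detection result entry.

    Assumes `keypoints` holds the 17 COCO keypoints, already in COCO order.
    """
    if len(keypoints) != 17:
        raise ValueError(f"expected 17 keypoints, got {len(keypoints)}")
    flat: List[float] = []
    for x, y in keypoints:
        # COCO stores (x, y, v) triplets; v = 1.0 is only a placeholder here.
        flat.extend([float(x), float(y), 1.0])
    return {
        "image_id": int(image_id),
        "category_id": COCO_PERSON_CATEGORY_ID,
        "keypoints": flat,
        "score": float(score),
    }


# Hypothetical image id and a dummy person with all keypoints at one location.
results = [to_coco_keypoint_result(42, [(100.0, 200.0)] * 17, 0.97)]
payload = json.dumps(results)  # e.g. write this string to a results .json file
print(len(results[0]["keypoints"]))  # 51 = 17 keypoints x (x, y, v)
```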

Training Steps

Before training, prepare the training data using 'SimplePose/data/coco_masks_hdf5.py'.

Multiple GPUs are recommended to speed up training, but several training options are supported.

Most of the code is already provided; you can train the model with:
1. 'train.py': a single training process on one GPU only.
2. 'train_parallel.py': a single training process on multiple GPUs using DataParallel.
3. 'train_distributed.py' (recommended): multiple training processes on multiple GPUs using distributed training:

python -m torch.distributed.launch --nproc_per_node=4 train_distributed.py
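With `torch.distributed.launch`, the command above starts `--nproc_per_node` processes, one per GPU, and tells each process its local rank: older PyTorch releases pass it as a `--local_rank` argument, PyTorch 2.x passes `--local-rank`, and `torchrun` (or `launch --use_env`) sets the `LOCAL_RANK` environment variable instead. A minimal, framework-free sketch of reading it robustly; `parse_local_rank` is a hypothetical helper, not code taken from train_distributed.py:

```python
import argparse
import os
from typing import List, Optional


def parse_local_rank(argv: Optional[List[str]] = None) -> int:
    """Return this process's local rank (the GPU index on this node)."""
    parser = argparse.ArgumentParser(add_help=False)
    # Accept both spellings used by different torch.distributed.launch versions.
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # torchrun / `--use_env` pass the rank through the environment instead.
    return int(os.environ.get("LOCAL_RANK", "0"))


if __name__ == "__main__":
    print(f"local rank: {parse_local_rank()}")
```

Each process would then typically call `torch.cuda.set_device(rank)` before building its model, so that the processes do not all land on GPU 0.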

Note: loss_model_parallel.py is used by train.py and train_parallel.py, while loss_model.py is used by train_distributed.py and train_distributed_SWA.py. They differ in how the batch size is divided; please refer to the code for the different choices.

For distributed training, the effective batch size = batch_size_in_config × number of GPUs (i.e. world_size). For the other options, the effective batch size = batch_size_in_config. The difference comes from the different mechanisms of DataParallel training and distributed training.
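The batch-size rule above can be sketched as a small helper. The mode names are hypothetical labels for the three scripts, and the per-GPU figure for DataParallel assumes the batch is split as evenly as possible along the batch dimension, which is how `torch.nn.DataParallel` scatters its inputs:

```python
import math
from typing import Tuple


def batch_sizes(batch_size_in_config: int, num_gpus: int, mode: str) -> Tuple[int, int]:
    """Return (effective batch size per optimizer step, max samples per GPU)."""
    if mode == "single":        # train.py: one process, one GPU
        return batch_size_in_config, batch_size_in_config
    if mode == "dataparallel":  # train_parallel.py: one process splits its batch across GPUs
        return batch_size_in_config, math.ceil(batch_size_in_config / num_gpus)
    if mode == "distributed":   # train_distributed.py: one process per GPU (world_size == num_gpus)
        return batch_size_in_config * num_gpus, batch_size_in_config
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("single", "dataparallel", "distributed"):
    print(mode, batch_sizes(8, 4, mode))
# single (8, 8)
# dataparallel (8, 2)
# distributed (32, 8)
```

So switching from train_parallel.py to train_distributed.py with the same config on 4 GPUs makes the effective batch 4x larger unless batch_size_in_config is reduced accordingly; the learning rate may then need to be revisited as well.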

Referred Repositories (mainly)

Recommended Repositories

Faster Version: Chun-Ming Su has rewritten the post-processing of this repo in C++ to speed it up; the improved system runs at up to 7 to 8 FPS using a single scale with flipping on an RTX 2080 Ti GPU. Many thanks to Chun-Ming Su.

Citation

Please kindly cite this paper in your publications if it helps your research.

@inproceedings{li2020simple,
  title={Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation.},
  author={Li, Jia and Su, Wen and Wang, Zengfu},
  booktitle={AAAI},
  pages={11354--11361},
  year={2020}
}

Comments

Inference is very slow, 6 seconds per frame.

Hi, and thank you for making this code available.

I am running it on Windows on a GTX 1080, using demo_image.py with the model from Google Drive, and detecting keypoints takes more than 6 seconds.

What am I doing wrong? How can I get close to the 38 FPS that you mention in the README?

Thank you again!


>python demo_image.py --image input.jpg
0 neck->nose
1 neck->Reye
2 neck->Leye
3 neck->Rear
4 neck->Lear
5 nose->Reye
6 nose->Leye
7 Reye->Rear
8 Leye->Lear
9 neck->Rsho
10 Rsho->Relb
11 Relb->Rwri
12 neck->Lsho
13 Lsho->Lelb
14 Lelb->Lwri
15 neck->Rhip
16 Rhip->Rkne
17 Rkne->Rank
18 neck->Lhip
19 Lhip->Lkne
20 Lkne->Lank
21 nose->Rsho
22 nose->Lsho
23 Rsho->Rhip
24 Rhip->Lkne
25 Lsho->Lhip
26 Lhip->Rkne
27 Rear->Rsho
28 Lear->Lsho
29 Rhip->Lhip
{0: 'neck->nose',
 1: 'neck->Reye',
 2: 'neck->Leye',
 3: 'neck->Rear',
 4: 'neck->Lear',
 5: 'nose->Reye',
 6: 'nose->Leye',
 7: 'Reye->Rear',
 8: 'Leye->Lear',
 9: 'neck->Rsho',
 10: 'Rsho->Relb',
 11: 'Relb->Rwri',
 12: 'neck->Lsho',
 13: 'Lsho->Lelb',
 14: 'Lelb->Lwri',
 15: 'neck->Rhip',
 16: 'Rhip->Rkne',
 17: 'Rkne->Rank',
 18: 'neck->Lhip',
 19: 'Lhip->Lkne',
 20: 'Lkne->Lank',
 21: 'nose->Rsho',
 22: 'nose->Lsho',
 23: 'Rsho->Rhip',
 24: 'Rhip->Lkne',
 25: 'Lsho->Lhip',
 26: 'Lhip->Rkne',
 27: 'Rear->Rsho',
 28: 'Lear->Lsho',
 29: 'Rhip->Lhip',
 30: 'nose',
 31: 'neck',
 32: 'Rsho',
 33: 'Relb',
 34: 'Rwri',
 35: 'Lsho',
 36: 'Lelb',
 37: 'Lwri',
 38: 'Rhip',
 39: 'Rkne',
 40: 'Rank',
 41: 'Lhip',
 42: 'Lkne',
 43: 'Lank',
 44: 'Reye',
 45: 'Leye',
 46: 'Rear',
 47: 'Lear',
 48: 'background',
 49: 'reverseKeypoint'}
Resuming from checkpoint ......
Network weights have been resumed from checkpoint...
cuda
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
start processing...
the 0th keypoint detection result is :  ([(384.98810766687865, 156.99848021452428), (392.0089789786089, 140.00016588448665), (372.00392927155144, 141.9994244210869), (396.997404715929, 137.00354114471122), (339.00678492184926, 140.0066329927729), (424.0065017794617, 191.99842561943024), (304.9960763460449, 220.00916854059585), (443.0001489242592, 272.0109579295975), (292.00050351624543, 310.9984260760411), (465.0083100132065, 350.99493035095674), (293.00562399904305, 404.00513994760007), (420.99916662586236, 393.0031377139439), (349.9987046664099, 401.00452761418853), (413.99545615615057, 536.0021693790678), (351.0002542695355, 541.9933765298466), (376.0021593526506, 644.988972815169), (352.00185668667876, 677.9945526718805)], 0.9674948892626798)
processing time is 6.45740

opened by antithing 9
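When reproducing such measurements, note that the README reports the network (38.5 FPS on an RTX 2080 Ti) and the pure-Python keypoint assignment (5.2 FPS on a CPU) separately, so a single end-to-end wall-clock time per image is not directly comparable to the network figure. In addition, the first call into a CUDA model can include one-time initialization, and CUDA kernels run asynchronously, so GPU timings are only meaningful after synchronizing. A minimal, generic timing sketch; `benchmark` is a hypothetical helper, and with PyTorch one would pass `torch.cuda.synchronize` as `sync` when timing the network:

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 10,
              sync: Callable[[], None] = lambda: None) -> Dict[str, float]:
    """Time `fn` after `warmup` untimed calls; return median latency and FPS."""
    for _ in range(warmup):
        fn()            # absorbs one-time costs (lazy init, autotuning, caches)
    sync()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        sync()          # wait for queued asynchronous (e.g. GPU) work to finish
        timings.append(time.perf_counter() - start)
    median_s = statistics.median(timings)
    return {"median_s": median_s, "fps": 1.0 / median_s if median_s > 0 else float("inf")}


if __name__ == "__main__":
    result = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"median {result['median_s'] * 1000:.2f} ms, {result['fps']:.1f} FPS")
```

Timing the network call and the keypoint assignment call separately with such a helper shows which part dominates on a given machine.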

Error during training: possibly a non-contiguous input was passed in

I'm glad to see your project. I successfully ran the demo and created the h5 file, but when I try to train the model, the following error appears: RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input. I really hope to get your help, thank you very much.
stale

opened by mengfanShi 8
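A common way to end up with a non-contiguous input (whether it is the cause here is not confirmed in this thread) is building a tensor from a transposed or sliced NumPy array, for example when converting an HWC image to CHW: `transpose` returns a strided view, and `torch.from_numpy` keeps that memory layout. The usual fix is a contiguous copy, with `np.ascontiguousarray` on the NumPy side or `.contiguous()` on the tensor. A minimal NumPy sketch; `to_chw_contiguous` is a hypothetical helper:

```python
import numpy as np


def to_chw_contiguous(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an H x W x C image to a C-contiguous C x H x W array."""
    chw_view = image_hwc.transpose(2, 0, 1)   # view with permuted strides, no copy
    return np.ascontiguousarray(chw_view)     # copy into row-major (C-contiguous) memory


img = np.zeros((4, 5, 3), dtype=np.float32)
print(img.transpose(2, 0, 1).flags["C_CONTIGUOUS"])   # False
print(to_chw_contiguous(img).flags["C_CONTIGUOUS"])   # True
```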
ValueError: not enough values to unpack (expected 5, got 3)

Hi, I encountered an error when running train.py. How should I modify it? Thank you for your answer!

Test phase, Epoch: 0
Traceback (most recent call last):
  File "train.py", line 206, in
    test(epoch, show_image=False)
  File "train.py", line 178, in test
    images, mask_misses, heatmaps, offsets, mask_offsets = target_tuple
ValueError: not enough values to unpack (expected 5, got 3)
stale

opened by A7777-gp 7
With python demo_image.py

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
start processing...
Traceback (most recent call last):
  File "demo_image.py", line 637, in
    params, model_params = config_reader()
  File "/scratch/gp/Improved-Body-Parts-master/utils/config_reader.py", line 9, in config_reader
    param = config['param']  # a dictionary type that inherits from dict
  File "/scratch/mool/ana3/envs/gpp/lib/python3.6/site-packages/configobj.py", line 554, in __getitem__
    val = dict.__getitem__(self, key)
KeyError: 'param'
(gpp) zhhu@k8s-master01:/scratch/gp/Improved-Body-Parts-master$ python demo_image.py

I am running it on Ubuntu 18.04 with CUDA 10.0, but it fails. Can you help me look at the error? Thanks!!!

opened by A7777-gp 7
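The traceback ends in a KeyError raised inside configobj, which suggests the loaded configuration has no `param` section at all. One possible cause (not confirmed in this thread): `ConfigObj` does not raise by default when its input file does not exist (its `file_error` option defaults to `False`) and silently yields an empty config, for example when a relative config path is resolved against a different working directory. A minimal stdlib sketch that fails early instead; `resolve_config_path` is a hypothetical helper and the `config` file name in the comment is an assumption:

```python
from pathlib import Path
from typing import Union


def resolve_config_path(path: Union[str, Path], base_dir: Union[str, Path]) -> Path:
    """Resolve `path` relative to `base_dir` (not the CWD) and require that it exists."""
    candidate = Path(path)
    if not candidate.is_absolute():
        candidate = Path(base_dir) / candidate
    if not candidate.is_file():
        raise FileNotFoundError(f"config file not found: {candidate.resolve()}")
    return candidate


# Hypothetical usage inside utils/config_reader.py:
#   cfg_path = resolve_config_path("config", Path(__file__).parent)
#   config = ConfigObj(str(cfg_path), file_error=True)  # raise instead of returning {}
```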
Joints heatmaps

Hi, I want to ask what the multi-person joint heatmap generated by the heatmapper contains. Is it just a Gaussian around each joint location, so that the same semantic joint (e.g. left shoulder) of all the people in the scene lies on the same heatmap channel, just at different (x, y) locations?

Also, could the joint heatmapper emitter be vectorized, e.g. with a render-Gaussian function? I see many loops in the NumPy code there, so I wonder whether it could be vectorized with some PyTorch ops.
stale

opened by bhack 7
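On the first question: the console output in the first issue lists one channel per keypoint type (e.g. 'Lsho'), which is consistent with every person contributing a Gaussian peak at their own location on a shared channel. On the second: the per-person loops can be replaced with broadcasting. A minimal NumPy sketch, not the repo's heatmapper; it assumes coordinates already in heatmap pixels, a fixed `sigma`, and that overlapping Gaussians of different people are merged with a per-pixel maximum (summing is another common choice):

```python
import numpy as np


def render_joint_heatmaps(joints: np.ndarray, height: int, width: int,
                          sigma: float = 2.0) -> np.ndarray:
    """Render one Gaussian per visible joint, with no Python loop over people.

    joints: (num_people, num_joints, 3) array of (x, y, visible) in heatmap pixels.
    Returns a (num_joints, height, width) float32 array.
    """
    num_people, num_joints, _ = joints.shape
    if num_people == 0:
        return np.zeros((num_joints, height, width), dtype=np.float32)
    ys = np.arange(height, dtype=np.float32).reshape(1, 1, height, 1)
    xs = np.arange(width, dtype=np.float32).reshape(1, 1, 1, width)
    jx = joints[:, :, 0].reshape(num_people, num_joints, 1, 1)
    jy = joints[:, :, 1].reshape(num_people, num_joints, 1, 1)
    dist2 = (xs - jx) ** 2 + (ys - jy) ** 2           # (P, J, H, W) via broadcasting
    gauss = np.exp(-dist2 / (2.0 * sigma ** 2))
    visible = (joints[:, :, 2] > 0).reshape(num_people, num_joints, 1, 1)
    gauss = np.where(visible, gauss, 0.0)             # drop unlabeled joints
    return gauss.max(axis=0).astype(np.float32)       # merge people per joint channel


# Two people; joint 0 at (3, 4) and (10, 6), joint 1 visible only for person 0.
people = np.array([[[3, 4, 1], [5, 5, 1]],
                   [[10, 6, 1], [0, 0, 0]]], dtype=np.float32)
hm = render_joint_heatmaps(people, height=16, width=16)
print(hm.shape)                    # (2, 16, 16)
print(hm[0, 4, 3], hm[0, 6, 10])   # 1.0 1.0
```

The intermediate array has P x J x H x W elements, so for many people or large maps a loop over joints (vectorized over people) trades speed for memory. The same broadcasting pattern should carry over to PyTorch (`torch.arange`, `torch.exp`, `torch.where`, `Tensor.amax`).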
L2 focal loss is different from the paper

I'm glad to see your project; I successfully ran the demo. But I found that the focal L2 loss in this project (models/loss_model_parallel.py) sets alpha=0 and beta=0 with factor = torch.abs(1. - st), which differs from what your paper shows: alpha=0.1, beta=0.02, gamma=2 and factor = (1. - st) ** gamma. I'm really confused about this and hope to get your help. Thank you very much.
stale

opened by VenAlone 5
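The two variants described in this issue can be compared with a small NumPy sketch. The definition of `st` (pred minus alpha on foreground pixels, one minus pred minus beta on background pixels, split at a hypothetical threshold `thre`) is an assumption chosen so that `st` is close to 1 for well-predicted pixels; it is not taken from the repo, and the function is not the project's implementation:

```python
import numpy as np


def focal_l2_loss(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray,
                  alpha: float = 0.1, beta: float = 0.02, gamma: float = 2.0,
                  thre: float = 0.01, abs_factor: bool = False) -> float:
    """Squared error re-weighted so that easy pixels contribute less.

    abs_factor=False: factor = (1 - st) ** gamma (paper values, as quoted in the issue).
    abs_factor=True:  factor = |1 - st| (code variant quoted in the issue; use alpha=beta=0).
    """
    # Assumed definition: st close to 1 means the pixel is already predicted well.
    st = np.where(gt >= thre, pred - alpha, 1.0 - pred - beta)
    factor = np.abs(1.0 - st) if abs_factor else (1.0 - st) ** gamma
    return float(np.sum((pred - gt) ** 2 * factor * mask))


gt = np.array([0.0, 0.0, 1.0])
pred = np.array([0.1, 0.6, 0.9])   # easy background, hard background, easy foreground
mask = np.ones_like(gt)
print(np.sum((pred - gt) ** 2))                                              # plain L2
print(focal_l2_loss(pred, gt, mask))                                         # paper variant
print(focal_l2_loss(pred, gt, mask, alpha=0.0, beta=0.0, abs_factor=True))   # code variant
```

In both variants the hard background pixel (0.6 predicted where the target is 0) dominates the loss, while the two easy pixels are strongly down-weighted; the paper variant suppresses easy pixels more aggressively because of the square.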
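How to separate different people's keypoints in the same image?

Hello, when I was studying CornerNet and CenterNet, the same keypoints of different objects were grouped by embedding vectors, e.g. 83 channels = 80 classes + embedding vector + xy offsets. So when there are multiple people in an image, how is the output of IMHN parsed? What does the IMHN output look like, and what do its channels mean? My understanding is that the same keypoint of all people is predicted on the same heatmap, so how are different people distinguished? Thanks.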
stale

opened by gonghaotian 4
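The console output of `demo_image.py` in the first issue lists the 50 output channels of the network: 30 limb (connection) channels, 18 keypoint channels, one background channel, and one channel named `reverseKeypoint`. Judging from those names, the same keypoint type of every person shares one channel, individual people show up as separate peaks on it, and the limb channels are what a keypoint assignment step can use to link peaks into persons. A minimal sketch of the channel table plus simple 3 x 3 local-maximum peak extraction; the helper names and the threshold are hypothetical, and this is not the repo's assignment algorithm:

```python
import numpy as np

# Channel order as printed by demo_image.py (indices 0-49).
LIMBS = [
    "neck->nose", "neck->Reye", "neck->Leye", "neck->Rear", "neck->Lear",
    "nose->Reye", "nose->Leye", "Reye->Rear", "Leye->Lear", "neck->Rsho",
    "Rsho->Relb", "Relb->Rwri", "neck->Lsho", "Lsho->Lelb", "Lelb->Lwri",
    "neck->Rhip", "Rhip->Rkne", "Rkne->Rank", "neck->Lhip", "Lhip->Lkne",
    "Lkne->Lank", "nose->Rsho", "nose->Lsho", "Rsho->Rhip", "Rhip->Lkne",
    "Lsho->Lhip", "Lhip->Rkne", "Rear->Rsho", "Lear->Lsho", "Rhip->Lhip",
]
KEYPOINTS = [
    "nose", "neck", "Rsho", "Relb", "Rwri", "Lsho", "Lelb", "Lwri", "Rhip",
    "Rkne", "Rank", "Lhip", "Lkne", "Lank", "Reye", "Leye", "Rear", "Lear",
]
CHANNELS = LIMBS + KEYPOINTS + ["background", "reverseKeypoint"]


def find_peaks(heatmap: np.ndarray, thre: float = 0.1) -> list:
    """Return (x, y, score) for every 3x3 local maximum above `thre`."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the 9 shifted copies and compare each pixel with its neighborhood max.
    neighborhood = np.stack([padded[dy:dy + h, dx:dx + w]
                             for dy in range(3) for dx in range(3)])
    is_peak = (heatmap >= neighborhood.max(axis=0)) & (heatmap > thre)
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(heatmap[y, x])) for y, x in zip(ys, xs)]


# Toy 'Lsho' channel containing two people's left shoulders.
lsho = np.zeros((10, 10), dtype=np.float32)
lsho[2, 3], lsho[7, 8] = 0.9, 0.8
print(CHANNELS.index("Lsho"), find_peaks(lsho))
```

Both shoulders come out as separate peaks on the same channel; deciding which peak belongs to which person then requires the limb channels (e.g. 'neck->Lsho', 'Lsho->Lelb'), which is the part the pure-Python keypoint assignment handles.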
How to use train.py to run

Because my computer doesn't have a GPU, I should run train.py. What command should I enter in the terminal? I tried but failed. @hellojialee Thank you very much~
stale

opened by Sunstin 4
Can it run on Ubuntu 18.04 with an RTX 3080 and CUDA 11.1?

Thank you for your wonderful work. I downloaded your code and tried to run it, but I encountered a lot of errors when installing the packages. Can this run on Ubuntu 18.04 with an RTX 3080?

opened by A7777-gp 3
About Fig. 4: is it the improved hourglass or not?

Hello, your work is excellent, but there is one thing I don't understand very well. Figure 4 in the paper shows an improved hourglass network, but in Figure 3 it is still labeled as hourglass, and I can't find the part corresponding to Figure 4 in your code. Sorry if I have misunderstood Figure 4.
stale

opened by zouxuelian 3
Bump pillow from 5.4.1 to 9.0.1
Bumps pillow from 5.4.1 to 9.0.1.

Release notes

Sourced from pillow's releases.

9.0.1

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.1.html

Changes

In show_file, use os.remove to remove temporary images. CVE-2022-24303 #6010 [@radarhere, @hugovk]

Restrict builtins within lambdas for ImageMath.eval. CVE-2022-22817 #6009 [radarhere]

9.0.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

Changes

Restrict builtins for ImageMath.eval() #5923 [@radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [@radarhere]

Fixed ImagePath.Path array handling #5920 [@radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [@radarhere]

Removed redundant part of condition #5915 [@radarhere]

Explicitly enable strip chopping for large uncompressed TIFFs #5517 [@kmilos]

Use the Windows method to get TCL functions on Cygwin #5807 [@DWesl]

Changed error type to allow for incremental WebP parsing #5404 [@radarhere]

Improved I;16 operations on big endian #5901 [@radarhere]

Ensure that BMP pixel data offset does not ignore palette #5899 [@radarhere]

Limit quantized palette to number of colors #5879 [@radarhere]

Use latin1 encoding to decode bytes #5870 [@radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [@radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [@radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [@radarhere]

Added rounding when converting P and PA #5824 [@radarhere]

Improved putdata() documentation and data handling #5910 [@radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [@radarhere]

Image.NONE is only used for resampling and dithers #5908 [@radarhere]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [@radarhere]

Add Tidelift alignment action and badge #5763 [@aclark4life]

Replaced further direct invocations of setup.py #5906 [@radarhere]

Added ImageShow support for xdg-open #5897 [@m-shinder]

Fixed typo #5902 [@radarhere]

Switched from deprecated "setup.py install" to "pip install ." #5896 [@radarhere]

Support 16-bit grayscale ImageQt conversion #5856 [@cmbruns]

Fixed raising OSError in _safe_read when size is greater than SAFEBLOCK #5872 [@radarhere]

Convert subsequent GIF frames to RGB or RGBA #5857 [@radarhere]

WebP: Fix memory leak during decoding on failure #5798 [@ilai-deutel]

Do not prematurely return in ImageFile when saving to stdout #5665 [@infmagic2047]

Added support for top right and bottom right TGA orientations #5829 [@radarhere]

Corrected ICNS file length in header #5845 [@radarhere]

Block tile TIFF tags when saving #5839 [@radarhere]

Added line width argument to ImageDraw polygon #5694 [@radarhere]

Do not redeclare class each time when converting to NumPy #5844 [@radarhere]

Only prevent repeated polygon pixels when drawing with transparency #5835 [@radarhere]

... (truncated)

Changelog

Sourced from pillow's changelog.

9.0.1 (2022-02-03)

In show_file, use os.remove to remove temporary images. CVE-2022-24303 #6010 [radarhere, hugovk]

Restrict builtins within lambdas for ImageMath.eval. CVE-2022-22817 #6009 [radarhere]

9.0.0 (2022-01-02)

Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

Improved I;16 operations on big endian #5901 [radarhere]

Limit quantized palette to number of colors #5879 [radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

Added rounding when converting P and PA #5824 [radarhere]

Improved putdata() documentation and data handling #5910 [radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

... (truncated)

Commits

6deac9e 9.0.1 version bump

c04d812 Update CHANGES.rst [ci skip]

4fabec3 Added release notes for 9.0.1

02affaa Added delay after opening image with xdg-open

ca0b585 Updated formatting

427221e In show_file, use os.remove to remove temporary images

c930be0 Restrict builtins within lambdas for ImageMath.eval

75b69dd Dont need to pin for GHA

cd938a7 Autolink CWE numbers with sphinx-issues

2e9c461 Add CVE IDs

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies stale
opened by dependabot[bot] 2

