Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

Overview

SimplePose

Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, accepted by AAAI-2020.

This repo also serves as Part B of our paper "Multi-Person Pose Estimation Based on Gaussian Response Heatmaps" (under review). Part A is available at this link.

  • Update

    A faster project is to be released.

Introduction

A bottom-up approach for the problem of multi-person pose estimation.

Contents

  1. Training
  2. Evaluation
  3. Demo

Project Features

  • Implements the models in PyTorch with automatic mixed precision (via Nvidia Apex).
  • Supports training on multiple GPUs (over 90% GPU utilization on each card).
  • Fast data preparation and augmentation during training (about 40 samples per second in a single CPU process, and much more when wrapped in the DataLoader class).
  • Focal L2 loss.
  • Multi-scale supervision.
  • This project can also serve as detailed practice for PyTorch beginners.
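
As a rough illustration of the focal L2 loss listed above, the following minimal sketch scales the squared error of each heatmap pixel by a factor (1 - st)^gamma, so that pixels the network already predicts well contribute less. The definition of st, the foreground threshold `thre`, and the function name `focal_l2` are assumptions made for illustration. The values alpha=0.1, beta=0.02 and gamma=2 are those attributed to the paper in the issue discussion further down, and the repository code may use different settings. Plain Python lists stand in for PyTorch tensors to keep the example self-contained.

```python
def focal_l2(pred, target, alpha=0.1, beta=0.02, gamma=2.0, thre=0.01):
    """Focal L2 loss over flat lists of heatmap values (illustrative sketch).

    pred, target: equally long sequences of floats in [0, 1].
    thre: hypothetical threshold separating foreground from background pixels.
    """
    total = 0.0
    for s, s_star in zip(pred, target):
        # Assumed definition: foreground pixels use the response itself,
        # background pixels use its complement.
        st = s - alpha if s_star > thre else 1.0 - s - beta
        # abs() guards against a negative base when gamma is not an integer.
        factor = abs(1.0 - st) ** gamma
        total += factor * (s - s_star) ** 2
    return total / len(pred)


# A well-predicted background pixel is strongly down-weighted,
# while a poorly predicted foreground pixel keeps most of its L2 error.
easy_background = focal_l2([0.1], [0.0])
hard_foreground = focal_l2([0.2], [1.0])
print(easy_background, hard_foreground)
```

With gamma=0 the factor is 1 and the sketch reduces to a plain mean squared error.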

Prepare

  1. Install packages:

    Python=3.6, Pytorch>1.0, Nvidia Apex and other packages needed.

  2. Download the COCO dataset.

  3. Download the pre-trained models (by default, the model snapshot taken at epoch 52, provided below).

    Download Link: BaiduCloud

    Alternatively, download the pre-trained model for the default configuration only, without the optimizer checkpoint, via GoogleDrive.

  4. Change the paths in the code according to your environment.

Run a Demo

python demo_image.py

Inference Speed

The speed of our system is tested on the MS-COCO test-dev dataset.

  • Inference speed of our 4-stage IMHN with 512 × 512 input on one 2080TI GPU: 38.5 FPS (100% GPU-Util).
  • Processing speed of the keypoint assignment algorithm, implemented in pure Python and run as a single process on an Intel Xeon E5-2620 CPU: 5.2 FPS (not yet well optimized).
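
To see how the two figures above combine, here is a small back-of-the-envelope sketch. It assumes the network forward pass and the keypoint assignment run strictly one after the other on each image, with no overlap and no other overhead; that is an assumption for illustration, not a measured result, and `sequential_fps` is a hypothetical helper name.

```python
def sequential_fps(*stage_fps):
    """Throughput of stages run back to back: per-frame times simply add up."""
    return 1.0 / sum(1.0 / fps for fps in stage_fps)


# Network inference (38.5 FPS) followed by keypoint assignment (5.2 FPS).
print(round(sequential_fps(38.5, 5.2), 2))
```

Under this assumption the pure-Python assignment step dominates the per-frame time, which is why speeding up post-processing matters more than speeding up the network.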

Evaluation Steps

The corresponding code is pure Python and does not use multiprocessing for now.

python evaluate.py

Results on MSCOCO 2017 test-dev subset (focal L2 loss with gamma=2):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.685
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.867
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.749
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.719
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.728
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.892
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.782
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.784

Training Steps

Before training, prepare the training data using 'SimplePose/data/coco_masks_hdf5.py'.

Multiple GPUs are recommended to speed up training, but we support different training options.

  • Most of the code is already provided; you can train the model with one of the following:

    1. 'train.py': a single training process on one GPU only.
    2. 'train_parallel.py': a single training process on multiple GPUs using DataParallel.
    3. 'train_distributed.py' (recommended): multiple training processes on multiple GPUs using distributed training:
python -m torch.distributed.launch --nproc_per_node=4 train_distributed.py

Note: loss_model_parallel.py is for train.py and train_parallel.py, while loss_model.py is for train_distributed.py and train_distributed_SWA.py. They differ in how the batch size is divided. Please refer to the code for the details of each choice.

For distributed training, the real batch_size = batch_size_in_config × GPU_Num (actually world_size). For the other options, the real batch_size = batch_size_in_config. The difference comes from the different mechanisms of data-parallel training and distributed training.
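
The batch-size rule above can be written as a small helper. This is only a sketch of the arithmetic; `effective_batch_size` and its parameter names are hypothetical and do not appear in the repository.

```python
def effective_batch_size(batch_size_in_config, world_size, distributed):
    """Total number of samples consumed per optimization step.

    distributed=True : every process loads its own batch, so the totals add up.
    distributed=False: one process loads a single batch (train.py, train_parallel.py).
    """
    if distributed:
        return batch_size_in_config * world_size
    return batch_size_in_config


# Example: batch_size_in_config = 8 on 4 GPUs.
print(effective_batch_size(8, 4, distributed=True))   # train_distributed.py
print(effective_batch_size(8, 4, distributed=False))  # train_parallel.py
```

When switching between the training scripts, the configured batch size (and possibly the learning rate) may need adjusting to keep the real batch size comparable.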

Referred Repositories (mainly)

Recommended Repositories

Faster Version: Chun-Ming Su has rebuilt and improved the post-processing speed of this repo using C++, and the improved system can run up to 7~8 FPS using a single scale with flipping on a 2080 TI GPU. Many thanks to Chun-Ming Su.

Citation

Please kindly cite this paper in your publications if it helps your research.

@inproceedings{li2020simple,
  title={Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation.},
  author={Li, Jia and Su, Wen and Wang, Zengfu},
  booktitle={AAAI},
  pages={11354--11361},
  year={2020}
}
Comments
  • Inference is very slow, 6 seconds per frame.

    Hi, and thank you for making this code available.

    I am running it on Windows, on a GTX 1080, using the demo_image.py file with the model from Google Drive, and it takes more than 6 seconds to detect keypoints.

    What am I doing wrong? How can I get close to the 38 FPS that you mention in the README?

    Thank you again!

    
    >python demo_image.py --image input.jpg
    0 neck->nose
    1 neck->Reye
    2 neck->Leye
    3 neck->Rear
    4 neck->Lear
    5 nose->Reye
    6 nose->Leye
    7 Reye->Rear
    8 Leye->Lear
    9 neck->Rsho
    10 Rsho->Relb
    11 Relb->Rwri
    12 neck->Lsho
    13 Lsho->Lelb
    14 Lelb->Lwri
    15 neck->Rhip
    16 Rhip->Rkne
    17 Rkne->Rank
    18 neck->Lhip
    19 Lhip->Lkne
    20 Lkne->Lank
    21 nose->Rsho
    22 nose->Lsho
    23 Rsho->Rhip
    24 Rhip->Lkne
    25 Lsho->Lhip
    26 Lhip->Rkne
    27 Rear->Rsho
    28 Lear->Lsho
    29 Rhip->Lhip
    {0: 'neck->nose',
     1: 'neck->Reye',
     2: 'neck->Leye',
     3: 'neck->Rear',
     4: 'neck->Lear',
     5: 'nose->Reye',
     6: 'nose->Leye',
     7: 'Reye->Rear',
     8: 'Leye->Lear',
     9: 'neck->Rsho',
     10: 'Rsho->Relb',
     11: 'Relb->Rwri',
     12: 'neck->Lsho',
     13: 'Lsho->Lelb',
     14: 'Lelb->Lwri',
     15: 'neck->Rhip',
     16: 'Rhip->Rkne',
     17: 'Rkne->Rank',
     18: 'neck->Lhip',
     19: 'Lhip->Lkne',
     20: 'Lkne->Lank',
     21: 'nose->Rsho',
     22: 'nose->Lsho',
     23: 'Rsho->Rhip',
     24: 'Rhip->Lkne',
     25: 'Lsho->Lhip',
     26: 'Lhip->Rkne',
     27: 'Rear->Rsho',
     28: 'Lear->Lsho',
     29: 'Rhip->Lhip',
     30: 'nose',
     31: 'neck',
     32: 'Rsho',
     33: 'Relb',
     34: 'Rwri',
     35: 'Lsho',
     36: 'Lelb',
     37: 'Lwri',
     38: 'Rhip',
     39: 'Rkne',
     40: 'Rank',
     41: 'Lhip',
     42: 'Lkne',
     43: 'Lank',
     44: 'Reye',
     45: 'Leye',
     46: 'Rear',
     47: 'Lear',
     48: 'background',
     49: 'reverseKeypoint'}
    Resuming from checkpoint ......
    Network weights have been resumed from checkpoint...
    cuda
    Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.
    
    Defaults for this optimization level are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    Processing user overrides (additional kwargs that are not None)...
    After processing overrides, optimization options are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    start processing...
    the 0th keypoint detection result is :  ([(384.98810766687865, 156.99848021452428), (392.0089789786089, 140.00016588448665), (372.00392927155144, 141.9994244210869), (396.997404715929, 137.00354114471122), (339.00678492184926, 140.0066329927729), (424.0065017794617, 191.99842561943024), (304.9960763460449, 220.00916854059585), (443.0001489242592, 272.0109579295975), (292.00050351624543, 310.9984260760411), (465.0083100132065, 350.99493035095674), (293.00562399904305, 404.00513994760007), (420.99916662586236, 393.0031377139439), (349.9987046664099, 401.00452761418853), (413.99545615615057, 536.0021693790678), (351.0002542695355, 541.9933765298466), (376.0021593526506, 644.988972815169), (352.00185668667876, 677.9945526718805)], 0.9674948892626798)
    processing time is 6.45740
    
    opened by antithing 9
  • An error appears during the training that may pass in a non-contiguous input.

    I'm glad to see your project. I successfully ran the demo and created the h5 file. But when I try to train the model, an error appears: RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input. I really hope to get your help, thank you very much.

    stale 
    opened by mengfanShi 8
  • ValueError: not enough values to unpack (expected 5, got 3)

    Hi, I encountered an error when running train.py. How should I modify it? Thank you for your answer!

    Test phase, Epoch: 0
    Traceback (most recent call last):
      File "train.py", line 206, in <module>
        test(epoch, show_image=False)
      File "train.py", line 178, in test
        images, mask_misses, heatmaps, offsets, mask_offsets = target_tuple
    ValueError: not enough values to unpack (expected 5, got 3)

    stale 
    opened by A7777-gp 7
  • With python demo_image.py

    Defaults for this optimization level are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    Processing user overrides (additional kwargs that are not None)...
    After processing overrides, optimization options are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
    start processing...
    Traceback (most recent call last):
      File "demo_image.py", line 637, in <module>
        params, model_params = config_reader()
      File "/scratch/gp/Improved-Body-Parts-master/utils/config_reader.py", line 9, in config_reader
        param = config['param']  # a dictionary type that inherits from dict
      File "/scratch/mool/ana3/envs/gpp/lib/python3.6/site-packages/configobj.py", line 554, in __getitem__
        val = dict.__getitem__(self, key)
    KeyError: 'param'
    (gpp) zhhu@k8s-master01:/scratch/gp/Improved-Body-Parts-master$ python demo_image.py

    I am running it on Ubuntu 18.04 with CUDA 10.0, but it failed. Can you help me look at the error? Thanks!

    opened by A7777-gp 7
  • Joints heatmaps

    Hi, I want to ask what the multi-person joint heatmaps generated by the heatmapper contain. Is it just a Gaussian around each joint location, so that the same semantic joint (e.g. left shoulder) is on the same heatmap channel for all human targets in the scene, but at different x, y locations?

    If so, could you vectorize the joint heatmapper, e.g. by rendering the Gaussians? I see you have many loops there in NumPy code, so I am wondering whether it could be vectorized with some PyTorch ops.

    stale 
    opened by bhack 7
  • L2 focal loss is different from the paper

    I'm glad to see your project, and I ran the demo successfully. However, I found that the L2 focal loss in this project (models/loss_model_parallel.py) sets alpha=0 and beta=0 and uses factor = torch.abs(1. - st), which differs from what your paper shows: alpha=0.1, beta=0.02, gamma=2, and factor = (1. - st) ** gamma. I'm really confused about that. I really hope to get your help, thank you very much.

    stale 
    opened by VenAlone 5
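  • How to separate different people's keypoints in the same image?

    Hello, when I was studying CornerNet and CenterNet, the same keypoints of different objects were grouped via embedding vectors, e.g. 83 channels = 80 classes + embedding vector + xy offsets. So when there are multiple people in the image, how is the IMHN output parsed? What does the IMHN output look like, and what do the channels mean? My understanding is that the same keypoint of all human bodies is predicted on the same heatmap, so how are different people distinguished? Thanks.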

    stale 
    opened by gonghaotian 4
  • How to use train.py to run

    Because my computer doesn't have a GPU, I should run train.py. What command should I give in the terminal? I tried but failed. @hellojialee Thank you very much~

    stale 
    opened by Sunstin 4
  • Can it run on Ubuntu 18.04 with an RTX 3080 and CUDA 11.1?

    Thanks for your wonderful work. I downloaded your code and tried to run it, but I encountered a lot of errors when installing the packages. Can this run on Ubuntu 18.04 with an RTX 3080?

    opened by A7777-gp 3
  • About Fig. 4: is it the improved hourglass or not?

    Hello, your work is excellent, but there is one thing I don't understand very well. Figure 4 in the paper is an improved hourglass network, but in Figure 3 it is still marked as hourglass, and I don't see the part corresponding to Figure 4 in your code. Sorry, I'm not sure whether I have misunderstood Figure 4.

    stale 
    opened by zouxuelian 3
  • Bump pillow from 5.4.1 to 9.0.1

    Bumps pillow from 5.4.1 to 9.0.1.

    Changelog

    Sourced from pillow's changelog.

    9.0.1 (2022-02-03)

    • In show_file, use os.remove to remove temporary images. CVE-2022-24303 #6010 [radarhere, hugovk]

    • Restrict builtins within lambdas for ImageMath.eval. CVE-2022-22817 #6009 [radarhere]

    9.0.0 (2022-01-02)

    • Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

    • Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

    • Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

    • Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

    • Improved I;16 operations on big endian #5901 [radarhere]

    • Limit quantized palette to number of colors #5879 [radarhere]

    • Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

    • When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

    • Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

    • Added rounding when converting P and PA #5824 [radarhere]

    • Improved putdata() documentation and data handling #5910 [radarhere]

    • Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

    • Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

    ... (truncated)

    Commits
    • 6deac9e 9.0.1 version bump
    • c04d812 Update CHANGES.rst [ci skip]
    • 4fabec3 Added release notes for 9.0.1
    • 02affaa Added delay after opening image with xdg-open
    • ca0b585 Updated formatting
    • 427221e In show_file, use os.remove to remove temporary images
    • c930be0 Restrict builtins within lambdas for ImageMath.eval
    • 75b69dd Dont need to pin for GHA
    • cd938a7 Autolink CWE numbers with sphinx-issues
    • 2e9c461 Add CVE IDs
    • Additional commits viewable in compare view

    dependencies stale 
    opened by dependabot[bot] 2
Owner
Jia Li