PRTR: Pose Recognition with Cascade Transformers

Introduction

This repository is the official implementation of Pose Recognition with Cascade Transformers. The paper proposes the following two types of cascade Transformers for pose recognition.

Two-stage Transformers

[architecture diagram: two-stage variant]

Please refer to README.md for detailed usage of the two-stage model variant.

Sequential Transformers

[architecture diagram: sequential variant]

Please refer to README.md for detailed usage of the sequential (end-to-end) model variant.
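To make the cascade idea concrete, here is a minimal sketch of the two-stage flow. Every name in it (person_detector, keypoint_transformer, crop_and_resize) is a placeholder for illustration, not this repository's actual API:

    import torch
    import torch.nn.functional as F

    def crop_and_resize(image, box, size=(384, 288)):
        """Crop an (x1, y1, x2, y2) box from a CHW image tensor and resize it."""
        x1, y1, x2, y2 = [int(v) for v in box.tolist()]
        patch = image[:, y1:y2, x1:x2].unsqueeze(0)           # [1, C, h, w]
        return F.interpolate(patch, size=size, mode="bilinear",
                             align_corners=False).squeeze(0)  # [C, H, W]

    def two_stage_pose(image, person_detector, keypoint_transformer, thresh=0.5):
        """Stage 1 detects people; stage 2 regresses keypoints per crop."""
        boxes, scores = person_detector(image)                # [N, 4], [N]
        poses = []
        for box in boxes[scores > thresh]:                    # keep confident queries
            crop = crop_and_resize(image, box)
            poses.append(keypoint_transformer(crop))          # e.g. [K, 2] keypoints
        return poses

In the sequential (end-to-end) variant, the crop is instead performed on features in a differentiable way (STN-style feature cropping, discussed in the comments below), so both Transformers can be trained jointly.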

For more details, please see Pose Recognition with Cascade Transformers by Ke Li*, Shijie Wang*, Xiang Zhang*, Yifan Xu, Weijian Xu, and Zhuowen Tu.

Updates

Code and pretrained models will be released soon.

Citation

@misc{li2021pose,
      title={Pose Recognition with Cascade Transformers}, 
      author={Ke Li and Shijie Wang and Xiang Zhang and Yifan Xu and Weijian Xu and Zhuowen Tu},
      year={2021},
      eprint={2104.06976},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This repository is released under the Apache License 2.0. The license text can be found in the LICENSE file.

Acknowledgments

This project is based on the following open-source repositories, which have greatly facilitated our research.

Comments
  • Sequential Transformer training

    Hi, thanks for open-sourcing the code; the work is really good. I have a few queries:

    1. Can we train the sequential Transformer separately? I need to train it on the WIDER attribute dataset.
    2. Can we replace the keypoint detector with an attribute classifier to obtain person attributes?
    3. What is the inference time of the sequential Transformer?

    Thanks in advance.

    opened by abhigoku10 5
  • Confused about positional encoding and crop data

    1. Positional encoding

    The paper describes absolute positional encoding for the person-detection Transformer (DETR) and positional encoding relative to the corresponding bounding box for the pose Transformer.

    Do the two-stage Transformers use the same sine positional encoding for both? (Refer to the code here.)

    2. Crop data

    After debugging, I found that when training the pose Transformer, the human box is not detected by DETR; the COCO-labeled box is used instead (refer to the data loading function). That means people are predicted one by one. I also noticed that there is a scale operation when cropping, which makes the cropped result more likely to contain multiple people, while the supervision is the pose of a single person. [debug screenshot of training data omitted]

    So I am curious how parallel detection is done after cropping when multiple people are detected.
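    For reference (regarding point 1 above), a generic sketch of sine positional encoding computed on coordinates normalized to a bounding box; this is one assumption about what "relative to the bounding box" could look like, not the repository's actual code:

        import math
        import torch

        def box_relative_sine_pe(h, w, box, num_feats=128, temperature=10000):
            """Sine/cosine encoding over an h x w grid, with coordinates
            normalized relative to an (x1, y1, x2, y2) box (assumed floats)."""
            ys = torch.arange(h, dtype=torch.float32).unsqueeze(1).expand(h, w)
            xs = torch.arange(w, dtype=torch.float32).unsqueeze(0).expand(h, w)
            x1, y1, x2, y2 = box
            # Normalize each coordinate to [0, 1] within the box instead of
            # within the whole image (the "relative" part).
            xs = (xs - x1) / max(x2 - x1, 1e-6)
            ys = (ys - y1) / max(y2 - y1, 1e-6)
            dim_t = torch.arange(num_feats, dtype=torch.float32)
            dim_t = temperature ** (2 * (dim_t // 2) / num_feats)
            pos_x = xs.unsqueeze(-1) * (2 * math.pi) / dim_t
            pos_y = ys.unsqueeze(-1) * (2 * math.pi) / dim_t
            pos_x = torch.stack((pos_x[..., 0::2].sin(),
                                 pos_x[..., 1::2].cos()), dim=-1).flatten(2)
            pos_y = torch.stack((pos_y[..., 0::2].sin(),
                                 pos_y[..., 1::2].cos()), dim=-1).flatten(2)
            return torch.cat((pos_y, pos_x), dim=-1)  # [h, w, 2 * num_feats]

    With an identity box covering the whole feature map, this reduces to DETR's usual absolute sine encoding.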

    opened by leijue222 4
  • Question about the sequential variant

    Hello, I have some questions about the sequential variant (annotated_prtr.ipynb):

    1. In the forward() function in cell 4, the dimension of hs is [B, person_per_image, f]. Is f here the transformer's dimension (similar to d_model in DETR's transformer)?

    2. For these two lines of code in preparation for STN feature cropping in cell 4:

    person_per_image = hs.size(1)
    num_person = person_per_image * hs.size(0)
    

    I am a little bit confused by person_per_image, since the number of persons likely differs from image to image. Is hs here similar to the hs in DETR, whose dimension is [batch_size, num_queries, d_model]?

    3. If I only need the cropped features and don't use the Transformer (transformer_kpt) for further keypoint detection, do I have to build positional embeddings (as in cell 3) and apply them to STN feature cropping?

    Looking forward to your answers. Thanks in advance!
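    For concreteness (regarding point 2), a tiny shape sketch under DETR-like assumptions; the sizes below are made up:

        import torch

        B, person_per_image, d_model = 2, 100, 256    # assumed DETR-like sizes
        hs = torch.randn(B, person_per_image, d_model)

        # Flattening the batch and query axes gives one row per person query,
        # so STN cropping can process all candidates in a single batch.
        num_person = B * hs.size(1)
        hs_flat = hs.reshape(num_person, d_model)     # [200, 256]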

    opened by EckoTan0804 4
  • MultiScaleDeformableAttention ImportError

    (PRTR) hpe2020@hbuzntw002-Precision-7920-Tower:~/shanbg/PRTR/two_stage$ python tools/test.py --cfg experiments/coco/transformer/w32_384x288_adamw_lr1e-4.yaml TEST.MODEL_FILE models/pytorch/pose_coco/pose_transformer_w32_384x288.pth TEST.USE_GT_BBOX False
    Traceback (most recent call last):
      File "tools/test.py", line 26, in <module>
        from models.matcher import build_matcher
      File "/home/hpe2020/shanbg/PRTR/two_stage/tools/../lib/models/__init__.py", line 14, in <module>
        import models.deformable_pose_transformer
      File "/home/hpe2020/shanbg/PRTR/two_stage/tools/../lib/models/deformable_pose_transformer.py", line 5, in <module>
        from models.deformable_transformer import build_deformable_transformer, inverse_sigmoid
      File "/home/hpe2020/shanbg/PRTR/two_stage/tools/../lib/models/deformable_transformer.py", line 19, in <module>
        from models.ops.modules import MSDeformAttn
      File "/home/hpe2020/shanbg/PRTR/two_stage/tools/../lib/models/ops/modules/__init__.py", line 9, in <module>
        from .ms_deform_attn import MSDeformAttn
      File "/home/hpe2020/shanbg/PRTR/two_stage/tools/../lib/models/ops/modules/ms_deform_attn.py", line 21, in <module>
        from ..functions import MSDeformAttnFunction
      File "/home/hpe2020/shanbg/PRTR/two_stage/tools/../lib/models/ops/functions/__init__.py", line 9, in <module>
        from .ms_deform_attn_func import MSDeformAttnFunction
      File "/home/hpe2020/shanbg/PRTR/two_stage/tools/../lib/models/ops/functions/ms_deform_attn_func.py", line 18, in <module>
        import MultiScaleDeformableAttention as MSDA
    ImportError: /home/hpe2020/anaconda3/envs/PRTR/lib/python3.6/site-packages/MultiScaleDeformableAttention-1.0-py3.6-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

    I need help
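    (An undefined caffe2 symbol like this usually means the extension was compiled against a different PyTorch build than the one currently installed. A minimal check, assuming the op was built with the repo's make.sh under lib/models/ops:)

        import torch

        # If this version differs from the one active when the op was built,
        # rebuilding the op (re-running make.sh) is the usual remedy.
        print(torch.__version__, torch.version.cuda)

        try:
            import MultiScaleDeformableAttention  # the compiled CUDA extension
        except ImportError as err:
            print("Likely PyTorch/extension ABI mismatch:", err)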

    opened by sheeranshan 3
  • Training results on COCO

    Hello, this is amazing work. However, I faced a challenge while training the model on the COCO dataset; the picture below shows the results I obtained. What could be the reason? Hoping to receive a reply from you. Thank you. [results screenshot omitted]

    opened by edwintenagyei367 2
  • Person-detection Transformer in the two-stage variant

    Hi,

    Thanks for the nice work. I am just a little bit confused about the implementation: the proposed person-detection Transformer is not used in the two-stage variant. Instead, person-detection results from another detector and ground-truth bounding boxes are used on COCO and MPII, respectively.

    Am I understanding it right? If the proposed person-detection Transformer is used, can you point out where and how? Thanks a lot!

    Regards, Charles

    opened by Charleshhy 2
  • How do you discard unmatched queries?

    Hello, in the training stage of the person-detection Transformer, how do you discard unmatched queries? I don't see the code for this stage. Is the "person per image" in your code equal to 5 or fewer in the end-to-end variant?
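    (For illustration, a generic sketch of how DETR-style training typically drops unmatched queries after Hungarian matching; the shapes are made up and this is not necessarily this repository's code:)

        import torch
        from scipy.optimize import linear_sum_assignment

        # cost[i, j]: matching cost between predicted query i and ground-truth
        # person j (assume 100 queries and 3 annotated people in the image).
        cost = torch.rand(100, 3)
        query_idx, gt_idx = linear_sum_assignment(cost.numpy())

        # Only matched queries contribute to the keypoint loss; the other 97
        # queries are "discarded" simply by never being indexed here.
        queries = torch.randn(100, 256)                 # assumed decoder output
        matched = queries[torch.as_tensor(query_idx)]   # [3, 256]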

    opened by ziqi123 1
  • Person-detection Transformer

    Hello,

    I've read the PRTR paper and issue #3. However, I am still not 100% clear about how to fine-tune DETR for person detection myself. Can you please share more details, e.g., how to adjust DETR and the COCO data loading so that only persons are detected? By the way, have you also fine-tuned Deformable DETR for person detection?

    Thanks in advance!

    opened by EckoTan0804 1
  • About the end-to-end part

    After reading about your excellent work, I would like to try the end-to-end (sequential) part. But I found that most of the code, tools, and instructions are in the two-stage part. Do you have a quick guide for the end-to-end part, like the one for two-stage, so that I can run it as in your paper?

    opened by jimyeh7543 0
  • About the MPII dataset

    After reading your paper, I think your method is very powerful. I would like to ask a question: I would like to implement your end-to-end method, but I don't see the MPII dataset being used for end-to-end training in your repo. I may have missed it; where would it be in this repo?

    opened by CYH4157 1
  • error: command 'g++' failed with exit status 1

    /home/sherry/anaconda3/compiler_compat/ld: cannot find -lcudart
    collect2: error: ld returned 1 exit status
    error: command 'g++' failed with exit status 1

    When I run bash make.sh, this problem occurs. How can I deal with it?

    opened by Sherryxingxing 4
  • The STN cropping in annotated_prtr.ipynb only works for batch_size=1

    Hello,

    I've tried the code for STN cropping in annotated_prtr.ipynb. For a single image (i.e., batch_size=1) it works well, but for batch_size > 1 (e.g., 2) an error occurs. [error screenshot omitted] (In my case, person_per_image is set to 100, so the first dimension is 2 * 100 = 200.)

    It seems that this error is caused by expand(), since expand() can only expand an axis of size 1 (see the discussion: https://discuss.pytorch.org/t/runtimeerror-the-expanded-size-of-the-tensor-must-match-the-existing-size/105523).
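    (A minimal repro of the expand() constraint, with shapes made up to mirror the description above:)

        import torch

        pos = torch.randn(1, 200, 256)        # size-1 leading axis: expand works
        ok = pos.expand(200, 200, 256)        # returns a view, no copy

        batched = torch.randn(2, 200, 256)    # batch_size=2: leading axis is 2
        try:
            batched.expand(200, 200, 256)     # fails: axis 0 is 2, not 1
        except RuntimeError as err:
            print(err)                        # "The expanded size of the tensor ..."

        # repeat_interleave copies data and handles non-singleton axes:
        tiled = batched.repeat_interleave(100, dim=0)   # [200, 200, 256]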

    opened by EckoTan0804 0
Owner

mlpc-ucsd