The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

PIC4SeRCentre

Last update: Jan 3, 2023

Overview

Action Transformer
A Self-Attention Model for Short-Time Human Action Recognition

This repository contains the official TensorFlow implementation of the paper "Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition".

We present Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborate networks mixing convolutional, recurrent, and attentive layers. To limit computational and energy requirements, and building on previous human action recognition (HAR) research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance.

To support this approach, we open-source MPOSE2021, a new large-scale dataset intended as a formal training and evaluation benchmark for real-time, short-time HAR. MPOSE2021 is an evolution of the MPOSE dataset [1-3] and consists of human pose data extracted with OpenPose [4] and PoseNet [5] from popular HAR datasets.
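To make the idea of a fully self-attentional model over short pose windows concrete, here is a minimal NumPy sketch (not the repository's TensorFlow code) of a ViT-style forward pass: each frame's keypoints are flattened into one token, a class token and a positional embedding are added, one single-head self-attention layer mixes information across frames, and the class token is mapped to class probabilities. The weights are random, and the function name `toy_act_forward`, the 30-frame window, and `n_classes=20` are illustrative assumptions. A full Transformer encoder typically also uses multiple heads, feed-forward sublayers, normalization, and trained parameters.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def toy_act_forward(window, d_model=16, n_classes=20, seed=0):
    """Hypothetical sketch: classify one pose window of shape (T, K, C)
    with a single self-attention layer over frame tokens (random weights)."""
    rng = np.random.default_rng(seed)
    T, K, C = window.shape
    tokens = window.reshape(T, K * C)                 # one token per frame
    W_in = rng.normal(0, 0.02, (K * C, d_model))      # linear frame embedding
    x = tokens @ W_in                                 # (T, d_model)
    cls = rng.normal(0, 0.02, (1, d_model))           # class token (learned in practice)
    x = np.concatenate([cls, x], axis=0)              # (T + 1, d_model)
    x = x + rng.normal(0, 0.02, x.shape)              # stands in for a learned positional embedding
    W_q, W_k, W_v = (rng.normal(0, 0.02, (d_model, d_model)) for _ in range(3))
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    attn = softmax(q @ k.T / np.sqrt(d_model))        # (T + 1, T + 1) attention weights
    x = x + attn @ v                                  # residual connection
    W_out = rng.normal(0, 0.02, (d_model, n_classes))
    return softmax(x[0] @ W_out)                      # class probabilities from the class token
```

For example, `toy_act_forward(np.zeros((30, 17, 2)))` returns a vector of 20 probabilities. Treating whole frames as tokens keeps the sequence short (T + 1 tokens), which is one reason attention over small temporal windows stays cheap.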

This repository makes it easy to run a benchmark of AcT models on MPOSE2021, as well as a random hyperparameter search.
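A random hyperparameter search can be sketched in plain Python as below. The parameter names and ranges in `SEARCH_SPACE` are hypothetical placeholders, not the values used by `main.py -s`, and `evaluate` stands in for whatever trains a model and returns a validation score.

```python
import random

# Hypothetical search space: names and ranges are illustrative assumptions,
# not the configuration used by this repository.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3, 4, 5, 6],
    "n_heads": [1, 2, 4],
    "d_model": [64, 128, 256],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "weight_decay": [1e-5, 1e-4, 1e-3],
}

def sample_config(rng):
    """Draw one configuration uniformly at random from SEARCH_SPACE."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(evaluate, n_trials=20, seed=0):
    """Return (best_score, best_config) after n_trials random draws.

    `evaluate` is any callable mapping a config dict to a validation score
    (higher is better)."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config
```

Random search needs no model of the objective, and its cost is fixed by `n_trials`, which makes it easy to budget and to parallelize across runs.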

Usage

First, clone the repository and install the required pip packages (virtual environment recommended!).

pip install -r requirements.txt

To run a random search:

python main.py -s

To run a benchmark:

python main.py -b

That's it!

This code uses the mpose pip package, a friendly tool to download and process MPOSE2021 pose data.

Citations

AcT is intended for scientific research purposes. If you want to use this repository for your research, please cite our work (Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition) as well as [1-5].

@article{mazzia2021action,
  title={Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition},
  author={Mazzia, Vittorio and Angarano, Simone and Salvetti, Francesco and Angelini, Federico and Chiaberge, Marcello},
  journal={Pattern Recognition},
  pages={108487},
  year={2021},
  publisher={Elsevier}
}

References

[1] Angelini, F., Fu, Z., Long, Y., Shao, L., & Naqvi, S. M. (2019). 2D Pose-Based Real-Time Human Action Recognition With Occlusion-Handling. IEEE Transactions on Multimedia, 22(6), 1433-1446.

[2] Angelini, F., Yan, J., & Naqvi, S. M. (2019, May). Privacy-preserving Online Human Behaviour Anomaly Detection Based on Body Movements and Objects Positions. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8444-8448). IEEE.

[3] Angelini, F., & Naqvi, S. M. (2019, July). Joint RGB-Pose Based Human Action Recognition for Anomaly Detection Applications. In 2019 22th International Conference on Information Fusion (FUSION) (pp. 1-7). IEEE.

[4] Cao, Z., Hidalgo, G., Simon, T., Wei, S. E., & Sheikh, Y. (2019). OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172-186.

[5] Papandreou, G., Zhu, T., Chen, L. C., Gidaris, S., Tompson, J., & Murphy, K. (2018). PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 269-286).

[6] Mazzia, V., Angarano, S., Salvetti, F., Angelini, F., & Chiaberge, M. (2021). Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition. Pattern Recognition, 108487.

Comments

Velocity

Good job @simoneangarano and thank you.

I just wanted to ask: how do you get velocity features from the OpenPose pose estimates? Do you subtract x2-y2 to get the velocity?

Also, do you have an inference.py script or a pretrained model available for testing?

Thank you very much.

opened by sard0r
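One common way to obtain velocity features, shown here only as an assumption and not as a description of what AcT or the mpose package does, is a backward finite difference of each keypoint's position between consecutive frames, v_t = p_t - p_{t-1}, computed separately for x and y. The helper name `with_frame_velocities` and the zero velocity assigned to the first frame are hypothetical choices.

```python
import numpy as np

def with_frame_velocities(pos):
    """Hypothetical helper: append per-keypoint velocities to a pose window.

    pos: array of shape (T, K, 2) holding (x, y) for K keypoints over T frames.
    Returns shape (T, K, 4) with channels (x, y, vx, vy), where
    v[t] = pos[t] - pos[t - 1] and v[0] is set to zero.
    """
    vel = np.zeros_like(pos)
    vel[1:] = pos[1:] - pos[:-1]   # backward difference between consecutive frames
    return np.concatenate([pos, vel], axis=-1)
```

With 17 keypoints this yields 17 × 4 = 68 features per frame once flattened.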
Request for the best trained model

Hi,

Thanks for your work on real-time human action recognition. I am trying to deploy a HAR application on a mobile device.

My question is: could you make the best trained model publicly available?

If so, we would not need to train a model ourselves and could use it directly for inference, which would be very convenient. Thank you.

opened by Yann-Ma
Issue running posenet benchmark

Hi,

I tried running the benchmark for PoseNet, and it gives the following error:

ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 30, 68), found shape=(None, 30, 52)

Please let me know what I can do about this.

Thank you!

opened by saliknadeem
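A quick consistency check on the shapes in this error, under the assumption that each keypoint contributes four features per frame (x, y, and their velocities): 68 = 17 × 4 and 52 = 13 × 4, so the model appears to have been built for 17 keypoints while the loaded data carry 13. This is an inference from the numbers alone, not a confirmed diagnosis, and both helper functions are hypothetical.

```python
def frame_feature_dim(n_keypoints, channels_per_keypoint=4):
    """Width of one flattened frame vector (assumed x, y, vx, vy per keypoint)."""
    return n_keypoints * channels_per_keypoint

def check_input_width(model_dim, data_dim, channels_per_keypoint=4):
    """Raise a descriptive error when model and data disagree on keypoint count."""
    if model_dim != data_dim:
        raise ValueError(
            f"model expects {model_dim / channels_per_keypoint:g} keypoints per frame, "
            f"data provide {data_dim / channels_per_keypoint:g}"
        )
```

If the numbers line up this way, a reasonable first step is to compare the keypoint count used to build the model with the one used to load the PoseNet data.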
Train data augmentation random flip

https://github.com/PIC4SeR/AcT/blob/ebe08f0255eed4286b2ee98c60573c53489dbdc9/utils/data.py#L42

Sorry for my poor English.

My guess: when flipping, shouldn't we also swap the values at even and odd indices? Without this change, there seems to be a risk of overfitting to one side.

opened by JoonHoonKim
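The suggestion can be sketched as follows: mirror the x coordinate and also swap each left/right keypoint pair, so that a mirrored left arm becomes a right arm. The sketch assumes PoseNet's 17-keypoint COCO ordering (nose first, then left/right pairs for eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles), in which left joints sit at odd indices and their right counterparts at the following even indices, matching the even/odd remark above. It also assumes centered x coordinates. The name `random_flip_lr` and its parameters are hypothetical; this is not the code in utils/data.py.

```python
import numpy as np

# Assumed COCO-17 order: index 0 is the nose, then (left, right) pairs at
# indices (1, 2), (3, 4), ..., (15, 16). The permutation swaps each pair.
FLIP_PERM = [0] + [j for i in range(1, 17, 2) for j in (i + 1, i)]

def random_flip_lr(window, rng, p=0.5):
    """With probability p, mirror a (T, 17, 2) pose window horizontally and
    swap left/right keypoints. Assumes x coordinates are centered around 0."""
    if rng.random() >= p:
        return window
    flipped = window[:, FLIP_PERM, :].copy()  # swap left/right joints
    flipped[..., 0] *= -1.0                   # mirror the x coordinate
    return flipped
```

Without the swap, a mirrored sample keeps left-side joint labels on what is now the right side of the body, which is the asymmetry this issue points at.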
Trying to train the network with a sequence length other than 30

Hi,

I am trying to train the model with Frames: 18 instead of 30. I saw that the code also reads the config file mpose/config.yaml, so I changed T: 18 there as well, but in mpose.py (line 122, function load_data) x_train and x_test are still (12562, 30, 17, 3). How can I change this to 18?

opened by sgalita
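Since the loaded arrays have 30 frames per sample (as in the shape above), one hypothetical workaround is to crop shorter sub-windows from them before training. This is not an mpose feature and is not equivalent to regenerating the dataset with T: 18. A minimal NumPy sketch, with a centered crop by default and an optional random offset that can double as augmentation:

```python
import numpy as np

def crop_windows(x, new_t, start=None, rng=None):
    """Hypothetical helper: crop sequences of shape (N, T, ...) to (N, new_t, ...).

    If `start` is None, a random offset is drawn from `rng` when given,
    otherwise a centered crop is used."""
    t = x.shape[1]
    if new_t > t:
        raise ValueError(f"cannot crop {t} frames to {new_t}")
    if start is None:
        if rng is not None:
            start = int(rng.integers(0, t - new_t + 1))  # random valid offset
        else:
            start = (t - new_t) // 2                     # centered crop
    return x[:, start:start + new_t]
```

For a (12562, 30, 17, 3) array, `crop_windows(x_train, 18)` would keep frames 6 to 23 of every sample.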
