[ACM MM 2021] TSA-Net: Tube Self-Attention Network for Action Quality Assessment

ShunliWang

Last update: Dec 23, 2022

Overview

Tube Self-Attention Network (TSA-Net)

This repository contains the PyTorch implementation of the paper TSA-Net: Tube Self-Attention Network for Action Quality Assessment (ACM-MM'21 Oral).

[arXiv] [supp] [slides] [poster] [video]

If this repository is helpful to you, please star it. If you find our work useful in your research, please consider citing:

@inproceedings{TSA-Net,
  title={TSA-Net: Tube Self-Attention Network for Action Quality Assessment},
  author={Wang, Shunli and Yang, Dingkang and Zhai, Peng and Chen, Chixiao and Zhang, Lihua},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021},
  pages={4902--4910},
  numpages={9}
}

User Guide

In this repository, we open-source the code of TSA-Net for the FR-FS dataset. Set up the environment as follows:

# 1.Clone this repository
git clone https://github.com/Shunli-Wang/TSA-Net.git ./TSA-Net
cd ./TSA-Net

# 2.Create and activate conda env
conda create -n TSA-Net python
conda activate TSA-Net
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

# 3.Download the pre-trained model and the FR-FS dataset. All download links are listed below.
# PATH/TO/rgb_i3d_pretrained.pt 
# PATH/TO/FRFS 

# 4.Create data dir (use an absolute PATH/TO/FRFS so the symlink resolves from inside ./data)
mkdir ./data && cd ./data
mv PATH/TO/rgb_i3d_pretrained.pt ./
ln -s PATH/TO/FRFS ./FRFS
cd ..

After initialization, please check the data structure:

.
├── data
│   ├── FRFS -> PATH/TO/FRFS
│   └── rgb_i3d_pretrained.pt
├── dataset.py
├── train.py
├── test.py
...
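
The layout above can also be checked with a short script. This is a minimal sketch and not part of the repository: the function name `missing_entries` and the `EXPECTED` list are illustrative, and the list only covers the entries shown in the tree.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the repository root.
EXPECTED = [
    "data/FRFS",
    "data/rgb_i3d_pretrained.pt",
    "dataset.py",
    "train.py",
    "test.py",
]


def missing_entries(repo_root="."):
    """Return the expected paths that do not exist under repo_root.

    Path.exists() follows symlinks, so a dangling data/FRFS link
    is reported as missing.
    """
    root = Path(repo_root)
    return [entry for entry in EXPECTED if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Run it from the repository root. A `data/FRFS` symlink that was created with a relative target will show up as missing.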

Download links:

FR-FS Dataset: You can download the FR-FS dataset (about 2.5 GB) from BaiduNetDisk [star] or Google Drive.
rgb_i3d_pretrained.pt: Our work uses the I3D backbone pretrained on Kinetics (BaiduNetDisk [i3dm] or Google Drive), which is taken from Gated-Spatio-Temporal-Energy-Graph.
Tracking boxes for AQA-7 & MTL-AQA: Because related work is still ongoing, we are sorry that we cannot share the source code for MTL-AQA and AQA-7. We provide the original tracking boxes for AQA-7 and MTL-AQA at BaiduNetDisk [6v51] or Google Drive.

Training & Evaluation

We provide the training and testing code for TSA-Net and Plain-Net. The two differ only in whether the TSA module is present, which is controlled by the --TSA flag.

# TSA-Net (with the TSA module)
python train.py --gpu 0 --model_path TSA-USDL --TSA
python test.py --gpu 0 --pt_w Exp/TSA-USDL/best.pth --TSA

# Plain-Net (without the TSA module)
python train.py --gpu 0 --model_path USDL
python test.py --gpu 0 --pt_w Exp/USDL/best.pth
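
For readers unfamiliar with boolean command-line switches, the sketch below shows one common way a flag like --TSA can select between the two models. It is an illustration only, not the parser in train.py: only the flag names come from the commands above, while the types, defaults, help strings, and the helper names `build_parser` and `describe` are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser. Flag names mirror the commands above;
    # types, defaults, and help strings are assumptions.
    parser = argparse.ArgumentParser(description="TSA-Net / Plain-Net switch (sketch)")
    parser.add_argument("--gpu", type=str, default="0", help="GPU id to use")
    parser.add_argument("--model_path", type=str, default="USDL", help="experiment name")
    parser.add_argument("--pt_w", type=str, default=None, help="checkpoint path for testing")
    # store_true makes --TSA a switch: present -> True, absent -> False.
    parser.add_argument("--TSA", action="store_true", help="enable the TSA module")
    return parser


def describe(argv):
    """Return which variant the given argument list would select."""
    args = build_parser().parse_args(argv)
    return "TSA-Net" if args.TSA else "Plain-Net"


if __name__ == "__main__":
    print(describe(["--gpu", "0", "--model_path", "TSA-USDL", "--TSA"]))
    print(describe(["--gpu", "0", "--model_path", "USDL"]))
```

With a switch of this kind, the same flag has to be passed at test time as at training time so that the loaded weights match the model being built, as in the command pairs above.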

Acknowledgement

Our code is adapted from MUSDL. We are very grateful for their wonderful implementation. All tracking boxes in our project are generated by SiamMask. We also sincerely thank them for their contributions.

Contact

If you have any questions about our work, please contact [email protected].

Comments

The copyright of Olympic broadcast video

Thanks for the great work.

As far as I know, the Olympic broadcast video is copyrighted. Is it legal to use this dataset in my work?

Could you provide some clarification on copyright?

Looking forward to your reply.

opened by Lycan1003 0

clarification on tube generation

Hi, regarding the tube generation process in get_mask() in evaluator.py: as far as I can see, the actor's bounding boxes are extracted using SiamMask on images of size 640x360 (FRFS) and are then passed to get_mask(). On the other hand, the dataloader in get_imgs() first downscales frames to 455x256 and then takes a 224x224 crop. Looking at get_mask(), I see that lines 21-25 intend to place the actor's bounding box properly on the feature maps (spatially), but I am not certain the boxes will fit spatially, because they did not go through the same downscale and crop process that the frames did. Is this a bug, or am I perhaps misunderstanding something? Thanks

opened by orikorner 1
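
To make the question above concrete, here is a minimal sketch of how a box in 640x360 coordinates could be mapped through the same resize and crop as the frames. It is not code from evaluator.py or the dataloader. The function name `transform_box` and its parameters are hypothetical, and the center-crop default is an assumption, since the comment only says "a crop"; pass `crop_offset` explicitly for any other crop.

```python
def transform_box(box, src_size=(640, 360), resized_size=(455, 256),
                  crop_size=(224, 224), crop_offset=None):
    """Map (x1, y1, x2, y2) from source-frame pixels to cropped-frame pixels.

    Sizes are (width, height). crop_offset is the top-left corner of the
    crop inside the resized frame.
    """
    src_w, src_h = src_size
    res_w, res_h = resized_size
    crop_w, crop_h = crop_size
    if crop_offset is None:
        # Assumption: center crop. A random crop needs its actual offset.
        crop_offset = ((res_w - crop_w) // 2, (res_h - crop_h) // 2)
    off_x, off_y = crop_offset

    # Step 1: scale into the resized frame. Step 2: shift by the crop origin.
    sx, sy = res_w / src_w, res_h / src_h
    x1, y1, x2, y2 = box
    x1, x2 = x1 * sx - off_x, x2 * sx - off_x
    y1, y2 = y1 * sy - off_y, y2 * sy - off_y

    # Step 3: clip to the crop, since part of the box may fall outside it.
    def clip(value, upper):
        return min(max(value, 0.0), float(upper))

    return (clip(x1, crop_w), clip(y1, crop_h), clip(x2, crop_w), clip(y2, crop_h))


if __name__ == "__main__":
    # A box covering the whole source frame fills the whole crop.
    print(transform_box((0, 0, 640, 360)))
```

Comparing the output of such a mapping with the coordinates that get_mask() actually uses would show whether the boxes and the cropped frames line up.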
