[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

Multimedia Computing Group, Nanjing University

Last update: Nov 30, 2022

RTD-Net (ICCV 2021)

This repo holds the code for the paper "Relaxed Transformer Decoders for Direct Action Proposal Generation", accepted at ICCV 2021.

News

[2021.8.17] We release the code, checkpoint, and features for THUMOS14.

Overview

This paper presents a simple, end-to-end learnable framework (RTD-Net) for direct action proposal generation, built by re-purposing a Transformer-like architecture. Thanks to the parallel decoding of multiple proposals with explicit context modeling, RTD-Net outperforms the previous state-of-the-art methods on the temporal action proposal generation task on THUMOS14 and also yields superior performance for action detection on this dataset. In addition, because it needs no NMS post-processing, our detection pipeline is more efficient than previous methods.

Dependencies

Python 3.7 or higher
PyTorch 1.6 or higher
Torchvision
Numpy 1.19.2

Data Preparation

To reproduce the results on THUMOS14 without further changes:

Download the data from GoogleDrive.
Place I3D_features and TEM_scores into the folder data.
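
Before launching training, it can help to confirm that both folders are where the steps above put them. The following is a minimal sketch, not part of the repository: the helper name missing_data_dirs is hypothetical, and it assumes the data folder sits in the directory you run the script from.

```python
from pathlib import Path

# Folder names are taken from the data preparation steps above.
REQUIRED_DIRS = ("I3D_features", "TEM_scores")


def missing_data_dirs(data_root="data"):
    """Return the required sub-folders that are absent under data_root.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(data_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty list means both I3D_features and TEM_scores were found under data.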

Checkpoint

Dataset	AR@50	AR@100	AR@200	AR@500	checkpoint
THUMOS14	41.52	49.33	56.41	62.91	link
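
For readers unfamiliar with the AR@N columns: average recall at N is commonly defined as the fraction of ground-truth segments matched by one of the top N proposals, averaged over several temporal IoU (tIoU) thresholds. The sketch below illustrates that definition for a single video. It is not the evaluation code used for the table above, the function names are hypothetical, and the thresholds are left as a parameter because the exact threshold set is an assumption that depends on the benchmark protocol.

```python
def temporal_iou(a, b):
    """tIoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(proposals, ground_truth, top_n, thresholds):
    """Mean recall over tIoU thresholds using the top_n highest-scoring proposals.

    proposals: list of (start, end, score); ground_truth: list of (start, end).
    Simplified to one video; dataset-level AR averages over all videos.
    """
    kept = sorted(proposals, key=lambda p: p[2], reverse=True)[:top_n]
    recalls = []
    for t in thresholds:
        # A ground-truth segment counts as recalled if any kept proposal overlaps it enough.
        hits = sum(
            any(temporal_iou(g, p[:2]) >= t for p in kept) for g in ground_truth
        )
        recalls.append(hits / len(ground_truth))
    return sum(recalls) / len(recalls)


# Toy example: two proposals, two ground-truth segments.
props = [(0.0, 10.0, 0.9), (20.0, 30.0, 0.8)]
gts = [(0.0, 10.0), (21.0, 30.0)]
print(average_recall(props, gts, top_n=2, thresholds=[0.5, 0.95]))
```

In the toy example the second proposal overlaps its ground truth with tIoU 0.9, so it counts at threshold 0.5 but not at 0.95.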

Training

Use train.sh to train RTD-Net.


# First stage

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11323 --use_env main.py --window_size 100 --batch_size 32 --stage 1 --num_queries 32 --point_prob_normalize

# Second stage for relaxation mechanism

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11324 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-5 --stage 2 --epochs 10 --lr_drop 5 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth

# Third stage for completeness head

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth
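
The three commands above share most of their flags and differ only in the stage number, port, learning-rate settings, and checkpoint loading. The sketch below makes that difference explicit by assembling each command from a shared part and a per-stage part. It is an illustration only: stage_command and the dictionaries are hypothetical names, every flag value is copied from the commands above, and the flag order differs from the originals, which should not matter if main.py parses flags by name.

```python
# Flags common to all three stages, copied from the commands above.
COMMON = ["--window_size", "100", "--batch_size", "32",
          "--num_queries", "32", "--point_prob_normalize"]

BEST = "outputs/checkpoint_best_sum_ar.pth"

# Flags that change from stage to stage.
STAGE_FLAGS = {
    1: [],
    2: ["--lr", "1e-5", "--epochs", "10", "--lr_drop", "5", "--load", BEST],
    3: ["--lr", "1e-4", "--epochs", "20", "--load", BEST],
}

# Each stage uses its own master port in the commands above.
MASTER_PORT = {1: 11323, 2: 11324, 3: 11325}


def stage_command(stage, nproc=2):
    """Return the argument list for one training stage (hypothetical helper)."""
    launcher = ["python", "-m", "torch.distributed.launch",
                f"--nproc_per_node={nproc}",
                f"--master_port={MASTER_PORT[stage]}",
                "--use_env"]
    return launcher + ["main.py", "--stage", str(stage)] + COMMON + STAGE_FLAGS[stage]


if __name__ == "__main__":
    for s in (1, 2, 3):
        print(" ".join(stage_command(s)))
```

Stages 2 and 3 both start from outputs/checkpoint_best_sum_ar.pth, so each stage should finish before the next one begins.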

Testing

Use test.sh to run inference.

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --eval --resume outputs/checkpoint_best_sum_ar.pth

References

We especially thank the contributors of BSN, G-TAD, and DETR for providing helpful code.

Citations

If you find our work helpful, please feel free to cite our paper.

@InProceedings{Tan_2021_RTD,
    author    = {Tan, Jing and Tang, Jiaqi and Wang, Limin and Wu, Gangshan},
    title     = {Relaxed Transformer Decoders for Direct Action Proposal Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {13526-13535}
}

Contact

For any questions, please file an issue or contact:

Jing Tan: [email protected]
Jiaqi Tang: [email protected]

Comments

ANet

Which feature file do you use?

I used the features from anet_i3d_feature_25fps/flow(rgb)-resize-step16 and tested with your checkpoint_best_auc.pth, but these are the results I get.

opened by DoublePan-Oh 10
How can I get a multi-scale proposal

I have another question. In my experiments, the model produces proposals at different positions as the window slides, but the proposals all have basically the same length. Is this caused by the positional embedding settings, or by something else? I added the start and end score convolutions to the overall model and trained them together. @tony2016uestc

opened by ZhongkaiZ 7
about anet code

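Hello, last year I reproduced the RTD code on the THUMOS14 dataset, but when I run the ANet code in the same environment I get an environment-related error, nccl278 error. I just noticed that in util/misc.py of the ANet code, the call torch.distributed.init_process_group( backend=args.dist_backend, init_method=args.dist_url, world_size=args.world_size, rank=args.rank, ) has a comma after the last argument (rank=args.rank,), whereas the THUMOS14 code does not. Is this extra comma a mistake?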

opened by menghuaa 4
about activitynet feature

Can you provide the ActivityNet 1.3 features that are rescaled to 100 via linear interpolation? The link you provided cannot be accessed, and the other link is hard to download from.

opened by menghuaa 4
label issue

I am confused about the labels. In your code, the ground-truth label is 0, which represents an action. Does this mean that a label of 0 output by the network denotes action rather than background? If the predicted label is 0, does it match the ground truth?

opened by DoublePan-Oh 4
Results

I followed the steps on GitHub to train and test, but the results I get differ from those in the paper. Is there something I have overlooked? Looking forward to your reply!

opened by DoublePan-Oh 3
about thumos14 annotation

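Hello, I noticed that in the thumos14_anno__action.json and thumos14_anno_action_class_idx.json files you provide, the duration_second and fps of video_test_0001459 are inconsistent with those in the thumos14_test_groundtruth.csv you provide. Comparing with G-TAD, the values in thumos14_test_groundtruth.csv appear to be the correct ones. Is this an error you overlooked? So far I have only checked this one video, because I found that some videos in THUMOS14 do not have a frame rate of 30 fps, whereas in thumos14_anno__action.json and thumos14_anno_action_class_idx.json you set the frame rate of every video to 30 fps.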

opened by menghuaa 3
mAP evaluation

Hi, thanks for your interesting work! I'm wondering when you will share the evaluation code for calculating the Mean Average Precision (mAP). Thanks so much!

opened by xiaodanhu 3
a problem

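Hello, training the ANet code takes too long on my GPU: the whole pipeline needs about 5 days, which makes trial and error too costly for me. ActivityNet 1.2 is a subset of ActivityNet 1.3. If I train RTD on ActivityNet 1.2, using the features of the ActivityNet 1.2 videos picked out of the ActivityNet 1.3 video features, would that be valid? At present, TSN features exist only for ActivityNet 1.3, not for ActivityNet 1.2. I also compared the annotation files of ActivityNet 1.2 and ActivityNet 1.3: every ActivityNet 1.2 video is present in ActivityNet 1.3, the video durations are identical, and the number of annotations per video matches, except for three videos that have one more annotation in ActivityNet 1.3.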

opened by menghuaa 2
torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0

I got this error when running the code and really don't know what went wrong.

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/1/error.json
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
Could not connect to any X display.
qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
Could not connect to any X display.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 161242) of binary: /home/10601006/apps/anaconda3/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:

opened by menghuaa 2
about RTD in Anet

Hello, I see that your paper combines the proposals generated by RTD on ActivityNet 1.3 with UntrimmedNet to get the detection results. Can you provide the video classification results, or the UntrimmedNet model trained on ActivityNet 1.3?

opened by menghuaa 1
temporal action detection

Could you explain how to perform temporal action detection with the generated proposals? What was modified in the model? It would be great if you could share the code!

opened by DoublePan-Oh 1
