[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

RTD-Net (ICCV 2021)

This repo holds the code for the paper "Relaxed Transformer Decoders for Direct Action Proposal Generation", accepted at ICCV 2021.

News

[2021.8.17] We released the code, checkpoint, and features for THUMOS14.

RTD-Net Overview

This paper presents a simple, end-to-end learnable framework (RTD-Net) for direct action proposal generation by repurposing a Transformer-like architecture. Thanks to the parallel decoding of multiple proposals with explicit context modeling, RTD-Net outperforms previous state-of-the-art methods on the temporal action proposal generation task on THUMOS14 and also yields superior action detection performance on this dataset. In addition, because it needs no NMS post-processing, our detection pipeline is more efficient than previous methods.

Dependencies

Data Preparation

To reproduce the results on THUMOS14 without further changes:

  1. Download the data from Google Drive.

  2. Place I3D_features and TEM_scores into the data folder.
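
Before launching training, it can help to confirm that both folders ended up where the scripts expect them. The following is a minimal sketch, assuming the layout is data/I3D_features and data/TEM_scores relative to the repository root (inferred from the steps above). missing_data_dirs is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path

# Folder names are taken from the steps above; placing them directly
# under "data" is an assumption about the expected layout.
EXPECTED_DIRS = ("I3D_features", "TEM_scores")


def missing_data_dirs(root="data"):
    """Return the expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty list means both folders were found; anything else names the folder that still needs to be copied.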

Checkpoint

| Dataset  | AR@50 | AR@100 | AR@200 | AR@500 | checkpoint |
|----------|-------|--------|--------|--------|------------|
| THUMOS14 | 41.52 | 49.33  | 56.41  | 62.91  | link       |

RTD-Net performance on THUMOS14
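
The AR@N columns report average recall when N proposals are kept. The sketch below illustrates the idea under stated assumptions: recall is averaged over a grid of temporal IoU (tIoU) thresholds, here 0.5 to 0.95 in steps of 0.05, and N is applied as a per-video top-N cut. The official evaluation may use a different threshold grid or define N as an average over videos, so this is not the script that produced the numbers above. tiou and average_recall_at_n are illustrative names.

```python
def tiou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Assumed threshold grid: 0.5, 0.55, ..., 0.95.
DEFAULT_THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]


def average_recall_at_n(proposals, ground_truth, n, thresholds=DEFAULT_THRESHOLDS):
    """Average recall over tIoU thresholds using the top-n proposals per video.

    proposals:    dict video_id -> list of (start, end, score)
    ground_truth: dict video_id -> list of (start, end)
    """
    recalls = []
    for thr in thresholds:
        hit = total = 0
        for vid, gts in ground_truth.items():
            # Keep only the n highest-scoring proposals of this video.
            ranked = sorted(proposals.get(vid, []), key=lambda p: p[2], reverse=True)
            top = ranked[:n]
            for gt in gts:
                total += 1
                # A ground-truth segment is recalled if any kept proposal
                # overlaps it with tIoU at or above the threshold.
                if any(tiou(gt, (s, e)) >= thr for s, e, _ in top):
                    hit += 1
        recalls.append(hit / total if total else 0.0)
    return sum(recalls) / len(recalls)
```

Because no proposals are suppressed or merged, the same function can be applied directly to the scored proposals the model emits for each window.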

Training

Use train.sh to train RTD-Net.


# First stage

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11323 --use_env main.py --window_size 100 --batch_size 32 --stage 1 --num_queries 32 --point_prob_normalize

# Second stage for relaxation mechanism

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11324 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-5 --stage 2 --epochs 10 --lr_drop 5 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth

# Third stage for completeness head

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth
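
The three stages run one after another, and stages two and three each start from outputs/checkpoint_best_sum_ar.pth as written in the commands above. The following is a minimal sketch of a driver that rebuilds those commands and runs them in order, stopping at the first failure. The flags, ports, and checkpoint path are copied from the commands above; build_stage_command and run_all are hypothetical helpers that do not exist in this repository, and the order of the flags differs from train.sh, which is assumed not to matter to the argument parser.

```python
import os
import subprocess

# Checkpoint path used by --load in the second and third stage commands.
BEST_CKPT = "outputs/checkpoint_best_sum_ar.pth"

# Flags shared by all three stages, copied from the commands above.
COMMON = ["--window_size", "100", "--batch_size", "32",
          "--num_queries", "32", "--point_prob_normalize"]

# Per-stage master port and extra flags, copied from the commands above.
STAGES = {
    1: ("11323", []),
    2: ("11324", ["--lr", "1e-5", "--epochs", "10", "--lr_drop", "5",
                  "--load", BEST_CKPT]),
    3: ("11325", ["--lr", "1e-4", "--epochs", "20", "--load", BEST_CKPT]),
}


def build_stage_command(stage, nproc=2):
    """Return the argument list for one training stage."""
    port, extra = STAGES[stage]
    return ["python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={nproc}", f"--master_port={port}", "--use_env",
            "main.py", *COMMON, "--stage", str(stage), *extra]


def run_all(gpus="0,1", dry_run=True):
    """Print each stage command and, unless dry_run is set, execute it."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    for stage in (1, 2, 3):
        cmd = build_stage_command(stage)
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence if a stage exits with an error.
            subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With dry_run=True the script only prints the commands, which makes it easy to compare them against train.sh before committing GPU time.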

Testing

Run inference with test.sh.

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --eval --resume outputs/checkpoint_best_sum_ar.pth

References

We especially thank the contributors of BSN, G-TAD, and DETR for providing helpful code.

Citations

If you find our work helpful, please cite our paper.

@InProceedings{Tan_2021_RTD,
    author    = {Tan, Jing and Tang, Jiaqi and Wang, Limin and Wu, Gangshan},
    title     = {Relaxed Transformer Decoders for Direct Action Proposal Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {13526-13535}
}

Contact

For any questions, please file an issue or contact:

Jing Tan: [email protected]
Jiaqi Tang: [email protected]
Comments
  • ANet

    Which feature file do you use? [image]

    I used the anet_i3d_feature_25fps/flow(rgb)-resize-step16 features and tested with your checkpoint_best_auc.pth, but these are the results I get. [image]

    opened by DoublePan-Oh 10
  • How can I get a multi-scale proposal

    Another question. In my experiments, the model generates proposals at different positions as the window slides, but the proposals all have basically the same length. Is this caused by the positional embedding setup, or by something else? I added the start and end score convolutions to the overall model and trained them jointly. @tony2016uestc

    opened by ZhongkaiZ 7
  • about anet code

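    Hello, last year I reproduced the RTD code on the THUMOS14 dataset, but when I run the ANet code in the same environment, I get an environment-related error (nccl278 error). I just noticed that in util/misc.py of the ANet code, the call torch.distributed.init_process_group( backend=args.dist_backend, init_method=args.dist_url, world_size=args.world_size, rank=args.rank, ) has a comma after the last argument (rank=args.rank,), whereas the THUMOS14 code does not. Is this extra comma a mistake?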

    opened by menghuaa 4
  • about activitynet feature

    Can you provide the ActivityNet 1.3 features rescaled to 100 via linear interpolation? The link you provided cannot be accessed, and the other link is hard to download from.

    opened by menghuaa 4
  • label issue

    I am confused about the labels. In your code, the ground-truth label is 0, which represents an action. Does that mean the label output by the network is 0 for action, rather than 0 for background? If the predicted label is 0, does it match the ground truth?

    opened by DoublePan-Oh 4
  • Results

    I followed the steps on GitHub to train and test, but the results I get differ from those in the paper. Is there something I have missed? [image] Looking forward to your reply!

    opened by DoublePan-Oh 3
  • about thumos14 annotation

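    Hello, I noticed that in the thumos14_anno__action.json and thumos14_anno_action_class_idx.json files you provide, the duration_second and fps of video_test_0001459 are inconsistent with those in the thumos14_test_groundtruth.csv you provide. Comparing with G-TAD, the values in thumos14_test_groundtruth.csv appear to be the correct ones. Is this an error you overlooked? So far I have only checked this one video, because I found that some THUMOS14 videos do not have a frame rate of 30 fps, whereas in thumos14_anno__action.json and thumos14_anno_action_class_idx.json you set the frame rate of every video to 30 fps.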

    opened by menghuaa 3
  • mAP evaluation

    Hi, thanks for your interesting work! I'm wondering when you will share the evaluation code for calculating the mean Average Precision (mAP). Thanks so much!

    opened by xiaodanhu 3
  • a problem

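    Hello, training the ANet code on my GPU takes too long: the whole pipeline needs about 5 days, which makes trial and error too costly for me. ActivityNet 1.2 is a subset of ActivityNet 1.3, so if I train RTD on ActivityNet 1.2 by extracting, from the ActivityNet 1.3 video features, the features of the videos that belong to ActivityNet 1.2, would that be acceptable? At present the TSN features are only available for ActivityNet 1.3, not for ActivityNet 1.2. I also compared the annotation files of ActivityNet 1.2 and ActivityNet 1.3: every ActivityNet 1.2 video exists in ActivityNet 1.3 with the same duration and the same number of annotations, except for three videos that have one more annotation in ActivityNet 1.3.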

    opened by menghuaa 2
  • torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0

    I got this error when running the code, and I really don't know what went wrong:

    INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
    INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/0/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/1/error.json
    [W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
    QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
    qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
    Could not connect to any X display.
    qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
    Could not connect to any X display.
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 161242) of binary: /home/10601006/apps/anaconda3/bin/python
    ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
    INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
    INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:

    opened by menghuaa 2
  • about RTD in Anet

    Hello, I see that your paper combines the proposals generated by RTD on ActivityNet 1.3 with UntrimmedNet to get the detection results. Can you provide the video classification results, or the UntrimmedNet model trained on ActivityNet 1.3?

    opened by menghuaa 1
  • temporal action detection

    Could you explain how to perform temporal action detection with the generated proposals? What was modified in the model? It would be great if you could share the code!

    opened by DoublePan-Oh 1
Owner
Multimedia Computing Group, Nanjing University