OadTR

Code for our ICCV 2021 paper "OadTR: Online Action Detection with Transformers". [Paper]

Update

  • July 28, 2021: Our paper "OadTR: Online Action Detection with Transformers" was accepted by ICCV 2021. At the same time, we released the THUMOS14-Kinetics features.

Dependencies

  • pytorch==1.6.0
  • json
  • numpy
  • tensorboard-logger
  • torchvision==0.7.0
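
A possible way to set up the environment (assuming pip; the PyPI package for PyTorch is torch, and json ships with Python's standard library, so it needs no install):

pip install torch==1.6.0 torchvision==0.7.0 numpy tensorboard-logger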

Prepare

  • Unzip the annotation file "./data/anno_thumos.zip"
  • Download the THUMOS14-Anet features (note: the HDD and TVSeries features are only available by contacting the dataset authors and signing agreements, due to copyright; you can use this Repo to extract features). An assumed directory layout is sketched below.
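
The paths referenced in the issues further down suggest a layout like the following (inferred, not confirmed; the flow filename in particular is an assumption):

    data/
      anno_thumos/                              # contents of anno_thumos.zip
      THUMOS14/Feature/Anet2016_feature_v2/
        video_validation_0000690_rgb.npy
        video_validation_0000690_flow.npy       # assumed naming
        ...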

Training

python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --output_dir models/en_3_decoder_5_lr_drop_1

Validation

python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --output_dir models/en_3_decoder_5_lr_drop_1 --eval --resume models/en_3_decoder_5_lr_drop_1/checkpoint000{}.pth

Citing OadTR

Please cite our paper in your publications if it helps your research:

@article{wang2021oadtr,
  title={OadTR: Online Action Detection with Transformers},
  author={Wang, Xiang and Zhang, Shiwei and Qing, Zhiwu and Shao, Yuanjie and Zuo, Zhengrong and Gao, Changxin and Sang, Nong},
  journal={arXiv preprint arXiv:2106.11149},
  year={2021}
}
Comments
  •  HDD camera dataset

    Hi Team, I would like to use both camera and sensor data with the transformer. Could you please suggest how to feed both camera and sensor data into the transformer? I would also like to compare against TRN's HDD metric (mAP).

    Thanks a lot

    opened by pgupta119 8
  • Question about configuration (num_heads, enc_layers)

    Dear authors.

    First, thank you for sharing your nice work! It helps me a lot in solving my task.

    By the way, your paper states that the best performance is achieved with 5 encoder layers and 4 heads, but in your code they are set to 64 and 8, respectively. Could you explain why?

    opened by tghim 5
  • HDD data segmentation

    Hi! I noticed that your code differs somewhat from TRN's in how it segments the HDD data; this may be because frames are classified differently in the two models. TRN:

    # TRN: non-overlapping windows, stride = self.enc_steps
    for start, end in zip(
        range(seed, target.shape[0] - self.dec_steps, self.enc_steps),
        range(seed + self.enc_steps, target.shape[0] - self.dec_steps, self.enc_steps)):
        enc_target = target[start:end]
        dec_target = self.get_dec_target(target[start:end + self.dec_steps])
        self.inputs.append([
            session, start, end, sensor[start:end],
            enc_target, dec_target,
            ])
    

    Yours:

    # OadTR: densely overlapping windows, stride = 1;
    # windows whose dominant class index is 21 are skipped
    for start, end in zip(
            range(seed, target.shape[0], 1),  # self.enc_steps
            range(seed + self.enc_steps, target.shape[0] - self.dec_steps, 1)):
        enc_target = target[start:end]
        # dec_target = self.get_dec_target(target[start:end + self.dec_steps])
        dec_target = target[end:end + self.dec_steps]
        distance_target, class_h_target = self.get_distance_target(target[start:end])
        if class_h_target.argmax() != 21:
            self.inputs.append([
                session, start, end, enc_target, distance_target, class_h_target, dec_target
            ])
    

    Will the differences in data segmentation affect the model?
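
    For intuition, here is a small standalone sketch (illustrative only, not code from either repo) comparing how many training windows each strategy produces for the same sequence length:

    # Illustrative comparison of the two windowing strategies.
    T, seed, enc_steps, dec_steps = 1000, 0, 64, 8

    # TRN-style: non-overlapping windows (stride = enc_steps)
    trn = list(zip(range(seed, T - dec_steps, enc_steps),
                   range(seed + enc_steps, T - dec_steps, enc_steps)))

    # OadTR-style: densely overlapping windows (stride = 1)
    oadtr = list(zip(range(seed, T, 1),
                     range(seed + enc_steps, T - dec_steps, 1)))

    print(len(trn), len(oadtr))  # 15 vs. 928: far more (overlapping) samples

    Beyond the filtering of class-21 windows, the stride-1 segmentation yields a much denser, overlapping training distribution, which could plausibly affect what the model sees per epoch.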

    opened by Doha7430 3
  • THUMOS14-Kinetics feature FPS.

    The THUMOS14-Kinetics features are extracted at 4 FPS in IDN. Is your experimental configuration consistent with IDN? In addition, can you provide more details about the feature extraction method?

    opened by sqiangcao99 3
  • Regarding the dimensionality of the input rgb and flow features

    Dear authors, I want to use the repo you referenced to generate features for my own dataset, which is how I found this repo. When I generated RGB and flow features separately following that repo's method, I found that the RGB feature of each frame is a vector of length 2048, while the flow feature is a vector of length 1024. However, both the RGB and flow features of each frame in the THUMOS14 features you released are 2048-dimensional. I would like to hear how you handled this.

    Some additional details: the output location of the RGB feature in the "resnet200_anet_2016" model is "caffe.Flatten_673". The output location of the flow feature in the "bn_inception_anet_2016_temporal" model is "global_pool".
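
    A quick way to check what was actually extracted (hypothetical paths, following the data/THUMOS14/Feature/Anet2016_feature_v2 layout mentioned in another issue; the flow filename is assumed):

    import numpy as np

    # Hypothetical paths; point these at your own extracted features.
    rgb = np.load('data/THUMOS14/Feature/Anet2016_feature_v2/video_validation_0000690_rgb.npy')
    flow = np.load('data/THUMOS14/Feature/Anet2016_feature_v2/video_validation_0000690_flow.npy')

    # The released THUMOS14 features are reportedly 2048-d for both streams, while
    # caffe.Flatten_673 gives 2048-d RGB and global_pool gives 1024-d flow.
    print(rgb.shape, flow.shape)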

    opened by Aierhaimian 2
  • fps of the original video

    Hi, thank you for sharing the dataset, but I ran into a problem when using it. As you said in a previous issue, the fps of the original video is 24, so the actual fps of the features should be 4, but when I divided the number of feature frames by the video duration in seconds I found that the fps of the features is 5. Is there a mistake somewhere?
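
    The arithmetic in question, as a standalone check (the numbers here are placeholders, not measured values):

    # If features were taken every 6th frame of a 24 fps video, feature fps = 24 / 6 = 4.
    num_feature_frames = 1500    # rows in one video's feature .npy (placeholder)
    duration_seconds = 300.0     # that video's length in seconds (placeholder)

    feature_fps = num_feature_frames / duration_seconds
    print(feature_fps)  # the commenter measures ~5 instead of the expected 4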

    opened by Echo0125 1
  • Test settings of OadTR.

    Hi, thanks for your amazing work. When testing the performance of OadTR, I found that you seem to skip the first few frames of each video. This may be unfair compared to other methods.

    opened by sqiangcao99 1
  • Not Getting all classes

    Hi Team, I am using the Honda HDD dataset, which has 12 classes: ["background", "intersection passing", "left turn", "right turn", "left lane change", "right lane change", "left lane branch", "right lane branch", "crosswalk passing", "railroad passing", "merge", "U-turn"]

    When I train and look at the results, the mAP/AP table is missing the last label (class):

    background: nan
    intersection passing: nan
    left turn: nan
    right turn: nan
    left lane change: nan
    right lane change: nan
    left lane branch: nan
    right lane branch: nan
    crosswalk passing: nan
    railroad passing: nan
    merge: nan

    Why doesn't it show the U-turn class?
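
    One possible explanation (an assumption, not a confirmed diagnosis): per-class AP is undefined when a class never appears as a positive in the evaluated split, and a class vanishes from the table entirely if the reporting loop stops before its index. A minimal illustration with scikit-learn:

    import numpy as np
    from sklearn.metrics import average_precision_score

    class_names = ["background", "intersection passing", "left turn", "right turn",
                   "left lane change", "right lane change", "left lane branch",
                   "right lane branch", "crosswalk passing", "railroad passing",
                   "merge", "U-turn"]

    labels = np.random.randint(0, 11, size=200)      # U-turn (index 11) never occurs
    scores = np.random.rand(200, len(class_names))   # dummy prediction scores

    for i, name in enumerate(class_names):           # iterate over ALL 12 classes
        y_true = (labels == i).astype(int)
        # AP is ill-defined when a class has no positive samples
        ap = average_precision_score(y_true, scores[:, i]) if y_true.any() else float('nan')
        print(f'{name}: {ap:.3f}')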

    opened by pgupta119 1
  • video_validation_0000690_rgb.npy

    Hi, I am getting this error while running train.py: FileNotFoundError: [Errno 2] No such file or directory: 'data/THUMOS14/Feature/Anet2016_feature_v2/video_validation_0000690_rgb.npy'

    Could you help me with fixing that?

    opened by elhamravanbakhsh 1
  • Runtime error on training

    I am getting the following error when I run the OadTR training command.

    python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --output_dir models/en_3_decoder_5_lr_drop_1
    
    RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your
    module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the 
    keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` 
    function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel 
    module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss 
    function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
    

    Could you let me know if I am missing something while running this command?
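
    If some parameters genuinely do not contribute to the loss on every forward pass, the workaround the error message itself suggests is to enable unused-parameter detection when wrapping the model (a generic sketch, not OadTR's actual training setup; model and local_rank are placeholders from the surrounding script):

    from torch.nn.parallel import DistributedDataParallel as DDP

    # model and local_rank come from the surrounding training script.
    model = DDP(model,
                device_ids=[local_rank],
                find_unused_parameters=True)  # tolerate params that skip the loss

    This adds some overhead per iteration, so the cleaner fix is to make sure every output of forward participates in the loss.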

    opened by nishanthrachakonda 1