OadTR

Code for our ICCV 2021 paper "OadTR: Online Action Detection with Transformers". [Paper]

Update

  • July 28, 2021: Our paper "OadTR: Online Action Detection with Transformers" was accepted by ICCV 2021. At the same time, we released the THUMOS14-Kinetics features.

Dependencies

  • pytorch==1.6.0
  • json
  • numpy
  • tensorboard-logger
  • torchvision==0.7.0
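
A possible way to set up the environment (assuming pip; the PyPI package for PyTorch is torch, and json ships with Python's standard library, so it needs no install):

pip install torch==1.6.0 torchvision==0.7.0 numpy tensorboard-logger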

Prepare

  • Unzip the annotation file "./data/anno_thumos.zip"
  • Download the THUMOS14-Anet features (note: the HDD and TVSeries features are only available by contacting the dataset authors and signing agreements, due to copyright; you can use this Repo to extract features). An assumed directory layout is sketched below.
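
The paths referenced in the issues further down suggest a layout like the following (inferred, not confirmed; the flow filename in particular is an assumption):

    data/
      anno_thumos/                              # contents of anno_thumos.zip
      THUMOS14/Feature/Anet2016_feature_v2/
        video_validation_0000690_rgb.npy
        video_validation_0000690_flow.npy       # assumed naming
        ...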

Training

python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --output_dir models/en_3_decoder_5_lr_drop_1

Validation

python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --output_dir models/en_3_decoder_5_lr_drop_1 --eval --resume models/en_3_decoder_5_lr_drop_1/checkpoint000{}.pth

Citing OadTR

Please cite our paper in your publications if it helps your research:

@article{wang2021oadtr,
  title={OadTR: Online Action Detection with Transformers},
  author={Wang, Xiang and Zhang, Shiwei and Qing, Zhiwu and Shao, Yuanjie and Zuo, Zhengrong and Gao, Changxin and Sang, Nong},
  journal={arXiv preprint arXiv:2106.11149},
  year={2021}
}
Comments
  •  HDD camera dataset

    Hi Team, I would like to use both camera and sensor data with the transformer. Could you please suggest how to feed both camera and sensor data into the transformer? I would also like to compare against TRN's HDD metric (mAP).

    Thanks a lot

    opened by pgupta119 8
  • Question about configuration (num_heads, enc_layers)

    Dear authors.

    First, thank you for sharing your nice work! It helps me a lot in solving my task.

    By the way, your paper states that the best performance is achieved with 5 encoder layers and 4 heads, but in your code they are set to 64 and 8, respectively. Could you explain why?

    opened by tghim 5
  • HDD data segmentation

    Hi! I noticed that your code differs somewhat from TRN's in how it segments the HDD data; this may be because frames are classified differently in the two models. TRN:

    # TRN: non-overlapping windows, stride = self.enc_steps
    for start, end in zip(
        range(seed, target.shape[0] - self.dec_steps, self.enc_steps),
        range(seed + self.enc_steps, target.shape[0] - self.dec_steps, self.enc_steps)):
        enc_target = target[start:end]
        dec_target = self.get_dec_target(target[start:end + self.dec_steps])
        self.inputs.append([
            session, start, end, sensor[start:end],
            enc_target, dec_target,
            ])
    

    Yours:

    # OadTR: densely overlapping windows, stride = 1;
    # windows whose dominant class index is 21 are skipped
    for start, end in zip(
            range(seed, target.shape[0], 1),  # self.enc_steps
            range(seed + self.enc_steps, target.shape[0] - self.dec_steps, 1)):
        enc_target = target[start:end]
        # dec_target = self.get_dec_target(target[start:end + self.dec_steps])
        dec_target = target[end:end + self.dec_steps]
        distance_target, class_h_target = self.get_distance_target(target[start:end])
        if class_h_target.argmax() != 21:
            self.inputs.append([
                session, start, end, enc_target, distance_target, class_h_target, dec_target
            ])
    

    Will the differences in data segmentation affect the model?
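
    For intuition, here is a small standalone sketch (illustrative only, not code from either repo) comparing how many training windows each strategy produces for the same sequence length:

    # Illustrative comparison of the two windowing strategies.
    T, seed, enc_steps, dec_steps = 1000, 0, 64, 8

    # TRN-style: non-overlapping windows (stride = enc_steps)
    trn = list(zip(range(seed, T - dec_steps, enc_steps),
                   range(seed + enc_steps, T - dec_steps, enc_steps)))

    # OadTR-style: densely overlapping windows (stride = 1)
    oadtr = list(zip(range(seed, T, 1),
                     range(seed + enc_steps, T - dec_steps, 1)))

    print(len(trn), len(oadtr))  # 15 vs. 928: far more (overlapping) samples

    Beyond the filtering of class-21 windows, the stride-1 segmentation yields a much denser, overlapping training distribution, which could plausibly affect what the model sees per epoch.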

    opened by Doha7430 3
  • THUMOS14-Kinetics feature FPS.

    The THUMOS14-Kinetics features are extracted at 4 FPS in IDN. Is your experimental configuration consistent with IDN? In addition, can you provide more details about the feature extraction method?

    opened by sqiangcao99 3
  • Regarding the dimensionality of the input rgb and flow features

    Dear authors, I want to use the repo you referenced to generate features for my own dataset, which is how I found this repo. When I generated RGB and flow features separately following that repo's method, I found that the RGB feature of each frame is a vector of length 2048, while the flow feature is a vector of length 1024. However, both the RGB and flow features of each frame in the THUMOS14 features you released are 2048-dimensional. I would like to hear how you handled this.

    Some additional details: the output location of the RGB feature in the "resnet200_anet_2016" model is "caffe.Flatten_673". The output location of the flow feature in the "bn_inception_anet_2016_temporal" model is "global_pool".
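
    A quick way to check what was actually extracted (hypothetical paths, following the data/THUMOS14/Feature/Anet2016_feature_v2 layout mentioned in another issue; the flow filename is assumed):

    import numpy as np

    # Hypothetical paths; point these at your own extracted features.
    rgb = np.load('data/THUMOS14/Feature/Anet2016_feature_v2/video_validation_0000690_rgb.npy')
    flow = np.load('data/THUMOS14/Feature/Anet2016_feature_v2/video_validation_0000690_flow.npy')

    # The released THUMOS14 features are reportedly 2048-d for both streams, while
    # caffe.Flatten_673 gives 2048-d RGB and global_pool gives 1024-d flow.
    print(rgb.shape, flow.shape)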

    opened by Aierhaimian 2
  • fps of the original video

    Hi, thank you for sharing the dataset, but I ran into a problem when using it. As you said in a previous issue, the fps of the original video is 24, so the actual fps of the features should be 4, but when I divided the number of feature frames by the video duration in seconds I found that the fps of the features is 5. Is there a mistake somewhere?
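
    The arithmetic in question, as a standalone check (the numbers here are placeholders, not measured values):

    # If features were taken every 6th frame of a 24 fps video, feature fps = 24 / 6 = 4.
    num_feature_frames = 1500    # rows in one video's feature .npy (placeholder)
    duration_seconds = 300.0     # that video's length in seconds (placeholder)

    feature_fps = num_feature_frames / duration_seconds
    print(feature_fps)  # the commenter measures ~5 instead of the expected 4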

    opened by Echo0125 1
  • Test settings of OadTR.

    Hi, thanks for your amazing work. When testing the performance of OadTR, I found that you seem to skip the first few frames of each video. This may be unfair compared to other methods.

    opened by sqiangcao99 1
  • Not Getting all classes

    Hi Team, I am using the Honda HDD dataset, which has 12 classes: ["background", "intersection passing", "left turn", "right turn", "left lane change", "right lane change", "left lane branch", "right lane branch", "crosswalk passing", "railroad passing", "merge", "U-turn"]

    When I train and look at the results, the mAP/AP table is missing the last label (class):

    background: nan
    intersection passing: nan
    left turn: nan
    right turn: nan
    left lane change: nan
    right lane change: nan
    left lane branch: nan
    right lane branch: nan
    crosswalk passing: nan
    railroad passing: nan
    merge: nan

    Why doesn't it show the U-turn class?
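
    One possible explanation (an assumption, not a confirmed diagnosis): per-class AP is undefined when a class never appears as a positive in the evaluated split, and a class vanishes from the table entirely if the reporting loop stops before its index. A minimal illustration with scikit-learn:

    import numpy as np
    from sklearn.metrics import average_precision_score

    class_names = ["background", "intersection passing", "left turn", "right turn",
                   "left lane change", "right lane change", "left lane branch",
                   "right lane branch", "crosswalk passing", "railroad passing",
                   "merge", "U-turn"]

    labels = np.random.randint(0, 11, size=200)      # U-turn (index 11) never occurs
    scores = np.random.rand(200, len(class_names))   # dummy prediction scores

    for i, name in enumerate(class_names):           # iterate over ALL 12 classes
        y_true = (labels == i).astype(int)
        # AP is ill-defined when a class has no positive samples
        ap = average_precision_score(y_true, scores[:, i]) if y_true.any() else float('nan')
        print(f'{name}: {ap:.3f}')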

    opened by pgupta119 1
  • video_validation_0000690_rgb.npy

    Hi, I am getting this error while running train.py: FileNotFoundError: [Errno 2] No such file or directory: 'data/THUMOS14/Feature/Anet2016_feature_v2/video_validation_0000690_rgb.npy'

    Could you help me with fixing that?

    opened by elhamravanbakhsh 1
  • Runtime error on training

    I am getting the following error when I run the OadTR training command.

    python main.py --num_layers 3 --decoder_layers 5 --enc_layers 64 --output_dir models/en_3_decoder_5_lr_drop_1
    
    RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your
    module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the 
    keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` 
    function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel 
    module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss 
    function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
    

    Could you let me know if I am missing something while running this command?
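
    If some parameters genuinely do not contribute to the loss on every forward pass, the workaround the error message itself suggests is to enable unused-parameter detection when wrapping the model (a generic sketch, not OadTR's actual training setup; model and local_rank are placeholders from the surrounding script):

    from torch.nn.parallel import DistributedDataParallel as DDP

    # model and local_rank come from the surrounding training script.
    model = DDP(model,
                device_ids=[local_rank],
                find_unused_parameters=True)  # tolerate params that skip the loss

    This adds some overhead per iteration, so the cleaner fix is to make sure every output of forward participates in the loss.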

    opened by nishanthrachakonda 1