Official PyTorch Implementation of paper EAN: Event Adaptive Network for Enhanced Action Recognition

Overview

EAN: Event Adaptive Network


PyTorch Implementation of paper:

EAN: Event Adaptive Network for Enhanced Action Recognition

Yuan Tian, Yichao Yan, Xiongkuo Min, Guo Lu, Guangtao Zhai, Guodong Guo, and Zhiyong Gao

[ArXiv]

Main Contribution

Efficiently modeling spatial-temporal information in videos is crucial for action recognition. In this paper, we propose a unified action recognition framework that investigates the dynamic nature of video content with two key designs. First, when extracting local cues, we generate spatial-temporal kernels of dynamic scale to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we mine the interactions among only a few selected foreground objects with a Transformer, which yields a sparse paradigm. We call the proposed framework the Event Adaptive Network (EAN) because both key designs are adaptive to the input video content. To exploit short-term motions within local segments, we also propose a novel and efficient Latent Motion Code (LMC) module, which further improves the performance of the framework.
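
To make one of these designs concrete, the sketch below illustrates the general idea behind a latent-space motion feature: per-frame features are projected into a compact latent space and temporally differenced. This is a minimal sketch under our own assumptions (module name, shapes, and layer choices are illustrative), not the authors' exact LMC implementation.

# Illustrative sketch only: a latent-space motion feature in the spirit of
# the LMC module. Names, shapes, and layer choices are assumptions for
# exposition, not the authors' exact implementation.
import torch
import torch.nn as nn

class LatentMotionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Project per-frame features into a low-dimensional latent space.
        self.embed = nn.Conv2d(channels, channels // 4, kernel_size=1)
        # Map the motion code back to the input width for a residual fusion.
        self.fuse = nn.Conv2d(channels // 4, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        z = self.embed(x.flatten(0, 1)).view(b, t, -1, h, w)
        # Motion as temporal differences of adjacent latent features.
        motion = z[:, 1:] - z[:, :-1]
        # Repeat the last step so the temporal length matches the input.
        motion = torch.cat([motion, motion[:, -1:]], dim=1)
        return x + self.fuse(motion.flatten(0, 1)).view(b, t, c, h, w)

# e.g. LatentMotionSketch(256)(torch.randn(2, 8, 256, 14, 14))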

Content

  • Dependencies
  • Data Preparation
  • Pretrained Models
  • Testing
  • Training
  • Other Info

Dependencies

Please make sure the following libraries are installed:

Data Preparation

Following common practice, we first extract videos into frames for fast data loading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on the Something-Something-V1 and V2, Kinetics, and Diving48 datasets with this codebase. The processing of video data can be summarized in three steps:

  1. Extract frames from videos:

  2. Generate the file lists needed by the dataloader (a minimal scripting sketch is given after this list):

    • Each line of the list file contains a tuple of (extracted video frame folder name, number of video frames, video ground-truth class). A list file looks like this:

      video_frame_folder 100 10
      video_2_frame_folder 150 31
      ...
      
    • Or you can use the off-the-shelf tools provided in this repo: data_process/gen_label_xxx.py

  3. Edit dataset config information in datasets_video.py
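
If you prefer to script step 2 yourself, the following minimal sketch writes the list format shown above. The directory layout and the labels.txt mapping ("folder label" per line) are assumptions for illustration; the gen_label_xxx.py tools in data_process/ remain the recommended route.

# Hypothetical helper: writes one "<frame_folder> <num_frames> <label>" line
# per video. The labels.txt format ("folder label" per line) is an assumed
# input, not a file shipped with this repo.
import os

def write_file_list(frames_root, label_file, out_file):
    with open(label_file) as f:
        labels = dict(line.split() for line in f if line.strip())
    with open(out_file, "w") as out:
        for folder in sorted(os.listdir(frames_root)):
            frame_dir = os.path.join(frames_root, folder)
            if os.path.isdir(frame_dir) and folder in labels:
                num_frames = len(os.listdir(frame_dir))
                out.write(f"{folder} {num_frames} {labels[folder]}\n")

# e.g. write_file_list("sthv1_frames", "labels.txt", "train_videofolder.txt")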

Pretrained Models

Here, we provide pretrained EAN models on the Something-Something-V1 dataset. Recognizing actions in this dataset requires strong temporal modeling ability, and EAN achieves state-of-the-art performance on it. Notably, our method even surpasses optical-flow-based methods while using only RGB frames as input.

Something-Something-V1

Model                     Backbone   FLOPs  Val Top-1  Val Top-5  Checkpoints
EAN 8F (RGB+LMC)          ResNet-50  37G    53.4       81.1       [Jianguo Cloud]
EAN 16F (RGB+LMC)         ResNet-50  74G    54.7       82.3       [Jianguo Cloud]
EAN 16+8F (RGB+LMC)       ResNet-50  111G   57.2       83.9       [Jianguo Cloud]
EAN 2×(16+8)F (RGB+LMC)   ResNet-50  222G   57.5       84.3       [Jianguo Cloud]

Testing

For example, to test the EAN models on Something-Something-V1, first put the downloaded .pth.tar files into the "pretrained" folder and then run:

# test EAN model with an 8-frame clip
bash scripts/test/sthv1/RGB_LMC_8F.sh

# test EAN model with a 16-frame clip
bash scripts/test/sthv1/RGB_LMC_16F.sh
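
Before launching the test scripts, you can sanity-check a downloaded checkpoint with a few lines of PyTorch. This is a generic sketch: the file name below is hypothetical, and the 'state_dict' key is an assumption that may differ in the released files.

# Peek inside a downloaded checkpoint (file name is hypothetical).
import torch

ckpt = torch.load("pretrained/EAN_sthv1_8F.pth.tar", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # assumes a conventional archive layout
print(len(state), "tensors; first few:")
for name, tensor in list(state.items())[:5]:
    print(" ", name, tuple(tensor.shape))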

Training

We provide several scripts to train EAN with this repo; please refer to the "scripts" folder for more details. For example, to train EAN on Something-Something-V1, you can run:

# train EAN model with an 8-frame clip
bash scripts/train/sthv1/RGB_LMC_8F.sh

Note that you should scale the learning rate linearly with the batch size: for example, with a batch size of 32, set the learning rate to 0.005.
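
Written out, the linear scaling rule implied here (with the base values 0.005 and 32 taken from the sentence above, and a helper name of our own) is:

# Linear learning-rate scaling; base values come from the README text.
def scaled_lr(batch_size, base_lr=0.005, base_batch=32):
    return base_lr * batch_size / base_batch

# scaled_lr(64) -> 0.01, scaled_lr(16) -> 0.0025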

Other Info

References

This repository is built upon the following baseline implementations for the action recognition task.

Citation

Please [★star] this repo and [cite] the following arXiv paper if you find our EAN useful for your research:

@misc{tian2021ean,
      title={EAN: Event Adaptive Network for Enhanced Action Recognition}, 
      author={Yuan Tian and Yichao Yan and Xiongkuo Min and Guo Lu and Guangtao Zhai and Guodong Guo and Zhiyong Gao},
      year={2021},
      eprint={2107.10771},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any questions, please feel free to open an issue or contact:

Yuan Tian: [email protected]
