EAN: Event Adaptive Network
PyTorch Implementation of paper:
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian, Yichao Yan, Xiongkuo Min, Guo Lu, Guangtao Zhai, Guodong Guo, and Zhiyong Gao
[ArXiv]
Main Contribution
Efficiently modeling spatial-temporal information in videos is crucial for action recognition. In this paper, we propose a unified action recognition framework that investigates the dynamic nature of video content with the following designs. First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit the diverse events. Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects with a Transformer, which yields a sparse paradigm. We call the proposed framework the Event Adaptive Network (EAN) because both key designs are adaptive to the input video content. To exploit the short-term motions within local segments, we propose a novel and efficient Latent Motion Code (LMC) module, further improving the performance of the framework.
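The following is an illustrative sketch only, not the authors' implementation: it mimics the sparse aggregation idea above by keeping a few "foreground" tokens (selected here by feature norm, which is an assumption) and letting a standard Transformer encoder model their interactions. All shapes, the selection rule, and hyper-parameters are placeholders.

```python
import torch
import torch.nn as nn

class SparseTransformerAggregator(nn.Module):
    """Toy stand-in for the sparse Transformer aggregation described above."""
    def __init__(self, dim=256, num_heads=4, num_layers=2, k=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.k = k  # number of "foreground" tokens kept (assumed value)

    def forward(self, tokens):                        # tokens: (B, N, dim) local cues
        scores = tokens.norm(dim=-1)                  # crude foreground score (assumption)
        idx = scores.topk(self.k, dim=1).indices      # keep the k highest-scoring tokens
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        selected = torch.gather(tokens, 1, idx)       # (B, k, dim)
        out = self.encoder(selected.transpose(0, 1))  # (k, B, dim), sequence-first layout
        return out.mean(dim=0)                        # global video representation (B, dim)

# Example: SparseTransformerAggregator()(torch.randn(2, 64, 256)).shape -> (2, 256)
```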
Dependencies
Please make sure the following libraries are installed:
- PyTorch >= 1.0
- tqdm
- scikit-learn
Data Preparation
Following common practice, we first extract videos into frames for fast data loading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on the Something-Something-V1 and V2, Kinetics, and Diving48 datasets with this codebase. Basically, the processing of video data can be summarized into 3 steps:
- Extract frames from videos:
  - For the Something-Something-V2 dataset, please use data_process/vid2img_sthv2.py
  - For the Kinetics dataset, please use data_process/vid2img_kinetics.py
  - For the Diving48 dataset, please use data_process/extract_frames_diving48.py
- Generate the file lists needed by the dataloader:
  - Each line of a list file contains a tuple of (extracted video frame folder name, video frame number, video groundtruth class). A list file looks like this (a minimal generation sketch follows this list):

        video_frame_folder 100 10
        video_2_frame_folder 150 31
        ...

  - Alternatively, you can use the off-the-shelf tools provided in this repo: data_process/gen_label_xxx.py
- Edit the dataset config information in datasets_video.py
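If you prefer to write the list files yourself, here is a minimal sketch based on the format described above, under the assumption that each extracted video lives in its own frame folder; the function name and example paths are illustrative and not part of this repo (the provided data_process/gen_label_xxx.py tools remain the recommended route).

```python
import os

def write_file_list(frame_root, labels, output_path):
    """Write one line per video: '<frame_folder> <num_frames> <class_id>'.

    labels: dict mapping a frame-folder name to its integer groundtruth class.
    """
    with open(output_path, "w") as f:
        for folder, class_id in labels.items():
            # Count the extracted frames inside this video's folder.
            num_frames = len(os.listdir(os.path.join(frame_root, folder)))
            f.write(f"{folder} {num_frames} {class_id}\n")

# Example with made-up names:
# write_file_list("sthv1_frames", {"video_frame_folder": 10}, "train_videofolder.txt")
```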
Pretrained Models
Here, we provide pretrained EAN models on the Something-Something-V1 dataset. Recognizing actions in this dataset requires strong temporal modeling ability. EAN achieves state-of-the-art performance on this dataset. Notably, our method even surpasses optical-flow-based methods while using only RGB frames as input.
Something-Something-V1
Model | Backbone | FLOPs | Val Top-1 (%) | Val Top-5 (%) | Checkpoints |
---|---|---|---|---|---|
EAN 8F (RGB+LMC) | ResNet-50 | 37G | 53.4 | 81.1 | [Jianguo Cloud] |
EAN 16F (RGB+LMC) | ResNet-50 | 74G | 54.7 | 82.3 | [Jianguo Cloud] |
EAN 16+8F (RGB+LMC) | ResNet-50 | 111G | 57.2 | 83.9 | [Jianguo Cloud] |
EAN 2×(16+8)F (RGB+LMC) | ResNet-50 | 222G | 57.5 | 84.3 | [Jianguo Cloud] |
Testing
For example, to test the EAN models on Something-Something-V1, you can first put the downloaded .pth.tar
files into the "pretrained" folder and then run:
# test the EAN model with 8-frame clips
bash scripts/test/sthv1/RGB_LMC_8F.sh
# test the EAN model with 16-frame clips
bash scripts/test/sthv1/RGB_LMC_16F.sh
Training
We provide several scripts to train EAN with this repo; please refer to the "scripts" folder for more details. For example, to train EAN on Something-Something-V1, you can run:
# train the EAN model with 8-frame clips
bash scripts/train/sthv1/RGB_LMC_8F.sh
Note that you should scale the learning rate with the batch size; for example, if you use a batch size of 32, set the learning rate to 0.005.
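Below is a minimal sketch of the linear scaling rule implied above, assuming the reference setting is a batch size of 32 with a learning rate of 0.005 (the actual reference values used by the training scripts may differ).

```python
def scale_lr(batch_size, ref_batch_size=32, ref_lr=0.005):
    """Scale the learning rate linearly with the batch size (assumed rule)."""
    return ref_lr * batch_size / ref_batch_size

print(scale_lr(64))  # 0.01   -> doubling the batch size doubles the learning rate
print(scale_lr(16))  # 0.0025 -> halving the batch size halves the learning rate
```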
Other Info
References
This repository is built upon the following baseline implementations for the action recognition task.
Citation
Please [★star] this repo and [cite] the following arXiv paper if you find our EAN useful to your research:
@misc{tian2021ean,
title={EAN: Event Adaptive Network for Enhanced Action Recognition},
author={Yuan Tian and Yichao Yan and Xiongkuo Min and Guo Lu and Guangtao Zhai and Guodong Guo and Zhiyong Gao},
year={2021},
eprint={2107.10771},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Contact
For any questions, please feel free to open an issue or contact:
Yuan Tian: [email protected]