Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

Overview

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

This is an official implementation in PyTorch of AFSD. Our paper is available at https://arxiv.org/abs/2103.13137

Updates

  • (May, 2021) We released AFSD training and inference code for THUMOS14 dataset.
  • (February, 2021) AFSD is accepted by CVPR2021.

Abstract

Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and the start and end frames of each action instance in a long, untrimmed video. While most current models achieve good results by using pre-defined anchors and numerous actionness scores, such methods suffer from both a large number of outputs and heavy tuning of the locations and sizes of the anchors. In contrast, anchor-free methods are lighter, getting rid of redundant hyper-parameters, but have received little attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module that gathers more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints to make sure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods by a remarkable margin on THUMOS14, achieving state-of-the-art results, and achieves comparable ones on ActivityNet v1.3.

Summary

  • First purely anchor-free framework for the temporal action detection task.
  • Fully end-to-end method using frames as input rather than features.
  • Saliency-based refinement module to gather more valuable boundary features (a minimal sketch of the boundary pooling idea follows this list).
  • Boundary consistency learning to make sure our model can find the accurate boundary.
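
The refinement module relies on boundary pooling: for each coarse proposal, features around its start and end boundaries are max-pooled to produce boundary-salient features. The repo implements this as a CUDA kernel (AFSD/prop_pooling/boundary_max_pooling_kernel.cu); the following is only a minimal CPU sketch of the idea in PyTorch, where the window size (window_ratio) is an illustrative choice rather than the paper's exact definition:

import torch

def boundary_max_pooling(feats, segments, window_ratio=0.1):
    # feats:    (C, T) tensor, a 1-D temporal feature map
    # segments: (N, 2) tensor of proposals as (start, end) in frame units,
    #           assumed to lie inside [0, T)
    # returns:  (N, 2*C) concatenated start/end boundary features
    C, T = feats.shape
    out = []
    for start, end in segments.tolist():
        # pool over a window proportional to the proposal length,
        # centered at each boundary (illustrative choice)
        half = max((end - start) * window_ratio, 1.0)
        pooled = []
        for b in (start, end):
            lo = max(int(b - half), 0)
            hi = min(int(b + half) + 1, T)
            pooled.append(feats[:, lo:hi].max(dim=1).values)
        out.append(torch.cat(pooled))
    return torch.stack(out)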

Performance

Getting Started

Environment

  • Python 3.7
  • PyTorch == 1.4.0 (please make sure your PyTorch version is exactly 1.4)
  • NVIDIA GPU

Setup

pip3 install -r requirements.txt
python3 setup.py develop

Data Preparation

  • THUMOS14 RGB data:
  1. Download post-processed RGB npy data (13.7GB): [Weiyun]
  2. Unzip the RGB npy data to ./datasets/thumos14/validation_npy/ and ./datasets/thumos14/test_npy/
  • THUMOS14 flow data:
  1. Because generating flow data for THUMOS14 takes more time, we provide the post-processed flow data in Google Drive and Weiyun (3.4GB) to make the flow model easy to run: [Google Drive], [Weiyun]
  2. Unzip the flow npy data to ./datasets/thumos14/validation_flow_npy/ and ./datasets/thumos14/test_flow_npy/

If you want to generate npy data by yourself, please refer to the following guidelines:

  • Manual RGB data generation:
  1. To construct the THUMOS14 RGB npy inputs, download the THUMOS14 training and testing videos.
    Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip
    Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip
    (the unzip password is THUMOS14_REGISTERED)
  2. Move the training videos to ./datasets/thumos14/validation/ and the testing videos to ./datasets/thumos14/test/
  3. Run the data processing script: python3 AFSD/common/video2npy.py (a minimal sketch of this conversion is shown after these lists)
  • Manual flow data generation:
  1. If you need to generate the flow data manually, first install denseflow.
  2. Prepare the post-processed RGB data.
  3. Check and run the script: python3 AFSD/common/gen_denseflow_npy.py
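
For reference, the core of such a conversion is decoding each video with OpenCV and stacking the frames into a single npy array. The following is a minimal sketch only; the actual AFSD/common/video2npy.py also handles details such as fps resampling and its own target resolution, so the size argument below is a placeholder:

import os
import cv2
import numpy as np

def video_to_npy(video_path, out_dir, size=(112, 112)):
    # decode all frames, convert BGR -> RGB, resize, and save them as
    # one uint8 array of shape (T, H, W, 3)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame, size))
    cap.release()
    name = os.path.splitext(os.path.basename(video_path))[0]
    np.save(os.path.join(out_dir, name + '.npy'), np.stack(frames))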

Inference

We provide pretrained models, including the I3D backbone and the final RGB and flow models, for the THUMOS14 dataset: [Google Drive], [Weiyun]

# run RGB model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/checkpoint-15.ckpt --output_json=thumos14_rgb.json

# run flow model
python3 AFSD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14_flow/checkpoint-16.ckpt --output_json=thumos14_flow.json

# run fusion (RGB + flow) model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json
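
For intuition, one simple late-fusion baseline is to pool the detections produced by the two streams and suppress duplicates with class-wise temporal NMS. This is only an illustrative sketch; the repo's --fusion flag fuses the two streams inside test.py and may combine scores differently:

import numpy as np

def nms_1d(dets, iou_thresh=0.5):
    # greedy NMS over temporal segments; dets is (N, 3) = (start, end, score)
    dets = dets[dets[:, 2].argsort()[::-1]]
    keep = []
    while len(dets):
        best, dets = dets[0], dets[1:]
        keep.append(best)
        if len(dets):
            inter = np.maximum(0.0, np.minimum(best[1], dets[:, 1])
                               - np.maximum(best[0], dets[:, 0]))
            union = (best[1] - best[0]) + (dets[:, 1] - dets[:, 0]) - inter
            dets = dets[inter / union < iou_thresh]
    return np.stack(keep)

def fuse_streams(rgb_dets, flow_dets, iou_thresh=0.5):
    # naive late fusion for one action class: merge both streams'
    # detections, then remove temporal duplicates
    return nms_1d(np.concatenate([rgb_dets, flow_dets]), iou_thresh)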

Evaluation

The output json results of the pretrained models can be downloaded from: [Google Drive], [Weiyun]

# evaluate THUMOS14 fusion result as example
python3 eval.py output/thumos14_fusion.json

mAP at tIoU 0.3 is 0.6728296149479254
mAP at tIoU 0.4 is 0.6242590551201842
mAP at tIoU 0.5 is 0.5546668739091394
mAP at tIoU 0.6 is 0.4374840824921885
mAP at tIoU 0.7 is 0.3110112542745055
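
For context, the evaluation follows the standard THUMOS14 protocol: a prediction counts as a true positive when its temporal IoU (tIoU) with a ground-truth instance of the same class reaches the threshold, and mAP is averaged over classes. A minimal tIoU helper for illustration (eval.py ships its own implementation):

def temporal_iou(pred, gt):
    # pred, gt: (start, end) segments in seconds
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0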

Training

# train the RGB model
python3 AFSD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5

# train the flow model
python3 AFSD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{lin2021afsd,
  title={Learning Salient Boundary Feature for Anchor-free Temporal Action Localization},
  author={Chuming Lin and Chengming Xu and Donghao Luo and Yabiao Wang and Ying Tai and Chengjie Wang and Jilin Li and Feiyue Huang and Yanwei Fu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Comments
  • CUDA_runtime error (98)

    Thank you very much for open-sourcing this work! When running the code I get a CUDA_runtime error (98) at AFSD/prop_pooling/boundary_max_pooling_kernel.cu:110. I suspect something is wrong with the CUDA extension. My environment: pytorch 1.4.0, torchvision 0.5.0, cuda 10.0. Also, is there a CPU version of boundary_max_pooling_kernel? Thank you very much!

    opened by czhaneva 5
  • Data download links

    Hi,

    Could you please provide the download links for the THUMOS14 RGB data numpy files instead of the Weiyun link provided here? I am not able to access the link https://share.weiyun.com/bP62lmHj.

    Something either on GDrive or a link with wget access could work.

    Thank you for your help!

    opened by shubhamagarwal92 4
  • Actual number of classes and class indices

    Hello,

    Thank you for your great work.

    I found that the number of classes in the config file for the thumos14 dataset is the actual number of classes + 1: thumos14 has 20 classes while the config file is set to 21. I also tried this on my custom dataset and found that the number of classes in the config file should be set to the actual number of classes + 1; otherwise it gives an error. So what is that extra class? How can I find the original class indices after the action detection is complete?

    opened by Powercoder64 3
  • Has anyone reproduced the results of the RGB model on the THUMOS14 dataset?

    I have trained the AFSD RGB model on the THUMOS14 dataset as described in Implementation Details, and the experimental results are as follows:

    tIoU | 0.3  | 0.4  | 0.5  | 0.6  | 0.7  | Avg.
    mAP  | 57.7 | 52.5 | 44.6 | 35.1 | 23.4 | 42.6

    However, the results are still about 1.0 lower than the values in the paper. Could you help figure out this problem? Thanks a lot.

    opened by dingli93 3
  • Query regarding the Input Video processing

    Hi, I observed that videos in the ANet dataset are trimmed to 768 frames, most likely to fit in GPU memory. My question: when feeding the data to the I3D backbone, is it sent as a (batch, channel = 3, temporal = 768, height, width) tensor, or do you break it up into windows of 16 frames and feed them in repeatedly?

    opened by sauradip 2
  • Missing default.yaml for video2npy.py

    Hello! I'm new here, and I found that a file named default.yaml is missing when I try to convert videos to npy myself. Looking forward to your reply, thanks!

    opened by LUJUNYIhhh 2
  • Usage of the UntrimmedNet result during post-processing of ActivityNet

    Hi,

    Congrats on your awesome work.

    I just want to know why the UntrimmedNet result is used during post-processing. After reading your paper, it is evident that this work is a localization network (classification + proposals), so why is UntrimmedNet involved here? Isn't this network supposed to give you action classification as well?

    Thanks in advance

    opened by anonyn498 2
  • What version of opencv-python are you using?

    Thanks for your sharing! When I attempt to convert an mp4 file to a .npy file, some mp4 files cannot be read. I guess it's a cv2 version problem, so could you tell us what version of opencv-python you are using?

    opened by shadowclouds 1
  • support for multi-GPU

    While reproducing the code I found that this repo does not support multi-GPU training. I am writing down my personal workaround here and hope the authors can release a multi-GPU version. I trained with 4 V100 GPUs; the modification is in train.py -> def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True):

      if training:
          if ssl:
              tar = targets[0]
              pro = torch.stack([tar, tar, tar, tar], dim=0)
              output_dict = net(clips, proposals=pro, ssl=ssl)
          else:
              output_dict = net(clips, ssl=False)
              output_dict['priors'] = output_dict['priors'][0:126]

    opened by Kaeless 1
  • about feature extraction

    Hi, have you tried using pre-extracted I3D features? Since this method involves fine-tuning the I3D model, it may result in an unfair comparison with other methods.

    opened by yangwf1 1
  • Question about "bounds = [[0, 30], [15, 60], [30, 120], [60, 240], [96, 768], [256, 768]]"

    https://github.com/TencentYoutuResearch/ActionDetection-AFSD/blob/fcdf2a09003214c1b4ae610a11218d6f4f5a0c23/AFSD/anet/multisegment_loss.py#L128

    How should I understand the use of bounds? Thanks!

    opened by Qidian213 1
  • What's the difference between net() and net.module()?

    In the multi-GPU version, I notice the code in GAF/thumos14/train.py has been modified from output_dict = net(clips, proposals=targets, ssl=ssl) to output_dict = net.module(clips, proposals=targets, ssl=ssl, mode='clf'). Can anyone tell me the difference? Thanks.

    opened by syj2908 0
  • setup.py

    C:\Users\Administrator\.conda\envs\AFSD-pl\lib\site-packages\torch\include\torch\csrc\api\include\torch/nn/functional/embedding.h(115): note: see reference to function template instantiation 'std::string torch::enumtype::get_enum_name<torch::nn::EmbeddingBagMode>(V)' being compiled with [ V=torch::nn::EmbeddingBagMode ] C:\Users\Administrator\.conda\envs\AFSD-pl\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(624): note: see reference to class template instantiation 'c10::ArrayRef<c10::IValue>' being compiled error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit code 2

    An error occurs during setup. Could you tell me how to solve it? My environment is cuda 10.1, cudnn 7.5, Windows 10, pytorch 1.4.

    opened by a907251064 0
  • About training on ActivityNet1.3

    Based on the paper, should I use the command "python3 AFSD/anet/train_init.py configs/anet.yaml --lw=1 --cw=1 --piou=0.5" to train the network? Is lw=1 right? Why does my loss increase when I train?

    opened by hj611 0
  • about activitynet1.3

    @linchuming Hello, when I run python3 AFSD/anet_data/video2npy.py THREAD_NUM to generate the RGB npy input data, I run into a problem: when the total duration of the sampled video exceeds 1 minute, ret, frame = cap.read() returns ret = false, with count = cap.get(cv2.CAP_PROP_FRAME_COUNT) equal to 770. But for videos with the same count of 770 whose total duration is under 1 minute, ret is true. I don't know what the problem is; could you help me? Another strange thing: when I download the videos that fail to read onto my local laptop, they can all be read there.

    opened by menghuaa 1