Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

Overview

AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

This is an official implementation in PyTorch of AFSD. Our paper is available at https://arxiv.org/abs/2103.13137

Updates

  • (May, 2021) We released AFSD training and inference code for the THUMOS14 dataset.
  • (February, 2021) AFSD was accepted by CVPR2021.

Abstract

Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and the start and end frames of each action instance in a long, untrimmed video. While most current models achieve good results by using pre-defined anchors and numerous actionness scores, such methods are burdened with both a large number of outputs and heavy tuning of the locations and sizes corresponding to different anchors. In contrast, anchor-free methods are lighter, getting rid of redundant hyper-parameters, but have received little attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module that gathers more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints that make sure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods by a remarkable margin on THUMOS14, achieving state-of-the-art results, and obtains comparable ones on ActivityNet v1.3.

Summary

  • First purely anchor-free framework for the temporal action detection task.
  • Fully end-to-end method using frames as input rather than features.
  • Saliency-based refinement module to gather more valuable boundary features.
  • Boundary consistency learning to make sure our model can find accurate boundaries.

Performance

Getting Started

Environment

  • Python 3.7
  • PyTorch == 1.4.0 (please make sure your PyTorch version is 1.4; a quick environment check is sketched below)
  • NVIDIA GPU
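
Before building the CUDA extension, you can verify the environment from Python. This is a minimal sanity check using only standard PyTorch calls, not part of the repository itself:

# check_env.py: sanity-check the PyTorch / CUDA setup before running setup.py
import torch

# the CUDA extension is built against PyTorch 1.4, so check the version first
assert torch.__version__.startswith('1.4'), 'PyTorch 1.4 is required, found %s' % torch.__version__
# the boundary max pooling kernel runs on GPU, so CUDA must be available
assert torch.cuda.is_available(), 'an NVIDIA GPU with a working CUDA setup is required'
print('PyTorch %s, CUDA %s, device: %s' % (torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0)))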

Setup

pip3 install -r requirements.txt
python3 setup.py develop

Data Preparation

  • THUMOS14 RGB data:
  1. Download post-processed RGB npy data (13.7GB): [Weiyun]
  2. Unzip the RGB npy data to ./datasets/thumos14/validation_npy/ and ./datasets/thumos14/test_npy/
  • THUMOS14 flow data:
  1. Because generating flow data for THUMOS14 is time-consuming, we provide the post-processed flow data on Google Drive and Weiyun (3.4GB) to make the flow model easy to run: [Google Drive], [Weiyun]
  2. Unzip the flow npy data to ./datasets/thumos14/validation_flow_npy/ and ./datasets/thumos14/test_flow_npy/

If you want to generate npy data by yourself, please refer to the following guidelines:

  • RGB data generation manually:
  1. To construct the THUMOS14 RGB npy inputs, please download the THUMOS14 training and testing videos.
    Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip
    Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip
    (the unzip password is THUMOS14_REGISTERED)
  2. Move the training videos to ./datasets/thumos14/validation/ and the testing videos to ./datasets/thumos14/test/
  3. Run the data processing script (a rough sketch of what this conversion does is shown after this list): python3 AFSD/common/video2npy.py
  • Flow data generation manually:
  1. If you need to generate the flow data manually, first install denseflow.
  2. Prepare the post-processed RGB data.
  3. Check and run the script: python3 AFSD/common/gen_denseflow_npy.py
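
Conceptually, the RGB conversion decodes each video into a frame array and saves it as npy. The sketch below illustrates this with OpenCV; the frame size, sampling, and file naming here are illustrative assumptions, not the exact settings of AFSD/common/video2npy.py:

import cv2
import numpy as np

def video_to_npy(video_path, npy_path, size=(180, 180)):
    # decode every frame and resize; the real script may instead sample
    # at a fixed FPS and use a different resolution
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(cv2.resize(frame, size))
    cap.release()
    # stack into a (num_frames, H, W, 3) uint8 array and save
    np.save(npy_path, np.stack(frames).astype(np.uint8))

video_to_npy('./datasets/thumos14/test/video_test_0000004.mp4',
             './datasets/thumos14/test_npy/video_test_0000004.npy')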

Inference

We provide pretrained models, including the I3D backbone and the final RGB and flow models for the THUMOS14 dataset: [Google Drive], [Weiyun]

# run RGB model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/checkpoint-15.ckpt --output_json=thumos14_rgb.json

# run flow model
python3 AFSD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14_flow/checkpoint-16.ckpt --output_json=thumos14_flow.json

# run fusion (RGB + flow) model
python3 AFSD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json
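
The test script writes its detections to the JSON file given by --output_json. Assuming the common ActivityNet-style result layout (a {"results": {video_id: [{"label", "score", "segment"}, ...]}} dictionary; this layout is an assumption, check eval.py for the exact schema), the output can be inspected like this:

import json

# load the detections produced by AFSD/thumos14/test.py
with open('thumos14_fusion.json') as f:
    results = json.load(f)['results']

# print the top-scoring detection of a few videos:
# label, confidence, and [start, end] segment in seconds
for video_id, detections in sorted(results.items())[:5]:
    best = max(detections, key=lambda d: d['score'])
    print(video_id, best['label'], round(best['score'], 3), best['segment'])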

Evaluation

The output JSON results of the pretrained models can be downloaded from: [Google Drive], [Weiyun]

# evaluate THUMOS14 fusion result as example
python3 eval.py output/thumos14_fusion.json

mAP at tIoU 0.3 is 0.6728296149479254
mAP at tIoU 0.4 is 0.6242590551201842
mAP at tIoU 0.5 is 0.5546668739091394
mAP at tIoU 0.6 is 0.4374840824921885
mAP at tIoU 0.7 is 0.3110112542745055
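
Averaging the five values gives roughly 52.0% average mAP over the tIoU thresholds 0.3 to 0.7. The tIoU here is the 1-D analogue of box IoU; a minimal reference implementation:

def temporal_iou(pred, gt):
    # segments are (start, end) in seconds; tIoU is the overlap length
    # divided by the union length on the time axis
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((12.0, 25.0), (10.0, 24.0)))  # 0.8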

Training

# train the RGB model
python3 AFSD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5

# train the flow model
python3 AFSD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5
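
A note on the flags (inferred from the argument names rather than documented behavior): --lw and --cw weight the localization and classification loss terms, and --piou is the tIoU threshold above which a proposal is matched as a positive training sample. A schematic sketch of such threshold-based matching, under that assumption:

def tiou(a, b):
    # temporal IoU between two (start, end) segments
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_positives(proposals, gt_segments, piou=0.5):
    # a proposal is a positive sample if its best tIoU against any
    # ground-truth segment exceeds the piou threshold; returns the index
    # of the matched ground truth, or -1 for negatives
    labels = []
    for p in proposals:
        ious = [tiou(p, g) for g in gt_segments]
        best = max(range(len(ious)), key=lambda i: ious[i])
        labels.append(best if ious[best] > piou else -1)
    return labels

# (1.0, 4.0) vs (0.5, 4.5): tIoU = 3/4 = 0.75 > 0.5 -> matched to index 0
# (10.0, 12.0): no overlap -> negative
print(match_positives([(1.0, 4.0), (10.0, 12.0)], [(0.5, 4.5)]))  # [0, -1]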

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{lin2021afsd,
  title={Learning Salient Boundary Feature for Anchor-free Temporal Action Localization},
  author={Lin, Chuming and Xu, Chengming and Luo, Donghao and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Fu, Yanwei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Comments
  • CUDA_runtime error (98)

    Thank you very much for open-sourcing this work! When running the code I get CUDA_runtime error (98) at AFSD/prop_pooling/boundary_max_pooling_kernel.cu:110. I suspect something is wrong with the CUDA extension. My environment: PyTorch 1.4.0, torchvision 0.5.0, CUDA 10.0. Also, is there a CPU version of boundary_max_pooling_kernel? Thanks a lot!

    opened by czhaneva 5
  • Data download links

    Hi,

    Could you please provide the download links for the THUMOS14 RGB data numpy files instead of the Weiyun link provided here? I am not able to access the link https://share.weiyun.com/bP62lmHj.

    Something either on GDrive or a link with wget access could work.

    Thank you for your help!

    opened by shubhamagarwal92 4
  • Actual number of classes and class indices

    Hello,

    Thank you for your great work,

    I found that the number of classes in the config file for the THUMOS14 dataset is the actual number of classes + 1: THUMOS14 has 20 classes while the config file is set to 21. I also tried this on my custom dataset and found that the number of classes in the config file should be set to the actual number of classes + 1; otherwise, it gives an error. So, what is that extra class? How can I find the original class indices after the action detection is complete?

    opened by Powercoder64 3
  • Has anyone reproduced the results of rgb model on THUMOS14 dataset?

    I have trained the AFSD RGB model on the THUMOS14 dataset as described in the Implementation Details, and the experiment results are as follows:

    tIoU | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | Avg.
    mAP | 57.7 | 52.5 | 44.6 | 35.1 | 23.4 | 42.6

    However, the results are still about 1.0 lower than the values in the paper. Could you help me figure out this problem? Thanks a lot.

    opened by dingli93 3
  • Query regarding the Input Video processing

    Hi, I observed that videos in the ANet dataset are trimmed to 768 frames, most likely to fit GPU memory. My question: when feeding the data to the I3D backbone, is it sent as a (batch, channel = 3, temporal = 768, height, width) tensor, or do you break it into windows of 16 frames and feed the data in repeatedly?

    opened by sauradip 2
  • Missing default.yaml for video2npy.py

    Hello! I'm new here, and I found that a file named default.yaml is missing when I try to transform the videos to npy myself. Looking forward to your reply, thanks!

    opened by LUJUNYIhhh 2
  • Usage of UntrimmedNet Result used during post-processing of ActivityNet

    Hi,

    Congrats for your awesome work.

    I just want to know why the UntrimmedNet result is used during post-processing. After reading your paper, it is evident that this work is a localization network (classification + proposals), so why is UntrimmedNet needed here? Isn't this network supposed to give you action classification as well?

    Thanks in advance

    opened by anonyn498 2
  • What version of opencv-python are you using?

    Thanks for your sharing! When I attempt to convert an mp4 file to a .npy file, some mp4 files cannot be read. I guess it's a cv2 version problem, so could you tell us what version of opencv-python you are using?

    opened by shadowclouds 1
  • support for multi-GPU

    While reproducing the code I found that this repo does not support multi-GPU training, so I'm posting my own workaround here and hope the authors can release a multi-GPU version. Training used 4 V100s; the modified code is in train.py -> def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True):

    if training:
        if ssl:
            # replicate the proposal targets once per GPU (4 V100s here)
            tar = targets[0]
            pro = torch.stack([tar, tar, tar, tar], dim=0)
            output_dict = net(clips, proposals=pro, ssl=ssl)
        else:
            output_dict = net(clips, ssl=False)
            # keep a single replica's priors after DataParallel gathers outputs
            output_dict['priors'] = output_dict['priors'][0:126]
    
    opened by Kaeless 1
  • about feature extraction

    Hi, have you tried using I3D pre-extracted features? Since this method involves fine-tuning the I3D model, it may result in an unfair comparison with other methods.

    opened by yangwf1 1
  • Question about "bounds = [[0, 30], [15, 60], [30, 120], [60, 240], [96, 768], [256, 768]]"

    https://github.com/TencentYoutuResearch/ActionDetection-AFSD/blob/fcdf2a09003214c1b4ae610a11218d6f4f5a0c23/AFSD/anet/multisegment_loss.py#L128

    How to understand the use of bounds? Thanks!

    opened by Qidian213 1
  • What's the difference between net() and net.module()?

    In the multi-GPU version, I noticed that the code in GAF/thumos14/train.py was modified from output_dict = net(clips, proposals=targets, ssl=ssl) to output_dict = net.module(clips, proposals=targets, ssl=ssl, mode='clf'). Can anyone tell me the difference? Thanks.

    opened by syj2908 0
  • setup.py

    C:\Users\Administrator.conda\envs\AFSD-pl\lib\site-packages\torch\include\torch\csrc\api\include\torch/nn/functional/embedding.h(115): note: see reference to function template instantiation 'std::string torch::enumtype::get_enum_name<torch::nn::EmbeddingBagMode>(V)' being compiled with [ V=torch::nn::EmbeddingBagMode ] C:\Users\Administrator.conda\envs\AFSD-pl\lib\site-packages\torch\include\ATen/core/ivalue_inl.h(624): note: see reference to class template instantiation 'c10::ArrayRef<c10::IValue>' being compiled error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit code 2

    The error occurs during setup. Could you advise how to resolve it? Environment: CUDA 10.1, cuDNN 7.5, Windows 10, PyTorch 1.4.

    opened by a907251064 0
  • about the training on ActivityNet1.3

    Based on the paper, should I use the command "python3 AFSD/anet/train_init.py configs/anet.yaml --lw=1 --cw=1 --piou=0.5" to train the network? Is lw=1 right? Why does my loss increase when I train?

    opened by hj611 0
  • about activitynet1.3

    @linchuming Hi, when running python3 AFSD/anet_data/video2npy.py THREAD_NUM to generate the RGB npy input data, I hit a problem: when the sampled video's total duration exceeds 1 minute, ret, frame = cap.read() returns ret = False, while count = cap.get(cv2.CAP_PROP_FRAME_COUNT) is 770. For a video with the same count of 770 but a total duration under 1 minute, ret is True. I don't know what the problem is; could you help me? Another strange thing: when I download the videos that fail to read onto my local laptop, they can all be read there.

    opened by menghuaa 1
Owner
Tencent YouTu Research