Graph Convolutional Networks for Temporal Action Localization (ICCV2019)


This repo holds the codes and models for the PGCN framework presented at ICCV 2019:

Graph Convolutional Networks for Temporal Action Localization. Runhao Zeng*, Wenbing Huang*, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan. ICCV 2019, Seoul, Korea.

[Paper]

Updates

20/12/2019 We have uploaded the RGB features, trained models, and evaluation results! We found that increasing the number of proposals to 800 at test time further boosts the performance on THUMOS14. We have also updated the proposal list.

04/07/2020 We have uploaded the I3D features for ActivityNet (Anet), the training configuration file data/dataset_cfg.yaml, and the proposal lists for Anet.

Contents

  • Usage Guide
    • Prerequisites
    • Code and Data Preparation
    • Training PGCN
    • Testing Trained Models
  • Other Info
    • Citation
    • Contact

Usage Guide

Prerequisites

[back to top]

The training and testing of PGCN are reimplemented in PyTorch for ease of use.

Other minor Python modules can be installed by running

pip install -r requirements.txt

Code and Data Preparation

[back to top]

Get the code

Clone this repo with git; please remember to use --recursive:

git clone --recursive https://github.com/Alvin-Zeng/PGCN

Download Datasets

We support experimenting with two publicly available datasets for temporal action detection: THUMOS14 & ActivityNet v1.3. Here are some steps to download these two datasets.

  • THUMOS14: We need the validation videos for training and the test videos for testing. You can download them from the THUMOS14 challenge website.
  • ActivityNet v1.3: This dataset is provided in the form of a YouTube URL list. You can use the official ActivityNet downloader to download the videos from YouTube.

Download Features

Here, we provide the I3D features (RGB+Flow) for training and testing.

THUMOS14: You can download it from Google Cloud or Baidu Cloud.

Anet: You can download the I3D Flow features from Baidu Cloud (password: jbsa) and the I3D RGB features from Google Cloud. (Note: set the interval to 16 in ops/I3D_Pooling_Anet.py when training with the RGB features.)
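
To make the interval setting concrete, below is a minimal, illustrative sketch of how a proposal's frame range maps to I3D feature units (mirroring the unit-index computation from ops/I3D_Pooling_Anet.py quoted in the comments further down). The helper name pool_proposal_features and the mean-pooling step are assumptions for illustration, not necessarily the repo's exact implementation.

```python
import numpy as np

def pool_proposal_features(features, start_ind, end_ind, interval=16, clip_length=64, off=0):
    """Average the I3D feature units covered by a proposal's frame range.

    features: (ft_num, d) array, one row per clip_length-frame window extracted
    every `interval` frames. The note above suggests interval=16 for the Anet
    RGB features; the THUMOS14/Flow code uses interval=8.
    """
    ft_num = features.shape[0]
    # First and last feature units overlapping [start_ind, end_ind), clamped to the valid range.
    start_unit = int(min(ft_num - 1, np.floor(float(start_ind + off) / interval)))
    end_unit = int(min(ft_num - 2, np.ceil(float(end_ind - clip_length) / interval)))
    end_unit = max(end_unit, start_unit)  # guard against very short proposals
    return features[start_unit:end_unit + 1].mean(axis=0)

# Hypothetical example: 1024-D features for a 3000-frame video, proposal spanning frames 400-900.
feats = np.random.randn(3000 // 16, 1024).astype(np.float32)
print(pool_proposal_features(feats, start_ind=400, end_ind=900).shape)  # (1024,)
```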

Download Proposal Lists (ActivityNet)

Here, we provide the proposal lists for ActivityNet v1.3. You can download them from Google Cloud.

Training PGCN

[back to top]

Please first set the paths of the features in data/dataset_cfg.yaml:

train_ft_path: $PATH_OF_TRAINING_FEATURES
test_ft_path: $PATH_OF_TESTING_FEATURES
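
Optionally, you can sanity-check the configured paths before launching training. The snippet below is a small sketch assuming PyYAML is installed and that the two keys sit at the top level of data/dataset_cfg.yaml; adjust the lookup if they are nested under a dataset name in your copy of the file.

```python
import os
import yaml  # pip install pyyaml

with open("data/dataset_cfg.yaml") as f:
    cfg = yaml.safe_load(f)

for key in ("train_ft_path", "test_ft_path"):
    path = cfg.get(key)
    if not path or not os.path.exists(path):
        raise FileNotFoundError(f"{key} is missing or does not point to an existing path: {path!r}")
print("Feature paths look good.")
```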

Then, you can use the following command to train PGCN:

python pgcn_train.py thumos14 --snapshot_pre $PATH_TO_SAVE_MODEL

After training, there will be a checkpoint file whose name contains information about the dataset and the training epoch. This checkpoint file contains the trained model weights and can be used for testing.
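
If you want to inspect a snapshot before testing, the sketch below shows one way to peek at its contents. The filename is a hypothetical example, and the exact keys depend on how pgcn_train.py packages the snapshot (commonly fields such as the epoch and a state_dict), so treat this as a rough probe rather than the repo's documented format.

```python
import torch

# Load the snapshot on CPU and list its top-level keys (hypothetical filename).
ckpt = torch.load("thumos14_checkpoint.pth.tar", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```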

Testing Trained Models

[back to top]

You can obtain the detection scores by running

sh test.sh TRAINING_CHECKPOINT

Here, TRAINING_CHECKPOINT denotes the trained model checkpoint. This script will report the detection performance in terms of mean average precision (mAP) at different IoU thresholds.
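
For reference, detections are matched to ground-truth instances using the standard temporal IoU between (start, end) segments; a minimal sketch:

```python
def temporal_iou(seg_a, seg_b):
    """IoU between two temporal segments given as (start, end) in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((10.0, 20.0), (15.0, 30.0)))  # 0.25
```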

The trained models and evaluation results are provided in the "results" folder.

You can obtain the two-stream results on THUMOS14 by running

sh test_two_stream.sh

THUMOS14

| mAP@0.5 (%) | RGB   | Flow  | RGB+Flow      |
| ----------- | ----- | ----- | ------------- |
| P-GCN (I3D) | 37.23 | 47.42 | 49.07 (49.64) |

Here, 49.64% is obtained by setting the combination weights to Flow:RGB = 1.2:1 and the NMS threshold to 0.32.
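
As an illustration of that late fusion, the sketch below weights the per-proposal scores of the two streams and then applies greedy temporal NMS at the stated threshold. It assumes the proposals of both streams are already aligned one-to-one; the function name fuse_and_nms and the toy inputs are made up for the example, and test_two_stream.sh may implement the combination differently.

```python
import numpy as np

def fuse_and_nms(props, rgb_scores, flow_scores, w_flow=1.2, w_rgb=1.0, nms_thresh=0.32):
    """Late-fuse per-proposal scores from the two streams, then run greedy temporal NMS.

    props: (N, 2) array of (start, end) times; *_scores: length-N confidences per stream.
    Returns the indices of the kept proposals and the fused scores.
    """
    scores = w_flow * np.asarray(flow_scores) + w_rgb * np.asarray(rgb_scores)
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        inter = np.maximum(0.0, np.minimum(props[i, 1], props[rest, 1])
                                - np.maximum(props[i, 0], props[rest, 0]))
        union = (props[i, 1] - props[i, 0]) + (props[rest, 1] - props[rest, 0]) - inter
        iou = inter / np.maximum(union, 1e-8)
        order = rest[iou <= nms_thresh]  # drop proposals overlapping the kept one too much
    return keep, scores

props = np.array([[0.0, 5.0], [0.5, 5.5], [10.0, 15.0]])
keep, fused = fuse_and_nms(props, rgb_scores=[0.6, 0.7, 0.4], flow_scores=[0.8, 0.5, 0.9])
print(keep)  # [0, 2]: the near-duplicate proposal 1 is suppressed
```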

Other Info

[back to top]

Citation

Please cite the following paper if you find PGCN useful in your research:

@inproceedings{PGCN2019ICCV,
  author    = {Runhao Zeng and
               Wenbing Huang and
               Mingkui Tan and
               Yu Rong and
               Peilin Zhao and
               Junzhou Huang and
               Chuang Gan},
  title     = {Graph Convolutional Networks for Temporal Action Localization},
  booktitle   = {ICCV},
  year      = {2019},
}

Contact

For any questions, please file an issue or contact:

Runhao Zeng: [email protected]
Comments
  • The proposal_list for ActivityNet

    Hello, have you got the I3D features or the proposal list for ActivityNet? I'm also working on the ActivityNet dataset. Thank you! My email is [email protected]

    opened by ShaoQiBNU 9
  • Question about proposal generation

    Hi, thanks for sharing the code. I'd like to know where the pre-extracted proposals come from. Did you reimplement the Boundary Sensitive Network paper or just use its provided proposals?

    opened by yangwf1 9
  • how to generate bsn_proposal_list.txt

    Thank you for your great work. I have a question: if I want to apply your work to my own dataset, how can I generate the bsn_proposal_list files? And do I need to train the BSN proposal generator on my own dataset?

    Thank you so much.

    opened by thxkew 7
  • How to run this model on a new dataset

    I would like to test your model on a new TAL dataset collected by our laboratory. Hence, we want to know how we should prepare the dataset directory and ground-truth files. Any suggestions would be very helpful!

    opened by makecent 6
  • About Activitynet feature

    Hi, thanks a lot for open-sourcing this. I'm working on the ActivityNet dataset. Do you have the I3D features used in this project? I'd appreciate it if you could share a copy with me! My email address is [email protected]

    opened by yklilfft 6
  • How to predict a single unlabeled video?

    Dear author, I have trained the PGCN model on my own dataset, but I need to make a prediction on a video that is in neither the training set nor the test set. I see that the code needs to generate the corresponding proposals, which require ground-truth information, but the videos I am testing now have no annotation files. Could you tell me how to do it? Thanks a lot.

    opened by mrlihellohorld 5
  • Is the RGB model saved with float datatype?

    Thanks for the brilliant work!

    I happen to see an error when the RGB model is directly loaded into the PGCN architecture. The reason seems to be a mismatch of the datatype.

    To solve that, I replaced one line of code in pgcn_test.py: reg_scores[prop_idx, :] = net((act_batch_var, comp_batch_var), None, None, None) with reg_scores[prop_idx, :] = net((act_batch_var.float(), comp_batch_var.float()), None, None, None)

    Do it first if you find a similar error :)

    opened by frostinassiky 4
  • The performance of the best model is lower than the results in the paper?

    Thanks for your excellent work. I trained the model that you provided and found that the best model's performance (at epoch 15) is:

    | IoU thresh | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | Average |
    | ---------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ------- |
    | mean AP | 0.6574 | 0.6382 | 0.6009 | 0.5374 | 0.4578 | 0.3369 | 0.2172 | 0.0903 | 0.0134 | 0.3944 |

    This is lower than the results in the paper. Could you provide the pretrained model or explain why this happened? Thank you.

    opened by shadowclouds 3
  • A detail question about the model

    Hey! Great job! After reading your paper, I have one question: how did you process the output feature of the GCN model (N x d) before the fc layer? You have to get rid of one of the two dimensions, and looking at the code, I found that you just pick the first row of the N features. Did I understand that correctly? If so, could you please explain why you do that rather than average-pooling over the N features? Thanks a lot!

    opened by HaiyiMei 3
  • Question about the I3D feature

    In your paper, it says "We first uniformly divide each input video into 64-frame segments. We then use a two-stream Inflated 3D ConvNet (I3D) model pre-trained on Kinetics [5] to extract the segment features." However, in your code

    interval = 8
    clip_length = 64
    start_unit = int(min(ft_num - 1, np.floor(float(start_ind + off) / interval)))
    end_unit = int(min(ft_num - 2, np.ceil(float(end_ind - clip_length) / interval)))
    

    I guess subtracting 64 means you do not use the last few frames that are not divisible by 64, but why should interval = 8? Does it mean that you divide each input video into 8-frame segments?

    By the way, could you offer the I3D features on ActivityNet? They are very time-consuming to extract.

    opened by JJBOY 2
  • Using G-TAD results in PGCN

    Hi,

    I am trying to use PGCN on my own dataset. I have annotated the data according to the THUMOS14 annotation format and extracted features using I3D. I have also trained and run inference with a G-TAD model.

    1. Can you let me know how I can re-score G-TAD generated output using PGCN?

    Your answers to the above questions will clarify a lot of doubts.

    Thank you for your time!

    opened by lakshaymehra 2
  • Anet dataset with flow features

    When I use numpy to load the data file, only a single int is loaded. Is there any trick for loading the flow I3D Anet features, or is there something wrong with how I process the data?

    opened by dyjjjjj 0
  • Where is the proposal folder for THUMOS14? And the mismatch between RGB and Flow data for ActivityNet

    I cannot find the THUMOS14 proposals like those provided for ActivityNet. I also cannot understand why the numbers of flow and RGB features differ; they also do not match the Proposal_Lists.txt provided for ActivityNet.

    I am very confused, and I think no one can run the code now using the provided features and proposals.

    Thanks!!!

    opened by yangmin666 0
  • How to choose proposals when inferring an unlabeled video

    Hi, with the help of the author and many others, I trained the PGCN model, but I have a question about how to choose proposals at inference time. At present, I use G-TAD + PGCN to output the prediction results (that is, a certain number of proposals, sorted by score). However, when actually processing an unlabeled video, a certain number of proposals will be generated through PGCN. How should I finally select among these proposals as the prediction results? Top 100/50 or something like that?

    opened by mrlihellohorld 1
  • The Proposal List in PGCN

    I am confused about how to read the proposal list for the dataset used in PGCN. Can someone explain what each number in one line of the proposal list means? Thank you!

    opened by hongkyhao 0
  • Features of THUMOS14

    There are 413 videos in total in this dataset, including training and testing. The number of provided RGB features is 413, but the number of provided flow features is 412. Why is one missing?

    opened by Hanqer 0