The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Last update: Dec 21, 2022

Overview

Temporal Query Networks for Fine-grained Video Understanding

📋 This repository contains the implementation of the CVPR 2021 paper Temporal Query Networks for Fine-grained Video Understanding.

Abstract

Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this into a query-response mechanism, where each query addresses a particular question, and has its own response label set.

We make the following four contributions: (i) We propose a new model, a Temporal Query Network, which enables the query-response functionality and a structural understanding of fine-grained actions. It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query. (ii) We propose a new way, stochastic feature bank update, to train a network on videos of various lengths with the dense sampling required to respond to fine-grained queries. (iii) We compare the TQN to other architectures and text supervision methods, and analyze their pros and cons. Finally, (iv) we evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state-of-the-art using only RGB features.
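The query-response idea in the abstract can be illustrated with a small sketch: each query attends over the per-clip features of a video with a softmax over time, and the pooled feature is scored against that query's own label set. This is not the repository's model; it is a single dot-product attention step in plain Python, and the names `respond`, `query_a`, `query_b` and the label set sizes are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def respond(features, query, classifier):
    """Answer one query: attend over time, pool, score this query's labels.

    features:   T x D list of per-clip feature vectors
    query:      D-dimensional query vector
    classifier: C x D weights, where C is the size of this query's label set
    """
    dim = len(query)
    attention = softmax([dot(query, f) / math.sqrt(dim) for f in features])
    pooled = [sum(w * f[i] for w, f in zip(attention, features)) for i in range(dim)]
    logits = [dot(row, pooled) for row in classifier]
    return attention, logits

rng = random.Random(0)
T, D = 6, 4  # toy sizes: 6 clips, 4 feature dimensions
features = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]

# Hypothetical queries, each with a response label set of a different size.
label_set_sizes = {"query_a": 3, "query_b": 5}
results = {}
for name, n_labels in label_set_sizes.items():
    query = [rng.gauss(0, 1) for _ in range(D)]
    classifier = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(n_labels)]
    results[name] = respond(features, query, classifier)

for name, (attention, logits) in results.items():
    print(name, "attends most to clip", attention.index(max(attention)),
          "and returns", len(logits), "logits")
```

Because every query owns its classifier, training needs only the label for each query, which matches the supervision described above.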

Getting Started

Clone this repository

git clone https://github.com/Chuhanxx/Temporal_Query_Networks.git

Create a conda virtual environment and install the requirements
(This implementation requires CUDA and Python > 3.7.)

cd Temporal_Query_Networks
source build_venv.sh

Prepare Data and Weight Initialization

Please refer to data.md for data preparation.

Training

You can start training the model with the following steps, taking the Diving48 dataset as an example:

First stage training: Set the paths in the Diving48_first_stage.yaml config file first, and then run:

cd scripts
python train_1st_stage.py --name $EXP_NAME --dataset diving48 --dataset_config ../configs/Diving48_first_stage.yaml --gpus 0,1 --batch_size 16

Construct stochastically updated feature banks:

python construct_SUFB.py --dataset diving48 --dataset_config ../configs/Diving48_first_stage.yaml \
--gpus 0  --resume_file  $PATH_TO_BEST_FILE_FROM_1ST_STAGE --out_dir $DIR_FOR_SAVING_FEATURES
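As a rough illustration of what a stochastically updated feature bank does during training, the sketch below caches one feature per clip and, at each step, recomputes only a random subset of a video's clips with the current weights while reading the rest from the cache. It is a toy reading of the idea, not the code in `construct_SUFB.py`; `encode_clip`, `stochastic_bank_step` and the scalar `weight` standing in for the network are all hypothetical.

```python
import random

def encode_clip(clip, weight):
    # Stand-in for the visual backbone: scales the clip by a single "weight".
    return [weight * x for x in clip]

def stochastic_bank_step(bank, video_id, clips, weight, n_online, rng):
    """Return features for every clip of one video, refreshing a random subset.

    Only `n_online` randomly chosen clips are re-encoded with the current
    weights (in real training, gradients would flow through these); all other
    clips reuse the cached features, which keeps memory bounded for long videos.
    """
    cached = bank[video_id]
    online = set(rng.sample(range(len(clips)), min(n_online, len(clips))))
    feats = []
    for i, clip in enumerate(clips):
        if i in online:
            cached[i] = encode_clip(clip, weight)  # write the fresh feature back
        feats.append(cached[i])
    return feats, sorted(online)

# Toy video with 8 densely sampled clips of 3 values each.
clips = [[float(i + 1)] * 3 for i in range(8)]

# Initial bank, built once with the "first-stage" weights (here 1.0).
bank = {"video_0": [encode_clip(c, 1.0) for c in clips]}

# One training step with updated weights (2.0) that refreshes 2 clips.
rng = random.Random(0)
feats, updated = stochastic_bank_step(bank, "video_0", clips, 2.0, 2, rng)
print("refreshed clips:", updated)
```

Over many iterations every clip is eventually refreshed, so the bank tracks the changing network without encoding all clips at every step.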

Second stage training: Set the paths in the Diving48_second_stage.yaml config file first, and then run:

python train_2nd_stage.py --name $EXP_NAME  --dataset diving48  \
--dataset_config ../configs/Diving48_second_stage.yaml   \
--batch_size 16 --gpus 0,1

Test

python test.py --name $EXP_NAME  --dataset diving48 --batch_size 1 \
--dataset_config ../configs/Diving48_second_stage.yaml

Citation

If you use this code, please cite the following paper:

@inproceedings{zhangtqn,
  title={Temporal Query Networks for Fine-grained Video Understanding},
  author={Chuhan Zhang and Ankush Gupta and Andrew Zisserman},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

If you have any questions, please contact [email protected].

Comments

S3D weights pretrained on Kinetics400 download link cannot be opened

Great work! When I try to download the S3D weights pretrained on Kinetics400, the link does not open. Could the author update the link or send a copy by email? I would be grateful for a reply.

opened by HangFang6 2

[Fix] What is the encoding of your annotation files in pickle?

Hi! I just tried a minimal working example:

import pickle
with open("annotations/gym99_anno.pkl", "r") as file:
    l = pickle.load(file)
print(l)

However, it fails under the default "utf-8" encoding with:

Traceback (most recent call last):
  File "scripts/toy.py", line 3, in <module>
    l = pickle.load(file)
  File "/GPFS/rhome/kunhaozheng/miniconda3/envs/tqn/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

opened by DyeKuu 0
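A likely cause of the traceback above is the file mode rather than the annotation files themselves: pickle is a binary format, and byte 0x80 at position 0 is the protocol marker that begins a pickle of protocol 2 or later, so opening the file with "r" makes Python try to decode it as text. A minimal sketch of a loader follows; the helper name `load_annotations` is hypothetical, and the `latin1` fallback only matters under the assumption that a file was pickled by Python 2.

```python
import os
import pickle
import tempfile

def load_annotations(path):
    # Pickle files are binary, so open with "rb" instead of "r".
    with open(path, "rb") as f:
        try:
            return pickle.load(f)
        except UnicodeDecodeError:
            # Assumption: the file holds Python 2 byte strings; retry with latin1.
            f.seek(0)
            return pickle.load(f, encoding="latin1")

# Self-contained demo on a temporary file instead of the real annotations.
demo = {"video_0": [1, 2, 3]}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_anno.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump(demo, f)
    loaded = load_annotations(demo_path)

print(loaded == demo)
```

With the repository's files, the same helper would be called as `load_annotations("annotations/gym99_anno.pkl")`.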

reproduce acc 73.5%

I trained TQN on Diving48 with 4 V100 GPUs and a batch size of 16. The best accuracy reported by your test code after the first stage is about 60%, and the final accuracy after the second stage is 73.5%; I assume this is per-video accuracy. I then used my own code to compute per-class accuracy and got about 65% on the second stage. Both results are lower than the reported ones (74.5% per-class accuracy, 80+% per-video accuracy).

opened by hhqweasd 3
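For readers comparing the two numbers in this report, the sketch below shows one common way to compute them: per-video accuracy is the fraction of videos predicted correctly, while per-class accuracy averages the accuracy of each class so that rare classes count as much as frequent ones. This is an assumption about the definitions, not the repository's evaluation code, and the function names and toy labels are hypothetical.

```python
from collections import defaultdict

def per_video_accuracy(labels, preds):
    # Fraction of all videos whose predicted class matches the label.
    correct = sum(int(y == p) for y, p in zip(labels, preds))
    return correct / len(labels)

def per_class_accuracy(labels, preds):
    # Accuracy computed separately for each class, then averaged over classes.
    total = defaultdict(int)
    hit = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        hit[y] += int(y == p)
    return sum(hit[c] / total[c] for c in total) / len(total)

# Toy data: class 0 is frequent and always right, class 1 is rare and wrong.
labels = [0, 0, 0, 0, 1, 2]
preds = [0, 0, 0, 0, 0, 2]

video_acc = per_video_accuracy(labels, preds)
class_acc = per_class_accuracy(labels, preds)
print(f"per-video: {video_acc:.3f}, per-class: {class_acc:.3f}")
```

On imbalanced data the per-class figure is typically the lower of the two when errors concentrate in rare classes, as in this toy case, which is consistent with the ordering reported above.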
