TCPNet - Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

Overview

This is an implementation of TCPNet.

[Figure: TCPNet architecture]

Introduction

For video recognition, a global representation summarizing the whole content of a video snippet plays an important role in final performance. However, existing video architectures usually generate it with simple global average pooling (GAP), which has limited ability to capture the complex dynamics of videos. For image recognition, there is evidence that covariance pooling has stronger representation ability than GAP. Unfortunately, the plain covariance pooling used in image recognition is an orderless representation, which cannot model the spatio-temporal structure inherent in videos. Therefore, this paper proposes Temporal-attentive Covariance Pooling (TCP), inserted at the end of deep architectures, to produce powerful video representations. Specifically, TCP first develops a temporal attention module to adaptively calibrate spatio-temporal features for the succeeding covariance pooling, approximately producing attentive covariance representations. Then, a temporal covariance pooling performs temporal pooling of the attentive covariance representations to characterize both intra-frame correlations and inter-frame cross-correlations of the calibrated features. As such, the proposed TCP can capture complex temporal dynamics. Finally, a fast matrix power normalization is introduced to exploit the geometry of covariance representations. Note that TCP is model-agnostic and can be flexibly integrated into any video architecture, resulting in TCPNet for effective video recognition. Extensive experiments on six benchmarks (e.g., Kinetics, Something-Something V1 and Charades) using various video architectures show that TCPNet is clearly superior to its counterparts, while having strong generalization ability.
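To make the contrast between GAP and covariance pooling concrete, here is a minimal, dependency-free sketch. It is not the repository's TCP module: the temporal attention and temporal pooling steps are omitted, features are plain lists of rows (one row per spatio-temporal position, one column per channel), and all function names are hypothetical. Using a Newton-Schulz iteration for the matrix square root is an assumption about what "fast matrix power normalization" refers to, based on the iSQRT acknowledgement further down.

```python
# Illustrative sketch only: plain covariance pooling followed by a
# Newton-Schulz matrix square root. Not the repository's TCP module.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(d):
    """Return the d x d identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def global_average_pool(features):
    """GAP: one mean per channel (first-order statistics only)."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def covariance_pool(features):
    """Covariance pooling: d x d channel covariance over n positions."""
    n = len(features)
    mean = global_average_pool(features)
    centered = [[x - m for x, m in zip(row, mean)] for row in features]
    d = len(mean)
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]


def newton_schulz_sqrt(cov, num_iters=10):
    """Approximate the square root of a symmetric positive definite matrix."""
    d = len(cov)
    trace = sum(cov[i][i] for i in range(d))
    y = [[v / trace for v in row] for row in cov]  # trace-normalize so the iteration converges
    z = identity(d)
    eye3 = [[3.0 * v for v in row] for row in identity(d)]
    for _ in range(num_iters):
        zy = matmul(z, y)
        t = [[0.5 * (e - v) for e, v in zip(er, vr)] for er, vr in zip(eye3, zy)]
        y, z = matmul(y, t), matmul(t, z)  # coupled update: y -> A^(1/2), z -> A^(-1/2)
    scale = trace ** 0.5  # undo the trace normalization
    return [[scale * v for v in row] for row in y]


# Toy input: 4 positions, 2 channels.
feats = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
cov = covariance_pool(feats)      # second-order statistics, a 2 x 2 matrix
root = newton_schulz_sqrt(cov)    # normalized covariance representation
```

On this toy input GAP returns a zero vector, while the covariance matrix still separates the two channels by their variance, which is the kind of information first-order pooling discards.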

Citation

@InProceedings{Gao_2021_TCP,
    author = {Gao, Zilin and Wang, Qilong and Zhang, Bingbing and Hu, Qinghua and Li, Peihua},
    title = {Temporal-attentive Covariance Pooling Networks for Video Recognition},
    booktitle = {arXiv preprint arXiv:2021.06xxx},
    year = {2021}
}

Model Zoo

Kinetics-400

Method | Backbone | Frames | 1 crop Acc (%) | 30 views Acc (%) | Model | Pretrained Model | Test log
------ | -------- | ------ | -------------- | ---------------- | ----- | ---------------- | --------
TCPNet | TSN R50 | 8f | 72.4/90.4 | 75.3/91.8 | K400_TCP_TSN_R50_8f | Img1K_R50_GCP | log
TCPNet | TEA R50 | 8f | 73.9/91.6 | 76.8/92.9 | K400_TCP_TEA_R50_8f | Img1K_Res2Net50_GCP | log
TCPNet | TSN R152 | 8f | 75.7/92.2 | 78.3/93.7 | K400_TCP_TSN_R152_8f | Img11K_1K_R152_GCP | log
TCPNet | TSN R50 | 16f | 73.9/91.2 | 75.8/92.1 | K400_TCP_TSN_R50_16f | Img1K_R50_GCP | log
TCPNet | TEA R50 | 16f | 75.3/92.2 | 77.2/93.1 | K400_TCP_TEA_R50_16f | Img1K_Res2Net50_GCP | log
TCPNet | TSN R152 | 16f | 77.2/93.1 | 79.3/94.0 | K400_TCP_TSN_R152_16f | Img11K_1K_R152_GCP | TODO

Mini-Kinetics-200

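Method | Backbone | Frames | 1 crop Acc (%) | 30 views Acc (%) | Model | Pretrained Model
------ | -------- | ------ | -------------- | ---------------- | ----- | ----------------
TCPNet | TSN R50 | 8f | 78.7 | 80.7 | K200_TCP_TSN_8f | K400_TCP_TSN_R50_8f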

Environments

  • PyTorch: v1.0+ (for TCP_TSN); v1.0~1.4 (for TCP+TEA)
  • ffmpeg
  • graphviz: pip install graphviz
  • tensorboard: pip install tensorboardX
  • tqdm: pip install tqdm
  • scikit-learn: conda install scikit-learn
  • matplotlib: conda install -c conda-forge matplotlib
  • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
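Before training, it can help to confirm that the environment matches the list above. The following is a small sketch using only the Python standard library; it is not part of the repository. The function names and the 'TSN'/'TEA' variant labels are hypothetical, the import names are the usual ones for these packages (scikit-learn imports as sklearn), and reading "v1.0~1.4" as "1.0 through 1.4 inclusive" is an assumption.

```python
import importlib.util
import shutil

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ("torch", "graphviz", "tensorboardX", "tqdm",
                    "sklearn", "matplotlib", "fvcore")


def missing_requirements(modules=REQUIRED_MODULES, binaries=("ffmpeg",)):
    """Return names of Python modules and executables that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing


def parse_major_minor(version):
    """Turn a version string such as '1.4.0+cu101' into (1, 4)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def torch_version_ok(version, variant):
    """Check a PyTorch version string against the ranges listed above.

    `variant` is a hypothetical label: 'TSN' (v1.0+) or 'TEA' (v1.0 to v1.4).
    """
    v = parse_major_minor(version)
    if variant == "TSN":
        return v >= (1, 0)
    if variant == "TEA":
        return (1, 0) <= v <= (1, 4)
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    print("missing:", missing_requirements())
```

With PyTorch installed, `torch_version_ok(torch.__version__, "TEA")` would report whether the installed version falls inside the narrower range given for TCP+TEA.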

Dataset Preparation

We provide a detailed dataset preparation guideline for Kinetics-400 and Mini-Kinetics-200. See Dataset preparation.

StartUp

  1. Download the pretrained model and put it in pretrained_models/.
  2. Run the training script, e.g.: sh script/K400/train_TCP_TSN_8f_R50.sh
  3. Run the inference script, e.g.: sh script/K400/test_TCP_TSN_R50_8f.sh

TCP Code


├── ops
|    ├── TCP
|    |   ├── TCP_module.py
|    |   ├── TCP_att_module.py
|    |   ├── TSA.py
|    |   └── TCA.py
|    ├ ...
├ ...

Acknowledgement

  • We thank TSM for providing a well-designed 2D action recognition toolbox.
  • We also refer to some functions from iSQRT, TEA and Non-local.
  • The Mini-K200 dataset sampling strategy follows Mini_K200.
  • We would like to thank Facebook for developing the PyTorch toolbox.

Thanks for their work!

Comments
  • Error in Loading Weights When Fine-tuning

    Hi,

    I want to fine-tune on the HMDB51 dataset using the models pre-trained on K400.

    I use the scripts 'train_hmdb51_TCP_TSN_R50.sh' and 'train_hmdb51_TCP_TEA_R50.sh' to load the weights from the two pre-trained models 'K400_TCP_TSN_R50_8f.pth.tar' and 'K400_TCP_TEA_R50_8f.pth.tar', respectively.

    However, both scripts fail with an unexpected-keys error.

    For TEA, the error says: "Unexpected key(s) in state_dict: "module.base_model.iSQRT.layer_reduce1.0.weight", "module.base_model.iSQRT.layer_reduce1.1.weight", "module.base_model.iSQRT.layer_reduce1.1.bias", "module.base_model.iSQRT.layer_reduce1.1.running_mean", "module.base_model.iSQRT.layer_reduce1.1.running_var", "module.base_model.iSQRT.layer_reduce1.1.num_batches_tracked", "module.base_model.iSQRT.layer_reduce2.0.weight", "module.base_model.iSQRT.layer_reduce2.1.weight", "module.base_model.iSQRT.layer_reduce2.1.bias", "module.base_model.iSQRT.layer_reduce2.1.running_mean", "module.base_model.iSQRT.layer_reduce2.1.running_var", "module.base_model.iSQRT.layer_reduce2.1.num_batches_tracked", "module.base_model.iSQRT.att_module.conv_1.weight", "module.base_model.iSQRT.att_module.conv_1.bias", "module.base_model.iSQRT.att_module.conv_2.weight", "module.base_model.iSQRT.att_module.conv_2.bias", "module.base_model.iSQRT.att_module.conv_1d.weight", "module.base_model.iSQRT.att_module.conv_1d.bias", "module.base_model.iSQRT.att_module.sp_att.conv_theta.weight", "module.base_model.iSQRT.att_module.sp_att.conv_theta.bias", "module.base_model.iSQRT.att_module.sp_att.conv_phi.weight", "module.base_model.iSQRT.att_module.sp_att.conv_phi.bias", "module.base_model.iSQRT.att_module.sp_att.conv_g.weight", "module.base_model.iSQRT.att_module.sp_att.conv_g.bias", "module.base_model.iSQRT.att_module.sp_att.norm.weight", "module.base_model.iSQRT.att_module.sp_att.norm.bias", "module.base_model.iSQRT.att_module.sp_att.norm.running_mean", "module.base_model.iSQRT.att_module.sp_att.norm.running_var", "module.base_model.iSQRT.att_module.sp_att.norm.num_batches_tracked"."

    For TSN, the error is "Unexpected key(s) in state_dict: "module.base_model.layer4.iSQRT.layer_reduce1.weight", "module.base_model.layer4.iSQRT.layer_reduce_bn1.weight", "module.base_model.layer4.iSQRT.layer_reduce_bn1.bias", "module.base_model.layer4.iSQRT.layer_reduce_bn1.running_mean", "module.base_model.layer4.iSQRT.layer_reduce_bn1.running_var", "module.base_model.layer4.iSQRT.layer_reduce_bn1.num_batches_tracked", "module.base_model.layer4.iSQRT.layer_reduce2.weight", "module.base_model.layer4.iSQRT.layer_reduce_bn2.weight", "module.base_model.layer4.iSQRT.layer_reduce_bn2.bias", "module.base_model.layer4.iSQRT.layer_reduce_bn2.running_mean", "module.base_model.layer4.iSQRT.layer_reduce_bn2.running_var", "module.base_model.layer4.iSQRT.layer_reduce_bn2.num_batches_tracked", "module.base_model.layer4.iSQRT.att_module.conv_1.weight", "module.base_model.layer4.iSQRT.att_module.conv_1.bias", "module.base_model.layer4.iSQRT.att_module.conv_2.weight", "module.base_model.layer4.iSQRT.att_module.conv_2.bias", "module.base_model.layer4.iSQRT.att_module.conv_1d.weight", "module.base_model.layer4.iSQRT.att_module.conv_1d.bias", "module.base_model.layer4.iSQRT.att_module.sp_att.conv_theta.weight", "module.base_model.layer4.iSQRT.att_module.sp_att.conv_theta.bias", "module.base_model.layer4.iSQRT.att_module.sp_att.conv_phi.weight", "module.base_model.layer4.iSQRT.att_module.sp_att.conv_phi.bias", "module.base_model.layer4.iSQRT.att_module.sp_att.conv_g.weight", "module.base_model.layer4.iSQRT.att_module.sp_att.conv_g.bias", "module.base_model.layer4.iSQRT.att_module.sp_att.norm.weight", "module.base_model.layer4.iSQRT.att_module.sp_att.norm.bias", "module.base_model.layer4.iSQRT.att_module.sp_att.norm.running_mean","

    I think something is wrong with the key mappings.

    Could you have a look at the code and help fix the issue?

    Thanks in advance!

    opened by KingJamesSong 3
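
For a mismatch like the one reported above, one way to narrow it down is to compare the checkpoint's keys with the keys the freshly built model expects. The sketch below works on plain dictionaries so it runs without PyTorch; the helper names and the two non-iSQRT keys in the example are hypothetical, and it is not a fix taken from the repository. Unexpected iSQRT keys may indicate that the fine-tuning model was constructed without the TCP layers the checkpoint contains, in which case filtering them out would silently discard the pre-trained TCP weights rather than resolve the problem.

```python
def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Return (unexpected, missing) key lists for a checkpoint/model pair."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(ckpt - model), sorted(model - ckpt)


def keep_matching_entries(checkpoint, model_keys):
    """Drop checkpoint entries whose keys the model does not define."""
    wanted = set(model_keys)
    return {k: v for k, v in checkpoint.items() if k in wanted}


# Toy stand-ins for checkpoint['state_dict'] and model.state_dict().keys().
checkpoint = {
    "module.base_model.conv1.weight": 0,                      # hypothetical key
    "module.base_model.iSQRT.att_module.conv_1.weight": 1,    # key from the report above
}
model_keys = [
    "module.base_model.conv1.weight",                         # hypothetical key
    "module.new_fc.weight",                                   # hypothetical key
]

unexpected, missing = diff_state_dict_keys(checkpoint, model_keys)
filtered = keep_matching_entries(checkpoint, model_keys)
# With a real model, the filtered dict could then be passed to
# model.load_state_dict(filtered, strict=False).
```

Printing `unexpected` and `missing` side by side usually shows whether the keys differ only by a prefix or whether whole modules are absent from the model.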
Owner
Zilin Gao