[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

Jinpeng Wang

Last update: Oct 16, 2022

Related tags

Overview

DSM

The source code for paper Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion

Project Website;

Datasets list and some visualizations/provided weights are preparing now.

1. Introduction (scene-dominated to motion-dominated)

Video datasets are usually scene-dominated, We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model attention towards the motion information is better paid.

The generated triplet is as below:

What DSM learned?

With DSM pretrain, the model learn to focus on motion region (Not necessarily actor) powerful without one label available.

2. Installation

Dataset

Please refer dataset.md for details.

Requirements

Python3
pytorch1.1+
PIL
Intel (on the fly decode)

3. Structure

datasets
- list
  - hmdb51: the train/val lists of HMDB51
  - ucf101: the train/val lists of UCF101
  - kinetics-400: the train/val lists of kinetics-400
  - diving48: the train/val lists of diving48
experiments
- logs: experiments record in detials
- gradientes: grad check
- visualization:
src
- data: load data
- loss: the loss evaluate in this paper
- model: network architectures
- scripts: train/eval scripts
- augment: detail implementation of Spatio-temporal Augmentation
- utils
- feature_extract.py: feature extractor given pretrained model
- main.py: the main function of finetune
- trainer.py
- option.py
- pt.py: self-supervised pretrain
- ft.py: supervised finetune

DSM(Triplet)/DSM/Random

Self-supervised Pretrain

Kinetics

bash scripts/kinetics/pt.sh

UCF101

bash scripts/ucf101/pt.sh

Supervised Finetune (Clip-level)

HMDB51

bash scripts/hmdb51/ft.sh

UCF101

bash scripts/ucf101/ft.sh

Kinetics

bash scripts/kinetics/ft.sh

Video-level Evaluation

Following common practice TSN and Non-local. The final video-level result is average by 10 temporal window sampling + corner crop, which lead to better result than clip-level. Refer test.py for details.

Pretrain And Eval In one step

bash scripts/hmdb51/pt_and_ft_hmdb51.sh

Notice: More Training Options and ablation study Can be find in scripts

Video Retrieve and other visualization

(1). Feature Extractor

As STCR can be easily extend to other video representation task, we offer the scripts to perform feature extract.

python feature_extractor.py

The feature will be saved as a single numpy file in the format [video_nums,features_dim] for further visualization.

(2). Reterival Evaluation

modify line60-line62 in reterival.py.

python reterival.py

Results

Action Recognition

UCF101 Pretrained (I3D)

Method	UCF101	HMDB51
Random Initialization	47.9	29.6
MoCo Baseline	62.3	36.5
DSM(Triplet)	70.7	48.5
DSM	74.8	52.5

Kinetics Pretrained

Video Retrieve (UCF101-C3D)

Method	@1	@5	@10	@20	@50
DSM	16.8	33.4	43.4	54.6	70.7

Video Retrieve (HMDB51-C3D)

Method	@1	@5	@10	@20	@50
DSM	8.2	25.9	38.1	52.0	75.0

More Visualization

Acknowledgement

This work is partly based on STN, UEL and MoCo.

License

Citation

If you use our code in your research or wish to refer to the baseline results, pleasuse use the followint BibTex entry.

@inproceedings{wang2020enhancing,
  author    = {Lin, Ji and Zhang, Richard and Ganz, Frieder and Han, Song and Zhu, Jun-Yan},
  title     = {Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion},
  booktitle = {AAAI},
  year      = {2021},
}

You might also like...

The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

SGRAF PyTorch implementation for AAAI2021 paper of “Similarity Reasoning and Filtration for Image-Text Matching”. It is built on top of the SCAN and C

149 Dec 22, 2022

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

296 Dec 29, 2022

Code for KHGT model, AAAI2021

KHGT Code for KHGT accepted by AAAI2021 Please unzip the data files in Datasets/ first. To run KHGT on Yelp data, use python labcode_yelp.py For Movi

32 Nov 29, 2022

Official implementation of "Dynamic Anchor Learning for Arbitrary-Oriented Object Detection" (AAAI2021).

DAL This project hosts the official implementation for our AAAI 2021 paper: Dynamic Anchor Learning for Arbitrary-Oriented Object Detection [arxiv] [c

215 Nov 28, 2022

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

UPDeT Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight) The

96 Dec 22, 2022

Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts The rapid progress in 3D scene understanding has come with growing dem

182 Dec 30, 2022

PyTorch implementations for our SIGGRAPH 2021 paper: Editable Free-viewpoint Video Using a Layered Neural Representation.

st-nerf We provide PyTorch implementations for our paper: Editable Free-viewpoint Video Using a Layered Neural Representation SIGGRAPH 2021 Jiakai Zha

258 Jan 2, 2023

code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? Code for paper: Does Unsupervised Architecture Representation

39 Dec 17, 2022

This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

JigsawClustering Jigsaw Clustering for Unsupervised Visual Representation Learning Pengguang Chen, Shu Liu, Jiaya Jia Introduction This project provid

73 Sep 18, 2022

Comments

Pretrained Models

Thanks for the amazing work and for releasing the code. Can you also share some pre-trained models? My goal is to evaluate the performance of pretrained models on new datasets.

opened by fmthoker 0

[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

Related tags

Overview

DSM

1. Introduction (scene-dominated to motion-dominated)

What DSM learned?

2. Installation

Dataset

Requirements

3. Structure

DSM(Triplet)/DSM/Random

Self-supervised Pretrain

Kinetics

UCF101

Supervised Finetune (Clip-level)

HMDB51

UCF101

Kinetics

Video-level Evaluation

Pretrain And Eval In one step

Video Retrieve and other visualization

(1). Feature Extractor

(2). Reterival Evaluation

Results

Action Recognition

UCF101 Pretrained (I3D)

Kinetics Pretrained

Video Retrieve (UCF101-C3D)

Video Retrieve (HMDB51-C3D)

More Visualization

Acknowledgement

License

Citation

You might also like...

The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Code for KHGT model, AAAI2021

Official implementation of "Dynamic Anchor Learning for Arbitrary-Oriented Object Detection" (AAAI2021).

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"

PyTorch implementations for our SIGGRAPH 2021 paper: Editable Free-viewpoint Video Using a Layered Neural Representation.

code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

Comments

Pretrained Models

Owner

Jinpeng Wang

Implementation of our paper 'RESA: Recurrent Feature-Shift Aggregator for Lane Detection' in AAAI2021.

[CVPR2021] The source code for our paper 《Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning》.

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022

[NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images

Eff video representation - Efficient video representation through neural fields

Implementation of "JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting"

This repository contains the code for the paper "Hierarchical Motion Understanding via Motion Programs"

code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"