Video Contrastive Learning with Global Context

Last update: Dec 26, 2022

Video Contrastive Learning with Global Context (VCLR)

This is the official PyTorch implementation of our VCLR paper.

Install dependencies

environments

conda create --name vclr python=3.7
conda activate vclr
conda install numpy scipy scikit-learn matplotlib scikit-image
pip install torch==1.7.1 torchvision==0.8.2
pip install opencv-python tqdm termcolor gcc7 ffmpeg tensorflow==1.15.2
pip install mmcv-full==1.2.7

Prepare datasets

Please refer to PREPARE_DATA to prepare the datasets.

Prepare pretrained MoCo weights

In this work, we follow SeCo and use the pretrained weights of MoCov2 as initialization.

cd ~
git clone https://github.com/amazon-research/video-contrastive-learning.git
cd video-contrastive-learning
mkdir pretrain && cd pretrain
wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_200ep/moco_v2_200ep_pretrain.pth.tar
cd ..

Self-supervised pretraining

bash shell/main_train.sh

Checkpoints will be saved to ./results

Downstream tasks

Linear evaluation

To evaluate the quality of the self-supervised representation, we run a linear evaluation (probing) on the Kinetics400 dataset. We first extract features with the pretrained model and then train an SVM classifier on those features to see how well they perform.

bash shell/eval_svm.sh
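
The two-stage idea described above (freeze the backbone, extract features, fit a linear classifier) can be sketched as follows. This is only an illustration and not the code behind shell/eval_svm.sh: the feature vectors are toy data standing in for backbone embeddings, and train_linear_svm and predict are hypothetical helpers that fit a binary linear SVM by sub-gradient descent on the hinge loss.

```python
import random


def train_linear_svm(features, labels, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit a binary linear SVM (labels in {-1, +1}) by SGD on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                # Sub-gradient step for samples inside the margin.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy "frozen features": two well-separated clusters standing in for
# embeddings produced by the pretrained backbone (assumption, not real data).
train_x = [[2.0, 2.1], [1.8, 2.4], [2.2, 1.9],
           [-2.0, -1.9], [-2.3, -2.1], [-1.7, -2.2]]
train_y = [1, 1, 1, -1, -1, -1]
test_x = [[1.9, 2.0], [-2.1, -2.0]]
test_y = [1, -1]

w, b = train_linear_svm(train_x, train_y)
accuracy = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"linear-probe accuracy: {accuracy:.2f}")
```

The backbone is never updated in this setup, so the accuracy reflects only how linearly separable the extracted features are. A 400-class problem such as Kinetics400 would need a multi-class classifier, for which a library implementation is the usual choice.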

Results

Arch	Pretrained dataset	Epoch	Pretrained model	Acc. on K400
ResNet50	Kinetics400	400	Download link	64.1

Video retrieval

bash shell/eval_retrieval.sh

Results

Arch	Pretrained dataset	Epoch	Pretrained model	R@1 on UCF101	R@1 on HMDB51
ResNet50	Kinetics400	400	Download link	70.6	35.2
ResNet50	UCF101	400	Download link	46.8	17.6
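
For readers unfamiliar with the R@1 column above: under the usual retrieval protocol, a query counts as a hit when at least one of its k nearest gallery items shares its class label, and R@k is the fraction of queries that hit. The sketch below assumes cosine similarity on extracted features and a gallery that is disjoint from the queries. recall_at_k is a hypothetical helper, not the function used by shell/eval_retrieval.sh, and the features and labels are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1):
    """Fraction of queries whose top-k gallery neighbours contain their label."""
    hits = 0
    for q_feat, q_label in zip(query_feats, query_labels):
        # Rank gallery items by similarity to the query, most similar first.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine(q_feat, gallery_feats[i]),
            reverse=True,
        )
        if any(gallery_labels[i] == q_label for i in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Made-up 2-D features; real features would come from the pretrained model.
gallery_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
gallery_labels = ["run", "run", "jump", "jump"]
query_feats = [[0.8, 0.2], [0.2, 0.8], [0.6, 0.5]]
query_labels = ["run", "jump", "jump"]

r1 = recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1)
r3 = recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=3)
print(f"R@1 = {r1:.3f}, R@3 = {r3:.3f}")
```

The third query sits closer to the "run" cluster than to its own class, so it misses at k=1 but is recovered at k=3.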

Action recognition & action localization

Here, we use mmaction2 for both tasks. If you are not familiar with mmaction2, you can read the official documentation.

Installation

Step 1: Install mmaction2

To make sure the results can be reproduced, please use our forked version of mmaction2 (version: 0.11.0):
```
conda activate vclr
cd ~
git clone https://github.com/KuangHaofei/mmaction2

cd mmaction2
pip install -v -e .
```
Step 2: Prepare the pretrained weights

Our pretrained backbone uses a different format from the mmaction2 backbone, so it must be converted to the mmaction2 format. We provide converted versions of our K400 pretrained weights for TSN and TSM. We also provide the conversion script, which you can find here.
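
Format conversions of this kind usually come down to renaming the keys of the checkpoint's state dict. The sketch below illustrates the general idea only and is not the conversion script mentioned above: the prefixes encoder. and backbone. and the example key names are hypothetical placeholders, and plain lists stand in for tensors so that the snippet runs without PyTorch.

```python
def convert_state_dict(state_dict, old_prefix="encoder.", new_prefix="backbone."):
    """Rename checkpoint keys from one naming scheme to another.

    Both prefixes are hypothetical placeholders; inspect the real
    checkpoints to find the actual key names.
    """
    converted = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            converted[new_prefix + key[len(old_prefix):]] = value
        # Keys without the old prefix (for example projection heads used
        # only during pretraining) are dropped.
    return converted


# Stand-in for a loaded checkpoint; the values would normally be tensors.
checkpoint = {
    "encoder.conv1.weight": [1.0],
    "encoder.layer1.0.bn1.bias": [0.0],
    "head.fc.weight": [2.0],
}

converted = convert_state_dict(checkpoint)
for name in converted:
    print(name)
```

With real checkpoints, the dictionary would be read with torch.load and the converted result written back with torch.save.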

Download the pretrained weights into the checkpoints directory:
```
cd ~/mmaction2
mkdir -p checkpoints
wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm.pth
wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm_tsm.pth
```

Action recognition

Make sure you have prepared the datasets and the environment as described in the previous steps. From the root directory of mmaction2, follow the steps below to fine-tune the TSN or TSM models for action recognition.

For each dataset, the training and test settings can be found in the corresponding configuration file.

UCF101

config file: tsn_ucf101.py

train command:

./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_ucf101.py 8 \
  --validate --seed 0 --deterministic

test command:

python tools/test.py configs/recognition/tsn/vclr/tsn_ucf101.py \
  work_dirs/vclr/ucf101/latest.pth \
  --eval top_k_accuracy mean_class_accuracy --out result.json

HMDB51

config file: tsn_hmdb51.py

train command:

./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_hmdb51.py 8 \
  --validate --seed 0 --deterministic

test command:

python tools/test.py configs/recognition/tsn/vclr/tsn_hmdb51.py \
  work_dirs/vclr/hmdb51/latest.pth \
  --eval top_k_accuracy mean_class_accuracy --out result.json

SomethingSomethingV2: TSN

config file: tsn_sthv2.py

train command:

./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_sthv2.py 8 \
  --validate --seed 0 --deterministic

test command:

python tools/test.py configs/recognition/tsn/vclr/tsn_sthv2.py \
  work_dirs/vclr/tsn_sthv2/latest.pth \
  --eval top_k_accuracy mean_class_accuracy --out result.json

SomethingSomethingV2: TSM

config file: tsm_sthv2.py

train command:

./tools/dist_train.sh configs/recognition/tsm/vclr/tsm_sthv2.py 8 \
  --validate --seed 0 --deterministic

test command:

python tools/test.py configs/recognition/tsm/vclr/tsm_sthv2.py \
  work_dirs/vclr/tsm_sthv2/latest.pth \
  --eval top_k_accuracy mean_class_accuracy --out result.json

ActivityNet

config file: tsn_activitynet.py

train command:

./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_activitynet.py 8 \
  --validate --seed 0 --deterministic

test command:

python tools/test.py configs/recognition/tsn/vclr/tsn_activitynet.py \
  work_dirs/vclr/tsn_activitynet/latest.pth \
  --eval top_k_accuracy mean_class_accuracy --out result.json

Results

Arch	Dataset	Finetuned model	Acc.
TSN	UCF101	Download link	85.6
TSN	HMDB51	Download link	54.1
TSN	SomethingSomethingV2	Download link	33.3
TSM	SomethingSomethingV2	Download link	52.0
TSN	ActivityNet	Download link	71.9
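
The test commands above request two metrics through --eval: top_k_accuracy and mean_class_accuracy. The sketch below follows the usual definitions of these metrics (the share of samples whose true label is among the k highest scores, and per-class accuracy averaged over classes). It is not mmaction2's implementation, and the scores and labels are made up.

```python
from collections import defaultdict


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose label is among the k highest-scoring classes."""
    correct = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        correct += label in top_k
    return correct / len(labels)


def mean_class_accuracy(scores, labels):
    """Top-1 accuracy computed per class, then averaged over classes."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        totals[label] += 1
        hits[label] += pred == label
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Made-up class scores for four clips and three classes.
scores = [
    [0.7, 0.2, 0.1],  # label 0, predicted 0
    [0.4, 0.5, 0.1],  # label 0, predicted 1
    [0.1, 0.8, 0.1],  # label 1, predicted 1
    [0.3, 0.3, 0.4],  # label 2, predicted 2
]
labels = [0, 0, 1, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
mca = mean_class_accuracy(scores, labels)
print(f"top-1 = {top1:.3f}, top-2 = {top2:.3f}, mean class acc = {mca:.3f}")
```

Mean class accuracy weights every class equally, so it differs from top-1 accuracy whenever the classes are imbalanced, as in this toy example.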

Action localization

Step 1: Fine-tune TSN on ActivityNet as described in the previous section. We assume the finetuned model is saved at work_dirs/vclr/tsn_activitynet/latest.pth

Step 2: Extract ActivityNet features

cd ~/mmaction2/tools/data/activitynet/

python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
  --data-list /home/ubuntu/data/ActivityNet/anet_train_video.txt \
  --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
  --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth

python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \
  --data-list /home/ubuntu/data/ActivityNet/anet_val_video.txt \
  --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \
  --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth

python activitynet_feature_postprocessing.py \
  --rgb /home/ubuntu/data/ActivityNet/rgb_feat \
  --dest /home/ubuntu/data/ActivityNet/mmaction_feat

Note: the root directory of ActivityNet is /home/ubuntu/data/ActivityNet/ in our case. Replace it with the path on your machine.

Step 3: Train and test the BMN model

train

cd ~/mmaction2
./tools/dist_train.sh configs/localization/bmn/bmn_acitivitynet_feature_vclr.py 2 \
  --work-dir work_dirs/vclr/bmn_activitynet --validate --seed 0 --deterministic --bmn

test

python tools/test.py configs/localization/bmn/bmn_acitivitynet_feature_vclr.py \
  work_dirs/vclr/bmn_activitynet/latest.pth \
  --bmn --eval AR@AN --out result.json

Results

Arch	Dataset	Finetuned model	AUC	AR@100
BMN	ActivityNet	Download link	65.5	73.8

Feature visualization

Our feature visualization code is available here.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Comments

In script shell/eval_retrieval.sh

In line 34, should it be python3 eval_retrieve_knn_pred.py instead of python3 eval_retrieval_knn_pred.py? There seems to be no file named eval_retrieval_knn_pred.py.

opened by AllenPu 5
mmcv installation problem

Hi, when I run pip install mmcv-full==1.2.7 I get the error gcc: error: unrecognized command line option ‘-std=c++14’ followed by error: command 'gcc' failed with exit status 1. Is that because my gcc version is not right? My gcc version is 4.8.5.

opened by AllenPu 2
NCESoftmaxLoss bug

The implementation of the InfoNCE loss, NCESoftmaxLoss, seems to have a bug in it. You seem to be setting the labels to all zeros, so there is no indicator for positive pairs and everything is treated as a negative pair. Is this a desired feature or a bug?

https://github.com/amazon-research/video-contrastive-learning/blob/9a8d0473fd938dd77dfa6db2762abbc567da040e/models/Contrast.py#L14-L23

opened by icorley-bsky 2
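
As background for the NCESoftmaxLoss question above: in MoCo-style contrastive code, each row of logits is laid out as [positive, negative_1, negative_2, ...], so an all-zero label vector tells cross-entropy that column 0 is the target class in every row. The sketch below shows that behaviour in plain Python. It assumes this column-0 layout, which is an assumption about the linked code (not reproduced here), and info_nce and batch_loss are hypothetical helpers.

```python
import math


def info_nce(logits, positive_index=0):
    """Cross-entropy of one row of logits against the positive's column."""
    max_logit = max(logits)
    # Numerically stable log-sum-exp over all candidates in the row.
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[positive_index]


def batch_loss(batch_logits):
    """Average loss over a batch with all-zero labels."""
    # A label of 0 means "column 0 is the correct class" for every row.
    labels = [0] * len(batch_logits)
    losses = [info_nce(row, label) for row, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Column 0 holds the positive-pair similarity (assumed layout).
aligned = [[5.0, 0.1, -0.3], [4.0, 0.0, 0.2]]   # positive scores highest
confused = [[0.1, 5.0, -0.3], [0.0, 4.0, 0.2]]  # a negative scores highest

loss_aligned = batch_loss(aligned)
loss_confused = batch_loss(confused)
print(f"aligned: {loss_aligned:.3f}, confused: {loss_confused:.3f}")
```

Under this layout the zero labels index the positive column, so the loss is small only when the positive similarity dominates the negatives. Whether the repository's logits follow this layout should be checked against the linked lines.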
About frame extract failed

Hi, sorry for troubling you again. I ran ./tools/data/k400/extract_frames.py, but some of the videos are empty, very small, or cannot be opened, so frame extraction fails for them and leaves an empty folder. Will that affect the training?

opened by AllenPu 1
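
One way to see how many videos are affected by the failure described above is to list the frame folders that ended up empty. The sketch below is a hypothetical helper, not part of the repository. It assumes one folder per video somewhere under a root directory, and the folder and file names in the demo are invented.

```python
import tempfile
from pathlib import Path


def find_empty_frame_dirs(root):
    """Return the directories under `root` that contain no entries at all."""
    empty = []
    for directory in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        if not any(directory.iterdir()):
            empty.append(directory)
    return empty


# Demo on a temporary tree: one video with a frame, one with none.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "abseiling" / "video_ok"
    bad = Path(tmp) / "abseiling" / "video_broken"
    good.mkdir(parents=True)
    bad.mkdir(parents=True)
    (good / "img_00001.jpg").write_bytes(b"")  # placeholder frame file
    empty_dirs = [p.name for p in find_empty_frame_dirs(tmp)]

print(empty_dirs)
```

The resulting list can be used to re-download the affected videos or to drop them from the training list. The check does not say whether training tolerates empty folders.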
fixed typo: retrieve -> retrieval
Issue #8

Description of changes: Fixed the typo of the file names about video retrieval by replacing retrieve with retrieval.

Details:

eval_retrieve_knn_pred.py -> eval_retrieval_knn_pred.py

eval_retrieve_store_imgs.py -> eval_retrieval_store_imgs.py

shell/eval_retrieve.sh -> shell/eval_retrieval.sh
opened by KuangHaofei 0
Minor change in readme

Issue #, if available:

Description of changes: Changed a few links in the README.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

opened by bryanyzhu 0
Add source code

Issue #, if available:

Description of changes: Add all source code.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

opened by bryanyzhu 0
Training VCLR with 3D CNNs

Hi @KuangHaofei and @bryanyzhu

I was looking at your paper. Could you please explain how to apply VCLR at video level and not on frame level? How do I modify the dataloader to input video clips into the model? Here you are only taking frame level inputs from the segments?

In other words, how do I modify the code that computes L_inter, L_intra, L_segment and L_order in order to pre-train 3D CNNs?

Thank you

opened by rayush7 0
Pretrain the model on UCF101

Hi, I pretrained the model on UCF101, and the linear evaluation on UCF101 is 74.0946%.

But the result of the pretrained model you provided is 59.8731%.

I just modified the code of the data loading part, and the other settings have not been changed. Do you have any changes to the settings when you pretrain the model on UCF101?

opened by wangll1212 5
