[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.

CoCLR: Self-supervised Co-Training for Video Representation Learning

This repository contains the implementation of:

  • InfoNCE (MoCo on videos)
  • UberNCE (supervised contrastive learning on videos)
  • CoCLR
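
For orientation, the InfoNCE objective behind MoCo-style training can be sketched as below. This is a minimal pure-Python illustration, not the repository's implementation: the function name `info_nce_loss`, the temperature of 0.07, and the toy vectors are all assumptions made for the example.

```python
import math


def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for a single query: negative log-softmax of the positive key.

    All vectors are assumed to be L2-normalised lists of floats.
    `temperature=0.07` is an illustrative default, not a value taken from this repo.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Logit 0 belongs to the positive pair; the rest are negatives
    # (for MoCo these would come from the queue of past keys).
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, negative) / temperature for negative in negatives]

    # Numerically stable log-sum-exp over all logits.
    largest = max(logits)
    log_denominator = largest + math.log(sum(math.exp(l - largest) for l in logits))
    return log_denominator - logits[0]


q = [1.0, 0.0]
loss_aligned = info_nce_loss(q, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_misaligned = info_nce_loss(q, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(loss_aligned < loss_misaligned)
```

The loss is small when the query is closest to its positive key and grows when a negative is closer, which is the behaviour the contrastive pretraining relies on.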

Link:

[Project Page] [PDF] [Arxiv]

News

  • [2021.01.29] Upload both RGB and optical flow dataset for UCF101 (links).
  • [2021.01.11] Update our paper for NeurIPS2020 final version: corrected InfoNCE-RGB-linearProbe baseline result in Table 1 from 52.3% (pretrained for 800 epochs, unnecessary and unfair) to 46.8% (pretrained for 500 epochs, fair comparison). Thanks @liuhualin333 for pointing this out.
  • [2020.12.08] Update instructions.
  • [2020.11.17] Upload pretrained weights for UCF101 experiments.
  • [2020.10.30] Update "draft" dataloader files, CoCLR code, evaluation code as requested by some researchers. Will check and add detailed instructions later.

Pretrain Instruction

  • InfoNCE pretrain on UCF101-RGB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_nce.py --net s3d --model infonce --moco-k 2048 \
--dataset ucf101-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • InfoNCE pretrain on UCF101-Flow
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_nce.py --net s3d --model infonce --moco-k 2048 \
--dataset ucf101-f-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • CoCLR pretrain on UCF101 for one cycle
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 100 --schedule 80 --name_prefix Cycle1-FlowMining_ -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar}
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 --reverse \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 100 --schedule 80 --name_prefix Cycle1-RGBMining_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar} 
  • InfoNCE pretrain on K400-RGB
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_nce.py --net s3d --model infonce --moco-k 16384 \
--dataset k400-2clip --lr 1e-3 --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • InfoNCE pretrain on K400-Flow
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_nce.py --net s3d --model infonce --moco-k 16384 \
--dataset k400-f-2clip --lr 1e-3 --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • CoCLR pretrain on K400 for one cycle
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 50 --schedule 40 --name_prefix Cycle1-FlowMining_ -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar}
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 --reverse \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 50 --schedule 40 --name_prefix Cycle1-RGBMining_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar} 
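
Judging from the checkpoint placeholders and `--name_prefix` values in the CoCLR commands above, one cycle appears to consist of two runs: a Flow-mining run that starts from the RGB and Flow InfoNCE checkpoints, then a `--reverse` (RGB-mining) run that takes the Flow checkpoint and the RGB checkpoint produced by the first run. The sketch below only assembles those two command strings; the helper name `coclr_cycle_commands` and its parameters are hypothetical, and the output checkpoint path still has to be filled in by hand after the first run.

```python
def coclr_cycle_commands(cycle, rgb_ckpt, flow_ckpt, rgb_out_ckpt,
                         dataset="ucf101-2stream-2clip", moco_k=2048,
                         epochs=100, schedule=80):
    """Build the two launch commands for one CoCLR cycle (hypothetical helper).

    rgb_ckpt / flow_ckpt: checkpoints available before the cycle starts.
    rgb_out_ckpt: path of the RGB checkpoint written by the first run;
    this is an assumption and must be replaced with the real file name.
    """
    launcher = ("CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch "
                "--nproc_per_node=2 main_coclr.py")
    shared = (f"--net s3d --topk 5 --moco-k {moco_k} --dataset {dataset} "
              f"--seq_len 32 --ds 1 --batch_size 32 "
              f"--epochs {epochs} --schedule {schedule} -j 8")
    # Run 1: RGB checkpoint first, Flow checkpoint second (Flow mining).
    step1 = (f"{launcher} {shared} --name_prefix Cycle{cycle}-FlowMining_ "
             f"--pretrain {rgb_ckpt} {flow_ckpt}")
    # Run 2: --reverse, Flow checkpoint first, RGB checkpoint from run 1 second.
    step2 = (f"{launcher} {shared} --reverse "
             f"--name_prefix Cycle{cycle}-RGBMining_ "
             f"--pretrain {flow_ckpt} {rgb_out_ckpt}")
    return [step1, step2]


commands = coclr_cycle_commands(
    cycle=1,
    rgb_ckpt="{rgb_infoNCE_checkpoint.pth.tar}",
    flow_ckpt="{flow_infoNCE_checkpoint.pth.tar}",
    rgb_out_ckpt="{rgb_cycle1_checkpoint.pth.tar}",
)
for command in commands:
    print(command)
```

For the K400 settings listed above, the same helper would be called with `dataset="k400-2stream-2clip"`, `moco_k=16384`, `epochs=50` and `schedule=40`.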

Finetune Instruction

Run from the eval/ directory (cd eval/), e.g. to finetune on UCF101-RGB:

CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --batch_size 32 --train_what ft --epochs 500 --schedule 400 450 \
--pretrain {selected_rgb_pretrained_checkpoint.pth.tar}

Then run the test with 10-crop. Test-time augmentation helps: 10-crop gives better results than center-crop.

CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --batch_size 32 --train_what ft --epochs 500 --schedule 400 450 \
--test {selected_rgb_finetuned_checkpoint.pth.tar} --ten_crop

Nearest-neighbour Retrieval Instruction

Run from the eval/ directory (cd eval/), e.g. NN-retrieval for UCF101-RGB:

CUDA_VISIBLE_DEVICES=0 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --test {selected_rgb_pretrained_checkpoint.pth.tar} --retrieval
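
The NN@1 numbers quoted for the pretrained weights can be read as top-1 nearest-neighbour accuracy: each test clip's feature is matched to its closest training feature, and the retrieval counts as correct when the two labels agree. The sketch below illustrates that metric in plain Python. The function name `nn_at_1`, the use of cosine similarity, and the toy features are assumptions for illustration, not the repository's evaluation code.

```python
import math


def nn_at_1(test_feats, test_labels, train_feats, train_labels):
    """Fraction of test items whose nearest training feature has the same label."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    correct = 0
    for feat, label in zip(test_feats, test_labels):
        # Index of the most similar training feature.
        best = max(range(len(train_feats)),
                   key=lambda i: cosine(feat, train_feats[i]))
        correct += int(train_labels[best] == label)
    return correct / len(test_feats)


# Toy 2-D features standing in for extracted video features.
train_feats = [[1.0, 0.0], [0.0, 1.0]]
train_labels = ["archery", "bowling"]
test_feats = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
test_labels = ["archery", "bowling", "bowling"]

accuracy = nn_at_1(test_feats, test_labels, train_feats, train_labels)
print(accuracy)
```

In the toy data the third test clip is closer to the "archery" training feature than to its own class, so two of three retrievals are correct.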

Linear-probe Instruction

cd eval/

from extracted feature

The code supports two methods for linear probing: either feed the data end-to-end with the backbone frozen, or train a linear layer on extracted features. Both methods give similar best results in our experiments.

e.g. on extracted features (after running the NN-retrieval command above, features will be saved in os.path.dirname(checkpoint))

CUDA_VISIBLE_DEVICES=0 python feature_linear_probe.py --dataset ucf101 \
--test {feature_dirname} --final_bn --lr 1.0 --wd 1e-3

Note that the default setting should give reasonable performance, perhaps 1-2% lower than the figure in our paper. For different datasets, lr and wd need to be tuned: lr from 0.1 to 1.0 and wd from 1e-4 to 1e-1.
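
One way to organise that tuning is a small grid over lr and wd that emits one `feature_linear_probe.py` command per setting. The sketch below only builds the command strings from the flags shown above. The helper name `linear_probe_commands` and the specific grid points inside the suggested ranges are assumptions, not values recommended by the authors.

```python
from itertools import product


def linear_probe_commands(feature_dirname, dataset="ucf101",
                          lrs=(0.1, 0.3, 1.0),
                          wds=(1e-4, 1e-3, 1e-2, 1e-1)):
    """Return one linear-probe command per (lr, wd) pair (hypothetical helper).

    The grid points are illustrative; only the ranges (lr 0.1 to 1.0,
    wd 1e-4 to 1e-1) come from the instructions above.
    """
    commands = []
    for lr, wd in product(lrs, wds):
        commands.append(
            f"CUDA_VISIBLE_DEVICES=0 python feature_linear_probe.py "
            f"--dataset {dataset} --test {feature_dirname} "
            f"--final_bn --lr {lr} --wd {wd}"
        )
    return commands


probe_commands = linear_probe_commands("{feature_dirname}")
for command in probe_commands:
    print(command)
```

Each printed line can then be run separately, keeping the setting with the best validation accuracy.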

load data and freeze backbone

Alternatively, feed data end-to-end and freeze the backbone:

CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --batch_size 32 --train_what last --epochs 100 --schedule 60 80 \
--optim sgd --lr 1e-1 --wd 1e-3 --final_bn --pretrain {selected_rgb_pretrained_checkpoint.pth.tar}

Similarly, lr and wd need to be tuned for different datasets for best performance.

Dataset

Result

Finetune entire network for action classification on UCF101: [figure]

Pretrained Weights

Our models:

  • UCF101-RGB-CoCLR: [download] [NN@1=51.8 on UCF101-RGB]
  • UCF101-Flow-CoCLR: [download] [NN@1=48.4 on UCF101-Flow]

Baseline models:

  • UCF101-RGB-InfoNCE: [download] [NN@1=33.1 on UCF101-RGB]
  • UCF101-Flow-InfoNCE: [download] [NN@1=45.2 on UCF101-Flow]

Kinetics400-pretrained models:

  • K400-RGB-CoCLR: [download] [NN@1=45.6, Finetune-Acc@1=87.89 on UCF101-RGB]
  • K400-Flow-CoCLR: [download] [NN@1=44.4, Finetune-Acc@1=85.27 on UCF101-Flow]
  • Two-stream result by averaging the class probabilities: 0.8789 + 0.8527 => 0.9061
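
The two-stream figure is not the mean of the two accuracies. The averaging happens per video on the class probabilities, and the fused prediction is scored afterwards. A minimal sketch of that late fusion follows; the function name `fuse_two_stream` and the toy probabilities are assumptions for illustration.

```python
def fuse_two_stream(rgb_probs, flow_probs):
    """Average per-class probabilities from the RGB and Flow classifiers.

    Returns the fused probability list and the index of the predicted class.
    """
    if len(rgb_probs) != len(flow_probs):
        raise ValueError("both streams must score the same set of classes")
    fused = [(r + f) / 2.0 for r, f in zip(rgb_probs, flow_probs)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Toy softmax outputs for one video over three classes.
rgb = [0.5, 0.4, 0.1]    # RGB stream alone would pick class 0
flow = [0.1, 0.6, 0.3]   # Flow stream alone would pick class 1
fused, predicted = fuse_two_stream(rgb, flow)
print(fused, predicted)
```

Because the streams can disagree on individual videos, the fused accuracy can exceed both single-stream accuracies, as in the numbers above.
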
Comments
  • about Initialization & Alternation

    1. Initialization -> use of the pretrained InfoNCE checkpoint.pth.tar

    CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 --dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 --epochs 100 --schedule 80 --name_prefix Cycle1-FlowMining_ -j 4 --pretrain /mypath/CoCLR/pretrained_by_TH/InfoNCE-ucf101-rgb-128-s3d-ep399.pth.tar /mypath/CoCLR/pretrained_by_TH/InfoNCE-ucf101-f-128-s3d-ep396.pth.tar

    If I run it like this, does it perform the initialization? When I do, the following is printed out: =======Check Weights Loading====== Weights not used from pretrained file:

    Weights not loaded into new model: queue queue_ptr queue_second queue_vname queue_label

    Why are the weights of the pretrained model not used?

    2. Alternation: In your paper, "where each cycle refers to a complete optimization of L1 and L2; meaning, the alternation only happens after the RGB or Flow network has converged."

    So I entered the command above into the terminal, and training is now running (i.e. Cycle 1 FlowMining). But acc@1 and acc@5 don't go over 1. Is this the right value to have, or is something wrong?

    ++ Additional: if something is wrong, there is one thing I'm concerned about. In lmdb_dataset.py, I got an error for i.decode():

    AttributeError: 'str' object has no attribute 'decode'

    To fix this, I made these changes:

    self.db_keys_flow = msgpack.loads(txn.get(b'keys'), raw=True)
    self.db_order_flow = msgpack.loads(txn.get(b'order'), raw=True)
    . .
    self.db_order_rgb = msgpack.unpackb(txn.get(b'order'),raw=True)
    . .
    raw_rgb = msgpack.loads(txn.get(self.get_video_id_rgb[vname].encode('ascii')), raw=True)
    raw_flow = msgpack.loads(txn.get(self.get_video_id_flow[vname].encode('ascii')), raw=True)

    I added "raw=True". Is this causing an error?

    opened by junmin98 20
  • Question about reproducing CoCLR results

    Hi Tengda,

    I am currently trying to replicate your CoCLR result as one of the baselines in our work with the code you provide. However, I encounter some reproduction issues during the training. I understand that the code is not ready yet. It would be much appreciated if you could help us with replication. Thank you so much!

    1. I found out that the Top 1 MoCo accuracy is quite low (only 4-5 percent in UCF101) with 1e-3 lr and Adam Optimizer, 1e-5 weight decay, 2048 moco queue size and 128 batch size. I wonder if you could provide a detailed training command for our reference.

    2. The augmentation is not really clip-wise consistent since the value passed in is false. I wonder if this version is not final version. Could you provide the correct version of the augmentation you use?

    3. Currently the code for data loader is not released and I don't know how input is prepared in data loader to be passed to TwoCropTransform and OneCropTransform. Could you please share the data loader code for our better replication?

    Best Regards, Hualin

    opened by liuhualin333 17
  • Questions on reproducing training from scratch (77% in Table 1)

    Hi Tengda, thanks for these detailed answers. I looked into all of them, seems no detailed training instruction is given on using main_classifier.py to train from scratch. The thing is, I train on UCF101 with rgb from scratch, after 500 epochs, the reported validation accuracy is 46.1%, while in test set it is only 3.41% (center crop only, top1). The detailed commands are as below:

    -- Training

    CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py  --train_what all --epoch 500 --batch_size 24 --lr 1e-3 --wd 1e-3 --dropout 0.9 --schedule [60, 80]
    

    -- Testing

    CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --test epochxxx.pth --ten_crop
    

    Could you have a quick look and help me figure out which configuration I got wrong? My computational resources are limited, but it is still hard to understand why there is such a big gap between validation accuracy and test accuracy. My sincere appreciation.

    opened by June01 8
  •  Kinetics-400 dataset

    Hi, I'm new to the Kinetics-400 dataset. Can you provide a tutorial or instructions on how to generate the lmdb for Kinetics-400? I found some useful information in the non-local repository but am not sure it's the proper way. Thanks!

    opened by JiaxinZhuang 8
  • lmdb_dataset.py  txn.get(self.get_video_id[name].encode('ascii')))

    When I ran the program, I found "txn.get(self.get_video_id[name].encode('ascii')))" severely limiting the speed. Meanwhile, CPU is free. I don't know what the problem is. Hope to get help. Thanks

    opened by xiaochehe 6
  • cannot access your train/val split csv

    Hi, I am trying to download your train/val split csv file here: https://github.com/TengdaHan/CoCLR/tree/main/process_data/data/k400

    But it says 403 forbidden. I believe you put it in some internal-only storage?

    opened by thematrixduo 6
  • Simple question on video classification of self-supervised learning and full-supervision methods

    Hi Tengda,

    Thanks for the detailed instructions for this code. I am a newbie in this field and have a very simple question regarding Table 2, and I am in desperate need of your help. Thanks very much in advance!

    Question: From what I understand, self-supervised learning can be used to learn essential video representations. So I guess that, with weights learnt by self-supervised methods, training the S3D network on UCF-101 will yield better results than training from random initialization. From Table 2, I suppose 90.6 is the former and 96.8 is the latter. Could you explain a bit why there is such a gap?

    opened by June01 6
  • questions of training details of coclr

    Hi, I'm trying to replicate your result on the alternation stage. I am using the two init models you provided (both ~400 epochs). I have two questions.

    1). According to your paper, "At the alternation stage, on UCF101 the model is trained for two cycles, where each cycle includes 200 epochs, i.e. RGB and Flow networks are each trained for 100 epochs". Does that mean I need to run main_coclr.py four times, each time for 100 epochs and with the newest two pretrained models from the previous training process?

    2). If so, what lr do you use in each of the four 100-epoch runs in the alternation stage? I also checked the CoCLR pretrained models you provided; it seems that at epoch 182 and epoch 109 the lr is 1e-4. Does that mean I need to train the second cycle with a larger lr, e.g. 1e-2, and decay down to 1e-4?

    Best Regards, Yuqi

    opened by YuqiHUO 6
  • data preparation step

    Hi Tengda, thanks for sharing your code. I found CoCLR has no data preparation instruction and could you please provide some details about data preprocessing from the raw data? I found similar instruction in DPC and MemDPC, are they feasible for CoCLR?

    opened by justlovebarbecue 5
  • is it possible to train main_coclr.py using single GPU?

    I have only one gpu.

    I wanted to train, so I entered the terminal as follows: CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main_coclr.py

    but i got an error: subprocess.CalledProcessError: Command '['/home/junmin/anaconda3/envs/python36/bin/python', '-u', 'main_coclr.py', '--local_rank=0']' returned non-zero exit status 1.

    Is there any way to train with a single GPU?

    opened by junmin98 5
  • CoCLR using only RGB frame(1 stream)

    It took too long to extract the flow, so I am trying to train coclr using only rgb frames(1-stream). Is that possible?

    I think it's possible when you see this part in Table 2 of your paper: [image]

    If possible, can you tell me how to train using only rgb frames?

    ps. Thanks for always answering in detail!

    opened by junmin98 4
  • main_classifier.py: error: unrecognized arguments: --final_bn

    When I try:

    CUDA_VISIBLE_DEVICES=0,1,2 python main_classifier.py --net s3d --dataset ucf101 --seq_len 32 --ds 1 --batch_size 32 --train_what last --epochs 30 --schedule 60 80 --optim sgd --lr 1e-1 --wd 1e-3 --final_bn --pretrain CoCLR-ucf101-rgb-128-s3d-ep182.pth

    Out:

    usage: main_classifier.py [-h] [--net NET] [--model MODEL] [--dataset DATASET] [--which_split WHICH_SPLIT] [--seq_len SEQ_LEN] [--num_seq NUM_SEQ] [--num_fc NUM_FC] [--ds DS] [--batch_size BATCH_SIZE] [--optim OPTIM] [--lr LR] [--schedule [SCHEDULE [SCHEDULE ...]]] [--wd WD] [--dropout DROPOUT] [--epochs EPOCHS] [--start_epoch START_EPOCH] [--gpu GPU] [--train_what TRAIN_WHAT] [--img_dim IMG_DIM] [--print_freq PRINT_FREQ] [--eval_freq EVAL_FREQ] [--reset_lr] [--prefix PREFIX] [-j WORKERS] [--cos] [--resume RESUME] [--pretrain PRETRAIN] [--test TEST] [--retrieval] [--dirname DIRNAME] [--center_crop] [--five_crop] [--ten_crop]
    main_classifier.py: error: unrecognized arguments: --final_bn

    May I ask how to use the command line command --final_bn. After the above error occurred, I deleted --final_bn. Although it can run normally, it shows: Weights not loaded into new model: final_bn.weight final_bn.bias final_bn.running_mean final_bn.running_var final_bn.num_batches_tracked final_fc.0.weight final_fc.0.bias

    Thanks

    opened by wys2929 0
  • Two-stream feature

    How can I get the two-stream feature? Can the RGB pretrained model and the Flow model be used to extract the two-stream feature? What command should I run?

    opened by DoublePan-Oh 0
Owner
Tengda Han
Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.

Bridging Multi-Task Learning and Meta-Learning Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Trainin

AI Secure 57 Dec 15, 2022
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

null 26 Oct 25, 2022
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-

Microsoft 282 Jan 9, 2023
[CVPR2021] The source code for our paper 《Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning》.

TBE The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Le

Jinpeng Wang 150 Dec 28, 2022
Eff video representation - Efficient video representation through neural fields

Neural Residual Flow Fields for Efficient Video Representations 1. Download MPI

null 41 Jan 6, 2023
The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

SpaceML 92 Nov 30, 2022
Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. KDD 2019.

gHHC Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, D

Nicholas Monath 35 Nov 16, 2022
Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021.

Dense Contrastive Learning for Self-Supervised Visual Pre-Training This project hosts the code for implementing the DenseCL algorithm for se

Xinlong Wang 491 Jan 3, 2023
Self-supervised learning on Graph Representation Learning (node-level task)

graph_SSL Self-supervised learning on Graph Representation Learning (node-level task) How to run the code To run GRACE, sh run_GRACE.sh To run GCA, sh

Namkyeong Lee 3 Dec 31, 2021
[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

VITA 59 Dec 28, 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training [Arxiv] VideoMAE: Masked Autoencoders are Data-Efficient Learne

Multimedia Computing Group, Nanjing University 697 Jan 7, 2023
Official PyTorch implementation for paper Context Matters: Graph-based Self-supervised Representation Learning for Medical Images

Context Matters: Graph-based Self-supervised Representation Learning for Medical Images Official PyTorch implementation for paper Context Matters: Gra

null 49 Nov 23, 2022
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation This is a demo implementation of BYOL for Audio (BYOL-A), a self-sup

NTT Communication Science Laboratories 160 Jan 4, 2023
Implementation of Self-supervised Graph-level Representation Learning with Local and Global Structure (ICML 2021).

Self-supervised Graph-level Representation Learning with Local and Global Structure Introduction This project is an implementation of ``Self-supervise

MilaGraph 50 Dec 9, 2022
A PyTorch implementation of "Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning", IJCAI-21

MERIT A PyTorch implementation of our IJCAI-21 paper Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning. Depen

Graph Analysis & Deep Learning Laboratory, GRAND 32 Jan 2, 2023
Code for the paper "Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds" (ICCV 2021)

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds This is the official code implementation for the paper "Spatio-temporal Se

Hesper 63 Jan 5, 2023
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

null 63 Aug 11, 2022
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

null 42 Sep 24, 2021