Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

Overview

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

Pytorch Implementation for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

If the project is useful to you, please give us a star. ⭐️

image

@article{gao2021disco,
  title={DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning},
  author={Gao, Yuting and Zhuang, Jia-Xin and Li, Ke and Cheng, Hao and Guo, Xiaowei and Huang, Feiyue and Ji, Rongrong and Sun, Xing},
  journal={arXiv preprint arXiv:2104.09124},
  year={2021}
}

Checkpoints

Teacher Models

Architecture Self-supervised Methods Model Checkpoints
ResNet152 MoCo-V2 Model
ResNet101 MoCo-V2 Model
ResNet50 MoCo-V2 Model

For teacher models such as ResNet-50*2 etc, we use their official implementation, which can be downloaded from their github pages.

Student Models by DisCo

Teacher/Students Efficient-B0 ResNet-18 Vit-Tiny XCiT-Tiny
ResNet-50 Model Model - -
ResNet-101 Model Model - -
ResNet-152 Model Model - -
ResNet-50*2 Model Model - -
ViT-Small - - Model -
XCiT-Small - - - Model

Requirements

  • Python3

  • Pytorch 1.6+

  • Detectron2

  • 8 GPUs are preferred

  • ImageNet, Cifar10/100, VOC, COCO

Run

Before running, we firstly move all data into share memory

cp /path/to/ImageNet /dev/shm

Pretrain Model

For pretraining baseline models with default hidden layer dimension in Tab1

# Switch to moco directory
cd moco

# R-50
python3 -u main_moco.py -a resnet50 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet50 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-101
python3 -u main_moco.py -a resnet101 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet101 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-152
python3 -u main_moco.py -a resnet152 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 800 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
python3 main_lincls.py -a resnet152 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0799.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Mob
python3 -u main_moco.py -a mobilenetv3 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 512 /dev/shm 2>&1 |  tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a mobilenetv3 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Effi-B0
python3 -u main_moco.py -a efficientb0 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 2>&1  |  tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# Effi-B1
python3 -u main_moco.py -a efficientb1 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0  --hidden 1280  /dev/shm  2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a efficientb1 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-18
python3 -u main_moco.py -a resnet18 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a resnet18 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

# R-34
python3 -u main_moco.py -a resnet34 --batch-size 256 --learning-rate 0.03 --mlp --moco-t 0.2 --aug-plus --cos --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --hidden 1280 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 main_lincls.py -a resnet34 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log

DisCo

For training DisCo in Tab1, Comparision with baseline

# Switch to DisCo directory
cd DisCo

# R-50 & Effib0
python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50 --teacher /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

# R50w2 & Effib0
python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50w2 --teacher /path/to/swav_RN50w2_400ep_pretrain.pth.tar /dev/shm 2>&1 | tee ./logs/std.log
#          Evaluation
python3 yt_main_lincls.py -a resnet18 --learning-rate 30.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar  /dev/shm 2>&1 | tee ./logs/std.log

For Tab2, Linear evaluation top-1 accuracy (%) on ImageNet compared with different distillation methods.

# RKD+DisCo, Eff-b0
python3 -u main_moco_distill_rkd.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher /path/to/teacher_res50.pth.tar --use-mse /dev/shm  2>&1 | tee ./logs/std.log
#                  Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

# RKD, Eff-b0
python3 -u main_moco_distill_rkd.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher /path/to/teacher_res50.pth.tar /dev/shm  2>&1 | tee ./logs/std.log
#                  Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab3 , **Object detection and instance segmentation results **

# Cp data to /dev/shm and set up path for Detectron2
cp -r /path/to/VOCdevkit/* /dev/shm/
cp -r /path/to/coco_2017 /dev/shm/coco
export DETECTRON2_DATASETS=/dev/shm

pip install /youtu-reid/jiaxzhuang/acmm/detectron2-0.4+cu101-cp36-cp36m-linux_x86_64.whl
cd detection

# Convert model for Detectron2
python3 convert-pretrain-to-detectron2.py /path/ckpt/checkpoint_0199.pth.tar ./output.pkl

# Evaluation on VOC
python3 train_net.py --config-file configs/pascal_voc_R_50_C4_24k_moco.yaml --num-gpus 8 --resume MODEL.RESNETS.DEPTH 34 MODEL.RESNETS.RES2_OUT_CHANNELS 64 2>&1 | tee ../logs/std.log
# Evaluation on CoCo
python3 train_net.py --config-file configs/coco_R_50_C4_2x_moco.yaml --num-gpus 8  --resume MODEL.RESNETS.DEPTH 18 MODEL.RESNETS.RES2_OUT_CHANNELS 64 2>&1 | tee ../logs/std.log

For Fig5 , evaluation on Semi-Supervised Tasks

# Copy 1%, 10% ImageNet from the complete ImageNet, according to split from SimCLR.
cd data
# Need to set up path to Compelete ImageNet and the output path.
python3 -u imagenet_1_fraction.py --ratio 1
python3 -u imagenet_1_fraction.py --ratio 10

# Evaluation on 1% ImageNet with Eff-B0 by DisCo
cp -r /path/to/imagenet_1_fraction/train  /dev/shm
cp -r /path/to/imagenet_1_fraction/val  /dev/shm/
python3 -u main_lincls_semi.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm  2>&1 | tee ./logs/std.log

# Evaluation on 10% ImageNet with R-18 by DisCo
cp -r /path/to/imagenet_10_fraction/train  /dev/shm
cp -r /path/to/imagenet_10_fraction/val  /dev/shm/
python3 -u main_lincls_semi.py -a resnet18 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm  2>&1 | tee ./logs/std.log

For Fig6, evaluation on Cifar10/Cifar100

# Copy Cifar10/100 to /dev/shm
cp /path/to/Cifar10/100 /dev/shm

# Evaluation on 1% Cifar10 with Eff-B0 by DisCo
python3 cifar_main_lincls.py -a efficientb0 --dataset cifar10 --lr 3 --epochs 200 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log
# Evaluation on  Cifar100 with Resnet18 by DisCo
python3 cifar_main_lincls.py -a resnet18 --dataset cifar100 --lr 3 --epochs 200 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab4, Linear evaluation top-1 accuracy (%) on ImageNet, compared with SEED with consistent dimension in hidden layer.

python3 -u main.py -a efficientb0 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch resnet50 --teacher /path/to/ckpt/checkpoint_0199.pth.tar --hidden 2048 /dev/shm/ 2>&1 | tee ./logs/std.log
#          Evaluation
python3 -u main_lincls.py -a efficientb0 --learning-rate 3.0 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --pretrained /path/to/ckpt/checkpoint_0199.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

For Tab5, Linear evaluation top-1 accuracy (%) on ImageNet with SwAV as the testbed.

# SwAV, Train with SwAV only
cd swav-master
python3 -m torch.distributed.launch --nproc_per_node=8 main_swav.py \
        --data_path /dev/shm/train \
        --base_lr 0.6 \
        --final_lr 0.0006 \
        --warmup_epochs 0 \
        --crops_for_assign 0 1 \
        --size_crops 224 96 \
        --nmb_crops 2 6 \
        --min_scale_crops 0.14 0.05 \
        --max_scale_crops 1. 0.14 \
        --use_fp16 true \
        --freeze_prototypes_niters 5005 \
        --queue_length 3840 \
        --epoch_queue_starts 15 \
        --dump_path ./ckpt \
        --sync_bn pytorch \
        --temperature 0.1 \
        --epsilon 0.05 \
        --sinkhorn_iterations 3 \
        --feat_dim 128 \
        --nmb_prototypes 3000 \
        --epochs 200 \
        --batch_size 64 \
        --wd 0.000001 \
        --arch efficientb0 \
        --use_fp16 true 2>&1 | tee ./logs/std.log
# Evaluation
python3 -m torch.distributed.launch --nproc_per_node=8 eval_linear.py --arch efficientb0 --data_path /dev/shm --pretrained /path/to/ckpt/checkpoints/ckp-199.pth 2>&1 | tee ./logs/std.log

# DisCo + SwAV
python3 -m torch.distributed.launch --nproc_per_node=8 main_swav_distill.py \
        --data_path /dev/shm/train \
        --base_lr 0.6 \
        --final_lr 0.0006 \
        --warmup_epochs 0 \
        --crops_for_assign 0 1 \
        --size_crops 224 96 \
        --nmb_crops 2 6 \
        --min_scale_crops 0.14 0.05 \
        --max_scale_crops 1. 0.14 \
        --use_fp16 true \
        --freeze_prototypes_niters 5005 \
        --queue_length 3840 \
        --epoch_queue_starts 15 \
        --dump_path ./ckpt \
        --sync_bn pytorch \
        --temperature 0.1 \
        --epsilon 0.05 \
        --sinkhorn_iterations 3 \
        --feat_dim 128 \
        --nmb_prototypes 3000 \
        --epochs 200 \
        --batch_size 64 \
        --wd 0.000001 \
        --arch efficientb0 \
        --pretrained /path/to/swav_800ep_pretrain.pth.tar 2>&1 | tee ./logs/std.log

For Tab6, Linear evaluation top-1 accuracy (%) on ImageNet with variants of teacher pre-training methods.

# SwAV
python3 -u main.py -a resnet34 --lr 0.03 --batch-size 256 --moco-t 0.2 --aug-plus --dist-url 'tcp://localhost:10043' --multiprocessing-distributed --world-size 1 --rank 0 --mlp --cos --teacher_arch SWAVresnet50 --teacher /path/to/swav_800ep_pretrain.pth.tar /dev/shm 2>&1 | tee ./logs/std.log

Visualization

cd DisCo
# Generate Embed
# Move Embed to data path

python -u draw.py

Thanks

Code heavily depends on MoCo-V2, Detectron2.

You might also like...
Official PyTorch implementation of
Official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition" in AAAI2022.

AimCLR This is an official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Reco

PyTorch implementation of DirectCLR from paper Understanding Dimensional Collapse in Contrastive Self-supervised Learning
PyTorch implementation of DirectCLR from paper Understanding Dimensional Collapse in Contrastive Self-supervised Learning

DirectCLR DirectCLR is a simple contrastive learning model for visual representation learning. It does not require a trainable projector as SimCLR. It

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)
PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

SelfReg PyTorch official implementation of Self-supervised Contrastive Regularization for Domain Generalization (SelfReg, https://arxiv.org/abs/2104.0

Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-

Official code for the CVPR 2021 paper "How Well Do Self-Supervised Models Transfer?"

How Well Do Self-Supervised Models Transfer? This repository hosts the code for the experiments in the CVPR 2021 paper How Well Do Self-Supervised Mod

Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks This repository contains the official code for the

Code for CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization (Salesforce Research) This is a PyTorch implementation of the CoMatch paper [B

Lightweight mmm - Lightweight (Bayesian) Media Mix Model

Lightweight (Bayesian) Media Mix Model This is not an official Google product. L

Owner
null
The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

SpaceML 92 Nov 30, 2022
This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).

Core-tuning This repository is the official implementation of ``Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regular

vanint 18 Dec 17, 2022
Saeed Lotfi 28 Dec 12, 2022
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021.

Dense Contrastive Learning for Self-Supervised Visual Pre-Training This project hosts the code for implementing the DenseCL algorithm for se

Xinlong Wang 491 Jan 3, 2023
[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

VITA 59 Dec 28, 2022
Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models

Patch-Rotation(PatchRot) Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models Submitted to Neurips2021 To

null 4 Jul 12, 2021
Self-Supervised Contrastive Learning of Music Spectrograms

Self-Supervised Music Analysis Self-Supervised Contrastive Learning of Music Spectrograms Dataset Songs on the Billboard Year End Hot 100 were collect

null 27 Dec 10, 2022
A PyTorch implementation of "Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning", IJCAI-21

MERIT A PyTorch implementation of our IJCAI-21 paper Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning. Depen

Graph Analysis & Deep Learning Laboratory, GRAND 32 Jan 2, 2023
Implementation for paper "Towards the Generalization of Contrastive Self-Supervised Learning"

Contrastive Self-Supervised Learning on CIFAR-10 Paper "Towards the Generalization of Contrastive Self-Supervised Learning", Weiran Huang, Mingyang Yi

Weiran Huang 13 Nov 30, 2022