Implementation of momentum^2 teacher

jemmy li

Last update: Sep 26, 2022

Related tags

Deep Learning momentum2-teacher

Overview

Momentum^{^2} Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

Requirements

All experiments are done with python3.6, torch==1.5.0; torchvision==0.6.0

Usage

Data Preparation

Prepare the ImageNet data in ${root_of_your_clone}/data/imagenet_train, ${root_of_your_clone}/data/imagenet_val. Since we have an internal platform(storage) to read imagenet, I have not tried the local mode. You may need to do some modification in momentum_teacher/data/dataset.py to support the local mode.

Training

Before training, ensure the path (namely ${root_of_clone}) is added in your PYTHONPATH, e.g.

export PYTHONPATH=$PYTHONPATH:${root_of_clone}

To do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine, run:

using -d to specify gpu_id for training, e.g., -d 0-7
using -b to specify batch_size, e.g., -b 256
using --experiment-name to specify the output folder, and the training log & models will be dumped to './outputs/${experiment-name}'
using -f to specify the description file of ur experiment.

e.g.,

python3 momentum_teacher/tools/train.py -b 256 -d 0-7 --experiment-name your_exp -f momentum_teacher/exps/arxiv/exp_8_v100/momentum2_teacher_100e_exp.py

Linear Evaluation:

With a pre-trained model, to train a supervised linear classifier on frozen features/weights in an 8 gpus machine, run:

using -d to specify gpu_id for training, e.g., -d 0-7
using -b to specify batch_size, e.g., -b 256
using --experiment-name to specify the folder for saving pre-training models.

python3 momentum_teacher/tools/eval.py -b 256 --experiment-name your_exp -f momentum_teacher/exps/arxiv/linear_eval_exp_byol.py

Results

Results of Pretraining on a Single Machine

After pretraining on 8 NVIDIA V100 GPUS and 1024 batch-sizes, the results of linear-evaluation are:

pre-train code	pre-train epochs	pre-train time	accuracy	weights
path	100	~1.8 day	70.7	-
path	200	~3.6 day	72.7	-
path	300	~5.5 day	73.8	-

After pretraining on 8 NVIDIA 2080 GPUS and 256 batch-sizes, the results of linear-evaluation are:

pre-train code	pre-train epochs	pre-train time	accuracy	wights
path	100	~2.5 day	70.4	-
path	200	~5 day	72.3	-
path	300	~7.5 day	72.9	-

Results of Pretraining on Multiple Machines

E.g., To do unsupervised pre-training with 4096 batch-sizes and 32 V100 GPUs. run:

Suggesting that each machine has 8 V100 GPUs and there are 4 machines

# machine 1:
export MACHINE=0; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 2:
export MACHINE=1; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 3:
export MACHINE=2; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 4:
export MACHINE=3; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx

results of linear-eval:

pre-train code	pre-train epochs	pre-train time	accuracy	weights
path	100	~11hour	70.3	-
path	200	~22hour	72.5	-
path	300	~33hour	73.7	-

To do unsupervised pre-training with 4096 batch-sizes and 128 2080 GPUs, pls follow the above guides. Results of linear-eval:

pre-train code	pre-train epochs	pre-train time	accuracy	weights
path	100	~5hour	69.0	-
path	200	~10hour	71.5	-
path	300	~15hour	72.3	-

Disclaimer

This is an implementation for Momentum^2 Teacher, it is worth noting that:

The original implementation is based on our internal Platform.
This released version has slightly better performances compared with the tech report's.

Unet network with mean teacher for altrasound image segmentation

5 Nov 21, 2022

A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI

MPItrampoline MPI wrapper library: MPI trampoline library: MPI integration tests: MPI is the de-facto standard for inter-node communication on HPC sys

31 Dec 22, 2022

ALBERT-pytorch-implementation - ALBERT pytorch implementation

ALBERT-pytorch-implementation developing... 모델의 개념이해를 돕기 위한 구현물로 현재 변수명을 상세히 적었고

3 Oct 6, 2022

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

NuPIC Numenta Platform for Intelligent Computing The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implem

6.3k Dec 30, 2022

PyTorch implementation of neural style transfer algorithm

neural-style-pt This is a PyTorch implementation of the paper A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias

770 Jan 2, 2023

PyTorch implementation of DeepDream algorithm

neural-dream This is a PyTorch implementation of DeepDream. The code is based on neural-style-pt. Here we DeepDream a photograph of the Golden Gate Br

121 Nov 5, 2022

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

3.9k Jan 5, 2023

Image-to-Image Translation with Conditional Adversarial Networks (Pix2pix) implementation in keras

pix2pix-keras Pix2pix implementation in keras. Original paper: Image-to-Image Translation with Conditional Adversarial Networks (pix2pix) Paper Author

141 Dec 30, 2022

Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

This is a Python implementation of cover trees, a data structure for finding nearest neighbors in a general metric space (e.g., a 3D box with periodic

28 Nov 25, 2022

Comments

Teacher network weights used for linear evaluation

Hi, and thanks a lot for sharing your code! I noticed that both MoCo and BYOL use the frozen weights from the student network for linear evaluation, whereas your evaluation script uses the weights from the teacher network.

In my tests (on another dataset) the linear evaluation score is much better when using the student network weights. Could you confirm which weights were used in the paper?

opened by oliviermoliner 2

Question about the moco implementation

Hello, I'm a bit confused about the moco implementation in this paper. Since moco only has one forward pass for the teacher network, so I guess that the lazy update is not required for moco right? In this case, did you include the bn statistics for the current batch during the forward pass?

To be more specific, do you update the running_mean and running_var before calculating x?

with torch.no_grad():
    self.running_mean = self.momentum * mean + (1 - self.momentum) * self.running_mean
    self.running_var = self.momentum * var * n / (n - 1) + (1 - self.momentum) * self.running_var

x = (x - self.running_mean[None, :, None, None].detach()) / (
    torch.sqrt(self.running_var[None, :, None, None].detach() + self.eps)
)

or you calculate x first

x = (x - self.running_mean[None, :, None, None].detach()) / (
    torch.sqrt(self.running_var[None, :, None, None].detach() + self.eps)
)

with torch.no_grad():
    self.running_mean = self.momentum * mean + (1 - self.momentum) * self.running_mean
    self.running_var = self.momentum * var * n / (n - 1) + (1 - self.momentum) * self.running_var

opened by kyle-1997 0

Can you provide MomentumBatchNorm3d?

Thanks for your excellent work.

I try to implement a 3D version of momentum bn for self-supervised video representation learning. However, the performance of the pre-trained model is as good as pre-training with SyncBn. Can you provide the official implementation?

Here is my implementation: https://gist.github.com/SunDoge/1cfa4e219f349cba36659b19577f172a

opened by SunDoge 0
NCCL issue while reproducing the results

Hi,

Thank you for sharing this great work!

When I was running your code, I met NCCL error which is shown in the attached screenshot below.

I was wondering if others met this error as well while running this code?

opened by danielchyeh 0

Owner

jemmy li

GitHub

pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

Unofficial implementation: MoCo: Momentum Contrast for Unsupervised Visual Representation Learning (Paper) InsDis: Unsupervised Feature Learning via N

16 Nov 4, 2020

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

32 Oct 26, 2022

TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

FunMatch-Distillation TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A g

67 Dec 20, 2022

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

This is the official PyTorch implementation of the ALBEF paper [Blog]. This repository supports pre-training on custom datasets, as well as finetuning on VQA, SNLI-VE, NLVR2, Image-Text Retrieval on MSCOCO and Flickr30k, and visual grounding on RefCOCO+. Pre-trained and finetuned checkpoints are released.

805 Jan 9, 2023

[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

71 Dec 28, 2022

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

26 Oct 13, 2022

Implementation of momentum^2 teacher

Related tags

Overview

Momentum^{^2} Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

Requirements

Usage

Data Preparation

Training

Linear Evaluation:

Results

Results of Pretraining on a Single Machine

Results of Pretraining on Multiple Machines

Disclaimer

You might also like...

Unet network with mean teacher for altrasound image segmentation

A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI

ALBERT-pytorch-implementation - ALBERT pytorch implementation

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

PyTorch implementation of neural style transfer algorithm

PyTorch implementation of DeepDream algorithm

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Image-to-Image Translation with Conditional Adversarial Networks (Pix2pix) implementation in keras

Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

Comments

Teacher network weights used for linear evaluation

Question about the moco implementation

Can you provide MomentumBatchNorm3d?

NCCL issue while reproducing the results

Owner

jemmy li

pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

auto-tuning momentum SGD optimizer

Deep learning algorithms for muon momentum estimation in the CMS Trigger System

Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021)

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Implementation of momentum^2 teacher

Related tags

Overview

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

Requirements

Usage

Data Preparation

Training

Linear Evaluation:

Results

Results of Pretraining on a Single Machine

Results of Pretraining on Multiple Machines

Disclaimer

You might also like...

Unet network with mean teacher for altrasound image segmentation

A forwarding MPI implementation that can use any other MPI implementation via an MPI ABI

ALBERT-pytorch-implementation - ALBERT pytorch implementation

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

PyTorch implementation of neural style transfer algorithm

PyTorch implementation of DeepDream algorithm

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Image-to-Image Translation with Conditional Adversarial Networks (Pix2pix) implementation in keras

Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

Comments

Teacher network weights used for linear evaluation

Question about the moco implementation

Can you provide MomentumBatchNorm3d?

NCCL issue while reproducing the results

Owner

jemmy li

pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

auto-tuning momentum SGD optimizer

Deep learning algorithms for muon momentum estimation in the CMS Trigger System

Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021)

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

AMTML-KD: Adaptive Multi-teacher Multi-level Knowledge Distillation

Momentum^{^2} Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning