A PyTorch implementation of Mugs, proposed in our paper "Mugs: A Multi-Granular Self-Supervised Learning Framework".

Sea AI Lab

Last update: Nov 8, 2022

Overview

Mugs: A Multi-Granular Self-Supervised Learning Framework

Fig 1. Overall framework of Mugs. In (a), two random crops of each image are fed into the backbones of the student and the teacher. Mugs adopts three granular supervisions to learn multi-granular representations: 1) instance discrimination supervision, 2) local-group discrimination supervision, and 3) group discrimination supervision. In (b), the local-group module in the student/teacher first averages all patch tokens. It then finds the top-k neighbors of this average in a memory buffer and aggregates them with the average to obtain a local-group feature.
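The following minimal pure-Python sketch shows how the local-group feature from step (b) could be formed. It makes three assumptions that this README does not confirm:

- The memory buffer is a list of averaged-token vectors from earlier images.
- The top-k search uses cosine similarity.
- The final aggregation is a plain mean. The real module may use a learned aggregator instead.

`local_group_feature` and its helpers are hypothetical names, not functions from this repository.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def mean_vector(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def local_group_feature(patch_tokens: List[Vector],
                        memory_buffer: List[Vector], k: int) -> Vector:
    # 1) Average all patch tokens of the current image.
    avg = mean_vector(patch_tokens)
    # 2) Keep the k buffer entries most similar (cosine) to that average.
    neighbours = sorted(memory_buffer,
                        key=lambda m: cosine_similarity(avg, m),
                        reverse=True)[:k]
    # 3) Aggregate the average with its neighbours.
    #    A plain mean is a simplifying assumption, not necessarily Mugs' aggregator.
    return mean_vector([avg] + neighbours)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [1.0, 0.0]]
    buffer = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(local_group_feature(tokens, buffer, k=1))  # [1.5, 0.0]
```

In practice this would run batched on GPU tensors. The list version only makes the three steps explicit.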

Pretrained models on ImageNet-1K

You can choose to download only the weights of the pretrained backbone used for downstream tasks, or the full checkpoint which contains backbone and projection head weights for both student and teacher networks.
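If you downloaded the full checkpoint but only need the backbone, you can filter the weights yourself. This is a hedged sketch. It assumes a DINO-style layout that this README does not document:

- `full_ckpt["teacher"]` (or `"student"`) is a state dict.
- Backbone keys start with `backbone.`, possibly after a DistributedDataParallel `module.` prefix.

Inspect `ckpt.keys()` first to confirm. `extract_backbone` is a hypothetical helper.

```python
from typing import Any, Dict


def extract_backbone(full_ckpt: Dict[str, Dict[str, Any]],
                     network: str = "teacher") -> Dict[str, Any]:
    """Return backbone-only weights from a full checkpoint (hypothetical helper).

    Assumed layout (DINO-style, not confirmed for Mugs): backbone keys start with
    "backbone.", optionally preceded by "module."; every other key (e.g. the
    projection heads) is dropped.
    """
    backbone = {}
    for key, value in full_ckpt[network].items():
        if key.startswith("module."):
            key = key[len("module."):]
        if key.startswith("backbone."):
            backbone[key[len("backbone."):]] = value
    return backbone


if __name__ == "__main__":
    fake_ckpt = {
        "teacher": {
            "module.backbone.cls_token": "w0",
            "module.backbone.blocks.0.attn.qkv.weight": "w1",
            "module.head.mlp.0.weight": "w2",
        }
    }
    print(extract_backbone(fake_ckpt))
```

With PyTorch, you would load the file with `torch.load(path, map_location="cpu")` and pass the result to your ViT's `load_state_dict(...)`.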

Table 1. k-NN and linear probing accuracy on ImageNet-1K, with the corresponding hyper-parameters, logs, and model weights.

arch	params	pretraining epochs	k-NN	linear	download
ViT-S/16	21M	100	72.3%	76.4%	backbone only, full ckpt, args, logs, eval logs
ViT-S/16	21M	300	74.8%	78.2%	backbone only, full ckpt, args, logs, eval logs
ViT-S/16	21M	800	75.6%	78.9%	backbone only, full ckpt, args, logs, eval logs
ViT-B/16	85M	400	78.0%	80.6%	backbone only, full ckpt, args, logs, eval logs
ViT-L/16	307M	250	80.3%	82.1%	backbone only, full ckpt, args, logs, eval logs
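The k-NN column evaluates frozen features with a nearest-neighbour classifier. This README does not describe the exact protocol. The sketch below assumes the weighted k-NN vote used in DINO's evaluation:

- Compute cosine similarity on L2-normalized features.
- Keep the k nearest neighbours.
- Weight each neighbour's vote by exp(similarity / T).

The defaults k = 20 and T = 0.07 follow DINO. `knn_predict` is a hypothetical helper, and the fish names are only example labels.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def knn_predict(query: Sequence[float], bank_feats: List[Sequence[float]],
                bank_labels: List[Hashable], k: int = 20,
                temperature: float = 0.07) -> Hashable:
    q = normalize(query)
    # Cosine similarity between the query and every stored training feature.
    scored = [(sum(a * b for a, b in zip(q, normalize(f))), label)
              for f, label in zip(bank_feats, bank_labels)]
    # Keep the k most similar neighbours.
    top_k = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    # Each neighbour votes for its label with weight exp(sim / T).
    votes: Dict[Hashable, float] = defaultdict(float)
    for sim, label in top_k:
        votes[label] += math.exp(sim / temperature)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    labels = ["tench", "tench", "goldfish"]
    print(knn_predict([0.1, 1.0], bank, labels, k=3))  # goldfish
```

The low temperature makes the closest neighbour dominate the vote. In the example, one very similar "goldfish" outweighs two weakly similar "tench" features.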

Fig 2. Comparison of linear probing accuracy on ImageNet-1K.

Pretraining Settings

Environment

To reproduce our results, install PyTorch and download the ImageNet dataset. This codebase was developed with Python 3.8, PyTorch 1.7.1, CUDA 11.0, and torchvision 0.8.2. For the full environment, see our Dockerfile.

ViT pretraining 🍺

To pretrain each model, use the exact hyper-parameter settings in the args column of Table 1. The training logs and linear probing logs are in the logs and eval logs columns of Table 1.
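The pretraining runs differ only in a handful of flags. One option is to keep the settings from the args column in one place and generate the launch command from them. This is a minimal sketch: `build_launch_command` and `VIT_SMALL_100EP` are hypothetical names, and the values are copied from this README's ViT-Small 100-epoch command. `shlex.join` requires Python 3.8+, which matches the stated environment.

```python
import shlex
from typing import Dict, List, Union

ArgValue = Union[str, int, float, bool]


def build_launch_command(nproc_per_node: int, args: Dict[str, ArgValue],
                         data_path: str = "DATASET_ROOT",
                         output_dir: str = "OUTPUT_ROOT") -> List[str]:
    cmd = ["python", "-m", "torch.distributed.launch",
           f"--nproc_per_node={nproc_per_node}", "main.py",
           "--data_path", data_path, "--output_dir", output_dir]
    for name, value in args.items():
        # This README passes booleans to main.py as the strings "true"/"false".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        cmd += [f"--{name}", text]
    return cmd


# ViT-Small, 100 epochs (values taken from the command in this README).
VIT_SMALL_100EP: Dict[str, ArgValue] = {
    "arch": "vit_small",
    "group_teacher_temp": 0.04,
    "group_warmup_teacher_temp_epochs": 0,
    "weight_decay_end": 0.2,
    "norm_last_layer": False,
    "epochs": 100,
}

if __name__ == "__main__":
    print(shlex.join(build_launch_command(8, VIT_SMALL_100EP)))
```

Returning an argument list rather than a string also lets you pass it straight to `subprocess.run` without shell quoting issues.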

ViT-Small pretraining:

To run ViT-Small for 100 epochs, we use two nodes with a total of 8 A100 GPUs (total minibatch size 512) and the following command:

python -m torch.distributed.launch --nproc_per_node=8 main.py --data_path DATASET_ROOT --output_dir OUTPUT_ROOT --arch vit_small \
--group_teacher_temp 0.04 --group_warmup_teacher_temp_epochs 0 --weight_decay_end 0.2 --norm_last_layer false --epochs 100

To run ViT-Small for 300 epochs, we use two nodes with a total of 16 A100 GPUs (total minibatch size 1024) and the following command:

python -m torch.distributed.launch --nproc_per_node=16 main.py --data_path DATASET_ROOT --output_dir OUTPUT_ROOT --arch vit_small \
--group_teacher_temp 0.07 --group_warmup_teacher_temp_epochs 30 --weight_decay_end 0.1 --norm_last_layer false --epochs 300
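The flag names `--group_teacher_temp` and `--group_warmup_teacher_temp_epochs` suggest a DINO-style teacher-temperature warmup. In that scheme, the temperature rises linearly over the warmup epochs and then stays at its final value. The sketch below assumes that behaviour and a starting temperature of 0.04; this README does not state the starting value. `teacher_temp_schedule` is a hypothetical helper, not the repository's code.

```python
from typing import List


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Pure-Python stand-in for numpy.linspace(start, stop, num), endpoint included."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1) if num > 1 else 0.0
    return [start + i * step for i in range(num)]


def teacher_temp_schedule(final_temp: float, warmup_epochs: int, total_epochs: int,
                          start_temp: float = 0.04) -> List[float]:
    # One value per epoch: linear warmup from start_temp to final_temp, then constant.
    warmup = linspace(start_temp, final_temp, warmup_epochs)
    return warmup + [final_temp] * (total_epochs - warmup_epochs)


if __name__ == "__main__":
    sched = teacher_temp_schedule(0.07, warmup_epochs=30, total_epochs=300)
    print(len(sched), sched[0], round(sched[29], 4), sched[-1])  # 300 0.04 0.07 0.07
```

Consider the 100-epoch ViT-Small settings (`--group_teacher_temp 0.04 --group_warmup_teacher_temp_epochs 0`). Under the same assumption, the temperature is simply 0.04 for every epoch.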

To run ViT-Small for 800 epochs, we use two nodes with a total of 16 A100 GPUs (total minibatch size 1024) and the following command:

python -m torch.distributed.launch --nproc_per_node=16 main.py --data_path DATASET_ROOT --output_dir OUTPUT_ROOT --arch vit_small \
--group_teacher_temp 0.07 --group_warmup_teacher_temp_epochs 30 --weight_decay_end 0.1 --norm_last_layer false --epochs 800

ViT-Base pretraining:

To run ViT-Base for 400 epochs, we use two nodes with a total of 24 A100 GPUs (total minibatch size 1024) and the following command:

python -m torch.distributed.launch --nproc_per_node=24 main.py --data_path DATASET_ROOT --output_dir OUTPUT_ROOT --arch vit_base \
--group_teacher_temp 0.07 --group_warmup_teacher_temp_epochs 50 --min_lr 2e-06 --weight_decay_end 0.1 --freeze_last_layer 3 \
--norm_last_layer false --epochs 400
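The ViT-Base and ViT-Large commands set `--freeze_last_layer 3`. Assume it behaves like the DINO flag of the same name. In that case, the gradients of the projection head's last layer are discarded for the first N epochs, so that layer is not updated early in training.

The sketch mimics this with a minimal stand-in parameter type so it runs without PyTorch. `Param` and `cancel_last_layer_gradients` are hypothetical names. Matching parameters by the substring `"last_layer"` is also an assumption about how the head is named.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Param:
    """Stand-in for torch.nn.Parameter: only the .grad attribute matters here."""
    grad: Optional[list] = None


def cancel_last_layer_gradients(epoch: int,
                                named_params: Iterable[Tuple[str, Param]],
                                freeze_last_layer: int) -> None:
    # Once the freeze window is over, leave every gradient untouched.
    if epoch >= freeze_last_layer:
        return
    for name, param in named_params:
        if "last_layer" in name:
            # In PyTorch, a parameter whose .grad is None is skipped by optimizer.step().
            param.grad = None


if __name__ == "__main__":
    params = {"head.last_layer.weight": Param([1.0]), "backbone.norm.weight": Param([2.0])}
    cancel_last_layer_gradients(0, params.items(), freeze_last_layer=3)
    print({name: p.grad for name, p in params.items()})
```

In a real training loop, this call would go after `loss.backward()` and before `optimizer.step()`, with `model.named_parameters()` as the iterable.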

ViT-Large pretraining:

To run ViT-Large for 250 epochs, we use two nodes with a total of 40 A100 GPUs (total minibatch size 640) and the following command:

python -m torch.distributed.launch --nproc_per_node=40 main.py --data_path DATASET_ROOT --output_dir OUTPUT_ROOT --arch vit_large \
--lr 0.0015 --min_lr 1.5e-4 --group_teacher_temp 0.07 --group_warmup_teacher_temp_epochs 50 --weight_decay 0.025 \
--weight_decay_end 0.08 --norm_last_layer true --drop_path_rate 0.3 --freeze_last_layer 3 --epochs 250
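The ViT-Large command sets both ends of two schedules:

- learning rate: `--lr 0.0015` → `--min_lr 1.5e-4`
- weight decay: `--weight_decay 0.025` → `--weight_decay_end 0.08`

This README does not describe the shape of these schedules. The sketch below assumes a DINO-style per-iteration cosine schedule with an optional linear warmup. The 10-epoch learning-rate warmup in the example is DINO's default and is an assumption here. `cosine_schedule` is a hypothetical helper.

```python
import math
from typing import List


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Pure-Python stand-in for numpy.linspace(start, stop, num), endpoint included."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1) if num > 1 else 0.0
    return [start + i * step for i in range(num)]


def cosine_schedule(base: float, final: float, epochs: int, iters_per_epoch: int,
                    warmup_epochs: int = 0, warmup_start: float = 0.0) -> List[float]:
    """Per-iteration values: linear warmup to `base`, then cosine from `base` to `final`."""
    warmup_iters = warmup_epochs * iters_per_epoch
    warmup = linspace(warmup_start, base, warmup_iters)
    n = epochs * iters_per_epoch - warmup_iters
    cosine = [final + 0.5 * (base - final) * (1 + math.cos(math.pi * i / n))
              for i in range(n)]
    return warmup + cosine


if __name__ == "__main__":
    iters = 10  # hypothetical; in training this would be the number of batches per epoch
    lr = cosine_schedule(0.0015, 1.5e-4, epochs=250, iters_per_epoch=iters, warmup_epochs=10)
    wd = cosine_schedule(0.025, 0.08, epochs=250, iters_per_epoch=iters)
    print(f"lr: {lr[0]:.2e} -> {lr[10 * iters]:.2e} -> {lr[-1]:.2e}")
    print(f"wd: {wd[0]:.3f} -> {wd[-1]:.3f}")
```

The same function handles both cases. Because the weight-decay end value is larger than its start, that schedule increases rather than decays.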

Evaluation

We are cleaning up the evaluation code and will release it when it is ready.

Self-attention visualization

Below are the self-attention maps of the [CLS] token for each head of the last layer.

Fig 3. Self-attention from a ViT-Base/16 trained with Mugs.
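Per-head [CLS] attention maps like those in Fig 3 can be derived as sketched below. The sketch assumes you already have the last block's attention weights as a nested list of shape [heads][1 + patches][1 + patches], with the [CLS] token at index 0. How to extract these weights depends on the backbone code and is not covered by this README.

For a ViT-B/16 at 224×224 resolution, the patch grid is 14×14 and there are 12 heads. `cls_attention_maps` is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def cls_attention_maps(attn: List[Matrix], grid_h: int, grid_w: int) -> List[Matrix]:
    """Turn per-head attention matrices into one grid_h x grid_w map per head."""
    maps = []
    for head in attn:
        # Row 0 is the [CLS] query; drop column 0 ([CLS] attending to itself).
        cls_to_patches = head[0][1:]
        if len(cls_to_patches) != grid_h * grid_w:
            raise ValueError("number of patch tokens does not match the grid size")
        # Patches are in row-major order, so consecutive chunks of grid_w form rows.
        maps.append([cls_to_patches[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)])
    return maps


if __name__ == "__main__":
    head = [[0.2, 0.1, 0.2, 0.3, 0.2]] + [[0.2] * 5 for _ in range(4)]
    print(cls_attention_maps([head], grid_h=2, grid_w=2))  # [[[0.1, 0.2], [0.3, 0.2]]]
```

Each cell of a map covers one 16×16-pixel patch. Upsampling a map by a factor of 16 therefore aligns it with the input image for overlay.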

t-SNE visualization

Below is the t-SNE visualization of the features learned by ViT-B/16. We show the fish classes in ImageNet-1K, i.e., the first six classes: tench, goldfish, white shark, tiger shark, hammerhead, and electric ray. See the Appendix for more examples.

Fig 4. t-SNE visualization of the features learned by ViT-B/16.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citation

If you find this repository useful, please consider giving it a star ⭐ and citing our paper 🍺:

@inproceedings{mugs2022SSL,
  title={Mugs: A Multi-Granular Self-Supervised Learning Framework},
  author={Pan Zhou and Yichen Zhou and Chenyang Si and Weihao Yu and Teck Khim Ng and Shuicheng Yan},
  booktitle={arXiv preprint arXiv:2203.14415},
  year={2022}
}

Comments

Please release the eval code

Please release the evaluation code as well. "Dirty code" is much better than no code. I'm finding it hard to parse the provided eval logs for ViT-L/16. Please just use the same eval procedure you used for ViT-B/16 and ViT-S/16 without too many tricks. I don't think anybody really cares about a ~0.1 improvement in eval accuracy obtained through hacking the lr schedule or whatever.

opened by eminorhan

torch.distributed.all_gather issue

Thanks for your work. I am trying to train ViT-Small on my own custom dataset with 219 object classes, but I get the error below. How can I fix it? My command: --nproc_per_node=1 main.py --data_path orchide --output_dir 2022_5_26_first --arch vit_small --group_teacher_temp 0.07 --group_warmup_teacher_temp_epochs 30 --weight_decay_end 0.1 --norm_last_layer false --epochs 800 --batch_size_per_gpu 8

Traceback (most recent call last):
  File "D:\1070712\SSL\mugs\main.py", line 813, in <module>
    train_mugs(args)
  File "D:\1070712\SSL\mugs\main.py", line 524, in train_mugs
    train_stats = train_one_epoch(
  File "D:\1070712\SSL\mugs\main.py", line 702, in train_one_epoch
    len_weak = student_mem._dequeue_and_enqueue(
  File "D:\1070712\SSL\mugs\src\model.py", line 381, in _dequeue_and_enqueue
    weak_num = self.block._dequeue_and_enqueue(query, weak_aug_flags)
  File "C:\Users\user\anaconda3\envs\SSL\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\1070712\SSL\mugs\src\model.py", line 301, in _dequeue_and_enqueue
    weak_aug_flags = concat_all_gather(weak_aug_flags)
  File "C:\Users\user\anaconda3\envs\SSL\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\1070712\SSL\mugs\src\model.py", line 605, in concat_all_gather
    torch.distributed.all_gather(tensors_gather, tensor, async_op=False)
  File "C:\Users\user\anaconda3\envs\SSL\lib\site-packages\torch\distributed\distributed_c10d.py", line 1870, in all_gather
    work.wait()
RuntimeError: Invalid scalar type

opened by Eren-Corn0712

Hyper-parameters for linear-probing

Thanks for your exciting work! I'm curious about the hyper-parameters you used to evaluate the pretrained 300-epoch ViT-S/16 with linear probing (e.g., batch size, learning rate, weight decay, optimizer, and data augmentation). I look forward to hearing from you.

opened by yanjk3
