PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Overview

Self-Supervised Vision Transformers with DINO

PyTorch implementation and pretrained models for DINO. For details, see Emerging Properties in Self-Supervised Vision Transformers.
[blogpost] [arXiv]

DINO illustration

Pretrained models

You can choose to download only the weights of the pretrained backbone used for downstream tasks, or the full checkpoint which contains backbone and projection head weights for both student and teacher networks. We also provide the training and evaluation logs.

arch        params   k-NN     linear   download
DeiT-S/16   21M      74.5%    77.0%    backbone only | full checkpoint | args | logs | eval logs
DeiT-S/8    21M      78.3%    79.7%    backbone only | full checkpoint | args | logs | eval logs
ViT-B/16    85M      76.1%    78.2%    backbone only | full checkpoint | args | logs | eval logs
ViT-B/8     85M      77.4%    80.1%    backbone only | full checkpoint | args | logs | eval logs
ResNet-50   23M      67.5%    75.3%    backbone only | full checkpoint | args | logs | eval logs

The pretrained models are available on PyTorch Hub.

import torch
deits16 = torch.hub.load('facebookresearch/dino', 'dino_deits16')
deits8 = torch.hub.load('facebookresearch/dino', 'dino_deits8')
vitb16 = torch.hub.load('facebookresearch/dino', 'dino_vitb16')
vitb8 = torch.hub.load('facebookresearch/dino', 'dino_vitb8')
resnet50 = torch.hub.load('facebookresearch/dino', 'dino_resnet50')
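
As a quick usage sketch, here is one way to extract features with a loaded backbone (the image path is hypothetical; the preprocessing uses the usual ImageNet statistics, and DINO backbones return the [CLS] embedding, e.g. 384-d for ViT-S/16):

import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet-style preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

model = torch.hub.load('facebookresearch/dino', 'dino_deits16')
model.eval()

img = preprocess(Image.open('my_image.jpg').convert('RGB')).unsqueeze(0)  # hypothetical image path
with torch.no_grad():
    features = model(img)  # [CLS] embedding, shape (1, 384) for ViT-S/16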

Training

Documentation

Please install PyTorch and download the ImageNet dataset. This codebase has been developed with Python 3.6, PyTorch 1.7.1, CUDA 11.0 and torchvision 0.8.2. The exact arguments to reproduce the models presented in our paper can be found in the args column of the pretrained models section. For a glimpse at the full documentation of DINO training, please run:

python main_dino.py --help

Vanilla DINO training 🦕

Run DINO with the DeiT-small network on a single node with 8 GPUs for 100 epochs with the following command. Training time is 1.75 days and the resulting checkpoint should reach ~69.3% on k-NN eval and ~73.8% on linear eval. We will shortly provide training and linear evaluation logs for this run to help reproducibility.

python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch deit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir
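
For intuition, here is a minimal sketch of the objective main_dino.py optimizes, as described in the paper: the student's output distribution is trained to match a centered, sharpened teacher distribution via cross-entropy, and the teacher is an exponential moving average (EMA) of the student. This is a deliberate simplification (one pair of views only; no multi-crop loop, schedules, or distributed centering), not the exact implementation:

import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, student_temp=0.1, teacher_temp=0.04):
    # Teacher: center, then sharpen with a low temperature; no gradient flows through it.
    t = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    # Student: log-softmax at a higher temperature, compared to the teacher via cross-entropy.
    s = F.log_softmax(student_out / student_temp, dim=-1)
    return -(t * s).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    # Teacher parameters track the student as an exponential moving average.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps.detach(), alpha=1 - momentum)

The center term is itself updated as an EMA of the teacher outputs over batches; centering and sharpening together are what prevent the trivial collapsed solution.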

Multi-node training

We use Slurm and submitit (pip install submitit). To train on 2 nodes with 8 GPUs each (total 16 GPUs):

python run_with_submitit.py --nodes 2 --ngpus 8 --arch deit_small --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir

DINO with the ViT-base network:

python run_with_submitit.py --nodes 2 --ngpus 8 --use_volta32 --arch vit_base --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir

Boosting DINO performance 🦖

You can improve the performance of the vanilla run by:

  • training for more epochs: --epochs 300,
  • increasing the teacher temperature: --teacher_temp 0.07 --warmup_teacher_temp_epochs 30,
  • removing last layer normalization (only safe with --arch deit_small): --norm_last_layer false.

Full command:
python run_with_submitit.py --arch deit_small --epochs 300 --teacher_temp 0.07 --warmup_teacher_temp_epochs 30 --norm_last_layer false --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir

The resulting pretrained model should reach ~73.4% on k-NN eval and ~76.1% on linear eval. Training time is 2.6 days with 16 GPUs. We will shortly provide training and linear evaluation logs for this run to help reproducibility.

ResNet-50 and other convnets trainings

This code also works for training DINO on convolutional networks, such as ResNet-50. In that case, we highly recommend adapting some optimization arguments. For example, here is a command to train DINO on ResNet-50 on a single node with 8 GPUs for 100 epochs:

python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch resnet50 --optimizer sgd --weight_decay 1e-4 --weight_decay_end 1e-4 --global_crops_scale 0.14 1 --local_crops_scale 0.05 0.14 --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir

Evaluation: k-NN classification on ImageNet

To evaluate a simple k-NN classifier with a single GPU on a pre-trained model, run:

python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py --data_path /path/to/imagenet

If you do not specify --pretrained_weights, the DINO reference weights are used by default. If you instead want to evaluate checkpoints from a run of your own, you can run, for example:

python -m torch.distributed.launch --nproc_per_node=1 eval_knn.py --pretrained_weights /path/to/checkpoint.pth --checkpoint_key teacher --data_path /path/to/imagenet 
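
For intuition, here is a minimal sketch of the weighted k-NN vote on frozen features, simplified from what eval_knn.py does (the actual script also handles distributed feature extraction, several values of k, and chunked similarity computation):

import torch

def knn_predict(train_features, train_labels, test_features, num_classes, k=20, T=0.07):
    # Features are assumed L2-normalized, so the dot product is a cosine similarity.
    sim = test_features @ train_features.t()          # (n_test, n_train)
    topk_sim, topk_idx = sim.topk(k, dim=1)
    topk_labels = train_labels[topk_idx]              # (n_test, k)
    weights = (topk_sim / T).exp()                    # temperature-weighted votes
    votes = torch.zeros(test_features.size(0), num_classes)
    votes.scatter_add_(1, topk_labels, weights)
    return votes.argmax(dim=1)                        # predicted class per test image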

Evaluation: Linear classification on ImageNet

To train a supervised linear classifier on frozen weights on a single node with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node=8 eval_linear.py --data_path /path/to/imagenet
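
Conceptually, eval_linear.py trains a single linear layer on top of frozen backbone features. Below is a minimal single-GPU sketch under that assumption (the actual script additionally concatenates outputs of the last n blocks for ViTs, uses SGD with a cosine schedule, and runs distributed):

import torch
import torch.nn as nn

backbone = torch.hub.load('facebookresearch/dino', 'dino_deits16')
backbone.eval()                    # frozen backbone: no gradient updates
for p in backbone.parameters():
    p.requires_grad = False

linear = nn.Linear(384, 1000)      # 384-d ViT-S/16 features -> 1000 ImageNet classes
optimizer = torch.optim.SGD(linear.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):    # images/labels come from an ImageNet DataLoader
    with torch.no_grad():
        feats = backbone(images)   # (batch, 384) [CLS] embeddings
    loss = criterion(linear(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()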

Self-attention visualization

You can look at the self-attention of the [CLS] token on the different heads of the last layer by running:

python visualize_attention.py
Self-attention from a Vision Transformer with 8x8 patches trained with DINO
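
Under the hood, the script takes the attention of the [CLS] token in the last block and reshapes it to the patch grid. Here is a rough sketch of that core step; note that the helper is named get_last_selfattention in recent revisions of this repo and forward_selfattention in older ones (as in the issue thread below):

import torch

model = torch.hub.load('facebookresearch/dino', 'dino_deits8')
model.eval()

patch_size = 8
img = torch.randn(1, 3, 480, 480)             # stand-in for a preprocessed image, dims divisible by patch_size
w_feat, h_feat = img.shape[-2] // patch_size, img.shape[-1] // patch_size

with torch.no_grad():
    attn = model.get_last_selfattention(img)  # (1, n_heads, n_tokens, n_tokens)

n_heads = attn.shape[1]
# Attention of the [CLS] token (index 0) to every patch token, one map per head.
cls_attn = attn[0, :, 0, 1:].reshape(n_heads, w_feat, h_feat)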

License

See the LICENSE file for more details.

Citation

If you find this repository useful, please consider giving it a star and a citation 🦖:

@article{caron2021emerging,
  title={Emerging Properties in Self-Supervised Vision Transformers},
  author={Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J\'egou, Herv\'e  and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand},
  journal={arXiv preprint arXiv:2104.14294},
  year={2021}
}
Comments
  • Error using visualize_attention.py. The size of tensor a (3234) must match the size of tensor b (3181) at non-singleton dimension 1

    Hi all, I am trying to execute visualize_attention.py with default pretrained weights on my own image as below

    !python visualize_attention.py --image_path 'test/finalImg_249.png'

    I get a size mismatch error. Could you please let me know what changes need to be made here?

    Error stack trace:

    Please use the --pretrained_weights argument to indicate the path of the checkpoint to evaluate. Since no pretrained weights have been provided, we load the reference pretrained DINO weights.

    /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3458: UserWarning: Default upsampling behavior when mode=bicubic is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.

    /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3503: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.

    Traceback (most recent call last):
      File "visualize_attention.py", line 162, in
        attentions = model.forward_selfattention(img.to(device))
      File "~/dino/vision_transformer.py", line 246, in forward_selfattention
        x = x + pos_embed
    RuntimeError: The size of tensor a (3234) must match the size of tensor b (3181) at non-singleton dimension 1

    Image details:

    import cv2
    img = cv2.imread('finalImg_249.png')
    print(img.shape)  # output: (427, 488, 3)

    opened by cishwarya 20
  • Error finetuning from pretrained checkpoint

    Hi all, I'm running into an error when trying to fine-tune from one of the pretrained checkpoints.

    Code

    !mkdir "$output"
    !wget -q -O "$output/checkpoint.pth" https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_pretrain.pth
    
    !python -m torch.distributed.launch \
      --nproc_per_node=1 ./dino/main_dino.py \
      --arch deit_small \
      --data_path "$input" \
      --output_dir "$output"
    

    Error

    | distributed init (rank 0): env://
    git:
      sha: 8aa93fdc90eae4b183c4e3c005174a9f634ecfbf, status: clean, branch: main
    
    arch: deit_small
    batch_size_per_gpu: 64
    ...
    ...
    Student and Teacher are built: they are both deit_small network.
    Loss, optimizer and schedulers ready.
    Found checkpoint at ./drive/MyDrive/DINO/checkpoint.pth
    => failed to load student from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
    => failed to load teacher from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
    => failed to load optimizer from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
    => failed to load fp16_scaler from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
    => failed to load dino_loss from checkpoint './drive/MyDrive/DINO/checkpoint.pth'
    

    Any suggestions would be very much appreciated.

    opened by yadamonk 12
  • Loss not dropping on custom dataset :(

    Hi, thanks for the wonderful work, @mathildecaron31! The reported video is inspiring :D I am experimenting with a custom dataset. The thing is, it's totally okay to train the vision transformer (deit_small) in a supervised manner and the loss drops fine. I even managed to apply visualize_attention.py to see heatmaps for a separately trained ViT. But when I switch to the self-supervised DINO setup, there is almost no change in the loss during training. Do you have any idea why this could happen, or possible solutions? Thanks!

    I am attaching a screenshot from training and the arguments I used for the training script.

    loss-stop

    arch ='deit_small'
    patch_size = 16
    out_dim = 10000 # default 65536
    norm_last_layer = False
    momentum_teacher = 0.996 # check this according to batch_size
    bsize = 256 #####
    use_bn_in_head = False
    warmup_teacher_temp = 0.0005 # less if does not decrease, default 0.04
    teacher_temp = 0.3 # increase if needed, default: 0.04
    warmup_teacher_temp_epochs = 0 # default 30 to warmup
    use_fp16 = False # disable if loss is unstable, default: True
    weight_decay = 0.04 # a smaller value works well
    weight_decay_end = 0.4 # final value of weight decay
    clip_grad = 3.0 # max parameter gradient norm, 0 for disabling # default, 3.0
    batch_size_per_gpu = 256 # reduce this if not fit, default 64
    epochs = 100
    freeze_last_layer = 5 # default 1, Try increasing this value if the loss does not decrease.
    lr = 0.005 #linear with batch size scaled, for ref of 256, def 0.0005
    warmup_epochs = 0 #linear warmup def 10
    min_lr = 1e-6 # target lr at the optimization
    optimizer = 'sgd' # def: adamw
    global_crops_scale = (0.4, 1.)
    local_crops_number = 8 # local small views
    local_crops_scale = (0.05, 0.4) # def (0.05, 0.4)
    data_path = train_dataset_dir #
    output_dir = "./dirlog"
    saveckp_freq = 20
    seed = 0 # random seed
    num_workers = 40 #def:10
    dist_url = "env://"
    local_rank = 0
    device_ids = [0, 1, 2, 3, 4, 5] # use 6 gpus
    
    opened by tuttelikz 11
  • `interpolate_pos_encoding(x, pos_embed)` doesn't return the correct dimension for images that are not square (w != h)

    I noticed that the generation of the positional embedding in the interpolate_pos_encoding method is slightly different from the one in the forward_selfattention method. The following simple modification brings the two into agreement, in case it is of interest:

        def interpolate_pos_encoding(self, x, pos_embed, w, h):  # passing w and h as arguments
            npatch = x.shape[1] - 1
            N = pos_embed.shape[1] - 1
            if npatch == N:
                return pos_embed
            class_emb = pos_embed[:, 0]
            pos_embed = pos_embed[:, 1:]
            dim = x.shape[-1]
            w0 = w // self.patch_embed.patch_size  # just copy paste from forward_selfattention
            h0 = h // self.patch_embed.patch_size
            pos_embed = nn.functional.interpolate(
                pos_embed.reshape(1, int(math.sqrt(N)), int(math.sqrt(N)), dim).permute(0, 3, 1, 2),
                scale_factor=(w0 / math.sqrt(N), h0 / math.sqrt(N)),  # replace math.sqrt(npatch / N) with one from forward_selfattention
                mode='bicubic',
            )
            pos_embed = pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)
            return torch.cat((class_emb.unsqueeze(0), pos_embed), dim=1)
    
    opened by enverfakhan 11
  • model collapse after a few steps

    I use custom data to train DINO, and the model seems to collapse after a few steps; the features appear to become uniform. I used a larger teacher temperature to enhance "sharpening", but the model collapsed all the same. I wonder if DINO is sensitive to the data; in other words, does DINO tend to collapse when trained on different data?

    opened by Doom9234 8
  • Onnx pretrained

    Your work looks very interesting. I'm not familiar with PyTorch / Python, and it would be great if the pre-trained nets could be provided in ONNX format.

    Regards Armin

    opened by Armin234 8
  • Training/Transferring on CIFAR10

    Hi

    Thanks for your nice work. I wonder if you can share the hyperparameters for transfer learning on CIFAR10. Have you succeeded in training on CIFAR10 from scratch, without transferring? If so, would you also kindly share the hyperparameters for that?

    opened by nimaous 7
  • Hello, what would I need to do to apply it to a 3D medical imaging setting?

    Hello, I would like to use your algorithm for the 3d setting (magnetic resonance imaging of the prostate gland). I have only image-level labels, and your algorithm seems very interesting. What would I need to do to adapt it for a 3-dimensional setting?

    opened by jakubMitura14 6
  • Scaling up DINO to larger model size

    Hi @mathildecaron31, I'm considering scaling DINO to a larger model size, e.g., ViT-L/16. I used almost the same parameters as ViT-B/16 and pre-trained DINO for 400 epochs, but the k-NN and linear probing accuracies are ~1% and ~2% worse than the base-size model, respectively. Do you have any related experience with that? Thanks for your help!

    opened by shallowtoil 6
  • knn_eval() with resnet50 has missing keys in state_dict

    While the fc layer is not needed when extracting features from ResNet50, the following command

    $ python eval_knn.py --dump_features resnet50_features --arch resnet50 --data_path imagenet1k_folder
    

    generates this error:

    RuntimeError: Error(s) in loading state_dict for ResNet:
            Missing key(s) in state_dict: "fc.weight", "fc.bias".
    

    Here is the complete output:

    Will run the code on one GPU.
    | distributed init (rank 0): env://
    fatal: Not a git repository (or any parent up to mount point /home)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    git:
      sha: N/A, status: clean, branch: N/A
    
    arch: resnet50
    batch_size_per_gpu: 128
    checkpoint_key: teacher
    data_path: imagenet1k_folder
    dist_url: env://
    dump_features: resnet50_features
    gpu: 0
    load_features: None
    local_rank: 0
    nb_knn: [10, 20, 100, 200]
    num_workers: 10
    patch_size: 16
    pretrained_weights: 
    rank: 0
    temperature: 0.07
    use_cuda: True
    world_size: 1
    /home/user/anaconda3/envs/vissl/lib/python3.7/site-packages/torchvision/transforms/transforms.py:258: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
      "Argument interpolation should be of type InterpolationMode instead of int. "
    /home/user/anaconda3/envs/vissl/lib/python3.7/site-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 10 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      cpuset_checked))
    Data loaded with 1281167 train and 50000 val imgs.
    Please use the `--pretrained_weights` argument to indicate the path of the checkpoint to evaluate.
    Since no pretrained weights have been provided, we load the reference pretrained DINO weights.
    Traceback (most recent call last):
      File "eval_knn.py", line 227, in <module>
        train_features, test_features, train_labels, test_labels = extract_feature_pipeline(args)
      File "eval_knn.py", line 70, in extract_feature_pipeline
        utils.load_pretrained_weights(model, args.pretrained_weights, args.checkpoint_key, args.arch, args.patch_size)
      File "/home/user/codes/dino-main/utils.py", line 107, in load_pretrained_weights
        model.load_state_dict(state_dict, strict=True)
      File "/home/user/anaconda3/envs/vissl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for ResNet:
            Missing key(s) in state_dict: "fc.weight", "fc.bias". 
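
    For anyone hitting this, one common workaround (not an official fix; the local path below is hypothetical) is to replace the classification head before loading, since the DINO ResNet-50 checkpoint ships backbone weights only:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    model = resnet50()
    model.fc = nn.Identity()    # the DINO checkpoint carries no fc.weight / fc.bias
    state_dict = torch.load('dino_resnet50_pretrain.pth', map_location='cpu')  # hypothetical local path
    msg = model.load_state_dict(state_dict, strict=False)
    print(msg)                  # ideally reports no missing or unexpected keys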
    
    opened by mgpadalkar 6
  • Linear evaluation weight problems ("_IncompatibleKeys" and access issues)

    Hello, thanks for all the amazing work you put into this. I tried downloading the pretrained weights for the 3 available ViT models and have encountered some issues:

    1. ViT-S/16 gives me an "Access denied" message whenever I try downloading it.
    2. For ViT-S/8 or ViT-B/16, their weights seem to be corrupted? Whenever I try loading them into eval_linear.py, I get a message listing a long list of missing and unexpected keys. Not exactly sure what is wrong here. The loss also starts at quite a high value, and although it's dropping, I don't think this is the intended behavior for a pretrained model.

    Here's the training output (not the log) for ViT-S/8 TrainingOutput.txt

    Thank you again for your work.

    opened by KnockerPulsar 6
  • Supervised Fine-Tuning of Teacher / Student Transformer Weights

    I used DINO to do self-supervised pre-training of a Small ViT on a dataset I have. Now I wanted to fine-tune the model on another dataset in a supervised way.

    I know that, in a way, the code in eval_linear.py allows us to do that, but - as far as I was able to tell - it only updates the weights of the Linear model built on top of the representations generated by the pre-trained Transformer.

    So my question is: has anyone tried to perform supervised fine-tuning in a way that the weights of the Teacher or Student Transformers are updated as well?

    PS: I realize this might not be the ideal place to ask this question, since it sort of falls out of the DINO jurisdiction, but I figured it was worth a try.

    Thanks for the amazing work you guys did, and for sharing it with us.

    opened by MatCorr 0
  • about the DINO training loss

    Hello, I'm training resnet18 on a custom dataset. It's been running for some time with a batch size of 325 (the max my GPU can handle). The thing is, the loss is flat and not getting better or worse. Is this behavior normal? And if so, how do you decide when to stop the training?

    opened by Faisal-Hajari 0
  • How small batch sizes affect performance

    Hi, thanks for your hard work. I am retraining DINO with my own custom dataset (~570k images).

    1. On my local computer, the maximum batch size is 32 (1 GPU, RTX 3080 Ti) and a single epoch takes around 1 hour 20 minutes to complete. Is this normal?
    2. Does a small batch size matter for performance?

    Thank you!

    opened by bryanwong17 0
  • Number of classes

    Hello, thank you for your work! I have a small dataset containing only one class of instances (trees). I looked in the code and it seems like the number of classes in the VisionTransformer is always zero (num_classes=0). Is this normal? I am not sure I understand the difference between num_classes and the num_labels used in the eval script.

    Thank you

    opened by VGrondin 0
  • dead code in video_generation.py

    The thresholded attention maps computed in this block of code aren't being used anywhere else as far as I can tell, so this section seems to just waste computation and memory: https://github.com/facebookresearch/dino/blob/main/video_generation.py#L197

    opened by eminorhan 0
  • 🦕 Created a DINO fork with wandb.ai support and fixes for other bugs

    Hi! Since the DINO model is no longer supported by the developers and the existing problems are not being solved, here is my fork of the DINO model, where I fix #160 and a problem with the environment variable and directory handling in the get_shared_folder() function. I use this and all seems fine! You can submit your changes in a pull request.

    opened by MikeMACintosh 0
Owner: Facebook Research