PyTorch implementation of SwAV (Swapping Assignments between Views)

Meta Research

Last update: Jan 4, 2023


Overview

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

This code provides a PyTorch implementation and pretrained models for SwAV (Swapping Assignments between Views), as described in the paper Unsupervised Learning of Visual Features by Contrasting Cluster Assignments.

SwAV is an efficient and simple method for pre-training convnets without using annotations. Like contrastive approaches, SwAV learns representations by comparing transformations of an image, but unlike contrastive methods, it does not require computing pairwise feature comparisons. This makes our framework more efficient, since it does not require a large memory bank or an auxiliary momentum network. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or “views”) of the same image, instead of comparing features directly. Simply put, we use a “swapped” prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data.
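The swapped prediction can be illustrated with a minimal pure-Python sketch for a single pair of views. It is not the repository's implementation: the function names are hypothetical, the temperature of 0.1 is an assumed value, and the cluster assignments ("codes") are written by hand as one-hot vectors instead of being computed by the clustering step.

```python
import math

def softmax(scores, temperature):
    """Softmax of scores / temperature, shifted by the max for stability."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(code, probs):
    """-sum_k q_k * log(p_k) for a target code q and a prediction p."""
    return -sum(q * math.log(p) for q, p in zip(code, probs) if q > 0)

def swapped_loss(scores_t, scores_s, code_t, code_s, temperature=0.1):
    """Predict the code of view s from view t, and the code of view t from view s."""
    p_t = softmax(scores_t, temperature)
    p_s = softmax(scores_s, temperature)
    return cross_entropy(code_s, p_t) + cross_entropy(code_t, p_s)

# Toy prototype scores (feature-prototype similarities) for two views, K = 3.
scores_t = [0.9, 0.1, -0.2]
scores_s = [0.8, 0.0, 0.1]

# Hand-written codes (assumption): both views assigned to prototype 0 ...
agree = swapped_loss(scores_t, scores_s, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
# ... versus both views assigned to prototype 1, which the scores do not support.
disagree = swapped_loss(scores_t, scores_s, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0])
```

The loss is small when each view's scores predict the other view's code (`agree`) and large otherwise (`disagree`); no feature of one view is ever compared directly with a feature of the other.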

Model Zoo

We release several models pre-trained with SwAV in the hope that other researchers might also benefit from replacing the ImageNet supervised network with a SwAV backbone. To load our best SwAV pre-trained ResNet-50 model, simply do:

import torch
model = torch.hub.load('facebookresearch/swav:main', 'resnet50')

We provide several baseline SwAV pre-trained models with ResNet-50 architecture in torchvision format. We also provide models pre-trained with DeepCluster-v2 and SeLa-v2 obtained by applying improvements from the self-supervised community to DeepCluster and SeLa (see details in the appendix of our paper).

method	epochs	batch-size	multi-crop	ImageNet top-1 acc.	url	args
SwAV	800	4096	2x224 + 6x96	75.3	model	script
SwAV	400	4096	2x224 + 6x96	74.6	model	script
SwAV	200	4096	2x224 + 6x96	73.9	model	script
SwAV	100	4096	2x224 + 6x96	72.1	model	script
SwAV	200	256	2x224 + 6x96	72.7	model	script
SwAV	400	256	2x224 + 6x96	74.3	model	script
SwAV	400	4096	2x224	70.1	model	script
DeepCluster-v2	800	4096	2x224 + 6x96	75.2	model	script
DeepCluster-v2	400	4096	2x160 + 4x96	74.3	model	script
DeepCluster-v2	400	4096	2x224	70.2	model	script
SeLa-v2	400	4096	2x160 + 4x96	71.8	model	-
SeLa-v2	400	4096	2x224	67.2	model	-

Larger architectures

We provide SwAV models with ResNet-50 networks where we multiply the width by a factor ×2, ×4, and ×5. To load the corresponding backbone you can use:

import torch
rn50w2 = torch.hub.load('facebookresearch/swav:main', 'resnet50w2')
rn50w4 = torch.hub.load('facebookresearch/swav:main', 'resnet50w4')
rn50w5 = torch.hub.load('facebookresearch/swav:main', 'resnet50w5')

network	parameters	epochs	ImageNet top-1 acc.	url	args
RN50-w2	94M	400	77.3	model	script
RN50-w4	375M	400	77.9	model	script
RN50-w5	586M	400	78.5	model	-

Running times

We provide the running times for some of our runs:

method	batch-size	multi-crop	scripts	time per epoch
SwAV	4096	2x224 + 6x96	* * * *	3min40s
SwAV	256	2x224 + 6x96	* *	52min10s
DeepCluster-v2	4096	2x160 + 4x96	*	3min13s

Running SwAV unsupervised training

Requirements

Python 3.6
PyTorch 1.4.0
torchvision
CUDA 10.1
Apex with CUDA extension (see how I installed apex)
Other dependencies: scipy, pandas, numpy

Single-node training

SwAV is very simple to implement and experiment with. Our implementation consists of a main_swav.py file, which imports the dataset definition src/multicropdataset.py, the model architecture src/resnet50.py and some miscellaneous training utilities src/utils.py.

For example, to train the SwAV baseline on a single node with 8 GPUs for 400 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 main_swav.py \
--data_path /path/to/imagenet/train \
--epochs 400 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 32 \
--size_crops 224 96 \
--nmb_crops 2 6 \
--min_scale_crops 0.14 0.05 \
--max_scale_crops 1. 0.14 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 3840 \
--epoch_queue_starts 15
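The numbers in this command are easier to read once the per-GPU batch size is converted to a total batch size. The arithmetic below is a sketch under two assumptions that the command itself does not state: that --batch_size is the per-GPU batch size, and that the training set is ImageNet-1k with 1,281,167 images.

```python
import math

# Assumption: ImageNet-1k training set size.
IMAGENET_TRAIN_IMAGES = 1_281_167

gpus = 8                  # --nproc_per_node=8
batch_size_per_gpu = 32   # --batch_size 32 (assumed to be per GPU)
queue_length = 3840       # --queue_length 3840
freeze_niters = 5005      # --freeze_prototypes_niters 5005

total_batch_size = gpus * batch_size_per_gpu
iters_per_epoch = math.ceil(IMAGENET_TRAIN_IMAGES / total_batch_size)
queue_in_batches = queue_length // total_batch_size
frozen_epochs = freeze_niters / iters_per_epoch

print(total_batch_size, iters_per_epoch, queue_in_batches, frozen_epochs)
```

Under these assumptions the total batch size is 256 (matching the batch-size 256 rows of the model zoo table), 5005 iterations correspond to about one epoch, and the queue holds the features of 15 past batches.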

Multi-node training

Distributed training is available via Slurm. We provide several SBATCH scripts to reproduce our SwAV models. For example, to train SwAV on 8 nodes and 64 GPUs with a batch size of 4096 for 800 epochs, run:

sbatch ./scripts/swav_800ep_pretrain.sh

Note that you might need to remove the copyright header from the sbatch file to launch it.

Set up the dist_url parameter: We refer the user to the PyTorch distributed documentation (env or file or tcp) for setting the distributed initialization method (parameter dist_url) correctly. In the provided sbatch files, we use the tcp init method (see * for example).

Evaluating models

Evaluate models: Linear classification on ImageNet

To train a supervised linear classifier on frozen features/weights on a single node with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node=8 eval_linear.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar

The resulting linear classifier can be downloaded here.

Evaluate models: Semi-supervised learning on ImageNet

To reproduce our results and fine-tune a network with 1% or 10% of ImageNet labels on a single node with 8 GPUs, run:

10% labels

python -m torch.distributed.launch --nproc_per_node=8 eval_semisup.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar \
--labels_perc "10" \
--lr 0.01 \
--lr_last_layer 0.2

1% labels

python -m torch.distributed.launch --nproc_per_node=8 eval_semisup.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar \
--labels_perc "1" \
--lr 0.02 \
--lr_last_layer 5

Evaluate models: Transferring to Detection with DETR

DETR is a recent object detection framework that reaches competitive performance with Faster R-CNN while being conceptually simpler and trainable end-to-end. We evaluate our SwAV ResNet-50 backbone on object detection on the COCO dataset using the DETR framework with full fine-tuning. Here are the instructions for reproducing our experiments:

Install detr and prepare COCO dataset following these instructions.
Apply the changes highlighted in this gist to detr backbone file in order to load SwAV backbone instead of ImageNet supervised weights.
Launch training from detr repository with run_with_submitit.py.

python run_with_submitit.py --batch_size 4 --nodes 2 --lr_backbone 5e-5

Common Issues

For help or issues using SwAV, please submit a GitHub issue.

The loss does not decrease and is stuck at ln(nmb_prototypes) (8.006 for 3000 prototypes).

It sometimes happens that the system collapses at the beginning and does not manage to converge. We have found the following empirical workarounds to improve convergence and avoid collapsing at the beginning:

use a lower epsilon value (--epsilon 0.03 instead of the default 0.05)
carefully tune the hyper-parameters
freeze the prototypes during first iterations (freeze_prototypes_niters argument)
switch to hard assignment
remove batch-normalization layer from the projection head
reduce the difficulty of the problem (fewer crops or softer data augmentation)

We now analyze the collapsing problem: it happens when all examples are mapped to the same unique representation. In other words, the convnet always has the same output regardless of its input: it is a constant function. All examples get the same cluster assignment because they are identical, and the only valid assignment that satisfies the equipartition constraint in this case is the uniform assignment (1/K where K is the number of prototypes). In turn, this uniform assignment is trivial to predict since it is the same for all examples. Reducing the epsilon parameter (see Eq(3) of our paper) encourages the assignments Q to be sharper (i.e. less uniform), which strongly helps avoid collapse. However, using too low a value for epsilon may lead to numerical instability.
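The analysis above can be checked numerically. The following is a simplified, single-process, pure-Python sketch of a Sinkhorn-style normalization, not the repository's distributed_sinkhorn; the function name, the toy scores and the choice of three iterations are assumptions made for illustration.

```python
import math

def sinkhorn(scores, epsilon, n_iters=3):
    """scores: list of B rows with K prototype scores each. Returns B x K codes."""
    n_samples, n_protos = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # shift for numerical stability
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Equipartition: every prototype (column) receives total mass 1/K.
        for k in range(n_protos):
            col_sum = sum(q[b][k] for b in range(n_samples))
            for b in range(n_samples):
                q[b][k] /= col_sum * n_protos
        # Every sample (row) receives total mass 1/B.
        for b in range(n_samples):
            row_sum = sum(q[b])
            q[b] = [v / (row_sum * n_samples) for v in q[b]]
    # Rescale so that each row is a distribution over prototypes.
    return [[v * n_samples for v in row] for row in q]

# Cross-entropy between two uniform distributions over K = 3000 prototypes.
loss_at_collapse = math.log(3000)

# Collapsed network: every sample produces the same scores.
collapsed_codes = sinkhorn([[0.5, 0.2, -0.1]] * 4, epsilon=0.05)

# Non-collapsed toy case: a lower epsilon gives sharper codes.
healthy_scores = [[0.6, 0.4], [0.4, 0.6]]
peak_eps_005 = sinkhorn(healthy_scores, epsilon=0.05)[0][0]
peak_eps_003 = sinkhorn(healthy_scores, epsilon=0.03)[0][0]
```

With identical inputs, every entry of `collapsed_codes` is 1/K, and `loss_at_collapse` reproduces the plateau value of about 8.006 quoted above. On the non-collapsed toy scores, the largest code entry moves closer to 1 when epsilon is lowered from 0.05 to 0.03.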

Training gets unstable when using the queue.

The queue is composed of feature representations from the previous batches. These lines discard the oldest feature representations from the queue and save the newest one (i.e. from the current batch) through a round-robin mechanism. This way, the assignment problem is performed on more samples: without the queue we assign B examples to num_prototypes clusters where B is the total batch size while with the queue we assign (B + queue_length) examples to num_prototypes clusters. This is especially useful when working with small batches because it improves the precision of the assignment.
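A minimal sketch of this round-robin update, using plain Python lists instead of tensors. The helper name and the toy sizes are hypothetical; it assumes the queue length is a multiple of the batch size and that each integer stands in for one feature vector.

```python
def push_to_queue(queue, batch_features):
    """Put the newest batch at the front and drop the oldest entries at the back."""
    batch_size = len(batch_features)
    return list(batch_features) + queue[:-batch_size]

queue_length = 6
batch_size = 2

queue = [None] * queue_length            # None marks slots that are not filled yet
for step in range(4):
    batch = [2 * step, 2 * step + 1]     # stand-ins for the features of one batch
    queue = push_to_queue(queue, batch)

queue_is_full = queue[-1] is not None
# Once the queue is full, the assignment is computed on batch + queue samples.
samples_in_assignment = batch_size + queue_length
```

After four steps the queue holds the three most recent batches, newest first, and the oldest batch has been discarded. Checking the last slot to decide whether the queue is full mirrors the idea of only using the queue once it has been filled.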

If you start using the queue too early or if you use too large a queue, this can considerably disturb training, because the queue members are too inconsistent. After introducing the queue, the loss should be lower than it was without the queue. On the following loss curve (first 30 epochs of this script), we introduced the queue at epoch 15. We observe that this made the loss decrease further.

SwAV training loss batch_size=256 during the first 30 epochs

If, when introducing the queue, the loss goes up and does not decrease afterwards, you should stop your training and change the queue parameters. We recommend (i) using a smaller queue, (ii) starting the queue later in training.

License

See the LICENSE file for more details.

Citation

If you find this repository useful in your research, please cite:

@inproceedings{caron2020unsupervised,
  title={Unsupervised Learning of Visual Features by Contrasting Cluster Assignments},
  author={Caron, Mathilde and Misra, Ishan and Mairal, Julien and Goyal, Priya and Bojanowski, Piotr and Joulin, Armand},
  booktitle={Proceedings of Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

Comments

Benchmarking on CIFAR-10
Hi,

I wanted to benchmark SwAV on CIFAR-10. Is there any recommended configuration for CIFAR-10? For example:

The number of prototypes could be set to 50, 100 etc.

Since CIFAR-10 images are 32x32, multicrop can be avoided.

Also, do you plan to publish any pretrained model on CIFAR-10?
opened by abhinavagarwalla 26
TypeError: optimizers must be either a single optimizer or a list of optimizers.

Hello,

I'm trying to run main_swav.py with the following command:

python -m torch.distributed.launch --nproc_per_node=1 main_swav.py --images_path=<path to data directory> --train_annotations_path <path to data file> --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15

Some of those parameters have been added to accommodate our data. The only changes I have made to the code are minor changes to the dataset and additional/changed arguments. When I run this command I get the following error:

Traceback (most recent call last):
  File "main_swav.py", line 380, in <module>
    main()
  File "main_swav.py", line 189, in main
    model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O1")
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 158, in _initialize
    raise TypeError("optimizers must be either a single optimizer or a list of optimizers.")
TypeError: optimizers must be either a single optimizer or a list of optimizers.

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--images_path=/data/computer_vision_projects/rare_planes/classification_data/images/', '--train_annotations_path', '/data/computer_vision_projects/rare_planes/classification_data/annotations/instances_train_role_mislabel_category_id_033_chipped.json', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.
make: *** [Makefile:69: train-rare-planes] Error 1

Immediately before the line that throws the error I placed a couple print statements:

print("type(OPTIMIZER)", type(optimizer))
print("OPTIMIZER", optimizer)

The output from those is:

type(OPTIMIZER) <class 'apex.parallel.LARC.LARC'>
OPTIMIZER SGD (
Parameter Group 0
    dampening: 0
    lr: 0.6
    momentum: 0.9
    nesterov: False
    weight_decay: 1e-06
)

Here are some version numbers I'm using:

Python 3.6.9 :: Anaconda, Inc.
PyTorch == 1.5.0a0+8f84ded
torchvision == 0.6.0a0
CUDA == 10.2
apex == 0.1

Any ideas why I would be seeing this error? Thanks in advance!

opened by guerriep 18
Training loss history

Hi,

Thank you so much for sharing your code!

May I know if you have a copy of your loss record?

When I trained your model from scratch, the loss was stuck at around 8 for the first 2 epochs. (I am still training the model.)

Is it the same for you?

Thank you.

opened by alibabadoufu 14
How can I use single node and single GPU

Dear authors, I would like to use the single GPU on my personal computer to test your code. Can you explain how to reproduce the training or test configuration?

opened by marcomameli1992 11
num_prototype

Hi, thanks for your excellent work! I have a question about num_prototype in DeepCluster-v2. What does num_prototype mean? Why can num_prototype be larger than class_num? Thanks!

opened by txw1997 8
Why was faiss not used for DC2?

Hello,

Absolutely fascinating work! I was looking into how DC2 improves upon the original DC, and noticed that DC2 implementation does not use faiss for clustering. May I know why this choice was made?

Thank you.

opened by yutaizhou 6
The learning rate of linear classification

Thanks for your awesome work. I wonder why the learning rate is so small in linear classification (0.3 in eval_linear.py). In the linear classification of MoCo, the initial learning rate is 30 with a two-stage reduction. There is a 100x difference with this repo. Have you ever run eval_linear.py with MoCo v2 weights, or run SwAV weights with the code from MoCo? I wonder about the performance impact of the learning rate.

opened by dddzg 6

How is it ensured that only full resolution views are used for code computation?

Referring to this section of the paper:

In the code, this part is supposedly handled with crops_for_assign:

for i, crop_id in enumerate(args.crops_for_assign):
    with torch.no_grad():
        out = output[bs * crop_id: bs * (crop_id + 1)]

        # time to use the queue
        if queue is not None:
            if use_the_queue or not torch.all(queue[i, -1, :] == 0):
                use_the_queue = True
                out = torch.cat((torch.mm(
                    queue[i],
                    model.module.prototypes.weight.t()
                ), out))
            # fill the queue
            queue[i, bs:] = queue[i, :-bs].clone()
            queue[i, :bs] = embedding[crop_id * bs: (crop_id + 1) * bs]
        # get assignments
        q = torch.exp(out / args.epsilon).t()
        q = distributed_sinkhorn(q, args.sinkhorn_iterations)[-bs:]

I am not sure how this indexing out = output[bs * crop_id: bs * (crop_id + 1)] ensures we are only operating on full resolution views (224/160)?
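One possible reading of this indexing, sketched with plain Python lists. It rests on assumptions that the snippet above does not confirm: that the model output stacks the crops along the batch dimension in crop order, that the high-resolution crops come first (as in --size_crops 224 96 with --nmb_crops 2 6), and that crops_for_assign is [0, 1].

```python
size_crops = [224, 96]   # as in --size_crops 224 96
nmb_crops = [2, 6]       # as in --nmb_crops 2 6
bs = 4                   # toy per-GPU batch size

# Resolution of each crop id, assuming high-resolution crops come first.
crop_sizes = [size for size, count in zip(size_crops, nmb_crops) for _ in range(count)]

# Stand-in for `output`: one row per (crop, sample), crops stacked in order.
output_rows = [(crop_id, sample_id)
               for crop_id in range(len(crop_sizes))
               for sample_id in range(bs)]

crops_for_assign = [0, 1]   # assumed value
selected = {crop_id: output_rows[bs * crop_id: bs * (crop_id + 1)]
            for crop_id in crops_for_assign}
selected_sizes = {crop_sizes[crop_id] for crop_id in crops_for_assign}
```

Under these assumptions, the slice bs * crop_id: bs * (crop_id + 1) picks exactly the rows of one crop, so restricting crop_id to the ids of the first two crops restricts the code computation to the 224-pixel views.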

opened by sayakpaul 6

Empty clusters?
Hi @mathildecaron31

I trained a network from scratch with my own dataset and wrote some code that sorts images into different folders according to their cluster assignments. I did this with the following lines of code:

embedding, output = model(inputs)
p = softmax(output / args.temperature)
prediction = p.tolist()
prototyp = []
for i in range(len(prediction)):
    prototyp.append(np.argmax(prediction[i]))

The problem is that when I save the images in different folders according to their cluster assignment, some folders remain empty. The number of folders is the same as the number of prototypes. I always thought that the images are equally distributed between the different prototypes. What is the problem? Can you help me?
opened by daniiki 5
How to perform multinode training with torch.distributed.launch?
Hi, nice work! I tried to do pretraining with main_swav.py on multiple machines.

Here's the main code for distributed training.

python -m torch.distributed.launch main_swav.py --rank 0 \
--world_size 8 \
--dist_url 'tcp://172.31.11.200:23456' \

I commented out lines 55-59 in src/utils.py in order to set the rank for each machine. It runs fine.

But I found that during training, on each machine, only 1 GPU was used. I think it is caused by https://github.com/facebookresearch/swav/blob/77f718581585ae99187d0bdf01899e2de1085c38/src/utils.py#L68. Could you help me figure it out?

Many thanks!
opened by fangruizhu 5
About process_group in SyncBN

Hi,

I noticed that you adopted 8 GPUs as a group in SyncBN (https://github.com/facebookresearch/swav/blob/master/main_swav.py#L158) when training with a large batch size of 4096, i.e. 512 training samples in a group for sync batchnorm. I am wondering 1) why you don't use global SyncBN for training and 2) how much this affects the results.

Thanks!

opened by yxgeee 5
Exchanging k-means Clustering with Spectral Clustering

Hello everyone, For my master thesis I exchanged k-means Clustering with Spectral Clustering for computing the pseudo-labels (lines 396-493). Unfortunately, I only tested it on a dataset of 1000 images, but the first results seemed promising. Feel free to contact me :)

Kind regards, Björn

opened by BjornBecker 0

question of loading pre-trained model in eval_linear.py and eval_semisup.py

Hi, I am using eval_linear.py and eval_semisup.py for tasks on my own dataset. When I try to load the pre-trained model from here, eval_semisup.py gives the following message:

INFO - 08/28/22 11:48:22 - 0:00:00 - Building data done with 8765 images loaded  
INFO - 08/28/22 11:48:22 - 0:00:00 - key "projection_head.weight" could not be found in provided state dict  
INFO - 08/28/22 11:48:22 - 0:00:00 - key "projection_head.bias" could not be found in provided state dict
INFO - 08/28/22 11:48:22 - 0:00:00 - Load pretrained model with msg: _IncompatibleKeys(missing_keys=['projection_head.weight', 'projection_head.bias'], unexpected_keys=['prototypes.weight', 'projection_head.0.weight', 'projection_head.0.bias', 'projection_head.1.weight', 'projection_head.1.bias', 'projection_head.1.running_mean', 'projection_head.1.running_var', 'projection_head.1.num_batches_tracked', 'projection_head.3.weight', 'projection_head.3.bias'])

When I load pre-trained model in eval_linear.py, the output is:

INFO - 08/28/22 14:02:22 - 0:00:02 - Load pretrained model with msg: _IncompatibleKeys(missing_keys=[], unexpected_keys=['projection_head.0.weight', 'projection_head.0.bias', 'projection_head.1.weight', 'projection_head.1.bias', 'projection_head.1.running_mean', 'projection_head.1.running_var', 'projection_head.1.num_batches_tracked', 'projection_head.3.weight', 'projection_head.3.bias', 'prototypes.weight'])

It seems the pretrained model is missing the projection_head weight and bias. So I printed the keys in the saved model and found that the projection head is in there, but with a suffix, at the end of the saved model:

dict_keys(['module.conv1.weight', 'module.bn1.weight', 'module.bn1.bias', 'module.bn1.running_mean', 'module.bn1.running_var', 'module.bn1.num_batches_tracked', 'module.layer1.0.conv1.weight', 'module.layer1.0.bn1.weight', 'module.layer1.0.bn1.bias', 'module.layer1.0.bn1.running_mean', 'module.layer1.0.bn1.running_var', 'module.layer1.0.bn1.num_batches_tracked', 'module.layer1.0.conv2.weight', 'module.layer1.0.bn2.weight', 'module.layer1.0.bn2.bias', 'module.layer1.0.bn2.running_mean', 'module.layer1.0.bn2.running_var', 'module.layer1.0.bn2.num_batches_tracked', 'module.layer1.0.conv3.weight', 'module.layer1.0.bn3.weight', 'module.layer1.0.bn3.bias', 'module.layer1.0.bn3.running_mean', 'module.layer1.0.bn3.running_var', 'module.layer1.0.bn3.num_batches_tracked', 'module.layer1.0.downsample.0.weight', 'module.layer1.0.downsample.1.weight', 'module.layer1.0.downsample.1.bias', 'module.layer1.0.downsample.1.running_mean', 'module.layer1.0.downsample.1.running_var', 'module.layer1.0.downsample.1.num_batches_tracked', 'module.layer1.1.conv1.weight', 'module.layer1.1.bn1.weight', 'module.layer1.1.bn1.bias', 'module.layer1.1.bn1.running_mean', 'module.layer1.1.bn1.running_var', 'module.layer1.1.bn1.num_batches_tracked', 'module.layer1.1.conv2.weight', 'module.layer1.1.bn2.weight', 'module.layer1.1.bn2.bias', 'module.layer1.1.bn2.running_mean', 'module.layer1.1.bn2.running_var', 'module.layer1.1.bn2.num_batches_tracked', 'module.layer1.1.conv3.weight', 'module.layer1.1.bn3.weight', 'module.layer1.1.bn3.bias', 'module.layer1.1.bn3.running_mean', 'module.layer1.1.bn3.running_var', 'module.layer1.1.bn3.num_batches_tracked', 'module.layer1.2.conv1.weight', 'module.layer1.2.bn1.weight', 'module.layer1.2.bn1.bias', 'module.layer1.2.bn1.running_mean', 'module.layer1.2.bn1.running_var', 'module.layer1.2.bn1.num_batches_tracked', 'module.layer1.2.conv2.weight', 'module.layer1.2.bn2.weight', 'module.layer1.2.bn2.bias', 'module.layer1.2.bn2.running_mean', 
'module.layer1.2.bn2.running_var', 'module.layer1.2.bn2.num_batches_tracked', 'module.layer1.2.conv3.weight', 'module.layer1.2.bn3.weight', 'module.layer1.2.bn3.bias', 'module.layer1.2.bn3.running_mean', 'module.layer1.2.bn3.running_var', 'module.layer1.2.bn3.num_batches_tracked', 'module.layer2.0.conv1.weight', 'module.layer2.0.bn1.weight', 'module.layer2.0.bn1.bias', 'module.layer2.0.bn1.running_mean', 'module.layer2.0.bn1.running_var', 'module.layer2.0.bn1.num_batches_tracked', 'module.layer2.0.conv2.weight', 'module.layer2.0.bn2.weight', 'module.layer2.0.bn2.bias', 'module.layer2.0.bn2.running_mean', 'module.layer2.0.bn2.running_var', 'module.layer2.0.bn2.num_batches_tracked', 'module.layer2.0.conv3.weight', 'module.layer2.0.bn3.weight', 'module.layer2.0.bn3.bias', 'module.layer2.0.bn3.running_mean', 'module.layer2.0.bn3.running_var', 'module.layer2.0.bn3.num_batches_tracked', 'module.layer2.0.downsample.0.weight', 'module.layer2.0.downsample.1.weight', 'module.layer2.0.downsample.1.bias', 'module.layer2.0.downsample.1.running_mean', 'module.layer2.0.downsample.1.running_var', 'module.layer2.0.downsample.1.num_batches_tracked', 'module.layer2.1.conv1.weight', 'module.layer2.1.bn1.weight', 'module.layer2.1.bn1.bias', 'module.layer2.1.bn1.running_mean', 'module.layer2.1.bn1.running_var', 'module.layer2.1.bn1.num_batches_tracked', 'module.layer2.1.conv2.weight', 'module.layer2.1.bn2.weight', 'module.layer2.1.bn2.bias', 'module.layer2.1.bn2.running_mean', 'module.layer2.1.bn2.running_var', 'module.layer2.1.bn2.num_batches_tracked', 'module.layer2.1.conv3.weight', 'module.layer2.1.bn3.weight', 'module.layer2.1.bn3.bias', 'module.layer2.1.bn3.running_mean', 'module.layer2.1.bn3.running_var', 'module.layer2.1.bn3.num_batches_tracked', 'module.layer2.2.conv1.weight', 'module.layer2.2.bn1.weight', 'module.layer2.2.bn1.bias', 'module.layer2.2.bn1.running_mean', 'module.layer2.2.bn1.running_var', 'module.layer2.2.bn1.num_batches_tracked', 'module.layer2.2.conv2.weight', 
'module.layer2.2.bn2.weight', 'module.layer2.2.bn2.bias', 'module.layer2.2.bn2.running_mean', 'module.layer2.2.bn2.running_var', 'module.layer2.2.bn2.num_batches_tracked', 'module.layer2.2.conv3.weight', 'module.layer2.2.bn3.weight', 'module.layer2.2.bn3.bias', 'module.layer2.2.bn3.running_mean', 'module.layer2.2.bn3.running_var', 'module.layer2.2.bn3.num_batches_tracked', 'module.layer2.3.conv1.weight', 'module.layer2.3.bn1.weight', 'module.layer2.3.bn1.bias', 'module.layer2.3.bn1.running_mean', 'module.layer2.3.bn1.running_var', 'module.layer2.3.bn1.num_batches_tracked', 'module.layer2.3.conv2.weight', 'module.layer2.3.bn2.weight', 'module.layer2.3.bn2.bias', 'module.layer2.3.bn2.running_mean', 'module.layer2.3.bn2.running_var', 'module.layer2.3.bn2.num_batches_tracked', 'module.layer2.3.conv3.weight', 'module.layer2.3.bn3.weight', 'module.layer2.3.bn3.bias', 'module.layer2.3.bn3.running_mean', 'module.layer2.3.bn3.running_var', 'module.layer2.3.bn3.num_batches_tracked', 'module.layer3.0.conv1.weight', 'module.layer3.0.bn1.weight', 'module.layer3.0.bn1.bias', 'module.layer3.0.bn1.running_mean', 'module.layer3.0.bn1.running_var', 'module.layer3.0.bn1.num_batches_tracked', 'module.layer3.0.conv2.weight', 'module.layer3.0.bn2.weight', 'module.layer3.0.bn2.bias', 'module.layer3.0.bn2.running_mean', 'module.layer3.0.bn2.running_var', 'module.layer3.0.bn2.num_batches_tracked', 'module.layer3.0.conv3.weight', 'module.layer3.0.bn3.weight', 'module.layer3.0.bn3.bias', 'module.layer3.0.bn3.running_mean', 'module.layer3.0.bn3.running_var', 'module.layer3.0.bn3.num_batches_tracked', 'module.layer3.0.downsample.0.weight', 'module.layer3.0.downsample.1.weight', 'module.layer3.0.downsample.1.bias', 'module.layer3.0.downsample.1.running_mean', 'module.layer3.0.downsample.1.running_var', 'module.layer3.0.downsample.1.num_batches_tracked', 'module.layer3.1.conv1.weight', 'module.layer3.1.bn1.weight', 'module.layer3.1.bn1.bias', 'module.layer3.1.bn1.running_mean', 
'module.layer3.1.bn1.running_var', 'module.layer3.1.bn1.num_batches_tracked', 'module.layer3.1.conv2.weight', 'module.layer3.1.bn2.weight', 'module.layer3.1.bn2.bias', 'module.layer3.1.bn2.running_mean', 'module.layer3.1.bn2.running_var', 'module.layer3.1.bn2.num_batches_tracked', 'module.layer3.1.conv3.weight', 'module.layer3.1.bn3.weight', 'module.layer3.1.bn3.bias', 'module.layer3.1.bn3.running_mean', 'module.layer3.1.bn3.running_var', 'module.layer3.1.bn3.num_batches_tracked', 'module.layer3.2.conv1.weight', 'module.layer3.2.bn1.weight', 'module.layer3.2.bn1.bias', 'module.layer3.2.bn1.running_mean', 'module.layer3.2.bn1.running_var', 'module.layer3.2.bn1.num_batches_tracked', 'module.layer3.2.conv2.weight', 'module.layer3.2.bn2.weight', 'module.layer3.2.bn2.bias', 'module.layer3.2.bn2.running_mean', 'module.layer3.2.bn2.running_var', 'module.layer3.2.bn2.num_batches_tracked', 'module.layer3.2.conv3.weight', 'module.layer3.2.bn3.weight', 'module.layer3.2.bn3.bias', 'module.layer3.2.bn3.running_mean', 'module.layer3.2.bn3.running_var', 'module.layer3.2.bn3.num_batches_tracked', 'module.layer3.3.conv1.weight', 'module.layer3.3.bn1.weight', 'module.layer3.3.bn1.bias', 'module.layer3.3.bn1.running_mean', 'module.layer3.3.bn1.running_var', 'module.layer3.3.bn1.num_batches_tracked', 'module.layer3.3.conv2.weight', 'module.layer3.3.bn2.weight', 'module.layer3.3.bn2.bias', 'module.layer3.3.bn2.running_mean', 'module.layer3.3.bn2.running_var', 'module.layer3.3.bn2.num_batches_tracked', 'module.layer3.3.conv3.weight', 'module.layer3.3.bn3.weight', 'module.layer3.3.bn3.bias', 'module.layer3.3.bn3.running_mean', 'module.layer3.3.bn3.running_var', 'module.layer3.3.bn3.num_batches_tracked', 'module.layer3.4.conv1.weight', 'module.layer3.4.bn1.weight', 'module.layer3.4.bn1.bias', 'module.layer3.4.bn1.running_mean', 'module.layer3.4.bn1.running_var', 'module.layer3.4.bn1.num_batches_tracked', 'module.layer3.4.conv2.weight', 'module.layer3.4.bn2.weight', 
'module.layer3.4.bn2.bias', 'module.layer3.4.bn2.running_mean', 'module.layer3.4.bn2.running_var', 'module.layer3.4.bn2.num_batches_tracked', 'module.layer3.4.conv3.weight', 'module.layer3.4.bn3.weight', 'module.layer3.4.bn3.bias', 'module.layer3.4.bn3.running_mean', 'module.layer3.4.bn3.running_var', 'module.layer3.4.bn3.num_batches_tracked', 'module.layer3.5.conv1.weight', 'module.layer3.5.bn1.weight', 'module.layer3.5.bn1.bias', 'module.layer3.5.bn1.running_mean', 'module.layer3.5.bn1.running_var', 'module.layer3.5.bn1.num_batches_tracked', 'module.layer3.5.conv2.weight', 'module.layer3.5.bn2.weight', 'module.layer3.5.bn2.bias', 'module.layer3.5.bn2.running_mean', 'module.layer3.5.bn2.running_var', 'module.layer3.5.bn2.num_batches_tracked', 'module.layer3.5.conv3.weight', 'module.layer3.5.bn3.weight', 'module.layer3.5.bn3.bias', 'module.layer3.5.bn3.running_mean', 'module.layer3.5.bn3.running_var', 'module.layer3.5.bn3.num_batches_tracked', 'module.layer4.0.conv1.weight', 'module.layer4.0.bn1.weight', 'module.layer4.0.bn1.bias', 'module.layer4.0.bn1.running_mean', 'module.layer4.0.bn1.running_var', 'module.layer4.0.bn1.num_batches_tracked', 'module.layer4.0.conv2.weight', 'module.layer4.0.bn2.weight', 'module.layer4.0.bn2.bias', 'module.layer4.0.bn2.running_mean', 'module.layer4.0.bn2.running_var', 'module.layer4.0.bn2.num_batches_tracked', 'module.layer4.0.conv3.weight', 'module.layer4.0.bn3.weight', 'module.layer4.0.bn3.bias', 'module.layer4.0.bn3.running_mean', 'module.layer4.0.bn3.running_var', 'module.layer4.0.bn3.num_batches_tracked', 'module.layer4.0.downsample.0.weight', 'module.layer4.0.downsample.1.weight', 'module.layer4.0.downsample.1.bias', 'module.layer4.0.downsample.1.running_mean', 'module.layer4.0.downsample.1.running_var', 'module.layer4.0.downsample.1.num_batches_tracked', 'module.layer4.1.conv1.weight', 'module.layer4.1.bn1.weight', 'module.layer4.1.bn1.bias', 'module.layer4.1.bn1.running_mean', 'module.layer4.1.bn1.running_var', 
'module.layer4.1.bn1.num_batches_tracked', 'module.layer4.1.conv2.weight', 'module.layer4.1.bn2.weight', 'module.layer4.1.bn2.bias', 'module.layer4.1.bn2.running_mean', 'module.layer4.1.bn2.running_var', 'module.layer4.1.bn2.num_batches_tracked', 'module.layer4.1.conv3.weight', 'module.layer4.1.bn3.weight', 'module.layer4.1.bn3.bias', 'module.layer4.1.bn3.running_mean', 'module.layer4.1.bn3.running_var', 'module.layer4.1.bn3.num_batches_tracked', 'module.layer4.2.conv1.weight', 'module.layer4.2.bn1.weight', 'module.layer4.2.bn1.bias', 'module.layer4.2.bn1.running_mean', 'module.layer4.2.bn1.running_var', 'module.layer4.2.bn1.num_batches_tracked', 'module.layer4.2.conv2.weight', 'module.layer4.2.bn2.weight', 'module.layer4.2.bn2.bias', 'module.layer4.2.bn2.running_mean', 'module.layer4.2.bn2.running_var', 'module.layer4.2.bn2.num_batches_tracked', 'module.layer4.2.conv3.weight', 'module.layer4.2.bn3.weight', 'module.layer4.2.bn3.bias', 'module.layer4.2.bn3.running_mean', 'module.layer4.2.bn3.running_var', 'module.layer4.2.bn3.num_batches_tracked', 'module.projection_head.0.weight', 'module.projection_head.0.bias', 'module.projection_head.1.weight', 'module.projection_head.1.bias', 'module.projection_head.1.running_mean', 'module.projection_head.1.running_var', 'module.projection_head.1.num_batches_tracked', 'module.projection_head.3.weight', 'module.projection_head.3.bias', 'module.prototypes.weight'])

How should I handle this problem if I want to load projection head into eval_semisup.py?
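A first step that is often useful with checkpoints like the one printed above is to strip the module. prefix from the keys and to separate the backbone entries from the head entries. The sketch below uses a plain dictionary with placeholder values instead of tensors; both helper names are hypothetical. The key names in the logs differ (projection_head.weight versus projection_head.0.weight), which suggests the two heads are different modules, so this sketch does not attempt to map one onto the other.

```python
def strip_prefix(state_dict, prefix="module."):
    """Remove a leading prefix (e.g. added by a parallel wrapper) from every key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

def split_backbone_and_head(state_dict,
                            head_prefixes=("projection_head.", "prototypes.")):
    """Separate backbone entries from projection-head and prototype entries."""
    backbone, head = {}, {}
    for key, value in state_dict.items():
        target = head if key.startswith(head_prefixes) else backbone
        target[key] = value
    return backbone, head

# Toy checkpoint: a few of the keys listed above, with placeholder values.
checkpoint = {
    "module.conv1.weight": "tensor-placeholder",
    "module.layer4.2.bn3.bias": "tensor-placeholder",
    "module.projection_head.0.weight": "tensor-placeholder",
    "module.projection_head.3.bias": "tensor-placeholder",
    "module.prototypes.weight": "tensor-placeholder",
}

cleaned = strip_prefix(checkpoint)
backbone, head = split_backbone_and_head(cleaned)
```

With real tensors, the `backbone` dictionary could then be passed to the model's load_state_dict with strict=False, and the returned missing and unexpected keys inspected to confirm what was actually loaded.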

Thanks

opened by zhawhjw 0

How can I load DeepCluster-v2?

I tried model = torch.hub.load('facebookresearch/deepcluster-v2', 'resnet50')

It gave an error: HTTPError: HTTP Error 404: Not Found

Could you help me out with this?

opened by km5ar 0
A question about dataloader setting: `shuffle`?

Excellent work!

I see that your training script does not use the shuffle=True setting when loading data. I wonder if this setting has any effect on performance?

Does using shuffle=True have a positive effect? Or negative effects?

opened by Classmate-Huang 0
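
Some context that may help with the question above. In PyTorch, `DataLoader` does not accept `shuffle=True` together with a custom `sampler`, and `DistributedSampler` shuffles by default, reshuffling each epoch when `set_epoch` is called. Assuming the training script passes a `DistributedSampler` to its loader (an assumption, not something stated on this page), the data would already be shuffled even though `shuffle=True` never appears. The sketch below is a simplified plain-Python emulation of that per-epoch sharding, not PyTorch's actual code; `shard_indices` is a hypothetical name.

```python
import random


def shard_indices(dataset_len, world_size, rank, epoch, seed=0, shuffle=True):
    """Return the sample indices one process would see in one epoch."""
    indices = list(range(dataset_len))
    if shuffle:
        # Same seed on every process, so all ranks agree on the permutation;
        # adding the epoch changes the order from one epoch to the next.
        random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating leading indices so every rank gets the same number of samples.
    total = -(-dataset_len // world_size) * world_size
    indices += indices[: total - dataset_len]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank:total:world_size]


# Four processes, 100 samples: shards are disjoint and together cover the dataset.
shards = [shard_indices(100, world_size=4, rank=r, epoch=0) for r in range(4)]
covered = sorted(i for shard in shards for i in shard)
print(covered == list(range(100)))

# Without shuffling, rank 0 always sees the same fixed stride of indices.
print(shard_indices(8, world_size=4, rank=0, epoch=0, shuffle=False))  # [0, 4]
```

In this emulation the order changes between epochs only because the epoch is part of the seed. That matches the reason `set_epoch` needs to be called once per epoch when using `DistributedSampler`.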

Owner

Meta Research

GitHub

Pytorch Implementation of Continual Learning With Filter Atom Swapping (ICLR'22 Spolight) Paper

Continual Learning With Filter Atom Swapping Pytorch Implementation of Continual Learning With Filter Atom Swapping (ICLR'22 Spolight) Paper If find t

11 Aug 29, 2022

Official Implementation of Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020)

Swapping Autoencoder for Deep Image Manipulation Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang UC

449 Dec 27, 2022

Swapping face using Face Mesh with TensorFlow Lite

17 Apr 26, 2022

Code of 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces Installation After cloning the repo open

37 Dec 3, 2022

PAWS 🐾 Predicting View-Assignments with Support Samples

This repo provides a PyTorch implementation of PAWS (predicting view assignments with support samples), as described in the paper Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples.

437 Dec 23, 2022

Deep Semisupervised Multiview Learning With Increasing Views (IEEE TCYB 2021, PyTorch Code)

Deep Semisupervised Multiview Learning With Increasing Views (ISVN, IEEE TCYB) Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin, Huaibai Yan, Dez

3 Nov 19, 2022

A JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

BraVe This is a JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short. The model provided in this package wa

44 Nov 20, 2022

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

111 Dec 29, 2022

Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Viewmaker Networks: Learning Views for Unsupervised Representation Learning Alex Tamkin, Mike Wu, and Noah Goodman Paper link: https://arxiv.org/abs/2

31 Dec 1, 2022

[ICCV 2021 (oral)] Planar Surface Reconstruction from Sparse Views

Planar Surface Reconstruction From Sparse Views Linyi Jin, Shengyi Qian, Andrew Owens, David F. Fouhey University of Michigan ICCV 2021 (Oral) This re

89 Jan 5, 2023

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation Official PyTorch implementation for the paper Look

20 Nov 24, 2022

[CVPR 2022 Oral] Crafting Better Contrastive Views for Siamese Representation Learning

Crafting Better Contrastive Views for Siamese Representation Learning (CVPR 2022 Oral) 2022-03-29: The paper was selected as a CVPR 2022 Oral paper! 2

249 Dec 28, 2022

Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yu

UT-Austin Robot Perception and Learning Lab

63 Jan 3, 2023

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Transparency-by-Design networks (TbD-nets) This repository contains code for replicating the experiments and visualizations from the paper Transparenc

351 Nov 18, 2022

PyTorch implementation of NIPS 2017 paper Dynamic Routing Between Capsules

Dynamic Routing Between Capsules - PyTorch implementation PyTorch implementation of NIPS 2017 paper Dynamic Routing Between Capsules from Sara Sabour,

475 Dec 24, 2022

Pytorch implementation of Hinton's Dynamic Routing Between Capsules

pytorch-capsule A Pytorch implementation of Hinton's "Dynamic Routing Between Capsules". https://arxiv.org/pdf/1710.09829.pdf Thanks to @naturomics fo

625 Oct 27, 2022

Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

SwinTextSpotter This is the pytorch implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text R

183 Jan 3, 2023

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

MMdnn MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model manage

5.7k Jan 9, 2023

Compare outputs between layers written in Tensorflow and layers written in Pytorch

Compare outputs of Wasserstein GANs between TensorFlow vs Pytorch This is our testing module for the implementation of improved WGAN in Pytorch Prereq

72 Dec 20, 2022
