PyTorch implementation of SwAV (Swapping Assignments between Views)

Overview

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

This code provides a PyTorch implementation and pretrained models for SwAV (Swapping Assignments between Views), as described in the paper Unsupervised Learning of Visual Features by Contrasting Cluster Assignments.

SwAV Illustration

SwAV is an efficient and simple method for pre-training convnets without using annotations. Like contrastive approaches, SwAV learns representations by comparing transformations of an image, but unlike contrastive methods, it does not require computing pairwise feature comparisons. This makes the framework more efficient, since it needs neither a large memory bank nor an auxiliary momentum network. Specifically, our method simultaneously clusters the data while enforcing consistency between the cluster assignments produced for different augmentations (or “views”) of the same image, instead of comparing features directly. Simply put, we use a “swapped” prediction mechanism: we predict the cluster assignment of one view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data.
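
For intuition, here is a minimal sketch of the swapped prediction objective, assuming the features of the two views are already L2-normalized and their codes have been computed (e.g. with Sinkhorn-Knopp). The tensor and function names are illustrative, not the ones used in main_swav.py:

import torch
import torch.nn.functional as F

def swapped_prediction_loss(z1, z2, prototypes, q1, q2, temperature=0.1):
    # z1, z2: L2-normalized features of two views of the same images (B x D)
    # prototypes: K x D matrix of trainable cluster prototypes
    # q1, q2: codes (soft cluster assignments, B x K) computed for each view
    p1 = z1 @ prototypes.t() / temperature  # prototype scores of view 1
    p2 = z2 @ prototypes.t() / temperature  # prototype scores of view 2
    # "swapped" prediction: the code of one view is predicted from the other view
    loss1 = -(q2 * F.log_softmax(p1, dim=1)).sum(dim=1).mean()
    loss2 = -(q1 * F.log_softmax(p2, dim=1)).sum(dim=1).mean()
    return 0.5 * (loss1 + loss2)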

Model Zoo

We release several models pre-trained with SwAV, with the hope that other researchers might benefit from replacing the ImageNet-supervised network with a SwAV backbone. To load our best SwAV pre-trained ResNet-50 model, simply do:

import torch
model = torch.hub.load('facebookresearch/swav:main', 'resnet50')
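
Once loaded, the model behaves like a standard torchvision ResNet-50. A small usage sketch, assuming standard ImageNet preprocessing and a local image example.jpg (both are placeholders):

import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load('facebookresearch/swav:main', 'resnet50')
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('example.jpg')).unsqueeze(0)  # 1 x 3 x 224 x 224
with torch.no_grad():
    out = model(img)  # output of the hub model for this image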

We provide several baseline SwAV pre-trained models with a ResNet-50 architecture in torchvision format. We also provide models pre-trained with DeepCluster-v2 and SeLa-v2, obtained by applying improvements from the self-supervised community to DeepCluster and SeLa (see details in the appendix of our paper).

| method | epochs | batch-size | multi-crop | ImageNet top-1 acc. | url | args |
|---|---|---|---|---|---|---|
| SwAV | 800 | 4096 | 2x224 + 6x96 | 75.3 | model | script |
| SwAV | 400 | 4096 | 2x224 + 6x96 | 74.6 | model | script |
| SwAV | 200 | 4096 | 2x224 + 6x96 | 73.9 | model | script |
| SwAV | 100 | 4096 | 2x224 + 6x96 | 72.1 | model | script |
| SwAV | 200 | 256 | 2x224 + 6x96 | 72.7 | model | script |
| SwAV | 400 | 256 | 2x224 + 6x96 | 74.3 | model | script |
| SwAV | 400 | 4096 | 2x224 | 70.1 | model | script |
| DeepCluster-v2 | 800 | 4096 | 2x224 + 6x96 | 75.2 | model | script |
| DeepCluster-v2 | 400 | 4096 | 2x160 + 4x96 | 74.3 | model | script |
| DeepCluster-v2 | 400 | 4096 | 2x224 | 70.2 | model | script |
| SeLa-v2 | 400 | 4096 | 2x160 + 4x96 | 71.8 | model | - |
| SeLa-v2 | 400 | 4096 | 2x224 | 67.2 | model | - |
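
Since the weights are released in torchvision format, a downloaded checkpoint can also be loaded manually into a torchvision ResNet-50. A hedged sketch (the exact checkpoint layout can vary, so we unwrap a possibly nested state dict, strip a possible "module." prefix, and ignore SwAV-specific heads via strict=False; the filename is only an example):

import torch
from torchvision.models import resnet50

model = resnet50()
ckpt = torch.load('swav_800ep_pretrain.pth.tar', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # unwrap if the weights are nested under 'state_dict'
state_dict = {k.replace('module.', ''): v for k, v in state_dict.items()}  # drop a possible DDP prefix
msg = model.load_state_dict(state_dict, strict=False)
print(msg)  # inspect missing/unexpected keys (e.g. projection head, prototypes)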

Larger architectures

We provide SwAV models with ResNet-50 networks whose width is multiplied by a factor of ×2, ×4, and ×5. To load the corresponding backbones you can use:

import torch
rn50w2 = torch.hub.load('facebookresearch/swav:main', 'resnet50w2')
rn50w4 = torch.hub.load('facebookresearch/swav:main', 'resnet50w4')
rn50w5 = torch.hub.load('facebookresearch/swav:main', 'resnet50w5')
| network | parameters | epochs | ImageNet top-1 acc. | url | args |
|---|---|---|---|---|---|
| RN50-w2 | 94M | 400 | 77.3 | model | script |
| RN50-w4 | 375M | 400 | 77.9 | model | script |
| RN50-w5 | 586M | 400 | 78.5 | model | - |

Running times

We provide the running times for some of our runs:

| method | batch-size | multi-crop | scripts | time per epoch |
|---|---|---|---|---|
| SwAV | 4096 | 2x224 + 6x96 | * * * * | 3min40s |
| SwAV | 256 | 2x224 + 6x96 | * * | 52min10s |
| DeepCluster-v2 | 4096 | 2x160 + 4x96 | * | 3min13s |

Running SwAV unsupervised training

Requirements

Singlenode training

SwAV is very simple to implement and experiment with. Our implementation consists of a main file, main_swav.py, which imports the dataset definition (src/multicropdataset.py), the model architecture (src/resnet50.py), and some miscellaneous training utilities (src/utils.py).

For example, to train the SwAV baseline on a single node with 8 GPUs for 400 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 main_swav.py \
--data_path /path/to/imagenet/train \
--epochs 400 \
--base_lr 0.6 \
--final_lr 0.0006 \
--warmup_epochs 0 \
--batch_size 32 \
--size_crops 224 96 \
--nmb_crops 2 6 \
--min_scale_crops 0.14 0.05 \
--max_scale_crops 1. 0.14 \
--use_fp16 true \
--freeze_prototypes_niters 5005 \
--queue_length 3840 \
--epoch_queue_starts 15
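
The --size_crops, --nmb_crops, --min_scale_crops and --max_scale_crops flags define the 2x224 + 6x96 multi-crop scheme. A simplified sketch of how such a multi-crop transform can be assembled (the actual implementation, including color and blur augmentations, lives in src/multicropdataset.py):

from torchvision import transforms

size_crops = [224, 96]
nmb_crops = [2, 6]
min_scale_crops = [0.14, 0.05]
max_scale_crops = [1.0, 0.14]

trans = []
for size, nmb, min_s, max_s in zip(size_crops, nmb_crops, min_scale_crops, max_scale_crops):
    crop = transforms.Compose([
        transforms.RandomResizedCrop(size, scale=(min_s, max_s)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.ToTensor(),
    ])
    trans.extend([crop] * nmb)

# applying every transform to one PIL image yields the 8 views used for one sample:
# views = [t(img) for t in trans]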

Multinode training

Distributed training is available via Slurm. We provide several SBATCH scripts to reproduce our SwAV models. For example, to train SwAV on 8 nodes and 64 GPUs with a batch size of 4096 for 800 epochs run:

sbatch ./scripts/swav_800ep_pretrain.sh

Note that you might need to remove the copyright header from the sbatch file to launch it.

Set up the dist_url parameter: we refer the user to the PyTorch distributed documentation (env, file, or tcp) for setting the distributed initialization method (parameter dist_url) correctly. In the provided sbatch files, we use the tcp init method (see the provided scripts for an example).
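
For reference, a tcp-style initialization conceptually looks like the following (all values are placeholders; main_swav.py builds this from its dist_url, world_size and rank arguments):

import torch.distributed as dist

dist.init_process_group(
    backend='nccl',
    init_method='tcp://<master_node_address>:40000',  # this is the dist_url
    world_size=64,  # total number of processes (GPUs) across all nodes
    rank=0,         # unique id of this process, in [0, world_size - 1]
)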

Evaluating models

Evaluate models: Linear classification on ImageNet

To train a supervised linear classifier on frozen features/weights on a single node with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node=8 eval_linear.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar

The resulting linear classifier can be downloaded here.
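
Conceptually, linear evaluation freezes the backbone and trains only a linear classifier on top of the pooled features. A minimal single-GPU sketch of that idea (eval_linear.py implements the full distributed pipeline with its own schedules and options):

import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50()               # load SwAV weights into this backbone beforehand
backbone.fc = nn.Identity()         # expose the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad = False         # features stay frozen
backbone.eval()

classifier = nn.Linear(2048, 1000)  # 1000 ImageNet classes
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def training_step(images, labels):
    with torch.no_grad():
        feats = backbone(images)    # frozen features
    loss = criterion(classifier(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()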

Evaluate models: Semi-supervised learning on ImageNet

To reproduce our results and fine-tune a network with 1% or 10% of ImageNet labels on a single node with 8 GPUs, run:

  • 10% labels
python -m torch.distributed.launch --nproc_per_node=8 eval_semisup.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar \
--labels_perc "10" \
--lr 0.01 \
--lr_last_layer 0.2
  • 1% labels
python -m torch.distributed.launch --nproc_per_node=8 eval_semisup.py \
--data_path /path/to/imagenet \
--pretrained /path/to/checkpoints/swav_800ep_pretrain.pth.tar \
--labels_perc "1" \
--lr 0.02 \
--lr_last_layer 5

Evaluate models: Transferring to Detection with DETR

DETR is a recent object detection framework that reaches competitive performance with Faster R-CNN while being conceptually simpler and trainable end-to-end. We evaluate our SwAV ResNet-50 backbone on object detection on the COCO dataset using the DETR framework with full fine-tuning. Here are the instructions for reproducing our experiments:

  1. Install detr and prepare the COCO dataset following these instructions.

  2. Apply the changes highlighted in this gist to the detr backbone file in order to load the SwAV backbone instead of the ImageNet-supervised weights.

  3. Launch training from the detr repository with run_with_submitit.py.

python run_with_submitit.py --batch_size 4 --nodes 2 --lr_backbone 5e-5

Common Issues

For help or issues using SwAV, please submit a GitHub issue.

The loss does not decrease and is stuck at ln(nmb_prototypes) (8.006 for 3000 prototypes).

It sometimes happens that the system collapses at the beginning of training and does not manage to converge. We have found the following empirical workarounds to improve convergence and avoid collapse at the beginning:

  • use a lower epsilon value (--epsilon 0.03 instead of the default 0.05)
  • carefully tune the hyper-parameters
  • freeze the prototypes during the first iterations (freeze_prototypes_niters argument)
  • switch to hard assignment
  • remove the batch-normalization layer from the projection head
  • reduce the difficulty of the problem (fewer crops or softer data augmentation)

We now analyze the collapsing problem: it happens when all examples are mapped to the same unique representation. In other words, the convnet always produces the same output regardless of its input; it is a constant function. All examples get the same cluster assignment because they are identical, and the only valid assignment that satisfies the equipartition constraint in this case is the uniform assignment (1/K, where K is the number of prototypes). In turn, this uniform assignment is trivial to predict since it is the same for all examples. Reducing the epsilon parameter (see Eq. (3) of our paper) encourages the assignments Q to be sharper (i.e. less uniform), which strongly helps avoid collapse. However, too low a value for epsilon may lead to numerical instability.
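
For reference, here is a simplified single-process sketch of the Sinkhorn-Knopp normalization that produces the codes Q, showing where epsilon enters (the repository uses a distributed version of this routine):

import torch

@torch.no_grad()
def sinkhorn(scores, epsilon=0.05, n_iters=3):
    # scores: B x K similarities between features and prototypes
    Q = torch.exp(scores / epsilon).t()  # K x B; a smaller epsilon gives sharper Q
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)  # rows: spread mass equally over prototypes
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)  # columns: each sample sums to 1/B
        Q /= B
    return (Q * B).t()                   # B x K, each row is a soft assignment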

Training gets unstable when using the queue.

The queue is composed of feature representations from the previous batches. These lines discard the oldest feature representations from the queue and save the newest ones (i.e. from the current batch) through a round-robin mechanism. This way, the assignment problem is performed on more samples: without the queue, we assign B examples to num_prototypes clusters, where B is the total batch size, while with the queue we assign (B + queue_length) examples to num_prototypes clusters. This is especially useful when working with small batches because it improves the precision of the assignment.
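
The round-robin update can be summarized as follows (illustrative sketch; queue holds queue_length feature vectors and feats is the current batch):

import torch

def update_queue(queue, feats):
    # queue: queue_length x D tensor of stored features
    # feats: B x D features from the current batch
    bs = feats.shape[0]
    queue[bs:] = queue[:-bs].clone()  # shift everything back, dropping the oldest batch
    queue[:bs] = feats.detach()       # write the newest batch at the head
    return queue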

If you start using the queue too early or use too large a queue, training can be considerably disturbed: this is because the queue members are too inconsistent with the current features. After introducing the queue, the loss should be lower than it was without the queue. On the following loss curve (first 30 epochs of this script), we introduced the queue at epoch 15. We observe that it made the loss decrease further.

SwAV training loss batch_size=256 during the first 30 epochs

If, when introducing the queue, the loss goes up and does not decrease afterwards, you should stop your training and change the queue parameters. We recommend (i) using a smaller queue and (ii) starting the queue later in training.

License

See the LICENSE file for more details.

See also

PyTorch Lightning Bolts: Implementation by the Lightning team.

SwAV-TF: A TensorFlow re-implementation.

Citation

If you find this repository useful in your research, please cite:

@inproceedings{caron2020unsupervised,
  title={Unsupervised Learning of Visual Features by Contrasting Cluster Assignments},
  author={Caron, Mathilde and Misra, Ishan and Mairal, Julien and Goyal, Priya and Bojanowski, Piotr and Joulin, Armand},
  booktitle={Proceedings of Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}
Comments
  • Benchmarking on CIFAR-10

    Hi,

    I wanted to benchmark SwAV on CIFAR-10. Is there any recommended configuration for CIFAR-10? For eg:

    • The number of prototypes could be set to 50, 100 etc.
    • Since CIFAR-10 images are 32x32, multicrop can be avoided.

    Also, do you plan to publish any pretrained model on CIFAR-10?

    opened by abhinavagarwalla 26
  • TypeError: optimizers must be either a single optimizer or a list of optimizers.

    Hello,

    I'm trying to run main_swav.py with the following command:

    python -m torch.distributed.launch --nproc_per_node=1 main_swav.py --images_path=<path to data directory> --train_annotations_path <path to data file> --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15

    Some of those parameters have been added to accommodate our data. The only changes I have made to the code are minor changes to the dataset and additional/changed arguments. When I run this command I get the following error:

    Traceback (most recent call last):
      File "main_swav.py", line 380, in <module>
        main()
      File "main_swav.py", line 189, in main
        model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O1")
      File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
        return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
      File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 158, in _initialize
        raise TypeError("optimizers must be either a single optimizer or a list of optimizers.")
    TypeError: optimizers must be either a single optimizer or a list of optimizers.

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
        main()
      File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
        cmd=cmd)
    subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--images_path=/data/computer_vision_projects/rare_planes/classification_data/images/', '--train_annotations_path', '/data/computer_vision_projects/rare_planes/classification_data/annotations/instances_train_role_mislabel_category_id_033_chipped.json', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.
    make: *** [Makefile:69: train-rare-planes] Error 1

    Immediately before the line that throws the error I placed a couple of print statements:

        print("type(OPTIMIZER)", type(optimizer))
        print("OPTIMIZER", optimizer)

    The output from those is:

        type(OPTIMIZER) <class 'apex.parallel.LARC.LARC'>
        OPTIMIZER SGD (
            Parameter Group 0
            dampening: 0
            lr: 0.6
            momentum: 0.9
            nesterov: False
            weight_decay: 1e-06
        )

    Here are some version numbers I'm using: Python 3.6.9 :: Anaconda, Inc., PyTorch == 1.5.0a0+8f84ded, torchvision == 0.6.0a0, CUDA == 10.2, apex == 0.1

    Any ideas why I would be seeing this error? Thanks in advance!

    opened by guerriep 18
  • Training loss history

    Hi,

    Thank you so much for sharing your codes!

    May I know if you have a copy of your loss record?

    When I trained your model from scratch, the loss was stuck around 8 for the first 2 epochs. (I am still training the model.)

    Is it the same for you?

    Thank you.

    opened by alibabadoufu 14
  • How can I use single node and single GPU

    Dear authors, I would like to use the single GPU on my personal computer to test your code. Can you explain how to reproduce the training and test configurations?

    opened by marcomameli1992 11
  • num_prototype

    Hi, thanks for your excellent work! I have a question about num_prototype in DeepCluster-v2. What does num_prototype mean? Why can num_prototype be bigger than class_num? Thanks!

    opened by txw1997 8
  • Why was faiss not used for DC2?

    Hello,

    Absolutely fascinating work! I was looking into how DC2 improves upon the original DC, and noticed that DC2 implementation does not use faiss for clustering. May I know why this choice was made?

    Thank you.

    opened by yutaizhou 6
  • The learning rate of linear classification

    Thanks for your awesome work. I wonder why the learning rate is so small in linear classification (0.3 in eval_linear.py)? In the linear classification of MoCo, the initial learning rate is 30 with a two-stage reduction. There is a 100x difference from this repo. Have you ever run eval_linear.py with MoCo v2 weights, or run SwAV weights with the code from MoCo? I wonder about the performance impact of the lr.

    opened by dddzg 6
  • How is it ensured that only full resolution views are used for code computation?

    Referring to the section of the paper stating that cluster assignments (codes) are computed using only the full-resolution views:

    In the code, this part is supposedly handled with crops_for_assign:

    for i, crop_id in enumerate(args.crops_for_assign):
        with torch.no_grad():
            out = output[bs * crop_id: bs * (crop_id + 1)]

            # time to use the queue
            if queue is not None:
                if use_the_queue or not torch.all(queue[i, -1, :] == 0):
                    use_the_queue = True
                    out = torch.cat((torch.mm(
                        queue[i],
                        model.module.prototypes.weight.t()
                    ), out))
                # fill the queue
                queue[i, bs:] = queue[i, :-bs].clone()
                queue[i, :bs] = embedding[crop_id * bs: (crop_id + 1) * bs]
            # get assignments
            q = torch.exp(out / args.epsilon).t()
            q = distributed_sinkhorn(q, args.sinkhorn_iterations)[-bs:]

    I am not sure how this indexing out = output[bs * crop_id: bs * (crop_id + 1)] ensures we are only operating on full resolution views (224/160)?

    opened by sayakpaul 6
  • Empty clusters?

    Hi @mathildecaron31

    I trained a network from scratch with my own dataset and wrote some code that sorts images into different folders according to their cluster assignments. I did this with the following lines of code:

    embedding, output = model(inputs)
    p = softmax(output / args.temperature)
    prediction = p.tolist()
    prototyp = []
    for i in range(len(prediction)):
        prototyp.append(np.argmax(prediction[i]))

    The problem is that when I save the images in different folders according to their cluster assignment, some folders remain empty. The number of folders is the same as the number of prototypes. I always thought that the images are equally distributed between the different prototypes. What is the problem? Can you help me?

    opened by daniiki 5
  • How to perform multinode training with torch.distributed.launch?

    Hi, nice work! I tried to do pretraining with main_swav.py on multiple machines.

    Here's the main code for distributed training.

    python -m torch.distributed.launch main_swav.py --rank 0 \
    --world_size 8 \
    --dist_url 'tcp://172.31.11.200:23456' \
    

    I commented out lines 55-59 in src/utils.py in order to set the rank for each machine. It runs okay.

    But I found that during training, on each machine, only 1 GPU was used. I think it is caused by https://github.com/facebookresearch/swav/blob/77f718581585ae99187d0bdf01899e2de1085c38/src/utils.py#L68. Could you help me figure it out?

    Many thanks!

    opened by fangruizhu 5
  • About process_group in SyncBN

    Hi,

    I noticed that you adopted 8 GPUs as a group in SyncBN (https://github.com/facebookresearch/swav/blob/master/main_swav.py#L158) when training with a large batch size of 4096, i.e. 512 training samples in a group for sync batchnorm. I am wondering 1) why you don't use global SyncBN for training, and 2) how much it affects the results?

    Thanks!

    opened by yxgeee 5
  • Exchanging k-means Clustering with Spectral Clustering

    Hello everyone, for my master's thesis I replaced the k-means clustering with spectral clustering for computing the pseudo-labels (lines 396-493). Unfortunately I have only tested it on a 1000-image dataset, but the first results seemed promising. Feel free to contact me :)

    Kind regards, Björn

    opened by BjornBecker 0
  • question of loading pre-trained model in eval_linear.py and eval_semisup.py

    Hi, I am using eval_linear.py and eval_semisup.py to do tasks on my own dataset. When I try to load the pre-trained model from here, eval_semisup.py gives the following message:

    INFO - 08/28/22 11:48:22 - 0:00:00 - Building data done with 8765 images loaded  
    INFO - 08/28/22 11:48:22 - 0:00:00 - key "projection_head.weight" could not be found in provided state dict  
    INFO - 08/28/22 11:48:22 - 0:00:00 - key "projection_head.bias" could not be found in provided state dict
    INFO - 08/28/22 11:48:22 - 0:00:00 - Load pretrained model with msg: _IncompatibleKeys(missing_keys=['projection_head.weight', 'projection_head.bias'], unexpected_keys=['prototypes.weight', 'projection_head.0.weight', 'projection_head.0.bias', 'projection_head.1.weight', 'projection_head.1.bias', 'projection_head.1.running_mean', 'projection_head.1.running_var', 'projection_head.1.num_batches_tracked', 'projection_head.3.weight', 'projection_head.3.bias'])
    

    When I load pre-trained model in eval_linear.py, the output is:

    INFO - 08/28/22 14:02:22 - 0:00:02 - Load pretrained model with msg: _IncompatibleKeys(missing_keys=[], unexpected_keys=['projection_head.0.weight', 'projection_head.0.bias', 'projection_head.1.weight', 'projection_head.1.bias', 'projection_head.1.running_mean', 'projection_head.1.running_var', 'projection_head.1.num_batches_tracked', 'projection_head.3.weight', 'projection_head.3.bias', 'prototypes.weight'])
    

    It seems the pretrained model is missing the projection_head weight and bias. So I printed the keys in the saved model and found the projection head is there, but under different key names:

    dict_keys(['module.conv1.weight', 'module.bn1.weight', 'module.bn1.bias', 'module.bn1.running_mean', 'module.bn1.running_var', 'module.bn1.num_batches_tracked', 'module.layer1.0.conv1.weight', 'module.layer1.0.bn1.weight', 'module.layer1.0.bn1.bias', 'module.layer1.0.bn1.running_mean', 'module.layer1.0.bn1.running_var', 'module.layer1.0.bn1.num_batches_tracked', 'module.layer1.0.conv2.weight', 'module.layer1.0.bn2.weight', 'module.layer1.0.bn2.bias', 'module.layer1.0.bn2.running_mean', 'module.layer1.0.bn2.running_var', 'module.layer1.0.bn2.num_batches_tracked', 'module.layer1.0.conv3.weight', 'module.layer1.0.bn3.weight', 'module.layer1.0.bn3.bias', 'module.layer1.0.bn3.running_mean', 'module.layer1.0.bn3.running_var', 'module.layer1.0.bn3.num_batches_tracked', 'module.layer1.0.downsample.0.weight', 'module.layer1.0.downsample.1.weight', 'module.layer1.0.downsample.1.bias', 'module.layer1.0.downsample.1.running_mean', 'module.layer1.0.downsample.1.running_var', 'module.layer1.0.downsample.1.num_batches_tracked', 'module.layer1.1.conv1.weight', 'module.layer1.1.bn1.weight', 'module.layer1.1.bn1.bias', 'module.layer1.1.bn1.running_mean', 'module.layer1.1.bn1.running_var', 'module.layer1.1.bn1.num_batches_tracked', 'module.layer1.1.conv2.weight', 'module.layer1.1.bn2.weight', 'module.layer1.1.bn2.bias', 'module.layer1.1.bn2.running_mean', 'module.layer1.1.bn2.running_var', 'module.layer1.1.bn2.num_batches_tracked', 'module.layer1.1.conv3.weight', 'module.layer1.1.bn3.weight', 'module.layer1.1.bn3.bias', 'module.layer1.1.bn3.running_mean', 'module.layer1.1.bn3.running_var', 'module.layer1.1.bn3.num_batches_tracked', 'module.layer1.2.conv1.weight', 'module.layer1.2.bn1.weight', 'module.layer1.2.bn1.bias', 'module.layer1.2.bn1.running_mean', 'module.layer1.2.bn1.running_var', 'module.layer1.2.bn1.num_batches_tracked', 'module.layer1.2.conv2.weight', 'module.layer1.2.bn2.weight', 'module.layer1.2.bn2.bias', 'module.layer1.2.bn2.running_mean', 'module.layer1.2.bn2.running_var', 'module.layer1.2.bn2.num_batches_tracked', 'module.layer1.2.conv3.weight', 'module.layer1.2.bn3.weight', 'module.layer1.2.bn3.bias', 'module.layer1.2.bn3.running_mean', 'module.layer1.2.bn3.running_var', 'module.layer1.2.bn3.num_batches_tracked', 'module.layer2.0.conv1.weight', 'module.layer2.0.bn1.weight', 'module.layer2.0.bn1.bias', 'module.layer2.0.bn1.running_mean', 'module.layer2.0.bn1.running_var', 'module.layer2.0.bn1.num_batches_tracked', 'module.layer2.0.conv2.weight', 'module.layer2.0.bn2.weight', 'module.layer2.0.bn2.bias', 'module.layer2.0.bn2.running_mean', 'module.layer2.0.bn2.running_var', 'module.layer2.0.bn2.num_batches_tracked', 'module.layer2.0.conv3.weight', 'module.layer2.0.bn3.weight', 'module.layer2.0.bn3.bias', 'module.layer2.0.bn3.running_mean', 'module.layer2.0.bn3.running_var', 'module.layer2.0.bn3.num_batches_tracked', 'module.layer2.0.downsample.0.weight', 'module.layer2.0.downsample.1.weight', 'module.layer2.0.downsample.1.bias', 'module.layer2.0.downsample.1.running_mean', 'module.layer2.0.downsample.1.running_var', 'module.layer2.0.downsample.1.num_batches_tracked', 'module.layer2.1.conv1.weight', 'module.layer2.1.bn1.weight', 'module.layer2.1.bn1.bias', 'module.layer2.1.bn1.running_mean', 'module.layer2.1.bn1.running_var', 'module.layer2.1.bn1.num_batches_tracked', 'module.layer2.1.conv2.weight', 'module.layer2.1.bn2.weight', 'module.layer2.1.bn2.bias', 'module.layer2.1.bn2.running_mean', 'module.layer2.1.bn2.running_var', 'module.layer2.1.bn2.num_batches_tracked', 
'module.layer2.1.conv3.weight', 'module.layer2.1.bn3.weight', 'module.layer2.1.bn3.bias', 'module.layer2.1.bn3.running_mean', 'module.layer2.1.bn3.running_var', 'module.layer2.1.bn3.num_batches_tracked', 'module.layer2.2.conv1.weight', 'module.layer2.2.bn1.weight', 'module.layer2.2.bn1.bias', 'module.layer2.2.bn1.running_mean', 'module.layer2.2.bn1.running_var', 'module.layer2.2.bn1.num_batches_tracked', 'module.layer2.2.conv2.weight', 'module.layer2.2.bn2.weight', 'module.layer2.2.bn2.bias', 'module.layer2.2.bn2.running_mean', 'module.layer2.2.bn2.running_var', 'module.layer2.2.bn2.num_batches_tracked', 'module.layer2.2.conv3.weight', 'module.layer2.2.bn3.weight', 'module.layer2.2.bn3.bias', 'module.layer2.2.bn3.running_mean', 'module.layer2.2.bn3.running_var', 'module.layer2.2.bn3.num_batches_tracked', 'module.layer2.3.conv1.weight', 'module.layer2.3.bn1.weight', 'module.layer2.3.bn1.bias', 'module.layer2.3.bn1.running_mean', 'module.layer2.3.bn1.running_var', 'module.layer2.3.bn1.num_batches_tracked', 'module.layer2.3.conv2.weight', 'module.layer2.3.bn2.weight', 'module.layer2.3.bn2.bias', 'module.layer2.3.bn2.running_mean', 'module.layer2.3.bn2.running_var', 'module.layer2.3.bn2.num_batches_tracked', 'module.layer2.3.conv3.weight', 'module.layer2.3.bn3.weight', 'module.layer2.3.bn3.bias', 'module.layer2.3.bn3.running_mean', 'module.layer2.3.bn3.running_var', 'module.layer2.3.bn3.num_batches_tracked', 'module.layer3.0.conv1.weight', 'module.layer3.0.bn1.weight', 'module.layer3.0.bn1.bias', 'module.layer3.0.bn1.running_mean', 'module.layer3.0.bn1.running_var', 'module.layer3.0.bn1.num_batches_tracked', 'module.layer3.0.conv2.weight', 'module.layer3.0.bn2.weight', 'module.layer3.0.bn2.bias', 'module.layer3.0.bn2.running_mean', 'module.layer3.0.bn2.running_var', 'module.layer3.0.bn2.num_batches_tracked', 'module.layer3.0.conv3.weight', 'module.layer3.0.bn3.weight', 'module.layer3.0.bn3.bias', 'module.layer3.0.bn3.running_mean', 'module.layer3.0.bn3.running_var', 'module.layer3.0.bn3.num_batches_tracked', 'module.layer3.0.downsample.0.weight', 'module.layer3.0.downsample.1.weight', 'module.layer3.0.downsample.1.bias', 'module.layer3.0.downsample.1.running_mean', 'module.layer3.0.downsample.1.running_var', 'module.layer3.0.downsample.1.num_batches_tracked', 'module.layer3.1.conv1.weight', 'module.layer3.1.bn1.weight', 'module.layer3.1.bn1.bias', 'module.layer3.1.bn1.running_mean', 'module.layer3.1.bn1.running_var', 'module.layer3.1.bn1.num_batches_tracked', 'module.layer3.1.conv2.weight', 'module.layer3.1.bn2.weight', 'module.layer3.1.bn2.bias', 'module.layer3.1.bn2.running_mean', 'module.layer3.1.bn2.running_var', 'module.layer3.1.bn2.num_batches_tracked', 'module.layer3.1.conv3.weight', 'module.layer3.1.bn3.weight', 'module.layer3.1.bn3.bias', 'module.layer3.1.bn3.running_mean', 'module.layer3.1.bn3.running_var', 'module.layer3.1.bn3.num_batches_tracked', 'module.layer3.2.conv1.weight', 'module.layer3.2.bn1.weight', 'module.layer3.2.bn1.bias', 'module.layer3.2.bn1.running_mean', 'module.layer3.2.bn1.running_var', 'module.layer3.2.bn1.num_batches_tracked', 'module.layer3.2.conv2.weight', 'module.layer3.2.bn2.weight', 'module.layer3.2.bn2.bias', 'module.layer3.2.bn2.running_mean', 'module.layer3.2.bn2.running_var', 'module.layer3.2.bn2.num_batches_tracked', 'module.layer3.2.conv3.weight', 'module.layer3.2.bn3.weight', 'module.layer3.2.bn3.bias', 'module.layer3.2.bn3.running_mean', 'module.layer3.2.bn3.running_var', 'module.layer3.2.bn3.num_batches_tracked', 'module.layer3.3.conv1.weight', 
'module.layer3.3.bn1.weight', 'module.layer3.3.bn1.bias', 'module.layer3.3.bn1.running_mean', 'module.layer3.3.bn1.running_var', 'module.layer3.3.bn1.num_batches_tracked', 'module.layer3.3.conv2.weight', 'module.layer3.3.bn2.weight', 'module.layer3.3.bn2.bias', 'module.layer3.3.bn2.running_mean', 'module.layer3.3.bn2.running_var', 'module.layer3.3.bn2.num_batches_tracked', 'module.layer3.3.conv3.weight', 'module.layer3.3.bn3.weight', 'module.layer3.3.bn3.bias', 'module.layer3.3.bn3.running_mean', 'module.layer3.3.bn3.running_var', 'module.layer3.3.bn3.num_batches_tracked', 'module.layer3.4.conv1.weight', 'module.layer3.4.bn1.weight', 'module.layer3.4.bn1.bias', 'module.layer3.4.bn1.running_mean', 'module.layer3.4.bn1.running_var', 'module.layer3.4.bn1.num_batches_tracked', 'module.layer3.4.conv2.weight', 'module.layer3.4.bn2.weight', 'module.layer3.4.bn2.bias', 'module.layer3.4.bn2.running_mean', 'module.layer3.4.bn2.running_var', 'module.layer3.4.bn2.num_batches_tracked', 'module.layer3.4.conv3.weight', 'module.layer3.4.bn3.weight', 'module.layer3.4.bn3.bias', 'module.layer3.4.bn3.running_mean', 'module.layer3.4.bn3.running_var', 'module.layer3.4.bn3.num_batches_tracked', 'module.layer3.5.conv1.weight', 'module.layer3.5.bn1.weight', 'module.layer3.5.bn1.bias', 'module.layer3.5.bn1.running_mean', 'module.layer3.5.bn1.running_var', 'module.layer3.5.bn1.num_batches_tracked', 'module.layer3.5.conv2.weight', 'module.layer3.5.bn2.weight', 'module.layer3.5.bn2.bias', 'module.layer3.5.bn2.running_mean', 'module.layer3.5.bn2.running_var', 'module.layer3.5.bn2.num_batches_tracked', 'module.layer3.5.conv3.weight', 'module.layer3.5.bn3.weight', 'module.layer3.5.bn3.bias', 'module.layer3.5.bn3.running_mean', 'module.layer3.5.bn3.running_var', 'module.layer3.5.bn3.num_batches_tracked', 'module.layer4.0.conv1.weight', 'module.layer4.0.bn1.weight', 'module.layer4.0.bn1.bias', 'module.layer4.0.bn1.running_mean', 'module.layer4.0.bn1.running_var', 'module.layer4.0.bn1.num_batches_tracked', 'module.layer4.0.conv2.weight', 'module.layer4.0.bn2.weight', 'module.layer4.0.bn2.bias', 'module.layer4.0.bn2.running_mean', 'module.layer4.0.bn2.running_var', 'module.layer4.0.bn2.num_batches_tracked', 'module.layer4.0.conv3.weight', 'module.layer4.0.bn3.weight', 'module.layer4.0.bn3.bias', 'module.layer4.0.bn3.running_mean', 'module.layer4.0.bn3.running_var', 'module.layer4.0.bn3.num_batches_tracked', 'module.layer4.0.downsample.0.weight', 'module.layer4.0.downsample.1.weight', 'module.layer4.0.downsample.1.bias', 'module.layer4.0.downsample.1.running_mean', 'module.layer4.0.downsample.1.running_var', 'module.layer4.0.downsample.1.num_batches_tracked', 'module.layer4.1.conv1.weight', 'module.layer4.1.bn1.weight', 'module.layer4.1.bn1.bias', 'module.layer4.1.bn1.running_mean', 'module.layer4.1.bn1.running_var', 'module.layer4.1.bn1.num_batches_tracked', 'module.layer4.1.conv2.weight', 'module.layer4.1.bn2.weight', 'module.layer4.1.bn2.bias', 'module.layer4.1.bn2.running_mean', 'module.layer4.1.bn2.running_var', 'module.layer4.1.bn2.num_batches_tracked', 'module.layer4.1.conv3.weight', 'module.layer4.1.bn3.weight', 'module.layer4.1.bn3.bias', 'module.layer4.1.bn3.running_mean', 'module.layer4.1.bn3.running_var', 'module.layer4.1.bn3.num_batches_tracked', 'module.layer4.2.conv1.weight', 'module.layer4.2.bn1.weight', 'module.layer4.2.bn1.bias', 'module.layer4.2.bn1.running_mean', 'module.layer4.2.bn1.running_var', 'module.layer4.2.bn1.num_batches_tracked', 'module.layer4.2.conv2.weight', 'module.layer4.2.bn2.weight', 
'module.layer4.2.bn2.bias', 'module.layer4.2.bn2.running_mean', 'module.layer4.2.bn2.running_var', 'module.layer4.2.bn2.num_batches_tracked', 'module.layer4.2.conv3.weight', 'module.layer4.2.bn3.weight', 'module.layer4.2.bn3.bias', 'module.layer4.2.bn3.running_mean', 'module.layer4.2.bn3.running_var', 'module.layer4.2.bn3.num_batches_tracked', 'module.projection_head.0.weight', 'module.projection_head.0.bias', 'module.projection_head.1.weight', 'module.projection_head.1.bias', 'module.projection_head.1.running_mean', 'module.projection_head.1.running_var', 'module.projection_head.1.num_batches_tracked', 'module.projection_head.3.weight', 'module.projection_head.3.bias', 'module.prototypes.weight'])
    

    How should I handle this problem if I want to load projection head into eval_semisup.py?

    Thanks

    opened by zhawhjw 0
  • How can I load DeepCluster-v2?

    I tried model = torch.hub.load('facebookresearch/deepcluster-v2', 'resnet50')

    It gave an error: HTTPError: HTTP Error 404: Not Found

    Could you help me out with this?

    opened by km5ar 0
  • A question about the dataloader setting: `shuffle`?

    Excellent work!

    I see that your training script does not use the shuffle=True setting when loading data. I wonder if this setting has any effect on performance?

    Does using shuffle=True have a positive effect, or a negative one?

    opened by Classmate-Huang 0