Self-labelling via simultaneous clustering and representation learning. (ICLR 2020)

Overview

Self-labelling via simultaneous clustering and representation learning

🆗 🆗 🎉 NEW models (20th August 2020): Added standard SeLa-pretrained torchvision ResNet models to make loading much easier + added baselines using the better MoCo-v2 augmentation (~69% LP performance) + added evaluation with K=1000 for ImageNet "unsupervised clustering".

🆕 ✅ 🎉 Updated code (23rd April 2020): bug fixes + CIFAR code + evaluation for ResNet & AlexNet.

Check out our blog post for a quick non-technical overview and an interactive visualization of our clusters.

Self-Label

This code is the official implementation of the ICLR 2020 paper Self-labelling via simultaneous clustering and representation learning.

Abstract

Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Our method achieves state-of-the-art representation learning performance for AlexNet and ResNet-50 on SVHN, CIFAR-10, CIFAR-100 and ImageNet.
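
For intuition, the assignment step can be sketched in a few lines of NumPy. This is a minimal illustration of the entropy-regularised Sinkhorn-Knopp iteration described above, not the repository's optimised implementation; the K x N layout and the lamb parameter mirror the --lamb flag documented below, and all names are ours:

import numpy as np

def sinkhorn_pseudolabels(log_probs, lamb=25, n_iters=100):
    # log_probs: (N, K) model log-predictions for N images and K clusters (assumed input).
    # Returns one hard pseudo-label per image from the regularised transport plan Q.
    N, K = log_probs.shape
    M = lamb * log_probs
    P = np.exp(M - M.max()).T            # (K, N), rescaled for numerical stability
    P /= P.sum()
    r = np.ones(K) / K                   # target marginal: equally sized clusters
    c = np.ones(N) / N                   # target marginal: every image used exactly once
    u = np.ones(K)
    for _ in range(n_iters):             # alternating row/column scaling (Sinkhorn-Knopp)
        u = r / (P @ (c / (P.T @ u)))
    v = c / (P.T @ u)
    Q = u[:, None] * P * v[None, :]      # transport plan with the desired marginals
    return Q.argmax(0)                   # size N: one cluster index per image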

Results at a glance

| Model       | NMI (%) | aNMI (%) | ARI (%) | LP Acc (%) |
|-------------|---------|----------|---------|------------|
| AlexNet 1k  | 50.5    | 12.2     | 2.7     | 42.1       |
| AlexNet 10k | 66.4    | 4.7      | 4.7     | 43.8       |
| R50 10x3k   | 54.2    | 34.4     | 7.2     | 61.5       |

With better augmentations (all single crop)

| Model                      | Label-Acc | NMI (%) | aNMI (%) | ARI (%) | LP Acc (%) | Model weights |
|----------------------------|-----------|---------|----------|---------|------------|---------------|
| Aug++ R18 1k (new)         | 26.9      | 62.7    | 36.4     | 12.5    | 53.3       | here          |
| Aug++ R50 1k (new)         | 30.5      | 65.7    | 42.0     | 16.2    | 63.5       | here          |
| Aug++ R50 10x3k (new)      | 38.1      | 75.7    | 52.8     | 27.6    | 68.8       | here          |
| MoCo-v2 + k-means**, K=3k  | –         | 71.4    | 39.6     | 15.8    | 71.1       | –             |
  • "Aug++" refers to the better augmentations used in SimCLR, taken from the MoCo-v2 repo, but I still only trained for 280 epochs, with three lr-drops as in CMC.
  • There are still further improvements to be made with a MLP or training 800 epochs (I train 280), as done in SimCLR, MoCov2 and SwAV.
  • **MoCo-v2 uses 800 epochs, MLP and cos-lr-schedule. On MoCo-v2 I run k-means (K=3000) on the avg-pooled features (after the MLP-head it's pretty much the same performance) to obtain NMI, aNMI and ARI numbers.
  • Models above use standard torchvision ResNet backbones so loading is now super easy:
import torch, torchvision

# standard torchvision ResNet-50 with a 3000-way classification head
model = torchvision.models.resnet50(pretrained=False, num_classes=3000)
ckpt = torch.load('resnet50-10x3k_pp.pth')
model.load_state_dict(ckpt['state_dict'])   # pretrained Self-Label weights
pseudolabels = ckpt['L']                    # self-labels of the ImageNet train set
  • Note on improvement potential: just using "aug+" I get an LP accuracy of 67.2% after 200 epochs, whereas MoCo-v2 with "aug+" reaches only 63.4% after 200 epochs.
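
The MoCo-v2 + k-means row can be reproduced with scikit-learn along the lines of the sketch below (a sketch only, not the exact evaluation script; in practice feats are the avg-pooled validation features of the frozen backbone and labels the ground-truth ImageNet classes, and random placeholders are used here so the snippet runs as-is). The Self-Label models need no k-means step, since their classification head directly assigns a cluster to each image.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             normalized_mutual_info_score)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 2048)).astype(np.float32)    # stand-in for avg-pooled features
labels = rng.integers(0, 1000, size=5000)                   # stand-in for true classes

pred = KMeans(n_clusters=3000, n_init=1).fit_predict(feats) # K = 3000 as in the table
print("NMI :", normalized_mutual_info_score(labels, pred))
print("aNMI:", adjusted_mutual_info_score(labels, pred))
print("ARI :", adjusted_rand_score(labels, pred))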

Clusters that were discovered by our method

Sorted

ImageNet validation images with clusters sorted by ImageNet purity

Random

ImageNet validation images with random clusters

The edge colors encode the true ImageNet classes (which are not used for training). You can view all clusters here.

Requirements

  • Python >3.6
  • PyTorch > 1.0
  • CUDA
  • Numpy, SciPy
  • also, see requirements.txt
  • (optional:) TensorboardX

Running our code

Run the self-supervised training of an AlexNet with the command

$ ./scripts/alexnet.sh

or train a ResNet-50 with

$ ./scripts/resnet.sh

Note: you need to specify your dataset directory (the code expects an ImageNet-like layout with "train" and "val" folders). You also need to give the code enough GPUs to allow storing the activations on the GPU; otherwise you need to use the CPU variant (--cpu), which is significantly slower.

Full documentation of the unsupervised training code main.py:

usage: main.py [-h] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--lr LR]
               [--lrdrop LRDROP] [--wd WD] [--dtype {f64,f32}] [--nopts NOPTS]
               [--augs AUGS] [--paugs PAUGS] [--lamb LAMB] [--cpu]
               [--arch ARCH] [--archspec {big,small}] [--ncl NCL] [--hc HC]
               [--device DEVICE] [--modeldevice MODELDEVICE] [--exp EXP]
               [--workers WORKERS] [--imagenet-path IMAGENET_PATH]
               [--comment COMMENT] [--log-intv LOG_INTV] [--log-iter LOG_ITER]

PyTorch Implementation of Self-Label

optional arguments:
  -h, --help            show this help message and exit
  --epochs EPOCHS       number of epochs
  --batch-size BATCH_SIZE
                        batch size (default: 256)
  --lr LR               initial learning rate (default: 0.05)
  --lrdrop LRDROP       multiply LR by 0.1 every LRDROP epochs (default: 150)
  --wd WD               weight decay exponent (default: -5)
  --dtype {f64,f32}     SK-algo dtype (default: f64)
  --nopts NOPTS         number of pseudo-opts (default: 100)
  --augs AUGS           augmentation level (default: 3)
  --paugs PAUGS         augmentation level for the pseudo-label optimization (default: 3)
  --lamb LAMB           lambda for the pseudo-label optimization (default: 25)
  --cpu                 use CPU variant (slow) (default: off)
  --arch ARCH           alexnet or resnet (default: alexnet)
  --archspec {big,small}
                        alexnet variant (default: big)
  --ncl NCL             number of clusters per head (default: 3000)
  --hc HC               number of heads (default: 1)
  --device DEVICE       GPU devices to use for storage and model
  --modeldevice MODELDEVICE
                        GPU numbers on which the CNN runs
  --exp EXP             path to experiment directory
  --workers WORKERS     number of workers (default: 6)
  --imagenet-path IMAGENET_PATH
                        path to folder that contains `train` and `val`
  --comment COMMENT     name for tensorboardX
  --log-intv LOG_INTV   save stuff every x epochs (default: 1)
  --log-iter LOG_ITER   log every x-th batch (default: 200)
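
For illustration only (the exact hyper-parameters used for the paper live in scripts/alexnet.sh and scripts/resnet.sh), a 10-head ResNet run with 3000 clusters per head could be launched with flags such as:

$ python main.py --arch resnet --ncl 3000 --hc 10 \
                 --imagenet-path /path/to/imagenet --exp ./self-label-r50-10x3k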

Evaluation

Linear Evaluation

We provide the linear evaluation methods in this repo. Simply download the models via ./scripts/download_models.sh and then run either scripts/eval-alexnet.sh or scripts/eval-resnet.sh.
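
For reference, linear probing amounts to freezing the backbone and training a single linear layer on the avg-pooled features. The sketch below illustrates this with one of the torchvision-compatible checkpoints from above; it is not the repo's evaluation script, and the data loading, augmentation and LR schedule are omitted:

import torch
import torch.nn as nn
import torchvision

# frozen Self-Label backbone (torchvision-compatible checkpoint, see "Results at a glance")
backbone = torchvision.models.resnet50(pretrained=False, num_classes=3000)
backbone.load_state_dict(torch.load('resnet50-10x3k_pp.pth')['state_dict'])
backbone.fc = nn.Identity()               # expose the 2048-d avg-pooled features
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

probe = nn.Linear(2048, 1000)             # the only trainable part
opt = torch.optim.SGD(probe.parameters(), lr=0.01, momentum=0.9)

def probe_step(images, targets):
    # one optimisation step of the linear classifier on frozen features
    with torch.no_grad():
        feats = backbone(images)
    loss = nn.functional.cross_entropy(probe(feats), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()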

Pascal VOC

We follow the standard evaluation protocols for self-supervised visual representation learning.

Our extracted pseudolabels

As we show in the paper, the pseudolabels generated by our training can be used to quickly train a neural network with regular cross-entropy. Moreover, they seem to correctly group together similar images. Hence we provide the labels for everyone to use.
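
A minimal sketch of such pseudo-label training is given below. It assumes pseudolabels is a 1-D tensor with one cluster index per training image, aligned with the (sorted) ImageFolder order of the ImageNet train split; the file name and this alignment are assumptions, so check the downloaded label files for the exact format:

import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, transforms

pseudolabels = torch.load('pseudolabels.pth')        # hypothetical file name

train_set = datasets.ImageFolder(
    '/path/to/imagenet/train',
    transform=transforms.Compose([transforms.RandomResizedCrop(224),
                                  transforms.RandomHorizontalFlip(),
                                  transforms.ToTensor()]))
# replace the ground-truth targets with the self-labels
train_set.samples = [(path, int(pseudolabels[i]))
                     for i, (path, _) in enumerate(train_set.samples)]
loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                     shuffle=True, num_workers=6)

model = torchvision.models.resnet50(num_classes=int(pseudolabels.max()) + 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-5)
for images, targets in loader:                       # plain cross-entropy training
    loss = nn.functional.cross_entropy(model(images), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()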

AlexNet

You can download the pseudolabels from our best (raw) AlexNet model with 10x3000 clusters here.

ResNet

You can download the pseudolabels from our best ResNet model with 10x3000 clusters here.

Trained models

You can also download our trained models by running

$ ./scripts/download_models.sh

Use them like this:

import torch
import models

# 10-head ResNet-50, 3000 clusters per head
d = torch.load('self-label_models/resnet-10x3k.pth')
m = models.resnet(num_classes=[3000]*10)
m.load_state_dict(d)

# 10-head AlexNet ("wRot": variant trained with the additional rotation task)
d = torch.load('self-label_models/alexnet-10x3k-wRot.pth')
m = models.alexnet(num_classes=[3000]*10)
m.load_state_dict(d)
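
To get a cluster assignment for a new image from one of these multi-head checkpoints, something along the following lines should work. This is a sketch only: it assumes the repo's models return one logit tensor per head when num_classes is a list, and it simply reads off the first head; the image path is a placeholder.

import torch
import models
from PIL import Image
from torchvision import transforms

m = models.resnet(num_classes=[3000]*10)
m.load_state_dict(torch.load('self-label_models/resnet-10x3k.pth'))
m.eval()

prep = transforms.Compose([transforms.Resize(256),
                           transforms.CenterCrop(224),
                           transforms.ToTensor()])
x = prep(Image.open('some_image.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    outputs = m(x)                  # assumed: list with one (1, 3000) tensor per head
print(int(outputs[0].argmax(1)))    # pseudo-label from the first head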

Reference

If you use this code or our models, please cite the following paper:

Yuki M. Asano, Christian Rupprecht and Andrea Vedaldi. "Self-labelling via simultaneous clustering and representation learning." Proc. ICLR (2020)

@inproceedings{asano2020self,
  title={Self-labelling via simultaneous clustering and representation learning},
  author={Asano, Yuki M. and Rupprecht, Christian and Vedaldi, Andrea},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2020},
}
Comments
  • Some questions about this paper

    Really a nice work ! But I have some minor questions:

    1: I think Eq. 5 should be written as: log E(p,q) + log N = log <Q, -log P>. Can you give some pointers on why Eq. 5 holds? (Although it does not influence any conclusion in the paper, I just want to confirm it.)

    2: After reading your answer to "How do you transition from the probability matrix Q to the labels?" on the OpenReview website, I still have some confusion: 1) Do you mean that in step 1 (representation learning), the Q of Eq. 6 is actually a one-hot matrix obtained by applying argmax to the probability matrix Q^* (where Q^* is the direct solution of Eq. 7 in step 2)? 2) If I understand correctly, what about using Q^* directly in step 1 to compute the cross-entropy loss, instead of argmax(Q^*)? Would a soft label be better?

    P.S. I asked the above questions on the OpenReview website but maybe you did not receive them, so sorry to bother you again here.

    opened by haohang96 7
  • Multiple Issues

    Hi

    Trying to run the code, I came across the following issues which might be helpful if you fix them:

    • The ResNet architecture is incorrectly written as "presnet" in scripts/resnet.sh.
    • In the return_model_loader function in data.py, the trainloader is only available for the resnet architecture.
    • Using only one GPU results in a division by zero in multigpu.py.
    • Using more than one classification head leads to an IndexError in CrossEntropy.

    Could you please check these issues?

    Thank you

    opened by mrm-196 6
  • How to correspond GT and labels for eval?

    Thanks for your codebase!

    I tried cifar.py on CIFAR-10 and did not understand the code below.

    acc = kNN(model, trainloader, testloader, K=10, sigma=0.1, dim=knn_dim)

    Does kNN divide the CIFAR-10 test data into 10 clusters and calculate accuracy against the ground truth? If so, how are GT classes matched to clusters for evaluation?

    opened by ilikeokoge 3
  • How to configure the number of clustering heads T if clustering is the priority

    Hi,

    You say in the paper (Section 3.5): "Since our main objective is to use clustering to learn a good data representation Φ, we consider a multi-task setting in which the same representation is shared among several different clustering tasks, which can potentially capture different and complementary clustering axes."

    What if my main objective is to obtain meaningful clusters? I would like to train the model on my own unlabeled, rather messy dataset and then clean the dataset by filtering out particular clusters. Would you recommend setting the number of heads T = 1 in that case?

    opened by ksulima 3
  • Custom Dataset All-NaN Slice encountered

    Hi,

    I am trying to use your algorithm on another dataset. After a few epochs, when the pseudo-labels were updated, I got this error:

    Traceback (most recent call last):
      File "PseudoLabelling.py", line 391, in <module>
        L, cfg.PS = update_pseudoLabels(model, cfg, inferloader, L)
      File "PseudoLabelling.py", line 314, in update_pseudoLabels
        L, PS = cpu_sk(model, pseudo_loader, cfg.hc, cfg.outs, L, cfg.presize, cfg.K, cfg.dtype, cfg.device, cfg.lamb)
      File "PseudoLabelling.py", line 251, in cpu_sk
        L, PS = optimize_L_sk(L, PS, outs, lamb, "cpu", dtype, nh=0)
      File "PseudoLabelling.py", line 306, in optimize_L_sk
        argmaxes = np.nanargmax(PS, 0) # size N
      File "/.conda/envs/torch/lib/python3.8/site-packages/numpy/lib/nanfunctions.py", line 551, in nanargmax
        raise ValueError("All-NaN slice encountered")
    ValueError: All-NaN slice encountered
    

    Why did I get this error? Is it because all the pseudo-labels are the same as in the previous epoch?

    opened by Shiro-LK 3
  • Provide a concise requirements.txt

    Hi @yukimasano and @mayank010698 ,

    Could you please provide a requirements.txt file so that everyone interested can set up the project painlessly? Your README.md suggests the following:

    Requirements: Python > 3.6, PyTorch > 1.0, CUDA, Numpy, SciPy, (optional:) TensorboardX

    Here's a configuration which successfully runs the scripts/eval-resnet.sh script with the resnet-10x3k model:

    scipy==1.5.2
    tensorboardx==2.1
    torch==1.6.0
    torchvision~=0.7.0
    numpy~=1.19.2
    Pillow~=7.2.0
    

    Providing a requirements.txt file can save a lot of debugging hours spent finding which exact versions of torch (> 1.0.0) and torchvision work.

    Thank you in advance!

    opened by bespoke-code 2
  • Does the classification head (fc layer) need to be re-set at each new assignment?

    opened by haohang96 2
  • Reproducing Results

    Hello,

    I am trying to reproduce the reported ImageNet results with AlexNet. I simply cloned the repository and ran the alexnet.sh script. I am using 4 Nvidia RTX 2080 Ti GPUs.

    In the paper, for the label optimization step, it is stated that "convergence is reached within 2 minutes on ImageNet when computed on a GPU". In my case it takes 5 days, and the program then throws an out-of-memory error. What kind of GPUs did you use during training?

    Also, as a follow-up: would it be possible to share the CIFAR code, since it would be easier to run and to reproduce the results with?

    Thank you.

    opened by bhiziroglu 2
  • ckpt['L'] should be ckpt['labels']

    In the code snippet, it's written pseudolabels = ckpt['L'] but there's no such key L in the state dict. labels is the correct key I guess.

    @yukimasano

    opened by sayakpaul 1
  • Labels order if you set shuffle=True in the dataloader

    Hi, thanks for your great paper. I've noticed that you set shuffle=True in your dataloader t_loader and let train_loader and pseudo_loader point to the same object t_loader. I'm wondering whether, in this case, the optimised labels are still in the right order when they are used to train the model, since the data will be shuffled again.

    opened by cnyanhao 1
  • getting the pseudolabels

    Hi, I wonder if you have the code for extracting the pseudolabels from the training stage? As you state:

    "As we show in the paper, the pseudolabels we generate from our training can be used to quickly train a neural network with regular cross-entropy. Moreover they seem to correctly group together similar images. "

    I want to get the pseudolabels as I train the model on a new dataset. Thank you!

    opened by hosea7456 1
  • TOP1: 75.48 on CIFAR-10, much less than the 83.4 reported in the paper

    Hi, I ran cifar.sh; at the end, the results were:

    Epoch: [399][390/391]Time: 0.074 (0.308) Data: 0.008 (0.210) Loss: 1.6734 (1.5705)
    10-NN,s=0.1: TOP1:  75.43
    best accuracy: 75.62
    doing PCA with 128 components ..done
    10-NN,s=0.1: TOP1:  75.48
    ......
    Epoch: [399][390/391]Time: 0.074 (0.308) Data: 0.008 (0.210) Loss: 1.6734 (1.5705)
    10-NN,s=0.1: TOP1:  75.43
    best accuracy: 75.62
    doing PCA with 128 components ..done
    10-NN,s=0.1: TOP1:  75.48
    
    

    This result is much lower than the 83.4 reported in the paper. Can someone tell me why? Thank you very much!

    opened by chester-w-xie 1
  • Question about learning rate

    Hello, thanks for your great work. I want to reproduce your results on ImageNet. However, I find that the default learning rate in the code is 0.08, but it is 0.05 in the paper. Which learning rate should I use to reproduce the results?

    opened by lzyhha 0
Owner
Yuki M. Asano
I'm a computer vision researcher at the University of Amsterdam. I did my PhD at the Visual Geometry Group in Oxford.