PyTorch implementation of the paper SPICE: Semantic Pseudo-labeling for Image Clustering

SPICE: Semantic Pseudo-labeling for Image Clustering

By Chuang Niu and Ge Wang

This is a PyTorch implementation of the paper. (The repository is still being updated.)

Installation

Please refer to requirement.txt for all required packages. Assuming Anaconda with Python 3.7, a step-by-step installation example is as follows:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
conda install -c conda-forge addict tensorboard python-lmdb
conda install matplotlib scipy scikit-learn pillow

Then, clone this repo:

git clone https://github.com/niuchuangnn/SPICE.git
cd SPICE
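
Before moving on, it can help to confirm that the installed packages are importable. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the module names are the usual import names for the conda packages above (`python-lmdb` is imported as `lmdb`, `scikit-learn` as `sklearn`, and `pillow` as `PIL`).

```python
import importlib.util

# Import names for the packages installed above (assumed mapping from
# conda package names to Python module names).
REQUIRED = [
    "torch", "torchvision", "addict", "tensorboard", "lmdb",
    "matplotlib", "scipy", "sklearn", "PIL",
]


def missing_packages(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If `torch` is present, `torch.cuda.is_available()` and `torch.version.cuda` additionally show whether the installed build can use your GPU.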

Data

Prepare datasets of interest as described in dataset.md.

Training

Read the training tutorial for details.

Evaluation

Evaluation of SPICE-Self:

python tools/eval_self.py --config-file configs/stl10/eval.py --weight PATH/TO/MODEL --all 1

Evaluation of SPICE-Semi:

python tools/eval_semi.py --load_path PATH/TO/MODEL --net WideResNet --widen_factor 2 --data_dir PATH/TO/DATA --dataset cifar10 --all 1 

Read the evaluation tutorial for more details on the evaluation and on visualizing the learned clusters.

Model Zoo

All trained models from our paper are available below.

Dataset       Version      ACC   NMI   ARI   Model link
STL10         SPICE-Self   91.0  82.0  81.5  Model
STL10         SPICE        93.8  87.2  87.0  Model
STL10         SPICE-Self*  89.9  80.9  79.7  Model
STL10         SPICE*       92.9  86.0  85.3  Model
CIFAR10       SPICE-Self   83.8  73.4  70.5  Model
CIFAR10       SPICE        92.6  86.5  85.2  Model
CIFAR10       SPICE-Self*  84.9  74.5  71.8  Model
CIFAR10       SPICE*       91.7  85.8  83.6  Model
CIFAR100      SPICE-Self   46.8  44.8  29.4  Model
CIFAR100      SPICE        53.8  56.7  38.7  Model
CIFAR100      SPICE-Self*  48.0  45.0  30.8  Model
CIFAR100      SPICE*       58.4  58.3  42.2  Model
ImageNet-10   SPICE-Self   96.9  92.7  93.3  Model
ImageNet-10   SPICE        96.7  91.7  92.9  Model
ImageNet-Dog  SPICE-Self   54.6  49.8  36.2  Model
ImageNet-Dog  SPICE        55.4  50.4  34.3  Model
TinyImageNet  SPICE-Self   30.5  44.9  16.3  Model
TinyImageNet  SPICE-Self*  29.2  52.5  14.5  Model
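
ACC in these tables is clustering accuracy. It is conventionally computed by finding the best one-to-one mapping between predicted cluster indices and ground-truth classes with the Hungarian algorithm, then counting matches. The sketch below illustrates that convention with NumPy and SciPy. It is not the repository's evaluation code, and the function name `clustering_accuracy` is hypothetical. NMI and ARI are available in scikit-learn as `normalized_mutual_info_score` and `adjusted_rand_score`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    assert y_true.shape == y_pred.shape

    # counts[i, j] = number of samples in predicted cluster i with true class j.
    n = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)

    # The Hungarian algorithm minimizes cost, so convert counts to costs.
    rows, cols = linear_sum_assignment(counts.max() - counts)
    return counts[rows, cols].sum() / y_true.size


if __name__ == "__main__":
    # Cluster indices are a permutation of the class indices: a perfect score.
    print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```

Because the mapping is one-to-one, predicted labels only need to be consistent within a cluster. The cluster indices themselves carry no meaning.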

More models based on ResNet18 for both SPICE-Self* and SPICE-Semi*.

Dataset       Version      ACC   NMI   ARI   Model link
STL10         SPICE-Self*  86.2  75.6  73.2  Model
STL10         SPICE*       92.0  85.2  83.6  Model
CIFAR10       SPICE-Self*  84.5  73.9  70.9  Model
CIFAR10       SPICE*       91.8  85.0  83.6  Model
CIFAR100      SPICE-Self*  46.8  45.7  32.1  Model
CIFAR100      SPICE*       53.5  56.5  40.4  Model

Acknowledgement for reference repos

Citation

@misc{niu2021spice,
      title={SPICE: Semantic Pseudo-labeling for Image Clustering}, 
      author={Chuang Niu and Ge Wang},
      year={2021},
      eprint={2103.09382},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Issue in replicating results on CIFAR10

    I am trying to train the model and replicate the results on the CIFAR10 dataset. I downloaded the dataset as instructed and followed the training steps recorded in the attached file, trainingStepsCIFAR10.md. It contains the commands and configs used (where they differ from the default config in the repo) along with output snippets.

    MoCo trains, but I think the cluster heads are not being trained, as accuracy does not rise above 26%. I also tried a lower learning rate of 0.0005, but it didn't help. The step that finds locally consistent examples fails to find any at ratio_confident=0.99 (the provided default) and finds only 6770 at ratio_confident=0.9.

    Is this normal? If not, can you provide me some direction on where the issue might be present and how it can be resolved?

    Detailed logs of training the cluster heads are in the file spice-self-logs.txt. There are a few broken-pipe errors. I think those are due to using Jupyter for training and reconnecting to the Jupyter server; they shouldn't affect the convergence of the model.

    opened by ML-Guy 21
  • question about parameters

    Dear author, the image size of my own dataset is 224*224. In train_moco.py, how should I change the learning rate and batch_size?

    When I ran with batch_size=128 and the lr unchanged, the running loss was about 9.5.

    opened by anewusername77 11
  • train_self_v2.py need ground truth label?

        targets_i.append(gt_cluster_labels[h][img_idx_i].to(cfg.gpu, non_blocking=True))
    
        loss_dict = model(imgs_i, target=targets_i, forward_type="loss")
    
        loss = sum(loss for loss in loss_dict.values())
        loss_mean = loss / num_heads
    
        optimizer.zero_grad()
        loss_mean.backward()
        optimizer.step()
    

    I want to train on my own data WITHOUT ground truth labels, but it seems this step needs labels?

    opened by anewusername77 4
  • A quick question

    Dear authors of SPICE,

    I am very impressed by your work. Just a quick question: how is the semantic prediction matrix P obtained from the features?

    Thanks!

    opened by zhenxianglance 4
  • what if i don't have reliable datasets with label

    As the title says, can I train this model and label that data? The images have no class labels at all; I just want to assign each one to a specific class (the classes don't need to be meaningful).

    opened by anewusername77 4
  • Error from local_consistency , cannot create reliable_labels , and how to get predicted labels in all the ground truth clusters

    I trained the feature model on a custom dataset and did the stage 2 training using the train_self_v2.py script.

    But in the stage 2 evaluation metrics the accuracy was always -1 (it never improved over 56 epochs). This was because the number of clusters created did not equal the number of ground truth labels, though training still completed.

    Afterwards, when running the local_consistency.py script, the model's output for the local_consistency forward type was empty:

    idx_select, labels_select = model(feas_sim=feas_sim, scores=scores, forward_type="local_consistency")

    I tried different ratio_confident and score_th values, but idx_select and labels_select are always tensor([ ], dtype=torch.int64) tensor([ ], dtype=torch.int64)

    So the reliable_labels are not created.

    opened by Balakumaran-kandula 3
  • embeddings TSNE

    Hi,

    I want to plot a t-SNE of the embeddings. I am using feas_moco_512_l2.npy as the embeddings and datasets/stl10/stl10_binary/fold_indices.txt as the labels. The t-SNE plot looks a bit random, though I can see different clusters. It seems that "fold_indices.txt" is not what I think it is. I've also noticed that some indices correspond to multiple different classes (for example, index 4555 shows up for both class '1' and class '8' in "fold_indices.txt").

    Any ideas?

    Thanks!

    opened by ofir-dl 3
  • Does SPICE need targets(label for data on supervised learning)?

    Hello Sir,

    I saw targets in CIFAR10.py. Does SPICE need targets (labels, as in supervised learning)? If so, why are targets needed in your process?

    Thanks, Edward Cho.

    opened by edwardcho 2
  • I can not train model in RTX3090

    UserWarning: NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

    opened by YexiongLin 2
  • Evaluation on CIFAR100

    Hi, thanks for sharing your code. I want to run an evaluation on CIFAR100. I tried the following nets and weights, but I got a mismatch error while loading the state_dict:

    --load_path ./model_zoo/model_cifar100_cls_res18.pth --net resnet18_cifar and --load_path ./model_zoo/model_cifar100.pth --net WideResNet. Could you please tell me which net and model I should use for CIFAR100? Thanks

    opened by nahidq 2
  • invalid ELF header

    When trying to run your code I get the following error, do you have any suggestions as to what I might be doing wrong?

    (base) anders@CenturionUbuntu:~/PycharmProjects/Spice/SPICE$ python tools/eval_self.py --config-file configs/cifar10/eval.py --weight model_zoo/model_cifar10.pth.tar --all 1
    Traceback (most recent call last):
      File "tools/eval_self.py", line 12, in <module>
        import torch
      File "/home/anders/anaconda3/lib/python3.8/site-packages/torch/__init__.py", line 188, in <module>
        _load_global_deps()
      File "/home/anders/anaconda3/lib/python3.8/site-packages/torch/__init__.py", line 141, in _load_global_deps
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
      File "/home/anders/anaconda3/lib/python3.8/ctypes/__init__.py", line 381, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: /home/anders/anaconda3/lib/python3.8/site-packages/torch/lib/../../../../libcufft.so.10: invalid ELF header

    opened by PhoQus 2
  • suggested bug fix + semi-evaluation question

    Hi ! Great work.

    To allow training MoCo with CIFAR datasets, I suggest you change line 184 in 'tools/train_moco.py'

    from:

         model = moco.builder.MoCo(
            base_model,
            args.moco_dim, args.moco_k, args.moco_m, args.moco_t, args.mlp)
    

    to:

         model = moco.builder.MoCo(
            base_model,
            args.moco_dim, args.moco_k, args.moco_m, args.moco_t, args.mlp, args.img_size)
    

    Another question: do you perform your final evaluation (SPICE-Semi) on the resulting 'model_best.pth' or 'latest_model.pth'? I haven't seen any mention of it. Thanks!

    opened by DanielShalam 0
  • SPICE on single GPU

    It seems you have prepared arguments for running the script on a single GPU (--multiprocessing-distributed and --gpu), but when I use them I run into an "Only DistributedDataParallel is supported." exception. When I run the script with only "--gpu 0" as an argument, I get an error originating in the spawn.py code.

    I only have access to a computer with a single Nvidia 3090 GPU and want to test your model as part of my Master's thesis. Does this mean I would need to implement a single-GPU version myself, or am I missing something in your implementation?

    opened by TomasPlachy 0
  • If I want to remove targets (labels), how to change your code?

    Hello Sir,

    I think that clustering should not need targets (labels), but I found targets in your code (cifar.py).

    So I want to remove the code related to targets (labels). How should I change your code? Is it possible?

    Thanks, Edward Cho

    opened by edwardcho 0
  • Using CIFAR10, when I tried to train, I met some error at step-4 (4. Determine reliable images).

    Hello Sir,

    As mentioned in the title, I tried to train with your code.

    When I trained according to your tutorial (https://github.com/niuchuangnn/SPICE/blob/main/train.md), I ran into an error.

    4. Determine reliable images
    
    python tools/local_consistency.py --config-file ./configs/cifar10/eval.py --embedding ./results/cifar10/embedding/feas_moco_512_l2.npy
    

    This is error message.

    Traceback (most recent call last):
      File "tools/local_consistency.py", line 141, in <module>
        main()
      File "tools/local_consistency.py", line 131, in main
        acc = calculate_acc(labels_select, gt_labels_select)
      File "./spice/utils/evaluation.py", line 21, in calculate_acc
        assert len(np.unique(ypred)) == len(np.unique(y))
    AssertionError
    

    So I checked labels_select and gt_labels_select,

    len(labels_select) : 10
    tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    len(gt_labels_select) : 10
    [8 5 8 5 3 4 1 5 1 3]
    

    What should I do about this problem?

    Thanks, Edward Cho.

    opened by edwardcho 0
  • SPICE training Error

    Hi, I have a question. After following the installation steps, I started training with the command from the training tutorial: python tools/train_moco.py --img_size 32 --moco-k 12800 --arch resnet18_cifar --save_folder ./results/cifar10/moco_res18_cls --resume ./results/cifar10/moco_res18_cls/checkpoint_last.pth.tar --data_type cifar10 --data ./datasets/cifar10 --all 0

    Below is the error:

    (u) C:\Users\Kelly>cd SPICE

    (u) C:\Users\Kelly\SPICE>python tools/train_moco.py --img_size 32 --moco-k 12800 --arch resnet18_cifar --save_folder ./results/cifar10/moco_res18_cls --resume ./results/cifar10/moco_res18_cls/checkpoint_last.pth.tar --data_type cifar10 --data ./datasets/cifar10 --all 0
    Use GPU: 0 for training
    Traceback (most recent call last):
      File "tools/train_moco.py", line 453, in <module>
        main()
      File "tools/train_moco.py", line 145, in main
        mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
      File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
        while not context.join():
      File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
        raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
        fn(i, *args)
      File "C:\Users\Kelly\SPICE\tools\train_moco.py", line 170, in main_worker
        dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
      File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\distributed\distributed_c10d.py", line 602, in init_process_group
        default_pg = _new_process_group_helper(
      File "C:\Users\Kelly\.conda\envs\u\lib\site-packages\torch\distributed\distributed_c10d.py", line 727, in _new_process_group_helper
        raise RuntimeError("Distributed package doesn't have NCCL " "built in")
    RuntimeError: Distributed package doesn't have NCCL built in

    How can I solve this? Thanks, Kelly

    opened by kkellyk 0
Owner
Chuang Niu
Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training

Flood Detection Challenge This repository contains code for our submission to the ETCI 2021 Competition on Flood Detection (Winning Solution #2). Acco

Siddha Ganju 108 Dec 28, 2022
Graph Regularized Residual Subspace Clustering Network for hyperspectral image clustering

Graph Regularized Residual Subspace Clustering Network for hyperspectral image clustering

Yaoming Cai 5 Jul 18, 2022
Learn about Spice.ai with in-depth samples

Samples Learn about Spice.ai with in-depth samples ServerOps - Learn when to run server maintenance during periods of low load Gardener - Intelligent

Spice.ai 16 Mar 23, 2022
Awesome Deep Graph Clustering is a collection of SOTA, novel deep graph clustering methods

ADGC: Awesome Deep Graph Clustering ADGC is a collection of state-of-the-art (SOTA), novel deep graph clustering methods (papers, codes and datasets).

yueliu1999 297 Dec 27, 2022
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021, official Pytorch implementatio

Microsoft 247 Dec 25, 2022
[CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

TorchSemiSeg [CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision by Xiaokang Chen1, Yuhui Yuan2, Gang Zeng1, Jingdong Wang

Chen XiaoKang 387 Jan 8, 2023
[CVPR 2022] Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Using Unreliable Pseudo Labels Official PyTorch implementation of Semi-Supervised Semantic Segmentation Using Unreliable Pseudo Labels, CVPR 2022. Ple

Haochen Wang 268 Dec 24, 2022
An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020

UnpairedSR An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020 turn RCAN(modified) --> xmodel(xilinx

JiaKui Hu 10 Oct 28, 2022
This is an unofficial PyTorch implementation of Meta Pseudo Labels

This is an unofficial PyTorch implementation of Meta Pseudo Labels. The official Tensorflow implementation is here.

Jungdae Kim 320 Jan 8, 2023
PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Stochastic CSLR This is the PyTorch implementation for the ECCV 2020 paper: Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuou

Zhe Niu 28 Dec 19, 2022
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req

Rowel Atienza 152 Dec 28, 2022
labelpix is a graphical image labeling interface for drawing bounding boxes

Welcome to labelpix. labelpix is a graphical image labeling interface for drawing bounding boxes. Homepage Install pip install -r requirements.tx

schissmantics 26 May 24, 2022
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)

PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering Jang Hyun Cho1, Utkarsh Mall2, Kavita Bala2, Bharath Harihar

Jang Hyun Cho 164 Dec 30, 2022
code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext

null 75 Dec 16, 2022
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".

PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation Introduction Getting Started FSD50K Recipe AudioSet Recipe Label E

Yuan Gong 84 Dec 27, 2022
A PyTorch implementation of the paper "Semantic Image Synthesis via Adversarial Learning" in ICCV 2017

Semantic Image Synthesis via Adversarial Learning This is a PyTorch implementation of the paper Semantic Image Synthesis via Adversarial Learning. Req

Seonghyeon Nam 146 Nov 25, 2022
PyTorch implementation of CVPR 2020 paper (Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence) and pre-trained model on ImageNet dataset

Reference-Based-Sketch-Image-Colorization-ImageNet This is a PyTorch implementation of CVPR 2020 paper (Reference-Based Sketch Image Colorization usin

Yuzhi ZHAO 11 Jul 28, 2022
The implementation of the CVPR2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes"

STAR-FC This code is the implementation for the CVPR 2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes". Re

Shuai Shen 87 Dec 28, 2022
PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

Completer: Incomplete Multi-view Clustering via Contrastive Prediction This repo contains the code and data of the following paper accepted by CVPR 20

XLearning Group 72 Dec 7, 2022