Improving Unsupervised Image Clustering With Robust Learning

This repo contains the PyTorch code for "Improving Unsupervised Image Clustering With Robust Learning (RUC)".

Improving Unsupervised Image Clustering With Robust Learning

Sungwon Park, Sungwon Han, Sundong Kim, Danu Kim, Sungkyu Park, Seunghoon Hong, Meeyoung Cha.

Highlight

  1. RUC is an add-on module that enhances the performance of any off-the-shelf unsupervised clustering algorithm. Inspired by robust learning, it first divides the clustered data points into a clean set and a noisy set, and then refines the clustering results (a minimal sketch of this split follows this list). With RUC, the state-of-the-art unsupervised clustering methods SCAN and TSUC show large performance improvements (STL-10: 86.7%, CIFAR-10: 90.3%, CIFAR-20: 54.3%).

  2. The prediction results of existing unsupervised learning algorithms tend to be overconfident. RUC makes the predictions of existing algorithms softer, with better calibration.

  3. Robust to adversarially crafted samples. ERM-based unsupervised clustering algorithms are prone to adversarial attacks; adding RUC to the clustering models improves robustness against adversarial noise.

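A minimal sketch of the confidence-based clean/noisy split described in item 1, assuming the off-the-shelf clustering model exposes per-sample softmax probabilities; the function name and threshold values are illustrative, not the repo's actual API (RUC additionally uses metric-based sampling over embedding-space neighbors, controlled by --n_num in Usage below):

import torch

def split_clean_noisy(probs, threshold=0.99):
    # probs: (N, K) softmax outputs of the off-the-shelf clustering model
    # threshold: confidence cut-off, analogous to the --s_thr argument
    confidence, pseudo_label = probs.max(dim=1)   # per-sample max probability and its cluster
    clean_mask = confidence >= threshold          # confident points form the clean set
    noisy_mask = ~clean_mask                      # the rest are treated as noisy and later refined
    return clean_mask, noisy_mask, pseudo_label

# toy example: 4 samples, 3 clusters
probs = torch.tensor([[0.98, 0.01, 0.01],
                      [0.40, 0.35, 0.25],
                      [0.02, 0.97, 0.01],
                      [0.34, 0.33, 0.33]])
clean, noisy, labels = split_clean_noisy(probs, threshold=0.9)
print(clean)   # tensor([ True, False,  True, False])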

Required packages

  • python == 3.6.10
  • pytorch == 1.1.0
  • scikit-learn == 0.21.2
  • scipy == 1.3.0
  • numpy == 1.18.5
  • pillow == 7.1.2
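
To install the pinned versions in one step, a pip command along these lines should work (illustrative, not from the original repo; torch 1.1.0 may need to be installed from the official wheels matching your CUDA version):

pip install torch==1.1.0 scikit-learn==0.21.2 scipy==1.3.0 numpy==1.18.5 pillow==7.1.2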

Overall model architecture

Usage

usage: main_ruc_[dataset].py [-h] [--lr LR] [--momentum M] [--weight_decay W]
                         [--epochs EPOCHS] [--batch_size B] [--s_thr S_THR]
                         [--n_num N_NUM] [--o_model O_MODEL]
                         [--e_model E_MODEL] [--seed SEED]

config for RUC

optional arguments:
  -h, --help            show this help message and exit
  --lr LR               initial learning rate
  --momentum M          momentum
  --weight_decay W      weight decay
  --epochs EPOCHS       max epoch per round. (default: 200)
  --batch_size B        training batch size
  --s_thr S_THR         confidence sampling threshold
  --n_num N_NUM         the number of neighbor for metric sampling
  --o_model O_MODEL     original model path
  --e_model E_MODEL     embedding model path
  --seed SEED           random seed
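
For example, a hypothetical CIFAR-10 run could look like the line below; the hyperparameter values and checkpoint paths are placeholders rather than values taken from the paper or this repo:

python main_ruc_cifar10.py --lr 0.001 --batch_size 128 --s_thr 0.99 --n_num 100 --o_model ./pretrained/scan_cifar10.pth --e_model ./pretrained/simclr_cifar10.pth --seed 0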

Model ZOO

Currently, we provide pretrained models for our method. We used the pretrained SCAN and SimCLR models from the SCAN GitHub repository (a loading sketch follows the table below).

Dataset Download link
CIFAR-10 Download
CIFAR-20 Download
STL-10 Download
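
As a rough illustration of how the two checkpoints passed as --o_model and --e_model are consumed, here is a hedged loading sketch using plain PyTorch; the paths and state-dict handling are assumptions, and the actual loading logic lives in main_ruc_[dataset].py:

import torch

# --o_model: the original clustering model (e.g. a pretrained SCAN checkpoint)
# --e_model: the embedding model (e.g. a pretrained SimCLR checkpoint)
o_ckpt = torch.load("pretrained/scan_cifar10.pth", map_location="cpu")    # hypothetical path
e_ckpt = torch.load("pretrained/simclr_cifar10.pth", map_location="cpu")  # hypothetical path

# Depending on how a checkpoint was saved, the weights may sit under a key
# such as "model", or the file may already be a raw state_dict.
state_dict = o_ckpt["model"] if isinstance(o_ckpt, dict) and "model" in o_ckpt else o_ckpt
# clustering_net.load_state_dict(state_dict)  # load into a matching network architecture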

Citation

If you find this repo useful for your research, please consider citing our paper:

@article{park2020improving,
  title={Improving Unsupervised Image Clustering With Robust Learning},
  author={Park, Sungwon and Han, Sungwon and Kim, Sundong and Kim, Danu and Park, Sungkyu and Hong, Seunghoon and Cha, Meeyoung},
  journal={arXiv preprint arXiv:2012.11150},
  year={2020}
}
Comments
  • ImageNet dataset

    Thank you so much for your amazing work!

    Could you please provide the script used for ImageNet dataset?

    I would like to train RUC on my own dataset which has a similar image size to ImageNet. However, I cannot train the SCAN-pretrained model (MOCO) with an 8G GPU, even if I resize the image to 96x96 and set the batch size to 2. Thus I would like to ask, which GPUs did you use in your ImageNet experiments and how much memory do they have? If I understand your paper well, you resize them to 256x256, right? And what is the batch size you use?

    Many thanks in advance!

    opened by qinghezeng 3
  • tabular data/ noisy instances

    Hi, thanks for sharing your implementation. I have two questions about it:

    1. Does it also work on tabular data?
    2. Is it possible to identify the noisy instances (i.e., return the noisy IDs or the clean set)?

    Thanks!

    opened by nazaretl 2
  • about cluster visualization

    Could you tell me how you visualized the clusters, as in calibration.png? I would really appreciate it if you could let me know the method. Thank you in advance.

    opened by jooyeon-Yee 2
  • About pretrained model

    opened by zwhyOuO 2
  • Where should I get the pre-trained model for "o-model" and "e_model"?

    Hi, I want to reproduce your code; however, I noticed that it is necessary to load pretrained models via the "o_model" and "e_model" parameters, which are absent from both the repo and the readme.md. Where can I get them? Thanks a lot.

    opened by ValkyriaLenneth 2
  • Final model selection

    Hi, thank you again for sharing the code!

    I wonder which model you used for the evaluation in the paper: Model 1, Model 2, or some kind of ensemble (for example, prediction3 in the function test_ruc in lib/protocols.py)? And how did you select the best model? By selecting the best accuracy of prediction3 on the training or validation set?

    Thank you so much!

    opened by qinghezeng 1
  • How to get my clustering result?

    Hi! Thanks for your excellent work. I trained RUC on the CIFAR-10 dataset, but I only got the trained model. I want to know how to get the clustering result. If I want to save the label corresponding to each sample to a txt file, what should I do?

    opened by zbw0329 1
  • original model vs embedding model

    Hello, what is the difference between --o_model O_MODEL (original model path) and --e_model E_MODEL (embedding model path)?

    Where do I find them? There is only one model.

    [Screenshot from 2021-06-02 12-45-40]

    Thanks

    opened by DalalAlmutawa 1
  • Does RUC separate train and test set?

    Hi, I'm confused as to whether the reported performances are with train and test separated, i.e., training only with the train dataset and evaluating using the test dataset (I believe this is how SCAN reported results). In the code, it seems like both the training dataset and the test dataset are loaded with train=True, meaning the model uses only the train dataset for training and for reporting performance?!

    Thank you for your time and for open-sourcing the code. ✌️

    opened by shekkizh 1
  • Mismatched module names

    Hi, I have trained the model using the SCAN GitHub and generated two models under the scan and self-label directory as instructed in the SCAN GitHub. When I use those models to run main_ruc_cifar10.py, I get the following error.

    Traceback (most recent call last): File "main_ruc_cifar10.py", line 376, in main() File "main_ruc_cifar10.py", line 317, in main net_uc.load_state_dict(state_dict) File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1605, in load_state_dict self.class.name, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for ClusteringModel: Missing key(s) in state_dict: "backbone.conv1.weight", "backbone.bn1.weight", "backbone.bn1.bias", "backbone.bn1.running_mean", "backbone.bn1.running_var", "backbone.layer1.0.conv1.weight", "backbone.layer1.0.bn1.weight", "backbone.layer1.0.bn1.bias", "backbone.layer1.0.bn1.running_mean", "backbone.layer1.0.bn1.running_var", "backbone.layer1.0.conv2.weight", "backbone.layer1.0.bn2.weight", "backbone.layer1.0.bn2.bias", "backbone.layer1.0.bn2.running_mean", "backbone.layer1.0.bn2.running_var", "backbone.layer1.1.conv1.weight", "backbone.layer1.1.bn1.weight", "backbone.layer1.1.bn1.bias", "backbone.layer1.1.bn1.running_mean", "backbone.layer1.1.bn1.running_var", "backbone.layer1.1.conv2.weight", "backbone.layer1.1.bn2.weight", "backbone.layer1.1.bn2.bias", "backbone.layer1.1.bn2.running_mean", "backbone.layer1.1.bn2.running_var", "backbone.layer2.0.conv1.weight", "backbone.layer2.0.bn1.weight", "backbone.layer2.0.bn1.bias", "backbone.layer2.0.bn1.running_mean", "backbone.layer2.0.bn1.running_var", "backbone.layer2.0.conv2.weight", "backbone.layer2.0.bn2.weight", "backbone.layer2.0.bn2.bias", "backbone.layer2.0.bn2.running_mean", "backbone.layer2.0.bn2.running_var", "backbone.layer2.0.shortcut.0.weight", "backbone.layer2.0.shortcut.1.weight", "backbone.layer2.0.shortcut.1.bias", "backbone.layer2.0.shortcut.1.running_mean", "backbone.layer2.0.shortcut.1.running_var", "backbone.layer2.1.conv1.weight", "backbone.layer2.1.bn1.weight", "backbone.layer2.1.bn1.bias", "backbone.layer2.1.bn1.running_mean", "backbone.layer2.1.bn1.running_var", "backbone.layer2.1.conv2.weight", "backbone.layer2.1.bn2.weight", "backbone.layer2.1.bn2.bias", "backbone.layer2.1.bn2.running_mean", "backbone.layer2.1.bn2.running_var", "backbone.layer3.0.conv1.weight", "backbone.layer3.0.bn1.weight", "backbone.layer3.0.bn1.bias", "backbone.layer3.0.bn1.running_mean", "backbone.layer3.0.bn1.running_var", "backbone.layer3.0.conv2.weight", "backbone.layer3.0.bn2.weight", "backbone.layer3.0.bn2.bias", "backbone.layer3.0.bn2.running_mean", "backbone.layer3.0.bn2.running_var", "backbone.layer3.0.shortcut.0.weight", "backbone.layer3.0.shortcut.1.weight", "backbone.layer3.0.shortcut.1.bias", "backbone.layer3.0.shortcut.1.running_mean", "backbone.layer3.0.shortcut.1.running_var", "backbone.layer3.1.conv1.weight", "backbone.layer3.1.bn1.weight", "backbone.layer3.1.bn1.bias", "backbone.layer3.1.bn1.running_mean", "backbone.layer3.1.bn1.running_var", "backbone.layer3.1.conv2.weight", "backbone.layer3.1.bn2.weight", "backbone.layer3.1.bn2.bias", "backbone.layer3.1.bn2.running_mean", "backbone.layer3.1.bn2.running_var", "backbone.layer4.0.conv1.weight", "backbone.layer4.0.bn1.weight", "backbone.layer4.0.bn1.bias", "backbone.layer4.0.bn1.running_mean", "backbone.layer4.0.bn1.running_var", "backbone.layer4.0.conv2.weight", "backbone.layer4.0.bn2.weight", "backbone.layer4.0.bn2.bias", "backbone.layer4.0.bn2.running_mean", "backbone.layer4.0.bn2.running_var", "backbone.layer4.0.shortcut.0.weight", "backbone.layer4.0.shortcut.1.weight", "backbone.layer4.0.shortcut.1.bias", "backbone.layer4.0.shortcut.1.running_mean", "backbone.layer4.0.shortcut.1.running_var", 
"backbone.layer4.1.conv1.weight", "backbone.layer4.1.bn1.weight", "backbone.layer4.1.bn1.bias", "backbone.layer4.1.bn1.running_mean", "backbone.layer4.1.bn1.running_var", "backbone.layer4.1.conv2.weight", "backbone.layer4.1.bn2.weight", "backbone.layer4.1.bn2.bias", "backbone.layer4.1.bn2.running_mean", "backbone.layer4.1.bn2.running_var", "cluster_head.0.weight", "cluster_head.0.bias". Unexpected key(s) in state_dict: "model", "head".

    Please help me resolve this

    opened by qwedaq 0
  • Can I train the model from scratch?

    Hi, I found that if I don't use the pretrained model, the code doesn't work. So I can't train from scratch? If I need to train on another dataset like ImageNet, or use another backbone like ResNet-50, how can I find a pretrained model? Thank you.

    opened by michaelzfm 2
Owner
Sungwon Park
Master's student at KAIST, School of Computing