CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Overview

CReST in TensorFlow 2

Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille and Fan Yang.

  • This is not an officially supported Google product.

Install dependencies

sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3 --system-site-packages env3
. env3/bin/activate
pip install -r requirements.txt
  • The code has been tested on Ubuntu 18.04 with CUDA 10.2.
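After installing, you can sanity-check the setup with a short snippet (a minimal sketch; it assumes the TensorFlow 2 from requirements.txt is version >= 2.1, where tf.config.list_physical_devices is available):

    # Sanity check: confirm the TensorFlow version and that GPUs are visible.
    import tensorflow as tf

    print("TensorFlow version:", tf.__version__)
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))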

Environment setting

. env3/bin/activate
export ML_DATA=/path/to/your/data
export ML_DIR=/path/to/your/code
export RESULT=/path/to/your/result
export PYTHONPATH=$PYTHONPATH:$ML_DIR

Datasets

Download or generate the datasets as follows:

  • CIFAR10 and CIFAR100: Follow the steps to download and generate the balanced CIFAR10 and CIFAR100 datasets. Put them under ${ML_DATA}/cifar, for example, ${ML_DATA}/cifar/cifar10-test.tfrecord.
  • Long-tailed CIFAR10 and CIFAR100: Follow the steps to download the long-tailed datasets prepared by Cui et al. Put them under ${ML_DATA}/cifar-lt, for example, ${ML_DATA}/cifar-lt/cifar-10-data-im-0.1.
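For intuition, the long-tailed datasets of Cui et al. decay the per-class counts exponentially from the head class to the tail class, and the number in the file name (e.g., im-0.1) is the tail-to-head sample ratio. A minimal sketch of that profile (illustrative only; long_tail_counts is not a function from this repo):

    import numpy as np

    def long_tail_counts(n_head=5000, num_classes=10, im_ratio=0.1):
        # Counts decay exponentially from head to tail, so the rarest class
        # ends up with n_head * im_ratio samples (500 for CIFAR10 at 0.1).
        decay = np.arange(num_classes) / (num_classes - 1)
        return np.round(n_head * im_ratio ** decay).astype(int)

    print(long_tail_counts())  # [5000 3871 ...  646  500]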

Running experiments on Long-tailed CIFAR10 and CIFAR100

Run MixMatch or FixMatch:

  • Specify the method to run via --method. It can be fixmatch or mixmatch.

  • Specify dataset via --dataset. It can be cifar10lt or cifar100lt.

  • Specify the class imbalance ratio, i.e., the number of training samples in the rarest class divided by that in the most frequent class, via --class_im_ratio.

  • Specify the fraction of labeled data (e.g., 0.1 for 10%) via --percent_labeled.

  • Specify the number of generations for self-training via --num_generation.

  • Specify whether to use distribution alignment via --do_distalign.

  • Specify the initial distribution alignment temperature via --dalign_t.

  • Specify how distribution alignment is applied via --how_dalign. It can be constant or adaptive (a simplified sketch follows the example command below).

    python -m train_and_eval_loop \
      --model_dir=/tmp/model \
      --method=fixmatch \
      --dataset=cifar10lt \
      --input_shape=32,32,3 \
      --class_im_ratio=0.01 \
      --percent_labeled=0.1 \
      --fold=1 \
      --num_epoch=64 \
      --num_generation=6 \
      --sched_level=1 \
      --dalign_t=0.5 \
      --how_dalign=adaptive \
      --do_distalign=True
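For reference, distribution alignment rescales the model's predicted class probabilities toward a temperature-smoothed target class distribution before pseudo-labeling: t=1 keeps the target distribution unchanged, while t=0 flattens it to uniform. Below is a simplified, hypothetical sketch (dist_align and its arguments are illustrative, not this repo's API); in adaptive mode the temperature is additionally adjusted over generations, whereas constant keeps it fixed:

    import numpy as np

    def dist_align(probs, target_dist, t=0.5):
        # Hypothetical sketch of distribution alignment, not the repo's API.
        # probs: [N, K] predicted class probabilities on unlabeled data.
        # target_dist: [K] reference class distribution (e.g., labeled set).
        smoothed = target_dist ** t               # t=1: as-is; t=0: uniform
        smoothed = smoothed / smoothed.sum()
        # Scale predictions by target/model marginals, then renormalize rows.
        aligned = probs * (smoothed / probs.mean(axis=0))
        return aligned / aligned.sum(axis=1, keepdims=True)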

Results

The code reproduces the main results of the paper. For all settings and methods, we run experiments on 5 different folds and report the mean and standard deviation. Note that the numbers may not exactly match those in the paper, as training introduces additional randomness.
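For example, a reported entry like 79.4 (0.98) is simply the mean and standard deviation over the five per-fold accuracies (the numbers below are hypothetical):

    import numpy as np

    fold_acc = np.array([79.1, 80.4, 78.6, 79.9, 79.0])  # hypothetical per-fold accuracies
    print(f"{fold_acc.mean():.1f} ({fold_acc.std():.2f})")  # -> 79.4 (0.65)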

Results on Long-tailed CIFAR10 with 10% labeled data (Table 1 in the paper), reported as accuracy (%) with standard deviation in parentheses; gamma is the imbalance ratio between the largest and smallest class (gamma=100 corresponds to --class_im_ratio=0.01):

              gamma=50     gamma=100    gamma=200
    FixMatch  79.4 (0.98)  66.2 (0.83)  59.9 (0.44)
    CReST     83.7 (0.40)  75.4 (1.62)  63.9 (0.67)
    CReST+    84.5 (0.41)  77.7 (1.22)  67.5 (1.36)

Training with Multiple GPUs

  • Simply set CUDA_VISIBLE_DEVICES=0,1,2,3 (or however many GPUs you want to use).
  • Make sure that the batch size is divisible by the number of GPUs (see the sketch below).
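As a sketch of why divisibility matters (a hypothetical check, not code from this repo): each global batch is split evenly across the visible GPUs, so the per-GPU batch must come out to an integer:

    import os

    # Hypothetical sanity check: the global batch is split evenly per GPU.
    num_gpus = len(os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(","))
    global_batch_size = 64
    assert global_batch_size % num_gpus == 0, "batch size must be divisible by #GPUs"
    print("per-GPU batch size:", global_batch_size // num_gpus)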

Augmentation

  • One can concatenate different augmentation shortkeys to compose an augmentation sequence.
    • d: default augmentation, resize and shift.
    • h: horizontal flip.
    • ra: random augment with all augmentation ops.
    • rc: random augment with color augmentation ops only.
    • rg: random augment with geometric augmentation ops only.
    • c: cutout.
    • For example, dhrac applies resize-and-shift (d) and horizontal flip (h), then random augment with all ops (ra), followed by cutout (c).
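A hypothetical sketch of how such a shortkey string could be tokenized (parse_aug is illustrative; the repo's actual parser may differ), using greedy two-character matching so that ra/rc/rg are read before the single-character keys:

    # Illustrative shortkey tokenizer; not the repo's actual parser.
    TWO_CHAR = {"ra", "rc", "rg"}
    ONE_CHAR = {"d", "h", "c"}

    def parse_aug(spec):
        ops, i = [], 0
        while i < len(spec):
            if spec[i:i + 2] in TWO_CHAR:
                ops.append(spec[i:i + 2])
                i += 2
            elif spec[i] in ONE_CHAR:
                ops.append(spec[i])
                i += 1
            else:
                raise ValueError(f"unknown augmentation key: {spec[i:]}")
        return ops

    print(parse_aug("dhrac"))  # ['d', 'h', 'ra', 'c']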

Citing this work

@article{wei2021crest,
    title={CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning},
    author={Chen Wei and Kihyuk Sohn and Clayton Mellina and Alan Yuille and Fan Yang},
    journal={arXiv preprint arXiv:2102.09559},
    year={2021},
}

Comments
  • Batch Size of Unlabeled Data

    Hi,

    Thanks for your great work! I just have a quick question while going through your hyper-parameters.

    In train_and_eval_loop.py, the parameter unlab_ratio is set to 1, whereas FixMatch uses a larger ratio like 7. What are the batch sizes used for obtaining the results in the paper? I couldn't find any specification of batch sizes in the manuscript.

    Thanks for your time and enjoy the rest of your day!

    Best

    opened by ZhuoranYu 2
  • tensorflow.python.framework.errors_impl.InvalidArgumentError: limit must be a scalar, not shape [1] [Op:Range] name: range/ evaluting generation at epoch 64

    Here is the information of error:

    evaluting generation at epoch 64
    acc 0.655
    per_class_recall [0.978 0.994 0.858 0.789 0.806 0.52 0.615 0.547 0.222 0.224]
    per_class_precision [0.4882676 0.5748988 0.63697106 0.52113605 0.72743684 0.83467096 0.94180703 0.9513044 1. 0.99115044]
    confusion_matrix
    [[978.   3.   9.   7.   2.   1.   0.   0.   0.   0.]
     [  5. 994.   0.   0.   0.   1.   0.   0.   0.   0.]
     [ 64.   2. 858.  26.  27.  15.   7.   1.   0.   0.]
     [ 48.   7.  69. 789.  28.  39.  13.   7.   0.   0.]
     [ 38.   0.  70.  62. 806.   7.  10.   7.   0.   0.]
     [ 19.   2.  84. 316.  43. 520.   4.  12.   0.   0.]
     [ 38.   7. 147. 152.  34.   7. 615.   0.   0.   0.]
     [ 67.   2.  75. 116. 160.  31.   2. 547.   0.   0.]
     [584. 142.  27.  15.   5.   1.   2.   0. 222.   2.]
     [162. 570.   8.  31.   3.   1.   0.   1.   0. 224.]]

    Traceback (most recent call last):
      File "/userdata/anaconda3/envs/hty-crest/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/userdata/anaconda3/envs/hty-crest/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/userdata/hty/crest-master/train_and_eval_loop.py", line 200, in <module>
        app.run(main)
      File "/userdata/anaconda3/envs/hty-crest/lib/python3.7/site-packages/absl/app.py", line 303, in run
        _run_main(main, args)
      File "/userdata/anaconda3/envs/hty-crest/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/userdata/hty/crest-master/train_and_eval_loop.py", line 196, in main
        trainer.train_generations()
      File "/userdata/hty/crest-master/util/semisup.py", line 155, in train_generations
        pseudo_label_list = self.eval()
      File "/userdata/hty/crest-master/util/semisup.py", line 264, in eval
        pseudo_label_list = self.get_pseudo_label_list(unlab_scores)
      File "/userdata/hty/crest-master/util/semisup.py", line 292, in get_pseudo_label_list
        idx = tf.range(y_pred.shape)
      File "/userdata/anaconda3/envs/hty-crest/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 1438, in range
        return gen_math_ops._range(start, limit, delta, name=name)
      File "/userdata/anaconda3/envs/hty-crest/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 7790, in _range
        _six.raise_from(_core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InvalidArgumentError: limit must be a scalar, not shape [1] [Op:Range] name: range/

    evaluting generation at epoch 64
    acc 0.687

    Besides, there is nothing in the $RESULT directory. How do I fix this?

    opened by hantongyou 1
  • TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'

    Hi, I got some error messages:

    /crest/third_party/rand_augment.py:80 _posterize_level_to_arg
            return (int((level / MAX_LEVEL) * 4), )
    

    Is it something to do with the version of TensorFlow? My version is 2.4.0.

    opened by JoyHuYY1412 2