A scientific and useful toolbox, which contains practical and effective long-tail related tricks with extensive experimental results

Bag of tricks for long-tailed visual recognition with deep convolutional neural networks

This repository is the official PyTorch implementation of the AAAI-21 paper Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks, which provides practical and effective tricks used in long-tailed image classification.

Trick gallery: trick_gallery.md

The tricks will be constantly updated. If you have or need a newly proposed long-tail related trick, please open an issue or a pull request. If you submit a pull request with a new trick, make sure to attach the results in the corresponding md files.
For any problems, such as bugs, feel free to open an issue.

Paper collection of long-tailed visual recognition

Awesome-of-Long-Tailed-Recognition

Long-Tailed-Classification-Leaderboard

Development log

Trick gallery and combinations

Brief introduction

We divide the long-tail related tricks into four families: re-weighting, re-sampling, mixup training, and two-stage training. For more details on these four trick families, see the original paper.
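As a rough illustration of the first two families, the sketch below computes inverse-frequency loss weights (re-weighting) and class-balanced sampling probabilities (re-sampling) from a list of labels. It is a generic, minimal example and not the implementation used in this repo; the function names and the toy label list are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Re-weighting: per-class loss weights proportional to 1 / class frequency."""
    counts = Counter(labels)
    num_classes = len(counts)
    total = len(labels)
    # Normalised so that a perfectly balanced dataset gives every class weight 1.0.
    return {c: total / (num_classes * n) for c, n in counts.items()}


def class_balanced_sample_probs(labels):
    """Re-sampling: per-sample probabilities so that each class is drawn equally often."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[y]) for y in labels]


# Toy long-tailed label list (hypothetical): 8 head-class and 2 tail-class samples.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)
probs = class_balanced_sample_probs(labels)
print(weights)     # the tail class receives the larger weight
print(sum(probs))  # the probabilities sum to 1
```

Both functions push the training signal toward tail classes; the difference is whether this happens in the loss (weights) or in the data loader (sampling probabilities).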

Detailed information:

Trick gallery:

Tricks, corresponding results, experimental settings, and running commands are listed in trick_gallery.md.
Trick combinations:

Combinations of different tricks, corresponding results, experimental settings, and running commands are listed in trick_combination.md.
The tricks and trick combinations whose results are provided in this repo have been reorganized and tested. We are doing our best to handle the rest, and the repo will be updated accordingly.

Main requirements

torch >= 1.4.0
torchvision >= 0.5.0
tensorboardX >= 2.1
tensorflow >= 1.14.0 #convert long-tailed cifar datasets from tfrecords to jpgs
Python 3
apex

We provide the detailed requirements in requirements.txt. You can run pip install -r requirements.txt to create the same running environment as ours.
Installing apex is recommended to save GPU memory:

pip install -U pip
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

If apex is not installed, the distributed training with DistributedDataParallel in our code cannot be used.

Preparing the datasets

We provide three datasets in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT), and iNaturalist 2018 (iNat18).

Details of these datasets are shown below:

| Datasets | CIFAR-10-LT-100 | CIFAR-10-LT-50 | CIFAR-100-LT-100 | CIFAR-100-LT-50 | ImageNet-LT | iNat18 |
|---|---|---|---|---|---|---|
| Training images | 12,406 | 13,996 | 10,847 | 12,608 | 115,846 | 437,513 |
| Classes | 10 | 10 | 100 | 100 | 1,000 | 8,142 |
| Max images | 5,000 | 5,000 | 500 | 500 | 1,280 | 1,000 |
| Min images | 50 | 100 | 5 | 10 | 5 | 2 |
| Imbalance factor | 100 | 50 | 100 | 50 | 256 | 500 |

- `Max images` and `Min images` represent the number of training images in the largest and smallest classes, respectively.

- CIFAR-10-LT-100 means the long-tailed CIFAR-10 dataset with the imbalance factor $\beta = 100$.

- Imbalance factor is defined as $\beta = \frac{\text{Max images}}{\text{Min images}}$.
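The definition above can be checked directly against the `Max images` and `Min images` rows of the table. The following is a small sketch; the function name `imbalance_factor` is only illustrative and is not part of this repo.

```python
def imbalance_factor(class_counts):
    """Return beta = (images in the largest class) / (images in the smallest class)."""
    largest = max(class_counts)
    smallest = min(class_counts)
    if smallest <= 0:
        raise ValueError("every class needs at least one training image")
    return largest / smallest


# (Max images, Min images) pairs copied from the dataset table.
table = {
    "CIFAR-10-LT-100": (5000, 50),
    "CIFAR-10-LT-50": (5000, 100),
    "CIFAR-100-LT-100": (500, 5),
    "CIFAR-100-LT-50": (500, 10),
    "ImageNet-LT": (1280, 5),
    "iNat18": (1000, 2),
}
factors = {name: imbalance_factor(pair) for name, pair in table.items()}
for name, beta in factors.items():
    print(name, beta)
```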

Data format

The annotation of a dataset is a dict consisting of two fields: annotations and num_classes. The annotations field is a list of dicts, each with image_id, fpath, im_height, im_width and category_id.

Here is an example.

{
    'annotations': [
                    {
                        'image_id': 1,
                        'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
                        'im_height': 600,
                        'im_width': 800,
                        'category_id': 7477
                    },
                    ...
                   ],
    'num_classes': 8142
}
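To make the format concrete, here is a minimal sketch that checks a dict against the fields described above and round-trips it through JSON. The helper name `find_format_problems`, the sample path, and the assumption that category_id is zero-based (so it must be smaller than num_classes) are illustrative and not taken from the repo's code.

```python
import json

REQUIRED_KEYS = {"image_id", "fpath", "im_height", "im_width", "category_id"}


def find_format_problems(data):
    """Return a list of human-readable problems; an empty list means the dict looks valid."""
    problems = []
    for field in ("annotations", "num_classes"):
        if field not in data:
            problems.append(f"missing top-level field: {field}")
    if problems:
        return problems
    for index, annotation in enumerate(data["annotations"]):
        missing = REQUIRED_KEYS - set(annotation)
        if missing:
            problems.append(f"annotation {index} is missing {sorted(missing)}")
        # Assumption: category ids are zero-based, so they lie in [0, num_classes).
        elif not 0 <= annotation["category_id"] < data["num_classes"]:
            problems.append(f"annotation {index} has an out-of-range category_id")
    return problems


# Hypothetical sample in the documented format.
sample = {
    "annotations": [
        {
            "image_id": 1,
            "fpath": "/data/example/images/0001.jpg",
            "im_height": 600,
            "im_width": 800,
            "category_id": 7477,
        }
    ],
    "num_classes": 8142,
}

restored = json.loads(json.dumps(sample))  # what a saved json file would contain
problems = find_format_problems(restored)
broken = find_format_problems({"annotations": [{"image_id": 1}], "num_classes": 1})
print(problems)
print(broken)
```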

CIFAR-LT

There are two versions of CIFAR-LT.
1. Cui et al., CVPR 2019 first proposed CIFAR-LT. They provided a download link for CIFAR-LT, as well as the code to generate the data, which is written in TensorFlow.
  
  You can follow the steps below to get this version of CIFAR-LT:
  1. Download Cui's CIFAR-LT from GoogleDrive or Baidu Netdisk (password: 5rsq). Suppose you download the data and unzip it at path /downloaded/data/.
  2. Run tools/convert_from_tfrecords.py, and the converted CIFAR-LT and corresponding jsons will be generated at /downloaded/converted/.
```
# Convert from the original format of CIFAR-LT
python tools/convert_from_tfrecords.py --input_path /downloaded/data/ --out_path /downloaded/converted/
```
2. Cao et al., NeurIPS 2019 followed the method of Cui et al., CVPR 2019 to generate CIFAR-LT randomly. They modify the CIFAR datasets provided by PyTorch as this file shows.

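For intuition about how such a long-tailed subset is sized, the sketch below computes per-class image counts with an exponentially decaying profile, which is the profile commonly used for CIFAR-LT. Treat it as an assumption about the generation procedure rather than a copy of either reference implementation; the function name `long_tailed_class_counts` is hypothetical.

```python
def long_tailed_class_counts(max_images, num_classes, beta):
    """Per-class image counts decaying exponentially from max_images to max_images / beta."""
    counts = []
    for class_index in range(num_classes):
        # Class 0 keeps all images; the last class keeps a 1 / beta fraction.
        fraction = (1.0 / beta) ** (class_index / (num_classes - 1.0))
        counts.append(int(max_images * fraction))
    return counts


# CIFAR-10 has 5,000 training images per class; beta = 100 as in CIFAR-10-LT-100.
counts = long_tailed_class_counts(max_images=5000, num_classes=10, beta=100)
print(counts)
print(sum(counts))
```

With these settings the largest class keeps 5,000 images, the smallest keeps 50, and the total agrees with the 12,406 training images listed for CIFAR-10-LT-100 in the dataset table.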
ImageNet-LT

You can use the following steps to convert from the original images of ImageNet-LT.
1. Download the original ILSVRC-2012. Suppose you have downloaded and reorganized it at path /downloaded/ImageNet/, which should contain two sub-directories: /downloaded/ImageNet/train and /downloaded/ImageNet/val.
2. Download the train/test splitting files (ImageNet_LT_train.txt and ImageNet_LT_test.txt) from GoogleDrive or Baidu Netdisk (password: cj0g). Suppose you have downloaded them to path /downloaded/ImageNet-LT/.
3. Run tools/convert_from_ImageNet.py, and you will get two jsons: ImageNet_LT_train.json and ImageNet_LT_val.json.
```
# Convert from the original format of ImageNet-LT
python tools/convert_from_ImageNet.py --input_path /downloaded/ImageNet-LT/ --image_path /downloaded/ImageNet/ --output_path ./
```
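The sketch below illustrates what such a conversion does conceptually: it turns split-file lines into the annotation dict described under Data format. It is not the code of tools/convert_from_ImageNet.py. It assumes each line has the form `<relative image path> <integer label>`, derives num_classes from the largest label, and leaves out im_height and im_width because filling them requires opening every image. The function name and the example lines are hypothetical.

```python
import posixpath


def split_lines_to_annotations(lines, image_root):
    """Convert '<relative path> <label>' lines into the annotation dict format."""
    annotations = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        relative_path, label = line.rsplit(" ", 1)
        annotations.append({
            "image_id": len(annotations) + 1,
            "fpath": posixpath.join(image_root, relative_path),
            "category_id": int(label),
            # im_height / im_width are omitted here: they need the actual image files.
        })
    if not annotations:
        raise ValueError("the split file contains no entries")
    # Assumption: labels are zero-based and every class appears at least once.
    num_classes = max(a["category_id"] for a in annotations) + 1
    return {"annotations": annotations, "num_classes": num_classes}


# Hypothetical split-file content.
example_lines = [
    "train/class_a/img_001.JPEG 0",
    "train/class_b/img_002.JPEG 1",
    "",
    "train/class_a/img_003.JPEG 0",
]
result = split_lines_to_annotations(example_lines, "/downloaded/ImageNet/")
print(result["num_classes"], len(result["annotations"]))
```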

iNat18

You can use the following steps to convert from the original format of iNaturalist 2018.

First, download the images and annotations from iNaturalist 2018. Suppose you have downloaded them to path /downloaded/iNat18/.
Then run tools/convert_from_iNat.py, and use the generated iNat18_train.json and iNat18_val.json for training.

# Convert from the original format of iNaturalist
# See tools/convert_from_iNat.py for more details of args 
python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/train2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_train.json

python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/val2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_val.json

Usage

In this repo:

The results of CIFAR-LT (ResNet-32) and ImageNet-LT (ResNet-10), which need only one GPU to train, were obtained with DataParallel training with apex.
The results of iNat18 (ResNet-50), which needs more than one GPU to train, were obtained with DistributedDataParallel training with apex.
If more than one GPU is used, DistributedDataParallel training is more efficient than DataParallel training, especially when CPU compute resources are limited.

Training

Parallel training with DataParallel

1, To train
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,4`.
bash data_parallel_train.sh configs/test/data_parallel.yaml GPUs

Distributed training with DistributedDataParallel

1, Change the NCCL_SOCKET_IFNAME in run_with_distributed_parallel.sh to [your own socket name].
export NCCL_SOCKET_IFNAME=[your own socket name]

2, To train
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
# `NUM_GPUs` are the number of GPUs you want to use. If you set `GPUs` to `0,1,4`, then `NUM_GPUs` should be `3`.
bash distributed_data_parallel_train.sh configs/test/distributed_data_parallel.yaml NUM_GPUs GPUs
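Because `NUM_GPUs` must match the number of ids in `GPUs`, it can be derived instead of typed by hand. The helper below is a hypothetical convenience sketch, not a script shipped with this repo; it only builds the argument list for the command shown above and does not launch anything.

```python
def parse_gpu_ids(gpus):
    """Parse a string such as '0,1,4' into a list of distinct GPU ids."""
    ids = [int(token) for token in gpus.split(",") if token.strip()]
    if not ids:
        raise ValueError("no GPU ids given")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate GPU ids")
    return ids


def distributed_train_command(config, gpus):
    """Build the argument list: script, config, NUM_GPUs, GPUs."""
    num_gpus = len(parse_gpu_ids(gpus))
    return ["bash", "distributed_data_parallel_train.sh", config, str(num_gpus), gpus]


command = distributed_train_command("configs/test/distributed_data_parallel.yaml", "0,1,4")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` from the repo root if you prefer launching from Python.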

Validation

You can get the validation accuracy and the corresponding confusion matrix after running the following commands.

See main/valid.py for more details.

1, First, change the TEST.MODEL_FILE in the yaml to the path of your own trained model.
2, To do validation
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
python main/valid.py --cfg [Your yaml] --gpus GPUs
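To clarify how the two reported quantities relate, the following sketch builds a confusion matrix from target and predicted labels and derives the accuracy (and the Top-1 error rate as 100 minus accuracy) from its diagonal. It is a plain-Python illustration with hypothetical names and toy labels, not the code in main/valid.py.

```python
def confusion_matrix(targets, predictions, num_classes):
    """matrix[t][p] counts samples of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for target, prediction in zip(targets, predictions):
        matrix[target][prediction] += 1
    return matrix


def top1_accuracy(matrix):
    """Percentage of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Toy labels (hypothetical): 6 samples, 3 classes.
targets = [0, 0, 0, 1, 1, 2]
predictions = [0, 0, 1, 1, 0, 2]

matrix = confusion_matrix(targets, predictions, num_classes=3)
accuracy = top1_accuracy(matrix)
error_rate = 100.0 - accuracy
print(matrix)
print(round(accuracy, 2), round(error_rate, 2))
```

Each row of the matrix also gives a per-class view, which is useful for checking whether tail classes are being confused with head classes.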

The comparison between the baseline results using our codes and the references [Cui, Kang]

We use Top-1 error rates as our evaluation metric.

From the results on the two versions of CIFAR-LT, we can see that on the CIFAR-LT provided by Cao, our baseline has much lower Top-1 error rates on CIFAR-10-LT than the baseline results reported in his paper. So, for fairness, we use the CIFAR-LT of Cui in our experiments.
For ImageNet-LT, we found that the color_jitter augmentation, which is adopted by other methods, was not included in our experiments. So, in this repo, we add the color_jitter augmentation on ImageNet-LT. The old baseline without color_jitter is 64.89, which is 1.15 points higher than the new baseline of 63.74.
You can click the Baseline in the table below to see the experimental settings and corresponding running commands.

| Datasets | CIFAR-10-LT-100 (Cui) | CIFAR-10-LT-50 (Cui) | CIFAR-100-LT-100 (Cui) | CIFAR-100-LT-50 (Cui) | CIFAR-10-LT-100 (Cao) | CIFAR-10-LT-50 (Cao) | CIFAR-100-LT-100 (Cao) | CIFAR-100-LT-50 (Cao) | ImageNet-LT | iNat18 |
|---|---|---|---|---|---|---|---|---|---|---|
| Backbones | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-10 | ResNet-50 |
| Baselines using our codes | 30.12 | 24.81 | 61.76 | 57.65 | 28.05 | 23.55 | 62.27 | 56.22 | 63.74 | 40.55 |
| Reference [Cui, Kang, Liu] | 29.64 | 25.19 | 61.68 | 56.15 | 29.64 | 25.19 | 61.68 | 56.15 | 64.40 | 42.86 |

(Cui) and (Cao) denote the CIFAR-LT versions of Cui et al., 2019 and Cao et al., 2019, respectively. The suffixes -100 and -50 are the imbalance factors.

CONFIG for the baselines using our codes (from left to right):

- `configs/cui_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}`
- `configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}`
- `configs/ImageNet_LT/imagenetlt_baseline.yaml`
- `configs/iNat18/iNat18_baseline.yaml`

Running commands:

- For CIFAR-LT and ImageNet-LT: `bash data_parallel_train.sh CONFIG GPUs`
- For iNat18: `bash distributed_data_parallel_train.sh configs/iNat18/iNat18_baseline.yaml NUM_GPUs GPUs`

Citation

@inproceedings{zhang2020tricks,
  author    = {Yongshun Zhang and Xiu{-}Shen Wei and Boyan Zhou and Jianxin Wu},
  title     = {Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks},
  booktitle = {AAAI},
  year      = {2021},
}

Contacts

If you have any questions about our work, please do not hesitate to contact us via the email addresses provided in the paper.

Hi, where we find the supplemental materials? Thanks.

Sorry for replying late. You can find the supp at http://www.lamda.nju.edu.cn/zhangys/papers/AAAI_tricks_supp.pdf, and I will update it to README soon.

Thank you!

Also I am curious about a statement in the paper: We can also find that combining CS_CE and CAM-based balance-sampling together cannot further improve the accuracy, since both of them try to enlarge the influence of tail classes and the joint use of the two could cause an accuracy drop due to the overfitting problem. How do you observe the overfitting or this is just a hypothesis? Thanks.

Originally posted by @jrcai in https://github.com/zhangyongshun/BagofTricks-LT/issues/1#issuecomment-767092525

About the effect of input mixup to long-tailed learning

Hi, thank you for providing so many tricks to solve the problem of long-tailed recognition. I'm wondering the baseline with only input mixup on the dataset of CIFAR100 with imbalance ratio 100 could get error rate 59.66(58.21). In my experiments, the error rate is only around 61.0(60.2) according to the average results of multiple experiments.

opened by mingliangzhang2018 9
About DRS training

Hello! Thanks for your contribution. I have such questions: The DRS strategy described in Decoupling representation and classifier for long-tailed recognition is that: first train whole network for 90 or 200 epochs, then freeze the backbone and re-initialize a classifier and train. But the DRS strategy in the code is just to change a different sampler? or I just misunderstand the code?

opened by adf1178 2
Reported Accuracy

Can you confirm that the following splits are used for reporting the final accuracies in the paper?

CIFAR100-LT: Val Split ImageNet-LT: Test Split iNaturalist18: Val Split

Do all the earlier works follow the same, that is report final accuracies for CIFAR100-LT and iNaturalist18 only for validation splits and not test splits?

opened by rahulvigneswaran 1
about Trick combinations
Hello! Thanks for your contribution. I have such questions:

In the combination of IM & DRS with CAM-BS, At what stage is IM used? or both stage?

In the fine-tuning after mixup training (Table 11), which epoch do you remove the mixup?

Is there some configs for the combination tricks ?

Thanks a lot ~
opened by Ein027 1
About configs for bag-of-tricks on iNat18

Hi, Thanks for your work and sharing your codes! I'm wondering if you can provide the config files for training your bag-of-tricks on iNat18? Thanks a lot.

opened by ruikangliu 0
About download link imagenet-100t

I can't find the download link for imagenet-100t at https://image-net.org/challenges/LSVRC/2012/signup. The ImageNet-LT section of the README says:

> You can use the following steps to convert from the original images of ImageNet-LT.
> Download the original ILSVRC-2012. Suppose you have downloaded and reorganized it at path /downloaded/ImageNet/, which should contain two sub-directories: /downloaded/ImageNet/train and /downloaded/ImageNet/val. Download the train/test splitting files (ImageNet_LT_train.txt and ImageNet_LT_test.txt) from GoogleDrive or Baidu Netdisk (password: cj0g). Suppose you have downloaded them to path /downloaded/ImageNet-LT/. Run tools/convert_from_ImageNet.py, and you will get two jsons: ImageNet_LT_train.json and ImageNet_LT_val.json.

opened by alice-cool 0
About configs

Hi, I found there are only CIFAR config files for the combinations. Could you please provide the ImageNet and iNaturalist config files for the combinations? I couldn't find some important parameters, for example cfg.DATASET.CAM_NUMBER_THRES, which is needed when I train on ImageNet and iNat. Thank you!

opened by michaelzfm 5
about Tau-norm

The decoupling part has been updated, hooray! But I reproduced tau_norm on cifar10_im50 and found the accuracy was only 75.98. I first trained the baseline for 200 epochs and then used the tau_norm.yaml provided in the repo. Is there anything missing?

opened by adf1178 3
