Official PyTorch implementation of Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision.

Overview

Test-Agnostic Long-Tailed Recognition

This repository is the official PyTorch implementation of Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision.

  • TADE (our method) introduces a new expert training scheme based on diversity-promoting, expertise-guided losses, which train different experts to handle distinct class distributions. The learned experts are therefore more diverse than those of existing multi-expert methods, which leads to better ensemble performance, and together they simulate a wide spectrum of possible class distributions.
  • TADE develops a new self-supervised method, prediction stability maximization, that adaptively aggregates these experts to better handle an unknown test class distribution, using only unlabeled test data.
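
As a rough illustration of prediction stability maximization, the sketch below scores a set of expert aggregation weights by how similar the aggregated predictions are across two augmented views of the same unlabeled test sample. It is a pure-Python toy under stated assumptions, not the repository's implementation: the helper names (`softmax`, `aggregate`, `cosine`, `stability`), the softmax normalization of the weights, and the example logits are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(expert_logits, weights):
    """Weighted sum of per-expert logits, followed by softmax."""
    num_classes = len(expert_logits[0])
    mixed = [
        sum(w * logits[c] for w, logits in zip(weights, expert_logits))
        for c in range(num_classes)
    ]
    return softmax(mixed)


def cosine(p, q):
    """Cosine similarity between two prediction vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


def stability(raw_weights, samples):
    """Mean cosine similarity between the aggregated predictions of two views.

    Assumption: raw weights are normalized with a softmax so that the
    aggregation weights are positive and sum to one.
    """
    weights = softmax(raw_weights)
    scores = [
        cosine(aggregate(view1, weights), aggregate(view2, weights))
        for view1, view2 in samples
    ]
    return sum(scores) / len(scores)


# Hypothetical logits: one sample, two views, three experts, three classes.
# Expert 0 agrees across the two views; expert 2 flips its prediction.
samples = [
    (
        [[4.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 3.0]],
        [[4.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 3.0, 0.0]],
    ),
]
favor_stable = stability([5.0, 0.0, 0.0], samples)
favor_unstable = stability([0.0, 0.0, 5.0], samples)
print(favor_stable > favor_unstable)
```

Weights that favor the expert whose predictions agree across views receive the higher score. In a real setup the weights would presumably be optimized by gradient ascent on this score over the unlabeled test data, rather than compared by hand.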

Results

ImageNet-LT (ResNeXt-50)

Long-tailed recognition with uniform test class distribution:

Methods       MACs(G)   Top-1 acc.   Model
Softmax       4.26      48.0         -
RIDE          6.08      56.3         -
TADE (ours)   6.08      58.8         Download

Test-agnostic long-tailed recognition:

Methods       MACs(G)   Forward-50   Forward-10   Uniform   Backward-10   Backward-50
Softmax       4.26      66.1         60.3         48.0      34.9          27.6
RIDE          6.08      67.6         64.0         56.3      48.7          44.0
TADE (ours)   6.08      69.4         65.4         58.8      54.5          53.1

CIFAR100-Imbalance ratio 100 (ResNet-32)

Long-tailed recognition with uniform test class distribution:

Methods       MACs(G)   Top-1 acc.
Softmax       0.07      41.4
RIDE          0.11      48.0
TADE (ours)   0.11      49.8

Test-agnostic long-tailed recognition:

Methods       MACs(G)   Forward-50   Forward-10   Uniform   Backward-10   Backward-50
Softmax       0.07      62.3         56.2         41.4      25.8          17.5
RIDE          0.11      63.0         57.0         48.0      35.4          29.3
TADE (ours)   0.11      65.9         58.3         49.8      43.9          42.4

Places-LT (ResNet-152)

Long-tailed recognition with uniform test class distribution:

Methods       MACs(G)   Top-1 acc.
Softmax       11.56     31.4
RIDE          13.18     40.3
TADE (ours)   13.18     40.9

Test-agnostic long-tailed recognition:

Methods       MACs(G)   Forward-50   Forward-10   Uniform   Backward-10   Backward-50
Softmax       11.56     45.6         40.2         31.4      23.4          19.4
RIDE          13.18     43.1         41.6         40.3      38.2          36.9
TADE (ours)   13.18     46.4         43.3         40.9      41.4          41.6

iNaturalist 2018 (ResNet-50)

Long-tailed recognition with uniform test class distribution:

Methods       MACs(G)   Top-1 acc.
Softmax       4.14      64.7
RIDE          5.80      71.8
TADE (ours)   5.80      72.9

Test-agnostic long-tailed recognition:

Methods       MACs(G)   Forward-3   Forward-2   Uniform   Backward-2   Backward-3
Softmax       4.14      65.4        65.5        64.7      64.0         63.4
RIDE          5.80      71.5        71.9        71.8      71.9         71.8
TADE (ours)   5.80      72.3        72.5        72.9      73.5         73.3

Requirements

  • To install requirements:
pip install -r requirements.txt

Hardware requirements

8 GPUs with at least 11 GB of memory each are recommended. Otherwise, models with more experts may not fit, especially on datasets with more classes (the FC layers will be large). CPU training is not supported, but CPU inference could be supported with slight modifications.

Datasets

Four benchmark datasets

  • Please download these datasets and put them in the /data directory.
  • ImageNet-LT and Places-LT can be found here.
  • iNaturalist data should be the 2018 version from here.
  • CIFAR-100 will be downloaded automatically by the dataloader.
data
├── ImageNet_LT
│   ├── test
│   ├── train
│   └── val
├── CIFAR100
│   └── cifar-100-python
├── Place365
│   ├── data_256
│   ├── test_256
│   └── val_256
└── iNaturalist 
    ├── test2018
    └── train_val2018
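
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, assuming the directory names shown in the tree; the `EXPECTED_LAYOUT` table and the `missing_dirs` helper are hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names copied from the tree above.
EXPECTED_LAYOUT = {
    "ImageNet_LT": ["test", "train", "val"],
    "CIFAR100": ["cifar-100-python"],
    "Place365": ["data_256", "test_256", "val_256"],
    "iNaturalist": ["test2018", "train_val2018"],
}


def missing_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [
        f"{dataset}/{sub}"
        for dataset, subs in EXPECTED_LAYOUT.items()
        for sub in subs
        if not (root / dataset / sub).is_dir()
    ]


if __name__ == "__main__":
    for entry in missing_dirs("data"):
        print("missing:", entry)
```

Since CIFAR-100 is downloaded automatically, `CIFAR100/cifar-100-python` may legitimately be reported as missing before the first run.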

Txt files

  • We provide txt files for test-agnostic long-tailed recognition on ImageNet-LT, Places-LT and iNaturalist 2018. The corresponding CIFAR-100 data are generated automatically by the code.
  • For iNaturalist 2018, please unzip the iNaturalist_train.zip.
data_txt
├── ImageNet_LT
│   ├── ImageNet_LT_backward2.txt
│   ├── ImageNet_LT_backward5.txt
│   ├── ImageNet_LT_backward10.txt
│   ├── ImageNet_LT_backward25.txt
│   ├── ImageNet_LT_backward50.txt
│   ├── ImageNet_LT_forward2.txt
│   ├── ImageNet_LT_forward5.txt
│   ├── ImageNet_LT_forward10.txt
│   ├── ImageNet_LT_forward25.txt
│   ├── ImageNet_LT_forward50.txt
│   ├── ImageNet_LT_test.txt
│   ├── ImageNet_LT_train.txt
│   ├── ImageNet_LT_uniform.txt
│   └── ImageNet_LT_val.txt
├── Places_LT_v2
│   ├── Places_LT_backward2.txt
│   ├── Places_LT_backward5.txt
│   ├── Places_LT_backward10.txt
│   ├── Places_LT_backward25.txt
│   ├── Places_LT_backward50.txt
│   ├── Places_LT_forward2.txt
│   ├── Places_LT_forward5.txt
│   ├── Places_LT_forward10.txt
│   ├── Places_LT_forward25.txt
│   ├── Places_LT_forward50.txt
│   ├── Places_LT_test.txt
│   ├── Places_LT_train.txt
│   ├── Places_LT_uniform.txt
│   └── Places_LT_val.txt
└── iNaturalist18
    ├── iNaturalist18_backward2.txt
    ├── iNaturalist18_backward3.txt
    ├── iNaturalist18_forward2.txt
    ├── iNaturalist18_forward3.txt
    ├── iNaturalist18_train.txt
    ├── iNaturalist18_uniform.txt
    └── iNaturalist18_val.txt 
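
The file names above follow a regular pattern (direction plus imbalance ratio, then the fixed splits), so the expected set can be generated and compared against what is on disk. This is a sketch under that assumption; `expected_txt_files` and `EXPECTED_TXT` are hypothetical names, and the format of the txt files themselves is not assumed here.

```python
def expected_txt_files(prefix, ratios, splits=("train", "uniform", "val")):
    """Build the txt file names for one dataset, following the tree above."""
    names = [
        f"{prefix}_{direction}{ratio}.txt"
        for direction in ("backward", "forward")
        for ratio in ratios
    ]
    names += [f"{prefix}_{split}.txt" for split in splits]
    return sorted(names)


# Keys are the sub-directories of data_txt; prefixes and ratios are read off the tree.
EXPECTED_TXT = {
    "ImageNet_LT": expected_txt_files(
        "ImageNet_LT", (2, 5, 10, 25, 50), splits=("test", "train", "uniform", "val")
    ),
    "Places_LT_v2": expected_txt_files(
        "Places_LT", (2, 5, 10, 25, 50), splits=("test", "train", "uniform", "val")
    ),
    "iNaturalist18": expected_txt_files("iNaturalist18", (2, 3)),
}

if __name__ == "__main__":
    for folder, names in EXPECTED_TXT.items():
        print(folder, len(names))
```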

Pretrained models

  • For training on Places-LT, we follow previous methods and use a pre-trained model.
  • Please download the checkpoint. Unzip and move the checkpoint files to /model/pretrained_model_places/.

Script

ImageNet-LT

Training

  • To train the expertise-diverse model, run this command:
python train.py -c configs/config_imagenet_lt_resnext50_tade.json

Evaluate

  • To evaluate the expertise-diverse model on the uniform test class distribution, run:
python test.py -r checkpoint_path
  • To evaluate the expertise-diverse model on agnostic test class distributions, run:
python test_all_imagenet.py -r checkpoint_path

Test-time training

  • To test-time train the expertise-diverse model for agnostic test class distributions, run:
python test_train_imagenet.py -c configs/test_time_imagenet_lt_resnext50_tade.json -r checkpoint_path

CIFAR100-LT

Training

  • To train the expertise-diverse model, run this command:
python train.py -c configs/config_cifar100_ir100_tade.json
  • The imbalance ratio can be changed from 100 to 10 or 50 by editing the config file.
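
To get a feel for what the imbalance ratio means for class sizes, the sketch below computes per-class sample counts. It assumes the exponential profile commonly used to build long-tailed CIFAR splits, with `imb_factor` equal to 1 / imbalance ratio and `reverse` flipping the class order; the function name `long_tailed_counts` is hypothetical, and the repository's own dataloader may differ in detail.

```python
def long_tailed_counts(num_classes, max_per_class, imb_factor, reverse=False):
    """Per-class sample counts that decay exponentially from head to tail.

    imb_factor is the tail-to-head size ratio, e.g. 0.01 for imbalance ratio 100.
    """
    counts = [
        int(max_per_class * imb_factor ** (idx / (num_classes - 1.0)))
        for idx in range(num_classes)
    ]
    # Reversing the order turns a head-heavy profile into a tail-heavy one.
    return counts[::-1] if reverse else counts


# CIFAR-100 has 100 classes with 500 training images each.
forward = long_tailed_counts(100, 500, imb_factor=0.01)
backward = long_tailed_counts(100, 500, imb_factor=0.01, reverse=True)
print(forward[0], forward[-1], sum(forward))
```

With an imbalance ratio of 100, the largest class keeps all 500 images and the smallest keeps 5; the reversed profile assigns the same sizes in the opposite class order.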

Evaluate

  • To evaluate the expertise-diverse model on the uniform test class distribution, run:
python test.py -r checkpoint_path
  • To evaluate the expertise-diverse model on agnostic test class distributions, run:
python test_all_cifar.py -r checkpoint_path

Test-time training

  • To test-time train the expertise-diverse model for agnostic test class distributions, run:
python test_train_cifar.py -c configs/test_time_cifar100_ir100_tade.json -r checkpoint_path
  • The imbalance ratio can be changed from 100 to 10 or 50 by editing the config file.

Places-LT

Training

  • To train the expertise-diverse model, run this command:
python train.py -c configs/config_places_lt_resnet152_tade.json

Evaluate

  • To evaluate the expertise-diverse model on the uniform test class distribution, run:
python test_places.py -r checkpoint_path
  • To evaluate the expertise-diverse model on agnostic test class distributions, run:
python test_all_places.py -r checkpoint_path

Test-time training

  • To test-time train the expertise-diverse model for agnostic test class distributions, run:
python test_train_places.py -c configs/test_time_places_lt_resnet152_tade.json -r checkpoint_path

iNaturalist 2018

Training

  • To train the expertise-diverse model, run this command:
python train.py -c configs/config_iNaturalist_resnet50_tade.json

Evaluate

  • To evaluate the expertise-diverse model on the uniform test class distribution, run:
python test.py -r checkpoint_path
  • To evaluate the expertise-diverse model on agnostic test class distributions, run:
python test_all_inat.py -r checkpoint_path

Test-time training

  • To test-time train the expertise-diverse model for agnostic test class distributions, run:
python test_train_inat.py -c configs/test_time_iNaturalist_resnet50_tade.json -r checkpoint_path
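
The commands in this section follow one pattern per dataset, which the sketch below collects into a lookup table. Script and config names are copied from this README; the `SCRIPTS` table, the `commands` helper, and the dataset keys are hypothetical conveniences, and `checkpoint_path` remains a placeholder for your own checkpoint.

```python
# Per dataset: (training config, uniform test script, agnostic test script,
#               test-time training script, test-time training config)
SCRIPTS = {
    "imagenet": (
        "configs/config_imagenet_lt_resnext50_tade.json",
        "test.py",
        "test_all_imagenet.py",
        "test_train_imagenet.py",
        "configs/test_time_imagenet_lt_resnext50_tade.json",
    ),
    "cifar": (
        "configs/config_cifar100_ir100_tade.json",
        "test.py",
        "test_all_cifar.py",
        "test_train_cifar.py",
        "configs/test_time_cifar100_ir100_tade.json",
    ),
    "places": (
        "configs/config_places_lt_resnet152_tade.json",
        "test_places.py",
        "test_all_places.py",
        "test_train_places.py",
        "configs/test_time_places_lt_resnet152_tade.json",
    ),
    "inat": (
        "configs/config_iNaturalist_resnet50_tade.json",
        "test.py",
        "test_all_inat.py",
        "test_train_inat.py",
        "configs/test_time_iNaturalist_resnet50_tade.json",
    ),
}


def commands(dataset, checkpoint_path):
    """Return the argument lists for each stage of one dataset's pipeline."""
    train_cfg, test_uniform, test_all, test_train, test_train_cfg = SCRIPTS[dataset]
    return {
        "train": ["python", "train.py", "-c", train_cfg],
        "evaluate_uniform": ["python", test_uniform, "-r", checkpoint_path],
        "evaluate_agnostic": ["python", test_all, "-r", checkpoint_path],
        "test_time_train": ["python", test_train, "-c", test_train_cfg, "-r", checkpoint_path],
    }


if __name__ == "__main__":
    for stage, argv in commands("cifar", "checkpoint_path").items():
        print(stage, " ".join(argv))
```

Each argument list can be passed to `subprocess.run(argv, check=True)` from the repository root. Note that Places-LT is the only dataset with its own uniform evaluation script, `test_places.py`.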

Citation

If you find our work inspiring or use our codebase in your research, please cite our work.

@article{zhang2021test,
  title={Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision},
  author={Zhang, Yifan and Hooi, Bryan and Hong, Lanqing and Feng, Jiashi},
  journal={arXiv},
  year={2021}
}

Acknowledgements

This project is based on this PyTorch template.

The multi-expert framework is based on RIDE. The data generation for agnostic test class distributions follows LADE.

Comments
  • GPU

    Hello. Very sorry for commenting here; my question is about the ppn portfolio project, whose issues are closed, so I had no other way to ask you. I tried to run that project on Google Colab, but it took more time than Colab allows. I figured out that TensorFlow 1.4.0 doesn't use the GPU. Is there any solution for that? I tried a lot but got no answer. Please help me, and if there is anything I should consider while using Colab for that project, please let me know.

    regards

    opened by m1996 14
  • Try TADE on custom dataset

    Hi,

    Your excellent work really caught my eye, and I want to try TADE on my own dataset to test whether it works for industry tasks, but the results don't look good compared with conventional methods like focal loss. The results are shown below:

    • TADE without test-time training: train acc 84, val acc 87
    • TADE with test-time training, 1 epoch: val 82.25
    • TADE with test-time training, 5 epochs: val 38.41
    • TADE with test-time training, 8 epochs: val 30
    • Focal loss: train acc 88, val acc 89

    It seems that TADE without test-time training is slightly worse than focal loss, and the accuracy gets worse as the number of test-time training epochs increases. The custom dataset is an industrial defect classification task, so most pictures have similar backgrounds, and the pictures fall into three categories. Training set: cls_num_list = [2883,1019,56]

    Question: I am not sure whether this is because my dataset has only three categories, which makes it hard for the output_logit vector to represent similarity and so hurts the self-supervised aggregation. Do you have any idea about that?

    opened by Lllllp93 12
  • About CIFAR10-LT's Implementation details

    Hello. In your paper, the top-1 accuracy on CIFAR10-LT (imbalance ratio = 10, 100) is 90.8% and 83.8%, but when I run your source code, the top-1 accuracy on CIFAR10-LT (imbalance ratio = 10, 100) is 90.16% and 82.92%. What are the specific implementation details on CIFAR10-LT? Thank you~

    opened by sunhappy6900 6
  • n_gpus vs batch_size

    Are you offsetting the batch_size for the number n_gpus in the config itself?

    CIFAR100:

    • n_gpus = 1
    • batch_size = 128

    Effective batch size => 128*1 = 128?

    ImageNet-LT:

    • n_gpus = 2
    • batch_size = 64

    Effective batch size => 64*2 = 128?

    iNaturalist18:

    • n_gpus = 4
    • batch_size = 512

    Effective batch size => 512*4 = 2048?
    opened by rahulvigneswaran 5
  • Doubts regarding the experimental setup

    1. For CIFAR100-LT
       a. Are there different val and test sets?
       b. On what dataset split do you choose the best-trained model?
       c. What split do you use for hyperparameter tuning?

    2. For iNaturalist18
       a. Are there different val and test sets?
       b. On what dataset split do you choose the best-trained model?
       c. What split do you use for hyperparameter tuning?
       d. Even though there is an officially available test set (https://github.com/visipedia/inat_comp/tree/master/2018#Data) for iNaturalist18, why don't you use that?

    3. General doubts
       a. What seeds do you use?
       b. Do you take a mean of multiple seeds?

    opened by rahulvigneswaran 4
  • A question about performance.

    A great job. Your work solves a wider range of LT problems.

    But I'm confused by TADE's performance on the vanilla LT test set.

    Actually, with the same backbone and training strategy, the following methods adopt almost the same loss, but the top-1 ACC varies, for example on CIFAR100-LT-IR-100:

    • ICLR'21 logit adjustment [43.89% cf. origin paper Tab.3 ]
    • CVPR'21 LADE without test prior [45.6% cf. this paper Tab.8(a)]
    • NeurIPS'20 Balanced Softmax which can be rewritten as Eq.3 in this paper [46.1% cf. this paper Tab.8(a)]

    In such a situation, TADE should get the best performance when the expert E2 (Eq.3 in this paper) mainly works. If so, it should not outperform the above methods by a large margin, right?

    However, the TADE's top-1 ACC is 49.8% (cf. this paper Tab.8(a)) and the weight of experts is [0.40 0.35 0.24] (cf. this paper Tab. 12). The E1 mainly works.

    So I am just wondering how to explain the improvement of TADE on the vanilla test dataset.

    opened by XuZhengzhuo 3
  • About Backbone

    Hi, Thank you very much for your work. I would like to ask if you tried to use ResNeXt101-32x4d instead of ResNeXt50 in your experiments. After my experiments, ResNeXt101 is not as effective as ResNeXt50. Are there any other parameters that need to be changed besides the backbone?

    Best,

    opened by oldfemalepig 3
  • About a question of test_training_cifar.py

    In lines 200 and 201 of test_training_cifar.py:

    dataset = IMBALANCECIFAR100(data_dir, train=True, download=True, transform=train_trsfm, imb_type=imb_type, imb_factor=test_imb_factor, reverse=reverse)
    train_dataset = IMBALANCECIFAR100(data_dir, train=True, download=True, transform=TwoCropsTransform(train_trsfm), imb_type=imb_type, imb_factor=test_imb_factor, reverse=reverse)

    Why is train set to True? I think it should be False, so that the weighting parameters are obtained from the test set. Can you explain it? Thanks!

    opened by lastonephy 2
  • About the setting of shared backbone and separate expert

    Hi~ Thanks for your excellent work. I have two questions about the paper and the code. (1) I notice that, by default, the shared backbone contains only layer_1 & layer_2 of ResNet, while the other layers (layer_3 & layer_4) all belong to the "expert". Can this setting be described as a "shared backbone"? I mean, on reading "shared backbone", readers will assume that only the classifier layer is treated as the "expert". (2) Have you tried the setting in which only the classifier layer is treated as the "expert"? How much does the performance decrease?

    opened by zhiyuanyou 2
  • Where to find the Objective function in the code

    In Section 4.3 of the paper, an objective function is used to calculate the weight of each expert. I guess it is in test_all.py, but I can't find it. Please tell me the specific position in the code, thank you!

    opened by madoka109 2
  • Implementation detail about LDAM loss

    Hi Vanint, I notice that in your LDAM loss implementation the scale is applied to the adjustment only

    x_m = x - batch_m * self.s 
    

    which is different from the original LDAM loss

     return F.cross_entropy(self.s*output, target, weight=self.weight)
    

    basically equivalent to

    x_m = (x - batch_m) * self.s 
    

    Could you explain more about this detail? Is the coefficient absorbed somewhere in the logit output?

    opened by fliman 2