[NeurIPS 2020] This project provides a strong single-stage baseline for Long-Tailed Classification, Detection, and Instance Segmentation (LVIS).

Overview

A Strong Single-Stage Baseline for Long-Tailed Problems

This project provides a strong single-stage baseline for Long-Tailed Classification (on the ImageNet-LT and Long-Tailed CIFAR-10/-100 datasets), Detection, and Instance Segmentation (on the LVIS dataset). It is also a PyTorch implementation of the NeurIPS 2020 paper Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect, which proposes a general solution that removes the bad momentum causal effect for a variety of Long-Tailed Recognition tasks. The code is organized into three folders:

  1. The classification folder supports long-tailed classification on the ImageNet-LT and Long-Tailed CIFAR-10/CIFAR-100 datasets.
  2. The lvis_old folder (deprecated) supports long-tailed object detection and instance segmentation on the LVIS V0.5 dataset and is built on top of mmdet V1.1.
  3. The latest version of long-tailed detection and instance segmentation is in the lvis1.0 folder. Since both LVIS V0.5 and mmdet V1.1 are no longer available on their homepages, we re-implemented our method on mmdet V2.4 using LVIS V1.0 annotations.

Slides

If you want to present our work in your group meeting, introduce it to your friends, or look for answers to ambiguous parts of the paper, feel free to use our slides. They come in two versions: a one-hour full version and a five-minute short version.

Installation

The classification part also works with lower versions of the following requirements. For detection and instance segmentation (mmdet V2.4), however, I tested some lower versions of Python and PyTorch, and all of them failed. If you want to try other environments, please check the mmdetection updates.

Requirements:

  • PyTorch >= 1.6.0
  • Python >= 3.7.0
  • CUDA >= 10.1
  • torchvision >= 0.7.0
  • gcc version >= 5.4.0
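
A quick way to compare an installed version string against the minimums listed above is sketched here. The helper names `parse_version` and `meets_minimum` are made up for this example and are not part of the project; the sketch only compares version strings and does not import PyTorch itself.

```python
import re

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "torch": "1.6.0",
    "python": "3.7.0",
    "cuda": "10.1",
    "torchvision": "0.7.0",
    "gcc": "5.4.0",
}

def parse_version(text):
    """Turn '1.6.0+cu101' into (1, 6, 0); stops at the first non-numeric part."""
    numbers = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)

def meets_minimum(name, installed):
    """Return True if `installed` is at least the minimum listed for `name`."""
    have = parse_version(installed)
    need = parse_version(MINIMUMS[name])
    # Pad to equal length so that "10.1" compares correctly with "10.1.243".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

In an environment where PyTorch is installed, the strings to check could come from `torch.__version__`, `torch.version.cuda`, and `platform.python_version()`.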

Step-by-step installation

conda create -n longtail pip python=3.7 -y
source activate longtail
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
pip install pyyaml tqdm matplotlib scikit-learn h5py

# download the project
git clone https://github.com/KaihuaTang/Long-Tailed-Recognition.pytorch.git
cd Long-Tailed-Recognition.pytorch

# the following part is only used to build mmdetection 
cd lvis1.0
pip install mmcv-full
pip install mmlvis
pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"

Additional Notes

When we wrote the paper, we were using LVIS V0.5 and mmdet V1.1 for our long-tailed instance segmentation experiments, but both have since been deprecated. If you want to reproduce our results on LVIS V0.5, you have to find a way to build an mmdet V1.1 environment and use the code in the lvis_old folder.

Datasets

ImageNet-LT

ImageNet-LT is a long-tailed subset of the original ImageNet. You can download the dataset from its homepage. After downloading it, change the data_root entry for 'ImageNet' in the ./classification/main.py file.

CIFAR-10/-100

When you run the code for the first time, our dataloader automatically downloads CIFAR-10/-100. Set the data_root in ./classification/main.py to the path where you want to store all CIFAR data.
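
As a rough illustration of the data_root setting, the sketch below assumes it is a dictionary keyed by dataset name. Only the 'ImageNet' entry is mentioned in the text; the CIFAR keys, all paths, and the `missing_roots` helper are hypothetical placeholders, so check ./classification/main.py for the real names.

```python
import os

# Hypothetical layout: the text above only confirms an 'ImageNet' entry;
# the CIFAR keys and all paths here are placeholders.
data_root = {
    "ImageNet": "/path/to/ImageNet",
    "CIFAR10": "./dataset/cifar10",
    "CIFAR100": "./dataset/cifar100",
}

def missing_roots(roots):
    """Return the dataset names whose directory does not exist yet."""
    return [name for name, path in roots.items() if not os.path.isdir(path)]
```

A missing CIFAR directory is not necessarily a problem, since CIFAR is downloaded on the first run; a missing ImageNet directory means the path still needs to be edited.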

LVIS

The Large Vocabulary Instance Segmentation (LVIS) dataset uses the COCO 2017 train, validation, and test image sets. If you have already downloaded the COCO images, you only need to download the LVIS annotations. Note that the LVIS val set contains images from COCO 2017 train in addition to the COCO 2017 val split.

You need to put all the annotations and images under ./data/LVIS like this:

data
  |-- LVIS
    |-- lvis_v1_train.json
    |-- lvis_v1_val.json
    |-- images
      |-- train2017
        |-- .... (images)
      |-- test2017
        |-- .... (images)
      |-- val2017
        |-- .... (images)
Getting Started

For long-tailed classification, please go to [link]

For long-tailed object detection and instance segmentation, please go to [link]

Advantages of the Proposed Method

  • Compared with the previous state-of-the-art method, Decoupling, our method requires only one-stage training.
  • Most existing methods for long-tailed problems use the data distribution to re-sample or re-weight during training, which rests on the inappropriate assumption that you can know the future distribution before you start to learn. The proposed method does not need to know the data distribution during training; we only need an average feature for inference after the model is trained.
  • Our method can be easily transferred to any task. We outperform the previous state-of-the-art methods Decoupling, BBN, and OLTR in image classification, and we achieve better results than EQL, the 2019 winner of the LVIS challenge, in long-tailed object detection and instance segmentation (under the same settings, with even fewer GPUs).
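
To make "use an average feature for inference" more concrete, here is one possible reading, written as a simplified single-head sketch in plain Python. It is not the repository's implementation: the function names are invented, the weights are simply L2-normalized, and the defaults `alpha=1.0` and `scale=16.0` are placeholders. The idea shown is only that the component of a test feature along the average feature direction is subtracted before scoring the classes.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def debiased_logits(feature, class_weights, mean_feature, alpha=1.0, scale=16.0):
    """Score each class after removing the feature's component along mean_feature."""
    x = l2_normalize(feature)
    d = l2_normalize(mean_feature)
    cos_xd = dot(x, d)
    # Subtract alpha times the projection of x onto the average direction d.
    x_clean = [xi - alpha * cos_xd * di for xi, di in zip(x, d)]
    return [scale * dot(x_clean, l2_normalize(w)) for w in class_weights]
```

With `alpha=0` the function reduces to a plain cosine classifier, which makes it easy to compare scores with and without the subtraction.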

Citation

If you find that our paper or this project helps your research, please consider citing our paper in your publications.

@inproceedings{tang2020longtailed,
  title={Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect},
  author={Tang, Kaihua and Huang, Jianqiang and Zhang, Hanwang},
  booktitle= {NeurIPS},
  year={2020}
}
Comments
  • Does the cls branch of the rcnn head in detection apply softmax twice?

    The first time is in the cos_forward function, where a softmax is applied if self.KEEP_FG is set: https://github.com/KaihuaTang/Long-Tailed-Recognition.pytorch/blob/90c8b2c0b66d17f78b67263861bc9d858fe20128/lvis1.0/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py#L241

    The second time is in the get_bboxes function of the base class bbox_head: https://github.com/KaihuaTang/Long-Tailed-Recognition.pytorch/blob/90c8b2c0b66d17f78b67263861bc9d858fe20128/lvis1.0/mmdet/models/roi_heads/bbox_heads/bbox_head.py#L197 Did I misunderstand, or is this the intended design?
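
For context on why a second softmax would matter, the sketch below shows the numerical effect of applying softmax to values that are already probabilities. It says nothing about whether the two linked lines actually run on the same tensor; the logits are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0]      # made-up class scores
once = softmax(logits)        # probabilities, sharply peaked on class 0
twice = softmax(once)         # softmax of values in [0, 1]: much flatter

# The ranking of the classes is preserved, but the top probability shrinks.
```

Because softmax is monotonic, the predicted class does not change, but the scores are compressed toward uniform, which can matter for score thresholds.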

    opened by TangJiajieseu 6
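  • On the effectiveness of TDE when the training and test sets are both long-tailed

    Dear author, thank you for your work and for open-sourcing the code. I tried training on my own dataset with your method (5 classes; the training and test sets are both long-tailed and identically distributed).

    Training and model parameters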

    # default num_head = 2
    criterions:
      PerformanceLoss:
        def_file: ./loss/SoftmaxLoss.py
        loss_params: {}
        optim_params: null
        weight: 1.0
    last: false
    # apply incremental pca to remove main components
    apply_ipca: false
    num_components: 512
    model_dir: null
    tuning_memory: false
    networks:
      classifier:
        def_file: ./models/CausalNormClassifier.py
        optim_params: {lr: 0.001, momentum: 0.9, weight_decay: 0}
        scheduler_params: {coslr: false, endlr: 0.0, gamma: 0.1, step_size: 30, warmup: true, lr_step: [60, 80], lr_factor: 0.1, warm_epoch: 5}
        params: {dataset: GCAssCls, feat_dim: 128, num_classes: 5, stage1_weights: false, use_effect: true, num_head: 2, tau: 16.0, alpha: 1.0, gamma: 0.03125}
      feat_model:
        def_file: ./models/ResNet18Feature.py
        fix: false
        optim_params: {lr: 0.001, momentum: 0.9, weight_decay: 0}
        scheduler_params: {coslr: false, endlr: 0.0, gamma: 0.1, step_size: 30, warmup: true, lr_step: [60, 80], lr_factor: 0.1, warm_epoch: 5}
        params: {dataset: GCAssCls, dropout: 0.5, stage1_weights: false, use_fc: True, fc_channel: 128, pretrained: True}
    shuffle: false
    training_opt:
      backbone: resnet18
      batch_size: 128
      dataset: GCAssCls
      display_step: 10
      display_grad: False
      display_grad_step: 10
      feature_dim: 128
      log_dir: ./logs/GCAssCls/models/resnet18_e100_C5_warmup_causal_norm_lr1e-3_adam
      log_root: /logs/GCAssCls
      num_classes: 5
      num_epochs: 100
      num_freeze_epochs: 6
      num_workers: 12
      open_threshold: 0.1
      sampler: null
      sub_dir: models
      optimizer: adam
    

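    I used ResNet18 as the backbone with an ImageNet-pretrained model, and attached an fc layer and dropout to the backbone. After 100 epochs, accuracy on the training set reaches about 99.9%, but the resulting model performs poorly on the test set. Setting use_effect to false in the classifier gives the test results without TDE.

    Results on the long-tailed test set

    [image]

    A few questions

    • After adding TDE, accuracy on the many-shot classes drops sharply, while recall on the few-shot classes improves somewhat. Is this normal?
    • In the paper's classification tasks, the test sets are all class-balanced: for example, ImageNet-LT has 50 samples per class, and CIFAR-10/100 are only sampled in an imbalanced way at training time. In real production scenarios, however, many classification tasks are themselves long-tailed, and the collected training set generally follows the same distribution as the application scenario. Doesn't evaluating a model under a distribution different from the actual scenario defeat the purpose?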
    opened by urbaneman 2
  • On computing d in lvis1.0

    def update_embed(self, targets, gt_label):
        if self.training:
            # remove background
            with torch.no_grad():
                fg_target = targets[gt_label > 0].clone().detach().mean(0, keepdim=True)
                self.causal_embed = self.MU * self.causal_embed + fg_target
        return

    In the new version of mmdet, labels 0 to num_classes-1 are the positive samples. Is gt_label > 0 above a mistake?
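
The difference between the two label conventions raised in this question can be seen on a toy example. The sketch assumes, as the question states, that foreground labels run from 0 to num_classes-1 and that background is encoded as num_classes; both helper names are invented, and plain lists stand in for tensors.

```python
def foreground_indices(labels, num_classes):
    """Indices of foreground samples when background is encoded as num_classes."""
    return [i for i, label in enumerate(labels) if 0 <= label < num_classes]

def foreground_indices_old(labels):
    """Indices selected by `label > 0`, i.e. background encoded as 0."""
    return [i for i, label in enumerate(labels) if label > 0]

labels = [0, 2, 3, 1]   # toy labels; with num_classes=3, label 3 is background
new_style = foreground_indices(labels, num_classes=3)
old_style = foreground_indices_old(labels)
```

Under that assumption, the `label > 0` mask drops class 0 and keeps the background sample, so the two selections differ.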

    opened by dmy1997 2
  • embed_mean grow to inf

    Hello, I have added CausalNormClassifier to my own project and recorded the embed mean. However, I ran into the problem that the embed mean grows to inf. I print torch.sum(embed_mean):

    tensor(1902.6863, device='cuda:0')
    Train:   2%|                                   | 1/45 [00:07<05:36,  7.65s/it]
    tensor(8754.8096, device='cuda:0')
    Train:   4%|                                   | 2/45 [00:06<03:03,  4.27s/it]
    tensor(33422.0312, device='cuda:0')
    Train:   7%|                                  | 3/45 [00:08<02:47,  3.99s/it]
    tensor(122225.3984, device='cuda:0')
    

    the embed_mean is updated as follows:

    self.embed_mean = torch.zeros(int(self.training_opt['feature_dim'])).numpy()
    self.embed_mean = self.mu * self.embed_mean + self.features.detach().mean(0).view(-1).cpu().numpy()
    

    During training, the gradient gets smaller and smaller, so the velocity will not grow to inf. But the features generated by the model may not shrink, so the mean seems to grow to inf, and I can't store this variable. How can I solve this problem? Or is there anything I missed?
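
As a sanity check on the update rule quoted in this question, unrolling the recursion gives a bound on the accumulated value. The symbols e_t (embed_mean after step t), f_t (the batch mean feature), and F (an assumed upper bound on the feature norm) are notation introduced here, not taken from the code.

```latex
e_t = \mu\, e_{t-1} + f_t,\quad e_0 = 0
\;\Longrightarrow\;
e_t = \sum_{k=1}^{t} \mu^{\,t-k} f_k ,
\qquad
\lVert e_t \rVert \le F\,\frac{1-\mu^{t}}{1-\mu} < \frac{F}{1-\mu}
\quad\text{if } \lVert f_k \rVert \le F \text{ and } 0 \le \mu < 1 .
```

So with a decay such as 0.9 and features of bounded size, the accumulated value should level off at no more than ten times the feature scale. Growth by a roughly constant factor per step, as in the log above, would instead suggest that the decay in use is not below 1 or that the features themselves are growing.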

    opened by yaojunr 2
  • Regarding training_opt: {open_threshold: 0.1}

    What does training_opt: {open_threshold: 0.1} do? It looks like it points to the theta below, but theta is not used at all.

    def F_measure(preds, labels, theta=None):
        # Regular f1 score
        return f1_score(labels.detach().cpu().numpy(), preds.detach().cpu().numpy(), average='macro')
    
    opened by rahulvigneswaran 2
  • Moving average for d_hat

    In the code the moving average is like this, https://github.com/KaihuaTang/Long-Tailed-Recognition.pytorch/blob/90c8b2c0b66d17f78b67263861bc9d858fe20128/classification/run_networks.py#L215

    In the above scenario, more importance is being given to newer epochs, in that case, why can't we just use the final epoch's model? Or is there any other rationale behind it?

    Should the line have been the following instead? self.embed_mean = self.mu * self.embed_mean + (1-self.mu)*self.features.detach().mean(0).view(-1).cpu().numpy()

    where self.mu = 0.9.
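
The two update rules discussed in this question can be compared numerically. The sketch below uses scalar features and a constant input purely for illustration; `accumulate` is a name invented here, and mu = 0.9 is the value quoted in the question.

```python
def accumulate(features, mu, normalized):
    """Run the moving average over scalar features, with or without (1 - mu)."""
    value = 0.0
    for f in features:
        step = (1.0 - mu) * f if normalized else f
        value = mu * value + step
    return value

features = [2.0] * 200          # constant feature, chosen only for illustration
raw = accumulate(features, mu=0.9, normalized=False)     # approaches 2 / (1 - 0.9)
scaled = accumulate(features, mu=0.9, normalized=True)   # approaches 2
```

Starting from zero, the two results differ only by the constant factor 1/(1 - mu). For vector features the same factor applies to every component, so the direction is identical; if the average is only ever used after L2 normalization (an assumption about how it is consumed), both forms would give the same result.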

    opened by rahulvigneswaran 1
  • Moving average of embedding

    Why is there a moving average of the embedding out of the feature before it goes into the fully connected network? What is it used for? https://github.com/KaihuaTang/Long-Tailed-Recognition.pytorch/blob/540cae6da251f49b1b021630e219dde7c5867ce2/classification/run_networks.py#L215

    opened by rahulvigneswaran 1
  • feat_uniform.yaml

    1. Is feat_uniform.yaml just normal training?
    2. For ResNet32Feature.py, why are you using the BBN_ResNet_Cifar class instead of using the parent class ResNetFeature ?
    opened by rahulvigneswaran 1
  • A question regarding the assignment of num_head, tau, alpha and gamma

    Hey @KaihuaTang, I am an NLPer. Thanks for your interesting work. I've observed that you use the same configs for the parameters in class Causal_Norm_Classifier, namely num_head, tau, alpha and gamma, for both the CIFAR and ImageNet datasets. How come? Could you also please explain the idea behind that kind of assignment? Thanks a lot.

    opened by 22842219 0
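  • Help with the paper

    Hello author, I only recently came across this paper on long-tailed classification, and a few points are unclear to me. 1) The paper says that the bias toward head classes only needs to be removed at validation or test time. After removing it, how does the model perform on the training set? 2) This bias can be viewed as the feature output for a zero input, so could the bias terms of all convolution and BatchNormalization layers in the network simply be disabled? 3) What does alpha mean, and why can it be greater than 1? [image]

    Thanks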

    opened by Itsanewday 0
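  • A question about the code that computes the moving-average feature

    Hello, and first of all thank you very much for your solid work.

    While reading the code, I ran into a question I would like to ask:

    In the update_embed function in lvis1.0\mmdet\models\roi_heads\bbox_heads\convfc_bbox_head.py: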

            if self.training:
                # remove background
                with torch.no_grad():
                    fg_target = targets[gt_label > 0].clone().detach().mean(0, keepdim=True)   
                    self.causal_embed = self.MU * self.causal_embed + fg_target    
            return
    

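    As I understand it, this adds the feature values (fg_target), not gradient values, to the moving average, which does not match g_t in the formula in the image below. Is my understanding of the code wrong? Could you point out where? Thank you very much! [image]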

    opened by Kittywyk 1
  • Cifar Validation set

    Hello. I found in ImbalanceCIFAR.py, the validation set and the test set seem to be the same. Is it better to use a balanced training set (50000 frames) as the validation set instead of using the 10000 frame test set for validation? Thanks!

    opened by sxontheway 1
  • Migration to multi-label long-tailed recognition

    Thank you for your excellent work. I want to migrate the classifier (CausalNormClassifier) in this work to the task of multi-label long-tailed recognition. However, the following problem appeared. It looks as if the model has not learned anything at all, or perhaps the output is smoothed out at the inference stage, because the loss curve during training seems normal. Can you give me some advice or tell me what might be going wrong?

    [image]

    opened by AlphaPlusTT 2
Owner
Kaihua Tang
@kaihuatang.github.io/
This is the official code repository for A Simple Long-Tailed Recognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Recognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 11 Dec 1, 2021
Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021

Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE, ICCV 2021) Paper Introduction The conventional detectors tend to make imba

null 52 Nov 21, 2022
A Strong Baseline for Image Semantic Segmentation

A Strong Baseline for Image Semantic Segmentation Introduction This project is an open source semantic segmentation toolbox based on PyTorch. It is ba

Clark He 49 Sep 20, 2022
[ICCV 2021] A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

CodingMan 45 Dec 12, 2022
Code for Mining the Benefits of Two-stage and One-stage HOI Detection

Status: Archive (code is provided as-is, no updates expected) PPO-EWMA [Paper] This is code for training agents using PPO-EWMA and PPG-EWMA, introduce

OpenAI 33 Dec 15, 2022
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch >= 1.2.0 P

null 16 Dec 14, 2022
Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

PatatiPatata 28 Oct 18, 2022
On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks We provide the code (in PyTorch) and datasets for our paper "On Size-Orient

Zemin Liu 4 Jun 18, 2022
Jingju baseline - A baseline model of our project of Beijing opera script generation

Jingju Baseline It is a baseline of our project about Beijing opera script gener

midon 1 Jan 14, 2022
Single-Stage 6D Object Pose Estimation, CVPR 2020

Overview This repository contains the code for the paper Single-Stage 6D Object Pose Estimation. Yinlin Hu, Pascal Fua, Wei Wang and Mathieu Salzmann.

CVLAB @ EPFL 89 Dec 26, 2022
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps Here is the code for ssbassline model. We also provide OCR results/features/mode

ZephyrZhuQi 51 Nov 18, 2022
This repo is developed for Strong Baseline For Vehicle Re-Identification in Track 2 Ai-City-2021 Challenges

A STRONG BASELINE FOR VEHICLE RE-IDENTIFICATION This paper is accepted to the IEEE Conference on Computer Vision and Pattern Recognition Workshop(CVPR

Cybercore Co. Ltd 78 Dec 29, 2022
A tiny, friendly, strong baseline code for Person-reID (based on pytorch).

Pytorch ReID Strong, Small, Friendly A tiny, friendly, strong baseline code for Person-reID (based on pytorch). Strong. It is consistent with the new

Zhedong Zheng 3.5k Jan 8, 2023
Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition"

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021. For details of the model and experiments, please see our paper.

tricktreat 87 Dec 16, 2022
A Data Annotation Tool for Semantic Segmentation, Object Detection and Lane Line Detection.(In Development Stage)

Data-Annotation-Tool How to Run this Tool? To run this software, follow the steps: git clone https://github.com/Autonomous-Car-Project/Data-Annotation

TiVRA AI 13 Aug 18, 2022
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection (ICCV 2021)

Preparation Please see dataset/README.md to get more details about our datasets-VIL100 Please see INSTALL.md to install environment and evaluation too

null 82 Dec 15, 2022
Virtual Dance Reality Stage: a feature that offers you to share a stage with another user virtually

Portrait Segmentation using Tensorflow This script removes the background from an input image. You can read more about segmentation here Setup The scr

null 291 Dec 24, 2022
Image-retrieval-baseline - MUGE Multimodal Retrieval Baseline

MUGE Multimodal Retrieval Baseline This repo is implemented based on the open_cl

null 47 Dec 16, 2022
Image-generation-baseline - MUGE Text To Image Generation Baseline

MUGE Text To Image Generation Baseline Requirements and Installation More detail

null 23 Oct 17, 2022