[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression

Overview


This repository contains the implementation code for paper:
Delving into Deep Imbalanced Regression
Yuzhe Yang, Kaiwen Zha, Ying-Cong Chen, Hao Wang, Dina Katabi
38th International Conference on Machine Learning (ICML 2021), Long Oral
[Project Page] [Paper] [Video] [Blog Post]


Deep Imbalanced Regression (DIR) aims to learn from imbalanced data with continuous targets,
tackle potential missing data for certain regions, and generalize to the entire target range.

Beyond Imbalanced Classification: Brief Introduction for DIR

Existing techniques for learning from imbalanced data focus on targets with categorical indices, i.e., the targets are different classes. However, many real-world tasks involve continuous and even infinite target values. We systematically investigate Deep Imbalanced Regression (DIR), which aims to learn continuous targets from naturally imbalanced data, deal with potential missing data for certain target values, and generalize to the entire target range.

We curate and benchmark large-scale DIR datasets for common real-world tasks in computer vision, natural language processing, and healthcare, ranging from single-value prediction (age, text similarity score, health condition score) to dense-value prediction (depth).

Usage

We separate the codebase for different datasets into different subfolders. Please go into the subfolders for more information (e.g., installation, dataset preparation, training, evaluation & models).

IMDB-WIKI-DIR  |  AgeDB-DIR  |  NYUD2-DIR  |  STS-B-DIR

Highlights

(1) ✔️ New Task: Deep Imbalanced Regression (DIR)

(2) ✔️ New Techniques:

[Figures] Label distribution smoothing (LDS) · Feature distribution smoothing (FDS)
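A minimal sketch of the LDS idea, in case it helps readers orient themselves: convolve the empirical label histogram with a symmetric kernel to obtain an "effective" label density, then reweight each sample's loss by the inverse of that density (FDS performs an analogous kernel smoothing on per-bin feature statistics). Function and parameter names below are illustrative, not the repository's API:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def lds_weights(labels, n_bins=100, sigma=2.0):
    """Label Distribution Smoothing (LDS) sketch: smooth the empirical
    label density with a Gaussian kernel, then weight each sample by
    the inverse effective density."""
    # Empirical label distribution over integer bins [0, n_bins)
    hist = np.bincount(labels, minlength=n_bins).astype(float)
    # Effective label density: convolve with a symmetric (Gaussian) kernel
    eff_density = gaussian_filter1d(hist, sigma=sigma)
    # Inverse-density weights, normalized to mean 1
    weights = 1.0 / np.clip(eff_density[labels], 1e-6, None)
    return weights * len(weights) / weights.sum()

# Usage: multiply these per-sample weights into a regression loss
labels = np.random.randint(0, 100, size=1000)  # e.g., integer ages 0-99
w = lds_weights(labels)
```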

(3) ✔️ New Benchmarks:

  • Computer Vision: 💡 IMDB-WIKI-DIR (age) / AgeDB-DIR (age) / NYUD2-DIR (depth)
  • Natural Language Processing: 📋 STS-B-DIR (text similarity score)
  • Healthcare: 🏥 SHHS-DIR (health condition score)
[Figures: sample data from IMDB-WIKI-DIR, AgeDB-DIR, NYUD2-DIR, STS-B-DIR, SHHS-DIR]

Updates

  • [06/2021] We provide a hands-on tutorial of DIR. Check it out!
  • [05/2021] We created a blog post for this work (a version in Chinese is also available here). Check it out for more details!
  • [05/2021] Paper accepted to ICML 2021 as a Long Talk. We have released the code and models. You can find all reproduced checkpoints via this link, or go into each subfolder for models for each dataset.
  • [02/2021] arXiv version posted. Please stay tuned for updates.

Citation

If you find this code or idea useful, please cite our work:

@inproceedings{yang2021delving,
  title={Delving into Deep Imbalanced Regression},
  author={Yang, Yuzhe and Zha, Kaiwen and Chen, Ying-Cong and Wang, Hao and Katabi, Dina},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2021}
}

Contact

If you have any questions, feel free to contact us through email ([email protected] & [email protected]) or GitHub issues. Enjoy!

Comments
  • Hi @YyzHarry,

    Hi @YyzHarry, I want to use your LDS code to solve my problem, and I have a question: does the input data for LDS have to be in CSV format? Is NPZ-format data OK? Can it be applied to high-dimensional data? I would appreciate it if you could give me some guidance.

    opened by ytkmy5555 6
  • Bins in FDS and LDS - not usable in general approach, only for given datasets

    Hi Team, I liked the ideas in your paper, but from reading the paper and the provided code it sounds like the FDS and LDS code can be applied to any dataset/model. Is that really true?

    • It looks like you are using only integers (as you are predicting age) to build the dictionary of histogram bins in both FDS and LDS. In the paper you say: "We use a minimum bin size of 1, i.e., y_{b+1} − y_b = 1, and group features with the same target value in the same bin." I imagine this makes a lot of things easier, but if you are facing an imbalanced regression problem and your labels are floats between 0 and 5, this version of the code won't help you. Do you by any chance have code for the general approach? (One possible generalization is sketched after this comment.)
      • see the following parts of the code with use-case-specific histogram bins:
        • https://github.com/YyzHarry/imbalanced-regression/blob/055a7b3804bbaf903ed25a55c11ab8acc6e142e1/agedb-dir/datasets.py#L60
        • https://github.com/YyzHarry/imbalanced-regression/blob/055a7b3804bbaf903ed25a55c11ab8acc6e142e1/agedb-dir/fds.py#L120
    • I did not find an explanation for this clipping (maybe it empirically gave better results?): https://github.com/YyzHarry/imbalanced-regression/blob/055a7b3804bbaf903ed25a55c11ab8acc6e142e1/agedb-dir/datasets.py#L67
    • there is also another clipping here (again, I guess, for better empirical results?): https://github.com/YyzHarry/imbalanced-regression/blob/055a7b3804bbaf903ed25a55c11ab8acc6e142e1/agedb-dir/utils.py#L102

    Note: I like the ideas in the paper, but due to the lack of documentation/explanation I am currently spending a lot of time generalizing the code and trying to figure out why you made some of the operations (e.g., the clippings).

    opened by 5uperpalo 5
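
    One plausible way to generalize the integer bins above to continuous labels (e.g., floats in [0, 5]) is to discretize with a fixed bin width before smoothing; a hedged sketch, with all names chosen here for illustration rather than taken from the repository:

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def lds_weights_continuous(labels, lo=0.0, hi=5.0, bin_width=0.1, sigma=2.0):
        """Generalize LDS bins to float labels: discretize into fixed-width
        bins, smooth the histogram, and look up inverse-density weights."""
        edges = np.arange(lo, hi + bin_width, bin_width)
        # Map each float label to a bin index (clipped to the valid range)
        idx = np.clip(np.digitize(labels, edges) - 1, 0, len(edges) - 2)
        hist = np.bincount(idx, minlength=len(edges) - 1).astype(float)
        eff = gaussian_filter1d(hist, sigma=sigma)
        w = 1.0 / np.clip(eff[idx], 1e-6, None)
        return w * len(w) / w.sum()

    # e.g., STS-B-style similarity scores in [0, 5]
    scores = np.random.uniform(0, 5, size=2000)
    weights = lds_weights_continuous(scores)
    ```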
  • prediction value processing

    Hello, I read your paper with interest and want to ask a question about processing the prediction results. How do you limit the final prediction ŷ to lie between 0 and 99? Or is it taken directly from the regression function without any post-processing? (A clamping sketch follows this comment.)

    opened by huangbingyang2020 4
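
    For bounded targets such as ages in [0, 99], one common post-processing step (an assumption on my part, not necessarily what this repository does) is simply to clamp the raw regression output at inference time:

    ```python
    import torch

    def postprocess(raw_pred: torch.Tensor, lo: float = 0.0, hi: float = 99.0) -> torch.Tensor:
        """Clamp unconstrained regression outputs to the valid target range."""
        return torch.clamp(raw_pred, min=lo, max=hi)

    print(postprocess(torch.tensor([-3.2, 45.7, 120.0])))  # tensor([ 0.0000, 45.7000, 99.0000])
    ```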
  • How to use this method in a multi-dimension regression problem?

    Hi, amazing job on the imbalanced regression problem. But I notice that this work mostly discusses 1D regression problems. What if the output is more than 1D (like Batch_Size x 10)? Any suggestion would be helpful. Thanks.

    opened by semi-supervised-paper 3
  • about test error

    Hello, I have some questions about the error PDF. How do I obtain the right error PDF? Each label has a different number of samples, so should I average the error within each label, or make every label have the same number of samples? (A per-label averaging sketch follows this comment.)

    opened by W-rudder 2
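
    Since each label holds a different number of samples, one common convention for such per-label error curves (the specifics here are my assumption, not a quote of the authors' script) is to average the error within each label bin first, so that dense and sparse labels contribute equally across the target range:

    ```python
    import numpy as np

    def per_label_mean_error(labels, preds, n_bins=100):
        """Mean absolute error per label bin: average within each bin first,
        so every label contributes equally regardless of its sample count."""
        labels = np.asarray(labels)
        errors = np.abs(np.asarray(preds) - labels)
        bin_err = np.full(n_bins, np.nan)
        for b in range(n_bins):
            mask = labels == b
            if mask.any():
                bin_err[b] = errors[mask].mean()
        return bin_err  # plot this curve; np.nanmean(bin_err) gives a balanced MAE

    labels = np.random.randint(0, 100, 500)
    preds = labels + np.random.randn(500) * 5
    curve = per_label_mean_error(labels, preds)
    ```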
  • The reproduced benchmark and model seem to be damaged

    Hi @YyzHarry, the reproduced benchmark and model seem to be damaged. The model downloaded from this link (https://drive.google.com/file/d/1CPDlcRCQ1EC4E3x9w955cmILSaVOlkyz/view) cannot be opened; it indicates that the data has been damaged. Can you update the model? Thank you!

    opened by GXNU156489 2
  • Applying LDS/FDS to classic machine learning models

    Hi! This work is really fantastic! However, I found it hard to apply LDS/FDS to classic machine learning models like random forests. For example, after getting the effective label density with LDS, how should I use it? (One option is sketched after this comment.)

    opened by luopx 2
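
    LDS produces per-sample weights, and many classic scikit-learn regressors accept them through the sample_weight argument of fit. A hedged sketch of that route (my suggestion, not something the repository demonstrates):

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter1d
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))
    y = np.clip(X[:, 0] * 10 + 50 + rng.normal(scale=5, size=1000), 0, 99)

    # LDS: smooth the label histogram, weight samples by inverse effective density
    idx = y.astype(int)
    eff = gaussian_filter1d(np.bincount(idx, minlength=100).astype(float), sigma=2.0)
    w = 1.0 / np.clip(eff[idx], 1e-6, None)

    model = RandomForestRegressor(n_estimators=100)
    model.fit(X, y, sample_weight=w)  # rare label regions get up-weighted
    ```

    The same sample_weight pattern applies to linear models such as sklearn's ElasticNet, which a later comment asks about.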
  • Hi, confusion about the computation of the feature statistics similarity (mean&variance)

    I'm handling my imbalanced data with your method. A little confusion: how do you compute the feature-statistics similarity between the many-shot region and the few-shot region, given that the few-shot bins contain far fewer features than the many-shot bins? (A sketch of one reading follows this comment.)

    Thanks for your attention!

    opened by lixingang 2
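
    One straightforward reading of that similarity computation (my interpretation of the paper's figures, not the authors' exact script) is to take the mean feature vector of each bin, however few samples the bin holds, and compute the cosine similarity between an anchor bin's mean and every other bin's mean:

    ```python
    import numpy as np

    def bin_mean_cosine_similarity(features, bin_ids, anchor_bin, n_bins=100):
        """Cosine similarity between the anchor bin's mean feature vector and
        each other bin's mean; few-shot bins simply average fewer samples."""
        means = np.zeros((n_bins, features.shape[1]))
        for b in range(n_bins):
            mask = bin_ids == b
            if mask.any():
                means[b] = features[mask].mean(axis=0)
        a = means[anchor_bin]
        return means @ a / (np.linalg.norm(means, axis=1) * np.linalg.norm(a) + 1e-12)

    feats = np.random.randn(1000, 64)
    bins = np.random.randint(0, 100, 1000)
    sims = bin_mean_cosine_similarity(feats, bins, anchor_bin=30)
    ```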
  • Incorrect Focal-R mse loss?

    Hi authors,

    From page 6 of your paper: "Precisely, Focal-R loss based on L1 distance can be written as 1/n ∑_{i=1}^{n} σ(|β e_i|)^γ · e_i, where e_i is the L1 error for the i-th sample, σ(·) is the sigmoid function ..."

    • QUESTION 1: in the focal_mse loss, https://github.com/YyzHarry/imbalanced-regression/blob/055a7b3804bbaf903ed25a55c11ab8acc6e142e1/agedb-dir/loss.py#L24, shouldn't it be torch.abs((inputs - targets)**2) and not only torch.abs(inputs - targets)? Am I correct?
    • QUESTION 2: why is there 2*torch.abs(...)-1? There is no -1 or 2* in the formula in your paper. (A side-by-side sketch of both versions follows this comment.)
    opened by 5uperpalo 2
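
    For reference, a literal transcription of the paper's Focal-R L1 formula into PyTorch, next to a variant with the repository's 2*sigmoid(...)-1 scaling. Since |e| >= 0, sigmoid(beta*|e|) lies in [0.5, 1), so the rescaling plausibly maps the modulating factor onto [0, 1) and lets easy examples be fully down-weighted; that reading is my guess, not a confirmed rationale:

    ```python
    import torch

    def focal_r_l1(inputs, targets, beta=0.2, gamma=1.0):
        """Focal-R with L1 distance as written in the paper:
        (1/n) * sum_i sigmoid(|beta * e_i|)^gamma * e_i."""
        e = torch.abs(inputs - targets)
        return (torch.sigmoid(beta * e) ** gamma * e).mean()

    def focal_r_l1_scaled(inputs, targets, beta=0.2, gamma=1.0):
        """Variant with the repository-style scaling: 2*sigmoid(beta*e) - 1
        maps the factor onto [0, 1) so it can actually reach zero."""
        e = torch.abs(inputs - targets)
        return ((2 * torch.sigmoid(beta * e) - 1) ** gamma * e).mean()

    x, y = torch.randn(8), torch.randn(8)
    print(focal_r_l1(x, y), focal_r_l1_scaled(x, y))
    ```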
  • Does this method apply to linear regression models like elastic net?

    Hi, I'm wondering whether I can use this method as a preprocessing step for non-DNN models, namely simple linear regression or elastic-net regression? If so, how should I adopt it?

    Thank you so much!

    opened by albert-ying 2
  • About SHHS-DIR dataset

    Thanks a lot for your contribution; your work is really awesome, and I am very interested in it. However, while reading the code, I did not find the SHHS-DIR dataset. Could you publish the SHHS-DIR dataset or its sampling method? Thank you!

    opened by axi345 2
  • Using FDS in ML project

    Hello, from your paper on deep imbalanced regression I had the privilege of learning about the LDS and FDS smoothing methods. In one of my machine learning projects, predicting convective-cloud precipitation, I want to use FDS because of the imbalance between non-precipitation and precipitation samples. How can I smooth the feature statistics on the test set without knowing its labels? My data is very imbalanced (70% of the samples have no precipitation), which makes the feature statistics of each label interval particularly similar (about 98%), so the smoothing effect is still not significant. Are there any important points to pay attention to when using FDS in machine learning?

    opened by thebluewind 1
  • validation question

    Hello, I read your paper with interest and want to apply it to my custom data. Two problems have come up.

    First, when I evaluate my custom data, I run into this error. Do you know what it is?

    And can I extract each individual prediction result, rather than the averaged prediction result?

    Thank you.

    opened by namasang1 1
  • Using FDS/LDS with a custom model and data

    Hi @YyzHarry,

    I am trying to adapt the example from https://github.com/YyzHarry/imbalanced-regression/tree/main/agedb-dir to my custom model and data. I would like to ask whether this is feasible and, if so, whether there is any example showing explicitly how to do it.

    Thanks.

    opened by ttsesm 16