Awesome Long-Tailed Learning

Overview

This repo pays special attention to long-tailed distributions, where labels follow a long-tailed or power-law distribution in the training dataset and/or the test dataset. Related papers are summarized, covering applications in computer vision (in particular image classification) and extreme multi-label learning (XML, in particular text categorization).
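To make the notion of a power-law label distribution concrete, here is a minimal sketch in Python. The function name `power_law_class_counts`, the decay form `max_count / k**exponent`, and the chosen numbers are illustrative assumptions, not taken from any paper listed here.

```python
def power_law_class_counts(num_classes, max_count, exponent=1.0):
    """Per-class sample counts n_k = max_count / k**exponent for k = 1..num_classes.

    Hypothetical helper: one simple way to produce a long-tailed label profile.
    """
    return [max(1, int(max_count / (k ** exponent))) for k in range(1, num_classes + 1)]


counts = power_law_class_counts(num_classes=10, max_count=1000, exponent=1.0)

# Imbalance ratio: size of the largest (head) class over the smallest (tail) class.
imbalance_ratio = counts[0] / counts[-1]

# Share of all samples that fall into the three most frequent classes.
head_share = sum(counts[:3]) / sum(counts)

print(counts)
print(imbalance_ratio, round(head_share, 3))
```

A few head classes hold most of the samples while the remaining classes form the tail, which is the setting the papers below address.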

🔆 Updated 2021-09-27

Long-tailed Learning in Computer Vision

Type of Long-Tailed Learning Methods

Type Meaning
TST Two-Stage Training
IS Instance Sampling
CBS Class-Balanced Sampling
CLW Class-Level Weighting
NC Normalized Classifier
ENS Ensemble
DA Data Augmentation
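The sampling and weighting types above (IS, CBS, CLW) can be contrasted with a small sketch. It assumes the usual reading of these terms: instance sampling draws every sample with equal probability, class-balanced sampling draws every class with equal probability, and class-level weighting rescales the loss per class. The inverse-frequency weights are one common choice, not the method of any specific paper in this list, and all function names are hypothetical.

```python
from collections import Counter


def class_sampling_probs(labels, balanced):
    """Probability that a drawn sample belongs to each class.

    balanced=False: instance sampling (IS), probability proportional to class size.
    balanced=True:  class-balanced sampling (CBS), every class equally likely.
    """
    counts = Counter(labels)
    if balanced:
        return {c: 1.0 / len(counts) for c in counts}
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def per_sample_weights(labels):
    """Per-sample draw weights that realize class-balanced sampling."""
    counts = Counter(labels)
    return [1.0 / (len(counts) * counts[y]) for y in labels]


def inverse_frequency_class_weights(labels):
    """Class-level loss weights (CLW), assuming simple inverse-frequency weighting."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


# Toy dataset: class 0 is the head (8 samples), class 1 is the tail (2 samples).
labels = [0] * 8 + [1] * 2

is_probs = class_sampling_probs(labels, balanced=False)
cbs_probs = class_sampling_probs(labels, balanced=True)
sample_w = per_sample_weights(labels)
class_w = inverse_frequency_class_weights(labels)

print(is_probs, cbs_probs, class_w)
```

The per-sample weights can be passed to a weighted sampler, for example `random.choices(range(len(labels)), weights=sample_w, k=batch_size)` from the Python standard library, where `batch_size` is a value you choose.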

Long-Tailed Learning Workshops

Year Venue Title Remark
2021 CVPR Open World Vision long-tail, open-set, streaming labels
2021 CVPR Learning from Limited and Imperfect Data (L2ID) label noise, SSL, long-tail

Long-Tailed Learning Papers

Year Venue Title Remark
2021 Arxiv Learning from Long-Tailed Data with Noisy Labels
2021 ICCV Self Supervision to Distillation for Long-Tailed Visual Recognition
2021 ICCV Distilling Virtual Examples for Long-tailed Recognition
2021 CVPR Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
2021 CVPR MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
2021 CVPR Disentangling Label Distribution for Long-tailed Visual Recognition
2021 CVPR Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings
2021 CVPR Seesaw Loss for Long-Tailed Instance Segmentation
2021 ICLR Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
2021 Arxiv Improving Long-Tailed Classification from Instance Level
2021 Arxiv Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning SSL, Code
2021 Arxiv ResLT: Residual Learning for Long-tailed Recognition
2021 Arxiv Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces by Google
2021 Arxiv Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition
2021 Arxiv Procrustean Training for Imbalanced Deep Learning
2021 Arxiv Balanced Knowledge Distillation for Long-tailed Learning CBS+IS, Code
2021 Arxiv Class-Balanced Distillation for Long-Tailed Visual Recognition ENS+DA+IS, by Google Research
2021 Arxiv Distributional Robustness Loss for Long-tail Learning TST+CBS
2021 CVPR Improving Calibration for Long-Tailed Recognition DA+TST, Code
2021 CVPR Distribution Alignment: A Unified Framework for Long-tail Visual Recognition TST
2021 CVPR Adversarial Robustness under Long-Tailed Distribution
2021 CVPR CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning by Google, Code, Tensorflow
2021 ICLR Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization Code
2021 ICLR Long-Tailed Recognition by Routing Diverse Distribution-Aware Experts ENS+NC, Code, by Zi-Wei Liu
2021 ICLR Long-Tail Learning via Logit Adjustment by Google
2021 AAAI Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks
2021 Arxiv Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
2020 Arxiv ELF: An Early-Exiting Framework for Long-Tailed Classification
2020 CVPR Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective
2020 CVPR Equalization Loss for Long-Tailed Object Recognition
2020 CVPR Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
2020 ICLR Decoupling representation and classifier for long-tailed recognition Code
2020 NeurIPS Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning Code
2020 NeurIPS Rethinking the Value of Labels for Improving Class-Imbalanced Learning Code
2020 CVPR BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition Code
2019 NeurIPS Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss Code
2019 CVPR Large-Scale Long-Tailed Recognition in an Open World Code, bibtex, by CUHK
2018 - iNaturalist: The iNaturalist 2018 Competition Dataset long-tailed dataset
2017 Arxiv The Devil is in the Tails: Fine-grained Classification in the Wild
2017 NeurIPS Learning to model the tail

eXtreme Multi-label Learning for Information Retrieval

Binary Relevance

Year Venue Title Remark
2019 Machine Learning Data Scarcity, Robustness and Extreme Multi-label Classification
2019 WSDM Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches
2017 KDD PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification
2017 AISTATS Label Filters for Large Scale Multilabel Classification
2016 WSDM DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification
2016 ICML PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

Tree-based Methods

Year Venue Title Remark
2021 KDD Extreme Multi-label Learning for Semantic Matching in Product Search by Amazon, code
2020 arXiv Probabilistic Label Trees for Extreme Multi-label Classification PLT survey, code
2020 arXiv Online probabilistic label trees
2020 AISTATS LdSM: Logarithm-depth Streaming Multi-label Decision Trees Instance tree, C++ code
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks Label tree
2019 arXiv Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification Label tree
2018 ICML CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning Instance tree
2018 WWW Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising Label tree, by Manik Varma
2016 ICML Extreme F-Measure Maximization using Sparse Probability Estimates Label tree
2016 KDD Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications Instance tree
2014 KDD A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Instance tree, python implementation
2013 ICML Label Partitioning For Sublinear Ranking Label tree
2013 WWW Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages Instance tree, Random Forest, Gini Index
2011 NeurIPS Efficient label tree learning for large scale object recognition Label tree, multi-class
2010 NeurIPS Label embedding trees for large multi-class tasks Label tree, multi-class
2008 ECML Workshop Effective and Efficient Multilabel Classification in Domains with Large Number of Labels Label tree

Embedding-based Methods

Year Venue Title Remark
2019 AAAI Distributional Semantics Meets Multi-Label Learning bibtex
2019 arXiv Ranking-Based Autoencoder for Extreme Multi-label Classification
2019 NeurIPS Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces by Google Research
2017 KDD AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification
2015 NeurIPS Sparse Local Embeddings for Extreme Multi-label Classification
2014 ICML Large-scale Multi-label Learning with Missing Labels
2014 ICML Multi-label Classification via Feature-aware Implicit Label Space Encoding
2013 ICML Efficient Multi-label Classification with Many Labels
2012 NeurIPS Feature-aware Label Space Dimension Reduction for Multi-label Classification
2011 IJCAI WSABIE: Scaling Up To Large Vocabulary Image Annotation bibtex
2009 NeurIPS Multi-Label Prediction via Compressed Sensing
2008 KDD Extracting Shared Subspaces for Multi-label Classification

Speed-up and Compression

Year Venue Title Remark
2020 KDD Large-Scale Training System for 100-Million Classification at Alibaba Applied Data Science Track
2020 arXiv SOLAR: Sparse Orthogonal Learned and Random Embeddings
2020 ICLR Extreme Classification via Adversarial Softmax Approximation
2019 AISTATS Stochastic Negative Mining for Learning with Large Output Spaces by Google
2019 NeurIPS Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products Rice University, bibtex
2019 arXiv An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction
2019 arXiv Accelerating Extreme Classification via Adaptive Feature Agglomeration bibtex, authors from IIT
2019 SDM Fast Training for Large-Scale One-versus-All Linear Classifiers using Tree-Structured Initialization code, bibtex

Novel XML Settings

Year Venue Title Remark
2020 arXiv Extreme Multi-label Classification from Aggregated Labels by Inderjit Dhillon. This paper considers multi-instance learning in XML
2020 arXiv Unbiased Loss Functions for Extreme Classification With Missing Labels by Rohit Babbar. Missing labels
2020 ICML Deep Streaming Label Learning code, by Dacheng Tao, streaming multi-label learning
2016 arXiv Streaming Label Learning for Modeling Labels on the Fly by Dacheng Tao, streaming multi-label learning

Theoretical Studies

Year Venue Title Remark
2019 ICML Sparse Extreme Multi-label Learning with Oracle Property Code, by Weiwei Liu
2019 NeurIPS Multilabel reductions: what is my loss optimising? bibtex, by Google

Text Classification

Year Venue Title Remark
2021 ICML SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels
2020 KDD Correlation Networks for Extreme Multi-label Text Classification code
2020 arXiv GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification
2020 ICML Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification code
2019 ACL Large-Scale Multi-Label Text Classification on EU Legislation Eur-Lex 4.3K, bibtex
2019 arXiv X-BERT: eXtreme Multi-label Text Classification with BERT code by Yiming Yang, Inderjit Dhillon
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks
2018 EMNLP Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces few-shot, zero-shot, evaluation metric
2018 NeurIPS A no-regret generalization of hierarchical softmax to extreme multi-label classification code, PLT code
2017 SIGIR Deep Learning for Extreme Multi-label Text Classification by Yiming Yang at CMU, bibtex

Others

Label Correlation

Year Venue Title Remark
2019 ICML DL2: Training and Querying Neural Networks with Logic
2015 KDD Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning
2010 KDD Multi-Label Learning by Exploiting Label Dependency

Long-tailed Continual Learning

Year Venue Title Remark
2020 ECCV Imbalanced Continual Learning with Partitioning Reservoir Sampling

Train/Test Split

Year Venue Title Remark
2021 Arxiv Stratified Sampling for Extreme Multi-Label Data

XML Seminar

Year Venue Title Remark
2019 Dagstuhl Seminar 18291 Extreme Classification

Survey References:

  1. https://arxiv.org/pdf/1901.00248.pdf
  2. http://www.iith.ac.in/~saketha/research/AkshatMTP2018.pdf
  3. http://manikvarma.org/pubs/bengio19.pdf
  4. The Emerging Trends of Multi-Label Learning

XML Datasets link

Extreme Classification Workshops link
