DuBE: Duple-balanced Ensemble Learning from Skewed Data

Last update: Nov 12, 2022

"Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning"
(IEEE ICDE 2022 Submission) [Documentation] [Examples]

DuBE is an ensemble learning framework for (multi)class-imbalanced classification. It is an easy-to-use solution to imbalanced learning problems that offers good performance, computational efficiency, and wide compatibility with different learning models. Documentation and examples are available at https://duplebalance.readthedocs.io.

Table of Contents
Background
Install
Usage
Documentation

Background

Imbalanced Learning (IL) is an important problem that widely exists in data mining applications. Typical IL methods utilize intuitive class-wise resampling or reweighting to directly balance the training set. However, some recent research efforts in specific domains show that class-imbalanced learning can be achieved without class-wise manipulation. This prompts us to think about the relationship between the two different IL strategies and the nature of the class imbalance. Fundamentally, they correspond to two essential imbalances that exist in IL: the difference in quantity between examples from different classes as well as between easy and hard examples within a single class, i.e., inter-class and intra-class imbalance.
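
To make the two notions concrete, the sketch below measures both on a toy label vector. It is only an illustration and not part of DuBE: the helper names `inter_class_ratio` and `hard_fraction_per_class`, the toy error values, and the 0.5 threshold used to call an example "hard" are all assumptions introduced here.

```python
import numpy as np

def inter_class_ratio(y):
    """Inter-class imbalance: largest class size divided by smallest class size."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.min()

def hard_fraction_per_class(y, errors, threshold=0.5):
    """Intra-class imbalance: share of 'hard' examples (error > threshold) per class."""
    y = np.asarray(y)
    errors = np.asarray(errors)
    return {int(c): float(np.mean(errors[y == c] > threshold)) for c in np.unique(y)}

# Toy data: 8 examples of class 0, 2 examples of class 1.
y = np.array([0] * 8 + [1] * 2)
# Hypothetical per-example prediction errors in [0, 1] from some classifier.
errors = np.array([0.1, 0.1, 0.2, 0.1, 0.9, 0.1, 0.2, 0.1, 0.8, 0.3])

print(inter_class_ratio(y))                # class 0 is 4x larger than class 1
print(hard_fraction_per_class(y, errors))  # hard examples are rare inside class 0
```

In this toy case the classes differ in size (inter-class imbalance), and within class 0 only one example in eight is hard (intra-class imbalance).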

Existing works fail to explicitly take both imbalances into account and thus suffer from suboptimal performance. In light of this, we present Duple-Balanced Ensemble, namely DuBE, a versatile ensemble learning framework. Unlike prevailing methods, DuBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation, which allows it to achieve competitive performance while being computationally efficient.

Install

Our DuBE implementation requires the following dependencies:

python (>=3.8.5)
numpy (>=1.19.2)
pandas (>=1.1.3)
scikit-learn (>=0.23.2)
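
If it helps, the minimum versions above can be checked from Python before installing. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_dependencies`) are made up for this example, the version comparison is deliberately simple (numeric parts only), and the Python interpreter version itself is not checked.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "numpy": "1.19.2",
    "pandas": "1.1.3",
    "scikit-learn": "0.23.2",
}

def parse_version(text):
    """Turn '1.19.2' into (1, 19, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)

def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {package: status string} for every listed package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
        report[name] = f"{installed} ({status})"
    return report

for name, status in check_dependencies().items():
    print(f"{name}: {status}")
```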

You can install DuBE by cloning this repository:

git clone https://github.com/ICDE2022Sub/duplebalance.git
cd duplebalance
pip install .

Usage

For more detailed usage examples, please see Examples.

A minimal working example:

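# load dataset & prepare environment
from duplebalance import DupleBalanceClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=4, weights=[0.2, 0.3, 0.5],
                           random_state=0)

# hold out a stratified test set so class proportions are preserved
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# ensemble training
clf = DupleBalanceClassifier(
    n_estimators=10,
    random_state=42,
    ).fit(X_train, y_train)

# predict class probabilities on the test set
y_pred_test = clf.predict_proba(X_test)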
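
Since `predict_proba` returns class probabilities rather than labels, a common next step is to take the arg-max and score the result with a metric that is not dominated by the majority class. The sketch below assumes the scikit-learn convention of an `(n_samples, n_classes)` probability array whose columns follow `clf.classes_`. The helper name `macro_recall` and the toy arrays are illustrative stand-ins and not part of DuBE.

```python
import numpy as np

def macro_recall(y_true, y_pred):
    """Average of per-class recall (also known as balanced accuracy)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy stand-ins for clf.classes_, clf.predict_proba(X_test) and y_test.
classes = np.array([0, 1, 2])
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
])
y_test = np.array([0, 1, 2, 0])

# Map each row's most probable column back to its class label.
y_pred = classes[np.argmax(proba, axis=1)]
print(y_pred)
print(macro_recall(y_test, y_pred))
```

Averaging recall per class weights every class equally, so errors on a small class are not hidden by a large one.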

Documentation

For the full API documentation, please see API reference.

Our DupleBalance implementation can be used much in the same way as the ensemble classifiers in sklearn.ensemble. The DupleBalanceClassifier class inherits from the sklearn.ensemble.BaseEnsemble base class.

Main parameters are listed below:

Parameters	Description
`base_estimator`	object, optional (default=`sklearn.tree.DecisionTreeClassifier()`) The base estimator to fit on self-paced under-sampled subsets of the dataset. NO need to support sample weighting. Built-in `fit()`, `predict()`, `predict_proba()` methods are required.
`n_estimators`	int, optional (default=10) The number of base estimators in the ensemble.
`resampling_target`	{'hybrid', 'under', 'over', 'raw'}, default="hybrid" Determines the number of instances to be sampled from each class (inter-class balancing). - If `under`, perform under-sampling. The class containing the fewest samples is considered the minority class `c_min`. All other classes are then under-sampled until they are of the same size as `c_min`. - If `over`, perform over-sampling. The class containing the largest number of samples is considered the majority class `c_maj`. All other classes are then over-sampled until they are of the same size as `c_maj`. - If `hybrid`, perform hybrid-sampling. All classes are under/over-sampled to the average number of instances from each class. - If `raw`, keep the original size of all classes when resampling.
`resampling_strategy`	{'hem', 'shem', 'uniform'}, default="shem" Decides how to assign resampling probabilities to instances during ensemble training (intra-class balancing). - If `hem`, perform hard-example mining. Assign probability with respect to the instance's latest prediction error. - If `shem`, perform soft hard-example mining. Assign probability by inverting the classification error density. - If `uniform`, assign uniform probability, i.e., random resampling.
`perturb_alpha`	float or str, optional (default='auto') The multiplier of the calibrated Gaussian noise that is added to the sampled data. It determines the intensity of the perturbation-based augmentation. If `'auto'`, perturb_alpha will be automatically tuned using a subset of the given training data.
`k_bins`	int, optional (default=5) The number of error bins used to approximate the error distribution. It is recommended to set it to 5. One can try a larger value when the smallest class in the data set has a sufficient number (say, > 1000) of samples.
`estimator_params`	list of str, optional (default=tuple()) The list of attributes to use as parameters when instantiating a new base estimator. If none are given, default parameters are used.
`n_jobs`	int, optional (default=None) The number of jobs to run in parallel for `predict`. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See the scikit-learn glossary entry for `n_jobs` for more details.
`random_state`	int / RandomState instance / None, optional (default=None) If integer, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by `numpy.random`.
`verbose`	int, optional (default=0) Controls the verbosity when fitting and predicting.
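
The two balancing parameters, `resampling_target` and `resampling_strategy`, can be pictured with a short sketch. It is one reading of the descriptions in the table and not DuBE's actual code. The function names `target_counts` and `sampling_probabilities` are hypothetical, the `hybrid` average is assumed to round down, `hem` is assumed to sample in proportion to the error, `shem` is assumed to weight each instance by the inverse population of its error bin, and errors are assumed to lie in [0, 1].

```python
from collections import Counter

import numpy as np

def target_counts(y, resampling_target="hybrid"):
    """Per-class sample counts after inter-class balancing."""
    counts = Counter(y)
    if resampling_target == "raw":
        return dict(counts)
    if resampling_target == "under":
        n = min(counts.values())
    elif resampling_target == "over":
        n = max(counts.values())
    elif resampling_target == "hybrid":
        # Assumption: the class average is rounded down to an integer.
        n = sum(counts.values()) // len(counts)
    else:
        raise ValueError(f"unknown resampling_target: {resampling_target!r}")
    return {label: n for label in counts}

def sampling_probabilities(errors, resampling_strategy="shem", k_bins=5):
    """Per-instance sampling probabilities within one class (intra-class balancing)."""
    errors = np.asarray(errors, dtype=float)
    if resampling_strategy == "uniform":
        weights = np.ones_like(errors)
    elif resampling_strategy == "hem":
        # Assumption: probability proportional to the prediction error.
        weights = errors.copy()
    elif resampling_strategy == "shem":
        # Assign each error to one of k_bins equal-width bins over [0, 1],
        # then weight every instance by the inverse population of its bin.
        bins = np.minimum((errors * k_bins).astype(int), k_bins - 1)
        density = np.bincount(bins, minlength=k_bins)
        weights = 1.0 / density[bins]
    else:
        raise ValueError(f"unknown resampling_strategy: {resampling_strategy!r}")
    if weights.sum() == 0:
        # All errors are zero: fall back to uniform sampling.
        weights = np.ones_like(errors)
    return weights / weights.sum()

labels = [0] * 200 + [1] * 300 + [2] * 500
for target in ("under", "over", "hybrid", "raw"):
    print(target, target_counts(labels, target))

# Three easy examples and one hard example from the same class.
print(sampling_probabilities([0.05, 0.1, 0.15, 0.9], "shem"))
```

Under this reading, `shem` gives the single hard example the same total probability as the three easy ones combined, while `hem` would concentrate most of the probability on the hard example.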
