Python-based implementations of algorithms for learning on imbalanced data.

DIAL | Notre Dame

Last update: Dec 13, 2022

Overview

ND DIAL: Imbalanced Algorithms

Minimalist Python-based implementations of algorithms for imbalanced learning. The package includes deep and representation learning algorithms, which are implemented in TensorFlow. Below is a list of the methods currently implemented.

Undersampling
1. Random Majority Undersampling with/without Replacement
Oversampling
1. SMOTE - Synthetic Minority Over-sampling Technique [1]
2. DAE - Denoising Autoencoder [2] (TensorFlow)
3. GAN - Generative Adversarial Network [3] (TensorFlow)
4. VAE - Variational Autoencoder [4] (TensorFlow)
Ensemble Sampling
1. RAMOBoost [5]
2. RUSBoost [6]
3. SMOTEBoost [7]
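
As an illustration of the oversampling idea behind SMOTE [1], the following is a minimal standard-library sketch, not this package's API. The function name `smote_sketch`, its parameters, and the toy data are hypothetical. It assumes Euclidean distance and draws each synthetic point uniformly along the segment between a minority sample and one of its `k` nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours.

    Hypothetical helper for illustration only; not part of the package.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Find its k nearest minority neighbours (excluding the base itself).
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        # Interpolate at a random position on the segment base -> target.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the bounding box of the minority class.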

References:

[1] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. "SMOTE: Synthetic Minority Over-sampling Technique". Journal of Artificial Intelligence Research (JAIR), 2002.

[2] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion". Journal of Machine Learning Research (JMLR), 2010.

[3] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. "Generative Adversarial Nets". Advances in Neural Information Processing Systems 27 (NIPS), 2014.

[4] D. P. Kingma and M. Welling. "Auto-Encoding Variational Bayes". arXiv preprint arXiv:1312.6114, 2013.

[5] S. Chen, H. He, and E. A. Garcia. "RAMOBoost: Ranked Minority Oversampling in Boosting". IEEE Transactions on Neural Networks, 2010.

[6] C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano. "RUSBoost: Improving Classification Performance when Training Data is Skewed". International Conference on Pattern Recognition (ICPR), 2008.

[7] N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer. "SMOTEBoost: Improving Prediction of the Minority Class in Boosting". European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), 2003.

Comments

Open issue: Multi-class SMOTEBoost

Just curious: does this package support multi-class problems with more than one minority class? I may be misreading the code, but it seems to support only the binary case:

if minority_target is None:
    # Determine the minority class label.
    stats_c_ = Counter(y)
    maj_c_ = max(stats_c_, key=stats_c_.get)
    min_c_ = min(stats_c_, key=stats_c_.get)
    self.minority_target = min_c_
else:
    self.minority_target = minority_target

In my current task, I have a multi-majority multi-minority scenario:

Class = 0,	Count = 18749,	Percentage = 22.01
Class = 1,	Count = 3482,	Percentage = 4.09
Class = 2,	Count = 9566,	Percentage = 11.23
Class = 3,	Count = 49741,	Percentage = 58.4
Class = 4,	Count = 3634,	Percentage = 4.27

opened by arilwan
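
To make the concern in the issue concrete, the class counts reported there can be run through the same `Counter` logic as the quoted snippet. This is a minimal sketch that reuses only the names from that snippet (`stats_c_`, `maj_c_`, `min_c_`). The `class_counts` dictionary and the `untouched` list are illustrative additions, and nothing here calls into the package itself.

```python
from collections import Counter

# Class counts as reported in the issue (label -> number of samples).
class_counts = {0: 18749, 1: 3482, 2: 9566, 3: 49741, 4: 3634}

# Same selection logic as the quoted snippet: a single majority label
# and a single minority label are chosen by count.
stats_c_ = Counter(class_counts)
maj_c_ = max(stats_c_, key=stats_c_.get)
min_c_ = min(stats_c_, key=stats_c_.get)

# Labels that are neither the selected majority nor the selected minority.
untouched = sorted(c for c in stats_c_ if c not in (maj_c_, min_c_))

print(maj_c_, min_c_, untouched)  # prints: 3 1 [0, 2, 4]
```

With these counts, only class 1 is selected as the minority target. Class 4 is nearly as rare (3634 versus 3482 samples) but this logic does not pick it, which is consistent with the issue's reading that the default handles a single minority class.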