Python Auto-ML Package for Tabular Datasets

Sagnik Roy

Last update: Nov 20, 2022

Related tags

Deep Learning numpy scikit-learn pandas python3 python-package automl modular-architecture

Overview

Tabular-AutoML

AutoML Package for tabular datasets

Tabular dataset tuning is now hassle free!

Run a one-line command and get the best-tuned model and the processed dataset in one go.

Installation & Usage

Create a Virtual Environment : Tutorial
Clone the repository.
Open the directory with cmd.
Copy this command in terminal to install dependencies.

pip install -r requirements.txt

Installing requirements.txt may fail with an error caused by an outdated MS Visual C++ Build Tools installation. You can fix this problem using this.
First, check the parser arguments that must be passed, along with the available customizations.

>>> python -m tab_automl.main --help
usage: main.py [-h] -d  -t  -tf  [-p] [-f] [-spd] [-sfd] [-sm]

automl hyper parameters

optional arguments:
  -h, --help            show this help message and exit
  -d , --data-source    File path
  -t , --problem-type   Problem Type , currently supporting *regression* or *classification*
  -tf , --target-feature
                        Target feature inside the data
  -p , --pre-proc       If data processing is required
  -f , --fet-eng        If feature engineering is required
  -spd , --save-proc-data
                        Save the processed data
  -sfd , --save-fet-data
                        Save the feature engineered data
  -sm , --save-model    Save the best trained model
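The help text above could come from an `argparse` parser along these lines. This is a minimal sketch, not the package's actual code: the function name `build_parser`, the `"false"` defaults and the empty `metavar` are assumptions, and only the flag names and help strings are taken from the output shown.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed in the help output."""
    parser = argparse.ArgumentParser(prog="main.py", description="automl hyper parameters")
    # Arguments shown without brackets in the usage line are required.
    parser.add_argument("-d", "--data-source", required=True, metavar="", help="File path")
    parser.add_argument("-t", "--problem-type", required=True, metavar="",
                        help="Problem Type , currently supporting *regression* or *classification*")
    parser.add_argument("-tf", "--target-feature", required=True, metavar="",
                        help="Target feature inside the data")
    # Optional switches; passing them as "true"/"false" strings is an assumption
    # based on the example commands in this README.
    optional_flags = [
        ("-p", "--pre-proc", "If data processing is required"),
        ("-f", "--fet-eng", "If feature engineering is required"),
        ("-spd", "--save-proc-data", "Save the processed data"),
        ("-sfd", "--save-fet-data", "Save the feature engineered data"),
        ("-sm", "--save-model", "Save the best trained model"),
    ]
    for short_flag, long_flag, help_text in optional_flags:
        parser.add_argument(short_flag, long_flag, default="false", metavar="", help=help_text)
    return parser


args = build_parser().parse_args(
    ["-d", "data.csv", "-t", "classification", "-tf", "label", "-sm", "true"]
)
print(args.problem_type, args.save_model)
```

`argparse` derives attribute names from the long option, so `--save-model` becomes `args.save_model`.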

Now run the command with your custom data, problem type and target feature

>>> # For Regression Problem
>>> python -m tab_automl.main -d "your custom data source\custom_data.csv" -t "regression" -tf "your_custom_target_feature" -spd "true" -sfd "true" -sm "true"

>>> # For Classification Problem
>>> python -m tab_automl.main -d "your custom data source\custom_data.csv" -t "classification" -tf "your_custom_target_feature" -spd "true" -sfd "true" -sm "true"

Contributing Guidelines

Comment on the issue you want to work on.
If you get assigned, fork the repository.
Create a new branch named after your GitHub user ID, e.g. sagnik1511.
Commit your changes on that branch.
Create a PR (pull request) to the main branch of the parent repository.
The PR title should be named like this: [Issue Number] Heading of the issue.
Describe the changes you have made, with proper reasons.

Contributors

Sagnik Roy : sagnik1511

If you like the project, do ⭐

Also follow me on GitHub, Kaggle, LinkedIn

Thank You for Visiting :)

Comments

Add new dataset for clustering problems inside datasets folder

Add a new tabular dataset for clustering problems inside tab_automl/datasets and also add the dataset class inside tab_automl/automl/datasets.py.

Add proper comments and keep the code quality high.

Follow contributing guidelines on README.md
enhancement JWOC medium

opened by sagnik1511 7
Adding KNN classifier

This PR fixes issue #7.

Added a KNN classifier to models.py. Its advantage is that it takes almost no time to train, because fitting only stores the training data, so it trains faster than all the other models in models.py. It is also a non-parametric model whose only required setting is the number of neighbours. In addition, because KNN does not undergo training, new data can be added to it without a retraining step. It is also very easy to implement and interpret, as there is only one hyperparameter: the number of neighbours.
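The point that KNN "only stores the data" at training time can be illustrated with a small pure-Python sketch. The class `TinyKNNClassifier` is hypothetical and is not the code added in the PR; in a scikit-learn based project the equivalent estimator would most likely be `KNeighborsClassifier` with its `n_neighbors` parameter.

```python
import math
from collections import Counter


class TinyKNNClassifier:
    """Illustrative k-NN classifier (hypothetical, not the PR's implementation)."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only stores the data, which is why it is nearly free.
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict_one(self, point):
        # All the work happens at prediction time: rank stored rows by distance.
        ranked = sorted(zip(self.X_, self.y_), key=lambda row: math.dist(row[0], point))
        votes = Counter(label for _, label in ranked[: self.n_neighbors])
        return votes.most_common(1)[0][0]


model = TinyKNNClassifier(n_neighbors=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["a", "a", "a", "b", "b", "b"],
)
prediction = model.predict_one((0.5, 0.5))
print(prediction)
```

The trade-off is that prediction scans every stored row, so cheap training comes with slower inference on large datasets.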

opened by VishnuBhaarath 5
Adding clustering 5 models inside single_model_dict #32

PR tagged with #32

I have added 5 clustering models in automl -> models.py. The clustering models are: AffinityPropagation, AgglomerativeClustering, Birch, DBSCAN and KMeans.

Description

AffinityPropagation: It involves finding a set of exemplars that best summarize the data. It takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges.
AgglomerativeClustering: It involves merging examples until the desired number of clusters is achieved. It is implemented via the class AgglomerativeClustering, and the main configuration to tune is "n_clusters", an estimate of the number of clusters in the data, e.g. 2.
BIRCH: BIRCH clustering involves constructing a tree structure from which cluster centroids are extracted. The main configuration to tune is the "threshold" and "n_clusters" hyperparameters, the latter of which provides an estimate of the number of clusters.
DBSCAN: DBSCAN clustering involves finding high-density areas in the domain and expanding those areas of the feature space around them as clusters. The main configuration to tune is the "eps" and "min_samples" hyperparameters.
KMeans: The main configuration to tune is the "n_clusters" hyperparameter, set to the estimated number of clusters in the data.
JWOC easy

opened by Tihsrah 3
KNN regressor

PR fix for issue #6

Added a KNN regressor to models.py. Its advantage is that it takes almost no time to train, because fitting only stores the training data, so its training time is lower than that of all the other models in models.py. It is also a non-parametric model whose only required setting is the number of neighbours. In addition, because KNN does not undergo training, new data can be added to it without a retraining step. It is also very easy to implement and interpret, as there is only one hyperparameter, the number of neighbours. Apart from this, KNN is versatile and can be used as a regressor as well as a classifier.
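As a rough illustration of the regression variant, the sketch below predicts by averaging the targets of the nearest stored rows. The function `knn_regress` and the sample data are hypothetical; the PR itself presumably wraps a library estimator instead.

```python
import math


def knn_regress(train_X, train_y, point, n_neighbors=2):
    """Predict by averaging the targets of the n_neighbors closest stored rows."""
    ranked = sorted(zip(train_X, train_y), key=lambda row: math.dist(row[0], point))
    nearest = [target for _, target in ranked[:n_neighbors]]
    return sum(nearest) / len(nearest)


# Made-up one-feature training data.
train_X = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]

# The two rows closest to 2.4 are 2.0 and 3.0, so their targets are averaged.
estimate = knn_regress(train_X, train_y, (2.4,), n_neighbors=2)
print(estimate)
```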
JWOC easy

opened by VishnuBhaarath 2
Update all Print statements to f-string

PR Fix for issue #3

I have updated all the strings in print statements to f-strings. I changed tab_automl/automl/datasets.py, tab_automl/automl/fet_engineering.py, tab_automl/automl/processing.py, tab_automl/automl/training.py, tab_automl/main.py and test.py, and updated all the print statements in these files.
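For readers unfamiliar with the change, this is the kind of rewrite involved. The variable names and message are made up for illustration and are not taken from the repository.

```python
score = 0.91234
model_name = "RandomForest"

# Older style: positional placeholders filled in by str.format.
old_message = "Model {} scored {:.3f}".format(model_name, score)

# f-string: the expressions sit inline, next to the text they fill.
new_message = f"Model {model_name} scored {score:.3f}"

print(new_message)
```

Both forms produce the same text; the f-string simply keeps each value next to its placeholder.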
JWOC easy

opened by kunalchhabra37 2
[ Taking different formats as input data {#4}] Load data from different file formats

The newly supported file formats are: .txt, .json, .xlsx, .sqlite

The changes are made to ClassificationDataset in datasets.py.

An extra input is added to get the table name when loading a .sqlite file.
JWOC medium

opened by Tihsrah 2
Load data from different file formats.

The codebase currently loads datasets from .csv files only (example: see here).

Add different data loading techniques for other formats like .txt , .sqlite, etc. Add required comments in the code.
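One way to approach this is to dispatch on the file extension. The sketch below uses only the standard library and returns plain dictionaries; the function name `load_rows`, the `table` argument and the assumed file layouts are all hypothetical, and a real implementation in this package would more likely build pandas DataFrames. The .xlsx format is left out because it needs a third-party reader.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path


def load_rows(path, table=None):
    """Return a list of dict rows from a .csv, .txt, .json or .sqlite file."""
    suffix = Path(path).suffix.lower()
    if suffix in {".csv", ".txt"}:
        # Assumption: .txt files are comma-delimited text with a header row.
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        # Assumption: the file holds a JSON array of objects.
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    if suffix == ".sqlite":
        if table is None:
            raise ValueError("a table name is required for .sqlite files")
        connection = sqlite3.connect(path)
        try:
            connection.row_factory = sqlite3.Row
            # Table names cannot be bound as parameters, so quote the identifier.
            quoted = '"' + table.replace('"', '""') + '"'
            return [dict(row) for row in connection.execute(f"SELECT * FROM {quoted}")]
        finally:
            connection.close()
    raise ValueError(f"unsupported file format: {suffix}")


# Small demonstration with a throwaway CSV file.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sample.csv"
    sample.write_text("age,label\n31,yes\n", encoding="utf-8")
    rows = load_rows(sample)
print(rows)
```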

Follow contributing guidelines on README.md
enhancement JWOC medium

opened by sagnik1511 2
[Issue {#32}] Add clustering models inside single_model_dict


This PR fixes issue #32.

Changes made

Added 5 new clustering models in models.py

Reason

Mini-Batch K-Means: a modified version of k-means that updates the cluster centroids using mini-batches of samples rather than the entire dataset, which can make it faster for large datasets and perhaps more robust to statistical noise.
Mean Shift: involves finding and adapting centroids based on the density of examples in the feature space.
OPTICS: short for Ordering Points To Identify the Clustering Structure, a modified version of DBSCAN.
Spectral Clustering: a general class of clustering methods drawn from linear algebra. The main configuration to tune is the n_clusters hyperparameter, which specifies the estimated number of clusters in the data.
Gaussian Mixture Model: summarizes a multivariate probability density function with a mixture of Gaussian probability distributions.
JWOC easy

opened by snega16 1
[Issue {#25}] Add new dataset for clustering problems inside datasets folder
This PR fixes issue #25.

Changes made:

Added new clustering dataset Credit Card Customer data to datasets

Updated datasets.py with a clustering class, clustering(), and a clustering dataset class called Credit_Card_Customer_Data().

JWOC medium
opened by snega16 1
[Issue {#7}] add new models for classification training

This PR fixes issue #7.

Added a KNN classifier to models.py. Its advantage is that it takes almost no time to train, because fitting only stores the training data, so it trains faster than all the other models in models.py. It is also a non-parametric model whose only required setting is the number of neighbours. In addition, because KNN does not undergo training, new data can be added to it without a retraining step. It is also very easy to implement and interpret, as there is only one hyperparameter: the number of neighbours.
JWOC easy

opened by VishnuBhaarath 1
[Issue {#7}] Add new models for classification training
This PR fixes issue #7.

Changes made

Added new classification model XGBoost Classifier in models.py

Added new requirements for the model in requirements.txt

Reason

The XGBoost classifier handles missing data efficiently and has built-in cross-validation support. It is also regularized, which reduces overfitting. In addition, it uses gradient boosting to minimize the loss, so this model can give good accuracy.
JWOC easy
opened by snega16 1
Add clustering models inside single_model_dict

What you have to do:
1. Add 5 new clustering models inside the single_model_dict object in the tab_automl.automl.models file.
2. Follow the same code conventions.

Follow contributing guidelines on README.md
JWOC easy

opened by sagnik1511 10
Add a new single_model_trainer function for training clustering problems.

What you have to do:
1. Add a new function inside the tab_automl.automl.training.Trainer class for training clustering problems.
2. Inside the main function of tab_automl.main, some implementation will be needed so that the clustering problem type fits.

Follow contributing guidelines on README.md
JWOC medium

opened by sagnik1511 0
Update the parser with the new problem type "Clustering"

What you have to do:
1. Update the parser's problem type definitions.
2. Update tab_automl.utils.misc.validate_parse_variable, as it was written to check only the classification and regression problem types.
3. The target variable parser should have a default value of None, as the clustering problem won't allow any target variable. Keep in mind that if the problem type is a supervised technique, the target_feature should still be checked inside the tab_automl.utils.misc.validate_parse_variable function.
4. Also update the README.md where it specifies the problem types.
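The validation rule described in this issue could look roughly like the following. It is a sketch under assumptions: the body of `validate_parse_variable` here is a hypothetical stand-in for the real function in tab_automl.utils.misc, and rejecting a target for clustering is one reading of "won't allow any target variable".

```python
import argparse

SUPERVISED = {"classification", "regression"}
PROBLEM_TYPES = SUPERVISED | {"clustering"}


def validate_parse_variable(args):
    """Hypothetical stand-in for tab_automl.utils.misc.validate_parse_variable."""
    if args.problem_type not in PROBLEM_TYPES:
        raise ValueError(f"unknown problem type: {args.problem_type}")
    # Supervised problems still need a target feature.
    if args.problem_type in SUPERVISED and args.target_feature is None:
        raise ValueError("target feature is required for supervised problems")
    # Assumption: clustering rejects a target rather than silently ignoring it.
    if args.problem_type == "clustering" and args.target_feature is not None:
        raise ValueError("clustering does not take a target feature")
    return args


parser = argparse.ArgumentParser(description="automl hyper parameters")
parser.add_argument("-d", "--data-source", required=True)
parser.add_argument("-t", "--problem-type", required=True)
parser.add_argument("-tf", "--target-feature", default=None)  # no longer required

clustering_args = validate_parse_variable(
    parser.parse_args(["-d", "data.csv", "-t", "clustering"])
)
print(clustering_args.problem_type, clustering_args.target_feature)
```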

Follow contributing guidelines on README.md
help wanted JWOC medium

opened by sagnik1511 0
Add a parameter of k-fold validation inside training
Add k-fold validation for chosen datasets.

Add appropriate print statements and comments inside the code.

Add all utilities on tab_automl.utils.training

If possible update the parser too with a variable named -kf --k-fold which takes the number of folds. (Optional)

Follow contributing guidelines on README.md
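To make the k-fold request concrete, here is a minimal index splitter written with the standard library only. The name `k_fold_indices` is hypothetical, the split is unshuffled for simplicity, and a real implementation might instead rely on an existing cross-validation helper.

```python
def k_fold_indices(n_samples, k_fold):
    """Yield (train_indices, validation_indices) for each of k_fold folds."""
    if not 2 <= k_fold <= n_samples:
        raise ValueError("k_fold must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k_fold)
    start = 0
    for fold in range(k_fold):
        stop = start + base + (1 if fold < extra else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=7, k_fold=3))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={train} validation={validation}")
```

Each sample lands in exactly one validation fold, so averaging the per-fold scores uses every row for validation once.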
help wanted hard JWOC
opened by sagnik1511 0
Add a new class "OutlierProcessing" under processing
Prepare a new class under the processing module.

Prepare the functions with a proper idea and also add appropriate comments.

Add a function "run" inside the "OutlierProcessing" which will go through every feature, e.g. link.

Add the function under the class Preprocessing.

Follow contributing guidelines on README.md
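A possible shape for such a class is sketched below. Everything in it is an assumption: the issue does not say which outlier rule to use, so this example clips values to the common 1.5 × IQR fences, and the dict-of-lists table stands in for whatever data structure the processing module actually uses.

```python
import statistics


class OutlierProcessing:
    """Hypothetical sketch: clip numeric features to the IQR fences."""

    def __init__(self, factor=1.5):
        self.factor = factor

    def clip_feature(self, values):
        # Quartiles from the standard library; needs at least two values.
        q1, _, q3 = statistics.quantiles(values, n=4)
        spread = q3 - q1
        low, high = q1 - self.factor * spread, q3 + self.factor * spread
        return [min(max(value, low), high) for value in values]

    def run(self, table):
        # `table` maps feature name -> list of numeric values;
        # every feature is processed in turn.
        return {name: self.clip_feature(values) for name, values in table.items()}


cleaned = OutlierProcessing().run({"age": [21, 22, 23, 24, 25, 26, 27, 200]})
print(cleaned["age"])
```

Clipping keeps the row count unchanged; dropping outlier rows instead would be an equally valid design choice.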
enhancement JWOC medium
opened by sagnik1511 4
Add new loss functions on training
Add 3 loss functions for both regression and classification problem types.

Add them similarly to how the model scores are stored. See here

Add proper comments.

If new functions are needed for the loss functions, store them on tab_automl.utils.training .

Update the requirements if new libraries are being used.

Follow contributing guidelines on README.md
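As a starting point, three widely used losses can be written in a few lines. The function names and the `losses` dictionary layout are assumptions for illustration and do not reflect how the package stores model scores.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression loss: average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Regression loss: average absolute difference."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Classification loss: negative mean log-likelihood of 0/1 labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        # Clip probabilities so log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Collected per model, similar in spirit to a score table (layout is an assumption).
losses = {
    "mse": mean_squared_error([3.0, 5.0], [2.0, 7.0]),
    "mae": mean_absolute_error([3.0, 5.0], [2.0, 7.0]),
    "log_loss": binary_log_loss([1, 0], [0.5, 0.5]),
}
print(losses)
```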
enhancement hard JWOC
opened by sagnik1511 8

Owner

Sagnik Roy

Data Science Intern @ Argoid • Video Games & Machine Vision attracts me!

GitHub

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.


45 Dec 8, 2022

The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabel datasets for instance segmentation. You can generate datasets labeled with segmentation mask and bounding box fro

28 Mar 28, 2022

An easy way to build PyTorch datasets. Modularly build datasets and automatically cache processed results

EasyDatas An easy way to build PyTorch datasets. Modularly build datasets and automatically cache processed results Installation pip install git+https

4 Dec 14, 2021

Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data.

Deep Learning Dataset Maker Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data. How to use Down

25 Dec 15, 2022

Cl datasets - PyTorch image dataloaders and utility functions to load datasets for supervised continual learning

Continual learning datasets Introduction This repository contains PyTorch image

5 Aug 28, 2022

A standard framework for modelling Deep Learning Models for tabular data

PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike.

801 Jan 8, 2023

Implementation of TabTransformer, attention network for tabular data, in Pytorch

Tab Transformer Implementation of Tab Transformer, attention network for tabular data, in Pytorch. This simple architecture came within a hair's bread

420 Jan 5, 2023

Boosted neural network for tabular data

XBNet - Xtremely Boosted Network Boosted neural network for tabular data XBNet is an open source project which is built with PyTorch which tries to co

175 Jan 4, 2023

The official PyTorch implementation of recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training

This repository is the official PyTorch implementation of SAINT. Find the paper on arxiv SAINT: Improved Neural Networks for Tabular Data via Row Atte

284 Dec 21, 2022

Calculates carbon footprint based on fuel mix and discharge profile at the utility selected. Can create graphs and tabular output for fuel mix based on input file of series of power drawn over a period of time.

carbon-footprint-calculator Conda distribution ~/anaconda3/bin/conda install anaconda-client conda-build ~/anaconda3/bin/conda config --set anaconda_u

Seattle university Renewable energy research

7 Sep 26, 2022

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

63 Oct 17, 2022

The official implementation of the paper, "SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning"

SubTab: Author: Talip Ucar ([email protected]) The official implementation of the paper, SubTab: Subsetting Features of Tabular Data for Self-Supervis

98 Dec 29, 2022

A framework for attentive explainable deep learning on tabular data

kendrite A framework for attentive explainable deep learning on tabular data Quick start kedro run Built upon Technology Description Links ke

3 Nov 6, 2021

PyTorch implementation for OCT-GAN Neural ODE-based Conditional Tabular GANs (WWW 2021)

OCT-GAN: Neural ODE-based Conditional Tabular GANs (OCT-GAN) Code for reproducing the experiments in the paper: Jayoung Kim*, Jinsung Jeon*, Jaehoon L

7 Dec 27, 2022

Job-Recommend-Competition - Vectorwise Interpretable Attentions for Multimodal Tabular Data

SiD - Simple Deep Model Vectorwise Interpretable Attentions for Multimodal Tabul

40 Dec 22, 2022

The pyrelational package offers a flexible workflow to enable active learning with as little change to the models and datasets as possible

pyrelational is a python active learning library developed by Relation Therapeutics for rapidly implementing active learning pipelines from data management, model development (and Bayesian approximation), to creating novel active learning strategies.

95 Dec 27, 2022

Code for a self-service panel written in Python.

Painel de Auto-Atendimento The goal of this project was to write a Python program that simulates a self-service panel, very much in the style of Mac Do

2 Nov 9, 2022

A little Python application to auto tag your photos with the power of machine learning.

Tag Machine A little Python application to auto tag your photos with the power of machine learning. Report a bug or request a feature Table of Content

14 Dec 21, 2022

Python script that analyses the given datasets and comes up with the best polynomial regression representation with the smallest polynomial degree possible

Python script that analyses the given datasets and comes up with the best polynomial regression representation with the smallest polynomial degree possible, to be the most reliable with the least complexity possible

2 Aug 1, 2022
