Boosted neural network for tabular data

Related tags

Deep Learning XBNet
Overview

XBNet - Xtremely Boosted Network

Boosted neural network for tabular data

XBNet is an open-source project built with PyTorch that combines the tree-based modelling paradigm with neural networks to create a robust architecture.
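
A minimal end-to-end sketch, condensed from the README script reproduced in the issues further down this page; per the questions below, XBNETClassifier appears to request the per-layer dimensions interactively at construction time, so treat the exact prompts as an assumption:

    import torch
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from XBNet.training_utils import predict
    from XBNet.models import XBNETClassifier
    from XBNet.run import run_XBNET

    # Iris as a stand-in tabular dataset: 4 features, 3 classes
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # 2 is the number of layers, as in the README example below;
    # layer dimensions appear to be requested interactively at construction
    model = XBNETClassifier(X_train, y_train, 2)

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    # Argument order follows the README example: batch size 32, 300 epochs
    m, acc, lo, val_ac, val_lo = run_XBNET(X_train, X_test, y_train, y_test,
                                           model, criterion, optimizer, 32, 300)
    print(predict(m, X_test[0, :]))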

Features

  • Better performance
  • Faster training and inference speed
  • Easy to implement with rapid prototyping capabilities

Features to be added:

  • Metrics for different requirements
  • Addition of some other types of layers

  • If you have any improvements, create an issue; pull requests are also welcome.


Developed with ❤️ by Tushar Sarkar

Comments
  • XBNET for Regression Analysis

    Hello Tushar,

    I am trying to use XBNet for regression analysis in my research work. Is there a prototype for applying XBNet to regression? I see there is an XBNETRegressor class, but an example like the one for the classifier would be helpful. Any pointers would be appreciated.

    Thanks & Regards, Rajat
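
    A minimal sketch of what a regression setup might look like, assuming XBNETRegressor mirrors the XBNETClassifier workflow from the README; the class name comes from the question above, but the constructor arguments, the run_XBNET signature, and the loss choice here are assumptions, not confirmed API:

        import torch
        from sklearn.datasets import load_diabetes
        from sklearn.model_selection import train_test_split
        from XBNet.models import XBNETRegressor  # class name per this issue
        from XBNet.run import run_XBNET

        X, y = load_diabetes(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

        # Assumed to take (X, y, number_of_layers) like XBNETClassifier
        model = XBNETRegressor(X_train, y_train, 2)

        # Swap the classification loss for a regression loss
        criterion = torch.nn.MSELoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

        m, acc, lo, val_ac, val_lo = run_XBNET(X_train, X_test, y_train, y_test,
                                               model, criterion, optimizer, 32, 300)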

    opened by Rajat18 4
  • Benchmark

    Is there any comparison against other algorithms like XGBoost?

    In this paper, XGBoost is compared to other networks:

    https://arxiv.org/pdf/2106.03253.pdf

    Thank you.

    opened by deadsoul44 2
  • The function of epsilon in this algorithm

    I rebuilt the algorithm, but at the line `self.boosted_layers[i] = torch.from_numpy(np.array(self.xg.fit(x0.detach().numpy(), (self.l).detach().numpy()).feature_importances_) + self.epsilon)` I cannot understand one thing: feature_importances_ will not change, so I think epsilon has no effect here.
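
    For context, the quoted line fits an XGBoost model on the layer's inputs and gradients and stores the resulting feature importances. A plausible reading, sketched below rather than confirmed from the paper, is that epsilon is a small constant that keeps zero-importance features from being zeroed out entirely when the importances are used as multiplicative weights:

        import numpy as np
        import torch

        epsilon = 1e-3  # illustrative value; the real default lives in models.py
        # Toy stand-in for xg.fit(...).feature_importances_
        feature_importances = np.array([0.7, 0.3, 0.0, 0.0])

        # Without epsilon, the last two features would contribute exactly
        # nothing downstream; epsilon keeps a small floor under every feature.
        boosted_layer = torch.from_numpy(feature_importances + epsilon)
        print(boosted_layer)  # tensor([0.7010, 0.3010, 0.0010, 0.0010], dtype=torch.float64)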

    opened by su123123123 2
  • XGBoost segmentation fault

    Hi Tushar. Thanks for sharing the package. I am facing an issue with the line self.temp1 = XGBClassifier().fit(self.X, self.y).feature_importances_ in the method base_tree() of models.py. I can't get much out of the error, which says:

    UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
      warnings.warn(label_encoder_deprecation_msg, UserWarning)
    
    Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
    

    The surprising thing is that when I run XGBClassifier().fit(X, y) on the same data in an IPython console, it runs fine.

    I am using the same script as provided in the README:

    import torch
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from XBNet.training_utils import training,predict
    from XBNet.models import XBNETClassifier
    from XBNet.run import run_XBNET
    
    # Experiment with Iris data directly from sklearn.
    # iris = load_iris()
    # data = pd.DataFrame(iris.data)
    # data.columns = iris.feature_names
    # data.loc[:, 'type'] = iris.target
    # X, y = data.iloc[:, :-1], data.iloc[:, -1]
    # X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
    
    data = pd.read_csv('test/Iris.csv')
    print(data.shape)
    x_data = data[data.columns[:-1]]
    print(x_data.shape)
    y_data = data[data.columns[-1]]
    le = LabelEncoder()
    y_data = np.array(le.fit_transform(y_data))
    print(le.classes_)
    
    X_train,X_test,y_train,y_test = train_test_split(x_data.to_numpy(),y_data,test_size = 0.3,random_state = 0)
    model = XBNETClassifier(X_train,y_train,2)
    
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    
    m,acc, lo, val_ac, val_lo = run_XBNET(X_train,X_test,y_train,y_test,model,criterion,optimizer,32,300)
    print(predict(m,x_data.to_numpy()[0,:]))
    

    Any idea what could be going wrong here? Following are some details of my system that might be useful:

    • OS: macOS 11.5.2
    • Python 3.8.11
    • XGBoost version: 1.4.2
    • XBNet: 1.3.1
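
    One thing worth ruling out on macOS is a duplicate OpenMP runtime loaded by both PyTorch and XGBoost, which is known to crash with SIGSEGV; a hedged workaround sketch, not a confirmed diagnosis for this report:

        import os
        # Must run before torch/xgboost are imported. This masks a duplicate
        # libomp load rather than fixing it, so treat it as a diagnostic step.
        os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

        import torch
        from xgboost import XGBClassifier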
    opened by tejas-kale 1
  • Please update eval_metric from 'merror' to 'mlogloss'

    Please update eval_metric from 'merror' to 'mlogloss'

    WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softprob' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
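
    For what it's worth, both this warning and the label-encoder warning from the segmentation-fault issue above can be silenced with standard XGBoost keyword arguments, if XBNet's internal XGBClassifier calls were updated accordingly:

        from xgboost import XGBClassifier

        # Setting eval_metric explicitly pins the metric across XGBoost versions;
        # use_label_encoder=False silences the label-encoder deprecation warning.
        clf = XGBClassifier(eval_metric="mlogloss", use_label_encoder=False)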

    opened by GDGauravDutta 1
  • run_XBNE import error

    I tried to run this package on Colab and am facing the error below when I try to import run_XBNE:

    from XBNet.run import run_XBNE

    ImportError: cannot import name 'run_XBNE' from 'XBNet.run' (/usr/local/lib/python3.7/dist-packages/XBNet/run.py)
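
    This looks like a typo rather than a packaging problem: in the README script reproduced in the segmentation-fault issue above, the function is named run_XBNET, so the import would be:

        from XBNet.run import run_XBNET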

    opened by sarathsurpur 1
  • Bump numpy from 1.21.2 to 1.22.0

    Bumps numpy from 1.21.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements; highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10; Python 3.7 has been dropped. Note that 32-bit wheels are only provided for Python 3.8 and 3.9 on Windows; all other wheels are 64-bit, on account of Ubuntu, Fedora, and other Linux distributions dropping 32-bit support. All 64-bit wheels are also linked with 64-bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)
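
    A short migration sketch for the removed helpers, using the replacements named above (pickle.loads, and numpy.genfromtxt with the usemask parameter):

        import pickle
        from io import StringIO
        import numpy as np

        # numpy.loads -> pickle.loads
        restored = pickle.loads(pickle.dumps({"a": 1}))

        # ndfromtxt / mafromtxt -> genfromtxt with usemask
        masked = np.genfromtxt(StringIO("1,2\n3,"), delimiter=",", usemask=True)
        print(restored, masked)  # the missing trailing field comes back masked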

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0
  • Confusion between implemented code and the algorithm in the paper

    In the implemented code, the minimum value of the gradient is taken into account, and the gradient of the layer is subsequently updated with the feature importances. [screenshot of the code]

    However, in the paper, the algorithm states that the minimum value of the layer's weights should be taken, and the weight matrix of the layer should be updated accordingly. [screenshot of the algorithm]

    @tusharsarkar3 @dishaShah01 Please clarify this confusion. Also, if possible, please walk through the update equation that was chosen.

    Thank you so much.

    opened by kountaydwivedi 0
  • This error is shown on both colab and kaggle.

        Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
        Collecting git+https://github.com/tusharsarkar3/XBNet.git
          Cloning https://github.com/tusharsarkar3/XBNet.git to /tmp/pip-req-build-pmimf9mo
          Running command git clone -q https://github.com/tusharsarkar3/XBNet.git /tmp/pip-req-build-pmimf9mo
        Requirement already satisfied: sklearn==0.0 in /usr/local/lib/python3.7/dist-packages (from XBNet==1.4.6) (0.0)
        ERROR: Could not find a version that satisfies the requirement numpy==1.22.0 (from xbnet) (from versions: 1.3.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.8.1, 1.8.2, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.10.0.post2, 1.10.1, 1.10.2, 1.10.4, 1.11.0, 1.11.1, 1.11.2, 1.11.3, 1.12.0, 1.12.1, 1.13.0rc1, 1.13.0rc2, 1.13.0, 1.13.1, 1.13.3, 1.14.0rc1, 1.14.0, 1.14.1, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6, 1.15.0rc1, 1.15.0rc2, 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.15.4, 1.16.0rc1, 1.16.0rc2, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.17.0rc1, 1.17.0rc2, 1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.18.0rc1, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4, 1.18.5, 1.19.0rc1, 1.19.0rc2, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.20.0rc1, 1.20.0rc2, 1.20.0, 1.20.1, 1.20.2, 1.20.3, 1.21.0rc1, 1.21.0rc2, 1.21.0, 1.21.1, 1.21.2, 1.21.3, 1.21.4, 1.21.5, 1.21.6)
        ERROR: No matching distribution found for numpy==1.22.0
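
    This matches the numpy 1.22.0 release notes quoted in the dependency-bump entry above: 1.22.0 dropped Python 3.7, and the Colab/Kaggle images here run Python 3.7, so pip cannot satisfy the numpy==1.22.0 pin. A possible workaround, assuming XBNet actually runs against an older numpy (not verified):

        # Pre-install the newest numpy available for Python 3.7, then install
        # XBNet without letting pip enforce its numpy==1.22.0 pin.
        pip install numpy==1.21.6
        pip install --no-deps git+https://github.com/tusharsarkar3/XBNet.git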

    opened by 12-crypto 0
  • The number of non-linear layers in the network?

    Could you clarify: is there only one non-linear layer in your network architecture, at the end?

    As far as I understand, we sequentially add nn.Linear layers to the Sequential list, and then there is a choice of whether to add an activation function (in your Iris example from the opened issue, you did not add any activation function, so the whole network is in fact linear). Is that the only non-linearity in the network?
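
    The underlying point is standard: a stack of nn.Linear layers with no activations in between collapses to a single affine map. A quick illustration in plain PyTorch, independent of XBNet's layer-building prompts:

        import torch
        import torch.nn as nn

        # Purely linear: composing affine maps yields another affine map
        linear_stack = nn.Sequential(nn.Linear(4, 16), nn.Linear(16, 3))

        # Inserting ReLU between the layers makes the network non-linear
        nonlinear_stack = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

        x = torch.randn(8, 4)
        print(linear_stack(x).shape, nonlinear_stack(x).shape)  # both torch.Size([8, 3])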

    opened by wallykop 0
  • Does XBNet work with a data generator similar to keras models?

    Hey, I wanted to know if I can use a data generator for data that is too large to fit into memory. I know XGBoost and similar gradient-boosted models can slowly read and re-read a CSV from disk for large datasets, but that is much too slow for me. I was hoping I could use a generator with XBNet to read a large H5 file of tabular data into memory iteratively and train on the entire dataset the way Keras models do it.
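
    Whether run_XBNET accepts anything other than in-memory arrays would need checking against the source, but the usual PyTorch pattern for streaming a large HDF5 file is a Dataset that reads rows lazily; the 'X'/'y' dataset layout below is an assumption:

        import h5py
        import torch
        from torch.utils.data import Dataset, DataLoader

        class H5TabularDataset(Dataset):
            """Lazily reads rows from an HDF5 file with 'X' and 'y' datasets (assumed layout)."""
            def __init__(self, path):
                self.path = path
                with h5py.File(path, "r") as f:
                    self.length = f["X"].shape[0]
                self._file = None  # opened lazily, once per worker process

            def __len__(self):
                return self.length

            def __getitem__(self, idx):
                if self._file is None:
                    self._file = h5py.File(self.path, "r")
                x = torch.as_tensor(self._file["X"][idx], dtype=torch.float32)
                y = torch.as_tensor(self._file["y"][idx], dtype=torch.long)
                return x, y

        # loader = DataLoader(H5TabularDataset("data.h5"), batch_size=32, num_workers=2)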

    opened by PhillipMaire 0
  • Results for Iris dataset

    Hello,

    I am trying to recreate your results for the Iris dataset. I have installed XBNet and all the required libraries.

    I am following the example code provided in the README and am having some trouble running it.

    Could you please let me know what input and output dimensions you used for layers 1 and 2, with their respective biases, as well as what your last layer was?

    Thank you very much.
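
    For Iris the dimensions are constrained by the data itself: 4 input features and 3 classes, so the first layer must take 4 inputs and the last layer must emit 3 outputs for CrossEntropyLoss to apply. The hidden width in between is a guess, not the author's confirmed configuration:

        from sklearn.datasets import load_iris

        X, y = load_iris(return_X_y=True)
        print(X.shape[1], len(set(y)))  # 4 features, 3 classes
        # Hypothetical prompt answers: layer 1 -> in 4, out 16; layer 2 -> in 16, out 3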

    help wanted 
    opened by Bolu98 5
Owner
Tushar Sarkar
I love solving problems with data
Multivariate Boosted TRee

Multivariate Boosted TRee What is MBTR MBTR is a python package for multivariate boosted tree regressors trained in parameter space. The package can h

SUPSI-DACD-ISAAC 61 Dec 19, 2022
Boosted CVaR Classification (NeurIPS 2021)

Boosted CVaR Classification Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar NeurIPS 2021 Table of Contents Quick Start Train

Runtian Zhai 4 Feb 15, 2022
The official PyTorch implementation of recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training

This repository is the official PyTorch implementation of SAINT. Find the paper on arxiv SAINT: Improved Neural Networks for Tabular Data via Row Atte

Gowthami Somepalli 284 Dec 21, 2022
PyTorch implementation for OCT-GAN Neural ODE-based Conditional Tabular GANs (WWW 2021)

OCT-GAN: Neural ODE-based Conditional Tabular GANs (OCT-GAN) Code for reproducing the experiments in the paper: Jayoung Kim*, Jinsung Jeon*, Jaehoon L

BigDyL 7 Dec 27, 2022
A standard framework for modelling Deep Learning Models for tabular data

PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike.

null 801 Jan 8, 2023
deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

null 63 Oct 17, 2022
The official implementation of the paper, "SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning"

SubTab: Author: Talip Ucar ([email protected]) The official implementation of the paper, SubTab: Subsetting Features of Tabular Data for Self-Supervis

AstraZeneca 98 Dec 29, 2022
A framework for attentive explainable deep learning on tabular data

kendrite: a framework for attentive explainable deep learning on tabular data. Quick start: kedro run. Built upon Technology Description Links ke

Marnix Koops 3 Nov 6, 2021
Job-Recommend-Competition - Vectorwise Interpretable Attentions for Multimodal Tabular Data

SiD - Simple Deep Model Vectorwise Interpretable Attentions for Multimodal Tabul

Jungwoo Park 40 Dec 22, 2022
This is a model made out of Neural Network specifically a Convolutional Neural Network model

This is a model made out of Neural Network specifically a Convolutional Neural Network model. This was done with a pre-built dataset from the tensorflow and keras packages. There are other alternative libraries that can be used for this purpose, one of which is the PyTorch library.

null 9 Oct 18, 2022
Calculates carbon footprint based on fuel mix and discharge profile at the utility selected. Can create graphs and tabular output for fuel mix based on input file of series of power drawn over a period of time.

carbon-footprint-calculator Conda distribution ~/anaconda3/bin/conda install anaconda-client conda-build ~/anaconda3/bin/conda config --set anaconda_u

Seattle university Renewable energy research 7 Sep 26, 2022
[NeurIPS 2021] Well-tuned Simple Nets Excel on Tabular Datasets

[NeurIPS 2021] Well-tuned Simple Nets Excel on Tabular Datasets Introduction This repo contains the source code accompanying the paper: Well-tuned Sim

null 52 Jan 4, 2023
Research on Tabular Deep Learning (Python package & papers)

Research on Tabular Deep Learning For paper implementations, see the section "Papers and projects". rtdl is a PyTorch-based package providing a user-f

Yura Gorishniy 510 Dec 30, 2022
This repository contains notebook implementations of the following Neural Process variants: Conditional Neural Processes (CNPs), Neural Processes (NPs), Attentive Neural Processes (ANPs).

The Neural Process Family This repository contains notebook implementations of the following Neural Process variants: Conditional Neural Processes (CN

DeepMind 892 Dec 28, 2022
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Object Pose Estimation Demo This tutorial will go through the steps necessary to perform pose estimation with a UR3 robotic arm in Unity. You’ll gain

Unity Technologies 187 Dec 24, 2022
Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks

Bayesian-Torch is a library of neural network layers and utilities extending the core of PyTorch to enable the user to perform stochastic variational inference in Bayesian deep neural networks. Bayesian-Torch is designed to be flexible and seamless in extending a deterministic deep neural network architecture to corresponding Bayesian form by simply replacing the deterministic layers with Bayesian layers.

Intel Labs 210 Jan 4, 2023
Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

A Simple Neural Network from scratch A Simple Neural Network from scratch in Pyt

Youssef Chafiqui 2 Jan 7, 2022
Facilitates implementing deep neural-network backbones, data augmentations

Introduction Nowadays, the training of Deep Learning models is fragmented and unified. When AI engineers face up with one specific task, the common wa

null 40 Dec 29, 2022
Pytorch implementation of Cut-Thumbnail in the paper Cut-Thumbnail:A Novel Data Augmentation for Convolutional Neural Network.

Cut-Thumbnail (Accepted at ACM MULTIMEDIA 2021) Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng, Tao Zhou, Ming Liu This is the officia

null 3 Apr 12, 2022