DANet for tabular data classification and regression.

Deep Abstract Networks

A PyTorch implementation of the AAAI-2022 paper DANets: Deep Abstract Networks for Tabular Data Classification and Regression.

Brief Introduction

Tabular data are ubiquitous in real-world applications. Although many commonly used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them are effective for tabular data and few designs are adequately tailored to tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. We also design a structure re-parameterization method to compress AbstLay, thus reducing the computational complexity by a clear margin in the inference phase. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on real-world tabular datasets show that our AbstLay and DANets are effective for tabular data classification and regression, and that their computational complexity is superior to that of competitive methods.

DANets illustration

[Figure: DANets]

Downloads

Dataset

Download the datasets from the following links:

(Optional) Before starting the program, you may change the file format to .pkl by using svm2pkl() or csv2pkl() functions in ./data/data_util.py.
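
As a rough illustration of what such a conversion involves, the sketch below reads a CSV file once and stores it as a pickle. It is not the repository's csv2pkl(): the helper name csv_to_pkl, the assumption that every cell is numeric with the target in the last column, and the (X, y) tuple layout are all hypothetical.

```python
import csv
import pickle


def csv_to_pkl(csv_path, pkl_path, has_header=True):
    """Hypothetical helper: read a numeric CSV and pickle it as (X, y).

    Assumes every cell is numeric and the last column is the target.
    Returns the number of data rows written.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    if has_header:
        rows = rows[1:]  # drop the column-name row
    # Skip empty rows; all columns but the last are features.
    X = [[float(v) for v in row[:-1]] for row in rows if row]
    y = [float(row[-1]) for row in rows if row]
    with open(pkl_path, "wb") as f:
        pickle.dump((X, y), f)
    return len(X)
```

The file layout expected by the actual loaders in ./data/ may differ, so check data_util.py before relying on this format.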

Weights for inference models

The demo weights for the Forest Cover Type dataset are available in the folder "./Weights/".

How to use

Setting

  1. Clone or download this repository, and cd into it.
  2. Set up a working Python environment. Python 3.7 is fine for this repository.
  3. Install the packages listed in requirements.txt, e.g., with pip install -r requirements.txt.

Training

  1. Set the hyperparameters in config files (./config/default.py or ./config/*.yaml).
    Note that hyperparameters set in the .yaml file override those in default.py.

  2. Run training with python main.py -c [config_path] -g [gpu_id].

    • -c: The config file path
    • -g: GPU device ID
  3. The checkpoint models and best models will be saved in the ./logs directory.
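
The training steps above can be sketched as follows. This is a minimal illustration, not the repository's main.py: the function names parse_args and merge_config, the default GPU ID of "0", the nested model/fit layout, and the example values are all assumptions. It shows the -c and -g flags listed above and how values from a .yaml file would take precedence over the defaults.

```python
import argparse
import copy


def parse_args(argv=None):
    """Parse the two flags listed above: -c (config path) and -g (GPU ID)."""
    parser = argparse.ArgumentParser(description="Illustrative training launcher")
    parser.add_argument("-c", dest="config", type=str, required=True,
                        help="The config file path")
    # Default of "0" is an assumption made for this sketch.
    parser.add_argument("-g", dest="gpu_id", type=str, default="0",
                        help="GPU device ID")
    return parser.parse_args(argv)


def merge_config(defaults, overrides):
    """Return a copy of `defaults` in which values from `overrides` win."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-ins for ./config/default.py and a parsed .yaml file (layout assumed).
DEFAULTS = {
    "task": "classification",
    "model": {"layer": 20, "k": 5},
    "fit": {"lr": 0.008},
}
YAML_VALUES = {"model": {"layer": 32}}

if __name__ == "__main__":
    args = parse_args(["-c", "./config/example.yaml", "-g", "0"])
    cfg = merge_config(DEFAULTS, YAML_VALUES)
    print(args.config, args.gpu_id, cfg["model"])
```

Only the keys present in the override dictionary change; everything else keeps its default value.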

Inference

  1. Set the resume_dir path to the directory containing your trained model/weights.
  2. Run python predict.py -d [dataset_name] -m [model_file_path] -g [gpu_id].
    • -d: Dataset name
    • -m: Model path for loading
    • -g: GPU device ID

Config Hyperparameters

Normal parameters

  • dataset: str
    The dataset name. It must match one of the names defined in ./data/dataset.py.

  • task: str
    The task type: either 'classification' or 'regression'.

  • resume_dir: str
    The log path containing the checkpoint models.

  • logname: str
    The name of the directory under ./logs where the models are saved.

  • seed: int
    The random seed.

Model parameters

  • layer: int (default=20)
    Number of abstract layers to stack

  • k: int (default=5)
    Number of masks

  • base_outdim: int (default=64)
    The output feature dimension in abstract layer.

  • drop_rate: float (default=0.1)
    Dropout rate in shortcut module

Fit parameters

  • lr: float (default=0.008)
    Learning rate

  • max_epochs: int (default=5000)
    Maximum number of epochs in training.

  • patience: int (default=1500)
    Number of consecutive epochs without improvement before performing early stopping. If patience is set to 0, then no early stopping will be performed.

  • batch_size: int (default=8192)
    Number of examples per batch.

  • virtual_batch_size: int (default=256)
    Size of the mini batches used for "Ghost Batch Normalization". virtual_batch_size must divide batch_size.
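
To make the semantics of patience and virtual_batch_size concrete, here is a small sketch in plain Python. It is not the repository's training loop: the function names are hypothetical, and the early-stopping helper assumes one validation score per epoch where higher is better.

```python
def split_into_virtual_batches(batch, virtual_batch_size):
    """Split one batch into equally sized chunks.

    Ghost Batch Normalization computes normalization statistics per chunk,
    which is why virtual_batch_size must divide the batch size.
    """
    if len(batch) % virtual_batch_size != 0:
        raise ValueError("virtual_batch_size must divide batch_size")
    return [batch[i:i + virtual_batch_size]
            for i in range(0, len(batch), virtual_batch_size)]


def stopping_epoch(val_scores, patience, max_epochs):
    """Return how many epochs run before training stops.

    `val_scores` holds one validation score per epoch (higher is better).
    """
    best = float("-inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for score in val_scores[:max_epochs]:
        epochs_run += 1
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        # patience == 0 disables early stopping entirely.
        if patience > 0 and epochs_without_improvement >= patience:
            break
    return epochs_run
```

With the defaults listed above, a batch of 8192 examples splits into 32 virtual batches of 256 examples each.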

Citations

@inproceedings{danets, 
   title={DANets: Deep Abstract Networks for Tabular Data Classification and Regression}, 
   author={Chen, Jintai and Liao, Kuanlun and Wan, Yao and Chen, Danny Z and Wu, Jian}, 
   booktitle={AAAI}, 
   year={2022}
 }