InfiniteBoost: building infinite ensembles with gradient descent


Code for the paper
InfiniteBoost: building infinite ensembles with gradient descent (arXiv:1706.01109),
by A. Rogozhnikov and T. Likhomanenko.

Description

InfiniteBoost is an approach to building ensembles that combines the best properties of random forests and gradient boosting.

Trees in the ensemble are fitted to the mistakes made by previous trees (as in gradient boosting), but thanks to a modified scheme of weighting their contributions, the ensemble converges to a limit as trees are added, thus avoiding overfitting (just as a random forest does).
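For intuition, here is a minimal sketch of this idea for squared loss (an illustration only, with hypothetical names; the actual algorithm, including the automated search of capacity, is given in the paper):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def infiniteboost_sketch(X, y, capacity=10.0, n_trees=100, max_depth=4):
    # The ensemble prediction is the capacity times a running average
    # of the trees' predictions, so it stays bounded as trees are added.
    avg = np.zeros(len(y))
    trees = []
    for k in range(1, n_trees + 1):
        F = capacity * avg                # current ensemble prediction
        residual = y - F                  # negative gradient of squared loss at F
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        # each tree is averaged in with weight 1/k, so contributions shrink
        # and the ensemble converges to a limit instead of drifting
        avg += (tree.predict(X) - avg) / k
    return trees

Unlike plain gradient boosting, where every tree adds a fixed shrinkage step and the ensemble keeps moving, the averaging above keeps the total contribution bounded by the capacity.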

[Figure] Left: InfiniteBoost with automated search of capacity vs. gradient boosting with different learning rates (shrinkages); right: random forest vs. InfiniteBoost with small capacities.

More comparison plots can be found in the research notebooks and in the research/plots directory.

Reproducing research

The research is performed in Jupyter notebooks (if you're not familiar with them, read about why Jupyter notebooks are awesome).

You can use the Docker image arogozhnikov/pmle:0.01 from Docker Hub. The Dockerfile is stored in this repository (Ubuntu 16 + a basic scikit-learn stack).

To run the environment (sudo is needed on Linux):

sudo docker run -it --rm -v /YourMountedDirectory:/notebooks -p 8890:8890 arogozhnikov/pmle:0.01

(and open localhost:8890 in your browser).

InfiniteBoost package

The package provides a self-written, minimalistic implementation of trees, as used in the boosting experiments. For the comparison with random forest, a separate implementation based on the trees from the scikit-learn package was used.

The code is written in Python 2 (expected to work with Python 3, but not tested). Some performance-critical functions are written in Fortran, so you need gfortran with OpenMP support installed before installing the package (or simply use the Docker image). Judging by the build error reported in the comments below, the Fortran code uses the OpenMP 4.0 directive !$OMP SIMD, so gfortran 4.9 or later appears to be required.

pip install numpy
pip install .
# testing (optional)
cd tests && nosetests .

You may use the implementation of trees from this package for your own experiments; in that case, please cite the InfiniteBoost paper.
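For reference, a minimal usage sketch for a binary classification task. The class names below (InfiniteBoosting, LogisticLoss, BinTransformer) appear in the comments further down, and the module names appear in the build log, but the exact import paths and the sigmoid conversion of decision_function scores into probabilities are assumptions, not guaranteed by the package:

import numpy as np
from infiniteboost.researchboosting import InfiniteBoosting   # assumed import path
from infiniteboost.researchlosses import LogisticLoss         # assumed import path
from infiniteboost.researchtree import BinTransformer         # assumed import path

# toy data: 100 samples, 5 features, binary 0/1 labels
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

X_binned = BinTransformer().fit_transform(X)   # trees operate on binned features
clf = InfiniteBoosting(loss=LogisticLoss(), n_estimators=100)
clf.fit(X_binned, y)

scores = clf.decision_function(X_binned)       # raw scores (log-odds for logistic loss)
proba = 1.0 / (1.0 + np.exp(-scores))          # assumed: sigmoid turns scores into P(y=1)
labels = (proba > 0.5).astype(int)             # hard class predictions

This also addresses the classification question raised in the comments below: with LogisticLoss, decision_function is expected to return raw log-odds rather than probabilities or class labels.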

Comments
  • "-fopenmp -O3" failed with exit status 1 ??

    Hi. I don't understand why pip install . throws this error:

    Running setup.py install for infiniteboost ... error
    Complete output from command /home/lemma/miniconda2/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-Q3sQ_y-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-RqOC6j-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running config_cc
    unifing config_cc, config, build_clib, build_ext, build commands --compiler options
    running config_fc
    unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
    running build_src
    build_src
    building extension "infiniteboost.fortranfunctions" sources
    f2py options: []
    adding 'build/src.linux-x86_64-2.7/fortranobject.c' to sources.
    adding 'build/src.linux-x86_64-2.7' to include_dirs.
    adding 'build/src.linux-x86_64-2.7/infiniteboost/fortranfunctions-f2pywrappers2.f90' to sources.
    build_src: building npy-pkg config files
    running build_py
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/researchlosses.py -> build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/researchboosting.py -> build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/__init__.py -> build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/researchtree.py -> build/lib.linux-x86_64-2.7/infiniteboost
    running build_ext
    customize UnixCCompiler
    customize UnixCCompiler using build_ext
    customize Gnu95FCompiler
    Found executable /usr/bin/gfortran
    customize Gnu95FCompiler
    customize Gnu95FCompiler using build_ext
    building 'infiniteboost.fortranfunctions' extension
    compiling C sources
    C compiler: gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fopenmp -O2 -march=core2 -ftree-vectorize -fPIC

    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/build
    creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/infiniteboost
    compile options: '-Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c'
    gcc: build/src.linux-x86_64-2.7/fortranobject.c
    In file included from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                     from build/src.linux-x86_64-2.7/fortranobject.h:13,
                     from build/src.linux-x86_64-2.7/fortranobject.c:2:
    /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
     #warning "Using deprecated NumPy API, disable it by " \
      ^
    gcc: build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c
    In file included from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                     from build/src.linux-x86_64-2.7/fortranobject.h:13,
                     from build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c:19:
    /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
     #warning "Using deprecated NumPy API, disable it by " \
      ^
    build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c: In function ‘initfortranfunctions’:
    build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c:778:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
       Py_TYPE(&PyFortran_Type) = &PyType_Type;
       ^
    compiling Fortran 90 module sources
    creating build/temp.linux-x86_64-2.7/infiniteboost
    Fortran f77 compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops
    Fortran f90 compiler: /usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
    Fortran fix compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
    compile options: '-Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c'
    extra options: '-Jbuild/temp.linux-x86_64-2.7/infiniteboost -Ibuild/temp.linux-x86_64-2.7/infiniteboost'
    extra f90 options: '-fopenmp -O3'
    gfortran:f90: infiniteboost/fortranfunctions.f90
    infiniteboost/fortranfunctions.f90:124.14:
    
            !$OMP SIMD
                  1
    Error: Unclassifiable OpenMP directive at (1)
    infiniteboost/fortranfunctions.f90:124.14:
    
            !$OMP SIMD
                  1
    Error: Unclassifiable OpenMP directive at (1)
    error: Command "/usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops -Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c -c infiniteboost/fortranfunctions.f90 -o build/temp.linux-x86_64-2.7/infiniteboost/fortranfunctions.o -Jbuild/temp.linux-x86_64-2.7/infiniteboost -Ibuild/temp.linux-x86_64-2.7/infiniteboost -fopenmp -O3" failed with exit status 1
    
    ----------------------------------------
    

    Command "/home/lemma/miniconda2/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-Q3sQ_y-build/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-RqOC6j-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-Q3sQ_y-build/

    ... even though I have GNU (gcc, g++, gfortran) installed on my machine. I have an Intel i5 with 4 cores.

    opened by kroscek 2
  • Make predictions for a classification task with InfiniteBoost

    I am trying to test InfiniteBoost with the Titanic dataset from Kaggle.

    titanic_df = pd.read_csv("train_cleaned")
    y = titanic_df["Survived"].values
    X = titanic_df.drop("Survived", axis=1).values
    clf = InfiniteBoosting(loss=LogisticLoss(), n_estimators=100)
    X = BinTransformer().fit_transform(X)
    clf.fit(X, y)
    ypred = clf.staged_decision_function(X)
    y_last_pred = clf.decision_function(X)
    y_last_pred

    It is a classification problem; how can I know that InfiniteBoost treats it as one? (The target variable is y, whose values are 0 or 1, as int.) And when I used decision_function to make predictions, the output looked like neither probabilities nor classes. So how does InfiniteBoost handle classification tasks, and how can I use it to predict probabilities?

    opened by vatn 2
  • Question: InfiniteBoost vs XGBoost?

    This is more a question than an issue (I can close it at any time): did you compare your approach with the XGBoost implementation? It could be interesting to compare them, especially with respect to overfitting.

    A small typo here: InfiniteBost -> InfiniteBoost

    Thanks

    opened by vfdev-5 2
  • Compare with other algorithms

    Is it a deliberate decision not to compare this algorithm to popular implementations such as XGBoost and LightGBM? If this is fundamental research, I can imagine it is not (yet) at the same level. Giving some numbers for comparison would give the reader a clearer view of the paper's purpose :)

    opened by sbrugman 1
  • Why should I use InfiniteBoost?

    I read the paper (thanks), but I am still puzzled: I don't see any ground-breaking improvements in precision or performance over RF or GB. What is the big benefit?

    Thanks

    opened by hrstoyanov 2