InfiniteBoost: building infinite ensembles with gradient descent


Code for the paper
InfiniteBoost: building infinite ensembles with gradient descent (arXiv:1706.01109),
by A. Rogozhnikov and T. Likhomanenko.

Description

InfiniteBoost is an approach to building ensembles that combines the best properties of random forests and gradient boosting.

Trees in the ensemble are fitted to the mistakes made by previous trees (as in gradient boosting), but thanks to a modified scheme of weighting their contributions, the ensemble converges to a limit as trees are added, thus avoiding overfitting (just as a random forest does).
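For intuition, here is a minimal sketch of this idea for squared loss (an illustration only, with hypothetical names; the actual algorithm, including the automated search of capacity, is given in the paper):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def infiniteboost_sketch(X, y, capacity=10.0, n_trees=100, max_depth=4):
    # The ensemble prediction is the capacity times a running average
    # of the trees' predictions, so it stays bounded as trees are added.
    avg = np.zeros(len(y))
    trees = []
    for k in range(1, n_trees + 1):
        F = capacity * avg                # current ensemble prediction
        residual = y - F                  # negative gradient of squared loss at F
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        # each tree is averaged in with weight 1/k, so contributions shrink
        # and the ensemble converges to a limit instead of drifting
        avg += (tree.predict(X) - avg) / k
    return trees

Unlike plain gradient boosting, where every tree adds a fixed shrinkage step and the ensemble keeps moving, the averaging above keeps the total contribution bounded by the capacity.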

[Figure] Left: InfiniteBoost with automated search of capacity vs. gradient boosting with different learning rates (shrinkages); right: random forest vs. InfiniteBoost with small capacities.

More comparison plots can be found in the research notebooks and in the research/plots directory.

Reproducing research

The research is performed in Jupyter notebooks (if you're not familiar with them, read about why Jupyter notebooks are awesome).

You can use the Docker image arogozhnikov/pmle:0.01 from Docker Hub. The Dockerfile is stored in this repository (Ubuntu 16 + a basic scikit-learn stack).

To run the environment (sudo is needed on Linux):

sudo docker run -it --rm -v /YourMountedDirectory:/notebooks -p 8890:8890 arogozhnikov/pmle:0.01

(and open localhost:8890 in your browser).

InfiniteBoost package

The package provides a self-written, minimalistic implementation of trees, as used in the boosting experiments. For the comparison with random forest, a separate implementation based on the trees from the scikit-learn package was used.

The code is written in Python 2 (expected to work with Python 3, but not tested). Some performance-critical functions are written in Fortran, so you need gfortran with OpenMP support installed before installing the package (or simply use the Docker image). Judging by the build error reported in the comments below, the Fortran code uses the OpenMP 4.0 directive !$OMP SIMD, so gfortran 4.9 or later appears to be required.

pip install numpy
pip install .
# testing (optional)
cd tests && nosetests .

You may use the implementation of trees from this package for your own experiments; in that case, please cite the InfiniteBoost paper.
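For reference, a minimal usage sketch for a binary classification task. The class names below (InfiniteBoosting, LogisticLoss, BinTransformer) appear in the comments further down, and the module names appear in the build log, but the exact import paths and the sigmoid conversion of decision_function scores into probabilities are assumptions, not guaranteed by the package:

import numpy as np
from infiniteboost.researchboosting import InfiniteBoosting   # assumed import path
from infiniteboost.researchlosses import LogisticLoss         # assumed import path
from infiniteboost.researchtree import BinTransformer         # assumed import path

# toy data: 100 samples, 5 features, binary 0/1 labels
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

X_binned = BinTransformer().fit_transform(X)   # trees operate on binned features
clf = InfiniteBoosting(loss=LogisticLoss(), n_estimators=100)
clf.fit(X_binned, y)

scores = clf.decision_function(X_binned)       # raw scores (log-odds for logistic loss)
proba = 1.0 / (1.0 + np.exp(-scores))          # assumed: sigmoid turns scores into P(y=1)
labels = (proba > 0.5).astype(int)             # hard class predictions

This also addresses the classification question raised in the comments below: with LogisticLoss, decision_function is expected to return raw log-odds rather than probabilities or class labels.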

Comments
  • "-fopenmp -O3" failed with exit status 1 ??

    Hi. I don't understand why pip install . throws this error:

    Running setup.py install for infiniteboost ... error
    Complete output from command /home/lemma/miniconda2/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-Q3sQ_y-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-RqOC6j-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running config_cc
    unifing config_cc, config, build_clib, build_ext, build commands --compiler options
    running config_fc
    unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
    running build_src
    build_src
    building extension "infiniteboost.fortranfunctions" sources
    f2py options: []
    adding 'build/src.linux-x86_64-2.7/fortranobject.c' to sources.
    adding 'build/src.linux-x86_64-2.7' to include_dirs.
    adding 'build/src.linux-x86_64-2.7/infiniteboost/fortranfunctions-f2pywrappers2.f90' to sources.
    build_src: building npy-pkg config files
    running build_py
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/researchlosses.py -> build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/researchboosting.py -> build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/__init__.py -> build/lib.linux-x86_64-2.7/infiniteboost
    copying infiniteboost/researchtree.py -> build/lib.linux-x86_64-2.7/infiniteboost
    running build_ext
    customize UnixCCompiler
    customize UnixCCompiler using build_ext
    customize Gnu95FCompiler
    Found executable /usr/bin/gfortran
    customize Gnu95FCompiler
    customize Gnu95FCompiler using build_ext
    building 'infiniteboost.fortranfunctions' extension
    compiling C sources
    C compiler: gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fopenmp -O2 -march=core2 -ftree-vectorize -fPIC

    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/build
    creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/infiniteboost
    compile options: '-Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c'
    gcc: build/src.linux-x86_64-2.7/fortranobject.c
    In file included from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                     from build/src.linux-x86_64-2.7/fortranobject.h:13,
                     from build/src.linux-x86_64-2.7/fortranobject.c:2:
    /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
     #warning "Using deprecated NumPy API, disable it by " \
      ^
    gcc: build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c
    In file included from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                     from /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                     from build/src.linux-x86_64-2.7/fortranobject.h:13,
                     from build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c:19:
    /home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
     #warning "Using deprecated NumPy API, disable it by " \
      ^
    build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c: In function ‘initfortranfunctions’:
    build/src.linux-x86_64-2.7/infiniteboost/fortranfunctionsmodule.c:778:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
       Py_TYPE(&PyFortran_Type) = &PyType_Type;
       ^
    compiling Fortran 90 module sources
    creating build/temp.linux-x86_64-2.7/infiniteboost
    Fortran f77 compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops
    Fortran f90 compiler: /usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
    Fortran fix compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
    compile options: '-Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c'
    extra options: '-Jbuild/temp.linux-x86_64-2.7/infiniteboost -Ibuild/temp.linux-x86_64-2.7/infiniteboost'
    extra f90 options: '-fopenmp -O3'
    gfortran:f90: infiniteboost/fortranfunctions.f90
    infiniteboost/fortranfunctions.f90:124.14:
    
            !$OMP SIMD
                  1
    Error: Unclassifiable OpenMP directive at (1)
    infiniteboost/fortranfunctions.f90:124.14:
    
            !$OMP SIMD
                  1
    Error: Unclassifiable OpenMP directive at (1)
    error: Command "/usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops -Ibuild/src.linux-x86_64-2.7 -I/home/lemma/miniconda2/lib/python2.7/site-packages/numpy/core/include -I/home/lemma/miniconda2/include/python2.7 -c -c infiniteboost/fortranfunctions.f90 -o build/temp.linux-x86_64-2.7/infiniteboost/fortranfunctions.o -Jbuild/temp.linux-x86_64-2.7/infiniteboost -Ibuild/temp.linux-x86_64-2.7/infiniteboost -fopenmp -O3" failed with exit status 1
    
    ----------------------------------------
    

    Command "/home/lemma/miniconda2/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-Q3sQ_y-build/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-RqOC6j-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-Q3sQ_y-build/

    ... even though I have GNU (gcc, g++, gfortran) installed on my machine. I have an Intel i5 with 4 cores.

    opened by kroscek 2
  • Make predictions for a classification task with InfiniteBoost

    I am trying to test InfiniteBoost with the Titanic dataset from Kaggle.

    titanic_df = pd.read_csv("train_cleaned")
    y = titanic_df["Survived"].values
    X = titanic_df.drop("Survived", axis=1).values
    clf = InfiniteBoosting(loss=LogisticLoss(), n_estimators=100)
    X = BinTransformer().fit_transform(X)
    clf.fit(X, y)
    ypred = clf.staged_decision_function(X)
    y_last_pred = clf.decision_function(X)
    y_last_pred

    It is a classification problem; how can I know that InfiniteBoost treats it as one? (The target variable is y, whose values are 0 or 1, as int.) And when I used decision_function to make predictions, the output looked like neither probabilities nor classes. So how does InfiniteBoost handle classification tasks, and how can I use it to predict probabilities?

    opened by vatn 2
  • Question: InfiniteBoost vs XGBoost?

    This is more a question than an issue (I can close it at any time): did you compare your approach with the XGBoost implementation? It could be interesting to compare them, especially with respect to overfitting.

    A small typo here: InfiniteBost -> InfiniteBoost

    Thanks

    opened by vfdev-5 2
  • Compare with other algorithms

    Is it a deliberate decision not to compare this algorithm to popular implementations such as XGBoost and LightGBM? If this is fundamental research, I can imagine it is not (yet) at the same level. Giving some numbers for comparison would give the reader a clearer view of the paper's purpose :)

    opened by sbrugman 1
  • Why should I use InfiniteBoost?

    I read the paper (thanks), but I am still puzzled: I don't see any ground-breaking improvements in precision or performance over RF or GB. What is the big benefit?

    Thanks

    opened by hrstoyanov 2