forest-confidence-interval: Confidence intervals for Forest algorithms
Forest algorithms are powerful ensemble methods for classification and regression. However, predictions from these algorithms do contain some amount of error. Prediction variability can illustrate how influential the training set is for producing the observed random forest predictions.
forest-confidence-interval is a Python module that adds variance calculations and confidence intervals to the basic functionality of scikit-learn random forest regression and classification objects. The core functions calculate an in-bag matrix and error bars for random forest objects.
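To make the in-bag and error-bar computations concrete, here is a minimal pure-Python sketch of the infinitesimal jackknife (IJ) variance estimate used in Stefan Wager's randomForestCI. It is not the forestci implementation. It assumes you already have, for each tree, the bootstrap indices it was trained on and its prediction at one test point; the helper names inbag_counts and ij_variance are hypothetical. The in-bag matrix counts how often training sample i appears in tree b's bootstrap draw (N_bi). The variance is the sum over samples of the squared covariance between (N_bi - 1) and the tree predictions, optionally minus a Monte Carlo bias correction.

```python
import random


def inbag_counts(bootstrap_indices, n_samples):
    """Count how often each training sample appears in each tree's bootstrap draw.

    Returns an n_samples x n_trees nested list: counts[i][b] is N_bi.
    """
    n_trees = len(bootstrap_indices)
    counts = [[0] * n_trees for _ in range(n_samples)]
    for b, indices in enumerate(bootstrap_indices):
        for i in indices:
            counts[i][b] += 1
    return counts


def ij_variance(inbag, tree_preds, bias_correct=True):
    """Infinitesimal jackknife variance of the forest prediction at one test point."""
    n_samples = len(inbag)
    n_trees = len(tree_preds)
    mean_pred = sum(tree_preds) / n_trees
    centered = [t - mean_pred for t in tree_preds]
    v_ij = 0.0
    for i in range(n_samples):
        # Cov_i = (1/B) * sum_b (N_bi - 1) * (t_b - mean prediction)
        cov_i = sum((inbag[i][b] - 1) * centered[b] for b in range(n_trees)) / n_trees
        v_ij += cov_i ** 2
    if bias_correct:
        # Subtract the Monte Carlo bias term (n / B^2) * sum_b (t_b - mean)^2
        v_ij -= n_samples / n_trees ** 2 * sum(c ** 2 for c in centered)
    return v_ij


# Usage with simulated inputs: random bootstrap draws and stand-in per-tree
# predictions (a fitted forest would supply both).
rng = random.Random(0)
n_samples, n_trees = 20, 50
boot = [[rng.randrange(n_samples) for _ in range(n_samples)] for _ in range(n_trees)]
inbag = inbag_counts(boot, n_samples)
tree_preds = [rng.gauss(3.0, 0.5) for _ in range(n_trees)]
print("IJ variance:", ij_variance(inbag, tree_preds, bias_correct=False))
print("bias-corrected:", ij_variance(inbag, tree_preds))
```

With few trees the bias-corrected value can come out negative; forestci also offers an optional calibration step that this sketch omits.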
Compatible with Python 2.7 and Python 3.6.
This module is based on R code from Stefan Wager (see important links below) and is licensed under the MIT open source license (see LICENSE).
Important Links
scikit-learn
- http://scikit-learn.org/
Stefan Wager's randomForestCI
- https://github.com/swager/randomForestCI (deprecated in favor of grf: https://github.com/swager/grf)
Installation and Usage
Before installing the module you will need numpy, scipy and scikit-learn. Installing the dependencies of these modules may require root privileges. Consult the API Reference for documentation on core functionality. The dependencies can be installed with:
pip install numpy scipy scikit-learn
You can also install the dependencies with:
pip install -r requirements.txt
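Because scikit-learn is imported as sklearn rather than under its pip name, it can be handy to check which dependencies are still missing before installing forest-confidence-interval. The script below is a hypothetical helper for Python 3, not part of forestci; it only tests whether each module can be found on the import path.

```python
import importlib.util

# Map pip package names to their import names (scikit-learn is imported as "sklearn").
REQUIRED = {"numpy": "numpy", "scipy": "scipy", "scikit-learn": "sklearn"}


def missing_dependencies(required=REQUIRED):
    """Return the pip names of required packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + " ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```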
To install forest-confidence-interval, execute:
pip install forestci
or, if you are installing from the source code:
python setup.py install
If you would like to install the development version of the software, use:
pip install git+https://github.com/scikit-learn-contrib/forest-confidence-interval.git
Why use forest-confidence-interval?
Our software is designed for scikit-learn users who want to add estimates of uncertainty to random forest predictors. Prediction variability demonstrates how much the training set influences results and is important for estimating standard errors. forest-confidence-interval is a Python module for calculating variance and adding confidence intervals to the popular Python library scikit-learn. The software is compatible with both scikit-learn random forest regression and classification objects.
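Once a variance estimate is available for a prediction, turning it into a confidence interval is a short step. The sketch below assumes the prediction is approximately normally distributed and builds a symmetric interval, prediction +/- z * sqrt(variance). The function name confidence_interval is hypothetical and is not part of forestci's API; it needs Python 3.8+ for statistics.NormalDist.

```python
from statistics import NormalDist


def confidence_interval(prediction, variance, level=0.95):
    """Symmetric normal-approximation interval: prediction +/- z * sqrt(variance)."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Bias-corrected variance estimates can be slightly negative; clip them to zero.
    std_err = max(variance, 0.0) ** 0.5
    # Two-sided critical value, e.g. z is about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return prediction - z * std_err, prediction + z * std_err


low, high = confidence_interval(prediction=10.0, variance=4.0)
print("95% interval: ({:.2f}, {:.2f})".format(low, high))  # prints 95% interval: (6.08, 13.92)
```

Clipping negative variances to zero is a design choice for this sketch: it yields a zero-width interval instead of a complex-valued square root.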
Examples
The examples in the gallery demonstrate the package functionality with random forest classifiers and regression models. The regression example uses a popular UCI Machine Learning data set on cars, while the classifier example simulates how to add measurements of uncertainty to tasks like predicting spam emails.
Contributing
Contributions are very welcome, but we ask that contributors abide by the contributor covenant.
To report issues with the software, please post to the issue log. Bug reports are appreciated; please add them after verifying that the issue does not already exist. Comments on existing issues are also welcome.
Please submit improvements as pull requests against the repo after verifying that the existing tests pass and any new code is well covered by unit tests. Please write code that complies with the Python style guide, PEP8.
E-mail Ariel Rokem, Kivan Polimis, or Bryna Hazelton if you have any questions, suggestions or feedback.
Testing
Requires installation of the nose package. Tests are located in the forestci/tests folder and can be run with the nosetests command in the main directory.
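New tests can be plain functions: nose collects functions whose names start with test from modules named like test_*.py. A file such as forestci/tests/test_inbag_example.py (a hypothetical name) could therefore hold checks like the sketch below. The bootstrap_inbag helper is also hypothetical and stands in for whatever function is under test.

```python
import random


def bootstrap_inbag(n_samples, n_trees, seed=0):
    """Hypothetical helper: bootstrap in-bag counts (rows = samples, columns = trees)."""
    rng = random.Random(seed)
    counts = [[0] * n_trees for _ in range(n_samples)]
    for b in range(n_trees):
        # Each tree draws n_samples indices with replacement.
        for _ in range(n_samples):
            counts[rng.randrange(n_samples)][b] += 1
    return counts


def test_inbag_columns_sum_to_n_samples():
    # A bootstrap draw of size n_samples puts exactly n_samples picks in each column.
    inbag = bootstrap_inbag(n_samples=30, n_trees=10)
    for b in range(10):
        assert sum(row[b] for row in inbag) == 30


def test_inbag_is_reproducible_with_seed():
    # The same seed must produce the same in-bag matrix.
    assert bootstrap_inbag(15, 5, seed=1) == bootstrap_inbag(15, 5, seed=1)
```

Running nosetests from the main directory would pick these up alongside the existing tests; pytest also collects test functions written in this style.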
Citation
Click on the JOSS status badge for the Journal of Open Source Software article on this project. The BibTeX citation for the JOSS article is below:
@article{polimisconfidence,
title={Confidence Intervals for Random Forests in Python},
author={Polimis, Kivan and Rokem, Ariel and Hazelton, Bryna},
journal={Journal of Open Source Software},
volume={2},
number={1},
year={2017}
}