This happens when training on batched data with `warm_start=True` and the data is unbalanced, i.e. an early chunk contains only one of the classes.
Error:
/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: **Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes)** is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return array(a, dtype, copy=False, order=order, subok=True)
Traceback (most recent call last):
ml-pipeline/src/treeint_simple_example.py", line 22, in <module>
test_predict_prob, bias, contributions = ti.predict(rf, test_data.head(2))
File "/Users/x/anaconda3/lib/python3.7/site-packages/treeinterpreter/treeinterpreter.py", line 212, in predict
return _predict_forest(model, X, joint_contribution=joint_contribution)
File "/Users/x/anaconda3/lib/python3.7/site-packages/treeinterpreter/treeinterpreter.py", line 166, in _predict_forest
return (np.mean(predictions, axis=0), np.mean(biases, axis=0),
File "<__array_function__ internals>", line 6, in mean
File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3373, in mean
out=out, **kwargs)
File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 144, in _mean
arr = asanyarray(a)
File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 136, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
**ValueError: could not broadcast input array from shape (2,1) into shape (2)**
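My reading of the traceback (the snippet below is my own illustration, not code from treeinterpreter): `_predict_forest` averages the per-tree prediction arrays with `np.mean(predictions, axis=0)`, and it looks like the tree fit while only one class was present yields arrays with a single probability column while later trees yield two, so NumPy cannot stack the ragged list:

```python
# Standalone illustration; the shapes are assumed from the error message above.
import numpy as np

per_tree_predictions = [
    np.zeros((2, 1)),  # tree fit while only class 0 existed: one column
    np.zeros((2, 2)),  # tree fit after class 1 appeared: two columns
]

try:
    # roughly what np.mean(predictions, axis=0) in _predict_forest boils down to
    np.mean(per_tree_predictions, axis=0)
except ValueError as err:
    print(err)  # the ragged list cannot be turned into a single ndarray
```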
Reproduction:
```python
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import pandas as pd

# Random forest that can train on chunks of data.
rf = RandomForestClassifier(warm_start=True, n_estimators=1)

# data of chunk1 (only class 0)
chunk1_data_vec = [0, 0]
chunk1_df = pd.DataFrame(data={'label': chunk1_data_vec, 'features1': chunk1_data_vec, 'features2': chunk1_data_vec})

# data of chunk2 (both classes)
chunk2_data_vec = [0, 0, 1, 1, 0, 0, 1, 1]
chunk2_df = pd.DataFrame(data={'label': chunk2_data_vec, 'features1': chunk2_data_vec, 'features2': chunk2_data_vec})

# fit first chunk of data that has a single label
rf.fit(X=chunk1_df.drop(['label'], axis='columns'), y=chunk1_df['label'])

# fit second chunk of data that has 2 labels
rf.n_estimators += 1
rf.fit(X=chunk2_df.drop(['label'], axis='columns'), y=chunk2_df['label'])

# test
test_data = chunk2_df.drop(['label'], axis='columns')

# regular predict works
rf.predict_proba(test_data)

# tree interpreter predict raises the error above
test_predict_prob, bias, contributions = ti.predict(rf, test_data.head(2))
```
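A quick diagnostic on top of the reproduction (my own addition, using the standard `estimators_` attribute of the fitted forest) to check that the two trees disagree on the number of probability columns, which would match the ragged-sequence warning above:

```python
# Print each fitted tree's class count and probability-output shape.
# If I understand warm_start correctly, the tree kept from the chunk1 fit
# should report a single class and the tree added for chunk2 should report two.
for est in rf.estimators_:
    print(est.n_classes_, est.predict_proba(test_data.head(2)).shape)
```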