treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions.

Overview

TreeInterpreter

Package for interpreting scikit-learn's decision tree and random forest predictions. Allows decomposing each prediction into bias and feature contribution components as described in http://blog.datadive.net/interpreting-random-forests/. For a dataset with n features, each prediction on the dataset is decomposed as prediction = bias + feature_1_contribution + ... + feature_n_contribution.

It works on scikit-learn's

  • DecisionTreeRegressor
  • DecisionTreeClassifier
  • ExtraTreeRegressor
  • ExtraTreeClassifier
  • RandomForestRegressor
  • RandomForestClassifier
  • ExtraTreesRegressor
  • ExtraTreesClassifier

Free software: BSD license

Dependencies

  • scikit-learn 0.17+

Installation

The easiest way to install the package is via pip:

$ pip install treeinterpreter

Usage

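from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

# fit a scikit-learn regressor (trainX, trainY and testX are your own data)
rf = RandomForestRegressor()
rf.fit(trainX, trainY)

# decompose each prediction into bias and per-feature contributions
prediction, bias, contributions = ti.predict(rf, testX)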

Prediction is the sum of bias and feature contributions:

import numpy as np

assert np.allclose(prediction, bias + np.sum(contributions, axis=1))
assert np.allclose(rf.predict(testX), bias + np.sum(contributions, axis=1))
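
To see why the sum holds, here is a minimal sketch of the decomposition on a hand-built toy tree. It does not use treeinterpreter or scikit-learn: the `TREE` dictionary and the `decompose` helper are hypothetical names made up for illustration, and the node values are invented. The idea is that the bias is the value at the root, and each step down the decision path credits the change in node value to the feature that was split on.

```python
# Toy regression tree, hand-built for illustration (not scikit-learn's structure).
# Each node stores "value": the mean training target of the samples reaching it.
TREE = {
    "value": 20.0, "feature": 0, "threshold": 5.0,
    "left": {"value": 12.0},
    "right": {
        "value": 28.0, "feature": 1, "threshold": 2.0,
        "left": {"value": 24.0},
        "right": {"value": 32.0},
    },
}


def decompose(tree, x, n_features):
    """Return (prediction, bias, contributions) for a single sample x."""
    bias = tree["value"]  # value at the root node
    contributions = [0.0] * n_features
    node = tree
    while "feature" in node:  # descend until a leaf is reached
        feature = node["feature"]
        child = node["left"] if x[feature] <= node["threshold"] else node["right"]
        # the change in node value is credited to the feature that was split on
        contributions[feature] += child["value"] - node["value"]
        node = child
    return node["value"], bias, contributions


prediction, bias, contributions = decompose(TREE, [7.0, 1.0], n_features=2)
print(prediction, bias, contributions)  # 24.0 20.0 [8.0, -4.0]
```

Because the per-step differences telescope along the path, the leaf value always equals the root value plus the summed contributions.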

More usage examples at http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/.

Comments
  • New release

    Would it be possible for you to release a new version on pypi with the latest changes from the repository included? Version 0.2.2 is from December 2018 and since then a couple of merge requests have been merged into the master branch which would be nice to have in the pip installable version.

    Thanks in advance

    opened by Strandtasche 2
  • Python 3.6 + sklearn 0.24

    Hello. I am receiving this error with treeinterpreter version '0.1.0' and sklearn 0.24.0, just after doing from treeinterpreter import treeinterpreter as ti:

    ModuleNotFoundError                       Traceback (most recent call last)
    <ipython-input-729-679b5145dca2> in <module>
    ----> 1 from treeinterpreter import treeinterpreter as ti
          2 from sklearn.tree import DecisionTreeRegressor
          3 from sklearn.ensemble import RandomForestRegressor
    
    ~/miniconda3/envs/py36_ds_liv/lib/python3.6/site-packages/treeinterpreter/treeinterpreter.py in <module>
          3 import sklearn
          4 
    ----> 5 from sklearn.ensemble.forest import ForestClassifier, ForestRegressor
          6 from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier, _tree
          7 from distutils.version import LooseVersion
    
    ModuleNotFoundError: No module named 'sklearn.ensemble.forest'
    
    opened by fsan 2
  • Tests?

    How should tests be run?

    I get 3 test failures using python setup.py test or pytest using Python 3.6.

    ======================================================================
    FAIL: test_forest_regressor (tests.test_treeinterpreter.TestTreeinterpreter)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/c97346/p/treeinterpreter/tests/test_treeinterpreter.py", line 75, in test_forest_regressor
        self.assertTrue(np.allclose(base_prediction, pred))
    AssertionError: False is not true
    
    ======================================================================
    FAIL: test_forest_regressor_joint (tests.test_treeinterpreter.TestTreeinterpreter)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/c97346/p/treeinterpreter/tests/test_treeinterpreter.py", line 90, in test_forest_regressor_joint
        self.assertTrue(np.allclose(base_prediction, pred))
    AssertionError: False is not true
    
    ======================================================================
    FAIL: test_tree_regressor (tests.test_treeinterpreter.TestTreeinterpreter)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Users/c97346/p/treeinterpreter/tests/test_treeinterpreter.py", line 40, in test_tree_regressor
        self.assertTrue(np.allclose(base_prediction, pred))
    AssertionError: False is not true
    
    opened by jimfulton 2
  • Getting "ValueError: Wrong model type" when I try to run predict function

    I ran the following code, mostly cut-and-pasted from the unit tests, in a python 3 jupyter notebook:

    import numpy as np
    import unittest
    
    from sklearn.datasets import load_boston, load_iris
    from sklearn.ensemble import (RandomForestRegressor, RandomForestClassifier,
                                  ExtraTreesClassifier, ExtraTreesRegressor,)
    from sklearn.tree import (DecisionTreeClassifier, DecisionTreeRegressor,
                              ExtraTreeClassifier, ExtraTreeRegressor,)
    
    from treeinterpreter import treeinterpreter
    
    boston = load_boston()
    iris = load_iris()
    
    TreeRegressor = ExtraTreeRegressor
    X = boston.data
    Y = boston.target
    testX = X[int(len(X)/2):]
    
    #Predict for decision tree
    dt = TreeRegressor()
    dt.fit(X[:int(len(X)/2)], Y[:int(len(X)/2)])
    
    base_prediction = dt.predict(testX)
    pred, bias, contrib = treeinterpreter.predict(dt, testX)
    

    This gave me the following error:

    ~/.local/lib/python3.5/site-packages/treeinterpreter/treeinterpreter.py in predict(model, X, joint_contribution)
        204     else:
        205         raise ValueError("Wrong model type. Base learner needs to be \
    --> 206             DecisionTreeClassifier or DecisionTreeRegressor.")
    
    ValueError: Wrong model type. Base learner needs to be             DecisionTreeClassifier or DecisionTreeRegressor.
    

    Note that running isinstance(dt, DecisionTreeRegressor) returns True. Unfortunately, I don't know enough about python to understand why isinstance would give different answers when it is run in my notebook vs in treeinterpreter.predict.

    Do you guys happen to know how to fix this?

    opened by istorch 2
  • Fixed a crash on trees with depth of 0.

    A fix for issue #4.

    Can be reproduced by running the following:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from treeinterpreter import treeinterpreter as ti
    
    
    X = np.array([[0, 0], [1, 1]])
    Y = [0, 1]
    
    rf = RandomForestClassifier(n_estimators=10)
    rf = rf.fit(X, Y)
    
    prediction, bias, contributions = ti.predict(rf, X)
    
    opened by mickey946 1
  • fix IndexError if tree value is squeezed into a single float

    For some trees in sklearn, the package raises an error:

        return _predict_forest(model, X)
      File "python2.7/site-packages/treeinterpreter/treeinterpreter.py", line 96, in _predict_forest
        pred, bias, contribution = _predict_tree(tree, X)
      File "python2.7/site-packages/treeinterpreter/treeinterpreter.py", line 74, in _predict_tree
        biases[row] = values[path[0]]
    

    The error happens when the tree value is squeezed into a single float instead of a 1-D array of floats.

    opened by janrygl 1
  • Fix shape of pred in DecisionTreeRegressor; closes issue #32

    This closes #32 by ensuring that the shape of the output predictions from a DecisionTreeRegressor is a one-dimensional array rather than a 2D (n,1) array.

    opened by juliangilbey 0
  • Replace usage of ForestClassifier and ForestRegressor with concrete classes

    My bad, I did not properly check #29, and this now causes an

    ImportError: cannot import name 'ForestClassifier' from 'sklearn.ensemble' 
    

    upon loading this library.

    Sklearn 0.22+ seems to have removed public access to the base classes ForestClassifier and ForestRegressor. This PR changes the type checking to use the concrete implementations RandomForestClassifier, ExtraTreesClassifier, RandomForestRegressor and ExtraTreesRegressor. I think this is a better approach anyway, as sklearn may decide to add more forest-type models which we cannot guarantee to support.

    opened by iamDecode 0
  • Silence sklearn 0.22+ FutureWarning

    From sklearn 0.22 on, importing the treeinterpreter package yields the following FutureWarning:

    FutureWarning: The sklearn.ensemble.forest module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.ensemble. Anything that cannot be imported from sklearn.ensemble is now part of the private API.
    

    This PR changes the import to silence this warning, and allow for compatibility with sklearn 0.24 and up.

    opened by iamDecode 0
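
When a name has moved between library versions, one generic way to stay compatible is to try several import locations in order. The sketch below is not what the pull request does (it switches to the public import path); `import_first` is a hypothetical helper, and it is demonstrated with standard-library names so that it runs without scikit-learn installed.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError):
            continue  # this location does not exist here, try the next one
    raise ImportError("none of the candidates could be imported: %r" % (candidates,))


# Demonstration with standard-library names (an assumption made for runnability).
OrderedDict = import_first([
    ("collections.missing_module", "OrderedDict"),  # does not exist, skipped
    ("collections", "OrderedDict"),                 # found here
])
print(OrderedDict.__name__)  # OrderedDict
```

For the situation described above, the candidate list would presumably name a public location first, for example ("sklearn.ensemble", "RandomForestRegressor"), which is where the warning says such classes should be imported from.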
  • Optimizes mean calculation routine in treeinterpreter/treeinterpreter.py

    I tried to use treeinterpreter to calculate feature contribution components for a large dataset consisting of 55K rows with ~15K features each. Even though I parallelized the computation using Spark, I was not able to run the code successfully.

    One of the issues I faced was the tremendous amount of memory required by treeinterpreter for each run. It turned out that in my case most of the memory is used by _predict_forest to assemble lists of biases, contributions and predictions, which are later used to calculate the corresponding mean vectors. To improve the code's memory usage and runtime, I propose using an iterative method for computing averages, as summarized in http://www.heikohoffmann.de/htmlthesis/node134.html

    opened by VolodymyrOrlov 0
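
The iterative averaging idea mentioned in the pull request above can be sketched in a few lines. This is an illustration of the general technique, not the code from the pull request: `running_mean` is a hypothetical helper, and plain Python lists stand in for the per-tree arrays. Only the current mean is kept in memory, instead of a list holding every tree's result.

```python
def running_mean(vectors):
    """Average equal-length vectors one at a time, keeping only the current mean."""
    mean = None
    for k, vec in enumerate(vectors, start=1):
        if mean is None:
            mean = [float(v) for v in vec]
        else:
            # incremental update: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
            mean = [m + (v - m) / k for m, v in zip(mean, vec)]
    return mean  # None if no vectors were supplied


# Made-up per-tree outputs; an iterator is passed to show that one pass is enough.
per_tree_predictions = ([1.0, 4.0], [3.0, 8.0], [5.0, 0.0])
print(running_mean(iter(per_tree_predictions)))  # [3.0, 4.0]
```

The same update applies unchanged to biases and contributions, since each is averaged over trees independently.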
  • Add support for more sklearn trees

    Specifically, target support for sklearn's ExtraTreeRegressor, ExtraTreeClassifier, ExtraTreesRegressor, and ExtraTreesClassifier. This is implemented by removing explicit checks on type(model) == DecisionTreeRegressor, etc. and allowing subclasses to fit seamlessly within the treeinterpreter framework by having checks on isinstance(model, DecisionTreeRegressor).

    See http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees for more details on ExtraTrees estimators.

    XGBoost and other gradient-boosted estimators do not inherit from the DecisionTreeRegressor or ForestRegressor classes, and thus support is not provided for them through this PR, as seems to have been a concern in previous contributions.

    opened by micahjsmith 0
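
The difference between the two kinds of type check described in the pull request above can be shown with stand-in classes. The classes below are hypothetical placeholders that only mirror the inheritance relationship (in scikit-learn, ExtraTreeRegressor is a subclass of DecisionTreeRegressor); `check_exact` and `check_subclass` are illustrative names, not functions from treeinterpreter.

```python
class DecisionTreeRegressor:  # stand-in, not the scikit-learn class
    pass


class ExtraTreeRegressor(DecisionTreeRegressor):  # stand-in subclass
    pass


def check_exact(model):
    # exact type comparison: rejects subclasses
    return type(model) == DecisionTreeRegressor


def check_subclass(model):
    # isinstance: accepts the class and anything derived from it
    return isinstance(model, DecisionTreeRegressor)


model = ExtraTreeRegressor()
print(check_exact(model), check_subclass(model))  # False True
```

This also suggests why a "Wrong model type" error can appear for a model on which isinstance(model, DecisionTreeRegressor) returns True: an exact type comparison inside the library would still reject the subclass.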
  • Error when predicting with a RandomForest whose first trees were trained on only some of the data classes (batched training)

    Happens when training on batched data with warm_start = True and the data is unbalanced.

    Error:

    /Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
      return array(a, dtype, copy=False, order=order, subok=True)
    Traceback (most recent call last):
    ml-pipeline/src/treeint_simple_example.py", line 22, in <module>
        test_predict_prob, bias, contributions = ti.predict(rf, test_data.head(2))
      File "/Users/x/anaconda3/lib/python3.7/site-packages/treeinterpreter/treeinterpreter.py", line 212, in predict
        return _predict_forest(model, X, joint_contribution=joint_contribution)
      File "/Users/x/anaconda3/lib/python3.7/site-packages/treeinterpreter/treeinterpreter.py", line 166, in _predict_forest
        return (np.mean(predictions, axis=0), np.mean(biases, axis=0),
      File "<__array_function__ internals>", line 6, in mean
      File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3373, in mean
        out=out, **kwargs)
      File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 144, in _mean
        arr = asanyarray(a)
      File "/Users/x/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 136, in asanyarray
        return array(a, dtype, copy=False, order=order, subok=True)
    ValueError: could not broadcast input array from shape (2,1) into shape (2)
    

    Reproduction:

    from sklearn.ensemble import RandomForestClassifier
    from treeinterpreter import treeinterpreter as ti
    import pandas as pd
    
    # Random forest that can train on chunks of data.
    rf = RandomForestClassifier(warm_start=True, n_estimators=1)
    
    # data of chunk1
    chunk1_data_vec = [0, 0]
    chunk1_df = pd.DataFrame(data={'label': chunk1_data_vec, 'features1': chunk1_data_vec, 'features2': chunk1_data_vec})
    # data of chunk2
    chunk2_data_vec = [0, 0, 1, 1, 0, 0, 1, 1]
    chunk2_df = pd.DataFrame(data={'label': chunk2_data_vec, 'features1': chunk2_data_vec, 'features2': chunk2_data_vec})
    
    
    # fit first chunk of data that has a single label
    rf.fit(X=chunk1_df.drop(['label'], axis='columns'), y=chunk1_df['label'])
    # fit second chunk of data that has 2 labels
    rf.n_estimators += 1
    rf.fit(X=chunk2_df.drop(['label'], axis='columns'), y=chunk2_df['label'])
    
    # test
    test_data = chunk2_df.drop(['label'], axis='columns')
    # regular predict
    rf.predict_proba(test_data)
    # tree interpreter predict
    test_predict_prob, bias, contributions = ti.predict(rf, test_data.head(2))
    
    
    opened by aiah123 0
  • Minimum python version?

    Is there a minimum Python version recommended for this package? Would be useful to know it for future conda-forge builds, see for instance https://github.com/conda-forge/treeinterpreter-feedstock/pull/1#issuecomment-757550583

    opened by philip-khor 0
  • Support for pipeline objects

    Currently, sklearn's pipeline objects cannot be used directly in treeinterpreter.

    A possible workaround is provided here. Would it be possible to support it natively?

    opened by venkyyuvy 0
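
The workaround linked in the issue above is not reproduced on this page. One plausible approach, shown here purely as an assumption, is to run the data through every pipeline step except the last and then hand the final estimator and the transformed data to the interpreter. `split_pipeline` is a hypothetical helper; it relies only on the `steps` attribute (a list of (name, estimator) pairs) that scikit-learn pipelines expose, and stand-in objects are used so the sketch runs without scikit-learn.

```python
def split_pipeline(pipeline, X):
    """Apply every step except the last to X; return (final_estimator, transformed X)."""
    *transform_steps, (_, final_estimator) = pipeline.steps
    for _, step in transform_steps:
        if step is None or step == "passthrough":
            continue  # placeholder steps leave the data unchanged
        X = step.transform(X)
    return final_estimator, X


# Stand-in objects so the sketch runs without scikit-learn installed.
class DoubleValues:
    def transform(self, X):
        return [[2 * v for v in row] for row in X]


class FakePipeline:
    steps = [("scale", DoubleValues()), ("model", "fitted-forest-placeholder")]


model, X_transformed = split_pipeline(FakePipeline(), [[1, 2], [3, 4]])
print(model, X_transformed)  # fitted-forest-placeholder [[2, 4], [6, 8]]
```

With a real fitted pipeline, the returned pair would then be passed on as ti.predict(model, X_transformed). The contributions would refer to the transformed features rather than the original columns.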
  • Most recent version not installed with pip

    I still get the sklearn deprecation warning about sklearn.ensemble.forest, and when I ran pip install --upgrade treeinterpreter, I got the message that I was already up-to-date.

    opened by serenalotreck 1
  • Performance?

    Hi,

    I'm working on a project where treeinterpreter is taking ~2 minutes per prediction.

    I suppose this is because the implementation is in pure Python.

    Do you know if anyone has looked at porting this to C (or cython or whatever) to make it go faster?

    Jim

    opened by jimfulton 1
Owner
Ando Saabas
A library for debugging/inspecting machine learning classifiers and explaining their predictions

ELI5 ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions. It provides support for the following m

null 2.6k Dec 30, 2022
Lime: Explaining the predictions of any machine learning classifier

lime This project is about explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predict

Marco Tulio Correia Ribeiro 10.3k Jan 1, 2023
Using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" 🧠 (ICLR 2019)

Hierarchical neural-net interpretations (ACD) 🧠 Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Offic

Chandan Singh 111 Jan 3, 2023
Making decision trees competitive with neural networks on CIFAR10, CIFAR100, TinyImagenet200, Imagenet

Neural-Backed Decision Trees · Site · Paper · Blog · Video Alvin Wan, *Lisa Dunlap, *Daniel Ho, Jihan Yin, Scott Lee, Henry Jin, Suzanne Petryk, Sarah

Alvin Wan 556 Dec 20, 2022
An intuitive library to add plotting functionality to scikit-learn objects.

Welcome to Scikit-plot Single line functions for detailed visualizations The quickest and easiest way to go from analysis... ...to this. Scikit-plot i

Reiichiro Nakano 2.3k Dec 31, 2022
Interpretability and explainability of data and machine learning models

AI Explainability 360 (v0.2.1) The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datase

null 1.2k Dec 29, 2022
Portal is the fastest way to load and visualize your deep neural networks on images and videos 🔮

Portal is the fastest way to load and visualize your deep neural networks on images and videos 🔮

Datature 243 Jan 5, 2023
Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM

Class Activation Map methods implemented in Pytorch pip install grad-cam ⭐ Comprehensive collection of Pixel Attribution methods for Computer Vision.

Jacob Gildenblat 6.5k Jan 1, 2023
Algorithms for monitoring and explaining machine learning models

Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The focus of the library is to provide high-qual

Seldon 1.9k Dec 30, 2022
Bias and Fairness Audit Toolkit

The Bias and Fairness Audit Toolkit Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers

Data Science for Social Good 513 Jan 6, 2023
Visual analysis and diagnostic tools to facilitate machine learning model selection.

Yellowbrick Visual analysis and diagnostic tools to facilitate machine learning model selection. What is Yellowbrick? Yellowbrick is a suite of visual

District Data Labs 3.9k Dec 30, 2022
A collection of infrastructure and tools for research in neural network interpretability.

Lucid Lucid is a collection of infrastructure and tools for research in neural network interpretability. We're not currently supporting tensorflow 2!

null 4.5k Jan 7, 2023
Visualizer for neural network, deep learning, and machine learning models

Netron is a viewer for neural network, deep learning and machine learning models. Netron supports ONNX (.onnx, .pb, .pbtxt), Keras (.h5, .keras), Tens

Lutz Roeder 20.9k Dec 28, 2022
tensorboard for pytorch (and chainer, mxnet, numpy, ...)

tensorboardX Write TensorBoard events with simple function call. The current release (v2.1) is tested on anaconda3, with PyTorch 1.5.1 / torchvision 0

Tzu-Wei Huang 7.5k Jan 7, 2023
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, Korean, Chinese, German and Easy to adapt for other languages)

TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, FastSpeech2 based-on TensorFlow 2. With Tensorflow 2, we can speed-up training/inference progress, optimizer further by using fake-quantize aware and pruning, make TTS models can be run faster than real-time and be able to deploy on mobile devices or embedded systems.

null 3k Jan 4, 2023
A collection of research papers and software related to explainability in graph machine learning.

A collection of research papers and software related to explainability in graph machine learning.

AstraZeneca 1.9k Dec 26, 2022
Quickly and easily create / train a custom DeepDream model

Dream-Creator This project aims to simplify the process of creating a custom DeepDream model by using pretrained GoogleNet models and custom image dat

null 56 Jan 3, 2023
Implementation of linear CorEx and temporal CorEx.

Correlation Explanation Methods Official implementation of linear correlation explanation (linear CorEx) and temporal correlation explanation (T-CorEx

Hrayr Harutyunyan 34 Nov 15, 2022