Contrastive Explanation (Foil Trees), developed at TNO/Utrecht University

Overview

Contrastive Explanation (Foil Trees)

Contrastive and counterfactual explanations for machine learning (ML)

Marcel Robeer (2018-2020), TNO/Utrecht University

Travis (.org) License Python version

Contents

  1. Introduction
  2. Publications: citing this package
  3. Example usage
  4. Documentation: choices for problem explanation
  5. License

Introduction

Contrastive Explanation provides an explanation for why an instance had the current outcome (fact) rather than a targeted outcome of interest (foil). These counterfactual explanations limit the explanation to the features relevant in distinguishing fact from foil, thereby disregarding irrelevant features. The idea of contrastive explanations is captured in this Python package ContrastiveExplanation. Example facts and foils are:

Machine Learning (ML) type Problem Explainable AI (XAI) question Fact Foil
Classification Determine type of animal Why is this instance a cat rather than a dog? Cat Dog
Regression analysis Predict students' grade Why is the predicted grade for this student 6.5 rather than higher? 6.5 More than 6.5
Clustering Find similar flowers Why is this flower in cluster 1 rather than cluster 4? Cluster 1 Cluster 4

Publications

One scientific paper was published on Contrastive Explanation / Foil Trees:

  • J. van der Waa, M. Robeer, J. van Diggelen, M. Brinkhuis, and M. Neerincx, "Contrastive Explanations with Local Foil Trees", in 2018 Workshop on Human Interpretability in Machine Learning (WHI 2018), 2018, pp. 41-47. [Online]. Available: http://arxiv.org/abs/1806.07470

It was developed as part of a Master's Thesis at Utrecht University / TNO:

Citing this package

@inproceedings{vanderwaa2018,
  title={{Contrastive Explanations with Local Foil Trees}},
  author={van der Waa, Jasper and Robeer, Marcel and van Diggelen, Jurriaan and Brinkhuis, Matthieu and Neerincx, Mark},
  booktitle={2018 Workshop on Human Interpretability in Machine Learning (WHI)},
  year={2018}
}

Example usage

As a simple example, let us explain a Random Forest classifier that determine the type of flower in the well-known Iris flower classification problem. The data set comprises 150 instances, each one of three types of flowers (setosa, versicolor and virginica). For each instance, the data set includes four features (sepal length, sepal width, petal length, petal width) and the goal is to determine which type of flower (class) each instance is.

Steps

First, train the 'black-box' model to explain

from sklearn import datasets, model_selection, ensemble
seed = 1

# Train black-box model on Iris data
data = datasets.load_iris()
train, test, y_train, y_test = model_selection.train_test_split(data.data, 
                                                                data.target, 
                                                                train_size=0.80, 
                                                                random_state=seed)
model = ensemble.RandomForestClassifier(random_state=seed)
model.fit(train, y_train)

Next, perform contrastive explanation on the first test instance (test[0]) by wrapping the tabular data in a DomainMapper, and then using method ContrastiveExplanation.explain_instance_domain()

# Contrastive explanation
import contrastive_explanation as ce

dm = ce.domain_mappers.DomainMapperTabular(train, 
                                           feature_names=data.feature_names,
					   contrast_names=data.target_names)
exp = ce.ContrastiveExplanation(dm, verbose=True)

sample = test[0]
exp.explain_instance_domain(model.predict_proba, sample)

[OUT] "The model predicted 'setosa' instead of 'versicolor' because 'sepal length (cm) <= 6.517 and petal width (cm) <= 0.868'"

The predicted class using the RandomForestClassifier was 'setosa', while the second most probable class 'versicolor' may have been expected instead. The difference of why the current instance was classified 'setosa' is because its sepal length is less than or equal to 6.517 centimers and its petal width is less than or equal to 0.868 centimers. In other words, if the instance would keep all feature values the same, but change its sepal width to more than 6.517 centimers and its petal width to more than 0.868 centimers, the black-box classifier would have changed the outcome to 'versicolor'.

More examples

For more examples, check out the attached Notebook.

Documentation

Several choices can be made to tailor the explanation to your type of explanation problem.

Choices for problem explanation

FactFoil

Used for determining the current outcome (fact) and the outcome of interest (foil), based on a foil_method (e.g. second most probable class, random class, greater than the current outcome). Foils can also be manually selected by using the foil=... optional argument of the ContrastiveExplanation.explain_instance_domain() method.

FactFoil Description foil_method
FactFoilClassification (default) Determine fact and foil for classification/unsupervised learning second, random
FactFoilRegression Determine fact and foil for regression analysis greater, smaller
Explanators

Method for forming the explanation, either using a Foil Tree (TreeExplanator) as described in the paper, or using a prototype (PointExplanator, not fully implemented). As multiple explanations hold, one can choose the foil_strategy as either 'closest' (shortest explanation), 'size' (move the current outcome to the area containing most samples of the foil outcome), 'impurity' (most informative foil area), or 'random' (random foil area)

Explanator Description foil_strategy
TreeExplanator (default) Foil Tree: Explain using a decision tree closest, size, impurity, random
PointExplanator Explain with a representatitive point (prototype) of the foil class closest, medoid, random
Domain Mappers

For handling the different types of data:

  • Tabular (rows and columns)
  • Images (rudimentary support)

Maps to a general format that the explanator can form the explanation in, and then maps the explanation back into this format. Ensures meaningful feature names.

DomainMapper Description
DomainMapperTabular Tabular data (columns with feature names, rows)
DomainMapperPandas Uses a pandas dataframe to create a DomainMapperTabular, while automatically inferring feature names
DomainMapperImage Image data

License

ContrastiveExplanation is BSD-3 Licensed.

Comments
  • Changing output every run

    Changing output every run

    Hi,

    When I run the exp.explain_instance_domain(model.predict_proba, sample) , I get different output everytime. After every run, a different column is given as output.

    Is there a way I can get constant results?

    opened by Subh1m 5
  • TypeError in the example notebook

    TypeError in the example notebook

    running the example notebook (Contrastive explanation - example usage), the line exp.explain_instance_domain(model.predict_proba, sample) gives the following error: image

    opened by Naviden 3
  • Not working for multi-valued categorical features

    Not working for multi-valued categorical features

    Does the current implementation support only binary-valued categorical features?

    Because I tried with the adult income dataset which has many multi-value categorical and continuous features (https://archive.ics.uci.edu/ml/datasets/adult) and got output like these:

    "The model predicted '<=50k' instead of '>50k' because 'hours_per_week <= 42.832 and not occupation and age <= 34.95 and not education and hours_per_week <= 57.892'"

    Here, education and occupation are not binary features - they have many levels.

    bug enhancement 
    opened by raam93 3
  • Always getting the warning

    Always getting the warning "UserWarning: Could not find a difference between fact...", with blank explenations - for any dataset, and every sample.

    I am trying to exactly recreate the example from the README for the Iris-dataset. Unforunately, when running .explain_instance_domain(model.predict_proba, sample) I get the following output:

    [F] Picked foil "1" using foil selection strategy "second" [D] Obtaining neighborhood data C:\Users\dsemkoandrosenko\contrastive_explanation\contrastive_explanation.py:264: UserWarning: Could not find a difference between fact "setosa" and foil "versicolor" warnings.warn(f'Could not find a difference between fact ' "The model predicted 'setosa' instead of 'versicolor' because ''"

    I get the same issue with every single other sample, and even every other dataset I try. What could be the issue?

    Versions: Windows 10 Python: 3.7.4 Scikit-Learn: 0.21.3 Numpy: 1.18.2

    bug 
    opened by mlds2020 2
  • Create DOI on Zenodo

    Create DOI on Zenodo

    I wanted to ask if you considered to create a DOI for the Contrastive Explanation. This allows researchers to reference a version of the package with ease and can be done with Zenodo for example. There is also great integration between Zenodo, which creates a new DOI and a persistent copy of the repository for each release, and Github. You can find instructions on how to create a DOI here from official Github docs: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content This step helps with visibility of this repository and therefore making your research software more used.

    On a similar note, it also helps researchers to know how to correctly cite the software. I see that you already added this in the readme but Github also offers the citation file format which also shows up on the top right if used: image You can find more information about it here: https://citation-file-format.github.io/

    I would be happy to help with this if you have any questions.

    opened by kequach 1
  • Trying to understand output

    Trying to understand output

    Hi

    I have been trying your approach for a regression problem with categorical features. I receive an explanation in form:

    The model predicted 123 because sales < 1445 and not month ( dummy example)

    month is a categorical variable with values "1", "2", .. "12".

    What does it mean " and not month" then ?

    Thank you

    opened by andreysharapov 0
  • Explanations of Clustering algorithms

    Explanations of Clustering algorithms

    Hi, I was wondering if the Clustering algorithm is supported or not. I see you mentioned it in the README but looking at the code, I can't find it anywhere. Thanks :)

    opened by Naviden 0
  • Specify Features

    Specify Features

    Hi, thank you for the great package. I would like to know is there a way to specify which features to change? For example I would like to see what I need to change only for specific features?

    Thank you

    enhancement 
    opened by arsine1996 0
Releases(v0.2)
Owner
M.J. Robeer
M.J. Robeer
Graph-total-spanning-trees - A Python script to get total number of Spanning Trees in a Graph

Total number of Spanning Trees in a Graph This is a python script just written f

Mehdi I. 0 Jul 18, 2022
School of Artificial Intelligence at the Nanjing University (NJU)School of Artificial Intelligence at the Nanjing University (NJU)

F-Principle This is an exercise problem of the digital signal processing (DSP) course at School of Artificial Intelligence at the Nanjing University (

Thyrix 5 Nov 23, 2022
Among AIs is a (prototype of) PC Game we developed as part of the Smart Applications course @ University of Pisa.

Among AIs is a PC Game we developed as part of the Smart Applications course @ Department of Computer Science of University of Pisa, under t

Gabriele Pisciotta 5 Dec 5, 2021
Discord bot developed by Delhi University Student Community!

DUSC-Bot Discord bot developed by Delhi University Student Community! Libraries Used Pycord - Documentation Features Can purge messages in bulk Drop-D

professor 1 Jan 29, 2022
A version of nrsc5-gui that merges the interface developed by cmnybo with the architecture developed by zefie in order to start a new baseline that is not heavily dependent upon Python processing.

NRSC5-DUI is a graphical interface for nrsc5. It makes it easy to play your favorite FM HD radio stations using an RTL-SDR dongle. It will also displa

null 61 Dec 22, 2022
Implementation of the paper "Shapley Explanation Networks"

Shapley Explanation Networks Implementation of the paper "Shapley Explanation Networks" at ICLR 2021. Note that this repo heavily uses the experimenta

null 68 Dec 27, 2022
CausaLM: Causal Model Explanation Through Counterfactual Language Models

CausaLM: Causal Model Explanation Through Counterfactual Language Models Authors: Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart Abstract: Understan

Amir Feder 39 Jul 10, 2022
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms

CARLA - Counterfactual And Recourse Library CARLA is a python library to benchmark counterfactual explanation and recourse models. It comes out-of-the

Carla Recourse 200 Dec 28, 2022
This repository contains the implementation of the paper: "Towards Frequency-Based Explanation for Robust CNN"

RobustFreqCNN About This repository contains the implementation of the paper "Towards Frequency-Based Explanation for Robust CNN" arxiv. It primarly d

Sarosij Bose 2 Jan 23, 2022
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.

A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code

Louis-François Bouchard 2.9k Jan 8, 2023
Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)

Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model

JHJu 2 Jan 18, 2022
NitroSniper - A discord nitro sniper, it uses 2 account tokens here's the explanation

Discord-Nitro-Sniper This is a discord nitro sniper, it uses 2 account tokens he

vanis / 1800 0 Jan 20, 2022
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

Denis Emelin 42 Nov 24, 2022
Saeed Lotfi 28 Dec 12, 2022
Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

This is a Python implementation of cover trees, a data structure for finding nearest neighbors in a general metric space (e.g., a 3D box with periodic

Patrick Varilly 28 Nov 25, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 4, 2023
Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

PyStanfordDependencies Python interface for converting Penn Treebank trees to Universal Dependencies and Stanford Dependencies. Example usage Start by

David McClosky 64 May 8, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 5.7k Feb 12, 2021
It is a forest of random projection trees

rpforest rpforest is a Python library for approximate nearest neighbours search: finding points in a high-dimensional space that are close to a given

Lyst 211 Dec 29, 2022