A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Benedek Rozemberczki

Last update: Dec 27, 2022


Overview


Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble.

The library consists of various methods to compute (or approximate) the Shapley value of players (models) in weighted voting games (ensemble games), a class of transferable utility cooperative games. We cover exact enumeration-based computation and various widely known approximation methods from economics and computer science research papers. There is also functionality to quantify the heterogeneity of the player pool based on the Shapley entropy. In addition, the framework comes with detailed documentation, an intuitive tutorial, 100% test coverage and illustrative toy examples.
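To make the idea concrete, here is a minimal pure-Python sketch of exact enumeration for a single weighted voting game. It does not use the library; the function names `exact_shapley` and `shannon_entropy` are hypothetical. Two assumptions are baked in: a coalition wins once its total weight reaches the quota (`>=`), and the Shapley entropy is the Shannon entropy of the Shapley values. The library may define either differently.

```python
from itertools import permutations
from math import factorial, log


def exact_shapley(weights, quota):
    """Share of player orderings in which each player is pivotal.

    Assumption: a coalition wins once its total weight is >= quota.
    """
    n_players = len(weights)
    pivot_counts = [0] * n_players
    for order in permutations(range(n_players)):
        running_total = 0
        for player in order:
            running_total += weights[player]
            if running_total >= quota:
                # This player turns a losing coalition into a winning one.
                pivot_counts[player] += 1
                break
    n_orderings = factorial(n_players)
    return [count / n_orderings for count in pivot_counts]


def shannon_entropy(values):
    """Shannon entropy (natural log); assumed stand-in for the Shapley entropy."""
    return -sum(v * log(v) for v in values if v > 0)


# Integer weights keep the toy example free of floating point edge cases.
weights = [5, 3, 2]
quota = 6
values = exact_shapley(weights, quota)
entropy = shannon_entropy(values)
```

With these weights the first player is pivotal in 4 of the 6 orderings, so the values are 2/3, 1/6 and 1/6. The loop visits all n! orderings, which is why approximation methods matter for larger ensembles.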

Citing

If you find Shapley useful in your research please consider adding the following citation:

@misc{rozemberczki2021shapley,
      title = {{The Shapley Value of Classifiers in Ensemble Games}}, 
      author = {Benedek Rozemberczki and Rik Sarkar},
      year = {2021},
      eprint = {2101.02153},
      archivePrefix = {arXiv},
      primaryClass = {cs.LG}
}

A simple example

Shapley makes solving voting games quite easy - see the accompanying tutorial. For example, this is all it takes to solve a weighted voting game defined on the fly with permutation sampling:

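import numpy as np
from shapley import PermutationSampler

# One game with 7 players: random weights, normalized to sum to 1.
W = np.random.uniform(0, 1, (1, 7))
W = W/W.sum()
# Quota of the weighted voting game.
q = 0.5

# Approximate the Shapley values with permutation sampling.
solver = PermutationSampler()
solver.solve_game(W, q)
shapley_values = solver.get_solution()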
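The example above builds a single game, so dividing by `W.sum()` normalizes the only row. If several games are stacked as rows of `W` (an assumption about the input layout, not something the example confirms), the weights would need to be normalized per row instead. A NumPy-only sketch of that step, with a hypothetical 3 games and 7 players:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 3 games (rows), 7 players (columns).
W = rng.uniform(0, 1, (3, 7))

# Row-wise normalization: every game's weights sum to 1.
W_rowwise = W / W.sum(axis=1, keepdims=True)

# Global normalization, as in the single-game example: the whole matrix sums to 1,
# so the individual rows no longer do.
W_global = W / W.sum()

row_sums_rowwise = W_rowwise.sum(axis=1)
row_sums_global = W_global.sum(axis=1)
```

Here `row_sums_rowwise` contains ones, while `row_sums_global` contains fractions that only add up to one across all games.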

Methods Included

In detail, the following methods can be used.

Expected Marginal Contribution Approximation from Fatima et al.: A Linear Approximation Method for the Shapley Value
Multilinear Extension from Owen: Multilinear Extensions of Games
Monte Carlo Permutation Sampling from Maleki et al.: Bounding the Estimation Error of Sampling-based Shapley Value Approximation
Exact Enumeration from Shapley: A Value for N-Person Games
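As an illustration of the sampling idea behind the Monte Carlo method in the list above, here is a NumPy sketch that estimates Shapley values for one weighted voting game by drawing random player orderings and counting how often each player is pivotal. It is not the library's implementation; the function name `sample_shapley`, its parameters and the `>=` winning rule are assumptions made for this example.

```python
import numpy as np


def sample_shapley(weights, quota, n_permutations=10_000, seed=0):
    """Estimate Shapley values as pivot frequencies over random orderings.

    Assumptions: non-negative weights; a coalition wins once its total
    weight is >= quota.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    n_players = weights.shape[0]
    pivot_counts = np.zeros(n_players)
    for _ in range(n_permutations):
        order = rng.permutation(n_players)
        cumulative = np.cumsum(weights[order])
        # First position at which the running total reaches the quota.
        pivot_position = np.searchsorted(cumulative, quota, side="left")
        if pivot_position < n_players:
            pivot_counts[order[pivot_position]] += 1
    return pivot_counts / n_permutations


estimate = sample_shapley([5, 3, 2], quota=6, n_permutations=20_000, seed=0)
```

For this three-player game the exact values are 2/3, 1/6 and 1/6, so the estimate can be compared against them directly. The error shrinks as `n_permutations` grows, at the cost of more sampled orderings.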

Head over to our documentation to find out more about installation, creation of datasets and a full list of implemented methods and available datasets. For a quick start, check out the examples in the examples/ directory.

If you notice anything unexpected, please open an issue. If you are missing a specific method, feel free to open a feature request.

Installation

$ pip install shapley

Running tests

$ python setup.py test

Running examples

$ cd examples
$ python permutation_sampler_example.py

License

MIT License

Comments

Error in running MLE example

Thank you for sharing your great work. I truly enjoyed reading it. However, I met an error when I tried the example. It seems to be fine for the MC example.

$ python multilinear_extension_example.py
RuntimeWarning: invalid value encountered in true_divide
  self._Phi = self._Phi / np.sum(self._Phi, axis=1).reshape(-1, 1)
Traceback (most recent call last):
  File "multilinear_extension_example.py", line 11, in <module>
    solver.solve_game(W, q)
  File "/lib/python3.6/site-packages/shapley/solvers/multilinear_extension.py", line 34, in solve_game
    self._run_sanity_check(W, self._Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 28, in _run_sanity_check
    self._verify_distribution(Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 22, in _verify_distribution
    assert np.sum(Phi) - Phi.shape[0] < 0.001
AssertionError

opened by xxlya
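One plausible reading of the traceback above, offered as a guess rather than a diagnosis: if a row of the unnormalized values sums to zero, the row-wise division produces 0/0, which NumPy reports as an invalid value and fills with NaN. Any comparison against NaN is false, so the final assertion fails. The sketch below reproduces that mechanism with made-up numbers; the array `Phi` is hypothetical and no library code is called.

```python
import numpy as np

# Hypothetical unnormalized values: the second game has an all-zero row.
Phi = np.array([[0.2, 0.3, 0.5],
                [0.0, 0.0, 0.0]])

# Same normalization expression as in the warning; 0/0 yields NaN.
with np.errstate(invalid="ignore"):
    normalized = Phi / np.sum(Phi, axis=1).reshape(-1, 1)

# Same check as in the failing assertion; NaN makes the comparison False.
check_passes = bool(np.sum(normalized) - normalized.shape[0] < 0.001)
```

In this sketch `check_passes` is `False` because the NaN row propagates into the sum. Whether a zero row is what actually happens inside the multilinear extension solver is not something the traceback alone can confirm.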

Releases

v_10003 (Apr 28, 2022)
Moves the Shapley library to an ABC based design.
Adds a version attribute.

v_10002 (May 16, 2021)

v_10001 (Feb 1, 2021)
Fixed the expectations and variances.

v_10000 (Dec 31, 2020)
The official first release of Shapley.
