A library that implements fairness-aware machine learning algorithms

Overview

Themis ML

themis-ml is a Python library built on top of pandas and scikit-learn that implements fairness-aware machine learning algorithms.

Fairness-aware Machine Learning

themis-ml defines discrimination as the preference (bias) for or against a set of social groups that results in the unfair treatment of its members with respect to some outcome.

It defines fairness as the inverse of discrimination, and in the context of a machine learning algorithm, this is measured by the degree to which the algorithm's predictions favor one social group over another in relation to an outcome that holds socioeconomic, political, or legal importance, e.g. the denial/approval of a loan application.

Whether an algorithm is "fair" depends on how we define fairness. For example, if we define fairness as statistical parity, a fair algorithm is one in which the proportion of approved loans among minorities is equal to the proportion of approved loans among white people.
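To make the statistical-parity definition concrete, here is a minimal sketch that compares approval rates between two groups. It assumes binary predictions `y_pred` (1 = approved) and a binary group indicator `s` (1 = disadvantaged, 0 = advantaged); the function names and the `tol` parameter are hypothetical illustrations, not part of the themis-ml API.

```python
import numpy as np


def approval_rates(y_pred, s):
    """Return (approval rate of advantaged group s=0, approval rate of disadvantaged group s=1)."""
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    return float(y_pred[s == 0].mean()), float(y_pred[s == 1].mean())


def satisfies_statistical_parity(y_pred, s, tol=0.0):
    """Statistical parity holds when the two approval rates differ by at most `tol`."""
    rate_adv, rate_dis = approval_rates(y_pred, s)
    return bool(abs(rate_adv - rate_dis) <= tol)


# Example: 3 of 4 advantaged applicants approved vs. 1 of 4 disadvantaged applicants.
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
print(approval_rates(y_pred, s))                # (0.75, 0.25)
print(satisfies_statistical_parity(y_pred, s))  # False
```

In practice exact equality is rarely achieved, which is why the sketch exposes a tolerance rather than testing for strict equality.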

Features

Here are a few of the discrimination discovery and fairness-aware techniques that this library implements.

Measuring Discrimination

  • Mean difference
  • Normalized mean difference
  • Consistency
  • Situation Test Score
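A minimal sketch of three of these measures follows. It rests on assumptions that are not taken from the themis-ml source: `y` holds binary outcomes, `s` is a binary protected attribute with 1 marking the disadvantaged group, mean difference is p(y+|s0) - p(y+|s1), the normalized version divides by the largest mean difference attainable given p(y+), p(s0) and p(s1), and consistency uses the k-nearest-neighbour formulation 1 - mean |ŷ_i - mean of ŷ over the k nearest neighbours of x_i|. All function names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def mean_difference(y, s):
    """p(y+ | s=0) - p(y+ | s=1); positive values favour the advantaged group s=0."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s)
    return float(y[s == 0].mean() - y[s == 1].mean())


def normalized_mean_difference(y, s):
    """Mean difference divided by the maximum value it could take given the base rates."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s)
    p_pos = y.mean()
    d_max = min(p_pos / np.mean(s == 0), (1 - p_pos) / np.mean(s == 1))
    return 0.0 if d_max == 0 else mean_difference(y, s) / float(d_max)


def consistency(X, y_pred, k=5):
    """1 - mean |y_hat_i - mean(y_hat of the k nearest neighbours of x_i)|."""
    X = np.asarray(X, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    # Calling kneighbors() without X excludes each point from its own neighbour list.
    _, idx = nn.kneighbors()
    neighbour_mean = y_pred[idx].mean(axis=1)
    return float(1 - np.abs(y_pred - neighbour_mean).mean())


y = [1, 0, 0, 0, 0, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
print(mean_difference(y, s))             # 0.25
print(normalized_mean_difference(y, s))  # 1.0, the most skewed split these base rates allow

X = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
print(consistency(X, [0, 0, 0, 1, 1, 1], k=2))  # 1.0, neighbours always share a label
```

The normalization makes scores comparable across datasets with different base rates: a raw mean difference of 0.25 can be as bad as it gets when only 12.5% of outcomes are positive.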

Mitigating Discrimination

Preprocessing

  • Relabelling (Massaging)
  • Reweighting
  • Sampling
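Reweighting is the simplest of these to sketch. Assuming the common formulation in which every (group, label) cell receives the weight w(s, y) = p(s)p(y) / p(s, y), so that s and y look statistically independent in the weighted data, a minimal version could look like this (the function name is hypothetical):

```python
import numpy as np


def reweighting_weights(y, s):
    """Per-observation weights w(s, y) = p(s) * p(y) / p(s, y)."""
    y = np.asarray(y)
    s = np.asarray(s)
    w = np.empty(len(y), dtype=float)
    for s_val in np.unique(s):
        for y_val in np.unique(y):
            cell = (s == s_val) & (y == y_val)
            if cell.any():
                # Expected cell frequency under independence / observed cell frequency.
                w[cell] = np.mean(s == s_val) * np.mean(y == y_val) / np.mean(cell)
    return w


y = [1, 1, 1, 0, 1, 0, 0, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]
w = reweighting_weights(y, s)
print(w)
```

Under-represented cells (here, positive outcomes in the disadvantaged group) receive weights above 1. The resulting array can be passed as `sample_weight` to the `fit` method of scikit-learn estimators that accept it, such as `LogisticRegression`.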

Model Estimation

  • Additive Counterfactually Fair Estimator
  • Prejudice Remover Regularized Estimator

Postprocessing

  • Reject Option Classification
  • Discrimination-aware Ensemble Classification

Datasets

themis-ml also provides utility functions for loading freely available datasets from a variety of sources.

Installation

The source code is currently hosted on GitHub: https://github.com/cosmicBboy/themis-ml. You can install the latest released version with conda or pip.

# conda
conda install -c cosmicbboy themis-ml

If you install with pip, you'll need to install scikit-learn, numpy, and pandas with either pip or conda. Version requirements:

  • numpy (>= 1.9.0)
  • scikit-learn (>= 0.19.1)
  • pandas (>= 0.22.0)

# pip
pip install themis-ml

Documentation

Official documentation for this package can be found here

References

You can find a complete set of references for the discrimination discovery and fairness-aware methods implemented in themis-ml in this paper.

Comments
  • Implement "Additive Counterfactually Fair" estimator

    The main idea is to:

    • train linear models, using some linear estimator M, to predict each feature x_j from the protected class attribute s;
    • compute the residuals epsilon_ij between the predicted and true feature values for each observation i and each feature j;
    • train the final model on the epsilon_ij as input features to predict the target y.
    opened by cosmicBboy 1
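The three steps above can be sketched with scikit-learn. This is a hypothetical illustration, not the themis-ml estimator: it assumes `LinearRegression` as the linear estimator M, `LogisticRegression` as the final model, a numeric feature matrix `X` and a binary `s`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


class AdditiveCounterfactualSketch:
    """Hypothetical sketch of the additive counterfactually fair idea (not the themis-ml class).

    1. Regress every feature x_j on the protected attribute s (linear estimator M).
    2. Replace each x_ij by its residual epsilon_ij = x_ij - x_hat_ij.
    3. Fit the final estimator on the residuals to predict y.
    """

    def __init__(self, final_estimator=None):
        self.final_estimator = (
            final_estimator if final_estimator is not None else LogisticRegression()
        )

    def fit(self, X, y, s):
        S = np.asarray(s, dtype=float).reshape(-1, 1)
        # A multi-output LinearRegression fits one linear model per feature column.
        self.feature_model_ = LinearRegression().fit(S, np.asarray(X, dtype=float))
        self.final_estimator.fit(self.residuals(X, s), y)
        return self

    def residuals(self, X, s):
        S = np.asarray(s, dtype=float).reshape(-1, 1)
        return np.asarray(X, dtype=float) - self.feature_model_.predict(S)

    def predict(self, X, s):
        return self.final_estimator.predict(self.residuals(X, s))


rng = np.random.default_rng(0)
n = 500
s = rng.integers(0, 2, size=n)
# Feature 0 is shifted by s; feature 1 is unrelated to s.
X = np.column_stack([rng.normal(size=n) + 2 * s, rng.normal(size=n)])
y = (X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

model = AdditiveCounterfactualSketch().fit(X, y, s)
eps = model.residuals(X, s)
print(eps[s == 0].mean(axis=0), eps[s == 1].mean(axis=0))  # both approximately [0, 0]
```

With a binary s and an intercept, the fitted values of M are the per-group feature means, so the residuals carry no group-level shift that the final model could exploit.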
  • Create utility function to load German Credit dataset

    Create a function german_dataset that outputs a pandas DataFrame with the German Credit data found here:

    https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)

    opened by cosmicBboy 1
  • add census income data.

    fixes #31. This diff adds a function to read in census income data from 1994-1995, taken from https://archive.ics.uci.edu/ml/datasets/Census-Income+(KDD)

    opened by cosmicBboy 0
  • bound mean difference CI metrics to -1, 1 range

    This revision adds constraints to the mean difference scores so that the confidence interval bounds of both the mean difference and the normalized mean difference lie within -1 and 1.

    opened by cosmicBboy 0
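    The intent of this revision can be illustrated with a normal-approximation (Wald) interval for a difference of two proportions, clipped to [-1, 1]. The interval formula, `z=1.96`, and the function name are assumptions for illustration; the revision itself does not say how the intervals are computed.

```python
import numpy as np


def mean_difference_ci(y, s, z=1.96):
    """Mean difference p(y+|s=0) - p(y+|s=1) with a Wald interval clipped to [-1, 1]."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s)
    y_adv, y_dis = y[s == 0], y[s == 1]
    p_adv, p_dis = y_adv.mean(), y_dis.mean()
    md = p_adv - p_dis
    se = np.sqrt(p_adv * (1 - p_adv) / len(y_adv) + p_dis * (1 - p_dis) / len(y_dis))
    # Without clipping, the upper bound can exceed 1 when md is close to 1.
    lower, upper = np.clip([md - z * se, md + z * se], -1.0, 1.0)
    return float(md), float(lower), float(upper)


# 19 of 20 advantaged vs. 0 of 20 disadvantaged positive outcomes:
y = [1] * 19 + [0] + [0] * 20
s = [0] * 20 + [1] * 20
print(mean_difference_ci(y, s))  # unclipped upper bound would be about 1.045
```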
  • conda build recipe, pip install deps, py3.6 for ci

    This revision does the following:

    • Adds a conda build recipe in the conda.recipe directory
    • Adds PyPI install dependencies
    • Adds Makefile convenience targets for building conda packages on Python 2.7 and 3.6
    • Adds support for Python 3.6
    • Adds Travis CI for Python 3.6

    opened by cosmicBboy 0
  • Implement "Reject Option Classification" post-processing technique

    Single Classifier Setting

    • Train an initial classifier on dataset D.
    • Generate predicted probabilities on the test set.
    • Compute the proximity of each prediction to the decision boundary learned by the classifier.
    • Within the critical region around the decision boundary, defined by a threshold theta where 0.5 < theta < 1, assign X_s1 (disadvantaged observations) to y+ and X_s0 (advantaged observations) to y-.

    Multi-classifier Setting

    ROC in the multiple classifier setting is similar to the single classifier setting, except that the predicted probabilities are defined as the weighted average of the probabilities generated by each classifier C_k (k = 1, ..., K, where K is the number of classifiers trained). The weights can be defined as:

    • the accuracy of each classifier on the data
    • uniform (i.e. take the mean of the predictions)
    opened by cosmicBboy 0
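    Both settings can be sketched in a few lines of NumPy. This is an illustration, not the themis-ml implementation; it assumes that an observation lies in the critical region when max(p, 1 - p) <= theta, where p is its predicted probability of y+, and that `s == 1` marks the disadvantaged group. Function names are hypothetical.

```python
import numpy as np


def ensemble_proba(probas, weights=None):
    """Weighted average of P(y+ | x) across K classifiers.

    probas: array-like of shape (K, n_samples); weights: e.g. per-classifier
    accuracies, or None for a uniform mean.
    """
    return np.average(np.asarray(probas, dtype=float), axis=0, weights=weights)


def reject_option_relabel(proba_pos, s, theta=0.7):
    """Relabel predictions inside the critical region in favour of the disadvantaged group.

    Assumption: an observation is in the critical region when max(p, 1 - p) <= theta.
    s == 1 marks the disadvantaged group, s == 0 the advantaged group.
    """
    if not 0.5 < theta < 1:
        raise ValueError("theta must satisfy 0.5 < theta < 1")
    proba_pos = np.asarray(proba_pos, dtype=float)
    s = np.asarray(s)
    y_pred = (proba_pos >= 0.5).astype(int)  # default decision outside the critical region
    critical = np.maximum(proba_pos, 1 - proba_pos) <= theta
    y_pred[critical & (s == 1)] = 1  # disadvantaged -> y+
    y_pred[critical & (s == 0)] = 0  # advantaged -> y-
    return y_pred


# Single-classifier setting: probabilities from one model.
print(reject_option_relabel([0.9, 0.6, 0.45, 0.2], [0, 0, 1, 1], theta=0.7))  # [1 0 1 0]

# Multi-classifier setting: average two models' probabilities first, weighted by accuracy.
combined = ensemble_proba([[0.8, 0.4], [0.6, 0.2]], weights=[0.9, 0.1])
print(reject_option_relabel(combined, [0, 1], theta=0.7))  # [1 1]
```

A larger theta widens the critical region, trading more accuracy for a smaller gap between groups.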
  • Implement "Massaging"/"Relabelling" data transformation technique

    This technique relabels the target variable using a function that can compute a decision boundary in the input data space.

    • The top n -ve labelled observations in the disadvantaged group s1 that are closest to the decision boundary are "promoted" to the +ve label.

    • The top n +ve labelled observations in the advantaged group s0 that are closest to the decision boundary are "demoted" to the -ve label.

    • n is the number of promotions/demotions needed to make p(+|s0) = p(+|s1)

    opened by cosmicBboy 0
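    A minimal sketch of the relabelling rule, under stated assumptions: a `LogisticRegression` ranker's predicted probability stands in for the decision-boundary function, "closest to the boundary" means the highest scores among s1 negatives and the lowest scores among s0 positives, and n is obtained by solving (P0 - n)/|s0| = (P1 + n)/|s1|, which gives n = |s0||s1|(p(+|s0) - p(+|s1)) / (|s0| + |s1|). The function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def massage_labels(X, y, s, ranker=None):
    """Promote/demote n labels so that p(+|s0) = p(+|s1) (s == 1 is disadvantaged)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(int)
    s = np.asarray(s).astype(int)
    ranker = ranker if ranker is not None else LogisticRegression()
    scores = ranker.fit(X, y).predict_proba(X)[:, 1]  # P(+ | x) used for ranking

    n0, n1 = np.sum(s == 0), np.sum(s == 1)
    p0, p1 = y[s == 0].mean(), y[s == 1].mean()
    n_changes = int(round(n0 * n1 * (p0 - p1) / (n0 + n1)))
    y_new = y.copy()
    if n_changes <= 0:
        return y_new  # no discrimination against s1 to correct

    promote_pool = np.flatnonzero((s == 1) & (y == 0))
    demote_pool = np.flatnonzero((s == 0) & (y == 1))
    n_changes = min(n_changes, len(promote_pool), len(demote_pool))
    # Negatives most likely to be positive, and positives least likely to be positive.
    promote = promote_pool[np.argsort(-scores[promote_pool])[:n_changes]]
    demote = demote_pool[np.argsort(scores[demote_pool])[:n_changes]]
    y_new[promote] = 1
    y_new[demote] = 0
    return y_new


rng = np.random.default_rng(0)
s = np.repeat([0, 1], 100)
y = np.concatenate([np.repeat([1, 0], [60, 40]), np.repeat([1, 0], [20, 80])])
X = rng.normal(size=(200, 3)) + y[:, None]
y_new = massage_labels(X, y, s)
print(y_new[s == 0].mean(), y_new[s == 1].mean())  # both 0.4 after 20 promotions and 20 demotions
```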
  • Create stratified metrics for mean difference and normalized mean difference

    The purpose of this issue is to add support for stratified mean difference and normalized mean difference so that we can control for other explanatory (or confounding) factors that may be driving the mean difference in outcome y between the advantaged and disadvantaged groups s_+ and s_-.

    opened by cosmicBboy 0
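    One way to stratify is to compute the mean difference within each level of a confounder and then combine the per-stratum values. The sketch below assumes a size-weighted average, binary `s` with 1 marking the disadvantaged group, and skips strata that lack one of the groups; the function and column names are hypothetical.

```python
import pandas as pd


def stratified_mean_difference(df, y_col, s_col, strata_col):
    """Mean difference p(y+|s=0) - p(y+|s=1) computed within each stratum,
    then averaged with weights proportional to stratum size.

    Strata that do not contain both groups are skipped.
    """
    per_stratum, sizes = {}, {}
    for stratum, group in df.groupby(strata_col):
        rates = group.groupby(s_col)[y_col].mean()
        if 0 in rates.index and 1 in rates.index:
            per_stratum[stratum] = rates.loc[0] - rates.loc[1]
            sizes[stratum] = len(group)
    per_stratum = pd.Series(per_stratum, dtype=float)
    sizes = pd.Series(sizes, dtype=float)
    overall = float((per_stratum * sizes).sum() / sizes.sum())
    return overall, per_stratum


df = pd.DataFrame({
    "stratum": ["A"] * 4 + ["B"] * 4 + ["C"] * 2,
    "s": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "y": [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
})
overall, per_stratum = stratified_mean_difference(df, "y", "s", "stratum")
print(overall)      # 0.25
print(per_stratum)  # A: 0.5, B: 0.0 (C is skipped: no s=1 rows)
```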
  • Implement "Discrimination-aware Ensemble Classification" postprocessing

    DAEC is like #10, with a relabelling rule similar to ROC's, but it re-assigns any prediction on which the ensemble classifiers disagree.

    For example, if an observation was positively labelled and the ensemble classifiers disagree on the predicted label, then the prediction would be negative.

    opened by cosmicBboy 0
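    A minimal sketch of the disagreement rule, assuming the ROC-style group rule for the disagreement region (s == 1 disadvantaged gets y+, s == 0 advantaged gets y-) and 0/1 predictions stacked as a (K, n_samples) array. Both the rule choice and the function name are assumptions, not the themis-ml implementation.

```python
import numpy as np


def daec_relabel(ensemble_preds, s):
    """Keep predictions where all K classifiers agree; where they disagree,
    assign y+ to the disadvantaged group (s == 1) and y- to the advantaged group (s == 0).

    ensemble_preds: array-like of shape (K, n_samples) with 0/1 labels.
    """
    preds = np.asarray(ensemble_preds).astype(int)
    s = np.asarray(s)
    agree = np.all(preds == preds[0], axis=0)
    y_pred = preds[0].copy()  # where all agree, any row gives the consensus label
    y_pred[~agree & (s == 1)] = 1
    y_pred[~agree & (s == 0)] = 0
    return y_pred


preds = [[1, 1, 0, 0],
         [1, 0, 1, 0]]
print(daec_relabel(preds, s=[0, 0, 1, 1]))  # [1 0 1 0]
```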
  • Implement "Prejudice Remover Regularized" Estimator

    PRR is an optimization technique that extends the standard L1/L2-norm regularization method by adding a prejudice index term to the objective function. This term is equivalent to normalized mutual information, which measures the degree to which the predictions ŷ and the protected attribute s depend on each other.

    Its values range from 0 to 1, where 0 means that ŷ and s are independent and 1 means that they are fully dependent. The goal of the objective function is to find model parameters that minimize the difference between ŷ and y as well as the degree to which ŷ depends on s. See the reference below for exact implementation details.

    Reference:
    Kamishima, T., Akaho, S., Asoh, H., & Sakuma, J. (2012). Fairness-aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases, 35-50.
    http://www.kamishima.net/archive/2011-ws-icdm_padm.pdf
    
    opened by cosmicBboy 0
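    The prejudice index term on its own can be sketched as the mutual information between ŷ and s. This covers only the measurement, not the regularized estimator; the normalization by sqrt(H(ŷ)H(s)) is an assumption, and the function names are hypothetical. scikit-learn's `normalized_mutual_info_score` with `average_method="geometric"` is used as a cross-check.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score


def _entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def prejudice_index(y_hat, s):
    """Mutual information between predictions y_hat and protected attribute s (in nats)."""
    y_hat, s = np.asarray(y_hat), np.asarray(s)
    pi = 0.0
    for yv in np.unique(y_hat):
        for sv in np.unique(s):
            p_joint = np.mean((y_hat == yv) & (s == sv))
            if p_joint > 0:
                pi += p_joint * np.log(p_joint / (np.mean(y_hat == yv) * np.mean(s == sv)))
    return float(pi)


def normalized_prejudice_index(y_hat, s):
    """Prejudice index scaled to [0, 1] by sqrt(H(y_hat) * H(s)) (assumed normalization)."""
    denom = np.sqrt(_entropy(y_hat) * _entropy(s))
    return 0.0 if denom == 0 else prejudice_index(y_hat, s) / float(denom)


s = [0, 0, 0, 1, 1, 1]
y_hat = [0, 0, 1, 1, 1, 0]
print(normalized_prejudice_index(y_hat, s))
# Cross-check against scikit-learn's geometric-mean NMI.
print(normalized_mutual_info_score(s, y_hat, average_method="geometric"))
```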
Releases (0.0.4)
  • 0.0.1 (Sep 22, 2017)

    This is the initial release of themis-ml. It exposes the following functionality with respect to fairness-aware machine learning methods:

    Measuring Discrimination

    Group-level measures

    • Mean difference
    • Normalized mean difference

    Mitigating Discrimination

    Preprocessing

    • Relabelling (Massaging)
    • Reweighting

    Model Estimation

    • Additive Counterfactually Fair Estimator

    Postprocessing

    • Reject Option Classification

    Datasets

    • German Credit Dataset
Owner
Niels Bantilan
Data Scientist, Machine Learning Engineer