
Overview
========


FairML: Auditing Black-Box Predictive Models

FairML is a Python toolbox for auditing machine learning models for bias.

Description

Predictive models are increasingly being deployed to determine access to services such as credit, insurance, and employment. Despite the gains in efficiency and productivity these models bring, potential systemic flaws have not been fully addressed, particularly the potential for unintentional discrimination on the basis of race, gender, religion, sexual orientation, or other characteristics. This project addresses the question: how can an analyst determine the relative significance of the inputs to a black-box predictive model in order to assess the model's fairness (or discriminatory extent)?

We present FairML, an end-to-end toolbox for auditing predictive models by quantifying the relative significance of the model's inputs. FairML leverages model compression and four input-ranking algorithms to quantify a model's relative predictive dependence on its inputs. The relative significance of the inputs to a predictive model can then be used to assess the fairness (or discriminatory extent) of such a model. With FairML, analysts can more easily audit cumbersome predictive models that are difficult to interpret.
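
To make the perturbation idea concrete, here is a minimal, illustrative sketch (not FairML's actual implementation, whose procedure is more involved) of how one can probe a black-box model's dependence on each input: scramble one column at a time and measure how much the model's output moves.

# Hypothetical sketch of perturbation-based auditing, for intuition only.
import numpy as np

def perturbation_dependence(predict_fn, X, number_of_runs=10, seed=0):
    """Estimate each column's influence by shuffling it and measuring
    the mean absolute change in the model's predictions."""
    rng = np.random.default_rng(seed)
    baseline = predict_fn(X)
    scores = {}
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(number_of_runs):
            X_perturbed = X.copy()
            rng.shuffle(X_perturbed[:, j])  # destroy column j's signal
            deltas.append(np.mean(np.abs(predict_fn(X_perturbed) - baseline)))
        scores[j] = deltas  # one list of size number_of_runs per column
    return scores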

Installation

You can pip install this package directly from GitHub (i.e., this repo) using the following command:

pip install https://github.com/adebayoj/fairml/archive/master.zip

or you can clone the repository and install from source:

git clone https://github.com/adebayoj/fairml.git

sudo python setup.py install
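
After installation, you can sanity-check that the package is importable (this assumes the install completed cleanly):

python -c "from fairml import audit_model"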

Methodology
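
The README does not spell out the methodology here, but the author's thesis (quoted in the Comments section below) describes four feature-ranking algorithms, among them Iterative Orthogonal Feature Projection. As a hedged illustration of the orthogonalization step such a procedure relies on, the sketch below removes one column's linear component from every other column via least-squares residuals; FairML's own obtain_orthogonal_transformed_matrix may differ in detail.

# Illustrative sketch only: make every other column orthogonal to
# column j by subtracting its least-squares projection onto column j.
import numpy as np

def orthogonalize_against(X, j):
    X = X.astype(float).copy()
    f = X[:, j] - X[:, j].mean()   # centered feature of interest
    denom = f @ f
    if denom == 0:                 # constant column: nothing to remove
        return X
    for k in range(X.shape[1]):
        if k == j:
            continue
        beta = (f @ X[:, k]) / denom
        X[:, k] -= beta * f        # residual is orthogonal to f
    return X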

Code Demo

We now show how to use the fairml Python package to audit a black-box model.

"""
First we import modules for model building and data
processing.
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

"""
Now, we import the two key methods from fairml.
audit_model takes:

- (required) black-box function, which is the model to be audited
- (required) sample_data to be perturbed for querying the function. This has to be a pandas dataframe with no missing data.

- other optional parameters that control the mechanics of the auditing process, for example:
  - number_of_runs : number of iterations to perform
  - interactions : flag to enable checking model dependence on interactions.

audit_model returns an overloaded dictionary whose keys are the column names of the input pandas dataframe and whose values are lists containing the model's dependence on that particular feature. These lists are of size number_of_runs.

"""
from fairml import audit_model
from fairml import plot_dependencies

Above, we provide a quick explanation of the key fairml functionality. Now we move into building an example model that we'd like to audit.

# read in the propublica data to be used for our analysis.
propublica_data = pd.read_csv(
    filepath_or_buffer="./doc/example_notebooks/"
    "propublica_data_for_fairml.csv")

# create feature and design matrix for model building.
compas_rating = propublica_data.score_factor.values
propublica_data = propublica_data.drop("score_factor", axis=1)


# this is just for demonstration, any classifier or regressor
# can be used here. fairml only requires a predict function
# to diagnose a black-box model.

# we fit a quick and dirty logistic regression sklearn
# model here.
clf = LogisticRegression(penalty='l2', C=0.01)
clf.fit(propublica_data.values, compas_rating)

Now let's audit the model built with FairML.

# call audit_model with the fitted model's predict function
total, _ = audit_model(clf.predict, propublica_data)

# print feature importance
print(total)

# generate feature dependence plot
fig = plot_dependencies(
    total.get_compress_dictionary_into_key_median(),
    reverse_values=False,
    title="FairML feature dependence"
)
plt.savefig("fairml_ldp.eps", transparent=False, bbox_inches='tight')

The demo above produces the feature dependence plot saved as fairml_ldp.eps.

Feel free to email the authors with any questions:
Julius Adebayo ([email protected])

Data

The data used for the demo above is available in the repo at: /doc/example_notebooks/propublica_data_for_fairml.csv

Comments
  • Code not consistent with thesis

    Hi, in your thesis you say that four different algorithms, listed below, are run to rank the importance of all features. Each ranking algorithm produces a score for each feature. The four sets of scores are then aggregated into a combined score for each feature, and the final ranking of a feature is determined by its combined score. The results can then be scaled and plotted. The ranking algorithms consist of:

    1. Iterative Orthogonal Feature Projection Algorithm (IOFP)
    2. minimum Redundancy, Maximum Relevance Feature Selection (mRMR)
    3. Least Absolute Shrinkage and Selection Operator (LASSO)
    4. Random Forest (RF)

    However, in the audit_model function, which seems to be the main function and is implemented in orthogonal_projection.py, you seem to only implement 1 and 2 and then return the results as 2 dictionaries. Am I correct that there is an inconsistency between the documentation and the code, or have I misunderstood something?

    opened by kazemia · 2 comments
  • Partial Code Review

    We went over the code and put in a lot of stubs for how things should change syntactically. Major takeaways are:

    • [ ] PEP 8 compliance
    • [ ] Move the perturbation strategies into their own file
    • [ ] Specify perturbation strategies with a callable
    • [ ] Change audit_model to directly take a predict function
    • [ ] Add input sanitization to audit_model
    • [ ] Change the setup.py file to use requirements.txt
    • [ ] Consistent use of newlines
    • [ ] Create an AuditModel object so you get pretty printing of audit output (maybe the second output of audit_model should also be wrapped?)
    opened by mynameisfiber · 1 comment
  • FIX: the problem of pos_color and negative_color always being reversed

    Hi! I found a problem where the plotted color is always the opposite of the specified color.

    I set pos_color="#3DE8F7" (blue) and negative_color="#ff4d4d" (red). The colors are always reversed, whether the argument reverse_values is True or False in graphing.py/plot_dependencies.

    So I found the cause and fixed it!

    It would be great if you could merge this fix.

    opened by mei28 · 0 comments
  • ValueError: cannot reshape array of size 34712 into shape (17356,)

    Hi @adebayoj

    I just tried to use FairML to explain my XGBoost model, and I got the aforementioned error.

    I think the error is logical, because in the mse function the reshape is done as if this were a one-class situation. However, in my case output_constant_col and normal_black_box_output are of size (17356, 2), since I am working in a two-class classification scenario.

    Any ideas how to overcome the error mentioned?

    Thanks.

    opened by Dola47 · 1 comment
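
    One hedged workaround for this kind of shape mismatch (an editor's sketch, not a reply from the thread; xgb_model and data are placeholder names): audit a function that returns a one-dimensional score, such as the positive-class probability, instead of the raw two-column output.

    # Sketch: wrap a two-column classifier output into a 1-D score so
    # that code expecting shape (n_samples,) does not fail.
    def positive_class_score(model):
        def predict_fn(X):
            return model.predict_proba(X)[:, 1]  # probability of class 1
        return predict_fn

    # total, _ = audit_model(positive_class_score(xgb_model), data)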
  • Need clarification for function detect_feature_sign

    Hi Julius,

    I need clarification on the function detect_feature_sign.

    1. For calculating the feature dependence value, the input is transformed by the obtain_orthogonal_transformed_matrix function and the transformed input is sent to the model to obtain its output.
    2. For calculating the feature sign, the input is transformed by a different transformation before being sent to the model.

    The model output from steps 1 and 2 is different for the same feature.

    So can you please clarify why two different transformations are used for calculating the feature dependence value and the sign, respectively? Hoping to hear from you soon.

    Thank you in advance.

    opened by Princec711 · 0 comments
  • cannot import name 'plot_generic_dependence_dictionary'

    Hi, I have installed fairml, but when I do "from fairml import plot_generic_dependence_dictionary" I get the error "cannot import name 'plot_generic_dependence_dictionary'". Can you please let me know how to fix it? Thanks

    opened by YukunZhang · 5 comments
  • Data in public domain or can I use it?

    I am a law professor at the University of Houston Law Center publishing a book on data analysis using the Wolfram Language. I would like to place your data, with some minor optimizations for the Wolfram Language, on the Wolfram Data Repository for use by other researchers and readers of my book. I would of course give you full credit for developing the simplified data and cite ProPublica and Northpointe (now Equivant) as the originators of the data. I am not sure of the licensing status of your dataset and want to be sure there is not a problem. I can be reached directly at schandler @ uh . edu

    Thanks. A very interesting and important topic!

    opened by sethchandler · 1 comment