Algorithmic encoding of protected characteristics and its implications on disparities across subgroups

Team MIRA - BioMedIA

Last update: Oct 24, 2022

This repository contains the code for the paper

B. Glocker, S. Winzeck. Algorithmic encoding of protected characteristics and its implications on disparities across subgroups. 2021. under review. arXiv:2110.14755

Dataset

The CheXpert imaging dataset together with the patient demographic information used in this work can be downloaded from https://stanfordmlgroup.github.io/competitions/chexpert/.

Code

For running the code, we recommend setting up a dedicated Python environment.

Setup Python environment using conda

Create and activate a Python 3 conda environment:

conda create -n chexploration python=3
conda activate chexploration

Install PyTorch using conda:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

Setup Python environment using virtualenv

Create and activate a Python 3 virtual environment:

virtualenv -p python3 <path_to_envs>/chexploration
source <path_to_envs>/chexploration/bin/activate

Install PyTorch using pip:

pip install torch torchvision

Install additional Python packages:

pip install matplotlib jupyter pandas seaborn pytorch-lightning scikit-learn scikit-image tensorboard tqdm openpyxl
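After installation, it can help to confirm that the packages are importable from the active environment. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and the mapping reflects the fact that some packages are imported under a different name than the one passed to pip. The jupyter metapackage is left out because it mainly pulls in other components.

```python
from importlib.util import find_spec

# pip name -> import name; several differ from the name used with pip install.
PACKAGES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "seaborn": "seaborn",
    "pytorch-lightning": "pytorch_lightning",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "tensorboard": "tensorboard",
    "tqdm": "tqdm",
    "openpyxl": "openpyxl",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only locates the modules without importing them, so it runs quickly and does not require a GPU.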

How to use

To replicate the results presented in the paper, follow these steps:

1. Download the CheXpert dataset and copy the file train.csv to the datafiles folder
2. Download the CheXpert demographics data and copy the file CHEXPERT DEMO.xlsx to the datafiles folder
3. Run the notebook chexpert.sample.ipynb to generate the study data
4. Adjust the variable img_data_dir to point to the imaging data and run the following scripts
   - Run the script chexpert.disease.py to train a disease detection model
   - Run the script chexpert.sex.py to train a sex classification model
   - Run the script chexpert.race.py to train a race classification model
5. Run the notebook chexpert.predictions.ipynb to evaluate all three prediction models
6. Run the notebook chexpert.explorer.ipynb for the unsupervised exploration of feature representations
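Before running the notebooks, the two downloaded files can be checked for presence. This is a minimal sketch, assuming the datafiles folder sits in the current working directory; the helper missing_datafiles is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the steps above; the helper itself is hypothetical.
REQUIRED_FILES = ("train.csv", "CHEXPERT DEMO.xlsx")


def missing_datafiles(folder="datafiles", required=REQUIRED_FILES):
    """Return the required file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_datafiles()
    if missing:
        print("Missing from datafiles:", ", ".join(missing))
    else:
        print("All required files found; chexpert.sample.ipynb can be run next.")
```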

Additionally, there are scripts chexpert.sex.split.py and chexpert.race.split.py to run SPLIT on the disease detection model. The default setting in all scripts is to train a DenseNet-121 using the training data from all patients. The results for models trained on subgroups only can be produced by changing the path to the datafiles (e.g., using full_sample_train_white.csv and full_sample_val_white.csv instead of full_sample_train.csv and full_sample_val.csv).
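The switch between the full sample and a subgroup amounts to changing two file names. A small sketch of that naming pattern follows; the function sample_datafiles is hypothetical, and only the "white" suffix is confirmed by the example above, so any other subgroup name passed in is an assumption about how the files are named.

```python
def sample_datafiles(subgroup=None):
    """Build the train/val CSV names for the full sample or one subgroup.

    Hypothetical helper: the pattern is inferred from the file names
    full_sample_train_white.csv and full_sample_val_white.csv.
    """
    suffix = f"_{subgroup}" if subgroup else ""
    return (f"full_sample_train{suffix}.csv", f"full_sample_val{suffix}.csv")


# Default setting: training data from all patients.
train_csv, val_csv = sample_datafiles()

# Subgroup-only setting, matching the example in the text.
train_csv_white, val_csv_white = sample_datafiles("white")
```

In the scripts themselves, the corresponding change is made by editing the datafile paths by hand.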

Note, the Python scripts also contain code for running the experiments using a ResNet-34 backbone which requires less GPU memory.

Trained models

All trained models, feature embeddings and output predictions can be found here.

Funding sources

This work is supported through funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 757173, Project MIRA, ERC-2017-STG) and by the UKRI London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare.

License

This project is licensed under the Apache License 2.0.
