Source Code for ICSE 2022 Paper - ``Can We Achieve Fairness Using Semi-Supervised Learning?''

Last update: Dec 18, 2021

Fair-SSL

Ethical bias in machine learning models has become a matter of concern in the software engineering community. Most prior software engineering work concentrated on finding ethical bias in models rather than fixing it. After finding bias, the next step is mitigation. Prior researchers mainly used supervised approaches to achieve fairness. In the real world, however, data with trustworthy ground truth is hard to obtain, and the ground truth itself can contain human bias. Semi-supervised learning is a branch of machine learning that uses both labeled and unlabeled data to overcome data labeling challenges. In this work, we apply four popular semi-supervised techniques as pseudo-labelers to create fair classification models. Our framework, Fair-SSL, takes a very small amount (10%) of labeled data as input and generates pseudo-labels for the unlabeled data. We then synthetically generate new data points to balance the training data based on class and protected attribute, as proposed by Chakraborty et al. in FSE 2021. Finally, a classification model is trained on the balanced pseudo-labeled data and validated on test data. After experimenting on ten datasets and three learners, we found that Fair-SSL achieves performance similar to three other state-of-the-art bias mitigation algorithms. Whereas prior algorithms require a large amount of training data, Fair-SSL requires only 10% of the labeled training data. To the best of our knowledge, this is the first SE work where semi-supervised techniques are used to fight ethical bias in ML models.
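
The steps described above (keep 10% of the labels, pseudo-label the rest, balance by class and protected attribute) can be illustrated with a toy sketch. This is not the code from this repository: the pseudo-labeler here is a single-pass 1-nearest-neighbour rule standing in for the semi-supervised techniques, and the balancing step duplicates random rows instead of synthesizing new points. The function names, the synthetic data, and the choice of column 0 as the protected attribute are all assumptions made for illustration.

```python
import random
from collections import Counter

def nearest_label(x, labeled):
    """Return the label of the labeled row closest to x (1-nearest-neighbour)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(labeled, key=lambda pair: dist(pair[0], x))
    return label

def pseudo_label(labeled, unlabeled):
    """Attach a pseudo-label to every unlabeled row (hypothetical stand-in labeler)."""
    return [(x, nearest_label(x, labeled)) for x in unlabeled]

def balance_subgroups(rows, protected_index, rng):
    """Oversample until every (class, protected attribute) subgroup has equal size.

    Assumption: rows are duplicated at random here; the paper generates
    synthetic points instead.
    """
    groups = {}
    for features, label in rows:
        key = (label, features[protected_index])
        groups.setdefault(key, []).append((features, label))
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced

# Synthetic data: feature 0 is a binary protected attribute, feature 1 a score.
rng = random.Random(0)
data = []
for _ in range(200):
    protected = 1 if rng.random() < 0.7 else 0
    score = rng.random()
    data.append(([protected, score], 1 if score > 0.5 else 0))

n_labeled = len(data) // 10                      # keep labels for 10% of the rows
labeled = data[:n_labeled]
unlabeled = [features for features, _ in data[n_labeled:]]

train = labeled + pseudo_label(labeled, unlabeled)
balanced = balance_subgroups(train, protected_index=0, rng=rng)
counts = Counter((label, features[0]) for features, label in balanced)
print(counts)  # every (class, protected) subgroup has the same count
```

A classifier would then be trained on `balanced` and evaluated on held-out test data, as the description above states.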

Dataset Description -

1> Adult Income dataset - http://archive.ics.uci.edu/ml/datasets/Adult

2> COMPAS - https://github.com/propublica/compas-analysis

3> German Credit - https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

4> Bank Marketing - https://archive.ics.uci.edu/ml/datasets/bank+marketing

5> Default Credit - https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

6> Heart - https://archive.ics.uci.edu/ml/datasets/Heart+Disease

7> MEPS - https://meps.ahrq.gov/mepsweb/

8> Student - https://archive.ics.uci.edu/ml/datasets/Student+Performance

9> Home Credit - https://www.kaggle.com/c/home-credit-default-risk

Data Preprocessing -

We have used data preprocessing as suggested by IBM AIF360.
Rows containing missing values are ignored, continuous features are converted to categorical (e.g., age<25: young, age>=25: old), and non-numerical features are converted to numerical (e.g., male: 1, female: 0). Finally, all feature values are normalized (scaled to between 0 and 1).
For optimized preprocessing, please visit Optimized Preprocessing
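
As a rough illustration of the preprocessing steps listed above, here is a minimal pure-Python sketch. It is not the AIF360 implementation: the column names (`age`, `sex`, `hours`), the use of `None` for missing values, and the numeric coding young=0 / old=1 are assumptions chosen for the example.

```python
def preprocess(rows):
    """Apply the four steps to a list of dict rows (hypothetical column names)."""
    # 1. Ignore rows that contain a missing value (represented as None here).
    complete = [r for r in rows if all(v is not None for v in r.values())]

    # 2-3. Bin the continuous age feature and encode the non-numerical feature.
    encoded = [
        {
            "age": 0.0 if r["age"] < 25 else 1.0,       # young -> 0, old -> 1 (assumed coding)
            "sex": 1.0 if r["sex"] == "male" else 0.0,  # male: 1, female: 0
            "hours": float(r["hours"]),
        }
        for r in complete
    ]
    if not encoded:
        return []

    # 4. Min-max normalize every column into the range [0, 1].
    for col in list(encoded[0]):
        values = [r[col] for r in encoded]
        lo, hi = min(values), max(values)
        for r in encoded:
            r[col] = (r[col] - lo) / (hi - lo) if hi > lo else 0.0
    return encoded

rows = [
    {"age": 22, "sex": "male", "hours": 20},
    {"age": 40, "sex": "female", "hours": 40},
    {"age": None, "sex": "male", "hours": 35},   # dropped: missing age
    {"age": 31, "sex": "male", "hours": 60},
]
clean = preprocess(rows)
print(clean)
```

Constant columns are mapped to 0.0 to avoid a division by zero; that is a choice made for this sketch, not something the description above specifies.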
