Simple data balancing baselines for worst-group-accuracy benchmarks.

Overview

BalancingGroups

Code to replicate the experimental results from "Simple data balancing baselines achieve competitive worst-group-accuracy".

Replicating the main results

Installing dependencies

The easiest way to get a working environment for this repo is to create a conda environment with the following commands:

conda env create -f environment.yaml
conda activate balancinggroups

If conda is not available, please install the dependencies listed in the requirements.txt file.
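
For example, assuming pip is available in the active Python environment, the same dependencies can be installed with:

pip install -r requirements.txt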

Download, extract, and generate metadata for datasets

This script downloads, extracts, and formats the dataset metadata so that it works with the rest of the code out of the box.

python setup_datasets.py --download --data_path data

Launch jobs

To reproduce the experiments in the paper on a SLURM cluster:

# Launching 1400 combo seeds = 50 hparams x 4 datasets x 7 algorithms
# Each combo seed is run 5 times to compute error bars, totaling 7000 jobs
python train.py --data_path data --output_dir main_sweep --num_hparams_seeds 1400 --num_init_seeds 5 --partition <slurm_partition>

If you want to run the jobs locally, omit the --partition argument.
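
For instance, the same sweep can be launched locally by simply dropping the SLURM flag (the other arguments are unchanged):

python train.py --data_path data --output_dir main_sweep --num_hparams_seeds 1400 --num_init_seeds 5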

Parse results

The parse.py script can generate all of the plots and tables from the paper. By default, it generates the best test worst-group-accuracy table for each dataset/method. This script can be called while the experiments are still running.

python parse.py main_sweep

License

This source code is released under the CC-BY-NC license, included here.

Comments
  • Best Model Parameters

    Do you plan to release the best models for all the algorithms and datasets? Based on the paper, it is not clear what the best hyperparameter values were, since it only provides the mean and std over the top 5. It would help make the work more accessible to researchers with less compute if you could also release the models! Thanks :)

    opened by pratyushmaini 4
  • File Not Found for civilcomments train

    Are you using a different file for the train set? It does not seem to get downloaded automatically.

    FileNotFoundError: [Errno 2] No such file or directory: 'tr/civilcomments/civilcomments_fine.csv'


    File "train.py", line 57, in run_experiment
      loaders = get_loaders(args["data_path"], args["dataset"], args["batch_size"], args["method"])
    File "datasets.py", line 347, in get_loaders
      dataset_tr = Dataset(data_path, "tr", subsample_what, duplicates)
    File "datasets.py", line 235, in __init__
      super().__init__(split, data_path, subsample_what, duplicates, "fine")
    File "datasets.py", line 195, in __init__
      text = pd.read_csv(

    opened by pratyushmaini 3
  • An issue for the JTT code.

    Sorry to disturb you. I wonder why the upweighting code is written as "self.weights[i] += predictions.detach() * (self.hparams["up"] - 1)". I think this will upweight the correctly classified samples. (A sketch of the intended JTT reweighting appears after this list.)

    opened by LJSthu 2
  • Questions regarding the experiment on civilcomments

    I have some questions regarding the group assignments for the civilcomments dataset. In the paper, it is mentioned that coarse grouping is used, so there should only be two groups (whether the example mentions one of the identities or not).

    I have generated the metadata (metadata_civilcomments_coarse.csv) with setup_datasets.py. In this metadata file, it appears that there are 8 different groups in the column named "a" (values 0-7), which seems inconsistent with the two groups mentioned in the paper. From the code, it appears that only the training set uses the coarse grouping, while the validation and test sets use the fine grouping. I am wondering why it is designed this way, or whether I have misunderstood something?

    Thank you for your time in advance.

    opened by yangarbiter 2
  • Weight decay for BERT models

    Hi! I noticed that in your code for the BERT AdamW optimizer you only apply weight decay to parameters whose names contain the strings bias or LayerNorm.weight:

    https://github.com/facebookresearch/BalancingGroups/blob/72d31e56e168b8ab03348810d4c5bac0f8a90a7a/models.py#L41-L45

    The original group DRO code seems to do the opposite, applying weight decay to all parameters except those (see the parameter-grouping sketch after this list):

    https://github.com/kohpangwei/group_DRO/blob/master/train.py#L111-L114

    opened by izmailovpavel 0
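
For reference on the JTT question above: JTT ("Just Train Twice") is meant to upweight the examples that the identification model misclassifies. A minimal sketch of that intent, using made-up names (logits, targets, up) rather than the repository's actual variables:

import torch

# Hypothetical JTT-style reweighting (not the repo's code): samples the
# identification model misclassifies get weight `up`; the rest keep weight 1.
up = 20.0
logits = torch.randn(8, 2)                           # identification-model outputs
targets = torch.randint(0, 2, (8,))                  # ground-truth labels
errors = (logits.argmax(dim=1) != targets).float()   # 1.0 where misclassified
weights = 1.0 + errors * (up - 1.0)                  # 1 for correct, `up` for errors
print(weights)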
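
On the weight-decay question above: the conventional BERT fine-tuning recipe (and, per the issue, the original group DRO code) excludes bias and LayerNorm.weight from weight decay and decays everything else. A minimal sketch of that parameter grouping, assuming model is an already-constructed HuggingFace BERT model and using torch.optim.AdamW; this is an illustration, not the repository's code:

import torch

# `model` is assumed to be a HuggingFace BERT model whose parameter names
# contain "bias" and "LayerNorm.weight" where applicable.
no_decay = ["bias", "LayerNorm.weight"]
grouped_parameters = [
    {"params": [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     "weight_decay": 1e-2},   # decay all other weights
    {"params": [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},    # no decay for biases / LayerNorm weights
]
optimizer = torch.optim.AdamW(grouped_parameters, lr=1e-5)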
Owner
Meta Research