Simple data balancing baselines for worst-group-accuracy benchmarks.

Meta Research

Last update: Dec 2, 2022

Related tags

Deep Learning BalancingGroups

Overview

BalancingGroups

Code to replicate the experimental results from the paper "Simple data balancing baselines achieve competitive worst-group-accuracy".

Replicating the main results

Installing dependencies

The easiest way to get a working environment for this repo is to create a conda environment with the following commands:

conda env create -f environment.yaml
conda activate balancinggroups

If conda is not available, please install the dependencies listed in the requirements.txt file.

Download, extract and generate metadata for datasets

This script downloads and extracts the datasets and formats their metadata so that they work with the rest of the code out of the box.

python setup_datasets.py --download --data_path data

Launch jobs

To reproduce the experiments in the paper on a SLURM cluster:

# Launching 1400 combo seeds = 50 hparams x 4 datasets x 7 algorithms
# Each combo seed is run 5 times to compute error bars, totalling 7000 jobs
python train.py --data_path data --output_dir main_sweep --num_hparams_seeds 1400 --num_init_seeds 5 --partition <slurm_partition>

If you want to run the jobs locally, omit the --partition argument.
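The job counts quoted in the launch command can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the numbers are the ones stated in the comment of the command above.

```python
# Numbers taken from the comment in the launch command above.
num_hparams = 50       # hyperparameter draws per (dataset, algorithm) pair
num_datasets = 4
num_algorithms = 7
num_init_seeds = 5     # value passed to --num_init_seeds

# One "combo seed" per (hparams, dataset, algorithm) triple.
num_hparams_seeds = num_hparams * num_datasets * num_algorithms
total_jobs = num_hparams_seeds * num_init_seeds

print(num_hparams_seeds)  # 1400, the value passed to --num_hparams_seeds
print(total_jobs)         # 7000 jobs in total
```

Scaling down either factor is a simple way to estimate the cost of a smaller sweep before launching it.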

Parse results

The parse.py script can generate all of the plots and tables from the paper. By default, it generates the best test worst-group-accuracy table for each dataset/method. This script can be called while the experiments are still running.

python parse.py main_sweep
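For readers unfamiliar with the metric, worst-group accuracy is the minimum of the per-group accuracies. The sketch below illustrates that definition on toy data; it is not the code used by parse.py, and the function name, labels, predictions and group ids are all made up for illustration.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (worst accuracy, per-group accuracies) for parallel sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(y == p)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Toy labels, predictions and group ids, made up for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
groups = [0, 0, 1, 1, 2, 2]

worst, per_group = worst_group_accuracy(y_true, y_pred, groups)
print(per_group)  # {0: 0.5, 1: 1.0, 2: 0.5}
print(worst)      # 0.5
```

Average accuracy on this toy data is 4/6, which shows how a single aggregate number can hide the weaker groups.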

License

This source code is released under the CC-BY-NC license, included here.

Comments

Best Model Parameters

Do you plan to release the best models for all the algorithms and datasets? Based on the paper, it is not clear what were the best hyperparameter values since it provides mean and std over top 5. It would help make the work more accessible for researchers with lesser compute if you could also release the models! Thanks :)

opened by pratyushmaini 4

File Not Found for civilcomments train

Are you using a different file for train set? It does not seem to get automatically downloaded.

FileNotFoundError: [Errno 2] No such file or directory: 'tr/civilcomments/civilcomments_fine.csv'

  File "train.py", line 57, in run_experiment
    loaders = get_loaders(args["data_path"], args["dataset"], args["batch_size"], args["method"])
  File "datasets.py", line 347, in get_loaders
    dataset_tr = Dataset(data_path, "tr", subsample_what, duplicates)
  File "datasets.py", line 235, in __init__
    super().__init__(split, data_path, subsample_what, duplicates, "fine")
  File "datasets.py", line 195, in __init__
    text = pd.read_csv(

opened by pratyushmaini 3

An issue for the JTT code.

Sorry to disturb you. I wonder why the code for upweighting is written as "self.weights[i] += predictions.detach() * (self.hparams["up"] - 1)". I think this will upweight the correctly classified samples.
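The effect of the quoted update depends on what the multiplied tensor holds. The sketch below applies the same elementwise rule, weights[i] += mask[i] * (up - 1), to two hypothetical masks: one marking correctly classified samples and one marking misclassified samples. It uses plain Python lists, the helper name and toy batch are invented for illustration, and it makes no claim about which mask the repository actually computes.

```python
def upweight(weights, mask, up):
    """Apply weights[i] += mask[i] * (up - 1) elementwise; return a new list."""
    return [w + m * (up - 1) for w, m in zip(weights, mask)]


# Hypothetical toy batch: labels and hard predictions of a first-stage model.
labels = [0, 1, 1, 0]
preds = [0, 1, 0, 1]
up = 5.0
base = [1.0] * len(labels)  # every sample starts with weight 1

correct_mask = [float(p == y) for p, y in zip(preds, labels)]
wrong_mask = [float(p != y) for p, y in zip(preds, labels)]

weights_if_correct_mask = upweight(base, correct_mask, up)
weights_if_wrong_mask = upweight(base, wrong_mask, up)

print(weights_if_correct_mask)  # [5.0, 5.0, 1.0, 1.0]
print(weights_if_wrong_mask)    # [1.0, 1.0, 5.0, 5.0]
```

With a mask of misclassified samples the rule raises their weight from 1 to up, so the question comes down to how the multiplied quantity is defined before this line.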

opened by LJSthu 2

Questions regarding the experiment on civilcomments

I have some questions regarding the group assignments for the civilcomments dataset. The paper mentions that coarse grouping is used, so there should be only two groups (whether or not the example mentions one of the identities).

I have generated the metadata (metadata_civilcomments_coarse.csv) with setup_datasets.py. In this metadata file, it appears that there are 8 different groups in the column named "a" (value 0-7), which seems to be inconsistent with the two groups mentioned in the paper. From the code, it appears that only the training set is using the coarse grouping while the validation and testing sets are using fine grouping. I am wondering why it is designed this way or whether there are any places that I have misunderstood?

Thank you for your time in advance.
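One way to check an observation like this is to list the distinct values of the group column for each split. The sketch below does so with the standard csv module on a toy in-memory file. The column name "a" comes from the comment above; the "split" column name, the helper name and the toy rows are assumptions made for illustration and may not match the real metadata file.

```python
import csv
import io
from collections import defaultdict


def groups_per_split(csv_file, group_col="a", split_col="split"):
    """Map each split value to the sorted list of distinct group ids it contains."""
    seen = defaultdict(set)
    for row in csv.DictReader(csv_file):
        seen[row[split_col]].add(int(row[group_col]))
    return {split: sorted(ids) for split, ids in seen.items()}


# Toy stand-in for a metadata file; the rows and the "split" column are hypothetical.
toy_metadata = io.StringIO(
    "split,a\n"
    "0,0\n0,1\n"
    "1,0\n1,3\n1,7\n"
)

summary = groups_per_split(toy_metadata)
print(summary)  # {'0': [0, 1], '1': [0, 3, 7]}
```

To run it on a generated file, pass an open file handle (opened with newline="") and set split_col to whatever column the file actually uses for the split.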

opened by yangarbiter 2

Weight decay for BERT models

Hi! I noticed that in your code for the BERT AdamW optimizer, weight decay is applied only to parameters whose names contain the strings bias or LayerNorm.weight:

https://github.com/facebookresearch/BalancingGroups/blob/72d31e56e168b8ab03348810d4c5bac0f8a90a7a/models.py#L41-L45

The original group DRO code seems to do the opposite: it applies weight decay to all parameters except those:

https://github.com/kohpangwei/group_DRO/blob/master/train.py#L111-L114
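The difference between the two conventions described here can be illustrated by filtering parameter names. This is a sketch of the name-matching logic only, not the code from either linked repository: the helper name and the parameter names are hypothetical, and only the two substrings come from the comment above.

```python
NO_DECAY = ("bias", "LayerNorm.weight")  # substrings quoted in the comment above


def split_by_decay(param_names, decay_matching):
    """Split names into (decayed, not_decayed).

    decay_matching=True  : decay only names containing a NO_DECAY substring.
    decay_matching=False : decay every name except those.
    """
    matches = [n for n in param_names if any(nd in n for nd in NO_DECAY)]
    others = [n for n in param_names if not any(nd in n for nd in NO_DECAY)]
    return (matches, others) if decay_matching else (others, matches)


# Hypothetical parameter names in the style of a BERT encoder layer.
names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.output.LayerNorm.weight",
    "encoder.layer.0.output.LayerNorm.bias",
]

decayed_a, skipped_a = split_by_decay(names, decay_matching=True)
decayed_b, skipped_b = split_by_decay(names, decay_matching=False)

print(decayed_a)  # the bias and LayerNorm parameters
print(decayed_b)  # only the query weight matrix
```

In an actual optimizer setup, each of the two lists would typically become its own parameter group with a different weight_decay value.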

opened by izmailovpavel 0

Owner

Meta Research

GitHub

Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

Balancing Training for Multilingual Neural Machine Translation Implementation of the paper Balancing Training for Multilingual Neural Machine Translat

21 May 18, 2022

Code for the paper: Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization (https://arxiv.org/abs/2002.11798)

Representation Robustness Evaluations Our implementation is based on code from MadryLab's robustness package and Devon Hjelm's Deep InfoMax. For all t

19 Dec 7, 2022

Robustness between the worst and average case

Robustness between the worst and average case A repository that implements intermediate robustness training and evaluation from the NeurIPS 2021 paper

10 Dec 10, 2021

Multi Task RL Baselines

MTRL Multi Task RL Algorithms Contents Introduction Setup Usage Documentation Contributing to MTRL Community Acknowledgements Introduction M

171 Jan 9, 2023

Baselines for TrajNet++

TrajNet++ : The Trajectory Forecasting Framework PyTorch implementation of Human Trajectory Forecasting in Crowds: A Deep Learning Perspective TrajNet

183 Jan 5, 2023

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

4.7k Jan 1, 2023

Provide baselines and evaluation metrics of the task: traffic flow prediction

Note: This repo is adopted from https://github.com/UNIMIBInside/Smart-Mobility-Prediction. Due to technical reasons, I did not fork their code. Introd

11 Nov 2, 2022

Code for paper: Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

Group-CAM By Zhang, Qinglong and Rao, Lu and Yang, Yubin [State Key Laboratory for Novel Software Technology at Nanjing University] This repo is the o

98 Nov 16, 2022

BC3407-Group-5-Project - BC3407 Group Project With Python

BC3407-Group-5-Project As the world struggles to contain the ever-changing varia

1 Jan 26, 2022

This is the unofficial code of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes, which achieves a state-of-the-art trade-off between accuracy and speed on Cityscapes and CamVid without using inference acceleration or extra data

Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes Introduction This is the unofficial code of Deep Dual-re

113 Dec 23, 2022

Code and model benchmarks for "SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology"

NeurIPS 2020 SEVIR Code for paper: SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology Requirement

USAF - MIT Artificial Intelligence Accelerator

46 Dec 15, 2022
