Code and experiments for the ACL-IJCNLP 2021 paper "Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering."

Overview

Mind Your Outliers!

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning
Annual Meeting of the Association for Computational Linguistics (ACL-IJCNLP) 2021.

Code & Experiments for training various models and performing active learning on a variety of VQA datasets and splits, plus additional code for creating and visualizing dataset maps for qualitative analysis!

If there are any trained models you want access to that aren't easy for you to train, please let me know and I will do my best to get them to you. Unfortunately, finding a hosting solution for 1.8TB of checkpoints hasn't been easy 😅.


Quickstart

The following clones vqa-outliers to the current working directory, then walks through dependency setup, mostly leveraging the environments/environment-{cpu, gpu}.yaml files. It assumes conda is installed locally (and is on your path!). Follow the directions here to install conda (Anaconda or Miniconda) if not.

We provide two sets of installation instructions: one for CUDA-equipped Linux machines with GPUs (for training), and another for CPU-only machines (e.g., macOS, Linux) geared towards local development when GPUs are not available.

The existing GPU YAML file targets CUDA 11.0 -- if you have older GPUs, file an issue and I'll create an appropriate conda configuration!

Setup Instructions

# Clone `vqa-outliers` Repository and run Conda Setup
git clone https://github.com/siddk/vqa-outliers.git
cd vqa-outliers

# Ensure you're using the appropriate hardware config!
conda env create -f environments/environment-{cpu, gpu}.yaml
conda activate vqa-outliers

Usage

The following section walks through downloading all the necessary data (be warned -- it's a lot!), running the various active learning strategies on the given VQA datasets, generating Dataset Maps over the full datasets, and visualizing active learning acquisitions relative to those maps.

Note: This is going to require several hundred GB of disk space -- for targeted experiments, feel free to file an issue and I can point you to what you need!

Downloading Data

We have dependencies on a few datasets, some pretrained word vectors (GloVe), and a pretrained multimodal model (LXMERT) -- though not the checkpoint commonly released in HuggingFace Transformers. To download all dependencies, use the following commands from the root of this repository (in general, run everything from the repository root!).

# Note: All the following will create/write to the directory data/ in the current repository -- feel free to change!

# GloVe Vectors
./scripts/download/glove.sh

# Download LXMERT Checkpoint (no-QA Pretraining)
./scripts/download/lxmert.sh

# Download VQA-2 Dataset (Entire Thing -- Questions, Raw Images, BottomUp Object Features)!
./scripts/download/vqa2.sh

# Download GQA Dataset (Entire Thing -- Questions, Raw Images, BottomUp Object Features)!
./scripts/download/gqa.sh

Additional Preprocessing

Many of the models we evaluate in this work use the object-based BottomUp-TopDown Attention Features -- however, our Grid Logistic Regression and LSTM-CNN Baseline both use dense ResNet-101 Features of the images. We extract these from the raw images ourselves as follows (again, this will take a ton of disk space):

# Note: GPU Recommended for Faster Extraction

# Extract VQA-2 Grid Features
python scripts/extract.py --dataset vqa2 --images data/VQA2-Images --spatial data/VQA2-Spatials

# Extract GQA Grid Features
python scripts/extract.py --dataset gqa --images data/GQA-Images --spatial data/GQA-Spatials
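
For intuition, extracting a dense grid of ResNet-101 features with torchvision looks roughly like the sketch below. This is illustrative only -- the resize resolution, dataset handling, and storage format here are assumptions, not the actual contents of scripts/extract.py.

# Illustrative sketch of dense grid-feature extraction (not the actual scripts/extract.py)
import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Keep everything up to the final convolutional stage; drop avgpool & fc
resnet = torchvision.models.resnet101(pretrained=True).eval()
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),  # assumed resolution -- 448 / 32 gives a 14 x 14 grid
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_grid_features(image_path: str) -> torch.Tensor:
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return backbone(image).squeeze(0)  # [2048, 14, 14] grid of spatial features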

Running Active Learning

Running Active Learning is a simple matter of using the script active.py in the root of this directory. This script is able to reproduce every experiment from the paper, and allows you to specify the following:

  • Dataset in < vqa2 | gqa >
  • Split in < all | sports | food > (for VQA-2) and all for GQA
  • Model (mode) in < glreg | olreg | cnn | butd | lxmert > (the two Logistic Regression models -- grid and object features -- LSTM-CNN, BottomUp-TopDown, and LXMERT, respectively)
  • Active Learning Strategy in < baseline | least-conf | entropy | mc-entropy | mc-bald | coreset-{fused, language, vision} > following the paper (the uncertainty-based strategies are sketched just after this list)
  • Size of Seed Set (burn, for burn-in) in < p05 | p10 | p25 | p50 >, where each denotes the percentage of the full dataset to use as the seed set
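
To make the uncertainty-based strategies concrete, here is a minimal sketch of how pool examples might be scored for least-conf, entropy, and mc-bald. The function names and tensor shapes are illustrative assumptions, not the repository's actual interfaces.

# Illustrative sketch of uncertainty-based acquisition scoring (names/shapes are assumptions)
import torch

def least_confidence(probs: torch.Tensor) -> torch.Tensor:
    # probs: [pool_size, num_answers] softmax outputs; higher score = more uncertain
    return 1.0 - probs.max(dim=-1).values

def predictive_entropy(probs: torch.Tensor) -> torch.Tensor:
    # entropy of each pool example's answer distribution
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

def mc_bald(mc_probs: torch.Tensor) -> torch.Tensor:
    # mc_probs: [num_mc_samples, pool_size, num_answers] from MC-Dropout forward passes
    mean_probs = mc_probs.mean(dim=0)
    entropy_of_mean = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    mean_of_entropy = -(mc_probs * mc_probs.clamp_min(1e-12).log()).sum(dim=-1).mean(dim=0)
    return entropy_of_mean - mean_of_entropy  # mutual information (BALD)

def acquire(scores: torch.Tensor, k: int) -> torch.Tensor:
    # select the k highest-scoring (most informative) unlabeled examples
    return scores.topk(k).indices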

For example, to run the BottomUp-TopDown Attention Model (butd) with the VQA-2 Sports Dataset, with Bayesian Active Learning by Disagreement, with a seed set that's 10% the size of the original dataset, use the following:

# Note: If GPU available (recommended), pass --gpus 1 as well!
python active.py --dataset vqa2 --split sports --mode butd --burn p10 --strategy mc-bald

File an issue if you run into trouble!

Creating Dataset Maps

Creating a Dataset Map entails training a model on an entire dataset while maintaining per-example statistics over the course of training. To train models and dump these statistics, use the top-level file cartograph.py as follows (again, for the BottomUp-TopDown Model on VQA2-Sports):

python cartograph.py --dataset vqa2 --split sports --mode butd
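
For intuition, the per-example statistics behind a dataset map reduce to tracking, across epochs, the probability the model assigns to each example's gold answer. The sketch below is a minimal illustration under assumed array shapes; the exact statistics cartograph.py records may differ.

# Illustrative sketch of dataset-map statistics (not the actual cartograph.py internals)
import numpy as np

def dataset_map_statistics(gold_probs: np.ndarray):
    # gold_probs: [num_epochs, num_examples] -- probability assigned to the gold answer
    # for each training example at the end of each epoch
    confidence = gold_probs.mean(axis=0)             # mean gold probability across epochs
    variability = gold_probs.std(axis=0)             # spread of that probability across epochs
    correctness = (gold_probs > 0.5).mean(axis=0)    # rough proxy: fraction of epochs "correct"
    return confidence, variability, correctness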

Once you've trained a model and generated the necessary statistics, you can plot the corresponding map using the top-level file chart.py as follows:

# Note: `map` mode only generates the dataset map... to generate acquisition plots, see below!
python chart.py --mode map --dataset vqa2 --split sports --model butd

Note that Dataset Maps are generated per-dataset, per-model!
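
The map itself is essentially a scatter of those statistics -- variability on the x-axis, confidence on the y-axis, colored by correctness. A minimal matplotlib sketch follows; chart.py's actual styling and outputs will differ.

# Illustrative dataset-map plot (chart.py's actual styling and outputs will differ)
import matplotlib.pyplot as plt

def plot_dataset_map(confidence, variability, correctness, out_path="dataset-map.png"):
    fig, ax = plt.subplots(figsize=(6, 6))
    points = ax.scatter(variability, confidence, c=correctness, cmap="coolwarm", s=4)
    ax.set_xlabel("variability")
    ax.set_ylabel("confidence")
    fig.colorbar(points, label="correctness")
    fig.savefig(out_path, dpi=200)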

Visualizing Acquisitions

To visualize the acquisitions of a given active learning strategy relative to a given dataset map (the bar graphs from our paper), you can run the following (again with our running example, but this works for any combination):

python chart.py --mode acquisitions --dataset vqa2 --split sports --model butd --burn p10 --strategies mc-bald

Note that the script chart.py defaults to plotting acquisitions for all active learning strategies -- either make sure you've run all of them for the configuration you want, or provide the appropriate arguments!

Ablating Outliers

Finally, to run the Outlier Ablation experiments for a given model/active learning strategy, take the following steps:

  • Identify the different "frontiers" of examples (different difficulty classes) by using scripts/frontier.py
  • Once this file has been generated, run active.py with the special flag --dataset vqa2-frontier and whichever strategies you care about.
  • Sit back, examine the results, and get excited!

Concretely, you can generate the frontier files for a BottomUp-TopDown Attention Model as follows:

python scripts/frontier.py --model butd

Any other model would also work -- just make sure you've generated the map via cartograph.py first!


Results

We present the full set of results from the paper (and the additional results from the supplement) in the visualizations/ directory. The sub-directory active-learning shows performance vs. samples for various splits of strategies (visualizing all on the same plot is a bit taxing), while the sub-directory acquisitions has both the dataset maps and corresponding acquisitions per strategy!


Start-Up (from Scratch)

Use these commands if you're starting a repository from scratch (they shouldn't be necessary to use or build off of this code, but I like to keep them in the README in case things break in the future). Generally, you should be fine with the "Usage" section above!

Linux w/ GPU & CUDA 11.0

# Create Python Environment (assumes Anaconda -- replace with package manager of choice!)
conda create --name vqa-outliers python=3.8
conda activate vqa-outliers
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
conda install ipython jupyter
conda install pytorch-lightning -c conda-forge

pip install typed-argument-parser h5py opencv-python matplotlib annoy seaborn spacy scipy transformers scikit-learn

Mac OS & Linux (CPU)

# Create Python Environment (assumes Anaconda -- replace with package manager of choice!)
conda create --name vqa-outliers python=3.8
conda activate vqa-outliers
conda install pytorch torchvision torchaudio -c pytorch
conda install ipython jupyter
conda install pytorch-lightning -c conda-forge

pip install typed-argument-parser h5py opencv-python matplotlib annoy seaborn spacy scipy transformers scikit-learn

Note

We are committed to maintaining this repository for the community. We ported this code to the latest versions of PyTorch-Lightning and PyTorch, so there may be small incompatibilities we didn't catch in testing -- please feel free to open an issue if you run into problems, and I will respond within 24 hours. If urgent, please shoot me an email at [email protected] with "VQA-Outliers Code" in the subject line and I'll be happy to help!

Comments
  • The validation score of the LXMERT baseline


    Hi, there!

    Thank you very much for sharing this outstanding work.

    Following your code, I re-implemented the LXMERT baseline over the past few days. However, my validation score for this model is only 62% (no-QA pretraining, using all VQA-v2 training samples), which is relatively low compared with the standard repository. The training also takes more epochs (15) than the default.

    Would you please share your validation score and training logs here? I want to figure out why this happens. Thanks!

    opened by BierOne 5
  • Report some modification for reproduction.


    When I ran this repository, I needed to modify some scripts, so I'm reporting them here.

    1. In the README, when I extracted features for VQA2, the input/output folder names were not correct. https://github.com/siddk/vqa-outliers/blame/main/README.md#L85
    x python scripts/extract.py --dataset vqa2 --images data/VQA-Images --spatial data/VQA-Spatials
    o python scripts/extract.py --dataset vqa2 --images data/VQA2-Images --spatial data/VQA2-Spatials
    
    2. When I extracted features for GQA, GQA's image path was not correct. The directory structure of GQA's images is not GQA-Images/images/xxxx.jpg, but GQA-Images/xxxx.jpg. So, it is necessary to modify the following lines.

    https://github.com/siddk/vqa-outliers/blob/main/scripts/extract.py#L32

    -        self.images = [x for x in os.listdir(os.path.join(path, "images")) if ".jpg" in x]
    +        self.images = [x for x in os.listdir(os.path.join(path)) if ".jpg" in x]
    

    https://github.com/siddk/vqa-outliers/blob/main/scripts/extract.py#L35

    -        i_path = os.path.join(self.path, "images", self.images[index])
    +        i_path = os.path.join(self.path, self.images[index])
    
    3. This repository requires pytorch-lightning==1.3.8 (https://github.com/siddk/vqa-outliers/blob/main/environment/environment-gpu.yaml#L127), but some lines did not match that version.

      1. In training_epoch_end of each model, it is necessary to remove the return. I modified the following:
      -        return {"progress_bar": pbar, "log": log}
      +        for k, v in log.items():
      +            self.log(k, v)
      +
      +        # return {"progress_bar": pbar, "log": log}
      
      2. In cartograph.py, the ModelCheckpoint argument filepath does not exist. I modified the following:
      checkpoint_callback = ModelCheckpoint(
      -        filepath=os.path.join(args.save_dir, "runs", run_name, args.mode + "-{epoch:02d}-{val_loss:.3f}-{val_acc:.3f}"),
      +        dirpath=os.path.join(args.save_dir, "runs", run_name),
      +        filename= args.mode + "-{epoch:02d}-{val_loss:.3f}-{val_acc:.3f}",
               monitor="val_acc",
      
    4. To create a Dataset Map, when I ran cartograph.py, it was necessary to add one argument, --sync, to log metrics information. https://github.com/siddk/vqa-outliers/blame/main/README.md#L118

    NOTE: I didn't use conda, but Docker + pip. The package versions are the same as in environment-gpu.yaml, but the behavior may differ from conda.

    I should have created separate issues; I apologize for putting them together.

    opened by katsura-jp 5
  • The spatial feature's dimension of GQAObjectDataset is only 4.


    In the GQA data loader, the spatial feature dimension is truncated to 4 for LXMERT. So, when I trained BUTD on GQA, the dimension was insufficient (6 dims are required).

    spatials = torch.from_numpy(np.array(self.spatials[entry["image"]]))[:, :4]
    

    https://github.com/siddk/vqa-outliers/blob/main/src/preprocessing/gqa/obj_dataset.py#L141


    Therefore, it was necessary to rewrite it as follows:

    spatials = torch.from_numpy(np.array(self.spatials[entry["image"]]))
    if self.lxmert:
        spatials = spatials[:, :4]
    
    opened by katsura-jp 1
  • Fix bugs for reproduction.


    Fix the bugs that were reported in issue #2 and add a reproducible environment on Docker.

    Bug

    1. VQA's data folder names: change data/VQA-Images and data/VQA-Spatials to data/VQA2-Images and data/VQA2-Spatials, respectively.
    2. GQA's image path.
    3. return value in training_epoch_end of each model.
    4. arguments of ModelCheckpoint.
    5. command for creating the Dataset Map.

    Add

    Provide a Docker environment for reproduction. I added a Dockerfile and requirements.txt, and documented how to build the image and run the container in the README.

    Attention

    I have tested the PR in the Docker environment, but not with Anaconda.

    opened by katsura-jp 0
  • Potential Bug Report


    Potential Bug

    Hi, thanks for sharing the code. When I was trying to get the code running, I encountered the following error message:

    pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val_acc') not found in the returned metrics: ['train_epoch_loss', 'train_epoch_acc']. 
    HINT: Did you call self.log('val_acc', value) in the LightningModule?
    

    I went through the code, and I think this might be related to the following block of code which is present in all the models. https://github.com/siddk/vqa-outliers/blob/9cb877ec6848301aec68dc31a2ebd121c521b33e/src/models/lstm_cnn.py#L268-L290

    Specifically, I think this line of code https://github.com/siddk/vqa-outliers/blob/9cb877ec6848301aec68dc31a2ebd121c521b33e/src/models/lstm_cnn.py#L290 should be changed to

            for k, v in log.items():
                self.log(k, v)
    

    which is consistent with the implementation of training_epoch_end and also fixes the error. Moreover, the return value of validation_epoch_end is not accessed in the pytorch-lightning source code, so I suppose validation_epoch_end should be symmetric to training_epoch_end. I wonder if my observation is correct. Looking forward to your replies~

    opened by Ja1Zhou 1