Code for Greedy Gradient Ensemble for Visual Question Answering (ICCV 2021, Oral)

Overview

Greedy Gradient Ensemble for De-biased VQA

Code release for "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). GGE can be extended to other tasks with dataset biases.

@inproceedings{han2021greedy,
	title={Greedy Gradient Ensemble for Robust Visual Question Answering},
	author={Han, Xinzhe and Wang, Shuhui and Su, Chi and Huang, Qingming and Tian, Qi},
	booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
	year={2021}
}

Prerequisites

We use Anaconda to manage our dependencies. You will need to execute the following steps to install them:

  • Edit the value of the prefix variable in the requirements.yml file, assigning it the path to your conda environment

  • Then, install all dependencies using: conda env create -f requirements.yml

  • Activate the new environment, bias (a consolidated sketch of these steps follows this list)
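A minimal sketch of the full setup, assuming Anaconda is installed and the environment is named bias as in requirements.yml (the prefix path in the comment is illustrative):

# in requirements.yml, point prefix at your conda environments directory, e.g.
# prefix: /home/<user>/anaconda3/envs/bias
conda env create -f requirements.yml
conda activate bias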

Data Setup

  • Download UpDn features from Google Drive into the data/detection_features folder
  • Download questions/answers for VQAv2 and VQA-CPv2 by executing bash tools/download.sh
  • Download the visual cues/hints provided in A negative case analysis of visual grounding methods for VQA into data/hints. Note that we use caption-based hints for reproducing the grounding-based methods and for computing CGR and CGW.
  • Preprocess the data with bash tools/process.sh (the steps are consolidated in the sketch below)
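Assuming the relative layout above, the data setup can be scripted as follows (the mkdir line just creates the folders this README references):

mkdir -p data/detection_features data/hints
# place the UpDn features from Google Drive under data/detection_features
# place the visual cues/hints under data/hints
bash tools/download.sh
bash tools/process.sh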

Training GGE

Run

CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode MODE --debias gradient --topq 1 --topv -1 --qvp 5 --output [] 

to train a model. In main.py, use import base_model for the UpDn baseline, import base_model_ban as base_model for the BAN baseline, and import base_model_block as base_model for the S-MRL baseline.

Set MODE to gge_iter or gge_tog for our best-performing models, gge_d_bias or gge_q_bias for the single-bias ablations, and base for the baseline model. Example commands follow.
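For example, with the UpDn backbone (the output directory names here are illustrative, not fixed by the repo):

# GGE-iter, one of our best-performing settings
CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode gge_iter --debias gradient --topq 1 --topv -1 --qvp 5 --output gge_iter_cpv2
# GGE-tog variant
CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode gge_tog --debias gradient --topq 1 --topv -1 --qvp 5 --output gge_tog_cpv2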

Training ablations in Sec. 3 and Sec. 5

For the models in Sec. 3, use from train_ab import train and import base_model_ab as base_model in main.py. Then run

CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode MODE --debias METHODS --topq 1 --topv -1 --qvp 5 --output [] 

Set METHODS to learned_mixin for LMH, and MODE to inv_sup for the inv_sup strategy or v_inverse for the inverse hint. Note that the results for HINT_inv are obtained by running the code from A negative case analysis of visual grounding methods for VQA.
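As an illustration, an LMH ablation run might look like the following (the MODE value base is an assumption here, and the output name is illustrative; check the argument parser in main.py for the exact choices):

CUDA_VISIBLE_DEVICES=0 python main.py --dataset cpv2 --mode base --debias learned_mixin --topq 1 --topv -1 --qvp 5 --output lmh_ab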

To test the v_only model, import base_model_v_only as base_model in main.py.

To test RUBi and LMH+RUBi, run

CUDA_VISIBLE_DEVICES=0 python rubi_main.py --dataset cpv2 --mode MODE --output [] 

Set MODE to updn for RUBi and to lmh_rubi for LMH+RUBi.
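For example (output names are illustrative):

CUDA_VISIBLE_DEVICES=0 python rubi_main.py --dataset cpv2 --mode updn --output rubi_cpv2
CUDA_VISIBLE_DEVICES=0 python rubi_main.py --dataset cpv2 --mode lmh_rubi --output lmh_rubi_cpv2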

Testing

At test time, we report the overall Acc, CGR, CGW, and CGD at threshold 0.2. Change base_model to the corresponding model in sensitivity.py and run

CUDA_VISIBLE_DEVICES=0 python sensitivity.py --dataset cpv2 --debias METHOD --load_checkpoint_path logs/your_path --output your_path
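For instance, to evaluate a GGE-iter checkpoint trained as in the example above (the checkpoint and output paths are illustrative):

CUDA_VISIBLE_DEVICES=0 python sensitivity.py --dataset cpv2 --debias gradient --load_checkpoint_path logs/gge_iter_cpv2 --output gge_iter_cpv2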

Visualization

We provide visualizations in visualization.ipynb. To generate other visualizations yourself, download the MS-COCO 2014 images into data/images.

Acknowledgements

This repo uses features from A negative case analysis of visual grounding methods for VQA. Some code is modified from CSS and UpDn.

Comments
  • About the "util/cpv2_type_mask.json", "util/cpv2_notype_mask.json", "util/cpv1_type_mask.json" files

    Hello,

Thank you for sharing your work.

    In "dataset.py", "util/cpv2_type_mask.json","util/cpv2_notype_mask.json","util/cpv1_type_mask.json" ... Six .json files not exist in this repository. Could you offer these .json files or give methods to generate them? I have tried this way, but it doesn't work.

    Thank you!

    opened by zlj63501 3
  • Question about bias

    Hi,

    I was just wondering, what exactly is "b" from the dataset, or entry["bias"]?

Are these the question types? I found that they have the shape of the answer labels, and I don't understand why they are in that shape.

    Thanks in advance!

    opened by chojw 2
  • Performance of gge_tog

    Hi, thanks for sharing your code!

I just had a question about the performance of gge_tog: it never goes above 54%, while gge_iter achieves the reported performance. Is there some issue with the way I ran the code?

I used this line: python main.py --dataset cpv2 --mode gge_tog --debias gradient --topq 1 --topv -1 --qvp 5 --output gge_tog

    Thanks in advance!

    opened by chojw 2
  • 'weight' is undefined for self.debias_loss_fn (with --debias gradient)

    Hi

    If I run the code with the following command: python main.py --dataset cpv2 --mode gge_iter --debias gradient --topq 1 --topv -1 --qvp 5 --output []

    (i.e. --mode gge_iter and --debias gradient), then I get an error on line 81 of base_model.py: loss = self.debias_loss_fn(None, logits, ref_logits, labels, weight)

    TypeError: forward() takes 5 positional arguments but 6 were given

    To fix this, I've added weight as a parameter of GreedyGradient in vqa_debias_loss_functions.py, and multiplied the output of the BCE loss by weight: loss = F.binary_cross_entropy_with_logits(logits, y_gradient) * weight

I couldn't find your weighting/scaling factor in the paper, so please let me know if that's correct.

    opened by nihirv 1
  • Performance on GQA-OOD

    Hi,

I saw in the GGD paper that there was testing on the GQA-OOD dataset. Will there be any updates on testing on the GQA-OOD dataset?

    Thanks in advance!

    opened by chojw 0
  • Wrong derivation of negative gradient of sigmoid+BCE

Sorry for the wrong derivation of the negative gradient of the sigmoid+BCE loss. The correct negative gradient is

    $$ \nabla \mathcal{H}_i= y_i - \sigma(\mathcal{H}_i) $$

In theory, as long as the pseudo label is negatively correlated with the bias model's prediction, it can mine the hard examples. The wrong gradient in the paper is actually an approximation of $\nabla \mathcal{H}_i$, which is why it still works well.
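For completeness, this follows directly from differentiating the per-class BCE loss with respect to the logit $\mathcal{H}_i$:

$$
\mathcal{L}_i = -\big[\, y_i \log \sigma(\mathcal{H}_i) + (1-y_i) \log\big(1-\sigma(\mathcal{H}_i)\big) \,\big],
\qquad
-\frac{\partial \mathcal{L}_i}{\partial \mathcal{H}_i} = y_i - \sigma(\mathcal{H}_i).
$$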

    opened by GeraldHan 0