Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).

Related tags

Deep Learning GD-VCR
Overview

GD-VCR

Code for Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning (EMNLP 2021).

Research Questions and Aims:

  1. How well can a model perform on the images which requires geo-diverse commonsense to understand?
  2. What are the reasons behind performance disparity on Western and non-Western images?
  3. We aim to broaden researchers' vision on a realistic issue existing all over the world, and call upon researchers to consider more inclusive commonsense knowledge and better model transferability on various cultures.

In this repo, GD-VCR dataset and codes about 1) general model evaluation, 2) detailed controlled experiments, and 3) dataset construction are provided.

Repo Structure

GD-VCR
 ├─X_VCR				  --> storing GD-VCR/VCR data
 ├─configs
 │  └─vcr
 │     └─fine-tune-qa.json		  --> part of configs for evaluation
 ├─dataloaders
 │  └─vcr.py			          --> load GD-VCR/VCR data based on configs
 ├─models
 │  └─train.py		                  --> fine-tune/evaluate models
 │
 ├─val.jsonl			          --> GD-VCR dataset
 ├─val_addition_single.jsonl		  --> additional low-order QA pairs

GD-VCR dataset

First download the original VCR dataset to X_VCR:

cd X_VCR
wget https://s3.us-west-2.amazonaws.com/ai2-rowanz/vcr1annots.zip
wget https://s3.us-west-2.amazonaws.com/ai2-rowanz/vcr1images.zip
unzip vcr1annots.zip
unzip vcr1images.zip

Then download the GD-VCR dataset to X_VCR:

cd X_VCR
mv val.jsonl orig_val.jsonl
wget https://gdvcr.s3.us-west-1.amazonaws.com/MC-VCR_sample.zip
unzip MC-VCR_sample.zip

cd ..
mv val.jsonl X_VCR/
mv val_addition_single.jsonl X_VCR/

The detailed items in our GD-VCR dataset are almost the same as VCR. Please refer to VCR website for detailed explanations.

VisualBERT

Prepare Environment

Prepare environment as mentioned in the original repo of VisualBERT.

Fine-tune model on original VCR

Download the task-specific pre-trained checkpoint on original VCR vcr_pre_train.th to GD-VCR/visualbert/trained_models.

Then, use the command to fine-tune:

export PYTHONPATH=$PYTHONPATH:GD-VCR/visualbert/
export PYTHONPATH=$PYTHONPATH:GD-VCR/

cd GD-VCR/visualbert/models

CUDA_VISIBLE_DEVICES=0 python train.py -folder ../trained_models -config ../configs/vcr/fine-tune-qa.json

For convenience, we provide a trained checkpoint [Link] for quick evaluation.

Evaluation on GD-VCR

CUDA_VISIBLE_DEVICES=0 python train.py -folder ../trained_models -config ../configs/vcr/eval.json \
        [-region REGION] \
        [-scene SCENE] \
        [-single_or_multiple SINGLE_OR_MULTIPLE] \
        [-orig_or_new ORIG_OR_NEW] \
	[-addition_annotation_analysis] \
        [-grounding]

Here are the explanations of several important attributions:

  • REGION: One of the regions west, east-asia, south-asia, africa.
  • SCENE: One of the scenario (e.g., wedding).
  • SINGLE_OR_MULTIPLE: Whether studying single(low-order) or multiple(high-order) cognitive questions.
  • addition_annotation_analysis: Whether studying GD-VCR or additional annotated questions. If yes, you can choose to set SINGLE_OR_MULTIPLE to specify which types of questions you want to investigate.
  • ORIG_OR_NEW: Whether studying GD-VCR or original VCR dev set.
  • grounding: Whether analyzing grounding results by visualizing attention weights.

Given our fine-tuned VisualBERT model above, the evaluation results are shown below:

Models Overall West South Asia East Asia Africa
VisualBERT 53.27 **62.91** 52.04 45.39 51.85

ViLBERT

Prepare Environment

Prepare environment as mentioned in the original repo of ViLBERT.

Extract image features

We make use of the docker made for LXMERT. Detailed commands are shown below:

cd GD-VCR
git clone https://github.com/jiasenlu/bottom-up-attention.git
mv generate_tsv.py bottom-up-attention/tools
mv generate_tsv_gt.py bottom-up-attention/tools

docker pull airsplay/bottom-up-attention
docker run --name gd_vcr --runtime=nvidia -it -v /PATH/TO/:/PATH/TO/ airsplay/bottom-up-attention /bin/bash
[Used to enter into the docker]

cd /PATH/TO/GD-VCR/bottom-up-attention
pip install json_lines
pip install jsonlines
pip install python-dateutil==2.5.0

python ./tools/generate_tsv.py --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --out ../vilbert_beta/feature/VCR/VCR_resnet101_faster_rcnn_genome.tsv --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --total_group 1 --group_id 0 --split VCR
python ./tools/generate_tsv_gt.py --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test_gt.prototxt --out ../vilbert_beta/feature/VCR/VCR_gt_resnet101_faster_rcnn_genome.tsv --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --total_group 1 --group_id 0 --split VCR_gt
[Used to extract features]

Then, exit the dockerfile, and convert extracted features into lmdb form:

cd GD-VCR/vilbert_beta
python script/convert_lmdb_VCR.py
python script/convert_lmdb_VCR_gt.py

Fine-tune model on original VCR

Download the pre-trained checkpoint to GD-VCR/vilbert_beta/save/bert_base_6_layer_6_connect_freeze_0/.

Then, use the command to fine-tune:

cd GD-VCR/vilbert_beta
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 2e-5 --num_workers 16 --tasks 1-2 --save_name pretrained

For convenience, we provide a trained checkpoint [Link] for quick evaluation.

Evaluation on GD-VCR

CUDA_VISIBLE_DEVICES=0,1 python eval_tasks.py 
		--bert_model bert-base-uncased 
		--from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/vilbert_best.bin 
		--config_file config/bert_base_6layer_6conect.json --task 1 --split val  --batch_size 16

Note that if you want the results on original VCR dev set, you could directly change the "val_annotations_jsonpath" value of TASK1 to X_VCR/orig_val.jsonl.

Given our fine-tuned ViLBERT model above, the evaluation results are shown below:

Models Overall West South Asia East Asia Africa
ViLBERT 58.47 **65.82** 62.90 46.45 62.04

Dataset Construction

Here we provide dataset construction methods in our paper:

  • similarity.py: Compute the similarity among answer candidates and distribute candidates to each annotated questions.
  • relevance_model.py: Train a model to compute the relevance between question and answer.
  • question_cluster.py: Infer question templates from original VCR dataset as the basis of annotation.

For sake of convenience, we provide the trained relevance computation model [Link].

Acknowledgement

We thank for VisualBERT, ViLBERT, and Detectron authors' implementation. Also, we appreciate the effort of original VCR paper's author, and our work is highly influenced by VCR.

Citation

Please cite our EMNLP paper if this repository inspired your work.

@inproceedings{yin2021broaden,
  title = {Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
  author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
  booktitle = {EMNLP},
  year = {2021}
}
You might also like...
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”
Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

GATER This repository contains the code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”. Our implementation is

Code for our paper Aspect Sentiment Quad Prediction as Paraphrase Generation in EMNLP 2021.

Aspect Sentiment Quad Prediction (ASQP) This repo contains the annotated data and code for our paper Aspect Sentiment Quad Prediction as Paraphrase Ge

EMNLP 2021 Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections Ruiqi Zhong, Kristy Lee*, Zheng Zhang*, Dan Klein EMN

[EMNLP 2021] Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

RoSTER The source code used for Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training, p

Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

EMNLP 2021 paper Models and Datasets for Cross-Lingual Summarisation.

This repository contains data and code for our EMNLP 2021 paper Models and Datasets for Cross-Lingual Summarisation. Please contact me at [email protected]

EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling This is the official implementation for "Frustratingly Simple Pretraining Al

Implementation for the EMNLP 2021 paper "Interactive Machine Comprehension with Dynamic Knowledge Graphs".

Interactive Machine Comprehension with Dynamic Knowledge Graphs Implementation for the EMNLP 2021 paper. Dependencies apt-get -y update apt-get instal

Owner
Da Yin
Da Yin
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Mozhdeh Gheini 16 Jul 16, 2022
Data augmentation for NLP, accepted at EMNLP 2021 Findings

AEDA: An Easier Data Augmentation Technique for Text Classification This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Techni

Akbar Karimi 81 Dec 9, 2022
Related resources for our EMNLP 2021 paper

Plan-then-Generate: Controlled Data-to-Text Generation via Planning Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier Code

Yixuan Su 61 Jan 3, 2023
“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)

Data Augmentation for Cross-Domain Named Entity Recognition Authors: Shuguang Chen, Gustavo Aguilar, Leonardo Neves and Thamar Solorio This repository

RiTUAL@UH 18 Sep 10, 2022
This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories This repo is the code release of EMNLP 2021 con

null 12 Nov 22, 2022
The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization [Paper] accepted at the EMNLP 2021: Vision Guided Genera

CAiRE 42 Jan 7, 2023
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

MultiModal-InfoMax This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Informa

Deep Cognition and Language Research (DeCLaRe) Lab 89 Dec 26, 2022
Code for EMNLP 2021 paper Contrastive Out-of-Distribution Detection for Pretrained Transformers.

Contra-OOD Code for EMNLP 2021 paper Contrastive Out-of-Distribution Detection for Pretrained Transformers. Requirements PyTorch Transformers datasets

Wenxuan Zhou 27 Oct 28, 2022
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

LancoPKU 105 Jan 3, 2023
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System This repository contains the PyTorch im

Libo Qin 25 Sep 6, 2022