QA-GNN: Question Answering using Language Models and Knowledge Graphs

Michihiro Yasunaga

Last update: Jan 4, 2023


This repo provides the source code & data of our paper: QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering (NAACL 2021).

@InProceedings{yasunaga2021qagnn,
  author =  {Michihiro Yasunaga and Hongyu Ren and Antoine Bosselut and Percy Liang and Jure Leskovec},
  title =   {QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering},
  year =    {2021},  
  booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)},  
}

Webpage: https://snap.stanford.edu/qagnn

Usage

0. Dependencies

Python == 3.7
PyTorch == 1.4.0
transformers == 2.0.0
torch-geometric == 1.6.0

Run the following commands to create a conda environment (assuming CUDA 10.1):

conda create -n qagnn python=3.7
source activate qagnn
pip install numpy==1.18.3 tqdm
pip install torch==1.4.0 torchvision==0.5.0
pip install transformers==2.0.0 nltk spacy==2.1.6
python -m spacy download en

# for torch-geometric
pip install torch-scatter==2.0.4 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-cluster==1.5.4 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-sparse==0.6.1 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-spline-conv==1.2.0 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
pip install torch-geometric==1.6.0 -f https://pytorch-geometric.com/whl/torch-1.4.0+cu101.html
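Because the code depends on these exact pins, it can help to confirm the environment before running anything. The following is a minimal sketch, not a script shipped with the repo; the `PINNED` table and the `find_mismatches` helper are hypothetical names, and only a subset of the pinned packages is listed.

```python
import sys

try:  # Python 3.8+
    from importlib.metadata import version as _dist_version, PackageNotFoundError
except ImportError:  # Python 3.7 (the pinned version) has no importlib.metadata
    import pkg_resources

    class PackageNotFoundError(Exception):
        pass

    def _dist_version(name):
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound as exc:
            raise PackageNotFoundError(name) from exc

# Hypothetical subset of the pins listed above
PINNED = {
    "torch": "1.4.0",
    "transformers": "2.0.0",
    "torch-geometric": "1.6.0",
    "torch-scatter": "2.0.4",
    "torch-sparse": "0.6.1",
}

def find_mismatches(pinned, get_version=_dist_version):
    """Return {package: (expected, found)} for every package whose version differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches

if __name__ == "__main__":
    if sys.version_info[:2] != (3, 7):
        print("warning: running Python", sys.version.split()[0], "instead of 3.7")
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

The `get_version` parameter is injectable so the comparison logic can be checked without the packages installed.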

1. Download Data

Download all the raw data (ConceptNet, CommonsenseQA, OpenBookQA) by running

./download_raw_data.sh

You can preprocess the raw data by running

python preprocess.py -p <num_processes>

The script will:

Set up ConceptNet (e.g., extract English relations from ConceptNet and merge the original 42 relation types into 17 types)
Convert the QA datasets into .jsonl files (e.g., stored in data/csqa/statement/)
Identify all concepts mentioned in the questions and answers
Extract a subgraph for each question-answer pair
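The grounding and subgraph-extraction steps can be pictured with a toy example. This is a sketch under stated assumptions, not the logic in preprocess.py: concepts are matched as underscore-joined n-grams against a small hand-made vocabulary (ConceptNet writes multiword concept names with underscores, e.g. revolving_door), and the subgraph keeps the grounded concepts plus any node lying on a two-hop path from a question concept to an answer concept. `ground`, `extract_subgraph`, and the toy triples are hypothetical.

```python
from collections import defaultdict

def ground(text, vocab, max_n=3):
    """Return vocabulary concepts mentioned in `text`, matching n-grams up to max_n tokens."""
    # Crude tokenization (hypothetical): lowercase, drop '?' and '.', split on whitespace
    tokens = text.lower().replace("?", " ").replace(".", " ").split()
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            candidate = "_".join(tokens[i:i + n])  # ConceptNet-style multiword name
            if candidate in vocab:
                found.add(candidate)
    return found

def extract_subgraph(triples, q_concepts, a_concepts):
    """Keep triples among grounded concepts and 2-hop bridge nodes between question and answer."""
    neighbors = defaultdict(set)
    for head, _, tail in triples:
        # Treat edges as undirected when looking for connecting paths
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    nodes = set(q_concepts) | set(a_concepts)
    for q in q_concepts:
        for a in a_concepts:
            # Bridge nodes m on a path q - m - a
            nodes |= neighbors[q] & neighbors[a]
    return [(h, r, t) for h, r, t in triples if h in nodes and t in nodes]

# Toy data (hypothetical), shaped like a CommonsenseQA question
vocab = {"revolving_door", "door", "security", "guard", "bank", "mall"}
triples = [
    ("revolving_door", "AtLocation", "bank"),
    ("security", "RelatedTo", "guard"),
    ("guard", "AtLocation", "bank"),
    ("door", "PartOf", "mall"),
]
question = "A revolving door is convenient for two direction travel, but it also serves as a security measure at a what?"
q_concepts = ground(question, vocab)
a_concepts = ground("bank", vocab)
subgraph = extract_subgraph(triples, q_concepts, a_concepts)
```

Here `q_concepts` is {"revolving_door", "door", "security"}, and `subgraph` keeps the three triples linking those concepts to "bank" (directly or through "guard") while dropping the unrelated door/mall edge.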

TL;DR. Preprocessing may take a long time; for convenience, you can instead download all the preprocessed data by running

./download_preprocessed_data.sh

The resulting file structure will look like:

.
├── README.md
└── data/
    ├── cpnet/                 (preprocessed ConceptNet)
    └── csqa/
        ├── train_rand_split.jsonl
        ├── dev_rand_split.jsonl
        ├── test_rand_split_no_answers.jsonl
        ├── statement/             (converted statements)
        ├── grounded/              (grounded entities)
        ├── graphs/                (extracted subgraphs)
        ├── ...
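The raw CommonsenseQA files in this tree (e.g. train_rand_split.jsonl) hold one JSON object per line. The sketch below reads them, assuming the public CommonsenseQA layout with `id`, `question.stem`, `question.choices` (each with `label` and `text`), and `answerKey`, which is absent from test_rand_split_no_answers.jsonl. `parse_csqa_lines` is a hypothetical helper, not part of utils/data_utils.py.

```python
import json

def parse_csqa_lines(lines):
    """Yield (id, stem, [(label, text), ...], answer_key) for each JSON line.

    answer_key is None when the file has no labels (e.g. the test split).
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        example = json.loads(line)
        question = example["question"]
        choices = [(c["label"], c["text"]) for c in question["choices"]]
        yield example["id"], question["stem"], choices, example.get("answerKey")

# Usage (path taken from the tree above):
# with open("data/csqa/train_rand_split.jsonl", encoding="utf-8") as f:
#     for qid, stem, choices, answer in parse_csqa_lines(f):
#         ...
```

Accepting any iterable of strings keeps the parser usable on an open file handle or on an in-memory list.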

2. Training

For CommonsenseQA, run

./run_qagnn__csqa.sh

For OpenBookQA, run

./run_qagnn__obqa.sh

As configured in these scripts, the model needs two types of input files:

--{train,dev,test}_statements: preprocessed question statements in .jsonl format. These are loaded mainly by the load_input_tensors function in utils/data_utils.py.
--{train,dev,test}_adj: information about the KG subgraph extracted for each question. These are loaded mainly by the load_sparse_adj_data_with_contextnode function in utils/data_utils.py.
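The `_adj` inputs describe each question's KG subgraph, and the loader's name points to an extra context node. In the QA-GNN paper, a context node representing the QA context is connected to the question and answer entities through new relation types. The sketch below builds a torch-geometric-style `[2, num_edges]` edge index and a parallel edge-type list for one subgraph, with the context node at index 0. The function name, the relation ids, and the choice to add only context-to-entity edges are assumptions for illustration; the actual layout inside load_sparse_adj_data_with_contextnode may differ.

```python
def build_graph_with_context(concepts, node_types, triples, num_relations):
    """Return (edge_index, edge_type) with a context node at index 0.

    concepts: concept names; concept i becomes node i + 1.
    node_types: parallel list of "q" (question), "a" (answer), or "other".
    triples: (head, rel_id, tail) with rel_id in range(num_relations).
    Two extra relation ids are assumed: num_relations for context->question
    edges and num_relations + 1 for context->answer edges.
    """
    node_id = {c: i + 1 for i, c in enumerate(concepts)}
    sources, targets, edge_type = [], [], []
    # KG edges between concepts keep their original relation ids
    for head, rel, tail in triples:
        sources.append(node_id[head])
        targets.append(node_id[tail])
        edge_type.append(rel)
    # Connect the context node (0) to every question and answer concept
    for concept, kind in zip(concepts, node_types):
        if kind in ("q", "a"):
            sources.append(0)
            targets.append(node_id[concept])
            edge_type.append(num_relations if kind == "q" else num_relations + 1)
    return [sources, targets], edge_type

# Hypothetical relation ids: 0 = AtLocation, 1 = RelatedTo
edge_index, edge_type = build_graph_with_context(
    ["revolving_door", "security", "guard", "bank"],
    ["q", "q", "other", "a"],
    [("revolving_door", 0, "bank"), ("security", 1, "guard"), ("guard", 0, "bank")],
    num_relations=2,
)
```

Wrapping the result as `torch.tensor(edge_index, dtype=torch.long)` gives the `[2, num_edges]` long tensor that torch-geometric message-passing layers take as `edge_index`.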

Use Your Own Dataset

Convert your dataset to {train,dev,test}.statement.jsonl files (see data/csqa/statement/train.statement.jsonl for an example)
Create a directory data/{yourdataset}/ to store the .jsonl files
Modify preprocess.py to perform subgraph extraction for your data
Modify utils/parser_utils.py to support your own dataset
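For the first step, a record for your own dataset might be built as follows. This assumes the statement files follow the CommonsenseQA layout plus a `statements` list of `{"label": bool, "statement": str}` entries, one per choice; check the field names against data/csqa/statement/train.statement.jsonl before relying on them. Joining the stem and the choice text is only a placeholder, since the "converted statements" directory suggests the original pipeline turns each question-answer pair into a proper statement. `to_statement_record` and `write_jsonl` are hypothetical helpers.

```python
import json

def to_statement_record(qid, stem, choices, answer_idx=None):
    """Build one CSQA-style statement record (assumed layout).

    choices: list of answer strings; labels "A", "B", ... are assigned in order.
    answer_idx: index of the correct choice, or None for unlabeled data.
    """
    labels = [chr(ord("A") + i) for i in range(len(choices))]
    record = {
        "id": qid,
        "question": {
            "stem": stem,
            "choices": [{"label": lab, "text": text} for lab, text in zip(labels, choices)],
        },
        # Placeholder statement: stem + choice; a real pipeline would rewrite it declaratively
        "statements": [
            {"label": i == answer_idx, "statement": f"{stem} {text}"}
            for i, text in enumerate(choices)
        ],
    }
    if answer_idx is not None:
        record["answerKey"] = labels[answer_idx]
    return record

def write_jsonl(records, path):
    """Write one JSON object per line, e.g. to data/{yourdataset}/statement/train.statement.jsonl."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

For example, `write_jsonl([to_statement_record("q1", "Where do you keep milk?", ["fridge", "oven"], 0)], path)` writes a single labeled record with answerKey "A".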

Acknowledgment

This repo is built upon the following work:

Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering. Yanlin Feng*, Xinyue Chen*, Bill Yuchen Lin, Peifeng Wang, Jun Yan and Xiang Ren. EMNLP 2020.
https://github.com/INK-USC/MHGRN

Many thanks to the authors and developers!

