GrailQA: Strongly Generalizable Question Answering

Overview

GrailQA is a new large-scale, high-quality KBQA dataset with 64,331 questions annotated with both answers and corresponding logical forms in different syntaxes (i.e., SPARQL, S-expression, etc.). It can be used to test three levels of generalization in KBQA: i.i.d., compositional, and zero-shot.
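
For orientation, a (simplified) dataset record looks roughly like the following Python dict; the field names shown here are assumptions for illustration, so consult the released files for the exact schema:

example = {
    "qid": "...",
    "question": "...",        # natural-language question
    "answer": ["..."],        # gold answer(s)
    "sparql_query": "...",    # logical form in SPARQL syntax
    "s_expression": "...",    # logical form as an S-expression
    "level": "zero-shot",     # generalization level: i.i.d. / compositional / zero-shot
}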

This is the accompanying code for the paper "Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases", published at TheWebConf (formerly WWW) 2021. For the dataset and leaderboard, please refer to the GrailQA homepage. In this repository, we provide the code for the baseline models for reproducibility and demonstrate how to work with this dataset.

Package Description

This repository is structured as follows:

GrailQA/
├─ model_configs/
    ├─ train/: Configuration files for training
    ├─ test/: Configuration files for inference
├─ data/: Data files for training, validation, and test
├─ ontology/: Processed Freebase ontology files
    ├─ domain_dict: Mapping from a domain in Freebase Commons to all schema items in it
    ├─ domain_info: Mapping from a schema item to a Freebase Commons domain it belongs to
    ├─ fb_roles: Domain and range information for a relation (Note that here domain means a different thing from domains in Freebase Commons)
    ├─ fb_types: Class hierarchy in Freebase
    ├─ reverse_properties: Reverse properties in Freebase 
├─ bert_configs/: BERT configuration used by pytorch_transformers, which you are very unlikely to modify
├─ entity_linking_results/: Entity linking results 
├─ entity_linker/: source code for the entity linker, which is a separate module from our main model
├─ vocabulary/: Preprocessed vocabulary, which is only required by our GloVe-based models
├─ cache/: Cached results for SPARQL queries, which are used to accelerate the experiments by caching many SPARQL query results offline
├─ saved_models/: Trained models
├─ utils/:
    ├─ bert_interface.py: Interface to BERT 
    ├─ logic_form_util: Tools related to logical forms, including the exact match checker for two logical forms
    ├─ search_over_graphs.py: Generate candidate logical forms for our Ranking models
    ├─ sparql_executer.py: SPARQL-related tools
├─ bert_constrained_seq2seq.py: BERT-based model for both Ranking and Transduction
├─ bert_seq2seq_reader.py: Data reader for BERT-based models
├─ constrained_seq2seq.py: GloVe-based model for both Ranking and Transduction
├─ constrained_seq2seq_reader.py: Data reader for GloVe-based models
├─ run.py: Main function
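
As one example of these utilities, the exact-match checker in utils/logic_form_util can be used roughly as sketched below; the function name same_logical_form is an assumption here, so check the file for the actual API:

# Hypothetical usage of the exact-match checker (name assumed; see utils/logic_form_util).
from utils.logic_form_util import same_logical_form

lf1 = "(JOIN (R people.person.place_of_birth) m.06w2sn5)"
lf2 = "(JOIN (R people.person.place_of_birth) m.06w2sn5)"
print(same_logical_form(lf1, lf2))  # True when the two forms are semantically identical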

Setup

Follow these steps if you want to reproduce the results in the paper.

  1. Follow Freebase Setup to set up a Virtuoso triplestore service. After starting your Virtuoso service, replace the URL in utils/sparql_executer.py with your own (see the sketch after these steps).
  2. Download cache files from https://1drv.ms/u/s!AuJiG47gLqTznjfRRxdW5YDYFt3o?e=GawH1f and put all the files under cache/.
  3. Download trained models from https://1drv.ms/u/s!AuJiG47gLqTznxbenfeRBrTuTbWz?e=g5Nazi and put all the files under saved_models/.
  4. Download GrailQA dataset and put it under data/.
  5. Install all required libraries:
$ pip install -r requirements.txt

(Note: we have included our adapted version of AllenNLP in this repo so there's no need to separately install that.)
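
As a minimal sketch of step 1, the executor typically points a SPARQLWrapper client at the Virtuoso endpoint; the variable names and exact code in utils/sparql_executer.py may differ:

# Illustrative only; adapt to the actual code in utils/sparql_executer.py.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8890/sparql")  # replace with your own endpoint
sparql.setReturnFormat(JSON)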

Reproduce Our Results

The predictions of our baseline models can be found via CodaLab. Run the predict command to reproduce the predictions. There are several arguments to configure when running predict:

python run.py predict
  [path_to_saved_model]
  [path_to_test_data]
  -c [path_to_the_config_file]
  --output-file [results_file_name] 
  --cuda-device [cuda_device_to_use]

Specifically, to run Ranking+BERT:

$ PYTHONHASHSEED=23 python run.py predict saved_models/BERT/model.tar.gz data/grailqa_v1.0_test_public.json --include-package bert_constrained_seq2seq --include-package bert_seq2seq_reader --include-package utils.bert_interface --use-dataset-reader --predictor seq2seq -c model_configs/test/bert_ranking.jsonnet --output-file bert_ranking.txt --cuda-device 0

To run Ranking+GloVe:

$ PYTHONHASHSEED=23 python run.py predict saved_models/GloVe/model.tar.gz data/grailqa_v1.0_test_public.json --include-package constrained_seq2seq --include-package constrained_seq2seq_reader --predictor seq2seq --use-dataset-reader -c model_configs/test/glove_ranking.jsonnet --output-file glove_ranking.txt --cuda-device 0

To run Transduction+BERT:

$ PYTHONHASHSEED=23 python run.py predict saved_models/BERT/model.tar.gz data/grailqa_v1.0_test_public.json --include-package bert_constrained_seq2seq --include-package bert_seq2seq_reader --include-package utils.bert_interface --use-dataset-reader --predictor seq2seq -c model_configs/test/bert_vp.jsonnet --output-file bert_vp.txt --cuda-device 0

To run Transduction+GloVe:

$ PYTHONHASHSEED=23 python run.py predict saved_models/GloVe/model.tar.gz data/grailqa_v1.0_test_public.json --include-package constrained_seq2seq --include-package constrained_seq2seq_reader --predictor seq2seq --use-dataset-reader -c model_configs/test/glove_vp.jsonnet --output-file glove_vp.txt --cuda-device 0

Entity Linking

We also release our code for entity linking to facilitate future research. As in most other KBQA methods, entity linking is a separate module from our main model. If you just want to run our main models, you do not need to re-run our entity linking module, because our models directly use the entity linking results under entity_linking_results/.

Our entity linker is based on BERT-NER and the popularity-based entity disambiguation in aqqu. Specifically, we use the NER model to identify a set of entity mentions, and then use the identified mentions to retrieve Freebase entities from an entity memory constructed from Freebase entity mention information (i.e., mentions in FACC1, plus all aliases in Freebase for entities not covered by FACC1 [1]).

To run our entity linker, first download the mentions data from https://1drv.ms/u/s!AuJiG47gLqTznjl7VbnOESK6qPW2?e=HDy2Ye and put all the data under entity_linker/data/. Second, download our trained NER model from https://1drv.ms/u/s!AuJiG47gLqTznjge7wLqAZiSMIcU?e=5RpKaC, which is trained on the GrailQA training data, and put it under entity_linker/BERT_NER/. Then you should be all set! We provide a usage example in entity_linker/bert_entity_linker.py; follow it to identify entities in your own data.
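
As a minimal sketch of that usage example (the construction of the surface index and the linker's exact method names are in entity_linker/bert_entity_linker.py; the snippet below only illustrates the constructor used there):

# Sketch only; see entity_linker/bert_entity_linker.py for the working example.
from entity_linker.bert_entity_linker import BertEntityLinker

surface_index = ...  # popularity-based surface index built from entity_linker/data/
linker = BertEntityLinker(surface_index,
                          model_path="/BERT_NER/out_base_gq/",  # the NER model downloaded above
                          device="cuda:0")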

[1]: FACC1 contains mention information for around 1/8 of Freebase entities, including different mentions for those entities and the frequency of each mention. For entities not included in FACC1, we retrieve mentions via Freebase's name and alias properties. Note that we do not have frequency information for those entity mentions, so we simply treat the number of occurrences as 1 for all of them in our implementation.
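
To make the popularity-based disambiguation concrete, here is a minimal sketch under assumed data shapes (this is not the repository's actual API):

# Assumed shape: surface_index maps a lowercased mention to {entity_mid: mention_count};
# entities absent from FACC1 get a count of 1, as described above.
def disambiguate(mention, surface_index):
    candidates = surface_index.get(mention.lower(), {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)  # the most popular candidate wins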

Train New Models

You can also use our code to train new models.

Training Configuration

Following AllenNLP, our train command also takes a configuration file as input to specify all model hyperparameters and training-related parameters such as learning rate, batch size, cuda device, etc. Most parameters in the training configuration files (i.e., files under model_configs/train/) should be intuitive based on their names, so we only explain the potentially confusing ones here (see the example fragment after this list).

- ranking: Ranking mode or transduction mode. True for Ranking, and false for Transduction.
- offline: Whether to use cached files under cache/.
- num_constants_per_group: Number of schema items in each chunk for BERT encoding.
- gq1: True for GraphQuestions, and false for GrailQA.
- use_sparql: Whether to use SPARQL as the target query. Set it to false, because in this paper we use S-expressions.
- use_constrained_vocab: Whether to do vocabulary pruning or not.
- constrained_vocab: If vocabulary pruning is enabled, how to do it. Options include 1_step, 2_step, and mix2.
- perfect_entity_linking: Whether to assume gold entities are given.
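
For orientation, a hypothetical fragment of a training configuration might set these parameters as follows (the values are illustrative, not the shipped defaults; see the actual files under model_configs/train/):

{
  "ranking": true,                 // Ranking (false selects Transduction)
  "offline": true,                 // use the cached SPARQL results under cache/
  "num_constants_per_group": 96,   // schema items per chunk for BERT encoding
  "gq1": false,                    // GrailQA rather than GraphQuestions
  "use_sparql": false,             // target S-expressions, not SPARQL
  "use_constrained_vocab": true,   // enable vocabulary pruning
  "constrained_vocab": "1_step",
  "perfect_entity_linking": false
}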

Training Command

To train the BERT-based model, run:

$ PYTHONHASHSEED=23 python run.py train model_configs/train/train_bert.jsonnet --include-package bert_constrained_seq2seq --include-package bert_seq2seq_reader --include-package utils.bert_interface -s [your_path_specified_for_training]

To train the GloVe-based model, run:

$ PYTHONHASHSEED=23 python run.py train model_configs/train/train_glove.jsonnet --include-package constrained_seq2seq --include-package constrained_seq2seq_reader -s [your_path_specified_for_training]

Online Running Time

We also report the running time of inference in online mode, in which offline caches are disabled. The aim of this setting is to mimic a real production scenario. To report the average running time, we randomly sample 1,000 test questions for each model and run every model on a single GeForce RTX 2080 Ti GPU with batch size 1. A comparison of the different models is shown below:

Model                  Running time (seconds)
Transduction           60.899
Transduction-BERT      50.176
Transduction-VP        4.787
Transduction-BERT-VP   1.932
Ranking                115.459
Ranking-BERT           80.892

The running time is quite significant when either ranking mode or vocabulary pruning is activated, because running SPARQL queries to collect the 2-hop information (i.e., either candidate logical forms for ranking or 2-hop schema items for vocabulary pruning) is very time-consuming. This is a general issue for the enumeration+ranking framework in KBQA, which many existing methods use, and it has to some extent been under-addressed so far. A common practice is to use an offline cache to store the executions of all related SPARQL queries, which assumes the test questions are known in advance. This assumption holds for existing KBQA benchmarks but is not realistic for a real production system. How to improve the efficiency of KBQA models while maintaining their efficacy remains an active area of research.
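
The offline-cache practice mentioned above amounts to memoizing SPARQL executions. A minimal sketch, assuming a file-per-query layout (which may differ from the actual structure of cache/):

import hashlib
import json
import os

CACHE_DIR = "cache/"

def cached_execute(query, execute_fn):
    """Return a cached SPARQL result if available; otherwise run the query and cache it."""
    key = hashlib.md5(query.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = execute_fn(query)  # e.g., the executor in utils/sparql_executer.py
    with open(path, "w") as f:
        json.dump(result, f)
    return result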

Citation

@inproceedings{gu2021beyond,
  title={Beyond IID: three levels of generalization for question answering on knowledge bases},
  author={Gu, Yu and Kase, Sue and Vanni, Michelle and Sadler, Brian and Liang, Percy and Yan, Xifeng and Su, Yu},
  booktitle={The World Wide Web Conference},
  year={2021},
  organization={ACM}
}
Comments
  • No 'model_config.json' file for BERT_NER

    Hi there, when I was running the main function of bert_entity_linker, I got this message.

    File "/home2/yhshu/.pycharm_helpers/pydev/pydevd.py", line 1483, in _exec
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/home2/yhshu/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home2/yhshu/yhshu/workspace/GrailQA/entity_linker/bert_entity_linker.py", line 135, in <module>
        entity_linker = BertEntityLinker(surface_index, model_path="/BERT_NER/out_base_gq/", device="cuda:6")
      File "/home2/yhshu/yhshu/workspace/GrailQA/entity_linker/bert_entity_linker.py", line 32, in __init__
        self._model = Ner(path + model_path, device)
      File "/home2/yhshu/yhshu/workspace/GrailQA/entity_linker/BERT_NER/bert.py", line 36, in __init__
        self.model, self.tokenizer, self.model_config = self.load_model(model_dir)
      File "/home2/yhshu/yhshu/workspace/GrailQA/entity_linker/BERT_NER/bert.py", line 47, in load_model
        model_config = json.load(open(model_config))
    FileNotFoundError: [Errno 2] No such file or directory: '/home2/yhshu/yhshu/workspace/GrailQA/entity_linker/BERT_NER/out_base_gq/model_config.json'

    I noticed that there's no file called model_config.json in the out_base_gq directory. May I know if I executed this function incorrectly, or do I need to get this file some other way?

    opened by yhshu 8
  • Training of Mention Detection Model (BERT-NER)

    I would like to train the "mention detection model" on the GrailQA dataset, which can be done using /entity_linker/BERT_NER/run_ner.py. However, it expects the GrailQA dataset in CoNLL-2003 format (i.e., question tokens tagged with ["B", "I", "O", "[CLS]", "[SEP]"] tags), which is not present in the repository. Would you please share the processed GrailQA dataset (train and valid splits) or a script for converting the dataset into the required format? Can you also confirm that the BERT-NER model was trained with the parameters given in /entity_linker/BERT_NER/run_ner.py?

    opened by gourango01 4
  • The time of training is so long

    Hi, excuse me, I would like to ask some questions about the training time of BERT-based KBQA.

    When I run your code (BERT+Ranking and BERT+Transduction), the training time is very long (about two months). My machine has a V100 (32 GB). Did you train for such a long time, or is something wrong on my end?

    thanks for your excellent work and look forward to your reply.


    opened by BitcoinNLPer 4
  • How is the entity linking performance calculated?

    Hi,

    I wonder how the entity linking performance (for the linker based on AQQU) is calculated. Since the ground-truth entities or the predicted entities could be empty, how does the calculation handle these situations?

    Thanks for your reply.

    opened by yhshu 2
  • Suspicious bug in logical form generation

    Hi, line 203 of bert_seq2seq_reader.py seems suspicious. Is it a bug or a feature? Could it be lfs_2 = generate_all_logical_forms_2(entities, offline=self._offline)? It is somewhat unnatural to iterate through entities but always call generate_all_logical_forms_2 using entities[0].

    opened by xiye17 2
  • Some problem with GrailQA submission

    Hello,

    I have observed that prediction results submitted to GrailQA on the test dataset are reviewed manually, which can take up to a week; this is a bit inconvenient for experiment iteration. Could you provide real-time feedback and evaluation on the test dataset on CodaLab, as with the dev dataset?

    Best

    opened by BitcoinNLPer 2
  • the script of Sparql-to-S-expression

    At present, there is no script for SPARQL-to-S-expression conversion under GrailQA-main/utils. Would it be convenient to release it?

    Thanks!

    opened by BitcoinNLPer 2
  • Does the Ranking model currently only support one hop?

    For what reason does the Ranking model currently only support one hop? See search_over_graphs.py at line 187:

    # The alpha version only considers one entity and one step
    def generate_all_logical_forms_alpha(entity: str, domains: List[str] = None, offline=True):

    opened by LLLLLLoki 1
  • Some confusion for baseline

    Hi,

    1. In the inference stage of the ranking-based model, why use the average/sum probability of each token of a candidate S-expression to select the S-expression that best matches the corresponding question? (I don't actually understand this method.) Some code for the above question:

      # compute per-token log-probabilities and gather those of each candidate's target tokens
      log_probs = F.log_softmax(logits, dim=-1)
      log_probs = log_probs.reshape(batch_size, num_of_candidates, num_decoding_steps, -1)
      targets.masked_fill_(targets == -1, 0)  # zero out padding indices before gathering
      log_probs_gather = log_probs.gather(dim=-1, index=targets.unsqueeze(-1))
      log_probs_gather = log_probs_gather.squeeze(-1)

    2. In addition, why do the generative model and the ranking model use the same model? And why not use a BERT-based binary classification model to match the question against the candidate S-expressions?

    3. It makes sense to learn knowledge with a static vocabulary, but the baseline model uses a dynamically variable vocabulary. What I wonder is why this approach can still learn the related knowledge well on GrailQA.

    PS: I have submitted the test set prediction file on CodaLab and sent an email to you~

    Best.

    opened by BitcoinNLPer 1
  • `freebase_complete_all_mention` missing

    Hi, it seems the file for surface indexing (freebase_complete_all_mention) at line 133 in entity_linker/bert_entity_linker.py is missing and not included in the downloadable files. Can you possibly release it?

    opened by xiye17 1
  • error

    Hi. Every time I run the code I get this error:

    ModuleNotFoundError: No module named 'spacy.lang.en.tag_map'

    spaCy is installed, along with all the other pipeline packages.

    Could you please help out?

    opened by FatimahD-Alotaibi 4
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559, a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Questions about the test set

    Hello, I would like to know how to get the correct answers to the questions in the test set and their corresponding SPARQL statements. Do I need to upload my model to the website you provided to get results each time I change the model? In other words, how do I get the test set with complete information?

    opened by luoxindi 2
  • Training Environment Consulting

    Hello, I would like to ask about the configuration of the machine you used when training bert-seq2seq. In addition, I saw that the number of epochs in config.json is 1000; does that mean you trained for 1,000 epochs?

    opened by Yepgang 1