A pytorch implementation of Reading Wikipedia to Answer Open-Domain Questions.

Runqi Yang

Last update: Nov 8, 2022

Overview

DrQA

A pytorch implementation of the ACL 2017 paper Reading Wikipedia to Answer Open-Domain Questions (DrQA).

Reading comprehension is the task of producing an answer given a question and one or more pieces of evidence (usually natural language paragraphs). Compared to question answering over knowledge bases, reading comprehension models are more flexible and have shown great potential for zero-shot learning.

SQuAD is a reading comprehension benchmark in which there is only a single piece of evidence and the answer is guaranteed to be a part of that evidence. Since the publication of the SQuAD dataset, research on reading comprehension has progressed quickly and many strong models have been proposed. DrQA is conceptually simpler than most of them but still yields strong performance, even as a single model.
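
To make the "answer is a part of the evidence" property concrete, here is a minimal sketch of one paragraph in the SQuAD v1.1 JSON layout (context, qas, answers with text and answer_start). The record is hand-written for illustration, its id is hypothetical, and the helper name answer_is_span is not part of this project.

```python
# A hand-written record in the SQuAD v1.1 layout (not taken from the dataset files).
context = (
    "The game was played on February 7, 2016, at Levi's Stadium "
    "in the San Francisco Bay Area at Santa Clara, California."
)
answer_text = "February 7, 2016"
record = {
    "context": context,
    "qas": [
        {
            "id": "example-0",  # hypothetical id
            "question": "What day was the game played on?",
            "answers": [
                # answer_start is a character offset into the context.
                {"text": answer_text, "answer_start": context.find(answer_text)}
            ],
        }
    ],
}


def answer_is_span(context, answer):
    """Return True if the answer text sits at its recorded character offset."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return start >= 0 and context[start:end] == answer["text"]


for qa in record["qas"]:
    for answer in qa["answers"]:
        print(qa["question"], "->", answer_is_span(record["context"], answer))
```

Because every answer is a character span of the context, a model only has to predict where the span starts and ends rather than generate free text.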

The motivation for this project is to offer a clean version of DrQA for the machine reading comprehension task, so that one can quickly make modifications and try out new ideas. The Detailed Comparisons section below compares this project with what's described in the original paper and with two "official" projects, ParlAI and DrQA.

Requirements

python >=3.5
pytorch 0.4. Historical versions:
- DrQA with pytorch 0.3
- DrQA with pytorch 0.2
numpy
msgpack
spacy 1.x

Quick Start

Setup

Download the project via git clone https://github.com/hitvoice/DrQA.git; cd DrQA
Make sure python 3, pip, wget and unzip are installed.
Install the pytorch build that matches your OS, python and CUDA versions.
Install the remaining requirements via pip install -r requirements.txt
Download the SQuAD data files, GloVe word vectors and spaCy English language models using bash download.sh.

Train

# prepare the data
python prepro.py
# train for 40 epochs with batchsize 32
python train.py -e 40 -bs 32

Warning: Running prepro.py takes about 9 GB of memory when using 8 threads. If your machine does not have enough memory, reduce the number of threads used by the script, for example: python prepro.py --threads 2

Predict

python interact.py

Example interactions:

Evidence: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.
Question: What day was the game played on?
Answer: February 7, 2016
Time: 0.0245s

Evidence: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.
Question: What is the AFC short for?
Answer: The American Football Conference
Time: 0.0214s

Evidence: Beanie style with simple design. So cool to wear and make you different. It wears as peak cap and a fashion cap. It is good to match your clothes well during party and holiday, also makes you charming and fashion, leisure and fashion in public and streets. It suits all adults, for men or women. Matches well with your winter outfits so you stay warm all winter long.
Question: Is it for women?
Answer: It suits all adults, for men or women
Time: 0.0238s

The last example is a randomly picked product description from Amazon (not in SQuAD).

Results

EM & F1

Implementation	EM	F1
Original paper	69.5	78.8
This project	69.64	78.76
Official (spaCy)	69.71	78.94
Official (CoreNLP)	69.76	79.09
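
For readers unfamiliar with the two metrics, the sketch below shows how EM (exact match) and token-level F1 are commonly computed for SQuAD v1.1: answers are lowercased, stripped of punctuation and English articles, and whitespace-normalized before comparison. This is an illustrative re-implementation, not this project's evaluation code, and the function names are chosen here.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_truths(metric, prediction, truths):
    """A question may have several reference answers; keep the best score."""
    return max(metric(prediction, t) for t in truths)


print(exact_match("The American Football Conference", "American Football Conference"))
print(f1_score("February 7, 2016", "February 7"))
```

The reported numbers are these per-question scores averaged over the dev set and expressed as percentages.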


Detailed Comparisons

Compared to what's described in the original paper:

The grammatical features are generated by spaCy instead of Stanford CoreNLP. It's much faster and produces similar scores.

Compared to the code in facebookresearch/DrQA:

This project is much more lightweight and focuses solely on training and evaluating on the SQuAD dataset. It lacks the document retriever, the interactive inference API, and some other features.
The implementation in facebookresearch/DrQA can train on multiple GPUs, while this implementation (currently, and for simplicity) supports only single-GPU training.

Compared to the code in facebookresearch/ParlAI:

The DrQA model is no longer wrapped in a chatbot framework, which makes the code more readable, easier to modify and faster to train. Preprocessing of the text corpus is performed only once, whereas in a dialog framework raw text is transmitted each time and the same text must be preprocessed again and again.
This is a full implementation of the original paper, while the model in ParlAI is a partial implementation, missing all grammatical features (lemma, POS tags and named entity tags).
Some minor bug fixes. Some of them have been merged into ParlAI.

About

Maintainer: Runqi Yang.

Credits: thanks to Jun Yang for code review and advice.

Most of the pytorch model code is borrowed from Facebook/ParlAI under a BSD-3 license.

Comments

How to get the 78.6 F1 score?

Hi,

Thanks for creating this repo!

When I ran the code with default options and 30 epochs, I got 78.0~78.1 F1 score. Did I miss something? Do I need more training epochs?

Thanks, Tao

opened by taolei87 8
Trying to understand the index_answer function

The last condition in this function returns (None, None). Does this condition actually arise, or is it there just to avoid a crash? I am trying to implement the same paper, and when I try to get the final labels for my context-question pairs, many answers result in a ValueError. Is this a flaw in the dataset? Thank you.
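
One situation in which such a fallback is needed can be sketched as follows, assuming the function maps an answer's character offsets to token indices (the project's actual signature and tokenizer may differ). If the answer boundary falls inside a token, list.index raises ValueError and no token-level label exists. The whitespace tokenizer and helper names below are hypothetical.

```python
import re


def token_spans(text):
    """Whitespace tokenizer that records (start, end) character offsets."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def index_answer(spans, answer_start, answer_end):
    """Map a character span to (start_token, end_token), or (None, None)."""
    starts = [s for s, _ in spans]
    ends = [e for _, e in spans]
    try:
        return starts.index(answer_start), ends.index(answer_end)
    except ValueError:
        # The answer does not begin or end on a token boundary.
        return None, None


context = "Denver won 24-10 in 2016."
spans = token_spans(context)

for answer in ("Denver", "24-10", "24"):
    start = context.find(answer)
    print(answer, "->", index_answer(spans, start, start + len(answer)))
```

Here "24" starts on a token boundary but ends in the middle of the token "24-10", so it cannot be expressed as a token span. Examples of this kind are typically skipped during training rather than treated as dataset errors.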

opened by kushalj001 6
FileNotFoundError: [Errno 2] No such file or directory: 'SQuAD/meta.msgpack'

I ran download.sh and I saw two files in the SQuAD folder: dev-v1.1.json train-v1.1.json

Then I got the error when running: python train.py -e 40 -bs 32

Where can I download the meta.msgpack? Thanks.
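
A small pre-flight check can make this kind of error easier to diagnose. The sketch below lists the files named in the error messages and logs on this page, together with the Quick Start step that is assumed to produce each one. The mapping from file to step is inferred from the Quick Start order and is not confirmed by the project. The helper name missing_steps is hypothetical.

```python
from pathlib import Path

# Paths as they appear in the messages quoted on this page.
# The step assumed to create each file is a guess based on the Quick Start order.
EXPECTED_FILES = [
    ("SQuAD/train-v1.1.json", "bash download.sh"),
    ("SQuAD/dev-v1.1.json", "bash download.sh"),
    ("SQuAD/meta.msgpack", "python prepro.py"),
    ("SQuAD/data.msgpack", "python prepro.py"),
    ("models/best_model.pt", "python train.py -e 40 -bs 32"),
]


def missing_steps(root="."):
    """Return (path, suggested_step) pairs for files that do not exist yet."""
    root = Path(root)
    return [(path, step) for path, step in EXPECTED_FILES if not (root / path).exists()]


if __name__ == "__main__":
    for path, step in missing_steps():
        print(f"missing {path}: try running `{step}` first")
```

Run from the project root, this prints one line per missing file, which shows at a glance whether the preprocessing or training step was skipped.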

opened by ghost 6
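Too many UNK tokens

After preprocessing with squad_preprocess.py, the context and question loaded by the load_squad function contain too many UNK tokens. The final vocab size is 40000+ (see the screenshot). When I used the vocab to convert the id-encoded context and question stored in data.msgpack back into strings, I found that there were too many UNKs. How should I deal with this?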
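
Before changing the preprocessing, it can help to quantify the problem. The sketch below decodes id sequences back to tokens and measures the fraction of unknown ids. The vocabulary layout (index equals token id, id 1 reserved for unknown words) and the token names are assumptions made for illustration and may not match the files produced by the script.

```python
# Hypothetical vocabulary layout: index == token id, id 1 reserved for unknown words.
vocab = ["<PAD>", "<UNK>", "the", "game", "was", "played"]
UNK_ID = 1


def decode(token_ids, vocab):
    """Turn a sequence of ids back into tokens."""
    return [vocab[i] for i in token_ids]


def unk_rate(sequences, unk_id=UNK_ID):
    """Fraction of all token positions that hold the unknown id."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    unknown = sum(1 for seq in sequences for i in seq if i == unk_id)
    return unknown / total


contexts = [[2, 3, 4, 5], [2, 1, 4, 1]]
print(decode(contexts[1], vocab))
print(f"UNK rate: {unk_rate(contexts):.2%}")
```

Comparing this rate before and after a change to the vocabulary construction gives a concrete number to reason about.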
question

opened by cairoHy 6
replace async with non_blocking for pytorch 1.0

This argument was renamed for pytorch 1.0: https://pytorch.org/docs/stable/notes/cuda.html I can put it into a branch as well if you would like to keep things separate.

opened by brettkoonce 5
Adding Evidence as Database (like wikipedia )

Say your model is trained and you export it for prediction. You want to add all the evidence as a database in "id", "txt" format, so that multiple users can run queries against the dataset for answers. How can such datasets be added? Would another python script be required, like a dataset / document reader.py?
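
Since this project does not include a document retriever, one possible approach is to put a small retrieval step in front of the reader: pick the most relevant "txt" for a question, then pass that text to the model as evidence. The sketch below is a minimal TF-IDF-style index over records in the "id", "txt" format. The class name EvidenceIndex and the scoring formula are illustrative choices, not part of this project or of the official DrQA retriever.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


class EvidenceIndex:
    """Tiny in-memory index over records shaped like {"id": ..., "txt": ...}."""

    def __init__(self, records):
        self.records = list(records)
        self.doc_tokens = [Counter(tokenize(r["txt"])) for r in self.records]
        doc_freq = Counter()
        for tokens in self.doc_tokens:
            doc_freq.update(tokens.keys())
        n = len(self.records)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def search(self, question, top_k=1):
        """Return ids of the top_k records that share weighted terms with the question."""
        q_tokens = set(tokenize(question))
        scored = []
        for record, tokens in zip(self.records, self.doc_tokens):
            score = sum(tokens[t] * self.idf[t] for t in q_tokens if t in tokens)
            scored.append((score, record["id"]))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc_id for score, doc_id in scored[:top_k] if score > 0]


index = EvidenceIndex([
    {"id": "sb50", "txt": "The Denver Broncos defeated the Carolina Panthers 24-10 in Super Bowl 50."},
    {"id": "cap", "txt": "Beanie style with simple design. It suits all adults, for men or women."},
])
print(index.search("Who defeated the Carolina Panthers?"))
print(index.search("Is it for women?"))
```

The returned id identifies the evidence text to feed to the reader. A production setup would need a persistent store and a stronger ranking function.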

opened by augmen 5
Small changes to run prepro.py
At first the goal was to add feedback while the script was running, but performance got worse so I gave up. But I fixed some little things for being able to run the file:

Add __init__.py to /drqa so we can import str2bool

Add encoding type to load_wv_vocab and build_embedding functions (I don't know if it's my machine configuration but I was getting some UnicodeDecodeErrors)
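
The encoding fix can be illustrated with a minimal sketch. Without an explicit encoding, open() uses the platform default, which can raise UnicodeDecodeError on non-UTF-8 locales when the vector file contains non-ASCII tokens. The function below is a hypothetical stand-in and is not the project's load_wv_vocab. It assumes the GloVe text format, where each line holds a token followed by dim space-separated numbers.

```python
def read_glove_vocab(path, dim, encoding="utf8"):
    """Collect the tokens from a GloVe-style text file.

    Each line is assumed to be: token, then `dim` space-separated floats.
    The token is everything before the last `dim` fields, so tokens that
    themselves contain spaces are kept intact.
    """
    vocab = set()
    # An explicit encoding avoids depending on the platform default.
    with open(path, encoding=encoding) as f:
        for line in f:
            fields = line.rstrip().split(" ")
            token = " ".join(fields[:-dim])
            vocab.add(token)
    return vocab
```

Calling read_glove_vocab("vectors.txt", dim=300) on a 300-dimensional file would return the set of tokens it contains. The file name here is only an example.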
opened by dmesquita 5
Question about POS and NER in the model

Does the model map each POS tag and NER tag category to a one-hot encoding? If not, why? It doesn't make sense to me how you can just supply the category ID directly in the embedding.
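
Independently of how this project encodes the tags, the general relationship between the two options can be shown in a few lines: looking up row k of an embedding table gives the same vector as multiplying a one-hot vector for k by that table. Passing the category id to an embedding layer is therefore an efficient way of doing the one-hot product, not a use of the id as a numeric value. The table values below are made up.

```python
# A made-up embedding table: one row per tag id, three dimensions per tag.
embedding = [
    [0.1, 0.2, 0.3],  # tag id 0
    [0.4, 0.5, 0.6],  # tag id 1
    [0.7, 0.8, 0.9],  # tag id 2
]


def lookup(table, tag_id):
    """What an embedding layer does: select a row by index."""
    return table[tag_id]


def one_hot(tag_id, size):
    return [1.0 if i == tag_id else 0.0 for i in range(size)]


def one_hot_times_table(table, tag_id):
    """Explicit vector-matrix product of the one-hot vector with the table."""
    vec = one_hot(tag_id, len(table))
    dim = len(table[0])
    return [sum(vec[r] * table[r][c] for r in range(len(table))) for c in range(dim)]


for tag_id in range(len(embedding)):
    print(tag_id, lookup(embedding, tag_id) == one_hot_times_table(embedding, tag_id))
```

Feeding raw one-hot vectors into the network instead is the special case in which the table is fixed to the identity matrix.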

opened by nanonaren 3
Question

Hi, I'm new to machine learning and I've got a question regarding the model's predictions. What kind of data do I have to provide? Or, more generally, how do I actually make predictions using this particular model? Thanks
question

opened by EvGe22 3
planning to implement Attend It Again paper.

Hello there!

I was planning to implement the attention-again from this paper Attend It Again on DrQA.

Basically, what Attend It again does is as follows.

This model has two LSTM layers. In the bottom layer of LSTM, we use the traditional attention mechanisms and generate the hidden state of LSTM unit from previous hidden state and current input. Next step, we integrate the hidden state of previous LSTM unit in top layer, current input feature and the current output from the bottom layer of LSTM unit.

My plan was to take the doc_hiddens and the x1_emb and feed these to an Attention similar to qemb_match along with question_hiddens then feed this to a LSTM network similar to doc_rnn. Later take this output and feed into start_attn and end_attn to get the start_scores and end_scores.

Can you please tell me whether this is likely to improve the F1 measure?

opened by anis016 2
no model file

I ran the command python interact.py and got this error:

(pt) swapnilbhadade@hitvoice:~/pt/DrQA-1$ python interact.py
Traceback (most recent call last):
  File "interact.py", line 22, in <module>
    checkpoint = torch.load(args.model_file)
  File "/home/swapnilbhadade/pt/lib/python3.5/site-packages/torch/serialization.py", line 301, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'models/best_model.pt'

opened by augmen 2
training stopped at epoch 1

Can you tell me how long the training process takes to complete?

I am using a Google Colab notebook, and it has been stuck at epoch 1 for the last 20 minutes.

opened by rajpratyush 9
AssertionError: Torch not compiled with CUDA enabled

$ python3 train.py -e 40 -bs 32

02/15/2020 05:17:11 [Program starts. Loading data...]
02/15/2020 05:22:48 {'log_per_updates': 3, 'data_file': 'SQuAD/data.msgpack', 'model_dir': '/Users/balagopalbhallamudi/Desktop/DrQA/models', 'save_last_only': False, 'save_dawn_logs': False, 'seed': 1013, 'cuda': False, 'epochs': 40, 'batch_size': 32, 'resume': '', 'resume_options': False, 'reduce_lr': 0.0, 'optimizer': 'adamax', 'grad_clipping': 10, 'weight_decay': 0, 'learning_rate': 0.1, 'momentum': 0, 'tune_partial': 1000, 'fix_embeddings': False, 'rnn_padding': False, 'question_merge': 'self_attn', 'doc_layers': 3, 'question_layers': 3, 'hidden_size': 128, 'num_features': 4, 'pos': True, 'ner': True, 'use_qemb': True, 'concat_rnn_layers': True, 'dropout_emb': 0.4, 'dropout_rnn': 0.4, 'dropout_rnn_output': True, 'max_len': 15, 'rnn_type': 'lstm', 'pretrained_words': True, 'vocab_size': 91590, 'embedding_dim': 300, 'pos_size': 50, 'ner_size': 19}
02/15/2020 05:22:48 [Data loaded.]
02/15/2020 05:22:48 Epoch 1
02/15/2020 07:07:48 > epoch [ 1] updates[ 2707] train loss[4.38260] remaining[0:00:00]
02/15/2020 07:09:46 dev EM: 53.140964995269634 F1: 64.78947947738538
Traceback (most recent call last):
  File "train.py", line 377, in <module>
    main()
  File "train.py", line 87, in main
    model.save(model_file, epoch, [em, f1, best_val_score])
  File "/Users/balagopalbhallamudi/Desktop/DrQA/drqa/model.py", line 147, in save
    'torch_cuda_state': torch.cuda.get_rng_state()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/random.py", line 20, in get_rng_state
    _lazy_init()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
    _check_driver()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/cuda/__init__.py", line 94, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

(base) Balagopals-MacBook-Pro:DrQA balagopalbhallamudi$ python3 interact.py
Traceback (most recent call last):
  File "interact.py", line 31, in <module>
    checkpoint = torch.load(args.model_file, map_location=lambda storage, loc: storage)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/best_model.pt'

opened by balagopal24 2
train stop

Hello, I'm a new researcher in machine reading comprehension. When I use "python train.py -e 40 -bs 32", the process stops at "Data loaded". Could you suggest a solution?

opened by Backlight-SS 3
Using DrQA on a Chinese dataset
Is it expected that this code can be applied to a Chinese language dataset with only minor changes?

I understand that I will need to provide the following:

Chinese train/dev data files in the SQuAD format

GloVe word vectors trained on the Chinese language

Spacy Chinese language models

Changes in prepro.py to take care of things such as tokenization, add encoding="utf8" to file read/write statements, etc.

I would very much appreciate any insights if there are known reasons why this is not supposed to work.
opened by kaihuchen 3
Is there a way to know the score of the prediction to analyse whether it is right or wrong?

@hitvoice Consider below evidence and questions

{
  "evidence": "I am on vacation from July 31st and coming back next month",
  "question": {
    "1": "when he is going on vacation?",
    "2": "when he is returning back from vacation?"
  }
}

Answer will be: "when he is going on vacation?": "July 31st", "when he is returning back from vacation?": "next month",

This is working as expected. But consider the case where I have not provided the return back details and the evidence is just

"evidence":"I am on vacation from July 31st"

And I am getting below answer "when he is going on vacation?": "July 31st", "when he is returning back from vacation?": "July 31st",

We know that the return date is not July 31st. Is there a way to get the score of the prediction and, based on some threshold, mark it as invalid or blank?
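
One way to approach this can be sketched as follows, assuming the model exposes per-token start and end probabilities. The DrQA paper describes choosing the span that maximizes the product of the start and end probabilities, subject to a maximum span length. Whether and how this project exposes that score is not confirmed here. The probabilities and the threshold of 0.3 below are made up, and the score is not a calibrated confidence, so any threshold would need tuning on held-out data.

```python
def best_span(start_probs, end_probs, max_len=15):
    """Return (start, end, score) maximizing start_probs[i] * end_probs[j], i <= j < i + max_len."""
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        for j in range(i, min(i + max_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def answer_or_blank(tokens, start_probs, end_probs, threshold=0.3):
    """Return the best span as text, or an empty string if its score is below the threshold."""
    i, j, score = best_span(start_probs, end_probs)
    if score < threshold:
        return "", score
    return " ".join(tokens[i:j + 1]), score


tokens = ["I", "am", "on", "vacation", "from", "July", "31st"]

# Made-up distributions: a confident prediction and a diffuse (uncertain) one.
confident_start = [0.01, 0.01, 0.01, 0.01, 0.01, 0.90, 0.05]
confident_end = [0.01, 0.01, 0.01, 0.01, 0.01, 0.05, 0.90]
diffuse = [1 / 7] * 7

print(answer_or_blank(tokens, confident_start, confident_end))
print(answer_or_blank(tokens, diffuse, diffuse))
```

Because a model trained on SQuAD v1.1 always sees an answerable question, it may still assign a high score to a wrong span. A threshold reduces such cases but does not eliminate them.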

opened by Endeavour-BRM 1

Owner

Runqi Yang

ML engineer @Alibaba. Interested in conversational systems and deep learning.
