SPARQLing Database Queries from Intermediate Question Decompositions

This repo is the implementation of the following paper:

SPARQLing Database Queries from Intermediate Question Decompositions
Irina Saparina and Anton Osokin
To appear in the proceedings of EMNLP'21

License

This software is released under the MIT license, which means that you can use the code in any way you want.

Dependencies

Conda env with pytorch 1.9

Create a conda env with pytorch 1.9 and many other upgraded packages from conda_env_with_pytorch1.9.yaml:

conda env create -n env-torch1.9 -f conda_env_with_pytorch1.9.yaml
conda activate env-torch1.9

Download some nltk resources, BERT and GraPPa:

python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"
python -c "from transformers import AutoModel; AutoModel.from_pretrained('bert-large-uncased-whole-word-masking'); AutoModel.from_pretrained('Salesforce/grappa_large_jnt')"

mkdir -p third_party && \
cd third_party && \
curl https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip | jar xv

Data

We currently provide both Spider and Break inside our repo. Note that these datasets differ from the original ones because we fixed some annotation errors. Download the databases:

bash ./utils/wget_gdrive.sh spider_temp.zip 11icoH_EA-NYb0OrPTdehRWm_d7-DIzWX
unzip spider_temp.zip -d spider_temp
cp -r spider_temp/spider/database ./data/spider
rm -rf spider_temp/
python ./qdmr2sparql/fix_databases.py --spider_path ./data/spider
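Before moving on, it can help to confirm that the setup commands left the expected directories in place. The sketch below is only an illustration: the helper name `missing_dirs` is hypothetical, and the list of directories is an assumption derived from the commands above (`third_party` from the CoreNLP step, `data/spider/database` from the `cp -r` step), not a list the repo itself defines.

```python
from pathlib import Path

# Directories implied by the setup commands above. Which of them a given
# run really needs is an assumption, not something the repo documents.
EXPECTED_DIRS = ("third_party", "data/spider/database")


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the repository root; an empty result only means the directories exist, not that their contents are complete.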

To reproduce our annotation procedure see qdmr2sparql/README.md.

To test the qdmr2sparql translator, run qdmr2sparql/test_qdmr2sparql.py.

Experiments

Every experiment has its own config file in text2qdmr/configs/experiments. The pipeline of working with any model version or dataset is:

python run_text2qdmr.py preprocess experiment_config_file  # preprocess the data
python run_text2qdmr.py train experiment_config_file       # train a model
python run_text2qdmr.py eval experiment_config_file        # evaluate the results

# multiple GPUs on one machine:
export NGPUS=4 # set $NGPUS manually
python -m torch.distributed.launch --nproc_per_node=$NGPUS --use_env --master_port `./utils/get_free_port.sh`  run_text2qdmr.py train experiment_config_file
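If you prefer to chain the three single-GPU stages from one script, a minimal sketch could look like the following. It only wraps the `preprocess`, `train` and `eval` commands shown above; the function names `build_commands` and `run_pipeline` are hypothetical and not part of the repo.

```python
import subprocess
import sys

# The three stages listed above, in the order they must run.
STAGES = ("preprocess", "train", "eval")


def build_commands(config_path, python=sys.executable):
    """Build one run_text2qdmr.py command line per stage for the given config."""
    return [[python, "run_text2qdmr.py", stage, config_path] for stage in STAGES]


def run_pipeline(config_path):
    """Run the stages in order, stopping at the first one that fails."""
    for cmd in build_commands(config_path):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


# Example (hypothetical invocation, run from the repository root):
# run_pipeline("experiment_config_file")
```

Using `check=True` means a failed preprocessing step does not silently lead into training on incomplete data.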

Note that preprocessing and evaluation execute queries and therefore take some time. To speed up the evaluation, you can install a Virtuoso server (see qdmr2sparql/README_Virtuoso.md).

Checkpoints and samples

The dev and test examples of model output are in model_samples/.

Checkpoints of our best models:

Model name	Dev	Test	Link
grappa-aug	80.4	62.0	https://www.dropbox.com/s/t9z1uwvohuakig8/grappa-aug_model_checkpoint-00072000?dl=0
grappa-full_break	74.6	62.6	https://www.dropbox.com/s/bf6vyhtep4knmm7/full-break-grappa_model_checkpoint-00075000?dl=0

Acknowledgements

The text2qdmr module is based on the RAT-SQL code, the implementation of the ACL'20 paper "RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers" by Wang et al.

The Spider dataset was proposed by Yu et al. in the EMNLP'18 paper "Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task".

The Break dataset was proposed by Wolfson et al. in the TACL paper "Break It Down: A Question Understanding Benchmark".

Attribute Error: Can't pickle local object 'EncDecModel.Preproc.dataset..'

Hi, I'm using this on Windows and was able to run through the initial setup steps, including preprocessing for one of the text2qdmr experiment configs. However, when I attempt to train with the config file, it results in an attribute error. I've tried finding a solution but haven't been able to get around it. Any help is appreciated, thanks!

(env-torch1.9) D:\GitInstalls\sparqling-queries>python run_text2qdmr.py train ./text2qdmr/configs/experiments/bert_qdmr_train.jsonnet
Running with 1 GPU
[2022-02-11T14:19:49] Logging to logdir/bert_qdmr_train\bs=6,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1
Some weights of the model checkpoint at bert-large-uncased-whole-word-masking were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']
This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\site-packages\transformers\optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use thePyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
FutureWarning,
Loaded dataset size: 4321
[2022-02-11T14:20:11] Running on git commit 'e04d0bfd507c4859be3f35d4e0d8eb57434bb4f6'
[2022-02-11T14:20:12] Result of conda info:
[2022-02-11T14:20:12] active environment : env-torch1.9
active env location : C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9
shell level : 2
user config file : C:\Users\a2ewmk.condarc
populated config files : C:\Users\a2ewmk.condarc
conda version : 4.11.0
conda-build version : 3.21.6
python version : 3.9.7.final.0
virtual packages : __cuda=10.2=0 __win=0=0 __archspec=1=x86_64
base environment : C:\Users\a2ewmk\Anaconda3 (writable)
conda av data dir : C:\Users\a2ewmk\Anaconda3\etc\conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/win-64
    https://repo.anaconda.com/pkgs/main/noarch
    https://repo.anaconda.com/pkgs/r/win-64
    https://repo.anaconda.com/pkgs/r/noarch
    https://repo.anaconda.com/pkgs/msys2/win-64
    https://repo.anaconda.com/pkgs/msys2/noarch
package cache : C:\Users\a2ewmk\Anaconda3\pkgs
    C:\Users\a2ewmk.conda\pkgs
    C:\Users\a2ewmk\AppData\Local\conda\conda\pkgs
envs directories : C:\Users\a2ewmk\Anaconda3\envs
    C:\Users\a2ewmk.conda\envs
    C:\Users\a2ewmk\AppData\Local\conda\conda\envs
platform : win-64
user-agent : conda/4.11.0 requests/2.26.0 CPython/3.9.7 Windows/10 Windows/10.0.18363
administrator : False
netrc file : None
offline mode : False

[2022-02-11T14:20:12] pytorch version: 1.9.0
[2022-02-11T14:20:12] transformers version: 4.16.2
Traceback (most recent call last):
File "run_text2qdmr.py", line 181, in main()
File "run_text2qdmr.py", line 123, in main train.main(train_config, distributed=args.distributed)
File "D:\GitInstalls\sparqling-queries\text2qdmr\commands\train.py", line 456, in main trainer.train(config, modeldir=args.logdir, tb_name=os.path.join('runs_train', args.name))
File "D:\GitInstalls\sparqling-queries\text2qdmr\commands\train.py", line 280, in train for batch in train_data_loader:
File "D:\GitInstalls\sparqling-queries\text2qdmr\commands\train.py", line 392, in _yield_batches_from_epochs for batch in loader:
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\site-packages\torch\utils\data\dataloader.py", line 354, in iter self._iterator = self._get_iterator()
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator return _MultiProcessingDataLoaderIter(self)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\site-packages\torch\utils\data\dataloader.py", line 918, in init w.start()
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\process.py", line 112, in start self._popen = self._Popen(self)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\context.py", line 223, in _Popen return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\context.py", line 322, in _Popen return Popen(process_obj)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\popen_spawn_win32.py", line 89, in init reduction.dump(process_obj, to_child)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'EncDecModel.Preproc.dataset..'

(env-torch1.9) D:\GitInstalls\sparqling-queries>Running with 1 GPU
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\spawn.py", line 105, in spawn_main exitcode = _main(fd)
File "C:\Users\a2ewmk\Anaconda3\envs\env-torch1.9\lib\multiprocessing\spawn.py", line 115, in _main self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
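The traceback fails inside popen_spawn_win32.py, i.e. while multiprocessing pickles the DataLoader worker process on Windows, where new processes are started with spawn rather than fork. A plausible reading, not confirmed by the repo authors, is that the dataset object holds a callable defined inside `EncDecModel.Preproc.dataset`, and pickle cannot serialize callables defined inside another function. The sketch below reproduces that behavior in isolation; every name in it (`make_local_callable`, `can_pickle`) is hypothetical and unrelated to the repo's code.

```python
import functools
import operator
import pickle


def make_local_callable():
    """Return a lambda defined inside a function (a "local object" to pickle)."""
    return lambda x: x + 1


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Depending on the Python version, a local object raises one or the other.
        return False


local_fn = make_local_callable()
# A partial over a module-level function is stored by importable name instead.
importable_fn = functools.partial(operator.add, 1)

print("local lambda picklable:", can_pickle(local_fn))
print("partial of operator.add picklable:", can_pickle(importable_fn))
```

If this is indeed the cause, two common ways around it are to create the PyTorch `DataLoader` with `num_workers=0`, so that no worker process has to be spawned, or to replace the local callable with a module-level function. Whether the text2qdmr configs expose the number of workers is an assumption to verify in the code.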



This repo is the implementation of the EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina and Anton Osokin.


Owner

Yandex Research
