SPRING is a seq2seq model for Text-to-AMR and AMR-to-Text (AAAI2021).

Overview


This is the repo for SPRING (Symmetric ParsIng aNd Generation), a novel approach to semantic parsing and generation, presented at AAAI 2021.

With SPRING you can perform both state-of-the-art Text-to-AMR parsing and AMR-to-Text generation without many cumbersome external components. If you use the code, please reference this work in your paper:

@inproceedings{bevilacqua-etal-2021-one,
    title = {One {SPRING} to Rule Them Both: {S}ymmetric {AMR} Semantic Parsing and Generation without a Complex Pipeline},
    author = {Bevilacqua, Michele and Blloshmi, Rexhina and Navigli, Roberto},
    booktitle = {Proceedings of AAAI},
    year = {2021}
}

Pretrained Checkpoints

Here we release our best SPRING models, which are based on the DFS linearization.

Text-to-AMR Parsing

AMR-to-Text Generation

If you need the checkpoints of other experiments in the paper, please send us an email.
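
For example, the AMR 2.0 parsing checkpoints referenced in the issues below can be downloaded and unpacked as follows (the URL is taken from those reports; adjust for the checkpoint you need):

wget http://nlp.uniroma1.it/AMR/AMR2.parsing-1.0.tar.bz2
tar -xjf AMR2.parsing-1.0.tar.bz2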

Installation

cd spring
pip install -r requirements.txt
pip install -e .

The code only works with transformers < 3.0 because of a breaking change in positional embeddings. The code works fine with torch 1.5. We recommend using a fresh conda environment.
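
For example, a fresh environment can be set up along these lines (the Python version pin is an assumption; requirements.txt remains the authoritative source for dependency versions):

conda create -n spring python=3.8
conda activate spring
pip install torch==1.5.0

then run the pip commands above inside this environment.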

Train

Modify config.yaml in configs. Instructions are provided as comments within the file. Also see the appendix of the paper.

Text-to-AMR

python bin/train.py --config configs/config.yaml --direction amr

Results are saved in runs/

AMR-to-Text

python bin/train.py --config configs/config.yaml --direction text

Results are saved in runs/

Evaluate

Text-to-AMR

python bin/predict_amrs.py \
    --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
    --gold-path data/tmp/amr2.0/gold.amr.txt \
    --pred-path data/tmp/amr2.0/pred.amr.txt \
    --checkpoint runs/<checkpoint>.pt \
    --beam-size 5 \
    --batch-size 500 \
    --device cuda \
    --penman-linearization --use-pointer-tokens

gold.amr.txt and pred.amr.txt will contain the concatenated gold graphs and the predictions, respectively.

To reproduce our paper's results, you will also need to run the BLINK entity linking system on the prediction file (data/tmp/amr2.0/pred.amr.txt in the previous code snippet). To do so, you will need to install BLINK and download its models:

git clone https://github.com/facebookresearch/BLINK.git
cd BLINK
pip install -r requirements.txt
sh download_blink_models.sh
cd models
wget http://dl.fbaipublicfiles.com/BLINK//faiss_flat_index.pkl
cd ../..

Then, you will be able to launch the blinkify.py script:

python bin/blinkify.py \
    --datasets data/tmp/amr2.0/pred.amr.txt \
    --out data/tmp/amr2.0/pred.amr.blinkified.txt \
    --device cuda \
    --blink-models-dir BLINK/models

To obtain comparable Smatch scores, you will also need to use the scripts available at https://github.com/mdtux89/amr-evaluation, which produce results roughly 0.3 Smatch points lower than those returned by bin/predict_amrs.py.
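
A minimal sketch of that step, assuming the evaluation.sh entry point of that repository and that it is cloned next to the prediction files (run it on the blinkified predictions):

git clone https://github.com/mdtux89/amr-evaluation.git
cd amr-evaluation
./evaluation.sh ../data/tmp/amr2.0/pred.amr.blinkified.txt ../data/tmp/amr2.0/gold.amr.txt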

AMR-to-Text

python bin/predict_sentences.py \
    --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
    --gold-path data/tmp/amr2.0/gold.text.txt \
    --pred-path data/tmp/amr2.0/pred.text.txt \
    --checkpoint runs/<checkpoint>.pt \
    --beam-size 5 \
    --batch-size 500 \
    --device cuda \
    --penman-linearization --use-pointer-tokens

gold.text.txt and pred.text.txt will contain the concatenated gold sentences and the predictions, respectively. For BLEU, chrF++, and METEOR scores to be comparable, you will need to tokenize both gold and predictions using the JAMR tokenizer. To compute BLEU and chrF++, please use bin/eval_bleu.py. For METEOR, use https://www.cs.cmu.edu/~alavie/METEOR/. For BLEURT, do not tokenize, and run the evaluation with https://github.com/google-research/bleurt. Also see the appendix.
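
A hedged sketch of the BLEURT step (the score_files module, flag names, and checkpoint name are assumptions based on the BLEURT README; point it at the untokenized files):

python -m bleurt.score_files \
    -candidate_file=data/tmp/amr2.0/pred.text.txt \
    -reference_file=data/tmp/amr2.0/gold.text.txt \
    -bleurt_checkpoint=bleurt/bleurt-base-128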

Linearizations

The previously shown commands assume the use of the DFS-based linearization. To use BFS or PENMAN for training, uncomment the relevant lines in configs/config.yaml. For the evaluation scripts, replace the --penman-linearization --use-pointer-tokens flags with --use-pointer-tokens for BFS, or with --penman-linearization for PENMAN.
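
For example, for evaluation, only the final flags change (all other arguments as in the commands above):

python bin/predict_amrs.py ... --use-pointer-tokens        # BFS
python bin/predict_amrs.py ... --penman-linearization      # PENMAN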

License

This project is released under the CC-BY-NC-SA 4.0 license (see LICENSE). If you use SPRING, please include a link to this repo.

Acknowledgements

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 and the ELEXIS project No. 731015 under the European Union’s Horizon 2020 research and innovation programme.

This work was supported in part by the MIUR under the grant "Dipartimenti di eccellenza 2018-2022" of the Department of Computer Science of the Sapienza University of Rome.

Comments
  • Bug in evaluate

    Hi, when I run

        python bin/predict_amrs.py \
            --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
            --gold-path data/tmp/amr2.0/gold.amr.txt \
            --pred-path data/tmp/amr2.0/pred.amr.txt \
            --checkpoint runs/<checkpoint>.pt \
            --beam-size 5 \
            --batch-size 500 \
            --device cuda \
            --penman-linearization --use-pointer-tokens

    I run into a problem:

    RuntimeError: Error(s) in loading state_dict for AMRBartForConditionalGeneration: size mismatch for final_logits_bias: copying a param with shape torch.Size([1, 53587]) from checkpoint, the shape in current model is torch.Size([1, 53075])

    could you help me?

    opened by Wangpeiyi9979 15
  • Parameter mismatch in predict_amrs_from_plaintext.py

    size mismatch for final_logits_bias: copying a param with shape torch.Size([1, 53587]) from checkpoint, the shape in current model is torch.Size([1, 53075]).
    size mismatch for model.shared.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
    size mismatch for model.encoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
    size mismatch for model.decoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024])

    When I used predict_amrs_from_plaintext.py on a customized text file, I got a mismatch between the model config and the model checkpoint. Specifically, I used AMR3.parsing.pt as the model checkpoint and bart-large as the model config. This is the text-to-graph direction. Can you help me resolve this issue?

    opened by xu1998hz 3
  • Question about the changes in AMRBartForConditionalGeneration

    I see that you have a custom modeling_bart.py file with AMRBartForConditionalGeneration. It looks like most of the changes revolve around adding "backreferences" but it's difficult to tell since I'm not sure what version of transformers that file originally came from (do you happen to know this?)

    • Is there anywhere that explains what's being changed in this class (conceptually)? I didn't see these modifications explained in your paper "One SPRING to Rule Them Both". Are they detailed in another place?

    My goal is to upgrade to the latest transformers library for compatibility reasons. This repo trains correctly if I switch to the newest BartForConditionalGeneration from the transformers library, but I'm getting Smatch scores about 2 points lower than in your paper. Is this consistent with what you've seen (i.e., does the custom AMRBartForConditionalGeneration improve scores by about 2 points)?

    opened by bjascob 3
  • Question about predicting sentences in predict_amrs_from_plaintext.py

    Hi, First of all, thank you for the great work!

    I have a question about predicting sentences in predict_amrs_from_plaintext.py (from L123 to L139).

    Is that code the (plain-text) AMR-to-Text generation part?

    Thanks.

    opened by cws7777 3
  • Error when loading checkpoint.

    RuntimeError: Error(s) in loading state_dict for AMRBartForConditionalGeneration: Unexpected key(s) in state_dict: "model.encoder.embed_backreferences.weight", "model.encoder.embed_backreferences.transform.weight", "model.encoder.embed_backreferences.transform.bias", "model.decoder.embed_backreferences.weight", "model.decoder.embed_backreferences.transform.weight", "model.decoder.embed_backreferences.transform.bias".

    When running:

    python bin/predict_amrs.py \
        --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
        --gold-path data/tmp/amr2.0/gold.amr.txt \
        --pred-path data/tmp/amr2.0/pred.amr.txt \
        --checkpoint runs/<checkpoint>.pt \
        --beam-size 5 \
        --batch-size 500 \
        --device cuda \
        --penman-linearization --use-pointer-tokens
    

    With the http://nlp.uniroma1.it/AMR/AMR2.parsing-1.0.tar.bz2 checkpoint (AMR2.amr-lin3.pt).

    Can those keys be ignored from the checkpoint?

    opened by mrdrozdov 3
  • Size mismatch when loading state dict

    Thanks for this amazing work!

    I tried running the predict_amrs_from_plaintext.py script but came across a runtime error. It occurred when loading the state_dict of the checkpoint you released for AMR 3.0 for AMRBartForConditionalGeneration. I saw that you suggested that transformers version < 3 should be used. I experimented with version 2.11.0, as suggested in your requirements.txt file, and also with version 2.8.0, but the problem persisted. tokenizers is at version 2.7.0. I was wondering if you have any idea what the reason might be, and how I can fix the problem? Many thanks!

    I've attached the full error message below:

    RuntimeError: Error(s) in loading state_dict for AMRBartForConditionalGeneration:
    size mismatch for final_logits_bias: copying a param with shape torch.Size([1, 53587]) from checkpoint, the shape in current model is torch.Size([1, 53075]).
    size mismatch for model.shared.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
    size mismatch for model.encoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).
    size mismatch for model.decoder.embed_tokens.weight: copying a param with shape torch.Size([53587, 1024]) from checkpoint, the shape in current model is torch.Size([53075, 1024]).

    opened by HuiyuanXie 2
  • Preprocessing for Spanish?

    Hi, I am trying to fine-tune SPRING on a small set of annotated Spanish AMRs to see how its cross-lingual parsing performance is. It seems the pre-processing is not working for Spanish (understandably), but we can't find where the code calls e.g. spaCy for English preprocessing, which we could replace with something for Spanish. Can you help guide us to where this might be, or offer general clarification? Thanks!

    opened by luciaelizabeth 2
  • Cannot regenerate the same output

    Hi,

    I am doing the Text-to-AMR task: the input is a text and the output is an AMR. The beam size I used is 1, but when I generate twice, the output AMRs are not the same. How can I make the output AMR the same for each generation? Can I set something like a seed to control that? Many thanks.

    opened by 14H034160212 2
  • code question

    Hi, thanks for your nice work. I read the code and have a naive question: what is the role of po_logits? Why not just use lm_logits, the same as the original BART?

    opened by Wangpeiyi9979 2
  • What is the input format of a text file?

    Hi, thanks for the work. I would like to use the existing text-to-AMR tool like:

        python bin/predict_amrs.py \
            --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
            --gold-path data/tmp/amr2.0/gold.amr.txt \
            --pred-path data/tmp/amr2.0/pred.amr.txt \
            --checkpoint runs/<checkpoint>.pt \
            --beam-size 5 \
            --batch-size 500 \
            --device cuda \
            --penman-linearization --use-pointer-tokens
    

    Could you give me an example of the <AMR-ROOT>/data/amrs/split/test/*.txt files?

    I have tried a plain text file, but it reported an AssertionError here.

    I do not have an LDC license, and just want to use this tool for my project. Thank you.

    opened by zhaozj89 2
  • Reproduce AMR2Text results

    Thanks for your nice work! I ran into a few issues when trying to reproduce the AMR2Text results on AMR2.0.

    1. I tried to run the following command using the default config (DFS)
    python bin/train.py --config configs/config.yaml --direction text
    

    but got a BLEU score of 41.78, which is lower than the result (45.3) reported in your paper.

    2. I also tried to predict using the released checkpoint AMR2.generation.pt as follows:
    python bin/predict_sentences.py \
        --datasets <AMR-ROOT>/data/amrs/split/test/*.txt \
        --gold-path data/tmp/amr2.0/gold.text.txt \
        --pred-path data/tmp/amr2.0/pred.text.txt \
        --checkpoint runs/AMR2.generation.pt \
        --beam-size 5 \
        --batch-size 500 \
        --device cuda \
        --penman-linearization --use-pointer-tokens
    

    but only got a BLEU score of 42.3.

    I have no idea what is going wrong; could anyone give me some suggestions?
    My virtual environment is available here.

    opened by goodbai-nlp 2
  • What was the disruptive change in terms of positional embeddings?

    It was a pain to install the requested version of transformers (well, actually its tokenizers==0.7.0 dependency) on our cluster. So I am hoping to contribute a fix to make the library compatible with recent transformers versions. Can you give a bit more information about the issues you experienced and what the problem is?

    opened by BramVanroy 0
  • Hyperparameters not the same in paper and config

    Thank you for open-sourcing your repo! I am trying to reproduce your results but have had difficulty reaching the same scores. I then found that the hyperparameters in the config are not the same as those discussed in the paper's appendix. Specifically, you mention a beam size of 5 in the paper, but the config has 1. Could you please clarify which of these is correct?

    I also find that there is a warmup_steps of 1, which seems out of place and a very uncommon value. Can you confirm that this is indeed correct?

    opened by BramVanroy 5