Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Related tags

Deep Learning g2g-transformer

Overview

Graph-to-Graph Transformers

Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NLP) tasks, especially when combined with language-model pre-training, such as BERT.

We propose "Graph-to-Graph Transformer" and "Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement"(accepted to TACL) to generalize vanilla Transformer to encode graph structure, and builds the desired output graph.

Note : To use G2GTr model for transition-based dependency parsing, please refer to G2GTr repository.

Installation
Quick Start
Data Pre-processing and Initial Parser
Training
Evaluation
Predict Raw Sentences
Citations

Installation

Following packages should be included in your environment:

Python >= 3.7
PyTorch >= 1.4.0
Transformers(huggingface) = 2.4.1

The easier way is to run the following command:

conda env create -f environment.yml
conda activate rngtr

Quick Start

Graph-to-Graph Transformer architecture is general and can be applied to any NLP tasks which interacts with graphs. To use our implementation in your task, you just need to add BertGraphModel class to your code to encode both token-level and graph-level information. Here is a sample usage:

#Loading BertGraphModel and initialize it with available BERT models.
import torch
from parser.utils.graph import initialize_bertgraph,BertGraphModel
# inputing unlabelled graph with label size 5, and Layer Normalization of key
# you can load other BERT pre-trained models too.
encoder = initialize_bertgraph('bert-base-cased',layernorm_key=True,layernorm_value=False,
             input_label_graph=False,input_unlabel_graph=True,label_size=5)

#sample input
input = torch.tensor([[1,2],[3,4]])
graph = torch.tensor([ [[1,0],[0,1]],[[0,1],[1,0]] ])
graph_rel = torch.tensor([[0,1],[3,4]])
output = encoder(input_ids=input,graph_arc=graph,graph_rel=graph_rel)
print(output[0].shape)
## torch.Size([2, 2, 768])

# inputting labelled graph
encoder = initialize_bertgraph('bert-base-cased',layernorm_key=True,layernorm_value=False,
             input_label_graph=True,input_unlabel_graph=False,label_size=5)

#sample input
input = torch.tensor([[1,2],[3,4]])
graph = torch.tensor([ [[2,0],[0,3]],[[0,1],[4,0]] ])
output = encoder(input_ids=input,graph_arc=graph,)
print(output[0].shape)
## torch.Size([2, 2, 768])

If you just want to use BertGraphModel in your research, you can just import it from our repository:

from parser.utils.graph import BertGraphModel,BertGraphConfig
config = BertGraphConfig(YOUR-CONFIG)
config.add_graph_par(GRAPH-CONFIG)
encoder = BertGraphModel(config)

Data Pre-processing and Initial Parser

Dataset Preparation

We evaluated our model on UD Treebanks, English and Chinese Penn Treebanks, and CoNLL 2009 Shared Task. In following sections, we prepare datasets and their evaluation scripts.

Penn Treebanks

English Penn Treebank can be downloaded from english and chinese under LDC license. For English Penn Treebank, replace gold POS tags with Stanford POS tagger with following command in this repository:

bash scripts/postag.sh ${data_dir}/ptb3-wsj-[train|dev|dev.proj|test].conllx

CoNLL 2009 Treebanks

You can download Treebanks from here under LDC license. We use predicted POS tags provided by organizers.

UD Treebanks

You can find required Treebanks from here. (use version 2.3)

Initial Parser

As mentioned in our paper, you can use any initial parser to produce dependency graph. Here we use Biaffine Parser for Penn Treebanks, and German Corpus. We also apply our model to ouput prediction of UDify parser for UD Treebanks.
Biaffine Parser: To prepare biaffine initial parser, we use this repository to produce output predictions.
UDify Parser: For UD Treebanks, we use UDify repository to produce required initial dependency graph.
Alternatively, you can easily run the following command file to produce all required outputs:

bash job_scripts/udify_dataset.bash

Training

To train your own model, you can easily fill out the script in job_scripts directory, and run it. Here is the list of sample scripts:

Model	Script
Syntactic Transformer	`baseline.bash`
Any initial parser+RNGTr	`rngtr.bash`
Empty+RNGTr	`empty_rngtr.bash`

Evaluation

First you should download official scripts from UD, Penn Treebaks, and German. Then, run the following command:

bash job_scripts/predict.bash

To replicate refinement analysis and error analysis results, you should use MaltEval tools.

Predict Raw Sentences

You can also predict dependency graphs of raw texts with a pre-trained model by modifying predict.bash file. Just set input_type to raw. Then, put all your sentences in a .txt file, and the output will be in CoNNL format.

Citations

If you use this code for your research, please cite these works as:

@misc{mohammadshahi2020recursive,
      title={Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement}, 
      author={Alireza Mohammadshahi and James Henderson},
      year={2020},
      eprint={2003.13118},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@inproceedings{mohammadshahi-henderson-2020-graph,
    title = "Graph-to-Graph Transformer for Transition-based Dependency Parsing",
    author = "Mohammadshahi, Alireza  and
      Henderson, James",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.294",
    pages = "3278--3289",
    abstract = "We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependency parsing as strong baselines, we show that adding the proposed mechanisms for conditioning on and predicting graphs of Graph2Graph Transformer results in significant improvements, both with and without BERT pre-training. The novel baselines and their integration with Graph2Graph Transformer significantly outperform the state-of-the-art in traditional transition-based dependency parsing on both English Penn Treebank, and 13 languages of Universal Dependencies Treebanks. Graph2Graph Transformer can be integrated with many previous structured prediction methods, making it easy to apply to a wide range of NLP tasks.",
}

Have a question not listed here? Open a GitHub Issue or send us an email.

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Non-Autoregressive Transformer Code release for Non-Autoregressive Neural Machine Translation by Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K.

261 Nov 12, 2022

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

R2D2 This is the official code for paper titled "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Mode

49 Dec 17, 2022

This is a template for the Non-autoregressive Deep Learning-Based TTS model (in PyTorch).

Non-autoregressive Deep Learning-Based TTS Template This is a template for the Non-autoregressive TTS model. It contains Data Preprocessing Pipeline D

13 Dec 5, 2022

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

138 Oct 28, 2022

Pytorch implementation of Deep Recursive Residual Network for Super Resolution (DRRN)

DRRN-pytorch This is an unofficial implementation of "Deep Recursive Residual Network for Super Resolution (DRRN)", CVPR 2017 in Pytorch. [Paper] You

192 Dec 12, 2022

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

TalkNet 2 [WIP] TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Predictio

69 Dec 17, 2022

Comments

The graph adjacency matrix format for the model
Hi @alirezamshi , thanks very for sharing this work, very interesting indeed.

I've got a question on the exact labelled graph format when using your example input:

#sample input input = torch.tensor([[1,2],[3,4]]) graph = torch.tensor([ [[2,0],[0,3]],[[0,1],[4,0]] ]) output = encoder(input_ids=input,graph_arc=graph,)

As you can see above, the input sequence shape is (2,2), assuming: batch size is 2 and sentence length is 2; The corresponding graph shape you have is (2,2,2), assuming: batch size 2, seq_len * seq_len matrix with each value representing a specific relation.

I am little bit confused about this graph adjacency matrix format: does the 0 represents the relation indexed at 0 or simply no connection.... I cannot find the way how you distinguish between these two scenarios... how you gonna represent both no connection and the relation 0 in the same matrix.... (if you don't have a relation 0, then I guess the embedding layer in your graph model should have the padding_idx to be 0? So that the 0 position will always be 0 and not attended. Currently it seems that the None/0 will have embedding and will be updated each time - which behaves more like a connected relation)

It would be great if you could give me some information on this matter.

(Currently, in order to allow both no connect and relation type 0, I preserve a padding_idx which is also used for the construction of nn.Embedding layer for dp_relation_k and dp_relation_v: e.g. self.dp_relation_k = nn.Embedding(2*config.label_size+2,self.attention_head_size, padding_idx = 2*config.label_size+1) I am not sure if it is necessary to do so, or I simply misunderstand your code w.r.t the labelled graph)

Many thanks
opened by Punchwes 1

Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Related tags

Overview

Graph-to-Graph Transformers

Contents

Installation

Quick Start

Data Pre-processing and Initial Parser

Dataset Preparation

Penn Treebanks

CoNLL 2009 Treebanks

UD Treebanks

Initial Parser

Training

Evaluation

Predict Raw Sentences

Citations

You might also like...

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

This is a template for the Non-autoregressive Deep Learning-Based TTS model (in PyTorch).

The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Pytorch implementation of Deep Recursive Residual Network for Super Resolution (DRRN)

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

SlotRefine: A Fast Non-Autoregressive Model forJoint Intent Detection and Slot Filling

MaRS - a recursive filtering framework that allows for truly modular multi-sensor integration

Checking fibonacci - Generating the Fibonacci sequence is a classic recursive problem

Comments

The graph adjacency matrix format for the model

Owner

Idiap Research Institute

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Unoffical implementation about Image Super-Resolution via Iterative Refinement by Pytorch

RefineGNN - Iterative refinement graph neural network for antibody sequence-structure co-design (RefineGNN)

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Implementation of a protein autoregressive language model, but with autoregressive infilling objective (editing subsequences capability)

Official repository for "PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation"

Pytorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.