Graformer

The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Overview

Graformer (also named BridgeTransformer in the code) is a sequence-to-sequence model, mainly for neural machine translation. We improve multilingual translation by taking advantage of pre-trained (masked) language models: a pre-trained encoder (BERT) and a pre-trained decoder (GPT). The code is based on Fairseq.
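
To make the grafting idea concrete, here is a minimal, self-contained sketch. It is not the repository's Fairseq implementation: it reuses a pre-trained masked-LM encoder and a pre-trained causal-LM decoder from Hugging Face Transformers (illustrative placeholders for the released mBERT/mGPT), connects them with freshly initialised cross-attention, and leaves only those new blocks trainable.

    # Conceptual sketch of grafting, assuming Hugging Face Transformers and PyTorch
    # are installed. The model names are placeholders with matching hidden sizes
    # (768), not the mBERT/mGPT checkpoints released with this repository.
    from transformers import (BertModel, BertTokenizerFast,
                              GPT2LMHeadModel, GPT2TokenizerFast)

    encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
    # add_cross_attention=True inserts randomly initialised cross-attention blocks
    # into the decoder; they play the role of the grafting layers in this sketch.
    decoder = GPT2LMHeadModel.from_pretrained("gpt2", add_cross_attention=True)

    # Freeze everything that came from pre-training; train only the new blocks.
    for p in encoder.parameters():
        p.requires_grad = False
    for name, p in decoder.named_parameters():
        p.requires_grad = "cross" in name   # crossattention + ln_cross_attn layers

    src_tok = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
    tgt_tok = GPT2TokenizerFast.from_pretrained("gpt2")
    src = src_tok("Ein Beispielsatz.", return_tensors="pt")
    tgt = tgt_tok("An example sentence.", return_tensors="pt")

    enc_out = encoder(**src).last_hidden_state          # (1, src_len, 768)
    out = decoder(input_ids=tgt.input_ids,
                  encoder_hidden_states=enc_out,
                  encoder_attention_mask=src.attention_mask,
                  labels=tgt.input_ids)                 # teacher-forced LM loss
    out.loss.backward()   # only the newly added layers accumulate gradients

In the repository itself the bridging is implemented inside Fairseq as the bridge_transformer architecture (see the fine-tuning command quoted under Comments below), not with Hugging Face modules.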

Examples

You can start with run/run.sh after some minor modifications. The scripts it calls correspond to the following steps (a sketch of the overall order follows the list):

train a pre-trained BERT:
    run_arnold_multilingual_masked_lm_6e6d.sh

train a pre-trained GPT:
    run_arnold_multilingual_lm_6e6d.sh

train a Graformer:
    run_arnold_multilingual_graft_transformer_12e12d_ted.sh

inference from Graformer:
    run_arnold_multilingual_graft_inference_ted.sh
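
If it helps to see the whole pipeline in one place, here is a minimal orchestration sketch. It only strings the four scripts together in their assumed order, assumes they live under run/ (as run/run.sh suggests), and leaves all arguments and paths to the scripts themselves.

    # Assumed end-to-end order: pre-train both LMs, graft them, then decode.
    import subprocess

    steps = [
        "run/run_arnold_multilingual_masked_lm_6e6d.sh",                # 1) pre-train the multilingual BERT
        "run/run_arnold_multilingual_lm_6e6d.sh",                       # 2) pre-train the multilingual GPT
        "run/run_arnold_multilingual_graft_transformer_12e12d_ted.sh",  # 3) train the Graformer
        "run/run_arnold_multilingual_graft_inference_ted.sh",           # 4) run inference
    ]
    for script in steps:
        subprocess.run(["bash", script], check=True)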
    

Released Models

We release our pre-trained mBERT and mGPT, along with the trained Graformer model, here.

Tensorflow Version

We will also provide a TensorFlow version in NeurST, a popular toolkit for sequence processing.

Citation

Please cite as:

@inproceedings{sun2021multilingual,
    title = "Multilingual Translation via Grafting Pre-trained Language Models",
    author = "Sun, Zewei and Wang, Mingxuan and Li, Lei",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    year = "2021"
}

Contact

If you have any questions, please feel free to contact me: [email protected]

Comments
  • Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match

    What is your question?

    After pre-training the masked lm and the lm following the code in the github repo, I am trying to fuse them and fine-tune them together. However, I am getting these error/exception messages.

    RuntimeError: Error(s) in loading state_dict for BridgeTransformerModel:
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
        size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
        size mismatch for decoder.lm_output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).

    Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match

    Code

    The fine-tuning command:

        python3 Graformer/train.py data-bin-ar-en/ \
            --task translation_multi_simple_epoch \
            --langs 'ar,en' --lang-pairs 'ar-en' \
            --decoder-langtok --lang-tok-replacing-bos-eos \
            --arch bridge_transformer --encoder-layers 12 --decoder-layers 12 \
            --no-encoder-attn-layers 0,1,2,3,4,5 \
            --encoder-learned-pos --decoder-learned-pos --no-scale-embedding \
            --encoder-normalize-before --decoder-normalize-before --activation-fn gelu \
            --finetune-from-model masked_lm_checkpoints/checkpoint_last.pt,lm_checkpoints/checkpoint_last.pt \
            --freeze-params "(.embed.)|(.layers\.(0|1|2|3|4|5)\..)|(.layers\.6\.self_attn_layer_norm.)" \
            --transfer-params "encoder.layer_norm.weight:encoder.layers.6.self_attn_layer_norm.weight,decoder.layer_norm.weight:decoder.layers.6.self_attn_layer_norm.weight,encoder.layer_norm.bias:encoder.layers.6.self_attn_layer_norm.bias,decoder.layer_norm.bias:decoder.layers.6.self_attn_layer_norm.bias,decoder.embed_tokens.weight:decoder.lm_output_projection.weight,decoder.layer_norm.weight:decoder.lm_layer_norm.weight,decoder.layer_norm.bias:decoder.lm_layer_norm.bias" \
            --lm-fusion --max-epoch 100 --max-tokens 16000 \
            --optimizer adam --adam-betas '(0.9,0.98)' --lr 0.001 --warmup-updates 2500 \
            --update-freq 5 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
            --dropout 0.1 --save-interval 5 --keep-interval-updates 5 --keep-best-checkpoints 1 \
            --save-dir grafted-transformer-checkpoints --fp16 --disable-validation \
            --ddp-backend=no_c10d

    Note: The dictionary I pre-trained the models with is not exactly 64k in length.
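
    One way to see exactly where the 63999-vs-64000 gap originates is to compare the embedding size stored in each checkpoint with the vocabulary size implied by the binarized dictionary (fairseq's Dictionary prepends four special symbols, and the multilingual task can append language tokens on top of that). The snippet below is only a diagnostic sketch, not code from the repository; the dictionary path is a placeholder for whatever lives in data-bin-ar-en/.

        # Diagnostic sketch: compare checkpoint embedding rows with dictionary size.
        import torch

        def embed_rows(ckpt_path, key="decoder.embed_tokens.weight"):
            # fairseq checkpoints store the model weights under the "model" key.
            state = torch.load(ckpt_path, map_location="cpu")
            return state["model"][key].shape[0]

        def dict_size(dict_path):
            # dict.txt holds one symbol per line; fairseq's Dictionary then prepends
            # 4 specials (<s>, <pad>, </s>, <unk>); multilingual tasks may append
            # further language tokens on top of this count.
            with open(dict_path, encoding="utf-8") as f:
                return sum(1 for _ in f) + 4

        print("rows in LM checkpoint:", embed_rows("lm_checkpoints/checkpoint_last.pt"))
        print("dict.txt + 4 specials:", dict_size("data-bin-ar-en/dict.en.txt"))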

    What's your environment?

    PyTorch Version: 1.11.0
    OS (e.g., Linux): Linux
    Python version: 3.8.10
    GPU models and configuration: NVIDIA-SMI 470.103.01, Driver Version 470.103.01, CUDA Version 11.4

    Note: I am working on only 1 GPU

    question · opened by salma-elshafey