Graformer

The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Overview

Graformer (also named BridgeTransformer in the code) is a sequence-to-sequence model, mainly for neural machine translation. We improve multilingual translation by taking advantage of pre-trained (masked) language models: a pre-trained encoder (BERT) and a pre-trained decoder (GPT). The code is based on Fairseq.
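
To make the grafting idea concrete, here is a minimal, self-contained sketch. It is not the repository's Fairseq implementation: it reuses a pre-trained masked-LM encoder and a pre-trained causal-LM decoder from Hugging Face Transformers (illustrative placeholders for the released mBERT/mGPT), connects them with freshly initialised cross-attention, and leaves only those new blocks trainable.

    # Conceptual sketch of grafting, assuming Hugging Face Transformers and PyTorch
    # are installed. The model names are placeholders with matching hidden sizes
    # (768), not the mBERT/mGPT checkpoints released with this repository.
    from transformers import (BertModel, BertTokenizerFast,
                              GPT2LMHeadModel, GPT2TokenizerFast)

    encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
    # add_cross_attention=True inserts randomly initialised cross-attention blocks
    # into the decoder; they play the role of the grafting layers in this sketch.
    decoder = GPT2LMHeadModel.from_pretrained("gpt2", add_cross_attention=True)

    # Freeze everything that came from pre-training; train only the new blocks.
    for p in encoder.parameters():
        p.requires_grad = False
    for name, p in decoder.named_parameters():
        p.requires_grad = "cross" in name   # crossattention + ln_cross_attn layers

    src_tok = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
    tgt_tok = GPT2TokenizerFast.from_pretrained("gpt2")
    src = src_tok("Ein Beispielsatz.", return_tensors="pt")
    tgt = tgt_tok("An example sentence.", return_tensors="pt")

    enc_out = encoder(**src).last_hidden_state          # (1, src_len, 768)
    out = decoder(input_ids=tgt.input_ids,
                  encoder_hidden_states=enc_out,
                  encoder_attention_mask=src.attention_mask,
                  labels=tgt.input_ids)                 # teacher-forced LM loss
    out.loss.backward()   # only the newly added layers accumulate gradients

In the repository itself the bridging is implemented inside Fairseq as the bridge_transformer architecture (see the fine-tuning command quoted under Comments below), not with Hugging Face modules.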

Examples

You can start with run/run.sh after some minor modifications. The scripts it calls correspond to the following steps (a sketch of the overall order follows the list):

train a pre-trained BERT:
    run_arnold_multilingual_masked_lm_6e6d.sh

train a pre-trained GPT:
    run_arnold_multilingual_lm_6e6d.sh

train a Graformer:
    run_arnold_multilingual_graft_transformer_12e12d_ted.sh

inference from Graformer:
    run_arnold_multilingual_graft_inference_ted.sh
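
If it helps to see the whole pipeline in one place, here is a minimal orchestration sketch. It only strings the four scripts together in their assumed order, assumes they live under run/ (as run/run.sh suggests), and leaves all arguments and paths to the scripts themselves.

    # Assumed end-to-end order: pre-train both LMs, graft them, then decode.
    import subprocess

    steps = [
        "run/run_arnold_multilingual_masked_lm_6e6d.sh",                # 1) pre-train the multilingual BERT
        "run/run_arnold_multilingual_lm_6e6d.sh",                       # 2) pre-train the multilingual GPT
        "run/run_arnold_multilingual_graft_transformer_12e12d_ted.sh",  # 3) train the Graformer
        "run/run_arnold_multilingual_graft_inference_ted.sh",           # 4) run inference
    ]
    for script in steps:
        subprocess.run(["bash", script], check=True)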
    

Released Models

We release our pre-trained mBERT and mGPT, along with the trained Graformer model, here.

Tensorflow Version

We will also provide a TensorFlow version in NeurST, a popular toolkit for sequence processing.

Citation

Please cite as:

@inproceedings{sun2021multilingual,
    title = "Multilingual Translation via Grafting Pre-trained Language Models",
    author = "Sun, Zewei and Wang, Mingxuan and Li, Lei",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    year = "2021"
}

Contact

If you have any questions, please feel free to contact me: [email protected]

Comments
  • Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match

    What is your question?

    After pre-training the masked lm and the lm following the code in the github repo, I am trying to fuse them and fine-tune them together. However, I am getting these error/exception messages.

    RuntimeError: Error(s) in loading state_dict for BridgeTransformerModel:
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
        size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
        size mismatch for decoder.lm_output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).

    Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match

    Code

    The fine-tuning command:

        python3 Graformer/train.py data-bin-ar-en/ \
            --task translation_multi_simple_epoch \
            --langs 'ar,en' --lang-pairs 'ar-en' \
            --decoder-langtok --lang-tok-replacing-bos-eos \
            --arch bridge_transformer --encoder-layers 12 --decoder-layers 12 \
            --no-encoder-attn-layers 0,1,2,3,4,5 \
            --encoder-learned-pos --decoder-learned-pos --no-scale-embedding \
            --encoder-normalize-before --decoder-normalize-before --activation-fn gelu \
            --finetune-from-model masked_lm_checkpoints/checkpoint_last.pt,lm_checkpoints/checkpoint_last.pt \
            --freeze-params "(.embed.)|(.layers\.(0|1|2|3|4|5)\..)|(.layers\.6\.self_attn_layer_norm.)" \
            --transfer-params "encoder.layer_norm.weight:encoder.layers.6.self_attn_layer_norm.weight,decoder.layer_norm.weight:decoder.layers.6.self_attn_layer_norm.weight,encoder.layer_norm.bias:encoder.layers.6.self_attn_layer_norm.bias,decoder.layer_norm.bias:decoder.layers.6.self_attn_layer_norm.bias,decoder.embed_tokens.weight:decoder.lm_output_projection.weight,decoder.layer_norm.weight:decoder.lm_layer_norm.weight,decoder.layer_norm.bias:decoder.lm_layer_norm.bias" \
            --lm-fusion --max-epoch 100 --max-tokens 16000 \
            --optimizer adam --adam-betas '(0.9,0.98)' --lr 0.001 --warmup-updates 2500 \
            --update-freq 5 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
            --dropout 0.1 --save-interval 5 --keep-interval-updates 5 --keep-best-checkpoints 1 \
            --save-dir grafted-transformer-checkpoints --fp16 --disable-validation \
            --ddp-backend=no_c10d

    Note: The dictionary I pre-trained the models with is not exactly 64k in length.
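
    One way to see exactly where the 63999-vs-64000 gap originates is to compare the embedding size stored in each checkpoint with the vocabulary size implied by the binarized dictionary (fairseq's Dictionary prepends four special symbols, and the multilingual task can append language tokens on top of that). The snippet below is only a diagnostic sketch, not code from the repository; the dictionary path is a placeholder for whatever lives in data-bin-ar-en/.

        # Diagnostic sketch: compare checkpoint embedding rows with dictionary size.
        import torch

        def embed_rows(ckpt_path, key="decoder.embed_tokens.weight"):
            # fairseq checkpoints store the model weights under the "model" key.
            state = torch.load(ckpt_path, map_location="cpu")
            return state["model"][key].shape[0]

        def dict_size(dict_path):
            # dict.txt holds one symbol per line; fairseq's Dictionary then prepends
            # 4 specials (<s>, <pad>, </s>, <unk>); multilingual tasks may append
            # further language tokens on top of this count.
            with open(dict_path, encoding="utf-8") as f:
                return sum(1 for _ in f) + 4

        print("rows in LM checkpoint:", embed_rows("lm_checkpoints/checkpoint_last.pt"))
        print("dict.txt + 4 specials:", dict_size("data-bin-ar-en/dict.en.txt"))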

    What's your environment?

    PyTorch Version: 1.11.0
    OS (e.g., Linux): Linux
    Python version: 3.8.10
    GPU models and configuration: NVIDIA-SMI 470.103.01, Driver Version 470.103.01, CUDA Version 11.4

    Note: I am working on only 1 GPU

    question · opened by salma-elshafey