Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Last update: Jan 9, 2023

Related tags

Deep Learning GLAT

Overview

GLAT

Implementation for the ACL2021 paper "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Requirements

Python >= 3.7
Pytorch >= 1.5.0
Fairseq 1.0.0a0

Preparation

Train an autoregressive Transformer according to the instructions in Fairseq.

Use the trained autoregressive Transformer to generate target sentences for the training set.

Binarize the distilled training data.

input_dir=path_to_raw_text_data
data_dir=path_to_binarized_output
src=source_language
tgt=target_language
python3 fairseq_cli/preprocess.py --source-lang ${src} --target-lang ${tgt} --trainpref ${input_dir}/train \
    --validpref ${input_dir}/valid --testpref ${input_dir}/test --destdir ${data_dir}/ \
    --workers 32 --src-dict ${input_dir}/dict.${src}.txt --tgt-dict {input_dir}/dict.${tgt}.txt

Train

save_path=path_for_saving_models
python3 train.py ${data_dir} --arch glat --noise full_mask --share-all-embeddings \
    --criterion glat_loss --label-smoothing 0.1 --lr 5e-4 --warmup-init-lr 1e-7 --stop-min-lr 1e-9 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 --optimizer adam --adam-betas '(0.9, 0.999)' \
    --adam-eps 1e-6 --task translation_lev_modified --max-tokens 8192 --weight-decay 0.01 --dropout 0.1 \
    --encoder-layers 6 --encoder-embed-dim 512 --decoder-layers 6 --decoder-embed-dim 512 --fp16 \
    --max-source-positions 1000 --max-target-positions 1000 --max-update 300000 --seed 0 --clip-norm 5\
    --save-dir ${save_path} --src-embedding-copy --pred-length-offset --log-interval 1000 \
    --eval-bleu --eval-bleu-args '{"iter_decode_max_iter": 0, "iter_decode_with_beam": 1}' \
    --eval-tokenized-bleu --eval-bleu-remove-bpe --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric --decoder-learned-pos --encoder-learned-pos \
    --apply-bert-init --activation-fn gelu --user-dir glat_plugins \

Inference

checkpoint_path=path_to_your_checkpoint
python3 fairseq_cli/generate.py ${data_dir} --path ${checkpoint_path} --user-dir glat_plugins \
    --task translation_lev_modified --remove-bpe --max-sentences 20 --source-lang ${src} --target-lang ${tgt} \
    --quiet --iter-decode-max-iter 0 --iter-decode-eos-penalty 0 --iter-decode-with-beam 1 --gen-subset test

The script for averaging checkpoints is scripts/average_checkpoints.py

Comments

Inference with NPD. Issue: AttributeError: 'dict' object has no attribute '_get_node_flag'

hi, folks

I triggered an issue,

when I tried to set '--iter-decode-with-external-reranker ' and '--path checkpoints/nat.pt:at_checkpoints/at.pt '

, do inference with NPD.

Did you meet similar issues,

and do you have any ideas?

It could help me a lot.

Thanks.

opened by chuanmingliu 1
Training Speed

Hi, thanks for your awesome work!

When I use the command line here (https://github.com/FLC777/GLAT#train) to train the model on 8*V100 GPUs, the time cost of each epoch increases rapidly (epoch1 10min -> epoch 50 120min). Is there something wrong with my command? I wonder what I am supposed to do.

Thanks very much! hemingkx

opened by hemingkx 1

Which part of your code copies the encoder hidden when glancing?

Hi, I read your paper and code with great interest. You said in the paper that during glancing, the sampled tokens are replaced with the embeddings from decoder, while the unsampled tokens use encoder output. However, in the code, the unglanced tokens are still masked. Am I missing something?

       glat_info = None
        if glat and tgt_tokens is not None:
            with torch.no_grad():
                with torch_seed(rand_seed):
                    word_ins_out = self.decoder(
                        normalize=False,
                        prev_output_tokens=prev_output_tokens,
                        encoder_out=encoder_out,
                    )
                pred_tokens = word_ins_out.argmax(-1)
                same_num = ((pred_tokens == tgt_tokens) & nonpad_positions).sum(1)
                input_mask = torch.ones_like(nonpad_positions)
                bsz, seq_len = tgt_tokens.size()
                for li in range(bsz):
                    target_num = (((seq_lens[li] - same_num[li].sum()).float()) * glat['context_p']).long()
                    if target_num > 0:
                        input_mask[li].scatter_(dim=0, index=torch.randperm(seq_lens[li])[:target_num].cuda(), value=0)
                input_mask = input_mask.eq(1)
                input_mask = input_mask.masked_fill(~nonpad_positions,False)
                glat_prev_output_tokens = prev_output_tokens.masked_fill(~input_mask, 0) + tgt_tokens.masked_fill(input_mask, 0)
                # this line here
                
                glat_tgt_tokens = tgt_tokens.masked_fill(~input_mask, self.pad)

                prev_output_tokens, tgt_tokens = glat_prev_output_tokens, glat_tgt_tokens

                glat_info = {
                    "glat_accu": (same_num.sum() / seq_lens.sum()).item(),
                    "glat_context_p": glat['context_p'],
                }

opened by zkx06111 1

About Data Set Selection

Hello, author, while reviewing your code, I found that you did not specify which dataset to use. I read your paper, such as WMT4en de, and downloaded it. However, as for the src dict and tgt dict parameters in preprocess.py, I did not find dict.en.txt and dict.de.txt in the dataset. I dare to bother you, but I hope you can provide a dataset.

opened by Yonghao-Li 1
KeyError: 'bleu'

@FLC777 Hello, thank you very much for the open source. When I run the glat_CTC script, the model can run and save, but the following errors are always reported from time to time, and then the program stops and kills:

opened by Andrewlesson 0
cannot reproduce results on wmt14 ende distill

I follow the instructions but I cannot reproduce the results on wmt14 ende. With the provided generation script I only got BLEU4 = 16.65, 48.6/22.0/11.4/6.3 (BP=1.000, ratio=1.014, syslen=65432, reflen=64506) After re-ranking with an AT model, I got BLEU4 = 19.93, 52.2/25.6/14.2/8.3 (BP=1.000, ratio=1.013, syslen=65313, reflen=64506) Still far from 26.55 as reported in the paper. I use the provided training script and train the model on 8 V100 GPUs. The training log shows that the glat accuracy is pretty high at the end of training. epoch 151: 1050 / 1993 loss=3.323, nll_loss=1.371, glat_accu=0.758, glat_keep=0.073, glat_context_p=0.3, word_ins=3.162, length=3.223, ppl=10.01, wps=132588, ups=2.16, wpb=61302.6, bsz=2018.2, num_updates=300000, lr=5.7735e-05, gnorm=1.455, clip=0, train_wall=46, gb_free=6.8, wall=139238 But the valid loss is also very high. valid | epoch 151 | valid on 'valid' subset | loss 7.412 | nll_loss 5.888 | glat_accu 0 | glat_keep 0 | glat_context_p 0 | word_ins 7.252 | length 3.209 | ppl 170.31 | wps 242820 | wpb 41551 | bsz 1500 | num_updates 300000 | best_loss 6.511

opened by fernando9torres 0
Problem of inference speed

I test the inference speed compared autoregressive model with GLAT model, It is no more than 2x speed up actually. There is about 15.3x speed up in your paper (Table 1), pls provide inference scripts. thanks

opened by zhajiahe 0

Owner

GitHub

ALBERT-pytorch-implementation - ALBERT pytorch implementation

ALBERT-pytorch-implementation developing... 모델의 개념이해를 돕기 위한 구현물로 현재 변수명을 상세히 적었고

3 Oct 6, 2022

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

NuPIC Numenta Platform for Intelligent Computing The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implem

6.3k Dec 30, 2022

PyTorch implementation of neural style transfer algorithm

neural-style-pt This is a PyTorch implementation of the paper A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias

770 Jan 2, 2023

PyTorch implementation of DeepDream algorithm

neural-dream This is a PyTorch implementation of DeepDream. The code is based on neural-style-pt. Here we DeepDream a photograph of the Golden Gate Br

121 Nov 5, 2022

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

3.9k Jan 5, 2023

Image-to-Image Translation with Conditional Adversarial Networks (Pix2pix) implementation in keras

pix2pix-keras Pix2pix implementation in keras. Original paper: Image-to-Image Translation with Conditional Adversarial Networks (pix2pix) Paper Author

141 Dec 30, 2022

Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

This is a Python implementation of cover trees, a data structure for finding nearest neighbors in a general metric space (e.g., a 3D box with periodic

28 Nov 25, 2022

Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

364 Dec 28, 2022

Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow

xRBM Library Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow Installation Using pip: pip install xrbm Examples Tut

55 Dec 29, 2022

A fast Evolution Strategy implementation in Python

Evostra: Evolution Strategy for Python Evolution Strategy (ES) is an optimization technique based on ideas of adaptation and evolution. You can learn

251 Dec 8, 2022

🌳 A Python-inspired implementation of the Optimum-Path Forest classifier.

OPFython: A Python-Inspired Optimum-Path Forest Classifier Welcome to OPFython. Note that this implementation relies purely on the standard LibOPF. Th

30 Jan 4, 2023

Implementation of Geometric Vector Perceptron, a simple circuit for 3d rotation equivariance for learning over large biomolecules, in Pytorch. Idea proposed and accepted at ICLR 2021

Geometric Vector Perceptron Implementation of Geometric Vector Perceptron, a simple circuit with 3d rotation equivariance for learning over large biom

59 Nov 24, 2022

Official implementation of AAAI-21 paper "Label Confusion Learning to Enhance Text Classification Models"

Description: This is the official implementation of our AAAI-21 accepted paper Label Confusion Learning to Enhance Text Classification Models. The str

101 Nov 25, 2022

Official PyTorch implementation for paper Context Matters: Graph-based Self-supervised Representation Learning for Medical Images

Context Matters: Graph-based Self-supervised Representation Learning for Medical Images Official PyTorch implementation for paper Context Matters: Gra

49 Nov 23, 2022

PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition. Transformer models are good at capturing content-based

565 Jan 4, 2023

An essential implementation of BYOL in PyTorch + PyTorch Lightning

Essential BYOL A simple and complete implementation of Bootstrap your own latent: A new approach to self-supervised Learning in PyTorch + PyTorch Ligh

48 Sep 27, 2022

The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The offical PyTorch implementation of NeMo, p

76 Nov 23, 2022

A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduced the 67.8% Top1 Acc on ImageNet.

Exploring simple siamese representation learning This is a PyTorch re-implementation of the SimSiam paper on ImageNet dataset. The results match that

72 Nov 9, 2022

PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

FullSubNet This Git repository for the official PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech E

357 Jan 4, 2023