
Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Overview

GLAT

Implementation for the ACL2021 paper "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Requirements

  • Python >= 3.7
  • PyTorch >= 1.5.0
  • Fairseq 1.0.0a0
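
A minimal environment setup is sketched below; this is an assumption rather than part of the original instructions, and in particular Fairseq 1.0.0a0 is a pre-release that is typically installed from source:

pip install "torch>=1.5.0"              # choose the wheel that matches your CUDA version
git clone https://github.com/pytorch/fairseq.git
cd fairseq
pip install --editable .                # installs a 1.0.0a0 development version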

Preparation

Train an autoregressive Transformer according to the instructions in Fairseq.

Use the trained autoregressive Transformer to generate target sentences for the training set.
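
A hedged sketch of this distillation step is below; the variable names (at_checkpoint, raw_data_dir, distill_dir), the beam size, and the use of --results-path are illustrative rather than the authors' exact command:

at_checkpoint=path_to_trained_at_model
raw_data_dir=path_to_original_binarized_data
distill_dir=path_for_distilled_outputs
python3 fairseq_cli/generate.py ${raw_data_dir} --path ${at_checkpoint} --gen-subset train \
    --beam 5 --max-tokens 8192 --results-path ${distill_dir}

The H- lines in the resulting generate-train.txt are the distilled targets; pairing them with the original source sentences gives the distilled training set that is binarized in the next step.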

Binarize the distilled training data.

input_dir=path_to_raw_text_data
data_dir=path_to_binarized_output
src=source_language
tgt=target_language
python3 fairseq_cli/preprocess.py --source-lang ${src} --target-lang ${tgt} --trainpref ${input_dir}/train \
    --validpref ${input_dir}/valid --testpref ${input_dir}/test --destdir ${data_dir}/ \
    --workers 32 --src-dict ${input_dir}/dict.${src}.txt --tgt-dict ${input_dir}/dict.${tgt}.txt

Train

save_path=path_for_saving_models
python3 train.py ${data_dir} --arch glat --noise full_mask --share-all-embeddings \
    --criterion glat_loss --label-smoothing 0.1 --lr 5e-4 --warmup-init-lr 1e-7 --stop-min-lr 1e-9 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 --optimizer adam --adam-betas '(0.9, 0.999)' \
    --adam-eps 1e-6 --task translation_lev_modified --max-tokens 8192 --weight-decay 0.01 --dropout 0.1 \
    --encoder-layers 6 --encoder-embed-dim 512 --decoder-layers 6 --decoder-embed-dim 512 --fp16 \
    --max-source-positions 1000 --max-target-positions 1000 --max-update 300000 --seed 0 --clip-norm 5 \
    --save-dir ${save_path} --src-embedding-copy --pred-length-offset --log-interval 1000 \
    --eval-bleu --eval-bleu-args '{"iter_decode_max_iter": 0, "iter_decode_with_beam": 1}' \
    --eval-tokenized-bleu --eval-bleu-remove-bpe --best-checkpoint-metric bleu \
    --maximize-best-checkpoint-metric --decoder-learned-pos --encoder-learned-pos \
    --apply-bert-init --activation-fn gelu --user-dir glat_plugins

Inference

checkpoint_path=path_to_your_checkpoint
python3 fairseq_cli/generate.py ${data_dir} --path ${checkpoint_path} --user-dir glat_plugins \
    --task translation_lev_modified --remove-bpe --max-sentences 20 --source-lang ${src} --target-lang ${tgt} \
    --quiet --iter-decode-max-iter 0 --iter-decode-eos-penalty 0 --iter-decode-with-beam 1 --gen-subset test
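
For noisy parallel decoding (NPD) with reranking by the autoregressive model, a hedged sketch follows; at_checkpoint_path and the beam size of 5 are illustrative, and the reranking flags are standard Fairseq NAT options whose behavior may depend on your Fairseq version:

at_checkpoint_path=path_to_your_autoregressive_checkpoint
python3 fairseq_cli/generate.py ${data_dir} --path ${checkpoint_path}:${at_checkpoint_path} \
    --user-dir glat_plugins --task translation_lev_modified --remove-bpe --max-sentences 20 \
    --source-lang ${src} --target-lang ${tgt} --quiet --iter-decode-max-iter 0 \
    --iter-decode-eos-penalty 0 --iter-decode-with-beam 5 \
    --iter-decode-with-external-reranker --gen-subset test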

The script for averaging checkpoints is scripts/average_checkpoints.py.
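
For example, averaging the last 5 epoch checkpoints could look like the following; the output file name and the choice of 5 checkpoints are illustrative:

python3 scripts/average_checkpoints.py --inputs ${save_path} \
    --num-epoch-checkpoints 5 --output ${save_path}/checkpoint_avg.pt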

Comments
  • Inference with NPD. Issue: AttributeError: 'dict' object has no attribute '_get_node_flag'

    Hi folks,

    I ran into this AttributeError when I tried to set '--iter-decode-with-external-reranker' and '--path checkpoints/nat.pt:at_checkpoints/at.pt' to do inference with NPD (screenshot of the traceback omitted).

    Did you meet similar issues, and do you have any ideas? It would help me a lot.

    Thanks.

    opened by chuanmingliu 1
  • Training Speed

    Hi, thanks for your awesome work!

    When I use the command line here (https://github.com/FLC777/GLAT#train) to train the model on 8*V100 GPUs, the time cost of each epoch increases rapidly (epoch 1: 10 min -> epoch 50: 120 min). Is there something wrong with my command, or is there something I should do differently?

    Thanks very much! hemingkx

    opened by hemingkx 1
  • Which part of your code copies the encoder hidden when glancing?

    Hi, I read your paper and code with great interest. You said in the paper that during glancing, the sampled tokens are replaced with embeddings from the decoder, while the unsampled tokens use the encoder output. However, in the code, the unglanced tokens are still masked. Am I missing something?

    glat_info = None
    if glat and tgt_tokens is not None:
        with torch.no_grad():
            with torch_seed(rand_seed):
                word_ins_out = self.decoder(
                    normalize=False,
                    prev_output_tokens=prev_output_tokens,
                    encoder_out=encoder_out,
                )
            pred_tokens = word_ins_out.argmax(-1)
            same_num = ((pred_tokens == tgt_tokens) & nonpad_positions).sum(1)
            input_mask = torch.ones_like(nonpad_positions)
            bsz, seq_len = tgt_tokens.size()
            for li in range(bsz):
                target_num = (((seq_lens[li] - same_num[li].sum()).float()) * glat['context_p']).long()
                if target_num > 0:
                    input_mask[li].scatter_(dim=0, index=torch.randperm(seq_lens[li])[:target_num].cuda(), value=0)
            input_mask = input_mask.eq(1)
            input_mask = input_mask.masked_fill(~nonpad_positions, False)
            glat_prev_output_tokens = prev_output_tokens.masked_fill(~input_mask, 0) + tgt_tokens.masked_fill(input_mask, 0)
            # this line here

            glat_tgt_tokens = tgt_tokens.masked_fill(~input_mask, self.pad)

            prev_output_tokens, tgt_tokens = glat_prev_output_tokens, glat_tgt_tokens

            glat_info = {
                "glat_accu": (same_num.sum() / seq_lens.sum()).item(),
                "glat_context_p": glat['context_p'],
            }
    opened by zkx06111 1
  • About Data Set Selection

    Hello, author. While reviewing your code, I found that you did not specify which dataset to use. I read your paper and downloaded, for example, the WMT14 En-De data. However, for the src-dict and tgt-dict arguments of preprocess.py, I did not find dict.en.txt and dict.de.txt in the dataset. Sorry to bother you, but I hope you can provide the dataset.

    opened by Yonghao-Li 1
  • KeyError: 'bleu'

    @FLC777 Hello, thank you very much for the open source. When I run the glat_CTC script, the model can train and save checkpoints, but the KeyError: 'bleu' error above is reported from time to time, and then the program stops and is killed (screenshots of the traceback omitted).

    opened by Andrewlesson 0
  • cannot reproduce results on wmt14 ende distill

    I follow the instructions but I cannot reproduce the results on WMT14 En-De (distilled). With the provided generation script I only got

    BLEU4 = 16.65, 48.6/22.0/11.4/6.3 (BP=1.000, ratio=1.014, syslen=65432, reflen=64506)

    After re-ranking with an AT model, I got

    BLEU4 = 19.93, 52.2/25.6/14.2/8.3 (BP=1.000, ratio=1.013, syslen=65313, reflen=64506)

    which is still far from the 26.55 reported in the paper. I use the provided training script and train the model on 8 V100 GPUs. The training log shows that the glancing accuracy is pretty high at the end of training:

    epoch 151: 1050 / 1993 loss=3.323, nll_loss=1.371, glat_accu=0.758, glat_keep=0.073, glat_context_p=0.3, word_ins=3.162, length=3.223, ppl=10.01, wps=132588, ups=2.16, wpb=61302.6, bsz=2018.2, num_updates=300000, lr=5.7735e-05, gnorm=1.455, clip=0, train_wall=46, gb_free=6.8, wall=139238

    But the validation loss is also very high:

    valid | epoch 151 | valid on 'valid' subset | loss 7.412 | nll_loss 5.888 | glat_accu 0 | glat_keep 0 | glat_context_p 0 | word_ins 7.252 | length 3.209 | ppl 170.31 | wps 242820 | wpb 41551 | bsz 1500 | num_updates 300000 | best_loss 6.511

    opened by fernando9torres 0
  • Problem of inference speed

    I tested the inference speed of GLAT compared with an autoregressive model, and it is actually no more than a 2x speed-up, while your paper reports about a 15.3x speed-up (Table 1). Could you please provide the inference scripts you used? Thanks.

    opened by zhajiahe 0