Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction


This is a fork of Fairseq(-py) with implementations of the following models:

Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

An NMT model that uses two-dimensional convolutions to jointly encode the source and target sequences.
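In the Pervasive Attention paper, the input to the 2D convolutions is a grid whose cell (i, j) concatenates the embedding of target token i with the embedding of source token j. Below is a minimal NumPy sketch of that tensor layout only. It assumes unbatched (length, dim) embedding matrices, and the function name build_joint_grid is hypothetical. The repository itself builds the grid in PyTorch inside fairseq.

```python
import numpy as np


def build_joint_grid(src_emb: np.ndarray, tgt_emb: np.ndarray) -> np.ndarray:
    """Return a (T_tgt, T_src, d_tgt + d_src) grid with grid[i, j] = [tgt_emb[i]; src_emb[j]].

    src_emb: (T_src, d_src) source token embeddings (hypothetical input layout)
    tgt_emb: (T_tgt, d_tgt) target token embeddings
    """
    t_src, d_src = src_emb.shape
    t_tgt, d_tgt = tgt_emb.shape
    # Repeat every target embedding along the source axis (a broadcast view, no copy yet).
    tgt_part = np.broadcast_to(tgt_emb[:, None, :], (t_tgt, t_src, d_tgt))
    # Repeat every source embedding along the target axis.
    src_part = np.broadcast_to(src_emb[None, :, :], (t_tgt, t_src, d_src))
    # Concatenating along the feature axis materializes the full grid.
    return np.concatenate([tgt_part, src_part], axis=-1)


rng = np.random.default_rng(0)
src = rng.random((7, 4))  # 7 source tokens, 4-dim embeddings
tgt = rng.random((5, 3))  # 5 target tokens, 3-dim embeddings
grid = build_joint_grid(src, tgt)
print(grid.shape)  # (5, 7, 7)
```

The grid grows with the product of the source and target lengths.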

Pervasive Attention also provides an extensive decoding grid that we leverage to efficiently train wait-k models.


Efficient Wait-k Models for Simultaneous Machine Translation

Transformer Wait-k models (Ma et al., 2019) with unidirectional encoders and with joint training of multiple wait-k paths.
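Under the wait-k policy of Ma et al. (2019), the decoder emits target token t (1-indexed) after reading g(t) = min(k + t - 1, |x|) source tokens. The sketch below shows that schedule and the matching attention mask in pure Python. The helper names are hypothetical and are not this repository's API.

```python
from typing import List


def waitk_context(t: int, k: int, src_len: int) -> int:
    """Number of source tokens read before emitting target token t (1-indexed).

    g(t) = min(k + t - 1, |x|), following Ma et al. (2019).
    """
    return min(k + t - 1, src_len)


def waitk_mask(tgt_len: int, src_len: int, k: int) -> List[List[bool]]:
    """mask[i][j] is True if target position i (0-indexed) may attend to source position j."""
    return [
        [j < waitk_context(i + 1, k, src_len) for j in range(src_len)]
        for i in range(tgt_len)
    ]


for row in waitk_mask(tgt_len=4, src_len=6, k=3):
    print("".join("x" if allowed else "." for allowed in row))
# xxx...
# xxxx..
# xxxxx.
# xxxxxx
```

Training several wait-k paths jointly means optimizing with masks built for more than one value of k. Drawing one k per batch (for example with random.randint) is one possible scheme. This is an assumption for illustration and is not taken from the repository.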


Fairseq Requirements and Installation

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
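The sketch below checks an existing environment against these requirements before you start training. It compares versions as integer tuples rather than strings, because as strings "1.10" sorts before "1.4". The helper names and messages are hypothetical.

```python
import re
import sys

MIN_PYTHON = (3, 6)
MIN_TORCH = (1, 4, 0)


def version_tuple(version: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' of a version string.

    '1.4.0+cu101' -> (1, 4, 0), '1.10.2' -> (1, 10, 2), '2.1.0a0' -> (2, 1, 0)
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def check_requirements(torch_version: str, python_version=tuple(sys.version_info[:2])) -> list:
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        found = ".".join(map(str, python_version))
        problems.append(f"Python {found} found, >= 3.6 required")
    if version_tuple(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} found, >= 1.4.0 required")
    return problems


print(check_requirements("0.4.1", (3, 7)))
# ['PyTorch 0.4.1 found, >= 1.4.0 required']
```

In a real environment, pass torch.__version__ as the first argument.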

Installing Fairseq

git clone
cd attn2d
pip install --editable .


fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.


For Pervasive Attention, please cite:

    author = "Elbayad, Maha and Besacier, Laurent and Verbeek, Jakob",
    title = "Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction",
    booktitle = "Proceedings of the 22nd Conference on Computational Natural Language Learning",
    year = "2018",

For our wait-k models, please cite:

    title={Efficient Wait-k Models for Simultaneous Machine Translation},
    author={Elbayad, Maha and Besacier, Laurent and Verbeek, Jakob},
    journal={arXiv preprint arXiv:2005.08595},

For Fairseq, please cite:

  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
  • The training speed of pervasive attention model


    I am trying to run the pervasive attention model recipes as recommended in the README:

    In my runs the model is very slow: only about 30 words per second on a single GTX 1080 GPU. Could you give me a rough idea of the speed to expect, based on your experiments?

    Thanks for your insights, Parnia

    opened by papar22
  • AttributeError: 'float' object has no attribute 'data' on pytorch 0.4.1

    I get the following error at trainer.backward_step() when running the demo script:

    File "attn2d/nmt/", line 230, in backward_step
        self.clip_norm).data.item()
    AttributeError: 'float' object has no attribute 'data'

    I fixed it with:

        grad_norm = torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.clip_norm)

    After that change, it runs fine with torch==0.4.1.

    Which version of PyTorch was the code written for?
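As the error shows, clip_grad_norm_ returned a plain Python float in older PyTorch releases such as 0.4.1. Recent releases return a 0-dim tensor instead. The sketch below logs the norm without depending on the version. The helper name as_float is hypothetical.

```python
def as_float(value) -> float:
    """Return a plain Python float from a number or from a 0-dim tensor/array.

    Objects exposing .item() (PyTorch tensors, NumPy scalars and 0-dim arrays)
    are unwrapped first; plain ints and floats are converted directly.
    """
    if hasattr(value, "item"):
        value = value.item()
    return float(value)


print(as_float(1.5))  # 1.5
```

In the trainer, this would read grad_norm = as_float(torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.clip_norm)).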

    opened by adriangrepo
  • ModuleNotFoundError: No module named 'nmt.loader.pair_loader'

    Hi, as the title says, I get this error:

    Traceback (most recent call last):
      File "/home//repos/attn2d/", line 84, in <module>
        train(params)
      File "/home//repos/attn2d/", line 20, in train
        from nmt.trainer import Trainer
      File "/home//repos/attn2d/nmt/", line 17, in <module>
        from nmt.loader.pair_loader import DataPair
    ModuleNotFoundError: No module named 'nmt.loader.pair_loader'
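To narrow down a ModuleNotFoundError like this one, find the first part of the dotted path that does not resolve from the current sys.path. This can happen, for example, when a script is launched from outside the repository root. The sketch below uses only the standard library, and its helper names are hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` can be located on sys.path.

    find_spec imports the parent packages of a dotted name, so a missing
    parent raises ModuleNotFoundError instead of returning None.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


def first_missing(dotted: str) -> str:
    """Return the shortest prefix of `dotted` that cannot be found, or '' if all resolve."""
    parts = dotted.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        if not module_available(prefix):
            return prefix
    return ""
```

Call first_missing("nmt.loader.pair_loader") from the directory you launch training from. The result shows whether nmt itself, nmt.loader, or only pair_loader is missing. Because find_spec imports the parent packages, their __init__ files do run.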

    opened by koszilard
  • 16 undefined names

    • Replace ‘false’ with ‘False’
    • Missing import tensorflow as tf
    • Missing import numpy as np
    • See #2

    flake8 testing of on Python 3.7.0

    $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

    ./nmt/ F821 undefined name 'exp'
                                               exp(self._iter / self.speed))
    ./nmt/ F821 undefined name 'nn'
            return nn.utils.clip_grad_norm_(params, self.grad_norm_max, self.grad_norm_type)
    ./nmt/models/ F821 undefined name 'PositionalPooling4'
            super(PositionalPooling4, self).__init__()
    ./nmt/models/ F821 undefined name 'trg_emb'
                    if trg_emb.size(1) > max_h:
    ./nmt/models/ F821 undefined name 'trg_emb'
                        trg_emb = trg_emb[:, -max_h:, :, :]
    ./nmt/models/ F821 undefined name 'trg_labels'
                    trg_labels =, trg_labels_t),
    ./nmt/utils/ F821 undefined name 'DETOK'
        source = " ".join(DETOK.detokenize(source.split())).encode('utf-8')
    ./nmt/utils/ F821 undefined name 'DETOK'
        gt = " ".join(DETOK.detokenize(gt.split())).encode('utf-8')
    ./nmt/utils/ F821 undefined name 'DETOK'
        pred = " ".join(DETOK.detokenize(pred.split())).encode('utf-8')
    ./nmt/utils/ F821 undefined name 'tf'
        _summary = tf.summary.scalar(name=key,
    ./nmt/utils/ F821 undefined name 'tf'
    ./nmt/utils/ F821 undefined name 'tf'
        summary = tf.Summary(value=[tf.Summary.Value(tag=key, simple_value=value)])
    ./nmt/utils/ F821 undefined name 'tf'
        summary = tf.Summary(value=[tf.Summary.Value(tag=key, simple_value=value)])
    ./nmt/loss/ F821 undefined name 'false'
                p.requires_grad = false
    ./nmt/loss/ F821 undefined name 'false'
                p.requires_grad = false
    ./nmt/loss/samplers/ F821 undefined name 'score'
            return preds_matrix, np.ones(batch_size) * score, stats
    16    F821 undefined name 'false'
    opened by cclauss
  • Is this a typo?

    In attn2d/nmt/models/, line 145:

    class PositionalPooling(nn.Module):
        def __init__(self, max_length, emb_size):
            super(PositionalPooling4, self).__init__()
            self.src_embedding = nn.Embedding(max_length, emb_size)
            self.trg_embedding = nn.Embedding(max_length, emb_size)

    Should PositionalPooling4 be PositionalPooling?

    opened by elect000
  • Training/Eval Error for waitk model

    🐛 Bug

    I'm trying to run the training code following the wait-k guide. I fixed some bugs as @ereday mentioned in this issue, but I still get an error when I run the training code:

    RuntimeError: Output 0 of SplitBackward0 is a view and is being modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

    To Reproduce

    Steps to reproduce the behavior (always include the command you ran):

    CUDA_VISIBLE_DEVICES=0 python $DATA_BIN -s en -t de --left-pad-source False \
        --user-dir examples/waitk --arch waitk_transformer_small \
        --save-dir $Workdir/checkpoints/$MODEL --tensorboard-logdir $Workdir/logs/$MODEL \
        --seed 1 --no-epoch-checkpoints --no-progress-bar --log-interval 10  \
        --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.0001 \
        --max-tokens 4000 --update-freq 2 --max-update 50000 \
        --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr '1e-07' --lr 0.002 \
        --min-lr '1e-9' --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --share-decoder-input-output-embed --waitk  $k
    1. See error

    Expected behavior


    • fairseq Version (e.g., 1.0 or master):
    • PyTorch Version (e.g., 1.0)
    • OS (e.g., Linux):
    • How you installed fairseq (pip, source):
    • Build command you used (if compiling from source):
    • Python version:
    • CUDA/cuDNN version:
    • GPU models and configuration:
    • Any other relevant information:

    Additional context

    This repo seems out of date, and the issue raised half a year ago still has no reply.

    opened by EricLina
  • ModuleNotFoundError: No module named 'examples.simultaneous'

    🐛 Bug


    I was trying to evaluate the pre-trained models under "Efficient Wait-k Models for Simultaneous Machine Translation". To do this, I followed the instructions in the README. Specifically, I did the following:

    After downloading the model and data and placing them under pre_saved:

    cd ~/attn2d/pre_saved
    tar xzf iwslt14_de_en.tar.gz
    tar xzf tf_waitk_model.tar.gz
    k=5 # Evaluation time k
    CUDA_VISIBLE_DEVICES=0 python pre_saved/iwslt14_deen_bpe10k_binaries/ -s de -t en --gen-subset test --path pre_saved/tf_waitk_model.tar.gz --task waitk_translation --eval-waitk $k --model-overrides "{'max_source_positions': 1024, 'max_target_positions': 1024}" --left-pad-source False --user-dir examples/waitk --no-progress-bar --max-tokens 8000 --remove-bpe --beam 1 2>&1 | tee -a $output

    It produces the following error message:

    Traceback (most recent call last):
      File "", line 11, in <module>
      File "/home/attn2d/fairseq_cli/", line 276, in cli_main
        parser = options.get_generation_parser()
      File "/home/attn2d/fairseq/", line 33, in get_generation_parser
        parser = get_parser("Generation", default_task)
      File "/home/attn2d/fairseq/", line 197, in get_parser
      File "/home/attn2d/fairseq/", line 350, in import_user_module
      File "/home/anaconda3/envs/py37/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 728, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/attn2d/examples/waitk/", line 1, in <module>
        from . import models, tasks
      File "/home/attn2d/examples/waitk/models/", line 7, in <module>
        importlib.import_module('examples.simultaneous.models.' + model_name)
      File "/home/anaconda3/envs/py37/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    ModuleNotFoundError: No module named 'examples.simultaneous'


    Okay, here is more detail about this:

    I believe this line is responsible for the error message shared above.

    I changed this importlib.import_module('examples.simultaneous.models.' + model_name) to importlib.import_module('examples.waitk.models.' + model_name)
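Instead of hard-coding one package name, the lookup could try the new name first and fall back to the old one. The hypothetical helper below is not part of the repository. It falls back only when the requested module itself, or one of its parent packages, is missing. Errors raised inside an existing module therefore still surface.

```python
import importlib
from types import ModuleType


def import_first(*names: str) -> ModuleType:
    """Import and return the first module in `names` that exists.

    A ModuleNotFoundError triggers a fallback only if the missing module is the
    requested one or one of its parent packages; any other import failure
    (e.g. a broken import inside an existing module) is re-raised.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            missing = exc.name or ""
            if not (missing and (name == missing or name.startswith(missing + "."))):
                raise
    raise ModuleNotFoundError(f"none of these modules could be imported: {', '.join(names)}")
```

For example: import_first("examples.waitk.models." + model_name, "examples.simultaneous.models." + model_name). This does not fix imports written directly inside the model files, such as the examples.simultaneous.modules import that follows.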

    Then, I got another error:

      File "", line 11, in <module>
      File "/home/attn2d/fairseq_cli/", line 276, in cli_main
        parser = options.get_generation_parser()
      File "/home/attn2d/fairseq/", line 33, in get_generation_parser
        parser = get_parser("Generation", default_task)
      File "/home/attn2d/fairseq/", line 197, in get_parser
      File "/home/attn2d/fairseq/", line 350, in import_user_module
      File "/home/anaconda3/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 728, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/attn2d/examples/waitk/", line 1, in <module>
        from . import models, tasks
      File "/home/attn2d/examples/waitk/models/", line 8, in <module>
        importlib.import_module('examples.waitk.models.' + model_name)
      File "/home/anaconda3/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "/home/attn2d/examples/waitk/", line 1, in <module>
        from . import models, tasks
      File "/home/attn2d/examples/waitk/models/", line 8, in <module>
        importlib.import_module('examples.waitk.models.' + model_name)
      File "/home/anaconda3/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "/home/attn2d/examples/waitk/models/", line 24, in <module>
        from examples.simultaneous.modules import TransformerEncoderLayer, TransformerDecoderLayer

    So I changed this line to `from examples.waitk.modules import TransformerEncoderLayer, TransformerDecoderLayer` as well. When I tried once more, I got the following error:

      File "", line 11, in <module>
      File "/home/attn2d/fairseq_cli/", line 276, in cli_main
        parser = options.get_generation_parser()
      File "/home/attn2d/fairseq/", line 33, in get_generation_parser
        parser = get_parser("Generation", default_task)
      File "/home/attn2d/fairseq/", line 197, in get_parser
      File "/home/attn2d/fairseq/", line 350, in import_user_module
      File "/home/anaconda3/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 728, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/attn2d/examples/waitk/", line 1, in <module>
        from . import models, tasks
      File "/home/attn2d/examples/waitk/models/", line 8, in <module>
        importlib.import_module('examples.waitk.models.' + model_name)
      File "/home/anaconda3/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "/home/attn2d/examples/waitk/", line 1, in <module>
        from . import models, tasks
      File "/home/attn2d/examples/waitk/models/", line 8, in <module>
        importlib.import_module('examples.waitk.models.' + model_name)
      File "/home/anaconda3/lib/python3.7/importlib/", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "/home/attn2d/examples/waitk/models/", line 25, in <module>
        from examples.waitk.modules import TransformerEncoderLayer, TransformerDecoderLayer
      File "/home/attn2d/examples/waitk/modules/", line 2, in <module>
        from .controller import Controller

    To fix it, I commented out the following lines in examples/waitk/modules/:

    from .controller import Controller
    from .branch_controller import BranchController
    from .oracle import SimulTransOracleDP, SimulTransOracleDP1

    Next, I tried the generation command from the README once more:

    CUDA_VISIBLE_DEVICES=0 python pretrained-sources/iwslt14_deen_bpe10k_binaries/ -s de -t en --gen-subset test --path pretrained-sources/ --task waitk_translation --eval-waitk $k --model-overrides "{'max_source_positions': 1024, 'max_target_positions': 1024}" --left-pad-source False --user-dir examples/waitk --no-progress-bar --max-tokens 8000 --remove-bpe --beam 1 2>&1 | tee -a $output

    I got this error:

    2021-09-20 20:29:46 | INFO | fairseq_cli.generate | Namespace(all_gather_list_size=16384, beam=1, bpe=None, checkpoint_suffix='', cpu=False, criterion='cross_entropy', data='pretrained-sources/iwslt14_deen_bpe10k_binaries/', data_buffer_size=0, dataset_impl=None, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, eval_waitk=5, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, left_pad_source='False', left_pad_target='False', lenpen=1, load_alignments=False, log_format=None, log_interval=100, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=None, max_source_positions=1024, max_target_positions=1024, max_tokens=8000, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides="{'max_source_positions': 1024, 'max_target_positions': 1024}", model_parallel_size=1, momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=True, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', path='pretrained-sources/', prefix_size=0, print_alignment=False, print_step=False, quantization_config_path=None, quiet=False, remove_bpe='@@ ', replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='de', target_lang='en', task='waitk_translation', temperature=1.0, tensorboard_logdir='', 
    threshold_loss_scale=None, tokenizer=None, truncate_source=False, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='examples/waitk', warmup_updates=0, weight_decay=0.0)
    2021-09-20 20:29:46 | INFO | fairseq.tasks.translation | [de] dictionary: 8848 types
    2021-09-20 20:29:46 | INFO | fairseq.tasks.translation | [en] dictionary: 6632 types
    2021-09-20 20:29:46 | INFO | | loaded 6750 examples from: pretrained-sources/iwslt14_deen_bpe10k_binaries/
    2021-09-20 20:29:46 | INFO | | loaded 6750 examples from: pretrained-sources/iwslt14_deen_bpe10k_binaries/
    2021-09-20 20:29:46 | INFO | fairseq.tasks.translation | pretrained-sources/iwslt14_deen_bpe10k_binaries/ test de-en 6750 examples
    2021-09-20 20:29:46 | INFO | fairseq_cli.generate | loading model(s) from pretrained-sources/
    Traceback (most recent call last):
      File "", line 11, in <module>
      File "/home/attn2d/fairseq_cli/", line 278, in cli_main
      File "/home/attn2d/fairseq_cli/", line 36, in main
        return _main(args, sys.stdout)
      File "/home/attn2d/fairseq_cli/", line 103, in _main
      File "/home/attn2d/fairseq/tasks/", line 181, in get_batch_iterator
      File "/home/attn2d/fairseq/data/", line 220, in batch_by_size
        from import batch_by_size_fast
      File "fairseq/data/data_utils_fast.pyx", line 1, in init
        # cython: language_level=3
    ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

    I gave up after that. @elbayadm, I hope you can help me with this.

    Code sample


    I followed the instructions in the README to set up my environment:

    git clone
    cd attn2d
    pip install --editable .

    As a result, I have the following libraries in my environment:

    Python 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) 
    [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> torch.__version__
    >>> import fairseq
    >>> fairseq.__version__
    $ python --version
    Python 3.7.10

    Operating system: Linux

    opened by ereday
PhD student (LIG & INRIA)
