Sequence-to-Sequence learning using PyTorch

Overview

Seq2Seq in PyTorch

This is a complete suite for training sequence-to-sequence models in PyTorch. It provides several model implementations along with code for both training and inference.

Using this code you can train:

  • Neural-machine-translation (NMT) models
  • Language models
  • Image to caption generation
  • Skip-thought sentence representations
  • And more...

Installation

git clone --recursive https://github.com/eladhoffer/seq2seq.pytorch
cd seq2seq.pytorch; python setup.py develop

Models

Models currently available:

  • Basic recurrent Seq2Seq encoder-decoder
  • Recurrent Seq2Seq with an attentional decoder (RecurrentAttentionSeq2Seq)
  • Transformer (attention-only model)

Datasets

Datasets currently available:

  • WMT16 (de-en)
  • WMT17 (de-en)
  • COCO image captions

All datasets can be tokenized using 3 available segmentation methods:

  • Character based segmentation
  • Word based segmentation
  • Byte-pair encoding (BPE), as suggested by Sennrich et al. (2016), with a selectable number of tokens

After choosing a tokenization method, a vocabulary will be generated and saved for future inference.
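As a rough illustration of the BPE step (the repository's own preprocessing builds and saves the vocabulary when 'tokenization': 'bpe' is selected), here is a minimal Python sketch using the subword-nmt package; the file names are placeholders:

# Minimal BPE sketch using the subword-nmt package (pip install subword-nmt).
# File names are placeholders; this only illustrates the learn/apply steps.
import codecs
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn 32,000 merge operations from a pre-tokenized training corpus.
with codecs.open('train.tok.en', encoding='utf-8') as corpus, \
     codecs.open('bpe.codes', 'w', encoding='utf-8') as codes_out:
    learn_bpe(corpus, codes_out, num_symbols=32000)

# Apply the learned codes to new text.
with codecs.open('bpe.codes', encoding='utf-8') as codes_in:
    bpe = BPE(codes_in)
print(bpe.process_line('a sample sentence to segment'))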

Training methods

The models can be trained using several methods:

  • Basic Seq2Seq - given an encoded sequence, generate (decode) the output sequence. Training is done with teacher forcing (see the sketch after this list).
  • Multi Seq2Seq - several tasks (such as multiple languages) are trained simultaneously by using the data sequences as both input to the encoder and output for the decoder.
  • Image2Seq - used to train image-to-caption generators.
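With teacher forcing, the decoder is fed the ground-truth previous token at each step instead of its own prediction. Below is a minimal PyTorch-style sketch of one teacher-forced training step; it is illustrative only, not the repository's Seq2SeqTrainer, and the encoder/decoder interfaces are assumptions:

# Illustrative teacher-forced training step for an encoder-decoder model.
# Not the repository's Seq2SeqTrainer; encoder/decoder interfaces are assumed.
def teacher_forced_step(encoder, decoder, src, tgt, criterion, optimizer):
    # src: (batch, src_len) source token ids; tgt: (batch, tgt_len) target token ids
    optimizer.zero_grad()
    context = encoder(src)                      # encode the source sequence
    decoder_input = tgt[:, :-1]                 # feed ground-truth tokens, shifted right
    logits = decoder(decoder_input, context)    # (batch, tgt_len - 1, vocab_size)
    loss = criterion(logits.reshape(-1, logits.size(-1)),  # predict the next token
                     tgt[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()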

Usage

Example training scripts are available in the scripts folder. Inference examples are available in the examples folder.

  • Example: training a Transformer on WMT16, following the training regime of the original paper:
DATASET=${1:-"WMT16_de_en"}
DATASET_DIR=${2:-"./data/wmt16_de_en"}
OUTPUT_DIR=${3:-"./results"}

WARMUP="4000"
LR0="512**(-0.5)"

python main.py \
  --save transformer \
  --dataset ${DATASET} \
  --dataset-dir ${DATASET_DIR} \
  --results-dir ${OUTPUT_DIR} \
  --model Transformer \
  --model-config "{'num_layers': 6, 'hidden_size': 512, 'num_heads': 8, 'inner_linear': 2048}" \
  --data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
  --b 128 \
  --max-length 100 \
  --device-ids 0 \
  --label-smoothing 0.1 \
  --trainer Seq2SeqTrainer \
  --optimization-config "[{'step_lambda':
                          \"lambda t: { \
                              'optimizer': 'Adam', \
                              'lr': ${LR0} * min(t ** -0.5, t * ${WARMUP} ** -1.5), \
                              'betas': (0.9, 0.98), 'eps':1e-9}\"
                          }]"
  • Example: training an attentional LSTM-based model with 3 layers in both the encoder and the decoder:
python main.py \
  --save de_en_wmt17 \
  --dataset ${DATASET} \
  --dataset-dir ${DATASET_DIR} \
  --results-dir ${OUTPUT_DIR} \
  --model RecurrentAttentionSeq2Seq \
  --model-config "{'hidden_size': 512, 'dropout': 0.2, \
                   'tie_embedding': True, 'transfer_hidden': False, \
                   'encoder': {'num_layers': 3, 'bidirectional': True, 'num_bidirectional': 1, 'context_transform': 512}, \
                   'decoder': {'num_layers': 3, 'concat_attention': True,\
                               'attention': {'mode': 'dot_prod', 'dropout': 0, 'output_transform': True, 'output_nonlinearity': 'relu'}}}" \
  --data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
  --b 128 \
  --max-length 80 \
  --device-ids 0 \
  --trainer Seq2SeqTrainer \
  --optimization-config "[{'epoch': 0, 'optimizer': 'Adam', 'lr': 1e-3},
                          {'epoch': 6, 'lr': 5e-4},
                          {'epoch': 8, 'lr':1e-4},
                          {'epoch': 10, 'lr': 5e-5},
                          {'epoch': 12, 'lr': 1e-5}]"
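The decoder above uses dot-product attention ('mode': 'dot_prod') over the encoder outputs, followed by an output transform with a ReLU nonlinearity. As a rough sketch of the scoring rule itself (illustrative, not the repository's implementation):

# Rough sketch of dot-product attention over encoder outputs (illustrative only).
import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values):
    # query: (batch, hidden); keys, values: (batch, src_len, hidden)
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)    # (batch, src_len)
    weights = F.softmax(scores, dim=-1)                          # attention weights
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1) # (batch, hidden)
    return context, weights
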
Comments
  • Translate_English_German_LSTM.ipynb in the examples dir seems to be broken

    Can't open the file in the Chrome browser. Jupyter Notebook Preview in VS Code gives: SyntaxError: Unexpected string in JSON at position 239.

    Would appreciate it if this could be fixed. Thanks a lot.

    opened by yucongo 2
  • TypeError: __init__() got an unexpected keyword argument 'num_symbols'

    I'm trying to train the model provided in the script at scripts/train/train_en_de.sh and I get the error TypeError: __init__() got an unexpected keyword argument 'num_symbols'.

    Can you let me know what I am missing?

    opened by daothanhtuan 1
  • ModuleNotFoundError: No module named 'apply_bpe'

    Hello, the problem I found is that even after installing the subword-nmt library, running the code below (in Generate_Caption.ipynb) still raises an error: checkpoint = load_url('https://dl.dropboxusercontent.com/s/05dvriaiqk74cum/caption_resnet50-4c0fa803.pth', map_location={'gpu:0':'cpu'})

    The error is:

      File "/home/noone/anaconda3/envs/tensorflow_3_180/lib/python3.6/site-packages/torch/serialization.py", line 469, in _load
        result = unpickler.load()
    ModuleNotFoundError: No module named 'apply_bpe'

    My environment is python 3.6 and I'm using conda.

    I don't know why, and hope you could help with that. Thanks in advance.

    opened by yangysc 1
  • How to retrieve the final context vector from the encoder?

    Hi, I want to encode a small document into a vector using the RNN encoder. Can you kindly tell me how to get the final context vector representation of a document?

    opened by shahbazsyed 1
  • ImportError: cannot import name 'PermutedSequenceGenerator'

    When running main.py, I get the following error:

      File "main.py", line 15, in <module>
        from seq2seq import models, datasets
      File "seq2seq.pytorch\seq2seq\models\__init__.py", line 1, in <module>
        from .transformer import Transformer, TransformerAttentionDecoder, TransformerAttentionEncoder
      File "seq2seq.pytorch\seq2seq\models\transformer.py", line 6, in <module>
        from .seq2seq_base import Seq2Seq
      File "seq2seq.pytorch\seq2seq\models\seq2seq_base.py", line 8, in <module>
        from seq2seq.tools.beam_search import SequenceGenerator, PermutedSequenceGenerator
    ImportError: cannot import name 'PermutedSequenceGenerator'
    

    I'm not sure whether it needs to be imported for training to work. Thanks

    opened by kevaday 1
  • ModuleNotFound Error

    Hi Elad,

    I am using your code for my task. I am facing the error ModuleNotFoundError: No module named 'seq2seq.tools.utils.log'. I already executed python setup.py develop, but it still shows the same error.

    Best Arunav

    opened by Shandilya21 3
  • module 'torch.nn' has no attribute 'MultiheadAttention'

    Hi, in the file modules/attention.py, the class MultiHeadAttention(nn.MultiheadAttention) raises an error:

    class MultiHeadAttention(nn.MultiheadAttention): AttributeError: module 'torch.nn' has no attribute 'MultiheadAttention'

    I use pytorch=0.4.1. Is there a version of torch that contains nn.MultiheadAttention?

    opened by chenQ1114 1
  • A question on the beam search

    In this line, https://github.com/eladhoffer/seq2seq.pytorch/blob/348276b2fcc5a60352b3dccaae7102666dbbd4ac/seq2seq/tools/beam_search.py#L216, decreasing num_hyp increases the value of k, which then leads to an out-of-bounds access for words[idx][k]. Why do you use num_hyp here?

    opened by wjb123 0
  • How do we use scripts under the train folder

    Hi. I would like to know how to train these models. When I ran train_en_de.sh, I got errors:

    Traceback (most recent call last):
      File "/media/vivien/A/NEW-SMT/seq2seq.pytorch-master/main.py", line 15, in <module>
        from seq2seq.tools.utils.log import setup_logging
    ModuleNotFoundError: No module named 'seq2seq.tools.utils.log'

    (the same traceback is printed four times)

    opened by liperrino 1
Owner

Elad Hoffer