Sequence-to-Sequence learning using PyTorch

Overview

Seq2Seq in PyTorch

This is a complete suite for training sequence-to-sequence models in PyTorch. It includes several model implementations, along with code for training them and for running inference.

Using this code you can train:

  • Neural machine translation (NMT) models
  • Language models
  • Image-to-caption generation
  • Skip-thought sentence representations
  • And more...

Installation

git clone --recursive https://github.com/eladhoffer/seq2seq.pytorch
cd seq2seq.pytorch; python setup.py develop

Models

Models currently available include:

  • RecurrentAttentionSeq2Seq - a recurrent encoder-decoder with an attentional decoder (see the LSTM example below)
  • Transformer - the attention-only model of "Attention Is All You Need"
  • Image2Seq - image-to-caption models

Datasets

Datasets currently available include:

  • WMT16 / WMT17 German-English translation data (e.g. WMT16_de_en in the examples below)
  • Image-captioning data (used by the image-to-caption examples)

All datasets can be tokenized using one of three available segmentation methods:

  • Character-based segmentation
  • Word-based segmentation
  • Byte-pair encoding (BPE), as suggested in the BPE paper (Sennrich et al., 2016), with a selectable number of tokens.

After a tokenization method is chosen, a vocabulary is generated and saved for later use at inference time.
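
As a minimal sketch of the idea (not this repository's tokenizer API; the build_vocab helper and file paths here are hypothetical), a word-level vocabulary can be generated and saved roughly like this:

import json
from collections import Counter

def build_vocab(corpus_path, vocab_path, max_size=32000):
    # Count whitespace-separated tokens (word-based segmentation).
    counter = Counter()
    with open(corpus_path, encoding='utf-8') as f:
        for line in f:
            counter.update(line.strip().split())
    # Reserve indices for special tokens, then add the most frequent words.
    vocab = {'<pad>': 0, '<unk>': 1, '<s>': 2, '</s>': 3}
    for word, _ in counter.most_common(max_size - len(vocab)):
        vocab[word] = len(vocab)
    # Save the mapping so the same vocabulary can be reused at inference time.
    with open(vocab_path, 'w', encoding='utf-8') as f:
        json.dump(vocab, f, ensure_ascii=False)
    return vocab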

Training methods

The models can be trained using several methods:

  • Basic Seq2Seq - given an encoded sequence, generate (decode) the output sequence. Training is done with teacher forcing (see the sketch after this list).
  • Multi Seq2Seq - several tasks (such as multiple languages) are trained simultaneously, using the data sequences as both encoder inputs and decoder outputs.
  • Image2Seq - used to train image-to-caption generators.
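
As a rough sketch of what one teacher-forced training step looks like (a generic PyTorch example, not this repository's Seq2SeqTrainer; the model(src, decoder_input) signature is assumed):

import torch.nn.functional as F

def teacher_forced_step(model, optimizer, src, tgt, pad_idx=0):
    # src, tgt: (batch, seq_len) token ids; tgt includes <s> ... </s>.
    decoder_input = tgt[:, :-1]    # gold target shifted right: <s> w1 ... w_{n-1}
    decoder_target = tgt[:, 1:]    # expected outputs: w1 ... w_n </s>
    logits = model(src, decoder_input)              # (batch, tgt_len, vocab); assumed signature
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           decoder_target.reshape(-1),
                           ignore_index=pad_idx)    # skip padding positions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()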

Usage

Example training scripts are available in the scripts folder. Inference examples are available in the examples folder.

  • Example: training a Transformer on WMT16 following the original paper's regime (a sketch of the learning-rate schedule follows the script):
DATASET=${1:-"WMT16_de_en"}
DATASET_DIR=${2:-"./data/wmt16_de_en"}
OUTPUT_DIR=${3:-"./results"}

WARMUP="4000"
LR0="512**(-0.5)"

python main.py \
  --save transformer \
  --dataset ${DATASET} \
  --dataset-dir ${DATASET_DIR} \
  --results-dir ${OUTPUT_DIR} \
  --model Transformer \
  --model-config "{'num_layers': 6, 'hidden_size': 512, 'num_heads': 8, 'inner_linear': 2048}" \
  --data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
  --b 128 \
  --max-length 100 \
  --device-ids 0 \
  --label-smoothing 0.1 \
  --trainer Seq2SeqTrainer \
  --optimization-config "[{'step_lambda':
                          \"lambda t: { \
                              'optimizer': 'Adam', \
                              'lr': ${LR0} * min(t ** -0.5, t * ${WARMUP} ** -1.5), \
                              'betas': (0.9, 0.98), 'eps':1e-9}\"
                          }]"
  • Example: training an attentional LSTM-based model with 3 layers in both the encoder and decoder (a sketch of the configured dot-product attention follows the script):
python main.py \
  --save de_en_wmt17 \
  --dataset ${DATASET} \
  --dataset-dir ${DATASET_DIR} \
  --results-dir ${OUTPUT_DIR} \
  --model RecurrentAttentionSeq2Seq \
  --model-config "{'hidden_size': 512, 'dropout': 0.2, \
                   'tie_embedding': True, 'transfer_hidden': False, \
                   'encoder': {'num_layers': 3, 'bidirectional': True, 'num_bidirectional': 1, 'context_transform': 512}, \
                   'decoder': {'num_layers': 3, 'concat_attention': True,\
                               'attention': {'mode': 'dot_prod', 'dropout': 0, 'output_transform': True, 'output_nonlinearity': 'relu'}}}" \
  --data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
  --b 128 \
  --max-length 80 \
  --device-ids 0 \
  --trainer Seq2SeqTrainer \
  --optimization-config "[{'epoch': 0, 'optimizer': 'Adam', 'lr': 1e-3},
                          {'epoch': 6, 'lr': 5e-4},
                          {'epoch': 8, 'lr':1e-4},
                          {'epoch': 10, 'lr': 5e-5},
                          {'epoch': 12, 'lr': 1e-5}]"
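
For reference, the 'dot_prod' attention mode configured above scores each encoder position by a dot product with the current decoder state; a generic sketch (not the repository's exact attention module) is:

import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values, mask=None):
    # query: (batch, d); keys, values: (batch, src_len, d)
    scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)       # (batch, src_len)
    if mask is not None:
        scores = scores.masked_fill(mask, float('-inf'))          # hide padded source positions
    weights = F.softmax(scores, dim=1)                            # attention weights over source
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # (batch, d) context vector
    return context, weights
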
Comments
  • Translate_English_German_LSTM.ipynb in the examples dir seems to be broken

    Can't open the file in the Chrome browser. Jupyter Notebook Preview in VS Code gives: SyntaxError: Unexpected string in JSON at position 239.

    Would appreciate it if this could be fixed. Thanks a lot.

    opened by yucongo 2
  • TypeError: __init__() got an unexpected keyword argument 'num_symbols'

    I'm trying to train the model provided in the script scripts/train/train_en_de.sh and I get the error: TypeError: __init__() got an unexpected keyword argument 'num_symbols'.

    Can you let me know what I am missing?

    opened by daothanhtuan 1
  • ModuleNotFoundError: No module named 'apply_bpe'

    Hello, the problem I found is that even after installing the subword-nmt library, running the code below (from Generate_Caption.ipynb) still raises an error:

    checkpoint = load_url('https://dl.dropboxusercontent.com/s/05dvriaiqk74cum/caption_resnet50-4c0fa803.pth', map_location={'gpu:0':'cpu'})

    The error is:

      File "/home/noone/anaconda3/envs/tensorflow_3_180/lib/python3.6/site-packages/torch/serialization.py", line 469, in _load
        result = unpickler.load()
    ModuleNotFoundError: No module named 'apply_bpe'

    My environment is python 3.6 and I'm using conda.

    I don't know why, and hope you could help with that. Thanks in advance.

    opened by yangysc 1
  • How to retrieve the final context vector from the encoder?

    Hi, I want to encode a small document into a vector using the RNN encoder. Can you kindly tell me how to get the final context vector representation of a document?

    opened by shahbazsyed 1
  • ImportError: cannot import name 'PermutedSequenceGenerator'

    When running main.py, I get the following error:

      File "main.py", line 15, in <module>
        from seq2seq import models, datasets
      File "seq2seq.pytorch\seq2seq\models\__init__.py", line 1, in <module>
        from .transformer import Transformer, TransformerAttentionDecoder, TransformerAttentionEncoder
      File "seq2seq.pytorch\seq2seq\models\transformer.py", line 6, in <module>
        from .seq2seq_base import Seq2Seq
      File "seq2seq.pytorch\seq2seq\models\seq2seq_base.py", line 8, in <module>
        from seq2seq.tools.beam_search import SequenceGenerator, PermutedSequenceGenerator
    ImportError: cannot import name 'PermutedSequenceGenerator'
    

    I'm not sure whether it needs to be imported for training to work. Thanks

    opened by kevaday 1
  • ModuleNotFound Error

    Hi Elad,

    I am using your code for my task. I get an error: ModuleNotFoundError: No module named 'seq2seq.tools.utils.log'. I already executed python setup.py develop, but after that it still shows the same error.

    Best Arunav

    opened by Shandilya21 3
  • module 'torch.nn' has no attribute 'MultiheadAttention'

    Hi, in the file modules/attention.py, the class MultiHeadAttention(nn.MultiheadAttention) raises an error:

    class MultiHeadAttention(nn.MultiheadAttention): AttributeError: module 'torch.nn' has no attribute 'MultiheadAttention'

    I am using pytorch 0.4.1. Is there a version of torch that contains nn.MultiheadAttention?

    opened by chenQ1114 1
  • A question on the beam search

    In this line, https://github.com/eladhoffer/seq2seq.pytorch/blob/348276b2fcc5a60352b3dccaae7102666dbbd4ac/seq2seq/tools/beam_search.py#L216, decreasing num_hyp increases the value of k, which then leads to an out-of-bounds access in words[idx][k]. I don't understand why you use num_hyp here?

    opened by wjb123 0
  • How do we use scripts under the train folder

    Hi. I would like to know how to train these models. When I ran train_en_de.sh, I got these errors:

    Traceback (most recent call last):
      File "/media/vivien/A/NEW-SMT/seq2seq.pytorch-master/main.py", line 15, in <module>
        from seq2seq.tools.utils.log import setup_logging
    ModuleNotFoundError: No module named 'seq2seq.tools.utils.log'

    (the same traceback is printed four times)

    opened by liperrino 1
Owner

Elad Hoffer