A PyTorch Implementation of the Transformer: Attention Is All You Need
Our implementation is largely based on the TensorFlow implementation.
Requirements
- NumPy >= 1.11.1
- PyTorch >= 0.3.0
- nltk
- tensorboard-pytorch (build from source)
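If it helps, the pip-installable dependencies can be set up as below (a sketch only; PyTorch installation varies by platform, and tensorboard-pytorch is built from source as noted above):

```
pip install numpy nltk
```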
Why This Project?
I'm new to PyTorch, so I have been implementing some projects with it as practice. Recently, I read the paper Attention Is All You Need and was impressed by the idea. So here it is. I got results similar to those of the original TensorFlow implementation.
Differences with the original paper
I don't intend to replicate the paper exactly. Rather, I aim to implement the main ideas of the paper and verify them in a SIMPLE and QUICK way. In this respect, some parts of my code differ from the paper. Among them are:
- I used the IWSLT 2016 de-en dataset, not the WMT dataset, because the former is much smaller and requires no special preprocessing.
- I constructed the vocabulary from words, not subwords, for simplicity. Of course, you can try BPE or word-piece if you want.
- I parameterized the positional encoding. The paper used a sinusoidal formula, but Noam, one of the authors, says both work. See the discussion on reddit, and the sketch after this list.
- The paper adjusted the learning rate according to the global step. I fixed the learning rate at a small value, 0.0001, simply because training was reasonably fast with the small dataset (only a couple of hours on a single GTX 1060!).
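To make the last two differences concrete, here is a minimal sketch in modern PyTorch (not the exact code in this repo) of a parameterized, i.e. learned, positional encoding, together with the paper's step-dependent learning-rate schedule for comparison; maxlen, hidden_units, and warmup_steps are illustrative names:

```python
import torch
import torch.nn as nn

class LearnedPositionalEncoding(nn.Module):
    """Parameterized positional encoding: each position index gets a
    learned embedding vector, instead of the paper's sinusoidal formula."""
    def __init__(self, maxlen, hidden_units):
        super().__init__()
        self.pos_emb = nn.Embedding(maxlen, hidden_units)

    def forward(self, x):
        # x: (batch, seq_len, hidden_units)
        positions = torch.arange(x.size(1), device=x.device)  # (seq_len,)
        return x + self.pos_emb(positions)  # broadcasts over the batch

def noam_lr(step, hidden_units=512, warmup_steps=4000):
    """The paper's schedule: lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5).
    This repo instead fixes the learning rate at 1e-4."""
    step = max(step, 1)
    return hidden_units ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```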
File description
- hyperparams.py includes all required hyperparameters.
- prepro.py creates vocabulary files for the source and the target.
- data_load.py contains functions for loading and batching data.
- modules.py has all building blocks for the encoder/decoder networks.
- train.py has the model.
- eval.py is for evaluation.
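As a taste of what the building blocks in modules.py do, here is a minimal sketch (illustrative, not the repo's exact code) of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the scaling keeps the dot products from growing large and saturating the softmax:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k). Returns (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5  # (batch, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)  # block disallowed positions
    weights = F.softmax(scores, dim=-1)  # attention distribution over the keys
    return torch.matmul(weights, v)  # weighted sum of the values
```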
Training
- STEP 1. Download the IWSLT 2016 German–English parallel corpus and extract it to the corpora/ folder.
wget -qO- https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz | tar xz; mv de-en corpora
- STEP 2. Adjust the hyperparameters in hyperparams.py if necessary.
- STEP 3. Run prepro.py to generate vocabulary files in the preprocessed folder.
- STEP 4. Run train.py, or download the pretrained weights, put them into the ./models/ folder, and change eval_epoch in hyperparams.py to 18.
- STEP 5. Show the loss and accuracy in TensorBoard:
tensorboard --logdir runs
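For orientation, a hyperparameter file of this kind typically looks like the sketch below. The names and values are illustrative (only lr = 0.0001 and eval_epoch = 18 come from the text above); check the actual hyperparams.py for the real ones:

```python
class Hyperparams:
    """Illustrative hyperparameter container; the real hyperparams.py may differ."""
    # data (hypothetical paths)
    source_train = 'corpora/train.de'
    target_train = 'corpora/train.en'
    # model (sizes from the paper's base configuration)
    hidden_units = 512
    num_blocks = 6
    num_heads = 8
    # training
    batch_size = 32     # illustrative
    lr = 0.0001         # fixed learning rate, as described above
    num_epochs = 20     # illustrative
    eval_epoch = 18     # epoch whose checkpoint eval.py loads (see STEP 4)
```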
Evaluation
- Run eval.py.
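Since nltk is listed in the requirements, the BLEU score below is presumably computed with something like nltk's corpus_bleu; a minimal sketch with hypothetical variables:

```python
from nltk.translate.bleu_score import corpus_bleu

# One entry per test sentence: a list of reference token lists, and a hypothesis.
references = [[['this', 'is', 'a', 'tree']]]
hypotheses = [['so', 'this', 'is', 'a', 'tree']]

score = corpus_bleu(references, hypotheses)
print(100 * score)  # BLEU is conventionally reported on a 0-100 scale
```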
Results
I got a BLEU score of 16.7 (the TensorFlow implementation gets 17.14). Recall that I trained with a small dataset and a limited vocabulary. Some of the evaluation results are as follows; details are available in the results folder.
source: Ich bin nicht sicher was ich antworten soll
expected: I'm not really sure about the answer
got: I'm not sure what I'm going to answer
source: Was macht den Unterschied aus
expected: What makes his story different
got: What makes a difference
source: Vielen Dank
expected: Thank you
got: Thank you
source: Das ist ein Baum
expected: This is a tree
got: So this is a tree