# Simple Transformer
I've written a series of articles on the transformer architecture and language models on Medium.
This repository contains an implementation of the Transformer architecture presented in the paper Attention Is All You Need by Ashish Vaswani et al.
My goal is to write an implementation that is easy to understand and to dig into the nitty-gritty details, where the devil is.
## Python environment
You can use any Python virtual environment, such as venv or conda.
For example, with venv:
```bash
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -e .
```
## spaCy Tokenizer Data Preparation
To use spaCy's tokenizers, make sure to download the required language models.
For example, the English and German models can be downloaded as below:
```bash
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
```
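For reference, the downloaded models can be used for tokenization roughly like this (a minimal sketch, not the repository's actual tokenization code):

```python
# Minimal sketch: using the downloaded spaCy pipelines for tokenization.
# Illustrative only; the project's own tokenization code may differ.
import spacy

spacy_de = spacy.load("de_core_news_sm")
spacy_en = spacy.load("en_core_web_sm")

def tokenize_de(text):
    """Split German text into a list of token strings."""
    return [tok.text for tok in spacy_de.tokenizer(text)]

def tokenize_en(text):
    """Split English text into a list of token strings."""
    return [tok.text for tok in spacy_en.tokenizer(text)]

print(tokenize_en("A little girl is climbing into a wooden playhouse."))
```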
## Text Data from Torchtext
This project uses text datasets from Torchtext.

```python
from torchtext import datasets
```

The default configuration uses the Multi30k dataset.
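As a rough illustration (assuming the datapipe-style torchtext dataset API; the project's actual data pipeline may differ), Multi30k can be loaded like this:

```python
# Minimal sketch: loading the Multi30k German-English pairs from torchtext.
# Assumes a torchtext version that exposes Multi30k with the
# (split, language_pair) signature; not the repository's actual code.
from torchtext.datasets import Multi30k

train_iter = Multi30k(split="train", language_pair=("de", "en"))

for src, tgt in train_iter:
    print(src)  # German source sentence
    print(tgt)  # English target sentence
    break
```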
## Training
```bash
python train.py config_path
```
The default config path is `config/config.yaml`.
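The config is a plain YAML file, so it can be inspected programmatically like this (a minimal sketch assuming PyYAML; train.py's actual config handling may differ):

```python
# Minimal sketch: reading the training config as a plain dict.
# Assumes PyYAML is installed; not necessarily how train.py loads it.
import yaml

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

print(config)  # dictionary of training settings
```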
It is possible to resume training from a checkpoint:

```bash
python train.py --checkpoint_path runs/20220108-164720-Multi30k-Transformer/checkpoint-010-2.3343.pt
```
You can run `tensorboard` to see the training progress:

```bash
tensorboard --logdir=runs
```

The logs are created under `runs`.
## Test
```bash
python test.py checkpoint_path
```

For example:

```bash
python test.py runs/20220108-164720-Multi30k-Transformer/checkpoint-010-2.3343.pt
```
`config.yaml` is copied to the model folder when training starts, and `test.py` assumes that this config file exists there.
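In other words, the config is expected to sit in the same run folder as the checkpoint; a minimal sketch of locating it (illustrative only, not the actual test.py code):

```python
# Minimal sketch: locating the copied config.yaml next to a checkpoint.
# Illustrative only; not the actual logic in test.py.
from pathlib import Path

checkpoint_path = Path("runs/20220108-164720-Multi30k-Transformer/checkpoint-010-2.3343.pt")
config_path = checkpoint_path.parent / "config.yaml"

assert config_path.exists(), "expected config.yaml in the same run folder"
```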
## Unit tests

There are some unit tests in the `tests` folder.

```bash
pytest tests
```
## References
- The Annotated Transformer by Harvard NLP
- How to code The Transformer in Pytorch by Samuel Lynn-Evans
- The Illustrated Transformer by Jay Alammar
- Transformer Architecture: The Positional Encoding by Amirhossein Kazemnejad
- Transformers without Tears: Improving the Normalization of Self-Attention by Toan Q. Nguyen & Julian Salazar
- Tensor2Tensor by TensorFlow
- PyTorch Transformer by PyTorch
- Language Modeling with nn.Transformer and Torchtext by PyTorch
- My Medium Articles