[TensorFlow 2] Attention is all you need (Transformer)
TensorFlow implementation of "Attention is all you need (Transformer)" [1].
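For reference, the core operation of the Transformer [1] is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal TensorFlow sketch of that operation; it is illustrative only and not this repository's `whiteboxlayer`-based implementation.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as defined in [1]."""
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    logits = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)  # (..., S_q, S_k)
    if mask is not None:
        logits += mask * -1e9                 # suppress masked positions before softmax
    weights = tf.nn.softmax(logits, axis=-1)  # the "attention map" visualized below
    return tf.matmul(weights, v), weights

# Shapes matching the MNIST setup described below: sequence length 28, feature dim 18.
q = k = v = tf.random.normal((1, 28, 18))
out, attn = scaled_dot_product_attention(q, k, v)  # out: (1, 28, 18), attn: (1, 28, 28)
```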
Dataset
The MNIST dataset is used to confirm that the Transformer works.
The dataset is processed as follows so that each image can be regarded as a sequence.
- Trim off the side columns of the square image.
  - (H X W) -> (H X W_trim)
  - H (Height) = W (Width) = 28
  - W_trim = 18
- The height axis is regarded as the sequence and the width axis as the feature of each sequence element.
  - (H X W_trim) -> (S X F)
  - S (Sequence) = 28
  - F (Feature) = 18
- Specify the target Y as the reversed sequence of X, so that the target sequence differs from the input sequence (see the sketch after this list).
- In the figure, the target therefore appears upside down.
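A minimal preprocessing sketch under the assumptions above; the centered trim (5 columns from each side) and all variable names are illustrative, not the repository's own pipeline.

```python
import tensorflow as tf

# Illustrative sketch, not the repository's pipeline. Assumes a centered trim.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()

H, W, W_TRIM = 28, 28, 18
side = (W - W_TRIM) // 2                 # trim 5 columns from each side (assumption)

x = x_train.astype("float32") / 255.0    # (N, 28, 28), normalized to [0, 1]
x = x[:, :, side:side + W_TRIM]          # (N, H, W) -> (N, S, F) = (N, 28, 18)
y = x[:, ::-1, :]                        # target: the sequence (height) axis reversed
```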
Results
Training
Generation
| Class | Attention Map | Reconstruction |
|:-----:|:-------------:|:--------------:|
| 0 | | |
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
Requirements
- TensorFlow 2.4.0
- whiteboxlayer 0.2.1
Reference
[1] Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems. 2017.