A PyTorch implementation of the Transformer model in "Attention is All You Need".

Overview

Attention is all you need: A Pytorch Implementation

This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017).

A novel sequence-to-sequence framework that uses the self-attention mechanism instead of convolution operations or recurrent structures, and achieves state-of-the-art performance on the WMT 2014 English-to-German translation task. (2017/06/12)

The official TensorFlow implementation can be found at: tensorflow/tensor2tensor.

To learn more about the self-attention mechanism, you can read "A Structured Self-attentive Sentence Embedding".
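
As a rough illustration of the self-attention mechanism mentioned above, here is a minimal sketch of scaled dot-product attention in plain Python. It is not this project's code: the function names and the list-of-lists representation are assumptions made for readability, and it leaves out the learned projections, multiple heads, masking, and dropout that a full Transformer layer uses.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q: len_q x d_k, k: len_k x d_k, v: len_k x d_v (lists of lists).
    Returns (output, attention_weights).
    """
    d_k = len(k[0])
    d_v = len(v[0])
    output, weights = [], []
    for q_i in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of the value vectors.
        output.append(
            [sum(w * v_j[c] for w, v_j in zip(attn, v)) for c in range(d_v)]
        )
    return output, weights


# Self-attention: queries, keys, and values all come from the same sequence.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)
print(attn)
```

Each row of `attn` is a probability distribution over the input positions, so every output position is a weighted mix of all positions in the sequence.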

The project now supports training and translation with a trained model.

Note that this project is still a work in progress.

BPE related parts are not yet fully tested.

If you have any suggestions or find an error, feel free to file an issue to let me know. :)

Usage

WMT'16 Multimodal Translation: de-en

An example of training for the WMT'16 Multimodal Translation task (http://www.statmt.org/wmt16/multimodal-task.html).

0) Download the spacy language model.

# conda install -c conda-forge spacy 
python -m spacy download en
python -m spacy download de

1) Preprocess the data with torchtext and spacy.

python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl
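
To give an idea of what a shared vocabulary (`-share_vocab`) involves, here is a small sketch in plain Python. It is an illustration under stated assumptions, not the actual preprocess.py logic: the helper names `build_vocab` and `merge_vocabs` and the special-token strings are hypothetical, the input is assumed to be already tokenized, and the `min_word_count=3` default is taken from the `Namespace(...)` output quoted in the comments further down.

```python
from collections import Counter

# Assumed special tokens (padding, unknown, begin/end of sentence).
SPECIALS = ["<blank>", "<unk>", "<s>", "</s>"]


def build_vocab(sentences, min_word_count=3):
    """Keep every token that appears at least min_word_count times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    kept = sorted(w for w, c in counts.items() if c >= min_word_count)
    return SPECIALS + kept


def merge_vocabs(src_vocab, trg_vocab):
    """Union of both vocabularies, keeping first-seen order without duplicates."""
    merged, seen = [], set()
    for word in src_vocab + trg_vocab:
        if word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


src_sentences = [["ein", "hund"]] * 3 + [["selten"]]
trg_sentences = [["a", "dog"]] * 3
src_vocab = build_vocab(src_sentences)
trg_vocab = build_vocab(trg_sentences)
shared_vocab = merge_vocabs(src_vocab, trg_vocab)
word2idx = {word: idx for idx, word in enumerate(shared_vocab)}
print(len(src_vocab), len(trg_vocab), len(shared_vocab))
```

Because tokens that occur in both languages are stored once, the merged vocabulary is usually smaller than the sum of the two individual vocabularies.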

2) Train the model

python train.py -data_pkl m30k_deen_shr.pkl -log m30k_deen_shr -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400
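
For readers unfamiliar with the `-label_smoothing` option, the following plain-Python sketch shows one common formulation of a label-smoothed cross-entropy loss. It is an assumption-laden illustration rather than this project's loss function: the function names are hypothetical, and `eps=0.1` is the value used in the paper, not necessarily the one used here.

```python
import math


def smoothed_targets(gold, n_class, eps=0.1):
    """Put 1 - eps on the gold class and spread eps over the other classes."""
    off_value = eps / (n_class - 1)
    return [1.0 - eps if i == gold else off_value for i in range(n_class)]


def smoothed_cross_entropy(logits, gold, eps=0.1):
    """Cross-entropy between the smoothed target and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    target = smoothed_targets(gold, len(logits), eps)
    return -sum(t * lp for t, lp in zip(target, log_probs))


logits = [1.0, 2.0, 3.0]
print(smoothed_cross_entropy(logits, gold=2, eps=0.0))  # plain cross-entropy
print(smoothed_cross_entropy(logits, gold=2, eps=0.1))  # smoothed variant
```

With `eps=0.0` the function reduces to the ordinary cross-entropy; a positive `eps` penalizes over-confident predictions.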

3) Test the model

python translate.py -data_pkl m30k_deen_shr.pkl -model trained.chkpt -output prediction.txt

[(WIP)] WMT'17 Multimodal Translation: de-en w/ BPE

1) Download and preprocess the data with bpe:

Since the interfaces are not unified, you need to switch the main function call in preprocess.py from main_wo_bpe to main.

python preprocess.py -raw_dir /tmp/raw_deen -data_dir ./bpe_deen -save_data bpe_vocab.pkl -codes codes.txt -prefix deen
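
The `-codes` file holds the learned BPE merge operations. As background, here is a toy sketch of how byte pair encoding learns merges. It is not the subword-nmt implementation used by this project: the function names and the tiny vocabulary are made up for illustration.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged


def learn_bpe(vocab, num_merges):
    """Greedily merge the most frequent pair num_merges times."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Words are split into characters; "</w>" marks the end of a word.
toy_vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}
merges, merged_vocab = learn_bpe(toy_vocab, 3)
print(merges)
print(merged_vocab)
```

Frequent character sequences such as the suffix "est" end up as single symbols, which lets the model represent rare words as a few known subword units.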

2) Train the model

python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400

3) Test the model (not ready)

  • TODO:
    • Load vocabulary.
    • Perform decoding after the translation.
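
For the decoding step listed above, a possible sketch is shown below. It assumes the subword-nmt convention of marking a split inside a word with a trailing "@@"; whether this project's BPE files use that separator is an assumption, and `decode_bpe` is a hypothetical helper name.

```python
import re


def decode_bpe(line, separator="@@"):
    """Join subword units back into words by dropping the separator."""
    # Remove "@@ " between subwords and a dangling "@@" at the end of the line.
    pattern = re.escape(separator) + r"( |$)"
    return re.sub(pattern, "", line)


print(decode_bpe("the new@@ est and wid@@ est model"))
```

Applying this to each line of the translation output would restore whole words before any evaluation on the generated text.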

Performance

Training

  • Parameter settings:
    • batch size 256
    • warmup step 4000
    • epoch 200
    • lr_mul 0.5
    • label smoothing
    • do not apply BPE and shared vocabulary
    • target embedding / pre-softmax linear layer weight sharing.
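
The warmup step and lr_mul settings above presumably control a learning-rate schedule like the one in the paper, which increases the rate linearly during warmup and then decays it with the inverse square root of the step number. The sketch below follows the paper's formula with an extra multiplier; the function name, the `d_model=512` default, and the exact way lr_mul enters are assumptions, not a copy of this project's optimizer code.

```python
def transformer_lr(step, d_model=512, n_warmup_steps=4000, lr_mul=0.5):
    """lr = lr_mul * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return lr_mul * d_model ** -0.5 * min(
        step ** -0.5, step * n_warmup_steps ** -1.5
    )


for step in (1, 1000, 4000, 16000):
    print(step, transformer_lr(step))
```

Under these assumptions the rate peaks exactly at the warmup step and has halved again by four times the warmup step.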

Testing

  • coming soon.

TODO

  • Evaluation on the generated text.
  • Attention weight plot.

Acknowledgement

  • The byte pair encoding parts are borrowed from subword-nmt.
  • The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from OpenNMT/OpenNMT-py.
  • Thanks for the suggestions from @srush, @iamalbert, @Zessay, @JulesGM, @ZiJianZhao, and @huanghoujing.
Comments
  • Decoder input

    Hi, I am not sure if you are feeding the right input to the decoder.

    (pg. 2) "Given z, the decoder then generates an output sequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next."

    I believe your decoder input is a batch of target sequences.

    opened by munkim 16
  • Index error during translating

    Hi, I tried to force the GPU selection with CUDA_VISIBLE_DEVICES=1 but it pops an error: RuntimeError: cublas runtime error : library not initialized at /py/conda-bld/pytorch_1490903321756/work/torch/lib/THC/THCGeneral.c:387

    I think it's related to this: https://discuss.pytorch.org/t/cublas-runtime-error-library-not-initialized-at-data-users-soumith-builder-wheel-pytorch-src-torch-lib-thc-thcgeneral-c-383/1375/8

    bug 
    opened by vince62s 13
  • Memory Problem?

    Hi, I cloned your code and trained it on the WMT English-German task, but it failed with "RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCStorage.cu:66". I ran it with the default settings on a Tesla K40, which has the same 12GB memory capacity as your Titan X. I don't know why this happens; do you have any idea? Thanks

    opened by renqianluo 12
  • Masking bug?

    I get 98% accuracy after 10 epochs on the multi30k validation set using this 1-layer model:

    python train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight -dropout 0.0 -n_layers 1 -n_warmup_steps 40 -epoch 50 -d_inner_hid 1 -d_model 128 -d_word_vec 128 -n_head 4
    

    This is a very small model (note -d_inner_hid 1), which should not get good results at all (98% accuracy is way too high in any case). Generating translations with translate.py produces nonsense. This makes me suspect that there is a problem with the masking code that allows the model to 'cheat' by looking at the target sequence.

    I haven't been able to figure out where the problem is, but something seems wrong.

    opened by larspars 12
  • bugs in the masking code

    Hi, I found that in the decoder there is a subsequent mask which masks out the future information here. However, in line 123, you feed in dec_input (which is the target embedding) at the first layer. Now check this line and then the MultiHeadAttention module's forward function: it has a residual connection that lets dec_input directly reach the output, see here. So it does not use the subsequent mask, which means that the model knows the future. Am I correct?

    opened by eriche2016 9
  • embedding of positional encoding?

    Great work and thanks a lot. I wanted to ask why you do embeddings of the pos encoder? https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/1600401b30eacd3747184827fd40d97752a7627b/transformer/Models.py#L55

    I believe the pos encoder should just be added to the input embeddings, like here: https://github.com/Kyubyong/transformer/blob/master/train.py

    Let me know, thanks a lot

    opened by culurciello 7
  • why is the BLEU score of the translate result  so bad?

    I can get 50.8% accuracy on the training set and 40.6% accuracy on the validation set with WMT14 ch-en, but the translation result only gets a 1% BLEU score, and I find that many sentences have the same beginning words or phrases. Have you tested the BLEU score when you got the model?

    [ Epoch 9 ]

    • (Training) ppl: 42.43964, accuracy: 50.856 %, elapse: 98.557 min
    • (Validation) ppl: 33.06820, accuracy: 40.699 %, elapse: 0.026 min
    opened by qtxue 6
  • Why encoder and decoder use "non_pad_mask"?

    https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/20f355eb655bad40195ae302b9d8036716be9a23/transformer/Layers.py#L23

    I think the non_pad_mask is not necessary, because processing of padding is done by attn_mask. Why is it necessary?

    opened by tamuhey 5
  • Error about the mask in ScaledDotProductAttention

    Currently, the attention mask in the ScaledDotProductAttention is generated in Line 28 in Models.py by:

    pad_attn_mask = seq_k.data.eq(Constants.PAD).unsqueeze(1)
    pad_attn_mask = pad_attn_mask.expand(mb_size, len_q, len_k)

    Ignoring the batch dimension for an explanation, I assume the generated pad_attn_mask is a matrix of shape (len_q * len_k), then this code will produce the matrix like [A 1], where 1 is an all one submatrix. However, I think the generated attention mask should be like [B 1 // 1 1], where 1 is an all one submatrix and // means line break (sorry I don't know how to type formula in Markdown environments).

    opened by yangze0930 5
  • nan loss when training

    Training and validation loss is nan (using commit e21800a6):

    $ python3 preprocess.py -train_src data/multi30k/train.en -train_tgt data/multi30k/train.de -valid_src data/multi30k/val.en -valid_tgt data/multi30k/val.de -output data/multi30k/data.pt
    $ python3 train.py -data data/multi30k/data.pt -save_model trained -save_model best
    [ Epoch 0 ]
      - (Training)   loss:      nan, accuracy: 3.7 %
      - (Validation) loss:      nan, accuracy: 10.0 %
        - [Info] The checkpoint file has been updated.
    [ Epoch 1 ]
      - (Training)   loss:      nan, accuracy: 9.09 %
      - (Validation) loss:      nan, accuracy: 9.87 %
    [ Epoch 2 ]
      - (Training)   loss:      nan, accuracy: 9.09 %
      - (Validation) loss:      nan, accuracy: 9.83 %
    [ Epoch 3 ]
      - (Training)   loss:      nan, accuracy: 9.1 %
      - (Validation) loss:      nan, accuracy: 9.92 %
    [ Epoch 4 ]
      - (Training)   loss:      nan, accuracy: 9.09 %
      - (Validation) loss:      nan, accuracy: 9.91 %
    
    bug 
    opened by sliedes 5
  • About Position Embedding and mask

    Hi, Huang! As far as I know, we want the pad embedding to be a zero vector, even after the position embedding is added. However, in your new code, the pad embedding is no longer a zero vector once the position embedding is added to the word embedding. Does it matter? What's more, the encoder output is not multiplied by the non-pad mask; will this affect the final result? Thanks for your code! Looking forward to your reply.

    opened by Zessay 4
  • ValueError: Cell is empty

    When I run the following command on Ubuntu, an error occurs:

    (att) guest1@GPU2:~/zjl/attention-is-all-you-need-pytorch-master$ python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl
    Namespace(data_src=None, data_trg=None, keep_case=False, lang_src='de', lang_trg='en', max_len=100, min_word_count=3, save_data='m30k_deen_shr.pkl', share_vocab=True)
    [Info] Get source language vocabulary size: 5375
    [Info] Get target language vocabulary size: 4556
    [Info] Merging two vocabulary ...
    [Info] Get merged vocabulary size: 9521
    [Info] Dumping the processed data to pickle file m30k_deen_shr.pkl
    Traceback (most recent call last):
      File "preprocess.py", line 337, in <module>
        main_wo_bpe()
      File "preprocess.py", line 332, in main_wo_bpe
        pickle.dump(data,f)
      File "/home/guest1/anaconda3/envs/att/lib/python3.6/site-packages/dill/_dill.py", line 267, in dump
        Pickler(file, protocol, **_kwds).dump(obj)
      File "/home/guest1/anaconda3/envs/att/lib/python3.6/site-packages/dill/_dill.py", line 454, in dump
        StockPickler.dump(self, obj)
    ......
      File "/home/guest1/anaconda3/envs/att/lib/python3.6/pickle.py", line 476, in save
        f(self, obj) # Call unbound method with explicit self
      File "/home/guest1/anaconda3/envs/att/lib/python3.6/site-packages/dill/_dill.py", line 1177, in save_cell
        f = obj.cell_contents
    ValueError: Cell is empty

    How can I solve it?

    opened by Kznnd 0
  • Bump tensorflow from 1.14.0 to 2.9.3

    Bumps tensorflow from 1.14.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This releases introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • preprocess error

    I have some doubts about the preprocessing step. Does the spacy language model in the code correspond to the four datasets (train, test, val, and split) of the WMT16 multimodal translation task, and through which command line does preprocess.py import these datasets? Also, running my preprocess.py produces errors. If you can clear up my confusion, I would be grateful!

    opened by zhoup150344 1
  • OverflowError

    When I run the training code, it does not indicate the specific error location, only the error is:

    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    OverflowError: value too large to convert to uint32_t
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    OverflowError: value too large to convert to uint32_t
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    OverflowError: value too large to convert to uint32_t
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    OverflowError: value too large to convert to uint32_t
    OverflowError: value too large to convert to uint32_t
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    OverflowError: value too large to convert to uint32_t
    OverflowError: value too large to convert to uint32_t
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    OverflowError: value too large to convert to uint32_t
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    ValueError: bytes length not a multiple of item size
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    ValueError: bytes length not a multiple of item size
    OverflowError: value too large to convert to uint32_t
    Exception ignored in: 'preshed.bloom.bloom_from_bytes'
    OverflowError: value too large to convert to uint32_t
    Segmentation fault

    how should i handle this?

    opened by Daming-TF 0
  • download dataset error

    Hello, I want to download the WMT'17 data with your code, but I failed. Could you tell me how to solve this problem? Thank you so much.

    The error is as follows:

    Already downloaded and extracted http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz.
    Already downloaded and extracted http://data.statmt.org/wmt17/translation-task/dev.tgz.
    Downloading from http://storage.googleapis.com/tf-perf-public/official_transformer/test_data/newstest2014.tgz to newstest2014.tgz.
    newstest2014.tgz: 0.00B [00:00, ?B/s]
    Traceback (most recent call last):
      File "preprocess.py", line 336, in <module>
        main()
      File "preprocess.py", line 187, in main
        raw_test = get_raw_files(opt.raw_dir, _TEST_DATA_SOURCES)
      File "preprocess.py", line 100, in get_raw_files
        src_file, trg_file = download_and_extract(raw_dir, d["url"], d["src"], d["trg"])
      File "preprocess.py", line 71, in download_and_extract
        compressed_file = _download_file(download_dir, url)
      File "preprocess.py", line 93, in _download_file
        urllib.request.urlretrieve(url, filename=filename, reporthook=t.update_to)
      File "/usr/local/lib/python3.7/urllib/request.py", line 247, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/local/lib/python3.7/urllib/request.py", line 531, in open
        response = meth(req, response)
      File "/usr/local/lib/python3.7/urllib/request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/local/lib/python3.7/urllib/request.py", line 569, in error
        return self._call_chain(*args)
      File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/usr/local/lib/python3.7/urllib/request.py", line 649, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

    opened by qimg412 2
  • My question

    if trg_emb_prj_weight_sharing:
        # Share the weight between target word embedding & last dense layer
        self.trg_word_prj.weight = self.decoder.trg_word_emb.weight
    if emb_src_trg_weight_sharing:
        self.encoder.src_word_emb.weight = self.decoder.trg_word_emb.weight
    

    The code above is meant to implement weight sharing, but I'm confused: as I understand it, the embedding layer and the linear layer have weights of different shapes. How can this assignment work?

    opened by Messiz 2
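
On the weight-sharing question above, the short PyTorch sketch below may help. It assumes the projection layer is an `nn.Linear(d_model, n_vocab, bias=False)` and the embedding an `nn.Embedding(n_vocab, d_model)`; the sizes and variable names are made up for illustration. `nn.Linear` stores its weight as (out_features, in_features), so under this assumption both weights have shape (n_vocab, d_model) and one can be assigned to the other.

```python
import torch
import torch.nn as nn

n_vocab, d_model = 1000, 512  # hypothetical sizes

# Embedding weight: (num_embeddings, embedding_dim) = (n_vocab, d_model)
trg_word_emb = nn.Embedding(n_vocab, d_model)

# Linear weight: (out_features, in_features) = (n_vocab, d_model)
trg_word_prj = nn.Linear(d_model, n_vocab, bias=False)

print(trg_word_emb.weight.shape, trg_word_prj.weight.shape)

# Both layers now point to the same Parameter object.
trg_word_prj.weight = trg_word_emb.weight

dec_output = torch.randn(2, 7, d_model)  # (batch, seq_len, d_model)
logits = trg_word_prj(dec_output)        # (batch, seq_len, n_vocab)
print(logits.shape)
```

After the assignment, a gradient update to either layer changes both, because they share a single tensor.
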
Owner
Yu-Hsiang Huang
Natural Language Processing Lab. National Taiwan University. Deep Learning enthusiast.