A PyTorch implementation of the Transformer model in "Attention is All You Need".

Yu-Hsiang Huang

Last update: Jan 4, 2023

Related tags

Deep Learning nlp natural-language-processing deep-learning pytorch attention attention-is-all-you-need

Overview

Attention is all you need: A PyTorch Implementation

This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017).

A novel sequence-to-sequence framework that uses the self-attention mechanism, instead of convolution operations or recurrent structures, and achieves state-of-the-art performance on the WMT 2014 English-to-German translation task. (2017/06/12)

The official TensorFlow implementation can be found in tensorflow/tensor2tensor.

To learn more about the self-attention mechanism, you could read "A Structured Self-attentive Sentence Embedding".
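As a rough illustration of the mechanism, here is a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, for a single unbatched sequence and a single head. It uses plain Python lists rather than PyTorch tensors, and it skips the learned query/key/value projections that the real model applies first; the function names are illustrative and are not taken from this repository.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for one sequence and one attention head."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # One score per key: dot(q, k) scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(a * v[j] for a, v in zip(attn, values))
                        for j in range(len(values[0]))])
    return outputs, weights

# Self-attention: queries, keys and values all come from the same sequence.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)
```

Each row of attn sums to 1, and each output vector is a mixture of all positions in the sequence.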

The project now supports training and translation with a trained model.

Note that this project is still a work in progress.

BPE-related parts are not yet fully tested.

If you have any suggestions or find an error, feel free to open an issue to let me know. :)

Usage

WMT'16 Multimodal Translation: de-en

An example of training for the WMT'16 Multimodal Translation task (http://www.statmt.org/wmt16/multimodal-task.html).

0) Download the spaCy language models.

# conda install -c conda-forge spacy 
python -m spacy download en
python -m spacy download de

1) Preprocess the data with torchtext and spaCy.

python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl

2) Train the model

python train.py -data_pkl m30k_deen_shr.pkl -log m30k_deen_shr -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400
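
The -label_smoothing flag in the command above turns on label smoothing. A minimal sketch of one common formulation follows, assuming a smoothing value eps = 0.1 (the value used in the paper) that is spread uniformly over the non-gold classes. The function names are illustrative and are not taken from train.py, which may differ in detail.

```python
import math

def smoothed_targets(gold_index, n_class, eps=0.1):
    """Target distribution: 1 - eps on the gold class, eps shared by the rest."""
    off_value = eps / (n_class - 1)
    return [1.0 - eps if i == gold_index else off_value for i in range(n_class)]

def log_softmax(logits):
    # log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def smoothed_cross_entropy(logits, gold_index, eps=0.1):
    """Cross entropy between the smoothed targets and the model distribution."""
    log_probs = log_softmax(logits)
    target = smoothed_targets(gold_index, len(logits), eps)
    return -sum(t * lp for t, lp in zip(target, log_probs))

loss = smoothed_cross_entropy([2.0, 0.5, -1.0], gold_index=0)
```

With eps = 0 this reduces to the ordinary cross-entropy loss.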

3) Test the model

python translate.py -data_pkl m30k_deen_shr.pkl -model trained.chkpt -output prediction.txt

[(WIP)] WMT'17 Multimodal Translation: de-en w/ BPE

1) Download and preprocess the data with BPE:

Since the interfaces are not unified, you need to switch the main function call from main_wo_bpe to main.

python preprocess.py -raw_dir /tmp/raw_deen -data_dir ./bpe_deen -save_data bpe_vocab.pkl -codes codes.txt -prefix deen

2) Train the model

python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400

3) Test the model (not ready)

TODO:
- Load vocabulary.
- Perform decoding after the translation.

Performance

Training

Parameter settings:
- batch size 256
- warmup step 4000
- epoch 200
- lr_mul 0.5
- label smoothing
- do not apply BPE and shared vocabulary
- target embedding / pre-softmax linear layer weight sharing.
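
The warmup step and lr_mul settings above feed the learning-rate schedule. Here is a minimal sketch, assuming the schedule from the paper, lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5), scaled by lr_mul, and assuming d_model = 512 (the paper's base model size, which is not stated in the list above). The function name is illustrative.

```python
def transformer_lr(step, d_model=512, n_warmup_steps=4000, lr_mul=0.5):
    """Learning rate at a given optimizer step (steps are counted from 1)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup for the first n_warmup_steps, then inverse-sqrt decay.
    scale = min(step ** -0.5, step * n_warmup_steps ** -1.5)
    return lr_mul * d_model ** -0.5 * scale

# The rate rises until step 4000 and decays afterwards.
for step in (1, 1000, 4000, 16000):
    print(step, transformer_lr(step))
```

Under these assumptions the peak rate, reached at step 4000, is roughly 3.5e-4.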

Testing

coming soon.

TODO

Evaluation on the generated text.
Attention weight plot.

Acknowledgement

The byte pair encoding parts are borrowed from subword-nmt.
The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from OpenNMT/OpenNMT-py.
Thanks for the suggestions from @srush, @iamalbert, @Zessay, @JulesGM, @ZiJianZhao, and @huanghoujing.

Comments

Decoder input

Hi, I am not sure if you are feeding the right input to the decoder.

(pg. 2) "Given z, the decoder then generates an output sequence (y₁, ..., y_m) of symbols one element at a time. At each step the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next."

I believe your decoder input is a batch of target sequences.
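
For context, feeding whole target sequences to the decoder during training is usually done with teacher forcing: the decoder input is the target shifted right by one position, and the prediction at each position is compared with the next token. A minimal sketch, assuming hypothetical begin/end markers; whether this repository follows exactly this scheme is not confirmed here.

```python
BOS, EOS = "<s>", "</s>"  # hypothetical begin/end-of-sequence markers

def teacher_forcing_pair(target_tokens):
    """Split one target sentence into (decoder_input, gold) for training."""
    full = [BOS] + list(target_tokens) + [EOS]
    decoder_input = full[:-1]  # what the decoder sees: shifted right
    gold = full[1:]            # what it must predict at each position
    return decoder_input, gold

dec_in, gold = teacher_forcing_pair(["ein", "Hund", "rennt"])
# Position t sees dec_in[: t + 1] (given a causal mask) and predicts gold[t].
for t, expected in enumerate(gold):
    print(dec_in[: t + 1], "->", expected)
```

Combined with a mask over subsequent positions, this keeps training consistent with auto-regressive generation even though the whole batch is processed in one pass.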

opened by munkim 16
Index error during translating

Hi, I tried to force the GPU selection with CUDA_VISIBLE_DEVICES=1 but it pops an error: RuntimeError: cublas runtime error : library not initialized at /py/conda-bld/pytorch_1490903321756/work/torch/lib/THC/THCGeneral.c:387

I think it's related to this: https://discuss.pytorch.org/t/cublas-runtime-error-library-not-initialized-at-data-users-soumith-builder-wheel-pytorch-src-torch-lib-thc-thcgeneral-c-383/1375/8
bug

opened by vince62s 13
Memory Problem?

Hi, I cloned your code and trained it on the WMT English-German task, but it failed with "RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCStorage.cu:66". I ran it on a Tesla K40, which has the same 12 GB memory capacity as your Titan X, with the default settings. So I don't know why this happens. Do you have any idea? Thanks

opened by renqianluo 12
Masking bug?
I get 98% accuracy after 10 epochs on the multi30k validation set using this 1-layer model:

python train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight -dropout 0.0 -n_layers 1 -n_warmup_steps 40 -epoch 50 -d_inner_hid 1 -d_model 128 -d_word_vec 128 -n_head 4

This is a very small model (note -d_inner_hid 1), which should not get good results at all (98% accuracy is way too high in any case). Generating translations with translate.py produces nonsense. This makes me suspect that there is a problem with the masking code that allows the model to 'cheat' by looking at the target sequence.

I haven't been able to figure out where the problem is, but something seems wrong.
opened by larspars 12
bugs in the masking code

Hi, I found that the decoder has a subsequent mask which masks out future information here. However, in line 123, you feed in dec_input (which is the target embedding) at the first layer. Now check this line and then the MultiHeadAttention module's forward function: it has a residual connection, which makes dec_input directly reach the output, see here. So it does not use the subsequent mask, which means that the model knows the future. Am I correct?
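
To make the discussion concrete, here is a minimal sketch of a subsequent (causal) mask and how it zeroes out attention to later positions. It is written with plain Python lists and illustrative names, and is not the repository's code. Note that a residual connection adds each position's own input to its own output, so by itself it does not mix information across positions; leakage then depends on whether the decoder input is shifted relative to the gold targets.

```python
import math

def subsequent_mask(length):
    """allowed[i][j] is True when query position i may attend to key position j."""
    return [[j <= i for j in range(length)] for i in range(length)]

def masked_softmax(scores, allowed):
    # Disallowed positions get -inf, so exp() maps them to exactly 0.0.
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

mask = subsequent_mask(3)
raw_scores = [0.3, 1.2, 2.5]  # arbitrary scores for one query row
weights = [masked_softmax(raw_scores, row) for row in mask]
# Row 0 attends only to position 0; row 2 attends to all three positions.
```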

opened by eriche2016 9
embedding of positional encoding?

Great work and thanks a lot. I wanted to ask why you do embeddings of the pos encoder? https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/1600401b30eacd3747184827fd40d97752a7627b/transformer/Models.py#L55

I believe the pos encoder should just be added to the input embeddings, like here: https://github.com/Kyubyong/transformer/blob/master/train.py

Let me know, thanks a lot
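
For reference, the fixed sinusoidal encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), added directly to the input embeddings, can be sketched as follows. The function names are illustrative; this is not the code at the linked line.

```python
import math

def sinusoid_table(n_position, d_model):
    """Return an n_position x d_model table of fixed positional encodings."""
    table = []
    for pos in range(n_position):
        row = []
        for j in range(d_model):
            # Dimensions 2i and 2i+1 share the same frequency.
            angle = pos / (10000 ** (2 * (j // 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings, table):
    """Element-wise sum of word embeddings and their positional encodings."""
    return [[e + p for e, p in zip(emb, table[pos])]
            for pos, emb in enumerate(embeddings)]

table = sinusoid_table(n_position=10, d_model=4)
encoded = add_positions([[1.0, 1.0, 1.0, 1.0]], table)
```

Storing such a table in a non-trainable lookup layer and indexing it by position gives the same values as computing them on the fly.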

opened by culurciello 7
Why is the BLEU score of the translation result so bad?
I can get 50.8% accuracy on the training set and 40.6% accuracy on the validation set with WMT14 ch-en, but the translation result only gets a 1% BLEU score, and I find that many sentences begin with the same words or phrases. Have you tested the BLEU score when you got the model?

[ Epoch 9 ]

(Training) ppl: 42.43964, accuracy: 50.856 %, elapse: 98.557 min

(Validation) ppl: 33.06820, accuracy: 40.699 %, elapse: 0.026 min
opened by qtxue 6
Why encoder and decoder use "non_pad_mask"?

https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/20f355eb655bad40195ae302b9d8036716be9a23/transformer/Layers.py#L23

I think the non_pad_mask is not necessary, because processing of padding is done by attn_mask. Why is it necessary?

opened by tamuhey 5
Error about the mask in ScaledDotProductAttention

Currently, the attention mask in the ScaledDotProductAttention is generated in Line 28 in Models.py by:

pad_attn_mask = seq_k.data.eq(Constants.PAD).unsqueeze(1)
pad_attn_mask = pad_attn_mask.expand(mb_size, len_q, len_k)

Ignoring the batch dimension for the sake of explanation, I assume the generated pad_attn_mask is a matrix of shape (len_q, len_k); this code then produces a matrix like [A 1], where 1 is an all-ones submatrix. However, I think the generated attention mask should be like [B 1 // 1 1], where 1 is an all-ones submatrix and // means a line break (sorry, I don't know how to type formulas in Markdown environments).
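
The two mask shapes being compared can be sketched for a single sequence as follows, using plain Python lists and a hypothetical padding index of 0 standing in for Constants.PAD. The first function mirrors the quoted code (only padded keys are hidden, giving [A 1]); the second is the variant proposed in the comment (rows of padded queries are hidden as well, giving [B 1 // 1 1]).

```python
PAD = 0  # hypothetical padding index, standing in for Constants.PAD

def pad_attn_mask(seq_q, seq_k):
    """True marks a (query, key) pair that attention must ignore."""
    key_is_pad = [tok == PAD for tok in seq_k]
    # The same row is repeated for every query: only padded keys are hidden.
    return [list(key_is_pad) for _ in seq_q]

def pad_attn_mask_with_queries(seq_q, seq_k):
    """Variant from the comment: rows of padded queries are masked entirely."""
    return [[q == PAD or k == PAD for k in seq_k] for q in seq_q]

seq = [5, 6, PAD]
key_only = pad_attn_mask(seq, seq)
both = pad_attn_mask_with_queries(seq, seq)
```

One thing to keep in mind with the second variant: a fully masked row that is filled with -inf before the softmax yields NaN, so the outputs at padded query positions are typically handled separately, for example by ignoring them in the loss.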

opened by yangze0930 5

nan loss when training

Training and validation loss is nan (using commit e21800a6):

$ python3 preprocess.py -train_src data/multi30k/train.en -train_tgt data/multi30k/train.de -valid_src data/multi30k/val.en -valid_tgt data/multi30k/val.de -output data/multi30k/data.pt
$ python3 train.py -data data/multi30k/data.pt -save_model trained -save_model best
[ Epoch 0 ]
  - (Training)   loss:      nan, accuracy: 3.7 %
  - (Validation) loss:      nan, accuracy: 10.0 %
    - [Info] The checkpoint file has been updated.
[ Epoch 1 ]
  - (Training)   loss:      nan, accuracy: 9.09 %
  - (Validation) loss:      nan, accuracy: 9.87 %
[ Epoch 2 ]
  - (Training)   loss:      nan, accuracy: 9.09 %
  - (Validation) loss:      nan, accuracy: 9.83 %
[ Epoch 3 ]
  - (Training)   loss:      nan, accuracy: 9.1 %
  - (Validation) loss:      nan, accuracy: 9.92 %
[ Epoch 4 ]
  - (Training)   loss:      nan, accuracy: 9.09 %
  - (Validation) loss:      nan, accuracy: 9.91 %

bug

opened by sliedes 5

About Position Embedding and mask

Hi, Huang! As far as I know, we want the pad embedding to be a zero vector, even after the position embedding is added. However, in your new code, the pad embedding is no longer a zero vector once the position embedding is added to the word embedding. Does it matter? What's more, the encoder output is not multiplied by the non-pad mask. Will this affect the final result? Thanks for your code! Looking forward to your reply.

opened by Zessay 4
ValueError: Cell is empty

When I run the following command on Ubuntu:

python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl

an error occurs as follows:

(att) guest1@GPU2:~/zjl/attention-is-all-you-need-pytorch-master$ python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl
Namespace(data_src=None, data_trg=None, keep_case=False, lang_src='de', lang_trg='en', max_len=100, min_word_count=3, save_data='m30k_deen_shr.pkl', share_vocab=True)
[Info] Get source language vocabulary size: 5375
[Info] Get target language vocabulary size: 4556
[Info] Merging two vocabulary ...
[Info] Get merged vocabulary size: 9521
[Info] Dumping the processed data to pickle file m30k_deen_shr.pkl
Traceback (most recent call last):
  File "preprocess.py", line 337, in <module>
    main_wo_bpe()
  File "preprocess.py", line 332, in main_wo_bpe
    pickle.dump(data,f)
  File "/home/guest1/anaconda3/envs/att/lib/python3.6/site-packages/dill/_dill.py", line 267, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/home/guest1/anaconda3/envs/att/lib/python3.6/site-packages/dill/_dill.py", line 454, in dump
    StockPickler.dump(self, obj)

......

  File "/home/guest1/anaconda3/envs/att/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/guest1/anaconda3/envs/att/lib/python3.6/site-packages/dill/_dill.py", line 1177, in save_cell
    f = obj.cell_contents
ValueError: Cell is empty

How can I solve it?

opened by Kznnd 0
Bump tensorflow from 1.14.0 to 2.9.3
Bumps tensorflow from 1.14.0 to 2.9.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.9.3

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool (CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes an overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes an invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

TensorFlow 2.9.2

Release 2.9.2

This release introduces several vulnerability fixes:

Fixes a CHECK failure in tf.reshape caused by overflows (CVE-2022-35934)

Fixes a CHECK failure in SobolSample caused by missing validation (CVE-2022-35935)

Fixes an OOB read in Gather_nd op in TF Lite (CVE-2022-35937)

Fixes a CHECK failure in TensorListReserve caused by missing validation (CVE-2022-35960)

Fixes an OOB write in Scatter_nd op in TF Lite (CVE-2022-35939)

Fixes an integer overflow in RaggedRangeOp (CVE-2022-35940)

Fixes a CHECK failure in AvgPoolOp (CVE-2022-35941)

Fixes a CHECK failure in UnbatchGradOp (CVE-2022-35952)

Fixes a segfault in the TFLite converter on per-channel quantized transposed convolutions (CVE-2022-36027)

Fixes a CHECK failure in AvgPool3DGrad (CVE-2022-35959)

Fixes a CHECK failure in FractionalAvgPoolGrad (CVE-2022-35963)

Fixes a segfault in BlockLSTMGradV2 (CVE-2022-35964)

Fixes a segfault in LowerBound and UpperBound (CVE-2022-35965)

... (truncated)

Changelog

Sourced from tensorflow's changelog.

Release 2.8.4

This release introduces several vulnerability fixes:

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

... (truncated)

Commits

a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2

258f9a1 Update py_func.cc

cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474

3e75385 Update version numbers to 2.9.3

bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695

3506c90 Update RELEASE.md

8dcb48e Update RELEASE.md

4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...

6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple

5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
preprocess error

I have some doubts about the preprocessing step. Does the spaCy language model in the code correspond to the four datasets (train, test, val, and split) of the WMT16 multimodal translation task, and which command line imports these datasets through preprocess.py? Also, running my preprocess.py produces errors. If you can clear up my confusion, I would be very grateful!

opened by zhoup150344 1
OverflowError

When I run the training code, it does not indicate the specific error location, only the error is: “ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size OverflowError: value too large to convert to uint32_t Exception ignored in: 'preshed.bloom.bloom_from_bytes' OverflowError: value too large to convert to uint32_t ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size OverflowError: value too large to convert to uint32_t Exception ignored in: 'preshed.bloom.bloom_from_bytes' OverflowError: value too large to convert to uint32_t OverflowError: value too large to convert to uint32_t Exception ignored in: 'preshed.bloom.bloom_from_bytes' OverflowError: value too large to convert to uint32_t OverflowError: value too large to convert to uint32_t Exception ignored in: 'preshed.bloom.bloom_from_bytes' 
OverflowError: value too large to convert to uint32_t ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size ValueError: bytes length not a multiple of item size Exception ignored in: 'preshed.bloom.bloom_from_bytes' ValueError: bytes length not a multiple of item size OverflowError: value too large to convert to uint32_t Exception ignored in: 'preshed.bloom.bloom_from_bytes' OverflowError: value too large to convert to uint32_t Segmentation fault ”

How should I handle this?

opened by Daming-TF 0
download dataset error

Hello, I want to download WMT'17 with your code, but it failed. Could you tell me how to solve this problem? Thank you so much.

The error is as follows:

Already downloaded and extracted http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz.
Already downloaded and extracted http://data.statmt.org/wmt17/translation-task/dev.tgz.
Downloading from http://storage.googleapis.com/tf-perf-public/official_transformer/test_data/newstest2014.tgz to newstest2014.tgz.
newstest2014.tgz: 0.00B [00:00, ?B/s]
Traceback (most recent call last):
  File "preprocess.py", line 336, in <module>
    main()
  File "preprocess.py", line 187, in main
    raw_test = get_raw_files(opt.raw_dir, _TEST_DATA_SOURCES)
  File "preprocess.py", line 100, in get_raw_files
    src_file, trg_file = download_and_extract(raw_dir, d["url"], d["src"], d["trg"])
  File "preprocess.py", line 71, in download_and_extract
    compressed_file = _download_file(download_dir, url)
  File "preprocess.py", line 93, in _download_file
    urllib.request.urlretrieve(url, filename=filename, reporthook=t.update_to)
  File "/usr/local/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

opened by qimg412 2

My question

if trg_emb_prj_weight_sharing:
    # Share the weight between target word embedding & last dense layer
    self.trg_word_prj.weight = self.decoder.trg_word_emb.weight
if emb_src_trg_weight_sharing:
    self.encoder.src_word_emb.weight = self.decoder.trg_word_emb.weight

The code above is meant to implement weight sharing, but I'm confused because the embedding layer and the linear layer seem to have weights of different shapes. How can this assignment work?
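
On the shape question: in PyTorch, nn.Embedding(n_vocab, d_model).weight has shape (n_vocab, d_model), and nn.Linear(d_model, n_vocab, bias=False).weight is stored as (out_features, in_features), which is also (n_vocab, d_model), so one matrix can serve both layers. The sketch below illustrates the idea with plain Python lists instead of tensors; the helper names are illustrative, and it assumes the projection layer has no bias.

```python
def embed(weight, token_id):
    """Embedding lookup: pick one row of the (n_vocab, d_model) matrix."""
    return weight[token_id]

def project(weight, hidden):
    """Bias-free linear layer: logits[v] = dot(weight[v], hidden)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]

n_vocab, d_model = 4, 3
# A single matrix object used by both the embedding and the projection.
shared = [[0.1 * (i + j) for j in range(d_model)] for i in range(n_vocab)]

hidden = embed(shared, 2)         # d_model values for token 2
logits = project(shared, hidden)  # n_vocab scores, one per vocabulary entry

# Because both layers read the same object, one update changes both.
shared[2][0] = 1.0
```

Sharing the source and target embeddings in the same way additionally requires both sides to use one vocabulary, which is what -share_vocab provides in the preprocessing step.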

opened by Messiz 2

Owner

Yu-Hsiang Huang
