Transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

Overview

[UPDATED] A TensorFlow Implementation of Attention Is All You Need

When I opened this repository in 2017, there was no official code yet. I tried to implement the paper as I understood, but to no surprise it had several bugs. I realized them mostly thanks to people who issued here, so I'm very grateful to all of them. Though there is the official implementation as well as several other unofficial github repos, I decided to update my own one. This update focuses on:

  • readable / understandable code writing
  • modularization (but not too much)
  • revising known bugs. (masking, positional encoding, ...)
  • updating to TF1.12. (tf.data, ...)
  • adding some missing components (bpe, shared weight matrix, ...)
  • including useful comments in the code.

I still stick to IWSLT 2016 de-en. I guess if you'd like to test on a big data such as WMT, you would rely on the official implementation. After all, it's pleasant to check quickly if your model works. The initial code for TF1.2 is moved to the tf1.2_lecacy folder for the record.

Requirements

  • python==3.x (Let's move on to python 3 if you still use python 2)
  • tensorflow==1.12.0
  • numpy>=1.15.4
  • sentencepiece==0.1.8
  • tqdm>=4.28.1

Training

bash download.sh

It should be extracted to iwslt2016/de-en folder automatically.

  • STEP 2. Run the command below to create preprocessed train/eval/test data.
python prepro.py

If you want to change the vocabulary size (default:32000), do this.

python prepro.py --vocab_size 8000

It should create two folders iwslt2016/prepro and iwslt2016/segmented.

  • STEP 3. Run the following command.
python train.py

Check hparams.py to see which parameters are possible. For example,

python train.py --logdir myLog --batch_size 256 --dropout_rate 0.5
  • STEP 3. Or download the pretrained models.
wget https://dl.dropbox.com/s/4lom1czy5xfzr4q/log.zip; unzip log.zip; rm log.zip

Training Loss Curve

Learning rate

Bleu score on devset

Inference (=test)

  • Run
python test.py --ckpt log/1/iwslt2016_E19L2.64-29146 (OR yourCkptFile OR yourCkptFileDirectory)

Results

  • Typically, machine translation is evaluated with Bleu score.
  • All evaluation results are available in eval/1 and test/1.
tst2013 (dev) tst2014 (test)
28.06 23.88

Notes

  • Beam decoding will be added soon.
  • I'm going to update the code when TF2.0 comes out if possible.
Comments
  • python test.py --ckpt log/1 --d_model=512 --d_ff=1024  Report an error

    python test.py --ckpt log/1 --d_model=512 --d_ff=1024 Report an error

    After I finished training, I ran the test.py file and reported the following error. I have not solved it, I hope I can get your help. @Kyubyong

    ` INFO:tensorflow:Restoring parameters from log/1/model_E04-L4.76-49080 INFO:root:# get hypotheses

    Traceback (most recent call last): File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call return fn(*args) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[125,100] = 100 is not in [0, 100) [[{{node encoder/positional_encoding/embedding_lookup}}]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "test.py", line 54, in hypotheses = get_hypotheses(num_test_batches, num_test_samples, sess, y_hat, m.idx2token) File "/mnt/data/xxxx/transformer-master/utils.py", line 144, in get_hypotheses h = sess.run(tensor) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run run_metadata_ptr) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run feed_dict_tensor, options, run_metadata) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run run_metadata) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[125,100] = 100 is not in [0, 100) [[node encoder/positional_encoding/embedding_lookup (defined at /mnt/data/xxxx/transformer-master/modules.py:295) ]]

    Caused by op 'encoder/positional_encoding/embedding_lookup', defined at: File "test.py", line 41, in y_hat, _ = m.eval(xs, ys) File "/mnt/data/xxxx/transformer-master/model.py", line 163, in eval memory, sents1 = self.encode(xs, False) File "/mnt/data/xxxx/transformer-master/model.py", line 50, in encode enc += positional_encoding(enc, self.hp.maxlen1) File "/mnt/data/xxxx/transformer-master/modules.py", line 295, in positional_encoding outputs = tf.nn.embedding_lookup(position_enc, position_ind) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 316, in embedding_lookup transform_fn=None) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 133, in _embedding_lookup_and_transform result = _clip(array_ops.gather(params[0], ids, name=name), File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper return target(*args, **kwargs) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 3273, in gather return gen_array_ops.gather_v2(params, indices, axis, name=name) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3748, in gather_v2 "GatherV2", params=params, indices=indices, axis=axis, name=name) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op op_def=op_def) File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in init self._traceback = tf_stack.extract_stack()

    InvalidArgumentError (see above for traceback): indices[125,100] = 100 is not in [0, 100) [[node encoder/positional_encoding/embedding_lookup (defined at /mnt/data/xxxx /transformer-master/modules.py:295) ]] `

    opened by Single430 13
  • does the key masking work?

    does the key masking work?

    Hi @Kyubyong as you can see the key masking code as following:

    # Key Masking key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k) key_masks = tf.tile(key_masks, [num_heads, 1]) # (h*N, T_k) key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(queries)[1], 1]) # (h*N, T_q, T_k)

    the params keys,is the sum of word_embedding and position_embedding. it means that even the word in a sentence is padding 0, as add postion_embedding to the word_embedding, there's no 0 vector for the final word_embedding. therefore, the key_masks must all be one, no zero! so I'm confused if the code works?

    opened by liuwei1206 6
  • training data was used to eval

    training data was used to eval

    in train.py line 77 _, _eval_summaries = sess.run([eval_init_op, eval_summaries]) this line will execute eval_summaries with train data, do:

    sess.run(eval_init_op)       
    _eval_summaries = sess.run(eval_summaries)
    

    instead

    opened by hihell 3
  • why

    why "split" to get multi-head?

    as the paper said or in some other implementation: self.w_qs = nn.Linear(d_model, n_head * d_k)
    the data size is larger. but in this project, it is

           Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0) # (h*N, T_q, C/h) 
           K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0) # (h*N, T_k, C/h) 
           V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0) # (h*N, T_k, C/h)
    

    it's like using partial of Q/K/V to form one head. Can anyone help to explain why it uses "split" and "concat" to get multi-head?

    Thanks!

    opened by LifangD 3
  • Are the projection layers among multiple blocks shared?

    Are the projection layers among multiple blocks shared?

    Hi, I have a question about the codes.

            # Linear projections
            Q = tf.layers.dense(queries, num_units, activation=tf.nn.relu) # (N, T_q, C)
            K = tf.layers.dense(keys, num_units, activation=tf.nn.relu) # (N, T_k, C)
            V = tf.layers.dense(keys, num_units, activation=tf.nn.relu) # (N, T_k, C)
    

    Is there a mechanism that tied these three layers between multiple blocks? It seems their parameters are not shared between different blocks. What should i do to tie them?

    Thanks!

    opened by haoransh 3
  • Bump tensorflow from 1.12.0 to 2.5.3

    Bump tensorflow from 1.12.0 to 2.5.3

    Bumps tensorflow from 1.12.0 to 2.5.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.5.3

    Release 2.5.3

    Note: This is the last release in the 2.5 series.

    This releases introduces several vulnerability fixes:

    • Fixes a floating point division by 0 when executing convolution operators (CVE-2022-21725)
    • Fixes a heap OOB read in shape inference for ReverseSequence (CVE-2022-21728)
    • Fixes a heap OOB access in Dequantize (CVE-2022-21726)
    • Fixes an integer overflow in shape inference for Dequantize (CVE-2022-21727)
    • Fixes a heap OOB access in FractionalAvgPoolGrad (CVE-2022-21730)
    • Fixes an overflow and divide by zero in UnravelIndex (CVE-2022-21729)
    • Fixes a type confusion in shape inference for ConcatV2 (CVE-2022-21731)
    • Fixes an OOM in ThreadPoolHandle (CVE-2022-21732)
    • Fixes an OOM due to integer overflow in StringNGrams (CVE-2022-21733)
    • Fixes more issues caused by incomplete validation in boosted trees code (CVE-2021-41208)
    • Fixes an integer overflows in most sparse component-wise ops (CVE-2022-23567)
    • Fixes an integer overflows in AddManySparseToTensorsMap (CVE-2022-23568)
    • Fixes a number of CHECK-failures in MapStage (CVE-2022-21734)
    • Fixes a division by zero in FractionalMaxPool (CVE-2022-21735)
    • Fixes a number of CHECK-fails when building invalid/overflowing tensor shapes (CVE-2022-23569)
    • Fixes an undefined behavior in SparseTensorSliceDataset (CVE-2022-21736)
    • Fixes an assertion failure based denial of service via faulty bin count operations (CVE-2022-21737)
    • Fixes a reference binding to null pointer in QuantizedMaxPool (CVE-2022-21739)
    • Fixes an integer overflow leading to crash in SparseCountSparseOutput (CVE-2022-21738)
    • Fixes a heap overflow in SparseCountSparseOutput (CVE-2022-21740)
    • Fixes an FPE in BiasAndClamp in TFLite (CVE-2022-23557)
    • Fixes an FPE in depthwise convolutions in TFLite (CVE-2022-21741)
    • Fixes an integer overflow in TFLite array creation (CVE-2022-23558)
    • Fixes an integer overflow in TFLite (CVE-2022-23559)
    • Fixes a dangerous OOB write in TFLite (CVE-2022-23561)
    • Fixes a vulnerability leading to read and write outside of bounds in TFLite (CVE-2022-23560)
    • Fixes a set of vulnerabilities caused by using insecure temporary files (CVE-2022-23563)
    • Fixes an integer overflow in Range resulting in undefined behavior and OOM (CVE-2022-23562)
    • Fixes a vulnerability where missing validation causes tf.sparse.split to crash when axis is a tuple (CVE-2021-41206)
    • Fixes a CHECK-fail when decoding resource handles from proto (CVE-2022-23564)
    • Fixes a CHECK-fail with repeated AttrDef (CVE-2022-23565)
    • Fixes a heap OOB write in Grappler (CVE-2022-23566)
    • Fixes a CHECK-fail when decoding invalid tensors from proto (CVE-2022-23571)
    • Fixes an unitialized variable access in AssignOp (CVE-2022-23573)
    • Fixes an integer overflow in OpLevelCostEstimator::CalculateTensorSize (CVE-2022-23575)
    • Fixes an integer overflow in OpLevelCostEstimator::CalculateOutputSize (CVE-2022-23576)
    • Fixes a null dereference in GetInitOp (CVE-2022-23577)
    • Fixes a memory leak when a graph node is invalid (CVE-2022-23578)
    • Fixes an abort caused by allocating a vector that is too large (CVE-2022-23580)
    • Fixes multiple CHECK-failures during Grappler's IsSimplifiableReshape (CVE-2022-23581)
    • Fixes multiple CHECK-failures during Grappler's SafeToRemoveIdentity (CVE-2022-23579)
    • Fixes multiple CHECK-failures in TensorByteSize (CVE-2022-23582)
    • Fixes multiple CHECK-failures in binary ops due to type confusion (CVE-2022-23583)

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.5.3

    This releases introduces several vulnerability fixes:

    • Fixes a floating point division by 0 when executing convolution operators (CVE-2022-21725)
    • Fixes a heap OOB read in shape inference for ReverseSequence (CVE-2022-21728)
    • Fixes a heap OOB access in Dequantize (CVE-2022-21726)
    • Fixes an integer overflow in shape inference for Dequantize (CVE-2022-21727)
    • Fixes a heap OOB access in FractionalAvgPoolGrad (CVE-2022-21730)
    • Fixes an overflow and divide by zero in UnravelIndex (CVE-2022-21729)
    • Fixes a type confusion in shape inference for ConcatV2 (CVE-2022-21731)
    • Fixes an OOM in ThreadPoolHandle (CVE-2022-21732)
    • Fixes an OOM due to integer overflow in StringNGrams (CVE-2022-21733)
    • Fixes more issues caused by incomplete validation in boosted trees code (CVE-2021-41208)
    • Fixes an integer overflows in most sparse component-wise ops (CVE-2022-23567)
    • Fixes an integer overflows in AddManySparseToTensorsMap (CVE-2022-23568)
    • Fixes a number of CHECK-failures in MapStage (CVE-2022-21734)
    • Fixes a division by zero in FractionalMaxPool (CVE-2022-21735)
    • Fixes a number of CHECK-fails when building invalid/overflowing tensor shapes (CVE-2022-23569)
    • Fixes an undefined behavior in SparseTensorSliceDataset (CVE-2022-21736)
    • Fixes an assertion failure based denial of service via faulty bin count operations (CVE-2022-21737)
    • Fixes a reference binding to null pointer in QuantizedMaxPool (CVE-2022-21739)
    • Fixes an integer overflow leading to crash in SparseCountSparseOutput (CVE-2022-21738)
    • Fixes a heap overflow in SparseCountSparseOutput (CVE-2022-21740)
    • Fixes an FPE in BiasAndClamp in TFLite (CVE-2022-23557)
    • Fixes an FPE in depthwise convolutions in TFLite (CVE-2022-21741)

    ... (truncated)

    Commits
    • 959e9b2 Merge pull request #54213 from tensorflow/fix-sanity-on-r2.5
    • d05fcbc Fix sanity build
    • f2526a0 Merge pull request #54205 from tensorflow/disable-flaky-tests-on-r2.5
    • a5f94df Disable flaky test
    • 7babe52 Merge pull request #54201 from tensorflow/cherrypick-510ae18200d0a4fad797c0bf...
    • 0e5d378 Set Env Variable to override Setuptools new behavior
    • fdd4195 Merge pull request #54176 from tensorflow-jenkins/relnotes-2.5.3-6805
    • 4083165 Update RELEASE.md
    • a2bb7f1 Merge pull request #54185 from tensorflow/cherrypick-d437dec4d549fc30f9b85c75...
    • 5777ea3 Update third_party/icu/workspace.bzl
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
  • Feeding data in that place?

    Feeding data in that place?

    I’m learning your code recently, but here I don’t know where to feed the data and train it.

    I sincerely hope that you can reply!

    `

    summary_writer = tf.summary.FileWriter(hp.logdir, sess.graph)
    sess.run(train_init_op)
    total_steps = hp.num_epochs * num_train_batches
    _gs = sess.run(global_step)
    for i in tqdm(range(_gs, total_steps+1)):
        _, _gs, _summary = sess.run([train_op, global_step, train_summaries])
        epoch = math.ceil(_gs / num_train_batches)
        summary_writer.add_summary(_summary, _gs)
    
        if _gs and _gs % num_train_batches == 0:
            logging.info("epoch {} is done".format(epoch))
            _loss = sess.run(loss) # train loss
    
            logging.info("# test evaluation")
            _, _eval_summaries = sess.run([eval_init_op, eval_summaries])
            summary_writer.add_summary(_eval_summaries, _gs)
    
            logging.info("# get hypotheses")
            hypotheses = get_hypotheses(num_eval_batches, num_eval_samples, sess, y_hat, m.idx2token)
    
            logging.info("# write results")
            model_output = "iwslt2016_E%02dL%.2f" % (epoch, _loss)
            if not os.path.exists(hp.evaldir): os.makedirs(hp.evaldir)
            translation = os.path.join(hp.evaldir, model_output)
            with open(translation, 'w') as fout:
                fout.write("\n".join(hypotheses))
    
            logging.info("# calc bleu score and append it to translation")
            calc_bleu(hp.eval3, translation)
    
            logging.info("# save models")
            ckpt_name = os.path.join(hp.logdir, model_output)
            saver.save(sess, ckpt_name, global_step=_gs)
            logging.info("after training of {} epochs, {} has been saved.".format(epoch, ckpt_name))
    
            logging.info("# fall back to train mode")
            sess.run(train_init_op)
    summary_writer.close()`
    
    opened by Single430 2
  • why normalization variables are trainable

    why normalization variables are trainable

    In function normalize(in modules.py), beta & gamma are set as variables. I don't know why should they be trainable. Couldn't I just use 0. & 1.?

    
    def normalize(inputs,
                  epsilon=1e-8,
                  scope="ln",
                  reuse=None):
        """Applies layer normalization.
    
        Args:
          inputs: A tensor with 2 or more dimensions, where the first dimension has
            `batch_size`.
          epsilon: A floating number. A very small number for preventing ZeroDivision Error.
          scope: Optional scope for `variable_scope`.
          reuse: Boolean, whether to reuse the weights of a previous layer
            by the same name.
    
        Returns:
          A tensor with the same shape and data dtype as `inputs`.
        """
        with tf.variable_scope(scope, reuse=reuse):
            inputs_shape = inputs.get_shape()
            params_shape = inputs_shape[-1:]
    
            mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
            beta = tf.Variable(tf.zeros(params_shape))
            gamma = tf.Variable(tf.ones(params_shape))
            normalized = (inputs - mean) / ((variance + epsilon) ** (.5))
            outputs = gamma * normalized + beta
    
        return outputs
    
    
    opened by RayXu14 2
  • update tensorflow api to 1.8

    update tensorflow api to 1.8

    AttributeError: module 'tensorflow.contrib.linalg' has no attribute 'LinearOperatorTriL'
    

    new tensorflow api

    https://www.tensorflow.org/api_docs/python/tf/linalg/LinearOperatorLowerTriangular

    opened by xu-song 2
  • possible error in positional encoding computation

    possible error in positional encoding computation

    Hi, I was just looking through the positional encoding code, and I see this line: https://github.com/Kyubyong/transformer/blob/37febcef69f6e4757a853df342c5c5fd299039b5/modules.py#L148

    It looks wrong to me. Shouldn't it be something like the following?

     rad_block = tf.div(position_block, tf.pow(10000, tf.div(unit_block, num_units // 2))) 
    
    opened by greaber 2
  • Error for  positional encoding

    Error for positional encoding

    I am trying to run for sinusoid PE, but it throws the following error.

    File "train.py", line 51, in __init__ scope="enc_pe") File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 885, in binary_op_wrapper y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y") File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 836, in convert_to_tensor as_ref=False) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 774, in _TensorTensorConversionFunction (dtype.name, t.dtype.name, str(t))) ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float64: 'Tensor("encoder/enc_pe/embedding_lookup:0", shape=(32, 49, 512), dtype=float64)'

    opened by y12uc231 2
  • Bump tensorflow from 1.12.0 to 2.9.3

    Bump tensorflow from 1.12.0 to 2.9.3

    Bumps tensorflow from 1.12.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This releases introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • about spm.SentencePieceTrainer.Train(train)

    about spm.SentencePieceTrainer.Train(train)

    when I running the prepro.py, it is terminating at spm.SentencePieceTrainer.Train(train).And I get no any information. Has anyone encountered this problem? how to deal with it?

    INFO:root:# Train a joint BPE model with sentencepiece

    Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

    opened by Joll123 1
  • About the query_mask

    About the query_mask

    Source Code:

        padding_num = -2 ** 32 + 1
        if type in ("k", "key", "keys"):
            key_masks = tf.to_float(key_masks)
            key_masks = tf.tile(key_masks, [tf.shape(inputs)[0] // tf.shape(key_masks)[0], 1]) # (h*N, seqlen)
            key_masks = tf.expand_dims(key_masks, 1)  # (h*N, 1, seqlen)
            outputs = inputs + key_masks * padding_num
    

    I think the outputs should be:

        padding_num = -2 ** 32 + 1
        if type in ("k", "key", "keys"):
            key_masks = tf.to_float(key_masks) # (N, T_k)
            key_masks = tf.tile(key_masks, [tf.shape(inputs)[0] // tf.shape(key_masks)[0], 1]) # (h*N, seqlen)
            key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(key_masks)[1], 1]) # (h*N, T_q, seqlen)
            paddings = tf.ones_like(key_masks) * padding_num
            outputs = tf.where(tf.equal(key_masks, 0), paddings, inputs)
    
    opened by bjuthjliu 0
  • tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,100] = 100 is not in [0, 100)

    tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,100] = 100 is not in [0, 100)

    When I run train.py, the code shows xx running to a 5% break and shows the following problem: tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,100] = 100 is not in [0, 100)

    Does anyone know how I can solve this problem?

    opened by keaiyang222 0
  • 多头问题的实现推导不太理解

    多头问题的实现推导不太理解

    1、为啥分成多头后又进行的concat?

    Split and concat

        Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0)  # (h*N, T_q, d_model/h)
        K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0)  # (h*N, T_k, d_model/h)
        V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0)  # (h*N, T_k, d_model/h)
    

    2、为啥下面的这个操作等价于多头的拼接 outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2) # (N, T_q, d_model)

    opened by frostjsy 0
Owner
Kyubyong Park
Lives in Seoul, Korea. Studied Linguistics at SNU and Univ. of Hawaii.
Kyubyong Park
NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

Artefact 114 Dec 15, 2022
Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE

smaller-LaBSE LaBSE(Language-agnostic BERT Sentence Embedding) is a very good method to get sentence embeddings across languages. But it is hard to fi

Jeong Ukjae 13 Sep 2, 2022
Need: Image Search With Python

Need: Image Search The problem is that a user needs to search for a specific ima

Surya Komandooru 1 Dec 30, 2021
Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

Jeong Ukjae 13 Dec 13, 2022
Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

ITTR - Pytorch Implementation of the Hybrid Perception Block (HPB) and Dual-Pruned Self-Attention (DPSA) block from the ITTR paper for Image to Image

Phil Wang 17 Dec 23, 2022
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

Phil Wang 364 Jan 6, 2023
Khandakar Muhtasim Ferdous Ruhan 1 Dec 30, 2021
DeepAmandine is an artificial intelligence that allows you to talk to it for hours, you won't know the difference.

DeepAmandine This is an artificial intelligence based on GPT-3 that you can chat with, it is very nice and makes a lot of jokes. We wish you a good ex

BuyWithCrypto 3 Apr 19, 2022
GSoC'2021 | TensorFlow implementation of Wav2Vec2

GSoC'2021 | TensorFlow implementation of Wav2Vec2

Vasudev Gupta 73 Nov 28, 2022
Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Ankur Dhuriya 10 Oct 13, 2022
A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approaches for achieving this in this repo.

multitask-learning-transformers A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You

Shahrukh Khan 48 Jan 2, 2023
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

Sean Robertson 159 Apr 4, 2022
multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search

multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search

hellonlp 30 Dec 12, 2022
[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

Counterfactual Attention Learning Created by Yongming Rao*, Guangyi Chen*, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for ICCV

Yongming Rao 89 Dec 18, 2022
Modified GPT using average pooling to reduce the softmax attention memory constraints.

NLP-GPT-Upsampling This repository contains an implementation of Open AI's GPT Model. In particular, this implementation takes inspiration from the Ny

WD 1 Dec 3, 2021
null 1 Jun 28, 2022
HAIS_2GNN: 3D Visual Grounding with Graph and Attention

HAIS_2GNN: 3D Visual Grounding with Graph and Attention This repository is for the HAIS_2GNN research project. Tao Gu, Yue Chen Introduction The motiv

Yue Chen 1 Nov 26, 2022
End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

Image captioning End-to-end image captioning with EfficientNet-b3 + LSTM with Attention Model is seq2seq model. In the encoder pretrained EfficientNet

null 2 Feb 10, 2022