Transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

Kyubyong Park

Last update: Dec 26, 2022

Overview

[UPDATED] A TensorFlow Implementation of Attention Is All You Need

When I opened this repository in 2017, there was no official code yet. I tried to implement the paper as I understood it, but unsurprisingly it had several bugs. I found most of them thanks to people who opened issues here, so I'm very grateful to all of them. Although there is now an official implementation as well as several other unofficial GitHub repos, I decided to update my own. This update focuses on:

readable / understandable code writing
modularization (but not too much)
revising known bugs. (masking, positional encoding, ...)
updating to TF1.12. (tf.data, ...)
adding some missing components (bpe, shared weight matrix, ...)
including useful comments in the code.

I still stick to IWSLT 2016 de-en. If you'd like to test on a large dataset such as WMT, you would probably rely on the official implementation. After all, it's pleasant to be able to check quickly whether your model works. The initial code for TF1.2 has been moved to the tf1.2_legacy folder for the record.

Requirements

python==3.x (Let's move on to python 3 if you still use python 2)
tensorflow==1.12.0
numpy>=1.15.4
sentencepiece==0.1.8
tqdm>=4.28.1

Training

STEP 1. Run the command below to download IWSLT 2016 German–English parallel corpus.

bash download.sh

It should be extracted to iwslt2016/de-en folder automatically.

STEP 2. Run the command below to create preprocessed train/eval/test data.

python prepro.py

To change the vocabulary size (default: 32000), pass --vocab_size:

python prepro.py --vocab_size 8000

It should create two folders iwslt2016/prepro and iwslt2016/segmented.

STEP 3. Run the following command.

python train.py

Check hparams.py to see which parameters are possible. For example,

python train.py --logdir myLog --batch_size 256 --dropout_rate 0.5

STEP 3 (alternative). Download the pretrained models instead.

wget https://dl.dropbox.com/s/4lom1czy5xfzr4q/log.zip; unzip log.zip; rm log.zip

[Figures: training loss curve, learning rate, and BLEU score on the dev set]

Inference (=test)

python test.py --ckpt log/1/iwslt2016_E19L2.64-29146 (OR yourCkptFile OR yourCkptFileDirectory)

Results

Typically, machine translation is evaluated with the BLEU score.
All evaluation results are available in eval/1 and test/1.

tst2013 (dev)	tst2014 (test)
28.06	23.88

Notes

Beam decoding will be added soon.
I'm going to update the code when TF2.0 comes out if possible.

Comments

python test.py --ckpt log/1 --d_model=512 --d_ff=1024 reports an error

After I finished training, I ran test.py and got the following error. I have not been able to solve it and hope to get your help. @Kyubyong

INFO:tensorflow:Restoring parameters from log/1/model_E04-L4.76-49080
INFO:root:# get hypotheses

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[125,100] = 100 is not in [0, 100)
  [[{{node encoder/positional_encoding/embedding_lookup}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 54, in <module>
    hypotheses = get_hypotheses(num_test_batches, num_test_samples, sess, y_hat, m.idx2token)
  File "/mnt/data/xxxx/transformer-master/utils.py", line 144, in get_hypotheses
    h = sess.run(tensor)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[125,100] = 100 is not in [0, 100)
  [[node encoder/positional_encoding/embedding_lookup (defined at /mnt/data/xxxx/transformer-master/modules.py:295) ]]

Caused by op 'encoder/positional_encoding/embedding_lookup', defined at:
  File "test.py", line 41, in <module>
    y_hat, _ = m.eval(xs, ys)
  File "/mnt/data/xxxx/transformer-master/model.py", line 163, in eval
    memory, sents1 = self.encode(xs, False)
  File "/mnt/data/xxxx/transformer-master/model.py", line 50, in encode
    enc += positional_encoding(enc, self.hp.maxlen1)
  File "/mnt/data/xxxx/transformer-master/modules.py", line 295, in positional_encoding
    outputs = tf.nn.embedding_lookup(position_enc, position_ind)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 316, in embedding_lookup
    transform_fn=None)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 133, in _embedding_lookup_and_transform
    result = _clip(array_ops.gather(params[0], ids, name=name),
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 3273, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3748, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[125,100] = 100 is not in [0, 100)
  [[node encoder/positional_encoding/embedding_lookup (defined at /mnt/data/xxxx/transformer-master/modules.py:295) ]]
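
The message "indices[125,100] = 100 is not in [0, 100)" says that position index 100 was looked up in a table that only has rows 0 to 99. The NumPy sketch below reproduces that situation under the assumption that the positional table is built with maxlen rows, as the call positional_encoding(enc, self.hp.maxlen1) in the traceback suggests. lookup_positions is a hypothetical helper written for illustration, not a function from this repository.

```python
import numpy as np

def lookup_positions(table, seq_len):
    """Gather one row per position 0..seq_len-1, like an embedding lookup."""
    positions = np.arange(seq_len)
    if seq_len > table.shape[0]:
        # Mirrors the wording of the TensorFlow error for readability.
        raise IndexError(
            "position %d is not in [0, %d)" % (seq_len - 1, table.shape[0]))
    return table[positions]

maxlen, d_model = 100, 8              # illustrative sizes only
table = np.zeros((maxlen, d_model))   # stand-in for the positional table

ok = lookup_positions(table, 100)     # positions 0..99 all have a row
try:
    lookup_positions(table, 101)      # position 100 has no row
    failed = False
except IndexError as err:
    failed = True
    message = str(err)
```

If this is indeed the cause, a batch contains a sequence one token longer than the table allows, so either a larger maximum length (hp.maxlen1 in the traceback) or truncating over-long sentences should avoid the failed lookup.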

opened by Single430 13
does the key masking work?

Hi @Kyubyong, the key masking code is as follows:

# Key Masking
key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1)))  # (N, T_k)
key_masks = tf.tile(key_masks, [num_heads, 1])  # (h*N, T_k)
key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(queries)[1], 1])  # (h*N, T_q, T_k)

The keys parameter is the sum of the word embedding and the position embedding. This means that even where a word in a sentence is padding (a zero vector), adding the position embedding to the word embedding leaves no zero vector in the final embedding. Therefore key_masks must be all ones, with no zeros. So I'm confused about whether this code works.
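
The concern can be checked numerically. The sketch below applies the same sign(abs(sum)) formula in NumPy to made-up data; the embedding values and the constant stand-in for the positional encoding are assumptions chosen only to make the effect visible.

```python
import numpy as np

T, C = 5, 4                              # sequence length, model width
word_emb = np.ones((1, T, C))
word_emb[0, 3:] = 0.0                    # last two positions are padding
pos_enc = np.full((1, T, C), 0.5)        # stand-in positional encoding

def key_mask(keys):
    # Same formula as the snippet above: 1 where the summed key is non-zero.
    return np.sign(np.abs(keys.sum(axis=-1)))

mask_before = key_mask(word_emb)             # embeddings alone
mask_after = key_mask(word_emb + pos_enc)    # after adding positions
```

With these values the mask computed on the embeddings alone marks the two padded positions with 0, while the mask computed after adding the positional term is 1 everywhere. A mask derived from the token ids instead would not depend on what has been added to the embeddings.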

opened by liuwei1206 6
training data was used to eval
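In train.py, line 77:

_, _eval_summaries = sess.run([eval_init_op, eval_summaries])

Because both ops are fetched in a single sess.run call, nothing guarantees that the iterator is re-initialized on the eval set before eval_summaries is computed, so the summaries can be computed on training data. Run them in two steps instead:

sess.run(eval_init_op)
_eval_summaries = sess.run(eval_summaries)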
opened by hihell 3
why "split" to get multi-head?
In the paper, and in some other implementations, the projection is written as self.w_qs = nn.Linear(d_model, n_head * d_k), so the projected data is larger. In this project, however, it is:

Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0)  # (h*N, T_q, C/h)
K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0)  # (h*N, T_k, C/h)
V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0)  # (h*N, T_k, C/h)

It looks as if only part of Q/K/V is used to form each head. Can anyone explain why "split" and "concat" are used to get multiple heads?

Thanks!
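
One way to look at this: the split happens after the dense projection, so each chunk is still computed from the full input vector. The NumPy sketch below (random data and a single random weight matrix W, both assumptions for illustration) shows that splitting the output of one (C, C) projection gives the same result as applying h separate (C, C/h) projections.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, C, h = 2, 3, 8, 4                  # batch, length, model width, heads
x = rng.normal(size=(N, T, C))
W = rng.normal(size=(C, C))              # one big projection, like a dense layer

Q = x @ W                                # (N, T, C)
# Split the projected tensor into h chunks and stack them on the batch axis.
Q_ = np.concatenate(np.split(Q, h, axis=2), axis=0)   # (h*N, T, C/h)

# Equivalent view: each head has its own (C, C/h) matrix, a column block of W.
width = C // h
per_head = [x @ W[:, i * width:(i + 1) * width] for i in range(h)]
Q_heads = np.concatenate(per_head, axis=0)            # (h*N, T, C/h)
```

Here Q_ and Q_heads are numerically identical, so when d_k = d_model / h (as in the paper's base configuration) the split form and the per-head projection form describe the same computation.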
opened by LifangD 3
Are the projection layers among multiple blocks shared?
Hi, I have a question about the code.

# Linear projections
Q = tf.layers.dense(queries, num_units, activation=tf.nn.relu)  # (N, T_q, C)
K = tf.layers.dense(keys, num_units, activation=tf.nn.relu)  # (N, T_k, C)
V = tf.layers.dense(keys, num_units, activation=tf.nn.relu)  # (N, T_k, C)

Is there a mechanism that ties these three layers across multiple blocks? It seems their parameters are not shared between different blocks. What should I do to tie them?

Thanks!
opened by haoransh 3
Bump tensorflow from 1.12.0 to 2.5.3
Bumps tensorflow from 1.12.0 to 2.5.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.5.3

Release 2.5.3

Note: This is the last release in the 2.5 series.

This release introduces several vulnerability fixes:

Fixes a floating point division by 0 when executing convolution operators (CVE-2022-21725)

Fixes a heap OOB read in shape inference for ReverseSequence (CVE-2022-21728)

Fixes a heap OOB access in Dequantize (CVE-2022-21726)

Fixes an integer overflow in shape inference for Dequantize (CVE-2022-21727)

Fixes a heap OOB access in FractionalAvgPoolGrad (CVE-2022-21730)

Fixes an overflow and divide by zero in UnravelIndex (CVE-2022-21729)

Fixes a type confusion in shape inference for ConcatV2 (CVE-2022-21731)

Fixes an OOM in ThreadPoolHandle (CVE-2022-21732)

Fixes an OOM due to integer overflow in StringNGrams (CVE-2022-21733)

Fixes more issues caused by incomplete validation in boosted trees code (CVE-2021-41208)

Fixes integer overflows in most sparse component-wise ops (CVE-2022-23567)

Fixes integer overflows in AddManySparseToTensorsMap (CVE-2022-23568)

Fixes a number of CHECK-failures in MapStage (CVE-2022-21734)

Fixes a division by zero in FractionalMaxPool (CVE-2022-21735)

Fixes a number of CHECK-fails when building invalid/overflowing tensor shapes (CVE-2022-23569)

Fixes an undefined behavior in SparseTensorSliceDataset (CVE-2022-21736)

Fixes an assertion failure based denial of service via faulty bin count operations (CVE-2022-21737)

Fixes a reference binding to null pointer in QuantizedMaxPool (CVE-2022-21739)

Fixes an integer overflow leading to crash in SparseCountSparseOutput (CVE-2022-21738)

Fixes a heap overflow in SparseCountSparseOutput (CVE-2022-21740)

Fixes an FPE in BiasAndClamp in TFLite (CVE-2022-23557)

Fixes an FPE in depthwise convolutions in TFLite (CVE-2022-21741)

Fixes an integer overflow in TFLite array creation (CVE-2022-23558)

Fixes an integer overflow in TFLite (CVE-2022-23559)

Fixes a dangerous OOB write in TFLite (CVE-2022-23561)

Fixes a vulnerability leading to read and write outside of bounds in TFLite (CVE-2022-23560)

Fixes a set of vulnerabilities caused by using insecure temporary files (CVE-2022-23563)

Fixes an integer overflow in Range resulting in undefined behavior and OOM (CVE-2022-23562)

Fixes a vulnerability where missing validation causes tf.sparse.split to crash when axis is a tuple (CVE-2021-41206)

Fixes a CHECK-fail when decoding resource handles from proto (CVE-2022-23564)

Fixes a CHECK-fail with repeated AttrDef (CVE-2022-23565)

Fixes a heap OOB write in Grappler (CVE-2022-23566)

Fixes a CHECK-fail when decoding invalid tensors from proto (CVE-2022-23571)

Fixes an uninitialized variable access in AssignOp (CVE-2022-23573)

Fixes an integer overflow in OpLevelCostEstimator::CalculateTensorSize (CVE-2022-23575)

Fixes an integer overflow in OpLevelCostEstimator::CalculateOutputSize (CVE-2022-23576)

Fixes a null dereference in GetInitOp (CVE-2022-23577)

Fixes a memory leak when a graph node is invalid (CVE-2022-23578)

Fixes an abort caused by allocating a vector that is too large (CVE-2022-23580)

Fixes multiple CHECK-failures during Grappler's IsSimplifiableReshape (CVE-2022-23581)

Fixes multiple CHECK-failures during Grappler's SafeToRemoveIdentity (CVE-2022-23579)

Fixes multiple CHECK-failures in TensorByteSize (CVE-2022-23582)

Fixes multiple CHECK-failures in binary ops due to type confusion (CVE-2022-23583)

... (truncated)

Commits

959e9b2 Merge pull request #54213 from tensorflow/fix-sanity-on-r2.5

d05fcbc Fix sanity build

f2526a0 Merge pull request #54205 from tensorflow/disable-flaky-tests-on-r2.5

a5f94df Disable flaky test

7babe52 Merge pull request #54201 from tensorflow/cherrypick-510ae18200d0a4fad797c0bf...

0e5d378 Set Env Variable to override Setuptools new behavior

fdd4195 Merge pull request #54176 from tensorflow-jenkins/relnotes-2.5.3-6805

4083165 Update RELEASE.md

a2bb7f1 Merge pull request #54185 from tensorflow/cherrypick-d437dec4d549fc30f9b85c75...

5777ea3 Update third_party/icu/workspace.bzl

dependencies
opened by dependabot[bot] 2

Where is the data fed in?

I've been studying your code recently, but I can't see where the data is fed in for training.

I sincerely hope you can reply!

summary_writer = tf.summary.FileWriter(hp.logdir, sess.graph)
sess.run(train_init_op)
total_steps = hp.num_epochs * num_train_batches
_gs = sess.run(global_step)
for i in tqdm(range(_gs, total_steps+1)):
    _, _gs, _summary = sess.run([train_op, global_step, train_summaries])
    epoch = math.ceil(_gs / num_train_batches)
    summary_writer.add_summary(_summary, _gs)

    if _gs and _gs % num_train_batches == 0:
        logging.info("epoch {} is done".format(epoch))
        _loss = sess.run(loss) # train loss

        logging.info("# test evaluation")
        _, _eval_summaries = sess.run([eval_init_op, eval_summaries])
        summary_writer.add_summary(_eval_summaries, _gs)

        logging.info("# get hypotheses")
        hypotheses = get_hypotheses(num_eval_batches, num_eval_samples, sess, y_hat, m.idx2token)

        logging.info("# write results")
        model_output = "iwslt2016_E%02dL%.2f" % (epoch, _loss)
        if not os.path.exists(hp.evaldir): os.makedirs(hp.evaldir)
        translation = os.path.join(hp.evaldir, model_output)
        with open(translation, 'w') as fout:
            fout.write("\n".join(hypotheses))

        logging.info("# calc bleu score and append it to translation")
        calc_bleu(hp.eval3, translation)

        logging.info("# save models")
        ckpt_name = os.path.join(hp.logdir, model_output)
        saver.save(sess, ckpt_name, global_step=_gs)
        logging.info("after training of {} epochs, {} has been saved.".format(epoch, ckpt_name))

        logging.info("# fall back to train mode")
        sess.run(train_init_op)
summary_writer.close()

opened by Single430 2

why normalization variables are trainable

In the normalize function (in modules.py), beta and gamma are created as variables. I don't understand why they should be trainable. Couldn't I just use the constants 0. and 1.?


def normalize(inputs,
              epsilon=1e-8,
              scope="ln",
              reuse=None):
    """Applies layer normalization.

    Args:
      inputs: A tensor with 2 or more dimensions, where the first dimension has
        `batch_size`.
      epsilon: A floating number. A very small number for preventing ZeroDivision Error.
      scope: Optional scope for `variable_scope`.
      reuse: Boolean, whether to reuse the weights of a previous layer
        by the same name.

    Returns:
      A tensor with the same shape and data dtype as `inputs`.
    """
    with tf.variable_scope(scope, reuse=reuse):
        inputs_shape = inputs.get_shape()
        params_shape = inputs_shape[-1:]

        mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
        beta = tf.Variable(tf.zeros(params_shape))
        gamma = tf.Variable(tf.ones(params_shape))
        normalized = (inputs - mean) / ((variance + epsilon) ** (.5))
        outputs = gamma * normalized + beta

    return outputs
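
As background for this question, the NumPy sketch below mirrors the arithmetic of normalize on a tiny made-up input. With gamma fixed at 1 and beta at 0, every output row has zero mean and unit variance; trainable values let the network pick a different scale and offset per feature, which is the usual motivation for learning them in layer normalization. layer_norm is an illustrative helper, not code from modules.py.

```python
import numpy as np

def layer_norm(x, gamma, beta, epsilon=1e-8):
    # Normalize over the last axis, then apply a per-feature scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    variance = x.var(axis=-1, keepdims=True)
    normalized = (x - mean) / np.sqrt(variance + epsilon)
    return gamma * normalized + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])

# Constants 1 and 0: the output is pinned to zero mean, unit variance.
fixed = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))

# Other values (here chosen by hand) move the output to a new scale and offset.
rescaled = layer_norm(x, gamma=np.full(4, 2.0), beta=np.full(4, 0.5))
```

Using the constants is a valid simplification, but it removes the layer's ability to undo the normalization where that helps the following layers.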

opened by RayXu14 2

update tensorflow api to 1.8
AttributeError: module 'tensorflow.contrib.linalg' has no attribute 'LinearOperatorTriL'

new tensorflow api

https://www.tensorflow.org/api_docs/python/tf/linalg/LinearOperatorLowerTriangular
opened by xu-song 2
possible error in positional encoding computation
Hi, I was just looking through the positional encoding code, and I see this line: https://github.com/Kyubyong/transformer/blob/37febcef69f6e4757a853df342c5c5fd299039b5/modules.py#L148

It looks wrong to me. Shouldn't it be something like the following?

rad_block = tf.div(position_block, tf.pow(10000, tf.div(unit_block, num_units // 2)))
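
For comparison, the encoding defined in the paper is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The NumPy sketch below implements exactly that formula; sinusoidal_encoding is a name made up for this example, and it assumes an even d_model.

```python
import numpy as np

def sinusoidal_encoding(maxlen, d_model):
    """Return a (maxlen, d_model) table following the paper's formula."""
    pos = np.arange(maxlen)[:, None]                 # (maxlen, 1)
    i = np.arange(d_model)[None, :]                  # (1, d_model)
    # Columns 2i and 2i+1 share the exponent 2i / d_model.
    angle = pos / np.power(10000.0, 2 * (i // 2) / d_model)
    pe = np.zeros((maxlen, d_model), dtype=np.float32)
    pe[:, 0::2] = np.sin(angle[:, 0::2])             # even columns
    pe[:, 1::2] = np.cos(angle[:, 1::2])             # odd columns
    return pe

pe = sinusoidal_encoding(maxlen=50, d_model=16)
```

Position 0 gives 0 in every sine column and 1 in every cosine column, which is a quick sanity check for any implementation of the formula.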
opened by greaber 2
Error for positional encoding

I am trying to run with the sinusoidal positional encoding, but it throws the following error.

  File "train.py", line 51, in __init__
    scope="enc_pe")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 885, in binary_op_wrapper
    y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 836, in convert_to_tensor
    as_ref=False)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 774, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype float64: 'Tensor("encoder/enc_pe/embedding_lookup:0", shape=(32, 49, 512), dtype=float64)'
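
The error says that a float64 tensor coming out of the positional-encoding lookup is being combined with a float32 tensor. A plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the sinusoid table is built from a NumPy array, and NumPy uses float64 by default. The sketch below shows the default and an explicit cast in NumPy.

```python
import numpy as np

table = np.array([[0.0, 1.0], [0.5, 0.25]])       # NumPy defaults to float64
activations = np.ones((2, 2), dtype=np.float32)   # model tensors are float32

table32 = table.astype(np.float32)                # explicit cast before mixing
mixed = activations + table32                     # stays float32
```

On the TensorFlow side the same idea would be to build the table as float32 or to cast the lookup result with tf.cast(..., tf.float32) before adding it to the embeddings.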

opened by y12uc231 2
Bump tensorflow from 1.12.0 to 2.9.3
Bumps tensorflow from 1.12.0 to 2.9.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.9.3

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool (CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes an overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes an invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

TensorFlow 2.9.2

Release 2.9.2

This release introduces several vulnerability fixes:

Fixes a CHECK failure in tf.reshape caused by overflows (CVE-2022-35934)

Fixes a CHECK failure in SobolSample caused by missing validation (CVE-2022-35935)

Fixes an OOB read in Gather_nd op in TF Lite (CVE-2022-35937)

Fixes a CHECK failure in TensorListReserve caused by missing validation (CVE-2022-35960)

Fixes an OOB write in Scatter_nd op in TF Lite (CVE-2022-35939)

Fixes an integer overflow in RaggedRangeOp (CVE-2022-35940)

Fixes a CHECK failure in AvgPoolOp (CVE-2022-35941)

Fixes CHECK failures in UnbatchGradOp (CVE-2022-35952)

Fixes a segfault in the TFLite converter on per-channel quantized transposed convolutions (CVE-2022-36027)

Fixes CHECK failures in AvgPool3DGrad (CVE-2022-35959)

Fixes CHECK failures in FractionalAvgPoolGrad (CVE-2022-35963)

Fixes a segfault in BlockLSTMGradV2 (CVE-2022-35964)

Fixes a segfault in LowerBound and UpperBound (CVE-2022-35965)

... (truncated)

Commits

a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2

258f9a1 Update py_func.cc

cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474

3e75385 Update version numbers to 2.9.3

bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695

3506c90 Update RELEASE.md

8dcb48e Update RELEASE.md

4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...

6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple

5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4

dependencies
opened by dependabot[bot] 0
about spm.SentencePieceTrainer.Train(train)

When I run prepro.py, it terminates at spm.SentencePieceTrainer.Train(train) without printing any further information. Has anyone encountered this problem? How can I deal with it?

INFO:root:# Train a joint BPE model with sentencepiece

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

opened by Joll123 1

About the query_mask

Source Code:

    padding_num = -2 ** 32 + 1
    if type in ("k", "key", "keys"):
        key_masks = tf.to_float(key_masks)
        key_masks = tf.tile(key_masks, [tf.shape(inputs)[0] // tf.shape(key_masks)[0], 1]) # (h*N, seqlen)
        key_masks = tf.expand_dims(key_masks, 1)  # (h*N, 1, seqlen)
        outputs = inputs + key_masks * padding_num

I think the outputs should be:

    padding_num = -2 ** 32 + 1
    if type in ("k", "key", "keys"):
        key_masks = tf.to_float(key_masks) # (N, T_k)
        key_masks = tf.tile(key_masks, [tf.shape(inputs)[0] // tf.shape(key_masks)[0], 1]) # (h*N, seqlen)
        key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(key_masks)[1], 1]) # (h*N, T_q, seqlen)
        paddings = tf.ones_like(key_masks) * padding_num
        outputs = tf.where(tf.equal(key_masks, 0), paddings, inputs)
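
The two approaches can be compared on a toy example. The NumPy sketch below assumes the convention implied by the first snippet, where a mask value of 1 marks a padded key (that is what makes inputs + key_masks * padding_num push those logits down); the tensors and the softmax helper are made up for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

padding_num = -2 ** 32 + 1
scores = np.zeros((1, 3, 4))                   # (h*N, T_q, T_k) attention logits
key_masks = np.array([[0.0, 0.0, 1.0, 1.0]])   # (h*N, T_k); 1 marks a padded key

# Additive form: (h*N, 1, T_k) broadcasts over the query axis, no tiling needed.
additive = scores + key_masks[:, None, :] * padding_num

# Explicit form: tile the mask to (h*N, T_q, T_k) and overwrite masked logits.
tiled = np.tile(key_masks[:, None, :], (1, scores.shape[1], 1))
explicit = np.where(tiled == 1, padding_num, scores)

weights_additive = softmax(additive)
weights_explicit = softmax(explicit)
```

Under this convention both forms give the padded keys zero attention weight after the softmax, so the additive version with broadcasting is a shortcut rather than a different result. The proposed tf.where version tests for a mask value of 0, which would only match if the mask used the opposite convention.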

opened by bjuthjliu 0

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,100] = 100 is not in [0, 100)

When I run train.py, training stops at about 5% progress with the following error: tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,100] = 100 is not in [0, 100)

Does anyone know how I can solve this problem?

opened by keaiyang222 0
I don't quite understand the derivation behind the multi-head implementation
1. Why is concat applied again after splitting into multiple heads?

Split and concat

Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0)  # (h*N, T_q, d_model/h)
K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0)  # (h*N, T_k, d_model/h)
V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0)  # (h*N, T_k, d_model/h)

2. Why is the following operation equivalent to concatenating the heads?

outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2)  # (N, T_q, d_model)
opened by frostjsy 0
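
On the split-and-concat question above, a NumPy sketch may help. It uses np.split and np.concatenate as stand-ins for tf.split and tf.concat; the function names split_heads, merge_heads, and attention are hypothetical, and masking and dropout are left out. The first concat does not merge the heads: it stacks them along the batch axis so that one batched matmul computes attention for every head independently. The final split along axis 0 followed by concat along axis 2 then places head i back into feature columns i*d_model/h to (i+1)*d_model/h, which is the concatenation of heads.

```python
import numpy as np


def split_heads(x, num_heads):
    # (N, T, d_model) -> (h*N, T, d_model/h): heads stacked on the batch axis
    return np.concatenate(np.split(x, num_heads, axis=2), axis=0)


def merge_heads(x, num_heads):
    # (h*N, T, d_model/h) -> (N, T, d_model): heads placed side by side
    return np.concatenate(np.split(x, num_heads, axis=0), axis=2)


def attention(q, k, v):
    # Plain scaled dot-product attention over the leading (batch) axis.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


N, T, d_model, h = 2, 3, 8, 4
d_head = d_model // h
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(N, T, d_model)) for _ in range(3))

Q_ = split_heads(Q, h)
restored = merge_heads(Q_, h)

# All heads at once, via the batch-axis trick.
batched = merge_heads(
    attention(split_heads(Q, h), split_heads(K, h), split_heads(V, h)), h)

# The same computation head by head, concatenated along the feature axis.
looped = np.concatenate(
    [attention(Q[..., i * d_head:(i + 1) * d_head],
               K[..., i * d_head:(i + 1) * d_head],
               V[..., i * d_head:(i + 1) * d_head]) for i in range(h)],
    axis=2)
```

Here merge_heads exactly undoes split_heads, and the batched result matches the per-head loop, which is why the two reshaping steps together behave like computing each head separately and concatenating the outputs.
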

Owner

Kyubyong Park

Lives in Seoul, Korea. Studied Linguistics at SNU and Univ. of Hawaii.

GitHub

NLPretext packages in a unique library all the text preprocessing functions you need to ease your NLP project.

114 Dec 15, 2022

Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE

smaller-LaBSE LaBSE(Language-agnostic BERT Sentence Embedding) is a very good method to get sentence embeddings across languages. But it is hard to fi

13 Sep 2, 2022

Need: Image Search With Python

Need: Image Search The problem is that a user needs to search for a specific ima

1 Dec 30, 2021

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

13 Dec 13, 2022

Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

ITTR - Pytorch Implementation of the Hybrid Perception Block (HPB) and Dual-Pruned Self-Attention (DPSA) block from the ITTR paper for Image to Image

17 Dec 23, 2022

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

364 Jan 6, 2023

Arabic-Phonetic-Output - You can input the phonetic version of any Arabic text here. This software will show you output in Arabic (with vowels)

Arabic-Phonetic-Output You can input the phonetic version of any Arabic text her

1 Dec 30, 2021

DeepAmandine is an artificial intelligence that allows you to talk to it for hours, you won't know the difference.

DeepAmandine This is an artificial intelligence based on GPT-3 that you can chat with, it is very nice and makes a lot of jokes. We wish you a good ex

3 Apr 19, 2022

GSoC'2021 | TensorFlow implementation of Wav2Vec2

73 Nov 28, 2022

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

10 Oct 13, 2022

A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approaches for achieving this in this repo.

multitask-learning-transformers A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You

48 Jan 2, 2023

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

490 Dec 15, 2022

Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

159 Apr 4, 2022

multi-label，classifier，text classification，多标签文本分类，文本分类，BERT，ALBERT，multi-label-classification，seq2seq，attention，beam search

30 Dec 12, 2022

[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

Counterfactual Attention Learning Created by Yongming Rao*, Guangyi Chen*, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for ICCV

89 Dec 18, 2022

Modified GPT using average pooling to reduce the softmax attention memory constraints.

NLP-GPT-Upsampling This repository contains an implementation of Open AI's GPT Model. In particular, this implementation takes inspiration from the Ny

1 Dec 3, 2021

Seq2seq attn - Use the Seq2Seq method to implement machine translation and introduce Attention mechanism to improve the results

Seq2seq_attn Use the Seq2Seq method to implement machine translation and use the

1 Jun 28, 2022

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

HAIS_2GNN: 3D Visual Grounding with Graph and Attention This repository is for the HAIS_2GNN research project. Tao Gu, Yue Chen Introduction The motiv

1 Nov 26, 2022

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

Image captioning End-to-end image captioning with EfficientNet-b3 + LSTM with Attention Model is seq2seq model. In the encoder pretrained EfficientNet

2 Feb 10, 2022
