ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Overview

ALBERT

***************New March 28, 2020 ***************

Add a colab tutorial to run fine-tuning for GLUE datasets.

***************New January 7, 2020 ***************

v2 TF-Hub models should be working now with TF 1.15, as we removed the native Einsum op from the graph. See updated TF-Hub links below.

***************New December 30, 2019 ***************

Chinese models are released. We would like to thank the CLUE team for providing the training data.

Version 2 of ALBERT models is released.

In this version, we apply the 'no dropout', 'additional training data', and 'long training time' strategies to all models. We train ALBERT-base for 10M steps and the other models for 3M steps.

The comparison with the v1 models is as follows:

Models          Average  SQuAD1.1   SQuAD2.0   MNLI  SST-2  RACE
V2
ALBERT-base     82.3     90.2/83.2  82.1/79.3  84.6  92.9   66.8
ALBERT-large    85.7     91.8/85.2  84.9/81.8  86.5  94.9   75.2
ALBERT-xlarge   87.9     92.9/86.4  87.9/84.1  87.9  95.4   80.7
ALBERT-xxlarge  90.9     94.6/89.1  89.8/86.9  90.6  96.8   86.8
V1
ALBERT-base     80.1     89.3/82.3  80.0/77.1  81.6  90.3   64.0
ALBERT-large    82.4     90.6/83.9  82.3/79.4  83.5  91.7   68.5
ALBERT-xlarge   85.5     92.5/86.1  86.1/83.1  86.4  92.4   74.8
ALBERT-xxlarge  91.0     94.8/89.3  90.2/87.4  90.8  96.9   86.5

The comparison shows that for ALBERT-base, ALBERT-large, and ALBERT-xlarge, v2 is much better than v1, indicating the importance of applying the above three strategies. On average, ALBERT-xxlarge v2 is slightly worse than v1, for two reasons: 1) training for an additional 1.5M steps (the only difference between these two models is training for 1.5M vs. 3M steps) did not lead to significant performance improvement; 2) for v1, we did a small hyperparameter search among the parameter sets given by BERT, RoBERTa, and XLNet, whereas for v2 we simply adopt the parameters from v1, except for RACE, where we use a learning rate of 1e-5 and an ALBERT fine-tuning dropout rate of 0 (the original v1 RACE hyperparameters cause the v2 models to diverge). Given that the downstream tasks are sensitive to the fine-tuning hyperparameters, we should be careful about so-called slight improvements.

ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.

For a technical description of the algorithm, see our paper:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Release Notes

  • Initial release: 10/9/2019

Results

Performance of ALBERT on the GLUE benchmark using a single-model setup on the dev set:

Models MNLI QNLI QQP RTE SST MRPC CoLA STS
BERT-large 86.6 92.3 91.3 70.4 93.2 88.0 60.6 90.0
XLNet-large 89.8 93.9 91.8 83.8 95.6 89.2 63.6 91.8
RoBERTa-large 90.2 94.7 92.2 86.6 96.4 90.9 68.0 92.4
ALBERT (1M) 90.4 95.2 92.0 88.1 96.8 90.2 68.7 92.7
ALBERT (1.5M) 90.8 95.3 92.2 89.2 96.9 90.9 71.4 93.0

Performance of ALBERT-xxlarge on the SQuAD and RACE benchmarks using a single-model setup:

Models SQuAD1.1 dev SQuAD2.0 dev SQuAD2.0 test RACE test (Middle/High)
BERT-large 90.9/84.1 81.8/79.0 89.1/86.3 72.0 (76.6/70.1)
XLNet 94.5/89.0 88.8/86.1 89.1/86.3 81.8 (85.5/80.2)
RoBERTa 94.6/88.9 89.4/86.5 89.8/86.8 83.2 (86.5/81.3)
UPM - - 89.9/87.2 -
XLNet + SG-Net Verifier++ - - 90.1/87.2 -
ALBERT (1M) 94.8/89.2 89.9/87.2 - 86.0 (88.2/85.1)
ALBERT (1.5M) 94.8/89.3 90.2/87.4 90.9/88.1 86.5 (89.0/85.5)

Pre-trained Models

TF-Hub modules are available for each ALBERT configuration (base, large, xlarge, xxlarge), in both v1 and v2, e.g. https://tfhub.dev/google/albert_base/2.

Example usage of the TF-Hub module in code:

import tensorflow_hub as hub

# `input_ids`, `input_mask`, and `segment_ids` are int32 Tensors produced by
# your input pipeline; `is_training` is a Python bool.
tags = set()
if is_training:
  tags.add("train")
albert_module = hub.Module("https://tfhub.dev/google/albert_base/1", tags=tags,
                           trainable=True)
albert_inputs = dict(
    input_ids=input_ids,
    input_mask=input_mask,
    segment_ids=segment_ids)
albert_outputs = albert_module(
    inputs=albert_inputs,
    signature="tokens",
    as_dict=True)

# "pooled_output" is a [batch_size, hidden_size] summary of the sequence.
# If you want the token-level output, use
# albert_outputs["sequence_output"] ([batch_size, seq_length, hidden_size]) instead.
output_layer = albert_outputs["pooled_output"]

Most of the fine-tuning scripts in this repository support TF-Hub modules via the --albert_hub_module_handle flag.

Pre-training Instructions

To pretrain ALBERT, use run_pretraining.py:

pip install -r albert/requirements.txt
python -m albert.run_pretraining \
    --input_file=... \
    --output_dir=... \
    --init_checkpoint=... \
    --albert_config_file=... \
    --do_train \
    --do_eval \
    --train_batch_size=4096 \
    --eval_batch_size=64 \
    --max_seq_length=512 \
    --max_predictions_per_seq=20 \
    --optimizer='lamb' \
    --learning_rate=.00176 \
    --num_train_steps=125000 \
    --num_warmup_steps=3125 \
    --save_checkpoints_steps=5000
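
The --albert_config_file flag expects a JSON file describing the model. For reference, the sketch below writes out a base-sized configuration from Python; the field names and values are assumptions based on the published ALBERT-base setup, so prefer the albert_config.json that ships with a pretrained checkpoint when you have one.

import json

# Assumed ALBERT-base style hyperparameters; adjust them to your own setup.
albert_base_config = {
    "vocab_size": 30000,
    "embedding_size": 128,          # factorized embedding (see the paper)
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_hidden_groups": 1,         # all layers share one parameter group
    "inner_group_num": 1,
    "num_attention_heads": 12,
    "intermediate_size": 3072,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0,       # v2 models were trained without dropout
    "attention_probs_dropout_prob": 0,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "initializer_range": 0.02,
}

with open("albert_config.json", "w") as f:
    json.dump(albert_base_config, f, indent=2)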

Fine-tuning on GLUE

To fine-tune and evaluate a pretrained ALBERT on GLUE, please see the convenience script run_glue.sh.

Lower-level use cases may want to use the run_classifier.py script directly. The run_classifier.py script is used both for fine-tuning and evaluation of ALBERT on individual GLUE benchmark tasks, such as MNLI:

pip install -r albert/requirements.txt
python -m albert.run_classifier \
  --data_dir=... \
  --output_dir=... \
  --init_checkpoint=... \
  --albert_config_file=... \
  --spm_model_file=... \
  --do_train \
  --do_eval \
  --do_predict \
  --do_lower_case \
  --max_seq_length=128 \
  --optimizer=adamw \
  --task_name=MNLI \
  --warmup_step=1000 \
  --learning_rate=3e-5 \
  --train_step=10000 \
  --save_checkpoints_steps=100 \
  --train_batch_size=128

Good default flag values for each GLUE task can be found in run_glue.sh.

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

You can find the spm_model_file in the tar files or under the assets folder of the tf-hub module. The name of the model file is "30k-clean.model".
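
As a quick sanity check, you can load that file with the sentencepiece Python package (used by the repo's tokenization code) and encode a sentence; the path below is an assumption, so adjust it to wherever you extracted 30k-clean.model:

import sentencepiece as spm

# Load the SentencePiece model that ships with the checkpoint / TF-Hub module.
sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")

print(sp.GetPieceSize())                            # vocabulary size
print(sp.EncodeAsPieces("ALBERT is a lite BERT."))  # subword pieces
print(sp.EncodeAsIds("ALBERT is a lite BERT."))     # corresponding ids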

After evaluation, the script should report some output like this:

***** Eval results *****
  global_step = ...
  loss = ...
  masked_lm_accuracy = ...
  masked_lm_loss = ...
  sentence_order_accuracy = ...
  sentence_order_loss = ...

Fine-tuning on SQuAD

To fine-tune and evaluate a pretrained model on SQuAD v1, use the run_squad_v1.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v1 \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --predict_file=... \
  --train_feature_file=... \
  --predict_feature_file=... \
  --predict_feature_left_file=... \
  --init_checkpoint=... \
  --spm_model_file=... \
  --do_lower_case \
  --max_seq_length=384 \
  --doc_stride=128 \
  --max_query_length=64 \
  --do_train=true \
  --do_predict=true \
  --train_batch_size=48 \
  --predict_batch_size=8 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --warmup_proportion=.1 \
  --save_checkpoints_steps=5000 \
  --n_best_size=20 \
  --max_answer_length=30
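
The train_feature_file / predict_feature_file flags point at cached, preprocessed features so they do not have to be rebuilt on every run (this is our reading of the flags, not official documentation). If the cache is a TFRecord file, a small TensorFlow snippet like the one below can count how many records were written; the file name is only an example and this is not part of the official tooling:

import tensorflow as tf

def count_tfrecords(path):
  # Iterate over the serialized records without parsing them.
  return sum(1 for _ in tf.compat.v1.io.tf_record_iterator(path))

print(count_tfrecords("train.tfrecord"))  # e.g. the file passed to --train_feature_file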

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

For SQuAD v2, use the run_squad_v2.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v2 \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --predict_file=... \
  --train_feature_file=... \
  --predict_feature_file=... \
  --predict_feature_left_file=... \
  --init_checkpoint=... \
  --spm_model_file=... \
  --do_lower_case \
  --max_seq_length=384 \
  --doc_stride=128 \
  --max_query_length=64 \
  --do_train \
  --do_predict \
  --train_batch_size=48 \
  --predict_batch_size=8 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --warmup_proportion=.1 \
  --save_checkpoints_steps=5000 \
  --n_best_size=20 \
  --max_answer_length=30

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

Fine-tuning on RACE

For RACE, use the run_race.py script:

pip install -r albert/requirements.txt
python -m albert.run_race \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --eval_file=... \
  --data_dir=...\
  --init_checkpoint=... \
  --spm_model_file=... \
  --max_seq_length=512 \
  --max_qa_length=128 \
  --do_train \
  --do_eval \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --learning_rate=1e-5 \
  --train_step=12000 \
  --warmup_step=1000 \
  --save_checkpoints_steps=100

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

SentencePiece

Command for generating the SentencePiece vocabulary:

spm_train \
--input all.txt --model_prefix=30k-clean --vocab_size=30000 --logtostderr \
--pad_id=0 --unk_id=1 --eos_id=-1 --bos_id=-1 \
--control_symbols=[CLS],[SEP],[MASK] \
--user_defined_symbols="(,),\",-,.,–,£,€" \
--shuffle_input_sentence=true --input_sentence_size=10000000 \
--character_coverage=0.99995 --model_type=unigram
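
After training, you may want to confirm that the reserved ids and control symbols came out as intended; the sentencepiece sketch below assumes the model file produced by the --model_prefix above:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")

# Ids fixed by the flags above: --pad_id=0 --unk_id=1, bos/eos disabled (-1).
print(sp.pad_id(), sp.unk_id(), sp.bos_id(), sp.eos_id())

# Control symbols are typically assigned the next free ids after the reserved ones.
for sym in ["[CLS]", "[SEP]", "[MASK]"]:
  print(sym, sp.PieceToId(sym))
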
Comments
  • LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)


    I am using run_classifier_with_tfhub with --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2.

    I am getting an error like "LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)"

    The argument is: python3 -m run_classifier_with_tfhub --data_dir=../../DataSet/CoLA/ --task_name=cola --output_dir=testing_ttt --vocab_file=vocab.txt --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2 --do_train=True --do_eval=True --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-05 --num_train_epochs=3.0

    I am using tensorflow==1.15.0

    opened by MichaelCaohn 22
  • No decreasing loss when pre-train for xxlarge


    Hi, I'm pre-training an xxlarge model on my own language. I trained on a TPU v2-256 but the loss is not decreasing. Below is the training information.

    • vocab size: 33001
    • training data size: 518G ( dupe factor: 10)
    • max_seq_length: 512
    • 3 gram masking, using SOP
    • word size: 5 B
    • batch size: 512
    • optimizer: lamb
    • learning rate: 0.00176

    I1211 08:56:02.464132 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1]
    INFO:tensorflow:<DatasetV1Adapter shapes: {input_ids: (2, 512), input_mask: (2, 512), masked_lm_ids: (2, 77), masked_lm_positions: (2, 77), masked_lm_weights: (2, 77), next_sentence_labels: (2, 1), segment_ids: (2, 512)}, types: {input_ids: tf.int32, input_mask: tf.int32, masked_lm_ids: tf.int32, masked_lm_positions: tf.int32, masked_lm_weights: tf.float32, next_sentence_labels: tf.int32, segment_ids: tf.int32}>
    2019-12-11 08:56:02.673414: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2019-12-11 08:56:02.673472: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
    2019-12-11 08:56:02.673496: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (instance-2): /proc/driver/nvidia/version does not exist
    INFO:tensorflow:*** Features ***
    INFO:tensorflow: name = input_ids, shape = (2, 512)
    INFO:tensorflow: name = input_mask, shape = (2, 512)
    INFO:tensorflow: name = masked_lm_ids, shape = (2, 77)
    INFO:tensorflow: name = masked_lm_positions, shape = (2, 77)
    INFO:tensorflow: name = masked_lm_weights, shape = (2, 77)
    INFO:tensorflow: name = next_sentence_labels, shape = (2, 1)
    INFO:tensorflow: name = segment_ids, shape = (2, 512)

    INFO:tensorflow:**** Trainable Variables ****
    INFO:tensorflow: name = bert/embeddings/word_embeddings:0, shape = (33001, 128)
    INFO:tensorflow: name = bert/embeddings/token_type_embeddings:0, shape = (2, 128)
    INFO:tensorflow: name = bert/embeddings/position_embeddings:0, shape = (512, 128)
    INFO:tensorflow: name = bert/embeddings/LayerNorm/beta:0, shape = (128,)
    INFO:tensorflow: name = bert/embeddings/LayerNorm/gamma:0, shape = (128,)
    INFO:tensorflow: name = bert/encoder/embedding_hidden_mapping_in/kernel:0, shape = (128, 4096)
    INFO:tensorflow: name = bert/encoder/embedding_hidden_mapping_in/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel:0, shape = (4096, 16384)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias:0, shape = (16384,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel:0, shape = (16384, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma:0, shape = (4096,)
    INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (4096,)
    INFO:tensorflow: name = cls/predictions/transform/dense/kernel:0, shape = (4096, 128)
    INFO:tensorflow: name = cls/predictions/transform/dense/bias:0, shape = (128,)
    INFO:tensorflow: name = cls/predictions/transform/LayerNorm/beta:0, shape = (128,)
    INFO:tensorflow: name = cls/predictions/transform/LayerNorm/gamma:0, shape = (128,)
    INFO:tensorflow: name = cls/predictions/output_bias:0, shape = (33001,)
    INFO:tensorflow: name = cls/seq_relationship/output_weights:0, shape = (2, 4096)
    INFO:tensorflow: name = cls/seq_relationship/output_bias:0, shape = (2,)

    I1211 09:12:03.138811 140024623753024 basic_session_run_hooks.py:262] loss = 10.181114, step = 1000
    I1211 09:26:09.008900 140024623753024 basic_session_run_hooks.py:260] loss = 7.6005945, step = 2000 (845.870 sec)
    I1211 09:40:12.286720 140024623753024 basic_session_run_hooks.py:260] loss = 7.645055, step = 3000 (843.278 sec)
    I1211 09:54:16.299396 140024623753024 basic_session_run_hooks.py:260] loss = 7.6258326, step = 4000 (844.013 sec)
    I1211 10:08:19.825035 140024623753024 basic_session_run_hooks.py:260] loss = 7.363482, step = 5000 (843.526 sec)
    I1211 10:22:25.123742 140024623753024 basic_session_run_hooks.py:260] loss = 6.8203845, step = 6000 (845.299 sec)
    I1211 10:36:29.082039 140024623753024 basic_session_run_hooks.py:260] loss = 6.5194592, step = 7000 (843.958 sec)
    I1211 10:50:31.896788 140024623753024 basic_session_run_hooks.py:260] loss = 6.854472, step = 8000 (842.815 sec)
    I1211 11:04:36.726402 140024623753024 basic_session_run_hooks.py:260] loss = 7.0283566, step = 9000 (844.830 sec)
    I1211 11:19:29.132026 140024623753024 basic_session_run_hooks.py:260] loss = 6.5989375, step = 10000 (892.406 sec)
    I1211 11:33:32.866184 140024623753024 basic_session_run_hooks.py:260] loss = 6.550018, step = 11000 (843.734 sec)
    ...
    I1211 13:41:01.039676 140024623753024 basic_session_run_hooks.py:260] loss = 6.5004697, step = 20000 (894.206 sec)
    ...
    I1211 16:02:31.998177 140024623753024 basic_session_run_hooks.py:260] loss = 7.100818, step = 30000 (892.416 sec)
    ...
    I1211 18:24:15.941736 140024623753024 basic_session_run_hooks.py:260] loss = 6.5937705, step = 40000 (896.439 sec)
    ...
    I1211 20:45:50.533722 140024623753024 basic_session_run_hooks.py:260] loss = 5.950697, step = 50000 (895.989 sec)
    ...
    I1211 23:07:25.169874 140024623753024 basic_session_run_hooks.py:260] loss = 6.789865, step = 60000 (893.845 sec)
    ...
    I1212 01:28:58.518174 140024623753024 basic_session_run_hooks.py:260] loss = 6.453152, step = 70000 (892.751 sec)
    ...
    I1212 03:50:25.943136 140024623753024 basic_session_run_hooks.py:260] loss = 6.7387037, step = 80000 (889.578 sec)

    What's wrong?

    opened by jwkim912 16
  • ALBERT-xxlarge V2 training on TPU V3-512 extremely slow


    Hello,

    We are training on bioinformatics data using ALBERT-xxlarge on a TPU V3-512.

    According to the paper you trained "ALBERT-xxlarge" for 125k in 32h.

    However, our training will take 7 days to complete 130k.

    Our vocab file has only 34 tokens, and this is our training command:

    python -m albert.run_pretraining \
        --input_file=gs://...../_train_*.tfrecord \
        --output_dir=gs:/....../albert_model/ \
        --albert_config_file=/......../albert-xxlarge-v2-config.json \
        --do_train \
        --do_eval \
        --train_batch_size=10240 \
        --eval_batch_size=64 \
        --max_seq_length=512 \
        --max_predictions_per_seq=20 \
        --optimizer='lamb' \
        --learning_rate=.002 \
        --iterations_per_loop=100 \
        --num_train_steps=130000 \
        --num_warmup_steps=42000 \
        --save_checkpoints_steps=1000 \
        --use_tpu=TRUE \
        --num_tpu_cores=512 \
        --tpu_name=.....
    

    I also tried to change the "iterations_per_loop" to 1000 or even bigger but that didn't help.

    The current logs from the training is :

    I0407 18:04:44.154831 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:04:44.230242 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 70)
    I0407 18:05:06.949739 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 70)
    I0407 18:05:14.312140 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:05:14.393373 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    I0407 18:05:44.470578 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:05:44.566381 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 84)
    I0407 18:06:08.473748 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 84)
    I0407 18:06:14.650656 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:06:14.725901 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    I0407 18:06:44.819700 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:06:44.902827 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 98)
    I0407 18:07:09.999137 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 98)
    I0407 18:07:14.984425 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:07:15.060185 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:loss = 3.1807582, step = 900 (463.227 sec)
    I0407 18:07:18.708591 140198823081728 basic_session_run_hooks.py:260] loss = 3.1807582, step = 900 (463.227 sec)
    INFO:tensorflow:global_step/sec: 0.215877
    I0407 18:07:18.709693 140198823081728 tpu_estimator.py:2307] global_step/sec: 0.215877
    INFO:tensorflow:examples/sec: 2210.58
    I0407 18:07:18.709883 140198823081728 tpu_estimator.py:2308] examples/sec: 2210.58
    INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
    

    It takes around 463 seconds per 100 steps, which means we can train 130k in 7 days. (130000 / 100 ) * 463 = 601900 seconds = 7 days.

    The TPU, the server, and the bucket are all in the same region.

    On SUMMIT (the world's fastest computer) I was able to train a BERT with 30 layers, and it took only 24 hours to finish around 122k steps using 6k V100 GPUs with a global batch size of 11k.

    Do you have any idea why we can't reproduce the same speed as in the paper?

    @0x0539 @Danny-Google Your feedback will be highly appreciated

    opened by agemagician 15
  • Index Out of Range Error in tokenization using TF Hub for Pretrained Albert Models


    I am getting an Index out of Range error in tokenization.py when fine-tuning an ALBERT large model with TF-Hub. I printed out the vocab file and the token before the error. You can see the error and print-outs below.

    Vocab File: b'/tmp/tfhub_modules/c88f9d4ac7469966b2fab3b577a8031ae23e125a/assets/30k-clean.model'
    Token:  
    
    Traceback (most recent call last):
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 318, in <module>
        tf.compat.v1.app.run()
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 185, in main
        tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
        spm_model_file=FLAGS.spm_model_file)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 249, in __init__
        self.vocab = load_vocab(vocab_file)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 203, in load_vocab
        token = token.strip().split()[0]
    IndexError: list index out of range
    

    Albert Finetune Shell Script

    #!/bin/bash
    pip install -r albert/requirements.txt
    python -m albert.run_classifier_with_tfhub \
    --albert_hub_module_handle=https://tfhub.dev/google/albert_xlarge/1 \
    --task_name=cola \
    --do_train=true \
    --do_eval=true  \
    --data_dir=./data-to-albert \
    --max_seq_length=128  \
    --train_batch_size=32  \
    --learning_rate=2e-05 \
    --num_train_epochs=3.0  \
    --output_dir=./checkpoints/test
    
    opened by jsmith09 9
  • Where can I find the "spm_model_file" when run run_squad_v2.py


    Thanks for publishing your code. I was trying to run "run_squad_v2" to learn ALBERT. There is a flag, "--spm_model_file", when running it. What is that? Where can I download that file?

    opened by wmmxk 9
  • Has anyone reproduced SQuAD 1.1 score(90.2/83.2) on albert-base V2?


    Hi, I downloaded the pre-trained ALBERT-base v2 model at the link in README.md and tried to fine-tune it on the SQuAD 1.1 dataset without using the ALBERT hub module. However, I got f1=16.14 and exact match=7.34 as my final result, which is significantly lower than the scores (90.2/83.2) reported in README.md.

    Here is the command that I used for fine-tuning

    • ALBERT_ROOT is the directory path where I keep my albert-base-v2 model
    • train_feature_file, predict_feature_file, predict_feature_left_file were created in SQUAD_DIR after I ran the following command

    python -m run_squad_v1
    --albert_config_file="${ALBERT_ROOT}/albert_config.json"
    --output_dir=./output_base_v2/SQUAD
    --train_file="$SQUAD_DIR/train-v1.1.json"
    --predict_file="$SQUAD_DIR/dev-v1.1.json"
    --train_feature_file="$SQUAD_DIR/train.tfrecord"
    --predict_feature_file="$SQUAD_DIR/dev.tfrecord"
    --predict_feature_left_file="$SQUAD_DIR/pred_left_file.pkl"
    --init_checkpoint=""
    --spm_model_file="${ALBERT_ROOT}/30k-clean.model"
    --do_lower_case
    --max_seq_length=384
    --doc_stride=128
    --max_query_length=64
    --do_train=true
    --do_predict=true
    --train_batch_size=48
    --predict_batch_size=8
    --learning_rate=5e-5
    --num_train_epochs=2.0
    --warmup_proportion=.1
    --save_checkpoints_steps=5000
    --n_best_size=20
    --max_answer_length=30

    opened by YJYJLee 8
  • Bad eval results on RTE and CoLA


    I tried fine-tuning the ALBERT-base model on the two smallest GLUE tasks, but got only about 66% accuracy for both. I was using a GPU (2080Ti). The script for GLUE fine-tuning has a bug in the evaluation part, and I tried to fix it, but I am quite new to TensorFlow, so I am not sure if there is still something wrong with the script. Below is the script I am using:

    set -ex
    
    OUTPUT_DIR="glue_baseline"
    
    # To start from a custom pretrained checkpoint, set ALBERT_HUB_MODULE_HANDLE
    # below to an empty string and set INIT_CHECKPOINT to your checkpoint path.
    ALBERT_HUB_MODULE_HANDLE="https://tfhub.dev/google/albert_base/1"
    INIT_CHECKPOINT=""
    
    ALBERT_ROOT=pretrained/albert_base
    
    
    function run_task() {
      COMMON_ARGS="--output_dir="${OUTPUT_DIR}/$1" --data_dir="${ALBERT_ROOT}/glue" --vocab_file="${ALBERT_ROOT}/vocab.txt" --spm_model_file="${ALBERT_ROOT}/30k-clean.model" --do_lower_case --max_seq_length=128 --optimizer=adamw --task_name=$1 --warmup_step=$2 --learning_rate=$3 --train_step=$4 --save_checkpoints_steps=$5 --train_batch_size=$6"
      python3 -m run_classifier \
          ${COMMON_ARGS} \
          --do_train \
          --nodo_eval \
          --nodo_predict \
          --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
          --init_checkpoint="${INIT_CHECKPOINT}"
      python3 -m run_classifier \
          ${COMMON_ARGS} \
          --nodo_train \
          --do_eval \
          --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
          --do_predict
    }
    
    run_task RTE 200 3e-5 800 100 32
    
    

    I tried printing the training loss and it seems to have converged, but somehow the eval results are nearly random. The eval accuracy differs across checkpoints, so I think the checkpoints have been loaded.

    opened by zhuchen03 8
  • [ALBERT]: LookupError: gradient registry has no entry for: AddV2


    When running run_classifier_with_tfhub.py, the training crashed. The error is:

    LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
    

    My tensorflow-gpu version is 1.14.0

    Does anyone know the reason? Please help. Thanks.

    opened by wxp16 8
  • [ALBERT] Has anyone reproduced the ALBERT scores on the GLUE dataset?


    I converted the TF weights to PyTorch weights, and on the QQP dataset I only get 87% accuracy.

    model: albert-base, epochs: 3, learning_rate: 2e-5, batch size: 24, max sequence length: 128, warmup_proportion: 0.1

    opened by lonePatient 8
  • "no dropout" on v2 models

    You say that you are using "no dropout" on the TF-Hub v2 models. However, looking at the albert_config.json files, there seems to be dropout in most models (https://tfhub.dev/google/albert_base/2). Only on the xxlarge is there no dropout (https://tfhub.dev/google/albert_xxlarge/2). What is correct?

    opened by peregilk 8
  • Significantly lower than expected eval accuracy on MNLI


    Before ALBERT was moved to this repository, I downloaded the pre-trained ALBERT-base-2 from TFHub and used run_classifier_sp.py to evaluate the model on MNLI by modifying the provided run.sh script to execute the following instead of run_pretraining_test:

     python -m albert.run_classifier_sp \
        --output_dir="/path/to/output" \
        --export_dir="/path/to/export" \
        --do_eval \
        --nouse_tpu \
        --eval_batch_size=1 \
        --max_seq_length=4 \
        --max_eval_steps=3 \
        --vocab_file="/path/to/albert-base-2/assets/30k-clean.vocab" \
        --data_dir="/path/to/glue/MNLI" \
        --task_name=MNLI
    

    This gave an eval accuracy of approximately 0.34, which is significantly lower than the expected 0.84 discussed in the paper.

    Has anyone else seen such low out-of-the-box evaluation results? Is this simply an issue with how I'm running the evaluation? If so, are there any recommendations for running evaluation to achieve better results?

    opened by 5donuts 7
  • Bump tensorflow from 1.15.2 to 2.9.3


    Bumps tensorflow from 1.15.2 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • The results can't be reproduced


    Hi, I ran the code according to the official command, but it doesn't reproduce the results; I only get an accuracy of ~71 (dataset: RTE). Can you tell me what is wrong?

    opened by kavin525zhang 2
  • tokenization: log spm usage only in debug mode to avoid console spamming


    Hi,

    I'm not sure if the repo is currently maintained, but this PR changes the log level for the SPM tokenization usage message from info to debug. The message is displayed whenever a line is tokenized/detokenized, which is very annoying because the default log level is info.

    opened by stefan-it 0
  • Difference between v1 and v2 for xxlarge


    Hi,

    I wanted to clarify a point from the paper and README that I am confused about. In the paper, and the repo's README, it seems like the v1 model was trained only on wikipedia and the book corpus, to compare with BERT. However, in the README, there's the following text:

    On average, ALBERT-xxlarge is slightly worse than the v1, because of the following two reasons: 1) Training additional 1.5 M steps (the only difference between these two models is training for 1.5M steps and 3M steps) did not lead to significant performance improvement. 2) For v1, we did a little bit hyperparameter search among the parameters sets given by BERT, Roberta, and XLnet.

    This implies that the xxlarge version of v1 was also trained on additional data.

    The question is whether the v1 xxlarge model was solely trained on wiki+books, or was it trained on additional data?

    opened by yanaiela 0
  • Explicitly import estimator from tensorflow as a separate import instead of accessing it via tf.estimator and depend on the tensorflow estimator target.


    Explicitly import estimator from tensorflow as a separate import instead of accessing it via tf.estimator and depend on the tensorflow estimator target.

    opened by copybara-service[bot] 0