ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Overview

ALBERT

***************New March 28, 2020 ***************

Add a colab tutorial to run fine-tuning for GLUE datasets.

***************New January 7, 2020 ***************

v2 TF-Hub models should be working now with TF 1.15, as we removed the native Einsum op from the graph. See updated TF-Hub links below.

***************New December 30, 2019 ***************

Chinese models are released. We would like to thank the CLUE team for providing the training data.

Version 2 of ALBERT models is released.

In this version, we apply the 'no dropout', 'additional training data', and 'long training time' strategies to all models. We train ALBERT-base for 10M steps and the other models for 3M steps.

The comparison with the v1 models is as follows:

Models          Average  SQuAD1.1   SQuAD2.0   MNLI  SST-2  RACE
V2
ALBERT-base     82.3     90.2/83.2  82.1/79.3  84.6  92.9   66.8
ALBERT-large    85.7     91.8/85.2  84.9/81.8  86.5  94.9   75.2
ALBERT-xlarge   87.9     92.9/86.4  87.9/84.1  87.9  95.4   80.7
ALBERT-xxlarge  90.9     94.6/89.1  89.8/86.9  90.6  96.8   86.8
V1
ALBERT-base     80.1     89.3/82.3  80.0/77.1  81.6  90.3   64.0
ALBERT-large    82.4     90.6/83.9  82.3/79.4  83.5  91.7   68.5
ALBERT-xlarge   85.5     92.5/86.1  86.1/83.1  86.4  92.4   74.8
ALBERT-xxlarge  91.0     94.8/89.3  90.2/87.4  90.8  96.9   86.5

The comparison shows that for ALBERT-base, ALBERT-large, and ALBERT-xlarge, v2 is much better than v1, indicating the importance of applying the above three strategies. On average, ALBERT-xxlarge v2 is slightly worse than v1, for two reasons: 1) training for an additional 1.5M steps (the only difference between these two models is training for 1.5M vs. 3M steps) did not lead to significant performance improvement; 2) for v1, we did a small hyperparameter search among the parameter sets given by BERT, RoBERTa, and XLNet, whereas for v2 we simply adopt the parameters from v1, except for RACE, where we use a learning rate of 1e-5 and an ALBERT fine-tuning dropout rate of 0 (the original v1 RACE hyperparameters cause the v2 models to diverge). Given that the downstream tasks are sensitive to the fine-tuning hyperparameters, we should be careful about so-called slight improvements.

ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.

For a technical description of the algorithm, see our paper:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Release Notes

  • Initial release: 10/9/2019

Results

Performance of ALBERT on the GLUE benchmark using a single-model setup on the dev set:

Models MNLI QNLI QQP RTE SST MRPC CoLA STS
BERT-large 86.6 92.3 91.3 70.4 93.2 88.0 60.6 90.0
XLNet-large 89.8 93.9 91.8 83.8 95.6 89.2 63.6 91.8
RoBERTa-large 90.2 94.7 92.2 86.6 96.4 90.9 68.0 92.4
ALBERT (1M) 90.4 95.2 92.0 88.1 96.8 90.2 68.7 92.7
ALBERT (1.5M) 90.8 95.3 92.2 89.2 96.9 90.9 71.4 93.0

Performance of ALBERT-xxlarge on the SQuAD and RACE benchmarks using a single-model setup:

Models SQuAD1.1 dev SQuAD2.0 dev SQuAD2.0 test RACE test (Middle/High)
BERT-large 90.9/84.1 81.8/79.0 89.1/86.3 72.0 (76.6/70.1)
XLNet 94.5/89.0 88.8/86.1 89.1/86.3 81.8 (85.5/80.2)
RoBERTa 94.6/88.9 89.4/86.5 89.8/86.8 83.2 (86.5/81.3)
UPM - - 89.9/87.2 -
XLNet + SG-Net Verifier++ - - 90.1/87.2 -
ALBERT (1M) 94.8/89.2 89.9/87.2 - 86.0 (88.2/85.1)
ALBERT (1.5M) 94.8/89.3 90.2/87.4 90.9/88.1 86.5 (89.0/85.5)

Pre-trained Models

TF-Hub modules are available for each ALBERT configuration (base, large, xlarge, xxlarge), in both v1 and v2, e.g. https://tfhub.dev/google/albert_base/2.

Example usage of the TF-Hub module in code:

import tensorflow_hub as hub

# `input_ids`, `input_mask`, and `segment_ids` are int32 Tensors produced by
# your input pipeline; `is_training` is a Python bool.
tags = set()
if is_training:
  tags.add("train")
albert_module = hub.Module("https://tfhub.dev/google/albert_base/1", tags=tags,
                           trainable=True)
albert_inputs = dict(
    input_ids=input_ids,
    input_mask=input_mask,
    segment_ids=segment_ids)
albert_outputs = albert_module(
    inputs=albert_inputs,
    signature="tokens",
    as_dict=True)

# "pooled_output" is a [batch_size, hidden_size] summary of the sequence.
# If you want the token-level output, use
# albert_outputs["sequence_output"] ([batch_size, seq_length, hidden_size]) instead.
output_layer = albert_outputs["pooled_output"]

Most of the fine-tuning scripts in this repository support TF-Hub modules via the --albert_hub_module_handle flag.

Pre-training Instructions

To pretrain ALBERT, use run_pretraining.py:

pip install -r albert/requirements.txt
python -m albert.run_pretraining \
    --input_file=... \
    --output_dir=... \
    --init_checkpoint=... \
    --albert_config_file=... \
    --do_train \
    --do_eval \
    --train_batch_size=4096 \
    --eval_batch_size=64 \
    --max_seq_length=512 \
    --max_predictions_per_seq=20 \
    --optimizer='lamb' \
    --learning_rate=.00176 \
    --num_train_steps=125000 \
    --num_warmup_steps=3125 \
    --save_checkpoints_steps=5000
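
The --albert_config_file flag expects a JSON file describing the model. For reference, the sketch below writes out a base-sized configuration from Python; the field names and values are assumptions based on the published ALBERT-base setup, so prefer the albert_config.json that ships with a pretrained checkpoint when you have one.

import json

# Assumed ALBERT-base style hyperparameters; adjust them to your own setup.
albert_base_config = {
    "vocab_size": 30000,
    "embedding_size": 128,          # factorized embedding (see the paper)
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_hidden_groups": 1,         # all layers share one parameter group
    "inner_group_num": 1,
    "num_attention_heads": 12,
    "intermediate_size": 3072,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0,       # v2 models were trained without dropout
    "attention_probs_dropout_prob": 0,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "initializer_range": 0.02,
}

with open("albert_config.json", "w") as f:
    json.dump(albert_base_config, f, indent=2)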

Fine-tuning on GLUE

To fine-tune and evaluate a pretrained ALBERT on GLUE, please see the convenience script run_glue.sh.

Lower-level use cases may want to use the run_classifier.py script directly. The run_classifier.py script is used both for fine-tuning and evaluation of ALBERT on individual GLUE benchmark tasks, such as MNLI:

pip install -r albert/requirements.txt
python -m albert.run_classifier \
  --data_dir=... \
  --output_dir=... \
  --init_checkpoint=... \
  --albert_config_file=... \
  --spm_model_file=... \
  --do_train \
  --do_eval \
  --do_predict \
  --do_lower_case \
  --max_seq_length=128 \
  --optimizer=adamw \
  --task_name=MNLI \
  --warmup_step=1000 \
  --learning_rate=3e-5 \
  --train_step=10000 \
  --save_checkpoints_steps=100 \
  --train_batch_size=128

Good default flag values for each GLUE task can be found in run_glue.sh.

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

You can find the spm_model_file in the tar files or under the assets folder of the tf-hub module. The name of the model file is "30k-clean.model".
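
As a quick sanity check, you can load that file with the sentencepiece Python package (used by the repo's tokenization code) and encode a sentence; the path below is an assumption, so adjust it to wherever you extracted 30k-clean.model:

import sentencepiece as spm

# Load the SentencePiece model that ships with the checkpoint / TF-Hub module.
sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")

print(sp.GetPieceSize())                            # vocabulary size
print(sp.EncodeAsPieces("ALBERT is a lite BERT."))  # subword pieces
print(sp.EncodeAsIds("ALBERT is a lite BERT."))     # corresponding ids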

After evaluation, the script should report some output like this:

***** Eval results *****
  global_step = ...
  loss = ...
  masked_lm_accuracy = ...
  masked_lm_loss = ...
  sentence_order_accuracy = ...
  sentence_order_loss = ...

Fine-tuning on SQuAD

To fine-tune and evaluate a pretrained model on SQuAD v1, use the run_squad_v1.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v1 \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --predict_file=... \
  --train_feature_file=... \
  --predict_feature_file=... \
  --predict_feature_left_file=... \
  --init_checkpoint=... \
  --spm_model_file=... \
  --do_lower_case \
  --max_seq_length=384 \
  --doc_stride=128 \
  --max_query_length=64 \
  --do_train=true \
  --do_predict=true \
  --train_batch_size=48 \
  --predict_batch_size=8 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --warmup_proportion=.1 \
  --save_checkpoints_steps=5000 \
  --n_best_size=20 \
  --max_answer_length=30
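
The train_feature_file / predict_feature_file flags point at cached, preprocessed features so they do not have to be rebuilt on every run (this is our reading of the flags, not official documentation). If the cache is a TFRecord file, a small TensorFlow snippet like the one below can count how many records were written; the file name is only an example and this is not part of the official tooling:

import tensorflow as tf

def count_tfrecords(path):
  # Iterate over the serialized records without parsing them.
  return sum(1 for _ in tf.compat.v1.io.tf_record_iterator(path))

print(count_tfrecords("train.tfrecord"))  # e.g. the file passed to --train_feature_file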

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

For SQuAD v2, use the run_squad_v2.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v2 \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --predict_file=... \
  --train_feature_file=... \
  --predict_feature_file=... \
  --predict_feature_left_file=... \
  --init_checkpoint=... \
  --spm_model_file=... \
  --do_lower_case \
  --max_seq_length=384 \
  --doc_stride=128 \
  --max_query_length=64 \
  --do_train \
  --do_predict \
  --train_batch_size=48 \
  --predict_batch_size=8 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --warmup_proportion=.1 \
  --save_checkpoints_steps=5000 \
  --n_best_size=20 \
  --max_answer_length=30

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

Fine-tuning on RACE

For RACE, use the run_race.py script:

pip install -r albert/requirements.txt
python -m albert.run_race \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --eval_file=... \
  --data_dir=...\
  --init_checkpoint=... \
  --spm_model_file=... \
  --max_seq_length=512 \
  --max_qa_length=128 \
  --do_train \
  --do_eval \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --learning_rate=1e-5 \
  --train_step=12000 \
  --warmup_step=1000 \
  --save_checkpoints_steps=100

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

SentencePiece

Command for generating the SentencePiece vocabulary:

spm_train \
--input all.txt --model_prefix=30k-clean --vocab_size=30000 --logtostderr \
--pad_id=0 --unk_id=1 --eos_id=-1 --bos_id=-1 \
--control_symbols=[CLS],[SEP],[MASK] \
--user_defined_symbols="(,),\",-,.,–,£,€" \
--shuffle_input_sentence=true --input_sentence_size=10000000 \
--character_coverage=0.99995 --model_type=unigram
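
After training, you may want to confirm that the reserved ids and control symbols came out as intended; the sentencepiece sketch below assumes the model file produced by the --model_prefix above:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")

# Ids fixed by the flags above: --pad_id=0 --unk_id=1, bos/eos disabled (-1).
print(sp.pad_id(), sp.unk_id(), sp.bos_id(), sp.eos_id())

# Control symbols are typically assigned the next free ids after the reserved ones.
for sym in ["[CLS]", "[SEP]", "[MASK]"]:
  print(sym, sp.PieceToId(sym))
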
Comments
  • LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)


    I am using run_classifier_with_tfhub with --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2.

    I am getting an error like "LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)"

    The argument is: python3 -m run_classifier_with_tfhub --data_dir=../../DataSet/CoLA/ --task_name=cola --output_dir=testing_ttt --vocab_file=vocab.txt --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2 --do_train=True --do_eval=True --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-05 --num_train_epochs=3.0

    I am using tensorflow==1.15.0

    opened by MichaelCaohn 22
  • No decreasing loss when pre-train for xxlarge


    Hi, I'm pre-training an xxlarge model on my own language. I trained on a TPU v2-256 but the loss is not decreasing. Below is the training information.

    • vocab size: 33001
    • training data size: 518G ( dupe factor: 10)
    • max_seq_length: 512
    • 3 gram masking, using SOP
    • word size: 5 B
    • batch size: 512
    • optimizer: lamb
    • learning rate: 0.00176

    I1211 08:56:02.464132 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1]
    INFO:tensorflow:<DatasetV1Adapter shapes: {input_ids: (2, 512), input_mask: (2, 512), masked_lm_ids: (2, 77), masked_lm_positions: (2, 77), masked_lm_weights: (2, 77), next_sentence_labels: (2, 1), segment_ids: (2, 512)}, types: {input_ids: tf.int32, input_mask: tf.int32, masked_lm_ids: tf.int32, masked_lm_positions: tf.int32, masked_lm_weights: tf.float32, next_sentence_labels: tf.int32, segment_ids: tf.int32}>
    2019-12-11 08:56:02.673414: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2019-12-11 08:56:02.673472: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
    2019-12-11 08:56:02.673496: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (instance-2): /proc/driver/nvidia/version does not exist
    INFO:tensorflow:*** Features ***
    INFO:tensorflow: name = input_ids, shape = (2, 512)
    INFO:tensorflow: name = input_mask, shape = (2, 512)
    INFO:tensorflow: name = masked_lm_ids, shape = (2, 77)
    INFO:tensorflow: name = masked_lm_positions, shape = (2, 77)
    INFO:tensorflow: name = masked_lm_weights, shape = (2, 77)
    INFO:tensorflow: name = next_sentence_labels, shape = (2, 1)
    INFO:tensorflow: name = segment_ids, shape = (2, 512)

    INFO:tensorflow:**** Trainable Variables ****
    INFO:tensorflow: name = bert/embeddings/word_embeddings:0, shape = (33001, 128)
    INFO:tensorflow: name = bert/embeddings/token_type_embeddings:0, shape = (2, 128)
    INFO:tensorflow: name = bert/embeddings/position_embeddings:0, shape = (512, 128)
    INFO:tensorflow: name = bert/embeddings/LayerNorm/beta:0, shape = (128,)
    INFO:tensorflow: name = bert/embeddings/LayerNorm/gamma:0, shape = (128,)
    INFO:tensorflow: name = bert/encoder/embedding_hidden_mapping_in/kernel:0, shape = (128, 4096)
    INFO:tensorflow: name = bert/encoder/embedding_hidden_mapping_in/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel:0, shape = (4096, 16384)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias:0, shape = (16384,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel:0, shape = (16384, 4096)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta:0, shape = (4096,)
    INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma:0, shape = (4096,)
    INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (4096, 4096)
    INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (4096,)
    INFO:tensorflow: name = cls/predictions/transform/dense/kernel:0, shape = (4096, 128)
    INFO:tensorflow: name = cls/predictions/transform/dense/bias:0, shape = (128,)
    INFO:tensorflow: name = cls/predictions/transform/LayerNorm/beta:0, shape = (128,)
    INFO:tensorflow: name = cls/predictions/transform/LayerNorm/gamma:0, shape = (128,)
    INFO:tensorflow: name = cls/predictions/output_bias:0, shape = (33001,)
    INFO:tensorflow: name = cls/seq_relationship/output_weights:0, shape = (2, 4096)
    INFO:tensorflow: name = cls/seq_relationship/output_bias:0, shape = (2,)

    I1211 09:12:03.138811 140024623753024 basic_session_run_hooks.py:262] loss = 10.181114, step = 1000
    I1211 09:26:09.008900 140024623753024 basic_session_run_hooks.py:260] loss = 7.6005945, step = 2000 (845.870 sec)
    I1211 09:40:12.286720 140024623753024 basic_session_run_hooks.py:260] loss = 7.645055, step = 3000 (843.278 sec)
    I1211 09:54:16.299396 140024623753024 basic_session_run_hooks.py:260] loss = 7.6258326, step = 4000 (844.013 sec)
    I1211 10:08:19.825035 140024623753024 basic_session_run_hooks.py:260] loss = 7.363482, step = 5000 (843.526 sec)
    I1211 10:22:25.123742 140024623753024 basic_session_run_hooks.py:260] loss = 6.8203845, step = 6000 (845.299 sec)
    I1211 10:36:29.082039 140024623753024 basic_session_run_hooks.py:260] loss = 6.5194592, step = 7000 (843.958 sec)
    I1211 10:50:31.896788 140024623753024 basic_session_run_hooks.py:260] loss = 6.854472, step = 8000 (842.815 sec)
    I1211 11:04:36.726402 140024623753024 basic_session_run_hooks.py:260] loss = 7.0283566, step = 9000 (844.830 sec)
    I1211 11:19:29.132026 140024623753024 basic_session_run_hooks.py:260] loss = 6.5989375, step = 10000 (892.406 sec)
    I1211 11:33:32.866184 140024623753024 basic_session_run_hooks.py:260] loss = 6.550018, step = 11000 (843.734 sec)
    ...
    I1211 13:41:01.039676 140024623753024 basic_session_run_hooks.py:260] loss = 6.5004697, step = 20000 (894.206 sec)
    ...
    I1211 16:02:31.998177 140024623753024 basic_session_run_hooks.py:260] loss = 7.100818, step = 30000 (892.416 sec)
    ...
    I1211 18:24:15.941736 140024623753024 basic_session_run_hooks.py:260] loss = 6.5937705, step = 40000 (896.439 sec)
    ...
    I1211 20:45:50.533722 140024623753024 basic_session_run_hooks.py:260] loss = 5.950697, step = 50000 (895.989 sec)
    ...
    I1211 23:07:25.169874 140024623753024 basic_session_run_hooks.py:260] loss = 6.789865, step = 60000 (893.845 sec)
    ...
    I1212 01:28:58.518174 140024623753024 basic_session_run_hooks.py:260] loss = 6.453152, step = 70000 (892.751 sec)
    ...
    I1212 03:50:25.943136 140024623753024 basic_session_run_hooks.py:260] loss = 6.7387037, step = 80000 (889.578 sec)

    What's wrong?

    opened by jwkim912 16
  • ALBERT-xxlarge V2 training on TPU V3-512 extremely slow


    Hello,

    We are training on bioinformatics data using ALBERT-xxlarge on a TPU V3-512.

    According to the paper you trained "ALBERT-xxlarge" for 125k in 32h.

    However, our training will take 7 days to complete 130k.

    Our vocab file has only 34 tokens, and this is our training command:

    python -m albert.run_pretraining \
        --input_file=gs://...../_train_*.tfrecord \
        --output_dir=gs:/....../albert_model/ \
        --albert_config_file=/......../albert-xxlarge-v2-config.json \
        --do_train \
        --do_eval \
        --train_batch_size=10240 \
        --eval_batch_size=64 \
        --max_seq_length=512 \
        --max_predictions_per_seq=20 \
        --optimizer='lamb' \
        --learning_rate=.002 \
        --iterations_per_loop=100 \
        --num_train_steps=130000 \
        --num_warmup_steps=42000 \
        --save_checkpoints_steps=1000 \
        --use_tpu=TRUE \
        --num_tpu_cores=512 \
        --tpu_name=.....
    

    I also tried to change the "iterations_per_loop" to 1000 or even bigger but that didn't help.

    The current logs from the training is :

    I0407 18:04:44.154831 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:04:44.230242 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 70)
    I0407 18:05:06.949739 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 70)
    I0407 18:05:14.312140 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:05:14.393373 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    I0407 18:05:44.470578 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:05:44.566381 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 84)
    I0407 18:06:08.473748 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 84)
    I0407 18:06:14.650656 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:06:14.725901 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    I0407 18:06:44.819700 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:06:44.902827 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 98)
    I0407 18:07:09.999137 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 98)
    I0407 18:07:14.984425 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:07:15.060185 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:loss = 3.1807582, step = 900 (463.227 sec)
    I0407 18:07:18.708591 140198823081728 basic_session_run_hooks.py:260] loss = 3.1807582, step = 900 (463.227 sec)
    INFO:tensorflow:global_step/sec: 0.215877
    I0407 18:07:18.709693 140198823081728 tpu_estimator.py:2307] global_step/sec: 0.215877
    INFO:tensorflow:examples/sec: 2210.58
    I0407 18:07:18.709883 140198823081728 tpu_estimator.py:2308] examples/sec: 2210.58
    INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
    

    It takes around 463 seconds per 100 steps, which means we can train 130k in 7 days. (130000 / 100 ) * 463 = 601900 seconds = 7 days.

    The TPU, the server, and the bucket are all in the same region.

    On SUMMIT (the world's fastest computer) I was able to train a BERT with 30 layers, and it took only 24 hours to finish around 122k steps using 6k V100 GPUs with a global batch size of 11k.

    Do you have any idea why we can't reproduce the same speed as in the paper?

    @0x0539 @Danny-Google Your feedback will be highly appreciated

    opened by agemagician 15
  • Index Out of Range Error in tokenization using TF Hub for Pretrained Albert Models


    I am getting an Index out of Range error in tokenization.py when fine-tuning an ALBERT large model with TF-Hub. I printed out the vocab file and the token before the error. You can see the error and print-outs below.

    Vocab File: b'/tmp/tfhub_modules/c88f9d4ac7469966b2fab3b577a8031ae23e125a/assets/30k-clean.model'
    Token:  
    
    Traceback (most recent call last):
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 318, in <module>
        tf.compat.v1.app.run()
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 185, in main
        tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
        spm_model_file=FLAGS.spm_model_file)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 249, in __init__
        self.vocab = load_vocab(vocab_file)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 203, in load_vocab
        token = token.strip().split()[0]
    IndexError: list index out of range
    

    Albert Finetune Shell Script

    #!/bin/bash
    pip install -r albert/requirements.txt
    python -m albert.run_classifier_with_tfhub \
    --albert_hub_module_handle=https://tfhub.dev/google/albert_xlarge/1 \
    --task_name=cola \
    --do_train=true \
    --do_eval=true  \
    --data_dir=./data-to-albert \
    --max_seq_length=128  \
    --train_batch_size=32  \
    --learning_rate=2e-05 \
    --num_train_epochs=3.0  \
    --output_dir=./checkpoints/test
    
    opened by jsmith09 9
  • Where can I find the "spm_model_file" when run run_squad_v2.py


    Thanks for publishing your code. I was trying to run "run_squad_v2" to learn ALBERT. There is a flag, "--spm_model_file", when running it. What is that? Where can I download that file?

    opened by wmmxk 9
  • Has anyone reproduced SQuAD 1.1 score(90.2/83.2) on albert-base V2?


    Hi, I downloaded the pre-trained ALBERT-base v2 model at the link in README.md and tried to fine-tune it on the SQuAD 1.1 dataset without using the ALBERT hub module. However, I got f1=16.14 and exact match=7.34 as my final result, which is significantly lower than the scores (90.2/83.2) reported in README.md.

    Here is the command that I used for fine-tuning

    • ALBERT_ROOT is the directory path where I keep my albert-base-v2 model
    • train_feature_file, predict_feature_file, predict_feature_left_file were created in SQUAD_DIR after I ran the following command

    python -m run_squad_v1
    --albert_config_file="${ALBERT_ROOT}/albert_config.json"
    --output_dir=./output_base_v2/SQUAD
    --train_file="$SQUAD_DIR/train-v1.1.json"
    --predict_file="$SQUAD_DIR/dev-v1.1.json"
    --train_feature_file="$SQUAD_DIR/train.tfrecord"
    --predict_feature_file="$SQUAD_DIR/dev.tfrecord"
    --predict_feature_left_file="$SQUAD_DIR/pred_left_file.pkl"
    --init_checkpoint=""
    --spm_model_file="${ALBERT_ROOT}/30k-clean.model"
    --do_lower_case
    --max_seq_length=384
    --doc_stride=128
    --max_query_length=64
    --do_train=true
    --do_predict=true
    --train_batch_size=48
    --predict_batch_size=8
    --learning_rate=5e-5
    --num_train_epochs=2.0
    --warmup_proportion=.1
    --save_checkpoints_steps=5000
    --n_best_size=20
    --max_answer_length=30

    opened by YJYJLee 8
  • Bad eval results on RTE and CoLA


    I tried fine-tuning the ALBERT-base model on the two smallest GLUE tasks, but got only about 66% accuracy for both. I was using a GPU (2080Ti). The script for GLUE fine-tuning has a bug in the evaluation part, and I tried to fix it, but I am quite new to TensorFlow, so I am not sure if there is still something wrong with the script. Below is the script I am using:

    set -ex
    
    OUTPUT_DIR="glue_baseline"
    
    # To start from a custom pretrained checkpoint, set ALBERT_HUB_MODULE_HANDLE
    # below to an empty string and set INIT_CHECKPOINT to your checkpoint path.
    ALBERT_HUB_MODULE_HANDLE="https://tfhub.dev/google/albert_base/1"
    INIT_CHECKPOINT=""
    
    ALBERT_ROOT=pretrained/albert_base
    
    
    function run_task() {
      COMMON_ARGS="--output_dir="${OUTPUT_DIR}/$1" --data_dir="${ALBERT_ROOT}/glue" --vocab_file="${ALBERT_ROOT}/vocab.txt" --spm_model_file="${ALBERT_ROOT}/30k-clean.model" --do_lower_case --max_seq_length=128 --optimizer=adamw --task_name=$1 --warmup_step=$2 --learning_rate=$3 --train_step=$4 --save_checkpoints_steps=$5 --train_batch_size=$6"
      python3 -m run_classifier \
          ${COMMON_ARGS} \
          --do_train \
          --nodo_eval \
          --nodo_predict \
          --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
          --init_checkpoint="${INIT_CHECKPOINT}"
      python3 -m run_classifier \
          ${COMMON_ARGS} \
          --nodo_train \
          --do_eval \
          --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
          --do_predict
    }
    
    run_task RTE 200 3e-5 800 100 32
    
    

    I tried printing the training loss and it seems to have converged, but somehow the eval results are nearly random. The eval accuracy differs across checkpoints, so I think the checkpoints have been loaded.

    opened by zhuchen03 8
  • [ALBERT]: LookupError: gradient registry has no entry for: AddV2


    When running run_classifier_with_tfhub.py, the training crashed. The error is:

    LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
    

    My tensorflow-gpu version is 1.14.0

    Does anyone know the reason? Please help. Thanks.

    opened by wxp16 8
  • [ALBERT] Has anyone reproduced the ALBERT scores on the GLUE dataset?


    I converted the TF weights to PyTorch weights, and on the QQP dataset I only get 87% accuracy.

    model: albert-base, epochs: 3, learning_rate: 2e-5, batch size: 24, max sequence length: 128, warmup_proportion: 0.1

    opened by lonePatient 8
  • "no dropout" on v2 models

    You say that you are using "no dropout" on the TF-Hub v2 models. However, looking at the albert_config.json files, there seems to be dropout in most models (https://tfhub.dev/google/albert_base/2). Only on the xxlarge is there no dropout (https://tfhub.dev/google/albert_xxlarge/2). What is correct?

    opened by peregilk 8
  • Significantly lower than expected eval accuracy on MNLI


    Before ALBERT was moved to this repository, I downloaded the pre-trained ALBERT-base-2 from TFHub and used run_classifier_sp.py to evaluate the model on MNLI by modifying the provided run.sh script to execute the following instead of run_pretraining_test:

     python -m albert.run_classifier_sp \
        --output_dir="/path/to/output" \
        --export_dir="/path/to/export" \
        --do_eval \
        --nouse_tpu \
        --eval_batch_size=1 \
        --max_seq_length=4 \
        --max_eval_steps=3 \
        --vocab_file="/path/to/albert-base-2/assets/30k-clean.vocab" \
        --data_dir="/path/to/glue/MNLI" \
        --task_name=MNLI
    

    This gave an eval accuracy of approximately 0.34, which is significantly lower than the expected 0.84 discussed in the paper.

    Has anyone else seen such low out-of-the-box evaluation results? Is this simply an issue with how I'm running the evaluation? If so, are there any recommendations for running evaluation to achieve better results?

    opened by 5donuts 7
  • Bump tensorflow from 1.15.2 to 2.9.3


    Bumps tensorflow from 1.15.2 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • The results can't be reproduced


    Hi, I ran the code according to the official command, but it doesn't reproduce the results; I only get an accuracy of ~71 (dataset: RTE). Can you tell me what is wrong?

    opened by kavin525zhang 2
  • tokenization: log spm usage only in debug mode to avoid console spamming


    Hi,

    I'm not sure if the repo is currently maintained, but this PR changes the log level for the SPM tokenization usage message from info to debug. The message is displayed whenever a line is tokenized/detokenized, which is very annoying because the default log level is info.

    opened by stefan-it 0
  • Difference between v1 and v2 for xxlarge


    Hi,

    I wanted to clarify a point from the paper and README that I am confused about. In the paper, and the repo's README, it seems like the v1 model was trained only on wikipedia and the book corpus, to compare with BERT. However, in the README, there's the following text:

    On average, ALBERT-xxlarge is slightly worse than the v1, because of the following two reasons: 1) Training additional 1.5 M steps (the only difference between these two models is training for 1.5M steps and 3M steps) did not lead to significant performance improvement. 2) For v1, we did a little bit hyperparameter search among the parameters sets given by BERT, Roberta, and XLnet.

    This implies that the xxlarge version of v1 was also trained on additional data.

    The question is whether the v1 xxlarge model was solely trained on wiki+books, or was it trained on additional data?

    opened by yanaiela 0
  • Explicitly import estimator from tensorflow as a separate import instead of accessing it via tf.estimator and depend on the tensorflow estimator target.


    Explicitly import estimator from tensorflow as a separate import instead of accessing it via tf.estimator and depend on the tensorflow estimator target.

    opened by copybara-service[bot] 0