ALBERT: A Lite BERT for Self-supervised Learning of Language Representations


ALBERT

***************New March 28, 2020 ***************

Add a colab tutorial to run fine-tuning for GLUE datasets.

***************New January 7, 2020 ***************

v2 TF-Hub models should be working now with TF 1.15, as we removed the native Einsum op from the graph. See updated TF-Hub links below.

***************New December 30, 2019 ***************

Chinese models are released. We would like to thank the CLUE team for providing the training data.

Version 2 of ALBERT models is released.

In this version, we apply 'no dropout', 'additional training data' and 'long training time' strategies to all models. We train ALBERT-base for 10M steps and other models for 3M steps.

The comparison of results to the v1 models is as follows:

Models          Average SQuAD1.1  SQuAD2.0  MNLI SST-2 RACE
V2
ALBERT-base     82.3    90.2/83.2 82.1/79.3 84.6 92.9  66.8
ALBERT-large    85.7    91.8/85.2 84.9/81.8 86.5 94.9  75.2
ALBERT-xlarge   87.9    92.9/86.4 87.9/84.1 87.9 95.4  80.7
ALBERT-xxlarge  90.9    94.6/89.1 89.8/86.9 90.6 96.8  86.8
V1
ALBERT-base     80.1    89.3/82.3 80.0/77.1 81.6 90.3  64.0
ALBERT-large    82.4    90.6/83.9 82.3/79.4 83.5 91.7  68.5
ALBERT-xlarge   85.5    92.5/86.1 86.1/83.1 86.4 92.4  74.8
ALBERT-xxlarge  91.0    94.8/89.3 90.2/87.4 90.8 96.9  86.5

The comparison shows that for ALBERT-base, ALBERT-large, and ALBERT-xlarge, v2 is much better than v1, indicating the importance of applying the above three strategies. On average, the v2 ALBERT-xxlarge is slightly worse than the v1 model, for two reasons: 1) training for an additional 1.5M steps (the only difference between these two models is training for 1.5M vs. 3M steps) did not lead to significant performance improvement; 2) for v1, we did a small hyperparameter search among the parameter sets given by BERT, RoBERTa, and XLNet, whereas for v2 we simply adopted the parameters from v1, except for RACE, where we use a learning rate of 1e-5 and an ALBERT DR (dropout rate for ALBERT in fine-tuning) of 0. The original (v1) RACE hyperparameters cause model divergence for the v2 models. Given that the downstream tasks are sensitive to the fine-tuning hyperparameters, we should be careful about so-called slight improvements.

ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.

For a technical description of the algorithm, see our paper:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Release Notes

  • Initial release: 10/9/2019

Results

Performance of ALBERT on the GLUE benchmark, using a single-model setup on the dev set:

Models         MNLI QNLI QQP  RTE  SST  MRPC CoLA STS
BERT-large     86.6 92.3 91.3 70.4 93.2 88.0 60.6 90.0
XLNet-large    89.8 93.9 91.8 83.8 95.6 89.2 63.6 91.8
RoBERTa-large  90.2 94.7 92.2 86.6 96.4 90.9 68.0 92.4
ALBERT (1M)    90.4 95.2 92.0 88.1 96.8 90.2 68.7 92.7
ALBERT (1.5M)  90.8 95.3 92.2 89.2 96.9 90.9 71.4 93.0

Performance of ALBERT-xxlarge on the SQuAD and RACE benchmarks, using a single-model setup:

Models                     SQuAD1.1 dev SQuAD2.0 dev SQuAD2.0 test RACE test (Middle/High)
BERT-large                 90.9/84.1    81.8/79.0    89.1/86.3     72.0 (76.6/70.1)
XLNet                      94.5/89.0    88.8/86.1    89.1/86.3     81.8 (85.5/80.2)
RoBERTa                    94.6/88.9    89.4/86.5    89.8/86.8     83.2 (86.5/81.3)
UPM                        -            -            89.9/87.2     -
XLNet + SG-Net Verifier++  -            -            90.1/87.2     -
ALBERT (1M)                94.8/89.2    89.9/87.2    -             86.0 (88.2/85.1)
ALBERT (1.5M)              94.8/89.3    90.2/87.4    90.9/88.1     86.5 (89.0/85.5)

Pre-trained Models

TF-Hub modules are available for the released ALBERT models.

Example usage of the TF-Hub module in code:

import tensorflow_hub as hub

# Select the "train" graph variant of the module when fine-tuning.
tags = set()
if is_training:
  tags.add("train")
albert_module = hub.Module("https://tfhub.dev/google/albert_base/1", tags=tags,
                           trainable=True)
# input_ids, input_mask and segment_ids are int32 Tensors of shape
# [batch_size, seq_length] produced by the ALBERT tokenizer.
albert_inputs = dict(
    input_ids=input_ids,
    input_mask=input_mask,
    segment_ids=segment_ids)
albert_outputs = albert_module(
    inputs=albert_inputs,
    signature="tokens",
    as_dict=True)

# If you want to use the token-level output, use
# albert_outputs["sequence_output"] instead.
output_layer = albert_outputs["pooled_output"]
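
For a downstream classifier, the pooled output is typically projected to the label space. The following is a minimal sketch in the same TF 1.x graph style, not the exact code from run_classifier.py; num_labels is assumed to be defined by the task:

import tensorflow as tf  # TF 1.x (e.g. 1.15), matching the graph-mode example above

# Hypothetical classification head on top of the pooled ALBERT output.
hidden_size = output_layer.shape[-1].value
output_weights = tf.get_variable(
    "output_weights", [num_labels, hidden_size],
    initializer=tf.truncated_normal_initializer(stddev=0.02))
output_bias = tf.get_variable(
    "output_bias", [num_labels], initializer=tf.zeros_initializer())
logits = tf.nn.bias_add(
    tf.matmul(output_layer, output_weights, transpose_b=True), output_bias)
probabilities = tf.nn.softmax(logits, axis=-1)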

Most of the fine-tuning scripts in this repository support TF-Hub modules via the --albert_hub_module_handle flag.

Pre-training Instructions

To pretrain ALBERT, use run_pretraining.py:

pip install -r albert/requirements.txt
python -m albert.run_pretraining \
    --input_file=... \
    --output_dir=... \
    --init_checkpoint=... \
    --albert_config_file=... \
    --do_train \
    --do_eval \
    --train_batch_size=4096 \
    --eval_batch_size=64 \
    --max_seq_length=512 \
    --max_predictions_per_seq=20 \
    --optimizer='lamb' \
    --learning_rate=.00176 \
    --num_train_steps=125000 \
    --num_warmup_steps=3125 \
    --save_checkpoints_steps=5000
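
The --albert_config_file flag points to the albert_config.json shipped with each pre-trained model. A minimal sketch for inspecting it with the standard json module; the field names used here (vocab_size, embedding_size, hidden_size, num_hidden_layers) follow the released configs, and the path is an assumption:

import json

# Path is an assumption; point it at the albert_config.json from the
# downloaded tar file or the TF-Hub assets folder.
with open("albert_config.json") as f:
    config = json.load(f)

for key in ("vocab_size", "embedding_size", "hidden_size", "num_hidden_layers"):
    print(key, "=", config.get(key))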

Fine-tuning on GLUE

To fine-tune and evaluate a pretrained ALBERT on GLUE, please see the convenience script run_glue.sh.

Lower-level use cases may want to use the run_classifier.py script directly. The run_classifier.py script is used both for fine-tuning and evaluation of ALBERT on individual GLUE benchmark tasks, such as MNLI:

pip install -r albert/requirements.txt
python -m albert.run_classifier \
  --data_dir=... \
  --output_dir=... \
  --init_checkpoint=... \
  --albert_config_file=... \
  --spm_model_file=... \
  --do_train \
  --do_eval \
  --do_predict \
  --do_lower_case \
  --max_seq_length=128 \
  --optimizer=adamw \
  --task_name=MNLI \
  --warmup_step=1000 \
  --learning_rate=3e-5 \
  --train_step=10000 \
  --save_checkpoints_steps=100 \
  --train_batch_size=128

Good default flag values for each GLUE task can be found in run_glue.sh.

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

You can find the spm_model_file in the tar files or under the assets folder of the tf-hub module. The name of the model file is "30k-clean.model".
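
To sanity-check the SentencePiece model before fine-tuning, you can load it with the sentencepiece Python package. A minimal sketch, with the path assumed to point at the 30k-clean.model extracted from the tar file or the TF-Hub assets folder:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")  # assumed path to the extracted model file

print("vocab size:", sp.GetPieceSize())
print(sp.EncodeAsPieces("ALBERT uses a SentencePiece vocabulary."))
print(sp.EncodeAsIds("ALBERT uses a SentencePiece vocabulary."))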

After evaluation, the script should report some output like this (the metrics shown, masked LM and sentence-order prediction, are those produced by the pre-training evaluation in run_pretraining):

***** Eval results *****
  global_step = ...
  loss = ...
  masked_lm_accuracy = ...
  masked_lm_loss = ...
  sentence_order_accuracy = ...
  sentence_order_loss = ...

Fine-tuning on SQuAD

To fine-tune and evaluate a pretrained model on SQuAD v1, use the run_squad_v1.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v1 \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --predict_file=... \
  --train_feature_file=... \
  --predict_feature_file=... \
  --predict_feature_left_file=... \
  --init_checkpoint=... \
  --spm_model_file=... \
  --do_lower_case \
  --max_seq_length=384 \
  --doc_stride=128 \
  --max_query_length=64 \
  --do_train=true \
  --do_predict=true \
  --train_batch_size=48 \
  --predict_batch_size=8 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --warmup_proportion=.1 \
  --save_checkpoints_steps=5000 \
  --n_best_size=20 \
  --max_answer_length=30

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

For SQuAD v2, use the run_squad_v2.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v2 \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --predict_file=... \
  --train_feature_file=... \
  --predict_feature_file=... \
  --predict_feature_left_file=... \
  --init_checkpoint=... \
  --spm_model_file=... \
  --do_lower_case \
  --max_seq_length=384 \
  --doc_stride=128 \
  --max_query_length=64 \
  --do_train \
  --do_predict \
  --train_batch_size=48 \
  --predict_batch_size=8 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --warmup_proportion=.1 \
  --save_checkpoints_steps=5000 \
  --n_best_size=20 \
  --max_answer_length=30

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

Fine-tuning on RACE

For RACE, use the run_race.py script:

pip install -r albert/requirements.txt
python -m albert.run_race \
  --albert_config_file=... \
  --output_dir=... \
  --train_file=... \
  --eval_file=... \
  --data_dir=...\
  --init_checkpoint=... \
  --spm_model_file=... \
  --max_seq_length=512 \
  --max_qa_length=128 \
  --do_train \
  --do_eval \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --learning_rate=1e-5 \
  --train_step=12000 \
  --warmup_step=1000 \
  --save_checkpoints_steps=100

You can fine-tune the model starting from TF-Hub modules instead of raw checkpoints by setting e.g. --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 instead of --init_checkpoint.

SentencePiece

Command for generating the SentencePiece vocabulary:

spm_train \
  --input all.txt --model_prefix=30k-clean --vocab_size=30000 --logtostderr \
  --pad_id=0 --unk_id=1 --eos_id=-1 --bos_id=-1 \
  --control_symbols=[CLS],[SEP],[MASK] \
  --user_defined_symbols="(,),\",-,.,–,£,€" \
  --shuffle_input_sentence=true --input_sentence_size=10000000 \
  --character_coverage=0.99995 --model_type=unigram
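
As a quick check that the flags above took effect, the resulting model can be inspected with the sentencepiece package; this is an illustrative sketch rather than part of the repository:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")

# --pad_id=0 and --unk_id=1 were set in the spm_train flags above.
print("pad id:", sp.pad_id())   # expected 0
print("unk id:", sp.unk_id())   # expected 1

# [CLS], [SEP] and [MASK] were registered as control symbols.
for piece in ("[CLS]", "[SEP]", "[MASK]"):
    print(piece, "->", sp.PieceToId(piece))
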
Comments
  • LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)

    I am using run_classifier_with_tfhub with --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2.

    I am getting an error like "LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)"

    The command is: python3 -m run_classifier_with_tfhub --data_dir=../../DataSet/CoLA/ --task_name=cola --output_dir=testing_ttt --vocab_file=vocab.txt --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2 --do_train=True --do_eval=True --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-05 --num_train_epochs=3.0

    I am using tensorflow==1.15.0

    opened by MichaelCaohn 22
  • No decreasing loss when pre-training for xxlarge

    Hi, I'm pre-training an xxlarge model on my own language. I trained on a TPU v2-256, but the loss is not decreasing. Below is the training information.

    • vocab size: 33001
    • training data size: 518G ( dupe factor: 10)
    • max_seq_length: 512
    • 3 gram masking, using SOP
    • word size: 5 B
    • batch size: 512
    • optimizer: lamb
    • learning rate: 0.00176

    I1211 08:56:02.464132 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:<DatasetV1Adapter shapes: {input_ids: (2, 512), input_mask: (2, 512), masked_lm_ids: (2, 77), masked_lm_positions: (2, 77), masked_lm_weights: (2, 77), next_sentence_labels: (2, 1), segment_ids: (2, 512)}, types: {input_ids: tf.int32, input_mask: tf.int32, masked_lm_ids: tf.int32, masked_lm_positions: tf.int32, masked_lm_weights: tf.float32, next_sentence_labels: tf.int32, segment_ids: tf.int32}> I1211 08:56:02.510196 140024623753024 run_pretraining.py:457] <DatasetV1Adapter shapes: {input_ids: (2, 512), input_mask: (2, 512), masked_lm_ids: (2, 77), masked_lm_positions: (2, 77), masked_lm_weights: (2, 77), next_sentence_labels: (2, 1), segment_ids: (2, 512)}, types: {input_ids: tf.int32, input_mask: tf.int32, masked_lm_ids: tf.int32, masked_lm_positions: tf.int32, masked_lm_weights: tf.float32, next_sentence_labels: tf.int32, segment_ids: tf.int32}> INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.523885 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.526081 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.527927 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.529864 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.531889 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.533753 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.535558 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] INFO:tensorflow:Found small feature: next_sentence_labels [2, 1] I1211 08:56:02.537545 140024623753024 tpu_estimator.py:1201] Found small feature: next_sentence_labels [2, 1] 2019-12-11 08:56:02.673414: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory 2019-12-11 08:56:02.673472: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303) 2019-12-11 08:56:02.673496: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (instance-2): /proc/driver/nvidia/version does not exist INFO:tensorflow:*** Features *** I1211 08:56:02.704437 140024623753024 run_pretraining.py:150] *** Features *** INFO:tensorflow: name = input_ids, shape = (2, 512) I1211 08:56:02.704912 140024623753024 run_pretraining.py:152] name = input_ids, shape = (2, 512) INFO:tensorflow: name = input_mask, shape = (2, 512) I1211 08:56:02.705025 140024623753024 run_pretraining.py:152] name = input_mask, shape = (2, 512) INFO:tensorflow: name = masked_lm_ids, shape = (2, 77) I1211 08:56:02.705091 140024623753024 run_pretraining.py:152] name = masked_lm_ids, shape = (2, 77) INFO:tensorflow: name = masked_lm_positions, 
shape = (2, 77) I1211 08:56:02.705152 140024623753024 run_pretraining.py:152] name = masked_lm_positions, shape = (2, 77) INFO:tensorflow: name = masked_lm_weights, shape = (2, 77) I1211 08:56:02.705220 140024623753024 run_pretraining.py:152] name = masked_lm_weights, shape = (2, 77) INFO:tensorflow: name = next_sentence_labels, shape = (2, 1) I1211 08:56:02.705290 140024623753024 run_pretraining.py:152] name = next_sentence_labels, shape = (2, 1) INFO:tensorflow: name = segment_ids, shape = (2, 512) I1211 08:56:02.705374 140024623753024 run_pretraining.py:152] name = segment_ids, shape = (2, 512)

    INFO:tensorflow:**** Trainable Variables **** I1211 08:56:04.239879 140024623753024 run_pretraining.py:220] **** Trainable Variables **** INFO:tensorflow: name = bert/embeddings/word_embeddings:0, shape = (33001, 128) I1211 08:56:04.239998 140024623753024 run_pretraining.py:226] name = bert/embeddings/word_embeddings:0, shape = (33001, 128) INFO:tensorflow: name = bert/embeddings/token_type_embeddings:0, shape = (2, 128) I1211 08:56:04.240141 140024623753024 run_pretraining.py:226] name = bert/embeddings/token_type_embeddings:0, shape = (2, 128) INFO:tensorflow: name = bert/embeddings/position_embeddings:0, shape = (512, 128) I1211 08:56:04.240252 140024623753024 run_pretraining.py:226] name = bert/embeddings/position_embeddings:0, shape = (512, 128) INFO:tensorflow: name = bert/embeddings/LayerNorm/beta:0, shape = (128,) I1211 08:56:04.240369 140024623753024 run_pretraining.py:226] name = bert/embeddings/LayerNorm/beta:0, shape = (128,) INFO:tensorflow: name = bert/embeddings/LayerNorm/gamma:0, shape = (128,) I1211 08:56:04.240468 140024623753024 run_pretraining.py:226] name = bert/embeddings/LayerNorm/gamma:0, shape = (128,) INFO:tensorflow: name = bert/encoder/embedding_hidden_mapping_in/kernel:0, shape = (128, 4096) I1211 08:56:04.240564 140024623753024 run_pretraining.py:226] name = bert/encoder/embedding_hidden_mapping_in/kernel:0, shape = (128, 4096) INFO:tensorflow: name = bert/encoder/embedding_hidden_mapping_in/bias:0, shape = (4096,) I1211 08:56:04.240664 140024623753024 run_pretraining.py:226] name = bert/encoder/embedding_hidden_mapping_in/bias:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel:0, shape = (4096, 4096) I1211 08:56:04.240769 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel:0, shape = (4096, 4096) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias:0, shape = (4096,) I1211 08:56:04.240869 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel:0, shape = (4096, 4096) I1211 08:56:04.240964 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel:0, shape = (4096, 4096) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias:0, shape = (4096,) I1211 08:56:04.241075 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel:0, shape = (4096, 4096) I1211 08:56:04.241171 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel:0, shape = (4096, 4096) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias:0, shape = (4096,) I1211 08:56:04.241268 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel:0, shape = (4096, 4096) I1211 08:56:04.241392 140024623753024 run_pretraining.py:226] name = 
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel:0, shape = (4096, 4096) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias:0, shape = (4096,) I1211 08:56:04.241534 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta:0, shape = (4096,) I1211 08:56:04.241631 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma:0, shape = (4096,) I1211 08:56:04.241748 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel:0, shape = (4096, 16384) I1211 08:56:04.241850 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel:0, shape = (4096, 16384) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias:0, shape = (16384,) I1211 08:56:04.241949 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias:0, shape = (16384,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel:0, shape = (16384, 4096) I1211 08:56:04.242043 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel:0, shape = (16384, 4096) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias:0, shape = (4096,) I1211 08:56:04.242140 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta:0, shape = (4096,) I1211 08:56:04.242233 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta:0, shape = (4096,) INFO:tensorflow: name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma:0, shape = (4096,) I1211 08:56:04.242332 140024623753024 run_pretraining.py:226] name = bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma:0, shape = (4096,) INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (4096, 4096) I1211 08:56:04.242433 140024623753024 run_pretraining.py:226] name = bert/pooler/dense/kernel:0, shape = (4096, 4096) INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (4096,) I1211 08:56:04.242532 140024623753024 run_pretraining.py:226] name = bert/pooler/dense/bias:0, shape = (4096,) INFO:tensorflow: name = cls/predictions/transform/dense/kernel:0, shape = (4096, 128) I1211 08:56:04.242635 140024623753024 run_pretraining.py:226] name = cls/predictions/transform/dense/kernel:0, shape = (4096, 128) INFO:tensorflow: name = cls/predictions/transform/dense/bias:0, shape = (128,) I1211 08:56:04.242760 140024623753024 run_pretraining.py:226] name = cls/predictions/transform/dense/bias:0, shape = (128,) INFO:tensorflow: name = cls/predictions/transform/LayerNorm/beta:0, shape = (128,) I1211 08:56:04.242856 140024623753024 run_pretraining.py:226] 
name = cls/predictions/transform/LayerNorm/beta:0, shape = (128,) INFO:tensorflow: name = cls/predictions/transform/LayerNorm/gamma:0, shape = (128,) I1211 08:56:04.242951 140024623753024 run_pretraining.py:226] name = cls/predictions/transform/LayerNorm/gamma:0, shape = (128,) INFO:tensorflow: name = cls/predictions/output_bias:0, shape = (33001,) I1211 08:56:04.243044 140024623753024 run_pretraining.py:226] name = cls/predictions/output_bias:0, shape = (33001,) INFO:tensorflow: name = cls/seq_relationship/output_weights:0, shape = (2, 4096) I1211 08:56:04.243137 140024623753024 run_pretraining.py:226] name = cls/seq_relationship/output_weights:0, shape = (2, 4096) INFO:tensorflow: name = cls/seq_relationship/output_bias:0, shape = (2,) I1211 08:56:04.243235 140024623753024 run_pretraining.py:226] name = cls/seq_relationship/output_bias:0, shape = (2,)

    I1211 09:12:03.138811 140024623753024 basic_session_run_hooks.py:262] loss = 10.181114, step = 1000 I1211 09:26:09.008900 140024623753024 basic_session_run_hooks.py:260] loss = 7.6005945, step = 2000 (845.870 sec) I1211 09:40:12.286720 140024623753024 basic_session_run_hooks.py:260] loss = 7.645055, step = 3000 (843.278 sec) I1211 09:54:16.299396 140024623753024 basic_session_run_hooks.py:260] loss = 7.6258326, step = 4000 (844.013 sec) I1211 10:08:19.825035 140024623753024 basic_session_run_hooks.py:260] loss = 7.363482, step = 5000 (843.526 sec) I1211 10:22:25.123742 140024623753024 basic_session_run_hooks.py:260] loss = 6.8203845, step = 6000 (845.299 sec) I1211 10:36:29.082039 140024623753024 basic_session_run_hooks.py:260] loss = 6.5194592, step = 7000 (843.958 sec) I1211 10:50:31.896788 140024623753024 basic_session_run_hooks.py:260] loss = 6.854472, step = 8000 (842.815 sec) I1211 11:04:36.726402 140024623753024 basic_session_run_hooks.py:260] loss = 7.0283566, step = 9000 (844.830 sec) I1211 11:19:29.132026 140024623753024 basic_session_run_hooks.py:260] loss = 6.5989375, step = 10000 (892.406 sec) I1211 11:33:32.866184 140024623753024 basic_session_run_hooks.py:260] loss = 6.550018, step = 11000 (843.734 sec) ... ... I1211 13:41:01.039676 140024623753024 basic_session_run_hooks.py:260] loss = 6.5004697, step = 20000 (894.206 sec) ... I1211 16:02:31.998177 140024623753024 basic_session_run_hooks.py:260] loss = 7.100818, step = 30000 (892.416 sec) ... I1211 18:24:15.941736 140024623753024 basic_session_run_hooks.py:260] loss = 6.5937705, step = 40000 (896.439 sec) ... I1211 20:45:50.533722 140024623753024 basic_session_run_hooks.py:260] loss = 5.950697, step = 50000 (895.989 sec) ... I1211 23:07:25.169874 140024623753024 basic_session_run_hooks.py:260] loss = 6.789865, step = 60000 (893.845 sec) ... I1212 01:28:58.518174 140024623753024 basic_session_run_hooks.py:260] loss = 6.453152, step = 70000 (892.751 sec) ... I1212 03:50:25.943136 140024623753024 basic_session_run_hooks.py:260] loss = 6.7387037, step = 80000 (889.578 sec)

    What's wrong?

    opened by jwkim912 16
  • ALBERT-xxlarge V2 training on TPU V3-512 extremely slow

    Hello,

    We are training on bioinformatics data using ALBERT-xxlarge on a TPU v3-512.

    According to the paper, you trained ALBERT-xxlarge for 125k steps in 32 hours.

    However, our training will take 7 days to complete 130k steps.

    Our vocab file has only 34 entries, and this is our training command:

    python -m albert.run_pretraining \
        --input_file=gs://...../_train_*.tfrecord \
        --output_dir=gs:/....../albert_model/ \
        --albert_config_file=/......../albert-xxlarge-v2-config.json \
        --do_train \
        --do_eval \
        --train_batch_size=10240 \
        --eval_batch_size=64 \
        --max_seq_length=512 \
        --max_predictions_per_seq=20 \
        --optimizer='lamb' \
        --learning_rate=.002 \
        --iterations_per_loop=100 \
        --num_train_steps=130000 \
        --num_warmup_steps=42000 \
        --save_checkpoints_steps=1000 \
        --use_tpu=TRUE \
        --num_tpu_cores=512 \
        --tpu_name=.....
    

    I also tried to change the "iterations_per_loop" to 1000 or even bigger but that didn't help.

    The current logs from the training is :

    I0407 18:04:44.154831 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:04:44.230242 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 70)
    I0407 18:05:06.949739 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 70)
    I0407 18:05:14.312140 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:05:14.393373 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    I0407 18:05:44.470578 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:05:44.566381 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 84)
    I0407 18:06:08.473748 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 84)
    I0407 18:06:14.650656 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:06:14.725901 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    I0407 18:06:44.819700 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:06:44.902827 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:Outfeed finished for iteration (8, 98)
    I0407 18:07:09.999137 140197647865600 tpu_estimator.py:279] Outfeed finished for iteration (8, 98)
    I0407 18:07:14.984425 140197639472896 transport.py:157] Attempting refresh to obtain initial access_token
    WARNING:tensorflow:TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    W0407 18:07:15.060185 140197639472896 preempted_hook.py:91] TPUPollingThread found TPU b'node-6' in state READY, and health HEALTHY.
    INFO:tensorflow:loss = 3.1807582, step = 900 (463.227 sec)
    I0407 18:07:18.708591 140198823081728 basic_session_run_hooks.py:260] loss = 3.1807582, step = 900 (463.227 sec)
    INFO:tensorflow:global_step/sec: 0.215877
    I0407 18:07:18.709693 140198823081728 tpu_estimator.py:2307] global_step/sec: 0.215877
    INFO:tensorflow:examples/sec: 2210.58
    I0407 18:07:18.709883 140198823081728 tpu_estimator.py:2308] examples/sec: 2210.58
    INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
    

    It takes around 463 seconds per 100 steps, which means training 130k steps will take about 7 days: (130000 / 100) * 463 = 601,900 seconds ≈ 7 days.

    The TPU, the server, and the bucket are all in the same region.

    On Summit (the world's fastest computer), I was able to train a 30-layer BERT, and it took only 24 hours to finish around 122k steps using 6k V100 GPUs with a global batch size of 11k.

    Do you have any idea why we can't reproduce the same speed as in the paper?

    @0x0539 @Danny-Google Your feedback will be highly appreciated

    opened by agemagician 15
  • Index Out of Range Error in tokenization using TF Hub for Pretrained Albert Models

    I am getting an IndexError (list index out of range) in tokenization.py when fine-tuning an ALBERT large model with TF Hub. I printed out the vocab file path and the token right before the error. You can see the error and print-outs below.

    Vocab File: b'/tmp/tfhub_modules/c88f9d4ac7469966b2fab3b577a8031ae23e125a/assets/30k-clean.model'
    Token:  
    
    Traceback (most recent call last):
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 318, in <module>
        tf.compat.v1.app.run()
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/[user]/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 185, in main
        tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
        spm_model_file=FLAGS.spm_model_file)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 249, in __init__
        self.vocab = load_vocab(vocab_file)
      File "/home/[user]/Documents/ml-tests/falling-albert/albert/tokenization.py", line 203, in load_vocab
        token = token.strip().split()[0]
    IndexError: list index out of range
    

    Albert Finetune Shell Script

    #!/bin/bash
    pip install -r albert/requirements.txt
    python -m albert.run_classifier_with_tfhub \
    --albert_hub_module_handle=https://tfhub.dev/google/albert_xlarge/1 \
    --task_name=cola \
    --do_train=true \
    --do_eval=true  \
    --data_dir=./data-to-albert \
    --max_seq_length=128  \
    --train_batch_size=32  \
    --learning_rate=2e-05 \
    --num_train_epochs=3.0  \
    --output_dir=./checkpoints/test
    
    opened by jsmith09 9
  • Where can I find the "spm_model_file" when running run_squad_v2.py

    Thanks for publishing your code. I was trying to run run_squad_v2 to learn ALBERT. There is a flag, "--spm_model_file", when running it. What is that? Where can I download that file?

    opened by wmmxk 9
  • Has anyone reproduced the SQuAD 1.1 score (90.2/83.2) on ALBERT-base V2?

    Hi, I downloaded the pre-trained ALBERT-base V2 model from the link in README.md and tried to fine-tune it on the SQuAD 1.1 dataset without using the ALBERT hub module. However, I got f1=16.14 and exact match=7.34 as my final result, which is significantly lower than the scores (90.2/83.2) reported in README.md.

    Here is the command that I used for fine-tuning

    • ALBERT_ROOT is the directory path where I keep my albert-base-v2 model
    • train_feature_file, predict_feature_file, predict_feature_left_file were created in SQUAD_DIR after I ran the following command

    python -m run_squad_v1 \
    --albert_config_file="${ALBERT_ROOT}/albert_config.json" \
    --output_dir=./output_base_v2/SQUAD \
    --train_file="$SQUAD_DIR/train-v1.1.json" \
    --predict_file="$SQUAD_DIR/dev-v1.1.json" \
    --train_feature_file="$SQUAD_DIR/train.tfrecord" \
    --predict_feature_file="$SQUAD_DIR/dev.tfrecord" \
    --predict_feature_left_file="$SQUAD_DIR/pred_left_file.pkl" \
    --init_checkpoint="" \
    --spm_model_file="${ALBERT_ROOT}/30k-clean.model" \
    --do_lower_case \
    --max_seq_length=384 \
    --doc_stride=128 \
    --max_query_length=64 \
    --do_train=true \
    --do_predict=true \
    --train_batch_size=48 \
    --predict_batch_size=8 \
    --learning_rate=5e-5 \
    --num_train_epochs=2.0 \
    --warmup_proportion=.1 \
    --save_checkpoints_steps=5000 \
    --n_best_size=20 \
    --max_answer_length=30

    opened by YJYJLee 8
  • Bad eval results on RTE and CoLA

    I tried fine-tuning the ALBERT-base model on the two smallest GLUE tasks, but got only about 66% accuracy for both. I was using a GPU (2080 Ti) for it. The script for GLUE fine-tuning has a bug in the evaluation part, and I tried to fix it, but I am quite new to TensorFlow, so I am not sure whether there is still something wrong with the script. Below is the script I am using:

    set -ex
    
    OUTPUT_DIR="glue_baseline"
    
    # To start from a custom pretrained checkpoint, set ALBERT_HUB_MODULE_HANDLE
    # below to an empty string and set INIT_CHECKPOINT to your checkpoint path.
    ALBERT_HUB_MODULE_HANDLE="https://tfhub.dev/google/albert_base/1"
    INIT_CHECKPOINT=""
    
    ALBERT_ROOT=pretrained/albert_base
    
    
    function run_task() {
      COMMON_ARGS="--output_dir="${OUTPUT_DIR}/$1" --data_dir="${ALBERT_ROOT}/glue" --vocab_file="${ALBERT_ROOT}/vocab.txt" --spm_model_file="${ALBERT_ROOT}/30k-clean.model" --do_lower_case --max_seq_length=128 --optimizer=adamw --task_name=$1 --warmup_step=$2 --learning_rate=$3 --train_step=$4 --save_checkpoints_steps=$5 --train_batch_size=$6"
      python3 -m run_classifier \
          ${COMMON_ARGS} \
          --do_train \
          --nodo_eval \
          --nodo_predict \
          --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
          --init_checkpoint="${INIT_CHECKPOINT}"
      python3 -m run_classifier \
          ${COMMON_ARGS} \
          --nodo_train \
          --do_eval \
          --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
          --do_predict
    }
    
    run_task RTE 200 3e-5 800 100 32
    
    

    I tried printing the training loss and it seems to have converged, but somehow the eval results are nearly random. The eval accuracy for different checkpoints are different, so I think these checkpoints have been loaded.

    opened by zhuchen03 8
  • [ALBERT]: LookupError: gradient registry has no entry for: AddV2

    When running run_classifier_with_tfhub.py, the training crashed. The error is:

    LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
    

    My tensorflow-gpu version is 1.14.0

    Does anyone know the reason? Please help, thanks.

    opened by wxp16 8
  • [ALBERT] Has anyone reproduced the ALBERT scores on the GLUE dataset?

    I converted the TF weights to PyTorch weights, and on the QQP dataset I only get 87% accuracy.

    • model: albert-base
    • epochs: 3
    • learning_rate: 2e-5
    • batch size: 24
    • max sequence length: 128
    • warmup_proportion: 0.1

    opened by lonePatient 8
  • "no dropout" on v2 models

    You say that you are using "no dropout" for the TF-Hub v2 models. However, looking at the albert_config.json files, there seems to be dropout on most models (https://tfhub.dev/google/albert_base/2). Only on the xxlarge is there no dropout (https://tfhub.dev/google/albert_xxlarge/2). Which is correct?

    opened by peregilk 8
  • Significantly lower than expected eval accuracy on MNLI

    Before ALBERT was moved to this repository, I downloaded the pre-trained ALBERT-base-2 from TFHub and used run_classifier_sp.py to evaluate the model on MNLI by modifying the provided run.sh script to execute the following instead of run_pretraining_test:

     python -m albert.run_classifier_sp \
        --output_dir="/path/to/output" \
        --export_dir="/path/to/export" \
        --do_eval \
        --nouse_tpu \
        --eval_batch_size=1 \
        --max_seq_length=4 \
        --max_eval_steps=3 \
        --vocab_file="/path/to/albert-base-2/assets/30k-clean.vocab" \
        --data_dir="/path/to/glue/MNLI" \
        --task_name=MNLI
    

    This gave an eval accuracy of approximately 0.34, which is significantly lower than the expected 0.84 discussed in the paper.

    Has anyone else seen such low out-of-the-box evaluation results? Is this simply an issue with how I'm running the evaluation? If so, are there any recommendations for running evaluation to achieve better results?

    opened by 5donuts 7
  • Bump tensorflow from 1.15.2 to 2.9.3

    Bumps tensorflow from 1.15.2 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This releases introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • The results can't be reproduced

    Hi, I ran the code with the official command, but it doesn't reproduce the results; I only get an accuracy of ~71 (dataset: RTE). Can you tell me what is wrong?

    opened by kavin525zhang 2
  • tokenization: log spm usage only in debug mode to avoid console spamming

    Hi,

    I'm not sure if the repo is currently maintained, but this PR increases the log level (from info to debug) for the SPM tokenization usage message. It is displayed whenever a line is tokenized/detokenized, which is very annoying because the default log level is info.

    opened by stefan-it 0
  • Difference between v1 and v2 for xxlarge

    Hi,

    I wanted to clarify a point from the paper and README that I am confused about. In the paper and the repo's README, it seems like the v1 model was trained only on Wikipedia and the Book Corpus, to compare with BERT. However, the README contains the following text:

    On average, ALBERT-xxlarge is slightly worse than the v1, because of the following two reasons: 1) Training additional 1.5 M steps (the only difference between these two models is training for 1.5M steps and 3M steps) did not lead to significant performance improvement. 2) For v1, we did a little bit hyperparameter search among the parameters sets given by BERT, Roberta, and XLnet.

    This implies that the xxlarge version of v1 was also trained on additional data.

    The question is whether the v1 xxlarge model was trained solely on wiki+books, or whether it was trained on additional data.

    opened by yanaiela 0
  • Explicitly import estimator from tensorflow as a separate import instead of accessing it via tf.estimator

    Explicitly import estimator from tensorflow as a separate import instead of accessing it via tf.estimator and depend on the tensorflow estimator target.

    opened by copybara-service[bot] 0
Owner: Google Research