ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA

Introduction

ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

For a detailed description and experimental results, please refer to our ICLR 2020 paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.

This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks including classification tasks (e.g., GLUE), QA tasks (e.g., SQuAD), and sequence tagging tasks (e.g., text chunking).

This repository also contains code for Electric, a version of ELECTRA inspired by energy-based models. Electric provides a more principled view of ELECTRA as a "negative sampling" cloze model. It can also efficiently produce pseudo-likelihood scores for text, which can be used to re-rank the outputs of speech recognition or machine translation systems. For details on Electric, please refer to our EMNLP 2020 paper Pre-Training Transformers as Energy-Based Cloze Models.

Released Models

We are initially releasing three pre-trained models:

Model          Layers  Hidden Size  Params  GLUE score (test set)  Download
ELECTRA-Small  12      256          14M     77.4                   link
ELECTRA-Base   12      768          110M    82.7                   link
ELECTRA-Large  24      1024         335M    85.2                   link

The models were trained on uncased English text. They correspond to ELECTRA-Small++, ELECTRA-Base++, ELECTRA-1.75M in our paper. We hope to release other models, such as multilingual models, in the future.

On GLUE, ELECTRA-Large scores slightly better than ALBERT/XLNet, ELECTRA-Base scores better than BERT-Large, and ELECTRA-Small scores slightly worse than TinyBERT (but uses no distillation). See the expected results section below for detailed performance numbers.

Requirements

Pre-training

Use build_pretraining_dataset.py to create a pre-training dataset from a dump of raw text. It has the following arguments:

  • --corpus-dir: A directory containing raw text files to turn into ELECTRA examples. A text file can contain multiple documents with empty lines separating them.
  • --vocab-file: File defining the wordpiece vocabulary.
  • --output-dir: Where to write out ELECTRA examples.
  • --max-seq-length: The number of tokens per example (128 by default).
  • --num-processes: If >1 parallelize across multiple processes (1 by default).
  • --blanks-separate-docs: Whether blank lines indicate document boundaries (True by default).
  • --do-lower-case/--no-lower-case: Whether to lower case the input text (True by default).
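The corpus format described above (plain-text files in which empty lines separate documents) can be produced with a few lines of Python. The following is only a minimal sketch: the helper name write_corpus_file and the file name part-0.txt are hypothetical and are not part of this repository.

```python
import tempfile
from pathlib import Path


def write_corpus_file(documents, path):
    """Write documents to one text file, separated by a single empty line.

    `documents` is a list of documents; each document is a list of text lines.
    Empty lines inside a document are dropped so that they are not mistaken
    for document boundaries.
    """
    blocks = []
    for doc in documents:
        lines = [line.strip() for line in doc if line.strip()]
        if lines:
            blocks.append("\n".join(lines))
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")


# Hypothetical usage: two short documents written to a temporary corpus directory.
corpus_dir = Path(tempfile.mkdtemp())
docs = [
    ["First document, line one.", "First document, line two."],
    ["Second document."],
]
write_corpus_file(docs, corpus_dir / "part-0.txt")
print((corpus_dir / "part-0.txt").read_text(encoding="utf-8"))
```

A directory prepared this way could then be passed as --corpus-dir.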

Use run_pretraining.py to pre-train an ELECTRA model. It has the following arguments:

  • --data-dir: a directory where pre-training data, model weights, etc. are stored. By default, the training loads examples from <data-dir>/pretrain_tfrecords and a vocabulary from <data-dir>/vocab.txt.
  • --model-name: a name for the model being trained. Model weights will be saved in <data-dir>/models/<model-name> by default.
  • --hparams (optional): a JSON dict or path to a JSON file containing model hyperparameters, data paths, etc. See configure_pretraining.py for the supported hyperparameters.

If training is halted, re-running run_pretraining.py with the same arguments will continue the training where it left off.

You can continue pre-training from the released ELECTRA checkpoints by

  1. Setting the model-name to point to a downloaded model (e.g., --model-name electra_small if you downloaded weights to $DATA_DIR/electra_small).
  2. Setting num_train_steps by (for example) adding "num_train_steps": 4010000 to the --hparams. This will continue training the small model for 10000 more steps (it has already been trained for 4e6 steps).
  3. Increasing the learning rate to account for the linear learning rate decay. For example, to start with a learning rate of 2e-4 you should set the learning_rate hparam to 2e-4 * (4e6 + 10000) / 10000.
  4. For ELECTRA-Small, also specifying "generator_hidden_size": 1.0 in the hparams, because we did not use a small generator for that model.
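The arithmetic in the steps above can be sketched in Python. This is an illustration only: it assumes a learning rate that decays linearly to zero over num_train_steps (ignoring any warm-up), and the helper name resume_hparams is hypothetical.

```python
import json


def resume_hparams(steps_done, extra_steps, start_lr, small_model=False):
    """Build --hparams for continuing pre-training from a released checkpoint.

    With linear decay to zero, the rate at step t is
    peak_lr * (total - t) / total, so the peak rate is scaled up such that
    the rate at `steps_done` equals `start_lr`.
    """
    total = steps_done + extra_steps
    hparams = {
        "num_train_steps": total,
        "learning_rate": start_lr * total / extra_steps,
    }
    if small_model:
        # ELECTRA-Small was trained without a reduced-size generator.
        hparams["generator_hidden_size"] = 1.0
    return hparams


# Continue the small model (already trained for 4e6 steps) for 10000 more steps.
hparams = resume_hparams(4_000_000, 10_000, 2e-4, small_model=True)
print(json.dumps(hparams))
```

The printed JSON string can be passed to --hparams, wrapped in single quotes for the shell.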

Quickstart: Pre-train a small ELECTRA model.

These instructions pre-train a small ELECTRA model (12 layers, 256 hidden size). Unfortunately, the data we used in the paper is not publicly available, so we will use the OpenWebTextCorpus released by Aaron Gokaslan and Vanya Cohen instead. The fully-trained model (~4 days on a V100 GPU) should perform roughly in between GPT and BERT-Base in terms of GLUE performance. By default the model is trained on length-128 sequences, so it is not suitable for question answering. See the "expected results" section below for more details on model performance.

Setup

  1. Place a vocabulary file in $DATA_DIR/vocab.txt. Our ELECTRA models all used the exact same vocabulary as English uncased BERT, which you can download here.
  2. Download the OpenWebText corpus (12G) and extract it (i.e., run tar xf openwebtext.tar.xz). Place it in $DATA_DIR/openwebtext.
  3. Run python3 build_openwebtext_pretraining_dataset.py --data-dir $DATA_DIR --num-processes 5. It pre-processes/tokenizes the data and outputs examples as tfrecord files under $DATA_DIR/pretrain_tfrecords. The tfrecords require roughly 30G of disk space.

Pre-training the model.

Run python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt to train a small ELECTRA model for 1 million steps on the data. This takes slightly over 4 days on a Tesla V100 GPU. However, the model should achieve decent results after 200k steps (10 hours of training on the V100 GPU).

To customize the training, add --hparams '{"hparam1": value1, "hparam2": value2, ...}' to the run command. --hparams can also be a path to a .json file containing the hyperparameters. Some particularly useful options:

  • "debug": true trains a tiny ELECTRA model for a few steps.
  • "model_size": one of "small", "base", or "large": determines the size of the model
  • "electra_objective": false trains a model with masked language modeling instead of replaced token detection (essentially BERT with dynamic masking and no next-sentence prediction).
  • "num_train_steps": n controls how long the model is pre-trained for.
  • "pretrain_tfrecords": <paths> determines where the pre-training data is located. Note that you need to specify the files themselves, not just the directory (e.g., <data-dir>/pretrain_tfrecords/pretrain_data.tfrecord*).
  • "vocab_file": <path> and "vocab_size": n can be used to set a custom wordpiece vocabulary.
  • "learning_rate": lr, "train_batch_size": n, etc. can be used to change training hyperparameters
  • "model_hparam_overrides": {"hidden_size": n, "num_hidden_layers": m}, etc. can be used to change the hyperparameters for the underlying transformer (the "model_size" flag sets the default values).

See configure_pretraining.py for the full set of supported hyperparameters.
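As a sketch of the two ways to pass --hparams described above (inline JSON or a path to a .json file), the snippet below builds both forms in Python. The hyperparameter values and the file name pretrain_hparams.json are illustrative assumptions, not recommended settings.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Options taken from the list above; the values are illustrative only.
hparams = {
    "model_size": "small",
    "num_train_steps": 200000,
    "train_batch_size": 128,
}

# Write the hyperparameters to a .json file (the file name is arbitrary).
hparams_path = Path(tempfile.mkdtemp()) / "pretrain_hparams.json"
hparams_path.write_text(json.dumps(hparams, indent=2), encoding="utf-8")

# Assemble the command; "$DATA_DIR" is left for the shell to expand.
command = [
    "python3", "run_pretraining.py",
    "--data-dir", "$DATA_DIR",
    "--model-name", "electra_small_owt",
    "--hparams", str(hparams_path),
]
print(" ".join(command))

# Alternative: pass the dict inline, quoted for the shell.
print("--hparams " + shlex.quote(json.dumps(hparams)))
```

Keeping the hyperparameters in a file makes it easier to re-run with exactly the same arguments if training is halted.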

Evaluating the pre-trained model.

To evaluate the model on a downstream task, see the fine-tuning instructions below. To evaluate the generator/discriminator on the OpenWebText data, run python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{"do_train": false, "do_eval": true}'. This will print out eval metrics such as the accuracy of the generator and discriminator, and also write the metrics out to data-dir/model-name/results.

Fine-tuning

Use run_finetuning.py to fine-tune and evaluate an ELECTRA model on a downstream NLP task. It expects three arguments:

  • --data-dir: a directory where data, model weights, etc. are stored. By default, the script loads finetuning data from <data-dir>/finetuning_data/<task-name> and a vocabulary from <data-dir>/vocab.txt.
  • --model-name: a name of the pre-trained model: the pre-trained weights should exist in data-dir/models/model-name.
  • --hparams: a JSON dict containing model hyperparameters, data paths, etc. (e.g., --hparams '{"task_names": ["rte"], "model_size": "base", "learning_rate": 1e-4, ...}'). See configure_finetuning.py for the supported hyperparameters. Instead of a dict, this can also be a path to a .json file containing the hyperparameters. You must specify the "task_names" and "model_size" (see examples below).

Eval metrics will be saved in data-dir/model-name/results and model weights will be saved in data-dir/model-name/finetuning_models by default. Evaluation is done on the dev set by default. To customize the training, add --hparams '{"hparam1": value1, "hparam2": value2, ...}' to the run command. Some particularly useful options:

  • "debug": true fine-tunes a tiny ELECTRA model for a few steps.
  • "task_names": ["task_name"]: specifies the tasks to train on. This is a list because the codebase nominally supports multi-task learning (although be warned this has not been thoroughly tested).
  • "model_size": one of "small", "base", or "large": determines the size of the model; you must set this to the same size as the pre-trained model.
  • "do_train" and "do_eval": train and/or evaluate a model (both are set to true by default). To use "do_eval": true with "do_train": false, you need to specify the init_checkpoint, e.g., python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["mnli"], "do_train": false, "do_eval": true, "init_checkpoint": "<data-dir>/models/electra_base/finetuning_models/mnli_model_1"}'
  • "num_trials": n: If >1, does multiple fine-tuning/evaluation runs with different random seeds.
  • "learning_rate": lr, "train_batch_size": n, etc. can be used to change training hyperparameters.
  • "model_hparam_overrides": {"hidden_size": n, "num_hidden_layers": m}, etc. can be used to change the hyperparameters for the underlying transformer (the "model_size" flag sets the default values).

Setup

Get a pre-trained ELECTRA model either by training your own (see pre-training instructions above), or by downloading the released ELECTRA weights and unzipping them under $DATA_DIR/models (e.g., you should have a directory $DATA_DIR/models/electra_large if you are using the large model).

Finetune ELECTRA on a GLUE task

Download the GLUE data by running this script. Set up the data by running mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data.
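The chain of mv commands above can also be expressed in Python, which may be easier to adapt. This is a rough sketch under stated assumptions: the function name arrange_glue is hypothetical, and unlike the final `mv *` in the shell version it moves only the eight listed task directories.

```python
import shutil
import tempfile
from pathlib import Path

# Mapping from the directory names produced by the GLUE download script
# to the lowercase task names used in the shell command above.
GLUE_RENAMES = {
    "CoLA": "cola", "MNLI": "mnli", "MRPC": "mrpc", "QNLI": "qnli",
    "QQP": "qqp", "RTE": "rte", "SST-2": "sst", "STS-B": "sts",
}


def arrange_glue(download_dir, data_dir):
    """Move downloaded GLUE task directories under <data_dir>/finetuning_data."""
    download_dir = Path(download_dir)
    target = Path(data_dir) / "finetuning_data"
    target.mkdir(parents=True, exist_ok=True)
    # The diagnostic set is stored alongside the MNLI data.
    diagnostic = download_dir / "diagnostic" / "diagnostic.tsv"
    if diagnostic.exists() and (download_dir / "MNLI").is_dir():
        shutil.move(str(diagnostic), str(download_dir / "MNLI" / "diagnostic.tsv"))
    for old, new in GLUE_RENAMES.items():
        if (download_dir / old).is_dir():
            shutil.move(str(download_dir / old), str(target / new))
    return target


# Hypothetical demo on placeholder directories (no real GLUE data involved).
download = Path(tempfile.mkdtemp())
for name in ("MNLI", "RTE", "diagnostic"):
    (download / name).mkdir()
(download / "diagnostic" / "diagnostic.tsv").write_text("placeholder\n")
out = arrange_glue(download, tempfile.mkdtemp())
print(sorted(p.name for p in out.iterdir()))
```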

Then run run_finetuning.py. For example, to fine-tune ELECTRA-Base on MNLI:

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["mnli"]}'

Or fine-tune a small model pre-trained using the above instructions on CoLA:

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{"model_size": "small", "task_names": ["cola"]}'

Finetune ELECTRA on question answering

The code supports SQuAD 1.1 and 2.0, as well as datasets in the 2019 MRQA shared task.

  • SQuAD 1.1: Download the train and dev datasets and move them under $DATA_DIR/finetuning_data/squadv1/(train|dev).json
  • SQuAD 2.0: Download the datasets from the SQuAD Website and move them under $DATA_DIR/finetuning_data/squad/(train|dev).json
  • MRQA tasks: Download the data from here. Move the data to $DATA_DIR/finetuning_data/(newsqa|naturalqs|triviaqa|searchqa)/(train|dev).jsonl.
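Before starting a run, it can help to confirm that the files are where the list above says they should be. The following check is a small sketch only; the helper missing_qa_files and the QA_LAYOUT table are hypothetical conveniences, not part of this repository.

```python
import tempfile
from pathlib import Path

# Expected file suffix per task directory under <data-dir>/finetuning_data,
# as listed above: SQuAD uses .json files, the MRQA tasks use .jsonl files.
QA_LAYOUT = {
    "squadv1": ".json",
    "squad": ".json",
    "newsqa": ".jsonl",
    "naturalqs": ".jsonl",
    "triviaqa": ".jsonl",
    "searchqa": ".jsonl",
}


def missing_qa_files(data_dir, task_name):
    """Return the train/dev files for `task_name` that are not yet in place."""
    suffix = QA_LAYOUT[task_name]
    task_dir = Path(data_dir) / "finetuning_data" / task_name
    expected = [task_dir / f"{split}{suffix}" for split in ("train", "dev")]
    return [str(path) for path in expected if not path.is_file()]


# Hypothetical check against a temporary directory holding only squad/train.json.
data_dir = Path(tempfile.mkdtemp())
squad_dir = data_dir / "finetuning_data" / "squad"
squad_dir.mkdir(parents=True)
(squad_dir / "train.json").write_text("{}", encoding="utf-8")
missing = missing_qa_files(data_dir, "squad")
print(missing)
```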

Then run (for example)

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["squad"]}'

This repository uses the official evaluation code released by the SQuAD authors and the MRQA shared task to compute metrics.

Finetune ELECTRA on sequence tagging

Download the CoNLL-2000 text chunking dataset from here and put it under $DATA_DIR/finetuning_data/chunk/(train|dev).txt. Then run

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["chunk"]}'

Adding a new task

The easiest way to run on a new task is to implement a new finetune.task.Task, add it to finetune.task_builder.py, and then use run_finetuning.py as normal. For classification/qa/sequence tagging, you can inherit from a finetune.classification.classification_tasks.ClassificationTask, finetune.qa.qa_tasks.QATask, or finetune.tagging.tagging_tasks.TaggingTask. For preprocessing data, we use the same tokenizer as BERT.

Expected Results

Here are expected results for ELECTRA on various tasks (test set for chunking, dev set for the other tasks). Note that variance in fine-tuning can be quite large, so for some tasks you may see big fluctuations in scores when fine-tuning from the same checkpoint multiple times. The below scores show median performance over a large number of random seeds. ELECTRA-Small/Base/Large are our released models. ELECTRA-Small-OWT is the OpenWebText-trained model from above (it performs a bit worse than ELECTRA-Small due to being trained for less time and on a smaller dataset).

Model              CoLA  SST   MRPC  STS       QQP   MNLI  QNLI  RTE   SQuAD 1.1  SQuAD 2.0  Chunking
Metrics            MCC   Acc   Acc   Spearman  Acc   Acc   Acc   Acc   EM         EM         F1
ELECTRA-Large      69.1  96.9  90.8  92.6      92.4  90.9  95.0  88.0  89.7       88.1       97.2
ELECTRA-Base       67.7  95.1  89.5  91.2      91.5  88.8  93.2  82.7  86.8       80.5       97.1
ELECTRA-Small      57.0  91.2  88.0  87.5      89.0  81.3  88.4  66.7  75.8       70.1       96.5
ELECTRA-Small-OWT  56.8  88.3  87.4  86.8      88.3  78.9  87.9  68.5  --         --         --

See here for losses / training curves of the models during pre-training.

Electric

To train Electric, use the same pre-training script and command as ELECTRA. Pass "electra_objective": false and "electric_objective": true to the hyperparameters. We plan to release pre-trained Electric models soon!

Citation

If you use this code for your publication, please cite the original paper:

@inproceedings{clark2020electra,
  title = {{ELECTRA}: Pre-training Text Encoders as Discriminators Rather Than Generators},
  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},
  booktitle = {ICLR},
  year = {2020},
  url = {https://openreview.net/pdf?id=r1xMH1BtvB}
}

If you use the code for Electric, please cite the Electric paper:

@inproceedings{clark2020electric,
  title = {Pre-Training Transformers as Energy-Based Cloze Models},
  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},
  booktitle = {EMNLP},
  year = {2020},
  url = {https://www.aclweb.org/anthology/2020.emnlp-main.20.pdf}
}

Contact Info

For help or issues using ELECTRA, please submit a GitHub issue.

For personal communication related to ELECTRA, please contact Kevin Clark ([email protected]).

Comments
  • Add toggle to turn off `strip_accents`.

    In some languages like German the accents are important and change the semantics. Examples:

    1. mochte vs. möchte
    2. musste vs. müsste
    3. etc.

    But when doing lower_case they are automatically always stripped.

    This PR adds a toggle to make it possible to do lower_case but keep the accents. This conforms to the transformers.tokenization_bert.BertTokenizerFast which also has a boolean parameter called strip_accents.
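The behaviour being discussed can be illustrated with the standard library alone. This is a stand-in sketch, not the tokenizer code from this repository or from the PR: the function normalize and its defaults are assumptions chosen to show lowercasing with and without accent stripping.

```python
import unicodedata


def normalize(text, do_lower_case=True, strip_accents=None):
    """Lowercase and/or strip accents from `text`.

    If `strip_accents` is None it follows `do_lower_case`, which mirrors the
    coupled behaviour described above; passing False keeps the accents.
    """
    if strip_accents is None:
        strip_accents = do_lower_case
    if do_lower_case:
        text = text.lower()
    if strip_accents:
        # Decompose characters, then drop combining marks (category "Mn").
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return text


print(normalize("Möchte"))                       # lowercased, accents stripped
print(normalize("Möchte", strip_accents=False))  # lowercased, accents kept
```

With the toggle set to False, "möchte" and "mochte" remain distinct after lowercasing.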

    opened by PhilipMay 13
  • Auto loading in huggingface Transformers is broken

    When I try to load the model following the instructions on huggingface.co/models, i.e.:

    tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
    

    I get the following error:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-9-bb330c08e050> in <module>
    ----> 1 tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
    
    /opt/conda/lib/python3.6/site-packages/transformers/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
        179         config = kwargs.pop("config", None)
        180         if not isinstance(config, PretrainedConfig):
    --> 181             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
        182 
        183         if "bert-base-japanese" in pretrained_model_name_or_path:
    
    /opt/conda/lib/python3.6/site-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
        185 
        186         if "model_type" in config_dict:
    --> 187             config_class = CONFIG_MAPPING[config_dict["model_type"]]
        188             return config_class.from_dict(config_dict, **kwargs)
        189         else:
    
    KeyError: 'electra'
    

    The version of transformers is 2.7.0. I reproduced the problem in colab here.

    opened by xhluca 7
  • Loss of base and large models

    Hi,

    I'm currently working on a new non-English ELECTRA model. Training on GPU seems to work and is running fine 🤗

    Next steps would be to try model training on a TPU, so I would just like to ask if you can post the final loss of both base and large models (or even share the loss training curve) so that we have a kind of reference point when training own models 🤔

    Thanks many in advance,

    Stefan

    opened by stefan-it 7
  • 'adam_m not found in checkpoint ' when further pretraining

    When I was trying further pretraining on the models with domain-specific data in Colab, I encountered a problem that the official pretrained model could not be loaded.

    Here is the command for further pretraining.

    hparam =    '{"model_size": "small", \
                 "use_tpu":true, \
                 "num_tpu_cores":8, \
                 "tpu_name":"grpc://10.53.161.26:8470", \
                 "num_train_steps":4000100,\
                 "pretrain_tfrecords":"gs://tweet_torch/electra/electra/data/pretrain_tf_records/pretrain_data.tfrecord*", \
                 "model_dir":"gs://tweet_torch/electra/electra/data/electra_small/", \
                 "generator_hidden_size":1.0\
                }'
    !python electra/run_pretraining.py  \
                        --data-dir "gs://tweet_torch/electra/electra/data/" \
                        --model-name "electra_small" \
                        --hparams '{hparam}'
    

    And the error message is pretty long so I just paste some of it that seems useful.

    ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
    From /job:worker/replica:0/task:0:
    Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
    	 [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
    
    
    opened by DayuanJiang 6
  • NaN loss during training

    Thank you for releasing your codes.

    I have succeeded training a small model using a GPU by following Quickstart: Pre-train a small ELECTRA model, but a NaN loss during training error occurred when I trained a base model.

    Do you have any idea?

    I use tensorflow 1.15.0 and Tesla V100-PCIE-32GB, and an error log is as follows:

    $ python run_pretraining.py --data-dir ../electra-en-data --model-name electra_base_owt_200k --hparams '{"num_train_steps": 200000, "model_size": "base", "train_batch_size": 128}'
    ..
    2020-04-07 09:51:45.360762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30458 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:83:00.0, compute capability: 7.0)
    2020-04-07 09:52:37.813499: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    1/200000 = 0.0%, SPS: 0.0, ELAP: 21, ETA: 47 days, 14:12:08 - loss: 44.4968
    2/200000 = 0.0%, SPS: 0.1, ELAP: 39, ETA: 45 days, 4:02:14 - loss: 44.3760
    3/200000 = 0.0%, SPS: 0.1, ELAP: 40, ETA: 31 days, 5:01:16 - loss: 44.5174
    4/200000 = 0.0%, SPS: 0.1, ELAP: 42, ETA: 24 days, 5:38:09 - loss: 44.1623
    5/200000 = 0.0%, SPS: 0.1, ELAP: 43, ETA: 20 days, 1:12:59 - loss: 44.2913
    ERROR:tensorflow:Model diverged with loss = NaN.
    ERROR:tensorflow:Error recorded from training_loop: NaN loss during training.
    Traceback (most recent call last):
      File "run_pretraining.py", line 385, in <module>
        main()
      File "run_pretraining.py", line 381, in main
        args.model_name, args.data_dir, **hparams))
      File "run_pretraining.py", line 344, in train_or_eval
        max_steps=config.num_train_steps)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
        rendezvous.raise_errors()
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
        six.reraise(typ, value, traceback)
      File "/home/***/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
        raise value
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
        saving_listeners=saving_listeners)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
        loss = self._train_model(input_fn, hooks, saving_listeners)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
        return self._train_model_default(input_fn, hooks, saving_listeners)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
        saving_listeners)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
        _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
        run_metadata=run_metadata)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
        run_metadata=run_metadata)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
        raise six.reraise(*original_exc_info)
      File "/home/***/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
        raise value
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
        return self._sess.run(*args, **kwargs)
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1426, in run
        run_metadata=run_metadata))
      File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 761, in after_run
        raise NanLossDuringTrainingError
    tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
    
    opened by tomohideshibata 6
  • Load model in Pytorch.

    Hi! Thanks for making source code available and for great paper.

    Are there any plans to support loading models in Pytorch? Or implementation in transformers by Huggingface?

    opened by loopdigga96 6
  • ERROR:tensorflow:   Failed to close session after error.Other threads may hang.

    I am trying to pretrain my ELECTRA base, I keep getting this output:

    Running training
    ================================================================================
    2020-11-13 08:00:18.044763: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
    Model is built!
    2020-11-13 08:00:48.956655: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
    ERROR:tensorflow:Error recorded from infeed: From /job:worker/replica:0/task:0:
    {{function_node __inference_tf_data_experimental_map_and_batch_<lambda>_69}} Key: segment_ids.  Can't parse serialized Example.
    	 [[{{node ParseSingleExample/ParseSingleExample}}]]
    	 [[input_pipeline_task0/while/IteratorGetNext]]
    ERROR:tensorflow:Closing session due to error From /job:worker/replica:0/task:0:
    {{function_node __inference_tf_data_experimental_map_and_batch_<lambda>_69}} Key: segment_ids.  Can't parse serialized Example.
    	 [[{{node ParseSingleExample/ParseSingleExample}}]]
    	 [[input_pipeline_task0/while/IteratorGetNext]]
    2020-11-13 08:01:08.642776: W tensorflow/core/distributed_runtime/rpc/grpc_remote_master.cc:157] RPC failed with status = "Unavailable: Socket closed" and grpc_error_string = "{"created":"@1605254468.642525410","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Socket closed","grpc_status":14}", maybe retrying the RPC
    2020-11-13 08:01:08.642779: W tensorflow/core/distributed_runtime/rpc/grpc_remote_master.cc:157] RPC failed with status = "Unavailable: Socket closed" and grpc_error_string = "{"created":"@1605254468.642549072","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Socket closed","grpc_status":14}", maybe retrying the RPC
    ERROR:tensorflow:Error recorded from outfeed: Step was cancelled by an explicit call to `Session::Close()`.
    ERROR:tensorflow:
    
    
    Failed to close session after error.Other threads may hang.
    
    
    
    2020-11-13 08:01:50.857700: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
    ERROR:tensorflow:Error recorded from infeed: From /job:worker/replica:0/task:0:
    {{function_node __inference_tf_data_experimental_map_and_batch_<lambda>_69}} Key: segment_ids.  Can't parse serialized Example.
    	 [[{{node ParseSingleExample/ParseSingleExample}}]]
    	 [[input_pipeline_task0/while/IteratorGetNext]]
    
    opened by etetteh 5
  • KeyError: '[SEP]'

    when running run_pretraining.py I get this error before it pretrains:

    ================================================================================
    Running training

    2020-04-28 04:43:55.132186: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:356] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
    ERROR:tensorflow:Error recorded from training_loop: '[SEP]'
    Traceback (most recent call last):
      File "run_pretraining.py", line 384, in main()
      . (lines ignored because they're not useful) .
      File "/home/manai_elye2s/pretrain/electra/pretrain/pretrain_helpers.py", line 121, in _get_candidates_mask
        ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"]]
    KeyError: '[SEP]'

    I got this both with my own vocab and the default one I downloaded from this repo. In both vocab.txt files there are the [SEP] [CLS] and [MASK] tokens, without space

    opened by elyesmanai 5
  • Format of corpus

    According to the paper, ELECTRA does not involve NSP (next sentence prediction) task. In that case, do we need sentence segmentation? Does build_pretraining_dataset.py consider each line as a separate sentence? Or can we just feed raw text (with empty lines as separators for documents) ?

    opened by mahnerak 4
  • ValueError: Must specify max_steps > 0, given: 0

    $python3 electra_small/run_finetuning.py \
    --data-dir $DATA_DIR \
    --model-name "ELECTRA-small" \
    --hparams '{"model_size": "small", "task_names": ["<task_name>"], "num_trials": 5, "learning_rate": 3e-4, "train_batch_size": 16, "use_tpu": "True", "num_tpu_cores": 8, "tpu_name": "<tpu_name>", "tpu_zone": "europe-west4-a", "gcp_project": "<gcp_name>", "vocab_size": 50000, "num_train_epochs": 10}'
    

    I am getting the following error. Is there something I am missing?

    Training for 0 steps
    ERROR:tensorflow:Error recorded from training_loop: Must specify max_steps > 0, given: 0
    Traceback (most recent call last):
      File "electra_small/run_finetuning.py", line 323, in <module>
        main()
      File "electra_small/run_finetuning.py", line 319, in main
        args.model_name, args.data_dir, **hparams))
      File "electra_small/run_finetuning.py", line 270, in run_finetuning
        model_runner.train()
      File "electra_small/run_finetuning.py", line 183, in train
        input_fn=self._train_input_fn, max_steps=self.train_steps)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
        rendezvous.raise_errors()
      File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
        six.reraise(typ, value, traceback)
      File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
        raise value
      File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
        saving_listeners=saving_listeners)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
        'Must specify max_steps > 0, given: {}'.format(max_steps))
    ValueError: Must specify max_steps > 0, given: 0
    
    opened by etetteh 3
  • Training Electra on 2 phases like Bert

    Bert could be trained in 2 phases: the first phase with a shorter sequence length (128), then the second phase with a longer length (512). The first phase accelerates training, while the second phase lets the positional encoding learn longer sentences. This works as long as "max_position_embeddings" is 512.

    Can Electra be trained the same way, or will it not work since it has a final layer for classifying each token?

    @clarkkev @stefan-it @mrm8488 @michelole , Your feedback is highly appreciated.

    opened by agemagician 3
  • failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED

    How can I solve this problem? I use 3080ti / tensorflow 1.15 / python 3.7.10 / cuda 10.1

    2022-12-27 15:27:14.672772: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
    ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
      (0) Internal: Blas GEMM launch failed : a.shape=(4096, 2), b.shape=(2, 128), m=4096, n=128, k=2
         [[node electra/embeddings_1/MatMul (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[add_10/_9743]]
      (1) Internal: Blas GEMM launch failed : a.shape=(4096, 2), b.shape=(2, 128), m=4096, n=128, k=2
         [[node electra/embeddings_1/MatMul (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
    0 successful operations.
    0 derived errors ignored.

    opened by EJDU21 0
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
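The kind of check described above can be sketched as follows. This is not the actual patch from the pull request. It is a minimal illustration with hypothetical helper names (`is_within_directory`, `safe_extractall`), and it only guards against member names that resolve outside the target directory. It does not inspect symlinks or hard links.

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar, path="."):
    """Extract every member of an open TarFile, refusing path traversal.

    Hypothetical helper: all members are validated before anything is
    written, so a malicious archive leaves the file system untouched.
    """
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(
                "Blocked path traversal in tar member: " + member.name)
    tar.extractall(path)
```

A call site would replace `tar.extractall(path)` with `safe_extractall(tar, path)`. On recent Python versions that support extraction filters, passing `filter="data"` to `extractall()` is another option to consider.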

    opened by TrellixVulnTeam 1
  • Cannot import trace from tensorflow.python.profiler

    I just installed tensorflow 1.15 using conda and get this error when attempting to run the pre-training command provided in the quickstart section of the readme.

    Traceback (most recent call last):
      File "electra/run_pretraining.py", line 29, in <module>
        from model import modeling
      File "electra\model\modeling.py", line 33, in <module>
        from tensorflow.contrib import layers as contrib_layers
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow\__init__.py", line 50, in __getattr__
        module = self._load()
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow\__init__.py", line 44, in _load
        module = _importlib.import_module(self.__name__)
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\__init__.py", line 39, in <module>
        from tensorflow.contrib import compiler
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\compiler\__init__.py", line 21, in <module>
        from tensorflow.contrib.compiler import jit
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\compiler\__init__.py", line 22, in <module>
        from tensorflow.contrib.compiler import xla
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\compiler\xla.py", line 22, in <module>
        from tensorflow.python.estimator import model_fn as model_fn_lib
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\python\estimator\model_fn.py", line 26, in <module>
        from tensorflow_estimator.python.estimator import model_fn
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\__init__.py", line 10, in <module>
        from tensorflow_estimator._api.v1 import estimator
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\_api\v1\estimator\__init__.py", line 10, in <module>
        from tensorflow_estimator._api.v1.estimator import experimental
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\_api\v1\estimator\experimental\__init__.py", line 10, in <module>
        from tensorflow_estimator.python.estimator.canned.dnn import dnn_logit_fn_builder
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\python\estimator\canned\dnn.py", line 27, in <module>
        from tensorflow_estimator.python.estimator import estimator
      File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 36, in <module>
        from tensorflow.python.profiler import trace
    ImportError: cannot import name 'trace' from 'tensorflow.python.profiler' (C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\python\profiler\__init__.py)
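One plausible cause, which is an assumption and is not confirmed in this thread, is that the installed tensorflow_estimator package is newer than the installed tensorflow 1.15, since the failing import comes from tensorflow_estimator. The sketch below compares the major.minor versions of the two packages. The helper names are hypothetical, and `importlib.metadata` requires Python 3.8 or later.

```python
def major_minor(version):
    """Parse 'X.Y.Z' (or 'X.Y') into the tuple (X, Y)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(tf_version, estimator_version):
    """Return True if both packages share the same major.minor version."""
    return major_minor(tf_version) == major_minor(estimator_version)


def installed_version(package):
    """Return the installed version string of a package, or None."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    tf_v = installed_version("tensorflow")
    est_v = installed_version("tensorflow-estimator")
    if tf_v is None or est_v is None:
        print("Could not determine versions:", tf_v, est_v)
    elif versions_match(tf_v, est_v):
        print("Versions look consistent:", tf_v, est_v)
    else:
        print("Possible mismatch:", tf_v, "vs", est_v)
```

If the script reports a mismatch, installing a tensorflow-estimator release from the same 1.15 series is a reasonable first thing to try.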
    
    opened by n-garc 2
  • Tagging Task Segment ids

    Why are the tagging task segment ids ones instead of zeros?

    https://github.com/google-research/electra/blob/8a46635f32083ada044d7e9ad09604742600ee7b/finetune/tagging/tagging_tasks.py#L144

    The tagging task only contains the first segment, so the ids should be zeros, right?

    @clarkkev

    opened by kamalkraj 0
  • Optimal Learning Rate and Training Steps for Large Batch Size

    First of all, thank you for sharing this great work!

    I was wondering how you would recommend choosing optimal hyperparameters for a large batch size.

    For example, if I train the Electra Large model on a v3-128 TPU, a batch size of 4096 is affordable. In this case, what learning rate and number of training steps would you suggest? As for the data, I'm planning to train the model on my own dataset, which is ~300GB of tfrecords.
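As a starting point for this kind of question, two commonly used heuristics are linear and square-root learning-rate scaling with batch size, together with keeping the total number of training examples fixed. The sketch below is not a recommendation from the ELECTRA authors. The function names and all numbers in the usage lines are hypothetical placeholders, and the results still need empirical tuning.

```python
import math


def scale_learning_rate(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Scale a reference learning rate to a new batch size (heuristic only)."""
    ratio = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError("Unknown rule: " + rule)


def steps_for_same_examples(base_steps, base_batch_size, new_batch_size):
    """Number of steps that processes the same total number of examples."""
    return math.ceil(base_steps * base_batch_size / new_batch_size)


# Hypothetical reference configuration, NOT taken from the paper or repo.
base_lr, base_batch, base_steps = 1e-4, 1024, 1_000_000

print(scale_learning_rate(base_lr, base_batch, 4096, rule="linear"))
print(scale_learning_rate(base_lr, base_batch, 4096, rule="sqrt"))
print(steps_for_same_examples(base_steps, base_batch, 4096))
```

The two rules give an upper and a lower candidate for a small learning-rate sweep. Which one works better depends on the optimizer and warmup schedule.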

    Do you have any rough ideas?

    Thank you

    opened by robinsongh381 0
Owner
Google Research
Chinese NER with albert/electra or other bert descendable model (keras)

Chinese NLP (albert/electra with Keras) Named Entity Recognization Project Structure ./ ├── NER │   ├── __init__.py │   ├── log

null 2 Nov 20, 2022
Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

Boolean Prompting for Neural Text Generators Neural text generators like the GPT models promise a general-purpose means of manipulating texts. These m

Jeffrey M. Binder 20 Jan 9, 2023
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

Fredrik Carlsson 481 Dec 30, 2022
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

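Background · Installation · Quick start: (1) Pre-trained models, (2) Machine translation, (3) Text classification · Advanced TenTrans: 1. Multilingual machine translation, 2. Cross-lingual pre-training. Background: TenTrans is a unified end-to-end multilingual, multi-task pre-training platform that supports multiple pre-training methods as well as sequence generation and natural language understanding tasks. Installation: git clone git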

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
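DomainWordsDict, a Chinese word dictionary covering more than 68 domains, which can be used for text classification and knowledge enhancement tasks

DomainWordsDict is a Chinese word dictionary covering more than 68 domains, which can be used for text classification and knowledge enhancement tasks. It is a specialized dictionary knowledge base covering 68 domains and 9.16 million words in total, usable for natural language processing applications such as text classification, knowledge enhancement, and domain vocabulary expansion.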

liuhuanyong 357 Dec 24, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov 3.2k Dec 31, 2022
Use PaddlePaddle to reproduce the paper: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

MT5_paddle Use PaddlePaddle to reproduce the paper:mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer English | 简体中文 mT5: A Massively

null 2 Oct 17, 2021
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training Code and model from our AAAI 2021 paper

Amazon Web Services - Labs 83 Jan 9, 2023
Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

Wasi Ahmad 138 Dec 30, 2022
MASS: Masked Sequence to Sequence Pre-training for Language Generation

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Microsoft 1.1k Dec 17, 2022
Pre-training BERT masked language models with custom vocabulary

Pre-training BERT Masked Language Models (MLM) This repository contains the method to pre-train a BERT model using custom vocabulary. It was used to p

Stella Douka 14 Nov 2, 2022
TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

Yixuan Su 26 Oct 17, 2022
CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training This is the official repository for the code and models of the paper CCQA: A N

Meta Research 29 Nov 30, 2022
iBOT: Image BERT Pre-Training with Online Tokenizer

Image BERT Pre-Training with iBOT Official PyTorch implementation and pretrained models for paper iBOT: Image BERT Pre-Training with Online Tokenizer.

Bytedance Inc. 435 Jan 6, 2023
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃

This repository provides a library for efficient training of masked language models (MLM), built with fairseq. We fork fairseq to give researchers mor

Princeton Natural Language Processing 92 Dec 27, 2022
SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Introduction This codebase contains source-code of the Python-based implementation (ARES) of our SIGIR 2022 paper. Chen, Jia, et al. "Axiomatically Re

Jia Chen 17 Nov 9, 2022
Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers

beyond masking Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers The code is coming Figure 1: Pipeline of token-based pre-

Yunjie Tian 23 Sep 27, 2022