ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Google Research

Last update: Dec 28, 2022

ELECTRA

Introduction

ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

For a detailed description and experimental results, please refer to our ICLR 2020 paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.

This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks including classification tasks (e.g., GLUE), QA tasks (e.g., SQuAD), and sequence tagging tasks (e.g., text chunking).

This repository also contains code for Electric, a version of ELECTRA inspired by energy-based models. Electric provides a more principled view of ELECTRA as a "negative sampling" cloze model. It can also efficiently produce pseudo-likelihood scores for text, which can be used to re-rank the outputs of speech recognition or machine translation systems. For details on Electric, please refer to our EMNLP 2020 paper Pre-Training Transformers as Energy-Based Cloze Models.

Released Models

We are initially releasing three pre-trained models:

Model	Layers	Hidden Size	Params	GLUE score (test set)	Download
ELECTRA-Small	12	256	14M	77.4	link
ELECTRA-Base	12	768	110M	82.7	link
ELECTRA-Large	24	1024	335M	85.2	link

The models were trained on uncased English text. They correspond to ELECTRA-Small++, ELECTRA-Base++, ELECTRA-1.75M in our paper. We hope to release other models, such as multilingual models, in the future.

On GLUE, ELECTRA-Large scores slightly better than ALBERT/XLNet, ELECTRA-Base scores better than BERT-Large, and ELECTRA-Small scores slightly worse than TinyBERT (but uses no distillation). See the Expected Results section below for detailed performance numbers.

Requirements

Python 3
TensorFlow 1.15 (although we hope to support TensorFlow 2.0 at a future date)
NumPy
scikit-learn and SciPy (for computing some evaluation metrics).

Pre-training

Use build_pretraining_dataset.py to create a pre-training dataset from a dump of raw text. It has the following arguments:

--corpus-dir: A directory containing raw text files to turn into ELECTRA examples. A text file can contain multiple documents with empty lines separating them.
--vocab-file: File defining the wordpiece vocabulary.
--output-dir: Where to write out ELECTRA examples.
--max-seq-length: The number of tokens per example (128 by default).
--num-processes: If >1, parallelize across multiple processes (1 by default).
--blanks-separate-docs: Whether blank lines indicate document boundaries (True by default).
--do-lower-case/--no-lower-case: Whether to lower case the input text (True by default).
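To make the expected input format concrete, here is a small standalone sketch that groups the lines of a raw text file into documents, treating blank lines as boundaries (the --blanks-separate-docs behaviour). The split_documents helper is hypothetical and is not taken from build_pretraining_dataset.py, which additionally tokenizes the text and packs it into fixed-length examples.

```python
from typing import List


def split_documents(text: str) -> List[List[str]]:
    """Group non-empty lines into documents, treating blank lines as separators."""
    documents: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:  # a blank line closes the current document
            documents.append(current)
            current = []
    if current:  # the last document need not end with a blank line
        documents.append(current)
    return documents


sample = "First doc, line one.\nFirst doc, line two.\n\n\nSecond doc.\n"
for i, doc in enumerate(split_documents(sample)):
    print(i, doc)
```

Several consecutive blank lines count as a single boundary here, so stray extra blank lines in a corpus file do not create empty documents.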

Use run_pretraining.py to pre-train an ELECTRA model. It has the following arguments:

--data-dir: a directory where pre-training data, model weights, etc. are stored. By default, the training loads examples from <data-dir>/pretrain_tfrecords and a vocabulary from <data-dir>/vocab.txt.
--model-name: a name for the model being trained. Model weights will be saved in <data-dir>/models/<model-name> by default.
--hparams (optional): a JSON dict or path to a JSON file containing model hyperparameters, data paths, etc. See configure_pretraining.py for the supported hyperparameters.
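The default locations implied by --data-dir and --model-name can be summarized in a small helper. default_pretraining_paths and missing_inputs are hypothetical names; the layout itself follows the defaults stated above, and the data and vocabulary locations can be overridden with the pretrain_tfrecords and vocab_file hparams described later in this section.

```python
import os


def default_pretraining_paths(data_dir: str, model_name: str) -> dict:
    """Default locations derived from --data-dir and --model-name."""
    return {
        "pretrain_tfrecords": os.path.join(data_dir, "pretrain_tfrecords"),
        "vocab_file": os.path.join(data_dir, "vocab.txt"),
        "model_dir": os.path.join(data_dir, "models", model_name),
    }


def missing_inputs(paths: dict) -> list:
    """Inputs that should exist before training starts.

    model_dir is where weights are written, so it is not checked here.
    """
    return [key for key in ("pretrain_tfrecords", "vocab_file")
            if not os.path.exists(paths[key])]


for key, path in default_pretraining_paths("/path/to/data", "electra_small_owt").items():
    print(f"{key:20s} {path}")
```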

If training is halted, re-running run_pretraining.py with the same arguments will continue training from where it left off.

You can continue pre-training from the released ELECTRA checkpoints by:

Setting the model-name to point to a downloaded model (e.g., --model-name electra_small if you downloaded weights to $DATA_DIR/models/electra_small).
Setting num_train_steps, for example by adding "num_train_steps": 4010000 to the --hparams. This will continue training the small model for 10000 more steps (it has already been trained for 4e6 steps).
Increasing the learning rate to account for the linear learning rate decay. For example, to start with a learning rate of 2e-4, you should set the learning_rate hparam to 2e-4 * (4e6 + 10000) / 10000.
For ELECTRA-Small, you also need to specify "generator_hidden_size": 1.0 in the hparams because we did not use a small generator for that model.
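The learning-rate adjustment above can be checked numerically. The sketch assumes the schedule is a plain linear decay to zero over num_train_steps, lr(t) = peak * (1 - t / num_train_steps), and ignores warmup because it only affects the start of the original run; continued_pretraining_lr is a hypothetical helper name. Under that assumption, the peak rate that yields 2e-4 at step 4e6 is exactly the formula given in the list.

```python
def continued_pretraining_lr(target_lr: float, steps_done: int, extra_steps: int) -> float:
    """Peak learning_rate so that the decayed rate at the resume point equals target_lr.

    Assumes lr(t) = peak * (1 - t / total) with total = steps_done + extra_steps.
    """
    total = steps_done + extra_steps
    return target_lr * total / extra_steps


steps_done = 4_000_000  # ELECTRA-Small was already trained for 4e6 steps
extra_steps = 10_000
peak = continued_pretraining_lr(2e-4, steps_done, extra_steps)

hparams = {
    "num_train_steps": steps_done + extra_steps,  # 4010000
    "learning_rate": peak,                        # 2e-4 * 4010000 / 10000, about 0.0802
    "generator_hidden_size": 1.0,                 # needed for ELECTRA-Small only
}
print(hparams)
```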

Quickstart: Pre-train a small ELECTRA model.

These instructions pre-train a small ELECTRA model (12 layers, 256 hidden size). Unfortunately, the data we used in the paper is not publicly available, so we will use the OpenWebTextCorpus released by Aaron Gokaslan and Vanya Cohen instead. The fully trained model (~4 days on a V100 GPU) should perform roughly in between GPT and BERT-Base in terms of GLUE performance. By default, the model is trained on length-128 sequences, so it is not suitable for question answering. See the "expected results" section below for more details on model performance.

Setup

Place a vocabulary file in $DATA_DIR/vocab.txt. Our ELECTRA models all used the exact same vocabulary as English uncased BERT, which you can download here.
Download the OpenWebText corpus (12G) and extract it (i.e., run tar xf openwebtext.tar.xz). Place it in $DATA_DIR/openwebtext.
Run python3 build_openwebtext_pretraining_dataset.py --data-dir $DATA_DIR --num-processes 5. It pre-processes/tokenizes the data and outputs examples as tfrecord files under $DATA_DIR/pretrain_tfrecords. The tfrecords require roughly 30G of disk space.
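If you prefer to extract the corpus from Python rather than with tar xf, the sketch below refuses archive members that would be written outside the destination directory (the path-traversal problem tracked as CVE-2007-4559). safe_extract is a hypothetical helper; it rejects link members outright, which is stricter than necessary, and uses the standard library's "data" extraction filter when the running Python provides it.

```python
import os
import tarfile


def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract a tar archive, refusing members that would escape dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" auto-detects xz/gz/bz2
        for member in tar.getmembers():
            if member.issym() or member.islnk():
                raise ValueError(f"refusing link member: {member.name!r}")
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"blocked path traversal: {member.name!r}")
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and some security backports) ship a built-in safety filter.
            tar.extractall(dest_root, filter="data")
        else:
            tar.extractall(dest_root)
```

For example, safe_extract("openwebtext.tar.xz", os.environ["DATA_DIR"]). Check afterwards that the text ends up in $DATA_DIR/openwebtext, since the sketch makes no assumption about the archive's top-level folder name.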

Pre-training the model.

Run python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt to train a small ELECTRA model for 1 million steps on the data. This takes slightly over 4 days on a Tesla V100 GPU. However, the model should achieve decent results after 200k steps (10 hours of training on the V100 GPU).

To customize the training, add --hparams '{"hparam1": value1, "hparam2": value2, ...}' to the run command. --hparams can also be a path to a .json file containing the hyperparameters. Some particularly useful options:

"debug": true trains a tiny ELECTRA model for a few steps.
"model_size": one of "small", "base", or "large": determines the size of the model.
"electra_objective": false trains a model with masked language modeling instead of replaced token detection (essentially BERT with dynamic masking and no next-sentence prediction).
"num_train_steps": n controls how long the model is pre-trained for.
"pretrain_tfrecords": <paths> determines where the pre-training data is located. Note that you need to specify the files themselves, not just the directory (e.g., <data-dir>/pretrain_tfrecords/pretrain_data.tfrecord*).
"vocab_file": <path> and "vocab_size": n can be used to set a custom wordpiece vocabulary.
"learning_rate": lr, "train_batch_size": n, etc. can be used to change training hyperparameters.
"model_hparam_overrides": {"hidden_size": n, "num_hidden_layers": m}, etc. can be used to change the hyperparameters for the underlying transformer (the "model_size" flag sets the default values).

See configure_pretraining.py for the full set of supported hyperparameters.
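Hand-quoting JSON inside --hparams is error-prone, so it can help to build the command programmatically. This sketch serializes the hparams with json.dumps and checks that the pretrain_tfrecords pattern matches actual files rather than a bare directory. build_pretraining_argv and matching_tfrecords are hypothetical helpers, the data directory is a placeholder, and the glob check only works for local paths, not gs:// buckets.

```python
import glob
import json
import os
import shlex


def build_pretraining_argv(data_dir: str, model_name: str, hparams: dict) -> list:
    """argv for run_pretraining.py with --hparams serialized as a JSON string."""
    return ["python3", "run_pretraining.py",
            "--data-dir", data_dir,
            "--model-name", model_name,
            "--hparams", json.dumps(hparams)]


def matching_tfrecords(pattern: str) -> list:
    """Fail early if the pattern matches no files (e.g. a bare directory was given)."""
    matches = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    if not matches:
        raise FileNotFoundError(
            f"no files match {pattern!r}; pass a file pattern, not a directory")
    return matches


data_dir = "/path/to/data"  # placeholder
hparams = {
    "model_size": "small",
    "num_train_steps": 200000,
    "pretrain_tfrecords": os.path.join(
        data_dir, "pretrain_tfrecords", "pretrain_data.tfrecord*"),
}
argv = build_pretraining_argv(data_dir, "electra_small_owt", hparams)
print(shlex.join(argv))  # shell-quoted, so the JSON survives copy-paste into bash
```

Passing argv directly to subprocess.run(argv, check=True) avoids shell quoting entirely; call matching_tfrecords(hparams["pretrain_tfrecords"]) first if the data is on local disk.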

Evaluating the pre-trained model.

To evaluate the model on a downstream task, see the fine-tuning instructions below. To evaluate the generator/discriminator on the OpenWebText data, run python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{"do_train": false, "do_eval": true}'. This will print out eval metrics such as the accuracy of the generator and discriminator, and also write the metrics out to data-dir/model-name/results.

Fine-tuning

Use run_finetuning.py to fine-tune and evaluate an ELECTRA model on a downstream NLP task. It expects three arguments:

--data-dir: a directory where data, model weights, etc. are stored. By default, the script loads finetuning data from <data-dir>/finetuning_data/<task-name> and a vocabulary from <data-dir>/vocab.txt.
--model-name: a name of the pre-trained model: the pre-trained weights should exist in data-dir/models/model-name.
--hparams: a JSON dict containing model hyperparameters, data paths, etc. (e.g., --hparams '{"task_names": ["rte"], "model_size": "base", "learning_rate": 1e-4, ...}'). See configure_finetuning.py for the supported hyperparameters. Instead of a dict, this can also be a path to a .json file containing the hyperparameters. You must specify the "task_names" and "model_size" (see examples below).

Eval metrics will be saved in data-dir/model-name/results, and model weights will be saved in data-dir/model-name/finetuning_models by default. Evaluation is done on the dev set by default. To customize the training, add --hparams '{"hparam1": value1, "hparam2": value2, ...}' to the run command. Some particularly useful options:

"debug": true fine-tunes a tiny ELECTRA model for a few steps.
"task_names": ["task_name"]: specifies the tasks to train on. It is a list because the codebase nominally supports multi-task learning (although be warned this has not been thoroughly tested).
"model_size": one of "small", "base", or "large": determines the size of the model; you must set this to the same size as the pre-trained model.
"do_train" and "do_eval": train and/or evaluate a model (both are set to true by default). To use "do_eval": true with "do_train": false, you need to specify the init_checkpoint, e.g., python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["mnli"], "do_train": false, "do_eval": true, "init_checkpoint": "<data-dir>/models/electra_base/finetuning_models/mnli_model_1"}'
"num_trials": n: If >1, does multiple fine-tuning/evaluation runs with different random seeds.
"learning_rate": lr, "train_batch_size": n, etc. can be used to change training hyperparameters.
"model_hparam_overrides": {"hidden_size": n, "num_hidden_layers": m}, etc. can be used to change the hyperparameters for the underlying transformer (the "model_size" flag sets the default values).

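The constraints listed above (task_names and model_size are required; an evaluation-only run needs init_checkpoint) can be checked before launching a job. finetuning_hparams is a hypothetical helper that only assembles and validates the JSON string passed to --hparams; it knows nothing about the script's other hyperparameters, which are simply forwarded as keyword arguments.

```python
import json


def finetuning_hparams(model_size, task_names, do_train=True, do_eval=True,
                       init_checkpoint=None, **overrides):
    """Assemble the --hparams JSON for run_finetuning.py and check basic rules."""
    if model_size not in ("small", "base", "large"):
        raise ValueError(f"unknown model_size: {model_size!r}")
    if not task_names:
        raise ValueError("task_names must be a non-empty list")
    if do_eval and not do_train and not init_checkpoint:
        raise ValueError('"do_eval" without "do_train" requires init_checkpoint')
    hparams = {"model_size": model_size, "task_names": list(task_names),
               "do_train": do_train, "do_eval": do_eval, **overrides}
    if init_checkpoint:
        hparams["init_checkpoint"] = init_checkpoint
    return json.dumps(hparams)


# Evaluation-only run, mirroring the MNLI example above (placeholder data dir).
print(finetuning_hparams(
    "base", ["mnli"], do_train=False,
    init_checkpoint="/path/to/data/models/electra_base/finetuning_models/mnli_model_1"))
```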
Setup

Get a pre-trained ELECTRA model either by training your own (see the pre-training instructions above) or by downloading the released ELECTRA weights and unzipping them under $DATA_DIR/models (e.g., you should have a directory $DATA_DIR/models/electra_large if you are using the large model).

Finetune ELECTRA on a GLUE task

Download the GLUE data by running this script. Set up the data by running mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data.
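The same rearrangement can be done from Python, which may be convenient on systems without a POSIX shell. arrange_glue is a hypothetical helper that mirrors the mv commands above; it assumes glue_dir contains the downloaded task folders and that data_dir is not inside glue_dir. Unlike the shell glob in mv *, os.listdir also picks up hidden files.

```python
import os
import shutil

# Folder renames performed by the mv commands above: GLUE download name -> task name.
GLUE_RENAMES = {
    "CoLA": "cola", "MNLI": "mnli", "MRPC": "mrpc", "QNLI": "qnli",
    "QQP": "qqp", "RTE": "rte", "SST-2": "sst", "STS-B": "sts",
}


def arrange_glue(glue_dir: str, data_dir: str) -> None:
    """Rename task folders and move everything into <data_dir>/finetuning_data."""
    for src, dst in GLUE_RENAMES.items():
        os.rename(os.path.join(glue_dir, src), os.path.join(glue_dir, dst))
    # As in the shell version, the diagnostic file goes into the MNLI folder.
    shutil.move(os.path.join(glue_dir, "diagnostic", "diagnostic.tsv"),
                os.path.join(glue_dir, "mnli"))
    target = os.path.join(data_dir, "finetuning_data")
    os.makedirs(target, exist_ok=True)  # mkdir -p
    for name in os.listdir(glue_dir):  # mv * $DATA_DIR/finetuning_data
        shutil.move(os.path.join(glue_dir, name), os.path.join(target, name))
```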

Then run run_finetuning.py. For example, to fine-tune ELECTRA-Base on MNLI:

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["mnli"]}'

Or, to fine-tune the small model from the pre-training instructions above on CoLA:

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_small_owt --hparams '{"model_size": "small", "task_names": ["cola"]}'

Finetune ELECTRA on question answering

The code supports SQuAD 1.1 and 2.0, as well as datasets from the 2019 MRQA shared task:

SQuAD 1.1: Download the train and dev datasets and move them under $DATA_DIR/finetuning_data/squadv1/(train|dev).json
SQuAD 2.0: Download the datasets from the SQuAD Website and move them under $DATA_DIR/finetuning_data/squad/(train|dev).json
MRQA tasks: Download the data from here. Move the data to $DATA_DIR/finetuning_data/(newsqa|naturalqs|triviaqa|searchqa)/(train|dev).jsonl.
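For reference, the expected file locations can be written down as a small lookup. qa_data_paths is hypothetical; the folder names and extensions come from the list above, and the sketch assumes the folder name is also the value passed in task_names, as in the "squad" example that follows.

```python
import os

# Folder name -> file extension, as listed above.
QA_TASK_FILES = {
    "squadv1": ".json",
    "squad": ".json",
    "newsqa": ".jsonl",
    "naturalqs": ".jsonl",
    "triviaqa": ".jsonl",
    "searchqa": ".jsonl",
}


def qa_data_paths(data_dir: str, task: str) -> dict:
    """Expected train/dev file paths for a QA task under <data_dir>/finetuning_data."""
    ext = QA_TASK_FILES[task]
    base = os.path.join(data_dir, "finetuning_data", task)
    return {split: os.path.join(base, split + ext) for split in ("train", "dev")}


for task in QA_TASK_FILES:
    print(task, qa_data_paths("/path/to/data", task))
```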

Then run, for example:

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["squad"]}'

This repository uses the official evaluation code released by the SQuAD authors and the MRQA shared task to compute metrics.

Finetune ELECTRA on sequence tagging

Download the CoNLL-2000 text chunking dataset from here and put it under $DATA_DIR/finetuning_data/chunk/(train|dev).txt. Then run:

python3 run_finetuning.py --data-dir $DATA_DIR --model-name electra_base --hparams '{"model_size": "base", "task_names": ["chunk"]}'
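Before fine-tuning, it can be worth sanity-checking the chunking files. The reader below assumes the standard CoNLL-2000 layout (one token per line as word, POS tag, and chunk tag separated by spaces, with a blank line between sentences); read_conll2000 is a hypothetical helper, not the repository's tagging data loader.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str, str]]  # (word, POS tag, chunk tag)


def read_conll2000(path: str) -> List[Sentence]:
    """Read 'word POS chunk' lines, with blank lines separating sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:  # blank line ends the current sentence
                if current:
                    sentences.append(current)
                    current = []
                continue
            word, pos, chunk = parts  # raises ValueError on malformed lines
            current.append((word, pos, chunk))
    if current:
        sentences.append(current)
    return sentences
```

For example, len(read_conll2000(os.path.join(data_dir, "finetuning_data", "chunk", "train.txt"))) gives the number of training sentences.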

Adding a new task

The easiest way to run on a new task is to implement a new finetune.task.Task, add it to finetune.task_builder.py, and then use run_finetuning.py as normal. For classification/qa/sequence tagging, you can inherit from a finetune.classification.classification_tasks.ClassificationTask, finetune.qa.qa_tasks.QATask, or finetune.tagging.tagging_tasks.TaggingTask. For preprocessing data, we use the same tokenizer as BERT.

Expected Results

Here are expected results for ELECTRA on various tasks (test set for chunking, dev set for the other tasks). Note that variance in fine-tuning can be quite large, so for some tasks you may see big fluctuations in scores when fine-tuning from the same checkpoint multiple times. The below scores show median performance over a large number of random seeds. ELECTRA-Small/Base/Large are our released models. ELECTRA-Small-OWT is the OpenWebText-trained model from above (it performs a bit worse than ELECTRA-Small due to being trained for less time and on a smaller dataset).

	CoLA	SST	MRPC	STS	QQP	MNLI	QNLI	RTE	SQuAD 1.1	SQuAD 2.0	Chunking
Metrics	MCC	Acc	Acc	Spearman	Acc	Acc	Acc	Acc	EM	EM	F1
ELECTRA-Large	69.1	96.9	90.8	92.6	92.4	90.9	95.0	88.0	89.7	88.1	97.2
ELECTRA-Base	67.7	95.1	89.5	91.2	91.5	88.8	93.2	82.7	86.8	80.5	97.1
ELECTRA-Small	57.0	91.2	88.0	87.5	89.0	81.3	88.4	66.7	75.8	70.1	96.5
ELECTRA-Small-OWT	56.8	88.3	87.4	86.8	88.3	78.9	87.9	68.5	--	--	--
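Because the table reports medians over many seeds, a single fine-tuning run can land noticeably above or below it. When using "num_trials": n, a summary like the following sketch makes the spread visible; summarize_trials is a hypothetical helper and the scores are made-up placeholders, not results from the paper or this repository.

```python
import statistics


def summarize_trials(scores):
    """Median across seeds (the statistic used in the table) plus min and max."""
    return {"median": statistics.median(scores), "min": min(scores), "max": max(scores)}


# Placeholder scores from five hypothetical runs with "num_trials": 5.
scores = [66.1, 69.3, 63.2, 68.0, 70.4]
print(summarize_trials(scores))
```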

See here for losses / training curves of the models during pre-training.

Electric

To train Electric, use the same pre-training script and command as ELECTRA. Pass "electra_objective": false and "electric_objective": true to the hyperparameters. We plan to release pre-trained Electric models soon!

Citation

If you use this code for your publication, please cite the original paper:

@inproceedings{clark2020electra,
  title = {{ELECTRA}: Pre-training Text Encoders as Discriminators Rather Than Generators},
  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},
  booktitle = {ICLR},
  year = {2020},
  url = {https://openreview.net/pdf?id=r1xMH1BtvB}
}

If you use the code for Electric, please cite the Electric paper:

@inproceedings{clark2020electric,
  title = {Pre-Training Transformers as Energy-Based Cloze Models},
  author = {Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},
  booktitle = {EMNLP},
  year = {2020},
  url = {https://www.aclweb.org/anthology/2020.emnlp-main.20.pdf}
}

Contact Info

For help or issues using ELECTRA, please submit a GitHub issue.

For personal communication related to ELECTRA, please contact Kevin Clark ([email protected]).

Comments

Add toggle to turn off `strip_accents`.
In some languages, like German, accents are important and change the semantics. Examples:

mochte vs. möchte

musste vs. müsste

etc.

But when lower_case is enabled, they are always stripped automatically.

This PR adds a toggle that makes it possible to lower-case the text but keep the accents. This matches transformers.tokenization_bert.BertTokenizerFast, which also has a boolean parameter called strip_accents.
opened by PhilipMay 13
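A minimal sketch of the proposed behaviour, decoupling lower-casing from accent stripping. It uses the common approach in BERT-style tokenizers (Unicode NFD decomposition, then dropping combining marks of category "Mn"); normalize_text and its parameters are hypothetical and are not this repository's tokenizer API.

```python
import unicodedata


def normalize_text(text: str, lower_case: bool = True, strip_accents: bool = True) -> str:
    """Lower-case and optionally strip accents, as two independent switches."""
    if lower_case:
        text = text.lower()
    if strip_accents:
        # NFD splits "ö" into "o" plus a combining diaeresis (category "Mn"); drop the marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return text


print(normalize_text("Möchte"))                       # mochte
print(normalize_text("Möchte", strip_accents=False))  # möchte
```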

Auto loading in huggingface Transformers is broken

When I try to load the model following the instructions on huggingface.co/models, i.e.:

tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")

I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-bb330c08e050> in <module>
----> 1 tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")

/opt/conda/lib/python3.6/site-packages/transformers/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    179         config = kwargs.pop("config", None)
    180         if not isinstance(config, PretrainedConfig):
--> 181             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    182 
    183         if "bert-base-japanese" in pretrained_model_name_or_path:

/opt/conda/lib/python3.6/site-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    185 
    186         if "model_type" in config_dict:
--> 187             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    188             return config_class.from_dict(config_dict, **kwargs)
    189         else:

KeyError: 'electra'

The version of transformers is 2.7.0. I reproduced the problem in colab here.

opened by xhluca 7

Loss of base and large models

Hi,

I'm currently working on a new non-English ELECTRA model. Training on GPU seems to work and is running fine 🤗

Next steps would be to try model training on a TPU, so I would just like to ask if you can post the final loss of both base and large models (or even share the loss training curve) so that we have a kind of reference point when training own models 🤔

Many thanks in advance,

Stefan

opened by stefan-it 7

'adam_m not found in checkpoint ' when further pretraining

When I was trying further pretraining on the models with domain-specific data in Colab, I encountered a problem that the official pretrained model could not be loaded.

Here is the command for further pre-training.

hparam =    '{"model_size": "small", \
             "use_tpu":true, \
             "num_tpu_cores":8, \
             "tpu_name":"grpc://10.53.161.26:8470", \
             "num_train_steps":4000100,\
             "pretrain_tfrecords":"gs://tweet_torch/electra/electra/data/pretrain_tf_records/pretrain_data.tfrecord*", \
             "model_dir":"gs://tweet_torch/electra/electra/data/electra_small/", \
             "generator_hidden_size":1.0\
            }'
!python electra/run_pretraining.py  \
                    --data-dir "gs://tweet_torch/electra/electra/data/" \
                    --model-name "electra_small" \
                    --hparams '{hparam}'

The error message is long, so I have pasted only the part that seems useful.

ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
From /job:worker/replica:0/task:0:
Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
	 [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]

opened by DayuanJiang 6

NaN loss during training

Thank you for releasing your code.

I successfully trained a small model on a GPU by following "Quickstart: Pre-train a small ELECTRA model", but a "NaN loss during training" error occurred when I trained a base model.

Do you have any idea what causes this?

I use TensorFlow 1.15.0 and a Tesla V100-PCIE-32GB, and the error log is as follows:

$ python run_pretraining.py --data-dir ../electra-en-data --model-name electra_base_owt_200k --hparams '{"num_train_steps": 200000, "model_size": "base", "train_batch_size": 128}'
..
2020-04-07 09:51:45.360762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30458 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:83:00.0, compute capability: 7.0)
2020-04-07 09:52:37.813499: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
1/200000 = 0.0%, SPS: 0.0, ELAP: 21, ETA: 47 days, 14:12:08 - loss: 44.4968
2/200000 = 0.0%, SPS: 0.1, ELAP: 39, ETA: 45 days, 4:02:14 - loss: 44.3760
3/200000 = 0.0%, SPS: 0.1, ELAP: 40, ETA: 31 days, 5:01:16 - loss: 44.5174
4/200000 = 0.0%, SPS: 0.1, ELAP: 42, ETA: 24 days, 5:38:09 - loss: 44.1623
5/200000 = 0.0%, SPS: 0.1, ELAP: 43, ETA: 20 days, 1:12:59 - loss: 44.2913
ERROR:tensorflow:Model diverged with loss = NaN.
ERROR:tensorflow:Error recorded from training_loop: NaN loss during training.
Traceback (most recent call last):
  File "run_pretraining.py", line 385, in <module>
    main()
  File "run_pretraining.py", line 381, in main
    args.model_name, args.data_dir, **hparams))
  File "run_pretraining.py", line 344, in train_or_eval
    max_steps=config.num_train_steps)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
    rendezvous.raise_errors()
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/***/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/home/***/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1426, in run
    run_metadata=run_metadata))
  File "/home/***/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 761, in after_run
    raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

opened by tomohideshibata 6

Load model in Pytorch.

Hi! Thanks for making the source code available and for the great paper.

Are there any plans to support loading the models in PyTorch? Or an implementation in Hugging Face Transformers?

opened by loopdigga96 6

ERROR:tensorflow: Failed to close session after error.Other threads may hang.

I am trying to pre-train an ELECTRA-Base model, and I keep getting this output:

Running training
================================================================================
2020-11-13 08:00:18.044763: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
Model is built!
2020-11-13 08:00:48.956655: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
ERROR:tensorflow:Error recorded from infeed: From /job:worker/replica:0/task:0:
{{function_node __inference_tf_data_experimental_map_and_batch_<lambda>_69}} Key: segment_ids.  Can't parse serialized Example.
	 [[{{node ParseSingleExample/ParseSingleExample}}]]
	 [[input_pipeline_task0/while/IteratorGetNext]]
ERROR:tensorflow:Closing session due to error From /job:worker/replica:0/task:0:
{{function_node __inference_tf_data_experimental_map_and_batch_<lambda>_69}} Key: segment_ids.  Can't parse serialized Example.
	 [[{{node ParseSingleExample/ParseSingleExample}}]]
	 [[input_pipeline_task0/while/IteratorGetNext]]
2020-11-13 08:01:08.642776: W tensorflow/core/distributed_runtime/rpc/grpc_remote_master.cc:157] RPC failed with status = "Unavailable: Socket closed" and grpc_error_string = "{"created":"@1605254468.642525410","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Socket closed","grpc_status":14}", maybe retrying the RPC
2020-11-13 08:01:08.642779: W tensorflow/core/distributed_runtime/rpc/grpc_remote_master.cc:157] RPC failed with status = "Unavailable: Socket closed" and grpc_error_string = "{"created":"@1605254468.642549072","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Socket closed","grpc_status":14}", maybe retrying the RPC
ERROR:tensorflow:Error recorded from outfeed: Step was cancelled by an explicit call to `Session::Close()`.
ERROR:tensorflow:


Failed to close session after error.Other threads may hang.



2020-11-13 08:01:50.857700: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:370] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
ERROR:tensorflow:Error recorded from infeed: From /job:worker/replica:0/task:0:
{{function_node __inference_tf_data_experimental_map_and_batch_<lambda>_69}} Key: segment_ids.  Can't parse serialized Example.
	 [[{{node ParseSingleExample/ParseSingleExample}}]]
	 [[input_pipeline_task0/while/IteratorGetNext]]

opened by etetteh 5

KeyError: '[SEP]'

When running run_pretraining.py, I get this error before pre-training starts:

================================================================================
Running training

2020-04-28 04:43:55.132186: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:356] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
ERROR:tensorflow:Error recorded from training_loop: '[SEP]'
Traceback (most recent call last):
  File "run_pretraining.py", line 384, in <module>
    main()
.
. (lines ignored because they're not useful)
.
  File "/home/manai_elye2s/pretrain/electra/pretrain/pretrain_helpers.py", line 121, in _get_candidates_mask
    ignore_ids = [vocab["[SEP]"], vocab["[CLS]"], vocab["[MASK]"]]
KeyError: '[SEP]'

I got this both with my own vocab and with the default one I downloaded from this repo. Both vocab.txt files contain the [SEP], [CLS], and [MASK] tokens, without spaces.

opened by elyesmanai 5
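When debugging this kind of KeyError, a first step is to inspect the vocabulary file the script actually reads. The hypothetical check_vocab helper below reports the number of tokens, whether the first line starts with a UTF-8 byte-order mark (which would become part of the first token), and which of the three tokens from the traceback are absent. It is a diagnostic sketch and does not establish the cause reported in this issue.

```python
# Tokens looked up in the traceback above.
REQUIRED_TOKENS = ("[SEP]", "[CLS]", "[MASK]")


def check_vocab(path: str) -> dict:
    """Summarize a one-token-per-line vocab.txt as a simple reader would see it."""
    with open(path, encoding="utf-8") as f:
        tokens = [line.strip() for line in f.read().splitlines()]
    present = set(tokens)
    return {
        "size": len(tokens),
        # A UTF-8 BOM is not whitespace, so strip() leaves it on the first token.
        "starts_with_bom": bool(tokens) and tokens[0].startswith("\ufeff"),
        "missing": [tok for tok in REQUIRED_TOKENS if tok not in present],
    }
```

For example, print(check_vocab("vocab.txt")) for the exact path passed through --data-dir or the vocab_file hparam.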
Format of corpus

According to the paper, ELECTRA does not involve NSP (next sentence prediction) task. In that case, do we need sentence segmentation? Does build_pretraining_dataset.py consider each line as a separate sentence? Or can we just feed raw text (with empty lines as separators for documents) ?

opened by mahnerak 4

ValueError: Must specify max_steps > 0, given: 0

$ python3 electra_small/run_finetuning.py \
--data-dir $DATA_DIR \
--model-name "ELECTRA-small" \
--hparams '{"model_size": "small", "task_names": ["<task_name>"], "num_trials": 5, "learning_rate": 3e-4, "train_batch_size": 16, "use_tpu": "True", "num_tpu_cores": 8, "tpu_name": "<tpu_name>", "tpu_zone": "europe-west4-a", "gcp_project": "<gcp_name>", "vocab_size": 50000, "num_train_epochs": 10}'

I am getting the following error. Is there something I am missing?

Training for 0 steps
ERROR:tensorflow:Error recorded from training_loop: Must specify max_steps > 0, given: 0
Traceback (most recent call last):
  File "electra_small/run_finetuning.py", line 323, in <module>
    main()
  File "electra_small/run_finetuning.py", line 319, in main
    args.model_name, args.data_dir, **hparams))
  File "electra_small/run_finetuning.py", line 270, in run_finetuning
    model_runner.train()
  File "electra_small/run_finetuning.py", line 183, in train
    input_fn=self._train_input_fn, max_steps=self.train_steps)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    'Must specify max_steps > 0, given: {}'.format(max_steps))
ValueError: Must specify max_steps > 0, given: 0

opened by etetteh 3
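"Training for 0 steps" indicates that the computed step count was zero. Assuming the number of fine-tuning steps is derived as num_train_epochs * (number of training examples) / train_batch_size, truncated to an integer (an assumption, not confirmed from run_finetuning.py), a task with no loaded training examples, for instance because its data directory was not found, would yield zero steps. finetuning_train_steps is a hypothetical helper.

```python
def finetuning_train_steps(num_examples: int, num_train_epochs: float,
                           train_batch_size: int) -> int:
    """Assumed step formula: epochs * examples / batch size, truncated to an int."""
    return int(num_train_epochs * num_examples / train_batch_size)


print(finetuning_train_steps(0, 10, 16))     # 0: no examples loaded, so max_steps == 0
print(finetuning_train_steps(1000, 10, 16))  # 625
```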

Training Electra on 2 phases like Bert

BERT can be trained in two phases: the first phase uses a shorter sequence length (128) and the second a longer one (512). The first phase accelerates training, while the second lets the positional embeddings learn longer sequences. This works as long as "max_position_embeddings" is 512.

Can ELECTRA be trained the same way, or will it not work because it has a final layer that classifies each token?

@clarkkev @stefan-it @mrm8488 @michelole , Your feedback is highly appreciated.

opened by agemagician 3

failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED

How can I solve this problem? I am using an RTX 3080 Ti, TensorFlow 1.15, Python 3.7.10, and CUDA 10.1.

2022-12-27 15:27:14.672772: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(4096, 2), b.shape=(2, 128), m=4096, n=128, k=2
	 [[node electra/embeddings_1/MatMul (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[add_10/_9743]]
  (1) Internal: Blas GEMM launch failed : a.shape=(4096, 2), b.shape=(2, 128), m=4096, n=128, k=2
	 [[node electra/embeddings_1/MatMul (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

opened by EJDU21 0

CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
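The check described in the pull request (verify that every member extracts inside the destination, raise an exception otherwise) can be sketched as follows. This is a minimal illustration, not the Trellix patch itself; `is_within_directory` and `safe_extractall` are hypothetical names. It only validates member names, so symbolic and hard links that point outside the destination are not covered. Newer Python releases also offer built-in extraction filters on `tarfile` (for example `filter="data"`) as an alternative.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """True if `target` resolves to `directory` itself or a path beneath it."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
    """Check every member name first; extract only if none escapes `path`.

    Only member names are checked: symlink/hardlink targets are not validated.
    """
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Attempted path traversal in tar file: {member.name!r}")
    tar.extractall(path)


# Usage (hypothetical archive and output names):
# with tarfile.open("archive.tar.gz") as tar:
#     safe_extractall(tar, "output_dir")
```

Validating all members before extracting any of them means a malicious archive is rejected without writing partial output.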

opened by TrellixVulnTeam 1

Cannot import trace from tensorflow.python.profiler

I just installed TensorFlow 1.15 using conda and get this error when running the pre-training command provided in the Quickstart section of the README.

Traceback (most recent call last):
  File "electra/run_pretraining.py", line 29, in <module>
    from model import modeling
  File "electra\model\modeling.py", line 33, in <module>
    from tensorflow.contrib import layers as contrib_layers
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow\__init__.py", line 50, in __getattr__
    module = self._load()
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow\__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\__init__.py", line 39, in <module>
    from tensorflow.contrib import compiler
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\compiler\__init__.py", line 21, in <module>
    from tensorflow.contrib.compiler import jit
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\compiler\__init__.py", line 22, in <module>
    from tensorflow.contrib.compiler import xla
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\contrib\compiler\xla.py", line 22, in <module>
    from tensorflow.python.estimator import model_fn as model_fn_lib
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\python\estimator\model_fn.py", line 26, in <module>
    from tensorflow_estimator.python.estimator import model_fn
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\__init__.py", line 10, in <module>
    from tensorflow_estimator._api.v1 import estimator
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\_api\v1\estimator\__init__.py", line 10, in <module>
    from tensorflow_estimator._api.v1.estimator import experimental
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\_api\v1\estimator\experimental\__init__.py", line 10, in <module>
    from tensorflow_estimator.python.estimator.canned.dnn import dnn_logit_fn_builder
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\python\estimator\canned\dnn.py", line 27, in <module>
    from tensorflow_estimator.python.estimator import estimator
  File "C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 36, in <module>
    from tensorflow.python.profiler import trace
ImportError: cannot import name 'trace' from 'tensorflow.python.profiler' (C:\Users\<user>\anaconda3\envs\tensorflow1.15\lib\site-packages\tensorflow_core\python\profiler\__init__.py)
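The last frames show `tensorflow_estimator` importing `trace` from `tensorflow.python.profiler`, which the installed TensorFlow 1.15 does not provide. This suggests (an assumption, not a confirmed diagnosis) that the installed `tensorflow-estimator` comes from a newer release line than TensorFlow itself. The sketch below compares the two installed versions. The helper names are hypothetical. Because `importlib.metadata` requires Python 3.8+, while TF 1.15 supports up to Python 3.7, the lookup falls back to `None`; in that case `pip show tensorflow-estimator` or `conda list tensorflow` gives the same information.

```python
"""Sketch: compare installed tensorflow and tensorflow-estimator versions."""
from typing import Optional, Tuple

try:
    from importlib import metadata  # available on Python 3.8+
except ImportError:  # Python 3.7 lacks importlib.metadata
    metadata = None


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.15.0' -> (1, 15); only the first two components are used."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_match(tf_version: str, estimator_version: str) -> bool:
    """True if both packages come from the same major.minor release line."""
    return major_minor(tf_version) == major_minor(estimator_version)


def installed_version(*names: str) -> Optional[str]:
    """Return the version of the first installed distribution among `names`."""
    if metadata is None:
        return None
    for name in names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    tf_v = installed_version("tensorflow", "tensorflow-gpu")
    est_v = installed_version("tensorflow-estimator", "tensorflow_estimator")
    if tf_v is None or est_v is None:
        print("Could not read versions; try `pip show tensorflow-estimator`.")
    else:
        print(tf_v, est_v, "match" if versions_match(tf_v, est_v) else "MISMATCH")
```

If the versions disagree, installing a `tensorflow-estimator` release from the 1.15 series alongside TensorFlow 1.15 is one thing to try.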

opened by n-garc 2

Tagging Task Segment ids

Why are the segment ids for tagging tasks ones instead of zeros?

https://github.com/google-research/electra/blob/8a46635f32083ada044d7e9ad09604742600ee7b/finetune/tagging/tagging_tasks.py#L144

A tagging task only contains the first segment, so its segment ids should be zeros, right?

@clarkkev
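For reference, the common BERT-style convention the question appeals to can be sketched as follows: tokens of the first segment, including `[CLS]` and its closing `[SEP]`, get segment id 0, and only tokens of an optional second segment get 1. This illustrates the convention only; `build_segment_ids` is a hypothetical helper and does not reproduce what `tagging_tasks.py` does at the linked line.

```python
from typing import List, Optional, Tuple


def build_segment_ids(
    tokens_a: List[str], tokens_b: Optional[List[str]] = None
) -> Tuple[List[str], List[int]]:
    """BERT-style packing: [CLS] A [SEP] (B [SEP]) with segment ids 0 for A, 1 for B."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# A single-segment input (as in a tagging task) gets all zeros under this convention.
print(build_segment_ids(["John", "lives", "here"]))
# (['[CLS]', 'John', 'lives', 'here', '[SEP]'], [0, 0, 0, 0, 0])
```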

opened by kamalkraj 0

Optimal Learning Rate and Training Steps for Large Batch Size

First of all, thank you for sharing this great work!

How would you recommend choosing optimal hyperparameters for a large batch size?

For example, if I train the ELECTRA-Large model on a v3-128 TPU, a batch size of 4096 is affordable. In this case, what learning rate and number of training steps would you suggest? As for the data, I'm planning to train the model on my own dataset, which is about 300 GB of TFRecords.

Do you have any rough ideas?

Thank you
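One common heuristic (an assumption here, not a recommendation from the ELECTRA authors) is to start from a known-good reference configuration, scale the learning rate by the square root of the batch-size ratio (often used with Adam-style optimizers), and scale the number of steps inversely with batch size so that the total number of examples seen stays the same. The linear scaling rule is another common choice and is included as an option. The `Schedule` type and `scale_schedule` function are hypothetical, and the reference values below are placeholders, not values from the paper. Any scaled setting still needs validation, for example with warmup and a short learning-rate sweep.

```python
import math
from typing import NamedTuple


class Schedule(NamedTuple):
    batch_size: int
    learning_rate: float
    train_steps: int


def scale_schedule(ref: Schedule, new_batch_size: int, rule: str = "sqrt") -> Schedule:
    """Heuristically rescale a reference schedule to `new_batch_size`."""
    ratio = new_batch_size / ref.batch_size
    if rule == "sqrt":
        lr = ref.learning_rate * math.sqrt(ratio)
    elif rule == "linear":
        lr = ref.learning_rate * ratio
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    # Keep total examples seen (batch_size * steps) roughly constant.
    steps = max(1, round(ref.train_steps * ref.batch_size / new_batch_size))
    return Schedule(new_batch_size, lr, steps)


# Hypothetical reference configuration (placeholder values, not from the paper).
reference = Schedule(batch_size=1024, learning_rate=1e-4, train_steps=400_000)
print(scale_schedule(reference, 4096))
```

With these placeholder inputs, quadrupling the batch size doubles the learning rate under the square-root rule and cuts the step count to a quarter.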

opened by robinsongh381 0


Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Much faster than SORT (Simple Online and Realtime Tracking), a little worse than SORT

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Code to pre-train T5 (Text-to-Text Transfer Transformer) models on Japanese web texts

Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775

Official implementation of "GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators" (NeurIPS 2020)

Code for ICCV 2021 paper: ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators.

PyTorch code accompanying our paper on Maximum Entropy Generators for Energy-Based Models

PyTorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021)

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

This repo holds code for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

RE3: State Entropy Maximization with Random Encoders for Efficient Exploration

GAN encoders in PyTorch that could match PGGAN, StyleGAN v1/v2, and BigGAN. Code also integrates the implementation of these GANs.