Hi!
I have code that apparently works for converting a BERT model into a Longformer, and now I am trying to convert BERTeus to a Longformer, which I expected to work in the same way (just changing the dataset and the model name/path).
With a small training corpus (50K lines; the same issue occurs with a big one), training starts well, but it breaks around step 20, after 3-4 epochs.
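For context, the conversion script follows the usual "convert BERT to long" recipe: the position embeddings are extended to 4096 and each layer's self-attention is swapped for a Longformer self-attention whose global projections are initialized from the local q/k/v (the checkpoint in tmp/bert-base-4096 indeed contains query_global/key_global/value_global weights, as the log below shows). Roughly, the conversion step looks like the sketch below; class, function and variable names are my own, so treat it as an approximation of the script rather than the exact code.
###########################################
import copy

import torch
from transformers import BertForMaskedLM, BertTokenizerFast
from transformers.modeling_longformer import LongformerSelfAttention


class BertLongSelfAttention(LongformerSelfAttention):
    """Adapter so BertAttention can call the Longformer attention with its usual arguments."""

    def forward(self, hidden_states, attention_mask=None, head_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None,
                output_attentions=False):
        return super().forward(hidden_states, attention_mask=attention_mask,
                               output_attentions=output_attentions)


def convert_bert_to_long(model_path, save_path, max_pos=4096, attention_window=512):
    """Extend position embeddings to max_pos and install windowed self-attention."""
    model = BertForMaskedLM.from_pretrained(model_path)
    tokenizer = BertTokenizerFast.from_pretrained(model_path, model_max_length=max_pos)

    # Tile the pretrained 512-position table up to max_pos.
    pos_embed = model.bert.embeddings.position_embeddings.weight
    current_max_pos, embed_size = pos_embed.shape
    new_pos_embed = pos_embed.new_empty(max_pos, embed_size)
    with torch.no_grad():
        k = 0
        while k < max_pos:
            step = min(current_max_pos, max_pos - k)
            new_pos_embed[k:k + step] = pos_embed[:step]
            k += step
    model.bert.embeddings.position_embeddings.weight.data = new_pos_embed
    model.config.max_position_embeddings = max_pos
    if hasattr(model.bert.embeddings, "position_ids"):
        # Some transformers versions register a position_ids buffer of shape
        # (1, max_position_embeddings); it has to be enlarged as well.
        model.bert.embeddings.position_ids = torch.arange(max_pos).unsqueeze(0)

    # One attention window per layer; global projections start as copies of q/k/v.
    model.config.attention_window = [attention_window] * model.config.num_hidden_layers
    for i, layer in enumerate(model.bert.encoder.layer):
        long_attn = BertLongSelfAttention(model.config, layer_id=i)
        long_attn.query = layer.attention.self.query
        long_attn.key = layer.attention.self.key
        long_attn.value = layer.attention.self.value
        long_attn.query_global = copy.deepcopy(layer.attention.self.query)
        long_attn.key_global = copy.deepcopy(layer.attention.self.key)
        long_attn.value_global = copy.deepcopy(layer.attention.self.value)
        layer.attention.self = long_attn

    model.save_pretrained(save_path)
    tokenizer.save_pretrained(save_path)
    return model, tokenizer
###########################################
Since BERT's position ids simply start at 0 (no RoBERTa-style padding offset), plain tiling of the 512 pretrained positions is enough. Here is the full log of the failing run: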
2020-09-22 15:01:55.336576: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-09-22 15:01:55.338202: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
INFO:__main__:Loading the model from tmp/bert-base-4096
INFO:transformers.configuration_utils:loading configuration file tmp/bert-base-4096/config.json
INFO:transformers.configuration_utils:Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"attention_window": [
512,
512,
512,
512,
512,
512,
512,
512,
512,
512,
512,
512
],
"gradient_checkpointing": true,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 4096,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"output_past": true,
"pad_token_id": 3,
"type_vocab_size": 2,
"vocab_size": 50099
}
INFO:transformers.tokenization_utils_base:Model name 'tmp/bert-base-4096' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). Assuming 'tmp/bert-base-4096' is a path, a model identifier, or url to a directory containing tokenizer files.
INFO:transformers.tokenization_utils_base:Didn't find file tmp/bert-base-4096/added_tokens.json. We won't load it.
INFO:transformers.tokenization_utils_base:Didn't find file tmp/bert-base-4096/tokenizer.json. We won't load it.
INFO:transformers.tokenization_utils_base:loading file tmp/bert-base-4096/vocab.txt
INFO:transformers.tokenization_utils_base:loading file None
INFO:transformers.tokenization_utils_base:loading file tmp/bert-base-4096/special_tokens_map.json
INFO:transformers.tokenization_utils_base:loading file tmp/bert-base-4096/tokenizer_config.json
INFO:transformers.tokenization_utils_base:loading file None
/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_auto.py:798: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
FutureWarning,
INFO:transformers.configuration_utils:loading configuration file tmp/bert-base-4096/config.json
INFO:transformers.configuration_utils:Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"attention_window": [
512,
512,
512,
512,
512,
512,
512,
512,
512,
512,
512,
512
],
"gradient_checkpointing": true,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 4096,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"output_past": true,
"pad_token_id": 3,
"type_vocab_size": 2,
"vocab_size": 50099
}
INFO:transformers.modeling_utils:loading weights file tmp/bert-base-4096/pytorch_model.bin
WARNING:transformers.modeling_utils:Some weights of the model checkpoint at tmp/bert-base-4096 were not used when initializing BertForMaskedLM: ['bert.encoder.layer.0.attention.self.query_global.weight', 'bert.encoder.layer.0.attention.self.query_global.bias', 'bert.encoder.layer.0.attention.self.key_global.weight', 'bert.encoder.layer.0.attention.self.key_global.bias', 'bert.encoder.layer.0.attention.self.value_global.weight', 'bert.encoder.layer.0.attention.self.value_global.bias', 'bert.encoder.layer.1.attention.self.query_global.weight', 'bert.encoder.layer.1.attention.self.query_global.bias', 'bert.encoder.layer.1.attention.self.key_global.weight', 'bert.encoder.layer.1.attention.self.key_global.bias', 'bert.encoder.layer.1.attention.self.value_global.weight', 'bert.encoder.layer.1.attention.self.value_global.bias', 'bert.encoder.layer.2.attention.self.query_global.weight', 'bert.encoder.layer.2.attention.self.query_global.bias', 'bert.encoder.layer.2.attention.self.key_global.weight', 'bert.encoder.layer.2.attention.self.key_global.bias', 'bert.encoder.layer.2.attention.self.value_global.weight', 'bert.encoder.layer.2.attention.self.value_global.bias', 'bert.encoder.layer.3.attention.self.query_global.weight', 'bert.encoder.layer.3.attention.self.query_global.bias', 'bert.encoder.layer.3.attention.self.key_global.weight', 'bert.encoder.layer.3.attention.self.key_global.bias', 'bert.encoder.layer.3.attention.self.value_global.weight', 'bert.encoder.layer.3.attention.self.value_global.bias', 'bert.encoder.layer.4.attention.self.query_global.weight', 'bert.encoder.layer.4.attention.self.query_global.bias', 'bert.encoder.layer.4.attention.self.key_global.weight', 'bert.encoder.layer.4.attention.self.key_global.bias', 'bert.encoder.layer.4.attention.self.value_global.weight', 'bert.encoder.layer.4.attention.self.value_global.bias', 'bert.encoder.layer.5.attention.self.query_global.weight', 'bert.encoder.layer.5.attention.self.query_global.bias', 'bert.encoder.layer.5.attention.self.key_global.weight', 'bert.encoder.layer.5.attention.self.key_global.bias', 'bert.encoder.layer.5.attention.self.value_global.weight', 'bert.encoder.layer.5.attention.self.value_global.bias', 'bert.encoder.layer.6.attention.self.query_global.weight', 'bert.encoder.layer.6.attention.self.query_global.bias', 'bert.encoder.layer.6.attention.self.key_global.weight', 'bert.encoder.layer.6.attention.self.key_global.bias', 'bert.encoder.layer.6.attention.self.value_global.weight', 'bert.encoder.layer.6.attention.self.value_global.bias', 'bert.encoder.layer.7.attention.self.query_global.weight', 'bert.encoder.layer.7.attention.self.query_global.bias', 'bert.encoder.layer.7.attention.self.key_global.weight', 'bert.encoder.layer.7.attention.self.key_global.bias', 'bert.encoder.layer.7.attention.self.value_global.weight', 'bert.encoder.layer.7.attention.self.value_global.bias', 'bert.encoder.layer.8.attention.self.query_global.weight', 'bert.encoder.layer.8.attention.self.query_global.bias', 'bert.encoder.layer.8.attention.self.key_global.weight', 'bert.encoder.layer.8.attention.self.key_global.bias', 'bert.encoder.layer.8.attention.self.value_global.weight', 'bert.encoder.layer.8.attention.self.value_global.bias', 'bert.encoder.layer.9.attention.self.query_global.weight', 'bert.encoder.layer.9.attention.self.query_global.bias', 'bert.encoder.layer.9.attention.self.key_global.weight', 'bert.encoder.layer.9.attention.self.key_global.bias', 'bert.encoder.layer.9.attention.self.value_global.weight', 
'bert.encoder.layer.9.attention.self.value_global.bias', 'bert.encoder.layer.10.attention.self.query_global.weight', 'bert.encoder.layer.10.attention.self.query_global.bias', 'bert.encoder.layer.10.attention.self.key_global.weight', 'bert.encoder.layer.10.attention.self.key_global.bias', 'bert.encoder.layer.10.attention.self.value_global.weight', 'bert.encoder.layer.10.attention.self.value_global.bias', 'bert.encoder.layer.11.attention.self.query_global.weight', 'bert.encoder.layer.11.attention.self.query_global.bias', 'bert.encoder.layer.11.attention.self.key_global.weight', 'bert.encoder.layer.11.attention.self.key_global.bias', 'bert.encoder.layer.11.attention.self.value_global.weight', 'bert.encoder.layer.11.attention.self.value_global.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
INFO:transformers.modeling_utils:All the weights of BertForMaskedLM were initialized from the model checkpoint at tmp/bert-base-4096.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use BertForMaskedLM for predictions without further training.
INFO:__main__:Pretraining bert-base-4096 ...
INFO:filelock:Lock 140392820589624 acquired on cached_lm_BertTokenizerFast_4094_valEusLong.txt.lock
INFO:transformers.data.datasets.language_modeling:Loading features from cached file cached_lm_BertTokenizerFast_4094_valEusLong.txt [took 0.008 s]
INFO:filelock:Lock 140392820589624 released on cached_lm_BertTokenizerFast_4094_valEusLong.txt.lock
INFO:__main__:Loading and tokenizing training data is usually slow: trainEusLong1.txt
INFO:filelock:Lock 140392820589456 acquired on cached_lm_BertTokenizerFast_4094_trainEusLong1.txt.lock
INFO:transformers.data.datasets.language_modeling:Loading features from cached file cached_lm_BertTokenizerFast_4094_trainEusLong1.txt [took 0.053 s]
INFO:filelock:Lock 140392820589456 released on cached_lm_BertTokenizerFast_4094_trainEusLong1.txt.lock
INFO:transformers.training_args:PyTorch: setting up devices
INFO:transformers.trainer:You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
INFO:transformers.trainer:***** Running Evaluation *****
INFO:transformers.trainer: Num examples = 70
INFO:transformers.trainer: Batch size = 1
Evaluation: 0%| | 0/70 [00:00<?, ?it/s]
/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Evaluation: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 70/70 [00:21<00:00, 3.22it/s]
INFO:transformers.trainer:{'eval_loss': 12.326190962110246, 'step': 0}
INFO:__main__:Initial eval bpc: 17.782934574086813
INFO:transformers.trainer:***** Running training *****
INFO:transformers.trainer: Num examples = 388
INFO:transformers.trainer: Num Epochs = 501
INFO:transformers.trainer: Instantaneous batch size per device = 1
INFO:transformers.trainer: Total train batch size (w. parallel, distributed & accumulation) = 64
INFO:transformers.trainer: Gradient Accumulation steps = 64
INFO:transformers.trainer: Total optimization steps = 3000
INFO:transformers.trainer: Starting fine-tuning.
Epoch: 0%| | 0/501 [00:00<?, ?it/s]
INFO:transformers.trainer:{'loss': 12.102866038680077, 'learning_rate': 6.000000000000001e-08, 'epoch': 0.16494845360824742, 'step': 1}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-1
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-1/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-1/pytorch_model.bin
INFO:transformers.trainer:{'loss': 12.099215269088745, 'learning_rate': 1.2000000000000002e-07, 'epoch': 0.32989690721649484, 'step': 2}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-2
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-2/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-2/pytorch_model.bin
INFO:transformers.trainer:{'loss': 12.078452616930008, 'learning_rate': 1.8e-07, 'epoch': 0.4948453608247423, 'step': 3}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-3
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-3/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-3/pytorch_model.bin
INFO:transformers.trainer:{'loss': 12.023080185055733, 'learning_rate': 2.4000000000000003e-07, 'epoch': 0.6597938144329897, 'step': 4}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-4
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-4/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-4/pytorch_model.bin
INFO:transformers.trainer:{'loss': 12.003526121377945, 'learning_rate': 3.0000000000000004e-07, 'epoch': 0.8247422680412371, 'step': 5}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-5
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-5/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-5/pytorch_model.bin
INFO:transformers.trainer:{'loss': 11.993770495057106, 'learning_rate': 3.6e-07, 'epoch': 0.9896907216494846, 'step': 6}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-6
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-6/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-6/pytorch_model.bin
Iteration: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 388/388 [09:18<00:00, 1.44s/it]
Epoch: 0%|▎ | 1/501 [09:18<77:36:08, 558.74s/it]
INFO:transformers.trainer:{'loss': 12.672470852732658, 'learning_rate': 4.2e-07, 'epoch': 1.1649484536082475, 'step': 7}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-7
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-7/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-7/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-8
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-8/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-8/pytorch_model.bin
INFO:transformers.trainer:{'loss': 11.813278079032898, 'learning_rate': 5.4e-07, 'epoch': 1.4948453608247423, 'step': 9}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-9
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-9/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-9/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-10
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-10/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-10/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-11
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-11/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-11/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-12
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-12/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-12/pytorch_model.bin
Iteration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 388/388 [09:24<00:00, 1.45s/it]
Epoch: 0%|▌ | 2/501 [18:43<77:40:49, 560.42s/it]
INFO:transformers.trainer:{'loss': 12.117324143648148, 'learning_rate': 7.799999999999999e-07, 'epoch': 2.1649484536082473, 'step': 13}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-13
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-13/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-13/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-14
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-14/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-14/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-15
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-15/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-15/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-16
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-16/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-16/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-17
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-17/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-17/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-18
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-18/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-18/pytorch_model.bin
Iteration: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 388/388 [09:24<00:00, 1.45s/it]
Epoch: 1%|▊ | 3/501 [28:07<77:40:37, 561.52s/it]
INFO:transformers.trainer:{'loss': 11.206573352217674, 'learning_rate': 1.14e-06, 'epoch': 3.1649484536082473, 'step': 19}
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-19
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-19/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-19/pytorch_model.bin
INFO:transformers.trainer:Saving model checkpoint to tmp/checkpoint-20
INFO:transformers.configuration_utils:Configuration saved in tmp/checkpoint-20/config.json
INFO:transformers.modeling_utils:Model weights saved in tmp/checkpoint-20/pytorch_model.bin
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [467,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [467,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [467,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Iteration: 39%|████████████████████████████████████████████████████████████▋ | 153/388 [03:38<05:35, 1.43s/it]
Epoch: 1%|▊ | 3/501 [31:45<87:51:44, 635.15s/it]
Traceback (most recent call last):
File "BERTeus2LongB.py", line 305, in <module>
pretrain_and_evaluate(training_args, model, tokenizer, eval_only=False, model_path=training_args.output_dir)
File "BERTeus2LongB.py", line 183, in pretrain_and_evaluate
trainer.train(model_path=model_path)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/trainer.py", line 499, in train
tr_loss += self._training_step(model, inputs, optimizer)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/trainer.py", line 622, in _training_step
outputs = model(**inputs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 1083, in forward
output_hidden_states=output_hidden_states,
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 753, in forward
input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 182, in forward
embeddings = inputs_embeds + position_embeddings + token_type_embeddings
RuntimeError: CUDA error: device-side assert triggered
The same run with
###########################################
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
###########################################
...
Epoch: 1%|▉ | 3/501 [30:52<85:25:53, 617.58s/it]
Traceback (most recent call last):
File "BERTeus2LongB.py", line 305, in <module>
pretrain_and_evaluate(training_args, model, tokenizer, eval_only=False, model_path=training_args.output_dir)
File "BERTeus2LongB.py", line 183, in pretrain_and_evaluate
trainer.train(model_path=model_path)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/trainer.py", line 499, in train
tr_loss += self._training_step(model, inputs, optimizer)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/trainer.py", line 622, in _training_step
outputs = model(**inputs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 1083, in forward
output_hidden_states=output_hidden_states,
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 762, in forward
output_hidden_states=output_hidden_states,
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 430, in forward
encoder_attention_mask,
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/utils/checkpoint.py", line 155, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/utils/checkpoint.py", line 74, in forward
outputs = run_function(*args)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 420, in custom_forward
return module(*inputs, output_attentions)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 371, in forward
hidden_states, attention_mask, head_mask, output_attentions=output_attentions,
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 315, in forward
hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions,
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 243, in forward
attention_scores = attention_scores + attention_mask
RuntimeError: CUDA error: device-side assert triggered
(transformers) gurbizu@azken:/mnt/datuak/gorka-tmp$ python BERTeus2LongB.py
Any hint as to what causes this error?
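In case it is useful for narrowing this down: the `srcIndex < srcSelectDimSize` assertions come from an embedding lookup, and the failing line is `embeddings = inputs_embeds + position_embeddings + token_type_embeddings`, so my current guess is that some index is out of range for its table (an input id >= vocab_size, a position id >= max_position_embeddings, or a token_type id >= type_vocab_size). Below is the quick sanity check I intend to run over the training file; the paths and names are just illustrative.
###########################################
from transformers import BertForMaskedLM, BertTokenizerFast

model = BertForMaskedLM.from_pretrained("tmp/bert-base-4096")
tokenizer = BertTokenizerFast.from_pretrained("tmp/bert-base-4096")

word_rows = model.bert.embeddings.word_embeddings.weight.shape[0]
max_pos = model.config.max_position_embeddings
# The number of word-embedding rows should match config.vocab_size (50099 here).
print("config vocab_size:", model.config.vocab_size, "word embedding rows:", word_rows)

# Look for token ids or sequence lengths that the embedding tables cannot handle.
with open("trainEusLong1.txt", encoding="utf-8") as f:
    for line_no, line in enumerate(f, 1):
        ids = tokenizer.encode(line, add_special_tokens=True,
                               truncation=True, max_length=max_pos)
        if max(ids) >= word_rows or len(ids) > max_pos:
            print("line", line_no, "max id:", max(ids), "length:", len(ids))
###########################################
Running a single batch on the CPU (e.g. with CUDA_VISIBLE_DEVICES set to an empty string) should also turn the device-side assert into an ordinary IndexError that points directly at the offending lookup.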
By the way, I have also occasionally gotten this error, which I am not able to reproduce right now:
File "BERTeus2LongB.py", line 305, in <module>
pretrain_and_evaluate(training_args, model, tokenizer, eval_only=False, model_path=training_args.output_dir)
...
File "/mnt/datuak/virtualenvs/transformers/lib/python3.6/site-packages/torch/nn/functional.py", line 1372, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Regards,
Gorka