Hi,
I want to fine-tune BanglaBERT for sequence classification.
- I get the following error when running on Google Colab with a GPU runtime.
- I do not get this error when running on Google Colab with a CPU-only runtime.
The error occurs while running the following command (the sequence classification example from the GitHub repo):
python ./sequence_classification/sequence_classification.py \
    --overwrite_output_dir \
    --model_name_or_path "csebuetnlp/banglabert" \
    --dataset_dir "./sequence_classification/sample_inputs/single_sequence/jsonl" \
    --output_dir "./sequence_classification/outputs/" \
    --learning_rate=2e-5 \
    --warmup_ratio 0.1 \
    --gradient_accumulation_steps 2 \
    --weight_decay 0.1 \
    --lr_scheduler_type "linear" \
    --per_device_train_batch_size=16 \
    --per_device_eval_batch_size=16 \
    --max_seq_length 512 \
    --logging_strategy "epoch" \
    --save_strategy "epoch" \
    --evaluation_strategy "epoch" \
    --num_train_epochs=3 \
    --do_train --do_eval
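For context, my understanding is that the script ends up doing the equivalent of the following minimal hand-written sketch (this is not the repo's code; the sample sentence is just taken from my data), where both the model and every tokenized batch should land on the same CUDA device:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
model = AutoModelForSequenceClassification.from_pretrained(
    "csebuetnlp/banglabert", num_labels=2
).to(device)

# Tokenize one example and move the batch to the same device as the model.
batch = tokenizer("ভারতের কুখ্যাত ষড়যন্ত্রের মুখোশ উন্মোচন হলো", return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**batch).logits
print(logits.device)  # should be cuda:0 on the GPU runtime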
Full log and error traceback:
05/08/2022 09:24:21 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
05/08/2022 09:24:21 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=2,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./sequence_classification/outputs/runs/May08_09-24-21_0da7ed02e26d,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.EPOCH,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
output_dir=./sequence_classification/outputs/,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=16,
per_device_train_batch_size=16,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=outputs,
push_to_hub_organization=None,
push_to_hub_token=None,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./sequence_classification/outputs/,
save_on_each_node=False,
save_steps=500,
save_strategy=IntervalStrategy.EPOCH,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.1,
warmup_steps=0,
weight_decay=0.1,
)
05/08/2022 09:24:21 - WARNING - datasets.builder - Using custom data configuration default-1e09c73b0f004fd6
05/08/2022 09:24:21 - INFO - datasets.builder - Overwrite dataset info from restored data version.
05/08/2022 09:24:21 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0
05/08/2022 09:24:21 - WARNING - datasets.builder - Reusing dataset json (/root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0)
05/08/2022 09:24:21 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0
100% 3/3 [00:00<00:00, 886.31it/s]
[INFO|configuration_utils.py:561] 2022-05-08 09:24:22,163 >> loading configuration file https://huggingface.co/csebuetnlp/banglabert/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/60928dc4b87f5881692890e6541e6538f91588d2ea40cbbbdc04cfb2cb83a6b1.2388211ba94f448fcf40aef3c9526142a8c2f2a8fb4fce8a3801462f51b2bab5
[INFO|configuration_utils.py:598] 2022-05-08 09:24:22,164 >> Model config ElectraConfig {
"architectures": [
"ElectraForPreTraining"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"embedding_size": 768,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "electra",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"summary_activation": "gelu",
"summary_last_dropout": 0.1,
"summary_type": "first",
"summary_use_proj": true,
"transformers_version": "4.11.0.dev0",
"type_vocab_size": 2,
"vocab_size": 32000
}
[INFO|configuration_utils.py:561] 2022-05-08 09:24:23,954 >> loading configuration file https://huggingface.co/csebuetnlp/banglabert/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/60928dc4b87f5881692890e6541e6538f91588d2ea40cbbbdc04cfb2cb83a6b1.2388211ba94f448fcf40aef3c9526142a8c2f2a8fb4fce8a3801462f51b2bab5
[INFO|configuration_utils.py:598] 2022-05-08 09:24:23,955 >> Model config ElectraConfig {
"architectures": [
"ElectraForPreTraining"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"embedding_size": 768,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "electra",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"summary_activation": "gelu",
"summary_last_dropout": 0.1,
"summary_type": "first",
"summary_use_proj": true,
"transformers_version": "4.11.0.dev0",
"type_vocab_size": 2,
"vocab_size": 32000
}
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/65e95b847336b6bf69b37fdb8682a97e822799adcd9745dcf9bf44cfe4db1b9a.8f92ca2cf7e2eaa550b10c40331ae9bf0f2e40abe3b549f66a3d7f13bfc6de47
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/special_tokens_map.json from cache at /root/.cache/huggingface/transformers/7820dfc553e8dfb8a1e82042b7d0d691c7a7cd1e30ed2974218f696e81c5f3b1.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/76fa87a0ec9c34c9b15732bf7e06bced447feff46287b8e7d246a55d301784d7.b4f59cefeba4296760d2cf1037142788b96f2be40230bf6393d2fba714562485
[INFO|tokenization_utils_base.py:1739] 2022-05-08 09:24:29,230 >> loading file https://huggingface.co/csebuetnlp/banglabert/resolve/main/tokenizer.json from cache at None
[INFO|configuration_utils.py:561] 2022-05-08 09:24:30,126 >> loading configuration file https://huggingface.co/csebuetnlp/banglabert/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/60928dc4b87f5881692890e6541e6538f91588d2ea40cbbbdc04cfb2cb83a6b1.2388211ba94f448fcf40aef3c9526142a8c2f2a8fb4fce8a3801462f51b2bab5
[INFO|configuration_utils.py:598] 2022-05-08 09:24:30,126 >> Model config ElectraConfig {
"architectures": [
"ElectraForPreTraining"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"embedding_size": 768,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "electra",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"summary_activation": "gelu",
"summary_last_dropout": 0.1,
"summary_type": "first",
"summary_use_proj": true,
"transformers_version": "4.11.0.dev0",
"type_vocab_size": 2,
"vocab_size": 32000
}
[INFO|modeling_utils.py:1279] 2022-05-08 09:24:31,075 >> loading weights file https://huggingface.co/csebuetnlp/banglabert/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/913ea71768a80ccdde3a9ab9a88cf2a93f37a52008896997655d0f63b0d0743a.8aaedac281b72dbb5296319c53be5a4e4a52339eded3f68d49201e140e221615
[WARNING|modeling_utils.py:1516] 2022-05-08 09:24:32,600 >> Some weights of the model checkpoint at csebuetnlp/banglabert were not used when initializing ElectraForSequenceClassification: ['discriminator_predictions.dense.weight', 'discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense_prediction.bias']
- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1527] 2022-05-08 09:24:32,600 >> Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at csebuetnlp/banglabert and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
05/08/2022 09:24:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-c8c752bb15628b86.arrow
05/08/2022 09:24:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-1d7e8a13339dd538.arrow
05/08/2022 09:24:32 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-5b734993f8fa5b18.arrow
05/08/2022 09:24:33 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-ae957e77cc0e01d1.arrow
05/08/2022 09:24:33 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-ad37b78f61cc4fc6.arrow
05/08/2022 09:24:33 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-1e09c73b0f004fd6/0.0.0/cache-efbe758578e42091.arrow
05/08/2022 09:24:33 - INFO - __main__ - Sample 0 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'input_ids': [2, 4992, 10267, 784, 27147, 415, 830, 7761, 1333, 16, 983, 12484, 825, 5083, 2893, 426, 2636, 16493, 415, 815, 2068, 795, 205, 3], 'label': 0, 'sentence1': 'যেই মাদারির পোলারা এই কাজটি করেছে, সেই সালারা অবৈধ জারপ সন্তান ছারা আর কিছুই না।', 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
05/08/2022 09:24:33 - INFO - __main__ - Sample 3 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1], 'input_ids': [2, 10634, 5452, 817, 972, 6037, 3], 'label': 0, 'sentence1': 'মুসা কপা\u200cলে কি আ\u200cছে জা\u200cনিনা', 'token_type_ids': [0, 0, 0, 0, 0, 0, 0]}.
05/08/2022 09:24:33 - INFO - __main__ - Sample 1 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1], 'input_ids': [2, 2157, 18812, 16332, 12062, 16135, 1292, 3], 'label': 0, 'sentence1': 'ভারতের কুখ্যাত ষড়যন্ত্রের মুখোশ উন্মোচন হলো', 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0]}.
05/08/2022 09:24:35 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py at /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy
05/08/2022 09:24:35 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py at /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy/6dba4616f6b2bbd19659d1db3a48cc001c8f13a10cdc73a2641a55f7a60b7b5b
05/08/2022 09:24:35 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py to /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy/6dba4616f6b2bbd19659d1db3a48cc001c8f13a10cdc73a2641a55f7a60b7b5b/accuracy.py
05/08/2022 09:24:35 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/dataset_infos.json
05/08/2022 09:24:35 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/accuracy/accuracy.py at /root/.cache/huggingface/modules/datasets_modules/metrics/accuracy/6dba4616f6b2bbd19659d1db3a48cc001c8f13a10cdc73a2641a55f7a60b7b5b/accuracy.json
05/08/2022 09:24:36 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py at /root/.cache/huggingface/modules/datasets_modules/metrics/precision
05/08/2022 09:24:36 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py at /root/.cache/huggingface/modules/datasets_modules/metrics/precision/94709a71c6fe37171ef49d3466fec24dee9a79846c9f176dff66a649e9811690
05/08/2022 09:24:36 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py to /root/.cache/huggingface/modules/datasets_modules/metrics/precision/94709a71c6fe37171ef49d3466fec24dee9a79846c9f176dff66a649e9811690/precision.py
05/08/2022 09:24:36 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/dataset_infos.json
05/08/2022 09:24:36 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/precision/precision.py at /root/.cache/huggingface/modules/datasets_modules/metrics/precision/94709a71c6fe37171ef49d3466fec24dee9a79846c9f176dff66a649e9811690/precision.json
05/08/2022 09:24:38 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py at /root/.cache/huggingface/modules/datasets_modules/metrics/recall
05/08/2022 09:24:38 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py at /root/.cache/huggingface/modules/datasets_modules/metrics/recall/1e3b93e2bed42e1498e628f161d79ee019dd8e78d36985d3c7ecbc018adf35e8
05/08/2022 09:24:38 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py to /root/.cache/huggingface/modules/datasets_modules/metrics/recall/1e3b93e2bed42e1498e628f161d79ee019dd8e78d36985d3c7ecbc018adf35e8/recall.py
05/08/2022 09:24:38 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/dataset_infos.json
05/08/2022 09:24:38 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/recall/recall.py at /root/.cache/huggingface/modules/datasets_modules/metrics/recall/1e3b93e2bed42e1498e628f161d79ee019dd8e78d36985d3c7ecbc018adf35e8/recall.json
05/08/2022 09:24:39 - INFO - datasets.load - Found main folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py at /root/.cache/huggingface/modules/datasets_modules/metrics/f1
05/08/2022 09:24:39 - INFO - datasets.load - Found specific version folder for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py at /root/.cache/huggingface/modules/datasets_modules/metrics/f1/6c86fddbf90432b9c43a8c38c62a0dd9de63bad2ef0a896f9ae20273d6d6f6e9
05/08/2022 09:24:39 - INFO - datasets.load - Found script file from https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py to /root/.cache/huggingface/modules/datasets_modules/metrics/f1/6c86fddbf90432b9c43a8c38c62a0dd9de63bad2ef0a896f9ae20273d6d6f6e9/f1.py
05/08/2022 09:24:39 - INFO - datasets.load - Couldn't find dataset infos file at https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/dataset_infos.json
05/08/2022 09:24:39 - INFO - datasets.load - Found metadata file for metric https://raw.githubusercontent.com/huggingface/datasets/1.11.0/metrics/f1/f1.py at /root/.cache/huggingface/modules/datasets_modules/metrics/f1/6c86fddbf90432b9c43a8c38c62a0dd9de63bad2ef0a896f9ae20273d6d6f6e9/f1.json
[INFO|trainer.py:521] 2022-05-08 09:24:43,888 >> The following columns in the training set don't have a corresponding argument in `ElectraForSequenceClassification.forward` and have been ignored: sentence1.
[INFO|trainer.py:1168] 2022-05-08 09:24:43,900 >> ***** Running training *****
[INFO|trainer.py:1169] 2022-05-08 09:24:43,900 >> Num examples = 4
[INFO|trainer.py:1170] 2022-05-08 09:24:43,900 >> Num Epochs = 3
[INFO|trainer.py:1171] 2022-05-08 09:24:43,900 >> Instantaneous batch size per device = 16
[INFO|trainer.py:1172] 2022-05-08 09:24:43,900 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1173] 2022-05-08 09:24:43,900 >> Gradient Accumulation steps = 2
[INFO|trainer.py:1174] 2022-05-08 09:24:43,900 >> Total optimization steps = 3
  0% 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./sequence_classification/sequence_classification.py", line 479, in <module>
    main()
  File "./sequence_classification/sequence_classification.py", line 426, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1284, in train
    tr_loss += self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1789, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1821, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/electra/modeling_electra.py", line 973, in forward
    return_dict,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/electra/modeling_electra.py", line 879, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/electra/modeling_electra.py", line 206, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 160, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2183, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
  0% 0/3 [00:00<?, ?it/s]
A possibly relevant solution from the PyTorch discussion forum, which I haven't been able to adapt to my case: https://discuss.pytorch.org/t/code-that-loads-sgd-fails-to-load-adam-state-to-gpu/61783/3?u=shaibagon
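If I understand that thread correctly, the suggestion is to move the optimizer's state tensors onto the model's device after loading them, roughly like this (a sketch only; the helper name is mine, not from the thread or from transformers):

import torch

def move_optimizer_state(optimizer, device):
    # Move every tensor held in the optimizer state (e.g. Adam's exp_avg and
    # exp_avg_sq buffers) to the given device so they match the model parameters.
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device)

# e.g. after optimizer.load_state_dict(...):
# move_optimizer_state(optimizer, torch.device("cuda:0"))

But I can't figure out how (or whether) this applies to the Trainer-based script above.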
Thanks.