Hi, I ran into a RuntimeError while training a prefix model. Do you have any suggestions?
Here is the environment:
certifi (2021.5.30)
charset-normalizer (2.0.4)
click (8.0.1)
dataclasses (0.8)
filelock (3.0.12)
idna (3.2)
importlib-metadata (4.8.1)
itsdangerous (2.0.1)
Jinja2 (3.0.1)
joblib (1.0.1)
MarkupSafe (2.0.1)
nltk (3.6.2)
numpy (1.19.5)
packaging (21.0)
Pillow (8.3.2)
pip (9.0.3)
pyparsing (2.4.7)
Python-dev (2.0.0.dev0)
regex (2021.8.28)
requests (2.26.0)
sacremoses (0.0.45)
sentencepiece (0.1.96)
setuptools (39.2.0)
six (1.16.0)
tokenizers (0.8.1rc2)
torch (1.8.0+cu111)
torchvision (0.9.0+cu111)
tqdm (4.62.2)
transformers (3.2.0, /home/yanzhongxiang/PrefixTuning/transformers/src)
typing-extensions (3.10.0.2)
urllib3 (1.26.6)
Werkzeug (2.0.1)
zipp (3.5.0)
Here is the command line:
python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache
Here is the full log and the error:
webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
python run_language_modeling.py --output_dir=webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --model_type=gpt2 --model_name_or_path=gpt2-medium --tokenizer_name=gpt2-medium --per_device_train_batch_size 5 --per_device_eval_batch_size 5 --save_steps 500000 --num_train_epochs 5 --do_train --train_data_file=../data/webnlg_challenge_2017/train.json --do_eval --line_by_line --save_total_limit 1 --overwrite_output_dir --task_mode webnlg --eval_data_file=../data/webnlg_challenge_2017/dev.json --tuning_mode prefixtune --logging_dir webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --train_embs no --optim_prefix yes --preseqlen 5 --prefix_mode activation --format_mode cat --gradient_accumulation_steps 1 --learning_rate 5e-05 --weight_decay 0.0 --seed 101 --disable_tqdm --mid_dim 512 --init_random no --use_dropout no --prefix_dropout 0.0 --objective_mode 1 --evaluate_during_training --eval_steps 5000 --cache_dir cache/gpt2-medium-s3
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/__init__.py
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/training_args.py:299: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
FutureWarning,
09/16/2021 10:22:04 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False
09/16/2021 10:22:04 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir='webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=True, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=5, per_device_eval_batch_size=5, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, warmup_steps=0, logging_dir='webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', logging_first_step=False, logging_steps=500, save_steps=500000, save_total_limit=1, no_cuda=False, seed=101, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=5000, dataloader_num_workers=0, past_index=-1, run_name=None, disable_tqdm=True, remove_unused_columns=True, label_names=None)
objective is 1
False
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/tokenization_utils_base.py:1324: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
FutureWarning,
prefixtune
adapting the size of the model embedding to include [PAD]
len(tokenizer) = 50257
len(tokenizer) = 50258
<|endoftext|> 50256
<|endoftext|> 50256
loading the prefix model from None
training the prefix model from scratch.
under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
webnlg
tgt_avg: 30.665242718446603
src_avg: 49.62568654646324
ratios: 1.6183040519881826
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
| Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> The Aarhus is the airport of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
| Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> Aarhus Airport serves the city of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]
webnlg
tgt_avg: 31.644375553587246
src_avg: 51.023914968999115
ratios: 1.6124165535386898
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220, 50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
| Aarhus : leaderName : Jacob_Bundsgaard <|endoftext|> The leader of Aarhus is Jacob Bundsgaard. <|endoftext|>
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220]
[50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[3554, 5376]
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220, 50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
| Aarhus_Airport : runwayLength : 2702.0 <|endoftext|> Aarhus Airport's runway length is 2702.0. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220]
[50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[23443, 24539]
FORMAT MODE IS cat
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:309: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.
FutureWarning,
09/16/2021 10:22:53 - WARNING - trainer_prefix - You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:1291: FutureWarning: This method is deprecated, use `Trainer.is_world_process_zero()` instead.
warnings.warn("This method is deprecated, use `Trainer.is_world_process_zero()` instead.", FutureWarning)
{'state': {}, 'param_groups': [{'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [0, 1, 2]}, {'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [3, 4]}]}
09/16/2021 10:22:53 - INFO - trainer_prefix - ***** Running training *****
09/16/2021 10:22:53 - INFO - trainer_prefix - Num examples = 18025
09/16/2021 10:22:53 - INFO - trainer_prefix - Num Epochs = 5
09/16/2021 10:22:53 - INFO - trainer_prefix - Instantaneous batch size per device = 5
09/16/2021 10:22:53 - INFO - trainer_prefix - Total train batch size (w. parallel, distributed & accumulation) = 40
09/16/2021 10:22:53 - INFO - trainer_prefix - Gradient Accumulation steps = 1
09/16/2021 10:22:53 - INFO - trainer_prefix - Total optimization steps = 2255
Traceback (most recent call last):
File "run_language_modeling.py", line 1159, in <module>
main()
File "run_language_modeling.py", line 993, in main
trainer.train(model_path=model_path)
File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 811, in train
tr_loss += self.training_step(model, inputs)
File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1174, in training_step
loss = self.compute_loss(model, inputs, gpt2_model=self.gpt2)
File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1214, in compute_loss
outputs = model(**inputs, gpt2_model=gpt2_model)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/gpt2/train_control.py", line 327, in forward
return_dict=return_dict, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
return_dict=return_dict,
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 619, in forward
inputs_embeds = self.wte(input_ids)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/sparse.py", line 147, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/functional.py", line 1913, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device
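For what it's worth, my current guess (an assumption on my part, not something I have verified) is that this is a DataParallel device issue: the log shows n_gpu: 8, and the failure happens in replica 1 on device 1, where the input_ids handed to wte apparently sit on a different GPU than that replica's embedding weights. A minimal standalone sketch of the same kind of mismatch, using a toy embedding purely for illustration:

import torch
import torch.nn as nn

# Toy embedding standing in for GPT-2's wte; the sizes are arbitrary.
emb = nn.Embedding(50258, 16).to("cuda:1")        # weights live on device 1
ids = torch.tensor([[1, 2, 3]], device="cuda:0")  # indices stayed on device 0
out = emb(ids)  # raises a device-mismatch RuntimeError similar to the one above

As a workaround I am planning to restrict the run to a single GPU so that DataParallel is not used at all, e.g.:

CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache

Does that sound like the right direction, or is multi-GPU training expected to work here?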