Hi,
I ran this code to train on the VCR task and got the following error:
(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/13/2020 13:52:59 - INFO - __main__ - device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/13/2020 13:53:00 - INFO - pytorch_transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/13/2020 13:53:00 - INFO - vilbert.task_utils - Loading VCR_Q-A Dataset with batch size 16
03/13/2020 13:53:11 - INFO - vilbert.task_utils - Loading VCR_QA-R Dataset with batch size 16
03/13/2020 13:53:25 - INFO - vilbert.utils - logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/13/2020 13:53:25 - INFO - vilbert.utils - loading weights file save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
03/13/2020 13:53:29 - INFO - vilbert.utils - Weights of VILBertForVLTasks not initialized from pretrained model: ['bert.embeddings.task_embeddings.weight', 'vil_prediction.logit_fc.0.weight', 'vil_prediction.logit_fc.0.bias', 'vil_prediction.logit_fc.2.weight', 'vil_prediction.logit_fc.2.bias', 'vil_prediction.logit_fc.3.weight', 'vil_prediction.logit_fc.3.bias', 'vil_prediction_gqa.logit_fc.0.weight', 'vil_prediction_gqa.logit_fc.0.bias', 'vil_prediction_gqa.logit_fc.2.weight', 'vil_prediction_gqa.logit_fc.2.bias', 'vil_prediction_gqa.logit_fc.3.weight', 'vil_prediction_gqa.logit_fc.3.bias', 'vil_binary_prediction.logit_fc.0.weight', 'vil_binary_prediction.logit_fc.0.bias', 'vil_binary_prediction.logit_fc.2.weight', 'vil_binary_prediction.logit_fc.2.bias', 'vil_binary_prediction.logit_fc.3.weight', 'vil_binary_prediction.logit_fc.3.bias', 'vil_tri_prediction.weight', 'vil_tri_prediction.bias']
03/13/2020 13:53:29 - INFO - vilbert.utils - Weights from pretrained model not used in VILBertForVLTasks: ['vil_prediction.main.0.bias', 'vil_prediction.main.0.weight_g', 'vil_prediction.main.0.weight_v', 'vil_prediction.main.3.bias', 'vil_prediction.main.3.weight_g', 'vil_prediction.main.3.weight_v']
559 559
***** Running training *****
Num Iters: {'TASK5': 2039, 'TASK6': 2039}
Batch size: {'TASK5': 16, 'TASK6': 16}
Num steps: 40780
Epoch: 0%| | 0/20 [00:01<?, ?it/s]
Traceback (most recent call last):
File "train_tasks.py", line 670, in <module>
main()
File "train_tasks.py", line 529, in main
task_losses,
File "/home/ailab/vilbert-multi-task/vilbert/task_utils.py", line 200, in ForwardModelsTrain
if task_cfg[task_id]["process"] in ["dialog"]:
KeyError: 'process'
Exception in thread Thread-3:
Traceback (most recent call last):
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py", line 21, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 276, in rebuild_storage_fd
fd = df.detach()
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
response = connection.recv_bytes(256) # reject large message
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
How can I fix this? Please help T^T
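For context, here is a minimal sketch of what the failing check at line 200 of vilbert/task_utils.py appears to do, based only on the traceback. It assumes `task_cfg` is a plain dict of per-task settings; the `"name"` and `"batch_size"` entries are hypothetical placeholders, and only the task IDs `TASK5`/`TASK6` come from the log. The sketch shows that indexing `["process"]` raises `KeyError` when a task entry has no `"process"` key, while `dict.get` returns `None` instead. This is not the project's code or its official fix.

```python
# Hypothetical per-task config: entries lack a "process" key,
# which is the situation the traceback suggests.
task_cfg = {
    "TASK5": {"name": "VCR_Q-A", "batch_size": 16},   # hypothetical fields
    "TASK6": {"name": "VCR_QA-R", "batch_size": 16},  # hypothetical fields
}


def uses_dialog_processing(task_cfg, task_id):
    # dict.get returns None (not a KeyError) when "process" is missing,
    # so tasks without the key are simply treated as non-dialog tasks.
    return task_cfg[task_id].get("process") in ["dialog"]


# Reproduce the original failure: direct indexing raises KeyError.
try:
    task_cfg["TASK5"]["process"] in ["dialog"]
except KeyError as err:
    print("KeyError:", err)

print(uses_dialog_processing(task_cfg, "TASK5"))
```

If this reading is right, the likely remedy is making sure every task entry in the task config the script loads defines `process`, rather than patching the lookup.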