System information
- Have I written custom code: Yes
- OS Platform(e.g., window10 or Linux Ubuntu 16.04): Linux
- Python version: 3.8
- Deep learning framework and version(e.g., Tensorflow2.1 or Pytorch1.3): PyTorch 1.7.1
- Use GPU or not: Yes
- CUDA/cuDNN version(if you use GPU): CUDA11.7
- The network you trained(e.g., Resnet34 network): faster_res50_rpn
Describe the current behavior
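Hello, I am running train_multi_GPU.py on the VG dataset. The dataset is set up to match the output of my_dataset.py, and everything has been converted to tensors, but the step "global_features,loss_dict = model(images, targets)" always fails with "RuntimeError: chunk expects at least a 1-dimensional tensor". I cannot tell which input does not meet the requirement. Is there a way to fix this?
My custom changes: I switched the training dataset to VG, commented out the COCO-related code, and removed the validation set. I also added some code after roi_head, which should not affect training of the base model that comes before it. The rest of the code is unchanged.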
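The error message suggests that some tensor passed to the model is zero-dimensional (a scalar), which DistributedDataParallel cannot split along dimension 0 when it scatters the inputs. Below is a minimal sketch of a check that could locate such a value, assuming targets is a list of dicts mapping names to tensors. The helper name find_zero_dim_tensors is hypothetical and not part of the repository; it only relies on the tensor's dim() method, so it needs no imports.

```python
def find_zero_dim_tensors(targets):
    """Return (sample_index, key) pairs whose value is a 0-dimensional tensor.

    Assumption: `targets` is a list of dicts mapping names to tensors,
    in the form passed to model(images, targets).
    """
    offenders = []
    for i, target in enumerate(targets):
        for key, value in target.items():
            # torch.Tensor.dim() returns 0 for scalars such as torch.tensor(5).
            has_dim = callable(getattr(value, "dim", None))
            if has_dim and value.dim() == 0:
                offenders.append((i, key))
    return offenders
```

Calling find_zero_dim_tensors(targets) just before model(images, targets) would list the offending keys. If, for example, a field such as image_id turns out to be built with torch.tensor(idx), wrapping the value as torch.tensor([idx]) would make it one-dimensional. Which field is affected here is a guess until the check is run.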
Error info / logs
Traceback (most recent call last):
  File "train_multi_GPU.py", line 273, in <module>
    main(args)
  File "train_multi_GPU.py", line 151, in main
    mean_loss, lr = utils.train_one_epoch(model, optimizer, data_loader,
  File "/home/zzyyxx/Image_Catpion/faster_rcnn/train_utils/train_eval_utils.py", line 46, in train_one_epoch
    global_features,loss_dict = model(images, targets)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 617, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 643, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
    res = scatter_map(inputs)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 17, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 19, in scatter_map
    return list(map(type(obj), zip(*map(scatter_map, obj.items()))))
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 92, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/nn/parallel/comm.py", line 186, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: chunk expects at least a 1-dimensional tensor
Traceback (most recent call last):
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/zzyyxx/enter/envs/ZTorch/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/zzyyxx/enter/envs/ZTorch/bin/python', '-u', 'train_multi_GPU.py']' returned non-zero exit status 1.