PyTorch: 1.11
CudaToolKits: 11.3.1
Error occur while running this command
python tools/train.py configs/raft/raft_8x2_50k_kitti2015_288x960.py
Complete stacktrace
/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1646755953518/work/aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/autograd/__init__.py:175: UserWarning: Error detected in ReluBackward0. Traceback of forward call that caused the error:
File "tools/train.py", line 209, in <module>
main()
File "tools/train.py", line 205, in main
meta=meta)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/apis/train.py", line 238, in train_model
runner.run(data_loaders, cfg.workflow)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 134, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 61, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/models/flow_estimators/base.py", line 90, in train_step
losses = self(**data, test_mode=False)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/models/flow_estimators/base.py", line 59, in forward
return self.forward_train(*args, **kwargs)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/models/flow_estimators/raft.py", line 107, in forward_train
feat1, feat2, h_feat, cxt_feat = self.extract_feat(imgs)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/models/flow_estimators/raft.py", line 74, in extract_feat
cxt_feat = self.context(img1)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/models/encoders/raft_encoder.py", line 296, in forward
x = res_layer(x)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/models/utils/res_layer.py", line 88, in forward
out = _inner_forward(x)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/models/utils/res_layer.py", line 76, in _inner_forward
out = self.relu(out)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/activation.py", line 98, in forward
return F.relu(input, inplace=self.inplace)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/functional.py", line 1442, in relu
result = torch.relu(input)
(Triggered internally at /opt/conda/conda-bld/pytorch_1646755953518/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "tools/train.py", line 209, in <module>
main()
File "tools/train.py", line 205, in main
meta=meta)
File "/media/exthdd/laizeqiang/lzq/projects/misc/mmflow/mmflow/apis/train.py", line 238, in train_model
runner.run(data_loaders, cfg.workflow)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 134, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 67, in train
self.call_hook('after_train_iter')
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
getattr(hook, fn_name)(self)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 56, in after_train_iter
runner.outputs['loss'].backward()
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/laizeqiang/miniconda3/envs/openmmlab/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 128, 36, 120]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!