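The error below appears every time after 8 batches have run. Changing the batch size does not help; it always fails after 8 batches.
The machine has 3 GPUs installed, and I set GPU_ID = 1.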
Files already downloaded and verified
Files already downloaded and verified
Train - Epoch 1, Batch: 0, Loss: 2.296886, Time 5.307902
Train - Epoch 1, Batch: 1, Loss: 2.301040, Time 0.105161
Train - Epoch 1, Batch: 2, Loss: 2.300776, Time 0.110913
Train - Epoch 1, Batch: 3, Loss: 2.303986, Time 0.104652
Train - Epoch 1, Batch: 4, Loss: 2.289750, Time 0.100140
Train - Epoch 1, Batch: 5, Loss: 2.315252, Time 0.099318
Train - Epoch 1, Batch: 6, Loss: 2.298506, Time 0.106323
Train - Epoch 1, Batch: 7, Loss: 2.310294, Time 0.106855
Traceback (most recent call last):
File "/work/sunbiao/AdderNetCUDA-LingYeAI/main.py", line 146, in <module>
main()
File "/work/sunbiao/AdderNetCUDA-LingYeAI/main.py", line 142, in main
train_and_test(e)
File "/work/sunbiao/AdderNetCUDA-LingYeAI/main.py", line 135, in train_and_test
train(epoch)
File "/work/sunbiao/AdderNetCUDA-LingYeAI/main.py", line 90, in train
output = net(images)
File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/work/sunbiao/AdderNetCUDA-LingYeAI/densenet.py", line 83, in forward
x = self.trans3(self.dense3(x))
File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/work/sunbiao/AdderNetCUDA-LingYeAI/densenet.py", line 17, in forward
y = self.conv1(func.relu(self.bn1(x)))
File "/home/nature/anaconda3/envs/addernet/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/work/sunbiao/AdderNetCUDA-LingYeAI/adder.py", line 104, in forward
output = adder2d_function(x, self.adder, self.stride, self.padding)
File "/work/sunbiao/AdderNetCUDA-LingYeAI/adder.py", line 39, in adder2d_function
out = out.permute(3, 0, 1, 2).contiguous()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
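
Following the hint in the last line of the error, here is a minimal sketch of how the debugging flag could be set from Python. The helper name `configure_cuda_debug_env` is hypothetical and not part of this repository. It also pins the process to one physical GPU through `CUDA_VISIBLE_DEVICES`, on the assumption (not confirmed by the log) that the custom CUDA kernel behind `adder.py` might be launching on the default device 0 while the tensors live on GPU 1. Both variables need to be set before `torch` is imported, or at least before CUDA is initialized, to take effect.

```python
import os


def configure_cuda_debug_env(gpu_id: int) -> dict:
    """Hypothetical helper: set CUDA debugging variables for this process.

    Call this at the very top of main.py, before `import torch`.
    """
    # Make kernel launches synchronous so the traceback points at the
    # kernel that actually failed instead of a later API call.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    # Expose only the chosen physical GPU; inside the process it is
    # then addressed as device 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return {
        name: os.environ[name]
        for name in ("CUDA_LAUNCH_BLOCKING", "CUDA_VISIBLE_DEVICES")
    }


# Assumption: physical GPU 1 is the one selected by GPU_ID = 1 above.
settings = configure_cuda_debug_env(gpu_id=1)
print(settings)
```

With `CUDA_VISIBLE_DEVICES=1` in effect, the script would refer to that GPU as device 0, so GPU_ID would have to be 0 for this run. The same setup can be done from the shell with `CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=1 python main.py`. If the error persists, the synchronous traceback should at least show which call really triggers the illegal memory access.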