Thanks so much for sharing your code! I'm trying to run it from scratch, but I hit a problem during the training phase. I'd appreciate your help in finding the root cause.
This is the command I run to train U-Net, with the paths adjusted to match the defaults:
$ th main.lua
It produces the following error log:
Setting up data loader using data/train.h5
Data loader setup done!
...
Epoch : 1, Learning Rate : 1.00000
THCudaCheck FAIL file=/home/david/torch/extra/cutorch/lib/THC/generic/THCTensorCopy.c line=81 error=77 : an illegal memory access was encountered
/home/david/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home/david/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
Environment: Ubuntu 14.04, Titan X, CUDA 7.5, cuDNN v.5
Possible root causes:
- I tried temporarily removing the SpatialMaxPooling module, following this discussion: https://groups.google.com/forum/m/#!msg/torch7/Ru-I6vP2ql0/s2vOsKoVBgAJ
Finally, I simplified the network to include no modules at all, but the problem persists. So SpatialMaxPooling does not seem to be the cause.
- I suspect that the dataset I created in HDF5 format has some problems. I'll try to check its correctness. If you know how to check it, please advise.
- I recently switched to cuDNN v.5. Could this version be problematic?
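For the HDF5 check, this is the kind of sanity test I have in mind. It is only a sketch under some assumptions: that the labels must be integer class indices, that the criterion expects them to be 1-based (as Torch's ClassNLLCriterion does), and that an out-of-range label could explain an illegal memory access on the GPU. The dataset name "label" and the class count in the usage note are placeholders, not names taken from the code. Reading the file relies on the third-party h5py package.

```python
import math


def flatten(values):
    """Yield scalars from arbitrarily nested lists or tuples."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield v


def check_labels(labels, num_classes, one_based=True):
    """Return a list of problems found in the labels (empty list means OK)."""
    lo = 1 if one_based else 0
    hi = num_classes if one_based else num_classes - 1
    problems = []
    for i, v in enumerate(flatten(labels)):
        if isinstance(v, float) and (math.isnan(v) or math.isinf(v)):
            problems.append("element %d is not finite: %r" % (i, v))
        elif v != int(v):
            problems.append("element %d is not an integer: %r" % (i, v))
        elif not lo <= v <= hi:
            problems.append(
                "element %d is outside [%d, %d]: %r" % (i, lo, hi, v)
            )
    return problems


def load_dataset(path, dataset_name):
    """Read one dataset from an HDF5 file into nested Python lists.

    Requires h5py; dataset_name is whatever name the file actually uses.
    """
    import h5py  # third-party, imported lazily so check_labels works without it

    with h5py.File(path, "r") as f:
        return f[dataset_name][...].tolist()
```

I would then run something like `check_labels(load_dataset("data/train.h5", "label"), num_classes=2)` and look at whatever it reports. The pure-Python loop will be slow on a large file, so this is meant as a one-off check rather than part of training.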
Thanks!