Hi, I ran into a CUDA error while training. Could you help me with this?
Reproducing command: CUDA_VISIBLE_DEVICES=0 python train.py
PyTorch version: 1.8.0+cu111
CUDA version: 11.2
Error details:
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [102,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [103,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [104,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [106,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [107,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [108,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [109,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [110,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [111,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [65,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [66,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [67,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [68,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [69,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [70,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [71,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [72,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [73,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [74,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [75,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [76,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [77,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [78,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [79,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [80,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [81,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [82,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [83,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [84,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [85,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [86,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [87,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [88,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize
failed.
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from record at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:116 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9cf4eb92f2 in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f9cf4eb667b in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x25fdc6b5 (0x7f9a1aa336b5 in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: <unknown function> + 0x25bb613a (0x7f9a1a60d13a in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: <unknown function> + 0x25fd1d03 (0x7f9a1aa28d03 in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #5: <unknown function> + 0x10c4ee8 (0x7f9a6a999ee8 in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::native::lstm(at::Tensor const&, c10::ArrayRef<at::Tensor>, c10::ArrayRef<at::Tensor>, bool, long, double, bool, bool, bool) + 0x23b (0x7f9a6a9838bb in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x17c31f7 (0x7f9a6b0981f7 in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x17c327c (0x7f9a6b09827c in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::lstm(at::Tensor const&, c10::ArrayRef<at::Tensor>, c10::ArrayRef<at::Tensor>, bool, long, double, bool, bool, bool) + 0x24b (0x7f9a6ae10e8b in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x40d0b3 (0x7f9b64abd0b3 in /data/shaoqing.tan/anaconda3/envs/albef_py/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #46: __libc_start_main + 0xf0 (0x7f9cf9d77840 in /lib/x86_64-linux-gnu/libc.so.6)
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [35,0,0] Assertion srcIndex < srcSelectDimSize
failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [156,0,0], thread: [36,0,0] Assertion srcIndex < srcSelectDimSize
failed.
Aborted (core dumped)
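
A note for anyone trying to narrow this down: the assertion text (srcIndex < srcSelectDimSize) suggests that some index passed to an index-select kernel is not smaller than the size of the dimension it selects from. CUDA kernels run asynchronously, so the frames above (which pass through at::native::lstm) may not be where the bad index was actually produced. Rerunning with CUDA_LAUNCH_BLOCKING=1 set in the environment usually makes the Python traceback point at the failing call. Below is a minimal sketch of a CPU-side range check that could be run on the indices before they reach the GPU. The helper name `find_out_of_range` and the example values are hypothetical and do not come from train.py.

```python
from typing import Iterable, List, Tuple


def find_out_of_range(indices: Iterable[int], dim_size: int) -> List[Tuple[int, int]]:
    """Return (position, value) pairs for every index outside [0, dim_size).

    Negative values are flagged as well, as a conservative assumption.
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not 0 <= idx < dim_size
    ]


# Hypothetical example: a lookup table with 5 rows and one batch of indices.
bad = find_out_of_range([0, 4, 5, -1, 2], dim_size=5)
print(bad)  # [(2, 5), (3, -1)]
```

With PyTorch tensors, the same check could be applied as `find_out_of_range(idx.flatten().tolist(), table.size(0))`, where `idx` and `table` stand in for whatever index tensor and lookup table the model uses (both names are assumptions).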