What I run
python train.py ../train/ --img_size 128 --batch 8
SPEC
2 x RTX 2080 Ti
The error occurs whether I use one card or both.
CODE
Latest version of the repository.
ENVIRONMENT
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
async-generator 1.10 pypi_0 pypi
attrs 20.3.0 pypi_0 pypi
backcall 0.2.0 py_0
bash-kernel 0.7.2 pypi_0 pypi
beautifulsoup4 4.9.3 pyhb0f4dca_0
blas 1.0 mkl
bleach 3.2.1 pypi_0 pypi
bzip2 1.0.8 h7b6447c_0
ca-certificates 2020.10.14 0
certifi 2020.6.20 pyhd3eb1b0_3
cffi 1.14.0 py38he30daa8_1
chardet 3.0.4 py38_1003
conda 4.9.2 py38h06a4308_0
conda-build 3.20.5 py38_1
conda-package-handling 1.6.1 py38h7b6447c_0
cryptography 2.9.2 py38h1ba5d50_0
cudatoolkit 11.0.221 h6bb024c_0
dataclasses 0.6 pypi_0 pypi
decorator 4.4.2 py_0
defusedxml 0.6.0 pypi_0 pypi
dnspython 2.0.0 pypi_0 pypi
entrypoints 0.3 pypi_0 pypi
filelock 3.0.12 py_0
freetype 2.10.4 h5ab3b9f_0
future 0.18.2 pypi_0 pypi
gdown 3.12.2 pypi_0 pypi
glob2 0.7 py_0
icu 58.2 he6710b0_3
idna 2.9 py_1
intel-openmp 2020.2 254
ipykernel 5.3.4 pypi_0 pypi
ipython 7.19.0 pypi_0 pypi
ipython_genutils 0.2.0 py38_0
ipywidgets 7.5.1 pypi_0 pypi
jedi 0.17.2 py38_0
jinja2 2.11.2 py_0
jpeg 9b h024ee3a_2
json5 0.9.5 pypi_0 pypi
jsonschema 3.2.0 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 6.1.7 pypi_0 pypi
jupyter-console 6.2.0 pypi_0 pypi
jupyter-core 4.7.0 pypi_0 pypi
jupyterlab 2.2.9 pypi_0 pypi
jupyterlab-pygments 0.1.2 pypi_0 pypi
jupyterlab-server 1.2.0 pypi_0 pypi
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libarchive 3.4.2 h62408e4_0
libedit 3.1.20181209 hc058e9b_0
libffi 3.3 he6710b0_1
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
liblief 0.10.1 he6710b0_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_1
libuv 1.40.0 h7b6447c_0
libxml2 2.9.10 hb55368b_3
lz4-c 1.9.2 heb0550a_3
markupsafe 1.1.1 py38h7b6447c_0
mistune 0.8.4 pypi_0 pypi
mkl 2020.2 256
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.2.0 py38h23d657b_0
mkl_random 1.1.1 py38h0573a6f_0
nbclient 0.5.1 pypi_0 pypi
nbconvert 6.0.7 pypi_0 pypi
nbformat 5.0.8 pypi_0 pypi
ncurses 6.2 he6710b0_1
nest-asyncio 1.4.3 pypi_0 pypi
ninja 1.10.1 py38hfd86e86_0
notebook 5.7.5 pypi_0 pypi
numpy 1.19.2 py38h54aff64_0
numpy-base 1.19.2 py38hfa32c7d_0
olefile 0.46 py_0
openssl 1.1.1h h7b6447c_0
packaging 20.4 pypi_0 pypi
pandocfilters 1.4.3 pypi_0 pypi
parso 0.7.0 py_0
patchelf 0.12 he6710b0_0
pexpect 4.8.0 py38_0
pickleshare 0.7.5 py38_1000
pillow 8.0.0 py38h9a89aac_0
pip 20.0.2 py38_3
pkginfo 1.6.0 py38_0
prometheus-client 0.9.0 pypi_0 pypi
prompt-toolkit 3.0.8 py_0
psutil 5.7.2 py38h7b6447c_0
ptyprocess 0.6.0 py38_0
py-lief 0.10.1 py38h403a769_0
pycosat 0.6.3 py38h7b6447c_1
pycparser 2.20 py_0
pygments 2.7.1 py_0
pyopenssl 19.1.0 py38_0
pyparsing 2.4.7 pypi_0 pypi
pyrsistent 0.17.3 pypi_0 pypi
pysocks 1.7.1 py38_0
python 3.8.3 hcff3b4d_0
python-dateutil 2.8.1 pypi_0 pypi
python-etcd 0.4.5 pypi_0 pypi
python-libarchive-c 2.9 py_0
pytorch 1.7.0 py3.8_cuda11.0.221_cudnn8.0.3_0 pytorch
pytz 2020.1 py_0
pyyaml 5.3.1 py38h7b6447c_0
pyzmq 20.0.0 pypi_0 pypi
qtconsole 4.7.7 pypi_0 pypi
qtpy 1.9.0 pypi_0 pypi
readline 8.0 h7b6447c_0
requests 2.23.0 py38_0
ripgrep 12.1.1 0
ruamel_yaml 0.15.87 py38h7b6447c_0
scipy 1.5.2 py38h0b6359f_0
send2trash 1.5.0 pypi_0 pypi
setuptools 46.4.0 py38_0
six 1.14.0 py38_0
soupsieve 2.0.1 py_0
sqlite 3.31.1 h62c20be_1
terminado 0.9.1 pypi_0 pypi
testpath 0.4.4 pypi_0 pypi
tk 8.6.8 hbc83047_0
torchelastic 0.2.1 pypi_0 pypi
torchvision 0.8.0 py38_cu110 pytorch
tornado 5.1.1 pypi_0 pypi
tqdm 4.46.0 py_0
traitlets 5.0.5 py_0
typing_extensions 3.7.4.3 py_0
urllib3 1.25.8 py38_0
wcwidth 0.2.5 py_0
webencodings 0.5.1 pypi_0 pypi
wheel 0.34.2 py38_0
widgetsnbextension 3.5.1 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yaml 0.1.7 had09818_2
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
ERROR MESSAGE:
Namespace(affine=False, batch=8, img_size=128, iter=200000, lr=0.0001, n_bits=5, n_block=4, n_flow=32, n_sample=20, no_lu=False, path='../train/', temp=0.7)
/workspace/glow-pytorch/model.py:102: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/utils/tensor_numpy.cpp:141.)
w_s = torch.from_numpy(w_s)
Loss: 2.15042; logP: -2.13823; logdet: 4.98781; lr: 0.0001000: 0%| | 1/200000
Traceback (most recent call last):
File "train.py", line 177, in <module>
train(args, model, optimizer)
File "train.py", line 148, in train
model_single.reverse(z_sample).cpu().data,
File "/workspace/glow-pytorch/model.py", line 367, in reverse
input = block.reverse(z_list[-1], z_list[-1], reconstruct=reconstruct)
File "/workspace/glow-pytorch/model.py", line 322, in reverse
input = flow.reverse(input)
File "/workspace/glow-pytorch/model.py", line 239, in reverse
input = self.invconv.reverse(input)
File "/workspace/glow-pytorch/model.py", line 136, in reverse
return F.conv2d(output, weight.squeeze().inverse().unsqueeze(2).unsqueeze(3))
RuntimeError: cusolver error: 7, when calling `cusolverDnCreate(handle)`
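The traceback ends in the matrix inverse inside the reverse pass, which is where the cuSOLVER handle is created. One thing that may be worth trying, as a sketch only, is to invert the small C x C weight on the CPU and move the result back to the device of the activations. The function name reverse_invconv_cpu is hypothetical and not part of the repository, and whether this avoids the error on the setup above is an assumption, not something verified.

```python
import torch
import torch.nn.functional as F


def reverse_invconv_cpu(output, weight):
    """Hypothetical variant of the reverse step shown in the traceback.

    `weight` is assumed to have shape (C, C, 1, 1), i.e. a 1x1 convolution.
    """
    # Invert the C x C matrix on the CPU so that no GPU solver is involved.
    w_inv = weight.squeeze().cpu().inverse()
    # Move the inverse to the device of the activations and restore the
    # (C, C, 1, 1) kernel shape expected by conv2d.
    w_inv = w_inv.to(output.device).unsqueeze(2).unsqueeze(3)
    return F.conv2d(output, w_inv)
```

The matrix is only C x C, so the extra host/device transfer should be cheap compared with the rest of the sampling step.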
EXTRA:
The bug occurs during the reverse calculation, which runs when i % 100 == 0. I changed the condition to i == 1 to reproduce the bug faster.
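To take the training loop out of the picture, the failing operation can be isolated. The sketch below inverts a small matrix on a chosen device; the helper name check_inverse and the matrix size are arbitrary choices for illustration. If calling it with "cuda" raises the same cusolver error, that would suggest the problem lies in the PyTorch/CUDA setup rather than in train.py.

```python
import torch


def check_inverse(device, n=4):
    """Invert a well-conditioned n x n matrix on `device` and return the
    largest absolute deviation of W @ W^-1 from the identity."""
    torch.manual_seed(0)
    # Identity plus a small perturbation: strictly diagonally dominant for
    # n=4, so the matrix is guaranteed to be invertible.
    w = (torch.eye(n) + 0.1 * torch.rand(n, n)).to(device)
    w_inv = w.inverse()  # same kind of call as in model.py's reverse()
    eye = torch.eye(n, device=w.device)
    return (w @ w_inv - eye).abs().max().item()


if __name__ == "__main__":
    print("cpu :", check_inverse("cpu"))
    if torch.cuda.is_available():
        print("cuda:", check_inverse("cuda"))
```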
Also, changing w_s = torch.from_numpy(w_s) to w_s = torch.from_numpy(w_s.copy()) silences the warning above, but the error still occurs.
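For reference, the effect of the .copy() change can be illustrated in isolation. In this sketch a read-only array stands in for w_s (an assumption for illustration; how w_s is built in model.py is not shown here), and to_tensor_safely is a hypothetical helper name.

```python
import numpy as np
import torch


def to_tensor_safely(arr):
    """Convert a NumPy array to a tensor, copying first if it is read-only.

    Hypothetical helper: the copy gives the tensor its own writeable buffer,
    so writes through the tensor cannot touch the original array.
    """
    if not arr.flags.writeable:
        arr = arr.copy()
    return torch.from_numpy(arr)


# Stand-in for w_s: a float32 array explicitly marked read-only.
w_s = np.arange(4, dtype=np.float32)
w_s.setflags(write=False)

t = to_tensor_safely(w_s)
t[0] = 10.0  # modifies the copy only; w_s itself is unchanged
```

This only addresses the UserWarning; it is unrelated to the cusolver error.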