hi,
thanks for this code.
i have 2 related questions.
q1. when running this example on my machine, i got this error:
Traceback (most recent call last):
  File "x/permut.py", line 29, in <module>
    torch.from_numpy(rgb / 0.125).cuda().float())
  File "x/PAM_cuda/pl.py", line 20, in forward
    rank, barycentric, blur_neighbours1, blur_neighbours2, indices = PermutohedralLattice.prepare(feat)
  File "x/PAM_cuda/pl.py", line 116, in prepare
    _ = HT_opp.insert(table, n_entries, loc[scit].type(torch.cuda.IntTensor), loc_hash[scit].type(torch.cuda.IntTensor))
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
so the cuda call fails with "invalid device function" and the process then dies with a segfault.
the installation was done using:
python setup.py build
python setup.py install
when running with $ CUDA_LAUNCH_BLOCKING=1 python permut.py, i got this:
Traceback (most recent call last):
  File "permut.py", line 29, in <module>
    torch.from_numpy(rgb / 0.125).cuda().float())
  File "x/PAM_cuda/pl.py", line 20, in forward
    rank, barycentric, blur_neighbours1, blur_neighbours2, indices = PermutohedralLattice.prepare(feat)
  File "x/PAM_cuda/pl.py", line 116, in prepare
    _ = HT_opp.insert(table, n_entries, loc[scit].type(torch.cuda.IntTensor), loc_hash[scit].type(torch.cuda.IntTensor))
RuntimeError: CUDA error: invalid device function
Segmentation fault (core dumped)
the used code is:
import sys
from os.path import dirname, abspath
import re
import torch.nn as nn
import torch
import torch.nn.functional as F
# path stuff
from PAM_cuda.pl import PermutohedralLattice
if __name__ == '__main__':
    import numpy as np
    import cv2
    import torch
    import matplotlib.pyplot as plt
    im = cv2.imread("dog.png")
    indices = np.reshape(np.indices(im.shape[:2]), (2, -1))[None, :]
    im = np.transpose(im, (2, 0, 1))
    rgb = np.reshape(im, (3, -1))[None, :]
    pl = PermutohedralLattice.apply
    out = pl(torch.from_numpy(indices / 5.0).cuda().float(),
             torch.from_numpy(rgb / 0.125).cuda().float())
    output = out.squeeze().cpu().numpy()
    output = np.transpose(output, (1, 0))
    output = np.reshape(output, (im.shape[1], im.shape[2], 3))
    plt.imshow(output / output.max())
    plt.imshow(np.transpose(im, (1, 2, 0)))
any idea how to fix this?
i will post the other question in a separate issue.
thanks for your help
info:
conda virtual env: conda create -n env_test python=3.7
python 3.7.9
pytorch 1.9.0 installed with conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
cv2 4.1.2
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
CUDA version reported by nvidia-smi: 11.1
gpu: p100
nvidia-smi:
NVIDIA-SMI 455.32.00
Driver Version: 455.32.00
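in case it helps to narrow this down, here is a small sketch for collecting the same version info from inside the env. the function names (parse_nvcc_release, same_major, collect_env_info) are my own and purely illustrative; the torch calls are the standard ones. as far as i understand, "invalid device function" usually means the compiled extension holds no kernel for the gpu's compute capability (6.0 for a p100), and the nvcc 10.0 vs cudatoolkit 11.1 difference listed above may be related, but that is an assumption on my side, not something i have confirmed.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the toolkit release (e.g. '10.0') from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_major(version_a, version_b):
    """True only when both version strings are known and share a major number."""
    if not version_a or not version_b:
        return False
    return version_a.split(".")[0] == version_b.split(".")[0]


def collect_env_info():
    """Gather versions relevant to building and running a CUDA extension."""
    info = {"nvcc_release": None, "torch": None, "torch_cuda": None,
            "device": None, "capability": None}

    # nvcc is the compiler that setup.py would pick up from PATH.
    nvcc = shutil.which("nvcc")
    if nvcc:
        try:
            result = subprocess.run([nvcc, "--version"],
                                    capture_output=True, text=True)
            info["nvcc_release"] = parse_nvcc_release(result.stdout)
        except OSError:
            pass

    # torch is optional here so the script still runs in a bare env.
    try:
        import torch
    except ImportError:
        return info

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # None for cpu-only builds
    if torch.cuda.is_available():
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    env = collect_env_info()
    for key, value in env.items():
        print(f"{key}: {value}")
    if not same_major(env["nvcc_release"], env["torch_cuda"]):
        print("note: nvcc and the cuda version pytorch was built with "
              "differ, or one of them is unknown")
```

with the versions listed above, same_major("10.0", "11.1") is False, so the note at the end would be printed.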
so far, i have tested on only one server, where i expected the example to work.
let me know if you need more info.
the virtual env is within conda.
thanks