I am trying to convert my point cloud to a mesh using your code, and I think there may be a problem in the implementation.
Here's my point cloud:
test_100k.ply.zip
I ran your code with:
python onSurPrior.py --data_dir ./data/ --out_dir ./train_net/ --CUDA 0 --INPUT_NUM 102906 --epoch 30000 --input_ply_file test_100k.ply --train
epoch: 25000 epoch loss: 6.665881e-05 loss_sdf: 2.4980092e-05 move loss: 0.0002083936
epoch: 25500 epoch loss: 0.000111603404 loss_sdf: 2.5077257e-05 move loss: 0.00043263074
epoch: 26000 epoch loss: 0.00020120309 loss_sdf: 2.490602e-05 move loss: 0.0008814853
epoch: 26500 epoch loss: 0.00021180889 loss_sdf: 2.4966e-05 move loss: 0.00093421445
epoch: 27000 epoch loss: 5.7760502e-05 loss_sdf: 2.4734798e-05 move loss: 0.00016512853
epoch: 27500 epoch loss: 4.7512913e-05 loss_sdf: 2.4716563e-05 move loss: 0.00011398175
epoch: 28000 epoch loss: 0.00021200601 loss_sdf: 2.4922432e-05 move loss: 0.00093541783
epoch: 28500 epoch loss: 0.00020918738 loss_sdf: 2.4982723e-05 move loss: 0.0009210232
epoch: 29000 epoch loss: 6.323241e-05 loss_sdf: 2.4784342e-05 move loss: 0.00019224035
epoch: 29500 epoch loss: 4.3345903e-05 loss_sdf: 2.486095e-05 move loss: 9.242476e-05
save model
run_time: 4451.92893910408
It seems to train fine, with the loss decreasing, but the mesh output is not good.
I ran the test part with:
python onSurPrior.py --data_dir ./data/ --out_dir ./train_net/ --CUDA 0 --INPUT_NUM 102906 --epoch 30000 --input_ply_file test_100k.ply --test
but it fails with an error:
g_points_knn: Tensor("GatherV2_1:0", shape=(1, 4096, 50, 3), dtype=float32)
g_points_knn: Tensor("Reshape_12:0", shape=(4096, 50, 3), dtype=float32)
rotate_p: Tensor("Tile_1:0", shape=(4096, 50, 3), dtype=float32)
feature_f: Tensor("pointnet_1/Relu:0", shape=(4096, 512), dtype=float32)
pointnet: Tensor("pointnet_1/dense_1/BiasAdd:0", shape=(4096, 512), dtype=float32)
feature_f: Tensor("pointnet_2/Relu:0", shape=(4096, 512), dtype=float32)
pointnet: Tensor("pointnet_2/dense_1/BiasAdd:0", shape=(4096, 512), dtype=float32)
feature_bs: (2000, 2000)
test start
256
[1.05 1.4866935 0.773277 ]
[-0.05 -0.05 -0.05]
max_min: 0.00043043494 -0.0009069443 3.9753166e-05
0 16777216
Traceback (most recent call last):
File "onSurPrior.py", line 832, in <module>
vertices, triangles, _, _ = marching_cubes_lewiner(vox, thresh)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p38/lib/python3.8/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 135, in marching_cubes_lewiner
raise ValueError("Surface level must be within volume data range.")
ValueError: Surface level must be within volume data range.
I believe this is due to the output being larger than the voxel grid specified, so I made the following change:
bd_max = np.asarray(bd_max) + 0.05 * 20
bd_min = np.asarray(bd_min) - 0.05 * 20
With this change, the "Surface level must be within volume data range" error no longer appears, but the output is wrong.
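To narrow this down, a check like the one below could be run just before the marching_cubes_lewiner call. It is only a minimal sketch: `vox` and `thresh` are the names from the traceback, the helper name `level_in_volume_range` is my own, and I am assuming that skimage raises this error whenever the level lies outside the [min, max] range of the volume.

```python
import numpy as np

def level_in_volume_range(vox, level):
    """Return (ok, vmin, vmax); ok is True when vmin <= level <= vmax.

    Hypothetical helper, not part of onSurPrior.py. It assumes the
    skimage error is triggered by a level outside the volume's range.
    """
    vmin = float(np.min(vox))
    vmax = float(np.max(vox))
    ok = bool(vmin <= level <= vmax)
    return ok, vmin, vmax

# Toy volume standing in for the predicted SDF grid.
toy_vox = np.linspace(-1.0, 1.0, 27).reshape(3, 3, 3)
ok, vmin, vmax = level_in_volume_range(toy_vox, 0.0)
print("level in range:", ok, "volume min:", vmin, "volume max:", vmax)
```

In the real script the call would be `level_in_volume_range(vox, thresh)`, which would show directly whether the threshold or the predicted values are off.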
So I removed the bd_max and bd_min change, set the voxel size to 128, and ran it on a different system. This was the output, and it is still not good.
I also tried changing the voxel size from 128 to 256, but that doesn't help.
I'm not sure what I am doing wrong. Could there be some issue with my GPU? I'm using a V100 (32 GB). I also tried an RTX 3090, but that had different issues:
(/job:localhost/replica:0/task:0/device:GPU:0 with 22255 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:06:00.0, compute capability: 8.6)
feature_bs: (2000, 2000)
train start
2022-05-15 11:09:50.715829: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-05-15 11:11:18.078192: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(4096, 3), b.shape=(3, 512), m=4096, n=512, k=3
[[{{node global/dense_1/MatMul}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "onSurPrior.py", line 746, in <module>
sess.run([loss_optim],feed_dict={input_points_3d:input_points_2d_bs,feature_object:feature_bs_t,points_target_sparse:knn_bs})
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(4096, 3), b.shape=(3, 512), m=4096, n=512, k=3
[[node global/dense_1/MatMul (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
NOTE: I'm using the same conda environment (tf.yml).