torch-ngp - A PyTorch implementation of the hash encoder proposed in instant-ngp

Overview

HashGrid Encoder (WIP)

A PyTorch implementation of the HashGrid encoder from instant-ngp, as described in the paper Instant Neural Graphics Primitives with a Multiresolution Hash Encoding.

Note: for now this repo only implements the hash grid encoder, and it is still far from instant (especially for the NeRF experiments).
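
To make the encoding concrete, here is a minimal pure-PyTorch sketch of the multiresolution hash lookup described in the paper. It is only an illustration (the hyper-parameter defaults and the per-level Python loop are assumptions chosen for readability), not the fused CUDA kernel this repo provides:

import torch
import torch.nn as nn

class HashGridEncoder(nn.Module):
    # Readability-oriented sketch of the multiresolution hash encoding
    # (Müller et al.). Defaults (L=16, F=2, T=2^19, resolutions 16..2048)
    # follow the paper's table and are assumptions, not this repo's defaults.
    PRIMES = (1, 2654435761, 805459861)  # spatial-hash primes from the paper

    def __init__(self, num_levels=16, features_per_level=2,
                 log2_hashmap_size=19, base_resolution=16, max_resolution=2048):
        super().__init__()
        self.L, self.F = num_levels, features_per_level
        self.T = 2 ** log2_hashmap_size
        # per-level grid resolutions grow geometrically from base to max
        b = (max_resolution / base_resolution) ** (1.0 / max(num_levels - 1, 1))
        self.resolutions = [int(base_resolution * b ** l) for l in range(num_levels)]
        # one learnable hash table per level, initialised uniformly in [-1e-4, 1e-4]
        self.embeddings = nn.Parameter(
            torch.empty(num_levels, self.T, self.F).uniform_(-1e-4, 1e-4))

    def _hash(self, coords):
        # coords: (..., 3) integer corner indices -> (...,) hash-table indices
        h = coords[..., 0] * self.PRIMES[0]
        h = h ^ (coords[..., 1] * self.PRIMES[1])
        h = h ^ (coords[..., 2] * self.PRIMES[2])
        return h % self.T

    def forward(self, x):
        # x: (N, 3) positions assumed to lie in [0, 1]^3
        outputs = []
        for l in range(self.L):
            pos = x * self.resolutions[l]
            lo = torch.floor(pos).long()   # lower corner of the enclosing cell
            w = pos - lo                   # trilinear weights in [0, 1)
            feat = 0.0
            for corner in range(8):        # gather and blend the 8 cell corners
                offset = torch.tensor([(corner >> i) & 1 for i in range(3)],
                                      device=x.device)
                idx = self._hash(lo + offset)
                cw = torch.prod(torch.where(offset.bool(), w, 1.0 - w), dim=-1)
                feat = feat + cw.unsqueeze(-1) * self.embeddings[l, idx]
            outputs.append(feat)
        return torch.cat(outputs, dim=-1)  # (N, L * F) encoded features

# quick shape check: 8 random points -> 16 levels x 2 features = 32 dims
enc = HashGridEncoder()
print(enc(torch.rand(8, 3)).shape)  # torch.Size([8, 32])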

[Example results: SDF | NeRF]

Progress

  • HashGrid Encoder
    • basic PyTorch CUDA extension
    • fp16 support
  • Experiments
    • SDF
      • baseline
      • better SDF calculation (especially for non-watertight meshes)
    • NeRF
      • baseline (though still much slower than instant-ngp)
      • ray marching in CUDA.

Usage

We use the same data format as instant-ngp, e.g., the armadillo and fox scenes. Please download the data from instant-ngp and put it under ./data.
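
If you want to sanity-check the downloaded NeRF data before running the scripts, a small hypothetical snippet like the one below can inspect the instant-ngp style transforms.json (the path and the exact intrinsics keys depend on the scene; frames / file_path / transform_matrix follow the instant-ngp convention):

import json
import os

# Hypothetical check that the fox scene landed in the expected place.
path = os.path.join("data", "fox", "transforms.json")
with open(path) as f:
    meta = json.load(f)

print("intrinsics keys:", [k for k in meta if k != "frames"])
for frame in meta["frames"][:3]:
    # each frame pairs an image path with a 4x4 camera-to-world matrix
    print(frame["file_path"], len(frame["transform_matrix"]))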

# SDF experiment
bash scripts/run_sdf.sh

# NeRF experiment
bash scripts/run_nerf.sh

Comments
  • Error building extension '_raymarching'

    Error building extension '_raymarching'

    I get this when trying to launch the example for the first time:

    Traceback (most recent call last):
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1740, in _run_ninja_build
        subprocess.run(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\subprocess.py", line 528, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "E:\nvidia\torch-ngp\main_nerf.py", line 63, in <module>
        from nerf.network import NeRFNetwork
      File "E:\nvidia\torch-ngp\nerf\network.py", line 7, in <module>
        from .renderer import NeRFRenderer
      File "E:\nvidia\torch-ngp\nerf\renderer.py", line 9, in <module>
        import raymarching
      File "E:\nvidia\torch-ngp\raymarching\__init__.py", line 1, in <module>
        from .raymarching import *
      File "E:\nvidia\torch-ngp\raymarching\raymarching.py", line 9, in <module>
        from .backend import backend
      File "E:\nvidia\torch-ngp\raymarching\backend.py", line 31, in <module>
        backend = load(name='raymarching',
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1144, in load
        return _jit_compile(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1357, in _jit_compile
        _write_ninja_file_and_build_library(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
        _run_ninja_build(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1756, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error building extension 'raymarching': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output raymarching.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=raymarching -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\TH -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -IC:\Users\franz\anaconda3\envs\torchngp\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c E:\nvidia\torch-ngp\raymarching\src\raymarching.cu -o raymarching.cuda.o
    FAILED: raymarching.cuda.o
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output raymarching.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=raymarching -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\TH -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -IC:\Users\franz\anaconda3\envs\torchngp\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c E:\nvidia\torch-ngp\raymarching\src\raymarching.cu -o raymarching.cuda.o
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
    raymarching.cu
    C:/Users/franz/anaconda3/envs/torchngp/lib/site-packages/torch/include\c10/macros/Macros.h(142): warning C4067: unexpected tokens following preprocessor directive - expected a newline
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
    raymarching.cu
    C:/Users/franz/anaconda3/envs/torchngp/lib/site-packages/torch/include\c10/macros/Macros.h(142): warning C4067: unexpected tokens following preprocessor directive - expected a newline
    C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (1507): here

    C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (1507): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(332): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_forward(const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, scalar_t *) [with scalar_t=double]" (626): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(332): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_forward(const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, scalar_t *) [with scalar_t=float]" (626): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(332): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_forward(const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, scalar_t *) [with scalar_t=c10::Half]" (626): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(407): warning #177-D: variable "weight" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=double]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(389): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=double]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(407): warning #177-D: variable "weight" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=float]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(389): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=float]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(407): warning #177-D: variable "weight" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=c10::Half]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(389): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=c10::Half]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(777): warning #177-D: variable "near" was declared but never referenced detected during instantiation of "void kernel_march_rays(uint32_t, uint32_t, const int *, const scalar_t *, const scalar_t *, const scalar_t *, float, float, uint32_t, uint32_t, const scalar_t *, float, const scalar_t *, const scalar_t *, scalar_t *, scalar_t *, scalar_t *, uint32_t) [with scalar_t=double]" (853): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(777): warning #177-D: variable "near" was declared but never referenced detected during instantiation of "void kernel_march_rays(uint32_t, uint32_t, const int *, const scalar_t *, const scalar_t *, const scalar_t *, float, float, uint32_t, uint32_t, const scalar_t *, float, const scalar_t *, const scalar_t *, scalar_t *, scalar_t *, scalar_t *, uint32_t) [with scalar_t=float]" (853): here

    2 errors detected in the compilation of "E:/nvidia/torch-ngp/raymarching/src/raymarching.cu".
    raymarching.cu
    ninja: build stopped: subcommand failed.
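
    For context: the extension is compiled by ninja/nvcc the first time it is imported, via torch.utils.cpp_extension.load, which is why the failure above surfaces when main_nerf.py is launched. A simplified sketch of what raymarching/backend.py does (the source list and flags here are assumptions):

    # simplified sketch of raymarching/backend.py -- sources/flags are assumptions
    import os
    from torch.utils.cpp_extension import load

    _src_dir = os.path.join(os.path.dirname(__file__), "src")

    # ninja + nvcc run here on first import; a toolchain/pybind11 mismatch
    # therefore shows up as "Error building extension" at import time
    backend = load(
        name="raymarching",
        sources=[os.path.join(_src_dir, f) for f in ("raymarching.cu", "bindings.cpp")],
        extra_cuda_cflags=["-O3"],
        verbose=True,
    )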

    opened by francescofugazzi 14
  • NeRF inference profiling

    NeRF inference profiling

    This issue records the current profiling of NeRF inference and shows where the speed bottleneck is.
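
    Tables like the ones below can be reproduced with PyTorch's autograd profiler; a minimal, self-contained sketch (the tiny linear layer and random rays are stand-ins for the actual render call):

    import torch

    # stand-ins for the NeRF network and one frame's rays; the real runs below
    # profile full renders of 3 frames at 1920x1080
    model = torch.nn.Linear(6, 4).cuda().half()
    rays = torch.randn(1920 * 1080, 6, device="cuda", dtype=torch.half)

    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        with torch.no_grad():
            for _ in range(3):       # 3 frames, as in the runs below
                _ = model(rays)      # stand-in for the real render call

    # sorting by self CUDA time surfaces the kernels that dominate inference
    print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=20))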

    Inference with --fp16 --ff on 3 1920x1080 frames, 128+128 points per ray:

    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls       
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                               _hash_encode         3.53%     391.299ms        10.87%        1.206s     396.505us        2.607s        24.09%        3.510s       1.154ms          3042        
    void kernel_grid<c10::Half, 3u, 2u>(c10::Half const*...         0.00%       0.000us         0.00%       0.000us       0.000us        2.607s        24.09%        2.607s     857.029us          3042        
                                             _ffmlp_forward         5.02%     557.577ms         8.39%     930.914ms     153.010us        2.054s        18.98%        2.074s     340.958us          6084        
    void kernel_mlp_fused<64, 1, 8, __half, true>(Activa...         0.00%       0.000us         0.00%       0.000us       0.000us        2.054s        18.98%        2.054s     337.648us          6084        
                                                 aten::_cat         1.98%     219.730ms         4.66%     517.135ms      30.892us        1.317s        12.17%        1.317s      78.680us         16740        
                                                 aten::sort         0.30%      33.207ms         2.29%     254.487ms     167.316us        1.226s        11.33%        1.281s     841.893us          1521        
    void at::native::bitonicSortKVInPlace<float, long, 2...         0.00%       0.000us         0.00%       0.000us       0.000us        1.226s        11.33%        1.226s     806.281us          1521        
                                                aten::copy_         3.08%     341.543ms        31.30%        3.474s      95.054us        1.150s        10.62%        1.155s      31.611us         36546        
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us        1.035s         9.57%        1.035s     340.388us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     541.719ms         5.01%     541.719ms      35.602us         15216        
                                                _sh_encoder         2.18%     241.916ms         3.85%     426.764ms     140.291us     534.972ms         4.94%     595.884ms     195.886us          3042        
    void kernel_sh<c10::Half>(c10::Half const*, c10::Hal...         0.00%       0.000us         0.00%       0.000us       0.000us     534.972ms         4.94%     534.972ms     175.862us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     442.365ms         4.09%     442.365ms     145.419us          3042        
                                                  aten::mul         2.60%     288.552ms         4.19%     465.465ms      20.391us     292.856ms         2.71%     292.856ms      12.829us         22827        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     247.842ms         2.29%     247.842ms      13.572us         18261        
                                                  aten::sub         3.23%     358.866ms         5.20%     577.077ms      21.069us     217.357ms         2.01%     217.357ms       7.936us         27390        
                                               aten::gather         0.97%     107.481ms         1.54%     171.240ms      28.146us     212.941ms         1.97%     212.941ms      35.000us          6084        
                                                  aten::add         1.94%     214.941ms         3.08%     341.548ms      20.410us     202.851ms         1.87%     202.851ms      12.122us         16734        
                                                  aten::div         1.78%     197.500ms         2.82%     312.510ms      20.526us     180.809ms         1.67%     180.809ms      11.876us         15225        
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     133.226ms         1.23%     133.226ms      14.599us          9126        
                                              aten::sigmoid         0.44%      48.847ms         0.73%      80.799ms      26.561us     124.591ms         1.15%     124.591ms      40.957us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     124.591ms         1.15%     124.591ms      40.957us          3042        
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us     123.191ms         1.14%     123.191ms      80.993us          1521        
                                            aten::clamp_min         0.57%      63.140ms         2.08%     230.413ms      30.274us     115.899ms         1.07%     230.170ms      30.242us          7611        
    void at::native::_scatter_gather_elementwise_kernel<...         0.00%       0.000us         0.00%       0.000us       0.000us     115.718ms         1.07%     115.718ms      38.040us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     114.181ms         1.06%     114.181ms      37.535us          3042        
    void at::native::_scatter_gather_elementwise_kernel<...         0.00%       0.000us         0.00%       0.000us       0.000us      97.223ms         0.90%      97.223ms      31.960us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      95.482ms         0.88%      95.482ms      12.550us          7608        
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      92.027ms         0.85%      92.027ms      12.096us          7608        
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      89.500ms         0.83%      89.500ms      14.711us          6084        
                                                  aten::sum         1.05%     116.752ms         1.50%     166.470ms      27.362us      87.986ms         0.81%      87.986ms      14.462us          6084        
                                                aten::clamp         0.61%      67.853ms         1.10%     122.594ms      20.150us      87.157ms         0.81%      88.785ms      14.593us          6084        
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      87.157ms         0.81%      87.157ms      19.101us          4563        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      80.040ms         0.74%      80.040ms      52.623us          1521        
                                                  aten::min         0.98%     108.652ms         2.15%     238.483ms      39.198us      66.320ms         0.61%      95.284ms      15.661us          6084        
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      65.883ms         0.61%      65.883ms      10.829us          6084        
    _ZN2at6native32tensor_kernel_scan_innermost_dimIfLi1...         0.00%       0.000us         0.00%       0.000us       0.000us      64.053ms         0.59%      64.053ms      21.056us          3042        
                                              aten::cumprod         0.43%      47.625ms         0.65%      71.634ms      23.548us      63.409ms         0.59%      63.409ms      20.845us          3042        
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      58.054ms         0.54%      58.054ms      19.084us          3042  
                                                  aten::max         0.94%     104.159ms         2.01%     222.583ms      36.585us      57.724ms         0.53%      86.936ms      14.289us          6084  
                           Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us      51.809ms         0.48%      51.809ms       8.507us          6090  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      49.877ms         0.46%      49.877ms      16.396us          3042  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      46.988ms         0.43%      46.988ms      10.298us          4563  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      45.896ms         0.42%      45.896ms       7.536us          6090  
                                                aten::fill_         1.03%     114.067ms         2.19%     242.845ms      15.960us      43.753ms         0.40%      43.753ms       2.875us         15216  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      41.419ms         0.38%      41.419ms      13.616us          3042  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      40.997ms         0.38%      40.997ms      13.477us          3042  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      39.479ms         0.36%      39.479ms       6.476us          6096  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      39.258ms         0.36%      39.258ms      25.811us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      37.032ms         0.34%      37.032ms       4.869us          7605  
                                         aten::searchsorted         0.16%      17.916ms         0.39%      43.311ms      28.475us      33.258ms         0.31%      33.258ms      21.866us          1521  
    void at::native::(anonymous namespace)::searchsorted...         0.00%       0.000us         0.00%       0.000us       0.000us      33.258ms         0.31%      33.258ms      21.866us          1521  
                             Memcpy DtoD (Device -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      31.294ms         0.29%      31.294ms       5.144us          6084  
                                              aten::maximum         0.16%      17.381ms         0.25%      27.532ms      18.101us      29.250ms         0.27%      29.250ms      19.231us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      29.250ms         0.27%      29.250ms      19.231us          1521  
                                              aten::minimum         0.15%      16.095ms         0.24%      26.151ms      17.193us      28.983ms         0.27%      28.983ms      19.055us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      28.983ms         0.27%      28.983ms      19.055us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      24.661ms         0.23%      24.661ms      16.214us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      21.328ms         0.20%      21.328ms       7.011us          3042  
                                                  aten::neg         0.31%      34.620ms         0.51%      56.827ms      18.681us      19.157ms         0.18%      19.157ms       6.298us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      19.157ms         0.18%      19.157ms       6.298us          3042  
    _ZN2at6native32tensor_kernel_scan_innermost_dimIfLi1...         0.00%       0.000us         0.00%       0.000us       0.000us      18.386ms         0.17%      18.386ms      12.088us          1521  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      18.318ms         0.17%      18.318ms      12.043us          1521  
                                                  aten::exp         0.35%      38.918ms         0.53%      58.556ms      19.249us      18.278ms         0.17%      18.278ms       6.009us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      18.278ms         0.17%      18.278ms       6.009us          3042  
                                               aten::cumsum         0.21%      23.407ms         0.31%      34.911ms      22.953us      17.978ms         0.17%      17.978ms      11.820us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      14.847ms         0.14%      14.847ms       9.761us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      14.280ms         0.13%      14.280ms       9.389us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      13.793ms         0.13%      13.793ms       9.068us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      13.094ms         0.12%      13.094ms       1.434us          9132  
                           Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      11.129ms         0.10%      11.129ms       1.824us          6102  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      10.466ms         0.10%      10.466ms       6.867us          1524  
                                             aten::_s_where         0.44%      48.315ms         0.88%      98.069ms      21.492us       9.361ms         0.09%       9.361ms       2.052us          4563  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       9.361ms         0.09%       9.361ms       2.052us          4563  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       9.338ms         0.09%       9.338ms       3.070us          3042  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us       8.848ms         0.08%       8.848ms       5.817us          1521  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       8.264ms         0.08%       8.264ms       5.433us          1521  
                                                 aten::norm         0.26%      28.412ms         0.40%      44.594ms      29.261us       7.870ms         0.07%       7.870ms       5.164us          1524  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       7.846ms         0.07%       7.846ms       5.158us          1521  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       7.667ms         0.07%       7.667ms       5.041us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.533ms         0.06%       6.533ms       2.148us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       6.372ms         0.06%       6.372ms       1.396us          4566  
                                                   aten::lt         0.40%      44.178ms         0.59%      65.952ms      21.680us       6.184ms         0.06%       6.184ms       2.033us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       4.586ms         0.04%       4.586ms       3.015us          1521  
                                               aten::arange         0.92%     102.334ms         1.46%     162.401ms      53.281us       3.053ms         0.03%       6.106ms       2.003us          3048  
    void (anonymous namespace)::elementwise_kernel_with_...         0.00%       0.000us         0.00%       0.000us       0.000us       3.053ms         0.03%       3.053ms       2.007us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.718ms         0.02%       1.718ms       1.127us          1524  
                                                   aten::gt         0.21%      23.296ms         0.31%      34.756ms      22.851us       1.647ms         0.02%       1.647ms       1.083us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.647ms         0.02%       1.647ms       1.083us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.598ms         0.01%       1.598ms       1.051us          1521  
                                                  aten::bmm         0.02%       2.302ms         0.07%       7.562ms       1.260ms       1.026ms         0.01%       2.360ms     393.333us             6  
                            volta_fp16_sgemm_fp16_128x32_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.026ms         0.01%       1.026ms     171.000us             6  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     289.000us         0.00%     289.000us      96.333us             3  
                                  aten::_local_scalar_dense         0.29%      32.665ms         1.26%     139.968ms      22.979us     231.000us         0.00%     231.000us       0.038us          6091  
    void at::native::reduce_kernel<128, 4, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     203.000us         0.00%     203.000us      67.667us             3  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.000us         0.00%       6.000us       2.000us             3  
                                            Memset (Device)         0.00%       0.000us         0.00%       0.000us       0.000us       3.000us         0.00%       3.000us       0.000us          6084  
                                                aten::empty         3.03%     336.134ms         3.03%     336.578ms       4.422us       0.000us         0.00%       0.000us       0.000us         76110  
                                              aten::random_         0.00%      31.000us         0.00%      31.000us      31.000us       0.000us         0.00%       0.000us       0.000us             1  
                                                 aten::item         0.13%      14.935ms         1.40%     154.881ms      25.428us       0.000us         0.00%     231.000us       0.038us          6091  
                                                aten::zeros         0.00%      51.000us         0.00%      82.000us      20.500us       0.000us         0.00%       0.000us       0.000us             4  
                                                aten::zero_         0.22%      24.701ms         1.15%     127.125ms      20.881us       0.000us         0.00%      21.785ms       3.578us          6088  
    enumerate(DataLoader)#_SingleProcessDataLoaderIter._...         0.01%     841.000us         0.01%       1.235ms     308.750us       0.000us         0.00%       0.000us       0.000us             4  
                                                   aten::to         0.63%      69.980ms        32.04%        3.556s     111.123us       0.000us         0.00%     604.426ms      18.888us         32001  
                                                aten::stack         0.09%      10.056ms         0.64%      70.649ms      46.176us       0.000us         0.00%     125.256ms      81.867us          1530  
                                            aten::unsqueeze         0.70%      77.737ms         0.94%     104.781ms       4.046us       0.000us         0.00%       0.000us       0.000us         25896  
                                           aten::as_strided         1.31%     145.867ms         1.33%     147.451ms       0.969us       0.000us         0.00%       0.000us       0.000us        152229  
                                                  aten::cat         0.61%      68.121ms         5.27%     585.256ms      34.962us       0.000us         0.00%        1.317s      78.680us         16740  
                                              aten::resize_         1.42%     157.545ms         1.42%     157.717ms       5.182us       0.000us         0.00%       0.000us       0.000us         30435  
                                              aten::detach_         0.00%      15.000us         0.00%      22.000us       2.444us       0.000us         0.00%       0.000us       0.000us             9  
                                                    detach_         0.00%       7.000us         0.00%      11.000us       1.222us       0.000us         0.00%       0.000us       0.000us             9  
                                             aten::_to_copy         0.99%     109.595ms        31.46%        3.492s     163.761us       0.000us         0.00%     604.426ms      28.345us         21324  
                                        aten::empty_strided         3.28%     364.021ms         3.30%     366.594ms       9.263us       0.000us         0.00%       0.000us       0.000us         39576  
                                            cudaMemcpyAsync         2.73%     303.436ms         2.73%     303.436ms      16.603us       0.000us         0.00%       0.000us       0.000us         18276  
                                                aten::slice         1.85%     205.048ms         2.44%     270.957ms       3.492us       0.000us         0.00%       0.000us       0.000us         77583  
                                               aten::select         0.36%      39.487ms         0.48%      52.748ms       4.321us       0.000us         0.00%       0.000us       0.000us         12207  
                                             aten::linspace         0.23%      25.233ms         0.42%      46.684ms       7.658us       0.000us         0.00%       0.000us       0.000us          6096  
                                             aten::meshgrid         0.00%      74.000us         0.00%     149.000us      49.667us       0.000us         0.00%       0.000us       0.000us             3  
                                                 aten::view         0.31%      34.551ms         0.31%      34.551ms       2.523us       0.000us         0.00%       0.000us       0.000us         13695  
                                               aten::expand         0.32%      35.520ms         0.44%      49.138ms       4.604us       0.000us         0.00%       0.000us       0.000us         10674  
                                                    aten::t         0.00%      60.000us         0.00%      79.000us      13.167us       0.000us         0.00%       0.000us       0.000us             6  
                                            aten::transpose         0.00%      46.000us         0.00%      70.000us       5.833us       0.000us         0.00%       0.000us       0.000us            12  
                                            aten::expand_as         0.09%       9.560ms         0.25%      27.680ms       6.050us       0.000us         0.00%       0.000us       0.000us          4575  
                                           aten::contiguous         0.03%       3.433ms         0.57%      62.986ms      41.248us       0.000us         0.00%       7.658ms       5.015us          1527  
                                                aten::clone         0.31%      34.313ms         2.37%     263.087ms      34.567us       0.000us         0.00%     489.353ms      64.295us          7611  
                                           aten::empty_like         0.67%      74.421ms         2.41%     267.441ms      10.338us       0.000us         0.00%       0.000us       0.000us         25869  
                                      cudaStreamSynchronize        24.06%        2.670s        24.06%        2.670s     219.156us       0.000us         0.00%       0.000us       0.000us         12183  
                                              aten::reshape         0.84%      92.801ms         3.51%     389.672ms      10.245us       0.000us         0.00%     481.695ms      12.664us         38037  
                                       aten::_reshape_alias         0.62%      68.539ms         0.65%      72.009ms       2.254us       0.000us         0.00%       0.000us       0.000us         31953  
                                            aten::ones_like         0.30%      32.979ms         2.49%     276.039ms      30.228us       0.000us         0.00%      21.968ms       2.406us          9132  
                                           cudaLaunchKernel        16.00%        1.776s        16.00%        1.776s       8.336us       0.000us         0.00%       0.000us       0.000us        213009  
                                                   cudaFree         0.00%       8.000us         0.00%       8.000us       4.000us       0.000us         0.00%       0.000us       0.000us             2  
                                     cudaDeviceGetAttribute         0.00%       2.000us         0.00%       2.000us       0.143us       0.000us         0.00%       0.000us       0.000us            14  
                                       cudaGetSymbolAddress         0.00%       1.000us         0.00%       1.000us       1.000us       0.000us         0.00%       0.000us       0.000us             1  
                                                 cudaMalloc         0.02%       2.525ms         0.02%       2.525ms     229.545us       0.000us         0.00%       0.000us       0.000us            11  
                                   cudaEventCreateWithFlags         0.00%       7.000us         0.00%       7.000us       0.389us       0.000us         0.00%       0.000us       0.000us            18  
                                       cudaFuncSetAttribute         0.07%       7.749ms         0.07%       7.749ms       1.251us       0.000us         0.00%       0.000us       0.000us          6192  
    cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla...         0.00%      22.000us         0.00%      22.000us       1.833us       0.000us         0.00%       0.000us       0.000us            12  
                                       aten::frobenius_norm         0.17%      19.175ms         0.96%     106.797ms      70.215us       0.000us         0.00%       9.278ms       6.100us          1521  
                                                 aten::rsub         0.60%      66.377ms         2.68%     297.521ms      27.944us       0.000us         0.00%      43.565ms       4.092us         10647  
                                                aten::where         0.11%      11.916ms         0.99%     109.985ms      24.104us       0.000us         0.00%       9.361ms       2.052us          4563  
                                         aten::_unsafe_view         0.13%      14.005ms         0.23%      25.146ms       4.133us       0.000us         0.00%       0.000us       0.000us          6084  
                                            cudaMemsetAsync         0.37%      40.511ms         0.37%      40.511ms       6.659us       0.000us         0.00%       0.000us       0.000us          6084  
                                              aten::permute         0.14%      15.828ms         0.20%      22.640ms       7.442us       0.000us         0.00%       0.000us       0.000us          3042  
                                      cudaStreamIsCapturing         0.00%      13.000us         0.00%      13.000us       1.625us       0.000us         0.00%       0.000us       0.000us             8  
                                                 aten::relu         0.48%      53.008ms         1.52%     168.367ms      55.347us       0.000us         0.00%     114.181ms      37.535us          3042  
                                           aten::zeros_like         0.20%      22.493ms         1.99%     220.955ms      36.317us       0.000us         0.00%      21.785ms       3.581us          6084  
                                               aten::detach         0.03%       3.163ms         0.06%       6.806ms       4.457us       0.000us         0.00%       0.000us       0.000us          1527  
                                                     detach         0.03%       3.643ms         0.04%       4.445ms       2.911us       0.000us         0.00%       0.000us       0.000us          1527  
                                      cudaDeviceSynchronize         0.00%      48.000us         0.00%      48.000us      48.000us       0.000us         0.00%       0.000us       0.000us             1  
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
    Self CPU time total: 11.099s
    Self CUDA time total: 10.822s
    
    
    

    Inference with --fp16 --ff --cuda_raymarching on 3 1920x1080 frames:

    
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                               _hash_encode         2.08%     199.448ms         5.88%     564.231ms     369.020us        2.179s        23.75%        2.824s       1.847ms          1529  
    void kernel_grid<c10::Half, 3u, 2u>(c10::Half const*...         0.00%       0.000us         0.00%       0.000us       0.000us        2.179s        23.75%        2.179s       1.425ms          1529  
                                             _ffmlp_forward         4.10%     393.471ms         6.16%     591.005ms     193.772us        1.983s        21.61%        1.993s     653.448us          3050  
    void kernel_mlp_fused<64, 1, 8, __half, true>(Activa...         0.00%       0.000us         0.00%       0.000us       0.000us        1.979s        21.57%        1.979s     650.671us          3042  
                                           _generate_points         2.07%     198.297ms        71.59%        6.872s       4.518ms        1.638s        17.86%        1.688s       1.110ms          1521  
    void kernel_generate_points<c10::Half>(c10::Half con...         0.00%       0.000us         0.00%       0.000us       0.000us        1.638s        17.86%        1.638s       1.077ms          1521  
                                                 aten::_cat         0.55%      52.390ms         1.16%     111.195ms      36.255us        1.222s        13.31%        1.222s     398.330us          3067  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us        1.128s        12.29%        1.128s     741.391us          1521  
                                                aten::copy_         2.10%     201.777ms         6.39%     613.478ms      30.858us     879.039ms         9.58%     884.456ms      44.488us         19881  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     552.081ms         6.02%     552.081ms     120.779us          4571  
    void kernel_sh<c10::Half>(c10::Half const*, c10::Hal...         0.00%       0.000us         0.00%       0.000us       0.000us     516.578ms         5.63%     516.578ms     339.631us          1521  
                                                _sh_encoder         1.17%     112.374ms         1.54%     148.212ms      97.444us     516.234ms         5.63%     516.234ms     339.404us          1521  
                                           _accumulate_rays         1.37%     131.976ms         2.77%     265.826ms     174.771us     330.450ms         3.60%     333.671ms     219.376us          1521  
    void kernel_accumulate_rays_forward<c10::Half>(c10::...         0.00%       0.000us         0.00%       0.000us       0.000us     330.450ms         3.60%     330.450ms     217.258us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     250.761ms         2.73%     250.761ms      32.844us          7635  
                                              aten::sigmoid         0.25%      23.853ms         0.42%      40.290ms      26.489us     113.450ms         1.24%     113.450ms      74.589us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     113.450ms         1.24%     113.450ms      74.589us          1521  
                                            aten::clamp_min         0.28%      27.235ms         1.10%     105.430ms      34.409us     102.666ms         1.12%     205.332ms      67.014us          3064  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     102.576ms         1.12%     102.576ms      67.087us          1529  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      91.992ms         1.00%      91.992ms      60.481us          1521  
                                                  aten::div         0.48%      46.482ms         0.77%      73.898ms      24.110us      89.251ms         0.97%      89.251ms      29.119us          3065  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      54.861ms         0.60%      54.861ms      36.069us          1521  
                                                  aten::add         0.27%      25.442ms         0.42%      40.336ms      26.329us      53.245ms         0.58%      53.245ms      34.755us          1532  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      53.051ms         0.58%      53.051ms      34.879us          1521  
                           Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us      39.039ms         0.43%      39.039ms       8.509us          4588  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      33.580ms         0.37%      33.580ms      22.078us          1521  
                                                  aten::max         0.39%      37.096ms         1.18%     113.050ms      73.889us      27.348ms         0.30%      79.880ms      52.209us          1530  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      27.216ms         0.30%      27.216ms      17.893us          1521  
                                                  aten::min         0.44%      42.345ms         1.37%     131.581ms      86.001us      26.664ms         0.29%      84.516ms      55.239us          1530  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      26.534ms         0.29%      26.534ms      17.445us          1521  
                           Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      15.368ms         0.17%      15.368ms       9.883us          1555  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      12.741ms         0.14%      12.741ms       4.188us          3042  
                                                aten::fill_         0.53%      51.353ms         1.22%     116.670ms      19.151us      12.460ms         0.14%      12.460ms       2.045us          6092  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       9.223ms         0.10%       9.223ms       3.030us          3044  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       7.497ms         0.08%       7.497ms       2.464us          3042  
    void kernel_mlp_fused<64, 1, 8, __half, false>(Activ...         0.00%       0.000us         0.00%       0.000us       0.000us       3.178ms         0.03%       3.178ms     397.250us             8  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.091ms         0.03%       3.091ms       2.032us          1521  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us       2.030ms         0.02%       2.030ms     676.667us             3  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.872ms         0.02%       1.872ms       1.224us          1529  
                                                  aten::bmm         0.04%       3.937ms         0.10%       9.900ms       1.650ms       1.018ms         0.01%       2.343ms     390.500us             6  
                            volta_fp16_sgemm_fp16_128x32_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.018ms         0.01%       1.018ms     169.667us             6  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     698.000us         0.01%     698.000us      58.167us            12  
                                                  aten::sub         0.00%     292.000us         0.00%     457.000us      38.083us     615.000us         0.01%     615.000us      51.250us            12  
                                                  aten::mul         0.00%     204.000us         0.00%     346.000us      28.833us     378.000us         0.00%     378.000us      31.500us            12  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     287.000us         0.00%     287.000us      31.889us             9  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     283.000us         0.00%     283.000us      94.333us             3  
                                  aten::_local_scalar_dense         0.30%      29.017ms        67.66%        6.495s       1.415ms     276.000us         0.00%     276.000us       0.060us          4589  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     276.000us         0.00%     276.000us      46.000us             6  
                                                  aten::norm         0.00%     136.000us         0.00%     194.000us      64.667us     201.000us         0.00%     201.000us      67.000us             3
    void at::native::reduce_kernel<128, 4, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     201.000us         0.00%     201.000us      67.000us             3        
                              aten::max_pool3d_with_indices         0.00%      32.000us         0.00%      68.000us      68.000us     149.000us         0.00%     149.000us     149.000us             1        
    void at::native::(anonymous namespace)::max_pool3d_w...         0.00%       0.000us         0.00%       0.000us       0.000us     149.000us         0.00%     149.000us     149.000us             1  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     139.000us         0.00%     139.000us      46.333us             3  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     124.000us         0.00%     124.000us      13.778us             9  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     123.000us         0.00%     123.000us      13.667us             9  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     105.000us         0.00%     105.000us      13.125us             8  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     100.000us         0.00%     100.000us      12.500us             8  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      96.000us         0.00%      96.000us      32.000us             3  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      90.000us         0.00%      90.000us      30.000us             3  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      66.000us         0.00%      66.000us      11.000us             6  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      36.000us         0.00%      36.000us      36.000us             1  
                                                 aten::mean         0.00%      52.000us         0.00%      79.000us      79.000us      25.000us         0.00%      25.000us      25.000us             1  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      25.000us         0.00%      25.000us      25.000us             1  
                                            Memset (Device)         0.00%       0.000us         0.00%       0.000us       0.000us      15.000us         0.00%      15.000us       0.005us          3047  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       7.000us         0.00%       7.000us       2.333us             3  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.000us         0.00%       6.000us       2.000us             3  
                                             aten::linspace         0.00%     173.000us         0.00%     282.000us      15.667us       0.000us         0.00%       0.000us       0.000us            18  
                                                aten::empty         2.00%     192.373ms         2.02%     194.152ms       5.084us       0.000us         0.00%       0.000us       0.000us         38190  
                                                aten::split         0.00%      45.000us         0.00%     113.000us      37.667us       0.000us         0.00%       0.000us       0.000us             3  
                                               aten::narrow         0.00%      92.000us         0.00%     243.000us       6.750us       0.000us         0.00%       0.000us       0.000us            36  
                                                aten::slice         0.74%      70.841ms         0.98%      93.966ms       4.104us       0.000us         0.00%       0.000us       0.000us         22895  
                                           aten::as_strided         0.37%      35.140ms         0.37%      35.144ms       1.146us       0.000us         0.00%       0.000us       0.000us         30680  
                                           aten::zeros_like         0.06%       5.980ms         0.64%      61.913ms      40.679us       0.000us         0.00%       5.716ms       3.756us          1522  
                                           aten::empty_like         0.18%      17.617ms         0.56%      53.704ms       8.784us       0.000us         0.00%       0.000us       0.000us          6114  
                                        aten::empty_strided         1.13%     108.323ms         1.13%     108.323ms      10.107us       0.000us         0.00%       0.000us       0.000us         10718  
                                                aten::zero_         0.20%      18.962ms         1.14%     109.460ms      23.920us       0.000us         0.00%      10.604ms       2.317us          4576  
                                           cudaLaunchKernel         4.97%     476.865ms         4.97%     476.865ms      10.077us       0.000us         0.00%       0.000us       0.000us         47323  
                                             aten::meshgrid         0.00%     185.000us         0.00%     403.000us      36.636us       0.000us         0.00%       0.000us       0.000us            11  
                                                 aten::view         0.14%      13.340ms         0.14%      13.340ms       2.884us       0.000us         0.00%       0.000us       0.000us          4626  
                                               aten::expand         0.00%     209.000us         0.00%     270.000us       5.294us       0.000us         0.00%       0.000us       0.000us            51  
                                              aten::reshape         0.45%      42.905ms         1.49%     142.878ms       8.504us       0.000us         0.00%     441.697ms      26.288us         16802  
                                                aten::clone         0.20%      18.941ms         1.62%     155.338ms      33.872us       0.000us         0.00%     552.081ms     120.384us          4586  
                                         aten::_unsafe_view         0.04%       4.005ms         0.07%       7.094ms       4.612us       0.000us         0.00%       0.000us       0.000us          1538  
                                                  aten::cat         0.16%      14.993ms         1.31%     126.188ms      41.144us       0.000us         0.00%        1.222s     398.330us          3067  
                                              aten::resize_         0.28%      27.066ms         0.28%      27.066ms       5.879us       0.000us         0.00%       0.000us       0.000us          4604  
                                                aten::zeros         0.14%      13.016ms         1.06%     101.817ms      33.339us       0.000us         0.00%       4.888ms       1.601us          3054  
                                                   aten::to         0.78%      74.779ms         7.25%     696.110ms      30.367us       0.000us         0.00%     317.633ms      13.857us         22923  
                                             aten::_to_copy         0.67%      64.749ms         6.50%     623.481ms      50.946us       0.000us         0.00%     317.633ms      25.955us         12238  
                                            cudaMemcpyAsync        68.03%        6.530s        68.03%        6.530s       1.063ms       0.000us         0.00%       0.000us       0.000us          6143  
                                      cudaStreamSynchronize         1.54%     148.007ms         1.54%     148.007ms      24.129us       0.000us         0.00%       0.000us       0.000us          6134  
                                       aten::_reshape_alias         0.38%      36.356ms         0.39%      37.819ms       2.478us       0.000us         0.00%       0.000us       0.000us         15264  
                                            cudaMemsetAsync         0.22%      20.762ms         0.22%      20.762ms       6.814us       0.000us         0.00%       0.000us       0.000us          3047  
                                                 aten::item         0.13%      12.496ms        67.79%        6.507s       1.418ms       0.000us         0.00%     276.000us       0.060us          4589  
                                      cudaStreamIsCapturing         0.00%       5.000us         0.00%       5.000us       1.667us       0.000us         0.00%       0.000us       0.000us             3  
                                                 cudaMalloc         0.03%       2.440ms         0.03%       2.440ms     406.667us       0.000us         0.00%       0.000us       0.000us             6  
                                              aten::permute         0.09%       8.435ms         0.12%      11.996ms       7.846us       0.000us         0.00%       0.000us       0.000us          1529  
                                       cudaFuncSetAttribute         0.04%       3.610ms         0.04%       3.610ms       1.143us       0.000us         0.00%       0.000us       0.000us          3158  
                                               aten::select         0.34%      33.081ms         0.39%      37.883ms      12.264us       0.000us         0.00%       0.000us       0.000us          3089  
                                                 aten::relu         0.08%       7.800ms         0.69%      66.312ms      43.370us       0.000us         0.00%     102.576ms      67.087us          1529  
                                               aten::detach         0.00%      34.000us         0.00%     128.000us       9.143us       0.000us         0.00%       0.000us       0.000us            14  
                                                     detach         0.00%      94.000us         0.00%      97.000us       6.929us       0.000us         0.00%       0.000us       0.000us            14  
                                      aten::constant_pad_nd         0.00%      11.000us         0.00%      71.000us      71.000us       0.000us         0.00%      51.000us      51.000us             1  
                                            aten::unsqueeze         0.00%     131.000us         0.00%     196.000us       4.780us       0.000us         0.00%       0.000us       0.000us            41  
                                           aten::max_pool3d         0.00%      38.000us         0.00%     106.000us     106.000us       0.000us         0.00%     149.000us     149.000us             1  
                                              aten::squeeze         0.00%       8.000us         0.00%       9.000us       4.500us       0.000us         0.00%       0.000us       0.000us             2  
                                              aten::random_         0.00%      28.000us         0.00%      28.000us      28.000us       0.000us         0.00%       0.000us       0.000us             1  
    enumerate(DataLoader)#_SingleProcessDataLoaderIter._...         0.01%     829.000us         0.01%       1.073ms     268.250us       0.000us         0.00%       0.000us       0.000us             4  
                                                aten::stack         0.00%      87.000us         0.01%     516.000us      57.333us       0.000us         0.00%       2.030ms     225.556us             9  
                                              aten::detach_         0.00%      16.000us         0.00%      22.000us       2.444us       0.000us         0.00%       0.000us       0.000us             9  
                                                    detach_         0.00%       6.000us         0.00%      12.000us       1.333us       0.000us         0.00%       0.000us       0.000us             9  
                                                    aten::t         0.00%      58.000us         0.00%      76.000us      12.667us       0.000us         0.00%       0.000us       0.000us             6  
                                            aten::transpose         0.00%      55.000us         0.00%      75.000us       6.250us       0.000us         0.00%       0.000us       0.000us            12  
                                            aten::expand_as         0.00%      28.000us         0.00%      89.000us       7.417us       0.000us         0.00%       0.000us       0.000us            12  
                                           aten::contiguous         0.07%       7.113ms         1.10%     105.922ms      34.751us       0.000us         0.00%     110.384ms      36.215us          3048  
                                               aten::arange         0.02%       2.239ms         0.05%       4.421ms     736.833us       0.000us         0.00%       0.000us       0.000us             6  
                                            aten::ones_like         0.00%      42.000us         0.00%     461.000us      76.833us       0.000us         0.00%      79.000us      13.167us             6  
                                                   cudaFree         0.00%      12.000us         0.00%      12.000us       6.000us       0.000us         0.00%       0.000us       0.000us             2  
                                     cudaDeviceGetAttribute         0.00%       2.000us         0.00%       2.000us       0.143us       0.000us         0.00%       0.000us       0.000us            14  
                                       cudaGetSymbolAddress         0.00%       4.000us         0.00%       4.000us       4.000us       0.000us         0.00%       0.000us       0.000us             1  
                                   cudaEventCreateWithFlags         0.00%      12.000us         0.00%      12.000us       0.667us       0.000us         0.00%       0.000us       0.000us            18  
    cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla...         0.00%      14.000us         0.00%      14.000us       1.167us       0.000us         0.00%       0.000us       0.000us            12  
                                                 aten::ones         0.07%       7.094ms         0.44%      41.995ms      27.610us       0.000us         0.00%       1.762ms       1.158us          1521  
                                      cudaDeviceSynchronize         0.00%      19.000us         0.00%      19.000us      19.000us       0.000us         0.00%       0.000us       0.000us             1  
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
    Self CPU time total: 9.599s
    Self CUDA time total: 9.175s
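
    For reference, a table like the one above can be produced with torch.profiler. Below is a minimal, self-contained sketch on a toy model, not the exact options used for the run above:

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Minimal sketch: profile one forward pass on a toy model and print a
    # CPU + CUDA table like the one above (the table above comes from a much
    # larger run).
    model = torch.nn.Linear(64, 64).cuda()
    x = torch.randn(1024, 64, device="cuda")

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        y = model(x)

    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))

    Rows such as aten::item and cudaMemcpyAsync dominating self CPU time, as above, usually point to host-device synchronization caused by .item() calls inside the training loop.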
    
    
    
    opened by ashawkey 14
  • some black fog on the test results

    some black fog on the test results

    Hi, thank you very much for your work.

    I have one question: when I was training, I got a PSNR of 34+, but this is what happened when I tested a new pose.

    https://user-images.githubusercontent.com/63493274/162655669-09784b3a-ca72-4b20-9aef-8ff5d7f68d8a.mp4

    In addition, I ran the original NeRF with the same data and got better results as follows

    https://user-images.githubusercontent.com/63493274/162655815-b73fa03d-7b16-4f87-88a6-e7236819150a.mp4

    I tried tweaking the parameters "bound" and "scale" but the results didn't improve.

    What do you think is the reason behind this result?

    Many thanks.

    opened by QAZWSX0827 13
  • real scene test, get poor result

    real scene test, get poor result

    Hi, thank you very much for your work.

    I have one question about testing on my own real-scene data. At evaluation time, the result looks like this:

    (image)

    but when it comes to novel views, the result looks like the following images:

    (images)

    What do you think is the reason behind this result?

    Many thanks.

    opened by neilgogogo 13
  • Can't reproduce the experiment: low PSNR on the Lego dataset!

    Can't reproduce the experiment: low PSNR on the Lego dataset!

    With the default command "-O" on a single RTX1080, training 30K steps (100 images, 300 epochs) on the Lego dataset, the speed is fast, but I only get PSNR = 28.244854 and LPIPS = 0.070723. Is there any mistake? Please give me some advice.

    opened by Saoyu99 12
  • Training randomly broke with RuntimeError on Backpropagation

    Training randomly broke with RuntimeError on Backpropagation

    Just wondering, has anyone encountered this problem before?

    I tried to test torch-ngp's network_tcnn in my own code. However, it sometimes stops randomly with a weird error: "RuntimeError: Function '_module_functionBackward' returned nan values in its 0th output." This is really hard to debug since it occurs randomly.

    Is this an AMP error, or a tiny-cuda-nn error? It never happens if I don't use AMP and tiny-cuda-nn.

    Update: the last time I checked, this never happens if I don't use AMP mixed precision.
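
    For reference, a minimal sketch (with a toy loss standing in for the actual training code) of enabling anomaly detection alongside the GradScaler setup that appears in the log below:

    import torch

    # Hedged sketch with a placeholder loss: anomaly detection makes autograd
    # report which forward op produced the NaN gradients, as in the traceback.
    torch.autograd.set_detect_anomaly(True)

    x = torch.randn(8, device="cuda", requires_grad=True)
    scaler = torch.cuda.amp.GradScaler()
    with torch.cuda.amp.autocast():
        loss = (x ** 2).mean()
    scaler.scale(loss).backward()  # the failing call in the log has this form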

    I pasted the log below:

    /root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/toorch/autograd/init.py:173: UserWarning: Error detected in _module_functionBackward. Traceback of forward call that caused the error: File "run_nerf.py", line 333, in train() File "run_nerf.py", line 324, in train train_nerf(args, train_dl, val_dl, hwf, i_split, near, far, render_poses) File "run_nerf.py", line 175, in train_nerf loss, psnr = train_nerf_on_epoch(args, train_dl, H, W, focal, N_rand, optimizer, loss_func, global_step, render_kwargs_train, scaler) File "run_nerf.py", line 55, in train_nerf_on_epoch rgb, disp, acc, extras = render(H, W, focal, chunk=args.chunk, rays=batch_rays, retraw=True, img_idx=img_idx, **render_kwargs_train) File "/home/shuaic/storage/nerf-pytorch-dev/script/models/rendering.py", line 261, in render all_ret = batchify_rays(rays, chunk, **kwargs) File "/home/shuaic/storage/nerf-pytorch-dev/script/models/rendering.py", line 206, in batchify_rays ret = render_rays(rays_flat[i:i+chunk], **kwargs) File "/home/shuaic/storage/nerf-pytorch-dev/script/models/rendering.py", line 154, in render_rays raw = network_query_fn(pts, viewdirs, img_idxs, network_fine, 'fine', output_transient, test_time=test_time) File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 288, in run_NeRFH_TCNN(inputs, viewdirs, ts, network_fn, typ=typ, File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 413, in run_NeRFH_TCNN out_chunks += [fn(inputs_flat[i: i+netchunk], input_dirs_flat[i:i+netchunk], ts=ts[i:i+netchunk], output_transient=output_transient)] File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 250, in forward density_outputs = self.density(x) # [65536, 3] File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 162, in density h = self.sigma_net(x) File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 119, in forward output = _module_function.apply( (Triggered internally at /opt/conda/conda-bld/pytorch_1646755903507/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.) 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 83%|█████████████████████████████████████████████████████████████████████████████▉ | 8288/10001 [8:05:58<1:40:26, 3.52s/it] Traceback (most recent call last): File "run_nerf.py", line 333, in train() File "run_nerf.py", line 324, in train train_nerf(args, train_dl, val_dl, hwf, i_split, near, far, render_poses) File "run_nerf.py", line 175, in train_nerf loss, psnr = train_nerf_on_epoch(args, train_dl, H, W, focal, N_rand, optimizer, loss_func, global_step, render_kwargs_train, scaler) File "run_nerf.py", line 75, in train_nerf_on_epoch scaler.scale(loss).backward() File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/autograd/init.py", line 173, in backward Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: Function '_module_functionBackward' returned nan values in its 0th output.

    opened by chenusc11 10
  • Converting rendered depth to 3D point cloud in world coordinates

    Converting rendered depth to 3D point cloud in world coordinates

    Hi @ashawkey, thanks for the great project!

    How would you plot a 3D point cloud in world coordinates using the rendered depth? I saw you have a function plot_pointcloud(), but that looks like it plots in 'torch-ngp coordinates'.

    I believe I need to string together nerf_matrix_to_ngp, rays_o, rays_d and depth (from render()), but I'm confused about how the different coordinate systems interact. I'd really appreciate some guidance on plotting 3D point clouds in world coordinates!
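
    A rough sketch of the back-projection (with dummy tensors standing in for the renderer's outputs; the exact inverse of nerf_matrix_to_ngp is not shown here):

    import torch

    # Dummy stand-ins for the renderer outputs: rays_o / rays_d are [N, 3],
    # depth is [N], all expressed in the same (torch-ngp) coordinate frame.
    rays_o = torch.zeros(4, 3)
    rays_d = torch.nn.functional.normalize(torch.randn(4, 3), dim=-1)
    depth = torch.rand(4)

    # Points along each ray, still in torch-ngp coordinates.
    points_ngp = rays_o + depth.unsqueeze(-1) * rays_d

    # To recover world coordinates, the scale/offset/axis permutation that
    # nerf_matrix_to_ngp applied to the camera poses would have to be inverted.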

    opened by kevin-thankyou-lin 9
  • Compilation issue - RuntimeError: Error building extension '_hash_encoder'

    Compilation issue - RuntimeError: Error building extension '_hash_encoder'

    Thanks for the nice work! I ran into the following issue when running python train_nerf.py data/fox --workspace trial_nerf. Do you have any thoughts? Many thanks for your help!

    Traceback (most recent call last):
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
        env=env)
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/subprocess.py", line 512, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "train_nerf.py", line 3, in <module>
        from nerf.network import NeRFNetwork
      File "/home/wangjk/programs/torch-ngp/nerf/network.py", line 9, in <module>
        from encoding import get_encoder
      File "/home/wangjk/programs/torch-ngp/encoding.py", line 6, in <module>
        from hashencoder import HashEncoder
      File "/home/wangjk/programs/torch-ngp/hashencoder/__init__.py", line 1, in <module>
        from .hashgrid import HashEncoder
      File "/home/wangjk/programs/torch-ngp/hashencoder/hashgrid.py", line 8, in <module>
        from .backend import _backend
      File "/home/wangjk/programs/torch-ngp/hashencoder/backend.py", line 12, in <module>
        sources=[os.path.join(_src_path, 'src', f) for f in [
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
        keep_intermediates=keep_intermediates)
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
        is_standalone=is_standalone)
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
        error_prefix=f"Error building extension '{name}'")
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error building extension '_hash_encoder': [1/2] /home/wangjk/anaconda3/envs/largesteps/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/largesteps/include -isystem /home/wangjk/anaconda3/envs/largesteps/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
    FAILED: hashencoder.cuda.o 
    /home/wangjk/anaconda3/envs/largesteps/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/largesteps/include -isystem /home/wangjk/anaconda3/envs/largesteps/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
    /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu(26): error: no instance of overloaded function "atomicAdd" matches the argument list
                argument types are: (__half *, c10::Half)
    
    1 error detected in the compilation of "/home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu".
    ninja: build stopped: subcommand failed.
    

    More info:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Wed_Jun__2_19:15:15_PDT_2021
    Cuda compilation tools, release 11.4, V11.4.48
    Build cuda_11.4.r11.4/compiler.30033411_0
    
    >>> import torch
    >>> torch.version.cuda
    '11.3'
    >>> torch.__version__
    '1.10.0'
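
    One thing worth checking (a hedged note, not a confirmed diagnosis): the native atomicAdd overload for __half generally requires compute capability 7.0 or newer, while the build log above targets sm_61. The local GPU's capability can be queried like this:

    import torch

    # Values below (7, 0) lack the native atomicAdd(__half*, __half) overload
    # that the fp16 path of the hash encoder compiles against.
    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability: {major}.{minor}")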
    
    opened by wangjksjtu 9
  • Does NDC help for forward-facing datasets?

    Does NDC help for forward-facing datasets?

    Hi, thanks for the awesome work.

    I have tested the code on several forward-facing scenes captured by myself and found that parameters such as offset and bound need to be tuned carefully to get proper results. I want to ask whether first converting to NDC space, as the original NeRF does, could avoid these problems, since NDC maps the whole scene into a unit cube. If possible, any suggestions on how to use NDC space? Thanks!
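
    For reference, the NDC mapping the original NeRF applies to forward-facing scenes looks roughly like the sketch below (modeled on its ndc_rays helper; torch-ngp itself may not expose this):

    import torch

    def ndc_rays(H, W, focal, near, rays_o, rays_d):
        # Shift ray origins onto the near plane, then project origins and
        # directions into NDC so the visible frustum maps into a unit-ish cube.
        t = -(near + rays_o[..., 2]) / rays_d[..., 2]
        rays_o = rays_o + t[..., None] * rays_d

        o0 = -1.0 / (W / (2.0 * focal)) * rays_o[..., 0] / rays_o[..., 2]
        o1 = -1.0 / (H / (2.0 * focal)) * rays_o[..., 1] / rays_o[..., 2]
        o2 = 1.0 + 2.0 * near / rays_o[..., 2]

        d0 = -1.0 / (W / (2.0 * focal)) * (rays_d[..., 0] / rays_d[..., 2] - rays_o[..., 0] / rays_o[..., 2])
        d1 = -1.0 / (H / (2.0 * focal)) * (rays_d[..., 1] / rays_d[..., 2] - rays_o[..., 1] / rays_o[..., 2])
        d2 = -2.0 * near / rays_o[..., 2]

        return torch.stack([o0, o1, o2], -1), torch.stack([d0, d1, d2], -1)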

    opened by Harper714 8
  • Distributed data parallel training

    Distributed data parallel training

    I'm trying to train torch-ngp on multiple GPUs.

    I modified the dataloader by passing a DistributedSampler, passed the local_rank and world_size to the Trainer, and then ran the script via torch.distributed.run.
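
    For context, a sketch of that setup with a dummy dataset and model standing in for this repo's NeRF classes (launched via python -m torch.distributed.run --nproc_per_node=N script.py):

    import torch
    import torch.distributed as dist
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    # Hedged sketch of the DDP setup described above, not the actual Trainer code.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    dataset = TensorDataset(torch.randn(128, 3))   # placeholder data
    sampler = DistributedSampler(dataset)          # shards samples across ranks
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = torch.nn.Linear(3, 3).cuda()
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])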

    Training was normal for the first several epochs (e.g. 6), then crashed with the following:

    Traceback (most recent call last):                                                                                                                                     
      File "train_nerf_ddp.py", line 207, in <module>                                                                                                                      
        trainer.train(train_loader, valid_loader, opt.num_epochs)                                                                                                          
      File "/data/torch-ngp/nerf_ddp/utils.py", line 407, in train                                                                                                         
        self.train_one_epoch(train_loader)                                                                                                                                 
      File "/data/torch-ngp/nerf_ddp/utils.py", line 621, in train_one_epoch                                                                                               
        preds, truths, loss = self.train_step(data)                                                                                                                        
      File "/data/torch-ngp/nerf_ddp/utils.py", line 303, in train_step                                                                                                    
        outputs = self.model.render(rays_o, rays_d, z_far=self.depth_scale, staged=False, bg_color=bg_color, perturb=True, **self.conf)                                    
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 404, in render                                                                                                     
        depth, image, depth_var = _run(rays_o, rays_d, num_steps, upsample_steps, bg_color, perturb, z_far=z_far)                                                          
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 164, in run                                                                                                        
        sigmas, rgbs = self(pts.reshape(B, -1, 3), dirs.reshape(B, -1, 3))                                                                                                 
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl                                                    
        return forward_call(*input, **kwargs)                                                                                                                              
      File "/data/torch-ngp/nerf_ddp/network.py", line 73, in forward                                                                                                      
        x = self.encoder(x, bound=self.bound)                                                                                                                              
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl                                                    
        return forward_call(*input, **kwargs)                                                                                                                              
      File "/data/torch-ngp/hashencoder/hashgrid.py", line 137, in forward                                                                                                 
        outputs = hash_encode(inputs, self.embeddings, self.offsets, self.per_level_scale, self.base_resolution, inputs.requires_grad)                                     
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 94, in decorate_fwd                                               
        return fwd(*args, **kwargs)                                                                                                                                        
      File "/data/torch-ngp/hashencoder/hashgrid.py", line 41, in forward                                                                                                  
        outputs = outputs.permute(1, 0, 2).reshape(B, L * C)      
    RuntimeError: CUDA error: an illegal memory access was encountered
    loss=0.0212 (0.0189), psnr=14.04 (15.27):    4% 1/23 [00:00<00:19,  1.12it/s]terminate called after throwing an instance of 'c10::CUDAError'
      what():  CUDA error: an illegal memory access was encountered
    Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1640811806235/work/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f20ca556d62 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #1: <unknown function> + 0x1c613 (0x7f210fa74613 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
    frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f210fa75022 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
    frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f20ca540314 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #4: <unknown function> + 0x299129 (0x7f2163a7c129 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #5: <unknown function> + 0xada181 (0x7f21642bd181 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f21642bd482 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #7: <unknown function> + 0x10d0fc (0x55a5309840fc in /data/miniconda3/envs/ngp/bin/python)
    frame #8: <unknown function> + 0x10fbcc (0x55a530986bcc in /data/miniconda3/envs/ngp/bin/python)
    frame #9: <unknown function> + 0x217ca3 (0x55a530a8eca3 in /data/miniconda3/envs/ngp/bin/python)
    frame #10: <unknown function> + 0x10fd05 (0x55a530986d05 in /data/miniconda3/envs/ngp/bin/python)
    frame #11: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
    frame #12: <unknown function> + 0x10d0fc (0x55a5309840fc in /data/miniconda3/envs/ngp/bin/python)
    frame #13: <unknown function> + 0x10fbcc (0x55a530986bcc in /data/miniconda3/envs/ngp/bin/python)
    frame #14: <unknown function> + 0x217ca3 (0x55a530a8eca3 in /data/miniconda3/envs/ngp/bin/python)
    frame #15: <unknown function> + 0x10fd35 (0x55a530986d35 in /data/miniconda3/envs/ngp/bin/python)
    frame #16: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
    frame #17: <unknown function> + 0x10fd35 (0x55a530986d35 in /data/miniconda3/envs/ngp/bin/python)
    frame #18: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
    frame #19: _PyModule_ClearDict + 0x473 (0x55a530a1d723 in /data/miniconda3/envs/ngp/bin/python)
    frame #20: PyImport_Cleanup + 0x408 (0x55a530a62f88 in /data/miniconda3/envs/ngp/bin/python)
    frame #21: Py_FinalizeEx + 0x79 (0x55a530ac94f9 in /data/miniconda3/envs/ngp/bin/python)
    frame #22: Py_RunMain + 0x1bc (0x55a530acc87c in /data/miniconda3/envs/ngp/bin/python)
    frame #23: Py_BytesMain + 0x39 (0x55a530accc69 in /data/miniconda3/envs/ngp/bin/python)
    frame #24: __libc_start_main + 0xe7 (0x7f219c264c87 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #25: <unknown function> + 0x1f7427 (0x55a530a6e427 in /data/miniconda3/envs/ngp/bin/python)
    

    I tried to use --tcnn, it crashed, too:

    Traceback (most recent call last):                                                                                                                                     
      File "train_nerf_ddp.py", line 207, in <module>                                                                                                                      
        trainer.train(train_loader, valid_loader, opt.num_epochs)                                                                                                          
      File "/data/torch-ngp/nerf_ddp/utils.py", line 407, in train                                                                                                         
        self.train_one_epoch(train_loader)
      File "/data/torch-ngp/nerf_ddp/utils.py", line 621, in train_one_epoch
        preds, truths, loss = self.train_step(data)
      File "/data/torch-ngp/nerf_ddp/utils.py", line 303, in train_step
        outputs = self.model.render(rays_o, rays_d, z_far=self.depth_scale, staged=False, bg_color=bg_color, perturb=True, **self.conf)
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 404, in render
        depth, image, depth_var = _run(rays_o, rays_d, num_steps, upsample_steps, bg_color, perturb, z_far=z_far)
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 164, in run
        sigmas, rgbs = self(pts.reshape(B, -1, 3), dirs.reshape(B, -1, 3))
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/data/torch-ngp/nerf_ddp/network_tcnn.py", line 93, in forward
        x = self.encoder(x)
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 82, in forward
        output = _module_function.apply(                                                                                                                                   
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 31, in forward                                                              
        native_ctx, output = native_tcnn_module.fwd(input, params)   
       RuntimeError: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:574 cudaDeviceSynchronize() failed with error an illegal memory access was encount[100/234]
    loss=0.0471 (0.0579), psnr=8.83 (7.39):    4% 1/23 [00:01<00:23,  1.05s/it]Could not free memory: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:128 cud
    aFree(rawptr) failed with error an illegal memory access was encountered                                                                                               
    Could not free memory: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:128 cudaFree(rawptr) failed with error an illegal memory access was encountered   
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
    loss=0.0592 (0.0557), psnr=7.68 (7.02):   17% 4/23 [00:01<00:05,  3.23it/s]terminate called after throwing an instance of 'c10::Error'
      what():  NCCL error in: /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:181, unhandled cuda error, NCCL version 21.0.3
    Process Group destroyed on rank 2
    Exception raised from ncclCommAbort at /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:181 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2124ff7d62 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f2124ff468b in /data/miniconda3/envs/ngp/lib/python3.8/sit
    e-packages/torch/lib/libc10.so)
    frame #2: <unknown function> + 0x107c48e (0x7f2176de148e in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
    frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x113 (0x7f2176dc9d93 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
    frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x9 (0x7f2176dc9fb9 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
    frame #5: <unknown function> + 0xe67b76 (0x7f21bf0ebb76 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #6: <unknown function> + 0xe4d885 (0x7f21bf0d1885 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #7: <unknown function> + 0x2a1b00 (0x7f21be525b00 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #8: <unknown function> + 0x2a2d6e (0x7f21be526d6e in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #9: <unknown function> + 0x10d098 (0x56344cddb098 in /data/miniconda3/envs/ngp/bin/python)
    frame #10: <unknown function> + 0x10fbcc (0x56344cdddbcc in /data/miniconda3/envs/ngp/bin/python)
    frame #11: PyDict_Clear + 0x14b (0x56344cddef6b in /data/miniconda3/envs/ngp/bin/python)
    frame #12: <unknown function> + 0x110ff9 (0x56344cddeff9 in /data/miniconda3/envs/ngp/bin/python)
    frame #13: <unknown function> + 0x130246 (0x56344cdfe246 in /data/miniconda3/envs/ngp/bin/python)
    frame #14: _PyGC_CollectNoFail + 0x2a (0x56344cf08a2a in /data/miniconda3/envs/ngp/bin/python)
    frame #15: PyImport_Cleanup + 0x2ce (0x56344ceb9e4e in /data/miniconda3/envs/ngp/bin/python)
    frame #16: Py_FinalizeEx + 0x79 (0x56344cf204f9 in /data/miniconda3/envs/ngp/bin/python)
    frame #17: Py_RunMain + 0x1bc (0x56344cf2387c in /data/miniconda3/envs/ngp/bin/python)
    frame #18: Py_BytesMain + 0x39 (0x56344cf23c69 in /data/miniconda3/envs/ngp/bin/python)
    frame #19: __libc_start_main + 0xe7 (0x7f21f6d05c87 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #20: <unknown function> + 0x1f7427 (0x56344cec5427 in /data/miniconda3/envs/ngp/bin/python)
    
    

    Did I do something wrong? Or does the code simply not support multi-GPU training?

    opened by xmk2222 8
  • Tips on how to tune "bound" & "scale" for a new scene?

    Tips on how to tune "bound" & "scale" for a new scene?

    I have a CG-generated dataset containing correct poses and depth range, exactly like the Blender dataset. I used it to train a NeRF successfully with the NeRF-pytorch code, but I failed to train a NeRF with torch-ngp.

    I think it may be due to wrong settings of "bound" & "scale" for this scene. Do you have any tips on how to tune "bound" & "scale" for a new scene?

    opened by JasonLSC 7
  • The training process on custom data dies, what to do?

    The training process on custom data dies, what to do?

    Versions: Ubuntu 18.04, CUDA 11.2, Torch 1.11. I prepared data with COLMAP and started training on the custom dataset; the dataloader loads the data and then the process is interrupted (it seems as if I killed the process, but I didn't press any button). I also tried the Fox dataset and everything works, but on the custom data it dies. How can I fix it? (screenshot: IMAGE 2022-11-19 00:49:50)

    I also tried it on WSL2 with Ubuntu 22.04, CUDA 11.3, and Torch 1.12, and the situation is about the same. I installed all dependencies and built raytracing. On the fox dataset everything is fine, but on my custom dataset it loads the data and then prints "KILLED". The dataset contains 325 images at 1080p resolution; I tested it with NVIDIA instant-ngp and it worked fine. Help please.

    opened by MykytaKyt 1
  • About the inference speed

    About the inference speed

    Hi, I have some questions about the inference speed of TensoRF. Is this implementation of TensoRF faster than the original repo? The original repo is implemented purely in PyTorch, so its inference speed is slow. Since this repo is implemented with CUDA, should it be much faster?

    According to the performance reference, the speed still doesn't seem very fast?

    Many thanks!

    opened by weihaosky 0
  • Multi-camera dataset DNeRF

    Multi-camera dataset DNeRF

    Hi,

    I want to use torch-ngp to create D-NeRFs; however, I need to create and load a dataset with videos from multiple camera viewpoints. I'm wondering whether this implementation of D-NeRF can/does support this?

    thanks

    opened by fotfotfive 0
  • Performance on MIPNeRF scenes (garden, bike)

    Performance on MIPNeRF scenes (garden, bike)

    Thanks for the great release!

    Has anyone managed to reproduce sharp rendering results on MIPNeRF's scenes (namely garden, etc.) using torch-ngp? If yes, what values of bound and scale worked best?

    My current novel view synthesis results are not of high fidelity. Would appreciate any quick cues.

    Thanks & Best Regards Shivam

    opened by ShivamDuggal4 0
Owner
hawkey (nameless kiui.)