torch-ngp - A PyTorch implementation of the hash encoder proposed in instant-ngp

Overview

HashGrid Encoder (WIP)

A PyTorch implementation of the HashGrid encoder from instant-ngp, as described in "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding".

Note: this repo currently implements only the hash grid encoder, and is far from instant (especially for the NeRF experiments).
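For reference, the core of the hash grid encoder is the multiresolution spatial hash from the paper. Below is a minimal pure-PyTorch sketch of the per-level index computation; the primes are the ones given in the paper, but the function and variable names are illustrative, not this repo's API:

```python
import torch

# Spatial hash from the instant-ngp paper: h(x) = (XOR_i x_i * pi_i) mod T.
# Reference sketch only -- the repo implements this in a CUDA kernel.
PRIMES = torch.tensor([1, 2654435761, 805459861])

def hash_coords(coords: torch.Tensor, table_size: int) -> torch.Tensor:
    """coords: (N, 3) integer grid coordinates -> (N,) hash table indices."""
    coords = coords.long()
    h = coords[:, 0] * PRIMES[0]
    h ^= coords[:, 1] * PRIMES[1]
    h ^= coords[:, 2] * PRIMES[2]
    return h % table_size

# At each of L resolution levels, the 8 corners of the voxel containing a
# query point are hashed, their learned feature vectors are fetched and
# trilinearly interpolated, and the per-level results are concatenated.
table_size = 2 ** 19
coords = torch.tensor([[0, 0, 0], [13, 7, 42]])
idx = hash_coords(coords, table_size)
```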

(Result images: SDF, NeRF)

Progress

  • HashGrid Encoder
    • basic pytorch CUDA extension
    • fp16 support
  • Experiments
    • SDF
      • baseline
      • better SDF calculation (especially for non-watertight meshes)
    • NeRF
      • baseline (although much slower)
      • ray marching in CUDA.
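The NeRF compositing step that the CUDA ray-marching path accelerates can be sketched in plain PyTorch. This is a reference sketch of standard volume rendering under the vanilla NeRF formulation, not the repo's kernels, and the names are illustrative:

```python
import torch

def composite(sigmas, rgbs, deltas):
    """Standard NeRF volume rendering along a batch of rays.

    sigmas: (N_rays, N_steps) densities
    rgbs:   (N_rays, N_steps, 3) colors
    deltas: (N_rays, N_steps) step sizes
    """
    alphas = 1.0 - torch.exp(-sigmas * deltas)           # per-step opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)  # transmittance
    # Shift right so each step sees the transmittance *before* it.
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alphas * trans                             # (N_rays, N_steps)
    rgb = (weights.unsqueeze(-1) * rgbs).sum(dim=1)      # (N_rays, 3)
    return rgb, weights
```

With zero density everywhere the rendered color is black; with very high density the first sample dominates and the weights sum to (nearly) one.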

Usage

We use the same data format as instant-ngp, e.g., armadillo and fox. Please download the data from instant-ngp and put them under ./data.

# SDF experiment
bash scripts/run_sdf.sh

# NeRF experiment
bash scripts/run_nerf.sh

Comments
  • Error building extension '_raymarching'

    I get this when trying to launch the example for the first time:

    Traceback (most recent call last):
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1740, in _run_ninja_build
        subprocess.run(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\subprocess.py", line 528, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "E:\nvidia\torch-ngp\main_nerf.py", line 63, in <module>
        from nerf.network import NeRFNetwork
      File "E:\nvidia\torch-ngp\nerf\network.py", line 7, in <module>
        from .renderer import NeRFRenderer
      File "E:\nvidia\torch-ngp\nerf\renderer.py", line 9, in <module>
        import raymarching
      File "E:\nvidia\torch-ngp\raymarching\__init__.py", line 1, in <module>
        from .raymarching import *
      File "E:\nvidia\torch-ngp\raymarching\raymarching.py", line 9, in <module>
        from .backend import _backend
      File "E:\nvidia\torch-ngp\raymarching\backend.py", line 31, in <module>
        _backend = load(name='_raymarching',
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1144, in load
        return _jit_compile(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1357, in _jit_compile
        _write_ninja_file_and_build_library(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
        _run_ninja_build(
      File "C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\utils\cpp_extension.py", line 1756, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error building extension '_raymarching':
    [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output raymarching.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=_raymarching -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\TH -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -IC:\Users\franz\anaconda3\envs\torchngp\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c E:\nvidia\torch-ngp\raymarching\src\raymarching.cu -o raymarching.cuda.o
    FAILED: raymarching.cuda.o
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output raymarching.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=_raymarching -DTORCH_API_INCLUDE_EXTENSION_H -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\TH -IC:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -IC:\Users\franz\anaconda3\envs\torchngp\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -O3 -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c E:\nvidia\torch-ngp\raymarching\src\raymarching.cu -o raymarching.cuda.o
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
    raymarching.cu
    C:/Users/franz/anaconda3/envs/torchngp/lib/site-packages/torch/include\c10/macros/Macros.h(142): warning C4067: unexpected tokens following preprocessor directive - expected a newline
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
    cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
    raymarching.cu
    C:/Users/franz/anaconda3/envs/torchngp/lib/site-packages/torch/include\c10/macros/Macros.h(142): warning C4067: unexpected tokens following preprocessor directive - expected a newline
    C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (1507): here

    C:\Users\franz\anaconda3\envs\torchngp\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (1507): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(332): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_forward(const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, scalar_t *) [with scalar_t=double]" (626): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(332): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_forward(const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, scalar_t *) [with scalar_t=float]" (626): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(332): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_forward(const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, scalar_t *) [with scalar_t=c10::Half]" (626): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(407): warning #177-D: variable "weight" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=double]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(389): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=double]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(407): warning #177-D: variable "weight" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=float]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(389): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=float]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(407): warning #177-D: variable "weight" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=c10::Half]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(389): warning #177-D: variable "index" was declared but never referenced detected during instantiation of "void kernel_composite_weights_backward(const scalar_t *, const scalar_t *, const scalar_t *, const int *, uint32_t, uint32_t, const scalar_t *, scalar_t *) [with scalar_t=c10::Half]" (659): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(777): warning #177-D: variable "near" was declared but never referenced detected during instantiation of "void kernel_march_rays(uint32_t, uint32_t, const int *, const scalar_t *, const scalar_t *, const scalar_t *, float, float, uint32_t, uint32_t, const scalar_t *, float, const scalar_t *, const scalar_t *, scalar_t *, scalar_t *, scalar_t *, uint32_t) [with scalar_t=double]" (853): here

    E:\nvidia\torch-ngp\raymarching\src\raymarching.cu(777): warning #177-D: variable "near" was declared but never referenced detected during instantiation of "void kernel_march_rays(uint32_t, uint32_t, const int *, const scalar_t *, const scalar_t *, const scalar_t *, float, float, uint32_t, uint32_t, const scalar_t *, float, const scalar_t *, const scalar_t *, scalar_t *, scalar_t *, scalar_t *, uint32_t) [with scalar_t=float]" (853): here

    2 errors detected in the compilation of "E:/nvidia/torch-ngp/raymarching/src/raymarching.cu".
    raymarching.cu
    ninja: build stopped: subcommand failed.

    opened by francescofugazzi 14
  • NeRF inference profiling

    This issue records the current profiling of NeRF and shows where the speed bottlenecks are.

    Inference with --fp16 --ff on three 1920x1080 frames, 128+128 points per ray:

    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls       
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                               _hash_encode         3.53%     391.299ms        10.87%        1.206s     396.505us        2.607s        24.09%        3.510s       1.154ms          3042        
    void kernel_grid<c10::Half, 3u, 2u>(c10::Half const*...         0.00%       0.000us         0.00%       0.000us       0.000us        2.607s        24.09%        2.607s     857.029us          3042        
                                             _ffmlp_forward         5.02%     557.577ms         8.39%     930.914ms     153.010us        2.054s        18.98%        2.074s     340.958us          6084        
    void kernel_mlp_fused<64, 1, 8, __half, true>(Activa...         0.00%       0.000us         0.00%       0.000us       0.000us        2.054s        18.98%        2.054s     337.648us          6084        
                                                 aten::_cat         1.98%     219.730ms         4.66%     517.135ms      30.892us        1.317s        12.17%        1.317s      78.680us         16740        
                                                 aten::sort         0.30%      33.207ms         2.29%     254.487ms     167.316us        1.226s        11.33%        1.281s     841.893us          1521        
    void at::native::bitonicSortKVInPlace<float, long, 2...         0.00%       0.000us         0.00%       0.000us       0.000us        1.226s        11.33%        1.226s     806.281us          1521        
                                                aten::copy_         3.08%     341.543ms        31.30%        3.474s      95.054us        1.150s        10.62%        1.155s      31.611us         36546        
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us        1.035s         9.57%        1.035s     340.388us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     541.719ms         5.01%     541.719ms      35.602us         15216        
                                                _sh_encoder         2.18%     241.916ms         3.85%     426.764ms     140.291us     534.972ms         4.94%     595.884ms     195.886us          3042        
    void kernel_sh<c10::Half>(c10::Half const*, c10::Hal...         0.00%       0.000us         0.00%       0.000us       0.000us     534.972ms         4.94%     534.972ms     175.862us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     442.365ms         4.09%     442.365ms     145.419us          3042        
                                                  aten::mul         2.60%     288.552ms         4.19%     465.465ms      20.391us     292.856ms         2.71%     292.856ms      12.829us         22827        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     247.842ms         2.29%     247.842ms      13.572us         18261        
                                                  aten::sub         3.23%     358.866ms         5.20%     577.077ms      21.069us     217.357ms         2.01%     217.357ms       7.936us         27390        
                                               aten::gather         0.97%     107.481ms         1.54%     171.240ms      28.146us     212.941ms         1.97%     212.941ms      35.000us          6084        
                                                  aten::add         1.94%     214.941ms         3.08%     341.548ms      20.410us     202.851ms         1.87%     202.851ms      12.122us         16734        
                                                  aten::div         1.78%     197.500ms         2.82%     312.510ms      20.526us     180.809ms         1.67%     180.809ms      11.876us         15225        
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     133.226ms         1.23%     133.226ms      14.599us          9126        
                                              aten::sigmoid         0.44%      48.847ms         0.73%      80.799ms      26.561us     124.591ms         1.15%     124.591ms      40.957us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     124.591ms         1.15%     124.591ms      40.957us          3042        
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us     123.191ms         1.14%     123.191ms      80.993us          1521        
                                            aten::clamp_min         0.57%      63.140ms         2.08%     230.413ms      30.274us     115.899ms         1.07%     230.170ms      30.242us          7611        
    void at::native::_scatter_gather_elementwise_kernel<...         0.00%       0.000us         0.00%       0.000us       0.000us     115.718ms         1.07%     115.718ms      38.040us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     114.181ms         1.06%     114.181ms      37.535us          3042        
    void at::native::_scatter_gather_elementwise_kernel<...         0.00%       0.000us         0.00%       0.000us       0.000us      97.223ms         0.90%      97.223ms      31.960us          3042        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      95.482ms         0.88%      95.482ms      12.550us          7608        
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      92.027ms         0.85%      92.027ms      12.096us          7608        
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      89.500ms         0.83%      89.500ms      14.711us          6084        
                                                  aten::sum         1.05%     116.752ms         1.50%     166.470ms      27.362us      87.986ms         0.81%      87.986ms      14.462us          6084        
                                                aten::clamp         0.61%      67.853ms         1.10%     122.594ms      20.150us      87.157ms         0.81%      88.785ms      14.593us          6084        
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      87.157ms         0.81%      87.157ms      19.101us          4563        
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      80.040ms         0.74%      80.040ms      52.623us          1521        
                                                  aten::min         0.98%     108.652ms         2.15%     238.483ms      39.198us      66.320ms         0.61%      95.284ms      15.661us          6084        
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      65.883ms         0.61%      65.883ms      10.829us          6084        
    _ZN2at6native32tensor_kernel_scan_innermost_dimIfLi1...         0.00%       0.000us         0.00%       0.000us       0.000us      64.053ms         0.59%      64.053ms      21.056us          3042        
                                              aten::cumprod         0.43%      47.625ms         0.65%      71.634ms      23.548us      63.409ms         0.59%      63.409ms      20.845us          3042        
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      58.054ms         0.54%      58.054ms      19.084us          3042  
                                                  aten::max         0.94%     104.159ms         2.01%     222.583ms      36.585us      57.724ms         0.53%      86.936ms      14.289us          6084  
                           Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us      51.809ms         0.48%      51.809ms       8.507us          6090  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      49.877ms         0.46%      49.877ms      16.396us          3042  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      46.988ms         0.43%      46.988ms      10.298us          4563  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      45.896ms         0.42%      45.896ms       7.536us          6090  
                                                aten::fill_         1.03%     114.067ms         2.19%     242.845ms      15.960us      43.753ms         0.40%      43.753ms       2.875us         15216  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      41.419ms         0.38%      41.419ms      13.616us          3042  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      40.997ms         0.38%      40.997ms      13.477us          3042  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      39.479ms         0.36%      39.479ms       6.476us          6096  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      39.258ms         0.36%      39.258ms      25.811us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      37.032ms         0.34%      37.032ms       4.869us          7605  
                                         aten::searchsorted         0.16%      17.916ms         0.39%      43.311ms      28.475us      33.258ms         0.31%      33.258ms      21.866us          1521  
    void at::native::(anonymous namespace)::searchsorted...         0.00%       0.000us         0.00%       0.000us       0.000us      33.258ms         0.31%      33.258ms      21.866us          1521  
                             Memcpy DtoD (Device -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      31.294ms         0.29%      31.294ms       5.144us          6084  
                                              aten::maximum         0.16%      17.381ms         0.25%      27.532ms      18.101us      29.250ms         0.27%      29.250ms      19.231us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      29.250ms         0.27%      29.250ms      19.231us          1521  
                                              aten::minimum         0.15%      16.095ms         0.24%      26.151ms      17.193us      28.983ms         0.27%      28.983ms      19.055us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      28.983ms         0.27%      28.983ms      19.055us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      24.661ms         0.23%      24.661ms      16.214us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      21.328ms         0.20%      21.328ms       7.011us          3042  
                                                  aten::neg         0.31%      34.620ms         0.51%      56.827ms      18.681us      19.157ms         0.18%      19.157ms       6.298us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      19.157ms         0.18%      19.157ms       6.298us          3042  
    _ZN2at6native32tensor_kernel_scan_innermost_dimIfLi1...         0.00%       0.000us         0.00%       0.000us       0.000us      18.386ms         0.17%      18.386ms      12.088us          1521  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      18.318ms         0.17%      18.318ms      12.043us          1521  
                                                  aten::exp         0.35%      38.918ms         0.53%      58.556ms      19.249us      18.278ms         0.17%      18.278ms       6.009us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      18.278ms         0.17%      18.278ms       6.009us          3042  
                                               aten::cumsum         0.21%      23.407ms         0.31%      34.911ms      22.953us      17.978ms         0.17%      17.978ms      11.820us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      14.847ms         0.14%      14.847ms       9.761us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      14.280ms         0.13%      14.280ms       9.389us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      13.793ms         0.13%      13.793ms       9.068us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      13.094ms         0.12%      13.094ms       1.434us          9132  
                           Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      11.129ms         0.10%      11.129ms       1.824us          6102  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      10.466ms         0.10%      10.466ms       6.867us          1524  
                                             aten::_s_where         0.44%      48.315ms         0.88%      98.069ms      21.492us       9.361ms         0.09%       9.361ms       2.052us          4563  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       9.361ms         0.09%       9.361ms       2.052us          4563  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       9.338ms         0.09%       9.338ms       3.070us          3042  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us       8.848ms         0.08%       8.848ms       5.817us          1521  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       8.264ms         0.08%       8.264ms       5.433us          1521  
                                                 aten::norm         0.26%      28.412ms         0.40%      44.594ms      29.261us       7.870ms         0.07%       7.870ms       5.164us          1524  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       7.846ms         0.07%       7.846ms       5.158us          1521  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       7.667ms         0.07%       7.667ms       5.041us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.533ms         0.06%       6.533ms       2.148us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       6.372ms         0.06%       6.372ms       1.396us          4566  
                                                   aten::lt         0.40%      44.178ms         0.59%      65.952ms      21.680us       6.184ms         0.06%       6.184ms       2.033us          3042  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       4.586ms         0.04%       4.586ms       3.015us          1521  
                                               aten::arange         0.92%     102.334ms         1.46%     162.401ms      53.281us       3.053ms         0.03%       6.106ms       2.003us          3048  
    void (anonymous namespace)::elementwise_kernel_with_...         0.00%       0.000us         0.00%       0.000us       0.000us       3.053ms         0.03%       3.053ms       2.007us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.718ms         0.02%       1.718ms       1.127us          1524  
                                                   aten::gt         0.21%      23.296ms         0.31%      34.756ms      22.851us       1.647ms         0.02%       1.647ms       1.083us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.647ms         0.02%       1.647ms       1.083us          1521  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.598ms         0.01%       1.598ms       1.051us          1521  
                                                  aten::bmm         0.02%       2.302ms         0.07%       7.562ms       1.260ms       1.026ms         0.01%       2.360ms     393.333us             6  
                            volta_fp16_sgemm_fp16_128x32_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.026ms         0.01%       1.026ms     171.000us             6  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     289.000us         0.00%     289.000us      96.333us             3  
                                  aten::_local_scalar_dense         0.29%      32.665ms         1.26%     139.968ms      22.979us     231.000us         0.00%     231.000us       0.038us          6091  
    void at::native::reduce_kernel<128, 4, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     203.000us         0.00%     203.000us      67.667us             3  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.000us         0.00%       6.000us       2.000us             3  
                                            Memset (Device)         0.00%       0.000us         0.00%       0.000us       0.000us       3.000us         0.00%       3.000us       0.000us          6084  
                                                aten::empty         3.03%     336.134ms         3.03%     336.578ms       4.422us       0.000us         0.00%       0.000us       0.000us         76110  
                                              aten::random_         0.00%      31.000us         0.00%      31.000us      31.000us       0.000us         0.00%       0.000us       0.000us             1  
                                                 aten::item         0.13%      14.935ms         1.40%     154.881ms      25.428us       0.000us         0.00%     231.000us       0.038us          6091  
                                                aten::zeros         0.00%      51.000us         0.00%      82.000us      20.500us       0.000us         0.00%       0.000us       0.000us             4  
                                                aten::zero_         0.22%      24.701ms         1.15%     127.125ms      20.881us       0.000us         0.00%      21.785ms       3.578us          6088  
    enumerate(DataLoader)#_SingleProcessDataLoaderIter._...         0.01%     841.000us         0.01%       1.235ms     308.750us       0.000us         0.00%       0.000us       0.000us             4  
                                                   aten::to         0.63%      69.980ms        32.04%        3.556s     111.123us       0.000us         0.00%     604.426ms      18.888us         32001  
                                                aten::stack         0.09%      10.056ms         0.64%      70.649ms      46.176us       0.000us         0.00%     125.256ms      81.867us          1530  
                                            aten::unsqueeze         0.70%      77.737ms         0.94%     104.781ms       4.046us       0.000us         0.00%       0.000us       0.000us         25896  
                                           aten::as_strided         1.31%     145.867ms         1.33%     147.451ms       0.969us       0.000us         0.00%       0.000us       0.000us        152229  
                                                  aten::cat         0.61%      68.121ms         5.27%     585.256ms      34.962us       0.000us         0.00%        1.317s      78.680us         16740  
                                              aten::resize_         1.42%     157.545ms         1.42%     157.717ms       5.182us       0.000us         0.00%       0.000us       0.000us         30435  
                                              aten::detach_         0.00%      15.000us         0.00%      22.000us       2.444us       0.000us         0.00%       0.000us       0.000us             9  
                                                    detach_         0.00%       7.000us         0.00%      11.000us       1.222us       0.000us         0.00%       0.000us       0.000us             9  
                                             aten::_to_copy         0.99%     109.595ms        31.46%        3.492s     163.761us       0.000us         0.00%     604.426ms      28.345us         21324  
                                        aten::empty_strided         3.28%     364.021ms         3.30%     366.594ms       9.263us       0.000us         0.00%       0.000us       0.000us         39576  
                                            cudaMemcpyAsync         2.73%     303.436ms         2.73%     303.436ms      16.603us       0.000us         0.00%       0.000us       0.000us         18276  
                                                aten::slice         1.85%     205.048ms         2.44%     270.957ms       3.492us       0.000us         0.00%       0.000us       0.000us         77583  
                                               aten::select         0.36%      39.487ms         0.48%      52.748ms       4.321us       0.000us         0.00%       0.000us       0.000us         12207  
                                             aten::linspace         0.23%      25.233ms         0.42%      46.684ms       7.658us       0.000us         0.00%       0.000us       0.000us          6096  
                                             aten::meshgrid         0.00%      74.000us         0.00%     149.000us      49.667us       0.000us         0.00%       0.000us       0.000us             3  
                                                 aten::view         0.31%      34.551ms         0.31%      34.551ms       2.523us       0.000us         0.00%       0.000us       0.000us         13695  
                                               aten::expand         0.32%      35.520ms         0.44%      49.138ms       4.604us       0.000us         0.00%       0.000us       0.000us         10674  
                                                    aten::t         0.00%      60.000us         0.00%      79.000us      13.167us       0.000us         0.00%       0.000us       0.000us             6  
                                            aten::transpose         0.00%      46.000us         0.00%      70.000us       5.833us       0.000us         0.00%       0.000us       0.000us            12  
                                            aten::expand_as         0.09%       9.560ms         0.25%      27.680ms       6.050us       0.000us         0.00%       0.000us       0.000us          4575  
                                           aten::contiguous         0.03%       3.433ms         0.57%      62.986ms      41.248us       0.000us         0.00%       7.658ms       5.015us          1527  
                                                aten::clone         0.31%      34.313ms         2.37%     263.087ms      34.567us       0.000us         0.00%     489.353ms      64.295us          7611  
                                           aten::empty_like         0.67%      74.421ms         2.41%     267.441ms      10.338us       0.000us         0.00%       0.000us       0.000us         25869  
                                      cudaStreamSynchronize        24.06%        2.670s        24.06%        2.670s     219.156us       0.000us         0.00%       0.000us       0.000us         12183  
                                              aten::reshape         0.84%      92.801ms         3.51%     389.672ms      10.245us       0.000us         0.00%     481.695ms      12.664us         38037  
                                       aten::_reshape_alias         0.62%      68.539ms         0.65%      72.009ms       2.254us       0.000us         0.00%       0.000us       0.000us         31953  
                                            aten::ones_like         0.30%      32.979ms         2.49%     276.039ms      30.228us       0.000us         0.00%      21.968ms       2.406us          9132  
                                           cudaLaunchKernel        16.00%        1.776s        16.00%        1.776s       8.336us       0.000us         0.00%       0.000us       0.000us        213009  
                                                   cudaFree         0.00%       8.000us         0.00%       8.000us       4.000us       0.000us         0.00%       0.000us       0.000us             2  
                                     cudaDeviceGetAttribute         0.00%       2.000us         0.00%       2.000us       0.143us       0.000us         0.00%       0.000us       0.000us            14  
                                       cudaGetSymbolAddress         0.00%       1.000us         0.00%       1.000us       1.000us       0.000us         0.00%       0.000us       0.000us             1  
                                                 cudaMalloc         0.02%       2.525ms         0.02%       2.525ms     229.545us       0.000us         0.00%       0.000us       0.000us            11  
                                   cudaEventCreateWithFlags         0.00%       7.000us         0.00%       7.000us       0.389us       0.000us         0.00%       0.000us       0.000us            18  
                                       cudaFuncSetAttribute         0.07%       7.749ms         0.07%       7.749ms       1.251us       0.000us         0.00%       0.000us       0.000us          6192  
    cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla...         0.00%      22.000us         0.00%      22.000us       1.833us       0.000us         0.00%       0.000us       0.000us            12  
                                       aten::frobenius_norm         0.17%      19.175ms         0.96%     106.797ms      70.215us       0.000us         0.00%       9.278ms       6.100us          1521  
                                                 aten::rsub         0.60%      66.377ms         2.68%     297.521ms      27.944us       0.000us         0.00%      43.565ms       4.092us         10647  
                                                aten::where         0.11%      11.916ms         0.99%     109.985ms      24.104us       0.000us         0.00%       9.361ms       2.052us          4563  
                                         aten::_unsafe_view         0.13%      14.005ms         0.23%      25.146ms       4.133us       0.000us         0.00%       0.000us       0.000us          6084  
                                            cudaMemsetAsync         0.37%      40.511ms         0.37%      40.511ms       6.659us       0.000us         0.00%       0.000us       0.000us          6084  
                                              aten::permute         0.14%      15.828ms         0.20%      22.640ms       7.442us       0.000us         0.00%       0.000us       0.000us          3042  
                                      cudaStreamIsCapturing         0.00%      13.000us         0.00%      13.000us       1.625us       0.000us         0.00%       0.000us       0.000us             8  
                                                 aten::relu         0.48%      53.008ms         1.52%     168.367ms      55.347us       0.000us         0.00%     114.181ms      37.535us          3042  
                                           aten::zeros_like         0.20%      22.493ms         1.99%     220.955ms      36.317us       0.000us         0.00%      21.785ms       3.581us          6084  
                                               aten::detach         0.03%       3.163ms         0.06%       6.806ms       4.457us       0.000us         0.00%       0.000us       0.000us          1527  
                                                     detach         0.03%       3.643ms         0.04%       4.445ms       2.911us       0.000us         0.00%       0.000us       0.000us          1527  
                                      cudaDeviceSynchronize         0.00%      48.000us         0.00%      48.000us      48.000us       0.000us         0.00%       0.000us       0.000us             1  
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
    Self CPU time total: 11.099s
    Self CUDA time total: 10.822s
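
    Tables like the one above come from PyTorch's built-in profiler. Below is a minimal, self-contained sketch of how to produce one; the toy model and input here are placeholders, not this repo's NeRF network, and only CPU activity is profiled so it runs without a GPU.

    ```python
    import torch
    import torch.nn as nn
    from torch.profiler import profile, ProfilerActivity

    # Placeholder model standing in for the NeRF network.
    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 3))
    x = torch.randn(1024, 64)

    # Record operator-level timings for one forward pass.
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        with torch.no_grad():
            model(x)

    # Print a table with the same columns as above, sorted by self CPU time.
    print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
    ```

    To capture the CUDA columns as well, add `ProfilerActivity.CUDA` to `activities` and sort by `self_cuda_time_total`.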
    
    
    

    Inference with --fp16 --ff --cuda_raymarching on 3 1920x1080 frames:

    
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                               _hash_encode         2.08%     199.448ms         5.88%     564.231ms     369.020us        2.179s        23.75%        2.824s       1.847ms          1529  
    void kernel_grid<c10::Half, 3u, 2u>(c10::Half const*...         0.00%       0.000us         0.00%       0.000us       0.000us        2.179s        23.75%        2.179s       1.425ms          1529  
                                             _ffmlp_forward         4.10%     393.471ms         6.16%     591.005ms     193.772us        1.983s        21.61%        1.993s     653.448us          3050  
    void kernel_mlp_fused<64, 1, 8, __half, true>(Activa...         0.00%       0.000us         0.00%       0.000us       0.000us        1.979s        21.57%        1.979s     650.671us          3042  
                                           _generate_points         2.07%     198.297ms        71.59%        6.872s       4.518ms        1.638s        17.86%        1.688s       1.110ms          1521  
    void kernel_generate_points<c10::Half>(c10::Half con...         0.00%       0.000us         0.00%       0.000us       0.000us        1.638s        17.86%        1.638s       1.077ms          1521  
                                                 aten::_cat         0.55%      52.390ms         1.16%     111.195ms      36.255us        1.222s        13.31%        1.222s     398.330us          3067  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us        1.128s        12.29%        1.128s     741.391us          1521  
                                                aten::copy_         2.10%     201.777ms         6.39%     613.478ms      30.858us     879.039ms         9.58%     884.456ms      44.488us         19881  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     552.081ms         6.02%     552.081ms     120.779us          4571  
    void kernel_sh<c10::Half>(c10::Half const*, c10::Hal...         0.00%       0.000us         0.00%       0.000us       0.000us     516.578ms         5.63%     516.578ms     339.631us          1521  
                                                _sh_encoder         1.17%     112.374ms         1.54%     148.212ms      97.444us     516.234ms         5.63%     516.234ms     339.404us          1521  
                                           _accumulate_rays         1.37%     131.976ms         2.77%     265.826ms     174.771us     330.450ms         3.60%     333.671ms     219.376us          1521  
    void kernel_accumulate_rays_forward<c10::Half>(c10::...         0.00%       0.000us         0.00%       0.000us       0.000us     330.450ms         3.60%     330.450ms     217.258us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     250.761ms         2.73%     250.761ms      32.844us          7635  
                                              aten::sigmoid         0.25%      23.853ms         0.42%      40.290ms      26.489us     113.450ms         1.24%     113.450ms      74.589us          1521  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     113.450ms         1.24%     113.450ms      74.589us          1521  
                                            aten::clamp_min         0.28%      27.235ms         1.10%     105.430ms      34.409us     102.666ms         1.12%     205.332ms      67.014us          3064  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     102.576ms         1.12%     102.576ms      67.087us          1529  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      91.992ms         1.00%      91.992ms      60.481us          1521  
                                                  aten::div         0.48%      46.482ms         0.77%      73.898ms      24.110us      89.251ms         0.97%      89.251ms      29.119us          3065  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      54.861ms         0.60%      54.861ms      36.069us          1521  
                                                  aten::add         0.27%      25.442ms         0.42%      40.336ms      26.329us      53.245ms         0.58%      53.245ms      34.755us          1532  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      53.051ms         0.58%      53.051ms      34.879us          1521  
                           Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us      39.039ms         0.43%      39.039ms       8.509us          4588  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      33.580ms         0.37%      33.580ms      22.078us          1521  
                                                  aten::max         0.39%      37.096ms         1.18%     113.050ms      73.889us      27.348ms         0.30%      79.880ms      52.209us          1530  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      27.216ms         0.30%      27.216ms      17.893us          1521  
                                                  aten::min         0.44%      42.345ms         1.37%     131.581ms      86.001us      26.664ms         0.29%      84.516ms      55.239us          1530  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      26.534ms         0.29%      26.534ms      17.445us          1521  
                           Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      15.368ms         0.17%      15.368ms       9.883us          1555  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      12.741ms         0.14%      12.741ms       4.188us          3042  
                                                aten::fill_         0.53%      51.353ms         1.22%     116.670ms      19.151us      12.460ms         0.14%      12.460ms       2.045us          6092  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       9.223ms         0.10%       9.223ms       3.030us          3044  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       7.497ms         0.08%       7.497ms       2.464us          3042  
    void kernel_mlp_fused<64, 1, 8, __half, false>(Activ...         0.00%       0.000us         0.00%       0.000us       0.000us       3.178ms         0.03%       3.178ms     397.250us             8  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.091ms         0.03%       3.091ms       2.032us          1521  
    void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us       2.030ms         0.02%       2.030ms     676.667us             3  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.872ms         0.02%       1.872ms       1.224us          1529  
                                                  aten::bmm         0.04%       3.937ms         0.10%       9.900ms       1.650ms       1.018ms         0.01%       2.343ms     390.500us             6  
                            volta_fp16_sgemm_fp16_128x32_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.018ms         0.01%       1.018ms     169.667us             6  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     698.000us         0.01%     698.000us      58.167us            12  
                                                  aten::sub         0.00%     292.000us         0.00%     457.000us      38.083us     615.000us         0.01%     615.000us      51.250us            12  
                                                  aten::mul         0.00%     204.000us         0.00%     346.000us      28.833us     378.000us         0.00%     378.000us      31.500us            12  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     287.000us         0.00%     287.000us      31.889us             9  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     283.000us         0.00%     283.000us      94.333us             3  
                                  aten::_local_scalar_dense         0.30%      29.017ms        67.66%        6.495s       1.415ms     276.000us         0.00%     276.000us       0.060us          4589  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     276.000us         0.00%     276.000us      46.000us             6  
                                                 aten::norm         0.00%     136.000us         0.00%     194.000us      64.667us     201.000us         0.00%     201.000us      67.000us             3
    void at::native::reduce_kernel<128, 4, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     201.000us         0.00%     201.000us      67.000us             3        
                              aten::max_pool3d_with_indices         0.00%      32.000us         0.00%      68.000us      68.000us     149.000us         0.00%     149.000us     149.000us             1        
    void at::native::(anonymous namespace)::max_pool3d_w...         0.00%       0.000us         0.00%       0.000us       0.000us     149.000us         0.00%     149.000us     149.000us             1  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     139.000us         0.00%     139.000us      46.333us             3  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     124.000us         0.00%     124.000us      13.778us             9  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     123.000us         0.00%     123.000us      13.667us             9  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     105.000us         0.00%     105.000us      13.125us             8  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     100.000us         0.00%     100.000us      12.500us             8  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      96.000us         0.00%      96.000us      32.000us             3  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      90.000us         0.00%      90.000us      30.000us             3  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      66.000us         0.00%      66.000us      11.000us             6  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      36.000us         0.00%      36.000us      36.000us             1  
                                                 aten::mean         0.00%      52.000us         0.00%      79.000us      79.000us      25.000us         0.00%      25.000us      25.000us             1  
    void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      25.000us         0.00%      25.000us      25.000us             1  
                                            Memset (Device)         0.00%       0.000us         0.00%       0.000us       0.000us      15.000us         0.00%      15.000us       0.005us          3047  
    void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       7.000us         0.00%       7.000us       2.333us             3  
    void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.000us         0.00%       6.000us       2.000us             3  
                                             aten::linspace         0.00%     173.000us         0.00%     282.000us      15.667us       0.000us         0.00%       0.000us       0.000us            18  
                                                aten::empty         2.00%     192.373ms         2.02%     194.152ms       5.084us       0.000us         0.00%       0.000us       0.000us         38190  
                                                aten::split         0.00%      45.000us         0.00%     113.000us      37.667us       0.000us         0.00%       0.000us       0.000us             3  
                                               aten::narrow         0.00%      92.000us         0.00%     243.000us       6.750us       0.000us         0.00%       0.000us       0.000us            36  
                                                aten::slice         0.74%      70.841ms         0.98%      93.966ms       4.104us       0.000us         0.00%       0.000us       0.000us         22895  
                                           aten::as_strided         0.37%      35.140ms         0.37%      35.144ms       1.146us       0.000us         0.00%       0.000us       0.000us         30680  
                                           aten::zeros_like         0.06%       5.980ms         0.64%      61.913ms      40.679us       0.000us         0.00%       5.716ms       3.756us          1522  
                                           aten::empty_like         0.18%      17.617ms         0.56%      53.704ms       8.784us       0.000us         0.00%       0.000us       0.000us          6114  
                                        aten::empty_strided         1.13%     108.323ms         1.13%     108.323ms      10.107us       0.000us         0.00%       0.000us       0.000us         10718  
                                                aten::zero_         0.20%      18.962ms         1.14%     109.460ms      23.920us       0.000us         0.00%      10.604ms       2.317us          4576  
                                           cudaLaunchKernel         4.97%     476.865ms         4.97%     476.865ms      10.077us       0.000us         0.00%       0.000us       0.000us         47323  
                                             aten::meshgrid         0.00%     185.000us         0.00%     403.000us      36.636us       0.000us         0.00%       0.000us       0.000us            11  
                                                 aten::view         0.14%      13.340ms         0.14%      13.340ms       2.884us       0.000us         0.00%       0.000us       0.000us          4626  
                                               aten::expand         0.00%     209.000us         0.00%     270.000us       5.294us       0.000us         0.00%       0.000us       0.000us            51  
                                              aten::reshape         0.45%      42.905ms         1.49%     142.878ms       8.504us       0.000us         0.00%     441.697ms      26.288us         16802  
                                                aten::clone         0.20%      18.941ms         1.62%     155.338ms      33.872us       0.000us         0.00%     552.081ms     120.384us          4586  
                                         aten::_unsafe_view         0.04%       4.005ms         0.07%       7.094ms       4.612us       0.000us         0.00%       0.000us       0.000us          1538  
                                                  aten::cat         0.16%      14.993ms         1.31%     126.188ms      41.144us       0.000us         0.00%        1.222s     398.330us          3067  
                                              aten::resize_         0.28%      27.066ms         0.28%      27.066ms       5.879us       0.000us         0.00%       0.000us       0.000us          4604  
                                                aten::zeros         0.14%      13.016ms         1.06%     101.817ms      33.339us       0.000us         0.00%       4.888ms       1.601us          3054  
                                                   aten::to         0.78%      74.779ms         7.25%     696.110ms      30.367us       0.000us         0.00%     317.633ms      13.857us         22923  
                                             aten::_to_copy         0.67%      64.749ms         6.50%     623.481ms      50.946us       0.000us         0.00%     317.633ms      25.955us         12238  
                                            cudaMemcpyAsync        68.03%        6.530s        68.03%        6.530s       1.063ms       0.000us         0.00%       0.000us       0.000us          6143  
                                      cudaStreamSynchronize         1.54%     148.007ms         1.54%     148.007ms      24.129us       0.000us         0.00%       0.000us       0.000us          6134  
                                       aten::_reshape_alias         0.38%      36.356ms         0.39%      37.819ms       2.478us       0.000us         0.00%       0.000us       0.000us         15264  
                                            cudaMemsetAsync         0.22%      20.762ms         0.22%      20.762ms       6.814us       0.000us         0.00%       0.000us       0.000us          3047  
                                                 aten::item         0.13%      12.496ms        67.79%        6.507s       1.418ms       0.000us         0.00%     276.000us       0.060us          4589  
                                      cudaStreamIsCapturing         0.00%       5.000us         0.00%       5.000us       1.667us       0.000us         0.00%       0.000us       0.000us             3  
                                                 cudaMalloc         0.03%       2.440ms         0.03%       2.440ms     406.667us       0.000us         0.00%       0.000us       0.000us             6  
                                              aten::permute         0.09%       8.435ms         0.12%      11.996ms       7.846us       0.000us         0.00%       0.000us       0.000us          1529  
                                       cudaFuncSetAttribute         0.04%       3.610ms         0.04%       3.610ms       1.143us       0.000us         0.00%       0.000us       0.000us          3158  
                                               aten::select         0.34%      33.081ms         0.39%      37.883ms      12.264us       0.000us         0.00%       0.000us       0.000us          3089  
                                                 aten::relu         0.08%       7.800ms         0.69%      66.312ms      43.370us       0.000us         0.00%     102.576ms      67.087us          1529  
                                               aten::detach         0.00%      34.000us         0.00%     128.000us       9.143us       0.000us         0.00%       0.000us       0.000us            14  
                                                     detach         0.00%      94.000us         0.00%      97.000us       6.929us       0.000us         0.00%       0.000us       0.000us            14  
                                      aten::constant_pad_nd         0.00%      11.000us         0.00%      71.000us      71.000us       0.000us         0.00%      51.000us      51.000us             1  
                                            aten::unsqueeze         0.00%     131.000us         0.00%     196.000us       4.780us       0.000us         0.00%       0.000us       0.000us            41  
                                           aten::max_pool3d         0.00%      38.000us         0.00%     106.000us     106.000us       0.000us         0.00%     149.000us     149.000us             1  
                                              aten::squeeze         0.00%       8.000us         0.00%       9.000us       4.500us       0.000us         0.00%       0.000us       0.000us             2  
                                              aten::random_         0.00%      28.000us         0.00%      28.000us      28.000us       0.000us         0.00%       0.000us       0.000us             1  
    enumerate(DataLoader)#_SingleProcessDataLoaderIter._...         0.01%     829.000us         0.01%       1.073ms     268.250us       0.000us         0.00%       0.000us       0.000us             4  
                                                aten::stack         0.00%      87.000us         0.01%     516.000us      57.333us       0.000us         0.00%       2.030ms     225.556us             9  
                                              aten::detach_         0.00%      16.000us         0.00%      22.000us       2.444us       0.000us         0.00%       0.000us       0.000us             9  
                                                    detach_         0.00%       6.000us         0.00%      12.000us       1.333us       0.000us         0.00%       0.000us       0.000us             9  
                                                    aten::t         0.00%      58.000us         0.00%      76.000us      12.667us       0.000us         0.00%       0.000us       0.000us             6  
                                            aten::transpose         0.00%      55.000us         0.00%      75.000us       6.250us       0.000us         0.00%       0.000us       0.000us            12  
                                            aten::expand_as         0.00%      28.000us         0.00%      89.000us       7.417us       0.000us         0.00%       0.000us       0.000us            12  
                                           aten::contiguous         0.07%       7.113ms         1.10%     105.922ms      34.751us       0.000us         0.00%     110.384ms      36.215us          3048  
                                               aten::arange         0.02%       2.239ms         0.05%       4.421ms     736.833us       0.000us         0.00%       0.000us       0.000us             6  
                                            aten::ones_like         0.00%      42.000us         0.00%     461.000us      76.833us       0.000us         0.00%      79.000us      13.167us             6  
                                                   cudaFree         0.00%      12.000us         0.00%      12.000us       6.000us       0.000us         0.00%       0.000us       0.000us             2  
                                     cudaDeviceGetAttribute         0.00%       2.000us         0.00%       2.000us       0.143us       0.000us         0.00%       0.000us       0.000us            14  
                                       cudaGetSymbolAddress         0.00%       4.000us         0.00%       4.000us       4.000us       0.000us         0.00%       0.000us       0.000us             1  
                                   cudaEventCreateWithFlags         0.00%      12.000us         0.00%      12.000us       0.667us       0.000us         0.00%       0.000us       0.000us            18  
    cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla...         0.00%      14.000us         0.00%      14.000us       1.167us       0.000us         0.00%       0.000us       0.000us            12  
                                                 aten::ones         0.07%       7.094ms         0.44%      41.995ms      27.610us       0.000us         0.00%       1.762ms       1.158us          1521  
                                      cudaDeviceSynchronize         0.00%      19.000us         0.00%      19.000us      19.000us       0.000us         0.00%       0.000us       0.000us             1  
    -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
    Self CPU time total: 9.599s
    Self CUDA time total: 9.175s
    
    
    
    opened by ashawkey 14
  • some black fog on the test results

    some black fog on the test results

    hi, thank you very much for your work.

    I have one question: during training I reached a PSNR of 34+, but this is what happened when I rendered a new pose.

    https://user-images.githubusercontent.com/63493274/162655669-09784b3a-ca72-4b20-9aef-8ff5d7f68d8a.mp4

    In addition, I ran the original NeRF on the same data and got better results, as follows:

    https://user-images.githubusercontent.com/63493274/162655815-b73fa03d-7b16-4f87-88a6-e7236819150a.mp4

    I tried tweaking the parameters "bound" and "scale" but the results didn't improve.

    What do you think is the reason behind this result?

    Many thanks.

    opened by QAZWSX0827 13
  • Real scene test gives poor results

    Real scene test gives poor results

    hi, thank you very much for your work.

    I have one question about testing on my own real-scene data. At evaluation time, the result looks like this:

    image

    but for novel views, the result is the following:

    image image

    What do you think is the reason behind this result?

    Many thanks.

    opened by neilgogogo 13
  • Can't reproduce the experiment, low PSNR on the Lego dataset!

    Can't reproduce the experiment, low PSNR on the Lego dataset!

    With the default command "-O" on a single RTX 1080 (Lego dataset, 100 images, 30K steps / 300 epochs), the speed is fast, but I only get PSNR = 28.244854 and LPIPS = 0.070723. Is there any mistake? Please give me some advice.

    opened by Saoyu99 12
  • Training randomly broke with RuntimeError on Backpropagation

    Training randomly broke with RuntimeError on Backpropagation

    Just wondering, has anyone encountered this problem before?

    I tried torch-ngp's network_tcnn in my own code. However, training sometimes stops randomly with a weird error: "RuntimeError: Function '_module_functionBackward' returned nan values in its 0th output." This is really hard to debug since it occurs randomly.

    Is this an AMP error, or a tiny-cuda-nn error? It never happens if I don't use AMP and tiny-cuda-nn.

    Update: last time I checked, this never happens if I don't use AMP mixed precision.

    I pasted the log below:

    /root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in _module_functionBackward. Traceback of forward call that caused the error:
      File "run_nerf.py", line 333, in <module>
        train()
      File "run_nerf.py", line 324, in train
        train_nerf(args, train_dl, val_dl, hwf, i_split, near, far, render_poses)
      File "run_nerf.py", line 175, in train_nerf
        loss, psnr = train_nerf_on_epoch(args, train_dl, H, W, focal, N_rand, optimizer, loss_func, global_step, render_kwargs_train, scaler)
      File "run_nerf.py", line 55, in train_nerf_on_epoch
        rgb, disp, acc, extras = render(H, W, focal, chunk=args.chunk, rays=batch_rays, retraw=True, img_idx=img_idx, **render_kwargs_train)
      File "/home/shuaic/storage/nerf-pytorch-dev/script/models/rendering.py", line 261, in render
        all_ret = batchify_rays(rays, chunk, **kwargs)
      File "/home/shuaic/storage/nerf-pytorch-dev/script/models/rendering.py", line 206, in batchify_rays
        ret = render_rays(rays_flat[i:i+chunk], **kwargs)
      File "/home/shuaic/storage/nerf-pytorch-dev/script/models/rendering.py", line 154, in render_rays
        raw = network_query_fn(pts, viewdirs, img_idxs, network_fine, 'fine', output_transient, test_time=test_time)
      File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 288, in <lambda>
        run_NeRFH_TCNN(inputs, viewdirs, ts, network_fn, typ=typ,
      File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 413, in run_NeRFH_TCNN
        out_chunks += [fn(inputs_flat[i: i+netchunk], input_dirs_flat[i:i+netchunk], ts=ts[i:i+netchunk], output_transient=output_transient)]
      File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 250, in forward
        density_outputs = self.density(x) # [65536, 3]
      File "/home/shuaic/storage/nerf-pytorch-dev/script/models/nerfh_tcnn.py", line 162, in density
        h = self.sigma_net(x)
      File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 119, in forward
        output = _module_function.apply(
     (Triggered internally at /opt/conda/conda-bld/pytorch_1646755903507/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
      Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    83%|█████████████████████████████████████████████████████████████████████████████▉ | 8288/10001 [8:05:58<1:40:26,  3.52s/it]
    Traceback (most recent call last):
      File "run_nerf.py", line 333, in <module>
        train()
      File "run_nerf.py", line 324, in train
        train_nerf(args, train_dl, val_dl, hwf, i_split, near, far, render_poses)
      File "run_nerf.py", line 175, in train_nerf
        loss, psnr = train_nerf_on_epoch(args, train_dl, H, W, focal, N_rand, optimizer, loss_func, global_step, render_kwargs_train, scaler)
      File "run_nerf.py", line 75, in train_nerf_on_epoch
        scaler.scale(loss).backward()
      File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/root/miniconda3/envs/torch-ngp/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
        Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: Function '_module_functionBackward' returned nan values in its 0th output.
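    Not a confirmed fix for this exact bug, but a common mitigation for intermittent NaNs under AMP is to skip the iteration whenever the loss itself is non-finite (GradScaler already skips steps on inf/NaN gradients, but a guard on the loss catches divergence earlier). The sketch below is framework-agnostic; NanGuard and the loop wiring shown in the comment are illustrative names, not torch-ngp API:

```python
import math

class NanGuard:
    """Skip optimizer steps on non-finite losses; raise if it persists."""

    def __init__(self, max_consecutive_skips=10):
        self.max_skips = max_consecutive_skips
        self.skips = 0

    def should_step(self, loss_value):
        # Finite loss: reset the counter and allow backward + step.
        if math.isfinite(loss_value):
            self.skips = 0
            return True
        # Non-finite loss: skip this iteration rather than backpropagate NaN.
        self.skips += 1
        if self.skips > self.max_skips:
            raise RuntimeError(f"loss non-finite {self.skips} times in a row")
        return False

# In an AMP loop this would wrap the backward/step, e.g.:
#   if guard.should_step(loss.item()):
#       scaler.scale(loss).backward()
#       scaler.step(optimizer)
#   scaler.update()
#   optimizer.zero_grad()
guard = NanGuard()
print([guard.should_step(x) for x in [0.5, float("nan"), 0.4]])  # [True, False, True]
```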

    opened by chenusc11 10
  • Converting rendered depth to 3D point cloud in world coordinates

    Converting rendered depth to 3D point cloud in world coordinates

    Hi @ashawkey, thanks for the great project!

    How would you plot a 3D point cloud in world coordinates using the rendered depth? I saw you have a function plot_pointcloud(), but that looks like it plots in 'torch-ngp coordinates'.

    I believe I need to string together nerf_matrix_to_ngp, rays_o, rays_d, and depth (from render()), but I'm confused about how the different coordinate systems interact. I'd really appreciate some guidance on plotting 3D point clouds in world coordinates!
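    For reference, the lift itself is just rays_o + depth * rays_d, evaluated in whichever frame rays_o/rays_d are expressed in; any global transform applied to the poses (nerf_matrix_to_ngp applies a scale and offset) then has to be inverted. A per-ray sketch in plain Python, assuming depth is the distance t along a normalized direction; scale and offset here are hypothetical placeholders for whatever the dataloader applied:

```python
def depth_to_world_point(ray_o, ray_d, depth, scale=1.0, offset=(0.0, 0.0, 0.0)):
    """Lift one rendered depth sample to a 3D point.

    ray_o, ray_d: ray origin/direction (3-tuples) in the frame the renderer
        used; depth is the distance t along a normalized direction.
    scale, offset: hypothetical globals the dataloader applied to the poses;
        inverting them maps the point back to the original world frame.
    """
    # Point along the ray in the renderer's frame.
    p = [o + depth * d for o, d in zip(ray_o, ray_d)]
    # Undo the assumed global transform: world = (p - offset) / scale.
    return [(c - off) / scale for c, off in zip(p, offset)]

pt = depth_to_world_point((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 2.0)
# pt == [0.0, 0.0, 2.0]
```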

    opened by kevin-thankyou-lin 9
  • Compilation issue - RuntimeError: Error building extension '_hash_encoder'

    Compilation issue - RuntimeError: Error building extension '_hash_encoder'

    Thanks for the nice work! I ran into the following issue when running python train_nerf.py data/fox --workspace trial_nerf. Do you have any thoughts? Many thanks for your help!

    Traceback (most recent call last):
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
        env=env)
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/subprocess.py", line 512, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "train_nerf.py", line 3, in <module>
        from nerf.network import NeRFNetwork
      File "/home/wangjk/programs/torch-ngp/nerf/network.py", line 9, in <module>
        from encoding import get_encoder
      File "/home/wangjk/programs/torch-ngp/encoding.py", line 6, in <module>
        from hashencoder import HashEncoder
      File "/home/wangjk/programs/torch-ngp/hashencoder/__init__.py", line 1, in <module>
        from .hashgrid import HashEncoder
      File "/home/wangjk/programs/torch-ngp/hashencoder/hashgrid.py", line 8, in <module>
        from .backend import _backend
      File "/home/wangjk/programs/torch-ngp/hashencoder/backend.py", line 12, in <module>
        sources=[os.path.join(_src_path, 'src', f) for f in [
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
        keep_intermediates=keep_intermediates)
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
        is_standalone=is_standalone)
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
        error_prefix=f"Error building extension '{name}'")
      File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error building extension '_hash_encoder': [1/2] /home/wangjk/anaconda3/envs/largesteps/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/largesteps/include -isystem /home/wangjk/anaconda3/envs/largesteps/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
    FAILED: hashencoder.cuda.o 
    /home/wangjk/anaconda3/envs/largesteps/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/largesteps/include -isystem /home/wangjk/anaconda3/envs/largesteps/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
    /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu(26): error: no instance of overloaded function "atomicAdd" matches the argument list
                argument types are: (__half *, c10::Half)
    
    1 error detected in the compilation of "/home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu".
    ninja: build stopped: subcommand failed.
    

    More info:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Wed_Jun__2_19:15:15_PDT_2021
    Cuda compilation tools, release 11.4, V11.4.48
    Build cuda_11.4.r11.4/compiler.30033411_0
    
    >>> import torch
    >>> torch.version.cuda
    '11.3'
    >>> torch.__version__
    '1.10.0'
    
    opened by wangjksjtu 9
  • Does NDC help for forward-facing datasets?

    Does NDC help for forward-facing datasets?

    Hi, thanks for the awesome work.

    I have tested the code on several forward-facing scenes captured by myself, and found that the parameters such as offset and bound need to be tuned carefully to get proper results. I want to ask whether first converting to NDC space, as the original NeRF does, can avoid these problems, since NDC maps the whole scene into a unit cube. If so, any suggestions on how to use NDC space? Thanks!
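    For reference, the original NeRF's NDC conversion (ndc_rays in the reference implementation) shifts each ray onto the near plane and remaps the viewing frustum into the cube [-1, 1]^3. A single-ray, plain-Python transcription of that math, assuming the OpenGL convention of the camera looking down -z:

```python
def ndc_ray(H, W, focal, near, ray_o, ray_d):
    """Map one camera-space ray into NDC, following the original NeRF.

    Returns (origin, direction) such that the frustum from the near
    plane to infinity maps into [-1, 1]^3.
    """
    ox, oy, oz = ray_o
    dx, dy, dz = ray_d
    # Shift the origin onto the near plane (z = -near).
    t = -(near + oz) / dz
    ox, oy, oz = ox + t * dx, oy + t * dy, oz + t * dz
    # Projective remap of origin and direction.
    o0 = -1.0 / (W / (2.0 * focal)) * ox / oz
    o1 = -1.0 / (H / (2.0 * focal)) * oy / oz
    o2 = 1.0 + 2.0 * near / oz
    d0 = -1.0 / (W / (2.0 * focal)) * (dx / dz - ox / oz)
    d1 = -1.0 / (H / (2.0 * focal)) * (dy / dz - oy / oz)
    d2 = -2.0 * near / oz
    return (o0, o1, o2), (d0, d1, d2)

# A central ray from the camera origin: the near plane lands at ndc z = -1
# and the direction carries z toward +1 at infinity.
o, d = ndc_ray(800, 800, 1000.0, 1.0, (0.0, 0.0, 0.0), (0.0, 0.0, -1.0))
# o[2] == -1.0, d[2] == 2.0
```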

    opened by Harper714 8
  • Distributed data parallel training

    Distributed data parallel training

    I'm trying to train torch-ngp on multiple GPUs.

    I modified the dataloader by passing a DistributedSampler, passed the local_rank and world_size to Trainer, and then ran the script via torch.distributed.run.

    Training proceeded normally for the first several epochs (e.g. 6), then crashed with the following:

    Traceback (most recent call last):                                                                                                                                     
      File "train_nerf_ddp.py", line 207, in <module>                                                                                                                      
        trainer.train(train_loader, valid_loader, opt.num_epochs)                                                                                                          
      File "/data/torch-ngp/nerf_ddp/utils.py", line 407, in train                                                                                                         
        self.train_one_epoch(train_loader)                                                                                                                                 
      File "/data/torch-ngp/nerf_ddp/utils.py", line 621, in train_one_epoch                                                                                               
        preds, truths, loss = self.train_step(data)                                                                                                                        
      File "/data/torch-ngp/nerf_ddp/utils.py", line 303, in train_step                                                                                                    
        outputs = self.model.render(rays_o, rays_d, z_far=self.depth_scale, staged=False, bg_color=bg_color, perturb=True, **self.conf)                                    
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 404, in render                                                                                                     
        depth, image, depth_var = _run(rays_o, rays_d, num_steps, upsample_steps, bg_color, perturb, z_far=z_far)                                                          
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 164, in run                                                                                                        
        sigmas, rgbs = self(pts.reshape(B, -1, 3), dirs.reshape(B, -1, 3))                                                                                                 
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl                                                    
        return forward_call(*input, **kwargs)                                                                                                                              
      File "/data/torch-ngp/nerf_ddp/network.py", line 73, in forward                                                                                                      
        x = self.encoder(x, bound=self.bound)                                                                                                                              
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl                                                    
        return forward_call(*input, **kwargs)                                                                                                                              
      File "/data/torch-ngp/hashencoder/hashgrid.py", line 137, in forward                                                                                                 
        outputs = hash_encode(inputs, self.embeddings, self.offsets, self.per_level_scale, self.base_resolution, inputs.requires_grad)                                     
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 94, in decorate_fwd                                               
        return fwd(*args, **kwargs)                                                                                                                                        
      File "/data/torch-ngp/hashencoder/hashgrid.py", line 41, in forward                                                                                                  
        outputs = outputs.permute(1, 0, 2).reshape(B, L * C)      
    RuntimeError: CUDA error: an illegal memory access was encountered
    loss=0.0212 (0.0189), psnr=14.04 (15.27):    4% 1/23 [00:00<00:19,  1.12it/s]terminate called after throwing an instance of 'c10::CUDAError'
      what():  CUDA error: an illegal memory access was encountered
    Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1640811806235/work/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f20ca556d62 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #1: <unknown function> + 0x1c613 (0x7f210fa74613 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
    frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f210fa75022 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
    frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f20ca540314 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #4: <unknown function> + 0x299129 (0x7f2163a7c129 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #5: <unknown function> + 0xada181 (0x7f21642bd181 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f21642bd482 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #7: <unknown function> + 0x10d0fc (0x55a5309840fc in /data/miniconda3/envs/ngp/bin/python)
    frame #8: <unknown function> + 0x10fbcc (0x55a530986bcc in /data/miniconda3/envs/ngp/bin/python)
    frame #9: <unknown function> + 0x217ca3 (0x55a530a8eca3 in /data/miniconda3/envs/ngp/bin/python)
    frame #10: <unknown function> + 0x10fd05 (0x55a530986d05 in /data/miniconda3/envs/ngp/bin/python)
    frame #11: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
    frame #12: <unknown function> + 0x10d0fc (0x55a5309840fc in /data/miniconda3/envs/ngp/bin/python)
    frame #13: <unknown function> + 0x10fbcc (0x55a530986bcc in /data/miniconda3/envs/ngp/bin/python)
    frame #14: <unknown function> + 0x217ca3 (0x55a530a8eca3 in /data/miniconda3/envs/ngp/bin/python)
    frame #15: <unknown function> + 0x10fd35 (0x55a530986d35 in /data/miniconda3/envs/ngp/bin/python)
    frame #16: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
    frame #17: <unknown function> + 0x10fd35 (0x55a530986d35 in /data/miniconda3/envs/ngp/bin/python)
    frame #18: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
    frame #19: _PyModule_ClearDict + 0x473 (0x55a530a1d723 in /data/miniconda3/envs/ngp/bin/python)
    frame #20: PyImport_Cleanup + 0x408 (0x55a530a62f88 in /data/miniconda3/envs/ngp/bin/python)
    frame #21: Py_FinalizeEx + 0x79 (0x55a530ac94f9 in /data/miniconda3/envs/ngp/bin/python)
    frame #22: Py_RunMain + 0x1bc (0x55a530acc87c in /data/miniconda3/envs/ngp/bin/python)
    frame #23: Py_BytesMain + 0x39 (0x55a530accc69 in /data/miniconda3/envs/ngp/bin/python)
    frame #24: __libc_start_main + 0xe7 (0x7f219c264c87 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #25: <unknown function> + 0x1f7427 (0x55a530a6e427 in /data/miniconda3/envs/ngp/bin/python)
    

    I tried using --tcnn, and it crashed too:

    Traceback (most recent call last):                                                                                                                                     
      File "train_nerf_ddp.py", line 207, in <module>                                                                                                                      
        trainer.train(train_loader, valid_loader, opt.num_epochs)                                                                                                          
      File "/data/torch-ngp/nerf_ddp/utils.py", line 407, in train                                                                                                         
        self.train_one_epoch(train_loader)
      File "/data/torch-ngp/nerf_ddp/utils.py", line 621, in train_one_epoch
        preds, truths, loss = self.train_step(data)
      File "/data/torch-ngp/nerf_ddp/utils.py", line 303, in train_step
        outputs = self.model.render(rays_o, rays_d, z_far=self.depth_scale, staged=False, bg_color=bg_color, perturb=True, **self.conf)
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 404, in render
        depth, image, depth_var = _run(rays_o, rays_d, num_steps, upsample_steps, bg_color, perturb, z_far=z_far)
      File "/data/torch-ngp/nerf_ddp/renderer.py", line 164, in run
        sigmas, rgbs = self(pts.reshape(B, -1, 3), dirs.reshape(B, -1, 3))
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/data/torch-ngp/nerf_ddp/network_tcnn.py", line 93, in forward
        x = self.encoder(x)
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 82, in forward
        output = _module_function.apply(                                                                                                                                   
      File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 31, in forward                                                              
        native_ctx, output = native_tcnn_module.fwd(input, params)   
        RuntimeError: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:574 cudaDeviceSynchronize() failed with error an illegal memory access was encountered
    loss=0.0471 (0.0579), psnr=8.83 (7.39):    4% 1/23 [00:01<00:23,  1.05s/it]Could not free memory: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:128 cudaFree(rawptr) failed with error an illegal memory access was encountered
    Could not free memory: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:128 cudaFree(rawptr) failed with error an illegal memory access was encountered
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
    /tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
    loss=0.0592 (0.0557), psnr=7.68 (7.02):   17% 4/23 [00:01<00:05,  3.23it/s]terminate called after throwing an instance of 'c10::Error'
      what():  NCCL error in: /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:181, unhandled cuda error, NCCL version 21.0.3
    Process Group destroyed on rank 2
    Exception raised from ncclCommAbort at /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:181 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2124ff7d62 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
    frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f2124ff468b in /data/miniconda3/envs/ngp/lib/python3.8/sit
    e-packages/torch/lib/libc10.so)
    frame #2: <unknown function> + 0x107c48e (0x7f2176de148e in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
    frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x113 (0x7f2176dc9d93 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
    frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x9 (0x7f2176dc9fb9 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
    frame #5: <unknown function> + 0xe67b76 (0x7f21bf0ebb76 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #6: <unknown function> + 0xe4d885 (0x7f21bf0d1885 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #7: <unknown function> + 0x2a1b00 (0x7f21be525b00 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #8: <unknown function> + 0x2a2d6e (0x7f21be526d6e in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
    frame #9: <unknown function> + 0x10d098 (0x56344cddb098 in /data/miniconda3/envs/ngp/bin/python)
    frame #10: <unknown function> + 0x10fbcc (0x56344cdddbcc in /data/miniconda3/envs/ngp/bin/python)
    frame #11: PyDict_Clear + 0x14b (0x56344cddef6b in /data/miniconda3/envs/ngp/bin/python)
    frame #12: <unknown function> + 0x110ff9 (0x56344cddeff9 in /data/miniconda3/envs/ngp/bin/python)
    frame #13: <unknown function> + 0x130246 (0x56344cdfe246 in /data/miniconda3/envs/ngp/bin/python)
    frame #14: _PyGC_CollectNoFail + 0x2a (0x56344cf08a2a in /data/miniconda3/envs/ngp/bin/python)
    frame #15: PyImport_Cleanup + 0x2ce (0x56344ceb9e4e in /data/miniconda3/envs/ngp/bin/python)
    frame #16: Py_FinalizeEx + 0x79 (0x56344cf204f9 in /data/miniconda3/envs/ngp/bin/python)
    frame #17: Py_RunMain + 0x1bc (0x56344cf2387c in /data/miniconda3/envs/ngp/bin/python)
    frame #18: Py_BytesMain + 0x39 (0x56344cf23c69 in /data/miniconda3/envs/ngp/bin/python)
    frame #19: __libc_start_main + 0xe7 (0x7f21f6d05c87 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #20: <unknown function> + 0x1f7427 (0x56344cec5427 in /data/miniconda3/envs/ngp/bin/python)
    
    

    Did I do something wrong? Or does the code simply not support multi-GPU training?

    opened by xmk2222 8
  • Tips on how to tune "bound" & "scale" for a new scene?

    I have a CG-generated dataset containing correct poses and depth range, exactly like the Blender dataset. I used it to train a NeRF successfully with the NeRF-pytorch code, but I failed to train one with torch-ngp.

    I think it may be due to an incorrect setting of "bound" & "scale" for this scene. Do you have any tips on how to tune "bound" & "scale" for a new scene?

    opened by JasonLSC 7
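    Not the author, but one rough heuristic: `scale` should shrink the scene so the region of interest (and usually the cameras) fits inside `[-bound, bound]^3`. Below is a minimal sketch in plain Python that derives a candidate scale from the camera positions of a Blender-style `transforms.json`; `suggest_scale` and `margin` are hypothetical names for illustration, not part of torch-ngp.

```python
import math

def suggest_scale(poses, bound=1.0, margin=0.75):
    """Suggest a scene scale from camera-to-world poses (hypothetical helper).

    poses: list of 4x4 nested lists (camera-to-world, as in transforms.json).
    Returns a scale such that scale * max_camera_radius == margin * bound,
    i.e. all cameras land comfortably inside [-bound, bound]^3.
    """
    # The translation column of each 4x4 pose is the camera center.
    centers = [(p[0][3], p[1][3], p[2][3]) for p in poses]
    # Use the centroid of the cameras as the scene center.
    cx = sum(c[0] for c in centers) / len(centers)
    cy = sum(c[1] for c in centers) / len(centers)
    cz = sum(c[2] for c in centers) / len(centers)
    radius = max(math.dist((cx, cy, cz), c) for c in centers)
    return margin * bound / radius

# Two cameras on a circle of radius 4 around the origin:
poses = [
    [[1, 0, 0,  4], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    [[1, 0, 0, -4], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
]
print(suggest_scale(poses, bound=1.0))  # 0.75 * 1.0 / 4 = 0.1875
```

    If the object of interest is much smaller than the camera orbit (object-centric captures), you may want a larger scale than this camera-based estimate; treat it as a starting point for a manual sweep.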
  • The training process on custom data dies, what to do?

    Versions: Ubuntu 18.04, CUDA 11.2, Torch 1.11. I prepared the data with COLMAP and started training on the custom dataset; the dataloader loads the data and then the run is interrupted (it looks as if I killed the process, but I didn't press anything). I also tried the Fox dataset and everything works, but on the custom data it dies. How can I fix it?

    I also tried it on WSL2 with Ubuntu 22.04, CUDA 11.3, Torch 1.12, and the situation is about the same. I installed all dependencies and built raytracing. On the fox dataset everything is fine, but on my custom one it loads the data and then prints "Killed". The dataset contains 325 images at 1080p resolution; I tested it with NVIDIA instant-ngp and it worked fine. Help please.

    opened by MykytaKyt 0
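    Not a fix, but a quick sanity check: on Linux, a bare "Killed" with no traceback is usually the kernel OOM killer, and NeRF dataloaders in many of these repos preload every image into RAM. A back-of-the-envelope estimate for the dataset described above (325 images at 1080p), assuming RGB float32 after preprocessing — this is an estimate, not a measurement of what torch-ngp actually allocates:

```python
def preload_bytes(n_images, width, height, channels=3, bytes_per_value=4):
    """Rough RAM needed to hold all images as float32 RGB tensors."""
    return n_images * width * height * channels * bytes_per_value

total = preload_bytes(325, 1920, 1080)
print(f"{total / 2**30:.1f} GiB")  # ~7.5 GiB for pixel data alone
```

    With an RGBA channel or additional cached rays, the footprint grows further, which can easily exceed the default WSL2 memory cap. Downscaling the images or reducing the image count is a cheap way to confirm the hypothesis.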
  • About the inference speed

    Hi, I have some questions about the inference speed of TensoRF. Is this implementation of TensoRF faster than the original repo? The original repo is implemented purely in PyTorch, so its inference speed is slow. Since this repo is implemented with CUDA, shouldn't it be much faster?

    According to the performance-reference, the speed still doesn't seem fast?

    Many thanks!

    opened by weihaosky 0
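    For questions like this, a direct wall-clock comparison on your own hardware is more telling than reasoning from "CUDA vs. pure PyTorch". A minimal, framework-agnostic timing harness is sketched below; the two "render" functions are placeholders standing in for the two implementations, not real TensoRF code.

```python
import time

def benchmark(fn, *args, warmup=2, iters=10):
    """Average wall-clock seconds per call. For GPU code you would also need
    to synchronize (e.g. torch.cuda.synchronize()) before reading the clock,
    since CUDA kernel launches are asynchronous."""
    for _ in range(warmup):
        fn(*args)
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters

# Placeholder "renderers" computing the same result two ways:
def render_baseline(n):
    return sum(i * i for i in range(n))       # O(n) loop

def render_fast(n):
    return n * (n - 1) * (2 * n - 1) // 6     # closed form of the same sum

slow = benchmark(render_baseline, 100_000)
fast = benchmark(render_fast, 100_000)
print(f"speedup: {slow / fast:.0f}x")
```

    The asynchronous-launch caveat in the docstring matters a lot here: without synchronization, a CUDA implementation can appear deceptively fast because the timer stops before the kernels finish.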
  • Multi-camera dataset DNeRF

    Hi,

    I want to use torch-ngp to create D-NeRFs; however, I need to create and load a dataset with videos from multiple camera viewpoints. I'm wondering whether this implementation of D-NeRF can/does support this?

    thanks

    opened by fotfotfive 0
  • Performance on MIPNeRF scenes (garden, bike)

    Thanks for the great release!

    Has anyone managed to reproduce sharp rendering results on MIPNeRF's scenes (namely garden, etc.) using torch-ngp? If yes, what values of bound and scale worked best?

    My current novel view synthesis results are not of high fidelity. Would appreciate any quick cues.

    Thanks & Best Regards Shivam

    opened by ShivamDuggal4 0
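    Not an answer for torch-ngp specifically, but for unbounded outdoor scenes like garden, raising bound alone rarely suffices; mip-NeRF 360 instead warps far-away space with a scene contraction. A sketch of that contraction in plain Python, for intuition only (this is not code from this repo):

```python
import math

def contract(x):
    """mip-NeRF 360 style contraction: identity inside the unit ball,
    maps all of R^3 into a ball of radius 2."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= 1.0:
        return tuple(x)
    return tuple((2.0 - 1.0 / norm) * v / norm for v in x)

print(contract((0.5, 0.0, 0.0)))    # inside the unit ball: unchanged
print(contract((100.0, 0.0, 0.0)))  # far away: squeezed toward radius 2
```

    Without such a warp, a hash-grid model must spend its bound on both the foreground and a vast, mostly empty background, which is one plausible reason for the soft renderings on these scenes.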
Owner: hawkey (nameless kiui.)