VolSDF - Volume Rendering of Neural Implicit Surfaces

Overview

Volume Rendering of Neural Implicit Surfaces

Project Page | Paper | Data

This repository contains an implementation for the NeurIPS 2021 paper:
Volume Rendering of Neural Implicit Surfaces
Lior Yariv¹, Jiatao Gu², Yoni Kasten¹, Yaron Lipman¹²
¹Weizmann Institute of Science, ²Facebook AI Research

The paper introduces VolSDF: a volume rendering framework for implicit neural surfaces that makes it possible to learn high-fidelity geometry from a sparse set of input images.

Setup

Installation Requirements

The code is compatible with Python 3.8 and PyTorch 1.7. In addition, the following packages are required:
numpy, pyhocon, plotly, scikit-image, trimesh, imageio, opencv, torchvision.

You can create an anaconda environment called volsdf with the required dependencies by running:

conda env create -f environment.yml
conda activate volsdf
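
Alternatively, a pip-based setup along these lines should also work (an untested suggestion, not an official instruction; on PyPI the opencv package is typically opencv-python):

pip install numpy pyhocon plotly scikit-image trimesh imageio opencv-python torchvision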

Data

We apply our multiview surface reconstruction model to real 2D images from two datasets: DTU and BlendedMVS. The selected scans evaluated in the paper can be downloaded using:

bash data/download_data.sh 

For more information on the data convention and on how to run VolSDF on new data, please have a look at the data convention.

Usage

Multiview 3D reconstruction

For training VolSDF run:

cd ./code
python training/exp_runner.py --conf ./confs/dtu.conf --scan_id SCAN_ID

where SCAN_ID is the id of the scene to reconstruct.

To run on the BlendedMVS dataset, which has more complex backgrounds, use --conf ./confs/bmvs.conf.

Evaluation

To produce the meshed surface and renderings, run:

cd ./code
python evaluation/eval.py  --conf ./confs/dtu.conf --scan_id SCAN_ID --checkpoint CHECKPOINT [--eval_rendering]

where CHECKPOINT is the epoch you wish to evaluate, or 'latest' to use the most recent epoch. Turning on --eval_rendering will additionally render the training images and evaluate their PSNR.

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{yariv2021volume,
  title={Volume rendering of neural implicit surfaces},
  author={Yariv, Lior and Gu, Jiatao and Kasten, Yoni and Lipman, Yaron},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}
Comments
  • LaplaceDensity

    Hello! Thank you for sharing this wonderful work. I wonder why the author used alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta)) instead of psi = torch.where(sdf > 0, exp, 1 - exp) with exp = 0.5 * torch.expm1(-sdf.abs() / beta).

    opened by WillKen 4
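
    A minimal sketch (not the repository code; it assumes the alternative form is meant with torch.exp rather than torch.expm1) checking that the expm1 expression above equals the piecewise Laplace-CDF form:

    import torch

    beta, alpha = 0.1, 10.0
    sdf = torch.linspace(-0.5, 0.5, 11)

    # expm1 form from the question: alpha * Psi_beta(-sdf), numerically stable near sdf = 0
    dens_expm1 = alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta))

    # piecewise form of the Laplace CDF evaluated at -sdf
    exp_term = 0.5 * torch.exp(-sdf.abs() / beta)
    dens_piecewise = alpha * torch.where(sdf > 0, exp_term, 1.0 - exp_term)

    print(torch.allclose(dens_expm1, dens_piecewise))  # True: the two forms agree; expm1 only avoids cancellation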
  • Some question on rend_util.py

    Hi, thank you for your decent work. I have been trying to follow your work recently, and I met some problems that I hope to get answers to in this issue.

    1. First question: In the function load_K_Rt_from_P at line 48 in rend_util.py: https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L48-L50 This code really confuses me and I'm not able to explain it. I read the following code at line 78 in rend_util.py: https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L73-L78 It seems that you use pose as a cameraToWorld matrix. I did an experiment first; the following code is from Stack Overflow:
    k = np.array([[631,   0, 384],
                  [  0, 631, 288],
                  [  0,   0,   1]])
    r = np.array([[-0.30164902,  0.68282439, -0.66540117],
                  [-0.63417301,  0.37743435,  0.67480953],
                  [ 0.71192167,  0.6255351 ,  0.3191761 ]])
    t = np.array([ 3.75082481, -1.18089565,  1.06138781])
    
    C = np.eye(4)
    C[:3, :3] = k @ r
    C[:3, 3] = k @ r @ t
    
    out = cv2.decomposeProjectionMatrix(C[:3, :])
    

    If I convert r and t into homogeneous coordinates and take R@T, which is the worldToCamera matrix, I get:

    >>> T=np.eye(4)
    >>> T[:3,3]=t
    >>> R=np.eye(4)
    >>> R[:3,:3]=r
    >>> R@T
    array([[-0.30164902,  0.68282439, -0.66540117, -2.64402567],
           [-0.63417301,  0.37743435,  0.67480953, -2.10814783],
           [ 0.71192167,  0.6255351 ,  0.3191761 ,  2.27037141],
           [ 0.        ,  0.        ,  0.        ,  1.        ]])
    
    

    Then if I take the inverse of R@T, which I think is the cameraToWorld matrix, I get:

    >>> np.linalg.inv((R@T))
    array([[-0.30164902, -0.63417301,  0.71192166, -3.75082481],
           [ 0.6828244 ,  0.37743435,  0.6255351 ,  1.18089565],
           [-0.66540117,  0.67480953,  0.3191761 , -1.06138781],
           [ 0.        ,  0.        ,  0.        ,  1.        ]])
    

    This result suggests that, to get the cameraToWorld matrix, we should concatenate R^(-1) and -T, instead of the R^(-1) and T used at line 31 in rend_util.py: https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L48-L50 I don't know why it takes R^(-1) and T here.

    2. Second question: In the function lift at line 96 in rend_util.py: https://github.com/lioryariv/volsdf/blob/a974c883eb70af666d8b4374e771d76930c806f3/code/utils/rend_util.py#L96-L109 I don't know why x_lift takes y and fy into consideration. It seems that sk should be 0, but I tested it at runtime and got:
    intrinsics
    tensor([[[ 2.8923e+03, -2.1742e-04,  8.2320e+02,  0.0000e+00],
             [ 0.0000e+00,  2.8832e+03,  6.1907e+02,  0.0000e+00],
             [ 0.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00],
             [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  1.0000e+00]]],
           device='cuda:0')
    

    It seems that sk is not 0. So the transformation becomes:

    $$ \begin{bmatrix} x'\\ y'\\ z \end{bmatrix} = \begin{bmatrix} f_x & sk & c_x & 0\\ 0 & f_y & c_y & 0\\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x\_lift\\ y\_lift\\ z\\ 1 \end{bmatrix} $$

    Here [x,y,z,1] is the point in the camera coordinates. I find that:

    $$ x'=f_x \cdot x\_lift + sk \cdot y\_lift + c_x \cdot z $$

    The actual result of x_lift is:

    $$ x\_lift = \cfrac{x'-c_x \cdot z - sk \cdot y\_lift}{f_x} $$

    But in rend_util.py, x_lift seems to be:

    $$ x\_lift = \cfrac{(x'-c_x)\cdot z - sk \cdot y\_lift}{f_x} $$

    So when z=1, the code is correct. Would it be better if it were simply changed to:

    x_lift = (x / z - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z
    

    (a division by z is added to x)

    The first question matters more to me than the second one. Would you please explain the logic of the pose matrix to me?

    Hope this issue would help other people as well.

    opened by DavidXu-JJ 2
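
    One possible explanation for the first question (a sketch, not an answer from the authors): cv2.decomposeProjectionMatrix returns the homogeneous camera center rather than a world-to-camera translation, so stacking R^T with that vector already gives a camera-to-world pose:

    import numpy as np
    import cv2

    r = np.array([[-0.30164902,  0.68282439, -0.66540117],
                  [-0.63417301,  0.37743435,  0.67480953],
                  [ 0.71192167,  0.6255351 ,  0.3191761 ]])
    t = np.array([ 3.75082481, -1.18089565,  1.06138781])
    k = np.array([[631., 0., 384.], [0., 631., 288.], [0., 0., 1.]])

    P = k @ np.hstack([r, (r @ t)[:, None]])    # same projection matrix as in the question
    K, R, c_h = cv2.decomposeProjectionMatrix(P)[:3]
    c = (c_h[:3] / c_h[3]).ravel()              # homogeneous camera center, in world coordinates

    pose = np.eye(4)
    pose[:3, :3] = R.T                          # camera-to-world rotation
    pose[:3, 3] = c
    print(c)                                    # ~[-3.75, 1.18, -1.06], i.e. -t, matching inv(R@T)[:3, 3]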
  • Evaluation on BlendedMVS

    Dear @lioryariv

    Thank you so much for your work. I have not found how to evaluate VolSDF on the BlendedMVS dataset, i.e. computing the chamfer distance. The BlendedMVS repo doesn't seem to provide such a script.

    Would you mind pointing me to the right place? Thanks!

    opened by giwel 2
  • Model not training, evaluation generate blank, lower resolution is not handled

    After running your source code, I found these issues:

    • Your source code does not handle lower resolutions. E.g., for the DTU dataset, if you lower the resolution to something like [ 300, 400 ] the code crashes; the default resolution [ 1200, 1600 ] is hard-coded in the source code.

    • For the same reason, the first issue also crashes the code in the PSNR computation.

    • The renderings during evaluation come out as blank squares. Below you can see the renderings from scan 114 of the DTU dataset:

    [screenshot: blank evaluation renderings from scan 114]
    • It seems that your model is not training at all, because the loss and the PSNR value do not change from the beginning and remain at around 0.1 and 12, respectively.

    • Your source code will not run without lowering the config values; you immediately get an OOM error.

    • Your code generates messages like face_normals incorrect shape, ignoring! and face_normals all zero, ignoring! all the time. I still don't know what they mean, or whether they are the cause of the issues above.

    Some of the issues mentioned above have already been reported by others in other issues. I did not see any response from the authors, so I am writing them here. I am looking forward to the authors' response.

    opened by ghasemikasra39 1
  • about training on other datasets

    Dear authors: very nice work! But when I ran the program, I met some "problems". After loading the dataset, I found that the program displays "face_normals incorrect shape, ignoring!"; however, I didn't see where this sentence is printed in the code. So I want to know whether I am missing something and what is going on when I see "face_normals incorrect shape...". Regards

    opened by Ayiing 1
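
    As far as I can tell (an assumption, not confirmed by the authors), these messages are logged by the trimesh library when a mesh is built with face normals of a mismatching shape (or all zeros), so they come from the mesh-export path rather than from VolSDF's own code. A minimal sketch that triggers the same warning:

    import logging
    import numpy as np
    import trimesh

    logging.basicConfig(level=logging.WARNING)  # make trimesh's log messages visible

    # face normals with the wrong shape -> trimesh logs "face_normals incorrect shape, ignoring!"
    mesh = trimesh.Trimesh(
        vertices=np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
        faces=np.array([[0, 1, 2]]),
        face_normals=np.zeros((2, 3)),
    )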
  • Question about normalization matrix

    Hello! Thank you for sharing this wonderful work; VolSDF is fantastic work, at least for me. I have a minor question about the code in preprocess.py, though.

    I wonder why the multiplication below is needed rather than only the inverse of K. If you could answer, it would be really appreciated. Thanks a lot!

    =========================================

      def get_center_point(num_cams,cameras):
    
        ....
        v = np.linalg.inv(K) @ np.array([800, 600, 1]) #why is it needed?
        v = v / np.linalg.norm(v)
    
        ....  
      return soll,camera_centers
    

    =======================================

    opened by yeong5366 1
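
    One reading of that line (a sketch with hypothetical intrinsics, not the authors' answer): inv(K) @ [u, v, 1] back-projects pixel (u, v) to a viewing direction in camera coordinates, and (800, 600) is the center of a 1600x1200 DTU image, so v ends up being each camera's central viewing ray rather than just the inverse of K on its own:

    import numpy as np

    # hypothetical DTU-like intrinsics, for illustration only
    K = np.array([[2892.3,    0.0,  823.2],
                  [   0.0, 2883.2,  619.1],
                  [   0.0,    0.0,    1.0]])

    v = np.linalg.inv(K) @ np.array([800.0, 600.0, 1.0])  # direction through the image center
    v = v / np.linalg.norm(v)
    print(v)                                              # unit viewing ray in camera coordinates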
  • about implementation of eq (6)

    Hello! In eq. (6), it shows tau_t = density * transparency (or transmittance), but in the code, tau_t (or the weights) = alpha * transmittance, where alpha = 1.0 - torch.exp(-dists * density).

    opened by redrock303 0
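
    A minimal numerical check (a sketch under the usual piecewise-constant-density quadrature assumption, not the repository code): the discrete weights alpha * transmittance are exactly the per-interval integral of density * transmittance from eq. (6), which is why alpha appears instead of the raw density:

    import torch

    density = torch.tensor([0.2, 1.5, 4.0, 0.3])
    dists   = torch.tensor([0.1, 0.1, 0.1, 0.1])

    # discrete rendering weights: alpha_i * T_i with alpha_i = 1 - exp(-density_i * dist_i)
    alpha = 1.0 - torch.exp(-density * dists)
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha]), dim=0)[:-1]
    weights = alpha * T

    # integrating density(t) * T(t) over each interval by fine sub-sampling gives the same values
    fine = 1000
    d_fine = density.repeat_interleave(fine)
    dt = (dists / fine).repeat_interleave(fine)
    T_fine = torch.exp(-torch.cumsum(torch.cat([torch.zeros(1), d_fine * dt])[:-1], dim=0))
    w_fine = (d_fine * T_fine * dt).reshape(4, fine).sum(dim=1)
    print(weights)
    print(w_fine)  # matches closely: alpha * T is the integrated density * transmittance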
  • Questions about the camera-normalization step

    Hi! Thanks for the amazing work!

    I've got a question and hope you could give me a hint. To my understanding, the "camera normalization" step (as stated in supplementary material B.1) is only used for placing the cameras inside a sphere in world coordinates. In the .conf file there is a parameter scene_bounding_sphere, and according to the class ImplicitNetwork, if scene_bounding_sphere is set to 0, the step "Clamping the SDF with the scene bounding sphere" is skipped.

    My question is whether I can skip this step. In my experimental setting, the cameras are placed on the surface of a sphere (rather than at its center). If I do not use normalize_cameras.py to follow the data convention, and instead set the parameter scene_bounding_sphere to the radius of the sphere, I'm not sure whether it would affect the opacity approximation error bound.

    Could you explain a bit more about the role of the "camera normalization" step? Thank you very much! 😆

    opened by raynehe 0
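
    For context, a minimal sketch of what clamping the SDF with the scene bounding sphere could look like (my reading of the description above, not the repository code): the network SDF is upper-bounded by the analytic SDF of a sphere of radius scene_bounding_sphere, and a radius of 0 disables the clamp:

    import torch

    def clamp_with_bounding_sphere(sdf, points, radius):
        # radius <= 0 disables the clamping, matching the behaviour described above
        if radius <= 0:
            return sdf
        sphere_sdf = radius - points.norm(dim=-1, keepdim=True)  # SDF of the bounding sphere
        return torch.minimum(sdf, sphere_sdf)

    # points outside the sphere get a negative (clamped) SDF regardless of the network output
    pts = torch.tensor([[0.0, 0.0, 0.5], [0.0, 0.0, 4.0]])
    net_sdf = torch.tensor([[0.2], [0.2]])
    print(clamp_with_bounding_sphere(net_sdf, pts, 3.0))  # [[0.2], [-1.0]]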
Owner: Lior Yariv