Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Overview

nvdiffrec

Teaser image

Joint optimization of topology, materials and lighting from multi-view image observations as described in the paper Extracting Triangular 3D Models, Materials, and Lighting From Images.

For differentiable marching tetrahedra, we have adapted code from NVIDIA's Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research.

Licenses

Copyright © 2022, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License.

For business inquiries, please contact [email protected]

Installation

Requires Python 3.6+, VS2019+, Cuda 11.3+ and PyTorch 1.10+

Tested in Anaconda3 with Python 3.9 and PyTorch 1.10

One time setup (Windows)

Install the Cuda toolkit (required to build the PyTorch extensions). We support Cuda 11.3 and above. Pick the appropriate version of PyTorch compatible with the installed Cuda toolkit. Below is an example with Cuda 11.3

conda create -n dmodel python=3.9
activate dmodel
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install ninja imageio PyOpenGL glfw xatlas gdown
pip install git+https://github.com/NVlabs/nvdiffrast/
pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
imageio_download_bin freeimage

Every new command prompt

activate dmodel

Examples

Our approach is designed for high-end NVIDIA GPUs with large amounts of memory. To run on mid-range GPUs, reduce the batch size parameter in the .json config files.
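
For example, the configs expose a batch field (and a train_res field) that can be lowered to reduce memory use; the values below are illustrative only, not recommended settings:

"batch": 2,
"train_res": [512, 512]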

Simple genus 1 reconstruction example:

python train.py --config configs/bob.json

Visualize training progress (only supported on Windows):

python train.py --config configs/bob.json --display-interval 20

Multi-GPU example using PyTorch DDP (Linux only; experimental: all results in the paper were generated using a single GPU):

torchrun --nproc_per_node=4 train.py --config configs/bob.json

Below, we show the starting point and the final result. References to the right.

Initial guess Our result

The results will be stored in the out folder. The Spot and Bob models were created and released into the public domain by Keenan Crane.

Included examples

  • spot.json - Extracting a 3D model of the spot model: geometry, materials, and lighting from image observations.
  • spot_fixlight.json - Same as above but assuming known environment lighting.
  • spot_metal.json - Example of joint learning of materials and high frequency environment lighting to showcase split-sum.
  • bob.json - Simple example of a genus 1 model.
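
Each of these configs is launched the same way as the bob example above, for instance:

python train.py --config configs/spot_metal.json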

Datasets

We additionally include configs (nerf_*.json, nerd_*.json) to reproduce the main results of the paper. We rely on third party datasets, which are courtesy of their respective authors. Please note that individual licenses apply to each dataset. To automatically download and pre-process all datasets, run the download_datasets.py script:

activate dmodel
cd data
python download_datasets.py

Below follows more information and instructions on how to manually install the datasets (in case the automated script fails).

NeRF synthetic dataset

Our view interpolation results use the synthetic dataset from the original NeRF paper. To manually install it, download the NeRF synthetic dataset archive and unzip it into the nvdiffrec/data folder. This is required for running any of the nerf_*.json configs.
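
A minimal sketch of the manual steps, assuming the archive is saved as nerf_synthetic.zip in the data folder (adjust the file name to match the actual download):

cd nvdiffrec/data
unzip nerf_synthetic.zip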

NeRD dataset

We use datasets from the NeRD paper, which feature real-world photogrammetry and inaccurate (manually annotated) segmentation masks. Clone the NeRD datasets using git and rescale them to 512 x 512 pixel resolution using the script scale_images.py. This is required for running any of the nerd_*.json configs.

activate dmodel
cd nvdiffrec/data/nerd
git clone https://github.com/vork/ethiopianHead.git
git clone https://github.com/vork/moldGoldCape.git
python scale_images.py

Server usage (through Docker)

  • Build docker image.
cd docker
./make_image.sh nvdiffrec:v1
  • Start an interactive docker container: docker run --gpus device=0 -it --rm -v /raid:/raid nvdiffrec:v1 bash

  • Detached docker: docker run --gpus device=1 -d -v /raid:/raid -w=[path to the code] nvdiffrec:v1 python train.py --config configs/bob.json

Comments
  • Custom dataset config options

    I've been able to run this on my own data (slowly, on a 3070) using U^2-Net matting and the colmap2nerf code from instant-ngp (if others are following this, remove the image extensions and it should run on Windows). The issue I'm having is the val and test data, or the lack of them. Are there config options to remove the requirement for those datasets? Copying and renaming the JSON lists to val and test works (see the sketch below), but it is rather cumbersome, and I was wondering whether I was missing a better option, such as the NeRD dataset preparation, which uses manual masks not applied to the image itself (which is possible with U^2-Net, and would help with COLMAP mapping since the background can still be used for feature tracking).

    I haven't looked into what data is required to be present in the config, nor into resolution/training options; I'm just wondering what the generally intended arguments and method are for presenting your own data when no val set is available.
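
    A minimal sketch of the copy-and-rename workaround mentioned above, assuming the dataset uses the standard NeRF transforms_*.json naming (the dataset path here is hypothetical):

    cd data/my_dataset
    cp transforms_train.json transforms_val.json
    cp transforms_train.json transforms_test.json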

    opened by Sazoji 31
  • Coordinate frames and lighting

    I am trying to get nvdiffrec to do light estimation in a controlled setting. I suspect that there are some coordinate frame and lighting issues. Here is the procedure I am following: First, I generate a synthetic dataset using a custom environment map from the NeRF lego script. Then, I try to train the model from the generated dataset without giving an environment map, or a reference mesh. I end up with a mesh that is flipped 90 degrees on the x-axis and the lighting is badly estimated.

    Here are the result mesh and the lego before training (images: lego_after, lego_before).
    opened by Selozhd 10
  • Confusing mesh result on nerd_ehead

    Thank you for the great work! I followed the one-time setup on Ubuntu 18.04 with Python 3.9, CUDA 11.3, and PyTorch 1.10. Then I tried nerd_ehead by running: python train.py --config configs/nerd_ehead.json. In the dmtet_mesh folder the mesh looks good, but in the mesh folder the mesh.obj looks confusing. I did not change any code in this repo, so I do not know why this happened. Could you please help me solve this problem? Thank you for your time!

    opened by YuhsiHu 10
  • MLP to parameterize SDF

    Hello, thank you for your great work. In #9 and in Section 8.5 of the paper, you said you used a DVR-based MLP to optimize the SDF, and I have several questions about it.

    1. Can you tell me how many iterations you ran to pre-train the sphere?
    2. Did you set the MLP output shape as (size, sdf), i.e. (size, 1), or as (size, sdf+deform), i.e. (size, 4)?
    3. Did you also use a ModuleList in the DVR decoder to process the latent condition code c?
    opened by JiyouSeo 9
  • Render the final result?

    Hello. First of all, thank you very much for your excellent work. I'm sorry, I'm not very familiar with graphics engines, but now I want to render the final output, such as the hotdog mesh. There are some questions below that I hope you can answer: 1. According to the description in the article, kd.png, ks.png, and kn.png describe the material of the model, but the final output also has a .mtl file. Is the .mtl equivalent to the three PNG files? When I use mesh.load_mesh(mesh.obj), the API will directly use mesh.mtl instead of loading the three PNG files. 2. If the .mtl means the same as the three PNG files, can I render by loading the model and loading the PNG files?

    opened by soolone 8
  • noise in rendered results (noise in diffuse/specular?)

    Thank you for the amazing work!!

    I did a quick trial on custom in-the-wild data and found that there is some noise in the fitting results. Did I mess something up? Have you observed this phenomenon before? Note that I fixed the mesh vertices and only optimized the lighting / material.

    Rendered vs. reference images: test_000002_opt / test_000002_ref, test_000007_opt / test_000007_ref.

    The texture maps (kd, ks, normal): texture_kd, texture_ks, texture_n.

    Looking forward to your great help! Thanks a lot

    opened by wangjksjtu 8
  • How to get a model with higher accuracy and more details?

    Hello,

    I have run "python train.py --config configs/nerf_lego.json" with the default settings. The attached picture shows my mesh output.

    1. The smooth surface is not flat (as shown in the red box). How should I improve it?
    2. Lots of details are lost due to the geometric simplification. How can I get a model with more details?

    Thank you in advance!

    opened by wuge1880 7
  • Torch tensor size mismatch in optimize_mesh func in train.py

    Hi, I have been stuck on this error for some time now; the optimize_mesh function is not functioning properly on CUDA. I'm running a custom dataset with COLMAP transforms generated by the colmap2nerf script from instant-ngp.

    Any help is appreciated

    Resources used :

    • torch 1.10.2+cu113
    • cudatoolkit 11.3.1
    • tinycudann 1.6
    ---------
    config configs/nerf_shoe.json
    iter 5000
    batch 8
    spp 1
    layers 1
    train_res [800, 800]
    display_res [800, 800]
    texture_res [2048, 2048]
    display_interval 0
    save_interval 100
    learning_rate [0.03, 0.01]
    min_roughness 0.08
    custom_mip False
    random_textures True
    background white
    loss logl1
    out_dir out/nerf_shoe
    ref_mesh data/shoe-2-nvdiff
    base_mesh None
    validate True
    mtl_override None
    dmtet_grid 128
    mesh_scale 2.1
    env_scale 1.0
    envmap None
    display [{'latlong': True}, {'bsdf': 'kd'}, {'bsdf': 'ks'}, {'bsdf': 'normal'}]
    camera_space_light False
    lock_light False
    lock_pos False
    sdf_regularizer 0.2
    laplace relative
    laplace_scale 3000
    pre_load True
    kd_min [0.0, 0.0, 0.0, 0.0]
    kd_max [1.0, 1.0, 1.0, 1.0]
    ks_min [0, 0.08, 0.0]
    ks_max [1.0, 1.0, 1.0]
    nrm_min [-1.0, -1.0, 0.0]
    nrm_max [1.0, 1.0, 1.0]
    cam_near_far [0.1, 1000.0]
    learn_light True
    local_rank 0
    multi_gpu False
    ---------
    NERF dataset path:  data/shoe-2-nvdiff/transforms_train.json
    DatasetNERF: 143 images with shape [3024, 4032]
    DatasetNERF: 143 images with shape [3024, 4032]
    Encoder output: 32 dims
    Traceback (most recent call last):
      File "train.py", line 595, in <module>
        geometry, mat = optimize_mesh(glctx, geometry, mat, lgt, dataset_train, dataset_validate, 
      File "train.py", line 384, in optimize_mesh
        target = prepare_batch(target, 'random')
      File "/opt/users/saptarshi.majumder/tmp/miniconda3/envs/nvdiffrec/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "train.py", line 89, in prepare_batch
        target['img'] = torch.cat((torch.lerp(background, target['img'][..., 0:3], target['img'][..., 3:4]), target['img'][..., 3:4]), dim=-1)
    RuntimeError: The size of tensor a (3) must match the size of tensor b (0) at non-singleton dimension 3
    opened by iraj465 6
  • about making masks of custom dataset

    Hi, sorry to ask about this again. I know #3 and some other issues are about this problem, but it seems that only ways of getting poses for the images are discussed there, and the same goes for 'https://github.com/bmild/nerf#generating-poses-for-your-own-scenes', which you mentioned in another issue before. Those methods truly help a lot, thanks to you and those who contributed! Now I am confused about the mask-making process. I find there are two different types of masks in your examples: in the NeRD dataset there is a 'masks' directory containing mask images corresponding to the ones in 'images', and for the NeRF synthetic dataset the 4th channel of the image contains the mask information, as you said.

    So I guess my biggest question is how to get mask images as in the first case, or how to get 32-bit images with an alpha mask. Besides, I want to know whether it is true that separate mask images need to be used with a .npy file, while the alpha-mask type needs to be used with a .json file. Please correct me if I got something wrong about all this. Thanks for your work and your time.

    opened by sadexcavator 6
  • CUDA_ERROR_OUT_OF_MEMORY

    Hi,

    Is there any limitation on the training image resolution?

    I used just 12 images of resolution 4640x3472 and got this error, though I successfully generated a mesh for custom images at lower resolution. I am using an NVIDIA RTX A6000 with 50GB, so memory shouldn't be an issue.

    Are there some parameters in the config that I can tweak to use less memory without compromising mesh quality?

    opened by riponazad 5
  • What are the inputs for the training and testing phases

    Does the input in the training phase consist of the multi-view images, the camera poses, and the masks corresponding to those views? Will a .obj file be generated after training to serve as the model for testing? What does the input in the testing phase consist of? Thank you very much for your reply; this is urgent!

    opened by malinjie-hub 5
  • loading .hdr environment map in the beginning

    Hi, again, thanks for your great work. I've noticed in your code that the environment map can be loaded via 'lgt = light.load_env' rather than creating a trainable one. By your design, this requires the flag 'learn_light' to be false; consequently, the optimization will not try to optimize the light because FLAGS.learn_light is false. I just wonder: what if I still want to optimize the light I loaded in the beginning (doing this only requires changing the code a little)? Is it a bad idea to optimize an already loaded environment light? (For example, I could pre-capture the environment around the object and get an HDR environment map.) Could you share some insights about this? Thanks!

    opened by sadexcavator 0
  • Please set scripts as executable in git

    ...It saves a manual step later when cloning/pulling :)

    Example:

    git add script.sh
    git update-index --chmod=+x script.sh
    git commit -m "Script is now executable by default"
    
    opened by jonaspetersorensen 0
  • fail to run nerf-synthetic data

    Hi, when I tried to run the nerf-synthetic dataset with configs/nerf_ficus.json, it shows the following failure after finishing the first-stage optimization:

    Writing material:  out/nerf_lego/dmtet_mesh/mesh.mtl
    Done exporting mesh
    Traceback (most recent call last):
      File "train.py", line 622, in <module>
        optimize_geometry=not FLAGS.lock_pos)
      File "train.py", line 415, in optimize_mesh
        img_loss, reg_loss = trainer(target, it)
      File "/home/sarahwei/anaconda3/envs/dmodel/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "train.py", line 299, in forward
        return self.geometry.tick(glctx, target, self.light, self.material, self.image_loss_fn, it)
      File "/home/sarahwei/code/nvdiffrec/geometry/dlmesh.py", line 56, in tick
        buffers = self.render(glctx, target, lgt, opt_material)
      File "/home/sarahwei/code/nvdiffrec/geometry/dlmesh.py", line 49, in render
        num_layers=self.FLAGS.layers, msaa=True, background=target['background'], bsdf=bsdf)
      File "/home/sarahwei/code/nvdiffrec/render/render.py", line 230, in render_mesh
        rast, db = peeler.rasterize_next_layer()
      File "/home/sarahwei/anaconda3/envs/dmodel/lib/python3.6/site-packages/nvdiffrast/torch/ops.py", line 378, in rasterize_next_layer
        result = _rasterize_func.apply(self.raster_ctx, self.pos, self.tri, self.resolution, self.ranges, self.grad_db, self.peeling_idx)
      File "/home/sarahwei/anaconda3/envs/dmodel/lib/python3.6/site-packages/nvdiffrast/torch/ops.py", line 246, in forward
        out, out_db = _get_plugin(gl=True).rasterize_fwd_gl(raster_ctx.cpp_wrapper, pos, tri, resolution, ranges, peeling_idx)
    RuntimeError: Cuda error: 700[cudaMemcpy3DAsync(&p, stream);]
    Aborted (core dumped)
    
    opened by SarahWeiii 1
  • self-intersection in generated mesh

    Hi, thank you for the excellent work! But I found self-intersections in the generated mesh, and I wonder why the mesh generated by marching tet is not manifold.

    opened by SarahWeiii 1
  • Image resizing makes Img losses on batches not converge

    Hi, I found that resizing the images and then training makes convergence much slower compared to keeping the original resolutions, but for higher resolutions I get CUDA OOM errors. Is there a trade-off that hits the sweet spot, or is there an alternative for this?

    Any help is appreciated

    opened by iraj465 1
  • Can't understand "mesh_scale"

    Hello, thank you for sharing your great work! In previous issues, you said that the "mesh_scale" parameter represents the size of the tetrahedron grid. When I test on another dataset (a public human dataset), I run into trouble. When I set "mesh_scale" to 3.0, I get the first result shown; when I set "mesh_scale" to 10, I get the second. The results are from pass 1. I don't understand the influence of this parameter on reconstruction. Why is the result completely white when the parameter is small? And in the second image, do you have any suggestions on how to prevent some areas from being cut off? I want to understand the influence of the hyperparameters in the .json file on training. Looking forward to your reply!

    opened by cjlunmh 1