This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"

Last update: Jan 6, 2023

Related tags

Deep Learning dvr mesh-generation 3d-reconstruction 3d-deep-learning differentiable-rendering novel-view-synthesis cvpr2020 cvpr-2020 implicit-representions

Overview

Differentiable Volumetric Rendering

Paper | Supplementary | Spotlight Video | Blog Entry | Presentation | Interactive Slides | Project Page

This repository contains the code for the paper Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision.

You can find detailed usage instructions for training your own models and using pre-trained models below.

If you find our code or paper useful, please consider citing

@inproceedings{DVR,
    title = {Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision},
    author = {Niemeyer, Michael and Mescheder, Lars and Oechsle, Michael and Geiger, Andreas},
    booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year = {2020}
}

Installation

First, make sure that all dependencies are in place. The simplest way to do so is to use anaconda.

You can create an anaconda environment called dvr using

conda env create -f environment.yaml
conda activate dvr

Next, compile the extension modules. You can do this via

python setup.py build_ext --inplace

Demo

You can now test our code on the provided input images in the demo folder. To this end, start the generation process for one of the config files in the configs/demo folder. For example, simply run

python generate.py configs/demo/demo_combined.yaml

This script should create a folder out/demo/demo_combined where the output meshes are stored. The script copies the inputs into the generation/inputs folder and creates the meshes in the generation/meshes folder. Moreover, it creates a generation/vis folder where both inputs and outputs are copied together.
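To check that the demo run produced the folders described above, a small helper can list what ended up in each of them. This is a minimal sketch: the function name summarize_generation is hypothetical, and it makes no assumption about the mesh file format.

```python
from pathlib import Path


def summarize_generation(out_dir):
    """Return {subfolder: sorted relative file paths} for a generation run."""
    generation_dir = Path(out_dir) / "generation"
    summary = {}
    # Subfolder names taken from the description above.
    for name in ("inputs", "meshes", "vis"):
        folder = generation_dir / name
        if folder.is_dir():
            summary[name] = sorted(
                str(p.relative_to(folder)) for p in folder.rglob("*") if p.is_file()
            )
        else:
            summary[name] = []  # folder missing: the run may not have finished
    return summary


if __name__ == "__main__":
    for name, files in summarize_generation("out/demo/demo_combined").items():
        print(f"{name}: {len(files)} file(s)")
```

An empty list for meshes after the script has finished would suggest that generation did not complete.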

Dataset

Download Datasets

To evaluate a pre-trained model or train a new model from scratch, you have to obtain the respective dataset. We use three different datasets in the DVR project:

ShapeNet for 2.5D supervised models (using the Choy et al. renderings as input and our renderings as supervision)
ShapeNet for 2D supervised models (using the Kato et al. renderings)
A subset of the DTU multi-view dataset

You can download our preprocessed data using

bash scripts/download_data.sh

and following the instructions. The sizes of the datasets are 114GB (a), 34GB (b), and 0.5GB (c).

This script should download and unpack the data automatically into the data folder.

Data Convention

Please have a look at the FAQ for details regarding the type of camera matrices we use.

Usage

When you have installed all binary dependencies and obtained the preprocessed data, you are ready to run our pre-trained models and train new models from scratch.

Generation

To generate meshes using a trained model, use

python generate.py CONFIG.yaml

where you replace CONFIG.yaml with the correct config file.

The easiest way is to use a pre-trained model. You can do this by using one of the config files which are indicated with _pretrained.yaml.

For example, for our 2.5D supervised single-view reconstruction model run

python generate.py configs/single_view_reconstruction/multi_view_supervision/ours_depth_pretrained.yaml

or for our multi-view reconstruction from RGB images and sparse depth maps for the birds object run

python generate.py configs/multi_view_reconstruction/birds/ours_depth_mvs_pretrained.yaml

Our script will automatically download the model checkpoints and run the generation. You can find the outputs in the out/.../pretrained folders.

Please note that the config files *_pretrained.yaml are only for generation, not for training new models: when these configs are used for training, the model will be trained from scratch, but during inference our code will still use the pre-trained model.

Generation From Your Own Single Images

Similar to our demo, you can easily generate 3D meshes from your own single images. To this end, create a folder which contains your own images (e.g. media/my_images). Next, you can reuse the config file configs/demo/demo_combined.yaml and just adjust the data - path and training - out_dir arguments to your needs. For example, you can set the config file to

inherit_from: configs/single_view_reconstruction/multi_view_supervision/ours_combined_pretrained.yaml
data:
  dataset_name: images
  path: media/my_images
training:
  out_dir:  out/my_3d_models

to generate 3D models for the images in media/my_images. The models will be saved to out/my_3d_models. Similar to before, to start the generation process, run

python generate.py configs/demo/demo_combined.yaml

Note: You can only expect our model to provide reasonable results on data which is similar to what it was trained on (white background, single object, etc.).
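If you prefer not to overwrite the demo config, the same settings can be written to a separate file. The sketch below only reproduces the YAML shown above; the helper name write_image_config and the target file name are hypothetical and not part of the repository.

```python
from pathlib import Path

# Same keys and values as the example config above; only the two paths vary.
CONFIG_TEMPLATE = """\
inherit_from: configs/single_view_reconstruction/multi_view_supervision/ours_combined_pretrained.yaml
data:
  dataset_name: images
  path: {image_dir}
training:
  out_dir: {out_dir}
"""


def write_image_config(config_path, image_dir, out_dir):
    """Write a generation config pointing at a folder of single images."""
    config_path = Path(config_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    text = CONFIG_TEMPLATE.format(image_dir=image_dir, out_dir=out_dir)
    config_path.write_text(text)
    return text
```

For example, write_image_config("configs/demo/my_images.yaml", "media/my_images", "out/my_3d_models") would create a config file that can then be passed to generate.py in place of demo_combined.yaml.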

Evaluation

For evaluation of the models, we provide the script eval_meshes.py. You can run it using

python eval_meshes.py CONFIG.yaml

The script takes the meshes generated in the previous step and evaluates them using a standardized protocol. The output will be written to .pkl/.csv files in the corresponding generation folder which can be processed using pandas.
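As an illustration of post-processing the .csv output, the sketch below averages every numeric column of a results file. It uses only the standard csv module to stay dependency-free; pandas.read_csv would serve the same purpose. The function name and any file or column names you pass in are assumptions, since the exact output layout is not described here.

```python
import csv
from statistics import mean


def mean_numeric_columns(csv_path):
    """Average every column whose values all parse as floats."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    means = {}
    if not rows:
        return means
    for column in rows[0]:
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip non-numeric columns such as model identifiers
        means[column] = mean(values)
    return means
```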

Training

Finally, to train a new network from scratch, run

python train.py CONFIG.yaml

where you replace CONFIG.yaml with the name of the configuration file you want to use.

You can monitor the training process at http://localhost:6006 using tensorboard:

cd OUTPUT_DIR
tensorboard --logdir ./logs

where you replace OUTPUT_DIR with the respective output directory.

For available training options, please take a look at configs/default.yaml.

Further Information

More Work on Implicit Representations

If you like the DVR project, please check out other works on implicit representations from our group:

Other Relevant Works

Also check out other exciting works on inferring implicit representations without 3D supervision:

Comments

How to train the model on a single class of Shapenet Data?

If I want to retrain the model on a single class of Shapenet Data, what are the variables to be changed in the configuration file? What is the use of pointcloud chamfer file mentioned in the config file? https://github.com/autonomousvision/differentiable_volumetric_rendering/blob/master/configs/default.yaml#L10

opened by nitish11 14

Error when running training scripts

Hi,

I am interested in your amazing work. I downloaded the datasets with your scripts, and tried to run:

python train.py configs/single_view_reconstruction/multi_view_supervision/ours_combined_pretrained.yaml

However, it returns the following errors no matter which config I use:

.... (many similar lines)
Error occurred when loading field img of model c61fd3dd6eee6465ccaf38f4d3340ec (04090263)
Error occurred when loading field img of model c9ab6dcc7e4606adf00f0216ab99ff30 (04379243)
Error occurred when loading field img of model c541b8c49b5d2d8e99ad3ba13045dc42 (02691156)
Error occurred when loading field img of model b5d5db47b33a9186ffac3d5f2301b75e (04379243)
Error occurred when loading field img of model bc4db3c90716f7ede76bc197b3a3ffc0 (03636649)
Error occurred when loading field img of model ca063ddc2ea653d7b4b55366da3eebd8 (02958343)
Error occurred when loading field img of model c07c9ca0cfbb531359c956f09c934d51 (04379243)
Error occurred when loading field img of model b47e994452b71943bf30e5b4764cebc0 (03001627)
Error occurred when loading field img of model b6210936b5d1be007670e02527d78e8d (03691459)
Error occurred when loading field img of model c10b1973a0d692ef910979f825490a99 (03001627)
Error occurred when loading field img of model bd28567361a3541d97fb366fa4051f4b (04379243)
Error occurred when loading field img of model bdc5360ff3c62ed69aa9d7f676c1fd7e (02691156)
Error occurred when loading field img of model cb6c20669c6d1dea593ebeeedbff73b (04379243)
Error occurred when loading field img of model bb296502226ae23475becd8a4c3f1866 (03001627)
Error occurred when loading field img of model c695408a86062c4d242ea50288b3f64 (03636649)
Error occurred when loading field img of model b4ae95fbb879bed0ee38cd6552dcaadc (02828884)
Error occurred when loading field img of model ba0c32b3feba49b0b40adee184c371d0 (02958343)
Error occurred when loading field img of model b661b93b67d0ca908cc8e5a741a7e8bd (04379243)
Error occurred when loading field img of model c04d0cf81d9d870a7aa0699f5d30fdef (03001627)
Error occurred when loading field img of model c9b36427b66414de42ca7cc070f21ed3 (04090263)
Error occurred when loading field img of model b431161712ea348cdbbc9440457e303e (03001627)
Traceback (most recent call last):
  File "train.py", line 68, in <module>
    data_viz = next(iter(val_loader))
  File "/private/home/jgu/anaconda3/envs/dvr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/private/home/jgu/anaconda3/envs/dvr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/private/home/jgu/anaconda3/envs/dvr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/private/home/jgu/anaconda3/envs/dvr/lib/python3.8/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/private/home/jgu/anaconda3/envs/dvr/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/private/home/jgu/anaconda3/envs/dvr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/private/home/jgu/data/shapenet/differentiable_volumetric_rendering/im2mesh/data/core.py", line 217, in collate_remove_none
    return data.dataloader.default_collate(batch)
  File "/private/home/jgu/anaconda3/envs/dvr/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 45, in default_collate
    elem = batch[0]
IndexError: list index out of range

Error occurred when loading field img of model c2d6eb0899e5d10dff531859cd52e4b5 (04530566)
Error occurred when loading field img of model bf659a08301f20f2ac94db38cec7b356 (03636649)

Have you encountered similar problems before? Thanks

opened by MultiPath 14

Data format of the processed DTU scenes

Thanks for sharing your source code! I'm trying to understand the coordinate system used in the provided DTU scenes. However, I'm a bit lost. To transform a 3d point (x, y, z) to a pixel (u, v), three matrices are used: K, Rt and scale. I checked the values of K, Rt and scale; they looked quite different from the usual opencv definition of K, Rt, e.g., the R matrix in the code is not orthonormal. Would appreciate it a lot if some hints about the coordinate system are provided.

opened by Kai-46 11

Depth in your paper and code

Hi, thanks for this great work and your other inspiring work in reconstruction!

I was checking your code and found that there is a use_ray_length_as_depth argument for the intersect_camera_rays_with_unit_cube function. If this is set to be true, the depths are set to the length of the ray, which does not seem to be correct to me when there is a non-zero angle between the camera z-axis and the ray.

Also in the paper, the ray is denoted as r(d) = r0 + dw, where d is surface depth as defined in your paper. If w here is a unit direction vector, again this is the length of the ray instead of the depth, right? It will be really helpful if you can elaborate a bit on this.

I guess it's my misunderstanding somewhere but I really don't know where I went wrong (sorry if this is stupid). Because when I check the real value of d, no matter whether I set "use_ray_length_as_depth argument" to be true or not I get very close values. Also, considering that you normalize the coordinates to a unit cube, is it normal to see ray length (or depth) ranging from 500~900 on DTU?
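The geometric relation behind this question can be made concrete. For a unit direction w, the distance d travelled along the ray and the depth along the camera z-axis differ by the cosine of the angle between the two. The sketch below is generic geometry and not the repository's implementation; the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def z_depth_from_ray_length(ray_length, ray_direction, camera_z_axis=(0.0, 0.0, 1.0)):
    """Convert distance along a ray into depth along the camera viewing axis."""
    w = normalize(ray_direction)
    z = normalize(camera_z_axis)
    cos_angle = sum(a * b for a, b in zip(w, z))
    return ray_length * cos_angle
```

For rays close to the optical axis the cosine is close to 1, so the two quantities are nearly equal. This may be why the values look similar in practice, although that is an assumption and not something confirmed here.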

opened by imhzxuan 7

CUDA out of memory during training before evaluation

Hi @m-niemeyer, I tried to train with configs/single_view_reconstruction/multi_view_supervision/ours_combined.yaml and reduced the batch size for training and testing to 16. However, every time before evaluation during training, the following runtime error occurs:

RuntimeError: CUDA out of memory. Tried to allocate 7.06 GiB (GPU 0; 10.76 GiB total capacity; 7.46 GiB already allocated; 2.06 GiB free; 7.83 GiB reserved in total by PyTorch)

What could be the problem?

opened by hiyyg 6

multi_view_reconstruction with depth map

Hi! Amazing paper, and very clean code! Thanks so much for releasing this.

I am trying to use multi_view_reconstruction with depth map for 3d reconstruction (ours_depth_mvs.yaml).

So far I can obtain results with multi_view_reconstruction without depth maps (ours_rgb.yaml).

When I use depth maps with multi_view_reconstruction (ours_depth_mvs.yaml), I do not get any output, i.e. the output is empty. The depth values are consistent with the camera projection matrices (both are in mm). My dataset is attached below.

scan155.zip

I referred to the issues below when generating the dataset.

https://github.com/autonomousvision/differentiable_volumetric_rendering/issues/3 https://github.com/autonomousvision/differentiable_volumetric_rendering/issues/16

Q-1) Can you help me with this? Q-2) Does the depth map have to be perfect?

opened by tharinduk90 5

How to generate NMR like dataset for my own custom Image-3Dmodel pair

Hi,

I have downloaded dataset (b), ShapeNet for 2D supervised models.

I loaded cameras.npz to inspect it; all of its entries are 4x4 numpy arrays.

@m-niemeyer Could you please explain this format and how to generate a dataset like this for my own image and 3D model pairs?

Thanks, Jay

opened by jay-thakur 5

Chamfer distances

Hi! I'd like to ask if it'd be possible to provide the Chamfer distance results in Table 1 (2D supervision section) in the uncombined form, i.e. as the accuracy and completeness metrics separately (described in the Occupancy Networks paper)?

Also, a side question, I'm curious why it's called Chamfer-L1, as to my understanding it's not really measured as an L1 norm but rather an unsquared L2 norm.
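To make the terms in this question concrete, the sketch below computes accuracy, completeness, and their mean on two point sets, assuming the definition from the Occupancy Networks paper (mean unsquared nearest-neighbour distance in each direction). It is a brute-force illustration with hypothetical function names, not the evaluation code used by eval_meshes.py.

```python
import math


def mean_nearest_distance(src, dst):
    """Mean unsquared Euclidean distance from each src point to its nearest dst point."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_terms(predicted, ground_truth):
    """Return (accuracy, completeness, chamfer) for two lists of 3D points."""
    accuracy = mean_nearest_distance(predicted, ground_truth)      # prediction -> ground truth
    completeness = mean_nearest_distance(ground_truth, predicted)  # ground truth -> prediction
    return accuracy, completeness, 0.5 * (accuracy + completeness)
```

Note that each nearest-neighbour distance here is an unsquared Euclidean norm, which is the point raised in the side question.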

Thanks in advance!

opened by chenhsuanlin 5

Why is the focal length different in your rendering and img_choy2016?

I downloaded the ShapeNet dataset for 2.5D supervised models and found two cameras.npz files: one in the obj_ID folder and another in the img_choy2016 folder.

In the paper, you wrote "While we use the renderings from Choy et al. [13] as input, we additionally render 24 images of resolution 256^2 with depth maps and object masks per object which we use for supervision." So, I guess one cameras.npz is for your rendering, the other for Choy's.

But the focal lengths in the two cameras.npz files are different. In yours, the matrix is

array([[2.1875, 0.    , 0.    , 0.    ],
       [0.    , 2.1875, 0.    , 0.    ],
       [0.    , 0.    , 1.    , 0.    ],
       [0.    , 0.    , 0.    , 1.    ]])

but in Choy's, the matrix is

array([[149.84375,   0.     ,  68.5    ],
       [  0.     , 149.84375,  68.5    ],
       [  0.     ,   0.     ,   1.     ]])

I think the focal length should be same, because you just changed the camera pose during additional rendering, right?
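A quick arithmetic check of the two matrices above: dividing the pixel focal length by the principal point offset gives exactly the other value. One possible reading, which is an assumption and not confirmed here, is that the first matrix uses normalized image coordinates spanning [-1, 1] while the second uses pixel units. The function name is hypothetical.

```python
def pixel_to_normalized_focal(focal_px, principal_point_px):
    """Express a pixel focal length in units where the half-width of the image is 1."""
    return focal_px / principal_point_px


# Values copied from the two matrices quoted above.
print(pixel_to_normalized_focal(149.84375, 68.5))  # prints 2.1875
```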

opened by BostonLobster 4

Why is there padding for unit cube?

Hi @m-niemeyer, according to my understanding, the object has been normalized into the unit cube. But when generating freespace points and performing raymarching, you use a 0.1 padding of the cube. Can you briefly explain the reason for doing this?
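For readers unfamiliar with the setup, the sketch below shows how a ray can be intersected with a padded, axis-aligned cube using the standard slab method. It is not the repository's implementation. It assumes the cube is centred at the origin with side length 1 and that the padding is added to the total side length; both are assumptions, as is the function name.

```python
def intersect_ray_with_cube(origin, direction, padding=0.1, eps=1e-9):
    """Return (t_near, t_far) with origin + t * direction inside the padded cube, or None."""
    half = 0.5 + padding / 2  # assumed half side length of the padded cube
    t_near, t_far = float("-inf"), float("inf")
    for o, d in zip(origin, direction):
        if abs(d) < eps:
            # Ray is parallel to this pair of faces: it must start between them.
            if abs(o) > half:
                return None
            continue
        t0, t1 = (-half - o) / d, (half - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    if t_near > t_far or t_far < 0:
        return None  # ray misses the cube or the cube lies behind the origin
    return t_near, t_far
```

With padding, the interval between t_near and t_far is slightly longer than for the exact unit cube, so points sampled along the ray also cover a thin margin around the normalized object.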

Thank you :)

opened by stalkerrush 4

How to create the depthmap according to given format

Hi! Amazing paper, and very clean code! Thanks so much for releasing this.

I am trying to use multi_view_reconstruction with depth maps for 3D reconstruction. I can extract the depth map from images taken with an iPhone, and I have the other necessary information as well (camera parameters, images, masks).

[Image: depth map from iPhone]

Now I need to convert this depth map to the .exr format. But when I inspect the existing .exr files in the DTU dataset (using Affinity Designer), there is a single channel called "Y" rather than RGB or A channels.

Q-1 Can you guide me on how to convert my depth map to the format required by DVR?

Q-2 Can you also share any preprocessing code for generating the depth maps?

Q-3 Can you also share any preprocessing code for generating the masks?

opened by tharinduk90 4

download the data.

Hi, we tried to download the data from https://s3.eu-central-1.amazonaws.com/avg-projects/differentiable_volumetric_rendering/data/DTU.zip, but the URL has become invalid. Is there an updated URL for the data?

opened by csjxchen 0

How to generate cameras.npz and pcl.npz for custom datasets?

I am trying to create custom datasets for training. I have a depth map, image, masks, and their corresponding 3D model. I do not know how to create the cameras.npz (world_mat, world_mat_inv, camera_mat, camera_mat_inv, scale_mat, scale_mat_inv) and pcl.npz files. Is there any way to create these files using Blender or other tools?

opened by kaphleamrit2 0

A question about Section 3.2 of the paper

Hello! I wondered about this equation when I read this section. According to the multivariate chain rule, should it be this: And if so, how do you derive this section:

opened by sleep2hours 1

Normalize ray_vector?

Hi, I see that in your code, you get ray_vector by subtracting the camera origin (camera_world) from the pixel (p_world) ray_vector = camera_world - p_world. I'm wondering if this ray_vector can have a depth other than 1? For example, can you sample a point inside the unit cube and set ray_vector = p_inside_cube - camera_world, will this ray_vector still be a valid input into the ray marching function? Thanks!

opened by ruoshiliu 0
create dataset.

hello, I want to try you approach with my own data, I have rgb and calibrated depth camera, could you give me some advice on how to create data like your dtu dataset, thanks very much.

opened by tiexuedanxin 0

