Overview

DeepVoxels

DeepVoxels is an object-specific, persistent 3D feature embedding. It is found by globally optimizing over all available 2D observations of an object in a deep-learning framework. At test time, the training set can be discarded, and DeepVoxels can be used to render novel views of the same object.

Usage

Installation

This code was developed in Python 3.7 and PyTorch 1.0. I recommend using Anaconda for dependency management. You can create an environment named "deepvoxels" with all dependencies like so:

conda env create -f environment.yml

High-level structure

The code is organized as follows:

  • dataio.py loads training and testing data.
  • data_util.py and util.py contain utility functions.
  • run_deepvoxels.py contains the training and testing code as well as setting up the dataset, dataloading, command line arguments etc.
  • deep_voxels.py contains the core DeepVoxels model.
  • custom_layers.py contains implementations of the integration and occlusion submodules.
  • projection.py contains utility functions for 3D and projective geometry.

Data

The datasets have been rendered from a set of high-quality 3D scans of a variety of objects. The datasets are available for download here. Each object has its own directory, which is the directory that the "data_root" command-line argument of the run_deepvoxels.py script is pointed to.

Coordinate and camera parameter conventions

This code uses an "OpenCV" style camera coordinate system, where the Y-axis points downwards (the up-vector points in the negative Y-direction), the X-axis points right, and the Z-axis points into the image plane. Camera poses are assumed to be in a "camera2world" format, i.e., they denote the matrix transform that transforms camera coordinates to world coordinates.
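
For illustration, here is a minimal sketch (not from the codebase; the variable names are hypothetical) of how a camera2world pose maps a point from camera coordinates to world coordinates under this convention:

import numpy as np

# 4x4 camera-to-world pose in the OpenCV convention:
# x points right, y points down, z points into the image plane.
cam2world = np.eye(4)

# A homogeneous point one unit in front of the camera, in camera coordinates.
point_cam = np.array([0.0, 0.0, 1.0, 1.0])

# Multiplying by the camera2world matrix yields the point in world coordinates.
point_world = cam2world @ point_cam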

The code also reads an "intrinsics.txt" file from the dataset directory. This file is expected to be structured as follows:

f cx cy
origin_x origin_y origin_z
near_plane (if 0, defaults to sqrt(3)/2)
scale
img_height img_width

The focal length f as well as cx and cy are given in pixels. (origin_x, origin_y, origin_z) denotes the origin of the voxel grid in world coordinates. The near plane is also expressed in world units. By default, each voxel has a side length of 1 in world units; the scale is a factor that scales the side length of each voxel. Finally, img_height and img_width give the image resolution in pixels.
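
As a rough illustration of the layout above, the file could be read along the following lines (this parse_intrinsics helper is a hypothetical sketch, not the loader used in dataio.py):

def parse_intrinsics(path):
    with open(path, 'r') as file:
        f, cx, cy = [float(x) for x in file.readline().split()][:3]  # focal length and principal point, in pixels
        origin = [float(x) for x in file.readline().split()]         # voxel-grid origin in world coordinates
        near_plane = float(file.readline())                          # 0 means "use the default sqrt(3)/2"
        scale = float(file.readline())                                # factor on the voxel side length
        height, width = [int(x) for x in file.readline().split()]    # image resolution
    return f, cx, cy, origin, near_plane, scale, height, width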

To create your own dataset, I recommend using the amazing, open-source COLMAP. Follow the instructions on the website to install it. I have written a small Python wrapper that will automatically reconstruct a directory of images and then extract the extrinsic and intrinsic camera parameters. It can be used like so:

python colmap_wrapper.py --img_dir [path to directory with images] \
                         --trgt_dir [path where output will be written to] 

To get the scale and origin of the voxel grid as well as the near plane, one has to inspect the reconstructed point cloud and manually edit the intrinsics.txt file written out by colmap_wrapper.py.
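
As a heuristic starting point (this is only a sketch under my own assumptions, not part of the released code), the grid origin and scale could be estimated from the reconstructed point cloud by centering the grid at the barycenter of the object's points and scaling it to cover their extent:

import numpy as np

def estimate_grid_params(points, grid_sidelength=32):
    """points: (N, 3) array of reconstructed 3D object points; grid_sidelength: assumed number of voxels per side."""
    origin = points.mean(axis=0)                    # barycenter -> voxel-grid origin
    extent = np.ptp(points - origin, axis=0).max()  # largest side of the axis-aligned bounding box
    scale = extent / grid_sidelength                # voxel side length (in world units) covering the object
    return origin, scale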

Training

  • See python run_deepvoxels.py --help for all train options. Example train call:
python run_deepvoxels.py --train_test train \
                         --data_root [path to directory with dataset] \
                         --logging_root [path to directory where tensorboard summaries and checkpoints should be written to] 

To monitor progress, the training code writes tensorboard summaries every 100 steps into a "runs" subdirectory in the logging_root.
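
The summaries can then be viewed by pointing tensorboard at that subdirectory, for instance:

tensorboard --logdir [path to logging_root]/runs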

Testing

Example test call:

python run_deepvoxels.py --train_test test \
                         --data_root [path to directory with dataset] \
                         --logging_root [path to directory where test output should be written to] \
                         --checkpoint [path to checkpoint]

Misc

Citation:

If you find our work useful in your research, please consider citing:

@inproceedings{sitzmann2019deepvoxels,
	author = {Sitzmann, Vincent 
	          and Thies, Justus 
	          and Heide, Felix 
	          and Nie{\ss}ner, Matthias 
	          and Wetzstein, Gordon 
	          and Zollh{\"o}fer, Michael},
	title = {DeepVoxels: Learning Persistent 3D Feature Embeddings},
	booktitle = {Proc. CVPR},
	year={2019}
}

Follow-up work

Check out our new project, Scene Representation Networks, where we replace the voxel grid with a continuous function that naturally generalizes across scenes and smoothly parameterizes scene surfaces!

Submodule "pytorch_prototyping"

The code in the subdirectory "pytorch_prototyping" comes from a little library of custom pytorch modules that I use throughout my research projects. You can find it here.

Other cool projects

Some of the code in this project is based on code from these two very cool papers:

Check them out!

Contact:

If you have any questions, please email Vincent Sitzmann at [email protected].

Comments
  • colmap_wrapper.py crash

    Hello, I was trying to use your wrapper function to generate intrinsics and extrinsics and the code crashed with this error.

    Extracting poses
    Traceback (most recent call last):
      File "colmap_wrapper.py", line 245, in <module>
        images = read_poses(reconst_dir)
      File "colmap_wrapper.py", line 222, in read_poses
        images = read_images_binary(os.path.join(colmap_workspace, 'sparse', '0', "images.bin"))
      File "colmap_wrapper.py", line 160, in read_images_binary
        with open(path_to_model_file, "rb") as fid:
    FileNotFoundError: [Errno 2] No such file or directory: './output/reconstruction/sparse/0/images.bin'

    I will try to debug this, but any help is appreciated. Regards.

    opened by feem1 2
  • Depth Scale

    The dataset contains depth maps in png format. I can read them via cv2.imread(depth_path, cv2.IMREAD_ANYDEPTH) and get values in uint16. How can I scale the values to get the correct depth?

    opened by griegler 1
  • list(map)

    Hi:

    I followed the instructions to run the training, and it gave me the following error:

    balabala, in parse_intrinsics
        f, cx, cy = map(float, file.readline().split())[:3]
    TypeError: 'map' object is not subscriptable
    

    This PR is a quick fix to that.
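
    In Python 3, map() returns an iterator that cannot be sliced, so the fix boils down to materializing it first, e.g.:

    f, cx, cy = list(map(float, file.readline().split()))[:3]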

    opened by jiangwei221 1
  • Not transforming to voxel space when lifting and unclarities about the near_plane parameter

    First of all, thank you for your outstanding work.

    There are a few issues that I would like to clear up, as I am having trouble getting good results with new data:

    • Why is the translation of the frustum bounds to the barycenter switched off while lifting?

    When computing the frustum bounds while lifting, in 'compute_frustum_bounds()', I find the following code:

    # Transform to grid coordinates (grid at origin)
    pl = torch.round(torch.bmm(world_to_grid.repeat(8, 1, 1), torch.floor(p)))
    pu = torch.round(torch.bmm(world_to_grid.repeat(8, 1, 1), torch.ceil(p)))
    pl = torch.round(torch.floor(p))
    pu = torch.round(torch.ceil(p))

    The grid coordinates with which we intersect later are translated but not the boundaries.

    • What is the role of the near_plane parameter in the intrinsics file and how should it be chosen?

    It seems that it is only used for projection, while otherwise a separate opt.near_plane parameter is used. Moreover, it is not used to clip the visible field (as the concept of near plane is) but to translate the points. It is unclear why this is necessary for projection and how to select a value of this parameter and opt.near_plane.

    opened by NSavov 1
  • How can we get intrinsics.txt for our prepared data?

    The code also reads an "intrinsics.txt" file from the dataset directory. This file is expected to be structured as follows:

    f cx cy
    origin_x origin_y origin_z
    near_plane (if 0, defaults to sqrt(3)/2)
    scale
    img_height img_width

    opened by Orange066 0
  • How to obtain camera parameters?

    Thanks for your outstanding work! I'm trying to run this model with my dataset, but I can't obtain the camera parameters. I've tried to use sparse bundle adjustment, but it didn't work. Could you please briefly explain which tools you use and the steps?

    opened by sunyiyan123 0
  • CUDA out of memory

    Hi, I wonder how much graphics memory the machine needs when training on the data? I received this error message: RuntimeError: CUDA out of memory. Tried to allocate 9.00 MiB (GPU 0; 8.00 GiB total capacity; 641.54 MiB already allocated; 8.70 MiB free; 474.50 KiB cached)

    I kept checking nvidia-smi and found that usage reached ~7 GB before the error was finally reported, since I only have 8 GB of memory.

    I wonder how much memory is required? If it does require more than 8 GB, do you have any suggestions for changing the code? Thanks a lot.

    opened by snowymo 0
  • colmap_wrapper.py crash

    I'm getting the following error:

    img_dir: /content/drive/My Drive/nerf/orchid/images
    trgt_dir: /content/drive/My Drive/nerf/orchid/
    dense: False
    Bundle Adjusting
    F0611 13:36:02.462942  8239 automatic_reconstruction.cc:51] Check failed: ExistsDir(options_.workspace_path) 
    *** Check failure stack trace: ***
        @     0x7f2a28ed70cd  google::LogMessage::Fail()
        @     0x7f2a28ed8f33  google::LogMessage::SendToLog()
        @     0x7f2a28ed6c28  google::LogMessage::Flush()
        @     0x7f2a28ed9999  google::LogMessageFatal::~LogMessageFatal()
        @     0x5598b03d6350  (unknown)
        @     0x5598b0301f8d  (unknown)
        @     0x5598b02e4e3e  (unknown)
        @     0x7f2a23377b97  __libc_start_main
        @     0x5598b02eea7a  (unknown)
    Extracting poses
    Traceback (most recent call last):
      File "colmap_wrapper.py", line 245, in <module>
        images = read_poses(reconst_dir)
      File "colmap_wrapper.py", line 222, in read_poses
        images = read_images_binary(os.path.join(colmap_workspace, 'sparse', '0', "images.bin"))
      File "colmap_wrapper.py", line 160, in read_images_binary
        with open(path_to_model_file, "rb") as fid:
    FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/nerf/orchid/reconstruction/sparse/0/images.bin'
    

    What did I do wrong?

    opened by dilaratank 0
Owner
Vincent Sitzmann
Incoming Assistant Professor @mit EECS. I'm researching neural scene representations - the way neural networks learn to represent information about our world.