Official Pytorch implementation of Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Last update: Jan 6, 2023

Overview

Scene Representation Networks

This is the official implementation of the NeurIPS submission "Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations".

Scene Representation Networks (SRNs) are a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating the image formation as a neural, 3D-aware rendering algorithm, SRNs can be trained end-to-end from only 2D observations, without access to depth or geometry. SRNs do not discretize space, smoothly parameterizing scene surfaces, and their memory complexity does not scale directly with scene resolution. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process.
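The idea of a scene as a continuous function can be illustrated with a toy sketch. This is not the model in srns.py: the layer sizes, the random weights, and the names make_mlp and scene_function are assumptions chosen for illustration only. It shows a small fully connected network that can be queried at any real-valued 3D coordinate, with a parameter count that is fixed by the layer sizes rather than by a voxel resolution.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Create randomly initialised (weights, biases) pairs for a small MLP."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def scene_function(layers, xyz):
    """Map a world coordinate (x, y, z) to a feature vector."""
    h = list(xyz)
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers only
    return h


# Hypothetical sizes: 3D coordinate in, 8-dimensional feature out.
phi = make_mlp([3, 16, 16, 8])

# The function can be evaluated at arbitrary continuous coordinates.
feat_a = scene_function(phi, (0.1, -0.2, 0.35))
feat_b = scene_function(phi, (0.1001, -0.2, 0.35))

# The number of parameters depends only on the layer sizes.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in phi)
```

In the actual method the features are turned into pixels by a learned renderer and the weights are trained from 2D observations; the sketch only shows the coordinate-to-feature mapping.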

Usage

Installation

This code was tested with Python 3.7 and PyTorch 1.2. I recommend using Anaconda for dependency management. You can create an environment named "srns" with all dependencies like so:

conda env create -f environment.yml

This repository depends on a git submodule, pytorch-prototyping. To clone both the main repo and the submodule, use

git clone --recurse-submodules https://github.com/vsitzmann/scene-representation-networks.git

High-Level structure

The code is organized as follows:

dataio.py loads training and testing data.
data_util.py and util.py contain miscellaneous utility functions.
train.py contains the training code.
test.py contains the testing code.
srns.py contains the core SRNs model.
hyperlayers.py contains implementations of different hypernetworks.
custom_layers.py contains implementations of the raymarcher and the DeepVoxels U-Net renderer.
geometry.py contains utility functions for 3D and projective geometry.

Pre-Trained models

Pre-trained models for the ShapeNet car and chair datasets are available, including TensorBoard event files of the full training process.

Please download them here.

The checkpoint is in the "checkpoints" directory. To load weights from it, pass the full path to the checkpoint via the "--checkpoint_path" command-line argument.

To inspect the progress of how I trained these models, run tensorboard in the "events" subdirectory.

Data

Four different datasets appear in the paper:

ShapeNet v2 chair and car classes.
Shepard-Metzler objects.
Basel face dataset.

Please download the datasets here.

Rendering your own datasets

I have put together a few scripts for the Blender python interface that make it easy to render your own dataset. Please find them here.

Coordinate and camera parameter conventions

This code uses an "OpenCV" style camera coordinate system, where the Y-axis points downwards (the up-vector points in the negative Y-direction), the X-axis points right, and the Z-axis points into the image plane. Camera poses are assumed to be in a "camera2world" format, i.e., they denote the matrix transform that transforms camera coordinates to world coordinates.
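As a rough illustration of these conventions, the sketch below applies a camera2world pose to a point given in camera coordinates and inverts the pose to obtain the corresponding world2camera transform. The helper names and the example pose are hypothetical and are not taken from geometry.py. Plain Python lists are used to keep the example dependency-free.

```python
def transform_point(pose, p):
    """Apply a 4x4 rigid transform (nested lists) to a 3D point."""
    return [sum(pose[i][k] * p[k] for k in range(3)) + pose[i][3]
            for i in range(3)]


def invert_rigid(pose):
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] as [R^T | -R^T t]."""
    R = [row[:3] for row in pose[:3]]
    t = [row[3] for row in pose[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Hypothetical camera2world pose: the camera sits at world position (-3, 0, 0)
# and looks along the world +X axis. The columns of the rotation block are the
# camera's X (right), Y (down) and Z (viewing direction) axes in world space.
cam2world = [
    [0.0, 0.0, 1.0, -3.0],
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# A point 2 units in front of the camera (+Z points into the image plane).
p_cam = [0.0, 0.0, 2.0]
p_world = transform_point(cam2world, p_cam)

# world2camera is the inverse of camera2world and maps the point back.
world2cam = invert_rigid(cam2world)
p_back = transform_point(world2cam, p_world)
```

With this pose, a point two units ahead of the camera ends up at world position (-1, 0, 0), and applying the inverse recovers the original camera-space point.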

The code also reads an "intrinsics.txt" file from the dataset directory. This file is expected to be structured as follows (unnamed constants are unused):

f cx cy 0.
0. 0. 0.
1.
img_height img_width

The focal length, cx and cy are in pixels. Height and width are the resolution of the image.
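The layout above can be read with a few lines of Python. The following is a minimal sketch based only on the format described here, not the repository's own parser; the function names and the example values are made up for illustration. It builds a 3x3 pinhole intrinsic matrix from f, cx and cy, and projects a camera-space point to pixel coordinates.

```python
def parse_intrinsics_text(text):
    """Parse the four-line intrinsics layout into (K, height, width)."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    f, cx, cy = (float(v) for v in lines[0][:3])
    height, width = (int(float(v)) for v in lines[3][:2])
    # A single focal length is used for both image axes.
    K = [[f, 0.0, cx],
         [0.0, f, cy],
         [0.0, 0.0, 1.0]]
    return K, height, width


def project(K, p_cam):
    """Project a camera-space point (X right, Y down, Z forward) to pixels."""
    x, y, z = p_cam
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


# Made-up example file contents following the layout above.
example = """131.25 64. 64. 0.
0. 0. 0.
1.
128 128
"""

K, height, width = parse_intrinsics_text(example)

# A point on the optical axis lands on the principal point (cx, cy).
center = project(K, [0.0, 0.0, 2.0])

# Negative Y is "up" in this convention, so v comes out smaller than cy.
u, v = project(K, [0.25, -0.25, 1.0])
```

Because the Y-axis points downwards, points above the optical axis have negative Y in camera coordinates and project to rows above cy.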

Training

See python train.py --help for all train options. Example train call:

python train.py --data_root [path to directory with dataset] \
                --val_root [path to directory with train_val dataset] \
                --logging_root [path to directory where tensorboard summaries and checkpoints should be written to]

To monitor progress, the training code writes tensorboard summaries every 100 steps into an "events" subdirectory in the logging_root.

For experiments described in the paper, config-files are available that configure the command-line flags according to the settings in the paper. You only need to edit the dataset path. Example call:

[edit train_configs/cars.yml to point to the correct dataset and logging paths]
python train.py --config_filepath train_configs/cars.yml

Testing

Example test call:

python test.py --data_root [path to directory with dataset] \
               --logging_root [path to directory where test output should be written to] \
               --num_instances [number of instances in training set (for instance, 2433 for shapenet cars)] \
               --checkpoint [path to checkpoint]

Again, for experiments described in the paper, config-files are available that configure the command-line flags according to the settings in the paper. Example call:

[edit test_configs/cars_training_set_novel_view.yml to point to the correct dataset and logging paths]
python test.py --config_filepath test_configs/cars_training_set_novel_view.yml

Misc

Citation

If you find our work useful in your research, please cite:

@inproceedings{sitzmann2019srns,
	author = {Sitzmann, Vincent 
	          and Zollh{\"o}fer, Michael
	          and Wetzstein, Gordon},
	title = {Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations},
	booktitle = {Advances in Neural Information Processing Systems},
	year={2019}
}

Submodule "pytorch_prototyping"

The code in the subdirectory "pytorch_prototyping" comes from a library of custom pytorch modules that I use throughout my research projects. You can find it here.

Contact

If you have any questions, please email Vincent Sitzmann at [email protected].

Comments

Issues with extrinsics

Hello, love your work on this repo.

I have an issue: I use a modified version of the Stanford render script for my car obj, but when I predict on cars with your pre-trained model, I don't see any prediction in the gt_compare.

Is this occurring because the coordinate system of Blender is not OpenCV? How do we approach this issue?

opened by feem1 8
How do you run evaluation with unseen images? Currently getting the training images in the reconstruction.

Hey, I trained the model on my own training data. When I run it on the test dataset, what I get is reconstructions of the training data, not the images in the test dataset. I am using the following options for training: python train.py --data_root data/cars/train --val_root data/cars/val --logging_root logs_cars --batch_size 16

And for evaluation: python test.py --data_root data/cars/test --logging_root logs_cars --num_instances 6210 (number of images in training set) --checkpoint logs_cars/checkpoints/ --specific_observation_idcs 0

The output I'm getting is reconstructions of images in the training set, in the poses specified in the test set. What I need to get is for the unseen objects in the test set, specific output views predicted from some specific input views. How should I do that?

opened by ebartrum 4
specific_observation_idcs option format

Hey, For comparison purposes, I need to evaluate the reconstruction quality of this network on a specific output view from one specific input view, of each object (ie single image reconstruction). It looks as though the specific_observation_idcs option may be the right way to achieve this? Could you share an example file for this option so I can see the format? I'm assuming it specifies an input id and output id for each object. Thanks!

opened by ebartrum 3
Can you upload the trained model

Hello, I have read the paper and found that your model requires at least 48GB of memory to train with an RTX 6000 GPU. Is it possible to upload the trained model for the cars or chairs dataset? Thanks!

opened by phongnhhn92 2
Few shot learning is currently crashing

Hello, keep up the great work! I am trying to replicate your results in the paper using few shot learning and the pre trained model given, but I believe the config file is not updated and it crashes due to no validation route specified and there is no parameter called specific samples. Some of the values are in string format when it should be integer. Override embedding crashes due to a parameter not found. Unable to fine tune because Freeze networks crashes.

Thanks for your help. Love your work

opened by krtkrj 1
Intrinsic for custom images

Hello, I love what you have built and would like to get camera intrinsics for my own images. Would you kindly tell me how to generate them for custom images?

opened by feem1 1
The original 3D models and rendering tools?

Hi,

Is it also possible to release the original 3D models (e.g. Shapenet chairs) as well as the rendering tools/scripts to generate the training images so that we can also generate new data in the same format by ourselves?

Thanks

opened by MultiPath 1
Questions about the paper: is dataset-specific model parameters necessary?

Hi,

First of all, thx for your work which is a lot of help to me. I'm currently working on using such scene representations as input (may have some modification) for mobile robot policy networks.

I've done some reading on your paper. If I've understood it correctly, for every category of objects/datasets, the method needs to train a specific model, i.e. the latent code z, the mapping function psi, and the neural rendering function theta for each dataset. I think that's true, since you have different pretrained models for the car and chair datasets. It seems to be the same with other methods in this area, such as GQN.

I'm wondering if it's possible that only the prior initial latent code z needs to be dataset-specific, other networks (the mappings and the rendering networks) can be shared among all type of datasets.

Intuitively, I think the latent code should contain enough prior information, and it would save a lot of time if different types of objects shared the same other networks. Since I'm trying to extract representations for complex scenes composed of all types of objects and I want to extract representations for each of the detected objects, this would be a lot more convenient.

What's the cost of making this assumption? Loss of accuracy?

BTW, I see that you are currently working on compositional SRNs, which is of huge interest to me. May I ask whether you are using topological graphs to model such compositional relations? You don't need to answer this question if you mind.

opened by ventusff 0
bugs in dataset

Hello, after downloading your data from Google Drive I got errors when extracting 'cars_train.zip'. Could you please fix it? All other files are OK.

opened by Kyridiculous2 0
PyTorch3D Camera Convention for Shapenet Chairs / Cars

Hi Vincent,

Thank you very much for sharing your excellent work!

I am trying to render the Shapenet v2 Chairs and Cars data using Pytorch3D cameras. However, I'm unable to find a suitable coordinate transformation for the extrinsics provided in the Pose files.

I tried to follow the convention mentioned in the README to render the point cloud from the ShapeNet (NMR) dataset, but the rendered images do not match the given views. Here's what I did:

Given Pose: camera2world with (+X right, +Y down, +Z into the image plane) [ref]

Required Pose: Pytorch3D world2camera with (+X left, +Y up, +Z into the image plane) [ref]

Attempt:

def srn_to_pytorch3d(srn_pose):
    """pose: 4x4 camera2world matrix with last row as [0, 0, 0, 1]"""
    # Take inverse to go from world2camera
    world2camera_srn = torch.linalg.inv(srn_pose)
    # X and Y axes change sign (convention as described above)
    world2camera_transformed = world2camera_srn @ torch.diag(torch.tensor([-1, -1, 1, 1]))  # X and Y change sign
    R = world2camera_transformed[:3, :3]
    T = world2camera_transformed[:3, 3]
    # Pytorch3D performs right multiplication (X_cam = X_w @ R + T). Therefore pass the transpose of R.
    camera = pytorch3d.renderer.cameras.PerspectiveCameras(R=R.T, T=T)
    return camera

If this approach is incorrect, could you please point me to the mesh / point cloud data that was used to render the views in the Chairs and Cars dataset with the given camera poses?

Thanks for your time!

Naveen.
opened by nmakes 0
One-shot Scene Reconstruction

Hey there,

First of all I'd like to congratulate @vsitzmann for the awesome work. Keep it up!

I'm trying to reconstruct scenes from a single view with the one-shot implementation. I have rendered my own dataset and, while scene reconstruction with multi-view training yields decent results (see left gt_comparison below), I am getting nowhere with one-shot reconstruction (see right gt_comparison below).

Has anyone obtained successful results with one-shot reconstruction? If so, would you care to share how? I have trained the model for one-shot reconstruction for 100,000 iterations so far.

opened by ckxz 0
Is it possible to release the instructions of creating the Minecraft datasets?

For rendering scenes like the Minecraft, how do you put the cameras? We cannot put them on a sphere any more, right?

Thank you very much https://vsitzmann.github.io/srns/img/minecraft.mp4

opened by MultiPath 0

Owner

Vincent Sitzmann

Incoming Assistant Professor @mit EECS. I'm researching neural scene representations - the way neural networks learn to represent information on our world.

Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations

BlobGAN: Spatially Disentangled Scene Representations Official PyTorch Implementation Paper | Project Page | Video | Interactive Demo BlobGAN.mp4 This

148 Dec 29, 2022

Code release for BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images

BlockGAN Code release for BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images BlockGAN: Learning 3D Object-aware Scene Rep

41 May 18, 2022

[ICCV'21] Official implementation for the paper Social NCE: Contrastive Learning of Socially-aware Motion Representations

CrowdNav with Social-NCE This is an official implementation for the paper Social NCE: Contrastive Learning of Socially-aware Motion Representations by

125 Dec 23, 2022

An official PyTorch Implementation of Boundary-aware Self-supervised Learning for Video Scene Segmentation (BaSSL)

72 Dec 28, 2022

Object-aware Contrastive Learning for Debiased Scene Representation

Object-aware Contrastive Learning Official PyTorch implementation of "Object-aware Contrastive Learning for Debiased Scene Representation" by Sangwoo

43 Dec 14, 2022

Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. KDD 2019.

gHHC Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, D

35 Nov 16, 2022

The source code of the paper "SHGNN: Structure-Aware Heterogeneous Graph Neural Network"

SHGNN: Structure-Aware Heterogeneous Graph Neural Network The source code and dataset of the paper: SHGNN: Structure-Aware Heterogeneous Graph Neural

7 Nov 13, 2022

Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)

Implicit3DUnderstanding (Im3D) [Project Page] Holistic 3D Scene Understanding from a Single Image with Implicit Representation Cheng Zhang, Zhaopeng C

149 Jan 8, 2023

PyTorch implementation for MINE: Continuous-Depth MPI with Neural Radiance Fields

MINE: Continuous-Depth MPI with Neural Radiance Fields Project Page | Video PyTorch implementation for our ICCV 2021 paper. MINE: Towards Continuous D

325 Dec 29, 2022

Generative Query Network (GQN) in PyTorch as described in "Neural Scene Representation and Rendering"

Update 2019/06/24: A model trained on 10% of the Shepard-Metzler dataset has been added, the following notebook explains the main features of this mod

313 Dec 27, 2022

Official PyTorch implementation of "Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics".

Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics This repository is the official PyTorch implementation of "Physics-aware Differ

46 Nov 20, 2022

Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

DIGAN (ICLR 2022) Official PyTorch implementation of "Generating Videos with Dyn

147 Dec 31, 2022

Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

66 Nov 16, 2022
