Codebase for arXiv preprint "NeRF++: Analyzing and Improving Neural Radiance Fields"

Kai Zhang

Last update: Dec 28, 2022

Related tags

Deep Learning nerfplusplus

Overview

NeRF++

Works with 360° captures of large-scale unbounded scenes.
Supports multi-GPU training and inference with PyTorch DistributedDataParallel (DDP).
Optimizes per-image autoexposure (experimental feature).

Demo

Data

Download our preprocessed data from tanks_and_temples, lf_data.
Put the data in the sub-folder data/ of this code directory.
Data format.
- Each scene consists of 3 splits: train/test/validation.
- Intrinsics and poses are stored as flattened 4x4 matrices (row-major).
- Pixel coordinate of an image's upper-left corner is (column, row)=(0, 0), lower-right corner is (width-1, height-1).
- Poses are camera-to-world, not world-to-camera transformations.
- The OpenCV camera coordinate system is adopted, i.e., x--->right, y--->down, z--->scene. The intrinsic matrix also follows the OpenCV convention.
- To convert camera poses between the OpenCV and OpenGL conventions, the following code snippet can be used in both directions (OpenGL to OpenCV and OpenCV to OpenGL).
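```
import numpy as np

def convert_pose(C2W):
    # Flipping the y and z camera axes maps between the OpenCV and OpenGL conventions.
    # The flip is its own inverse, so the same function works in both directions.
    flip_yz = np.eye(4)
    flip_yz[1, 1] = -1
    flip_yz[2, 2] = -1
    # Right-multiply so the flip acts on the camera axes, not on the world axes.
    C2W = np.matmul(C2W, flip_yz)
    return C2W
```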
- Scene normalization: move the average camera center to origin, and put all the camera centers inside the unit sphere.
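The scene normalization in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's own normalization script: the function name `normalize_poses` and the `margin` safety factor are hypothetical, and poses are assumed to be camera-to-world matrices given either as 4x4 arrays or as flattened row-major lists of 16 values.

```python
import numpy as np

def normalize_poses(c2w_list, margin=1.1):
    """Shift and scale camera-to-world poses so all camera centers fit in the unit sphere.

    c2w_list: iterable of camera-to-world matrices (4x4, or flattened row-major, 16 values).
    margin:   hypothetical safety factor (> 1) keeping cameras strictly inside the sphere.
    """
    poses = np.stack([np.asarray(p, dtype=np.float64).reshape(4, 4) for p in c2w_list])
    centers = poses[:, :3, 3]                    # camera centers in world coordinates
    mean_center = centers.mean(axis=0)           # average camera center
    max_dist = np.linalg.norm(centers - mean_center, axis=1).max()
    # Guard against the degenerate case where all cameras share one center.
    scale = 1.0 if max_dist == 0 else 1.0 / (max_dist * margin)

    normalized = poses.copy()
    # Only the translation changes; rotations are left untouched.
    normalized[:, :3, 3] = (centers - mean_center) * scale
    return normalized, mean_center, scale
```

Any 3D geometry that should stay aligned with the cameras (for example a point cloud) would need the same shift and scale applied to it.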

Create environment

conda env create --file environment.yml
conda activate nerfplusplus

Training (Use all available GPUs by default)

python ddp_train_nerf.py --config configs/tanks_and_temples/tat_training_truck.txt

Testing (Use all available GPUs by default)

python ddp_test_nerf.py --config configs/tanks_and_temples/tat_training_truck.txt \
                        --render_splits test,camera_path

Note: due to a restriction imposed by the torch.distributed.gather function, make sure the number of pixels in each image is divisible by the number of GPUs if you render images in parallel.
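A quick way to check the divisibility condition before rendering is sketched below. The helper names are hypothetical and are not part of this codebase; the check only does the arithmetic described in the note.

```python
def pixels_divisible(height, width, num_gpus):
    """Return True if the image's pixel count splits evenly across num_gpus."""
    return (height * width) % num_gpus == 0

def usable_gpu_counts(height, width, max_gpus):
    """List every GPU count up to max_gpus that evenly divides the pixel count."""
    return [n for n in range(1, max_gpus + 1) if pixels_divisible(height, width, n)]

# Example: a 1920x1080 image has 2,073,600 pixels.
print(usable_gpu_counts(1080, 1920, 8))
```

If the GPU count you want is not in the list, one option is to render with fewer GPUs or to crop or resize the images to a compatible resolution.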

Citation

Please cite our work if you use the code.

@article{kaizhang2020,
    author    = {Kai Zhang and Gernot Riegler and Noah Snavely and Vladlen Koltun},
    title     = {NeRF++: Analyzing and Improving Neural Radiance Fields},
    journal   = {arXiv:2010.07492},
    year      = {2020},
}

Generate camera parameters (intrinsics and poses) with COLMAP SfM

You can use the scripts inside colmap_runner to generate camera parameters from images with COLMAP SfM.

Specify img_dir and out_dir in colmap_runner/run_colmap.py.
Inside colmap_runner/, execute the command python run_colmap.py.
After the program finishes, you will find the posed images in the folder out_dir/posed_images.
- Distortion-free images are inside out_dir/posed_images/images.
- Raw COLMAP intrinsics and poses are stored as a json file out_dir/posed_images/kai_cameras.json.
- Normalized cameras are stored in out_dir/posed_images/kai_cameras_normalized.json. See the Scene normalization method in the Data section.
- Split distortion-free images and kai_cameras_normalized.json according to your need. You might find the self-explanatory script data_loader_split.py helpful when you try converting the json file to data format compatible with NeRF++.
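The conversion from the JSON cameras to the flattened format described in the Data section can be sketched roughly as below. This is an illustration, not the repository's data_loader_split.py. It assumes each JSON entry stores a flattened 4x4 intrinsic matrix under the key `K` and a flattened 4x4 world-to-camera matrix under the key `W2C` (as the comments on this page suggest), so the pose has to be inverted to obtain the camera-to-world convention used by the data format. The function names are hypothetical, and writing the lines into per-image files is left to the caller.

```python
import json
import numpy as np

def cam_entry_to_lines(entry):
    """Return (intrinsics_line, pose_line) for one camera entry.

    Assumes entry["K"] and entry["W2C"] are flattened 4x4 row-major matrices.
    The returned pose is camera-to-world, flattened row-major.
    """
    K = np.array(entry["K"], dtype=np.float64).reshape(4, 4)
    W2C = np.array(entry["W2C"], dtype=np.float64).reshape(4, 4)
    C2W = np.linalg.inv(W2C)          # world-to-camera -> camera-to-world

    def flatten(M):
        return " ".join(repr(float(v)) for v in M.reshape(-1))

    return flatten(K), flatten(C2W)

def convert_cam_dict(json_path):
    """Map each image name in the JSON file to its (intrinsics_line, pose_line) pair."""
    with open(json_path) as f:
        cam_dict = json.load(f)
    return {name: cam_entry_to_lines(entry) for name, entry in cam_dict.items()}
```

Calling `convert_cam_dict` on the normalized JSON file would give one intrinsics line and one pose line per image, which can then be written out per split.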

Visualize cameras in 3D

Check camera_visualizer/visualize_cameras.py for visualizing cameras in 3D. It creates an interactive viewer for you to inspect whether your cameras have been normalized to be compatible with this codebase. Below is a screenshot of the viewer: green cameras are used for training, blue ones are for testing, while yellow ones denote a novel camera path to be synthesized; red sphere is the unit sphere.

Inspect camera parameters

You can use camera_inspector/inspect_epipolar_geometry.py to inspect whether the camera parameters are correct and follow the OpenCV convention assumed by this codebase. The script creates a viewer for visually inspecting two-view epipolar geometry like below: for key points in the left image, it plots their corresponding epipolar lines in the right image. If the epipolar geometry does not look correct in this visualization, it's likely that there are some issues with the camera parameters.
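The same check can be done numerically. The sketch below is not the inspector script itself; it is a minimal example of the standard two-view relation, assuming OpenCV-convention 4x4 intrinsics and world-to-camera matrices, with hypothetical function names. A 3D point projected into both images should lie on the epipolar line predicted from the first image.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, W2C1, K2, W2C2):
    """F such that x2^T F x1 = 0 for homogeneous pixel coordinates x1, x2."""
    K1 = np.asarray(K1, dtype=np.float64)[:3, :3]
    K2 = np.asarray(K2, dtype=np.float64)[:3, :3]
    # Transform from camera-1 coordinates to camera-2 coordinates.
    T21 = np.asarray(W2C2, dtype=np.float64) @ np.linalg.inv(np.asarray(W2C1, dtype=np.float64))
    R, t = T21[:3, :3], T21[:3, 3]
    E = skew(t) @ R                                   # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_line(F, pixel):
    """Line (a, b, c) in image 2, a*u + b*v + c = 0, for pixel (u, v) = (column, row) in image 1."""
    return F @ np.array([pixel[0], pixel[1], 1.0])

def project(K, W2C, point):
    """Project a 3D world point to (column, row) pixel coordinates."""
    K, W2C = np.asarray(K, dtype=np.float64), np.asarray(W2C, dtype=np.float64)
    cam = W2C[:3, :3] @ np.asarray(point, dtype=np.float64) + W2C[:3, 3]
    x = K[:3, :3] @ cam
    return x[:2] / x[2]
```

With correct parameters, the distance from the projected point in image 2 to `epipolar_line(F, x1)` should be close to zero pixels; a large distance hints at a convention mix-up, such as camera-to-world poses passed where world-to-camera poses are expected.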

Comments

Question about the background net

Hi, amazing work! I have a small question about the model: At L119: https://github.com/Kai-46/nerfplusplus/blob/770e584f6d35b910655a2bcaf328f9123eb14545/ddp_model.py#L119

Why is "dists" the inverse distance instead of the real distance? That seems wrong for volume rendering. Or am I missing something? Thanks so much!

opened by MultiPath 19
Preprocessing data

Hello!

First of all, nice job! I was wondering how can we preprocess new data and make our own dataset with generated poses, intrinsics. Thanks in advance!

opened by yggs1401 9
COLMAP pipeline gives faulty results on custom and vanilla data
Thanks for all the hard work on this!

Summary

The provided COLMAP pipeline is giving apparently faulty results, making it difficult to use my own custom data. I've confirmed this by running your given data through the pipeline with minimal changes to the code.

Overview

I'm attempting to run NeRF++ on my own custom dataset and I ran into very blurry and unusable results after running it through the COLMAP pipeline and training step. To isolate the issue, I ran the full dataset conversion on your dataset, specifically tat_training_Truck in your provided tanks and temples dataset.

I ran run_colmap.py on a new directory with just the rgb images of tat_training_Truck. Multiple issues arose when I visualized the results.

1. Focal point is out of frame in epipolar geometry visualization

I'm not terribly familiar with epipolar geometry, but I assume that the epipolar lines should converge within the view of the given frame (I assume this is the focal point? Please correct me if I'm wrong). This does not occur in the given dataset despite the camera pose pointing at the object of interest, which tells me that the outputted intrinsic matrix is incorrect.

[Image: green camera is visible on the left side of the image, seemingly oriented and positioned correctly]

[Image: visualization of the epipolar geometry of this pose]

This tells me that there's some bug in the run_colmap.py pipeline that results in a bad intrinsic matrix.

2. Camera path not fully normalized to unit sphere

This was not an issue with my custom dataset, but it seems to be here. I visualized the automatic normalization that your script performed and the camera track did not get bound to the unit sphere. Additionally, there seems to be no built-in support for normalizing the kai_points.ply pointcloud. You seemed to have successfully normalized it in the example you gave, so I have two questions on this point:

How do you successfully normalize these camera poses within the unit sphere?

How do you normalize the kai_points.ply pointcloud and convert it to a mesh like you did in your example?

This comes straight out of the vanilla COLMAP pipeline, which is very different from the posted example

3. Blurry training results

I figure that this is a consequence of issue 1. However, I can't demonstrate this for the vanilla data since its poses aren't successfully normalized, as described in issue 2. Here's a sample of the blur experienced from training on a chair for many, many hours:

I also wrote my own converter that takes this outputted COLMAP data and transforms it into NeRF++-readable format. I figure no bugs from there are present here since this is before that conversion even takes place. On that note, if you have official code for this process I'd also love to take a look.

End

Since I performed minimal modifications upon the code and I'm using vanilla data, I figure there's either a bug in the system or I'm doing this fundamentally improperly. Do you have any suggestions on how to fix this so that I can use my own custom data without running into these same issues?
opened by dukeeagle 6
General use case

Does this method only work for 360 unbounded scenes? Does this work on, for example, forward facing scenes in NeRF? Has anyone tested? I currently tried applying this on a driving scene, where the images are photos taken from a forward-moving car. I defined the sphere center as the last position, and the radius as 8 times the distance travelled (like for T&T dataset), poses are like the image below.

When I use NeRF, it works well with the NDC setting since everything lies inside the frustum in front of camera 0. However, with NeRF++, it fails to distinguish the foreground (fg) from the background (bg): when I check the training output, it learns everything as fg and the bg is all black. And since the faraway scenery is bg, it learns it very badly. I therefore wonder whether it only works for 360 unbounded scenes, where the fg/bg is easier to distinguish.

opened by kwea123 5
Training on my own dataset, the loss is NaN

Hi @Kai-46, after using COLMAP to get the poses and intrinsics of my own dataset and training from scratch, the loss became NaN. When I print the network's output, the values of ret['rgb'] are all NaN.

I wonder whether the poses and intrinsics are wrong (data[key]['K'] stores the intrinsics and data[key]['W2C'] the pose, right?) or whether I need to adjust some hyperparameters of the training phase.

Hope you can help, thanks~

opened by visonpon 5
LPIPS version

Hi,

Thanks for the great work. Could you please tell me what version of LPIPS was used to obtain the results as stated in the paper? i.e. AlexNet, VGG or SqueezeNet?

Thanks in advance

opened by Shubhendu-Jena 2
What is the LICENSE for nerfplusplus?

Thanks for a great repository!

I'd like to use the nerfplusplus code for daily tasks at my job if possible. So, what kind of license applies to this repository?

opened by hirokic5 2
What is autoexposure?

Hi, I am YJHong and thanks for your great work!

I wonder what the autoexposure option is for NeRF.

Is it a necessary option for running the NeRF++ code?

Thank you, YJHong.

opened by yjhong89 2
split_size error when training

Thank you for your great work. When I train on my own dataset, I get the following error:

Could you please help me deal with it? Thanks so much!

opened by Xianjin111 2
run_colmap json files

Hi,

Thanks for sharing your code. I am running the COLMAP script on my own data and it produces json files, but all the example scenes you provide have the camera parameters and poses in txt files. Is there a utility to convert the json files to txt files, or can the main training script understand both?

George

opened by grgkopanas 2
about inverted sphere parameterization

To do volume rendering, we need to get x', y', z'.

And in the paper, in order to find x', y', z', it is said to be obtained by rotating point a of the figure.

If you just divide x, y, z by r, isn't the result x', y', z'? Why does it have to be obtained in the more complicated way shown in the figure? Am I misunderstanding something?

opened by bring728 2
Process 1 terminated with the following error

2022-11-18 23:46:13,697 [INFO] root: tat_training_Truck step: 0 resolution: 1.000000 level_0/loss: 0.064675 level_0/pnsr: 11.892565 level_1/loss: 0.064430 level_1/pnsr: 11.909071 iter_time: 0.250360
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 202, in run
    data = self._queue.get(True, queue_wait_duration)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/queues.py", line 108, in get
    res = self._recv_bytes()
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Traceback (most recent call last):
  File "ddp_train_nerf.py", line 604, in <module>
    train()
  File "ddp_train_nerf.py", line 599, in train
    join=True)
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/~~/anaconda3/envs/nerfplusplus/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/~~/nerfplusplus-master/ddp_train_nerf.py", line 488, in ddp_train_nerf
    idx = what_val_to_log % len(val_ray_samplers)
ZeroDivisionError: integer division or modulo by zero

How can I resolve this problem? Could you help me?

opened by shuimoqingyin 1
Run on custom data

I have run COLMAP on my own data. Then I converted it to json format by running extract_sfm.py. Then I ran normalize_cam_dict.py to get the normalized json. But when I train the model, it requires intrinsics.txt and other files that are not generated in my pipeline. How can we run your code on our custom dataset?

opened by Derry-Xing 3
Poses are camera-to-world?

According to your data description:

Poses are camera-to-world, not world-to-camera transformations.

Then why does normalize_cam_dict.py save W2C in the cam_dict? Shouldn't it be C2W (cam to world)?

opened by FreemanG 3
Colmap creates 0 and 1 directory under sfm/sparse

I just have a scene captured from multiple views and placed all 39 images in one directory...

Once I run the script run_colmap.py, I am seeing 2 folders getting created under sfm/sparse namely 0 and 1. The "0" folder consist info of 35 images and "1" folder.

May I know the reason behind the creation of the 0 and 1 folders?

For MVS, which folder path do I need to give?

opened by vinodrajendran001 0

Owner

Kai Zhang

PhD candidate at Cornell.

GitHub

Listing arxiv - Personalized list of today's articles from ArXiv

Personalized list of today's articles from ArXiv Print and/or send to your gmail

5 Jun 17, 2022

Arxiv harvester - Poor man's simple harvester for arXiv resources

Poor man's simple harvester for arXiv resources This modest Python script takes

5 Oct 18, 2022

the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

EmbedSeg Introduction This repository hosts the version of the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

88 Dec 25, 2022

Official PyTorch implementation of the preprint paper "Stylized Neural Painting", accepted to CVPR 2021.

1.5k Dec 28, 2022

[Preprint] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

Chasing Sparsity in Vision Transformers: An End-to-End Exploration Codes for [Preprint] Chasing Sparsity in Vision Transformers: An End-to-End Explora

64 Dec 8, 2022

[Preprint] "Bag of Tricks for Training Deeper Graph Neural Networks A Comprehensive Benchmark Study" by Tianlong Chen, Kaixiong Zhou, Keyu Duan, Wenqing Zheng, Peihao Wang, Xia Hu, Zhangyang Wang

Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study Codes for [Preprint] Bag of Tricks for Training Deeper Graph

101 Dec 29, 2022

[Preprint] ConvMLP: Hierarchical Convolutional MLPs for Vision, 2021

Convolutional MLP ConvMLP: Hierarchical Convolutional MLPs for Vision Preprint link: ConvMLP: Hierarchical Convolutional MLPs for Vision By Jiachen Li

143 Jan 3, 2023

Spearmint Bayesian optimization codebase

Spearmint Spearmint is a software package to perform Bayesian optimization. The Software is designed to automatically run experiments (thus the code n

Formerly: Harvard Intelligent Probabilistic Systems Group -- Now at Princeton

1.5k Dec 29, 2022

A general 3D Object Detection codebase in PyTorch.

Det3D is the first 3D Object Detection toolbox which provides off the box implementations of many 3D object detection algorithms such as PointPillars, SECOND, PIXOR, etc, as well as state-of-the-art methods on major benchmarks like KITTI(ViP) and nuScenes(CBGS).

1.4k Jan 5, 2023

Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

210 Dec 28, 2022

AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

AOT-GAN for High-Resolution Image Inpainting Arxiv Paper | AOT-GAN: Aggregated Contextual Transformations for High-Resolution Image Inpainting Yanhong

214 Jan 3, 2023

This is the codebase for Diffusion Models Beat GANS on Image Synthesis.

3k Dec 26, 2022

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Decision Transformer Lili Chen*, Kevin Lu*, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas†, and Igor M

1.4k Jan 7, 2023

Codebase for the Summary Loop paper at ACL2020

Summary Loop This repository contains the code for ACL2020 paper: The Summary Loop: Learning to Write Abstractive Summaries Without Examples. Training

Canny Lab @ The University of California, Berkeley

44 Nov 4, 2022

This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivariant Continuous Convolution

Trajectory Prediction using Equivariant Continuous Convolution (ECCO) This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivar

45 Jul 22, 2022

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022

X-modaler is a versatile and high-performance codebase for cross-modal analytics.

Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World