Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Last update: Jan 9, 2023

Overview

NeX: Real-time View Synthesis with Neural Basis Expansion

Project Page | Video | Paper | COLAB | Shiny Dataset

We present NeX, a new approach to novel view synthesis based on enhancements of multiplane image (MPI) that can reproduce NeXt-level view-dependent effects---in real time. Unlike traditional MPI that uses a set of simple RGBα planes, our technique models view-dependent effects by instead parameterizing each pixel as a linear combination of basis functions learned from a neural network. Moreover, we propose a hybrid implicit-explicit modeling strategy that improves upon fine detail and produces state-of-the-art results. Our method is evaluated on benchmark forward-facing datasets as well as our newly-introduced dataset designed to test the limit of view-dependent modeling with significantly more challenging effects such as the rainbow reflections on a CD. Our method achieves the best overall scores across all major metrics on these datasets with more than 1000× faster rendering time than the state of the art.
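As a rough illustration of the idea above, the sketch below evaluates one pixel's view-dependent color as a base color plus a linear combination of basis values, and then composites a stack of planes with the standard "over" operator. It is not taken from this repository: the function names and the toy sinusoidal basis are assumptions made for illustration, whereas NeX learns its basis functions with a neural network.

```python
# Illustrative sketch only: names and the toy basis are assumptions,
# not code from this repository.
import math


def toy_basis(view_dir, n_basis):
    """Stand-in for the learned basis H_n(v); NeX learns these with a network."""
    vx, vy = view_dir
    return [math.sin((n + 1) * vx) * math.cos((n + 1) * vy) for n in range(n_basis)]


def pixel_color(k0, ks, basis_vals):
    """View-dependent RGB for one pixel: k0 + sum_n ks[n] * H_n(v), per channel."""
    return [
        k0[c] + sum(k[c] * h for k, h in zip(ks, basis_vals))
        for c in range(3)
    ]


def composite_back_to_front(colors, alphas):
    """Standard 'over' compositing of plane samples ordered from far to near."""
    out = [0.0, 0.0, 0.0]
    for color, alpha in zip(colors, alphas):
        out = [alpha * color[c] + (1.0 - alpha) * out[c] for c in range(3)]
    return out
```

When every basis value is zero, `pixel_color` returns the base color `k0`, so the view-dependent terms act as a correction on top of a diffuse-like component.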

TL;DR
Installation
Dataset
Training
Rendering
Citation

Getting started

conda env create -f environment.yml
./download_demo_data.sh
conda activate nex
python train.py -scene data/crest_demo -model_dir crest -http
tensorboard --logdir runs/

Installation

We provide environment.yml to help you set up a conda environment.

conda env create -f environment.yml

Dataset

Shiny dataset

Download: Shiny dataset.

We provide 2 directories named shiny and shiny_extended.

shiny contains benchmark scenes used to report the scores in our paper.
shiny_extended contains additional challenging scenes used on our project page and in our video.

NeRF's real forward-facing dataset

Download: Undistorted front facing dataset

NeRF is trained on the raw images of the real forward-facing dataset, which may contain lens distortion. We instead use the undistorted images provided by COLMAP.

However, you can try running other scenes from Local Light Field Fusion (e.g., airplant) without any changes to the dataset files. In this case, the images are not automatically undistorted.

Deepview's spaces dataset

Download: Modified spaces dataset

We slightly modified the file structure of Spaces dataset in order to determine the plane placement and split train/test sets.

Using your own images

To run NeX on your own images, you need to install COLMAP on your machine.

Then, put your images into a directory following this structure:

<scene_name>
|-- images
     | -- image_name1.jpg
     | -- image_name2.jpg
     ...

The training code will automatically prepare the scene for you. You may have to tune planes.txt to get a better reconstruction (see the dataset explanation).
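Before starting a long training run, it can help to confirm that the scene directory matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `list_scene_images` and the set of accepted file extensions are assumptions (the layout above only shows .jpg files).

```python
# Hypothetical helper, not part of nex-code: checks the <scene_name>/images layout.
from pathlib import Path

# Assumption: the README only shows .jpg; other suffixes are accepted here for convenience.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_scene_images(scene_dir):
    """Return the sorted image paths under <scene_dir>/images, or raise if the layout is wrong."""
    images_dir = Path(scene_dir) / "images"
    if not images_dir.is_dir():
        raise FileNotFoundError(f"expected an 'images' directory at {images_dir}")
    images = sorted(
        p for p in images_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise ValueError(f"no image files found in {images_dir}")
    return images
```

For example, `len(list_scene_images("data/my_scene"))` gives the number of input images, where `data/my_scene` is a placeholder path.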

Training

Run with the paper's config

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http

This implementation uses scikit-image to resize images during training by default. The results and scores in the paper were generated using OpenCV's resize function. If you want the same behavior, add the -cv2resize argument.

Note that this code was tested on an Nvidia V100 32GB and on 4x RTX 2080Ti GPUs.

For GPUs with less memory (e.g., a single RTX 2080Ti), you can run the following command:

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http -layers 12 -sublayers 6 -hidden 256

If your GPU runs out of memory, try reducing the number of layers, sublayers, and sampled rays.
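To get a feel for how much a smaller configuration shrinks the model, the sketch below compares plane counts between two settings. It rests on two assumptions that are not stated in this README: that the total number of MPI planes is layers × sublayers, and that the paper's configuration is 16 layers × 12 sublayers (192 planes). Check both against train.py before relying on the numbers.

```python
# Sketch only: the layers * sublayers relation and the 16 x 12 reference
# configuration are assumptions, not values read from train.py.


def total_planes(layers, sublayers):
    """Plane count if each set of coefficients is shared by `sublayers` planes."""
    if layers < 1 or sublayers < 1:
        raise ValueError("layers and sublayers must be positive")
    return layers * sublayers


def plane_ratio(layers, sublayers, ref_layers=16, ref_sublayers=12):
    """Plane count relative to a reference configuration (assumed 16 x 12)."""
    return total_planes(layers, sublayers) / total_planes(ref_layers, ref_sublayers)
```

Under these assumptions, the reduced command above (-layers 12 -sublayers 6) uses `total_planes(12, 6)` = 72 planes, or `plane_ratio(12, 6)` = 0.375 of the reference count.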

Rendering

To generate a WebGL viewer and a video result:

python train.py -scene ${scene} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -predict -http

Video rendering

To generate a video that matches the real forward-facing rendering path, add the -nice_llff argument, or -nice_shiny for the Shiny dataset.
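If you render many scenes, a small wrapper can assemble the command line for you. The sketch below only uses flags that appear in this README (-scene, -model_dir, -predict, -http, -nice_llff, -nice_shiny). The helper name `predict_command` and its `path_style` parameter are hypothetical, and it assumes the video-path flags are simply appended to the rendering command.

```python
# Hypothetical wrapper, not part of nex-code: builds the rendering command line.
import shlex


def predict_command(scene, model_dir, path_style=None):
    """Return the argument list for rendering; path_style is None, 'llff', or 'shiny'."""
    cmd = ["python", "train.py", "-scene", scene, "-model_dir", model_dir,
           "-predict", "-http"]
    if path_style == "llff":
        cmd.append("-nice_llff")    # rendering path for the real forward-facing dataset
    elif path_style == "shiny":
        cmd.append("-nice_shiny")   # rendering path for the Shiny dataset
    elif path_style is not None:
        raise ValueError(f"unknown path_style: {path_style!r}")
    return cmd


def as_shell_string(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The list form can be passed directly to `subprocess.run`, while `as_shell_string` is handy for logging the exact command that was used.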

Citation

@inproceedings{Wizadwongsa2021NeX,
    author = {Wizadwongsa, Suttisak and Phongthawee, Pakkapon and Yenphraphai, Jiraphon and Suwajanakorn, Supasorn},
    title = {NeX: Real-time View Synthesis with Neural Basis Expansion},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
    year = {2021},
}

Visit us 🦉

Comments

Enquiry about the size of mpi_b

Dear author,

From the output of your code, the spatial size of MPI_b is 400 by 400, which differs from the spatial size of the other outputs.

Does your program need an interpolation step when rendering on the web server?

Why does MPI_b need to be downsampled when saving, while the same size is used during training?

Thanks

opened by derrick-xwp 8

low quality outputs

Hello, and thanks for sharing this nice work! I've been trying to get better outputs with NeX, but I only get low-quality results.

I trained on 2k images with this command:

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http -layers 12 -sublayers 6 -hidden 256

but I got output images like these (the images that are automatically created in the video_output directory after training).

Even though I created the dataset following the image collection method shown in LLFF and successfully installed COLMAP and the other requirements, I have no idea what's wrong.

If there is any way to get better outputs, please help me out. :)

My environment:
RTX 2080Ti
Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Ubuntu 18.04
CUDA Toolkit 10.2

Thanks

opened by sonnysorry 4

real-time rendering issue

For real-time rendering, is it true that H_i(v) is first computed offline on the (pre-defined) reference view v and then warped to new (unknown) views? That would mean there is no network inference in the real-time rendering stage.

Is it possible to share the code of the online viewer (WebGL)?

opened by jason718 4

Pose convention of nerf_pose_to_ours.

Hi,

Thanks for sharing this great work! I'm confused about the pose convention of the following lines in def nerf_pose_to_ours(*). Could you explain the geometric meaning behind it?

https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/sfm_utils.py#L323-L325

opened by ybbbbt 3

cuda error: an illegal memory access was encountered

Hi @pureexe, thanks for your great work. When I train on my own dataset for a while, I get a CUDA error. When I switch to training on only one GPU, it trains for longer but eventually triggers the same error.

The error message is as follows: train.py in forward cof=pt.repeat_interleave(cof,args.sublayers,0) RuntimeError: an illegal memory access was encountered

However, when I train on the demo-room dataset, training appears to run normally. I use COLMAP to preprocess the dataset and get hwf_cxxy.txt and poses_bounds.npy using the scripts you provided.

By the way, when training on my own dataset, how should I set planes.txt? I hope you can give some advice, thanks~

opened by visonpon 3

Question about viewing direction, basis function, gpu memory

All 3d points along one ray have the same viewing direction. So when rendering, isn't it enough to input only one viewing direction rather than input all duplicate viewing directions into the basis function?

Below is the result I checked by myself. out2 is the basis function value.

As you can see, all 32 values have the same value of 0.1176. Since the input is the same, the output is of course the same. My question is, do I really need to waste network memory? Instead of having 32 inputs, isn't it enough to have just 1 input?

opened by bring728 2

Black boundaries in some cases of Shiny dataset

Hi, thanks for your great work!

I found there are some black lines in the boundaries of some images, in the Shiny dataset (for example, CD):

After I resize the image to the target width (1008), the black line still exists:

Could you help figure out the reason behind this issue? I would like to know if I should remove the boundary pixels during training (and also in testing).

Thank you very much!

opened by Totoro97 2

Shiny Dataset Download

After downloading the Shiny dataset through the OneDrive link, I can't decompress the zip file.

To extract the file, I repaired the archive with the following command.

zip -FF my_zip --out my_zip_ver2.zip

But the file still has problems after the repair.

When I decompress the repaired file, a log file lists the errors: many files do not exist, 1062 files in total.

Is there a problem with the uploaded file?

If not, how can I download and extract the complete dataset?

I tried downloading on OSX and Ubuntu, but both failed.

Do I have to use Windows to download the file?

opened by dogyoonlee 2

CUDA problem when following the `get started`

When I follow these steps:

conda env create -f environment.yml
./download_demo_data.sh
conda activate nex
python train.py -scene data/crest_demo -model_dir crest -http

I get the following error, and I don't know what causes it.

"train.py" in <module>
  751:  train()
  train.py
"train.py" in train
  633:  output = model(dataset.sfm, feature, output_shape, sel)
  train.py
"module.py" in _call_impl
  889:  result = self.forward(*input, **kwargs)
"train.py" in forward
  334:  warp, ref_coords = computeHomoWarp(sfm,
  train.py
"train.py" in computeHomoWarp
  158:  prod = coords @ pt.transpose(Hs, 1, 2).cuda()
  train.py
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`

opened by ironheads 2

generated MPIs cannot be seen on mobile

Hey,

I'm able to open the demos you generated on my mobile phone. However, I'm not able to see my own generated MPIs on mobile. They simply do not show up; I only see the layer slider. I hosted my MPI on the web and used the same viewer you use for mobile. The textures are all loaded, but nothing is displayed. Do I need to modify the MPI further in order to view it on the mobile web? Thanks for your support and great work.

Firas

opened by shamafiras 2

Evaluation results for trex

Thank you for sharing this amazing work. If you have them, could you share the rendered test images for the trex scene, as you have for the other scenes here: https://drive.google.com/corp/drive/folders/1OLSy326rxCKMYRo4K7S27ew9D8NzfJo4

opened by mods333 2

Why is the PSNR a lot higher for the Spaces dataset than the Real Forward-Facing dataset?

According to the paper, on the Real Forward-Facing dataset (Table B.1), PSNR is 25 dB on average, ranging between 20 and 32 dB. On the Spaces dataset (Table B.5), PSNR is consistently around 35 dB or higher for each scene.

Do you have any insight into why that happens?

opened by jingweim 0

Mismatch between imgs 19 and poses 5

My Setup

I'm using Colab.

I'm using my own set of images.

I didn't tweak anything after COLMAP.

My Problem

When training, the data can't be loaded because util/load_llff.py:107 returns None after the mismatch warning. This is the screenshot:

Another Problem

I tried removing the mismatched images, and COLMAP eventually gave the following error: FileNotFoundError: [Errno 2] No such file or directory: 'data/demo/dense/sparse/cameras.bin' Here is the screenshot of the log:

I wonder what these errors mean and how to fix them. If more information is needed, just let me know. Any help would be greatly appreciated.

opened by ChexterWang 1

Shiny dataset dataloader

How can I check the provided data loader and data format for the Shiny dataset?

In addition, which camera coordinate convention does the Shiny dataset use?

For example, x, y, z is (right, up, backward) in NeRF.

Can I use the LLFF loader for the Shiny dataset as well?

opened by dogyoonlee 2

Pose conversion, resize and camera principal point

I was curious about the nerf_pose_to_ours function, and I read the issue below, but there are still some things I don't understand. https://github.com/nex-mpi/nex-code/issues/13

As I understand it, the pose is the camera-to-world matrix, and its columns represent the x-axis, y-axis, z-axis, and location of the camera in the world coordinate system.

If I want to change the camera coordinate axes from the OpenGL convention (right, up, backward) to the OpenCV convention (right, down, forward), the pose [r1, r2, r3, t] becomes [r1, -r2, -r3, t], doesn't it? Why does the translation change? Isn't the world coordinate system fixed?

The poses_bounds.npy file stores the camera coordinate axes as (down, right, backward). When you change this from the NeRF convention to the OpenGL coordinate system (right, up, backward), don't you do it as in the code below? Why does the nerf_pose_to_ours function change the coordinate axes differently from the code below?

https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/load_llff.py#L245-L254

Does it matter whether the world coordinate system follows the OpenCV or the OpenGL convention? Isn't the world coordinate system determined independently? Maybe it's because the nerf_pose_to_ours function comes after recenter? I'm confused.

It makes sense to multiply the focal length by the scale factor when resizing the image. By the way, why add 0.5 to the principal point, multiply, and subtract 0.5 again? Can't we just multiply by sw? I'm curious about the hidden meaning here.

https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/sfm_utils.py#L188-L191
opened by bring728 0

When is warping computed?

Are the sampled x, y, d already warped before being fed into the network? I ask because in the paper's algorithm, warping appears to be computed after the color information is regressed.

opened by slulura 0
