Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Overview

NeX: Real-time View Synthesis with Neural Basis Expansion

Project Page | Video | Paper | COLAB | Shiny Dataset

Open NeX in Colab


We present NeX, a new approach to novel view synthesis based on enhancements of multiplane image (MPI) that can reproduce NeXt-level view-dependent effects---in real time. Unlike traditional MPI that uses a set of simple RGBα planes, our technique models view-dependent effects by instead parameterizing each pixel as a linear combination of basis functions learned from a neural network. Moreover, we propose a hybrid implicit-explicit modeling strategy that improves upon fine detail and produces state-of-the-art results. Our method is evaluated on benchmark forward-facing datasets as well as our newly-introduced dataset designed to test the limit of view-dependent modeling with significantly more challenging effects such as the rainbow reflections on a CD. Our method achieves the best overall scores across all major metrics on these datasets with more than 1000× faster rendering time than the state of the art.
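At a high level, each pixel's color is modeled as C(v) = k0 + Σn kn·Hn(v), where the RGB coefficients k are stored explicitly per pixel and the basis functions Hn(v) are produced by a small MLP from the viewing direction v. The snippet below is a minimal PyTorch sketch of this idea only, not the repository's actual code; the tensor shapes, MLP size, and number of bases are illustrative assumptions.

import torch

N = 8  # number of learned basis functions (illustrative)

# Small MLP that maps a viewing direction (3,) to N view-dependent basis values.
basis_mlp = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, N))

def pixel_color(k0, k, view_dir):
    # k0: (P, 3) base color per pixel (stored explicitly)
    # k:  (P, N, 3) per-pixel reflectance coefficients (stored explicitly)
    h = basis_mlp(view_dir)                         # (N,) basis values for this view
    return k0 + (k * h[None, :, None]).sum(dim=1)   # (P, 3) view-dependent color

# Example: 100 pixels viewed from direction (0, 0, 1).
color = pixel_color(torch.rand(100, 3), torch.rand(100, N, 3),
                    torch.tensor([0.0, 0.0, 1.0]))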


Getting started

conda env create -f environment.yml
./download_demo_data.sh
conda activate nex
python train.py -scene data/crest_demo -model_dir crest -http
tensorboard --logdir runs/

Installation

We provide environment.yml to help you set up a conda environment.

conda env create -f environment.yml

Dataset

Shiny dataset

Download: Shiny dataset.

We provide two directories named shiny and shiny_extended.

  • shiny contains the benchmark scenes used to report the scores in our paper.
  • shiny_extended contains additional challenging scenes used on our project page and in the video.

NeRF's real forward-facing dataset

Download: Undistorted front facing dataset

For the real forward-facing dataset, NeRF is trained on the raw images, which may contain lens distortion, whereas we use the undistorted images provided by COLMAP.

However, you can also run other scenes from Local Light Field Fusion (e.g., airplant) without any changes to the dataset files. In this case, the images are not automatically undistorted.
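If you do want to use undistorted images for such scenes, one option is to run COLMAP's undistorter on the scene yourself before training. A rough sketch, assuming the scene already contains a COLMAP sparse reconstruction under sparse/0 (paths are placeholders):

colmap image_undistorter --image_path <scene_name>/images --input_path <scene_name>/sparse/0 --output_path <scene_name>/dense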

Deepview's spaces dataset

Download: Modified spaces dataset

We slightly modified the file structure of the Spaces dataset in order to determine the plane placement and to split the train/test sets.

Using your own images

To run NeX on your own images, you need to install COLMAP on your machine.

Then, put your images into a directory following this structure:

<scene_name>
|-- images
     |-- image_name1.jpg
     |-- image_name2.jpg
     ...

The training code will automatically prepare the scene for you. You may have to tune planes.txt to get a better reconstruction (see the dataset explanation).
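For example, a minimal session might look like this (scene and checkpoint names are placeholders):

mkdir -p my_scene/images
cp /path/to/my/photos/*.jpg my_scene/images/
python train.py -scene my_scene -model_dir my_scene_ckpt -http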

Training

Run with the paper's config

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http

This implementation uses scikit-image to resize images during training by default. The results and scores in the paper were generated using OpenCV's resize function. If you want the same behavior, please add the -cv2resize argument.

Note that this code has been tested on an Nvidia V100 32GB and on 4x RTX 2080Ti GPUs.

For GPUs with less memory (e.g., a single RTX 2080Ti), you can run with the following command:

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http -layers 12 -sublayers 6 -hidden 256

Note that if your GPU runs out of memory, you can try reducing the number of layers, sublayers, and sampled rays.
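For example, a further-reduced configuration might look like the command below; the -layers, -sublayers, and -hidden values are only illustrative, and the flag that controls the number of sampled rays is not listed here (check python train.py --help):

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http -layers 8 -sublayers 4 -hidden 128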

Rendering

To generate a WebGL viewer and a video result, run:

python train.py -scene ${scene} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -predict -http

Video rendering

To generate a video that matches the real forward-facing rendering path, add the -nice_llff argument, or -nice_shiny for the Shiny dataset.
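For example, for a forward-facing scene and a Shiny scene respectively:

python train.py -scene ${scene} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -predict -http -nice_llff
python train.py -scene ${scene} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -predict -http -nice_shiny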

Citation

@inproceedings{Wizadwongsa2021NeX,
    author = {Wizadwongsa, Suttisak and Phongthawee, Pakkapon and Yenphraphai, Jiraphon and Suwajanakorn, Supasorn},
    title = {NeX: Real-time View Synthesis with Neural Basis Expansion},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
    year = {2021},
}

Visit us 🦉

Vision & Learning Laboratory, VISTEC - Vidyasirimedhi Institute of Science and Technology

Comments
  • Enquiry about the size of mpi_b

    Dear author,

    From the output of your code, the spatial size of MPI_b is 400 by 400, which is different from the spatial size of the other outputs.

    So does your program need an interpolation step when rendering on the web server?

    Why does MPI_b need to be downsampled during saving, but use the same size during training?

    Thanks

    opened by derrick-xwp 8
  • low quality outputs

    Hello and thanks for sharing this nice work! I've been trying to get better outputs with NeX, but all I got was low-quality output.

    I trained on 2K images with this command:

    python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http -layers 12 -sublayers 6 -hidden 256

    but I got output images like these (the images automatically created in the video_output directory after training).

    Even though I created a dataset following the image collection method shown in LLFF and successfully installed COLMAP and the other requirements, I have no idea what's wrong.

    If there is any other way to get better outputs, please help me out. :)

    My environment is as follows: RTX 2080Ti, Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, Ubuntu 18.04, CUDA Toolkit 10.2.

    Thanks

    opened by sonnysorry 4
  • real-time rendering issue

    For real-time rendering, is it true that H_i(v) is first computed offline for the pre-defined reference view v and then warped to new (unknown) views? That would mean there is no network inference in the real-time rendering stage.

    Is it possible to share the code of the online viewer (WebGL)?

    opened by jason718 4
  • Pose convention of nerf_pose_to_ours.

    Hi,

    Thanks for sharing this great work! I'm confused about the pose convention of the following lines in def nerf_pose_to_ours(*). Could you explain the geometric meaning behind them?

    https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/sfm_utils.py#L323-L325

    opened by ybbbbt 3
  • cuda error: an illegal memory access was encountered

    Hi @pureexe, thanks for your great work. When I trained on my own dataset for a while, I got a CUDA error. When I switch to training on only one GPU, it can train for longer, but it can still trigger this error.

    The error message is like below: train.py in forward cof=pt.repeat_interleave(cof,args.sublayers,0) RuntimeError: an illegal memory access was encountered

    But when I use the demo room dataset to train, the training phase seems normal. I use COLMAP to preprocess the dataset and get hwf_cxcy.txt and poses_bounds.npy using the scripts you provided.

    By the way, when training on my own dataset, how should I set planes.txt? Hope you can give some advice, thanks~

    opened by visonpon 3
  • Question about viewing direction, basis function, gpu memory

    All 3D points along one ray have the same viewing direction. So when rendering, isn't it enough to input only one viewing direction rather than feeding all the duplicated viewing directions into the basis function?

    Below is the result I checked myself; out2 is the basis function value.

    As you can see, all 32 values have the same value of 0.1176. Since the input is the same, the output is of course the same. My question is, do I really need to waste network memory? Instead of having 32 inputs, isn't it enough to have just 1 input?

    opened by bring728 2
  • Black boundaries in some cases of Shiny dataset

    Hi, thanks for your great work!

    I found there are some black lines at the boundaries of some images in the Shiny dataset (for example, CD).

    After I resize the image to the target width (1008), the black line still exists.

    Could you help figure out the reason behind this issue? I would like to know if I should remove the boundary pixels during training (and also in testing).

    Thank you very much!

    opened by Totoro97 2
  • Shiny Dataset Download

    After downloading the Shiny dataset through the OneDrive link, I can't decompress the zip file.

    To extract the file, I repaired the compressed file with the following command:

    zip -FF my_zip --out my_zip_ver2.zip

    But the file still has some problems after repairing.

    When I decompress the repaired file, there is a log file that reports the errors.

    • Many files do not exist: 1062 files are missing.

    Is there any problem with the uploaded file?

    Or how can I download and extract the complete dataset?

    I tried downloading on OSX and Ubuntu, but both failed.

    Do I have to use Windows to download the file?

    opened by dogyoonlee 2
  • CUDA problem when following the `get started`

    When I follow these steps:

    conda env create -f environment.yml
    ./download_demo_data.sh
    conda activate nex
    python train.py -scene data/crest_demo -model_dir crest -http
    

    Here comes the problem; I don't know what happened.

    "train.py" in <module>
      751:  train()
      train.py
    "train.py" in train
      633:  output = model(dataset.sfm, feature, output_shape, sel)
      train.py
    "module.py" in _call_impl
      889:  result = self.forward(*input, **kwargs)
    "train.py" in forward
      334:  warp, ref_coords = computeHomoWarp(sfm,
      train.py
    "train.py" in computeHomoWarp
      158:  prod = coords @ pt.transpose(Hs, 1, 2).cuda()
      train.py
    RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`
    
    opened by ironheads 2
  • generated MPIs cannot be seen on mobile

    Hey,

    I'm able to open the demos you generated on my mobile phone. However, I'm not able to see my own generated MPIs on my mobile. They simply do not show up; I only see the slider of layers. I hosted my MPI on the web and used the same viewer you use for mobile. The textures are all loaded, but then it does not show anything. Do I need to make any further modifications to the MPI in order to view it on the mobile web? Thanks for your support and great work.

    Firas

    opened by shamafiras 2
  • Evaluation results for trex

    Thank you for sharing this amazing work. If you have them, could you share the rendered test images for the trex scene, like you have for the other scenes here: https://drive.google.com/corp/drive/folders/1OLSy326rxCKMYRo4K7S27ew9D8NzfJo4

    opened by mods333 2
  • Why is the PSNR a lot higher for the Spaces dataset than the Real Forward-Facing dataset?

    According to the paper, on the Real Forward-Facing dataset (Table B.1), PSNR is 25 dB on average, ranging between 20 and 32 dB. On the Spaces dataset (Table B.5), PSNR is consistently around 35 dB or higher for each scene.

    Do you have any insights why that's happening?

    opened by jingweim 0
  • Mismatch between imgs 19 and poses 5

    My Setup

    • I'm using Colab.
    • I'm using my own set of images.
    • I didn't tweak anything after COLMAP.

    My Problem

    When training, it can't load the data, since utils/load_llff.py:107 returns None after the mismatch warning.

    Another Problem

    I tried to remove the mismatched images, and eventually COLMAP gives the following error: FileNotFoundError: [Errno 2] No such file or directory: 'data/demo/dense/sparse/cameras.bin'. I wonder what these mean and how to fix these problems. If more information is needed, just let me know. Any help would be greatly appreciated.

    opened by ChexterWang 1
  • Shiny dataset dataloader

    How can I check the given data loader and data format for the Shiny dataset?

    In addition, what camera coordinate convention does the Shiny dataset use?

    For example, (x, y, z) is (right, up, backward) in NeRF.

    Can I use the LLFF loader for the Shiny dataset as well?

    opened by dogyoonlee 2
  • Pose conversion, resize, and camera principal point

    1. I was curious about the nerf_pose_to_ours function, and I read the article below. But there are still some things I don't understand. https://github.com/nex-mpi/nex-code/issues/13

    As I understand it, the pose is the camera-to-world matrix, and each column represents the x-axis, y-axis, z-axis, and location of the camera in the world coordinate system.

    If I want to change the camera coordinate axes from the OpenGL coordinate system (right, up, backward) to the OpenCV coordinate system (right, down, forward), the pose [r1, r2, r3, t] becomes [r1, -r2, -r3, t], doesn't it? Why does the translation change? Isn't the world coordinate system fixed?

    The poses_bounds.npy file stores the camera coordinate axes as (down, right, backward). When you change this from the NeRF to the OpenGL coordinate system (right, up, backward), don't you do it like the following? Why is the method of changing the coordinate axes in the nerf_pose_to_ours function different from the method below?

    https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/load_llff.py#L245-L254

    Does it matter for the world coordinate system whether it is the OpenCV or the OpenGL convention? Isn't the world coordinate system determined independently? Maybe it's because the nerf_pose_to_ours function is applied after recentering? I'm confused.

    2. It makes sense to multiply the focal length by the scale factor when resizing the image. By the way, why add 0.5 to the principal point, multiply, and then subtract 0.5 again? Can't we just multiply by sw? I'm curious about the hidden meaning here.

    https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/sfm_utils.py#L188-L191

    opened by bring728 0
  • When is warping computed?

    Are the sampled x, y, d already warped before being fed into the network? Because in the algorithm in the paper, warping is computed after the color info is regressed.

    opened by slulura 0