Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Overview

NeX: Real-time View Synthesis with Neural Basis Expansion

Project Page | Video | Paper | COLAB | Shiny Dataset

Open NeX in Colab

NeX

We present NeX, a new approach to novel view synthesis based on enhancements of multiplane image (MPI) that can reproduce NeXt-level view-dependent effects---in real time. Unlike traditional MPI that uses a set of simple RGBα planes, our technique models view-dependent effects by instead parameterizing each pixel as a linear combination of basis functions learned from a neural network. Moreover, we propose a hybrid implicit-explicit modeling strategy that improves upon fine detail and produces state-of-the-art results. Our method is evaluated on benchmark forward-facing datasets as well as our newly-introduced dataset designed to test the limit of view-dependent modeling with significantly more challenging effects such as the rainbow reflections on a CD. Our method achieves the best overall scores across all major metrics on these datasets with more than 1000× faster rendering time than the state of the art.
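
As a quick intuition for the formulation above: each MPI pixel stores a base color and a set of per-pixel coefficients, a small network maps the viewing direction to a shared set of basis values, and the planes are alpha-composited. Below is a minimal NumPy sketch of that pipeline; it is purely illustrative (the toy basis function, shapes, and all names are hypothetical, not the actual NeX code), and the homography warping of planes into the target view is omitted.

# Illustrative NumPy sketch of per-pixel neural basis expansion + MPI compositing.
# All names, shapes, and the toy basis function are hypothetical, not NeX's code.
import numpy as np

D, H, W, N = 16, 4, 4, 8                 # planes, height, width, number of basis functions

alpha = np.random.rand(D, H, W, 1)       # per-plane transparency (stored explicitly)
k0    = np.random.rand(D, H, W, 3)       # per-pixel base color (stored explicitly)
k     = np.random.rand(D, H, W, N, 3)    # per-pixel coefficients of the learned basis

def basis(view_dir, n_basis=N):
    # Stand-in for the learned MLP: maps a viewing direction to N scalars.
    freqs = np.arange(1, n_basis + 1)
    return np.sin(freqs * (view_dir @ np.ones(3)))

def render(view_dir):
    h = basis(view_dir)                                # (N,)
    # View-dependent color of every pixel on every plane: k0 + sum_n k_n * h_n(v)
    color = k0 + np.einsum('dhwnc,n->dhwc', k, h)      # (D, H, W, 3)
    # Standard back-to-front alpha compositing of the MPI planes (plane 0 = nearest).
    out = np.zeros((H, W, 3))
    for d in range(D - 1, -1, -1):
        out = color[d] * alpha[d] + out * (1 - alpha[d])
    return out

image = render(np.array([0.0, 0.0, -1.0]))             # one novel viewing direction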


Getting started

conda env create -f environment.yml
./download_demo_data.sh
conda activate nex
python train.py -scene data/crest_demo -model_dir crest -http
tensorboard --logdir runs/

Installation

We provide environment.yml to help you set up a conda environment.

conda env create -f environment.yml

Dataset

Shiny dataset

Download: Shiny dataset.

We provide 2 directories named shiny and shiny_extended.

  • shiny contains benchmark scenes used to report the scores in our paper.
  • shiny_extended contains additional challenging scenes used on our project page and in our video

NeRF's real forward-facing dataset

Download: Undistorted front facing dataset

For the real forward-facing dataset, NeRF is trained with the raw images, which may contain lens distortion. We instead use the undistorted images provided by COLMAP.

However, you can try running other scenes from Local Light Field Fusion (e.g., airplant) without any changes to the dataset files. In this case, the images are not automatically undistorted.

Deepview's spaces dataset

Download: Modified spaces dataset

We slightly modified the file structure of the Spaces dataset in order to determine the plane placement and to split the train/test sets.

Using your own images

To run NeX on your own images, you first need to install COLMAP on your machine.

Then put your images into a directory with the following structure:

<scene_name>
|-- images
     |-- image_name1.jpg
     |-- image_name2.jpg
     ...

The training code will automatically prepare the scene for you. You may have to tune planes.txt to get a better reconstruction (see the dataset explanation).

Training

Run with the paper's config

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http

This implementation uses scikit-image to resize images during training by default. The results and scores in the paper were generated using OpenCV's resize function. If you want the same behavior, please add the -cv2resize argument.
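
The difference matters because the two libraries do not resize identically. A minimal, illustrative comparison (not the project's actual resizing code; the OpenCV interpolation mode shown is just an example):

# Illustrative only: scikit-image vs. OpenCV resizing of the same image.
# skimage takes (rows, cols) and returns float64 in [0, 1]; cv2 takes (width, height)
# and keeps the input dtype. The interpolation choice here is an arbitrary example.
import numpy as np
import cv2
from skimage.transform import resize as sk_resize

img = (np.random.rand(756, 1008, 3) * 255).astype(np.uint8)
new_h, new_w = 378, 504

out_skimage = sk_resize(img, (new_h, new_w), anti_aliasing=True)         # float64 in [0, 1]
out_cv2 = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)  # stays uint8

print(out_skimage.dtype, out_cv2.dtype)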

Note that this code has been tested on an Nvidia V100 32GB and on 4× RTX 2080Ti GPUs.

For a GPU/GPUs with less memory (e.g., a single RTX 2080Ti), you can run using the following command:

python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http -layers 12 -sublayers 6 -hidden 256

Note that if your GPU runs out of memory, you can try reducing the number of layers, sublayers, and sampled rays; see the rough estimate below.
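
As intuition for why this helps, the number of stored MPI values grows with layers × sublayers. A purely hypothetical back-of-the-envelope estimate (the per-pixel value count is a made-up placeholder, and real GPU usage also includes activations, gradients, and optimizer state):

# Hypothetical estimate of raw MPI storage only; numbers are placeholders.
def mpi_gigabytes(height, width, layers, sublayers, values_per_pixel=12, bytes_per_value=4):
    planes = layers * sublayers                             # total number of MPI planes
    n_values = planes * height * width * values_per_pixel   # alpha + color + coefficients
    return n_values * bytes_per_value / 1e9

print(mpi_gigabytes(756, 1008, 16, 12))  # a larger configuration
print(mpi_gigabytes(756, 1008, 12, 6))   # the reduced -layers 12 -sublayers 6 setting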

Rendering

To generate a WebGL viewer and a video result, run:

python train.py -scene ${scene} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -predict -http

Video rendering

To generate a video that matches the real forward-facing rendering path, add the -nice_llff argument, or -nice_shiny for the Shiny dataset.
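
For example, combining it with the prediction command above (assuming the flags compose in the usual way):

python train.py -scene ${scene} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -predict -http -nice_llff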

Citation

@inproceedings{Wizadwongsa2021NeX,
    author = {Wizadwongsa, Suttisak and Phongthawee, Pakkapon and Yenphraphai, Jiraphon and Suwajanakorn, Supasorn},
    title = {NeX: Real-time View Synthesis with Neural Basis Expansion},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
    year = {2021},
}

Visit us 🦉

Vision & Learning Laboratory VISTEC - Vidyasirimedhi Institute of Science and Technology

Comments
  • Enquiry about the size of mpi_b

    Dear author,

    From the output of your code, the spatial size of MPI_b is 400 by 400, which is different from the spatial size of the other outputs.

    So does your program need an interpolation step when rendering on the web server?

    Why does MPI_b need to be downsampled when saving, but use the same size during training?

    Thanks

    opened by derrick-xwp 8
  • low quality outputs

    Hello, and thanks for sharing this nice work! I've been trying to get better outputs with NeX, but all I got was low-quality output.

    I trained on 2K images with this command:

    python train.py -scene ${PATH_TO_SCENE} -model_dir ${MODEL_TO_SAVE_CHECKPOINT} -http -layers 12 -sublayers 6 -hidden 256

    but I got output images like these (the images automatically created in the video_output directory after training).

    Even though I created a dataset following the image collection method shown in LLFF and successfully installed COLMAP and the other requirements, I have no idea what's wrong.

    If there is any other way to get better outputs, please help me out. :)

    My environment: RTX 2080Ti, Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz, Ubuntu 18.04, CUDA Toolkit 10.2.

    Thanks

    opened by sonnysorry 4
  • real-time rendering issue

    For real-time rendering, is it true that H_i(v) is first computed offline on the reference view v (pre-defined) and then warped to new views (unknown)? That would mean there is no network inference in the real-time rendering stage.

    Is it possible to share the code of the online viewer (WebGL)?

    opened by jason718 4
  • Pose convention of nerf_pose_to_ours.

    Hi,

    Thanks for sharing this great work! I'm confused about the pose convention of the following lines in def nerf_pose_to_ours(*). Could you explain the geometric meaning behind it?

    https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/sfm_utils.py#L323-L325

    opened by ybbbbt 3
  • cuda error: an illegal memory access was encountered

    Hi @pureexe, thanks for your great work. When I train on my own dataset for a while, I get a CUDA error. When I switch to training on only one GPU, it can train for a longer time, but it can still trigger this error.

    The error message looks like this: train.py, in forward: cof = pt.repeat_interleave(cof, args.sublayers, 0), RuntimeError: an illegal memory access was encountered

    But when I train on the demo room dataset, the training phase seems normal. I use COLMAP to preprocess the datasets and obtain hwf_cxxy.txt and poses_bounds.npy using the scripts you provided.

    By the way, when training on my own datasets, how should I set planes.txt? Hope you can give some advice, thanks~

    opened by visonpon 3
  • Question about viewing direction, basis function, gpu memory

    All 3D points along one ray have the same viewing direction. So when rendering, isn't it enough to input only one viewing direction, rather than feeding all the duplicate viewing directions into the basis function?

    Below is the result I checked myself; out2 is the basis function value. [screenshot]

    As you can see, all 32 values are the same, 0.1176. Since the input is the same, the output is of course the same. My question is: do I really need to waste network memory? Instead of having 32 inputs, isn't it enough to have just 1 input?

    opened by bring728 2
  • Black boundaries in some cases of Shiny dataset

    Hi, thanks for your great work!

    I found there are black lines at the boundaries of some images in the Shiny dataset (for example, the CD scene):

    [screenshot]

    After I resize the image to the target width (1008), the black line still exists:

    [screenshot]

    Could you help figure out the reason behind this issue? I would like to know if I should remove the boundary pixels during training (and also in testing).

    Thank you very much!

    opened by Totoro97 2
  • Shiny Dataset Download

    After downloading the Shiny dataset through the OneDrive link, I can't decompress the zip file.

    To extract the file, I repaired the compressed archive with the following command:

    zip -FF my_zip --out my_zip_ver2.zip

    But the file still has some problems after repairing.

    When I decompress the repaired file, there is a log file that reports the errors:

    [screenshots of the error log]

    • Many files do not exist: 1062 files are missing.

    Is there any problem with the uploaded file?

    Or how to download and extract the complete dataset?

    I tried downloading on macOS and Ubuntu, but both failed.

    Do I have to use Windows to download the file?

    opened by dogyoonlee 2
  • CUDA problem when following the `get started`

    When following these steps:

    conda env create -f environment.yml
    ./download_demo_data.sh
    conda activate nex
    python train.py -scene data/crest_demo -model_dir crest -http
    

    Here is the problem; I don't know what is happening:

    "train.py" in <module>
      751:  train()
      train.py
    "train.py" in train
      633:  output = model(dataset.sfm, feature, output_shape, sel)
      train.py
    "module.py" in _call_impl
      889:  result = self.forward(*input, **kwargs)
    "train.py" in forward
      334:  warp, ref_coords = computeHomoWarp(sfm,
      train.py
    "train.py" in computeHomoWarp
      158:  prod = coords @ pt.transpose(Hs, 1, 2).cuda()
      train.py
    RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`
    
    opened by ironheads 2
  • generated MPIs cannot be seen on mobile

    Hey,

    I'm able to open the demos you generated on my mobile phone. However, I'm not able to see my own generated MPIs on my mobile. They simply do not show up; I only see the layer slider. I hosted my MPI on the web and used the same viewer you use for mobile. The textures are all loaded, but then nothing is displayed. Do I need to make any further modifications to the MPI in order to view it on the mobile web? Thanks for your support and great work.

    Firas

    opened by shamafiras 2
  • Evaluation results for trex

    Thank you for sharing this amazing work. If you have them, could you share the rendered test images for the trex scene, like you have for the other scenes here: https://drive.google.com/corp/drive/folders/1OLSy326rxCKMYRo4K7S27ew9D8NzfJo4

    opened by mods333 2
  • Why is the PSNR a lot higher for the Spaces dataset than the Real Forward-Facing dataset?

    According to the paper, on the Real Forward-Facing dataset (Table B.1), the PSNR is 25 dB on average, ranging between 20 and 32 dB. On the Spaces dataset (Table B.5), the PSNR is stably around 35 dB or higher for each scene.

    Do you have any insights why that's happening?

    opened by jingweim 0
  • Mismatch between imgs 19 and poses 5

    My Setup

    • I'm using Colab.
    • I'm using my own set of images.
    • I didn't tweak anything after COLMAP.

    My Problem

    When training, it can't load data since utils/load_llff.py:107 returns None after the mismatch warning. [screenshot]

    Another Problem

    I tried removing the mismatched images, and eventually COLMAP gives the following error: FileNotFoundError: [Errno 2] No such file or directory: 'data/demo/dense/sparse/cameras.bin' [screenshot of the log] I wonder what these errors mean and how to fix them. If more information is needed, just let me know. Any help would be greatly appreciated.

    opened by ChexterWang 1
  • Shiny dataset dataloader

    How can I check the provided data loader and data format for the Shiny dataset?

    In addition, what camera coordinate convention does the Shiny dataset use?

    For example, (x, y, z) is (right, up, backward) in NeRF.

    Can I use the LLFF loader for the Shiny dataset as well?

    opened by dogyoonlee 2
  • pose convertion, resize and camera principal point

    1. I was curious about the nerf_pose_to_ours function, and I read the article below. But there are still some things I don't understand. https://github.com/nex-mpi/nex-code/issues/13

    As I understand it, the pose value is the camera-to-world matrix, and each column represents the x-axis, y-axis, z-axis, and location of the camera in the world coordinate system.

    If I want to change the camera coordinate axes from the OpenGL convention (right, up, backward) to the OpenCV convention (right, down, forward), the pose values [r1, r2, r3, t] should become [r1, -r2, -r3, t], shouldn't they? Why does the translation change? Isn't the world coordinate system fixed?

    The poses_bounds.npy file stores the camera coordinate axes as (down, right, backward). When you change this from the NeRF to the OpenGL coordinate system (right, up, backward), don't you do it like the following? Why is the method of changing the coordinate axes in the nerf_pose_to_ours function different from the method of changing the coordinate axes below?

    https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/load_llff.py#L245-L254

    Shouldn't the world coordinate system be independent of whether the camera uses the OpenCV or the OpenGL convention? Isn't the world coordinate system determined on its own? Maybe it's because the nerf_pose_to_ours function is applied after recentering? I'm confused.

    2. It makes sense to multiply the focal length by the scale factor when resizing the image. By the way, why add 0.5 to the principal point, multiply, and then subtract 0.5 again? Can't we just multiply by sw? I'm curious about the hidden meaning here.

    https://github.com/nex-mpi/nex-code/blob/eeff38c712ac9a665f09d7c2a3fdf48ae83f4693/utils/sfm_utils.py#L188-L191

    opened by bring728 0
  • When is warping computed?

    Are the sampled x, y, d already warped before being fed into the network? Because in the algorithm in the paper, warping is computed after the color information is regressed.

    opened by slulura 0