[ICCV 2021] Our work presents a novel neural rendering approach that can efficiently reconstruct geometric and neural radiance fields for view synthesis.

Overview

MVSNeRF

Project page | Paper

This repository contains a pytorch-lightning implementation for the ICCV 2021 paper: MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo. Our work presents a novel neural rendering approach that can efficiently reconstruct geometric and neural radiance fields for view synthesis. Moreover, if dense images are captured, the estimated radiance field representation can be easily fine-tuned, which leads to fast per-scene reconstruction.

Pipeline

Installation

Tested on Ubuntu 16.04 + PyTorch 1.8 + PyTorch Lightning 1.3.5

Install environment:

pip install pytorch-lightning inplace_abn
pip install imageio pillow scikit-image opencv-python configargparse lpips
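
A quick, optional sanity check (not part of the repository) to confirm that the environment matches the versions listed above:

import torch
import pytorch_lightning as pl

# Expect roughly PyTorch 1.8 / PyTorch Lightning 1.3.5, per the note above.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pytorch-lightning:", pl.__version__)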

Training

Please see each subsection for training on different datasets. Available training datasets:

DTU dataset

Data download

Download the preprocessed DTU training data and Depth_raw from the original MVSNet repo and unzip them. We provide a DTU example; please follow the example's folder structure.
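
If you are unsure whether your unpacked data matches the example, a small helper like the sketch below (purely illustrative; DTU_DIR is a hypothetical path) prints the top levels of your directory tree for comparison:

import os

DTU_DIR = "path/to/dtu"  # hypothetical path; point this at your extracted data

# Print the first two directory levels so they can be compared with the provided DTU example.
for root, dirs, files in os.walk(DTU_DIR):
    depth = root[len(DTU_DIR):].count(os.sep)
    if depth <= 2:
        print("  " * depth + os.path.basename(root) + "/")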

Training model

Run

CUDA_VISIBLE_DEVICES=$cuda  python train_mvs_nerf_pl.py \
   --expname $exp_name \
   --num_epochs 6 \
   --use_viewdirs \
   --dataset_name dtu \
   --datadir $DTU_DIR

For more options, refer to opt.py. An example training command:

CUDA_VISIBLE_DEVICES=0  python train_mvs_nerf_pl.py \
    --with_depth  --imgScale_test 1.0 \
    --expname mvs-nerf-is-all-your-need \
    --num_epochs 6 --N_samples 128 --use_viewdirs --batch_size 1024 \
    --dataset_name dtu \
    --datadir path/to/dtu/data \
    --N_vis 6

Add --with_depth if you want to quantitatively evaluate depth during training. --N_vis denotes the validation frequency, and --imgScale_test is the downsample ratio used during validation (e.g. 0.5). Training takes about 30 hours on a single RTX 2080 Ti for 6 epochs.

Important: always set batch_size to 1 when training the generalizable model; you can enlarge it when fine-tuning.

Checkpoint: a pre-trained checkpoint is included in ckpts/mvsnerf-v0.tar.

Evaluation: We also provide a rendering and quantitative evaluation script in renderer.ipynb, and you can use run_batch.py if you want to test or fine-tune on different datasets. More results can be found here; please check your configuration if your rendering results look abnormal.
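
For reference, a minimal sketch of how the quantitative comparison could be done outside the notebook, assuming pred and gt are HxWx3 float arrays in [0, 1]; the function name and file paths below are illustrative, not the notebook's API:

import imageio
import numpy as np

def psnr(pred, gt):
    # Standard PSNR for images normalized to [0, 1].
    mse = np.mean((pred - gt) ** 2)
    return -10.0 * np.log10(mse)

pred = imageio.imread("rendered_view.png") / 255.0   # hypothetical file names
gt = imageio.imread("gt_view.png") / 255.0
print(f"PSNR: {psnr(pred, gt):.2f} dB")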

Rendering from the trained model should produce results like this:

no-finetuned

Finetuning

Blender

Steps

Data download

Download nerf_synthetic.zip from here

CUDA_VISIBLE_DEVICES=0  python train_mvs_nerf_finetuning_pl.py  \
    --dataset_name blender --datadir /path/to/nerf_synthetic/lego \
    --expname lego-ft  --with_rgb_loss  --batch_size 1024  \
    --num_epochs 1 --imgScale_test 1.0 --white_bkgd  --pad 0 \
    --ckpt ./ckpts/mvsnerf-v0.tar --N_vis 1

LLFF

Steps

Data download

Download nerf_llff_data.zip from here

CUDA_VISIBLE_DEVICES=0  python train_mvs_nerf_finetuning_pl.py  \
    --dataset_name llff --datadir /path/to/nerf_llff_data/{scene_name} \
    --expname horns-ft  --with_rgb_loss  --batch_size 1024  \
    --num_epochs 1 --imgScale_test 1.0  --pad 24 \
    --ckpt ./ckpts/mvsnerf-v0.tar --N_vis 1

DTU

Steps
CUDA_VISIBLE_DEVICES=0  python train_mvs_nerf_finetuning_pl.py  \
    --dataset_name dtu_ft --datadir /path/to/DTU/mvs_training/dtu/scan1 \
    --expname scan1-ft  --with_rgb_loss  --batch_size 1024  \
    --num_epochs 1 --imgScale_test 1.0   --pad 24 \
    --ckpt ./ckpts/mvsnerf-v0.tar --N_vis 1

Rendering

After training or fine-tuning, you can render free-viewpoint videos with renderer-video.ipynb. If you want to use your own data, please use a right-handed coordinate system (intrinsics, near/far bounds, and extrinsics given either as camera-to-world or world-to-camera matrices in OpenCV format) and modify the rendering scripts accordingly.
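
As a purely illustrative sketch (not the repository's API), converting a camera-to-world pose from a Blender/OpenGL-style convention (x right, y up, z backward) to the OpenCV convention (x right, y down, z forward) amounts to flipping the camera's y and z axes:

import numpy as np

# Flip the y and z camera axes of a 4x4 camera-to-world matrix.
blender2opencv = np.diag([1.0, -1.0, -1.0, 1.0])

c2w = np.eye(4)                          # replace with your own 4x4 camera-to-world pose
c2w_opencv = c2w @ blender2opencv        # camera-to-world in OpenCV convention
w2c_opencv = np.linalg.inv(c2w_opencv)   # world-to-camera, if the script expects that

Whether you pass camera-to-world or world-to-camera matrices depends on how you adapt the rendering script, as noted above.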

After 10k iterations (~ 15min), you should have videos like this:

finetuned

Citation

If you find our code or paper helpful, please consider citing:

@article{chen2021mvsnerf,
  title={MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo},
  author={Chen, Anpei and Xu, Zexiang and Zhao, Fuqiang and Zhang, Xiaoshuai and Xiang, Fanbo and Yu, Jingyi and Su, Hao},
  journal={arXiv preprint arXiv:2103.15595},
  year={2021}
}

Big thanks to CasMVSNet_pl; our code partially borrows from it.

Relevant Works

MVSNet: Depth Inference for Unstructured Multi-view Stereo (ECCV 2018)
Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching (CVPR 2020)
Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ECCV 2020)
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng

IBRNet: Learning Multi-View Image-Based Rendering (CVPR 2021)
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser

PixelNeRF: Neural Radiance Fields from One or Few Images (CVPR 2021)
Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa

Comments
  • How to derive homographic warping formula in section 3.1

    Hi, thank you for sharing the cool work. I am wondering how to derive the matrix warping formula in section 3.1 i.e

    [screenshot of the warping equation from the paper]

    Because according to my own derivation, the result is

    [the questioner's own derivation]

    Are there any derivation premises that I have not considered? Thanks in advance

    opened by murumura 4
  • Question regarding the architecture

    Hi, I really appreciate your work; an impressive idea. I want to ask a question regarding the 3D CNN architecture:

    1. It is written that CBR3D0 takes T, the cost volume of shape (BATCH_SIZE, CHANNELS (32), N_DEPTH, HEIGHT / 4, WIDTH / 4), and I, which I guess are the concatenated images of shape (BATCH_SIZE, 9, 1, HEIGHT, WIDTH). So there appears to be a shape inconsistency. Could you please clarify this, if possible?
    2. It also states that CBT3D2 takes the outputs of CTB3D1 and CBR3D2, which is a total of 48 channels instead of 64. Is it a typo? Thanks in advance for your reply!
    opened by oOXpycTOo 4
  • Cannot reproduce the results on LLFF dataset

    Hi, I used your code in render.ipynb and tried to evaluate the pretrained model on LLFF, but I found the metrics are significantly lower than those reported in the paper. For example, for the room scene the resulting PSNR is only 22.99, but in the paper it is 26.95.

    [screenshot of the evaluation output]
    opened by cwchenwang 3
  • cost volume constraint - closer camera view

    As you said in #7 ,

    we think building volume in the target view may provide a better depth quality but is not an efficient way to do the free-viewpoint rendering

    I also feel that the fact that the cost volume must be built in some reference view has many limitations. Leaving aside view extrapolation, how does your method perform when you move the camera closer to the scene (not zooming in, but physically placing the camera closer to the scene)?

    Currently, the LLFF scenes are all captured at roughly the same distance from the scene, so interpolating at this distance looks great, as your video in the readme shows. However, I wonder how it performs in the situation I describe above. With the current cost volume, I can think of two problems:

    1. Although this novel view is still view interpolation (it lies inside the frustums of all reference views), the view to the nearest reference view is very far.
    2. The novel view lies inside the cost volume, a situation that is not seen at all at training time.

    To my knowledge, traditional MPI methods cannot handle this kind of situation well either. NeRF performs excellently without problems since it reconstructs the whole scene in 3D without relying on reference views at test time. Following is an example where I move the camera very close to the scene (left is the nearest reference view and right is the NeRF-synthesized result): [comparison image]

    Pardon me for not being able to run your code myself. I didn't find an easy way to run your code with a specified pose...

    opened by kwea123 3
  • Why finetuning MVSNet

    Hello, thanks for your excellent work! In the original paper, it seems you claim that during fine-tuning MVSNeRF is free from the 2D feature extractor and the 3D CNN (corresponding to the MVSNet class in the code), which means the MVSNet does not need to be optimized during fine-tuning.

    In introduction

    In essence, the encoding volume is a localized neural representation of the radiance field; once estimated, this volume can be used directly (dropping the 3D CNN) for final rendering by differentiable ray marching.

    In section 3.4

    Note that, we optimize only the encoding volume and the MLP, instead of our entire network

    However, in train_mvs_nerf_finetuning_pl.py and models.py, it seems that the MVSNet also needs to be optimized during fine-tuning, which confuses me.

    Could you please explain the difference between the paper and the code? Many thanks!

    opened by ldz666666 2
  • few views in training

    In dtu.py:

    if self.split=='train':
        ids = torch.randperm(5)[:3]
        view_ids = [src_views[i] for i in ids] + [target_view]
    

    It seems that only a small number of images are used for training in the dtu dataset?

    opened by caiyongqi 2
  • H*W or W*H?

    Hi. src_grid = src_grid.view(B, D, W_pad, H_pad, 2) in function homo_warp. The shape of src_grid seems to be (B, DHW, 2), so is there something wrong with the above code?

    opened by caiyongqi 2
  • Some questions about code details and abnormal phenomenon

    Hi, it's very kind of you to release the code for the paper. I have some questions about code details and an abnormal phenomenon:

    [code screenshots] As shown in the first image, I am a bit confused by the code in lines 140-144 of ./data/dtu.py. The reference should be the first element of "view_ids", but in lines 140-144 you have put it in the last position. Is something wrong with my understanding?

    And again, in lines 233-234 of ./train_mvs_nerf_pl.py, two notions appear, tgt and ref. May I ask what exactly src, ref, and tgt stand for?

    Is the "with_depth" option means that we have got the depth map of dataset and we should not activate it if we do not get the depth map?

    Last question: when I train on my own dataset, NaNs appear after a few iterations. Have you ever met the same problem?

    Looking forward to your answers. Thanks for your work again.

    opened by Impuuuuuu 2
  • Enquiry into dtu_pairs.txt file

    Hi, thanks for your great work. I would like to ask how the dtu_pairs.txt file was created (not the pairs.th file), and especially what the floating-point values in each line mean.

    Thank you

    opened by mckhang 2
  • About the img_downscale

    Hi, I have pulled your newest code. However, I cannot run the finetuning script because img_scale is not defined in train_mvs_nerf_finetuning_pl.py.

    opened by starhiking 2
  • About the accuracy

    Hi, I have some questions concerning accuracy in the following:

    1. How do I evaluate the accuracy on the 16 DTU scenes? I have changed max_len from 10 to -1 in val_dataset. Is this modification correct, and are there other ways to evaluate?

    2. Except for the #114 scene in DTU, I cannot reproduce the accuracy of the other scenes (1, 8, 21, 103) reported in the paper. Could you please provide the commands for pretraining and fine-tuning (DTU) that reach the official accuracy? I am sure the dataset is right because #114 also comes from it.

    3. By the way, is the newest update (597f5) significant? Can the older version (736d9a4b) still reach the reported accuracy?

    opened by starhiking 2
  • On extrinsic preprocess

    Hello, thanks for your great work. I wonder, when applying MVSNeRF to other types of data (faces or bodies, for example), what I should do with the extrinsics and intrinsics of my own data. To be more detailed:

    1. Do I need to convert the extrinsic matrices to OpenCV format? If so, is a simple matrix multiplication between the extrinsics (c2w for my own data) and the blender2opencv matrix enough?
    2. How should I deal with the scale factor and the downsample factor as in ./data/dtu.py? Do I need to perform intrinsic[:2] = intrinsic[:2]*4 when collecting the intrinsics and intrinsic[:2] = intrinsic[:2]/4 when computing the projection matrices, as in dtu.py, or directly divide by 4 without the intrinsic[:2] = intrinsic[:2]*4 step?

    Looking forward to your reply!

    opened by ldz666666 0
  • About less than 3 images to reconstruct mesh

    Dear author, if I want to use 3 or fewer images to reconstruct a complicated mesh, such as a full human body with clothes, which method would do a better job now: PixelNeRF, MVSNeRF, or any other feasible method?

    By the way, will this project develop mesh generation functionality in the future?

    Look forward to your reply, thank you.

    opened by skyohorizon 0
  • About your volume rendering implementation

    Hi there,

    Thank you for the great work! When I try to follow your code, I have a question about the raw2alpha function at https://github.com/apchenstu/mvsnerf/blob/main/renderer.py#L18, which transforms volume density into rendering weights. However, in the original NeRF paper, the distance between ray points should be another factor in the exponential function. In both your paper and your code, it seems that this term is gone.

    I tried your original code and found that ignoring the distance does not affect the performance that much. But is there any particular reason that you drop this term?

    Thank you for your time and look forward to your reply!

    opened by jcliu0428 0
  • Question about the position input of the MLP x.

    Hi! Thanks for open sourcing the code of your wonderful work!

    I have some questions about the position input x of the NeRF MLP. If I understand correctly, x is in NDC space, but the same point in world space may have a different 'x' in NDC space as the output view changes. Could you share some insight on the effect of x here? Thanks a lot!

    opened by GostInShell 0