PyTorch implementation of paper "IBRNet: Learning Multi-View Image-Based Rendering", CVPR 2021.

IBRNet: Learning Multi-View Image-Based Rendering

IBRNet: Learning Multi-View Image-Based Rendering
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser
CVPR 2021

project page | paper | data & model

Installation

Clone this repo with submodules:

git clone --recurse-submodules https://github.com/googleinterns/IBRNet
cd IBRNet/

The code is tested with Python 3.7, PyTorch 1.5, and CUDA 10.2. We recommend using Anaconda to make sure that all dependencies are in place. To create an Anaconda environment:

conda env create -f environment.yml
conda activate ibrnet
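
To confirm that the activated environment matches the tested versions, a small check like the one below can help. This is a minimal sketch and not part of the repository: the helper names parse_major_minor and version_mismatches are hypothetical, and only the major.minor parts of each version are compared.

```python
import platform

# Versions the README says the code was tested with.
TESTED = {"python": (3, 7), "torch": (1, 5), "cuda": (10, 2)}


def parse_major_minor(version):
    """Turn a version string such as '1.5.0+cu102' into (1, 5)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def version_mismatches(found):
    """Return one message per component that differs from TESTED.

    `found` maps 'python', 'torch' and 'cuda' to version strings (or None).
    """
    messages = []
    for name, tested in TESTED.items():
        version = found.get(name)
        expected = "{}.{}".format(*tested)
        if version is None:
            messages.append(f"{name}: not found (tested with {expected})")
        elif parse_major_minor(version) != tested:
            messages.append(f"{name}: found {version} (tested with {expected})")
    return messages


if __name__ == "__main__":
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch  # only importable inside the conda environment

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass
    for message in version_mismatches(found) or ["all versions match"]:
        print(message)
```

A mismatch does not necessarily mean the code will fail; it only flags a setup that differs from the tested one.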

Datasets

1. Training datasets

├──data/
    ├──ibrnet_collected_1/
    ├──ibrnet_collected_2/
    ├──real_iconic_noface/
    ├──spaces_dataset/
    ├──RealEstate10K-subset/
    ├──google_scanned_objects/

Please first cd data/, and then download datasets into data/ following the instructions below. The organization of the datasets should be the same as above.
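
Once the downloads below are done, the layout can be checked with a few lines of Python. The sketch below only tests that the top-level folders listed above exist; the function name missing_dataset_dirs is hypothetical, and the contents of each folder are not inspected.

```python
from pathlib import Path

# Top-level folders from the training layout shown above.
TRAINING_DIRS = (
    "ibrnet_collected_1",
    "ibrnet_collected_2",
    "real_iconic_noface",
    "spaces_dataset",
    "RealEstate10K-subset",
    "google_scanned_objects",
)


def missing_dataset_dirs(data_root, expected=TRAINING_DIRS):
    """Return the expected folder names that are absent under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the project root, where data/ lives.
    missing = missing_dataset_dirs("data")
    print("missing:", ", ".join(missing) if missing else "none")
```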

(a) Our captures

We captured 67 forward-facing scenes (each scene contains 20-60 images). To download our data ibrnet_collected.zip (4.1G) for training, run:

gdown https://drive.google.com/uc?id=1rkzl3ecL3H0Xxf5WTyc2Swv30RIyr1R_
unzip ibrnet_collected.zip

P.S. We've captured some more scenes in ibrnet_collected_more.zip, but we didn't include them for training. Feel free to download them if you would like more scenes for your task, but you wouldn't need them to reproduce our results.

(b) LLFF released scenes

Download and process real_iconic_noface.zip (6.6G) using the following commands:

# download 
gdown https://drive.google.com/uc?id=1ThgjloNt58ZdnEuiCeRf9tATJ-HI0b01
unzip real_iconic_noface.zip

# [IMPORTANT] remove scenes that appear in the test set
cd real_iconic_noface/
rm -rf data2_fernvlsb data2_hugetrike data2_trexsanta data3_orchid data5_leafscene data5_lotr data5_redflower
cd ../
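
Because training on test scenes would invalidate the evaluation, it can be worth confirming that the removal succeeded. The following is a minimal sketch, assuming it is run from data/ after the commands above; leftover_test_scenes is a hypothetical helper, and the scene names are copied from the rm command.

```python
from pathlib import Path

# Scene folders deleted by the rm command above.
HELD_OUT_SCENES = (
    "data2_fernvlsb",
    "data2_hugetrike",
    "data2_trexsanta",
    "data3_orchid",
    "data5_leafscene",
    "data5_lotr",
    "data5_redflower",
)


def leftover_test_scenes(scene_root):
    """Return the held-out scene names that still exist under scene_root."""
    root = Path(scene_root)
    return [name for name in HELD_OUT_SCENES if (root / name).exists()]


if __name__ == "__main__":
    leftovers = leftover_test_scenes("real_iconic_noface")
    if leftovers:
        print("still present, remove before training:", ", ".join(leftovers))
    else:
        print("no held-out scenes found")
```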

(c) Spaces Dataset

Download the Spaces dataset with:

git clone https://github.com/augmentedperception/spaces_dataset

(d) RealEstate10K

The full RealEstate10K dataset is very large and can be difficult to download. Hence, we provide a subset of the RealEstate10K training set containing only 200 scenes. In our experiments, we found that using more scenes from RealEstate10K provides only marginal improvement. To download our camera files (2MB):

gdown https://drive.google.com/uc?id=1IgJIeCPPZ8UZ529rN8dw9ihNi1E9K0hL
unzip RealEstate10K_train_cameras_200.zip -d RealEstate10K-subset

Besides the camera files, you also need to download the corresponding video frames from YouTube. You can download the frames (29G) by running the following commands. The script uses ffmpeg to extract frames, so please make sure you have ffmpeg installed.

git clone https://github.com/qianqianwang68/RealEstate10K_Downloader
cd RealEstate10K_Downloader
python generate_dataset.py train
cd ../

(e) Google Scanned Objects

Google Scanned Objects contains 1032 diffuse objects with various shapes and appearances. We use gaps to render these objects for training. Each object is rendered at 512 × 512 pixels from viewpoints on a quarter of the sphere, with 250 views per object. To download our renderings (7.5GB), run:

gdown https://drive.google.com/uc?id=1w1Cs0yztH6kE3JIz7mdggvPGCwIKkVi2
unzip google_scanned_objects_renderings.zip

2. Evaluation datasets

├──data/
    ├──deepvoxels/
    ├──nerf_synthetic/
    ├──nerf_llff_data/

The evaluation datasets are the DeepVoxels synthetic dataset, the NeRF realistic 360 dataset, and the real forward-facing dataset. To download all three datasets (6.7G), run the following command under the data/ directory:

bash download_eval_data.sh

Evaluation

First download our pretrained model under the project root directory:

gdown https://drive.google.com/uc?id=165Et85R8YnL-5NcehG0fzqsnAUN8uxUJ
unzip pretrained_model.zip

You can use eval/eval.py to evaluate the pretrained model. For example, to obtain the PSNR, SSIM and LPIPS on the fern scene in the real forward-facing dataset, you can first specify your paths in configs/eval_llff.txt and then run:

cd eval/
python eval.py --config ../configs/eval_llff.txt
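
For reference, PSNR is derived from the mean squared error between a rendered image and the ground truth. The sketch below shows that formula on flat lists of pixel values in [0, 1]. It is not the code in eval/eval.py, and the function names are hypothetical. SSIM and LPIPS need dedicated implementations and are not shown.

```python
import math


def mean_squared_error(pred, target):
    """Mean squared error between two equally long sequences of pixel values."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    error = mean_squared_error(pred, target)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    target = [0.5, 0.5, 0.5, 0.5]
    pred = [0.6, 0.6, 0.6, 0.6]  # every pixel is off by 0.1
    print(f"PSNR: {psnr(pred, target):.2f} dB")
```

A uniform error of 0.1 gives an MSE of about 0.01, which corresponds to roughly 20 dB.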

Rendering videos of smooth camera paths

You can use render_llff_video.py to render videos of smooth camera paths for the real forward-facing scenes. For example, you can first specify your paths in configs/eval_llff.txt and then run:

cd eval/
python render_llff_video.py --config ../configs/eval_llff.txt

You can also capture your own data of forward-facing scenes and synthesize novel views using our method. Please follow the instructions from LLFF on how to capture and process the images.

Training

We strongly recommend training the model with multiple GPUs:

# this example uses 8 GPUs (nproc_per_node=8) 
python -m torch.distributed.launch --nproc_per_node=8 train.py --config configs/pretrain.txt

Alternatively, you can train with a single GPU by setting distributed=False in configs/pretrain.txt and running:

python train.py --config configs/pretrain.txt
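
In the multi-GPU case, torch.distributed.launch starts one process per GPU and, by default, passes each process a --local_rank argument, while the single-GPU command passes none. A script can accept both invocations by giving that argument a default of 0. The sketch below illustrates this with argparse only. The repository's actual argument handling may differ, and parse_launch_args is a hypothetical name.

```python
import argparse


def parse_launch_args(argv=None):
    """Parse the two arguments that matter for the commands above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, required=True,
                        help="path to a config file such as configs/pretrain.txt")
    # Filled in by torch.distributed.launch; stays 0 for a single-GPU run.
    parser.add_argument("--local_rank", type=int, default=0,
                        help="index of the GPU this process should use")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # What process 3 of an 8-GPU launch would receive (illustrative values).
    args = parse_launch_args(["--local_rank=3", "--config", "configs/pretrain.txt"])
    print(args.local_rank, args.config)
```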

Finetuning

To finetune on a specific scene, for example, fern, using the pretrained model, run:

# this example uses 2 GPUs (nproc_per_node=2) 
python -m torch.distributed.launch --nproc_per_node=2 train.py --config configs/finetune_llff.txt

Additional information

  • Our current implementation is not well optimized for inference speed. Rendering a 1000x800 image can take from 30s to over a minute, depending on the GPU model. To reduce inference time, increase the chunk size so that GPU memory is used as fully as possible. You can also try decreasing the number of input source views, at the cost of some loss in performance.
  • If you want to create and train on your own datasets, you can implement your own Dataset class following our examples in ibrnet/data_loaders/. You can verify the camera poses using data_verifier.py in ibrnet/data_loaders/.
  • Since the evaluation datasets are either object-centric or forward-facing scenes, our provided view selection methods are very simple (based on either viewpoints or camera locations). If you want to evaluate our method on new scenes with other kinds of camera distributions, you might need to implement your own view selection methods to identify the most effective source views.
  • If you have any questions, you can contact [email protected].
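
The effect of the chunk size in the first point can be illustrated without any GPU code: the rays of an image are rendered in consecutive slices, so a larger chunk means fewer passes through the network. The sketch below is an assumption-laden illustration, not the repository's renderer. iter_chunks and render_in_chunks are hypothetical names, and render_fn stands in for whatever function renders a batch of rays.

```python
def iter_chunks(num_rays, chunk_size):
    """Yield (start, end) index pairs that cover num_rays in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_rays, chunk_size):
        yield start, min(start + chunk_size, num_rays)


def render_in_chunks(rays, render_fn, chunk_size):
    """Apply render_fn to consecutive slices of rays and join the results."""
    outputs = []
    for start, end in iter_chunks(len(rays), chunk_size):
        outputs.extend(render_fn(rays[start:end]))
    return outputs


if __name__ == "__main__":
    num_rays = 1000 * 800  # one ray per pixel of a 1000x800 image
    for chunk_size in (2048, 4096):
        passes = sum(1 for _ in iter_chunks(num_rays, chunk_size))
        print(f"chunk size {chunk_size}: {passes} passes")
```

Doubling the chunk size roughly halves the number of passes, which is why the largest chunk that fits in GPU memory is usually the fastest choice.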

Citation

@inproceedings{wang2021ibrnet,
  author    = {Wang, Qianqian and Wang, Zhicheng and Genova, Kyle and Srinivasan, Pratul and Zhou, Howard  and Barron, Jonathan T. and Martin-Brualla, Ricardo and Snavely, Noah and Funkhouser, Thomas},
  title     = {IBRNet: Learning Multi-View Image-Based Rendering},
  booktitle = {CVPR},
  year      = {2021}
}

Comments
  • Downloading RealEstate10K frames fails

    Hi,

    I am trying to train the network but I haven't been able to download the RealEstate10K frames from the download script. It doesn't give me any errors and the output says that they are downloaded, but none of them are and they all go into failed_videos_train.txt.

    Is there a different way I could download these frames to reproduce the results?

    Thank you

    opened by violetamenendez 4
  • The results generated with the pretrained weights look weird

    I have downloaded the evaluation dataset and the pretrained weights, but the results generated with the pretrained weights look like this: (image), and the score is also low: (image). The scores for other test sets are also low.

    opened by FomalhautB 2
  • The results on ibrnet_collected are weird

    The results on the ibrnet_collected dataset don't look very good, but the same pretrained model gives good results on the LLFF datasets. GT: (image). Output: (image). GT: (image). Output: IMG_3099_pred_fine.

    opened by silence401 1
  • The evaluated performance is a little bit lower than in the paper

    I downloaded the pretrained weights and used the eval_deepvoxels.sh script to evaluate the model. But the evaluated performance is a little bit lower than in the paper:

    ------cube-------
    final coarse psnr: 32.93767583847046, final fine psnr: 32.02404493331909
    fine coarse ssim: 0.9823936659097672, final fine ssim: 0.9840710365772247 
    final coarse lpips: 0.019328504391014575, fine fine lpips: 0.019714539949782194 
    
    ------vase-------
    final coarse psnr: 34.84811542510986, final fine psnr: 35.24699348449707
    fine coarse ssim: 0.9875932204723358, final fine ssim: 0.9840710365772247 
    final coarse lpips: 0.01578717289492488, fine fine lpips: 0.015975559004582463 
    
    ------armchair-------
    final coarse psnr: 39.09828433990479, final fine psnr: 38.42974273681641
    fine coarse ssim: 0.9945194631814956, final fine ssim: 0.9945065212249756 
    final coarse lpips: 0.027687973510473966, fine fine lpips: 0.02769788108766079 
    
    ------greek-------
    final coarse psnr: 38.57263313293457, final fine psnr: 38.17310089111328
    fine coarse ssim: 0.984335949420929, final fine ssim: 0.9856698685884475 
    final coarse lpips: 0.024405473172664643, fine fine lpips: 0.022999074896797537 
    

    Whether I average the coarse or the fine results, the average score is about 3-5% lower than in the paper.

    opened by FomalhautB 1
  • Obtaining Mesh through marching cubes as in NeRF

    Hello! Thanks for the great work! I was wondering how we could obtain a 3D mesh model, as in NeRF, using IBRNet. The input to the model is a source view's viewing directions, and I am unsure how we could retrieve the sigma value at a specific (x, y, z) location.

    Thank you

    opened by november07 1
  • normalize pixels?

    Hi,

    I read the code in projection.py. However, I wonder why we need to normalize the pixels after reprojecting 3D points to the image plane.

        def normalize(self, pixel_locations, h, w):
            resize_factor = torch.tensor([w-1., h-1.]).to(pixel_locations.device)[None, None, :]
            normalized_pixel_locations = 2 * pixel_locations / resize_factor - 1.  # [n_views, n_points, 2]
            return normalized_pixel_locations
    
    opened by AIBluefisher 1
  • Identification for GSO objects in the repo

    Thanks for your impressive work! I would like to know the relationship between the object IDs in your repo and those on the GSO website. For example, the names on the GSO website look like "YumYum_D3_Liquid"/"ZigKick_Hoops", but the names in your repo look like "hN6VWtwq96u"/"Fb7Ffb1zM46". Could you please tell me the names of the GSO objects you use?

    opened by ZhangCYG 1
  • Config of finetuning on Realistic Synthetic dataset?

    Hi, I noticed you only have the fine-tuning config for LLFF dataset. Could you provide more information on fine-tuning on other datasets, i.e. Realistic Synthetic 360 and Diffuse Synthetic 360?

    opened by cwchenwang 0
  • About ablation study

    Hi, I really like this paper and code! The code is so clean and easy to understand :)

    I have a question about ablation results regarding "Multi-view feature aggregation" (Section 3.2.1) in the paper. The paper says that a global feature, which consists of the mean and variance of the multi-view features, improves the model's occlusion handling capability compared with a direct average or max-pooling in a PointNet. But I cannot find experiments about this in the manuscript or the supplementary material.

    Could you give some results for the statement? Or at least some insights!

    Many thanks in advance.

    opened by hongsukchoi 0
Owner
Google Interns