MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Overview

Code for the following paper:

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images
Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, James Tompkin
ECCV 2020

High-level overview of approach.

See more at our project page.

If you use this code, please cite:

@inproceedings{Attal:2020:ECCV,
    author    = "Benjamin Attal and Selena Ling and Aaron Gokaslan and Christian Richardt and James Tompkin",
    title     = "{MatryODShka}: Real-time {6DoF} Video View Synthesis using Multi-Sphere Images",
    booktitle = "European Conference on Computer Vision (ECCV)",
    month     = aug,
    year      = "2020",
    url       = "https://visual.cs.brown.edu/matryodshka"
}

Note that our code is based on the code from the paper "Stereo Magnification: Learning View Synthesis using Multiplane Images" by Zhou et al. [1], and on the code from the paper "Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images" by Wang et al. [3]. Please also cite their work.

Setup

  • Create a conda environment from the matryodshka-gpu.yml file.
  • Run ./download_glob.sh to download the files needed for training and testing.
  • Download the dataset as described in the Replica dataset section below.
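
For reference, the setup steps above amount to roughly the following shell session (the conda environment name is whatever matryodshka-gpu.yml defines; "matryodshka-gpu" below is an assumption):

    # Create and activate the conda environment from the provided yml file.
    # The environment name "matryodshka-gpu" is an assumption; use the name defined in the file.
    conda env create -f matryodshka-gpu.yml
    conda activate matryodshka-gpu

    # Download the files needed for training and testing.
    ./download_glob.sh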

Training the model

See train.py for training the model.

  • To train with transform inverse regularization, use the --transform_inverse_reg flag.

  • To train with CoordNet, use the --coord_net flag.

  • To experiment with different losses (elpips or l2), use the --which_loss flag.

    • To train with spherical weighting on the loss maps, use the --spherical_attention flag.
  • To train with a graph convolutional network (GCN), use the --gcn flag. Note that the particular GCN architecture definition we used is from the Pixel2Mesh repo [3].

  • The current scripts support training on the Replica 360 and cubemap datasets and on the RealEstate10K dataset. Use the --input_type flag to switch between these input types (ODS, PP, REALESTATE_PP).

See scripts/train/*.sh for some sample scripts.
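
For illustration, a training invocation combining several of the flags above might look like the sketch below. Only the flags named in this README are certain; the dataset-path flags (--cameras_glob, --image_dir) are assumptions made by analogy with test.py and may be named differently in train.py, so check the sample scripts for the exact arguments.

    # Sketch of a training run on the Replica ODS data.
    # The path flags below are assumptions; see scripts/train/*.sh for the real invocations.
    python train.py \
      --cameras_glob 'replica-6dof/6dof/*.txt' \
      --image_dir 'train_640x320/' \
      --input_type ODS \
      --which_loss elpips \
      --transform_inverse_reg \
      --coord_net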

Testing the model

See test.py for testing the model on the Replica 360 test set.

  • When testing on video frames, e.g. test_video_640x320, include on_video in the --test_type flag.
  • When testing on high-resolution images, include high_res in the --test_type flag.

See scripts/test/*.sh for sample scripts.
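
For concreteness, a test invocation has roughly the following shape (paths are placeholders; this mirrors the command a user posted in the Comments section below, so check the sample scripts for the exact flags):

    # Test on the Replica video frames with a CoordNet checkpoint; paths are placeholders.
    python test.py \
      --cameras_glob 'replica-6dof/6dof/apartment_0_6dof.txt' \
      --image_dir 'test_video_640x320/' \
      --test_type on_video \
      --input_type ODS \
      --experiment_name matryodshka-with-transform-inverse-reg-checkpoint \
      --checkpoint_dir 'pretrained-models/' \
      --output_root 'output/' \
      --coord_net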

Evaluation

See eval.py for evaluating the model; it saves the metric scores into a JSON file. We evaluate our models on:

  • third-view reconstruction quality

    • See scripts/eval/*-reg.sh for a sample script.
  • frame-to-frame reconstruction differences on video sequences to evaluate the effect of transform inverse regularization on temporal consistency.

    • Include on_video when specifying the --eval_type flag.
    • See scripts/eval/*-video.sh for a sample script.
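
As a rough sketch, a temporal-consistency evaluation run could look like the following. Apart from --eval_type, which is documented above, the remaining flags are assumptions made by analogy with test.py and may differ in eval.py; scripts/eval/*-video.sh has the exact invocation.

    # Sketch only: evaluate frame-to-frame consistency on video sequences.
    # All flags except --eval_type are assumptions; see scripts/eval/*-video.sh.
    python eval.py \
      --eval_type on_video \
      --checkpoint_dir 'pretrained-models/' \
      --output_root 'eval-output/'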

Pre-trained model

Download models pre-trained with and without transform inverse regularization by running ./download_model.sh. These can also be found here at the Brown library for archival purposes.

Replica dataset

We rendered a 360 and a cubemap dataset for training from the Facebook Replica Dataset [2]. This data can be found here at the Brown library for archival purposes. You should have access to the following datasets.

  • train_640x320
  • test_640x320
  • test_video_640x320

You can also find the camera pose information that was used to render the training dataset here. Each line of the txt file is formatted as follows:

camera_position_x camera_position_y camera_position_z ods_baseline target1_offset_x target1_offset_y target1_offset_z target2_offset_x target2_offset_y target2_offset_z target3_offset_x target3_offset_y target3_offset_z
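
For example, the 13 whitespace-separated fields of one pose line can be unpacked in a bash script as follows (the file name apartment_0_6dof.txt is just an example pose file):

    # Read the first pose line into named variables (13 fields per line).
    read -r cam_x cam_y cam_z ods_baseline \
            t1_x t1_y t1_z t2_x t2_y t2_z t3_x t3_y t3_z \
            < <(head -n 1 apartment_0_6dof.txt)
    echo "camera position: $cam_x $cam_y $cam_z, ODS baseline: $ods_baseline"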

We also have a fork of the Replica dataset codebase which can regenerate our data from scratch. This contains customized rendering scripts that allow output of ODS, equirectangular, and cubemap projection spherical imagery, along with corresponding depth maps.

Note that the 360 dataset we release for download was rendered with an incorrect 90-degree camera rotation around the up vector and a horizontal flip. Regenerating the dataset from our released code fork with the customized rendering scripts will not include this coordinate change. The output model performance should be approximately the same.

Exporting the model to ONNX

We export our model to ONNX by first converting the checkpoint into a .pb file, which is then converted to ONNX with the tf2onnx module. See export.py for exporting the model to a .pb file.

See scripts/export/model-name.sh for a sample script to run export.py, and scripts/export/pb2onnx.sh for a sample script to run pb-to-onnx conversion.
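
For the second step, the tf2onnx command-line conversion of a frozen graph generally takes the form below. The graph file name and the input/output tensor names are placeholders, so take the exact values from scripts/export/pb2onnx.sh.

    # Convert the frozen .pb graph to ONNX with the tf2onnx module.
    # File and tensor names are placeholders; see scripts/export/pb2onnx.sh for the real ones.
    python -m tf2onnx.convert \
      --graphdef model.pb \
      --inputs input:0 \
      --outputs output:0 \
      --output model.onnx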

Unity Application + ONNX to TensorRT Conversion

We are still working on releasing the real-time Unity application and onnx2trt conversion scripts. Please bear with us!

References

[1] Zhou, Tinghui, et al. "Stereo magnification: Learning view synthesis using multiplane images." arXiv preprint arXiv:1805.09817 (2018). https://github.com/google/stereo-magnification

[2] Straub, Julian, et al. "The Replica dataset: A digital replica of indoor spaces." arXiv preprint arXiv:1906.05797 (2019). https://github.com/facebookresearch/Replica-Dataset

[3] Wang, Nanyang, et al. "Pixel2Mesh: Generating 3D mesh models from single RGB images." Proceedings of the European Conference on Computer Vision (ECCV). 2018. https://github.com/nywang16/Pixel2Mesh

Comments
  • Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>

    Hello,

    Thanks for the amazing work! :) I have been trying to run the test data. This is the error I am getting:

    
    INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, 2 root error(s) found.
      (0) Invalid argument: Expect 8 fields but have 13 in record 0
    	 [[{{node DecodeCSV}}]]
    	 [[IteratorGetNext]]
      (1) Invalid argument: Expect 8 fields but have 13 in record 0
    	 [[{{node DecodeCSV}}]]
    	 [[IteratorGetNext]]
    	 [[IteratorGetNext/_231]]
    0 successful operations.
    0 derived errors ignored.
    I0406 06:04:50.266259 139791091263232 coordinator.py:224] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, 2 root error(s) found.
      (0) Invalid argument: Expect 8 fields but have 13 in record 0
    	 [[{{node DecodeCSV}}]]
    	 [[IteratorGetNext]]
      (1) Invalid argument: Expect 8 fields but have 13 in record 0
    	 [[{{node DecodeCSV}}]]
    	 [[IteratorGetNext]]
    	 [[IteratorGetNext/_231]]
    0 successful operations.
    0 derived errors ignored.
    Traceback (most recent call last):
      File "test.py", line 399, in <module>
        tf.app.run()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 303, in run
        _run_main(main, args)
      File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "test.py", line 206, in main
        [ins, outs, jitter_outs, step] = sess.run([inputs, outputs, jitter_outputs, global_step])
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
        run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
      (0) Invalid argument: Expect 8 fields but have 13 in record 0
    	 [[{{node DecodeCSV}}]]
    	 [[IteratorGetNext]]
      (1) Invalid argument: Expect 8 fields but have 13 in record 0
    	 [[{{node DecodeCSV}}]]
    	 [[IteratorGetNext]]
    	 [[IteratorGetNext/_231]]
    0 successful operations.
    0 derived errors ignored.
    

    Below is the command I have been using:

    python test.py \
    --cameras_glob '/home/DNN_Methods/Softwares/matryodshka-main/replica-6dof/6dof/apartment_0_6dof.txt' \
    --image_dir '/home/DNN_Methods/Softwares/matryodshka-main/input/' \
    --test_type on_video \
    --input_type ODS \
    --experiment_name matryodshka-with-transform-inverse-reg-checkpoint \
    --checkpoint_dir '/home/DNN_Methods/Softwares/matryodshka-main/pretrained-models/' \
    --output_root '/home/DNN_Methods/Softwares/matryodshka-main/trial_1/' \
    --coord_net
    

    Could you please tell me where it is going wrong?

    Thanks very much! Any help would be much appreciated :)

    opened by TanyaStevens 4
  • Latest commit does not work

    Hello guys, thanks for sharing your code & data, but the latest commit in this repo is not working for me. Using the pretrained model & Replica 360 data that you released, the output PSV is all a single color, which seems wrong. However, commit 55df70335 works pretty well, so I guess your latest commit may not match your released model or data.

    opened by Cydiater 2
  • Training MatryODShka on ERP Images

    Hello @jamestompkin and @iszihan, I am a student currently working with spherical view synthesis. I would like to be able to make comparisons with MatryODShka but am using ERP images instead of ODS images. Would it be possible to also train MatryODShka with ERP images as input? If not, do you think it would be fair to instead create the 2 sphere volumes using 2 ERP images and projecting each to a location between them?

    opened by MarcelRogge 2
  • Some questions about MSI features and training

    Hi @jamestompkin, thanks for your generous sharing. I want to ask some questions. First, I noticed that another paper, "Immersive Light Field Video with a Layered Mesh Representation", also trains with MSI features; are they the same?

    Second, since you use ODS images for training and the paper above uses almost 50 images from different angles, I wonder whether one could use a few fisheye cameras (e.g., four located at the vertices of a square) to capture a dynamic scene and train your network, since fisheye cameras can capture the same scene with fewer cameras compared to the above paper.

    Third, four fisheye cameras can capture more detail than your two ODS images; I wonder if these inputs could yield better results?

    opened by visonpon 1
  • An error when running test.py

    Thanks very much for sharing the code. However, when I try to execute test.py with the pre-trained models, I receive an error about the checkpoint.

    Exception has occurred: InvalidArgumentError Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

    Assign requires shapes of both tensors to match. lhs shape= [3,3,256,256] rhs shape= [3,3,257,256] [[node save/Assign_17 (defined at home/tuxiang/Documents/wjy/matryodshka-main/test.py:192) ]]

    Errors may have originated from an input operation. Input Source operations connected to node save/Assign_17: net/conv3_1/weights (defined at home/tuxiang/Documents/wjy/matryodshka-main/matryodshka/nets.py:411)

    It would be great if you could provide some insight for solving this issue.

    Thank you very much.

    opened by Miawwww 0
  • Wrap kernels + reduced pole artifacts

    Added wrap padding to the MSI training network (note that it has not been added to the inference net). This helps reduce artifacts at the sides of the resulting images/depth maps. Also modified the ODS projection method to set out-of-bounds phi values to the top/bottom of the image to reduce pole artifacts in the results.

    opened by maggie1059 0
  • Where is "matryodshka-gpu.yml"?

    I would like to try the code, but I can't find "matryodshka-gpu.yml", so I don't know the versions of TensorFlow and the other packages. Can you add the file to this repository? Thanks!

    opened by FutureShow 0