Geometry-Free View Synthesis: Transformers and no 3D Priors

CompVis Heidelberg

Last update: Dec 22, 2022

Robin Rombach*, Patrick Esser*, Björn Ommer
* equal contribution

arXiv | BibTeX | Colab

Interactive Scene Exploration Results

RealEstate10K:

Videos: short (2min) / long (12min)

ACID:

Videos: short (2min) / long (9min)

Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.

Installation

The demo requires building a PyTorch extension. If you have a working development environment with PyTorch, g++ and nvcc, you can simply run

pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis

If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/
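Before attempting either installation route, it can help to confirm that the prerequisites named above are visible from the current environment. The following is a minimal sketch and not part of the repository; the function name check_build_prereqs is hypothetical, and it only checks that the tools can be found, not that their versions are compatible with each other.

```python
import importlib.util
import shutil


def check_build_prereqs():
    """Report which build prerequisites are discoverable on this machine."""
    return {
        # PyTorch must be importable in the active Python environment.
        "torch": importlib.util.find_spec("torch") is not None,
        # The host C++ compiler and the CUDA compiler must be on PATH.
        "g++": shutil.which("g++") is not None,
        "nvcc": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, found in check_build_prereqs().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If PyTorch is installed with CUDA support, torch.cuda.get_device_capability() returns the (major, minor) compute capability of the GPU, which is the number the conda fallback above refers to.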

Running

After installation, running

braindance.py

will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.

You can move again with the WASD keys. Mouse control can be activated with the m key. To run the demo on your own images, pass the path to an image, or to a directory from which to select an image, as the positional path argument to braindance.py. By default, the demo uses the re_impl_nodepth model (trained on RealEstate without explicit transformation and without depth input); this can be changed with the --model flag. The corresponding checkpoints are downloaded the first time they are required. Specify an output path with --video path/to/vid.mp4 to record a video.

> braindance.py -h
usage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}] [--video [VIDEO]] [path]

What's up, BD-maniacs?

key(s)       action                  
=====================================
wasd         move around             
arrows       look around             
m            enable looking with mouse
space        render with transformer 
q            quit                    

positional arguments:
  path                  path to image or directory from which to select image. Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if unspecified).
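
The help output above implies a fairly small command-line interface. As a sketch of how such an interface could be declared with argparse (this is a reconstruction from the help text, not the actual source of braindance.py; the defaults for --video and path are assumptions):

```python
import argparse

MODELS = ["re_impl_nodepth", "re_impl_depth", "ac_impl_nodepth", "ac_impl_depth"]


def build_parser():
    """Build a parser mirroring the options shown in the help output."""
    parser = argparse.ArgumentParser(prog="braindance.py")
    parser.add_argument("--model", choices=MODELS, default="re_impl_nodepth",
                        help="pretrained model to use.")
    # nargs="?" matches the "[VIDEO]" notation: the value itself is optional.
    parser.add_argument("--video", nargs="?", default=None,
                        help="path to write video recording to.")
    # The positional path is optional; None stands for the default example.
    parser.add_argument("path", nargs="?", default=None,
                        help="path to image or directory of images.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--model", "ac_impl_depth", "--video", "out.mp4", "my_images/"]
    )
    print(args.model, args.video, args.path)
```

One consequence of an optional value for --video: when --video is followed directly by a single path, that path is taken as the video file and not as the image path, so it is safer to put the image path first or to always give --video an explicit value.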

Training

Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format as described here and the preparation is the same for both of them. You will need to have colmap installed and available on your $PATH.

We assume that you have extracted the .txt files of the dataset you want to prepare into $TXT_ROOT, e.g. for RealEstate:

> tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into $IMG_ROOT, e.g. for RealEstate:

> tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...

To prepare the $SPLIT split of the dataset ($SPLIT being one of train, test for RealEstate and train, test, validation for ACID) in $SPA_ROOT, run the following within the scripts directory:

python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}

You can also simply set TXT_ROOT, IMG_ROOT and SPA_ROOT as environment variables and run ./sparsify_realestate.sh or ./sparsify_acid.sh. Take a look into the sources to run with multiple workers in parallel.
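
When driving the preparation from Python instead of the shell scripts, the per-split command can be assembled from the same three variables. This is a sketch only; the helper name sparsify_command is hypothetical, and it merely reproduces the command line shown above without adding any flags.

```python
import os


def sparsify_command(split, env=None):
    """Build the argument list for preparing one split of the dataset."""
    env = os.environ if env is None else env
    return [
        "python", "sparse_from_realestate_format.py",
        "--txt_src", os.path.join(env["TXT_ROOT"], split),
        "--img_src", os.path.join(env["IMG_ROOT"], split),
        "--spa_dst", os.path.join(env["SPA_ROOT"], split),
    ]
```

Passing the returned list to subprocess.run(..., check=True) from within the scripts directory would execute it; looping over ("train", "test"), plus "validation" for ACID, covers all splits.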

Finally, symlink $SPA_ROOT to data/realestate_sparse (for RealEstate10K) or data/acid_sparse (for ACID).
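
The symlink step can be scripted as follows. This is a minimal sketch, assuming the two link targets are data/realestate_sparse and data/acid_sparse relative to the repository root; the function name link_sparse_data and its data_dir parameter are hypothetical.

```python
import os


def link_sparse_data(spa_root, dataset, data_dir="data"):
    """Symlink spa_root to <data_dir>/<dataset>_sparse and return the link path."""
    if dataset not in ("realestate", "acid"):
        raise ValueError(f"unknown dataset: {dataset}")
    os.makedirs(data_dir, exist_ok=True)
    link = os.path.join(data_dir, f"{dataset}_sparse")
    # Leave an existing link untouched so the call is safe to repeat.
    if not os.path.islink(link):
        os.symlink(os.path.abspath(spa_root), link, target_is_directory=True)
    return link
```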

First Stage Models

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running

python scripts/download_vqmodels.py

which will also create symlinks ensuring that the paths specified in the training configs (see configs/*) exist. In case some of the models have already been downloaded, the script will only create the symlinks.

For training custom first stage models, we refer to the taming transformers repository.
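
To make the phrase "discrete latent space" concrete: a vector-quantized model replaces each continuous feature vector by the index of its nearest entry in a learned codebook, and the transformer then operates on sequences of these indices. The sketch below shows only this nearest-neighbour lookup on a toy codebook in pure Python; it illustrates the general idea and is not the VQGAN implementation used in this repository.

```python
def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    indices = []
    for vec in vectors:
        # Squared Euclidean distance from this vector to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(vec, entry)) for entry in codebook]
        indices.append(dists.index(min(dists)))
    return indices


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    toy_features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
    print(quantize(toy_features, toy_codebook))
```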

Running the Training

Once the data and the first stage models are prepared, the experiments on ACID and RealEstate10K described in our paper can be reproduced by running

python geofree/main.py --base configs/<dataset>/<dataset>_13x23_<experiment>.yaml -t --gpus 0,

where <dataset> is one of realestate/acid and <experiment> is one of expl_img/expl_feat/expl_emb/impl_catdepth/impl_depth/impl_nodepth/hybrid. These abbreviations correspond to the experiments listed in the following table (see also Fig. 2 in the main paper).

Note that each experiment was conducted on a GPU with 40 GB VRAM.
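
To launch several of these experiments in a row, the training command can be generated from the dataset and experiment names. This is a sketch, assuming the config files follow the pattern configs/<dataset>/<dataset>_13x23_<experiment>.yaml; the helper name training_command is hypothetical, and the flags are copied from the command above.

```python
DATASETS = ("realestate", "acid")
EXPERIMENTS = ("expl_img", "expl_feat", "expl_emb", "impl_catdepth",
               "impl_depth", "impl_nodepth", "hybrid")


def training_command(dataset, experiment, gpus="0,"):
    """Build the argument list that trains one dataset/experiment combination."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment}")
    # Assumed naming pattern of the config files (see lead-in).
    config = f"configs/{dataset}/{dataset}_13x23_{experiment}.yaml"
    return ["python", "geofree/main.py", "--base", config, "-t", "--gpus", gpus]


if __name__ == "__main__":
    print(" ".join(training_command("realestate", "impl_nodepth")))
```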

BibTeX

@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors}, 
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Comments

downloading RealEstate10K

Hi, I downloaded the RealEstate10K dataset, and it only contains the txt files. Can you attach your code for downloading the dataset from those txt files? Thanks!

opened by avihu111 3

[Question] Multiple Image Input

How can I include multiple input images in the synthesis? When starting braindance.py path-to-my-imagefolder, a file manager opens that only lets me select a single image.

I'd like to re-create a whole room by including the information from multiple images.

opened by chris-aeviator 1

Training on objects

Hello,

I am curious if the implementation would be capable of novel view synthesis of objects if trained on multiview object data, as opposed to scene data? Do you see any potential blockers?

Thanks!

opened by lalalune 0

Rel10K training images for the first stage

Hi,

Could you share the Rel10K training images for the first stage training? We would like to train a different first stage model with the identical training data for fair comparison.

Best,

opened by hytseng0509 0

Why use inverse of intrinsics matrix?

I notice that in the multiembedder, the inverse intrinsic matrix is also included. Since this information should be already determined by K, why is this also used? Did you notice some improvement?

Thanks

opened by alextrevithick 1

there is no "points"

When using colmap to generate the new data, https://github.com/CompVis/geometry-free-view-synthesis/blob/00dc639c98dfb9246bee0009649c5be8f8b58e1e/scripts/sparse_from_realestate_format.py#L185 creates empty data. But in the training code, many parts need to process this empty data. Why?

And could I train this model without using colmap to generate new data? I do not see the point of using it, because we already have the intrinsic parameters and camera poses. I also cannot find any part of the code that uses colmap data such as the database.

Thank you!

opened by zimingzhong 0

Label assignment after source and destination frame ids are selected?

I am trying to understand the code and had one query about the dataloading in acid and real estate 10k dataset. I will try explaining it using the acid dataset case. The label assignment for large/small, forward/backward movement is done before actually sampling the source and destination frame ids. According to my understanding, this should be done after the frame id selection is done. Can you please let me know if I am missing something?

opened by DRealArun 1
