Official code release for "GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis"


GRAF


This repository contains official code for the paper GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis.

You can find detailed usage instructions for training your own models and using pre-trained models below.

If you find our code or paper useful, please consider citing

@inproceedings{Schwarz2020NEURIPS,
  title = {GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis},
  author = {Schwarz, Katja and Liao, Yiyi and Niemeyer, Michael and Geiger, Andreas},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2020}
}

Installation

First, make sure you have all dependencies in place. The simplest way to do so is to use Anaconda.

You can create an Anaconda environment called graf using

conda env create -f environment.yml
conda activate graf

Next, install torchsearchsorted for nerf-pytorch. Note that this requires torch>=1.4.0 and CUDA >= 10.1. You can install torchsearchsorted via

cd submodules/nerf_pytorch
pip install -r requirements.txt
cd torchsearchsorted
pip install .
cd ../../../

Demo

You can now test our code via:

python eval.py configs/carla.yaml --pretrained --rotation_elevation

This script should create a folder results/carla_128_from_pretrained/eval/ where you can find generated videos with varying camera pose for the Cars dataset.

Datasets

If you only want to generate images using our pretrained models, you do not need to download the datasets. The datasets are only needed if you want to train a model from scratch.

Cars

To download the Cars dataset from the paper, simply run

cd data
./download_carla.sh
cd ..

This creates a folder data/carla/, downloads the images as a zip file, and extracts them to data/carla/. While we do not use camera poses in this project, we provide them for completeness. You can download them by running

cd data
./download_carla_poses.sh
cd ..

This downloads the camera intrinsics (single file, equal for all images) and extrinsics corresponding to each image.

Faces

Download celebA. Then replace data/celebA in configs/celebA.yaml with *PATH/TO/CELEBA*/Img/img_align_celebA.

Download celebA_hq. Then replace data/celebA_hq in configs/celebAHQ.yaml with *PATH/TO/CELEBA_HQ*.

Cats

Download the CatDataset. Run

cd data
python preprocess_cats.py PATH/TO/CATS/DATASET
cd ..

to preprocess the data and save it to data/cats. If successful, this script should print: Preprocessed 9407 images.

Birds

Download CUB-200-2011 and the corresponding Segmentation Masks. Run

cd data
python preprocess_cub.py PATH/TO/CUB-200-2011 PATH/TO/SEGMENTATION/MASKS
cd ..

to preprocess the data and save it to data/cub. If successful, this script should print: Preprocessed 8444 images.

Usage

When you have installed all dependencies, you are ready to run our pre-trained models for 3D-aware image synthesis.

Generate images using a pretrained model

To evaluate a pretrained model, run

python eval.py CONFIG.yaml --pretrained --fid_kid --rotation_elevation --shape_appearance

where you replace CONFIG.yaml with one of the config files in ./configs.

This script should create a folder results/EXPNAME/eval with FID and KID scores in fid_kid.csv, videos for rotation and elevation in the respective folders, and an interpolation between shape and appearance in shape_appearance.png.

Note that some pretrained models are available for different image sizes, which you can choose by setting data:imsize in the config file to one of the following values:

configs/carla.yaml: 
    data:imsize 64 or 128 or 256 or 512
configs/celebA.yaml:
    data:imsize 64 or 128
configs/celebAHQ.yaml:
    data:imsize 256 or 512
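
For example, to generate 256x256 images with the CARLA model, you would set the image size in configs/carla.yaml; a minimal sketch of the relevant entry, assuming the nested data block layout suggested by the data:imsize notation above:

    data:
      imsize: 256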

Train a model from scratch

To train a 3D-aware generative model from scratch run

python train.py CONFIG.yaml

where you replace CONFIG.yaml with your config file. The easiest way is to use one of the existing config files in the ./configs directory, which correspond to the experiments presented in the paper. Note that this will train the model from scratch and will not resume training from a pretrained model.

You can monitor the training process at http://localhost:6006 using TensorBoard:

cd OUTPUT_DIR
tensorboard --logdir ./monitoring --port 6006

where you replace OUTPUT_DIR with the respective output directory.

For available training options, please take a look at configs/default.yaml.

Evaluation of a new model

For evaluation of the models run

python eval.py CONFIG.yaml --fid_kid --rotation_elevation --shape_appearance

where you replace CONFIG.yaml with your config file.

Multi-View Consistency Check

You can evaluate the multi-view consistency of the generated images by running a Multi-View Stereo (MVS) algorithm on them. This evaluation uses COLMAP, so make sure that you have COLMAP installed before running

python eval.py CONFIG.yaml --reconstruction

where you replace CONFIG.yaml with your config file. You can also evaluate our pretrained models via:

python eval.py configs/carla.yaml --pretrained --reconstruction

This script should create a folder results/EXPNAME/eval/reconstruction/ where you can find generated multi-view images in images/ and the corresponding 3D reconstructions in models/.

Further Information

GAN training

This repository uses Lars Mescheder's awesome framework for GAN training.

NeRF

We base our code for the generator on this great PyTorch reimplementation of Neural Radiance Fields.

Comments
  • How much GPU memory is required at minimum?

    Hi, thanks for releasing the code. It is really interesting work. I am trying to train a model from scratch on the CUB dataset. After finishing the preparation, I ran CUDA_VISIBLE_DEVICES=0 python train.py configs/cub.yaml on a single GPU. However, after the log line [cub_64 epoch 7, it 7990, t 1.099] g_loss = 1.0365, d_loss = 1.1345, reg=0.0230, I got: RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 11.91 GiB total capacity; 10.89 GiB already allocated; 1.06 MiB free; 11.29 GiB reserved in total by PyTorch)

    I am using a TITAN X (Pascal) with 12196 MiB of memory. What is the minimum GPU memory needed to run training? Or do you have any guidance on adjusting hyperparameters to decrease memory consumption?

    Thanks!

    opened by yangyu12 10
  • How did you get the pose?

    Hi there. I read in your paper that you can "learn a 3D-aware generative model from unposed 2D images." But in Section 3.2.1 you also mention that "We sample the camera pose ξ = [R|t] from a pose distribution p_ξ." So I am wondering what the pose actually is. How do you get the pose from unposed images?

    opened by JasonBoy1 5
  • About the input of the network

    I have a question. During evaluation, your network does not receive an input image. Does your method need to train a separate model for every single object? How do you rotate the image without knowledge of the original image?

    Thank you.

    opened by JasonBoy1 3
  • Error with CUDA 11.1

    I ran train.py with celebA, using the conda env with CUDA 11.1, and got this error:

        Traceback (most recent call last):
          File "train.py", line 139, in <module>
            x_real = get_nsamples(train_loader, ntest)
          File "/home/ed/Documents/repos/graf/graf/utils.py", line 11, in get_nsamples
            x_next = next(iter(data_loader))
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
            data = self._next_data()
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
            return self._process_data(data)
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
            data.reraise()
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
            raise self.exc_type(msg)
        RuntimeError: Caught RuntimeError in DataLoader worker process 0.
        Original Traceback (most recent call last):
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
            data = fetcher.fetch(index)
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
            data = [self.dataset[idx] for idx in possibly_batched_index]
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
            data = [self.dataset[idx] for idx in possibly_batched_index]
          File "/home/ed/Documents/repos/graf/graf/datasets.py", line 41, in __getitem__
            img = self.transform(img)
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 67, in __call__
            img = t(img)
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
            result = self.forward(*input, **kwargs)
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 615, in forward
            if torch.rand(1) < self.p:
          File "/home/ed/.conda/envs/graf/lib/python3.8/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
            raise RuntimeError(
        RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

    opened by ebartrum 3
  • Question about the paper

    Hi, thanks for your great work. Could you explain in more detail what is meant by 'It is further important to note that we do not downsample the real image I based on s, but instead query I at sparse locations to retain high-frequency details, see Fig. 3.'? As far as I can tell, the bilinear sampling operation is just downsampling, but I don't know what exactly you mean by "query" in this sentence.

    opened by diaodeyi 2
  • Question about fine vs coarse raysampling

    Hey, my impression from reading the paper was that the coarse and fine ray-sampling steps are performed on the same implicit function, in order to sample in areas of higher alpha density for that function. However, my understanding from reading the code is that the generator and the fine generator are two separate implicit functions with different parameters. Could you clarify which of these is correct? As far as I understand, only the output of the fine generator is used to compute the loss.

    opened by ebartrum 2
  • Question about the 150k chairs dataset from PhotoShapes

    You mention that you used "150k Chairs from Photoshapes". Is there a script or set of generated images for this dataset that you can share? Thanks!

    opened by vishnukool 2
  • There are many different categories and labels in a picture.

    This is a great project. I want to use GIRAFFE to generate data with different shapes or under specific conditions (such as darker scenes) from my dataset. But my custom dataset is very similar to the COCO dataset: there are many different categories and labels in a single picture. I would also like to generate higher-resolution images, such as 640x640x3. Please suggest what I need to pay attention to.

    opened by quan70636 1
  • Question about training time and GPU

    Thank you for your interesting work. I'm not familiar with this field, neural rendering, so I have no idea of the training time and required memory. Could you let me know the training time and which GPU you used for training?

    opened by natureyoo 1
  • Setting u and v for the camera poses

    Hello! Thank you for sharing your great work; it is indeed wonderful!

    1) May I ask a question regarding the code of GRAF? I know that we control the camera poses by setting the min and max values of u and v. But I would like to know how exactly the camera poses (rotation, elevation) are calculated from u and v in the following code:

    u = azimuth / 360
    v = 0.5 * (1 - cos(polar * pi / 180))
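
    For reference, this mapping and its inverse can be written in plain Python; a minimal sketch, assuming angles are in degrees and u, v both lie in [0, 1]:

        import math

        def angles_to_uv(azimuth, polar):
            # azimuth in [0, 360), polar in [0, 180] -> u, v in [0, 1]
            u = azimuth / 360.0
            v = 0.5 * (1.0 - math.cos(math.radians(polar)))
            return u, v

        def uv_to_angles(u, v):
            # invert: cos(polar) = 1 - 2v, hence polar = acos(1 - 2v)
            azimuth = 360.0 * u
            polar = math.degrees(math.acos(1.0 - 2.0 * v))
            return azimuth, polar

        # Sampling v uniformly corresponds to sampling the camera elevation
        # uniformly on the sphere surface (the cosine makes it area-preserving).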

    I would greatly appreciate it if you could explain this to me!

    2) Also, if I have a dataset with images of a human face from every degree (0-360), should I reset u and v accordingly for training? In that case, how should I set u and v?

    Thank you so much!

    opened by YoungJoongUNC 1
  • ArgumentError

    https://github.com/autonomousvision/graf/blob/eaa2d6e47e92a7f56f9531299b032c20c5b6caf6/submodules/nerf_pytorch/run_nerf.py#L368

    The argument is duplicated and causes an error. It should be: parser.add_argument("--N_rand", type=int, default=32*32*4, help='batch size (number of random rays per gradient step)')

    opened by ytuza 1
  • Different Raysampler between train and val/test

    I notice that in graf/transforms.py, the class FlexGridRaySampler(RaySampler) is used during training, while the class FullRaySampler(RaySampler) is used during val/test.

    I wonder why different ray samplers are used. What is the principle behind that choice?

    opened by AddASecond 1
  • A question about shape/appearance codes

    When I tried to analyze how the shape/appearance codes are generated, I could not find an obvious place in the code. In the forward pass of the NeRF there seems to be no appearance code, and the shape code seems to be the input coordinate? This is strange. What is it supposed to look like?

    opened by yuedumingz 0
  • Rendering angles

    Hello, I was training the model on my data with

      umax: 1.0 
      umin: 0 
      vmax: 0.45642212862617093 
      vmin: 0.32898992833716556
    

    but then I tried to render the images/videos from a different angle and it is not really working:

      umax: 0.04166666666666667
      umin: 0.
      vmax: 1.
      vmin: 0.  
    

    I am trying to get a result with almost no rotation in the azimuth angle, but with a half rotation in the polar angle. Do I need to retrain the whole model because the difference in angles is too big? I tried something similar with NeRF before, and the rotation actually worked, except that I was getting just noise in the area outside the object.

    opened by povolann 0
  • About Chamfer distance

    Hi, thank you for sharing your nice work. I have a question about one of the evaluation metrics: is it possible to compute the Chamfer distance using this code? Thanks in advance.

    opened by emjay73 0
  • How to generate images of certain categories?

    Since GRAF is not conditioned on any label, do the latent shape and appearance vectors vary from image to image rather than from category to category during training? If so, how can one generate a certain type of image (e.g. a sofa instead of a chair)?

    Thanks!

    opened by ruipengZ 0
  • Fixing Argument Error in run_nerf.py and adding gitignore file

    As pointed out in one of the issues, the argument below is duplicated and causes an error (graf/submodules/nerf_pytorch/run_nerf.py, line 368 at eaa2d6e): parser.add_argument("--N_samples", type=int, default=32*32*4, help='batch size (number of random rays per gradient step)')

    It should be: parser.add_argument("--N_rand", type=int, default=32*32*4, help='batch size (number of random rays per gradient step)')

    opened by vishnukool 0