This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

Overview

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Project Page | Paper | Supplementary | Video | Slides | Blog | Talk

Animations: adding objects (CLEVR), horizontal translation (Cars), shape interpolation (Faces)

If you find our code or paper useful, please cite as

@inproceedings{GIRAFFE,
    title = {GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields},
    author = {Niemeyer, Michael and Geiger, Andreas},
    booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

TL;DR - Quick Start

Animations: rotating cars and horizontal car translation

First you have to make sure that you have all dependencies in place. The simplest way to do so is to use anaconda.

You can create an anaconda environment called giraffe using

conda env create -f environment.yml
conda activate giraffe

You can now test our code on the provided pre-trained models. For example, simply run

python render.py configs/256res/cars_256_pretrained.yaml

This script should create a model output folder out/cars256_pretrained. The animations are then saved to the respective subfolders in out/cars256_pretrained/rendering.

Usage

Datasets

To train a model from scratch or to use our ground truth activations for evaluation, you have to download the respective dataset.

For this, please run

bash scripts/download_dataset.sh

and follow the instructions. This script should download and unpack the data automatically into the data/ folder.

Controllable Image Synthesis

To render images of a trained model, run

python render.py CONFIG.yaml

where you replace CONFIG.yaml with the correct config file. The easiest way is to use a pre-trained model; you can do this by using one of the config files whose names end in *_pretrained.yaml.

For example, for our model trained on Cars at 256x256 pixels, run

python render.py configs/256res/cars_256_pretrained.yaml

or, for CelebA-HQ at 256x256 pixels, run

python render.py configs/256res/celebahq_256_pretrained.yaml

Our script will automatically download the model checkpoints and render images. You can find the outputs in the out/*_pretrained folders.
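
For reference, the released checkpoints are plain PyTorch .pt files hosted on S3 (the Cars 256x256 URL also appears in the issues below). If you ever need to fetch one manually, the minimal sketch below uses torch's download helper; the internal structure of the checkpoint is an implementation detail and is only inspected, not assumed:

import torch

# Released Cars 256x256 checkpoint (URL taken from the pretrained config / issue below).
url = ("https://s3.eu-central-1.amazonaws.com/avg-projects/giraffe/models/"
       "checkpoint_cars256-d9ea5e11.pt")
# Download (cached under ~/.cache/torch/hub) and load the file on the CPU.
checkpoint = torch.hub.load_state_dict_from_url(url, map_location="cpu")
print(type(checkpoint))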

Please note that the config files *_pretrained.yaml are only for evaluation or rendering, not for training new models: when these configs are used for training, the model will be trained from scratch, but during inference our code will still use the pre-trained model.

FID Evaluation

For evaluation of the models, we provide the script eval.py. You can run it using

python eval.py CONFIG.yaml

The script generates 20000 images and calculates the FID score.

Note: For some experiments, the numbers in the paper might slightly differ because we used the evaluation protocol from GRAF to fairly compare against the methods reported in GRAF.
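
For background, FID compares the mean and covariance of Inception activations of real and generated images. Below is a minimal, self-contained sketch of the underlying distance computation on toy data; it is not the repository's implementation, and the actual scripts work from precomputed activation files:

import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # Frechet distance between two Gaussians fitted to activation statistics.
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff.dot(diff) + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy example with random activations; in practice the statistics come from
# Inception features of the real images (e.g. the precomputed .npz file)
# and of the 20000 generated images.
rng = np.random.default_rng(0)
real_acts = rng.normal(size=(1000, 64))
fake_acts = rng.normal(loc=0.1, size=(1000, 64))
fid = frechet_distance(real_acts.mean(0), np.cov(real_acts, rowvar=False),
                       fake_acts.mean(0), np.cov(fake_acts, rowvar=False))
print(f"FID (toy): {fid:.3f}")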

Training

Finally, to train a new network from scratch, run

python train.py CONFIG.yaml

where you replace CONFIG.yaml with the name of the configuration file you want to use.

You can monitor the training process on http://localhost:6006 using TensorBoard:

cd OUTPUT_DIR
tensorboard --logdir ./logs

where you replace OUTPUT_DIR with the respective output directory. For available training options, please take a look at configs/default.yaml.

2D-GAN Baseline

For convenience, we have implemented a 2D-GAN baseline which closely follows the GAN_stability repo. For example, you can train a 2D-GAN on CompCars at 64x64 pixels, similar to our GIRAFFE method, by running

python train.py configs/64res/cars_64_2dgan.yaml

Using Your Own Dataset

If you want to train a model on a new dataset, you first need to generate ground truth activations for the intermediate or final FID calculations. For this, you can use the script in scripts/calc_fid/precalc_fid.py. For example, if you want to generate an FID file for the comprehensive cars dataset at 64x64 pixels, you need to run

python scripts/calc_fid/precalc_fid.py "data/comprehensive_cars/images/*.jpg" --regex True --gpu 0 --out-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz" --img-size 64

or for LSUN churches, you need to run

python scripts/calc_fid/precalc_fid.py path/to/LSUN --class-name scene_categories/church_outdoor_train_lmdb --lsun True --gpu 0 --out-file data/church/fid_files/church_64.npz --img-size 64

Note: We apply the same transformations to the ground truth images for this FID calculation as we do during training. If you want to use your own dataset, you need to adjust the image transformations in the script accordingly. Further, you might need to adjust the object-level and camera transformations to your dataset.
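
As a rough illustration, adjusting these transformations usually amounts to editing a torchvision pipeline similar to the sketch below; the resize/crop strategy and the example image path are assumptions, so check the script for the exact preprocessing it applies:

from PIL import Image
from torchvision import transforms

img_size = 64  # match the resolution of your FID file and training config

# Hypothetical preprocessing for a 64x64 dataset; adapt crop/resize to your data.
transform = transforms.Compose([
    transforms.Resize(img_size),      # scale the shorter side to 64 pixels
    transforms.CenterCrop(img_size),  # square crop around the image center
    transforms.ToTensor(),            # HWC uint8 -> CHW float in [0, 1]
])

image = Image.open("data/comprehensive_cars/images/example.jpg").convert("RGB")
tensor = transform(image)             # shape: (3, 64, 64)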

Evaluating Generated Images

We provide the script eval_files.py for evaluating the FID score of your own generated images. For example, if you would like to evaluate your images on CompCars at 64x64 pixels, save them to an npy file and run

python eval_files.py --input-file "path/to/your/images.npy" --gt-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz"
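
A minimal sketch of how such an .npy file could be produced is shown below; the array layout (N x H x W x 3, uint8) is an assumption, so check eval_files.py for the exact format it expects:

import numpy as np

# Hypothetical: collect your generated images in one array and save it.
n_images, img_size = 20000, 64
images = np.zeros((n_images, img_size, img_size, 3), dtype=np.uint8)
# ... fill `images` with your generator's outputs ...
np.save("path/to/your/images.npy", images)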

Further Information

More Work on Implicit Representations

If you like the GIRAFFE project, please check out related works on neural representations from our group, for example:

  • Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision (CVPR 2020)

Comments
  • Unable to download the model

    Thanks for your excellent work! I want to test the demo in render.py, but I can't download the car model. The URL can't be opened in Chrome either. Could you please help me solve this problem?

    https://s3.eu-central-1.amazonaws.com/avg-projects/giraffe/models/checkpoint_cars256-d9ea5e11.pt

    opened by liuzhihui2046 4
  • About the model structure

    Hi, thanks for your great work, I have two questions about the model:

    1. Why was the patch-based input and discriminator from GRAF abandoned?
    2. How does this model handle the data requirements of the dataset, such as scene bounds, etc.? (GRAF uses LLFF or COLMAP.)

    opened by diaodeyi 3
  • [code] details about code

    In giraffe/models/decoder.py, line 131, why is the unsqueeze operation needed? The shape of net is (batch, hidden), and the output of self.fc_z(z_shape) is also (batch, hidden).

    opened by Feynman1999 3
  • Fitting large number of objects in memory

    Thanks for sharing this awesome work, and congrats on winning best paper!

    I'd like to train GIRAFFE on a custom dataset with up to 20+ objects per image, but I'm finding that a batch of 32 images won't fit into 11GB of GPU memory. At 64x64 resolution, I can render at most a batch of 18 images, and at 256x256 resolution, at most a batch of 9 images. I haven't tried training yet, but I would expect it to take at least as much memory as inference.

    Do you think it would be safe to reduce the training batch size, or would that make the GAN training unstable at some point? Thanks.

    opened by kjmillerCURIS 3
  • Some doubts regarding the disentanglement of objects w.r.t. each other and w.r.t. background

    Hello,

    Thanks so much for sharing the code for your amazing work. I had a few doubts regarding the disentanglement part of the work:

    • Is the object-background disentanglement explicit (i.e. using background-foreground masks to train one part of the generator on background pixels only and the remaining parts on foreground pixels), or does the model learn it implicitly? I saw that the paper mentions that scale and translation are fixed for the background so that it spans the entire scene and is centered at the origin. But does the model learn in an unsupervised way to generate the background with the feature field generator of this configuration, or is there also some explicit supervision? The paper seems to suggest it is unsupervised, but I just wanted to confirm.

    • I saw that you have N+1 generators for N objects (1 for the background). Are all N object generator MLPs essentially the same generator with shared weights, or are they different? Assuming all objects are, say, cars, then one generator would probably be enough to generate all of them, but if we have different object categories in the scene, like car, bicycle, pedestrian, etc., then a per-category object generator would probably make sense?

    Thanks again!

    opened by ankuPRK 2
  • ffhq fid score reproduce check.

    Hello,

    I trained the model with the ffhq_256.yaml file.

    But I was not able to reproduce the FID score of the pretrained GIRAFFE model you provide (ffhq_256_pretrained.yaml).

    Could you please check the configuration file (ffhq_256.yaml)?

    FFHQ                   | FID (20000 images)
    Pretrained from GitHub | 31.507948
    My reproduced model    | 43.068982

    opened by Jinoh-Cho 2
  • The number of images in celeba-hq dataset? 30k or 200k?

    Hi, sorry to bother you, I have a question about which folder I should choose.

    I'd like to use images of 128x128 resolution, and I noticed that the celeba-128 folder contains 30k images, but img_celeba contains 200k images, so I got a little confused. Could you tell me which folder I should choose? Really thanks!!! : )

    opened by Tianhang-Cheng 2
  • Future works

    Are you solving the following problem? If not, we can discuss it.

    Disentanglement Failures. For Churches, the background sometimes contains a church, and for CompCars, the object sometimes contains background parts or vice versa. We attribute these to mismatches between the assumed uniform distributions over object and camera poses and their real distributions, and identify learning them instead as interesting future work.

    opened by zzw-zwzhang 2
  • Error when train the code

    Thanks for your great work. When I try to train GIRAFFE using the FFHQ config, I get an error:

    (giraffe) ➜  giraffe-main python train.py configs/256res/ffhq_256.yaml           
    /home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/kornia/augmentation/augmentation.py:1872: DeprecationWarning: GaussianBlur is no longer maintained and will be removed from the future versions. Please use RandomGaussianBlur instead.
      warnings.warn(
    Start loading file addresses ...
    done! time: 0.00021123886108398438
    Number of images found: 0
    Traceback (most recent call last):
      File "train.py", line 54, in <module>
        train_loader = torch.utils.data.DataLoader(
      File "/home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
        sampler = RandomSampler(dataset, generator=generator)  # type: ignore
      File "/home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 103, in __init__
        raise ValueError("num_samples should be a positive integer "
    ValueError: num_samples should be a positive integer value, but got num_samples=0
    
    opened by diaodeyi 2
  • [Question about Equation7][transform points from object to scene space]

    Dear author, I am really inspired by your brilliant work! However, a small point struck me: Eq. 6 defines k(x) as "transforming points from object to scene space", while its usage in Eq. 7 is k^-1(x). Would you kindly explain this? Thank you very much!

    opened by tomguluson92 2
  • Multi-GPU training

    Hi,

    Thank you for releasing your code. I have a few questions.

    1. I tried to train the cars_256 model on my machine, which has 2 GPUs with around 11GB of memory each. It ran out of memory, because the code currently uses only one GPU and, in an earlier response, you indicated that the 256x256 config requires 16GB. So I am thinking of changing the code to use multiple GPUs with DataParallel, as shown below. I would like to know if there is anything from a computational perspective that needs to be taken care of when running the code on multiple GPUs (will there be any inaccuracies in the results if the training code is run on multiple GPUs)?

    model = torch.nn.DataParallel(model, device_ids=gpu_list)

    thanks

    opened by athena913 2
  • How to control appearance and shape?

    Hello, I am new to GANs and I am confused by some operations.

    1. How can I know which appearance code corresponds to which color? For example, if I want to get a red object, how can I know exactly which appearance code I should use?

    2. I guess maybe 'control' means 'swap'? For example, I have the appearance code App1 and shape code Sha1 for Object1, and the appearance code App2 and shape code Sha2 for Object2. Then I can get an object with the shape of Object1 and the color of Object2 by using shape code Sha1 and appearance code App2? Am I right?

    3. If so, then to generate a red object I would need a few runs to generate some red objects first, and then reuse those appearance codes?

    opened by LeeBC2298 1
  • how to obtain 3d bounding box of the object from its {s,t,R}

    Thanks for your excellent work! I am really impressed by the controllable object generation!

    I wonder if there is a way to extract the 3D bounding box of an object from its affine transformation {s,t,R}.

    I tried to establish the relation between the 3D bounding box and {s,t,R}, but failed. I transform a cube by equation (6) in the paper, but their relationship does not seem to follow equation (6). Do you have any suggestions?

    opened by PeizeSun 1