This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

Overview

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Project Page | Paper | Supplementary | Video | Slides | Blog | Talk

Animations: adding objects (CLEVR), horizontal translation (Cars), shape interpolation (Faces)

If you find our code or paper useful, please cite as

@inproceedings{GIRAFFE,
    title = {GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields},
    author = {Niemeyer, Michael and Geiger, Andreas},
    booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

TL;DR - Quick Start

Animations: rotating cars and horizontal car translation

First you have to make sure that you have all dependencies in place. The simplest way to do so is to use anaconda.

You can create an anaconda environment called giraffe using

conda env create -f environment.yml
conda activate giraffe

You can now test our code on the provided pre-trained models. For example, simply run

python render.py configs/256res/cars_256_pretrained.yaml

This script should create a model output folder out/cars256_pretrained. The animations are then saved to the respective subfolders in out/cars256_pretrained/rendering.

Usage

Datasets

To train a model from scratch or to use our ground truth activations for evaluation, you have to download the respective dataset.

For this, please run

bash scripts/download_dataset.sh

and follow the instructions. This script should download and unpack the data automatically into the data/ folder.

Controllable Image Synthesis

To render images of a trained model, run

python render.py CONFIG.yaml

where you replace CONFIG.yaml with the correct config file. The easiest way is to use a pre-trained model; you can do this by using one of the config files whose names end in *_pretrained.yaml.

For example, for our model trained on Cars at 256x256 pixels, run

python render.py configs/256res/cars_256_pretrained.yaml

or, for CelebA-HQ at 256x256 pixels, run

python render.py configs/256res/celebahq_256_pretrained.yaml

Our script will automatically download the model checkpoints and render images. You can find the outputs in the out/*_pretrained folders.
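
For reference, the released checkpoints are plain PyTorch .pt files hosted on S3 (the Cars 256x256 URL also appears in the issues below). If you ever need to fetch one manually, the minimal sketch below uses torch's download helper; the internal structure of the checkpoint is an implementation detail and is only inspected, not assumed:

import torch

# Released Cars 256x256 checkpoint (URL taken from the pretrained config / issue below).
url = ("https://s3.eu-central-1.amazonaws.com/avg-projects/giraffe/models/"
       "checkpoint_cars256-d9ea5e11.pt")
# Download (cached under ~/.cache/torch/hub) and load the file on the CPU.
checkpoint = torch.hub.load_state_dict_from_url(url, map_location="cpu")
print(type(checkpoint))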

Please note that the config files *_pretrained.yaml are only for evaluation or rendering, not for training new models: when these configs are used for training, the model will be trained from scratch, but during inference our code will still use the pre-trained model.

FID Evaluation

For evaluation of the models, we provide the script eval.py. You can run it using

python eval.py CONFIG.yaml

The script generates 20000 images and calculates the FID score.

Note: For some experiments, the numbers in the paper might slightly differ because we used the evaluation protocol from GRAF to fairly compare against the methods reported in GRAF.
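
For background, FID compares the mean and covariance of Inception activations of real and generated images. Below is a minimal, self-contained sketch of the underlying distance computation on toy data; it is not the repository's implementation, and the actual scripts work from precomputed activation files:

import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # Frechet distance between two Gaussians fitted to activation statistics.
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff.dot(diff) + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy example with random activations; in practice the statistics come from
# Inception features of the real images (e.g. the precomputed .npz file)
# and of the 20000 generated images.
rng = np.random.default_rng(0)
real_acts = rng.normal(size=(1000, 64))
fake_acts = rng.normal(loc=0.1, size=(1000, 64))
fid = frechet_distance(real_acts.mean(0), np.cov(real_acts, rowvar=False),
                       fake_acts.mean(0), np.cov(fake_acts, rowvar=False))
print(f"FID (toy): {fid:.3f}")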

Training

Finally, to train a new network from scratch, run

python train.py CONFIG.yaml

where you replace CONFIG.yaml with the name of the configuration file you want to use.

You can monitor the training process on http://localhost:6006 using TensorBoard:

cd OUTPUT_DIR
tensorboard --logdir ./logs

where you replace OUTPUT_DIR with the respective output directory. For available training options, please take a look at configs/default.yaml.

2D-GAN Baseline

For convenience, we have implemented a 2D-GAN baseline which closely follows the GAN_stability repo. For example, you can train a 2D-GAN on CompCars at 64x64 pixels, similar to our GIRAFFE method, by running

python train.py configs/64res/cars_64_2dgan.yaml

Using Your Own Dataset

If you want to train a model on a new dataset, you first need to generate ground truth activations for the intermediate or final FID calculations. For this, you can use the script in scripts/calc_fid/precalc_fid.py. For example, if you want to generate an FID file for the comprehensive cars dataset at 64x64 pixels, you need to run

python scripts/calc_fid/precalc_fid.py "data/comprehensive_cars/images/*.jpg" --regex True --gpu 0 --out-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz" --img-size 64

or for LSUN churches, you need to run

python scripts/calc_fid/precalc_fid.py path/to/LSUN --class-name scene_categories/church_outdoor_train_lmdb --lsun True --gpu 0 --out-file data/church/fid_files/church_64.npz --img-size 64

Note: We apply the same transformations to the ground truth images for this FID calculation as we do during training. If you want to use your own dataset, you need to adjust the image transformations in the script accordingly. Further, you might need to adjust the object-level and camera transformations to your dataset.
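
As a rough illustration, adjusting these transformations usually amounts to editing a torchvision pipeline similar to the sketch below; the resize/crop strategy and the example image path are assumptions, so check the script for the exact preprocessing it applies:

from PIL import Image
from torchvision import transforms

img_size = 64  # match the resolution of your FID file and training config

# Hypothetical preprocessing for a 64x64 dataset; adapt crop/resize to your data.
transform = transforms.Compose([
    transforms.Resize(img_size),      # scale the shorter side to 64 pixels
    transforms.CenterCrop(img_size),  # square crop around the image center
    transforms.ToTensor(),            # HWC uint8 -> CHW float in [0, 1]
])

image = Image.open("data/comprehensive_cars/images/example.jpg").convert("RGB")
tensor = transform(image)             # shape: (3, 64, 64)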

Evaluating Generated Images

We provide the script eval_files.py for evaluating the FID score of your own generated images. For example, if you would like to evaluate your images on CompCars at 64x64 pixels, save them to an npy file and run

python eval_files.py --input-file "path/to/your/images.npy" --gt-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz"
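
A minimal sketch of how such an .npy file could be produced is shown below; the array layout (N x H x W x 3, uint8) is an assumption, so check eval_files.py for the exact format it expects:

import numpy as np

# Hypothetical: collect your generated images in one array and save it.
n_images, img_size = 20000, 64
images = np.zeros((n_images, img_size, img_size, 3), dtype=np.uint8)
# ... fill `images` with your generator's outputs ...
np.save("path/to/your/images.npy", images)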

Further Information

More Work on Implicit Representations

If you like the GIRAFFE project, please check out related works on neural representations from our group, for example:

  • Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision (CVPR 2020)

Comments
  • Unable to download the model

    Thanks for your excellent work! I want to test the demo in render.py, but I can't download the car model. The URL can't be opened in Chrome either. Could you please help me solve this problem?

    https://s3.eu-central-1.amazonaws.com/avg-projects/giraffe/models/checkpoint_cars256-d9ea5e11.pt

    opened by liuzhihui2046 4
  • About the model structure

    Hi, thanks for your great work, I have two questions about the model:

    1. Why was the patch-based input and discriminator from GRAF abandoned?
    2. How does this model handle the data requirements of the dataset, such as scene bounds, etc.? (GRAF uses LLFF or COLMAP.)

    opened by diaodeyi 3
  • [code] details about code

    In giraffe/models/decoder.py, line 131, why is the unsqueeze operation needed? The shape of net is (batch, hidden), and the output of self.fc_z(z_shape) is also (batch, hidden).

    opened by Feynman1999 3
  • Fitting large number of objects in memory

    Thanks for sharing this awesome work, and congrats on winning best paper!

    I'd like to train GIRAFFE on a custom dataset with up to 20+ objects per image, but I'm finding that a batch of 32 images won't fit into 11GB of GPU memory. At 64x64 resolution, I can render at most a batch of 18 images, and at 256x256 resolution, at most a batch of 9 images. I haven't tried training yet, but I would expect it to take at least as much memory as inference.

    Do you think it would be safe to reduce the training batch size, or would that make the GAN training unstable at some point? Thanks.

    opened by kjmillerCURIS 3
  • Some doubts regarding the disentanglement of objects w.r.t. each other and w.r.t. background

    Hello,

    Thanks so much for sharing the code for your amazing work. I had a few doubts regarding the disentanglement part of the work:

    • Is the object-background disentanglement explicit (i.e. using background-foreground masks to train one part of the generator on background pixels only and the remaining parts on foreground pixels), or does the model learn it implicitly? I saw that the paper mentions that scale and translation are fixed for the background so that it spans the entire scene and is centered at the origin. But does the model learn in an unsupervised way to generate the background with the feature field generator of this configuration, or is there also some explicit supervision? The paper seems to suggest it is unsupervised, but I just wanted to confirm.

    • I saw that you have N+1 generators for N objects (1 for the background). Are all N object generator MLPs essentially the same generator with shared weights, or are they different? Assuming all objects are, say, cars, then one generator would probably be enough to generate all of them, but if we have different object categories in the scene, like car, bicycle, pedestrian, etc., then a per-category object generator would probably make sense?

    Thanks again!

    opened by ankuPRK 2
  • ffhq fid score reproduce check.

    Hello,

    I trained the model with the ffhq_256.yaml file.

    But I was not able to reproduce the FID score of the pretrained GIRAFFE model you provide (ffhq_256_pretrained.yaml).

    Could you please check the configuration file (ffhq_256.yaml)?

    FFHQ                   | FID (20000 images)
    Pretrained from GitHub | 31.507948
    My reproduced model    | 43.068982

    opened by Jinoh-Cho 2
  • The number of images in celeba-hq dataset? 30k or 200k?

    Hi, sorry to bother you, I have a question about which folder I should choose.

    I'd like to use images of 128x128 resolution, and I noticed that the celeba-128 folder contains 30k images, but img_celeba contains 200k images, so I got a little confused. Could you tell me which folder I should choose? Really thanks!!! : )

    opened by Tianhang-Cheng 2
  • Future works

    Are you solving the following problem? If not, we can discuss it.

    Disentanglement Failures. For Churches, the background sometimes contains a church, and for CompCars, the object sometimes contains background parts or vice versa. We attribute these to mismatches between the assumed uniform distributions over object and camera poses and their real distributions, and identify learning them instead as interesting future work.

    opened by zzw-zwzhang 2
  • Error when train the code

    Thanks for your great work. When I try to train GIRAFFE using the FFHQ config, I get an error:

    (giraffe) ➜  giraffe-main python train.py configs/256res/ffhq_256.yaml           
    /home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/kornia/augmentation/augmentation.py:1872: DeprecationWarning: GaussianBlur is no longer maintained and will be removed from the future versions. Please use RandomGaussianBlur instead.
      warnings.warn(
    Start loading file addresses ...
    done! time: 0.00021123886108398438
    Number of images found: 0
    Traceback (most recent call last):
      File "train.py", line 54, in <module>
        train_loader = torch.utils.data.DataLoader(
      File "/home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
        sampler = RandomSampler(dataset, generator=generator)  # type: ignore
      File "/home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 103, in __init__
        raise ValueError("num_samples should be a positive integer "
    ValueError: num_samples should be a positive integer value, but got num_samples=0
    
    opened by diaodeyi 2
  • [Question about Equation7][transform points from object to scene space]

    Dear author, I am really inspired by your brilliant work! However, a small point struck me: Eq. 6 defines k(x) as "transforming points from object to scene space", while its usage in Eq. 7 is k^-1(x). Would you kindly explain this? Thank you very much!

    opened by tomguluson92 2
  • Multi-GPU training

    Hi,

    Thank you for releasing your code. I have a few questions.

    1. I tried to train the cars_256 model on my machine, which has 2 GPUs with around 11GB of memory each. It ran out of memory, because the code currently uses only one GPU and, in an earlier response, you indicated that the 256x256 config requires 16GB. So I am thinking of changing the code to use multiple GPUs with DataParallel, as shown below. I would like to know if there is anything from a computational perspective that needs to be taken care of when running the code on multiple GPUs (will there be any inaccuracies in the results if the training code is run on multiple GPUs)?

    model = torch.nn.DataParallel(model, device_ids=gpu_list)

    thanks

    opened by athena913 2
  • How to control appearance and shape?

    Hello, I am new to GANs and I am confused by some operations.

    1. How can I know which appearance code corresponds to which color? For example, if I want to get a red object, how can I know exactly which appearance code I should use?

    2. I guess maybe 'control' means 'swap'? For example, I have the appearance code App1 and shape code Sha1 for Object1, and the appearance code App2 and shape code Sha2 for Object2. Then I can get an object with the shape of Object1 and the color of Object2 by using shape code Sha1 and appearance code App2? Am I right?

    3. If so, then to generate a red object I would need a few runs to generate some red objects first, and then reuse those appearance codes?

    opened by LeeBC2298 1
  • how to obtain 3d bounding box of the object from its {s,t,R}

    Thanks for your excellent work! I am really impressed by the controllable object generation!

    I wonder if there is a way to extract the 3D bounding box of an object from its affine transformation {s,t,R}.

    I tried to establish the relation between the 3D bounding box and {s,t,R}, but failed. I transform a cube by equation (6) in the paper, but their relationship does not seem to follow equation (6). Do you have any suggestions?

    opened by PeizeSun 1