This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

Overview

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Project Page | Paper | Supplementary | Video | Slides | Blog | Talk

[Animations: Add Clevr | Translation Horizontal Cars | Interpolate Shape Faces]

If you find our code or paper useful, please cite as

@inproceedings{GIRAFFE,
    title = {GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields},
    author = {Niemeyer, Michael and Geiger, Andreas},
    booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

TL;DR - Quick Start

[Animations: Rotating Cars | Translation Horizontal Cars]

First, make sure that you have all dependencies in place. The simplest way to do so is to use anaconda.

You can create an anaconda environment called giraffe using

conda env create -f environment.yml
conda activate giraffe
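
Optionally, you can sanity-check that PyTorch and CUDA are visible inside the environment (a quick suggestion, not a step from the repository's instructions):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"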

You can now test our code on the provided pre-trained models. For example, simply run

python render.py configs/256res/cars_256_pretrained.yaml

This script should create a model output folder out/cars256_pretrained. The animations are then saved to the respective subfolders in out/cars256_pretrained/rendering.

Usage

Datasets

To train a model from scratch or to use our ground truth activations for evaluation, you have to download the respective dataset.

For this, please run

bash scripts/download_data.sh

and follow the instructions. This script should download and unpack the data automatically into the data/ folder.

Controllable Image Synthesis

To render images of a trained model, run

python render.py CONFIG.yaml

where you replace CONFIG.yaml with the correct config file. The easiest way is to use a pre-trained model. You can do this by using one of the config files which are indicated with *_pretrained.yaml.

For example, for our model trained on Cars at 256x256 pixels, run

python render.py configs/256res/cars_256_pretrained.yaml

or for celebA-HQ at 256x256 pixels, run

python render.py configs/256res/celebahq_256_pretrained.yaml

Our script will automatically download the model checkpoints and render images. You can find the outputs in the out/*_pretrained folders.

Please note that the config files *_pretrained.yaml are only for evaluation or rendering, not for training new models: when these configs are used for training, the model will be trained from scratch, but during inference our code will still use the pre-trained model.
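
For orientation, such a pre-trained config typically just inherits the base training config and points test-time model loading at a released checkpoint. A minimal sketch of what cars_256_pretrained.yaml might look like (the exact keys are assumptions; check the shipped config files for the authoritative layout):

    # Hypothetical sketch, not copied verbatim from the repository
    inherit_from: configs/256res/cars_256.yaml
    test:
      model_file: https://s3.eu-central-1.amazonaws.com/avg-projects/giraffe/models/checkpoint_cars256-d9ea5e11.pt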

FID Evaluation

For evaluation of the models, we provide the script eval.py. You can run it using

python eval.py CONFIG.yaml

The script generates 20000 images and calculates the FID score.
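
For intuition, the FID score is the Fréchet distance between two Gaussians fitted to the Inception activations of the real and generated images: FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2}). A minimal Python sketch of this computation (assuming the activations have already been extracted as numpy arrays; an illustration, not the repository's eval.py):

    import numpy as np
    from scipy import linalg

    def fid(act_real, act_fake):
        # Fit Gaussians (mean and covariance) to both activation sets
        mu_r, mu_g = act_real.mean(axis=0), act_fake.mean(axis=0)
        sigma_r = np.cov(act_real, rowvar=False)
        sigma_g = np.cov(act_fake, rowvar=False)
        # Matrix square root of the covariance product
        covmean = linalg.sqrtm(sigma_r @ sigma_g)
        if np.iscomplexobj(covmean):  # drop tiny imaginary parts from numerics
            covmean = covmean.real
        return float(((mu_r - mu_g) ** 2).sum()
                     + np.trace(sigma_r + sigma_g - 2.0 * covmean))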

Note: For some experiments, the numbers in the paper might slightly differ because we used the evaluation protocol from GRAF to fairly compare against the methods reported in GRAF.

Training

Finally, to train a new network from scratch, run

python train.py CONFIG.yaml

where you replace CONFIG.yaml with the name of the configuration file you want to use.

You can monitor the training process on http://localhost:6006 using tensorboard:

cd OUTPUT_DIR
tensorboard --logdir ./logs

where you replace OUTPUT_DIR with the respective output directory. For available training options, please take a look at configs/default.yaml.

2D-GAN Baseline

For convenience, we have implemented a 2D-GAN baseline which closely follows the GAN_stability repo. For example, you can train a 2D-GAN on CompCars at 64x64 pixels, similar to our GIRAFFE method, by running

python train.py configs/64res/cars_64_2dgan.yaml

Using Your Own Dataset

If you want to train a model on a new dataset, you first need to generate ground truth activations for the intermediate or final FID calculations. For this, you can use the script in scripts/calc_fid/precalc_fid.py. For example, if you want to generate an FID file for the comprehensive cars dataset at 64x64 pixels, you need to run

python scripts/calc_fid/precalc_fid.py "data/comprehensive_cars/images/*.jpg" --regex True --gpu 0 --out-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz" --img-size 64

or for LSUN churches, you need to run

python scripts/calc_fid/precalc_fid.py path/to/LSUN --class-name scene_categories/church_outdoor_train_lmdb --lsun True --gpu 0 --out-file data/church/fid_files/church_64.npz --img-size 64

Note: We apply the same transformations to the ground truth images for this FID calculation as we do during training. If you want to use your own dataset, you need to adjust the image transformations in the script accordingly. Further, you might need to adjust the object-level and camera transformations to your dataset.
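
As an illustration, the matching transform pipeline might look as follows (a torchvision sketch; the concrete resize and crop values are assumptions, so mirror whatever the script actually applies for your dataset):

    from torchvision import transforms

    # Apply the SAME preprocessing to ground-truth images as during training
    # before extracting activations for the FID file (values are placeholders).
    fid_transform = transforms.Compose([
        transforms.Resize(64),       # match the training image size
        transforms.CenterCrop(64),   # match the training crop
        transforms.ToTensor(),
    ])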

Evaluating Generated Images

We provide the script eval_files.py for evaluating the FID score of your own generated images. For example, if you would like to evaluate your images on CompCars at 64x64 pixels, save them to an npy file and run

python eval_files.py --input-file "path/to/your/images.npy" --gt-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz"
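
If you are unsure how to produce the npy file, the following sketch shows one way to assemble it (the (N, H, W, 3) uint8 layout is an assumption; check how eval_files.py loads the array):

    import numpy as np

    # Hypothetical: `generated` is a list of HxWx3 uint8 images from your
    # model; placeholder zeros are used here to keep the sketch runnable.
    generated = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(10)]
    np.save("path/to/your/images.npy", np.stack(generated))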

Further Information

More Work on Implicit Representations

If you like the GIRAFFE project, please check out related works on neural representations from our group:

Comments
  • Unable to download the model

    Thanks for your excellent work! I want to test the demo in render.py, but I can't download the car model. The URL can't be reached in Chrome either. Could you please help me solve this problem?

    https://s3.eu-central-1.amazonaws.com/avg-projects/giraffe/models/checkpoint_cars256-d9ea5e11.pt

    opened by liuzhihui2046 4
  • About the model structure

    Hi, thanks for your great work. I have two questions about the model:

    1. Why was the patch-based input and discriminator from GRAF abandoned?
    2. How does this model handle the data requirements of a dataset, such as the scene bounds? (GRAF uses LLFF or COLMAP.)
    opened by diaodeyi 3
  • [code] details about code

    In giraffe/models/decoder.py, line 131, why is the unsqueeze operation needed? The shape of net is (batch, hidden), and the output of self.fc_z(z_shape) is (batch, hidden) too.

    opened by Feynman1999 3
  • Fitting a large number of objects in memory

    Thanks for sharing this awesome work, and congrats on winning best paper!

    I'd like to train GIRAFFE on a custom dataset with 20+ objects per image, but I'm finding that a batch of 32 images won't fit into 11GB of GPU memory. At 64x64 resolution I can render at most a batch of 18 images, and at 256x256 resolution at most a batch of 9 images. I haven't tried training yet, but I would expect it to take at least as much memory as inference.

    Do you think it would be safe to reduce the training batch size, or would that make the GAN training unstable at some point? Thanks.

    opened by kjmillerCURIS 3
  • Some doubts regarding the disentanglement of objects w.r.t. each other and w.r.t. background

    Hello,

    Thanks so much for sharing the code for your amazing work. I had a few doubts regarding the disentanglement part:

    • Is the object-background disentanglement explicit (i.e. using background-foreground masks to train one part of the generator on just the background pixels and the remaining parts on the foreground pixels), or does the model learn it implicitly? I saw that the paper mentions that scale and translation are fixed for the background to make it span the entire scene and be centered at the origin. But does the model learn 'unsupervisedly' to generate the background with a feature-field generator of this configuration, or is there also some explicit supervision? The paper seems to suggest it's unsupervised, but I just wanted to confirm.

    • I saw that you have N+1 generators for N objects (1 for the background). Are all N object-generator MLPs essentially the same generator with shared weights, or are they different? Assuming all objects are, say, cars, then one generator would probably be enough to generate all of them; but if there are different kinds of objects in the scene, like a car, a bicycle, a pedestrian, etc., then a per-category object generator would probably make sense?

    Thanks again!

    opened by ankuPRK 2
  • FFHQ FID score reproduction check

    Hello,

    I trained the model with the ffhq_256.yaml file.

    But I was not able to reproduce the FID score of the pretrained GIRAFFE model you provided (ffhq_256_pretrained.yaml).

    Could you please check the configuration file (ffhq_256.yaml)?

    FFHQ | FID (20000 images)
    Pretrained from Github | 31.507948
    My Reproduced Model | 43.068982

    opened by Jinoh-Cho 2
  • The number of images in the CelebA-HQ dataset: 30k or 200k?

    Hi, sorry to bother you; I have a question about which folder I should choose.

    I'd like to use images at 128x128 resolution, and I noticed that the celeba-128 folder contains 30k images, but the img_celeba folder has 200k images, so I am a little confused. Could you tell me which folder I should choose? Really thanks! :)


    opened by Tianhang-Cheng 2
  • Future works

    Are you solving the following problem? If not, we can discuss it.

    Disentanglement Failures. For Churches, the background sometimes contains a church, and for CompCars, the object sometimes contains background parts or vice versa. We attribute these to mismatches between the assumed uniform distributions over object and camera poses and their real distributions, and identify learning them instead as interesting future work.

    opened by zzw-zwzhang 2
  • Error when training the code

    Thanks for your great work. When I try to train GIRAFFE on FFHQ, I get an error:

    (giraffe) ➜  giraffe-main python train.py configs/256res/ffhq_256.yaml           
    /home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/kornia/augmentation/augmentation.py:1872: DeprecationWarning: GaussianBlur is no longer maintained and will be removed from the future versions. Please use RandomGaussianBlur instead.
      warnings.warn(
    Start loading file addresses ...
    done! time: 0.00021123886108398438
    Number of images found: 0
    Traceback (most recent call last):
      File "train.py", line 54, in <module>
        train_loader = torch.utils.data.DataLoader(
      File "/home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
        sampler = RandomSampler(dataset, generator=generator)  # type: ignore
      File "/home/rjs/.conda/envs/giraffe/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 103, in __init__
        raise ValueError("num_samples should be a positive integer "
    ValueError: num_samples should be a positive integer value, but got num_samples=0
    
    opened by diaodeyi 2
  • [Question about Equation 7][transforming points from object to scene space]

    Dear authors, I am really inspired by your brilliant work! However, a small point struck me: Eq. 6 says that k(x) "transforms points from object to scene space", while I found that its usage in Eq. 7 is k^-1(x). Would you kindly explain this? Thank you very much!

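    For reference, the two equations in question read roughly as follows (a hedged paraphrase of the paper's notation, not the original screenshot):

        % Eq. 6: k maps a point x from object space to scene space
        k(x) = R \cdot S \cdot x + t
        % Eq. 7: the feature field h_\theta lives in object space, so scene-space
        % points x and view directions d are pulled back through the inverse
        (\sigma, f) = h_\theta(\gamma(k^{-1}(x)), \gamma(k^{-1}(d)), z_s, z_a)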

    opened by tomguluson92 2
  • Multi-GPU training

    Hi,

    Thank you for releasing your code. I have a few questions.

    1. I tried to train the cars_256 model on my machine, which has 2 GPUs with around 11GB of memory each. It ran into an OOM error, because the code currently uses only one GPU, and in an earlier response you indicated that the 256x256 config requires 16GB. So I am thinking of changing the code to use multiple GPUs with DataParallel, as shown below. Is there anything from a computational perspective that needs to be taken care of when running the code on multiple GPUs (will there be any inaccuracies in the results if the training code is run on multiple GPUs)?

    model = torch.nn.DataParallel(model, device_ids=gpu_list)

    thanks

    opened by athena913 2
  • How to control appearance and shape?

    Hello, I am new to GANs and I am confused by some operations.

    1. How can I know which appearance code corresponds to which color? For example, if I want to get a red object, how can I know exactly which appearance code I should use?

    2. I guess maybe 'control' means 'swap'? For example, I have the appearance code App1 and shape code Sha1 for Object1, and the appearance code App2 and shape code Sha2 for Object2. Then I can get an object with the shape of Object1 and the color of Object2 by using shape code Sha1 and appearance code App2? Am I right?

    3. If I am right, then to generate a red object I first need to do some runs until some red objects are generated, and then reuse those appearance codes?

    opened by LeeBC2298 1
  • How to obtain the 3D bounding box of an object from its {s,t,R}

    Thanks for your excellent work! I am really impressed by the controllable object generation!

    I wonder if there is a way to extract the 3D bounding box of an object from its affine transformation {s, t, R}.

    I tried to establish the relationship between the 3D bounding box and {s, t, R}, but failed. I transformed a cube with equation (6) from the paper, but the relationship does not seem to follow equation (6). Do you have any suggestions?

    opened by PeizeSun 1
Related Repositories

  • This repository contains the code for the EMNLP 2021 paper "Word-Level Coreference Resolution" (79 stars, Dec 27, 2022)
  • Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021 (395 stars, Jan 3, 2023)
  • This repository contains data used in the NAACL 2021 paper "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems" (37 stars, Dec 4, 2022)
  • Repository for the cap-bot variant presented at the SIIC Defence Hackathon 2021 (Aryan Kargwal, 19 stars, Feb 17, 2022)
  • Code for the ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances" (ICTNLP, 90 stars, Dec 27, 2022)
  • Code for the ACL 2021 paper "Mask-Align: Self-Supervised Neural Word Alignment" (THUNLP-MT, 46 stars, Dec 15, 2022)
  • Code for the ACL 2021 paper "ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer" (Yan Yuanmeng, 478 stars, Dec 25, 2022)
  • Code for the ACL 2021 (Findings) paper "Fingerprinting Fine-tuned Language Models in the wild" (LCS2-IIITDelhi, 5 stars, Sep 13, 2022)
  • Code for the ACL 2021 paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" (THUNLP-MT, 9 stars, Jun 27, 2022)
  • Code for the EMNLP 2021 paper "AEDA: An Easier Data Augmentation Technique for Text Classification" (Akbar Karimi, 81 stars, Dec 9, 2022)
  • Code for the EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification" (LancoPKU, 105 stars, Jan 3, 2023)
  • Code for the Findings of EMNLP 2021 paper "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation" (Chenhe Dong, 28 stars, Nov 10, 2022)
  • Code and dataset for the EMNLP 2021 paper "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes" (Hyunwoo Kim, 50 stars, Dec 21, 2022)
  • Code and dataset for the EMNLP 2021 Findings paper "Can NLI Models Verify QA Systems' Predictions?" (Jifan Chen, 22 stars, Oct 21, 2022)
  • Code for the Findings of EMNLP 2021 paper "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning" (INK Lab @ USC, 6 stars, Sep 2, 2022)
  • Code to reproduce the results of the EMNLP 2021 paper "Towards Realistic Few-Shot Relation Extraction" (Bloomberg, 8 stars, Nov 9, 2022)
  • Implementation of SimCSE using TensorFlow 2 and KR-BERT (Jeong Ukjae, 27 stars, Dec 12, 2022)