You Only Need Adversarial Supervision for Semantic Image Synthesis

Official PyTorch implementation of the ICLR 2021 paper "You Only Need Adversarial Supervision for Semantic Image Synthesis". The code allows users to reproduce and extend the results reported in the paper. Please cite the paper when reporting, reproducing, or extending these results.

[OpenReview] [Arxiv]

Overview

This repository implements the OASIS model, which generates realistic-looking images from semantic label maps. In addition, many different images can be generated from any given label map simply by resampling a noise vector (first two rows of the figure below). The model also allows resampling only parts of the image (see the last two rows of the figure below). Check out the paper for details, as well as the appendix, which contains many additional examples.

Setup

First, clone this repository:

git clone https://github.com/boschresearch/OASIS.git
cd OASIS

The code is tested for Python 3.7.6 and the packages listed in oasis.yml. The basic requirements are PyTorch and Torchvision. The easiest way to get going is to install the oasis conda environment via

conda env create --file oasis.yml
source activate oasis
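
To quickly verify that the environment works (a minimal sanity check, not part of the original setup instructions):

python -c "import torch, torchvision; print(torch.__version__, torch.cuda.is_available())"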

Datasets

For COCO-Stuff, Cityscapes, or ADE20K, please follow the dataset preparation instructions outlined in https://github.com/NVlabs/SPADE.

Training the model

To train the model, execute the training scripts in the scripts folder. In these scripts you first need to specify the path to the data folder. Via the --name parameter the experiment can be given a unique identifier. The experimental results are then saved in the folder ./checkpoints, where a new folder is created for each run with the specified experiment name. You can also specify another folder for the checkpoints using the --checkpoints_dir parameter. If you want to continue training, start the respective script with the --continue_train flag. Have a look at config.py for other options you can specify.
Training on 4 NVIDIA Tesla V100 GPUs (32 GB) is recommended.
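
For illustration, a direct invocation could look roughly like the following (a sketch only; the provided scripts set these flags for you, and the dataset path is a placeholder):

python train.py --dataset_mode ade20k --name oasis_ade20k \
--dataroot path_to/ADEChallenge2016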

Testing the model

To test a trained model, execute the testing scripts in the scripts folder. The --name parameter should correspond to the experiment name that you want to test, and --checkpoints_dir should be the folder where the experiment is saved (default: ./checkpoints). These scripts will generate images from a pretrained model in ./results/name/.
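
As a sketch (assuming an experiment previously trained with the hypothetical name my_experiment on ADE20K; the flags mirror the pretrained-model example further below):

python test.py --dataset_mode ade20k --name my_experiment \
--dataroot path_to/ADEChallenge2016 --checkpoints_dir ./checkpoints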

Measuring FID

The FID is computed on the fly during training, using the popular PyTorch FID implementation from https://github.com/mseitzer/pytorch-fid. At the beginning of training, the inception moments of the real images are computed before the actual training loop starts. How frequently the FID should be evaluated is controlled via the parameter --freq_fid, which is set to 5000 steps by default. The FID computation automatically downloads a pre-trained Inception checkpoint. If that automatic download fails, for instance because your server has restricted internet access, get the checkpoint named pt_inception-2015-12-05-6726825d.pth from here and place it in /utils/fid_folder/. In this case, do not forget to replace the load_state_dict_from_url call accordingly.
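
A minimal sketch of such a replacement, assuming the checkpoint was placed in utils/fid_folder/ (the function name below is illustrative; the actual patch goes wherever load_state_dict_from_url is called in the FID code):

import torch
from torch import nn

def load_local_fid_inception(model: nn.Module,
                             ckpt_path="utils/fid_folder/pt_inception-2015-12-05-6726825d.pth"):
    # Load the manually downloaded FID Inception weights from disk
    # instead of fetching them with load_state_dict_from_url.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model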

Pretrained models

The checkpoints for the pre-trained models are available here as zip files. Copy them into the checkpoints folder (the default is ./checkpoints, create it if it doesn't yet exist) and unzip them. The folder structure should be

checkpoints_dir
├── oasis_ade20k_pretrained                   
├── oasis_cityscapes_pretrained  
└── oasis_coco_pretrained

You can generate images with a pre-trained checkpoint via test.py. Using the example of ADE20K:

python test.py --dataset_mode ade20k --name oasis_ade20k_pretrained \
--dataroot path_to/ADEChallenge2016

This script will create a folder named ./results in which the resulting images are saved.

If you want to continue training from this checkpoint, use train.py with the same --name parameter and add --continue_train --which_iter best.
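
For the ADE20K example above, this looks roughly as follows (same placeholder dataset path as before):

python train.py --dataset_mode ade20k --name oasis_ade20k_pretrained \
--dataroot path_to/ADEChallenge2016 --continue_train --which_iter best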

Citation

If you use this work, please cite:

@inproceedings{schonfeld_sushko_iclr2021,
  title={You Only Need Adversarial Supervision for Semantic Image Synthesis},
  author={Sch{\"o}nfeld, Edgar and Sushko, Vadim and Zhang, Dan and Gall, Juergen and Schiele, Bernt and Khoreva, Anna},
  booktitle={International Conference on Learning Representations},
  year={2021}
}   

License

This project is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Purpose of the project

This software is a research prototype, solely developed for and published as part of the publication cited above. It will neither be maintained nor monitored in any way.

Contact

Please feel free to open an issue or contact us personally if you have questions, need help, or need explanations. Write to one of the following email addresses, and perhaps put one of the others in CC:

[email protected]
[email protected]
[email protected]
[email protected]

Comments
  • RuntimeError: Creating MTGP constants failed

    Hi, I am trying to run this repo. I've downloaded the ADE20K checkpoints and created a conda env following your yaml file.

    When I run the testing command python test.py --name oasis_ade20k --dataset_mode ade20k --gpu_ids 0 --dataroot test_images --batch_size 1, I get the following error:

    /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [232,0,0], thread: [101,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
    Traceback (most recent call last):
      File "test.py", line 25, in <module>
        generated = model(None, label, "generate", None)
      File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/azureuser/IM/OASIS/models/models.py", line 72, in forward
        fake = self.netEMA(label)
      File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/azureuser/IM/OASIS/models/generator.py", line 36, in forward
        z = torch.randn(seg.size(0), self.opt.z_dim, dtype=torch.float32, device=dev)
    RuntimeError: Creating MTGP constants failed. at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorRandom.cu:35
    

    I am running on a test_images folder containing some ADE20K images.

    Any suggestion? Thanks ;)

    opened by ivanlengyel 9
  • noisy data set

    hello dear author @edgarschnfld

    I downloaded the pre-trained model and added noise to the labels that are input to the network, but the test failed and the model didn't work on the noisy dataset at all. What should I do to test the model on noisy labels?

    opened by ranch-hands 9
  • test new labels on this network

    Hi dear @SebastianSchildt @SushkoVadim, I used another network to create label maps from images (this network: https://github.com/CSAILVision/semantic-segmentation-pytorch) and then fed your OASIS network with them in test mode, but it doesn't work. It says: [RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED]. Would you please help?

    thank you so much

    opened by ranch-hands 8
  • Any idea on why OASIS discriminator can encourage repetitive patterns?

    While doing my experiments I've replaced SPADE's discriminator with the OASIS one and tried this with VGG both disabled and enabled. In either case the images seem visually better (even though FID is higher on Cityscapes), but what is very noticeable (unfortunately I can't share images) is that the OASIS discriminator for some reason encourages repetitive patterns in the resulting images, specifically in broad semantic regions like roads. Do you have any idea what could cause that, given that the only change I made to my model is replacing MultiscaleDiscriminator with OASISDiscriminator?

    I'm sorry for the vague question, but I just wonder if there are any obvious things I can try to fight this. I also use the LabelMix loss with lambda = 10.

    opened by s1ddok 5
  • beta2 value reasoning

    Hey!

    Could you explain the reasoning behind beta2 being equal to 0.999 instead of 0.9 (like in SPADE)? I didn't find any mention of this in either the paper or the code.

    opened by s1ddok 4
  • what is input and output of network exactly

    hello dear all @SebastianSchildt @LGro @johannes-mueller @schorg @c2bo

    Does anybody know what the input and output of this network are exactly? Does it take the label maps as input and produce regular images as output? Am I right?

    thanks in advance

    opened by ranch-hands 4
  • How do you compute FID?

    To reproduce your results, there is one thing I am not sure about; could you please help me confirm it?

    For the Cityscapes dataset, the data are prepared following https://github.com/NVlabs/SPADE, and FID is computed following https://github.com/mseitzer/pytorch-fid. Without doubt, the synthesized images come from your code.

    But how about the real images?

    1. Resize: do you resize the real images to the same size as the synthesized ones (256 x 512), using nearest-neighbor downsampling?
    2. What are the real images? Only the validation images from Cityscapes (gtFine/val), or all images (gtFine/val, gtFine/test, gtFine/train)?

    Thanks a lot.

    opened by xml94 3
  • EMA, 3D noise

    Hi. @edgarschnfld @SushkoVadim

    I am a student studying semantic image synthesis. Thank you for the great work. I have two questions about the difference between paper and code.

    1. EMA: As you cite [Yaz et al., 2018], the exponential moving average is a good technique for training GANs. However, in your code at https://github.com/boschresearch/OASIS/blob/6e728ec5f5b7b69d6744485aa69a355e0164423c/utils/utils.py#L125-L132 I think the code below might need to be added:
    model.module.netG.state_dict()[key].data.copy_(
        model.module.netEMA.state_dict()[key].data
    )
    

    If not, netG is not trained using EMA.

    Yaz, Yasin, et al. "The unusual effectiveness of averaging in GAN training." International Conference on Learning Representations. 2018.

    2. 3D noise

    If I do not misunderstand your paper, it says that the noise of OASIS is sampled from a 3D normal distribution, and that this is one of the main differences from SPADE. However, in your code at https://github.com/boschresearch/OASIS/blob/f049c37ff711792c09e573586640e1fd11d69d58/models/generator.py#L34-L39 the noise is not sampled from a 3D normal distribution. It is sampled from a 1D normal distribution and then expanded to 3D, which replicates the same vector spatially. In my opinion, this code should be replaced by

    z = torch.randn(seg.shape, ...)
    

    I think both parts are pretty crucial for your paper. If there is a reason for these choices, or if I am mistaken, please let me know.

    Thank you.

    opened by pmh9960 2
  • The purpose of collecting running stats when updating EMA before FID computation, image or network saving

    Hi @SushkoVadim @edgarschnfld,

    Thanks for the excellent work. I noted that when updating EMA, you collect the running stats for BatchNorm before FID computation, image or network saving (see https://github.com/boschresearch/OASIS/blob/master/utils/utils.py#L133).

    May I ask about the purpose and intuition of this operation? How significantly does it affect model performance? Thank you in advance.

    opened by EndlessSora 2
  • reported number in paper is from best or latest model?

    I noticed that in the code you calculate FID on the test images and save the best model accordingly (correct me if I'm wrong). I am wondering whether the best model is the one used to report the numbers in the paper? Thanks

    opened by Yuheng-Li 2
  • Computation of the loss reweighting

    Hey, I've noticed that the loss reweighting used in the code doesn't match the paper's equation. Indeed, in the code the formula is BxHxW / (num_pixels_of_class_i * num_of_non_zero_classes_in_batch), whereas the paper shows the following equation: BxHxW / (num_pixels_of_class_i). Did I get something wrong? If not, can you clarify which one is the correct equation? Thanks

    opened by nicolas-dufour 1
  • Question on losses.py

    Hello, thank you for sharing your great work! I noticed that on line 36 of losses.py, coefficients = torch.reciprocal(class_occurence) * torch.numel(label) / (num_of_classes * label.shape[1]), it seems that label.shape[1] should be changed to label.shape[0] here so that the result matches the equation in the paper (HxW). I'm not sure if I understand it correctly, or is there some detail I'm not noticing? Thanks.

    opened by ZongWei-HUST 0
  • how to generate diverse images from a "Label map"?

    How can I generate diverse images from a "Label map"? For example, how do I generate "OASIS1", "OASIS2", "OASIS3", "OASIS4", "OASIS5" from one "Label map"? What is the terminal command?

    opened by mapengsen 0
  • Question on SpectralNorm (Discriminator)

    Hello @edgarschnfld, @SushkoVadim,

    thank you for sharing your work! I was wondering if you have explored the importance of the spectral norm in the discriminator in more detail. I also noticed that you applied the spectral norm to every layer except the last convolution (see https://github.com/boschresearch/OASIS/blob/master/models/discriminator.py#L23). Is this a specific design decision?

    It would be great if you could share some insights here. Thanks,

    Nikolai

    opened by Nikolai10 0
  • The label of output by the discriminator is inconsistent with the label of the dataset

    Hi, I'm trying to use the OASIS discriminator with my own dataset's semantic segmentation. In short, my ground-truth one-hot segmentation has 3 channels, shaped like: channel 0: background, channel 1: human, and channel 2 for the fake pixel label as the paper describes. But when calculating the cross-entropy loss, the target label seems to have been processed into a one-channel label like: 1: background, 2: human when is_real=True, or 0: fake when is_real=False. The model still works properly, but the predicted semantic segmentation is displayed in the wrong channel order when visualizing. Can you please tell me if my process is correct? Any suggestions would be very helpful, thanks.

    opened by Wzj-zju 1