Official implementation of the ICLR 2021 paper

Bosch Research

Last update: Dec 28, 2022

Related tags

Deep Learning machine-learning computer-vision deep-learning pytorch gan image-generation multi-modal generative-adversarial-networks oasis image-to-image-translation semantic-image-synthesis iclr2021 label-to-image-translation

Overview

You Only Need Adversarial Supervision for Semantic Image Synthesis

Official PyTorch implementation of the ICLR 2021 paper "You Only Need Adversarial Supervision for Semantic Image Synthesis". The code allows the users to reproduce and extend the results reported in the study. Please cite the paper when reporting, reproducing or extending the results.

[OpenReview] [Arxiv]

Overview

This repository implements the OASIS model, which generates realistic-looking images from semantic label maps. In addition, many different images can be generated from any given label map by simply resampling a noise vector (first two rows of the figure below). The model also allows resampling only parts of the image (see the last two rows of the figure below). Check out the paper for details, as well as the appendix, which contains many additional examples.

Setup

First, clone this repository:

git clone https://github.com/boschresearch/OASIS.git
cd OASIS

The code is tested for Python 3.7.6 and the packages listed in oasis.yml. The basic requirements are PyTorch and Torchvision. The easiest way to get going is to install the oasis conda environment via

conda env create --file oasis.yml
source activate oasis

Datasets

For COCO-Stuff, Cityscapes or ADE20K, please follow the instructions for the dataset preparation as outlined in https://github.com/NVlabs/SPADE.

Training the model

To train the model, execute the training scripts in the scripts folder. In these scripts you first need to specify the path to the data folder. Via the --name parameter the experiment can be given a unique identifier. The experimental results are then saved in the folder ./checkpoints, where a new folder for each run is created with the specified experiment name. You can also specify another folder for the checkpoints using the --checkpoints_dir parameter. If you want to continue training, start the respective script with the --continue_train flag. Have a look at config.py for other options you can specify.
Training on 4 NVIDIA Tesla V100 (32GB) is recommended.
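
As a rough illustration of how these options combine, the sketch below assembles a training command line in Python. It only uses flags that appear in this README (--name, --dataset_mode, --dataroot, --checkpoints_dir, --continue_train) and assumes that train.py accepts the same --dataset_mode and --dataroot options shown for test.py. The helper build_train_command and the experiment name my_experiment are hypothetical; the scripts in the scripts folder remain the reference for the actual invocations.

```python
import shlex


def build_train_command(name, dataset_mode, dataroot,
                        checkpoints_dir="./checkpoints", continue_train=False):
    """Assemble a train.py command line from options named in the README.

    Hypothetical helper for illustration only; it does not launch training.
    """
    cmd = [
        "python", "train.py",
        "--name", name,                        # unique experiment identifier
        "--dataset_mode", dataset_mode,        # assumed to match the test.py option
        "--dataroot", dataroot,                # path to the data folder
        "--checkpoints_dir", checkpoints_dir,  # where the run folder is created
    ]
    if continue_train:
        cmd.append("--continue_train")         # resume an existing experiment
    return cmd


cmd = build_train_command("my_experiment", "ade20k", "path_to/ADEChallenge2016")
print(shlex.join(cmd))
```

Passing continue_train=True appends the --continue_train flag, which corresponds to resuming the run stored under the same experiment name.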

Testing the model

To test a trained model, execute the testing scripts in the scripts folder. The --name parameter should correspond to the experiment name that you want to test, and --checkpoints_dir should be the folder where the experiment is saved (default: ./checkpoints). These scripts will generate images from a pretrained model in ./results/name/.

Measuring FID

The FID is computed on the fly during training, using the popular PyTorch FID implementation from https://github.com/mseitzer/pytorch-fid. The inception moments of the real images are computed once, before the actual training loop starts. How frequently the FID is evaluated is controlled via the parameter --freq_fid, which is set to 5000 steps by default. The inception net that is used for FID computation automatically downloads a pre-trained checkpoint. If that automatic download fails, for instance because your server has restricted internet access, get the checkpoint named pt_inception-2015-12-05-6726825d.pth from here and place it in /utils/fid_folder/. In this case, do not forget to replace the load_state_dict_from_url call accordingly, so that the checkpoint is loaded from the local file.
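
For orientation, FID is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The sketch below evaluates this formula for the special case of diagonal covariances, where the trace term reduces to a sum of squared differences of standard deviations. It is a simplified illustration in plain Python, not the pytorch-fid implementation, which works with full covariance matrices; the function name frechet_distance_diag is hypothetical.

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    mu1, mu2: per-dimension means; var1, var2: per-dimension variances.
    Simplified special case for illustration, not the full-covariance FID.
    """
    # Squared Euclidean distance between the means.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # For diagonal covariances the trace term becomes sum((sigma1 - sigma2)^2).
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


identical = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
```

Identical statistics give a distance of zero, and the distance grows both when the means move apart and when the spreads differ.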

Pretrained models

The checkpoints for the pre-trained models are available here as zip files. Copy them into the checkpoints folder (the default is ./checkpoints, create it if it doesn't yet exist) and unzip them. The folder structure should be

checkpoints_dir
├── oasis_ade20k_pretrained                   
├── oasis_cityscapes_pretrained  
└── oasis_coco_pretrained

You can generate images with a pre-trained checkpoint via test.py. Using the example of ADE20K:

python test.py --dataset_mode ade20k --name oasis_ade20k_pretrained \
--dataroot path_to/ADEChallenge2016

This script will create a folder named ./results in which the resulting images are saved.

If you want to continue training from this checkpoint, use train.py with the same --name parameter and add --continue_train --which_iter best.

Citation

If you use this work, please cite

@inproceedings{schonfeld_sushko_iclr2021,
  title={You Only Need Adversarial Supervision for Semantic Image Synthesis},
  author={Sch{\"o}nfeld, Edgar and Sushko, Vadim and Zhang, Dan and Gall, Juergen and Schiele, Bernt and Khoreva, Anna},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

License

This project is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Purpose of the project

This software is a research prototype, solely developed for and published as part of the publication cited above. It will neither be maintained nor monitored in any way.

Contact

Please feel free to open an issue or contact us personally if you have questions, need help, or need explanations. Write to one of the following email addresses, and maybe put one other in the cc:

[email protected]
[email protected]
[email protected]
[email protected]

Comments

RuntimeError: Creating MTGP constants failed

Hi, I am trying to implement this repo. I've downloaded the ade20k checkpoints and created a conda env following your yaml file.

When I run the testing command python test.py --name oasis_ade20k --dataset_mode ade20k --gpu_ids 0 --dataroot test_images --batch_size 1 I get the following error:

/opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [232,0,0], thread: [101,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    generated = model(None, label, "generate", None)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/models.py", line 72, in forward
    fake = self.netEMA(label)
  File "/anaconda/envs/oasis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/azureuser/IM/OASIS/models/generator.py", line 36, in forward
    z = torch.randn(seg.size(0), self.opt.z_dim, dtype=torch.float32, device=dev)
RuntimeError: Creating MTGP constants failed. at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/THC/THCTensorRandom.cu:35

I am running on the test_images folder, which contains some ADE20K images.

Any suggestion? Thanks ;)

opened by ivanlengyel 9

noisy data set

hello dear author @edgarschnfld

I downloaded the pre-trained model and added noise to the labels that are fed to the network, but the test failed and the model didn't work on the noisy dataset at all. What should I do to test the model on noisy labels?

opened by ranch-hands 9

test new labels on this network

Hi dear @SebastianSchildt @SushkoVadim, I used another network to create label maps from images (this network: https://github.com/CSAILVision/semantic-segmentation-pytorch) and then fed them to your OASIS network in test mode, but it doesn't work. It says: [RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED]. Would you please help?

Thank you so much

opened by ranch-hands 8

Any idea on why OASIS discriminator can encourage repetitive patterns?

While doing my experiments, I replaced SPADE's discriminator with the OASIS one and tried this with VGG both disabled and enabled. In either case the images seem to be visually better (even though FID is higher on Cityscapes), but what is very noticeable (unfortunately I can't share images) is that the OASIS discriminator for some reason encourages repetitive patterns in the resulting images, specifically in broad semantic regions like road. Do you have any idea what could cause that, given that the only change I made to my model is replacing MultiscaleDiscriminator with OASISDiscriminator?

I'm sorry for the vague question, but I just wonder if there are any obvious things I can try to fight this. I also use the LabelMix loss with Lambda = 10.

opened by s1ddok 5

beta2 value reasoning

Hey!

Could you explain the reasoning behind beta2 being equal to 0.999 instead of 0.9 (like in SPADE)? I didn't find any mention of this in either the paper or the code.

opened by s1ddok 4

what is input and output of network exactly

hello dear all @SebastianSchildt @LGro @johannes-mueller @schorg @c2bo

Does anybody know what the input and output of this network are exactly? Does it take those labels and output normal pictures? Am I right?

thanks in advance

opened by ranch-hands 4

How do you compute FID?

To reproduce your results, there is one thing I am not sure about. Could you please help me confirm it?

For the Cityscapes dataset, I follow https://github.com/NVlabs/SPADE and use the FID computation from https://github.com/mseitzer/pytorch-fid. The synthesized images obviously come from your code.

But what about the real images?

Resize: do you resize the real images to the same size as your synthesized ones (256 x 512) with nearest-neighbor downsampling?

Which real images do you use: only the val images from Cityscapes (gtFine/val), or all images (gtFine/val, gtFine/test, gtFine/train)?

Thanks a lot.

opened by xml94 3

EMA, 3D noise

Hi @edgarschnfld @SushkoVadim,

I am a student studying semantic image synthesis. Thank you for the great work. I have two questions about the difference between paper and code.

EMA

As you cite [Yaz et al., 2018], exponential moving average is a good technique for training GANs. However, in your code https://github.com/boschresearch/OASIS/blob/6e728ec5f5b7b69d6744485aa69a355e0164423c/utils/utils.py#L125-L132 I think the code below should be added:

model.module.netG.state_dict()[key].data.copy_( model.module.netEMA.state_dict()[key].data )

Otherwise, netG is not trained using EMA.

Yaz, Yasin, et al. "The unusual effectiveness of averaging in GAN training." International Conference on Learning Representations. 2018.
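
For reference, a generic EMA update can be sketched as follows. This is a minimal illustration with plain Python dictionaries standing in for the state dicts of netG and netEMA; it is not the repository's code, and the function name ema_update and the decay value are assumptions. In this common formulation the averaged copy is only read for evaluation, and the trained weights are never overwritten by it, which is the behaviour the question above is about.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Update ema_params in place: ema = decay * ema + (1 - decay) * current.

    Both arguments map parameter names to floats here; in practice they
    would be tensors. The decay default is an illustrative assumption.
    """
    for key, value in model_params.items():
        ema_params[key] = decay * ema_params[key] + (1.0 - decay) * value
    return ema_params


generator = {"w": 0.0}   # stand-in for the weights being trained
ema_copy = {"w": 1.0}    # stand-in for the averaged copy
ema_update(ema_copy, generator, decay=0.9)
```

After the call, only ema_copy has moved toward the generator weights; the generator dictionary is unchanged.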

3D noise

If I do not misunderstand your paper, it says that the noise in OASIS is sampled from a 3D normal distribution, and that this is one of the main differences from SPADE. However, in your code at https://github.com/boschresearch/OASIS/blob/f049c37ff711792c09e573586640e1fd11d69d58/models/generator.py#L34-L39 the noise is not sampled from a 3D normal distribution. It is sampled from a 1D normal distribution and then expanded to 3D, which replicates the same vector at every spatial location. In my opinion, this code should be replaced by

z = torch.randn(seg.shape, ...)
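
The two sampling schemes being compared can be sketched as below. This is a shape-only illustration in plain Python, with nested lists of shape [z_dim][H][W] standing in for tensors; the function names are hypothetical and it does not reproduce the repository's generator code.

```python
import random


def replicated_noise(z_dim, height, width, rng):
    """Sample one z_dim vector and copy it to every pixel."""
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    return [[[z[c] for _ in range(width)] for _ in range(height)]
            for c in range(z_dim)]


def per_pixel_noise(z_dim, height, width, rng):
    """Sample an independent value for every channel and every pixel."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
            for _ in range(z_dim)]


rng = random.Random(0)
rep = replicated_noise(4, 2, 3, rng)   # each channel is constant over space
ind = per_pixel_noise(4, 2, 3, rng)    # each channel varies over space
```

Both results have the same shape; they differ only in whether the values within a channel are shared across spatial positions.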

I think both parts are pretty crucial for your paper. If there is a reason for these choices, or if I am mistaken, please let me know.

Thank you.

opened by pmh9960 2

The purpose of collecting running stats when updating EMA before FID computation, image or network saving

Hi @SushkoVadim @edgarschnfld,

Thanks for the excellent work. I noted that when updating EMA, you collect the running stats for BatchNorm before FID computation, image or network saving (see https://github.com/boschresearch/OASIS/blob/master/utils/utils.py#L133).

May I ask about the purpose and intuition of this operation? How significantly would it affect the model performance? Thank you in advance.

opened by EndlessSora 2

reported number in paper is from best or latest model?

I noticed that the code calculates FID on the test images and saves the best model accordingly (correct me if I'm wrong). I am wondering whether the best model was used to report the numbers in the paper. Thanks

opened by Yuheng-Li 2

Computation of the loss reweighting

Hey, I've noticed that the loss reweighting used in the code doesn't match the paper's equation. In the code, the formula is BxHxW/(num_pixels_of_class_i*num_of_non_zeros_classes_in_batch), whereas the paper shows the following equation: BxHxW/(num_pixels_of_class_i). Did I get something wrong? If not, can you clarify which one is the correct equation? Thanks
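
The two formulas quoted above can be compared with a small sketch. It works on a flat list of class ids covering all BxHxW pixels of a batch; the function name class_weights and its flag are hypothetical, and the code only mirrors the two expressions as written in this comment, not the repository's losses.py.

```python
from collections import Counter


def class_weights(labels, divide_by_num_classes=False):
    """Per-class weights from a flat list of class ids (all B*H*W pixels).

    divide_by_num_classes=False: B*H*W / num_pixels_of_class_i
    divide_by_num_classes=True:  additionally divided by the number of
                                 classes that occur in the batch
    """
    counts = Counter(labels)
    total = len(labels)          # B x H x W
    num_present = len(counts)    # classes with at least one pixel
    weights = {}
    for cls, n in counts.items():
        w = total / n
        if divide_by_num_classes:
            w /= num_present
        weights[cls] = w
    return weights


batch = [0, 0, 0, 1]             # toy batch: three pixels of class 0, one of class 1
paper_style = class_weights(batch)
code_style = class_weights(batch, divide_by_num_classes=True)
```

The two variants differ by a constant factor per batch. With the extra division, the weighted pixel counts sum to BxHxW, so the overall loss scale does not depend on how many classes are present.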

opened by nicolas-dufour 1

Question on losses.py

Hello, thank you for sharing your great work! I noticed that on line 36 of the losses.py file, coefficients = torch.reciprocal(class_occurence) * torch.numel(label) / (num_of_classes * label.shape[1]), it seems that label.shape[1] should be changed to label.shape[0] so that the result is in line with the equation in the paper (HxW). I am not sure if I understand it correctly. Or is there some detail I'm not noticing? Thanks.

opened by ZongWei-HUST 0

how to generate diverse images from a "Label map"?

How can I generate diverse images from a "Label map"? For example, how do I generate "OASIS1", "OASIS2", "OASIS3", "OASIS4", "OASIS5" from a "Label map"? What is the terminal command?

opened by mapengsen 0

Question on SpectralNorm (Discriminator)

Hello @edgarschnfld, @SushkoVadim,

thank you for sharing your work! I was wondering if you have explored the importance of the spectral norm in the discriminator in more detail. I also noticed that you applied the spectral norm to every layer except the last convolution, see https://github.com/boschresearch/OASIS/blob/master/models/discriminator.py#L23 Is this a specific design decision?

It would be great if you could share some insights here. Thanks,

Nikolai

opened by Nikolai10 0

The label of output by the discriminator is inconsistent with the label of the dataset

Hi, I'm trying to use the OASIS discriminator with my own dataset's semantic segmentation. In short, my ground-truth one-hot segmentation has 3 channels, laid out as channel 0: background, channel 1: human, and channel 2: the fake pixel label, as the paper describes. But when the cross-entropy loss is calculated, the target label seems to have been processed into a one-channel label like 1: background, 2: human when is_real=True, or 0: fake when is_real=False. The model still works properly, but the display order of the predicted semantic segmentation is not quite correct when visualizing. Can you please tell me if my process is correct? Any suggestions would be very helpful, thanks.
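
The target layout described in this comment can be sketched as follows. The sketch assumes, based only on the description above, that index 0 is reserved for fake pixels and that real class indices are shifted up by one; the function name make_target is hypothetical and this is not the repository's loss code.

```python
def make_target(label_map, is_real):
    """Build a per-pixel target map for an (N+1)-class discriminator.

    label_map: 2D list of dataset class indices (0 = background, 1 = human).
    Real pixels keep their class, shifted by one; fake pixels all map to 0.
    """
    if is_real:
        return [[cls + 1 for cls in row] for row in label_map]
    return [[0 for _ in row] for row in label_map]


labels = [[0, 1],
          [1, 0]]
real_target = make_target(labels, is_real=True)
fake_target = make_target(labels, is_real=False)
```

Under this assumption, a predicted map would need the same shift undone (index minus one, with 0 treated as fake) before being visualized with the dataset's own colour order.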

opened by Wzj-zju 1

Owner

Bosch Research

Official code for the ICLR 2021 paper Neural ODE Processes

Official repository for the ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology

The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

Official implementation of Self-supervised Graph Attention Networks (SuperGAT), ICLR 2021.

Official PyTorch implementation of "Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets" (ICLR 2021)

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

Official Pytorch implementation of ICLR 2018 paper Deep Learning for Physical Processes: Integrating Prior Scientific Knowledge.

ReLoss - Official implementation for paper "Relational Surrogate Loss Learning" ICLR 2022

An implementation demo of the ICLR 2021 paper Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks in PyTorch.

Tensorflow 2 implementation of the paper: Learning and Evaluating Representations for Deep One-class Classification published at ICLR 2021

Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

Official Pytorch implementation of Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference (ICLR 2022)

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

Code for ICLR 2021 Paper, "Anytime Sampling for Autoregressive Models via Ordered Autoencoding"

Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)

Source code, datasets and trained models for the paper Learning Advanced Mathematical Computations from Examples (ICLR 2021), by François Charton, Amaury Hayat (ENPC-Rutgers) and Guillaume Lample

Based on the paper "Geometry-aware Instance-reweighted Adversarial Training" ICLR 2021 oral

Zero-shot Synthesis with Group-Supervised Learning (ICLR 2021 paper)