DAGAN - Dual Attention GANs for Semantic Image Synthesis

Hao Tang

Last update: Oct 8, 2022

Overview

Semantic Image Synthesis with DAGAN
Installation
Dataset Preparation
Generating Images Using Pretrained Model
Train and Test New Models
Evaluation
Acknowledgments
Related Projects
Citation
Contributions
Collaborations

Semantic Image Synthesis with DAGAN

Dual Attention GANs for Semantic Image Synthesis
Hao Tang¹, Song Bai², Nicu Sebe¹³.
¹University of Trento, Italy, ²University of Oxford, UK, ³Huawei Research Ireland, Ireland.
In ACM MM 2020.
The repository offers the official implementation of our paper in PyTorch.

In the meantime, check out our related CVPR 2020 paper Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation and our arXiv paper Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis.

Framework

Results of Generated Images

Cityscapes (512×256)

Facades (1024×1024)

ADE20K (256×256)

CelebAMask-HQ (512×512)

Results of Generated Segmentation Maps

License

The code is released for academic research use only. For commercial use, please contact [email protected].

Installation

Clone this repo.

git clone https://github.com/Ha0Tang/DAGAN
cd DAGAN/

This code requires PyTorch 1.0 and Python 3+. Install the dependencies with:

pip install -r requirements.txt
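After installing, it can help to confirm that the interpreter and PyTorch are visible before going further. The following is a minimal sketch, not part of the repository; the function name check_environment is hypothetical, and it only reports what it finds rather than enforcing the PyTorch 1.0 requirement.

```python
import importlib.util
import sys


def check_environment():
    """Return a small report on the interpreter and PyTorch availability.

    Hypothetical helper for illustration; it is not shipped with DAGAN.
    """
    report = {
        # The README asks for Python 3+.
        "python_ok": sys.version_info >= (3, 0),
        # Stays None when PyTorch is not installed in this environment.
        "torch_version": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch_version"] = torch.__version__
    return report


if __name__ == "__main__":
    print(check_environment())
```

If torch_version comes back as None, the pip install step above most likely did not target the interpreter you are running.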

This code also requires the Synchronized-BatchNorm-PyTorch repo.

cd DAGAN_v1/
cd models/networks/
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
cd ../../

To reproduce the results reported in the paper, you would need an NVIDIA DGX1 machine with 8 V100 GPUs.

Dataset Preparation

Please download the datasets from their respective webpages.

Facades: 55.8M, here.
DeepFashion: 592.3M, here.
CelebAMask-HQ: 2.7G, here.
Cityscapes: 8.4G, here.
ADE20K: 953.7M, here.
COCO-Stuff: 21.5G, here.

We also provide the prepared datasets for your convenience.

sh datasets/download_dagan_dataset.sh [dataset]

where [dataset] can be one of facades, deepfashion, celeba, cityscapes, ade20k, or coco_stuff.

Generating Images Using Pretrained Model

Download the pretrained models using the following script,

sh scripts/download_dagan_model.sh GauGAN_DAGAN_[dataset]

where [dataset] can be one of cityscapes, ade, facades, or celeba.
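Note that the pretrained-model names use ade and celeba, while the dataset download script uses ade20k and celeba. A minimal sketch of composing the script argument, assuming only the four names listed above are valid; the helper pretrained_model_name is hypothetical and not part of the repository.

```python
# Dataset names accepted for pretrained models, as listed in this README.
PRETRAINED_DATASETS = ("cityscapes", "ade", "facades", "celeba")


def pretrained_model_name(dataset):
    """Build the argument passed to scripts/download_dagan_model.sh.

    Hypothetical helper; it only mirrors the GauGAN_DAGAN_[dataset] pattern.
    """
    if dataset not in PRETRAINED_DATASETS:
        raise ValueError(
            "unknown dataset %r; expected one of %s"
            % (dataset, ", ".join(PRETRAINED_DATASETS))
        )
    return "GauGAN_DAGAN_" + dataset


if __name__ == "__main__":
    print(pretrained_model_name("ade"))
```

Passing ade20k here raises a ValueError, which catches the easiest mistake to make when switching between the two scripts.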

Change the parameters as needed and then generate images using test_[dataset].sh. To run in CPU mode, append --gpu_ids -1.
The output images are stored in ./results/[type]_pretrained/ by default. You can view them using the autogenerated HTML file in that directory.

Train and Test New Models

Prepare dataset.
Change several parameters and then run train_[dataset].sh for training. There are many options you can specify. To specify the number of GPUs to utilize, use --gpu_ids. If you want to use the second and third GPUs for example, use --gpu_ids 1,2.
Testing is similar to testing pretrained models. Use --results_dir to specify the output directory. --how_many will specify the maximum number of images to generate. By default, it loads the latest checkpoint. It can be changed using --which_epoch.
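The --gpu_ids convention described above (a comma-separated list, with -1 meaning CPU) can be sketched as follows. This is an assumption about how such a value is typically interpreted in SPADE-style option parsers, not a copy of this repository's code; parse_gpu_ids is a hypothetical name.

```python
def parse_gpu_ids(value):
    """Turn a string such as "1,2" into a list of GPU indices.

    Negative entries are dropped, so "-1" yields an empty list,
    which stands for CPU mode in this sketch.
    """
    ids = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "0,"
        gpu_id = int(token)
        if gpu_id >= 0:
            ids.append(gpu_id)
    return ids


if __name__ == "__main__":
    print(parse_gpu_ids("1,2"))  # second and third GPUs
    print(parse_gpu_ids("-1"))   # CPU mode
```

GPU indices are zero-based, which is why the second and third GPUs are written as 1,2.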

Evaluation

FID: mseitzer/pytorch-fid
FRD: Ha0Tang/GestureGAN
LPIPS: richzhang/PerceptualSimilarity
DRN: fyu/drn [model: drn-d-105_ms_cityscapes.pth]
UperNet: CSAILVision/semantic-segmentation-pytorch [model: baseline-resnet101-upernet]
DeepLab: kazuto1011/deeplab-pytorch [model: deeplabv2_resnet101_msc-cocostuff164k-100000.pth]

For more details, please refer to this issue.
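For orientation, FID as it is commonly defined compares Gaussian fits to Inception features of real and generated images. The formula below is the standard definition rather than anything specific to this repository, and the symbols are notation introduced here: \mu_r, \Sigma_r are the feature mean and covariance for real images, and \mu_g, \Sigma_g those for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one.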

Acknowledgments

This source code is inspired by both GauGAN/SPADE and LGGAN.

Related Projects

EdgeGAN | LGGAN | SelectionGAN | PanoGAN | Guided-I2I-Translation-Papers

Citation

If you use this code for your research, please consider giving stars ⭐ and citing our papers 🦖 :

DAGAN

@inproceedings{tang2020dual,
  title={Dual Attention GANs for Semantic Image Synthesis},
  author={Tang, Hao and Bai, Song and Sebe, Nicu},
  booktitle={ACM MM},
  year={2020}
}

EdgeGAN

@article{tang2020edge,
  title={Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis},
  author={Tang, Hao and Qi, Xiaojuan and Xu, Dan and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2003.13898},
  year={2020}
}

LGGAN

@inproceedings{tang2019local,
  title={Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Torr, Philip HS and Sebe, Nicu},
  booktitle={CVPR},
  year={2020}
}

SelectionGAN

@inproceedings{tang2019multi,
  title={Multi-channel attention selection gan with cascaded semantic guidance for cross-view image translation},
  author={Tang, Hao and Xu, Dan and Sebe, Nicu and Wang, Yanzhi and Corso, Jason J and Yan, Yan},
  booktitle={CVPR},
  year={2019}
}

@article{tang2020multi,
  title={Multi-channel attention selection gans for guided image-to-image translation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Corso, Jason J and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2002.01048},
  year={2020}
}

Contributions

If you have any questions, comments, or bug reports, feel free to open a GitHub issue, submit a pull request, or e-mail the author Hao Tang ([email protected]).

Collaborations

I'm always interested in meeting new people and hearing about potential collaborations. If you'd like to work together or get in contact with me, please email [email protected]. Some of our projects are listed here.

Take a few minutes to appreciate what you have and how far you've come.

Comments

How to apply this model to paired image-to-image translation?

Hi, I'm so glad to have found your paper and code! But I have a question. I want to use this model for image-to-image translation, like pix2pixHD. But when I tested the model I trained, input_label only returned the semantic mask and synthesized_image did not return anything. How can I apply this model to image-to-image translation using a paired dataset?

Thanks again!!

opened by chokyungjin 28
Missing key(s) and Unexpected key(s) in state_dict : "cab.conv1.weight"

Thanks for your excellent work. However, when testing your dataset with the pretrained models, an error like this occurs:

RuntimeError: Error(s) in loading state_dict for SPADEGenerator: Missing key(s) in state_dict: "channelAtt.conv1.weight", "channelAtt.conv1.bias", "channelAtt.conv2.weight", "channelAtt.conv2.bias". Unexpected key(s) in state_dict: "cab.conv1.weight", "cab.conv1.bias", "cab.conv2.weight", "cab.conv2.bias".

Maybe the model you uploaded doesn't fit the network structure? My versions, torch 1.0.0 and torchvision 0.2.1, are the same as yours.

thx!

opened by Kravrolens 5
Missing key(s) in state_dict

Hi, impressive work here. I would like to run a data augmentation experiment with your code. However, when I followed the instructions on your front page up to 'sh test_ade.sh', it showed me: RuntimeError: Error(s) in loading state_dict for SPADEGenerator: Missing key(s) in state_dict: "channelAtt.conv1.weight", "channelAtt.conv1.bias", "channelAtt.conv2.weight", "channelAtt.conv2.bias". Unexpected key(s) in state_dict: "cab.conv1.weight", "cab.conv1.bias", "cab.conv2.weight", "cab.conv2.bias".

Do you have any idea about this issue?

opened by WenyuZhu 2
can't download dataset and pre-trained model

Hello, I want to download these datasets and pre-trained models, but I get "no such file or directory" and the download fails. Does the data still exist on this server? Thanks.

opened by KevinLight831 2
download scripts missed

Hello, following readme commands like sh scripts/download_dagan_model.sh [dataset], I cannot find the scripts directory (nor the script files). Am I missing something, or is the readme outdated?

opened by eps696 2
About the paper

Thank you so much for your great job. I posted this question on another page of your code, sorry for that. I was just wondering: when we do an ablation study, is it enough to set the loss term of the ablated part to zero, or do we have to change the structure of the code and disentangle all the parameters of that part from the rest of the network?

opened by Mathilda88 1
no_instance param is not working

Hi, I'm so glad to meet your Paper and Code! But I have a question. I tried to train with no_instance parameter, But the code has returned this error. But I don't know why this is happening, Because in Pix2pixHD, which is similar to the code you configured on the custom dataloader, the no_instance parameter is working.

File "/home/user/.local/lib/python3.6/site-packages/torch/nn/init.py", line 282, in _calculate_fan_in_and_fan_out receptive_field_size = tensor[0][0].numel() IndexError: index 0 is out of bounds for dimension 0 with size 0

One more question, I want to learn models using grayscale images, But the option does not have an input_nc option, only have output_nc.

Thanks again!!

opened by chokyungjin 1
Test model trained with USE_VAE switch?

Hello! I've trained model using --use_vae switch, but cant find in the source code/docs how to test model using style image. It looks like there is no testing capabilities in the source code, am I wrong? How I can use segment map + style image to perform test using netE?

Closest I found is function "guide_test(self):" in the tf implementation of spade: line 575 of https://github.com/taki0112/SPADE-Tensorflow/blob/4517824ea3e9428d5ab5413847ed2af9891b5830/SPADE.py

opened by Kitty-sunray 0
DAGAN v.2 or successor?

Hello! Awesome work! A DAGAN v2.0 folder was created more than 15 months ago. Is there a paper, or any expectation of it being released? Or what is the successor to DAGAN?

opened by Kitty-sunray 0
Docker image for inference

This pull request includes a Dockerfile that packages DAGAN in a reproducible Docker image. I've pushed the image to the Replicate Docker registry and included a link in the README.

I noticed that the pre-trained models failed to load due to the name of the channel attention layer having been renamed from cab, so I had to rename it back to cab.
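The "Missing key(s) / Unexpected key(s)" errors reported in the issues above are consistent with this rename. An alternative to editing the model is to remap the checkpoint keys at load time. The following is a minimal sketch, assuming the only difference is the cab. versus channelAtt. prefix shown in the error message; remap_prefix is a hypothetical helper, and it operates on a plain mapping so it can be tried without PyTorch.

```python
from collections import OrderedDict


def remap_prefix(state_dict, old_prefix="cab.", new_prefix="channelAtt."):
    """Return a copy of state_dict with a leading key prefix replaced.

    Only keys that start with old_prefix are renamed; values and the
    order of entries are preserved.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        remapped[key] = value
    return remapped


if __name__ == "__main__":
    # Stand-in values; a real checkpoint maps these keys to tensors.
    checkpoint = OrderedDict([("cab.conv1.weight", 1), ("fc.weight", 2)])
    print(list(remap_prefix(checkpoint)))
```

With PyTorch, the dictionary returned by torch.load would be passed through remap_prefix before calling load_state_dict on the generator. Whether the pretrained weights then load cleanly depends on the layer shapes also matching, which this sketch does not check.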

We are working to make Replicate a registry of machine learning models that can be easily reproduced. With Docker the models can be run "forever", without having to worry about missing dependencies. The website design is still a work in progress, so it might look a little rough around the edges.

opened by andreasjansson 0
