DAGAN - Dual Attention GANs for Semantic Image Synthesis

Overview

License: CC BY-NC-SA 4.0 | Python 3.6

Contents

Semantic Image Synthesis with DAGAN

Dual Attention GANs for Semantic Image Synthesis
Hao Tang¹, Song Bai², Nicu Sebe¹,³.
¹University of Trento, Italy; ²University of Oxford, UK; ³Huawei Research Ireland, Ireland.
In ACM MM 2020.
The repository offers the official implementation of our paper in PyTorch.

In the meantime, check out our related CVPR 2020 paper, Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation, and our arXiv paper, Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis.

Framework

Results of Generated Images

Cityscapes (512×256)

Facades (1024×1024)

ADE20K (256×256)

CelebAMask-HQ (512×512)

Results of Generated Segmentation Maps

License

Creative Commons License
Copyright (C) 2020 University of Trento, Italy.

All rights reserved. Licensed under CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International).

The code is released for academic research use only. For commercial use, please contact [email protected].

Installation

Clone this repo.

git clone https://github.com/Ha0Tang/DAGAN
cd DAGAN/

This code requires PyTorch 1.0 and Python 3+. Install the dependencies with:

pip install -r requirements.txt

This code also requires the Synchronized-BatchNorm-PyTorch repo.

cd DAGAN_v1/
cd models/networks/
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
cd ../../

To reproduce the results reported in the paper, you would need an NVIDIA DGX1 machine with 8 V100 GPUs.
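Before training or testing, it can help to confirm that the installation steps above took effect. The following is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it assumes you pass the path of the DAGAN_v1 directory in which the sync_batchnorm folder was copied to models/networks/.

```python
import importlib.util
import sys
from pathlib import Path


def check_environment(repo_root="."):
    """Return a list of problems found; an empty list means the basic checks passed.

    repo_root is assumed to be the DAGAN_v1 directory used in the install steps.
    """
    problems = []

    # The README asks for Python 3+.
    if sys.version_info < (3, 0):
        problems.append("Python 3+ is required")

    # Only checks that PyTorch is importable; it does not verify the version.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")

    # The install steps copy sync_batchnorm into models/networks/.
    sync_bn = Path(repo_root) / "models" / "networks" / "sync_batchnorm"
    if not sync_bn.is_dir():
        problems.append("missing {}: copy it from Synchronized-BatchNorm-PyTorch".format(sync_bn))

    return problems


if __name__ == "__main__":
    for problem in check_environment("."):
        print("WARNING:", problem)
```

Run it from inside DAGAN_v1/; no output means the three checks passed.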

Dataset Preparation

Please download the datasets from their respective webpages.

  • Facades: 55.8M, here.
  • DeepFashion: 592.3M, here.
  • CelebAMask-HQ: 2.7G, here.
  • Cityscapes: 8.4G, here.
  • ADE20K: 953.7M, here.
  • COCO-Stuff: 21.5G, here.

We also provide the prepared datasets for your convenience.

sh datasets/download_dagan_dataset.sh [dataset]

where [dataset] can be one of facades, deepfashion, celeba, cityscapes, ade20k, or coco_stuff.
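If you drive the download from Python, a small wrapper can reject unsupported names before the shell script runs. This is a sketch under stated assumptions: the helper name dataset_download_command is hypothetical, and the accepted names are simply the ones listed above.

```python
# Dataset names accepted by datasets/download_dagan_dataset.sh, as listed in this README.
DATASETS = ("facades", "deepfashion", "celeba", "cityscapes", "ade20k", "coco_stuff")


def dataset_download_command(dataset):
    """Build the argument list for the dataset download script.

    Raises ValueError for names that the README does not list.
    """
    if dataset not in DATASETS:
        raise ValueError(
            "unknown dataset {!r}; expected one of {}".format(dataset, ", ".join(DATASETS))
        )
    return ["sh", "datasets/download_dagan_dataset.sh", dataset]


if __name__ == "__main__":
    # To actually run it from the repository root:
    #   import subprocess
    #   subprocess.run(dataset_download_command("facades"), check=True)
    print(" ".join(dataset_download_command("facades")))
```

Returning a list rather than a single string lets you pass the command to subprocess.run without shell quoting.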

Generating Images Using Pretrained Model

  1. Download the pretrained models using the following script:
sh scripts/download_dagan_model.sh GauGAN_DAGAN_[dataset]

where [dataset] can be one of cityscapes, ade, facades, or celeba.

  2. Change several parameters and then generate images using test_[dataset].sh. If you are running in CPU mode, append --gpu_ids -1.
  3. The output images are stored in ./results/[type]_pretrained/ by default. You can view them using the autogenerated HTML file in that directory.
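The steps above can be sketched in Python as follows. Both helper names are hypothetical. Note that the pretrained-model names use ade, whereas the dataset download script uses ade20k. The README does not give the name of the HTML file, so the sketch searches the results directory for any .html file.

```python
from pathlib import Path

# Dataset suffixes the README lists for pretrained models.
PRETRAINED_DATASETS = ("cityscapes", "ade", "facades", "celeba")


def pretrained_model_name(dataset):
    """Return the model name passed to scripts/download_dagan_model.sh."""
    if dataset not in PRETRAINED_DATASETS:
        raise ValueError("no pretrained model listed for {!r}".format(dataset))
    return "GauGAN_DAGAN_{}".format(dataset)


def find_html_reports(results_dir):
    """Return all .html files under results_dir, sorted; empty if the directory is absent."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.html"))


if __name__ == "__main__":
    print(pretrained_model_name("cityscapes"))
    for page in find_html_reports("./results"):
        print(page)
```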

Train and Test New Models

  1. Prepare dataset.
  2. Change several parameters and then run train_[dataset].sh for training. There are many options you can specify. To select which GPUs to use, pass --gpu_ids. For example, to use the second and third GPUs, pass --gpu_ids 1,2.
  3. Testing works the same way as testing the pretrained models. Use --results_dir to specify the output directory and --how_many to set the maximum number of images to generate. By default, the test script loads the latest checkpoint; use --which_epoch to load a different one.
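To make the options above concrete, here is a minimal sketch of how a --gpu_ids string could be interpreted and how the testing flags could be assembled. It is not the repository's own option parser: both function names are hypothetical. The sketch assumes that negative indices mean CPU mode, which is consistent with the --gpu_ids -1 advice given earlier.

```python
def parse_gpu_ids(value):
    """Turn a --gpu_ids string such as "1,2" into a list of device indices.

    Negative indices are dropped, so "-1" yields an empty list (CPU mode).
    """
    ids = [int(part) for part in value.split(",") if part.strip()]
    return [i for i in ids if i >= 0]


def build_test_args(results_dir, how_many, which_epoch=None):
    """Assemble the testing flags named in the README as an argument list.

    which_epoch is omitted when None, so the script falls back to its default
    (the latest checkpoint, according to the README).
    """
    args = ["--results_dir", results_dir, "--how_many", str(how_many)]
    if which_epoch is not None:
        args += ["--which_epoch", str(which_epoch)]
    return args


if __name__ == "__main__":
    print(parse_gpu_ids("1,2"))
    print(build_test_args("./results", 50))
```

Leaving --which_epoch out avoids guessing the literal default value the script expects.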

Evaluation

For more details, please refer to this issue.

Acknowledgments

This source code is inspired by both GauGAN/SPADE and LGGAN.

Related Projects

EdgeGAN | LGGAN | SelectionGAN | PanoGAN | Guided-I2I-Translation-Papers

Citation

If you use this code for your research, please consider giving stars and citing our papers 🦖 :

DAGAN

@inproceedings{tang2020dual,
  title={Dual Attention GANs for Semantic Image Synthesis},
  author={Tang, Hao and Bai, Song and Sebe, Nicu},
  booktitle={ACM MM},
  year={2020}
}

EdgeGAN

@article{tang2020edge,
  title={Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis},
  author={Tang, Hao and Qi, Xiaojuan and Xu, Dan and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2003.13898},
  year={2020}
}

LGGAN

@inproceedings{tang2019local,
  title={Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Torr, Philip HS and Sebe, Nicu},
  booktitle={CVPR},
  year={2020}
}

SelectionGAN

@inproceedings{tang2019multi,
  title={Multi-channel attention selection gan with cascaded semantic guidance for cross-view image translation},
  author={Tang, Hao and Xu, Dan and Sebe, Nicu and Wang, Yanzhi and Corso, Jason J and Yan, Yan},
  booktitle={CVPR},
  year={2019}
}

@article{tang2020multi,
  title={Multi-channel attention selection gans for guided image-to-image translation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Corso, Jason J and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2002.01048},
  year={2020}
}

Contributions

If you have any questions, comments, or bug reports, feel free to open a GitHub issue or a pull request, or email the author, Hao Tang ([email protected]).

Collaborations

I'm always interested in meeting new people and hearing about potential collaborations. If you'd like to work together or get in contact with me, please email [email protected]. Some of our projects are listed here.


Take a few minutes to appreciate what you have and how far you've come.

Comments
  • How to apply paired image to image translation this model?

    Hi, I'm so glad to meet your Paper and Code! But I have a question. I want to image to image translation this model like pix2pixHD, But I tested the model I trained and the result was input_label only returns the semantic mask and synthesized_image does not return anything. How Can I apply this model in Image to Image using paired dataset?

    Thanks again!!

    opened by chokyungjin 28
  • Missing key(s) and Unexpected key(s) in state_dict: "cab.conv1.weight"

    Thanks for your excellent work, but testing your dataset with pretrained models,error occurs like this,

    RuntimeError: Error(s) in loading state_dict for SPADEGenerator: Missing key(s) in state_dict: "channelAtt.conv1.weight", "channelAtt.conv1.bias", "channelAtt.conv2.weight", "channelAtt.conv2.bias". Unexpected key(s) in state_dict: "cab.conv1.weight", "cab.conv1.bias", "cab.conv2.weight", "cab.conv2.bias".

    maybe the model you upload doesn't fit the net structure? The versions torch--1.0.0 and torchvision--0.2.1 are the same as you.

    thx!

    opened by Kravrolens 5
  • Missing key(s) in state_dict

    Hi, impressive work here. I would like to have an experiment on data augmentation with your code. However, when I followed instructions on your front page until 'sh test_ade.sh', it showed me: '' RuntimeError: Error(s) in loading state_dict for SPADEGenerator: Missing key(s) in state_dict: "channelAtt.conv1.weight", "channelAtt.conv1.bias", "channelAtt.conv2.weight", "channelAtt.conv2.bias". Unexpected key(s) in state_dict: "cab.conv1.weight", "cab.conv1.bias", "cab.conv2.weight", "cab.conv2.bias".

    Do you have any idea about this issue?

    opened by WenyuZhu 2
  • can't download dataset and pre-trained model

    hello,I want to download these datasets and pre-trained models,however it shows that no such file or directory and I can't download successfully. So does the data still exist on this server? thanks

    opened by KevinLight831 2
  • download scripts missed

    hello, following readme commands like sh scripts/download_dagan_model.sh [dataset], i cannot find the directory scripts (nor the scripts files). am i missing something or the readme is outdated?

    opened by eps696 2
  • About the paper

    Thank you so much for your great job. I posted this question in one another page of your codes. sorry for that. I was just wondering when we are going to do an ablation study it is enough to put the loss term of the ablated part to zero or we have to change the structure of our code and disentangle all the parameters of the part from the rest of the network?

    opened by Mathilda88 1
  • no_instance param is not working

    Hi, I'm so glad to meet your Paper and Code! But I have a question. I tried to train with no_instance parameter, But the code has returned this error. But I don't know why this is happening, Because in Pix2pixHD, which is similar to the code you configured on the custom dataloader, the no_instance parameter is working.

    File "/home/user/.local/lib/python3.6/site-packages/torch/nn/init.py", line 282, in _calculate_fan_in_and_fan_out receptive_field_size = tensor[0][0].numel() IndexError: index 0 is out of bounds for dimension 0 with size 0

    One more question, I want to learn models using grayscale images, But the option does not have an input_nc option, only have output_nc.

    Thanks again!!

    opened by chokyungjin 1
  • Test model trained with USE_VAE switch?

    Hello! I've trained model using --use_vae switch, but cant find in the source code/docs how to test model using style image. It looks like there is no testing capabilities in the source code, am I wrong? How I can use segment map + style image to perform test using netE?

    Closest I found is function "guide_test(self):" in the tf implementation of spade: line 575 of https://github.com/taki0112/SPADE-Tensorflow/blob/4517824ea3e9428d5ab5413847ed2af9891b5830/SPADE.py

    opened by Kitty-sunray 0
  • DAGAN v.2 or successor?

    Hello! Awesome work! There is DAGAN v2.0 folder created more than 15 month ago. Is there a paper or any expectation of it to be released? Or what is successor to DAGAN?

    opened by Kitty-sunray 0
  • Docker image for inference

    This pull request includes a Dockerfile that packages DAGAN in a reproducible Docker image. I've pushed the image to the Replicate Docker registry and included a link in the README.

    I noticed that the pre-trained models failed to load due to the name of the channel attention layer having been renamed from cab, so I had to rename it back to cab.

    We are working to make Replicate a registry of machine learning models that can be easily reproduced. With Docker the models can be run "forever", without having to worry about missing dependencies. The website design is still a work in progress, so it might look a little rough around the edges.

    opened by andreasjansson 0
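
Several of the reports above describe the same mismatch: the pretrained checkpoints store the channel attention weights under cab.*, while the current generator expects channelAtt.*. One possible workaround, not confirmed by the authors, is to rename the checkpoint keys before loading them. The sketch below works on any dictionary of parameters; the helper name rename_state_dict_prefix is hypothetical.

```python
def rename_state_dict_prefix(state_dict, old_prefix="cab.", new_prefix="channelAtt."):
    """Return a copy of state_dict with keys starting with old_prefix renamed.

    Keys that do not start with old_prefix are copied unchanged.
    """
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed


if __name__ == "__main__":
    # With PyTorch this would be used roughly as follows (the checkpoint path is a placeholder):
    #   state = torch.load("path/to/checkpoint.pth", map_location="cpu")
    #   generator.load_state_dict(rename_state_dict_prefix(state))
    example = {"cab.conv1.weight": 1, "fc.weight": 2}
    print(rename_state_dict_prefix(example))
```

The alternative described in the Docker pull request is to rename the layer in the code back to cab, which leaves the checkpoints untouched.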