StyleSwin: Transformer-based GAN for High-resolution Image Generation

Overview

[Teaser figure]

This repo is the official implementation of "StyleSwin: Transformer-based GAN for High-resolution Image Generation".

By Bowen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang and Baining Guo.

Code and pretrained models will be released soon. Please stay tuned.

Abstract

Despite their tantalizing success in a broad range of vision tasks, transformers have not yet demonstrated on-par ability with ConvNets in high-resolution image generative modeling. In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. To this end, we believe that local attention is crucial to strike a balance between computational efficiency and modeling capacity. Hence, the proposed generator adopts the Swin transformer in a style-based architecture. To achieve a larger receptive field, we propose double attention, which simultaneously leverages the context of the local and the shifted windows, leading to improved generation quality. Moreover, we show that offering the knowledge of the absolute position that has been lost in window-based transformers greatly benefits the generation quality. The proposed StyleSwin is scalable to high resolutions, with both the coarse geometry and fine structures benefiting from the strong expressivity of transformers. However, blocking artifacts occur during high-resolution synthesis because performing the local attention in a block-wise manner may break the spatial coherency. To solve this, we empirically investigate various solutions, among which we find that employing a wavelet discriminator to examine the spectral discrepancy effectively suppresses the artifacts. Extensive experiments show the superiority over prior transformer-based GANs, especially on high resolutions, e.g., 1024x1024. StyleSwin, without complex training strategies, excels over StyleGAN on CelebA-HQ 1024x1024, and achieves on-par performance on FFHQ 1024x1024, proving the promise of using transformers for high-resolution image generation.
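
To make the double attention idea concrete, here is a minimal sketch (an illustration, not the released implementation): single-head attention without qkv projections, relative position bias, or the attention masks that a full shifted-window design uses. Half the channels attend within regular non-overlapping windows and half within half-window-shifted windows:

    import math
    import torch

    def window_attention(x, window):
        # x: (B, H, W, C) with H and W divisible by `window`
        B, H, W, C = x.shape
        # partition into (B * num_windows, window*window, C) token groups
        x = x.view(B, H // window, window, W // window, window, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, C)
        q = k = v = x  # single head, no projections, for brevity
        attn = (q @ k.transpose(-2, -1)) / math.sqrt(C)
        out = attn.softmax(dim=-1) @ v
        # reverse the window partition
        out = out.view(B, H // window, W // window, window, window, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

    def double_attention(x, window=8):
        # half the channels see regular windows, half see shifted windows,
        # so each output location aggregates a larger effective context
        x1, x2 = x.chunk(2, dim=-1)
        y1 = window_attention(x1, window)
        shift = window // 2
        x2 = torch.roll(x2, shifts=(-shift, -shift), dims=(1, 2))
        y2 = window_attention(x2, window)
        y2 = torch.roll(y2, shifts=(shift, shift), dims=(1, 2))
        return torch.cat([y1, y2], dim=-1)

    x = torch.randn(1, 32, 32, 64)
    print(double_attention(x).shape)  # torch.Size([1, 32, 32, 64])

Note that the cyclic shift lets tokens from opposite image borders share a window; the paper's full design would mask such pairs, which this sketch omits.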

Main Results

Quantitative Results

Dataset       Resolution   FID    Pretrained Model
FFHQ          256x256      2.81   -
LSUN Church   256x256      3.10   -
CelebA-HQ     256x256      3.25   -
FFHQ          1024x1024    5.07   -
CelebA-HQ     1024x1024    4.43   -

Qualitative Results

Image samples of FFHQ-1024 generated by StyleSwin:

Image samples of CelebA-HQ 1024 generated by StyleSwin:

Latent code interpolation examples of FFHQ-1024 between the left-most and the right-most images:

Citing StyleSwin

@misc{zhang2021styleswin,
      title={StyleSwin: Transformer-based GAN for High-resolution Image Generation}, 
      author={Bowen Zhang and Shuyang Gu and Bo Zhang and Jianmin Bao and Dong Chen and Fang Wen and Yong Wang and Baining Guo},
      year={2021},
      eprint={2112.10762},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Maintenance

This is the codebase for our research work. Please open a GitHub issue if you need help. If you have any questions regarding the technical details, feel free to contact [email protected] or [email protected].

License

The code and the pretrained models in this repository are released under the MIT license, as specified by the LICENSE file.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • Betas

    Hello, I forgot to ask in the previous issue, but what is the intuition behind beta1=0.0 and beta2=0.99? I've seen these values in a couple of other projects (such as CIPS), and I always wondered how they were chosen, since GANs usually use beta1=0.5 and beta2=0.999. Is there some property of these values that helps training, or are they simply the betas that worked best empirically?
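
    For reference, a minimal sketch of the optimizer setup under discussion; the two Linear modules are placeholders for the actual generator and discriminator, and the stated intuition is a common assumption rather than the authors' confirmed reasoning:

        import torch
        from torch import nn

        # Placeholders standing in for the actual G and D.
        generator, discriminator = nn.Linear(8, 8), nn.Linear(8, 1)

        # beta1 = 0.0 disables momentum, so updates track the rapidly shifting
        # adversarial objective; beta2 = 0.99 slightly shortens the variance
        # memory relative to the 0.999 default. StyleGAN-family codebases use
        # the same pair, which is likely where these values were inherited from.
        g_optim = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.0, 0.99))
        d_optim = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.0, 0.99))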

    opened by TheGullahanMaster 22
  • Query: How to save the generated samples

    Thank you, authors, for sharing this interesting work in deep generative modeling. I have a small question regarding the command to generate samples.

    python -m torch.distributed.launch --nproc_per_node=1 train_styleswin.py --sample_path /path_to_save_generated_samples --size 256 --G_channel_multiplier 2 --ckpt /path/to/checkpoint --eval --val_num_batches 12500 --val_batch_size 4 --eval_gt_path /path_to_real_images_50k
    

    Here, I wanted to know: what should path_to_save_generated_samples be?

    I am getting the following error message:

    Traceback (most recent call last):
      File "train_styleswin.py", line 382, in <module>
        os.mkdir(args.sample_path)
    FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/Repositories/StyleSwin/StyleSwin_generated_samples/samples'
    

    I made a directory in the cloned repository called StyleSwin_generated_samples for saving the samples.
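
    For what it's worth, the traceback shows os.mkdir failing because a parent directory in the path does not exist (os.mkdir creates only the last path component). A hedged workaround, with a hypothetical path, is to pre-create the full chain before launching:

        import os

        # os.mkdir(args.sample_path) fails when any parent directory is missing;
        # os.makedirs creates the whole chain, and exist_ok makes it idempotent.
        sample_path = "/content/StyleSwin_generated_samples/samples"  # hypothetical
        os.makedirs(sample_path, exist_ok=True)

    Alternatively, point --sample_path at an already existing directory; the flag is simply where the generated images are written.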

    opened by KomputerMaster64 5
  • nvcc fatal: Unknown option '-generate-dependencies-with-compile'

    Hi, I was referring to your GitHub repo and trying to run the StyleSwin code, but I am encountering the following problem: nvcc fatal : Unknown option '-generate-dependencies-with-compile'. I am not sure what the cause is.

    opened by navuboy 5
  • Train and Validation Split

    What train and validation split was used? I'm using the provided checkpoint and testing with a validation set of the first 10k images, similar to Co-Mod-GAN's split (Section 5.1). Using this split I am getting an FID of 4.26.

    opened by stevenwalton 4
  • Training log of losses

    Hi, thank you for your awesome research! I am training the model on my own dataset, and I would like to see your training loss logs if possible. The discriminator loss seems to converge very fast (close to 0); is that expected?

    Best regards, Hankyu Jang

    opened by hanq0212 4
  • FID Curve

    Great work! However, I used 4 x 3090 GPUs to train StyleSwin on the FFHQ-256 dataset and evaluated FID on the same dataset. After 4 days I obtained the FID curve below, covering 0-500k steps.

    [FID curve image]

    Is this normal? It looks unlikely that the FID will drop below 10 even after 1000k steps. Could you please share your FID curve on this dataset?

    opened by tau-yihouxiang 3
  • Why do you replace noise with SPE?

    Compared with StyleGAN2, I notice that you replace the noise inputs with SPE in the same place. What are the differences between SPE and noise? Can SPE achieve the same effect as noise? It seems that SPE is a fixed vector?

    Thanks.
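
    For illustration, a minimal sketch of a fixed 2D sine-cosine positional encoding of the kind SPE refers to (the exact form here is an assumption; the released details may differ). Unlike StyleGAN2's per-pixel noise, which is resampled per image and injects stochastic texture detail, this tensor is deterministic and encodes absolute position:

        import math
        import torch

        def sinusoidal_pe_2d(h, w, dim):
            # fixed 2D sine-cosine positional encoding, shape (h, w, dim)
            assert dim % 4 == 0
            d = dim // 4  # half the channels encode rows, half encode columns
            freq = torch.exp(torch.arange(d) * (-math.log(10000.0) / d))
            y = torch.arange(h).unsqueeze(1) * freq  # (h, d)
            x = torch.arange(w).unsqueeze(1) * freq  # (w, d)
            pe = torch.zeros(h, w, dim)
            pe[..., 0 * d:1 * d] = torch.sin(y).unsqueeze(1).expand(h, w, d)
            pe[..., 1 * d:2 * d] = torch.cos(y).unsqueeze(1).expand(h, w, d)
            pe[..., 2 * d:3 * d] = torch.sin(x).unsqueeze(0).expand(h, w, d)
            pe[..., 3 * d:4 * d] = torch.cos(x).unsqueeze(0).expand(h, w, d)
            return pe  # the same tensor for every sample, i.e., "a fixed vector"

        print(sinusoidal_pe_2d(8, 8, 32).shape)  # torch.Size([8, 8, 32])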

    opened by zhouwy19 3
  • Question: training strategy for resolution progression

    Congratulations on this great work. I would like to ask whether your training strategy is similar to StyleGAN's progressive resolution growing (e.g., 64x64, then 128x128)? Thanks!

    opened by JcccKing 2
  • How to finetune?

    Sorry if this is a silly question, but I wanted to ask if you could provide an example of how to fine-tune one of your existing models on a new dataset.

    An example from StyleGAN3:

        # Fine-tune StyleGAN3-R for MetFaces-U using 1 GPU, starting from the pre-trained FFHQ-U pickle.
        python train.py --outdir=~/training-runs --cfg=stylegan3-r --data=~/datasets/metfacesu-1024x1024.zip \
            --gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=5 \
            --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-1024x1024.pkl

    opened by GinIsTheAnswer 2
  • pre_training and fine-tune

    Hi, thank you so much for sharing your code! I have a question. I'd like to transfer the model to medical images. Should I retrain it from scratch, or should I load the pretrained model and fine-tune it? What do you think? Looking forward to your reply!
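
    For reference, a hedged sketch of the usual transfer recipe (not an official procedure from this repo): build the model, load the pretrained weights, and continue training at a reduced learning rate on the new dataset. The Generator class here is a stand-in, and the checkpoint is simulated so the sketch runs standalone:

        import torch
        from torch import nn

        class Generator(nn.Module):  # placeholder for the actual StyleSwin generator
            def __init__(self):
                super().__init__()
                self.net = nn.Linear(512, 512)

            def forward(self, z):
                return self.net(z)

        generator = Generator()
        # In practice you would load one of the released checkpoints, e.g.:
        #   ckpt = torch.load("FFHQ_256.pt", map_location="cpu")  # hypothetical name
        # Simulated here so the sketch is self-contained:
        ckpt = {"g": generator.state_dict()}
        generator.load_state_dict(ckpt["g"])  # checkpoints in this thread store G under "g"
        # fine-tune with a lower learning rate than training from scratch
        optim = torch.optim.Adam(generator.parameters(), lr=2e-5, betas=(0.0, 0.99))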

    opened by Joker-ZXR 2
  • About Automatic Mixed Precision

    Thank you for the awesome research and code release! Is there any reason you don't use PyTorch's automatic mixed precision package? Did it lower the model's performance when you used it?
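
    For context, a minimal sketch of the PyTorch AMP recipe in question (torch.cuda.amp); the model is a stand-in, and the suggestion that fp16 fragility is the reason AMP was avoided is an assumption, not the authors' stated rationale:

        import torch
        from torch import nn

        use_amp = torch.cuda.is_available()  # AMP only autocasts on GPU
        device = "cuda" if use_amp else "cpu"
        model = nn.Linear(8, 1).to(device)
        opt = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.0, 0.99))
        scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

        x, y = torch.randn(16, 8, device=device), torch.randn(16, 1, device=device)
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = nn.functional.mse_loss(model(x), y)
        # GradScaler rescales the loss to avoid fp16 gradient underflow; GAN
        # losses are often considered numerically fragile under fp16, which may
        # be why many GAN codebases stay in fp32.
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()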

    opened by hanq0212 2
  • Error using ckpt when resuming

    Thanks for sharing, I am having this error:

    Traceback (most recent call last):
      File "train_styleswin.py", line 409, in <module>
        generator.load_state_dict(ckpt["g"])
      File "/mnt/anaconda3/envs/StyleSwin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for Generator:
        Unexpected key(s) in state_dict: "layers.4.blocks.0.attn_mask2", "layers.4.blocks.0.norm1.style.weight", "layers.4.blocks.0.norm1.style.bias", "layers.4.blocks.0.qkv.weight", "layers.4.blocks.0.qkv.bias", "layers.4.blocks.0.proj.weight", "layers.4 and so on

    this is the command I am running:

    python -m torch.distributed.launch --nproc_per_node=2 train_styleswin.py --batch 4 --path /mnt/DATASETS/FFHQ --checkpoint_path /mnt/PROCESSEDdata/StyleSwin/Train --sample_path /mnt/PROCESSEDdata/StyleSwin/Train --size 32 --G_channel_multiplier 2 --bcr --D_lr 0.0002 --D_sn --ttur --eval_gt_path /mnt/DATASETS/FFHQ --lr_decay --lr_decay_start_steps 775000 --iter 1000000 --ckpt /mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt --use_checkpoint
    

    I tried with and without the use_checkpoint flag, and also with the 256 version, and got the same error back.

    Best
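
    A hedged guess: the unexpected "layers.4.*" keys suggest the checkpoint holds more resolution stages than a generator built with --size 32, so the two state dicts cannot line up. A sketch for inspecting which stages the checkpoint actually contains (path taken from the command above):

        import torch

        ckpt = torch.load("/mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt", map_location="cpu")
        # list the top-level submodules saved for the generator
        print(sorted({".".join(k.split(".")[:2]) for k in ckpt["g"]}))
        # If stages appear here that the --size 32 model lacks, rebuild the
        # generator at the checkpoint's resolution rather than loading anyway.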

    opened by alexKup88 1
  • Bump tensorflow from 1.15.0 to 2.9.3

    Bumps tensorflow from 1.15.0 to 2.9.3.

    This bump picks up several upstream vulnerability fixes.

    dependencies 
    opened by dependabot[bot] 0