StyleSwin: Transformer-based GAN for High-resolution Image Generation

Microsoft

Last update: Dec 28, 2022

Related tags

Deep Learning deep-neural-networks computer-vision deep-learning pytorch transformer generative-adversarial-network image-generation gans image-synthesis styleswin

Overview

StyleSwin

This repo is the official implementation of "StyleSwin: Transformer-based GAN for High-resolution Image Generation".

By Bowen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang and Baining Guo.

Code and pretrained models will be released soon. Please stay tuned.

Abstract

Despite the tantalizing success in a broad of vision tasks, transformers have not yet demonstrated on-par ability as ConvNets in high-resolution image generative modeling. In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity. Hence, the proposed generator adopts Swin transformer in a style-based architecture. To achieve a larger receptive field, we propose double attention which simultaneously leverages the context of the local and the shifted windows, leading to improved generation quality. Moreover, we show that offering the knowledge of the absolute position that has been lost in window-based transformers greatly benefits the generation quality. The proposed StyleSwin is scalable to high resolutions, with both the coarse geometry and fine structures benefit from the strong expressivity of transformers. However, blocking artifacts occur during high-resolution synthesis because performing the local attention in a block-wise manner may break the spatial coherency. To solve this, we empirically investigate various solutions, among which we find that employing a wavelet discriminator to examine the spectral discrepancy effectively suppresses the artifacts. Extensive experiments show the superiority over prior transformer-based GANs, especially on high resolutions, e.g., 1024x1024. The StyleSwin, without complex training strategies, excels over StyleGAN on CelebA-HQ 1024x1024, and achieves on-par performance on FFHQ 1024x1024, proving the promise of using transformers for high-resolution image generation.

Main Results

Quantitative Results

Dataset	Resolution	FID	Pretrained Model
FFHQ	256x256	2.81	-
LSUN Church	256x256	3.10	-
CelebA-HQ	256x256	3.25	-
FFHQ	1024x1024	5.07	-
CelebA-HQ	1024x1024	4.43	-

Qualitative Results

Image samples of FFHQ-1024 generated by StyleSwin:

Image samples of CelebA-HQ 1024 generated by StyleSwin:

Latent code interpolation examples of FFHQ-1024 between the left-most and the right-most images:

Citing StyleSwin

@misc{zhang2021styleswin,
      title={StyleSwin: Transformer-based GAN for High-resolution Image Generation}, 
      author={Bowen Zhang and Shuyang Gu and Bo Zhang and Jianmin Bao and Dong Chen and Fang Wen and Yong Wang and Baining Guo},
      year={2021},
      eprint={2112.10762},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Maintenance

This is the codebase for our research work. Please open a GitHub issue for any help. If you have any questions regarding the technical details, feel free to contact [email protected] or [email protected].

License

The codes and the pretrained model in this repository are under the MIT license as specified by the LICENSE file. We use our labeled dataset to train the scratch detection model.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments

Betas

Hello, i forgot to ask in the previous issue, but what is the intuition behind beta1=0.0 and beta2=0.99? I've seen it in a couple more projects (Such as CIPS), and i always wondered how did they come up with these values (As usually, GANs have beta1=0.5 and beta2=0.999). Is there some property of these values that helps training? Or is it just betas that seemed to work the most?

opened by TheGullahanMaster 22

Query: How to save the generated samples

Thank you authors for sharing the interesting work you have done in deep generative modeling. I have a rather small doubt, this is regarding the command to generate samples.

python -m torch.distributed.launch --nproc_per_node=1 train_styleswin.py --sample_path /path_to_save_generated_samples --size 256 --G_channel_multiplier 2 --ckpt /path/to/checkpoint --eval --val_num_batches 12500 --val_batch_size 4 --eval_gt_path /path_to_real_images_50k

Here, I wanted to know what should bepath_to_save_generated_samples?

I am getting the following error message:

Traceback (most recent call last):
  File "train_styleswin.py", line 382, in <module>
    os.mkdir(args.sample_path)
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/Repositories/StyleSwin/StyleSwin_generated_samples/samples'

I made a directory in the cloned repository called StyleSwin_generated_samples for saving the samples.

opened by KomputerMaster64 5

nvcc fatal: Unknown option '-generate-dependencies-with-compile'

Hi, I was referring to your github and trying to implement the StyleSwin repo. ecountering the following problem: nvcc fatal : Unknown option '-generate-dependencies-with-compile', not sure whast the problem.

opened by navuboy 5
Train and Validation Split

What was the train and validation split used? I'm using the checkpoint provided and testing with a validation set of the top 10k, similar to Co-Mod-GAN's split (section 5.1). Using this split I am getting a FID of 4.26.

opened by stevenwalton 4
Training log of losses

Hi, Thank you for your awesome research! I am training the model with my own dataset, i want to know the training logs of losses if it is possible. Discriminator loss seems to converge so fast (close to 0) , is it right?

Best regards, Hankyu Jang

opened by hanq0212 4
FID Curve

Great work! However, I use 4 x 3090 GPUs to train StyleSwin on the FFHQ-256 dataset, and evaluate FID on the same dataset. Then get the following FID curve from 0-500k after 4 days.

Is this normal? I might not see the probability of getting FID less than 10 after 1000k. Could you please show your FID curve in this dataset?

opened by tau-yihouxiang 3
Why do you replace noise with SPE?

Compared with Stylegan2, I notice that you you replace noise with SPE at the same place. What are the differences between SPE and noise? Can SPE achieve the effect of noise? Seems like SPE is a fixed vector?

Thanks.

opened by zhouwy19 3
question strategy in resolution progression

Congratulations on this great job. I would like to ask you if your training strategy is similar to the StyleGAN resolution progression (e.g. 64x64, then 128x128) Thanks!

opened by JcccKing 2
How to finetune?

Sorry if this is a silly question, but I wanted ask if you could provide and example of how to fine-tune one of your existing models to a new dataset?

An example from StyleGan3 # Fine-tune StyleGAN3-R for MetFaces-U using 1 GPU, starting from the pre-trained FFHQ-U pickle. python train.py --outdir=~/training-runs --cfg=stylegan3-r --data=~/datasets/metfacesu-1024x1024.zip \ --gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=5 \ --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-1024x1024.pkl

opened by GinIsTheAnswer 2
pre_training and fine-tune

Hi, thank you so much for sharing your code！ I have a question. I'd like to transfer the model to medical images. But I don't I retrain directly or I need load the pre-training model and fine-tune it? What do you think about it? Looking forward to your reply!

opened by Joker-ZXR 2
About Automatic Mixed Precision

Thank you for awesome research and code release! Is there any reason that you don't use automatic mixed precision package of pytorch? Did it lower the performance of model when you use it?

opened by hanq0212 2
Error using ckpt when resuming
Thanks for sharing, I am having this error:

Traceback (most recent call last): File "train_styleswin.py", line 409, in generator.load_state_dict(ckpt["g"]) File "/mnt/anaconda3/envs/StyleSwin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict self.class.name, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for Generator: Unexpected key(s) in state_dict: "layers.4.blocks.0.attn_mask2", "layers.4.blocks.0.norm1.style.weight", "layers.4.blocks.0.norm1.style.bias", "layers.4.blocks.0.qkv.weight", "layers.4.blocks.0.qkv.bias", "layers.4.blocks.0.proj.weight", "layers.4 and so on

this is the command I am running:

python -m torch.distributed.launch --nproc_per_node=2 train_styleswin.py --batch 4 --path /mnt/DATASETS/FFHQ --checkpoint_path /mnt/PROCESSEDdata/StyleSwin/Train --sample_path /mnt/PROCESSEDdata/StyleSwin/Train --size 32 --G_channel_multiplier 2 --bcr --D_lr 0.0002 --D_sn --ttur --eval_gt_path /mnt/DATASETS/FFHQ --lr_decay --lr_decay_start_steps 775000 --iter 1000000 --ckpt /mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt --use_checkpoint

I tried with and without the use_checkpoint flag and also the 256 version giving back the same error.

Best
opened by alexKup88 1
Bump tensorflow from 1.15.0 to 2.9.3
Bumps tensorflow from 1.15.0 to 2.9.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.9.3

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes a overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool(CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes a overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes a invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

TensorFlow 2.9.2

Release 2.9.2

This releases introduces several vulnerability fixes:

Fixes a CHECK failure in tf.reshape caused by overflows (CVE-2022-35934)

Fixes a CHECK failure in SobolSample caused by missing validation (CVE-2022-35935)

Fixes an OOB read in Gather_nd op in TF Lite (CVE-2022-35937)

Fixes a CHECK failure in TensorListReserve caused by missing validation (CVE-2022-35960)

Fixes an OOB write in Scatter_nd op in TF Lite (CVE-2022-35939)

Fixes an integer overflow in RaggedRangeOp (CVE-2022-35940)

Fixes a CHECK failure in AvgPoolOp (CVE-2022-35941)

Fixes a CHECK failures in UnbatchGradOp (CVE-2022-35952)

Fixes a segfault TFLite converter on per-channel quantized transposed convolutions (CVE-2022-36027)

Fixes a CHECK failures in AvgPool3DGrad (CVE-2022-35959)

Fixes a CHECK failures in FractionalAvgPoolGrad (CVE-2022-35963)

Fixes a segfault in BlockLSTMGradV2 (CVE-2022-35964)

Fixes a segfault in LowerBound and UpperBound (CVE-2022-35965)

... (truncated)

Changelog

Sourced from tensorflow's changelog.

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes a overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool(CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes a overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes a invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

Release 2.8.4

This release introduces several vulnerability fixes:

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes a overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

... (truncated)

Commits

a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2

258f9a1 Update py_func.cc

cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474

3e75385 Update version numbers to 2.9.3

bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695

3506c90 Update RELEASE.md

8dcb48e Update RELEASE.md

4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...

6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple

5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0

StyleSwin: Transformer-based GAN for High-resolution Image Generation

Related tags

Overview

StyleSwin

Abstract

Main Results

Quantitative Results

Qualitative Results

Citing StyleSwin

Maintenance

License

Comments

TensorFlow 2.9.3

Release 2.9.3

TensorFlow 2.9.2

Release 2.9.2

Release 2.9.3

Release 2.8.4

Owner

Microsoft

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

FuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space OptimizationFuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space Optimization

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

A Fast and Stable GAN for Small and High Resolution Imagesets - pytorch

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

This is an official implementation of the High-Resolution Transformer for Dense Prediction.

Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

Implementation for HFGI: High-Fidelity GAN Inversion for Image Attribute Editing

DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time

Text to Image Generation with Semantic-Spatial Aware GAN

Pytorch implementation of CVPR2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation"

Official pytorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation(ICCV 2021)

Image-generation-baseline - MUGE Text To Image Generation Baseline

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks