How to Train a GAN? Tips and tricks to make GANs work

Overview

(this list is no longer maintained, and I am not sure how relevant it is in 2020)

While research in Generative Adversarial Networks (GANs) continues to improve the fundamental stability of these models, we use a bunch of tricks to train them and make them stable day to day.

Here is a summary of some of these tricks.

The authors of this document are listed at the end.

If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document. If we find it to be reasonable and verified, we will merge it in.

1. Normalize the inputs

  • normalize the images between -1 and 1
  • Tanh as the last layer of the generator output
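
For example, a minimal PyTorch sketch (the torchvision transform values and the layer shapes are illustrative assumptions, not part of the original tips):

    import torch.nn as nn
    from torchvision import transforms

    # Scale images from [0, 1] to [-1, 1] so they match the range of tanh.
    preprocess = transforms.Compose([
        transforms.ToTensor(),                         # [0, 255] -> [0.0, 1.0]
        transforms.Normalize((0.5,) * 3, (0.5,) * 3),  # [0.0, 1.0] -> [-1.0, 1.0]
    ])

    # Last layers of the generator: tanh keeps outputs in [-1, 1].
    generator_head = nn.Sequential(
        nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1),
        nn.Tanh(),
    )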

2: A modified loss function

In GAN papers, the loss function to optimize G is min log(1 - D), but in practice people use max log D

  • because the first formulation has vanishing gradients early on
  • Goodfellow et al. (2014)

In practice, this works well:

  • Flip labels when training the generator: real = fake, fake = real
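
A minimal sketch of this flipped-label (non-saturating) generator update in PyTorch, assuming a discriminator D that outputs raw logits; the names are illustrative:

    import torch
    import torch.nn as nn

    criterion = nn.BCEWithLogitsLoss()

    def generator_loss(D, fake_images):
        # Instead of minimizing log(1 - D(G(z))), label the fakes as "real"
        # and maximize log D(G(z)), which gives stronger gradients early on.
        logits = D(fake_images)
        return criterion(logits, torch.ones_like(logits))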

3: Use a spherical Z

  • Don't sample z from a uniform distribution (a cube)
  • Sample z from a Gaussian distribution (a sphere)
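
In PyTorch this just means using randn rather than rand for the latent code (a sketch; the dimensions are illustrative):

    import torch

    batch_size, nz = 64, 100
    # z = torch.rand(batch_size, nz)    # uniform cube: avoid
    z = torch.randn(batch_size, nz)     # Gaussian sphere: preferred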

4: BatchNorm

  • Construct different mini-batches for real and fake images, i.e. each mini-batch should contain either all real images or all generated images.
  • When batchnorm is not an option, use instance normalization (for each sample, subtract the mean and divide by the standard deviation).

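A sketch of what separate all-real and all-fake mini-batches look like in the discriminator update (assumes D returns logits and criterion = nn.BCEWithLogitsLoss(); the names are illustrative):

    # One forward/backward on a mini-batch of only real images...
    logits_real = D(real_images)
    d_loss_real = criterion(logits_real, torch.ones_like(logits_real))

    # ...and one on a mini-batch of only generated images.
    logits_fake = D(fake_images.detach())
    d_loss_fake = criterion(logits_fake, torch.zeros_like(logits_fake))

    (d_loss_real + d_loss_fake).backward()

    # If batchnorm is not an option, nn.InstanceNorm2d can replace nn.BatchNorm2d in D.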

5: Avoid Sparse Gradients: ReLU, MaxPool

  • the stability of the GAN game suffers if you have sparse gradients
  • LeakyReLU = good (in both G and D)
  • For Downsampling, use: Average Pooling, Conv2d + stride
  • For Upsampling, use: PixelShuffle, ConvTranspose2d + stride
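
A sketch of blocks built along these lines (LeakyReLU everywhere, strided convs instead of max-pooling, PixelShuffle for upsampling); channel sizes are illustrative:

    import torch.nn as nn

    # Downsampling block for D: strided conv + LeakyReLU (no MaxPool, no ReLU).
    d_block = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

    # Upsampling block for G: PixelShuffle (or ConvTranspose2d with stride).
    g_block = nn.Sequential(
        nn.Conv2d(128, 64 * 4, kernel_size=3, padding=1),
        nn.PixelShuffle(upscale_factor=2),   # (N, 256, H, W) -> (N, 64, 2H, 2W)
        nn.LeakyReLU(0.2, inplace=True),
    )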

6: Use Soft and Noisy Labels

  • Label Smoothing, i.e. if you have two target labels, Real=1 and Fake=0, then for each incoming sample: if it is real, replace the label with a random number between 0.7 and 1.2; if it is fake, replace it with a random number between 0.0 and 0.3 (for example).
    • Salimans et al. 2016
  • make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator
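
A sketch of soft and occasionally flipped labels for the discriminator; the ranges follow the text above and the flip probability is an illustrative assumption:

    import torch

    def soft_noisy_labels(batch_size, real, flip_prob=0.05):
        # Label smoothing: real targets in [0.7, 1.2], fake targets in [0.0, 0.3].
        if real:
            labels = torch.empty(batch_size).uniform_(0.7, 1.2)
        else:
            labels = torch.empty(batch_size).uniform_(0.0, 0.3)
        # Noisy labels: occasionally flip a few of them for the discriminator.
        flip = torch.rand(batch_size) < flip_prob
        labels[flip] = (1.0 - labels[flip]).clamp(0.0, 1.0)
        return labels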

7: DCGAN / Hybrid Models

  • Use DCGAN when you can. It works!
  • if you can't use DCGANs and no model is stable, use a hybrid model: KL + GAN or VAE + GAN

8: Use stability tricks from RL

  • Experience Replay
    • Keep a replay buffer of past generations and occasionally show them
    • Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
  • All stability tricks that work for deep deterministic policy gradients
  • See Pfau & Vinyals (2016)
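
A sketch of a simple replay buffer that occasionally shows the discriminator old generations; the buffer size and replay probability are illustrative assumptions:

    import random
    import torch

    class ReplayBuffer:
        """Keep a pool of past generated images and occasionally reuse them."""
        def __init__(self, max_size=500):
            self.max_size = max_size
            self.buffer = []

        def push_and_sample(self, fake_images, replay_prob=0.5):
            out = []
            for img in fake_images.detach():
                if len(self.buffer) < self.max_size:
                    self.buffer.append(img)
                    out.append(img)
                elif random.random() < replay_prob:
                    idx = random.randrange(self.max_size)
                    out.append(self.buffer[idx])   # show an old generation to D
                    self.buffer[idx] = img         # and store the new one
                else:
                    out.append(img)
            return torch.stack(out)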

9: Use the ADAM Optimizer

  • optim.Adam rules!
    • See Radford et al. 2015
  • Use SGD for discriminator and ADAM for generator
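
A sketch of the usual setup; lr=2e-4 and betas=(0.5, 0.999) follow the DCGAN paper (Radford et al. 2015), and the commented line is the SGD-for-D variant mentioned above (G and D are your models):

    import torch.optim as optim

    opt_G = optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_D = optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    # opt_D = optim.SGD(D.parameters(), lr=2e-4)   # alternative: SGD for D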

10: Track failures early

  • D loss goes to 0: failure mode
  • check norms of gradients: if they are over 100 things are screwing up
  • when things are working, D loss has low variance and goes down over time vs having huge variance and spiking
  • if the loss of the generator steadily decreases, then it's fooling D with garbage (says Martin)
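
For example, a quick gradient-norm check after backward(); the threshold of 100 is just the rule of thumb above, and D and G are your models:

    def grad_norm(model):
        total = 0.0
        for p in model.parameters():
            if p.grad is not None:
                total += p.grad.detach().norm(2).item() ** 2
        return total ** 0.5

    if grad_norm(D) > 100 or grad_norm(G) > 100:
        print("warning: gradient norms are blowing up; training is likely failing")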

11: Don't balance loss via statistics (unless you have a good reason to)

  • Don't try to find a (number of G iterations / number of D iterations) schedule to uncollapse training
  • It's hard and we've all tried it.
  • If you do try it, have a principled approach to it, rather than intuition

For example (pseudocode):

while lossD > A:
  train D
while lossG > B:
  train G

12: If you have labels, use them

  • if you have labels available, train the discriminator to also classify the samples: auxiliary GANs
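
A sketch of an auxiliary-classifier discriminator head that predicts both real/fake and the class label (layer sizes and names are illustrative):

    import torch.nn as nn

    class AuxDiscriminatorHead(nn.Module):
        def __init__(self, feature_dim, num_classes):
            super().__init__()
            self.adv = nn.Linear(feature_dim, 1)            # real / fake logit
            self.cls = nn.Linear(feature_dim, num_classes)  # class logits

        def forward(self, features):
            return self.adv(features), self.cls(features)

    # d_loss = adv_criterion(adv_logit, real_fake_target) + cls_criterion(cls_logits, class_labels)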

13: Add noise to inputs, decay over time
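
For example, instance noise on the discriminator inputs with a decaying standard deviation (a sketch; the linear schedule and max_sigma are illustrative assumptions):

    import torch

    def add_instance_noise(images, step, total_steps, max_sigma=0.1):
        # Decay the noise level linearly from max_sigma to 0 over training.
        sigma = max_sigma * max(0.0, 1.0 - step / total_steps)
        return images + sigma * torch.randn_like(images)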

14: [notsure] Train discriminator more (sometimes)

  • especially when you have noise
  • hard to find a schedule of number of D iterations vs G iterations

15: [notsure] Batch Discrimination

  • Mixed results

16: Discrete variables in Conditional GANs

  • Use an Embedding layer
  • Add as additional channels to images
  • Keep embedding dimensionality low and upsample to match image channel size
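
A sketch of conditioning on a discrete label by embedding it and broadcasting the embedding as extra image channels (dimensions are illustrative):

    import torch
    import torch.nn as nn

    num_classes, embed_dim = 10, 16     # keep the embedding dimensionality low
    embedding = nn.Embedding(num_classes, embed_dim)

    def concat_label_channels(images, labels):
        # images: (N, C, H, W); labels: (N,) integer class ids
        n, _, h, w = images.shape
        e = embedding(labels)                                  # (N, embed_dim)
        e = e.view(n, embed_dim, 1, 1).expand(n, embed_dim, h, w)
        return torch.cat([images, e], dim=1)                   # extra channels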

17: Use Dropouts in G in both train and test phase
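
A sketch of dropout that stays active at test time by using the functional form with training=True (the 0.5 rate is an illustrative assumption):

    import torch.nn.functional as F

    def generator_block(x, layer):
        x = layer(x)
        # Keep dropout on even in eval mode so it acts as a noise source.
        return F.dropout(x, p=0.5, training=True)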

Authors

  • Soumith Chintala
  • Emily Denton
  • Martin Arjovsky
  • Michael Mathieu
Comments
  • Label smoothing should be one-sided?

    Regarding trick 6, label smoothing should be one-sided, real images only (Salimans et al. 2016). Their rationale makes sense. Did you find evidence to the contrary?

    opened by MustafaMustafa 15
  • Probable things to investigate when Generator falls behind the Discriminator

    Hi,

    I am unsure whether this is worth creating a new issue or not, so please feel free to let me know if it's not. Actually I am quite new to training GANs and hence was hoping someone with more experience can provide some guidance.

    My problem is that my Generator error keeps increasing steadily (not spiking suddenly, but gradually) while the Discriminator error keeps decreasing simultaneously. Below I provide the statistics:

    Generator error / Discriminator error
    0.75959807634354 / 0.59769108891487
    1.3820139408112 / 0.35363309383392
    1.9390360116959 / 0.2000379934907
    2.1018676519394 / 0.16694237589836
    2.5574874728918 / 0.10423161834478
    2.8098415493965 / 0.082516837120056
    3.2860078886151 / 0.046023709699512
    3.630749514699 / 0.028832530975342
    3.7707495708019 / 0.022863862104714
    3.8990840911865 / 0.020417057722807
    4.1248006802052 / 0.017872251570225
    4.259504699707 / 0.01507920846343
    4.2479295730591 / 0.013462643604726
    4.4426490783691 / 0.010646429285407
    4.6057481756434 / 0.0098107368685305
    4.6718273162842 / 0.0096474666148424
    4.8214926728979 / 0.0079655896406621
    4.7656826004386 / 0.0076067917048931
    4.8425741195679 / 0.0080536706373096
    4.9743659980595 / 0.0066521260887384
    

    When this is the case, what are some of the things I should investigate or look out for while trying to mitigate the problem?

    For example, I came across: "Make the discriminator much less expressive by using a smaller model. Generation is a much harder task and requires more parameters, so the generator should be significantly bigger." [1]

    If someone has similar pointers up their sleeves, it will be very helpful.

    Secondly, just out of curiosity, is there a reason that most of the implementations I have come across (DC-GAN, pix2pix, text-to-image) use the same lr for both the generator and discriminator?

    I mean, since the generator's job is much harder (generating something plausible from random noise), giving it a higher lr seems to make more sense. Or is it simply application-specific, and the same lr just works out for the above-mentioned works?

    Thanks in advance!

    opened by kmul00 10
  • Question: Any interest in putting together a premium version for sale on SugarKubes?

    Hi @soumith , congrats on the traction! Would you be interested in putting together a premium version of this repo to sell on SugarKubes? It's a code and container marketplace I'm putting together.

    opened by wrannaman 3
  • normal behavior of GAN

    I am wondering whether there are any rules to check whether a GAN converges or not if I cannot check the generated samples directly. Those tricks are very helpful, but when I look at the loss it is still hard for me to draw a conclusion.

    So in my case this is the result: Epoch 1:

    component              | loss | generation_loss | auxiliary_loss
    -----------------------------------------------------------------
    generator (train)      | 5.93 | 0.73            | 5.20
    generator (test)       | 2.48 | 1.50            | 0.98
    discriminator (train)  | 6.22 | 0.59            | 5.63
    discriminator (test)   | 4.13 | 0.32            | 3.82
    
    

    Epoch 2:

    component              | loss | generation_loss | auxiliary_loss
    -----------------------------------------------------------------
    generator (train)      | 3.06 | 0.88            | 2.18
    generator (test)       | 1.66 | 1.51            | 0.14
    discriminator (train)  | 4.03 | 0.44            | 3.58
    discriminator (test)   | 3.43 | 0.28            | 3.16
    

    Epoch 3:

    component              | loss | generation_loss | auxiliary_loss
    -----------------------------------------------------------------
    generator (train)      | 2.50 | 0.83            | 1.67
    generator (test)       | 1.76 | 1.70            | 0.06
    discriminator (train)  | 3.51 | 0.36            | 3.14
    discriminator (test)   | 3.20 | 0.20            | 2.99
    

    Epoch 4:

    component              | loss | generation_loss | auxiliary_loss
    -----------------------------------------------------------------
    generator (train)      | 2.27 | 0.73            | 1.54
    generator (test)       | 2.19 | 2.16            | 0.03
    discriminator (train)  | 3.27 | 0.29            | 2.98
    discriminator (test)   | 3.05 | 0.13            | 2.91
    

    Epoch 5:

    component              | loss | generation_loss | auxiliary_loss
    -----------------------------------------------------------------
    generator (train)      | 2.06 | 0.62            | 1.44
    generator (test)       | 2.68 | 2.66            | 0.02
    discriminator (train)  | 3.11 | 0.21            | 2.89
    discriminator (test)   | 2.96 | 0.09            | 2.87
    

    The losses of the generator and discriminator are both decreasing, but the loss on the CV set is increasing. I already inject noise into the discriminator and use dropout in the generator (but I think it is not applied for CV).

    opened by bobchennan 3
  • Discriminator accuracy: 0.5 on average, but collapsed into 0 for negative samples and 1 for positive samples

    I am monitoring the discriminator accuracy on separate batches for positive and negative samples, as suggested in trick 4, but the following situation often occurs:

    • Average accuracy = 0.5 (average loss 0.69)
    • Acc on negative samples = 0.0 (loss 0.69)
    • Acc on positive samples = 1.0 (loss 0.7)

    What could be the cause of this problem?

    Cheers, Daniele

    opened by danielegrattarola 2
  • Please explain trick 2

    https://github.com/soumith/ganhacks#2-a-modified-loss-function

    When training the generator, the standard way is to pass [batch_generated_imgs, batch_real_imgs] and np.ones(shape=batch_size * 2), where all images are labelled 1 (real) to trick the discriminator.

    If I understand this trick correctly, it is saying to pass [batch_generated_imgs, batch_real_imgs] and np.concatenate([np.ones(shape=batch_size), np.zeros(shape=batch_size)]), where the labels are now flipped for fake and real?

    opened by mjdietzx 2
  • I can't understand this trick

    Hello, Soumith. The trick: add Gaussian noise to every layer of D, not G. Today, I applied this trick to DCGAN.

    the D network

    def dis_net(data_array , weights , biases ,reuse=False):
        data_array = GaussionNoisy_layers(data_array)
    
        conv1 = conv2d(data_array , weights['wc1'] , biases['bc1'])
    
        conv1 = lrelu(conv1)
        conv1 = GaussionNoisy_layers(conv1 , sigma=0.3)
    
        conv2 = conv2d(conv1 , weights['wc2']  , biases['bc2'])
        conv2 = batch_norma(conv2 , scope="dis_bn1" , reuse=reuse)
        conv2 = lrelu(conv2)
        conv2 = GaussionNoisy_layers(conv2 , sigma=0.5)
    
        conv3 = conv2d(conv2 , weights['wc3'] , biases['bc3'])
        conv3 = batch_norma(conv3 , scope="dis_bn2" , reuse=reuse)
        conv3 = lrelu(conv3)
        conv3 = GaussionNoisy_layers(conv3 , sigma=0.5)
    
        conv4 = conv2d(conv3 , weights['wc4'] , biases['bc4'])
        conv4 = batch_norma(conv4 , scope="dis_bn3" , reuse=reuse)
        conv4 = lrelu(conv4)
        conv4 = GaussionNoisy_layers(conv4 , sigma=0.5)
    
        out = tf.reshape(conv4 , [-1 , weights['wd'].get_shape().as_list()[0]])
        out = fully_connect(out ,weights['wd'] , biases['bd'])
    
        return tf.nn.sigmoid(out) , out
    

    the Gaussian noise function

    def GaussionNoisy_layers(input_layer , sigma = 0.1):
    
        noisy = np.random.normal(0.0 , sigma , tf.to_int64(input_layer).get_shape())
        return noisy + input_layer
    

    But it can't converge. If I just add noise to one layer of D, the GAN can converge.

    Why? Thanks for your answer!

    opened by zhangqianhui 2
  • > @KrnTneja : Thanks for your tricks. Could you provide any code to do it? I also meet the problem of loss D goes to zero

    @KrnTneja : Thanks for your tricks. Could you provide any code to do it? I also meet the problem of loss D goes to zero

    There isn't really any code to show. Just ensure that the last layer of your discriminator is not a sigmoid layer, i.e. the output shouldn't be constrained to [0, 1]. I was using PyTorch, where I had to use torch.nn.BCEWithLogitsLoss instead of torch.nn.BCELoss.

    Originally posted by @KrnTneja in https://github.com/soumith/ganhacks/issues/36#issuecomment-493886559

    opened by shaurov2253 1
  • Generator tries to maximize, Discriminator tries to minimize

    Hi, I am training a CycleGAN, but the log (TensorBoard scalars) for both G and D is confusing to me. As written in the paper (and in all the GAN papers), G aims to minimize the objective against an adversary D that tries to maximize it.

    At the beginning of training, G tries to minimize, but after 60 epochs it starts to maximize; D tries to maximize at the beginning and then starts to minimize after epoch 60. I want to know the intuition behind this. Any idea?

    Shouldn't the model minimize G and maximize D all the time? Why, after some epochs, does the generator start to maximize and the discriminator minimize?

    Tensorboard scalar: https://imgur.com/rdTxrUP

    Thanks in advance.

    opened by Auth0rM0rgan 1
  • There is no source that has empirically demonstrated a positive effect of replay buffer on GAN training

    I understand that trick #8, "Use stability tricks from RL", is based on an actual paper. But other than the paper and this webpage, there is no source that has suggested experience replay for GANs. In the paper, it was only suggested that it is reasonable to try it by analogy, and the same goes for this page. I would like to see empirical evidence of the extent to which GAN training benefits from keeping a replay buffer of past generations and occasionally showing them.

    opened by AranKomat 1
  • G's error is always much smaller than D's

    I am training a generative adversarial network to perform style transfer between two different image domains (source and target). Since I have class information available, I have an extra Q network (besides G and D) that measures the classification results of the generated images for the target domain against their labels. From the convergence of the system, I have noticed that the error of D starts at 8 and slowly drops to 4.5, while the generator error starts at 1 and quickly drops to 0.2. Is that behaviour an example of mode collapse? What exactly is the relationship between the errors of D and G? The loss functions of D and G I am using can be found here: https://github.com/r0nn13/conditional-dcgan-keras, while the loss function of the Q network is categorical cross-entropy. The loss curves can be found here: https://imgur.com/a/bDrTcpm

    opened by kristosh 0
  • MAE increasing when training!

    Hi there. I am training a GAN for style transfer, based on UNet: input a Domain-A image and predict a Domain-B image. But I have run into a question, after checking my code and making sure there are no bugs in it. The loss is calculated between the input and the prediction, using L1Loss().

    But the loss is decreasing while the MAE (mean absolute error) is increasing, which means the prediction is getting worse. PSNR and SSIM are getting worse, too. I have no idea; is it mode collapse?

    Thanks a lot!

    opened by DISAPPEARED13 0
  • Tricks to sharpen GAN's synthesis

    The model is for image transfer, but the output (fake image) looks blurry and too smooth; it seems to have lost texture information. Are there any tricks to sharpen it?

    Best.

    opened by DISAPPEARED13 0
  • Does the pretrained discriminator need to be transferred when doing transfer learning for a GAN?

    I need to use transfer learning for my GAN. I wonder whether the pretrained discriminator needs to be transferred, or is it OK to only transfer the pretrained generator?

    opened by micklexqg 1
  • Reference for trick#16

    In the NIPS 2016 tutorial, while discussing trick #16, Soumith mentions the name of Michael Mathieu. I am looking for what he referred to (a paper, a tutorial, anything). Can anyone help?

    opened by shaurov2253 0