Collapse by Conditioning: Training Class-conditional GANs with Limited Data

Mohamad Shahbazi

Last update: Dec 6, 2022

Overview

Mohamad Shahbazi, Martin Danelljan, Danda P. Paudel, Luc Van Gool
Paper: https://openreview.net/forum?id=7TZeCsNOUB_

Abstract

Class-conditioning offers a direct means of controlling a Generative Adversarial Network (GAN) based on a discrete input variable. While necessary in many applications, the additional information provided by the class labels could even be expected to benefit the training of the GAN itself. Contrary to this belief, we observe that class-conditioning causes mode collapse in limited data settings, where unconditional learning leads to satisfactory generative ability. Motivated by this observation, we propose a training strategy for conditional GANs (cGANs) that effectively prevents the observed mode-collapse by leveraging unconditional learning. Our training strategy starts with an unconditional GAN and gradually injects conditional information into the generator and the objective function. The proposed method for training cGANs with limited data results not only in stable training but also in generating high-quality images, thanks to the early-stage exploitation of the shared information across classes. We analyze the aforementioned mode collapse problem in comprehensive experiments on four datasets. Our approach demonstrates outstanding results compared with state-of-the-art methods and established baselines.

Requirements

Linux and Windows are supported, but Linux is recommended for performance and compatibility reasons.
For a batch size of 64, we used 4 NVIDIA GeForce RTX 2080 Ti GPUs (each with 11 GiB of memory).
64-bit Python 3.7 and PyTorch 1.7.1. See https://pytorch.org/ for PyTorch installation instructions.
CUDA toolkit 11.0 or later. Use at least version 11.1 if running on RTX 3090. (Why is a separate CUDA toolkit installation required? See comments of this Github issue.)
Python libraries: pip install wandb click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3.
This project uses Weights and Biases (W&B) for visualization and logging. In addition to installing W&B (included in the command above), you need to create a free account on the W&B website. Then log in to your account from the command line with wandb login (you will be prompted for your login information after running the command).
Docker users: use the provided Dockerfile by StyleGAN2+ADA (./Dockerfile) to build an image with the required library dependencies.

The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\ \Community\VC\Auxiliary\Build\vcvars64.bat".

Getting Started

The code for this project is based on the PyTorch implementation of StyleGAN2+ADA. Please read the instructions provided for StyleGAN2+ADA first. Here, we mainly provide the additional details required to use our method.

For a quick start, we have provided example scripts in ./scripts, as well as an example dataset (a tar file containing a subset of the ImageNet Carnivores dataset used in the paper) in ./datasets. Note that the scripts do not include the command for activating a Python environment. The dataset and output directory paths in the scripts can be modified to match your own setup.

The following command runs a script that extracts the tar file and creates a ZIP file in the same directory.

bash scripts/prepare_dataset_ImageNetCarnivores_20_100.sh

The ZIP file is later used for training and evaluation. For more details on how to use your custom datasets, see Dataset Preparation.

The following command runs a script that trains the model using our method with default hyper-parameters:

bash scripts/train_ImageNetCarnivores_20_100.sh

For more details on the training options, see Training.

To calculate the evaluation metrics on a pretrained model, use the following command:

bash scripts/inference_metrics_ImageNetCarnivores_20_100.sh

Outputs from the training and inference commands are by default placed under out/, controlled by --outdir. Downloaded network pickles are cached under $HOME/.cache/dnnlib, which can be overridden by setting the DNNLIB_CACHE_DIR environment variable. The default PyTorch extension build directory is $HOME/.cache/torch_extensions, which can be overridden by setting TORCH_EXTENSIONS_DIR.

Dataset Preparation

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.
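
As a quick sanity check of this layout, the following sketch opens a dataset ZIP and counts the images per class. It assumes that dataset.json follows the StyleGAN2+ADA convention of a top-level "labels" list of [filename, class_index] pairs; the helper name count_images_per_class is illustrative and not part of the repository.

```python
import json
import zipfile
from collections import Counter


def count_images_per_class(zip_path):
    """Return (num_png_files, Counter mapping class index -> image count).

    Assumes dataset.json looks like
    {"labels": [["00000/img00000000.png", 3], ...]},
    which is the StyleGAN2+ADA convention; "labels" may be null for
    unlabeled data, in which case the Counter is empty.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        num_png = sum(1 for name in names if name.lower().endswith(".png"))
        if "dataset.json" not in names:
            return num_png, Counter()
        with archive.open("dataset.json") as handle:
            metadata = json.load(handle)
    labels = metadata.get("labels") or []
    return num_png, Counter(int(label) for _, label in labels)
```

Calling count_images_per_class("datasets/ImageNet_Carnivores_20_100.zip") after running the preparation script should report one entry per class if the labels were written as assumed.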

Custom datasets can be created from a folder containing images (for multi-class datasets, each sub-directory contains the images of one class) using dataset_tool.py. Here is an example of how to convert the dataset folder to the desired ZIP file:

python dataset_tool.py --source=datasets/ImageNet_Carnivores_20_100 --dest=datasets/ImageNet_Carnivores_20_100.zip --transform=center-crop --width=128 --height=128

The above example reads the images from the image folder provided by --source, applies the center-crop transform to them, and resizes them to the size given by --width and --height. The resulting images, along with the metadata (label information), are stored as a ZIP file at the path given by --dest. See python dataset_tool.py --help for more information. See the StyleGAN2+ADA instructions for more details on specific datasets or legacy TFRecords datasets.

The created ZIP file can be passed to the training and evaluation code using the --data argument.
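
The geometry of a center crop can be illustrated with a small sketch. It assumes the transform keeps the largest centered square of each image before resizing it to --width by --height, which is how center cropping is usually defined; center_crop_box is a hypothetical helper, not a function from dataset_tool.py.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square.

    The box uses the usual half-open pixel convention, so its side length
    is right - left == bottom - top == min(width, height).
    """
    crop = min(width, height)
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop
```

For a 150x100 landscape image this gives the box (25, 0, 125, 100): 25 pixels are dropped on each side and the full height is kept.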

Training

New networks are trained using train.py. To train with our method, set the argument --cond to 1 so that the training is done conditionally. In addition, specify the start and the end of the transition from unconditional to conditional training using the arguments --t_start_kimg and --t_end_kimg. Here is an example training command:

python train.py --outdir=./out/ \
--data=datasets/ImageNet_Carnivores_20_100.zip \
--cond=1 --t_start_kimg=2000  --t_end_kimg=4000  \
--gpus=4 \
--cfg=auto --mirror=1 \
--metrics=fid50k_full,kid50k_full
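
To make the role of --t_start_kimg and --t_end_kimg concrete, here is a minimal sketch of a transition schedule. It assumes a linear ramp from 0 (fully unconditional) to 1 (fully conditional) between the two values, with training progress measured in thousands of images (kimg) as in StyleGAN2+ADA. The function name is illustrative, and the repository's actual schedule may differ in detail.

```python
def transition_weight(cur_kimg, t_start_kimg, t_end_kimg):
    """Weight of the conditional terms at training progress cur_kimg.

    0.0 means fully unconditional, 1.0 means fully conditional.
    Assumption: the weight ramps linearly between the two bounds.
    """
    if t_end_kimg <= t_start_kimg:
        # Degenerate schedule: switch instantly at t_start_kimg.
        return 1.0 if cur_kimg >= t_start_kimg else 0.0
    progress = (cur_kimg - t_start_kimg) / (t_end_kimg - t_start_kimg)
    return min(max(progress, 0.0), 1.0)
```

With the values from the example command (2000 and 4000), the weight under this assumption is 0 up to 2000 kimg, 0.5 at 3000 kimg, and 1 from 4000 kimg onward.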

See the StyleGAN2+ADA instructions for more details on the arguments, configurations, and hyper-parameters. Please refer to python train.py --help for the full list of arguments.

Note: Our code can currently be used only for unconditional or transitional training. For the original conditional training, use the original StyleGAN2+ADA implementation.

Evaluation and Logging

By default, train.py automatically computes FID for each network pickle exported during training. More metrics can be added to the argument --metrics (as a comma-separated list). To monitor the training, you can inspect log.txt and the JSON files (e.g. metric-fid50k_full.jsonl for FID) saved in the output directory. Alternatively, you can inspect the WandB or TensorBoard logs (by default, WandB creates the logs under the project name "Transitional-cGAN", which can be accessed in your account on the website).
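
The metric JSONL files can also be inspected programmatically. The sketch below picks the snapshot with the lowest metric value. It assumes each line is a JSON object with a "results" dictionary keyed by metric name and a "snapshot_pkl" field, as written by StyleGAN2+ADA; best_snapshot is a hypothetical helper, not part of the repository.

```python
import json


def best_snapshot(jsonl_lines, metric="fid50k_full"):
    """Return (snapshot_name, value) with the lowest metric value, or None.

    jsonl_lines is any iterable of strings, e.g. an open text file.
    Lines that are blank or lack the requested metric are ignored.
    """
    best = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        value = record.get("results", {}).get(metric)
        if value is None:
            continue
        if best is None or value < best[1]:
            best = (record.get("snapshot_pkl"), value)
    return best
```

It can be used by opening the metric-fid50k_full.jsonl file of a training run and passing the file object directly to best_snapshot.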

When desired, the automatic computation can be disabled with --metrics=none to speed up the training slightly (3%–9%). Additional metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to JSONL file.
python calc_metrics.py --metrics=pr50k3_full \
    --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq.zip --mirror=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

The first example looks up the training configuration and performs the same operation as if --metrics=pr50k3_full had been specified during training. The second example downloads a pre-trained network pickle, in which case the values of --mirror and --data must be specified explicitly.

See StyleGAN2+ADA instructions for more details on the available metrics.

Contact

For any questions, suggestions, or issues with the code, please contact Mohamad Shahbazi at [email protected]

How to Cite

@inproceedings{
shahbazi2022collapse,
title={Collapse by Conditioning: Training Class-conditional {GAN}s with Limited Data},
author={Shahbazi, Mohamad and Danelljan, Martin and Pani Paudel, Danda and Van Gool, Luc},
booktitle={The Tenth International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=7TZeCsNOUB_}
}

Comments

Approach is ineffective, at least with certain datasets/domains
In my experience with this approach, it doesn't help prevent mode collapse (at least in certain scenarios like mine), it simply delays it and prolongs the training process with the gradual transition from unconditional to conditional regimes.

From my testing, it does seem to keep training more stable and avoid mode collapse before and during the transition period. Soon after the transition is complete however, outputs start deteriorating, and the model still ends up in a total or near-total mode collapse. I suspect the nature of the problem is something conceptually similar to the vanishing gradient problem. As the transition moves closer to the conditional regime, it is by definition farther from the global/unconditional data, and benefits less from it as training goes on until the transition is complete and any benefit stops entirely.

Building on your basic idea of training unconditionally at first before introducing labels, I found a way simpler and seemingly more stable approach that doesn't require modifying the vanilla StyleGAN architecture(s), or a transition period. Simply, it's training an unconditional model, then using one-shot weight transfer to instantly "convert" the unconditional model to a conditional one, and resume training that. Steps:

1. Create the conditional dataset.

2. Start training an unconditional model with the same configuration you would use for the conditional one, simply omitting the --cond=1 flag.

3. Train until the model has "learned enough" from the unconditional regime. I haven't done extensive testing, so I don't know what the "optimal" point in training would be, but a crude yet effective method is to stop once FID stops improving. This has the benefit of avoiding new hyperparameters, and should be effective regardless of domain/dataset.

4. Start training a randomly initialized conditional model with the same command used in step 2, but with --cond=1 this time. Stop training as soon as the initial model .pkl file is generated and written to disk.

5. Load both pickles (trained unconditional and untrained conditional), copy all layer weights with matching names and shapes from the unconditional to the conditional model, and save the resulting "converted" pickle.

6. Resume conditional training with --cond=1 and --resume pointing to your converted conditional pickle from the previous step.
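
The weight-copying step above (copy all layer weights with matching names and shapes) can be sketched as follows. This is an illustration rather than the commenter's actual script: it works on name-to-tensor mappings such as those returned by torch.nn.Module.state_dict(), and the function name transfer_matching_weights is hypothetical.

```python
def transfer_matching_weights(src_state, dst_state):
    """Merge src_state into dst_state where both name and shape match.

    Both arguments are name -> tensor mappings (anything whose values
    expose a .shape attribute, e.g. PyTorch state dicts).
    Returns (merged_state, copied_names, skipped_names); entries of
    dst_state without a matching source tensor are kept unchanged.
    """
    merged = dict(dst_state)
    copied, skipped = [], []
    for name, dst_tensor in dst_state.items():
        src_tensor = src_state.get(name)
        same_shape = (
            src_tensor is not None
            and tuple(src_tensor.shape) == tuple(dst_tensor.shape)
        )
        if same_shape:
            merged[name] = src_tensor
            copied.append(name)
        else:
            skipped.append(name)
    return merged, copied, skipped
```

With StyleGAN2+ADA pickles, this would presumably be applied separately to each network stored in the pickle, loading the merged mapping back with load_state_dict before saving the converted pickle. Layers that exist only in the conditional model (for example a label embedding) end up in the skipped list and keep their random initialization.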

Since the transition is instantaneous and conditioning starts with the model having access to 100% of the high-level data learned during unconditional training, it avoids the data loss that seems to occur during a gradual transition. I can't say that it entirely fixes the mode collapse problem yet, since I'm still in the middle of training the model and it may still collapse to one degree or another before convergence, but I can definitely say it's been significantly more stable. There's no sign of mode collapse in any of the classes 3000 kImg after the transition, whereas the gradual transition approach experienced near-total mode collapse across all classes after only about 500 kImg after the transition ended, on the same dataset.

Hoping the info helps someone. This was one of the very few implementations/papers that deal with stability problems during conditional StyleGAN training, and problems with complex domains in general, but it didn't help me, so I'm sharing something that did. Also somewhat relevant to you might be "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets" (https://arxiv.org/abs/2202.00273). Their re-introduction of progressive growing, and disabling things like style mixing and path length regularization, might be useful in training complex conditional models. However, their model is way more complex than vanilla StyleGAN, depends on external models, and still has issues.

I'm currently playing around with the idea of progressively growing vanilla StyleGAN architectures via the weight-transfer method, switching to a conditional regime at different stages of model scaling. So I guess we'll see in time; it might be an effective method of training on complex conditional domains while avoiding mode collapse and architecture changes.
opened by Kaoru8
