StyleGAN2-ADA - Official PyTorch implementation

Overview

Need Help?

  • If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in October 2020.
  • Interested in contributing? Please submit PRs or discuss changes in the Artificial Images Slack channel.

Edits made to this repo

  • Fakes as .jpg: sample images ("fakes") written during training are saved as .jpg instead of .png, which saves a lot of disk space
  • Multiple interpolation options: use --process="interpolation", see --help for more options
  • Easing options for interpolations: see --help for more (this would be a great place for new coders to build additional features/options)
  • Vertical Mirroring: use --mirrory=True to flip the training set top to bottom (currently broken)
  • Set Initial Augmentation Strength: use --initstrength={float value} to set the initialized strength of augmentations (really helpful when restarting training)
  • Set Initial Kimg count: use --nkimg={int value} to set the initial kimg count (helpful with restarts)
  • Closed Form Factorization: converted from Rosinality repo by Philip Bizimis; additional video creation features
  • Additional Projector Techniques: thanks to Peter Baylies for his projector code, which optionally uses pixel-based loss or CLIP
  • Interpolate from Projector .npz Files: use the combine_npz.py script to combine multiple .npz files
  • Convert to Rosinality model structure: thanks to Justin Pinkney for making this! Converting to Rosinality opens up numerous additional tools for manipulating StyleGAN models

StyleGAN2-ADA — Official PyTorch implementation

Teaser image

Training Generative Adversarial Networks with Limited Data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
https://arxiv.org/abs/2006.06676

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

For business inquiries, please contact [email protected]
For press and other inquiries, please contact Hector Marinez at [email protected]

Release notes

This repository is a faithful reimplementation of StyleGAN2-ADA in PyTorch, focusing on correctness, performance, and compatibility.

Correctness

  • Full support for all primary training configurations.
  • Extensive verification of image quality, training curves, and quality metrics against the TensorFlow version.
  • Results are expected to match in all cases, excluding the effects of pseudo-random numbers and floating-point arithmetic.

Performance

  • Training is typically 5%–30% faster compared to the TensorFlow version on NVIDIA Tesla V100 GPUs.
  • Inference is up to 35% faster in high resolutions, but it may be slightly slower in low resolutions.
  • GPU memory usage is comparable to the TensorFlow version.
  • Faster startup time when training new networks (<50s), and also when using pre-trained networks (<4s).
  • New command line options for tweaking the training performance.

Compatibility

  • Compatible with old network pickles created using the TensorFlow version.
  • New ZIP/PNG based dataset format for maximal interoperability with existing 3rd party tools.
  • TFRecords datasets are no longer supported — they need to be converted to the new format.
  • New JSON-based format for logs, metrics, and training curves.
  • Training curves are also exported in the old TFEvents format if TensorBoard is installed.
  • Command line syntax is mostly unchanged, with a few exceptions (e.g., dataset_tool.py).
  • Comparison methods are not supported (--cmethod, --dcap, --cfg=cifarbaseline, --aug=adarv)
  • Truncation is now disabled by default.

Data repository

Path Description
stylegan2-ada-pytorch Main directory hosted on Amazon S3
  ├  ada-paper.pdf Paper PDF
  ├  images Curated example images produced using the pre-trained models
  ├  videos Curated example interpolation videos
  └  pretrained Pre-trained models
    ├  ffhq.pkl FFHQ at 1024x1024, trained using original StyleGAN2
    ├  metfaces.pkl MetFaces at 1024x1024, transfer learning from FFHQ using ADA
    ├  afhqcat.pkl AFHQ Cat at 512x512, trained from scratch using ADA
    ├  afhqdog.pkl AFHQ Dog at 512x512, trained from scratch using ADA
    ├  afhqwild.pkl AFHQ Wild at 512x512, trained from scratch using ADA
    ├  cifar10.pkl Class-conditional CIFAR-10 at 32x32
    ├  brecahad.pkl BreCaHAD at 512x512, trained from scratch using ADA
    ├  paper-fig7c-training-set-sweeps Models used in Fig.7c (sweep over training set size)
    ├  paper-fig11a-small-datasets Models used in Fig.11a (small datasets & transfer learning)
    ├  paper-fig11b-cifar10 Models used in Fig.11b (CIFAR-10)
    ├  transfer-learning-source-nets Models used as starting point for transfer learning
    └  metrics Feature detectors used by the quality metrics

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1–8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using NVIDIA DGX-1 with 8 Tesla V100 GPUs.
  • 64-bit Python 3.7, PyTorch 1.7.1, and CUDA toolkit 11.0 or newer. Use CUDA toolkit 11.1 or later with RTX 3090.
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvars64.bat".

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate uncurated MetFaces images with truncation (Fig.12 upper left)
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate class conditional CIFAR-10 images (Fig.17 left, Car)
python generate.py --outdir=out --seeds=0-35 --class=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/cifar10.pkl

# Style mixing example
python style_mixing.py --outdir=out --rows=85,100,75,458,1500 --cols=55,821,1789,293 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Outputs from the above commands are placed under out/*.png, controlled by --outdir. Downloaded network pickles are cached under $HOME/.cache/dnnlib, which can be overridden by setting the DNNLIB_CACHE_DIR environment variable. The default PyTorch extension build directory is $HOME/.cache/torch_extensions, which can be overridden by setting TORCH_EXTENSIONS_DIR.
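The --seeds values in the commands above come in two shapes: a comma-separated list (85,265,297,849) and a range (600-605). The following is a minimal sketch of how such a string can be expanded, assuming the range is inclusive on both ends. parse_seeds is a hypothetical helper written for illustration and is not necessarily the parser generate.py uses.

```python
import re
from typing import List

def parse_seeds(spec: str) -> List[int]:
    """Expand '85,265' or '600-605' (assumed inclusive) into a list of ints."""
    match = re.fullmatch(r"(\d+)-(\d+)", spec.strip())
    if match:
        lo, hi = int(match.group(1)), int(match.group(2))
        return list(range(lo, hi + 1))
    # Otherwise treat the string as a comma-separated list of integers.
    return [int(part) for part in spec.split(",") if part.strip()]

print(parse_seeds("600-605"))         # [600, 601, 602, 603, 604, 605]
print(parse_seeds("85,265,297,849"))  # [85, 265, 297, 849]
```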

Docker: You can run the above curated image example using Docker as follows:

docker build --tag sg2ada:latest .
./docker_run.sh python3 generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Note: The Docker image requires NVIDIA driver release r455.23 or later.

Legacy networks: The above commands can load most of the network pickles created using the previous TensorFlow versions of StyleGAN2 and StyleGAN2-ADA. However, for future compatibility, we recommend converting such legacy pickles into the new format used by the PyTorch version:

python legacy.py \
    --source=https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/stylegan2-cat-config-f.pkl \
    --dest=stylegan2-cat-config-f.pkl

Projecting images to latent space

To find the matching latent vector for a given image file, run:

python projector.py --outdir=out --target=~/mytargetimg.png \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

For optimal results, the target image should be cropped and aligned similar to the FFHQ dataset. The above command saves the projection target out/target.png, result out/proj.png, latent vector out/projected_w.npz, and progression video out/proj.mp4. You can render the resulting latent vector by specifying --projected_w for generate.py:

python generate.py --outdir=out --projected_w=out/projected_w.npz \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

Using networks from Python

You can use pre-trained networks in your own Python code as follows:

import pickle
import torch

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda()    # latent codes
c = None                                # class labels (not used in this example)
img = G(z, c)                           # NCHW, float32, dynamic range [-1, +1]

The above code requires torch_utils and dnnlib to be accessible via PYTHONPATH. It does not need source code for the networks themselves — their class definitions are loaded from the pickle via torch_utils.persistence.
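The generator output is NCHW float32 in [-1, +1], so it has to be rescaled before it can be saved as an ordinary 8-bit image. Below is a small NumPy sketch of one common convention for that conversion. to_uint8_nhwc is a hypothetical helper, and the zero-filled array only stands in for a real G(z, c) result moved to the CPU; check generate.py for the exact scaling this repository applies.

```python
import numpy as np

def to_uint8_nhwc(img: np.ndarray) -> np.ndarray:
    """Map an NCHW float array in [-1, +1] to an NHWC uint8 array in [0, 255]."""
    img = np.transpose(img, (0, 2, 3, 1))     # NCHW -> NHWC
    img = np.clip(img * 127.5 + 128, 0, 255)  # scale and shift, then clamp to the 8-bit range
    return img.astype(np.uint8)               # fractional parts are truncated

# Stand-in for G(z, c).cpu().numpy(); a real call needs a network pickle and a GPU.
fake = np.zeros((1, 3, 4, 4), dtype=np.float32)
fake[:, 0] = -1.0  # channel 0 at the bottom of the dynamic range
fake[:, 2] = 1.0   # channel 2 at the top of the dynamic range
out = to_uint8_nhwc(fake)
print(out.shape, out[0, 0, 0])
```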

The pickle contains three networks. 'G' and 'D' are instantaneous snapshots taken during training, and 'G_ema' represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.
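To make "moving average of the generator weights" concrete, here is a small sketch of an exponential moving average on plain Python lists. The names batch_size and ema_kimg mirror entries in the printed training options, but the half-life formula is an assumption made for illustration; the training loop is the authoritative source for the real update.

```python
def ema_beta(batch_size, ema_kimg):
    """Per-step decay, assuming the average has a half-life of ema_kimg thousand images."""
    return 0.5 ** (batch_size / (ema_kimg * 1000))

def ema_update(ema_weights, weights, beta):
    """Return the blended average; the input lists are left unmodified."""
    return [beta * e + (1.0 - beta) * w for e, w in zip(ema_weights, weights)]

beta = ema_beta(batch_size=4, ema_kimg=10)
g_ema = [0.0, 0.0]            # toy "G_ema" weights
g_now = [1.0, -1.0]           # toy "G" weights, held constant here
for _ in range(2500):         # 2500 steps * 4 images = 10 kimg, i.e. one half-life
    g_ema = ema_update(g_ema, g_now, beta)
print([round(v, 3) for v in g_ema])  # halfway between the start and the target
```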

The generator consists of two submodules, G.mapping and G.synthesis, that can be executed separately. They also support various additional options:

w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, noise_mode='const', force_fp32=True)
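The truncation_psi and truncation_cutoff arguments can be read as follows: each of the first truncation_cutoff per-layer w vectors is pulled toward the average w by a factor psi. Below is a pure-Python sketch of that interpolation, assuming the usual truncation-trick formula w' = w_avg + psi * (w - w_avg). truncate_ws and the toy vectors are hypothetical and only stand in for the tensors that G.mapping handles internally.

```python
def truncate_ws(ws, w_avg, psi, cutoff=None):
    """Interpolate the first `cutoff` layers of ws toward w_avg (all layers if cutoff is None)."""
    n = len(ws) if cutoff is None else min(cutoff, len(ws))
    out = []
    for i, w in enumerate(ws):
        if i < n:
            out.append([a + psi * (x - a) for x, a in zip(w, w_avg)])
        else:
            out.append(list(w))  # layers past the cutoff are copied unchanged
    return out

w_avg = [1.0, 1.0]
ws = [[3.0, -1.0], [3.0, -1.0], [3.0, -1.0]]  # three layers, w_dim = 2
print(truncate_ws(ws, w_avg, psi=0.5, cutoff=2))
# [[2.0, 0.0], [2.0, 0.0], [3.0, -1.0]]
```

With psi=1 the vectors are returned unchanged, which matches the note above that truncation is disabled by default.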

Please refer to generate.py, style_mixing.py, and projector.py for further examples.

Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.
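As a rough illustration of that layout, the sketch below stores already-encoded PNG bytes in a ZIP archive without compression and adds a dataset.json file. The {"labels": [[filename, class_index], ...]} structure is an assumption made for this example, and write_dataset_zip is a hypothetical helper; dataset_tool.py remains the authoritative way to build datasets.

```python
import io
import json
import zipfile

def write_dataset_zip(dest, images, labels=None):
    """images: {archive_name: png_bytes}; labels: {archive_name: class_index} or None."""
    with zipfile.ZipFile(dest, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name in sorted(images):
            zf.writestr(name, images[name])  # stored as-is, no compression
        pairs = None if labels is None else [[n, labels[n]] for n in sorted(images)]
        zf.writestr("dataset.json", json.dumps({"labels": pairs}))

# Placeholder bytes stand in for real PNG data in this example.
buf = io.BytesIO()
write_dataset_zip(buf, {"00000/img00000000.png": b"<png bytes>"},
                  {"00000/img00000000.png": 3})
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    stored = all(i.compress_type == zipfile.ZIP_STORED for i in zf.infolist())
    meta = json.loads(zf.read("dataset.json"))
print(names, stored, meta)
```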

Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.

Legacy TFRecords datasets are not supported — see below for instructions on how to convert them.

FFHQ:

Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.

Step 2: Extract images from TFRecords using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

Step 3: Create ZIP archive using dataset_tool.py from this repository:

# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip

# Scaled down 256x256 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
    --width=256 --height=256

MetFaces: Download the MetFaces dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/metfaces/images --dest=~/datasets/metfaces.zip

AFHQ: Download the AFHQ dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/cifar-10-python.tar.gz --dest=~/datasets/cifar10.zip

LSUN: Download the desired categories from the LSUN project page and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
    --transform=center-crop --width=256 --height=256 --max_images=200000

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000

BreCaHAD:

Step 1: Download the BreCaHAD dataset.

Step 2: Extract 512x512 resolution crops using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

Step 3: Create ZIP archive using dataset_tool.py from this repository:

python dataset_tool.py --source=/tmp/brecahad-crops --dest=~/datasets/brecahad.zip

Training new networks

In its most basic form, training new networks boils down to:

python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1

The first command is optional; it validates the arguments, prints out the training configuration, and exits. The second command kicks off the actual training.

In this example, the results are saved to a newly created directory ~/training-runs/<ID>-mydataset-auto1, controlled by --outdir. The training exports network pickles (network-snapshot-<INT>.pkl) and example images (fakes<INT>.png) at regular intervals (controlled by --snap). For each pickle, it also evaluates FID (controlled by --metrics) and logs the resulting scores in metric-fid50k_full.jsonl (as well as TFEvents if TensorBoard is installed).
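The <ID> prefix behaves like a running counter over the contents of --outdir. Here is a hedged sketch of how such a counter can be derived. next_run_dir is a hypothetical helper, and the five-digit zero padding is inferred from the 00000-mydataset-auto1 example, so train.py may differ in detail.

```python
import os
import re

def next_run_dir(outdir, desc, existing=None):
    """Pick the next free numeric prefix among the entries of outdir."""
    if existing is None:
        existing = os.listdir(outdir) if os.path.isdir(outdir) else []
    matches = (re.match(r"\d+", name) for name in existing)
    ids = [int(m.group()) for m in matches if m]
    run_id = max(ids, default=-1) + 1
    return os.path.join(outdir, f"{run_id:05d}-{desc}")

# The `existing` argument lets the example run without touching the filesystem.
first = next_run_dir("training-runs", "mydataset-auto1", existing=[])
later = next_run_dir("training-runs", "mydataset-auto1",
                     existing=["00000-mydataset-auto1", "00004-other-run", "notes.txt"])
print(os.path.basename(first), os.path.basename(later))
```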

The name of the output directory reflects the training configuration. For example, 00000-mydataset-auto1 indicates that the base configuration was auto1, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by --cfg:

Base config Description
auto (default) Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results.
stylegan2 Reproduce results for StyleGAN2 config F at 1024x1024 using 1, 2, 4, or 8 GPUs.
paper256 Reproduce results for FFHQ and LSUN Cat at 256x256 using 1, 2, 4, or 8 GPUs.
paper512 Reproduce results for BreCaHAD and AFHQ at 512x512 using 1, 2, 4, or 8 GPUs.
paper1024 Reproduce results for MetFaces at 1024x1024 using 1, 2, 4, or 8 GPUs.
cifar Reproduce results for CIFAR-10 (tuned configuration) using 1 or 2 GPUs.

The training configuration can be further customized with additional command line options:

  • --aug=noaug disables ADA.
  • --cond=1 enables class-conditional training (requires a dataset with labels).
  • --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
  • --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
  • --resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl resumes a previous training run.
  • --gamma=10 overrides R1 gamma. We recommend trying a couple of different values for each new dataset.
  • --aug=ada --target=0.7 adjusts ADA target value (default: 0.6).
  • --augpipe=blit enables pixel blitting but disables all other augmentations.
  • --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).

Please refer to python train.py --help for the full list.
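To give a feel for what the ADA target controls, here is a simplified sketch of the feedback loop described in the ADA paper: the augmentation probability p is nudged up when an overfitting indicator r_t exceeds the target, and down otherwise. The step-size formula and the ada_interval parameter are assumptions for illustration (ada_target and ada_kimg do appear in the printed training options); this is not a copy of the training loop.

```python
def ada_step(p, r_t, target=0.6, batch_size=4, ada_interval=4, ada_kimg=100):
    """Move p by a fixed step toward more or less augmentation, never below zero."""
    sign = (r_t > target) - (r_t < target)  # +1, 0, or -1
    adjust = sign * (batch_size * ada_interval) / (ada_kimg * 1000)
    return max(p + adjust, 0.0)

p = 0.0  # a non-zero starting value would play the role of --initstrength
history = []
for r_t in [0.9, 0.9, 0.3, 0.3, 0.3]:  # made-up indicator values
    p = ada_step(p, r_t)
    history.append(p)
print(history)
```

Under these assumptions, ada_kimg sets how many thousand images it takes for p to travel from 0 to 1, which is why restarting from p = 0 can cost training time and why a configurable initial strength is useful.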

Expected training time

The total training time depends heavily on resolution, number of GPUs, dataset, desired quality, and hyperparameters. The following table lists expected wallclock times to reach different points in the training, measured in thousands of real images shown to the discriminator ("kimg"):

Resolution  GPUs  1000 kimg  25000 kimg  sec/kimg     GPU mem  CPU mem
128x128     1     4h 05m     4d 06h      12.8–13.7    7.2 GB   3.9 GB
128x128     2     2h 06m     2d 04h      6.5–6.8      7.4 GB   7.9 GB
128x128     4     1h 20m     1d 09h      4.1–4.6      4.2 GB   16.3 GB
128x128     8     1h 13m     1d 06h      3.9–4.9      2.6 GB   31.9 GB
256x256     1     6h 36m     6d 21h      21.6–24.2    5.0 GB   4.5 GB
256x256     2     3h 27m     3d 14h      11.2–11.8    5.2 GB   9.0 GB
256x256     4     1h 45m     1d 20h      5.6–5.9      5.2 GB   17.8 GB
256x256     8     1h 24m     1d 11h      4.4–5.5      3.2 GB   34.7 GB
512x512     1     21h 03m    21d 22h     72.5–74.9    7.6 GB   5.0 GB
512x512     2     10h 59m    11d 10h     37.7–40.0    7.8 GB   9.8 GB
512x512     4     5h 29m     5d 17h      18.7–19.1    7.9 GB   17.7 GB
512x512     8     2h 48m     2d 22h      9.5–9.7      7.8 GB   38.2 GB
1024x1024   1     1d 20h     46d 03h     154.3–161.6  8.1 GB   5.3 GB
1024x1024   2     23h 09m    24d 02h     80.6–86.2    8.6 GB   11.9 GB
1024x1024   4     11h 36m    12d 02h     40.1–40.8    8.4 GB   21.9 GB
1024x1024   8     5h 54m     6d 03h      20.2–20.6    8.3 GB   44.7 GB

The above measurements were done using NVIDIA Tesla V100 GPUs with default settings (--cfg=auto --aug=ada --metrics=fid50k_full). "sec/kimg" shows the expected range of variation in raw training performance, as reported in log.txt. "GPU mem" and "CPU mem" show the highest observed memory consumption, excluding the peak at the beginning caused by torch.backends.cudnn.benchmark.
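As a worked example of how the sec/kimg column relates to total time, the sketch below multiplies a sec/kimg rate by a kimg budget and formats the result. raw_training_time and format_duration are hypothetical helpers. The product covers raw training only, so it comes out somewhat lower than the wallclock columns in the table, presumably because those also include overhead such as snapshot export and metric evaluation.

```python
def raw_training_time(sec_per_kimg, kimg):
    """Seconds of raw training for a given rate and budget."""
    return sec_per_kimg * kimg

def format_duration(seconds):
    """Format seconds as days, hours, and minutes (seconds are dropped)."""
    minutes = round(seconds) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours:02d}h {mins:02d}m"

# 1024x1024 on 8 GPUs: 20.2 to 20.6 sec/kimg, run for 25000 kimg.
low = format_duration(raw_training_time(20.2, 25000))
high = format_duration(raw_training_time(20.6, 25000))
print(low, "to", high)  # 5d 20h 16m to 5d 23h 03m (the table lists 6d 03h wallclock)
```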

In typical cases, 25000 kimg or more is needed to reach convergence, but the results are already quite reasonable around 5000 kimg. 1000 kimg is often enough for transfer learning, which tends to converge significantly faster. The following figure shows example convergence curves for different datasets as a function of wallclock time, using the same settings as above:

Training curves

Note: --cfg=auto serves as a reasonable first guess for the hyperparameters but it does not necessarily lead to optimal results for a given dataset. For example, --cfg=stylegan2 yields considerably better FID for FFHQ-140k at 1024x1024 than illustrated above. We recommend trying out at least a few different values of --gamma for each new dataset.

Quality metrics

By default, train.py automatically computes FID for each network pickle exported during training. We recommend inspecting metric-fid50k_full.jsonl (or TensorBoard) at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics=none to speed up the training slightly (3%–9%).
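Inspecting the JSONL file can be scripted. The sketch below picks the snapshot with the lowest FID from JSONL text, assuming each line is a JSON object with a "results" dictionary keyed by metric name and a "snapshot_pkl" field; those field names are assumptions for illustration, best_snapshot is a hypothetical helper, and the sample scores are made up.

```python
import json

def best_snapshot(jsonl_text, metric="fid50k_full"):
    """Return (snapshot_name, score) for the lowest score, or None if there are no records."""
    best = None
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        score = record["results"][metric]
        if best is None or score < best[1]:
            best = (record.get("snapshot_pkl"), score)
    return best

# Made-up records in the assumed format; real files come from a training run.
sample = "\n".join([
    json.dumps({"results": {"fid50k_full": 45.2}, "snapshot_pkl": "network-snapshot-000000.pkl"}),
    json.dumps({"results": {"fid50k_full": 12.7}, "snapshot_pkl": "network-snapshot-000200.pkl"}),
    json.dumps({"results": {"fid50k_full": 14.1}, "snapshot_pkl": "network-snapshot-000400.pkl"}),
])
print(best_snapshot(sample))
```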

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to JSONL file.
python calc_metrics.py --metrics=pr50k3_full \
    --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq.zip --mirror=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

The first example looks up the training configuration and performs the same operation as if --metrics=pr50k3_full had been specified during training. The second example downloads a pre-trained network pickle, in which case the values of --mirror and --data must be specified explicitly.

Note that many of the metrics have a significant one-off cost when calculating them for the first time for a new dataset (up to 30min). Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.

We employ the following metrics in the ADA paper. Execution time and GPU memory usage are reported for one NVIDIA Tesla V100 GPU at 1024x1024 resolution:

Metric       Time    GPU mem  Description
fid50k_full  13 min  1.8 GB   Fréchet inception distance[1] against the full dataset
kid50k_full  13 min  1.8 GB   Kernel inception distance[2] against the full dataset
pr50k3_full  13 min  4.1 GB   Precision and recall[3] against the full dataset
is50k        13 min  1.8 GB   Inception score[4] for CIFAR-10

In addition, the following metrics from the StyleGAN and StyleGAN2 papers are also supported:

Metric     Time    GPU mem  Description
fid50k     13 min  1.8 GB   Fréchet inception distance against 50k real images
kid50k     13 min  1.8 GB   Kernel inception distance against 50k real images
pr50k3     13 min  4.1 GB   Precision and recall against 50k real images
ppl2_wend  36 min  2.4 GB   Perceptual path length[5] in W, endpoints, full image
ppl_zfull  36 min  2.4 GB   Perceptual path length in Z, full paths, cropped image
ppl_wfull  36 min  2.4 GB   Perceptual path length in W, full paths, cropped image
ppl_zend   36 min  2.4 GB   Perceptual path length in Z, endpoints, cropped image
ppl_wend   36 min  2.4 GB   Perceptual path length in W, endpoints, cropped image

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Bińkowski et al. 2018
  3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkäänniemi et al. 2019
  4. Improved Techniques for Training GANs, Salimans et al. 2016
  5. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018

License

Copyright © 2021, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License.

Citation

@inproceedings{Karras2020ada,
  title     = {Training Generative Adversarial Networks with Limited Data},
  author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle = {Proc. NeurIPS},
  year      = {2020}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke for helpful comments; Tero Kuosmanen and Sabu Nadarajan for their support with compute infrastructure; and Edgar Schönfeld for guidance on setting up unconditional BigGAN.

Comments
  • Add ability to convert to rosinality state dict

    This should enable conversion of any legacy NVIDIA pkl to a state_dict appropriate for rosinality's PyTorch version of StyleGAN2. After conversion you should be able to pass the generated .pt file to the generate.py of that repo with something like:

    python generate.py --size 1024 --pics 20 --ckpt my-converted-newtork.pt --channel_multiplier 2 --mlp_depth 2 --truncation 0.5
    

    You will need to make sure you set the channel_multiplier and mlp_depth options to correctly match the configuration of the exported network.

    opened by justinpinkney 4
  • Training starts but stops with Exiting message

    `Training options: { "num_gpus": 1, "image_snapshot_ticks": 4, "network_snapshot_ticks": 4, "metrics": [], "random_seed": 0, "training_set_kwargs": { "class_name": "training.dataset.ImageFolderDataset", "path": "/content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/datasets/output.zip", "use_labels": false, "max_size": 99119, "xflip": false, "resolution": 128 }, "data_loader_kwargs": { "pin_memory": true, "num_workers": 3, "prefetch_factor": 2 }, "G_kwargs": { "class_name": "training.networks.Generator", "z_dim": 512, "w_dim": 512, "mapping_kwargs": { "num_layers": 8 }, "synthesis_kwargs": { "channel_base": 32768, "channel_max": 512, "num_fp16_res": 4, "conv_clamp": 256 } }, "D_kwargs": { "class_name": "training.networks.Discriminator", "block_kwargs": {}, "mapping_kwargs": {}, "epilogue_kwargs": { "mbstd_group_size": 4 }, "channel_base": 32768, "channel_max": 512, "num_fp16_res": 4, "conv_clamp": 256 }, "G_opt_kwargs": { "class_name": "torch.optim.Adam", "lr": 0.002, "betas": [ 0, 0.99 ], "eps": 1e-08 }, "D_opt_kwargs": { "class_name": "torch.optim.Adam", "lr": 0.002, "betas": [ 0, 0.99 ], "eps": 1e-08 }, "loss_kwargs": { "class_name": "training.loss.StyleGAN2Loss", "r1_gamma": 10 }, "total_kimg": 25000, "batch_size": 4, "batch_gpu": 4, "ema_kimg": 10, "ema_rampup": null, "nimg": 25000000, "ada_target": 0.6, "augment_kwargs": { "class_name": "training.augment.AugmentPipe", "xflip": 1, "rotate90": 1, "xint": 1, "scale": 1, "rotate": 1, "aniso": 1, "xfrac": 1, "brightness": 1, "contrast": 1, "lumaflip": 1, "hue": 1, "saturation": 1 }, "resume_pkl": "/content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/results2/00002-output-11gb-gpu-noaug-resumecustom/network-snapshot-000960.pkl", "ada_kimg": 100, "run_dir": "/content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/results2/00005-output-11gb-gpu-resumecustom" }

    Output directory: /content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/results2/00005-output-11gb-gpu-resumecustom
    Training data: /content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/datasets/output.zip
    Training duration: 25000 kimg
    Number of GPUs: 1
    Number of images: 99119
    Image resolution: 128
    Conditional model: False
    Dataset x-flips: False

    Creating output directory...
    Launching processes...
    Loading training set...

    Num images: 99119
    Image shape: [3, 128, 128]
    Label shape: [0]

    Constructing networks...
    Resuming from "/content/drive/MyDrive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/results2/00002-output-11gb-gpu-noaug-resumecustom/network-snapshot-000960.pkl"
    Setting up PyTorch plugin "bias_act_plugin"... Done.
    Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

    Generator Parameters Buffers Output shape Datatype
    mapping.fc0 262656 - [4, 512] float32
    mapping.fc1 262656 - [4, 512] float32
    mapping.fc2 262656 - [4, 512] float32
    mapping.fc3 262656 - [4, 512] float32
    mapping.fc4 262656 - [4, 512] float32
    mapping.fc5 262656 - [4, 512] float32
    mapping.fc6 262656 - [4, 512] float32
    mapping.fc7 262656 - [4, 512] float32
    mapping - 512 [4, 12, 512] float32
    synthesis.b4.conv1 2622465 32 [4, 512, 4, 4] float32
    synthesis.b4.torgb 264195 - [4, 3, 4, 4] float32
    synthesis.b4:0 8192 16 [4, 512, 4, 4] float32
    synthesis.b4:1 - - [4, 512, 4, 4] float32
    synthesis.b8.conv0 2622465 80 [4, 512, 8, 8] float32
    synthesis.b8.conv1 2622465 80 [4, 512, 8, 8] float32
    synthesis.b8.torgb 264195 - [4, 3, 8, 8] float32
    synthesis.b8:0 - 16 [4, 512, 8, 8] float32
    synthesis.b8:1 - - [4, 512, 8, 8] float32
    synthesis.b16.conv0 2622465 272 [4, 512, 16, 16] float16
    synthesis.b16.conv1 2622465 272 [4, 512, 16, 16] float16
    synthesis.b16.torgb 264195 - [4, 3, 16, 16] float16
    synthesis.b16:0 - 16 [4, 512, 16, 16] float16
    synthesis.b16:1 - - [4, 512, 16, 16] float32
    synthesis.b32.conv0 2622465 1040 [4, 512, 32, 32] float16
    synthesis.b32.conv1 2622465 1040 [4, 512, 32, 32] float16
    synthesis.b32.torgb 264195 - [4, 3, 32, 32] float16
    synthesis.b32:0 - 16 [4, 512, 32, 32] float16
    synthesis.b32:1 - - [4, 512, 32, 32] float32
    synthesis.b64.conv0 2622465 4112 [4, 512, 64, 64] float16
    synthesis.b64.conv1 2622465 4112 [4, 512, 64, 64] float16
    synthesis.b64.torgb 264195 - [4, 3, 64, 64] float16
    synthesis.b64:0 - 16 [4, 512, 64, 64] float16
    synthesis.b64:1 - - [4, 512, 64, 64] float32
    synthesis.b128.conv0 1442561 16400 [4, 256, 128, 128] float16
    synthesis.b128.conv1 721409 16400 [4, 256, 128, 128] float16
    synthesis.b128.torgb 132099 - [4, 3, 128, 128] float16
    synthesis.b128:0 - 16 [4, 256, 128, 128] float16
    synthesis.b128:1 - - [4, 256, 128, 128] float32
    Total 29328669 44448 - -

    Discriminator Parameters Buffers Output shape Datatype
    b128.fromrgb 1024 16 [4, 256, 128, 128] float16
    b128.skip 131072 16 [4, 512, 64, 64] float16
    b128.conv0 590080 16 [4, 256, 128, 128] float16
    b128.conv1 1180160 16 [4, 512, 64, 64] float16
    b128 - 16 [4, 512, 64, 64] float16
    b64.skip 262144 16 [4, 512, 32, 32] float16
    b64.conv0 2359808 16 [4, 512, 64, 64] float16
    b64.conv1 2359808 16 [4, 512, 32, 32] float16
    b64 - 16 [4, 512, 32, 32] float16
    b32.skip 262144 16 [4, 512, 16, 16] float16
    b32.conv0 2359808 16 [4, 512, 32, 32] float16
    b32.conv1 2359808 16 [4, 512, 16, 16] float16
    b32 - 16 [4, 512, 16, 16] float16
    b16.skip 262144 16 [4, 512, 8, 8] float16
    b16.conv0 2359808 16 [4, 512, 16, 16] float16
    b16.conv1 2359808 16 [4, 512, 8, 8] float16
    b16 - 16 [4, 512, 8, 8] float16
    b8.skip 262144 16 [4, 512, 4, 4] float32
    b8.conv0 2359808 16 [4, 512, 8, 8] float32
    b8.conv1 2359808 16 [4, 512, 4, 4] float32
    b8 - 16 [4, 512, 4, 4] float32
    b4.mbstd - - [4, 513, 4, 4] float32
    b4.conv 2364416 16 [4, 512, 4, 4] float32
    b4.fc 4194816 - [4, 512] float32
    b4.out 513 - [4, 1] float32
    Total 28389121 352 - -

    Setting up augmentation...
    Distributing across 1 GPUs...
    Setting up training phases...
    Exporting sample images...
    Initializing logs...
    2021-05-02 21:56:16.385776: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    Training for 25000 kimg...

    tick 0 kimg 25000.0 time 24s sec/tick 2.9 sec/kimg 724.56 maintenance 21.6 cpumem 4.69 gpumem 9.76 augment 0.000

    Exiting...`

    opened by graham-eisele 3
  • Transfer Learning fails when training 256 resolution model based on custom dataset

    Hi, I have prepared my dataset with dataset_tool.py and set the dataset width and height to 256x256. Here is the problem: when running python train.py with ffhq256 as the transfer learning source network, execution fails early (at the beginning of "Constructing networks") with this error: RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1. What is the problem here? Thanks in advance! PS: I also tried ffhq512, and it can train the model, but the output model's structure is ffhq256, so I'm very confused.

    opened by LLSean 2
  • SeFa vectors must be used in W-space

    Dear authors, thank you for the great work and accompanying videos, they are very helpful.

    Describe the bug: I believe the SeFa-discovered latent directions should be used to alter latent vectors in W-space, while the code seems to apply them in Z-space: https://github.com/dvschultz/stylegan2-ada-pytorch/blob/main/apply_factor.py#L37.

    Expected behavior: According to the SeFa authors' code, for StyleGAN2 the eigenvectors are used to alter the Ws: https://github.com/genforce/sefa/blob/master/interface.py#L53.

    In any case, interestingly, the visual results from applying eigenvectors in Z-space kind of correlate with W-space ones.

    opened by vlfom 2
  • Possible to change dataset format?

    In the creation of dataset

    Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.

    Is it possible to change it to JPEG instead? Cause PNG takes up huge space and in the TF version of ADA there is a hack by using

    Raw dataset creations: Taken from the @skyflynil repo, reduces the size of datasets dramatically. Use create_from images_raw and create_from image_folders_raw in dataset creation, and use --use-raw=True in training (False by default!)

    Is it possible to implement this in the pytorch version?

    opened by YukiSakuma 1
  • how to convert .pkl to .pt

    I want to convert a .pkl file trained with stylegan2-ada to a .pt file, and I would appreciate a more detailed explanation. For example, SG2-ADA-PT to Rosinality.ipynb contains the line !python export_weights.py /path/to/pkl /path/to/saved.pt. It would be nice if you could give an example of what to put in place of /path/to/pkl and /path/to/saved.pt.

    opened by leeisack 1
  • closed_form_factorization.py doesn't work in google colab

    Hi! I have a problem with the Closed-Form Factorization script. I have a pkl file from training a stylegan2-ada model (resolution 256) with TensorFlow (also in Google Colab). When I run closed_form_factorization.py with this pkl file, I get this error: File "closed_form_factorization.py", line 20, in G = pickle.load(f)['G_ema'].to(device) # type: ignore ModuleNotFoundError: No module named 'dnnlib.tflib'

    I tried manually adding the tflib module to the dnnlib directory (from the classic stylegan2-ada repo), but then I got another error: File "closed_form_factorization.py", line 21, in G = pickle.load(f)['G_ema'].to(device) # type: ignore TypeError: tuple indices must be integers or slices, not str

    If I change the index to an integer, I get this error: File "closed_form_factorization.py", line 21, in G = pickle.load(f)[0].to(device) # type: ignore AttributeError: 'Network' object has no attribute 'to'

    If I delete ".to(device)", there is a new error: File "closed_form_factorization.py", line 25, in for k in G.named_parameters() AttributeError: 'Network' object has no attribute 'named_parameters'

    If I delete both the index and ".to(device)": File "closed_form_factorization.py", line 25, in for k in G.named_parameters() AttributeError: 'tuple' object has no attribute 'named_parameters'

    So I think I did something wrong from the very beginning. My TF version is 1.14 (in case it matters), because stylegan2-ada only works with TF 1.x. What should I do to get closed_form_factorization.py working in Colab? Impatiently waiting for your response.
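    The errors above are consistent with the two checkpoint layouts differing: the script indexes the unpickled object with 'G_ema', while the TensorFlow pickle unpickles to a tuple of TF Network objects. A small sketch of telling the two apart before use follows. The helper name pick_generator is hypothetical, and the (G, D, Gs) ordering of the TensorFlow tuple is an assumption about that checkpoint format.

```python
def pick_generator(checkpoint):
    """Return (framework, generator) for an already-unpickled checkpoint.

    Assumes PyTorch checkpoints are dicts with a 'G_ema' entry and
    TensorFlow checkpoints are 3-tuples ordered (G, D, Gs).
    """
    if isinstance(checkpoint, dict) and "G_ema" in checkpoint:
        return "pytorch", checkpoint["G_ema"]
    if isinstance(checkpoint, tuple) and len(checkpoint) == 3:
        # Gs (the averaged generator) plays the role of G_ema here, but it
        # is a TF object: it has no .to() or .named_parameters() and needs
        # converting to a PyTorch module before the script can use it.
        return "tensorflow", checkpoint[2]
    raise TypeError(f"unrecognized checkpoint type: {type(checkpoint).__name__}")


# Usage with stand-in objects instead of real networks.
print(pick_generator({"G": "g", "D": "d", "G_ema": "g_ema"}))
print(pick_generator(("G", "D", "Gs")))
```

    A check like this only identifies the format. It does not make a TensorFlow network usable from PyTorch code.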

    opened by katyonats 1
  • generic GAN question color space collapse

    I am wondering if you have any tips. My training (using transfer learning from the FFHQ network) runs fine for a few thousand kimg, then starts to go monochromatic, drifting toward a single hue. My input set is very random and unlabeled, which may be part of the problem. When I restart training from the newly trained network, the color comes back, but image quality deteriorates and the FID metric rises continuously, indicating some divergence. Visually it still seems OK, but objects become more distorted.

    Would lowering the learning rate help here? I tried turning off all adaptive augmentation but still get distortion. We are achieving our goal to some extent: the end result is objects that at first sight look like one thing, until you inspect them more closely and see that they may combine aspects of two or three objects from the original dataset.

    Apart from the slow startup, the PyTorch version seems faster and better behaved than the TensorFlow version, but this could be subjective.

    Thanks

    opened by kbuggenhout 1
  • [Request] Add support for aesthetic score -aided dataset into the discriminator and loss functions.

    Add support for aesthetic-rating-aided datasets to the discriminator loss function. Add a function that penalizes the discriminator for judging a dataset image "more real" than one with a higher aesthetic score (the scoring system could run from 1 to 5 or from 1 to 10). The dataset could be organized by aesthetic score by separating images into separate folders (for example, folders named 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 for a 1-10 rating).

    This would greatly help use cases where you have a lot of images for a dataset but only a select few are good enough in quality and creativity, and you still want to keep the diversity of the much larger dataset.
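    One possible reading of the requested penalty is a pairwise hinge term: for every pair of real images with different ratings, add a cost whenever the lower-rated image gets the higher discriminator output. The sketch below uses plain Python floats to show the arithmetic only. The function name, the margin parameter, and the hinge form are all assumptions, not something the repo implements.

```python
from itertools import combinations


def aesthetic_ranking_penalty(logits, scores, margin=0.0):
    """Mean hinge penalty over all pairs with different aesthetic scores.

    logits -- discriminator outputs for real images (higher = "more real")
    scores -- aesthetic ratings for the same images (e.g. 1-10)
    """
    total, count = 0.0, 0
    for (logit_a, score_a), (logit_b, score_b) in combinations(zip(logits, scores), 2):
        if score_a == score_b:
            continue  # equal ratings carry no ordering information
        if score_a < score_b:
            low, high = logit_a, logit_b
        else:
            low, high = logit_b, logit_a
        # Penalize only when the lower-rated image is not at least
        # `margin` below the higher-rated one.
        total += max(0.0, low - high + margin)
        count += 1
    return total / count if count else 0.0


# The image rated 1 outranks both images rated 5, so the penalty is positive.
print(aesthetic_ranking_penalty([2.0, 1.0, 0.5], [1, 5, 5]))
```

    In a real training loop the same expression would be written over tensors and added to the discriminator loss with some weight. How large that weight should be is an open question.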

    opened by nom57 1
  • Generate render a video from projected W in stylegan3?

    I am interested in knowing whether video generation from projected w vectors can be replicated in your StyleGAN3 repo. This has been very helpful, thank you @dvschultz!

    Process: project images to npz files (vs npy) -> combine multiple vectors into a single npz file -> generate an interpolation video between the projected images. I am currently testing the process with https://github.com/PDillis/stylegan3-fun, but it lacks video options.

    // !python generate.py --process=interpolation --interpolation=linear --easing=easeInOutQuad --space=w --network=/content/ladiesblack.pkl --outdir=/content/combined-proj/ --projected-w=/content/npz/combined.npz --frames=120
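    The command above combines linear interpolation with the easeInOutQuad easing over 120 frames. A minimal sketch of what that combination means between two projected vectors follows, using plain lists in place of real W arrays. The function names are hypothetical and this is not the repo's generate.py. It only shows the standard easing formula applied to the blend factor.

```python
def ease_in_out_quad(t):
    """Standard easeInOutQuad: slow start, fast middle, slow end, t in [0, 1]."""
    if t < 0.5:
        return 2.0 * t * t
    return 1.0 - ((-2.0 * t + 2.0) ** 2) / 2.0


def interpolate_w(w_start, w_end, frames):
    """Return `frames` vectors blending linearly from w_start to w_end.

    The easing reshapes how fast the blend factor moves, not the path:
    every frame still lies on the straight line between the endpoints.
    """
    steps = []
    for i in range(frames):
        t = i / (frames - 1) if frames > 1 else 0.0
        a = ease_in_out_quad(t)
        steps.append([(1.0 - a) * s + a * e for s, e in zip(w_start, w_end)])
    return steps


# Five frames between two toy 2-element "latents".
for frame in interpolate_w([0.0, 0.0], [1.0, 2.0], 5):
    print(frame)
```

    With more than two projected images, the same function can be applied to each consecutive pair and the frame lists concatenated.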

    opened by georgeguida 0
  • Problem with metrics

    It appears I can no longer calculate metrics. An attempt to limit the number of DataLoader workers to 2 or 1 via the --workers argument didn't work; the DataLoader still tries to run 3 workers.

    Evaluating metrics...
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      cpuset_checked))
    /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
      return forward_call(*input, **kwargs)
    /usr/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 28 leaked semaphores to clean up at shutdown
      len(cache))
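    The first warning says the loader asks for 3 workers on a machine where only 2 are suggested. A possible workaround, sketched here under the assumption that the worker count used for metrics can be edited wherever that DataLoader is built, is to clamp the requested count to what the system offers. The helper clamp_workers is hypothetical and where it would be called in the repo is not shown in the issue.

```python
import os


def clamp_workers(requested, available=None):
    """Limit a DataLoader worker count to the CPUs usable by this process."""
    if available is None:
        if hasattr(os, "sched_getaffinity"):
            # Linux: counts only the CPUs this process may run on,
            # which can be fewer than os.cpu_count() in containers.
            available = len(os.sched_getaffinity(0))
        else:
            available = os.cpu_count() or 1
    return max(0, min(requested, available))


# On the 2-CPU machine from the log, a request for 3 becomes 2.
print(clamp_workers(3, available=2))
```

    The result would then be passed as num_workers when constructing the torch.utils.data.DataLoader.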

    opened by GorMorGor 0
  • cannot import name 'notf' from 'tensorboard.compat'

    Everything was running just fine for the past 3 months, but since today I keep running into this issue.

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/tensorboard/compat/init.py", line 42, in tf from tensorboard.compat import notf # noqa: F401
    ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.7/dist-packages/tensorboard/compat/init.py)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train.py", line 582, in main() # pylint: disable=no-value-for-parameter
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in call return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke return callback(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/click/decorators.py", line 21, in new_func return f(get_current_context(), *args, **kwargs)
      File "train.py", line 575, in main subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
      File "train.py", line 422, in subprocess_fn training_loop.training_loop(rank=rank, **args)
      File "/content/drive/My Drive/colab-sg2-ada-pytorch/stylegan2-ada-pytorch/training/training_loop.py", line 244, in training_loop stats_tfevents = tensorboard.SummaryWriter(run_dir)
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 220, in init self._get_file_writer()
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 251, in _get_file_writer self.flush_secs, self.filename_suffix)
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 61, in init log_dir, max_queue, flush_secs, filename_suffix)
      File "/usr/local/lib/python3.7/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 72, in init tf.io.gfile.makedirs(logdir)
      File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 65, in getattr return getattr(load_once(self), attr_name)
      File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 97, in wrapper cache[arg] = f(arg)
      File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 50, in load_once module = load_fn()
      File "/usr/local/lib/python3.7/dist-packages/tensorboard/compat/init.py", line 45, in tf import tensorflow
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/init.py", line 51, in from ._api.v2 import compat
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/init.py", line 37, in from . import v1
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/init.py", line 30, in from . import compat
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/compat/init.py", line 37, in from . import v1
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/compat/v1/init.py", line 47, in from tensorflow._api.v2.compat.v1 import lite
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/init.py", line 9, in from . import experimental
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/experimental/init.py", line 8, in from . import authoring
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/experimental/authoring/init.py", line 8, in from tensorflow.lite.python.authoring.authoring import compatible
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/authoring/authoring.py", line 43, in from tensorflow.lite.python import convert
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/convert.py", line 29, in from tensorflow.lite.python import util
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/util.py", line 51, in from jax import xla_computation as _xla_computation
      File "/usr/local/lib/python3.7/dist-packages/jax/init.py", line 59, in from .core import eval_context as ensure_compile_time_eval
      File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 47, in import jax._src.pretty_printer as pp
      File "/usr/local/lib/python3.7/dist-packages/jax/_src/pretty_printer.py", line 56, in CAN_USE_COLOR = _can_use_color()
      File "/usr/local/lib/python3.7/dist-packages/jax/_src/pretty_printer.py", line 54, in _can_use_color return sys.stdout.isatty()
    AttributeError: 'Logger' object has no attribute 'isatty'
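    The final error indicates that whatever object has replaced sys.stdout lacks an isatty() method, which jax calls at import time. A minimal sketch of a stdout replacement that forwards isatty() to the stream it wraps is shown below. The TeeLogger class is hypothetical and is not the repo's Logger. It only illustrates the missing method.

```python
import io
import sys


class TeeLogger:
    """Hypothetical stdout replacement that also records output in memory."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout
        self.buffer = io.StringIO()

    def write(self, text):
        # Forward to the wrapped stream and keep a copy.
        self.stream.write(text)
        self.buffer.write(text)
        return len(text)

    def flush(self):
        self.stream.flush()

    def isatty(self):
        # Code that probes sys.stdout.isatty() gets an answer from the
        # wrapped stream, or False if that stream has no such method.
        isatty = getattr(self.stream, "isatty", None)
        return bool(isatty()) if callable(isatty) else False


# Usage: wrap an in-memory stream; isatty() no longer raises.
logger = TeeLogger(io.StringIO())
logger.write("Evaluating metrics...\n")
print(logger.isatty())
```

    Adding an equivalent method to the existing logger class would be one way to address the AttributeError. Pinning package versions so that tensorboard does not pull in tensorflow and jax may be another.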

    opened by GorMorGor 6
  • Error : size mismatch

    Hi,

    I am trying to convert the stylegan_human_v2_1024.pkl checkpoint from the StyleGAN-Human repo (1024x512 resolution), but I get the following error: [screenshot, 2022-05-29 17:35:20]

    Has anyone encountered this?

    Thanks in advance for your help

    opened by nourgana 0
Owner
Derrick Schultz
Artist who uses code. Most of this stuff isn’t production level; I’m an artist first, programmer second.
StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation

StyleGAN2 with adaptive discriminator augmentation (ADA) — Official TensorFlow implementation Training Generative Adversarial Networks with Limited Da

NVIDIA Research Projects 1.7k Dec 29, 2022
StyleGAN2-ada for practice

This version of the newest PyTorch-based StyleGAN2-ada is intended mostly for fellow artists, who rarely look at scientific metrics, but rather need a working creative tool. Tested on Python 3.7 + PyTorch 1.7.1, requires FFMPEG for sequence-to-video conversions. For more explicit details refer to the original implementations.

vadim epstein 170 Nov 16, 2022
A colab notebook for training Stylegan2-ada on colab, transfer learning onto your own dataset.

Stylegan2-Ada-Google-Colab-Starter-Notebook A no thrills colab notebook for training Stylegan2-ada on colab. transfer learning onto your own dataset h

Harnick Khera 66 Dec 16, 2022
Cartoon-StyleGan2 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Fine-tuning StyleGAN2 for Cartoon Face Generation

Jihye Back 520 Jan 4, 2023
Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Face Identity Disentanglement via Latent Space Mapping - Implement in pytorch with StyleGAN 2 Description Pytorch implementation of the paper Face Ide

Daniel Roich 58 Dec 24, 2022
StyleGAN2 - Official TensorFlow Implementation

StyleGAN2 - Official TensorFlow Implementation

NVIDIA Research Projects 10.1k Dec 28, 2022
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 6, 2022
StyleGAN2 Webtoon / Anime Style Toonify

StyleGAN2 Webtoon / Anime Style Toonify Korea Webtoon or Japanese Anime Character Stylegan2 base high Quality 1024x1024 / 512x512 Generate and Transfe

null 121 Dec 21, 2022
Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.

Pretrained models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet.

Matthias Wright 169 Dec 26, 2022
Fine-tuning StyleGAN2 for Cartoon Face Generation

Cartoon-StyleGAN 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation Abstract Recent studies have shown remarkable success in the unsupervised imag

Jihye Back 520 Jan 4, 2023
A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Awesome Pretrained StyleGAN2 A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution. Note the readme is a

Justin 1.1k Dec 24, 2022
A web porting for NVlabs' StyleGAN2, to facilitate exploring all kinds characteristic of StyleGAN networks

This project is a web porting for NVlabs' StyleGAN2, to facilitate exploring all kinds characteristic of StyleGAN networks. Thanks for NVlabs' excelle

K.L. 150 Dec 15, 2022
ALBERT-pytorch-implementation - ALBERT pytorch implementation

ALBERT-pytorch-implementation developing... An implementation meant to help with understanding the model's concepts; variable names are currently written out in detail, and

BG Kim 3 Oct 6, 2022
Official PyTorch implementation for paper Context Matters: Graph-based Self-supervised Representation Learning for Medical Images

Context Matters: Graph-based Self-supervised Representation Learning for Medical Images Official PyTorch implementation for paper Context Matters: Gra

null 49 Nov 23, 2022
Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

Richard Wang 443 Dec 6, 2022
Official pytorch implementation of paper "Image-to-image Translation via Hierarchical Style Disentanglement".

HiSD: Image-to-image Translation via Hierarchical Style Disentanglement Official pytorch implementation of paper "Image-to-image Translation

null 364 Dec 14, 2022
Official pytorch implementation of paper "Inception Convolution with Efficient Dilation Search" (CVPR 2021 Oral).

IC-Conv This repository is an official implementation of the paper Inception Convolution with Efficient Dilation Search. Getting Started Download Imag

Jie Liu 111 Dec 31, 2022
Official PyTorch Implementation of Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

UnRigidFlow This is the official PyTorch implementation of UnRigidFlow (IJCAI2019). Here are two sample results (~10MB gif for each) of our unsupervis

Liang Liu 28 Nov 16, 2022
Official implementation of our paper "LLA: Loss-aware Label Assignment for Dense Pedestrian Detection" in Pytorch.

LLA: Loss-aware Label Assignment for Dense Pedestrian Detection This project provides an implementation for "LLA: Loss-aware Label Assignment for Dens

null 35 Dec 6, 2022