StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation

Overview

Teaser image

Training Generative Adversarial Networks with Limited Data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
https://arxiv.org/abs/2006.06676

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Looking for the PyTorch version?

The official PyTorch version is now available and supersedes the TensorFlow version. See the full list of versions here.

What's new

This repository supersedes the original StyleGAN2 with the following new features:

  • ADA: Significantly better results for datasets with less than ~30k training images. State-of-the-art results for CIFAR-10.
  • Mixed-precision support: ~1.6x faster training, ~1.3x faster inference, ~1.5x lower GPU memory consumption.
  • Better hyperparameter defaults: Reasonable out-of-the-box results for different dataset resolutions and GPU counts.
  • Clean codebase: Extensive refactoring and simplification. The code should be generally easier to work with.
  • Command line tools: Easily reproduce training runs from the paper, generate projection videos for arbitrary images, etc.
  • Network import: Full support for network pickles produced by StyleGAN and StyleGAN2. Faster loading times.
  • Augmentation pipeline: Self-contained, reusable GPU implementation of extensive high-quality image augmentations.
  • Bugfixes

External data repository

Path Description
stylegan2-ada Main directory hosted on Amazon S3
  ├  ada-paper.pdf Paper PDF
  ├  images Curated example images produced using the pre-trained models
  ├  videos Curated example interpolation videos
  └  pretrained Pre-trained models
    ├  metfaces.pkl MetFaces at 1024x1024, transfer learning from FFHQ using ADA
    ├  brecahad.pkl BreCaHAD at 512x512, trained from scratch using ADA
    ├  afhqcat.pkl AFHQ Cat at 512x512, trained from scratch using ADA
    ├  afhqdog.pkl AFHQ Dog at 512x512, trained from scratch using ADA
    ├  afhqwild.pkl AFHQ Wild at 512x512, trained from scratch using ADA
    ├  cifar10.pkl Class-conditional CIFAR-10 at 32x32
    ├  ffhq.pkl FFHQ at 1024x1024, trained using original StyleGAN2
    ├  paper-fig7c-training-set-sweeps All models used in Fig.7c (baseline, ADA, bCR)
    ├  paper-fig8a-comparison-methods All models used in Fig.8a (comparison methods)
    ├  paper-fig8b-discriminator-capacity All models used in Fig.8b (discriminator capacity)
    ├  paper-fig11a-small-datasets All models used in Fig.11a (small datasets, transfer learning)
    ├  paper-fig11b-cifar10 All models used in Fig.11b (CIFAR-10)
    ├  transfer-learning-source-nets Models used as starting point for transfer learning
    └  metrics Feature detectors used by the quality metrics

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 64-bit Python 3.6 or 3.7. We recommend Anaconda3 with numpy 1.14.3 or newer.
  • We recommend TensorFlow 1.14, which we used for all experiments in the paper, but TensorFlow 1.15 is also supported on Linux. TensorFlow 2.x is not supported.
  • On Windows you need to use TensorFlow 1.14, as the standard 1.15 installation does not include necessary C++ headers.
  • 1–8 high-end NVIDIA GPUs with at least 12 GB of GPU memory, NVIDIA drivers, CUDA 10.0 toolkit and cuDNN 7.5.
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The generator and discriminator networks rely heavily on custom TensorFlow ops that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio to be in PATH. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat".
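
To verify that this on-the-fly compilation works in your environment before launching a long run, you can trigger it directly from Python. The following is a minimal smoke test, assuming the repository root is the working directory; the first use of the op compiles fused_bias_act.cu via NVCC:

import numpy as np
import tensorflow as tf

import dnnlib.tflib as tflib
from dnnlib.tflib.ops.fused_bias_act import fused_bias_act

tflib.init_tf()

# Building and running this op forces the CUDA plugin to compile.
x = tf.constant(np.random.randn(4, 8).astype(np.float32))
y = fused_bias_act(x, act='lrelu')
print(tflib.run(y).shape)  # (4, 8) if compilation succeeded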

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl

# Generate uncurated MetFaces images with truncation (Fig.12 upper left)
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl

# Generate class conditional CIFAR-10 images (Fig.17 left, Car)
python generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl

Outputs from the above commands are placed under out/*.png. You can change the location with --outdir. Temporary cache files, such as CUDA build results and downloaded network pickles, will be saved under $HOME/.cache/dnnlib. This can be overridden using the DNNLIB_CACHE_DIR environment variable.
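
The same pickles can also be used programmatically. Below is a minimal sketch along the lines of generate.py, assuming the repository root is on the Python path; each pickle contains three networks, and Gs (the long-term average of the generator weights) is the one used for inference:

import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()

# Download (or open) the pickle; it unpickles to three networks: G, D, and Gs.
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl'
with dnnlib.util.open_url(url) as fp:
    _G, _D, Gs = pickle.load(fp)

# Sample one latent vector and render it as a uint8 NHWC image.
z = np.random.RandomState(85).randn(1, *Gs.input_shape[1:])
images = Gs.run(z, None, truncation_psi=1.0, randomize_noise=False,
                output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True))
PIL.Image.fromarray(images[0], 'RGB').save('seed0085.png')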

Docker: You can run the above curated image example using Docker as follows:

docker build --tag stylegan2ada:latest .
docker run --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c \
    "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python3 generate.py --trunc=1 --seeds=85,265,297,849 \
    --outdir=out --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl)"

Note: The above defaults to a container base image that requires NVIDIA driver release r455.23 or later. To build an image for older drivers and GPUs, run:

docker build --build-arg BASE_IMAGE=tensorflow/tensorflow:1.14.0-gpu-py3 --tag stylegan2ada:latest .

Projecting images to latent space

To find the matching latent vector for a given image file, run:

python projector.py --outdir=out --target=targetimg.png \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl

For optimal results, the target image should be cropped and aligned similarly to the original FFHQ dataset. The above command saves the projection target out/target.png, the result out/proj.png, the latent vector out/dlatents.npz, and the progression video out/proj.mp4. You can render an image from the resulting latent vector by passing --dlatents to generate.py:

python generate.py --outdir=out --dlatents=out/dlatents.npz \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl
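
To work with the projected vector programmatically rather than through generate.py, the saved dlatents can be fed straight into the synthesis component. A sketch mirroring what generate.py does with --dlatents (the 'dlatents' key is the name written by projector.py):

import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl'
with dnnlib.util.open_url(url) as fp:
    _G, _D, Gs = pickle.load(fp)

# Projected latent in W+ space: shape [1, num_layers, 512] for FFHQ at 1024x1024.
dlatents = np.load('out/dlatents.npz')['dlatents']
images = Gs.components.synthesis.run(
    dlatents, output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True))
PIL.Image.fromarray(images[0], 'RGB').save('out/proj_render.png')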

Preparing datasets

Datasets are stored as multi-resolution TFRecords, i.e., the same format used by StyleGAN and StyleGAN2. Each dataset consists of multiple *.tfrecords files stored under a common directory, e.g., ~/datasets/ffhq/ffhq-r*.tfrecords.
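
To sanity-check a converted dataset outside of dataset_tool.py display, you can read the records directly. The sketch below assumes the record layout used by dataset_tool.py, where each example stores an int64 'shape' ([channels, height, width]) and the raw uint8 pixel 'data'; the file name is just an example:

import os
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

def inspect_tfrecord(path, max_records=3):
    # Assumed layout from dataset_tool.py: 'shape' = [c, h, w], 'data' = raw uint8 pixels (CHW).
    for i, record in enumerate(tf.python_io.tf_record_iterator(path)):
        ex = tf.train.Example()
        ex.ParseFromString(record)
        shape = list(ex.features.feature['shape'].int64_list.value)
        data = ex.features.feature['data'].bytes_list.value[0]
        img = np.frombuffer(data, dtype=np.uint8).reshape(shape)
        print(i, img.shape, img.dtype, img.min(), img.max())
        if i + 1 >= max_records:
            break

inspect_tfrecord(os.path.expanduser('~/datasets/ffhq/ffhq-r10.tfrecords'))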

MetFaces: Download the MetFaces dataset and convert to TFRecords:

python dataset_tool.py create_from_images ~/datasets/metfaces ~/downloads/metfaces/images
python dataset_tool.py display ~/datasets/metfaces

BreCaHAD: Download the BreCaHAD dataset. Generate 512x512 resolution crops and convert to TFRecords:

python dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

python dataset_tool.py create_from_images ~/datasets/brecahad /tmp/brecahad-crops
python dataset_tool.py display ~/datasets/brecahad

AFHQ: Download the AFHQ dataset and convert to TFRecords:

python dataset_tool.py create_from_images ~/datasets/afhqcat ~/downloads/afhq/train/cat
python dataset_tool.py create_from_images ~/datasets/afhqdog ~/downloads/afhq/train/dog
python dataset_tool.py create_from_images ~/datasets/afhqwild ~/downloads/afhq/train/wild
python dataset_tool.py display ~/datasets/afhqcat

CIFAR-10: Download the CIFAR-10 python version. Convert to two separate TFRecords for unconditional and class-conditional training:

python dataset_tool.py create_cifar10 --ignore_labels=1 \
    ~/datasets/cifar10u ~/downloads/cifar-10-batches-py

python dataset_tool.py create_cifar10 --ignore_labels=0 \
    ~/datasets/cifar10c ~/downloads/cifar-10-batches-py

python dataset_tool.py display ~/datasets/cifar10c

FFHQ: Download the Flickr-Faces-HQ dataset as TFRecords:

pushd ~
git clone https://github.com/NVlabs/ffhq-dataset.git
cd ffhq-dataset
python download_ffhq.py --tfrecords
popd
python dataset_tool.py display ~/ffhq-dataset/tfrecords/ffhq

LSUN: Download the desired LSUN categories in LMDB format from the LSUN project page and convert to TFRecords:

python dataset_tool.py create_lsun --resolution=256 --max_images=200000 \
    ~/datasets/lsuncat200k ~/downloads/lsun/cat_lmdb

python dataset_tool.py display ~/datasets/lsuncat200k

Custom: Custom datasets can be created by placing all images under a single directory. The images must be square-shaped and they must all have the same power-of-two dimensions. To convert the images to multi-resolution TFRecords, run:

python dataset_tool.py create_from_images ~/datasets/custom ~/custom-images
python dataset_tool.py display ~/datasets/custom
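
If your images are not yet square with power-of-two dimensions, a small preprocessing pass can bring them into shape first. A sketch using Pillow; the center-crop policy and the 512 target size are just example choices:

import os
from PIL import Image

def make_square_pow2(src_dir, dst_dir, size=512):
    # Center-crop each image to a square, then resize to size x size (size must be a power of two).
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        w, h = img.size
        s = min(w, h)
        left, top = (w - s) // 2, (h - s) // 2
        img = img.crop((left, top, left + s, top + s))
        img.resize((size, size), Image.LANCZOS).save(os.path.join(dst_dir, name))

make_square_pow2(os.path.expanduser('~/custom-images-raw'), os.path.expanduser('~/custom-images'))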

Training new networks

In its most basic form, training new networks boils down to:

python train.py --outdir=~/training-runs --gpus=1 --data=~/datasets/custom --dry-run
python train.py --outdir=~/training-runs --gpus=1 --data=~/datasets/custom

The first command is optional; it will validate the arguments, print out the resulting training configuration, and exit. The second command will kick off the actual training.

In this example, the results will be saved to a newly created directory ~/training-runs/<RUNNING_ID>-custom-auto1 (controlled by --outdir). The training will export network pickles (network-snapshot-<KIMG>.pkl) and example images (fakes<KIMG>.png) at regular intervals (controlled by --snap). For each pickle, it will also evaluate FID by default (controlled by --metrics) and log the resulting scores in metric-fid50k_full.txt.
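
If you want to track FID programmatically instead of reading the text file by hand, a small parser like the following sketch works; it assumes each line of metric-fid50k_full.txt names the snapshot and contains a 'fid50k_full <value>' pair, so verify the format against your own log first:

import os
import re

def read_fid_log(run_dir):
    # Assumed line format: 'network-snapshot-000123 ... fid50k_full 12.3456'.
    results = []
    with open(os.path.join(run_dir, 'metric-fid50k_full.txt')) as f:
        for line in f:
            snap = re.search(r'network-snapshot-(\d+)', line)
            fid = re.search(r'fid50k_full\s+([0-9.]+)', line)
            if snap and fid:
                results.append((int(snap.group(1)), float(fid.group(1))))
    return results  # list of (kimg, fid)

for kimg, fid in read_fid_log(os.path.expanduser('~/training-runs/00000-custom-auto1')):
    print(f'{kimg} kimg: FID {fid:.2f}')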

The name of the output directory (e.g., 00000-custom-auto1) reflects the hyperparameter configuration that was used. In this case, custom indicates the training set (--data) and auto1 indicates the base configuration that was used to select the hyperparameters (--cfg):

Base config      Description
auto (default)   Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets, but does not necessarily lead to optimal results.
stylegan2        Reproduce results for StyleGAN2 config F at 1024x1024 using 1, 2, 4, or 8 GPUs.
paper256         Reproduce results for FFHQ and LSUN Cat at 256x256 using 1, 2, 4, or 8 GPUs.
paper512         Reproduce results for BreCaHAD and AFHQ at 512x512 using 1, 2, 4, or 8 GPUs.
paper1024        Reproduce results for MetFaces at 1024x1024 using 1, 2, 4, or 8 GPUs.
cifar            Reproduce results for CIFAR-10 (tuned configuration) using 1 or 2 GPUs.
cifarbaseline    Reproduce results for CIFAR-10 (baseline configuration) using 1 or 2 GPUs.

The training configuration can be further customized with additional arguments. Common examples:

  • --aug=noaug disables ADA (default: enabled).
  • --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
  • --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
  • --resume=~/training-runs/<RUN_NAME>/network-snapshot-<KIMG>.pkl resumes where a previous training run left off.
  • --gamma=10 overrides R1 gamma. We strongly recommend trying out at least a few different values for each new dataset.

Augmentation fine-tuning:

  • --aug=ada --target=0.7 adjusts ADA target value (default: 0.6).
  • --aug=adarv selects the alternative ADA heuristic (requires a separate validation set).
  • --augpipe=blit limits the augmentation pipeline to pixel blitting only.
  • --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).
  • --cmethod=bcr enables bCR with small integer translations.

Please refer to python train.py --help for the full list.

Expected training time

The total training time depends heavily on the resolution, number of GPUs, desired quality, dataset, and hyperparameters. In general, the training time can be expected to scale linearly with resolution and inversely with the number of GPUs. Small datasets tend to reach their lowest achievable FID faster than larger ones, but their convergence is somewhat less predictable. Transfer learning tends to converge significantly faster than training from scratch.

To give a rough idea of typical training times, the following figure shows several examples of FID as a function of wallclock time. Each curve corresponds to training a given dataset from scratch using --cfg=auto with a given number of NVIDIA Tesla V100 GPUs:

Training curves

Please note that --cfg=auto only serves as a reasonable first guess for the hyperparameters — it does not necessarily lead to optimal results for a given dataset. For example, --cfg=stylegan2 yields considerably better FID for FFHQ-140k at 1024x1024 than illustrated above. We recommend trying out at least a few different values of --gamma for each new dataset.

Preparing training set sweeps

In the paper, we perform several experiments using artificially limited/amplified versions of the training data, such as ffhq30k, ffhq140k, and lsuncat30k. These are constructed by first unpacking the original dataset into a temporary directory with python dataset_tool.py unpack and then repackaging the appropriate versions into TFRecords with python dataset_tool.py pack. In the following examples, the temporary directories are created under /tmp and can be safely deleted afterwards.

# Unpack FFHQ images at 256x256 resolution.
python dataset_tool.py unpack --resolution=256 \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

# Create subset with 30k images.
python dataset_tool.py pack --num_train=30000 --num_validation=10000 --seed=123 \
    --tfrecord_dir=~/datasets/ffhq30k --unpacked_dir=/tmp/ffhq-unpacked

# Create amplified version with 140k images.
python dataset_tool.py pack --num_train=70000 --num_validation=0 --mirror=1 --seed=123 \
    --tfrecord_dir=~/datasets/ffhq140k --unpacked_dir=/tmp/ffhq-unpacked

# Unpack LSUN Cat images at 256x256 resolution.
python dataset_tool.py unpack --resolution=256 \
    --tfrecord_dir=~/datasets/lsuncat200k --output_dir=/tmp/lsuncat200k-unpacked

# Create subset with 30k images.
python dataset_tool.py pack --num_train=30000 --num_validation=10000 --seed=123 \
    --tfrecord_dir=~/datasets/lsuncat30k --unpacked_dir=/tmp/lsuncat200k-unpacked

Please note that when training with artificially limited/amplified datasets, the quality metrics (e.g., fid50k_full) should still be evaluated against the corresponding original datasets. This can be done by specifying a separate metric dataset for train.py and calc_metrics.py using the --metricdata argument. For example:

python train.py [OTHER_OPTIONS] --data=~/datasets/ffhq30k --metricdata=~/ffhq-dataset/tfrecords/ffhq

Reproducing training runs from the paper

The pre-trained network pickles (stylegan2-ada/pretrained/paper-fig*) reflect the training configuration the same way as the output directory names, making it straightforward to reproduce a given training run from the paper. For example:

# 1. AFHQ Dog
# paper-fig11a-small-datasets/afhqdog-mirror-paper512-ada.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/afhqdog \
    --mirror=1 --cfg=paper512 --aug=ada

# 2. Class-conditional CIFAR-10
# pretrained/paper-fig11b-cifar10/cifar10c-cifar-ada-best-fid.pkl
python train.py --outdir=~/training-runs --gpus=2 --data=~/datasets/cifar10c \
    --cfg=cifar --aug=ada

# 3. MetFaces with transfer learning from FFHQ
# paper-fig11a-small-datasets/metfaces-mirror-paper1024-ada-resumeffhq1024.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/metfaces \
    --mirror=1 --cfg=paper1024 --aug=ada --resume=ffhq1024 --snap=10

# 4. 10k subset of FFHQ with ADA and bCR
# paper-fig7c-training-set-sweeps/ffhq10k-paper256-ada-bcr.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/ffhq10k \
    --cfg=paper256 --aug=ada --cmethod=bcr --metricdata=~/ffhq-dataset/tfrecords/ffhq

# 5. StyleGAN2 config F
# transfer-learning-source-nets/ffhq-res1024-mirror-stylegan2-noaug.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/ffhq-dataset/tfrecords/ffhq \
    --res=1024 --mirror=1 --cfg=stylegan2 --aug=noaug --metrics=fid50k

Notes:

  • You can use fewer GPUs than shown in the above examples. This will only increase the training time — it will not affect the quality of the results.
  • Example 3 specifies --snap=10 to export network pickles more frequently than usual. This is recommended, because transfer learning tends to yield very fast convergence.
  • Example 4 specifies --metricdata to evaluate quality metrics against the original FFHQ dataset, not the artificially limited 10k subset used for training.
  • Example 5 specifies --metrics=fid50k to evaluate FID the same way as in the StyleGAN2 paper (see below).

Quality metrics

By default, train.py will automatically compute FID for each network pickle. We strongly recommend inspecting metric-fid50k_full.txt at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics none to speed up the training.

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to text file.
python calc_metrics.py --metrics=pr50k3_full \
    --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

# Pretrained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --metricdata=~/datasets/ffhq --mirror=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl

The first example will automatically find training_options.json stored alongside the network pickle and perform the same operation as if --metrics pr50k3_full had been specified during training. The second example will download a pre-trained network pickle, in which case the values of --mirror and --metricdata have to be specified explicitly.
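
The options file consulted by the first example is plain JSON, so it is easy to inspect directly; a minimal sketch using the run directory from the example above:

import json
import os

path = os.path.expanduser('~/training-runs/00000-ffhq10k-res64-auto1/training_options.json')
with open(path) as f:
    options = json.load(f)
print(json.dumps(options, indent=2))  # full training configuration for the run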

Note that many of the metrics have a significant one-off cost (up to an hour or more) when they are calculated for the first time using a given dataset. Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.

We employ the following metrics in the ADA paper. The expected execution times correspond to using one Tesla V100 GPU at 1024x1024 and 256x256 resolution:

Metric        1024x1024   256x256   Description
fid50k_full   15 min      5 min     Fréchet inception distance[1] against the full dataset.
kid50k_full   15 min      5 min     Kernel inception distance[2] against the full dataset.
pr50k3_full   20 min      10 min    Precision and recall[3] against the full dataset.
is50k         25 min      5 min     Inception score[4] for CIFAR-10.
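
For reference, fid50k_full implements the standard Fréchet inception distance from [1]: with (μ_r, Σ_r) and (μ_g, Σ_g) denoting the mean and covariance of Inception features for real and generated images,

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

Lower is better; the _full suffix indicates that the real-image statistics are computed over the entire dataset rather than a 50k subset.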

In addition, all metrics that were used in the StyleGAN and StyleGAN2 papers are also supported for backwards compatibility:

Legacy: StyleGAN2   1024x1024   Description
fid50k              15 min      Fréchet inception distance against 50k real images.
kid50k              15 min      Kernel inception distance against 50k real images.
pr50k3              20 min      Precision and recall against 50k real images.
ppl2_wend           40 min      Perceptual path length[5] in W at path endpoints against full image.

Legacy: StyleGAN    1024x1024   Description
ppl_zfull           40 min      Perceptual path length in Z for full paths against cropped image.
ppl_wfull           40 min      Perceptual path length in W for full paths against cropped image.
ppl_zend            40 min      Perceptual path length in Z at path endpoints against cropped image.
ppl_wend            40 min      Perceptual path length in W at path endpoints against cropped image.
ls                  10 hrs      Linear separability[5] with respect to CelebA attributes.

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Bińkowski et al. 2018
  3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkäänniemi et al. 2019
  4. Improved Techniques for Training GANs, Salimans et al. 2016
  5. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018

License

Copyright © 2020, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License.

Citation

@inproceedings{Karras2020ada,
  title     = {Training Generative Adversarial Networks with Limited Data},
  author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle = {Proc. NeurIPS},
  year      = {2020}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke for helpful comments; Tero Kuosmanen and Sabu Nadarajan for their support with compute infrastructure; and Edgar Schönfeld for guidance on setting up unconditional BigGAN.

Comments
  • rtx 3000 series broken compatibility

    I tried to install the NVIDIA driver (455) myself on Ubuntu 18.04 with Python 3.7 and TensorFlow 1.14 (I also tried 1.15). It always said it couldn't find a GPU when trying to start training (or failed with other errors, like attempting and failing to import cublas.10 files while I had CUDA 11 installed instead). I have an RTX 3090 Founders Edition GPU. I tried different approaches, reinstalling things, and wasted more than 10 hours; it never worked for me. It was working on my Titan RTX, though, on a few different computer rigs. Finally, since the maintainers claimed it works on their end for RTX 3000, I thought I could try their Docker container. It didn't work initially; then I realized I had a few more steps to do, so I installed nvidia-docker2 (nvidia-container-toolkit), thinking that it should certainly work. Unfortunately, it raises errors again:

    Output directory: ./results/00015-jjl_1024-mirror-24gb-gpu-bg-resumeffhq1024
    Training data: ./datasets/jjl_1024
    Training length: 25000 kimg
    Resolution: 1024
    Number of GPUs: 1

    Creating output directory...
    Loading training set...
    Image shape: [3, 1024, 1024]
    Label shape: [0]

    Constructing networks...
    Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Failed!
    Traceback (most recent call last):
      File "train.py", line 591, in <module>
        main()
      File "train.py", line 583, in main
        run_training(**vars(args))
      File "train.py", line 473, in run_training
        training_loop.training_loop(**training_options)
      File "/var/www/training/training_loop.py", line 123, in training_loop
        Gs = G.clone('Gs')
      File "/var/www/dnnlib/tflib/network.py", line 457, in clone
        net.copy_vars_from(self)
      File "/var/www/dnnlib/tflib/network.py", line 490, in copy_vars_from
        src_net._get_vars()
      File "/var/www/dnnlib/tflib/network.py", line 297, in _get_vars
        self._vars = OrderedDict(self._get_own_vars())
      File "/var/www/dnnlib/tflib/network.py", line 286, in _get_own_vars
        self._init_graph()
      File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
        out_expr = self._build_func(*self._input_templates, **build_kwargs)
      File "/var/www/training/networks.py", line 231, in G_main
        num_layers = components.synthesis.input_shape[1]
      File "/var/www/dnnlib/tflib/network.py", line 232, in input_shape
        return self.input_shapes[0]
      File "/var/www/dnnlib/tflib/network.py", line 219, in input_shapes
        self._input_shapes = [t.shape.as_list() for t in self.input_templates]
      File "/var/www/dnnlib/tflib/network.py", line 267, in input_templates
        self._init_graph()
      File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
        out_expr = self._build_func(*self._input_templates, **build_kwargs)
      File "/var/www/training/networks.py", line 439, in G_synthesis
        x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
      File "/var/www/training/networks.py", line 392, in layer
        x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
      File "/var/www/training/networks.py", line 105, in modulated_conv2d_layer
        s = apply_bias_act(s, bias_var='mod_bias', trainable=trainable) + 1 # [BI] Add bias (initially 1).
      File "/var/www/training/networks.py", line 50, in apply_bias_act
        return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, gain=gain, clamp=clamp)
      File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
        return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
      File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
        cuda_op = _get_plugin().fused_bias_act
      File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
        return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
      File "/var/www/dnnlib/tflib/custom_ops.py", line 159, in get_plugin
        _run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
      File "/var/www/dnnlib/tflib/custom_ops.py", line 69, in _run_cmd
        raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
    RuntimeError: NVCC returned an error. See below for full command line and output log:

    nvcc --compiler-options '-fPIC' --compiler-options '-I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0' --linker-options '-L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.1' --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/com_google_absl" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/eigen_archive" 2>&1 "/var/www/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmp4dn1nm6o/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmp4dn1nm6o"

    nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

    By googling I found that similar errors (e.g. sm_75) occur when there are code/CUDA/driver compatibility issues. At least that's what people say. Please help with a decent working container version at least.

    opened by JulianPinzaru 38
  • RTX 30x0 Support

    Your current docker image relies on an older version of CUDA. The current 3080 and 3090 series GPUs are only supported under CUDA 11.1. It would be wonderful if you could update the image or offer a workaround.

    Error message when running with older CUDA: nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

    opened by reflare 13
  • Resume from the latest pickle

    Hello,

    I know you don't accept pull requests. However:

    • this could be of interest to others who want to run the code on Google Colab,
    • this is the first place where they will look for such a change.

    I have added the ability to resume from the latest .pkl file with the command-line argument --resume=latest. The value of cur_nimg is inferred from the file name. I have yet to figure out how to automatically compute the relevant value of aug.strength to resume from.
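
    For reference, the lookup described above only takes a few lines; a sketch assuming the standard network-snapshot-<KIMG>.pkl naming, with cur_nimg recovered from the kimg count in the file name:

    import glob
    import os
    import re

    def latest_snapshot(run_dir):
        # Pick the newest network-snapshot-<KIMG>.pkl and recover cur_nimg from its name.
        pkls = sorted(glob.glob(os.path.join(run_dir, 'network-snapshot-*.pkl')))
        if not pkls:
            raise FileNotFoundError('no snapshots in ' + run_dir)
        kimg = int(re.search(r'network-snapshot-(\d+)\.pkl$', pkls[-1]).group(1))
        return pkls[-1], kimg * 1000  # (resume_pkl, cur_nimg)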

    opened by woctezuma 10
  • how to create dataset with label

    If I want to fine-tune stylegan2-ada with conditional labels, how do I prepare the dataset?

    The readme only covers the conditional case for the CIFAR-10 dataset; how do I do the same for my own dataset?

    What is the format of the data?

    opened by Johnson-yue 9
  • No GPU devices found

    Hi, when I run the following commands on a p2.xlarge deep learning AMI in AWS:

    docker build --tag stylegan2ada:latest .
    
    docker run --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c \
        "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python3 generate.py --trunc=1 --seeds=85,265,297,849 \
        --outdir=out --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl)"
    

    I get this error:

    NVIDIA Release 20.10-tf1 (build 16775850)
    TensorFlow Version 1.15.4
    
    Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
    Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.
    
    NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
    ERROR: No supported GPU(s) detected to run this container
    
    NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.
    
    NOTE: MOFED driver for multi-node communication was not detected.
          Multi-node communication performance may be reduced.
    
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
       insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
    
    2021-01-10 16:18:03.840894: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
    2021-01-10 16:18:08.234607: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300075000 Hz
    2021-01-10 16:18:08.236149: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50ff110 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2021-01-10 16:18:08.236185: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2021-01-10 16:18:08.241208: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
    2021-01-10 16:18:08.398652: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.399598: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5174bd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2021-01-10 16:18:08.399631: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
    2021-01-10 16:18:08.399902: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.400718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:00:1e.0
    2021-01-10 16:18:08.400787: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    2021-01-10 16:18:08.437007: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
    2021-01-10 16:18:08.461765: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
    2021-01-10 16:18:08.469196: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
    2021-01-10 16:18:08.507750: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
    2021-01-10 16:18:08.516625: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
    2021-01-10 16:18:08.516919: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
    2021-01-10 16:18:08.517138: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.518088: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.518890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Ignoring visible gpu device (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7) with Cuda compute capability 3.7. The minimum required Cuda capability is 5.2.
    2021-01-10 16:18:08.518934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
    2021-01-10 16:18:08.518951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
    2021-01-10 16:18:08.518973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
    Loading networks from "https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl"...
    Downloading https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl ... done
    Setting up TensorFlow plugin "fused_bias_act.cu": 2021-01-10 16:18:33.824257: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:33.825117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:00:1e.0
    2021-01-10 16:18:33.825170: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    2021-01-10 16:18:33.825214: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
    2021-01-10 16:18:33.825252: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
    2021-01-10 16:18:33.825288: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
    2021-01-10 16:18:33.825324: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
    2021-01-10 16:18:33.825354: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
    2021-01-10 16:18:33.825392: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
    2021-01-10 16:18:33.825525: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:33.826419: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:33.827220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Ignoring visible gpu device (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7) with Cuda compute capability 3.7. The minimum required Cuda capability is 5.2.
    2021-01-10 16:18:33.827261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
    2021-01-10 16:18:33.827278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
    2021-01-10 16:18:33.827295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
    Failed!
    Traceback (most recent call last):
      File "generate.py", line 121, in <module>
        main()
      File "generate.py", line 116, in main
        generate_images(**vars(args))
      File "generate.py", line 52, in generate_images
        noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]
      File "/scratch/dnnlib/tflib/network.py", line 293, in vars
        return copy.copy(self._get_vars())
      File "/scratch/dnnlib/tflib/network.py", line 297, in _get_vars
        self._vars = OrderedDict(self._get_own_vars())
      File "/scratch/dnnlib/tflib/network.py", line 286, in _get_own_vars
        self._init_graph()
      File "/scratch/dnnlib/tflib/network.py", line 151, in _init_graph
        out_expr = self._build_func(*self._input_templates, **build_kwargs)
      File "<string>", line 431, in G_synthesis
      File "<string>", line 384, in layer
      File "<string>", line 97, in modulated_conv2d_layer
      File "<string>", line 42, in apply_bias_act
      File "/scratch/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
        return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
      File "/scratch/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
        cuda_op = _get_plugin().fused_bias_act
      File "/scratch/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
        return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
      File "/scratch/dnnlib/tflib/custom_ops.py", line 139, in get_plugin
        compile_opts += f' --gpu-architecture={_get_cuda_gpu_arch_string()}'
      File "/scratch/dnnlib/tflib/custom_ops.py", line 60, in _get_cuda_gpu_arch_string
        raise RuntimeError('No GPU devices found')
    RuntimeError: No GPU devices found
    

    which is unexpected, since when I run bash in the Docker container

    docker run --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash
    

    I get

    I have no name!@c6fb7621777c:/workspace$ nvidia-smi
    Sun Jan 10 16:22:15 2021       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   38C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    I have no name!@c6fb7621777c:/workspace$ 
    

    and when I run nvcc in the docker container

    I have no name!@c6fb7621777c:/workspace$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Tue_Sep_15_19:10:02_PDT_2020
    Cuda compilation tools, release 11.1, V11.1.74
    Build cuda_11.1.TC455_06.29069683_0
    

    Any suggestions?

    opened by matthewchung74 6
  • ffhq1024 fakes_init look bad

    Upon running a straight, out-of-the-box train.py with --resume=ffhq1024, the "fakes_init.png" looks very weird: https://storage.googleapis.com/public-assets-xander/fakes_init.jpg

    opened by aiXander 6
  • NVCC error compiling fused_bias_act.cpp

    I get an error when running python run_generator.py generate-images ....

    I solved some of the issues by following this: https://stackoverflow.com/questions/59342888/tensorflow-error-this-file-requires-compiler-and-library-support-for-the-iso-c#.

    Error:

    dnnlib: Running run_generator.generate_images() on localhost...
    Loading networks from "./networks/stylegan2-ffhq-config-f.pkl"...
    Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Compiling... Failed!
    Traceback (most recent call last):
      File "run_generator.py", line 168, in <module>
        main()
      File "run_generator.py", line 163, in main
        dnnlib.submit_run(sc, func_name_map[subcmd], **kwargs)
      File "/home/matjazibb/dev/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
        return farm.submit(submit_config, host_run_dir)
      File "/home/matjazibb/dev/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
        return run_wrapper(submit_config)
      File "/home/matjazibb/dev/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
        run_func_obj(**submit_config.run_func_kwargs)
      File "/home/matjazibb/dev/stylegan2/run_generator.py", line 21, in generate_images
        _G, _D, Gs = pretrained_networks.load_networks(network_pkl)
      File "/home/matjazibb/dev/stylegan2/pretrained_networks.py", line 76, in load_networks
        G, D, Gs = pickle.load(stream, encoding='latin1')
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/network.py", line 297, in __setstate__
        self._init_graph()
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/network.py", line 154, in _init_graph
        out_expr = self._build_func(*self.input_templates, **build_kwargs)
      File "<string>", line 491, in G_synthesis_stylegan2
      File "<string>", line 455, in layer
      File "<string>", line 99, in modulated_conv2d_layer
      File "<string>", line 68, in apply_bias_act
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
        return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
        cuda_kernel = _get_plugin().fused_bias_act
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
        return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/custom_ops.py", line 147, in get_plugin
        _run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/custom_ops.py", line 61, in _run_cmd
        raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
    RuntimeError: NVCC returned an error. See below for full command line and output log:
    
    nvcc --std=c++11 -DNDEBUG "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so" --compiler-options '-fPIC -D_GLIBCXX_USE_CXX11_ABI=1' --gpu-architecture=sm_52 --use_fast_math --disable-warnings --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include" --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/external/com_google_absl" --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/external/eigen_archive" 2>&1 "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmpn331x8yd/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmpn331x8yd"
    
    /usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined
    
    /usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined
    
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/strings/str_cat.h(268): error: expression must have a constant value
    
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/strings/str_cat.h(268): error: expression must have a constant value
    
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/memory/memory.h(616): error: class "std::allocator<tensorflow::OpKernelContext::WrappedAllocator>" has no member "is_nothrow"
              detected during:
                instantiation of type "absl::memory_internal::GetIsNothrow<std::allocator<tensorflow::OpKernelContext::WrappedAllocator>>" 
    (264): here
                instantiation of type "absl::memory_internal::ExtractOrT<absl::memory_internal::GetIsNothrow, std::allocator<tensorflow::OpKernelContext::WrappedAllocator>, std::false_type>" 
    (642): here
                instantiation of class "absl::allocator_is_nothrow<Alloc> [with Alloc=std::allocator<tensorflow::OpKernelContext::WrappedAllocator>]" 
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/container/inlined_vector.h(190): here
                instantiation of "absl::InlinedVector<T, N, A>::InlinedVector(absl::InlinedVector<T, N, A> &&) [with T=tensorflow::OpKernelContext::WrappedAllocator, N=4UL, A=std::allocator<tensorflow::OpKernelContext::WrappedAllocator>]" 
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h(1081): here
    
    5 errors detected in the compilation of "/tmp/tmpn331x8yd/fused_bias_act.cpp1.ii".
    

    OS info:

    Distributor ID: Ubuntu
    Description:    Ubuntu 16.04.6 LTS
    Release:        16.04
    Codename:       xenial
    

    GPU info:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 980 Ti  Off  | 00000000:01:00.0 Off |                  N/A |
    | 20%   40C    P0    55W / 260W |      0MiB /  6082MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    

    nvcc:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2015 NVIDIA Corporation
    Built on Tue_Aug_11_14:27:32_CDT_2015
    Cuda compilation tools, release 7.5, V7.5.17
    

    g++:

    g++ --version
    g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    

    Conda env:

    _libgcc_mutex             0.1                        main  
    _tflow_select             2.1.0                       gpu  
    absl-py                   0.11.0             pyhd3eb1b0_1  
    astor                     0.8.1                    py36_0  
    blas                      1.0                         mkl  
    brotlipy                  0.7.0           py36h27cfd23_1003  
    c-ares                    1.17.1               h27cfd23_0  
    ca-certificates           2020.12.8            h06a4308_0  
    certifi                   2020.12.5        py36h06a4308_0  
    cffi                      1.14.4           py36h261ae71_0  
    chardet                   4.0.0           py36h06a4308_1003  
    cryptography              3.3.1            py36h3c74f83_0  
    cudatoolkit               10.1.243             h6bb024c_0  
    cudnn                     7.6.5                cuda10.1_0  
    cupti                     10.1.168                      0  
    freetype                  2.10.4               h5ab3b9f_0  
    gast                      0.4.0                      py_0  
    google-pasta              0.2.0                      py_0  
    grpcio                    1.31.0           py36hf8bcb03_0  
    h5py                      2.10.0           py36hd6299e0_1  
    hdf5                      1.10.6               hb1b8bf9_0  
    idna                      2.10                       py_0  
    importlib-metadata        2.0.0                      py_1  
    intel-openmp              2020.2                      254  
    jpeg                      9b                   h024ee3a_2  
    keras-applications        1.0.8                      py_1  
    keras-preprocessing       1.1.0                      py_1  
    lcms2                     2.11                 h396b838_0  
    ld_impl_linux-64          2.33.1               h53a641e_7  
    libedit                   3.1.20191231         h14c3975_1  
    libffi                    3.3                  he6710b0_2  
    libgcc-ng                 9.1.0                hdf63c60_0  
    libgfortran-ng            7.3.0                hdf63c60_0  
    libpng                    1.6.37               hbc83047_0  
    libprotobuf               3.13.0.1             hd408876_0  
    libstdcxx-ng              9.1.0                hdf63c60_0  
    libtiff                   4.1.0                h2733197_1  
    lz4-c                     1.9.2                heb0550a_3  
    markdown                  3.3.3            py36h06a4308_0  
    mkl                       2020.2                      256  
    mkl-service               2.3.0            py36he8ac12f_0  
    mkl_fft                   1.2.0            py36h23d657b_0  
    mkl_random                1.1.1            py36h0573a6f_0  
    ncurses                   6.2                  he6710b0_1  
    numpy                     1.19.2           py36h54aff64_0  
    numpy-base                1.19.2           py36hfa32c7d_0  
    olefile                   0.46                     py36_0  
    openssl                   1.1.1i               h27cfd23_0  
    pillow                    8.1.0            py36he98fc37_0  
    pip                       20.3.3           py36h06a4308_0  
    protobuf                  3.13.0.1         py36he6710b0_1  
    pycparser                 2.20                       py_2  
    pyopenssl                 20.0.1             pyhd3eb1b0_1  
    pysocks                   1.7.1            py36h06a4308_0  
    python                    3.6.12               hcff3b4d_2  
    readline                  8.0                  h7b6447c_0  
    requests                  2.25.1             pyhd3eb1b0_0  
    scipy                     1.5.2            py36h0b6359f_0  
    setuptools                51.1.2           py36h06a4308_4  
    six                       1.15.0           py36h06a4308_0  
    sqlite                    3.33.0               h62c20be_0  
    tensorboard               1.14.0           py36hf484d3e_0  
    tensorflow                1.14.0          gpu_py36h3fb9ad6_0  
    tensorflow-base           1.14.0          gpu_py36he45bfe2_0  
    tensorflow-estimator      1.14.0                     py_0  
    tensorflow-gpu            1.14.0               h0d30ee6_0  
    termcolor                 1.1.0                    py36_1  
    tk                        8.6.10               hbc83047_0  
    urllib3                   1.26.2             pyhd3eb1b0_0  
    werkzeug                  1.0.1                      py_0  
    wheel                     0.36.2             pyhd3eb1b0_0  
    wrapt                     1.12.1           py36h7b6447c_1  
    xz                        5.2.5                h7b6447c_0  
    zipp                      3.4.0              pyhd3eb1b0_0  
    zlib                      1.2.11               h7b6447c_3  
    zstd                      1.4.5                h9ceee32_0
    
    opened by arruw 5
  • Can't see the GPU on Colab

    Hi, I am trying to run the code in Google Colab with a GPU. However, it fails to load the pretrained network; it can't find the GPU.

    The code I ran is below, as well as the error message. Thanks for the advice!

    !python generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1 \
        --network='https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl'

    File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
      return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
    File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
      cuda_op = _get_plugin().fused_bias_act
    File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
      return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
    File "/content/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 139, in get_plugin
      compile_opts += f' --gpu-architecture={_get_cuda_gpu_arch_string()}'
    File "/content/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 60, in _get_cuda_gpu_arch_string
      raise RuntimeError('No GPU devices found')
    RuntimeError: No GPU devices found

    opened by baymak 5
  • Continue training

    Hello,

    My training stopped after 7000 kimg. Is it possible to continue training from the last kimg (where the training stopped) instead of starting again from 0 kimg?

    Thanks

    opened by vivekbharadhwajsa 4
  • crazy slowdown in processing time in stylegan2-ada/train.py --resume on Colab

    I understand this is probably not a stylegan2-ada code issue, but I have no idea where else to turn. I am trying to train my first custom model: a dataset of 2500 1024x1024 images, on Colab Pro with a Tesla P100-PCIE, plenty of RAM, plenty of disk. It ran successfully for 2 days with one restart after a Colab disconnect. To restart, I used --resume to pick up where I left off with the last .pkl file from the previous run. For those 2 runs I was getting timings consistently around sec/tick 698.6, sec/kimg 174.65, maintenance 774.5; see the attached log-goodResume.txt.

    After the second Colab disconnect (3rd day) I repeated the same --resume pattern with the last created .pkl. This time, and in 2 subsequent attempts, the performance was ridiculously slow from the outset: sec/tick 1941.7, sec/kimg 485.43, maintenance 2219.6; see the attached log-badResume.txt.

    log-goodResume.txt log-badResume.txt

    The only procedural change I made between the first --resume, which worked well, and the later resumes, which are unworkably slow, is that in the latter runs I moved and renamed the .pkl file that I want to resume from to be more conveniently located and named. I.e., I moved network-snapshot-000320.pkl up a folder and renamed it latest.pkl so that my subsequent resumes would not require changing the --resume clause in my train.py command; just a convenience pattern which I assume should have no bearing on the slowdown.

    Any advice on how to resolve this would be appreciated; I am kind of dead in the water right now.

    opened by MoemaMike 4
  • Is tf.nn.depthwise_conv2d_backprop_input equal to tf.nn.conv2d_transpose for implementing Upsample2D?

    Hi, I am trying to adapt the code to my project, which is based on PyTorch. I am having trouble porting Upsample2D with a specific kernel (e.g. https://github.com/NVlabs/stylegan2-ada/blob/main/training/augment.py#L425), since PyTorch does not have a direct counterpart to tf.nn.depthwise_conv2d_backprop_input. I wonder if this operation is the same as tf.nn.conv2d_transpose, so that I can safely replace it with torch.nn.functional.conv_transpose2d. It would be very helpful if you could comment on this. Thanks a lot!
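
    For what it's worth: tf.nn.depthwise_conv2d_backprop_input is the gradient of a depthwise convolution with respect to its input, and torch.nn.functional.conv_transpose2d is likewise the input-gradient of conv2d. So for channel_multiplier == 1, a grouped transposed convolution with groups equal to the channel count, matching stride/padding, and the TF filter permuted from (kH, kW, C, 1) to (C, 1, kH, kW) should reproduce it. A minimal sketch with simplified padding/gain bookkeeping (not the repo's exact upfirdn semantics):

        import torch
        import torch.nn.functional as F

        def depthwise_upsample2d(x, kernel, factor=2):
            """Per-channel upsampling as a grouped (depthwise) transposed conv."""
            n, c, h, w = x.shape
            w_ = kernel.to(x.dtype)[None, None].repeat(c, 1, 1, 1)  # (C, 1, kH, kW)
            pad = (kernel.shape[-1] - factor) // 2  # exact factor*H output when kH - factor is even
            return F.conv_transpose2d(x, w_, stride=factor, padding=pad, groups=c)

        x = torch.randn(1, 3, 16, 16)
        k = torch.tensor([1., 3., 3., 1.])
        k2d = torch.outer(k, k) * 2 ** 2 / k.sum() ** 2   # illustrative FIR kernel with upsampling gain
        print(depthwise_upsample2d(x, k2d).shape)         # torch.Size([1, 3, 32, 32])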

    opened by yangyu12 4
  • FFHQ download script broken

    File "download_ffhq.py", line 84, in download_file
        raise IOError('Incorrect file size', file_path)
    OSError: [Errno Incorrect file size] ffhq-dataset-v2.json
    
    opened by kyleliang919 1
  • Model conversion to tf.keras.Model (external application)

    Hello @tkarras, @nurpax,

    I was wondering whether it is possible to extract a single network component, e.g. the discriminator architecture, into a tf.keras.Model, or am I bound to the Network wrapper class?

    I would be happy if you could provide some insights here.

    Kind regards, Nikolai
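
    A hedged sketch of what is possible with the repo as-is: the pickles store plain dnnlib.tflib.Network instances (a (G, D, Gs) tuple), so the discriminator can be loaded and run standalone and its variables inspected, but there is no built-in conversion to tf.keras.Model ('ffhq.pkl' below stands in for any pre-trained pickle):

        # D is a dnnlib.tflib.Network, not a keras model; this only shows how
        # to pull it out of a pickle and run it on its own.
        import pickle
        import numpy as np
        import dnnlib.tflib as tflib

        tflib.init_tf()
        with open('ffhq.pkl', 'rb') as f:
            G, D, Gs = pickle.load(f)          # (G, D, Gs) Network instances

        images = np.random.randn(1, 3, 1024, 1024).astype(np.float32)
        labels = np.zeros([1, 0], dtype=np.float32)
        print(D.run(images, labels))           # raw discriminator scores
        print(list(D.trainables.keys())[:5])   # TF variables, e.g. for re-wiring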

    opened by Nikolai10 0
  • NotImplementedError: Cannot convert a symbolic Tensor (Train_gpu0/Loss_R1/gradients/Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2_grad/flat_transforms_to_matrices/strided_slice:0) to a numpy array.

    Hi,

    I am trying to train a GAN, but this issue occurs every time. I do not know if it is a bug or if I am doing something wrong. I am using TensorFlow 1.x on Google Colab.

    The last line it executes is:

        !python train.py --outdir='/content/drive/MyDrive/stylegan2-ada/training-runs' --gpus=1 \
            --data='/content/drive/MyDrive/stylegan2-ada/datasets/{dataset_name}'

    I tried it with some other training configurations, but the same error occurred every time.

    Here is the output of the program:

    /content/drive/MyDrive/stylegan2-ada
    tcmalloc: large alloc 4294967296 bytes (message repeated three times; stack addresses omitted)

    Training options:
    {
      "G_args": { "func_name": "training.networks.G_main", "fmap_base": 8192, "fmap_max": 512, "mapping_layers": 2, "num_fp16_res": 4, "conv_clamp": 256 },
      "D_args": { "func_name": "training.networks.D_main", "mbstd_group_size": 4, "fmap_base": 8192, "fmap_max": 512, "num_fp16_res": 4, "conv_clamp": 256 },
      "G_opt_args": { "beta1": 0.0, "beta2": 0.99, "learning_rate": 0.0025 },
      "D_opt_args": { "beta1": 0.0, "beta2": 0.99, "learning_rate": 0.0025 },
      "loss_args": { "func_name": "training.loss.stylegan2", "r1_gamma": 0.8192 },
      "augment_args": {
        "class_name": "training.augment.AdaptiveAugment",
        "tune_heuristic": "rt",
        "tune_target": 0.6,
        "apply_func": "training.augment.augment_pipeline",
        "apply_args": { "xflip": 1, "rotate90": 1, "xint": 1, "scale": 1, "rotate": 1, "aniso": 1, "xfrac": 1, "brightness": 1, "contrast": 1, "lumaflip": 1, "hue": 1, "saturation": 1 }
      },
      "num_gpus": 1,
      "image_snapshot_ticks": 50,
      "network_snapshot_ticks": 50,
      "train_dataset_args": { "path": "/content/drive/MyDrive/stylegan2-ada/datasets/Pferde", "max_label_size": 0, "resolution": 256, "mirror_augment": false },
      "metric_arg_list": [
        { "name": "fid50k_full", "class_name": "metrics.frechet_inception_distance.FID", "max_reals": null, "num_fakes": 50000, "minibatch_per_gpu": 8, "force_dataset_args": { "shuffle": false, "max_images": null, "repeat": false, "mirror_augment": false } }
      ],
      "metric_dataset_args": { "path": "/content/drive/MyDrive/stylegan2-ada/datasets/Pferde", "max_label_size": 0, "resolution": 256, "mirror_augment": false },
      "total_kimg": 25000,
      "minibatch_size": 16,
      "minibatch_gpu": 16,
      "G_smoothing_kimg": 5.0,
      "G_smoothing_rampup": 0.05,
      "run_dir": "/content/drive/MyDrive/stylegan2-ada/training-runs/00001-Pferde-auto1"
    }

    Output directory:  /content/drive/MyDrive/stylegan2-ada/training-runs/00001-Pferde-auto1
    Training data:     /content/drive/MyDrive/stylegan2-ada/datasets/Pferde
    Training length:   25000 kimg
    Resolution:        256
    Number of GPUs:    1

    Creating output directory...
    Loading training set...
    (tcmalloc large-alloc messages repeated; stack addresses omitted)
    Image shape: [3, 256, 256]
    Label shape: [0]

    Constructing networks...
    Setting up TensorFlow plugin "fused_bias_act.cu": Loading... Done.
    Setting up TensorFlow plugin "upfirdn_2d.cu": Loading... Done.

    G                               Params     OutputShape          WeightShape
    ---                             ---        ---                  ---
    latents_in                      -          (?, 512)             -
    labels_in                       -          (?, 0)               -
    G_mapping/Normalize             -          (?, 512)             -
    G_mapping/Dense0                262656     (?, 512)             (512, 512)
    G_mapping/Dense1                262656     (?, 512)             (512, 512)
    G_mapping/Broadcast             -          (?, 14, 512)         -
    dlatent_avg                     -          (512,)               -
    Truncation/Lerp                 -          (?, 14, 512)         -
    G_synthesis/4x4/Const           8192       (?, 512, 4, 4)       (1, 512, 4, 4)
    G_synthesis/4x4/Conv            2622465    (?, 512, 4, 4)       (3, 3, 512, 512)
    G_synthesis/4x4/ToRGB           264195     (?, 3, 4, 4)         (1, 1, 512, 3)
    G_synthesis/8x8/Conv0_up        2622465    (?, 512, 8, 8)       (3, 3, 512, 512)
    G_synthesis/8x8/Conv1           2622465    (?, 512, 8, 8)       (3, 3, 512, 512)
    G_synthesis/8x8/Upsample        -          (?, 3, 8, 8)         -
    G_synthesis/8x8/ToRGB           264195     (?, 3, 8, 8)         (1, 1, 512, 3)
    G_synthesis/16x16/Conv0_up      2622465    (?, 512, 16, 16)     (3, 3, 512, 512)
    G_synthesis/16x16/Conv1         2622465    (?, 512, 16, 16)     (3, 3, 512, 512)
    G_synthesis/16x16/Upsample      -          (?, 3, 16, 16)       -
    G_synthesis/16x16/ToRGB         264195     (?, 3, 16, 16)       (1, 1, 512, 3)
    G_synthesis/32x32/Conv0_up      2622465    (?, 512, 32, 32)     (3, 3, 512, 512)
    G_synthesis/32x32/Conv1         2622465    (?, 512, 32, 32)     (3, 3, 512, 512)
    G_synthesis/32x32/Upsample      -          (?, 3, 32, 32)       -
    G_synthesis/32x32/ToRGB         264195     (?, 3, 32, 32)       (1, 1, 512, 3)
    G_synthesis/64x64/Conv0_up      1442561    (?, 256, 64, 64)     (3, 3, 512, 256)
    G_synthesis/64x64/Conv1         721409     (?, 256, 64, 64)     (3, 3, 256, 256)
    G_synthesis/64x64/Upsample      -          (?, 3, 64, 64)       -
    G_synthesis/64x64/ToRGB         132099     (?, 3, 64, 64)       (1, 1, 256, 3)
    G_synthesis/128x128/Conv0_up    426369     (?, 128, 128, 128)   (3, 3, 256, 128)
    G_synthesis/128x128/Conv1       213249     (?, 128, 128, 128)   (3, 3, 128, 128)
    G_synthesis/128x128/Upsample    -          (?, 3, 128, 128)     -
    G_synthesis/128x128/ToRGB       66051      (?, 3, 128, 128)     (1, 1, 128, 3)
    G_synthesis/256x256/Conv0_up    139457     (?, 64, 256, 256)    (3, 3, 128, 64)
    G_synthesis/256x256/Conv1       69761      (?, 64, 256, 256)    (3, 3, 64, 64)
    G_synthesis/256x256/Upsample    -          (?, 3, 256, 256)     -
    G_synthesis/256x256/ToRGB       33027      (?, 3, 256, 256)     (1, 1, 64, 3)
    ---                             ---        ---                  ---
    Total                           23191522

    D                      Params     OutputShape          WeightShape
    ---                    ---        ---                  ---
    images_in              -          (?, 3, 256, 256)     -
    labels_in              -          (?, 0)               -
    256x256/FromRGB        256        (?, 64, 256, 256)    (1, 1, 3, 64)
    256x256/Conv0          36928      (?, 64, 256, 256)    (3, 3, 64, 64)
    256x256/Conv1_down     73856      (?, 128, 128, 128)   (3, 3, 64, 128)
    256x256/Skip           8192       (?, 128, 128, 128)   (1, 1, 64, 128)
    128x128/Conv0          147584     (?, 128, 128, 128)   (3, 3, 128, 128)
    128x128/Conv1_down     295168     (?, 256, 64, 64)     (3, 3, 128, 256)
    128x128/Skip           32768      (?, 256, 64, 64)     (1, 1, 128, 256)
    64x64/Conv0            590080     (?, 256, 64, 64)     (3, 3, 256, 256)
    64x64/Conv1_down       1180160    (?, 512, 32, 32)     (3, 3, 256, 512)
    64x64/Skip             131072     (?, 512, 32, 32)     (1, 1, 256, 512)
    32x32/Conv0            2359808    (?, 512, 32, 32)     (3, 3, 512, 512)
    32x32/Conv1_down       2359808    (?, 512, 16, 16)     (3, 3, 512, 512)
    32x32/Skip             262144     (?, 512, 16, 16)     (1, 1, 512, 512)
    16x16/Conv0            2359808    (?, 512, 16, 16)     (3, 3, 512, 512)
    16x16/Conv1_down       2359808    (?, 512, 8, 8)       (3, 3, 512, 512)
    16x16/Skip             262144     (?, 512, 8, 8)       (1, 1, 512, 512)
    8x8/Conv0              2359808    (?, 512, 8, 8)       (3, 3, 512, 512)
    8x8/Conv1_down         2359808    (?, 512, 4, 4)       (3, 3, 512, 512)
    8x8/Skip               262144     (?, 512, 4, 4)       (1, 1, 512, 512)
    4x4/MinibatchStddev    -          (?, 513, 4, 4)       -
    4x4/Conv               2364416    (?, 512, 4, 4)       (3, 3, 513, 512)
    4x4/Dense0             4194816    (?, 512)             (8192, 512)
    Output                 513        (?, 1)               (512, 1)
    ---                    ---        ---                  ---
    Total                  24001089

    Exporting sample images...
    Replicating networks across 1 GPUs...
    Initializing augmentations...
    Setting up optimizers...
    Constructing training graph...
    Traceback (most recent call last):
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py", line 2380, in get_attr
        c_api.TF_OperationGetAttrValueProto(self._c_op, name, buf)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2' has no attr named '_XlaCompile'.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 345, in _MaybeCompile
        xla_compile = op.get_attr("_XlaCompile")
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py", line 2384, in get_attr
        raise ValueError(str(e))
    ValueError: Operation 'Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2' has no attr named '_XlaCompile'.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train.py", line 561, in <module>
        main()
      File "train.py", line 553, in main
        run_training(**vars(args))
      File "train.py", line 451, in run_training
        training_loop.training_loop(**training_options)
      File "/content/drive/MyDrive/stylegan2-ada/training/training_loop.py", line 187, in training_loop
        terms = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, aug=aug, fake_labels=fake_labels, real_images=real_images_var, real_labels=real_labels_var, **loss_args)
      File "/content/drive/MyDrive/stylegan2-ada/dnnlib/util.py", line 281, in call_func_by_name
        return func_obj(*args, **kwargs)
      File "/content/drive/MyDrive/stylegan2-ada/training/loss.py", line 110, in stylegan2
        r1_grads = tf.gradients(tf.reduce_sum(D_real.scores), [real_images])[0]
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_impl.py", line 158, in gradients
        unconnected_gradients)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 679, in _GradientsHelper
        lambda: grad_fn(op, *out_grads))
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 350, in _MaybeCompile
        return grad_fn()  # Exit early
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 679, in <lambda>
        lambda: grad_fn(op, *out_grads))
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/contrib/image/python/ops/image_ops.py", line 420, in _image_projective_transform_grad
        transforms = flat_transforms_to_matrices(transforms=transforms)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/contrib/image/python/ops/image_ops.py", line 362, in flat_transforms_to_matrices
        [transforms, array_ops.ones([num_transforms, 1])], axis=1),
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py", line 2560, in ones
        output = _constant_if_small(one, shape, dtype, name)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py", line 2295, in _constant_if_small
        if np.prod(shape) < 1000:
      File "<__array_function__ internals>", line 6, in prod
      File "/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py", line 3052, in prod
        keepdims=keepdims, initial=initial, where=where)
      File "/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py", line 736, in __array__
        " array.".format(self.name))
    NotImplementedError: Cannot convert a symbolic Tensor (Train_gpu0/Loss_R1/gradients/Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2_grad/flat_transforms_to_matrices/strided_slice:0) to a numpy array.

    Kind regards!
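
    For context: this particular NotImplementedError is a known incompatibility between TensorFlow 1.15 and NumPy >= 1.20 (np.prod choking on a symbolic shape), not something specific to this repo. Pinning an older NumPy in the Colab runtime is the usual workaround; the exact version below is a suggestion, not an official requirement.

        # Assumption: the default Colab runtime ships NumPy >= 1.20, which
        # TF 1.15 predates; downgrading avoids the symbolic-shape error.
        !pip install numpy==1.19.5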

    opened by EichhoernchenKathy 0
  • Fix a bug that leads to "ValueError: axes don't match array" in dataset_tool.py

    This is a patch for Issue #110.

    Here's a fix for dataset_tool.py that addresses the "ValueError: axes don't match array, images will work one run and not work the next" error.

    My debugging of the issue was as follows:

    • I first thought that maybe some of the images I scraped were grayscale. To check this, I used ImageMagick and ran magick identify *.jpg to search for grayscale images to purge from the dataset, but the issue still persisted in my case.
    • I tried to mass-edit the colorspace and resolution of the images. This still didn't fix the error, which I was getting seemingly at random on some of the images.
    • I still can't pinpoint the exact origin of the issue, such as a specific colorspace causing dataset_tool.py to crash.

    I propose using PIL to convert the image to RGB in every case. It should work fine even when the images are not sRGB, including when they are grayscale. It will most likely slow down the dataset preprocessing step a little (I haven't run any benchmarks), but it is convenient when the training data comes from varied sources, which I believe is the case for many users.
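
    A minimal sketch of the idea (not the exact patch; 'example.jpg' is a placeholder):

        # Convert to RGB before the HWC => CHW transpose so that grayscale
        # (H, W) or RGBA (H, W, 4) files can no longer produce mismatched axes.
        import numpy as np
        import PIL.Image

        img = np.asarray(PIL.Image.open('example.jpg').convert('RGB'))  # always (H, W, 3)
        img = img.transpose([2, 0, 1])                                  # HWC => CHW, as in dataset_tool.py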

    opened by lowlypalace 0
  • ValueError: axes don't match array, images will work one run and not work the next

    I'm working on training a GAN with a dataset of photos I scraped from the Bing Image Search API and converted to 1024x1024, but I keep getting this error when creating the tfrecords:

    Traceback (most recent call last):
      File "dataset_tool.py", line 1249, in <module>
        execute_cmdline(sys.argv)
      File "dataset_tool.py", line 1244, in execute_cmdline
        func(**vars(args))
      File "dataset_tool.py", line 714, in create_from_images
        img = img.transpose([2, 0, 1]) # HWC => CHW
    ValueError: axes don't match array
    

    I then printed out which image files it got stuck on and began taking those out of the dataset, but which files it stalls on seems completely random. Has anyone experienced a similar issue?
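
    For what it's worth, the seemingly random failures are consistent with a few non-RGB files hiding in the set: grayscale images load as 2-D (H, W) arrays, so transpose([2, 0, 1]) has no axis 2 to move. A quick hedged diagnostic (the directory path is a placeholder):

        # Lists files that are not plain 3-channel RGB; these are the ones
        # dataset_tool.py trips over during the HWC => CHW transpose.
        import glob
        import numpy as np
        import PIL.Image

        for path in sorted(glob.glob('dataset/*.jpg')):
            arr = np.asarray(PIL.Image.open(path))
            if arr.ndim != 3 or arr.shape[2] != 3:
                print(path, arr.shape)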

    opened by lowlypalace 0
Owner
NVIDIA Research Projects
StyleGAN2-ADA - Official PyTorch implementation

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes.

NVIDIA Research Projects 3.2k Dec 30, 2022
StyleGAN2-ADA - Official PyTorch implementation

Need Help? If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in Oct

Derrick Schultz 217 Jan 4, 2023
StyleGAN2-ada for practice

This version of the newest PyTorch-based StyleGAN2-ada is intended mostly for fellow artists, who rarely look at scientific metrics, but rather need a working creative tool. Tested on Python 3.7 + PyTorch 1.7.1, requires FFMPEG for sequence-to-video conversions. For more explicit details refer to the original implementations.

vadim epstein 170 Nov 16, 2022
A colab notebook for training Stylegan2-ada on colab, transfer learning onto your own dataset.

Stylegan2-Ada-Google-Colab-Starter-Notebook A no thrills colab notebook for training Stylegan2-ada on colab. transfer learning onto your own dataset h

Harnick Khera 66 Dec 16, 2022
Cartoon-StyleGan2 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Fine-tuning StyleGAN2 for Cartoon Face Generation

Jihye Back 520 Jan 4, 2023
An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

null 45 Dec 8, 2022
StyleGAN2 - Official TensorFlow Implementation

NVIDIA Research Projects 10.1k Dec 28, 2022
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

Yunan Zhu 23 Nov 5, 2022
Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Face Identity Disentanglement via Latent Space Mapping - Implement in pytorch with StyleGAN 2 Description Pytorch implementation of the paper Face Ide

Daniel Roich 58 Dec 24, 2022
Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)

Training GANs with Stronger Augmentations via Contrastive Discriminator (ICLR 2021) This repository contains the code for reproducing the paper: Train

Jongheon Jeong 174 Dec 29, 2022
Multi-scale discriminator feature-wise loss function

Multi-Scale Discriminative Feature Loss This repository provides code for Multi-Scale Discriminative Feature (MDF) loss for image reconstruction algor

Graphics and Displays group - University of Cambridge 76 Dec 12, 2022
The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

Dinghan Shen 49 Dec 22, 2022
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req

Rowel Atienza 152 Dec 28, 2022
[WWW 2021] Source code for "Graph Contrastive Learning with Adaptive Augmentation"

GCA Source code for Graph Contrastive Learning with Adaptive Augmentation (WWW 2021) For example, to run GCA-Degree under WikiCS, execute: python trai

Big Data and Multi-modal Computing Group, CRIPAC 97 Jan 7, 2023
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

null 90 Dec 29, 2022
[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data (NeurIPS 2021) This repository provides the official PyTorch implementation

Liming Jiang 155 Nov 30, 2021
Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection Main requirements torch >= 1.0 torchvision >= 0.2.0 Python 3 Environm

null 15 Apr 4, 2022
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 6, 2022
StyleGAN2 Webtoon / Anime Style Toonify

StyleGAN2 Webtoon / Anime Style Toonify Korea Webtoon or Japanese Anime Character Stylegan2 base high Quality 1024x1024 / 512x512 Generate and Transfe

null 121 Dec 21, 2022