StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation

Overview

Teaser image

Training Generative Adversarial Networks with Limited Data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
https://arxiv.org/abs/2006.06676

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Looking for the PyTorch version?

The official PyTorch version is now available and supersedes the TensorFlow version. See the full list of versions here.

What's new

This repository supersedes the original StyleGAN2 with the following new features:

  • ADA: Significantly better results for datasets with less than ~30k training images. State-of-the-art results for CIFAR-10.
  • Mixed-precision support: ~1.6x faster training, ~1.3x faster inference, ~1.5x lower GPU memory consumption.
  • Better hyperparameter defaults: Reasonable out-of-the-box results for different dataset resolutions and GPU counts.
  • Clean codebase: Extensive refactoring and simplification. The code should be generally easier to work with.
  • Command line tools: Easily reproduce training runs from the paper, generate projection videos for arbitrary images, etc.
  • Network import: Full support for network pickles produced by StyleGAN and StyleGAN2. Faster loading times.
  • Augmentation pipeline: Self-contained, reusable GPU implementation of extensive high-quality image augmentations.
  • Bugfixes

External data repository

Path Description
stylegan2-ada Main directory hosted on Amazon S3
  ├  ada-paper.pdf Paper PDF
  ├  images Curated example images produced using the pre-trained models
  ├  videos Curated example interpolation videos
  └  pretrained Pre-trained models
    ├  metfaces.pkl MetFaces at 1024x1024, transfer learning from FFHQ using ADA
    ├  brecahad.pkl BreCaHAD at 512x512, trained from scratch using ADA
    ├  afhqcat.pkl AFHQ Cat at 512x512, trained from scratch using ADA
    ├  afhqdog.pkl AFHQ Dog at 512x512, trained from scratch using ADA
    ├  afhqwild.pkl AFHQ Wild at 512x512, trained from scratch using ADA
    ├  cifar10.pkl Class-conditional CIFAR-10 at 32x32
    ├  ffhq.pkl FFHQ at 1024x1024, trained using original StyleGAN2
    ├  paper-fig7c-training-set-sweeps All models used in Fig.7c (baseline, ADA, bCR)
    ├  paper-fig8a-comparison-methods All models used in Fig.8a (comparison methods)
    ├  paper-fig8b-discriminator-capacity All models used in Fig.8b (discriminator capacity)
    ├  paper-fig11a-small-datasets All models used in Fig.11a (small datasets, transfer learning)
    ├  paper-fig11b-cifar10 All models used in Fig.11b (CIFAR-10)
    ├  transfer-learning-source-nets Models used as starting point for transfer learning
    └  metrics Feature detectors used by the quality metrics

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 64-bit Python 3.6 or 3.7. We recommend Anaconda3 with numpy 1.14.3 or newer.
  • We recommend TensorFlow 1.14, which we used for all experiments in the paper, but TensorFlow 1.15 is also supported on Linux. TensorFlow 2.x is not supported.
  • On Windows you need to use TensorFlow 1.14, as the standard 1.15 installation does not include necessary C++ headers.
  • 1–8 high-end NVIDIA GPUs with at least 12 GB of GPU memory, NVIDIA drivers, CUDA 10.0 toolkit and cuDNN 7.5.
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The generator and discriminator networks rely heavily on custom TensorFlow ops that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio to be in PATH. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat".
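
To verify that this on-the-fly compilation works in your environment before launching a long run, you can trigger it directly from Python. The following is a minimal smoke test, assuming the repository root is the working directory; the first use of the op compiles fused_bias_act.cu via NVCC:

import numpy as np
import tensorflow as tf

import dnnlib.tflib as tflib
from dnnlib.tflib.ops.fused_bias_act import fused_bias_act

tflib.init_tf()

# Building and running this op forces the CUDA plugin to compile.
x = tf.constant(np.random.randn(4, 8).astype(np.float32))
y = fused_bias_act(x, act='lrelu')
print(tflib.run(y).shape)  # (4, 8) if compilation succeeded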

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl

# Generate uncurated MetFaces images with truncation (Fig.12 upper left)
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl

# Generate class conditional CIFAR-10 images (Fig.17 left, Car)
python generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl

Outputs from the above commands are placed under out/*.png. You can change the location with --outdir. Temporary cache files, such as CUDA build results and downloaded network pickles, will be saved under $HOME/.cache/dnnlib. This can be overridden using the DNNLIB_CACHE_DIR environment variable.
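
The same pickles can also be used programmatically. Below is a minimal sketch along the lines of generate.py, assuming the repository root is on the Python path; each pickle contains three networks, and Gs (the long-term average of the generator weights) is the one used for inference:

import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()

# Download (or open) the pickle; it unpickles to three networks: G, D, and Gs.
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl'
with dnnlib.util.open_url(url) as fp:
    _G, _D, Gs = pickle.load(fp)

# Sample one latent vector and render it as a uint8 NHWC image.
z = np.random.RandomState(85).randn(1, *Gs.input_shape[1:])
images = Gs.run(z, None, truncation_psi=1.0, randomize_noise=False,
                output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True))
PIL.Image.fromarray(images[0], 'RGB').save('seed0085.png')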

Docker: You can run the above curated image example using Docker as follows:

docker build --tag stylegan2ada:latest .
docker run --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c \
    "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python3 generate.py --trunc=1 --seeds=85,265,297,849 \
    --outdir=out --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl)"

Note: The above defaults to a container base image that requires NVIDIA driver release r455.23 or later. To build an image for older drivers and GPUs, run:

docker build --build-arg BASE_IMAGE=tensorflow/tensorflow:1.14.0-gpu-py3 --tag stylegan2ada:latest .

Projecting images to latent space

To find the matching latent vector for a given image file, run:

python projector.py --outdir=out --target=targetimg.png \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl

For optimal results, the target image should be cropped and aligned similarly to the original FFHQ dataset. The above command saves the projection target out/target.png, the result out/proj.png, the latent vector out/dlatents.npz, and the progression video out/proj.mp4. You can render an image from the resulting latent vector by passing --dlatents to generate.py:

python generate.py --outdir=out --dlatents=out/dlatents.npz \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl
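
To work with the projected vector programmatically rather than through generate.py, the saved dlatents can be fed straight into the synthesis component. A sketch mirroring what generate.py does with --dlatents (the 'dlatents' key is the name written by projector.py):

import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib

tflib.init_tf()
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl'
with dnnlib.util.open_url(url) as fp:
    _G, _D, Gs = pickle.load(fp)

# Projected latent in W+ space: shape [1, num_layers, 512] for FFHQ at 1024x1024.
dlatents = np.load('out/dlatents.npz')['dlatents']
images = Gs.components.synthesis.run(
    dlatents, output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True))
PIL.Image.fromarray(images[0], 'RGB').save('out/proj_render.png')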

Preparing datasets

Datasets are stored as multi-resolution TFRecords, i.e., the same format used by StyleGAN and StyleGAN2. Each dataset consists of multiple *.tfrecords files stored under a common directory, e.g., ~/datasets/ffhq/ffhq-r*.tfrecords.
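
To sanity-check a converted dataset outside of dataset_tool.py display, you can read the records directly. The sketch below assumes the record layout used by dataset_tool.py, where each example stores an int64 'shape' ([channels, height, width]) and the raw uint8 pixel 'data'; the file name is just an example:

import os
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

def inspect_tfrecord(path, max_records=3):
    # Assumed layout from dataset_tool.py: 'shape' = [c, h, w], 'data' = raw uint8 pixels (CHW).
    for i, record in enumerate(tf.python_io.tf_record_iterator(path)):
        ex = tf.train.Example()
        ex.ParseFromString(record)
        shape = list(ex.features.feature['shape'].int64_list.value)
        data = ex.features.feature['data'].bytes_list.value[0]
        img = np.frombuffer(data, dtype=np.uint8).reshape(shape)
        print(i, img.shape, img.dtype, img.min(), img.max())
        if i + 1 >= max_records:
            break

inspect_tfrecord(os.path.expanduser('~/datasets/ffhq/ffhq-r10.tfrecords'))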

MetFaces: Download the MetFaces dataset and convert to TFRecords:

python dataset_tool.py create_from_images ~/datasets/metfaces ~/downloads/metfaces/images
python dataset_tool.py display ~/datasets/metfaces

BreCaHAD: Download the BreCaHAD dataset. Generate 512x512 resolution crops and convert to TFRecords:

python dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

python dataset_tool.py create_from_images ~/datasets/brecahad /tmp/brecahad-crops
python dataset_tool.py display ~/datasets/brecahad

AFHQ: Download the AFHQ dataset and convert to TFRecords:

python dataset_tool.py create_from_images ~/datasets/afhqcat ~/downloads/afhq/train/cat
python dataset_tool.py create_from_images ~/datasets/afhqdog ~/downloads/afhq/train/dog
python dataset_tool.py create_from_images ~/datasets/afhqwild ~/downloads/afhq/train/wild
python dataset_tool.py display ~/datasets/afhqcat

CIFAR-10: Download the CIFAR-10 python version. Convert to two separate TFRecords for unconditional and class-conditional training:

python dataset_tool.py create_cifar10 --ignore_labels=1 \
    ~/datasets/cifar10u ~/downloads/cifar-10-batches-py

python dataset_tool.py create_cifar10 --ignore_labels=0 \
    ~/datasets/cifar10c ~/downloads/cifar-10-batches-py

python dataset_tool.py display ~/datasets/cifar10c

FFHQ: Download the Flickr-Faces-HQ dataset as TFRecords:

pushd ~
git clone https://github.com/NVlabs/ffhq-dataset.git
cd ffhq-dataset
python download_ffhq.py --tfrecords
popd
python dataset_tool.py display ~/ffhq-dataset/tfrecords/ffhq

LSUN: Download the desired LSUN categories in LMDB format from the LSUN project page and convert to TFRecords:

python dataset_tool.py create_lsun --resolution=256 --max_images=200000 \
    ~/datasets/lsuncat200k ~/downloads/lsun/cat_lmdb

python dataset_tool.py display ~/datasets/lsuncat200k

Custom: Custom datasets can be created by placing all images under a single directory. The images must be square-shaped and they must all have the same power-of-two dimensions. To convert the images to multi-resolution TFRecords, run:

python dataset_tool.py create_from_images ~/datasets/custom ~/custom-images
python dataset_tool.py display ~/datasets/custom
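
If your images are not yet square with power-of-two dimensions, a small preprocessing pass can bring them into shape first. A sketch using Pillow; the center-crop policy and the 512 target size are just example choices:

import os
from PIL import Image

def make_square_pow2(src_dir, dst_dir, size=512):
    # Center-crop each image to a square, then resize to size x size (size must be a power of two).
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        w, h = img.size
        s = min(w, h)
        left, top = (w - s) // 2, (h - s) // 2
        img = img.crop((left, top, left + s, top + s))
        img.resize((size, size), Image.LANCZOS).save(os.path.join(dst_dir, name))

make_square_pow2(os.path.expanduser('~/custom-images-raw'), os.path.expanduser('~/custom-images'))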

Training new networks

In its most basic form, training new networks boils down to:

python train.py --outdir=~/training-runs --gpus=1 --data=~/datasets/custom --dry-run
python train.py --outdir=~/training-runs --gpus=1 --data=~/datasets/custom

The first command is optional; it will validate the arguments, print out the resulting training configuration, and exit. The second command will kick off the actual training.

In this example, the results will be saved to a newly created directory ~/training-runs/<RUNNING_ID>-custom-auto1 (controlled by --outdir). The training will export network pickles (network-snapshot-<KIMG>.pkl) and example images (fakes<KIMG>.png) at regular intervals (controlled by --snap). For each pickle, it will also evaluate FID by default (controlled by --metrics) and log the resulting scores in metric-fid50k_full.txt.
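
If you want to track FID programmatically instead of reading the text file by hand, a small parser like the following sketch works; it assumes each line of metric-fid50k_full.txt names the snapshot and contains a 'fid50k_full <value>' pair, so verify the format against your own log first:

import os
import re

def read_fid_log(run_dir):
    # Assumed line format: 'network-snapshot-000123 ... fid50k_full 12.3456'.
    results = []
    with open(os.path.join(run_dir, 'metric-fid50k_full.txt')) as f:
        for line in f:
            snap = re.search(r'network-snapshot-(\d+)', line)
            fid = re.search(r'fid50k_full\s+([0-9.]+)', line)
            if snap and fid:
                results.append((int(snap.group(1)), float(fid.group(1))))
    return results  # list of (kimg, fid)

for kimg, fid in read_fid_log(os.path.expanduser('~/training-runs/00000-custom-auto1')):
    print(f'{kimg} kimg: FID {fid:.2f}')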

The name of the output directory (e.g., 00000-custom-auto1) reflects the hyperparameter configuration that was used. In this case, custom indicates the training set (--data) and auto1 indicates the base configuration that was used to select the hyperparameters (--cfg):

Base config      Description
auto (default)   Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets, but does not necessarily lead to optimal results.
stylegan2        Reproduce results for StyleGAN2 config F at 1024x1024 using 1, 2, 4, or 8 GPUs.
paper256         Reproduce results for FFHQ and LSUN Cat at 256x256 using 1, 2, 4, or 8 GPUs.
paper512         Reproduce results for BreCaHAD and AFHQ at 512x512 using 1, 2, 4, or 8 GPUs.
paper1024        Reproduce results for MetFaces at 1024x1024 using 1, 2, 4, or 8 GPUs.
cifar            Reproduce results for CIFAR-10 (tuned configuration) using 1 or 2 GPUs.
cifarbaseline    Reproduce results for CIFAR-10 (baseline configuration) using 1 or 2 GPUs.

The training configuration can be further customized with additional arguments. Common examples:

  • --aug=noaug disables ADA (default: enabled).
  • --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
  • --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
  • --resume=~/training-runs/<RUN_NAME>/network-snapshot-<KIMG>.pkl resumes where a previous training run left off.
  • --gamma=10 overrides R1 gamma. We strongly recommend trying out at least a few different values for each new dataset.

Augmentation fine-tuning:

  • --aug=ada --target=0.7 adjusts ADA target value (default: 0.6).
  • --aug=adarv selects the alternative ADA heuristic (requires a separate validation set).
  • --augpipe=blit limits the augmentation pipeline to pixel blitting only.
  • --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).
  • --cmethod=bcr enables bCR with small integer translations.

Please refer to python train.py --help for the full list.

Expected training time

The total training time depends heavily on the resolution, number of GPUs, desired quality, dataset, and hyperparameters. In general, the training time can be expected to scale linearly with resolution and inversely with the number of GPUs. Small datasets tend to reach their lowest achievable FID faster than larger ones, but their convergence is somewhat less predictable. Transfer learning tends to converge significantly faster than training from scratch.

To give a rough idea of typical training times, the following figure shows several examples of FID as a function of wallclock time. Each curve corresponds to training a given dataset from scratch using --cfg=auto with a given number of NVIDIA Tesla V100 GPUs:

Training curves

Please note that --cfg=auto only serves as a reasonable first guess for the hyperparameters — it does not necessarily lead to optimal results for a given dataset. For example, --cfg=stylegan2 yields considerably better FID for FFHQ-140k at 1024x1024 than illustrated above. We recommend trying out at least a few different values of --gamma for each new dataset.

Preparing training set sweeps

In the paper, we perform several experiments using artificially limited/amplified versions of the training data, such as ffhq30k, ffhq140k, and lsuncat30k. These are constructed by first unpacking the original dataset into a temporary directory with python dataset_tool.py unpack and then repackaging the appropriate versions into TFRecords with python dataset_tool.py pack. In the following examples, the temporary directories are created under /tmp and can be safely deleted afterwards.

# Unpack FFHQ images at 256x256 resolution.
python dataset_tool.py unpack --resolution=256 \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

# Create subset with 30k images.
python dataset_tool.py pack --num_train=30000 --num_validation=10000 --seed=123 \
    --tfrecord_dir=~/datasets/ffhq30k --unpacked_dir=/tmp/ffhq-unpacked

# Create amplified version with 140k images.
python dataset_tool.py pack --num_train=70000 --num_validation=0 --mirror=1 --seed=123 \
    --tfrecord_dir=~/datasets/ffhq140k --unpacked_dir=/tmp/ffhq-unpacked

# Unpack LSUN Cat images at 256x256 resolution.
python dataset_tool.py unpack --resolution=256 \
    --tfrecord_dir=~/datasets/lsuncat200k --output_dir=/tmp/lsuncat200k-unpacked

# Create subset with 30k images.
python dataset_tool.py pack --num_train=30000 --num_validation=10000 --seed=123 \
    --tfrecord_dir=~/datasets/lsuncat30k --unpacked_dir=/tmp/lsuncat200k-unpacked

Please note that when training with artificially limited/amplified datasets, the quality metrics (e.g., fid50k_full) should still be evaluated against the corresponding original datasets. This can be done by specifying a separate metric dataset for train.py and calc_metrics.py using the --metricdata argument. For example:

python train.py [OTHER_OPTIONS] --data=~/datasets/ffhq30k --metricdata=~/ffhq-dataset/tfrecords/ffhq

Reproducing training runs from the paper

The pre-trained network pickles (stylegan2-ada/pretrained/paper-fig*) reflect the training configuration the same way as the output directory names, making it straightforward to reproduce a given training run from the paper. For example:

# 1. AFHQ Dog
# paper-fig11a-small-datasets/afhqdog-mirror-paper512-ada.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/afhqdog \
    --mirror=1 --cfg=paper512 --aug=ada

# 2. Class-conditional CIFAR-10
# pretrained/paper-fig11b-cifar10/cifar10c-cifar-ada-best-fid.pkl
python train.py --outdir=~/training-runs --gpus=2 --data=~/datasets/cifar10c \
    --cfg=cifar --aug=ada

# 3. MetFaces with transfer learning from FFHQ
# paper-fig11a-small-datasets/metfaces-mirror-paper1024-ada-resumeffhq1024.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/metfaces \
    --mirror=1 --cfg=paper1024 --aug=ada --resume=ffhq1024 --snap=10

# 4. 10k subset of FFHQ with ADA and bCR
# paper-fig7c-training-set-sweeps/ffhq10k-paper256-ada-bcr.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/datasets/ffhq10k \
    --cfg=paper256 --aug=ada --cmethod=bcr --metricdata=~/ffhq-dataset/tfrecords/ffhq

# 5. StyleGAN2 config F
# transfer-learning-source-nets/ffhq-res1024-mirror-stylegan2-noaug.pkl
python train.py --outdir=~/training-runs --gpus=8 --data=~/ffhq-dataset/tfrecords/ffhq \
    --res=1024 --mirror=1 --cfg=stylegan2 --aug=noaug --metrics=fid50k

Notes:

  • You can use fewer GPUs than shown in the above examples. This will only increase the training time — it will not affect the quality of the results.
  • Example 3 specifies --snap=10 to export network pickles more frequently than usual. This is recommended, because transfer learning tends to yield very fast convergence.
  • Example 4 specifies --metricdata to evaluate quality metrics against the original FFHQ dataset, not the artificially limited 10k subset used for training.
  • Example 5 specifies --metrics=fid50k to evaluate FID the same way as in the StyleGAN2 paper (see below).

Quality metrics

By default, train.py will automatically compute FID for each network pickle. We strongly recommend inspecting metric-fid50k_full.txt at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics none to speed up the training.

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to text file.
python calc_metrics.py --metrics=pr50k3_full \
    --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

# Pretrained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --metricdata=~/datasets/ffhq --mirror=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/ffhq.pkl

The first example will automatically find training_options.json stored alongside the network pickle and perform the same operation as if --metrics pr50k3_full had been specified during training. The second example will download a pre-trained network pickle, in which case the values of --mirror and --metricdata have to be specified explicitly.
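
The options file consulted by the first example is plain JSON, so it is easy to inspect directly; a minimal sketch using the run directory from the example above:

import json
import os

path = os.path.expanduser('~/training-runs/00000-ffhq10k-res64-auto1/training_options.json')
with open(path) as f:
    options = json.load(f)
print(json.dumps(options, indent=2))  # full training configuration for the run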

Note that many of the metrics have a significant one-off cost (up to an hour or more) when they are calculated for the first time using a given dataset. Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.

We employ the following metrics in the ADA paper. The expected execution times correspond to using one Tesla V100 GPU at 1024x1024 and 256x256 resolution:

Metric        1024x1024   256x256   Description
fid50k_full   15 min      5 min     Fréchet inception distance[1] against the full dataset.
kid50k_full   15 min      5 min     Kernel inception distance[2] against the full dataset.
pr50k3_full   20 min      10 min    Precision and recall[3] against the full dataset.
is50k         25 min      5 min     Inception score[4] for CIFAR-10.
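
For reference, fid50k_full implements the standard Fréchet inception distance from [1]: with (μ_r, Σ_r) and (μ_g, Σ_g) denoting the mean and covariance of Inception features for real and generated images,

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

Lower is better; the _full suffix indicates that the real-image statistics are computed over the entire dataset rather than a 50k subset.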

In addition, all metrics that were used in the StyleGAN and StyleGAN2 papers are also supported for backwards compatibility:

Legacy: StyleGAN2   1024x1024   Description
fid50k              15 min      Fréchet inception distance against 50k real images.
kid50k              15 min      Kernel inception distance against 50k real images.
pr50k3              20 min      Precision and recall against 50k real images.
ppl2_wend           40 min      Perceptual path length[5] in W at path endpoints against full image.

Legacy: StyleGAN    1024x1024   Description
ppl_zfull           40 min      Perceptual path length in Z for full paths against cropped image.
ppl_wfull           40 min      Perceptual path length in W for full paths against cropped image.
ppl_zend            40 min      Perceptual path length in Z at path endpoints against cropped image.
ppl_wend            40 min      Perceptual path length in W at path endpoints against cropped image.
ls                  10 hrs      Linear separability[5] with respect to CelebA attributes.

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Bińkowski et al. 2018
  3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkäänniemi et al. 2019
  4. Improved Techniques for Training GANs, Salimans et al. 2016
  5. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018

License

Copyright © 2020, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License.

Citation

@inproceedings{Karras2020ada,
  title     = {Training Generative Adversarial Networks with Limited Data},
  author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle = {Proc. NeurIPS},
  year      = {2020}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke for helpful comments; Tero Kuosmanen and Sabu Nadarajan for their support with compute infrastructure; and Edgar Schönfeld for guidance on setting up unconditional BigGAN.

Comments
  • rtx 3000 series broken compatibility

    I tried to install the NVIDIA driver (455) myself on Ubuntu 18.04 with Python 3.7 and TensorFlow 1.14 (I also tried 1.15). It always said it couldn't find a GPU when trying to start training (or failed with other errors, like attempting and failing to import cublas.10 files while I had CUDA 11 installed instead). I have an RTX 3090 Founders Edition GPU. I tried different approaches, reinstalling things, and wasted more than 10 hours; it never worked for me. It was working on my Titan RTX, though, on a few different computer rigs. Finally, since the maintainers claimed it works on their end for RTX 3000, I thought I could try their Docker container. It didn't work initially; then I realized I had a few more steps to do, so I installed nvidia-docker2 (nvidia-container-toolkit), thinking that it should certainly work. Unfortunately, it raises errors again:

    Output directory: ./results/00015-jjl_1024-mirror-24gb-gpu-bg-resumeffhq1024
    Training data: ./datasets/jjl_1024
    Training length: 25000 kimg
    Resolution: 1024
    Number of GPUs: 1

    Creating output directory...
    Loading training set...
    Image shape: [3, 1024, 1024]
    Label shape: [0]

    Constructing networks...
    Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Failed!
    Traceback (most recent call last):
      File "train.py", line 591, in <module>
        main()
      File "train.py", line 583, in main
        run_training(**vars(args))
      File "train.py", line 473, in run_training
        training_loop.training_loop(**training_options)
      File "/var/www/training/training_loop.py", line 123, in training_loop
        Gs = G.clone('Gs')
      File "/var/www/dnnlib/tflib/network.py", line 457, in clone
        net.copy_vars_from(self)
      File "/var/www/dnnlib/tflib/network.py", line 490, in copy_vars_from
        src_net._get_vars()
      File "/var/www/dnnlib/tflib/network.py", line 297, in _get_vars
        self._vars = OrderedDict(self._get_own_vars())
      File "/var/www/dnnlib/tflib/network.py", line 286, in _get_own_vars
        self._init_graph()
      File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
        out_expr = self._build_func(*self._input_templates, **build_kwargs)
      File "/var/www/training/networks.py", line 231, in G_main
        num_layers = components.synthesis.input_shape[1]
      File "/var/www/dnnlib/tflib/network.py", line 232, in input_shape
        return self.input_shapes[0]
      File "/var/www/dnnlib/tflib/network.py", line 219, in input_shapes
        self._input_shapes = [t.shape.as_list() for t in self.input_templates]
      File "/var/www/dnnlib/tflib/network.py", line 267, in input_templates
        self._init_graph()
      File "/var/www/dnnlib/tflib/network.py", line 151, in _init_graph
        out_expr = self._build_func(*self._input_templates, **build_kwargs)
      File "/var/www/training/networks.py", line 439, in G_synthesis
        x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
      File "/var/www/training/networks.py", line 392, in layer
        x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
      File "/var/www/training/networks.py", line 105, in modulated_conv2d_layer
        s = apply_bias_act(s, bias_var='mod_bias', trainable=trainable) + 1 # [BI] Add bias (initially 1).
      File "/var/www/training/networks.py", line 50, in apply_bias_act
        return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, gain=gain, clamp=clamp)
      File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
        return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
      File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
        cuda_op = _get_plugin().fused_bias_act
      File "/var/www/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
        return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
      File "/var/www/dnnlib/tflib/custom_ops.py", line 159, in get_plugin
        _run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
      File "/var/www/dnnlib/tflib/custom_ops.py", line 69, in _run_cmd
        raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
    RuntimeError: NVCC returned an error. See below for full command line and output log:

    nvcc --compiler-options '-fPIC' --compiler-options '-I/usr/local/lib/python3.6/dist-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0' --linker-options '-L/usr/local/lib/python3.6/dist-packages/tensorflow -l:libtensorflow_framework.so.1' --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/com_google_absl" --include-path "/usr/local/lib/python3.6/dist-packages/tensorflow/include/external/eigen_archive" 2>&1 "/var/www/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmp4dn1nm6o/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmp4dn1nm6o"

    nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

    By googling I found that similar errors (e.g. sm_75) occur when there are code/CUDA/driver compatibility issues. At least that's what people say. Please help with a decent working container version at least.

    opened by JulianPinzaru 38
  • RTX 30x0 Support

    Your current docker image relies on an older version of CUDA. The current 3080 and 3090 series GPUs are only supported under CUDA 11.1. It would be wonderful if you could update the image or offer a workaround.

    Error message when running with older CUDA: nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

    opened by reflare 13
  • Resume from the latest pickle

    Hello,

    I know you don't accept pull requests. However:

    • this could be of interest to others who want to run the code on Google Colab,
    • this is the first place where they will look for such a change.

    I have added the ability to resume from the latest .pkl file with the command-line argument --resume=latest. The value of cur_nimg is inferred from the file name. I have yet to figure out how to automatically compute the relevant value of aug.strength to resume from.
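
    For reference, the lookup described above only takes a few lines; a sketch assuming the standard network-snapshot-<KIMG>.pkl naming, with cur_nimg recovered from the kimg count in the file name:

    import glob
    import os
    import re

    def latest_snapshot(run_dir):
        # Pick the newest network-snapshot-<KIMG>.pkl and recover cur_nimg from its name.
        pkls = sorted(glob.glob(os.path.join(run_dir, 'network-snapshot-*.pkl')))
        if not pkls:
            raise FileNotFoundError('no snapshots in ' + run_dir)
        kimg = int(re.search(r'network-snapshot-(\d+)\.pkl$', pkls[-1]).group(1))
        return pkls[-1], kimg * 1000  # (resume_pkl, cur_nimg)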

    opened by woctezuma 10
  • how to create dataset with label

    If I want to fine-tune stylegan2-ada with conditional labels, how do I prepare the dataset?

    The readme only covers the conditional case for the CIFAR-10 dataset; how do I do the same for my own dataset?

    What is the format of the data?

    opened by Johnson-yue 9
  • No GPU devices found

    Hi, when I run the following commands on a p2.xlarge deep learning AMI in AWS:

    docker build --tag stylegan2ada:latest .
    
    docker run --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash -c \
        "(cd /scratch && DNNLIB_CACHE_DIR=/scratch/.cache python3 generate.py --trunc=1 --seeds=85,265,297,849 \
        --outdir=out --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl)"
    

    I get this error:

    NVIDIA Release 20.10-tf1 (build 16775850)
    TensorFlow Version 1.15.4
    
    Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
    Copyright 2017-2020 The TensorFlow Authors.  All rights reserved.
    
    NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    ERROR: Detected NVIDIA Tesla K80 GPU, which is not supported by this container
    ERROR: No supported GPU(s) detected to run this container
    
    NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.
    
    NOTE: MOFED driver for multi-node communication was not detected.
          Multi-node communication performance may be reduced.
    
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
       insufficient for TensorFlow.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
    
    2021-01-10 16:18:03.840894: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
    2021-01-10 16:18:08.234607: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300075000 Hz
    2021-01-10 16:18:08.236149: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50ff110 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2021-01-10 16:18:08.236185: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2021-01-10 16:18:08.241208: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
    2021-01-10 16:18:08.398652: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.399598: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5174bd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2021-01-10 16:18:08.399631: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
    2021-01-10 16:18:08.399902: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.400718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:00:1e.0
    2021-01-10 16:18:08.400787: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    2021-01-10 16:18:08.437007: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
    2021-01-10 16:18:08.461765: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
    2021-01-10 16:18:08.469196: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
    2021-01-10 16:18:08.507750: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
    2021-01-10 16:18:08.516625: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
    2021-01-10 16:18:08.516919: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
    2021-01-10 16:18:08.517138: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.518088: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:08.518890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Ignoring visible gpu device (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7) with Cuda compute capability 3.7. The minimum required Cuda capability is 5.2.
    2021-01-10 16:18:08.518934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
    2021-01-10 16:18:08.518951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
    2021-01-10 16:18:08.518973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
    Loading networks from "https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl"...
    Downloading https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl ... done
    Setting up TensorFlow plugin "fused_bias_act.cu": 2021-01-10 16:18:33.824257: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:33.825117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:00:1e.0
    2021-01-10 16:18:33.825170: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    2021-01-10 16:18:33.825214: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
    2021-01-10 16:18:33.825252: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
    2021-01-10 16:18:33.825288: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
    2021-01-10 16:18:33.825324: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
    2021-01-10 16:18:33.825354: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
    2021-01-10 16:18:33.825392: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
    2021-01-10 16:18:33.825525: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:33.826419: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-01-10 16:18:33.827220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1747] Ignoring visible gpu device (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7) with Cuda compute capability 3.7. The minimum required Cuda capability is 5.2.
    2021-01-10 16:18:33.827261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
    2021-01-10 16:18:33.827278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
    2021-01-10 16:18:33.827295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
    Failed!
    Traceback (most recent call last):
      File "generate.py", line 121, in <module>
        main()
      File "generate.py", line 116, in main
        generate_images(**vars(args))
      File "generate.py", line 52, in generate_images
        noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]
      File "/scratch/dnnlib/tflib/network.py", line 293, in vars
        return copy.copy(self._get_vars())
      File "/scratch/dnnlib/tflib/network.py", line 297, in _get_vars
        self._vars = OrderedDict(self._get_own_vars())
      File "/scratch/dnnlib/tflib/network.py", line 286, in _get_own_vars
        self._init_graph()
      File "/scratch/dnnlib/tflib/network.py", line 151, in _init_graph
        out_expr = self._build_func(*self._input_templates, **build_kwargs)
      File "<string>", line 431, in G_synthesis
      File "<string>", line 384, in layer
      File "<string>", line 97, in modulated_conv2d_layer
      File "<string>", line 42, in apply_bias_act
      File "/scratch/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
        return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
      File "/scratch/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
        cuda_op = _get_plugin().fused_bias_act
      File "/scratch/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
        return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
      File "/scratch/dnnlib/tflib/custom_ops.py", line 139, in get_plugin
        compile_opts += f' --gpu-architecture={_get_cuda_gpu_arch_string()}'
      File "/scratch/dnnlib/tflib/custom_ops.py", line 60, in _get_cuda_gpu_arch_string
        raise RuntimeError('No GPU devices found')
    RuntimeError: No GPU devices found
    

    which is unexpected, since when I run bash in the Docker container

    docker run --gpus all -it --rm -v `pwd`:/scratch --user $(id -u):$(id -g) stylegan2ada:latest bash
    

    I get

    I have no name!@c6fb7621777c:/workspace$ nvidia-smi
    Sun Jan 10 16:22:15 2021       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   38C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    I have no name!@c6fb7621777c:/workspace$ 
    

    and when I run nvcc in the docker container

    I have no name!@c6fb7621777c:/workspace$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Tue_Sep_15_19:10:02_PDT_2020
    Cuda compilation tools, release 11.1, V11.1.74
    Build cuda_11.1.TC455_06.29069683_0
    

    Any suggestions?

    opened by matthewchung74 6
  • ffhq1024 fakes_init look bad

    Upon running a straight, out-of-the-box train.py with --resume=ffhq1024, the "fakes_init.png" looks very weird: https://storage.googleapis.com/public-assets-xander/fakes_init.jpg

    opened by aiXander 6
  • NVCC error compiling fused_bias_act.cpp

    I get an error when running python run_generator.py generate-images ....

    I solved some of the issues by following this: https://stackoverflow.com/questions/59342888/tensorflow-error-this-file-requires-compiler-and-library-support-for-the-iso-c#.

    Error:

    dnnlib: Running run_generator.generate_images() on localhost...
    Loading networks from "./networks/stylegan2-ffhq-config-f.pkl"...
    Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Compiling... Failed!
    Traceback (most recent call last):
      File "run_generator.py", line 168, in <module>
        main()
      File "run_generator.py", line 163, in main
        dnnlib.submit_run(sc, func_name_map[subcmd], **kwargs)
      File "/home/matjazibb/dev/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
        return farm.submit(submit_config, host_run_dir)
      File "/home/matjazibb/dev/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
        return run_wrapper(submit_config)
      File "/home/matjazibb/dev/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
        run_func_obj(**submit_config.run_func_kwargs)
      File "/home/matjazibb/dev/stylegan2/run_generator.py", line 21, in generate_images
        _G, _D, Gs = pretrained_networks.load_networks(network_pkl)
      File "/home/matjazibb/dev/stylegan2/pretrained_networks.py", line 76, in load_networks
        G, D, Gs = pickle.load(stream, encoding='latin1')
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/network.py", line 297, in __setstate__
        self._init_graph()
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/network.py", line 154, in _init_graph
        out_expr = self._build_func(*self.input_templates, **build_kwargs)
      File "<string>", line 491, in G_synthesis_stylegan2
      File "<string>", line 455, in layer
      File "<string>", line 99, in modulated_conv2d_layer
      File "<string>", line 68, in apply_bias_act
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
        return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
        cuda_kernel = _get_plugin().fused_bias_act
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
        return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/custom_ops.py", line 147, in get_plugin
        _run_cmd(nvcc_cmd + ' "%s" --shared -o "%s" --keep --keep-dir "%s"' % (cuda_file, tmp_file, tmp_dir))
      File "/home/matjazibb/dev/stylegan2/dnnlib/tflib/custom_ops.py", line 61, in _run_cmd
        raise RuntimeError('NVCC returned an error. See below for full command line and output log:\n\n%s\n\n%s' % (cmd, output))
    RuntimeError: NVCC returned an error. See below for full command line and output log:
    
    nvcc --std=c++11 -DNDEBUG "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so" --compiler-options '-fPIC -D_GLIBCXX_USE_CXX11_ABI=1' --gpu-architecture=sm_52 --use_fast_math --disable-warnings --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include" --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/external/com_google_absl" --include-path "/home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/external/eigen_archive" 2>&1 "/home/matjazibb/dev/stylegan2/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmpn331x8yd/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmpn331x8yd"
    
    /usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined
    
    /usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined
    
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/strings/str_cat.h(268): error: expression must have a constant value
    
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/strings/str_cat.h(268): error: expression must have a constant value
    
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/memory/memory.h(616): error: class "std::allocator<tensorflow::OpKernelContext::WrappedAllocator>" has no member "is_nothrow"
              detected during:
                instantiation of type "absl::memory_internal::GetIsNothrow<std::allocator<tensorflow::OpKernelContext::WrappedAllocator>>" 
    (264): here
                instantiation of type "absl::memory_internal::ExtractOrT<absl::memory_internal::GetIsNothrow, std::allocator<tensorflow::OpKernelContext::WrappedAllocator>, std::false_type>" 
    (642): here
                instantiation of class "absl::allocator_is_nothrow<Alloc> [with Alloc=std::allocator<tensorflow::OpKernelContext::WrappedAllocator>]" 
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/absl/container/inlined_vector.h(190): here
                instantiation of "absl::InlinedVector<T, N, A>::InlinedVector(absl::InlinedVector<T, N, A> &&) [with T=tensorflow::OpKernelContext::WrappedAllocator, N=4UL, A=std::allocator<tensorflow::OpKernelContext::WrappedAllocator>]" 
    /home/matjazibb/miniconda3/envs/stylegan2-try2/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/framework/op_kernel.h(1081): here
    
    5 errors detected in the compilation of "/tmp/tmpn331x8yd/fused_bias_act.cpp1.ii".
    

    OS info:

    Distributor ID: Ubuntu
    Description:    Ubuntu 16.04.6 LTS
    Release:        16.04
    Codename:       xenial
    

    GPU info:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 980 Ti  Off  | 00000000:01:00.0 Off |                  N/A |
    | 20%   40C    P0    55W / 260W |      0MiB /  6082MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    

    nvcc:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2015 NVIDIA Corporation
    Built on Tue_Aug_11_14:27:32_CDT_2015
    Cuda compilation tools, release 7.5, V7.5.17
    

    g++:

    g++ --version
    g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    

    Conda env:

    _libgcc_mutex             0.1                        main  
    _tflow_select             2.1.0                       gpu  
    absl-py                   0.11.0             pyhd3eb1b0_1  
    astor                     0.8.1                    py36_0  
    blas                      1.0                         mkl  
    brotlipy                  0.7.0           py36h27cfd23_1003  
    c-ares                    1.17.1               h27cfd23_0  
    ca-certificates           2020.12.8            h06a4308_0  
    certifi                   2020.12.5        py36h06a4308_0  
    cffi                      1.14.4           py36h261ae71_0  
    chardet                   4.0.0           py36h06a4308_1003  
    cryptography              3.3.1            py36h3c74f83_0  
    cudatoolkit               10.1.243             h6bb024c_0  
    cudnn                     7.6.5                cuda10.1_0  
    cupti                     10.1.168                      0  
    freetype                  2.10.4               h5ab3b9f_0  
    gast                      0.4.0                      py_0  
    google-pasta              0.2.0                      py_0  
    grpcio                    1.31.0           py36hf8bcb03_0  
    h5py                      2.10.0           py36hd6299e0_1  
    hdf5                      1.10.6               hb1b8bf9_0  
    idna                      2.10                       py_0  
    importlib-metadata        2.0.0                      py_1  
    intel-openmp              2020.2                      254  
    jpeg                      9b                   h024ee3a_2  
    keras-applications        1.0.8                      py_1  
    keras-preprocessing       1.1.0                      py_1  
    lcms2                     2.11                 h396b838_0  
    ld_impl_linux-64          2.33.1               h53a641e_7  
    libedit                   3.1.20191231         h14c3975_1  
    libffi                    3.3                  he6710b0_2  
    libgcc-ng                 9.1.0                hdf63c60_0  
    libgfortran-ng            7.3.0                hdf63c60_0  
    libpng                    1.6.37               hbc83047_0  
    libprotobuf               3.13.0.1             hd408876_0  
    libstdcxx-ng              9.1.0                hdf63c60_0  
    libtiff                   4.1.0                h2733197_1  
    lz4-c                     1.9.2                heb0550a_3  
    markdown                  3.3.3            py36h06a4308_0  
    mkl                       2020.2                      256  
    mkl-service               2.3.0            py36he8ac12f_0  
    mkl_fft                   1.2.0            py36h23d657b_0  
    mkl_random                1.1.1            py36h0573a6f_0  
    ncurses                   6.2                  he6710b0_1  
    numpy                     1.19.2           py36h54aff64_0  
    numpy-base                1.19.2           py36hfa32c7d_0  
    olefile                   0.46                     py36_0  
    openssl                   1.1.1i               h27cfd23_0  
    pillow                    8.1.0            py36he98fc37_0  
    pip                       20.3.3           py36h06a4308_0  
    protobuf                  3.13.0.1         py36he6710b0_1  
    pycparser                 2.20                       py_2  
    pyopenssl                 20.0.1             pyhd3eb1b0_1  
    pysocks                   1.7.1            py36h06a4308_0  
    python                    3.6.12               hcff3b4d_2  
    readline                  8.0                  h7b6447c_0  
    requests                  2.25.1             pyhd3eb1b0_0  
    scipy                     1.5.2            py36h0b6359f_0  
    setuptools                51.1.2           py36h06a4308_4  
    six                       1.15.0           py36h06a4308_0  
    sqlite                    3.33.0               h62c20be_0  
    tensorboard               1.14.0           py36hf484d3e_0  
    tensorflow                1.14.0          gpu_py36h3fb9ad6_0  
    tensorflow-base           1.14.0          gpu_py36he45bfe2_0  
    tensorflow-estimator      1.14.0                     py_0  
    tensorflow-gpu            1.14.0               h0d30ee6_0  
    termcolor                 1.1.0                    py36_1  
    tk                        8.6.10               hbc83047_0  
    urllib3                   1.26.2             pyhd3eb1b0_0  
    werkzeug                  1.0.1                      py_0  
    wheel                     0.36.2             pyhd3eb1b0_0  
    wrapt                     1.12.1           py36h7b6447c_1  
    xz                        5.2.5                h7b6447c_0  
    zipp                      3.4.0              pyhd3eb1b0_0  
    zlib                      1.2.11               h7b6447c_3  
    zstd                      1.4.5                h9ceee32_0
    
    opened by arruw 5
  • Can't see the GPU on Colab

    Hi, I am trying to run the code in Google Colab with a GPU. However, it fails to load the pretrained network; it can't find the GPU.

    The code I ran is below, as well as the error message. Thanks for the advice!

    !python generate.py --outdir=out --trunc=1 --seeds=0-35 --class=1 \
        --network='https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/cifar10.pkl'

    File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
      return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
    File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
      cuda_op = _get_plugin().fused_bias_act
    File "/content/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
      return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
    File "/content/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 139, in get_plugin
      compile_opts += f' --gpu-architecture={_get_cuda_gpu_arch_string()}'
    File "/content/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 60, in _get_cuda_gpu_arch_string
      raise RuntimeError('No GPU devices found')
    RuntimeError: No GPU devices found

    opened by baymak 5
  • Continue training

    Hello,

    My training stopped after 7000 kimg. Is it possible to continue training from the last kimg (where the training stopped) instead of starting again from 0 kimg?

    Thanks

    opened by vivekbharadhwajsa 4
  • crazy slowdown in processing time in stylegan2-ada/train.py --resume on Colab

    I understand this is probably not a stylegan2-ada code issue, but I have no idea where else to turn. I am trying to train my first custom model: a dataset of 2500 1024x1024 images, on Colab Pro with a Tesla P100-PCIE, plenty of RAM, plenty of disk. It ran successfully for 2 days with one restart after a Colab disconnect. To restart, I used --resume to pick up where I left off with the last .pkl file from the previous run. For those 2 runs I was getting timings consistently around sec/tick 698.6, sec/kimg 174.65, maintenance 774.5; see the attached log-goodResume.txt.

    After the second Colab disconnect (3rd day) I repeated the same --resume pattern with the last created .pkl. This time, and in 2 subsequent attempts, the performance was ridiculously slow from the outset: sec/tick 1941.7, sec/kimg 485.43, maintenance 2219.6; see the attached log-badResume.txt.

    log-goodResume.txt log-badResume.txt

    The only procedural change I made between the first --resume, which worked well, and the later resumes, which are unworkably slow, is that in the latter runs I moved and renamed the .pkl file that I want to resume from to be more conveniently located and named. I.e., I moved network-snapshot-000320.pkl up a folder and renamed it latest.pkl so that my subsequent resumes would not require changing the --resume clause in my train.py command; just a convenience pattern which I assume should have no bearing on the slowdown.

    Any advice on how to resolve this would be appreciated; I am kind of dead in the water right now.

    opened by MoemaMike 4
  • Is tf.nn.depthwise_conv2d_backprop_input equal to tf.nn.conv2d_transpose for implementing Upsample2D?

    Hi, I am trying to adapt the code to my project, which is based on PyTorch. I am having trouble porting Upsample2D with a specific kernel (e.g. https://github.com/NVlabs/stylegan2-ada/blob/main/training/augment.py#L425), since PyTorch does not have a direct counterpart to tf.nn.depthwise_conv2d_backprop_input. I wonder if this operation is the same as tf.nn.conv2d_transpose, so that I can safely replace it with torch.nn.functional.conv_transpose2d. It would be very helpful if you could comment on this. Thanks a lot!
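
    For what it's worth: tf.nn.depthwise_conv2d_backprop_input is the gradient of a depthwise convolution with respect to its input, and torch.nn.functional.conv_transpose2d is likewise the input-gradient of conv2d. So for channel_multiplier == 1, a grouped transposed convolution with groups equal to the channel count, matching stride/padding, and the TF filter permuted from (kH, kW, C, 1) to (C, 1, kH, kW) should reproduce it. A minimal sketch with simplified padding/gain bookkeeping (not the repo's exact upfirdn semantics):

        import torch
        import torch.nn.functional as F

        def depthwise_upsample2d(x, kernel, factor=2):
            """Per-channel upsampling as a grouped (depthwise) transposed conv."""
            n, c, h, w = x.shape
            w_ = kernel.to(x.dtype)[None, None].repeat(c, 1, 1, 1)  # (C, 1, kH, kW)
            pad = (kernel.shape[-1] - factor) // 2  # exact factor*H output when kH - factor is even
            return F.conv_transpose2d(x, w_, stride=factor, padding=pad, groups=c)

        x = torch.randn(1, 3, 16, 16)
        k = torch.tensor([1., 3., 3., 1.])
        k2d = torch.outer(k, k) * 2 ** 2 / k.sum() ** 2   # illustrative FIR kernel with upsampling gain
        print(depthwise_upsample2d(x, k2d).shape)         # torch.Size([1, 3, 32, 32])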

    opened by yangyu12 4
  • FFHQ download script broken

    File "download_ffhq.py", line 84, in download_file
        raise IOError('Incorrect file size', file_path)
    OSError: [Errno Incorrect file size] ffhq-dataset-v2.json
    
    opened by kyleliang919 1
  • Model conversion to tf.keras.Model (external application)

    Hello @tkarras, @nurpax,

    I was wondering whether it is possible to extract a single network component, e.g. the discriminator architecture, into a tf.keras.Model, or am I bound to the Network wrapper class?

    I would be happy if you could provide some insights here.

    Kind regards, Nikolai
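
    A hedged sketch of what is possible with the repo as-is: the pickles store plain dnnlib.tflib.Network instances (a (G, D, Gs) tuple), so the discriminator can be loaded and run standalone and its variables inspected, but there is no built-in conversion to tf.keras.Model ('ffhq.pkl' below stands in for any pre-trained pickle):

        # D is a dnnlib.tflib.Network, not a keras model; this only shows how
        # to pull it out of a pickle and run it on its own.
        import pickle
        import numpy as np
        import dnnlib.tflib as tflib

        tflib.init_tf()
        with open('ffhq.pkl', 'rb') as f:
            G, D, Gs = pickle.load(f)          # (G, D, Gs) Network instances

        images = np.random.randn(1, 3, 1024, 1024).astype(np.float32)
        labels = np.zeros([1, 0], dtype=np.float32)
        print(D.run(images, labels))           # raw discriminator scores
        print(list(D.trainables.keys())[:5])   # TF variables, e.g. for re-wiring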

    opened by Nikolai10 0
  • NotImplementedError: Cannot convert a symbolic Tensor (Train_gpu0/Loss_R1/gradients/Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2_grad/flat_transforms_to_matrices/strided_slice:0) to a numpy array.

    Hi,

    I am trying to train a GAN, but this issue occurs every time. I do not know if it is a bug or if I am doing something wrong. I am using TensorFlow 1.x on Google Colab.

    The last line it executes is:

        !python train.py --outdir='/content/drive/MyDrive/stylegan2-ada/training-runs' --gpus=1 \
            --data='/content/drive/MyDrive/stylegan2-ada/datasets/{dataset_name}'

    I tried it with some other training configurations, but the same error occurred every time.

    Here is the output of the program:

    /content/drive/MyDrive/stylegan2-ada
    tcmalloc: large alloc 4294967296 bytes (message repeated three times; stack addresses omitted)

    Training options:
    {
      "G_args": { "func_name": "training.networks.G_main", "fmap_base": 8192, "fmap_max": 512, "mapping_layers": 2, "num_fp16_res": 4, "conv_clamp": 256 },
      "D_args": { "func_name": "training.networks.D_main", "mbstd_group_size": 4, "fmap_base": 8192, "fmap_max": 512, "num_fp16_res": 4, "conv_clamp": 256 },
      "G_opt_args": { "beta1": 0.0, "beta2": 0.99, "learning_rate": 0.0025 },
      "D_opt_args": { "beta1": 0.0, "beta2": 0.99, "learning_rate": 0.0025 },
      "loss_args": { "func_name": "training.loss.stylegan2", "r1_gamma": 0.8192 },
      "augment_args": {
        "class_name": "training.augment.AdaptiveAugment",
        "tune_heuristic": "rt",
        "tune_target": 0.6,
        "apply_func": "training.augment.augment_pipeline",
        "apply_args": { "xflip": 1, "rotate90": 1, "xint": 1, "scale": 1, "rotate": 1, "aniso": 1, "xfrac": 1, "brightness": 1, "contrast": 1, "lumaflip": 1, "hue": 1, "saturation": 1 }
      },
      "num_gpus": 1,
      "image_snapshot_ticks": 50,
      "network_snapshot_ticks": 50,
      "train_dataset_args": { "path": "/content/drive/MyDrive/stylegan2-ada/datasets/Pferde", "max_label_size": 0, "resolution": 256, "mirror_augment": false },
      "metric_arg_list": [
        { "name": "fid50k_full", "class_name": "metrics.frechet_inception_distance.FID", "max_reals": null, "num_fakes": 50000, "minibatch_per_gpu": 8, "force_dataset_args": { "shuffle": false, "max_images": null, "repeat": false, "mirror_augment": false } }
      ],
      "metric_dataset_args": { "path": "/content/drive/MyDrive/stylegan2-ada/datasets/Pferde", "max_label_size": 0, "resolution": 256, "mirror_augment": false },
      "total_kimg": 25000,
      "minibatch_size": 16,
      "minibatch_gpu": 16,
      "G_smoothing_kimg": 5.0,
      "G_smoothing_rampup": 0.05,
      "run_dir": "/content/drive/MyDrive/stylegan2-ada/training-runs/00001-Pferde-auto1"
    }

    Output directory:  /content/drive/MyDrive/stylegan2-ada/training-runs/00001-Pferde-auto1
    Training data:     /content/drive/MyDrive/stylegan2-ada/datasets/Pferde
    Training length:   25000 kimg
    Resolution:        256
    Number of GPUs:    1

    Creating output directory...
    Loading training set...
    (tcmalloc large-alloc messages repeated; stack addresses omitted)
    Image shape: [3, 256, 256]
    Label shape: [0]

    Constructing networks...
    Setting up TensorFlow plugin "fused_bias_act.cu": Loading... Done.
    Setting up TensorFlow plugin "upfirdn_2d.cu": Loading... Done.

    G                               Params     OutputShape          WeightShape
    ---                             ---        ---                  ---
    latents_in                      -          (?, 512)             -
    labels_in                       -          (?, 0)               -
    G_mapping/Normalize             -          (?, 512)             -
    G_mapping/Dense0                262656     (?, 512)             (512, 512)
    G_mapping/Dense1                262656     (?, 512)             (512, 512)
    G_mapping/Broadcast             -          (?, 14, 512)         -
    dlatent_avg                     -          (512,)               -
    Truncation/Lerp                 -          (?, 14, 512)         -
    G_synthesis/4x4/Const           8192       (?, 512, 4, 4)       (1, 512, 4, 4)
    G_synthesis/4x4/Conv            2622465    (?, 512, 4, 4)       (3, 3, 512, 512)
    G_synthesis/4x4/ToRGB           264195     (?, 3, 4, 4)         (1, 1, 512, 3)
    G_synthesis/8x8/Conv0_up        2622465    (?, 512, 8, 8)       (3, 3, 512, 512)
    G_synthesis/8x8/Conv1           2622465    (?, 512, 8, 8)       (3, 3, 512, 512)
    G_synthesis/8x8/Upsample        -          (?, 3, 8, 8)         -
    G_synthesis/8x8/ToRGB           264195     (?, 3, 8, 8)         (1, 1, 512, 3)
    G_synthesis/16x16/Conv0_up      2622465    (?, 512, 16, 16)     (3, 3, 512, 512)
    G_synthesis/16x16/Conv1         2622465    (?, 512, 16, 16)     (3, 3, 512, 512)
    G_synthesis/16x16/Upsample      -          (?, 3, 16, 16)       -
    G_synthesis/16x16/ToRGB         264195     (?, 3, 16, 16)       (1, 1, 512, 3)
    G_synthesis/32x32/Conv0_up      2622465    (?, 512, 32, 32)     (3, 3, 512, 512)
    G_synthesis/32x32/Conv1         2622465    (?, 512, 32, 32)     (3, 3, 512, 512)
    G_synthesis/32x32/Upsample      -          (?, 3, 32, 32)       -
    G_synthesis/32x32/ToRGB         264195     (?, 3, 32, 32)       (1, 1, 512, 3)
    G_synthesis/64x64/Conv0_up      1442561    (?, 256, 64, 64)     (3, 3, 512, 256)
    G_synthesis/64x64/Conv1         721409     (?, 256, 64, 64)     (3, 3, 256, 256)
    G_synthesis/64x64/Upsample      -          (?, 3, 64, 64)       -
    G_synthesis/64x64/ToRGB         132099     (?, 3, 64, 64)       (1, 1, 256, 3)
    G_synthesis/128x128/Conv0_up    426369     (?, 128, 128, 128)   (3, 3, 256, 128)
    G_synthesis/128x128/Conv1       213249     (?, 128, 128, 128)   (3, 3, 128, 128)
    G_synthesis/128x128/Upsample    -          (?, 3, 128, 128)     -
    G_synthesis/128x128/ToRGB       66051      (?, 3, 128, 128)     (1, 1, 128, 3)
    G_synthesis/256x256/Conv0_up    139457     (?, 64, 256, 256)    (3, 3, 128, 64)
    G_synthesis/256x256/Conv1       69761      (?, 64, 256, 256)    (3, 3, 64, 64)
    G_synthesis/256x256/Upsample    -          (?, 3, 256, 256)     -
    G_synthesis/256x256/ToRGB       33027      (?, 3, 256, 256)     (1, 1, 64, 3)
    ---                             ---        ---                  ---
    Total                           23191522

    D                      Params     OutputShape          WeightShape
    ---                    ---        ---                  ---
    images_in              -          (?, 3, 256, 256)     -
    labels_in              -          (?, 0)               -
    256x256/FromRGB        256        (?, 64, 256, 256)    (1, 1, 3, 64)
    256x256/Conv0          36928      (?, 64, 256, 256)    (3, 3, 64, 64)
    256x256/Conv1_down     73856      (?, 128, 128, 128)   (3, 3, 64, 128)
    256x256/Skip           8192       (?, 128, 128, 128)   (1, 1, 64, 128)
    128x128/Conv0          147584     (?, 128, 128, 128)   (3, 3, 128, 128)
    128x128/Conv1_down     295168     (?, 256, 64, 64)     (3, 3, 128, 256)
    128x128/Skip           32768      (?, 256, 64, 64)     (1, 1, 128, 256)
    64x64/Conv0            590080     (?, 256, 64, 64)     (3, 3, 256, 256)
    64x64/Conv1_down       1180160    (?, 512, 32, 32)     (3, 3, 256, 512)
    64x64/Skip             131072     (?, 512, 32, 32)     (1, 1, 256, 512)
    32x32/Conv0            2359808    (?, 512, 32, 32)     (3, 3, 512, 512)
    32x32/Conv1_down       2359808    (?, 512, 16, 16)     (3, 3, 512, 512)
    32x32/Skip             262144     (?, 512, 16, 16)     (1, 1, 512, 512)
    16x16/Conv0            2359808    (?, 512, 16, 16)     (3, 3, 512, 512)
    16x16/Conv1_down       2359808    (?, 512, 8, 8)       (3, 3, 512, 512)
    16x16/Skip             262144     (?, 512, 8, 8)       (1, 1, 512, 512)
    8x8/Conv0              2359808    (?, 512, 8, 8)       (3, 3, 512, 512)
    8x8/Conv1_down         2359808    (?, 512, 4, 4)       (3, 3, 512, 512)
    8x8/Skip               262144     (?, 512, 4, 4)       (1, 1, 512, 512)
    4x4/MinibatchStddev    -          (?, 513, 4, 4)       -
    4x4/Conv               2364416    (?, 512, 4, 4)       (3, 3, 513, 512)
    4x4/Dense0             4194816    (?, 512)             (8192, 512)
    Output                 513        (?, 1)               (512, 1)
    ---                    ---        ---                  ---
    Total                  24001089

    Exporting sample images...
    Replicating networks across 1 GPUs...
    Initializing augmentations...
    Setting up optimizers...
    Constructing training graph...
    Traceback (most recent call last):
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py", line 2380, in get_attr
        c_api.TF_OperationGetAttrValueProto(self._c_op, name, buf)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2' has no attr named '_XlaCompile'.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 345, in _MaybeCompile
        xla_compile = op.get_attr("_XlaCompile")
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py", line 2384, in get_attr
        raise ValueError(str(e))
    ValueError: Operation 'Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2' has no attr named '_XlaCompile'.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train.py", line 561, in <module>
        main()
      File "train.py", line 553, in main
        run_training(**vars(args))
      File "train.py", line 451, in run_training
        training_loop.training_loop(**training_options)
      File "/content/drive/MyDrive/stylegan2-ada/training/training_loop.py", line 187, in training_loop
        terms = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, aug=aug, fake_labels=fake_labels, real_images=real_images_var, real_labels=real_labels_var, **loss_args)
      File "/content/drive/MyDrive/stylegan2-ada/dnnlib/util.py", line 281, in call_func_by_name
        return func_obj(*args, **kwargs)
      File "/content/drive/MyDrive/stylegan2-ada/training/loss.py", line 110, in stylegan2
        r1_grads = tf.gradients(tf.reduce_sum(D_real.scores), [real_images])[0]
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_impl.py", line 158, in gradients
        unconnected_gradients)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 679, in _GradientsHelper
        lambda: grad_fn(op, *out_grads))
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 350, in _MaybeCompile
        return grad_fn()  # Exit early
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/gradients_util.py", line 679, in <lambda>
        lambda: grad_fn(op, *out_grads))
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/contrib/image/python/ops/image_ops.py", line 420, in _image_projective_transform_grad
        transforms = flat_transforms_to_matrices(transforms=transforms)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/contrib/image/python/ops/image_ops.py", line 362, in flat_transforms_to_matrices
        [transforms, array_ops.ones([num_transforms, 1])], axis=1),
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py", line 2560, in ones
        output = _constant_if_small(one, shape, dtype, name)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py", line 2295, in _constant_if_small
        if np.prod(shape) < 1000:
      File "<__array_function__ internals>", line 6, in prod
      File "/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py", line 3052, in prod
        keepdims=keepdims, initial=initial, where=where)
      File "/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
      File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py", line 736, in __array__
        " array.".format(self.name))
    NotImplementedError: Cannot convert a symbolic Tensor (Train_gpu0/Loss_R1/gradients/Train_gpu0/Augment_1/transform/ImageProjectiveTransformV2_grad/flat_transforms_to_matrices/strided_slice:0) to a numpy array.

    Kind regards!
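
    For context: this particular NotImplementedError is a known incompatibility between TensorFlow 1.15 and NumPy >= 1.20 (np.prod choking on a symbolic shape), not something specific to this repo. Pinning an older NumPy in the Colab runtime is the usual workaround; the exact version below is a suggestion, not an official requirement.

        # Assumption: the default Colab runtime ships NumPy >= 1.20, which
        # TF 1.15 predates; downgrading avoids the symbolic-shape error.
        !pip install numpy==1.19.5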

    opened by EichhoernchenKathy 0
  • Fix a bug that leads to "ValueError: axes don't match array" in dataset_tool.py

    This is a patch for Issue #110.

    Here's a fix for dataset_tool.py that addresses the "ValueError: axes don't match array, images will work one run and not work the next" error.

    My debugging of the issue was as follows:

    • I first thought that maybe some of the images I scraped were grayscale. To check this, I used ImageMagick and ran magick identify *.jpg to search for grayscale images to purge from the dataset, but the issue still persisted in my case.
    • I tried to mass-edit the colorspace and resolution of the images. This still didn't fix the error, which I was getting seemingly at random on some of the images.
    • I still can't pinpoint the exact origin of the issue, such as a specific colorspace causing dataset_tool.py to crash.

    I propose using PIL to convert the image to RGB in every case. It should work fine even when the images are not sRGB, including when they are grayscale. It will most likely slow down the dataset preprocessing step a little (I haven't run any benchmarks), but it is convenient when the training data comes from varied sources, which I believe is the case for many users.
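
    A minimal sketch of the idea (not the exact patch; 'example.jpg' is a placeholder):

        # Convert to RGB before the HWC => CHW transpose so that grayscale
        # (H, W) or RGBA (H, W, 4) files can no longer produce mismatched axes.
        import numpy as np
        import PIL.Image

        img = np.asarray(PIL.Image.open('example.jpg').convert('RGB'))  # always (H, W, 3)
        img = img.transpose([2, 0, 1])                                  # HWC => CHW, as in dataset_tool.py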

    opened by lowlypalace 0
  • ValueError: axes don't match array, images will work one run and not work the next

    I'm working on training a GAN with a dataset of photos I scraped from the Bing Image Search API and converted to 1024x1024, but I keep getting this error when creating the tfrecords:

    Traceback (most recent call last):
      File "dataset_tool.py", line 1249, in <module>
        execute_cmdline(sys.argv)
      File "dataset_tool.py", line 1244, in execute_cmdline
        func(**vars(args))
      File "dataset_tool.py", line 714, in create_from_images
        img = img.transpose([2, 0, 1]) # HWC => CHW
    ValueError: axes don't match array
    

    I then printed out which image files it got stuck on and began taking those out of the dataset, but which files it stalls on seems completely random. Has anyone experienced a similar issue?
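
    For what it's worth, the seemingly random failures are consistent with a few non-RGB files hiding in the set: grayscale images load as 2-D (H, W) arrays, so transpose([2, 0, 1]) has no axis 2 to move. A quick hedged diagnostic (the directory path is a placeholder):

        # Lists files that are not plain 3-channel RGB; these are the ones
        # dataset_tool.py trips over during the HWC => CHW transpose.
        import glob
        import numpy as np
        import PIL.Image

        for path in sorted(glob.glob('dataset/*.jpg')):
            arr = np.asarray(PIL.Image.open(path))
            if arr.ndim != 3 or arr.shape[2] != 3:
                print(path, arr.shape)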

    opened by lowlypalace 0
Owner
NVIDIA Research Projects
StyleGAN2-ADA - Official PyTorch implementation

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes.

NVIDIA Research Projects 3.2k Dec 30, 2022
StyleGAN2-ADA - Official PyTorch implementation

Need Help? If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in Oct

Derrick Schultz 217 Jan 4, 2023
StyleGAN2-ada for practice

This version of the newest PyTorch-based StyleGAN2-ada is intended mostly for fellow artists, who rarely look at scientific metrics, but rather need a working creative tool. Tested on Python 3.7 + PyTorch 1.7.1, requires FFMPEG for sequence-to-video conversions. For more explicit details refer to the original implementations.

vadim epstein 170 Nov 16, 2022
A colab notebook for training Stylegan2-ada on colab, transfer learning onto your own dataset.

Stylegan2-Ada-Google-Colab-Starter-Notebook A no thrills colab notebook for training Stylegan2-ada on colab. transfer learning onto your own dataset h

Harnick Khera 66 Dec 16, 2022
Cartoon-StyleGan2 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Fine-tuning StyleGAN2 for Cartoon Face Generation

Jihye Back 520 Jan 4, 2023
An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

null 45 Dec 8, 2022
StyleGAN2 - Official TensorFlow Implementation

NVIDIA Research Projects 10.1k Dec 28, 2022
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

Yunan Zhu 23 Nov 5, 2022
Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Face Identity Disentanglement via Latent Space Mapping - Implement in pytorch with StyleGAN 2 Description Pytorch implementation of the paper Face Ide

Daniel Roich 58 Dec 24, 2022
Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)

Training GANs with Stronger Augmentations via Contrastive Discriminator (ICLR 2021) This repository contains the code for reproducing the paper: Train

Jongheon Jeong 174 Dec 29, 2022
Multi-scale discriminator feature-wise loss function

Multi-Scale Discriminative Feature Loss This repository provides code for Multi-Scale Discriminative Feature (MDF) loss for image reconstruction algor

Graphics and Displays group - University of Cambridge 76 Dec 12, 2022
The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

Dinghan Shen 49 Dec 22, 2022
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req

Rowel Atienza 152 Dec 28, 2022
[WWW 2021] Source code for "Graph Contrastive Learning with Adaptive Augmentation"

GCA Source code for Graph Contrastive Learning with Adaptive Augmentation (WWW 2021) For example, to run GCA-Degree under WikiCS, execute: python trai

Big Data and Multi-modal Computing Group, CRIPAC 97 Jan 7, 2023
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

null 90 Dec 29, 2022
[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data (NeurIPS 2021) This repository provides the official PyTorch implementation

Liming Jiang 155 Nov 30, 2021
Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection Main requirements torch >= 1.0 torchvision >= 0.2.0 Python 3 Environm

null 15 Apr 4, 2022
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 6, 2022
StyleGAN2 Webtoon / Anime Style Toonify

StyleGAN2 Webtoon / Anime Style Toonify Korea Webtoon or Japanese Anime Character Stylegan2 base high Quality 1024x1024 / 512x512 Generate and Transfe

null 121 Dec 21, 2022