Alias-Free Generative Adversarial Networks (StyleGAN3)
Official PyTorch implementation of the NeurIPS 2021 paper

Teaser image

Alias-Free Generative Adversarial Networks
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila
https://nvlabs.github.io/stylegan3

Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.

For business inquiries, please contact [email protected]
For press and other inquiries, please contact Hector Marinez at [email protected]

Release notes

This repository is an updated version of stylegan2-ada-pytorch, with several new features:

  • Alias-free generator architecture and training configurations (stylegan3-t, stylegan3-r).
  • Tools for interactive visualization (visualizer.py), spectral analysis (avg_spectra.py), and video generation (gen_video.py).
  • Equivariance metrics (eqt50k_int, eqt50k_frac, eqr50k).
  • General improvements: reduced memory usage, slightly faster training, bug fixes.

Compatibility:

  • Compatible with old network pickles created using stylegan2-ada and stylegan2-ada-pytorch.
  • Supports old StyleGAN2 training configurations, including ADA and transfer learning. See Training configurations for details.
  • Improved compatibility with Ampere GPUs and newer versions of PyTorch, CuDNN, etc.

Synthetic image detection

While new generator approaches enable new media synthesis capabilities, they may also present a new challenge for AI forensics algorithms for detection and attribution of synthetic media. In collaboration with digital forensic researchers participating in DARPA's SemaFor program, we curated a synthetic image dataset that allowed the researchers to test and validate the performance of their image detectors in advance of the public release. Please see here for more details.

Additional material

  • Result videos
  • Curated example images
  • StyleGAN3 pre-trained models for config T (translation equiv.) and config R (translation and rotation equiv.)

    Access individual networks via https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/<filename>, where <filename> is one of:
    stylegan3-t-ffhq-1024x1024.pkl, stylegan3-t-ffhqu-1024x1024.pkl, stylegan3-t-ffhqu-256x256.pkl
    stylegan3-r-ffhq-1024x1024.pkl, stylegan3-r-ffhqu-1024x1024.pkl, stylegan3-r-ffhqu-256x256.pkl
    stylegan3-t-metfaces-1024x1024.pkl, stylegan3-t-metfacesu-1024x1024.pkl
    stylegan3-r-metfaces-1024x1024.pkl, stylegan3-r-metfacesu-1024x1024.pkl
    stylegan3-t-afhqv2-512x512.pkl
    stylegan3-r-afhqv2-512x512.pkl

  • StyleGAN2 pre-trained models compatible with this codebase

    Access individual networks via https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/<filename>, where <filename> is one of:
    stylegan2-ffhq-1024x1024.pkl, stylegan2-ffhq-512x512.pkl, stylegan2-ffhq-256x256.pkl
    stylegan2-ffhqu-1024x1024.pkl, stylegan2-ffhqu-256x256.pkl
    stylegan2-metfaces-1024x1024.pkl, stylegan2-metfacesu-1024x1024.pkl
    stylegan2-afhqv2-512x512.pkl
    stylegan2-afhqcat-512x512.pkl, stylegan2-afhqdog-512x512.pkl, stylegan2-afhqwild-512x512.pkl
    stylegan2-brecahad-512x512.pkl, stylegan2-cifar10-32x32.pkl
    stylegan2-celebahq-256x256.pkl, stylegan2-lsundog-256x256.pkl

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1–8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using Tesla V100 and A100 GPUs.
  • 64-bit Python 3.8 and PyTorch 1.9.0 (or later). See https://pytorch.org for PyTorch install instructions.
  • CUDA toolkit 11.1 or later. (Why is a separate CUDA toolkit installation required? See Troubleshooting).
  • Python libraries: see environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your StyleGAN3 Python environment:
    • conda env create -f environment.yml
    • conda activate stylegan3
  • Docker users: see the Docker instructions under Getting started below for building and running the provided Dockerfile.

The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH by running "C:\Program Files (x86)\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvars64.bat", where <VERSION> is your installed Visual Studio version.
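
If you want to verify that the custom extensions build correctly before launching a full training run, you can trigger a compilation manually from the repository root. The following is a minimal, unofficial sketch that assumes a CUDA-capable GPU and uses the bias_act op from this repository's torch_utils package; the first CUDA call compiles the corresponding plugin with NVCC:

# check_extensions.py: minimal sketch; run from the repository root with a CUDA GPU available.
import torch
from torch_utils.ops import bias_act  # custom op shipped with this repository

x = torch.randn([4, 8], device='cuda')
b = torch.zeros([8], device='cuda')
y = bias_act.bias_act(x, b, act='lrelu')  # first call triggers the on-the-fly NVCC build
print('bias_act_plugin OK:', tuple(y.shape))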

See Troubleshooting for help on common installation and run-time problems.

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate an image using pre-trained AFHQv2 model ("Ours" in Figure 1, left).
python gen_images.py --outdir=out --trunc=1 --seeds=2 \
    --network=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-afhqv2-512x512.pkl

# Render a 4x2 grid of interpolations for seeds 0 through 31.
python gen_video.py --output=lerp.mp4 --trunc=1 --seeds=0-31 --grid=4x2 \
    --network=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-afhqv2-512x512.pkl

Outputs from the above commands are placed under out/*.png, controlled by --outdir. Downloaded network pickles are cached under $HOME/.cache/dnnlib, which can be overridden by setting the DNNLIB_CACHE_DIR environment variable. The default PyTorch extension build directory is $HOME/.cache/torch_extensions, which can be overridden by setting TORCH_EXTENSIONS_DIR.
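
If you prefer to set these locations from Python rather than in your shell, a minimal sketch (the paths below are hypothetical) is to assign the environment variables before any download or extension build is triggered:

import os

# Hypothetical persistent cache locations; adjust to your setup.
os.environ['DNNLIB_CACHE_DIR'] = '/data/cache/dnnlib'                  # pre-trained network downloads
os.environ['TORCH_EXTENSIONS_DIR'] = '/data/cache/torch_extensions'   # compiled PyTorch extensions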

Docker: You can run the above curated image example using Docker as follows:

# Build the stylegan3:latest image
docker build --tag stylegan3 .

# Run the gen_images.py script using Docker:
docker run --gpus all -it --rm --user $(id -u):$(id -g) \
    -v `pwd`:/scratch --workdir /scratch -e HOME=/scratch \
    stylegan3 \
    python gen_images.py --outdir=out --trunc=1 --seeds=2 \
         --network=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-afhqv2-512x512.pkl

Note: The Docker image requires NVIDIA driver release r470 or later.

The docker run invocation may look daunting, so let's unpack its contents here:

  • --gpus all -it --rm --user $(id -u):$(id -g): with all GPUs enabled, run an interactive session with current user's UID/GID to avoid Docker writing files as root.
  • -v `pwd`:/scratch --workdir /scratch: mount current running dir (e.g., the top of this git repo on your host machine) to /scratch in the container and use that as the current working dir.
  • -e HOME=/scratch: let PyTorch and StyleGAN3 code know where to cache temporary files such as pre-trained models and custom PyTorch extension build results. Note: if you want more fine-grained control, you can instead set TORCH_EXTENSIONS_DIR (for custom extensions build dir) and DNNLIB_CACHE_DIR (for pre-trained model download cache). You want these cache dirs to reside on persistent volumes so that their contents are retained across multiple docker run invocations.

Interactive visualization

This release contains an interactive model visualization tool that can be used to explore various characteristics of a trained model. To start it, run:

python visualizer.py

Visualizer screenshot

Using networks from Python

You can use pre-trained networks in your own Python code as follows:

import pickle
import torch

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda()    # latent codes
c = None                                # class labels (not used in this example)
img = G(z, c)                           # NCHW, float32, dynamic range [-1, +1], no truncation

The above code requires torch_utils and dnnlib to be accessible via PYTHONPATH. It does not need source code for the networks themselves — their class definitions are loaded from the pickle via torch_utils.persistence.

The pickle contains three networks. 'G' and 'D' are instantaneous snapshots taken during training, and 'G_ema' represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.
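
If you also need the discriminator, or want gradients to flow back to the latent code (e.g., for latent-space optimization), the snapshots can be moved to the GPU explicitly. A minimal sketch, assuming the same unconditional ffhq.pkl as above:

import pickle
import torch

with open('ffhq.pkl', 'rb') as f:
    data = pickle.load(f)
G = data['G_ema'].cuda()   # moving average of the generator
D = data['D'].cuda()       # discriminator snapshot

z = torch.randn([1, G.z_dim], device='cuda', requires_grad=True)
img = G(z, None)           # NCHW, float32, dynamic range [-1, +1]
logits = D(img, None)      # discriminator score for the generated image
logits.sum().backward()    # gradients propagate back to z even though the network weights stay frozen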

The generator consists of two submodules, G.mapping and G.synthesis, that can be executed separately. They also support various additional options:

w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, noise_mode='const', force_fp32=True)

Please refer to gen_images.py for a complete code example.
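
As a rough end-to-end illustration of what gen_images.py does for a single seed, the sketch below combines the snippets above with the conversion to an 8-bit PNG; it is an approximation for clarity, not the script itself, and assumes an unconditional ffhq.pkl in the working directory:

import pickle
import numpy as np
import PIL.Image
import torch

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()

seed = 42
z = torch.from_numpy(np.random.RandomState(seed).randn(1, G.z_dim)).cuda()
w = G.mapping(z, None, truncation_psi=0.7)                    # map latent z to intermediate latent w
img = G.synthesis(w, noise_mode='const')                      # NCHW, float32, dynamic range [-1, +1]
img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)  # NHWC, uint8
PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(f'seed{seed:04d}.png')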

Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels. Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.
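
For class-conditional training, the labels live in dataset.json inside the archive. To the best of my understanding (verify against dataset_tool.py and training/dataset.py), the file maps archive-relative image paths to integer class indices; a hypothetical sketch for adding labels to an existing archive:

import json
import zipfile

# Hypothetical labels: each entry pairs an archive-relative image path with an integer class index.
labels = {
    'labels': [
        ['00000/img00000000.png', 0],
        ['00000/img00000001.png', 1],
    ]
}
with zipfile.ZipFile('my-dataset.zip', 'a') as zf:    # paths must match the image files in the archive
    zf.writestr('dataset.json', json.dumps(labels))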

FFHQ: Download the Flickr-Faces-HQ dataset as 1024x1024 images and create a zip archive using dataset_tool.py:

# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/images1024x1024 --dest=~/datasets/ffhq-1024x1024.zip

# Scaled down 256x256 resolution.
python dataset_tool.py --source=/tmp/images1024x1024 --dest=~/datasets/ffhq-256x256.zip \
    --width=256 --height=256

See the FFHQ README for information on how to obtain the unaligned FFHQ dataset images. Use the same steps as above to create a ZIP archive for training and validation.

MetFaces: Download the MetFaces dataset and create a ZIP archive:

python dataset_tool.py --source=~/downloads/metfaces/images --dest=~/datasets/metfaces-1024x1024.zip

See the MetFaces README for information on how to obtain the unaligned MetFaces dataset images. Use the same steps as above to create a ZIP archive for training and validation.

AFHQv2: Download the AFHQv2 dataset and create a ZIP archive:

python dataset_tool.py --source=~/downloads/afhqv2 --dest=~/datasets/afhqv2-512x512.zip

Note that the above command creates a single combined dataset using all images of all three classes (cats, dogs, and wild animals), matching the setup used in the StyleGAN3 paper. Alternatively, you can also create a separate dataset for each class:

python dataset_tool.py --source=~/downloads/afhqv2/train/cat --dest=~/datasets/afhqv2cat-512x512.zip
python dataset_tool.py --source=~/downloads/afhqv2/train/dog --dest=~/datasets/afhqv2dog-512x512.zip
python dataset_tool.py --source=~/downloads/afhqv2/train/wild --dest=~/datasets/afhqv2wild-512x512.zip

Training

You can train new networks using train.py. For example:

# Train StyleGAN3-T for AFHQv2 using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=stylegan3-t --data=~/datasets/afhqv2-512x512.zip \
    --gpus=8 --batch=32 --gamma=8.2 --mirror=1

# Fine-tune StyleGAN3-R for MetFaces-U using 8 GPUs, starting from the pre-trained FFHQ-U pickle.
python train.py --outdir=~/training-runs --cfg=stylegan3-r --data=~/datasets/metfacesu-1024x1024.zip \
    --gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=5 \
    --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-1024x1024.pkl

# Train StyleGAN2 for FFHQ at 1024x1024 resolution using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=stylegan2 --data=~/datasets/ffhq-1024x1024.zip \
    --gpus=8 --batch=32 --gamma=10 --mirror=1 --aug=noaug

Note that the result quality and training time depend heavily on the exact set of options. The most important ones (--gpus, --batch, and --gamma) must be specified explicitly, and they should be selected with care. See python train.py --help for the full list of options and Training configurations for general guidelines & recommendations, along with the expected training speed & memory usage in different scenarios.

The results of each training run are saved to a newly created directory, for example ~/training-runs/00000-stylegan3-t-afhqv2-512x512-gpus8-batch32-gamma8.2. The training loop exports network pickles (network-snapshot-*.pkl) and random image grids (fakes*.png) at regular intervals (controlled by --snap). For each exported pickle, it evaluates FID (controlled by --metrics) and logs the result in metric-fid50k_full.jsonl. It also records various statistics in training_stats.jsonl, as well as *.tfevents if TensorBoard is installed.
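
Both log files are plain JSON Lines, so training progress can be inspected with a few lines of Python. A minimal sketch (the key names below reflect my understanding of the logs produced by this codebase; adjust if your files differ):

import json
import os

# Hypothetical run directory; point this at one of your own training runs.
run_dir = os.path.expanduser('~/training-runs/00000-stylegan3-t-afhqv2-512x512-gpus8-batch32-gamma8.2')

with open(os.path.join(run_dir, 'metric-fid50k_full.jsonl')) as f:
    for line in f:
        entry = json.loads(line)
        # Assumed keys: 'snapshot_pkl' names the evaluated pickle, 'results' holds the metric value.
        print(entry['snapshot_pkl'], entry['results']['fid50k_full'])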

Quality metrics

By default, train.py automatically computes FID for each network pickle exported during training. We recommend inspecting metric-fid50k_full.jsonl (or TensorBoard) at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics=none to speed up the training slightly.

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to JSONL file.
python calc_metrics.py --metrics=eqt50k_int,eqr50k \
    --network=~/training-runs/00000-stylegan3-r-mydataset/network-snapshot-000000.pkl

# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq-1024x1024.zip --mirror=1 \
    --network=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhq-1024x1024.pkl

The first example looks up the training configuration and performs the same operation as if --metrics=eqt50k_int,eqr50k had been specified during training. The second example downloads a pre-trained network pickle, in which case the values of --data and --mirror must be specified explicitly.

Note that the metrics can be quite expensive to compute (up to 1h), and many of them have an additional one-off cost for each new dataset (up to 30min). Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.

Recommended metrics:

  • fid50k_full: Fréchet inception distance[1] against the full dataset.
  • kid50k_full: Kernel inception distance[2] against the full dataset.
  • pr50k3_full: Precision and recall[3] against the full dataset.
  • ppl2_wend: Perceptual path length[4] in W, endpoints, full image.
  • eqt50k_int: Equivariance[5] w.r.t. integer translation (EQ-T).
  • eqt50k_frac: Equivariance w.r.t. fractional translation (EQ-Tfrac).
  • eqr50k: Equivariance w.r.t. rotation (EQ-R).

Legacy metrics:

  • fid50k: Fréchet inception distance against 50k real images.
  • kid50k: Kernel inception distance against 50k real images.
  • pr50k3: Precision and recall against 50k real images.
  • is50k: Inception score[6] for CIFAR-10.

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Bińkowski et al. 2018
  3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkäänniemi et al. 2019
  4. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018
  5. Alias-Free Generative Adversarial Networks, Karras et al. 2021
  6. Improved Techniques for Training GANs, Salimans et al. 2016

Spectral analysis

The easiest way to inspect the spectral properties of a given generator is to use the built-in FFT mode in visualizer.py. In addition, you can visualize average 2D power spectra (Appendix A, Figure 15) as follows:

# Calculate dataset mean and std, needed in subsequent steps.
python avg_spectra.py stats --source=~/datasets/ffhq-1024x1024.zip

# Calculate average spectrum for the training data.
python avg_spectra.py calc --source=~/datasets/ffhq-1024x1024.zip \
    --dest=tmp/training-data.npz --mean=112.684 --std=69.509

# Calculate average spectrum for a pre-trained generator.
python avg_spectra.py calc \
    --source=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhq-1024x1024.pkl \
    --dest=tmp/stylegan3-r.npz --mean=112.684 --std=69.509 --num=70000

# Display results.
python avg_spectra.py heatmap tmp/training-data.npz
python avg_spectra.py heatmap tmp/stylegan3-r.npz
python avg_spectra.py slices tmp/training-data.npz tmp/stylegan3-r.npz

Average spectra screenshot
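
For intuition, the quantity being averaged is the 2D power spectrum of mean/std-normalized images. The sketch below is a simplified NumPy illustration of that idea, not the repository's implementation (avg_spectra.py additionally handles windowing, interpolation, and the exact normalization described in the paper); the default mean/std are the FFHQ statistics used in the commands above:

import numpy as np
import PIL.Image

def average_power_spectrum(paths, mean=112.684, std=69.509):
    acc = None
    for p in paths:
        img = np.asarray(PIL.Image.open(p).convert('L'), dtype=np.float64)
        img = (img - mean) / std                      # normalize with dataset statistics
        spec = np.abs(np.fft.fft2(img)) ** 2          # 2D power spectrum of one image
        acc = spec if acc is None else acc + spec
    return np.fft.fftshift(acc / len(paths))          # average and center DC for display

# Example: spectrum = average_power_spectrum(['img0.png', 'img1.png'])
# Visualize e.g. np.log10(spectrum + 1e-8) as a heatmap.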

License

Copyright © 2021, NVIDIA Corporation & affiliates. All rights reserved.

This work is made available under the Nvidia Source Code License.

Citation

@inproceedings{Karras2021,
  author = {Tero Karras and Miika Aittala and Samuli Laine and Erik H\"ark\"onen and Janne Hellsten and Jaakko Lehtinen and Timo Aila},
  title = {Alias-Free Generative Adversarial Networks},
  booktitle = {Proc. NeurIPS},
  year = {2021}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke, Ming-Yu Liu, Koki Nagano, Tuomas Kynkäänniemi, and Timo Viitanen for reviewing early drafts and for their helpful suggestions; Frédo Durand for early discussions; Tero Kuosmanen for maintaining our compute infrastructure; the AFHQ authors for an updated version of their dataset; and Getty Images for the training images in the Beaches dataset. We did not receive external funding or additional revenues for this project.

Comments
  • runerror: model don't match with net

    Describe the bug: the loaded *.pkl does not match the model.

    To Reproduce Steps to reproduce the behavior:

    1. python train.py --outdir=~/training-runs --cfg=stylegan3-t --data={dir}.zip
      --gpus=1 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=5
      --resume={dir}/stylegan3-t-ffhqu-256x256.pkl
    2. RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 1
    opened by vastyao 15
  • gen_images is generating identical images for a given seed even with noise_mode='random'

    I have recently custom-trained StyleGAN3 networks, and when I invoke gen_images.py to generate images for a seed list and specify a truncation value of 0, all of the generated images are identical to each other. Yet for the same seeds with non-zero values for trunc, the images are very different from each other, as one would expect.

    I can reproduce this by just invoking the simplest of gen_images command lines. I have not tested a seed range larger than 30, but I don't see why the results would be any different.

    !python stylegan3/gen_images.py --seeds=1-30 --trunc=0 \
        --outdir='/content/drive/MyDrive/out' \
        --network='/content/drive/MyDrive/networks/x.pkl'

    EDIT: see below, I was misinterpreting this as a change in truncation implementation but actually it seems there may be a different issue, specifically noise_mode='random' doing nothing?

    opened by MoemaMike 12
  • ninja: build stopped: subcommand failed.

    Dear Authors,

    I get the following errors when running the code using the stylegan3-t config:

    Setting up PyTorch plugin "filtered_lrelu_plugin"... Failed!
    Traceback (most recent call last):
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/utils/cpp_extension.py", line 1666, in _run_ninja_build
        subprocess.run(
      File "/cluster/apps/nss/gcc-6.3.0/python/3.8.5/x86_64/lib64/python3.8/subprocess.py", line 512, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "train.py", line 286, in <module>
        main() # pylint: disable=no-value-for-parameter
      File "/cluster/home/user/development/lib64/python3.8/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/cluster/home/user/development/lib64/python3.8/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/cluster/home/user/development/lib64/python3.8/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/cluster/home/user/development/lib64/python3.8/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "train.py", line 281, in main
        launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
      File "train.py", line 96, in launch_training
        subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
      File "train.py", line 47, in subprocess_fn
        training_loop.training_loop(rank=rank, **c)
      File "/cluster/home/user/gan/stylegan3/training/training_loop.py", line 168, in training_loop
        img = misc.print_module_summary(G, [z, c])
      File "/cluster/home/user/gan/stylegan3/torch_utils/misc.py", line 216, in print_module_summary
        outputs = module(*inputs)
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/cluster/home/user/gan/stylegan3/training/networks_stylegan3.py", line 512, in forward
        img = self.synthesis(ws, update_emas=update_emas, **synthesis_kwargs)
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/cluster/home/user/gan/stylegan3/training/networks_stylegan3.py", line 471, in forward
        x = getattr(self, name)(x, w, **layer_kwargs)
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/cluster/home/user/gan/stylegan3/training/networks_stylegan3.py", line 355, in forward
        x = filtered_lrelu.filtered_lrelu(x=x, fu=self.up_filter, fd=self.down_filter, b=self.bias.to(x.dtype),
      File "/cluster/home/user/gan/stylegan3/torch_utils/ops/filtered_lrelu.py", line 114, in filtered_lrelu
        if impl == 'cuda' and x.device.type == 'cuda' and _init():
      File "/cluster/home/user/gan/stylegan3/torch_utils/ops/filtered_lrelu.py", line 26, in _init
        _plugin = custom_ops.get_plugin(
      File "/cluster/home/user/gan/stylegan3/torch_utils/custom_ops.py", line 136, in get_plugin
        torch.utils.cpp_extension.load(name=module_name, build_directory=cached_build_dir,
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/utils/cpp_extension.py", line 1080, in load
        return _jit_compile(
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/utils/cpp_extension.py", line 1293, in _jit_compile
        _write_ninja_file_and_build_library(
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/utils/cpp_extension.py", line 1405, in _write_ninja_file_and_build_library
        _run_ninja_build(
      File "/cluster/home/user/development/lib64/python3.8/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error building extension 'filtered_lrelu_plugin': [1/5] /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/bin/g++ -MMD -MF filtered_lrelu.o.d -DTORCH_EXTENSION_NAME=filtered_lrelu_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/TH -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/THC -isystem /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/include -isystem /cluster/apps/nss/gcc-6.3.0/python/3.8.5/x86_64/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp -o filtered_lrelu.o 
    FAILED: filtered_lrelu.o 
    /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/bin/g++ -MMD -MF filtered_lrelu.o.d -DTORCH_EXTENSION_NAME=filtered_lrelu_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/TH -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/THC -isystem /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/include -isystem /cluster/apps/nss/gcc-6.3.0/python/3.8.5/x86_64/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp -o filtered_lrelu.o 
    In file included from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/ATen/ATen.h:13:0,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                     from /cluster/home/user/development/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                     from /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp:9:
    /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp: In lambda function:
    /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp:149:12: error: expected ‘(’ before ‘constexpr’
             if constexpr (sizeof(scalar_t) <= 4) // Exclude doubles. constexpr prevents template instantiation.
                ^
    /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp: In lambda function:
    /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp:149:12: error: expected ‘(’ before ‘constexpr’
             if constexpr (sizeof(scalar_t) <= 4) // Exclude doubles. constexpr prevents template instantiation.
                ^
    /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp: In lambda function:
    /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu.cpp:149:12: error: expected ‘(’ before ‘constexpr’
             if constexpr (sizeof(scalar_t) <= 4) // Exclude doubles. constexpr prevents template instantiation.
                ^
    [2/5] /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/bin/nvcc  -ccbin /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/bin/gcc -DTORCH_EXTENSION_NAME=filtered_lrelu_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/TH -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/THC -isystem /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/include -isystem /cluster/apps/nss/gcc-6.3.0/python/3.8.5/x86_64/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu_ns.cu -o filtered_lrelu_ns.cuda.o 
    [3/5] /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/bin/nvcc  -ccbin /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/bin/gcc -DTORCH_EXTENSION_NAME=filtered_lrelu_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/TH -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/THC -isystem /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/include -isystem /cluster/apps/nss/gcc-6.3.0/python/3.8.5/x86_64/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu_rd.cu -o filtered_lrelu_rd.cuda.o 
    [4/5] /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/bin/nvcc  -ccbin /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-6.3.0-sqhtfh32p5gerbkvi5hih7cfvcpmewvj/bin/gcc -DTORCH_EXTENSION_NAME=filtered_lrelu_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/TH -isystem /cluster/home/user/development/lib64/python3.8/site-packages/torch/include/THC -isystem /cluster/apps/gcc-6.3.0/cuda-11.1.1-s2fmzfqahrfvezvmg4tslqqedhl3bggv/include -isystem /cluster/apps/nss/gcc-6.3.0/python/3.8.5/x86_64/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /cluster/home/user/.cache/torch_extensions/filtered_lrelu_plugin/0c8207121140d174807b17c24d32b436-tesla-v100-sxm2-32gb/filtered_lrelu_wr.cu -o filtered_lrelu_wr.cuda.o 
    ninja: build stopped: subcommand failed.
    

    The exact command that I'm using is:

    python train.py --outdir=./training-runs --data=$WORKDIR/data.zip --cfg=stylegan3-t --gpus=2 --batch=32 --gamma=8.2
    

    System Details:

    • OS: CentOS Linux release 7.9.2009
    • PyTorch version: 1.9.1
    • CUDA toolkit version: 11.1.1
    • NVIDIA driver version: 450.80.02
    • GPU: V100
    • GCC version: 6.3

    Thank you for your help in advance.

    opened by Msadat97 12
  • Weird shape patterns produced when attempting to train on black & white medical data

    Hello,

    I am trying to train StyleGAN3 on 512x512 black and white PNG images. They are originally of abdominal CT scans with pixels blacked out far from the liver. Example images:

    Overall_Data_Non_Contrast_CT_4720 Overall_Data_Non_Contrast_CT_4740

    When I go to run StyleGAN3 on it, I receive images with weird shapes and lines inside the liver.

    03 04

    These images were taken at 21,600 kimg, but the same patterns can be seen throughout the entire training.

    Command line: ./train.py --outdir /output --gpus 8 --data /input/rescaled255_outline.zip --cfg stylegan3-t --batch 32 --gamma 8.2 --betas 0.,0.99 --aug noaug --freezed 0 --p 0.2 --target 0.6 --cbase 32768 --cmax 512 --dlr 0.002 --mbstd-group 4 --metrics fid50k_full --kimg 25000 --tick 4 --snap 50 --seed 0 --fp32=1 --workers 3

    The only thing I changed in the implementation was making the beta values a command-line parameter. Does anyone have any ideas? I've tried setting beta0=0.9 with the same results, and also tried changing gamma with the same results. When I run the StyleGAN2 config, the lines/shapes don't show up, though the noise is weird, which is why I'm trying to get StyleGAN2 to work.

    Thank you in advance!! :)

    opened by mckellwoodland 9
  • Conditional generation in visualizer

    visualizer.py has been useful for understanding how my model trained and generates images. Now I have moved on to training a conditional GAN, and so far the training results seem reasonable. I also tried to load my conditionally trained weight file into the visualizer, but I do not see a class option. I have been wondering if there is a way to generate images of a certain class in visualizer.py.

    I appreciate any help in advance.

    opened by jasony93 7
  • Can we get more useful pretrained models?

    I'm all out of spare $40k ML rigs and colab pro is going to take months to train this. A lot of us simply don't have the money to train this from scratch.

    With the chip shortage making this particularly tough, is it out of the question that we will see something other than just pretrained face models? The only way for us plebs to really use any of this code is to transfer learn onto it at the moment or wait on a handful of benevolent randos to share a model. Adjusting when retraining from scratch isn't viable without an unreasonably expensive piece of hardware (at the moment at least).

    A full body model trained on humanoid characters or photos, a landscape model, a generic art model--any of those would be more creatively useful than cat pic and portrait models. I really want to dig in to StyleGAN3 and explore what it can do but I'm struggling to transfer learn anything other than faces onto it. Portrait models aren't working for transferring anything other than other portraits (expected but I at least wanted to try).

    Only recently have gotten into StyleGAN, maybe I'll be able to use it in a couple years..

    opened by WyattAutomation 7
  • stuck on evaluating metrics

    I began training a new GAN last night on a dataset of 512x512 images, and after letting it run overnight I still do not have the 'tick 1' benchmark. I did notice in the readme that it would take a while if only training on 1 GPU (I am training on a single 3090), but 14+ hours before the first benchmark seems long. Do I need to reset my training parameters and start over? (gpus=1, batch=32, gamma=8, batch-gpu=8, snap=5)

    opened by eccegallery 7
  • Question: Is it possible to use stylegan2(-ada) models with stylegan3?

    I know that this form is meant for bug reports, but I currently can't find another way to ask.

    Is it possible to use stylegan2-ada(-pytorch) models with stylegan3 and expect decent results or do I need to fully retrain the model?

    If I do need to retrain the model, is there a way to make it augment the dataset like stylegan2-ada? If there is, could you please tell me how to do that?

    Thanks so much! I'm sure this is going to be awesome 😉😉

    opened by Randy1435 7
  • Evaluating Metrics for very long

    Hello. I have the issue that when I want to generate new 512x512 images with a dataset I made using the native dataset tool, it just prints the first tick and then evaluates metrics until I stop the script. What does evaluating metrics mean, and how can I fix this or make it faster? Thanks.

    opened by Hat-The-Second 5
  • Is it possible to control the scale of the output?

    Hi,

    I am tweaking the affine transformations for the first layer at https://github.com/NVlabs/stylegan3/blob/a5a69f58294509598714d1e88c9646c3d7c6ec94/training/networks_stylegan3.py#L205 and I am wondering if it is possible to control the scale as well as the translation and rotation. I have tried adding scale, but the results are bad: if the scale is too big, the face collapses into a point, or it gets shredded all over the place with a smaller scale. In particular, I am interested in generating images with a wide crop, so that the whole head (with hair) and neck is visible.

    I would be very grateful for your help or any comments.

    opened by backpass 5
  • filtered_lrelu_plugin problem

    At first I got:

    filtered_lrelu.cpp(147): error C4984: 'if constexpr' is a C++17 language extension

    It's from this line:

    if constexpr (sizeof(scalar_t) <= 4)

    sizeof(scalar_t) is either less than or equal to 4, or greater than 4, so I tried to remove this conditional.

    But then I got:

    filtered_lrelu_plugin.pyd : fatal error LNK1120: 6 unresolved externals

    There are 6 unresolved symbols about template function 'choose_filtered_lrelu_kernel'. Maybe scalar_t is greater than 4, so I removed all the code referencing 'choose_filtered_lrelu_kernel'.

    But then I failed at this line:

    filtered_lrelu.cpp(160)

    TORCH_CHECK(spec.exec, "internal error - CUDA kernel not found") // This should not happen because we tested earlier that kernel exists.
    

    My environment is Windows 10, python 3.8, CUDA 11.1 (cudnn-11.1-windows-x64-v8.0.5.39), torch 1.9.0.

    opened by k-l-lambda 5
  • increase the efficiency of GPU usage

    Can someone explain? I have a problem with GPU usage while training:

    tick 0     kimg 0.0      time 28s          sec/tick 5.0     sec/kimg 1243.95 maintenance 23.5   cpumem 5.17   gpumem 17.68  reserved 20.87  augment 0.000
    tick 1     kimg 20.0     time 1h 19m 08s   sec/tick 4711.9  sec/kimg 235.59  maintenance 7.7    cpumem 5.54   gpumem 10.88  reserved 20.23  augment 0.188
    tick 2     kimg 40.0     time 2h 37m 04s   sec/tick 4667.8  sec/kimg 233.39  maintenance 7.9    cpumem 4.79   gpumem 10.88  reserved 20.24  augment 0.374
    tick 3     kimg 60.0     time 3h 55m 46s   sec/tick 4713.9  sec/kimg 235.69  maintenance 8.1    cpumem 3.33   gpumem 10.99  reserved 20.24  augment 0.543
    tick 4     kimg 80.0     time 5h 22m 19s   sec/tick 5185.1  sec/kimg 259.25  maintenance 8.2    cpumem 2.63   gpumem 11.09  reserved 20.25  augment 0.688
    tick 5     kimg 100.0    time 6h 48m 03s   sec/tick 5135.8  sec/kimg 256.79  maintenance 8.4    cpumem 2.13   gpumem 11.63  reserved 20.26  augment 0.804
    tick 6     kimg 120.0    time 8h 16m 33s   sec/tick 5302.2  sec/kimg 265.11  maintenance 7.7    cpumem 1.73   gpumem 11.10  reserved 20.26  augment 0.901
    tick 7     kimg 140.0    time 9h 40m 06s   sec/tick 5004.8  sec/kimg 250.24  maintenance 8.4    cpumem 1.37   gpumem 11.09  reserved 20.27  augment 0.984
    tick 8     kimg 160.0    time 10h 59m 36s  sec/tick 4761.6  sec/kimg 238.08  maintenance 8.2    cpumem 1.25   gpumem 11.24  reserved 20.28  augment 1.062
    tick 9     kimg 180.0    time 12h 18m 14s  sec/tick 4709.5  sec/kimg 235.47  maintenance 8.1    cpumem 1.23   gpumem 11.25  reserved 20.28  augment 1.146
    tick 10    kimg 200.0    time 13h 36m 16s  sec/tick 4674.1  sec/kimg 233.71  maintenance 7.8    cpumem 1.24   gpumem 11.64  reserved 20.29  augment 1.231
    tick 11    kimg 220.0    time 14h 54m 19s  sec/tick 4675.1  sec/kimg 233.75  maintenance 7.9    cpumem 1.26   gpumem 11.21  reserved 20.29  augment 1.314
    tick 12    kimg 240.0    time 16h 12m 33s  sec/tick 4686.5  sec/kimg 234.33  maintenance 8.2    cpumem 1.26   gpumem 11.14  reserved 20.30  augment 1.392
    tick 13    kimg 260.0    time 17h 33m 48s  sec/tick 4866.5  sec/kimg 243.32  maintenance 7.9    cpumem 1.22   gpumem 11.50  reserved 20.30  augment 1.476
    

    At tick 0 there is 17 GB of GPU usage, but then it decreases drastically. And increasing the batch size doesn't help because of an out-of-memory error at tick 0. Is there any way to fix it?

    opened by PodoprikhinMaxim 1
  • 4090 Support?

    I've been able to get stylegan3 to work using the Docker image; however, it's running PyTorch 1.10, which doesn't work well with the 4090. Will there be any support for this card in the future, or PyTorch 1.13 support?

    Would seriously love to train models locally with my new card.

    Running the 4090 through docker gives me speeds about three times slower than the A100. Should be much closer to the same speed I believe.

    opened by taconugget 0
  • generate_images raise Value Error: not enough image data??

    I am using my own datasets to train a model with the stylegan3 code. The datasets are gray images. I have trained a model and want to use gen_images.py to generate samples, but it fails when I run the code. The output is as follows:

    Loading networks from "../input/afhqv2model/network-snapshot-000270.pkl"...
    Generating image for seed 0 (0/1) ...
    Setting up PyTorch plugin "bias_act_plugin"... Done.
    Setting up PyTorch plugin "filtered_lrelu_plugin"... Done.
    Traceback (most recent call last):
      File "../input/stylegan3pytorch/stylegan3-main/gen_images.py", line 143, in <module>
        generate_images() # pylint: disable=no-value-for-parameter
      File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "../input/stylegan3pytorch/stylegan3-main/gen_images.py", line 137, in generate_images
        PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save(f'{outdir}/seed{seed:04d}.png')
      File "/opt/conda/lib/python3.7/site-packages/PIL/Image.py", line 2949, in fromarray
        return frombuffer(mode, size, obj, "raw", rawmode, 0, 1)
      File "/opt/conda/lib/python3.7/site-packages/PIL/Image.py", line 2876, in frombuffer
        return frombytes(mode, size, data, decoder_name, args)
      File "/opt/conda/lib/python3.7/site-packages/PIL/Image.py", line 2822, in frombytes
        im.frombytes(data, decoder_name, args)
      File "/opt/conda/lib/python3.7/site-packages/PIL/Image.py", line 831, in frombytes
        raise ValueError("not enough image data")
    ValueError: not enough image data

    What can I do about that? Thanks for helping me solve the problem.

    opened by huaikaimancheng 1
  • Implementing training in HSV color space?

    I've been thinking for a while now about the fact that most image-based models operate in RGB color space. While it makes sense from a practicality point of view, since it's the most common format and thus requires little or no preprocessing/conversion, it has some properties that make it a less-than-ideal choice for training a neural network. Interpolation through the RGB space is "unnatural" in terms of human perception - points mathematically close to each other in the color space can be pretty far apart/different in terms of their perceived similarity. While human perception obviously isn't the same as a neural network's, it stands to reason that a smoother, more natural color interpolation curve would be compatible with the whole concept of gradient descent, and could result in faster training/convergence - especially in datasets with predominantly human-created images (artwork, design, fashion, etc.) where human color perception is very much a factor in color distribution in the training images.

    All of that in mind, I set out to convert StyleGAN3 to operate in the HSV color space. It seemed like a trivial task at first - after loading the RGB images as usual, just convert to HSV and normalize to the -1,1 range as you would RGB images. The network is then effectively training in HSV, and when synthesizing we just need to do HSV -> RGB after de-normalizing. Of course, it ended up being not quite that simple...

    I'm pretty confident in doing any necessary conversion/normalization changes that need to be done to training/dataset.py and training/training_loop.py, but things get more complicated from there. Since different color spaces were never a concern, RGB is assumed and hardcoded in quite a few places including the core network architecture(s), with toRGB layers, fromRGB functions, etc. And I definitely don't have the confidence to make changes to the architecture and also know that what I'm doing actually makes sense for what I'm trying to accomplish, so hoping someone more knowledgeable can chime in...

    How difficult would this actually be to implement, and would it require architecture changes, or can I just assume that everything will work as expected provided I just normalize network inputs to the same range?

    opened by Kaoru8 0
  • Bug in conditioning of discriminator?

    Describe the bug: While the conditioning of the generator seems to construct the net and behave as expected, the conditioning of the discriminator seems to ignore the depth of the mapping network.

    To reproduce: Create any model with a --map-depth value other than 8:

    Generator            Parameters  Buffers  Output shape      Datatype
    ---                  ---         ---      ---               ---
    mapping.embed        3072        -        [16, 512]         float32
    mapping.fc0          524800      -        [16, 512]         float32
    mapping.fc1          262656      -        [16, 512]         float32
    mapping              -           512      [16, 10, 512]     float32
    synthesis.b4.conv1   69761       32       [16, 64, 4, 4]    float32
    synthesis.b4.torgb   33027       -        [16, 3, 4, 4]     float32
    synthesis.b4:0       1024        16       [16, 64, 4, 4]    float32
    synthesis.b4:1       -           -        [16, 3, 4, 4]     float32
    synthesis.b8.conv0   69761       80       [16, 64, 8, 8]    float32
    synthesis.b8.conv1   69761       80       [16, 64, 8, 8]    float32
    synthesis.b8.torgb   33027       -        [16, 3, 8, 8]     float32
    synthesis.b8:0       -           16       [16, 64, 8, 8]    float32
    synthesis.b8:1       -           -        [16, 3, 8, 8]     float32
    synthesis.b16.conv0  69761       272      [16, 64, 16, 16]  float32
    synthesis.b16.conv1  69761       272      [16, 64, 16, 16]  float32
    synthesis.b16.torgb  33027       -        [16, 3, 16, 16]   float32
    synthesis.b16:0      -           16       [16, 64, 16, 16]  float32
    synthesis.b16:1      -           -        [16, 3, 16, 16]   float32
    synthesis.b32.conv0  69761       1040     [16, 64, 32, 32]  float32
    synthesis.b32.conv1  69761       1040     [16, 64, 32, 32]  float32
    synthesis.b32.torgb  33027       -        [16, 3, 32, 32]   float32
    synthesis.b32:0      -           16       [16, 64, 32, 32]  float32
    synthesis.b32:1      -           -        [16, 3, 32, 32]   float32
    synthesis.b64.conv0  69761       4112     [16, 64, 64, 64]  float32
    synthesis.b64.conv1  69761       4112     [16, 64, 64, 64]  float32
    synthesis.b64.torgb  33027       -        [16, 3, 64, 64]   float32
    synthesis.b64:0      -           16       [16, 64, 64, 64]  float32
    synthesis.b64:1      -           -        [16, 3, 64, 64]   float32
    ---                  ---         ---      ---               ---
    Total                1584536     11632    -                 -
    
    
    Discriminator  Parameters  Buffers  Output shape      Datatype
    ---            ---         ---      ---               ---
    b64.fromrgb    256         16       [16, 64, 64, 64]  float32
    b64.skip       4096        16       [16, 64, 32, 32]  float32
    b64.conv0      36928       16       [16, 64, 64, 64]  float32
    b64.conv1      36928       16       [16, 64, 32, 32]  float32
    b64            -           16       [16, 64, 32, 32]  float32
    b32.skip       4096        16       [16, 64, 16, 16]  float32
    b32.conv0      36928       16       [16, 64, 32, 32]  float32
    b32.conv1      36928       16       [16, 64, 16, 16]  float32
    b32            -           16       [16, 64, 16, 16]  float32
    b16.skip       4096        16       [16, 64, 8, 8]    float32
    b16.conv0      36928       16       [16, 64, 16, 16]  float32
    b16.conv1      36928       16       [16, 64, 8, 8]    float32
    b16            -           16       [16, 64, 8, 8]    float32
    b8.skip        4096        16       [16, 64, 4, 4]    float32
    b8.conv0       36928       16       [16, 64, 8, 8]    float32
    b8.conv1       36928       16       [16, 64, 4, 4]    float32
    b8             -           16       [16, 64, 4, 4]    float32
    mapping.embed  384         -        [16, 64]          float32
    mapping.fc0    4160        -        [16, 64]          float32
    mapping.fc1    4160        -        [16, 64]          float32
    mapping.fc2    4160        -        [16, 64]          float32
    mapping.fc3    4160        -        [16, 64]          float32
    mapping.fc4    4160        -        [16, 64]          float32
    mapping.fc5    4160        -        [16, 64]          float32
    mapping.fc6    4160        -        [16, 64]          float32
    mapping.fc7    4160        -        [16, 64]          float32
    b4.mbstd       -           -        [16, 65, 4, 4]    float32
    b4.conv        37504       16       [16, 64, 4, 4]    float32
    b4.fc          65600       -        [16, 64]          float32
    b4.out         4160        -        [16, 64]          float32
    b4             -           -        [16, 1]           float32
    ---            ---         ---      ---               ---
    Total          452992      288      -                 -
    

    Expected behavior: I expect that the discriminator would have the same number of mapping layers as the generator. In the example above, 2 was used for --map-depth, but the discriminator has 8 mapping layers at its end for some reason. While it seems that this doesn't break the model (not sure), because it trains and progresses, I didn't do a long run, just a couple of kimg, so I can't say anything in depth about it.

    The Alias-Free GAN (StyleGAN3) paper implies that 8 layers of the mapping network are unnecessary and 2 layers are enough, so I decided to use the same strategy with the SG2 network. By the way, the same thing happens with the --cfg=stylegan3-t/r configs, even though 2 layers are configured for them out of the box in the code.

    At first I thought that there was some hardcoded constant of 8 layers in the code that was left by mistake, but I didn't find anything that could prove this point; moreover, I find this part of the code very convoluted and hard to understand (at least for me), which gave me more questions than answers. I am talking about the networks_stylegan2/3.py files, because I was investigating them, and I'm pretty sure this is the right place to look.

    Additional context: This is just a toy model I was using to experiment with, so don't try to understand why it has just 64 filters in all channels, and other stuff ;) The command used to initialize the model: python train.py --cfg=stylegan2 --gpus=1 --batch=16 --outdir= --data= --cmax=64 --metrics=none --gamma=2 --mirror=1 --fp32=1 --cond=1 --map-depth=2. A GTX 10x0 series GPU was used, so FP32 mode was turned on, because mixed precision didn't give any speedup, only a slowdown on GPUs of this series.

    opened by DEBIHOOD 0
  • Geometric artifact when training on FFHQ256x256 with config stylegan3-r

    I tried to retrain stylegan3-r on FFHQ 256x256 with reduced channel numbers. As this doc (https://github.com/NVlabs/stylegan3/blob/main/docs/configs.md) says, I expected that reducing the channel numbers would not reduce quality. I set channel_base=16384 and channel_max=256 (so they are automatically doubled in the model), and also set R1 gamma to 2.

    Now FID reaches 12.xxx, but the results have serious weird artifacts and don't look like FID 12.xxxx. This didn't happen when I trained the StyleGAN2 config.

    I'm working on video generation at the latent level, but the provided weights are too heavy for my model. I expected StyleGAN3 could reduce the serious problem of aliasing. I have already read the related issues #185, #77, etc. Is there really no way to avoid this phenomenon?

    (Screenshots attached.)
    opened by TaekyungKi 0