Official Implementation for HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

Overview

HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

Yuval Alaluf*, Omer Tov*, Ron Mokady, Rinon Gal, Amit H. Bermano
*Denotes equal contribution

The inversion of real images into StyleGAN's latent space is a well-studied problem. Nevertheless, applying existing approaches to real-world scenarios remains an open challenge, due to an inherent trade-off between reconstruction and editability: latent space regions which can accurately represent real images typically suffer from degraded semantic control. Recent work proposes to mitigate this trade-off by fine-tuning the generator to add the target image to well-behaved, editable regions of the latent space. While promising, this fine-tuning scheme is impractical for prevalent use as it requires a lengthy training phase for each new image. In this work, we introduce this approach into the realm of encoder-based inversion. We propose HyperStyle, a hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space. A naive modulation approach would require training a hypernetwork with over three billion parameters. Through careful network design, we reduce this to be in line with existing encoders. HyperStyle yields reconstructions comparable to those of optimization techniques with the near real-time inference capabilities of encoders. Lastly, we demonstrate HyperStyle's effectiveness on several applications beyond the inversion task, including the editing of out-of-domain images which were never seen during training.



Given a desired input image, our hypernetworks learn to modulate a pre-trained StyleGAN network to achieve accurate image reconstructions in editable regions of the latent space. Doing so enables one to effectively apply techniques such as StyleCLIP and InterFaceGAN for editing real images.

Description

Official Implementation of our HyperStyle paper for both training and evaluation. HyperStyle introduces a new approach for learning to efficiently modify a pretrained StyleGAN generator based on a given target image through the use of hypernetworks.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
  • Python 3

Installation

  • Dependencies: We recommend running this repository using Anaconda.
    All dependencies for defining the environment are provided in environment/hyperstyle_env.yaml.

Pretrained HyperStyle Models

In this repository, we provide pretrained HyperStyle models for various domains.
All models make use of a modified, pretrained e4e encoder for obtaining an initial inversion into the W latent space.

Please download the pretrained models from the following links.

Path | Description
Human Faces | HyperStyle trained on the FFHQ dataset.
Cars | HyperStyle trained on the Stanford Cars dataset.
Wild | HyperStyle trained on the AFHQ Wild dataset.

Auxiliary Models

In addition, we provide various auxiliary models needed for training your own HyperStyle models from scratch.
These include the pretrained e4e encoders into W, pretrained StyleGAN2 generators, and models used for loss computation.


Pretrained W-Encoders

Path | Description
Faces W-Encoder | Pretrained e4e encoder trained on FFHQ into the W latent space.
Cars W-Encoder | Pretrained e4e encoder trained on Stanford Cars into the W latent space.
Wild W-Encoder | Pretrained e4e encoder trained on AFHQ Wild into the W latent space.

StyleGAN2 Generators

Path | Description
FFHQ StyleGAN | StyleGAN2 model trained on FFHQ with 1024x1024 output resolution.
LSUN Car StyleGAN | StyleGAN2 model trained on LSUN Car with 512x384 output resolution.
AFHQ Wild StyleGAN | StyleGAN-ADA model trained on AFHQ Wild with 512x512 output resolution.
Toonify | Toonify generator from Doron Adler and Justin Pinkney, converted to PyTorch using rosinality's conversion script; used in domain adaptation.
Pixar | Pixar generator from StyleGAN-NADA, used in domain adaptation.

Note: all StyleGAN models are converted from the official TensorFlow models to PyTorch using the conversion script from rosinality.


Other Utility Models

Path | Description
IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN, used for our ID loss and as the encoder backbone on the human facial domain.
ResNet-34 Model | ResNet-34 model trained on ImageNet, taken from torchvision, used to initialize our encoder backbone.
MoCov2 Model | Pretrained ResNet-50 model trained using MoCo v2, used for computing the MoCo-based similarity loss on non-facial domains. The model is taken from the official implementation.
CurricularFace Backbone | Pretrained CurricularFace model taken from HuangYG123, used for ID similarity metric computation.
MTCNN | Weights for the MTCNN model taken from TreB1eN, used for ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.)

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/paths_config.py.
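For reference, pointing the repository at custom locations amounts to editing the path dictionaries in configs/paths_config.py, in the same style as the dataset_paths example shown in the Training section. A minimal sketch is shown below; the dictionary and key names here are illustrative, so keep the names already defined in the file and simply point them at your download locations.

import os

model_paths = {
    # illustrative keys and file names -- use the keys already defined in configs/paths_config.py
    'stylegan_ffhq': 'pretrained_models/stylegan2-ffhq-config-f.pt',
    'ir_se50': 'pretrained_models/model_ir_se50.pth',
    'moco': 'pretrained_models/moco_v2_800ep_pretrain.pt',
    'curricular_face': 'pretrained_models/CurricularFace_Backbone.pth',
}

# optional sanity check that the files actually exist at the configured locations
missing = [name for name, path in model_paths.items() if not os.path.exists(path)]
assert not missing, f"Missing pretrained models: {missing}"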



Training

Preparing your Data

In order to train HyperStyle on your own data, you should perform the following steps:

  1. Update configs/paths_config.py with the necessary data paths and model paths for training and inference.
dataset_paths = {
    'train_data': '/path/to/train/data',
    'test_data': '/path/to/test/data',
}
  2. Configure a new dataset under the DATASETS variable defined in configs/data_configs.py. There, you should define the source/target data paths for the train and test sets as well as the transforms to be used for training and inference.
DATASETS = {
	'my_hypernet': {
		'transforms': transforms_config.EncodeTransforms,   # can define a custom transform, if desired
		'train_source_root': dataset_paths['train_data'],
		'train_target_root': dataset_paths['train_data'],
		'test_source_root': dataset_paths['test_data'],
		'test_target_root': dataset_paths['test_data'],
	}
}
  3. To train with your newly defined dataset, simply use the flag --dataset_type my_hypernet.

Preparing your Generator

In this work, we use rosinality's StyleGAN2 implementation. If you wish to use your own generator trained using NVIDIA's implementation, there are a few options we recommend:

  1. Using NVIDIA's StyleGAN2 / StyleGAN-ADA TensorFlow implementation.
    You can then convert the TensorFlow .pkl checkpoints to the supported format using the conversion script found in rosinality's implementation.
  2. Using NVIDIA's StyleGAN-ADA PyTorch implementation.
    You can then convert the PyTorch .pkl checkpoints to the supported format using the conversion script created by Justin Pinkney found in dvschultz's fork.

Once you have the converted .pt files, you should be ready to use them in this repository.
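As a quick sanity check that a converted checkpoint is usable, you can try loading it with the StyleGAN2 implementation bundled in this repository. The following is a minimal sketch: the import path assumes the module layout shown in the repository structure section, and the checkpoint path and output size are examples you should adjust to your generator.

import torch
from models.stylegan2.model import Generator  # rosinality's StyleGAN2, bundled in this repository

ckpt = torch.load('pretrained_models/my_converted_generator.pt', map_location='cpu')  # example path
generator = Generator(size=1024, style_dim=512, n_mlp=8)  # match your generator's output resolution
generator.load_state_dict(ckpt['g_ema'], strict=True)  # converted checkpoints store the weights under 'g_ema'
generator.eval()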


Training HyperStyle

The main training script can be found in scripts/train.py.
Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs.
Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.

Training HyperStyle with the settings used in the paper can be done by running the following command. Here, we provide an example for training on the human faces domain:

python scripts/train.py \
--dataset_type=ffhq_hypernet \
--encoder_type=SharedWeightsHyperNetResNet \
--exp_dir=experiments/hyperstyle \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=5000 \
--save_interval=10000 \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0.1 \
--n_iters_per_batch=5 \
--max_val_batches=150 \
--output_size=1024 \
--load_w_encoder \
--w_encoder_checkpoint_path pretrained_models/faces_w_encoder \
--layers_to_tune=0,2,3,5,6,8,9,11,12,14,15,17,18,20,21,23,24

Additional Notes:

  • To select which generator layers to tune with the hypernetwork, you can use the --layers_to_tune flag.
    • By default, we will alter all non-toRGB convolutional layers.
  • ID/similarity losses:
    • For the human facial domain we use a specialized ID loss based on a pretrained ArcFace network. This is set using the flag --id_lambda=0.1.
    • For all other domains, please set --id_lambda=0 and --moco_lambda=0.5 to use the MoCo-based similarity loss from Tov et al.
      • Note, you cannot set both id_lambda and moco_lambda to be active simultaneously.
  • You should also adjust the --output_size and --stylegan_weights flags according to your StyleGAN generator.
  • To train HyperStyle with Refinement Blocks based on separable convolutions (see the ablation study), you can set --encoder_type to SharedWeightsHyperNetResNetSeparable.
  • See options/train_options.py for all training-specific flags.

Pre-Extracting Initial Inversions:

To provide a small speed-up and slightly reduce memory consumption, you can pre-extract all the latents and inversions from our W-encoder rather than inverting on the fly during training.
We provide an example for how to do this in configs/data_configs.py under the ffhq_hypernet_pre_extract dataset.
Here, we must define:

  • train_source_root: the directory holding all the initial inversions
  • train_target_root: the directory holding all target images (i.e., original images)
  • train_latents_path: the .npy file holding the latents for the inversions of the form
    latents = { "0.jpg": latent, "1.jpg": latent, ... }.

And similarly for the test dataset.

Performing the above and pre-extracting the latents and inversions also allows you to train HyperStyle using latents from other encoders such as pSp, e4e, and ReStyle into W+, rather than using our pretrained encoder into W.

During training, we will use the LatentsImagesDataset for loading the inversion, latent code, and target image.
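For reference, the latents file expected by LatentsImagesDataset is a standard NumPy pickle of a Python dict keyed by image file name. A minimal sketch of producing it is shown below; my_inversions is a hypothetical iterable of (file name, latent array) pairs produced by your W-encoder, and the save path is an example.

import numpy as np

latents = {}
for fname, latent in my_inversions:  # hypothetical: (file name, np.ndarray latent) pairs from your W-encoder
    latents[fname] = latent

# stored as a pickled dict, e.g. {"0.jpg": latent, "1.jpg": latent, ...}
np.save('/path/to/train_latents.npy', latents)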


Inference

Inference Notebooks

To help visualize the results of HyperStyle, we provide a Jupyter notebook found in notebooks/inference_playground.ipynb.
The notebook will download the pretrained models and run inference on the images found in notebooks/images or on images of your choosing. It is recommended to run this in Google Colab.

We have also provided a notebook for generating interpolation videos such as those found in the project page. This notebook can be run using Google Colab here.


Inference Script

You can use scripts/inference.py to apply a trained HyperStyle model on a set of images:

python scripts/inference.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=5 \
--load_w_encoder \
--w_encoder_checkpoint_path /path/to/w_encoder.pt

This script will save each step's outputs in a separate sub-directory (e.g., the outputs of step i will be saved in /path/to/experiment/inference_results/i). In addition, side-by-side reconstruction results will be saved to /path/to/experiment/inference_coupled.

Notes:

  • By default, the images will be saved at their original output resolutions (e.g., 1024x1024 for faces, 512x384 for cars).
    • If you wish to save outputs resized to resolutions of 256x256 (or 256x192 for cars), you can do so by adding the flag --resize_outputs.
  • This script will also save all the latents as an .npy file in a dictionary format as follows:
    • latents = { "0.jpg": latent, "1.jpg": latent, ... }
  • In addition, by setting the flag --save_weight_deltas, we will save the final predicted weight deltas for each image.
    • These will be saved as .npy files in the sub-directory weight_deltas.
    • Setting this flag is important if you would like to apply the predicted deltas in a downstream task, for example, editing with StyleCLIP (see below).
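Both the saved latents and the weight deltas are standard NumPy files and can be loaded back for downstream use. A minimal sketch, with illustrative file names:

import os
import numpy as np

# latents.npy stores a pickled dict mapping image file names to latent codes
latents = np.load('/path/to/experiment/latents.npy', allow_pickle=True).item()
latent = latents['0.jpg']

# each image's predicted weight deltas are saved as a separate .npy file under weight_deltas/
deltas = np.load(os.path.join('/path/to/experiment/weight_deltas', '0.npy'), allow_pickle=True)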

Computing Metrics

Given a trained model and generated outputs, we can compute the loss metrics on a given dataset.
These scripts receive the inference output directory and ground truth directory.

python scripts/calc_losses_on_images.py \
--metrics lpips,l2,msssim \
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images

Here, we can compute multiple metrics using a comma-separated list with the flag --metrics.

Similarly, to compute the ID similarity:

python scripts/calc_id_loss_parallel.py \
--output_path=/path/to/experiment/inference_results \
--gt_path=/path/to/test_images

These scripts will traverse through each sub-directory of output_path to compute the metrics on each step's output images.
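If you would like to compute an additional metric not covered by the provided scripts, the same per-step traversal can be reproduced in a few lines. The sketch below is illustrative: it assumes matching file names in the output and ground-truth directories and uses a plain per-pixel L2 as the example metric.

import os
import numpy as np
from PIL import Image

def per_step_l2(output_path, gt_path):
    # each sub-directory of output_path holds the outputs of one refinement step
    scores = {}
    for step in sorted(os.listdir(output_path)):
        step_dir = os.path.join(output_path, step)
        if not os.path.isdir(step_dir):
            continue
        errors = []
        for fname in os.listdir(step_dir):
            out = np.asarray(Image.open(os.path.join(step_dir, fname)).convert('RGB'), dtype=np.float32) / 255.0
            gt_img = Image.open(os.path.join(gt_path, fname)).convert('RGB').resize((out.shape[1], out.shape[0]))
            gt = np.asarray(gt_img, dtype=np.float32) / 255.0
            errors.append(float(np.mean((out - gt) ** 2)))
        scores[step] = float(np.mean(errors))
    return scores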


Editing


Editing results obtained via HyperStyle using StyleCLIP, InterFaceGAN, and GANSpace, respectively.

For performing inference and editing using InterFaceGAN (for faces) and GANSpace (for cars), you can run editing/inference_face_editing.py and editing/inference_cars_editing.py, respectively.


Editing Faces with InterFaceGAN:

python editing/inference_face_editing.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=3 \
--edit_directions=age,pose,smile \
--factor_ranges=5

For InterFaceGAN we currently support edits of age, pose, and smile.


Editing Cars with GANSpace:

python editing/inference_cars_editing.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--n_iters_per_batch=3

For GANSpace we currently support edits of pose, cube, color, and grass.

These scripts will perform the inversion immediately followed by the latent space edit.
For each image, we save the original image followed by the inversion and the resulting edits.


Editing Faces with StyleCLIP:

In addition, we support editing with StyleCLIP's global directions approach on the human faces domain. Editing can be performed by running editing/styleclip/edit.py. For example,

python editing/styleclip/edit.py \
--exp_dir /path/to/experiment \
--weight_deltas_path /path/to/experiment/weight_deltas \
--neutral_text "a face" \
--target_text "a face with a beard"

Note: before running the above script, you need to install the official CLIP package:

pip install git+https://github.com/openai/CLIP.git

Note: we assume that latents.npy and the directory weight_deltas, obtained by running inference.py, are both saved in the given exp_dir.
For each input image we save a grid of results with different values of alpha and beta as defined in StyleCLIP.


Domain Adaptation


Domain adaptation results obtained via HyperStyle by applying the learned weight offsets to various fine-tuned generators.

In scripts/run_domain_adaptation.py, we provide a script for performing domain adaptation from the FFHQ domain to another (e.g., toons or sketches). Specifically, using a HyperStyle network trained on FFHQ, we can predict the weight offsets for a given input image. We can then apply the predicted weight offsets to a fine-tuned generator to obtain a translated image that better preserves the input image.
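Conceptually, the predicted offsets are applied multiplicatively to the generator's convolution weights, i.e., new_weight = weight * (1 + delta), with the fine-tuned generator's weights as the starting point. The sketch below illustrates this idea only; in the repository the actual application and the alignment between predicted deltas and generator layers are handled by the HyperStyle class, and the dict-of-deltas format used here is an assumption for illustration.

import torch

def apply_weight_deltas(generator, weight_deltas):
    # weight_deltas: assumed dict mapping parameter names of the (fine-tuned)
    # generator to delta tensors broadcastable to the parameter's shape
    with torch.no_grad():
        for name, param in generator.named_parameters():
            delta = weight_deltas.get(name)
            if delta is not None:
                delta = torch.as_tensor(delta, dtype=param.dtype, device=param.device)
                param.mul_(1 + delta)  # theta_hat = theta * (1 + delta)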

An example command is provided below:

python scripts/run_domain_adaptation.py \
--exp_dir /path/to/experiment \
--checkpoint_path=experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--load_w_encoder \
--w_encoder_checkpoint_path=pretrained_models/faces_w_encoder.pt \
--restyle_checkpoint_path=pretrained_models/restyle_e4e_ffhq_encode.pt \
--finetuned_generator_checkpoint_path=pretrained_models/pixar.pt \
--n_iters_per_batch=2 \
--restyle_n_iterations=2

Here, since we are performing a translation to a new domain, we recommend setting the number of iterations to a small number (e.g., 2-3).

Below we provide links to the pre-trained ReStyle-e4e network and various fine-tuned generators.

Path | Description
FFHQ ReStyle e4e | ReStyle e4e trained on FFHQ with 1024x1024 output resolution.
Toonify | Toonify generator from Doron Adler and Justin Pinkney, converted to PyTorch using rosinality's conversion script; used in domain adaptation.
Pixar | Pixar generator from StyleGAN-NADA, used in domain adaptation.
Sketch | Sketch generator from StyleGAN-NADA, used in domain adaptation.
Disney Princess | Disney princess generator from StyleGAN-NADA, used in domain adaptation.

Repository structure

Path | Description
hyperstyle Repository root folder
├  configs Folder containing configs defining model/data paths and data transforms
├  criteria Folder containing the various loss criteria used for training
├  datasets Folder with various dataset objects
├  docs Folder containing images displayed in the README
├  environment Folder containing Anaconda environment used in our experiments
├  editing Folder containing scripts for applying various editing techniques
├  licenses Folder containing licenses of the open source projects used in this repository
├ models Folder containing all the models and training objects
│  ├  encoders Folder containing various encoder architecture implementations such as the W-encoder, pSp, and e4e
│  ├  hypernetworks Implementations of our hypernetworks and Refinement Blocks
│  ├  mtcnn MTCNN implementation from TreB1eN
│  ├  stylegan2 StyleGAN2 model from rosinality
│  ├  hyperstyle.py Main class for our HyperStyle network
├  notebooks Folder with jupyter notebooks containing HyperStyle inference playgrounds
├  options Folder with training and test command-line options
├  scripts Folder with running scripts for training, inference, and metric computations
├  training Folder with main training logic and Ranger implementation from lessw2020
├  utils Folder with various utility functions

Related Works

Many GAN inversion techniques focus on finding a latent code that most accurately reconstructs a given image using a fixed, pre-trained generator. These works include encoder-based approaches such as pSp, e4e and ReStyle, and optimization techniques such as those from Abdal et al. and Zhu et al., among many others.

In contrast, HyperStyle learns to modulate the weights of a pre-trained StyleGAN using a hypernetwork to achieve more accurate reconstructions. Previous generator-tuning approaches performed a per-image optimization to fine-tune the generator weights (Roich et al.) or feature activations (Bau et al.).

Given our inversions we can apply off-the-shelf editing techniques such as StyleCLIP, InterFaceGAN, and GANSpace, even on the modified generator.

Finally, we can apply weight offsets learned on HyperStyle trained on FFHQ to fine-tuned generators such as those obtained from StyleGAN-NADA, resulting in more faithful translations.

Credits

StyleGAN2 model and implementation:
https://github.com/rosinality/stylegan2-pytorch
Copyright (c) 2019 Kim Seonghyeon
License (MIT) https://github.com/rosinality/stylegan2-pytorch/blob/master/LICENSE

IR-SE50 model and implementations:
https://github.com/TreB1eN/InsightFace_Pytorch
Copyright (c) 2018 TreB1eN
License (MIT) https://github.com/TreB1eN/InsightFace_Pytorch/blob/master/LICENSE

Ranger optimizer implementation:
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
License (Apache License 2.0) https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/LICENSE

LPIPS model and implementation:
https://github.com/S-aiueo32/lpips-pytorch
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/S-aiueo32/lpips-pytorch/blob/master/LICENSE

pSp model and implementation:
https://github.com/eladrich/pixel2style2pixel
Copyright (c) 2020 Elad Richardson, Yuval Alaluf
License (MIT) https://github.com/eladrich/pixel2style2pixel/blob/master/LICENSE

e4e model and implementation:
https://github.com/omertov/encoder4editing
Copyright (c) 2021 omertov
License (MIT) https://github.com/omertov/encoder4editing/blob/main/LICENSE

ReStyle model and implementation:
https://github.com/yuval-alaluf/restyle-encoder
Copyright (c) 2021 Yuval Alaluf
License (MIT) https://github.com/yuval-alaluf/restyle-encoder/blob/main/LICENSE

StyleCLIP implementation:
https://github.com/orpatashnik/StyleCLIP
Copyright (c) 2021 Or Patashnik, Zongze Wu
https://github.com/orpatashnik/StyleCLIP/blob/main/LICENSE

StyleGAN-NADA models:
https://github.com/rinongal/StyleGAN-nada
Copyright (c) 2021 rinongal
https://github.com/rinongal/StyleGAN-nada/blob/main/LICENSE

Please Note: The CUDA files under the StyleGAN2 ops directory are made available under the Nvidia Source Code License-NC

Acknowledgments

This code borrows from pixel2style2pixel, encoder4editing, and ReStyle.

Citation

If you use this code for your research, please cite the following work:

@misc{alaluf2021hyperstyle,
      title={HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing}, 
      author={Yuval Alaluf and Omer Tov and Ron Mokady and Rinon Gal and Amit H. Bermano},
      year={2021},
      eprint={2111.15666},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}