Official Implementation for Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Last update: Dec 30, 2022

Related tags

Deep Learning generative-adversarial-network image-translation stylegan stylegan-encoder cvpr2021 pixel2style2pixel psp-model psp-framework

Overview

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard "invert first, edit later" methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain.

The proposed pixel2style2pixel framework can be used to solve a wide variety of image-to-image translation tasks. Here we show results of pSp on StyleGAN inversion, multi-modal conditional image synthesis, facial frontalization, inpainting and super-resolution.

Description

Official Implementation of our pSp paper for both training and evaluation. The pSp method extends the StyleGAN model to allow solving different image-to-image translation problems using its encoder.

Recent Updates

2020.10.04: Initial code release
2020.10.06: Add pSp toonify model (Thanks to the great work from Doron Adler and Justin Pinkney)!
2021.04.23: Added several new features:

Added support for StyleGANs of different resolutions (e.g., 256, 512, 1024). This can be set using the flag --output_size, which is set to 1024 by default.
Added support for the MoCo-Based similarity loss introduced in encoder4editing (Tov et al. 2021). More details are provided below.

2021.07.06: Added support for training with Weights & Biases. See below for details.

Applications

StyleGAN Encoding

Here, we use pSp to find the latent code of real images in the latent domain of a pretrained StyleGAN generator.

Face Frontalization

In this application we want to generate a front-facing face from a given input image.

Conditional Image Synthesis

Here we wish to generate photo-realistic face images from ambiguous sketch images or segmentation maps. Using style-mixing, we inherently support multi-modal synthesis for a single input.

Super Resolution

Given a low-resolution input image, we generate a corresponding high-resolution image. As this too is an ambiguous task, we can use style-mixing to produce several plausible results.

Getting Started

Prerequisites

Linux or macOS
NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
Python 2 or 3

Installation

Clone this repo:

git clone https://github.com/eladrich/pixel2style2pixel.git
cd pixel2style2pixel

Dependencies:
We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/psp_env.yaml.

Inference Notebook

To help visualize the pSp framework on multiple tasks and to help you get started, we provide a Jupyter notebook found in notebooks/inference_playground.ipynb that allows one to visualize the various applications of pSp.
The notebook will download the necessary pretrained models and run inference on the images found in notebooks/images.
For the tasks of conditional image synthesis and super resolution, the notebook also demonstrates pSp's ability to perform multi-modal synthesis using style-mixing.

Pretrained Models

Please download the pre-trained models from the following links. Each pSp model contains the entire pSp architecture, including the encoder and decoder weights.

Path	Description
StyleGAN Inversion	pSp trained with the FFHQ dataset for StyleGAN inversion.
Face Frontalization	pSp trained with the FFHQ dataset for face frontalization.
Sketch to Image	pSp trained with the CelebA-HQ dataset for image synthesis from sketches.
Segmentation to Image	pSp trained with the CelebAMask-HQ dataset for image synthesis from segmentation maps.
Super Resolution	pSp trained with the CelebA-HQ dataset for super resolution (up to x32 down-sampling).
Toonify	pSp trained with the FFHQ dataset for toonification using StyleGAN generator from Doron Adler and Justin Pinkney.

If you wish to use one of the pretrained models for training or inference, you may do so using the flag --checkpoint_path.

In addition, we provide various auxiliary models needed for training your own pSp model from scratch as well as pretrained models needed for computing the ID metrics reported in the paper.

Path	Description
FFHQ StyleGAN	StyleGAN model pretrained on FFHQ taken from rosinality with 1024x1024 output resolution.
IR-SE50 Model	Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss during pSp training.
MoCo ResNet-50	Pretrained ResNet-50 model trained using MOCOv2 for computing MoCo-based similarity loss on non-facial domains. The model is taken from the official implementation.
CurricularFace Backbone	Pretrained CurricularFace model taken from HuangYG123 for use in ID similarity metric computation.
MTCNN	Weights for MTCNN model taken from TreB1eN for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.)

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/paths_config.py.

Training

Preparing your Data

Currently, we provide support for numerous datasets and experiments (encoding, frontalization, etc.).
- Refer to configs/paths_config.py to define the necessary data paths and model paths for training and evaluation.
- Refer to configs/transforms_config.py for the transforms defined for each dataset/experiment.
- Finally, refer to configs/data_configs.py for the source/target data paths for the train and test sets as well as the transforms.
If you wish to experiment with your own dataset, you can simply make the necessary adjustments in:
1. data_configs.py to define your data paths.
2. transforms_config.py to define your own data transforms.

As an example, assume we wish to run encoding using ffhq (dataset_type=ffhq_encode). We first go to configs/paths_config.py and define:

dataset_paths = {
    'ffhq': '/path/to/ffhq/images256x256',
    'celeba_test': '/path/to/CelebAMask-HQ/test_img',
}

The transforms for the experiment are defined in the class EncodeTransforms in configs/transforms_config.py.
Finally, in configs/data_configs.py, we define:

DATASETS = {
    'ffhq_encode': {
        'transforms': transforms_config.EncodeTransforms,
        'train_source_root': dataset_paths['ffhq'],
        'train_target_root': dataset_paths['ffhq'],
        'test_source_root': dataset_paths['celeba_test'],
        'test_target_root': dataset_paths['celeba_test'],
    },
}

When defining our datasets, we will take the values in the above dictionary.
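
As a rough sketch of the custom-dataset adjustments described above, the snippet below registers an experiment under the made-up name my_encode. The keys my_train and my_test, their paths, and the stand-in EncodeTransforms class are assumptions for illustration only; in the repository the corresponding entries would go into configs/paths_config.py and configs/data_configs.py.

```python
# Stand-in for transforms_config.EncodeTransforms (assumption: the real class
# lives in the repository; a placeholder keeps this sketch self-contained).
class EncodeTransforms:
    pass


# Hypothetical entries for configs/paths_config.py.
dataset_paths = {
    'my_train': '/path/to/my_dataset/train',
    'my_test': '/path/to/my_dataset/test',
}

# Hypothetical entry for configs/data_configs.py, mirroring 'ffhq_encode'.
# For an encoding task the source and target roots are the same images.
DATASETS = {
    'my_encode': {
        'transforms': EncodeTransforms,
        'train_source_root': dataset_paths['my_train'],
        'train_target_root': dataset_paths['my_train'],
        'test_source_root': dataset_paths['my_test'],
        'test_target_root': dataset_paths['my_test'],
    },
}

print(DATASETS['my_encode']['train_source_root'])
```

With such an entry in place, the experiment would presumably be selected with --dataset_type=my_encode.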

Training pSp

The main training script can be found in scripts/train.py.
Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs.
Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.

Training the pSp Encoder

python scripts/train.py \
--dataset_type=ffhq_encode \
--exp_dir=/path/to/experiment \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0.1

Frontalization

python scripts/train.py \
--dataset_type=ffhq_frontalize \
--exp_dir=/path/to/experiment \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.08 \
--l2_lambda=0.001 \
--lpips_lambda_crop=0.8 \
--l2_lambda_crop=0.01 \
--id_lambda=1 \
--w_norm_lambda=0.005

Sketch to Face

python scripts/train.py \
--dataset_type=celebs_sketch_to_face \
--exp_dir=/path/to/experiment \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0 \
--w_norm_lambda=0.005 \
--label_nc=1 \
--input_nc=1

Segmentation Map to Face

python scripts/train.py \
--dataset_type=celebs_seg_to_face \
--exp_dir=/path/to/experiment \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0 \
--w_norm_lambda=0.005 \
--label_nc=19 \
--input_nc=19

Note that no identity loss is used for conditional image synthesis (i.e., --id_lambda=0).

Super Resolution

python scripts/train.py \
--dataset_type=celebs_super_resolution \
--exp_dir=/path/to/experiment \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0.1 \
--w_norm_lambda=0.005 \
--resize_factors=1,2,4,8,16,32

Additional Notes

See options/train_options.py for all training-specific flags.
See options/test_options.py for all test-specific flags.
If you wish to resume from a specific checkpoint (e.g. a pretrained pSp model), you may do so using --checkpoint_path.
By default, we assume that the StyleGAN used outputs images at resolution 1024x1024. If you wish to use a StyleGAN at a smaller resolution, you can do so by using the flag --output_size (e.g., --output_size=256).
If you wish to generate images from segmentation maps, please specify --label_nc=N and --input_nc=N where N is the number of semantic categories.
Similarly, for generating images from sketches, please specify --label_nc=1 and --input_nc=1.
Specifying --label_nc=0 (the default value), will directly use the RGB colors as input.
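
To make the effect of --output_size more concrete: a StyleGAN2 generator takes 2 * log2(resolution) - 2 style vectors, so the encoder predicts 18 vectors at 1024x1024 but only 14 at 256x256. The helper below is a minimal sketch of that relationship; the function name num_style_vectors is ours and is not part of the repository.

```python
import math


def num_style_vectors(output_size: int) -> int:
    """Number of W+ style vectors for a StyleGAN2 generator of this resolution."""
    log_size = int(math.log2(output_size))
    if 2 ** log_size != output_size:
        raise ValueError("output_size must be a power of two")
    return 2 * log_size - 2


for size in (256, 512, 1024):
    print(size, num_style_vectors(size))
```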

Identity/Similarity Losses
In pSp, we introduce a facial identity loss using a pre-trained ArcFace network for facial recognition. When operating on the human facial domain, we highly recommend employing this loss objective by using the flag --id_lambda.
In a more recent paper, encoder4editing, the authors generalize this identity loss to other domains by using a MoCo-based ResNet to extract features instead of an ArcFace network. Applying this MoCo-based similarity loss can be done by using the flag --moco_lambda. We recommend setting --moco_lambda=0.5 in your experiments.
Please note, you cannot set both id_lambda and moco_lambda to be active simultaneously (e.g., to use the MoCo-based loss, you should specify, --moco_lambda=0.5 --id_lambda=0).
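
The mutual-exclusion rule above can be illustrated with a small argparse check. This is only a sketch: the parser, the default values of 0, and the function name parse_loss_flags are assumptions for illustration and do not reproduce the repository's options/train_options.py.

```python
import argparse


def parse_loss_flags(argv):
    """Parse the two similarity-loss weights and reject using both at once."""
    parser = argparse.ArgumentParser()
    # Defaults here are illustrative, not the repository's defaults.
    parser.add_argument('--id_lambda', type=float, default=0.0)
    parser.add_argument('--moco_lambda', type=float, default=0.0)
    opts = parser.parse_args(argv)
    if opts.id_lambda > 0 and opts.moco_lambda > 0:
        raise ValueError('Set only one of --id_lambda and --moco_lambda to a positive value.')
    return opts


# MoCo-based loss on a non-facial domain: the identity loss is switched off.
opts = parse_loss_flags(['--moco_lambda=0.5', '--id_lambda=0'])
print(opts.moco_lambda, opts.id_lambda)
```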

Weights & Biases Integration

To help track your experiments, we've integrated Weights & Biases into our training process. To enable Weights & Biases (wandb), first make an account on the platform's webpage and install wandb using pip install wandb. Then, to train pSp using wandb, simply add the flag --use_wandb.

Note that when running for the first time, you will be asked to provide your access key which can be accessed via the Weights & Biases platform.

Using Weights & Biases will allow you to visualize the training and testing loss curves as well as intermediate training results.

Testing

Inference

Having trained your model, you can use scripts/inference.py to apply the model on a set of images.
For example,

python scripts/inference.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=/path/to/experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data \
--test_batch_size=4 \
--test_workers=4 \
--couple_outputs

Additional notes to consider:

During inference, the options used during training are loaded from the saved checkpoint and are then updated using the test options passed to the inference script. For example, there is no need to pass --dataset_type or --label_nc to the inference script, as they are taken from the loaded opts.
When running inference for segmentation-to-image or sketch-to-image, it is highly recommended to do so with style-mixing, as is done in the paper. This can simply be done by adding --latent_mask=8,9,10,11,12,13,14,15,16,17 when calling the script.
When running inference for super-resolution, please provide a single down-sampling value using --resize_factors.
Adding the flag --couple_outputs will save an additional image containing the input and output images side-by-side in the sub-directory inference_coupled. Otherwise, only the output image is saved to the sub-directory inference_results.
By default, the images will be saved at a resolution of 1024x1024, the original output size of StyleGAN. If you wish to save outputs resized to 256x256, you can do so by adding the flag --resize_outputs.
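
The first note above (training options are loaded from the checkpoint and then overridden by the test options) can be sketched with plain dictionaries. The option values below are illustrative, and the real script reads the stored options from the checkpoint file rather than from a literal dictionary.

```python
from argparse import Namespace

# Options stored in the checkpoint at training time (illustrative values).
train_opts = {
    'dataset_type': 'celebs_seg_to_face',
    'label_nc': 19,
    'test_batch_size': 8,
}

# Options passed on the inference command line (illustrative values).
test_opts = Namespace(test_batch_size=4, couple_outputs=True)

# Test options override training options of the same name; everything else
# (e.g. dataset_type, label_nc) is inherited from training.
merged = dict(train_opts)
merged.update(vars(test_opts))
opts = Namespace(**merged)

print(opts.dataset_type, opts.label_nc, opts.test_batch_size, opts.couple_outputs)
```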

Multi-Modal Synthesis with Style-Mixing

Given a trained model for conditional image synthesis or super-resolution, we can easily generate multiple outputs for a given input image. This can be done using the script scripts/style_mixing.py.
For example, running the following command will perform style-mixing for a segmentation-to-image experiment:

python scripts/style_mixing.py \
--exp_dir=/path/to/experiment \
--checkpoint_path=/path/to/experiment/checkpoints/best_model.pt \
--data_path=/path/to/test_data/ \
--test_batch_size=4 \
--test_workers=4 \
--n_images=25 \
--n_outputs_to_generate=5 \
--latent_mask=8,9,10,11,12,13,14,15,16,17

Here, we inject 5 randomly drawn vectors and perform style-mixing on the latents [8,9,10,11,12,13,14,15,16,17].

Additional notes to consider:

To perform style-mixing on a subset of images, you may use the flag --n_images. The default value of None will perform style mixing on every image in the given data_path.
You may also include the argument --mix_alpha=m where m is a float defining the mixing coefficient between the input latent and the randomly drawn latent.
When performing style-mixing for super-resolution, please provide a single down-sampling value using --resize_factors.
By default, the images will be saved at a resolution of 1024x1024, the original output size of StyleGAN. If you wish to save outputs resized to 256x256, you can do so by adding the flag --resize_outputs.
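
To illustrate what --latent_mask and --mix_alpha control, here is a toy sketch of style-mixing on plain Python lists. It assumes that the masked style vectors are replaced by the randomly drawn ones, and that mix_alpha, when given, is the weight placed on the randomly drawn latent; the function name mix_styles and the 4-dimensional toy vectors are ours (the real latents are 512-dimensional tensors).

```python
import random


def mix_styles(codes, inject, latent_mask, mix_alpha=None):
    """Return a copy of `codes` whose masked style vectors are mixed with `inject`."""
    mixed = [list(vec) for vec in codes]
    for i in latent_mask:
        if mix_alpha is None:
            # Hard swap: take the randomly drawn style for this layer.
            mixed[i] = list(inject[i])
        else:
            # Soft mix: interpolate between the encoded and the random style.
            mixed[i] = [mix_alpha * b + (1 - mix_alpha) * a
                        for a, b in zip(codes[i], inject[i])]
    return mixed


n_styles, dim = 18, 4  # toy dimension for readability
rng = random.Random(0)
codes = [[0.0] * dim for _ in range(n_styles)]
inject = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_styles)]

# Mix only the finer layers 8..17, as in the command above.
mixed = mix_styles(codes, inject, latent_mask=range(8, 18), mix_alpha=0.5)
print(mixed[0], mixed[8])
```

Layers outside the mask keep the encoder's output, which is why the coarse structure of the input is preserved across the sampled outputs.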

Computing Metrics

Similarly, given a trained model and generated outputs, we can compute the loss metrics on a given dataset.
These scripts receive the inference output directory and ground truth directory.

Calculating the identity loss:

python scripts/calc_id_loss_parallel.py \
--data_path=/path/to/experiment/inference_outputs \
--gt_path=/path/to/test_images

Calculating LPIPS loss:

python scripts/calc_losses_on_images.py \
--mode lpips \
--data_path=/path/to/experiment/inference_outputs \
--gt_path=/path/to/test_images

Calculating L2 loss:

python scripts/calc_losses_on_images.py \
--mode l2 \
--data_path=/path/to/experiment/inference_outputs \
--gt_path=/path/to/test_images

Additional Applications

To better show the flexibility of our pSp framework we present additional applications below.

As with our main applications, you may download the pretrained models here:

Path	Description
Toonify	pSp trained with the FFHQ dataset for toonification using StyleGAN generator from Doron Adler and Justin Pinkney.

Toonify

Using the toonify StyleGAN built by Doron Adler and Justin Pinkney, we take a real face image and generate a toonified version of the given image. We train the pSp encoder to directly reconstruct real face images inside the toons latent space resulting in a projection of each image to the closest toon. We do so without requiring any labeled pairs or distillation!

This is trained exactly like the StyleGAN inversion task with several changes:

Change from FFHQ StyleGAN to toonified StyleGAN (can be set using --stylegan_weights)
- The toonify generator is taken from Doron Adler and Justin Pinkney and converted to PyTorch using rosinality's conversion script.
- For convenience, the converted generator PyTorch model may be downloaded here.
Increase id_lambda from 0.1 to 1
Increase w_norm_lambda from 0.005 to 0.025

We obtain the best results after around 6000 iterations of training (can be set using --max_steps)
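
As a hedged sketch, the changes listed above can be expressed as overrides on top of the StyleGAN inversion flags. The generator path below is a placeholder, and the helper to_argv is ours; the snippet only assembles and prints the command line, it does not launch training.

```python
import shlex

# Flags of the StyleGAN inversion (encoder) training command.
encode_flags = {
    'dataset_type': 'ffhq_encode',
    'exp_dir': '/path/to/experiment',
    'workers': 8,
    'batch_size': 8,
    'test_batch_size': 8,
    'test_workers': 8,
    'val_interval': 2500,
    'save_interval': 5000,
    'encoder_type': 'GradualStyleEncoder',
    'start_from_latent_avg': True,
    'lpips_lambda': 0.8,
    'l2_lambda': 1,
    'id_lambda': 0.1,
}

# Toonify-specific changes described above.
toonify_flags = dict(encode_flags)
toonify_flags.update({
    'stylegan_weights': '/path/to/toonify_generator.pt',  # placeholder path
    'id_lambda': 1,
    'w_norm_lambda': 0.025,
    'max_steps': 6000,
})


def to_argv(flags):
    """Turn a flag dictionary into an argument list for scripts/train.py."""
    argv = ['python', 'scripts/train.py']
    for name, value in flags.items():
        argv.append(f'--{name}' if value is True else f'--{name}={value}')
    return argv


command = to_argv(toonify_flags)
print(shlex.join(command))
```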

Repository structure

Path	Description
pixel2style2pixel	Repository root folder
├ configs	Folder containing configs defining model/data paths and data transforms
├ criteria	Folder containing various loss criteria for training
├ datasets	Folder with various dataset objects and augmentations
├ environment	Folder containing Anaconda environment used in our experiments
├ models	Folder containing all the models and training objects
│ ├ encoders	Folder containing our pSp encoder architecture implementation and ArcFace encoder implementation from TreB1eN
│ ├ mtcnn	MTCNN implementation from TreB1eN
│ ├ stylegan2	StyleGAN2 model from rosinality
│ └ psp.py	Implementation of our pSp framework
├ notebooks	Folder with Jupyter notebook containing pSp inference playground
├ options	Folder with training and test command-line options
├ scripts	Folder with running scripts for training and inference
├ training	Folder with main training logic and Ranger implementation from lessw2020
└ utils	Folder with various utility functions

TODOs

Add multi-gpu support

Credits

StyleGAN2 implementation:
https://github.com/rosinality/stylegan2-pytorch
Copyright (c) 2019 Kim Seonghyeon
License (MIT) https://github.com/rosinality/stylegan2-pytorch/blob/master/LICENSE

MTCNN, IR-SE50, and ArcFace models and implementations:
https://github.com/TreB1eN/InsightFace_Pytorch
Copyright (c) 2018 TreB1eN
License (MIT) https://github.com/TreB1eN/InsightFace_Pytorch/blob/master/LICENSE

CurricularFace model and implementation:
https://github.com/HuangYG123/CurricularFace
Copyright (c) 2020 HuangYG123
License (MIT) https://github.com/HuangYG123/CurricularFace/blob/master/LICENSE

Ranger optimizer implementation:
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
License (Apache License 2.0) https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/blob/master/LICENSE

LPIPS implementation:
https://github.com/S-aiueo32/lpips-pytorch
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/S-aiueo32/lpips-pytorch/blob/master/LICENSE

Please Note: The CUDA files under the StyleGAN2 ops directory are made available under the Nvidia Source Code License-NC

Inspired by pSp

Below are several works inspired by pSp that we found particularly interesting:

Reverse Toonification
Using our pSp encoder, artist Nathan Shipley transformed animated figures and paintings into real life. Check out his amazing work on his twitter page and website.

Deploying pSp with StyleSpace for Editing
Awesome work from Justin Pinkney who deployed our pSp model on Runway and provided support for editing the resulting inversions using the StyleSpace Analysis paper. Check out his repository here.

Encoder4Editing (e4e)
Building on the work of pSp, Tov et al. design an encoder to enable high quality edits on real images. Check out their paper and code.

Style-based Age Manipulation (SAM)
Leveraging pSp and the rich semantics of StyleGAN, SAM learns non-linear latent space paths for modeling the age transformation of real face images. Check out the project page here.

ReStyle
ReStyle builds on recent encoders such as pSp and e4e by introducing an iterative refinement mechanism to gradually improve the inversion of real images. Check out the project page here.

pSp in the Media

Citation

If you use this code for your research, please cite our paper Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation:

@InProceedings{richardson2021encoding,
      author = {Richardson, Elad and Alaluf, Yuval and Patashnik, Or and Nitzan, Yotam and Azar, Yaniv and Shapiro, Stav and Cohen-Or, Daniel},
      title = {Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation},
      booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      month = {June},
      year = {2021}
}

Comments

Question about ffhq_encode

I tried to train a toon model with source/target images as below (Source, Target, Result). I trained it for about 6000 iterations, as mentioned in the thread, with the same settings. The above is just an example; the model was trained on about 1000 images. It does not really give the expected output: it actually changes the structure of the whole face. Can you give me a rough idea of what I could be doing wrong? And how do I preserve the input face while making the eyes a bit larger?

opened by justmaulik 27
Training a 'cartoonify' model with unpaired data.

Hi, I wonder if you can help me.

Basically I'd like to train a model similar to the Toonify model, except on a different target domain (I'm going for a more hand-drawn cartoony style) with unpaired data - about 1000 examples.

I've tried training a model starting with the ffhq_cartoon_blended weights for about 12000 steps (batch size 4) and the recommended toonify hyperparameters. However, the outputs don't look very good; they look like an overlaid version of the 'default' toonify face and the target image (see below). I wonder if you have advice for more successful training. Thanks!

opened by snakch 18

about toonifed StyleGAN2 and model blending

Hi, may I ask you a question? I have successfully trained two StyleGAN2 models in PyTorch, one on my own cartoon dataset and another on the FFHQ dataset. They both generate normal 256x256 images. Now I want to get a toonified StyleGAN2, so I blended the two models in PyTorch (I paste my blend_models_pytorch.py below). I also tried the blend_models.py (which uses .pkl files) that the author provided. But my results seem to have some errors. Could you help me?

my results: d8f0831a213b7ec1ca4d8f7a93dd04f

my normal stylegan2 results(trained on my own cartoon dataset) e258f837340daa04efb297d3026c08b

my blend_models_pytorch.py:

import os
import torch
from torchvision import transforms, utils
from model import Generator


def extract_conv_names(model, resolution):
    # Layer-name prefixes that are taken from model 1 (label 0) for each switch resolution
    extract_names = {
        4: [],
        8: ['convs.0.', 'convs.1.', 'to_rgbs.0.'],
        16: ['convs.0.', 'convs.1.', 'to_rgbs.0.', 'convs.2.', 'convs.3.', 'to_rgbs.1.'],
        32: ['convs.0.', 'convs.1.', 'to_rgbs.0.', 'convs.2.', 'convs.3.', 'to_rgbs.1.', 'convs.4.', 'convs.5.', 'to_rgbs.2.'],
        64: ['convs.0.', 'convs.1.', 'to_rgbs.0.', 'convs.2.', 'convs.3.', 'to_rgbs.1.', 'convs.4.', 'convs.5.', 'to_rgbs.2.',
             'convs.6.', 'convs.7.', 'to_rgbs.3.']
    }

    # input    conv1    to_rgb1    convs   to_rgbs
    keys = [key for key, value in model.items()]
    used_names = []
    for key in keys:
        if 'input' in key:
            used_names.append((key, 0))
        elif 'conv1' in key:
            used_names.append((key, 0))
        elif 'to_rgb1' in key:
            used_names.append((key, 0))
        elif 'convs' in key:
            temp_label = True
            for cn in extract_names[resolution]:
                if cn in key:
                    used_names.append((key, 0))
                    temp_label = False
            if temp_label:
                used_names.append((key, 1))
        elif 'to_rgbs' in key:
            temp_label = True
            for cn in extract_names[resolution]:
                if cn in key:
                    used_names.append((key, 0))
                    temp_label = False
            if temp_label:
                used_names.append((key, 1))

    return used_names


def blend_models(model_1, model_2, resolution, blend_width=None):
    # y is the blending amount: y = 0 means all model_1, y = 1 means all model_2
    model_1_names = extract_conv_names(model_1, resolution)
    model_2_names = extract_conv_names(model_2, resolution)

    assert all((x == y for x, y in zip(model_1_names, model_2_names)))

    model_out = model_1.copy()

    model_names = [x[0] for x in model_1_names]
    model_y = [x[1] for x in model_1_names]

    if blend_width:
        # exponent = -x / blend_width
        # y = 1 / (1 + math.exp(exponent))
        print('Not implemented when blend_width is set')
    else:
        for key, y in zip(model_names, model_y):
            model_out[key] = model_1[key] * (1 - y) + model_2[key] * y

    return model_out


def main(ckpt_ffhq, ckpt_cartoon, resolution=8, blend_width=None, output_grid=None, output_pt=None):
    """
    :param ckpt_ffhq: file from which to take low res layers
    :param ckpt_cartoon: file from which to take high res layers
    :param resolution: Resolution level at which to switch between models
    :param blend_width: None = hard switch, float = smooth switch (logistic) with given width
    :param output_grid: Path of image file to save example grid (None = don't save)
    :param seed: seed for random grid
    :param output_pt: Output path of pickle (None = don't save)
    :return:
    """
    low_res_G, low_res_D, low_res_Gs = ckpt_ffhq['g'], ckpt_ffhq['d'], ckpt_ffhq['g_ema']
    high_res_G, high_res_D, high_res_Gs = ckpt_cartoon['g'], ckpt_cartoon['d'], ckpt_cartoon['g_ema']

    out = blend_models(low_res_Gs, high_res_Gs, resolution=resolution, blend_width=blend_width)

    sample_z = torch.randn(1, 512, device='cuda')

    # Load the blended weights into a 256x256 generator and render one sample
    g_ema = Generator(256, 512, 8).to('cuda')
    g_ema.load_state_dict(out)
    sample, _ = g_ema([sample_z], truncation=1, truncation_latent=None)

    utils.save_image(
        sample,
        output_grid,
        nrow=1,
        normalize=True,
        range=(-1, 1),
    )


if __name__ == '__main__':
    print("ok")

    dir = r'E:\ND\SVN_AI_prj_02_Dev\02-FaceReconstruction_Dev\pixel2style2pixel_wqy\pretrained_models'
    ckpt_cartoon = torch.load(os.path.join(dir, 'stylegan2_cartoon256_420k.pt'), map_location="cpu")
    ckpt_ffhq = torch.load(os.path.join(dir, 'stylegan2_ffhq256_550k.pt'), map_location="cpu")

    gf_blended = torch.load(os.path.join(dir, "stylegan2-ffhq1024-config-f.pt"), map_location="cpu")

    saved = os.path.join(dir, "ffhq_cartoon_blended256.pt")
    main(ckpt_ffhq, ckpt_cartoon, resolution=32, output_pt=saved, output_grid=r'D:\Downloads\blended_pytorch.jpg')

opened by huangfaan 17

How to train my model on multiple GPUs?

Hi, I am very interested in your pspGAN. I have two 1080 Ti GPUs and I want to know how to change the code to use them both. I tried to change the code of 'coach.py' from self.net = pSp(self.opts).to(self.device) to self.net = (torch.nn.DataParallel(pSp(self.opts)).module).to.(self.device), but it doesn't work, so I would like to get some help from you. Thanks!

opened by zbw0329 13
File not compiling

I ran the model before and it compiled and worked fine, but the very next day it stopped working: the model no longer compiles. I have to interrupt the compilation again and again, and sometimes even the GPU gets lost. The model (or the file) still does not compile. Is there any solution to this?

opened by CJPJ007 13
why our model generate a image suround by a block shadow

The Source Domain data of stylegan2, the pre-training model we adopted, is FFHQ dataset, while the target Domain dataset is some cartoon images. Stylegan2, on the other hand, produces good cartoon images. When we use this Stylegan2 to train the PSP model, both the Source Domain and target Domain datasets of the PSP model are FFHQ data. Something weird happens, and all the generated images are surrounded by a shadow of a square. We don't know if you have encountered this problem or if you have any solutions?

opened by maxrumi 13
Training a toonify model with paired data

Thanks for the excellent work. I've managed to train a toonify model on my own paired dataset, and it is starting to look good. However, the results are not as good as the targets. I use the following command:

python scripts/train.py \
--dataset_type=toonify \
--exp_dir=./experiment \
--workers=2 \
--batch_size=2 \
--test_batch_size=2 \
--test_workers=2 \
--val_interval=2500 \
--save_interval=2500 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1.0 \
--id_lambda=0.1 \
--w_norm_lambda=0.005 \
--stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt

I have 2700 pairs of data in my dataset. Could you please give me some advice? Thanks again.

opened by applech666 11
the generated faces are not real when converting 3d face meshes to real faces

Hi~

Thanks a lot for your awesome work. I am trying to translate 3d face meshes to real faces based on the FFHQ dataset by using the following command:

python scripts/train.py \
--dataset_type=ffhq_3d_to_face \
--exp_dir=exp/ffhq_3d_to_face/ \
--workers=8 \
--batch_size=8 \
--test_batch_size=8 \
--test_workers=8 \
--val_interval=2500 \
--save_interval=5000 \
--encoder_type=GradualStyleEncoder \
--start_from_latent_avg \
--lpips_lambda=0.8 \
--l2_lambda=1 \
--id_lambda=0.1 \
--w_norm_lambda=0.005 \
--output_size=256

where the 3D face meshes are obtained by running inference on FFHQ with 3DDFA_V2 (https://github.com/cleardusk/3DDFA_V2). After training for 500k (50W) iterations, however, the generated faces are not as realistic as those in the paper.

Could you please give me some advice? Thanks again.

opened by Honlan 11
Pushing performance
Hi there!

So I've managed to train a model on my own dataset which is starting to look very good. There are still some details that I'd like to improve if possible. For context, I am attempting to perform a task similar to the Toonify model, but on a different domain. I've trained and blended my own StyleGAN2 model, and I'm trying psp on FFHQ. The issues I'm seeing are:

Some high level details are not translated. For instance, face pose doesn't match the original image. Clothing elements also don't appear.

Generally, the output images look quite different from generated samples of my blended StyleGAN model.

The training parameters I use are:

"--batch_size=4", "--max_steps=21000", "--encoder_type=GradualStyleEncoder", "--start_from_latent_avg", "--lpips_lambda=0.4", "--id_lambda=1.0", "--w_norm_lambda=0.02", "--l2_lambda=1"

My transforms are:

transforms_dict = {
    'transform_gt_train': transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.RandomHorizontalFlip(0.5),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
    'transform_source': None,
    'transform_test': transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
    'transform_inference': transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
}

I tried removing RandomHorizontalFlips but to no avail for face orientation.

All of this may just be a limitation of the encoder + my own Stylegan and I'm already quite happy with the results but any suggestions you have would be great!

Here are some training images illustrating what I mean:

And for reference, some samples of my StyleGAN2 model:
opened by snakch 11
Running the code, but nothing happened!

Hello, I'm kind of a noob, so thanks for dealing with my question even if it is about basics for you! I tried to run the code several times on both Windows and WSL Ubuntu, but I could never make it run. I followed the instructions as explained, but unfortunately nothing happened when I ran python scripts/inference.py

here is a screenshot https://i.ibb.co/bmNnXM1/Capture-d-cran-40.png

Thank you 👍

opened by accountvb 11
Using different StyleGAN weights

Hi, thanks for the awesome work.

I'd like to train an encoder using a pre-trained 256x256 StyleGAN model (https://github.com/rosinality/stylegan2-pytorch), but I'm not sure whether this model would work with your training code. Since in my case the resolution only goes up to 256, there should be 14x512 latent vectors, which is fewer than your latent size of 18x512. My question is: can I still use a 256x256 pre-trained StyleGAN model for this work?
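The 14 and 18 quoted above are consistent with the usual StyleGAN2 layout, in which a generator with output resolution 2^k takes 2k - 2 style vectors. A minimal sketch of that count follows; the function name is hypothetical, and whether the pSp training code accepts a 14-vector generator without changes is not something this snippet establishes.

```python
import math


def num_style_vectors(resolution: int) -> int:
    """Number of W+ style vectors for a StyleGAN2 generator of the given output size.

    Assumes the common layout with 2 * log2(resolution) - 2 style inputs.
    """
    log_size = int(math.log2(resolution))
    if resolution < 4 or 2 ** log_size != resolution:
        raise ValueError("resolution must be a power of two and at least 4")
    return 2 * log_size - 2


if __name__ == "__main__":
    for res in (256, 512, 1024):
        print(f"{res}x{res}: {num_style_vectors(res)} x 512")
```

Any encoder head that assumes a fixed number of output styles would need to produce this many vectors for the chosen generator.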

Thanks in advance.

opened by jis478 11
image-to-sketch

If I want to use your model for an image-to-sketch task, should I first pretrain StyleGAN on my sketch dataset and then train your encoder with real facial images?

opened by hahachocolate 0
Why are the coarse details determined from the larger blocks?

In a ResNet, the coarser blocks come first:

https://github.com/eladrich/pixel2style2pixel/blob/361117156fc4eb90f463a1ca71eaf7f80d573e67/models/encoders/helpers.py#L32-L35

So why do the coarse style blocks use the fine ResNet blocks?

https://github.com/eladrich/pixel2style2pixel/blob/361117156fc4eb90f463a1ca71eaf7f80d573e67/models/encoders/psp_encoders.py#L95-L105

In the video that was provided, randomness is introduced into each sample by replacing the fine StyleGAN input latents with random noise. This means the only difference between the images is in the fine layers. It is observed that skin tone comes from the fine style layers and facial features come from the coarse style layers. Is that meant to happen?

https://user-images.githubusercontent.com/29491356/203987089-62e51315-85b4-44f3-8ea6-77e293e9ea2c.mp4
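The replacement described in this question can be sketched as a per-layer swap on a W+ code. The version below uses plain Python lists, assumes an 18 x 512 code, and treats layers 8 to 17 as the "fine" range; that split, the function name, and the use of raw Gaussian samples are illustrative assumptions (in practice a random code would usually be passed through StyleGAN's mapping network first), not the repository's implementation.

```python
import random

NUM_LAYERS = 18              # W+ size for a 1024x1024 generator
LATENT_DIM = 512
FINE_LAYERS = range(8, 18)   # assumed "fine" range, for illustration only


def mix_styles(encoded, injected, layers, alpha=1.0):
    """Blend the given layers of `encoded` toward `injected`.

    alpha=1.0 replaces those layers outright; smaller values interpolate.
    Layers not listed keep the encoded input unchanged.
    """
    mixed = [list(vec) for vec in encoded]
    for i in layers:
        mixed[i] = [alpha * inj + (1.0 - alpha) * enc
                    for inj, enc in zip(injected[i], encoded[i])]
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for an encoder output; zeros make the swap easy to see.
    encoded = [[0.0] * LATENT_DIM for _ in range(NUM_LAYERS)]
    # Stand-in for a randomly sampled style code.
    injected = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
                for _ in range(NUM_LAYERS)]
    mixed = mix_styles(encoded, injected, FINE_LAYERS)
    print(mixed[0] == encoded[0], mixed[8] == injected[8])
```

Because only the listed layers change, whatever those layers control in the generator is what varies across samples, while everything driven by the untouched layers stays tied to the input.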

opened by Richienb 3

Training pSp encoder converged too soon

Hi,

I trained the pSp encoder on a set of 6k real->cartoon image pairs. I trained for 70k iterations after which the test/loss function converged and stopped improving. The results after 70k iterations are still too far away from a good outcome.

Below is a screenshot of Tensorboard as well as examples of latest best model and training settings.

Any recommendations on how to get better results? Which hyper-parameters might be worth changing?

Thank you for your great work!

Tensorboard after 70k iterations: Screen Shot 2022-10-30 at 2 18 55 PM

Model after 70k iterations: 0097_69000 0087_69000 0070_69000

Hyper-parameters / Settings:

{
    "batch_size": 4,
    "board_interval": 50,
    "checkpoint_path": null,
    "dataset_type": "toonify",
    "encoder_type": "GradualStyleEncoder",
    "exp_dir": "/home/ubuntu/out",
    "id_lambda": 1.0,
    "image_interval": 100,
    "input_nc": 3,
    "l2_lambda": 1.0,
    "l2_lambda_crop": 0,
    "label_nc": 0,
    "learn_in_w": false,
    "learning_rate": 0.0001,
    "lpips_lambda": 0.8,
    "lpips_lambda_crop": 0,
    "max_steps": 500000,
    "moco_lambda": 0,
    "optim_name": "ranger",
    "output_size": 1024,
    "resize_factors": null,
    "save_interval": null,
    "start_from_latent_avg": true,
    "stylegan_weights": "pretrained_models/stylegan2-ffhq-config-f.pt",
    "test_batch_size": 2,
    "test_workers": 2,
    "train_decoder": false,
    "use_wandb": false,
    "val_interval": 1000,
    "w_norm_lambda": 0.025,
    "workers": 4
}
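To reason about which of these weights matter, it can help to look at how each term contributes to the total, assuming the objective is a weighted sum of the individual losses as the `*_lambda` names suggest. The sketch below uses the weights from the settings above; the per-term loss values are hypothetical placeholders, not measurements from this run, and the function is not the repository's training code.

```python
# Loss weights copied from the settings above.
LAMBDAS = {"l2": 1.0, "lpips": 0.8, "id": 1.0, "w_norm": 0.025, "moco": 0.0}


def total_loss(terms, lambdas=LAMBDAS):
    """Weighted sum of per-term loss values; zero-weighted terms are skipped."""
    total = 0.0
    for name, value in terms.items():
        weight = lambdas.get(name, 0.0)
        if weight > 0:
            total += weight * value
    return total


if __name__ == "__main__":
    # Hypothetical per-term values, for illustration only.
    example_terms = {"l2": 0.05, "lpips": 0.30, "id": 0.40, "w_norm": 10.0}
    for name, value in example_terms.items():
        print(f"{name}: weighted contribution {LAMBDAS[name] * value:.3f}")
    print(f"total: {total_loss(example_terms):.3f}")
```

Logging the weighted contributions separately, rather than only the total, makes it easier to see whether one term dominates before adjusting its weight.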

opened by kimyanna 3

Question about style mixing: 5 different angles of the same face

I want to ask about style mixing. If I input 5 pictures of one person's face taken from 5 different angles and apply your style mixing, it produces 5 outputs. Will they be consistent? That is, will all 5 outputs share the same style, even though the inputs show the same face from 5 different angles?

opened by D-Mad 5

Owner

GitHub https://eladrich.github.io/pixel2style2pixel/

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

[Project] [PDF] This repository contains code for our SIGGRAPH'22 paper "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets" by Axel Sauer, Katja

742 Jan 4, 2023

Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style

Transfer Style API It's an API to use with Tranfer Style App, where you can use

1 Feb 13, 2022

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

134 Dec 19, 2022

Official pytorch implementation of paper "Image-to-image Translation via Hierarchical Style Disentanglement".

HiSD: Image-to-image Translation via Hierarchical Style Disentanglement Official pytorch implementation of paper "Image-to-image Translation

364 Dec 14, 2022

Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Face Identity Disentanglement via Latent Space Mapping - Implement in pytorch with StyleGAN 2 Description Pytorch implementation of the paper Face Ide

58 Dec 24, 2022

Official implementation of "StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation" (SIGGRAPH 2021)

StyleCariGAN in PyTorch Official implementation of StyleCariGAN:Caricature Generation via StyleGAN Feature Map Modulation in PyTorch Requirements PyTo

49 Oct 31, 2022

Official implementation of "StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation" (SIGGRAPH 2021)

StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation This repository contains the official PyTorch implementation of the following

270 Dec 30, 2022

StyleGAN - Official TensorFlow Implementation

StyleGAN — Official TensorFlow Implementation Picture: These people are not real – they were produced by our generator that allows control over differ

13.1k Jan 9, 2023

Fast Neural Style for Image Style Transform by Pytorch

FastNeuralStyle by Pytorch Fast Neural Style for Image Style Transform by Pytorch This is famous Fast Neural Style of Paper Perceptual Losses for Real

81 Sep 3, 2022

Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation in PyTorch

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Ima

86 Dec 7, 2022

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

489 Jan 7, 2023

Official implementation for Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020

Likelihood-Regret Official implementation of Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020. T

33 Oct 12, 2022

Only a Matter of Style: Age Transformation Using a Style-Based Regression Model

Only a Matter of Style: Age Transformation Using a Style-Based Regression Model The task of age transformation illustrates the change of an individual

444 Dec 30, 2022

GAN encoders in PyTorch that could match PGGAN, StyleGAN v1/v2, and BigGAN. Code also integrates the implementation of these GANs.

MTV-TSA: Adaptable GAN Encoders for Image Reconstruction via Multi-type Latent Vectors with Two-scale Attentions. This is the official code release fo

37 Dec 24, 2022

A tensorflow/keras implementation of StyleGAN to generate images of new Pokemon.

Jittor 64*64 implementation of StyleGAN

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation

Maximum Spatial Perturbation for Image-to-Image Translation (Official Implementation)

Pytorch implementation of CVPR2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”