Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation"

Overview

Designing an Encoder for StyleGAN Image Manipulation (SIGGRAPH 2021)

Open In Colab

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.

Description

Official Implementation of "Designing an Encoder for StyleGAN Image Manipulation" paper for both training and evaluation. The e4e encoder is specifically designed to complement existing image manipulation techniques performed over StyleGAN's latent space.

Recent Updates

2021.08.17: Add single style code encoder (use --encoder_type SingleStyleCodeEncoder).
2021.03.25: Add pose editing direction.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
  • Python 3

Installation

  • Clone the repository:
git clone https://github.com/omertov/encoder4editing.git
cd encoder4editing
  • Dependencies:
    We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/e4e_env.yaml.
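For example, the environment can be created and activated with (the environment name e4e_env is assumed to match the one defined in the YAML):

conda env create -f environment/e4e_env.yaml
conda activate e4e_env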

Inference Notebook

We provide a Jupyter notebook, found in notebooks/inference_playground.ipynb, that allows one to encode real images and perform several edits on them using StyleGAN.
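For reference, below is a minimal encoding sketch along the lines of the notebook (the checkpoint path, image path, and the 256x256 face transform are assumptions; see the notebook for the exact per-domain code):

from argparse import Namespace
import torch
import torchvision.transforms as transforms
from PIL import Image
from models.psp import pSp  # e4e checkpoints are loaded over the pSp framework

model_path = 'pretrained_models/e4e_ffhq_encode.pt'
ckpt = torch.load(model_path, map_location='cpu')
opts = ckpt['opts']                   # training options are stored inside the checkpoint
opts['checkpoint_path'] = model_path
opts['device'] = 'cuda'               # assumed; some checkpoints already store this
net = pSp(Namespace(**opts)).eval().cuda()

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
x = transform(Image.open('face.jpg').convert('RGB')).unsqueeze(0).cuda()

with torch.no_grad():
    # returns the reconstruction and its (1, 18, 512) W+ latent code
    inversion, latent = net(x, randomize_noise=False, return_latents=True)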

Pretrained Models

Please download the pre-trained models from the following links. Each e4e model contains the entire pSp framework architecture, including the encoder and decoder weights.

Path               Description
FFHQ Inversion     FFHQ e4e encoder.
Cars Inversion     Cars e4e encoder.
Horse Inversion    Horse e4e encoder.
Church Inversion   Church e4e encoder.

If you wish to use one of the pretrained models for training or inference, you may do so using the flag --checkpoint_path.
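For example, to continue training from the FFHQ encoder (the checkpoint filename is assumed to match the downloaded model):

python scripts/train.py \
--dataset_type ffhq_encode \
--exp_dir new/experiment/directory \
--checkpoint_path pretrained_models/e4e_ffhq_encode.pt \
[remaining training flags as in the Training section below]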

In addition, we provide various auxiliary models needed for training your own e4e model from scratch.

Path            Description
FFHQ StyleGAN   StyleGAN model pretrained on FFHQ, taken from rosinality, with 1024x1024 output resolution.
IR-SE50 Model   Pretrained IR-SE50 model, taken from TreB1eN, for use in our ID loss during training.
MOCOv2 Model    Pretrained ResNet-50 model trained using MOCOv2, for use in our similarity loss for domains other than human faces during training.

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/paths_config.py.
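For reference, a sketch of what such a configuration might look like (the variable name model_paths and the exact keys and filenames below are illustrative; check configs/paths_config.py for the real ones):

model_paths = {
    'stylegan_ffhq': 'pretrained_models/stylegan2-ffhq-config-f.pt',
    'ir_se50': 'pretrained_models/model_ir_se50.pth',
    'moco': 'pretrained_models/moco_v2_800ep_pretrain.pth.tar'
}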

Training

To train the e4e encoder, make sure that the paths to the required models, as well as to the training and testing data, are configured in configs/paths_config.py and configs/data_configs.py.

Training the e4e Encoder

python scripts/train.py \
--dataset_type cars_encode \
--exp_dir new/experiment/directory \
--start_from_latent_avg \
--use_w_pool \
--w_discriminator_lambda 0.1 \
--progressive_start 20000 \
--id_lambda 0.5 \
--val_interval 10000 \
--max_steps 200000 \
--stylegan_size 512 \
--stylegan_weights path/to/pretrained/stylegan.pt \
--workers 8 \
--batch_size 8 \
--test_batch_size 4 \
--test_workers 4 

Training on your own dataset

In order to train the e4e encoder on a custom dataset, perform the following adjustments:

  1. Insert the paths to your train and test data into the dataset_paths variable defined in configs/paths_config.py:
dataset_paths = {
    'my_train_data': '/path/to/train/images/directory',
    'my_test_data': '/path/to/test/images/directory'
}
  2. Configure a new dataset under the DATASETS variable defined in configs/data_configs.py:
DATASETS = {
   'my_data_encode': {
        'transforms': transforms_config.EncodeTransforms,
        'train_source_root': dataset_paths['my_train_data'],
        'train_target_root': dataset_paths['my_train_data'],
        'test_source_root': dataset_paths['my_test_data'],
        'test_target_root': dataset_paths['my_test_data']
    }
}

Refer to configs/transforms_config.py for the transformations applied to the train and test images during training.

  3. Finally, run a training session with --dataset_type my_data_encode, as in the example below.
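For example (a sketch reusing the hyperparameters from the cars command above; adjust --stylegan_size, --stylegan_weights, and the loss weights to your domain — e.g. --id_lambda only applies to human faces):

python scripts/train.py \
--dataset_type my_data_encode \
--exp_dir new/experiment/directory \
--start_from_latent_avg \
--use_w_pool \
--w_discriminator_lambda 0.1 \
--progressive_start 20000 \
--val_interval 10000 \
--max_steps 200000 \
--stylegan_size 512 \
--stylegan_weights path/to/pretrained/stylegan.pt \
--workers 8 \
--batch_size 8 \
--test_batch_size 4 \
--test_workers 4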

Inference

Having trained your model, you can use scripts/inference.py to apply it to a set of images.
For example,

python scripts/inference.py \
--images_dir=/path/to/images/directory \
--save_dir=/path/to/saving/directory \
path/to/checkpoint.pt 

Latent Editing Consistency (LEC)

As described in the paper, we suggest a new metric, Latent Editing Consistency (LEC), for evaluating the encoder's performance. We provide an example for calculating the metric over the FFHQ StyleGAN using the aging editing direction in metrics/LEC.py.
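Conceptually, LEC inverts an image, edits the latent code, generates the edited image, re-encodes it, and measures how far the re-encoded latent drifts from the edited one. Below is a simplified sketch of this idea (not the exact code in metrics/LEC.py; encode, decode, and edit are hypothetical stand-ins for the e4e encoder, the StyleGAN generator, and an editing direction such as aging):

import torch

def lec(encode, decode, edit, x):
    # encode: image batch -> (N, 18, 512) W+ latent codes
    # decode: latent codes -> images
    # edit: applies a latent editing direction (e.g., aging)
    with torch.no_grad():
        w_edit = edit(encode(x))          # invert the real images, then edit in latent space
        w_cycle = encode(decode(w_edit))  # re-encode the edited generations
    # average L2 distance between the edited and re-encoded codes; lower is better
    return (w_edit - w_cycle).norm(2, dim=(1, 2)).mean()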

To run the example:

cd metrics
python LEC.py \
--images_dir=/path/to/images/directory \
path/to/checkpoint.pt 

Acknowledgments

This code borrows heavily from pixel2style2pixel.

Citation

If you use this code for your research, please cite our paper Designing an Encoder for StyleGAN Image Manipulation:

@article{tov2021designing,
  title={Designing an Encoder for StyleGAN Image Manipulation},
  author={Tov, Omer and Alaluf, Yuval and Nitzan, Yotam and Patashnik, Or and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2102.02766},
  year={2021}
}
Comments
  • How to train on my own data?

    Hi guys,

    Really impressive work here!

    I am wondering whether I can train on my own data to obtain a well-designed encoder for my in-domain use case, instead of the standard datasets of cats, horses, etc. Could you please give some instructions?

    opened by tearscoco 9
  • How to get other interfacegan_direction files?

    Your work is great! You provided three interfacegan_direction files: age.pt, pose.pt, and smile.pt. However, if I want results for other directions, how do I get other interfacegan_direction files? Also, I compared InterFaceGAN's directions with your files, and the results seem to be different. Can't we use their files directly?

    opened by TomatoBoy90 8
  • Invert Images to W space

    Hi, thanks for your code!

    I need to invert images into their latent representations of size (1, 512) each. However, I notice that each latent representation produced by your code is of size (1, 18, 512) (I suppose this is the dimension of W+ space).

    Is there a way to get a latent representation of size (1, 512) using your code? (probably the representation in the W space) Or do you think one of the layers in the (1, 18, 512) tensor is reasonable to use as the image representation for further editing in the latent space?

    Thank you very much!

    opened by falloncandra 8
  • Error when trying to use encoder trained on own dataset

    After training the encoder on my own dataset and trying to use it for inference, I get the following error:

    Loading e4e over the pSp framework from checkpoint: e4e_ffhq_encode.pt
    Traceback (most recent call last):
      File "scripts/train.py", line 88, in <module>
        main()
      File "scripts/train.py", line 28, in main
        coach = Coach(opts, previous_train_ckpt)
      File "./training/coach.py", line 39, in __init__
        self.net = pSp(self.opts).to(self.device)
      File "./models/psp.py", line 28, in __init__
        self.load_weights()
      File "./models/psp.py", line 43, in load_weights
        self.encoder.load_state_dict(get_keys(ckpt, 'encoder'), strict=True)
      File "/opt/conda/envs/e4e_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for Encoder4Editing:
        Unexpected key(s) in state_dict: "styles.16.convs.0.weight", "styles.16.convs.0.bias", "styles.16.convs.2.weight", "styles.16.convs.2.bias", "styles.16.convs.4.weight", "styles.16.convs.4.bias", "styles.16.convs.6.weight", "styles.16.convs.6.bias", "styles.16.convs.8.weight", "styles.16.convs.8.bias", "styles.16.convs.10.weight", "styles.16.convs.10.bias", "styles.16.linear.weight", "styles.16.linear.bias", "styles.17.convs.0.weight", "styles.17.convs.0.bias", "styles.17.convs.2.weight", "styles.17.convs.2.bias", "styles.17.convs.4.weight", "styles.17.convs.4.bias", "styles.17.convs.6.weight", "styles.17.convs.6.bias", "styles.17.convs.8.weight", "styles.17.convs.8.bias", "styles.17.convs.10.weight", "styles.17.convs.10.bias", "styles.17.linear.weight", "styles.17.linear.bias".

    Did anyone face the same problem or does anyone have any hints which may help to solve the problem ?

    opened by Alen95 7
  • train ffhq

    Hi, thanks for the excellent work. I need to train on my own face dataset. Can you provide scripts for training FFHQ (some hyperparameter settings)? In addition, my face images are 256x256; does that affect the model?

    opened by ljiqy 5
  • about ffhq_encode performance

    I trained a model for ffhq_encode, but the performance is bad on some scenes; the background is difficult to learn. What should I do to improve the performance? My training data is 5000 pictures. Should I add more training data? Also, my loss is id_loss; should I use the moco loss instead?

    opened by LLSean 4
  • When training on my dataset, the person's eyes get smaller and smaller, but the similarity gets closer to 1

    Hi @omertov: Many thanks for your excellent work. However, I have a question about training on my data. My output looks like this: [image]

    After a few iterations it becomes like this: the eyes get smaller and smaller while the similarity gets closer to 1. [image]

    I don't know why. Best wishes.

    opened by chinasilva 4
  • About W space?

    Hi authors: Many thanks for your excellent work. However, I have some questions about your novel definition of the W spaces: what is the difference between them? [image] Also, in Figure 3 I do not understand the meaning of the red/blue arrows. Could you kindly help me resolve these questions in your spare time?

    Best wishes.

    opened by GreenLimeSia 4
  • I think it's a good idea to add 'age + pose.pt' to github.

    Thank you very much for your wonderful work! When I actually tried it, the .pt direction files turned out to be additive. Therefore, I think it's a good idea to add 'age + pose.pt' to github. Then the face will rotate while the age changes.

    opened by cedro3 4
  • How to preprocess StyleGAN's latent code of size (1, 18, 512) to (1, 512) to get interfacegan_direction.

    Hi thanks for your work!

    I would like to ask how did you get the interfacegan_direction of size (1, 512) that is used in the colab notebook.

    When we invert the image to the latent space using your code, the resulting latent code is of size (1, 18, 512). However, the method train_boundary() in the InterfaceGAN github receives input latent code of size (1, 512). What did you do to preprocess the latent code from size (1, 18, 512) to (1, 512)?

    Thank you very much for your help!

    opened by falloncandra 4
  • No such file or directory: 'pretrained_models/model_ir_se50.pth'

    When I run train.py, it prompts: No such file or directory: 'pretrained_models/model_ir_se50.pth'. Where can I find model_ir_se50.pth? If you have time, can you help me solve this problem? Thank you very much!

    opened by azyp19970815 4
  • There is a problem with the pre-training weights

    scripts/train.py --dataset_type cars_encode --exp_dir directory --use_w_pool --w_discriminator_lambda 0.1 --progressive_start 20000 --id_lambda 0.5 --val_interval 10000 --start_from_latent_avg --max_steps 200000 --stylegan_size 512 --stylegan_weights a/stylegan2-ffhq-config-f.pt --workers 8 --batch_size 8 --test_batch_size 4 --test_workers 4

    {'batch_size': 8, 'board_interval': 50, 'checkpoint_path': None, 'd_reg_every': 16, 'dataset_type': 'cars_encode', 'delta_norm': 2, 'delta_norm_lambda': 0.0002, 'encoder_type': 'Encoder4Editing', 'exp_dir': 'directory', 'id_lambda': 0.5, 'image_interval': 100, 'keep_optimizer': False, 'l2_lambda': 1.0, 'learning_rate': 0.0001, 'lpips_lambda': 0.8, 'lpips_type': 'alex', 'max_steps': 200000, 'optim_name': 'ranger', 'progressive_start': 20000, 'progressive_step_every': 2000, 'progressive_steps': [0, 20000, 22000, 24000, 26000, 28000, 30000, 32000, 34000, 36000, 38000, 40000, 42000, 44000, 46000, 48000], 'r1': 10, 'resume_training_from_ckpt': None, 'save_interval': None, 'save_training_data': False, 'start_from_latent_avg': True, 'stylegan_size': 512, 'stylegan_weights': 'a/stylegan2-ffhq-config-f.pt', 'sub_exp_dir': None, 'test_batch_size': 4, 'test_workers': 4, 'train_decoder': False, 'update_param_list': None, 'use_w_pool': True, 'val_interval': 10000, 'w_discriminator_lambda': 0.1, 'w_discriminator_lr': 2e-05, 'w_pool_size': 50, 'workers': 8}

    Loading encoders weights from irse50!
    Loading decoder weights from pretrained!
    Traceback (most recent call last):
      File "/root/test/encoder4editing-main/encoder4editing-main/scripts/train.py", line 87, in <module>
        main()
      File "/root/test/encoder4editing-main/encoder4editing-main/scripts/train.py", line 28, in main
        coach = Coach(opts, previous_train_ckpt)
      File "../training/coach.py", line 42, in __init__
        self.lpips_loss = LPIPS(net_type=self.opts.lpips_type).to(self.device).eval()
      File "../criteria/lpips/lpips.py", line 23, in __init__
        self.net = get_network(net_type).to("cuda")
      File "../criteria/lpips/networks.py", line 14, in get_network
        return AlexNet()
      File "../criteria/lpips/networks.py", line 81, in __init__
        self.layers = models.alexnet(True).features
      File "/root/miniconda3/lib/python3.8/site-packages/torchvision/models/alexnet.py", line 63, in alexnet
        state_dict = load_state_dict_from_url(model_urls['alexnet'],
      File "/root/miniconda3/lib/python3.8/site-packages/torch/hub.py", line 528, in load_state_dict_from_url
        return torch.load(cached_file, map_location=map_location)
      File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 762, in _legacy_load
        magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: unpickling stack underflow

    Process finished with exit code 1

    opened by softalter 0
  • whether the code is wrong?

    Nice work you did. When I read the code carefully, I found this code in coach.py at line 402:

    self.discriminator.zero_grad()
    r1_final_loss = self.opts.r1 / 2 * r1_loss * self.opts.d_reg_every + 0 * real_pred[0]
    r1_final_loss.backward()
    self.discriminator_optimizer.step()

    self.discriminator.zero_grad() should be modified to self.discriminator_optimizer.zero_grad().

    opened by tengshaofeng 0
  • Regarding finding directions in W+ space

    Hi,

    I have gone through the issues. However, I am wondering whether the W+ space predicted by e4e can be analysed in some way to find editing directions. I ask because I have observed some interesting findings while interpolating in W+ space between two images, and I also have labels for these images.

    Thanks, Sai Sagar

    opened by jsaisagar 0
  • Error while running inference.py

    I'm trying to run this project on Ubuntu 18.04 and getting this error while importing pSp. Can you help me figure out what's going on?

    Traceback (most recent call last):
      File "inference.py", line 15, in <module>
        from utils.model_utils import setup_model
      File "/utils/model_utils.py", line 3, in <module>
        from models.psp import pSp
      File "/models/psp.py", line 6, in <module>
        from models.encoders import psp_encoders
      File "/models/encoders/psp_encoders.py", line 9, in <module>
        from models.stylegan2.model import EqualLinear
      File "/models/stylegan2/model.py", line 7, in <module>
        from models.stylegan2.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
      File "/models/stylegan2/op/__init__.py", line 1, in <module>
        from .fused_act import FusedLeakyReLU, fused_leaky_relu
      File "/models/stylegan2/op/fused_act.py", line 13, in <module>
        os.path.join(module_path, 'fused_bias_act_kernel.cu'),
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1136, in load
        keep_intermediates=keep_intermediates)
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
        is_standalone=is_standalone)
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1445, in _write_ninja_file_and_build_library
        is_standalone=is_standalone)
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_to_build_library
        cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
        arch_list[-1] += '+PTX'
    IndexError: list index out of range

    opened by affanmehmood 2
  • How to get figure 2

    Hi authors: I have read this excellent paper recently. Thank you so much for your paper; it really helped me understand the differences between the W space and the Wk space. I want to know how to get a 2D visualization example of the W and Wk spaces like Figure 2. Could you kindly help me resolve this question in your spare time?

    opened by lm1687019806 1