UI2I via StyleGAN2 - Unsupervised image-to-image translation method via pre-trained StyleGAN2 network

Overview

License CC BY-NC-SA 4.0

We propose an unsupervised image-to-image translation method based on a pre-trained StyleGAN2 network.

(Example translation result: Mona Lisa)

paper: Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network

Prerequisites

  • PyTorch 1.3.1
  • CUDA 10.1

Step 1: Model Fine-tuning

To obtain the target-domain model, first follow the data-preparation instructions from the StyleGAN2 PyTorch implementation:

python prepare_data.py --out LMDB_PATH --n_worker N_WORKER --size SIZE1,SIZE2,SIZE3,... DATASET_PATH

Then fine-tune the model on data from the target domain:

python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --batch BATCH_SIZE LMDB_PATH --ckpt your_base_model_path

Step 2: Closed-Form GAN space

Compute the GAN latent space via the proposed closed-form algorithm to obtain a factor file:

python3 closed_form_factorization.py --ckpt your_model --out output_factor_path
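
For intuition, the factorization itself is simple: it collects the generator's style-modulation weights and takes their principal directions. Below is a rough sketch of a SeFa-style closed-form factorization; the checkpoint key names ("g_ema", "modulation") follow rosinality's StyleGAN2 checkpoints and, like the file names, are assumptions here rather than the exact script.

    import torch

    # Load a checkpoint and gather the style-modulation weights (skipping the
    # to_rgb layers), assuming a rosinality-style checkpoint layout.
    ckpt = torch.load("your_model.pt", map_location="cpu")
    modulate = {
        k: v
        for k, v in ckpt["g_ema"].items()
        if "modulation" in k and "to_rgbs" not in k and "weight" in k
    }

    # Stack all style-projection weights and take their right singular vectors;
    # these directions act as the "factor" used in the later steps.
    weight = torch.cat(list(modulate.values()), dim=0)
    eigvec = torch.svd(weight).V

    torch.save({"ckpt": "your_model.pt", "eigvec": eigvec}, "output_factor_path")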

Step 3: Image inversion

Invert the image to a latent code using the StyleGAN2 model trained on its own domain:

python3 project_factor.py --ckpt stylegan_model_path --fact factor_path IMAGE_FILE
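
For intuition, the inversion is an iterative optimization of a latent code so that the generator reproduces the input image. The following is a minimal sketch of plain latent optimization against the rosinality-style generator API; the helper name invert and the pure-MSE loss are simplifying assumptions, not the exact procedure in project_factor.py (which also takes the factor file into account).

    import torch
    from torch import optim
    import torch.nn.functional as F

    def invert(g_ema, target, steps=1000, lr=0.01):
        # `g_ema` is a pre-trained StyleGAN2 generator (rosinality API) and
        # `target` a [1, 3, H, W] image tensor in [-1, 1]; both are assumptions.
        with torch.no_grad():
            latent = g_ema.mean_latent(4096)  # start from the average latent
        latent = latent.detach().clone().requires_grad_(True)
        opt = optim.Adam([latent], lr=lr)

        for _ in range(steps):
            img, _ = g_ema([latent], input_is_latent=True)
            # A real projector typically adds a perceptual (LPIPS) term and
            # noise regularization; plain MSE keeps the sketch short.
            loss = F.mse_loss(img, target)
            opt.zero_grad()
            loss.backward()
            opt.step()

        return latent.detach()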

Step 4: LS Image generation with multiple styles

We use the inverted code to generate images with multiple styles in the target domain:

python3 gen_multi_style.py --model base_model_path --model2 target_model_path --fact base_inverse.pt --fact_base factor_from_base_model -o output_path --swap_layer 3 --stylenum 10
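
For clarity, gen_multi_style.py follows a two-pass layer-swap pattern: the base generator renders the inverted content code and caches its activations at the swap layer, then the fine-tuned target generator continues from those cached features while a freshly sampled latent supplies the target-domain style. A rough sketch following the generator API used in the script is given below; the function name generate_with_style and the argument names are illustrative assumptions.

    import torch

    def generate_with_style(g_ema1, g_ema2, input_latent, mean_latent,
                            swap_layer=3, device="cuda"):
        # Pass 1: the base model renders the inverted content code and caches
        # its intermediate features at the swap layer.
        _, swap_res = g_ema1([input_latent], input_is_latent=True,
                             save_for_swap=True, swap_layer=swap_layer)

        # Pass 2: the fine-tuned target model reuses those features for the
        # coarse structure, while a newly sampled latent provides the
        # target-domain style.
        sample_z_style = torch.randn(1, 512, device=device)
        img_style, _ = g_ema2([input_latent], truncation=0.5,
                              truncation_latent=mean_latent, swap=True,
                              swap_layer=swap_layer, swap_tensor=swap_res,
                              multi_style=True, multi_style_latent=[sample_z_style])
        return img_style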

In addition to multi-modal translation, the style of the output can also be specified by a reference image. To achieve this, we need to invert the reference image as well, since its latent code is then used as the style code during generation.

python3 gen_ref.py --model1 base_model_path --model2 target_model_path --fact base_inverse.pt --fac_ref reference_inverse.pt --fact_base1 factor_from_base_model --fact_base2 factor_from_target_model -o output_path

Pre-trained base model and dataset

We use the StyleGAN2 face model trained on FFHQ at 256x256 (by @rosinality). The 1024x1024 model can be found in the official StyleGAN2 implementation; model conversion between TensorFlow and PyTorch is needed. Models fine-tuned from these base models can be used for I2I translation, though with FreezeFC they achieve better results.

Many thanks to Gwern for providing the Danbooru anime dataset, and to Doron Adler and Justin Pinkney for providing the cartoon dataset.

Some Results

(Result images: cartoon2face1, cartoon2face2, cartoon2face3, cartoon2face4, face2portrait1, face2portrait2)

The code is heavily borrowed from rosinality's StyleGAN2 implementation and closed-form factorization; thanks for their great work and contributions!

Comments
  • no module named dnnlib.tflib.ops

    Hi @HideUnderBush, thanks for your great work. When I try to use it, I run into some problems: when I run python closed_form_factorization.py, I get RuntimeError: no default TensorFlow session found. please call dnnlib.init_tf(). I then do as it says and call dnnlib.init_tf() before tf.get_default_session(), but get the "no module named dnnlib.tflib.ops" error above. Hope you can help, thanks~

    opened by visonpon 2
  • Cite prior work on layer swapping

    Hi, just stumbled across this and it looks great, particularly the anime generation images. Looks like you're essentially using the method I described in some of my blog posts around transfer learning: using a latent code from one model in another, plus layer swapping (https://www.justinpinkney.com). And I'm glad to see you cite Doron and me for our Toonify work!

    We actually have a paper on arXiv that describes this approach, particularly focusing on the idea of layer swapping you're using. It would be really great if you could cite our actual paper: Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains.

    Perhaps as prior work where you describe the "layer swapping" you perform?

    opened by justinpinkney 2
  • Stylegan inversion

    Hi!

    According to your paper, it takes about a second to invert an image to its latent representation: "...and another 0.8 − 1 s for the inversion process". However, in your current implementation it is an iterative optimization process that takes more than a minute with the default settings. Could you please clarify how you accomplish the inversion within one second?

    opened by IvanBarabanau 1
  • permission denied at convert

    Traceback (most recent call last):
      File "convert_weight.py", line 235, in <module>
        with open(args.path, "rb") as f:
    PermissionError: [Errno 13] Permission denied: 'D:/converted'

    I set up the environment correctly in conda (Python 3.6, TF 1.14, and Torch with the C++ extension), and after defeating all the other errors I got this one. Any idea what this might be? :) Thanks in advance.

    opened by ghost 0
  • RuntimeError: Invalid magic number; corrupt file?

    @HideUnderBush, I got this error when running closed_form_factorization.py:

    Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
    Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
    Traceback (most recent call last):
      File "closed_form_factorization.py", line 14, in <module>
        ckpt = torch.load(args.ckpt, map_location='cuda:0')
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 595, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 766, in _legacy_load
        raise RuntimeError("Invalid magic number; corrupt file?")
    RuntimeError: Invalid magic number; corrupt file?

    opened by visonpon 0
  • Bug in gen_ref.py

    BUG: Line 104 and Line 107 should add the parameter input_is_latent=True, otherwise the content and the reference cannot be used properly.

    ISSUE: Also, truncation=0.5 can sometimes be too strict, so that the output's content and style do not match the input content image and the reference image.
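
    A minimal illustration of the suggested change (a hypothetical sketch; the surrounding variable names are assumptions, and only the added input_is_latent=True flag is the point):

    # Hypothetical shape of the fix for the calls on lines 104/107:
    # treat the inverted codes as W-space latents (other arguments unchanged).
    img, _ = g_ema2([ref_latent], input_is_latent=True, truncation=0.5,
                    truncation_latent=mean_latent, swap=True,
                    swap_layer=args.swap_layer, swap_tensor=swap_res)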

    opened by williamyang1991 3
  • Not an issue - Question on closed form factorization

    https://github.com/HideUnderBush/UI2I_via_StyleGAN2/blob/bd4cd6af326f22f55c58b9b3886d1a5bbdb7460f/closed_form_factorization.py#L17

    I've been digging through GitHub for help on tweaking g_ema from the generator.

    I have this ticket - https://github.com/danielroich/PTI/issues/26

    The maths is a bit beyond me, but I suspect I need to update g_ema like you've done here. I need to play around with this repo to investigate further.

    opened by johndpope 0
  • Some questions about fine-tuning on Danbooru Datasets

    Hi, @HideUnderBush! Thanks for your amazing work! I am trying to reproduce the face2anime experiments on the Danbooru dataset, but I have some confusion; could you give me some advice?

    Step 1: Following your scripts, I use the 512 px StyleGAN2 checkpoint pretrained on FFHQ as the base and fine-tune it on the Danbooru dataset. (I didn't change any other params; is that right?)
    Step 2: I use closed_form_factorization.py to decompose the model trained for 35000 iterations (35000.pt) and obtain the factor.out file.
    Step 3: I run image inversion (size 512), but when the optimization finishes I get an almost black result, and the MSE loss is very large (about 1.4-1.7).

    Are there any key points I have missed? I hope you can point out any mistakes in my steps. Thanks for your work!

    opened by kingofprank 0
  • Layer swap in gen_multi_style.py

    Thank you for your amazing work. I am a little confused about the layer-swap part of your implementation. It seems that you first pass the latent code through the base model and then feed the intermediate results into the target model, as follows.

    img1, swap_res = g_ema1([input_latent], input_is_latent=True, save_for_swap=True, swap_layer=args.swap_layer)
    
    for i in range(args.stylenum):
        sample_z_style = torch.randn(1, 512, device=args.device)
        img_style, _ = g_ema2([input_latent], truncation=0.5, truncation_latent=mean_latent, swap=True, swap_layer=args.swap_layer,  swap_tensor=swap_res, multi_style=True, multi_style_latent=[sample_z_style])
        print(i)
        img_style_name = args.output + "_style_" + str(i) + ".png"
        img_style = make_image(img_style)
        out_style = Image.fromarray(img_style[0])
        out_style.save(img_style_name)
    
    Is it true that you are trying to keep low-level information such as shape and pose from the original model, and take the lighting and texture from the target model?

    opened by crownk1997 0