UI2I via StyleGAN2 - Unsupervised image-to-image translation method via pre-trained StyleGAN2 network

Last update: Dec 30, 2022


Overview

We propose an unsupervised image-to-image translation method built on a pre-trained StyleGAN2 network.

paper: Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network

Prerequisite

PyTorch 1.3.1
CUDA 10.1

Step 1: Model Fine-tuning

To obtain the target model, first prepare the data by following the data-preparation instructions in the StyleGAN2 PyTorch implementation:

python prepare_data.py --out LMDB_PATH --n_worker N_WORKER --size SIZE1,SIZE2,SIZE3,... DATASET_PATH

Then fine-tune the base model on data from the target domain:

python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --batch BATCH_SIZE LMDB_PATH --ckpt your_base_model_path

Step 2: Closed-Form GAN space

Calculate the GAN space with the proposed algorithm to obtain a factor file:

python3 closed_form_factorization.py --ckpt your_model --out output_factor_path
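As a rough illustration of what a closed-form factorization computes, the sketch below follows the general SeFa-style recipe: take a weight matrix A that maps latent codes to style activations and use its top right singular vectors (equivalently, the top eigenvectors of AᵀA) as editing directions. The function name, the random toy matrix, and its shape are assumptions made for illustration; this is not the repository's closed_form_factorization.py, which may differ in detail.

```python
import numpy as np

def closed_form_directions(weight, k):
    """Return the k unit directions n that maximize ||weight @ n||.

    weight: array of shape (out_features, latent_dim); a hypothetical stand-in
    for the style-modulation weights of a generator.
    """
    # Right singular vectors of `weight` are the eigenvectors of weight.T @ weight,
    # sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(weight, full_matrices=False)
    return vt[:k], singular_values[:k]

# Toy data only: a random matrix in place of real generator weights.
rng = np.random.default_rng(0)
toy_weight = rng.standard_normal((64, 16))
directions, strengths = closed_form_directions(toy_weight, k=4)
```

Each row of `directions` is a unit vector in latent space; moving a latent code along an earlier row changes the style activations more strongly than moving it along a later one.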

Step 3: Image inversion

Invert the image to a latent code using the StyleGAN2 model trained on its domain:

python3 project_factor.py --ckpt stylegan_model_path --fact factor_path IMAGE_FILE
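Inversion of this kind is typically an iterative optimization: start from some latent code and adjust it until the generated image matches the target. The sketch below shows only that idea, on a toy linear "generator" with plain gradient descent on the MSE. The generator matrix, step count, and learning rate are all assumptions; project_factor.py operates on a real StyleGAN2 model and its loss may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical linear "generator": image = toy_generator @ latent.
toy_generator = rng.standard_normal((32, 8))
true_latent = rng.standard_normal(8)
target_image = toy_generator @ true_latent

def invert(generator, target, steps=2000, lr=0.1):
    """Recover a latent code by gradient descent on the mean squared error."""
    latent = np.zeros(generator.shape[1])
    for _ in range(steps):
        residual = generator @ latent - target
        # Gradient of mean((G @ latent - target) ** 2) with respect to latent.
        grad = 2.0 * generator.T @ residual / target.size
        latent -= lr * grad
    return latent

recovered_latent = invert(toy_generator, target_image)
reconstruction_mse = float(np.mean((toy_generator @ recovered_latent - target_image) ** 2))
```

With a real generator the objective is not convex, so the result depends on the initial code, the loss terms, and the number of steps.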

Step 4: LS Image generation with multiple styles

We use the inverted code to generate images with multiple styles in the target domain:

python3 gen_multi_style.py --model base_model_path --model2 target_model_path --fact base_inverse.pt --fact_base factor_from_base_model -o output_path --swap_layer 3 --stylenum 10
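The --swap_layer flag selects where the two generators are joined. A minimal sketch of layer swapping, assuming each generator is just a stack of layers: the first swap_layer layers come from the base model, and the resulting intermediate features are fed into the remaining layers of the target model. The toy layers and helper names below are hypothetical and stand in for real StyleGAN2 blocks.

```python
import numpy as np

def make_toy_generator(seed, num_layers=6, width=8):
    """Build a hypothetical generator as a list of random weight matrices."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(num_layers)]

def run_layers(layers, features):
    """Apply each toy layer (linear map followed by tanh) in order."""
    for weight in layers:
        features = np.tanh(weight @ features)
    return features

def layer_swap_forward(base_layers, target_layers, latent, swap_layer):
    """Run the first `swap_layer` layers of the base model, then the rest of the target model."""
    intermediate = run_layers(base_layers[:swap_layer], latent)
    return run_layers(target_layers[swap_layer:], intermediate)

base_model = make_toy_generator(seed=1)
target_model = make_toy_generator(seed=2)
latent = np.random.default_rng(3).standard_normal(8)
swapped_output = layer_swap_forward(base_model, target_model, latent, swap_layer=3)
```

In this toy setup, a small swap_layer lets the target model shape most of the output, while a large one keeps the output close to the base model.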

In addition to multi-modal translation, the style of the output can be specified by a reference image. To achieve this, the reference image must be inverted as well, since its latent code is then used as the style code during generation.

python3 gen_ref.py --model1 base_model_path --model2 target_model_path --fact base_inverse.pt --fac_ref reference_inverse.pt --fact_base1 factor_from_base_model --fact_base2 factor_from_target_model -o output_path
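One common way to combine a content code with a style code in StyleGAN-like models is per-layer latent mixing: the early (coarse) layers receive the content code and the later (fine) layers receive the reference code. The sketch below shows only that general idea. The number of layers, the latent width, and the split point are assumptions, and gen_ref.py may combine the codes differently.

```python
import numpy as np

def mix_latents(content_latents, reference_latents, split_layer):
    """Use content codes for layers before `split_layer` and reference codes after it.

    Both inputs have shape (num_layers, latent_dim): one latent vector per generator layer.
    """
    mixed = content_latents.copy()
    mixed[split_layer:] = reference_latents[split_layer:]
    return mixed

rng = np.random.default_rng(0)
num_layers, latent_dim = 14, 512  # assumed sizes, for illustration only
content_latents = rng.standard_normal((num_layers, latent_dim))
reference_latents = rng.standard_normal((num_layers, latent_dim))
mixed_latents = mix_latents(content_latents, reference_latents, split_layer=6)
```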

pre-trained base model and dataset

We use the StyleGAN2 face model trained on FFHQ at 256x256 (by @rosinality). The 1024x1024 model can be found in the official StyleGAN2 implementation; its weights need to be converted from TensorFlow to PyTorch. Models fine-tuned from these base models can be used for I2I translation, though fine-tuning with FreezeFC can achieve better results.

Many thanks to Gwern for providing the Anime dataset Danbooru and Doron Adler and Justin Pinkney for providing the cartoon dataset.

Some Results

The code borrows heavily from rosinality's StyleGAN2 implementation and Closed-Form Factorization. Thanks to their authors for their great work and contributions!

Comments

no module named dnnlib.tflib.ops

Hi @HideUnderBush, thanks for your great work. I ran into some problems when trying to use it. When I run python closed_form_factorization.py, I get a RuntimeError: no default TensorFlow session found. please call dnnlib.init_tf(). I did as it says and called dnnlib.init_tf() before tf.get_default_session(), but then got the no-module error above. Hope you can help, thanks~

opened by visonpon 2
Cite prior work on layer swapping

Hi, I just stumbled across this and it looks great, particularly the anime generation images. It looks like you're essentially using the method I described in some of my blog posts on transfer learning: using a latent code from one model in another, plus layer swapping (https://www.justinpinkney.com). I'm glad to see you cite Doron and me for our Toonify work!

We actually have a paper on arXiv that describes this approach, focusing in particular on the layer-swapping idea you're using. It would be really great if you could cite our actual paper: Resolution Dependent GAN Interpolation for Controllable Image Synthesis Between Domains

Perhaps as prior work where you are describing the "layer swapping" you perform?

opened by justinpinkney 2
Stylegan inversion

Hi!

According to your paper, it takes about a second to invert an image to its latent representation: "...and another 0.8 − 1 s for the inversion process". However, the current implementation is an iterative optimization process that takes more than a minute with the default settings. Could you please clarify how you accomplish the inversion within 1 second?

opened by IvanBarabanau 1
permission denied at convert

Traceback (most recent call last):
  File "convert_weight.py", line 235, in <module>
    with open(args.path, "rb") as f:
PermissionError: [Errno 13] Permission denied: 'D:/converted'

I set up the environment correctly in conda (Python 3.6, tf=1.14, torch with the cpp extension), and after getting past all the other errors I got this one. Any idea what this might be? :) Thanks in advance.

opened by ghost 0
RuntimeError: Invalid magic number; corrupt file?

@HideUnderBush, I got this error when running closed_form_factorization.py:

Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
Traceback (most recent call last):
  File "closed_form_factorization.py", line 14, in <module>
    ckpt = torch.load(args.ckpt, map_location='cuda:0')
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 766, in _legacy_load
    raise RuntimeError("Invalid magic number; corrupt file?")
RuntimeError: Invalid magic number; corrupt file?

opened by visonpon 0
Bug in gen_ref.py

BUG: Lines 104 and 107 should add the parameter input_is_latent=True; otherwise the content and the reference cannot be used properly.

ISSUE: Also, truncation=0.5 can sometimes be too strict, so that the output's content and style do not match the input content image and the reference image.
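For context, the truncation trick in StyleGAN-style generators pulls a latent code toward the mean latent: with truncation=1.0 the code is unchanged, and smaller values move it closer to the mean. A minimal sketch of that interpolation, using toy vectors chosen only for illustration:

```python
import numpy as np

def truncate_latent(latent, mean_latent, truncation):
    """Interpolate between the mean latent (truncation=0) and the original code (truncation=1)."""
    return mean_latent + truncation * (latent - mean_latent)

# Toy values, not taken from any real model.
mean_latent = np.zeros(4)
latent = np.array([2.0, -1.0, 0.5, 4.0])
half_truncated = truncate_latent(latent, mean_latent, truncation=0.5)
untruncated = truncate_latent(latent, mean_latent, truncation=1.0)
```

With truncation=0.5 the code ends up halfway to the mean, which is why details specific to the content or reference image can be lost.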

opened by williamyang1991 3
Not an issue - Question on closed form factorization

https://github.com/HideUnderBush/UI2I_via_StyleGAN2/blob/bd4cd6af326f22f55c58b9b3886d1a5bbdb7460f/closed_form_factorization.py#L17

I’ve been digging through GitHub for help on tweaking g_ema from the generator.

I have this ticket - https://github.com/danielroich/PTI/issues/26

The maths is a bit beyond me, but I suspect I need to update g_ema like you’ve done here. I need to play around with this repo to investigate further.

opened by johndpope 0
Some questions about fine-tuning on Danbooru Datasets

Hi, @HideUnderBush! Thanks for your amazing work! I am trying to reproduce the face2anime experiments on the Danbooru dataset, but I am confused about a few things. Could you give me some advice?

Step 1: Following your scripts, I use the 512 px StyleGAN2 checkpoint pre-trained on the FFHQ dataset as the base and fine-tune it on the Danbooru dataset. (I didn't change any other parameters; is that right?)

Step 2: I use closed_form_factorization.py to decompose the model trained for 35000 iterations (35000.pt) and get the factor.out file.

Step 3: I try to run image inversion (size 512), but when the optimization finishes, I get an almost black result. The MSE loss is very large (about 1.4-1.7).

Are there any key points I missed? I hope you can point out any mistakes in my steps. Thanks for your work!

opened by kingofprank 0

Layer swap in gen_multi_style.py

Thank you for your amazing work. I am a little confused about the layer swap part of your implementation. It seems that you first pass the latent code into the base model and then extract the intermediate results for the target model, as follows:

img1, swap_res = g_ema1([input_latent], input_is_latent=True, save_for_swap=True, swap_layer=args.swap_layer)

for i in range(args.stylenum):
    sample_z_style = torch.randn(1, 512, device=args.device)
    img_style, _ = g_ema2([input_latent], truncation=0.5, truncation_latent=mean_latent, swap=True, swap_layer=args.swap_layer,  swap_tensor=swap_res, multi_style=True, multi_style_latent=[sample_z_style])
    print(i)
    img_style_name = args.output + "_style_" + str(i) + ".png"
    img_style = make_image(img_style)
    out_style = Image.fromarray(img_style[0])
    out_style.save(img_style_name)

Is it true that you are trying to keep the low-level information, such as shape and pose, from the original model and take the lighting and texture from the target model?

opened by crownk1997 0

