Fine-tuning StyleGAN2 for Cartoon Face Generation

Overview

Cartoon-StyleGAN 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Abstract

Recent studies have shown remarkable success in the unsupervised image to image (I2I) translation. However, due to the imbalance in the data, learning joint distribution for various domains is still very challenging. Although existing models can generate realistic target images, it’s difficult to maintain the structure of the source image. In addition, training a generative model on large data in multiple domains requires a lot of time and computer resources. To address these limitations, I propose a novel image-to-image translation method that generates images of the target domain by finetuning a stylegan2 pretrained model. The stylegan2 model is suitable for unsupervised I2I translation on unbalanced datasets; it is highly stable, produces realistic images, and even learns properly from limited data when applied with simple fine-tuning techniques. Thus, in this project, I propose new methods to preserve the structure of the source images and generate realistic images in the target domain.

Inference Notebook

🎉 You can do this task in colab ! : Open In Colab

Arxiv arXiv

[NEW!] 2021.08.30 Streamlit Ver


1. Method

Baseline : StyleGAN2-ADA + FreezeD

It generates realistic images, but does not maintain the structure of the source domain.

Ours : FreezeSG (Freeze Style vector and Generator)

FreezeG is effective in maintaining the structure of the source image. As a result of various experiments, I found that not only the initial layer of the generator but also the initial layer of the style vector are important for maintaining the structure. Thus, I froze the low-resolution layer of both the generator and the style vector.

Freeze Style vector and Generator

Results

With Layer Swapping

When LS is applied, the generated images by FreezeSG have a higher similarity to the source image than when FreezeG or the baseline (FreezeD + ADA) were used. However, since this fixes the weights of the low-resolution layer of the generator, it is difficult to obtain meaningful results when layer swapping on the low-resolution layer.

Ours : Structure Loss

Based on the fact that the structure of the image is determined at low resolution, I apply structure loss to the values of the low-resolution layer so that the generated image is similar to the image in the source domain. The structure loss makes the RGB output of the source generator to be fine-tuned to have a similar value with the RGB output of the target generator during training.

Results

Compare


2. Application : Change Facial Expression / Pose

I applied various models(ex. Indomain-GAN, SeFa, StyleCLIP…) to change facial expression, posture, style, etc.

(1) Closed Form Factorization(SeFa)

Pose

Slim Face

(2) StyleCLIP – Latent Optimization

Inspired by StyleCLIP that manipulates generated images with text, I change the faces of generated cartoon characters by text. I used the latent optimization method among the three methods of StyleCLIP and additionally introduced styleclip strength. It allows the latent vector to linearly move in the direction of the optimized latent vector, making the image change better with text.

with baseline model(FreezeD)

with our model(structureLoss)

(3) Style Mixing

Style-Mixing

When mixing layers, I found specifics layers that make a face. While the overall structure (hair style, facial shape, etc.) and texture (skin color and texture) were maintained, only the face(eyes, nose and mouth) was changed.

Results


3. Requirements

I have tested on:

Installation

Clone this repo :

git clone https://github.com/happy-jihye/Cartoon-StyleGan2
cd Cartoon-StyleGan2

Pretrained Models

Please download the pre-trained models from the following links.

Path Description
StyleGAN2-FFHQ256 StyleGAN2 pretrained model(256px) with FFHQ dataset from Rosinality
StyleGAN2-Encoder In-Domain GAN Inversion model with FFHQ dataset from Bryandlee
NaverWebtoon FreezeD + ADA with NaverWebtoon Dataset
NaverWebtoon_FreezeSG FreezeSG with NaverWebtoon Dataset
NaverWebtoon_StructureLoss StructureLoss with NaverWebtoon Dataset
Romance101 FreezeD + ADA with Romance101 Dataset
TrueBeauty FreezeD + ADA with TrueBeauty Dataset
Disney FreezeD + ADA with Disney Dataset
Disney_FreezeSG FreezeSG with Disney Dataset
Disney_StructureLoss StructureLoss with Disney Dataset
Metface_FreezeSG FreezeSG with Metface Dataset
Metface_StructureLoss StructureLoss with Metface Dataset

If you want to download all of the pretrained model, you can use download_pretrained_model() function in utils.py.

Dataset

I experimented with a variety of datasets, including Naver Webtoon, Metfaces, and Disney.

NaverWebtoon Dataset contains facial images of webtoon characters serialized on Naver. I made this dataset by crawling webtoons from Naver’s webtoons site and cropping the faces to 256 x 256 sizes. There are about 15 kinds of webtoons and 8,000 images(not aligned). I trained the entire Naver Webtoon dataset, and I also trained each webtoon in this experiment

I was also allowed to share a pretrained model with writers permission to use datasets. Thank you for the writers (Yaongyi, Namsoo, justinpinkney) who gave us permission.

Getting Started !

1. Prepare LMDB Dataset

First create lmdb datasets:

python prepare_data.py --out LMDB_PATH --n_worker N_WORKER --size SIZE1,SIZE2,SIZE3,... DATASET_PATH

# if you have zip file, change it to lmdb datasets by this commend
python run.py --prepare_data=DATASET_PATH --zip=ZIP_NAME --size SIZE

2. Train

# StyleGAN2
python train.py --batch BATCH_SIZE LMDB_PATH
# ex) python train.py --batch=8 --ckpt=ffhq256.pt --freezeG=4 --freezeD=3 --augment --path=LMDB_PATH

# StructureLoss
# ex) python train.py --batch=8 --ckpt=ffhq256.pt --structure_loss=2 --freezeD=3 --augment --path=LMDB_PATH

# FreezeSG
# ex) python train.py --batch=8 --ckpt=ffhq256.pt --freezeStyle=2 --freezeG=4 --freezeD=3 --augment --path=LMDB_PATH


# Distributed Settings
python train.py --batch BATCH_SIZE --path LMDB_PATH \
    -m torch.distributed.launch --nproc_per_node=N_GPU --main_port=PORT

Options

  1. Project images to latent spaces

    python projector.py --ckpt [CHECKPOINT] --size [GENERATOR_OUTPUT_SIZE] FILE1 FILE2 ...
    
  2. Closed-Form Factorization

    You can use closed_form_factorization.py and apply_factor.py to discover meaningful latent semantic factor or directions in unsupervised manner.

    First, you need to extract eigenvectors of weight matrices using closed_form_factorization.py

    python closed_form_factorization.py [CHECKPOINT]
    

    This will create factor file that contains eigenvectors. (Default: factor.pt) And you can use apply_factor.py to test the meaning of extracted directions

    python apply_factor.py -i [INDEX_OF_EIGENVECTOR] -d [DEGREE_OF_MOVE] -n [NUMBER_OF_SAMPLES] --ckpt [CHECKPOINT] [FACTOR_FILE]
    # ex) python apply_factor.py -i 19 -d 5 -n 10 --ckpt [CHECKPOINT] factor.pt
    

StyleGAN2-ada + FreezeD

During the experiment, I also carried out a task to generate a cartoon image based on Nvidia Team's StyleGAN2-ada code. When training these models, I didn't control the dataset resolution(256px) 😂 . So the quality of the generated image can be broken.

You can practice based on this code at Colab : Open In Colab

Generated-Image Interpolation

Reference

Comments
  • regarding training on a new dataset

    regarding training on a new dataset

    Thanks for your amasing work shared here. I was wondering how to train a model using my custom dataset.

    1. is it a good way to adopt transfer learning on a novel dataset from one of models you provided, for example ffhq256.pt or Disney.pt?
    2. should i freeze some layers (D or G) on the begining? or startup training without freeze layers, and do freezing after convergence?
    3. traing with parameter augment is always better than without augment? looking forward to your replay, thank you ^_^
    opened by cherryZ0 2
  • about the freezing of generator

    about the freezing of generator

    Thanks for the code~ I'm confused about freezing the specific layers of generator.

    After you set the layers of the generator to false in Line 181 requires_grad(generator, False)

    What's the purpose of Line 192~195?

    opened by qiuyuda 2
  • ImportError: cannot import name 'ensure_checkpoint_exists' from 'utils'

    ImportError: cannot import name 'ensure_checkpoint_exists' from 'utils'

    Hi, Jihye ! Nice to meet you !

    When I had tried to start all boxes in your collab I got this error. Could you help me, please )

    ImportError: cannot import name 'ensure_checkpoint_exists' from 'utils' (/content/Cartoon-StyleGan2/utils.py)

    opened by ROLL7777777 1
  • Can you show an example of encoding one photo and generating cartoon face of that photo?

    Can you show an example of encoding one photo and generating cartoon face of that photo?

    Hi instead of changing the facial expression in the colab, Can you show an example of encoding one photo and generating cartoon face of that photo? Thank you

    opened by kasim0226 1
  • Can you share used FreezeS/G/D values?

    Can you share used FreezeS/G/D values?

    Hello and thanks for an awesome work!

    May I know which layers were frozen (i. e. what are the values for the freezeS/G/D) in order to reproduce attached images? Also, am I correct to assume that 3 layers were used for the structure loss?

    All the best, Ivan

    Screenshot 2021-07-10 at 08 57 06
    opened by icoric4 1
  • Google Drive permissions [GOOGLE COLAB]

    Google Drive permissions [GOOGLE COLAB]

    Permission denied: https://drive.google.com/uc?id=1yIn_gM3Fk3RrRphTPNBPgJ3c-PuzCjOB Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1OysFtj7QTy7rPnxV9TXeEgBfmtgr8575 Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1Oylfl5j-XGoG_pFHtQwHd2G7yNSHx2Rm Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1wWt4dPC9TJfJ6cF3mwg7kQvpuVwPhSN7 Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1yEky49SnkBqPhdWvSAwgK5Sbrc3ctz1y Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1z51gxECweWXqSYQxZJaHOJ4TtjUDGLxA Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1PJaNozfJYyQ1ChfZiU2RwJqGlOurgKl7 Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1PILW-H4Q0W8S22TO4auln1Wgz8cyroH6 Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1P5T6DL3Cl8T74HqYE0rCBQxcq15cipuw Maybe you need to change permission over 'Anyone with the link'? Permission denied: https://drive.google.com/uc?id=1P65UldIHd2QfBu88dYdo1SbGjcDaq1YL Maybe you need to change permission over 'Anyone with the link'?

    opened by GrigorySamokhin 1
  • How to rectify access denied due to multiple attempts?

    How to rectify access denied due to multiple attempts?

    Access denied with the following error: Access denied with the following error:

    Cannot retrieve the public link of the file. You may need to change
    the permission to 'Anyone with the link', or have had many accesses. 
    

    You may still be able to access the file from the browser:

     https://drive.google.com/uc?id=1yIn_gM3Fk3RrRphTPNBPgJ3c-PuzCjOB 
    
    
    Too many users have viewed or downloaded this file recently. Please
    try accessing the file again later. If the file you are trying to
    access is particularly large or is shared with many people, it may
    take up to 24 hours to be able to view or download the file. If you
    still can't access a file after 24 hours, contact your domain
    administrator. 
    
    opened by projandr 2
  • Please advise on how to convert PT model to ONNX.

    Please advise on how to convert PT model to ONNX.

    Hello: I found your repo., it looks great. However, I am rather new to Python language, but I have rather good experience with C#, and have some experience with ONNX models. I am using Python 3.9 on Windows 10. I want to know if I can convert any of the PT model, like: 550000.PT to onnx model, like: 550000.onnx. I have installed Netron 5.6.4, and I can view 550000.PT file, it looks very complicated. Can you suggest on how to do this? I found someone showed some Python script to convert other type of model to ONNX, but it seems they know the model very well. But I don’t know too much about any model used in your repo. Please advise, Thanks,

    opened by zydjohnHotmail 0
  • freezeFC and generator_source

    freezeFC and generator_source

    Thanks for the work. Looks really cool! I have two questions about the script train.py:

    (1) Does the parameter freezeFC mean that you want to freeze the whole style network (from z to w) on left of the generator? Looks you do not really freeze it but instead use the latent w computed from the source generator.

    (2) For the generator_source, do we also need to load the weight of source generator before starting the training? I see you only define it without pre-loading any weight.

    opened by Yijunmaverick 0
  • Colab notebook error because of compare_ssim

    Colab notebook error because of compare_ssim

    Thanks for providing this code!

    When trying to run the following cell in the the Colaboratory notebook to project an input image to latent space :

    #Project your own image and Make Eigenvector of latent spaces (by pretrained model)
    
    !python projector.py --ckpt=/content/Cartoon-StyleGan2/networks/ffhq256.pt --factor='networks/factor' --e_ckpt=/content/Cartoon-StyleGan2/networks/encoder_ffhq.pt \
                                --files=/content/Cartoon-StyleGan2/celea.jpg 
    

    I get the following error:

    ImportError: cannot import name 'compare_ssim' from 'skimage.measure' (/usr/local/lib/python3.7/dist-packages/skimage/measure/init.py)

    This seems to be because /usr/local/lib/python3.7/dist-packages/skimage/measure/__init__.py uses 'skimage.measure.compare_ssim' when it was recently changed to 'skimage.metrics.structural_similarity'

    Link talking about skimage update: https://github.com/williamfzc/stagesepx/issues/150

    opened by tristansyates 1
Owner
Jihye Back
Jihye Back
Saeed Lotfi 28 Dec 12, 2022
Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Face Identity Disentanglement via Latent Space Mapping - Implement in pytorch with StyleGAN 2 Description Pytorch implementation of the paper Face Ide

Daniel Roich 58 Dec 24, 2022
Black-Box-Tuning - Black-Box Tuning for Language-Model-as-a-Service

Black-Box-Tuning Source code for paper "Black-Box Tuning for Language-Model-as-a

Tianxiang Sun 149 Jan 4, 2023
Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning.

xTune Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning. Environment DockerFile: dancingsoul/pytorch:xTune Install the f

Bo Zheng 42 Dec 9, 2022
Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Legged Robots that Keep on Learning Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World, whic

Laura Smith 70 Dec 7, 2022
This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).

Core-tuning This repository is the official implementation of ``Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regular

vanint 18 Dec 17, 2022
Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker This repository contai

Nikita 12 Dec 14, 2022
Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer"

Transformer-vocabulary-transfer Implementation of the paper "Fine-Tuning Transfo

LEYA 13 Nov 30, 2022
Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning This repository is official Tensorflow implementation of paper: Ensemb

Seunghyun Lee 12 Oct 18, 2022
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

null 220 Dec 31, 2022
DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition, TPAMI 2021

DVG-Face: Dual Variational Generation for HFR This repo is a PyTorch implementation of DVG-Face: Dual Variational Generation for Heterogeneous Face Re

null 52 Dec 30, 2022
A large-scale face dataset for face parsing, recognition, generation and editing.

CelebAMask-HQ [Paper] [Demo] CelebAMask-HQ is a large-scale face image dataset that has 30,000 high-resolution face images selected from the CelebA da

switchnorm 1.7k Dec 26, 2022
Streamlit Tutorial (ex: stock price dashboard, cartoon-stylegan, vqgan-clip, stylemixing, styleclip, sefa)

Streamlit Tutorials Install pip install streamlit Run cd [directory] streamlit run app.py --server.address 0.0.0.0 --server.port [your port] # http:/

Jihye Back 30 Jan 6, 2023
An addon uses SMPL's poses and global translation to drive cartoon character in Blender.

Blender addon for driving character The addon drives the cartoon character by passing SMPL's poses and global translation into model's armature in Ble

犹在镜中 153 Dec 14, 2022
Open CV - Convert a picture to look like a cartoon sketch in python

Use the video https://www.youtube.com/watch?v=k7cVPGpnels for initial learning.

Sammith S Bharadwaj 3 Jan 29, 2022
StyleGAN2-ADA - Official PyTorch implementation

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes.

NVIDIA Research Projects 3.2k Dec 30, 2022
StyleGAN2-ada for practice

This version of the newest PyTorch-based StyleGAN2-ada is intended mostly for fellow artists, who rarely look at scientific metrics, but rather need a working creative tool. Tested on Python 3.7 + PyTorch 1.7.1, requires FFMPEG for sequence-to-video conversions. For more explicit details refer to the original implementations.

vadim epstein 170 Nov 16, 2022
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 6, 2022
StyleGAN2 - Official TensorFlow Implementation

StyleGAN2 - Official TensorFlow Implementation

NVIDIA Research Projects 10.1k Dec 28, 2022