Photo2cartoon - a portrait cartoonization exploration project (photo-to-cartoon translation)

Overview

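Photo to Cartoon

Chinese Version | English Version

This is Minivision's cartoon portrait exploration project. To try the cartoonization effect, scan the QR code below with WeChat or search for the "AI卡通秀" (AI Cartoon Show) mini program.

You can also try it online on our AI open platform: https://ai.minivision.cn/#/coreability/cartoon

Technical discussion QQ group: 937627932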

Updates

Introduction

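The goal of cartoon-style portrait rendering is to convert a real photo into a non-photorealistic cartoon image while preserving the identity information and texture details of the original. Our approach is to learn the photo-to-cartoon mapping from a large amount of photo and cartoon data. In general, pix2pix methods trained on paired data achieve good image translation results. In this task, however, the input and output contours do not correspond one to one (for example, cartoon-style eyes are larger and the chin is slimmer), and paired data is difficult and costly to draw. We therefore use an unpaired image translation method.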

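The classic unpaired image translation method is CycleGAN, but the results generated by the original CycleGAN often show obvious artifacts and are unstable. The recent paper U-GAT-IT proposes a normalization method, AdaLIN, which automatically adjusts the relative weight of Instance Norm and Layer Norm. Combined with an attention mechanism, it achieves attractive translation of portraits into Japanese anime style.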
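As an illustration of how AdaLIN mixes the two normalizations, here is a minimal NumPy sketch. It follows the formulation in the U-GAT-IT paper (a learned per-channel weight rho blends the Instance Norm and Layer Norm outputs before a scale and shift are applied). The function name, argument shapes, and the use of NumPy instead of the project's PyTorch are assumptions made for this example; it is not the project's code.

```python
import numpy as np


def adalin(x, gamma, beta, rho, eps=1e-5):
    """AdaLIN sketch for a feature map x of shape (N, C, H, W).

    gamma, beta: per-sample, per-channel scale and shift, shape (N, C).
    rho: per-channel mixing weight, shape (C,), clipped to [0, 1].
    """
    # Instance Norm statistics: per sample and per channel, over H and W.
    in_mean = x.mean(axis=(2, 3), keepdims=True)
    in_var = x.var(axis=(2, 3), keepdims=True)
    x_in = (x - in_mean) / np.sqrt(in_var + eps)

    # Layer Norm statistics: per sample, over C, H and W together.
    ln_mean = x.mean(axis=(1, 2, 3), keepdims=True)
    ln_var = x.var(axis=(1, 2, 3), keepdims=True)
    x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)

    # rho = 1 gives pure Instance Norm, rho = 0 gives pure Layer Norm.
    rho = np.clip(np.asarray(rho, dtype=x.dtype), 0.0, 1.0).reshape(1, -1, 1, 1)
    mixed = rho * x_in + (1.0 - rho) * x_ln
    return gamma[:, :, None, None] * mixed + beta[:, :, None, None]
```

Because rho is learned per channel, the network can lean toward Instance Norm where it helps and toward Layer Norm elsewhere.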

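Unlike the exaggerated anime style, our cartoon style is more realistic: it should have the simplicity and cuteness of a cartoon while carrying clear identity information. To this end we add a Face ID Loss: a pretrained face recognition model extracts ID features from the photo and from the cartoon, and the cosine distance between them constrains the generated cartoon.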
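A cosine-distance identity loss can be sketched as follows. This is a minimal NumPy illustration that assumes the recognition model has already produced one embedding per image. The function name and the (N, D) embedding layout are hypothetical, and the project itself computes this loss in PyTorch.

```python
import numpy as np


def face_id_loss(photo_feat, cartoon_feat, eps=1e-8):
    """Mean cosine distance between two batches of ID embeddings of shape (N, D)."""
    photo_feat = np.asarray(photo_feat, dtype=np.float64)
    cartoon_feat = np.asarray(cartoon_feat, dtype=np.float64)
    dot = (photo_feat * cartoon_feat).sum(axis=1)
    norms = np.linalg.norm(photo_feat, axis=1) * np.linalg.norm(cartoon_feat, axis=1)
    # Guard against division by zero for all-zero embeddings.
    cos_sim = dot / np.maximum(norms, eps)
    # 0 when the embeddings point the same way, 2 when they are opposite.
    return float((1.0 - cos_sim).mean())
```

Minimizing this value pulls the cartoon's identity embedding toward the photo's, independent of the embeddings' magnitudes.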

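In addition, we propose a normalization method called Soft-AdaLIN (Soft Adaptive Layer-Instance Normalization), which fuses the mean and variance of the encoder (photo features) with the mean and variance of the decoder (cartoon features) during de-normalization.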
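One plausible reading of this fusion is a learned convex combination of the two sets of de-normalization parameters. The sketch below assumes exactly that: the names, the weights w_gamma and w_beta, and the convex-combination form are assumptions for illustration, not a statement of how the project implements Soft-AdaLIN.

```python
import numpy as np


def soft_adalin_params(content_gamma, content_beta, style_gamma, style_beta,
                       w_gamma, w_beta):
    """Fuse encoder-derived (content) and decoder-derived (style) scale/shift.

    w_gamma and w_beta are mixing weights, clipped to [0, 1]:
    0 keeps only the style statistics, 1 keeps only the content statistics.
    """
    w_gamma = np.clip(w_gamma, 0.0, 1.0)
    w_beta = np.clip(w_beta, 0.0, 1.0)
    soft_gamma = (1.0 - w_gamma) * style_gamma + w_gamma * content_gamma
    soft_beta = (1.0 - w_beta) * style_beta + w_beta * content_beta
    return soft_gamma, soft_beta
```

The fused scale and shift would then take the place of the single gamma and beta that plain AdaLIN applies after normalization.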

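As for the model structure, building on U-GAT-IT, we add two hourglass modules before the encoder and two after the decoder, which progressively improve the model's feature abstraction and reconstruction ability.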

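Because experimental data is scarce, we process the data into a fixed pattern to reduce training difficulty. We first detect the face and its landmarks in the image, rotate the image to align the face according to the landmarks, and crop it to a uniform standard. The cropped head image is then fed to a portrait segmentation model to remove the background.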
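The rotation step can be illustrated with a small standard-library sketch. It assumes the alignment is driven by the two eye positions, which is a common choice but an assumption here; the function names are hypothetical and the project's own preprocessing may differ.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Angle in degrees of the line through the eyes, relative to horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def level_landmarks(landmarks, left_eye, right_eye):
    """Rotate (x, y) landmarks around the eye midpoint so the eye line is horizontal."""
    angle = math.radians(roll_angle_degrees(left_eye, right_eye))
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    # Rotating by the negative of the measured angle cancels the tilt.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    leveled = []
    for x, y in landmarks:
        dx, dy = x - cx, y - cy
        leveled.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return leveled
```

The same angle can be passed to an image library's rotate call so that the pixels and the landmarks stay consistent.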

Start

Install dependencies

The main dependencies required by the project are:

  • python 3.6
  • pytorch 1.4
  • tensorflow-gpu 1.14
  • face-alignment
  • dlib
  • onnxruntime

Clone:

git clone https://github.com/minivision-ai/photo2cartoon.git
cd ./photo2cartoon

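Download resources

Google Drive | Baidu Netdisk (extraction code: y2ch)

  1. Pretrained photo-to-cartoon model: photo2cartoon_weights.pt (updated 20200504), stored under the models path.
  2. Head segmentation model: seg_model_384.pb, stored under the utils path.
  3. Pretrained face recognition model: model_mobilefacenet.pth, stored under the models path. (From: InsightFace_Pytorch)
  4. Open-source cartoon data: cartoon_data, containing trainB and testB.
  5. Photo-to-cartoon ONNX model: photo2cartoon_weights.onnx (Google Drive), stored under the models path.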

Test

Convert a test photo (of a young Asian woman) to cartoon style:

python test.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png

Test the ONNX model

python test_onnx.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png

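Train

1. Data preparation

The training data includes real photos and cartoon portraits. To reduce training complexity, we preprocess both kinds of data as follows: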

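  • Detect the face and its landmarks.
  • Rotate the face into alignment according to the landmarks.
  • Expand the landmark bounding box by a fixed ratio and crop out the face region.
  • Use the portrait segmentation model to set the background to white.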
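The last two steps above can be sketched in a few lines of NumPy. The expansion ratio, the square crop, and the mask convention (1 for the person, 0 for the background) are assumptions chosen for this example; the source does not state the values the project uses.

```python
import numpy as np


def expand_box(box, ratio):
    """Grow (x0, y0, x1, y1) around its centre into a square of side ratio * max(w, h)."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half = max(x1 - x0, y1 - y0) * ratio / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


def whiten_background(image, mask):
    """Blend an (H, W, 3) uint8 image toward white wherever the (H, W) mask is 0."""
    # Clip soft masks to [0, 1] and add a channel axis for broadcasting.
    alpha = np.clip(np.asarray(mask, dtype=np.float32), 0.0, 1.0)[..., None]
    blended = image.astype(np.float32) * alpha + 255.0 * (1.0 - alpha)
    return np.rint(blended).astype(np.uint8)
```

A soft mask with values between 0 and 1 produces a smooth transition at the hair and face boundary instead of a hard edge.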

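We have open-sourced 204 processed cartoon images. You also need to prepare about 1,000 portrait photos (to match the cartoon data, preferably photos of young Asian women, with a face size of more than 200x200 pixels) and preprocess them with the following command: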

python data_process.py --data_path YourPhotoFolderPath --save_path YourSaveFolderPath

Store the processed data in the following hierarchy: trainA and testA hold the photo portraits, and trainB and testB hold the cartoon portraits.

├── dataset
    └── photo2cartoon
        ├── trainA
            ├── xxx.jpg
            ├── yyy.png
            └── ...
        ├── trainB
            ├── zzz.jpg
            ├── www.png
            └── ...
        ├── testA
            ├── aaa.jpg 
            ├── bbb.png
            └── ...
        └── testB
            ├── ccc.jpg 
            ├── ddd.png
            └── ...
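If the processed photos sit in a single folder, a small helper can distribute them into this layout. The sketch below uses only the standard library. The function name, the 10% test fraction, and the fixed seed are hypothetical choices for illustration, since the source does not prescribe how to split the data.

```python
import random
import shutil
from pathlib import Path


def split_into_domain(src_dir, dataset_root, domain, test_fraction=0.1, seed=0):
    """Copy images from src_dir into train<domain>/ and test<domain>/ under dataset_root."""
    src_dir, dataset_root = Path(src_dir), Path(dataset_root)
    files = sorted(p for p in src_dir.iterdir()
                   if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_fraction)
    splits = {f"test{domain}": files[:n_test], f"train{domain}": files[n_test:]}
    for name, members in splits.items():
        target = dataset_root / name
        target.mkdir(parents=True, exist_ok=True)
        for path in members:
            shutil.copy2(path, target / path.name)
    return {name: len(members) for name, members in splits.items()}
```

For example, `split_into_domain("YourSaveFolderPath", "dataset/photo2cartoon", "A")` would fill trainA and testA with the processed photos; the cartoon images go into trainB and testB the same way with `domain="B"`.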

2. Train

Train from scratch:

python train.py --dataset photo2cartoon

Load pretrained weights:

python train.py --dataset photo2cartoon --pretrained_weights models/photo2cartoon_weights.pt

Multi-GPU training (we still recommend batch_size=1 on a single GPU):

python train.py --dataset photo2cartoon --batch_size 4 --gpu_ids 0 1 2 3

Q&A

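Q: Why does the open-source cartoonization model differ from the effect in the mini program?

A: The training data of the open-source model was collected from the Internet. To get a more polished result, we trained the mini program's model on custom-drawn cartoon data (more than 200 images) and with a larger input resolution. In addition, the face feature extractor in the mini program is our in-house recognition model, which performs better than the open-source recognition model used in this project.

Q: How do we select the best model?

A: We first train the model for 200k iterations and then select the best model using the FID metric. The model finally selected is the one at 90k iterations.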
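The selection step itself is simple once an FID score has been computed for each saved checkpoint, since a lower FID indicates generated images whose feature statistics are closer to the reference set. The sketch below assumes the scores are already available as a mapping from iteration to FID; the function name is hypothetical and the sketch does not compute FID.

```python
def best_checkpoint(fid_by_iteration):
    """Return the (iteration, fid) pair with the lowest FID; lower is better."""
    if not fid_by_iteration:
        raise ValueError("no checkpoints were evaluated")
    return min(fid_by_iteration.items(), key=lambda item: item[1])
```

Evaluating checkpoints at a regular interval across the whole run makes it possible to find an optimum in the middle of training, as the answer above describes.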

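Q: About the face feature extraction model.

A: In our experiments, computing the Face ID Loss with our in-house recognition model trains much better than using the open-source recognition model. If training shows robustness problems, try setting the Face ID Loss weight to zero.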
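To see what setting the weight to zero does, consider the generator objective as a weighted sum of loss terms. The sketch below is a simplified illustration: the term names and the numbers in the usage note are placeholders, not the project's defaults.

```python
def generator_loss(terms, weights):
    """Weighted sum of generator loss terms; every term must have a weight."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"missing weights for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())
```

With placeholder terms such as `{"adv": 1.0, "cycle": 0.5, "faceid": 0.3}` and a weight of `0.0` for `"faceid"`, the identity term drops out of the total, so the face recognition model no longer influences the generator's gradients.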

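Q: Can the portrait segmentation model be used to segment half-body portraits?

A: No. The model was trained specifically for this project; the face region must be cropped out before it is fed in.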

Tips

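The model we open-sourced was trained on young Asian women and covers other groups poorly. You can collect data for the relevant group and train a model for your own use case. Our open platform provides cartoonization services that cover all kinds of people, and you are welcome to try it. For custom cartoon styles, please contact our business team: 18852075216.

References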

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation [Paper][Code]

InsightFace_Pytorch

Comments
  • About test

    Calling test.py on Windows 10 with python test.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png: the program reports no error, but it also produces no result.

    Warning: this detector is deprecated. Please use a different one, i.e.: S3FD.
    WARNING:tensorflow:From E:\DL_with_coding\photo2cartoon\utils\face_seg.py:13: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

    WARNING:tensorflow:From E:\DL_with_coding\photo2cartoon\utils\face_seg.py:16: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

    2020-04-22 11:46:56.162121: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
    2020-04-22 11:46:56.167602: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
    2020-04-22 11:46:57.083446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415 pciBusID: 0000:01:00.0
    2020-04-22 11:46:57.090947: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
    2020-04-22 11:46:57.096467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
    2020-04-22 11:46:58.398108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-04-22 11:46:58.403445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
    2020-04-22 11:46:58.406841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
    2020-04-22 11:46:58.410200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
    WARNING:tensorflow:From E:\DL_with_coding\photo2cartoon\utils\face_seg.py:26: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version. Instructions for updating: Use tf.gfile.GFile.
    WARNING:tensorflow:From E:\DL_with_coding\photo2cartoon\utils\face_seg.py:27: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

    opened by xiaosuzhang 16
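  • How do I select the best model and the training data?

    1. In README.md you say "We first train the model for 200k steps, then select the best model using the FID metric; the model finally selected is the one at 90k steps." Why do this rather than train directly to the 1 million steps that the code defaults to? And how do you select a model with the FID metric?

    2. For the effect shown in the mini program, roughly how many real photos and how many cartoon portraits did you use in training?

    3. Douyin's baby effect (for example, turning celebrities into babies) was very popular a while ago, and I would really like to try it. Can your code achieve a similar effect? Do you have any suggestions about data or code for trying this?

    A reply would be greatly appreciated. Thank you very much!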

    opened by yuanxing-syy 7
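  • Could you share the loss curves or data?

    Hello. After about 8k iterations of training, my d_loss is hovering between 1.0 and 2.0 and my g_loss between 40 and 80. To stabilize training I set the weight of the face_id loss to 0 and kept the default values for the other weights. What do the final d_loss and g_loss look like in your experiments? If you plotted loss curves, could you share them? Thanks.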

    opened by fantasy-fish 6
  • Is there a bug in the code?

    File: https://github.com/minivision-ai/photo2cartoon/blob/master/models/networks.py, lines 224-237. Shouldn't it be the following (proposed changes are marked with a trailing comment):

    skip1 = self.ConvBlock1_1(x)
    down1 = F.avg_pool2d(x, 2)
    down1 = self.ConvBlock1_2(down1)
    
    skip2 = self.ConvBlock2_1(down1)  # proposed change
    down2 = F.avg_pool2d(down1, 2)
    down2 = self.ConvBlock2_2(down2)  # proposed change
    
    skip3 = self.ConvBlock3_1(down2)  # proposed change
    down3 = F.avg_pool2d(down2, 2)
    down3 = self.ConvBlock3_2(down3)  # proposed change
    
    skip4 = self.ConvBlock4_1(down3)  # proposed change
    down4 = F.avg_pool2d(down3, 2)
    down4 = self.ConvBlock4_2(down4)  # proposed change
    
    opened by amazingyyc 4
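  • About installing and testing

    Hi, I am installing and testing on Windows, but it stays stuck at: Downloading the Face Alignment Network(FAN). Please wait...

    Then the download is interrupted. Is there another way around this, for example downloading the FAN myself first?

    Also, is the input to your mini program model 512, and does its trainB training set likewise contain only 200-odd images?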

    opened by zhangyunming 3
  • Multi-GPU training, IndexError: Caught IndexError in replica 0 on device 5.

    Traceback (most recent call last):
      File "train.py", line 84, in <module>
        main()
      File "train.py", line 75, in main
        gan.train()
      File "/home/photo2cartoon/models/UGATIT_sadalin_hourglass.py", line 191, in train
        fake_A2B, _, _ = self.genA2B(real_A)
      File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
        raise self.exc_type(msg)
    IndexError: Caught IndexError in replica 0 on device 5.
    Original Traceback (most recent call last):
      File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/codes/photo2cartoon/models/networks.py", line 100, in forward
        gap_weight = list(self.gap_fc.parameters())[0]
    IndexError: list index out of range

    How can I solve it?

    opened by redredbluee 2
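  • About the dataset

    Hello. In the README's Q&A you say that the effect in the mini program was trained with 200 custom cartoon images. Isn't that too little data, or were those 200 custom images only used for final fine-tuning? Also, when you commissioned the cartoon images, you presumably had professionals draw them with reference to real face photos. Wouldn't that amount to having paired training data? Do you use the paired data for supervised learning, and would that improve the final result?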

    opened by pkjq11 2
  • face alignment reports an error after running test.py

    I am new to Python and ran into a problem when using test.py. It keeps hanging on the line self.fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False). My guess is that the PyTorch model was not loaded properly.
    opened by shenghanqin 2
  • OSError: unrecognized data stream contents when reading image file

    While training the model on custom data I got this error:

    Traceback (most recent call last):
      File "/notebooks/photo2cartoon/train.py", line 84, in <module>
        main()
      File "/notebooks/photo2cartoon/train.py", line 75, in main
        gan.train()
      File "/notebooks/photo2cartoon/models/UGATIT_sadalin_hourglass.py", line 178, in train
        real_A, _ = trainA_iter.next()
      File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 652, in __next__
        data = self._next_data()
      File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 692, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/notebooks/photo2cartoon/dataset.py", line 66, in __getitem__
        sample = self.loader(path)
      File "/notebooks/photo2cartoon/dataset.py", line 99, in default_loader
        return pil_loader(path)
      File "/notebooks/photo2cartoon/dataset.py", line 95, in pil_loader
        return img.convert('RGB')
      File "/usr/local/lib/python3.9/dist-packages/PIL/Image.py", line 934, in convert
        self.load()
      File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFile.py", line 276, in load
        raise_oserror(err_code)
      File "/usr/local/lib/python3.9/dist-packages/PIL/ImageFile.py", line 71, in raise_oserror
        raise OSError(message + " when reading image file")
    OSError: unrecognized data stream contents when reading image file
    
    opened by bhadreshpsavani 0
  • add MNN/TNN/ONNXRuntime C++ demo

    add MNN/TNN/ONNXRuntime C++ demo:

    usage

    #include "lite/lite.h"
    
    static void test_default()
    {
      std::string head_seg_onnx_path = "../../../hub/onnx/cv/minivision_head_seg.onnx";
      std::string cartoon_onnx_path = "../../../hub/onnx/cv/minivision_female_photo2cartoon.onnx";
      std::string test_img_path = "../../../examples/lite/resources/test_lite_female_photo2cartoon.jpg";
      std::string save_mask_path = "../../../logs/test_lite_female_photo2cartoon_seg.jpg";
      std::string save_cartoon_path = "../../../logs/test_lite_female_photo2cartoon_cartoon.jpg";
    
      auto *head_seg = new lite::cv::segmentation::HeadSeg(head_seg_onnx_path, 4); // 4 threads
      auto *female_photo2cartoon = new lite::cv::style::FemalePhoto2Cartoon(cartoon_onnx_path, 4); // 4 threads
    
      lite::types::HeadSegContent head_seg_content;
      cv::Mat img_bgr = cv::imread(test_img_path);
      head_seg->detect(img_bgr, head_seg_content);
    
      if (head_seg_content.flag && !head_seg_content.mask.empty())
      {
        cv::imwrite(save_mask_path, head_seg_content.mask * 255.f);
        // Female Photo2Cartoon Style Transfer
        lite::types::FemalePhoto2CartoonContent female_cartoon_content;
        female_photo2cartoon->detect(img_bgr, head_seg_content.mask, female_cartoon_content);
        
        if (female_cartoon_content.flag && !female_cartoon_content.cartoon.empty())
          cv::imwrite(save_cartoon_path, female_cartoon_content.cartoon);
      }
    
      delete head_seg;
      delete female_photo2cartoon;
    }
    

    the output is:

    opened by DefTruth 0
  • ONNX conversion script?

    Thanks for your great work. I see that you have provided an ONNX prediction script. Could you provide the script you used to convert the pth model to ONNX?

    opened by APEX101 0
  • The model removes green colour

    image

    As you can see in the image above, the green colour of the T-shirt is gone. This does not happen if the shirt is blue or red. Why could this be happening?

    opened by Dhruv88 0
  • hyperparameter tuning

    First of all, you have done a commendable job with this model. I have been experimenting with it and was wondering whether your team could elaborate on the hyperparameters used in the architecture and on best practices for fine-tuning them on a custom data set. Namely:

    1. adv_weight
    2. cycle_weight
    3. identity_weight
    4. cam_weight
    5. faceId_weight

    Also, if you have performed any shareable study of their effects on the final cartoon output, that would be great. Thanks.

    opened by pradyumnjain 0
Owner
Minivision_AI