[CVPR 2022] Unsupervised Image-to-Image Translation with Generative Prior

Overview

GP-UNIT - Official PyTorch Implementation

This repository provides the official PyTorch implementation for the following paper:

Unsupervised Image-to-Image Translation with Generative Prior
Shuai Yang, Liming Jiang, Ziwei Liu and Chen Change Loy
In CVPR 2022.
Project Page | Paper | Supplementary Video

Abstract: Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data. Despite the recent progress in image translation models, it remains challenging to build mappings between complex domains with drastic visual discrepancies. In this work, we present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm. Our key insight is to leverage the generative prior from pre-trained class-conditional GANs (e.g., BigGAN) to learn rich content correspondences across various domains. We propose a novel coarse-to-fine scheme: we first distill the generative prior to capture a robust coarse-level content representation that can link objects at an abstract semantic level, based on which fine-level content features are adaptively learned for more accurate multi-level content correspondences. Extensive experiments demonstrate the superiority of our versatile framework over state-of-the-art methods in robust, high-quality and diversified translations, even for challenging and distant domains.

Updates

  • [03/2022] Paper and supplementary video are released.
  • [04/2022] Code and dataset are released.
  • [03/2022] This website is created.

Installation

Clone this repo:

git clone https://github.com/williamyang1991/GP-UNIT.git
cd GP-UNIT

Dependencies:

We have tested on:

  • CUDA 10.1
  • PyTorch 1.7.0
  • Pillow 8.0.1; Matplotlib 3.3.3; opencv-python 4.4.0; Faiss 1.7.0; tqdm 4.54.0

All dependencies for defining the environment are provided in environment/gpunit_env.yaml. We recommend running this repository using Anaconda:

conda env create -f ./environment/gpunit_env.yaml

The environment file targets CUDA 10.1 and therefore installs PyTorch 1.7.0 (see Line 16, Line 113, Line 120 and Line 121 of gpunit_env.yaml). If your CUDA version differs, please install the matching PyTorch build following https://pytorch.org/.
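To see how an existing environment compares with the tested versions listed above, a small check along these lines may help. It is only a sketch: the pip distribution names are assumptions, and Faiss is left out because it is published under several distribution names.

```python
from importlib import metadata

# Versions listed under "We have tested on" in this README.
# The pip distribution names used as keys are assumptions.
TESTED_VERSIONS = {
    "torch": "1.7.0",
    "Pillow": "8.0.1",
    "matplotlib": "3.3.3",
    "opencv-python": "4.4.0",
    "tqdm": "4.54.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_tested(tested=TESTED_VERSIONS):
    """Map each package to (tested_version, installed_version_or_None)."""
    return {name: (want, installed_version(name)) for name, want in tested.items()}


if __name__ == "__main__":
    for name, (want, have) in compare_with_tested().items():
        status = "missing" if have is None else have
        print(f"{name}: tested {want}, installed {status}")
```

A differing version is not necessarily a problem; the list only records what the authors tested.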


(1) Dataset Preparation

The human face, animal face and artistic human face datasets can be downloaded from their official pages. The bird, dog and car datasets can be built from ImageNet with our provided script.

Task | Used Dataset
Male←→Female | CelebA-HQ: divided into male and female subsets by StarGANv2
Dog←→Cat←→Wild | AFHQ provided by StarGANv2
Face←→Cat or Dog | CelebA-HQ and AFHQ
Bird←→Dog | 4 classes of birds and 4 classes of dogs in ImageNet291. Please refer to dataset preparation for building ImageNet291 from ImageNet
Bird←→Car | 4 classes of birds and 4 classes of cars in ImageNet291. Please refer to dataset preparation for building ImageNet291 from ImageNet
Face→MetFace | CelebA-HQ and MetFaces

(2) Inference for Latent-Guided and Exemplar-Guided Translation

Inference Notebook


To help users get started, we provide a Jupyter notebook at ./notebooks/inference_playground.ipynb that allows one to visualize the performance of GP-UNIT. The notebook will download the necessary pretrained models and run inference on the images in ./data/.

Web Demo

Try the Replicate web demo here: https://replicate.com/williamyang1991/gp-unit

Pretrained Models

Pretrained models can be downloaded from Google Drive or Baidu Cloud (access code: cvpr):

Task | Pretrained Models
Prior Distillation | content encoder
Male←→Female | generators for male2female and female2male
Dog←→Cat←→Wild | generators for dog2cat, cat2dog, dog2wild, wild2dog, cat2wild and wild2cat
Face←→Cat or Dog | generators for face2cat, cat2face, dog2face and face2dog
Bird←→Dog | generators for bird2dog and dog2bird
Bird←→Car | generators for bird2car and car2bird
Face→MetFace | generator for face2metface

The saved checkpoints are under the following folder structure:

checkpoint
|--content_encoder.pt     % Content encoder
|--bird2car.pt            % Bird-to-Car translation model
|--bird2dog.pt            % Bird-to-Dog translation model
...
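Before running inference, it can be useful to confirm that the downloaded files are where the scripts expect them. The following is a minimal sketch; the helper name `missing_checkpoints` is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_checkpoints(checkpoint_dir, names):
    """Return the file names from `names` that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    # File names taken from the folder structure above.
    needed = ["content_encoder.pt", "bird2car.pt", "bird2dog.pt"]
    for name in missing_checkpoints("./checkpoint", needed):
        print(f"missing: ./checkpoint/{name}")
```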

Latent-Guided Translation

Translate a content image to the target domain with randomly sampled latent styles:

python inference.py --generator_path PRETRAINED_GENERATOR_PATH --content_encoder_path PRETRAINED_ENCODER_PATH \
                    --content CONTENT_IMAGE_PATH --batch STYLE_NUMBER --device DEVICE

By default, the script uses ./checkpoint/dog2cat.pt as PRETRAINED_GENERATOR_PATH, ./checkpoint/content_encoder.pt as PRETRAINED_ENCODER_PATH, and cuda as DEVICE to run on the GPU. To run on the CPU, use --device cpu.
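When inference is driven from another Python script, the command line can be assembled programmatically. The helper below is a hypothetical sketch (it is not part of the repository) that uses only the flags documented in this README, including --style from the Exemplar-Guided Translation section, and omits any option left unset so that the script's own defaults apply.

```python
import sys


def build_inference_cmd(content, style=None, generator_path=None,
                        content_encoder_path=None, batch=None, device=None):
    """Assemble the argv list for inference.py from the documented options.

    Options left as None are omitted so that the script's defaults apply.
    """
    cmd = [sys.executable, "inference.py", "--content", str(content)]
    optional = [
        ("--style", style),
        ("--generator_path", generator_path),
        ("--content_encoder_path", content_encoder_path),
        ("--batch", batch),
        ("--device", device),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.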

Take Dog→Cat as an example, run:

python inference.py --content ./data/afhq/images512x512/test/dog/flickr_dog_000572.jpg --batch 6

Six results, translation_flickr_dog_000572_N.jpg (N=0~5), are saved in the folder ./output/. A corresponding overview image, translation_flickr_dog_000572_overview.jpg, is additionally saved to illustrate the input content image and the six results.

Evaluation Metrics: We use the code of StarGANv2 to calculate FID and Diversity with LPIPS in our paper.
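For scripts that post-process the latent-guided results, the output names described above can be derived from the content image path. This is a sketch that assumes the naming pattern in the Dog→Cat example holds for other inputs; the function name is hypothetical.

```python
from pathlib import Path


def latent_guided_outputs(content_path, batch, output_dir="./output"):
    """List the result files and the overview file for one content image."""
    stem = Path(content_path).stem
    out = Path(output_dir)
    results = [out / f"translation_{stem}_{n}.jpg" for n in range(batch)]
    overview = out / f"translation_{stem}_overview.jpg"
    return results, overview
```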

Exemplar-Guided Translation

Translate a content image to the target domain in the style of a style image by additionally specifying --style:

python inference.py --generator_path PRETRAINED_GENERATOR_PATH --content_encoder_path PRETRAINED_ENCODER_PATH \
                    --content CONTENT_IMAGE_PATH --style STYLE_IMAGE_PATH --device DEVICE

Take Dog→Cat as an example, run:

python inference.py --content ./data/afhq/images512x512/test/dog/flickr_dog_000572.jpg --style ./data/afhq/images512x512/test/cat/flickr_cat_000418.jpg

The result, translation_flickr_dog_000572_to_flickr_cat_000418.jpg, is saved in the folder ./output/. A corresponding overview image, translation_flickr_dog_000572_to_flickr_cat_000418_overview.jpg, is additionally saved to illustrate the input content image, the style image, and the result.

Another example of Cat→Wild, run:

python inference.py --generator_path ./checkpoint/cat2wild.pt --content ./data/afhq/images512x512/test/cat/flickr_cat_000418.jpg --style ./data/afhq/images512x512/test/wild/flickr_wild_001112.jpg

As before, an overview image is saved alongside the result in ./output/.


(3) Training GP-UNIT

Download the supporting models to the ./checkpoint/ folder:

Model | Description
content_encoder.pt | Our pretrained content encoder which distills BigGAN prior from the synImageNet291 dataset.
model_ir_se50.pth | Pretrained IR-SE50 model taken from TreB1eN for ID loss.

Train Image-to-Image Translation Network

python train.py --task TASK --batch BATCH_SIZE --iter ITERATIONS \
                --source_paths SPATH1 SPATH2 ... SPATHS --source_num SNUM1 SNUM2 ... SNUMS \
                --target_paths TPATH1 TPATH2 ... TPATHT --target_num TNUM1 TNUM2 ... TNUMT

where SPATH1~SPATHS are paths to the S folders containing images from the source domain (e.g., S classes of ImageNet birds), and SNUMi is the number of images in SPATHi used for training. TPATHi and TNUMi are defined similarly for the target domain. By default, BATCH_SIZE=16 and ITERATIONS=75000. If --source_num/--target_num is not specified, all images in the folders are used.

The trained model is saved as ./checkpoint/TASK-ITERATIONS.pt. Intermediate results are saved in ./log/TASK/.
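For multi-class tasks the training command becomes long, so it may be easier to assemble it in Python. The helper below is a hypothetical sketch, not part of the repository. It uses only the flags documented above and assumes that, when counts are given, there is exactly one count per folder.

```python
import sys


def build_train_cmd(task, source_paths, target_paths,
                    source_num=None, target_num=None,
                    batch=None, iterations=None, extra_flags=()):
    """Assemble the argv list for train.py from the documented options."""
    # Assumption: one image count per folder whenever counts are supplied.
    if source_num is not None and len(source_num) != len(source_paths):
        raise ValueError("source_num needs one count per source path")
    if target_num is not None and len(target_num) != len(target_paths):
        raise ValueError("target_num needs one count per target path")

    cmd = [sys.executable, "train.py", "--task", task]
    if batch is not None:
        cmd += ["--batch", str(batch)]
    if iterations is not None:
        cmd += ["--iter", str(iterations)]
    cmd += ["--source_paths", *map(str, source_paths)]
    if source_num is not None:
        cmd += ["--source_num", *map(str, source_num)]
    cmd += ["--target_paths", *map(str, target_paths)]
    if target_num is not None:
        cmd += ["--target_num", *map(str, target_num)]
    cmd += list(extra_flags)
    return cmd


if __name__ == "__main__":
    # Folder names follow the ImageNet291 layout used elsewhere in this README.
    birds = [f"./data/ImageNet291/train/{i}_bird/" for i in (10, 11, 12, 13)]
    cars = [f"./data/ImageNet291/train/{i}_vehicle/" for i in (436, 511, 627, 656)]
    print(" ".join(build_train_cmd("bird2car", birds, cars,
                                   source_num=[600] * 4, target_num=[600] * 4)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.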

These default settings do not necessarily give the best results. Training can be further customized with additional command-line options:

  • --style_layer (default: 4): the discriminator layer used to compute the feature matching loss. We found that setting style_layer=5 gives better performance on human faces.
  • --use_allskip (default: False): whether to use dynamic skip connections when computing the reconstruction loss. For tasks involving close domains, such as gender translation, season transfer and face stylization, enabling use_allskip gives better results.
  • --use_idloss (default: False): whether to use the identity loss. We use this loss for the Cat/Dog→Face and Face→MetFace tasks.
  • --not_flip_style (default: False): disables random flipping of the style image when extracting the style feature. Random flipping prevents the network from learning position information from the style image.
  • --mitigate_style_bias (default: False): whether to resample style features when training the sampling network. For imbalanced datasets with minority groups, mitigate_style_bias oversamples the style features that are far from the mean style feature of the whole dataset. This leads to more diversified latent-guided translation at the cost of slight image quality degradation. We use it on CelebA-HQ and AFHQ-related tasks.
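To illustrate the idea behind mitigate_style_bias, the sketch below oversamples style features in proportion to their Euclidean distance from the dataset mean. This is not the repository's implementation: the proportional weighting, the distance measure and the function names are all assumptions made for illustration.

```python
import math
import random


def oversampling_weights(style_features):
    """Sampling probabilities proportional to distance from the mean feature."""
    n = len(style_features)
    dim = len(style_features[0])
    mean = [sum(f[d] for f in style_features) / n for d in range(dim)]
    dists = [math.dist(f, mean) for f in style_features]
    total = sum(dists)
    if total == 0:  # all features identical: fall back to uniform sampling
        return [1.0 / n] * n
    return [d / total for d in dists]


def resample(style_features, k, seed=None):
    """Draw k features with replacement, favouring those far from the mean."""
    rng = random.Random(seed)
    weights = oversampling_weights(style_features)
    return rng.choices(style_features, weights=weights, k=k)
```

Under this weighting, a rare style that lies far from the mean is drawn more often than the common styles clustered near it, which matches the diversity-for-quality trade-off described above.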

Here are some examples:
(Some of our tasks require the ImageNet291 dataset. Please refer to data preparation.)

Male→Female

python train.py --task male2female --source_paths ./data/celeba_hq/train/male --target_paths ./data/celeba_hq/train/female --style_layer 5 --mitigate_style_bias --use_allskip --not_flip_style

Cat→Dog

python train.py --task cat2dog --source_paths ./data/afhq/images512x512/train/cat --source_num 4000 --target_paths ./data/afhq/images512x512/train/dog --target_num 4000 --mitigate_style_bias

Cat→Face

python train.py --task cat2face --source_paths ./data/afhq/images512x512/train/cat --source_num 4000 --target_paths ./data/ImageNet291/train/1001_face/ --style_layer 5 --mitigate_style_bias --not_flip_style --use_idloss

Bird→Car (translating 4 classes of birds to 4 classes of cars)

python train.py --task bird2car --source_paths ./data/ImageNet291/train/10_bird/ ./data/ImageNet291/train/11_bird/ ./data/ImageNet291/train/12_bird/ ./data/ImageNet291/train/13_bird/ --source_num 600 600 600 600 --target_paths ./data/ImageNet291/train/436_vehicle/ ./data/ImageNet291/train/511_vehicle/ ./data/ImageNet291/train/627_vehicle/ ./data/ImageNet291/train/656_vehicle/ --target_num 600 600 600 600

Train Content Encoder of Prior Distillation

We provide our pretrained model content_encoder.pt at Google Drive or Baidu Cloud (access code: cvpr). This model is obtained by:

python prior_distillation.py --unpaired_data_root ./data/ImageNet291/train/ --paired_data_root ./data/synImageNet291/train/ --unpaired_mask_root ./data/ImageNet291_mask/train/ --paired_mask_root ./data/synImageNet291_mask/train/

The training requires ImageNet291 and synImageNet291 datasets. Please refer to data preparation.


Results

Male-to-Female: close domains

[Figure: male2female results]

Cat-to-Dog: related domains

[Figure: cat2dog results]

Dog-to-Human and Bird-to-Dog: distant domains

[Figure: dog2human results]

[Figure: bird2dog results]

Bird-to-Car: extremely distant domains for stress testing

[Figure: bird2car results]

Citation

If you find this work useful for your research, please consider citing our paper:

@inproceedings{yang2022Unsupervised,
  title={Unsupervised Image-to-Image Translation with Generative Prior},
  author={Yang, Shuai and Jiang, Liming and Liu, Ziwei and Loy, Chen Change},
  booktitle={CVPR},
  year={2022}
}

Acknowledgments

The code is developed based on StarGAN v2, SPADE and Imaginaire.
