PPE ✨
Repository for our CVPR 2022 paper:
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model. Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc Van Gool, Errui Ding. To appear in CVPR 2022.
The PyTorch implementation is available at zipengxuc/PPE-Pytorch.
Updates
24 Mar 2022: Updated our arXiv paper.
30 Mar 2022: We have made some changes to the code release. The PyTorch implementation is now at zipengxuc/PPE-Pytorch.
14 Apr 2022: Updated the PaddlePaddle inference code in this repository.
To reproduce our results:
Setup:
- Install CLIP:

```
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
pip install ftfy regex tqdm gdown
pip install git+https://github.com/openai/CLIP.git
```
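To check the install, a minimal sketch that loads CLIP and encodes one prompt (the model name "ViT-B/32" and the prompt text are illustrative choices, not requirements of this repository):

```python
# Quick sanity check that CLIP installed correctly.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

text = clip.tokenize(["a photo of a face"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(text)
print(text_features.shape)  # expected: torch.Size([1, 512]) for ViT-B/32
```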
- Download pre-trained models:
The code relies on PaddleGAN (the PaddlePaddle implementation of StyleGAN2). Download the pre-trained StyleGAN2 generator from here.
We provide several pretrained PPE models here.
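To confirm a downloaded checkpoint loads, a minimal sketch using paddle.load (the file name `ckpt/stylegan2_ffhq.pdparams` is an assumption; use whatever name your downloaded checkpoint has):

```python
# Inspect a downloaded .pdparams checkpoint with PaddlePaddle.
import paddle

state_dict = paddle.load("ckpt/stylegan2_ffhq.pdparams")  # assumed file name

# A StyleGAN2 checkpoint is a dict of named parameter tensors;
# print a few entries to confirm the file loaded correctly.
for i, (name, tensor) in enumerate(state_dict.items()):
    print(name, tuple(tensor.shape))
    if i >= 4:
        break
```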
- Invert real images:
The mapper is trained on latent vectors, so real images must first be inverted into the latent space. To edit human faces, StyleCLIP provides the CelebA-HQ test set inverted by e4e: test set.
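A minimal sketch to sanity-check the inverted latents, assuming (as in StyleCLIP's release) they are stored as a single PyTorch tensor of W+ codes; the file name `ckpt/test_faces.pt` is an assumption:

```python
# Check that the inverted latent codes have the expected W+ shape.
import torch

latents = torch.load("ckpt/test_faces.pt", map_location="cpu")  # assumed file name
print(latents.shape)  # expected: [N, 18, 512] -- one 18-layer W+ code per image
```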
Usage:
Please first put the downloaded pretrained models and data in the ckpt folder.
Inference
In the PaddlePaddle version, we only provide inference code to generate editing results:

```
python mapper/evaluate.py
```
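Conceptually, the editing step follows StyleCLIP's latent-mapper formulation: a trained mapper M predicts a residual in W+ space and the edited code is w_edit = w + alpha * M(w), which is then fed to the StyleGAN2 generator. A minimal sketch of that step in PaddlePaddle; `DummyMapper` and the edit strength `alpha = 0.1` are illustrative assumptions, not the repository's exact API:

```python
# Illustrative sketch of the residual latent edit performed at inference.
import paddle
import paddle.nn as nn

class DummyMapper(nn.Layer):
    """Stand-in for the trained PPE latent mapper (illustrative only)."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.LeakyReLU(), nn.Linear(dim, dim)
        )

    def forward(self, w):
        return self.net(w)

mapper = DummyMapper()
w = paddle.randn([1, 18, 512])   # an inverted W+ latent code
alpha = 0.1                      # edit strength (assumed value)
w_edit = w + alpha * mapper(w)   # residual edit in W+ space
print(w_edit.shape)              # [1, 18, 512], ready for the generator
```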
Reference
```
@article{xu2022ppe,
  author  = {Zipeng Xu and Tianwei Lin and Hao Tang and Fu Li and Dongliang He and Nicu Sebe and Radu Timofte and Luc Van Gool and Errui Ding},
  title   = {Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model},
  journal = {arXiv preprint arXiv:2111.13333},
  year    = {2021}
}
```
If you have any questions, please contact [email protected]. :)