[CVPR 2022] TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

Overview

(Teaser figure)

This repository provides the official PyTorch implementation for the following paper:

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
Yanbo Xu*, Yueqin Yin*, Liming Jiang, Qianyi Wu, Chengyao Zheng, Chen Change Loy, Bo Dai, Wayne Wu
In CVPR 2022. (* denotes equal contribution)
Project Page | Paper

Abstract: Recent advances like StyleGAN have promoted the growth of controllable facial editing. To address its core challenge of attribute decoupling in a single latent space, attempts have been made to adopt dual-space GAN for better disentanglement of style and content representations. Nonetheless, these methods are still incompetent to obtain plausible editing results with high controllability, especially for complicated attributes. In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing. We propose TransEditor, a novel Transformer-based framework to enhance such interaction. Besides, we develop a new dual-space editing and inversion strategy to provide additional editing flexibility. Extensive experiments demonstrate the superiority of the proposed framework in image quality and editing capability, suggesting the effectiveness of TransEditor for highly controllable facial editing.

Requirements

A suitable Anaconda environment named transeditor can be created and activated with:

conda env create -f environment.yaml
conda activate transeditor
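As an optional sanity check that PyTorch and CUDA are visible inside the new environment:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"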

Dataset Preparation

Datasets: CelebA-HQ, Flickr-Faces-HQ (FFHQ)
  • You can use download.sh from StyleMapGAN to download the raw CelebA-HQ images and convert them into the LMDB format expected by the training code; the same procedure applies to the FFHQ dataset. A minimal sketch of what the LMDB packing amounts to is shown below.
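For reference, here is a minimal, hypothetical sketch of packing resized images into an LMDB file. The key layout (size-index keys plus a length entry) is an assumption borrowed from common StyleGAN2 data pipelines and may not match what train_spatial_query.py expects, so the StyleMapGAN script remains the recommended route:

# Hypothetical sketch: pack a folder of images into LMDB.
# The key layout is assumed, not taken from this repository.
import os
from io import BytesIO

import lmdb
from PIL import Image

def pack_to_lmdb(img_dir, lmdb_path, size=256):
    env = lmdb.open(lmdb_path, map_size=1 << 40)
    files = sorted(f for f in os.listdir(img_dir) if f.endswith((".png", ".jpg")))
    with env.begin(write=True) as txn:
        for i, name in enumerate(files):
            img = Image.open(os.path.join(img_dir, name)).convert("RGB")
            img = img.resize((size, size), Image.LANCZOS)
            buf = BytesIO()
            img.save(buf, format="PNG")
            txn.put(f"{size}-{i:05d}".encode(), buf.getvalue())  # one key per image
        txn.put(b"length", str(len(files)).encode())  # dataset size entry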

Download Pretrained Models

  • The pretrained models can be downloaded from TransEditor Pretrained Models.
  • The age classifier and gender classifier for the FFHQ dataset can be found at pytorch-DEX.
  • Place the out/ and psp_out/ folders under the TransEditor/ root folder, and place the pth/ folder under TransEditor/our_interfaceGAN/ffhq_utils/dex/. A sketch of the expected layout follows.
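Assuming the checkpoint names used in the commands below, the layout should look like this:

TransEditor/
├── out/
│   ├── transeditor_ffhq/checkpoint/790000.pt
│   └── transeditor_celeba/checkpoint/370000.pt
├── psp_out/
│   ├── transeditor_inversion_ffhq/checkpoints/best_model.pt
│   └── transeditor_inversion_celeba/checkpoints/best_model.pt
└── our_interfaceGAN/
    └── ffhq_utils/
        └── dex/
            └── pth/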

Training New Networks

To train the TransEditor network, run

python train_spatial_query.py $DATA_DIR --exp_name $EXP_NAME --batch 16 --n_sample 64 --num_region 1 --num_trans 8

For multi-GPU distributed training, run

python -m torch.distributed.launch --nproc_per_node=$GPU_NUM --master_port $PORT_NUM train_spatial_query.py $DATA_DIR --exp_name $EXP_NAME --batch 16 --n_sample 64 --num_region 1 --num_trans 8
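For example, an illustrative single-node run with 8 GPUs (the port number and data path are placeholders):

python -m torch.distributed.launch --nproc_per_node=8 --master_port 29500 train_spatial_query.py /path/to/ffhq_lmdb --exp_name transeditor_ffhq --batch 16 --n_sample 64 --num_region 1 --num_trans 8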

To train the encoder-based inversion network, run

# FFHQ
python psp_spatial_train.py $FFHQ_DATA_DIR --test_path $FFHQ_TEST_DIR --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --start_from_latent_avg --exp_dir $INVERSION_EXP_NAME --from_plus_space

# CelebA-HQ
python psp_spatial_train.py $CELEBA_DATA_DIR --test_path $CELEBA_TEST_DIR --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --num_region 1 --num_trans 8 --start_from_latent_avg --exp_dir $INVERSION_EXP_NAME --from_plus_space 

Testing (Image Generation/Interpolation)

# sampled image generation
python test_spatial_query.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --sample

# interpolation
python test_spatial_query.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --dat_interp

Inversion

We provide two inversion methods: encoder-based and optimization-based.

Encoder-based inversion

# FFHQ
python dual_space_encoder_test.py --checkpoint_path ./psp_out/transeditor_inversion_ffhq/checkpoints/best_model.pt --output_dir ./projection --num_region 1 --num_trans 8 --start_from_latent_avg --from_plus_space --dataset_type ffhq_encode --dataset_dir /dataset/ffhq/test/images

# CelebA-HQ
python dual_space_encoder_test.py --checkpoint_path ./psp_out/transeditor_inversion_celeba/checkpoints/best_model.pt --output_dir ./projection --num_region 1 --num_trans 8 --start_from_latent_avg --from_plus_space --dataset_type celebahq_encode --dataset_dir /dataset/celeba_hq/test/images
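The encoder writes the dual-space codes as .npy files under the output directory; these are the files the real-image editing commands consume. A quick, illustrative way to inspect them (the exact shapes depend on --num_region and --num_trans):

import numpy as np

# paths follow the FFHQ editing commands below
z = np.load("./projection/encoder_inversion/ffhq_encode/encoded_z.npy")
p = np.load("./projection/encoder_inversion/ffhq_encode/encoded_p.npy")
print(z.shape, p.shape)  # codes for the two latent spaces, Z and P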

Optimization-based inversion

# FFHQ
python projector_optimization.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --dataset_dir /dataset/ffhq/test/images --step 10000

# CelebA-HQ
python projector_optimization.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --num_region 1 --num_trans 8 --dataset_dir /dataset/celeba_hq/test/images --step 10000
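Conceptually, optimization-based inversion iteratively adjusts a pair of dual-space codes until the generator reproduces the target image over the given number of --step iterations. The sketch below is a minimal, hypothetical version of that loop; the latent shapes, the generator call signature, and the plain pixel loss are assumptions, not the actual projector_optimization.py procedure:

import torch
import torch.nn.functional as F

def invert(generator, target, steps=10000, lr=0.01):
    # z and p are the dual-space codes; the shapes here are placeholders
    z = torch.randn(1, 16, 512, requires_grad=True)
    p = torch.randn(1, 16, 512, requires_grad=True)
    opt = torch.optim.Adam([z, p], lr=lr)
    for _ in range(steps):
        img = generator(z, p)  # hypothetical call signature
        loss = F.mse_loss(img, target)  # pixel loss only, for brevity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach(), p.detach()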

Image Editing

  • The attribute classifiers for the CelebA-HQ dataset can be found in celebahq-classifiers.
  • Rename the folder to pth_celeba and put it under the our_interfaceGAN/celeba_utils/ folder.
CelebA_Attributes   attribute_index
Male                0
Smiling             1
Wavy hair           3
Bald                8
Bangs               9
Black hair          12
Blond hair          13

For sampled image editing, run

# FFHQ
python our_interfaceGAN/edit_all_noinversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name pose --num_sample 150000 # pose
python our_interfaceGAN/edit_all_noinversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name gender --num_sample 150000 # gender

# CelebA-HQ
python our_interfaceGAN/edit_all_noinversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_index 0 --num_sample 150000 # Male
python our_interfaceGAN/edit_all_noinversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_index 3 --num_sample 150000 # wavy hair
python our_interfaceGAN/edit_all_noinversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_name pose --num_sample 150000 # pose

For real image editing, run

# FFHQ
python our_interfaceGAN/edit_all_inversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name pose --z_latent ./projection/encoder_inversion/ffhq_encode/encoded_z.npy --p_latent ./projection/encoder_inversion/ffhq_encode/encoded_p.npy # pose

python our_interfaceGAN/edit_all_inversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name gender --z_latent ./projection/encoder_inversion/ffhq_encode/encoded_z.npy --p_latent ./projection/encoder_inversion/ffhq_encode/encoded_p.npy # gender

# CelebA-HQ
python our_interfaceGAN/edit_all_inversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_index 0 --z_latent ./projection/encoder_inversion/celebahq_encode/encoded_z.npy --p_latent ./projection/encoder_inversion/celebahq_encode/encoded_p.npy # Male
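The editing scripts follow the InterFaceGAN idea: fit a linear boundary that separates an attribute in latent space, then move a code along the boundary's unit normal. A minimal sketch of the shifting step (function and variable names are illustrative, not this repository's API):

import numpy as np

def shift_along_boundary(latent, boundary, alphas=(-3.0, -1.0, 1.0, 3.0)):
    # boundary: normal vector of a linear attribute boundary in latent space
    n = boundary / np.linalg.norm(boundary)
    return [latent + a * n for a in alphas]  # one edited code per step size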

Evaluation Metrics

# calculate FID, LPIPS, PPL
python metrics/evaluate_query.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --batch 64 --inception metrics/inception_ffhq.pkl --truncation 1 --ppl --lpips --fid

Results

Image Interpolation

(Figure: interp_p_celeba, interpolation in the P space on CelebA-HQ)

(Figure: interp_z_celeba, interpolation in the Z space on CelebA-HQ)

Image Editing

(Figure: edit_ffhq_pose, pose editing on FFHQ)

(Figure: edit_ffhq_gender, gender editing on FFHQ)

(Figure: edit_celebahq_smile, smile editing on CelebA-HQ)

(Figure: edit_blackhair_celebahq, black-hair editing on CelebA-HQ)

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{xu2022transeditor,
  title={{TransEditor}: Transformer-Based Dual-Space {GAN} for Highly Controllable Facial Editing},
  author={Xu, Yanbo and Yin, Yueqin and Jiang, Liming and Wu, Qianyi and Zheng, Chengyao and Loy, Chen Change and Dai, Bo and Wu, Wayne},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Acknowledgments

The code is developed based on TransStyleGAN. We appreciate the nice PyTorch implementation.
