Anycost GAN
video | paper | website
Anycost GANs for Interactive Image Synthesis and Editing
Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu
MIT, Adobe Research, CMU
In CVPR 2021
Anycost GAN generates consistent outputs under various computational budgets.
Demo
Here, we use the Anycost generator for interactive image editing. The full generator takes ~3s to render an image, which is too slow for interactive editing. With the Anycost generator, we can instead provide a visually similar preview at 5x the speed. Once the adjustments look right, we hit the "Finalize" button to synthesize the high-quality final output. Check here for the full demo.
Overview
Anycost generators can be run at diverse computation costs by using different channel and resolution configurations. Sub-generators achieve high output consistency with the full generator, providing a fast preview.
With (1) Sampling-based multi-resolution training, (2) adaptive-channel training, and (3) generator-conditioned discriminator, we achieve high image quality and consistency at different resolutions and channels.
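As a conceptual sketch of these three techniques (not the repo's actual training code, which will be released later), one training step could sample a random sub-generator configuration and distill it against the full generator. The resolution/channel lists and the loss weighting below are illustrative assumptions; the helpers are the same ones used in the Usage section:

```python
# Conceptual sketch only: the real training code will be released with the repo.
import random
import torch
import torch.nn.functional as F
from model.dynamic_channel import set_uniform_channel_ratio, reset_generator

RESOLUTIONS = [128, 256, 512, 1024]      # assumed resolution choices
CHANNEL_RATIOS = [0.25, 0.5, 0.75, 1.0]  # assumed channel-ratio choices

def generator_step(g, d, z):
    # Full-cost output serves as the consistency (distillation) target.
    with torch.no_grad():
        full_img, _ = g(z)

    # (1)+(2) Sample a random sub-generator configuration for this step.
    g.target_res = random.choice(RESOLUTIONS)
    set_uniform_channel_ratio(g, random.choice(CHANNEL_RATIOS))
    sub_img, _ = g(z)
    reset_generator(g)

    # (3) Non-saturating GAN loss; a generator-conditioned discriminator would
    # also receive the sampled configuration (omitted here for brevity).
    adv_loss = F.softplus(-d(sub_img)).mean()

    # Consistency loss against the down-sampled full-generator output.
    target = F.interpolate(full_img, size=sub_img.shape[-2:],
                           mode='bilinear', align_corners=False)
    return adv_loss + F.mse_loss(sub_img, target)
```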
Results
Anycost GAN (uniform channel version) supports 4 resolutions and 4 channel ratios, producing visually consistent images at different levels of fidelity.
The consistency is preserved during image projection and editing:
Usage
Getting Started
- Clone this repo:
git clone https://github.com/mit-han-lab/anycost-gan.git
cd anycost-gan
- Install PyTorch 1.7 and other dependencies.
We recommend setting up the environment using Anaconda: conda env create -f environment.yml
Introduction Notebook
We provide a Jupyter notebook example showing how to use the Anycost generator for image synthesis at diverse costs: notebooks/intro.ipynb.
We also provide a Colab version of the notebook. Be sure to select GPU as the accelerator in the runtime options.
Interactive Demo
We provide an interactive demo showing how Anycost GAN enables interactive image editing. To run the demo:
python demo.py
You can find a video recording of the demo here.
Using Pre-trained Models
To get the pre-trained generator, encoder, and editing directions, run:
import model
pretrained_type = 'generator'  # choose from ['generator', 'encoder', 'boundary']
config_name = 'anycost-ffhq-config-f'  # replace with the config name of other models
net = model.get_pretrained(pretrained_type, config=config_name)
We also provide the face attribute classifier (which is shared across different generators) for computing the editing directions. You can get it by running:
model.get_pretrained('attribute-predictor')
The attribute classifier takes face images in FFHQ format as input.
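For illustration, an editing direction can be computed InterFaceGAN-style (a hypothetical sketch, not necessarily this repo's exact pipeline): score sampled latents with the attribute predictor, fit a linear boundary in latent space, and take the boundary's unit normal as the direction. The random arrays below are placeholders for real latents and predictor logits:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholders: in practice, `latents` are latent codes sampled from the
# generator and `scores` are the attribute predictor's logits on the
# corresponding generated images.
latents = np.random.randn(1000, 512)
scores = latents @ np.random.randn(512)

labels = (scores > 0).astype(int)          # binarize the attribute
svm = LinearSVC(C=1.0, max_iter=10000).fit(latents, labels)
direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])  # unit normal = edit direction

# To edit, move a latent along the direction (strength is user-chosen):
edited_latent = latents[0] + 2.0 * direction
```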
After loading the Anycost generator, we can run it at a wide range of computational costs. For example:
from model.dynamic_channel import set_uniform_channel_ratio, reset_generator
g = model.get_pretrained('generator', config='anycost-ffhq-config-f') # anycost uniform
set_uniform_channel_ratio(g, 0.5) # set channel
g.target_res = 512 # set resolution
out, _ = g(...) # generate image
reset_generator(g) # restore the generator
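As a quick usage sketch, you can sweep the configurations to find a preview setting that fits your latency budget; the latent shape and the exact resolution/channel-ratio lists below are assumptions (see notebooks/intro.ipynb for the actual interface):

```python
import time
import torch
import model
from model.dynamic_channel import set_uniform_channel_ratio, reset_generator

g = model.get_pretrained('generator', config='anycost-ffhq-config-f').eval()
z = torch.randn(1, 1, 512)  # assumed latent shape; see notebooks/intro.ipynb

for ratio in [0.25, 0.5, 0.75, 1.0]:   # assumed supported channel ratios
    for res in [128, 256, 512, 1024]:  # assumed supported resolutions
        set_uniform_channel_ratio(g, ratio)
        g.target_res = res
        start = time.time()
        with torch.no_grad():
            out, _ = g(z)
        reset_generator(g)
        print(f'ratio={ratio}, res={res}: {time.time() - start:.3f}s')
```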
For detailed usage and the flexible-channel anycost generator, please refer to notebooks/intro.ipynb.
Model Zoo
Currently, we provide the following pre-trained generators, encoders, and editing directions. We will add more in the future.
For Anycost generators, we refer to the uniform setting by default.
| config name | generator | encoder | edit direction |
|---|---|---|---|
| anycost-ffhq-config-f | ✓ | ✓ | ✓ |
| anycost-ffhq-config-f-flexible | ✓ | ✓ | ✓ |
| anycost-car-config-f | ✓ | | |
| stylegan2-ffhq-config-f | ✓ | ✓ | ✓ |
stylegan2-ffhq-config-f refers to the official StyleGAN2 generator, converted from the original repo.
Datasets
We prepare the FFHQ, CelebA-HQ, and LSUN Car datasets into a directory of images so that they can be easily used with ImageFolder from torchvision. The dataset layout looks like:
├── PATH_TO_DATASET
│ ├── images
│ │ ├── 00000.png
│ │ ├── 00001.png
│ │ ├── ...
Due to copyright issues, you need to download the datasets from their official sites and process them accordingly.
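With this layout, a dataset loads in a few lines. The resolution and normalization below are typical StyleGAN-style preprocessing choices, not values prescribed by this repo:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(1024),                                 # dataset resolution
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),  # map to [-1, 1]
])
# ImageFolder treats each sub-directory (here, `images`) as one class.
dataset = datasets.ImageFolder('PATH_TO_DATASET', transform=transform)
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
```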
Evaluation
We provide the code to evaluate some metrics presented in the paper. Some of the code is written with horovod to support distributed evaluation, reducing the cost of inter-GPU communication and greatly improving speed. Check its website for proper installation.
Fréchet Inception Distance (FID)
Before evaluating the FIDs, you need to compute the inception features of the real images using scripts like:
python tools/calc_inception.py \
--resolution 1024 --batch_size 64 -j 16 --n_sample 50000 \
--save_name assets/inceptions/inception_ffhq_res1024_50k.pkl \
PATH_TO_FFHQ
or you can download the pre-computed inception statistics from here and put them under assets/inceptions.
Then, you can evaluate the FIDs by running:
horovodrun -np N_GPU \
python metrics/fid.py \
--config anycost-ffhq-config-f \
--batch_size 16 --n_sample 50000 \
--inception assets/inceptions/inception_ffhq_res1024_50k.pkl
# --channel_ratio 0.5 --target_res 512  # optionally evaluate a smaller sub-generator
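For reference, the FID is the Fréchet distance between two Gaussians fitted to the inception features of real and generated images; here is a minimal NumPy/SciPy version of the final computation (feature extraction and distributed evaluation are handled by the repo's metric code):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between N(mu1, cov1) and N(mu2, cov2)."""
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny numerical imaginary parts
    diff = mu1 - mu2
    return diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean)
```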
Perceptual Path Length (PPL)
Similarly, evaluate the PPL with:
horovodrun -np N_GPU \
python metrics/ppl.py \
--config anycost-ffhq-config-f
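PPL measures the smoothness of the latent space: render two nearby points on a latent interpolation path and scale their perceptual (LPIPS) distance by 1/ε². Below is a simplified single-sample sketch using the lpips package; the full StyleGAN2 protocol additionally uses slerp in z-space and crops faces:

```python
import torch
import lpips

percept = lpips.LPIPS(net='vgg')  # perceptual distance network
EPS = 1e-4

def ppl_sample(g, z1, z2, t):
    # Render two nearby points on the interpolation path and measure
    # how fast the image changes per unit of latent motion.
    with torch.no_grad():
        img_a, _ = g(torch.lerp(z1, z2, t))
        img_b, _ = g(torch.lerp(z1, z2, t + EPS))
        return percept(img_a, img_b).item() / EPS ** 2
```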
Attribute Consistency
Evaluate the attribute consistency by running:
horovodrun -np N_GPU \
python metrics/attribute_consistency.py \
--config anycost-ffhq-config-f \
--channel_ratio 0.5 --target_res 512  # required: specifies the sub-generator configuration
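Conceptually, this metric checks whether the sub-generator and the full generator yield the same predicted attributes for the same latent code. A hedged sketch (the latent shape and the predictor's output layout are assumptions):

```python
import torch
import torch.nn.functional as F
import model
from model.dynamic_channel import set_uniform_channel_ratio, reset_generator

g = model.get_pretrained('generator', config='anycost-ffhq-config-f').eval()
attr = model.get_pretrained('attribute-predictor').eval()

z = torch.randn(1, 1, 512)  # assumed latent shape; see notebooks/intro.ipynb
with torch.no_grad():
    full_img, _ = g(z)                 # full generator output
    set_uniform_channel_ratio(g, 0.5)
    g.target_res = 512
    sub_img, _ = g(z)                  # sub-generator output
    reset_generator(g)
    # Resize to a common size before the classifier (assumed necessary),
    # then compare predicted attributes; the logit layout is an assumption.
    sub_img = F.interpolate(sub_img, size=full_img.shape[-2:], mode='bilinear')
    same = attr(full_img).argmax(1) == attr(sub_img).argmax(1)
print('consistency:', same.float().mean().item())
```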
Encoder Evaluation
To evaluate the performance of the encoder, run:
python metrics/eval_encoder.py \
--config anycost-ffhq-config-f \
--data_path PATH_TO_CELEBA_HQ
Training
The training code will be released shortly.
Citation
If you use this code for your research, please cite our paper.
@inproceedings{lin2021anycost,
author = {Lin, Ji and Zhang, Richard and Ganz, Frieder and Han, Song and Zhu, Jun-Yan},
title = {Anycost GANs for Interactive Image Synthesis and Editing},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021},
}
Related Projects
GAN Compression | Once for All | iGAN | StyleGAN2
Acknowledgement
We thank Taesung Park, Zhixin Shu, Muyang Li, and Han Cai for the helpful discussions. Part of the work is supported by NSF CAREER Award #1943349, Adobe, Naver Corporation, and the MIT-IBM Watson AI Lab.
The codebase is built upon a PyTorch implementation of StyleGAN2: rosinality/stylegan2-pytorch. For editing direction extraction, we refer to InterFaceGAN.