Official Implementation of Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020)

Last update: Dec 27, 2022

Overview

Swapping Autoencoder for Deep Image Manipulation

Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang

UC Berkeley and Adobe Research

Project page | Paper | 3 Min Video

Swapping Autoencoder consists of autoencoding (top) and swapping (bottom) operations. Top: An encoder E embeds an input (Notre-Dame) into two codes. The structure code is a tensor with spatial dimensions; the texture code is a 2048-dimensional vector. Decoding with generator G should produce a realistic image (enforced by discriminator D) matching the input (reconstruction loss). Bottom: Decoding with the texture code from a second image (Saint Basil's Cathedral) should look realistic (via D) and match the texture of that image, by training with a patch co-occurrence discriminator Dpatch that enforces that the output and reference patches look indistinguishable.
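To make the two operations concrete, here is a deliberately tiny sketch in plain Python. It is not the paper's networks: the hypothetical `toy_encode` stands in for E by splitting a grayscale image (a list of rows) into a spatial "structure" map (each pixel minus the image mean) and a one-element "texture" vector (the mean; the real texture code is 2048-dimensional), and the hypothetical `toy_generate` stands in for G by adding them back together. With this split, autoencoding reconstructs the input exactly, and swapping keeps the layout of one image while taking the global statistic of the other.

```python
from typing import List, Tuple

Image = List[List[float]]


def toy_encode(img: Image) -> Tuple[Image, List[float]]:
    """Hypothetical stand-in for E: returns (structure map, texture vector)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    structure = [[p - mean for p in row] for row in img]  # keeps spatial layout
    texture = [mean]  # global code without spatial dimensions
    return structure, texture


def toy_generate(structure: Image, texture: List[float]) -> Image:
    """Hypothetical stand-in for G: re-inject the global code at every pixel."""
    return [[s + texture[0] for s in row] for row in structure]


def reconstruct(img: Image) -> Image:
    # Autoencoding (top): G(E(x)) should match x
    return toy_generate(*toy_encode(img))


def swap(structure_img: Image, texture_img: Image) -> Image:
    # Swapping (bottom): structure code of the first image, texture code of the second
    structure, _ = toy_encode(structure_img)
    _, texture = toy_encode(texture_img)
    return toy_generate(structure, texture)


if __name__ == "__main__":
    a = [[0.0, 2.0], [4.0, 6.0]]  # mean 3
    b = [[10.0, 10.0], [10.0, 10.0]]  # mean 10
    print(reconstruct(a))  # [[0.0, 2.0], [4.0, 6.0]]
    print(swap(a, b))  # [[7.0, 9.0], [11.0, 13.0]]
```

In the actual model the split is learned rather than hand-coded, which is why the reconstruction loss and the discriminators D and Dpatch are needed.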

Installation / Requirements

CUDA 10.1 or newer is required because it uses a custom CUDA kernel of StyleGAN2, ported by @rosinality
The author used PyTorch 1.7.1 on Python 3.6
Install dependencies with pip install dominate torchgeometry func-timeout tqdm matplotlib opencv_python lmdb numpy GPUtil Pillow scikit-learn visdom
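Several of these pip package names differ from the module names used in `import` statements (for example, opencv_python is imported as `cv2`). A small sketch for checking which ones are missing before running the scripts; the pip-to-module mapping below is my own assumption, not something shipped with the repository.

```python
import importlib.util
from typing import Dict, List

# pip package name -> top-level module name (assumed mapping)
REQUIRED: Dict[str, str] = {
    "dominate": "dominate",
    "torchgeometry": "torchgeometry",
    "func-timeout": "func_timeout",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "opencv_python": "cv2",
    "lmdb": "lmdb",
    "numpy": "numpy",
    "GPUtil": "GPUtil",
    "Pillow": "PIL",
    "scikit-learn": "sklearn",
    "visdom": "visdom",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names whose module cannot be found by this interpreter."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```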

Testing and Evaluation.

We provide the pretrained models and also several images that reproduce the figures of the paper. Please download and unzip them here (2.1GB). The scripts assume that the checkpoints are at ./checkpoints/, and the test images at ./testphotos/, but these locations can be changed with the --checkpoints_dir and --dataroot options.

Swapping and Interpolation of the mountain model using sample images

To run simple swapping and interpolation, specify the two input reference images by changing the input_structure_image and input_texture_image fields of experiments/mountain_pretrained_launcher.py, and run

python -m experiments mountain_pretrained test simple_swapping
python -m experiments mountain_pretrained test simple_interpolation

The provided script (in particular, opt.tag("simple_swapping") and opt.tag("simple_interpolation") in experiments/mountain_pretrained_launcher.py) invokes a terminal command similar to the following:

python test.py --evaluation_metrics simple_swapping \
--preprocess scale_shortside --load_size 512 \
--name mountain_pretrained  \
--input_structure_image [path_to_sample_image] \
--input_texture_image [path_to_sample_image] \
--texture_mix_alpha 0.0 0.25 0.5 0.75 1.0

Feel free to run this command directly if that is more straightforward.
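If you prefer calling test.py from your own Python script rather than through the launcher, the sketch below assembles the same argument list for `subprocess`. Only the flags from the command above are used; `build_swap_command` is a hypothetical helper, and the image paths are placeholders you must replace.

```python
import subprocess
import sys
from typing import List, Sequence


def build_swap_command(structure_image: str, texture_image: str,
                       alphas: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
                       name: str = "mountain_pretrained") -> List[str]:
    """Mirror the documented test.py invocation as an argv list."""
    return [
        sys.executable, "test.py",
        "--evaluation_metrics", "simple_swapping",
        "--preprocess", "scale_shortside",
        "--load_size", "512",
        "--name", name,
        "--input_structure_image", structure_image,
        "--input_texture_image", texture_image,
        "--texture_mix_alpha", *[str(a) for a in alphas],
    ]


if __name__ == "__main__":
    cmd = build_swap_command("path/to/structure.jpg", "path/to/texture.jpg")
    print(" ".join(cmd))
    run_now = False  # set to True when launched from the repository root
    if run_now:
        subprocess.run(cmd, check=True)
```

Passing a list instead of a single shell string avoids quoting problems when image paths contain spaces.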

The output images are saved at ./results/mountain_pretrained/simpleswapping/.
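The --texture_mix_alpha values suggest a blend between the structure image's own texture code and the reference texture code. A minimal sketch, assuming plain linear interpolation of the two texture vectors (check the evaluator code for the exact rule it applies); under that assumption, alpha = 0.0 keeps the original texture and alpha = 1.0 fully swaps it. `mix_texture` is a hypothetical name.

```python
from typing import List, Sequence


def mix_texture(own: Sequence[float], reference: Sequence[float],
                alpha: float) -> List[float]:
    """Linearly interpolate two texture codes (assumed rule).

    alpha = 0.0 returns the structure image's own code,
    alpha = 1.0 returns the reference image's code.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected in [0, 1]")
    return [(1.0 - alpha) * o + alpha * r for o, r in zip(own, reference)]


if __name__ == "__main__":
    own, ref = [0.0, 4.0], [8.0, 0.0]  # toy 2-D codes; the real ones are 2048-D
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(alpha, mix_texture(own, ref, alpha))
```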

Texture Swapping

Our Swapping Autoencoder learns to disentangle texture from structure for image editing tasks such as texture swapping. Each row shows the result of combining the structure code of the leftmost image with the texture code of the top image.

To reproduce this image (Figure 4) as well as Figures 9 and 12 of the paper, run the following commands:

# Reads options from ./experiments/church_pretrained_launcher.py
python -m experiments church_pretrained test swapping_grid

# Reads options from ./experiments/bedroom_pretrained_launcher.py
python -m experiments bedroom_pretrained test swapping_grid

# Reads options from ./experiments/mountain_pretrained_launcher.py
python -m experiments mountain_pretrained test swapping_grid

# Reads options from ./experiments/ffhq512_pretrained_launcher.py
python -m experiments ffhq512_pretrained test swapping_grid

Make sure the dataroot and checkpoints_dir paths are correctly set in the respective ./experiments/xx_pretrained_launcher.py script.
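The grid layout described above (row i takes its structure from the i-th leftmost image, column j takes its texture from the j-th top image) amounts to decoding every structure/texture pair. A sketch with caller-supplied encode and decode functions; `swapping_grid` is a hypothetical name, and the lambdas in the usage part are placeholders for the trained E and G.

```python
from typing import Any, Callable, List, Sequence, Tuple

EncodeFn = Callable[[Any], Tuple[Any, Any]]  # image -> (structure, texture)
DecodeFn = Callable[[Any, Any], Any]         # (structure, texture) -> image


def swapping_grid(structure_imgs: Sequence[Any], texture_imgs: Sequence[Any],
                  encode: EncodeFn, decode: DecodeFn) -> List[List[Any]]:
    """grid[i][j] = G(structure of structure_imgs[i], texture of texture_imgs[j])."""
    # Encode each image once; keep only the code that its axis needs.
    structure_codes = [encode(img)[0] for img in structure_imgs]
    texture_codes = [encode(img)[1] for img in texture_imgs]
    return [[decode(s, t) for t in texture_codes] for s in structure_codes]


if __name__ == "__main__":
    # Toy placeholders: an "image" is just a (layout, style) pair of labels.
    rows = [("church_layout", "church_style"), ("bedroom_layout", "bedroom_style")]
    cols = [("mountain_layout", "sunset"), ("face_layout", "snow")]
    grid = swapping_grid(rows, cols, encode=lambda img: img,
                         decode=lambda s, t: f"{s}+{t}")
    for row in grid:
        print(row)
```

Encoding each reference once and reusing the codes keeps the cost at one encoder pass per image plus one generator pass per grid cell.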

Quantitative Evaluations

To perform quantitative evaluations such as the FID in Table 1, Fig 5, and Table 2, we first need to prepare pairs of input structure and texture reference images.

The reference images are randomly selected from the val set of LSUN, FFHQ, and the Waterfalls dataset. The pairs of input structure and texture images should be located in the input_structure/ and input_style/ directories, with the same file name. For example, input_structure/001.png and input_style/001.png will be loaded together for swapping.
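Because pairing is done purely by file name, it is worth checking the two directories before a long evaluation run. A sketch of such a check; `pair_eval_images` is a hypothetical helper, not the repository's data loader.

```python
from pathlib import Path
from typing import List, Tuple


def pair_eval_images(root: str) -> List[Tuple[Path, Path]]:
    """Match root/input_structure/<name> with root/input_style/<name>."""
    structure_dir = Path(root) / "input_structure"
    style_dir = Path(root) / "input_style"
    style_files = {p.name: p for p in style_dir.iterdir() if p.is_file()}
    pairs = []
    for structure_path in sorted(structure_dir.iterdir()):
        if not structure_path.is_file():
            continue
        style_path = style_files.get(structure_path.name)
        if style_path is None:
            raise FileNotFoundError(
                f"no input_style/{structure_path.name} for {structure_path}")
        pairs.append((structure_path, style_path))
    return pairs


if __name__ == "__main__":
    root = "./testphotos/church/fig5_tab2/"
    if Path(root).is_dir():
        for s, t in pair_eval_images(root):
            print(s.name, "<->", t.name)
```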

Set the path to the test images in the dataroot="./testphotos/church/fig5_tab2/" field of the script experiments/church_pretrained_launcher.py, and run

python -m experiments church_pretrained run_test swapping_for_eval
python -m experiments ffhq1024_pretrained run_test swapping_for_eval

The results can be viewed at ./results (this location can be changed with the --result_dir option).

The FID is then computed between the swapped images and the original structure images, using https://github.com/mseitzer/pytorch-fid.
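For reference, FID is the Fréchet distance between Gaussian fits to Inception features of the two image sets, with means $\mu_1, \mu_2$ and covariances $\Sigma_1, \Sigma_2$ (standard definition; lower is better):

```latex
\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_1 + \Sigma_2 - 2\left( \Sigma_1 \Sigma_2 \right)^{1/2} \right)
```

With pytorch-fid installed, the score between two image folders is printed by `python -m pytorch_fid path/to/dir1 path/to/dir2`.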

Model Training.

Datasets

LSUN Church and Bedroom datasets can be downloaded here. Once downloaded and unzipped, the directories should contain [category]_[train/val]_lmdb/.
The FFHQ dataset can be downloaded using this link. It is a zip file of 70,000 images at 1024x1024 resolution. Unzip it; the image files are loaded directly.
The Flickr Mountains dataset and the Flickr Waterfall dataset are not shareable due to license issues, but the images were scraped from Mountains Anywhere and Waterfalls Around the World, using the Python wrapper for the Flickr API. Please contact Taesung Park with the title "Flickr Dataset for Swapping Autoencoder" for more details.

Training Scripts

The training configurations are specified using the scripts in experiments/*_launcher.py. Use the following commands to launch the various training runs.

# Modify |dataroot| and |checkpoints_dir| at
# experiments/[church,bedroom,ffhq,mountain]_launcher.py
python -m experiments church train church_default
python -m experiments bedroom train bedroom_default
python -m experiments ffhq train ffhq512_default
python -m experiments ffhq train ffhq1024_default

# By default, the script uses GPUtil to look at available GPUs
# on the machine and sets appropriate GPU IDs. To use a specific set of GPUs,
# use the |--gpu| option. Be sure to also change the |num_gpus| option in the corresponding script.
python -m experiments church train church_default --gpu 01234567
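Since `--gpu` and `num_gpus` must agree, a quick consistency check can save a failed launch. A hypothetical sketch, assuming each character of the `--gpu` string is one single-digit device ID (which is how `--gpu 01234567` reads; confirm in experiments/tmux_launcher.py before relying on it). It returns the comma-separated form used by the standard `CUDA_VISIBLE_DEVICES` environment variable.

```python
from typing import List


def parse_gpu_string(gpu: str) -> List[int]:
    """Split a string such as '01234567' into device IDs (assumed format)."""
    if not gpu.isdigit():
        raise ValueError(f"expected digits only, got {gpu!r}")
    ids = [int(c) for c in gpu]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU id in {gpu!r}")
    return ids


def check_gpu_config(gpu: str, num_gpus: int) -> str:
    """Validate against num_gpus and return a CUDA_VISIBLE_DEVICES value."""
    ids = parse_gpu_string(gpu)
    if len(ids) != num_gpus:
        raise ValueError(f"--gpu lists {len(ids)} devices but num_gpus={num_gpus}")
    return ",".join(str(i) for i in ids)


if __name__ == "__main__":
    print(check_gpu_config("01234567", 8))
```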

The training progress can be monitored using visdom at the port number specified by --display_port. The default is http://localhost:2004.

Additionally, a few swapping grids are generated using random samples of the training set. They are saved as webpages at [checkpoints_dir]/[expr_name]/snapshots/. The frequency of the grid generation is controlled using --evaluation_freq.

All configurable parameters are printed at the beginning of training. These options are spread throughout the code, in the modify_commandline_options methods of the relevant classes, such as those in models/swapping_autoencoder_model.py, util/iter_counter.py, or models/networks/encoder.py. To change these options, simply modify the corresponding option in opt.specify of the training script.

The code for parsing and configuration is in experiments/__init__.py, experiments/__main__.py, and experiments/tmux_launcher.py.
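The idea of each class contributing its own flags before a single parse can be sketched with `argparse`. This follows the pix2pix-style `modify_commandline_options(parser, is_train)` convention as an assumption; the exact signature and two-stage parsing in this repository may differ, and `DemoModel`, `DemoEncoder`, `--lambda_demo`, and `--demo_channels` are hypothetical.

```python
import argparse
from typing import List


class DemoModel:
    @staticmethod
    def modify_commandline_options(parser: argparse.ArgumentParser,
                                   is_train: bool) -> argparse.ArgumentParser:
        parser.add_argument("--lambda_demo", type=float, default=1.0)
        return parser


class DemoEncoder:
    @staticmethod
    def modify_commandline_options(parser: argparse.ArgumentParser,
                                   is_train: bool) -> argparse.ArgumentParser:
        parser.add_argument("--demo_channels", type=int, default=32)
        return parser


def gather_options(argv: List[str], is_train: bool = True) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", default="demo")  # a "basic" flag
    # Every relevant class adds its own flags before the single parse.
    for cls in (DemoModel, DemoEncoder):
        parser = cls.modify_commandline_options(parser, is_train)
    opt = parser.parse_args(argv)
    # Dump everything once, like the option printout at the start of training.
    for key, value in sorted(vars(opt).items()):
        print(f"{key}: {value}")
    return opt


if __name__ == "__main__":
    gather_options(["--lambda_demo", "2.0"])
```

Because flags live next to the code that uses them, the printed dump is the only single place where all of them appear, which is why the README recommends reading the console output.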

Continuing training.

Training continues by default from the last checkpoint, because the --continue_train option is set to True by default. To start from scratch, remove the checkpoint, or specify continue_train=False in the training script (e.g. experiments/church_launcher.py).
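The resume-by-default behavior can be sketched as follows. The file naming scheme `<iteration>_checkpoint.pth` and the helper name are hypothetical; the real code decides on its own what "the last checkpoint" is. Parsing the iteration as an integer matters: compared as strings, "1000" would sort after "25000".

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: <iteration>_checkpoint.pth
CKPT_RE = re.compile(r"^(\d+)_checkpoint\.pth$")


def resolve_resume_path(expr_dir: str, continue_train: bool = True) -> Optional[Path]:
    """Return the checkpoint to resume from, or None to start from scratch."""
    if not continue_train:
        return None
    directory = Path(expr_dir)
    if not directory.is_dir():
        return None
    candidates = []
    for path in directory.iterdir():
        match = CKPT_RE.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Pick the highest iteration number, compared numerically.
    return max(candidates, key=lambda c: c[0])[1]
```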

Code Structure (Main Functions)

models/swapping_autoencoder_model.py: The core file that defines the losses and produces the visuals.
optimizers/swapping_autoencoder_optimizer.py: Defines the optimizers and alternating training of GAN.
models/networks/: contains the model architectures generator.py, discriminator.py, encoder.py, patch_discrimiantor.py, stylegan2_layers.py.
options/__init__.py: contains basic option flags. However, many important flags are spread across other files, such as swapping_autoencoder_model.py or generator.py. When the program starts, these options are all parsed together. The best way to check the list of options in use is to run the training script and look at the console output of the configured options.
util/iter_counter.py: contains iteration counting.

Change Log

4/14/2021: The configuration to train the pretrained model on the Mountains dataset had not been set correctly, and was updated accordingly.

Bibtex

If you use this code for your research, please cite our paper:

@inproceedings{park2020swapping,
  title={Swapping Autoencoder for Deep Image Manipulation},
  author={Park, Taesung and Zhu, Jun-Yan and Wang, Oliver and Lu, Jingwan and Shechtman, Eli and Efros, Alexei A. and Zhang, Richard},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Acknowledgment

The StyleGAN2 layers heavily borrow (or rather, directly copy!) the PyTorch implementation of @rosinality. We thank Nicholas Kolkin for the helpful discussion on the automated content and style evaluation, Jeongo Seo and Yoseob Kim for advice on the user interface, and William T. Peebles, Tongzhou Wang, and Yu Sun for the discussion on disentanglement.

Comments

Python model translation into a torch.jit.trace model

Hello! I'm trying to convert this Swapping GAN model into C/C++ form using just-in-time compilation, and I'm having some trouble. This is my approach: I try to compile class BaseModel() from base_model.py and apply torch.jit.trace(model), but this model has no standard forward method like other models have. Something like this:

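      import torch

      class MyCell(torch.nn.Module):
          def __init__(self):
              super(MyCell, self).__init__()
              self.linear = torch.nn.Linear(4, 4)

          def forward(self, x, h):
              new_h = torch.tanh(self.linear(x) + h)
              return new_h, new_h

      # Example inputs used to trace the cell
      x, h = torch.rand(3, 4), torch.rand(3, 4)

      class MyRNNLoop(torch.nn.Module):
          def __init__(self):
              super(MyRNNLoop, self).__init__()
              self.cell = torch.jit.trace(MyCell(), (x, h))

          def forward(self, xs):
              h, y = torch.zeros(3, 4), torch.zeros(3, 4)
              for i in range(xs.size(0)):
                  y, h = self.cell(xs[i], h)
              return y, h

      rnn_loop = torch.jit.script(MyRNNLoop())
      print(rnn_loop.code)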

traced_script_module = torch.jit.trace(BaseModel, example_inputs =imgCuda) # used GPU tensor

In base_model.py there is code:

def forward(self, *args, command=None, **kwargs):
    """ wrapper for multigpu training. BaseModel is expected to be
    wrapped in nn.parallel.DataParallel, which distributes its call to
    the BaseModel instance on each GPU """
    if command is not None:
        method = getattr(self, command)
        assert callable(method), "[%s] is not a method of %s" % (command, type(self).__name__)
        return method(*args, **kwargs)
    else:
        raise ValueError(command)

with parameters like this: command=None, **kwargs

Any ideas how I can use torch.jit.trace(model) to convert swapping-autoencoder-pytorch directly? How can I use this method (def forward(self, *args, command=None, **kwargs):) like the forward of a typical model, to get a traced model for C/C++? You can check the standard approach for model tracing here: https://pytorch.org/tutorials/advanced/cpp_export.html Do you have a CPU-only or simpler implementation of swapping-autoencoder-pytorch?

opened by AlexTitovWork 4

Training time on FFHQ

Thanks for sharing your awesome work. I'm trying to reproduce the results on FFHQ, and I want to know how long it takes to train the pretrained FFHQ models. Thanks~

opened by VitoChien 2
reflection padding

Thanks for your awesome work. I have a little question about the paper. The paper mentions: "To prevent the texture code from encoding positional information, we apply reflection padding for the residual blocks, and then no padding for the conv blocks." I hope you can explain this.

opened by baopmessi 2
Training issue

Thanks for the awesome work. I used the church dataset to train the model on a single RTX2080ti with a batch size of 4 for 1,850,000 iterations, but the result still has many artifacts in some complex regions. Is this because the batch size is too small, or simply because the number of iterations is not enough?

opened by zacharyclam 2
some questions about 1024 resolution

Hi, great work! I have some questions about the 1024 resolution. Could you provide details about how the smaller network capacity mentioned in the paper is implemented? Also, Appendix B.3 (Datasets) says that the model is initially trained at 512 x 512 resolution, and finetuned at 1024 resolution. Does this model use the smaller network capacity?

opened by hughwcq 1
Why are the spatial and global code normalized twice?

Hi, thanks for open-sourcing this awesome work! I noticed that there seem to be two normalization operations on the spatial and global codes, one at the end of the encoder and one at the beginning of the generator. Is there any special reason for this design? Thanks!

opened by sunshineatnoon 1
Training freezes at the same iteration with no error

Hi! I'm trying to run your code on the Church dataset with the command:

python -m experiments church train church_default with batch_size=16, num_gpus=4

The training freezes at the 888000-th iteration with the following message:

(iters: 888000, data: 0.000, train: 0.050, maintenance: 0.000) D_R1: 0.089 D_mix: 0.292 D_real: 0.597 D_rec: 0.290 D_total: 2.495 G_GAN_mix: 0.934 G_GAN_rec: 0.467 G_L1: 0.211 G_mix: 0.805 L1_dist: 0.211 PatchD_mix: 0.652 PatchD_real: 0.659

Training doesn’t go further after this iteration and just freezes with no error. I’ve also tried training on the Bedrooms dataset with batch_size=32, num_gpus=8, and in the single-GPU setup on both datasets. In all cases, training froze at the same 888000-th iteration with the same behavior. Checkpoints and snapshots aren’t saved after this iteration either, which confirms that the script doesn’t continue training.

What could be the possible reason for such behavior? Thank you in advance.

opened by kaaeaate 0

Question about the code

Thank you very much for the code! It's really great!

In the compute_generator_losses function, one can read:

if self.opt.lambda_PatchGAN > 0.0:
    real_feat = self.Dpatch.extract_features(
        self.get_random_crops(real),
        aggregate=self.opt.patch_use_aggregation).detach()
    mix_feat = self.Dpatch.extract_features(self.get_random_crops(mix))

    losses["G_mix"] = loss.gan_loss(
        self.Dpatch.discriminate_features(real_feat, mix_feat),
        should_be_classified_as_real=True,
    ) * self.opt.lambda_PatchGAN

and in the compute_patch_discriminator_losses function, one can read:

real_feat = self.Dpatch.extract_features(
    self.get_random_crops(real),
    aggregate=self.opt.patch_use_aggregation
)
target_feat = self.Dpatch.extract_features(self.get_random_crops(real))
mix_feat = self.Dpatch.extract_features(self.get_random_crops(mix))
losses["PatchD_mix"] = loss.gan_loss(
    self.Dpatch.discriminate_features(real_feat, mix_feat),
    should_be_classified_as_real=False,
) * self.opt.lambda_PatchGAN

Why is should_be_classified_as_real set to True for G_mix and not for PatchD_mix?

When it comes to patches, shouldn't the gan_loss receive True when patches come from images with the same texture and False when they come from images with different textures?

opened by alamie 1

The outputs are not correct.

Hello, I loaded the checkpoints and testphotos, ran the command python -m experiments ffhq512_pretrained test swapping_grid, and got the output

The same happens with the other commands. I didn't modify the code. Is something wrong?

My environment: python 3.7.3 pytorch 1.8.0

opened by panhtt 2
About Finetune

Hi, great work! I just finetuned the official mountain checkpoints you provided, but there were many artifacts. Could you share your training settings or other training details?

opened by hughwcq 3
structure not being retained

hi @taesungp, great piece of work! I trained it on my dataset of 50k images for 50 million iterations as you suggested. At test time the results are quite impressive, but in some cases the structure is not reconstructed correctly. I would like the shapes to be generated almost exactly the same (full swapping). What could the problem be? Will training further help?

opened by haiderasad 5

Owner

Ph.D. student @ UC Berkeley https://taesung.me

CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Official PyTorch Implementation for InfoSwap: Information Bottleneck Disentanglement for Identity Swapping

Official implementation of Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021).

Implementation for "Manga Filling Style Conversion with Screentone Variational Autoencoder" (SIGGRAPH ASIA 2020 issue)

Official implementation of "SinIR: Efficient General Image Manipulation with Single Image Reconstruction" (ICML 2021)

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

Pytorch implementation of paper "Learning Co-segmentation by Segment Swapping for Retrieval and Discovery"

Pytorch Implementation of Continual Learning With Filter Atom Swapping (ICLR'22 Spolight) Paper

Swapping face using Face Mesh with TensorFlow Lite

Official implementation of "GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators" (NeurIPS 2020)

Official implementation for Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020

Official Pytorch implementation of 'GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network' (NeurIPS 2020)

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Spatial Action Maps for Mobile Manipulation (RSS 2020)

Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face Manipulation" published in CVPR 2020.

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)

[NeurIPS 2020] Official repository for the project "Listening to Sound of Silence for Speech Denoising"

[NeurIPS 2020] Blind Video Temporal Consistency via Deep Video Prior