Official PyTorch Implementation for Splicing ViT Features for Semantic Appearance Transfer presenting Splice

Omer Bar Tal

Last update: Jan 6, 2023

Related tags

Deep Learning splice style-transfer image-translation generative-models single-image-generation vision-transformer

Overview

Splicing ViT Features for Semantic Appearance Transfer [Project Page]

Splice is a method for semantic appearance transfer, as described in Splicing ViT Features for Semantic Appearance Transfer (link to paper).

Given two input images (a source structure image and a target appearance image), our method generates a new image in which the structure of the source image is preserved, while the visual appearance of the target image is transferred in a semantically aware manner. That is, objects in the structure image are “painted” with the visual appearance of semantically related objects in the appearance image. Our method leverages a self-supervised, pre-trained ViT model as an external semantic prior. This allows us to train our generator on only a single input image pair, without any additional information (e.g., segmentation or correspondences) and without adversarial training. Thus, our framework can work across a variety of objects and scenes, and can generate high-quality results at high resolution (e.g., HD).

Getting Started

Installation

git clone https://github.com/omerbt/Splice.git
cd Splice
pip install -r requirements.txt

Run examples

Run the following command to start training

python train.py --dataroot datasets/cows

Intermediate results are saved to /out/output.png during optimization. The save_epoch_freq flag in the configuration sets how often they are saved.

Sample Results

Citation

@article{Splice2022,
    author = {Tumanyan, Narek
              and Bar-Tal, Omer
              and Bagon, Shai
              and Dekel, Tali
              },
    title = {Splicing ViT Features for Semantic Appearance Transfer}, 
    journal = {arXiv preprint arXiv:2201.00424},
    year  = {2022}
}

Comments

CUDA out of memory after a few epochs

Trying your code in Colab, everything goes well for a few minutes (~300 epochs) and then it gets a CUDA memory error. Is there a leak somewhere? https://colab.research.google.com/drive/17UgzBmKtqRXniuG6fHqMGh3xIaVIJnd2?usp=sharing

Traceback (most recent call last):
  File "train.py", line 87, in <module>
    train_model(dataroot)
  File "train.py", line 58, in train_model
    losses = criterion(outputs, inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/Splice/util/losses.py", line 64, in forward
    losses['loss_global_cls'] = self.calculate_crop_cls_loss(outputs['x_global'], inputs['B_global'])
  File "/content/Splice/util/losses.py", line 90, in calculate_crop_cls_loss
    cls_token = self.extractor.get_feature_from_input(a)[-1][0, 0, :]
  File "/content/Splice/models/extractor.py", line 84, in get_feature_from_input
    self.model(input_img)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 212, in forward
    x = blk(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/root/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 108, in forward
    y, attn = self.attn(self.norm1(x))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/root/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 85, in forward
    attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 11.17 GiB total capacity; 10.35 GiB already allocated; 23.81 MiB free; 10.68 GiB reserved in total by PyTorch)

opened by jonilaserson 2

Disabling L_struct

I'm trying to recreate the results in Figure 9 of your paper, specifically the ablation that disables L_structure. Do lambda_global_ssim and lambda_entire_ssim together make up the entirety of L_struct, or am I misreading the code?

So far, after tinkering with the various lambda terms, the result always seems to retain very high structural fidelity to the structure image.

opened by urimerhav 2

Batch Size?

Hello, very cool project! Is there a parameter to change the batch size? It seems that 8GB of VRAM isn't enough and I'm getting CUDA OOM errors. I've tried lowering the image resolutions but I still get the same issue.

opened by ExponentialML 2

Experiments with multiple target images

Hi!

I love your work and have a question regarding multiple target images. Have you experimented with them? For example, with horses as the source domain and zebras as the target domain, what happens if we pick a new zebra image as the target for every epoch?

Thanks!

opened by aelmiger 1

Optimize ssim

Hey,

I noticed that in the calculation of self-similarity you calculate the norm of the same thing twice. I changed it, so it would be computed only once.
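
The idea behind this change can be sketched in plain Python. This is not the repository's code: `cosine_self_similarity` is a hypothetical helper, and the nested lists stand in for the ViT feature tensors. The sketch computes each vector's norm once, normalizes, and then reuses the unit vectors for every pair.

```python
import math


def cosine_self_similarity(features):
    """Return the matrix of pairwise cosine similarities between feature vectors.

    Each vector's L2 norm is computed exactly once and reused for all pairs.
    """
    eps = 1e-8  # guards against division by zero for all-zero vectors
    norms = [max(math.sqrt(sum(x * x for x in f)), eps) for f in features]
    unit = [[x / n for x in f] for f, n in zip(features, norms)]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


# Three toy 2-D "features"; the first two are orthogonal.
sim = cosine_self_similarity([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
```

Normalizing first means the similarity matrix reduces to a plain dot product of unit vectors, so no norm is recomputed per pair.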

opened by RafailFridman 1

Fix #134: Empty hook_handlers list in the Extractor

By emptying the hook handler list, Python can free the hook handlers that are no longer active (but still take up memory). In my test run, the loss keeps decreasing after this change.

opened by MichaelDoron 1

Memory leakage causes running time to increase from epoch to epoch

PyTorch version 1.10.1, CUDA version 11.3

When running the model, the time per iteration increases with each epoch. Printing the process's resident memory with import os, psutil; process = psutil.Process(os.getpid()); print(process.memory_info().rss) shows that host memory usage also grows with each epoch. After digging a bit, I think the issue is with the hook handlers: they are removed but never deleted from the hook_handlers list (its length keeps increasing), so Python's garbage collector cannot collect them.
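
The suspected mechanism can be illustrated without PyTorch. In this minimal sketch, `FakeHandle` is a hypothetical stand-in for the handle object a hook registration returns, and this `Extractor` is a toy class that only mirrors the hook_handlers list mentioned above. It is not the repository's implementation.

```python
class FakeHandle:
    """Hypothetical stand-in for the handle returned when a hook is registered."""

    def __init__(self):
        self.active = True

    def remove(self):
        # Detaches the hook, but the handle object itself stays alive
        # for as long as something still references it.
        self.active = False


class Extractor:
    """Toy extractor that tracks its hook handles in a list."""

    def __init__(self, reset_list):
        self.hook_handlers = []
        self.reset_list = reset_list

    def register_hooks(self, n_layers=3):
        for _ in range(n_layers):
            self.hook_handlers.append(FakeHandle())

    def clear_hooks(self):
        for handle in self.hook_handlers:
            handle.remove()
        if self.reset_list:
            # Drop the references so the removed handles can be collected.
            self.hook_handlers = []


def handles_kept_after(iterations, reset_list):
    """Count the handles still referenced after repeated register/clear cycles."""
    extractor = Extractor(reset_list)
    for _ in range(iterations):
        extractor.register_hooks()
        extractor.clear_hooks()
    return len(extractor.hook_handlers)


leaky = handles_kept_after(100, reset_list=False)
fixed = handles_kept_after(100, reset_list=True)
print(leaky, fixed)  # 300 0
```

Without the reset, the list grows by three entries per cycle, and every later call to clear_hooks loops over all of the stale handles again. That is consistent with both the growing memory and the growing time per iteration.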

opened by MichaelDoron 0

URLError:

Hi, thanks for your great work. When I try to train the model, I hit this error: self.model = torch.hub.load('facebookresearch/dino:main', model_name).to(device) --> URLError. It may be because my machine has no internet access, but when I tried using torch.save() and torch.load() in place of torch.hub.load, I got an error again. How should I fix this? Thanks again, waiting for your reply.

opened by DWCTOD 0

ValueError: high is out of bounds for int32 in seed

In the beginning, the seed is being created as int32 for some reason. Maybe pick a lower seed?

Traceback (most recent call last):
  File "train.py", line 88, in <module>
    train_model(dataroot)
  File "train.py", line 28, in train_model
    seed = np.random.randint(2 ** 32)
  File "mtrand.pyx", line 745, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1343, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32
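
One possible workaround, sketched under assumptions: the error likely appears on platforms where NumPy's default integer type is 32-bit, so an upper bound of 2 ** 32 does not fit. Drawing the seed with Python's standard library avoids fixed-width integers entirely. `MAX_SEED` and `pick_seed` are hypothetical names, not part of the repository.

```python
import random

# NumPy's legacy np.random.seed accepts seeds from 0 to 2**32 - 1.
MAX_SEED = 2 ** 32 - 1


def pick_seed():
    """Draw a seed in [0, MAX_SEED] using Python ints, which do not overflow."""
    return random.randint(0, MAX_SEED)  # both bounds are inclusive


seed = pick_seed()
random.seed(seed)  # the same value could then be passed to the other RNGs in use
```

The resulting value stays within the range the traceback's call was aiming for, so it should be usable wherever the original seed was.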

opened by RafailFridman 0

Owner

Omer Bar Tal

GitHub

A PyTorch Implementation of ViT (Vision Transformer)

ViT - Vision Transformer This is an implementation of ViT - Vision Transformer by Google Research Team through the paper "An Image is Worth 16x16 Word

7 May 11, 2022

PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

MoCo v3 for Self-supervised ResNet and ViT Introduction This is a PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT. The original M

887 Jan 8, 2023

PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

MAE for Self-supervised ViT Introduction This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-sup

36 Oct 30, 2022

Official implementation of Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer This repository contains the PyTorch code for Evo-ViT. This work proposes a slow-fas

53 Dec 5, 2022

Official pytorch code for SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal

SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal This is the official pytorch code for SSAT: A Symmetric Semantic-

57 Dec 13, 2022

Implementing Vision Transformer (ViT) in PyTorch

Lightning-Hydra-Template A clean and scalable template to kickstart your deep learning project. Click on Use this template to initialize new re

2 Dec 24, 2021

DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control

DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control One version of our system is implemented using the

260 Nov 28, 2022

[CVPR'21] DeepSurfels: Learning Online Appearance Fusion

DeepSurfels: Learning Online Appearance Fusion Paper | Video | Project Page This is the official implementation of the CVPR 2021 submission DeepSurfel

52 Nov 14, 2022

Unified tracking framework with a single appearance model

Paper: Do different tracking tasks require different appearance model? [ArXiv] (coming soon) [Project Page] (coming soon) UniTrack is a simple and U

300 Dec 24, 2022

Canonical Appearance Transformations

CAT-Net: Learning Canonical Appearance Transformations Code to accompany our paper "How to Train a CAT: Learning Canonical Appearance Transformations

54 Dec 24, 2022

Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance

Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance Project Page | Paper | Data This repository contains an implementatio

521 Dec 30, 2022

A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squares.

W.I.P-Aim-Memory-Game A customisable game where you have to quickly click on black tiles in order of appearance while avoiding clicking on white squar

1 Dec 8, 2021

SLAMP: Stochastic Latent Appearance and Motion Prediction

SLAMP: Stochastic Latent Appearance and Motion Prediction Official implementation of the paper SLAMP: Stochastic Latent Appearance and Motion Predicti

34 Dec 8, 2022

So-ViT: Mind Visual Tokens for Vision Transformer

So-ViT: Mind Visual Tokens for Vision Transformer Introduction This repository contains the source code under PyTorch framework and models trai

44 Nov 24, 2022

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

75 Dec 2, 2022

A simple approach to enable dense segmentation with ViT.

A simple program for training and testing ViT.

This project uses ViT to perform image classification tasks on the CIFAR-10 dataset.

ViT for few-shot classification