Splice: Official PyTorch Implementation of "Splicing ViT Features for Semantic Appearance Transfer"

Overview

Splicing ViT Features for Semantic Appearance Transfer [Project Page]

Splice is a method for semantic appearance transfer, as described in Splicing ViT Features for Semantic Appearance Transfer (arXiv:2201.00424).

Given two input images, a source structure image and a target appearance image, our method generates a new image in which the structure of the source image is preserved, while the visual appearance of the target image is transferred in a semantically aware manner. That is, objects in the structure image are "painted" with the visual appearance of semantically related objects in the appearance image. Our method leverages a self-supervised, pre-trained ViT model as an external semantic prior. This allows us to train our generator on only a single input image pair, without any additional information (e.g., segmentation/correspondences) and without adversarial training. As a result, our framework works across a variety of objects and scenes and can generate high-quality results at high resolution (e.g., HD).
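
At a high level, the supervision can be sketched as follows. This is a conceptual sketch based on the paper, with a hypothetical extractor object exposing the DINO-ViT features; it is not the repository's actual code:

import torch.nn.functional as F

# Conceptual sketch (hypothetical `extractor`): appearance is matched through
# the ViT's deepest [CLS] token, structure through the self-similarity of the
# keys of the deepest attention layer.

def appearance_loss(extractor, generated, appearance_img):
    return F.mse_loss(extractor.cls_token(generated),
                      extractor.cls_token(appearance_img))

def self_sim(extractor, img):
    # Cosine self-similarity of the ViT keys: it captures spatial layout
    # while being largely invariant to appearance.
    k = F.normalize(extractor.keys(img), dim=-1)  # (tokens, dim)
    return k @ k.t()                              # (tokens, tokens)

def structure_loss(extractor, generated, structure_img):
    return F.mse_loss(self_sim(extractor, generated),
                      self_sim(extractor, structure_img))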

Getting Started

Installation

git clone https://github.com/omerbt/Splice.git
cd Splice
pip install -r requirements.txt

Run examples

Run the following command to start training:

python train.py --dataroot datasets/cows

Intermediate results will be saved to /out/output.png during optimization. How frequently intermediate results are saved is controlled by the save_epoch_freq flag in the configuration.
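
As a rough illustration of how that flag might gate saving (a hedged sketch with illustrative names, not the repository's code):

from torchvision.utils import save_image

# Hypothetical sketch: write the current result every `save_epoch_freq` epochs,
# overwriting the same file so it always holds the latest intermediate output.
def maybe_save(generated, epoch, save_epoch_freq, path='out/output.png'):
    if epoch % save_epoch_freq == 0:
        save_image(generated.clamp(0, 1), path)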

Sample Results

[sample results figure]

Citation

@article{Splice2022,
    author  = {Tumanyan, Narek and Bar-Tal, Omer and Bagon, Shai and Dekel, Tali},
    title   = {Splicing ViT Features for Semantic Appearance Transfer},
    journal = {arXiv preprint arXiv:2201.00424},
    year    = {2022}
}
Comments
  • CUDA out of memory after a few epochs

    Trying your code in Colab, everything goes well for a few minutes (~300 epochs), and then it hits a CUDA out-of-memory error. Is there a leak somewhere? https://colab.research.google.com/drive/17UgzBmKtqRXniuG6fHqMGh3xIaVIJnd2?usp=sharing

    Traceback (most recent call last):
      File "train.py", line 87, in <module>
        train_model(dataroot)
      File "train.py", line 58, in train_model
        losses = criterion(outputs, inputs)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/content/Splice/util/losses.py", line 64, in forward
        losses['loss_global_cls'] = self.calculate_crop_cls_loss(outputs['x_global'], inputs['B_global'])
      File "/content/Splice/util/losses.py", line 90, in calculate_crop_cls_loss
        cls_token = self.extractor.get_feature_from_input(a)[-1][0, 0, :]
      File "/content/Splice/models/extractor.py", line 84, in get_feature_from_input
        self.model(input_img)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/root/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 212, in forward
        x = blk(x)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1071, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/root/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 108, in forward
        y, attn = self.attn(self.norm1(x))
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1071, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/root/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 85, in forward
        attn = (q @ k.transpose(-2, -1)) * self.scale
    RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 11.17 GiB total capacity; 10.35 GiB already allocated; 23.81 MiB free; 10.68 GiB reserved in total by PyTorch)
    
    opened by jonilaserson 2
  • Disabling L_struct

    I'm trying to recreate the results in Figure 9 of your paper, specifically disabling L_structure. Do lambda_global_ssim and lambda_entire_ssim together form the entirety of L_struct, or am I misreading the code? (See the sketch after this item.)

    So far, however I tinker with the various lambda terms, the result always retains very high structural fidelity to the structure image.

    opened by urimerhav 2
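
    If that reading is right, disabling L_struct should amount to zeroing both weights in the training configuration. A hypothetical sketch (the exact config mechanism is not verified against the repo):

    # Stand-in config dict; in practice these come from the repo's configuration.
    config = {'lambda_global_ssim': 1.0, 'lambda_entire_ssim': 1.0}
    config['lambda_global_ssim'] = 0.0  # hypothetical: turn off the global SSIM term
    config['lambda_entire_ssim'] = 0.0  # hypothetical: turn off the entire-image SSIM term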
  • Batch Size?

    Hello, very cool project! Is there a parameter to change the batch size? It seems that 8GB of VRAM isn't enough and I'm getting CUDA OOM errors. I've tried lowering the image resolutions but I still get the same issue.

    opened by ExponentialML 2
  • Experiments with multiple target images

    Hi!

    I love your work and have a question regarding multiple target images. Have you experimented with them? For example: source domain - horses, target domain - zebras. What happens if we pick a new zebra image as the target for every epoch?

    Thanks!

    opened by aelmiger 1
  • Optimize ssim

    Hey,

    I noticed that in the calculation of self-similarity the norm of the same tensor is computed twice. I changed it so that it is computed only once (see the sketch after this item).

    opened by RafailFridman 1
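
    A minimal before/after sketch of the change described (hypothetical tensor names, not the exact repository code):

    import torch

    def self_similarity(a):  # a: (tokens, dim)
        # Before: the norm of `a` was computed twice, once per operand:
        #   sim = (a @ a.t()) / (a.norm(dim=-1, keepdim=True) @ a.norm(dim=-1, keepdim=True).t())
        # After: compute it once and reuse it.
        n = a.norm(dim=-1, keepdim=True)
        return (a @ a.t()) / (n @ n.t())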
  • Fix #134: Empty hook_handlers list in the Extractor

    By emptying the hook handler list, Python can free the hook handlers that are no longer active but still take up memory. In my test run, the loss keeps decreasing after this change (see the sketch below).

    opened by MichaelDoron 1
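
    A minimal sketch of the fix described above, assuming an extractor that registers forward hooks on the ViT (illustrative names, not the repository's exact code):

    import torch

    class Extractor:
        def __init__(self, model: torch.nn.Module):
            self.model = model
            self.hook_handlers = []

        def _hook(self, module, inputs, output):
            pass  # feature collection would happen here

        def register_hooks(self):
            for block in self.model.modules():
                self.hook_handlers.append(block.register_forward_hook(self._hook))

        def remove_hooks(self):
            for handle in self.hook_handlers:
                handle.remove()
            # The fix: without clearing the list, the spent handles stay
            # reachable, so Python's GC never frees what they reference.
            self.hook_handlers = []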
  • Memory leakage causes running time to increase from epoch to epoch

    PyTorch version 1.10.1, CUDA version 11.3.

    When running the model, the time per iteration increases with each epoch. Printing import os, psutil; process = psutil.Process(os.getpid()); print(process.memory_info().rss) shows that CPU memory usage also grows with each epoch. After digging a bit, I think the issue is with the hook handlers: they are removed, but they are never deleted from the hook_handlers list (its length keeps increasing), so Python's garbage collector does not collect them.

    opened by MichaelDoron 0
  • URLError: <urlopen error>

    Hi, thanks for your great work. When I try to train the model, I get this error: self.model = torch.hub.load('facebookresearch/dino:main', model_name).to(device) --> URLError. It is probably because my machine has no internet access. I tried replacing torch.hub.load with torch.save() and torch.load(), but that errored again. How should I fix this? Thanks again; waiting for your reply. (See the offline-loading sketch after this item.)

    opened by DWCTOD 0
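
    One hedged workaround: on a machine with internet access, clone the DINO repo and download a checkpoint, copy both to the offline machine, and point torch.hub at the local clone (the paths and the dino_vitb8 name are illustrative):

    import torch

    # Build the architecture from a local clone of facebookresearch/dino,
    # then load locally stored weights instead of fetching them online.
    model = torch.hub.load('/path/to/dino', 'dino_vitb8', source='local', pretrained=False)
    state = torch.load('/path/to/dino_vitb8_pretrain.pth', map_location='cpu')
    model.load_state_dict(state)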
  • ValueError: high is out of bounds for int32 in seed

    At startup, the seed is drawn as an int32 for some reason, so the 2 ** 32 upper bound is out of range. Maybe pick a lower bound? (See the sketch after this item.)

    Traceback (most recent call last):
      File "train.py", line 88, in <module>
        train_model(dataroot)
      File "train.py", line 28, in train_model
        seed = np.random.randint(2 ** 32)
      File "mtrand.pyx", line 745, in numpy.random.mtrand.RandomState.randint
      File "_bounded_integers.pyx", line 1343, in numpy.random._bounded_integers._rand_int32
    ValueError: high is out of bounds for int32
    
    opened by RafailFridman 0
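
    A hedged sketch of one fix: on platforms where numpy's default integer is 32 bits (notably Windows builds), np.random.randint(2 ** 32) overflows, so request a wider dtype or lower the bound:

    import numpy as np

    seed = np.random.randint(2 ** 32, dtype=np.int64)  # explicit 64-bit dtype
    # or stay within the int32 range:
    seed = np.random.randint(2 ** 31)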
Owner

Omer Bar Tal