Toward Multimodal Image-to-Image Translation

Overview





BicycleGAN

Project Page | Paper | Video

PyTorch implementation for multimodal image-to-image translation. For example, given the same night image, our model is able to synthesize possible day images with different types of lighting, sky, and clouds. Training requires paired data.

Note: The current software works well with PyTorch 0.4.1+. Check out the older branch that supports PyTorch 0.1-0.3.

Toward Multimodal Image-to-Image Translation.
Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman.
UC Berkeley and Adobe Research
In Neural Information Processing Systems, 2017.

Example results

Other Implementations

Prerequisites

  • Linux or macOS
  • Python 3
  • CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

  • Clone this repo:
git clone -b master --single-branch https://github.com/junyanz/BicycleGAN.git
cd BicycleGAN

For pip users:

bash ./scripts/install_pip.sh

For conda users:

bash ./scripts/install_conda.sh

Use a Pre-trained Model

  • Download some test photos (e.g., edges2shoes):
bash ./datasets/download_testset.sh edges2shoes
  • Download a pre-trained model (e.g., edges2shoes):
bash ./pretrained_models/download_model.sh edges2shoes
  • Generate results with the model
bash ./scripts/test_edges2shoes.sh

The test results will be saved to an HTML file: ./results/edges2shoes/val/index.html.

  • Generate results with synchronized latent vectors
bash ./scripts/test_edges2shoes.sh --sync

Results can be found at ./results/edges2shoes/val_sync/index.html.
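
For reference, the --sync option conceptually amounts to sampling one set of latent codes up front and reusing it for every input image, instead of drawing fresh codes per input. A minimal sketch of that idea, assuming the model.get_z_random and model.test API quoted in the comments below (the loop structure and n_samples are illustrative, not the exact contents of test.py):

    # Sketch: reuse one batch of latent codes across all inputs (what --sync does).
    n_samples = 5
    z_samples = model.get_z_random(n_samples, opt.nz)  # sample once, up front
    for data in dataset:
        model.set_input(data)
        for i in range(n_samples):
            # the same z_samples[i] is applied to every input image
            real_A, fake_B, real_B = model.test(z_samples[i:i + 1], encode=False)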

Generate Morphing Videos

  • We can also produce a morphing video similar to this GIF and YouTube video.
bash ./scripts/video_edges2shoes.sh

Results can be found at ./videos/edges2shoes/.
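
The morphing effect comes from interpolating between latent codes and decoding each intermediate code. A minimal sketch of the idea, again assuming the model.get_z_random and model.test API quoted in the comments below (the frame count and plain linear interpolation are illustrative assumptions, not necessarily what the script uses):

    import torch

    # Sketch: morph between two latent codes by linear interpolation.
    num_frames = 30
    z0 = model.get_z_random(1, opt.nz)
    z1 = model.get_z_random(1, opt.nz)
    frames = []
    for t in torch.linspace(0.0, 1.0, num_frames):
        z = (1 - t) * z0 + t * z1  # interpolate in latent space
        _, fake_B, _ = model.test(z, encode=False)
        frames.append(fake_B)  # collect decoded frames for the video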

Model Training

  • To train a model, download the training images (e.g., edges2shoes).
bash ./datasets/download_dataset.sh edges2shoes
  • Train a model:
bash ./scripts/train_edges2shoes.sh
  • To view training results and loss plots, run python -m visdom.server and open http://localhost:8097 in your browser. To see more intermediate results, check out ./checkpoints/edges2shoes_bicycle_gan/web/index.html.
  • See more training details for other datasets in ./scripts/train.sh.

Datasets (from pix2pix)

Download the datasets using the following script. Many of the datasets are collected by other researchers. Please cite their papers if you use the data.

  • Download the test set.
bash ./datasets/download_testset.sh dataset_name
  • Download the training and test sets.
bash ./datasets/download_dataset.sh dataset_name

Models

Download the pre-trained models with the following script.

bash ./pretrained_models/download_model.sh model_name
  • edges2shoes (edge -> photo) trained on the UT Zappos50K dataset.
  • edges2handbags (edge -> photo) trained on Amazon handbag images.
bash ./pretrained_models/download_model.sh edges2handbags
bash ./datasets/download_testset.sh edges2handbags
bash ./scripts/test_edges2handbags.sh
  • night2day (nighttime scene -> daytime scene) trained on around 100 webcams.
bash ./pretrained_models/download_model.sh night2day
bash ./datasets/download_testset.sh night2day
bash ./scripts/test_night2day.sh
  • facades (facade label -> facade photo) trained on the CMP Facades dataset.
bash ./pretrained_models/download_model.sh facades
bash ./datasets/download_testset.sh facades
bash ./scripts/test_facades.sh
  • maps (map photo -> aerial photo) trained on 1096 training images scraped from Google Maps.
bash ./pretrained_models/download_model.sh maps
bash ./datasets/download_testset.sh maps
bash ./scripts/test_maps.sh

Metrics

Figure 6 of the paper shows the realism vs. diversity of our method.

  • Realism: We use the Amazon Mechanical Turk (AMT) Real vs. Fake test from this repository, first introduced in this work.

  • Diversity: For each input image, we produce 20 translations by randomly sampling 20 z vectors. We compute the LPIPS distance between consecutive pairs to get 19 paired distances. You can compute this by putting the 20 images into a directory and using this script (note that we used version 0.0 rather than the default 0.1, so use the flag -v 0.0). This is done for 100 input images, yielding 1900 distances in total (100 images × 19 paired distances each), which are averaged together. A larger number means higher diversity.
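
For reference, a minimal sketch of this diversity computation using the lpips package from PyPI (the directory path and image loading are illustrative; only the consecutive-pairs averaging and the version='0.0' choice follow the description above):

    import glob
    import lpips
    import torch
    from PIL import Image
    import torchvision.transforms as T

    # Sketch: average LPIPS distance over consecutive pairs of 20 samples.
    loss_fn = lpips.LPIPS(net='alex', version='0.0')  # v0.0, not the default 0.1
    to_tensor = T.Compose([T.ToTensor(), T.Normalize([0.5] * 3, [0.5] * 3)])  # map to [-1, 1]

    paths = sorted(glob.glob('./samples/*.png'))  # the 20 translations of one input
    imgs = [to_tensor(Image.open(p).convert('RGB')).unsqueeze(0) for p in paths]
    with torch.no_grad():
        dists = [loss_fn(imgs[i], imgs[i + 1]).item() for i in range(len(imgs) - 1)]
    print(sum(dists) / len(dists))  # 19 paired distances, averaged for this input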

Citation

If you find this useful for your research, please cite our paper:

@inproceedings{zhu2017toward,
  title={Toward multimodal image-to-image translation},
  author={Zhu, Jun-Yan and Zhang, Richard and Pathak, Deepak and Darrell, Trevor and Efros, Alexei A and Wang, Oliver and Shechtman, Eli},
  booktitle={Advances in Neural Information Processing Systems},
  year={2017}
}

If you use modules from the CycleGAN or pix2pix papers, please also cite:

@inproceedings{CycleGAN2017,
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
  year={2017}
}


@inproceedings{isola2017image,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
  year={2017}
}

Acknowledgements

This code borrows heavily from the pytorch-CycleGAN-and-pix2pix repository.

Comments
  • skip this point data_size = 1

    Dear sir, when I run the script bash ./scripts/train_edges2shoes.sh, the following RuntimeError occurs:

    (epoch: 1, iters: 49400, time: 0.311) , z_encoded_mag: 0.577, G_total: 4.293, G_L1_encoded: 2.367, z_L1: 0.259, KL: 0.076, G_GAN: 1.002, D_GAN: 0.498, G_GAN2: 0.589, D_GAN2: 0.988
    (epoch: 1, iters: 49600, time: 0.322) , z_encoded_mag: 0.409, G_total: 2.001, G_L1_encoded: 0.385, z_L1: 0.302, KL: 0.069, G_GAN: 0.794, D_GAN: 0.960, G_GAN2: 0.450, D_GAN2: 1.137
    (epoch: 1, iters: 49800, time: 0.311) , z_encoded_mag: 0.939, G_total: 3.441, G_L1_encoded: 1.597, z_L1: 0.373, KL: 0.079, G_GAN: 0.833, D_GAN: 0.774, G_GAN2: 0.560, D_GAN2: 1.015
    skip this point data_size = 1
    Traceback (most recent call last):
      File "./train.py", line 28, in <module>
        model.update_G()
      File "/home/rharad/junyanz/BicycleGAN/models/bicycle_gan_model.py", line 148, in update_G
        self.backward_EG()
      File "/home/rharad/junyanz/BicycleGAN/models/bicycle_gan_model.py", line 114, in backward_EG
        self.loss_G.backward(retain_graph=True)
      File "/home/rharad/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "/home/rharad/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
        variables, grad_variables, retain_graph)
    RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

    How can I solve this problem?

    opened by rharadgithub 15
  • An Error in train or test: `RuntimeError: output with shape [1, 256, 256] doesn't match the broadcast shape [3, 256, 256]`

    Hello! While using the code, I ran into a confusing bug:
    RuntimeError: output with shape [1, 256, 256] doesn't match the broadcast shape [3, 256, 256]. I don't know why this error appears; can you help me fix it? In any case, thank you for your great project!

    opened by shazhongcheng 13
  • How to reproduce the data of LPIPS distance?

    Your work gets surprising results, and I would like to reproduce the LPIPS distance numbers that you list in Figure 6. Given one input image, you sample 19 outputs. For every input (map), do you calculate the LPIPS distance between the given image (map) and the corresponding 19 samples (satellite)? After that, do you sum those 19 distances and take the average? Is it the same for the other 99 input images in your experiment? I'm confused about this and look forward to your reply, thank you!

    opened by WorkingCC 10
  • Learn diversity

    Hi Junyan,

    Thanks for your impressive work. Recently, I applied a BicycleGAN-like framework to my GAN model, hoping it would learn to generate diverse results. However, my model does not seem sensitive to the latent code at all. The outputs look almost the same with different latent codes, in both the training and validation stages, even when I enlarge the weights of the KL loss and the L1 loss on the latent code. My KL loss becomes very small during training, but the L1 loss only converges to around 0.7. Do you remember the converged values of these losses during your training? And do you have any idea or advice? Thank you so much!

    Looking forward to your reply.

    Best, Lai

    opened by remega 9
  • Issue while running test.py

    Hi, I am getting the following error when running the code. I am using Python 3.6 and have installed the latest PyTorch version (1.0). [screenshot from 2019-03-01 19-35-51]

    Could you suggest what the issue might be?

    opened by a7b23 8
  • narrow tensor to zero elements fails

    When I ran the code on a single GPU, I got an error:

    ...
    RuntimeError: start (0) + length (0) exceeds dimension size (0). (narrow at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/ATen/native/TensorShape.cpp:157)
    frame #0: at::Type::narrow(at::Tensor const&, long, long, long) const + 0x80 (0x7f5e4df3de80 in /data/yahui/anaconda2/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.5/site-packages/torch/lib/libcaffe2.so)
    ...
    

    I have checked the installation and configuration, and both are correct. Do you have any ideas about this problem?

    opened by yhlleo 6
  • about multi-gpu support

    Hi, thanks for your amazing work. I have a problem with multi-GPU support. The original batch size is 2: one real image and one random one, i.e., effectively one pair. If we want to use multiple GPUs for faster training, the batch size should be larger, but simply changing the batch size causes an error. Is it easy to modify the code for this purpose?

    opened by pipipopo 6
  • Training on Piano Roll data

    Hey Jun-Yan, thanks for putting this repo together. I'm trying to train it on piano roll data and have been seeing unexpected behavior: the generator outputs the same image even though the conditions, i.e., the noise vector and real A, change.

    Any thoughts on what it could be? I've added the loss log, options, output images during training and output images during inference.

    loss_log.txt opt.txt

    Model outputs during training (fake_b_encoded, fake_b_random, real_a_encoded, real_b_encoded): [images]

    model.set_input(data)                          # load the current test pair
    encode = False                                 # sample z instead of encoding real_B
    z_samples = model.get_z_random(1, opt.nz)      # draw one random latent code
    real_A, fake_B, real_B = model.test(z_samples, encode=encode)

    Model outputs during inference (real_a, real_b, fake_b): [image]

    opened by rafaelvalle 6
  • About val data

    Hi, thanks for your amazing work. I trained the network with my own database, and it works well. However, I don't have much data for training (just about 700 images). My questions are:

    1. How much data should I put in the val folder? Or is the val folder just for testing the network?
    2. Besides the paired data, I also have some unpaired data (many B but few A; training direction: A2B). Let me know if you are considering an algorithm for such a semi-supervised situation.

    Thanks.

    opened by holylone 5
  • Multiple inputs network

    Hi, thanks for sharing this amazing work! I have tried to apply BicycleGAN to MRI image translation tasks, and it works well! Now I am trying to change the network into a multiple-input network. My idea is that, given multiple corresponding inputs, the output will be more realistic and accurate. Do you think this is possible to achieve with BicycleGAN?

    opened by Grenadee 5
  • out of memory

    When I run the command bash ./scripts/train_edges2shoes.sh, I get the following error: RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCStorage.cu:58. How can I solve this problem?

    opened by zxt-triumph 5
  • Regarding Latent Space Interpolation

    Hi @junyanz, while testing the model I would like to have some control over the diversity of the generated images. Is there any way to interpolate the latent vector to regulate the outputs, similar to StyleGAN?

    opened by AvirupJU 0
  • TypeError: __init__() got an unexpected keyword argument 'nl_layer'

    When I set the option --netD or --netD2 to 'basic_128' or 'basic_256', the function define_D should pass the parameter nl_layer, but it reports an error. [screenshot]

    The branching part of the define_D function looks like this:

    if netD == 'basic_128':
        net = D_NLayers(input_nc, ndf, n_layers=2, norm_layer=norm_layer, nl_layer=nl_layer)
    elif netD == 'basic_256':
        net = D_NLayers(input_nc, ndf, n_layers=3, norm_layer=norm_layer, nl_layer=nl_layer)
    elif netD == 'basic_128_multi':
        net = D_NLayersMulti(input_nc=input_nc, ndf=ndf, n_layers=2, norm_layer=norm_layer, num_D=num_Ds)
    elif netD == 'basic_256_multi':
        net = D_NLayersMulti(input_nc=input_nc, ndf=ndf, n_layers=3, norm_layer=norm_layer, num_D=num_Ds)
    else:
        raise NotImplementedError('Discriminator model name [%s] is not recognized' % net)
    return init_net(net, init_type, init_gain, gpu_ids)
    opened by SiriusDanica666 0
  • diversity question

    Hello, I trained with shoe sketches of two different line thicknesses (half of each) and found that the diversity of the generated images is reduced. Do you have any suggestions to improve the diversity?

    opened by ZhiHong-w 0
  • How to train on large images?

    I changed load_size and crop_size to 512 in train_facades.sh and get the following error. I want to train on a larger image size; how can I do that?

    Traceback (most recent call last):
      File "./train.py", line 48, in <module>
        model.optimize_parameters()   # calculate loss functions, get gradients, update network weights
      File "gxl/BicycleGAN/models/bicycle_gan_model.py", line 209, in optimize_parameters
        self.forward()
      File "gxl/BicycleGAN/models/bicycle_gan_model.py", line 106, in forward
        self.z_encoded, self.mu, self.logvar = self.encode(self.real_B_encoded)
      File "gxl/BicycleGAN/models/bicycle_gan_model.py", line 82, in encode
        mu, logvar = self.netE.forward(input_image)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "gxl/BicycleGAN/models/networks.py", line 647, in forward
        output = self.fc(conv_flat)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
        return F.linear(input, self.weight, self.bias)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1690, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: mat1 dim 1 must match mat2 dim 0
    
    opened by universewill 0
  • Regarding Training your Own Images

    Hi junyanz,

    How are you! I hope all is fine.

    I want to train this algorithm on my own images and was wondering how to do that. 1) What pre-processing steps do I need to take here? Can you please guide me? 2) The model is not saving to any directory; can you please let me know where the trained model is saved (for our own images)?

    opened by kuruvilla2087 4
  • Question: Do you need two separate discriminators?

    I get similar-ish results when using the same discriminator for both cVAE-GAN and cLR-GAN. Is this a happy coincidence and a stroke of luck? Is there some theoretical backing for having two discriminators?

    opened by pfeatherstone 2