Toward Multimodal Image-to-Image Translation

Overview





BicycleGAN

Project Page | Paper | Video

PyTorch implementation for multimodal image-to-image translation. For example, given the same night image, our model is able to synthesize possible day images with different types of lighting, sky, and clouds. Training requires paired data.

Note: The current software works well with PyTorch 0.4.1+. Check out the older branch that supports PyTorch 0.1-0.3.

Toward Multimodal Image-to-Image Translation.
Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman.
UC Berkeley and Adobe Research
In Neural Information Processing Systems, 2017.

Example results

Other Implementations

Prerequisites

  • Linux or macOS
  • Python 3
  • CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

  • Clone this repo:
git clone -b master --single-branch https://github.com/junyanz/BicycleGAN.git
cd BicycleGAN

For pip users:

bash ./scripts/install_pip.sh

For conda users:

bash ./scripts/install_conda.sh

Use a Pre-trained Model

  • Download some test photos (e.g., edges2shoes):
bash ./datasets/download_testset.sh edges2shoes
  • Download a pre-trained model (e.g., edges2shoes):
bash ./pretrained_models/download_model.sh edges2shoes
  • Generate results with the model
bash ./scripts/test_edges2shoes.sh

The test results will be saved to an HTML file: ./results/edges2shoes/val/index.html.

  • Generate results with synchronized latent vectors
bash ./scripts/test_edges2shoes.sh --sync

Results can be found at ./results/edges2shoes/val_sync/index.html.
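
For reference, the --sync option conceptually amounts to sampling one set of latent codes up front and reusing it for every input image, instead of drawing fresh codes per input. A minimal sketch of that idea, assuming the model.get_z_random and model.test API quoted in the comments below (the loop structure and n_samples are illustrative, not the exact contents of test.py):

    # Sketch: reuse one batch of latent codes across all inputs (what --sync does).
    n_samples = 5
    z_samples = model.get_z_random(n_samples, opt.nz)  # sample once, up front
    for data in dataset:
        model.set_input(data)
        for i in range(n_samples):
            # the same z_samples[i] is applied to every input image
            real_A, fake_B, real_B = model.test(z_samples[i:i + 1], encode=False)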

Generate Morphing Videos

  • We can also produce a morphing video similar to this GIF and YouTube video.
bash ./scripts/video_edges2shoes.sh

Results can be found at ./videos/edges2shoes/.
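
The morphing effect comes from interpolating between latent codes and decoding each intermediate code. A minimal sketch of the idea, again assuming the model.get_z_random and model.test API quoted in the comments below (the frame count and plain linear interpolation are illustrative assumptions, not necessarily what the script uses):

    import torch

    # Sketch: morph between two latent codes by linear interpolation.
    num_frames = 30
    z0 = model.get_z_random(1, opt.nz)
    z1 = model.get_z_random(1, opt.nz)
    frames = []
    for t in torch.linspace(0.0, 1.0, num_frames):
        z = (1 - t) * z0 + t * z1  # interpolate in latent space
        _, fake_B, _ = model.test(z, encode=False)
        frames.append(fake_B)  # collect decoded frames for the video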

Model Training

  • To train a model, download the training images (e.g., edges2shoes).
bash ./datasets/download_dataset.sh edges2shoes
  • Train a model:
bash ./scripts/train_edges2shoes.sh
  • To view training results and loss plots, run python -m visdom.server and open http://localhost:8097 in your browser. To see more intermediate results, check out ./checkpoints/edges2shoes_bicycle_gan/web/index.html.
  • See more training details for other datasets in ./scripts/train.sh.

Datasets (from pix2pix)

Download the datasets using the following script. Many of the datasets are collected by other researchers. Please cite their papers if you use the data.

  • Download the test set.
bash ./datasets/download_testset.sh dataset_name
  • Download the training and test sets.
bash ./datasets/download_dataset.sh dataset_name

Models

Download the pre-trained models with the following script.

bash ./pretrained_models/download_model.sh model_name
  • edges2shoes (edge -> photo) trained on the UT Zappos50K dataset.
  • edges2handbags (edge -> photo) trained on Amazon handbag images.
bash ./pretrained_models/download_model.sh edges2handbags
bash ./datasets/download_testset.sh edges2handbags
bash ./scripts/test_edges2handbags.sh
  • night2day (nighttime scene -> daytime scene) trained on around 100 webcams.
bash ./pretrained_models/download_model.sh night2day
bash ./datasets/download_testset.sh night2day
bash ./scripts/test_night2day.sh
  • facades (facade label -> facade photo) trained on the CMP Facades dataset.
bash ./pretrained_models/download_model.sh facades
bash ./datasets/download_testset.sh facades
bash ./scripts/test_facades.sh
  • maps (map photo -> aerial photo) trained on 1096 training images scraped from Google Maps.
bash ./pretrained_models/download_model.sh maps
bash ./datasets/download_testset.sh maps
bash ./scripts/test_maps.sh

Metrics

Figure 6 of the paper shows the realism vs. diversity of our method.

  • Realism: We use the Amazon Mechanical Turk (AMT) Real vs. Fake test from this repository, first introduced in this work.

  • Diversity: For each input image, we produce 20 translations by randomly sampling 20 z vectors. We compute the LPIPS distance between consecutive pairs to get 19 paired distances. You can compute this by putting the 20 images into a directory and using this script (note that we used version 0.0 rather than the default 0.1, so use the flag -v 0.0). This is done for 100 input images, yielding 1900 distances in total (100 images × 19 paired distances each), which are averaged together. A larger number means higher diversity.
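
For reference, a minimal sketch of this diversity computation using the lpips package from PyPI (the directory path and image loading are illustrative; only the consecutive-pairs averaging and the version='0.0' choice follow the description above):

    import glob
    import lpips
    import torch
    from PIL import Image
    import torchvision.transforms as T

    # Sketch: average LPIPS distance over consecutive pairs of 20 samples.
    loss_fn = lpips.LPIPS(net='alex', version='0.0')  # v0.0, not the default 0.1
    to_tensor = T.Compose([T.ToTensor(), T.Normalize([0.5] * 3, [0.5] * 3)])  # map to [-1, 1]

    paths = sorted(glob.glob('./samples/*.png'))  # the 20 translations of one input
    imgs = [to_tensor(Image.open(p).convert('RGB')).unsqueeze(0) for p in paths]
    with torch.no_grad():
        dists = [loss_fn(imgs[i], imgs[i + 1]).item() for i in range(len(imgs) - 1)]
    print(sum(dists) / len(dists))  # 19 paired distances, averaged for this input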

Citation

If you find this useful for your research, please cite our paper:

@inproceedings{zhu2017toward,
  title={Toward multimodal image-to-image translation},
  author={Zhu, Jun-Yan and Zhang, Richard and Pathak, Deepak and Darrell, Trevor and Efros, Alexei A and Wang, Oliver and Shechtman, Eli},
  booktitle={Advances in Neural Information Processing Systems},
  year={2017}
}

If you use modules from the CycleGAN or pix2pix papers, please also cite:

@inproceedings{CycleGAN2017,
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
  year={2017}
}


@inproceedings{isola2017image,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
  year={2017}
}

Acknowledgements

This code borrows heavily from the pytorch-CycleGAN-and-pix2pix repository.

Comments
  • skip this point data_size = 1

    Dear sir, when I run the script bash ./scripts/train_edges2shoes.sh, the following RuntimeError occurs:

    (epoch: 1, iters: 49400, time: 0.311) , z_encoded_mag: 0.577, G_total: 4.293, G_L1_encoded: 2.367, z_L1: 0.259, KL: 0.076, G_GAN: 1.002, D_GAN: 0.498, G_GAN2: 0.589, D_GAN2: 0.988
    (epoch: 1, iters: 49600, time: 0.322) , z_encoded_mag: 0.409, G_total: 2.001, G_L1_encoded: 0.385, z_L1: 0.302, KL: 0.069, G_GAN: 0.794, D_GAN: 0.960, G_GAN2: 0.450, D_GAN2: 1.137
    (epoch: 1, iters: 49800, time: 0.311) , z_encoded_mag: 0.939, G_total: 3.441, G_L1_encoded: 1.597, z_L1: 0.373, KL: 0.079, G_GAN: 0.833, D_GAN: 0.774, G_GAN2: 0.560, D_GAN2: 1.015
    skip this point data_size = 1
    Traceback (most recent call last):
      File "./train.py", line 28, in <module>
        model.update_G()
      File "/home/rharad/junyanz/BicycleGAN/models/bicycle_gan_model.py", line 148, in update_G
        self.backward_EG()
      File "/home/rharad/junyanz/BicycleGAN/models/bicycle_gan_model.py", line 114, in backward_EG
        self.loss_G.backward(retain_graph=True)
      File "/home/rharad/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "/home/rharad/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
        variables, grad_variables, retain_graph)
    RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

    How can I solve this problem?

    opened by rharadgithub 15
  • An Error in train or test: `RuntimeError: output with shape [1, 256, 256] doesn't match the broadcast shape [3, 256, 256]`

    Hello! While using the code, I ran into a confusing bug:
    RuntimeError: output with shape [1, 256, 256] doesn't match the broadcast shape [3, 256, 256]. I don't know why this error appears; can you help me fix it? In any case, thank you for your great project!

    opened by shazhongcheng 13
  • How to reproduce the data of LPIPS distance?

    Your work gets surprising results, and I would like to reproduce the LPIPS distance numbers that you list in Figure 6. Given one input image, you sample 19 outputs. For every input (map), do you calculate the LPIPS distance between the given image (map) and the corresponding 19 samples (satellite)? After that, do you sum those 19 distances and take the average? Is it the same for the other 99 input images in your experiment? I'm confused about this and look forward to your reply, thank you!

    opened by WorkingCC 10
  • Learn diversity

    Hi Junyan,

    Thanks for your impressive work. Recently, I applied a BicycleGAN-like framework to my GAN model, hoping it would learn to generate diverse results. However, my model does not seem sensitive to the latent code at all. The outputs look almost the same with different latent codes, in both the training and validation stages, even when I enlarge the weights of the KL loss and the L1 loss on the latent code. My KL loss becomes very small during training, but the L1 loss only converges to around 0.7. Do you remember the converged values of these losses during your training? And do you have any idea or advice? Thank you so much!

    Looking forward to your reply.

    Best, Lai

    opened by remega 9
  • Issue while running test.py

    Hi, I am getting the following error when running the code. I am using Python 3.6 and have installed the latest PyTorch version (1.0). [screenshot from 2019-03-01 19-35-51]

    Could you suggest what the issue might be?

    opened by a7b23 8
  • narrow tensor to zero elements fails

    When I ran the code on a single GPU, I got an error:

    ...
    RuntimeError: start (0) + length (0) exceeds dimension size (0). (narrow at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/ATen/native/TensorShape.cpp:157)
    frame #0: at::Type::narrow(at::Tensor const&, long, long, long) const + 0x80 (0x7f5e4df3de80 in /data/yahui/anaconda2/envs/pytorch-CycleGAN-and-pix2pix/lib/python3.5/site-packages/torch/lib/libcaffe2.so)
    ...
    

    I have checked the installation and configuration, and both are correct. Do you have any ideas about this problem?

    opened by yhlleo 6
  • about multi-gpu support

    Hi, thanks for your amazing work. I have a problem with multi-GPU support. The original batch size is 2: one real image and one random one, i.e., effectively one pair. If we want to use multiple GPUs for faster training, the batch size should be larger, but simply changing the batch size causes an error. Is it easy to modify the code for this purpose?

    opened by pipipopo 6
  • Training on Piano Roll data

    Hey Jun-Yan, thanks for putting this repo together. I'm trying to train it on piano roll data and have been seeing unexpected behavior: the generator outputs the same image even though the conditions, i.e., the noise vector and real A, change.

    Any thoughts on what it could be? I've added the loss log, options, output images during training and output images during inference.

    loss_log.txt opt.txt

    Model outputs during training (fake_b_encoded, fake_b_random, real_a_encoded, real_b_encoded): [images]

    model.set_input(data)                          # load the current test pair
    encode = False                                 # sample z instead of encoding real_B
    z_samples = model.get_z_random(1, opt.nz)      # draw one random latent code
    real_A, fake_B, real_B = model.test(z_samples, encode=encode)

    Model outputs during inference (real_a, real_b, fake_b): [image]

    opened by rafaelvalle 6
  • About val data

    Hi, thanks for your amazing work. I trained the network with my own database, and it works well. However, I don't have much data for training (just about 700 images). My questions are:

    1. How much data should I put in the val folder? Or is the val folder just for testing the network?
    2. Besides the paired data, I also have some unpaired data (many B but few A; training direction: A2B). Let me know if you are considering an algorithm for such a semi-supervised situation.

    Thanks.

    opened by holylone 5
  • Multiple inputs network

    Hi, thanks for sharing this amazing work! I have tried to apply BicycleGAN to MRI image translation tasks, and it works well! Now I am trying to change the network into a multiple-input network. My idea is that, given multiple corresponding inputs, the output will be more realistic and accurate. Do you think this is possible to achieve with BicycleGAN?

    opened by Grenadee 5
  • out of memory

    When I run the command bash ./scripts/train_edges2shoes.sh, I get the following error: RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCStorage.cu:58. How can I solve this problem?

    opened by zxt-triumph 5
  • Regarding Latent Space Interpolation

    Hi @junyanz, while testing the model I would like to have some control over the diversity of the generated images. Is there any way to interpolate the latent vector to regulate the outputs, similar to StyleGAN?

    opened by AvirupJU 0
  • TypeError: __init__() got an unexpected keyword argument 'nl_layer'

    When I set the option --netD or --netD2 to 'basic_128' or 'basic_256', the function define_D should pass the parameter nl_layer, but it reports an error. [screenshot]

    The branching part of the define_D function looks like this:

    if netD == 'basic_128':
        net = D_NLayers(input_nc, ndf, n_layers=2, norm_layer=norm_layer, nl_layer=nl_layer)
    elif netD == 'basic_256':
        net = D_NLayers(input_nc, ndf, n_layers=3, norm_layer=norm_layer, nl_layer=nl_layer)
    elif netD == 'basic_128_multi':
        net = D_NLayersMulti(input_nc=input_nc, ndf=ndf, n_layers=2, norm_layer=norm_layer, num_D=num_Ds)
    elif netD == 'basic_256_multi':
        net = D_NLayersMulti(input_nc=input_nc, ndf=ndf, n_layers=3, norm_layer=norm_layer, num_D=num_Ds)
    else:
        raise NotImplementedError('Discriminator model name [%s] is not recognized' % net)
    return init_net(net, init_type, init_gain, gpu_ids)
    opened by SiriusDanica666 0
  • diversity question

    Hello, I trained with shoe sketches of two different line thicknesses (half of each) and found that the diversity of the generated images is reduced. Do you have any suggestions to improve the diversity?

    opened by ZhiHong-w 0
  • How to train on large images?

    I changed load_size and crop_size to 512 in train_facades.sh and get the following error. I want to train on a larger image size; how can I do that?

    Traceback (most recent call last):
      File "./train.py", line 48, in <module>
        model.optimize_parameters()   # calculate loss functions, get gradients, update network weights
      File "gxl/BicycleGAN/models/bicycle_gan_model.py", line 209, in optimize_parameters
        self.forward()
      File "gxl/BicycleGAN/models/bicycle_gan_model.py", line 106, in forward
        self.z_encoded, self.mu, self.logvar = self.encode(self.real_B_encoded)
      File "gxl/BicycleGAN/models/bicycle_gan_model.py", line 82, in encode
        mu, logvar = self.netE.forward(input_image)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "gxl/BicycleGAN/models/networks.py", line 647, in forward
        output = self.fc(conv_flat)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
        return F.linear(input, self.weight, self.bias)
      File "miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1690, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: mat1 dim 1 must match mat2 dim 0
    
    opened by universewill 0
  • Regarding Training your Own Images

    Hi junyanz,

    How are you! I hope all is fine.

    I want to train this algorithm on my own images and was wondering how to do that. 1) What pre-processing steps do I need to take here? Can you please guide me? 2) The model is not saving to any directory; can you please let me know where the trained model is saved (for our own images)?

    opened by kuruvilla2087 4
  • Question: Do you need two separate discriminators?

    I get similar-ish results when using the same discriminator for both cVAE-GAN and cLR-GAN. Is this a happy coincidence and a stroke of luck? Is there some theoretical backing for having two discriminators?

    opened by pfeatherstone 2