BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer
Project Page | Paper | Video
State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity. We introduce a new unsupervised translation network, BalaGAN, specifically designed to tackle the domain imbalance problem. We leverage the latent modalities of the richer domain to turn the image-to-image translation problem between two imbalanced domains into a balanced, multi-class, conditional translation problem that more closely resembles the style-transfer setting. Specifically, we analyze the source domain and learn a decomposition of it into a set of latent modes or classes, without any supervision. This leaves us with a multitude of balanced cross-domain translation tasks between all pairs of classes, including the target domain. During inference, the trained network takes as input a source image, as well as a reference or style image from one of the modes as a condition, and produces an image that resembles the source at the pixel level but shares the mode of the reference. We show that employing modalities within the dataset improves the quality of the translated images, and that BalaGAN outperforms strong baselines of both unconditioned and style-transfer-based image-to-image translation methods in terms of image quality and diversity.
Prerequisites
- Linux (may work on Windows and macOS but was not tested)
- CUDA 10.1
- Anaconda3
- PyTorch (tested with >= 1.5.0)
- tensorboardX
- faiss-gpu
- opencv-python
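As a rough reference, the prerequisites can be installed with Anaconda along these lines; the environment name and pinned versions below are illustrative assumptions and not prescribed by the repository:

```bash
# Illustrative environment setup (environment name and versions are assumptions, not from the repo)
conda create -n balagan python=3.7
conda activate balagan
conda install pytorch=1.5 torchvision cudatoolkit=10.1 -c pytorch
pip install tensorboardX faiss-gpu opencv-python
```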
Training
Data Preparation
A dataset directory should have the following structure:
dataset
├── train
│   ├── A
│   └── B
└── test
    ├── A
    └── B
where A is the source domain, and B is the target domain.
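For example, the expected layout can be created and populated as follows (the source image paths are placeholders):

```bash
# Create the expected directory layout and copy images into it (paths are placeholders)
mkdir -p dataset/train/A dataset/train/B dataset/test/A dataset/test/B
cp /path/to/domain_A_train/* dataset/train/A/
cp /path/to/domain_B_train/* dataset/train/B/
cp /path/to/domain_A_test/*  dataset/test/A/
cp /path/to/domain_B_test/*  dataset/test/B/
```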
Train
The main training script is train.py. It receives several command-line arguments; for more details, please see the file. The most important argument is the path to a config file. An example of such a file is provided in configs/dog2wolf.yaml.
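A typical invocation looks roughly like the following; note that the name of the config-path argument is an assumption here, so please verify it against the argument parser in train.py:

```bash
# Sketch of a training run; --config is assumed, check train.py for the exact flag name
python train.py --config configs/dog2wolf.yaml --exp_name dog2wolf
```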
Tracking The Training
For each experiment, a dedicated directory is created, and all the outputs are saved there. An experiment directory contains the following:
- logs directory with a TensorBoard file containing the losses recorded during training, as well as images produced by the model (see the TensorBoard command after this list).
- images directory, in which the produced images are saved as files.
- checkpoints directory, in which checkpoints are saved during training.
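To inspect the logged losses and images, you can point TensorBoard at the experiment's logs directory (the experiment path below is a placeholder):

```bash
tensorboard --logdir <experiment_dir>/logs
```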
We highly recommend using trains to track experiments!
Resume An Experiment
To resume an experiment, pass the --resume flag to the main training script. When this flag is provided, the state of the latest experiment with the same --exp_name is loaded.
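For example (the --config flag name is again an assumption; --resume and --exp_name are the flags described above):

```bash
# Resume the latest state of an existing experiment
python train.py --config configs/dog2wolf.yaml --exp_name dog2wolf --resume
```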
Pretrained Models
Coming soon...
Citation
If you use this code for your research, please cite our paper BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer
@article{patashnik2020balagan,
  title={BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer},
  author={Or Patashnik and Dov Danon and Hao Zhang and Daniel Cohen-Or},
  journal={arXiv preprint arXiv:2010.02036},
  year={2020}
}