MDMM - Learning multi-domain multi-modality I2I translation


Multi-Domain Multi-Modality I2I translation

PyTorch implementation of multi-modality I2I translation for multiple domains. The project is an extension of "Diverse Image-to-Image Translation via Disentangled Representations" (DRIT, https://arxiv.org/abs/1808.00948), ECCV 2018. With the disentangled representation framework, we can learn diverse image-to-image translation among multiple domains.

Contact: Hsin-Ying Lee ([email protected]) and Hung-Yu Tseng ([email protected])

Example Results

Prerequisites

Usage

  • Training
python train.py --dataroot DATAROOT --name NAME --num_domains NUM_DOMAINS --display_dir DISPLAY_DIR --result_dir RESULT_DIR --isDcontent
  • Testing
python test.py --dataroot DATAROOT --name NAME --num_domains NUM_DOMAINS --out_dir OUT_DIR --resume MODEL_DIR --num NUM_PER_IMG
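For example, training on a three-domain dataset such as art could look like the following (the dataset path, experiment name, and output directories here are placeholders for illustration, not values taken from the repository):
python train.py --dataroot ../datasets/art --name art_run --num_domains 3 --display_dir ../logs --result_dir ../results --isDcontent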

Datasets

We validate our model on two datasets:

  • art: Contains three domains: real images, Monet images, and ukiyo-e images. Data can be downloaded from the CycleGAN website.
  • weather: Contains four domains: sunny, cloudy, snowy, and foggy. Data is randomly selected from the Image2Weather dataset.

The different domains in a dataset should be placed in folders "trainA", "trainB", ..., named in alphabetical order (see the sketch below).
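For example, a three-domain dataset rooted at DATAROOT would presumably be organized as follows (only the trainA/trainB/... naming is prescribed above; the per-folder contents are an illustration):
DATAROOT/
  trainA/   images of the first domain
  trainB/   images of the second domain
  trainC/   images of the third domain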

Models

  • Download the pretrained model trained on the art dataset:
bash ./models/download_model.sh art
  • Download the pretrained model trained on the weather dataset:
bash ./models/download_model.sh weather
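As a usage sketch, testing with one of the downloaded models might look like the following (the dataset path, model directory, output directory, and sample count are placeholders, not values from the repository; check the download script for the actual checkpoint location):
python test.py --dataroot ../datasets/art --name art_test --num_domains 3 --out_dir ../outputs --resume ./models/art --num 5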

Note

  • The feature transformation mode (i.e., concat 0) is not fully tested, since neither the art nor the weather dataset requires shape variation.
  • The hyper-parameters matter and are task-dependent; they have not been carefully tuned yet.
  • Feel free to contact the authors with any potential improvements to the code.

Paper

Diverse Image-to-Image Translation via Disentangled Representations
Hsin-Ying Lee*, Hung-Yu Tseng*, Jia-Bin Huang, Maneesh Kumar Singh, and Ming-Hsuan Yang
European Conference on Computer Vision (ECCV), 2018 (oral) (* equal contribution)

Please cite our paper if you find the code or dataset useful for your research.

@inproceedings{DRIT,
  author = {Lee, Hsin-Ying and Tseng, Hung-Yu and Huang, Jia-Bin and Singh, Maneesh Kumar and Yang, Ming-Hsuan},
  booktitle = {European Conference on Computer Vision},
  title = {Diverse Image-to-Image Translation via Disentangled Representations},
  year = {2018}
}

Comments
  • Normalization absence

    Hello! I found the following in LeakyReLUConv2d:

    class LeakyReLUConv2d(nn.Module):
      def __init__(self, ..., norm='None', ...):
        ....
        if 'norm' == 'Instance': 
          model += [nn.InstanceNorm2d(n_out, affine=False)]
        ...
    

    https://github.com/HsinYingLee/MDMM/blob/master/networks.py#L362

    It seems that normalization is never applied in the LeakyReLUConv2d block.
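
    Presumably the intended check compares the constructor's norm parameter rather than the string literal, e.g.:

    if norm == 'Instance':
      model += [nn.InstanceNorm2d(n_out, affine=False)]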

    Does this affect model performance, since LeakyReLUConv2d is present in the multi-domain encoder and the discriminators?

    Were the best results reported in the paper obtained with instance normalization turned on?

    Best Regards, Aleksei Silvestrov

    opened by cohimame 0
  • Some questions about the code

    Hi, I like this work very much and I think it is really cool. But when I tried to use an image in the target domain to decide the attribute of the results, I found that the function "test_forward_transfer(self, image, image_trg, c_trg)" in the file named "model.py" takes the two arguments "image_trg" and "c_trg", but instead of using these arguments it uses "self.image_trg" and "self.c_trg", which are not defined in the corresponding class "MD_multi()". I don't know how to tackle this problem, so I am taking the liberty of asking you for help. Thanks
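
    A minimal sketch of the apparent intent, assuming the passed arguments were simply meant to replace the undefined attributes (the encoder and generator names and call signatures below follow the DRIT-style architecture and are assumptions, not verified against model.py):

    def test_forward_transfer(self, image, image_trg, c_trg):
      # use the arguments directly instead of the undefined self.image_trg / self.c_trg
      z_content = self.enc_c.forward(image)              # content code of the source image
      mu, logvar = self.enc_a.forward(image_trg, c_trg)  # attribute code of the target image
      output = self.gen.forward(z_content, mu, c_trg)    # decode source content with the target attribute
      return output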

    opened by ForawardStar 0