CoMoGAN: Continuous Model-guided Image-to-Image Translation

Official repository.

Paper

CoMoGAN: continuous model-guided image-to-image translation [arXiv] | [supp] | [teaser]
Fabio Pizzati, Pietro Cerri, Raoul de Charette
Inria, Vislab Ambarella. CVPR'21 (oral)

If you find our work useful, please cite:

@inproceedings{pizzati2021comogan,
  title={{CoMoGAN}: continuous model-guided image-to-image translation},
  author={Pizzati, Fabio and Cerri, Pietro and de Charette, Raoul},
  booktitle={CVPR},
  year={2021}
}

Prerequisites

Tested with:

  • Python 3.7
  • PyTorch 1.7.1
  • CUDA 11.0
  • PyTorch Lightning 1.1.8
  • waymo_open_dataset 1.3.0
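
To sanity-check the installed versions, you can run a quick probe like the one below (this snippet is ours, not part of the repository):

import torch
import pytorch_lightning as pl

print(torch.__version__)          # expected: 1.7.1
print(torch.version.cuda)         # expected: 11.0
print(torch.cuda.is_available())  # should be True for training
print(pl.__version__)             # expected: 1.1.8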

Preparation

The repository contains training and inference code for CoMo-MUNIT trained on the Waymo Open Dataset. In the paper, we refer to this experiment as Day2Timelapse. All models have been trained on a 32GB Tesla V100 GPU. We also provide mixed-precision training, which should fit smaller GPUs as well (a mixed-precision training takes ~9GB).

Environment setup

We advise creating a new conda environment including all necessary packages. The repository includes a requirements file; please create and activate the new environment with

conda env create -f requirements.yml
conda activate comogan

Dataset preparation

First, download the Waymo Open Dataset from the official website. The dataset is organized in .tfrecord files, which we preprocess and split according to the time-of-day metadata annotations. Once you have downloaded the dataset, run the dump_waymo.py script: it reads and unpacks the .tfrecord files, resizing the images for training. Please run

python scripts/dump_waymo.py --load_path path/of/waymo/open/training --save_path /path/of/extracted/training/images
python scripts/dump_waymo.py --load_path path/of/waymo/open/validation --save_path /path/of/extracted/validation/images

Running those commands should produce a directory structure similar to:

root
  training
    Day
      seq_code_0_im_code_0.png
      seq_code_0_im_code_1.png
      ...
      seq_code_1_im_code_0.png
      ...
    Dawn/Dusk
      ...
    Night
      ...
  validation
    Day
      ...
    Dawn/Dusk
      ...
    Night
      ...
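
For reference, the time-of-day split relies on per-frame metadata stored in the .tfrecord files. Below is a minimal sketch of how this metadata can be read with waymo_open_dataset (illustrative only; dump_waymo.py additionally resizes and saves the images, and the file name below is a placeholder):

import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

# Placeholder file name; use one of your downloaded .tfrecord files.
dataset = tf.data.TFRecordDataset('segment-xxxx.tfrecord', compression_type='')
for data in dataset:
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    # 'Day', 'Dawn/Dusk' or 'Night': the annotation used for the split
    print(frame.context.stats.time_of_day)
    break  # frame.images[0].image holds the JPEG bytes of the first camera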

Pretrained weights

We release a pretrained set of weights to allow reproducibility of our results. The weights are downloadable from here. Once downloaded, unpack the file in the root of the project and test them with the inference notebook.

Training

The training routine of CoMoGAN is mainly based on the CycleGAN codebase, available with details in the official repository.

To launch a default training, run

python train.py --path_data path/to/waymo/training/dir --gpus 0

You can choose which GPUs to train on with the --gpus flag. Multi-GPU is not thoroughly tested, but it should be handled internally by PyTorch Lightning. A full training typically requires 13GB+ of GPU memory unless mixed precision is enabled. If you have a smaller GPU, please run

python train.py --path_data path/to/waymo/training/dir --gpus 0 --mixed_precision

Please note that mixed-precision trainings have been evaluated only qualitatively.
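
Under the hood, mixed precision in PyTorch Lightning amounts to setting the Trainer precision; here is a minimal sketch of how the flag plausibly maps onto the Trainer (the actual wiring in train.py may differ):

import argparse
from pytorch_lightning import Trainer

parser = argparse.ArgumentParser()
parser.add_argument('--mixed_precision', action='store_true')
args = parser.parse_args()

# precision=16 enables automatic mixed precision and roughly halves
# activation memory; 32 is the full-precision default.
trainer = Trainer(gpus=[0], precision=16 if args.mixed_precision else 32)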

Experiment organization

In the training routine, a unique ID is assigned to every training. All experiments are saved in the logs folder, which is structured as follows:

logs/
  train_ID_0
    tensorboard/default/version_0
      checkpoints
        model_35000.pth
        ...
      hparams.yaml
      tb_log_file
  train_ID_1
    ...

All intermediate checkpoints are stored in the checkpoints folder, and hparams.yaml contains all the hyperparameters of a given run. You can launch a tensorboard --logdir logs/train_ID_0 instance on a training directory to visualize intermediate outputs and loss functions.

To resume a previously stopped training, running

python train.py --id train_ID --path_data path/to/waymo/training/dir --gpus 0

will load the latest checkpoint from a given train ID checkpoints directory.
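
Internally, resuming plausibly relies on Lightning's checkpoint restoration; a hedged sketch using the directory layout above (the exact logic in train.py may differ):

from pytorch_lightning import Trainer

# Checkpoint path follows the logs layout shown in the previous section.
ckpt = 'logs/train_ID_0/tensorboard/default/version_0/checkpoints/model_35000.pth'
trainer = Trainer(gpus=[0], resume_from_checkpoint=ckpt)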

Extending the code

Command line arguments

We expose command line arguments to encourage code reuse and adaptation to other datasets or models. The options currently intended for extension are listed below; a combined example invocation follows the list.

  • --debug: Disables logging and experiment saving. Useful for testing code modifications.
  • --model: Loads a CoMoGAN model. By default, it loads CoMo-MUNIT (code is in the networks folder).
  • --data_importer: Loads data from a dataset. By default, it loads waymo for the day2timelapse experiment (code is in the data folder).
  • --learning_rate: Modifies the learning rate; the default value for CoMo-MUNIT is 1e-4.
  • --scheduler_policy: You can choose between the linear and step policies, taken respectively from the CycleGAN and MUNIT training routines. Default is step.
  • --decay_iters_step: For the step policy, how many iterations before reducing the learning rate.
  • --decay_step_gamma: Regulates how much to reduce the learning rate.
  • --seed: Random seed initialization.
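
For example, a debug run combining several of these flags (the values are illustrative):

python train.py --path_data path/to/waymo/training/dir --gpus 0 --debug --learning_rate 2e-4 --scheduler_policy linear --seed 42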

The codebase has been rewritten almost from scratch after CVPR acceptance and optimized for reproducibility; hence, the provided seed may give slightly different results from those reported in the paper.

Changing the model or dataset requires extending the classes in networks/base_model.py and data/base_dataset.py, respectively. Please look into the CycleGAN repository for further instructions; a sketch of the dataset case follows.
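
As a hedged sketch, a custom dataset would follow the CycleGAN-style convention below; the class name, option fields, and return keys are illustrative, not the repository's actual interface:

import os
from PIL import Image
import torchvision.transforms.functional as TF
from data.base_dataset import BaseDataset  # repository base class

class MyCustomDataset(BaseDataset):
    def __init__(self, opt):
        super().__init__(opt)
        # opt.path_data is assumed to point at a flat folder of images.
        self.paths = sorted(
            os.path.join(opt.path_data, f) for f in os.listdir(opt.path_data)
        )

    def __getitem__(self, index):
        img = Image.open(self.paths[index]).convert('RGB')
        # CycleGAN-style datasets usually return a dict of tensors,
        # one entry per domain ('A', 'B', ...).
        return {'A': TF.to_tensor(img)}

    def __len__(self):
        return len(self.paths)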

Model, dataset and other options

Specific hyperparameters for different models, datasets, or options that do not change frequently are embedded in munch dictionaries in the respective classes. For instance, networks/comomunit_model.py contains all the customizable options for CoMo-MUNIT; the same holds for data/day2timelapse_dataset.py. The options folder includes additional options on checkpoint saving intervals and logging.
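
For orientation, a munch dictionary is a plain dict with attribute-style access; the hyperparameter names below are examples, not the actual options of comomunit_model.py:

from munch import Munch

# Illustrative only: option names are assumptions, except lr, which
# matches the documented default learning rate.
hparams = Munch(lr=1e-4, decay_iters_step=100000)
print(hparams.lr)      # attribute-style access
print(hparams['lr'])   # regular dict access still works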

Inference

Once you have trained a model, you can use the infer.ipynb notebook to visualize translation results. After launching a notebook instance, you will be asked to select the train_ID of the experiment. The notebook is documented and provides widgets for sequence, checkpoint, and translation selection.

You can also use the translate.py script to translate all the images inside a directory, or a sequence of images, to a target directory.

python scripts/translate.py --load_path path/to/waymo/validation/day/dir --save_path path/to/saving/dir --phi 3.14

This loads the images from the indicated path and translates them to a night-style output, since phi is set to 3.14 (≈ 𝜋).

  • --phi: (𝜙) the angle of the sun, with a value in [0, 2𝜋], which maps to a sun elevation ∈ [+30°, −40°] (a worked example follows this list).
  • --sequence: if you want to use only certain images, specify a name or a keyword contained in the image name, e.g. --sequence segment-10203656353524179475.
  • --checkpoint: if your logs folder contains more than one train_ID, or if you want to select an older checkpoint, indicate the path to the desired checkpoint, e.g. --checkpoint logs/train_ID_0/tensorboard/default/version_0/checkpoints/model_35000.pth.
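
As a worked example of the 𝜙-to-elevation mapping, a plausible cosine interpolation consistent with the endpoints above (𝜙 = 0 → +30°, 𝜙 = 𝜋 → −40°) is sketched below; the exact curve of the paper's physical model may differ:

import math

def sun_elevation(phi: float) -> float:
    # Hypothetical mapping: +30 deg at phi=0 (day), -40 deg at phi=pi
    # (night). The paper's actual model may use a different curve.
    return -5.0 + 35.0 * math.cos(phi)

print(sun_elevation(0.0))   # 30.0  -> day
print(sun_elevation(3.14))  # ~-40  -> night, matching the example above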

Docker

You will find a Dockerfile based on the nvidia/cuda:11.0.3-base-ubuntu18.04 image, with all the dependencies you need to run and test the code. To build and run it:

docker build -t notebook/comogan:1.0 .
docker run -it -v /path/to/your/local/datasets/:/datasets -p 8888:8888 --gpus '"device=0"' notebook/comogan:1.0
  • --gpus: lets you pass through only the GPUs that you want to use; by default, all available GPUs are passed through.
  • -v: mounts the local directory that contains your dataset.
  • -p: this option is only used for the infer.ipynb notebook. If you run the notebook on a remote server, you should also tunnel the output to your computer with ssh [email protected] -NL 8888:127.0.0.1:8888.
Comments
  • Questions about physical models

    Dear author,

    Thanks for your impressive work; I'm very honored to ask you a few questions. First, which physical model can I choose if I want to do RGB-to-infrared image translation? Is there a filter like the one described in the paper that would help me do this? Second, I think I should use a linear model, so what should I modify? I am looking forward to your advice.

    Thank you!

    opened by DZY-cybe 11
  • Questions about code

    Hi! Thanks for your research and code! I have some questions about the linear FIN. If I want to change a cyclic FIN to a linear FIN, do I just need to modify the definition of phi and the __apply_colormap function? I found that I also needed to change the code in many parts of comomunit.py and comomunit_model.py. Do you have any easier way?

    opened by DZY-cybe 4
  • core dumped

    When running translate.py to convert the daytime images to night scenes, it says segmentation fault (core dumped). The dataset contains only about 700 images.

    CUDA version: 11.4, RAM: 376GB, GPU: RTX TITAN, system: Ubuntu 18.04

    opened by Chenanism777 3
  • question about codes

    Thanks for your code! I have one question about restarting a training. According to the README.md, I should be able to use python train.py --id train_ID --path_data path/to/waymo/training/dir --gpus 0, but when I use the pretrained model, it builds a new version. Is that right?

    opened by DZY-cybe 3
  • Download Waymo Open Dataset

    Your work has inspired me a lot. Thank you very much. I want to download the Waymo Open Dataset for training, but I am not sure which one to download. I hope you can let me know, thank you!

    opened by creater-zq 3
  • Which waymo open dataset should I download for training purposes?

    I went to their website, downloaded training_0000.tar from https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_0_0, and tried to run dump_waymo.py, but it says no such file or directory: 'sunny_sequences.txt'.

    My file structure is: ./scripts, train, dump_waymo.py, sunny_sequences.txt

    opened by Chenanism777 2
  • Linear target dataset structure

    Thank you for your research and for sharing your code! I want to train a custom RGB-to-RGB dataset, e.g. blurred_image to focused_image. From your paper it seems that I should use the linear target approach. How would I go about creating a dataset structure? Should it be as simple as trainA (blurred images) and trainB (focused images)? Can you provide your linear target dataset loading files?

    Thank you!

    opened by cyprian 2
  • Questions about the tone mapping

    Dear author,

    Thank you for this very impressive work. I just visualized the tone mapping results, and I think they are very similar to images obtained with color jittering. So, can it simply be replaced by color jittering? And what do the values in daytime_model_lut.csv represent?

    Thanks!

    opened by xyIsHere 2
  • equation (7) in the paper

    Hi, I think your work is really interesting! I have a question about equation (7) in the paper: h^Y and h^y_M are each summed from three kinds of features, but in the code they are summed from four kinds of features. Did I misunderstand something?

    https://github.com/cv-rits/CoMoGAN/blob/dd3824715152f6464a95c99dd6f936744992b122/networks/backbones/comomunit.py#L145

    opened by NguyenTriTrinh 2
  • Dump waymo dataset fail

    I downloaded waymo_open_dataset_v_1_2_0_individual_files, whose file names look like this: "segment-9145030426583202228_1060_000_1080_000_with_camera_labels.tfrecord".

    dump_waymo.py seems to extract nothing; it seems no file matches sunny_sequences.txt, so all files are skipped.

    How can I solve this problem? Can I just comment out sunny_sequences.txt?

    opened by sjytker 1
  • Cyclic FIN Layer to Linear FIN Layer

    Hi!

    First of all, thank you for your work; I have been waiting for your code since I read your paper!

    I was wondering if you could give some advice on modifying your code from a cyclic FIN-layer function to a linear one.

    I actually tried to simply replace every cos_phi / sin_phi with a plain phi, but I'm not sure that will be enough; maybe I am missing some major points by changing only these.

    Thank you again!

    opened by cyiheng 1
  • I had a problem when building the environment for this project

    I created the conda environment following the README.md. However, my terminal threw the errors below:

    Solving environment: failed

    ResolvePackageNotFound:

    • libpng==1.6.37=hbc83047_0
    • xz==5.2.5=h7b6447c_0
    • tornado==6.1=py37h27cfd23_0
    • readline==7.0=h7b6447c_5
    • dbus==1.13.18=hb2f20db_0
    • mkl_random==1.1.1=py37h0573a6f_0
    • expat==2.2.10=he6710b0_2
    • libgcc-ng==9.1.0=hdf63c60_0
    • mkl_fft==1.3.0=py37h54f3939_0
    • lz4-c==1.9.3=h2531618_0
    • pcre==8.44=he6710b0_0
    • ninja==1.10.2=py37hff7bd54_0
    • sqlite==3.33.0=h62c20be_0
    • six==1.15.0=py37h06a4308_0
    • libuuid==1.0.3=h1bed415_2
    • etc.

    How can I solve this problem?

    opened by PrLeung 0
Owner
Codes from the Computer Vision group of the RITS team, Inria