CVPR 2021: "The Spatially-Correlative Loss for Various Image Translation Tasks"

Chuanxia Zheng

Last update: Jan 4, 2023

Related tags

Deep Learning F-LSeSim

Overview

Spatially-Correlative Loss

We provide the PyTorch implementation of "The Spatially-Correlative Loss for Various Image Translation Tasks". Based on the inherent self-similarity of objects, we propose a new structure-preserving loss for one-sided unsupervised I2I networks. The new loss deals only with the spatial relationships of repeated signals, regardless of their original absolute values.
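
The idea can be illustrated with a minimal sketch. This is not the repository's implementation: it assumes features are plain Python lists of 2-D vectors, uses cosine similarity for the self-similarity map, and the helper names (cosine, self_similarity, map_distance) are hypothetical. Rotating and rescaling every feature changes the absolute values but leaves the self-similarity map unchanged, whereas a change of spatial layout does not.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def self_similarity(feats):
    """Similarity of every location with every other location."""
    return [[cosine(q, k) for k in feats] for q in feats]

def map_distance(map_a, map_b):
    """Mean absolute difference between two self-similarity maps."""
    diffs = [abs(a - b) for ra, rb in zip(map_a, map_b) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Three "source" features, and the same layout after a change of appearance
# (here: rotate every feature by 90 degrees and scale it by 3).
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[-3.0 * y, 3.0 * x] for x, y in src]

# A different spatial layout: the first two locations now look alike.
other = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

same_structure = map_distance(self_similarity(src), self_similarity(tgt))
diff_structure = map_distance(self_similarity(src), self_similarity(other))
print(same_structure, diff_structure)
```

The first distance is zero up to floating-point error, while the second is clearly positive, which is the behaviour a structure-preserving loss needs.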

The Spatially-Correlative Loss for Various Image Translation Tasks
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
NTU and Monash University
In CVPR 2021

ToDo

release the single-modal I2I model
a simple example to use the proposed loss

Example Results

Unpaired Image-to-Image Translation

Single Image Translation

More results on project page

Getting Started

Installation

This code was tested with PyTorch 1.7.0, CUDA 10.2, and Python 3.7.

Install PyTorch 1.7.0, torchvision, and other dependencies from http://pytorch.org
Install python libraries visdom and dominate for visualization

pip install visdom dominate

Clone this repo:

git clone https://github.com/lyndonzheng/F-LSeSim
cd F-LSeSim

Datasets

Please refer to the original CUT and CycleGAN repositories to download datasets and to learn how to create your own.

Training

Train the single-modal I2I translation model:

sh ./scripts/train_sc.sh

Set --use_norm to use a cosine similarity map; the default similarity is a dot-based attention score. Set --learned_attn and --augment to use the learned self-similarity.
To view training results and loss plots, run python -m visdom.server and open the URL http://localhost:port.
Trained models are saved under the checkpoints folder.
More training options can be found in the options folder.
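
The difference between the two similarity options can be illustrated with a small sketch. It is not the repository's code: the function names are hypothetical, and it only assumes that the default score is a plain dot product (up to a constant scale) and that --use_norm corresponds to L2-normalizing the features before the dot product.

```python
import math

def dot_score(q, k):
    """Dot-product attention score (assumed default behaviour)."""
    return sum(a * b for a, b in zip(q, k))

def cosine_score(q, k):
    """Dot product of L2-normalized features (assumed --use_norm behaviour)."""
    q_norm = math.sqrt(sum(a * a for a in q))
    k_norm = math.sqrt(sum(b * b for b in k))
    return dot_score(q, k) / (q_norm * k_norm)

query = [1.0, 2.0, 2.0]
key = [2.0, 1.0, 2.0]
# Same direction as `key`, but with a 10x larger magnitude.
key_scaled = [10.0 * v for v in key]

# The dot-product score grows with the feature magnitude ...
print(dot_score(query, key), dot_score(query, key_scaled))
# ... while the cosine score depends only on the direction.
print(cosine_score(query, key), cosine_score(query, key_scaled))
```
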
Train the single-image translation model:

sh ./scripts/train_sinsc.sh

As the multi-modal I2I translation model was trained on MUNIT, we do not plan to merge that code into this repository. If you wish to obtain multi-modal results, please contact us at [email protected].

Testing

Test the single-modal I2I translation model:

sh ./scripts/test_sc.sh

Test the single-image translation model:

sh ./scripts/test_sinsc.sh

Test the FID score for all training epochs:

sh ./scripts/test_fid.sh

Pretrained Models

Download the pre-trained models (will be released soon) using the following links and put them under the checkpoints/ directory.

Single-image translation model: image2monet

Citation

@inproceedings{zheng2021spatiallycorrelative,
  title={The Spatially-Correlative Loss for Various Image Translation Tasks},
  author={Zheng, Chuanxia and Cham, Tat-Jen and Cai, Jianfei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Acknowledgments

Our code is developed based on CUT and CycleGAN. We also thank pytorch-fid for FID computation, LPIPS for diversity score, and D&C for density and coverage evaluation.

Comments

How to get D&C score

The paper says: "Similar to LPIPS, we first sampled 19 pairs for each image and then used the code at https://github.com/mseitzer/pytorch-fid to extract the features of real and generated images. Finally, we fed these 1900 generated samples and real features with 2048-dimensions to the PRDC function provided in https://github.com/clovaai/generative-evaluation-prdc to calculate these scores." Does this mean you run the model 19 times to generate enough samples?

opened by ygjwd12345 7

Where are the negatively paired patches sampled from x_aug?

Thank you for releasing this wonderful code. I am confused by the implementation of the contrastive learning part. The paper says that the negatively paired patches come either from an image y or from a different location in the image x_aug. In the code:

    if self.opt.augment:
        norm_aug_A, norm_aug_B = self.normalization((self.aug_A + 1) * 0.5), self.normalization((self.aug_B + 1) * 0.5)
        norm_real_A = torch.cat([norm_real_A, norm_real_A], dim=0)
        norm_fake_B = torch.cat([norm_fake_B, norm_aug_A], dim=0)
        norm_real_B = torch.cat([norm_real_B, norm_aug_B], dim=0)
    self.loss_spatial = self.Spatial_Loss(self.netPre, norm_real_A, norm_fake_B, norm_real_B)

    feats_src = net(src, self.attn_layers, encode_only=True)
    feats_tgt = net(tgt, self.attn_layers, encode_only=True)
    if other is not None:
        feats_oth = net(torch.flip(other, [2, 3]), self.attn_layers, encode_only=True)
    else:
        feats_oth = [None for _ in range(n_layers)]

    total_loss = 0.0
    for i, (feat_src, feat_tgt, feat_oth) in enumerate(zip(feats_src, feats_tgt, feats_oth)):
        loss = self.criterionSpatial.loss(feat_src, feat_tgt, feat_oth, i)
        total_loss += loss.mean()

    sam_neg1 = (sim_src.bmm(sim_other.permute(0, 2, 1))).view(-1, Num) / self.T
    sam_neg2 = (sim_tgt.bmm(sim_other.permute(0, 2, 1))).view(-1, Num) / self.T
    sam_self = (sim_src.bmm(sim_tgt.permute(0, 2, 1))).view(-1, Num) / self.T
    sam_self = torch.cat([sam_self, sam_neg1, sam_neg2], dim=-1)
    loss = self.cross_entropy_loss(sam_self, torch.arange(0, sam_self.size(0), dtype=torch.long, device=sim_src.device) % (Num))

It seems that sam_neg1 is computed from x and y/y_aug, while sam_neg2 is computed from y^hat and y, or from x_aug and y_aug. I do not see negatively paired patches taken from x and x_aug with a different patch center. Is my understanding correct?
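
For readers following the last snippet, the label construction can be traced with a small sketch. This is a reading of the quoted lines only, not a statement from the authors: it assumes each of the three bmm results is reshaped to (B * Num, Num) and concatenated along the last dimension, and the helper names are hypothetical.

```python
def positive_columns(batch_size, num_patches):
    """Target index for each row of the (batch_size * num_patches, 3 * num_patches) logits."""
    return [row % num_patches for row in range(batch_size * num_patches)]

def column_role(row, col, num_patches):
    """Describe what a logit column contributes for a given row."""
    block, idx = divmod(col, num_patches)
    if block == 0:
        # First block: src x tgt similarities; only the matching patch is positive.
        if idx == row % num_patches:
            return "positive"
        return "negative (src x tgt, other patch)"
    # Second and third blocks come from sam_neg1 and sam_neg2.
    return "negative (src x other)" if block == 1 else "negative (tgt x other)"

B, Num = 2, 3
labels = positive_columns(B, Num)
print(labels)
for col in range(3 * Num):
    print(col, column_role(4, col, Num))
```

Under these assumptions the positive for each row is the matching patch in the src x tgt block, and every other column, including the other patches of that same block, is treated as a negative.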

opened by DeepTag 6

Problem about training setting

I plan to reproduce your work. May I ask whether --learned_attn and --augment were used for the final results reported in the paper, such as the Horse → Zebra FID of 38.0? In the latest version, they are commented out.

opened by ygjwd12345 5

About training on the winter2summer dataset

Hello! Thank you for doing such a good job. I am trying to train the model on the winter2summer dataset with the same settings as yours. I use the options --dataroot ./datasets/winter2summer --name winter2summer_SCL --model sc --learned_attn --augment to train the LSeSim model. After about 50 epochs the FID is 104.3, and the results are not as good as those shown in your paper. Could you tell me what I am doing wrong?

Thank you!

opened by ZhenyuLiu-BJFU 4

question about result reproduction

Hi, thanks for your nice work.

I am trying to reproduce the results with your pre-trained model, but they do not match those reported in the paper.

Are these the same models used in your paper, or re-trained versions?

If they are the same models, please specify how to obtain the values in the paper.

Best regards,

opened by KwonGihyun 3

Evaluation of Cityscapes

Thanks for the great work! In the paper, you state that you use the pre-trained drn model for evaluation, while CUT states that they trained a model with the drn-d-22 structure for evaluation. Which model did you use in your paper to evaluate performance: drn-d-22.pth or drn-d-105-ms.pth? I tested the pre-trained drn-d-22.pth model and could not reproduce the results of CUT, but drn-d-105-ms.pth matches the reported results of CUT. @lyndonzheng

opened by fnzhan 3

Why normalize the feature by "/ np.sqrt(C)"

Hi, @lyndonzheng

I have a question about the normalization of the extracted features. As shown in line 221, the code normalizes the feature by dividing it by "np.sqrt(C)". What is the meaning of np.sqrt(C) here? https://github.com/lyndonzheng/F-LSeSim/blob/e092e62ed8a2f51f3661630e1522ec2549ec31d3/models/losses.py#L221

Thank you in advance.
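
One common motivation for this kind of scaling, offered here as an assumption rather than the authors' stated reason, is the scaled dot-product convention: if each C-dimensional feature is divided by sqrt(C), the dot product of two features becomes an average over channels, so its magnitude no longer grows with C. The sketch below uses hypothetical all-ones features so that the effect is exact.

```python
import math

def dot(u, v):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def scale_by_sqrt_c(feat):
    """Divide a C-dimensional feature by sqrt(C)."""
    c = len(feat)
    return [v / math.sqrt(c) for v in feat]

for c in (4, 64, 256):
    feat = [1.0] * c                      # hypothetical feature with C channels
    raw = dot(feat, feat)                 # grows linearly with C
    scaled = dot(scale_by_sqrt_c(feat), scale_by_sqrt_c(feat))  # stays at 1
    print(c, raw, scaled)
```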

opened by o0t1ng0o 3

Update the pretrained VGG16 or not?

Hi, @lyndonzheng I have a question about the optimization of the pretrained VGG16 when training the Learned SeSim model. As shown in the following code, the learning rate is multiplied by zero. I am wondering whether the pretrained VGG16 updates its parameters during training. https://github.com/lyndonzheng/F-LSeSim/blob/e092e62ed8a2f51f3661630e1522ec2549ec31d3/models/sc_model.py#L109

Thank you in advance.
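
As a general illustration (not a reading of the linked line, whose optimizer setup is not reproduced here), a gradient step with a learning rate of zero leaves parameters unchanged whatever the gradient is. The sketch uses plain SGD with hypothetical values and assumes no other mechanism updates the weights.

```python
def sgd_step(params, grads, lr):
    """One plain SGD update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]

base_lr = 0.0002                 # hypothetical base learning rate
weights = [0.5, -1.25, 3.0]      # hypothetical pretrained weights
grads = [10.0, -4.0, 0.1]        # hypothetical gradients

frozen = sgd_step(weights, grads, lr=base_lr * 0.0)   # learning rate times zero
updated = sgd_step(weights, grads, lr=base_lr)        # ordinary update
print(frozen == weights, updated == weights)
```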

opened by o0t1ng0o 2

Question about sc_model.py (bs_per_gpu)

Hi, I am wondering how the multi-GPU processing works in lines 102 and 103. Line 102: self.real_A = self.real_A[:bs_per_gpu]

I think self.real_A[:bs_per_gpu] simply discards part of self.real_A rather than splitting the data across GPUs.

Is it a typo, or am I wrong?

opened by 1211sh 2

poor performance of single-mode summer2winter translation

I used the LSeSim setting to train summer2winter but only got an FID of 132.4, while CUT gets an FID of 80. Here are the translation results of LSeSim.

opened by clark141 0

multimodal translation

Hi,

Thank you for the great work. I am interested in reproducing your results with MUNIT. Hence, I wonder if the models and the training code for multi-modal translation are available (using one sided MUNIT).

opened by baran-ozaydin 1

Selecting query-key patches

@lyndonzheng Hi, thanks for your great work. After reading your paper and source code, I have a question about selecting query-key patches. In your source code, line 246 reads left, top = pos_x - int(pw/2), pos_y - int(ph/2). Why is int(pw/2) subtracted from pos_x? Lines 244 to 259 appear to implement a strategy for selecting query-key patches. Is there a particular reason for this strategy, or could another strategy give the same results?
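
Subtracting half the patch size from a position is the usual way to turn a patch center into its top-left corner. The sketch below shows only that arithmetic: the function name and the exclusive right/bottom convention are assumptions, and it does not reproduce lines 244 to 259 of the repository.

```python
def patch_window(pos_x, pos_y, pw, ph):
    """Return (left, top, right, bottom) of a pw x ph patch centered on (pos_x, pos_y).

    right and bottom are exclusive, so the patch covers
    columns left..right-1 and rows top..bottom-1.
    """
    left, top = pos_x - int(pw / 2), pos_y - int(ph / 2)
    return left, top, left + pw, top + ph

left, top, right, bottom = patch_window(pos_x=10, pos_y=12, pw=7, ph=7)
print(left, top, right, bottom)

# The query position sits in the middle of the window (offset 3 of 0..6).
offset_x = list(range(left, right)).index(10)
offset_y = list(range(top, bottom)).index(12)
print(offset_x, offset_y)
```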

opened by YangGangZhiQi 2

Owner

Chuanxia Zheng


