CVPR 2021: "The Spatially-Correlative Loss for Various Image Translation Tasks"

Overview

Spatially-Correlative Loss

arXiv | website


We provide the PyTorch implementation of "The Spatially-Correlative Loss for Various Image Translation Tasks". Based on the inherent self-similarity of objects, we propose a new structure-preserving loss for one-sided unsupervised image-to-image (I2I) translation networks. The new loss deals only with the spatial relationships of repeated signals, regardless of their original absolute values.
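To make the idea concrete, below is a minimal PyTorch sketch of a spatially-correlative loss over pre-extracted feature maps. It illustrates the concept only; the function names, the fixed patch size, and the L1 comparison are assumptions for illustration, not the repository's exact implementation.

    import torch
    import torch.nn.functional as F

    def self_similarity_map(feat, patch_size=9):
        # feat: (B, C, H, W) feature map. For every spatial location, correlate its
        # (L2-normalized) feature vector with the features inside a patch_size x patch_size
        # window around it, giving a map of local spatial relationships that is
        # independent of the absolute feature values.
        B, C, H, W = feat.shape
        feat = F.normalize(feat, dim=1)
        pad = patch_size // 2
        windows = F.unfold(feat, kernel_size=patch_size, padding=pad)           # (B, C*p*p, H*W)
        windows = windows.view(B, C, patch_size * patch_size, H * W).permute(0, 3, 1, 2)
        query = feat.view(B, C, H * W).permute(0, 2, 1).unsqueeze(2)            # (B, H*W, 1, C)
        return torch.matmul(query, windows).squeeze(2)                          # (B, H*W, p*p)

    def spatially_correlative_loss(feat_src, feat_tgt, patch_size=9):
        # Compare the self-similarity structure of the source and translated images.
        return F.l1_loss(self_similarity_map(feat_src, patch_size),
                         self_similarity_map(feat_tgt, patch_size))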

The Spatially-Correlative Loss for Various Image Translation Tasks
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
NTU and Monash University
In CVPR 2021

ToDo

  • release the single-modal I2I model
  • a simple example to use the proposed loss

Example Results

Unpaired Image-to-Image Translation

Single Image Translation

More results on project page

Getting Started

Installation

This code was tested with PyTorch 1.7.0, CUDA 10.2, and Python 3.7.

  • Install the dependencies:
pip install visdom dominate
  • Clone this repo:
git clone https://github.com/lyndonzheng/F-LSeSim
cd F-LSeSim

Datasets

Please refer to the original CUT and CycleGAN to download datasets and learn how to create your own datasets.

Training

  • Train the single-modal I2I translation model:
sh ./scripts/train_sc.sh 
  • Set --use_norm for the cosine similarity map; the default similarity is a dot-product attention score. Use --learned_attn and --augment for the learned self-similarity (a minimal sketch of this contrastive form is shown after this list).

  • To view training results and loss plots, run python -m visdom.server and open the URL http://localhost:port.

  • Trained models will be saved under the checkpoints folder.

  • More training options can be found in the options folder.

  • Train the single-image translation model:

sh ./scripts/train_sinsc.sh 
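Conceptually, --learned_attn and --augment switch the loss to a learned, contrastive form: the self-similarity map of a source patch should match the map of the corresponding translated patch (positive pair) and differ from maps taken from other images or augmented views (negative pairs). The following is a minimal PyTorch sketch of this idea; the function name, shapes, and temperature value are illustrative assumptions, not the repository's exact code.

    import torch
    import torch.nn.functional as F

    def contrastive_self_similarity_loss(sim_src, sim_tgt, sim_other, tau=0.07):
        # sim_src, sim_tgt: (B, Num, K) self-similarity maps of the source image and its
        # translation; row i of sim_src and row i of sim_tgt form a positive pair.
        # sim_other: (B, Num, K) self-similarity map of another image (or an augmented
        # view), providing negative pairs. tau is the softmax temperature.
        B, Num, _ = sim_src.shape
        sim_src = F.normalize(sim_src, dim=-1)
        sim_tgt = F.normalize(sim_tgt, dim=-1)
        sim_other = F.normalize(sim_other, dim=-1)
        pos = sim_src.bmm(sim_tgt.permute(0, 2, 1)).view(-1, Num) / tau    # positives on the diagonal
        neg = sim_src.bmm(sim_other.permute(0, 2, 1)).view(-1, Num) / tau  # negatives from the other image
        logits = torch.cat([pos, neg], dim=-1)
        labels = torch.arange(logits.size(0), device=logits.device) % Num
        return F.cross_entropy(logits, labels)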

As the multi-modal I2I translation model was built on MUNIT, we do not plan to merge that code into this repository. If you wish to obtain multi-modal results, please contact us at [email protected].

Testing

  • Test the single-modal I2I translation model:
sh ./scripts/test_sc.sh
  • Test the single-image translation model:
sh ./scripts/test_sinsc.sh
  • Test the FID score for all training epochs:
sh ./scripts/test_fid.sh
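For reference, the FID between a folder of translated images and a folder of real target-domain images can also be computed programmatically with pytorch-fid. A hedged sketch, assuming pytorch-fid is installed; the result and dataset paths below are placeholders:

    import torch
    from pytorch_fid.fid_score import calculate_fid_given_paths

    # Placeholder folders: translated test outputs vs. real target-domain images.
    paths = ['./results/horse2zebra_sc/test_latest/images/fake_B',
             './datasets/horse2zebra/testB']
    fid = calculate_fid_given_paths(paths, batch_size=50,
                                    device='cuda' if torch.cuda.is_available() else 'cpu',
                                    dims=2048)
    print(f'FID: {fid:.2f}')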

Pretrained Models

Download the pre-trained models (will be released soon) using the following links and put them under the checkpoints/ directory.

Citation

@inproceedings{zheng2021spatiallycorrelative,
  title={The Spatially-Correlative Loss for Various Image Translation Tasks},
  author={Zheng, Chuanxia and Cham, Tat-Jen and Cai, Jianfei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Acknowledgments

Our code is built on CUT and CycleGAN. We also thank pytorch-fid for FID computation, LPIPS for the diversity score, and D&C for the density and coverage evaluation.

Comments
  • How to get D&C score


    According to "Similar to LPIPS, we first sampled 19 pairs for each image and then used the code at https://github.com/mseitzer/pytorch-fid to extract the features of real and generated images. Finally, we fed these 1900 generated samples and real features with 2048-dimensions to the PRDC function provided in https://github.com/clovaai/generative-evaluation-prdc to calculate these scores." So will you run 19times to generate enough generated samples?

    opened by ygjwd12345 7
  • Where are the negatively paired patches sampled from x_aug?


    Thank you for releasing this wonderful code. I am confused about the implementation of the contrastive learning part. The paper mentions that negatively paired patches come either from an image y or from a different location in the augmented image x_aug. In the code

        if self.opt.augment:
            norm_aug_A, norm_aug_B = self.normalization((self.aug_A + 1) * 0.5), self.normalization((self.aug_B + 1) * 0.5)
            norm_real_A = torch.cat([norm_real_A, norm_real_A], dim=0)
            norm_fake_B = torch.cat([norm_fake_B, norm_aug_A], dim=0)
            norm_real_B = torch.cat([norm_real_B, norm_aug_B], dim=0)
        self.loss_spatial = self.Spatial_Loss(self.netPre, norm_real_A, norm_fake_B, norm_real_B)
    
        feats_src = net(src, self.attn_layers, encode_only=True)
        feats_tgt = net(tgt, self.attn_layers, encode_only=True)
        if other is not None:
            feats_oth = net(torch.flip(other, [2, 3]), self.attn_layers, encode_only=True)
        else:
            feats_oth = [None for _ in range(n_layers)]
    
        total_loss = 0.0
        for i, (feat_src, feat_tgt, feat_oth) in enumerate(zip(feats_src, feats_tgt, feats_oth)):
            loss = self.criterionSpatial.loss(feat_src, feat_tgt, feat_oth, i)
            total_loss += loss.mean()
    
        sam_neg1 = (sim_src.bmm(sim_other.permute(0, 2, 1))).view(-1, Num) / self.T
        sam_neg2 = (sim_tgt.bmm(sim_other.permute(0, 2, 1))).view(-1, Num) / self.T
        sam_self = (sim_src.bmm(sim_tgt.permute(0, 2, 1))).view(-1, Num) / self.T
        sam_self = torch.cat([sam_self, sam_neg1, sam_neg2], dim=-1)
        loss = self.cross_entropy_loss(sam_self, torch.arange(0, sam_self.size(0), dtype=torch.long, device=sim_src.device) % (Num))
    

    It seems like sam_neg1 is from x and y/y_aug, while sam_neg2 is from y^hat and y, or x_aug and y_aug. I didn't see negatively paired patches from x and x_aug with a different patch center. Is my understanding correct?

    opened by DeepTag 6
  • Problem about training setting


    I plan to reproduce your work. May I ask whether --learned_attn and --augment were used for the final results reported in the paper, such as the Horse → Zebra FID of 38.0? In the latest version they are commented out.

    opened by ygjwd12345 5
  • About training on the winter2summer dataset


    Hello! Thank you for doing such a good job. I am trying to train it on the winter2summer dataset with the same settings as yours. I use the command --dataroot ./datasets/winter2summer --name winter2summer_SCL --model sc --learned_attn --augment to train the LSeSim model. I trained it for about 50 epochs and got an FID of 104.3 (training results were attached as images in the original issue), but the effect is not as good as shown in your paper. Could you tell me what I am doing wrong?

    Thank you!

    opened by ZhenyuLiu-BJFU 4
  • question about result reproduction


    Hi, thanks for your nice work.

    I am trying to reproduce the results with your pre-trained models, but the results do not match those reported in the paper.

    Are these the same models used in the paper, or re-trained versions? If they are the same models, please specify how to obtain the values reported in the paper.

    Best Regards,

    opened by KwonGihyun 3
  • Evaluation of Cityscapes


    Thanks for the great work! In the paper, you state that the pre-trained DRN model is used for evaluation, while CUT states that they trained a drn-d-22 model for evaluation. Which model is adopted in your paper to evaluate performance: drn-d-22.pth or drn-d-105-ms.pth? I tested the pre-trained drn-d-22.pth model and could not reproduce the results of CUT, but drn-d-105-ms.pth matches CUT's reported results. @lyndonzheng

    opened by fnzhan 3
  • Why normalize the feature by "/ np.sqrt(C)"


    Hi, @lyndonzheng

    I have a question about the normalization of the extracted feature. As shown in line 221, the code normalizes the feature by dividing it by np.sqrt(C). Is there any particular meaning to np.sqrt(C)? https://github.com/lyndonzheng/F-LSeSim/blob/e092e62ed8a2f51f3661630e1522ec2549ec31d3/models/losses.py#L221

    Thank you in advance.

    opened by o0t1ng0o 3
  • Update the pretrained VGG16 or not?


    Hi, @lyndonzheng I have a question about the optimization of the pretrained VGG16 when training the learned SeSim model. As shown in the following code, the learning rate is multiplied by zero. I am wondering whether the pretrained VGG16 updates its parameters during training. https://github.com/lyndonzheng/F-LSeSim/blob/e092e62ed8a2f51f3661630e1522ec2549ec31d3/models/sc_model.py#L109

    Thank you in advance.

    opened by o0t1ng0o 2
  • Question about sc_model.py (bs_per_gpu)



    Hi, I am wondering how the multi-GPU process works in lines 102 and 103. Line 102: self.real_A = self.real_A[:bs_per_gpu]

    I think self.real_A[:bs_per_gpu] simply removes part of self.real_A rather than splitting the data across GPUs.

    Is this a typo, or am I wrong?

    opened by 1211sh 2
  • poor performance of single-mode summer2winter translation


    I used the LSeSim setting to train summer2winter but only got an FID of 132.4, while CUT can get an FID of 80. (Translation results were attached as an image in the original issue.)

    opened by clark141 0
  • Multi-modal translation


    Hi,

    Thank you for the great work. I am interested in reproducing your results with MUNIT. Hence, I wonder if the models and the training code for multi-modal translation (using one-sided MUNIT) are available.

    opened by baran-ozaydin 1
  • Selecting query-key patches


    @lyndonzheng Hi, thanks for your great work. After reading your paper and source code, I have a question about selecting query-key patches. (The selection code was attached as a screenshot in the original issue.) In line 246, left, top = pos_x - int(pw/2), pos_y - int(ph/2). What is the reason for subtracting int(pw/2) from pos_x? Looking at the code from line 244 to line 259, it seems like a strategy to select query-key patches. Is there a particular reason for this strategy, or could another strategy give the same results?

    opened by YangGangZhiQi 2