Official implementation for "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation" (CVPR 2022)

Overview

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (CVPR2022)

https://arxiv.org/abs/2203.08483

Unpaired image-to-image (I2I) translation often requires to maximize the mutual information between the source and the translated images across different domains, which is critical for the generator to keep the source content and prevent it from unnecessary modifications. The self-supervised contrastive learning has already been successfully applied in the I2I. By constraining features from the same location to be closer than those from different ones, it implicitly ensures the result to take content from the source. However, previous work uses the features from random locations to impose the constraint, which may not be appropriate since some locations contain less information of source domain. Moreover, the feature itself does not reflect the relation with others. This paper deals with these problems by intentionally selecting significant anchor points for contrastive learning. We design a query-selected attention (QS-Attn) module, which compares feature distances in the source domain, giving an attention matrix with a probability distribution in each row. Then we select queries according to their measurement of significance, computed from the distribution. The selected ones are regarded as anchors for contrastive loss. At the same time, the reduced attention matrix is employed to route features in both domains, so that source relations maintain in the synthesis. We validate our proposed method in three different I2I datasets, showing that it increases the image quality without adding learnable parameters.



QS-Attn applies attention to select anchors for contrastive learning in single-direction I2I task

Getting Started

Prerequisites

  • Ubuntu 16.04
  • NVIDIA GPU + CUDA CuDNN
  • Python 3 Please use pip install -r requirements.txt to install the dependencies.

Pretrained Models

We provide Global, Local and Global+Local models for three datasets.

Model Cityscapes Horse2zebra AFHQ
Global Cityscapes_Global Horse2zebra_Global AFHQ_Global
Local Cityscapes_Local Horse2zebra_Local AFHQ_Local
Global+Local Cityscapes_Global+Local Horse2zebra_Global+Local AFHQ_Global+Local

Training

  • Download horse2zebra dataset :
bash ./datasets/download_qsattn_dataset.sh horse2zebra
  • Train the global model:
python train.py \
--dataroot=datasets/horse2zebra \
--name=horse2zebra_global \
--QS_mode=global
  • You can use visdom to view the training loss: Run python -m visdom.server and click the URL http://localhost:8097.

Inference

  • Test the global model:
python test.py \
--dataroot=datasets/horse2zebra \
--name=horse2zebra_qsattn_global \
--QS_mode=global

Citation

If you use this code for your research, please cite

@article{hu2022qs,
  title={QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation},
  author={Hu, Xueqi and Zhou, Xinyue and Huang, Qiusheng and Shi, Zhengyi and Sun, Li and Li, Qingli},
  journal={arXiv preprint arXiv:2203.08483},
  year={2022}
}
Issues
  • can not get satisfiying result using  default parameters

    can not get satisfiying result using default parameters

    Hello I test the FID using public model and the output is similar to paper result. However, I train the model the FID is much bigger. I do not know what happened. The option is as follows ----------------- Options --------------- QS_mode: global
    batch_size: 1
    beta1: 0.5
    beta2: 0.999
    checkpoints_dir: ./checkpoints
    continue_train: False
    crop_size: 256
    dataroot: /cache/data/horse2zebra [default: ./datasets/horse2zebra] dataset_mode: unaligned
    direction: AtoB
    display_env: main
    display_freq: 400
    display_id: 0 [default: None] display_ncols: 4
    display_port: 8097
    display_server: http://localhost
    display_winsize: 256
    easy_label: experiment_name
    epoch: latest
    epoch_count: 1
    evaluation_freq: 5000
    flip_equivariance: False
    gan_mode: lsgan
    gpu_ids: 0
    init_gain: 0.02
    init_type: xavier
    input_nc: 3
    isTrain: True [default: None] lambda_GAN: 1.0
    lambda_NCE: 1.0
    load_size: 286
    lr: 0.0002
    lr_decay_iters: 50
    lr_policy: linear
    max_dataset_size: inf
    model: qs
    n_epochs: 200
    n_epochs_decay: 200
    n_layers_D: 3
    name: horse2zebra_QSAttn_global [default: horse2zebra_qsattn_global] nce_T: 0.07
    nce_idt: True
    nce_layers: 0,4,8,12,16
    ndf: 64
    netD: basic
    netF: mlp_sample
    netF_nc: 256
    netG: resnet_9blocks
    ngf: 64
    no_antialias: False
    no_antialias_up: False
    no_dropout: True
    no_flip: False
    no_html: False
    normD: instance
    normG: instance
    num_patches: 256
    num_threads: 4
    output_nc: 3
    phase: train
    pool_size: 0
    preprocess: resize_and_crop
    pretrained_name: None
    print_freq: 100
    save_by_iter: False
    save_epoch_freq: 5
    save_latest_freq: 5000
    save_path: ./1.query-selected-attention/ serial_batches: False
    suffix:
    update_html_freq: 1000
    verbose: False
    ----------------- End -------------------

    opened by JiaXiaofei0909 3
  • The pretrained model and evaluation DRN

    The pretrained model and evaluation DRN

    Hi, thanks for your excellent work.

    1. Could you please release the pre-trained models?

    2. Could you also please release your code of evaluation of cityscapes? The PixAcc of your method is super high, and I would like to compare and cite your work.

    3. About the SWD, may I ask which code are you using? I tried google and I can only get https://github.com/koshian2/swd-pytorch . It uses tensors as input and there can be many ways of generating tensors from images.

    Again, thanks for your work! It is very interesting!

    opened by veroveroxie 5
Owner
Xueqi Hu
Xueqi Hu
Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

HackED 2022 Team 3IQ - 2022 Imposter Detector By Aneeljyot Alagh, Curtis Kan, Jo

Joshua Ji 4 Jan 27, 2022
The 7th edition of NTIRE: New Trends in Image Restoration and Enhancement workshop will be held on June 2022 in conjunction with CVPR 2022.

NTIRE 2022 - Image Inpainting Challenge Important dates 2022.02.01: Release of train data (input and output images) and validation data (only input) 2

Andrés Romero 25 Jun 22, 2022
Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022.

Jadena Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022. arXiv

Qing Guo 10 Jun 12, 2022
Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022

Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes / 3DCrowdNet News ?? 3DCrowdNet achieves the state-of-the-art accuracy on 3D

Hongsuk Choi 74 Jun 27, 2022
Official Implementation of CVPR 2022 paper: "Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning"

(CVPR 2022) Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning ArXiv This repo contains Official Implementat

Yujun Shi 15 May 28, 2022
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data - Official PyTorch Implementation (CVPR 2022)

Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data (CVPR 2022) Potentials of primitive shapes f

null 29 Jun 27, 2022
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 70 Jun 23, 2022
Official pytorch implementation for Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (CVPR 2022)

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion This repository contains a pytorch implementation of "Learning to Listen: Modeling

null 34 Jun 23, 2022
[CVPR 2022] Official PyTorch Implementation for "Reference-based Video Super-Resolution Using Multi-Camera Video Triplets"

Reference-based Video Super-Resolution (RefVSR) Official PyTorch Implementation of the CVPR 2022 Paper Project | arXiv | RealMCVSR Dataset This repo c

Junyong Lee 106 Jun 20, 2022
(CVPR 2022 Oral) Official implementation for "Surface Representation for Point Clouds"

RepSurf - Surface Representation for Point Clouds [CVPR 2022 Oral] By Haoxi Ran* , Jun Liu, Chengjie Wang ( * : corresponding contact) The pytorch off

Haoxi Ran 114 Jun 26, 2022
Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022

LDL Paper | Supplementary Material Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Jie Liang*, Hu

null 102 Jun 24, 2022
Official implementation for "Style Transformer for Image Inversion and Editing" (CVPR 2022)

Style Transformer for Image Inversion and Editing (CVPR2022) https://arxiv.org/abs/2203.07932 Existing GAN inversion methods fail to provide latent co

Xueqi Hu 109 Jun 28, 2022
Official MegEngine implementation of CREStereo(CVPR 2022 Oral).

[CVPR 2022] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation This repository contains MegEngine implementation of ou

MEGVII Research 205 Jun 29, 2022
[CVPR 2022] Official code for the paper: "A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration"

MDCA Calibration This is the official PyTorch implementation for the paper: "A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved

MDCA Calibration 6 May 24, 2022
Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)

?? Sound-guided Semantic Image Manipulation (CVPR2022) Official Pytorch Implementation Sound-guided Semantic Image Manipulation IEEE/CVF Conference on

CVLAB 40 Jun 24, 2022
Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

OpenDet Expanding Low-Density Latent Regions for Open-Set Object Detection (CVPR2022) Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-So

csuhan 40 Jun 22, 2022
[CVPR 2022] Official Pytorch code for OW-DETR: Open-world Detection Transformer

OW-DETR: Open-world Detection Transformer (CVPR 2022) [Paper] Akshita Gupta*, Sanath Narayan*, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Sh

Akshita Gupta 73 Jun 23, 2022
Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

nvdiffrec Joint optimization of topology, materials and lighting from multi-view image observations as described in the paper Extracting Triangular 3D

NVIDIA Research Projects 960 Jun 26, 2022
Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)

Self-Supervised Models are Continual Learners This is the official repository for the paper: Self-Supervised Models are Continual Learners Enrico Fini

Enrico Fini 35 Jun 7, 2022