CLADE - Efficient Semantic Image Synthesis via Class-Adaptive Normalization (TPAMI 2021)

tzt

Last update: Nov 17, 2022

Related tags

Deep Learning CLADE

Overview

Efficient Semantic Image Synthesis via Class-Adaptive Normalization (Accepted by TPAMI)

ArXiv Paper

Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu

Abstract

Spatially-adaptive normalization SPADE is remarkably successful recently in conditional semantic image synthesis, which modulates the normalized activation with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away. Despite its impressive performance, a more thorough understanding of the advantages inside the box is still highly demanded to help reduce the significant computation and parameter overhead introduced by this novel structure. In this paper, from a return-on-investment point of view, we conduct an in-depth analysis of the effectiveness of this spatially-adaptive normalization and observe that its modulation parameters benefit more from semantic-awareness rather than spatial-adaptiveness, especially for high-resolution input masks. Inspired by this observation, we propose class-adaptive normalization (CLADE), a lightweight but equally-effective variant that is only adaptive to semantic class. In order to further improve spatial-adaptiveness, we introduce intra-class positional map encoding calculated from semantic layouts to modulate the normalization parameters of CLADE and propose a truly spatially-adaptive variant of CLADE, namely CLADE-ICPE. %Benefiting from this design, CLADE greatly reduces the computation cost while being able to preserve the semantic information in the generation. Through extensive experiments on multiple challenging datasets, we demonstrate that the proposed CLADE can be generalized to different SPADE-based methods while achieving comparable generation quality compared to SPADE, but it is much more efficient with fewer extra parameters and lower computational cost.

Installation

Clone this repo.

git clone https://github.com/tzt101/CLADE.git
cd CLADE/

This code requires PyTorch 1.6 and python 3+. Please install dependencies by

pip install -r requirements.txt

Dataset Preparation

The Cityscapes, COCO-Stuff and ADE20K dataset can be download and prepared following SPADE. We provide the ADE20K-outdoor dataset selected by ourselves in OneDrive.

To make the distance mask which called intra-class positional encoding map in the paper, you can use the following commands:

python uitl/cal_dist_masks.py --path [Path_to_dataset] --dataset [ade20k | coco | cityscapes]

By default, the distance mask is normalized. If you do not want it, please set --norm no.

Generating Images Using Pretrained Model

Once the dataset is ready, the result images can be generated using pretrained models.

Download the pretrained models from the OneDrive, save it in checkpoints/. The structure is as follows:

./checkpoints/
    ade20k/
        best_net_G.pth
    ade20k_dist/
        best_net_G.pth
    ade20k_outdoor/
        best_net_G.pth
    ade20k_outdoor_dist/
        best_net_G.pth
    cityscapes/
        best_net_G.pth
    cityscapes_dist/
        best_net_G.pth
    coco/
        best_net_G.pth
    coco_dist/
        best_net_G.pth

_dist means that the model use the additional positional encoding, called CLADE-ICPE in the paper.

Generate the images on the test dataset.

python test.py --name [model_name] --norm_mode clade --batchSize 1 --gpu_ids 0 --which_epoch best --dataset_mode [dataset] --dataroot [Path_to_dataset]

[model_name] is the directory name of the checkpoint file downloaded in Step 1, such as ade20k and coco. [dataset] can be on of ade20k, ade20koutdoor, cityscapes and coco. [Path_to_dataset] is the path to the dataset. If you want to test CALDE-ICPE, the command is as follows:

python test.py --name [model_name] --norm_mode clade --batchSize 1 --gpu_ids 0 --which_epoch best --dataset_mode [dataset] --dataroot [Path_to_dataset] --add_dist

Training New Models

You can train your own model with the following command:

# To train CLADE and CLADE-ICPE.
python train.py --name [experiment_name] --dataset_mode [dataset] --norm_mode clade --dataroot [Path_to_dataset]
python train.py --name [experiment_name] --dataset_mode [dataset] --norm_mode clade --dataroot [Path_to_dataset] --add_dist

If you want to test the model during the training step, please set --train_eval. By default, the model every 10 epoch will be test in terms of FID. Finally, the model with best FID score will be saved as best_net_G.pth.

Calculate FID

We provide the code to calculate the FID which is based on rpo. We have pre-calculated the distribution of real images (all images are resized to 256×256 except cityscapes is 512×256) in training set of each dataset and saved them in ./datasets/train_mu_si/. You can run the following command:

python fid_score.py [Path_to_real_image] [Path_to_fake_image] --batch-size 1 --gpu 0 --load_np_name [dataset] --resize [Size]

The provided [dataset] are: ade20k, ade20koutdoor, cityscapes and coco. You can save the new dataset by replacing --load_np_name [dataset] with --save_np_name [dataset].

New Useful Options

The new options are as follows:

--use_amp: if specified, use AMP training mode.
--train_eval: if sepcified, evaluate the model during training.
--eval_dims: the default setting is 2048, Dimensionality of Inception features to use.
--eval_epoch_freq: the default setting is 10, frequency of calculate fid score at the end of epochs.

Code Structure

train.py, test.py: the entry point for training and testing.
trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
models/pix2pix_model.py: creates the networks, and compute the losses
models/networks/: defines the architecture of all models
options/: creates option lists using argparse package. More individuals are dynamically added in other files as well. Please see the section below.
data/: defines the class for loading images and label maps.

Citation

If you use this code for your research, please cite our papers.

@article{tan2021efficient,
  title={Efficient Semantic Image Synthesis via Class-Adaptive Normalization},
  author={Tan, Zhentao and Chen, Dongdong and Chu, Qi and Chai, Menglei and Liao, Jing and He, Mingming and Yuan, Lu and Hua, Gang and Yu, Nenghai},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021},
  publisher={IEEE}
}
@article{tan2020rethinking,
  title={Rethinking Spatially-Adaptive Normalization},
  author={Tan, Zhentao and Chen, Dongdong and Chu, Qi and Chai, Menglei and Liao, Jing and He, Mingming and Yuan, Lu and Yu, Nenghai},
  journal={arXiv preprint arXiv:2004.02867},
  year={2020}
}
@article{tan2020semantic,
  title={Semantic Image Synthesis via Efficient Class-Adaptive Normalization},
  author={Tan, Zhentao and Chen, Dongdong and Chu, Qi and Chai, Menglei and Liao, Jing and He, Mingming and Yuan, Lu and Gang Hua and Yu, Nenghai},
  journal={arXiv preprint arXiv:2012.04644},
  year={2020}
}

Acknowledgments

This code borrows heavily from SPADE.

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition, TPAMI 2021

DVG-Face: Dual Variational Generation for HFR This repo is a PyTorch implementation of DVG-Face: Dual Variational Generation for Heterogeneous Face Re

52 Dec 30, 2022

🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds (CVPR 2020) This is the official implementation of RandLA-Net (CVPR2020, Oral

1k Dec 30, 2022

[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization This is the PyTorch implemention of our paper FedBN: Federated Learning on

156 Dec 15, 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization Official PyTorch implementation for our URST (Ultra-Resolution Sty

148 Dec 27, 2022

Official implementation of the paper 'Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution'

DASR Paper Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution Jie Liang, Hui Zeng, and Lei Zhang. In arxiv preprint. Abs

81 Dec 28, 2022

AdaFocus (ICCV 2021) Adaptive Focus for Efficient Video Recognition

AdaFocus (ICCV 2021) This repo contains the official code and pre-trained models for AdaFocus. Adaptive Focus for Efficient Video Recognition Referenc

115 Dec 21, 2022

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021, official Pytorch implementatio

247 Dec 25, 2022

(ImageNet pretrained models) The official pytorch implemention of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"

Res2Net The official pytorch implemention of the paper "Res2Net: A New Multi-scale Backbone Architecture" Our paper is accepted by IEEE Transactions o

928 Dec 29, 2022

The PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.

Face Alignment in Full Pose Range: A 3D Total Solution By Jianzhu Guo. [Updates] 2020.8.30: The pre-trained model and code of ECCV-20 are made public

3.4k Jan 2, 2023

Comments

Can't train with --add_dist

Hello,I am greatly inspired by your work.But when I train with --add_dist ,it show

(File "/home/CLADEv1/CLADEv1/train.py", line 76, in for i, data_i in enumerate(dataloader, start=iter_counter.epoch_iter): File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 346, in next data = self.dataset_fetcher.fetch(index) # may raise StopIteration File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch data = [self.dataset[idx] for idx in possibly_batched_index] File "/usr/local/python3/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in data = [self.dataset[idx] for idx in possibly_batched_index] File "/home/CLADEv1/CLADEv1/data/pix2pix_dataset.py", line 98, in getitem dist_np = np.load(self.dist_paths[index]) IndexError: list index out of range)

Can you tell me how to solve it? Thank you。

opened by 925710545 3
Reproduce the mIoU for cityscapes

Hello, Thank you for making the code available. I'm trying to reproduce the mIoU for cityscapes. I want to make sure how should I test the generated images using drn --ms. What is the resolution when you calculate the mIoU using DRN ? do you upsample the genereted results before testing mIoU or downsample the labels?

Thanks again

opened by ftll574 2
Pretrained model for the discriminators network

Hello, I want to fine-tune the available pretrained network for the ADK2016-Outdoor dataset. The files for the generator network are available, but for training also the discriminator is needed. I don't find the weights for the discriminator in the published folders.

Is it possible that you also share the pretrained discriminator, for fine-tuning the network?

opened by TobiasMei 1

CLADE - Efficient Semantic Image Synthesis via Class-Adaptive Normalization (TPAMI 2021)

Related tags

Overview

Efficient Semantic Image Synthesis via Class-Adaptive Normalization (Accepted by TPAMI)

ArXiv Paper

Abstract

Installation

Dataset Preparation

Generating Images Using Pretrained Model

Training New Models

Calculate FID

New Useful Options

Code Structure

Citation

Acknowledgments

You might also like...

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition, TPAMI 2021

🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Official implementation of the paper 'Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution'

AdaFocus (ICCV 2021) Adaptive Focus for Efficient Video Recognition

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)

(ImageNet pretrained models) The official pytorch implemention of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"

The PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.

Comments

Can't train with --add_dist

Reproduce the mIoU for cityscapes

Pretrained model for the discriminators network

Owner

tzt

An official implementation of "SFNet: Learning Object-aware Semantic Correspondence" (CVPR 2019, TPAMI 2020) in PyTorch.

Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021)

Implementation of Diverse Semantic Image Synthesis via Probability Distribution Modeling

A PyTorch implementation of the paper "Semantic Image Synthesis via Adversarial Learning" in ICCV 2017

The implementation of 'Image synthesis via semantic composition'.

Iterative Normalization: Beyond Standardization towards Efficient Whitening

Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection