Image Translation with ASAPNets
Spatially-Adaptive Pixelwise Networks for Fast Image Translation, CVPR 2021
Webpage | Paper | Video
Installation
Install the requirements:
pip install -r requirements.txt
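If you prefer to keep the dependencies isolated, the requirements can also be installed into a virtual environment first (a minimal sketch using Python's standard venv module; the environment name is arbitrary):
# optional: create and activate an isolated environment
python -m venv asapnets-env
source asapnets-env/bin/activate
pip install -r requirements.txt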
Code Structure
The code is heavily based on the official implementation of SPADE, and therefore has the same structure:
- train.py, test.py: the entry points for training and testing.
- trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
- models/pix2pix_model.py: creates the networks and computes the losses.
- models/networks/: defines the architecture of all models.
- options/: creates option lists using the argparse package. More options are dynamically added in other files as well. Please see the section below.
- data/: defines the classes for loading images and label maps.
The ASAPNets generator is implemented in:
- models/networks/generator: defines the architecture of the ASAPNets generator.
Dataset Preparation
facades
Run:
cd data
bash facadesHR_download_and_extract.sh
This will extract the full-resolution facades images into the datasets/facadesHR folder.
cityscapes
Download the dataset into datasets/cityscapes and arrange it in the folders: train_images, train_labels, val_images, val_labels.
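For example, the expected folder layout can be created as follows (a sketch; copying the downloaded Cityscapes images and label maps into the matching folders is left to you):
# create the folder layout expected by the data loader
mkdir -p datasets/cityscapes/train_images datasets/cityscapes/train_labels
mkdir -p datasets/cityscapes/val_images datasets/cityscapes/val_labels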
Generating Images Using Pretrained Models
Pretrained models can be downloaded from here. Save the models under the checkpoints/ folder. Images can be generated using the command:
# Facades 512
bash test_facades512.sh
# Facades 1024
bash test_facades1024.sh
# Cityscapes
bash test_cityscapes.sh
The output images will appear in the ./results/ folder.
Training New Models
New models can be trained with the following commands. Prepare the dataset in the ./datasets/ folder and arrange it in the folders: train_images, train_labels, val_images, val_labels. For custom datasets, the easiest way is to use ./data/custom_dataset.py by specifying the option --dataset_mode custom, along with --label_dir [path_to_labels] --image_dir [path_to_images]. You also need to specify options such as --label_nc for the number of label classes in the dataset, --contain_dontcare_label to specify whether it has an unknown label, or --no_instance to denote that the dataset doesn't have instance maps.
Run:
python train.py --name [experiment_name] --dataset_mode custom --label_dir [path_to_labels] --image_dir [path_to_images] --label_nc [num_labels]
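For example, a hypothetical custom dataset with 10 label classes, no unknown label, and no instance maps could be trained like this (the experiment name and paths are placeholders):
python train.py --name my_experiment --dataset_mode custom --label_dir datasets/my_dataset/train_labels --image_dir datasets/my_dataset/train_images --label_nc 10 --no_instance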
There are many additional options you can specify; please explore the ./options files. To specify which GPUs to utilize, use --gpu_ids.
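For instance, training on the first two GPUs of a machine could be requested like this (a sketch; the ids depend on your setup):
python train.py --name my_experiment --dataset_mode custom --label_dir datasets/my_dataset/train_labels --image_dir datasets/my_dataset/train_images --label_nc 10 --no_instance --gpu_ids 0,1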
Testing
Testing is similar to testing pretrained models.
python test.py --name [name_of_experiment] --dataset_mode [dataset_mode] --dataroot [path_to_dataset]
You can load the options used during training by specifying --load_from_opt_file.
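For example, to test the hypothetical custom-dataset experiment from above while reusing the options saved during training (names and paths are placeholders):
python test.py --name my_experiment --dataset_mode custom --label_dir datasets/my_dataset/val_labels --image_dir datasets/my_dataset/val_images --load_from_opt_file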
Acknowledgments
This code is heavily based on the official implementation of SPADE. We thank the authors for sharing their code publicly!
License
Attribution-NonCommercial-ShareAlike 4.0 International (see file).
Citation
@inproceedings{RottShaham2020ASAP,
title={Spatially-Adaptive Pixelwise Networks for Fast Image Translation},
author={Rott Shaham, Tamar and Gharbi, Michael and Zhang, Richard and Shechtman, Eli and Michaeli, Tomer},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2021}
}