[CVPR 2019 Oral] Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

Hao Tang

Last update: Dec 2, 2022

Related tags

Deep Learning computer-vision deep-learning computer-graphics pytorch generative-adversarial-network image-manipulation image-generation gans image-translation dayton adversarial-learning image-to-image-translation semantic-maps cross-view cvpr-2019 cvpr2019 cvpr19 cvusa-dataset

Overview

SelectionGAN for Guided Image-to-Image Translation

CVPR Paper | Extended Paper | Guided-I2I-Translation-Papers

Citation

If you use this code for your research, please cite our papers.

@article{tang2020multi,
  title={Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Corso, Jason J and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2002.01048},
  year={2020}
}

@inproceedings{tang2019multichannel,
  title={Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation},
  author={Tang, Hao and Xu, Dan and Sebe, Nicu and Wang, Yanzhi and Corso, Jason J. and Yan, Yan},
  booktitle={CVPR},
  year={2019}
}

@article{tang2020edge,
  title={Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis},
  author={Tang, Hao and Qi, Xiaojuan and Xu, Dan and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2003.13898},
  year={2020}
}

@inproceedings{tang2019local,
  title={Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Torr, Philip HS and Sebe, Nicu},
  booktitle={CVPR},
  year={2020}
}

In the meantime, check out our related papers:

cross-view image translation:
- Cross-View Panorama Image Synthesis
person image generation:
- XingGAN for Person Image Generation (ECCV 2020)
- Bipartite Graph Reasoning GANs for Person Image Generation (BMVC 2020 Oral)
semantic image synthesis:

More related guided image-to-image translation papers can be found on this page.

To Do List

SelectionGAN: CVPR version
SelectionGAN++: TPAMI submission
Pix2pix++: Takes RGB image and target semantic map as inputs: code
X-ForK++: Takes RGB image and target semantic map as inputs: code
X-Seq++: Takes RGB image and target semantic map as inputs: code

Others

How to write a great science paper

Acknowledgments

This source code is inspired by Pix2pix.

Contributions

If you have any questions, comments, or bug reports, feel free to open a GitHub issue, submit a pull request, or e-mail the author Hao Tang ([email protected]).

Collaborations

I'm always interested in meeting new people and hearing about potential collaborations. If you'd like to work together or get in contact with me, please email [email protected]. Some of our projects are listed here.

In life, patience is the key. It's much better to be going somewhere slowly than nowhere fast.

Comments

question

When I run the code with --continue_train --which_epoch 35 --epoch_count 36, I get an error: it cannot find _net_D.pth. Could you tell me how to solve it?
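One way to narrow this down is to list which checkpoint files actually exist for the epoch being resumed. The sketch below assumes pix2pix-style file names of the form <epoch>_net_<name>.pth inside a per-experiment checkpoint directory. The directory path, the network names, and the helper name missing_checkpoints are illustrative assumptions, not taken from the repository.

```python
from pathlib import Path


def missing_checkpoints(checkpoint_dir, epoch, net_names):
    """Return the expected checkpoint file names that are absent.

    Assumes pix2pix-style names: <epoch>_net_<name>.pth (an assumption,
    not confirmed against this repository).
    """
    root = Path(checkpoint_dir)
    expected = [root / f"{epoch}_net_{name}.pth" for name in net_names]
    return [p.name for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Hypothetical experiment directory and network names.
    missing = missing_checkpoints("./checkpoints/cvusa_selectiongan", 35, ["G", "D"])
    print("missing:", missing or "none")
```

An empty list means every expected file is present for that epoch. Otherwise, resuming from an epoch for which all files exist is the simplest thing to try.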

opened by haveayoungage 7

inception score

Hello, I trained a model on the CVUSA dataset, but the IS and KL results are slightly worse during evaluation. Below is my training command. Is there a problem that I have not noticed?

python train.py --dataroot ./datasets/cvusa/ \
    --name cvusa_selectiongan \
    --model selectiongan \
    --which_model_netG unet_256 \
    --which_direction AtoB \
    --dataset_mode aligned \
    --norm batch \
    --gpu_ids 0,1 \
    --batchSize 4 \
    --loadSize 286 \
    --fineSize 256 \
    --no_flip \
    --display_id 1 \
    --lambda_L1 100 \
    --lambda_L1_seg 1

opened by 04lm40 4

Running code

Where should we run this code? In Colab and Kaggle notebooks, we run out of disk space when downloading the datasets with the given command for pose transfer (person transfer).

opened by Abhijithchintu 3

Can't download pretrained models

Hello. I've been trying to download your pretrained models, but neither the Google Drive link nor the Baidu link works. Could you please check this? Thanks.

opened by MihaiDogariu 2

the code is different from the paper

In the paper (page 5), Fm is said to be fed into three branches, with a matrix multiplication between two of them producing the channel attention map A, but I can't find that in the code. Is that a new change, or is it a more effective way?
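For reference, the operation described in the question can be sketched as follows. This is only one common way to build a channel attention map (a Gram matrix between two flattened copies of the feature map, followed by a row-wise softmax, then applied to a third copy). It is not confirmed to match either the paper or the repository, and the function names are hypothetical. Plain Python lists are used to keep the sketch dependency-free; in a PyTorch model the same products would normally be batched tensor multiplications.

```python
import math


def channel_attention_map(features):
    """Build a C x C attention map from C channels, each a flat list of N values.

    energy[i][j] is the dot product of channel i and channel j, i.e. the
    product of the (C x N) feature matrix with its (N x C) transpose.
    """
    energy = [[sum(a * b for a, b in zip(ci, cj)) for cj in features]
              for ci in features]
    attention = []
    for row in energy:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        attention.append([e / total for e in exps])
    return attention


def apply_channel_attention(attention, features):
    """Re-weight channels: output channel i is sum_j attention[i][j] * channel j."""
    n = len(features[0])
    return [[sum(w * ch[k] for w, ch in zip(row, features)) for k in range(n)]
            for row in attention]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # C = 3 channels, N = 2 positions
    A = channel_attention_map(feats)
    print(apply_channel_attention(A, feats))
```

Each row of the map sums to 1, so every output channel is a convex combination of the input channels.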

opened by dangbb 1

self.padding, self.dilation, self.groups) RuntimeError: Calculated padded input size per channel: (3 x 3). Kernel size: (4 x 4). Kernel size can't be greater than actual input size

self.padding, self.dilation, self.groups) RuntimeError: Calculated padded input size per channel: (3 x 3). Kernel size: (4 x 4). Kernel size can't be greater than actual input size
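A padded input of 3 x 3 means the feature map has already shrunk to 1 x 1 (with padding 1) before another 4 x 4 convolution is applied. The sketch below traces feature-map sizes through a stack of stride-2 convolutions using the standard output-size formula. It assumes 4 x 4 kernels with stride 2 and padding 1, and eight such layers as in a U-Net-256 style generator. Those layer settings and the function names are assumptions for illustration, not read from this repository.

```python
def conv_out_size(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution with dilation 1."""
    padded = size + 2 * padding
    if kernel > padded:
        raise ValueError(
            f"padded input size ({padded} x {padded}) is smaller than "
            f"kernel size ({kernel} x {kernel})"
        )
    return (padded - kernel) // stride + 1


def trace_downsampling(size, num_layers, **conv_kwargs):
    """Return the feature-map size after each of num_layers stacked convolutions."""
    sizes = []
    for _ in range(num_layers):
        size = conv_out_size(size, **conv_kwargs)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(trace_downsampling(256, 8))  # a 256 x 256 input survives eight halvings
    try:
        trace_downsampling(128, 8)     # a 128 x 128 input runs out after seven
    except ValueError as err:
        print("error:", err)
```

Under these assumptions, an input smaller than 256 x 256 reproduces the reported sizes, so checking the crop size passed to the network against its number of downsampling layers is a reasonable first step.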

opened by GANGREEK 1

CUDNN_ERROR?

Hello Sir,

I tried to train your code on my own dataset, but I got a cuDNN error.

...
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [374,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [374,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/train.py", line 40, in <module>
    trainer.run_generator_one_step(data_i)
  File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/trainers/pix2pix_trainer.py", line 35, in run_generator_one_step
    g_losses, generated = self.pix2pix_model(data, mode='generator')
  File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/pix2pix_model.py", line 46, in forward
    input_semantics, real_image)
  File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/pix2pix_model.py", line 136, in compute_generator_loss
    input_semantics, real_image, compute_kld_loss=self.opt.use_vae)
  File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/pix2pix_model.py", line 198, in generate_fake
    fake_image = self.netG(input_semantics, z=z)
  File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/TESTBOARD/additional_networks/generation/SelectionGAN_Ha0Tang/semantic_synthesis/models/networks/generator.py", line 90, in forward
    x = self.fc(x)
  File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/itsme/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f44b3256a22 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10aa3 (0x7f44b34b7aa3 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7f44b34b9147 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f44b32405a4 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0xa2f382 (0x7f4558065382 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xa2f421 (0x7f4558065421 in /home/itsme/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #21: __libc_start_main + 0xe7 (0x7f455add0b97 in /lib/x86_64-linux-gnu/libc.so.6)

How to solve it??

Thanks, Edward Cho.
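The first lines of the log are "index out of bounds" assertions from a scatter/gather kernel, and the log itself notes that CUDA errors may be reported asynchronously, so the cuDNN message at the convolution may not be the root cause. One possible cause, assuming the label map is one-hot encoded with a scatter over a fixed number of classes (that call is not shown in the traceback), is a label value outside the configured class range. A minimal check under that assumption follows. The function name and the example values are hypothetical.

```python
def out_of_range_labels(label_map, num_classes):
    """Return the sorted label values that fall outside [0, num_classes)."""
    bad = {v for row in label_map for v in row if not 0 <= v < num_classes}
    return sorted(bad)


if __name__ == "__main__":
    # Hypothetical 2 x 3 label map with a stray value of 255.
    label_map = [[0, 1, 2],
                 [2, 255, 1]]
    print(out_of_range_labels(label_map, num_classes=3))
```

Running the training script with CUDA_LAUNCH_BLOCKING=1, as the log suggests, should make the traceback point at the call that actually failed.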

opened by edwardcho 8

Checkers on generated images

Hi,

I have tried to reproduce the results using your dayton_a2g_256_pretrained model. However, there are many checkers in the generated images, which are not visible in Fig. 4 of your paper. I followed all the hyperparameters mentioned in the README file.

I am wondering whether you encountered the same issue in your experiments, or whether I am missing anything.

Attached is one of the screenshots. Thank you in advance!

opened by zhangdan8962 6

