Text to Image Generation with Semantic-Spatial Aware GAN

Overview

[figure: text2image]

This repository contains the implementation of Text to Image Generation with Semantic-Spatial Aware GAN (arXiv:2104.00567).

This repo is not complete yet.

Network Structure

[figure: network structure]

The structure of the semantic-spatial aware convolutional network (SSACN) is shown below:

[figure: SSACN]

Requirements

  • python 3.6+
  • pytorch 1.0+
  • numpy
  • matplotlib
  • opencv

Or install the full requirements by running:

pip install -r requirements.txt
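
For reference, a minimal requirements.txt consistent with the list above might look like this (the version pins are assumptions; adjust them to your environment):

torch>=1.0
torchvision
numpy
matplotlib
opencv-python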

TODO

  • instruction to prepare dataset
  • remove all unnecessary files
  • add link to download our pre-trained model
  • clean code including comments
  • instruction for training
  • instruction for evaluation

Prepare data

  1. Download the preprocessed metadata for birds and coco and save them to data/
  2. Download the birds image data and extract it to data/birds/
  3. Download the coco dataset and extract the images to data/coco/ (a quick layout check follows this list)
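
After these steps, the layout can be sanity-checked with a small script like the sketch below (hedged: captions.pickle is the metadata file the loader expects, while the image subfolder names, e.g. CUB_200_2011, depend on the downloaded archives and are assumptions):

import os

# Check the expected data layout; adjust the paths to what you actually extracted.
for path in ['data/birds/captions.pickle',
             'data/coco/captions.pickle',
             'data/birds/CUB_200_2011/images',
             'data/coco/images']:
    print(path, 'OK' if os.path.exists(path) else 'MISSING')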

Pre-trained text encoder

  1. Download the pre-trained text encoder for CUB and save it to DAMSMencoders/bird/inception/
  2. Download the pre-trained text encoder for coco and save it to DAMSMencoders/coco/inception/ (a loading sketch follows below)
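
A minimal sketch of loading the CUB text encoder (hedged: the RNN_ENCODER name follows the AttnGAN lineage of DAMSM.py, and the checkpoint filename, vocabulary size, and hidden size are assumptions; check DAMSM.py and the downloaded files):

import torch
from DAMSM import RNN_ENCODER  # name assumed from the AttnGAN-style DAMSM.py

n_words = 5450  # vocabulary size; in practice read it from data/birds/captions.pickle
text_encoder = RNN_ENCODER(n_words, nhidden=256)  # 256 = assumed text embedding dim
state = torch.load('DAMSMencoders/bird/inception/text_encoder.pth',
                   map_location='cpu')
text_encoder.load_state_dict(state)
text_encoder.eval()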

Trained model

You can download our trained models from our OneDrive repo.

Start training

See opts.py for the options.
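
A hedged example invocation (opts.py is the authority for the actual options; the --help flag assumes opts.py uses argparse):

python main.py --help   # list the available options
python main.py          # train; dataset and paths are set via the options in opts.py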

Evaluation
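
Detailed evaluation instructions are still on the TODO list above. As an interim, hedged sketch (an assumption about tooling, not necessarily the exact setup behind the paper's reported numbers), FID between a folder of real images and a folder of generated images can be computed with the third-party pytorch-fid package:

pip install pytorch-fid
python -m pytorch_fid path/to/real_images path/to/generated_images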

Performance

You should get scores close to those below after training under the cross-entropy (xe) loss for xxxxx epochs:

[figure: results]

Qualitative Results

Some qualitative results on the coco and birds datasets from different methods are shown below: [figure: qualitative_results]

The predicted mask maps at different stages are shown below: [figure: mask]

Reference

If you find this repo helpful in your research, please consider citing our paper:

@article{liao2021text,
  title={Text to Image Generation with Semantic-Spatial Aware GAN},
  author={Liao, Wentong and Hu, Kai and Yang, Michael Ying and Rosenhahn, Bodo},
  journal={arXiv preprint arXiv:2104.00567},
  year={2021}
}

The code is released for academic research use only. For commercial use, please contact Wentong Liao.

Acknowledgements

This implementation borrows part of the code from DF-GAN.

Comments
  • inference.py file

    Hi there, brilliant work! Thanks.

    I would be grateful if you could provide an inference.py (sometimes also called predict.py) with which one can generate an image from any input sentence.

    opened by Cm744 12
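
    Editor's sketch (not part of the repo): a minimal inference script under loud assumptions -- the RNN_ENCODER in DAMSM.py, a generator NetG in model.py callable as netG(noise, sent_emb) (DF-GAN style), and the captions.pickle layout are all assumed; check the actual code before use.

    import pickle
    import torch
    from torchvision.utils import save_image
    from DAMSM import RNN_ENCODER   # assumed AttnGAN-style text encoder
    from model import NetG          # assumed DF-GAN-style generator

    # Vocabulary: index 3 of captions.pickle is wordtoix in AttnGAN-style data (assumption).
    with open('data/birds/captions.pickle', 'rb') as f:
        wordtoix = pickle.load(f)[3]

    sentence = 'this bird has a red head and a short beak'
    tokens = [wordtoix[w] for w in sentence.split() if w in wordtoix]
    captions = torch.tensor([tokens])
    cap_lens = torch.tensor([len(tokens)])

    text_encoder = RNN_ENCODER(len(wordtoix), nhidden=256)  # 256 = assumed embedding dim
    state = torch.load('DAMSMencoders/bird/inception/text_encoder.pth',
                       map_location='cpu')
    text_encoder.load_state_dict(state)
    text_encoder.eval()

    netG = NetG(64, 100)  # channel/latent sizes are assumptions
    netG.load_state_dict(torch.load('netG_600.pth', map_location='cpu'))  # checkpoint name from the provided models
    netG.eval()

    with torch.no_grad():
        hidden = text_encoder.init_hidden(1)
        words_embs, sent_emb = text_encoder(captions, cap_lens, hidden)  # words_embs unused here
        fake = netG(torch.randn(1, 100), sent_emb)
    save_image(fake, 'fake.png', normalize=True)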
  • How to put the data to evaluate the result?

    Could you please tell me how to arrange the image data into folders when I want to compute FID, LPIPS, or IS? The train and test sets are split by ".pickle" files, and I don't know how to run the evaluation metrics.

    opened by cookie-ke 10
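
    Editor's sketch: FID/IS tools typically just read flat folders of images, so one hedged way to build the ground-truth folder from the test split (the test/filenames.pickle path and the image folder layout are assumptions based on the AttnGAN-style metadata this repo uses):

    import os
    import pickle
    import shutil

    with open('data/birds/test/filenames.pickle', 'rb') as f:  # assumed path
        filenames = pickle.load(f)
    os.makedirs('eval/real', exist_ok=True)
    for name in filenames:
        src = os.path.join('data/birds/CUB_200_2011/images', name + '.jpg')  # assumed layout
        shutil.copy(src, os.path.join('eval/real', name.replace('/', '_') + '.jpg'))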
  • Problems encountered in training

    Hello, author. At the beginning of training, the system asked me for the netG model from the 600th epoch. After I downloaded the model you provided, the program continued to run, but it stopped after only a few minutes. Why does training need the 600th-epoch model to start, and why does the program stop after a few minutes? Thanks.

    opened by 1hexf1 7
  • How to calculate R-precision?

    @wtliao

    I wanted to calculate R-precision for comparison with a few state-of-the-art benchmarks, so I adapted the code in main.py to calculate R-precision using the finetuned trained model you provided. I got a value of around 73%. I was checking the newest (5th) version of your paper, where you report an R-precision of 86%. Can you please share your code for calculating R-precision so that I can figure out why there is such a large difference between our scores?

    Thank you

    opened by priyankaupadhyay090 5
  • Issue of FID and IS score

    @wtliao

    I have calculated the scores:

    • The FID score I am getting is very high: FID: 73.33472569962976
    • The IS score is quite low: Inception mean: 4.732609, Inception std: 0.1345223

    For now, I am generating the images from the epoch-550 checkpoint.

    Is the best FID score calculated from the last epoch (550)? Or should I calculate the IS and FID scores for checkpoints every 10 or 50 epochs (or let me know which epoch I should use) and then choose the checkpoint with the best FID and IS scores?

    opened by priyankaupadhyay090 3
  • Seek help: how can I solve this problem?

    Traceback (most recent call last): File "main.py", line 410, in image_encoder = CNN_ENCODER(cfg.TEXT.EMBEDDING_DIM) File "/data01/hxf/text2image/DAMSM.py", line 127, in init model = models.inception_v3(pretrained=True, transform_input=False) File "/home/omnisky/hxf/lib/python3.6/site-packages/torchvision/models/inception.py", line 53, in inception_v3 progress=progress) File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/hub.py", line 557, in load_state_dict_from_url return torch.load(cached_file, map_location=map_location) File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 608, in load return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args) File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 777, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'.

    opened by 1hexf1 3
  • How to train the models (instead of using the provided trained model from the OneDrive repo)

    @wtliao I want to train the CUB and COCO models myself instead of using the provided trained models (OneDrive repo). Would model.py help me train the model, or is there something more I should follow?

    opened by priyankaupadhyay090 1
  • About the mask maps

    Hi, thank you for releasing this excellent code! I have a small question: how can I visualize the predicted mask maps during validation? I attempted to use the save_image() API from torchvision.utils, but the results are not consistent with the mask maps shown in the paper.

    Looking forward to your reply! Thank you!

    opened by zjuirene 0
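
    Editor's sketch: one hedged way to visualize per-stage masks, assuming each predicted mask is a tensor of shape [1, 1, H, W] with values in [0, 1] (the actual shapes in this repo may differ):

    import torch
    import torch.nn.functional as F
    from torchvision.utils import save_image

    # Placeholder masks; replace with the model's predicted per-stage masks.
    masks = [torch.rand(1, 1, s, s) for s in (32, 64, 128)]
    # Upsample every stage to a common size and tile them side by side.
    masks_256 = [F.interpolate(m, size=(256, 256), mode='nearest') for m in masks]
    save_image(torch.cat(masks_256, dim=0), 'masks.png', nrow=len(masks_256), normalize=True)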
  • captions.pickle not found

    Hello, I am a freshman and I want to run this program, but I ran into trouble: FileNotFoundError: [Errno 2] No such file or directory: '../data/birds/captions.pickle'. How should this be solved? Thank you.

    opened by 1hexf1 0
  • Help: how to solve this problem?

    Traceback (most recent call last): File "D:\text2image-main\text2image-main\main.py", line 496, in dataset = TextDataset(cfg.DATA_DIR, 'test', File "D:\text2image-main\text2image-main\datasets.py", line 124, in init self.wordtoix, self.n_words = self.load_text_data(data_dir, split) File "D:\text2image-main\text2image-main\datasets.py", line 243, in load_text_data x = pickle.load(f) EOFError: Ran out of input

    I found suggested fixes such as setting num_workers = 0, but that does not solve it. Please help.

    opened by WenBingo 0
  • FID not matching for COCO dataset

    Hi,

    Thanks for providing the code for the paper!

    I tried to reproduce the FID score on the COCO dataset by generating images for the validation set as reported in the paper, using the generator from the 120th epoch (netG_120.pth) and the text encoder from the 595th epoch. I used the PyTorch implementation of the FID score, and it gives me an FID of around 121, as opposed to the 19 reported.

    I resized the original COCO images to 256x256 resolution to have a consistent image size, but the score is still high.

    Even the generation is weird for sample sentences.

    For example, the attached image has the caption "A close up of a boat on a field with a cloudy sky". This caption is taken from the paper, but the generation using the final generator and text encoder models is nowhere near what is presented in the paper.

    Any suggestions from your side as to what needs to be done? [image: img_0]

    Also, can you please explain the difference between main.py and main_finetune.py? There does not seem to be much difference between the two scripts.

    opened by Mishra1995 1
  • FID score in COCO

    Hello, I tested the trained models from your OneDrive on COCO; the FID score is 28.1929, which is a big gap from the paper's number (19.37). My FID code is from https://github.com/bioinf-jku/TTUR, and my GPUs are four RTX 3090s. I am confused about this and would like to know the parameters your FID code uses for COCO, such as the image size of the ground truth and whether or not you split the coco data. Thanks.

    opened by suhengleaf 1