Text to Image Generation with Semantic-Spatial Aware GAN

Overview

[figure: text2image]

This repository contains the implementation of Text to Image Generation with Semantic-Spatial Aware GAN (arXiv:2104.00567).

This repo is not complete yet.

Network Structure

[figure: network structure]

The structure of the semantic-spatial aware convolutional network (SSACN) is shown below:

[figure: SSACN]

Requirements

  • python 3.6+
  • pytorch 1.0+
  • numpy
  • matplotlib
  • opencv

Or install the full requirements by running:

pip install -r requirements.txt
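
For reference, a minimal requirements.txt consistent with the list above might look like this (the version pins are assumptions; adjust them to your environment):

torch>=1.0
torchvision
numpy
matplotlib
opencv-python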

TODO

  • instruction to prepare dataset
  • remove all unnecessary files
  • add link to download our pre-trained model
  • clean code including comments
  • instruction for training
  • instruction for evaluation

Prepare data

  1. Download the preprocessed metadata for birds and coco and save them to data/
  2. Download the birds image data and extract it to data/birds/
  3. Download the coco dataset and extract the images to data/coco/ (a quick layout check follows this list)
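
After these steps, the layout can be sanity-checked with a small script like the sketch below (hedged: captions.pickle is the metadata file the loader expects, while the image subfolder names, e.g. CUB_200_2011, depend on the downloaded archives and are assumptions):

import os

# Check the expected data layout; adjust the paths to what you actually extracted.
for path in ['data/birds/captions.pickle',
             'data/coco/captions.pickle',
             'data/birds/CUB_200_2011/images',
             'data/coco/images']:
    print(path, 'OK' if os.path.exists(path) else 'MISSING')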

Pre-trained text encoder

  1. Download the pre-trained text encoder for CUB and save it to DAMSMencoders/bird/inception/
  2. Download the pre-trained text encoder for coco and save it to DAMSMencoders/coco/inception/ (a loading sketch follows below)
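
A minimal sketch of loading the CUB text encoder (hedged: the RNN_ENCODER name follows the AttnGAN lineage of DAMSM.py, and the checkpoint filename, vocabulary size, and hidden size are assumptions; check DAMSM.py and the downloaded files):

import torch
from DAMSM import RNN_ENCODER  # name assumed from the AttnGAN-style DAMSM.py

n_words = 5450  # vocabulary size; in practice read it from data/birds/captions.pickle
text_encoder = RNN_ENCODER(n_words, nhidden=256)  # 256 = assumed text embedding dim
state = torch.load('DAMSMencoders/bird/inception/text_encoder.pth',
                   map_location='cpu')
text_encoder.load_state_dict(state)
text_encoder.eval()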

Trained model

You can download our trained models from our OneDrive repo.

Start training

See opts.py for the options.
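
A hedged example invocation (opts.py is the authority for the actual options; the --help flag assumes opts.py uses argparse):

python main.py --help   # list the available options
python main.py          # train; dataset and paths are set via the options in opts.py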

Evaluation
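
Detailed evaluation instructions are still on the TODO list above. As an interim, hedged sketch (an assumption about tooling, not necessarily the exact setup behind the paper's reported numbers), FID between a folder of real images and a folder of generated images can be computed with the third-party pytorch-fid package:

pip install pytorch-fid
python -m pytorch_fid path/to/real_images path/to/generated_images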

Performance

You should get scores close to those below after training under the cross-entropy (xe) loss for xxxxx epochs:

[figure: results]

Qualitative Results

Some qualitative results on the coco and birds datasets from different methods are shown below: [figure: qualitative_results]

The predicted mask maps at different stages are shown below: [figure: mask]

Reference

If you find this repo helpful in your research, please consider citing our paper:

@article{liao2021text,
  title={Text to Image Generation with Semantic-Spatial Aware GAN},
  author={Liao, Wentong and Hu, Kai and Yang, Michael Ying and Rosenhahn, Bodo},
  journal={arXiv preprint arXiv:2104.00567},
  year={2021}
}

The code is released for academic research use only. For commercial use, please contact Wentong Liao.

Acknowledgements

This implementation borrows part of the code from DF-GAN.

Comments
  • inference.py file

    Hi there, brilliant work! Thanks.

    I would be grateful if you could provide an inference.py (sometimes also called predict.py) with which one can generate an image from any input sentence.

    opened by Cm744 12
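
    Editor's sketch (not part of the repo): a minimal inference script under loud assumptions -- the RNN_ENCODER in DAMSM.py, a generator NetG in model.py callable as netG(noise, sent_emb) (DF-GAN style), and the captions.pickle layout are all assumed; check the actual code before use.

    import pickle
    import torch
    from torchvision.utils import save_image
    from DAMSM import RNN_ENCODER   # assumed AttnGAN-style text encoder
    from model import NetG          # assumed DF-GAN-style generator

    # Vocabulary: index 3 of captions.pickle is wordtoix in AttnGAN-style data (assumption).
    with open('data/birds/captions.pickle', 'rb') as f:
        wordtoix = pickle.load(f)[3]

    sentence = 'this bird has a red head and a short beak'
    tokens = [wordtoix[w] for w in sentence.split() if w in wordtoix]
    captions = torch.tensor([tokens])
    cap_lens = torch.tensor([len(tokens)])

    text_encoder = RNN_ENCODER(len(wordtoix), nhidden=256)  # 256 = assumed embedding dim
    state = torch.load('DAMSMencoders/bird/inception/text_encoder.pth',
                       map_location='cpu')
    text_encoder.load_state_dict(state)
    text_encoder.eval()

    netG = NetG(64, 100)  # channel/latent sizes are assumptions
    netG.load_state_dict(torch.load('netG_600.pth', map_location='cpu'))  # checkpoint name from the provided models
    netG.eval()

    with torch.no_grad():
        hidden = text_encoder.init_hidden(1)
        words_embs, sent_emb = text_encoder(captions, cap_lens, hidden)  # words_embs unused here
        fake = netG(torch.randn(1, 100), sent_emb)
    save_image(fake, 'fake.png', normalize=True)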
  • How to put the data to evaluate the result?

    Could you please tell me how to arrange the image data into folders when I want to compute FID, LPIPS, or IS? The train and test sets are split by ".pickle" files, and I don't know how to run the evaluation metrics.

    opened by cookie-ke 10
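
    Editor's sketch: FID/IS tools typically just read flat folders of images, so one hedged way to build the ground-truth folder from the test split (the test/filenames.pickle path and the image folder layout are assumptions based on the AttnGAN-style metadata this repo uses):

    import os
    import pickle
    import shutil

    with open('data/birds/test/filenames.pickle', 'rb') as f:  # assumed path
        filenames = pickle.load(f)
    os.makedirs('eval/real', exist_ok=True)
    for name in filenames:
        src = os.path.join('data/birds/CUB_200_2011/images', name + '.jpg')  # assumed layout
        shutil.copy(src, os.path.join('eval/real', name.replace('/', '_') + '.jpg'))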
  • Problems encountered in training

    Hello, author. At the beginning of training, the system asked me for the netG model from the 600th epoch. After I downloaded the model you provided, the program continued to run, but it stopped after only a few minutes. Why does training need the 600th-epoch model to start, and why does the program stop after a few minutes? Thanks.

    opened by 1hexf1 7
  • How to calculate R-precision?

    @wtliao

    I wanted to calculate R-precision for comparison with a few state-of-the-art benchmarks, so I adapted the code in main.py to calculate R-precision using the finetuned trained model you provided. I got a value of around 73%. I was checking the newest (5th) version of your paper, where you report an R-precision of 86%. Can you please share your code for calculating R-precision so that I can figure out why there is such a large difference between our scores?

    Thank you

    opened by priyankaupadhyay090 5
  • Issue of FID and IS score

    @wtliao

    I have calculated the scores:

    • The FID score I am getting is very high: FID: 73.33472569962976
    • The IS score is quite low: Inception mean: 4.732609, Inception std: 0.1345223

    For now, I am generating the images from the epoch-550 checkpoint.

    Is the best FID score calculated from the last epoch (550)? Or should I calculate the IS and FID scores for checkpoints every 10 or 50 epochs (or let me know which epoch I should use) and then choose the checkpoint with the best FID and IS scores?

    opened by priyankaupadhyay090 3
  • Seek help: how can I solve this problem?

    Traceback (most recent call last): File "main.py", line 410, in image_encoder = CNN_ENCODER(cfg.TEXT.EMBEDDING_DIM) File "/data01/hxf/text2image/DAMSM.py", line 127, in init model = models.inception_v3(pretrained=True, transform_input=False) File "/home/omnisky/hxf/lib/python3.6/site-packages/torchvision/models/inception.py", line 53, in inception_v3 progress=progress) File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/hub.py", line 557, in load_state_dict_from_url return torch.load(cached_file, map_location=map_location) File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 608, in load return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args) File "/home/omnisky/hxf/lib/python3.6/site-packages/torch/serialization.py", line 777, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'.

    opened by 1hexf1 3
  • How to train the models (instead of using the provided trained model from the OneDrive repo)

    @wtliao I want to train the CUB and COCO models myself instead of using the provided trained models (OneDrive repo). Would model.py help me train the model, or is there something more I should follow?

    opened by priyankaupadhyay090 1
  • About the mask maps

    Hi, thank you for releasing this excellent code! I have a small question: how can I visualize the predicted mask maps during validation? I attempted to use the save_image() API from torchvision.utils, but the results are not consistent with the mask maps shown in the paper.

    Looking forward to your reply! Thank you!

    opened by zjuirene 0
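
    Editor's sketch: one hedged way to visualize per-stage masks, assuming each predicted mask is a tensor of shape [1, 1, H, W] with values in [0, 1] (the actual shapes in this repo may differ):

    import torch
    import torch.nn.functional as F
    from torchvision.utils import save_image

    # Placeholder masks; replace with the model's predicted per-stage masks.
    masks = [torch.rand(1, 1, s, s) for s in (32, 64, 128)]
    # Upsample every stage to a common size and tile them side by side.
    masks_256 = [F.interpolate(m, size=(256, 256), mode='nearest') for m in masks]
    save_image(torch.cat(masks_256, dim=0), 'masks.png', nrow=len(masks_256), normalize=True)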
  • captions.pickle not found

    Hello, I am a freshman and I want to run this program, but I ran into trouble: FileNotFoundError: [Errno 2] No such file or directory: '../data/birds/captions.pickle'. How should this be solved? Thank you.

    opened by 1hexf1 0
  • Help: how to solve this problem?

    Traceback (most recent call last): File "D:\text2image-main\text2image-main\main.py", line 496, in dataset = TextDataset(cfg.DATA_DIR, 'test', File "D:\text2image-main\text2image-main\datasets.py", line 124, in init self.wordtoix, self.n_words = self.load_text_data(data_dir, split) File "D:\text2image-main\text2image-main\datasets.py", line 243, in load_text_data x = pickle.load(f) EOFError: Ran out of input

    I found suggested fixes such as setting num_workers = 0, but that does not solve it. Please help.

    opened by WenBingo 0
  • FID not matching for COCO dataset

    Hi,

    Thanks for providing the code for the paper!

    I tried to reproduce the FID score on the COCO dataset by generating images for the validation set as reported in the paper, using the generator from the 120th epoch (netG_120.pth) and the text encoder from the 595th epoch. I used the PyTorch implementation of the FID score, and it gives me an FID of around 121, as opposed to the 19 reported.

    I resized the original COCO images to 256x256 resolution to have a consistent image size, but the score is still high.

    Even the generation is weird for sample sentences.

    For example, the attached image has the caption "A close up of a boat on a field with a cloudy sky". This caption is taken from the paper, but the generation using the final generator and text encoder models is nowhere near what is presented in the paper.

    Any suggestions from your side as to what needs to be done? [image: img_0]

    Also, can you please explain the difference between main.py and main_finetune.py? There does not seem to be much difference between the two scripts.

    opened by Mishra1995 1
  • FID score in COCO

    Hello, I tested the trained models from your OneDrive on COCO; the FID score is 28.1929, which is a big gap from the paper's number (19.37). My FID code is from https://github.com/bioinf-jku/TTUR, and my GPUs are four RTX 3090s. I am confused about this and would like to know the parameters your FID code uses for COCO, such as the image size of the ground truth and whether or not you split the coco data. Thanks.

    opened by suhengleaf 1