Generative Adversarial Text-to-Image Synthesis

###Generative Adversarial Text-to-Image Synthesis

Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee

This is the code for our ICML 2016 paper on text-to-image synthesis using conditional GANs. You can use it to train and sample from text-to-image models. The code is adapted from the excellent dcgan.torch.

####Setup Instructions

You will need to install Torch, cuDNN, and the display package.

####How to train a text-to-image model:

  1. Download the birds, flowers, and COCO caption data in Torch format.
  2. Download the birds, flowers, and COCO image data.
  3. Download the text encoders for the birds, flowers, and COCO descriptions.
  4. Modify the CONFIG file to point to your data and text encoder paths.
  5. Run one of the training scripts, e.g. ./scripts/train_cub.sh
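
Before launching a training script, it can help to confirm that every path set in step 4 actually exists. The following is a minimal sketch of such a check, not part of this repository. It assumes the CONFIG file holds shell-style `NAME=value` assignments (optionally prefixed with `export`) whose values are file or directory paths; the helper names `read_config_paths` and `missing_paths`, and the variable names in the demo, are hypothetical.

```python
import os
import re
import tempfile

# Matches lines such as `export NAME=/path` or `NAME="/path"` (assumed CONFIG format).
ASSIGNMENT = re.compile(r'^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$')


def read_config_paths(config_path):
    """Return {name: value} for every NAME=value line in a shell-style file."""
    entries = {}
    with open(config_path, encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            match = ASSIGNMENT.match(stripped)
            if match:
                name, value = match.groups()
                # Drop surrounding quotes, if any.
                entries[name] = value.strip().strip('"').strip("'")
    return entries


def missing_paths(entries):
    """Return the sorted names whose values do not exist on disk."""
    return sorted(name for name, value in entries.items()
                  if not os.path.exists(value))


if __name__ == "__main__":
    # Demo with made-up variable names and a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "data")
        os.mkdir(data_dir)
        config_file = os.path.join(tmp, "CONFIG")
        with open(config_file, "w", encoding="utf-8") as handle:
            handle.write("export EXAMPLE_DATA_DIR=%s\n" % data_dir)
            handle.write("EXAMPLE_NET_TXT=%s\n" % os.path.join(tmp, "missing.t7"))
        print("Missing:", missing_paths(read_config_paths(config_file)))
```

Any assignment whose value is not a path (a number, for example) would also be reported as missing, so treat the output as a prompt to inspect the file rather than as a definitive error list.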

####How to generate samples:

  • For flowers: ./scripts/demo_flowers.sh. Add text descriptions to scripts/flowers_queries.txt.
  • For birds: ./scripts/demo_cub.sh.
  • For COCO (more general images): ./scripts/demo_coco.sh.
  • An HTML file will be generated with the results.
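
Adding descriptions to a queries file can be scripted. The sketch below is a hypothetical helper, not part of this repository, and it assumes the queries file holds one description per line, which the text above does not state explicitly. The function name `add_queries` and the example description are made up for illustration.

```python
import tempfile
from pathlib import Path


def add_queries(query_file, descriptions):
    """Append descriptions to a queries file, one per line, and return all queries."""
    path = Path(query_file)
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    # Collapse internal whitespace so each description stays on a single line.
    cleaned = [" ".join(text.split()) for text in descriptions]
    # Drop empty entries from both the old and the new content.
    merged = [query for query in existing + cleaned if query.strip()]
    path.write_text("".join(query + "\n" for query in merged), encoding="utf-8")
    return merged


if __name__ == "__main__":
    # Demo on a throwaway file; in a real checkout the target would be
    # scripts/flowers_queries.txt.
    with tempfile.TemporaryDirectory() as tmp:
        demo_file = Path(tmp) / "flowers_queries.txt"
        add_queries(demo_file, ["this flower has  white petals\nand a yellow center"])
        print(demo_file.read_text(encoding="utf-8"), end="")
```

After updating the file, rerun ./scripts/demo_flowers.sh to regenerate the results.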

####Pretrained models:

####How to train a text encoder from scratch:

  • You may want to do this if you have your own new dataset of text descriptions.
  • For flowers and birds: follow the instructions here.
  • For MS-COCO: ./scripts/train_coco_txt.sh.

####Citation

If you find this useful, please cite our work as follows:

@inproceedings{reed2016generative,
  title={Generative Adversarial Text-to-Image Synthesis},
  author={Scott Reed and Zeynep Akata and Xinchen Yan and Lajanugen Logeswaran and Bernt Schiele and Honglak Lee},
  booktitle={Proceedings of The 33rd International Conference on Machine Learning},
  year={2016}
}

Comments
  •  bad argument #4 to 'v' (cannot convert 'struct THCudaLongTensor *' to 'struct THCudaTensor *')

    rzai@rzai00:~/prj/icml2016$ bash scripts/train_coco_txt.sh
    {
      img_dir : "/media/rzai/ai_data/VQA-ALL/mscoco.org-visualqa.org/train2014"
      beta1 : 0.5
      nThreads : 6
      txtSize : 1024
      niter : 200
      batchSize : 256
      lr_decay : 0.5
      fineSize : 64
      use_cudnn : 1
      init_t : ""
      numCaption : 1
      loadSize : 76
      print_every : 4
      encoder : "gru18"
      name : "coco_gru18_bs256_c512"
      gpu : 1
      checkpoint_dir : "checkpoints"
      dataset : "coco_txt"
      filenames : ""
      lr : 0.0002
      ntrain : inf
      decay_every : 50
      save_every : 5
      data_root : "/home/rzai/_reedscot/de_coco_icml.tar.gz/train2014_ex_t7"
      doc_length : 201
      cnn_dim : 512
      display_id : 101
      display : 0
    }
    Random Seed: 3243
    Starting donkey with id: 1 seed: 3244
    Starting donkey with id: 5 seed: 3248
    Starting donkey with id: 4 seed: 3247
    Starting donkey with id: 2 seed: 3245
    Starting donkey with id: 3 seed: 3246
    Starting donkey with id: 6 seed: 3249
    Dataset: coco_txt Size: 82783
    Warning: cudnn.convert does not work with nngraph yet. Ignoring nn.gModule
    Warning: cudnn.convert does not work with nngraph yet. Ignoring nn.gModule
    /home/rzai/torch/install/bin/luajit: /home/rzai/torch/install/share/lua/5.1/nn/Container.lua:67:
    In 3 module of nn.Sequential:
    /home/rzai/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #4 to 'v' (cannot convert 'struct THCudaLongTensor *' to 'struct THCudaTensor *')
    stack traceback:
      [C]: in function 'v'
      /home/rzai/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'TemporalMaxPooling_updateOutput'
      ...ai/torch/install/share/lua/5.1/nn/TemporalMaxPooling.lua:19: in function <...ai/torch/install/share/lua/5.1/nn/TemporalMaxPooling.lua:12>
      [C]: in function 'xpcall'
      /home/rzai/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
      /home/rzai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
      /home/rzai/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
      /home/rzai/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
      main_txt_coco.lua:181: in function 'opfunc'
      /home/rzai/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
      main_txt_coco.lua:207: in main chunk
      [C]: in function 'dofile'
      ...rzai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
      [C]: at 0x00406670

    WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
    stack traceback:
      [C]: in function 'error'
      /home/rzai/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
      /home/rzai/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
      /home/rzai/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
      /home/rzai/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
      main_txt_coco.lua:181: in function 'opfunc'
      /home/rzai/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
      main_txt_coco.lua:207: in main chunk
      [C]: in function 'dofile'
      ...rzai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
      [C]: at 0x00406670
    rzai@rzai00:~/prj/icml2016$

    opened by loveJasmine 4
  • Image information when training the text encoder for COCO?

    Hi Reed

    This is really nice work, and thanks for sharing the code.

    However, when I read the code for the COCO text encoder, it seems your loss function uses only the text information; no image information is used there. Things are different for the flower and bird datasets.

    https://github.com/reedscot/icml2016/blob/master/main_txt_coco.lua#L138

    Could you give a hint as to why you did this?

    Thanks

    Best

    Jiasen

    opened by jiasenlu 2
  • out of memory on demo_coco

    Hello there. We tried to test your model on a computer with the following configuration:

    Ubuntu 16.04 with CUDA 8, cuDNN 5, and Torch 7

    The demo_coco.sh script failed with the following error:

    THCudaCheck FAIL file=/home/sku/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
    /home/sku/torch/install/bin/luajit: /home/sku/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /home/sku/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
    stack traceback:
      [C]: in function 'read'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:351: in function </home/sku/torch/install/share/lua/5.1/torch/File.lua:245>
      [C]: in function 'read'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
      /home/sku/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
      /home/sku/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
      /home/sku/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
      txt2img_demo.lua:43: in main chunk
      [C]: in function 'dofile'
      ...sku/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
      [C]: at 0x00405d50

    The error remains the same when changing batch_size from 16 to 8 or 1 in txt2img_demo.lua. Does anyone have any idea how to solve this problem?

    opened by olivain 0
  • Train another dataset

    Hi Scott, nice job and thanks for sharing your code! I have a sign language dataset (digits 0 to 9) and I want to train a model that I can use (as a .mlmodel) in my iOS app. Could you please tell me how I can change your code to train on my dataset? Thanks.

    opened by siryne 0
  • About the coco text encoder

    Hi! Thanks for sharing your code. I am really interested in your work, but I can't find the checkpoint for COCO_NET_TXT. Would you mind sharing the file referenced as "COCO_NET_TXT=/home/reedscot/checkpoints/coco_gru18_bs64_cls0.5_ngf128_ndf128_a10_c512_80_net_T.t7"?

    opened by MikeXuQ 0
  • Inconsistencies in /cub_icml. Updating CUB class name.

    Recently, I downloaded the latest CUB datasets and found some inconsistencies between the caption classes and CUB image classes. These classes have been renamed as:

    009.Brewers_Blackbird -> 009.Brewer_Blackbird
    022.Chuck_wills_Widow -> 022.Chuck_will_Widow
    023.Brandts_Cormorant -> 023.Brandt_Cormorant
    061.Heermanns_Gull -> 061.Heermann_Gull
    067.Annas_Hummingbird -> 067.Anna_Hummingbird
    093.Clarks_Nutcracker -> 093.Clark_Nutcracker
    098.Scotts_Oriole -> 098.Scott_Oriole
    113.Bairds_Sparrow -> 113.Baird_Sparrow
    115.Brewers_Sparrow -> 115.Brewer_Sparrow
    122.Harriss_Sparrow -> 122.Harris_Sparrow
    123.Henslows_Sparrow -> 123.Henslow_Sparrow
    124.Le_Contes_Sparrow -> 124.Le_Conte_Sparrow
    125.Lincolns_Sparrow -> 125.Lincoln_Sparrow
    126.Nelson_Sparrow -> 126.Nelson_Sharp_tailed_Sparrow
    178.Swainsons_Warbler -> 178.Swainson_Warbler
    180.Wilsons_Warbler -> 180.Wilson_Warbler
    193.Bewicks_Wren -> 193.Bewick_Wren

    Please rename these folders in cub_icml.

    opened by jingliao132 0
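
The renames requested in the issue above could be applied with a short script. This is only a sketch under stated assumptions: it assumes the class folders sit directly under the caption data directory passed as `root`, and the function name `rename_cub_classes` and its `dry_run` flag are hypothetical, not part of this repository. The mapping itself is copied from the issue text.

```python
import os
import tempfile

# Old caption-folder name -> current CUB image-folder name, as listed in the issue.
CUB_RENAMES = {
    "009.Brewers_Blackbird": "009.Brewer_Blackbird",
    "022.Chuck_wills_Widow": "022.Chuck_will_Widow",
    "023.Brandts_Cormorant": "023.Brandt_Cormorant",
    "061.Heermanns_Gull": "061.Heermann_Gull",
    "067.Annas_Hummingbird": "067.Anna_Hummingbird",
    "093.Clarks_Nutcracker": "093.Clark_Nutcracker",
    "098.Scotts_Oriole": "098.Scott_Oriole",
    "113.Bairds_Sparrow": "113.Baird_Sparrow",
    "115.Brewers_Sparrow": "115.Brewer_Sparrow",
    "122.Harriss_Sparrow": "122.Harris_Sparrow",
    "123.Henslows_Sparrow": "123.Henslow_Sparrow",
    "124.Le_Contes_Sparrow": "124.Le_Conte_Sparrow",
    "125.Lincolns_Sparrow": "125.Lincoln_Sparrow",
    "126.Nelson_Sparrow": "126.Nelson_Sharp_tailed_Sparrow",
    "178.Swainsons_Warbler": "178.Swainson_Warbler",
    "180.Wilsons_Warbler": "180.Wilson_Warbler",
    "193.Bewicks_Wren": "193.Bewick_Wren",
}


def rename_cub_classes(root, renames=CUB_RENAMES, dry_run=True):
    """Rename class folders under `root`; return the (old, new) pairs handled."""
    handled = []
    for old, new in renames.items():
        src = os.path.join(root, old)
        dst = os.path.join(root, new)
        if not os.path.isdir(src) or os.path.exists(dst):
            continue  # source folder absent, or target name already taken
        if not dry_run:
            os.rename(src, dst)
        handled.append((old, new))
    return handled


if __name__ == "__main__":
    # Demo on a throwaway directory containing one old-style folder.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "193.Bewicks_Wren"))
        print(rename_cub_classes(tmp, dry_run=False))
        print(sorted(os.listdir(tmp)))
```

With the default `dry_run=True` the function only reports which folders it would rename, which makes it easy to review the plan before touching the data.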
  • Steps to follow to train a text encoder from scratch in README.MD

    Hi, you had mentioned in README.MD: "You may want to do this if you have your own new dataset of text descriptions."

    But there is no link pointing to the instructions. Please add. Thanks! 👍

    opened by shivani-kapania 1
Owner
Scott Ellison Reed
Research Scientist