A PyTorch implementation of the ICCV 2017 paper "Semantic Image Synthesis via Adversarial Learning"

Overview

Semantic Image Synthesis via Adversarial Learning

This is a PyTorch implementation of the paper Semantic Image Synthesis via Adversarial Learning.

Model architecture

Requirements

Pretrained word vectors for fastText

Download the pretrained English word vectors. You can see the list of pretrained vectors on this page.
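As a quick sanity check, the downloaded model can be loaded with the fasttext Python bindings; a minimal sketch (the path is illustrative, and the exact accessor API differs across fasttext versions):

    import fasttext

    # Load the pretrained binary model. Note that wiki.en.bin is several GB
    # on disk and needs considerably more RAM to load.
    word_embedding = fasttext.load_model('datasets/fastText/wiki.en.bin')
    print(word_embedding.get_word_vector('flower').shape)  # e.g. (300,)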

Datasets

The caption data is from this repository. After downloading, modify the CONFIG file so that all dataset paths point to the data you downloaded.

Run

  • scripts/train_text_embedding_[birds/flowers].sh
    Train a visual-semantic embedding model using the method of Kiros et al. (a loss sketch follows this list).
  • scripts/train_[birds/flowers].sh
    Train a GAN using a pretrained text embedding model.
  • scripts/test_[birds/flowers].sh
    Generate some examples using original images and semantically relevant texts.
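For reference, the Kiros et al. embedding is trained with a bidirectional max-margin ranking loss over matching and mismatching image-caption pairs; a minimal PyTorch sketch (function and variable names are illustrative, not the repository's identifiers):

    import torch

    def pairwise_ranking_loss(img_emb, txt_emb, margin=0.2):
        # img_emb, txt_emb: (batch, dim) L2-normalized embeddings; row i of
        # each is a matching image-caption pair.
        scores = img_emb @ txt_emb.t()       # cosine similarity of every pair
        diag = scores.diag().unsqueeze(1)    # similarity of the matching pairs
        # Hinge losses for mismatched captions (rows) and mismatched images (columns).
        cost_txt = (margin + scores - diag).clamp(min=0)
        cost_img = (margin + scores - diag.t()).clamp(min=0)
        # Matching pairs on the diagonal incur no cost.
        mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        return cost_txt.masked_fill(mask, 0).mean() + cost_img.masked_fill(mask, 0).mean()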

Results

Flowers

Birds

Acknowledgements

We would like to thank Hao Dong, one of the first authors of the paper Semantic Image Synthesis via Adversarial Learning, for providing helpful advice on the implementation.

Comments
  • training model request

    Hi, we are running some experiments on different text-guided image manipulation models. For a fair comparison, could you provide your fine-tuned model checkpoint file?

    opened by laxyon 2
  • Could you provide the pre-trained text embedding?

    Hi, thanks for your great implementation. I ran into some problems training the visual-semantic embedding. Would you mind providing the pre-trained text embedding directly, so that I can train the model from it? Thanks!

    opened by HelenMao 2
  • train_text_embedding.py: which .txt in the config should 'trainclasses_file' point to?

    There is a classes.txt in the CUB_200_2011 dataset from your link. I used it as the 'trainclasses_file' but got the error below:

        Traceback (most recent call last):
          File "/home/xin/PycharmProjects/colorchage2/train_text_embedding.py", line 81, in <module>
            std=[0.229, 0.224, 0.225])
          File "/home/xin/PycharmProjects/colorchage2/data.py", line 31, in __init__
            self.data = self._load_dataset(img_root, caption_root, classes_fllename, word_embedding)
          File "/home/xin/PycharmProjects/colorchage2/data.py", line 39, in _load_dataset
            filenames = os.listdir(os.path.join(caption_root, cls))
        OSError: [Errno 2] No such file or directory: '/home/xin/PycharmProjects/colorchage2/datasets/CUB_200_2011/cub_icml/1 001.Black_footed_Albatross'

    I think this is because every row of classes.txt starts with an index number, so the file is not suitable as-is. Where did you get the 'trainclasses_file'? Can you give a link?
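    A hedged workaround, assuming the '1 001.Black_footed_Albatross' row format described above, is to strip the leading index from each row of classes.txt when building the class list:

        # Build a trainclasses-style list by dropping the leading index column.
        with open('datasets/CUB_200_2011/classes.txt') as f:
            classes = [line.strip().split(' ', 1)[1] for line in f if line.strip()]
        # classes[0] == '001.Black_footed_Albatross'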

    opened by XinCynthia 2
  • Computational Details

    Hi,

    Thanks for creating this project. I was trying to train this model and wanted to know the configuration of the machine you trained it on (GPU type and memory, CPU cores, RAM, etc.). Also, it would be great if you could say how long it took to train the model.

    Thanks a lot.

    opened by apsdehal 2
  • fastText error: could not load_model even after trying your fixed code

    Environment: Azure NC6, 56GB RAM, Python 2.7. After running './scripts/train_birds.sh', I got the error below:

        Loading a pretrained fastText model...
        Traceback (most recent call last):
          File "train.py", line 88, in <module>
            word_embedding = fasttext.load_model(args.fasttext_model)
          File "fasttext/fasttext.pyx", line 154, in fasttext.fasttext.load_model
        Exception: fastText: Cannot load /home/zijie/research/data/fastText/wiki.en.bin due to C++ extension failed to allocate the memory

    opened by jeffzjzhang 1
  • cannot generate realistic images

    I am new to GANs. I ran your code and the generated images are not realistic. These are the generated images after 150 epochs (epoch_150):

    And these are the generated images after 570 epochs (epoch_570):

    They look similar, and there is no improvement after training for many more epochs. Can you give me some advice? By the way, could you share your pre-trained word-embedding model? The link you gave before is unavailable.

    opened by WangQinghuCS 0
  • code changes required for image size 128

    Hi, I wanted to run the code with input images of size 128 instead of 64. I see that the image size 64 is hardcoded in "train.py" where the images are transformed into tensors.

    Is there any other place in the code where I need to change the image size (or parameters that depend on it, as sketched below) to run it on 128-size input images?
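    As a starting point, the input transform itself would change along these lines (a sketch, assuming current torchvision; the generator and discriminator layer shapes would need matching changes):

        import torchvision.transforms as transforms

        # Input pipeline for 128x128 images instead of the hardcoded 64x64.
        img_transform = transforms.Compose([
            transforms.Resize(136),            # resize slightly larger than the crop
            transforms.RandomCrop(128),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])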

    opened by loseway12138 1
  • An error in _load_dataset

    Sorry to bother you again, but I encountered an error loading the data:

        Loading a dataset...
        Traceback (most recent call last):
          File "train_text_embedding.py", line 106, in <module>
            img = img[indices, ...]
          File "/home/tjl/anaconda3/envs/tjl/lib/python3.6/site-packages/torch/autograd/variable.py", line 76, in __getitem__
            return Index.apply(self, key)
          File "/home/tjl/anaconda3/envs/tjl/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
            result = i.index(ctx.index)
        IndexError: When performing advanced indexing the indexing objects must be LongTensors or convertible to LongTensors

    My PyTorch version is 0.2.0 and the rest of the environment is configured. I hope you can give me some suggestions on how to deal with this problem!
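    For reference, this error usually means the index object is not a LongTensor; a minimal sketch of the fix (stand-in tensors, not the repository's actual data):

        import torch

        img = torch.randn(8, 3, 64, 64)   # stand-in image batch
        indices = torch.randperm(8)        # index tensor for shuffling the batch
        # Advanced indexing in PyTorch 0.2 requires LongTensor indices, so
        # convert first if they came from numpy or another dtype:
        indices = indices.long()
        img = img[indices, ...]            # advanced indexing now succeeds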

    opened by tangjialiang-jj 1
  • Severe mode collapse

    Hi all,

    I was training the model on the birds dataset as well as on my own data, and very early in training for both datasets (e.g., from epoch 5) I start to see severe mode collapse that continues through epoch 100 and beyond.

    Example from my data (dresses):

    Screenshots attached for epochs 3, 5, 23, and 97.

    Do you have any ideas on how to improve the training?

    opened by IvonaTau 1
  • A mistake in the modified VGG encoder?

    https://github.com/woozzu/dong_iccv_2017/blob/e7f371aefb3aa2208df832ed779c16027c8f3600/model.py#L77-L83

    It seems that you used VGG16-bn and modified the conv4 layers into dilated convolution layers. But I found that encoder[24], encoder[27], and encoder[30] are batch normalization layers, which seems to be an error.
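    One quick way to check the indices is to print torchvision's vgg16_bn feature layers (the repository's encoder may slice this list differently, so offsets can shift):

        import torchvision.models as models

        # Print the index and type of every layer in the VGG16-bn feature extractor.
        vgg = models.vgg16_bn()
        for i, layer in enumerate(vgg.features):
            print(i, layer.__class__.__name__)
        # In stock torchvision, the conv4 block is Conv2d at 24/27/30,
        # with BatchNorm2d at 25/28/31.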

    opened by Ereebay 5
  • How to get 'txt_feat', 'txt_feat_mismatch', and 'txt_feat_relevant'?

    Hi, I read your paper and am trying to implement it myself. I don't know how to get 'txt_feat_mismatch' and 'txt_feat_relevant', and I can't understand your step in '/train-preprocess'. Why can't we get them using 'np.roll()'? Am I missing something? Please help, and thanks for your patience.
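    For context, rolling the batch is a common way to build mismatched pairs; a minimal sketch of the idea (names are illustrative, not the repository's actual preprocessing):

        import numpy as np

        batch = 4
        txt_feat = np.arange(batch * 3, dtype=np.float32).reshape(batch, 3)  # stand-in text features

        # Pair each image with the caption of the next sample in the batch:
        # a cheap source of mismatched (image, text) pairs for the discriminator.
        txt_feat_mismatch = np.roll(txt_feat, shift=1, axis=0)

        # 'Relevant' text is different: it must still describe the same image
        # (e.g., another caption of the same sample or class), so a plain roll,
        # which pairs captions with unrelated images, cannot produce it.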

    opened by huangtao36 3
  • The KLD loss in the UPDATE GENERATOR step

    I noticed that you use kld = torch.mean(-z_log_stddev + 0.5 * (torch.exp(2 * z_log_stddev) + torch.pow(z_mean, 2) - 1)) in UPDATE GENERATOR, but I don't understand why you chose this as part of the loss function; it does not seem to be mentioned in the original paper. Could you please explain its purpose here? Also, z_log_stddev and z_mean are just obtained from two different Linear+LeakyReLU layers. Why did you use Linear+LeakyReLU layers rather than calculating the mean and std directly?

    Thanks for your help~
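    For reference, that expression is the closed-form KL divergence between the learned Gaussian and a standard normal, often used to regularize the conditioning distribution (as in StackGAN's conditioning augmentation):

        D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, 1)\big)
          = -\log\sigma + \tfrac{1}{2}\left(\sigma^2 + \mu^2 - 1\right)

    With \sigma = \exp(\texttt{z\_log\_stddev}) and \mu = \texttt{z\_mean}, this matches the code term by term.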

    opened by MapleSpirit 2
Owner
Seonghyeon Nam
Postdoc at York University