PyTorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning

Last update: Dec 31, 2022

T2I_CL

This is the official PyTorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning.

Requirements

Linux
Python ≥ 3.6
PyTorch ≥ 1.4.0
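
A quick way to confirm the versions above before training is sketched below. The helper names (parse_version, meets_minimum, check_requirements) are hypothetical and not part of this repository; the version comparison is done on integer tuples so that, for example, 1.10.0 is correctly treated as newer than 1.4.0.

```python
import re
import sys


def parse_version(text):
    """Return the leading dotted integers of a version string, e.g. '1.4.0+cu100' -> (1, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group().split("."))


def _pad(version, width):
    """Right-pad a version tuple with zeros so tuples of different length compare fairly."""
    return version + (0,) * (width - len(version))


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    return _pad(a, width) >= _pad(b, width)


def check_requirements():
    """Report whether the running interpreter and PyTorch meet the README minimums."""
    report = {"python": sys.version_info[:2] >= (3, 6)}
    try:
        import torch  # imported lazily so the check also runs when torch is missing
    except ImportError:
        report["torch"] = None  # None means PyTorch is not importable at all
    else:
        report["torch"] = meets_minimum(torch.__version__, "1.4.0")
    return report


if __name__ == "__main__":
    print(check_requirements())
```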

Prepare Data

Download the preprocessed datasets from AttnGAN

Alternatively, the same preprocessed datasets are available from DM-GAN

Training

Pretrain DAMSM+CL:
- For bird dataset: python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0
- For coco dataset: python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 0
Train AttnGAN+CL:
- For bird dataset: python main.py --cfg cfg/bird_attn2.yml --gpu 0
- For coco dataset: python main.py --cfg cfg/coco_attn2.yml --gpu 0
Train DM-GAN+CL:
- For bird dataset: python main.py --cfg cfg/bird_DMGAN.yml --gpu 0
- For coco dataset: python main.py --cfg cfg/coco_DMGAN.yml --gpu 0
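
The commands above can be assembled programmatically, as in the sketch below. It assumes the DAMSM+CL encoders are pretrained before the GAN stage (the order in which the README lists them) and that each command is run from the code directory of the chosen variant; the function name training_commands and the model keys "attngan" and "dmgan" are hypothetical. Only the script names, config paths and flags come from the list above.

```python
# Config paths copied from the training commands in this README.
PRETRAIN_CFG = {
    "bird": "cfg/DAMSM/bird.yml",
    "coco": "cfg/DAMSM/coco.yml",
}
TRAIN_CFG = {
    ("attngan", "bird"): "cfg/bird_attn2.yml",
    ("attngan", "coco"): "cfg/coco_attn2.yml",
    ("dmgan", "bird"): "cfg/bird_DMGAN.yml",
    ("dmgan", "coco"): "cfg/coco_DMGAN.yml",
}


def training_commands(model, dataset, gpu=0):
    """Return the two commands (encoder pretraining, then GAN training) as argument lists."""
    if (model, dataset) not in TRAIN_CFG:
        raise ValueError("unknown model/dataset pair: %r, %r" % (model, dataset))
    return [
        ["python", "pretrain_DAMSM.py", "--cfg", PRETRAIN_CFG[dataset], "--gpu", str(gpu)],
        ["python", "main.py", "--cfg", TRAIN_CFG[(model, dataset)], "--gpu", str(gpu)],
    ]


if __name__ == "__main__":
    for command in training_commands("dmgan", "bird"):
        print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) if the stages should be launched from Python rather than typed by hand.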

Pretrained Models

DAMSM+CL for bird. Download and save it to DAMSMencoders/
DAMSM+CL for coco. Download and save it to DAMSMencoders/
AttnGAN+CL for bird. Download and save it to models/
AttnGAN+CL for coco. Download and save it to models/
DM-GAN+CL for bird. Download and save it to models/
DM-GAN+CL for coco. Download and save it to models/

Evaluation

Sample images and compute the R-precision:
- python main.py --cfg cfg/eval_bird.yml --gpu 0
- python main.py --cfg cfg/eval_coco.yml --gpu 0
Inception score:
- python inception_score_bird.py --image_folder fake_images_bird
- python inception_score_coco.py fake_images_coco
FID:
- python fid_score.py --gpu 0 --batch-size 50 --path1 real_images_bird --path2 fake_images_bird
- python fid_score.py --gpu 0 --batch-size 50 --path1 real_images_coco --path2 fake_images_coco

Citation

If you find this work useful in your research, please consider citing:

@article{ye2021improving,
  title={Improving Text-to-Image Synthesis Using Contrastive Learning},
  author={Ye, Hui and Yang, Xiulong and Takac, Martin and Sunderraman, Rajshekhar and Ji, Shihao},
  journal={arXiv preprint arXiv:2107.02423},
  year={2021}
}

Acknowledge

Our work is based on the following works:

Comments

Throwing import error
I'm very sorry for posting such a simple bug, but for the life of me I can't figure out why it's throwing this error. I have tried reinstalling numpy and have all the required packages. Numpy even works when running other Python files, but not here for some reason.

Error:

Traceback (most recent call last):
  File "pretrain_DAMSM.py", line 3, in <module>
    from miscc.utils import mkdir_p
  File "/books-nn/T2I_CL/DM-GAN+CL/code/miscc/utils.py", line 4, in <module>
    from torch.nn import init
ModuleNotFoundError: No module named 'torch'
opened by Stelath 5
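
The traceback names torch rather than numpy as the missing module, which often means the script is being run by a different interpreter or environment than the one where the packages were installed. That diagnosis is an assumption; the sketch below only reports which interpreter is running and whether it can locate each package. The helper name describe_module is hypothetical.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether the current interpreter can locate a top-level module."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,  # the Python binary actually running this code
        "module": name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for module_name in ("numpy", "torch"):
        print(describe_module(module_name))
```

Running it with the same python command used for pretrain_DAMSM.py shows whether that interpreter sees torch at all.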
assert dataset Assertion Error

Hi. Is there something wrong with my data import? I changed the path in the config file, but there seems to be a problem reading the text file.

The original AttnGAN README says the data directory should look like this: data/birds. My dataset directory is listed below.

# Exploring the dataset directory with cd and ls:

!cd /content/T2I_CL/AttnGAN+CL/data/birds
!ls

attributes  image_class_labels.txt  parts  bounding_boxes.txt  images  README  classes.txt  images.txt  train_test_split.txt

Is it correct?

opened by sumorday 4
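
The logs quoted elsewhere on this page show the loaders reading train/filenames.pickle, test/filenames.pickle and captions.pickle under the dataset directory, and none of those appear in the listing above. Whether their absence is what triggers the assertion is an assumption. The sketch below simply reports which of those paths are missing; the helper name missing_files is hypothetical, and the loader may treat an absent file differently from what this check implies.

```python
from pathlib import Path

# Relative paths taken from the "Load filenames from" / "Load from" log lines on this page.
EXPECTED_FILES = (
    "train/filenames.pickle",
    "test/filenames.pickle",
    "captions.pickle",
)


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    for name in missing_files("data/birds"):
        print("missing:", name)
```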
Weight initialization problem

Hi, I noticed that the weight initialization code is different from AttnGAN's. Can you tell me the reason for this change? https://github.com/huiyegit/T2I_CL/blob/6f749b869ac76bc6423bc319adc8f6c7c386f17b/AttnGAN%2BCL/code/miscc/utils.py#L290-L295

opened by YIRuriZhongtian 4

CUDA not executing during runtime

For some reason CUDA fails to execute at runtime. I have the correct version of PyTorch and an NVIDIA driver installed on the system.

Thrown as a result of running the command: python pretrain_DAMSM.py --cfg cfg/DAMSM/book.yml --gpu 0

         'B_DCGAN': False,
         'CONDITION_DIM': 100,
         'DF_DIM': 64,
         'GF_DIM': 128,
         'R_NUM': 2,
         'Z_DIM': 100},
 'GPU_ID': 0,
 'RNN_TYPE': 'LSTM',
 'TEXT': {'CAPTIONS_PER_IMAGE': 1, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 18},
 'TRAIN': {'BATCH_SIZE': 48,
           'B_NET_D': True,
           'DISCRIMINATOR_LR': 0.0002,
           'ENCODER_LR': 0.002,
           'FLAG': True,
           'GENERATOR_LR': 0.0002,
           'MAX_EPOCH': 600,
           'NET_E': '',
           'NET_G': '',
           'RNN_GRAD_CLIP': 0.25,
           'SMOOTH': {'GAMMA1': 4.0,
                      'GAMMA2': 5.0,
                      'GAMMA3': 10.0,
                      'LAMBDA': 1.0},
           'SNAPSHOT_INTERVAL': 50},
 'TREE': {'BASE_SIZE': 299, 'BRANCH_NUM': 1},
 'WORKERS': 1}
/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torchvision/transforms/transforms.py:220: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")
Load filenames from: ../data/books/train/filenames.pickle (4625)                                                  
Load filenames from: ../data/books/test/filenames.pickle (1622)                                                   
Load from:  ../data/books/captions.pickle
31146 1
Load filenames from: ../data/books/train/filenames.pickle (4625)                                                  
Load filenames from: ../data/books/test/filenames.pickle (1622)                                                   
Load from:  ../data/books/captions.pickle
/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torch/nn/modules/rnn.py:50: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Load pretrained model from  https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth                  
Traceback (most recent call last):
  File "pretrain_DAMSM.py", line 350, in <module>
    dataset.ixtoword, image_dir, criterion)
  File "pretrain_DAMSM.py", line 87, in train
    words_features, sent_code = cnn_model(imgs[-1])
  File "/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__     
    result = self.forward(*input, **kwargs)
  File "/books-nn/T2I_CL/DM-GAN+CL/code/model.py", line 208, in forward                                           
    x = self.Conv2d_1a_3x3(x)
  File "/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__     
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torchvision/models/inception.py", line 433, in forward 
    x = self.bn(x)
  File "/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__     
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 107, in forward   
    exponential_average_factor, self.eps)
  File "/opt/conda/envs/dm_gan/lib/python3.6/site-packages/torch/nn/functional.py", line 1670, in batch_norm      
    training, momentum, eps, torch.backends.cudnn.enabled                                                         
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

opened by Stelath 2
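
Since the failure above occurs inside a batch-norm call on the GPU, one way to narrow it down is to run a tiny batch-norm forward pass outside this repository. The sketch below does that and returns a short status string; gpu_smoke_test is a hypothetical helper, and a passing result would only suggest, not prove, that the driver, CUDA and cuDNN installation are usable.

```python
def gpu_smoke_test():
    """Run a tiny BatchNorm2d forward pass on the GPU and describe the outcome."""
    try:
        import torch  # imported lazily so the function also works when torch is absent
    except ImportError:
        return "torch is not importable"
    if not torch.cuda.is_available():
        return "torch is installed but CUDA is not available"
    try:
        layer = torch.nn.BatchNorm2d(3).to("cuda")
        batch = torch.randn(4, 3, 8, 8, device="cuda")
        layer(batch)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as error:
        return "GPU batch norm failed: %s" % error
    return "GPU batch norm succeeded"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

If this small test fails with the same cuDNN error, the problem most likely lies in the environment rather than in the training script.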

pretrained discriminator request

Hello @huiyegit, first of all, thank you for providing this code! I am trying to use transfer learning (as in http://arxiv.org/pdf/1805.01677v2) to see if I can optimize the model for my test domain. As far as I understand, I would also need a pretrained discriminator to do this. Unfortunately, due to hardware limitations, I cannot run the training myself with reasonable effort. Would it be possible for you to provide the discriminator models that go with the DM-GAN-CL COCO generator (200 epochs) and the AttnGAN-CL COCO generator (80 epochs)?

opened by TimStefany 2
file not found

Hi there, I cannot find the file inception_score_coco.py when I try to evaluate the Inception score of the generated images. Is something missing?

opened by MaxyLee 2
How to calculate R-precision?

@huiyegit

Is there another file I should look at to calculate R-precision?

The README says to sample and get the R-precision with: python main.py --cfg cfg/eval_bird.yml --gpu 0

but I don't see any function inside main.py that calculates R-precision.

opened by priyankaupadhyay090 1
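
For orientation, here is a minimal sketch of R-precision as it is usually defined for AttnGAN-style models; it is an assumption about what this repository computes, not its actual code. Each generated image's embedding is compared by cosine similarity against its ground-truth caption embedding and 99 randomly chosen mismatched captions, and the score is the fraction of images whose true caption ranks first. The function and argument names are hypothetical. The traceback quoted elsewhere on this page shows main.py calling algo.sampling() in trainer.py, so that file is a plausible place to look for the repository's own computation.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def r_precision(image_embs, text_embs, num_candidates=100, seed=0):
    """Fraction of images whose matching caption (same index) outranks all sampled mismatches."""
    n = len(image_embs)
    if n < 2 or n != len(text_embs):
        raise ValueError("need at least two image/text pairs of equal count")
    rng = random.Random(seed)
    num_negatives = min(num_candidates - 1, n - 1)
    hits = 0
    for i, image in enumerate(image_embs):
        others = [j for j in range(n) if j != i]
        negatives = rng.sample(others, num_negatives)
        true_score = cosine(image, text_embs[i])
        # Strict comparison: a tie with a mismatched caption counts as a miss.
        if all(true_score > cosine(image, text_embs[j]) for j in negatives):
            hits += 1
    return hits / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(r_precision(images, texts))
```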
RuntimeError: Input, output and indices must be on the current device

Hey, I am trying to run AttnGAN+CL main.py for sampling (python main.py --cfg cfg/eval_bird.yml --gpu 0) and getting an error.

python main.py
Using config: {'B_VALIDATION': True, 'CONFIG_NAME': 'attn2', 'CUDA': False, 'DATASET_NAME': 'birds', 'DATA_DIR': 'data/birds', 'GAN': {'B_ATTENTION': True, 'B_DCGAN': False, 'CONDITION_DIM': 100, 'DF_DIM': 64, 'GF_DIM': 32, 'R_NUM': 2, 'Z_DIM': 100}, 'GPU_ID': 0, 'RNN_TYPE': 'LSTM', 'TEXT': {'CAPTIONS_PER_IMAGE': 10, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 25}, 'TRAIN': {'BATCH_SIZE': 10, 'B_NET_D': False, 'DISCRIMINATOR_LR': 0.0002, 'ENCODER_LR': 0.0002, 'FLAG': False, 'GENERATOR_LR': 0.0002, 'MAX_EPOCH': 600, 'NET_E': 'DAMSMencoders/bird/text_encoder200.pth', 'NET_G': 'models/netG_epoch_600.pth', 'RNN_GRAD_CLIP': 0.25, 'SMOOTH': {'GAMMA1': 5.0, 'GAMMA2': 5.0, 'GAMMA3': 10.0, 'LAMBDA': 1.0}, 'SNAPSHOT_INTERVAL': 2000}, 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3}, 'WORKERS': 1}
seed now is : 100
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
load images pickles
Load filenames from: data/birds/train/filenames.pickle (8855)
loading train images
load images pickles
Load filenames from: data/birds/test/filenames.pickle (2933)
loading test images
Load from: data/birds/captions.pickle
captions file loaded for test
5450 10
generating images for the whole valid dataset
self encoder
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py:61: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
calling text encoder as RNN encoder
Load text encoder from: DAMSMencoders/bird/text_encoder200.pth
/opt/conda/lib/python3.6/site-packages/torchvision/models/inception.py:77: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
  ' due to scipy/scipy#11299), please set init_weights=True.', FutureWarning)
Load pretrained model from https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth
calling image encoder
Load image encoder from: DAMSMencoders/bird/image_encoder200.pth
/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/trainer.py:465: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  noise = Variable(torch.FloatTensor(batch_size, nz), volatile=True)
Load G from: models/netG_epoch_600.pth
cnt: 10
word_emb and sent_emb starts
calling RNN encoder forward loop
embedding value
Traceback (most recent call last):
  File "main.py", line 193, in <module>
    algo.sampling(split_dir) # sampling() defined in trainer.py file
  File "/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/trainer.py", line 504, in sampling
    words_embs, sent_emb = text_encoder(captions, cap_lens, hidden)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/model.py", line 139, in forward
    emb = self.encoder(captions)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1855, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device.

The error comes from line 139 of model.py, in the forward() function of RNN_ENCODER: emb = self.encoder(captions)

I have set WORKERS = 1 and GPU_ID = 0 in cfg/bird.yml, so that if multi-GPU use was causing the error, using a single GPU would resolve it.

This is the container command I used:

srun --container-image=/netscratch/enroot/dlcc_pytorch_20.10.sqsh --mem=64000M --cpus-per-task=16 --gres=gpu:1 --pty /bin/bash

Has anyone faced this issue?

opened by priyankaupadhyay090 1
Cog version

"😵 Uh oh! This model can't be run on Replicate because it was built with a version of Cog that is no longer supported." https://replicate.com/huiyegit/t2i_cl

opened by Jakeukalane 0
Issue of FID score

@huiyegit @shihaoji I have calculated the FID score, but I am getting a very high value: 35.190974412567414

Am I doing anything wrong? Please let me know if there is any hyperparameter I need to change to calculate the FID.

opened by priyankaupadhyay090 5
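
For reference when reading the value above, FID is usually defined as the Fréchet distance between two Gaussians fitted to Inception features of the real and the generated images. The symbols below are the standard ones rather than names from this repository: \mu_r, \Sigma_r are the mean and covariance of the real-image features, and \mu_g, \Sigma_g those of the generated-image features.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower is better. Because the estimate depends on the number of samples and on which real images serve as the reference set, a value is only comparable to a reported one when both were computed with the same setup.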
RuntimeError: Input, output and indices must be on the current device

@huiyegit and @shihaoji, thank you for the nice work. I am using the code for AttnGAN+CL and trying to generate samples with the command listed under "Sampling and get the R-precision": python main.py --cfg cfg/eval_bird.yml --gpu 0

Before running main.py, I set

WORKERS = 1
GPU_ID = 0

I got an error:

python main.py --cfg cfg/eval_bird.yml --gpu 0
Using config: {'B_VALIDATION': True, 'CONFIG_NAME': 'attn2', 'CUDA': False, 'DATASET_NAME': 'birds', 'DATA_DIR': 'data/birds', 'GAN': {'B_ATTENTION': True, 'B_DCGAN': False, 'CONDITION_DIM': 100, 'DF_DIM': 64, 'GF_DIM': 32, 'R_NUM': 2, 'Z_DIM': 100}, 'GPU_ID': 0, 'RNN_TYPE': 'LSTM', 'TEXT': {'CAPTIONS_PER_IMAGE': 10, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 25}, 'TRAIN': {'BATCH_SIZE': 10, 'B_NET_D': False, 'DISCRIMINATOR_LR': 0.0002, 'ENCODER_LR': 0.0002, 'FLAG': False, 'GENERATOR_LR': 0.0002, 'MAX_EPOCH': 600, 'NET_E': 'DAMSMencoders/bird/text_encoder200.pth', 'NET_G': 'models/netG_epoch_600.pth', 'RNN_GRAD_CLIP': 0.25, 'SMOOTH': {'GAMMA1': 5.0, 'GAMMA2': 5.0, 'GAMMA3': 10.0, 'LAMBDA': 1.0}, 'SNAPSHOT_INTERVAL': 2000}, 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3}, 'WORKERS': 1}
seed now is : 100
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
load images pickles
Load filenames from: data/birds/train/filenames.pickle (8855)
loading train images
load images pickles
Load filenames from: data/birds/test/filenames.pickle (2933)
loading test images
Load from: data/birds/captions.pickle
captions file loaded for test
5450 10
generating images for the whole valid dataset
self encoder
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py:61: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
calling text encoder as RNN encoder
Load text encoder from: DAMSMencoders/bird/text_encoder200.pth
/opt/conda/lib/python3.6/site-packages/torchvision/models/inception.py:77: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
  ' due to scipy/scipy#11299), please set init_weights=True.', FutureWarning)
Load pretrained model from https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth
calling image encoder
Load image encoder from: DAMSMencoders/bird/image_encoder200.pth
/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/trainer.py:465: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  noise = Variable(torch.FloatTensor(batch_size, nz), volatile=True)
Load G from: models/netG_epoch_600.pth
cnt: 10
word_emb and sent_emb starts
calling RNN encoder forward loop
embedding value
Traceback (most recent call last):
  File "main.py", line 193, in <module>
    algo.sampling(split_dir) # sampling() defined in trainer.py file
  File "/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/trainer.py", line 504, in sampling
    words_embs, sent_emb = text_encoder(captions, cap_lens, hidden)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/model.py", line 139, in forward
    emb = self.encoder(captions)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1855, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device

I also gave the container a single GPU to rule out multi-GPU use as the cause, but the error remains the same.

srun --container-image=/netscratch/enroot/dlcc_pytorch_20.10.sqsh --container-workdir=pwd -p V100-16GB,V100-32GB,A100,RTX6000,RTX3090,RTXA6000 --mem=64000M --cpus-per-task=16 --gres=gpu:1 --time=08:00:00 --pty /bin/bash

Is there any way to solve this?

opened by priyankaupadhyay090 4
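
This error generally means that an embedding layer's weights and the index tensor passed to it live on different devices. The config dump above prints 'CUDA': False next to 'GPU_ID': 0, so one possibility (an assumption, not a confirmed diagnosis) is that the encoder was moved to the GPU while the captions stayed on the CPU. The sketch below shows the general remedy of moving the indices to the device of the layer's weights; it uses only standard PyTorch calls, and embedding_demo is a hypothetical name.

```python
def embedding_demo():
    """Look up token ids in an embedding after moving them to the layer's device."""
    try:
        import torch  # imported lazily so the sketch also runs where torch is absent
    except ImportError:
        return None
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    embedding = torch.nn.Embedding(10, 4).to(device)
    captions = torch.tensor([[1, 2, 3]])  # created on the CPU by default
    # Move the indices to wherever the embedding weights live before the lookup.
    output = embedding(captions.to(embedding.weight.device))
    return tuple(output.shape)


if __name__ == "__main__":
    print(embedding_demo())
```

In this repository the equivalent change would be to make sure the caption tensors and the text encoder end up on the same device before text_encoder(captions, cap_lens, hidden) is called.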
issue of IS score (from stu.sdu.edu.kz)

I am sorry, my email could not be delivered to your address, so I am posting my reply here. Please try this source code to calculate the IS score: https://github.com/MinfengZhu/DM-GAN/tree/master/eval/IS For the bird dataset, there is a pre-trained Inception-V3: https://github.com/hanzhanggit/StackGAN-inception-model For the COCO dataset, the Inception-V3 should be downloaded automatically.

opened by huiyegit 3
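
For orientation, the quantity such scripts report can be sketched as follows. This is the textbook definition of the Inception score, exp of the average KL divergence between each image's class distribution p(y|x) and the marginal p(y); it is not the linked implementation, which obtains the class probabilities from an Inception-V3 network and may average over several splits. The function name inception_score and its input format (one list of class probabilities per image) are assumptions made for illustration.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probability lists."""
    if not probs:
        raise ValueError("need at least one probability vector")
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    total_kl = 0.0
    for p in probs:
        # Terms with p(y|x) = 0 contribute nothing to the KL divergence.
        total_kl += sum(
            pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0.0
        )
    return math.exp(total_kl / n)


if __name__ == "__main__":
    confident_and_diverse = [[1.0, 0.0], [0.0, 1.0]]
    print(inception_score(confident_and_diverse))
```

The score ranges from 1 (every image gets the same class distribution) up to the number of classes (each image is classified with certainty and the classes are used equally).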
Add Docker environment & web demo

Hey @huiyegit! 👋

This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

This also means we can make a web page where other people can try out your model! View it here: https://replicate.ai/huiyegit/t2i_cl

That page also has instructions on how to use the Docker image, which is on our registry at r8.im/huiyegit/t2i_cl.

In case you're wondering who the heck I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. :)

opened by bfirsh 0


SUPERVISED-CONTRASTIVE-LEARNING-FOR-PRE-TRAINED-LANGUAGE-MODEL-FINE-TUNING - The Facebook paper about fine tuning RoBERTa with contrastive loss

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

A PyTorch implementation of the paper "Semantic Image Synthesis via Adversarial Learning" in ICCV 2017

Improving Contrastive Learning by Visualizing Feature Transformation, ICCV 2021 Oral

[NeurIPS 2021] “Improving Contrastive Learning on Imbalanced Data via Open-World Sampling”,

Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

Pytorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

Personal implementation of paper "Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval"

Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch

Cross-Modal Contrastive Learning for Text-to-Image Generation

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

Official PyTorch implementation of the paper: Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting.

PyTorch implementation code for the paper MixCo: Mix-up Contrastive Learning for Visual Representation

PyTorch implementation of DirectCLR from paper Understanding Dimensional Collapse in Contrastive Self-supervised Learning