Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

Overview

CLIP-GLaSS

Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

An in-browser demo is available here

Installation

Clone this repository

git clone https://github.com/galatolofederico/clip-glass && cd clip-glass

Create a virtual environment and install the requirements

virtualenv --python=python3.6 env && . ./env/bin/activate
pip install -r requirements.txt

Run CLIP-GLaSS

You can run CLIP-GLaSS with:

python run.py --config <config> --target <target>

Specify <config> and <target> according to the following table:

Config               | Target type | Meaning
GPT2                 | Image       | Use GPT2 to solve the Image-to-Text task
DeepMindBigGAN512    | Text        | Use DeepMind's BigGAN 512x512 to solve the Text-to-Image task
DeepMindBigGAN256    | Text        | Use DeepMind's BigGAN 256x256 to solve the Text-to-Image task
StyleGAN2_ffhq_d     | Text        | Use StyleGAN2-ffhq to solve the Text-to-Image task
StyleGAN2_ffhq_nod   | Text        | Use StyleGAN2-ffhq without Discriminator to solve the Text-to-Image task
StyleGAN2_church_d   | Text        | Use StyleGAN2-church to solve the Text-to-Image task
StyleGAN2_church_nod | Text        | Use StyleGAN2-church without Discriminator to solve the Text-to-Image task
StyleGAN2_car_d      | Text        | Use StyleGAN2-car to solve the Text-to-Image task
StyleGAN2_car_nod    | Text        | Use StyleGAN2-car without Discriminator to solve the Text-to-Image task

If you have not downloaded the model weights yet, you will be prompted to run ./download-weights.sh. The results are saved in the ./tmp folder; a different output folder can be specified with --tmp-folder.
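The config/target pairing in the table can also be scripted from Python. The sketch below is hypothetical: build_command and CONFIG_TARGET_TYPE are illustrative names, not part of the repository. It only checks the config name against the table and assembles the same arguments used on the command line; it does not verify that the target is really a caption or an existing image path.

```python
import shlex
from typing import List, Optional

# Hypothetical mapping of each --config value to its target type,
# transcribed from the table above ("image" = path to an image, "text" = caption).
CONFIG_TARGET_TYPE = {
    "GPT2": "image",
    "DeepMindBigGAN512": "text",
    "DeepMindBigGAN256": "text",
    "StyleGAN2_ffhq_d": "text",
    "StyleGAN2_ffhq_nod": "text",
    "StyleGAN2_church_d": "text",
    "StyleGAN2_church_nod": "text",
    "StyleGAN2_car_d": "text",
    "StyleGAN2_car_nod": "text",
}


def build_command(config: str, target: str, tmp_folder: Optional[str] = None) -> List[str]:
    """Assemble the run.py argument list (hypothetical helper, not part of the repo)."""
    if config not in CONFIG_TARGET_TYPE:
        raise ValueError(
            "unknown config {!r}; expected one of {}".format(config, sorted(CONFIG_TARGET_TYPE))
        )
    argv = ["python", "run.py", "--config", config, "--target", target]
    if tmp_folder is not None:
        # Override the default ./tmp output folder.
        argv += ["--tmp-folder", tmp_folder]
    return argv


if __name__ == "__main__":
    cmd = build_command("StyleGAN2_ffhq_d", "the face of a man with brown eyes and stubble beard")
    # Print a shell-quoted version that can be pasted into a terminal.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

From the repository root, the resulting list could be passed to subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell quoting problems with captions that contain spaces.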

Examples

python run.py --config StyleGAN2_ffhq_d --target "the face of a man with brown eyes and stubble beard"
python run.py --config GPT2 --target gpt2_images/dog.jpeg

Acknowledgments and licensing

This work heavily relies on the following amazing repositories and would not have been possible without them:

All their work can be shared under the terms of the respective original licenses.

All my original work (everything except the contents of the folders clip, stylegan2 and gpt2) is released under the terms of the GNU/GPLv3 license. Copying, adapting, and republishing it is not only permitted but also encouraged.

Citing

If you want to cite us, you can use this BibTeX:

@article{galatolo_glass
,	author	= {Galatolo, Federico A and Cimino, Mario GCA and Vaglini, Gigliola}
,	title	= {Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search}
,	year	= {2021}
}

Contacts

For any further questions, feel free to reach me at [email protected] or on Telegram at @galatolo.

Comments
  • Support

    Support "This Anime Does Not Exist" StyleGAN2 model by aydao/gwern for anime image generation

    Website: https://thisanimedoesnotexist.ai/

    The model can be downloaded here: https://www.gwern.net/Faces#tadne-download

    Considering GLaSS already supports BigGAN and different SG2 models, I hope it wouldn't be too hard to add this great model too.

    opened by n00mkrad 4
  • Using Multi GPUs


    When I try the text-to-image task, I always run out of CUDA memory on a single GPU. I tried setting the device to 'cuda:0,1', but it doesn't work. I get an error like this:

    File "run.py", line 54, in <module>
        problem = GenerationProblem(config)
      File "/home/ubuntu/Documents/clip-glass/problem.py", line 9, in __init__
        self.generator = Generator(config)
      File "/home/ubuntu/Documents/clip-glass/generator.py", line 16, in __init__
        self.CLIP, clip_preprocess = clip.load("ViT-B/32", device=self.config.device, jit=False)
      File "/home/ubuntu/anaconda3/envs/lib/python3.7/site-packages/clip/clip.py", line 137, in load
        model = build_model(state_dict or model.state_dict()).to(device)
      File "/home/ubuntu/anaconda3/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 600, in to
        device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
    RuntimeError: Invalid device string: 'cuda:0,1,'
    

    How do I properly configure it to use multiple GPUs?

    opened by KevinGoodman 3
  • Demo Colab notebook doesn't support new PyTorch versions


    In the initialization command, generating the PyTorch version string does not work for versions that are not included in the suffix mapping dictionary. Before: pytorch_version = "1.7.1" + pytorch_suffix[version] if version in pytorch_suffix else "+cu110"

    With the parentheses fixed: pytorch_version = "1.7.1" + (pytorch_suffix[version] if version in pytorch_suffix else "+cu110")

    The notebook is incredible and a great resource to go along with the research, great work!

    opened by exofusion 3
  • Captioning results not consistent with the paper


    Hi,

    I tried your model on image captioning using the demo dog image, but got completely different results from your paper. I ran your script 5 times with the default settings and got the following results:

    ['the picture of the dog's body.\n\n"The dog's body is’]
    ['the picture of a dog with a bloated, bloated, bloa’]
    ["the picture of the puppy's body, with the body's b”]
    ["the picture of the dog's body, with a large, round”]
    ["the picture of the dog's body. The dog's body is c”]
    ['the picture of a dog with a large belly.\n\nâ¼\n\nâ¼\n\nâ¼\n’]

    The captioning result shown in your paper is as follows: [image]

    Do I need to modify any settings for image captioning? Thank you.

    opened by zhuang93 2
  • GPT-2 output console length?


    Hi, first of all, thanks for your work :) I don't know if this is an issue. When I select the "GPT2" config, the predicted output text seems incomplete (example: "the picture of a man who is a man, a man who is a"), as if something is missing. Is this a bug? If not, is there a way to increase the output length?

    Many thanks in advance.

    opened by smithee77 2
  • Why doesn't BigGAN use the discriminator?


    Thanks for your work. I found that the discriminator helps improve the generation quality in the StyleGAN setting. Why not use the BigGAN discriminator in the BigGAN setting?

    Thanks.

    opened by liuzhengzhe 1
  • Issue when running:  virtualenv --python=python3.6 env && . ./env/bin/activate


    Hello, when I run the following:

    virtualenv --python=python3.6 env && . ./env/bin/activate

    I get this output:

    RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.6'

    Thoughts?

    opened by alexp-12 1
  • RuntimeError: Method 'forward' is not defined.


    Your demo notebook worked for me yesterday, but today it gives me this error: RuntimeError: Method 'forward' is not defined.

    I really like your implementation! I don't think I changed anything in what I'm doing. Any ideas?
    I'm pretty much a noob, trying to learn this stuff. Thanks in advance.

    opened by socalledsound 1
  • How to complete an image with text


    Example: I provide an image that is empty from the middle down, then I write "same image but below, a sketch of the image", and it generates an image whose upper half is the original and whose lower half is a sketch.

    opened by molo32 1
  • Support for GPT-3


    Hi! Love the project.

    I'm in the OpenAI GPT-3 beta, and I was wondering if it's possible for clip-glass to support GPT-3 for the image-to-text task.

    If it's possible, I'd love to help set that integration up but I'm not sure where to start.

    opened by indiv0 1
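The pytorch_version fix reported in the Colab notebook issue above comes down to operator precedence: in Python a conditional expression binds more loosely than +, so without parentheses the whole concatenation becomes the true branch and unknown versions fall back to "+cu110" alone. A minimal sketch follows; the contents of pytorch_suffix and the version keys are hypothetical placeholders, since the notebook's actual mapping is not shown in the issue.

```python
# Hypothetical suffix mapping; the notebook's real keys and values are not shown in the issue.
pytorch_suffix = {"10.1": "+cu101", "10.2": ""}


def buggy_version(version: str) -> str:
    # Parsed as ("1.7.1" + pytorch_suffix[version]) if ... else "+cu110",
    # so unmapped versions lose the "1.7.1" prefix entirely.
    return "1.7.1" + pytorch_suffix[version] if version in pytorch_suffix else "+cu110"


def fixed_version(version: str) -> str:
    # Parentheses restrict the conditional to choosing the suffix only.
    return "1.7.1" + (pytorch_suffix[version] if version in pytorch_suffix else "+cu110")


for v in ("10.1", "11.2"):
    print(v, buggy_version(v), fixed_version(v))
# 10.1 1.7.1+cu101 1.7.1+cu101
# 11.2 +cu110 1.7.1+cu110
```

Both versions agree for keys present in the mapping and only diverge on the fallback path, which is why the bug surfaces only with newer, unmapped versions.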
Owner
Federico Galatolo
PhD Student @ University of Pisa
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

FuseDream This repo contains code for our paper (paper link): FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimizat

XCL 191 Dec 31, 2022
PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models Code accompanying CVPR'20 paper of the same title. Paper lin

Alex Damian 7k Dec 30, 2022
Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

HamasKhan 3 Jul 8, 2022
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 6, 2022
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt. This is done by

Mehdi Cherti 135 Dec 30, 2022
Disentangled Face Attribute Editing via Instance-Aware Latent Space Search, accepted by IJCAI 2021.

Instance-Aware Latent-Space Search This is a PyTorch implementation of the following paper: Disentangled Face Attribute Editing via Instance-Aware Lat

null 67 Dec 21, 2022
Face Identity Disentanglement via Latent Space Mapping [SIGGRAPH ASIA 2020]

Face Identity Disentanglement via Latent Space Mapping Description Official Implementation of the paper Face Identity Disentanglement via Latent Space

null 150 Dec 7, 2022
Non-Official Pytorch implementation of "Face Identity Disentanglement via Latent Space Mapping" https://arxiv.org/abs/2005.07728 Using StyleGAN2 instead of StyleGAN

Face Identity Disentanglement via Latent Space Mapping - Implement in pytorch with StyleGAN 2 Description Pytorch implementation of the paper Face Ide

Daniel Roich 58 Dec 24, 2022
Code for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in Deep Latent Space"

SRHEN This is a better and simpler implementation for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in

null 1 Oct 28, 2022
An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

SVM Data An image base contains 490 images for training (400 cars and 90 boats), and another 21 images for testing. Preprocess

Achraf Rahouti 3 Nov 30, 2021
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

CLIP-GEN [Simplified Chinese][English] This project implements the paper "CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP" in PyTorch on the Fire-Flyer II (萤火二号) cluster. CLIP-GEN is a Language-F

null 75 Dec 29, 2022
Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

DenseNAS The code of the CVPR2020 paper Densely Connected Search Space for More Flexible Neural Architecture Search. Neural architecture search (NAS)

Jamin Fong 291 Nov 18, 2022
Generating Anime Images by Implementing Deep Convolutional Generative Adversarial Networks paper

AnimeGAN - Deep Convolutional Generative Adversarial Network PyTorch implementation of DCGAN introduced in the paper: Unsupervised Representation Lear

Rohit Kukreja 23 Jul 21, 2022
A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

Eugenio Herrera 175 Dec 29, 2022
Visualizer using audio and semantic analysis to explore BigGAN (Brock et al., 2018) latent space.

BigGAN Audio Visualizer Description This visualizer explores BigGAN (Brock et al., 2018) latent space by using pitch/tempo of an audio file to generat

Rush Kapoor 2 Nov 21, 2022
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

CLIP-Guided-Diffusion Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. Original colab notebooks by Ka

Nerdy Rodent 336 Dec 9, 2022
[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing Figure: High-quality facial attributes editing results with InterFaceGA

GenForce: May Generative Force Be with You 1.3k Dec 29, 2022
MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space

Update (20 Jan 2020): MODALS on text data is available MODALS MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space Table of Conte

null 38 Dec 15, 2022
PyTorch implementation of the WarpedGANSpace: Finding non-linear RBF paths in GAN latent space (ICCV 2021)

Authors official PyTorch implementation of the "WarpedGANSpace: Finding non-linear RBF paths in GAN latent space" [ICCV 2021].

Christos Tzelepis 100 Dec 6, 2022