Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

Related tags

clip-glass
Overview

CLIP-GLaSS

Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

An in-browser demo is available here
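
For intuition, here is a minimal sketch of the idea behind the method: search a generator's latent space for the image whose CLIP embedding is most similar to the target caption. This is not the repository's code; the generator below is a hypothetical placeholder, the latent size and CLIP variant are arbitrary choices, and the search is plain random sampling, whereas CLIP-GLaSS drives the search with a genetic algorithm over BigGAN or StyleGAN2 latents.

# Minimal sketch: look for the latent whose generated image best matches a caption under CLIP.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

caption = "the face of a man with brown eyes and stubble beard"

def generator(z):
    # Placeholder for a real GAN generator (BigGAN or StyleGAN2 in CLIP-GLaSS):
    # it just emits random 224x224 RGB images so the sketch runs end to end.
    # A real generator's output would also need CLIP's resizing/normalization.
    return torch.rand(z.shape[0], 3, 224, 224, device=device)

with torch.no_grad():
    text_features = model.encode_text(clip.tokenize([caption]).to(device))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    best_score, best_z = float("-inf"), None
    for _ in range(100):  # plain random search; CLIP-GLaSS uses a genetic algorithm instead
        z = torch.randn(1, 512, device=device)  # latent size depends on the generator
        image_features = model.encode_image(generator(z))
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        score = (image_features @ text_features.T).item()
        if score > best_score:
            best_score, best_z = score, z

print("best CLIP similarity found:", round(best_score, 3))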

Installation

Clone this repository

git clone https://github.com/galatolofederico/clip-glass && cd clip-glass

Create a virtual environment and install the requirements

virtualenv --python=python3.6 env && . ./env/bin/activate
pip install -r requirements.txt

Run CLIP-GLaSS

You can run CLIP-GLaSS with:

python run.py --config <config> --target <target>

specifying <config> and <target> according to the following table:

| Config | Meaning | Target Type |
|---|---|---|
| GPT2 | Use GPT2 to solve the Image-to-Text task | Image |
| DeepMindBigGAN512 | Use DeepMind's BigGAN 512x512 to solve the Text-to-Image task | Text |
| DeepMindBigGAN256 | Use DeepMind's BigGAN 256x256 to solve the Text-to-Image task | Text |
| StyleGAN2_ffhq_d | Use StyleGAN2-ffhq to solve the Text-to-Image task | Text |
| StyleGAN2_ffhq_nod | Use StyleGAN2-ffhq without Discriminator to solve the Text-to-Image task | Text |
| StyleGAN2_church_d | Use StyleGAN2-church to solve the Text-to-Image task | Text |
| StyleGAN2_church_nod | Use StyleGAN2-church without Discriminator to solve the Text-to-Image task | Text |
| StyleGAN2_car_d | Use StyleGAN2-car to solve the Text-to-Image task | Text |
| StyleGAN2_car_nod | Use StyleGAN2-car without Discriminator to solve the Text-to-Image task | Text |

If you have not downloaded the model weights, you will be prompted to run ./download-weights.sh. You will find the results in the folder ./tmp; a different output folder can be specified with --tmp-folder.

Examples

python run.py --config StyleGAN2_ffhq_d --target "the face of a man with brown eyes and stubble beard"
python run.py --config GPT2 --target gpt2_images/dog.jpeg
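
If the weights are already downloaded, a run that also redirects the outputs to a custom folder could look like this (the caption and folder name are only placeholders):

python run.py --config DeepMindBigGAN256 --target "a red double-decker bus" --tmp-folder ./results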

Acknowledgments and licensing

This work heavily relies on the following amazing repositories and would not have been possible without them:

All their work can be shared under the terms of the respective original licenses.

All my original work (everything except the content of the folders clip, stylegan2 and gpt2) is released under the terms of the GNU/GPLv3 license. Copying, adapting and republishing it is not only allowed but also encouraged.

Citing

If you want to cite us you can use this BibTeX:

@article{galatolo_glass
,	author	= {Galatolo, Federico A and Cimino, Mario GCA and Vaglini, Gigliola}
,	title	= {Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search}
,	year	= {2021}
}

Contacts

For any further questions feel free to reach me at [email protected] or on Telegram @galatolo

Issues
  • Support

    Support "This Anime Does Not Exist" StyleGAN2 model by aydao/gwern for anime image generation

    Website: https://thisanimedoesnotexist.ai/

    The model can be downloaded here: https://www.gwern.net/Faces#tadne-download

    Considering GLaSS already supports BigGAN and different SG2 models, I hope it wouldn't be too hard to add this great model too.

    opened by n00mkrad 4
  • Demo Colab Notebook doesn't support new pytorch versions

    In the initialization command, generating the pytorch version string does not work for versions not included in the suffix mapping dictionary. Before: pytorch_version = "1.7.1" + pytorch_suffix[version] if version in pytorch_suffix else "+cu110"

    Fixed parentheses: pytorch_version = "1.7.1" + (pytorch_suffix[version] if version in pytorch_suffix else "+cu110")

    A short sketch of why the parentheses matter appears after the issues list.

    The notebook is incredible and a great resource to go along with the research, great work!

    opened by exofusion 3
  • GPT-2 output console length?

    Hi, first thanks for your job :) I don't know if it's an issue. When I select config "GPT-2", the output text of the prediction seems to be incomplete (example: "the picture of a man who is a man, a man who is a" ) --> seems like something is missing. Is this a bug? if not, is there a way to increase output length?

    many thanks in advance

    opened by smithee77 2
  • Support for GPT-3

    Hi! Love the project.

    I'm in the OpenAI GPT-3 beta, and I was wondering if it's possible for clip-glass to support GPT-3 for the image-to-text task.

    If it's possible, I'd love to help set that integration up but I'm not sure where to start.

    opened by indiv0 1
  • how to complete image with text

    example: I give an image that is unfilled from the middle down, then I write "same image but below, a sketch of the image"

    and it generates an image where the bottom half is a sketch of the top half.

    opened by molo32 1
  • How do I get latent code from the generated images?

    How do I get latent code from the generated images, or where they are saved?

    opened by molo32 1
  • RuntimeError: Method 'forward' is not defined.

    your demo notebook worked for me yesterday but today it's giving me this: RuntimeError: Method 'forward' is not defined.

    I really like your implementation! I don't think I changed anything in what I'm doing. any ideas?
    i'm pretty much a noob, trying to learn this stuff. thanks in advance

    opened by socalledsound 1
  • Issue when running:  virtualenv --python=python3.6 env && . ./env/bin/activate

    Hello, When I run the following,

    virtualenv --python=python3.6 env && . ./env/bin/activate

    I get this output:

    RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.6'

    Thoughts?

    opened by alexp-12 1
  • Captioning results not compatible to the paper

    Hi,

    I tried your model on image captioning using the demo dog image but got totally different results from your paper. I ran your script 5 times under the default settings and got the following captions:

    ['the picture of the dog's body.\n\n"The dog's body is']
    ['the picture of a dog with a bloated, bloated, bloa']
    ["the picture of the puppy's body, with the body's b"]
    ["the picture of the dog's body, with a large, round"]
    ["the picture of the dog's body. The dog's body is c"]
    ['the picture of a dog with a large belly.\n\n']

    The captioning result shown in your paper is as follows: [image]

    Is there any setting modification I need to take for image captioning? Thank you.

    opened by zhuang93 2
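
As a follow-up to the Colab notebook fix discussed in the issues above: the conditional expression binds more loosely than string concatenation, so without the parentheses the else branch drops the "1.7.1" prefix entirely. A small self-contained sketch (the pytorch_suffix mapping here is illustrative, not the notebook's actual table):

# Illustrative mapping; the notebook's real table may differ
pytorch_suffix = {"10.1": "+cu101", "10.2": ""}
version = "11.0"  # a CUDA version missing from the mapping

broken = "1.7.1" + pytorch_suffix[version] if version in pytorch_suffix else "+cu110"
fixed = "1.7.1" + (pytorch_suffix[version] if version in pytorch_suffix else "+cu110")

print(broken)  # prints "+cu110": the "1.7.1" prefix is lost
print(fixed)   # prints "1.7.1+cu110": what the install command needs
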
Owner
Federico Galatolo
PhD Student @ University of Pisa
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Overview PyTorch 0.4.1 | Python 3.6.5 Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein g

Shayne O'Brien 453 Sep 14, 2021
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

CLIP-Guided-Diffusion Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. Original colab notebooks by Ka

Nerdy Rodent 12 Sep 21, 2021
PyTorch implementations of Generative Adversarial Networks.

This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as

Erik Linder-Norén 10.2k Sep 22, 2021
A collection of resources on GAN Inversion.

This repo is a collection of resources on GAN inversion, as a supplement for our survey

null 335 Sep 24, 2021
Simple implementation of OpenAI CLIP model in PyTorch.

It was in January of 2021 that OpenAI announced two new models: DALL-E and CLIP, both multi-modality models connecting texts and images in some way. In this article we are going to implement CLIP model from scratch in PyTorch. OpenAI has open-sourced some of the code relating to CLIP model but I found it intimidating and it was far from something short and simple. I also came across a good tutorial inspired by CLIP model on Keras code examples and I translated some parts of it into PyTorch to build this tutorial totally with our beloved PyTorch!

Moein Shariatnia 72 Sep 23, 2021
Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

CLIP-GLaSS Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search An in-browser demo is

Federico Galatolo 125 Sep 17, 2021
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Nerdy Rodent 641 Sep 26, 2021
improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

CLIP-ViL In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet fea

null 128 Sep 17, 2021
An open source implementation of CLIP.

OpenCLIP Welcome to an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training). The goal of this repository is to enable

null 274 Sep 21, 2021
Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Probabilistic Cross-Modal Embedding (PCME) CVPR 2021 Official Pytorch implementation of PCME | Paper Sanghyuk Chun1 Seong Joon Oh1 Rafael Sampaio de R

NAVER AI 38 Aug 31, 2021
Official repository for "On Generating Transferable Targeted Perturbations" (ICCV 2021)

On Generating Transferable Targeted Perturbations (ICCV'21) Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Fatih Porikli Paper:

Muzammal Naseer 29 Sep 15, 2021
Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

VQGAN-CLIP-Docker About Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized This is a stripped and minimal dependency repository for running loca

Kevin Costa 35 Sep 16, 2021
[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing

InterFaceGAN - Interpreting the Latent Space of GANs for Semantic Face Editing Figure: High-quality facial attributes editing results with InterFaceGA

GenForce: May Generative Force Be with You 1k Sep 20, 2021
code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? Code for paper: Does Unsupervised Architecture Representation

null 31 Aug 22, 2021
GANsformer: Generative Adversarial Transformers Drew A

GANsformer: Generative Adversarial Transformers Drew A. Hudson* & C. Lawrence Zitnick *I wish to thank Christopher D. Manning for the fruitf

Drew Arad Hudson 773 Sep 23, 2021
An easier way to build neural search on the cloud

An easier way to build neural search on the cloud Jina is a deep learning-powered search framework for building cross-/multi-modal search systems (e.g

Jina AI 11k Sep 25, 2021
How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

Bogdan Kulynych 44 Sep 23, 2021
Finding an Unsupervised Image Segmenter in each of your Deep Generative Models

Finding an Unsupervised Image Segmenter in each of your Deep Generative Models Description Recent research has shown that numerous human-interpretable

Luke Melas-Kyriazi 38 Sep 12, 2021
Code to reproduce experiments in the paper "Explainability Requires Interactivity".

Explainability Requires Interactivity This repository contains the code to train all custom models used in the paper Explainability Requires Interacti

Digital Health & Machine Learning 3 Sep 22, 2021