Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

VQGAN-CLIP Overview

A repo for running VQGAN+CLIP locally. This started out as a Google Colab notebook derived from Katherine Crowson's VQGAN+CLIP work.

Original notebook: Open In Colab

Some example images:

Environment:

  • Tested on Ubuntu 20.04
  • GPU: Nvidia RTX 3090
  • Typical VRAM requirements:
    • 24 GB for a 900x900 image
    • 10 GB for a 512x512 image
    • 8 GB for a 380x380 image

Still a work in progress - I've not actually tested everything yet :)

Set up

Example set up using Anaconda to create a virtual Python environment with the prerequisites:

conda create --name vqgan python=3.9
conda activate vqgan

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops 

git clone https://github.com/openai/CLIP
git clone https://github.com/CompVis/taming-transformers.git

You will also need at least one pretrained VQGAN model. E.g.

mkdir checkpoints
curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.yaml' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.ckpt' #ImageNet 16384

By default, the model .yaml and .ckpt files are expected in the checkpoints directory. See https://github.com/CompVis/taming-transformers for more information on datasets and models.
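
If you download a different model, you can point generate.py at its files with the -conf and -ckpt options (listed under Advanced options below). The filenames here are just placeholders for whichever model you fetched:

python generate.py -p "An oil painting of a lighthouse" -conf checkpoints/other_model.yaml -ckpt checkpoints/other_model.ckpt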

Run

To generate images from text, specify your text prompt as shown in the example below:

python generate.py -p "A painting of an apple in a fruit bowl"
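
The output filename and the number of iterations can be set with the -o and -i options (see Advanced options below). For example:

python generate.py -p "A painting of an apple in a fruit bowl" -o apple.png -i 500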

Multiple prompts

Text and image prompts can be split using the pipe symbol in order to allow multiple prompts. For example:

python generate.py -p "A painting of an apple in a fruit bowl | psychedelic | surreal | weird"

Image prompts can be split in the same way. For example:

python generate.py -p "A picture of a bedroom with a portrait of Van Gogh" -ip "samples/VanGogh.jpg | samples/Bedroom.png"

"Style Transfer"

An input image with style text and a low number of iterations can be used to create a sort of "style transfer" effect. For example:

python generate.py -p "A painting in the style of Picasso" -ii samples/VanGogh.jpg -i 80 -se 10 -opt AdamW -lr 0.25

Example output styles: Picasso, Sketch, Psychedelic

Feedback example

By feeding back the generated images and making slight changes, some interesting effects can be created.

The example zoom.sh shows this by applying a zoom and rotate to each generated image before feeding it back in again. To use zoom.sh, specify a text prompt, an output filename and the number of frames. E.g.

./zoom.sh "A painting of a red telephone box spinning through a time vortex" Telephone.png 150
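
If you want to adapt the effect, or can't run the shell script, the heart of the loop looks roughly like the sketch below. It assumes ImageMagick (convert) and ffmpeg are installed and that generate.py writes its result to output.png; the transform values and filenames used by the real zoom.sh may differ:

# Rough sketch of a zoom/rotate feedback loop (not the exact zoom.sh contents).
PROMPT="A painting of a red telephone box spinning through a time vortex"
python generate.py -p "$PROMPT" -i 200                                # initial image
for i in $(seq -w 1 150); do
    cp output.png "frame_${i}.png"                                    # keep this frame
    convert output.png -distort SRT 1.01,1 +repage zoomed.png         # zoom ~1%, rotate 1 degree
    python generate.py -p "$PROMPT" -ii zoomed.png -i 60              # feed it back in
done
ffmpeg -framerate 15 -i frame_%03d.png -pix_fmt yuv420p telephone.mp4 # stitch frames into a video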

Random text example

Use random.sh to make a batch of images from random text. Edit the text and number of generated images to your taste!

./random.sh
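
If you would rather roll your own, a minimal version of the idea looks something like the sketch below (the word list and counts are made up; the real random.sh will differ):

# Generate a batch of images from randomly combined words.
WORDS=(apple forest neon psychedelic surreal castle ocean telephone)
for i in $(seq 1 10); do
    # pick three random words as the prompt
    PROMPT="${WORDS[RANDOM % ${#WORDS[@]}]} ${WORDS[RANDOM % ${#WORDS[@]}]} ${WORDS[RANDOM % ${#WORDS[@]}]}"
    python generate.py -p "$PROMPT" -o "random_${i}.png" -i 300
done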

Advanced options

To view the available options, use "-h".

python generate.py -h
usage: generate.py [-h] [-p PROMPTS] [-o OUTPUT] [-i MAX_ITERATIONS] [-ip IMAGE_PROMPTS]
[-nps [NOISE_PROMPT_SEEDS ...]] [-npw [NOISE_PROMPT_WEIGHTS ...]] [-s SIZE SIZE]
[-ii INIT_IMAGE] [-iw INIT_WEIGHT] [-m CLIP_MODEL] [-conf VQGAN_CONFIG]
[-ckpt VQGAN_CHECKPOINT] [-lr STEP_SIZE] [-cuts CUTN] [-cutp CUT_POW] [-se DISPLAY_FREQ]
[-sd SEED] [-opt OPTIMISER]
optional arguments:
  -h, --help            show this help message and exit
  -p PROMPTS, --prompts PROMPTS
                        Text prompts
  -o OUTPUT, --output OUTPUT
                        Output filename
  -i MAX_ITERATIONS, --iterations MAX_ITERATIONS
                        Number of iterations
  -ip IMAGE_PROMPTS, --image_prompts IMAGE_PROMPTS
                        Image prompts / target image
  -nps [NOISE_PROMPT_SEEDS ...], --noise_prompt_seeds [NOISE_PROMPT_SEEDS ...]
                        Noise prompt seeds
  -npw [NOISE_PROMPT_WEIGHTS ...], --noise_prompt_weights [NOISE_PROMPT_WEIGHTS ...]
                        Noise prompt weights
  -s SIZE SIZE, --size SIZE SIZE
                        Image size (width height)
  -ii INIT_IMAGE, --init_image INIT_IMAGE
                        Initial image
  -iw INIT_WEIGHT, --init_weight INIT_WEIGHT
                        Initial image weight
  -m CLIP_MODEL, --clip_model CLIP_MODEL
                        CLIP model
  -conf VQGAN_CONFIG, --vqgan_config VQGAN_CONFIG
                        VQGAN config
  -ckpt VQGAN_CHECKPOINT, --vqgan_checkpoint VQGAN_CHECKPOINT
                        VQGAN checkpoint
  -lr STEP_SIZE, --learning_rate STEP_SIZE
                        Learning rate
  -cuts CUTN, --num_cuts CUTN
                        Number of cuts
  -cutp CUT_POW, --cut_power CUT_POW
                        Cut power
  -se DISPLAY_FREQ, --save_every DISPLAY_FREQ
                        Save image iterations
  -sd SEED, --seed SEED
                        Seed
  -opt OPTIMISER, --optimiser OPTIMISER
                        Optimiser (Adam, AdamW, Adagrad, Adamax)
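
For example, several of these options can be combined in one run (the prompt and filename here are arbitrary):

python generate.py -p "A fantasy landscape at sunset" -s 512 512 -i 400 -se 50 -sd 42 -o landscape.png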

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}

@misc{esser2020taming,
    title         = {Taming Transformers for High-Resolution Image Synthesis},
    author        = {Patrick Esser and Robin Rombach and Björn Ommer},
    year          = {2020},
    eprint        = {2012.09841},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}

Katherine Crowson - https://github.com/crowsonkb

Public Domain images from Open Access Images at the Art Institute of Chicago - https://www.artic.edu/open-access/open-access-images

Issues
  • which CUDA version is required for pytorch here?

    I'm getting UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:115.). I do have a GPU, but I'm using CUDA version 8 (it's a shared lab machine).

    Is the old CUDA version why I get the above error? Any way to fix this, apart from setting up a brand new system?

    opened by christiansievers 4
  • requirements.txt

    Hey! Would you mind adding a requirements.txt? I'm really just looking for the version #s of the relevant repos that are used here. It should be straightforward to extract from the output of "pip freeze". Thanks in advance!

    opened by thorbjorn444 2
  • Saving each iteration to create a video

    Is there a way I can save each image along the way rather than just the final output, and then use ffmpeg to combine the images into an animation? I've got it working on my PC; I'm just interested in that feature as I can't get 900x900 on Colab. Thanks!

    opened by shaolinseed 2
  • No module named 'CLIP'

    After following your video—with the conda approach, making the environment, updating it with the .yml and getting torch==1.9.0—I am getting the following error from generate.py:

    ModuleNotFoundError: No module named 'CLIP'

    I even tried installing the CLIP repo via pip before re-installing torch and everything else, but it didn't work...

    I am sure this is a silly issue

    opened by Lucaslpena 2
  • RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling `cusolverDnCreate(handle)`

    Hey!

    Thanks for this, I am so ready to create bizarreness.

    Hardware: Ryzen 7 3700X 32GB RAM RTX 2070 Super

    OS: Windows 10 Pro

    I'm getting the below error when running generate.py:

    python generate.py -p "Yee"

    Output:

    (vqgan) PS C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP> python generate.py -p "Yee"
    Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
    loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
    VQLPIPSWithDiscriminator running with hinge loss.
    Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
    C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torchvision\transforms\transforms.py:280: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
      warnings.warn(
    Using device: cuda:0
    Optimising using: Adam
    Using text prompts: ['Yee']
    Using seed: 329366907029900
    0it [00:01, ?it/s]
    Traceback (most recent call last):
      File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 461, in <module>
        train(i)
      File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 444, in train
        lossAll = ascend_txt()
      File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 423, in ascend_txt
        iii = perceptor.encode_image(normalize(make_cutouts(out))).float()
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 241, in forward
        batch = self.augs(torch.cat(cutouts, dim=0))
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
        input = module(input)
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\base.py", line 245, in forward
        output = self.apply_func(in_tensor, in_transform, self._params, return_transform)
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\base.py", line 210, in apply_func
        output[to_apply] = self.apply_transform(in_tensor[to_apply], params, trans_matrix[to_apply])
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\augmentation.py", line 684, in apply_transform
        return warp_affine(
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\geometry\transform\imgwarp.py", line 192, in warp_affine
        dst_norm_trans_src_norm: torch.Tensor = normalize_homography(M_3x3, (H, W), dsize)
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\geometry\transform\homography_warper.py", line 380, in normalize_homography
        src_pix_trans_src_norm = _torch_inverse_cast(src_norm_trans_src_pix)
      File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\utils\helpers.py", line 48, in _torch_inverse_cast
        return torch.inverse(input.to(dtype)).to(input.dtype)
    RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling `cusolverDnCreate(handle)`

    opened by vexersa 2
  • Update README.md

    opened by DrJKL 0
  • Add Metadata with the prompt info to the outputs

    Let me know if you want me to store more of the information there too.

    opened by DrJKL 0
  • chore(docs): Update README and remove unused imports

    • Updated the README.md
    • Added default image size for VRAM size
    • Remove unused imports
    opened by thehappydinoa 0
  • About implementing the Feedback example in code

    1. I particularly like this example; it's a great discovery. Could this example be implemented directly in code? I'm running under Windows, so I can't run zoom.sh.
    2. Is there a way to generate text prompts automatically? I wonder if I could generate them myself, replacing random.sh.
    opened by zhanghongyong123456 8
Owner
Nerdy Rodent
Just a nerdy rodent. I do arty stuff with computers.