# VQGAN-CLIP Overview
A repo for running VQGAN+CLIP locally. This started out as a Google Colab notebook derived from Katherine Crowson's VQGAN+CLIP work.
Environment:
- Tested on Ubuntu 20.04
- GPU: Nvidia RTX 3090
- Typical VRAM requirements:
  - 24 GB for a 900x900 image
  - 10 GB for a 512x512 image
  - 8 GB for a 380x380 image
Still a work in progress - I've not actually tested everything yet :)
## Set up
Example set up using Anaconda to create a virtual Python environment with the prerequisites:
```sh
conda create --name vqgan python=3.9
conda activate vqgan

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops

git clone https://github.com/openai/CLIP
git clone https://github.com/CompVis/taming-transformers.git
```
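Before downloading any models, it's worth confirming that the CUDA build of PyTorch installed correctly, since generation on CPU is impractically slow. This one-liner is just a generic sanity check, not part of the repo's instructions:

```sh
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```

If it prints `None False`, the CPU-only build was installed and the `pip install torch...` step above should be repeated with the CUDA wheel.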
You will also need at least one pretrained VQGAN model, e.g.:
```sh
mkdir checkpoints
curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.yaml' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.ckpt' #ImageNet 16384
```
By default, the model .yaml and .ckpt files are expected in the `checkpoints` directory. See https://github.com/CompVis/taming-transformers for more information on datasets and models.
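Models stored elsewhere, or alternative VQGANs, can be selected with the `-conf` and `-ckpt` options documented under Advanced options below. For example, with a hypothetical second model downloaded to the same directory:

```sh
python generate.py -p "A painting of an apple in a fruit bowl" -conf checkpoints/other_model.yaml -ckpt checkpoints/other_model.ckpt
```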
## Run
To generate images from text, specify your text prompt as shown in the example below:
python generate.py -p "A painting of an apple in a fruit bowl"
### Multiple prompts
Text and image prompts can be split using the pipe symbol (`|`) to allow multiple prompts. For example:
python generate.py -p "A painting of an apple in a fruit bowl | psychedelic | surreal | weird"
Image prompts can be split in the same way. For example:
python generate.py -p "A picture of a bedroom with a portrait of Van Gogh" -ip "samples/VanGogh.jpg | samples/Bedroom.png"
"Style Transfer"
An input image with style text and a low number of iterations can be used to create a sort of "style transfer" effect. For example:
python generate.py -p "A painting in the style of Picasso" -ii samples/VanGogh.jpg -i 80 -se 10 -opt AdamW -lr 0.25
*(Example output/style image pairs omitted; the styles shown were Picasso, Sketch, and Psychedelic.)*
### Feedback example
By feeding back the generated images and making slight changes, some interesting effects can be created.
The example script `zoom.sh` shows this by applying a zoom and rotate to generated images before feeding them back in again. To use `zoom.sh`, specify a text prompt, output filename and number of frames. E.g.
```sh
./zoom.sh "A painting of a red telephone box spinning through a time vortex" Telephone.png 150
```
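The script isn't reproduced here, but the feedback loop could look roughly like the sketch below. This is an illustrative reconstruction rather than the actual `zoom.sh`: it assumes ImageMagick's `convert` is available for the zoom/rotate step, and the iteration counts and distortion values are invented.

```sh
#!/bin/bash
# Illustrative feedback loop (not the repo's actual zoom.sh):
# generate an image, then repeatedly zoom/rotate it and feed it back in.
PROMPT="$1"; OUTPUT="$2"; FRAMES="$3"

python generate.py -p "$PROMPT" -o "$OUTPUT" -i 100   # initial image

for i in $(seq 1 "$FRAMES"); do
    # Scale up by 1% and rotate 1 degree about the centre (ImageMagick SRT distort)
    convert "$OUTPUT" -distort SRT '1.01 1' "$OUTPUT"
    # Continue generating from the transformed image for a few more iterations
    python generate.py -p "$PROMPT" -ii "$OUTPUT" -o "$OUTPUT" -i 25
    cp "$OUTPUT" "$(printf 'frame_%04d.png' "$i")"
done
```

The per-frame images can then be assembled into a video, e.g. with ffmpeg (a binary of which the `imageio-ffmpeg` dependency bundles).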
### Random text example
Use `random.sh` to make a batch of images from random text. Edit the text and number of generated images to your taste!
```sh
./random.sh
```
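Again purely as an illustration (the real `random.sh` may differ), a batch script along these lines would pick random words and pass them to `generate.py`; the word list and count here are placeholders to edit:

```sh
#!/bin/bash
# Illustrative random-prompt batch (not the repo's actual random.sh).
WORDS=(painting photograph surreal psychedelic apple bedroom vortex watercolour)
COUNT=5   # number of images to generate

for i in $(seq 1 "$COUNT"); do
    PROMPT="A ${WORDS[$RANDOM % ${#WORDS[@]}]} of a ${WORDS[$RANDOM % ${#WORDS[@]}]}"
    python generate.py -p "$PROMPT" -o "random_$i.png"
done
```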
## Advanced options
To view the available options, use `-h`:

```sh
python generate.py -h
```
```
usage: generate.py [-h] [-p PROMPTS] [-o OUTPUT] [-i MAX_ITERATIONS] [-ip IMAGE_PROMPTS]
                   [-nps [NOISE_PROMPT_SEEDS ...]] [-npw [NOISE_PROMPT_WEIGHTS ...]] [-s SIZE SIZE]
                   [-ii INIT_IMAGE] [-iw INIT_WEIGHT] [-m CLIP_MODEL] [-conf VQGAN_CONFIG]
                   [-ckpt VQGAN_CHECKPOINT] [-lr STEP_SIZE] [-cuts CUTN] [-cutp CUT_POW] [-se DISPLAY_FREQ]
                   [-sd SEED] [-opt OPTIMISER]

optional arguments:
  -h, --help            show this help message and exit
  -p PROMPTS, --prompts PROMPTS
                        Text prompts
  -o OUTPUT, --output OUTPUT
                        Output file
  -i MAX_ITERATIONS, --iterations MAX_ITERATIONS
                        Number of iterations
  -ip IMAGE_PROMPTS, --image_prompts IMAGE_PROMPTS
                        Image prompts / target image
  -nps [NOISE_PROMPT_SEEDS ...], --noise_prompt_seeds [NOISE_PROMPT_SEEDS ...]
                        Noise prompt seeds
  -npw [NOISE_PROMPT_WEIGHTS ...], --noise_prompt_weights [NOISE_PROMPT_WEIGHTS ...]
                        Noise prompt weights
  -s SIZE SIZE, --size SIZE SIZE
                        Image size (width height)
  -ii INIT_IMAGE, --init_image INIT_IMAGE
                        Initial image
  -iw INIT_WEIGHT, --init_weight INIT_WEIGHT
                        Initial image weight
  -m CLIP_MODEL, --clip_model CLIP_MODEL
                        CLIP model
  -conf VQGAN_CONFIG, --vqgan_config VQGAN_CONFIG
                        VQGAN config
  -ckpt VQGAN_CHECKPOINT, --vqgan_checkpoint VQGAN_CHECKPOINT
                        VQGAN checkpoint
  -lr STEP_SIZE, --learning_rate STEP_SIZE
                        Learning rate
  -cuts CUTN, --num_cuts CUTN
                        Number of cuts
  -cutp CUT_POW, --cut_power CUT_POW
                        Cut power
  -se DISPLAY_FREQ, --save_every DISPLAY_FREQ
                        Save image iterations
  -sd SEED, --seed SEED
                        Seed
  -opt OPTIMISER, --optimiser OPTIMISER
                        Optimiser (Adam, AdamW, Adagrad, Adamax)
```
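For instance, combining several of these options (the values here are purely illustrative):

```sh
python generate.py -p "A painting of an apple in a fruit bowl" -s 512 512 -i 500 -se 50 -sd 42 -opt AdamW -lr 0.1 -o apple.png
```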
## Citations
```bibtex
@misc{unpublished2021clip,
    title={CLIP: Connecting Text and Images},
    author={Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year={2021}
}

@misc{esser2020taming,
    title={Taming Transformers for High-Resolution Image Synthesis},
    author={Patrick Esser and Robin Rombach and Björn Ommer},
    year={2020},
    eprint={2012.09841},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
Katherine Crowson - https://github.com/crowsonkb
Public Domain images from Open Access Images at the Art Institute of Chicago - https://www.artic.edu/open-access/open-access-images