Just playing with getting VQGAN+CLIP running locally, rather than having to use Colab.

VQGAN-CLIP Overview

A repo for running VQGAN+CLIP locally. This started out as a Google Colab notebook derived from Katherine Crowson's VQGAN+CLIP work.

Original notebook: Open In Colab

Some example images:

Environment:

  • Tested on Ubuntu 20.04
  • GPU: Nvidia RTX 3090
  • Typical VRAM requirements:
    • 24 GB for a 900x900 image
    • 10 GB for a 512x512 image
    • 8 GB for a 380x380 image

Still a work in progress - I've not actually tested everything yet :)

Set up

Example set up using Anaconda to create a virtual Python environment with the prerequisites:

conda create --name vqgan python=3.9
conda activate vqgan

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops 

git clone https://github.com/openai/CLIP
git clone https://github.com/CompVis/taming-transformers.git
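
Before downloading a model, it can be worth confirming that PyTorch can actually see the GPU. A quick optional sanity check (not part of the original instructions):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"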

You will also need at least one VQGAN pretrained model, e.g.:

mkdir checkpoints
curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.yaml' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.ckpt' #ImageNet 16384

By default, the model .yaml and .ckpt files are expected in the checkpoints directory. See https://github.com/CompVis/taming-transformers for more information on datasets and models.
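
If you keep the model files elsewhere, or want to use a different model, the config and checkpoint can be passed explicitly with -conf and -ckpt. For example, pointing at the files downloaded above:

python generate.py -p "A painting of an apple in a fruit bowl" -conf checkpoints/vqgan_imagenet_f16_16384.yaml -ckpt checkpoints/vqgan_imagenet_f16_16384.ckpt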

Run

To generate images from text, specify your text prompt as shown in the example below:

python generate.py -p "A painting of an apple in a fruit bowl"
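
Other options from the list further below can be combined with the prompt. For example (values are illustrative), to set the output filename, image size and number of iterations:

python generate.py -p "A painting of an apple in a fruit bowl" -o apple.png -s 512 512 -i 400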

Multiple prompts

Text and image prompts can be split using the pipe symbol in order to allow multiple prompts. For example:

python generate.py -p "A painting of an apple in a fruit bowl | psychedelic | surreal | weird"

Image prompts can be split in the same way. For example:

python generate.py -p "A picture of a bedroom with a portrait of Van Gogh" -ip "samples/VanGogh.jpg | samples/Bedroom.png"

"Style Transfer"

An input image with style text and a low number of iterations can be used to create a sort of "style transfer" effect. For example:

python generate.py -p "A painting in the style of Picasso" -ii samples/VanGogh.jpg -i 80 -se 10 -opt AdamW -lr 0.25

Example output styles: Picasso, Sketch, Psychedelic (images omitted).

Feedback example

By feeding back the generated images and making slight changes, some interesting effects can be created.

The example zoom.sh script shows this by applying a zoom and rotation to each generated image before feeding it back in again. To use zoom.sh, specify a text prompt, output filename and number of frames. E.g.

./zoom.sh "A painting of a red telephone box spinning through a time vortex" Telephone.png 150
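
The idea behind the script is simple: generate for a handful of iterations, zoom and rotate the result slightly, then use it as the init image for the next pass. A minimal sketch of that loop (illustrative only - not the actual zoom.sh - and it assumes ImageMagick's convert is installed):

PROMPT="A painting of a red telephone box spinning through a time vortex"
python generate.py -p "$PROMPT" -i 25 -o frame.png               # initial frame
for n in $(seq 1 150); do
    convert frame.png -distort SRT '1.01 1' +repage frame.png    # zoom ~1% and rotate 1 degree
    python generate.py -p "$PROMPT" -ii frame.png -i 25 -o frame.png
    cp frame.png "$(printf 'frame_%04d.png' "$n")"               # keep a copy of each frame
done

The saved frames can then be assembled into a video, e.g. with ffmpeg -framerate 30 -i frame_%04d.png video.mp4.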

Random text example

Use random.sh to make a batch of images from random text. Edit the text and number of generated images to your taste!

./random.sh
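
A minimal sketch of the same idea (illustrative only - not the actual random.sh; it assumes a words.txt word list and GNU shuf, which is part of coreutils):

for n in $(seq 1 10); do
    prompt=$(shuf -n 3 words.txt | tr '\n' ' ')   # pick three random words as the prompt
    python generate.py -p "$prompt" -o "random_$n.png"
done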

Advanced options

To view the available options, use "-h".

python generate.py -h
usage: generate.py [-h] [-p PROMPTS] [-o OUTPUT] [-i MAX_ITERATIONS] [-ip IMAGE_PROMPTS]
[-nps [NOISE_PROMPT_SEEDS ...]] [-npw [NOISE_PROMPT_WEIGHTS ...]] [-s SIZE SIZE]
[-ii INIT_IMAGE] [-iw INIT_WEIGHT] [-m CLIP_MODEL] [-conf VQGAN_CONFIG]
[-ckpt VQGAN_CHECKPOINT] [-lr STEP_SIZE] [-cuts CUTN] [-cutp CUT_POW] [-se DISPLAY_FREQ]
[-sd SEED] [-opt OPTIMISER]
optional arguments:
  -h, --help            show this help message and exit
  -p PROMPTS, --prompts PROMPTS
                        Text prompts
  -o OUTPUT, --output OUTPUT
                        Output filename
  -i MAX_ITERATIONS, --iterations MAX_ITERATIONS
                        Number of iterations
  -ip IMAGE_PROMPTS, --image_prompts IMAGE_PROMPTS
                        Image prompts / target image
  -nps [NOISE_PROMPT_SEEDS ...], --noise_prompt_seeds [NOISE_PROMPT_SEEDS ...]
                        Noise prompt seeds
  -npw [NOISE_PROMPT_WEIGHTS ...], --noise_prompt_weights [NOISE_PROMPT_WEIGHTS ...]
                        Noise prompt weights
  -s SIZE SIZE, --size SIZE SIZE
                        Image size (width height)
  -ii INIT_IMAGE, --init_image INIT_IMAGE
                        Initial image
  -iw INIT_WEIGHT, --init_weight INIT_WEIGHT
                        Initial image weight
  -m CLIP_MODEL, --clip_model CLIP_MODEL
                        CLIP model
  -conf VQGAN_CONFIG, --vqgan_config VQGAN_CONFIG
                        VQGAN config
  -ckpt VQGAN_CHECKPOINT, --vqgan_checkpoint VQGAN_CHECKPOINT
                        VQGAN checkpoint
  -lr STEP_SIZE, --learning_rate STEP_SIZE
                        Learning rate
  -cuts CUTN, --num_cuts CUTN
                        Number of cuts
  -cutp CUT_POW, --cut_power CUT_POW
                        Cut power
  -se DISPLAY_FREQ, --save_every DISPLAY_FREQ
                        Save image iterations
  -sd SEED, --seed SEED
                        Seed
  -opt OPTIMISER, --optimiser OPTIMISER
                        Optimiser (Adam, AdamW, Adagrad, Adamax)

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Katherine Crowson - https://github.com/crowsonkb

Public Domain images from Open Access Images at the Art Institute of Chicago - https://www.artic.edu/open-access/open-access-images

Comments
  • About code implementation  Feedback example

    1. I particularly like this example - it's a great discovery. Could this example be implemented directly in code? I'm running under Windows, so I can't run zoom.sh.
    2. Is there a way to generate the text prompts automatically? I wonder if I can generate them myself, to replace random.sh.
    opened by zhanghongyong123456 12
  • "nan" losses issue for some small subset of users

    Example last few iterations of a user for whom it's not working:

    47it [01:22,  1.60s/it]
    48it [01:24,  1.70s/it]
    49it [01:26,  1.67s/it]
    50it [01:27,  1.64s/it]
                           
    50it [01:28,  1.64s/it]
    i: 0, loss: nan, losses: nan
    i: 50, loss: nan, losses: nan
    

    Example last few iterations for my PC:

    [e] 48it [00:16,  2.77it/s]
    [e] 49it [00:16,  2.80it/s]
    [e] 50it [00:16,  2.94it/s]
    [e]                        
    [e] 50it [00:17,  2.94it/s]
    i: 0, loss: 0.92412, losses: 0.92412
    i: 50, loss: 0.765271, losses: 0.765271
    

    I have no clue how to even begin debugging this

    opened by monsieurpooh 10
  • No matching distribution found for torch==1.9.0+cu111

    Hey there! I'm trying to run this on an M1 MacBook Pro.

    I installed Anaconda and created and activated the environment as in the Readme. Then I tried to run the first pip command, but I get an error (shown in an attached screenshot).

    Any clue what's causing this and how to fix it?

    opened by OfficialCRUGG 8
  • Why do I get different ouputs when using the same input on a different repo?

    I got an amazing output on the Google Colab notebook and am trying to replicate it at a larger scale, running locally on my 3090. For some reason the outputs appear to have a different style than those from Colab (I'm using the same model, prompt, seed and save interval).

    Is there something that's been altered? Is it to do with the optimizer or learning rate? (These can't be specified on the Colab notebook.)

    Thanks a lot for this bit of software; it has given me hours of experimenting and fun!

    opened by shaolinseed 8
  • Download nvidia-smi -L SyntaxError:invalid syntax

    Trying to open the program; I renamed it to pythonimages.py to make it a little easier. Here's what I entered into cmd:

    cd desktop

    python pythonimages.py

    nvidia-smi -L
        ^
    SyntaxError: invalid syntax

    Please help, thanks

    opened by charlesf218 7
  • zoom.sh fails after 25 iterations

    here is the output:

    Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
    loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
    VQLPIPSWithDiscriminator running with hinge loss.
    Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
    Using device: cuda:0
    Optimising using: Adam
    Using text prompts: ['Rectangle camera Robot surreal']
    Using seed: 6045542010
    i: 0, loss: 0.873045, losses: 0.873045
    i: 25, loss: 0.817415, losses: 0.817415
    25it [00:07, 3.42it/s]
    zoom.sh: 21: convert: not found
    zoom.sh: 22: convert: not found
    zoom.sh: 25: Syntax error: Bad for loop variable

    opened by Logic-Beach 7
  • I keep getting a traceback trying to make a video

    in line 988 "AttributeError: 'int' object has no attribute 'stdin'"

    ffmpeg command failed - check your installation
    0%| | 0/5 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "C:\Users\Caleb\VQGAN-CLIP\generate.py", line 988, in <module>
        im.save(p.stdin, 'PNG')
    AttributeError: 'int' object has no attribute 'stdin'

    Maybe i'm trying to make a video wrong, but the issue persists even with the provided example of the telephone box.

    opened by constantupgrade 6
  • memory issues

    I've been trying to do larger resolution images, but no matter what size GPU I use I get a message like the one below, where it seems PyTorch is reserving a massive amount of the available memory. Any advice on how to go about creating larger images?

    GPU 0; 31.75 GiB total capacity; 29.72 GiB already allocated; 381.00 MiB free; 29.94 GiB reserved in total by PyTorch
    
    opened by sidhomj 5
  • Cannot reproduce results from README.md

    Hello, First of all, thank you for an amazing repository - this is both very spectacular and well-written. I am having a problem reproducing the results from the README.md file though. I was trying to reproduce the following input:

    python generate.py -p "A painting of an apple in a fruit bowl"

    However, at the output I am obtaining the following image:

    (generated output image attached to the issue)

    Does anyone have the same issue or can guide me on what could be the reason for such discrepancies? Regards

    opened by wprazuch 4
  • RuntimeError: Error(s) in loading state_dict for VQModel

    So I'm trying to be brave and set this up on my Windows 10 machine running Conda since my Titan RTX GPU is on that box. I was able to install everything w/o any issues but when I try to run the example it bails out. Not 100% sure what the error is.

    (vqgan) PS C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP> python generate.py -p "A painting of an apple in a fruit bowl"
    Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
    loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
    VQLPIPSWithDiscriminator running with hinge loss.
    Traceback (most recent call last):
      File "C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP\generate.py", line 546, in <module>
        model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
      File "C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
        model.init_from_ckpt(checkpoint_path)
      File "C:\Users\stiet\anaconda3\envs\vqgan\lib\site-packages\taming\models\vqgan.py", line 48, in init_from_ckpt
        self.load_state_dict(sd, strict=False)
      File "C:\Users\stiet\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for VQModel:
            size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([1, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([512, 256, 4, 4]).
            size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([16384, 256]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
    (vqgan) PS C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP> ls
    
    
        Directory: C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP
    
    
    Mode                 LastWriteTime         Length Name
    ----                 -------------         ------ ----
    d-----         9/30/2021   3:52 PM                checkpoints
    d-----         9/30/2021   3:23 PM                CLIP
    d-----         9/30/2021   3:19 PM                samples
    d-----         9/30/2021   3:54 PM                taming
    d-----         9/30/2021   3:23 PM                taming-transformers
    -a----         9/30/2021   3:19 PM            190 .gitignore
    -a----         9/30/2021   3:19 PM           5277 download_models.sh
    -a----         9/30/2021   3:19 PM          42380 generate.py
    -a----         9/30/2021   3:19 PM           1095 LICENSE
    -a----         9/30/2021   3:19 PM           1592 opt_tester.sh
    -a----         9/30/2021   3:19 PM           1474 random.sh
    -a----         9/30/2021   3:19 PM          13240 README.md
    -a----         9/30/2021   3:19 PM           1187 requirements.txt
    -a----         9/30/2021   3:19 PM           1544 video_styler.sh
    -a----         9/30/2021   3:19 PM           2376 vqgan.yml
    -a----         9/30/2021   3:19 PM           1444 zoom.sh
    
    opened by gateway 4
  • Problem unidentified by a newbie (me)

    Hello, I followed your video (thanks a lot by the way; it seems I did not follow it well, actually). Maybe you'll understand what I can do at this point:

    (vqgan) C:\Users\Milaj\github\VQGAN-CLIP>python generate.py -p "A painting of an apple in a fruit bowl"
    Traceback (most recent call last):
      File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 466, in <module>
        model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
      File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 436, in load_vqgan_model
        config = OmegaConf.load(config_path)
      File "C:\Users\Milaj\anaconda3\envs\vqgan\lib\site-packages\omegaconf\omegaconf.py", line 183, in load
        with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Milaj\github\VQGAN-CLIP\checkpoints\vqgan_imagenet_f16_16384.yaml'

    Thank you in advance, tell me if you need more infos

    opened by glouglou2marseille 4
  • Metal Performance Shaders (MPS) Support

    https://pytorch.org/docs/stable/notes/mps.html

    #47702

    Install the latest PyTorch with Metal Performance Shaders (MPS) support:

    They're in stable, so you probably already have it.

    Stable:

    pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/torch_stable.html
    

    Nightly:

    pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
    

    (Source: https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html)

    pip uninstall torch torchvision torchaudio if for some reason you need to remove them.

    Verifying the existence of MPS support in PyTorch:

    (vqgan) sysadmin@codekitty VQGAN-CLIP % python
    Python 3.9.12 (main, Jun  1 2022, 06:34:44)
    [Clang 12.0.0 ] :: Anaconda, Inc. on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> torch.has_mps
    True
    >>> torch.backends.mps.is_available()
    True
    >>> torch.backends.mps.is_built()
    True
    

    In generate.py remove or comment out the following:

    if not args.cuda_device == 'cpu' and not torch.cuda.is_available():
        args.cuda_device = 'cpu'
        args.video_fps = 0
        print("Warning: No GPU found! Using the CPU instead. The iterations will be slow.#")
        print("Perhaps CUDA/ROCm or the right pytorch version is not properly installed?")
    

    (for testing purposes only)

    For future reference, you can check for MPS availability with

    if not torch.backends.mps.is_available():
        if not torch.backends.mps.is_built():
            print("MPS not available because the current PyTorch install was not "
                  "built with MPS enabled.")
        else:
            print("MPS not available because the current MacOS version is not 12.3+ "
                  "and/or you do not have an MPS-enabled device on this machine.")
    

    Getting random.sh working on Mac:

    The script errors out because shuf is missing; we can get shuf by installing coreutils.

    brew install coreutils
    

    Running:

    python generate.py -p "A painting of an apple in a fruit bowl" -cd mps
    

    Attempt 1:

    Output:

    (vqgan) sysadmin@codekitty VQGAN-CLIP % python generate.py --cuda_device mps -p "A painting of an apple in a fruit bowl"
    Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
    loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
    VQLPIPSWithDiscriminator running with hinge loss.
    Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
    Traceback (most recent call last):
      File "/Users/sysadmin/Development/VQGAN-CLIP/generate.py", line 625, in <module>
        embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
      File "/Users/sysadmin/Development/VQGAN-CLIP/CLIP/clip/model.py", line 354, in encode_text
        x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
    NotImplementedError: The operator 'aten::index.Tensor' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
    

    Attempt 2:

    Running with:

    export PYTORCH_ENABLE_MPS_FALLBACK=1
    

    Output:

    (vqgan) sysadmin@codekitty VQGAN-CLIP % python generate.py --cuda_device mps -p "A painting of an apple in a fruit bowl"
    Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
    loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
    VQLPIPSWithDiscriminator running with hinge loss.
    Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
    Using device: mps
    Optimising using: Adam
    Using text prompts: ['A painting of an apple in a fruit bowl']
    Using seed: 4061991638698407707
    0it [00:00, ?it/s]-:27:11: error: invalid input tensor shapes, indices shape and updates shape must be equal
    -:27:11: note: see current operation: %25 = "mps.scatter_along_axis"(%23, %arg5, %24, %1) {mode = 6 : i32} : (tensor<3311616xf32>, tensor<224xf32>, tensor<1103872xi32>, tensor<i32>) -> tensor<3311616xf32>
    /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1267: failed assertion `Error: MLIR pass manager failed'
    zsh: abort      python generate.py --cuda_device mps -p
    (vqgan) sysadmin@codekitty VQGAN-CLIP % /Users/sysadmin/Development/miniconda3/envs/vqgan/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '
    
    opened by researcx 0
  • python generate.py -p "A painting of an apple in a fruit bowl" Illegal instruction (core dumped)

    I ran through all the instructions but it wouldn't work, so I tried to pip install all of the requirements. I am running the latest version of Linux Mint, but when I go to run it I get ------ python generate.py -p "A painting of an apple in a fruit bowl" Illegal instruction (core dumped) -------- I tried updating my NVIDIA drivers; I am using a GeForce RTX 3060.

    opened by Abrahm1234 0
  • Running generate.py from another python file "best practise"

    I want to call generate.py from my Python application; what is the "best" way to do this? I currently have two options:

    This uses the shell to run another process - not ideal as I would like to stay in Python, but it's easy...

      import os
      import shlex

      vqgan_generate_dir = "/foo/baa"

      def generate(*, prompts=None):
          vqgan_arguments = []

          if prompts:
              vqgan_arguments.append("--prompts")
              vqgan_arguments.append(shlex.quote(prompts))  # quote so prompts with spaces survive the shell
          else:
              default_prompt = "a nice default prompt"
              vqgan_arguments.append("--prompts")
              vqgan_arguments.append(shlex.quote(default_prompt))

          vqgan_argument_string = ' '.join(vqgan_arguments)
          # os.system(f"{vqgan_generate_dir}/generate.py {vqgan_argument_string}")
          print(f"{vqgan_generate_dir}/generate.py {vqgan_argument_string}")
    

    This uses straight Python by modifying sys.argv before calling generate.py; this seems overly convoluted just to pass args:

      import sys
      import argparse
      import imp  # note: imp is deprecated in favour of importlib

      vqgan_generate_dir = "/foo/baa/"

      def generate(prompts):

          vq_parser = argparse.ArgumentParser(description='Process some params.')
          vq_parser.add_argument("-p", "--prompts", type=str, help="Text prompts", default=None, dest='prompts')

          sys.argv = ['generate.py'] # define your commandline arguments here

          if prompts:
              sys.argv.append("-p")
              sys.argv.append(prompts)

          args = vq_parser.parse_args()
          print(args)

          ############################################################################
          # Load & run the generate.py module
          ############################################################################

          fp = None
          try:
              # find_module expects a list of directories to search
              fp, pathname, description = imp.find_module('generate', [vqgan_generate_dir])
              generate = imp.load_module("generate", fp, pathname, description)
          except ImportError:
              print(f"Could not import: {vqgan_generate_dir}generate.py")
              quit()
          finally:
              if fp:
                  fp.close()
    

    Neither of these is tested yet as I'm just exploring concepts. What do you think is the best way? Is there another way?

    Option two gets really long-winded once you want ALL the args, and option one is overly simplistic and doesn't give you much control over the params going in.

    opened by vendablefall 0
  • the effect of init_weight

    https://github.com/nerdyrodent/VQGAN-CLIP/blob/a6c8c487b89727d3c3440b8b3c406331c12275d6/generate.py#L726

    Why calculate an MSE loss between z and 0 and multiply it by a function of init_weight? Is it some kind of regularisation term?

    opened by WorldHellooo 0
  • Reducing Generation Time

    This is a fantastic tool! I've been using it to generate several one-off images and I'm excited to expand upon it and contribute in any way I can.

    It takes at least 20 minutes to generate a default 512x512 image, and maybe 15 minutes to generate a 256x256 image on an AWS P2.XL instance. I get around 3.26s/it. The result is similar on P2.8XL instance, with 8 available GPUs. This is quite a long runtime, especially with expensive AWS instances.

    I want to be able to scale the image generation workload by either running multiple prompts simultaneously or pooling the generation across multiple GPUs, to reduce the draw time to under 5 minutes.

    Parallel prompt generation is more straightforward to accomplish. You can keep a map of active jobs on the visible GPUs and a queue of prompts to distribute among them, spinning up a new generate.py process for each available GPU.

    The hard part I have been trying to examine is the draw time. Is there a way to map a single generation job across multiple GPUs, distributed or local? This question has been raised in issues before, such as #24 and #47, but I want to bring it back. The challenge is that image generation occurs over multiple iterations of the same base seeded image, essentially procedurally generating the next layer, so you can't neatly split the "dataset" as in other ML workloads. I am not as strong in ML engineering as I thought I was, so if someone can help me clarify what I am getting at or tell me where I am wrong in this explanation/approach, I would highly appreciate it and close the issue.

    I am familiar with AWS infrastructure such as EC2, Sagemaker, Glue, Batch, so if you think it's an infrastructure or hardware bottleneck, I can research more in that direction.

    TLDR: How do we scale up image generation? In what way can we cut down generation time from ~20 minutes to ~5 minutes?

    Let me know what you think. Thank you.

    opened by SafaTinaztepe 3