Just playing with getting VQGAN+CLIP running locally, rather than having to use Colab.

VQGAN-CLIP Overview

A repo for running VQGAN+CLIP locally. It started out as a Google Colab notebook derived from Katherine Crowson's VQGAN+CLIP work.

Original notebook: Open In Colab

Some example images:

Environment:

  • Tested on Ubuntu 20.04
  • GPU: Nvidia RTX 3090
  • Typical VRAM requirements:
    • 24 GB for a 900x900 image
    • 10 GB for a 512x512 image
    • 8 GB for a 380x380 image

Still a work in progress - I've not actually tested everything yet :)

Set up

Example set up using Anaconda to create a virtual Python environment with the prerequisites:

conda create --name vqgan python=3.9
conda activate vqgan

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops 

git clone https://github.com/openai/CLIP
git clone https://github.com/CompVis/taming-transformers.git
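
If the +cu111 wheels aren't available for your platform (see the Apple Silicon issue further down), the plain PyPI builds are a possible fallback - this is an assumption rather than a tested configuration, and generation without a CUDA GPU will be far slower (the -cd cpu option mentioned in the issues below may also be needed):

pip install torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0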

You will also need at least one pretrained VQGAN model, e.g.:

mkdir checkpoints
curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.yaml' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.ckpt' #ImageNet 16384

By default, the model .yaml and .ckpt files are expected in the checkpoints directory. See https://github.com/CompVis/taming-transformers for more information on datasets and models.
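
If you download additional models, the -conf and -ckpt options (listed under Advanced options below) let you pick one at run time; the filenames here are only placeholders:

python generate.py -p "A painting of an apple in a fruit bowl" -conf checkpoints/other_model.yaml -ckpt checkpoints/other_model.ckpt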

Run

To generate images from text, specify your text prompt as shown in the example below:

python generate.py -p "A painting of an apple in a fruit bowl"

Multiple prompts

Text and image prompts can be split using the pipe symbol in order to allow multiple prompts. For example:

python generate.py -p "A painting of an apple in a fruit bowl | psychedelic | surreal | weird"

Image prompts can be split in the same way. For example:

python generate.py -p "A picture of a bedroom with a portrait of Van Gogh" -ip "samples/VanGogh.jpg | samples/Bedroom.png"

"Style Transfer"

An input image with style text and a low number of iterations can be used to create a sort of "style transfer" effect. For example:

python generate.py -p "A painting in the style of Picasso" -ii samples/VanGogh.jpg -i 80 -se 10 -opt AdamW -lr 0.25
Example output styles (shown as images in the original README): Picasso, Sketch, Psychedelic.
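
The styles differ only in the text prompt; for instance, a sketch-like pass over the same input image would presumably use the same flags with the prompt swapped (the prompt wording here is just an illustration):

python generate.py -p "A pencil sketch drawing" -ii samples/VanGogh.jpg -i 80 -se 10 -opt AdamW -lr 0.25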

Feedback example

By feeding back the generated images and making slight changes, some interesting effects can be created.

The example zoom.sh shows this by applying a zoom and rotate to generated images, before feeding them back in again. To use zoom.sh, specify a text prompt, output filename and number of frames. E.g.

./zoom.sh "A painting of a red telephone box spinning through a time vortex" Telephone.png 150
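
zoom.sh itself isn't reproduced here; a rough bash sketch of the same idea (using ImageMagick's convert for the zoom/rotate step, with all values purely illustrative) might look like:

#!/bin/bash
# Hypothetical sketch of the zoom/rotate feedback loop - not the actual zoom.sh.
PROMPT="$1"    # text prompt
OUTPUT="$2"    # output filename, e.g. Telephone.png
FRAMES="$3"    # number of frames to generate

# First frame from the text prompt alone
python generate.py -p "$PROMPT" -o "$OUTPUT" -i 80

for ((i = 1; i < FRAMES; i++)); do
    # Slightly zoom in and rotate the previous frame (scale 1.02, 1 degree)...
    convert "$OUTPUT" -distort SRT "1.02 1" "$OUTPUT"
    # ...then feed it back in as the initial image for a short run
    python generate.py -p "$PROMPT" -ii "$OUTPUT" -o "$OUTPUT" -i 20
    cp "$OUTPUT" "frame_$(printf '%04d' "$i").png"
done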

Random text example

Use random.sh to make a batch of images from random text. Edit the text and number of generated images to your taste!

./random.sh
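
random.sh isn't reproduced here either; a minimal bash sketch of the idea (the word list and count are invented for illustration):

#!/bin/bash
# Hypothetical sketch of batch generation from random text - not the actual random.sh.
WORDS=(apple forest neon cathedral ocean clockwork)
COUNT=5   # number of images to generate

for ((i = 0; i < COUNT; i++)); do
    # Build a prompt from three randomly chosen words
    PROMPT="${WORDS[RANDOM % ${#WORDS[@]}]} ${WORDS[RANDOM % ${#WORDS[@]}]} ${WORDS[RANDOM % ${#WORDS[@]}]}"
    python generate.py -p "$PROMPT" -o "random_$i.png"
done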

Advanced options

To view the available options, use "-h".

python generate.py -h
usage: generate.py [-h] [-p PROMPTS] [-o OUTPUT] [-i MAX_ITERATIONS] [-ip IMAGE_PROMPTS]
[-nps [NOISE_PROMPT_SEEDS ...]] [-npw [NOISE_PROMPT_WEIGHTS ...]] [-s SIZE SIZE]
[-ii INIT_IMAGE] [-iw INIT_WEIGHT] [-m CLIP_MODEL] [-conf VQGAN_CONFIG]
[-ckpt VQGAN_CHECKPOINT] [-lr STEP_SIZE] [-cuts CUTN] [-cutp CUT_POW] [-se DISPLAY_FREQ]
[-sd SEED] [-opt OPTIMISER]
optional arguments:
  -h, --help            show this help message and exit
  -p PROMPTS, --prompts PROMPTS
                        Text prompts
  -o OUTPUT, --output OUTPUT
                        Output filename
  -i MAX_ITERATIONS, --iterations MAX_ITERATIONS
                        Number of iterations
  -ip IMAGE_PROMPTS, --image_prompts IMAGE_PROMPTS
                        Image prompts / target image
  -nps [NOISE_PROMPT_SEEDS ...], --noise_prompt_seeds [NOISE_PROMPT_SEEDS ...]
                        Noise prompt seeds
  -npw [NOISE_PROMPT_WEIGHTS ...], --noise_prompt_weights [NOISE_PROMPT_WEIGHTS ...]
                        Noise prompt weights
  -s SIZE SIZE, --size SIZE SIZE
                        Image size (width height)
  -ii INIT_IMAGE, --init_image INIT_IMAGE
                        Initial image
  -iw INIT_WEIGHT, --init_weight INIT_WEIGHT
                        Initial image weight
  -m CLIP_MODEL, --clip_model CLIP_MODEL
                        CLIP model
  -conf VQGAN_CONFIG, --vqgan_config VQGAN_CONFIG
                        VQGAN config
  -ckpt VQGAN_CHECKPOINT, --vqgan_checkpoint VQGAN_CHECKPOINT
                        VQGAN checkpoint
  -lr STEP_SIZE, --learning_rate STEP_SIZE
                        Learning rate
  -cuts CUTN, --num_cuts CUTN
                        Number of cuts
  -cutp CUT_POW, --cut_power CUT_POW
                        Cut power
  -se DISPLAY_FREQ, --save_every DISPLAY_FREQ
                        Save image iterations
  -sd SEED, --seed SEED
                        Seed
  -opt OPTIMISER, --optimiser OPTIMISER
                        Optimiser (Adam, AdamW, Adagrad, Adamax)
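
For example, several of these options combined (values chosen arbitrarily):

python generate.py -p "A painting of an apple in a fruit bowl" -s 512 512 -i 500 -se 50 -sd 42 -o apple.png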

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Katherine Crowson - https://github.com/crowsonkb

Public Domain images from Open Access Images at the Art Institute of Chicago - https://www.artic.edu/open-access/open-access-images

Issues
  • About code implementation - Feedback example

    1. I particularly like this example - it's a great discovery. Could you provide code for it? I'm running under Windows, so I can't run zoom.sh.
    2. Is there a way to generate the text prompts automatically? I wonder if I can generate them myself, to replace random.sh.
    opened by zhanghongyong123456 12
  • Why do I get different outputs when using the same input on a different repo?

    I got an amazing output on the Google Colab notebook and am trying to replicate it at a larger scale, running locally on my 3090. For some reason the outputs appear to have a different style than on Colab (I'm using the same model, prompt, seed and save interval).

    Is there something that's been altered? Is it to do with the optimizer or learning rate? (These can't be specified on the Colab notebook.)

    Thanks a lot for this bit of software - it has given me hours of experimenting and fun!

    opened by shaolinseed 8
  • memory issues

    I've been trying to do larger resolution images, but no matter what size GPU I use I get a message like the one below, where it seems PyTorch is using a massive amount of the available memory. Any advice on how to go about creating larger images?

    GPU 0; 31.75 GiB total capacity; 29.72 GiB already allocated; 381.00 MiB free; 29.94 GiB reserved in total by PyTorch
    
    opened by sidhomj 5
  • which CUDA version is required for pytorch here?

    I'm getting UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:115.). I do have a GPU, but I'm using CUDA version 8 (it's a shared lab machine).

    Is the old CUDA version why I get the above error? Any way to fix this, apart from setting up a brand new system?

    opened by christiansievers 4
  • Problem unidentified by a newbie (me)

    Hello, I followed your video (thanks a lot by the way - it seems I did not follow it well, actually). Maybe you'll understand what I can do at this point:

    (vqgan) C:\Users\Milaj\github\VQGAN-CLIP>python generate.py -p "A painting of an apple in a fruit bowl"
    Traceback (most recent call last):
      File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 466, in <module>
        model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
      File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 436, in load_vqgan_model
        config = OmegaConf.load(config_path)
      File "C:\Users\Milaj\anaconda3\envs\vqgan\lib\site-packages\omegaconf\omegaconf.py", line 183, in load
        with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Milaj\github\VQGAN-CLIP\checkpoints\vqgan_imagenet_f16_16384.yaml'

    Thank you in advance, tell me if you need more infos

    opened by glouglou2marseille 4
  • yaml.scanner.ScannerError: mapping values are not allowed here

    (base) PS C:\Users\Alex\vqgan-clip> python generate.py -p "A painting of an apple in a fruit bowl"
    Traceback (most recent call last):
      File "generate.py", line 546, in <module>
        model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
      File "generate.py", line 516, in load_vqgan_model
        config = OmegaConf.load(config_path)
      File "C:\Users\Alex\anaconda3\lib\site-packages\omegaconf\omegaconf.py", line 184, in load
        obj = yaml.load(f, Loader=get_yaml_loader())
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\__init__.py", line 114, in load
        return loader.get_single_data()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\constructor.py", line 49, in get_single_data
        node = self.get_single_node()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\composer.py", line 36, in get_single_node
        document = self.compose_document()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\composer.py", line 58, in compose_document
        self.get_event()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\parser.py", line 118, in get_event
        self.current_event = self.state()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\parser.py", line 193, in parse_document_end
        token = self.peek_token()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 129, in peek_token
        self.fetch_more_tokens()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 223, in fetch_more_tokens
        return self.fetch_value()
      File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 577, in fetch_value
        raise ScannerError(None, None,
    yaml.scanner.ScannerError: mapping values are not allowed here
      in "C:\Users\Alex\vqgan-clip\checkpoints\vqgan_imagenet_f16_16384.yaml", line 43, column 15
    (base) PS C:\Users\Alex\vqgan-clip>

    opened by alexpottt 3
  • Model Not Loading

    What do these lines mean and why aren't they working?

    FileNotFoundError Traceback (most recent call last)

    in ()
          3 #@markdown Once this has been run successfully you only need to run parameters and then the program to execute with new parameters
          4 device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    ----> 5 model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
          6 perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)
          7

    /usr/local/lib/python3.7/dist-packages/omegaconf/omegaconf.py in load(file_)
        181
        182     if isinstance(file_, (str, pathlib.Path)):
    --> 183         with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
        184             obj = yaml.load(f, Loader=get_yaml_loader())
        185     elif getattr(file_, "read", None):

    FileNotFoundError: [Errno 2] No such file or directory: '/content/vqgan_imagenet_f16_16384.yaml'

    opened by PurplePanther 3
  • Error message about conda activate

    Error message about conda activate:

    CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
    To initialize your shell, run

        $ conda init <SHELL_NAME>

    Currently supported shells are:
      - bash

    But if I try to run conda init I get this:

    conda init
    no change     /root/anaconda3/condabin/conda
    no change     /root/anaconda3/bin/conda
    no change     /root/anaconda3/bin/conda-env
    no change     /root/anaconda3/bin/activate
    no change     /root/anaconda3/bin/deactivate
    no change     /root/anaconda3/etc/profile.d/conda.sh
    no change     /root/anaconda3/etc/fish/conf.d/conda.fish
    no change     /root/anaconda3/shell/condabin/Conda.psm1
    no change     /root/anaconda3/shell/condabin/conda-hook.ps1
    no change     /root/anaconda3/lib/python3.8/site-packages/xontrib/conda.xsh
    no change     /root/anaconda3/etc/profile.d/conda.csh
    no change     /root/.bashrc
    No action taken.

    And conda activate vqgan still will not work.

    opened by bitcoinmeetups 3
  • Richer demo images

    https://twitter.com/e08477/status/1418440857578098691?s=21

    Would love to be able to recreate this. We need to build out a taxonomy of styles.

    I mentioned in a comment on YouTube that you could take a video, split it into frames with ffmpeg, feed each image in, and see it reimagine the video. Can you try with something and post the result?

    opened by johndpope 3
  • Saving each iteration to create a video

    Is there a way I can save each image along the process rather than just the final output, and then use ffmpeg to combine the images into an animation? I've got it working on my PC - just interested in that feature as I can't get 900x900 on Colab. Thanks!

    opened by shaolinseed 3
  • No matching distribution found for torch==1.9.0+cu111

    Hey there! I'm trying to run this on an M1 MacBook Pro.

    I installed Anaconda and created and activated the environment as in the Readme. Then I tried to run the first pip command, but I get an error (screenshot attached in the original issue).

    Any clue what's causing this and how to fix it?

    opened by OfficialCRUGG 3
  • CUDA out of memory.

    How can i fix this?

    "CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 2.00 GiB total capacity; 1.13 GiB already allocated; 0 bytes free; 1.16 GiB reserved in total by PyTorch)"

    I understand that I need to allocate more memory or change the batch parameters. But in which file should I change it? Or what command should I use? I'm a newbie, btw...

    opened by nomadSky 1
  • Error when trying to generate image (noob) any help would be appreciated

    (vqgan) D:\art\VQGAN-CLIP>python generate.py -p "A painting of an apple in a fruit bowl"
    Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
    loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
    VQLPIPSWithDiscriminator running with hinge loss.
    Traceback (most recent call last):
      File "D:\art\VQGAN-CLIP\generate.py", line 546, in <module>
        model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
      File "D:\art\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
        model.init_from_ckpt(checkpoint_path)
      File "D:\art\VQGAN-CLIP\taming-transformers\taming\models\vqgan.py", line 52, in init_from_ckpt
        self.load_state_dict(sd, strict=False)
      File "D:\ana3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for VQModel:
        size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([1, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([512, 256, 4, 4]).
        size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([16384, 256]) from checkpoint, the shape in current model is torch.Size([1024, 256]).

    opened by Jackiplier 1
  • Error when running in CPU mode

    Bug

    I get RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half' when running this against my CPU.

    To reproduce

    $ python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu
    

    Gives

    Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
    loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
    VQLPIPSWithDiscriminator running with hinge loss.
    Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
    Traceback (most recent call last):
      File "/home/daniel/repos/vqgan-clip/generate.py", line 633, in <module>
        embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
      File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 344, in encode_text
        x = self.transformer(x)
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 199, in forward
        return self.resblocks(x)
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 186, in forward
        x = x + self.attention(self.ln_1(x))
      File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 183, in attention
        return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/activation.py", line 1031, in forward
        attn_output, attn_output_weights = F.multi_head_attention_forward(
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 5082, in multi_head_attention_forward
        attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 4828, in _scaled_dot_product_attention
        attn = softmax(attn, dim=-1)
      File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 1679, in softmax
        ret = input.softmax(dim)
    RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half'
    

    Expected behavior

    No error; generate an output image.

    Additional notes

    • I followed the setup described in the readme (kudos - it's very thorough!)
    • Image generation using my GPU works fine, i.e. without the -cd cpu parameter

    Environment

    Collecting environment information...
    PyTorch version: 1.9.0+cu111
    Is debug build: False
    CUDA used to build PyTorch: 11.1
    ROCM used to build PyTorch: N/A
    
    OS: Ubuntu 20.04.3 LTS (x86_64)
    GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    Clang version: 10.0.0-4ubuntu1 
    CMake version: version 3.16.3
    Libc version: glibc-2.31
    
    Python version: 3.9 (64-bit runtime)
    Python platform: Linux-5.4.0-88-generic-x86_64-with-glibc2.31
    Is CUDA available: True
    CUDA runtime version: 11.4.120
    GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
    Nvidia driver version: 470.57.02
    cuDNN version: Could not collect
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    
    Versions of relevant libraries:
    [pip3] numpy==1.21.2
    [pip3] pytorch-lightning==1.4.9
    [pip3] pytorch-ranger==0.1.1
    [pip3] torch==1.9.0+cu111
    [pip3] torch-optimizer==0.1.0
    [pip3] torchaudio==0.9.0
    [pip3] torchmetrics==0.5.1
    [pip3] torchvision==0.10.0+cu111
    [conda] numpy                     1.21.2                   pypi_0    pypi
    [conda] pytorch-lightning         1.4.9                    pypi_0    pypi
    [conda] pytorch-ranger            0.1.1                    pypi_0    pypi
    [conda] torch                     1.9.0+cu111              pypi_0    pypi
    [conda] torch-optimizer           0.1.0                    pypi_0    pypi
    [conda] torchaudio                0.9.0                    pypi_0    pypi
    [conda] torchmetrics              0.5.1                    pypi_0    pypi
    [conda] torchvision               0.10.0+cu111             pypi_0    pypi
    
    opened by danielthompson 4
  • Make output filename = text prompt by default

    Is there any way to make the program save each final product as prompt.jpg instead of overwriting output.jpg without manually telling it to use a different name at the start?

    opened by jps1226 0
  • custom filename and location for video is giving errors

    The -o flag works properly for image generation, but there is no specific information available on how to create a video with a custom name. Providing a filename with any extension results in the following error:

    ValueError: unknown file extension: .png'

    On Windows we cannot use the zoom.sh script in the conda prompt, so we are using the command:

    python generate.py -p "An apple in a bowl" -zvid -i 2000 -vl 10 -o "output/test.mp4"

    opened by huzaifa207 0
  • requires_grad_ is not supported on ScriptModules

    Using CUDA 11.2; torch was built from source.

    Traceback (most recent call last):
      File "/home/julianallchin/github/VQGAN-CLIP/generate.py", line 548, in <module>
        perceptor = clip.load(args.clip_model, jit=jit)[0].eval().requires_grad_(False).to(device)
      File "/home/julianallchin/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/jit/_script.py", line 915, in fail
        raise RuntimeError(name + " is not supported on ScriptModules")
    RuntimeError: requires_grad_ is not supported on ScriptModules
    
    
    opened by julianallchin 0
  • updating batch creation

    opened by anishone 0
  • I keep getting a traceback trying to make a video

    in line 988 "AttributeError: 'int' object has no attribute 'stdin'"

    ffmpeg command failed - check your installation
      0%|          | 0/5 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "C:\Users\Caleb\VQGAN-CLIP\generate.py", line 988, in <module>
        im.save(p.stdin, 'PNG')
    AttributeError: 'int' object has no attribute 'stdin'

    Maybe I'm trying to make a video the wrong way, but the issue persists even with the provided example of the telephone box.

    opened by constantupgrade 6
  • Improvement: Add cog files

    https://github.com/replicate/cog makes it easy to build Docker containers for machine learning. A cog.yaml has to be configured and the interface code written, which looks pretty straightforward. The project could probably also be added here: https://replicate.ai/explore Anyone who has Docker installed could then run it on their system as easily as executing something like this:

    docker run -d -p 5000:5000 r8.im/nerdyrodent/vqgan-clip@sha256:fe8d040a80609ff5643815e28bc3c488faf8870d968f19e045c4d0e043ffae59
    curl http://localhost:5000/predict -X POST -F p="A painting of an apple in a fruit bowl"
    
    opened by microraptor 0
Owner

Nerdy Rodent - Just a nerdy rodent. I do arty stuff with computers.