CLIP Guided Diffusion
From RiversHaveWings.
Generate vibrant and detailed images using only text.
See captions and more generations in the Gallery
See also - VQGAN-CLIP
This code is currently under active development and is subject to frequent changes. Please file an issue if you have any constructive feedback, questions, or issues with the code or colab notebook.
Windows user? Please file an issue if you have any issues with the code. I have no way to test that platform currently but would like to try.
Install
git clone https://github.com/afiaka87/clip-guided-diffusion.git && cd clip-guided-diffusion
git clone https://github.com/afiaka87/guided-diffusion.git
pip3 install -e guided-diffusion
python3 setup.py install
Run
cgd -txt "Alien friend by Odilon Redo"
A gif of the full run will be saved to ./outputs/caption_{j}.gif by default.
The file current.png can be refreshed to see the current image. Intermediate outputs are saved to ./outputs by default.
Respective guided-diffusion checkpoints from OpenAI will be downloaded to ~/.cache/clip-guided-diffusion/ by default.
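The output and checkpoint locations can be overridden with the --prefix / -dir and --checkpoints_dir / -ckpts flags documented under Full Usage below; a hedged example (the paths are placeholders):
cgd -txt "Alien friend by Odilon Redon" -dir my_outputs -ckpts /path/to/checkpoints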
Usage - CLI
Text to image generation
--prompt / -txt and --image_size / -size
cgd --image_size 256 --prompt "32K HUHD Mushroom"
Run on a CPU
- Using a CPU can take a very long time compared to using a CUDA GPU. In many cases it won't be feasible to complete a full generation.
- If you have a relatively recent CPU, you can run the following command to generate a single image in 30 minutes to several hours, depending on your CPU.
- Note: in order to decrease runtime significantly, this uses "ddim50", the "cosine" scheduler and the 64x64 checkpoint. Generations may be somewhat underwhelming. Increase -respace or -size at your own risk.
cgd --device cpu --prompt "You can use the short options too." -cutn 8 -size 64 -cgs 5 -tvs 0.00001 -respace "ddim50" -clip "ViT-B/32"
CUDA GPU
cgd --prompt "Theres no need to specify a device, it will be chosen automatically" -cutn 32 -size 256
Iterations/Steps (Timestep Respacing)
--timestep_respacing or -respace (default: 1000)
- Use fewer timesteps over the same diffusion schedule. Sacrifices accuracy/alignment for improved speed.
- options: 25, 50, 150, 250, 500, 1000, ddim25, ddim50, ddim150, ddim250, ddim500, ddim1000
cgd -respace 'ddim50' -txt "cat painting"
Penalize a text prompt as well
- The loss for prompt_min is weighted by --min_weight / -min_wt (default: 0.1).
cgd -txt "32K HUHD Mushroom" -min "green grass"
Existing image
--init_image / -init and --skip_timesteps / -skip
- Blend an image with the diffusion for a number of steps. --skip_timesteps / -skip is the number of timesteps to spend blending.
- -skip should be about halfway through the diffusion schedule, i.e. about half of -respace. For example: -respace 1000 -skip 500, -respace 250 -skip 125, etc.
- You must supply both --init_image and --skip_timesteps when supplying an initial image.
cgd -respace "250" -txt "A mushroom in the style of Vincent Van Gogh" \
--init_image "images/32K_HUHD_Mushroom.png" \
--skip_timesteps 125
Image size
Increasing -size has drastic impacts on performance. 128 is used by default.
- options: 64, 128, 256, 512 pixels (square)
- --clip_guidance_scale and --tv_scale will require experimentation.
- Note about 64x64: when using the 64x64 checkpoint, the cosine noise scheduler is used. For unclear reasons, this noise scheduler requires different values for --clip_guidance_scale and --tv_scale. I recommend starting with -cgs 5 -tvs 0.00001 and experimenting from around there (see the example below).
- For all other checkpoints, clip_guidance_scale seems to work well around 1000-2000 and tv_scale at 0, 100, 150 or 200.
cgd --init_image=images/32K_HUHD_Mushroom.png \
--skip_timesteps=500 \
--image_size 64 \
--prompt "8K HUHD Mushroom"
resized to 128 pixels for visibility
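To start from the recommended scales for the 64x64 checkpoint, a hedged variant of the command above:
cgd --image_size 64 -cgs 5 -tvs 0.00001 --prompt "8K HUHD Mushroom"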
cgd --image_size 512 --prompt "8K HUHD Mushroom"
resized to 320 pixels for formatting
Usage - Python
# Initialize diffusion generator
from pathlib import Path

from cgd import clip_guided_diffusion
import cgd_util
import kornia.augmentation as K

prompt = "An image of a fox in a forest."
# These two are referenced below but were not defined in the snippet; the defaults mirror the CLI.
prefix_path = Path("outputs")
save_frequency = 1

# Pass in your own augmentations (supports torchvision.transforms/kornia.augmentation)
# (defaults to no augmentations, which is likely best unless you're doing something special)
aug_list = [
    K.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=0.1),
    K.RandomMotionBlur(kernel_size=(1, 5), angle=15, direction=0.5),
    K.RandomHorizontalFlip(p=0.5),
]

# Remove non-alphanumeric and whitespace characters from prompt and prompt_min for directory name
outputs_path = cgd_util.txt_to_dir(base_path=prefix_path, txt=prompt)
outputs_path.mkdir(parents=True, exist_ok=True)

# `cgd_samples` is a generator that yields the output images
cgd_samples = clip_guided_diffusion(prompt=prompt, prefix=outputs_path, augs=aug_list)

# Image paths will all be in `all_images` for e.g. video generation at the end.
all_images = []
for step, output_path in enumerate(cgd_samples):
    if step % save_frequency == 0:
        print(f"Saving image {step} to {output_path}")
        all_images.append(output_path)
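To stitch the collected paths into a gif at the end, a minimal sketch assuming Pillow is installed (the CLI already writes its own gif; this is only for the Python workflow):
from PIL import Image

frames = [Image.open(p).convert("RGB") for p in all_images]
frames[0].save(
    outputs_path / "run.gif",
    save_all=True,
    append_images=frames[1:],
    duration=100,  # milliseconds per frame
    loop=0,
)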
Full Usage:
--prompt_min PROMPT_MIN, -min PROMPT_MIN
the prompt to penalize (default: )
--min_weight MIN_WEIGHT, -min_wt MIN_WEIGHT
Weight for the penalized prompt's loss (default: 0.1)
--image_size IMAGE_SIZE, -size IMAGE_SIZE
Diffusion image size. Must be one of [64, 128, 256, 512]. (default: 128)
--init_image INIT_IMAGE, -init INIT_IMAGE
Blend an image with diffusion for n steps (default: )
--skip_timesteps SKIP_TIMESTEPS, -skip SKIP_TIMESTEPS
Number of timesteps to blend image for. CLIP guidance occurs after this. (default: 0)
--prefix PREFIX, -dir PREFIX
output directory (default: outputs)
--checkpoints_dir CHECKPOINTS_DIR, -ckpts CHECKPOINTS_DIR
Path subdirectory containing checkpoints. (default: /home/samsepiol/.cache/clip-guided-diffusion)
--batch_size BATCH_SIZE, -bs BATCH_SIZE
the batch size (default: 1)
--clip_guidance_scale CLIP_GUIDANCE_SCALE, -cgs CLIP_GUIDANCE_SCALE
Scale for CLIP spherical distance loss. Values will need tinkering for different settings. (default: 1000)
--tv_scale TV_SCALE, -tvs TV_SCALE
Scale for denoising loss (default: 100)
--seed SEED, -seed SEED
Random number seed (default: 0)
--save_frequency SAVE_FREQUENCY, -freq SAVE_FREQUENCY
Save frequency (default: 1)
--diffusion_steps DIFFUSION_STEPS, -steps DIFFUSION_STEPS
Diffusion steps (default: 1000)
--timestep_respacing TIMESTEP_RESPACING, -respace TIMESTEP_RESPACING
Timestep respacing (default: 1000)
--num_cutouts NUM_CUTOUTS, -cutn NUM_CUTOUTS
Number of randomly cut patches to distort from diffusion. (default: 16)
--cutout_power CUTOUT_POWER, -cutpow CUTOUT_POWER
Cutout size power (default: 0.5)
--clip_model CLIP_MODEL, -clip CLIP_MODEL
clip model name. Should be one of: ('ViT-B/16', 'ViT-B/32', 'RN50', 'RN101', 'RN50x4', 'RN50x16') (default: ViT-B/32)
--uncond, -uncond Use finetuned unconditional checkpoints from OpenAI (256px) and Katherine Crowson (512px) (default: False)
--noise_schedule NOISE_SCHEDULE, -sched NOISE_SCHEDULE
Specify noise schedule. Either 'linear' or 'cosine'. (default: linear)
--dropout DROPOUT, -drop DROPOUT
Amount of dropout to apply. (default: 0.0)
--device DEVICE, -dev DEVICE
Device to use. Either cpu or cuda. (default: )
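These flags compose; a hedged example of a reproducible 256px run that saves every 5th intermediate, using only flags from the table above:
cgd -txt "A surrealist painting of a cat" -size 256 -cutn 32 -seed 42 -freq 5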
Development
git clone https://github.com/afiaka87/clip-guided-diffusion.git
cd clip-guided-diffusion
git clone https://github.com/afiaka87/guided-diffusion.git
python3 -m venv cgd_venv
source cgd_venv/bin/activate
pip install -r requirements.txt
pip install -e guided-diffusion
Run integration tests
- Some tests require a GPU; you may ignore them if you don't have one.
python -m unittest discover
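To run a single test module rather than the whole suite (the module name below is hypothetical; substitute one from the repository's tests):
python -m unittest test_cgd -v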