Augmented CLIP - Training simple models to predict CLIP image embeddings from text embeddings, and vice versa.

Peter Baylies

Last update: Sep 13, 2022

Overview

Train aug_clip against the laion400m-embeddings found here: https://laion.ai/laion-400-open-dataset/. Note that these embeddings come from the base ViT-B/32 CLIP model.
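As a rough illustration of what "training a simple model to predict CLIP image embeddings from text embeddings, and vice versa" can look like, here is a minimal sketch that fits a linear map in each direction with least squares. Everything in it is an assumption made for illustration: the paired embeddings are synthetic stand-ins rather than the real laion400m-embeddings files, the linear least-squares model is just one possible "simple model" and may not match what aug_clip actually uses, and the helper names (l2_normalize, fit_linear_map, mean_cosine) are hypothetical. The only detail taken from CLIP itself is the 512-dimensional embedding size of ViT-B/32.

```python
import numpy as np

EMBED_DIM = 512  # ViT-B/32 CLIP embeddings are 512-dimensional


def l2_normalize(x):
    """Scale each row to unit length, as is common for CLIP embeddings."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def fit_linear_map(src, dst):
    """Least-squares fit of W such that src @ W approximates dst."""
    W, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return W


def mean_cosine(a, b):
    """Mean cosine similarity between corresponding rows of a and b."""
    return float(np.mean(np.sum(l2_normalize(a) * l2_normalize(b), axis=1)))


# Synthetic stand-in data (assumption): real use would load paired text and
# image embedding arrays from the laion400m-embeddings files instead.
rng = np.random.default_rng(0)
n_pairs = 2000
text_emb = l2_normalize(rng.normal(size=(n_pairs, EMBED_DIM)))
hidden_map = rng.normal(size=(EMBED_DIM, EMBED_DIM)) / np.sqrt(EMBED_DIM)
noise = 0.01 * rng.normal(size=(n_pairs, EMBED_DIM))
image_emb = l2_normalize(text_emb @ hidden_map + noise)

# Hold out the last 200 pairs for evaluation.
train_t, test_t = text_emb[:1800], text_emb[1800:]
train_i, test_i = image_emb[:1800], image_emb[1800:]

text_to_image = fit_linear_map(train_t, train_i)  # text -> image
image_to_text = fit_linear_map(train_i, train_t)  # image -> text ("vice versa")

# Compare predicted image embeddings with the raw text embeddings as a baseline.
pred_image = test_t @ text_to_image
baseline = mean_cosine(test_t, test_i)
score = mean_cosine(pred_image, test_i)
print(f"cosine(text, image) = {baseline:.3f}")
print(f"cosine(predicted image, image) = {score:.3f}")
```

On real data the relationship between text and image embeddings is unlikely to be this cleanly linear, so a small MLP or a regularized fit may be a more realistic choice; the held-out cosine similarity check carries over either way.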

Sample notebook adapted from Sadnow's 360Diffusion repo, thanks to all involved!

Latest revision: Beta 1.52 (10/11/21): https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_Public.ipynb

Latest highlights: full compatibility with both the 256 and 512 diffusion models for upscaling to 256, 512, 1024, 2048, and 4096 px.

Note that 4096 px outputs aren’t quite as pretty as 2048 px ones, and the files are massive. 2048 px is appealing in most cases. If you intend to upscale to anything higher than 1024 px, I recommend using the 512 diffusion model found in the settings.

Credits & Acknowledgements

Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings): Founder of OG Diffusion Notebook. Original notebook founder; [I think] has a large involvement in both VQGAN and Diffusion!
Daniel Russell (https://github.com/russelldc, https://twitter.com/danielrussruss): Fast Diffusion Fork Founder. Made the OG Fast Diffusion notebook.
Dango233 and nsheppard: Contributed to Daniel’s Fast Diffusion Notebook.
Sadnow (twitter.com/sadly_existent): 360Diffusion Fork Founder. Forked Daniel Russell’s Fast Diffusion Notebook to include Real-ESRGAN integration.
airguitararchon (steven): Init Research.
Everyone else on the VQLIPSE Discord (https://www.patreon.com/sportsracer48): Support & Research.

Prior release(s): Implemented Daniel Russ’s Perlin revisions, fixed init_bug, 4096 double-pass, VRAM fixes, practical debug_mode (set to higher skip_timestep)

All edits & additions are welcome and appreciated~

Comments

Cool idea, how have your results been?

Sorry for posting an issue for this but was really curious about what the results are like :)

I tried out your version to compare against a non-augmented embedding, without diffusion/VQGAN, to see the "raw" results, but I find it's not directly obvious with these things what works "better".

opened by samedii
