A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for text-guided image generation.

Overview

StyleGAN3 CLIP-based guidance

Open in Colab

This is an edited version of a notebook created by nshepperd (https://github.com/nshepperd).


to-dos:

  • Add inversion
  • Add model mixins

This notebook uses work by Katherine Crowson (https://github.com/crowsonkb).

StyleGAN3 was created by NVIDIA. Here is the original repo.

CLIP (Contrastive Language-Image Pre-Training) is a model made by OpenAI. For more information head over here.
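
At a high level, the notebook optimizes a StyleGAN3 latent so that the generated image matches a text prompt according to CLIP. The snippet below is a minimal sketch of that loop, assuming a pretrained StyleGAN3 generator `G` (loaded from one of NVIDIA's network pickles) and OpenAI's `clip` package; the prompt, learning rate, step count, and preprocessing are illustrative rather than the notebook's exact settings.

```python
# Minimal sketch of CLIP-guided latent optimization (illustrative, not the
# notebook's exact implementation). Assumes `G` is a pretrained StyleGAN3
# generator loaded from one of NVIDIA's network pickles.
import torch
import torch.nn.functional as F
import clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'
clip_model, _ = clip.load('ViT-B/32', device=device)

# Encode the text prompt once.
tokens = clip.tokenize(['a watercolor painting of a mountain lake']).to(device)
with torch.no_grad():
    text_features = clip_model.encode_text(tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Start from the average latent and optimize it directly
# (c=None works for unconditional models).
z = torch.randn(64, G.z_dim, device=device)
w = G.mapping(z, None).mean(dim=0, keepdim=True).detach().requires_grad_(True)
opt = torch.optim.Adam([w], lr=0.05)

for step in range(300):
    img = G.synthesis(w)                       # NCHW image in [-1, 1]
    img = (img + 1) / 2                        # to [0, 1]; CLIP normalization omitted for brevity
    img = F.interpolate(img, size=(224, 224), mode='bilinear', align_corners=False)
    image_features = clip_model.encode_image(img)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    loss = (1 - (image_features * text_features).sum(dim=-1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Real CLIP-guidance setups typically also apply CLIP's input normalization and random crops/augmentations before encoding, which tends to stabilize the guidance.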

Feel free to suggest any changes! If anyone has an idea of what license this repo should use, please let me know.

Comments
  • Add Docker environment & Replicate demo

    Hey @ouhenio ! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model more easily! View it here: https://replicate.ai/ouhenio/stylegan3-clip

    At the moment we have capped the inference time on the web at 5 minutes, so models like FFHQ and MetFaces may get < 100 iterations, whereas AFHQv2 is a lot faster and can handle 300 iterations. They are so fun to play with! We support progressive output of the intermediate generated images as well as the final video or image output.

    Do claim your page here, so you can own the page, customise the Example gallery as you like, and we'll feature it on our website and tweet about it too.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    enhancement 
    opened by chenxwh 6
  • feat: Add inversion notebook

    The general idea of this feature is to combine inversion with subsequent CLIP-guided generation: it should allow the user to project an image into the latent space, so it can later be used as the starting point for prompt guidance.

    Adding inversion to SG3 is very straightforward (see this). What could be trickier is calibrating the subsequent CLIP optimization, since it will probably push the generation away from the latent point obtained from the projected image.

    One simple solution to this problem could be to linearly interpolate between the inversion loss and the CLIP loss, so that the projected image's features don't disappear entirely (a rough sketch of such a combined loss is included after the comments section).

    enhancement 
    opened by ouhenio 1
  • Add Landscapes pretrained model and improve saving options

    Pending stuff:

    • [x] Fix a bug where saving a renamed imgs.tar file causes an error when creating a video (because it renames the imgs folder).
    • [x] Add changes to the inversion notebook.
    • [ ] Update replicate script.
    opened by ouhenio 0
  • feat: Add option to optimize over w+

    Currently, the optimization occurs over the W space. Optimizing over W+ gives better results in the inversion task, so it is reasonable to assume it could also improve the results during CLIP guidance, since it allows for more expressive generation.

    It would be ideal to let the user choose whether the optimization occurs over W or W+ (a sketch of one way to expose that choice is included after the comments section).

    enhancement 
    opened by ouhenio 0
  • model confuse

    Hi, I am a little confused about the model. Would you please help me? Here are my questions: 1. In the model selection section, is the model StyleGAN3 or CLIP? 2. If I want to transfer the style of an image with StyleGAN3_CLIP, do I have to pretrain CLIP first?

    Thank you so much

    opened by snow1929 4
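
For the inversion issue above, one plain reading of "linearly interpolate between the inversion loss and CLIP loss" is a weighted sum of a reconstruction term and a CLIP-similarity term. The sketch below only illustrates that reading; `image_embed_fn`, the MSE reconstruction term, and `alpha` are assumptions, not the repo's implementation.

```python
import torch.nn.functional as F

def guided_loss(generated, target_image, image_embed_fn, text_embedding, alpha=0.5):
    # `image_embed_fn` is a hypothetical helper that resizes/normalizes the image
    # and returns a unit-norm CLIP embedding; `alpha` trades reconstruction
    # fidelity against prompt adherence.
    inversion_loss = F.mse_loss(generated, target_image)    # stay near the projected image
    image_embedding = image_embed_fn(generated)
    clip_loss = (1 - (image_embedding * text_embedding).sum(dim=-1)).mean()  # move toward the prompt
    # Linear interpolation between the two objectives.
    return alpha * inversion_loss + (1 - alpha) * clip_loss
```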
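For the W vs. W+ issue: W optimizes a single latent vector shared by every synthesis layer (512-dimensional in the standard configs), while W+ gives each layer its own latent, which is more expressive but can drift further from the natural image manifold. Below is a hypothetical way to expose the choice; the function names and the flag are illustrative, not the repo's API.

```python
import torch

def init_latent(G, device, use_w_plus=False, n_samples=32):
    # Map random z vectors through the mapping network and average them,
    # giving a rough estimate of the mean w as a starting point.
    z = torch.randn(n_samples, G.z_dim, device=device)
    w = G.mapping(z, None).mean(dim=0, keepdim=True)   # shape: [1, num_ws, w_dim]
    if not use_w_plus:
        # W space: keep a single shared vector; it is broadcast at synthesis time.
        w = w[:, :1, :].clone()
    return w.detach().requires_grad_(True)

def synthesize(G, w):
    # Broadcast a shared w to every layer when optimizing over W.
    if w.shape[1] == 1:
        w = w.repeat(1, G.num_ws, 1)
    return G.synthesis(w)
```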
Owner
Eugenio Herrera
Data Scientist, Full-Stack Engineer and aspiring Researcher.