A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for text-guided image generation.

Overview

StyleGAN3 CLIP-based guidance

Open in Colab

This is an edited version of a notebook created by nshepperd (https://github.com/nshepperd).


to-dos:

  • Add inversion
  • Add model mixins

This notebook uses work by Katherine Crowson (https://github.com/crowsonkb).

StyleGAN3 was created by NVIDIA. Here is the original repo.

CLIP (Contrastive Language-Image Pre-Training) is a model made by OpenAI. For more information head over here.
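
At a high level, the notebook optimizes a StyleGAN3 latent so that the generated image matches a text prompt according to CLIP. The snippet below is a minimal sketch of that loop, assuming a pretrained StyleGAN3 generator `G` (loaded from one of NVIDIA's network pickles) and OpenAI's `clip` package; the prompt, learning rate, step count, and preprocessing are illustrative rather than the notebook's exact settings.

```python
# Minimal sketch of CLIP-guided latent optimization (illustrative, not the
# notebook's exact implementation). Assumes `G` is a pretrained StyleGAN3
# generator loaded from one of NVIDIA's network pickles.
import torch
import torch.nn.functional as F
import clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'
clip_model, _ = clip.load('ViT-B/32', device=device)

# Encode the text prompt once.
tokens = clip.tokenize(['a watercolor painting of a mountain lake']).to(device)
with torch.no_grad():
    text_features = clip_model.encode_text(tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Start from the average latent and optimize it directly
# (c=None works for unconditional models).
z = torch.randn(64, G.z_dim, device=device)
w = G.mapping(z, None).mean(dim=0, keepdim=True).detach().requires_grad_(True)
opt = torch.optim.Adam([w], lr=0.05)

for step in range(300):
    img = G.synthesis(w)                       # NCHW image in [-1, 1]
    img = (img + 1) / 2                        # to [0, 1]; CLIP normalization omitted for brevity
    img = F.interpolate(img, size=(224, 224), mode='bilinear', align_corners=False)
    image_features = clip_model.encode_image(img)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    loss = (1 - (image_features * text_features).sum(dim=-1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Real CLIP-guidance setups typically also apply CLIP's input normalization and random crops/augmentations before encoding, which tends to stabilize the guidance.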

Feel free to suggest any changes! If anyone has an idea of what license this repo should use, please let me know.

Comments
  • Add Docker environment & Replicate demo

    Hey @ouhenio ! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model more easily! View it here: https://replicate.ai/ouhenio/stylegan3-clip

    At the moment we have capped the inference time on the web at 5 minutes, so models like FFHQ and MetFaces may get < 100 iterations, whereas AFHQv2 is a lot faster and can handle 300 iterations. They are so fun to play with! We support progressive output of the intermediate generated images as well as the final video or image output.

    Do claim your page here, so you can own the page, customise the Example gallery as you like, and we'll feature it on our website and tweet about it too.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    enhancement 
    opened by chenxwh 6
  • feat: Add inversion notebook

    The general idea of this feature is to combine inversion with subsequent CLIP-guided generation: it should allow the user to project an image into the latent space, so it can later be used as the starting point for prompt guidance.

    Adding inversion to SG3 is very straightforward (see this). What could be trickier is calibrating the subsequent CLIP optimization, since it will probably push the generation away from the latent point obtained from the projected image.

    One simple solution to this problem could be to linearly interpolate between the inversion loss and the CLIP loss, so that the projected image's features don't disappear entirely (a rough sketch of such a combined loss is included after the comments section).

    enhancement 
    opened by ouhenio 1
  • Add Landscapes pretrained model and improve saving options

    Pending stuff:

    • [x] Fix a bug where saving a renamed imgs.tar file causes an error when creating a video (because it renames the imgs folder).
    • [x] Add changes to the inversion notebook.
    • [ ] Update replicate script.
    opened by ouhenio 0
  • feat: Add option to optimize over w+

    Currently, the optimization occurs over the W space. Optimizing over W+ gives better results in the inversion task, so it is reasonable to assume it could also improve the results during CLIP guidance, since it allows for more expressive generation.

    It would be ideal to let the user choose whether the optimization occurs over W or W+ (a sketch of one way to expose that choice is included after the comments section).

    enhancement 
    opened by ouhenio 0
  • model confuse

    Hi, I am a little confused about the model. Would you please help me? Here are my questions: 1. In the model selection section, is the model StyleGAN3 or CLIP? 2. If I want to transfer the style of an image with StyleGAN3_CLIP, do I have to pretrain CLIP first?

    Thank you so much

    opened by snow1929 4
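
For the inversion issue above, one plain reading of "linearly interpolate between the inversion loss and CLIP loss" is a weighted sum of a reconstruction term and a CLIP-similarity term. The sketch below only illustrates that reading; `image_embed_fn`, the MSE reconstruction term, and `alpha` are assumptions, not the repo's implementation.

```python
import torch.nn.functional as F

def guided_loss(generated, target_image, image_embed_fn, text_embedding, alpha=0.5):
    # `image_embed_fn` is a hypothetical helper that resizes/normalizes the image
    # and returns a unit-norm CLIP embedding; `alpha` trades reconstruction
    # fidelity against prompt adherence.
    inversion_loss = F.mse_loss(generated, target_image)    # stay near the projected image
    image_embedding = image_embed_fn(generated)
    clip_loss = (1 - (image_embedding * text_embedding).sum(dim=-1)).mean()  # move toward the prompt
    # Linear interpolation between the two objectives.
    return alpha * inversion_loss + (1 - alpha) * clip_loss
```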
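For the W vs. W+ issue: W optimizes a single latent vector shared by every synthesis layer (512-dimensional in the standard configs), while W+ gives each layer its own latent, which is more expressive but can drift further from the natural image manifold. Below is a hypothetical way to expose the choice; the function names and the flag are illustrative, not the repo's API.

```python
import torch

def init_latent(G, device, use_w_plus=False, n_samples=32):
    # Map random z vectors through the mapping network and average them,
    # giving a rough estimate of the mean w as a starting point.
    z = torch.randn(n_samples, G.z_dim, device=device)
    w = G.mapping(z, None).mean(dim=0, keepdim=True)   # shape: [1, num_ws, w_dim]
    if not use_w_plus:
        # W space: keep a single shared vector; it is broadcast at synthesis time.
        w = w[:, :1, :].clone()
    return w.detach().requires_grad_(True)

def synthesize(G, w):
    # Broadcast a shared w to every layer when optimizing over W.
    if w.shape[1] == 1:
        w = w.repeat(1, G.num_ws, 1)
    return G.synthesis(w)
```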
Owner
Eugenio Herrera
Data Scientist, Full-Stack Engineer and aspiring Researcher.