v-diffusion-pytorch
v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman).
The models are denoising diffusion probabilistic models (https://arxiv.org/abs/2006.11239), which are trained to reverse a gradual noising process, allowing the models to generate samples from the learned data distributions starting from random noise. DDIM-style deterministic sampling (https://arxiv.org/abs/2010.02502) is also supported. The models are also trained on continuous timesteps. They use the 'v' objective from Progressive Distillation for Fast Sampling of Diffusion Models (https://openreview.net/forum?id=TIdIXIpzhoI).
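For reference, the 'v' target is a fixed linear combination of the clean image and the noise. The snippet below is a minimal sketch of that parameterization; the cosine alpha/sigma schedule and the function names are assumptions for illustration, not this repo's exact code:

```python
import math
import torch

# Sketch of the 'v' parameterization from Progressive Distillation.
# For a continuous timestep t in [0, 1], alpha and sigma are the signal and
# noise rates, with alpha**2 + sigma**2 == 1 (cosine schedule assumed here).

def t_to_alpha_sigma(t):
    return torch.cos(t * math.pi / 2), torch.sin(t * math.pi / 2)

def make_v_target(x0, eps, t):
    """Build the noised input x_t and the training target v from clean x0 and noise eps."""
    alpha, sigma = t_to_alpha_sigma(t)
    x_t = alpha * x0 + sigma * eps   # what the model sees
    v = alpha * eps - sigma * x0     # what the model is trained to predict
    return x_t, v

def v_to_pred(x_t, v, t):
    """Recover the predicted clean image and predicted noise from a v prediction."""
    alpha, sigma = t_to_alpha_sigma(t)
    pred_x0 = alpha * x_t - sigma * v
    pred_eps = sigma * x_t + alpha * v
    return pred_x0, pred_eps
```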
Thank you to stability.ai for compute to train these models!
Dependencies
- PyTorch (installation instructions)
- requests, tqdm (install with pip install requests tqdm)
- CLIP (https://github.com/openai/CLIP), and its additional pip-installable dependencies: ftfy, regex. If you git clone --recursive this repo, it should fetch CLIP automatically.
Model checkpoints:
- CC12M 256x256, SHA-256 63946d1f6a1cb54b823df818c305d90a9c26611e594b5f208795864d5efe0d1f
  A 602M parameter CLIP conditioned model trained on Conceptual 12M for 3.1M steps.
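To confirm that a download completed correctly, you can hash the file and compare against the value above. A minimal sketch; the checkpoint filename and location are assumptions, so adjust them to wherever you saved the file:

```python
import hashlib

EXPECTED = "63946d1f6a1cb54b823df818c305d90a9c26611e594b5f208795864d5efe0d1f"

sha = hashlib.sha256()
with open("checkpoints/cc12m_1.pth", "rb") as f:  # assumed path/filename
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        sha.update(chunk)

if sha.hexdigest() != EXPECTED:
    raise SystemExit("checkpoint hash mismatch, re-download the file")
print("checkpoint OK")
```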
Sampling
Example
If the model checkpoints are stored in checkpoints/, the following will generate an image:

./clip_sample.py "the rise of consciousness" --model cc12m_1 --seed 0

If they are somewhere else, you need to specify the path to the checkpoint with --checkpoint.
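To script several runs, for example sweeping over seeds, one option is to drive the CLI from Python using only the flags documented below. A minimal sketch; the checkpoint path is an assumption:

```python
import subprocess

prompt = "the rise of consciousness"
for seed in range(4):
    subprocess.run(
        ["./clip_sample.py", prompt,
         "--model", "cc12m_1",
         "--checkpoint", "checkpoints/cc12m_1.pth",  # assumed location
         "--seed", str(seed)],
        check=True,  # stop if a run fails
    )
```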
CLIP conditioned/guided sampling
usage: clip_sample.py [-h] [--images [IMAGE ...]] [--batch-size BATCH_SIZE]
[--checkpoint CHECKPOINT] [--clip-guidance-scale CLIP_GUIDANCE_SCALE]
[--device DEVICE] [--eta ETA] [--model {cc12m_1}] [-n N] [--seed SEED]
[--steps STEPS]
[prompts ...]
prompts: the text prompts to use. Relative weights for text prompts can be specified by putting the weight after a colon, for example: "the rise of consciousness:0.5". A prompt-parsing sketch follows this option list.
--batch-size: sample this many images at a time (default 1)
--checkpoint: manually specify the model checkpoint file
--clip-guidance-scale: how strongly the result should match the text prompt (default 500). If set to 0, the cc12m_1 model will still be CLIP conditioned and sampling will go faster and use less memory.
--device: the PyTorch device name to use (default autodetects)
--eta: set to 0 for deterministic (DDIM) sampling, 1 (the default) for stochastic (DDPM) sampling, and in between to interpolate between the two. DDIM is preferred for low numbers of timesteps. A sampling-step sketch follows this option list.
--images: the image prompts to use (local files or HTTP(S) URLs). Relative weights for image prompts can be specified by putting the weight after a colon, for example: "image_1.png:0.5".
--model: specify the model to use (default cc12m_1)
-n: sample until this many images are sampled (default 1)
--seed: specify the random seed (default 0)
--steps: specify the number of diffusion timesteps (default is 1000; it can be lowered for faster but lower quality sampling)
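Both prompts and --images accept an optional ":weight" suffix. The following is an illustrative sketch of how such a string can be split, not the repo's actual parser; splitting on the last colon keeps URL schemes like https:// intact:

```python
def parse_prompt(prompt, default_weight=1.0):
    """Split 'text:weight' or 'path:weight' into (prompt, weight)."""
    text, _, tail = prompt.rpartition(":")
    try:
        return text, float(tail)  # e.g. "the rise of consciousness:0.5"
    except ValueError:
        return prompt, default_weight  # no numeric weight given

print(parse_prompt("the rise of consciousness:0.5"))    # ('the rise of consciousness', 0.5)
print(parse_prompt("https://example.com/image_1.png"))  # no weight, defaults to 1.0
```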
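As noted under --eta above, eta interpolates between DDIM and DDPM updates. Below is a minimal sketch of a single reverse step under the alpha/sigma parameterization; the cosine schedule is an assumption, and this is not the repo's exact sampler:

```python
import math
import torch

def sampling_step(x, v, t, t_next, eta):
    """One reverse diffusion step given the model's v output at time t.

    eta == 0 is a deterministic DDIM step, eta == 1 is a stochastic DDPM-style
    step, and values in between interpolate. t and t_next are tensors in
    [0, 1] with t_next < t.
    """
    alpha, sigma = torch.cos(t * math.pi / 2), torch.sin(t * math.pi / 2)
    alpha_next, sigma_next = torch.cos(t_next * math.pi / 2), torch.sin(t_next * math.pi / 2)

    # Convert the v prediction into a predicted clean image and predicted noise.
    pred_x0 = alpha * x - sigma * v
    pred_eps = sigma * x + alpha * v

    # How much fresh noise to inject this step; zero when eta == 0 (pure DDIM).
    ddim_sigma = eta * (sigma_next / sigma) * torch.sqrt(1 - alpha**2 / alpha_next**2)
    adjusted_sigma = torch.sqrt(sigma_next**2 - ddim_sigma**2)

    x_next = alpha_next * pred_x0 + adjusted_sigma * pred_eps
    if eta > 0:
        x_next = x_next + ddim_sigma * torch.randn_like(x)
    return x_next
```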