Official implementation of VQ-Diffusion

Vector Quantized Diffusion Model for Text-to-Image Synthesis

Overview

This is the official repo for the paper "Vector Quantized Diffusion Model for Text-to-Image Synthesis".

VQ-Diffusion is based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). It produces significantly better text-to-image generation results than autoregressive models with a similar number of parameters. Compared with previous GAN-based methods, VQ-Diffusion can handle more complex scenes and improves the synthesized image quality by a large margin.

Our code and models are ready; however, they are still under company review. We plan to release them in December.

Framework

Samples

More Samples

Comments
  • Progress toward Google Colab with Inference Error

    I created the following Colab notebook but I am getting an error during inference. https://colab.research.google.com/drive/15JABpusfx_vk32GXDFHSiB9Y-CXJowe5#scrollTo=ox_nqiA6MMno

    Here is the error from the line starting with VQ_Diffusion_model = ....


    ImportError Traceback (most recent call last)

    in <module>()
          1 from inference_VQ_Diffusion import VQ_Diffusion
    ----> 2 VQ_Diffusion_model = VQ_Diffusion(config='OUTPUT/pretrained_model/config_imagenet.yaml', path='OUTPUT/pretrained_model/imagenet_pretrained.pth')
          3 VQ_Diffusion_model.inference_generate_sample_with_condition("a huge white stone castle in a meadow painted by Rene Magritte", truncation_rate=0.85, save_root="RESULT", batch_size=4)
          4 VQ_Diffusion_model.inference_generate_sample_with_condition("a woman in a dark red dress painted by Norman Rockwell", truncation_rate=0.85, save_root="RESULT", batch_size=4, fast=2)  # for fast inference

    24 frames

    /usr/local/lib/python3.7/dist-packages/torchtext/vocab/vocab_factory.py in <module>()
          2 from typing import Dict, Iterable, Optional, List
          3 from collections import Counter, OrderedDict
    ----> 4 from torchtext._torchtext import (
          5     Vocab as VocabPybind,
          6 )

    ImportError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZTVN5torch3jit6MethodE
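
    The undefined C++ symbol usually means the torchtext wheel installed in the Colab runtime was built against a different torch release than the one actually installed. A quick diagnostic sketch (an assumption about the cause, not a fix from the maintainers) is to compare the two package versions without importing torchtext, since the import itself is what fails:

        # Read the installed versions from package metadata; torch and torchtext
        # are only compatible when they come from matching release pairs.
        import pkg_resources

        print("torch:", pkg_resources.get_distribution("torch").version)
        print("torchtext:", pkg_resources.get_distribution("torchtext").version)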

    opened by metaphorz 4
  • Add link to an easy-to-use diffusers implementation

    VQ Diffusion is now easily usable with the diffusers library. Could it make sense to add some code to the README that shows how to run the model easily?
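
    For reference, a minimal sketch of what such a README snippet could look like, using the VQDiffusionPipeline from diffusers together with the microsoft/vq-diffusion-ithq checkpoint on the Hugging Face Hub; the prompt is borrowed from the Colab issue above, and the truncation_rate value mirrors this repo's inference scripts rather than anything prescribed by the maintainers:

        # Text-to-image sampling with the diffusers port of VQ-Diffusion.
        from diffusers import VQDiffusionPipeline

        pipe = VQDiffusionPipeline.from_pretrained("microsoft/vq-diffusion-ithq")
        pipe = pipe.to("cuda")

        # truncation_rate plays the same role as in this repo's inference code.
        image = pipe(
            "a huge white stone castle in a meadow painted by Rene Magritte",
            truncation_rate=0.85,
        ).images[0]
        image.save("castle.png")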

    opened by patrickvonplaten 2
  • Some parameters don't receive gradients.

    Hello, when I run the training command on COCO, I encounter the following error:

    RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.

    Then I found that the parameters in "module.content_codec" and "module.transformer.condition_emb" do not receive gradients. Should we set find_unused_parameters=True in DDP?
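
    For anyone hitting the same error, here is a minimal, self-contained sketch of the find_unused_parameters=True workaround being asked about. The toy module and the single-process gloo group are only there to make the snippet runnable; whether the maintainers prefer this flag over keeping the unused branches out of the DDP module is exactly the open question:

        import os
        import torch
        import torch.distributed as dist
        from torch.nn.parallel import DistributedDataParallel as DDP

        class Toy(torch.nn.Module):
            """Stand-in for the real model: one branch is never used in forward(),
            mimicking module.content_codec / module.transformer.condition_emb."""
            def __init__(self):
                super().__init__()
                self.used = torch.nn.Linear(8, 8)
                self.unused = torch.nn.Linear(8, 8)  # receives no gradient

            def forward(self, x):
                return self.used(x)

        if __name__ == "__main__":
            os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
            os.environ.setdefault("MASTER_PORT", "29500")
            dist.init_process_group("gloo", rank=0, world_size=1)

            # find_unused_parameters=True tells DDP to tolerate parameters that do
            # not participate in producing the loss, which is what the error suggests.
            model = DDP(Toy(), find_unused_parameters=True)
            loss = model(torch.randn(4, 8)).sum()
            loss.backward()
            dist.destroy_process_group()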

    opened by guyuchao 2
  • Cannot download the pretrained model

    After more than 2 hours of waiting, my download of the pretrained (.pth) model failed. Ouch. Is there a way you can put the models on a faster, more reliable server? I have a fast internet connection on this computer (200 Mbps). (Screenshot attached.)

    opened by metaphorz 2
  • Filter Ratio when Sample?

    Terrific work, and many thanks for sharing your code! I am a little confused about the filter ratio [0.0, 0.5, 1.0] used when sampling images. I empirically find that 0.5 performs better. What does this parameter control? Is it mentioned in the paper?

    opened by sunyasheng 2
  • This repo is missing important files

    There are important files that all Microsoft projects should have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    Merge this pull request

    opened by microsoft-github-policy-service[bot] 1
  • File "taming_f8_8192_openimages_last.pth" not found?

    Thanks for your code. But when running:

    from inference_VQ_Diffusion import VQ_Diffusion
    VQ_Diffusion_model = VQ_Diffusion(config='OUTPUT/pretrained_model/config_text.yaml', path='OUTPUT/pretrained_model/coco_learnable.pth')

    an error occurred:

    FileNotFoundError: [Errno 2] No such file or directory: 'OUTPUT/pretrained_model/taming_dvae/taming_f8_8192_openimages_last.pth'

    However, it seems that this file is not provided. I'll be grateful for any response.

    The whole error traceback is copied below:

    Traceback (most recent call last):
      File "inference_VQ_Diffusion.py", line 152, in <module>
        path='OUTPUT/pretrained_model/cub_pretrained.pth')
      File "inference_VQ_Diffusion.py", line 25, in __init__
        self.info = self.get_model(ema=True, model_path=path, config_path=config, imagenet_cf=imagenet_cf)
      File "inference_VQ_Diffusion.py", line 45, in get_model
        model = build_model(config)
      File "/root/VQ-Diffusion/image_synthesis/modeling/build.py", line 5, in build_model
        return instantiate_from_config(config['model'])
      File "/root/VQ-Diffusion/image_synthesis/utils/misc.py", line 132, in instantiate_from_config
        return cls(**config.get("params", dict()))
      File "/root/VQ-Diffusion/image_synthesis/modeling/models/dalle.py", line 35, in __init__
        self.content_codec = instantiate_from_config(content_codec_config)
      File "/root/VQ-Diffusion/image_synthesis/utils/misc.py", line 132, in instantiate_from_config
        return cls(**config.get("params", dict()))
      File "/root/VQ-Diffusion/image_synthesis/modeling/codecs/image_codec/taming_gumbel_vqvae.py", line 225, in __init__
        model = self.LoadModel(config_path, ckpt_path)
      File "/root/VQ-Diffusion/image_synthesis/modeling/codecs/image_codec/taming_gumbel_vqvae.py", line 248, in LoadModel
        sd = torch.load(ckpt_path, map_location="cpu")["state_dict"]
      File "/root/miniconda3/envs/vqdm/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "/root/miniconda3/envs/vqdm/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "/root/miniconda3/envs/vqdm/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'OUTPUT/pretrained_model/taming_dvae/taming_f8_8192_openimages_last.pth'
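
    The path in the error suggests that the Gumbel VQGAN codec checkpoint is expected to be downloaded separately and placed under OUTPUT/pretrained_model/taming_dvae/ before the model can be built. A small sanity-check sketch along those lines (the directory layout is inferred from the traceback, not an official instruction):

        import os

        # Path the taming_gumbel_vqvae codec loader tries to open, per the traceback.
        ckpt_path = 'OUTPUT/pretrained_model/taming_dvae/taming_f8_8192_openimages_last.pth'

        if not os.path.isfile(ckpt_path):
            raise FileNotFoundError(
                f"Codec checkpoint missing: {ckpt_path}. "
                "Download the f8/8192 OpenImages VQGAN checkpoint and place it here "
                "before constructing VQ_Diffusion."
            )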

    opened by 1069066484 1
  • Adding Microsoft SECURITY.MD

    Please accept this contribution adding the standard Microsoft SECURITY.MD file to help the community understand the security policy and how to safely report security issues. GitHub uses the presence of this file to light up security reminders and a link to the file. This pull request commits the latest official SECURITY.MD file from https://github.com/microsoft/repo-templates/blob/main/shared/SECURITY.md.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    opened by microsoft-github-policy-service[bot] 0
  • About unconditional synthesis on FFHQ.

    Hi Authors,

    Thanks for sharing this nice work! I am trying to reproduce the results of unconditional synthesis on the FFHQ dataset. Compared to the other tasks, however, the training and inference details for this experiment seem to be insufficient. Could you share the training details and the inference code for unconditional image generation on FFHQ?

    Thanks a lot:)

    opened by Godkimchiy 0
  • Equations in the original paper

    Hi, I have several questions about the equations in the paper:

    1. Eq. 10: where does the prior distribution come from? I don't understand this. Can you help me figure it out?
    2. Eq. 11: the summation runs over a variable from 1 to K, so it seems that the variable is a scalar. But the noisy input x_t and the estimated x_0 should be tensors with height and width, not just scalars. Am I right?
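
    For reference, a plausible reading of the two equations, based on the mask-and-replace diffusion described in the paper (the exact equation numbering is assumed here):

        % Prior at t = T: the stationary distribution of the mask-and-replace
        % transition matrices; the last entry corresponds to the [MASK] token.
        p(x_T) = \left[ \bar{\beta}_T, \bar{\beta}_T, \dots, \bar{\beta}_T, \bar{\gamma}_T \right]^{\top}

        % Reparameterization over the predicted x_0. The sum over \tilde{x}_0 = 1..K
        % is taken independently at every token position, so the summed variable is a
        % scalar index even though x_t itself is an h x w grid of tokens.
        p_\theta(x_{t-1} \mid x_t, y)
            = \sum_{\tilde{x}_0 = 1}^{K} q(x_{t-1} \mid x_t, \tilde{x}_0)\, p_\theta(\tilde{x}_0 \mid x_t, y)
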
    opened by fido20160817 0
  • Text-guided image editing?

    How can text-guided image editing be done with VQ-Diffusion? Is the model the same as the text-to-image one? Are both the input image and the masked image fed into the network, and how are they used?

    opened by fido20160817 0
  • Change the dimension of the input and output image

    Hi. The current version of the code seems to work with 256x256 input/output images. I am wondering if there is any way to modify the size of the input and output images.

    Thanks,

    opened by ClinicalAI 0
  • Calculation of q_posterior?

    https://github.com/microsoft/VQ-Diffusion/blob/16dc744405e59ed1833513ebb1db87d6263d38be/image_synthesis/modeling/transformers/diffusion_transformer.py#L244 q_pred(log_x_start, t) is the forward computation for sampling x_t given x_0 and t, but in this line of the posterior computation it is called with log_x_t and t?

    https://github.com/microsoft/VQ-Diffusion/blob/16dc744405e59ed1833513ebb1db87d6263d38be/image_synthesis/modeling/transformers/diffusion_transformer.py#L253

    The same confusion also appears in this line: q_pred_one_timestep(self, log_x_t, t) is the forward computation for sampling x_t given x_{t-1}, but in this line of the posterior computation it is also called with log_x_t and t?
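
    For context, the posterior that q_posterior is meant to compute follows Bayes' rule over the forward chain (a standard identity for discrete diffusion, not an answer from the maintainers); whether the linked lines feed q_pred and q_pred_one_timestep the right arguments to realize this identity is exactly what is being asked:

        % Bayes' rule plus the Markov property of the forward process.
        q(x_{t-1} \mid x_t, x_0)
            = \frac{q(x_t \mid x_{t-1}, x_0)\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}
            = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}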

    opened by PanXiebit 4
Owner

Microsoft: open source projects and samples from Microsoft.

Related projects

  • Official PyTorch implementation of FastDPM, a fast sampling algorithm for diffusion probabilistic models, from "On Fast Sampling of Diffusion Probabilistic Models"; generation on CIFAR-10, CelebA, and LSUN (Zhifeng Kong, 68 stars, Dec 26, 2022)
  • Official codebase for running the small, filtered-data GLIDE model from "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models" (OpenAI, 2.9k stars, Jan 4, 2023)
  • PyTorch implementation of "DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis" (TTS extension) (Keon Lee, 152 stars, Jan 2, 2023)
  • Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (HeyangXue1997, 103 stars, Dec 23, 2022)
  • PyTorch implementation of "DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs" (Keon Lee, 157 stars, Jan 1, 2023)
  • Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in PyTorch, work in progress (Phil Wang, 55 stars, Jan 1, 2023)
  • Official implementation of "GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation" (ICLR 2022) (Minkai Xu, 155 stars, Dec 26, 2022)
  • Code for "Diffusion is All You Need for Learning on Surfaces" by Nicholas Sharp, Souhaib Attaiki, Keenan Crane, and Maks Ovsjanikov (Nick Sharp, 247 stars, Dec 28, 2022)
  • "Learning Energy-Based Models by Diffusion Recovery Likelihood" by Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, and Diederik P. Kingma (Ruiqi Gao, 41 stars, Nov 22, 2022)
  • Codebase for "Diffusion Models Beat GANs on Image Synthesis" (OpenAI, 3k stars, Dec 26, 2022)
  • "NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling" (Rishikesh (ऋषिकेश), 38 stars, Oct 11, 2022)
  • A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms, in PyTorch (Phil Wang, 108 stars, Nov 23, 2022)
  • Graph Neural Diffusion (GRAND): deep learning on graphs as a continuous diffusion process, treating GNNs as discretisations of an underlying PDE (Twitter Research, 227 stars, Jan 5, 2023)
  • Code for the TKDE paper "Understanding WeChat User Preferences and “Wow” Diffusion" by Fanjin Zhang, Jie Tang, Xueyi Liu, Zhenyu Hou, Yuxiao Dong, Jing Zhang, et al. (18 stars, Sep 16, 2022)
  • Code for DDPM training, based on "Denoising Diffusion Probabilistic Models" and "Improved Denoising Diffusion Probabilistic Models" (Alexander Markov, 7 stars, Dec 15, 2022)
  • NU-Wave, official PyTorch implementation of "NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling" (INTERSPEECH 2021) by Junhyeok Lee and Seungu Han @ MINDsLab Inc. (MINDs Lab, 242 stars, Dec 23, 2022)
  • Codebase for "Diffusion Models Beat GANs on Image Synthesis" (Katherine Crowson, 128 stars, Dec 2, 2022)
  • ILVR + ADM: implementation of "ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models" (ICCV 2021 Oral) (Jooyoung Choi, 225 stars, Dec 28, 2022)