Official implementation of VQ-Diffusion

Vector Quantized Diffusion Model for Text-to-Image Synthesis

Overview

This is the official repo for the paper "Vector Quantized Diffusion Model for Text-to-Image Synthesis".

VQ-Diffusion is based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). It produces significantly better text-to-image generation results than autoregressive models with a similar number of parameters. Compared with previous GAN-based methods, VQ-Diffusion can handle more complex scenes and improves the synthesized image quality by a large margin.

Our code and models are ready; however, they are still under company review. We plan to release them in December.

Framework

Samples

More Samples

Comments
  • Progress toward Google Colab with Inference Error

    I created the following Colab notebook but I am getting an error during inference. https://colab.research.google.com/drive/15JABpusfx_vk32GXDFHSiB9Y-CXJowe5#scrollTo=ox_nqiA6MMno

    Here is the error from the line starting with VQ_Diffusion_model = ....


    ImportError Traceback (most recent call last)

    in <module>()
          1 from inference_VQ_Diffusion import VQ_Diffusion
    ----> 2 VQ_Diffusion_model = VQ_Diffusion(config='OUTPUT/pretrained_model/config_imagenet.yaml', path='OUTPUT/pretrained_model/imagenet_pretrained.pth')
          3 VQ_Diffusion_model.inference_generate_sample_with_condition("a huge white stone castle in a meadow painted by Rene Magritte", truncation_rate=0.85, save_root="RESULT", batch_size=4)
          4 VQ_Diffusion_model.inference_generate_sample_with_condition("a woman in a dark red dress painted by Norman Rockwell", truncation_rate=0.85, save_root="RESULT", batch_size=4, fast=2)  # for fast inference

    24 frames

    /usr/local/lib/python3.7/dist-packages/torchtext/vocab/vocab_factory.py in <module>()
          2 from typing import Dict, Iterable, Optional, List
          3 from collections import Counter, OrderedDict
    ----> 4 from torchtext._torchtext import (
          5     Vocab as VocabPybind,
          6 )

    ImportError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZTVN5torch3jit6MethodE
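
    The undefined C++ symbol usually means the torchtext wheel installed in the Colab runtime was built against a different torch release than the one actually installed. A quick diagnostic sketch (an assumption about the cause, not a fix from the maintainers) is to compare the two package versions without importing torchtext, since the import itself is what fails:

        # Read the installed versions from package metadata; torch and torchtext
        # are only compatible when they come from matching release pairs.
        import pkg_resources

        print("torch:", pkg_resources.get_distribution("torch").version)
        print("torchtext:", pkg_resources.get_distribution("torchtext").version)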

    opened by metaphorz 4
  • Add link to an easy-to-use diffusers implementation

    VQ Diffusion is now easily usable with the diffusers library. Could it make sense to add some code to the README that shows how to run the model easily?
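
    For reference, a minimal sketch of what such a README snippet could look like, using the VQDiffusionPipeline from diffusers together with the microsoft/vq-diffusion-ithq checkpoint on the Hugging Face Hub; the prompt is borrowed from the Colab issue above, and the truncation_rate value mirrors this repo's inference scripts rather than anything prescribed by the maintainers:

        # Text-to-image sampling with the diffusers port of VQ-Diffusion.
        from diffusers import VQDiffusionPipeline

        pipe = VQDiffusionPipeline.from_pretrained("microsoft/vq-diffusion-ithq")
        pipe = pipe.to("cuda")

        # truncation_rate plays the same role as in this repo's inference code.
        image = pipe(
            "a huge white stone castle in a meadow painted by Rene Magritte",
            truncation_rate=0.85,
        ).images[0]
        image.save("castle.png")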

    opened by patrickvonplaten 2
  • Some parameters don't receive gradients.

    Hello, when I run the training command on COCO, I encounter the following error:

    RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.

    Then I found that the parameters in "module.content_codec" and "module.transformer.condition_emb" do not receive gradients. Should we set find_unused_parameters=True in DDP?
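
    For anyone hitting the same error, here is a minimal, self-contained sketch of the find_unused_parameters=True workaround being asked about. The toy module and the single-process gloo group are only there to make the snippet runnable; whether the maintainers prefer this flag over keeping the unused branches out of the DDP module is exactly the open question:

        import os
        import torch
        import torch.distributed as dist
        from torch.nn.parallel import DistributedDataParallel as DDP

        class Toy(torch.nn.Module):
            """Stand-in for the real model: one branch is never used in forward(),
            mimicking module.content_codec / module.transformer.condition_emb."""
            def __init__(self):
                super().__init__()
                self.used = torch.nn.Linear(8, 8)
                self.unused = torch.nn.Linear(8, 8)  # receives no gradient

            def forward(self, x):
                return self.used(x)

        if __name__ == "__main__":
            os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
            os.environ.setdefault("MASTER_PORT", "29500")
            dist.init_process_group("gloo", rank=0, world_size=1)

            # find_unused_parameters=True tells DDP to tolerate parameters that do
            # not participate in producing the loss, which is what the error suggests.
            model = DDP(Toy(), find_unused_parameters=True)
            loss = model(torch.randn(4, 8)).sum()
            loss.backward()
            dist.destroy_process_group()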

    opened by guyuchao 2
  • Cannot download the pretrained model

    After more than 2 hours of waiting, my download of the pretrained (.pth) model failed. Ouch. Is there a way you can put the models on a faster, more reliable server? I have a fast internet connection on this computer (200 Mbps). (Screenshot attached.)

    opened by metaphorz 2
  • Filter Ratio when Sample?

    Terrific work, and many thanks for sharing your code! I am a little confused about the filter ratio [0.0, 0.5, 1.0] used when sampling images. I empirically find that 0.5 performs better. What does this parameter control? Is it mentioned in the paper?

    opened by sunyasheng 2
  • This repo is missing important files

    There are important files that all Microsoft projects should have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    Merge this pull request

    opened by microsoft-github-policy-service[bot] 1
  • File "taming_f8_8192_openimages_last.pth" not found?

    Thanks for your code. But when running:

    from inference_VQ_Diffusion import VQ_Diffusion
    VQ_Diffusion_model = VQ_Diffusion(config='OUTPUT/pretrained_model/config_text.yaml', path='OUTPUT/pretrained_model/coco_learnable.pth')

    an error occurred:

    FileNotFoundError: [Errno 2] No such file or directory: 'OUTPUT/pretrained_model/taming_dvae/taming_f8_8192_openimages_last.pth'

    However, it seems that this file is not provided. I'll be grateful for any response.

    The whole error traceback is copied below:

    Traceback (most recent call last):
      File "inference_VQ_Diffusion.py", line 152, in <module>
        path='OUTPUT/pretrained_model/cub_pretrained.pth')
      File "inference_VQ_Diffusion.py", line 25, in __init__
        self.info = self.get_model(ema=True, model_path=path, config_path=config, imagenet_cf=imagenet_cf)
      File "inference_VQ_Diffusion.py", line 45, in get_model
        model = build_model(config)
      File "/root/VQ-Diffusion/image_synthesis/modeling/build.py", line 5, in build_model
        return instantiate_from_config(config['model'])
      File "/root/VQ-Diffusion/image_synthesis/utils/misc.py", line 132, in instantiate_from_config
        return cls(**config.get("params", dict()))
      File "/root/VQ-Diffusion/image_synthesis/modeling/models/dalle.py", line 35, in __init__
        self.content_codec = instantiate_from_config(content_codec_config)
      File "/root/VQ-Diffusion/image_synthesis/utils/misc.py", line 132, in instantiate_from_config
        return cls(**config.get("params", dict()))
      File "/root/VQ-Diffusion/image_synthesis/modeling/codecs/image_codec/taming_gumbel_vqvae.py", line 225, in __init__
        model = self.LoadModel(config_path, ckpt_path)
      File "/root/VQ-Diffusion/image_synthesis/modeling/codecs/image_codec/taming_gumbel_vqvae.py", line 248, in LoadModel
        sd = torch.load(ckpt_path, map_location="cpu")["state_dict"]
      File "/root/miniconda3/envs/vqdm/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "/root/miniconda3/envs/vqdm/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "/root/miniconda3/envs/vqdm/lib/python3.7/site-packages/torch/serialization.py", line 211, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'OUTPUT/pretrained_model/taming_dvae/taming_f8_8192_openimages_last.pth'
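
    The path in the error suggests that the Gumbel VQGAN codec checkpoint is expected to be downloaded separately and placed under OUTPUT/pretrained_model/taming_dvae/ before the model can be built. A small sanity-check sketch along those lines (the directory layout is inferred from the traceback, not an official instruction):

        import os

        # Path the taming_gumbel_vqvae codec loader tries to open, per the traceback.
        ckpt_path = 'OUTPUT/pretrained_model/taming_dvae/taming_f8_8192_openimages_last.pth'

        if not os.path.isfile(ckpt_path):
            raise FileNotFoundError(
                f"Codec checkpoint missing: {ckpt_path}. "
                "Download the f8/8192 OpenImages VQGAN checkpoint and place it here "
                "before constructing VQ_Diffusion."
            )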

    opened by 1069066484 1
  • Adding Microsoft SECURITY.MD

    Please accept this contribution adding the standard Microsoft SECURITY.MD file to help the community understand the security policy and how to safely report security issues. GitHub uses the presence of this file to light up security reminders and a link to the file. This pull request commits the latest official SECURITY.MD file from https://github.com/microsoft/repo-templates/blob/main/shared/SECURITY.md.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    opened by microsoft-github-policy-service[bot] 0
  • About unconditional synthesis on FFHQ.

    Hi Authors,

    Thanks for sharing this nice work! I am trying to reproduce the results of unconditional synthesis on the FFHQ dataset. Compared to the other tasks, however, the training and inference details for this experiment seem to be insufficient. Could you share the training details and the inference code for unconditional image generation on FFHQ?

    Thanks a lot:)

    opened by Godkimchiy 0
  • Equations in the original paper

    Hi, I have several questions about the equations in the paper:

    1. Eq. 10: where does the prior distribution come from? I don't understand this. Can you help me figure it out?
    2. Eq. 11: the summation runs over a variable from 1 to K, so it seems that the variable is a scalar. But the noisy input x_t and the estimated x_0 should be tensors with height and width, not just scalars. Am I right?
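
    For reference, a plausible reading of the two equations, based on the mask-and-replace diffusion described in the paper (the exact equation numbering is assumed here):

        % Prior at t = T: the stationary distribution of the mask-and-replace
        % transition matrices; the last entry corresponds to the [MASK] token.
        p(x_T) = \left[ \bar{\beta}_T, \bar{\beta}_T, \dots, \bar{\beta}_T, \bar{\gamma}_T \right]^{\top}

        % Reparameterization over the predicted x_0. The sum over \tilde{x}_0 = 1..K
        % is taken independently at every token position, so the summed variable is a
        % scalar index even though x_t itself is an h x w grid of tokens.
        p_\theta(x_{t-1} \mid x_t, y)
            = \sum_{\tilde{x}_0 = 1}^{K} q(x_{t-1} \mid x_t, \tilde{x}_0)\, p_\theta(\tilde{x}_0 \mid x_t, y)
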
    opened by fido20160817 0
  • Text-guided image editing?

    How can text-guided image editing be done with VQ-Diffusion? Is the model the same as the text-to-image one? Are both the input image and the masked image fed into the network, and how are they used?

    opened by fido20160817 0
  • Change the dimension of the input and output image

    Hi. The current version of the code seems to work with 256x256 input/output images. I am wondering if there is any way to modify the size of the input and output images.

    Thanks,

    opened by ClinicalAI 0
  • Calculation of q_posterior?

    https://github.com/microsoft/VQ-Diffusion/blob/16dc744405e59ed1833513ebb1db87d6263d38be/image_synthesis/modeling/transformers/diffusion_transformer.py#L244 q_pred(log_x_start, t) is the forward computation for sampling x_t given x_0 and t, but in this line of the posterior computation it is called with log_x_t and t?

    https://github.com/microsoft/VQ-Diffusion/blob/16dc744405e59ed1833513ebb1db87d6263d38be/image_synthesis/modeling/transformers/diffusion_transformer.py#L253

    The same confusion also appears in this line: q_pred_one_timestep(self, log_x_t, t) is the forward computation for sampling x_t given x_{t-1}, but in this line of the posterior computation it is also called with log_x_t and t?
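
    For context, the posterior that q_posterior is meant to compute follows Bayes' rule over the forward chain (a standard identity for discrete diffusion, not an answer from the maintainers); whether the linked lines feed q_pred and q_pred_one_timestep the right arguments to realize this identity is exactly what is being asked:

        % Bayes' rule plus the Markov property of the forward process.
        q(x_{t-1} \mid x_t, x_0)
            = \frac{q(x_t \mid x_{t-1}, x_0)\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}
            = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}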

    opened by PanXiebit 4
Owner

Microsoft: open source projects and samples from Microsoft.

Related projects

  • Official PyTorch implementation of FastDPM, a fast sampling algorithm for diffusion probabilistic models, from "On Fast Sampling of Diffusion Probabilistic Models"; generation on CIFAR-10, CelebA, and LSUN (Zhifeng Kong, 68 stars, Dec 26, 2022)
  • Official codebase for running the small, filtered-data GLIDE model from "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models" (OpenAI, 2.9k stars, Jan 4, 2023)
  • PyTorch implementation of "DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis" (TTS extension) (Keon Lee, 152 stars, Jan 2, 2023)
  • Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (HeyangXue1997, 103 stars, Dec 23, 2022)
  • PyTorch implementation of "DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs" (Keon Lee, 157 stars, Jan 1, 2023)
  • Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in PyTorch, work in progress (Phil Wang, 55 stars, Jan 1, 2023)
  • Official implementation of "GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation" (ICLR 2022) (Minkai Xu, 155 stars, Dec 26, 2022)
  • Code for "Diffusion is All You Need for Learning on Surfaces" by Nicholas Sharp, Souhaib Attaiki, Keenan Crane, and Maks Ovsjanikov (Nick Sharp, 247 stars, Dec 28, 2022)
  • "Learning Energy-Based Models by Diffusion Recovery Likelihood" by Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, and Diederik P. Kingma (Ruiqi Gao, 41 stars, Nov 22, 2022)
  • Codebase for "Diffusion Models Beat GANs on Image Synthesis" (OpenAI, 3k stars, Dec 26, 2022)
  • "NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling" (Rishikesh (ऋषिकेश), 38 stars, Oct 11, 2022)
  • A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms, in PyTorch (Phil Wang, 108 stars, Nov 23, 2022)
  • Graph Neural Diffusion (GRAND): deep learning on graphs as a continuous diffusion process, treating GNNs as discretisations of an underlying PDE (Twitter Research, 227 stars, Jan 5, 2023)
  • Code for the TKDE paper "Understanding WeChat User Preferences and “Wow” Diffusion" by Fanjin Zhang, Jie Tang, Xueyi Liu, Zhenyu Hou, Yuxiao Dong, Jing Zhang, et al. (18 stars, Sep 16, 2022)
  • Code for DDPM training, based on "Denoising Diffusion Probabilistic Models" and "Improved Denoising Diffusion Probabilistic Models" (Alexander Markov, 7 stars, Dec 15, 2022)
  • NU-Wave, official PyTorch implementation of "NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling" (INTERSPEECH 2021) by Junhyeok Lee and Seungu Han @ MINDsLab Inc. (MINDs Lab, 242 stars, Dec 23, 2022)
  • Codebase for "Diffusion Models Beat GANs on Image Synthesis" (Katherine Crowson, 128 stars, Dec 2, 2022)
  • ILVR + ADM: implementation of "ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models" (ICCV 2021 Oral) (Jooyoung Choi, 225 stars, Dec 28, 2022)