High-Resolution Image Synthesis with Latent Diffusion Models

Overview

Latent Diffusion Models

arXiv | BibTeX

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
* equal contribution

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm
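
Optionally, you can check that PyTorch inside the new environment sees your GPU before downloading any models:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"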

Model Zoo

Pretrained Autoencoding Models

| Model | FID vs val | PSNR | PSIM | Link | Comments |
|---|---|---|---|---|---|
| f=4, VQ (Z=8192, d=3) | 0.58 | 27.43 +/- 4.26 | 0.53 +/- 0.21 | https://ommer-lab.com/files/latent-diffusion/vq-f4.zip | |
| f=4, VQ (Z=8192, d=3) | 1.06 | 25.21 +/- 4.17 | 0.72 +/- 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 | no attention |
| f=8, VQ (Z=16384, d=4) | 1.14 | 23.07 +/- 3.99 | 1.17 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/vq-f8.zip | |
| f=8, VQ (Z=256, d=4) | 1.49 | 22.35 +/- 3.81 | 1.26 +/- 0.37 | https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip | |
| f=16, VQ (Z=16384, d=8) | 5.15 | 20.83 +/- 3.61 | 1.73 +/- 0.43 | https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1 | |
| f=4, KL | 0.27 | 27.53 +/- 4.54 | 0.55 +/- 0.24 | https://ommer-lab.com/files/latent-diffusion/kl-f4.zip | |
| f=8, KL | 0.90 | 24.19 +/- 4.19 | 1.02 +/- 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | |
| f=16, KL (d=16) | 0.87 | 24.08 +/- 4.22 | 1.07 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f16.zip | |
| f=32, KL (d=64) | 2.04 | 22.27 +/- 3.93 | 1.41 +/- 0.40 | https://ommer-lab.com/files/latent-diffusion/kl-f32.zip | |

Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.

bash scripts/download_first_stages.sh

The first stage models can then be found in models/first_stage_models/<model_spec>
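
For reference, here is a minimal, hedged sketch of loading one of these first-stage checkpoints in Python. It assumes the vq-f4 zip was extracted to models/first_stage_models/vq-f4/ (the exact directory name depends on the model spec) and that the archive contains config.yaml and model.ckpt:

import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Paths below are assumptions; adjust them to wherever the zip was extracted.
config = OmegaConf.load("models/first_stage_models/vq-f4/config.yaml")
model = instantiate_from_config(config.model)  # builds the autoencoder from its config
ckpt = torch.load("models/first_stage_models/vq-f4/model.ckpt", map_location="cpu")
model.load_state_dict(ckpt["state_dict"], strict=False)  # strict=False skips loss/discriminator weights
model.eval()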

Pretrained LDMs

| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments |
|---|---|---|---|---|---|---|---|---|
| CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https://ommer-lab.com/files/latent-diffusion/celeba.zip | |
| FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https://ommer-lab.com/files/latent-diffusion/ffhq.zip | |
| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip | |
| LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip | |
| ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77 (7.76)* / 15.82** | 201.56 (209.52)* / 78.82** | 0.84* / 0.65** | 0.35* / 0.63** | https://ommer-lab.com/files/latent-diffusion/cin.zip | *: w/ guiding, classifier_scale 10; **: w/o guiding; scores in brackets calculated with the script provided by ADM |
| Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/text2img.zip | finetuned from LAION |
| OpenImages | Super-resolution | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation |
| OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip | |
| Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis256.zip | |
| Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip | finetuned on resolution 512x512 |

Get the models

The LDMs listed above can jointly be downloaded and extracted via

bash scripts/download_models.sh

The models can then be found in models/ldm/<model_spec>.

Sampling with unconditional models

We provide a first script for sampling from our unconditional models. Start it via

CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <#samples> --batch_size <batch_size> -c <#ddim_steps> -e <eta>
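
For example, to draw 50 samples from the CelebA-HQ model with 200 DDIM steps and eta=0 (assuming the checkpoint was extracted to models/ldm/celeba256/; adjust the path to your local layout):

CUDA_VISIBLE_DEVICES=0 python scripts/sample_diffusion.py -r models/ldm/celeba256/model.ckpt -l logs/samples -n 50 --batch_size 10 -c 200 -e 0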

Inpainting

Download the pre-trained weights

wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1

and sample with

python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results

indir should contain the images (*.png) and their masks (<image_fname>_mask.png), like the examples provided in data/inpainting_examples.
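
As a quick sanity check of this layout, here is a small, hedged Python sketch (not part of the repository scripts) that pairs each image in indir with its expected mask:

import glob
import os

indir = "data/inpainting_examples"  # placeholder; point this at your own input directory
images = [p for p in sorted(glob.glob(os.path.join(indir, "*.png")))
          if not p.endswith("_mask.png")]
for image_path in images:
    mask_path = image_path[:-len(".png")] + "_mask.png"  # <image_fname>_mask.png convention
    assert os.path.exists(mask_path), f"missing mask for {image_path}"
    print(image_path, "->", mask_path)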

Train your own LDMs

Data preparation

Faces

For downloading the CelebA-HQ and FFHQ datasets, proceed as described in the taming-transformers repository.

LSUN

The LSUN datasets can be conveniently downloaded via the script available here. We performed a custom split into training and validation images, and provide the corresponding filenames at https://ommer-lab.com/files/lsun.zip. After downloading, extract them to ./data/lsun. The beds/cats/churches subsets should also be placed/symlinked at ./data/lsun/bedrooms, ./data/lsun/cats, and ./data/lsun/churches, respectively.

ImageNet

The code will try to download (through Academic Torrents) and prepare ImageNet the first time it is used. However, since ImageNet is quite large, this requires a lot of disk space and time. If you already have ImageNet on your disk, you can speed things up by putting the data into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ (which defaults to ~/.cache/autoencoders/data/ILSVRC2012_{split}/data/), where {split} is one of train/validation. It should have the following structure:

${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/
├── n01440764
│   ├── n01440764_10026.JPEG
│   ├── n01440764_10027.JPEG
│   ├── ...
├── n01443537
│   ├── n01443537_10007.JPEG
│   ├── n01443537_10014.JPEG
│   ├── ...
├── ...

If you haven't extracted the data, you can also place ILSVRC2012_img_train.tar / ILSVRC2012_img_val.tar (or symlinks to them) into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/ or ${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/, respectively, and they will then be extracted into the above structure without downloading the data again. Note that this will only happen if neither the folder ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ nor the file ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready exists. Remove them if you want to force the dataset preparation to run again.
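
Purely as an illustration of the layout described above, here is a hedged sketch that symlinks already-downloaded ImageNet tars into the expected cache location. It assumes ${XDG_CACHE} resolves to $XDG_CACHE_HOME or ~/.cache, and the tar paths are placeholders:

import os

cache = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
# Placeholder locations of the original ImageNet tars on your disk.
tars = {
    "ILSVRC2012_train": "/path/to/ILSVRC2012_img_train.tar",
    "ILSVRC2012_validation": "/path/to/ILSVRC2012_img_val.tar",
}
for split_dir, tar_path in tars.items():
    target_dir = os.path.join(cache, "autoencoders", "data", split_dir)
    os.makedirs(target_dir, exist_ok=True)
    link = os.path.join(target_dir, os.path.basename(tar_path))
    if not os.path.exists(link):
        os.symlink(tar_path, link)  # extraction then happens on first use, as described above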

Model Training

Logs and checkpoints for trained models are saved to logs/<START_DATE_AND_TIME>_<config_spec>.

Training autoencoder models

Configs for training a KL-regularized autoencoder on ImageNet are provided at configs/autoencoder. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec>.yaml -t --gpus 0,    

where config_spec is one of {autoencoder_kl_8x8x64(f=32, d=64), autoencoder_kl_16x16x16(f=16, d=16), autoencoder_kl_32x32x4(f=8, d=4), autoencoder_kl_64x64x3(f=4, d=3)}.
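
For example, to train the f=4 (d=3) KL-regularized autoencoder on GPU 0:

CUDA_VISIBLE_DEVICES=0 python main.py --base configs/autoencoder/autoencoder_kl_64x64x3.yaml -t --gpus 0,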

For training VQ-regularized models, see the taming-transformers repository.

Training LDMs

In configs/latent-diffusion/ we provide configs for training LDMs on the LSUN-, CelebA-HQ, FFHQ and ImageNet datasets. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,

where <config_spec> is one of {celebahq-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), ffhq-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_bedrooms-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_churches-ldm-kl-8 (f=8, KL-reg. autoencoder, spatial size 32x32x4), cin-ldm-vq-8 (f=8, VQ-reg. autoencoder, spatial size 32x32x4)}.
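
For example, to train the CelebA-HQ LDM on GPU 0:

CUDA_VISIBLE_DEVICES=0 python main.py --base configs/latent-diffusion/celebahq-ldm-vq-4.yaml -t --gpus 0,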

Coming Soon...

Comments

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Reproducing inpainting results

    Hi, thanks for this great repo! I was trying to reproduce the inpainting results on the example images and obtained noticeable artifacts.

    Do you have an idea what could be the reason? I am running: python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results

    opened by SirWyver 9
  • [Question] Is it possible to gradually diffuse/transform one given real image to another using diffusion model?

    Thanks for this great work. I'm quite interested in the possible applications of the (latent) diffusion model proposed in the impressive paper. Your work has shown many promising applications of this newly emerging generative modeling approach. However, I have another question that has been bothering me for several days, and it would be great if you could give some advice or suggestions on it. The problem is an open one and is detailed below.

    Question: Given an initial image (e.g. a 256x256 image of a red dog) as the starting point, can we use a diffusion model to gradually diffuse/transform it until it satisfies the target condition (e.g. a text prompt of "a yellow cat"), so that the final image depicts "a yellow cat"?

    Difficulty: As we know, the diffusion model assumes the initial sample is drawn from a Gaussian distribution. In our situation that is not the case: the initial image is a real image, which I think breaks the assumption.

    I've tried to implement this idea in the most naive way, but it doesn't seem to work, because it generates vague results. It would be great if you could give some advice or suggestions on this problem. Thank you!!

    opened by Karbo123 6
  • terminate called after throwing an instance of 'c10::Error'

    I am playing with ldm.models.diffusion.ddpm.LatentDiffusion with 4 GPUs and DDP distribution. After around 30 epochs, it stopped:

    `terminate called after throwing an instance of 'c10::Error' what(): CUDA error: initialization error Exception raised from insert_events at /opt/conda/conda-bld/pytorch_1603729096996/work/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first): frame_#0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f082820c8b2 in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/lib/libc10.so) frame#1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f082845ef20 in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/lib/libc10_cuda.so) frame#2: c10::TensorImpl::release_resources() + 0x4d (0x7f08281f7b7d in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/lib/libc10.so) frame#3: + 0x5f65b2 (0x7f08725575b2 in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/lib/libtorch_python.so) frame#4: + 0x13c2bc (0x55b1c22232bc in /root/miniconda3/envs/ldm/bin/python) frame#5: + 0x1efd35 (0x55b1c22d6d35 in /root/miniconda3/envs/ldm/bin/python) frame#_6: PyObject_GC_Malloc + 0x88 (0x55b1c2223998 in /root/miniconda3/envs/ldm/bin/python) frame#7: PyType_GenericAlloc + 0x3b (0x55b1c2293a8b in /root/miniconda3/envs/ldm/bin/python) frame#8: + 0xc385 (0x7f08a1bbf385 in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/numpy/random/bit_generator.cpython-38-x86_64-linux-gnu.so) frame#9: + 0x13d585 (0x55b1c2224585 in /root/miniconda3/envs/ldm/bin/python) frame#10: + 0xf97f (0x7f08a1bc297f in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/numpy/random/bit_generator.cpython-38-x86_64-linux-gnu.so) frame#11: + 0xfb7e (0x7f08a1bc2b7e in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/numpy/random/bit_generator.cpython-38-x86_64-linux-gnu.so) frame#12: + 0x1e857 (0x7f08a1bd1857 in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/numpy/random/bit_generator.cpython-38-x86_64-linux-gnu.so) frame#13: + 0x5f92c (0x55b1c214692c in /root/miniconda3/envs/ldm/bin/python) frame#14: + 0x16fb40 (0x55b1c2256b40 in /root/miniconda3/envs/ldm/bin/python) frame#_15: + 0xe4d6 (0x7f08a17a84d6 in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/numpy/random/mt19937.cpython-38-x86_64-linux-gnu.so) frame#16: + 0x13d60c (0x55b1c222460c in /root/miniconda3/envs/ldm/bin/python) frame#17: + 0x14231 (0x7f08a1bf4231 in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/numpy/random/mtrand.cpython-38-x86_64-linux-gnu.so) frame#18: + 0x21d0e (0x7f08a1c01d0e in /root/miniconda3/envs/ldm/lib/python3.8/site-packages/numpy/random/mtrand.cpython-38-x86_64-linux-gnu.so) frame#_19: PyObject_MakeTpCall + 0x1a4 (0x55b1c22247d4 in /root/miniconda3/envs/ldm/bin/python) frame#_20: PyEval_EvalFrameDefault + 0x4596 (0x55b1c22abf56 in /root/miniconda3/envs/ldm/bin/python) frame#_21: PyEval_EvalCodeWithName + 0x2d2 (0x55b1c2271a92 in /root/miniconda3/envs/ldm/bin/python) frame#_22: PyFunction_Vectorcall + 0x1e3 (0x55b1c2272943 in /root/miniconda3/envs/ldm/bin/python) frame#23: + 0x18be79 (0x55b1c2272e79 in /root/miniconda3/envs/ldm/bin/python) frame#24: PyVectorcall_Call + 0x71 (0x55b1c2224041 in /root/miniconda3/envs/ldm/bin/python) frame#_25: PyEval_EvalFrameDefault + 0x1fdb (0x55b1c22a999b in /root/miniconda3/envs/ldm/bin/python) frame#_26: PyEval_EvalCodeWithName + 0x7df (0x55b1c2271f9f in /root/miniconda3/envs/ldm/bin/python) frame#_27: PyFunction_Vectorcall + 0x1e3 (0x55b1c2272943 in /root/miniconda3/envs/ldm/bin/python) frame#28: + 0x18be79 (0x55b1c2272e79 in /root/miniconda3/envs/ldm/bin/python) frame#29: PyVectorcall_Call + 0x71 (0x55b1c2224041 in 
/root/miniconda3/envs/ldm/bin/python) frame#_30: PyEval_EvalFrameDefault + 0x1fdb (0x55b1c22a999b in /root/miniconda3/envs/ldm/bin/python) frame#_31: PyEval_EvalCodeWithName + 0x7df (0x55b1c2271f9f in /root/miniconda3/envs/ldm/bin/python) frame#_32: PyFunction_Vectorcall + 0x1e3 (0x55b1c2272943 in /root/miniconda3/envs/ldm/bin/python) frame#_33: PyObject_FastCallDict + 0x24b (0x55b1c22734cb in /root/miniconda3/envs/ldm/bin/python) frame#_34: PyObject_Call_Prepend + 0x63 (0x55b1c2273733 in /root/miniconda3/envs/ldm/bin/python) frame#35: + 0x18c83a (0x55b1c227383a in /root/miniconda3/envs/ldm/bin/python) frame#36: PyObject_Call + 0x70 (0x55b1c2224200 in /root/miniconda3/envs/ldm/bin/python) frame#_37: PyEval_EvalFrameDefault + 0x1fdb (0x55b1c22a999b in /root/miniconda3/envs/ldm/bin/python) frame#_38: PyEval_EvalCodeWithName + 0x2d2 (0x55b1c2271a92 in /root/miniconda3/envs/ldm/bin/python) frame#_39: PyFunction_Vectorcall + 0x1e3 (0x55b1c2272943 in /root/miniconda3/envs/ldm/bin/python) frame#_40: PyObject_FastCallDict + 0x24b (0x55b1c22734cb in /root/miniconda3/envs/ldm/bin/python) frame#_41: PyObject_Call_Prepend + 0x63 (0x55b1c2273733 in /root/miniconda3/envs/ldm/bin/python) frame#42: + 0x18c83a (0x55b1c227383a in /root/miniconda3/envs/ldm/bin/python) frame#_43: PyObject_MakeTpCall + 0x22f (0x55b1c222485f in /root/miniconda3/envs/ldm/bin/python) frame#_44: PyEval_EvalFrameDefault + 0x11d0 (0x55b1c22a8b90 in /root/miniconda3/envs/ldm/bin/python) frame#_45: PyFunction_Vectorcall + 0x10b (0x55b1c227286b in /root/miniconda3/envs/ldm/bin/python) frame#46: + 0xba0de (0x55b1c21a10de in /root/miniconda3/envs/ldm/bin/python) frame#47: + 0x17eb32 (0x55b1c2265b32 in /root/miniconda3/envs/ldm/bin/python) frame#48: PyObject_GetItem + 0x49 (0x55b1c22568c9 in /root/miniconda3/envs/ldm/bin/python) frame#_49: PyEval_EvalFrameDefault + 0xbdd (0x55b1c22a859d in /root/miniconda3/envs/ldm/bin/python) frame#_50: PyEval_EvalCodeWithName + 0x659 (0x55b1c2271e19 in /root/miniconda3/envs/ldm/bin/python) frame#_51: PyFunction_Vectorcall + 0x1e3 (0x55b1c2272943 in /root/miniconda3/envs/ldm/bin/python) frame#52: + 0xfeb84 (0x55b1c21e5b84 in /root/miniconda3/envs/ldm/bin/python) frame#_53: PyEval_EvalCodeWithName + 0x7df (0x55b1c2271f9f in /root/miniconda3/envs/ldm/bin/python) frame#_54: PyFunction_Vectorcall + 0x1e3 (0x55b1c2272943 in /root/miniconda3/envs/ldm/bin/python) frame#55: + 0x10075e (0x55b1c21e775e in /root/miniconda3/envs/ldm/bin/python) frame#_56: PyFunction_Vectorcall + 0x10b (0x55b1c227286b in /root/miniconda3/envs/ldm/bin/python) frame#57: PyVectorcall_Call + 0x71 (0x55b1c2224041 in /root/miniconda3/envs/ldm/bin/python) frame#_58: PyEval_EvalFrameDefault + 0x1fdb (0x55b1c22a999b in /root/miniconda3/envs/ldm/bin/python) frame#_59: PyFunction_Vectorcall + 0x10b (0x55b1c227286b in /root/miniconda3/envs/ldm/bin/python) frame#60: + 0x10075e (0x55b1c21e775e in /root/miniconda3/envs/ldm/bin/python) frame#_61: PyEval_EvalCodeWithName + 0x2d2 (0x55b1c2271a92 in /root/miniconda3/envs/ldm/bin/python) frame#62: + 0x18bd20 (0x55b1c2272d20 in /root/miniconda3/envs/ldm/bin/python) frame#_63: + 0x10011a (0x55b1c21e711a in /root/miniconda3/envs/ldm/bin/python)

    Epoch 37: 69%|\u258b| 227/328 [18:34<08:13, 4.89s/it, loss=0.794, v_num=2, train/loss_simple_step=0.792, train/loss_vlb_step=0.0081, traTraceback (most recent call last): File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1045, in _run_train self.fit_loop.run() File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run self.advance(*args, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance epoch_output = self.epoch_loop.run(train_dataloader) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run self.advance(*args, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 130, in advance batch_output = self.batch_loop.run(batch, self.iteration_count, self._dataloader_idx) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 101, in run super().run(batch, batch_idx, dataloader_idx) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run self.advance(*args, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 148, in advance result = self._run_optimization(batch_idx, split_batch, opt_idx, optimizer) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 202, in _run_optimization self._optimizer_step(optimizer, opt_idx, batch_idx, closure) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 396, in _optimizer_step model_ref.optimizer_step( File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1618, in optimizer_step optimizer.step(closure=optimizer_closure) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 209, in step self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 129, in __optimizer_step trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 296, in optimizer_step self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 303, in run_optimizer_step self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 226, in optimizer_step optimizer.step(closure=lambda_closure, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context return func(*args, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/optim/adamw.py", line 65, in step loss = closure() File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 236, in 
_training_step_and_backward_closure result = self.training_step_and_backward(split_batch, batch_idx, opt_idx, optimizer, hiddens) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 537, in training_step_and_backward result = self._training_step(split_batch, batch_idx, opt_idx, hiddens) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 307, in _training_step training_step_output = self.trainer.accelerator.training_step(step_kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 193, in training_step return self.training_type_plugin.training_step(*step_kwargs.values()) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 383, in training_step return self.model(*args, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward output = self.module(*inputs[0], **kwargs[0]) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward output = self.module.training_step(*inputs, **kwargs) File "/root/Desktop/ldm/ldm/models/diffusion/ddpm.py", line 343, in training_step loss, loss_dict = self.shared_step(batch) File "/root/Desktop/ldm/ldm/models/diffusion/ddpm.py", line 887, in shared_step x, c = self.get_input(batch, self.first_stage_key) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context return func(*args, **kwargs) File "/root/Desktop/ldm/ldm/models/diffusion/ddpm.py", line 661, in get_input z = self.get_first_stage_encoding(encoder_posterior).detach() File "/root/Desktop/ldm/ldm/models/diffusion/ddpm.py", line 544, in get_first_stage_encoding z = encoder_posterior.sample() File "/root/Desktop/ldm/ldm/modules/distributions/distributions.py", line 36, in sample x = self.mean + self.std * torch.randn(self.mean.shape).to(device=self.parameters.device) File "/root/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler _error_if_any_worker_fails() RuntimeError: DataLoader worker (pid 21388) is killed by signal: Aborted. `

    I am sure it is related to this issue, but I was unable to fix it by setting rank_zero_only=True.

    Any help is appreciated

    opened by kaihe 6
  • cannot load vq-f4 model

    All the vq models work for me except the first one at https://ommer-lab.com/files/latent-diffusion/vq-f4.zip

    using this config:

    model:
      base_learning_rate: 4.5e-06
      target: ldm.models.autoencoder.VQModel
      params:
        embed_dim: 3
        n_embed: 8192
        monitor: val/rec_loss
        ddconfig:
          double_z: false
          z_channels: 3
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
          params:
            disc_conditional: false
            disc_in_channels: 3
            disc_start: 0
            disc_weight: 0.75
            codebook_weight: 1.0
    
    data:
      target: main.DataModuleFromConfig
      params:
        batch_size: 8
        num_workers: 16
        wrap: true
        train:
          target: ldm.data.openimages.FullOpenImagesTrain
          params:
            crop_size: 256
        validation:
          target: ldm.data.openimages.FullOpenImagesValidation
          params:
            crop_size: 256
    

    code:

    config = OmegaConf.load('./vq-f4/config.yaml')
    pl_sd = torch.load('./vq-f4/model.ckpt', map_location="cpu")
    sd = pl_sd["state_dict"]
    ldm = instantiate_from_config(config.model)
    ldm.load_state_dict(sd, strict=False)
    

    error:

    RuntimeError: Error(s) in loading state_dict for VQModel:
    	size mismatch for encoder.down.1.block.0.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
    	size mismatch for encoder.down.1.block.0.conv1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.0.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.0.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.0.conv2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for encoder.down.1.block.0.conv2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.1.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.1.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for encoder.down.1.block.1.conv1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.1.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.1.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.block.1.conv2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for encoder.down.1.block.1.conv2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.1.downsample.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for encoder.down.1.downsample.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.2.block.0.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.2.block.0.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for encoder.down.2.block.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
    	size mismatch for encoder.down.2.block.0.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.0.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.0.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.0.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for encoder.down.2.block.0.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.0.nin_shortcut.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
    	size mismatch for encoder.down.2.block.0.nin_shortcut.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.1.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.1.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for encoder.down.2.block.1.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.1.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.1.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.down.2.block.1.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for encoder.down.2.block.1.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for encoder.conv_out.weight: copying a param with shape torch.Size([8, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([3, 512, 3, 3]).
    	size mismatch for encoder.conv_out.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([3]).
    	size mismatch for decoder.conv_in.weight: copying a param with shape torch.Size([512, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 3, 3, 3]).
    	size mismatch for decoder.up.0.block.0.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.0.block.0.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.0.block.0.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]).
    	size mismatch for decoder.up.1.block.0.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.1.block.0.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.1.block.0.conv1.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
    	size mismatch for decoder.up.1.block.0.conv1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.0.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.0.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.0.conv2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for decoder.up.1.block.0.conv2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.0.nin_shortcut.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
    	size mismatch for decoder.up.1.block.0.nin_shortcut.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.1.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.1.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for decoder.up.1.block.1.conv1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.1.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.1.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.1.conv2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for decoder.up.1.block.1.conv2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.2.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.2.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.2.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for decoder.up.1.block.2.conv1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.2.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.2.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.block.2.conv2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for decoder.up.1.block.2.conv2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.1.upsample.conv.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
    	size mismatch for decoder.up.1.upsample.conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
    	size mismatch for decoder.up.2.block.0.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.0.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.0.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for decoder.up.2.block.0.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.0.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.0.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.0.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for decoder.up.2.block.0.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.1.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.1.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for decoder.up.2.block.1.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.1.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.1.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.1.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for decoder.up.2.block.1.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.2.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.2.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for decoder.up.2.block.2.conv1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.2.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.2.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.block.2.conv2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for decoder.up.2.block.2.conv2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for decoder.up.2.upsample.conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
    	size mismatch for decoder.up.2.upsample.conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
    	size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([1, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([512, 256, 4, 4]).
    	size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([16384, 8]) from checkpoint, the shape in current model is torch.Size([8192, 3]).
    	size mismatch for quant_conv.weight: copying a param with shape torch.Size([8, 8, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 3, 1, 1]).
    	size mismatch for quant_conv.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([3]).
    	size mismatch for post_quant_conv.weight: copying a param with shape torch.Size([8, 8, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 3, 1, 1]).
    	size mismatch for post_quant_conv.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([3]).
    
    opened by Jack000 4
  • Super Resolution notebook not working locally

    First I must say that SR using this method is unreal and beats anything else out there. I can get the Colab working, but I am constrained by speed and memory, so running this with my RTX 3090 would be much better. However, running the notebook locally only works up until this step, and I don't know enough to really understand what the error message provides. That's odd, since notebooks made for Colab tend to just work when run locally in my experience.

    opened by SomeOrdinaryDude 3
  • Allow inference on CPU...

    Tried to allocate 128.00 MiB (GPU 0; 7.79 GiB total capacity; 6.19 GiB already allocated; 81.44 MiB free; 6.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

    Setting CUDA_VISIBLE_DEVICES to -1 to force CPU results in "no CUDA devices found".

    Please allow CPU inference at the expense of time.

    opened by rlabs-oss 3
  • config file for conditional LDM

    Thank you for sharing this great work.

    Where can I find config files for the conditional tasks, such as Text-conditional Image Synthesis, Super-resolution, Layout-to-Image Synthesis, and Semantic Image Synthesis?

    The download links contain only ckpt files, and the config files at configs/latent-diffusion are all for unconditional tasks.

    opened by kaihe 3
  • about training

    When I run "main.py", it shows as follows:

    /data/latent-diffusion-main/venv/bin/python /data/latent-diffusion-main/main.py Traceback (most recent call last): File "/data/latent-diffusion-main/main.py", line 535, in model = instantiate_from_config(config.model) File "/data/latent-diffusion-main/ldm/util.py", line 79, in instantiate_from_config if not "target" in config: TypeError: argument of type 'NoneType' is not iterable

    Process finished with exit code 1

    Why and how to solve it?

    opened by Yolanda2020 2
  • Why not pass conditioning directly to the model instead of compare conditional one and unconditional one to "guide" model?

    Hi, a question about conditioning. In this project, the key part about conditioning is:

    e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
    e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)

    Apparently, it passes the conditioning info (when using text2img.py, the prompt encoded by BERT) right into the UNet model, and it uses eps(unconditioned) and eps(conditioned) with a "scale" factor to control how close the result is to the prompt.

    My question is: since the conditioning info is passed right into the UNet model, why not use its result directly as the final output, instead of comparing the conditional and unconditional outputs to "guide" the model?

    e_t = self.model.apply_model(x_in, t_in, c_in)  # where c_in is the conditioning only

    Another question: in Disco Diffusion, conditioning is done by using a gradient to guide image generation:

    new_mean = (p_mean_var["mean"].float() + p_mean_var["variance"] * gradient.float())

    It seems that LDM uses prompt+image pairs to train the UNet model, whereas Disco Diffusion uses CLIP to guide the inference process and does not include the prompt in its training. Am I correct? If so, is there a reason why LDM chooses to include the prompt when training the model? Does it improve model performance?

    opened by 522485449 2
  • ImportError

    ImportError: /../../.local/lib/python3.8/site-packages/torchtext/_torchtext.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKSt10shared_ptrIS0_EPSo

    Please help.

    opened by Limbicnation 2
  • question about disc_start

    Thanks for this great work.

    I am trying to train VQ models on custom data and realized that disc_start differs a lot between the VQ configs, for example: vq-f8-n256 has disc_start: 250001, while vq-f8 has disc_start: 1.

    Is there any particular reason the discriminator can start from step 1?

    opened by kaihe 2
  • Error downloading pretrained models

    It seems that the pretrained model at https://ommer-lab.com/files/rdm/model.ckpt is not available (probably there is some problem with the web platform). Could you please share that pretrained model?

    opened by LucaG92 0
  • Cannot reproduce the FID results of provided pre-trained models (LSUN - Churches & CelebA)

    Has anyone had this issue when re-evaluating the provided pre-trained models on the LSUN-Churches and CelebA datasets? I cannot get the FIDs reported in the paper. In fact, the results are far from the reported ones (LSUN-Churches: 4.02 in the paper versus 11.5; CelebA: 5.11 in the paper versus 17.4).

    The only difference I noticed is that I sampled only 10k images for each case instead of 50k as in the paper (to save time). I don't know whether the number of samples has such a significant impact.

    opened by hieuphung97 0
  • multi-gpu training pickle error

    Hello, thanks for sharing the code of your wonderful work. I was trying to train latent diffusion on my own datasets. With a single GPU everything is fine, but when I switched to multiple GPUs I encountered the following error. By the way, I was using PyTorch Lightning 1.8 instead of 1.4.

    Traceback (most recent call last): File "train_ddpm.py", line 710, in trainer.fit(model, data) File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit call._call_and_handle_interrupt( File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs) File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 113, in launch mp.start_processes( File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 179, in start_processes process.start() File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/multiprocessing/process.py", line 121, in start self._popen = self._Popen(self) File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/multiprocessing/context.py", line 284, in _Popen return Popen(process_obj) File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init super().init(process_obj) File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init self._launch(process_obj) File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch reduction.dump(process_obj, fp) File "/home/lix0i/miniconda3/envs/ldm_clone/lib/python3.8/multiprocessing/reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj) AttributeError: Can't pickle local object 'always..inner'

    opened by lx709 1
  • How to train on custom dataset without text prompts and class conditionning

    Hello,

    I want to apply the Latent Diffusion Model to medical image data. How can I feed training images from a directory of .jpg files to train the diffusion model? Also, I don't want the model to be conditioned on either classes or text.

    I would love any advice on how to do that.

    Thank you so much

    opened by NicolasNerr 0
  • cannot run on multi gpus

    I start training with the command 'python main.py --base configs/autoencoder/vqmodel1.yaml -t --gpus 4,5'. Everything seems to work and the steps per epoch are halved, but only one GPU is in use and only one process is started. How can I solve this problem?

    opened by shencuifeng 0
Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg