Taming Transformers for High-Resolution Image Synthesis

CVPR 2021 (Oral)

Patrick Esser*, Robin Rombach*, Björn Ommer
* equal contribution

tl;dr We combine the efficiency of convolutional approaches with the expressivity of transformers by introducing a convolutional VQGAN, which learns a codebook of context-rich visual parts, whose composition is modeled with an autoregressive transformer.

arXiv | BibTeX | Project Page

News

  • Thanks to rom1504 it is now easy to train a VQGAN on your own datasets.
  • Included a bugfix for the quantizer. For backward compatibility it is disabled by default (which corresponds to always training with beta=1.0). Use legacy=False in the quantizer config to enable it. Thanks richcmwang and wcshin-git!
  • Our paper received an update: See https://arxiv.org/abs/2012.09841v3 and the corresponding changelog.
  • Added a pretrained, 1.4B transformer model trained for class-conditional ImageNet synthesis, which obtains state-of-the-art FID scores among autoregressive approaches and outperforms BigGAN.
  • Added pretrained, unconditional models on FFHQ and CelebA-HQ.
  • Added accelerated sampling via caching of keys/values in the self-attention operation, used in scripts/sample_fast.py.
  • Added a checkpoint of a VQGAN trained with f8 compression and Gumbel-Quantization. See also our updated reconstruction notebook.
  • We added a colab notebook which compares two VQGANs and OpenAI's DALL-E. See also this section.
  • We now include an overview of pretrained models in Tab.1. We added models for COCO and ADE20k.
  • The streamlit demo now supports image completions.
  • We now include a couple of examples from the D-RIN dataset so you can run the D-RIN demo without preparing the dataset first.
  • You can now jump right into sampling with our Colab quickstart notebook.

Requirements

A suitable conda environment named taming can be created and activated with:

conda env create -f environment.yaml
conda activate taming

Overview of pretrained models

The following table provides an overview of all models that are currently available. FID scores were evaluated using torch-fidelity. For reference, we also include a link to the recently released autoencoder of the DALL-E model. See the corresponding colab notebook for a comparison and discussion of reconstruction capabilities.

| Dataset | FID vs train | FID vs val | Link | Samples (256x256) | Comments |
|---|---|---|---|---|---|
| FFHQ (f=16) | 9.6 | -- | ffhq_transformer | ffhq_samples | |
| CelebA-HQ (f=16) | 10.2 | -- | celebahq_transformer | celebahq_samples | |
| ADE20K (f=16) | -- | 35.5 | ade20k_transformer | ade20k_samples.zip [2k] | evaluated on val split (2k images) |
| COCO-Stuff (f=16) | -- | 20.4 | coco_transformer | coco_samples.zip [5k] | evaluated on val split (5k images) |
| ImageNet (cIN) (f=16) | 15.98/15.78/6.59/5.88/5.20 | -- | cin_transformer | cin_samples | different decoding hyperparameters |
| FacesHQ (f=16) | -- | -- | faceshq_transformer | | |
| S-FLCKR (f=16) | -- | -- | sflckr | | |
| D-RIN (f=16) | -- | -- | drin_transformer | | |
| VQGAN ImageNet (f=16), 1024 | 10.54 | 7.94 | vqgan_imagenet_f16_1024 | reconstructions | Reconstruction-FIDs. |
| VQGAN ImageNet (f=16), 16384 | 7.41 | 4.98 | vqgan_imagenet_f16_16384 | reconstructions | Reconstruction-FIDs. |
| VQGAN OpenImages (f=8), 8192, GumbelQuantization | 3.24 | 1.49 | vqgan_gumbel_f8 | --- | Reconstruction-FIDs. |
| DALL-E dVAE (f=8), 8192, GumbelQuantization | 33.88 | 32.01 | https://github.com/openai/DALL-E | reconstructions | Reconstruction-FIDs. |
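
As a rough illustration of what the factor f and the codebook size in the table imply, the sketch below computes the latent grid and token count for a given image size. It assumes that f is the per-side downsampling factor and that each latent position stores the index of exactly one codebook entry; the function name latent_stats is hypothetical and not part of the repository.

```python
import math


def latent_stats(height, width, f, codebook_size):
    """Return (grid_h, grid_w, num_tokens, bits_per_image).

    Assumes f is the per-side downsampling factor and that every latent
    position holds the index of one codebook entry.
    """
    if height % f or width % f:
        raise ValueError("image size must be divisible by f")
    grid_h, grid_w = height // f, width // f
    num_tokens = grid_h * grid_w
    # Bits needed to address one codebook entry.
    bits_per_token = math.log2(codebook_size)
    return grid_h, grid_w, num_tokens, num_tokens * bits_per_token


print(latent_stats(256, 256, 16, 1024))  # f=16 with 1024 codebook entries
print(latent_stats(256, 256, 8, 8192))   # f=8 with 8192 codebook entries
```

Under these assumptions a 256x256 image maps to 256 tokens at f=16 and to 1024 tokens at f=8, so the f=8 models hand the transformer a sequence four times as long.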

Running pretrained models

The commands below will start a streamlit demo which supports sampling at different resolutions and image completions. To run a non-interactive version of the sampling process, replace streamlit run scripts/sample_conditional.py -- by python scripts/make_samples.py --outdir <path_to_write_samples_to> and keep the remaining command line arguments.
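
The replacement rule above can be sketched as a small string rewrite. This is only an illustration: the helper to_noninteractive and the output directory are hypothetical, and the sketch assumes the demo command starts with exactly the streamlit prefix quoted above.

```python
import shlex

# Prefix of the interactive demo commands used throughout this README.
STREAMLIT_PREFIX = ["streamlit", "run", "scripts/sample_conditional.py", "--"]


def to_noninteractive(command, outdir):
    """Rewrite a streamlit demo command into the make_samples.py form."""
    args = shlex.split(command)
    if args[:len(STREAMLIT_PREFIX)] != STREAMLIT_PREFIX:
        raise ValueError("not a streamlit sample_conditional.py command")
    # Keep every argument that followed the "--" separator unchanged.
    rest = args[len(STREAMLIT_PREFIX):]
    new_args = ["python", "scripts/make_samples.py", "--outdir", outdir] + rest
    return shlex.join(new_args)


print(to_noninteractive(
    "streamlit run scripts/sample_conditional.py -- -r logs/2020-11-09T13-31-51_sflckr/",
    "samples/sflckr",  # hypothetical output directory
))
```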

To sample from unconditional or class-conditional models, run python scripts/sample_fast.py -r <path/to/config_and_checkpoint>. We describe below how to use this script to sample from the ImageNet, FFHQ, and CelebA-HQ models, respectively.

S-FLCKR

You can also run this model in a Colab notebook, which includes all necessary steps to start sampling.

Download the 2020-11-09T13-31-51_sflckr folder and place it into logs. Then, run

streamlit run scripts/sample_conditional.py -- -r logs/2020-11-09T13-31-51_sflckr/

ImageNet

Download the 2021-04-03T19-39-50_cin_transformer folder and place it into logs. Sampling from the class-conditional ImageNet model does not require any data preparation. To produce 50 samples for each of the 1000 classes of ImageNet, with k=600 for top-k sampling, p=0.92 for nucleus sampling and temperature t=1.0, run

python scripts/sample_fast.py -r logs/2021-04-03T19-39-50_cin_transformer/ -n 50 -k 600 -t 1.0 -p 0.92 --batch_size 25   

To restrict the model to certain classes, provide them via the --classes argument, separated by commas. For example, to sample 50 ostriches, border collies and whiskey jugs, run

python scripts/sample_fast.py -r logs/2021-04-03T19-39-50_cin_transformer/ -n 50 -k 600 -t 1.0 -p 0.92 --batch_size 25 --classes 9,232,901   

We recommend experimenting with the autoregressive decoding parameters (top-k, top-p and temperature) for best results.
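
To make the three decoding parameters concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-k and nucleus (top-p) filtering over a list of logits. It is not the code in scripts/sample_fast.py; the function names, and the choice to apply top-k before top-p, are assumptions made for illustration.

```python
import math
import random


def filter_probs(logits, top_k=None, top_p=1.0, temperature=1.0):
    """Return renormalized probabilities after temperature, top-k and top-p filtering."""
    scaled = [x / temperature for x in logits]
    # Indices ordered from most to least likely.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    # Softmax over the surviving entries (shifted by the maximum for stability).
    peak = scaled[order[0]]
    weights = {i: math.exp(scaled[i] - peak) for i in order}
    total = sum(weights.values())
    # Nucleus: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += weights[i] / total
        if cumulative >= top_p:
            break
    mass = sum(weights[i] for i in kept)
    probs = [0.0] * len(logits)
    for i in kept:
        probs[i] = weights[i] / mass
    return probs


def sample_token(logits, rng, **kwargs):
    """Draw one token index from the filtered distribution."""
    probs = filter_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]
print(filter_probs(logits, top_k=2))
print(sample_token(logits, rng, top_k=3, top_p=0.92, temperature=1.0))
```

Lower values of k or p restrict sampling to the most likely tokens, while a temperature above 1.0 flattens the distribution before filtering.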

FFHQ/CelebA-HQ

Download the 2021-04-23T18-19-01_ffhq_transformer and 2021-04-23T18-11-19_celebahq_transformer folders and place them into logs. Again, sampling from these unconditional models does not require any data preparation. To produce 50000 samples, with k=250 for top-k sampling, p=1.0 for nucleus sampling and temperature t=1.0, run

python scripts/sample_fast.py -r logs/2021-04-23T18-19-01_ffhq_transformer/   

for FFHQ and

python scripts/sample_fast.py -r logs/2021-04-23T18-11-19_celebahq_transformer/   

to sample from the CelebA-HQ model. For both models it can be advantageous to vary the top-k/top-p parameters for sampling.

FacesHQ

Download 2020-11-13T21-41-45_faceshq_transformer and place it into logs. Follow the data preparation steps for CelebA-HQ and FFHQ. Run

streamlit run scripts/sample_conditional.py -- -r logs/2020-11-13T21-41-45_faceshq_transformer/

D-RIN

Download 2020-11-20T12-54-32_drin_transformer and place it into logs. To run the demo on a couple of example depth maps included in the repository, run

streamlit run scripts/sample_conditional.py -- -r logs/2020-11-20T12-54-32_drin_transformer/ --ignore_base_data data="{target: main.DataModuleFromConfig, params: {batch_size: 1, validation: {target: taming.data.imagenet.DRINExamples}}}"

To run the demo on the complete validation set, first follow the data preparation steps for ImageNet and then run

streamlit run scripts/sample_conditional.py -- -r logs/2020-11-20T12-54-32_drin_transformer/

COCO

Download 2021-01-20T16-04-20_coco_transformer and place it into logs. To run the demo on a couple of example segmentation maps included in the repository, run

streamlit run scripts/sample_conditional.py -- -r logs/2021-01-20T16-04-20_coco_transformer/ --ignore_base_data data="{target: main.DataModuleFromConfig, params: {batch_size: 1, validation: {target: taming.data.coco.Examples}}}"

ADE20k

Download 2020-11-20T21-45-44_ade20k_transformer and place it into logs. To run the demo on a couple of example segmentation maps included in the repository, run

streamlit run scripts/sample_conditional.py -- -r logs/2020-11-20T21-45-44_ade20k_transformer/ --ignore_base_data data="{target: main.DataModuleFromConfig, params: {batch_size: 1, validation: {target: taming.data.ade20k.Examples}}}"

Training on custom data

Training on your own dataset can be beneficial to get better tokens and hence better images for your domain. These are the steps to follow to make this work:

  1. install the repo with conda env create -f environment.yaml, conda activate taming and pip install -e .
  2. put your .jpg files in a folder your_folder
  3. create two text files, xx_train.txt and xx_test.txt, that list the files in your training and test sets respectively (for example find $(pwd)/your_folder -name "*.jpg" > train.txt)
  4. adapt configs/custom_vqgan.yaml to point to these 2 files
  5. run python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,1 to train on two GPUs. Use --gpus 0, (with a trailing comma) to train on a single GPU.
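
Step 3 can also be scripted. The sketch below writes the two list files with a random split; the helper write_file_lists, the 10% test fraction, and the xx prefix are illustrative assumptions, and the demo runs on empty placeholder files in a temporary directory rather than on real images.

```python
import random
import tempfile
from pathlib import Path


def write_file_lists(image_dir, out_dir, prefix="xx", test_fraction=0.1, seed=0):
    """Write <prefix>_train.txt and <prefix>_test.txt with absolute .jpg paths."""
    paths = sorted(str(p.resolve()) for p in Path(image_dir).rglob("*.jpg"))
    random.Random(seed).shuffle(paths)  # reproducible random split
    n_test = int(len(paths) * test_fraction)
    splits = {"test": paths[:n_test], "train": paths[n_test:]}
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        text = "".join(f"{p}\n" for p in items)
        (out_dir / f"{prefix}_{name}.txt").write_text(text)
    return len(splits["train"]), len(splits["test"])


# Demo on placeholder files; point image_dir at your_folder for real use.
with tempfile.TemporaryDirectory() as tmp:
    image_dir = Path(tmp) / "your_folder"
    image_dir.mkdir()
    for i in range(10):
        (image_dir / f"img_{i}.jpg").touch()
    print(write_file_lists(image_dir, Path(tmp) / "lists"))
```

The resulting two files are the ones that configs/custom_vqgan.yaml needs to point to in step 4.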

Data Preparation

ImageNet

The code will try to download (through Academic Torrents) and prepare ImageNet the first time it is used. However, since ImageNet is quite large, this requires a lot of disk space and time. If you already have ImageNet on your disk, you can speed things up by putting the data into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ (which defaults to ~/.cache/autoencoders/data/ILSVRC2012_{split}/data/), where {split} is one of train/validation. It should have the following structure:

${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/
├── n01440764
│   ├── n01440764_10026.JPEG
│   ├── n01440764_10027.JPEG
│   ├── ...
├── n01443537
│   ├── n01443537_10007.JPEG
│   ├── n01443537_10014.JPEG
│   ├── ...
├── ...

If you haven't extracted the data, you can also place ILSVRC2012_img_train.tar/ILSVRC2012_img_val.tar (or symlinks to them) into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/ / ${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/, which will then be extracted into the above structure without downloading it again. Note that this will only happen if neither a folder ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ nor a file ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready exists. Remove them if you want to force running the dataset preparation again.
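
The condition described above can be checked before starting a run. The following is a minimal sketch, not the repository's own logic: the helper names are hypothetical, and reading the cache root from the XDG_CACHE_HOME environment variable (falling back to ~/.cache) is an assumption.

```python
import os
from pathlib import Path


def imagenet_root(split, cache_root=None):
    """Directory holding one ImageNet split, following the layout described above."""
    if split not in ("train", "validation"):
        raise ValueError("split must be 'train' or 'validation'")
    if cache_root is None:
        # Assumption: cache root comes from XDG_CACHE_HOME, else ~/.cache.
        cache_root = os.environ.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    return Path(cache_root) / "autoencoders" / "data" / f"ILSVRC2012_{split}"


def preparation_would_run(split, cache_root=None):
    """True if neither the data/ folder nor the .ready marker exists."""
    root = imagenet_root(split, cache_root)
    return not (root / "data").is_dir() and not (root / ".ready").is_file()


for split in ("train", "validation"):
    print(split, imagenet_root(split), preparation_would_run(split))
```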

You will then need to prepare the depth data using MiDaS. Create a symlink data/imagenet_depth pointing to a folder with two subfolders train and val, each mirroring the structure of the corresponding ImageNet folder described above and containing a png file for each of ImageNet's JPEG files. The png encodes float32 depth values obtained from MiDaS as RGBA images. We provide the script scripts/extract_depth.py to generate this data. Please note that this script uses MiDaS via PyTorch Hub. When we prepared the data, the hub provided the MiDaS v2.0 version, but now it provides a v2.1 version. We haven't tested our models with depth maps obtained via v2.1. If you want to make sure that things work as expected, you must adjust the script so that it explicitly uses v2.0!
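
To illustrate how a float32 depth value can be stored in the four 8-bit channels of an RGBA pixel, here is a small sketch for a single value. The little-endian byte order and the mapping of bytes to R, G, B, A are assumptions made for this example; scripts/extract_depth.py remains the reference for the format actually used.

```python
import struct


def depth_to_rgba(value):
    """Pack one float32 depth value into four 8-bit channel values."""
    # "<f" = little-endian 32-bit float (assumed byte order).
    return tuple(struct.pack("<f", value))


def rgba_to_depth(rgba):
    """Recover the float32 depth value from four 8-bit channel values."""
    return struct.unpack("<f", bytes(rgba))[0]


pixel = depth_to_rgba(0.15625)
print(pixel, rgba_to_depth(pixel))
```

Because the four bytes are a reinterpretation of the float rather than a quantization, the round trip is lossless for any float32 value.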

CelebA-HQ

Create a symlink data/celebahq pointing to a folder containing the .npy files of CelebA-HQ (instructions to obtain them can be found in the PGGAN repository).

FFHQ

Create a symlink data/ffhq pointing to the images1024x1024 folder obtained from the FFHQ repository.

S-FLCKR

Unfortunately, we are not allowed to distribute the images we collected for the S-FLCKR dataset and can therefore only give a description of how it was produced. There are many resources on collecting images from the web to get started. We collected sufficiently large images from flickr (see data/flickr_tags.txt for a full list of tags used to find images) and various subreddits (see data/subreddits.txt for all subreddits that were used). Overall, we collected 107625 images, and split them randomly into 96861 training images and 10764 validation images. We then obtained segmentation masks for each image using DeepLab v2 trained on COCO-Stuff. We used a PyTorch reimplementation and include an example script for this process in scripts/extract_segmentation.py.

COCO

Create a symlink data/coco containing the images from the 2017 split in train2017 and val2017, and their annotations in annotations. Files can be obtained from the COCO webpage. In addition, we use the Stuff+thing PNG-style annotations on COCO 2017 trainval annotations from COCO-Stuff, which should be placed under data/cocostuffthings.

ADE20k

Create a symlink data/ade20k_root containing the contents of ADEChallengeData2016.zip from the MIT Scene Parsing Benchmark.

Training models

FacesHQ

Train a VQGAN with

python main.py --base configs/faceshq_vqgan.yaml -t True --gpus 0,

Then, adjust the checkpoint path of the config key model.params.first_stage_config.params.ckpt_path in configs/faceshq_transformer.yaml (or download 2020-11-09T13-33-36_faceshq_vqgan and place into logs, which corresponds to the preconfigured checkpoint path), then run
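
The dotted config key above names a nested entry in the YAML file. As a sketch of what adjusting it means, the snippet below sets such a key on a plain nested dict that stands in for the loaded config; the helper set_by_dotted_key and the checkpoint path are hypothetical.

```python
def set_by_dotted_key(config, dotted_key, value):
    """Set config[a][b][c] = value for dotted_key 'a.b.c', creating dicts as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


# A plain dict standing in for the parsed contents of the transformer config.
config = {"model": {"params": {"first_stage_config": {"params": {"ckpt_path": None}}}}}
set_by_dotted_key(
    config,
    "model.params.first_stage_config.params.ckpt_path",
    "logs/my_vqgan/checkpoints/last.ckpt",  # hypothetical path to your own checkpoint
)
print(config["model"]["params"]["first_stage_config"]["params"]["ckpt_path"])
```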

python main.py --base configs/faceshq_transformer.yaml -t True --gpus 0,

D-RIN

Train a VQGAN on ImageNet with

python main.py --base configs/imagenet_vqgan.yaml -t True --gpus 0,

or download a pretrained one from 2020-09-23T17-56-33_imagenet_vqgan and place under logs. If you trained your own, adjust the path in the config key model.params.first_stage_config.params.ckpt_path of configs/drin_transformer.yaml.

Train a VQGAN on Depth Maps of ImageNet with

python main.py --base configs/imagenetdepth_vqgan.yaml -t True --gpus 0,

or download a pretrained one from 2020-11-03T15-34-24_imagenetdepth_vqgan and place under logs. If you trained your own, adjust the path in the config key model.params.cond_stage_config.params.ckpt_path of configs/drin_transformer.yaml.

To train the transformer, run

python main.py --base configs/drin_transformer.yaml -t True --gpus 0,

More Resources

Comparing Different First Stage Models

The reconstruction and compression capabilities of different first stage models can be analyzed in this colab notebook. In particular, the notebook compares two VQGANs with a downsampling factor of f=16 each and codebook sizes of 1024 and 16384 entries, a VQGAN with f=8 and 8192 codebook entries, and the discrete autoencoder of OpenAI's DALL-E (which has f=8 and 8192 codebook entries).

Other

Text-to-Image Optimization via CLIP

VQGAN has been successfully used as an image generator guided by the CLIP model, both for pure image generation from scratch and image-to-image translation. We recommend the following notebooks/videos/resources:

Text prompt: 'A bird drawn by a child'

Shout-outs

Thanks to everyone who makes their code and models available. In particular,

BibTeX

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Very confused by the discriminator loss

    When training the VQGAN pipeline on the FFHQ dataset, I checked that disc_loss uses a function like vanilla_d_loss:

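    import torch
    import torch.nn.functional as F

    def hinge_d_loss(logits_real, logits_fake):
        # Hinge loss: penalize real logits below +1 and fake logits above -1.
        loss_real = torch.mean(F.relu(1. - logits_real))
        loss_fake = torch.mean(F.relu(1. + logits_fake))
        d_loss = 0.5 * (loss_real + loss_fake)
        return d_loss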
    

    But the discriminator loss metric in TensorBoard looks very strange.

    I am confused about whether this discriminator loss is really being optimized during generator training.

    The discriminator loss joins the training process after the training step reaches 30K. By the way, the discriminator loss metric from the start of training is shown in the picture above.

    opened by xesdiny 13
  • add custom dataset and instruction for training on a custom dataset

    Hi, I created a custom dataset loader that takes jpg files in a folder as input. I think this makes it easy for people discovering this repo to train on a custom dataset. Tell me what you think. This was useful for me to start training my own VQGAN. Thanks for this awesome paper and clean repo!

    opened by rom1504 8
  • Gradient flow

    Hi guys, first of all, impressive work you have done here.

    Skimming through the repo, I noticed that the critic/discriminator receives gradients through both losses, because its gradients are not frozen when the autoencoder part is optimized. Do I see that correctly? And if so, why did you choose to do that?

    opened by CDitzel 8
  • Overfitting problem when training transformer

    I trained the transformer but found that it overfits after 30-40 epochs: the validation loss goes up while the training loss becomes very small. Have you met this problem in model training? I am now trying pkeep=0.9 in cond_transformer.py to avoid overfitting.

    opened by fnzhan 7
  • Add EMA Vector Quantizer

    Hi, I implemented an EMA (Exponential Moving Average) version of the Vector Quantizer, following VectorQuantizer2 from the original repo. I also included the original sonnet-style EMAVectorQuantizer as SonnetEMAVectorQuantizer, and I added EMAVQ in vqgan.py. Please ping me if you have any questions. Thank you.

    opened by tgisaturday 4
  • The file hosting server heibox.uni-heidelberg.de  is down

    https://heibox.uni-heidelberg.de/seafhttp/files/0cc07b02-72f5-4615-a2ac-ace188cf0ed0/last.ckpt

    remote: Enumerating objects: 13, done.
    remote: Counting objects: 100% (13/13), done.
    remote: Compressing objects: 100% (10/10), done.
    remote: Total 671 (delta 4), reused 7 (delta 3), pack-reused 658
    Receiving objects: 100% (671/671), 116.29 MiB | 24.59 MiB/s, done.
    Resolving deltas: 100% (139/139), done.
    /content/taming-transformers
    --2021-03-24 20:53:31--  https://heibox.uni-heidelberg.de/f/140747ba53464f49b476/?dl=1
    Resolving heibox.uni-heidelberg.de (heibox.uni-heidelberg.de)... 129.206.7.113
    Connecting to heibox.uni-heidelberg.de (heibox.uni-heidelberg.de)|129.206.7.113|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://heibox.uni-heidelberg.de/seafhttp/files/0cc07b02-72f5-4615-a2ac-ace188cf0ed0/last.ckpt [following]
    --2021-03-24 20:53:31--  https://heibox.uni-heidelberg.de/seafhttp/files/0cc07b02-72f5-4615-a2ac-ace188cf0ed0/last.ckpt
    Reusing existing connection to heibox.uni-heidelberg.de:443.
    HTTP request sent, awaiting response... 200 OK
    Length: 957954257 (914M) [application/octet-stream]
    Saving to: ‘logs/vqgan_imagenet_f16_1024/checkpoints/last.ckpt’
    
    logs/vqgan_imagenet   0%[                    ]       0  --.-KB/s    in 29s     
    
    2021-03-24 20:54:01 (0.00 B/s) - Connection closed at byte 0. Retrying.
    
    --2021-03-24 20:54:02--  (try: 2)  https://heibox.uni-heidelberg.de/seafhttp/files/0cc07b02-72f5-4615-a2ac-ace188cf0ed0/last.ckpt
    Connecting to heibox.uni-heidelberg.de (heibox.uni-heidelberg.de)|129.206.7.113|:443... connected.
    HTTP request sent, awaiting response... 502 Proxy Error
    2021-03-24 20:54:32 ERROR 502: Proxy Error.
    
    --2021-03-24 20:54:32--  https://heibox.uni-heidelberg.de/f/6ecf2af6c658432c8298/?dl=1
    Resolving heibox.uni-heidelberg.de (heibox.uni-heidelberg.de)... 129.206.7.113
    Connecting to heibox.uni-heidelberg.de (heibox.uni-heidelberg.de)|129.206.7.113|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://heibox.uni-heidelberg.de/seafhttp/files/3dbcbfc9-5824-4909-8237-df3035a8d83b/model.yaml [following]
    --2021-03-24 20:54:32--  https://heibox.uni-heidelberg.de/seafhttp/files/3dbcbfc9-5824-4909-8237-df3035a8d83b/model.yaml
    Reusing existing connection to heibox.uni-heidelberg.de:443.
    HTTP request sent, awaiting response... 502 Proxy Error
    2021-03-24 20:55:02 ERROR 502: Proxy Error.
    
    --2021-03-24 20:55:03--  https://heibox.uni-heidelberg.de/f/867b05fc8c4841768640/?dl=1
    Resolving heibox.uni-heidelberg.de (heibox.uni-heidelberg.de)... 129.206.7.113
    Connecting to heibox.uni-heidelberg.de (heibox.uni-heidelberg.de)|129.206.7.113|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://heibox.uni-heidelberg.de/seafhttp/files/5baa72fd-1411-420c-b711-69aeacccbf8d/last.ckpt [following]
    --2021-03-24 20:55:03--  https://heibox.uni-heidelberg.de/seafhttp/files/5baa72fd-1411-420c-b711-69aeacccbf8d/last.ckpt
    Reusing existing connection to heibox.uni-heidelberg.de:443.
    HTTP request sent, awaiting response...
    
    I can't access the checkpoints and can no longer use them for our implementation. Please host them on a different server!
    opened by fractaldna22 4
  • pip installable package

    Hi! Thank you for the great paper :)

    I am the owner of https://github.com/lucidrains/DALLE-pytorch and was thinking of offering the users a way to train DALL-E using your pretrained VQ-GAN, specifically the one with a codebook of size 1024. https://github.com/lucidrains/DALLE-pytorch/pull/75 I was wondering if you would be open to making your repository a pip installable package, with all the necessary dependencies (omegaconf and pytorch-lightning), so that it could be installed with

    $ pip install taming-transformers
    

    followed by

    from taming.model.vqgan import VQModel
    
    opened by lucidrains 4
  • Starting point for training sflckr

    Hey guys, great work! I'm trying to run training on a dataset similar to your sflckr. However, I'm hitting this error immediately after validation or training starts, right after "Summoning checkpoint.":

        assert t <= self.block_size, "Cannot forward, model block size is exhausted."
        AssertionError: Cannot forward, model block size is exhausted.

    Assuming this was GPU memory related, I reduced the model size, but the error persisted. So I started to think that perhaps this has something to do with the configuration. My starting point is your sflckr.yaml:

        python main.py --base configs/sflckr.yaml -t True --gpus 0,

    Any hints are highly appreciated. Thanks!

    opened by ink1 4
  • pytorch light issue

    ImportError                               Traceback (most recent call last)
    in
          7 sys.path.append(".")
          8 sys.path.append('./taming-transformers')
    ----> 9 from taming.models import vqgan # checking correct import from taming

    1 frames
    /content/taming-transformers/main.py in
         10 from pytorch_lightning.trainer import Trainer
         11 from pytorch_lightning.callbacks import ModelCheckpoint, Callback, LearningRateMonitor
    ---> 12 from pytorch_lightning.utilities.distributed import rank_zero_only
         13
         14 from taming.data.utils import custom_collate

    ImportError: cannot import name 'rank_zero_only' from 'pytorch_lightning.utilities.distributed' (/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/distributed.py)

    This is from the example Colab for super-resolution.

    opened by willyk26 3
  • about sflckr transformer config

    I trained the sflckr first stage and cond stage respectively but when I try to train the transformer I run into: IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1) in the forward function of the transformer. I use the Examples class from sflckr.py to load the data which worked fine for training the cond stage (one pointing to train and one to val directory). Any advice would be greatly appreciated.

    opened by coralreefman 3
  • VQGAN training details

    Hi,

    Thanks for the great repo! Could I ask some questions about training VQGAN?

    What batch size did you train it with, and for how long? Also I see here that you wait until you add the discriminator loss https://www.youtube.com/watch?v=fy153-yXSQk

    Do you wait until the model has converged without it before adding it?

    Thanks!

    opened by Andrew-Brown1 3
  • 'Labelator' object has no attribute 'decode'

    When I try to use the ImageNet transformer, I get this error. I checked, and there is no decode function in the Labelator class.

    Labelator class is located at "taming/modules/util/Labelator"

    @pesser @rromb @tgisaturday @rom1504 @carmocca

    opened by halfbloodprincecode 0
  • pytorch still raised "out of memory" but my PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"

    Am I reading this correctly, that my NVIDIA free memory is 451MiB? If so, why does PyTorch still raise an "out of memory" exception?

    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.39 GiB (GPU 0; 6.00 GiB total capacity; 4.04 GiB already allocated; 478.00 MiB free; 4.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    (venv) PS C:\projects\imageai\venv\stable-diffusion> nvidia-smi
    Mon Dec 19 13:05:20 2022

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 527.41       Driver Version: 527.41       CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0  On |                  N/A |
    | N/A   53C    P8     8W /  N/A |    451MiB /  6144MiB |      6%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    opened by uranity 0
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Configs for some models

    Hello,

    First of all, awesome work and awesome repo! Thanks for having shared your pre-trained models!

    For VQGAN OpenImages (f=8), 256 and VQGAN OpenImages (f=8), 16384, the provided zip files only contain model checkpoints. Could you please share the configs as well?

    Thank you in advance for your response.

    opened by netw0rkf10w 1
  • code for classifier-guided rejection sampling

    Hi,

    I would like to know whether you could provide code or show the resources for sampling with classifier guidance on ImageNet using ResNet model? This would be very helpful thanks!

    opened by xinmiaolin 0
  • Did lr(learning rate) scheduler was used?

    Hi CompVis group, thanks for your overwhelming work. I'd love to use your method in my project.

    However, I found that no learning rate scheduler (lr scheduler) is used for training. More precisely, in ./taming/models/vqgan.py, VQGAN.configure_optimizers only returns two optimizers. And self.learning_rate is set via model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr (in main.py), which is a constant.

        def configure_optimizers(self):
            # `self.learning_rate` is assigned from outside the class
            lr = self.learning_rate
            opt_ae = torch.optim.Adam(list(self.encoder.parameters())+
                                      list(self.decoder.parameters())+
                                      list(self.quantize.parameters())+
                                      list(self.quant_conv.parameters())+
                                      list(self.post_quant_conv.parameters()),
                                      lr=lr, betas=(0.5, 0.9))
            opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
                                        lr=lr, betas=(0.5, 0.9))
            return [opt_ae, opt_disc], []
    

    That seems strange. To my knowledge, when training a big model from scratch, the learning rate should be adjusted with a scheduler, since a large lr is needed at the beginning and a small lr for later steps.

    Did I misunderstand anything? Could you please give me a hint? Thanks in advance!

    opened by Maxlinn 2
Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg