PyTorch package for the discrete VAE used for DALL·E.

Comments
  • How to sample or generate a new image?

    Hi, this is great work! I am a little confused about how to generate a new image, though. Should I provide the sentence tokens and then use them to predict the image tokens? And where is the noise injected? I would really appreciate it if you could answer these questions. Thank you!

    opened by JohnDreamer 36
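This package contains only the discrete VAE, so the sampling step asked about here is not part of it. As a minimal sketch of what autoregressive sampling of image tokens conditioned on text tokens could look like: `toy_prior_logits` is a hypothetical stand-in for a trained transformer prior, and all names and sizes below are assumptions for illustration. In this kind of loop the randomness comes from drawing each token from the predicted categorical distribution, not from a separate noise input.

```python
import math
import random

def toy_prior_logits(text_tokens, image_tokens, vocab_size):
    """Hypothetical stand-in for a trained transformer prior.

    A real model would return logits for the next image token given the
    text tokens and the image tokens generated so far. Here the logits are
    pseudo-random so that the sketch runs on its own.
    """
    rng = random.Random(1000 * sum(text_tokens) + len(image_tokens))
    return [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]

def sample_token(logits, temperature, rng):
    """Draw one token index from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def sample_image_tokens(text_tokens, num_tokens, vocab_size, temperature=1.0, seed=0):
    """Generate exactly num_tokens image tokens, one at a time."""
    rng = random.Random(seed)
    image_tokens = []
    for _ in range(num_tokens):
        logits = toy_prior_logits(text_tokens, image_tokens, vocab_size)
        image_tokens.append(sample_token(logits, temperature, rng))
    return image_tokens

# Small sizes keep the demo fast; the sizes discussed for DALL·E are a
# 32 x 32 grid (1024 tokens) and a codebook of 2**13 = 8192 entries.
tokens = sample_image_tokens(text_tokens=[5, 17, 42], num_tokens=16, vocab_size=64)
print(len(tokens))
```

With a real prior, the sampled tokens would then be reshaped into a grid, one-hot encoded, and passed to the dVAE decoder from this package.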
  • Error on executing usage.ipynb notebook on a cuda:0 device

    I changed this line as suggested to use the GPU:

    # This can be changed to a GPU, e.g. 'cuda:0'.
    dev = torch.device('cuda:0')
    

    And I tried to execute the notebook. I got the following error message:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /tmp/ipykernel_11/3257249919.py in <module>
          1 import torch.nn.functional as F
          2 
    ----> 3 z_logits = enc(x)
          4 z = torch.argmax(z_logits, axis=1)
          5 z = F.one_hot(z, num_classes=enc.vocab_size).permute(0, 3, 1, 2).float()
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1109                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1110             return forward_call(*input, **kwargs)
       1111         # Do not call functions when jit is used
       1112         full_backward_hooks, non_full_backward_hooks = [], []
    
    /opt/conda/lib/python3.8/site-packages/dall_e/encoder.py in forward(self, x)
         91                         raise ValueError('input must have dtype torch.float32')
         92 
    ---> 93                 return self.blocks(x)
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1109                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1110             return forward_call(*input, **kwargs)
       1111         # Do not call functions when jit is used
       1112         full_backward_hooks, non_full_backward_hooks = [], []
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
        139     def forward(self, input):
        140         for module in self:
    --> 141             input = module(input)
        142         return input
        143 
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1109                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1110             return forward_call(*input, **kwargs)
       1111         # Do not call functions when jit is used
       1112         full_backward_hooks, non_full_backward_hooks = [], []
    
    /opt/conda/lib/python3.8/site-packages/dall_e/utils.py in forward(self, x)
         41                         w, b = self.w, self.b
         42 
    ---> 43                 return F.conv2d(x, w, b, padding=(self.kw - 1) // 2)
         44 
         45 def map_pixels(x: torch.Tensor) -> torch.Tensor:
    
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper___slow_conv2d_forward)
    
    opened by esparig 4
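The error above says that the input and the convolution weights ended up on different devices. A minimal sketch of the usual remedy, assuming a plain `nn.Conv2d` as a stand-in for the encoder (the real one is loaded from this package and may need different handling): move both the module and the input tensor to the same device before calling the model.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
dev = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Stand-in for the encoder (hypothetical; not the dall_e encoder itself).
enc = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(dev)

# The input must live on the same device as the weights.
x = torch.rand(1, 3, 32, 32).to(dev)

z_logits = enc(x)
print(z_logits.device)
```

In the notebook, the equivalent change would presumably be to call `.to(dev)` on the preprocessed image tensor before passing it to `enc`.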
  • KL Loss

    I am having trouble getting the dVAE to train properly when I include the KL loss term with a uniform prior over the visual tokens. Has anyone here had similar experiences or problems? The paper mentions an increasing schedule for the KL weight, but I can't get it to work, and results are always better if I drop the KL loss altogether.

    Maybe someone can help?

    opened by CDitzel 4
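For reference, the KL term against a uniform prior over K tokens has a simple closed form, KL(q || uniform) = log K - H(q). The sketch below computes it for one categorical distribution and pairs it with a linear warm-up of the KL weight. The linear shape, the function names, and the parameters are assumptions for illustration, not the schedule used by the authors.

```python
import math

def kl_to_uniform(probs):
    """KL(q || uniform) for one categorical distribution q: log K - H(q)."""
    k = len(probs)
    # Terms with p == 0 contribute nothing (0 * log 0 is taken as 0).
    return math.log(k) + sum(p * math.log(p) for p in probs if p > 0.0)

def kl_weight(step, ramp_steps, max_weight):
    """Linear warm-up of the KL weight from 0 to max_weight (assumed shape)."""
    return max_weight * min(1.0, step / ramp_steps)

def total_loss(reconstruction_loss, probs, step, ramp_steps, max_weight):
    """Reconstruction term plus the weighted KL term."""
    return reconstruction_loss + kl_weight(step, ramp_steps, max_weight) * kl_to_uniform(probs)

# A peaked posterior pays a larger KL penalty than a flat one.
print(kl_to_uniform([0.25, 0.25, 0.25, 0.25]))
print(kl_to_uniform([0.97, 0.01, 0.01, 0.01]))
```

Because the KL term is zero only when the encoder output is uniform, a large weight early in training pushes the encoder toward ignoring the image, which is one reason a gradual ramp is commonly used.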
  • Hyperparameters of the bottleneck

    Thanks for releasing the paper as well as the code!

    Could you give some hints on which bottleneck hyperparameters might affect performance?

    1. The downsampling ratio. The original VQ-VAE paper uses only 4x downsampling (compared to 8x in DALL-E), and its generated images seem to lack global structure (I assume this is also because it did not use a powerful prior model). Does a higher downsampling rate preserve global structure better, or make it easier for the prior model to learn?

    2. The codebook size is set to 2^13. Have you tried a smaller codebook? Presumably, as the codebook shrinks, the VAE struggles to reconstruct the image. What do reconstructions with a small codebook look like? Is the texture still preserved while the global structure is distorted, or something else?

    I also have a question about the inference stage of the model. Are the image tokens sampled from the prior transformer autoregressively, or with some other search technique? Also, how do you ensure that exactly 32 * 32 image tokens are generated?

    opened by cdjhz 4
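The numbers in this question are tied together by simple arithmetic, sketched below. The 256-pixel image side is an assumption inferred from the 32 * 32 grid and the 8x downsampling mentioned above; the helper names are hypothetical.

```python
import math

def token_grid(image_size, downsample_factor):
    """Return (grid side, total tokens) for a square image."""
    side = image_size // downsample_factor
    return side, side * side

def bits_per_image(num_tokens, codebook_size):
    """Upper bound on the information carried by the token grid, in bits."""
    return num_tokens * math.log2(codebook_size)

side_8x, tokens_8x = token_grid(256, 8)   # 8x downsampling, as in DALL-E
side_4x, tokens_4x = token_grid(256, 4)   # 4x downsampling, as in the VQ-VAE comparison

print(side_8x, tokens_8x)
print(side_4x, tokens_4x)
print(bits_per_image(tokens_8x, 2 ** 13))
```

Halving the downsampling factor quadruples the sequence length the prior has to model, which is one practical cost of a lower ratio.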
  • Implementation Doubts

    Although this codebase covers only the VAE part, I would appreciate help understanding a few components of the transformer part as well. In the paper and the blog post, you mention that you follow the Child et al. paper. Can you elaborate on the block size you use (8, 16, or 32)? For example, with a block size of 16, how do you implement the convolutional kernel? It has gaps of only 1 block, which doesn't seem to make sense with sparse blocks of size 16.

    Also, when training the GPT-style model, the loss and perplexity decrease, but how do you decide when the loss or perplexity of the 1.2B-parameter model is good enough? For example, is a loss of 4 or 5 good, or should it be below 1?

    opened by shubhamag97 3
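One way to put a loss value in context is to convert it to perplexity and compare it with the loss of a uniform guess over the vocabulary. The sketch below assumes the loss is a per-token cross-entropy in nats and uses the 2^13 image codebook size as the vocabulary; it does not say what value is sufficient for a given model, and the function names are hypothetical.

```python
import math

def perplexity(loss_nats):
    """Perplexity corresponding to a per-token cross-entropy in nats."""
    return math.exp(loss_nats)

def uniform_baseline_loss(vocab_size):
    """Cross-entropy of guessing uniformly over the vocabulary."""
    return math.log(vocab_size)

baseline = uniform_baseline_loss(2 ** 13)
for loss in (1.0, 4.0, 5.0):
    # A loss below the baseline means the model beats uniform guessing.
    print(loss, perplexity(loss), loss < baseline)
```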
  • Help

    Hello, I am brand new to the GitHub community and to coding, and I have no idea how to install this. I'm an artist, and it would be an excellent resource for non-copyrighted images. I know it's a lot to ask, but can someone please tell me how to install this code? I made my account specifically for this.

    opened by DandelionBones 3
  • questions on notebook

    I just downloaded the repo to my local file system, opened the notebook in Jupyter Notebook, and ran it. I also downloaded the encoder and decoder to the same folder for ease of loading. It says that 'preprocess' is not defined, but it seems to be. Admittedly, I am a bit rusty. I am running Python 3.9 on macOS. Also, I may be way out of line with respect to the purpose, but I was expecting to see code that takes natural language input (e.g. "Show me a penguin on snow") and has DALL·E return a matching image.

    Originally posted by @metaphorz in https://github.com/openai/DALL-E/issues/5#issuecomment-787310649

    opened by metaphorz 3
  • an TypeError

    PyTorch 1.7.1, torchvision 0.8.2. Running the code produces the following error:

    Traceback (most recent call last):
      File "E:/github/DALL-E-master/test.py", line 46, in <module>
        display(T.ToPILImage(mode='RGB')(x[0]))
      File "C:\ProgramData\Anaconda3\lib\site-packages\torchvision\transforms\transforms.py", line 185, in __call__
        return F.to_pil_image(pic, self.mode)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torchvision\transforms\functional.py", line 202, in to_pil_image
        'not {}'.format(type(npimg)))
    TypeError: Input pic must be a torch.Tensor or NumPy ndarray, not <class 'numpy.ndarray'>

    opened by shen51000 2
  • Why do we need logit_laplace_eps in utils.py?

    What is the purpose of logit_laplace_eps here, given that both the input and the output are tensors in [0, 1]? https://github.com/openai/DALL-E/blob/5be4b236bc3ade6943662354117a0e83752cc322/dall_e/utils.py#L51

    opened by cientgu 2
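A likely reason for the epsilon: a logit-based likelihood needs the logit of each pixel value, and the logit of exactly 0 or 1 is infinite. Squeezing pixels from [0, 1] into [eps, 1 - eps] keeps it finite. The sketch below is a scalar, plain-Python illustration of that idea. The value 0.1 and the exact form of the two mappings are assumptions based on my reading of the linked file, not a copy of it.

```python
import math

logit_laplace_eps = 0.1  # assumed value; check dall_e/utils.py for the actual constant

def map_pixel(x):
    """Squeeze a pixel value from [0, 1] into [eps, 1 - eps]."""
    return (1 - 2 * logit_laplace_eps) * x + logit_laplace_eps

def unmap_pixel(y):
    """Invert map_pixel and clamp the result back into [0, 1]."""
    x = (y - logit_laplace_eps) / (1 - 2 * logit_laplace_eps)
    return min(1.0, max(0.0, x))

def logit(p):
    """Inverse of the sigmoid; diverges as p approaches 0 or 1."""
    return math.log(p / (1 - p))

# Pure black and pure white now have finite logits.
print(logit(map_pixel(0.0)), logit(map_pixel(1.0)))
```

Without the mapping, `logit(0.0)` would raise a math domain error in this sketch, which is the scalar analogue of the divergence the epsilon avoids.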
  • Why the output dimension of the decoder is 2 * output_channels which is 6, not 3 (RGB)?

    Hi, thanks for the code!

    I have a simple question. I found that the output dimension of the decoder is set to 2 * self.output_channels, which is 6, whereas I expected it to be 3 (RGB). Could you kindly explain the reason?

    Thank you in advance!

    https://github.com/openai/DALL-E/blob/3381ae9a10bafe3cb1c7c9fff554565ad7751e7f/dall_e/decoder.py#L82

    opened by nashory 2
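A plausible reading is that the decoder predicts the parameters of a per-channel distribution, a location and a log-scale, so each RGB channel needs two numbers. The sketch below handles a single pixel in plain Python and assumes that the first three values are the locations and the last three the log-scales, and that a reconstruction uses only the locations. The channel order, the epsilon value, and the helper names are assumptions.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def split_decoder_output(stats):
    """Split 6 per-pixel values into (locations, log-scales), 3 each (assumed order)."""
    return stats[:3], stats[3:]

def reconstruct_rgb(stats, eps=0.1):
    """Turn the 3 location values into RGB in [0, 1]; eps is an assumed constant."""
    mu, _log_scale = split_decoder_output(stats)
    rgb = []
    for value in mu:
        squeezed = sigmoid(value)                  # lies in (0, 1)
        pixel = (squeezed - eps) / (1 - 2 * eps)   # undo the [eps, 1 - eps] squeeze
        rgb.append(min(1.0, max(0.0, pixel)))
    return rgb

# One pixel's worth of decoder output: 3 locations followed by 3 log-scales.
pixel_stats = [0.0, 2.0, -2.0, -1.0, -1.0, -1.0]
print(reconstruct_rgb(pixel_stats))
```

The log-scale values matter for the training likelihood but are ignored when only a point reconstruction is needed.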
  • Hi! It works? I for ex cant launch it. Why? Look!

    Screenshots of the installation:

    https://disk.yandex.ru/i/kCSzh6LodRkWlQ

    Can you explain what I should do here and how to launch the app?

    Thank you in advance.

    opened by mazzzai 2
Owner
OpenAI
PyTorch Autoencoders - Implementing a Variational Autoencoder (VAE) Series in Pytorch.

PyTorch Autoencoders Implementing a Variational Autoencoder (VAE) Series in Pytorch. Inspired by this repository Model List check model paper conferen

Subin An 8 Nov 21, 2022
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

Phil Wang 5k Jan 4, 2023
Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow.

Generative Models Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow. Also present here are RBM and Helmholtz Machine. Note: Gen

Agustinus Kristiadi 7k Jan 2, 2023
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Overview PyTorch 0.4.1 | Python 3.6.5 Annotated implementations with comparative introductions for minimax, non-saturating, wasserstein, wasserstein g

Shayne O'Brien 471 Dec 16, 2022
Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

Mathis Petrovich 248 Dec 23, 2022
Open-AI's DALL-E for large scale training in mesh-tensorflow.

DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode

EleutherAI 432 Dec 16, 2022
RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH ☃️ One Hyper-Modal Tr

Sber AI 230 Dec 31, 2022
CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Diverse Structure Inpainting ArXiv | Paper | Supplementary Material | BibTex This repository is for the CVPR 2021 paper, "Generating Diverse Structure

null 152 Nov 4, 2022
VideoGPT: Video Generation using VQ-VAE and Transformers

VideoGPT: Video Generation using VQ-VAE and Transformers [Paper][Website][Colab][Gradio Demo] We present VideoGPT: a conceptually simple architecture

Wilson Yan 470 Dec 30, 2022
Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

End-to-End Optimization of Scene Layout Code release for: End-to-End Optimization of Scene Layout CVPR 2020 (Oral) Project site, Bibtex For help conta

Andrew Luo 41 Dec 9, 2022
The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

THUHCSI 138 Oct 28, 2022
A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

pytorch-lifestream a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-si

Dmitri Babaev 103 Dec 17, 2022
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation This project hosts the code for implementing the DCT-MASK algorithms

Alibaba Cloud 57 Nov 27, 2022
Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ResDAVEnet-VQ Official PyTorch implementation of Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech What is in this repo? M

Wei-Ning Hsu 21 Aug 23, 2022
This is 2nd term discrete maths project done by UCU students that uses backtracking to solve various problems.

Backtracking Project Sponsors This is a project made by UCU students: Olha Liuba - crossword solver implementation Hanna Yershova - sudoku solver impl

Dasha 4 Oct 17, 2021
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Implementation of the method described in the Speech Resynthesis from Di

Facebook Research 253 Jan 6, 2023
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

torch-imle Concise and self-contained PyTorch library implementing the I-MLE gradient estimator proposed in our NeurIPS 2021 paper Implicit MLE: Backp

UCL Natural Language Processing 249 Jan 3, 2023
Auto HMM: Automatic Discrete and Continous HMM including Model selection

Auto HMM: Automatic Discrete and Continous HMM including Model selection

Chess_champion 29 Dec 7, 2022
This Jupyter notebook shows one way to implement a simple first-order low-pass filter on sampled data in discrete time.

How to Implement a First-Order Low-Pass Filter in Discrete Time We often teach or learn about filters in continuous time, but then need to implement t

Joshua Marshall 4 Aug 24, 2022