Pytorch implementation of MaskGIT: Masked Generative Image Transformer

Dominic Rampas

Last update: Dec 16, 2022

Related tags

Deep Learning MaskGIT-pytorch

Overview

MaskGIT-pytorch

Pytorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)

Note: this is work in progress

MaskGIT is an extension of the VQGAN paper that improves the second-stage transformer and leaves the first stage untouched. It replaces the unidirectional transformer with a bidirectional transformer. The second-stage training is similar to BERT: tokens are randomly masked out and the bidirectional transformer tries to predict them (the original work used a GPT architecture and randomly replaced tokens with other tokens). Unlike BERT, the masking percentage is not fixed; it is drawn uniformly between 0 and 1 for each batch. Furthermore, a new inference algorithm is proposed in which we start from a completely masked-out image and then iteratively sample vectors at the positions where the model has high confidence.
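The masking step described above can be sketched roughly as follows. This is a minimal illustration, not the code from transformer.py: the function name mask_tokens, the choice of mask_token_id and the tensor shapes are assumptions made for the example.

```python
import torch


def mask_tokens(z_indices: torch.Tensor, mask_token_id: int):
    """Replace a random fraction of tokens with mask_token_id.

    z_indices: (batch, seq_len) integer codebook indices.
    Returns (masked_indices, mask), where mask is True at masked positions.
    """
    batch, seq_len = z_indices.shape
    # One masking ratio for the whole batch, drawn uniformly from [0, 1).
    ratio = torch.rand(1).item()
    # Assumption for this sketch: always mask at least one token.
    num_masked = max(1, int(ratio * seq_len))
    # Give every position a random score and mask the num_masked lowest-ranked
    # positions in each row, so each image gets the same number of masks.
    scores = torch.rand(batch, seq_len, device=z_indices.device)
    ranks = scores.argsort(dim=1).argsort(dim=1)
    mask = ranks < num_masked
    masked_indices = torch.where(
        mask, torch.full_like(z_indices, mask_token_id), z_indices
    )
    return masked_indices, mask


# Example: 4 images of 16x16 = 256 tokens, codebook of 1024 entries (hypothetical sizes).
z = torch.randint(0, 1024, (4, 256))
masked, mask = mask_tokens(z, mask_token_id=1024)
```

The transformer would then be trained to predict the original z from masked.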

If you are only interested in the part of the code that comes from this paper, check out transformer.py.

Run the code

The code is ready for training both the VQGAN and the Bidirectional Transformer, and can also be used for inference.

python training_vqgan.py

python training_transformer.py

(Make sure to edit the path for the dataset etc.)

TODO

Implement the gamma functions
Implement functions for image editing tasks: inpainting, extrapolation, image manipulation
Tune hyperparameters
(Provide visual results)
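For the first TODO item, a possible shape for the gamma (mask scheduling) functions is sketched below. The function names and the set of modes are assumptions for illustration, not code from this repository. The only property taken from the MaskGIT paper is that the schedule decreases from 1 (everything masked) to 0 (nothing masked) as decoding progresses.

```python
import math


def gamma(r: float, mode: str = "cosine") -> float:
    """Fraction of tokens that stay masked at decoding progress r in [0, 1]."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be in [0, 1]")
    if mode == "linear":
        return 1.0 - r
    if mode == "cosine":
        return math.cos(r * math.pi / 2)
    if mode == "square":
        return 1.0 - r ** 2
    if mode == "cubic":
        return 1.0 - r ** 3
    raise ValueError(f"unknown mode: {mode}")


def tokens_to_mask(step: int, total_steps: int, seq_len: int, mode: str = "cosine") -> int:
    """Number of tokens still masked after `step` of `total_steps` decoding steps."""
    return math.floor(gamma(step / total_steps, mode) * seq_len)


# Example schedule for 8 decoding steps over 256 tokens (hypothetical sizes).
schedule = [tokens_to_mask(t, 8, 256) for t in range(1, 9)]
```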

Comments

Pretrained model for VQGAN

Thank you for the implementation! I would like to tune the second-stage transformer, but my VQGAN trained on Flickr landscape dataset is not so good. I see there is a load function of 'vq_flickr.pt' and you have much better landscape results. Could you kindly share that? Thanks!

opened by choyingw 2

Question about the mask token id and sos token id

Hi, In transformer.py, I find mask_token_id is set to be args.num_image_tokens. Shouldn't it be the args.num_codebook_vectors? I think we don't want the mask token id to be one of those in the codebook. Similar thing for the sos token id.
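A small sketch of the layout this question suggests, where the special tokens sit just past the codebook range. The numbers and the embedding size are hypothetical and only the names mask_token_id, sos token and num_codebook_vectors come from the question. This is not the repository's current behaviour.

```python
import torch
import torch.nn as nn

num_codebook_vectors = 1024  # hypothetical codebook size

# Special tokens get ids outside the range of real codebook indices.
mask_token_id = num_codebook_vectors
sos_token_id = num_codebook_vectors + 1

# The token embedding must then cover the codebook plus the two special tokens.
vocab_size = num_codebook_vectors + 2
tok_emb = nn.Embedding(vocab_size, 64)

# A toy sequence: sos, a real token, a masked position, the last real token.
tokens = torch.tensor([[sos_token_id, 3, mask_token_id, num_codebook_vectors - 1]])
embedded = tok_emb(tokens)
```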

opened by larryzhang23 2

Learning rate & its scheduling

I cannot find the specific value of the learning rate or how the authors schedule it over the epochs. How did you implement this and reproduce the results in the paper?

opened by LeeDoYup 2

Isn't loss only supposed to be calculated on masked tokens?

In the training loop we have:

imgs = imgs.to(device=args.device)
logits, target = self.model(imgs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
loss.backward()

However, the output of the transformer is:

  _, z_indices = self.encode_to_z(x)
.
.
.
  a_indices = mask * z_indices + (~mask) * masked_indices

  a_indices = torch.cat((sos_tokens, a_indices), dim=1)

  target = torch.cat((sos_tokens, z_indices), dim=1)

  logits = self.transformer(a_indices)

  return logits, target

which means the returned target is the original unmasked image tokens.

The MaskGIT paper seems to suggest that the loss was only calculated on the masked tokens.
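One way to restrict the loss to the masked positions is sketched below. The helper name masked_cross_entropy is hypothetical. Note that here mask is True where the input token was masked, which is the opposite of the mask in the snippet above (there, True means the original token is kept). If an sos token is prepended, the mask needs one extra unmasked position for it.

```python
import torch
import torch.nn.functional as F


def masked_cross_entropy(logits: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Cross-entropy averaged over masked positions only.

    logits: (batch, seq_len, vocab), target: (batch, seq_len),
    mask: (batch, seq_len) bool, True where the input token was masked.
    """
    # Positions that were not masked get the ignore index, so they
    # contribute neither to the loss nor to the gradient.
    ignored_target = target.masked_fill(~mask, -100)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        ignored_target.reshape(-1),
        ignore_index=-100,
    )


# Toy example with hypothetical sizes.
logits = torch.randn(2, 5, 10)
target = torch.randint(0, 10, (2, 5))
mask = torch.tensor([[True, False, True, False, False],
                     [False, True, True, True, False]])
loss = masked_cross_entropy(logits, target, mask)
```

If no position is masked the mean is taken over zero elements, so the caller should make sure at least one token is masked.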

opened by EmaadKhwaja 0

Is Each VQGAN model of class TrainVQGAN and class VQmodel different?

I am going through your MaskGIT code to study how to implement it, thank you! I have a question about the VQGAN used for tokenization. I think the VQGAN used for tokenization and the VQGAN in training_vqgan.py are different from each other, because their parameters are not the same. If I am mistaken, please let me know. Thanks!

opened by 9B8DY6 0

Could you please provide a test file for image outpainting?

Hi dome, thanks for your implementation of MaskGIT. I still have a question about the training process: should I train the VQGAN first and then train the Transformer, using the same dataset? Thanks for your reply!

opened by huafei555 0

Lost Datasets landscape and flowers

When I run training_transformer.py and training_vqgan.py, I don't have the landscape and flowers datasets.

Could the author release the datasets in the repository?

opened by Shao-YJ 0

About Class-conditional Image Synthesis

Hi, thanks for open-sourcing this. It is great work. I want to ask a question about this paper. The bidirectional transformer is trained without any conditional input; it just tries to predict the masked tokens. But at inference time, for example when using the model for a class-conditional image synthesis task, how can the class condition information be used?

opened by yangdongchao 2

Owner

Dominic Rampas

I started coding in summer 2018 at the age of 17. Half a year later I found an interest in AI and ML and started to learn it. Mainly coding in Python.

GitHub

MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

498 Dec 30, 2022

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

2.3k Jan 4, 2023

PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

MAE for Self-supervised ViT Introduction This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-sup

36 Oct 30, 2022

An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners

An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners This is a coarse version for MAE, only make the pretrain model, the fine

214 Dec 29, 2022

Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

ImageProcessingTransformer Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

61 Jan 1, 2023

Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Minimal PyTorch implementation of Generative Latent Optimization This is a reimplementation of the paper Piotr Bojanowski, Armand Joulin, David Lopez-

117 Nov 27, 2022

SimMIM: A Simple Framework for Masked Image Modeling

SimMIM By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*. This repo is the official implementation of

181 Dec 10, 2021

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

12.6k Jan 9, 2023

Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

SeMask: Semantically Masked Transformers for Semantic Segmentation.

FocusFace: Multi-task Contrastive Learning for Masked Face Recognition

Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training