PyTorch implementation of MaskGIT: Masked Generative Image Transformer

Overview

MaskGIT-pytorch

PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)

Note: this is a work in progress.

MaskGIT is an extension of the VQGAN paper that improves the second-stage transformer and leaves the first stage untouched. It replaces the unidirectional transformer with a bidirectional one. The second-stage training is similar to BERT: tokens are randomly masked out and the bidirectional transformer tries to predict them (the original work used a GPT architecture and randomly replaced tokens with other tokens). Unlike BERT, the masking percentage is not fixed; it is drawn uniformly between 0 and 1 for each batch. Furthermore, a new inference algorithm is proposed: start from a completely masked-out image and then iteratively sample vectors where the model has high confidence.
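The masking step described above can be sketched as follows. This is a minimal illustration in plain Python lists rather than tensors, not the code from transformer.py. The function name `mask_tokens`, the `rng` parameter, and the choice of a `mask_token_id` outside the codebook range are assumptions made for the example.

```python
import random


def mask_tokens(z_indices, mask_token_id, rng=random):
    """Replace a random fraction of each token sequence with the mask token.

    z_indices: list of token-id sequences (one per image in the batch).
    mask_token_id: id used for masked positions (assumed here to lie
        outside the codebook range; hypothetical choice for this sketch).
    Returns (corrupted sequences, boolean masked flags, sampled ratio).
    """
    # One masking ratio per batch, drawn uniformly from [0, 1).
    ratio = rng.random()
    corrupted, masked = [], []
    for seq in z_indices:
        n_mask = round(ratio * len(seq))
        # Pick n_mask distinct positions to hide.
        hidden = set(rng.sample(range(len(seq)), n_mask))
        flags = [i in hidden for i in range(len(seq))]
        corrupted.append([mask_token_id if m else t for t, m in zip(seq, flags)])
        masked.append(flags)
    return corrupted, masked, ratio
```

The returned flags mark which positions were hidden, so a training loop can decide whether to score all positions or only the masked ones.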

If you are only interested in the part of the code that comes from this paper, check out transformer.py.
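For orientation, the confidence-based inference loop mentioned in the overview might look roughly like the sketch below. It is written in plain Python and is not taken from transformer.py. `predict_probs` is a hypothetical stand-in for the bidirectional transformer, and the cosine schedule for how many tokens stay masked after each step is an assumption of this sketch (one of several possible gamma functions).

```python
import math
import random


def iterative_decode(predict_probs, seq_len, mask_token_id, steps=8, rng=random):
    """Fill a fully masked sequence over several rounds, most confident first.

    predict_probs(tokens) is a hypothetical model call: for every position
    it returns a list of probabilities over the codebook entries.
    """
    tokens = [mask_token_id] * seq_len
    for step in range(1, steps + 1):
        masked_pos = [i for i, t in enumerate(tokens) if t == mask_token_id]
        if not masked_pos:
            break
        probs = predict_probs(tokens)
        # Sample one candidate per masked position and record its probability.
        candidates = {}
        for i in masked_pos:
            tok = rng.choices(range(len(probs[i])), weights=probs[i])[0]
            candidates[i] = (tok, probs[i][tok])
        # Assumed cosine schedule: how many positions remain masked afterwards.
        n_keep_masked = math.floor(seq_len * math.cos(math.pi / 2 * step / steps))
        # Always accept at least one token so the loop makes progress.
        n_keep_masked = min(n_keep_masked, len(masked_pos) - 1)
        n_accept = len(masked_pos) - n_keep_masked
        # Accept the most confident candidates; the others stay masked.
        best = sorted(masked_pos, key=lambda i: candidates[i][1], reverse=True)
        for i in best[:n_accept]:
            tokens[i] = candidates[i][0]
    return tokens
```

With this schedule only a few tokens are fixed in the early rounds and most are filled in near the end; on the last step the schedule reaches zero, so every remaining position is accepted.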

Run the code

The code is ready for training both the VQGAN and the bidirectional transformer, and it can also be used for inference.

python training_vqgan.py

python training_transformer.py

(Make sure to edit the path for the dataset etc.)

TODO

  • Implement the gamma functions
  • Implement functions for image editing tasks: inpainting, extrapolation, image manipulation
  • Tune hyperparameters
  • (Provide visual results)
Comments
  • Pretrained model for VQGAN

    Thank you for the implementation! I would like to tune the second-stage transformer, but my VQGAN trained on the Flickr landscape dataset is not very good. I see the code loads 'vq_flickr.pt', and you have much better landscape results. Could you kindly share it? Thanks!

    opened by choyingw 2
  • Question about the mask token id and sos token id

    Hi, in transformer.py I see that mask_token_id is set to args.num_image_tokens. Shouldn't it be args.num_codebook_vectors? I think we don't want the mask token id to be one of the ids in the codebook. The same applies to the sos token id.

    opened by larryzhang23 2
  • Learning rate & its scheduling

    I cannot find the specific value of the learning rate or how the authors schedule it over epochs. How did you implement this and reproduce the results in the paper?

    opened by LeeDoYup 2
  • Isn't loss only supposed to be calculated on masked tokens?

    In the training loop we have:

    imgs = imgs.to(device=args.device)
    logits, target = self.model(imgs)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
    loss.backward()
    

    However, the output of the transformer is:

      _, z_indices = self.encode_to_z(x)
      ...
      a_indices = mask * z_indices + (~mask) * masked_indices
      a_indices = torch.cat((sos_tokens, a_indices), dim=1)
      target = torch.cat((sos_tokens, z_indices), dim=1)
      logits = self.transformer(a_indices)
      return logits, target

    which means the returned target is the original unmasked image tokens.

    The MaskGIT paper seems to suggest that the loss was only calculated on the masked tokens.

    opened by EmaadKhwaja 0
  • Is Each VQGAN model of class TrainVQGAN and class VQmodel different?

    I am going through your MaskGIT code to study how to implement it, thank you! I have a question about the VQGAN used for tokenization. I think the VQGAN used for tokenization and the VQGAN in training_vqgan.py are different, because their parameters are not the same. If I am mistaken, please let me know. Thanks!

    opened by 9B8DY6 0
  • Could you please provide a test file for image outpainting?

    Hi dome, thanks for your implementation of MaskGIT. I still have a question about the training process: should I train the VQGAN first and then train the Transformer? Using the same dataset? Thanks for your reply!

    opened by huafei555 0
  • Lost Datasets landscape and flowers

    When running training_transformer.py and training_vqgan.py, I don't have the landscape and flowers datasets.

    Could the author release the datasets in the repository?

    opened by Shao-YJ 0
  • About Class-conditional Image Synthesis

    Hi, thanks for open-sourcing this, it is great work. I have a question about the paper. The bidirectional transformer is trained without any conditional input; it only tries to predict the masked tokens. But at inference time, for example when using the model for class-conditional image synthesis, how is the class condition used?

    opened by yangdongchao 2
Owner
Dominic Rampas
I started coding in the summer of 2018 at the age of 17. Half a year later I became interested in AI and ML and started learning them. I mainly code in Python.
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

Zhiliang Peng 2.3k Jan 4, 2023
PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

MAE for Self-supervised ViT Introduction This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-sup

null 36 Oct 30, 2022
An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners

An pytorch implementation of Masked Autoencoders Are Scalable Vision Learners This is a coarse version for MAE, only make the pretrain model, the fine

FlyEgle 214 Dec 29, 2022
Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

ImageProcessingTransformer Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

null 61 Jan 1, 2023
Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Minimal PyTorch implementation of Generative Latent Optimization This is a reimplementation of the paper Piotr Bojanowski, Armand Joulin, David Lopez-

Thomas Neumann 117 Nov 27, 2022
SimMIM: A Simple Framework for Masked Image Modeling

SimMIM By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*. This repo is the official implementation of

Microsoft 181 Dec 10, 2021
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Phil Wang 272 Dec 23, 2022
Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

mae-repo PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https:/

Peng Qiao 1 Dec 14, 2021
VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

Jiezhang Cao 225 Nov 13, 2022
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

This is a release of our VIMPAC paper to illustrate the implementations. The pretrained checkpoints and scripts will be soon open-sourced in HuggingFace transformers.

Hao Tan 74 Dec 3, 2022
EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling This is the official implementation for "Frustratingly Simple Pretraining Al

Atsuki Yamaguchi 31 Nov 18, 2022
The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

AI2 114 Jan 6, 2023
SeMask: Semantically Masked Transformers for Semantic Segmentation.

SeMask: Semantically Masked Transformers Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi This repo co

Picsart AI Research (PAIR) 186 Dec 30, 2022
FocusFace: Multi-task Contrastive Learning for Masked Face Recognition

FocusFace This is the official repository of "FocusFace: Multi-task Contrastive Learning for Masked Face Recognition" accepted at IEEE International C

Pedro Neto 21 Nov 17, 2022
Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

ADE20k Semantic segmentation with MAE Getting started Install the mmsegmentation

null 97 Dec 17, 2022
Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

MultiMAE: Multi-modal Multi-task Masked Autoencoders Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir Website | arXiv | BibTeX Official PyTo

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 385 Jan 6, 2023
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training [Arxiv] VideoMAE: Masked Autoencoders are Data-Efficient Learne

Multimedia Computing Group, Nanjing University 697 Jan 7, 2023
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Phil Wang 12.6k Jan 9, 2023