PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Overview

Masked Autoencoders: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners:

@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
  • The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.

  • This repo is a modification of the DeiT repo. Installation and preparation follow that repo.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.
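For reference, the commonly applied fix (a sketch of the community workaround, assuming a stock timm 0.3.2 install; this is not text from this repo) edits timm/models/layers/helpers.py, whose torch._six import no longer resolves on newer PyTorch:

    # timm/models/layers/helpers.py (timm 0.3.2)
    # Original import, which fails on newer PyTorch:
    #   from torch._six import container_abcs
    # Drop-in replacement from the standard library:
    import collections.abc as container_abcs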

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code
  • Pre-training code

Visualization demo

Run our interactive visualization demo using the Colab notebook (no GPU needed).

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

                         ViT-Base   ViT-Large   ViT-Huge
pre-trained checkpoint   download   download    download
md5                      8cad7c     b8b06e      9bdbb0

The fine-tuning instruction is in FINETUNE.md.
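For a quick sense of what fine-tuning looks like (a sketch only; the exact hyper-parameters in FINETUNE.md take precedence, and /path/to/imagenet is a placeholder), the recipe launches main_finetune.py on a pre-trained checkpoint along these lines:

    python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
        --model vit_base_patch16 --finetune mae_pretrain_vit_base.pth \
        --batch_size 32 --epochs 100 \
        --blr 5e-4 --layer_decay 0.65 --weight_decay 0.05 --drop_path 0.1 \
        --data_path /path/to/imagenet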

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

                                  ViT-B   ViT-L   ViT-H   ViT-H448   prev best
ImageNet-1K (no external data)    83.6    85.9    86.9    87.8       87.1

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

ImageNet-Corruption (error rate)  51.7    41.8    33.8    36.8       42.5
ImageNet-Adversarial              35.9    57.1    68.2    76.7       35.8
ImageNet-Rendition                48.3    59.9    64.4    66.5       48.7
ImageNet-Sketch                   34.5    45.3    49.6    50.9       36.0

The following are transfer-learning results from fine-tuning the pre-trained MAE on the target dataset:

iNaturalist 2017                  70.5    75.7    79.3    83.4       75.4
iNaturalist 2018                  75.4    80.1    83.0    86.8       81.2
iNaturalist 2019                  80.5    83.4    85.7    88.3       84.1
Places205                         63.9    65.8    65.9    66.8       66.0
Places365                         57.9    59.4    59.8    60.3       58.0

Pre-training

The pre-training instruction is in PRETRAIN.md.
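For orientation (a sketch assuming the flags that appear in the comments below; PRETRAIN.md has the recommended settings, and /path/to/imagenet is a placeholder), pre-training launches main_pretrain.py through the distributed launcher, e.g.:

    python -m torch.distributed.launch --nproc_per_node=8 main_pretrain.py \
        --batch_size 64 --model mae_vit_base_patch16 \
        --norm_pix_loss --mask_ratio 0.75 --epochs 800 \
        --blr 1.5e-4 --weight_decay 0.05 \
        --data_path /path/to/imagenet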

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Comments
  • Problems about reproducing ViT-Base

    Thanks for your great work. My reproduction result for ViT-B is only 83.3, which is 0.3 lower than the paper's result, and I have no idea what may cause this. My experiment follows this repo exactly, except for the following changes: (a) I use 32 V100s with batch size 128 (32 x 128 = 4096), while the recommended setting in this repo is 64 V100s with batch size 64 (64 x 64 = 4096); (b) I didn't use submitit_pretrain.py but directly used main_pretrain.py for multi-node training.

    Are these two changes potential causes of the performance gap? Any suggestions would be deeply appreciated.

    opened by leeyegy 13
  • __init__() got an unexpected keyword argument 'qk_scale'

    import sys
    sys.path.append('..')
    import models_mae
    dir(models_mae.Block)

    The module is imported successfully, but an error occurs in the Block class. I'm using it in a Jupyter notebook. Any idea what's wrong?
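    A likely cause, for reference: timm versions newer than 0.3.2 removed the qk_scale argument from the vision transformer Attention/Block classes, and models_mae passes it when constructing Block, hence the TypeError. Pinning the version this repo is built against (see the note at the top of this README) avoids the error:

        pip install timm==0.3.2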

    opened by zlgenuine 8
  • How to pretrain MAE on 1 node with 8 GPUs?

    I'm trying to pretrain a vit_small model on 1 node with 8 GPUs, but submitit_pretrain.py fails with:

    Traceback (most recent call last):
      File "submitit_pretrain.py", line 131, in <module>
        main()
      File "submitit_pretrain.py", line 89, in main
        args.job_dir = get_shared_folder() / "%j"
      File "submitit_pretrain.py", line 39, in get_shared_folder
        raise RuntimeError("No shared folder available")
    RuntimeError: No shared folder available

    It seems that I have no shared folder, so how can I pretrain MAE on 1 node with 8 GPUs? Thanks for your exciting work!!
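    For reference: submitit_pretrain.py targets SLURM clusters with a shared checkpoint folder, which is exactly what this RuntimeError is about. On a single node, submitit isn't needed; main_pretrain.py can be launched directly with the distributed launcher, along the lines of the sketch under "Pre-training" above:

        python -m torch.distributed.launch --nproc_per_node=8 main_pretrain.py [flags as above]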

    opened by yanjk3 8
  • Does this implementation support timm > 0.3.2?

    Hi,

    Thanks for releasing this awesome work!

    I noticed that the code checks the timm version:

    assert timm.__version__ == "0.3.2" # version check
    

    Does this implementation support a newer timm, e.g. 0.4.12?

    Thanks!

    Best, Vera

    opened by verazuo 6
  • distributed training has the same speed as single gpu training

    First, thanks for the great work. Unfortunately, I've run into something really weird. I tried to train MAE on my single node with:

        python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py --batch_size 16 --model mae_vit_base_patch16 --norm_pix_loss --mask_ratio 0.75 --epochs 800 --warmup_epochs 0 --blr 1.5e-4 --weight_decay 0.05 --data_path /home/shaoshihao

    The weird thing is that it runs at the same speed as on a single GPU with:

        python main_pretrain.py --batch_size 64 --model mae_vit_base_patch16 --norm_pix_loss --mask_ratio 0.75 --epochs 800 --warmup_epochs 0 --blr 1.5e-4 --weight_decay 0.05 --data_path /home/shaoshihao

    I tracked the GPU status and it looks good: all 4 GPUs are at ~100% utilization when I use 4 GPUs.

    Any advice will be deeply appreciated.

    opened by LouieShao 5
  • The provided pre-trained checkpoint doesn't match the model in the code?

    Why is the model structure in the provided ViT-Base pre-trained checkpoint different from the one in models_mae.py? The cls_token layer exists in the ViT-Base checkpoint you provide, but not in models_mae.py.

    opened by CGZQQQ 5
  • loss fluctuates during training

    Thanks for the great work!

    I wanted to pre-train a mae_vit_base_patch16 model on ImageNet-1K, and I found that the loss fluctuates during training. I was wondering if this is normal for pre-training MAE. [screenshot: training loss curve]

    opened by cailk 4
  • ValueError: relative path can't be expressed as a file URI

    I'm also having the same issue as #55 and #57 when running pretraining. I have /checkpoint, /checkpoint/{USER}, and /checkpoint/{USER}/experiments. It's unclear how to set up --job_dir, and I'm getting the following error. Any updates or suggestions? I'd appreciate any help!

    Traceback (most recent call last):
      File "submitit_pretrain.py", line 131, in <module>
        main()
      File "submitit_pretrain.py", line 120, in main
        args.dist_url = get_init_file().as_uri()
      File "/anaconda/envs/mae/lib/python3.6/pathlib.py", line 721, in as_uri
        raise ValueError("relative path can't be expressed as a file URI")
    ValueError: relative path can't be expressed as a file URI
    
    opened by kwhuang88228 3
  • ADE20k Learning Rate

    Section A.4 of the paper says "We search for the optimal lr for each entry in Table 5" when referring to segmentation on ADE20k. Could you share these learning rates? My reproduced baselines are about 2 points too low.

    Thanks!

    opened by cjrd 3
  • Can you provide more results about the ViT-B Mask R-CNN+MAE 25ep/50ep

    Thanks for your awesome work! I am doing some experiments with MAE. Could you provide more results (e.g. mask AP) for these two models (ViT-B Mask R-CNN + MAE, 25ep/50ep)? I want to cite these numbers in my paper. Thanks! Looking forward to your reply.

    opened by czczup 3
  • Release of the linear probing code

    Hey,

    Thanks for this awesome code! I am wondering whether you plan to release the linear probing code. According to the third-party MAE repo, the linear probing results seem hard to reproduce. It would be great if you did. Thanks!

    opened by Jeff-LiangF 3
  • [Question] Why non-masked patches look worse in pixel reconstruction example image

    Hi, in this image: https://user-images.githubusercontent.com/11435359/147859292-77341c70-2ed8-4703-b153-f505dcb6f2f8.png Why do the patches that aren't masked look worse?

    opened by austinmw 0
  • Shouldn't the patch embeddings be trained only on the patches that survived masking? (Rather than the original image)

    In the code, it seems that the patch embeddings are computed on the original image rather than only on the patches that survive the masking process. Does this mean the implementation does not follow the paper? From the paper:

    "MAE encoder. Our encoder is a ViT [16] but applied only on visible, unmasked patches. Just as in a standard ViT, our encoder embeds patches by a linear projection with added positional embeddings, and then processes the resulting set via a series of Transformer blocks. However, our encoder only operates on a small subset (e.g., 25%) of the full set. Masked patches are removed; no mask tokens are used. This allows us to train very large encoders with only a fraction of compute and memory. The full set is handled by a lightweight decoder, described next."

    From my interpretation, it seems the embeddings should also be computed only on the small subset of patches.
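    For context, a minimal sketch of the masking step (shapes are toy values; the logic mirrors random_masking in models_mae.py): the patch embedding is an independent per-patch linear projection, so projecting all patches and then discarding the masked ones is equivalent to projecting only the visible ones, and the Transformer blocks only ever see the kept tokens.

        import torch

        # Toy sizes: batch of 2 images, 196 patches, embedding dim 768.
        B, L, D, mask_ratio = 2, 196, 768, 0.75
        patch_tokens = torch.randn(B, L, D)  # output of patch_embed (per-patch linear projection)

        # Random masking, as in models_mae.random_masking: keep a random subset per image.
        len_keep = int(L * (1 - mask_ratio))
        noise = torch.rand(B, L)                   # one random score per patch
        ids_shuffle = torch.argsort(noise, dim=1)  # ascending: smallest scores are kept
        ids_keep = ids_shuffle[:, :len_keep]

        # Gather only the visible tokens; the Transformer blocks then run on these
        # len_keep tokens, so masked patches never enter the encoder.
        x_visible = torch.gather(patch_tokens, dim=1,
                                 index=ids_keep.unsqueeze(-1).repeat(1, 1, D))
        print(x_visible.shape)  # torch.Size([2, 49, 768])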

    opened by Eduard6421 0
  • Something about the training set

    I have a question: does the pretrained model work on unlabelled images, or on other datasets containing different images that haven't been classified beforehand? Thank you!

    opened by Daniel12345abcde 0
  • AttributeError: module 'numpy' has no attribute 'float'

    If you are getting this error, your numpy version is too new; this repository seems to be incompatible with newer numpy (np.float was removed in numpy 1.24). I had to downgrade numpy with the following command:

    pip install "numpy<1.24"

    opened by apple2373 0
Owner
Meta Research
A PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

FlyEgle 214 Dec 29, 2022
Re-implementation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Peng Qiao 1 Dec 14, 2021
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Multimedia Computing Group, Nanjing University 697 Jan 7, 2023
ConvMAE: Masked Convolution Meets Masked Autoencoders

Alpha VL Team of Shanghai AI Lab 345 Jan 8, 2023
Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 385 Jan 6, 2023
Contains code for the paper "Vision Transformers are Robust Learners".

Sayak Paul 103 Jan 5, 2023
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

Katsuya Hyodo 24 Mar 2, 2022
High-performance cross-platform inference engine; you can run Anakin on x86-cpu, arm, nv-gpu, amd-gpu, bitmain and cambricon devices.

null 514 Dec 28, 2022
GrabGpu_py: a script for grabbing a GPU when it is free

tianyuluan 3 Jun 18, 2022
Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)

Enrico Fini 73 Dec 18, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

CatBoost 6.9k Jan 4, 2023
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

Andrej 498 Dec 30, 2022
PyTorch implementation of MaskGIT: Masked Generative Image Transformer

Dominic Rampas 247 Dec 16, 2022
PyTorch Autoencoders - Implementing a Variational Autoencoder (VAE) Series in PyTorch.

Subin An 8 Nov 21, 2022
Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Julius Kunze 26 Oct 5, 2022
Official implementation of the paper "AAVAE: Augmentation-Augmented Variational Autoencoders"

Grid AI Labs 48 Dec 12, 2022
The official code of "Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners".

Yabin Zhang 26 Dec 26, 2022
Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun

ARAE Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun https://arxiv.org/abs/1706.04223 Disc

Junbo (Jake) Zhao 399 Jan 2, 2023