
Masked Autoencoders: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners:

@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
  • The original implementation was in TensorFlow+TPU; this re-implementation is in PyTorch+GPU.

  • This repo is a modification of the DeiT repo. Installation and preparation follow that repo.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+ (one common workaround is sketched below).
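
For reference, the breakage comes from timm 0.3.2 importing container_abcs from torch._six, an alias that newer PyTorch removed. Below is a minimal, hedged sketch of one commonly used workaround (restoring the alias before importing timm), not the repo's official fix; it assumes torch._six still exists as a module, which holds for the 1.x releases affected here but not for PyTorch 2.x:

    # Hedged compatibility shim, not the repo's official fix: timm==0.3.2
    # runs `from torch._six import container_abcs`, which fails once PyTorch
    # (1.8.1+ per the note above) removed that alias. Restoring it before
    # importing timm can unblock the import; torch._six is gone in 2.x.
    import collections.abc
    import torch

    six = getattr(torch, "_six", None)
    if six is not None and not hasattr(six, "container_abcs"):
        six.container_abcs = collections.abc  # restore the removed alias

    import timm  # should now import without the torch._six error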

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code
  • Pre-training code

Visualization demo

Run our interactive visualization demo using a Colab notebook (no GPU needed).

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

                            ViT-Base    ViT-Large   ViT-Huge
pre-trained checkpoint      download    download    download
md5                         8cad7c      b8b06e      9bdbb0

The fine-tuning instruction is in FINETUNE.md.
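
As an illustration, loading the converted ViT-Base weights into the fine-tuning model looks roughly like the sketch below. This is not the official fine-tuning script; the checkpoint URL is the one published for this repo's ViT-Base and may have moved, and the global_pool flag mirrors the repo's fine-tuning default:

    # Hedged sketch of loading a converted MAE checkpoint for fine-tuning.
    # models_vit is this repo's module; the URL is an assumption if hosting
    # has changed since publication.
    import torch
    import models_vit

    ckpt = torch.hub.load_state_dict_from_url(
        "https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth",
        map_location="cpu",
    )
    model = models_vit.vit_base_patch16(num_classes=1000, global_pool=True)
    # MAE checkpoints store weights under the "model" key; the classifier
    # head is newly initialized, so strict=False is expected here.
    msg = model.load_state_dict(ckpt["model"], strict=False)
    print(msg.missing_keys)  # typically the head.* parameters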

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

                                      ViT-B   ViT-L   ViT-H   ViT-H448   prev best
ImageNet-1K (no external data)        83.6    85.9    86.9    87.8       87.1

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

ImageNet-Corruption (error rate)      51.7    41.8    33.8    36.8       42.5
ImageNet-Adversarial                  35.9    57.1    68.2    76.7       35.8
ImageNet-Rendition                    48.3    59.9    64.4    66.5       48.7
ImageNet-Sketch                       34.5    45.3    49.6    50.9       36.0

The following are transfer learning results, obtained by fine-tuning the pre-trained MAE on the target dataset:

iNaturalist 2017                      70.5    75.7    79.3    83.4       75.4
iNaturalist 2018                      75.4    80.1    83.0    86.8       81.2
iNaturalist 2019                      80.5    83.4    85.7    88.3       84.1
Places205                             63.9    65.8    65.9    66.8       66.0
Places365                             57.9    59.4    59.8    60.3       58.0

Pre-training

The pre-training instruction is in PRETRAIN.md.
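
For a single-node run, a launch along the following lines is typical. This mirrors the flags used in the comments further down this page; the epoch count, warmup length, and data path are placeholders to adapt from PRETRAIN.md:

    python -m torch.distributed.launch --nproc_per_node=8 main_pretrain.py \
        --batch_size 64 \
        --model mae_vit_base_patch16 \
        --norm_pix_loss \
        --mask_ratio 0.75 \
        --epochs 800 \
        --warmup_epochs 40 \
        --blr 1.5e-4 --weight_decay 0.05 \
        --data_path /path/to/imagenet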

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Comments
  • Problems about reproducing ViT-Base

    Thanks for your great work. My reproduction result for ViT-B is only 83.3, which is 0.3 lower than the paper's result, and I have no idea what may cause this. My experiment follows this repo exactly, except for the following changes: (a) I use 32 V100s with batch size 128 (32 x 128 = 4096), while the recommended setting of this repo is 64 V100s with batch size 64 (64 x 64 = 4096); (b) I didn't use submitit_pretrain.py but directly used main_pretrain.py for multi-node training.

    Are these two changes potential causes of the performance gap? Any suggestions would be deeply appreciated~

    opened by leeyegy 13
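
    One relevant detail: main_pretrain.py scales the learning rate linearly with the effective batch size (lr = blr x effective batch / 256, per PRETRAIN.md), so both settings above resolve to the same lr and the gap likely comes from elsewhere. The arithmetic, as a quick check:

        # Linear lr scaling used by this repo: lr = blr * effective_batch / 256.
        # Both launch configurations give the same effective batch (4096),
        # hence the same actual lr.
        blr = 1.5e-4
        print(blr * (32 * 128) / 256)  # 32 GPUs x batch 128 -> 0.0024
        print(blr * (64 * 64) / 256)   # 64 GPUs x batch 64  -> 0.0024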
  • __init__() got an unexpected keyword argument 'qk_scale'

    import sys
    sys.path.append('..')
    import models_mae
    dir(models_mae.Block)

    The module imports successfully, but this error occurs with the Block class (I'm using it in a Jupyter notebook). Any idea why?

    opened by zlgenuine 8
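
    Context for this error: timm releases after the 0.3.x/0.4.x era dropped the qk_scale argument from the Attention/Block constructors, so code written against timm 0.3.2 raises exactly this TypeError on newer versions. Pinning the version the repo targets is the simplest way out (the PyTorch fix noted at the top of this page still applies):

        pip install timm==0.3.2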
  • How to pretrain MAE on 1 node with 8 GPUs?

    I'm trying to pretrain a vit_small model on 1 node with 8 GPUs, but submitit_pretrain.py fails with:

    Traceback (most recent call last):
      File "submitit_pretrain.py", line 131, in <module>
        main()
      File "submitit_pretrain.py", line 89, in main
        args.job_dir = get_shared_folder() / "%j"
      File "submitit_pretrain.py", line 39, in get_shared_folder
        raise RuntimeError("No shared folder available")
    RuntimeError: No shared folder available

    It seems that I have no shared folder, so how can I pretrain MAE on 1 node with 8 GPUs? Thanks for your exciting work!!

    opened by yanjk3 8
  • Does this implementation support timm > 0.3.2?

    Hi,

    Thanks for releasing this awesome work!

    I notice the code checks the version of timm:

    assert timm.__version__ == "0.3.2"  # version check

    Does this implementation support a newer timm, e.g. 0.4.12?

    Thanks!

    Best, Vera

    opened by verazuo 6
  • Distributed training has the same speed as single-GPU training

    First, thanks for the great work. Unfortunately, I ran into something really weird here. I tried to train MAE on my single node with:

    python -m torch.distributed.launch --nproc_per_node=4 main_pretrain.py --batch_size 16 --model mae_vit_base_patch16 --norm_pix_loss --mask_ratio 0.75 --epochs 800 --warmup_epochs 0 --blr 1.5e-4 --weight_decay 0.05 --data_path /home/shaoshihao

    The weird part is that it runs at the same speed as when I ran it on a single GPU with the command:

    python main_pretrain.py --batch_size 64 --model mae_vit_base_patch16 --norm_pix_loss --mask_ratio 0.75 --epochs 800 --warmup_epochs 0 --blr 1.5e-4 --weight_decay 0.05 --data_path /home/shaoshihao

    I tracked the GPU status and it looks good: all 4 GPUs are at ~100% utilization when I use 4 GPUs.

    Any advice will be deeply appreciated.

    opened by LouieShao 5
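
    One plausible explanation for this, sketched as arithmetic: the two commands process the same number of images per optimizer step, so steps per epoch are identical, and with only 16 images per GPU the kernels may underutilize each device while DDP adds gradient synchronization overhead, eating the expected per-step speedup.

        # Both commands above see 64 images per optimizer step, so the number
        # of steps per epoch is identical; any speedup must come from faster
        # individual steps, which tiny per-GPU batches tend not to deliver.
        images_per_step_ddp = 4 * 16     # 4 GPUs x per-GPU batch 16
        images_per_step_single = 1 * 64  # 1 GPU  x batch 64
        assert images_per_step_ddp == images_per_step_single  # == 64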
  • The provided pre-trained model doesn't match the model in the code?

    Why is the structure in the provided ViT-Base pre-trained checkpoint different from the one in models_mae.py? The cls_token layer exists in the ViT-Base checkpoint you provide, but not in models_mae.py.

    opened by CGZQQQ 5
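
    One way to check this concretely (an illustrative snippet; the filename is a placeholder for the downloaded ViT-Base checkpoint): list the parameter names stored in the checkpoint and diff them against the model's state_dict.

        # Illustrative: compare checkpoint keys against the model definition.
        import torch
        import models_mae  # this repo's module

        ckpt = torch.load("mae_pretrain_vit_base.pth", map_location="cpu")
        ckpt_keys = set(ckpt["model"].keys())
        model_keys = set(models_mae.mae_vit_base_patch16().state_dict().keys())
        print(ckpt_keys - model_keys)  # parameters only in the checkpoint
        print(model_keys - ckpt_keys)  # parameters only in the code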
  • Loss fluctuates during training

    Thanks for the great work!

    I wanted to pre-train a mae_vit_base_patch16 model on ImageNet-1K, and I found that the loss fluctuates during training. I was wondering if this is normal for MAE pre-training.

    opened by cailk 4
  • ValueError: relative path can't be expressed as a file URI

    I'm also having the same issue as #55 and #57 when running pretraining. I have /checkpoint, /checkpoint/{USER}, and /checkpoint/{USER}/experiments. It's unclear how to set up --job_dir, and I'm getting the following error. Any updates or suggestions? I appreciate any help!

    Traceback (most recent call last):
      File "submitit_pretrain.py", line 131, in <module>
        main()
      File "submitit_pretrain.py", line 120, in main
        args.dist_url = get_init_file().as_uri()
      File "/anaconda/envs/mae/lib/python3.6/pathlib.py", line 721, in as_uri
        raise ValueError("relative path can't be expressed as a file URI")
    ValueError: relative path can't be expressed as a file URI
    
    opened by kwhuang88228 3
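
    For what it's worth, pathlib raises exactly this ValueError whenever as_uri() is called on a relative path, so making the job/checkpoint directory resolve to an absolute path is the usual remedy. A minimal illustration (paths are placeholders):

        from pathlib import Path

        p = Path("checkpoint/experiments")  # relative: p.as_uri() raises ValueError
        print(p.resolve().as_uri())         # absolute: file:///.../checkpoint/experiments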
  • ADE20k learning rate

    Section A.4 of the paper says "We search for the optimal lr for each entry in Table 5" when referring to segmentation on ADE20k. Could you share these learning rates? My reproduced baselines are about 2 points too low.

    Thanks!

    opened by cjrd 3
  • Can you provide more results for the ViT-B Mask R-CNN + MAE 25ep/50ep models?

    Thanks for your awesome work! I am doing some experiments with MAE. Could you provide more results (e.g. mask AP) for these two models (ViT-B Mask R-CNN + MAE, 25ep/50ep)? I want to cite these numbers in my paper. Thanks! Looking forward to your reply.

    opened by czczup 3
  • Release of the linear probing code

    Hey,

    Thanks for this awesome code! I am wondering whether you plan to release the linear probing code. According to a third-party MAE repo, the linear probing results seem hard to reproduce. It would be great if you could release it. Thanks!

    opened by Jeff-LiangF 3
  • [Question] Why do non-masked patches look worse in the pixel reconstruction example image?

    Hi, regarding this image: https://user-images.githubusercontent.com/11435359/147859292-77341c70-2ed8-4703-b153-f505dcb6f2f8.png Why do the patches that aren't masked look worse?

    opened by austinmw 0
  • Shouldn't the patch embeddings be trained only on the patches that survived masking (rather than the original image)?

    In the code it seems that the patch embeddings are trained on the original image rather than only on the patches that survive the masking process. Does this mean the implementation does not follow the paper? From the paper:

    "MAE encoder. Our encoder is a ViT [16] but applied only on visible, unmasked patches. Just as in a standard ViT, our encoder embeds patches by a linear projection with added positional embeddings, and then processes the resulting set via a series of Transformer blocks. However, our encoder only operates on a small subset (e.g., 25%) of the full set. Masked patches are removed; no mask tokens are used. This allows us to train very large encoders with only a fraction of compute and memory. The full set is handled by a lightweight decoder, described next."

    My interpretation is that the embeddings should also be run only on the small subset of patches.

    opened by Eduard6421 0
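
    For readers tracking this down: in models_mae.py the patch projection does run over every patch (a cheap convolution), but random_masking then discards the masked tokens before the Transformer blocks, so only visible tokens are processed and only they receive gradients, matching the paper. A condensed, hedged sketch of the encoder path (names follow the repo; the cls-token step is omitted for brevity):

        # Condensed sketch paraphrasing the repo's forward_encoder, not a
        # verbatim copy. The masked tokens are dropped before any block runs.
        import torch
        import models_mae  # this repo's module

        model = models_mae.mae_vit_base_patch16()
        imgs = torch.randn(2, 3, 224, 224)

        x = model.patch_embed(imgs)            # (N, L, D): every patch projected
        x = x + model.pos_embed[:, 1:, :]      # positional embeddings
        x, mask, ids = model.random_masking(x, mask_ratio=0.75)  # drop masked
        for blk in model.blocks:               # blocks see visible tokens only
            x = blk(x)
        print(x.shape)                         # (2, 49, 768): ~25% of 196 patches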
  • Something about the training set

    I have a question: does it (the pretrained model) work on unlabelled images, or on other datasets containing different images that haven't been classified beforehand? Thank you!

    opened by Daniel12345abcde 0
  • AttributeError: module 'numpy' has no attribute 'float'

    If you are getting this error, your numpy version is too new; this repository seems to be incompatible with newer numpy. I had to downgrade numpy with the following command:

    pip install "numpy<1.24"

    opened by apple2373 0
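
    For background: np.float was deprecated in numpy 1.20 and removed in 1.24, which is why pinning below 1.24 works. A quick environment check:

        import numpy as np

        print(np.__version__)
        print(hasattr(np, "float"))  # False on numpy >= 1.24, so old code breaks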
Owner
Meta Research