MAE segmentation - Reproduction of semantic segmentation using a masked autoencoder (MAE)

Overview

ADE20K semantic segmentation with MAE.

Getting started

  1. Install the mmsegmentation library and the required packages.
     pip install mmcv-full==1.3.0 mmsegmentation==0.11.0
     pip install scipy timm==0.3.2
  2. Install apex for mixed-precision training.
     git clone https://github.com/NVIDIA/apex
     cd apex
     pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  3. Follow the guide in mmseg to prepare the ADE20k dataset.
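
Because the steps above pin exact versions, a quick check of the installed packages can catch a mismatched environment before training starts. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `check_pins` is illustrative and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands in the steps above.
PINNED = {
    "mmcv-full": "1.3.0",
    "mmsegmentation": "0.11.0",
    "timm": "0.3.2",
}


def check_pins(pins, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
    if not problems:
        print("All pinned packages match.")
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment.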

Fine-tuning for Reproducing Results of MAE ViT-Base

Command:

tools/dist_train.sh configs/mae/upernet_mae_base_12_512_slide_160k_ade20k.py 8 --seed 0 --options model.pretrained=https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth
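
The `--options model.pretrained=...` argument in the command above overrides a single nested config entry by its dotted name. The sketch below illustrates that idea on a plain dictionary; it is not the library's implementation, and `apply_override` and `base_cfg` are hypothetical stand-ins for illustration only.

```python
import copy


def apply_override(cfg, dotted_key, value):
    """Return a copy of cfg with the nested entry named by dotted_key set to value."""
    updated = copy.deepcopy(cfg)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate dicts as needed
    node[leaf] = value
    return updated


# Hypothetical, heavily simplified stand-in for the real config file.
base_cfg = {"model": {"type": "EncoderDecoder", "pretrained": None}}

MAE_URL = "https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth"
cfg = apply_override(base_cfg, "model.pretrained", MAE_URL)
print(cfg["model"]["pretrained"])
```

Only the named entry changes; the other keys under `model` are left as they were, and the original dictionary is not modified.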

Expected results in the log (paper result: 48.1 mIoU):

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 48.15 | 58.99 | 83.05 |
+--------+-------+-------+-------+
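
To compare a run against the paper figure automatically, the results table can be parsed into numbers. This is a minimal sketch that assumes the table keeps the exact layout shown above; `parse_results_table` is an illustrative helper, not part of mmsegmentation.

```python
PAPER_MIOU = 48.1  # paper result quoted above

TABLE = """
+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 48.15 | 58.99 | 83.05 |
+--------+-------+-------+-------+
"""


def parse_results_table(text):
    """Parse an ASCII results table into {scope: {metric: value}}."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +----+ border lines
    ]
    header, *data = rows
    return {row[0]: dict(zip(header[1:], map(float, row[1:]))) for row in data}


results = parse_results_table(TABLE)
gap = results["global"]["mIoU"] - PAPER_MIOU
print(f"mIoU gap to paper: {gap:+.2f}")
```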

Evaluation

Command format:

tools/dist_test.sh <CONFIG_PATH> <CHECKPOINT_PATH> <NUM_GPUS> --eval mIoU
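
As a filled-in illustration of the format above, the sketch below assembles the evaluation command for the ViT-Base config used in training. The checkpoint path is hypothetical (the actual location depends on your work directory), and `build_eval_command` is an illustrative helper rather than part of this repository. It assumes Python 3.8 or newer for `shlex.join`.

```python
import shlex


def build_eval_command(config_path, checkpoint_path, num_gpus, metric="mIoU"):
    """Assemble the argument list for tools/dist_test.sh in the order shown above."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "tools/dist_test.sh",
        config_path,
        checkpoint_path,
        str(num_gpus),
        "--eval",
        metric,
    ]


command = build_eval_command(
    "configs/mae/upernet_mae_base_12_512_slide_160k_ade20k.py",
    "work_dirs/example/iter_160000.pth",  # hypothetical checkpoint location
    8,
)
print(shlex.join(command))
# To run it from the repository root: subprocess.run(command, check=True)
```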

Acknowledgment

This code is built using the mmsegmentation library, the timm library, the Swin repository, XCiT, SETR, BEiT, and the MAE repository.

You might also like...
Classical OCR DCNN reproduction based on PaddlePaddle framework.

Paddle-SVHN Classical OCR DCNN reproduction based on PaddlePaddle framework. This project reproduces Multi-digit Number Recognition from Street View I

YOLOX-Paddle - A reproduction of YOLOX by PaddlePaddle

YOLOX-Paddle A reproduction of YOLOX by PaddlePaddle Dataset preparation: download the COCO dataset and arrange it under the following path /ho

Code image classification of MNIST dataset using different architectures: simple linear NN, autoencoder, and highway network

Deep Learning for image classification pip install -r http://webia.lip6.fr/~baskiotisn/requirements-amal.txt Train an autoencoder python3 train_auto

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

This is a release of our VIMPAC paper to illustrate the implementations. The pretrained checkpoints and scripts will be soon open-sourced in HuggingFace transformers.

EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling This is the official implementation for "Frustratingly Simple Pretraining Al

The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-supervised ViT.

MAE for Self-supervised ViT Introduction This is an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners for self-sup

Comments
  • Paper results use 100 epochs ~= 126k iterations. These results use 160k iterations?

    The paper reports results for 100 epochs of training with a batch size of 16. For the 20,210 ade20k training images this is 20,210x100/16 ~= 126k iterations. I noticed your results use 160k iterations -- any idea if this reproduces the results with 100 epochs?

    opened by cjrd 2
  • About the Pre-trained Model

    Hi @implus, thanks for the nice work of reproducing the segmentation results of MAE!

    I checked the log you provided and noticed that the unexpected keys are norm.weight and norm.bias: https://github.com/implus/mae_segmentation/blob/main/log/20220131_012835.log#L229

    Does this mean that the pre-trained model is first fine-tuned on ImageNet-1K and then loaded as the backbone for segmentation? Is this a common practice for self-supervised methods?

    opened by Haochen-Wang409 0
  • model weight

    Dear,

    Thanks for your great work!

    With your offered code and hyper-parameters, I get the results as follows:

    2022-10-01 04:03:09,229 - mmseg - INFO - Iter(val) [16000]      mIoU: 0.3869, mAcc: 0.5037, aAcc: 0.8005
    2022-10-01 05:56:34,575 - mmseg - INFO - Iter(val) [32000]      mIoU: 0.4353, mAcc: 0.5557, aAcc: 0.8148
    2022-10-01 07:49:43,813 - mmseg - INFO - Iter(val) [48000]      mIoU: 0.4535, mAcc: 0.5794, aAcc: 0.8188
    2022-10-01 09:42:33,149 - mmseg - INFO - Iter(val) [64000]      mIoU: 0.4523, mAcc: 0.5758, aAcc: 0.8216
    2022-10-01 11:35:34,234 - mmseg - INFO - Iter(val) [80000]      mIoU: 0.4655, mAcc: 0.5783, aAcc: 0.8256
    2022-10-01 13:28:33,442 - mmseg - INFO - Iter(val) [96000]      mIoU: 0.4648, mAcc: 0.5726, aAcc: 0.8279
    2022-10-01 15:21:28,416 - mmseg - INFO - Iter(val) [112000]     mIoU: 0.4678, mAcc: 0.5798, aAcc: 0.8252
    2022-10-01 17:14:35,033 - mmseg - INFO - Iter(val) [128000]     mIoU: 0.4683, mAcc: 0.5806, aAcc: 0.8270
    2022-10-01 19:07:43,025 - mmseg - INFO - Iter(val) [144000]     mIoU: 0.4729, mAcc: 0.5804, aAcc: 0.8279
    2022-10-01 21:00:48,207 - mmseg - INFO - Iter(val) [160000]     mIoU: 0.4758, mAcc: 0.5841, aAcc: 0.8293
    

    This seems slightly lower than yours.

    Could you provide the model weights that reach 48.1%? Sincerely.

    opened by Vickeyhw 0
  • Visualization or demo?

    Thanks for your effort in sharing this excellent work. Can you please provide a demo of applying the pre-trained models to custom images? Does the network apply masking during training and testing like MAE?

    opened by zobeirraisi 0
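
The schedule question raised in the first comment can be checked with a few lines of arithmetic. This is a minimal sketch that takes the figures quoted there (20,210 training images, batch size 16) at face value; the function names are illustrative.

```python
TRAIN_IMAGES = 20_210  # ADE20K training images, as quoted in the comment
BATCH_SIZE = 16        # batch size quoted in the comment


def epochs_to_iterations(epochs, num_images=TRAIN_IMAGES, batch_size=BATCH_SIZE):
    """Number of optimizer steps needed to see the dataset `epochs` times."""
    return epochs * num_images / batch_size


def iterations_to_epochs(iterations, num_images=TRAIN_IMAGES, batch_size=BATCH_SIZE):
    """Number of passes over the dataset covered by `iterations` steps."""
    return iterations * batch_size / num_images


print(epochs_to_iterations(100))                 # 126312.5
print(round(iterations_to_epochs(160_000), 1))   # 126.7
```

Under these assumptions, the 160k-iteration schedule corresponds to roughly 127 epochs rather than 100.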
Owner
Learning Deeper
ConvMAE: Masked Convolution Meets Masked Autoencoders

ConvMAE ConvMAE: Masked Convolution Meets Masked Autoencoders Peng Gao1, Teli Ma1, Hongsheng Li2, Jifeng Dai3, Yu Qiao1, 1 Shanghai AI Laboratory, 2 M

Alpha VL Team of Shanghai AI Lab 345 Jan 8, 2023
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

Fraunhofer SCAI 10 Oct 11, 2022
SeMask: Semantically Masked Transformers for Semantic Segmentation.

SeMask: Semantically Masked Transformers Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi This repo co

Picsart AI Research (PAIR) 186 Dec 30, 2022
A simple, unofficial implementation of MAE using pytorch-lightning

Masked Autoencoders in PyTorch A simple, unofficial implementation of MAE (Masked Autoencoders are Scalable Vision Learners) using pytorch-lightning.

Connor Anderson 20 Dec 3, 2022
Final project code: Implementing MAE with downscaled encoders and datasets, for ESE546 FA21 at University of Pennsylvania

546 Final Project: Masked Autoencoder Haoran Tang, Qirui Wu 1. Training To train the network, please run mae_pretraining.py. Please modify folder path

Haoran Tang 0 Apr 22, 2022
YOLOv5🚀 reproduction by Guo Quanhao using PaddlePaddle

YOLOv5-Paddle YOLOv5 🚀 reproduction by Guo Quanhao using PaddlePaddle Supports AutoBatch. Supports AutoAnchor. Supports GPU Memory. Quick start: use the AIStudio high-performance environment to quickly set up YOLOv5 training (PaddlePa

QuanHao Guo 20 Nov 14, 2022
Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

null 32 Sep 21, 2022
A pytorch reproduction of { Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation }.

A PyTorch Reproduction of HCN Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation. Ch

Guyue Hu 210 Dec 31, 2022
Reproduction process of AlexNet

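Notes on PaddlePaddle paper reproduction. Background. Note: this repo reproduces AlexNet based on PaddlePaddle. Time was short, so some oversights are inevitable; if you have questions or ideas, feel free to open an issue to discuss. PaddlePaddle paper reproduction competition: https://aistudio.baidu.com/aistudio/competitio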

null 19 Nov 29, 2022