SeMask: Semantically Masked Transformers
Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi
This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation.
Contents
1. Results
Note:
† denotes backbones pretrained on ImageNet-22k at 384x384 resolution.
ADE20K
Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 512x512 | 42.11 | 43.16 | 35M | config | TBD |
SeMask-S FPN | SeMask Swin-S | 512x512 | 45.92 | 47.63 | 56M | config | TBD |
SeMask-B FPN | SeMask Swin-B† | 512x512 | 49.35 | 50.98 | 96M | config | TBD |
SeMask-L FPN | SeMask Swin-L† | 640x640 | 51.89 | 53.52 | 211M | config | TBD |
SeMask-L MaskFormer | SeMask Swin-L† | 640x640 | 54.75 | 56.15 | 219M | config | TBD |
SeMask-L Mask2Former | SeMask Swin-L† | 640x640 | 56.41 | 57.52 | 222M | config | TBD |
SeMask-L Mask2Former FAPN | SeMask Swin-L† | 640x640 | 56.68 | 58.00 | 227M | config | TBD |
SeMask-L Mask2Former MSFAPN | SeMask Swin-L† | 640x640 | 56.54 | 58.22 | 224M | config | TBD |
Cityscapes
Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 768x768 | 74.92 | 76.56 | 34M | config | TBD |
SeMask-S FPN | SeMask Swin-S | 768x768 | 77.13 | 79.14 | 56M | config | TBD |
SeMask-B FPN | SeMask Swin-B† | 768x768 | 77.70 | 79.73 | 96M | config | TBD |
SeMask-L FPN | SeMask Swin-L† | 768x768 | 78.53 | 80.39 | 211M | config | TBD |
SeMask-L Mask2Former | SeMask Swin-L† | 512x1024 | 83.97 | 84.98 | 222M | config | TBD |
COCO-Stuff 10k
Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 512x512 | 37.53 | 38.88 | 35M | config | TBD |
SeMask-S FPN | SeMask Swin-S | 512x512 | 40.72 | 42.27 | 56M | config | TBD |
SeMask-B FPN | SeMask Swin-B† | 512x512 | 44.63 | 46.30 | 96M | config | TBD |
SeMask-L FPN | SeMask Swin-L† | 640x640 | 47.47 | 48.54 | 211M | config | TBD |
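
The mIoU columns in the tables above report mean intersection-over-union: the IoU of each class, averaged over classes. As a rough illustration only, here is a minimal NumPy sketch of the metric. It is not the evaluation code used in this repo: the function name `mean_iou` and the ignore label `255` are assumptions made for the example, and benchmark scores are normally accumulated over the whole validation set rather than per image.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes present in `pred` or `target`.

    Hypothetical helper for illustration; not part of the SeMask codebase.
    `pred` and `target` hold integer class ids in [0, num_classes),
    except that `target` may contain `ignore_index`.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels whose ground-truth label is marked as "ignore".
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that actually occur, to avoid dividing by zero.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())


if __name__ == "__main__":
    # Three labelled pixels and one ignored pixel, two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))
```

To mirror dataset-level evaluation, one would sum the confusion matrices of all images first and compute the IoU once from the total.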
2. Setup Instructions
We provide a separate codebase for each model that incorporates SeMask. The setup instructions are in the corresponding folders:
- SeMask-FPN: Setup Instructions
- SeMask-MaskFormer: Setup Instructions
- SeMask-Mask2Former: Setup Instructions
- SeMask-FAPN: Setup Instructions
3. Citing SeMask
@article{jain2022semask,
  title={SeMask: Semantically Masking Transformer Backbones for Effective Semantic Segmentation},
  author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
  journal={arXiv preprint arXiv:...},
  year={2022}
}
Acknowledgements
The code is based heavily on the following repositories: Swin-Transformer-Semantic-Segmentation, Mask2Former, MaskFormer, and FaPN-full.