SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation
Efficient Self-Ensemble Framework for Semantic Segmentation by Walid Bousselham, Guillaume Thibault, Lucas Pagano, Archana Machireddy, Joe Gray, Young Hwan Chang and Xubo Song.
This repository contains the official PyTorch implementation of SenFormer: the training and evaluation code, together with the pretrained models.
🔨 Installation
Conda environment
- Clone this repository and enter it:
  git clone git@github.com:WalBouss/SenFormer.git && cd SenFormer
- Create a conda environment and activate it:
  conda create -n senformer python=3.8
  conda activate senformer
- Install PyTorch and torchvision (you may switch to another version by changing the version numbers):
  conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.2 -c pytorch
- Install the MMCV library:
  pip install mmcv-full==1.4.0
- Install the MMSegmentation library by running the following in the SenFormer directory:
  pip install -e .
- Install the other requirements:
  pip install timm einops
Here is a full script for setting up a conda environment to use SenFormer (with CUDA 10.2 and PyTorch 1.7.1):
conda create -n senformer python=3.8
conda activate senformer
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.2 -c pytorch
git clone git@github.com:WalBouss/SenFormer.git && cd SenFormer
pip install mmcv-full==1.4.0
pip install -e .
pip install timm einops
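Once the script above has finished, it can help to confirm that the pinned versions were actually installed. The following is a minimal sketch, not part of the repository: it assumes the packages are visible to `importlib.metadata` under the distribution names `torch`, `torchvision`, `mmcv-full`, `timm` and `einops`, which may not hold for every conda build. The helper names are hypothetical.

```python
from importlib import metadata

# Versions pinned by the installation script; timm and einops are unpinned there.
EXPECTED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "mmcv-full": "1.4.0",
    "timm": None,
    "einops": None,
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to a (installed_version, status) pair."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found is None:
            status = "missing"
        # Drop local build tags such as "+cu102" before comparing.
        elif wanted is None or found.split("+")[0] == wanted:
            status = "ok"
        else:
            status = "version differs"
        report[package] = (found, status)
    return report


if __name__ == "__main__":
    for package, (found, status) in check_environment().items():
        print(f"{package:12s} {str(found):12s} {status}")
```

A "version differs" line is not necessarily an error, since other PyTorch versions can be used by changing the version numbers in the install command.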
Datasets
For dataset preparation, please refer to the MMSegmentation guidelines.
Pretrained weights
ResNet pretrained weights will be automatically downloaded before training.
For Swin Transformer ImageNet pretrained weights, you can either:
- run the following in the SenFormer project to download all Swin Transformer pretrained weights (it places the weights under the pretrain/ folder):
  bash tools/download_swin_weights.sh
- download the desired backbone weights here: Swin-T, Swin-S, Swin-B, Swin-L, and place them under the pretrain/ folder.
- download the weights from the official repository, then convert them to MMSegmentation format following the MMSegmentation guidelines.
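Whichever option is used, the weights are expected under the pretrain/ folder. A minimal sketch for checking what is there before launching training is shown below. It assumes the weight files carry a `.pth` extension, which this README does not state, and the function name is hypothetical.

```python
from pathlib import Path


def list_pretrained_weights(folder="pretrain"):
    """Return the sorted names of .pth files found directly under `folder`."""
    root = Path(folder)
    if not root.is_dir():
        return []
    # Assumption: checkpoints use the .pth extension.
    return sorted(p.name for p in root.glob("*.pth"))


if __name__ == "__main__":
    weights = list_pretrained_weights()
    if weights:
        print("Found pretrained weights:", ", ".join(weights))
    else:
        print("No .pth files found under pretrain/.")
```

Run it from the SenFormer root directory so that the relative `pretrain` path resolves correctly.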
🎯 Model Zoo
SenFormer models with ResNet and Swin backbones, trained on ADE20K, COCO-Stuff 10K, Pascal Context and Cityscapes.
ADE20K
Backbone | mIoU | mIoU (MS) | #params | FLOPs | Resolution | Checkpoint | Config |
---|---|---|---|---|---|---|---|
ResNet-50 | 44.6 | 45.6 | 144M | 179G | 512x512 | model | config |
ResNet-101 | 46.5 | 47.0 | 163M | 199G | 512x512 | model | config |
Swin-Tiny | 46.0 | 46.4 | 144M | 179G | 512x512 | model | config |
Swin-Small | 49.2 | 50.4 | 165M | 202G | 512x512 | model | config |
Swin-Base | 51.8 | 53.2 | 204M | 242G | 640x640 | model | config |
Swin-Large | 53.1 | 54.2 | 314M | 546G | 640x640 | model | config |
COCO-Stuff 10K
Backbone | mIoU | mIoU (MS) | #params | Resolution | Checkpoint | Config |
---|---|---|---|---|---|---|
ResNet-50 | 39.0 | 39.7 | 144M | 512x512 | model | config |
ResNet-101 | 39.6 | 40.6 | 163M | 512x512 | model | config |
Swin-Large | 49.1 | 50.1 | 314M | 512x512 | model | config |
Pascal Context
Backbone | mIoU | mIoU (MS) | #params | Resolution | Checkpoint | Config |
---|---|---|---|---|---|---|
ResNet-50 | 53.2 | 54.3 | 144M | 480x480 | model | config |
ResNet-101 | 55.1 | 56.6 | 163M | 480x480 | model | config |
Swin-Large | 62.4 | 64.0 | 314M | 480x480 | model | config |
Cityscapes
Backbone | mIoU | mIoU (MS) | #params | Resolution | Checkpoint | Config |
---|---|---|---|---|---|---|
ResNet-50 | 78.8 | 80.1 | 144M | 512x1024 | model | config |
ResNet-101 | 80.3 | 81.4 | 163M | 512x1024 | model | config |
Swin-Large | 82.2 | 83.3 | 314M | 512x1024 | model | config |
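The mIoU columns in the tables above report mean intersection-over-union in percent, with "MS" denoting multi-scale testing. As a reminder of what the metric measures, here is a small pure-Python sketch on flattened label lists. It is an illustration only, not the evaluation code used by this repository: it has no ignore label, and it skips classes that appear in neither the prediction nor the ground truth, which is an assumption of this sketch.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes
    for p, t in zip(pred, target):
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 6 pixels and 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(f"mIoU = {100 * mean_iou(pred, target, num_classes=3):.1f}%")
```

In the toy example the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.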
🔭 Inference
Download a checkpoint from the Model Zoo above, for example SenFormer with a ResNet-50 backbone on ADE20K.
Inference on a dataset
# Single-gpu testing
python tools/test.py senformer_configs/senformer/ade20k/senformer_fpnt_r50_512x512_160k_ade20k.py /path/to/checkpoint_file
# Multi-gpu testing
./tools/dist_test.sh senformer_configs/senformer/ade20k/senformer_fpnt_r50_512x512_160k_ade20k.py /path/to/checkpoint_file <GPU_NUM>
# Multi-gpu, multi-scale testing
./tools/dist_test.sh senformer_configs/senformer/ade20k/senformer_fpnt_r50_512x512_160k_ade20k.py /path/to/checkpoint_file <GPU_NUM> --aug-test
Inference on custom data
To generate segmentation maps for your own data, run the following command:
python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE}
Run python demo/image_demo.py --help for additional options.
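The demo command above handles one image at a time. To process a whole folder, one option is to build the same command once per image. The sketch below does only that: it assumes the positional argument order shown above (image, config, checkpoint), treats `.jpg`, `.jpeg` and `.png` as the image extensions of interest, and uses a hypothetical helper name.

```python
import sys
from pathlib import Path

# Assumed set of image extensions; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def demo_commands(image_dir, config_file, checkpoint_file):
    """Build one demo/image_demo.py command (as an argument list) per image."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [
        [sys.executable, "demo/image_demo.py", str(img), config_file, checkpoint_file]
        for img in images
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the SenFormer root directory. Check the `--help` output first for options that control how results are displayed or saved.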
🔩 Training
Follow the instructions above to download the ImageNet pretrained weights for the backbones, then run one of the following commands:
# Single-gpu training
python tools/train.py path/to/model/config
# Multi-gpu training
./tools/dist_train.sh path/to/model/config <GPU_NUM>
For example, to train SenFormer with a ResNet-50 as backbone on ADE20K:
# Single-gpu training
python tools/train.py senformer_configs/senformer/ade20k/senformer_fpnt_r50_512x512_160k_ade20k.py
# Multi-gpu training
./tools/dist_train.sh senformer_configs/senformer/ade20k/senformer_fpnt_r50_512x512_160k_ade20k.py <GPU_NUM>
Note that the default learning rate and training schedule are set for an effective batch size of 16 (e.g. 8 GPUs with 2 images per GPU).
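If you train with a different number of GPUs or images per GPU, the effective batch size changes and the learning rate may need adjusting. The sketch below computes the effective batch size and applies the linear scaling heuristic (learning rate proportional to batch size). This heuristic is a common rule of thumb and an assumption here, not a recommendation from this repository, and `base_lr` is a placeholder value rather than the value in the SenFormer configs.

```python
def effective_batch_size(num_gpus, imgs_per_gpu):
    """Total number of images processed per optimizer step."""
    return num_gpus * imgs_per_gpu


def scaled_learning_rate(base_lr, batch_size, reference_batch_size=16):
    """Linear scaling heuristic: learning rate proportional to batch size."""
    return base_lr * batch_size / reference_batch_size


reference = effective_batch_size(num_gpus=8, imgs_per_gpu=2)  # default setup
smaller = effective_batch_size(num_gpus=2, imgs_per_gpu=2)    # e.g. a 2-GPU machine
base_lr = 1e-4  # placeholder, NOT taken from the SenFormer configs

print("reference batch size:", reference)
print("smaller batch size:", smaller)
print("scaled learning rate:", scaled_learning_rate(base_lr, smaller, reference))
```

Read the actual default learning rate from the config file you train with, and treat the scaled value as a starting point to validate.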
⭐ Acknowledgement
This code is built on the MMSegmentation library and also uses timm and einops.
📚 Citation
If you find this repository useful, please consider citing our work:
@article{bousselham2021senformer,
title={Efficient Self-Ensemble Framework for Semantic Segmentation},
author={Walid Bousselham and Guillaume Thibault and Lucas Pagano and Archana Machireddy and Joe Gray and Young Hwan Chang and Xubo Song},
journal={arXiv preprint arXiv:2111.13280},
year={2021}
}