RegionProxy
[Figure 2: Performance vs. GFLOPs on the ADE20K val split.]
Semantic Segmentation by Early Region Proxy
CVPR 2022 (Poster) [arXiv]
Installation
Note: we recommend using the exact versions of the packages below to avoid runtime issues.
- Install PyTorch 1.7.1 and torchvision 0.8.2 following the official guide.
- Install timm 0.4.12 and einops:
pip install timm==0.4.12 einops
- This project depends on mmsegmentation 0.17 and mmcv 1.3.13; follow their instructions to set up the environment and prepare the datasets. A pinned install is sketched after this list.
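For reference, a minimal pinned setup might look like the following sketch. The mmcv-full find-links URL is an assumption for a CUDA 11.0 build (per mmcv convention, the torch1.7.0 index covers torch 1.7.x); adjust it to match your CUDA/PyTorch combination.
# a possible pinned setup (assumption: CUDA 11.0; adjust the find-links
# URL to your own CUDA/PyTorch build)
pip install torch==1.7.1 torchvision==0.8.2
pip install timm==0.4.12 einops
pip install mmcv-full==1.3.13 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install mmsegmentation==0.17.0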
Models
ADE20K
Backbone | Resolution | FLOPs | #Params | mIoU | mIoU (ms+flip) | FPS | Download |
---|---|---|---|---|---|---|---|
ViT-Ti/16 | 512x512 | 3.9G | 5.8M | 42.1 | 43.1 | 38.9 | [model] |
ViT-S/16 | 512x512 | 15G | 22M | 47.6 | 48.5 | 32.1 | [model] |
R26+ViT-S/32 | 512x512 | 16G | 36M | 47.8 | 49.1 | 28.5 | [model] |
ViT-B/16 | 512x512 | 59G | 87M | 49.8 | 50.5 | 20.1 | [model] |
R50+ViT-L/32 | 640x640 | 82G | 323M | 51.0 | 51.7 | 12.7 | [model] |
ViT-L/16 | 640x640 | 326G | 306M | 52.9 | 53.4 | 6.6 | [model] |
Cityscapes
Backbone | Resolution | FLOPs | #Params | mIoU | mIoU (ms+flip) | Download |
---|---|---|---|---|---|---|
ViT-Ti/16 | 768x768 | 69G | 6M | 76.5 | 77.7 | [model] |
ViT-S/16 | 768x768 | 270G | 23M | 79.8 | 81.5 | [model] |
ViT-B/16 | 768x768 | 1064G | 88M | 81.0 | 82.2 | [model] |
ViT-L/16 | 768x768 | - | 307M | 81.4 | 82.7 | [model] |
Evaluation
You may evaluate a model on a single GPU by running:
python test.py \
--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
--checkpoint /path/to/ckpt \
--eval mIoU
To evaluate on multiple GPUs, run:
python -m torch.distributed.launch --nproc_per_node 8 test.py \
--launcher pytorch \
--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
--checkpoint /path/to/ckpt \
--eval mIoU
You may add --aug-test to enable multi-scale + flip evaluation. The test.py script is mostly copy-pasted from mmsegmentation; please refer to this link for more usage (e.g., visualization).
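As a sketch, a multi-scale + flip evaluation that also dumps visualized predictions might look like this; the --show-dir flag comes from the upstream mmsegmentation test script, and the output path is a placeholder.
# hypothetical invocation: ms+flip evaluation with visualization output
python test.py \
    --config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
    --checkpoint /path/to/ckpt \
    --aug-test \
    --eval mIoU \
    --show-dir /path/to/vis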
Training
The first step is to prepare the pre-trained weights. Following Segmenter, we use AugReg pre-trained weights for our tiny, small, and large models, and DeiT pre-trained weights for our base models. Follow these steps to prepare the pre-trained weights for model initialization:
- For the DeiT weight, simply download from this link. For AugReg weights, first acquire the timm-style models:
import timm
m = timm.create_model('vit_tiny_patch16_384', pretrained=True)
The full list of entries can be found here (vanilla ViTs) and here (hybrid models).
- Convert the timm models to mmsegmentation style using this script (a minimal sketch of the idea follows this list).
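For illustration only, the core of such a conversion is repacking the timm state dict into the {'state_dict': ...} container that mmcv checkpoint loading expects. The linked script additionally renames parameter keys to match the mmsegmentation backbone, which this sketch omits; the output filename is a placeholder.
import timm
import torch

# grab a timm-pretrained ViT and its weights
m = timm.create_model('vit_tiny_patch16_384', pretrained=True)

# repack into the checkpoint layout mmcv/mmsegmentation loaders expect;
# the real conversion script also remaps parameter names (omitted here)
torch.save({'state_dict': m.state_dict()}, 'vit_tiny_patch16_384_mmseg.pth')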
We train all models on 8 V100 GPUs. For example, to train RegProxy-Ti/16, run:
python -m torch.distributed.launch --nproc_per_node 8 train.py \
--launcher pytorch \
--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
--work-dir /path/to/workdir \
--options model.pretrained=/path/to/pretrained/model
You may need to adjust data.samples_per_gpu if you plan to train on fewer GPUs. Please refer to this link for more training options.
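For instance, assuming the config trains with 2 images per GPU (an assumption; check your config), a 4-GPU run could keep the effective batch size of 16 by doubling samples_per_gpu:
# hypothetical 4-GPU run preserving the effective batch size
# (assumes the default is 2 imgs/GPU x 8 GPUs = 16)
python -m torch.distributed.launch --nproc_per_node 4 train.py \
    --launcher pytorch \
    --config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
    --work-dir /path/to/workdir \
    --options model.pretrained=/path/to/pretrained/model data.samples_per_gpu=4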
Citation
@article{zhang2022semantic,
  title={Semantic Segmentation by Early Region Proxy},
  author={Zhang, Yifan and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2203.14043},
  year={2022}
}