# Unified-EPT
Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation.
## Installation
- Linux, CUDA>=10.0, GCC>=5.4
- Python>=3.7
- Create a conda environment, then activate it:

  ```bash
  conda create -n unept python=3.7 pip
  conda activate unept
  ```
- PyTorch>=1.5.1, torchvision>=0.6.1 (following the instructions here). For example:

  ```bash
  conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=10.2 -c pytorch
  ```
- Install MMCV, MMSegmentation, and timm:

  ```bash
  pip install -r requirements.txt
  ```
- Install Deformable DETR and compile the CUDA operators (the instructions can be found here).
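
For reference, a minimal sketch of that build step, following the upstream Deformable DETR instructions; the clone location and the path to the `ops` directory are assumptions and may differ for this repository:

```bash
# Sketch only: paths follow the upstream Deformable DETR layout and may differ here.
git clone https://github.com/fundamentalvision/Deformable-DETR.git
cd Deformable-DETR/models/ops
sh ./make.sh
# Optional: run the unit tests to check that the CUDA operators built correctly.
python test.py
```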
## Data Preparation

Please follow the code from openseg to generate the ground truth for boundary refinement. The data should be organized as follows.
### ADE20k

You can download the processed dt_offset file here.

```
path/to/ADEChallengeData2016/
  images/
    training/
    validation/
  annotations/
    training/
    validation/
  dt_offset/
    training/
    validation/
```
### PASCAL-Context

You can download the processed dataset here.

```
path/to/PASCAL-Context/
  train/
    image/
    label/
    dt_offset/
  val/
    image/
    label/
    dt_offset/
```
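
As a quick sanity check (a hypothetical helper, not part of the repo), the image, label, and dt_offset directories of each split should contain matching numbers of files:

```bash
# Hypothetical sanity check: file counts per split should match across
# image/, label/, and dt_offset/.
for split in train val; do
  for d in image label dt_offset; do
    echo "$split/$d: $(ls path/to/PASCAL-Context/$split/$d | wc -l) files"
  done
done
```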
## Usage

### Training

The default setup is multi-GPU DistributedDataParallel training.

```bash
# --nproc_per_node sets the number of GPUs
python -m torch.distributed.launch --nproc_per_node=8 \
    --master_port=29500 \
    train.py --launcher pytorch \
    --config /path/to/config_file
```
- Specify `data_root` in the config file.
- The log directory will be created in `./work_dirs`.
- Download the DeiT pretrained model and specify the `pretrained` path in the config file (see the sketch after this list).
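
For example, a minimal sketch of that setup, assuming the DeiT-Base/16 backbone (the checkpoint URL is from the official DeiT release; the exact variant and the config filename below are assumptions):

```bash
# Assumption: DeiT-Base/16 checkpoint from the official DeiT release; point
# `pretrained` in the config at the downloaded file.
wget https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth -P pretrained/

# Hypothetical config filename; use the config shipped with this repo.
python -m torch.distributed.launch --nproc_per_node=8 --master_port=29500 \
    train.py --launcher pytorch \
    --config configs/unept_deit_480x480_160k_ade20k.py
```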
### Evaluation

```bash
# Single-GPU testing; --aug-test enables multi-scale + flip augmentation.
python test.py --checkpoint /path/to/checkpoint \
    --config /path/to/config_file \
    --eval mIoU \
    [--out ${RESULT_FILE}] [--show] \
    --aug-test
```
```bash
# Multi-GPU testing (4 GPUs, 1 sample per GPU); --aug-test enables
# multi-scale + flip augmentation.
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 \
    test.py --launcher pytorch --eval mIoU \
    --config /path/to/config_file \
    --checkpoint /path/to/checkpoint \
    --aug-test
```
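
A concrete invocation might look like this (the checkpoint and config paths below are placeholders):

```bash
# Placeholder paths: substitute your own checkpoint and config.
python test.py --checkpoint work_dirs/unept_deit_ade20k/latest.pth \
    --config configs/unept_deit_480x480_160k_ade20k.py \
    --eval mIoU --aug-test
```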
## Results

We report results on the validation sets.

| Backbone | Crop Size | Batch Size | Dataset | Lr schd | Mem (GB) | mIoU (ms+flip) | Config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Res-50 | 480x480 | 16 | ADE20K | 160K | 7.0 | 46.1 | config |
| DeiT | 480x480 | 16 | ADE20K | 160K | 8.5 | 50.5 | config |
| DeiT | 480x480 | 16 | PASCAL-Context | 160K | 8.5 | 55.2 | config |
## Security
See CONTRIBUTING for more information.
## License
This project is licensed under the Apache-2.0 License.
## Citation

If you use this code or these models in your research, please consider citing:

```bibtex
@article{zhu2021unified,
  title={A Unified Efficient Pyramid Transformer for Semantic Segmentation},
  author={Zhu, Fangrui and Zhu, Yi and Zhang, Li and Wu, Chongruo and Fu, Yanwei and Li, Mu},
  journal={arXiv preprint arXiv:2107.14209},
  year={2021}
}
```
## Acknowledgment

We thank the authors and contributors of MMCV, MMSegmentation, timm, and Deformable DETR.