Official PyTorch implementation of SegFormer

Overview

NVIDIA Source Code License | Python 3.8

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Figure 1: Performance of SegFormer-B0 to SegFormer-B5.

Project page | Paper | Demo (YouTube) | Demo (Bilibili)

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo.
NeurIPS 2021.

This repository contains the official PyTorch implementation of the training & evaluation code and the pretrained models for SegFormer.

SegFormer is a simple, efficient and powerful semantic segmentation method, as shown in Figure 1.

We use MMSegmentation v0.13.0 as the codebase.

🔥 🔥 SegFormer is on MMSegmentation. 🔥 🔥

Installation

For installation and data preparation, please follow the guidelines in MMSegmentation v0.13.0.

Other requirements: pip install timm==0.3.2

An example that works for me: CUDA 10.1 and PyTorch 1.7.1

# assumes torch==1.7.1 is already installed (torchvision 0.8.2 pairs with it)
pip install torchvision==0.8.2
pip install timm==0.3.2
pip install mmcv-full==1.2.7
pip install opencv-python==4.5.1.48
cd SegFormer && pip install -e . --user
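
To sanity-check the environment afterwards, here is a minimal import test (the expected versions follow the pip example above):

# check_env.py -- a minimal sketch
import torch, torchvision, timm, mmcv, mmseg

print('torch:      ', torch.__version__)        # expected 1.7.1
print('torchvision:', torchvision.__version__)  # expected 0.8.2
print('timm:       ', timm.__version__)         # expected 0.3.2
print('mmcv:       ', mmcv.__version__)         # expected 1.2.7
print('mmseg from: ', mmseg.__file__)           # should resolve to this SegFormer checkout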

Evaluation

Download trained weights.

Example: evaluate SegFormer-B1 on ADE20K:

# Single-gpu testing (MMSegmentation's test.py expects at least one of --eval/--out/--show; --eval mIoU reports the metric)
python tools/test.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file --eval mIoU

# Multi-gpu testing
./tools/dist_test.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file <GPU_NUM> --eval mIoU

# Multi-gpu, multi-scale testing
./tools/dist_test.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file <GPU_NUM> --eval mIoU --aug-test

Training

Download weights pretrained on ImageNet-1K, and put them in a folder pretrained/.

Example: train SegFormer-B1 on ADE20K:

# Single-gpu training
python tools/train.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py 

# Multi-gpu training
./tools/dist_train.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py <GPU_NUM>
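
Before launching a long run, it can help to inspect or tweak the config programmatically. A minimal sketch, assuming MMSegmentation v0.13.0's mmcv config API (field names reflect this repo's configs; verify against your copy):

from mmcv import Config

cfg = Config.fromfile('local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py')
print(cfg.model.pretrained)         # should point into the pretrained/ folder
print(cfg.optimizer)                # the AdamW settings used for training
cfg.data.samples_per_gpu = 2        # hypothetical override: per-GPU batch size
cfg.dump('my_segformer.b1.512x512.ade.160k.py')  # pass this file to tools/train.py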

Visualize

Here is a demo script to test a single image. For more details, refer to MMSegmentation's documentation.

python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${DEVICE_NAME}] [--palette ${PALETTE}]

Example: visualize SegFormer-B1 on Cityscapes:

python demo/image_demo.py demo/demo.png local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py \
/path/to/checkpoint_file --device cuda:0 --palette cityscapes
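
The same visualization can be done programmatically. A minimal sketch, assuming MMSegmentation v0.13.0's high-level API (this mirrors what demo/image_demo.py does):

from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot
from mmseg.core.evaluation import get_palette

config_file = 'local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py'
checkpoint_file = '/path/to/checkpoint_file'  # placeholder, as above

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, 'demo/demo.png')  # list with one HxW label map
show_result_pyplot(model, 'demo/demo.png', result, get_palette('cityscapes'))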

License

Please check the LICENSE file. SegFormer may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact [email protected].

Citation

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}
Comments
  • Mapillary Class Remapping

    Hello, I see that Mapillary uses a remapping to 19 classes,

    https://github.com/NVlabs/SegFormer/blob/3561d14362abe60675755ee00d266308e4e3015e/mmseg/datasets/pipelines/transforms.py#L1025

    Does this mean the experiments in the paper use 19 classes for all methods on Mapillary?
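
    (For readers without the linked code at hand: remapping of this kind is typically a lookup-table pass over the label map. A hedged illustration with hypothetical ids, not the repo's actual mapping:)

    import numpy as np

    mapping = np.full(66, 255, dtype=np.uint8)   # all 66 Mapillary ids -> 255 (ignore) by default
    mapping[13] = 0                              # hypothetical: Mapillary id 13 -> class 0
    label_66 = np.random.randint(0, 66, size=(4, 4))
    label_19 = mapping[label_66]                 # per-pixel remap into the 19-class space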

    opened by serser 9
  • Simple SegFormer network class

    Hello, how are you? Thanks for contributing to this project. It is difficult for us to use this project because it contains many other scripts. Did you check https://github.com/lucidrains/segformer-pytorch, a third-party implementation of SegFormer? That project contains only a simple SegFormer network class, so it is easy to use, but the number of parameters of its MiT-B0 network is 7M, while the paper reports 3.6M for MiT-B0. Could you take a short look at https://github.com/lucidrains/segformer-pytorch? If that is difficult, could you provide a SegFormer network class like the above implementation? Thanks
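
    (A quick, hedged way to compare the two implementations is to count trainable parameters directly; model below stands for whichever MiT-B0 you instantiate:)

    def count_params(model):
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    # print(f'{count_params(model) / 1e6:.2f}M parameters')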

    opened by rose-jinyang 6
  • train error mit_b1.pth is not a checkpoint file

    It's a great honor for me to study your research. When I download the pretrained model into the pretrained directory, I get the error in the title. I hope you can give me some advice. Thanks for your time and kindness.

    opened by peterlv666 5
  • Pretraining segformer on ImageNet-22K

    The Swin Transformer authors released a large model pretrained on ImageNet-22K for semantic segmentation and achieved good results. I wonder if you are interested in improving SegFormer in a similar way? Thanks!

    opened by htzheng 4
  • Inference speed of the model

    Hello, how are you? Thanks for contributing to this project. Which device did you test your models on? You did not explain the device specification in the paper.

    opened by rose-jinyang 4
  • Training details

    Hi, I'm trying to reproduce SegFormer on the PASCAL VOC dataset. When using the code from this repo, I get ~77% mIoU (without multi-scale testing). However, I only get ~75% mIoU with my reproduced code. Here are my training details.

    I have reproduced the training and validation data pipeline, including random scaling, random horizontal flipping, and random cropping. For the model, I used the code from this repo and the pretrained weights. I also used an AdamW optimizer with a warmup scheduler. The other optimizer settings are the same as in this repo.

    Therefore, I'm wondering whether there are any extra training details in SegFormer or mmseg itself. I would greatly appreciate your reply.
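
    (For comparison, a hedged sketch of the optimizer and LR-schedule fragment used by this repo's ADE20K configs; values recalled from local_configs and worth verifying against your copy:)

    optimizer = dict(type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01,
                     paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.),
                                                     'norm': dict(decay_mult=0.),
                                                     'head': dict(lr_mult=10.)}))
    lr_config = dict(policy='poly', warmup='linear', warmup_iters=1500,
                     warmup_ratio=1e-6, power=1.0, min_lr=0.0, by_epoch=False)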

    opened by rulixiang 3
  • About the efficient attention module

    Hi,

    I would like to ask a question about the efficient attention module, please: I see that you use a reduction ratio R to decrease the spatial size of the input sequences. Normally this operation would produce an output sequence of spatial size N/R, but according to your Table 6 it doesn't; the output spatial size is still N. Where do you upsample the sequence from N/R back to N in the attention module after the reduced QKV multiplication?

    Thank you!
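
    (For reference, a hedged PyTorch sketch of the sequence-reduction attention as the paper describes it: only K and V are computed on the reduced sequence, while Q keeps all N tokens, so softmax(QK^T)V already has N rows and no upsampling is needed. Single-head for brevity; the repo's implementation is multi-head:)

    import torch
    import torch.nn as nn

    class EfficientSelfAttention(nn.Module):
        def __init__(self, dim, sr_ratio):
            super().__init__()
            self.scale = dim ** -0.5
            self.q = nn.Linear(dim, dim)
            self.kv = nn.Linear(dim, 2 * dim)
            self.proj = nn.Linear(dim, dim)
            # sequence reduction: a strided conv merges sr_ratio x sr_ratio tokens into one
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

        def forward(self, x, H, W):
            B, N, C = x.shape                      # N == H * W
            q = self.q(x)                          # (B, N, C): queries keep full length
            x_ = x.transpose(1, 2).reshape(B, C, H, W)
            x_ = self.sr(x_).flatten(2).transpose(1, 2)      # (B, N / sr_ratio^2, C)
            k, v = self.kv(self.norm(x_)).chunk(2, dim=-1)   # reduced keys and values
            attn = (q @ k.transpose(-2, -1)) * self.scale    # (B, N, N / sr_ratio^2)
            return self.proj(attn.softmax(dim=-1) @ v)       # (B, N, C): length is still N

    # y = EfficientSelfAttention(dim=64, sr_ratio=4)(torch.randn(2, 32 * 32, 64), 32, 32)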

    opened by yihongXU 3
  • Question on Mapillary pretrain when evaluating on cityscapes(val) dataset

    I ran into a problem when training on Mapillary and evaluating on Cityscapes: the class "wall" gets IoU = 0.0. Could you please provide the training log of the Mapillary pretraining and evaluation (preferably for model B2)? Thanks a lot!

    +---------------+-------+-------+
    |     Class     |  IoU  |  Acc  |
    +---------------+-------+-------+
    |      road     | 96.93 | 98.09 |
    |    sidewalk   | 76.57 | 90.46 |
    |    building   | 89.02 | 95.67 |
    |      wall     |  0.0  |  0.0  |
    |     fence     | 35.52 | 59.93 |
    |      pole     | 52.85 | 63.03 |
    | traffic light | 59.63 | 71.81 |
    |  traffic sign | 68.11 | 77.09 |
    |   vegetation  | 89.89 | 96.67 |
    |    terrain    |  26.0 |  26.5 |
    |      sky      | 90.97 | 93.58 |
    |     person    | 72.78 | 87.27 |
    |     rider     | 33.21 |  41.0 |
    |      car      | 91.25 | 97.25 |
    |     truck     |  61.8 | 64.37 |
    |      bus      | 66.93 | 71.56 |
    |     train     | 62.85 | 65.31 |
    |   motorcycle  | 47.68 | 65.62 |
    |    bicycle    | 67.57 | 74.03 |
    +---------------+-------+-------+
    2021-06-21 16:06:43,150 - mmseg - INFO - Summary:
    2021-06-21 16:06:43,150 - mmseg - INFO - 
    +-------+-------+-------+
    |  aAcc |  mIoU |  mAcc |
    +-------+-------+-------+
    | 93.66 | 62.61 | 70.49 |
    +-------+-------+-------+
    
    opened by littleSunlxy 3
  • KeyError: 'AlignedResize is not in the pipeline registry'

    Hi,

    I have a similar error to #2. I've just forked the repo to add a print statement, so fix #1 is included. When running python tools/test.py, I'm getting the following:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
        return obj_cls(**args)
      File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/test_time_aug.py", line 59, in __init__
        self.transforms = Compose(transforms)
      File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/compose.py", line 22, in __init__
        transform = build_from_cfg(transform, PIPELINES)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
        f'{obj_type} is not in the {registry.name} registry')
    KeyError: 'AlignedResize is not in the pipeline registry'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
        return obj_cls(**args)
      File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/ade.py", line 91, in __init__
        **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/custom.py", line 88, in __init__
        self.pipeline = Compose(pipeline)
      File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/compose.py", line 22, in __init__
        transform = build_from_cfg(transform, PIPELINES)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: "MultiScaleFlipAug: 'AlignedResize is not in the pipeline registry'"
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "tools/test.py", line 170, in <module>
        main()
      File "tools/test.py", line 122, in main
        dataset = build_dataset(cfg.data.test)
      File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/builder.py", line 73, in build_dataset
        dataset = build_from_cfg(cfg, DATASETS, default_args)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: 'ADE20KDataset: "MultiScaleFlipAug: \'AlignedResize is not in the pipeline registry\'"'
    

    I've made a Google Colab to reproduce: https://colab.research.google.com/drive/1-t_lj5K2ZEFemxn88DSfcy9W7RTvklsz?usp=sharing
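
    (Editor's note, hedged: AlignedResize is registered by this repo's local mmseg package, while the traceback above shows mmseg imported from dist-packages. A quick check of which copy Python resolves:)

    import mmseg
    print(mmseg.__file__)  # should point inside the SegFormer checkout, not dist-packages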

    opened by NielsRogge 3
  • No module named 'mmseg'

    hi thanks for your great repo

    I have no idea why, but it seems to import mmseg from the installed package rather than the local module, and it keeps showing me this error:

    No module named 'mmseg'

    When I run PyCharm in debugging mode, everything works fine; the error only appears when I run from the terminal or in run mode.

    Any help would be very much appreciated.

    opened by ooodragon94 2
  • How to change checkpoint saving frequency

    Hi, first of all, thank you for your research and code.

    I see that during training the model is saved every 4000 iterations. Where can I change this setting so that my model is saved every, let's say, 1000 iterations? (See the sketch below.)

    Thank you
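
    (A hedged sketch: in MMSegmentation v0.13.0 the saving frequency comes from checkpoint_config, inherited from the schedule file; overriding it in your config changes the interval:)

    checkpoint_config = dict(by_epoch=False, interval=1000)  # save every 1000 iterations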

    opened by gcilli 2
  • How to convert the model to tensorrt or onnx?

    For a robot deployment we need an ONNX or OpenVINO version of SegFormer, but I currently find that SegFormer cannot be converted to those formats. Can anyone help us find the reason, or share a successful example? Thank you!
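
    (A hedged, generic sketch of a torch.onnx.export attempt. SingleTensorWrapper is hypothetical: MMSegmentation models expect img_metas in forward(), which is one common reason a plain export fails, so a single-tensor wrapper is usually needed:)

    import torch
    import torch.nn as nn

    class SingleTensorWrapper(nn.Module):
        """Hypothetical wrapper exposing a single-tensor forward for export."""
        def __init__(self, segmentor):
            super().__init__()
            self.segmentor = segmentor

        def forward(self, img):
            # delegate to whatever single-tensor entry point your model exposes
            return self.segmentor(img)

    # wrapped = SingleTensorWrapper(built_segformer).eval()
    # torch.onnx.export(wrapped, torch.randn(1, 3, 512, 512), 'segformer.onnx',
    #                   opset_version=11, input_names=['input'], output_names=['output'])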

    opened by yuchenlichuck 0
  • Dataset Creation

    Hi,

    I am working on a semantic segmentation task and am facing an issue generating data in the required format. Can anyone recommend free tools that I can use to generate the data?

    opened by FatemaD1 0
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file can perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
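
    (For context, a hedged sketch of the kind of check such a patch adds: verify that every member's destination stays inside the target directory before extracting:)

    import os
    import tarfile

    def safe_extractall(tar: tarfile.TarFile, path: str = '.') -> None:
        base = os.path.realpath(path)
        for member in tar.getmembers():
            dest = os.path.realpath(os.path.join(path, member.name))
            if os.path.commonpath([base, dest]) != base:
                raise RuntimeError(f'blocked path traversal in tar member: {member.name}')
        tar.extractall(path)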

    opened by TrellixVulnTeam 0
  • What does "type = IMTRv21_5" mean?

    I found it difficult to understand these two files: #1 local_configs/base/models/segformer.py and #2 local_configs/base/models/segformer.py. I know that #1 is the base of #2, but file 2 contains "backbone = dict(type='mit_b0')" while file 1 contains "backbone = dict(type='IMTRv21_5')".

    I wonder what "type = IMTRv21_5" means. Please guide me!

    opened by Buling-Knight 0