Official PyTorch implementation of SegFormer

NVIDIA Research Projects

Last update: Dec 29, 2022

Overview

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Figure 1: Performance of SegFormer-B0 to SegFormer-B5.

Project page | Paper | Demo (Youtube) | Demo (Bilibili)

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo.
NeurIPS 2021.

This repository contains the official PyTorch implementation of the training and evaluation code, together with the pretrained models, for SegFormer.

SegFormer is a simple, efficient and powerful semantic segmentation method, as shown in Figure 1.

We use MMSegmentation v0.13.0 as the codebase.

🔥 🔥 SegFormer is on MMSegmentation. 🔥 🔥

Installation

For installation and data preparation, please refer to the guidelines in MMSegmentation v0.13.0.

Other requirements: pip install timm==0.3.2

An example (works for me): CUDA 10.1 and PyTorch 1.7.1

pip install torchvision==0.8.2
pip install timm==0.3.2
pip install mmcv-full==1.2.7
pip install opencv-python==4.5.1.48
cd SegFormer && pip install -e . --user
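To confirm that an environment matches the example above, a small check along the following lines may help. This is only a sketch: the `PINNED` table simply restates the versions listed above, and the helper names `installed_version` and `compare` are hypothetical, not part of this repository. Local build suffixes such as `+cu101` are ignored when comparing.

```python
from importlib import metadata

# Versions restated from the installation example above.
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "timm": "0.3.2",
    "mmcv-full": "1.2.7",
    "opencv-python": "4.5.1.48",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare(pinned, lookup=installed_version):
    """Map each package to (wanted, found, status)."""
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None:
            status = "missing"
        elif found.split("+")[0] == wanted:  # drop local suffix like +cu101
            status = "ok"
        else:
            status = "mismatch"
        report[name] = (wanted, found, status)
    return report


for name, (wanted, found, status) in compare(PINNED).items():
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

A "mismatch" does not necessarily mean the code will fail; it only flags a difference from the combination reported to work above.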

Evaluation

Download trained weights.

Example: evaluate SegFormer-B1 on ADE20K:

# Single-gpu testing
python tools/test.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file

# Multi-gpu testing
./tools/dist_test.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file <GPU_NUM>

# Multi-gpu, multi-scale testing
./tools/dist_test.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py /path/to/checkpoint_file <GPU_NUM> --aug-test

Training

Download weights pretrained on ImageNet-1K, and put them in a folder pretrained/.
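Before launching a run, it can be worth checking that the weights are where the config expects them. The sketch below assumes a file named `mit_b1.pth` (the name that appears in one of the issue titles further down) and uses a hypothetical helper, `find_pretrained`. Relative paths are resolved against the current working directory, so the check should be run from the directory in which training is launched.

```python
from pathlib import Path


def find_pretrained(name, folder="pretrained"):
    """Return the path to a pretrained weight file, or raise if it is absent."""
    path = Path(folder) / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found (looked in {path.resolve().parent})"
        )
    return path


try:
    # "mit_b1.pth" is an assumed file name; adjust it to the downloaded file.
    print("found", find_pretrained("mit_b1.pth"))
except FileNotFoundError as err:
    print("missing:", err)
```

As far as I can tell, mmcv reports "is not a checkpoint file" when the given path does not point to an existing file, so a failed check here is a likely cause of that message.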

Example: train SegFormer-B1 on ADE20K:

# Single-gpu training
python tools/train.py local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py 

# Multi-gpu training
./tools/dist_train.sh local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py <GPU_NUM>

Visualize

Here is a demo script to test a single image. For more details, refer to the MMSegmentation documentation.

python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${DEVICE_NAME}] [--palette ${PALETTE}]

Example: visualize SegFormer-B1 on Cityscapes:

python demo/image_demo.py demo/demo.png local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py \
/path/to/checkpoint_file --device cuda:0 --palette cityscapes

License

Please check the LICENSE file. SegFormer may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact [email protected].

Citation

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

Comments

Mapillary Class Remapping

Hello, I see that Mapillary uses a remapping to 19 classes,

https://github.com/NVlabs/SegFormer/blob/3561d14362abe60675755ee00d266308e4e3015e/mmseg/datasets/pipelines/transforms.py#L1025

Does this mean the experiments in the paper use 19 classes for all methods on Mapillary?

opened by serser 9
Simple SegFormer network class

Hello, how are you? Thanks for contributing to this project. It is difficult for us to use this project because it contains many other scripts. Did you check https://github.com/lucidrains/segformer-pytorch, which is a third-party implementation of SegFormer? That project contains ONLY a simple SegFormer network class, so it is easy to use. However, the MiT-B0 network in that implementation has 7M parameters, whereas the paper reports 3.6M for MiT-B0. Could you take a look at https://github.com/lucidrains/segformer-pytorch? If that is difficult, could you provide a SegFormer network class like the one in the above implementation? Thanks

opened by rose-jinyang 6
train error mit_b1.pth is not a checkpoint file

It's a great honor for me to study your research. When I download the pretrained model into the pretrained directory, it shows the following. I hope you can give me some advice. Thanks for your time and kindness.

opened by peterlv666 5
Pretraining segformer on ImageNet-22K

The Swin Transformer authors released a large model pretrained on ImageNet-22K for semantic segmentation and achieved good results. I wonder if you are interested in improving SegFormer in a similar way? Thanks!

opened by htzheng 4
Inference speed of the model

Hello, how are you? Thanks for contributing to this project. Which device did you test your models on?

The paper does NOT give the device specification.

opened by rose-jinyang 4
Training details

Hi, I'm trying to reproduce SegFormer on the PASCAL VOC dataset. When using the code of this repo, I get ~77% mIoU (without multi-scale testing). However, I only get ~75% mIoU with my reproduced code. Here are my training details.

I have reproduced the training and validation data pipeline, including random scaling, random horizontal flipping, random cropping, etc. For the model, I used the code of this repo and the pre-trained weights. I also used an AdamW optimizer with a warmup scheduler. The other optimizer settings are the same as in this repo.

Therefore, I'm wondering if there are any extra training details in SegFormer or mmseg itself. I would greatly appreciate your reply.

opened by rulixiang 3
About the efficient attention module

Hi,

I would like to ask a question about the efficient attention module. I see that you use a reduction ratio R to decrease the spatial size of the input sequences. Normally this operation would produce an output sequence of spatial size N/R, but according to your Table 6 it does not: the output spatial size is still N. Where do you upsample the sequence from N/R back to N in the attention module after the reduced QKV multiplication?

Thank you!
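A possible reading, based on the paper's description that the reduction is applied to the keys only (and, in practice, the values) while the queries keep all N tokens: the attention map then has shape N x N/R, and multiplying it by the reduced values gives N x C again, so no upsampling step is needed. The shape sketch below illustrates this in plain Python. It is not the repository's implementation: the learned projections are omitted, and averaging groups of R tokens stands in for the learned reduction.

```python
import math


def reduce_sequence(x, r):
    """Shorten N tokens to N // r tokens by averaging groups of r tokens.

    Averaging is a stand-in for the learned reduction in the paper.
    """
    n, c = len(x), len(x[0])
    assert n % r == 0, "sequence length must be divisible by r"
    return [
        [sum(x[i + j][d] for j in range(r)) / r for d in range(c)]
        for i in range(0, n, r)
    ]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def efficient_attention(x, r):
    """Single-head attention with full-length queries and reduced keys/values."""
    c = len(x[0])
    q = x                       # N x C, not reduced
    kv = reduce_sequence(x, r)  # (N / r) x C
    scores = [
        [sum(qi[d] * kj[d] for d in range(c)) / math.sqrt(c) for kj in kv]
        for qi in q
    ]
    weights = [softmax(row) for row in scores]  # N x (N / r)
    out = [
        [sum(w[j] * kv[j][d] for j in range(len(kv))) for d in range(c)]
        for w in weights
    ]                                           # N x C
    return out, weights


x = [[float(i + d) for d in range(4)] for i in range(8)]  # N = 8, C = 4
out, weights = efficient_attention(x, r=4)
print(len(out), len(out[0]))          # 8 4
print(len(weights), len(weights[0]))  # 8 2
```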

opened by yihongXU 3

Question on Mapillary pretrain when evaluating on cityscapes(val) dataset

I ran into a problem when training on Mapillary and evaluating on Cityscapes: the class "wall" has mIoU = 0.0. Could you please provide the training log of the Mapillary pretraining and evaluation (preferably for model B2)? Thanks a lot!

+---------------+-------+-------+
|     Class     |  IoU  |  Acc  |
+---------------+-------+-------+
|      road     | 96.93 | 98.09 |
|    sidewalk   | 76.57 | 90.46 |
|    building   | 89.02 | 95.67 |
|      wall     |  0.0  |  0.0  |
|     fence     | 35.52 | 59.93 |
|      pole     | 52.85 | 63.03 |
| traffic light | 59.63 | 71.81 |
|  traffic sign | 68.11 | 77.09 |
|   vegetation  | 89.89 | 96.67 |
|    terrain    |  26.0 |  26.5 |
|      sky      | 90.97 | 93.58 |
|     person    | 72.78 | 87.27 |
|     rider     | 33.21 |  41.0 |
|      car      | 91.25 | 97.25 |
|     truck     |  61.8 | 64.37 |
|      bus      | 66.93 | 71.56 |
|     train     | 62.85 | 65.31 |
|   motorcycle  | 47.68 | 65.62 |
|    bicycle    | 67.57 | 74.03 |
+---------------+-------+-------+
2021-06-21 16:06:43,150 - mmseg - INFO - Summary:
2021-06-21 16:06:43,150 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 93.66 | 62.61 | 70.49 |
+-------+-------+-------+
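As a sanity check on the log above, the summary row appears to be the unweighted mean over the 19 classes, so a single class at 0.0 lowers mIoU noticeably. The sketch below recomputes the means from the per-class values copied from the table; the figure without "wall" is shown only to illustrate that class's effect and is not a metric reported by the tool.

```python
# (IoU, Acc) per class, copied from the evaluation table above.
per_class = {
    "road": (96.93, 98.09), "sidewalk": (76.57, 90.46),
    "building": (89.02, 95.67), "wall": (0.0, 0.0),
    "fence": (35.52, 59.93), "pole": (52.85, 63.03),
    "traffic light": (59.63, 71.81), "traffic sign": (68.11, 77.09),
    "vegetation": (89.89, 96.67), "terrain": (26.0, 26.5),
    "sky": (90.97, 93.58), "person": (72.78, 87.27),
    "rider": (33.21, 41.0), "car": (91.25, 97.25),
    "truck": (61.8, 64.37), "bus": (66.93, 71.56),
    "train": (62.85, 65.31), "motorcycle": (47.68, 65.62),
    "bicycle": (67.57, 74.03),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


miou = mean(iou for iou, _ in per_class.values())
macc = mean(acc for _, acc in per_class.values())
miou_without_wall = mean(
    iou for name, (iou, _) in per_class.items() if name != "wall"
)

print(round(miou, 2))               # 62.61, matches the summary row
print(round(macc, 2))               # 70.49, matches the summary row
print(round(miou_without_wall, 2))  # 66.09
```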

opened by littleSunlxy 3

KeyError: 'AlignedResize is not in the pipeline registry'

Hi,

I have a similar error to #2. I've just forked the repo to add a print statement, so the fix for #1 is included. When running python tools/test.py, I get the following:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/test_time_aug.py", line 59, in __init__
    self.transforms = Compose(transforms)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
KeyError: 'AlignedResize is not in the pipeline registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/ade.py", line 91, in __init__
    **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/custom.py", line 88, in __init__
    self.pipeline = Compose(pipeline)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "MultiScaleFlipAug: 'AlignedResize is not in the pipeline registry'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/test.py", line 170, in <module>
    main()
  File "tools/test.py", line 122, in main
    dataset = build_dataset(cfg.data.test)
  File "/usr/local/lib/python3.7/dist-packages/mmseg/datasets/builder.py", line 73, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: 'ADE20KDataset: "MultiScaleFlipAug: \'AlignedResize is not in the pipeline registry\'"'

I've made a Google Colab to reproduce: https://colab.research.google.com/drive/1-t_lj5K2ZEFemxn88DSfcy9W7RTvklsz?usp=sharing

opened by NielsRogge 3

No module named 'mmseg'

Hi, thanks for your great repo.

I have no idea why, but it seems to import mmseg not from the local module but from the installed one, and it keeps showing me this error:

No module named 'mmseg'

When I run it in PyCharm's debugging mode, it works fine. It fails only when I run it from the terminal or in run mode.

Any help would be much appreciated.

opened by ooodragon94 2
How to change checkpoint saving frequency

Hi, first of all, thank you for your research and code.

I see that during training, the model is saved every 4000 iterations. Where can I change this setting so that my model is saved every, let's say, 1000 iterations?

Thank you
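One possible answer, sketched under the assumption that this repository follows the MMSegmentation v0.x config convention, in which a `checkpoint_config` dict controls how often checkpoints are saved. Where exactly this setting lives in the SegFormer configs is not confirmed here, so treat the fragment as something to look for, or override, in the training config.

```python
# Hypothetical override for a training config file.
# by_epoch=False makes `interval` count iterations rather than epochs.
checkpoint_config = dict(by_epoch=False, interval=1000)
```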

opened by gcilli 2
How to convert the model to tensorrt or onnx?

For a robot implementation, we need an ONNX or OpenVINO version of SegFormer, but so far I have found that SegFormer cannot be converted to those formats. Can anyone help us find the reason or share a successful example? Thank you!

opened by yuchenlichuck 0
Dataset Creation

Hi,

I am working on a semantic segmentation task. I am having trouble generating data in the required format. Can anyone suggest free tools that I can use to generate the data?

opened by FatemaD1 0
CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

opened by TrellixVulnTeam 0
What does "type = IMTRv21_5" mean?

I find it difficult to understand these two files: #1 local_configs/base/models/segformer.py and #2 local_configs/base/models/segformer.py. I know that #1 is the base of #2, but file 2 contains "backbone = dict(type='mit_b0')",
while file 1 contains "backbone = dict(type='IMTRv21_5')".

I wonder what "type = IMTRv21_5" means. Please guide me!

opened by Buling-Knight 0

