TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022


Paper Links: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation (CVPR 2022)

by Wenqiang Zhang*, Zilong Huang*, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen.

(*) equal contribution, (†) corresponding author.


Although vision transformers (ViTs) have achieved great success in computer vision, their heavy computational cost makes them unsuitable for dense prediction tasks such as semantic segmentation on mobile devices. In this paper, we present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer). The proposed TopFormer takes tokens from various scales as input to produce scale-aware semantic features, which are then injected into the corresponding tokens to augment the representation. Experimental results demonstrate that our method significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency.

Latency is measured on a single Qualcomm Snapdragon 865 with an input size of 512×512×3; only an ARM CPU core is used for speed testing. * indicates an input size of 448×448×3.


  • 04/23/2022: The TopFormer backbone has been integrated into PaddleViT; check out the third-party implementation on the Paddle framework!


  • pytorch 1.5+
  • mmcv-full==1.3.14

Main results

The classification models pretrained on ImageNet can be downloaded from Baidu Drive/Google Drive.


| Model | Params (M) | FLOPs (G) | mIoU (ss) | Link |
|---|---|---|---|---|
| TopFormer-T_448x448_2x8_160k | 1.4 | 0.5 | 32.5 | Baidu Drive, Google Drive |
| TopFormer-T_448x448_4x8_160k | 1.4 | 0.5 | 33.4 | Baidu Drive, Google Drive |
| TopFormer-T_512x512_2x8_160k | 1.4 | 0.6 | 33.6 | Baidu Drive, Google Drive |
| TopFormer-T_512x512_4x8_160k | 1.4 | 0.6 | 34.6 | Baidu Drive, Google Drive |
| TopFormer-S_512x512_2x8_160k | 3.1 | 1.2 | 36.5 | Baidu Drive, Google Drive |
| TopFormer-S_512x512_4x8_160k | 3.1 | 1.2 | 37.0 | Baidu Drive, Google Drive |
| TopFormer-B_512x512_2x8_160k | 5.1 | 1.8 | 38.3 | Baidu Drive, Google Drive |
| TopFormer-B_512x512_4x8_160k | 5.1 | 1.8 | 39.2 | Baidu Drive, Google Drive |

  • ss indicates single-scale.
  • The password of Baidu Drive is topf


Please see MMSegmentation for dataset preparation.

For training, run:

sh tools/ local_configs/topformer/<config-file> <num-of-gpus-to-use> --work-dir /path/to/save/checkpoint

To evaluate, run:

sh tools/ local_configs/topformer/<config-file> <checkpoint-path> <num-of-gpus-to-use>

To test the inference speed on a mobile device, please refer to tnn_runtime.


The implementation is based on MMSegmentation.


If you find our work helpful to your experiments, please cite it with:

  title     = {TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation},
  author    = {Zhang, Wenqiang and Huang, Zilong and Luo, Guozhong and Chen, Tao and Wang, Xinggang and Liu, Wenyu and Yu, Gang and Shen, Chunhua},
  booktitle = {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
  • TNN inference cannot run




    PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
    python $(dirname "$0")/ $CONFIG --shape 512 512 --checkpoint $CHECK


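    Then I ran the command: bash tools/ local_configs/topformer/ <path>/TopFormer-T_512x512_2x8_160k-33.6.pth

    It produced the following output: image

    Following TNN's convert.md, I built convert2tnn myself and then ran: python3 onnx2tnn <path>/topformer_t.onnx -optimize -v=v3.0

    Next, following TNN's profiling.md, I ran the latency test on the Android platform. The command executed was ./ -c. I tried squeezenet_v1.1.tnnproto, a sample provided in the TNN repository, and it ran fine on my phone, but the tnnproto I converted does not run and hangs indefinitely, as shown in the figure below: image

    If I remove the -optimize flag from the TNN conversion, it no longer hangs but reports an error: image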




  • FileNotFoundError


    Hello, when I try to run in the MMSegmentation environment, I get the following error: FileNotFoundError: modelzoos/classification/topformer-T-224-66.2.pth can not be found. Where should I get the file topformer-T-224-66.2.pth?

    opened by ke-dev 3
  • AttributeError: 'ConfigDict' object has no attribute 'dist_params'

    Hello, thanks for providing such a great repo. However, I encountered an error when running bash tools/ local_configs/topformer/ 8 --work-dir /path/to/save/checkpoint
    The details are as follows: image

    opened by wmkai 3
  • is there any ImageNet pre-training config of Topformer?

    I want to change some modules when using Topformer to do classification on ImageNet. Could you please provide the ImageNet Pre-training config? Thanks!

    opened by wmkai 2
  • how to modify the batch size during inference?

    I tried changing the value of samples_per_gpu from 2 to 1 in the config file, but the elapsed time in the inference log barely changed. Am I doing something wrong?

    opened by wmkai 2
  • Comparison experiments


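    Were all the comparison methods in Table 1 of the paper run with MMSegmentation? I wanted to run the comparison methods, but I could not find many of the models, such as DeepLabV3+ EfficientNet, DeepLabV3+ ShuffleNetV2-1.5x, and Semantic FPN ConvMLP-S; there is no related code in configs.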

    opened by vozhuo 1
  • The validation set result is 0

    Hi, thanks for your excellent work. When I used TopFormer to train on the ADE20K dataset, the validation result was 0. Have you ever encountered this situation? This is the command I used: bash tools/ local_configs/topformer/ 1
    I look forward to your reply.

    opened by ke-dev 1
  • size of output

    Hello, as shown in Figure 2, the 1/4 scale of the input is present in the output, so for a 448×448 input there is a 56×56 image in the output! What am I missing?

    opened by javadmozaffari 1
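
The arithmetic behind the "size of output" question can be checked directly. The following is a minimal sketch, assuming each scale simply divides the input side length by its stride; the helper name `feature_map_size` is hypothetical, and the sketch makes no claim about which scale the network actually emits. Under that assumption, a 56×56 map corresponds to 1/8 of a 448×448 input, while a 1/4 scale would give 112×112.

```python
def feature_map_size(input_size: int, stride: int) -> int:
    """Side length of a feature map at the given downsampling stride.

    Hypothetical helper for illustration; assumes exact integer division.
    """
    if input_size % stride != 0:
        raise ValueError(f"{input_size} is not divisible by stride {stride}")
    return input_size // stride


# Compare the candidate scales for a 448x448 input.
for stride in (4, 8, 16, 32):
    side = feature_map_size(448, stride)
    print(f"1/{stride} scale: {side}x{side}")
```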
  • Deploy test error with onnx backend


    Thanks for sharing this great repo. I trained and tested with my own dataset, and the prediction results are quite good.

    But when I exported the ONNX file with the command:

    python3 tools/  local_configs/topformer/ --input-img results/0109.png --shape 512 512 --checkpoint results/tiny_20k/latest.pth --output-file results/tiny_80k/tiny_512_512.onnx --show

    and then tried the deploy script with the command: python tools/ local_configs/topformer/ results/tiny_80k/tiny_512_512.onnx --backend=onnxruntime --show

    it showed this error:

      File "tools/", line 297, in
        main()
      File "tools/", line 268, in main
        results = single_gpu_test(
      File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmsegmentation-0.19.0-py3.8.egg/mmseg/apis/", line 91, in single_gpu_test
        result = model(return_loss=False, **data)
      File "/home/yidong/.local/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmcv/parallel/", line 42, in forward
        return super().forward(*inputs, **kwargs)
      File "/home/yidong/.local/lib/python3.8/site-packages/torch/nn/parallel/", line 165, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/home/yidong/.local/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmcv/runner/", line 98, in new_func
        return old_func(*args, **kwargs)
      File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmsegmentation-0.19.0-py3.8.egg/mmseg/models/segmentors/", line 110, in forward
        return self.forward_test(img, img_metas, **kwargs)
      File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/mmsegmentation-0.19.0-py3.8.egg/mmseg/models/segmentors/", line 92, in forward_test
        return self.simple_test(imgs[0], img_metas[0], **kwargs)
      File "tools/", line 84, in simple_test
        self.sess.run_with_iobinding(self.io_binding)
      File "/home/yidong/anaconda3/envs/mask2former/lib/python3.8/site-packages/onnxruntime/capi/", line 276, in run_with_iobinding
        self._sess.run_with_iobinding(iobinding._iobinding, run_options)
    RuntimeError: Error in execution: Got invalid dimensions for input: input for the following indices
     index: 2 Got: 1080 Expected: 512
     index: 3 Got: 1878 Expected: 512
     Please fix either the inputs or the model.

    Are there any requirements on the input images or shapes when exporting the ONNX file? The config settings are: img_scale = (1920, 1080), crop_size = (512, 512).

    I also tried some other shape sizes but am still confused by this error. Any help would be much appreciated.

    opened by wangyidong3 1
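
The RuntimeError in the issue above reports that dimensions 2 and 3 of the input were 1080 and 1878 while the exported model expected 512 for both, which matches the --shape 512 512 used at export time. The sketch below reproduces that comparison in plain Python. It is an illustration only: `shape_mismatches` is a hypothetical helper, not part of onnxruntime, and the shapes are copied from the error message. With a real session, the expected shape can usually be read from `InferenceSession.get_inputs()[0].shape`.

```python
from typing import List, Optional, Sequence, Tuple


def shape_mismatches(
    expected: Sequence[Optional[int]], got: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (index, got, expected) for each fixed dimension that differs.

    Entries of `expected` that are not integers (for example None) are
    treated as dynamic dimensions and are skipped.
    """
    if len(expected) != len(got):
        raise ValueError("rank mismatch between expected and actual shape")
    return [
        (i, g, e)
        for i, (e, g) in enumerate(zip(expected, got))
        if isinstance(e, int) and e != g
    ]


exported_shape = (1, 3, 512, 512)   # static shape fixed at export time
fed_shape = (1, 3, 1080, 1878)      # shape reported in the error message

for index, got, expected in shape_mismatches(exported_shape, fed_shape):
    print(f"index: {index} Got: {got} Expected: {expected}")
```

Under this reading, either the images fed to the deployed model need to be resized to the exported shape, or the model needs to be exported with a shape that matches the test pipeline; which of the two fits this repository is not settled by the issue.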
  • What's SASE

    Excellent work. In your paper you compare SASE with FFN and ASPP. I also read the code, but I can't find out what SASE is. Could you please tell me what SASE is, or point me to the related paper?

    opened by Liupengshuaige 1
  • onnx to tensorrt

    Hi, I used this command to get tmp.onnx:

    python3 tools/ \
        ./local_configs/topformer/ \
        --input-img ./imgs/999438_0_B_0007_7680-3840-11776-7936.jpg \
        --shape 2048 2048 \
        --checkpoint ./work_dirs/topformer_tiny_448x448_160k_2x8_ade20k/latest.pth

    Then I got the error below when I tried to convert tmp.onnx to TensorRT:

    assert is_tensorrt_plugin_loaded(), 'TensorRT plugin should be compiled.'
    AssertionError: TensorRT plugin should be compiled.

    My environment configuration is:
    cuda: 10.1
    tensorrt:

    I also tried cuda 11.3 but got the same error. Have you tried converting to TensorRT? How does this differ from the procedure in the MMSegmentation docs?

    I look forward to your reply.

    opened by ke-dev 0
  • Can you open source the ImageNet Pretraining code?

    I have tried pre-training the backbone+ppa+sase on ImageNet, with convbnconv + linear as the head, but I cannot reproduce the result of 75.3 for the base model. Could you provide the source code for pretraining on ImageNet?

    opened by shiyutang 0
  • AttributeError: 'ConfigDict' object has no attribute 'dist_params'

    (deformable_detr) root@workspace:/dfs/data/code_python/detection_2d/mmdetection# CUDA_VISIBLE_DEVICES=1 tools/ configs/deformable_detr/ 1
    Traceback (most recent call last):
      File "tools/", line 244, in
        main()
      File "tools/", line 172, in main
        init_dist(args.launcher, **cfg.dist_params)
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/mmcv/utils/", line 507, in getattr
        return getattr(self._cfg_dict, name)
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/mmcv/utils/", line 48, in getattr
        raise ex
    AttributeError: 'ConfigDict' object has no attribute 'dist_params'
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2441) of binary: /dfs/data/anaconda/envs/deformable_detr/bin/python
    Traceback (most recent call last):
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/", line 87, in _run_code
        exec(code, run_globals)
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/torch/distributed/", line 193, in
        main()
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/torch/distributed/", line 189, in main
        launch(args)
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/torch/distributed/", line 174, in launch
        run(args)
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/torch/distributed/", line 710, in run
        elastic_launch(
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/torch/distributed/launcher/", line 131, in call
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/dfs/data/anaconda/envs/deformable_detr/lib/python3.8/site-packages/torch/distributed/launcher/", line 259, in launch_agent
        raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    tools/ FAILED

    Failures: <NO_OTHER_FAILURES>

    Root Cause (first observed failure):
    [0]:
      time       : 2022-12-09_15:03:20
      host       : workspace
      rank       : 0 (local_rank: 0)
      exitcode   : 1 (pid: 2441)
      error_file : <N/A>
      traceback  : To enable traceback see:

    opened by yukaizhou 0
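
The traceback above fails at `init_dist(args.launcher, **cfg.dist_params)`, so the loaded config evidently has no top-level `dist_params` entry. The sketch below shows that check on a plain dictionary. It rests on assumptions: a `dict` stands in for the parsed config object, `missing_keys` is a hypothetical helper, and the value `dict(backend="nccl")` is the setting OpenMMLab runtime configs typically use rather than something confirmed for this repository.

```python
# Keys the distributed launcher reads from the config, per the traceback.
REQUIRED_FOR_DISTRIBUTED = ("dist_params",)


def missing_keys(cfg: dict, required=REQUIRED_FOR_DISTRIBUTED) -> list:
    """Return the required top-level keys that are absent from cfg."""
    return [key for key in required if key not in cfg]


# A config that defines the model and data but no runtime settings.
broken_cfg = {"model": {}, "data": {}}

# Assumed fix: add the distributed backend setting at the top level.
fixed_cfg = {**broken_cfg, "dist_params": dict(backend="nccl")}

print(missing_keys(broken_cfg))
print(missing_keys(fixed_cfg))
```

If the key is missing because a base runtime config was not inherited, restoring that inheritance is likely cleaner than adding the key by hand.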
  • Error: Topformer is not in models registry

    I am trying to use this with a custom dataset.

    lib/python3.7/site-packages/mmcv/utils/", line 62, in build_from_cfg
        f'{obj_type} is not in the {} registry')
    KeyError: 'Topformer is not in the models registry'

    What could be the reason for this?

    opened by nralka2007 1
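
A "not in the models registry" KeyError generally means the name in the config's `type` field was never registered. The toy registry below illustrates the mechanism. It is a simplified stand-in written for this note, not mmcv's implementation: the `Registry` class, its bare `register_module` decorator, and the `Topformer` placeholder class are all hypothetical. It shows two common causes: the module defining the class was never imported, so the decorator never ran, or the `type` string does not match the registered name exactly.

```python
class Registry:
    """Toy name-to-class registry (illustrative only, not mmcv's code)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Registration happens when the decorated class definition executes,
        # which only occurs if the module containing it has been imported.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        args = dict(cfg)
        obj_type = args.pop("type")
        if obj_type not in self._modules:
            raise KeyError(f"{obj_type} is not in the {self.name} registry")
        return self._modules[obj_type](**args)


MODELS = Registry("models")


@MODELS.register_module
class Topformer:  # hypothetical placeholder for the real backbone class
    def __init__(self, channels=32):
        self.channels = channels


# Works: the class was defined (and thus registered) before build() ran.
model = MODELS.build({"type": "Topformer", "channels": 64})
print(model.channels)

# Fails: the lookup is an exact, case-sensitive string match.
try:
    MODELS.build({"type": "TopFormer"})
except KeyError as err:
    print(err)
```

In a real project, the usual remedy for the first cause is to make sure the package that defines the backbone is imported before the model is built.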
  • Some training problems on 1 gpu

    I want to train on 1 GPU without using distributed training, so I ran

    python tools/ local_configs/topformer/ 1 --work-dir runs/

    and added

    import sys

    to the script. Finally, I got this:

    assert hasattr(ext, fun), f'{fun} miss in module {name}' AssertionError: ball_query_forward miss in module _ext

    How can I fix the problem? I resolved the problem above by reinstalling mmcv 1.3.14, but a bug appeared when I used my own ADE20K data:

    2022-10-31 18:18:14,190 - mmseg - INFO - Loaded 22945 images
    fatal: not a git repository (or any of the parent directories): .git
    Traceback (most recent call last):
      File "tools/", line 183, in <module>
      File "tools/", line 179, in main
      File "/home/xx/TopFormer-main/mmseg/apis/", line 88, in train_segmentor
        drop_last=True) for ds in dataset
      File "/home/xx/TopFormer-main/mmseg/apis/", line 88, in <listcomp>
        drop_last=True) for ds in dataset
    TypeError: object of type 'int' has no len()
    opened by sungh66 0
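
The TypeError in the traceback above is raised inside the dataloader construction, which suggests that one of its arguments is passed to `len()` while holding an integer. One plausible candidate, stated here as an assumption that the issue does not confirm, is a GPU id setting stored as a bare integer instead of a list. The hypothetical helper below normalizes such a value so that `len()` is always defined.

```python
def normalize_gpu_ids(gpu_ids):
    """Return gpu_ids as a list so that len() is always defined.

    Hypothetical helper: a bare int is treated as a single GPU id.
    """
    if isinstance(gpu_ids, int):
        return [gpu_ids]
    return list(gpu_ids)


# A bare integer would make len(gpu_ids) fail; the normalized list does not.
print(len(normalize_gpu_ids(0)))
print(len(normalize_gpu_ids(range(2))))
```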
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
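
The check described in the CVE-2007-4559 issue above, verifying that every archive member would land inside the target directory before extracting, can be sketched as follows. This is an illustration of the general technique, not the patch from the pull request; the function names `is_within_directory` and `safe_extract` are chosen for this example. The sketch only guards against path traversal through member names and does not inspect symbolic or hard links inside the archive.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract all members, refusing archives that escape the target path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(
                f"Blocked path traversal in tar member: {member.name}"
            )
    # Every member name was validated above, so extraction stays under path.
    tar.extractall(path)
```

A call such as `safe_extract(tarfile.open("archive.tar"), "output_dir")` would then replace a bare `extractall()`; the file and directory names here are placeholders. Recent Python releases also offer extraction filters on `extractall()` (for example `filter="data"`), which may be preferable where available.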
Hust Visual Learning Team
Hust Visual Learning Team belongs to the Artificial Intelligence Research Institute in the School of EIC at HUST, led by @xinggangw
