Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Last update: Jan 6, 2023

Overview

Segmenter: Transformer for Semantic Segmentation

Segmenter: Transformer for Semantic Segmentation by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and Cordelia Schmid.

*Equal Contribution

Installation

Define OS environment variables pointing to your checkpoint and dataset directories, and put them in your .bashrc:

export DATASET=/path/to/dataset/dir

Install PyTorch 1.9, then run pip install . at the root of this repository.

To download ADE20K, use the following command:

python -m segm.scripts.prepare_ade20k $DATASET

Model Zoo

We release models with a Vision Transformer backbone initialized from the improved ViT models.

ADE20K

Segmenter models with ViT backbone:

Name	mIoU (SS/MS)	# params	Resolution	FPS	Download
Seg-T-Mask/16	38.1 / 38.8	7M	512x512	52.4	model, config, log
Seg-S-Mask/16	45.3 / 46.9	27M	512x512	34.8	model, config, log
Seg-B-Mask/16	48.5 / 50.0	106M	512x512	24.1	model, config, log
Seg-L-Mask/16	51.3 / 53.2	334M	512x512	10.6	model, config, log
Seg-L-Mask/16	51.8 / 53.6	334M	640x640	-	model, config, log

Segmenter models with DeiT backbone:

Name	mIoU (SS/MS)	# params	Resolution	FPS	Download
Seg-B†/16	47.1 / 48.1	87M	512x512	27.3	model, config, log
Seg-B†-Mask/16	48.7 / 50.1	106M	512x512	24.1	model, config, log

Pascal Context

Name	mIoU (SS/MS)	# params	Resolution	FPS	Download
Seg-L-Mask/16	58.1 / 59.0	334M	480x480	-	model, config, log

Inference

Download a checkpoint and its configuration into a common folder, for example seg_tiny_mask.

You can generate segmentation maps from your own data with:

python -m segm.inference --model-path seg_tiny_mask/checkpoint.pth -i images/ -o segmaps/

To evaluate on ADE20K, run the command:

# single-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --singlescale
# multi-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --multiscale
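For readers unfamiliar with the metric reported by these commands, here is a minimal sketch of how mean IoU can be computed from predicted and ground-truth labels. It is an illustration only, not the repository's evaluation code (which relies on mmsegmentation); the function name `mean_iou` and the flat-list input format are assumptions made for this example.

```python
def mean_iou(pred, target, n_cls, ignore_index=None):
    """Mean IoU over the classes that appear in pred or target.

    pred and target are flat sequences of integer class labels of equal length.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    inter = [0] * n_cls          # pixels where prediction and target agree, per class
    pred_count = [0] * n_cls     # pixels predicted as each class
    target_count = [0] * n_cls   # pixels labelled as each class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1
    ious = []
    for c in range(n_cls):
        union = pred_count[c] + target_count[c] - inter[c]
        if union > 0:            # skip classes absent from both pred and target
            ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], n_cls=2)
```

Concatenating the labels of all validation images before calling the function gives a dataset-level score, which is how mIoU is usually reported.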

Train

Train Seg-T-Mask/16 on ADE20K on a single GPU:

python -m segm.train --log-dir seg_tiny_mask --dataset ade20k \
  --backbone vit_tiny_patch16_384 --decoder mask_transformer

To train Seg-B-Mask/16, simply set vit_base_patch16_384 as backbone and launch the above command using a minimum of 4 V100 GPUs (~12 minutes per epoch) and up to 8 V100 GPUs (~7 minutes per epoch). The code uses SLURM environment variables.

Logs

To plot the logs of your experiments, you can use

python -m segm.utils.logs logs.yml

where logs.yml, located in utils/, contains the paths to your experiment logs:

root: /path/to/checkpoints/
logs:
  seg-t: seg_tiny_mask/log.txt
  seg-b: seg_base_mask/log.txt

Video Segmentation

Zero-shot video segmentation on the DAVIS video dataset with the Seg-B-Mask/16 model trained on ADE20K.

BibTex

@article{strudel2021,
  title={Segmenter: Transformer for Semantic Segmentation},
  author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
  journal={arXiv preprint arXiv:2105.05633},
  year={2021}
}

Acknowledgements

The Vision Transformer code is based on the timm library, and the semantic segmentation training and evaluation pipeline uses mmsegmentation.

Comments

KeyError: ''

Hello, I ran the program on Windows and got the following error:

D:\Download\anaconda\anaconda\envs\learn\python.exe E:/Learning/Graduate/segmenter/segmenter-master/segm/train.py
Starting process with rank 0...
Process 0 is connected.
All processes are connected.
Traceback (most recent call last):
  File "E:\Learning\Graduate\segmenter\segmenter-master\segm\train.py", line 304, in <module>
    main()
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "E:\Learning\Graduate\segmenter\segmenter-master\segm\train.py", line 76, in main
    model_cfg = cfg["model"][backbone]
KeyError: ''

Do you know how to solve it? Thank you!
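The traceback shows train.py being launched without arguments and failing on cfg["model"][backbone] with an empty key, which is consistent with the backbone option never being set (the training command in the README passes --backbone explicitly). A hypothetical guard, sketched below, would turn the bare KeyError into a readable message. The function name `select_model_cfg` and the toy config are assumptions for illustration, not code from the repository.

```python
def select_model_cfg(cfg, backbone):
    """Return cfg["model"][backbone], or fail with a message listing valid names."""
    models = cfg["model"]
    if backbone not in models:
        valid = ", ".join(sorted(models))
        raise ValueError(
            f"Unknown backbone {backbone!r}; pass --backbone with one of: {valid}"
        )
    return models[backbone]


# Toy config standing in for the repository's YAML file (values are placeholders).
toy_cfg = {"model": {"vit_tiny_patch16_384": {"patch_size": 16}}}
model_cfg = select_model_cfg(toy_cfg, "vit_tiny_patch16_384")
```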

opened by SikangSHU 8

Ask about the "Seg-B/8"

Great work on semantic segmentation!

I find that the resolution is important for the final performance, e.g., Seg-B/8.

However, I could not find ImageNet pre-trained checkpoints with patch size 8 in the timm library.

It would be great if you could help to address my concern!

opened by PkuRainBow 8

Code to compute images/sec

Hi,

Thank you for the cool work!

I see that you report images/sec, and mention the following in the paper:

To compute the images per second, we use a V100 GPU, fix the image resolution to 512 and for each model we maximize the batch size allowed by memory for a fair comparison.

I'm trying to do the same, but I'm unable to reproduce the images/sec numbers reported in the paper.

I'm using the code snippet from PyTorch as follows:

import torch
from torch.utils.benchmark import Timer

batch = torch.rand(args.batch_size, *input_shape).cuda()
model(batch)  # warm-up forward pass
n_runs = 10
t = Timer(stmt="model.forward(batch)", globals={"model": model, "batch": batch})
m = t.timeit(n_runs)

The batch size that fits on a V100 for the ViT-T backbone is about 140, and the above code reports a timing of 0.62 seconds. So I compute images/sec = 140 / 0.62 = 225.8, which is almost half the number in Table 3. Can you please tell me what I need to do to get the reported result?
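The throughput arithmetic in this post can be written as a small helper. The sketch below uses only the standard library and is not the authors' benchmarking code; `images_per_second` and `time_forward` are names invented for this example. When timing a GPU model by hand, the callable passed to `time_forward` would also need to synchronize the device before returning, since CUDA kernels run asynchronously.

```python
import time


def images_per_second(batch_size, seconds_per_forward):
    """Throughput given the batch size and the mean time of one forward pass."""
    return batch_size / seconds_per_forward


def time_forward(fn, n_warmup=3, n_runs=10):
    """Mean wall-clock seconds per call of fn(), measured after warm-up calls."""
    for _ in range(n_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs


# Numbers quoted in the post: batch of 140 images, 0.62 s per forward pass.
throughput = round(images_per_second(140, 0.62), 1)  # 225.8
```

Settings such as running the forward pass under torch.no_grad() or in half precision may change both the largest batch that fits in memory and the measured time, so they are worth checking when comparing against published numbers.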

Thank you!

opened by prabhuteja12 6

How to get the attention maps

First, the folder named images does not contain a file named im0.jpg, so an error message is raised.
If I replace it with images/validation/ADE_val_0000000.jpg, I get: ValueError: Provided image path images/training/ADE_train_00016528 is not a valid image file.

Also, what is the output_dir?

opened by sijiua 6

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/$WORK/tempbs_7o9oj'

I began training on my own data, but I get an error the first time evaluation runs. The log is as follows:

Epoch: [11] [0/8] eta: 0:00:34 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 4.2506 data: 2.3495 max mem: 9466
Epoch: [11] [7/8] eta: 0:00:00 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 0.9943 data: 0.2958 max mem: 9491
Epoch: [11] Total time: 0:00:08 (1.0115 s / it)
Epoch: [12] [0/8] eta: 0:00:27 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 3.4646 data: 2.7603 max mem: 9492
Epoch: [12] [7/8] eta: 0:00:00 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 0.8330 data: 0.3464 max mem: 9492
Epoch: [12] Total time: 0:00:06 (0.8537 s / it)
Eval: [ 0/58] eta: 0:01:40 time: 1.7340 data: 1.3048 max mem: 10891
Eval: [50/58] eta: 0:00:01 time: 0.1124 data: 0.0121 max mem: 16814
Eval: [57/58] eta: 0:00:00 time: 0.1047 data: 0.0120 max mem: 16814
Eval: Total time: 0:00:08 (0.1505 s / it)
Traceback (most recent call last):
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/qiuzheng/segmenter/segm/train.py", line 304, in <module>
    main()
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/qiuzheng/segmenter/segm/train.py", line 266, in main
    eval_logger = evaluate(
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/qiuzheng/segmenter/segm/engine.py", line 104, in evaluate
    val_seg_pred = gather_data(val_seg_pred)
  File "/home/qiuzheng/segmenter/segm/metrics.py", line 60, in gather_data
    tmpdir = tempfile.mkdtemp(prefix=tmpprefix)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/tempfile.py", line 359, in mkdtemp
    os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/$WORK/tempbs_7o9oj'
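The literal $WORK in the failing path suggests that the temporary-directory prefix was built from an environment variable named WORK that was never expanded, for instance because it is not defined on this machine. Under that assumption, either exporting WORK to an existing directory or falling back to the system default should avoid the error. The sketch below illustrates the fallback; `safe_mkdtemp` is a hypothetical helper, not the function in segm/metrics.py.

```python
import os
import tempfile


def safe_mkdtemp(work_var="WORK", name="temp"):
    """Create a temp dir under $WORK when it is set, else under the system default."""
    base = os.environ.get(work_var)
    if base:
        os.makedirs(base, exist_ok=True)  # make sure the parent directory exists
        return tempfile.mkdtemp(prefix=name, dir=base)
    return tempfile.mkdtemp(prefix=name)


# os.path.expandvars leaves undefined variables untouched, which would
# produce a path containing a literal "$..." component like the one above.
os.environ.pop("WORK_UNSET_EXAMPLE", None)
unexpanded = os.path.expandvars("$WORK_UNSET_EXAMPLE/temp")
```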

opened by shuaikangma 6

Multi-GPU training

This is a good paper and a very interesting idea! The README gives a training command for a single GPU. Could you provide the corresponding command for multi-GPU training?

opened by qiulesun 4

Performance of Seg-B/16 on CityScapes using AugReg initialization

Hi, thanks for the excellent work! I notice that in your paper, the Seg-B/16 trained on CityScapes is initialized from a DeiT pre-trained model (rather than AugReg). In my own experiments, Seg-B/16 (and my own model based on ViT-Base) with AugReg initialization performs quite badly on CityScapes (73.2 mIoU), while Seg-S/16 performs well (76.2 mIoU). Did you get similar results, and can you share extra information about your choice of initialization for the Seg-B/16 model? Many thanks.

opened by YiF-Zhang 4

Multi-GPU Training Not On SLURM

Hello, thanks a lot for contributing such excellent work. I noticed that the distributed multi-GPU training is based on the SLURM platform, which makes it hard to run on other platforms. Could you or anyone else provide some tips for changing the SLURM-based code to non-SLURM code, so that multi-GPU distributed training can also be run on other platforms?
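One possible direction, sketched here under assumptions rather than taken from the repository, is to read the process rank and world size from SLURM variables when they exist and otherwise from the RANK, WORLD_SIZE and LOCAL_RANK variables that the torchrun launcher sets. The helper name `get_dist_info` is hypothetical; the returned values would then be passed to the usual torch.distributed initialization.

```python
import os


def get_dist_info(env=None):
    """Return (rank, world_size, local_rank) from SLURM or torchrun variables."""
    env = os.environ if env is None else env
    if "SLURM_PROCID" in env:
        # Launched with srun: SLURM exposes one task per process.
        return (
            int(env["SLURM_PROCID"]),
            int(env["SLURM_NTASKS"]),
            int(env.get("SLURM_LOCALID", 0)),
        )
    if "RANK" in env and "WORLD_SIZE" in env:
        # Launched with torchrun (or another launcher exporting these names).
        return (
            int(env["RANK"]),
            int(env["WORLD_SIZE"]),
            int(env.get("LOCAL_RANK", 0)),
        )
    # No launcher detected: fall back to single-process training.
    return 0, 1, 0


rank, world_size, local_rank = get_dist_info({"RANK": "1", "WORLD_SIZE": "2", "LOCAL_RANK": "1"})
```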

opened by luck528 4

Performance better than that in the paper.

Hi Robin,

Thanks for releasing the code and model. I find that your model performs better than what is reported in the paper. For example, on ADE20K validation set, Seg-B-Mask/16 has 45.69 mIoU (SS), but according to the information from this repo, it can actually achieve 48.5. Am I missing something?

opened by chenyangh 4

Train on custom dataset

Hello, I would like to ask whether I can modify the existing code to train on my own dataset, because in a previous issue I read that this is not possible yet. If it is possible, do you have any hints about the modifications needed?

opened by george-kalitsios 3

Performance on Pascal Context with Seg-L-Mask/16

Hi, thanks for the great work and the code! I'm trying to reproduce the baseline based on mmsegmentation. While the baseline reproduces well on Cityscapes and ADE20K, I could only get 56.9 single-scale mIoU on Pascal Context (58.1 reported). Have I missed anything? Below is the mmsegmentation-based config I'm running. Is anything wrong in the settings? Many thanks for your help!

_base_ = [
    # "./training_scheme.py",
    "../_base_/models/segmenter_vit-b16.py",
    "../_base_/datasets/pascal_context_meanstd0.5.py",
    "../_base_/default_runtime.py",
    "../_base_/schedules/schedule_80k.py",
]

model = dict(
    pretrained="pretrain/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz",
    backbone=dict(
        type="VisionTransformer",
        img_size=(480, 480),
        patch_size=16,
        in_channels=3,
        embed_dims=1024,
        num_layers=24,
        num_heads=16,
        mlp_ratio=4,
        out_indices=(5, 11, 17, 23),
        qkv_bias=True,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.1,
        with_cls_token=True,
        final_norm=True,
        norm_cfg=dict(type="LN", eps=1e-6),
        act_cfg=dict(type="GELU"),
        norm_eval=False,
        interpolate_mode="bicubic",
    ),
    neck=dict(
        type="UseIndexSingleOutNeck",
        index=-1,
    ),
    decode_head=dict(
        n_cls=60,
        n_layers=2,
        d_encoder=1024,
        n_heads=16,
        d_model=1024,
        d_ff=4 * 1024,
    ),
    test_cfg=dict(mode="slide", crop_size=(480, 480), stride=(320, 320)),
)

optimizer = dict(
    _delete_=True,
    type="SGD",
    lr=0.001,
    weight_decay=0.0,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={
            "pos_embed": dict(decay_mult=0.0),
            "cls_token": dict(decay_mult=0.0),
            "norm": dict(decay_mult=0.0),
        }
    ),
)

lr_config = dict(
    _delete_=True,
    policy="poly",
    warmup_iters=0,
    power=0.9,
    min_lr=1e-5,
    by_epoch=False,
)

# By default, models are trained on 8 GPUs with 2 images per GPU
data = dict(samples_per_gpu=2)
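To make the lr_config above concrete, the following sketch evaluates a polynomial ("poly") learning-rate decay with the values from this config (lr=0.001, power=0.9, min_lr=1e-5). The total of 80000 iterations is an assumption inferred from the schedule_80k.py name, and `poly_lr` is an illustrative function, not mmsegmentation's own hook.

```python
def poly_lr(base_lr, cur_iter, max_iters, power=0.9, min_lr=1e-5):
    """Polynomial decay from base_lr at iteration 0 down to min_lr at max_iters."""
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


# Learning rate at the start, midpoint and end of an assumed 80k-iteration run.
schedule = [poly_lr(0.001, it, 80000) for it in (0, 40000, 80000)]
```

Comparing such a schedule (and the effective batch size, here 8 GPUs x 2 images) against the one used for the reported result is one way to look for the source of a gap like the one described above.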

opened by hardyho 3

customised data

Hello,

I want to try this on my own dataset. I have created config files and Python files similar to the ones you have for ade20k.

I added a class file for my dataset:

FISH_CONFIG_PATH = Path(__file__).parent / "config" / "fish.py"
FISH_CATS_PATH = Path(__file__).parent / "config" / "fish.yml"

@DATASETS.register_module
class FishSegmentation(BaseMMSeg):
    def __init__(self, image_size, crop_size, split, **kwargs):
        super().__init__(
            image_size, crop_size, split, 
            config_path = FISH_CONFIG_PATH,
            normalization=kwargs.pop('normalization')
        )
        self.names, self.colors = utils.dataset_cat_description(FISH_CATS_PATH)
        self.n_cls = 150
        self.ignore_label = 0
        self.reduce_zero_label = True

After I registered my dataset with @DATASETS.register_module, the __init__ function seems to conflict with your BaseMMSeg. Is there any way I can use custom data with your repo?

opened by Remosy 1

CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
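The kind of check described here can be sketched as follows. This is a minimal illustration written for this page, not the patch submitted in the pull request; `safe_extractall` is a hypothetical name. It only validates member names and does not examine symbolic or hard links inside the archive. On Python 3.12 and later, passing filter="data" to extractall() is another option.

```python
import os
import tarfile


def safe_extractall(tar, path="."):
    """Extract every member of an open tarfile only if all stay inside path."""
    root = os.path.realpath(path)
    for member in tar.getmembers():
        # Resolve where this member would land once joined to the target dir.
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"Blocked path traversal in tar member: {member.name!r}")
    tar.extractall(path)
```

A member named ../evil.txt resolves outside the target directory, so the function raises before anything is written to disk.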

If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

opened by TrellixVulnTeam 0

KeyError: 'optimizer'

Thank you for your excellent work, but I have a problem with the checkpoint.pth file. When I try to run the segm.train module, I get the error "KeyError: 'optimizer'". I hope you can help me. Thanks again!

opened by Werejoice 5

Unexpected keyword `mlp_ratio` running `seg_base_deit_mask`

First of all, excellent repo - thanks very much for the awesome contribution to the ml community!

When running eval on seg_base_deit_mask (via python -m segm.eval.miou checkpoints/seg_base_deit_mask/checkpoint.pth ade20k --multiscale), I am getting an error:

Starting process with rank 0...
Process 0 is connected.
All processes are connected.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/.../segmenter/segm/eval/miou.py", line 279, in <module>
    main()
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/.../segmenter/segm/eval/miou.py", line 226, in main
    model, variant = load_model(model_path)
  File "/home/.../segmenter/segm/model/factory.py", line 119, in load_model
    model = create_segmenter(net_kwargs)
  File "/home/.../segmenter/segm/model/factory.py", line 106, in create_segmenter
    encoder = create_vit(model_cfg)
  File "/home/.../segmenter/segm/model/factory.py", line 67, in create_vit
    model = VisionTransformer(**model_cfg)
TypeError: __init__() got an unexpected keyword argument 'mlp_ratio'

This is happening with both single-scale and multi-scale evaluation. It seems to stem from the mlp_ratio key located in the yml config.
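One way to confirm which config keys a constructor rejects is to compare the config against the constructor's signature. The sketch below does this with inspect; `filter_kwargs` and `ToyEncoder` are hypothetical names used for illustration, and ToyEncoder is a stand-in rather than the repository's VisionTransformer.

```python
import inspect


def filter_kwargs(cls, cfg):
    """Split cfg into (accepted, dropped) according to cls.__init__'s parameters."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(cfg), {}  # the constructor takes **kwargs, so nothing is rejected
    accepted = {k: v for k, v in cfg.items() if k in params}
    dropped = {k: v for k, v in cfg.items() if k not in params}
    return accepted, dropped


class ToyEncoder:
    """Hypothetical stand-in for a model whose constructor lacks mlp_ratio."""

    def __init__(self, image_size, patch_size, n_layers):
        self.image_size = image_size
        self.patch_size = patch_size
        self.n_layers = n_layers


cfg = {"image_size": 512, "patch_size": 16, "n_layers": 12, "mlp_ratio": 4}
accepted, dropped = filter_kwargs(ToyEncoder, cfg)
model = ToyEncoder(**accepted)
```

Dropping a key silently can change the architecture if the checkpoint was trained with a non-default value, so this is better suited to diagnosing the mismatch than to serving as a permanent fix.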

As I keep poking around, if I find a solution I'll submit a PR.

Thanks again for the repo :+1:

opened by zroach 0

Owner

PhD student at Ecole Normale Supérieure and INRIA Paris

GitHub

The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter

FAPIS The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter Introduction This repo is primari

8 Dec 11, 2022

Finding an Unsupervised Image Segmenter in each of your Deep Generative Models

Finding an Unsupervised Image Segmenter in each of your Deep Generative Models Description Recent research has shown that numerous human-interpretable

61 Oct 17, 2022

A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation

Paper Khoi Nguyen, Sinisa Todorovic "A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation", accepted to ICCV 2021 Our code is mai

5 Aug 14, 2022

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

32 Sep 21, 2022

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

1.4k Dec 30, 2022

nnFormer: Interleaved Transformer for Volumetric Segmentation Code for paper "nnFormer: Interleaved Transformer for Volumetric Segmentation "

nnFormer: Interleaved Transformer for Volumetric Segmentation Code for paper "nnFormer: Interleaved Transformer for Volumetric Segmentation ". Please

610 Dec 28, 2022

Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021".

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021. Introduction We proposed a novel model training paradi

103 Dec 14, 2022

HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation Official PyTorch Implementation

We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder. Furthermore, to allow maximal adaptivity, the weights at each decoder block vary spatially. For this purpose, we design a new type of hypernetwork, composed of a nested U-Net for drawing higher level context features

182 Dec 14, 2022

An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

35 Oct 26, 2022

TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

147 Dec 3, 2022

Official pytorch code for SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal

SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal This is the official pytorch code for SSAT: A Symmetric Semantic-

57 Dec 13, 2022

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

337 Dec 15, 2022

PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick." [Project page] [Paper

59 Sep 25, 2022

Official implementation of "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers"

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers Figure 1: Performance of SegFormer-B0 to SegFormer-B5. Project page

1.4k Dec 31, 2022

Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation"

DSP Official implementation of "DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation". Accepted by ACM Multimedia 2021. Authors

20 Oct 24, 2022

Official and maintained implementation of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data" [BMVC 2021].

OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data Christoph Reich, Tim Prangemeier, Özdemir Cetin & Heinz Koeppl | Pr

23 Sep 21, 2022