Segmenter - Transformer for Semantic Segmentation

Overview

Figure 1 from paper

Segmenter: Transformer for Semantic Segmentation by Robin Strudel, Ricardo Garcia, Ivan Laptev and Cordelia Schmid.

Installation

The code and several trained models will be released soon.

Video Segmentation

Segmentation maps of Seg-B-Mask/16 trained on the ADE20K segmentation dataset and tested on the DAVIS video dataset.

BibTeX

@article{strudel2021,
  title={Segmenter: Transformer for Semantic Segmentation},
  author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
  journal={arXiv preprint arXiv:?},
  year={2021}
}

Credits

The Vision Transformer code is based on the timm library, and the semantic segmentation training and evaluation pipeline uses mmsegmentation.

Comments
  • KeyError: ''

    Hello, I am running the program on Windows, and the following error occurred:

    D:\Download\anaconda\anaconda\envs\learn\python.exe E:/Learning/Graduate/segmenter/segmenter-master/segm/train.py
    Starting process with rank 0...
    Process 0 is connected.
    All processes are connected.
    Traceback (most recent call last):
      File "E:\Learning\Graduate\segmenter\segmenter-master\segm\train.py", line 304, in <module>
        main()
      File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "E:\Learning\Graduate\segmenter\segmenter-master\segm\train.py", line 76, in main
        model_cfg = cfg["model"][backbone]
    KeyError: ''
    

    Do you know how to solve it? Thank you!

    opened by SikangSHU 8
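The traceback fails at `model_cfg = cfg["model"][backbone]` with `KeyError: ''`. Because `cfg["model"]` itself was found, the backbone name must have reached that line as an empty string. A likely fix is to make sure `train.py` receives a backbone name that exists under `model` in the config. Below is a minimal sketch of a lookup that fails with a readable message instead. It assumes `cfg` is the nested dict loaded from the repo's config file. The helper `lookup_backbone_cfg` and the example backbone key are hypothetical.

```python
from typing import Any, Dict


def lookup_backbone_cfg(cfg: Dict[str, Any], backbone: str) -> Dict[str, Any]:
    """Return cfg["model"][backbone], failing with a readable message.

    Hypothetical helper mirroring `model_cfg = cfg["model"][backbone]`.
    """
    models = cfg.get("model", {})
    available = sorted(models)
    if not backbone:
        # This is the situation in the traceback: the backbone name is ''.
        raise ValueError(f"No backbone name given; choose one of {available}")
    if backbone not in models:
        raise KeyError(f"Unknown backbone {backbone!r}; choose one of {available}")
    return models[backbone]


# Illustrative config; the real keys come from the repo's config file.
cfg = {"model": {"vit_tiny_patch16_384": {"patch_size": 16}}}
print(lookup_backbone_cfg(cfg, "vit_tiny_patch16_384"))  # {'patch_size': 16}
```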
  • Ask about the "Seg-B/8"

    Great work on semantic segmentation!

    I found that resolution is important for the final performance, e.g., with Seg-B/8.

    However, I could not find ImageNet pre-trained checkpoints with patch size 8 in the timm library.

    It would be great if you could help address my concern!

    opened by PkuRainBow 8
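This sketch shows why a smaller patch size tends to help and also costs more. A ViT splits an H×W image into (H/p)·(W/p) patch tokens. Self-attention then compares every token with every other token, so its cost grows with the square of that count. The 512×512 resolution below is only an example, and the function name is illustrative.

```python
def num_patch_tokens(height: int, width: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for one image (class token excluded)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


for p in (16, 8):
    n = num_patch_tokens(512, 512, p)
    # Each attention head builds an n x n matrix of token-to-token scores.
    print(f"patch {p}: {n} tokens, {n * n} attention entries per head")
# patch 16: 1024 tokens, 1048576 attention entries per head
# patch 8: 4096 tokens, 16777216 attention entries per head
```

Halving the patch size from 16 to 8 quadruples the token count and makes each attention matrix 16 times larger.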
  • Code to compute images/sec

    Hi,

    Thank you for the cool work!

    I see that you report images/sec, and mention the following in the paper:

    To compute the images per second, we use a V100 GPU, fix the image resolution to 512 and for each model we maximize the batch size allowed by memory for a fair comparison.

    I'm trying to do the same; however, I'm unable to reproduce the images/sec numbers reported in the paper.

    I'm using the code snippet from PyTorch as follows:

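        import torch
        from torch.utils.benchmark import Timer

        # `model`, `args.batch_size` and `input_shape` are defined elsewhere in the script
        batch = torch.rand(args.batch_size, *input_shape).cuda()
        model(batch)  # one warm-up forward pass before timing
        n_runs = 10
        t = Timer(stmt="model.forward(batch)", globals={"model": model, "batch": batch})
        m = t.timeit(n_runs)  # m.mean: average seconds per forward pass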
    

    The largest batch size that fits on a V100 for the ViT-T backbone is about 140, and the code above gives a timing of 0.62 seconds. So I compute images/sec = 140/0.62 = 225.8, which is almost half the number in Table 3. Can you please tell me what I need to do to get the reported result?

    Thank you!

    opened by prabhuteja12 6
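Here is the throughput arithmetic from the issue, written out as a small sketch. `torch.utils.benchmark.Timer.timeit(n)` returns a `Measurement` that reports per-run statistics such as `m.mean`. The batch size should therefore be divided by the time of a single forward pass, not by the total over `n` runs. The helper name is hypothetical. The sketch only pins down the formula being compared; it does not explain the gap to Table 3.

```python
def images_per_second(batch_size: int, seconds_per_batch: float) -> float:
    """Throughput when one forward pass processes `batch_size` images."""
    if seconds_per_batch <= 0:
        raise ValueError("seconds_per_batch must be positive")
    return batch_size / seconds_per_batch


# Figures quoted in the issue: 140 images per batch, 0.62 s per forward pass.
print(round(images_per_second(140, 0.62), 1))  # 225.8
```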
  • how to get the attention maps

    First, the images folder does not contain a file named im0.jpg, and an error message is shown. If I replace it with images/validation/ADE_val_0000000.jpg, I get: ValueError: Provided image path images/training/ADE_train_00016528 is not a valid image file.

    Also, what should output_dir be?

    opened by sijiua 6
  • FileNotFoundError: [Errno 2] No such file or directory: '/tmp/$WORK/tempbs_7o9oj'

    I began training on my own data, but I get an error at the first evaluation. The log is as follows:

    Epoch: [11] [0/8] eta: 0:00:34 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 4.2506 data: 2.3495 max mem: 9466
    Epoch: [11] [7/8] eta: 0:00:00 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 0.9943 data: 0.2958 max mem: 9491
    Epoch: [11] Total time: 0:00:08 (1.0115 s / it)
    Epoch: [12] [0/8] eta: 0:00:27 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 3.4646 data: 2.7603 max mem: 9492
    Epoch: [12] [7/8] eta: 0:00:00 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 0.8330 data: 0.3464 max mem: 9492
    Epoch: [12] Total time: 0:00:06 (0.8537 s / it)
    Eval: [ 0/58] eta: 0:01:40 time: 1.7340 data: 1.3048 max mem: 10891
    Eval: [50/58] eta: 0:00:01 time: 0.1124 data: 0.0121 max mem: 16814
    Eval: [57/58] eta: 0:00:00 time: 0.1047 data: 0.0120 max mem: 16814
    Eval: Total time: 0:00:08 (0.1505 s / it)
    Traceback (most recent call last):
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/runpy.py", line 192, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/qiuzheng/segmenter/segm/train.py", line 304, in <module>
        main()
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/home/qiuzheng/segmenter/segm/train.py", line 266, in main
        eval_logger = evaluate(
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/qiuzheng/segmenter/segm/engine.py", line 104, in evaluate
        val_seg_pred = gather_data(val_seg_pred)
      File "/home/qiuzheng/segmenter/segm/metrics.py", line 60, in gather_data
        tmpdir = tempfile.mkdtemp(prefix=tmpprefix)
      File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/tempfile.py", line 359, in mkdtemp
        os.mkdir(file, 0o700)
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/$WORK/tempbs_7o9oj'

    opened by shuaikangma 6
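The failing path `/tmp/$WORK/tempbs_7o9oj` contains a literal `$WORK`. This suggests that the prefix passed to `tempfile.mkdtemp` in `segm/metrics.py` refers to an environment variable that was not set or not expanded, so the directory `/tmp/$WORK` does not exist. Setting `WORK`, or changing the prefix, may be enough; exactly how the repo builds `tmpprefix` is an assumption here. Below is a hedged sketch of a more forgiving helper. The name `make_tmpdir` is hypothetical.

```python
import os
import tempfile


def make_tmpdir(prefix_template: str) -> str:
    """Create a temporary directory whose prefix may contain environment variables.

    Hypothetical helper. tempfile.mkdtemp() builds its path with
    os.path.join(tempdir, prefix + random_name), so a prefix that contains
    directories needs those parent directories to exist already.
    """
    prefix = os.path.expandvars(prefix_template)  # "$WORK/..." -> value of WORK, if set
    if "$" in prefix:
        # expandvars leaves unknown variables untouched, e.g. a literal "$WORK"
        raise RuntimeError(f"unexpanded environment variable in {prefix!r}")
    full_prefix = os.path.join(tempfile.gettempdir(), prefix)  # an absolute prefix wins
    os.makedirs(os.path.dirname(full_prefix), exist_ok=True)
    return tempfile.mkdtemp(prefix=full_prefix)
```

Failing loudly on an unexpanded variable is deliberate. Silently creating a directory literally named `$WORK` would hide the misconfiguration.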
  • Multi-GPU training

    This is a good paper and a very interesting idea! The README provides a training command for a single GPU. For multi-GPU training, could you provide the corresponding command?

    opened by qiulesun 4
  • Performance of Seg-B/16 on CityScapes using AugReg initialization

    Hi, thanks for the excellent work! I noticed that in your paper, the Seg-B/16 model trained on CityScapes is initialized from a DeiT pre-trained model (rather than AugReg). In my own experiments, Seg-B/16 (and my own model based on ViT-Base) with AugReg initialization performs quite poorly on CityScapes (73.2 mIoU), while Seg-S/16 performs well (76.2 mIoU). Did you get similar results, and could you share more about your choice of initialization for the Seg-B/16 model? Many thanks.

    opened by YiF-Zhang 4
  • Multi-GPU Training Not On SLURM

    Hello, thanks a lot for contributing such excellent work. I noticed that distributed multi-GPU training is based on the SLURM platform, which is not easy to run on other platforms. Could you or anyone provide some tips for converting the SLURM-based code into non-SLURM code, so that multi-GPU distributed training can also be run on other platforms?

    opened by luck528 4
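One common non-SLURM route is to launch one process per GPU with `torchrun --nproc_per_node=<num_gpus> ...`. This sets `RANK`, `WORLD_SIZE`, `LOCAL_RANK`, `MASTER_ADDR` and `MASTER_PORT` for each process. Below is a minimal sketch of reading those variables, with a fallback to the per-task variables SLURM sets. `read_dist_env` is hypothetical and not part of the repo. How the repo's own SLURM setup consumes these values is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistInfo(NamedTuple):
    rank: int
    world_size: int
    local_rank: int


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read process-group coordinates from the environment.

    Prefers the variables set by torchrun (RANK, WORLD_SIZE, LOCAL_RANK),
    falls back to the ones SLURM sets per task (SLURM_PROCID, SLURM_NTASKS,
    SLURM_LOCALID), and otherwise assumes a single process.
    """
    env = os.environ if env is None else env
    if "RANK" in env and "WORLD_SIZE" in env:
        return DistInfo(int(env["RANK"]), int(env["WORLD_SIZE"]),
                        int(env.get("LOCAL_RANK", 0)))
    if "SLURM_PROCID" in env:
        return DistInfo(int(env["SLURM_PROCID"]), int(env["SLURM_NTASKS"]),
                        int(env["SLURM_LOCALID"]))
    return DistInfo(0, 1, 0)
```

The resulting values could then go to `torch.cuda.set_device(info.local_rank)` followed by `torch.distributed.init_process_group(backend="nccl", rank=info.rank, world_size=info.world_size)`. With the default `env://` initialization, `MASTER_ADDR` and `MASTER_PORT` come from the environment that `torchrun` prepares.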
  • Performance better than that in the paper.

    Hi Robin,

    Thanks for releasing the code and model. I find that your model performs better than what is reported in the paper. For example, on the ADE20K validation set, Seg-B-Mask/16 has 45.69 mIoU (SS) in the paper, but according to this repo it can actually achieve 48.5. Am I missing something?

    opened by chenyangh 4
  • train on custom dataset

    Hello, I would like to ask whether I can modify the existing code to train on my own dataset, because in a previous issue I read that this is not possible yet. If it is possible, are there any hints about the modifications needed?

    opened by george-kalitsios 3
  • Performance on Pascal Context with Seg-L-Mask/16

    Hi, thanks for the great work and the code! I'm trying to reproduce the baseline with mmsegmentation. While the baseline reproduces well on Cityscapes and ADE20K, I could only get 56.9 single-scale on Pascal Context (58.1 reported). Is there anything I've missed? Below is the mmsegmentation config I'm running; is anything wrong in the setting? Many thanks for your help!

    _base_ = [
        # "./training_scheme.py",
        "../_base_/models/segmenter_vit-b16.py",
        "../_base_/datasets/pascal_context_meanstd0.5.py",
        "../_base_/default_runtime.py",
        "../_base_/schedules/schedule_80k.py",
    ]
    
    model = dict(
        pretrained="pretrain/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz",
        backbone=dict(
            type="VisionTransformer",
            img_size=(480, 480),
            patch_size=16,
            in_channels=3,
            embed_dims=1024,
            num_layers=24,
            num_heads=16,
            mlp_ratio=4,
            out_indices=(5, 11, 17, 23),
            qkv_bias=True,
            drop_rate=0.0,
            attn_drop_rate=0.0,
            drop_path_rate=0.1,
            with_cls_token=True,
            final_norm=True,
            norm_cfg=dict(type="LN", eps=1e-6),
            act_cfg=dict(type="GELU"),
            norm_eval=False,
            interpolate_mode="bicubic",
        ),
        neck=dict(
            type="UseIndexSingleOutNeck",
            index=-1,
        ),
        decode_head=dict(
            n_cls=60,
            n_layers=2,
            d_encoder=1024,
            n_heads=16,
            d_model=1024,
            d_ff=4 * 1024,
        ),
        test_cfg=dict(mode="slide", crop_size=(480, 480), stride=(320, 320)),
    )
    
    optimizer = dict(
        _delete_=True,
        type="SGD",
        lr=0.001,
        weight_decay=0.0,
        momentum=0.9,
        paramwise_cfg=dict(
            custom_keys={
                "pos_embed": dict(decay_mult=0.0),
                "cls_token": dict(decay_mult=0.0),
                "norm": dict(decay_mult=0.0),
            }
        ),
    )
    
    lr_config = dict(
        _delete_=True,
        policy="poly",
        warmup_iters=0,
        power=0.9,
        min_lr=1e-5,
        by_epoch=False,
    )
    
    # By default, models are trained on 8 GPUs with 2 images per GPU
    data = dict(samples_per_gpu=2)
    
    opened by hardyho 3
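The `lr_config` above selects a polynomial ("poly") decay. The sketch below assumes the formula used by mmcv's poly policy, lr = (base_lr - min_lr) * (1 - iter / max_iters) ** power + min_lr. It also assumes `max_iters = 80_000`, taken from the `schedule_80k.py` base config. The config comment implies an effective batch of 8 × 2 = 16 images per iteration. On fewer GPUs the effective batch shrinks unless `samples_per_gpu` is raised accordingly. The function name is illustrative.

```python
def poly_lr(base_lr: float, it: int, max_iters: int,
            power: float = 0.9, min_lr: float = 0.0) -> float:
    """Polynomial ("poly") decay from base_lr at it=0 to min_lr at it=max_iters."""
    progress = min(it, max_iters) / max_iters
    return (base_lr - min_lr) * (1.0 - progress) ** power + min_lr


# Values from the config above; max_iters=80_000 is assumed from schedule_80k.py.
for it in (0, 20_000, 40_000, 60_000, 80_000):
    print(f"iter {it:>6}: lr = {poly_lr(0.001, it, 80_000, power=0.9, min_lr=1e-5):.6f}")
```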
  • customised data

    Hello,

    I want to try this on my own dataset. I have created config files and Python files similar to the ones you wrote for ADE20K.

    I added a class file for my dataset:

    FISH_CONFIG_PATH = Path(__file__).parent / "config" / "fish.py"
    FISH_CATS_PATH = Path(__file__).parent / "config" / "fish.yml"
    
    @DATASETS.register_module
    class FishSegmentation(BaseMMSeg):
        def __init__(self, image_size, crop_size, split, **kwargs):
            super().__init__(
                image_size, crop_size, split, 
                config_path = FISH_CONFIG_PATH,
                normalization=kwargs.pop('normalization')
            )
            self.names, self.colors = utils.dataset_cat_description(FISH_CATS_PATH)
            self.n_cls = 150
            self.ignore_label = 0
            self.reduce_zero_label = True
    

    After I registered my dataset with @DATASETS.register_module, its __init__ function seems to conflict with your BaseMMSeg. Is there any way I can use custom data with your repo?

    opened by Remosy 1
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
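The report describes the patch only in prose. Below is a minimal sketch of that kind of check, not the actual patch from the pull request. It resolves each member's destination and refuses to extract anything that would land outside the target directory. `safe_extractall` and `is_within_directory` are illustrative names.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """True if `target` resolves to a path inside `directory`."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract all members, refusing the archive if any member escapes `path`."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise RuntimeError(f"Attempted path traversal in tar file: {member.name!r}")
    tar.extractall(path)
```

This only inspects member names. Link members whose targets point outside the directory are not covered. Python 3.12, and security releases of some earlier versions, also accept `tar.extractall(path, filter="data")`, which applies similar safety rules in the standard library.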
  • KeyError: 'optimizer'

    Thank you for your excellent work, but I have a problem with checkpoint.pth. When I try to run the segm.train module, I get the error "KeyError: 'optimizer'". I hope you can help. Thanks again!

    opened by Werejoice 5
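One plausible cause is that the checkpoint being resumed stores model weights but no optimizer state, while the training script expects both. What the released checkpoint.pth actually contains is an assumption here, as are the key names `"model"` and `"optimizer"`. Below is a hedged sketch of a resume step that restores optimizer state only when it is present. `resume_from_checkpoint` is hypothetical. With PyTorch, `checkpoint` would typically come from `torch.load(path, map_location="cpu")`.

```python
from typing import Any, Mapping, Protocol


class Stateful(Protocol):
    """Anything with load_state_dict, e.g. a torch.nn.Module or torch.optim.Optimizer."""

    def load_state_dict(self, state: Any) -> Any: ...


def resume_from_checkpoint(checkpoint: Mapping[str, Any],
                           model: Stateful, optimizer: Stateful) -> bool:
    """Restore model weights and, only if present, optimizer state.

    Returns True if optimizer state was restored. The key names "model" and
    "optimizer" are assumptions about the checkpoint layout.
    """
    model.load_state_dict(checkpoint["model"])
    if "optimizer" in checkpoint:
        optimizer.load_state_dict(checkpoint["optimizer"])
        return True
    # Weights-only checkpoint: the optimizer starts fresh.
    return False
```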
  • Unexpected keyword `mlp_ratio` running `seg_base_deit_mask`

    First of all, excellent repo - thanks very much for the awesome contribution to the ml community!

    When running eval on seg_base_deit_mask (via python -m segm.eval.miou checkpoints/seg_base_deit_mask/checkpoint.pth ade20k --multiscale), I get an error:

    Starting process with rank 0...
    Process 0 is connected.
    All processes are connected.
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/.../segmenter/segm/eval/miou.py", line 279, in <module>
        main()
      File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/home/.../segmenter/segm/eval/miou.py", line 226, in main
        model, variant = load_model(model_path)
      File "/home/.../segmenter/segm/model/factory.py", line 119, in load_model
        model = create_segmenter(net_kwargs)
      File "/home/.../segmenter/segm/model/factory.py", line 106, in create_segmenter
        encoder = create_vit(model_cfg)
      File "/home/.../segmenter/segm/model/factory.py", line 67, in create_vit
        model = VisionTransformer(**model_cfg)
    TypeError: __init__() got an unexpected keyword argument 'mlp_ratio'
    

    This happens with both single-scale and multi-scale evaluation. It seems to stem from the mlp_ratio key in the yml config.

    As I keep poking around, if I find a solution I'll submit a PR.

    Thanks again for the repo :+1:

    opened by zroach 0
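The last traceback shows that the `VisionTransformer` constructor does not accept `mlp_ratio`, while the yml config provides it. One hedged way to bridge the two is to translate the key before construction. This assumes the encoder takes an explicit MLP hidden size (called `d_ff` here) equal to `mlp_ratio * d_model`. The helper `adapt_vit_kwargs` and the toy encoder are hypothetical. The real parameter names in `segm/model` would need to be checked against the constructor first.

```python
import inspect
from typing import Any, Dict


def adapt_vit_kwargs(cfg: Dict[str, Any], vit_cls: type) -> Dict[str, Any]:
    """Rewrite a config dict so it only uses keywords that `vit_cls.__init__` accepts.

    Hypothetical: assumes the encoder takes `d_model` and an explicit MLP
    hidden size `d_ff` instead of an `mlp_ratio`.
    """
    cfg = dict(cfg)  # don't mutate the caller's config
    accepted = set(inspect.signature(vit_cls.__init__).parameters) - {"self"}
    if "mlp_ratio" in cfg and "mlp_ratio" not in accepted:
        ratio = cfg.pop("mlp_ratio")
        if "d_ff" in accepted and "d_ff" not in cfg:
            cfg["d_ff"] = int(ratio * cfg["d_model"])
    unknown = set(cfg) - accepted
    if unknown:
        raise TypeError(f"unsupported config keys: {sorted(unknown)}")
    return cfg


class ToyViT:
    """Stand-in encoder used only for this example."""

    def __init__(self, d_model: int, d_ff: int, n_heads: int):
        self.d_model, self.d_ff, self.n_heads = d_model, d_ff, n_heads


kwargs = adapt_vit_kwargs({"d_model": 768, "n_heads": 12, "mlp_ratio": 4}, ToyViT)
print(kwargs)  # {'d_model': 768, 'n_heads': 12, 'd_ff': 3072}
```

Raising on any remaining unknown key keeps the original fail-fast behaviour for genuine config mistakes.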
Owner
PhD student at Ecole Normale Supérieure and INRIA Paris
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training Code and model from our AAAI 2021 paper

Amazon Web Services - Labs 83 Jan 9, 2023
Top2Vec is an algorithm for topic modeling and semantic search.

Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.

Dimo Angelov 2.4k Jan 6, 2023
A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Chimera: Learning Shared Semantic Space for Speech-to-Text Translation This is a Pytorch implementation for the "Chimera" paper Learning Shared Semant

Chi Han 43 Dec 28, 2022
CATs: Semantic Correspondence with Transformers

CATs: Semantic Correspondence with Transformers For more information, check out the paper on [arXiv]. Training with different backbones and evaluation

null 74 Dec 10, 2021
Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources Description This is the repository for the paper Unifying Cross-

Sapienza NLP group 16 Sep 9, 2022
Improving Elasticsearch for semantic search using Sentence Embeddings

Improving Elasticsearch for semantic search using Sentence Embeddings. In this post I will use the pretrained model SimCS

Vo Van Phuc 18 Nov 25, 2022
PIZZA - a task-oriented semantic parsing dataset

The PIZZA dataset continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.

null 17 Dec 14, 2022
Blue Brain text mining toolbox for semantic search and structured information extraction

Blue Brain Search

The Blue Brain Project 29 Dec 1, 2022
Semi-automated vocabulary generation from semantic vector models

vec2word Semi-automated vocabulary generation from semantic vector models This script generates a list of potential conlang word forms along with asso

null 9 Nov 25, 2022
txtai: Build AI-powered semantic search applications in Go

txtai: Build AI-powered semantic search applications in Go txtai executes machine-learning workflows to transform data and build AI-powered semantic s

NeuML 49 Dec 6, 2022
The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank

Main Idea The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank Semantic Search Re

Sergio Arnaud Gomez 2 Jan 28, 2022
Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated

Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated. This engine can later be used for downstream tasks in NLP such as Q&A, summarization, generation, and natural language understanding (NLU).

Diego 1 Mar 20, 2022
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing Trankit is a light-weight Transformer-based Pyth

null 652 Jan 6, 2023
Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

Jeong Ukjae 13 Dec 13, 2022
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 4.6k Jan 1, 2023
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

T5: Text-To-Text Transfer Transformer The t5 library serves primarily as code for reproducing the experiments in Exploring the Limits of Transfer Lear

Google Research 3.2k Feb 17, 2021
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

null 44 Dec 31, 2022
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Phil Wang 5k Jan 2, 2023