Segmenter - Transformer for Semantic Segmentation

Last update: Dec 27, 2022

Overview

Segmenter: Transformer for Semantic Segmentation by Robin Strudel, Ricardo Garcia, Ivan Laptev and Cordelia Schmid.

Installation

The code and several trained models will be released soon.

Video Segmentation

Segmentation maps of Seg-B-Mask/16, trained on the ADE20K segmentation dataset and tested on the DAVIS video dataset.

BibTex

@article{strudel2021,
  title={Segmenter: Transformer for Semantic Segmentation},
  author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
  journal={arXiv preprint arXiv:?},
  year={2021}
}

Credits

The Vision Transformer code is based on the timm library, and the semantic segmentation training and evaluation pipeline uses mmsegmentation.

Comments

KeyError: ''

Hello, I ran the program on Windows and got the following error:

D:\Download\anaconda\anaconda\envs\learn\python.exe E:/Learning/Graduate/segmenter/segmenter-master/segm/train.py
Starting process with rank 0...
Process 0 is connected.
All processes are connected.
Traceback (most recent call last):
  File "E:\Learning\Graduate\segmenter\segmenter-master\segm\train.py", line 304, in <module>
    main()
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\Download\anaconda\anaconda\envs\learn\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "E:\Learning\Graduate\segmenter\segmenter-master\segm\train.py", line 76, in main
    model_cfg = cfg["model"][backbone]
KeyError: ''

Do you know how to solve it? Thank you!
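The traceback suggests that `backbone` was an empty string when `cfg["model"][backbone]` was evaluated, which could happen if the script is launched without a backbone argument (an assumption, based on the command line shown, which passes no options). A minimal sketch of a guarded lookup follows. The `cfg` dictionary and the helper `select_model_cfg` are hypothetical and only illustrate the pattern; they are not the repository's configuration or code.

```python
def select_model_cfg(cfg, backbone):
    """Return cfg["model"][backbone], with a readable error for unknown names."""
    available = sorted(cfg["model"])
    if backbone not in cfg["model"]:
        # An empty string ends up here too, instead of a bare KeyError: ''.
        raise ValueError(
            f"Unknown backbone {backbone!r}; expected one of {available}"
        )
    return cfg["model"][backbone]


# Hypothetical configuration, for illustration only.
cfg = {"model": {"vit_tiny_patch16_384": {"patch_size": 16}}}
model_cfg = select_model_cfg(cfg, "vit_tiny_patch16_384")
```

With a check like this, a missing argument produces a message that lists the valid names rather than `KeyError: ''`.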

opened by SikangSHU 8

Ask about the "Seg-B/8"

Great work on semantic segmentation!

I find that the resolution is important for the final performance, e.g., Seg-B/8.

However, I could not find ImageNet pre-trained checkpoints with patch size 8 in the timm library.

It would be great if you could help to address my concern!

opened by PkuRainBow 8

Code to compute images/sec

Hi,

Thank you for the cool work!

I see that you report images/sec, and mention the following in the paper:

To compute the images per second, we use a V100 GPU, fix the image resolution to 512 and for each model we maximize the batch size allowed by memory for a fair comparison.

I'm trying to do the same, but I'm unable to reproduce the images/sec numbers reported in the paper.

I'm using the code snippet from PyTorch as follows:

batch = torch.rand(args.batch_size, *input_shape).cuda()
model(batch)
n_runs = 10
from torch.utils.benchmark import Timer
t = Timer(stmt="model.forward(batch)", globals={"model": model, "batch": batch})
m = t.timeit(n_runs)

The largest batch size that fits on a V100 for the ViT-T backbone is about 140, and the code above reports a timing of 0.62 seconds. So I compute images/sec = 140/0.62 = 225.8, which is almost half the number in Table 3. Can you please tell me what I need to do to get the reported result?
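The throughput arithmetic above can be written as a small helper. This is a sketch using only the standard library, and the names `images_per_second` and `mean_seconds_per_call` are hypothetical. A plain wall-clock timer like this one does not account for asynchronous GPU execution, so on a GPU the timed callable would also need to wait for the device to finish.

```python
import time


def images_per_second(batch_size, seconds_per_batch):
    """Throughput for one forward pass over a batch."""
    if seconds_per_batch <= 0:
        raise ValueError("seconds_per_batch must be positive")
    return batch_size / seconds_per_batch


def mean_seconds_per_call(fn, n_runs=10, warmup=1):
    """Average wall-clock time of fn() over n_runs, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs


# Numbers quoted in the issue: batch of 140 images, 0.62 s per forward pass.
throughput = images_per_second(140, 0.62)
print(f"{throughput:.1f} images/sec")
```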

Thank you!

opened by prabhuteja12 6

how to get the attention maps

First, the folder named images doesn't contain a file named im0.jpg.
If I replace it with images/validation/ADE_val_0000000.jpg, I get this message: ValueError: Provided image path images/training/ADE_train_00016528 is not a valid image file.

Also, what is the output_dir?

opened by sijiua 6
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/$WORK/tempbs_7o9oj'

I began training on my own data, but I get an error the first time evaluation runs. The log is as follows:

Epoch: [11] [0/8] eta: 0:00:34 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 4.2506 data: 2.3495 max mem: 9466
Epoch: [11] [7/8] eta: 0:00:00 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 0.9943 data: 0.2958 max mem: 9491
Epoch: [11] Total time: 0:00:08 (1.0115 s / it)
Epoch: [12] [0/8] eta: 0:00:27 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 3.4646 data: 2.7603 max mem: 9492
Epoch: [12] [7/8] eta: 0:00:00 loss: 0.0000 (0.0000) learning_rate: 0.0008 (0.0008) time: 0.8330 data: 0.3464 max mem: 9492
Epoch: [12] Total time: 0:00:06 (0.8537 s / it)
Eval: [ 0/58] eta: 0:01:40 time: 1.7340 data: 1.3048 max mem: 10891
Eval: [50/58] eta: 0:00:01 time: 0.1124 data: 0.0121 max mem: 16814
Eval: [57/58] eta: 0:00:00 time: 0.1047 data: 0.0120 max mem: 16814
Eval: Total time: 0:00:08 (0.1505 s / it)
Traceback (most recent call last):
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/qiuzheng/segmenter/segm/train.py", line 304, in <module>
    main()
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/qiuzheng/segmenter/segm/train.py", line 266, in main
    eval_logger = evaluate(
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/qiuzheng/segmenter/segm/engine.py", line 104, in evaluate
    val_seg_pred = gather_data(val_seg_pred)
  File "/home/qiuzheng/segmenter/segm/metrics.py", line 60, in gather_data
    tmpdir = tempfile.mkdtemp(prefix=tmpprefix)
  File "/home/qiuzheng/.conda/envs/Segmenter/lib/python3.8/tempfile.py", line 359, in mkdtemp
    os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/$WORK/tempbs_7o9oj'
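The literal `$WORK` in the failing path suggests that an environment variable named `WORK` was not defined when the prefix was built, so `tempfile.mkdtemp` tried to create a directory under a parent that does not exist. That reading is an assumption drawn from the error message, not from the repository's code. A minimal sketch of a more forgiving scratch-directory helper, where `make_scratch_dir` and its default template are hypothetical:

```python
import os
import tempfile


def make_scratch_dir(template="$WORK/temp"):
    """Create a scratch directory from a template that may contain env vars."""
    expanded = os.path.expandvars(template)
    if "$" in expanded:
        # expandvars leaves undefined variables untouched, so fall back
        # to the system temporary directory instead of failing.
        return tempfile.mkdtemp(prefix="temp")
    parent, prefix = os.path.split(expanded)
    parent = parent or tempfile.gettempdir()
    # Make sure the parent exists before asking mkdtemp to create a child.
    os.makedirs(parent, exist_ok=True)
    return tempfile.mkdtemp(prefix=prefix, dir=parent)
```

Defining `WORK` in the shell before launching training (for example, pointing it at a writable directory) would be an alternative that needs no code change, again assuming that variable is what the prefix relies on.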

opened by shuaikangma 6

Multi-GPU training

This is a good paper and a very interesting idea! The README gives a training command for a single GPU. Could you provide the corresponding command for multi-GPU training?

opened by qiulesun 4

Performance of Seg-B/16 on Cityscapes using AugReg initialization

Hi, thanks for the excellent work! I notice that in your paper, the Seg-B/16 trained on Cityscapes is initialized from a DeiT pre-trained model (rather than AugReg). In my own experiments, Seg-B/16 (and my own model based on ViT-Base) with AugReg initialization performs quite badly on Cityscapes (73.2 mIoU), while Seg-S/16 performs well (76.2 mIoU). So I wonder whether you also got similar results, and whether you can share more information about your choice of initialization for the Seg-B/16 model. Many thanks.

opened by YiF-Zhang 4

Multi-GPU Training Not On SLURM

Hello, thanks a lot for contributing such excellent work. I noticed that the distributed multi-GPU training is based on the SLURM platform, which makes it hard to run on other platforms. Could you or anyone else provide some tips on changing the SLURM-based code to a non-SLURM version, so that multi-GPU distributed training can also be run on other platforms?
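One common way to decouple distributed setup from the launcher is to read the process rank and world size from whichever environment variables are present. The sketch below checks the variables that `torchrun` sets (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`) before the ones SLURM sets (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`). The helper name `resolve_dist_env` is hypothetical, and how the repository's own setup code would consume these values is not shown here.

```python
import os


def resolve_dist_env(environ=None):
    """Return rank, world size and local rank from launcher-provided variables."""
    env = os.environ if environ is None else environ
    if "RANK" in env and "WORLD_SIZE" in env:
        # Variables exported by torchrun for every worker process.
        return {
            "rank": int(env["RANK"]),
            "world_size": int(env["WORLD_SIZE"]),
            "local_rank": int(env.get("LOCAL_RANK", 0)),
        }
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        # Variables exported by SLURM for every task.
        return {
            "rank": int(env["SLURM_PROCID"]),
            "world_size": int(env["SLURM_NTASKS"]),
            "local_rank": int(env.get("SLURM_LOCALID", 0)),
        }
    # No launcher detected: behave as a single process.
    return {"rank": 0, "world_size": 1, "local_rank": 0}


print(resolve_dist_env({"RANK": "1", "WORLD_SIZE": "4", "LOCAL_RANK": "1"}))
```

The resulting values could then be passed as `rank` and `world_size` to `torch.distributed.init_process_group`, with the local rank used to select the GPU.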

opened by luck528 4

Performance better than that in the paper.

Hi Robin,

Thanks for releasing the code and model. I find that your model performs better than what is reported in the paper. For example, on ADE20K validation set, Seg-B-Mask/16 has 45.69 mIoU (SS), but according to the information from this repo, it can actually achieve 48.5. Am I missing something?

opened by chenyangh 4

train on custom dataset

Hello, I would like to ask whether I can modify the existing code to train on my own dataset, because I read in a previous issue that this is not possible yet. If it is possible, do you have any hints about the modifications needed?

opened by george-kalitsios 3

Performance on Pascal Context with Seg-L-Mask/16

Hi, thanks for the great work and the code! I'm trying to reproduce the baseline based on mmsegmentation. While the baseline could be reproduced well on Cityscapes and ADE20K, I could only get 56.9 at single scale on Pascal Context (58.1 reported). Is there anything I've missed? Below is the config I'm running, based on mmsegmentation. Is anything wrong in the settings? Many thanks for your help!

_base_ = [
    # "./training_scheme.py",
    "../_base_/models/segmenter_vit-b16.py",
    "../_base_/datasets/pascal_context_meanstd0.5.py",
    "../_base_/default_runtime.py",
    "../_base_/schedules/schedule_80k.py",
]

model = dict(
    pretrained="pretrain/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz",
    backbone=dict(
        type="VisionTransformer",
        img_size=(480, 480),
        patch_size=16,
        in_channels=3,
        embed_dims=1024,
        num_layers=24,
        num_heads=16,
        mlp_ratio=4,
        out_indices=(5, 11, 17, 23),
        qkv_bias=True,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.1,
        with_cls_token=True,
        final_norm=True,
        norm_cfg=dict(type="LN", eps=1e-6),
        act_cfg=dict(type="GELU"),
        norm_eval=False,
        interpolate_mode="bicubic",
    ),
    neck=dict(
        type="UseIndexSingleOutNeck",
        index=-1,
    ),
    decode_head=dict(
        n_cls=60,
        n_layers=2,
        d_encoder=1024,
        n_heads=16,
        d_model=1024,
        d_ff=4 * 1024,
    ),
    test_cfg=dict(mode="slide", crop_size=(480, 480), stride=(320, 320)),
)

optimizer = dict(
    _delete_=True,
    type="SGD",
    lr=0.001,
    weight_decay=0.0,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys={
            "pos_embed": dict(decay_mult=0.0),
            "cls_token": dict(decay_mult=0.0),
            "norm": dict(decay_mult=0.0),
        }
    ),
)

lr_config = dict(
    _delete_=True,
    policy="poly",
    warmup_iters=0,
    power=0.9,
    min_lr=1e-5,
    by_epoch=False,
)

# By default, models are trained on 8 GPUs with 2 images per GPU
data = dict(samples_per_gpu=2)
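For reference, a `poly` learning-rate policy is commonly defined as a polynomial decay from the base learning rate down to `min_lr`. The sketch below evaluates that formula with the values from the config above (`lr=0.001`, `power=0.9`, `min_lr=1e-5`). It assumes 80,000 total iterations, as the `schedule_80k.py` name suggests, and that this definition matches what the framework applies. The function name `poly_lr` is hypothetical.

```python
def poly_lr(iteration, max_iters=80_000, base_lr=0.001, min_lr=1e-5, power=0.9):
    """Polynomial decay: starts at base_lr and reaches min_lr at max_iters."""
    progress = min(iteration, max_iters) / max_iters
    return (base_lr - min_lr) * (1.0 - progress) ** power + min_lr


# Print the learning rate at a few points of the schedule.
for step in (0, 20_000, 40_000, 60_000, 80_000):
    print(step, f"{poly_lr(step):.6f}")
```

Comparing such a curve against the learning rate logged during training is one way to check that the schedule in a reimplementation matches the intended one.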

opened by hardyho 3

customised data

Hello,

I want to try this on my own dataset. I have created config files and Python files similar to the ones you provide for ADE20K.

I added a class file for my dataset:

FISH_CONFIG_PATH = Path(__file__).parent / "config" / "fish.py"
FISH_CATS_PATH = Path(__file__).parent / "config" / "fish.yml"

@DATASETS.register_module
class FishSegmentation(BaseMMSeg):
    def __init__(self, image_size, crop_size, split, **kwargs):
        super().__init__(
            image_size, crop_size, split,
            config_path=FISH_CONFIG_PATH,
            normalization=kwargs.pop('normalization')
        )
        self.names, self.colors = utils.dataset_cat_description(FISH_CATS_PATH)
        self.n_cls = 150
        self.ignore_label = 0
        self.reduce_zero_label = True

After I registered my dataset with @DATASETS.register_module, the __init__ function seems to conflict with your BaseMMSeg. Is there any way to use a customised dataset with your repo?

opened by Remosy 1

CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.
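The check described above can be sketched as follows. This illustrates the general idea (reject any member whose resolved path falls outside the destination directory) and is not necessarily the exact patch submitted in the pull request. The helper names are hypothetical, and the sketch only inspects member names, so it does not cover link members that point outside the destination.

```python
import os
import tarfile


def is_within_directory(directory, target):
    """True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar, path="."):
    """Extract all members, refusing archives that escape the destination."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Blocked path traversal attempt: {member.name!r}")
    tar.extractall(path)


# Usage (hypothetical archive name):
# with tarfile.open("archive.tar") as tar:
#     safe_extract(tar, "output_dir")
```

Recent Python versions also provide extraction filters in the `tarfile` module, which may be preferable where available.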

opened by TrellixVulnTeam 0

KeyError: 'optimizer'

Thank you for your excellent work. I have a problem with the checkpoint.pth file: when I try to run the segm.train module, I get the error "KeyError: 'optimizer'". I hope you can help me. Thanks again!
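The error indicates that the loaded checkpoint dictionary has no `optimizer` entry, which could be the case if the file holds model weights only (an assumption about this particular `checkpoint.pth`). The sketch below shows a tolerant way to inspect a checkpoint that is a plain dictionary. The helper `split_checkpoint` and the key names `model` and `optimizer` are hypothetical defaults, not taken from the repository.

```python
def split_checkpoint(checkpoint, required=("model",), optional=("optimizer",)):
    """Return the entries that are present and the optional ones that are absent."""
    missing = [key for key in required if key not in checkpoint]
    if missing:
        raise KeyError(f"Checkpoint is missing required entries: {missing}")
    present = {
        key: checkpoint[key]
        for key in (*required, *optional)
        if key in checkpoint
    }
    absent = [key for key in optional if key not in checkpoint]
    return present, absent


# A weights-only checkpoint: the model entry loads, the optimizer is skipped.
present, absent = split_checkpoint({"model": {"weight": [0.0]}})
print(sorted(present), absent)
```

A training script using this pattern would restore the optimizer state only when it is present, and otherwise start the optimizer from scratch.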

opened by Werejoice 5

Unexpected keyword `mlp_ratio` running `seg_base_deit_mask`

First of all, excellent repo - thanks very much for the awesome contribution to the ml community!

When running eval on seg_base_deit_mask (via python -m segm.eval.miou checkpoints/seg_base_deit_mask/checkpoint.pth ade20k --multiscale), I get an error:

Starting process with rank 0...
Process 0 is connected.
All processes are connected.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/.../segmenter/segm/eval/miou.py", line 279, in <module>
    main()
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/.../segmenter/pyenv/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/.../segmenter/segm/eval/miou.py", line 226, in main
    model, variant = load_model(model_path)
  File "/home/.../segmenter/segm/model/factory.py", line 119, in load_model
    model = create_segmenter(net_kwargs)
  File "/home/.../segmenter/segm/model/factory.py", line 106, in create_segmenter
    encoder = create_vit(model_cfg)
  File "/home/.../segmenter/segm/model/factory.py", line 67, in create_vit
    model = VisionTransformer(**model_cfg)
TypeError: __init__() got an unexpected keyword argument 'mlp_ratio'

This happens with both single-scale and multi-scale evaluation. It seems to stem from the mlp_ratio key in the yml config.

As I keep poking around, if I find a solution I'll submit a PR.

Thanks again for the repo :+1:
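A `TypeError` of this kind arises whenever a config dictionary contains a key that the constructor does not accept. One generic way to diagnose it is to compare the config keys against the constructor signature. In the sketch below, `filter_kwargs` and `TinyEncoder` are hypothetical stand-ins; `TinyEncoder` is not the repository's `VisionTransformer`.

```python
import inspect


def filter_kwargs(cls, kwargs):
    """Split kwargs into those cls.__init__ accepts and those it rejects."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # The constructor takes **kwargs, so every key is accepted.
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


class TinyEncoder:
    """Hypothetical model class with a fixed set of constructor arguments."""

    def __init__(self, image_size, patch_size, d_model):
        self.image_size = image_size
        self.patch_size = patch_size
        self.d_model = d_model


model_cfg = {"image_size": 512, "patch_size": 16, "d_model": 768, "mlp_ratio": 4}
accepted, rejected = filter_kwargs(TinyEncoder, model_cfg)
encoder = TinyEncoder(**accepted)
print("ignored keys:", sorted(rejected))
```

Dropping a key silently can hide a real mismatch between the config and the code version, so reporting the rejected keys is safer than discarding them without notice.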

opened by zroach 0

Owner

PhD student at Ecole Normale Supérieure and INRIA Paris

GitHub

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training Code and model from our AAAI 2021 paper

83 Jan 9, 2023

Top2Vec is an algorithm for topic modeling and semantic search.

Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.

2.4k Jan 6, 2023

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Chimera: Learning Shared Semantic Space for Speech-to-Text Translation This is a Pytorch implementation for the "Chimera" paper Learning Shared Semant

43 Dec 28, 2022

CATs: Semantic Correspondence with Transformers

CATs: Semantic Correspondence with Transformers For more information, check out the paper on [arXiv]. Training with different backbones and evaluation

74 Dec 10, 2021

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources Description This is the repository for the paper Unifying Cross-

16 Sep 9, 2022

Improving Elasticsearch for semantic search using Sentence Embeddings

Improving Elasticsearch for semantic search using Sentence Embeddings. In this article I will use the pretrained model SimCS

18 Nov 25, 2022

PIZZA - a task-oriented semantic parsing dataset

The PIZZA dataset continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.

17 Dec 14, 2022

Blue Brain text mining toolbox for semantic search and structured information extraction

Blue Brain Search Source Code DOI Data & Models DOI Documentation Latest Release Python Versions License Build Status Static Typing Code Style Securit

29 Dec 1, 2022

Semi-automated vocabulary generation from semantic vector models

vec2word Semi-automated vocabulary generation from semantic vector models This script generates a list of potential conlang word forms along with asso

9 Nov 25, 2022

txtai: Build AI-powered semantic search applications in Go

txtai: Build AI-powered semantic search applications in Go txtai executes machine-learning workflows to transform data and build AI-powered semantic s

49 Dec 6, 2022

The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank

Main Idea The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank Semantic Search Re

2 Jan 28, 2022
