Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

zhaohu xing

Last update: Dec 16, 2022

Overview

SETR - Pytorch

Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers) has no official code, I implemented SETR Progressive UPsampling (SETR-PUP) in PyTorch.

Vit

A ViT (Vision Transformer) model, exposed as the Vit class, is also implemented and can be used for image classification.

Usage SETR

from SETR.transformer_seg import SETRModel
import torch

if __name__ == "__main__":
    # SETR-PUP model producing a single-channel output.
    net = SETRModel(patch_size=(32, 32),
                    in_channels=3,
                    out_channels=1,
                    hidden_size=1024,
                    num_hidden_layers=8,
                    num_attention_heads=16,
                    decode_features=[512, 256, 128, 64])
    # Random batch: 1 image, 3 channels, 256x256 pixels.
    t1 = torch.rand(1, 3, 256, 256)
    print("input: " + str(t1.shape))

    # print(net)
    print("output: " + str(net(t1).shape))

If the output shape is (1, 1, 256, 256), the code ran successfully.
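
The shapes in this example follow from simple patch arithmetic. The sketch below is plain Python (no PyTorch needed) and is not code from this repository; the helper name patch_grid is hypothetical. It assumes non-overlapping patches, so the image size must be divisible by the patch size.

```python
def patch_grid(img_size, patch_size, in_channels):
    """Return (grid_h, grid_w, num_patches, patch_dim) for non-overlapping patches."""
    (h, w), (ph, pw) = img_size, patch_size
    if h % ph or w % pw:
        raise ValueError("image size must be divisible by patch size")
    grid_h, grid_w = h // ph, w // pw
    # Each patch is flattened to in_channels * ph * pw values before projection.
    return grid_h, grid_w, grid_h * grid_w, in_channels * ph * pw


if __name__ == "__main__":
    # Values taken from the SETR usage example above.
    print(patch_grid((256, 256), (32, 32), 3))  # (8, 8, 64, 3072)
```

With a 32x32 patch on a 256x256 input, the encoder sees 64 tokens on an 8x8 grid, so the decoder has to upsample by a total factor of 32 to return to (256, 256).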

Usage Vit

from SETR.transformer_seg import Vit
import torch

if __name__ == "__main__":
    # Classifier with 10 output classes on single-channel input.
    model = Vit(patch_size=(7, 7),
                in_channels=1,
                out_class=10,
                hidden_size=1024,
                num_hidden_layers=1,
                num_attention_heads=16)
    print(model)
    # Random batch: 1 image, 1 channel, 28x28 pixels.
    t1 = torch.rand(1, 1, 28, 28)
    print("input: " + str(t1.shape))

    print("output: " + str(model(t1).shape))

The output shape is (1, 10).
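
To make the patch_size argument concrete, here is a minimal sketch of the general ViT idea of cutting an image into non-overlapping patches and flattening each one. It is plain Python for illustration only, not the repository's patch embedding; the function name extract_patches is hypothetical.

```python
def extract_patches(image, patch_h, patch_w):
    """Split a 2D list (H x W) into flattened, non-overlapping patches, row-major."""
    height, width = len(image), len(image[0])
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            # Flatten one patch_h x patch_w window into a single list.
            patch = [image[r][c]
                     for r in range(top, top + patch_h)
                     for c in range(left, left + patch_w)]
            patches.append(patch)
    return patches


# A 28x28 single-channel "image" whose pixel value encodes its position.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
patches = extract_patches(image, 7, 7)
print(len(patches), len(patches[0]))  # 16 49
```

For the 28x28 input above this gives 16 patches of 49 values each. How Vit turns the resulting sequence into the (1, 10) output (for instance via a class token or pooling) is not shown here.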

Current examples

task_mnist: The simplest example, using the Vit model to classify the MNIST dataset.
task_car_seg: A simple segmentation task. Data download: https://www.kaggle.com/c/carvana-image-masking-challenge/data

More examples will be added later.


Comments

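A question for the author: how well does your model perform when trained on the Carvana data?
I ran the following rough comparison experiment. Carvana dataset: train_size=4070, val_size=1018. Input: 256x256 single-channel image. Output: binarized matrix. Comparison strategy: train for 8 epochs without pretrained weights and record the best Dice score.

SETR_best_dice: 0.94867, Unet_best_dice: 0.98734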
opened by JavisPeng 5
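
The comparison above reports a best Dice score for each model. The commenter's metric code is not shown, so the following is only a sketch of a common Dice definition for binary masks, written in plain Python; the name dice_score and the smoothing term eps are illustrative assumptions.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice = 2 * |P & T| / (|P| + |T|) for flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # For 0/1 masks the elementwise product counts the overlapping pixels.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


pred = [1, 1, 0, 0]    # predicted mask, flattened
target = [1, 0, 0, 0]  # ground-truth mask, flattened
print(round(dice_score(pred, target), 4))  # 0.6667
```

A score of 1.0 means the predicted and ground-truth masks overlap perfectly; with this smoothing, two empty masks also score 1.0.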
What is the model's "img_size" parameter? Doesn't it need to match the input size?
In the usage example I see:

from SETR.transformer_seg import SETRModel
import torch

if __name__ == "__main__":
    net = SETRModel(img_size=(32, 32),
                    in_channels=3,
                    out_channels=1,
                    hidden_size=1024,
                    num_hidden_layers=8,
                    num_attention_heads=16,
                    decode_features=[512, 256, 128, 64])
    t1 = torch.rand(1, 3, 256, 256)
    print("input: " + str(t1.shape))

    # print(net)
    print("output: " + str(net(t1).shape))

The example passes SETRModel(img_size=(32, 32), ...), but the input below it has size (256, 256). So what does the model's img_size parameter correspond to?
opened by GusRoth 3
doubt about patch embeddings

Hi, awesome repo. But I wonder whether it is necessary to apply GELU and LayerNorm after the linear layer to get the patch embedding. Neither the ViT paper nor its code applies these layers. I am referring to lines 269 and 270 in /SETR/transformer_model.py.

opened by zaocan666 4
