Vision Transformer Segmentation Network
This PyTorch implementation of ViT produces an output of the same size as the input in a simple, straightforward way: the inverse rearrange operation is applied to all the predicted patch outputs, as sketched below. This enables convolution-free multi-class segmentation.
Most of the code is taken from https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py.
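To illustrate the idea, here is a minimal sketch (not the repository's actual code) of the forward and inverse rearrange steps using einops and the default sizes listed below; the transformer and prediction head are stubbed out with a random tensor:

```python
import torch
from einops import rearrange

image_size, patch_size, channels, num_classes = 112, 7, 1, 1
x = torch.randn(2, channels, image_size, image_size)   # (batch, C, H, W)

# Forward rearrange: cut the image into non-overlapping patches,
# one token per patch, as in the original ViT.
tokens = rearrange(x, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)',
                   p1=patch_size, p2=patch_size)        # (2, 256, 49)

# Stand-in for the transformer + linear head, which would map every
# token to patch_size * patch_size * num_classes values.
preds = torch.randn(x.shape[0], tokens.shape[1],
                    patch_size * patch_size * num_classes)

# Inverse rearrange: fold the per-patch predictions back into a
# full-resolution segmentation map, no convolutions involved.
out = rearrange(preds, 'b (h w) (p1 p2 c) -> b c (h p1) (w p2)',
                h=image_size // patch_size, w=image_size // patch_size,
                p1=patch_size, p2=patch_size, c=num_classes)
print(out.shape)  # torch.Size([2, 1, 112, 112])
```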
Default Architecture Parameters:
```python
model = ViTSeg(
    image_size=112,
    channels=1,
    patch_size=7,
    num_classes=1,
    dim=768,
    depth=6,
    heads=12,
    mlp_dim=2048,
    learned_pos=False,
    use_token=False,
)
```
- `image_size`: An integer or a tuple defining the size of the input image (a small code rewrite would allow arbitrary image sizes to be passed).
- `channels`: An integer defining the number of channels in the input image.
- `patch_size`: An integer or a tuple defining the size of the patches.
- `num_classes`: An integer defining the number of channels in the output.
- `dim`: An integer defining the size of the embedding dimension.
- `depth`: An integer defining the number of transformer layers.
- `heads`: An integer defining the number of attention heads in each transformer layer.
- `mlp_dim`: An integer defining the hidden size of the MLP in each transformer layer.
- `learned_pos`: A boolean which, if true, switches from fixed positional encodings to learned positional encodings.
- `use_token`: A boolean which, if true, adds a CLS token to the input and output.
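For reference, a hypothetical end-to-end usage sketch under the default parameters above (the import path is an assumption; adjust it to the repository layout):

```python
import torch
# from vitseg import ViTSeg  # assumed import path, not confirmed by the repo

model = ViTSeg(image_size=112, channels=1, patch_size=7, num_classes=1,
               dim=768, depth=6, heads=12, mlp_dim=2048,
               learned_pos=False, use_token=False)

x = torch.randn(4, 1, 112, 112)   # (batch, channels, H, W)
logits = model(x)                 # convolution-free segmentation output
print(logits.shape)               # expected: torch.Size([4, 1, 112, 112])
```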
Citation
If you find this repository useful, please consider citing it:
```bibtex
@misc{reynaud2021vitseg,
  title={ViTSeg},
  author={Reynaud, Hadrien},
  year={2021},
  url={https://github.com/HReynaud/ViTSeg}
}
```