CVPR2022 (Oral) - Rethinking Semantic Segmentation: A Prototype View

Overview

Rethinking Semantic Segmentation: A Prototype View,
Tianfei Zhou, Wenguan Wang, Ender Konukoglu and Luc Van Gool
CVPR 2022 (Oral) (arXiv 2203.15102)

News

  • [2022-04-19] Released the code, based on openseg.pytorch!
  • [2022-03-31] Paper link updated!
  • [2022-03-12] Repo created. Paper and code will come soon.

Abstract

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such a parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of learning a single weight/query vector per class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean features of several training pixels within that class. Dense prediction is thus achieved by nonparametric nearest-prototype retrieval. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes, and to handle an arbitrary number of classes with a constant number of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HRNet, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compelling results on several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary setting. We expect this work to provoke a rethink of the current de facto semantic segmentation model design.
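
For intuition, here is a minimal sketch of the nonparametric decision rule described above: pixel embeddings are l2-normalized and each pixel takes the class of its most similar prototype among C classes × K prototypes per class. This is an illustration, not the authors' code; names such as nearest_prototype_predict are made up.

    import torch
    import torch.nn.functional as F

    def nearest_prototype_predict(embeddings, prototypes):
        """embeddings: (B, D, H, W) pixel features from any backbone.
        prototypes: (C, K, D) non-learnable prototypes, e.g., mean features
        of training pixels per (class, sub-cluster).
        Returns a (B, H, W) class index map."""
        B, D, H, W = embeddings.shape
        C, K, _ = prototypes.shape
        # Cosine similarity: normalize both pixels and prototypes.
        feats = F.normalize(embeddings.permute(0, 2, 3, 1).reshape(-1, D), dim=-1)
        protos = F.normalize(prototypes.reshape(C * K, D), dim=-1)
        sim = feats @ protos.t()                     # (B*H*W, C*K)
        sim = sim.view(-1, C, K).max(dim=-1).values  # best prototype within each class
        return sim.argmax(dim=-1).view(B, H, W)      # nearest class wins

    # Example: 19 Cityscapes classes, 10 prototypes each, 720-dim features.
    pred = nearest_prototype_predict(torch.randn(2, 720, 64, 128),
                                     torch.randn(19, 10, 720))
    print(pred.shape)  # torch.Size([2, 64, 128])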

Installation

This implementation is built on openseg.pytorch. Many thanks to the authors for their efforts.

Please follow the Getting Started for installation and dataset preparation.

Performance

Cityscapes

Method   Train Set   Val Set   Iters   Batch Size   mIoU   Log   CKPT   Script
HRNet    train       val       80K     8            79.0   log   ckpt   scripts/cityscapes/hrnet/run_h_48_d_4.sh
Ours     train       val       80K     8            80.1   log   ckpt   scripts/cityscapes/hrnet/run_h_48_d_4_proto.sh
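
To evaluate the released checkpoint, a usage sketch (arguments taken from a user report in the comments below; the first selects the phase, the second names the checkpoint):

    bash scripts/cityscapes/hrnet/run_h_48_d_4_proto.sh val hrnet_proto_80k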

More results will come soon.

Citation

@inproceedings{zhou2022rethinking,
    author    = {Zhou, Tianfei and Wang, Wenguan and Konukoglu, Ender and Van Gool, Luc},
    title     = {Rethinking Semantic Segmentation: A Prototype View},
    booktitle = {CVPR},
    year      = {2022}
}

Relevant Projects

Please also see our works [1] for a novel training paradigm with a cross-image, pixel-to-pixel contrastive loss, and [2] for a novel hierarchy-aware segmentation learning scheme for structured scene parsing.

[1] Exploring Cross-Image Pixel Contrast for Semantic Segmentation - ICCV 2021 (Oral) [arXiv][code]

[2] Deep Hierarchical Semantic Segmentation - CVPR 2022 [arXiv][code]

Comments
  • Question about seed

    if args_parser.seed is not None:
    	random.seed(args_parser.seed)
    	torch.manual_seed(args_parser.seed)
    

    With this code, each GPU is set to the same seed. A per-rank alternative:

    # fix the seed for reproducibility
    if args_parser.seed is not None:
    	from lib.utils.distributed import get_rank
    	seed = args_parser.seed + get_rank()
    	torch.manual_seed(seed)
    	np.random.seed(seed)
    	random.seed(seed)
    

    opened by sorrowyn 3
  • Question regarding IoUs of pretrained HRNet Proto

    Hi, I downloaded the checkpoint, prepared the data, and ran the evaluation script:

    bash scripts/cityscapes/hrnet/run_h_48_d_4_proto.sh val hrnet_proto_80k
    

    I had to include a tiny fix, label_img_ = Image.fromarray(label_img_) instead of label_img_ = Image.fromarray(label_img_, 'P'), in tester.py, because the labels were all black in the output directory. If I then execute the command above, I end up with an mIoU of 85.7, much better than the 81.1 reported for HRNet in Table 2 of your paper. This is the output:

    classes          IoU      nIoU
    --------------------------------
    road          : 0.978194      nan
    sidewalk      : 0.817676      nan
    building      : 0.966470      nan
    wall          : 0.586037      nan
    fence         : 0.650901      nan
    pole          : 0.858629      nan
    traffic light : 0.865478      nan
    traffic sign  : 0.904490      nan
    vegetation    : 0.977709      nan
    terrain       : 0.664823      nan
    sky           : 0.973433      nan
    person        : 0.950269    0.000000
    rider         : 0.802941    0.000000
    car           : 0.985926    0.000000
    truck         : 0.818478    0.000000
    bus           : 0.939901    0.000000
    train         : 0.811736    0.000000
    motorcycle    : 0.806717    0.000000
    bicycle       : 0.919648    0.000000
    --------------------------------
    Score Average : 0.856814    0.000000
    --------------------------------
    

    I also ran my own evaluation script on the generated labels in the label directory and get exactly the same results. Could you check?

    opened by ksakmann 3
  • parameter numbers of the entire model?

    Thanks for the great work! I want to ask a question that confuses me. In the paper, Table 4 shows that the number of parameters of the entire model does not increase. However, I see that in the code the prototype storage grows with class_num, as follows: self.prototypes = nn.Parameter(torch.zeros(self.num_classes, self.num_prototype, in_channels), requires_grad=True). If class_num increases, the number of prototype parameters also increases. Did I get it wrong?
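
    A minimal sketch of one way to reconcile this (an assumption about the intended design, not the repo's exact code): prototypes can be stored as a buffer that is refreshed from pixel-feature means rather than trained by backprop, so their storage grows with the number of classes while the count of learnable parameters stays constant.

    import torch
    import torch.nn as nn

    class PrototypeStore(nn.Module):
        """Illustrative only: C*K*D prototype storage that is never
        updated by gradients, only by momentum from feature means."""
        def __init__(self, num_classes=19, num_prototype=10, dim=720,
                     momentum=0.999):
            super().__init__()
            self.momentum = momentum
            # register_buffer => saved with the model but excluded from
            # model.parameters(), hence not a learnable parameter.
            self.register_buffer(
                'prototypes',
                torch.zeros(num_classes, num_prototype, dim))

        @torch.no_grad()
        def update(self, c, k, new_mean):
            # EMA update from the mean feature of pixels assigned to (c, k).
            self.prototypes[c, k].mul_(self.momentum).add_(
                new_mean, alpha=1 - self.momentum)

    store = PrototypeStore()
    print(sum(p.numel() for p in store.parameters()))  # 0 learnable parameters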

    opened by Sunting78 2
  • Question about the prototype initialization?

    Hi, thanks for the impressive work.

    After reading the paper, I have a question: how are the micro-prototypes initialized? It seems they must be properly initialized to reach a reasonable solution in Eq. (10).

    Cheers
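
    For reference, one common choice for initializing prototype tensors in vision models is a truncated normal; this is only a hedged illustration, not confirmation of what the repo does.

    import torch
    import torch.nn as nn

    # Hypothetical shapes: 19 classes x 10 prototypes x 720-dim features.
    prototypes = nn.Parameter(torch.zeros(19, 10, 720))
    nn.init.trunc_normal_(prototypes, std=0.02)  # assumed init scheme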

    opened by HeimingX 2
  • Layernorm in Prototype learning

    Thanks for your great work! I notice you use layer norm on the final features before the classifier and also on the predictions. I think this is quite uncommon in prototype learning (correct me if I am wrong).

    Could you please explain this choice? And if the two layer norms are removed, does the performance degrade?

    https://github.com/tfzhou/ProtoSeg/blob/1c4a7784bbce96c06fe72d55255af15e6cf1ca96/lib/models/nets/hrnet.py#L81

    opened by Trainingzy 0
  • Questions about K prototypes

    Hi, I'm interested in your work. I wonder whether the K prototypes of some classes become similar to one another after training. For example, when visualizing as in Figure 3 of the paper, the activation area of each prototype turns out to be roughly the same. I noticed this while running your code; I suspect it's caused by some classes in my dataset that don't have meaningful parts.

    Hope to receive your reply, thanks!
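
    A quick diagnostic sketch for this (illustrative, with made-up tensors): compute pairwise cosine similarities between one class's K prototypes; off-diagonal values near 1 indicate collapsed, redundant prototypes.

    import torch
    import torch.nn.functional as F

    protos = F.normalize(torch.randn(10, 720), dim=1)  # K=10 prototypes of one class
    sim = protos @ protos.t()                          # (K, K) cosine similarities
    off_diag = sim[~torch.eye(10, dtype=torch.bool)]   # drop the diagonal
    print(off_diag.mean().item(), off_diag.max().item())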

    opened by RenLibo-aircas 1
  • Question about Within-Class Online Clustering

    Hi, I'm interested in your work. After reading the paper, I'm confused: the goal of within-class online clustering is to map the pixels I_c to the K prototypes of class c, but how do you know whether the pixels I_c belong to class c? Did you use the ground truth in this step? And how is it set up at test time?

    Hope to receive your reply, thanks!
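
    On the mechanics behind this question: during training the ground-truth mask can select the pixels of class c, and only those pixels are assigned to that class's K prototypes; at test time no clustering is needed, since each pixel simply takes the class of its nearest prototype. Below is a hedged sketch of the kind of Sinkhorn-style balanced assignment the paper describes; names and defaults are illustrative, not the repo's code.

    import torch

    def sinkhorn_assign(sim, n_iters=3, eps=0.05):
        """Balanced soft assignment of N within-class pixels to K prototypes.
        sim: (N, K) similarities between the pixels of one class (selected
        via the ground-truth mask during training) and its K prototypes.
        Returns an (N, K) assignment whose columns carry equal mass, so
        every prototype receives a fair share of pixels."""
        Q = torch.exp(sim / eps)
        Q = Q / Q.sum()
        N, K = Q.shape
        for _ in range(n_iters):
            Q = Q / Q.sum(dim=0, keepdim=True) / K  # normalize prototype columns
            Q = Q / Q.sum(dim=1, keepdim=True) / N  # normalize pixel rows
        return Q * N  # each row now sums to ~1

    assign = sinkhorn_assign(torch.randn(500, 10))  # 500 pixels, 10 prototypes
    print(assign.shape, assign.sum(dim=1)[:3])      # rows ~ 1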

    opened by clgx00 7
  • Question about paper [# model parameter]

    Dear author, Thank you so much for your work and code. I have a question about the number of model parameters.

    As I understand the paper, at inference time pixels are classified by the closest prototype among the C×K prototypes. In the end, we have to store all C×K prototypes, so I wonder why we don't interpret them as model parameters. Also, the number of prototypes to be stored is proportional to the number of classes. Is it just convention?

    Thank you.

    opened by chwoong 6
  • Question about loss

    Hi, I'm interested in your work. After reading the paper, I'm confused: the paper presents the PPC loss as a contrastive learning objective, but according to the code it is implemented with a cross-entropy loss. Hope to receive your reply, thanks.
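
    A likely resolution, shown as a sketch rather than the authors' confirmed implementation: an InfoNCE-style contrastive loss is routinely written as cross-entropy over temperature-scaled similarity logits, with the positive prototype index as the target, so the two descriptions coincide.

    import torch
    import torch.nn.functional as F

    def contrastive_as_cross_entropy(feats, prototypes, pos_idx, temperature=0.1):
        """feats: (N, D) l2-normalized pixel embeddings.
        prototypes: (M, D) l2-normalized prototypes (all classes flattened).
        pos_idx: (N,) index of each pixel's positive prototype."""
        logits = feats @ prototypes.t() / temperature  # (N, M) similarity logits
        # -log(exp(pos) / sum(exp(all))) == cross-entropy with pos_idx as target
        return F.cross_entropy(logits, pos_idx)

    feats = F.normalize(torch.randn(8, 720), dim=1)
    protos = F.normalize(torch.randn(190, 720), dim=1)  # 19 classes x 10 prototypes
    print(contrastive_as_cross_entropy(feats, protos, torch.randint(0, 190, (8,))))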

    opened by XiaoxxWang 3