Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals.

Wouter Van Gansbeke

Last update: Dec 28, 2022

This repo contains the PyTorch implementation of our paper:

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, and Luc Van Gool.

Introduction
Installation
Training
Evaluation
Model Zoo
Citation

Introduction

Being able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. We make a first attempt to tackle the problem on datasets that have been traditionally utilized for the supervised case (e.g. PASCAL VOC). To achieve this, we introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings. Additionally, we argue for the importance of having a prior that contains information about objects, or their parts, and discuss several possibilities to obtain such a prior in an unsupervised manner. In particular, we adopt a mid-level visual prior to group pixels together and contrast the obtained object mask proposals. For this reason we name the method MaskContrast.

Installation

The Python code runs with recent PyTorch versions, e.g. 1.4. Assuming Anaconda, the most important packages can be installed as:

conda install pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.0 -c pytorch
conda install -c conda-forge opencv           # For image transformations
conda install matplotlib scipy scikit-learn   # For evaluation
conda install pyyaml easydict                 # For using config files
conda install termcolor                       # For colored print statements

We refer to the requirements.txt file for an overview of the packages in the environment we used to produce our results. The code was run on 2 Tesla V100 GPUs.

Training MaskContrast

Setup

The PASCAL VOC dataset will be downloaded automatically when running the code for the first time. The dataset includes the precomputed supervised and unsupervised saliency masks, following the implementation from the paper.

The following files (in the pretrain/ and segmentation/ directories) need to be adapted in order to run the code on your own machine:

Change the file path for the datasets in data/util/mypath.py. The PASCAL VOC dataset will be saved to this path.
Specify the output directory in configs/env.yml. All results will be stored under this directory.

Pre-train model

The training procedure consists of two steps. First, pixels are grouped together based upon a mid-level visual prior (saliency is used). Then, a pre-training strategy is proposed to contrast the pixel-embeddings of the obtained object masks. The code for the pre-training can be found in the pretrain/ directory and the configuration files are located in the pretrain/configs/ directory. You can choose to run the model with the masks from the supervised or unsupervised saliency model. For example, run the following command to perform the pre-training step on PASCAL VOC with the supervised saliency model:

cd pretrain
python main.py --config_env configs/env.yml --config_exp configs/VOCSegmentation_supervised_saliency_model.yml
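
The second step described above (contrasting pixel embeddings against object masks) can be illustrated with a minimal sketch in plain Python. It is not the code in pretrain/: the helper names `mean_embedding` and `mask_contrast_loss`, the temperature value, and the way negatives are passed in are assumptions made for illustration. Since the repository builds on MoCo, real training presumably draws negatives from a queue rather than from an explicit list.

```python
import math

def normalize(v):
    """L2-normalize a vector given as a list of floats."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_embedding(pixels):
    """Average the pixel embeddings inside one mask, then L2-normalize."""
    dim = len(pixels[0])
    mean = [sum(p[d] for p in pixels) / len(pixels) for d in range(dim)]
    return normalize(mean)

def mask_contrast_loss(pixel, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for a single pixel embedding (hypothetical helper).

    positive:  mean embedding of the mask the pixel belongs to
    negatives: mean embeddings of other object masks
    """
    q = normalize(pixel)
    logits = [dot(q, positive) / temperature]
    logits += [dot(q, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Toy example: two masks with 2-D pixel embeddings.
obj_a = mean_embedding([[1.0, 0.1], [0.9, 0.0]])
obj_b = mean_embedding([[0.0, 1.0], [0.1, 0.9]])
loss_inside = mask_contrast_loss([1.0, 0.0], obj_a, [obj_b])   # pixel agrees with its mask
loss_outside = mask_contrast_loss([0.0, 1.0], obj_a, [obj_b])  # pixel resembles the other mask
print(loss_inside < loss_outside)
```

A pixel whose embedding lies close to the mean embedding of its own mask gets a small loss, while a pixel that resembles a different mask is penalized, which is the behaviour the pre-training objective is meant to encourage.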

Evaluation

Linear Classifier (LC)

We freeze the weights of the pre-trained model and train a 1 x 1 convolutional layer to predict the class assignments from the generated feature representations. Since the discriminative power of a linear classifier is low, the pixel embeddings need to be informative of the semantic class to solve the task in this way. To train the classifier run the following command:

cd segmentation
python linear_finetune.py --config_env configs/env.yml --config_exp configs/linear_finetune/linear_finetune_VOCSegmentation_supervised_saliency.yml
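
To make the linear evaluation concrete: a 1 x 1 convolution applies the same linear map to the feature vector of every pixel independently. The sketch below shows this in plain Python on nested lists; `conv1x1` and `predict` are hypothetical helper names, and the actual classifier in segmentation/ is a PyTorch layer trained on frozen features.

```python
def conv1x1(features, weight, bias):
    """Apply a 1 x 1 convolution.

    features: nested list of shape [C][H][W] (frozen pixel embeddings)
    weight:   nested list of shape [K][C] (one row per class)
    bias:     list of length K
    returns:  logits of shape [K][H][W]
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    return [
        [
            [
                bias[k] + sum(weight[k][c] * features[c][y][x] for c in range(channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for k in range(len(weight))
    ]

def predict(logits):
    """Pick the class with the highest logit at every pixel."""
    num_classes = len(logits)
    height = len(logits[0])
    width = len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda k: logits[k][y][x]) for x in range(width)]
        for y in range(height)
    ]

# Toy example: 2 feature channels, a 1 x 2 image, 2 classes.
features = [[[1.0, 0.0]], [[0.0, 1.0]]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
print(predict(conv1x1(features, weight, bias)))  # [[0, 1]]
```

Because no spatial context or non-linearity is involved, the classifier can only succeed if the embeddings themselves already separate the semantic classes.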

Note, make sure that the pretraining variable in linear_finetune_VOCSegmentation_supervised_saliency.yml points to the location of your pre-trained model. You should get the following results:

mIoU is 63.95
IoU class background is 90.95
IoU class aeroplane is 83.78
IoU class bicycle is 30.66
IoU class bird is 78.79
IoU class boat is 64.57
IoU class bottle is 67.31
IoU class bus is 84.24
IoU class car is 76.77
IoU class cat is 79.10
IoU class chair is 21.24
IoU class cow is 66.45
IoU class diningtable is 46.63
IoU class dog is 73.25
IoU class horse is 62.61
IoU class motorbike is 69.66
IoU class person is 72.30
IoU class pottedplant is 40.15
IoU class sheep is 74.70
IoU class sofa is 30.43
IoU class train is 74.67
IoU class tvmonitor is 54.66

Unsurprisingly, the model has not learned a good representation for every class since some classes are hard to distinguish, e.g. chair or sofa.
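
For reference, the mIoU reported above is the plain average of the 21 per-class IoU values. The sketch below computes per-class IoU from a confusion matrix and then checks the average against the numbers listed above. The helper names are hypothetical and the confusion-matrix layout (rows are ground truth, columns are predictions) is an assumption, not necessarily the layout used by the evaluation code.

```python
def per_class_iou(confusion):
    """confusion[g][p] = number of pixels with ground-truth class g predicted as class p."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[g][c] for g in range(num_classes)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

def mean_iou(ious):
    return sum(ious) / len(ious)

# Tiny two-class example.
print(per_class_iou([[3, 1], [1, 5]]))

# Per-class IoU values of the linear classifier as listed above.
reported = [90.95, 83.78, 30.66, 78.79, 64.57, 67.31, 84.24, 76.77, 79.10,
            21.24, 66.45, 46.63, 73.25, 62.61, 69.66, 72.30, 40.15, 74.70,
            30.43, 74.67, 54.66]
print(round(mean_iou(reported), 2))  # 63.95
```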

We visualize a few examples after CRF post-processing below.

Clustering (K-means)

The feature representations are clustered with K-means. If the pixel embeddings are disentangled according to the defined class labels, we can match the predicted clusters with the ground-truth classes using the Hungarian matching algorithm.

cd segmentation
python kmeans.py --config_env configs/env.yml --config_exp configs/kmeans/kmeans_VOCSegmentation_supervised_saliency_model.yml
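
The matching step can be sketched as follows. Given a table that counts how many pixels of each cluster fall into each ground-truth class, we look for the one-to-one assignment with the largest total overlap. To stay dependency-free, this sketch swaps in a brute-force search over permutations, which returns the same optimum as the Hungarian algorithm but is only feasible for a handful of clusters. With 21 clusters a proper solver is needed, for example `scipy.optimize.linear_sum_assignment` with `maximize=True`. The function name `match_clusters` is hypothetical.

```python
from itertools import permutations

def match_clusters(overlap):
    """Return a list mapping cluster index -> ground-truth class index.

    overlap[k][c] = number of pixels in cluster k whose ground-truth class is c.
    Brute-force stand-in for Hungarian matching; use only for tiny problems.
    """
    n = len(overlap)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(overlap[k][perm[k]] for k in range(n)),
    )
    return list(best)

# Toy example with 3 clusters and 3 classes.
overlap = [
    [1, 9, 0],  # cluster 0 mostly covers class 1
    [8, 0, 2],  # cluster 1 mostly covers class 0
    [0, 1, 7],  # cluster 2 mostly covers class 2
]
print(match_clusters(overlap))  # [1, 0, 2]
```

After matching, every pixel's cluster index is replaced by the assigned class and the IoU is computed as usual.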

Remarks: Note that we perform the complete K-means fitting on the validation set to save memory and that the reported results were averaged over 5 different runs. You should get the following results (21 clusters):

IoU class background is 88.17
IoU class aeroplane is 77.41
IoU class bicycle is 26.18
IoU class bird is 68.27
IoU class boat is 47.89
IoU class bottle is 56.99
IoU class bus is 80.63
IoU class car is 66.80
IoU class cat is 46.13
IoU class chair is 0.73
IoU class cow is 0.10
IoU class diningtable is 0.57
IoU class dog is 35.93
IoU class horse is 48.68
IoU class motorbike is 60.60
IoU class person is 32.24
IoU class pottedplant is 23.88
IoU class sheep is 36.76
IoU class sofa is 26.85
IoU class train is 69.90
IoU class tvmonitor is 27.56

Model Zoo

Download the pretrained and linear finetuned models here.

Dataset	Pixel Grouping Prior	mIoU (LC)	mIoU (K-means)	Download link
PASCAL VOC	Supervised Saliency	-	44.2	Pretrained Model 🔗
PASCAL VOC	Supervised Saliency	63.9 (65.5*)	44.2	Linear Finetuned 🔗
PASCAL VOC	Unsupervised Saliency	-	35.0	Pretrained Model 🔗
PASCAL VOC	Unsupervised Saliency	58.4 (59.5*)	35.0	Linear Finetuned 🔗

* Denotes CRF post-processing.

To evaluate and visualize the predictions of the finetuned model, run the following command:

cd segmentation
python eval.py --config_env configs/env.yml --config_exp configs/VOCSegmentation_supervised_saliency_model.yml --state-dict $PATH_TO_MODEL

You can optionally append the --crf-postprocess flag.

Citation

This code is based on the SCAN and MoCo repositories. If you find this repository useful for your research, please consider citing the following paper(s):

@article{vangansbeke2020unsupervised,
  title={Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Van Gool, Luc},
  journal={arXiv preprint arXiv:2102.06191},
  year={2021}
}
@inproceedings{vangansbeke2020scan,
  title={Scan: Learning to classify images without labels},
  author={Van Gansbeke, Wouter and Vandenhende, Simon and Georgoulis, Stamatios and Proesmans, Marc and Van Gool, Luc},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2020}
}
@inproceedings{he2019moco,
  title={Momentum Contrast for Unsupervised Visual Representation Learning},
  author={Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  booktitle={Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

For any enquiries, please contact the main authors.

For an overview on self-supervised learning, have a look at the overview repository.

License

This software is released under a Creative Commons license which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here.

Acknowledgements

This work was supported by Toyota, and was carried out at the TRACE Lab at KU Leuven (Toyota Research on Automated Cars in Europe - Leuven).

Comments

How to solve this problem?

cd segmentation
python eval.py --config_env configs/env.yml --config_exp configs/VOCSegmentation_supervised_saliency_model.yml --state-dict $PATH_TO_MODEL

In this step, there is no "VOCSegmentation_supervised_saliency_model.yml" in segmentation/configs/, so I copied it from "pretrain/configs/".

I then set --state-dict to the pretrained supervised model from the Model Zoo, but got this error:

Retrieve model
Traceback (most recent call last):
  File "eval.py", line 63, in <module>
    main()
  File "eval.py", line 38, in main
    model = get_model(p)
  File "/media/yangxilab/media/zjy/seg21/Unsupervised-Semantic-Segmentation-main/segmentation/utils/common_config.py", line 89, in get_model
    load_pretrained_weights(p, model)
  File "/media/yangxilab/media/zjy/seg21/Unsupervised-Semantic-Segmentation-main/segmentation/utils/common_config.py", line 16, in load_pretrained_weights
    print('Loading pre-trained weights from {}'.format(p['pretraining']))
KeyError: 'pretraining'

Thanks for your reply.

opened by zjy9779 10
Some details of linear evaluation protocol

Hi @wvangansbeke, I have two questions about the details of the linear evaluation protocol.

When you train a 1 x 1 convolutional layer on top of MoCo v2 features, do you modify the stride of the convolutional layers in the ResNet backbone? What is the final downsampling factor?

Do all models have the same finetuning settings (learning rate, training time)?

opened by Alxead 8
Unsupervised Saliency Model

Hi, thanks so much for the great work, it is really interesting and inspiring. I wonder if you will provide the implementation and/or pretrained weight for the unsupervised saliency model, so that we can also generate saliency masks and try your method on other datasets besides PASCAL VOC?

opened by yucornetto 8
request your weight

Thank you very much for your outstanding work on MaskContrast. I would like to request your weights to carry out some of my other research. Thank you very much.

opened by wuleibegreat 6
Saliency model sources

Could you please clarify how the supervised and unsupervised saliency masks were obtained? Were pre-trained models from DeepUSPS and BAS-Net used? For DeepUSPS, was MSRA-10K or MSRA-B used for training? Thanks!

opened by esizikova 6
Reproduce results

Thanks for your great work!

I ran your code with the ImageNet-pretrained model and the K-means clustering evaluation, and got 42.2 mIoU (44.2 in your paper). It seems that you do not fix a seed for training. So how did you get the results? Did you average over multiple training runs?

opened by Trainingzy 5
More Experimental Setup on DAVIS

Thank you for sharing such great work! Can you share more details about how to do label propagation on DAVIS (such as the hyper-parameters you use)? I notice that Contrastive Random Walk uses the penultimate residual block as the embedding, so which layer do you use to perform label propagation?

opened by Alxead 4
non-object centric dataset

As the paper mentions, "Using the object mask proposals, we can assemble a set of object-centric samples even when the dataset itself is non-object centric." I wonder how you realised this?

opened by jancylee 4
Cannot reproduce kmeans results for Mocov2/SwAV

Hi there, thank you for the interesting paper. I am currently looking into your results and failed to reproduce the K-means results for baseline methods such as MoCo v2 on PASCAL val. Would it be possible for you to add the evaluation scripts for methods other than MaskContrast as well? From what I understand, segmentation/kmeans.py only works for MaskContrast, as you use the saliency head to separate foreground from background vectors.

opened by MkuuWaUjinga 3
regarding understanding the contrastive loss code

Thanks for the great work. I am new to this topic and am trying to understand part of the code. I would really appreciate it if you could explain it here. In the code I see:

offset = torch.arange(0, 2 * batch_size, 2).to(sal_q.device)
sal_q = (sal_q + torch.reshape(offset, [-1, 1, 1]))*sal_q # all bg's to 0
sal_q = sal_q.view(-1)
mask_indexes = torch.nonzero((sal_q)).view(-1).squeeze()
sal_q = torch.index_select(sal_q, index=mask_indexes, dim=0) // 2

But I was not able to get the point here. What is the goal of offset, and what do mask_indexes and sal_q hold at the end?

Thank you very much in advance for your help.
opened by seyeeet 3
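
The snippet quoted in the question above can be traced on a toy input. The sketch below mirrors the five torch operations in plain Python, assuming `sal_q` is a binary saliency mask of shape [B, H, W] with 1 for foreground and 0 for background (an assumption based on the "all bg's to 0" comment). The function name `foreground_batch_indices` is hypothetical.

```python
def foreground_batch_indices(sal):
    """Mirror of the quoted snippet for a nested list sal of shape [B][H][W] with 0/1 entries.

    Returns (mask_indexes, labels):
      mask_indexes: flat positions of all foreground pixels in the batch
      labels:       for each of those pixels, the index of the image it came from
    """
    batch_size = len(sal)
    offsets = range(0, 2 * batch_size, 2)  # 0, 2, 4, ... one offset per image
    # (value + offset) * value: foreground pixels of image i become 2*i + 1,
    # background pixels become 0 regardless of the offset.
    flat = [
        (value + offset) * value
        for image, offset in zip(sal, offsets)
        for row in image
        for value in row
    ]
    mask_indexes = [i for i, value in enumerate(flat) if value != 0]
    # Integer division by 2 maps 2*i + 1 back to the image index i.
    labels = [flat[i] // 2 for i in mask_indexes]
    return mask_indexes, labels

# Two 2 x 2 masks.
sal = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 1]],
]
indexes, labels = foreground_batch_indices(sal)
print(indexes)  # [0, 3, 5, 6, 7]
print(labels)   # [0, 0, 1, 1, 1]
```

Under this reading, the odd values 2*i + 1 keep the foreground of image 0 distinguishable from background (a plain batch index of 0 would vanish in the multiplication), `mask_indexes` selects the foreground pixels, and the final `sal_q` tells each selected pixel which image's object mask it belongs to.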
Custom Dataset

Thank you very much, this is very exciting work. I was wondering if you could share any guidance on how to customize the code for alternative datasets?

opened by Tonks684 3
possibility of positive samples/embeddings in MoCo queue

Hi, This is more of a theoretical query. Is there a way to make sure all embeddings in the queue(maintained in the MoCo v2 training mechanism) are actually negative samples?

e.g., there is a possibility that one or more embeddings in the queue are actually positive examples with respect to the current key/query.

Has this been handled in MaskContrast, or could you point to any work on the same?

Thanks.

opened by silky1708 0

Owner

Wouter Van Gansbeke

PhD researcher at KU Leuven. Especially interested in computer vision, machine learning and deep learning. Working on self-supervised and multi-task learning.

GitHub https://arxiv.org/abs/2102.06191
