# PixelPick
This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."
## Table of contents
- Abstract
- Installation
- Benchmark results
- Models
- PixelPick mouse-free annotation tool (to be updated)
- Citation (to be updated)
- Acknowledgements
## Abstract
A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training. In this work, we show that in order to achieve a good level of segmentation performance, all you need are a few well-chosen pixel labels. We make the following contributions: (i) We investigate the novel semantic segmentation setting in which labels are supplied only at sparse pixel locations, and show that deep neural networks can use a handful of such labels to good effect; (ii) We demonstrate how to exploit this phenomenon within an active learning framework, termed PixelPick, to radically reduce labelling cost, and propose an efficient “mouse-free” annotation strategy to implement our approach; (iii) We conduct extensive experiments to study the influence of annotation diversity under a fixed budget, model pretraining, model capacity and the sampling mechanism for picking pixels in this low annotation regime; (iv) We provide comparisons to the existing state of the art in semantic segmentation with active learning, and demonstrate comparable performance with up to two orders of magnitude fewer pixel annotations on the CamVid, Cityscapes and PASCAL VOC 2012 benchmarks; (v) Finally, we evaluate the efficiency of our annotation pipeline and its sensitivity to annotator error to demonstrate its practicality. Our code, models and annotation tool will be made publicly available.
## Installation
### Prerequisites
Our code is based on Python 3.8 and uses the following Python packages (note that the `cv2` module is provided by the `opencv-python` package on PyPI):
```
torch>=1.8.1
torchvision>=0.9.1
tqdm>=4.59.0
opencv-python>=4.5.1.48
```
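As a quick, optional sanity check, you can print the installed versions and compare them against the requirements above (the names below are the import names, so OpenCV appears as `cv2`):

```python
import cv2
import torch
import torchvision
import tqdm

# Print each installed version to compare against the requirements above.
for module in (torch, torchvision, tqdm, cv2):
    print(module.__name__, module.__version__)
```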
### Clone this repository
```sh
git clone https://github.com/NoelShin/PixelPick.git
cd PixelPick
```
### Download dataset
Follow one of the instructions below to download the dataset you are interested in. Then, set the `dir_dataset` variable in `args.py` to the path of the directory which contains the downloaded dataset.
- For CamVid, you need to download the SegNet-Tutorial codebase as a zip file and, after unzipping it, use the `CamVid` directory, which contains the images and annotations for training and testing. You don't need to change the directory structure. [CamVid]
- For Cityscapes, first visit the link and log in to download. Once downloaded, unzip it; you don't need to change the directory structure. It is worth noting that if you set the `downsample` variable in `args.py` (4 by default), the train and val images of Cityscapes are first downsampled and stored in a `{dir_dataset}_d{downsample}` folder located in the same directory as `dir_dataset`. This enables faster data loading during training (a sketch of this caching step is given below). [Cityscapes]
- For PASCAL VOC 2012, the dataset will be downloaded automatically via `torchvision.datasets.VOCSegmentation`; you just need to specify the directory to download it to with the `dir_dataset` variable (a minimal usage sketch follows this list). If the automatic download fails, you can download it manually from the following page (you don't need to untar the `VOCtrainval_11-May-2012.tar` file which will be downloaded). [PASCAL VOC 2012 segmentation]
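For reference, here is a minimal sketch of fetching PASCAL VOC 2012 with `torchvision.datasets.VOCSegmentation`. The repo's own dataloader handles this for you; the `root` path here is just an example:

```python
from torchvision.datasets import VOCSegmentation

# Downloads VOCtrainval_11-May-2012.tar into `root` (if not already present)
# and reads it in place; point `root` at the path you set for `dir_dataset`.
dataset = VOCSegmentation(
    root="/path/to/datasets/voc2012",  # example path, adjust to your setup
    year="2012",
    image_set="train",
    download=True,
)
image, mask = dataset[0]  # both PIL images: RGB input and segmentation mask
```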
For more details about the data we used to train/validate our model, please visit the `datasets` directory and find the `{camvid, cityscapes, voc}_{train, val}.txt` files.
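The Cityscapes downsampling-and-caching behaviour mentioned above amounts to something like the sketch below. This is an illustration rather than the repo's exact code: the directory traversal and the interpolation choice are assumptions.

```python
import os

import cv2


def cache_downsampled(dir_dataset: str, downsample: int = 4) -> str:
    """Resize every .png under dir_dataset by 1/downsample and cache the
    result in a sibling folder named {dir_dataset}_d{downsample}."""
    dir_dataset = dir_dataset.rstrip("/")
    dir_cached = f"{dir_dataset}_d{downsample}"
    for root, _, files in os.walk(dir_dataset):
        for name in files:
            if not name.endswith(".png"):
                continue
            path_src = os.path.join(root, name)
            path_dst = path_src.replace(dir_dataset, dir_cached, 1)
            os.makedirs(os.path.dirname(path_dst), exist_ok=True)
            img = cv2.imread(path_src, cv2.IMREAD_UNCHANGED)
            h, w = img.shape[:2]
            # Nearest-neighbour keeps label ids valid; the actual code may
            # well use a smoother interpolation for the RGB images.
            img = cv2.resize(img, (w // downsample, h // downsample),
                             interpolation=cv2.INTER_NEAREST)
            cv2.imwrite(path_dst, img)
    return dir_cached  # e.g. ".../cityscapes_d4" with 256x512 images
```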
### Train and validate
By default, the current code validates the model at every epoch while training. To train a MobileNetv2-based DeepLabv3+ network, run the lines below (the pretrained MobileNetv2 will be loaded automatically):
```sh
cd scripts
sh pixelpick-dl-cv.sh
```
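If you want to see where such a pretrained encoder can come from, the standard `torchvision` call is along these lines; this is a generic sketch, not necessarily the repo's exact loading code:

```python
import torchvision

# ImageNet-pretrained MobileNetV2; its convolutional feature extractor can
# serve as the encoder of a DeepLabv3+-style segmentation network.
mobilenet = torchvision.models.mobilenet_v2(pretrained=True)
encoder = mobilenet.features  # backbone without the classification head
```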
## Benchmark results
For CamVid and Cityscapes, we report the average of 5 different runs; for PASCAL VOC 2012, the average of 3 different runs. Please refer to our paper for details. Results are given as mean IoU ± one standard deviation. The "% annotation" column is the number of labelled pixels divided by the total number of pixels per image, as in the short computation below.
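For example, for a 360x480 CamVid image with 20 labelled pixels:

```python
# Percentage of pixels labelled per image for CamVid (360x480 = 172,800 pixels).
pixels_labelled = 20
pixels_total = 360 * 480
print(f"{100 * pixels_labelled / pixels_total:.3f}%")  # -> 0.012%
```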
### CamVid
model | backbone (encoder) | # labelled pixels per img (% annotation) | mean IoU (%) |
---|---|---|---|
PixelPick | MobileNetv2 | 20 (0.012) | 50.8 ± 0.2 |
PixelPick | MobileNetv2 | 40 (0.023) | 53.9 ± 0.7 |
PixelPick | MobileNetv2 | 60 (0.035) | 55.3 ± 0.5 |
PixelPick | MobileNetv2 | 80 (0.046) | 55.2 ± 0.7 |
PixelPick | MobileNetv2 | 100 (0.058) | 55.9 ± 0.1 |
Fully-supervised | MobileNetv2 | 360x480 (100) | 58.2 ± 0.6 |
PixelPick | ResNet50 | 20 (0.012) | 59.7 ± 0.9 |
PixelPick | ResNet50 | 40 (0.023) | 62.3 ± 0.5 |
PixelPick | ResNet50 | 60 (0.035) | 64.0 ± 0.3 |
PixelPick | ResNet50 | 80 (0.046) | 64.4 ± 0.6 |
PixelPick | ResNet50 | 100 (0.058) | 65.1 ± 0.3 |
Fully-supervised | ResNet50 | 360x480 (100) | 67.8 ± 0.3 |
### Cityscapes
Note that to keep the training time manageable, we train on quarter-resolution (256x512) versions of the original Cityscapes images (1024x2048).
model | backbone (encoder) | # labelled pixels per img (% annotation) | mean IoU (%) |
---|---|---|---|
PixelPick | MobileNetv2 | 20 (0.015) | 52.0 ± 0.6 |
PixelPick | MobileNetv2 | 40 (0.031) | 54.7 ± 0.4 |
PixelPick | MobileNetv2 | 60 (0.046) | 55.5 ± 0.6 |
PixelPick | MobileNetv2 | 80 (0.061) | 56.1 ± 0.3 |
PixelPick | MobileNetv2 | 100 (0.076) | 56.5 ± 0.3 |
Fully-supervised | MobileNetv2 | 256x512 (100) | 61.4 ± 0.5 |
PixelPick | ResNet50 | 20 (0.015) | 56.1 ± 0.4 |
PixelPick | ResNet50 | 40 (0.031) | 60.0 ± 0.3 |
PixelPick | ResNet50 | 60 (0.046) | 61.6 ± 0.4 |
PixelPick | ResNet50 | 80 (0.061) | 62.3 ± 0.4 |
PixelPick | ResNet50 | 100 (0.076) | 62.8 ± 0.4 |
Fully-supervised | ResNet50 | 256x512 (100) | 68.5 ± 0.3 |
### PASCAL VOC 2012
model | backbone (encoder) | # labelled pixels per img (% annotation) | mean IoU (%) |
---|---|---|---|
PixelPick | MobileNetv2 | 10 (0.009) | 51.7 ± 0.2 |
PixelPick | MobileNetv2 | 20 (0.017) | 53.9 ± 0.8 |
PixelPick | MobileNetv2 | 30 (0.026) | 56.7 ± 0.3 |
PixelPick | MobileNetv2 | 40 (0.034) | 56.9 ± 0.7 |
PixelPick | MobileNetv2 | 50 (0.043) | 57.2 ± 0.3 |
Fully-supervised | MobileNetv2 | N/A (100) | 57.9 ± 0.5 |
PixelPick | ResNet50 | 10 (0.009) | 59.7 ± 0.8 |
PixelPick | ResNet50 | 20 (0.017) | 65.6 ± 0.5 |
PixelPick | ResNet50 | 30 (0.026) | 66.4 ± 0.2 |
PixelPick | ResNet50 | 40 (0.034) | 67.2 ± 0.1 |
PixelPick | ResNet50 | 50 (0.043) | 67.4 ± 0.5 |
Fully-supervised | ResNet50 | N/A (100) | 69.4 ± 0.3 |
## Models
model | dataset | backbone (encoder) | # labelled pixels per img (% annotation) | mean IoU (%) | Download |
---|---|---|---|---|---|
PixelPick | CamVid | MobileNetv2 | 100 (0.058) | 56.1 | Link |
PixelPick | CamVid | ResNet50 | 100 (0.058) | TBU | TBU |
PixelPick | Cityscapes | MobileNetv2 | 100 (0.076) | 56.8 | Link |
PixelPick | Cityscapes | ResNet50 | 100 (0.076) | 63.3 | Link |
PixelPick | VOC 2012 | MobileNetv2 | 50 (0.043) | 57.4 | Link |
PixelPick | VOC 2012 | ResNet50 | 50 (0.043) | 68.0 | Link |
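Once downloaded, a checkpoint can typically be restored with standard PyTorch calls. The sketch below assumes the file is a plain `state_dict` and uses an example filename; the exact checkpoint format and the matching model constructor come from this repo's code:

```python
import torch

# Load a downloaded checkpoint on CPU and inspect a few parameter tensors.
state_dict = torch.load("pixelpick_camvid_mobilenetv2.pt", map_location="cpu")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))

# With a model built via this repository's network code:
# model.load_state_dict(state_dict)
# model.eval()
```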
## PixelPick mouse-free annotation tool
Code for the annotation tool will be made available.
## Citation
To be updated.
## Acknowledgements
We borrowed code for the MobileNetv2-based DeepLabv3+ network from https://github.com/Shuai-Xie/DEAL.
If you have any questions, please contact us at {gyungin, weidi, samuel}@robots.ox.ac.uk.