Weakly- and Semi-Supervised Panoptic Segmentation
by Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
This repository demonstrates the weakly-supervised ground truth generation scheme presented in our paper Weakly- and Semi-Supervised Panoptic Segmentation, published at ECCV 2018. The code has been cleaned up and refactored, and should reproduce the results presented in the paper.
For details, please refer to our paper and project page. Please check the Downloads section for all the additional data we release.
* Equal first authorship
Introduction
In our weakly-supervised panoptic segmentation experiments, our models are supervised by 1) image-level tags and 2) bounding boxes, as shown in the figure above. We use image-level tags as supervision for "stuff" classes, which do not have a defined extent and cannot be described well by tight bounding boxes. For "thing" classes, we use bounding boxes as our weak supervision. This code release clarifies the implementation details of the method presented in the paper.
Iterative ground truth generation
For readers' convenience, we will give an outline of the proposed iterative ground truth generation pipeline, and provide demos for some of the key steps.
1. We train a multi-class classifier for all classes to obtain rough localisation cues. As it is not possible to fit an entire Cityscapes image (1024x2048) into the network due to GPU memory constraints, we took 15 fixed 400x500 crops per training image and derived their classification ground truth accordingly, which we used to train the multi-class classifier. From the trained classifier, we extract Class Activation Maps (CAMs) using Grad-CAM, which has the advantage over CAM of being agnostic to the network architecture (see the first sketch after this list).
2. In parallel, we extract bounding box annotations from the Cityscapes ground truth files, and then run MCG (a segment-proposal algorithm) and GrabCut (a classic foreground segmentation technique given a bounding-box prior) on the training images to generate foreground masks inside each annotated bounding box. The MCG and GrabCut masks are merged following the rule that only regions where both agree are given the predicted label; otherwise an "ignore" label is assigned (see the second sketch after this list).
    - The extracted bounding boxes (saved in .mat format) can be downloaded here. Alternatively, we also provide a demo script demo_instanceTrainId_to_dets.m and a batch script batch_instanceTrainId_to_dets.m for you to make them yourself. The demo is self-contained; however, before running the batch script, make sure to:
        1. Download the official Cityscapes scripts repository;
        2. Inside the above repository, navigate to cityscapesscripts/preparation and run python createTrainIdInstanceImgs.py. This command requires the environment variable CITYSCAPES_DATASET=path/to/your/cityscapes/data/folder to be set. These two steps produce the *_instanceTrainIds.png files required by our batch script;
        3. Navigate back to this repository, and place or symlink your gtFine and gtCoarse folders inside the data/Cityscapes/ folder so that they are visible to our batch script.
    - Please see here for details on MCG.
    - We use the OpenCV implementation of GrabCut in our experiments.
    - The merged M&G masks we produced are available for download here.
3. The CAMs (step 1) and the M&G masks (step 2) are merged to produce the ground truth needed to kick off iterative training (see the third sketch after this list). To see a demo of merging, navigate to the root folder of this repo in MATLAB and run:
        demo_merge_cam_mandg;
    When post-processing network predictions of images from the Cityscapes train_extra split, make sure to use the following settings:
        opts.run_apply_bbox_prior = false;
        opts.run_check_image_level_tags = false;
        opts.save_ins = false;
    because the coarse annotation provided for the train_extra split trades off recall for precision, leading to inaccurate bounding box coordinates and frequent false negatives. This also applies to step 5.
    - The results from merging CAMs with M&G masks can be downloaded here.
4. Using the generated ground truth, weakly-supervised models can be trained in the same way as a fully-supervised model. When the training loss converges, we make dense predictions using the model and also save the prediction scores.
    - An example of a dense prediction made by a weakly-supervised model is included at results/pred_sem_raw/, and an example of the corresponding prediction scores is provided at results/pred_flat_feat/.
5. The predictions and prediction scores (and optionally, the M&G masks) are used to generate the ground truth labels for the next stage of iterative training. To see a demo of iterative ground truth generation, navigate to the root folder of this repo in MATLAB and run:
        demo_make_iterative_gt;
    The generated semantic and instance ground truth labels are saved at results/pred_sem_clean and results/pred_ins_clean respectively. Please refer to scripts/get_opts.m for the available options. To reproduce the results presented in the paper, use the default settings, and set opts.run_merge_with_mcg_and_grabcut to false after five iterations of training, as by then the weakly-supervised model produces better-quality segmentations of "thing" classes than the original M&G masks.
6. Repeat steps 4 and 5 until the training loss no longer decreases.
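The three sketches below are not part of this codebase; they are illustrative only. First, to make step 1 more concrete, here is a minimal Python sketch of Grad-CAM extraction. It assumes a generic torchvision ResNet-50 classifier, a placeholder class count, and a random tensor standing in for a 400x500 crop; it is not the classifier or training code used for the paper.

```python
# Illustrative Grad-CAM extraction for step 1 -- a sketch, not the paper's code.
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_class, feature_layer):
    """Return a Grad-CAM heatmap for target_class, taken at feature_layer."""
    activations, gradients = [], []

    def fwd_hook(module, inputs, output):
        activations.append(output)

    def bwd_hook(module, grad_input, grad_output):
        gradients.append(grad_output[0])

    fwd_handle = feature_layer.register_forward_hook(fwd_hook)
    bwd_handle = feature_layer.register_full_backward_hook(bwd_hook)

    model.zero_grad()
    logits = model(image)                # image: (1, 3, H, W)
    logits[0, target_class].backward()   # gradient of the chosen class score

    fwd_handle.remove()
    bwd_handle.remove()

    acts, grads = activations[0], gradients[0]        # both (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)    # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear', align_corners=False)
    return (cam / (cam.max() + 1e-8))[0, 0].detach()  # normalised to [0, 1]

# Usage sketch: the class count, crop, and target class are placeholders.
model = models.resnet50(num_classes=19).eval()
crop = torch.randn(1, 3, 400, 500)                    # stands in for a 400x500 crop
heatmap = grad_cam(model, crop, target_class=8, feature_layer=model.layer4)
```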
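Second, the per-box foreground extraction and consensus merge of step 2 can be sketched as follows. This uses the OpenCV GrabCut implementation mentioned above; the MCG proposal mask, file paths, box coordinates, and class id are all placeholders, and the merge function encodes one reading of the consensus rule (agreeing foreground keeps the box's label, disagreement becomes "ignore"), not the exact code used to produce the released M&G masks.

```python
# Illustrative foreground extraction inside a bounding box, plus consensus merge.
import cv2
import numpy as np

def box_foreground_grabcut(image, box, iters=5):
    """Run OpenCV GrabCut with a bounding-box prior; return a binary foreground mask."""
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    # box is (x, y, w, h) in pixel coordinates
    cv2.grabCut(image, mask, box, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_RECT)
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)

def merge_consensus(grabcut_mask, mcg_mask, class_id, ignore_label=255):
    """Assign class_id only where both masks agree on foreground; mark pixels
    where the two masks disagree as ignore_label (one reading of the rule in step 2)."""
    merged = np.zeros(grabcut_mask.shape, dtype=np.uint8)   # background by default
    agree_fg = (grabcut_mask == 1) & (mcg_mask == 1)
    disagree = grabcut_mask != mcg_mask
    merged[agree_fg] = class_id
    merged[disagree] = ignore_label
    return merged

# Usage sketch with placeholder paths, box, and class id.
image = cv2.imread('data/Cityscapes/leftImg8bit/train/example.png')
mcg_mask = np.load('example_mcg_mask.npy')       # hypothetical precomputed MCG mask
box = (100, 200, 150, 120)                        # (x, y, w, h)
gc_mask = box_foreground_grabcut(image, box)
gt_in_box = merge_consensus(gc_mask, mcg_mask, class_id=26)   # class id is illustrative
```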
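Third, the combination of CAMs with M&G masks in step 3 is implemented by demo_merge_cam_mandg; the sketch below shows only a plausible priority-based combination (thing labels from the M&G masks override CAM-derived stuff labels, and unconfident pixels are ignored) to illustrate the kind of merge involved. The threshold and label conventions are assumptions, not the repo's settings; consult the MATLAB demo for the actual procedure.

```python
# Illustrative combination of stuff CAMs with thing (M&G) masks -- NOT the rule
# implemented in demo_merge_cam_mandg.
import numpy as np

def merge_cam_with_mandg(stuff_cams, stuff_ids, mandg_mask, cam_threshold=0.2,
                         ignore_label=255):
    """stuff_cams: (K, H, W) CAMs in [0, 1] for the K stuff classes present in the
    image; stuff_ids: the K corresponding train ids; mandg_mask: (H, W) map holding
    thing train ids, 0 for unlabelled, and ignore_label for ignore regions."""
    merged = np.full(mandg_mask.shape, ignore_label, dtype=np.uint8)

    # Stuff labels: argmax over the CAMs, kept only where the response is confident.
    best = stuff_cams.argmax(axis=0)
    confident = stuff_cams.max(axis=0) > cam_threshold
    merged[confident] = np.asarray(stuff_ids, dtype=np.uint8)[best[confident]]

    # Thing labels from the M&G masks take precedence over CAM-derived labels.
    thing_pixels = (mandg_mask != 0) & (mandg_mask != ignore_label)
    merged[thing_pixels] = mandg_mask[thing_pixels]
    return merged
```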
Downloads
- Image crops and tags for training the multi-class classifier:
    - Images
    - Ground truth tags
    - Lists
    - Semantic labels (provided for convenience; not to be used in training)
- CAMs:
- Extracted Cityscapes bounding boxes (.mat format):
- Merged MCG&Grabcut masks:
- CAMs merged with MCG&Grabcut masks:
Note that due to the file size limit set by BaiduYun, some of the larger files had to be split into several chunks in order to be uploaded. These files are named filename.zip.part##, where filename is the original file name excluding the extension, and ## is a two-digit part index. After you have downloaded all the parts, cd to the folder where they are saved, and use the following command to join them back together:
cat filename.zip.part* > filename.zip
The joining operation may take several minutes, depending on file size.
The above does not apply to files downloaded from Dropbox.
Reference
If you find the code helpful in your research, please cite our paper:
@InProceedings{Li_2018_ECCV,
author = {Li, Qizhu and
Arnab, Anurag and
Torr, Philip H.S.},
title = {Weakly- and Semi-Supervised Panoptic Segmentation},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}
Questions
Please contact Qizhu Li [email protected] and Anurag Arnab [email protected] for enquiries, issues, and suggestions.