This repo is for our paper *A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model*. It is based on the official repo of MaskFormer. If you find it useful, please cite:

```
@article{xu2021ss,
  title={A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model},
  author={Xu, Mengde and Zhang, Zheng and Wei, Fangyun and Lin, Yutong and Cao, Yue and Hu, Han and Bai, Xiang},
  journal={arXiv preprint arXiv:2112.14757},
  year={2021}
}
```
Guideline

- Environment

  ```
  torch==1.8.0
  torchvision==0.9.0
  detectron2==0.5 # follow https://detectron2.readthedocs.io/en/latest/tutorials/install.html to install it and some required packages
  mmcv==1.3.14
  ```
  Furthermore, install the modified CLIP package:
  ```bash
  cd third_party/CLIP
  python -m pip install -Ue .
  ```
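  As an optional sanity check (our own sketch, not part of the official setup), the following snippet confirms the pinned versions are importable; the last line assumes the modified CLIP package keeps the upstream OpenAI CLIP API:

  ```python
  # Optional sanity check: confirm the pinned packages import with the expected versions.
  import torch
  import torchvision
  import detectron2
  import mmcv
  import clip  # the modified package installed from third_party/CLIP

  print("torch:", torch.__version__)              # expect 1.8.0
  print("torchvision:", torchvision.__version__)  # expect 0.9.0
  print("detectron2:", detectron2.__version__)    # expect 0.5
  print("mmcv:", mmcv.__version__)                # expect 1.3.14
  # Assumes the modified package keeps the upstream clip.available_models() API:
  print("CLIP backbones:", clip.available_models())
  ```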
- Data Preparation

  In our experiments, four datasets are used. For Cityscapes and ADE20k, follow the tutorial in MaskFormer.

  - For COCO Stuff 164k:

    - Download data from the official dataset website and extract it as below.

      ```
      datasets/
        coco/
          # http://images.cocodataset.org/zips/train2017.zip
          train2017/
          # http://images.cocodataset.org/zips/val2017.zip
          val2017/
          # http://images.cocodataset.org/annotations/annotations_trainval2017.zip
          annotations/
          # http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
          stuffthingmaps/
      ```
    - Format the data to detectron2 style and split it into the Seen (Base) subset and the Unseen (Novel) subset.
      ```bash
      python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.pkl
      ```
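      For intuition, here is a minimal sketch of what we assume `tools/mask_cls_collect.py` does (a hypothetical reimplementation, not the repo's actual code): scan each ground-truth mask and record the class IDs it contains, so images can later be filtered by Seen/Unseen classes.

      ```python
      # Hypothetical sketch of tools/mask_cls_collect.py (assumed behavior, not the
      # repo's actual code): record the set of class IDs present in each ground-truth
      # mask, so that images can later be filtered by Seen/Unseen classes.
      import pickle
      import sys
      from pathlib import Path

      import numpy as np
      from PIL import Image

      mask_dir, out_file = Path(sys.argv[1]), sys.argv[2]
      label_count = {}
      for mask_path in sorted(mask_dir.glob("*.png")):
          mask = np.asarray(Image.open(mask_path))
          # 255 is the conventional "ignore" label in detectron2-style sem-seg masks.
          label_count[mask_path.stem] = [int(c) for c in np.unique(mask) if c != 255]

      with open(out_file, "wb") as f:
          pickle.dump(label_count, f)
      ```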
  - For Pascal VOC 11k:

    - Download data from the official dataset website and extract it as below.

      ```
      datasets/
        VOC2012/
          # http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
          JPEGImages/
          val.txt
          # http://home.bharathh.info/pubs/codes/SBD/download.html
          SegmentationClassAug/
          # https://gist.githubusercontent.com/sun11/2dbda6b31acc7c6292d14a872d0c90b7/raw/5f5a5270089239ef2f6b65b1cc55208355b5acca/trainaug.txt
          train.txt
      ```
    - Format the data to detectron2 style and split it into the Seen (Base) subset and the Unseen (Novel) subset.
      ```bash
      python datasets/prepare_voc_sem_seg.py datasets/VOC2012
      python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train datasets/VOC2012/annotations_detectron2/train_base_label_count.json
      python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
      ```
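      To confirm the collection step produced something sensible, one can peek at the output file. This assumes it stores a per-image mapping to the class IDs present in each mask; adjust if the actual format differs.

      ```python
      # Assumes the collected file maps each image ID to the class IDs in its mask.
      import json

      with open("datasets/VOC2012/annotations_detectron2/val_label_count.json") as f:
          label_count = json.load(f)

      print(len(label_count), "masks scanned")
      print(next(iter(label_count.items())))  # e.g. an (image_id, [class ids]) pair
      ```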
- Training and Evaluation

  Before training and evaluation, see the tutorial in detectron2. For example, to train a zero-shot semantic segmentation model on COCO Stuff:

  - Training with manually designed prompts:
    ```bash
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml
    ```
  - Training with learned prompts:
    ```bash
    # Training prompts
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml --num-gpus 8
    # Training seg model
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}
    ```
    Note: prompt training is affected by the random seed, so it is better to run it multiple times.
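    For concreteness, a hypothetical two-stage invocation; the checkpoint path below is an assumption based on detectron2's default `OUTPUT_DIR`/`model_final.pth` convention and may differ in your setup:

    ```bash
    # Stage 1 writes its checkpoints under the config's OUTPUT_DIR (path below is assumed).
    TRAINED_PROMPTS=output/zero_shot_proposal_classification_learn_prompt_bs32_10k/model_final.pth
    # Stage 2 loads the learned prompts via the config override shown above.
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml \
        --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}
    ```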
  For evaluation, add the `--eval-only` flag to the training command.
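  For example (a sketch; the weights path is a placeholder, and `MODEL.WEIGHTS` is the standard detectron2 override for loading a checkpoint):

  ```bash
  # Evaluate a trained model; replace the placeholder path with your checkpoint.
  python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml \
      --num-gpus 8 --eval-only MODEL.WEIGHTS /path/to/checkpoint.pth
  ```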
- Trained Model

  😄 Coming soon.