Zsseg.baseline - Zero-Shot Semantic Segmentation

Overview

This repo is for our paper A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model. It is based on the official repo of MaskFormer.

@article{xu2021ss,
  title={A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model},
  author={Xu, Mengde and Zhang, Zheng and Wei, Fangyun and Lin, Yutong and Cao, Yue and Hu, Han and Bai, Xiang},
  journal={arXiv preprint arXiv:2112.14757},
  year={2021}
}

Guideline

  • Environment

    torch==1.8.0
    torchvision==0.9.0
    detectron2==0.5 # Follow https://detectron2.readthedocs.io/en/latest/tutorials/install.html to install detectron2 and its required packages
    mmcv==1.3.14

    Furthermore, install the modified CLIP package:

    cd third_party/CLIP
    python -m pip install -Ue .
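
    A quick sanity check that the pinned versions above are actually installed (a minimal sketch):

    # check_env.py -- print the installed versions against the pins listed above
    import torch, torchvision, detectron2, mmcv

    expected = {"torch": "1.8.0", "torchvision": "0.9.0",
                "detectron2": "0.5", "mmcv": "1.3.14"}
    found = {"torch": torch.__version__, "torchvision": torchvision.__version__,
             "detectron2": detectron2.__version__, "mmcv": mmcv.__version__}
    for name, want in expected.items():
        status = "OK" if found[name].startswith(want) else "MISMATCH"
        print(f"{name}: found {found[name]}, expected {want} -> {status}")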
  • Data Preparation

    In our experiments, four datasets are used. For Cityscapes and ADE20k, follow the tutorial in MaskFormer.

  • For COCO Stuff 164k:

    • Download data from the official dataset website and extract it as below.
      datasets/
           coco/
                #http://images.cocodataset.org/zips/train2017.zip
                train2017/ 
                #http://images.cocodataset.org/zips/val2017.zip
                val2017/   
                #http://images.cocodataset.org/annotations/annotations_trainval2017.zip
                annotations/ 
                #http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
                stuffthingmaps/ 
    • Format the data to detectron2 style and split it into the Seen (Base) and Unseen (Novel) subsets (a sketch of the label-count step follows these commands).
      python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
      
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl
      
      python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.pkl
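
      For reference, the label-count collection step simply records which class IDs occur in each ground-truth mask. A simplified sketch of that idea (not the repo's actual tools/mask_cls_collect.py; it assumes the masks store per-pixel train IDs with 255 as the ignore value):

      import glob, os, pickle
      import numpy as np
      from PIL import Image

      def collect_mask_classes(mask_dir, output_pkl):
          """Record which class IDs appear in every mask under mask_dir."""
          label_count = {}
          for path in glob.glob(os.path.join(mask_dir, "*.png")):
              mask = np.asarray(Image.open(path))
              label_count[os.path.basename(path)] = [int(c) for c in np.unique(mask) if c != 255]
          with open(output_pkl, "wb") as f:
              pickle.dump(label_count, f)

      collect_mask_classes("datasets/coco/stuffthingmaps_detectron2/train2017_base",
                           "datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl")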
  • For Pascal VOC 11k:

    • Download data from the official dataset website and extract it as below.
    datasets/
       VOC2012/
            #http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
            JPEGImages/
            val.txt
            #http://home.bharathh.info/pubs/codes/SBD/download.html
            SegmentationClassAug/
            #https://gist.githubusercontent.com/sun11/2dbda6b31acc7c6292d14a872d0c90b7/raw/5f5a5270089239ef2f6b65b1cc55208355b5acca/trainaug.txt
            train.txt
            
    • Format the data to detectron2 style and split it into the Seen (Base) and Unseen (Novel) subsets (a quick layout check follows these commands).
    python datasets/prepare_voc_sem_seg.py datasets/VOC2012
    
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train datasets/VOC2012/annotations_detectron2/train_base_label_count.json
    
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
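
    Before running the two commands above, it may help to confirm that the expected layout is in place (a minimal sketch based on the tree shown above):

    import os

    voc_root = "datasets/VOC2012"
    for name in ["JPEGImages", "SegmentationClassAug", "train.txt", "val.txt"]:
        path = os.path.join(voc_root, name)
        print(f"{path}: {'found' if os.path.exists(path) else 'MISSING'}")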
  • Training and Evaluation

    Before training and evaluation, see the tutorial in detectron2. For example, to train a zero-shot semantic segmentation model on COCO Stuff:

  • Training with manually designed prompts:

    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml
    
  • Training with learned prompts:

    # Training prompts
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml --num-gpus 8 
    # Training seg model
    python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}

    Note: prompt training is affected by the random seed, so it is best to run it multiple times.

    For evaluation, add the --eval-only flag to the training command.

  • Trained Model

    😄 Coming soon.

Comments
  • High variation in mIOU for different evaluations of same model

    Hello authors, thank you for this great work!

    I was trying to run PASCAL VOC following the instructions in the readme. I used the _single_prompt_ version of the config and checked these parameters for both CLIP_ADAPTER and REGION_CLIP_ADAPTER:

    PROMPT_LEARNER: predefined
    PREDEFINED_PROMPT_TEMPLATES:
        - a sculpture of a {}.
    

    After a single training, for different runs of eval (using the --eval-only and --resume args) with different seeds, I am getting highly varying numbers. For two of the runs, I saw a difference of around 20 points in mIOU-unbase. The trained model is not changing, so this seemed very strange.

    Can you please help me understand if I'm making some mistake, or, if this is expected behaviour, what the source of this high variation in evaluation is? My initial thought was that randomly selected prompts were the source, but I later checked that it was actually using the single prompt.

    opened by mustafa1728 14
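
    One way to narrow down such variance is to pin every random seed before each evaluation run; if the numbers still differ across runs, the randomness is coming from somewhere other than the seeds (a minimal diagnostic sketch, not part of the repo):

    import random
    import numpy as np
    import torch

    def fix_seeds(seed: int = 42):
        """Pin Python, NumPy, and PyTorch RNGs so repeated eval runs see identical randomness."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

    fix_seeds(42)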
  • Visualization of PASCAL VOC predicted masks

    Hi,

    Thanks for sharing your code.

    After training and testing with PASCAL VOC, I am getting very bad masks that do not correspond to the IoU scores reported during evaluation. For example, the "train" class IoU is 92, but its predicted masks are very poor.

    Can you please provide the visualization script?

    Thank you.

    opened by prinshul 6
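
    A rough visualization can be done with detectron2's Visualizer, assuming the model produces detectron2's standard semantic-segmentation output (a "sem_seg" score tensor). The paths below are placeholders, and the project-specific config keys (e.g. MODEL.CLIP_ADAPTER) need to be added to the config first, the same way train_net.py sets up its config before reading the YAML:

    import cv2
    from detectron2.config import get_cfg
    from detectron2.data import MetadataCatalog
    from detectron2.engine import DefaultPredictor
    from detectron2.utils.visualizer import Visualizer

    cfg = get_cfg()
    # NOTE: register the project-specific config keys on cfg here, as train_net.py does,
    # before merging the YAML; otherwise merge_from_file will reject the unknown keys.
    cfg.merge_from_file("configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml")  # placeholder config
    cfg.MODEL.WEIGHTS = "output/model_final.pth"  # placeholder checkpoint path
    predictor = DefaultPredictor(cfg)

    image = cv2.imread("path/to/a/val/image.jpg")  # any validation image
    sem_seg = predictor(image)["sem_seg"].argmax(dim=0).cpu()

    # dataset metadata must already be registered (train_net.py does this via the project package)
    metadata = MetadataCatalog.get(cfg.DATASETS.TEST[0])
    vis = Visualizer(image[:, :, ::-1], metadata)  # OpenCV loads BGR; Visualizer expects RGB
    out = vis.draw_sem_seg(sem_seg, alpha=0.7)
    cv2.imwrite("prediction.png", out.get_image()[:, :, ::-1])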
  • About evaluation

    "For evaluation, add --eval-only flag to the training command." Using only --eval-only does not seem right; I only get the result of a model initialized from ImageNet pre-training.

    opened by NickChang97 3
  • Something unexpected

    Hi, thanks for sharing your code. According to the doc, I ran python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco and got this error:

        Traceback (most recent call last):
          File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
            result = (True, func(*args, **kwds))
          File "datasets/prepare_coco_stuff_164k_sem_seg.py", line 209, in convert_to_trainID
            Image.fromarray(mask_copy).save(seg_filename, "PNG")
          File "/home/qing_chang/env_detetron2/lib/python3.7/site-packages/PIL/Image.py", line 2237, in save
            fp = builtins.open(filename, "w+b")
        FileNotFoundError: [Errno 2] No such file or directory: 'datasets/coco/stuffthingmaps_detectron2/val2017/000000263425.png'

    opened by NickChang97 2
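
    That FileNotFoundError usually means the output directory did not exist when PIL tried to save; creating it beforehand is the typical workaround (a minimal sketch; the path is taken from the traceback above):

    import os

    out_dir = "datasets/coco/stuffthingmaps_detectron2/val2017"
    os.makedirs(out_dir, exist_ok=True)  # PIL's save() does not create missing directories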
  • self-training techniques

    opened by RenShuhuai-Andy 1
  • Incorrect detectron2 version in README

    This line tries to import RandomSubsetTrainingSampler from detectron2, but this sampler is only available in detectron2 v0.6 (see here), whereas the README instructions specify detectron2 0.5.

    opened by TB5z035 1
  • Retrained Vision Encoder

    Is there a config file or code for the retrained CLIP vision encoder? When building the CLIP model, the frozen parameter is true. Do I understand correctly?

    opened by AutomanHan 0
  • cocostuff result

    Thanks for the fantastic paper and codebase! Do the results of training with manually designed prompts and with learned prompts correspond to Table 7 in the paper? With manually designed prompts, my mIoU for seen and unseen classes is 28.03 and 57.07 respectively. Unseen is better than seen; maybe the dataset has some problem?

    opened by AutomanHan 0
  • Regarding zero shot training setting

    Thanks for the great work and the code. I had some doubts regarding the zero-shot training setting. If I understand correctly, in zero-shot setting only the base classes are used for training and the novel classes should be ignored maybe by setting them as ignore labels.

    I didn't find any code to handle the novel classes. I was expecting that in the file zsseg.baseline/mask_former/data/dataset_mappers/mask_former_semantic_dataset_mapper.py or zsseg.baseline/mask_former/zero_shot_mask_former_model.py there should be a provision to ignore the novel classes while training.

    I can see that you only pass the names of base classes here which are used in text prompts but I don't see any code to modify the groundtruth segmentation masks to ignore the novel classes.

    Please let me know if I am missing something or misunderstanding something.

    opened by vidit98 0
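
    A common way to enforce this in a detectron2-style dataset mapper is to remap every non-base class in the ground-truth mask to the ignore label before the loss is computed. A minimal sketch of that idea (the class IDs and the 255 ignore value are illustrative, not taken from the repo):

    import numpy as np

    IGNORE_LABEL = 255

    def mask_out_novel_classes(sem_seg: np.ndarray, base_class_ids) -> np.ndarray:
        """Set every pixel whose class is not a base (seen) class to the ignore label."""
        out = sem_seg.copy()
        novel = ~np.isin(out, list(base_class_ids)) & (out != IGNORE_LABEL)
        out[novel] = IGNORE_LABEL
        return out

    # Example: treat classes 0-14 as base classes during training
    gt = np.zeros((512, 512), dtype=np.uint8)      # stand-in ground-truth mask
    gt_train = mask_out_novel_classes(gt, range(15))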
  • The pretrained model (trained on seen classes and tested on unseen classes of the same dataset) could not match the result in the paper.

    I downloaded the pretrained model (Trained Model) and evaluated it on the COCO Stuff dataset. The results of directly using the CLIP vision encoder without tuning are 21.01 mIoU and 26.95 mIoU for seen and unseen classes respectively. However, the results in the paper are 26.8 mIoU and 29.7 mIoU. Are these results reasonable?

    opened by AutomanHan 1
  • VOC dataset Preparation

    Is there an error in the data preparation command for Pascal VOC 11k?

    python datasets/prepare_voc_sem_seg.py datasets/VOC2012
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train datasets/VOC2012/annotations_detectron2/train_base_label_count.json
    

    should be

    python datasets/prepare_voc_sem_seg.py datasets/VOC2012
    python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train_base datasets/VOC2012/annotations_detectron2/train_base_label_count.json
    

    train -> train_base?

    opened by AutomanHan 1
  • VOC 2012 Dataset Preparation

    I am a bit confused as to how the VOC2012 dataset should be prepared for this model. The instructions say that the file structure should be as shown in the README.

    I hope some of my confusions can be cleared up regarding the matter.

    1. I'm guessing that the first link is the link to download the whole VOC2012 dataset folder and from there we need to extract the JPEGImages folder and the val.txt file. But the VOC2012/ImageSets folder has 4 separate folders (Action, Layout, Main, Segmentation) under it, each of which contains a val.txt file. Which one am I supposed to select? I was inclined towards selecting the one under the 'Main' folder.
    2. The second link, I'm guessing, is a link to download the SegmentationClassAug folder which I am unable to find. Any guidance as to where exactly to look for will be helpful.
    3. The third link is one that has a file named trainaug.txt which I am supposed to download and rename as train.txt if I'm not wrong?
    opened by tauseef09 0
  • Visualization

    Thanks for the fantastic paper and codebase! I ran the manual prompts experiment python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_single_prompt_bs32_60k.yaml and wanted to know how you visualized your results. It would be great if you could point me in a direction for the same.

    opened by AnukritiSinghh 2
  • pretrained model

    Thanks for your great work and released code. I wonder if you can release the pretrained model so that we can evaluate the generalization ability on custom datasets.

    opened by Dingry 2
  • Missing Config Files

    Hi there,

    Thanks for sharing this good work!

    When I try to run learned prompts on the Pascal VOC dataset, I use the config file voc-11k-15/zero_shot_proposal_classification_learn_prompt_bs16_10k.yaml. This config requires another config file, "vo-11k-20/maskformer_R50_bs32_60k.yaml", as the base config; however, that base config file is missing. I am wondering if you can update either of these two settings.

    Thank you,

    opened by tao0420 1