[CVPR'22] Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast

Overview

The PyTorch implementation of Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast.

[arXiv]

Though image-level weakly supervised semantic segmentation (WSSS) has achieved great progress with Class Activation Maps (CAMs) as its cornerstone, the large supervision gap between classification and segmentation still prevents models from generating complete and precise pseudo masks. In this study, we propose a weakly supervised pixel-to-prototype contrast that provides pixel-level supervisory signals to narrow this gap. Guided by two intuitive priors, our method is executed across different views of an image and within each single view, aiming to impose cross-view feature semantic consistency regularization and to encourage intra-class compactness and inter-class dispersion of the feature space. Our method can be seamlessly incorporated into existing WSSS models without any changes to the base networks and incurs no extra inference burden. Extensive experiments show that our method consistently improves two strong baselines by large margins, demonstrating its effectiveness.
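For intuition, here is a minimal PyTorch sketch of a pixel-to-prototype contrast loss. It is a simplification, not our exact implementation: the paper additionally estimates prototypes from the most confident pixels and applies the loss across augmented views, and all names below are illustrative.

    import torch
    import torch.nn.functional as F

    def pixel_to_prototype_contrast(feats, cam, temperature=0.1):
        """feats: (B, D, H, W) pixel embeddings; cam: (B, C, H, W) class scores
        normalized over C (background included)."""
        B, D, H, W = feats.shape
        C = cam.shape[1]
        f = F.normalize(feats, dim=1).permute(0, 2, 3, 1).reshape(-1, D)  # (N, D)
        w = cam.permute(0, 2, 3, 1).reshape(-1, C)                        # (N, C)

        # Class prototypes: confidence-weighted mean of pixel embeddings.
        protos = F.normalize(w.t() @ f, dim=1)                            # (C, D)

        # InfoNCE-style loss: pull each pixel toward the prototype of its
        # pseudo label, push it away from the other class prototypes.
        logits = f @ protos.t() / temperature                             # (N, C)
        pseudo = w.argmax(dim=1)                                          # (N,)
        return F.cross_entropy(logits, pseudo)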


Prerequisites

Preparation

  1. Clone this repository.
  2. Data preparation. Download the PASCAL VOC 2012 devkit following the instructions at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit. It is suggested to make a soft link to the downloaded dataset. Then download the annotations of the VOC 2012 trainaug set (containing 10582 images) from https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip?dl=0 and place them as VOC2012/SegmentationClassAug/xxxxxx.png. Download the image-level labels cls_label.npy from https://github.com/YudeWang/SEAM/tree/master/voc12/cls_label.npy and place it into voc12/, or generate it yourself (see the sketch after this list).
  3. Download the ImageNet-pretrained backbones. We use ResNet-38 for initial seed generation and ResNet-101 for segmentation training. Download the pretrained ResNet-38 from https://drive.google.com/file/d/15F13LEL5aO45JU-j45PYjzv5KW5bn_Pn/view. The ResNet-101 can be downloaded from https://download.pytorch.org/models/resnet101-5d3b4d8f.pth.
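A rough sketch of how cls_label.npy can be generated from the VOC XML annotations. The exact dictionary format must match what the data loader expects, and the list-file parsing below assumes one image id per line:

    import numpy as np
    import xml.etree.ElementTree as ET

    CLASSES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
               'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
               'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

    def image_level_label(xml_path):
        """Parse one Annotations/*.xml file into a 20-d multi-hot label vector."""
        label = np.zeros(len(CLASSES), dtype=np.float32)
        for obj in ET.parse(xml_path).findall('object'):
            label[CLASSES.index(obj.find('name').text)] = 1.0
        return label

    ids = [line.strip() for line in open('voc12/train_aug.txt')]
    labels = {i: image_level_label(f'VOC2012/Annotations/{i}.xml') for i in ids}
    np.save('voc12/cls_label.npy', labels)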

Model Zoo

Download the trained models and category performance below.

| baseline | model       | train (mIoU) | val (mIoU) | test (mIoU) | checkpoint | category performance |
|----------|-------------|--------------|------------|-------------|------------|----------------------|
| SEAM     | contrast    | 61.5         | 58.4       | -           | [download] |                      |
|          | affinitynet | 69.2         | -          | -           | [download] |                      |
|          | deeplabv1   | -            | 67.7*      | 67.4*       | [download] | [link]               |
| EPS      | contrast    | 70.5         | -          | -           | [download] |                      |
|          | deeplabv1   | -            | 72.3*      | 73.5*       | [download] | [link]               |
|          | deeplabv2   | -            | 72.6*      | 73.6*       | [download] | [link]               |

* indicates using denseCRF post-processing.
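For reference, a common denseCRF post-processing recipe using the pydensecrf package looks like the following. The kernel hyper-parameters here are illustrative defaults, not necessarily the ones behind the asterisked numbers above:

    import numpy as np
    import pydensecrf.densecrf as dcrf
    from pydensecrf.utils import unary_from_softmax

    def crf_refine(img, probs, n_iters=10):
        """img: (H, W, 3) uint8 RGB image; probs: (C, H, W) softmax scores."""
        C, H, W = probs.shape
        d = dcrf.DenseCRF2D(W, H, C)
        d.setUnaryEnergy(unary_from_softmax(probs))
        d.addPairwiseGaussian(sxy=3, compat=3)          # spatial smoothness kernel
        d.addPairwiseBilateral(sxy=80, srgb=13, compat=10,
                               rgbim=np.ascontiguousarray(img))  # appearance kernel
        q = np.array(d.inference(n_iters)).reshape(C, H, W)
        return q.argmax(axis=0)                         # refined hard label map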

The training results including initial seeds, intermediate products and pseudo masks can be found here.

Usage

Step1: Initial Seed Generation with Contrastive Learning.

  1. Contrast train.

    python contrast_train.py  \
      --weights $pretrained_model \
      --voc12_root VOC2012 \
      --session_name $your_session_name \
      --batch_size $bs
    
  2. Contrast inference.

    Download the pretrained model from https://1drv.ms/u/s!AgGL9MGcRHv0mQSKoJ6CDU0cMjd2?e=dFlHgN or train it from scratch, set --weights accordingly, and then run:

    python contrast_infer.py \
      --weights $contrast_weight \
      --infer_list $[voc12/val.txt | voc12/train.txt | voc12/train_aug.txt] \
      --out_cam $your_cam_npy_dir \
      --out_cam_pred $your_cam_png_dir \
      --out_crf $your_crf_png_dir
    
  3. Evaluation.

    Following SEAM, we recommend using --curve to select an optimal background threshold, as sketched below.

    python eval.py \
      --list VOC2012/ImageSets/Segmentation/$[val.txt | train.txt] \
      --predict_dir $your_result_dir \
      --gt_dir VOC2012/SegmentationClass \
      --comment $your_comments \
      --type $[npy | png] \
      --curve True
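    Conceptually, --curve sweeps a range of constant background scores: each candidate score is stacked under the foreground CAMs before the pixel-wise argmax, and the threshold with the best mIoU against the ground truth is reported. A rough sketch, where load_cam, mean_iou, and the swept range are placeholders:

    import numpy as np

    def cam_to_mask(cam_dict, bg_score):
        """cam_dict: {class_index: (H, W) CAM in [0, 1]}; returns (H, W) labels."""
        keys = np.array([0] + [k + 1 for k in cam_dict])   # 0 = background
        h, w = next(iter(cam_dict.values())).shape
        maps = np.stack([np.full((h, w), bg_score)] + list(cam_dict.values()))
        return keys[np.argmax(maps, axis=0)]

    # for t in np.arange(0.0, 0.6, 0.01):                  # swept thresholds
    #     preds = [cam_to_mask(load_cam(i), t) for i in image_ids]
    #     print(t, mean_iou(preds, ground_truths))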
    

Step2: Refine with AffinityNet.

  1. Preparation.

    Prepare the files (la_crf_dir and ha_crf_dir, i.e. the CRF outputs computed with a low alpha of 4 and a high alpha of 8) needed for training AffinityNet. You can also use our processed CRF outputs with alpha=4/8 from here.

    python aff_prepare.py \
      --voc12_root VOC2012 \
      --cam_dir $your_cam_npy_dir \
      --out_crf $your_crf_alpha_dir 
    
  2. AffinityNet train.

    python aff_train.py \
      --weights $pretrained_model \
      --voc12_root VOC2012 \
      --la_crf_dir $your_crf_dir_4.0 \
      --ha_crf_dir $your_crf_dir_8.0 \
      --session_name $your_session_name
    
  3. Random walk propagation & Evaluation.

    Use the trained AffinityNet to conduct a random walk that refines the CAMs from Step 1 (see the sketch after this command). The trained model can be found in the Model Zoo (https://1drv.ms/u/s!AgGL9MGcRHv0mQXi0SSkbUc2sl8o?e=AY7AzX).

    python aff_infer.py \
      --weights $aff_weights \
      --voc12_root VOC2012 \
      --infer_list $[voc12/val.txt | voc12/train.txt] \
      --cam_dir $your_cam_dir \
      --out_rw $your_rw_dir
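    Schematically, the random walk turns the predicted pixel affinities into a transition matrix and uses it to diffuse the CAM activations, following AffinityNet. The beta and logt values below are illustrative hyper-parameters, not necessarily the ones used by aff_infer.py:

    import torch

    def random_walk(cam, aff, beta=8, logt=8):
        """cam: (C, N) vectorized CAMs over N pixels; aff: (N, N) affinities."""
        trans = torch.pow(aff, beta)                     # sharpen the affinities
        trans = trans / trans.sum(dim=0, keepdim=True)   # transition matrix
        for _ in range(logt):                            # T^(2^logt) by squaring
            trans = trans @ trans
        return cam @ trans                               # propagate activations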
    
  4. Pseudo mask generation. Generate the pseudo masks for training the DeepLab model. Dense CRF is used in this step.

    python aff_infer.py \
      --weights $aff_weights \
      --infer_list voc12/trainaug.txt \
      --cam_dir $your_cam_dir \
      --voc12_root VOC2012 \
      --out_rw $your_rw_dir
    

Step3: Segmentation training with DeepLab

  1. Training.

    We use the segmentation codebase from https://github.com/YudeWang/semantic-segmentation-codebase. Training and inference code is available in segmentation/experiment/. Set DATA_PSEUDO_GT: $your_pseudo_label_path in config.py (see the sketch below), then run:

    python train.py
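    For orientation, the relevant entry in config.py looks roughly like this; the path is a placeholder and the real file contains many more keys:

    config = {
        'DATA_PSEUDO_GT': '/path/to/your_rw_dir',  # pseudo masks from Step 2
    }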
    
  2. Inference.

    Check the test configuration in config.py (checkpoint path; trained model: https://1drv.ms/u/s!AgGL9MGcRHv0mQgpb3QawPCsKPe9?e=4vly0H) and the val/test set selection in test.py. Then run:

    python test.py
    

    For test set evaluation, you need to download the test set images and submit the segmentation results to the official VOC evaluation server.

To integrate our approach into the EPS model, switch to the eps branch via:

git checkout eps

Then run training or inference following the instructions above. Segmentation training uses the same codebase as in Step 3. Trained models and processed files can be downloaded from the Model Zoo.

Acknowledgements

We sincerely thank Yude Wang for his great work SEAM (CVPR'20); we borrow code heavily from his SEAM and segmentation repositories. We also thank Seungho Lee for EPS and Jiwoon Ahn for AffinityNet and IRN. Without them, we could not have finished this work.

Citation

@inproceedings{du2021weakly,
  title={Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast},
  author={Du, Ye and Fu, Zehua and Liu, Qingjie and Wang, Yunhong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}
Comments
  • pseudo mask generation for eps

    Hi, nice to see interesting research and results. Thank you.

    I want to generate pseudo-masks with your pretrained eps-contrast weight, but I cannot find forward_cam in your model code.

    I simply replaced model.forward_cam with model.forward and took cam_rv from resnet38_contrast to make the code run.

    However, the quality of the cam_png pseudo masks looks strange. Is there something I missed?

    opened by k0u-id 4
  • A question about your segmentation training

    Excuse me, I have a question about your segmentation training part.

    In your deeplabv2 case, it seems that there is a redundant argument "MODEL_OUTPUT_STRIDE" when initializing resnet101.

    The currently released code cannot successfully use ResNet-101 as the backbone for training.

    opened by YininKorea 3
  • Performance-related questions

    Hello, I am a fourth-year undergraduate student whose thesis topic is weakly supervised semantic segmentation. I was greatly inspired by your paper Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast, but I ran into some difficulties when actually running the code, and I hope you can help me.

    1. I followed your README exactly and ran the code you uploaded to GitHub, but in "Step1: Initial Seed Generation with Contrastive Learning." the mIoU of the seeds only reaches about 54.6%, clearly below the 61.5% reported in your paper. [screenshot of the output]

    2. In addition, I modified your training code to remove the Cross-view Contrast part and keep only the Intra-view Contrast part, and obtained an mIoU of 55.8%, which is higher than your original code gives in the same environment. It looks as if Cross-view Contrast actually hurts performance, which I cannot understand. [screenshot of the intra-view-only output]

    My environment is Python 3.6, PyTorch 1.10.1, CUDA 10.2, which matches the requirements mentioned in your README; the hardware is a GeForce GTX 1080 Ti with 12 GB of memory. I hope you can answer these two questions, thank you very much!

    opened by WonYuHao 3
  • --curve option and your pseudomask

    Hi, I tried training the code and got a performance curve like this:

    19/60 background score: 0.190	mIoU: 59.990%
    20/60 background score: 0.200	mIoU: 60.442%
    21/60 background score: 0.210	mIoU: 60.832%
    22/60 background score: 0.220	mIoU: 61.156%
    23/60 background score: 0.230	mIoU: 61.306%
    24/60 background score: 0.240	mIoU: 61.446%
    25/60 background score: 0.250	mIoU: 61.529%
    26/60 background score: 0.260	mIoU: 61.724%
    27/60 background score: 0.270	mIoU: 61.746%
    28/60 background score: 0.280	mIoU: 61.461%
    29/60 background score: 0.290	mIoU: 61.201%
    30/60 background score: 0.300	mIoU: 60.883%
    31/60 background score: 0.310	mIoU: 60.706%
    

    The best background score is 0.27, giving a final mIoU of 61.746%. This is close to your reported result, but I notice that it uses the ground-truth masks to choose the optimal background threshold. Can you give some insight into this? Also, could you provide your processed pseudo masks so that others can use them to train segmentation models? Thanks.

    opened by Theodorenyu 2
  • Question about segmentation on SEAM with deeplabv1

    Thanks for your great work! I'm running the segmentation part using SEAM with deeplabv1. As I understand it, the wSEAM files come from the SEAM GitHub repository. However, in segmentation/experiment/SEAM_deeplabv1_resnet38/net/deeplabv1.py, I could not find the wSEAM.utils file from which "NETS" is imported.

    Do you have any advice?

    opened by Shield9999 1
  • What are la_crf and ha_crf?

    Hi, I want to reproduce your paper.

    I have completed everything up to Step 2 Preparation. I got some CRF files in npy format (crf/4.00/...), but they are not split into la_crf_dir and ha_crf_dir, and I don't know what la and ha mean. What should I pass as $your_la_crf_dir and $your_ha_crf_dir?

    Additionally, I cannot open the CRF outputs download link in Step 2 (Refine with AffinityNet, Preparation): https://github.com/usr922/wseg/blob/master.

    I'd appreciate an answer.

    opened by IJS1016 1
  • pseudo mask IoU after rw

    Hi, thank you for your work on this paper and code. When I ran the random-walk refinement on the CAMs following the instructions, the mIoU of the generated pseudo masks was below 60%. Is there something I missed?

    opened by BinghaoLu 1
  • Pseudo mask generation in Step 2

    Pseudo mask generation in Step 2 provides pseudo masks for the train+aug set. I wonder whether they are the result of applying PCC to EPS and SEAM, or just the result of plain SEAM and EPS?

    opened by lmm077 1
Owner
Ye Du