Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation, CVPR 2020 (Oral)

SEAM

The implementation of Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation.

You can also download the repository from https://gitee.com/hibercraft/SEAM

Abstract

Image-level weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years. Most advanced solutions exploit class activation maps (CAMs). However, CAMs can hardly serve as the object mask due to the gap between full and weak supervision. In this paper, we propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap. Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation, whose pixel-level labels take the same spatial transformation as the input images during data augmentation. However, this constraint is lost on the CAMs trained by image-level supervision. Therefore, we propose consistency regularization on CAMs predicted from various transformed images to provide self-supervision for network learning. Moreover, we propose a pixel correlation module (PCM), which exploits context appearance information and refines the prediction of the current pixel by its similar neighbors, leading to further improvement in CAM consistency. Extensive experiments on the PASCAL VOC 2012 dataset demonstrate that our method outperforms state-of-the-art methods using the same level of supervision.
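
To make the core idea concrete, here is a minimal, illustrative sketch of the equivariant consistency regularization in PyTorch (the released train_SEAM.py is the reference implementation; model and affine below are placeholders for the CAM network and a spatial transform such as rescaling or flipping):

import torch.nn.functional as F

# Illustrative sketch only (assumed interfaces, not the repo's exact code):
# the CAM predicted from a spatially transformed image is regularized to agree
# with the same transform applied to the CAM of the original image.
def equivariant_regularization(model, img, affine):
    cam_orig = model(img)                        # (B, C, H, W) CAMs of the original image
    cam_aug = model(affine(img))                 # CAMs of the transformed image
    return F.l1_loss(cam_aug, affine(cam_orig))  # equivariant consistency loss

In the full method this regularization is combined with the usual image-level classification loss, and the pixel correlation module refines each pixel's CAM value with those of visually similar neighbors before the consistency is enforced.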

Thanks to jiwoon-ahn's work: the code of this repository borrows heavily from his AffinityNet repository, and we follow the same pipeline to verify the effectiveness of our SEAM.

Requirements

  • Python 3.6
  • pytorch 0.4.1, torchvision 0.2.1
  • CUDA 9.0
  • 4 x GPUs (12GB)

Usage

Installation

  • Download the repository.
git clone https://github.com/YudeWang/SEAM.git
  • Install Python dependencies.
pip install -r requirements.txt
  • Download the PASCAL VOC 2012 dataset and link it into the repository root.
ln -s $your_dataset_path/VOCdevkit/VOC2012 VOC2012
  • (Optional) The image-level labels have already been provided in voc12/cls_label.npy (a minimal example of reading them back follows this list). If you want to regenerate the file (which is unnecessary), please download the annotations of the VOC 2012 SegmentationClassAug training set (containing 10582 images), which can be downloaded here, and place them all as VOC2012/SegmentationClassAug/xxxxxx.png. Then run:
cd voc12
python make_cls_labels.py --voc12_root VOC2012
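
For reference, a minimal sketch of reading these image-level labels back, assuming (as in the AffinityNet pipeline) that cls_label.npy stores a dict mapping each VOC image id to a 20-dim multi-hot vector over the foreground classes; the exact key format in the released file may differ:

import numpy as np

# Assumption: cls_label.npy holds {image_id: 20-dim 0/1 vector}.
cls_labels = np.load('voc12/cls_label.npy', allow_pickle=True).item()
label = cls_labels['2007_000032']     # hypothetical example image id
present = np.nonzero(label)[0]        # indices of classes present in the image
print(present)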

SEAM step

  1. SEAM training
python train_SEAM.py --voc12_root VOC2012 --weights $pretrained_model --session_name $your_session_name
  2. SEAM inference.
python infer_SEAM.py --weights $SEAM_weights --infer_list [voc12/val.txt | voc12/train.txt | voc12/train_aug.txt] --out_cam $your_cam_dir --out_crf $your_crf_dir
  3. SEAM step evaluation. We provide a Python mIoU evaluation script, evaluation.py, or you can use the official development kit. Here we suggest plotting the curve of mIoU over different background scores (a thresholding sketch follows these commands).
python evaluation.py --list VOC2012/ImageSets/Segmentation/[val.txt | train.txt] --predict_dir $your_cam_dir --gt_dir VOC2012/SegmentationClass --comment $your_comments --type npy --curve True
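
The mIoU-vs-background-score curve comes from thresholding the CAMs with a constant background plane. A minimal sketch under assumed conventions (each out_cam file storing a dict of {class_index: HxW activation map}; evaluation.py is the reference implementation, and the paths below are hypothetical):

import numpy as np

def cam_to_label(cam_dict, bg_score):
    # Stack the CAMs of the classes present in the image.
    keys = sorted(cam_dict.keys())
    cams = np.stack([cam_dict[k] for k in keys], axis=0)         # (C, H, W)
    bg = np.full_like(cams[:1], bg_score)                        # constant background plane
    scores = np.concatenate([bg, cams], axis=0)                  # background is channel 0
    winner = np.argmax(scores, axis=0)                           # 0 = background
    label = np.zeros_like(winner)
    for i, k in enumerate(keys):                                 # map back to VOC class ids (1..20)
        label[winner == i + 1] = k + 1
    return label

cam_dict = np.load('your_cam_dir/2007_000032.npy', allow_pickle=True).item()
mask = cam_to_label(cam_dict, bg_score=0.22)

Sweeping bg_score and computing mIoU against SegmentationClass for each value gives the curve that --curve True produces.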

Random walk step

The random walk step is kept the same as in the AffinityNet repository [1]; a minimal sketch of the propagation idea follows the commands below.

  1. Train AffinityNet.
python train_aff.py --weights $pretrained_model --voc12_root VOC2012 --la_crf_dir $your_crf_dir_4.0 --ha_crf_dir $your_crf_dir_24.0 --session_name $your_session_name
  2. Random walk propagation
python infer_aff.py --weights $aff_weights --infer_list [voc12/val.txt | voc12/train.txt] --cam_dir $your_cam_dir --voc12_root VOC2012 --out_rw $your_rw_dir
  3. Random walk step evaluation
python evaluation.py --list VOC2012/ImageSets/Segmentation/[val.txt | train.txt] --predict_dir $your_rw_dir --gt_dir VOC2012/SegmentationClass --comment $your_comments --type png
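
Conceptually, the random walk of AffinityNet [1] turns the learned pairwise affinities into a transition matrix and diffuses the CAMs along it. A minimal, illustrative sketch (infer_aff.py is the reference implementation; beta and t are illustrative defaults, not necessarily the repo's settings):

import numpy as np

def random_walk(cam, affinity, beta=8, t=6):
    # cam: (C, N) flattened class activations; affinity: (N, N) pairwise affinities.
    trans = affinity ** beta                                             # sharpen affinities
    trans = trans / np.maximum(trans.sum(axis=0, keepdims=True), 1e-5)   # column-normalize
    for _ in range(t):
        cam = cam @ trans                                                # one propagation step
    return cam

The propagated CAMs are then written out as label maps (out_rw), which is why the random-walk evaluation uses --type png.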

Pseudo-label retraining

Retrain a segmentation model (DeepLabv1) on the generated pseudo labels. Code is available here.
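
Retraining itself happens in that external DeepLab codebase; conceptually, the only change is that the dataloader reads the random-walk pseudo labels instead of the ground-truth masks. A hypothetical sketch (class name, arguments, and paths are placeholders, not the retraining repo's API):

import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

# Hypothetical sketch: pair VOC images with the random-walk pseudo labels
# (PNG label maps in $your_rw_dir) instead of the ground truth, so a DeepLab
# training script can consume them unchanged.
class PseudoLabelVOC(Dataset):
    def __init__(self, name_list, voc12_root, pseudo_dir, transform=None):
        self.names = name_list            # e.g. image ids from train_aug.txt
        self.voc12_root = voc12_root
        self.pseudo_dir = pseudo_dir
        self.transform = transform        # expected to crop/normalize and convert to tensors

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        img = Image.open(os.path.join(self.voc12_root, 'JPEGImages', name + '.jpg')).convert('RGB')
        label = Image.open(os.path.join(self.pseudo_dir, name + '.png'))   # pseudo mask
        if self.transform is not None:
            img, label = self.transform(img, label)
        return img, np.asarray(label, dtype=np.int64)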

Citation

Please cite our paper if the code is helpful to your research.

@InProceedings{Wang_2020_CVPR_SEAM,
    author = {Yude Wang and Jie Zhang and Meina Kan and Shiguang Shan and Xilin Chen},
    title = {Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation},
    booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2020}
}

Reference

[1] J. Ahn and S. Kwak. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Comments
  • Training the segmentation code

    Hello!

    Thank you for sharing the excellent code.

    I am trying to reproduce the performance you reported, and I tried to train on the result of the affinity network [Ahn et al.] with the segmentation code from https://github.com/itijyou/ademxapp.

    But I failed to train it. Can you share the hyper-parameters or any changes you made when training? From the AffinityNet repository I found that he changed SGD to Adam in his work.

    You may not remember, but I just need a little clue.

    Thank you.

    opened by halbielee 11
  • dCRF on CAMs

    Hi, thanks for sharing your great work. I have one question about dCRF in your paper and wish for your reply. In your paper, a bunch of CAMs can be generated after training with train_SEAM.py, and I want to know how to apply the dCRF process to these CAMs (56.83% in Table 1). Do you apply the dCRF on the CAMs after combining them with the best background score from (0, 60), or simply using the foreground images?

    opened by Ferenas 9
  • Large performance gap between trained model using default setting and the provided trained model.

    With the provided trained 'resnet38_SEAM.pth', the results of the SEAM step evaluation are:

    0/60 background score: 0.000 mIoU: 28.861% 1/60 background score: 0.010 mIoU: 32.021% 2/60 background score: 0.020 mIoU: 35.937% 3/60 background score: 0.030 mIoU: 39.372% 4/60 background score: 0.040 mIoU: 42.470% 5/60 background score: 0.050 mIoU: 45.309% 6/60 background score: 0.060 mIoU: 47.967% 7/60 background score: 0.070 mIoU: 50.436% 8/60 background score: 0.080 mIoU: 52.721% 9/60 background score: 0.090 mIoU: 54.865% 10/60 background score: 0.100 mIoU: 56.885% 11/60 background score: 0.110 mIoU: 58.777% 12/60 background score: 0.120 mIoU: 60.595% 13/60 background score: 0.130 mIoU: 62.310% 14/60 background score: 0.140 mIoU: 63.905% 15/60 background score: 0.150 mIoU: 65.372% 16/60 background score: 0.160 mIoU: 66.710% 17/60 background score: 0.170 mIoU: 67.907% 18/60 background score: 0.180 mIoU: 68.925% 19/60 background score: 0.190 mIoU: 69.758% 20/60 background score: 0.200 mIoU: 70.414% 21/60 background score: 0.210 mIoU: 71.014% 22/60 background score: 0.220 mIoU: 71.291% 23/60 background score: 0.230 mIoU: 71.324% 24/60 background score: 0.240 mIoU: 71.143% 25/60 background score: 0.250 mIoU: 70.799% 26/60 background score: 0.260 mIoU: 70.287% 27/60 background score: 0.270 mIoU: 69.664% 28/60 background score: 0.280 mIoU: 68.952% 29/60 background score: 0.290 mIoU: 68.148% 30/60 background score: 0.300 mIoU: 67.274% 31/60 background score: 0.310 mIoU: 66.322% 32/60 background score: 0.320 mIoU: 65.305% 33/60 background score: 0.330 mIoU: 64.232% 34/60 background score: 0.340 mIoU: 63.105% 35/60 background score: 0.350 mIoU: 61.939% 36/60 background score: 0.360 mIoU: 60.727% 37/60 background score: 0.370 mIoU: 59.485% 38/60 background score: 0.380 mIoU: 58.215% 39/60 background score: 0.390 mIoU: 56.921% 40/60 background score: 0.400 mIoU: 55.609% 41/60 background score: 0.410 mIoU: 54.281% 42/60 background score: 0.420 mIoU: 52.940% 43/60 background score: 0.430 mIoU: 51.605% 44/60 background score: 0.440 mIoU: 50.279% 45/60 background score: 0.450 mIoU: 48.955% 46/60 background score: 0.460 mIoU: 47.630% 47/60 background score: 0.470 mIoU: 46.303% 48/60 background score: 0.480 mIoU: 44.982% 49/60 background score: 0.490 mIoU: 43.653% 50/60 background score: 0.500 mIoU: 42.330% 51/60 background score: 0.510 mIoU: 41.015% 52/60 background score: 0.520 mIoU: 39.709% 53/60 background score: 0.530 mIoU: 38.409% 54/60 background score: 0.540 mIoU: 37.119% 55/60 background score: 0.550 mIoU: 35.848% 56/60 background score: 0.560 mIoU: 34.601% 57/60 background score: 0.570 mIoU: 33.372% 58/60 background score: 0.580 mIoU: 32.158% 59/60 background score: 0.590 mIoU: 30.959%

    When using a 'resnet38_SEAM.pth' that I trained myself with the default settings (except that I used two GPU cards; the batch size was still set to 8), the results of the SEAM step evaluation are:

    0/60 background score: 0.000 mIoU: 22.938% 1/60 background score: 0.010 mIoU: 26.294% 2/60 background score: 0.020 mIoU: 30.367% 3/60 background score: 0.030 mIoU: 33.779% 4/60 background score: 0.040 mIoU: 36.815% 5/60 background score: 0.050 mIoU: 39.461% 6/60 background score: 0.060 mIoU: 41.722% 7/60 background score: 0.070 mIoU: 43.691% 8/60 background score: 0.080 mIoU: 45.386% 9/60 background score: 0.090 mIoU: 46.875% 10/60 background score: 0.100 mIoU: 48.230% 11/60 background score: 0.110 mIoU: 49.466% 12/60 background score: 0.120 mIoU: 50.592% 13/60 background score: 0.130 mIoU: 51.575% 14/60 background score: 0.140 mIoU: 52.443% 15/60 background score: 0.150 mIoU: 53.182% 16/60 background score: 0.160 mIoU: 53.806% 17/60 background score: 0.170 mIoU: 54.334% 18/60 background score: 0.180 mIoU: 54.759% 19/60 background score: 0.190 mIoU: 55.087% 20/60 background score: 0.200 mIoU: 55.339% 21/60 background score: 0.210 mIoU: 55.510% 22/60 background score: 0.220 mIoU: 55.590% 23/60 background score: 0.230 mIoU: 55.594% 24/60 background score: 0.240 mIoU: 55.525% 25/60 background score: 0.250 mIoU: 55.382% 26/60 background score: 0.260 mIoU: 55.169% 27/60 background score: 0.270 mIoU: 54.892% 28/60 background score: 0.280 mIoU: 54.556% 29/60 background score: 0.290 mIoU: 54.155% 30/60 background score: 0.300 mIoU: 53.685% 31/60 background score: 0.310 mIoU: 53.182% 32/60 background score: 0.320 mIoU: 52.640% 33/60 background score: 0.330 mIoU: 52.064% 34/60 background score: 0.340 mIoU: 51.445% 35/60 background score: 0.350 mIoU: 50.793% 36/60 background score: 0.360 mIoU: 50.107% 37/60 background score: 0.370 mIoU: 49.380% 38/60 background score: 0.380 mIoU: 48.624% 39/60 background score: 0.390 mIoU: 47.837% 40/60 background score: 0.400 mIoU: 47.029% 41/60 background score: 0.410 mIoU: 46.199% 42/60 background score: 0.420 mIoU: 45.353% 43/60 background score: 0.430 mIoU: 44.483% 44/60 background score: 0.440 mIoU: 43.593% 45/60 background score: 0.450 mIoU: 42.681% 46/60 background score: 0.460 mIoU: 41.749% 47/60 background score: 0.470 mIoU: 40.809% 48/60 background score: 0.480 mIoU: 39.855% 49/60 background score: 0.490 mIoU: 38.890% 50/60 background score: 0.500 mIoU: 37.914% 51/60 background score: 0.510 mIoU: 36.934% 52/60 background score: 0.520 mIoU: 35.954% 53/60 background score: 0.530 mIoU: 34.974% 54/60 background score: 0.540 mIoU: 33.988% 55/60 background score: 0.550 mIoU: 32.998% 56/60 background score: 0.560 mIoU: 32.011% 57/60 background score: 0.570 mIoU: 31.033% 58/60 background score: 0.580 mIoU: 30.064% 59/60 background score: 0.590 mIoU: 29.102%

    opened by bityangke 8
  • segmentation training problems

    1. It seems that you use the train set to train the segmentation model. Why not use trainaug?
    2. Following the setting in #11, my results are 61.5 when training with trainaug and 56.7 with train. Why does it differ a lot from the results in the paper? (Note that the weights are from ilsvrc-cls_rna-a1_cls1000_ep-0001.params; the test resolution is (1024*512) * [0.5, 0.75, 1.0, 1.25, 1.5, 1.75].)
    3. Why does it drop after applying CRF in the RW step?
    opened by Eli-YiLi 5
  • GPU and batch size?

    Thanks for your great work! I noticed that in your paper you mentioned: the model is trained on 4 TITAN-Xp GPUs with batch size 8 for 8 epochs. However, I trained SEAM on 4 2080Ti GPUs with batch size 8 and found that each card only took up about 4 GB of memory. So I wonder, are 4×12 GB GPUs necessary? Thanks for your reply.

    opened by whatsups 4
  • Why using cls labels to generate CAMs at inference time? Is it valid?

    At val / test time, in infer_SEAM.py (lines 79 to 82), you use GT cls labels to choose the CAMs of those categories and save these specified CAMs as .npy files. I am wondering whether using GT cls labels at inference time is valid in weakly-supervised semantic segmentation. Could you provide me with some hints? Many thanks!

    opened by Siyuan-Zhou 4
  • Question about the classification loss

    SEAM is really an excellent work. After reading the paper, I have a question:

    1. How to get the final segmentation mask? In my understanding, SEAM finally outputs a CAM, and then the random walk is used to produce the final mask? Am I right?

    2. How to calculate the classification loss? For example, the final output is [image] and we can also calculate the background as [image], but how can we use the two results to calculate the loss? How can we generate the ground truth? Is img(m, n) = c (the true label) the ground truth?

    Any suggestion is appreciated!

    opened by NeuZhangQiang 3
  • Optimization problem when training SEAM from scratch

    Hi, firstly thank you for releasing the code; I've successfully reproduced part of the results by using the provided weights.

    However, when I tried to train SEAM from scratch (not using any pretrained weights), it seems the ER loss easily goes down to 0 while the ECR loss just cannot go down, and then the model cannot improve anymore. I've tried to increase the weight of the ECR loss, but the outcome is still the same. Could you provide more details or suggestions on how you trained SEAM without pretrained weights?

    Thanks!

    opened by Angusky 3
  • cam_full_arr[k+1] = v out of bounds

    Thanks for posting your excellent code! I met some problems when using the code. In line 95, the npy files store the information of the classes in a training sample. The max k in line 98 is 21 for the VOC dataset, because the VOC dataset contains 21 categories. In line 99, the index will be out of bounds, because k+1 can be 22 but cam_full_arr's size is 21. What can I do to solve the error? And what does line 100 mean? What is filled into cam_full_arr[0]? I am confused. https://github.com/YudeWang/SEAM/blob/3212261b9d008581576b8b5429e32413d892c10b/infer_aff.py#L95-L100 Looking forward to your reply.

    opened by shifan-Z 2
  • Background threshold?

    I notice that you traverse all background threshold options and give the best mIoU of the pseudo labels. This setting assumes that the ground-truth masks are available during pseudo-label generation. However, in practice, if the gt masks are available, why don't we just use these gt labels? So I think a background threshold selection strategy that does not depend on gt masks is needed here in practice. What do you think of it? Thanks!

    opened by whatsups 2
  • OHEM

    Hello, Yude.

    Thanks for sharing this great work!

    I have one question about Table 1. You mentioned that you reported the results in Table 1 on the training set. Then, it seems an OHEM process should be involved in train_SEAM.py. Is that correct? Does your repo include the OHEM process? How can I use OHEM in your code?

    Thanks

    opened by hchoi71 2
  • Question about affinityNet Inference

    Dear YudeWang, thanks for sharing your code! Should I use the CAM results that I obtained through the existing SEAM step for the cam_dir parameter in infer_aff.py? Does anyone know the answer?

    opened by dnwjddl 4
  • Exception during running train_SEAM.py

    When I run the script train_SEAM.py with the weights ilsvrc-cls_rna-a1_cls1000_ep-0001.params, the code stops randomly without any information. How can I solve it? Does anybody have the same problem?

    opened by yangxinhaosmu 0
  • Confused about the code: "cam = np.flip(cam, axis=-1)"

    https://github.com/YudeWang/SEAM/blob/c55601649c5fa676836d3ec70ec044541b7d1d83/infer_SEAM.py#L64 Could you explain why the code needs the flip operation in line 64? Several papers have also used this operation. Thank you!

    opened by Dreamupers 0