Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Jiwoon Ahn

Last update: Dec 15, 2022


Learning Pixel-level Semantic Affinity with Image-level Supervision

This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead.

Introduction

The code and trained models of:

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, Jiwoon Ahn and Suha Kwak, CVPR 2018 [Paper]

We have developed a framework based on AffinityNet that generates accurate segmentation labels for training images given only their image-level class labels. A segmentation network trained on our synthesized labels outperforms previous state-of-the-art methods by a large margin on the PASCAL VOC 2012 benchmark.

*Our code was first implemented in TensorFlow at the time of the CVPR 2018 submission, and we later migrated it to PyTorch. Some minor details (optimizer, channel size, etc.) have changed.

Citation

If you find the code useful, please consider citing our paper using the following BibTeX entry.

@InProceedings{Ahn_2018_CVPR,
author = {Ahn, Jiwoon and Kwak, Suha},
title = {Learning Pixel-Level Semantic Affinity With Image-Level Supervision for Weakly Supervised Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

Prerequisite

Tested on Ubuntu 16.04, with Python 3.5, PyTorch 0.4, Torchvision 0.2.1, CUDA 9.0, and 1x NVIDIA TITAN X (Pascal).
The PASCAL VOC 2012 development kit: You also need to specify the path ('voc12_root') of your downloaded dev kit.
(Optional) If you want to try with the VGG-16 based network, PyCaffe and VGG-16 ImageNet pretrained weights [vgg16_20M.caffemodel]
(Optional) If you want to try with the ResNet-38 based network, Mxnet and ResNet-38 pretrained weights [ilsvrc-cls_rna-a1_cls1000_ep-0001.params]

Usage

1. Train a classification network to get CAMs.

python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network [network.vgg16_cls | network.resnet38_cls] --voc12_root [your_voc12_root_folder] --weights [your_weights_file] --wt_dec 5e-4

2. Generate labels for AffinityNet by applying dCRF on CAMs.

python3 infer_cls.py --infer_list voc12/train_aug.txt --voc12_root [your_voc12_root_folder] --network [network.vgg16_cls | network.resnet38_cls] --weights [your_weights_file] --out_cam [desired_folder] --out_la_crf [desired_folder] --out_ha_crf [desired_folder]

(Optional) Check the accuracy of CAMs.

python3 infer_cls.py --infer_list voc12/val.txt --voc12_root [your_voc12_root_folder] --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred [desired_folder]

3. Train AffinityNet with the labels

python3 train_aff.py --lr 0.1 --batch_size 8 --max_epoches 8 --crop_size 448 --voc12_root [your_voc12_root_folder] --network [network.vgg16_aff | network.resnet38_aff] --weights [your_weights_file] --wt_dec 5e-4 --la_crf_dir [your_output_folder] --ha_crf_dir [your_output_folder]

4. Perform Random Walks on CAMs

python3 infer_aff.py --infer_list [voc12/val.txt | voc12/train.txt] --voc12_root [your_voc12_root_folder] --network [network.vgg16_aff | network.resnet38_aff] --weights [your_weights_file] --cam_dir [your_output_folder] --out_rw [desired_folder]

Results and Trained Models

Class Activation Map

Model	Train (mIoU)	Val (mIoU)	Weights / Note
VGG-16	48.9	46.6	[Weights]
ResNet-38	47.7	47.2	[Weights]
ResNet-38	48.0	46.8	CVPR submission
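
The CAM mIoU figures above are measured on hard label maps, so each set of class activation maps first has to be reduced to one label per pixel. The following is a minimal sketch of one common way to do that and is not necessarily what infer_cls.py does. The function name `cam_to_label_map` and the constant background score `bg_score=0.2` are assumptions made for illustration.

```python
import numpy as np


def cam_to_label_map(cams, present_classes, bg_score=0.2):
    """Turn per-class CAMs into a label map (hypothetical helper).

    cams: array of shape (num_classes, H, W) with non-negative activations.
    present_classes: indices of the image-level labels of this image.
    bg_score: assumed constant score given to the background at every pixel.
    Returns an (H, W) array where 0 is background and c + 1 is class c.
    """
    num_classes, h, w = cams.shape
    scores = np.zeros((num_classes + 1, h, w), dtype=np.float32)
    scores[0] = bg_score
    for c in present_classes:
        peak = cams[c].max()
        if peak > 0:
            # Scale each CAM to [0, 1] so classes are comparable.
            scores[c + 1] = cams[c] / peak
    # Classes absent from the image-level labels keep a score of 0
    # and can never win against the background.
    return scores.argmax(axis=0)
```

Only classes listed in the image-level labels compete with the background, which is the one piece of supervision available in this setting.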

Random Walk with AffinityNet

Model	alpha	Train (mIoU)	Val (mIoU)	Weights / Note
VGG-16	4/16/32	59.6	54.0	[Weights]
ResNet-38	4/16/32	61.0	60.2	[Weights]
ResNet-38	4/16/24	58.1	57.0	CVPR submission

*beta=8, gamma=5, t=256 for all settings
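
To make the roles of beta and t concrete, here is a minimal NumPy sketch of random-walk propagation, assuming the formulation in the paper: the transition matrix is the normalised element-wise beta-th power of the affinity matrix, and it is applied t times to the flattened CAMs. The function name `random_walk` and the `log2_t` parameter are illustrative assumptions, and t=256 is reached here by squaring the transition matrix 8 times. This is a sketch of the idea, not the code in infer_aff.py.

```python
import numpy as np


def random_walk(cam, affinity, beta=8, log2_t=8):
    """Propagate CAM scores along pixel affinities (illustrative sketch).

    cam: (C, N) array, one flattened activation map per class.
    affinity: (N, N) symmetric array with values in [0, 1] and ones on
        the diagonal, so that every column has a non-zero sum.
    beta: exponent of the element-wise (Hadamard) power.
    log2_t: number of squarings; the walk runs for t = 2 ** log2_t steps.
    """
    w = affinity ** beta                      # sharpen the affinities
    trans = w / w.sum(axis=0, keepdims=True)  # each column sums to 1
    for _ in range(log2_t):
        trans = trans @ trans                 # T -> T^2 -> T^4 -> ...
    # Each output pixel is an affinity-weighted average of input pixels.
    return cam @ trans
```

A larger beta suppresses weak affinities before normalisation, and a larger t lets activations spread further within regions of high affinity.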

Comments

the loss of classification network doesn't decrease

When I trained the classification network (using both the pretrained VGG and ResNet weights), the loss did not decrease with the given hyperparameters. For example, the loss of the VGG network oscillated around 0.24 after 1k iterations. I also tried a learning rate of 0.01, which failed as well. Could you give me some suggestions? Thanks~

opened by kevinlee9 9
CAM mIoU
Hi Jiwoon,

thanks for sharing this nice work! I'm trying to generate the CAMs with a model I trained myself, but the mIoU I get is quite low, 42.28 (on PASCAL VOC2012 / train). I first ran

python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network network.resnet38_cls --voc12_root ./data --weights weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.pth --wt_dec 5e-4

and then generated the CAMs with

python3 infer_cls.py --infer_list voc12/train.txt --voc12_root ./data --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred ./cams

Is there anything amiss? To get mIoU of 48% in the paper, should I use the dCRF? Here I don't.

More details, in case it helps:

| Class | # | IoU | Pr | Re |
|:---:|---:|---:|---:|---:|
| background | 10581 | 71.6 | 83.4 | 83.9 |
| aeroplane | 586 | 37.8 | 40.8 | 93.8 |
| bicycle | 485 | 43.6 | 50.3 | 86.1 |
| bird | 698 | 34.1 | 39.6 | 86.1 |
| boat | 460 | 28.1 | 35.1 | 79.0 |
| bottle | 651 | 27.0 | 31.9 | 81.9 |
| bus | 385 | 61.3 | 77.2 | 79.3 |
| car | 1079 | 39.9 | 50.1 | 81.1 |
| cat | 1000 | 43.8 | 73.3 | 57.9 |
| chair | 1063 | 34.0 | 44.1 | 68.3 |
| cow | 262 | 42.2 | 54.8 | 78.0 |
| diningtable | 520 | 39.2 | 54.6 | 65.4 |
| dog | 1176 | 41.9 | 66.4 | 63.4 |
| horse | 444 | 40.4 | 58.4 | 65.6 |
| motorbike | 481 | 51.9 | 62.8 | 83.0 |
| person | 3876 | 37.2 | 50.5 | 66.7 |
| potted-plant | 485 | 35.0 | 45.8 | 76.8 |
| sheep | 299 | 43.8 | 50.6 | 86.4 |
| sofa | 474 | 43.5 | 65.9 | 60.0 |
| train | 499 | 52.6 | 65.9 | 79.6 |
| tv/monitor | 548 | 38.9 | 45.3 | 87.4 |
| ambiguous | 330 | 0.0 | 0.0 | 0.0 |

mIoU: 42.28 (background included)

Best, Nikita
opened by arnike 8
model weights info

Hi @jiwoon-ahn @hardBird123, I wanted to know whether you trained the VGG-16 model from scratch, or started from an ImageNet pre-trained model and trained only specific layers of it on VOC12.

opened by sinAshish 8
The training params for ResNet38_aff

@jiwoon-ahn Hi, thanks for your nice work! I loaded your shared weights and achieved the same mIoU as shown in the README.

However, I have trouble training the model myself: I only achieve 59.077% mIoU (instead of 60.2%) on the val set with ResNet38_aff. As you mentioned in other issues, the default parameter settings are only for VGG-16. Could you share the parameter settings for resnet38_aff?

Here are my res38_aff training settings:
lr=0.01
gpu=4 (may cause a different batch number on each GPU)
batchsize=8
max_epoch=8
loss_weight=1/4 * bg + 1/4 * fg + 1/2 * neg
pretrained_model=res38_cls.pth (provided by you)
alpha=4/16/32
other default params...

opened by YudeWang 4
learning rate

In train_cls.py, when the learning rate is set to 0.1 the loss increases, so I tried setting it to 0.01 and it works. Could you verify the learning rates in train_cls.py and train_aff.py? Thanks in advance!

opened by LeiyuanMa 4
Do you use validation data for training when report results on test split

Suppose the augmented VOC dataset is split into train_aug (10582) + val (1449) + test (1456). When you report results on the test split in your paper, do you use train_aug + val for training, or just train_aug? Thanks!

opened by yaoqi-zd 2
The number of epochs?

Thanks for the code!

For how many epochs did you train your model to achieve the results in the paper? I am wondering what a good practice would be for reporting the results since there isn't really a validation set for early stopping. Thanks!

opened by IssamLaradji 2
Downloading the pretrained weights for resnet-38

Thanks for the code! How did you download the pretrained weights for ResNet-38? When I click on the link here, (Optional) Mxnet and ResNet-38 pretrained weights [ilsvrc-cls_rna-a1_cls1000_ep-0001.params], I am just taken to another GitHub repo, without instructions on how to get the pretrained weights. Thanks

opened by IssamLaradji 2
Normalize transform also shifts RGB to BGR pattern and vice-versa

Here index 0 is mapped to 2 and 2 mapped to 0 - https://github.com/jiwoon-ahn/psa/blob/master/network/vgg16d.py#L15

Image is initially loaded in RGB Mode - https://github.com/jiwoon-ahn/psa/blob/9493e59bef16687ecb4821387e38e6460857f508/voc12/data.py#L69

Finally, the transform is simply applied to the RGB image; nowhere is there a need for channel swapping. https://github.com/jiwoon-ahn/psa/blob/master/train_cls.py#L82

P.S. It could be a bug due to some legacy code using OpenCV to open image, which defaults to BGR mode.

opened by saurabheights 1
Can you share your training command for ResNet-38 segmentation network?

Hi, thanks for your brilliant work! When I tried to reproduce the ResNet38 segmentation network, I found the repo you mentioned (https://github.com/jiwoon-ahn/psa/issues/7#issuecomment-430170981_) didn't give the training command. So I'm wondering if you can share your training command for ResNet-38 segmentation network?

opened by rulixiang 1
Regarding the code
Thanks to the author for the code. I have two small questions:

Regarding this small block of code, which obtains the CAM:

    def forward_cam(self, x):
        x = super().forward(x)
        x = self.fc8(x)
        x = F.relu(x)
        x = torch.sqrt(x)
        return x

what is x = torch.sqrt(x) for?

Regarding the pretrained Caffe weights, i.e. vgg16_20M.prototxt, on what dataset were they trained? Did you mean to use the DeepLab-LargeFOV version of VGG-16 as the network to compute CAMs? If so, why set fc6_dilation=1 rather than 12 as in the DeepLab v1 paper?
opened by laukun 1
VGG - computing cam and use of sqrt

At https://github.com/jiwoon-ahn/psa/blob/master/network/vgg16_cls.py#L36: a CAM is a weighted combination of feature maps, so I am not sure why sqrt is used.

Could you please provide some clarification?

opened by saurabheights-ecr 1
The parameters in optimizer
Hello, I noticed that the order of the parameters (params, lr, wd) passed in PolyOptimizer differs from that of the official SGD (params, lr, momentum). So I think the value of wd will actually be assigned to momentum. Is that so?

    class PolyOptimizer(torch.optim.SGD):
        def __init__(self, params, lr, weight_decay, max_step, momentum=0.9):
            super().__init__(params, lr, weight_decay)
opened by lzyhha 0
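
The binding question raised in this issue can be checked without PyTorch. The sketch below uses a stand-in function, `sgd_like`, which is hypothetical and only mirrors the leading parameter order documented for torch.optim.SGD (params, lr, momentum, dampening, weight_decay, nesterov). It shows which name a third positional argument binds to. It says nothing about weight decay values that a training script may set per parameter group.

```python
import inspect


def sgd_like(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False):
    """Hypothetical stand-in with the leading parameter order of torch.optim.SGD."""


def bind(*args, **kwargs):
    """Return the name -> value mapping Python would use for this call."""
    bound = inspect.signature(sgd_like).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


# Third positional argument, as in super().__init__(params, lr, weight_decay).
positional = bind([], 0.1, 5e-4)

# Keyword arguments make the intent explicit.
keyword = bind([], 0.1, momentum=0.9, weight_decay=5e-4)
```

Here `positional["momentum"]` is 5e-4 and `positional["weight_decay"]` stays at its default, whereas the keyword form assigns both values to the names they were meant for.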
Last fully connect layer

Hi, I noticed that the last fc layer in resnet38_cls.py is defined as self.fc8 = nn.Conv2d(4096, 20, 1, bias=False), while another implementation I found defines the last layer as a linear layer, self.fc = nn.Linear(2048, num_classes). Why do you use a convolution instead, and is there any difference?

opened by weixuansun 0
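
One general relationship bears on this question: a bias-free 1x1 convolution followed by global average pooling gives the same logits as global average pooling followed by a linear layer with the same weights, because both operations are linear. The convolutional form additionally keeps a per-class spatial map before pooling. The NumPy sketch below checks this on random data. The shapes and variable names are arbitrary assumptions, and it does not address the different channel counts (4096 vs 2048), which depend on the backbone.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((6, 4, 5))  # (channels, H, W) feature map
weight = rng.standard_normal((3, 6))       # (classes, channels), no bias

# Path 1: 1x1 convolution, then global average pooling.
# The intermediate result is one spatial score map per class.
class_maps = np.einsum("kc,chw->khw", weight, features)
logits_conv = class_maps.mean(axis=(1, 2))

# Path 2: global average pooling, then a linear layer.
logits_linear = weight @ features.mean(axis=(1, 2))

same = np.allclose(logits_conv, logits_linear)
```

Under these assumptions `same` is True, so the classification logits agree while only the first path exposes `class_maps`.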
how to calculate the mIOU?

I ran the project and got the results out_cam/out_la_crf/out_ha_crf/out_rw. Where should I download the ground truth? How should I calculate the mIoU? Can you provide the code to calculate it?

opened by jieruyao49 3
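
For reference, mIoU is usually computed from a confusion matrix accumulated over all images. The following is a minimal sketch of that calculation, not code from this repository. The function name `mean_iou` is hypothetical, and it assumes predictions and ground-truth maps are integer arrays of equal shape, with 21 classes (background plus 20 objects) and 255 marking pixels to ignore, as in the PASCAL VOC segmentation annotations.

```python
import numpy as np


def mean_iou(preds, gts, num_classes=21, ignore_label=255):
    """Mean intersection-over-union over a set of label maps (sketch).

    preds, gts: iterables of integer arrays with matching shapes.
    Pixels whose ground truth equals ignore_label are skipped.
    Classes that occur in neither predictions nor ground truth are
    left out of the average.
    """
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        valid = gt != ignore_label
        # Encode each (gt, pred) pair as a single index and count them.
        idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
        counts = np.bincount(idx, minlength=num_classes ** 2)
        conf += counts.reshape(num_classes, num_classes)
    true_pos = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - true_pos
    present = union > 0
    return float((true_pos[present] / union[present]).mean())
```

Accumulating one confusion matrix over the whole split, rather than averaging per-image IoU, matches the usual dataset-level definition of the metric.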
Training schedule for segmentation net

Hi Jiwoon,

when training the segmentation net on the final masks, what schedule did you use (e.g. learning rate/decay, num of iterations/epochs, frozen/learned BN)? My guess is that you might need early stopping to avoid overfitting to the errors in the masks used for pseudo-supervision.

Thanks, Nikita

opened by arnike 1

Owner

Jiwoon Ahn

Deep Learning Researcher

GitHub

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

The Official PyTorch Implementation of DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

3 Oct 15, 2021

The PyTorch implementation of DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision.

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision The PyTorch implementation of DiscoBox: Weakly Supe

1 Oct 23, 2021

Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf

96 Nov 1, 2022

The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation"

SD-AANet The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation" [arxiv] Overview confi

9 Nov 7, 2022

Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation (CVPR 2022)

CCAM (Unsupervised) Code repository for our paper "CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localizati

113 Dec 27, 2022

[CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

TorchSemiSeg [CVPR 2021] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision by Xiaokang Chen1, Yuhui Yuan2, Gang Zeng1, Jingdong Wang

387 Jan 8, 2023

Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation (CVPR 2021)

Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation Input Image Initial CAM Successive Maps with adversar

110 Dec 7, 2022

Code for the paper One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation, CVPR 2021.

One Thing One Click One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation (CVPR2021) Code for the paper One Thi

44 Dec 12, 2022

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation This paper has been accepted and early accessed

39 Sep 20, 2022

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022

A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains (IJCV submission)

wsss-analysis The code of: A Comprehensive Analysis of Weakly-Supervised Semantic Segmentation in Different Image Domains, arXiv pre-print 2019 paper.

48 Dec 18, 2022

Pytorch Implementation for NeurIPS (oral) paper: Pixel Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

Pixel-Level Cycle Association This is the Pytorch implementation of our NeurIPS 2020 Oral paper Pixel-Level Cycle Association: A New Perspective for D

87 Oct 19, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022


Learning trajectory representations using self-supervision and programmatic supervision.

Mixup for Supervision, Semi- and Self-Supervision Learning Toolbox and Benchmark

Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation

Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation

Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation (NeurIPS 2021)

Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021)