Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead.

Introduction

The code and trained models of:

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, Jiwoon Ahn and Suha Kwak, CVPR 2018 [Paper]

We have developed a framework based on AffinityNet that generates accurate segmentation labels for training images given only their image-level class labels. A segmentation network trained with our synthesized labels outperforms previous state-of-the-art methods by large margins on PASCAL VOC 2012.

*Our code was first implemented in TensorFlow at the time of the CVPR 2018 submission, and we later migrated it to PyTorch. Some minor details (optimizer, channel size, etc.) have changed.

Citation

If you find the code useful, please consider citing our paper using the following BibTeX entry.

@InProceedings{Ahn_2018_CVPR,
author = {Ahn, Jiwoon and Kwak, Suha},
title = {Learning Pixel-Level Semantic Affinity With Image-Level Supervision for Weakly Supervised Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}

Prerequisite

  • Tested on Ubuntu 16.04, with Python 3.5, PyTorch 0.4, Torchvision 0.2.1, CUDA 9.0, and 1x NVIDIA TITAN X (Pascal).
  • The PASCAL VOC 2012 development kit: You also need to specify the path ('voc12_root') of your downloaded dev kit.
  • (Optional) If you want to try with the VGG-16 based network, PyCaffe and VGG-16 ImageNet pretrained weights [vgg16_20M.caffemodel]
  • (Optional) If you want to try with the ResNet-38 based network, Mxnet and ResNet-38 pretrained weights [ilsvrc-cls_rna-a1_cls1000_ep-0001.params]

Usage

1. Train a classification network to get CAMs.

python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network [network.vgg16_cls | network.resnet38_cls] --voc12_root [your_voc12_root_folder] --weights [your_weights_file] --wt_dec 5e-4
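
A class activation map (CAM) scores every spatial location with the classifier weights of one class. The sketch below is a minimal pure-Python illustration, assuming a network whose classifier is a bias-free 1x1 convolution applied to a feature map and then averaged over locations; the function names and the toy 2x2 feature map are hypothetical and not taken from train_cls.py. It also shows that, because the classifier is linear, averaging the per-location scores gives the same class score as classifying the globally averaged feature.

```python
def dot(weights, vector):
    """Inner product of a class weight vector and one feature vector."""
    return sum(w * v for w, v in zip(weights, vector))


def raw_cam(features, weights):
    """Per-location class score w . f(x, y) for a feature map features[y][x]."""
    return [[dot(weights, vec) for vec in row] for row in features]


def normalize_cam(raw):
    """Clip negatives to zero and scale so the maximum activation is 1."""
    clipped = [[max(0.0, v) for v in row] for row in raw]
    peak = max(max(row) for row in clipped)
    if peak == 0:
        return clipped
    return [[v / peak for v in row] for row in clipped]


def global_average_pool(features):
    """Average the feature vectors over all spatial locations."""
    vectors = [vec for row in features for vec in row]
    return [sum(channel) / len(vectors) for channel in zip(*vectors)]


# Toy 2x2 feature map with 3 channels and one class's weights (made-up values).
features = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.5, 0.5, 0.0], [2.0, 0.0, 1.0]],
]
weights = [1.0, -1.0, 0.5]

raw = raw_cam(features, weights)
cam = normalize_cam(raw)

# Linearity: pooling then classifying equals classifying then pooling.
score_from_gap = dot(weights, global_average_pool(features))
score_from_cam = sum(v for row in raw for v in row) / sum(len(row) for row in raw)
```

Here `cam` highlights the two locations with positive evidence for the class, and the two class scores agree up to floating-point error.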

2. Generate labels for AffinityNet by applying dCRF on CAMs.

python3 infer_cls.py --infer_list voc12/train_aug.txt --voc12_root [your_voc12_root_folder] --network [network.vgg16_cls | network.resnet38_cls] --weights [your_weights_file] --out_cam [desired_folder] --out_la_crf [desired_folder] --out_ha_crf [desired_folder]
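
The two CRF outputs (`--out_la_crf` and `--out_ha_crf`) presumably hold dCRF results computed with a low and a high background exponent alpha. Following the paper's description, pixels that stay object when the background is amplified are treated as confident object, pixels that stay background when the background is weakened are treated as confident background, and everything else is neutral. A minimal sketch of that merge, assuming integer label maps with 0 as background; the function name and the `IGNORE` value are illustrative assumptions, not the repository's code:

```python
IGNORE = 255  # assumed marker for neutral pixels


def merge_labels(label_low_alpha, label_high_alpha):
    """Merge two label maps (0 = background, 1.. = object classes)."""
    merged = []
    for row_la, row_ha in zip(label_low_alpha, label_high_alpha):
        out_row = []
        for la, ha in zip(row_la, row_ha):
            if ha == 0:
                out_row.append(0)       # background even with weakened background
            elif la != 0:
                out_row.append(la)      # object even with amplified background
            else:
                out_row.append(IGNORE)  # the two maps disagree: neutral
        merged.append(out_row)
    return merged


# Toy 2x3 label maps (class 3 is an arbitrary object class).
low_alpha = [[0, 0, 3],
             [0, 3, 3]]
high_alpha = [[0, 3, 3],
              [0, 3, 3]]

merged = merge_labels(low_alpha, high_alpha)
```

Only the pixel at row 0, column 1 is neutral in this example, because the two maps disagree there.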

(Optional) Check the accuracy of CAMs.

python3 infer_cls.py --infer_list voc12/val.txt --voc12_root [your_voc12_root_folder] --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred [desired_folder]
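
For reference, mIoU is usually computed from a confusion matrix accumulated over all pixels of all images, then averaged over classes. The sketch below is a minimal pure-Python version of that standard computation; the function names are hypothetical, and `ignore_index=255` assumes the PASCAL VOC convention of marking ambiguous pixels with 255.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs over flattened label lists."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # ambiguous pixels do not count
        conf[g][p] += 1
    return conf


def mean_iou(conf):
    """Average IoU = TP / (TP + FP + FN) over classes that appear at all."""
    ious = []
    for c in range(len(conf)):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp
        fp = sum(row[c] for row in conf) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example with 3 classes and one ignored pixel.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]

conf = confusion_matrix(gt, pred, num_classes=3)
miou = mean_iou(conf)
```

For a whole dataset, the confusion matrices of the individual images would be summed before calling `mean_iou`.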

3. Train AffinityNet with the labels.

python3 train_aff.py --lr 0.1 --batch_size 8 --max_epoches 8 --crop_size 448 --voc12_root [your_voc12_root_folder] --network [network.vgg16_aff | network.resnet38_aff] --weights [your_weights_file] --wt_dec 5e-4 --la_crf_dir [your_output_folder] --ha_crf_dir [your_output_folder]
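
According to the paper, AffinityNet is supervised with pairwise targets derived from the label maps: for two pixels closer than a search radius, the target is 1 if they share a class and 0 otherwise, and pairs involving a neutral pixel are skipped. A minimal sketch of that pairing on a tiny label map; `affinity_pairs` and the `IGNORE` value are illustrative assumptions, and the brute-force loop over all pixel pairs is only practical for toy inputs.

```python
from itertools import combinations

IGNORE = 255  # assumed marker for neutral pixels


def affinity_pairs(labels, radius):
    """Return ((y1, x1), (y2, x2), target) for pixel pairs closer than radius."""
    coords = [(y, x) for y in range(len(labels)) for x in range(len(labels[0]))]
    pairs = []
    for (y1, x1), (y2, x2) in combinations(coords, 2):
        if (y1 - y2) ** 2 + (x1 - x2) ** 2 >= radius ** 2:
            continue  # outside the search radius
        a, b = labels[y1][x1], labels[y2][x2]
        if a == IGNORE or b == IGNORE:
            continue  # neutral pixels give no supervision
        pairs.append(((y1, x1), (y2, x2), 1 if a == b else 0))
    return pairs


# Toy 2x3 label map: 0 = background, 1 = object, 255 = neutral.
labels = [[0, 0, 1],
          [255, 1, 1]]

pairs = affinity_pairs(labels, radius=2)
num_positive = sum(target for _, _, target in pairs)
```

With `radius=2` only horizontal, vertical, and diagonal neighbours are paired; the README's setting is gamma=5.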

4. Perform Random Walks on CAMs.

python3 infer_aff.py --infer_list [voc12/val.txt | voc12/train.txt] --voc12_root [your_voc12_root_folder] --network [network.vgg16_aff | network.resnet38_aff] --weights [your_weights_file] --cam_dir [your_output_folder] --out_rw [desired_folder]
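
The random walk in the paper propagates CAM activations along predicted affinities: the transition matrix is the row-normalized element-wise beta-th power of the affinity matrix, and the flattened CAM is multiplied by it t times. The sketch below illustrates this on four pixels with a dense, hand-made affinity matrix; the function names and values are hypothetical, and infer_aff.py may implement the step differently (for example with sparse affinities inside the search radius).

```python
def transition_matrix(affinity, beta):
    """Row-normalize the element-wise beta-th power of an affinity matrix."""
    powered = [[w ** beta for w in row] for row in affinity]
    return [[w / sum(row) for w in row] for row in powered]


def random_walk(cam, trans, t):
    """Propagate a flattened CAM for t steps: cam <- T @ cam."""
    for _ in range(t):
        cam = [sum(p * c for p, c in zip(row, cam)) for row in trans]
    return cam


# Four pixels on a line: pixels 0-1 and 2-3 have high mutual affinity,
# while affinity across the 1-2 boundary is low.
W = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
cam = [1.0, 0.0, 0.0, 0.0]  # activation only on pixel 0

T = transition_matrix(W, beta=8)
refined = random_walk(cam, T, t=4)
```

The activation spreads from pixel 0 to pixel 1 but barely crosses the low-affinity boundary, which is the intended effect of raising affinities to the power beta.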

Results and Trained Models

Class Activation Map

Model       Train (mIoU)   Val (mIoU)
VGG-16      48.9           46.6         [Weights]
ResNet-38   47.7           47.2         [Weights]
ResNet-38   48.0           46.8         CVPR submission

Random Walk with AffinityNet

Model       alpha     Train (mIoU)   Val (mIoU)
VGG-16      4/16/32   59.6           54.0         [Weights]
ResNet-38   4/16/32   61.0           60.2         [Weights]
ResNet-38   4/16/24   58.1           57.0         CVPR submission

*beta=8, gamma=5, t=256 for all settings
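
In the paper, alpha is the exponent of the background activation, M_bg(x, y) = (1 - max_c M_c(x, y))^alpha, where M_c is the CAM of class c normalized to [0, 1]. The three values in the alpha column appear to be the low, default, and high settings. A small sketch of how alpha changes the background score at a single pixel; the function name and the activation values are made up for illustration.

```python
def background_score(class_cams, alpha):
    """Background activation at one pixel: (1 - max_c M_c) ** alpha.

    class_cams holds the per-class CAM values at that pixel, each in [0, 1].
    """
    return (1.0 - max(class_cams)) ** alpha


pixel = [0.10, 0.05, 0.02]  # hypothetical weak object activations

scores = {alpha: background_score(pixel, alpha) for alpha in (4, 16, 32)}
for alpha, score in scores.items():
    print(alpha, round(score, 4))
```

A smaller alpha keeps the background score high, so weak object activations lose to background; a larger alpha suppresses the background score, so only pixels with almost no object activation remain background.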

Comments
  • the loss of classification network doesn't decrease

    When I trained the classification network (using both the pretrained VGG and ResNet weights), the loss did not decrease with the given hyperparameters. For example, the loss of the VGG network oscillated around 0.24 after 1k iterations. I also tried a learning rate of 0.01, which failed as well. Could you give me some suggestions? Thanks~

    opened by kevinlee9 9
  • CAM mIoU

    Hi Jiwoon,

    thanks for sharing this nice work! I'm trying to generate the CAMs with a model I trained myself, but the mIoU I get is quite low, 42.28 (on PASCAL VOC2012 / train). I first ran

    python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network network.resnet38_cls --voc12_root ./data --weights weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.pth --wt_dec 5e-4
    

    and then generated the CAMs with

    python3 infer_cls.py --infer_list voc12/train.txt --voc12_root ./data --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred ./cams
    

    Is there anything amiss? To get mIoU of 48% in the paper, should I use the dCRF? Here I don't.

    More details, in case it helps:

    | Class | # | IoU | Pr | Re |
    |:---:|---:|---:|---:|---:|
    | background | 10581 | 71.6 | 83.4 | 83.9 |
    | aeroplane | 586 | 37.8 | 40.8 | 93.8 |
    | bicycle | 485 | 43.6 | 50.3 | 86.1 |
    | bird | 698 | 34.1 | 39.6 | 86.1 |
    | boat | 460 | 28.1 | 35.1 | 79.0 |
    | bottle | 651 | 27.0 | 31.9 | 81.9 |
    | bus | 385 | 61.3 | 77.2 | 79.3 |
    | car | 1079 | 39.9 | 50.1 | 81.1 |
    | cat | 1000 | 43.8 | 73.3 | 57.9 |
    | chair | 1063 | 34.0 | 44.1 | 68.3 |
    | cow | 262 | 42.2 | 54.8 | 78.0 |
    | diningtable | 520 | 39.2 | 54.6 | 65.4 |
    | dog | 1176 | 41.9 | 66.4 | 63.4 |
    | horse | 444 | 40.4 | 58.4 | 65.6 |
    | motorbike | 481 | 51.9 | 62.8 | 83.0 |
    | person | 3876 | 37.2 | 50.5 | 66.7 |
    | potted-plant | 485 | 35.0 | 45.8 | 76.8 |
    | sheep | 299 | 43.8 | 50.6 | 86.4 |
    | sofa | 474 | 43.5 | 65.9 | 60.0 |
    | train | 499 | 52.6 | 65.9 | 79.6 |
    | tv/monitor | 548 | 38.9 | 45.3 | 87.4 |
    | ambiguous | 330 | 0.0 | 0.0 | 0.0 |

    mIoU: 42.28 (background included)

    Best, Nikita

    opened by arnike 8
  • model weights info

    Hi @jiwoon-ahn @hardBird123, I wanted to know whether you trained the VGG-16 model from scratch, or started from an ImageNet pre-trained model and used only specific layers of it for training on VOC12.

    opened by sinAshish 8
  • The training params for ResNet38_aff

    @jiwoon-ahn hi, thanks for your nice work! I loaded your shared weights and got the same mIoU as shown in the README.

    However, when I train the model myself, I only reach 59.077% mIoU (instead of 60.2%) on the val set with ResNet38_aff. As you mentioned in other issues, the default parameter settings are meant for vgg16. Could you share the parameter settings for resnet38_aff?

    Here are my res38_aff training settings:

      • lr=0.01
      • gpu=4 (may cause different batch number on each gpu)
      • batchsize=8
      • max_epoch=8
      • loss_weight=1/4 * bg + 1/4 * fg + 1/2 * neg
      • pretrained_model=res38_cls.pth (provided by you)
      • alpha=4/16/32
      • other default params...

    opened by YudeWang 4
  • learning rate

    In train_cls.py, when the learning rate is set to 0.1, the loss increases, so I tried setting it to 0.01 and it works. Could you verify the learning rates in train_cls.py and train_aff.py? Thanks in advance!

    opened by LeiyuanMa 4
  • Do you use validation data for training when report results on test split

    Suppose the augmented VOC dataset is split into train_aug (10582) + val (1449) + test (1456). When you report results on the test split in your paper, do you train on train_aug + val or on train_aug only? Thanks!

    opened by yaoqi-zd 2
  • The number of epochs?

    Thanks for the code!

    For how many epochs did you train your model to achieve the results in the paper? I am wondering what a good practice would be for reporting the results since there isn't really a validation set for early stopping. Thanks!

    opened by IssamLaradji 2
  • Downloading the pretrained weights for resnet-38

    Thanks for the code! How did you download the pretrained weights for ResNet-38? When I click the link in "(Optional) Mxnet and ResNet-38 pretrained weights [ilsvrc-cls_rna-a1_cls1000_ep-0001.params]", I am just taken to another GitHub repo with no instructions on how to get the pretrained weights. Thanks

    opened by IssamLaradji 2
  • Normalize transform also shifts RGB to BGR pattern and vice-versa

    Here index 0 is mapped to 2 and 2 mapped to 0 - https://github.com/jiwoon-ahn/psa/blob/master/network/vgg16d.py#L15

    Image is initially loaded in RGB Mode - https://github.com/jiwoon-ahn/psa/blob/9493e59bef16687ecb4821387e38e6460857f508/voc12/data.py#L69

    Finally, the transform is simply applied to the RGB image, and nowhere is there a need for channel swapping. https://github.com/jiwoon-ahn/psa/blob/master/train_cls.py#L82

    P.S. It could be a bug due to some legacy code using OpenCV to open image, which defaults to BGR mode.

    opened by saurabheights 1
  • Can you share your training command for ResNet-38 segmentation network?

    Hi, thanks for your brilliant work! When I tried to reproduce the ResNet38 segmentation network, I found the repo you mentioned (https://github.com/jiwoon-ahn/psa/issues/7#issuecomment-430170981_) didn't give the training command. So I'm wondering if you can share your training command for ResNet-38 segmentation network?

    opened by rulixiang 1
  • Regarding the code

    Thanks to the author for the code. I have two small questions:

    1. Regarding this small block of code used to obtain the CAM:

           def forward_cam(self, x):
               x = super().forward(x)
               x = self.fc8(x)
               x = F.relu(x)
               x = torch.sqrt(x)
               return x

       What is x = torch.sqrt(x) for?
    2. Regarding the pretrained Caffe weights, i.e. vgg16_20M.prototxt, on what dataset were they trained? Did you mean to use the DeepLab-LargeFOV version of VGG-16 as the network to compute CAMs? If so, why set fc6_dilation=1 rather than 12 as in the DeepLab v1 paper?
    opened by laukun 1
  • VGG - computing cam and use of sqrt

    At line https://github.com/jiwoon-ahn/psa/blob/master/network/vgg16_cls.py#L36, sqrt is applied. A CAM is a weighted combination of feature maps, so I am not sure why sqrt is used.

    Could you please provide some clarification?

    opened by saurabheights-ecr 1
  • The parameters in optimizer

    Hello, I noticed that the order of parameters (params, lr, wd) in PolyOptimizer differs from that of the official SGD (params, lr, momentum). So I think the value of wd will actually be assigned to momentum. Is that so?

    class PolyOptimizer(torch.optim.SGD):
    
        def __init__(self, params, lr, weight_decay, max_step, momentum=0.9):
            super().__init__(params, lr, weight_decay)
    
    opened by lzyhha 0
  • Last fully connect layer

    Hi, I noticed that the last fc layer in resnet38_cls.py is defined as self.fc8 = nn.Conv2d(4096, 20, 1, bias=False), while another implementation I found defines the last layer as a linear layer, self.fc = nn.Linear(2048, num_classes). Why do you use a convolution instead, and is there any difference?

    opened by weixuansun 0
  • how to calculate the mIOU?

    I ran the project and got the results in out_cam/out_la_crf/out_ha_crf/out_rw. Where can I download the GT? How should I calculate the mIoU? Could you provide the code to calculate it?

    opened by jieruyao49 3
  • Training schedule for segmentation net

    Hi Jiwoon,

    when training the segmentation net on the final masks, what schedule did you use (e.g. learning rate/decay, num of iterations/epochs, frozen/learned BN)? My guess is that you might need early stopping to avoid overfitting to the errors in the masks used for pseudo-supervision.

    Thanks, Nikita

    opened by arnike 1
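
On the PolyOptimizer question above: torch.optim.SGD takes its arguments in the order (params, lr, momentum, dampening, weight_decay, nesterov), so a third positional argument binds to momentum rather than weight_decay. The sketch below reproduces that binding with a stand-in class so that it runs without PyTorch; `SGDLike`, `PositionalCall`, and `KeywordCall` are hypothetical names, and passing the value by keyword is shown only as one possible fix, not as the author's.

```python
class SGDLike:
    """Stand-in with the same leading parameter order as torch.optim.SGD."""

    def __init__(self, params, lr, momentum=0, dampening=0,
                 weight_decay=0, nesterov=False):
        self.defaults = dict(lr=lr, momentum=momentum, dampening=dampening,
                             weight_decay=weight_decay, nesterov=nesterov)


class PositionalCall(SGDLike):
    def __init__(self, params, lr, weight_decay):
        # Third positional slot of the parent is momentum, not weight_decay.
        super().__init__(params, lr, weight_decay)


class KeywordCall(SGDLike):
    def __init__(self, params, lr, weight_decay):
        # Passing by keyword makes the binding explicit.
        super().__init__(params, lr, weight_decay=weight_decay)


positional = PositionalCall([], 0.1, 5e-4).defaults
keyword = KeywordCall([], 0.1, 5e-4).defaults

print(positional["momentum"], positional["weight_decay"])
print(keyword["momentum"], keyword["weight_decay"])
```

With the positional call, the value intended as weight decay ends up as momentum and weight decay stays at its default of 0.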
Owner
Jiwoon Ahn
Deep Learning Researcher