Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)

Overview


The code of:

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, Jiwoon Ahn, Sunghyun Cho, and Suha Kwak, CVPR 2019 [Paper]

This repository contains a framework for learning instance segmentation with image-level class labels as supervision. The key component of our approach is the Inter-pixel Relation Network (IRNet), which estimates two types of information: a displacement vector field and a class boundary map. Both are in turn used to generate pseudo instance masks from CAMs.

Citation

If you find the code useful, please consider citing our paper using the following BibTeX entry.

@InProceedings{Ahn_2019_CVPR,
author = {Ahn, Jiwoon and Cho, Sunghyun and Kwak, Suha},
title = {Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

Prerequisite

  • Python 3.7, PyTorch 1.1.0, and more in requirements.txt
  • PASCAL VOC 2012 devkit
  • NVIDIA GPU with more than 1024MB of memory

Usage

Install python dependencies

pip install -r requirements.txt

Download PASCAL VOC 2012 devkit

Run run_sample.py or make your own script

python run_sample.py
  • You can either manually edit the file or specify command-line arguments.

Train Mask R-CNN or DeepLab with the generated pseudo labels

TO DO

  • Training code for MS-COCO
  • Code refactoring
  • IRNet v2
Comments
  • Log files for training

    Hi, can you share the log files for your training? I am unable to reproduce the performance of IRN reported in the paper using the default hyper-parameters (also mentioned here [Link]).

    For instance segmentation, instead of 37.7 mAP at 0.5 IoU, I am getting the following:

    step.eval_ins_seg: Wed Aug 14 09:55:44 2019
    0.5iou: {'ap': array([0.0402722 , 0.        , 0.04831983, 0.02532846, 0.01264213,
           0.21497569, 0.13079764, 0.06767052, 0.00229753, 0.08129419,
           0.01570647, 0.05994737, 0.03092302, 0.26370536, 0.02019956,
           0.02099569, 0.0646912 , 0.16558015, 0.23535844, 0.1566734 ]), 'map': 0.08286894241843508}
    

    and for semantic segmentation, instead of 66.5 mIOU, I am getting:

    step.eval_sem_seg: Wed Aug 14 10:15:06 2019
    0.12114407058527121 0.08625727491374735
    0.2459830480445712 0.30624211370783205
    {'iou': array([0.79259865, 0.43975817, 0.27018399, 0.42519734, 0.34189571,
           0.43639392, 0.57453956, 0.48851971, 0.41510347, 0.26892431,
           0.54274295, 0.37697739, 0.40495999, 0.47331797, 0.5605337 ,
           0.51401678, 0.39511615, 0.63538235, 0.40350322, 0.50775112,
           0.48067896]), 'miou': 0.4641950199739483}
    

    Thanks.

    opened by adityaarun1 11
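As a sanity check on the logs quoted in this issue, the summary figures appear to be plain unweighted means of the per-class arrays. The sketch below recomputes them from the printed values; the variable names are illustrative and not taken from the repository, and reading the 21st IoU entry as a background class is an assumption.

```python
from statistics import fmean

# Per-class AP at 0.5 IoU, copied from the step.eval_ins_seg log (20 entries).
ap_per_class = [
    0.0402722, 0.0, 0.04831983, 0.02532846, 0.01264213,
    0.21497569, 0.13079764, 0.06767052, 0.00229753, 0.08129419,
    0.01570647, 0.05994737, 0.03092302, 0.26370536, 0.02019956,
    0.02099569, 0.0646912, 0.16558015, 0.23535844, 0.1566734,
]

# Per-class IoU, copied from the step.eval_sem_seg log (21 entries;
# assumed here to be background plus the 20 VOC classes).
iou_per_class = [
    0.79259865, 0.43975817, 0.27018399, 0.42519734, 0.34189571,
    0.43639392, 0.57453956, 0.48851971, 0.41510347, 0.26892431,
    0.54274295, 0.37697739, 0.40495999, 0.47331797, 0.5605337,
    0.51401678, 0.39511615, 0.63538235, 0.40350322, 0.50775112,
    0.48067896,
]

# Unweighted means over classes.
map_50 = fmean(ap_per_class)
miou = fmean(iou_per_class)

print(f"mAP at 0.5 IoU: {map_50:.4f}")
print(f"mIoU:           {miou:.4f}")
```

Both means agree with the logged 'map' and 'miou' values to the printed precision, so the gap to the paper's numbers is in the per-class scores themselves rather than in how they are aggregated.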
  • About train_aug.txt

    Congratulations! This is really good work!

    While running your code, I found that the train_aug.txt file is used to train CAM. Where does this file come from? And why not use the VOC2012 trainval set directly?

    Thanks a lot!

    opened by zhaohui-yang 10
  • Performance Gap and Hyper-parameter Settings

    Hi Jiwoon Ahn, your paper is very good and I'm really interested in it. I've already tried your code, but I cannot achieve the same performance as the paper. Would you please help me figure out where the problem is?

    In my experiments, the learning rates of both CAM and IRN are set to 0.1, while the other hyper-parameters follow the default settings in run_sample.py. My results are as follows:

    | model | task                  | my exp. | reported |
    | ----- | --------------------- | ------- | -------- |
    | CAM   | semantic segmentation | 48.1    | 48.3     |
    | IRN   | semantic segmentation | 64.9    | 66.5     |
    | IRN   | instance segmentation | 32.4    | 37.7     |

    The CAM models have similar performance, but there are performance gaps between the IRN models in both tasks.

    There may be two possible reasons for the gap.

    1. I notice the hyper-parameter settings in the paper and the code are not exactly the same. The exp_times is set to 8 in the code, while in the paper it is set to 256 (which also does not work in my case).
    2. Another possible problem is that multiscale testing is only used in CAM, but not in IRN.

    Would you please point out the differences between my experiments and yours that may result in the gap? Thank you!

    opened by XiaoyanLi1 8
  • How do you get the result image?

    Thanks for releasing your implementation!

    I want to know how to save a visualization image like https://github.com/jiwoon-ahn/irn/blob/master/outline.jpg

    Thanks.

    opened by UdonDa 6
  • How to process test data?

    Hi, for train/val data, the CAMs are first filtered by the GT classification labels, and the final segmentation is then obtained by an argmax over the remaining CAMs after normalization. But how should test data be handled? Should I generate test classification labels to apply a similar filter, or multiply the class probability with the corresponding CAM?

    opened by mt-cly 4
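The train/val procedure described in this issue (filter CAMs by the image-level labels, normalize, then take a per-pixel argmax) can be sketched as follows. This is a minimal pure-Python illustration and not the repository's code: the function name `cams_to_segmentation`, the per-map max normalization, and the absence of a background score are all assumptions. For test images, `present_classes` would have to come from somewhere other than ground truth, which is exactly the open question above.

```python
def cams_to_segmentation(cams, present_classes):
    """Turn per-class activation maps into a per-pixel class map.

    cams: dict mapping class id -> 2D list of non-negative activations.
    present_classes: class ids allowed in this image (hypothetical input;
    ground-truth labels for train/val, predicted labels otherwise).
    """
    # Step 1: keep only the CAMs of classes said to be present.
    kept = {c: cams[c] for c in present_classes}

    # Step 2: normalize each remaining CAM by its own maximum
    # (an assumed normalization; other schemes are possible).
    normed = {}
    for c, cam in kept.items():
        peak = max(max(row) for row in cam)
        scale = peak if peak > 0 else 1.0
        normed[c] = [[v / scale for v in row] for row in cam]

    # Step 3: per-pixel argmax over the remaining classes.
    first = next(iter(normed.values()))
    height, width = len(first), len(first[0])
    seg = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(normed, key=lambda c: normed[c][y][x])
            row.append(best)
        seg.append(row)
    return seg


# Toy 2x2 example: class 2 has the strongest raw activations but is
# filtered out because it is not among the present classes.
toy_cams = {
    0: [[0.2, 0.1], [0.0, 0.4]],
    1: [[1.0, 3.0], [2.0, 0.5]],
    2: [[9.0, 9.0], [9.0, 9.0]],
}
seg = cams_to_segmentation(toy_cams, present_classes=[0, 1])
print(seg)
```

The toy example shows why the label filter matters: without it, a class that is absent from the image but produces strong activations would win the argmax everywhere.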
  • Tuning GN using inference data?

    Dear Jiwoon, in the file 'train_irn.py', I noticed that in the latest commit GN is tuned using the inference data (location). Is this right in the weakly supervised instance segmentation setting? I think the validation set should be used only for evaluation, not for training or tuning parameters. I'm also curious what this affects: does it improve the mAP? Thanks

    opened by zhaohui-yang 3
  • Performance is poor after re-train a Mask RCNN

    Hi, I took the instance-level pseudo labels generated by running `make_ins_seg_labels.py` and kept the instance masks whose score is higher than 0. Then, I converted these labels from *.npy to COCO-style JSON annotations and trained the standard Mask R-CNN with ResNet-50-FPN. However, the performance I got is poor.

    Specifically, the box mAP (AP50) is 45.8 and the segmentation mAP (AP50) is 22.6. I noticed that the number of instances in the pseudo labels is about 2/3 of the number of GT instances for the `train_aug` set. Did I miss something needed to reproduce the performance of Mask R-CNN with pseudo labels?

    Thanks a lot!

    opened by bityangke 3
  • get AssertionError when eval_ins_seg.py

    Traceback (most recent call last):
      File "run_sample.py", line 119, in <module>
        step.eval_ins_seg.run(args)
      File "/home/maskrcnn-benchmark/irn/step/eval_ins_seg.py", line 10, in run
        gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
      File "/home/irn/step/eval_ins_seg.py", line 10, in <listcomp>
        gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
      File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/chainer_experimental/datasets/sliceable/getter_dataset.py", line 89, in get_example_by_keys
        cache[getter_index] = self._getters[getter_index](index)
      File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_instance_segmentation_dataset.py", line 66, in _get_annotations
        label_img, inst_img)
      File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_utils.py", line 55, in image_wise_to_instance_wise
        assert lbl != -1
    AssertionError

    opened by whitesockcat 3
  • using own dataset

    I am trying to adjust the code to my own dataset. However, I am really struggling since I am not a pro at python.

    How can I generate cls_labels.npy for a different dataset? The script make_cls_labels.py does not work. Plus, it makes use of .xml files. Is there an easier way to generate a dictionary with image level labels?

    cls_labels_dict = np.load('voc12/cls_labels.npy', allow_pickle=True).item()
    print(cls_labels_dict)
    # 2011003271: array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)}

    Also, my images don't share the same naming conventions as VOC12, so this part of the code creates a ton of problems:

    def decode_int_filename(int_filename):
        s = str(int(int_filename))

    opened by SuzannaLin 2
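For a custom dataset without .xml annotations, a dictionary of image-level labels can be built directly from any mapping of image to class names. The sketch below is an illustration under stated assumptions, not part of the repository: `build_cls_labels`, `save_as_npy`, the class list, and the image ids are all hypothetical. It produces one multi-hot float vector per image, matching the shape of the entry printed in the issue above.

```python
def build_cls_labels(image_to_classes, class_names):
    """Map each image id to a multi-hot label vector ordered like class_names."""
    class_index = {name: i for i, name in enumerate(class_names)}
    labels = {}
    for image_id, names in image_to_classes.items():
        vec = [0.0] * len(class_names)
        for name in names:
            vec[class_index[name]] = 1.0  # mark the class as present
        labels[image_id] = vec
    return labels


def save_as_npy(labels, path):
    """Save as a pickled dict, readable with np.load(path, allow_pickle=True).item()."""
    import numpy as np  # imported lazily so the pure-Python part runs without NumPy
    as_arrays = {k: np.asarray(v, dtype=np.float32) for k, v in labels.items()}
    np.save(path, as_arrays)


# Hypothetical two-class dataset with made-up image ids.
labels = build_cls_labels(
    {"img_001": ["cat"], "img_002": ["cat", "dog"]},
    class_names=["cat", "dog"],
)
print(labels)
```

Note that the `decode_int_filename` snippet quoted above suggests the repository keys this dictionary by integers derived from VOC file names, so string keys like the ones in this sketch would presumably also require changes to the data-loading code.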
  • Comparison with Affinity

    Hi, in the train_irn step, I removed the displacement loss and kept only the boundary loss. I notice that the boundary loss is similar to AffinityNet, which you published at CVPR 2018, even though some details differ. But the semantic mIoU is only 37+%, which is even worse than the CAM result (50%), compared with the AffinityNet result (59%). So I am confused about the reason for such a gap given the same idea and a similar loss. Do you have any suggestions? Thanks.

    opened by mt-cly 2
  • Asking about the Mask-Rcnn training strategy

    Hi Jiwoon Ahn, after transforming the pseudo labels into COCO-style annotations, I trained Mask R-CNN with ResNet-50-FPN.

    But the performance I got is slightly lower than reported: mAP50 is 45.0.

    I'd like to ask about the Mask R-CNN training strategy: what kind of data augmentation did you adopt?

    Thank you !

    opened by vicchu 2
  • Training is so slow after first epoch

    Hello,

    We were using a custom dataset for this repo. Training CAM is too slow. After the first epoch, it shows an estimated finish time of 2.5 days later.

    Our training dataset has 8960 images. The batch size is 4.

    Have you ever faced this problem? Thank you.

    opened by gozdedemirci 0
  • Time cost of generating one pseudo instance mask

    Hi,

    After testing the IRNet, I found it takes about 3 seconds to generate one pseudo instance mask on my machine. I searched around and found no one mentioned the efficiency here, or even in the WSIS community. Or maybe I missed some paper/post.

    I understand that for the final goal the inference time matters, not the time of generating one pseudo instance mask. But is there any way that I can make it faster? Why don't people care about this?

    Thanks

    opened by fcc315 0
  • On the number of convolutional filters in IRNet

    I noticed that the numbers of convolutional filters in IRNet (in both the class boundary part and the displacement part) differ from the settings in your original paper. So, may I ask which setting generally worked better in your earlier experiments? Best wishes.

    opened by BiQiWHU 0
  • about the function of “Instance Map”

    I think it's OK to use only "CAM" and "Pairwise Affinities" to obtain instance segmentation masks, because the purpose of the "Instance Map" is to distinguish instances, and "Pairwise Affinities" also serve this function. Using only these two modules would also make the algorithm simpler. Can you tell me why the "Instance Map" can't be omitted? Thank you for your reply!

    As shown in the figure below. Image 6

    opened by jingtingxu369 0
  • about the search indices

    for x in range(1, max_radius):
        search_dirs.append((0, x))

    for y in range(1, max_radius):
        for x in range(-max_radius + 1, max_radius):
            if x * x + y * y < max_radius ** 2:
                search_dirs.append((y, x))
    

    Thanks for sharing the work. I think search_dirs covers a half circle instead of a full circle. I am not sure whether I understand it correctly. I look forward to your reply.

    opened by roywithfiringblade 1
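The half-circle observation can be checked directly by enumerating the directions. The sketch below wraps the quoted loops in a function and compares the result with a full disc of offsets; the function names and the value `max_radius=5` are illustrative choices, not confirmed repository settings. Whether the half-disc is intentional is for the authors to confirm. One plausible reading, stated here as an assumption, is that a relation between pixels p and q is the same as between q and p, so each unordered pair needs only one direction.

```python
def half_plane_dirs(max_radius=5):
    """Directions (dy, dx) produced by the loops quoted in the issue."""
    dirs = [(0, x) for x in range(1, max_radius)]
    for y in range(1, max_radius):
        for x in range(-max_radius + 1, max_radius):
            if x * x + y * y < max_radius ** 2:
                dirs.append((y, x))
    return dirs


def full_disc_dirs(max_radius=5):
    """All non-zero integer offsets strictly inside the circle of radius max_radius."""
    r = max_radius
    return {
        (y, x)
        for y in range(-r + 1, r)
        for x in range(-r + 1, r)
        if (y, x) != (0, 0) and x * x + y * y < r * r
    }


half = half_plane_dirs()
mirrored = {(-y, -x) for y, x in half}  # point reflection of every direction

print(len(half))                                    # directions in the half disc
print(len(mirrored & set(half)))                    # overlap between the two halves
print(set(half) | mirrored == full_disc_dirs())     # do both halves tile the disc?
```

For `max_radius=5` this gives 34 directions, no overlap with their mirror images, and a union equal to the full disc of 68 offsets, so the list is indeed exactly one half of the disc.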
Owner
Jiwoon Ahn
Deep Learning Researcher