Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)

Jiwoon Ahn

Last update: Dec 29, 2022


Overview

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations

The code of:

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, Jiwoon Ahn, Sunghyun Cho, and Suha Kwak, CVPR 2019 [Paper]

This repository contains a framework for learning instance segmentation with image-level class labels as supervision. The key component of our approach is the Inter-pixel Relation Network (IRNet), which estimates two types of information: a displacement vector field and a class boundary map. Both are in turn used to generate pseudo instance masks from class activation maps (CAMs).
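To make the role of the displacement field concrete, here is a toy sketch under strong assumptions. It assumes each foreground pixel's displacement vector points exactly to its instance centroid, so shifting every pixel by its vector and grouping pixels that land on the same grid point separates the instances. The function name `group_by_displacement` is hypothetical. This is not IRNet's actual procedure, which has to cope with noisy displacements and also uses the class boundary map.

```python
import numpy as np


def group_by_displacement(mask, disp):
    """Toy instance grouping from a displacement field.

    mask: (H, W) bool foreground mask (e.g. a thresholded CAM).
    disp: (H, W, 2) displacement field holding (dy, dx) per pixel.
    Returns an (H, W) int array: 0 = background, 1..K = instance ids.
    """
    ys, xs = np.nonzero(mask)
    # Shift each foreground pixel by its displacement and snap to the grid.
    ty = np.rint(ys + disp[ys, xs, 0]).astype(int)
    tx = np.rint(xs + disp[ys, xs, 1]).astype(int)
    targets = np.stack([ty, tx], axis=1)
    # Every distinct landing point becomes one instance.
    _, inverse = np.unique(targets, axis=0, return_inverse=True)
    labels = np.zeros(mask.shape, dtype=int)
    labels[ys, xs] = inverse.reshape(-1) + 1
    return labels


# Two square objects; displacements point exactly to their centers.
mask = np.zeros((6, 10), dtype=bool)
mask[1:4, 1:4] = True  # object A, center (2, 2)
mask[2:5, 6:9] = True  # object B, center (3, 7)
yy, xx = np.mgrid[0:6, 0:10]
left = xx < 5
disp = np.zeros((6, 10, 2))
disp[..., 0] = np.where(left, 2 - yy, 3 - yy)
disp[..., 1] = np.where(left, 2 - xx, 7 - xx)
labels = group_by_displacement(mask, disp)
print(labels)
```

With perfect displacements the two objects get ids 1 and 2. With real, noisy predictions the landing points scatter, which is why a more robust centroid detection step is needed in practice.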

Citation

If you find the code useful, please consider citing our paper using the following BibTeX entry.

@InProceedings{Ahn_2019_CVPR,
author = {Ahn, Jiwoon and Cho, Sunghyun and Kwak, Suha},
title = {Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

Prerequisite

Python 3.7, PyTorch 1.1.0, and more in requirements.txt
PASCAL VOC 2012 devkit
NVIDIA GPU with more than 1024MB of memory

Usage

Install python dependencies

pip install -r requirements.txt

Download PASCAL VOC 2012 devkit

Follow instructions in http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit

Run run_sample.py or make your own script

python run_sample.py

You can either manually edit the file or specify command-line arguments.

Train Mask R-CNN or DeepLab with the generated pseudo labels

For the results reported in the paper, we used Detectron.
- Run step/make_cocoann.py to create COCO-style annotations.
- Note: Do not employ https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip to measure the performance of the Mask R-CNN! It only contains bounding box annotations.
TorchVision now supports Mask R-CNN and DeepLab. I personally recommend using it.
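If you want to see what a COCO-style annotation for one pseudo instance mask looks like, the sketch below builds one from a binary mask. It is not step/make_cocoann.py; the helper names are hypothetical. It encodes the mask as COCO uncompressed RLE (runs over pixels in column-major order, starting with a run of zeros) and derives `bbox` as `[x, y, width, height]` and `area` as the pixel count. Whether your training framework accepts RLE for non-crowd annotations (`iscrowd: 0`) is an assumption to check; some pipelines expect polygons instead.

```python
import json

import numpy as np


def mask_to_uncompressed_rle(mask):
    """Encode a binary (H, W) mask as COCO uncompressed RLE."""
    m = np.asarray(mask, dtype=np.uint8)
    flat = m.flatten(order="F")  # COCO RLE runs column by column
    # Positions where the value changes mark run boundaries.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    bounds = np.concatenate([[0], change, [flat.size]])
    counts = np.diff(bounds).tolist()
    if flat[0] == 1:
        counts = [0] + counts  # the first run must count zeros
    return {"size": [int(m.shape[0]), int(m.shape[1])], "counts": counts}


def mask_to_coco_ann(mask, ann_id, image_id, category_id):
    """Build one COCO annotation dict from a non-empty binary mask."""
    m = np.asarray(mask, dtype=np.uint8)
    ys, xs = np.nonzero(m)
    x0, y0 = int(xs.min()), int(ys.min())
    w, h = int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": mask_to_uncompressed_rle(m),
        "area": int(m.sum()),
        "bbox": [x0, y0, w, h],
        "iscrowd": 0,
    }


mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 2:4] = True
ann = mask_to_coco_ann(mask, ann_id=1, image_id=7, category_id=15)
print(json.dumps(ann))
```

In a full conversion you would collect these dicts under `"annotations"` next to `"images"` and `"categories"` lists, and you might drop masks below a score threshold first.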

TO DO

Training code for MS-COCO
Code refactoring
IRNet v2

Comments

Log files for training

Hi, can you share the log files for your training? I am unable to reproduce the performance of IRN reported in the paper using the default hyper-parameters (also mentioned here [Link]).

For instance segmentation, instead of 37.7 mAP@0.5, I am getting the following:

step.eval_ins_seg: Wed Aug 14 09:55:44 2019
0.5iou: {'ap': array([0.0402722 , 0.        , 0.04831983, 0.02532846, 0.01264213,
       0.21497569, 0.13079764, 0.06767052, 0.00229753, 0.08129419,
       0.01570647, 0.05994737, 0.03092302, 0.26370536, 0.02019956,
       0.02099569, 0.0646912 , 0.16558015, 0.23535844, 0.1566734 ]), 'map': 0.08286894241843508}

and for semantic segmentation, instead of 66.5 mIoU, I am getting:

step.eval_sem_seg: Wed Aug 14 10:15:06 2019
0.12114407058527121 0.08625727491374735
0.2459830480445712 0.30624211370783205
{'iou': array([0.79259865, 0.43975817, 0.27018399, 0.42519734, 0.34189571,
       0.43639392, 0.57453956, 0.48851971, 0.41510347, 0.26892431,
       0.54274295, 0.37697739, 0.40495999, 0.47331797, 0.5605337 ,
       0.51401678, 0.39511615, 0.63538235, 0.40350322, 0.50775112,
       0.48067896]), 'miou': 0.4641950199739483}

Thanks.
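For reading the semantic segmentation log above: the per-class `iou` array (21 entries, background plus the 20 VOC classes) and `miou` follow the standard definition IoU = TP / (TP + FP + FN) per class, averaged over classes. A minimal NumPy sketch of that computation from a confusion matrix follows. The repo's eval_sem_seg step may compute it differently (for example through chainercv), and treating 255 as the ignore label is an assumption based on the usual VOC convention.

```python
import numpy as np


def confusion_matrix(pred, gt, n_class, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    valid = gt != ignore_index
    idx = n_class * gt[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=n_class ** 2).reshape(n_class, n_class)


def iou_from_confusion(conf):
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as class c, labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c, predicted otherwise
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)  # NaN for classes absent from both
    return iou, np.nanmean(iou)


gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
iou, miou = iou_from_confusion(confusion_matrix(pred, gt, n_class=2))
print(iou, miou)
```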

opened by adityaarun1 11

About train_aug.txt

Congratulations! This is really good work!

As I was running your code, I found that the train_aug.txt file was used to train the CAM. Where does this file come from? And why not use the VOC2012 trainval set directly?

Thanks a lot!

opened by zhaohui-yang 10
Performance Gap and Hyper-parameter Settings
Hi Jiwoon Ahn, your paper is very good and I'm really interested in it. I've already tried your code, but I cannot achieve the same performance as reported in the paper. Would you please help me figure out where the problem is?

In my experiments, the learning rates of both CAM and IRN are set to 0.1, while the other hyper-parameters follow the default settings in run_sample.py. My results are as follows:

| model | task                  | my exp. | reported |
| ----- | --------------------- | ------- | -------- |
| CAM   | semantic segmentation | 48.1    | 48.3     |
| IRN   | semantic segmentation | 64.9    | 66.5     |
| IRN   | instance segmentation | 32.4    | 37.7     |

The CAM models have similar performance, but there are performance gaps between the IRN models in both tasks.

There may be two possible reasons for the gap.

I noticed that the hyper-parameter settings in the paper and the code are not exactly the same: exp_times is set to 8 in the code, while in the paper it is set to 256 (which also does not work in my case).

Another possible problem is that multi-scale testing is used only for CAM, not for IRN.

Would you please point out the differences between my experiments and yours that may result in the gap? Thank you!
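One possible reading of the exp_times question, assuming (not verified here) that exp_times counts how many times the random-walk transition matrix is squared rather than how many walk steps are taken: 8 squarings give T^(2^8) = T^256, which would make the code's 8 and the paper's 256 the same setting. The sketch below only checks that arithmetic on a small random column-stochastic matrix; the function name is hypothetical.

```python
import numpy as np


def random_walk_by_squaring(trans, exp_times):
    """Return trans ** (2 ** exp_times) using exp_times matrix squarings."""
    out = trans.copy()
    for _ in range(exp_times):
        out = out @ out
    return out


rng = np.random.default_rng(0)
aff = rng.random((5, 5))
trans = aff / aff.sum(axis=0, keepdims=True)  # columns sum to 1
walk = random_walk_by_squaring(trans, exp_times=8)
print(np.allclose(walk, np.linalg.matrix_power(trans, 256)))
```

Under this assumption, setting exp_times to 256 would correspond to 2^256 steps, far beyond what the paper describes.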
opened by XiaoyanLi1 8
How do you get the result image?

Thanks for open-sourcing your implementation!

I want to know how to save the visualization image like https://github.com/jiwoon-ahn/irn/blob/master/outline.jpg

Thanks.

opened by UdonDa 6
How to process test data?

Hi, for train/val data, the CAMs are first filtered by the GT classification labels, and the final segmentation is obtained by argmax after normalizing the remaining CAMs. But how should test data be handled? Should I generate predicted classification labels to apply a similar filter, or multiply each class probability with the corresponding CAM?
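The train/val procedure described in this question can be sketched as follows, under stated assumptions: CAMs are kept only for the image's labeled classes, each CAM is normalized by its maximum, a constant background score (`bg_threshold`, a hypothetical value) is stacked in front, and argmax picks the label per pixel. For unlabeled test images, `classes_from_probs` shows one hypothetical substitute for GT labels (thresholding classifier probabilities). This is only an illustration, not the authors' answer or the repo's code.

```python
import numpy as np


def cams_to_semseg(cams, keys, bg_threshold=0.25):
    """cams: (K, H, W) non-negative CAMs of the K kept classes.
    keys: length-K class ids (1-based; 0 is reserved for background).
    Returns an (H, W) label map."""
    peak = cams.reshape(len(cams), -1).max(axis=1)[:, None, None]
    norm = cams / (peak + 1e-5)  # each CAM scaled to roughly [0, 1]
    bg = np.full((1,) + cams.shape[1:], bg_threshold)
    scores = np.concatenate([bg, norm], axis=0)
    labels = np.concatenate([[0], keys])
    return labels[np.argmax(scores, axis=0)]


def classes_from_probs(probs, threshold=0.5):
    """Hypothetical test-time filter: 0-based indices of classes whose
    predicted probability exceeds the threshold."""
    return np.flatnonzero(np.asarray(probs) > threshold)


cams = np.array([
    [[1.0, 0.5, 0.0], [1.0, 0.1, 0.0]],  # class 3
    [[0.0, 0.2, 2.0], [0.0, 0.1, 2.0]],  # class 7
])
seg = cams_to_semseg(cams, keys=np.array([3, 7]))
print(seg)
```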

opened by mt-cly 4
Tuning GN using inference data?

Dear Jiwoon, in the file 'train_irn.py', I noticed that GN is tuned using the inference data in the latest commit (location). Is this right in the weakly supervised instance segmentation setting? I think the validation set should not be touched except for evaluation, rather than being used for training or tuning parameters. I'm also curious what this affects: will the mAP be improved? Thanks

opened by zhaohui-yang 3
Performance is poor after retraining Mask R-CNN

Hi, I took the instance-level pseudo labels generated by running `make_ins_seg_labels.py` and kept the instance masks whose score is higher than 0. Then, I converted these labels from *.npy to COCO-style JSON annotations and trained the standard Mask R-CNN with ResNet-50-FPN. However, the performance I got is:

Specifically, the box AP50 is 45.8 and the segmentation AP50 is 22.6. I noticed that the number of instances in the pseudo labels is about 2/3 of the GT instance number for the `train_aug` set. Did I miss something needed to reproduce the Mask R-CNN performance with pseudo labels?

Thanks a lot!

opened by bityangke 3
Getting AssertionError when running eval_ins_seg.py

Traceback (most recent call last):
  File "run_sample.py", line 119, in
    step.eval_ins_seg.run(args)
  File "/home/maskrcnn-benchmark/irn/step/eval_ins_seg.py", line 10, in run
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/irn/step/eval_ins_seg.py", line 10, in
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/chainer_experimental/datasets/sliceable/getter_dataset.py", line 89, in get_example_by_keys
    cache[getter_index] = self._gettersgetter_index
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_instance_segmentation_dataset.py", line 66, in _get_annotations
    label_img, inst_img)
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_utils.py", line 55, in image_wise_to_instance_wise
    assert lbl != -1
AssertionError

opened by whitesockcat 3
using own dataset

I am trying to adapt the code to my own dataset. However, I am really struggling since I am not a pro at Python.

How can I generate cls_labels.npy for a different dataset? The script make_cls_labels.py does not work. Plus, it makes use of .xml files. Is there an easier way to generate a dictionary with image level labels?

cls_labels_dict = np.load('voc12/cls_labels.npy', allow_pickle=True).item()
print(cls_labels_dict)
# 2011003271: array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)}

Also, my images don't share the same naming conventions as VOC12, so this part of the code creates a ton of problems:

def decode_int_filename(int_filename):
    s = str(int(int_filename))
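Judging only from the printed output in this question, cls_labels.npy holds a pickled dict that maps an image key to a float32 multi-hot vector with one entry per foreground class (20 for VOC). A minimal sketch for building such a file from your own labels, without XML parsing, is below. `build_cls_labels` and the string keys are hypothetical. The VOC keys shown above are integers that the repo turns back into file names with decode_int_filename, so string keys would likely require adapting that loading code as well.

```python
import os
import tempfile

import numpy as np


def build_cls_labels(image_labels, num_classes):
    """image_labels: dict mapping an image key to a list of 0-based
    foreground class indices present in that image."""
    out = {}
    for key, classes in image_labels.items():
        vec = np.zeros(num_classes, dtype=np.float32)
        vec[list(classes)] = 1.0  # multi-hot image-level label
        out[key] = vec
    return out


labels = build_cls_labels({"img_a": [4], "img_b": [0, 14]}, num_classes=20)

path = os.path.join(tempfile.mkdtemp(), "cls_labels.npy")
np.save(path, labels)  # stored as a pickled 0-d object array
loaded = np.load(path, allow_pickle=True).item()
print(loaded["img_b"])
```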

opened by SuzannaLin 2
comparation with Affinity

Hi, in the train_irn step, I removed the displacement loss and kept only the boundary loss. I noticed that the boundary loss is similar to AffinityNet, which you published at CVPR 2018, even though some details differ. But the semantic mIoU is only 37+%, which is even worse than the CAM result (50%), compared with the AffinityNet result (59%). So I am confused about the reason for such a gap given the same idea and a similar loss. Do you have any suggestions? Thanks
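The link between the two approaches, as we understand the IRNet paper (treat this as an assumption, not the repo's code), is that a class boundary map is converted into pairwise affinities: the affinity between two pixels is one minus the largest boundary score on the straight line between them, so pixels separated by a strong boundary get near-zero affinity. The toy sketch below illustrates that conversion; the function name and the line-sampling scheme are hypothetical.

```python
import numpy as np


def affinity_from_boundary(boundary, p, q):
    """Affinity between pixels p=(y, x) and q=(y, x): one minus the
    maximum boundary score sampled along the segment from p to q."""
    n = int(max(abs(q[0] - p[0]), abs(q[1] - p[1]))) + 1
    ys = np.rint(np.linspace(p[0], q[0], n)).astype(int)
    xs = np.rint(np.linspace(p[1], q[1], n)).astype(int)
    return 1.0 - boundary[ys, xs].max()


boundary = np.zeros((5, 5))
boundary[:, 2] = 0.9  # a vertical wall between left and right halves
across = affinity_from_boundary(boundary, (1, 0), (1, 4))
same_side = affinity_from_boundary(boundary, (0, 0), (4, 1))
print(across, same_side)
```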

opened by mt-cly 2
Asking about the Mask R-CNN training strategy

Hi Jiwoon Ahn, after transforming the pseudo labels to COCO-style annotations, I trained Mask R-CNN with ResNet-50-FPN.

But the performance I got is slightly lower than reported: mAP50 is 45.0.

I'd like to ask about your Mask R-CNN training strategy, in particular what kind of data augmentation you adopted.

Thank you !

opened by vicchu 2
Training is so slow after first epoch

Hello,

We were using a custom dataset with this repo, and training the CAM is very slow. After the first epoch, it shows an estimated finish time of 2.5 days.

Our training dataset has 8960 images. The batch size is 4.

Have you ever faced this problem? Thank you.

opened by gozdedemirci 0
Time cost of generating one pseudo instance mask

Hi,

After testing the IRNet, I found it takes about 3 seconds to generate one pseudo instance mask on my machine. I searched around and found no one mentioned the efficiency here, or even in the WSIS community. Or maybe I missed some paper/post.

I understand that for the final goal the inference time matters, not the time to generate one pseudo instance mask. But is there any way I can make it faster? Why don't people care about this?

Thanks

opened by fcc315 0
On the number of convolutional filters in IRNet

I noticed that the numbers of convolutional filters in IRNet (in either the class boundary part or the displacement part) are different from the settings in your original paper. So, may I ask, generally speaking, which setting worked better in your earlier experiments? Best wishes.

opened by BiQiWHU 0
about the function of “Instance Map”

I think it's OK to use "CAM" and "Pairwise Affinities" to capture instance segmentation masks, because the purpose of the "Instance Map" is to distinguish instances, and "Pairwise Affinities" also serves this function. Using only these two modules would also make the algorithm simpler. Can you tell me why the "Instance Map" can't be ignored? Thank you for your reply!

As shown in the figure below.

opened by jingtingxu369 0
about the search indices

for x in range(1, max_radius):
    search_dirs.append((0, x))

for y in range(1, max_radius):
    for x in range(-max_radius + 1, max_radius):
        if x * x + y * y < max_radius ** 2:
            search_dirs.append((y, x))

Thanks for sharing the work. I think search_dirs covers a half circle instead of a full circle, but I'm not sure whether I understand it correctly. I look forward to your reply.
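The observation in this question can be checked directly: the enumeration from the issue covers only a half disk. One plausible reading, not confirmed by the author, is that this is intentional. If a pairwise relation between pixels i and j is symmetric, listing each offset d but never -d visits every unordered pair of nearby pixels exactly once. The check below reproduces the snippet as a function (the wrapper names are hypothetical) and verifies that the offsets and their negations never overlap and together form the full disk.

```python
def make_search_dirs(max_radius):
    """Same enumeration as the snippet in the issue above."""
    search_dirs = []
    for x in range(1, max_radius):
        search_dirs.append((0, x))
    for y in range(1, max_radius):
        for x in range(-max_radius + 1, max_radius):
            if x * x + y * y < max_radius ** 2:
                search_dirs.append((y, x))
    return search_dirs


def full_disk(max_radius):
    """All nonzero offsets strictly inside a disk of radius max_radius."""
    r = max_radius
    return {(y, x) for y in range(-r + 1, r) for x in range(-r + 1, r)
            if 0 < x * x + y * y < r * r}


dirs = make_search_dirs(5)
mirrored = {(-y, -x) for (y, x) in dirs}
print(len(dirs), len(full_disk(5)))
```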
opened by roywithfiringblade 1


Owner

Jiwoon Ahn

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation

Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction (ECCV 2020)

Code for "PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation" CVPR 2019 oral

Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision

The PyTorch implementation of DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision.

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation (CVPR 2022)

The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation (CVPR 2021)

Code for the paper One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation, CVPR 2021.

Pytorch Implementation for NeurIPS (oral) paper: Pixel Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)

Scribble-Supervised LiDAR Semantic Segmentation, CVPR 2022 (ORAL)

The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection