Official PyTorch Implementation of Hypercorrelation Squeeze for Few-Shot Segmentation, arXiv 2021

Overview

Hypercorrelation Squeeze for Few-Shot Segmentation

This is the implementation of the paper "Hypercorrelation Squeeze for Few-Shot Segmentation" by Juhong Min, Dahyun Kang, and Minsu Cho. Implemented in Python 3.7 and PyTorch 1.5.1.

For more information, check out the project [website] and the paper on [arXiv].

Requirements

  • Python 3.7
  • PyTorch 1.5.1
  • CUDA 10.1
  • tensorboard 1.14

Conda environment settings:

conda create -n hsnet python=3.7
conda activate hsnet

conda install pytorch=1.5.1 torchvision cudatoolkit=10.1 -c pytorch
conda install -c conda-forge tensorflow
pip install tensorboardX
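
Once the environment is created, the installation can be sanity-checked from a Python shell inside it (a minimal check against the versions listed above; illustrative only):

# quick environment check (illustrative), run inside the hsnet environment
import torch
import torchvision

print(torch.__version__)            # expected: 1.5.1
print(torchvision.__version__)
print(torch.cuda.is_available())    # expected: True (CUDA 10.1 toolkit + visible GPU)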

Preparing Few-Shot Segmentation Datasets

Download the following datasets:

1. PASCAL-5i

Download PASCAL VOC2012 devkit (train/val data):

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

Download PASCAL VOC2012 SDS extended mask annotations from our [Google Drive].

2. COCO-20i

Download COCO2014 train/val images and annotations:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

Download the COCO2014 train/val annotations from our Google Drive: [train2014.zip], [val2014.zip] (and place both train2014/ and val2014/ under the annotations/ directory).

3. FSS-1000

Download FSS-1000 images and annotations from our [Google Drive].

Create a directory '../Datasets_HSN' for the above three few-shot segmentation datasets and place each dataset so that it follows the directory structure below (a small, optional sanity-check script is given after the tree):

../                         # parent directory
├── ./                      # current (project) directory
│   ├── common/             # (dir.) helper functions
│   ├── data/               # (dir.) dataloaders and splits for each FSSS dataset
│   ├── model/              # (dir.) implementation of Hypercorrelation Squeeze Network model 
│   ├── README.md           # instructions for reproduction
│   ├── train.py            # code for training HSNet
│   └── test.py             # code for testing HSNet
└── Datasets_HSN/
    ├── VOC2012/            # PASCAL VOC2012 devkit
    │   ├── Annotations/
    │   ├── ImageSets/
    │   ├── ...
    │   └── SegmentationClassAug/
    ├── COCO2014/           
    │   ├── annotations/
    │   │   ├── train2014/  # (dir.) training masks (from Google Drive) 
    │   │   ├── val2014/    # (dir.) validation masks (from Google Drive)
    │   │   └── ..some json files..
    │   ├── train2014/
    │   └── val2014/
    └── FSS-1000/           # (dir.) contains 1000 object classes
        ├── abacus/   
        ├── ...
        └── zucchini/
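
The following is a small, optional sanity-check script (not part of the repository; the paths follow the tree above) that can be run from the project directory to verify the expected layout:

import os

DATASET_ROOT = os.path.join('..', 'Datasets_HSN')
EXPECTED = {
    'VOC2012':  ['Annotations', 'ImageSets', 'SegmentationClassAug'],
    'COCO2014': ['annotations', 'train2014', 'val2014',
                 os.path.join('annotations', 'train2014'),
                 os.path.join('annotations', 'val2014')],
    'FSS-1000': [],   # should contain 1000 class directories (abacus/ ... zucchini/)
}

for dataset, subdirs in EXPECTED.items():
    base = os.path.join(DATASET_ROOT, dataset)
    if not os.path.isdir(base):
        print('[MISSING]', base)
        continue
    for sub in subdirs:
        path = os.path.join(base, sub)
        print('[OK]' if os.path.isdir(path) else '[MISSING]', path)
    if dataset == 'FSS-1000':
        nclasses = sum(os.path.isdir(os.path.join(base, d)) for d in os.listdir(base))
        print('FSS-1000 class directories found:', nclasses, '(expected 1000)')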

Training

1. PASCAL-5i

python train.py --backbone {vgg16, resnet50, resnet101} 
                --fold {0, 1, 2, 3} 
                --benchmark pascal
                --lr 1e-3
                --bsz 20
                --load "path_to_trained_model/best_model.pt"
                --logpath "your_experiment_name"
  • Training takes approx. 2 days until convergence (trained with four 2080 Ti GPUs).

2. COCO-20i

python train.py --backbone {resnet50, resnet101} 
                --fold {0, 1, 2, 3} 
                --benchmark coco 
                --lr 1e-3
                --bsz 40
                --load "path_to_trained_model/best_model.pt"
                --logpath "your_experiment_name"
  • Training takes approx. 1 week until convergence (trained with four Titan RTX GPUs).

3. FSS-1000

python train.py --backbone {vgg16, resnet50, resnet101} 
                --benchmark fss 
                --lr 1e-3
                --bsz 20
                --load "path_to_trained_model/best_model.pt"
                --logpath "your_experiment_name"
  • Training takes approx. 3 days until convergence (trained with four 2080 Ti GPUs).

Babysitting training:

Use tensorboard to babysit training progress:

  • For each experiment, a directory that logs training progress will be automatically generated under the logs/ directory (a minimal logging sketch follows this list).
  • From the terminal, run 'tensorboard --logdir logs/' to monitor the training progress.
  • Choose the best model when the validation mIoU curve starts to saturate.
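
Since the requirements above install tensorboardX, the monitored curves are ordinary scalar summaries. A minimal, illustrative sketch of how such curves are written (the tag names, directory, and values here are examples only, not necessarily what train.py uses):

from tensorboardX import SummaryWriter

writer = SummaryWriter('logs/your_experiment_name')   # one log directory per experiment

for epoch, (train_miou, val_miou) in enumerate([(55.1, 52.3), (58.4, 55.0)]):  # dummy values
    writer.add_scalar('mIoU/train', train_miou, epoch)
    writer.add_scalar('mIoU/val',   val_miou,   epoch)

writer.close()

The curves then appear in the browser after running 'tensorboard --logdir logs/' as described above.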

Testing

1. PASCAL-5i

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {vgg16, resnet50, resnet101} 
               --fold {0, 1, 2, 3} 
               --benchmark pascal
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

2. COCO-20i

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {resnet50, resnet101} 
               --fold {0, 1, 2, 3} 
               --benchmark coco 
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

3. FSS-1000

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {vgg16, resnet50, resnet101} 
               --benchmark fss 
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

4. Evaluation without support feature masking on PASCAL-5i

  • To reproduce the results in Tab. 1 of our main paper, COMMENT OUT line 51 in hsnet.py: support_feats = self.mask_feature(support_feats, support_mask.clone()) (see the snippet below).
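
For clarity, the edit looks like the following; only the quoted call is taken from the instruction above, the surrounding comment is illustrative:

# hsnet.py, around line 51: disable support feature masking by commenting out this call
# support_feats = self.mask_feature(support_feats, support_mask.clone())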

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone resnet101 
               --fold {0, 1, 2, 3} 
               --benchmark pascal
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

Visualization

  • To visualize mask predictions, add the command-line argument --visualize (prediction results will be saved under the vis/ directory):
  python test.py '...other arguments...' --visualize  

Example qualitative results (1-shot):

BibTeX

If you use this code for your research, please consider citing:

@article{min2021hypercorrelation, 
   title={Hypercorrelation Squeeze for Few-Shot Segmentation},
   author={Juhong Min and Dahyun Kang and Minsu Cho},
   journal={arXiv preprint arXiv:2104.01538},
   year={2021}
}
Comments
  • about FSS metrics

    In FSS-1000, the authors state that "The metric we use is the IOU of positive labels in a binary segmentation map", and DAN notes that "the metric on FSS-1000 is the IOU of positive labels (P-IOU) in binary segmentation maps", but neither provides clear code explaining it. I think it is the sum of the IoU over every test image, divided by the number of test images. So I want to ask for your help: do you think the FSS metric differs from the PASCAL VOC metric (mIoU over classes)?
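
    A minimal sketch of the two metrics being contrasted here, assuming binary NumPy masks (illustrative only, not the evaluation code of either repository):

    import numpy as np

    def foreground_iou(pred, gt):
        # IoU of the positive (foreground) labels for one binary mask pair
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        return inter / union if union > 0 else 1.0

    # FSS-1000-style P-IoU as described above: average the per-image foreground IoU
    def p_iou(preds, gts):
        return float(np.mean([foreground_iou(p, g) for p, g in zip(preds, gts)]))

    # PASCAL-VOC-style mIoU: accumulate intersection/union per class, then average over classes
    def class_miou(preds, gts, labels, num_classes):
        inter = np.zeros(num_classes)
        union = np.zeros(num_classes)
        for pred, gt, c in zip(preds, gts, labels):
            inter[c] += np.logical_and(pred, gt).sum()
            union[c] += np.logical_or(pred, gt).sum()
        valid = union > 0
        return float(np.mean(inter[valid] / union[valid]))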

    opened by ily666666 10
  • Paper Result

    Hello,

    I am training your resnet50 model on PASCAL (fold 0). After 70 epochs, the max mIoU is 62.89, and the mIoU is now fluctuating below that maximum. Can you suggest how I can achieve the paper results? Are there any special parameters required to reproduce them?

    Thank you

    opened by Ehteshamciitwah 7
  • Pascal datasets

    Hello,

    During training I noticed that your model uses nearly 11,000 images for fold 0, while comparable models such as PFENet use lists containing roughly 4,000 images for fold 0. Why is there a difference in the number of training images? Am I missing something? Furthermore, when I train your model using the PFENet pipeline (data loader), the results are very low. I would appreciate your comments. Thank you.

    opened by Ehteshamciitwah 6
  • Unfair Comparisons especially on COCO

    https://github.com/juhongm999/hsnet/issues/5

    This is not actually the case, because you did not fairly compare your results with RePRI and PFENet in Table 1, where the results are copied directly from their papers. In https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation/blob/master/src/dataset/transform.py#L80 they keep the aspect ratio of the resized images the same as the original image, but in your implementation https://github.com/juhongm999/hsnet/blob/e288916debe5290b3e9554fb61e13a474e00f885/data/dataset.py#L25 the images are simply resized to a 1:1 aspect ratio without keeping the original labels.

    For my second question: all previous methods use 417, 473, or the original sizes for evaluating COCO and PASCAL, so I do not understand why you used size 400 on COCO and PASCAL, creating a brand **new** setting that makes it hard for others to follow and compare fairly, even though, by your own account, size 400 does not bring the best performance. Normally we should report the setting with the best results. It is true that performance on PASCAL is slightly higher when the training size grows, but the results are still comparable, and the RePRI and PFENet models cannot be directly tested with a 1:1 aspect ratio because they were not trained with 1:1-ratio images in the non-255 regions. However, on COCO I have tested PFENet: the results are much lower when it is evaluated with the original labels without resizing, which matches the results shown in PFENet and is also mentioned in https://github.com/juhongm999/hsnet/issues/1#issuecomment-816485819. So I think it is unfair not to show COCO results with the original aspect ratios and the original sizes (or 417, 473) to compare with related methods (RePRI, PFENet, ASGNet, and so on), because a smaller size for resizing labels does bring much better performance on COCO.

    opened by deepAICrazy 5
  • about effect

    I downloaded your code and trained it according to the README, but after training for 150 epochs the mIoU only reaches 82.47%. Can you tell me how I should train to match the results reported in the paper?

    opened by ily666666 5
  • Insufficient GPU storage?

    Hello, I'm currently testing your code on a new class I added to FSS-1000 (weldings on big machine parts). When trying to change several values, such as the image size in line 88 of test.py or the batch size, I constantly run into this error:

    RuntimeError: CUDA out of memory. Tried to allocate 1.40 GiB (GPU 0; 7.79 GiB total capacity; 4.36 GiB already allocated; 448.50 MiB free; 5.44 GiB reserved in total by PyTorch)

    I'm working on an NVIDIA RTX 3070 Ti with 8 GB of VRAM. I suppose this happens because PyTorch manages memory allocation itself and there is not enough VRAM for processing larger images. However, that is exactly what I want to do, because I assume larger images lead to better results in few-shot segmentation. So my question is: is there any way to increase FB-IoU and mIoU on my own set of images, or do I have to build a completely new dataset and train a new model to use on my own images?

    Thank you in advance and for your impressive work.

    opened by JoniGlueck 4
  • Possible code simplification?

    Hi,

    Here: https://github.com/juhongm999/hsnet/blob/2cd06324ef733004a4d0ef6ab594d16fd9d3061f/model/base/conv4d.py#L23-L34

    is a rather complicated function which (if I understand it correctly) just strides the final two dimensions. Could you instead simply do:

    out1 = x[...,::2,::2]
    

    Perhaps there is something I'm missing here? Otherwise, I think it would make the code more readable.
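
    For reference, a tiny standalone illustration (not the repository's conv4d.py code) of what this indexing does, i.e. subsampling the last two dimensions by a factor of two:

    import torch

    # e.g. a 6D correlation-like tensor whose final two dimensions are to be strided
    x = torch.arange(2 * 3 * 4 * 4 * 8 * 8, dtype=torch.float32).reshape(2, 3, 4, 4, 8, 8)

    out1 = x[..., ::2, ::2]     # keep every second element along the last two dims
    print(out1.shape)           # torch.Size([2, 3, 4, 4, 4, 4])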

    opened by Parskatt 4
  • Training on data from different domain

    Hi,

    Thank you for making your work publicly available.

    Currently, I am trying to train on satellite imagery (5 cm resolution) but it seems like the model can't converge.

    The dataset consists of 6 classes of which 0=background. I have set the following in my custom Dataset:

     self.nfolds = 5
     self.nclass = 5
    

    Can you confirm whether my thought process is correct, i.e. that each fold will have 4 base classes in the training set and 1 base class in the validation set?

    As for the training, I thought about unfreezing the gradients because the domain data is so different from the pretraining data.

    Do you have any ideas on how I could obtain feasible results on my data?

    Cheers!

    opened by desmania 3
  • 5-shot training

    Hello, thanks for your great work on FSS! I have some confusion about the implementation details of 5-shot training.

    1. According to #8, does it mean that the dataloader provides 5 support-query pairs, the model then produces 5 losses, and these losses are summed for the backward pass?
    2. Given Figure A12, would it be more rational to use five support images and one query image for 5-shot training?
    3. According to Section 4.5 and #18, what is the meaning of "maximum voting score"? Thank you.
    opened by Joseph-Lee-V 2
  • Zero shot learning

    Hello, I have a question: does this code support zero-shot learning? I want to use zero-shot evaluation and training with this code on my own dataset.

    Thank you

    opened by viethoang303 2
  • [Question] Domain shift result

    COCO to Pascal - test

    I tried to get domain-shift results from your pre-trained model (ResNet50, COCO fold 0) but strangely got low accuracy.

    Please correct me if I am wrong.

    For example, for fold 0 in the domain-shift test, I first collect all training and validation PASCAL data and then exclude all classes that are in COCO fold 0. Only Airplane, Boat, Chair, Dining table, Dog, and Person remain for fold 0, as explained in the RePRI paper.

    All of the above remaining fold 0 data were used for checking accuracy; there are 7717 cases for the fold 0 test.

    Should I sample 1000 cases from 7717 cases?

    Thank you.

    opened by moonsh 2
Owner
Juhong Min
research interest in computer vision
Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Region Proportion Regularized Inference (RePRI) for Few-Shot Segmentation In this repo, we provide the code for our paper : "Few-Shot Segmentation Wit

Malik Boudiaf 138 Dec 12, 2022
Few-NERD: Not Only a Few-shot NER Dataset

Few-NERD: Not Only a Few-shot NER Dataset This is the source code of the ACL-IJCNLP 2021 paper: Few-NERD: A Few-shot Named Entity Recognition Dataset.

THUNLP 319 Dec 30, 2022
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

null 220 Dec 31, 2022
The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter

FAPIS The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter Introduction This repo is primari

Khoi Nguyen 8 Dec 11, 2022
Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021".

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021. Introduction We proposed a novel model training paradi

Lucas 103 Dec 14, 2022
Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

Introduction Pytorch implementation of Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Expert. | paper Song Park1

Clova AI Research 97 Dec 23, 2022
(ICCV'21) Official PyTorch implementation of Relational Embedding for Few-Shot Classification

Relational Embedding for Few-Shot Classification (ICCV 2021) Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho [paper], [project hompage] We propose t

Dahyun Kang 82 Dec 24, 2022
[CVPR 2021] Few-shot 3D Point Cloud Semantic Segmentation

Few-shot 3D Point Cloud Semantic Segmentation Created by Na Zhao from National University of Singapore Introduction This repository contains the PyTor

null 117 Dec 27, 2022
Adaptive Prototype Learning and Allocation for Few-Shot Segmentation (CVPR 2021)

ASGNet The code is for the paper "Adaptive Prototype Learning and Allocation for Few-Shot Segmentation" (accepted to CVPR 2021) [arxiv] Overview data/

Gen Li 91 Dec 23, 2022
Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)'

SCL Introduction Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)' We evaluated our approach using two baseline

null 34 Oct 8, 2022
An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

CV Lab @ Yonsei University 35 Oct 26, 2022
The implementation of PEMP in paper "Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes"

Prior-Enhanced network with Meta-Prototypes (PEMP) This is the PyTorch implementation of PEMP. Overview of PEMP Meta-Prototypes & Adaptive Prototypes

Jianwei ZHANG 8 Oct 14, 2021
Official code release for "Learned Spatial Representations for Few-shot Talking-Head Synthesis" ICCV 2021

Official code release for "Learned Spatial Representations for Few-shot Talking-Head Synthesis" ICCV 2021

Moustafa Meshry 16 Oct 5, 2022
arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

Andrej 671 Dec 31, 2022
Listing arxiv - Personalized list of today's articles from ArXiv

Personalized list of today's articles from ArXiv Print and/or send to your gmail

Lilianne Nakazono 5 Jun 17, 2022
Arxiv harvester - Poor man's simple harvester for arXiv resources

Poor man's simple harvester for arXiv resources This modest Python script takes

Patrice Lopez 5 Oct 18, 2022
Official Implementation of Few-shot Visual Relationship Co-localization

VRC Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper project page | paper Requirements Use python >= 3.8.

null 22 Oct 13, 2022
PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

PFENet This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEE

DV Lab 230 Dec 31, 2022