Official PyTorch Implementation of Hypercorrelation Squeeze for Few-Shot Segmentation, arXiv 2021

Overview

Hypercorrelation Squeeze for Few-Shot Segmentation

This is the implementation of the paper "Hypercorrelation Squeeze for Few-Shot Segmentation" by Juhong Min, Dahyun Kang, and Minsu Cho. Implemented in Python 3.7 and PyTorch 1.5.1.

For more information, check out the project [website] and the paper on [arXiv].

Requirements

  • Python 3.7
  • PyTorch 1.5.1
  • CUDA 10.1
  • tensorboard 1.14

Conda environment settings:

conda create -n hsnet python=3.7
conda activate hsnet

conda install pytorch=1.5.1 torchvision cudatoolkit=10.1 -c pytorch
conda install -c conda-forge tensorflow
pip install tensorboardX
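
Once the environment is created, the installation can be sanity-checked from a Python shell inside it (a minimal check against the versions listed above; illustrative only):

# quick environment check (illustrative), run inside the hsnet environment
import torch
import torchvision

print(torch.__version__)            # expected: 1.5.1
print(torchvision.__version__)
print(torch.cuda.is_available())    # expected: True (CUDA 10.1 toolkit + visible GPU)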

Preparing Few-Shot Segmentation Datasets

Download the following datasets:

1. PASCAL-5i

Download PASCAL VOC2012 devkit (train/val data):

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

Download PASCAL VOC2012 SDS extended mask annotations from our [Google Drive].

2. COCO-20i

Download COCO2014 train/val images and annotations:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

Download the COCO2014 train/val annotations from our Google Drive: [train2014.zip], [val2014.zip] (and place both train2014/ and val2014/ under the annotations/ directory).

3. FSS-1000

Download FSS-1000 images and annotations from our [Google Drive].

Create a directory '../Datasets_HSN' for the above three few-shot segmentation datasets and place each dataset so that it follows the directory structure below (a small, optional sanity-check script is given after the tree):

../                         # parent directory
├── ./                      # current (project) directory
│   ├── common/             # (dir.) helper functions
│   ├── data/               # (dir.) dataloaders and splits for each FSSS dataset
│   ├── model/              # (dir.) implementation of Hypercorrelation Squeeze Network model 
│   ├── README.md           # instructions for reproduction
│   ├── train.py            # code for training HSNet
│   └── test.py             # code for testing HSNet
└── Datasets_HSN/
    ├── VOC2012/            # PASCAL VOC2012 devkit
    │   ├── Annotations/
    │   ├── ImageSets/
    │   ├── ...
    │   └── SegmentationClassAug/
    ├── COCO2014/           
    │   ├── annotations/
    │   │   ├── train2014/  # (dir.) training masks (from Google Drive) 
    │   │   ├── val2014/    # (dir.) validation masks (from Google Drive)
    │   │   └── ..some json files..
    │   ├── train2014/
    │   └── val2014/
    └── FSS-1000/           # (dir.) contains 1000 object classes
        ├── abacus/   
        ├── ...
        └── zucchini/
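
The following is a small, optional sanity-check script (not part of the repository; the paths follow the tree above) that can be run from the project directory to verify the expected layout:

import os

DATASET_ROOT = os.path.join('..', 'Datasets_HSN')
EXPECTED = {
    'VOC2012':  ['Annotations', 'ImageSets', 'SegmentationClassAug'],
    'COCO2014': ['annotations', 'train2014', 'val2014',
                 os.path.join('annotations', 'train2014'),
                 os.path.join('annotations', 'val2014')],
    'FSS-1000': [],   # should contain 1000 class directories (abacus/ ... zucchini/)
}

for dataset, subdirs in EXPECTED.items():
    base = os.path.join(DATASET_ROOT, dataset)
    if not os.path.isdir(base):
        print('[MISSING]', base)
        continue
    for sub in subdirs:
        path = os.path.join(base, sub)
        print('[OK]' if os.path.isdir(path) else '[MISSING]', path)
    if dataset == 'FSS-1000':
        nclasses = sum(os.path.isdir(os.path.join(base, d)) for d in os.listdir(base))
        print('FSS-1000 class directories found:', nclasses, '(expected 1000)')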

Training

1. PASCAL-5i

python train.py --backbone {vgg16, resnet50, resnet101} 
                --fold {0, 1, 2, 3} 
                --benchmark pascal
                --lr 1e-3
                --bsz 20
                --load "path_to_trained_model/best_model.pt"
                --logpath "your_experiment_name"
  • Training takes approx. 2 days until convergence (trained with four 2080 Ti GPUs).

2. COCO-20i

python train.py --backbone {resnet50, resnet101} 
                --fold {0, 1, 2, 3} 
                --benchmark coco 
                --lr 1e-3
                --bsz 40
                --load "path_to_trained_model/best_model.pt"
                --logpath "your_experiment_name"
  • Training takes approx. 1 week until convergence (trained with four Titan RTX GPUs).

3. FSS-1000

python train.py --backbone {vgg16, resnet50, resnet101} 
                --benchmark fss 
                --lr 1e-3
                --bsz 20
                --load "path_to_trained_model/best_model.pt"
                --logpath "your_experiment_name"
  • Training takes approx. 3 days until convergence (trained with four 2080 Ti GPUs).

Babysitting training:

Use tensorboard to babysit training progress:

  • For each experiment, a directory that logs training progress will be automatically generated under the logs/ directory (a minimal logging sketch follows this list).
  • From the terminal, run 'tensorboard --logdir logs/' to monitor the training progress.
  • Choose the best model when the validation mIoU curve starts to saturate.
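
Since the requirements above install tensorboardX, the monitored curves are ordinary scalar summaries. A minimal, illustrative sketch of how such curves are written (the tag names, directory, and values here are examples only, not necessarily what train.py uses):

from tensorboardX import SummaryWriter

writer = SummaryWriter('logs/your_experiment_name')   # one log directory per experiment

for epoch, (train_miou, val_miou) in enumerate([(55.1, 52.3), (58.4, 55.0)]):  # dummy values
    writer.add_scalar('mIoU/train', train_miou, epoch)
    writer.add_scalar('mIoU/val',   val_miou,   epoch)

writer.close()

The curves then appear in the browser after running 'tensorboard --logdir logs/' as described above.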

Testing

1. PASCAL-5i

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {vgg16, resnet50, resnet101} 
               --fold {0, 1, 2, 3} 
               --benchmark pascal
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

2. COCO-20i

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {resnet50, resnet101} 
               --fold {0, 1, 2, 3} 
               --benchmark coco 
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

3. FSS-1000

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone {vgg16, resnet50, resnet101} 
               --benchmark fss 
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

4. Evaluation without support feature masking on PASCAL-5i

  • To reproduce the results in Tab. 1 of our main paper, COMMENT OUT line 51 in hsnet.py: support_feats = self.mask_feature(support_feats, support_mask.clone()) (see the snippet below).
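
For clarity, the edit looks like the following; only the quoted call is taken from the instruction above, the surrounding comment is illustrative:

# hsnet.py, around line 51: disable support feature masking by commenting out this call
# support_feats = self.mask_feature(support_feats, support_mask.clone())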

Pretrained models with tensorboard logs are available on our [Google Drive].

python test.py --backbone resnet101 
               --fold {0, 1, 2, 3} 
               --benchmark pascal
               --nshot {1, 5} 
               --load "path_to_trained_model/best_model.pt"

Visualization

  • To visualize mask predictions, add the command-line argument --visualize (prediction results will be saved under the vis/ directory):
  python test.py '...other arguments...' --visualize  

Example qualitative results (1-shot):

BibTeX

If you use this code for your research, please consider citing:

@article{min2021hypercorrelation, 
   title={Hypercorrelation Squeeze for Few-Shot Segmentation},
   author={Juhong Min and Dahyun Kang and Minsu Cho},
   journal={arXiv preprint arXiv:2104.01538},
   year={2021}
}
Comments
  • about FSS metrics

    In FSS-1000, the authors state that "The metric we use is the IOU of positive labels in a binary segmentation map", and DAN notes that "the metric on FSS-1000 is the IOU of positive labels (P-IOU) in binary segmentation maps", but neither provides clear code explaining it. I think it is the sum of the IoU over every test image, divided by the number of test images. So I want to ask for your help: do you think the FSS metric differs from the PASCAL VOC metric (mIoU over classes)?
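
    A minimal sketch of the two metrics being contrasted here, assuming binary NumPy masks (illustrative only, not the evaluation code of either repository):

    import numpy as np

    def foreground_iou(pred, gt):
        # IoU of the positive (foreground) labels for one binary mask pair
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        return inter / union if union > 0 else 1.0

    # FSS-1000-style P-IoU as described above: average the per-image foreground IoU
    def p_iou(preds, gts):
        return float(np.mean([foreground_iou(p, g) for p, g in zip(preds, gts)]))

    # PASCAL-VOC-style mIoU: accumulate intersection/union per class, then average over classes
    def class_miou(preds, gts, labels, num_classes):
        inter = np.zeros(num_classes)
        union = np.zeros(num_classes)
        for pred, gt, c in zip(preds, gts, labels):
            inter[c] += np.logical_and(pred, gt).sum()
            union[c] += np.logical_or(pred, gt).sum()
        valid = union > 0
        return float(np.mean(inter[valid] / union[valid]))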

    opened by ily666666 10
  • Paper Result

    Hello,

    I am training your resnet50 model on PASCAL (fold 0). After 70 epochs, the max mIoU is 62.89, and the mIoU is now fluctuating below that maximum. Can you suggest how I can achieve the paper results? Are there any special parameters required to reproduce them?

    Thank you

    opened by Ehteshamciitwah 7
  • Pascal datasets

    Hello,

    During training I noticed that your model uses nearly 11,000 images for fold 0, while comparable models such as PFENet use lists containing roughly 4,000 images for fold 0. Why is there a difference in the number of training images? Am I missing something? Furthermore, when I train your model using the PFENet pipeline (data loader), the results are very low. I would appreciate your comments. Thank you.

    opened by Ehteshamciitwah 6
  • Unfair Comparisons especially on COCO

    https://github.com/juhongm999/hsnet/issues/5

    This is not actually the case, because you did not fairly compare your results with RePRI and PFENet in Table 1, where the results are copied directly from their papers. In https://github.com/mboudiaf/RePRI-for-Few-Shot-Segmentation/blob/master/src/dataset/transform.py#L80 they keep the aspect ratio of the resized images the same as the original image, but in your implementation https://github.com/juhongm999/hsnet/blob/e288916debe5290b3e9554fb61e13a474e00f885/data/dataset.py#L25 the images are simply resized to a 1:1 aspect ratio without keeping the original labels.

    For my second question: all previous methods use 417, 473, or the original sizes for evaluating COCO and PASCAL, so I do not understand why you used size 400 on COCO and PASCAL, creating a brand **new** setting that makes it hard for others to follow and compare fairly, even though, by your own account, size 400 does not bring the best performance. Normally we should report the setting with the best results. It is true that performance on PASCAL is slightly higher when the training size grows, but the results are still comparable, and the RePRI and PFENet models cannot be directly tested with a 1:1 aspect ratio because they were not trained with 1:1-ratio images in the non-255 regions. However, on COCO I have tested PFENet: the results are much lower when it is evaluated with the original labels without resizing, which matches the results shown in PFENet and is also mentioned in https://github.com/juhongm999/hsnet/issues/1#issuecomment-816485819. So I think it is unfair not to show COCO results with the original aspect ratios and the original sizes (or 417, 473) to compare with related methods (RePRI, PFENet, ASGNet, and so on), because a smaller size for resizing labels does bring much better performance on COCO.

    opened by deepAICrazy 5
  • about effect

    I downloaded your code and trained it according to the README, but after training for 150 epochs the mIoU only reaches 82.47%. Can you tell me how I should train to match the results reported in the paper?

    opened by ily666666 5
  • Insufficient GPU storage?

    Hello, I'm currently testing your code on a new class I added to FSS-1000 (weldings on big machine parts). When trying to change several values, such as the image size in line 88 of test.py or the batch size, I constantly run into this error:

    RuntimeError: CUDA out of memory. Tried to allocate 1.40 GiB (GPU 0; 7.79 GiB total capacity; 4.36 GiB already allocated; 448.50 MiB free; 5.44 GiB reserved in total by PyTorch)

    I'm working on an NVIDIA RTX 3070 Ti with 8 GB of VRAM. I suppose this happens because PyTorch manages memory allocation itself and there is not enough VRAM for processing larger images. However, that is exactly what I want to do, because I assume larger images lead to better results in few-shot segmentation. So my question is: is there any way to increase FB-IoU and mIoU on my own set of images, or do I have to build a completely new dataset and train a new model to use on my own images?

    Thank you in advance and for your impressive work.

    opened by JoniGlueck 4
  • Possible code simplification?

    Hi,

    Here: https://github.com/juhongm999/hsnet/blob/2cd06324ef733004a4d0ef6ab594d16fd9d3061f/model/base/conv4d.py#L23-L34

    is a rather complicated function which (if I understand it correctly) just strides the final two dimensions. Could you instead simply do:

    out1 = x[...,::2,::2]
    

    Perhaps there is something I'm missing here? Otherwise, I think it would make the code more readable.
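
    For reference, a tiny standalone illustration (not the repository's conv4d.py code) of what this indexing does, i.e. subsampling the last two dimensions by a factor of two:

    import torch

    # e.g. a 6D correlation-like tensor whose final two dimensions are to be strided
    x = torch.arange(2 * 3 * 4 * 4 * 8 * 8, dtype=torch.float32).reshape(2, 3, 4, 4, 8, 8)

    out1 = x[..., ::2, ::2]     # keep every second element along the last two dims
    print(out1.shape)           # torch.Size([2, 3, 4, 4, 4, 4])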

    opened by Parskatt 4
  • Training on data from different domain

    Hi,

    Thank you for making your work publicly available.

    Currently, I am trying to train on satellite imagery (5 cm resolution) but it seems like the model can't converge.

    The dataset consists of 6 classes of which 0=background. I have set the following in my custom Dataset:

     self.nfolds = 5
     self.nclass = 5
    

    Can you confirm whether my thought process is correct, i.e. that each fold will have 4 base classes in the training set and 1 base class in the validation set?

    As for the training, I thought about unfreezing the gradients because the domain data is so different from the pretraining data.

    Do you have any ideas on how I could obtain feasible results on my data?

    Cheers!

    opened by desmania 3
  • 5-shot training

    Hello, thanks for your great work on FSS! I have some confusion about the implementation details of 5-shot training.

    1. According to #8, does it mean that the dataloader provides 5 support-query pairs, the model then produces 5 losses, and these losses are summed for the backward pass?
    2. Given Figure A12, would it be more rational to use five support images and one query image for 5-shot training?
    3. According to Section 4.5 and #18, what is the meaning of "maximum voting score"? Thank you.
    opened by Joseph-Lee-V 2
  • Zero shot learning

    Hello, I have a question: does this code support zero-shot learning? I want to use zero-shot evaluation and training with this code on my own dataset.

    Thank you

    opened by viethoang303 2
  • [Question] Domain shift result

    COCO to Pascal - test

    I tried to get domain-shift results from your pre-trained model (ResNet50, COCO fold 0) but strangely got low accuracy.

    Please correct me if I am wrong.

    For example, for fold 0 in the domain-shift test, I first collect all training and validation PASCAL data and then exclude all classes that are in COCO fold 0. Only Airplane, Boat, Chair, Dining table, Dog, and Person remain for fold 0, as explained in the RePRI paper.

    All of the above remaining fold 0 data were used for checking accuracy; there are 7717 cases for the fold 0 test.

    Should I sample 1000 cases from 7717 cases?

    Thank you.

    opened by moonsh 2
Owner
Juhong Min
research interest in computer vision
Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Region Proportion Regularized Inference (RePRI) for Few-Shot Segmentation In this repo, we provide the code for our paper : "Few-Shot Segmentation Wit

Malik Boudiaf 138 Dec 12, 2022
Few-NERD: Not Only a Few-shot NER Dataset

Few-NERD: Not Only a Few-shot NER Dataset This is the source code of the ACL-IJCNLP 2021 paper: Few-NERD: A Few-shot Named Entity Recognition Dataset.

THUNLP 319 Dec 30, 2022
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

null 220 Dec 31, 2022
The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter

FAPIS The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter Introduction This repo is primari

Khoi Nguyen 8 Dec 11, 2022
Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021".

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021. Introduction We proposed a novel model training paradi

Lucas 103 Dec 14, 2022
Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

Introduction Pytorch implementation of Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Expert. | paper Song Park1

Clova AI Research 97 Dec 23, 2022
(ICCV'21) Official PyTorch implementation of Relational Embedding for Few-Shot Classification

Relational Embedding for Few-Shot Classification (ICCV 2021) Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho [paper], [project hompage] We propose t

Dahyun Kang 82 Dec 24, 2022
[CVPR 2021] Few-shot 3D Point Cloud Semantic Segmentation

Few-shot 3D Point Cloud Semantic Segmentation Created by Na Zhao from National University of Singapore Introduction This repository contains the PyTor

null 117 Dec 27, 2022
Adaptive Prototype Learning and Allocation for Few-Shot Segmentation (CVPR 2021)

ASGNet The code is for the paper "Adaptive Prototype Learning and Allocation for Few-Shot Segmentation" (accepted to CVPR 2021) [arxiv] Overview data/

Gen Li 91 Dec 23, 2022
Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)'

SCL Introduction Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)' We evaluated our approach using two baseline

null 34 Oct 8, 2022
An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

CV Lab @ Yonsei University 35 Oct 26, 2022
The implementation of PEMP in paper "Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes"

Prior-Enhanced network with Meta-Prototypes (PEMP) This is the PyTorch implementation of PEMP. Overview of PEMP Meta-Prototypes & Adaptive Prototypes

Jianwei ZHANG 8 Oct 14, 2021
Official code release for "Learned Spatial Representations for Few-shot Talking-Head Synthesis" ICCV 2021

Official code release for "Learned Spatial Representations for Few-shot Talking-Head Synthesis" ICCV 2021

Moustafa Meshry 16 Oct 5, 2022
arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

Andrej 671 Dec 31, 2022
Listing arxiv - Personalized list of today's articles from ArXiv

Personalized list of today's articles from ArXiv Print and/or send to your gmail

Lilianne Nakazono 5 Jun 17, 2022
Arxiv harvester - Poor man's simple harvester for arXiv resources

Poor man's simple harvester for arXiv resources This modest Python script takes

Patrice Lopez 5 Oct 18, 2022
Official Implementation of Few-shot Visual Relationship Co-localization

VRC Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper project page | paper Requirements Use python >= 3.8.

null 22 Oct 13, 2022
PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

PFENet This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEE

DV Lab 230 Dec 31, 2022