Pyramid Grafting Network for One-Stage High Resolution Saliency Detection. CVPR 2022

CVTEAM

Last update: Dec 5, 2022

Related tags

Deep Learning salient-object-detection

Overview

PGNet

Pyramid Grafting Network for One-Stage High Resolution Saliency Detection, CVPR 2022 (arXiv 2204.05041)

Abstract

Recent salient object detection (SOD) methods based on deep neural networks have achieved remarkable performance. However, most existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between the sampling depth and the receptive field size. To resolve this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), which uses transformer and CNN backbones to extract features from images at different resolutions independently and then grafts the features from the transformer branch onto the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by features from the different sources during the decoding process. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM, helping the network better interact with the attention from the different models. We contribute a new Ultra-High-Resolution Saliency Detection dataset, UHRSD, containing 5,920 images at 4K-8K resolutions. To our knowledge, it is the largest dataset in both quantity and resolution for the high-resolution SOD task, and it can be used for training and testing in future research. Extensive experiments on UHRSD and widely used SOD datasets demonstrate that our method achieves superior performance compared to state-of-the-art methods.
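The attention-based grafting idea in the abstract can be illustrated with a generic scaled dot-product cross-attention, where tokens from one branch attend to tokens from the other. This is only a minimal, dependency-free sketch and not the authors' CMGM: the function names, the toy token values, and the choice of which branch supplies queries versus keys and values are all assumptions made for illustration. The paper and the released code define the actual module.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: tokens from one branch (assumed here: CNN features)
    keys, values: tokens from the other branch (assumed here: transformer features)
    Returns (output tokens, attention matrix).
    """
    scale = math.sqrt(len(queries[0]))
    attention = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attention.append(softmax(scores))

    dim = len(values[0])
    outputs = [
        [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        for weights in attention
    ]
    return outputs, attention


# Hypothetical toy features: 3 CNN tokens and 2 transformer tokens, 2 channels each.
cnn_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
transformer_tokens = [[1.0, 0.0], [0.0, 1.0]]
grafted, attention = cross_attention(cnn_tokens, transformer_tokens, transformer_tokens)
```

The returned attention matrix has one row per query token and one column per key token. A matrix of this kind is what a loss such as AGL could supervise, although the exact supervision used by PGNet is not reproduced here.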

Ultra High-Resolution Saliency Detection Dataset

Sample images from the UHRSD dataset. Best viewed by clicking and zooming in.

To relieve the lack of high-resolution datasets for SOD, we contribute the Ultra High-Resolution Saliency Detection (UHRSD) dataset with a total of 5,920 images in 4K (3840 × 2160) or higher resolution, including 4,932 images for training and 988 images for testing. A total of 5,920 images were manually selected from websites (e.g. Flickr, Pixabay) with free copyright. Our dataset is diverse in terms of image scenes, with a balance of complex and simple salient objects of various sizes.

To our knowledge, it is the largest dataset in both quantity and resolution for the high-resolution SOD task, and it can be used for training and testing in future research.
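The split sizes and the 4K threshold quoted above can be sanity-checked with a few lines of Python. This is a small sketch under stated assumptions: the constant and function names are hypothetical, and treating portrait images (height greater than width) as qualifying is an assumption, since the text only gives the landscape figure 3840 × 2160.

```python
# Figures taken from the dataset description above.
UHRSD_TRAIN = 4932
UHRSD_TEST = 988
UHRSD_TOTAL = 5920

# 4K threshold from the text; orientation handling below is an assumption.
MIN_LONG_SIDE = 3840
MIN_SHORT_SIDE = 2160


def is_ultra_high_resolution(width, height):
    """Return True if the image is at least 4K in either orientation."""
    long_side, short_side = max(width, height), min(width, height)
    return long_side >= MIN_LONG_SIDE and short_side >= MIN_SHORT_SIDE


# The training and testing splits add up to the stated total.
split_is_consistent = (UHRSD_TRAIN + UHRSD_TEST == UHRSD_TOTAL)
```

A predicate like this could be used to verify image sizes after download, for example to tell the original 4K files apart from a downsampled copy.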

Our UHRSD (Ultra High-Resolution Saliency Detection) Dataset:

We provide the original 4K version and a convenient 2K version of the UHRSD dataset for download: Google Drive

Usage

Requirements

Python 3.8
PyTorch 1.7.1
OpenCV
NumPy
Apex
Timm
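Before training, it may help to confirm that the packages listed above are importable. The sketch below uses only the standard library. The import names (`torch`, `cv2`, `numpy`, `apex`, `timm`) are assumptions about how each requirement is usually imported, and the helper name `missing_packages` is hypothetical. It does not check versions.

```python
import importlib.util

# Display name -> assumed import name (package and import names can differ).
REQUIRED = {
    "PyTorch": "torch",
    "OpenCV": "cv2",
    "NumPy": "numpy",
    "Apex": "apex",
    "Timm": "timm",
}


def missing_packages(required=REQUIRED):
    """Return the display names of requirements that cannot be found."""
    return [
        label
        for label, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


# An empty list means every listed requirement is importable.
print("Missing:", missing_packages())
```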

Train

cd src
./train.sh

We implement our method in PyTorch and conduct experiments on 2 NVIDIA 2080Ti GPUs.
We adopt pre-trained ResNet-18 and Swin-B-224 as backbone networks, which are saved in the PRE folder.
We train our method under three settings: DUTS-TR, DUTS-TR+HRSOD and UHRSD_TR+HRSOD_TR.
After training, the trained models will be saved in the MODEL folder.

Test

The trained model can be downloaded here: Google Drive

cd src
python test.py

After testing, saliency maps will be saved in the RESULT folder.

Saliency Map

Trained on DUTS-TR: Google Drive

Trained on DUTS+HRSOD: Google Drive

Trained on UHRSD+HRSOD: Google Drive

Citation

@inproceedings{xie2022pyramid,
    author    = {Xie, Chenxi and Xia, Changqun and Ma, Mingcan and Zhao, Zhirui and Chen, Xiaowu and Li, Jia},
    title     = {Pyramid Grafting Network for One-Stage High Resolution Saliency Detection},
    booktitle = {CVPR},
    year      = {2022}
}

Comments

code

Thank you for your nice work. I want to run the train_distributed.py file, but it does not work. The dataset.py file does not seem to be ready. Could you tell me what the problem is?

opened by cenchaojun 1

Ours-UH model is missing

Hey

Great work! well done! The google drive link contains the model that was trained on DUTS + HR, named Ours-DH. Can you please share the Ours-UH model as well?

opened by adar-cohen-imagenai 0

License of dataset

"A total of 5,920 images were manually selected from websites (e.g. Flickr Pixabay) with free copyright."

Could you list the licenses of each individual image and where they are from exactly? I checked a few images, but could not find where it says "free copyright", whatever that means.

000056.jpg https://www.flickr.com/photos/193480849@N05/51323576150/ "All rights reserved"

000111.jpg https://www.flickr.com/photos/mikehousephotography/51289703416/ "All rights reserved"

001539.jpg https://www.flickr.com/photos/pjvmartinsphotography/51867151692/ "All rights reserved" (Not exactly the same photo, but I think it is also there somewhere)

opened by 99991 0

SBlock size problem

Thank you for your nice work. I noticed in the paper that S3 and S4 of the Swin-Transformer branch have the same size and are not downsampled. What is the reason?

opened by Qiublack 0

Model predictions were all grey

Thank you for your nice work. I had some problems with the code, my model predictions were all grey with a little bit of texture. What might be the cause of the error?

opened by Qiublack 3

