More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval, CVPR 2021.
Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song, “More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.
Download
SketchX_ShoeV2/ChairV2 Dataset
Abstract
A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is data scarcity -- model performance is largely bottlenecked by the lack of sketch-photo pairs. Whilst the number of photos can be easily scaled, each corresponding sketch still needs to be individually produced. In this paper, we aim to mitigate such an upper bound on sketch data, and study whether unlabelled photos alone (of which there are many) can be cultivated for performance gains. In particular, we introduce a novel semi-supervised framework for cross-modal retrieval that can additionally leverage large-scale unlabelled photos to account for data scarcity. At the centre of our semi-supervision design is a sequential photo-to-sketch generation model that aims to generate paired sketches for unlabelled photos. Importantly, we further introduce a discriminator-guided mechanism to guard against unfaithful generation, together with a distillation loss based regularizer to provide tolerance against noisy training samples. Last but not least, we treat generation and retrieval as two conjugate problems, where a joint learning procedure is devised for each module to mutually benefit from the other. Extensive experiments show that our semi-supervised model yields a significant performance boost over state-of-the-art supervised alternatives, as well as existing methods that can exploit unlabelled photos for FG-SBIR.
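To make the retrieval side of the idea concrete, below is a minimal PyTorch-style sketch of one training step on unlabelled photos and their generated pseudo sketches. It is our own illustration under stated assumptions, not the authors' released code: all names (semi_supervised_retrieval_step, retrieval_net, teacher_net, discriminator, margin, lambda_kd) are hypothetical, and the triplet/distillation forms are placeholders for the paper's losses.

import torch
import torch.nn.functional as F

def semi_supervised_retrieval_step(retrieval_net, teacher_net, discriminator,
                                   photos_unlab, pseudo_sketches,
                                   margin=0.2, lambda_kd=1.0):
    # Embed generated sketches (anchors) and their source photos (positives)
    # into the joint retrieval space.
    a = F.normalize(retrieval_net(pseudo_sketches), dim=-1)
    p = F.normalize(retrieval_net(photos_unlab), dim=-1)
    # Negatives: roll the photo batch so each anchor meets a mismatched photo.
    n = p.roll(shifts=1, dims=0)

    # Instance-wise weights from the discriminator down-weight unfaithful sketches.
    with torch.no_grad():
        w = discriminator(pseudo_sketches).squeeze(-1)   # shape (B,)

    triplet = F.relu(margin + (a - p).pow(2).sum(-1) - (a - n).pow(2).sum(-1))
    loss_triplet = (w * triplet).mean()

    # Distillation: keep photo embeddings close to a frozen teacher trained on
    # labelled pairs only, for tolerance against noisy pseudo pairs.
    with torch.no_grad():
        t = F.normalize(teacher_net(photos_unlab), dim=-1)
    loss_kd = F.mse_loss(p, t)

    return loss_triplet + lambda_kd * loss_kd

# Hypothetical usage with toy encoders (64-dim vectors stand in for images/sketches).
enc = torch.nn.Linear(64, 32)
disc = torch.nn.Sequential(torch.nn.Linear(64, 1), torch.nn.Sigmoid())
loss = semi_supervised_retrieval_step(enc, enc, disc,
                                      torch.randn(8, 64), torch.randn(8, 64))
loss.backward()

The weighting term reflects the discriminator-guided mechanism (unfaithful generations contribute less), while the distillation term reflects the regularizer that tolerates noisy training samples.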
Outline
Figure: Our proposed method additionally leverages large-scale photos without any manually labelled paired sketches to improve FG-SBIR performance. Moreover, we show that the two conjugate processes, photo-to-sketch generation and fine-grained SBIR, can improve each other through joint training.
Joint Architecture
Figure: Our framework: an FG-SBIR model leverages large-scale unlabelled photos, alongside labelled pairs, using a sequential photo-to-sketch generation model. Discriminator-guided instance-wise weighting and a distillation loss are used to guard against noisy generated data. Simultaneously, the photo-to-sketch generation model learns by taking rewards from the FG-SBIR model and the discriminator via policy gradient (over both labelled and unlabelled data), together with a supervised VAE loss over the labelled data. Note that rasterization (vector-to-raster conversion) is a non-differentiable operation.
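Because rasterization blocks gradient flow from the FG-SBIR model and the discriminator back to the stroke-level generator, the generator update has to go through a policy-gradient estimator. The snippet below is a minimal, assumption-laden sketch of such a REINFORCE-style surrogate loss; the function name, the reward mixing weight alpha, the baseline, and the toy usage are all hypothetical and do not reproduce the paper's exact reward design.

import torch

def generator_policy_gradient_loss(stroke_log_probs, reward_retrieval, reward_disc,
                                   baseline=None, alpha=0.5):
    # stroke_log_probs : (B, T) log-probabilities of the sampled stroke actions.
    # reward_retrieval : (B,) retrieval reward, e.g. similarity of the rasterized
    #                    sketch to its source photo under the FG-SBIR model.
    # reward_disc      : (B,) discriminator "realness" score of the rasterized sketch.
    reward = alpha * reward_retrieval + (1.0 - alpha) * reward_disc   # shape (B,)
    if baseline is not None:
        reward = reward - baseline                                     # variance reduction
    # Gradients flow only through the log-probabilities, so the
    # non-differentiable rasterization step is bypassed.
    return -(reward.detach().unsqueeze(1) * stroke_log_probs).sum(dim=1).mean()

# Hypothetical usage: random logits stand in for the generator's per-step action
# distribution (batch of 4 sketches, 25 steps, 8 candidate actions per step).
logits = torch.randn(4, 25, 8, requires_grad=True)
logp = torch.log_softmax(logits, dim=-1).max(dim=-1).values            # shape (4, 25)
loss = generator_policy_gradient_loss(logp, torch.rand(4), torch.rand(4), baseline=0.5)
loss.backward()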
Examples
Figure: Qualitative results of our photo-to-sketch generation process, where each sketch is shown with its attention map at progressive drawing steps.
Citation
If you find this article useful in your research, please consider citing:
@InProceedings{semi-fgsbir,
author = {Ayan Kumar Bhunia and Pinaki Nath Chowdhury and Aneeshan Sain and Yongxin Yang and Tao Xiang and Yi-Zhe Song},
title = {More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}