Instance-level Image Retrieval using Reranking Transformers

Overview

Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021.

Abstract

Instance-level image retrieval is the task of searching in a large database for images that match an object in a query image. To address this task, systems usually rely on a retrieval step that uses global image descriptors, and a subsequent step that performs domain-specific refinements or reranking by leveraging operations such as geometric verification based on local features. In this work, we propose Reranking Transformers (RRTs) as a general model to incorporate both local and global features to rerank the matching images in a supervised fashion and thus replace the relatively expensive process of geometric verification. RRTs are lightweight and can be easily parallelized so that reranking a set of top matching results can be performed in a single forward-pass. We perform extensive experiments on the Revisited Oxford and Paris datasets, and the Google Landmarks v2 dataset, showing that RRTs outperform previous reranking approaches while using much fewer local descriptors. Moreover, we demonstrate that, unlike existing approaches, RRTs can be optimized jointly with the feature extractor, which can lead to feature representations tailored to downstream tasks and further accuracy improvements.
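The two-stage pipeline described in the abstract (global retrieval followed by reranking of the top matches in one batched pass) can be sketched roughly as follows. This is only an illustration under stated assumptions: `rerank_topk` and `toy_scorer` are hypothetical names, the tensor shapes are made up, and the scorer is a simple stand-in for a learned pairwise model, not the RRT architecture.

```python
import torch
import torch.nn.functional as F


def rerank_topk(query_global, db_global, query_local, db_local, pair_scorer, k):
    # Stage 1: rank the whole database by cosine similarity of global descriptors.
    sims = F.normalize(db_global, dim=1) @ F.normalize(query_global, dim=0)
    global_order = torch.argsort(sims, descending=True)
    candidates = global_order[:k]

    # Stage 2: score all k (query, candidate) pairs in a single batched call.
    query_batch = query_local.unsqueeze(0).expand(k, -1, -1)
    scores = pair_scorer(query_batch, db_local[candidates])
    reranked = candidates[torch.argsort(scores, descending=True)]

    # Items outside the top k keep their first-stage order.
    return torch.cat([reranked, global_order[k:]])


def toy_scorer(src_local, tgt_local):
    # Hypothetical stand-in for a learned model: for each query descriptor take
    # its best dot-product match in the candidate, then average over the query.
    sim = torch.bmm(src_local, tgt_local.transpose(1, 2))
    return sim.max(dim=2).values.mean(dim=1)


# Toy data: 4 database images, 2-D global descriptors, one local descriptor each.
query_global = torch.tensor([1.0, 0.0])
db_global = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
query_local = torch.tensor([[1.0, 0.0]])
db_local = torch.tensor([[[0.0, 1.0]], [[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 1.0]]])
ranking = rerank_topk(query_global, db_global, query_local, db_local, toy_scorer, k=2)
```

In this toy setup the first stage ranks image 0 ahead of image 1, and the second stage swaps them because image 1 has the better local match; images beyond the top 2 are left in their first-stage order.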

Software required

The code has only been tested on 64-bit Linux:

  conda create -n rrt python=3.6
  conda activate rrt
  pip install -r requirements.txt

Organization

To use the code for experiments on Google Landmarks v2 and Revisited Oxford/Paris, please refer to the folder RRT_GLD.

To use the code for experiments on Stanford Online Products, please refer to the folder RRT_SOP.

To use the code for evaluating SuperGlue on Revisited Oxford/Paris and Stanford Online Products, please refer to the repo SuperGlue.

Citing

If you find our paper/code useful, please consider citing:

@inproceedings{fwtan-instance-2021,
    author = {Fuwen Tan and Jiangbo Yuan and Vicente Ordonez},
    title = {Instance-level Image Retrieval using Reranking Transformers},
    year = {2021},
    booktitle = {International Conference on Computer Vision (ICCV)}
}
Comments
  • Trying to train Reranking Model on the SOP data but getting RuntimeError:

    Hello, I'm trying to train the Reranking Model from scratch on the SOP data but getting the below error.

    ERROR - Rerank (train) - Failed after 0:01:23!
    Traceback (most recent calls WITHOUT Sacred internals):
      File "experiment_rerank.py", line 100, in main
        metrics = eval_function()[0]
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/utils/training.py", line 199, in evaluate_rerank
        recalls_rerank, nn_dists, nn_inds = recall_function()
      File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
        return func(*args, **kwargs)
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/utils/metrics.py", line 188, in recall_at_ks_rerank
        tgt_global=None, tgt_local=current_index.to(device))
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/models/base_model.py", line 67, in forward
        logits = self.matcher(src_global=src_global, src_local=src_local, tgt_global=tgt_global, tgt_local=tgt_local)
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/models/matcher.py", line 44, in forward
        tgt_local = tgt_local.flatten(2) + self.seg_encoder(3 * src_local.new_ones((bsize, 1), dtype=torch.long)).permute(0, 2, 1) + pos_embed.flatten(2)
    RuntimeError: The size of tensor a (2551) must match the size of tensor b (3000) at non-singleton dimension 0

    opened by JaredChung 5
  • Using RRT to compute the similarity between two batches of images

    Hi, first thank you for this great repo!

    In the RRT_SOP code, the matcher computes the pairwise similarity between src_local and tgt_local, so for a batch size of BS the output of matcher(src_local, tgt_local) is a tensor of size BS. Is there an efficient way to compute the similarity of all pairs across the two batches of images, i.e. an output of size BSxBS?

    Thanks for any help!

    Elias

    opened by elias-ramzi 4
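One way to obtain a BSxBS score matrix from a matcher that only scores aligned pairs is to expand both batches so that every (source, target) combination appears exactly once, run the matcher on the expanded batches, and reshape the result. The sketch below assumes the matcher accepts two tensors of shape (B, L, D) and returns B scores; `all_pairs_scores` and `toy_matcher` are hypothetical names, and the toy matcher is a stand-in, not the RRT_SOP matcher. Memory grows with BS squared, so large batches would likely need to be processed in chunks.

```python
import torch


def all_pairs_scores(matcher, src_local, tgt_local):
    bs_src, bs_tgt = src_local.size(0), tgt_local.size(0)
    # Row-major pairing: (s0, t0), (s0, t1), ..., (s1, t0), (s1, t1), ...
    src_rep = src_local.repeat_interleave(bs_tgt, dim=0)
    tgt_rep = tgt_local.repeat(bs_src, 1, 1)
    scores = matcher(src_rep, tgt_rep)  # shape: (bs_src * bs_tgt,)
    return scores.reshape(bs_src, bs_tgt)


def toy_matcher(src, tgt):
    # Hypothetical stand-in: dot product of mean-pooled local descriptors.
    return (src.mean(dim=1) * tgt.mean(dim=1)).sum(dim=1)


torch.manual_seed(0)
src = torch.randn(3, 5, 8)  # 3 source images, 5 local descriptors of dim 8
tgt = torch.randn(4, 5, 8)  # 4 target images
pairwise = all_pairs_scores(toy_matcher, src, tgt)  # entry [i, j] scores (src[i], tgt[j])
```

Entry `[i, j]` of the result is the matcher score for source `i` against target `j`, so the diagonal recovers what the aligned-pair call would return when both batches have the same size.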
  • Can't train RRT_SOP

    I'm trying to train on SOP, but I can't run your code. I think line 84 of experiment_global.py should call train_global rather than evaluate_global. When I simply change it, a new error appears in the code that follows. How can I fix this?

    opened by Soledad-Z 2
  • Training reranker on gldv2 + superpoint + netvlad

    Hello again! I'm trying to train RRT on descriptors extracted with NetVLAD + SuperPoint on the GLDv2 dataset, and I can't get close to your results (I know these are different extraction networks, but the gap still looks strange). My best metrics on rOxM/rOxH are 46/22 when reranking the top 100 samples. I'm not using scales because SuperPoint does not extract them, and as far as I can tell that is the only change to the model. If you have any idea why this might happen, I'd be very glad to hear it!

    opened by letsmakeadeal 2
  • Question about bounding box handling in Rop experiments

    Query images in the Revisited Oxford/Paris experiments come with a bounding box, and the Oxford/Paris evaluation code should not use any part of the image outside it. A common solution is to crop the input image to the bounding box so that the outer region is never used, because the area outside the bounding box provides extra information that has a huge impact on performance. (revisitop: https://github.com/filipradenovic/revisitop/blob/master/python/example_process_images.py, line 49) (DELG: https://github.com/tensorflow/models/blob/master/research/delf/delf/python/delg/extract_features.py, line 142)

    I found that your feature extractor code is based on DELG's feature extractor, but the bounding box cropping part is commented out, even when extracting features for the Oxford/Paris datasets. (https://github.com/uvavision/RerankingTransformer/blob/main/RRT_GLD/delg_scripts/extract_features_gld.py, line 112~118) (https://github.com/uvavision/RerankingTransformer/blob/882dc70b2550dca32e63d0a2cca219a5953dc7b9/RRT_GLD/delg_scripts/extract_oxford_r50_gldv2.sh)

    I've been looking for something that does bounding box cropping, or rejects features outside the bounding box, in other parts of your code, but I couldn't find it. If your code handles bounding boxes for query images (or features), it would be appreciated if you could tell us which part does so.

    opened by sungonce 2
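The cropping step discussed in this issue can be illustrated as follows. This is a minimal sketch on an image tensor, assuming the bounding box is given as (x_min, y_min, x_max, y_max) in pixel coordinates; `crop_to_bbox` is a hypothetical helper and is not part of this repository or of the linked scripts.

```python
import torch


def crop_to_bbox(image, bbox):
    """Crop a (C, H, W) image tensor to bbox = (x_min, y_min, x_max, y_max)."""
    _, height, width = image.shape
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    # Clamp the box to the image so out-of-range coordinates cannot wrap around.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width), min(y1, height)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("bounding box does not overlap the image")
    # Rows are indexed by y, columns by x.
    return image[:, y0:y1, x0:x1]


query_image = torch.zeros(3, 100, 200)  # dummy image: 3 channels, H=100, W=200
cropped = crop_to_bbox(query_image, (10, 20, 110, 80))
```

Features would then be extracted from `cropped` rather than from the full query image, so nothing outside the box can influence the descriptors.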
  • Does RRT_SOP train matcher?

    Thanks for providing such good work.

    I'm trying to train RRT on the MSLS dataset. I'm currently tracing the SOP code and would like to modify it to be compatible with MSLS. However, it seems that the matcher is not trained during the training process. I found that pairwise_matching is the only variable that enables the matcher, but I cannot find where it is set to True. Am I missing anything?

    Could you advise which parts I need to modify to train on other datasets such as MSLS?

    Thx

    opened by macTracyHuang 1
  • Functions without arguments

    Good evening,

    I noticed that some functions, such as get_loaders(), are defined with parameters but are called without any, because of capture_functions. Where can I find the config file that contains the default parameter values?

    opened by mhmd-mst 1
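The pattern in question, where a function declares parameters but is called without them, can be illustrated with a small pure-Python sketch. It only mimics the general idea of filling in missing arguments from a configuration; it is not the mechanism used by this repository or by any experiment framework, and `CONFIG`, `capture`, and this version of `get_loaders` are hypothetical stand-ins.

```python
import functools
import inspect

# Hypothetical default values; in a real project these would come from the
# experiment configuration rather than a module-level dict.
CONFIG = {"batch_size": 32, "num_workers": 4}


def capture(func):
    """Fill any argument the caller did not pass from CONFIG, matched by name."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind_partial(*args, **kwargs)
        for name in signature.parameters:
            if name not in bound.arguments and name in CONFIG:
                bound.arguments[name] = CONFIG[name]
        return func(*bound.args, **bound.kwargs)

    return wrapper


@capture
def get_loaders(batch_size, num_workers):
    return {"batch_size": batch_size, "num_workers": num_workers}


defaults = get_loaders()               # both values taken from CONFIG
override = get_loaders(num_workers=8)  # explicit arguments take precedence
```

With this kind of design, the default values live wherever the configuration is declared, not at the call site, which is why the calls appear to have no arguments.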
  • Using RRT for different datasets

    Hello, I am working on datasets for visual geolocalization and want to use RRT on them. Is it possible to use descriptors other than DELG? Does the variable src_positions refer to keypoints, and is src_masks the attention mask? If so, what is the variable attention, and why wasn't it used?

    opened by mhmd-mst 1
  • Cannot reproduce the results of alphaQE+RRT

    Hello, I tried to reproduce the results shown in your ICCV paper but cannot get consistent results. I have checked all the hyperparameters to ensure they are consistent with those in the paper. Do you have any idea what might cause this?

    opened by yuyouxixi 1
  • <SEP>?

    Hello, thanks for sharing this great work. I wonder what the purpose of the <SEP> token is. I didn't find any explanation in your paper. Could you please explain it?

    opened by ZihaoH 1
  • Question about masking input tokens

    Hello! Can you please explain why you are masking the global descriptors for pairs here, in RRT_GLD/models/matcher.py?

            input_feats = torch.cat([cls_embed, src_global, src_local, sep_embed, tgt_global, tgt_local], 1).permute(1,0,2)
            input_mask = torch.cat([
                src_local.new_zeros((bsize, 2), dtype=torch.bool),
                src_mask,
                src_local.new_zeros((bsize, 2), dtype=torch.bool),
                tgt_mask
            ], 1)
            logits = self.encoder(input_feats, src_key_padding_mask=input_mask)
    
    opened by letsmakeadeal 0
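As background for the snippet in this issue: PyTorch's key padding masks treat `True` as "ignore this position" and `False` as "attend to it", so boolean zeros created with `new_zeros(..., dtype=torch.bool)` leave the corresponding tokens visible. The sketch below checks this behaviour with `torch.nn.MultiheadAttention`; the tensor sizes and variable names are illustrative assumptions and are unrelated to the RRT configuration.

```python
import torch

torch.manual_seed(0)
embed_dim, num_heads, seq_len = 8, 2, 5
attention = torch.nn.MultiheadAttention(embed_dim, num_heads)  # inputs are (L, N, E)

tokens = torch.randn(seq_len, 1, embed_dim)

# False = visible token, True = padded token that attention must ignore.
padding_mask = torch.zeros(1, seq_len, dtype=torch.bool)
padding_mask[0, 3:] = True

query = tokens[:3]  # only the visible positions act as queries
with torch.no_grad():
    out_a, weights = attention(query, tokens, tokens, key_padding_mask=padding_mask)

    # Overwrite the padded positions: the output should not change.
    altered = tokens.clone()
    altered[3:] = torch.randn(2, 1, embed_dim)
    out_b, _ = attention(query, altered, altered, key_padding_mask=padding_mask)

same_output = torch.allclose(out_a, out_b)
masked_weight = weights[0, :, 3:].abs().sum().item()
```

Here `same_output` confirms that the `True` positions have no influence on the result, and `masked_weight` is the total attention mass assigned to them.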
Owner
UVA Computer Vision