[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers

Overview

Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021.

Abstract

Instance-level image retrieval is the task of searching in a large database for images that match an object in a query image. To address this task, systems usually rely on a retrieval step that uses global image descriptors, and a subsequent step that performs domain-specific refinements or reranking by leveraging operations such as geometric verification based on local features. In this work, we propose Reranking Transformers (RRTs) as a general model to incorporate both local and global features to rerank the matching images in a supervised fashion and thus replace the relatively expensive process of geometric verification. RRTs are lightweight and can be easily parallelized so that reranking a set of top matching results can be performed in a single forward-pass. We perform extensive experiments on the Revisited Oxford and Paris datasets, and the Google Landmark v2 dataset, showing that RRTs outperform previous reranking approaches while using much fewer local descriptors. Moreover, we demonstrate that, unlike existing approaches, RRTs can be optimized jointly with the feature extractor, which can lead to feature representations tailored to downstream tasks and further accuracy improvements.
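The two-stage pipeline described in the abstract (global retrieval, then reranking of the top matches) can be illustrated with a minimal sketch. This is not the repository's code: `rerank_top_k`, `pair_score`, and the toy scores are hypothetical names and values, and the scoring function merely stands in for a learned reranker.

```python
from typing import Callable, List, Sequence


def rerank_top_k(
    ranked_ids: Sequence[int],
    pair_score: Callable[[int], float],
    k: int,
) -> List[int]:
    """Reorder only the first k entries of an initial ranking.

    ranked_ids: database ids sorted by global-descriptor similarity (best first).
    pair_score: hypothetical stand-in for the reranker; maps a database id to a
                query-vs-candidate matching score (higher means a better match).
    """
    head, tail = list(ranked_ids[:k]), list(ranked_ids[k:])
    # sorted() is stable, so candidates with equal scores keep their global order.
    head = sorted(head, key=pair_score, reverse=True)
    return head + tail


# Toy scores standing in for reranker outputs (made-up numbers).
toy_scores = {7: 0.2, 3: 0.9, 5: 0.6, 1: 0.1, 9: 0.8}
reranked = rerank_top_k([7, 3, 5, 1, 9], toy_scores.get, k=3)
print(reranked)  # [3, 5, 7, 1, 9]
```

Only the top-k candidates are rescored, so the cost of the second stage depends on k rather than on the database size.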

Software required

The code has only been tested on 64-bit Linux:

  conda create -n rrt python=3.6
  conda activate rrt
  pip install -r requirements.txt

Organization

To use the code for experiments on Google Landmarks v2 and Revisited Oxford/Paris, please refer to the folder RRT_GLD.

To use the code for experiments on Stanford Online Products, please refer to the folder RRT_SOP.

To use the code for evaluating SuperGlue on Revisited Oxford/Paris and Stanford Online Products, please refer to the repo SuperGlue.

Citing

If you find our paper/code useful, please consider citing:

@inproceedings{fwtan-instance-2021,
    author = {Fuwen Tan and Jiangbo Yuan and Vicente Ordonez},
    title = {Instance-level Image Retrieval using Reranking Transformers},
    year = {2021},
    booktitle = {International Conference on Computer Vision (ICCV)}
}

Comments
  • Trying to train Reranking Model on the SOP data but getting RuntimeError:

    Hello, I'm trying to train the Reranking Model from scratch on the SOP data but getting the below error.

    ERROR - Rerank (train) - Failed after 0:01:23!
    Traceback (most recent calls WITHOUT Sacred internals):
      File "experiment_rerank.py", line 100, in main
        metrics = eval_function()[0]
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/utils/training.py", line 199, in evaluate_rerank
        recalls_rerank, nn_dists, nn_inds = recall_function()
      File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
        return func(*args, **kwargs)
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/utils/metrics.py", line 188, in recall_at_ks_rerank
        tgt_global=None, tgt_local=current_index.to(device))
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/models/base_model.py", line 67, in forward
        logits = self.matcher(src_global=src_global, src_local=src_local, tgt_global=tgt_global, tgt_local=tgt_local)
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/jupyter/image-search/experimentation/modelling/RerankingTransformer/RRT_SOP/models/matcher.py", line 44, in forward
        tgt_local = tgt_local.flatten(2) + self.seg_encoder(3 * src_local.new_ones((bsize, 1), dtype=torch.long)).permute(0, 2, 1) + pos_embed.flatten(2)
    RuntimeError: The size of tensor a (2551) must match the size of tensor b (3000) at non-singleton dimension 0

    opened by JaredChung 5
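The RuntimeError above is a broadcasting failure: the terms added in matcher.py line 44 disagree in their first dimension (2551 versus 3000), and neither size is 1. The sketch below re-implements the general broadcasting rule in plain Python to show when such an addition is rejected. It does not diagnose the cause in the repository; `broadcast_shape` is a hypothetical helper, and every dimension other than 2551 and 3000 is made up for illustration.

```python
from itertools import zip_longest
from typing import Tuple


def broadcast_shape(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the broadcast shape of a and b, or raise if they are incompatible."""
    out = []
    # Dimensions are compared from the last one backwards; missing ones count as 1.
    pairs = zip_longest(reversed(a), reversed(b), fillvalue=1)
    for i, (x, y) in enumerate(pairs):
        if x != y and x != 1 and y != 1:
            dim = max(len(a), len(b)) - 1 - i
            raise ValueError(f"size {x} must match size {y} at dimension {dim}")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


# Compatible: a size-1 dimension stretches to match the other tensor.
print(broadcast_shape((3000, 128, 1), (3000, 128, 500)))  # (3000, 128, 500)

# Incompatible: 2551 vs 3000 in dimension 0, as in the reported error.
try:
    broadcast_shape((2551, 128, 500), (3000, 128, 1))
except ValueError as err:
    print(err)  # size 2551 must match size 3000 at dimension 0
```

Comparing the first dimensions of the source and target batches just before the addition is one way to find where the two sizes diverge.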
  • Using RRT to compute the similarity between two batches of images

    Hi, first thank you for this great repo!

    In the code for RRT_SOP the matcher computes the pairwise similarity between src_local and tgt_local. So for a batch size of BS the output of matcher(src_local, tgt_local) is a tensor of size BS. Is there an efficient way to compute the similarity of all pairs in the two batches of images, i.e. leading to an output of size BSxBS?

    Thanks for any help!

    Elias

    opened by elias-ramzi 4
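One generic way to obtain a BSxBS score matrix from a matcher that only scores aligned pairs is to expand both batches so that every (i, j) combination appears once, run the matcher on the expanded batches, and reshape the result. The sketch below shows the indexing with plain Python lists; `all_pairs_scores` and `toy_matcher` are hypothetical names, and the toy matcher only stands in for the real model.

```python
from itertools import product
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def all_pairs_scores(
    src: Sequence[T],
    tgt: Sequence[T],
    pairwise_matcher: Callable[[List[T], List[T]], List[float]],
) -> List[List[float]]:
    """Score every (src[i], tgt[j]) pair with a matcher that scores aligned pairs."""
    # Expand both batches so that position i * len(tgt) + j holds the pair (i, j).
    pairs = list(product(src, tgt))
    src_expanded = [s for s, _ in pairs]
    tgt_expanded = [t for _, t in pairs]
    flat = pairwise_matcher(src_expanded, tgt_expanded)
    n = len(tgt)
    # Reshape the flat list of len(src) * len(tgt) scores into rows.
    return [flat[i * n:(i + 1) * n] for i in range(len(src))]


def toy_matcher(a: List[int], b: List[int]) -> List[int]:
    # Hypothetical stand-in for the matcher: one score per aligned pair.
    return [-abs(x - y) for x, y in zip(a, b)]


scores = all_pairs_scores([1, 2, 3], [1, 3], toy_matcher)
print(scores)  # [[0, -2], [-1, -1], [-2, 0]]
```

With PyTorch tensors the same expansion could presumably be written with `torch.repeat_interleave` on the source batch and `Tensor.repeat` on the target batch, followed by a reshape. The number of matcher evaluations still grows as BS², so this changes the bookkeeping rather than the cost.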
  • Can't train RRT_SOP

    I'm trying to train on SOP, but I can't run your code. I think that line 84 of experiment_global.py should call train_global rather than evaluate_global. I simply changed it, but then a new issue appears later in the code. I would like to know how to fix it.

    opened by Soledad-Z 2
  • Training reranker on gldv2 + superpoint + netvlad

    Hello again! I'm trying to train RRT on descriptors extracted with NetVLAD + SuperPoint on the GLDv2 dataset, and I can't get close to your results (I know these are different extraction networks, but it still looks strange). My best metrics on rOxM/rOxH are 46/22 when reranking the top 100 samples. I'm not using scales, because SuperPoint does not extract them, and that appears to be the only change to the model. If you have any idea why this might happen, I'd be very glad to hear it!

    opened by letsmakeadeal 2
  • Question about bounding box handling in Rop experiments

    Query images in the Revisited Oxford/Paris experiments come with a bounding box, and the Oxford/Paris evaluation code should not use any part of the image outside that box. A common solution is to crop the input image to the bounding box, because the region outside the box provides extra information that has a large impact on performance. (revisitop: https://github.com/filipradenovic/revisitop/blob/master/python/example_process_images.py, line 49) (DELG: https://github.com/tensorflow/models/blob/master/research/delf/delf/python/delg/extract_features.py, line 142)

    I found that your feature extractor code is based on DELG's feature extractor, but the bounding box cropping part is commented out, even when extracting features for the Oxford/Paris datasets. (https://github.com/uvavision/RerankingTransformer/blob/main/RRT_GLD/delg_scripts/extract_features_gld.py, lines 112-118) (https://github.com/uvavision/RerankingTransformer/blob/882dc70b2550dca32e63d0a2cca219a5953dc7b9/RRT_GLD/delg_scripts/extract_oxford_r50_gldv2.sh)

    I have looked for bounding box cropping, or rejection of features outside the bounding box, in other parts of your code, but I couldn't find it. If your code handles bounding boxes for query images (or features), I would appreciate it if you could point out which part does so.

    opened by sungonce 2
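The "feature rejection outside the bounding box" mentioned in this issue can be sketched as a simple filter over keypoint locations. This is an illustration only, not code from the repository: `inside_bbox_indices` is a hypothetical helper, and the (x_min, y_min, x_max, y_max) box layout and the sample coordinates are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]              # (x, y) keypoint location in pixels
Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def inside_bbox_indices(keypoints: Sequence[Point], bbox: Box) -> List[int]:
    """Return the indices of keypoints that lie inside the box (borders included)."""
    x_min, y_min, x_max, y_max = bbox
    return [
        i
        for i, (x, y) in enumerate(keypoints)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Made-up keypoints; the indices can then be used to select local descriptors.
keypoints = [(10.0, 10.0), (50.0, 40.0), (120.0, 80.0), (60.0, 200.0)]
kept = inside_bbox_indices(keypoints, bbox=(20.0, 20.0, 130.0, 100.0))
print(kept)  # [1, 2]
```

Filtering features after extraction is likely not equivalent to cropping the image first, since a convolutional descriptor located inside the box can still have a receptive field that covers pixels outside it.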
  • Does RRT_SOP train matcher?

    Thanks for providing such good work.

    I'm trying to train RRT on the MSLS dataset. I am currently tracing the SOP code and would like to modify it to be compatible with MSLS. However, it seems that the matcher is not trained during the training process. I found that pairwise_matching is the only variable that enables the matcher, but I cannot find where it is set to True. Am I missing anything?

    Could you advise which parts I have to modify to train on other datasets such as MSLS?

    Thanks

    opened by macTracyHuang 1
  • Functions without arguments

    Good evening,

    I noticed that some functions, such as get_loaders(), are defined with parameters but are called without any arguments because of capture_functions. Where can I find the config file that contains the parameters' default values?

    opened by mhmd-mst 1
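The traceback in the first issue mentions Sacred, so the argument-free calls are presumably Sacred captured functions: a function decorated with a `capture` decorator has its missing arguments filled in from the experiment configuration, which Sacred collects from functions decorated with `config`. The toy decorator below imitates that behaviour in plain Python to show the mechanism. It is not Sacred's implementation, and `CONFIG` and the parameter names of `get_loaders` are hypothetical.

```python
import functools
import inspect

# Hypothetical configuration values, standing in for an experiment config.
CONFIG = {"batch_size": 32, "num_workers": 4}


def capture(func):
    """Toy imitation of a config-capturing decorator (not Sacred's implementation)."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Find out which parameters the caller has already supplied.
        bound = signature.bind_partial(*args, **kwargs)
        for name in signature.parameters:
            # Fill every parameter the caller left out from the configuration.
            if name not in bound.arguments and name in CONFIG:
                kwargs[name] = CONFIG[name]
        return func(*args, **kwargs)

    return wrapper


@capture
def get_loaders(batch_size, num_workers):
    return f"batch_size={batch_size}, num_workers={num_workers}"


print(get_loaders())              # batch_size=32, num_workers=4
print(get_loaders(batch_size=8))  # batch_size=8, num_workers=4
```

Under this reading, the default values would live in whichever functions the repository registers as Sacred configs rather than in a separate config file, so searching the code for `.config` decorators is one place to start.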
  • Using RRT for different datasets

    Hello, I am working on datasets for visual geolocalization and want to use RRT on them. Is it possible to use descriptors other than DELG? Does the variable src_positions hold the keypoints, and is src_masks the attention mask? If so, what is the variable attention, and why wasn't it used?

    opened by mhmd-mst 1
  • Cannot reproduce the results of alphaQE+RRT

    Hello, I tried to reproduce the results shown in your ICCV paper, but I cannot get consistent results. [image] I have checked all the hyperparameters to ensure they are consistent with those in the paper. Do you have any idea what could cause this?

    opened by yuyouxixi 1
  • <SEP>?

    Hello, thanks for the great work. What is the purpose of the <SEP> token? I didn't find an explanation in your paper. Could you please explain it?

    opened by ZihaoH 1
  • Question about masking input tokens

    Hello! Can you please explain why you are masking the global descriptors for pairs here, in RRT_GLD/models/matcher.py?

            input_feats = torch.cat([cls_embed, src_global, src_local, sep_embed, tgt_global, tgt_local], 1).permute(1,0,2)
            input_mask = torch.cat([
                src_local.new_zeros((bsize, 2), dtype=torch.bool),
                src_mask,
                src_local.new_zeros((bsize, 2), dtype=torch.bool),
                tgt_mask
            ], 1)
            logits = self.encoder(input_feats, src_key_padding_mask=input_mask)
    
    opened by letsmakeadeal 0
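For reading the snippet quoted in this issue, it may help to recall that in PyTorch's `nn.TransformerEncoder` a boolean `src_key_padding_mask` uses True for positions to be ignored. Assuming `self.encoder` is such an encoder and that `src_mask` and `tgt_mask` follow the same convention (both are assumptions, not confirmed by the source), the `new_zeros` entries would leave the CLS/SEP and global-descriptor tokens visible. The sketch below rebuilds the mask layout with plain lists; `build_key_padding_mask` and the sample masks are hypothetical.

```python
from typing import List


def build_key_padding_mask(src_mask: List[bool], tgt_mask: List[bool]) -> List[bool]:
    """Mirror the layout of the quoted torch.cat call for a single pair.

    Token order: [CLS, src_global, src_local..., SEP, tgt_global, tgt_local...]
    True means "ignore this position" (assumed padding convention).
    """
    special_and_global = [False, False]  # corresponds to new_zeros((bsize, 2))
    return special_and_global + list(src_mask) + special_and_global + list(tgt_mask)


# Made-up example: 3 source and 3 target local slots, some of them padding.
mask = build_key_padding_mask([False, False, True], [False, True, True])
visible = [i for i, ignored in enumerate(mask) if not ignored]
print(mask)
print(visible)  # [0, 1, 2, 3, 5, 6, 7]
```

Positions 0, 1, 5 and 6 (CLS, source global, SEP, target global) are never ignored in this layout; only the padded local slots are.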
Owner
UVA Computer Vision