Official implementation of the AAAI 2022 paper "Learning Token-based Representation for Image Retrieval"


Token: Token-based Representation for Image Retrieval

PyTorch training code for Token-based Representation for Image Retrieval. We propose a joint local feature learning and aggregation framework, obtaining 82.3 mAP on ROxf under the Medium evaluation protocol. Inference in 50 lines of PyTorch.

[Figure: overview of the Token framework]

What it is. Given an image, Token first uses a CNN and a Local Feature Self-Attention (LFSA) module to extract local features $F_c$. These are then tokenized into $L$ visual tokens with spatial attention. A refinement block further enhances the obtained visual tokens with self-attention and cross-attention. Finally, Token concatenates all the visual tokens to form a compact global representation $f_g$ and reduces its dimension. The aggregated global feature is discriminative and efficient.
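
For illustration, here is a minimal PyTorch sketch of this pipeline. All module names, feature sizes, and the number of tokens below are placeholders chosen for readability, not the actual implementation in networks.py, and the cross-attention branch of the refinement block is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionTokenizer(nn.Module):
    # Tokenize the local feature map F_c into L visual tokens via spatial attention.
    def __init__(self, dim, num_tokens):
        super().__init__()
        self.attn = nn.Conv2d(dim, num_tokens, kernel_size=1)  # one attention map per token

    def forward(self, fc):                                   # fc: (B, C, H, W)
        attn = self.attn(fc).flatten(2).softmax(dim=-1)      # (B, L, H*W) spatial weights
        feats = fc.flatten(2).transpose(1, 2)                # (B, H*W, C)
        return torch.bmm(attn, feats)                        # (B, L, C) visual tokens

class RefinementBlock(nn.Module):
    # Self-attention refinement of the tokens (cross-attention with the original
    # feature map, as described in the paper, is left out of this sketch).
    def __init__(self, dim, nhead=8, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, nhead, dropout=dropout)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                               # tokens: (B, L, C)
        x = tokens.transpose(0, 1)                           # (L, B, C) layout for MultiheadAttention
        out, _ = self.attn(x, x, x)
        return self.norm(tokens + out.transpose(0, 1))

class TokenHead(nn.Module):
    # Backbone features -> visual tokens -> refinement -> compact global descriptor f_g.
    def __init__(self, dim=2048, num_tokens=4, out_dim=1024):
        super().__init__()
        self.tokenizer = SpatialAttentionTokenizer(dim, num_tokens)
        self.refine = RefinementBlock(dim)
        self.reduce = nn.Linear(num_tokens * dim, out_dim)   # dimension reduction

    def forward(self, fc):
        tokens = self.refine(self.tokenizer(fc))             # (B, L, C)
        fg = self.reduce(tokens.flatten(1))                  # concatenate tokens, then reduce
        return F.normalize(fg, dim=-1)                       # L2-normalized global descriptor

feats = torch.randn(2, 2048, 16, 16)                         # stand-in for CNN + LFSA local features
print(TokenHead()(feats).shape)                              # torch.Size([2, 1024])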

About the code. Token is very simple to implement and experiment with. The training code follows this idea: it is not a library, but simply a train.py that imports model and criterion definitions and runs standard training loops.

mAP performance of the proposed model

We provide results of Token. mAP is computed with the Medium and Hard evaluation protocols. The pretrained model will be released soon.

[Table: mAP of Token on ROxf and RPar under the Medium and Hard protocols]

Requirements

  • Python 3
  • CUDA 11.0
  • PyTorch 1.8.0 (tested), torchvision 0.9.0
  • numpy
  • matplotlib

Usage - Representation learning

There are no extra compiled components in Token and package dependencies are minimal, so the code is very simple to use. We provide instructions on how to install dependencies via conda. Install PyTorch 1.8.0 and torchvision 0.9.0:

conda install -c pytorch pytorch torchvision
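
If you want to pin the exact versions listed in the requirements (plus the remaining Python dependencies), something like the following should work; adjust the cudatoolkit version to your driver, and treat these pins as a suggestion rather than an official environment file:

conda install -c pytorch pytorch=1.8.0 torchvision=0.9.0 cudatoolkit=11.0
conda install numpy matplotlib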

Data preparation

Before going further, please check out the Google Landmarks Dataset v2 GitHub repository. We use their training images. If you use this code in your research, please also cite their work!

Download and extract the Google Landmarks Dataset v2 train and val images with annotations from https://github.com/cvdfoundation/google-landmark.

Download the ROxf and RPar datasets with annotations. We expect the directory structure to be the following:

/data/
  ├─ Google-landmark-v2   # train images
  │   ├─ train.csv
  │   ├─ train_clean.csv
  │   ├─ GLDv2-clean-train-split.pkl
  │   ├─ GLDv2-clean-val-split.pkl
  │   └─ train
  └─ test                 # test images
      ├─ roxford5k
      │   ├─ jpg
      │   └─ gnd_roxford5k.pkl
      └─ rparis6k
          ├─ jpg
          └─ gnd_rparis6k.pkl
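
Before launching training, it can be useful to sanity-check that the layout above is in place. The small script below is not part of the repository; it simply walks the expected paths (assuming the data root is /data) and reports anything missing:

import os

DATA_ROOT = "/data"
EXPECTED = [
    "Google-landmark-v2/train.csv",
    "Google-landmark-v2/train_clean.csv",
    "Google-landmark-v2/GLDv2-clean-train-split.pkl",
    "Google-landmark-v2/GLDv2-clean-val-split.pkl",
    "Google-landmark-v2/train",
    "test/roxford5k/jpg",
    "test/roxford5k/gnd_roxford5k.pkl",
    "test/rparis6k/jpg",
    "test/rparis6k/gnd_rparis6k.pkl",
]

missing = [p for p in EXPECTED if not os.path.exists(os.path.join(DATA_ROOT, p))]
if missing:
    print("Missing entries under", DATA_ROOT)
    for p in missing:
        print("  ", p)
else:
    print("Data layout looks complete.")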

Training

To train Token on a single node with 4 GPUs for 30 epochs, run:

sh experiment.sh

A single epoch takes about 2.5 hours, so 30-epoch training takes around 3 days on a single machine with 4 RTX 3090 Ti cards.

We train Token with SGD, setting the learning rate to 0.01. The refinement block is trained with a dropout of 0.1, and a linearly decaying scheduler gradually decays the learning rate to 0 when the desired number of steps is reached.
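
As a rough, self-contained sketch of this schedule (the real settings live in experiment.sh and train.py; the model, momentum, and step count below are placeholders, not values from the repository):

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(1024, 1000)        # placeholder for the actual network
total_steps = 30 * 1000                    # epochs * iterations per epoch (placeholder)

optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)   # momentum is an assumption
# linearly decay the learning rate from 0.01 to 0 over total_steps
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps))

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 1024)).pow(2).mean()        # dummy loss standing in for the criterion
    loss.backward()
    optimizer.step()
    scheduler.step()                                         # step the LR schedule once per iteration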

Evaluation

To evaluate on ROxf and RPar with a single GPU, run:

python test.py

and get results like the following:

>> Test Dataset: roxford5k *** local aggregation >>
>> mAP Medium: 82.28, Hard: 66.57

>> Test Dataset: rparis6k *** local aggregation >>
>> mAP Medium: 89.34, Hard: 78.56

We found that performance changes slightly when the test environment differs. For example, on a GeForce RTX 2080 Ti with CUDA 10.2, PyTorch 1.7.1, and torchvision 0.8.2, the test performance is:

>> Test Dataset: roxford5k *** local aggregation >>
>> mAP Medium: 81.36, Hard: 62.09

>> Test Dataset: rparis6k *** local aggregation >>
>> mAP Medium: 90.19, Hard: 80.16
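
Independently of the exact environment, ranking on ROxf/RPar reduces to comparing L2-normalized global descriptors; the snippet below is only an illustration with random features (the real descriptors come from the trained model, and mAP is then computed against the ground-truth gnd_*.pkl files):

import torch
import torch.nn.functional as F

# random stand-ins for 1024-D global descriptors of 70 queries and 4993 database images
query_feats = F.normalize(torch.randn(70, 1024), dim=1)
db_feats = F.normalize(torch.randn(4993, 1024), dim=1)

scores = query_feats @ db_feats.t()               # cosine similarity (features are L2-normalized)
ranks = scores.argsort(dim=1, descending=True)    # per-query ranking of database images
print(ranks[:, :10])                              # top-10 retrieved indices for each query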

Qualitative examples

Selected qualitative examples of different methods. The top-11 results are shown in the figure. Green bounding boxes denote true positives and red bounding boxes denote false positives.

[Figure: qualitative retrieval examples]

Comments
  • About the training logger and best_checkpoint.pth

    Hi, authors. I am impressed by your great work. I am now trying to run your code with the same configuration (4 GPUs, the same parameters, and the same training dataset). In the first epoch the loss gradually decreases from 18.3 to 17.5, but I got a sharp rise in the training loss at the second epoch (from 18 to 200+). The loss gradually went to 7000+ in the fourth epoch. In addition, the top-5 error is always more than 99%. To debug this error, I am now generating more logs to investigate; maybe it is caused by an invalid gradient. If possible, could you please share your training_logger and val_logger results with me? And I would appreciate it if you could share the "Best_checkpoint.pth". Thank you!

    What I have changed: in configdataset.py, GLDv2_build_train_dataset(csv_path, clean_csv_path, image_dir, output_directory, True, 0.2, 0), with the flag changed from False to True.

    And the val_logger plot is using the same data as the training logger.

    opened by Barry-Liang 7
  • Can't get the accuracy mentioned in the paper

    I used experiment.sh to train the model, but the model loss does not drop properly. Tested on the open-source test set, the results are as follows:

    70/70 done...
    >> Test Dataset: roxford5k *** Feature Type: Token >>
    >> mAP Eeay: 1.0, Medium: 1.78, Hard: 0.91
    >> mP@k[1, 5, 10] Easy: [0. 1.18 1.03], Medium: [1.43 2. 2.14], Hard: [1.43 0.86 1.14]
    70/70 done...
    >> Test Dataset: rparis6k *** Feature Type: Token >>
    >> mAP Eeay: 2.25, Medium: 4.41, Hard: 2.52
    >> mP@k[1, 5, 10] Easy: [1.43 3.43 3.29], Medium: [2.86 7.43 6.71], Hard: [1.43 4.57 3.86]

    Why is the score so low?

    opened by TouchSkyWf 6
  • Performance difference between papers and released weights

    Hello @MCC-WH Thank you for sharing the good code.

    I have noticed a huge performance difference between your paper and the released weights, especially on the +1M experiments.

    |                             | ROxf-M | +1M   | RPar-M | +1M   | ROxf-H | +1M   | RPar-H | +1M   |
    |:---------------------------:|:------:|:-----:|:------:|:-----:|:------:|:-----:|:------:|:-----:|
    | TOKEN-R50-Paper             | 79.42  | 73.68 | 88.67  | 77.56 | 59.48  | 49.55 | 76.49  | 58.92 |
    | TOKEN-R50-Released weights  | 79.79  | 67.36 | 88.08  | 74.33 | 62.68  | 45.70 | 75.49  | 52.68 |
    | TOKEN-R101-Paper            | 82.28  | 75.64 | 89.34  | 79.76 | 66.57  | 51.37 | 78.56  | 61.56 |
    | TOKEN-R101-Released weights | 82.16  | 70.58 | 89.40  | 77.24 | 65.75  | 47.46 | 78.44  | 56.81 |

    The same goes for the DELG you reproduced.

    |                                       | ROxf-M | +1M   | RPar-M | +1M   | ROxf-H | +1M   | RPar-H | +1M   |
    |:-------------------------------------:|:------:|:-----:|:------:|:-----:|:------:|:-----:|:------:|:-----:|
    | DELG-R101-Reproduced-Paper            | 78.24  | 68.36 | 88.21  | 75.83 | 60.15  | 44.41 | 76.15  | 52.40 |
    | DELG-R101-Reproduced-Released weights | 78.55  | 66.02 | 88.58  | 73.65 | 60.89  | 41.75 | 76.05  | 51.46 |

    Could you please check whether the performance reported in the paper contains errors? If the paper's numbers are correct, could you please share the model weights (R50-Token, R101-Token) used in the paper so its performance can be verified? It should also be noted that this may affect the reviews/results of many (landmark) image retrieval papers submitted or to be submitted to conferences/journals.

    opened by sungonce 5
  • Which model is used as RetrievalNet?

    https://github.com/MCC-WH/Token/blob/928a6ccb260a1f3f02b3dcf5b3490c3fcd7d3050/train.py#L100 RetrievalNet(args.classifier_num) does not refer to any model defined in networks.py; is this correct?

    opened by milliema 2
  • Have you removed the overlapping classes between GLD and Oxford/Paris?

    Hi, authors. Impressive paper and great results! Have you removed the overlapping classes between GLDv2 and Oxford/Paris during training? The GeM and DELF papers claim that they removed the overlapping classes, but I did not find this claim in your paper. I recognize your contribution whether or not you removed the overlap; your model outperforms DELG by a large margin while being trained on the same GLDv2. I am just asking out of interest. Thank you!

    opened by anguoyuan 2
  • The network architecture in the paper differs from that in the open-source code

    Thanks to the authors! But I still have a question. In the paper, Table 5 is presented specifically to argue that the learned tokenizer is weaker than the attention-based tokenizer, yet the code contains a query term, and this query is itself learned, so the final attention map is also obtained from the query's transformation of the original features. Is there an inconsistency between the paper and the code here?

    opened by FunkyKoki 1
  • BatchNorm1d running statistics become NaN

    Hello, authors. BatchNorm1d is used in two places in your code. During training, I found that as the number of iterations increases, its running statistics become NaN for an unknown reason. Since BN's running statistics do not affect the training process, training looks completely normal, but problems appear at test time. According to my check, among the saved model parameters, only the BatchNorm1d running statistics contain NaN. Have you encountered this kind of situation? (I can confirm that the input is not all zeros and the batch size is not 1.)

    opened by FunkyKoki 1
Owner
Hui Wu
Department of Electronic Engineering and Information Science, University of Science and Technology of China