The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

Ronnie_IIAU

Last update: Dec 22, 2022

Related tags

Overview

SGRAF

PyTorch implementation for AAAI2021 paper of “Similarity Reasoning and Filtration for Image-Text Matching”.

It is built on top of the SCAN and Cross-modal_Retrieval_Tutorial.

We have released two versions of SGRAF: Branch main for python2.7; Branch python3.6 for python3.6.

Introduction

The framework of SGRAF:

The updated results (Better than the original paper)

Dataset	Module	Sentence retrieval			Image retrieval
Dataset	Module	R@1	R@5	R@10	R@1	R@5	R@10
Flick30k	SAF	75.6	92.7	96.9	56.5	82.0	88.4
	SGR	76.6	93.7	96.6	56.1	80.9	87.0
	SGRAF	78.4	94.6	97.5	58.2	83.0	89.1
MSCOCO1k	SAF	78.0	95.9	98.5	62.2	89.5	95.4
	SGR	77.3	96.0	98.6	62.1	89.6	95.3
	SGRAF	79.2	96.5	98.6	63.5	90.2	95.8
MSCOCO5k	SAF	55.5	83.8	91.8	40.1	69.7	80.4
	SGR	57.3	83.2	90.6	40.5	69.6	80.3
	SGRAF	58.8	84.8	92.1	41.6	70.9	81.5

Requirements

We recommended the following dependencies for Branch main.

Python 2.7
PyTorch (>=0.4.1)
NumPy (>=1.12.1)
TensorBoard
Punkt Sentence Tokenizer:

import nltk
nltk.download()
> d punkt

Download data and vocab

We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:

wget https://scanproject.blob.core.windows.net/scan-data/data.zip
wget https://scanproject.blob.core.windows.net/scan-data/vocab.zip

Pre-trained models and evaluation

Modify the model_path, data_path, vocab_path in the evaluation.py file. Then run evaluation.py:

python evaluation.py

Note that fold5=True is only for evaluation on mscoco1K (5 folders average) while fold5=False for mscoco5K and flickr30K. Pretrained models and Log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.

Training new models from scratch

Modify the data_path, vocab_path, model_name, logger_name in the opts.py file. Then run train.py:

For MSCOCO:

(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:

(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF

Reference

If SGRAF is useful for your research, please cite the following paper:

@inproceedings{Diao2021SGRAF,
  title={Similarity Reasoning and Filtration for Image-Text Matching},
  author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
  booktitle={AAAI},
  year={2021}
}

License

Apache License 2.0.
If any problems, please contact me at ([email protected]) or ([email protected]).

Comments

About the loss

Hello, I have a problem and want to ask for help. I tried to run your code, but I found that the loss of the model does not decrease and the evaluation index R1,R5,R10 does not increase and the index medr, meanr is very large

opened by Liujin21 10
run

i have set the virtual environment of my pycharm as same as torch1.2.0 and python2.7. But i can't run through after my train when i want to evaluate the model. it shows indexerror with indices for array. Is something wrong with my environment or my data or some other probability? I tried to solve it with my friends but we found that the code is totally correct. But it still can't successfully eval on my computer. This problem had been haunted me for few days. Thank u.

opened by rabrabrab 5
How to ensumble models

Thanks for your excellent work, I am sincerely appreciative. In the paper, I saw you train SGR and SAF model seperately, but I want to know how can I get the result of SGRAF? I didn't find how to get the result of SGRAF in your Github. Is it to add the similarity obtained by the SGR and SAF models on the test set? I'm looking forward to your reply, thank you again from the bottom of my heart

opened by gzwo-O 2
evaluate.py does not run with models provided - get error from numpy array copy

Attempted to run evaluation.py using provided MS Coco models.

cpu (ie non-gpu) version of Python 3.6 branch

In evaluation.py, line 103, appears to be attempting to insert a record into array img_embs

Specific line is

img_embs[ids] = img_emb.data.cpu().numpy().copy()

This line throws an Error:

IndexError: too many indices for array

Using provided code (evaluation.py) and MS Coco models, ids appears to be a tuple which prints as a list of integers

img_emb.data is a Tensor object, so the assignment to a numpy array img_embs appears to be an attempted conversion of a Tensor to a numpy array, however, the actual intent of the assignment and a work-around for the Error is unclear

Its documented code and the results from the associated paper are good, but unfortunately the provided models are not working, and do not allow the paper results to be duplicated

Please publish an update to the code which works with provided MS Coco models

I am out of my depth in attempting to update this code.

opened by jkent42 2
visualization problem

I am very interested in your work (SGRAF). I now encounter a visualization problem, that is, how to visualize the results of image retrieval text and the results of text retrieval image (as shown in the figure below). If it's convenient for you, please provide the code of visualization results. Thank you very much !

opened by lDarryll 1
I downloaded your code and ran the code according to the default hyperparameters, but found that the loss did not decrease. Do I need to do some other operations before running this code?

I downloaded your code and ran the code according to the default hyperparameters, but found that the loss did not decrease. Do I need to do some other operations before running this code?

opened by BeiMingYy 1
About Similarity Pyramid

Hi, Could you please me how to understand the Similarity Pyramid(and Pyramid Spatial Window, different Pyramid Levels, etc.) which used in obtaining image feature that your paper memtioned??? In your released code, it was only region features extracted by Faster-RCNN(Bottom-up Attention)just as the Pioneers' work? I'm confused about that. Thank you in Advance! :)

opened by Tclz 1
Please Help with Training -Thank you

Dear Professor Hiawen Diao, I am sorry for troubling you. Your help so far has been extraordinary. I am sincerely appreciative. I have been able to replicate your results. My supervisor said that is good. He asked if i can train my own model.

I have trouble with your data, and it may be my fault. If you could please talk about my data question. When you train a machine learning model, my understanding is that you need to include relevant documents in the training data. For example, include 4 of 5 coco captions for training, and hold one back for validation. If my understanding is correct, then I don't see where this happens in the code.

This is the last piece that my supervisor has asked for. Any clarification you can provide would be very helpful.

Thank you

Kent

opened by jkent42 0

Owner

Ronnie_IIAU

GitHub

Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'

Filtration Curves for Graph Representation This repository provides the code from the KDD'21 paper Filtration Curves for Graph Representation. Depende

Machine Learning and Computational Biology Lab

16 Oct 16, 2022

Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch >= 1.2.0 P

16 Dec 14, 2022

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

912 Jan 8, 2023

Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

Sharpened Cosine Similarity A layer implementation for PyTorch Install At your c

203 Nov 30, 2022

[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

DSM The source code for paper Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion Project Website; Datasets li

114 Oct 16, 2022

Code for KHGT model, AAAI2021

KHGT Code for KHGT accepted by AAAI2021 Please unzip the data files in Datasets/ first. To run KHGT on Yelp data, use python labcode_yelp.py For Movi

32 Nov 29, 2022

A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching.

LPM_Python A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching. The code is established ac

11 Nov 7, 2022

Local Similarity Pattern and Cost Self-Reassembling for Deep Stereo Matching Networks

Local Similarity Pattern and Cost Self-Reassembling for Deep Stereo Matching Networks Contributions A novel pairwise feature LSP to extract structural

31 Dec 6, 2022

Code for C2-Matching (CVPR2021). Paper: Robust Reference-based Super-Resolution via C2-Matching.

C2-Matching (CVPR2021) This repository contains the implementation of the following paper: Robust Reference-based Super-Resolution via C2-Matching Yum

151 Dec 26, 2022

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps Here is the code for ssbassline model. We also provide OCR results/features/mode

51 Nov 18, 2022

Implementation for our AAAI2021 paper (Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction).

SSAN Introduction This is the pytorch implementation of the SSAN model (see our AAAI2021 paper: Entity Structure Within and Throughout: Modeling Menti

69 Nov 15, 2022

The code of “Similarity Reasoning and Filtration for Image-Text Matching” [AAAI2021]

Related tags

Overview

SGRAF

Introduction

Requirements

Download data and vocab

Pre-trained models and evaluation

Training new models from scratch

Reference

License

Comments

Owner

Ronnie_IIAU

Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'

Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

Code for KHGT model, AAAI2021

A Python implementation of the Locality Preserving Matching (LPM) method for pruning outliers in image matching.

Local Similarity Pattern and Cost Self-Reassembling for Deep Stereo Matching Networks

Code for C2-Matching (CVPR2021). Paper: Robust Reference-based Super-Resolution via C2-Matching.

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]

Implementation for our AAAI2021 paper (Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction).

Implementation of our paper 'RESA: Recurrent Feature-Shift Aggregator for Lane Detection' in AAAI2021.

Official implementation of "Dynamic Anchor Learning for Arbitrary-Oriented Object Detection" (AAAI2021).

Out-of-Town Recommendation with Travel Intention Modeling (AAAI2021)

Code for the Image similarity challenge.

Database Reasoning Over Text project for ACL paper

Author: Wenhao Yu ([email protected]). ACL 2022. Commonsense Reasoning on Knowledge Graph for Text Generation

This project uses Template Matching technique for object detecting by detection of template image over base image.

This project uses Template Matching technique for object detecting by detection of template image over base image