SGRAF
PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.
It is built on top of SCAN and Cross-modal_Retrieval_Tutorial.
We have released two versions of SGRAF: branch main for Python 2.7 and branch python3.6 for Python 3.6.
Introduction
The framework of SGRAF:
The updated results (better than those in the original paper)
| Dataset | Module | Sentence retrieval R@1 | Sentence retrieval R@5 | Sentence retrieval R@10 | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Flickr30K | SAF | 75.6 | 92.7 | 96.9 | 56.5 | 82.0 | 88.4 |
| Flickr30K | SGR | 76.6 | 93.7 | 96.6 | 56.1 | 80.9 | 87.0 |
| Flickr30K | SGRAF | 78.4 | 94.6 | 97.5 | 58.2 | 83.0 | 89.1 |
| MSCOCO1K | SAF | 78.0 | 95.9 | 98.5 | 62.2 | 89.5 | 95.4 |
| MSCOCO1K | SGR | 77.3 | 96.0 | 98.6 | 62.1 | 89.6 | 95.3 |
| MSCOCO1K | SGRAF | 79.2 | 96.5 | 98.6 | 63.5 | 90.2 | 95.8 |
| MSCOCO5K | SAF | 55.5 | 83.8 | 91.8 | 40.1 | 69.7 | 80.4 |
| MSCOCO5K | SGR | 57.3 | 83.2 | 90.6 | 40.5 | 69.6 | 80.3 |
| MSCOCO5K | SGRAF | 58.8 | 84.8 | 92.1 | 41.6 | 70.9 | 81.5 |
Requirements
We recommend the following dependencies for branch main:
- Python 2.7
- PyTorch (>=0.4.1)
- NumPy (>=1.12.1)
- TensorBoard
- Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt
Download data and vocab
We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:
wget https://scanproject.blob.core.windows.net/scan-data/data.zip
wget https://scanproject.blob.core.windows.net/scan-data/vocab.zip
Pre-trained models and evaluation
Modify model_path, data_path, and vocab_path in the evaluation.py file, then run evaluation.py:
python evaluation.py
Note that fold5=True is only for evaluation on MSCOCO1K (results averaged over 5 folds), while fold5=False is for MSCOCO5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
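To illustrate the difference between the two evaluation settings, here is a minimal pure-Python sketch of sentence-retrieval R@K and of averaging it over 5 folds. It is not the code in evaluation.py: the function names `recall_at_k`, `fold_average`, and `toy_sims` are hypothetical, and the layout (5 consecutive captions per image in an image-by-caption similarity matrix) is an assumption based on the usual protocol for these datasets.

```python
def recall_at_k(sims, ks=(1, 5, 10), caps_per_image=5):
    """Sentence-retrieval recall (%) from an image-by-caption similarity matrix.

    sims[i][j] is the similarity of image i and caption j. Captions
    caps_per_image*i .. caps_per_image*(i+1)-1 are assumed to describe image i.
    """
    hits = {k: 0 for k in ks}
    for i, row in enumerate(sims):
        # Caption indices sorted from most to least similar.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        truth = set(range(caps_per_image * i, caps_per_image * (i + 1)))
        # 0-based rank of the best-placed ground-truth caption.
        best_rank = min(pos for pos, j in enumerate(order) if j in truth)
        for k in ks:
            if best_rank < k:
                hits[k] += 1
    n = len(sims)
    return {k: 100.0 * hits[k] / n for k in ks}


def fold_average(sims, n_folds=5, ks=(1, 5, 10), caps_per_image=5):
    """Average recall over n_folds disjoint blocks of images (fold5=True style)."""
    fold_size = len(sims) // n_folds
    totals = {k: 0.0 for k in ks}
    for f in range(n_folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        # Keep only this fold's images and the captions that belong to them.
        block = [row[caps_per_image * lo:caps_per_image * hi] for row in sims[lo:hi]]
        for k, value in recall_at_k(block, ks, caps_per_image).items():
            totals[k] += value
    return {k: totals[k] / n_folds for k in ks}


def toy_sims(n_images=10, caps_per_image=5):
    """Toy matrix: every image matches its own captions, plus one distractor."""
    sims = [[1.0 if j // caps_per_image == i else 0.0
             for j in range(n_images * caps_per_image)]
            for i in range(n_images)]
    # Image 0 scores higher on the captions of image 5 than on its own captions.
    for j in range(5 * caps_per_image, 6 * caps_per_image):
        sims[0][j] = 2.0
    return sims


if __name__ == "__main__":
    sims = toy_sims()
    print("full set:", recall_at_k(sims))
    print("5 folds: ", fold_average(sims))
```

In the toy data the distractor captions sit in a different fold from image 0, so they hurt the full-set score but not the fold-averaged one. This is one reason the 1K numbers are higher than the 5K numbers: each query is ranked against a smaller candidate pool.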
Training new models from scratch
Modify data_path, vocab_path, model_name, and logger_name in the opts.py file, then run train.py:
For MSCOCO:
(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF
For Flickr30K:
(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF
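The commands above pair each `--num_epochs` with an `--lr_update` value. As a rough sketch of what that pairing implies, the snippet below assumes `--lr_update` is the number of epochs after which the learning rate is reduced, in the style of SCAN's step decay. The decay factor of 0.1, the base learning rate of 0.0002, and the helper names `learning_rate` and `schedule` are assumptions for illustration; check opts.py and train.py for the actual values.

```python
def learning_rate(epoch, base_lr=0.0002, lr_update=10, decay=0.1):
    """Step decay: scale base_lr by `decay` once every `lr_update` epochs.

    base_lr and decay are assumed values, not read from the repository.
    """
    return base_lr * (decay ** (epoch // lr_update))


def schedule(num_epochs, lr_update, base_lr=0.0002):
    """Learning rate for each epoch of a run, indexed from epoch 0."""
    return [learning_rate(e, base_lr, lr_update) for e in range(num_epochs)]


if __name__ == "__main__":
    # Mirrors the MSCOCO setting above: 20 epochs, one decay after epoch 10.
    for epoch, lr in enumerate(schedule(num_epochs=20, lr_update=10)):
        print(epoch, lr)
```

Under these assumptions, the MSCOCO runs train for 10 epochs at the base rate and 10 at the reduced rate, while the Flickr30K SGR run spends 30 of its 40 epochs at the base rate.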
Reference
If SGRAF is useful for your research, please cite the following paper:
@inproceedings{Diao2021SGRAF,
title={Similarity Reasoning and Filtration for Image-Text Matching},
author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
booktitle={AAAI},
year={2021}
}
License
Apache License 2.0.
If you have any problems, please contact me at ([email protected]) or ([email protected]).