Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

Zhihao Fan

Last update: Nov 7, 2022

Overview

NSGDC

Some code in this repo is copied or modified from open-source implementations made available by UNITER, PyTorch, HuggingFace, OpenNMT, and Nvidia. The image features are extracted using BUTD.

Requirements

The setup follows UNITER. We provide a Docker image for easier reproduction. Please install the same prerequisites as UNITER.

Our scripts require the user to be a member of the docker group so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training, so GPUs with Tensor Cores are recommended.

Image-Text Retrieval

Download Data

bash scripts/download_itm.sh $PATH_TO_STORAGE

Launch the Docker Container

# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

In case you would like to reproduce the whole preprocessing pipeline.

The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of being built into the image, so that user modifications are reflected without rebuilding the image. (Data folders are mounted into the container separately to allow flexible folder structures.)
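As a small illustration of how $CUDA_VISIBLE_DEVICES is usually interpreted, the sketch below parses the variable into a list of device IDs. The helper name `visible_gpu_ids` is hypothetical and not part of this repo; it only mirrors the common convention that an unset variable exposes all GPUs and an empty value exposes none.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    Hypothetical helper for illustration; not part of the repository.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable unset: CUDA applications see every GPU on the machine.
        return None
    # Empty string yields an empty list, i.e. no GPU is visible.
    return [tok.strip() for tok in value.split(",") if tok.strip()]


# Example with an explicit mapping instead of the real environment.
example = visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0,2"})
print(example)
```

Setting the variable before sourcing launch_container.sh (for example to `0,2`) would then restrict the container to those devices.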

Image-Text Retrieval (Flickr30k)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_flickr.sh
bash run_cmds/tran_pnsgd2_base_flickr.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_flickr.sh
bash run_cmds/tran_pnsgd2_large_flickr.sh

Image-Text Retrieval (COCO)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_coco.sh
bash run_cmds/tran_pnsgd2_base_coco.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_coco.sh
bash run_cmds/tran_pnsgd2_large_coco.sh

Run Inference

bash run_cmds/inf_nsgd.sh

Results

Our models achieve the following performance.

MS-COCO

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	66.6	88.6	94.0	51.6	79.1	87.5
NSGDC-Large	67.8	89.6	94.2	53.3	80.0	88.0

Flickr30K

Model	Image-to-Text			Text-to-Image
Model	R@1	R@5	R@10	R@1	R@5	R@10
NSGDC-Base	87.9	98.1	99.3	74.5	93.3	96.3
NSGDC-Large	90.6	98.8	99.1	77.3	94.3	97.3
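
For readers unfamiliar with the metric, R@K (Recall at K) is the percentage of queries for which at least one correct match appears among the top K retrieved candidates. The sketch below is a minimal, self-contained illustration on toy similarity scores; the function name `recall_at_k` and the data layout are assumptions for this example, and the repo's own evaluation code may differ.

```python
def recall_at_k(scores, gt_index, ks=(1, 5, 10)):
    """Compute R@K as a percentage.

    scores[q][c] is the similarity between query q and candidate c.
    gt_index[q] is the set of candidate indices that are correct for query q.
    Illustrative helper only; not taken from the repository.
    """
    hits = {k: 0 for k in ks}
    for q, row in enumerate(scores):
        # Candidates sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        # 0-based rank of the best-placed correct candidate.
        best_rank = min(ranked.index(c) for c in gt_index[q])
        for k in ks:
            if best_rank < k:
                hits[k] += 1
    n = len(scores)
    return {k: 100.0 * hits[k] / n for k in ks}


# Toy example: 3 queries, 3 candidates each.
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
gt = [{0}, {1}, {1}]
result = recall_at_k(scores, gt, ks=(1, 2))
print(result)
```

In image-to-text retrieval each image typically has several correct captions, which is why the ground truth is modeled as a set per query here.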

Comments

KeyError: 'tree'

When I run this code, I get the error below. Could you please tell me how to solve it? I'm a little confused, thanks:

[1,0]:Traceback (most recent call last):
[1,0]:  File "train_pnsgd.py", line 579, in <module>
[1,0]:    main(args)
[1,0]:  File "train_pnsgd.py", line 217, in main
[1,0]:    for batch in train_dataloader_t:
[1,0]:  File "/src/data/loader.py", line 106, in __iter__
[1,0]:    self.preload(loader_it)
[1,0]:  File "/src/data/loader.py", line 117, in preload
[1,0]:    self.batch = next(it)
[1,0]:  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 551, in __next__
[1,0]:    return self._process_next_batch(batch)
[1,0]:  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 575, in _process_next_batch
[1,0]:    raise Exception("KeyError:" + batch.exc_msg)
[1,0]:Exception: KeyError:Traceback (most recent call last):
[1,0]:  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
[1,0]:    samples = collate_fn([dataset[i] for i in batch_indices])
[1,0]:  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
[1,0]:    samples = collate_fn([dataset[i] for i in batch_indices])
[1,0]:  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 85, in __getitem__
[1,0]:    return self.datasets[dataset_idx][sample_idx]
[1,0]:  File "/src/data/pnsgd.py", line 410, in __getitem__
[1,0]:    tree = self.txt_db[gt_txt_id]['tree']

opened by zchoi 1
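
The last frame of the traceback shows the lookup `self.txt_db[gt_txt_id]['tree']` failing, which suggests that at least one text entry has no 'tree' field. The sketch below is only a diagnostic idea, not a confirmed fix: it scans a dict-like text database and lists the entries that lack a given field. The function `find_missing_field` and the plain-dict stand-in for the text database are assumptions for illustration; the actual `txt_db` object in the repo may need to be iterated differently.

```python
def find_missing_field(txt_db, field="tree"):
    """Return the IDs of entries in a dict-like text DB that lack `field`.

    Hypothetical diagnostic helper; `txt_db` here is a plain dict standing in
    for the repository's text database object.
    """
    return [ex_id for ex_id, example in txt_db.items() if field not in example]


# Toy stand-in data: the second entry has no 'tree' field.
toy_txt_db = {
    "caption_0": {"input_ids": [101, 2023, 102], "tree": "(S (NP a dog))"},
    "caption_1": {"input_ids": [101, 2054, 102]},
}
missing = find_missing_field(toy_txt_db)
print(missing)
```

If such a scan reports missing entries, the downloaded text database may not match the format the training script expects, which would be worth confirming with the authors.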