Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval


NSGDC

Some code in this repository is copied or modified from open-source implementations made available by UNITER, PyTorch, HuggingFace, OpenNMT, and NVIDIA. The image features are extracted using BUTD.

Requirements

The setup follows UNITER. We provide a Docker image for easier reproduction. Please install the NVIDIA driver, Docker, and nvidia-container-toolkit, as described in the UNITER repository.

Our scripts require the user to be in the docker group so that docker commands can be run without sudo. We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards. We use mixed-precision training; hence, GPUs with Tensor Cores are recommended.
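If you are unsure whether the GPU container runtime is set up correctly, one quick sanity check is to run nvidia-smi inside a CUDA base image (the image tag below is only an example; substitute any CUDA base image available to you):

# should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi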

Image-Text Retrieval

Download Data

bash scripts/download_itm.sh $PATH_TO_STORAGE
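The $PATH_TO_STORAGE variable used throughout is a single root directory of your choice; a minimal sketch (the path itself is just an example):

# hypothetical storage root; pick any location with enough free disk space
export PATH_TO_STORAGE=/data/nsgdc_storage
bash scripts/download_itm.sh $PATH_TO_STORAGE

The txt_db, img_db, finetune, and pretrained sub-folders mounted by the launch command below are expected to live under this root.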

Launch the Docker Container

# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

In case you would like to reproduce the whole preprocessing pipeline, please refer to the UNITER repository.

The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of built into the image, so that user modifications are reflected without re-building the image. (Data folders are mounted into the container separately for flexibility on folder structures.)
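For example, to expose only the first two GPUs to the container (the device indices here are arbitrary):

# restrict the container to GPUs 0 and 1
export CUDA_VISIBLE_DEVICES=0,1
source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained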

Image-Text Retrieval (Flickr30k)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_flickr.sh
bash run_cmds/tran_pnsgd2_base_flickr.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_flickr.sh
bash run_cmds/tran_pnsgd2_large_flickr.sh

Image-Text Retrieval (COCO)

# Train with the base setting
bash run_cmds/tran_pnsgd_base_coco.sh
bash run_cmds/tran_pnsgd2_base_coco.sh

# Train with the large setting
bash run_cmds/tran_pnsgd_large_coco.sh
bash run_cmds/tran_pnsgd2_large_coco.sh

Run Inference

bash run_cmds/inf_nsgd.sh

Results

Our models achieve the following performance.
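For reference, R@K (recall at K) is the percentage of queries whose ground-truth match appears among the top K retrieved results:

R@K = \frac{100}{N} \sum_{i=1}^{N} \mathbf{1}\left[\mathrm{rank}_i \le K\right]

where N is the number of queries and rank_i is the rank of the correct caption (for image-to-text) or image (for text-to-image) for query i.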

MS-COCO

Model         Image-to-Text          Text-to-Image
              R@1    R@5    R@10     R@1    R@5    R@10
NSGDC-Base    66.6   88.6   94.0     51.6   79.1   87.5
NSGDC-Large   67.8   89.6   94.2     53.3   80.0   88.0

Flickr30K

Model         Image-to-Text          Text-to-Image
              R@1    R@5    R@10     R@1    R@5    R@10
NSGDC-Base    87.9   98.1   99.3     74.5   93.3   96.3
NSGDC-Large   90.6   98.8   99.1     77.3   94.3   97.3
