Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

ZHANG HAO

Last update: Dec 29, 2022

Related tags

Deep Learning ReLoCLNet

Overview

Video Corpus Moment Retrieval with Contrastive Learning

PyTorch implementation for the paper "Video Corpus Moment Retrieval with Contrastive Learning" (SIGIR 2021, long paper): SIGIR version, ArXiv version.

The codes are modified from TVRetrieval.

Prerequisites

python 3.x with pytorch (1.7.0), torchvision, transformers, tensorboard, tqdm, h5py, easydict
cuda, cudnn

If you have Anaconda installed, the conda environment of ReLoCLNet can be built as follows (take python 3.7 as an example):

conda create --name reloclnet python=3.7
conda activate reloclnet
conda install -c anaconda cudatoolkit cudnn  # ignore this if you already have cuda installed
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
conda install -c anaconda h5py=2.9.0
conda install -c conda-forge transformers tensorboard tqdm easydict

The conda environment of TVRetrieval also works.

Getting started

Clone this repository

$ git clone [email protected]:IsaacChanghau/ReLoCLNet.git
$ cd ReLoCLNet

Download features

For the features of TVR dataset, please download tvr_feature_release.tar.gz (link is copied from TVRetrieval#prerequisites) and extract it to the data directory:

$ tar -xf path/to/tvr_feature_release.tar.gz -C data

This link may be useful for you to directly download Google Drive files using wget. Please refer TVRetrieval#prerequisites for more details about how the features are extracted if you are interested.

Add project root to PYTHONPATH (Note that you need to do this each time you start a new session.)

$ source setup.sh

Training and Inference

TVR dataset

# train, refer `method_tvr/scripts/train.sh` and `method_tvr/config.py` more details about hyper-parameters
$ bash method_tvr/scripts/train.sh tvr video_sub_tef resnet_i3d --exp_id reloclnet
# inference
# the model directory placed in method_tvr/results/tvr-video_sub_tef-reloclnet-*
# change the MODEL_DIR_NAME as tvr-video_sub_tef-reloclnet-*
# SPLIT_NAME: [val | test]
$ bash method_tvr/scripts/inference.sh MODEL_DIR_NAME SPLIT_NAME

For more details about evaluation and submission, please refer TVRetrieval#training-and-inference.

Citation

If you feel this project helpful to your research, please cite our work.

@inproceedings{zhang2021video,
	author = {Zhang, Hao and Sun, Aixin and Jing, Wei and Nan, Guoshun and Zhen, Liangli and Zhou, Joey Tianyi and Goh, Rick Siow Mong},
	title = {Video Corpus Moment Retrieval with Contrastive Learning},
	year = {2021},
	isbn = {9781450380379},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	url = {https://doi.org/10.1145/3404835.3462874},
	doi = {10.1145/3404835.3462874},
	booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval},
	pages = {685–695},
	numpages = {11},
	location = {Virtual Event, Canada},
	series = {SIGIR '21}
}

TODO

Upload codes for ActivityNet Captions dataset

Comments

some questions about model's performence on activitynet

I can't reproduce the performance on activitynet, I tried follow settings in the paper but it didn't work, so may I know some details about difference in model structure, pararmeters setting in method_tvr/config.py compared with TVR dataset?

opened by DustChen1 0
some question about activitynet dataset

I found that about 400+ video features were missing from the uploaded new activitynet (link: https://app.box.com/s/h0sxa5klco6qve5ahnz50ly2nksmuedw/folder/138579301343) , resulting in the absence of many samples from the test set. This will lead to inaccurate performance under the evaluation. May I ask on which data set will the performance test of the follow-up work be conducted ?

opened by shenxingyu 0

Automatic Differentiation Multipole Moment Molecular Forcefield

Automatic Differentiation Multipole Moment Molecular Forcefield Performance notes On a single gpu, using waterbox_31ang.pdb example from MPIDplugin wh

4 Jan 7, 2022

SUPERVISED-CONTRASTIVE-LEARNING-FOR-PRE-TRAINED-LANGUAGE-MODEL-FINE-TUNING - The Facebook paper about fine tuning RoBERTa with contrastive loss

"# SUPERVISED-CONTRASTIVE-LEARNING-FOR-PRE-TRAINED-LANGUAGE-MODEL-FINE-TUNING" i

28 Dec 12, 2022

Near-Duplicate Video Retrieval with Deep Metric Learning

Near-Duplicate Video Retrieval with Deep Metric Learning This repository contains the Tensorflow implementation of the paper Near-Duplicate Video Retr

2 Jan 24, 2022

[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

49 Dec 18, 2022

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

1 Jan 23, 2022

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

42 Nov 24, 2022

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language This repository contains UA-GEC data and an accompanying Python lib

226 Dec 29, 2022

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

43 Nov 27, 2022

Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems"

Action-Based Conversations Dataset (ABCD) This respository contains the code and data for ABCD (Chen et al., 2021) Introduction Whereas existing goal-

49 Oct 9, 2022

Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Related tags

Overview

Video Corpus Moment Retrieval with Contrastive Learning

Prerequisites

Getting started

Training and Inference

Citation

TODO

You might also like...

Automatic Differentiation Multipole Moment Molecular Forcefield

SUPERVISED-CONTRASTIVE-LEARNING-FOR-PRE-TRAINED-LANGUAGE-MODEL-FINE-TUNING - The Facebook paper about fine tuning RoBERTa with contrastive loss

Near-Duplicate Video Retrieval with Deep Metric Learning

[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems"

Comments

some questions about model's performence on activitynet

some question about activitynet dataset

Owner

ZHANG HAO

Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

4th place solution for the SIGIR 2021 challenge.

Image-retrieval-baseline - MUGE Multimodal Retrieval Baseline

Personal implementation of paper "Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval"

Official repository for "Exploiting Session Information in BERT-based Session-aware Sequential Recommendation", SIGIR 2022 short.

Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation'

Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion"

CVPR '21: In the light of feature distributions: Moment matching for Neural Style Transfer

Moment-DETR code and QVHighlights dataset