Multi-query Video Retrieval
This repository contains the code for the paper:
@misc{wang2022multiquery,
      title={Multi-query Video Retrieval},
      author={Zeyu Wang and Yu Wu and Karthik Narasimhan and Olga Russakovsky},
      year={2022},
      eprint={2201.03639},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Data Preparation
- Download the raw videos for MSR-VTT, MSVD, and VATEX, and put them into the data/{dataset}/raw_videos folder.
- Run the script data/extract_frames.sh to extract frames from the raw videos (a minimal sketch of this step is shown after the folder structure below).

The resulting data folder is structured like this:
├── data
│   ├── msrvtt
│   │   ├── msrvtt_train.json
│   │   ├── msrvtt_test.json
│   │   ├── msrvtt_test_varying_query_sample_1-20.json
│   │   ├── raw_videos
│   │   │   ├── video0.mp4
│   │   │   ├── ...
│   │   ├── extracted_frames
│   │   │   ├── video0.mp4
│   │   │   │   ├── 0.jpg
│   │   │   │   ├── ...
│   │   │   ├── ...
│   ├── msvd
│   │   ├── ...
│   ├── vatex
│   │   ├── ...
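The actual contents of data/extract_frames.sh are not reproduced here; the following is only a minimal sketch of what this step typically looks like, assuming ffmpeg is installed. The dataset name, sampling rate, and frame naming are illustrative assumptions, not the repository's actual settings.

```bash
#!/usr/bin/env bash
# Minimal sketch of frame extraction (NOT the repository's actual data/extract_frames.sh).
# Assumes ffmpeg is installed; dataset name and sampling rate are illustrative only.
DATASET=${1:-msrvtt}   # msrvtt, msvd, or vatex
FPS=${2:-1}            # frames per second to sample (assumption)

for video in data/"$DATASET"/raw_videos/*; do
    name=$(basename "$video")
    out_dir="data/$DATASET/extracted_frames/$name"
    mkdir -p "$out_dir"
    # Write frames as 0.jpg, 1.jpg, ... to match the folder layout above.
    ffmpeg -loglevel error -i "$video" -vf "fps=$FPS" -start_number 0 "$out_dir/%d.jpg"
done
```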
For the Frozen model, download the pretrained checkpoint provided by the original authors here, and put it into the record/pretrained folder.
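For example (the checkpoint filename below is a placeholder; use whatever file the linked download provides):

```bash
# Placeholder filename; substitute the actual checkpoint downloaded from the link above.
mkdir -p record/pretrained
mv frozen_pretrained.pth record/pretrained/
```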
Training
Run the command: python train.py -c configs/{config_path}
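For example (the config filename below is hypothetical; substitute an actual file from the configs/ directory):

```bash
# Hypothetical config name; replace with a real file under configs/.
python train.py -c configs/msrvtt_example.json
```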
Evaluation
Run the command: python evaluate.py -c configs/{config_path}
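Similarly, for example (hypothetical config filename):

```bash
# Hypothetical config name; replace with a real file under configs/.
python evaluate.py -c configs/msrvtt_example.json
```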
Acknowledgements
The structure of this repository is based on https://github.com/victoresque/pytorch-template. Some of the code is adapted from https://github.com/m-bain/frozen-in-time and https://github.com/ArrowLuo/CLIP4Clip.