Code for the paper "Zero-shot Natural Language Video Localization" (ICCV2021, Oral).

Computer Vision Lab. @ GIST

Last update: Dec 27, 2022

Overview

Zero-shot Natural Language Video Localization (ZSNLVL) by Pseudo-Supervised Video Localization (PSVL)

This repository contains the code for Zero-shot Natural Language Video Localization (ICCV 2021, Oral).

We propose the novel task of zero-shot natural language video localization (NLVL). The task setup requires no paired annotations for NLVL; it needs only easily available text corpora, an off-the-shelf object detector, and a collection of videos to localize. To address the task, we propose a Pseudo-Supervised Video Localization method, called PSVL, that generates pseudo-supervision for training an NLVL model. Benchmarked on two widely used NLVL datasets, the proposed method exhibits competitive performance and performs on par with or outperforms models trained with stronger supervision.

Environment

This repository is implemented based on PyTorch with Anaconda.
Follow the instructions below or use Docker (dcahn/psvl:latest).

Get the code

Clone this repo with git:

git clone https://github.com/gistvision/PSVL.git

Make your own environment (if you use the Docker environment, just clone the code and run it).

conda create --name PSVL --file requirements.txt
conda activate PSVL

Working environment

RTX2080Ti (11G)
Ubuntu 18.04.5
pytorch 1.5.1

Download

Dataset & Pretrained model

The video features used in this paper can be downloaded from this link.
After downloading the video features, set the data path in a config file.
The pre-trained model can be downloaded from this link.
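
Setting the data path can also be scripted. The following is a minimal sketch, not part of the repository: it assumes the config is a text file with `KEY: value` lines and that the field is named `DATA_PATH` (the name used in the reproduction comment further down this page). The helper `set_data_path` and the sample config contents are made up for illustration.

```python
import re


def set_data_path(config_text: str, new_path: str, key: str = "DATA_PATH") -> str:
    """Return config_text with the value of every `key: value` line replaced.

    Assumption: the config uses one `KEY: value` pair per line.
    """
    # Match the key and its separator on a single line; keep indentation intact.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", flags=re.MULTILINE
    )
    # A function replacement avoids backslash handling in the new path.
    updated, count = pattern.subn(lambda m: m.group(1) + new_path, config_text)
    if count == 0:
        raise KeyError(f"{key} not found in config")
    return updated


# Hypothetical config contents, for illustration only.
example = "DATASET:\n  NAME: Charades\n  DATA_PATH: /old/path\n"
print(set_data_path(example, "/data/charades/features"))
```

To apply it to a real file, read the file into a string, pass it through `set_data_path`, and write the result back.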

Evaluating pre-trained models

To evaluate the pre-trained model, use the command below.

python inference.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH" --pre_trained "YOUR MODEL PATH"

Training models from scratch

To train PSVL, run train.py with the command below.

# Training from scratch
python train.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH"
# Evaluation
python inference.py --model CrossModalityTwostageAttention --config "YOUR CONFIG PATH" --pre_trained "YOUR MODEL PATH"
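
If you launch several runs, the two commands above can be assembled programmatically. This is only a convenience sketch: it uses the script names and the `--model`, `--config`, and `--pre_trained` flags shown above, while the helper `build_command` is a hypothetical name and not part of the repository.

```python
import shlex

MODEL = "CrossModalityTwostageAttention"


def build_command(script, config_path, pre_trained=None):
    """Build the argument list for train.py or inference.py."""
    cmd = ["python", script, "--model", MODEL, "--config", config_path]
    if pre_trained is not None:
        # Only inference.py is shown with this flag in the commands above.
        cmd += ["--pre_trained", pre_trained]
    return cmd


train_cmd = build_command("train.py", "YOUR CONFIG PATH")
eval_cmd = build_command("inference.py", "YOUR CONFIG PATH", "YOUR MODEL PATH")

# Print shell-safe versions; pass the lists to subprocess.run to execute them.
for cmd in (train_cmd, eval_cmd):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list form to `subprocess.run(cmd, check=True)` avoids quoting problems when a path contains spaces.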

License

MIT License

Citation

If you use this code, please cite:

@inproceedings{nam2021zero,
  title={Zero-shot Natural Language Video Localization},
  author={Nam, Jinwoo and Ahn, Daechul and Kang, Dongyeop and Ha, Seong Jong and Choi, Jonghyun},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1470-1479},
  year={2021}
}

Contact

If you have any questions, please e-mail us ([email protected], [email protected]).

Comments

Deviation in Reproduction of Results

Dear authors, Great work proposing zero-shot NLVL, and thank you for making the code publicly available!

However, when I try to reproduce the results on my server, I notice that the results are off by ~3-5 points for higher Recall@k measures (Recall @ {0.5, 0.7}). It seems like I may be missing something, because this is consistently the case across multiple reproductions. I would really appreciate any suggestions in this regard!

To the best of my knowledge, all the training conditions are the same as listed since I am using the config file provided in the repository as is (except minor changes to the DATA_PATH field).

Sharing the reproduced results obtained vs the reported results in the paper for your reference:

| Model             | mIoU  | [email protected] | [email protected] | [email protected] |
|-------------------|-------|------------------|------------------|------------------|
| PSVL (Reproduced) | 29.91 | 46.48            | 26.56            | 11.23            |
| PSVL (Reported)   | 31.24 | 46.47            | 31.29            | 14.17            |

Thank you in advance!

opened by ml-researcher1 0
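
For readers comparing numbers like those in this thread, the sketch below shows how mIoU and Recall at an IoU threshold are commonly computed for temporal localization. It is not the repository's evaluation code. It assumes one top-1 predicted segment per query, segments given as (start, end), and a prediction counted as correct when its IoU is at least the threshold. The thresholds 0.5 and 0.7 are the ones named in the comment; the function names and sample segments are made up.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # The span from the earliest start to the latest end equals the union
    # whenever the segments overlap; otherwise inter is 0 and so is the IoU.
    span = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / span if span > 0 else 0.0


def summarize(preds, gts, thresholds=(0.5, 0.7)):
    """Return mIoU and Recall at each IoU threshold, as percentages."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    result = {"mIoU": 100.0 * sum(ious) / n}
    for t in thresholds:
        result[f"R@{t}"] = 100.0 * sum(iou >= t for iou in ious) / n
    return result


# Made-up predictions and ground truths, for illustration only.
preds = [(0.0, 10.0), (5.0, 10.0), (0.0, 2.0)]
gts = [(0.0, 10.0), (0.0, 10.0), (5.0, 10.0)]
print(summarize(preds, gts))
```

Small differences in such conventions (for example `>=` versus `>` at the threshold) can shift Recall slightly, though they would not by themselves explain a gap of several points.
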
Annotations of ActivityNet captions dataset

Dear authors,

First of all, really impressive work on zero-shot localization. Could you please also make available the annotations for the ActivityNet captions dataset?

Thank you so much in advance

opened by g1910 0
TEP & Pseudo-query generation code
Hi, thanks for sharing the code. I checked the 'charades_train_pseudo_supervision_TEP_PS.json' file. I think the data(timestamp and pseudo query) has already been extracted. Can you share the TEP and Pseudo-query generation code?

{'timestamp': [0.0, 0.2079207920792079], 'duration': 33.67, 'vid': 'AO8RW', 'tokens': ['climb', 'wash', 'hold', 'door', 'window', 'shirt', 'woman', 'rack']}

Thanks.
opened by minjoong507 2
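
The record quoted in this thread can be inspected with a few lines of Python. This sketch only reads an already-extracted entry and does not reproduce the TEP or pseudo-query generation that the thread asks about. It assumes, without confirmation from the authors, that `timestamp` holds start and end as fractions of `duration`; the helper `to_seconds` is a hypothetical name.

```python
import ast

# The record exactly as quoted in the comment (a Python-style dict literal).
record_text = (
    "{'timestamp': [0.0, 0.2079207920792079], 'duration': 33.67, "
    "'vid': 'AO8RW', 'tokens': ['climb', 'wash', 'hold', 'door', "
    "'window', 'shirt', 'woman', 'rack']}"
)
record = ast.literal_eval(record_text)


def to_seconds(rec):
    """Convert a normalized (start, end) pair to seconds.

    Assumption: timestamps are fractions of the video duration.
    """
    start, end = rec["timestamp"]
    return start * rec["duration"], end * rec["duration"]


start_s, end_s = to_seconds(record)
query = " ".join(record["tokens"])
print(f"{record['vid']}: {start_s:.2f}s to {end_s:.2f}s | pseudo query: {query}")
```

If the timestamps turn out to be stored in seconds instead, skip the conversion.
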
The feature of ActivityNetCaptions dataset.

Could you provide the feature of ActivityNetCaptions dataset?

Besides, could you provide the config file to reproduce the results of ActivityNetCaptions dataset?

Thanks, Jiaheng.

opened by liujiaheng 1
