VLG-Net: Video-Language Graph Matching Networks for Video Grounding

Mattia Soldan

Last update: Dec 4, 2022


Introduction

Official repository for VLG-Net: Video-Language Graph Matching Networks for Video Grounding. [ArXiv Preprint]

The paper was accepted to the first edition of the ICCV workshop: AI for Creative Video Editing and Understanding (CVEU).

Installation

Clone the repository and move to folder:

git clone https://github.com/Soldelli/VLG-Net.git
cd VLG-Net

Install the environment:

conda env create -f environment.yml

If the installation fails, follow the instructions in doc/environment.md.

Data

Download the following resources and extract their contents into the destination folder listed in the table.

Resource	Download Link	File Size	Destination Folder
StandfordCoreNLP-4.0.0	link	(~0.5GB)	`./datasets/`
TACoS	link	(~0.5GB)	`./datasets/`
ActivityNet-Captions	link	(~29GB)	`./datasets/`
DiDeMo	link	(~13GB)	`./datasets/`
GCNeXt warmup	link	(~0.1GB)	`./datasets/`
Pretrained Models	link	(~0.1GB)	`./models/`
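The downloads add up to roughly 43 GB, so it can help to check free disk space first. The snippet below is a minimal sketch, not part of the repository: the sizes are the approximate values from the table above, and the helper names `total_download_gb` and `has_enough_space` are hypothetical. Extracted archives may need additional space beyond the download size.

```python
import shutil

# Approximate download sizes in GB, copied from the table above.
DOWNLOAD_SIZES_GB = {
    "StandfordCoreNLP-4.0.0": 0.5,
    "TACoS": 0.5,
    "ActivityNet-Captions": 29.0,
    "DiDeMo": 13.0,
    "GCNeXt warmup": 0.1,
    "Pretrained Models": 0.1,
}


def total_download_gb(sizes=DOWNLOAD_SIZES_GB):
    """Sum the approximate download sizes, rounded to one decimal."""
    return round(sum(sizes.values()), 1)


def has_enough_space(path=".", sizes=DOWNLOAD_SIZES_GB):
    """Return True if the disk holding `path` has at least the total download size free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= total_download_gb(sizes)
```

Calling `has_enough_space(".")` from the repository root gives a quick yes/no answer before starting the larger downloads.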

The folder structure should be as follows:

.
├── configs
│
├── datasets
│   ├── activitynet1.3
│   │    ├── annotations
│   │    └── features
│   ├── didemo
│   │    ├── annotations
│   │    └── features
│   ├── tacos
│   │    ├── annotations
│   │    └── features
│   ├── gcnext_warmup
│   └── standford-corenlp-4.0.0
│
├── doc
│
├── lib
│   ├── config
│   ├── data
│   ├── engine
│   ├── modeling
│   ├── structures
│   └── utils
│
├── models
│   ├── activitynet
│   └── tacos
│
├── outputs
│
└── scripts
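To confirm that the extracted resources ended up in the right place, the tree above can be checked with a few lines of Python. This is a minimal sketch and not part of the repository: the directory list is transcribed from the tree above, and the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path

# Directories transcribed from the folder structure shown above.
EXPECTED_DIRS = [
    "configs",
    "datasets/activitynet1.3/annotations",
    "datasets/activitynet1.3/features",
    "datasets/didemo/annotations",
    "datasets/didemo/features",
    "datasets/tacos/annotations",
    "datasets/tacos/features",
    "datasets/gcnext_warmup",
    "datasets/standford-corenlp-4.0.0",
    "doc",
    "lib",
    "models/activitynet",
    "models/tacos",
    "outputs",
    "scripts",
]


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]
```

Running `missing_dirs(".")` from the repository root returns an empty list when every folder is in place, and otherwise lists the paths that still need to be created or extracted.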

Training

Copy and paste the following commands into the terminal.

Load environment:

conda activate vlg

For ActivityNet-Captions dataset, run:

python train_net.py --config-file configs/activitynet.yml OUTPUT_DIR outputs/activitynet

For TACoS dataset, run:

python train_net.py --config-file configs/tacos.yml OUTPUT_DIR outputs/tacos
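Both training commands follow the same pattern, differing only in the dataset name. The sketch below builds that command programmatically, which may be convenient when launching runs from a Python script. It assumes only the two configurations shown above; the helper names `build_train_command` and `run_training` are hypothetical and not part of the repository.

```python
import subprocess

# Dataset names for which a training command is shown above.
KNOWN_DATASETS = ("activitynet", "tacos")


def build_train_command(dataset):
    """Build the training command for one of the datasets listed above."""
    if dataset not in KNOWN_DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        "python", "train_net.py",
        "--config-file", f"configs/{dataset}.yml",
        "OUTPUT_DIR", f"outputs/{dataset}",
    ]


def run_training(dataset):
    """Launch training from the repository root; raises if the process fails."""
    subprocess.run(build_train_command(dataset), check=True)
```

For example, `run_training("tacos")` is equivalent to typing the TACoS command above, provided the `vlg` environment is active and the working directory is the repository root.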

Evaluation

For simplicity, we provide scripts that automatically run inference on the pretrained models. To run inference on a different model, see the details inside each script.

Load environment:

conda activate vlg

Then run one of the following scripts to launch the evaluation.

For ActivityNet-Captions dataset, run:

    bash scripts/activitynet.sh

For TACoS dataset, run:

    bash scripts/tacos.sh

Expected results:

After cleaning the code and fixing a couple of minor bugs, performance changed slightly with respect to the numbers reported in the paper. See the tables below.

ActivityNet	Rank1@0.5	Rank1@0.7	Rank5@0.5	Rank5@0.7
Paper	46.32	29.82	77.15	63.33
Current	46.32	29.79	77.19	63.36

TACoS	Rank1@0.1	Rank1@0.3	Rank5@0.1	Rank5@0.3	Rank1@0.5	Rank5@0.5
Paper	57.21	45.46	34.19	81.80	70.38	56.56
Current	57.16	45.56	34.14	81.48	70.13	56.34
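To see how small the change is, the difference between the two rows can be computed per column. The sketch below is illustrative only: the numbers are copied from the two tables above in column order, and the helper names `deltas` and `max_abs_delta` are hypothetical.

```python
# Values copied from the tables above, in column order.
PAPER = {
    "ActivityNet": [46.32, 29.82, 77.15, 63.33],
    "TACoS": [57.21, 45.46, 34.19, 81.80, 70.38, 56.56],
}
CURRENT = {
    "ActivityNet": [46.32, 29.79, 77.19, 63.36],
    "TACoS": [57.16, 45.56, 34.14, 81.48, 70.13, 56.34],
}


def deltas(dataset):
    """Per-column difference (Current minus Paper), rounded to two decimals."""
    return [round(c - p, 2) for p, c in zip(PAPER[dataset], CURRENT[dataset])]


def max_abs_delta():
    """Largest absolute change across both datasets."""
    return max(abs(d) for name in PAPER for d in deltas(name))


print(deltas("ActivityNet"))  # [0.0, -0.03, 0.04, 0.03]
print(deltas("TACoS"))        # [-0.05, 0.1, -0.05, -0.32, -0.25, -0.22]
print(max_abs_delta())        # 0.32
```

Every column moves by at most 0.32 points, and the ActivityNet columns by at most 0.04.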

Citation

If any part of our paper and code is helpful to your work, please cite with:

@inproceedings{soldan2021vlg,
  title={VLG-Net: Video-Language Graph Matching Network for Video Grounding},
  author={Soldan, Mattia and Xu, Mengmeng and Qu, Sisi and Tegner, Jesper and Ghanem, Bernard},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3224--3234},
  year={2021}
}


