SMCG
Code for the paper "Controllable Video Captioning with an Exemplar Sentence"
Introduction
We investigate a novel and challenging task, namely controllable video captioning with an exemplar sentence. Formally, given a video and a syntactically valid exemplar sentence, the task aims to generate a caption that not only describes the semantic content of the video, but also follows the syntactic form of the given exemplar sentence. To tackle this exemplar-based video captioning task, we propose a novel Syntax Modulated Caption Generator (SMCG) incorporated in an encoder-decoder-reconstructor architecture.
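As a rough orientation only, the toy PyTorch sketch below shows the kind of input/output flow involved: frame-level video features and an exemplar syntax sequence go in, caption word logits come out. The released SMCG modulates the decoder parameters with the exemplar syntax and adds a reconstructor branch; none of the module choices, fusion strategy, or dimensions below are taken from the paper.

```python
import torch
import torch.nn as nn

class ToySyntaxConditionedDecoder(nn.Module):
    """Illustration only: encode video features and an exemplar syntax
    sequence, then decode a caption conditioned on both. This is NOT the
    SMCG model from the paper (no parameter modulation, no reconstructor)."""

    def __init__(self, feat_dim=1536, syn_vocab=64, word_vocab=10000, hid=512):
        super(ToySyntaxConditionedDecoder, self).__init__()
        self.video_enc = nn.GRU(feat_dim, hid, batch_first=True)
        self.syn_embed = nn.Embedding(syn_vocab, hid)
        self.syntax_enc = nn.GRU(hid, hid, batch_first=True)
        self.word_embed = nn.Embedding(word_vocab, hid)
        self.decoder = nn.GRUCell(hid, hid)
        self.out = nn.Linear(hid, word_vocab)

    def forward(self, video_feats, exemplar_syntax, captions):
        _, v = self.video_enc(video_feats)                    # (1, B, hid)
        _, s = self.syntax_enc(self.syn_embed(exemplar_syntax))
        h = (v + s).squeeze(0)                                # fuse video and syntax states
        logits = []
        for t in range(captions.size(1)):
            h = self.decoder(self.word_embed(captions[:, t]), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                     # (B, T, word_vocab)

model = ToySyntaxConditionedDecoder()
video = torch.randn(2, 20, 1536)           # 20 frame features per video (made-up sizes)
syntax = torch.randint(0, 64, (2, 15))     # exemplar syntax token ids
words = torch.randint(0, 10000, (2, 12))   # caption word ids for teacher forcing
print(model(video, syntax, words).shape)   # torch.Size([2, 12, 10000])
```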
Dependencies
- python 2.7.2
- torch 1.1.0
- java openjdk version "10.0.2" 2018-07-17
- StanfordCoreNLP
Download Features and Preprocess Data
For the MSRVTT dataset, please download the following files into the './msrvtt/msrvtt_data/' folder (a quick way to inspect them is sketched after this list):
- MSRVTT caption info: videodatainfo_2016.json,
- MSRVTT captions and their sentence parse trees: msrvtt_all_sentence_parse_dict.pkl,
- Collected exemplar sentences and their parse trees: coco_filter_parse_dict.pkl,
- Video features: msrvtt_incepRes_rgb_feats.hdf5,
- Glove word embeddings: glove.840B.300d.zip.
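Assuming h5py is installed, a quick sanity check of the downloaded MSRVTT files might look like the sketch below. The HDF5 and pickle layouts noted in the comments are assumptions, so adjust them to whatever the files actually contain.

```python
import pickle
import h5py

# Inspect the video feature file: one dataset per video id is assumed.
with h5py.File('./msrvtt/msrvtt_data/msrvtt_incepRes_rgb_feats.hdf5', 'r') as f:
    video_ids = list(f.keys())
    print(video_ids[:5])
    print(f[video_ids[0]].shape)   # frame-level Inception-ResNet features (assumed)

# Inspect the parse dictionary: caption -> constituency parse string is assumed.
with open('./msrvtt/msrvtt_data/msrvtt_all_sentence_parse_dict.pkl', 'rb') as f:
    parse_dict = pickle.load(f)
    print(len(parse_dict))
```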
For the ActivityNet Captions dataset, please download the following files into the './activitynet/activitynet_data/' folder:
- ActivityNet caption info: CAP.pkl,
- ActivityNet captions and their sentence parse trees: anet_parse_dict.pkl,
- Collected exemplar sentences and their parse trees: coco_filter_parse_dict.pkl,
- Video features: anet_new_inception_resnet_feats.hdf5,
- Glove word embeddings: glove.840B.300d.zip.
Data Preprocessing
- Go to the './msrvtt/process_msrvtt_data/' folder, and run:
python prepro_vocab_parse_pos.py
python fill_template.py
- Go to the './activitynet/process_activitynet_data/' folder, and run:
python prepro_anetcoco_vocab_pos_parse.py
python fill_template.py
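The parse trees consumed by these scripts ship in the .pkl files listed above. If you want to parse new exemplar sentences yourself, the stanfordcorenlp Python wrapper around the Java CoreNLP toolkit can produce compatible constituency parses; the snippet below is a hypothetical usage sketch (the CoreNLP path is a placeholder, and this code is not part of the repo's preprocessing).

```python
from stanfordcorenlp import StanfordCoreNLP

# Hypothetical usage; point the wrapper at a local CoreNLP distribution.
nlp = StanfordCoreNLP(r'/path/to/stanford-corenlp-full')

sentence = 'a man is playing a guitar on the stage'
print(nlp.parse(sentence))      # constituency parse, e.g. (ROOT (S ...))
print(nlp.pos_tag(sentence))    # [(word, POS tag), ...]

nlp.close()                     # shut down the background CoreNLP server
```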
Model Training and Testing
- For the MSRVTT dataset, please go to the './msrvtt/src/' folder, and train the model by running:
python train.py --gpu xx
- For model inference and evaluation, run:
bash eval.sh
bash control.sh
- Note: 'eval.sh' evaluates the generated exemplar-based captions with conventional captioning metrics. 'control.sh' compares the generated exemplar-based captions with the provided exemplar sentences from the syntactic aspect, i.e., it computes the edit distance between their parse trees (a simplified illustration of this comparison follows this section).
- For the ActivityNet Captions dataset, please go to the './activitynet/src/' folder, and train/test the model in the same way as for MSRVTT.
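For intuition on the syntactic comparison mentioned in the note above, the sketch below computes a plain Levenshtein edit distance between two linearized (bracketed) parse strings. This is a simplification for illustration and not necessarily the exact tree edit distance implemented by 'control.sh'.

```python
def parse_tokens(tree_str):
    """Split a bracketed parse string like '(ROOT (S ...))' into tokens."""
    return tree_str.replace('(', ' ( ').replace(')', ' ) ').split()

def edit_distance(a, b):
    """Standard Levenshtein distance between two token sequences (DP table)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(a)][len(b)]

generated = '(ROOT (S (NP (DT a) (NN man)) (VP (VBZ sings))))'
exemplar  = '(ROOT (S (NP (DT a) (NN woman)) (VP (VBZ dances))))'
print(edit_distance(parse_tokens(generated), parse_tokens(exemplar)))  # prints 2
```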
Citation
@inproceedings{yuan2020Control,
title={Controllable Video Captioning with an Exemplar Sentence},
author={Yuan, Yitian and Ma, Lin and Wang, Jingwen and Zhu, Wenwu},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia (MM '20)},
year={2020}
}