Code for the paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021).

Overview

Towards Diverse Paragraph Captioning for Untrimmed Videos

This repository contains the PyTorch implementation of our paper Towards Diverse Paragraph Captioning for Untrimmed Videos (CVPR 2021).

Requirements

  • Python 3.6
  • Java 15.0.2
  • PyTorch 1.2
  • numpy, tqdm, h5py, scipy, six

Training & Inference

Data preparation

  1. Download the pre-extracted video features of the ActivityNet Captions or Charades Captions datasets from BaiduNetdisk (code: he21).
  2. Decompress the downloaded files to the corresponding dataset folder in the ordered_feature/ directory.
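
A quick sanity check with h5py (already listed in the requirements) can confirm the decompressed features load correctly. This is a minimal sketch; the file name below is hypothetical, so substitute an actual feature file from ordered_feature/:

```python
import h5py

# Hypothetical file name; point this at a real feature file under ordered_feature/.
feature_file = "ordered_feature/activitynet/features.h5"

with h5py.File(feature_file, "r") as f:
    video_ids = list(f.keys())
    print(f"{len(video_ids)} videos found")
    # Inspect one entry: typically a (num_frames, feat_dim) float array.
    first = f[video_ids[0]]
    print(video_ids[0], first.shape, first.dtype)
```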

Start training

  1. Train our model without reinforcement learning, where * can be activitynet or charades:
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train
  2. Fine-tune the pretrained model with self-critical training, using both accuracy and diversity rewards (see the reward sketch below):
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th
  3. Train our model with key-frame selection:
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th

Key-frame selection achieves slightly worse results while using only half of the video features at inference time, which speeds up decoding. You need to download the pretrained.th model first for key-frame selection.
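
As rough intuition for step 2, below is a minimal sketch of a self-critical objective that mixes an accuracy reward with a diversity reward, assuming a CIDEr-style accuracy score and an n-gram-uniqueness diversity score. The reward functions and mixing weight are simplified placeholders, not the exact implementation in this repo:

```python
import torch

def diversity_reward(tokens, n=4):
    # Placeholder diversity reward: fraction of distinct n-grams in a paragraph.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

def mixed_reward(accuracy, diversity, w=0.5):
    # Combine an accuracy reward (e.g. CIDEr) with the diversity reward; w is illustrative.
    return accuracy + w * diversity

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    # Self-critical policy gradient: the greedy decode serves as the reward baseline,
    # so the model is pushed toward samples that beat its own greedy output.
    # sample_logprobs: (batch,) summed log-probs of the sampled captions.
    advantage = (sample_reward - greedy_reward).detach()
    return -(advantage * sample_logprobs).mean()
```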

Evaluation

The trained checkpoints are saved in the results/*/folder/model/ directory. After evaluation, the generated captions (corresponding to the name file in the public_split) and the evaluation scores are saved in results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th
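
To inspect the generated paragraphs afterwards, a loader along these lines should work, assuming the predictions are saved as a JSON file mapping video names to captions. The file name and format here are assumptions, so check the actual files under pred/tst/:

```python
import json

# Hypothetical prediction file; check results/*/folder/pred/tst/ for the real name.
pred_file = "../results/activitynet/dm.token/pred/tst/predictions.json"

with open(pred_file) as f:
    preds = json.load(f)

# Print the first few generated paragraphs.
for video_name, paragraph in list(preds.items())[:3]:
    print(video_name, "->", paragraph)
```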

We also provide pretrained models for the ActivityNet dataset here and the Charades dataset here; they were re-run and achieve results similar to those reported in the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Comments
  • How to rerun strong baselines using the same video features on ActivityNet

    Hi syuqings, thank you for your repo. I am a little confused about how to run the strong-baseline experiments using the same video features. For example, in VTransformer or MART, the features are sampled every 0.5 s and the specific frame indices are loaded. When rerunning them, did you set the sampling interval for your provided features? Or did you use your framework and just swap in the baseline methods? Could you share some details about this? Thank you very much.

    opened by sushizixin 8
  • Sorry to be a bother

    Dear professor syuqings,

    Sorry to be a bother. I read your paper and am very interested in it. Strictly following the README, I don't understand what pretrained.th does. Does pretrained.th have to be used to achieve the results reported in the paper?

    I would appreciate your reply as soon as possible, thank you very much!

    Best regards,

    opened by 123456789live 6
  • ResNet model

    Hi, congratulations on the excellent work. I am unable to download the ResNet model from https://pan.baidu.com/s/1xznBbIELY8Pp5N-98fiqdQ#list/path=%2F because it asks for user information. Can you upload this model to Google Drive or another alternative location?

    Thanks in advance.

    opened by avinashsai 2
  • val set of charades dataset

    Dear professor syuqings,

    Thank you very much for your article, and sorry to bother you. Why is the val set of the Charades dataset the same as the test set? My evaluation score was the same as the training-model score.

    I would appreciate your reply as soon as possible, thank you very much!

    Best regards,

    opened by ZHONGHZONG 2
  • pretrained.th

    Dear professor syuqings,

    Strictly following the README, in the third step of training, why is the post-training evaluation not as high as for the model from the second step? Am I loading and training pretrained.th the wrong way?

    Looking forward to your reply

    opened by ZHONGHZONG 2
  • Testing on a different video. Missing config files

    Hi @syuqings,

    Firstly, thanks for your work and for making your code public. I want to run inference with your model (produce paragraphs) on a different set of videos (instead of the ActivityNet or Charades datasets). I have read the previous issues regarding testing on one's own videos, and I have extracted the features the model needs.

    I am trying to run the driver/transformer.py script to produce the video paragraphs, but I cannot get past the model_cfg_file and path_cfg_file arguments. I assume those are config files for the modules, but I cannot find them in the repo.

    Thanks in advance for any help you can give me.

    George

    opened by mpalaourg 1
  • The performance of baseline model is higher than original paper

    Thanks for your great work!

    However, I notice that the baseline performance reported in your paper is higher than in the original papers, for example for the MART model. Why is the performance so different? Do you use a different evaluation script?

    opened by PKULiuHui 1
  • I3D feature extraction

    Hi, thank you for your great work :)

    I would like to extract I3D features from my own video data. How can I access the feature extraction code? Can you share the script you used with kinetics-i3d (https://github.com/deepmind/kinetics-i3d)? I also tried extracting the features with https://github.com/v-iashin/video_features/blob/4fa02bd5c5b8c34081dcfb609e2bcd5a973eaab2, but the resulting feature values were different.

    opened by lee-jiyoung 0
  • RL not working

    [issue] The fine-tuning step doesn't increase the scores (it even decreases them). Please refer to the green line in the chart below.

    [image: chart of training scores; the green line shows the fine-tuning run]

    [How to reproduce] I trained the provided code on the ActivityNet dataset, following the data preparation and training instructions in the README.

    [question] Is this the same behavior you observed during your training?

    opened by Kashu7100 2
  • Trouble downloading the dataset

    Hi syuqings, I can't access your dataset because I cannot download it in my region. Is it possible to host the dataset somewhere besides BaiduNetdisk (e.g., Google Drive)?

    opened by riokt 0
Owner
Yuqing Song
A student from RUC, majoring in CS.