Code for the paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021).

Overview

Towards Diverse Paragraph Captioning for Untrimmed Videos

This repository contains the PyTorch implementation of our paper Towards Diverse Paragraph Captioning for Untrimmed Videos (CVPR 2021).

Requirements

  • Python 3.6
  • Java 15.0.2
  • PyTorch 1.2
  • numpy, tqdm, h5py, scipy, six

Training & Inference

Data preparation

  1. Download the pre-extracted video features of the ActivityNet Captions or Charades Captions datasets from BaiduNetdisk (code: he21).
  2. Decompress the downloaded files to the corresponding dataset folder in the ordered_feature/ directory.
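
A quick sanity check with h5py (already listed in the requirements) can confirm the decompressed features load correctly. This is a minimal sketch; the file name below is hypothetical, so substitute an actual feature file from ordered_feature/:

```python
import h5py

# Hypothetical file name; point this at a real feature file under ordered_feature/.
feature_file = "ordered_feature/activitynet/features.h5"

with h5py.File(feature_file, "r") as f:
    video_ids = list(f.keys())
    print(f"{len(video_ids)} videos found")
    # Inspect one entry: typically a (num_frames, feat_dim) float array.
    first = f[video_ids[0]]
    print(video_ids[0], first.shape, first.dtype)
```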

Start training

  1. Train our model without reinforcement learning, where * can be activitynet or charades:
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token/model.json ../results/*/dm.token/path.json --is_train
  2. Fine-tune the pretrained model with self-critical training, using both accuracy and diversity rewards (see the reward sketch below):
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/dm.token.rl/model.json ../results/*/dm.token.rl/path.json --is_train --resume_file ../results/*/dm.token/model/epoch.*.th
  3. Train our model with key-frame selection:
$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/key_frames/model.json ../results/*/key_frames/path.json --is_train --resume_file ../results/*/key_frames/pretrained.th

Key-frame selection achieves slightly worse results while using only half of the video features at inference time, which speeds up decoding. You need to download the pretrained.th model first for key-frame selection.
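
As rough intuition for step 2, below is a minimal sketch of a self-critical objective that mixes an accuracy reward with a diversity reward, assuming a CIDEr-style accuracy score and an n-gram-uniqueness diversity score. The reward functions and mixing weight are simplified placeholders, not the exact implementation in this repo:

```python
import torch

def diversity_reward(tokens, n=4):
    # Placeholder diversity reward: fraction of distinct n-grams in a paragraph.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

def mixed_reward(accuracy, diversity, w=0.5):
    # Combine an accuracy reward (e.g. CIDEr) with the diversity reward; w is illustrative.
    return accuracy + w * diversity

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    # Self-critical policy gradient: the greedy decode serves as the reward baseline,
    # so the model is pushed toward samples that beat its own greedy output.
    # sample_logprobs: (batch,) summed log-probs of the sampled captions.
    advantage = (sample_reward - greedy_reward).detach()
    return -(advantage * sample_logprobs).mean()
```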

Evaluation

The trained checkpoints are saved in the results/*/folder/model/ directory. After evaluation, the generated captions (corresponding to the name file in the public_split) and the evaluation scores are saved in results/*/folder/pred/tst/.

$ cd driver
$ CUDA_VISIBLE_DEVICES=0 python transformer.py ../results/*/folder/model.json ../results/*/folder/path.json --eval_set tst --resume_file ../results/*/folder/model/epoch.*.th
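
To inspect the generated paragraphs afterwards, a loader along these lines should work, assuming the predictions are saved as a JSON file mapping video names to captions. The file name and format here are assumptions, so check the actual files under pred/tst/:

```python
import json

# Hypothetical prediction file; check results/*/folder/pred/tst/ for the real name.
pred_file = "../results/activitynet/dm.token/pred/tst/predictions.json"

with open(pred_file) as f:
    preds = json.load(f)

# Print the first few generated paragraphs.
for video_name, paragraph in list(preds.items())[:3]:
    print(video_name, "->", paragraph)
```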

We also provide pretrained models for the ActivityNet dataset here and the Charades dataset here; they were re-run and achieve results similar to those reported in the paper.

Reference

If you find this repo helpful, please consider citing:

@inproceedings{song2021paragraph,
  title={Towards Diverse Paragraph Captioning for Untrimmed Videos},
  author={Song, Yuqing and Chen, Shizhe and Jin, Qin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Comments
  • How to rerun strong baselines using the same video features on ActivityNet

    Hi syuqings, thank you for your repo. I am a little confused about how to run the strong-baseline experiments using the same video features. For example, in VTransformer or MART, the features are sampled every 0.5 s and the specific frame indices are loaded. When rerunning them, did you set the sampling interval for your provided features? Or did you use your framework and just swap in the baseline methods? Could you share some details about this? Thank you very much.

    opened by sushizixin 8
  • Sorry to be a bother

    Dear professor syuqings,

    Sorry to be a bother. I read your paper and am very interested in it. Strictly following the README, I don't understand what pretrained.th does. Does pretrained.th have to be used to achieve the results reported in the paper?

    I would appreciate your reply as soon as possible, thank you very much!

    Best regards,

    opened by 123456789live 6
  • ResNet model

    Hi, congratulations on the excellent work. I am unable to download the ResNet model from https://pan.baidu.com/s/1xznBbIELY8Pp5N-98fiqdQ#list/path=%2F because it asks for user information. Can you upload this model to Google Drive or another alternative location?

    Thanks in advance.

    opened by avinashsai 2
  • val set of charades dataset

    Dear professor syuqings,

    Thank you very much for your article, and sorry to bother you. Why is the val set of the Charades dataset the same as the test set? My evaluation score was the same as the training-model score.

    I would appreciate your reply as soon as possible, thank you very much!

    Best regards,

    opened by ZHONGHZONG 2
  • pretrained.th

    Dear professor syuqings,

    Strictly following the README, in the third step of training, why is the post-training evaluation not as high as for the model from the second step? Am I loading and training pretrained.th the wrong way?

    Looking forward to your reply

    opened by ZHONGHZONG 2
  • Testing on a different video. Missing config files

    Hi @syuqings,

    Firstly, thanks for your work and for making your code public. I want to run inference with your model (produce paragraphs) on a different set of videos (instead of the ActivityNet or Charades datasets). I have read the previous issues regarding testing on one's own videos, and I have extracted the features the model needs.

    I am trying to run the driver/transformer.py script to produce the video paragraphs, but I cannot get past the model_cfg_file and path_cfg_file arguments. I assume those are config files for the modules, but I cannot find them in the repo.

    Thanks in advance for any help you can give me.

    George

    opened by mpalaourg 1
  • The performance of baseline model is higher than original paper

    Thanks for your great work!

    However, I notice that the baseline performance reported in your paper is higher than in the original papers, for example for the MART model. Why is the performance so different? Do you use a different evaluation script?

    opened by PKULiuHui 1
  • I3D feature extraction

    Hi, thank you for your great work :)

    I would like to extract I3D features from my own video data. How can I access the feature extraction code? Can you share the script you used with kinetics-i3d (https://github.com/deepmind/kinetics-i3d)? I also tried extracting the features with https://github.com/v-iashin/video_features/blob/4fa02bd5c5b8c34081dcfb609e2bcd5a973eaab2, but the resulting feature values were different.

    opened by lee-jiyoung 0
  • RL not working

    [issue] The fine-tuning step doesn't increase the scores (it even decreases them). Please refer to the green line in the chart below.

    [image: chart of training scores; the green line shows the fine-tuning run]

    [How to reproduce] I trained the provided code on the ActivityNet dataset, following the data preparation and training instructions in the README.

    [question] Is this the same behavior you observed during your training?

    opened by Kashu7100 2
  • Trouble downloading the dataset

    Hi syuqings, I can't access your dataset because I cannot download it in my region. Is it possible to host the dataset somewhere besides BaiduNetdisk (e.g., Google Drive)?

    opened by riokt 0
Owner
Yuqing Song
A student from RUC, majoring in CS.