Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

Overview

WSDEC

This is the official repository for our NeurIPS 2018 paper Weakly Supervised Dense Event Captioning in Videos.

Description

Repo directories

  • ./: global config files, training, and evaluation scripts;
  • ./data: data directory;
  • ./model: our final models used to reproduce the results;
  • ./runs: the default output directory used to store trained models and result files;
  • ./scripts: helper scripts;
  • ./third_party: third-party dependencies, including the official evaluation scripts;
  • ./utils: helper functions;
  • ./train_script: all training scripts;
  • ./eval_script: all evaluation scripts.

Dependency

  • Python 2.7
  • CUDA 9.0 (note: you will encounter a bug saying Segmentation fault (core dumped) if you run our code with CUDA 8.0)
    • But it seems that the bug may still exist even with CUDA 9.0; see the issue
  • PyTorch 0.3.1 (note: the code is not compatible with newer versions)
  • numpy, hdf5, and other necessary packages (no special requirements)

Usage for reproduction

Before we start

Before training and testing, we should make sure the data and third-party dependencies are prepared. Here are the step-by-step instructions to get everything ready.

1. Clone our repo and submodules

git clone --recursive https://github.com/XgDuan/WSDEC
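
If you cloned the repository without --recursive, the submodules (e.g. the evaluation scripts under third_party) can still be fetched afterwards:

    cd WSDEC
    git submodule update --init --recursive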

2. Download all the data

  • Download the official C3D features. You can either download the data from the official website or from our OneDrive cloud.

    • Download from the official website; (note: after you download the C3D features, you can either place the file in the data folder and rename it anet_v1.3.c3d.hdf5, or create a soft link in the data directory with ln -s YOURC3DFeature data/anet_v1.3.c3d.hdf5)
  • Download the dense video captioning data from the official website; (similar to the C3D features, you are supposed to place the downloaded data in the data folder and rename it densecap)

  • Download the data for the official evaluation scripts densevid_eval;

    • run the command sh download.sh in the folder PREFIX/WSDEC/third_party/densevid_eval;
  • [Good news]: we provide a shell script to download all the data for you; just run the following commands (a quick sanity check of the downloaded C3D file is sketched below):

    cd data
    sh download.sh
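
After the downloads finish, you can quickly check that the C3D feature file is readable. This is only a minimal sketch: it assumes h5py is installed and does not rely on any particular internal layout of anet_v1.3.c3d.hdf5.

    # minimal sanity check for the downloaded C3D features (assumes h5py is installed)
    from __future__ import print_function
    import h5py

    with h5py.File('data/anet_v1.3.c3d.hdf5', 'r') as f:
        keys = list(f.keys())
        print('number of entries:', len(keys))
        print('first entry:', keys[0], f[keys[0]])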
    

3. Generate the dictionary for the caption model

python scripts/caption_preprocess.py
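
This writes the caption dictionary (the word/index mappings used by the caption model) into the data folder. The snippet below only sketches how to inspect it; the output file name (assumed here to be data/translator.pkl) and its exact structure depend on the preprocessing script, so treat both as assumptions.

    # inspect the generated caption dictionary
    # (the file name data/translator.pkl and its structure are assumptions)
    from __future__ import print_function
    import pickle

    with open('data/translator.pkl', 'rb') as f:
        translator = pickle.load(f)
    print(type(translator))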

Training

There are two steps for model training: first, pretrain a reasonably good caption model; second, train the final/baseline model.

pretrain the caption model

python train_script/train_cg_pretrain.py

train our final model

python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME

train baselines

  1. train the baseline model without classification loss:
python train_script/train_baseline_regressor.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias MODEL_NAME
  2. train the baseline model without regression branch:
python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --regressor_scale 0 --alias MODEL_NAME
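
Putting the training steps together, a typical reproduction run could look like the following. The pretrained checkpoint path and the --alias values are placeholders; replace them with your own.

    # 1. pretrain the caption model
    python train_script/train_cg_pretrain.py
    # 2. train the final model from the pretrained caption checkpoint
    python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias final_model
    # 3. train the two baselines
    python train_script/train_baseline_regressor.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --alias baseline_noclass
    python train_script/train_final.py --checkpoint_cg YOUR_PRETRAINED_CAPTION_MODEL.ckp --regressor_scale 0 --alias baseline_noregress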

About the arguments

All the arguments we use can be found in the corresponding training scripts. You can also pass your own arguments if you like, but please note that some arguments are deprecated (this is our own reimplementation of the paper; the first version of the code was too messy for anyone to use).
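
If you want to see which arguments a script accepts before overriding them, one quick way (assuming the scripts define their options with argparse-style add_argument calls, which is an assumption about the code) is to grep for them:

    grep -n "add_argument" train_script/train_final.py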

Testing

Testing is easier than training. First, during training, our scripts call densevid_eval in a subprocess every time the eval function runs, so you can get a general sense of the final performance by looking at the eval_results.txt file. Second, after some epochs, you can run the evaluation scripts:

  1. evaluate the full model or the no_regression model:
python eval_script/evaluate.py --checkpoint YOUR_TRAINED_MODEL.ckp
  2. evaluate the no_classification model:
python eval_script/evaluate_baseline_regressor.py --checkpoint YOUR_TRAINED_MODEL.ckp
  3. evaluate the pretrained model with random temporal segments:
python eval_script/evaluate_pretrain.py --checkpoint YOUR_PRETRAIN_CAPTION_MODEL.ckp
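
If you have saved several checkpoints under the default output directory runs/, you can evaluate them in one go. The glob pattern below is only an assumption about how the training scripts name their checkpoint files; adjust it to match yours:

    # evaluate every checkpoint of a full/no_regression model
    # (the runs/MODEL_NAME/*.ckp pattern is an assumption)
    for ckp in runs/MODEL_NAME/*.ckp; do
        python eval_script/evaluate.py --checkpoint "$ckp"
    done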

Other usages

Besides reproducing our work, there are at least two other interesting things you can do with our code.

Train a supervised sentence localization model

To learn what sentence localization is, you can have a look at our paper ABLR. Note that our work in fact provides an unsupervised solution to sentence localization; here we introduce the usage of the supervised model. We have written the trainer, so you can just run the following command and have a cup of coffee:

python train_script/train_sl.py

Train a supervised video event caption generation model

If you have read our paper, you will have found that event captioning is the dual task of the aforementioned sentence localization task. To train such a model, just run the following command:

python train_script/train_cg.py

BUGS

You may encounter a CUDA internal bug that says Segmentation fault (core dumped) during training if you are using CUDA 8.0. If this happens, try upgrading your CUDA to 9.0.

Other

We will add more descriptions of how to use our code. Please feel free to contact us if you have any questions or suggestions.

Trained model and results

Links for our trained model

You can download our pretrained models for evaluation or further usage from our OneDrive, which includes a pretrained caption generator (cg_pretrain.ckp), a baseline model without classification loss (baseline_noclass.ckp), a baseline model without regression branch (baseline_noregress.ckp), and our final model (final_model.ckp).
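
For example, after downloading the checkpoints (the paths below assume the files sit in the repository root; adjust them to wherever you saved the files), they can be evaluated with the scripts described in the Testing section:

    python eval_script/evaluate.py --checkpoint final_model.ckp
    python eval_script/evaluate.py --checkpoint baseline_noregress.ckp
    python eval_script/evaluate_baseline_regressor.py --checkpoint baseline_noclass.ckp
    python eval_script/evaluate_pretrain.py --checkpoint cg_pretrain.ckp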

Cite the paper and give us a star ⭐️

If you find our paper or code useful, please cite our paper using the following bibtex:

@incollection{NIPS2018_7569,
title = {Weakly Supervised Dense Event Captioning in Videos},
author = {Duan, Xuguang and Huang, Wenbing and Gan, Chuang and Wang, Jingdong and Zhu, Wenwu and Huang, Junzhou},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {3062--3072},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7569-weakly-supervised-dense-event-captioning-in-videos.pdf}
}