# Towards Long-Form Video Understanding

Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021

[Paper] [Project Page] [Dataset]
## Citation

```bibtex
@inproceedings{lvu2021,
  Author = {Chao-Yuan Wu and Philipp Kr\"{a}henb\"{u}hl},
  Title = {{Towards Long-Form Video Understanding}},
  Booktitle = {{CVPR}},
  Year = {2021}}
```
## Overview
This repo implements Object Transformers for long-form video understanding.
## Getting Started
Please organize `data/` as follows:

```
data
|_ ava
|_ features
|_ instance_meta
|_ lvu_1.0
```
`ava`, `features`, and `instance_meta` can be found in this Google Drive folder. `lvu_1.0` can be found here.
Please also download the pre-trained weights from this Google Drive folder and put them in `pretrained_models/`.
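
The snippet below is a minimal sketch (not part of this repo) that checks the layout above is in place before running the scripts that follow; it assumes each entry in the tree, plus `pretrained_models/`, is a directory.

```python
# Minimal sketch (not part of this repo): verify the expected data layout.
# Assumes each entry in the tree above, plus pretrained_models/, is a directory.
from pathlib import Path

expected = [
    "data/ava",
    "data/features",
    "data/instance_meta",
    "data/lvu_1.0",
    "pretrained_models",
]

for path in expected:
    status = "ok" if Path(path).is_dir() else "MISSING"
    print(f"{path}: {status}")
```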
## Pre-training
```
python3 -u run_pretrain.py
```
This pretrains on a small demo dataset, `data/instance_meta/instance_meta_pretrain_demo.pkl`, as an example. Please follow its file format if you'd like to pretrain on a larger dataset (e.g., the latest full version of MovieClips).
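
Since the expected schema lives in the demo file itself, a quick way to learn it is to load the pickle and inspect its structure; the sketch below (not part of this repo) does just that. The printed structure, not this script, is the authoritative format.

```python
# Minimal sketch (not part of this repo): inspect the demo metadata file to
# learn its format before building a larger pre-training dataset that
# follows the same layout.
import pickle

with open("data/instance_meta/instance_meta_pretrain_demo.pkl", "rb") as f:
    instance_meta = pickle.load(f)

print(type(instance_meta))
# Peek at a few entries, whether the top level is a dict or a list.
if isinstance(instance_meta, dict):
    for key in list(instance_meta)[:3]:
        print(key, "->", type(instance_meta[key]))
elif isinstance(instance_meta, (list, tuple)):
    for item in instance_meta[:3]:
        print(type(item))
```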
## Training and evaluating on AVA v2.2
```
python3 -u run_ava.py
```
This should achieve 31.0 mAP.
## Training and evaluating on LVU tasks
```
python3 -u run.py [1-9]
```
The argument selects which LVU task to run; please see `run.py` for details.
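
For example, to run the task at index 1 (the task indices and their meanings are defined in `run.py`):

```
python3 -u run.py 1
```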
## Acknowledgment
This implementation largely borrows from Hugging Face Transformers. Please consider citing it as well if you use this repo.