Towards Long-Form Video Understanding


Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021

[Paper] [Project Page] [Dataset]

Citation

@inproceedings{lvu2021,
  Author    = {Chao-Yuan Wu and Philipp Kr\"{a}henb\"{u}hl},
  Title     = {{Towards Long-Form Video Understanding}},
  Booktitle = {{CVPR}},
  Year      = {2021}}

Overview

This repo implements Object Transformers for long-form video understanding.

Getting Started

Please organize data/ as follows:

data
|_ ava
|_ features
|_ instance_meta
|_ lvu_1.0

ava, features, and instance_meta can be found in this Google Drive folder. lvu_1.0 can be found here.
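If you want to sanity-check the layout before launching anything, a minimal sketch along these lines (the directory names are taken from the tree above) can help:

# Minimal sketch (not part of the repo): verify the expected data/ layout.
from pathlib import Path

DATA_ROOT = Path("data")

for subdir in ["ava", "features", "instance_meta", "lvu_1.0"]:
    path = DATA_ROOT / subdir
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")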

Please also download the pre-trained weights from this Google Drive folder and put them in pretrained_models/.
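To verify a download before training, you can peek at a checkpoint's parameter names. This is a minimal sketch assuming the files are standard PyTorch checkpoints (the repo builds on PyTorch via Huggingface Transformers); checkpoint.pth is a placeholder name, so substitute the actual file names from the Drive folder.

# Minimal sketch: list the first few parameter names in a checkpoint.
# Assumes a standard PyTorch checkpoint; "checkpoint.pth" is a placeholder.
import torch

state = torch.load("pretrained_models/checkpoint.pth", map_location="cpu")
# Checkpoints are sometimes a raw state_dict, sometimes a dict wrapping one.
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]
for name in list(state)[:10]:
    print(name)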

Pre-training

python3 -u run_pretrain.py

This pretrains on a small demo dataset, data/instance_meta/instance_meta_pretrain_demo.pkl, as an example. Please follow its file format if you'd like to pretrain on a larger dataset (e.g., the latest full version of MovieClips).
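If you want to build your own instance-meta file, a quick way to learn the expected format is to inspect the demo pickle. The sketch below assumes nothing about its structure beyond it being a standard pickle; it just prints what it finds.

# Minimal sketch: inspect the demo instance-meta pickle to learn its format
# before creating a larger pre-training set of your own.
import pickle

with open("data/instance_meta/instance_meta_pretrain_demo.pkl", "rb") as f:
    meta = pickle.load(f)

print(type(meta))
if isinstance(meta, dict):
    print("keys:", list(meta)[:10])
elif isinstance(meta, (list, tuple)) and meta:
    print("length:", len(meta))
    print("first item:", meta[0])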

Training and evaluating on AVA v2.2

python3 -u run_ava.py

This should achieve 31.0 mAP.

Training and evaluating on LVU tasks

python3 -u run.py [1-9]

The argument selects which task to run. Please see run.py for details.
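To sweep all nine tasks in one go, a minimal driver like the following works (task indices 1-9 as documented above):

# Minimal sketch: run every LVU task (1-9) sequentially. Each call is
# equivalent to invoking "python3 -u run.py <task>" by hand.
import subprocess
import sys

for task in range(1, 10):
    print(f"=== LVU task {task} ===")
    subprocess.run([sys.executable, "-u", "run.py", str(task)], check=True)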

Acknowledgment

This implementation borrows heavily from Hugging Face Transformers. Please consider citing it if you use this repo.

Comments
  • about person tracking algorithm for AVA dataset

    Hi, I want to track people across adjacent frames. I have already detected the person bounding boxes in each keyframe; could you tell me how to track them across frames? I haven't found this in your paper. Please help, thank you!

    opened by Chuckie-He 6
  • Details regarding bbox information for the MovieClips Dataset

    Hi, I had a query regarding the bounding box information extracted from the MovieClips Dataset in your framework. As far as I can tell, it is already provided in the GT labels and hence just passed forward in your main framework. Is this bbox information provided with the dataset, or is it something you computed on your own? If the latter, I would really appreciate it if you could share the codebase required for generating such annotations.

    Thanks!

    opened by aniket-agarwal1999 0
  • How do we evaluate our method on AVA for spatial-temporal action detection?

    Hi author, I am reading your paper; thanks for the awesome work. In Table 4 of the paper, are the results just about action recognition? I am more interested in action detection. Is run_ava.sh for action recognition or action detection?

    Thanks, napohou

    opened by napohou 0
  • Reproduction of paper results

    Hi! Thanks for the open-source code! I have a problem reproducing your results on the AVA dataset. How should I set the parameters to train your model on AVA from scratch?

    opened by troublecmd 1
  • Clarification on Outro and Logo Removal

    On page 5 of the paper, it is mentioned that the outro is removed for all videos. I was wondering whether the MovieClips watermark was also removed by cropping the bottom boundary of each video. Also, was the outro manually removed from each video, or was the last detected shot of every movie dropped as an approximation? Details on this would be greatly appreciated!

    opened by dfan 0