PyVideoAI: Action Recognition Framework

Kiyoon Kim

Last update: Dec 29, 2022

Related tags

Deep Learning PyVideoAI

Overview

This reposity contains official implementation of:

Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition (Kim et al., 2022) Instruction

PyVideoAI: Action Recognition Framework

The only framework that completes your computer vision, action recognition research environment.

** Key features **

Supports multi-gpu, multi-node training.
STOA models such as I3D, Non-local, TSN, TRN, TSM, MVFNet, ..., and even ImageNet training!
Many datasets such as Kinetics-400, EPIC-Kitchens-55, Something-Something-V1/V2, HMDB-51, UCF-101, Diving48, CATER, ...
Supports both video decoding (straight from .avi/mp4) and frame extracted (.jpg/png) dataloaders, sparse-sample and dense-sample.
Any popular LR scheduling like Cosine Annealing with Warm Restart, Step LR, and Reduce LR on Plateau.
Early stopping when training doesn't improve (customise your condition)
Easily add custom model, optimiser, scheduler, loss and dataloader!
Telegram bot reporting experiment status.
TensorBoard reporting stats.
Colour logging
All of the above come with no extra setup. Trust me and try some examples.

** Papers implemented **

[Official] TC Reordering and GrayST (Kim et al. 2022).
ProSelfLC (CVPR 2021).

This package is motivated by PySlowFast from Facebook AI. The PySlowFast is a cool framework, but it depends too much on their config system and it was difficult to add new models (other codes) or reuse part of the modules from the framework.
This framework by Kiyoon, is designed to replace all the configuration systems to Python files, which enables easy-addition of custom models/LR scheduling/dataloader etc.
Just modify the function bodies in the config files!

Difference between the two config systems can be found in CONFIG_SYSTEM.md.

Getting Started

Jupyter Notebook examples to run:

HMDB-51 data preparation
Inference on pre-trained model from the model zoo, and visualise model/dataloader/per-class performance.
Training I3D using Kinetics pretrained model
Using image model and ImageNet dataset

is provided in the examples!

Structure

All of the executable files are in tools/.
dataset_configs/ directory configures datasets. For example, where is the dataset stored, number of classes, single-label or multi-label training, dataset-specific visualisation settings (confusion matrix has different output sizes)
model_configs/ directory configures model architectures. For example, model definition, input preprocessing mean/std.
exp_configs/ directory configures other training settings like optimiser, scheduling, dataloader, number of frames as input. The config file path has to be in exp_configs/[dataset_name]/[model_name]_[experiment_name].py format.

Usage

Preparing datasets

This package supports many action recognition datasets such as HMDB-51, EPIC-Kitchens-55, Something-Something-V1, CATER, etc.
Refer to DATASET.md.

Training command

CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/run_train.py -D {dataset_config_name} -M {model_config_name} -E {exp_config_name} --local_world_size {num_GPUs} -e {num_epochs}

--local_world_size denotes the number of GPUs per computing node.

Telegram Bot

You can preview experiment results using Telegram bots!

If your code raises an exception, it will report you too.

You can quickly take a look at example video inputs (as GIF or JPEGs) from the dataloader.
Use tools/visualisations/model_and_dataloader_visualiser.py

Talk to BotFather and make a bot.
Go to your bot and type anything (/start)
Find chat_id at https://api.telegram.org/bot{token}/getUpdates (without braces)
Add your token and chat_id to tools/key.ini.

[Telegram0]
token=
chat_id=

Model Zoo and Baselines

Refer to MODEL_ZOO.md

Installation

Refer to INSTALL.md.

TL;DR,

conda create -n videoai python=3.8
conda activate videoai
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=10.2 -c pytorch
### For RTX 30xx GPUs,
#conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.1 -c pytorch -c nvidia
 

git clone --recurse-submodules https://github.com/kiyoon/PyVideoAI.git
cd PyVideoAI
git checkout v0.3
git submodule update --recursive
cd submodules/video_datasets_api
pip install -e .
cd ../experiment_utils
pip install -e .
cd ../..
pip install -e .

Experiment outputs

The experiment results (log, training stats, weights, tensorboard, plots, etc.) are saved to data/experiments by default. This can be huge, so make sure you make a softlink of a directory you really want to use. (recommended)

Otherwise, you can change pyvideoai/config.py's DEFAULT_EXPERIMENT_ROOT value. Or, you can also set --experiment_root argument manually when executing.

You might also like...

AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is developed based on D3M infrastructure, which describes machine learning with generic pipeline languages. Currently, it focuses on video action recognition, supporting various state-of-the-art video action recognition algorithms. It also supports automated model selection and hyperparameter tuning. AutoVideo is developed by DATA Lab at Texas A&M University.

Data Analytics Lab at Texas A&M University

267 Dec 17, 2022

3D ResNets for Action Recognition (CVPR 2018)

3D ResNets for Action Recognition Update (2020/4/13) We published a paper on arXiv. Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,

3.5k Jan 6, 2023

[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

CTR-GCN This repo is the official implementation for Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. The pap

148 Dec 16, 2022

This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] Overview Content Prerequisites Data Prep

268 Jan 9, 2023

This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] Overview Content Prerequisites Data Prep

32 Sep 25, 2021

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenGaze: Web Service for OpenFace Facial Behaviour Analysis Toolkit Overview OpenFace is a fantastic tool intended for computer vision and machine le

4 Nov 3, 2022

OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

5.8k Dec 31, 2022

keyframes-CNN-RNN(action recognition)

keyframes-CNN-RNN(action recognition) Environment: python=3.7 pytorch=1.2 Datasets: Following the format of UCF101 action recognition. Run steps: Mo

4 Feb 9, 2022

Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19

2s-AGCN Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19 Note PyTorch version should be 0.3! For PyTor

547 Dec 26, 2022

Comments

Has this method been tested on Kinetics?

Hi! Thanks to your great work. Have you tested Time-Color Reordering and Grayscale Short-Term Stacking on Kinetics? I wonder if this method works on Kinetics.

opened by bolin-chen 2

PyVideoAI: Action Recognition Framework

Related tags

Overview

PyVideoAI: Action Recognition Framework

Getting Started

Structure

Usage

Preparing datasets

Training command

Telegram Bot

Model Zoo and Baselines

Installation

Experiment outputs

You might also like...

AutoVideo: An Automated Video Action Recognition System

3D ResNets for Action Recognition (CVPR 2018)

[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

keyframes-CNN-RNN(action recognition)

Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19

Comments

Has this method been tested on Kinetics?

Owner

Kiyoon Kim

Allows including an action inside another action (by preprocessing the Yaml file). This is how composite actions should have worked.

Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Human Action Controller - A human action controller running on different platforms.

TDN: Temporal Difference Networks for Efficient Action Recognition

Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

Learning Representational Invariances for Data-Efficient Action Recognition

Synthetic Humans for Action Recognition, IJCV 2021

A pytorch reproduction of { Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation }.

Compressed Video Action Recognition

PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"