OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

OpenMMLab

Last update: Jan 7, 2023

Related tags

Deep Learning benchmark pytorch ava bsn x3d action-recognition video-understanding tsm non-local i3d tsn bmn slowfast temporal-action-localization spatial-temporal-action-detection openmmlab

Overview

Introduction

English | 简体中文

MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.

The master branch works with PyTorch 1.3+.

Action Recognition Results on Kinetics-400

Spatio-Temporal Action Detection Results on AVA-2.1

Major Features

Modular design

We decompose the video understanding framework into different components and one can easily construct a customized video understanding framework by combining different modules.
Support for various datasets

The toolbox directly supports multiple datasets, UCF101, Kinetics-[400/600/700], Something-Something V1&V2, Moments in Time, Multi-Moments in Time, THUMOS14, etc.
Support for multiple video understanding frameworks

MMAction2 implements popular frameworks for video understanding:
- For action recognition, various algorithms are implemented, including TSN, TSM, TIN, R(2+1)D, I3D, SlowOnly, SlowFast, CSN, Non-local, etc.
- For temporal action localization, we implement BSN, BMN, SSN.
- For spatial temporal detection, we implement SlowOnly, SlowFast.
Well tested and documented

We provide detailed documentation and API reference, as well as unittests.

Changelog

v0.13.0 was released in 31/03/2021. Please refer to changelog.md for details and release history.

Benchmark

Model	input	io backend	batch size x gpus	MMAction2 (s/iter)	MMAction (s/iter)	Temporal-Shift-Module (s/iter)	PySlowFast (s/iter)
TSN	256p rawframes	Memcached	32x8	0.32	0.38	0.42	x
TSN	256p dense-encoded video	Disk	32x8	0.61	x	x	TODO
I3D heavy	256p videos	Disk	8x8	0.34	x	x	0.44
I3D	256p rawframes	Memcached	8x8	0.43	0.56	x	x
TSM	256p rawframes	Memcached	8x8	0.31	x	0.41	x
Slowonly	256p videos	Disk	8x8	0.32	TODO	x	0.34
Slowfast	256p videos	Disk	8x8	0.69	x	x	1.04
R(2+1)D	256p videos	Disk	8x8	0.45	x	x	x

Details can be found in benchmark.

ModelZoo

Supported methods for Action Recognition:

(click to collapse)

✅ TSN (ECCV'2016)
✅ TSM (ICCV'2019)
✅ TSM Non-Local (ICCV'2019)
✅ R(2+1)D (CVPR'2018)
✅ I3D (CVPR'2017)
✅ I3D Non-Local (CVPR'2018)
✅ SlowOnly (ICCV'2019)
✅ SlowFast (ICCV'2019)
✅ CSN (ICCV'2019)
✅ TIN (AAAI'2020)
✅ TPN (CVPR'2020)
✅ C3D (CVPR'2014)
✅ X3D (CVPR'2020)
✅ OmniSource (ECCV'2020)
✅ MultiModality: Audio (ArXiv'2020)
✅ TANet (ArXiv'2020)
✅ TRN (CVPR'2015)

Supported methods for Temporal Action Detection:

(click to collapse)

✅ BSN (ECCV'2018)
✅ BMN (ICCV'2019)
✅ SSN (ICCV'2017)

Supported methods for Spatial Temporal Action Detection:

(click to collapse)

✅ SlowOnly+Fast R-CNN (ICCV'2019)
✅ SlowFast+Fast R-CNN (ICCV'2019)
✅ Long-Term Feature Bank (CVPR'2019)

Results and models are available in the README.md of each method's config directory. A summary can be found in the model zoo page.

We will keep up with the latest progress of the community, and support more popular algorithms and frameworks. If you have any feature requests, please feel free to leave a comment in Issues.

Dataset

Supported datasets:

Supported datasets for Action Recognition:

(click to collapse)

✅ UCF101 [ Homepage ] (CRCV-IR-12-01)
✅ HMDB51 [ Homepage ] (ICCV'2011)
✅ Kinetics-[400/600/700] [ Homepage ] (CVPR'2017)
✅ Something-Something V1 [ Homepage ] (ICCV'2017)
✅ Something-Something V2 [ Homepage ] (ICCV'2017)
✅ Moments in Time [ Homepage ] (TPAMI'2019)
✅ Multi-Moments in Time [ Homepage ] (ArXiv'2019)
✅ HVU [ Homepage ] (ECCV'2020)
✅ Jester [ Homepage ] (ICCV'2019)
✅ GYM [ Homepage ] (CVPR'2020)
✅ ActivityNet [ Homepage ] (CVPR'2015)

Supported datasets for Temporal Action Detection

(click to collapse)

✅ ActivityNet [ Homepage ] (CVPR'2015)
✅ THUMOS14 [ Homepage ] (THUMOS Challenge 2014)

Supported datasets for Spatial Temporal Action Detection

(click to collapse)

✅ AVA [ Homepage ] (CVPR'2018)
🔲 UCF101-24 [ Homepage ] (CRCV-IR-12-01)
🔲 JHMDB [ Homepage ] (ICCV'2013)

Datasets marked with 🔲 are not fully supported yet, but related dataset preparation steps are provided.

Installation

Please refer to install.md for installation.

Data Preparation

Please refer to data_preparation.md for a general knowledge of data preparation. The supported datasets are listed in supported_datasets.md

Get Started

Please see getting_started.md for the basic usage of MMAction2. There are also tutorials:

A Colab tutorial is also provided. You may preview the notebook here or directly run on Colab.

FAQ

Please refer to FAQ for frequently asked questions.

License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful in your research, please consider cite:

@misc{2020mmaction2,
    title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
    author={MMAction2 Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
    year={2020}
}

Contributing

We appreciate all contributions to improve MMAction2. Please refer to CONTRIBUTING.md in MMCV for more details about the contributing guideline.

Acknowledgement

MMAction2 is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedbacks. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new models.

Projects in OpenMMLab

MMCV: OpenMMLab foundational library for computer vision.
MMClassification: OpenMMLab image classification toolbox and benchmark.
MMDetection: OpenMMLab detection toolbox and benchmark.
MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
MMAction2: OpenMMLab's next-generation video understanding toolbox and benchmark.
MMTracking: OpenMMLab video perception toolbox and benchmark.
MMPose: OpenMMLab pose estimation toolbox and benchmark.
MMEditing: OpenMMLab image and video editing toolbox.
MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.

Comments

[Improvement] Set RandAugment as Imgaug default transforms.
Use imgaug to reimplement RandAugment.

According to VideoMix, RandAugment helps a little.

Results

sthv1 & tsm-r50, 8V100, 50epochs

|configs|top1 acc(efficient/accuracy)|top5 acc(efficient/accuracy)| |:-|:-:|:-:| |mmaction2 model zoo|45.58 / 47.70|75.02 / 76.12| |testing with model zoo ckpt|45.47 / 47.55|74.56 / 75.79| |training with default config|45.82 / 47.90|74.38 / 76.02| |flip|47.10 / 48.51|75.02 / 76.12| |randaugment|47.16 / 48.90|76.07 / 77.92| |flip+randaugment|47.85/50.31|76.78/78.18|

Kinetics400, 8 V100, test with 256x256 & three crops

|Models|top1/5 accuracy|Training lost(epoch 100)|training time| |:-:|:-:|:-:|:-:| |TSN-R50-1x1x8-Vanilla|70.74%/89.37%|0.8|2days 12hours| |TSN-R50-1x1x8-RandAugment|71.07%/89.40%|1.3|2days 22hours| |I3D-R50-32x2x1-Vanilla|74.48%/91.62%|1.1|3days 10hours| |I3D-R50-32x2x1-RandAugment|74.23%/91.45%|1.5|4days 10hours|
opened by irvingzhang0512 40

What this training log refers? And for training SlowFast on new data for some custom activity, is there any minimum sample size to start with?

I prepared short sample custom data in AVA format for 2 activity Sweeping and walking, then trained SlowFast for 50 epocs on clip_len=16 (due to hardware limitation). Sharing below the training log json details, looks like its not learning anything because mAP is consistently 0 for all epocs, what could be possible reasons behind it?

Compiler: 10.2\nMMAction2: 0.12.0+13f42bf", "seed": null, "config_name": "custom_slowfast.py", "work_dir": "slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_clean-data_new_e80", "hook_msgs": {}}

{"mode": "train", "epoch": 1, "iter": 20, "lr": 0.0562, "memory": 8197, "data_time": 0.18563, "loss_action_cls": 0.16409, "recall@thr=0.5": 0.71278, "prec@thr=0.5": 0.67664, "recall@top3": 0.90636, "prec@top3": 0.30212, "recall@top5": 0.91545, "prec@top5": 0.18309, "loss": 0.16409, "grad_norm": 0.91884, "time": 0.99759}
{"mode": "val", "epoch": 1, "iter": 22, "lr": 0.0598, "[email protected]": 0.0}

{"mode": "train", "epoch": 2, "iter": 20, "lr": 0.0958, "memory": 8197, "data_time": 0.1842, "loss_action_cls": 0.10098, "recall@thr=0.5": 0.75593, "prec@thr=0.5": 0.74255, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10098, "grad_norm": 0.34649, "time": 0.98014}
{"mode": "val", "epoch": 2, "iter": 22, "lr": 0.0994, "[email protected]": 0.0}

{"mode": "train", "epoch": 3, "iter": 20, "lr": 0.1354, "memory": 8197, "data_time": 0.18966, "loss_action_cls": 0.10026, "recall@thr=0.5": 0.77377, "prec@thr=0.5": 0.77127, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10026, "grad_norm": 0.30035, "time": 0.99118}
{"mode": "val", "epoch": 3, "iter": 22, "lr": 0.139, "[email protected]": 0.0}

{"mode": "train", "epoch": 4, "iter": 20, "lr": 0.175, "memory": 8197, "data_time": 0.18845, "loss_action_cls": 0.12424, "recall@thr=0.5": 0.79485, "prec@thr=0.5": 0.78929, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.12424, "grad_norm": 0.19094, "time": 0.99367}
{"mode": "val", "epoch": 4, "iter": 22, "lr": 0.1786, "[email protected]": 0.0}

{"mode": "train", "epoch": 5, "iter": 20, "lr": 0.2146, "memory": 8197, "data_time": 0.18817, "loss_action_cls": 0.11159, "recall@thr=0.5": 0.79285, "prec@thr=0.5": 0.77159, "recall@top3": 0.99545, "prec@top3": 0.33182, "recall@top5": 0.99545, "prec@top5": 0.19909, "loss": 0.11159, "grad_norm": 0.16631, "time": 0.99733}
{"mode": "val", "epoch": 5, "iter": 22, "lr": 0.2182, "[email protected]": 0.0}

{"mode": "train", "epoch": 6, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18938, "loss_action_cls": 0.11952, "recall@thr=0.5": 0.735, "prec@thr=0.5": 0.73273, "recall@top3": 0.98, "prec@top3": 0.32667, "recall@top5": 0.98, "prec@top5": 0.196, "loss": 0.11952, "grad_norm": 0.26395, "time": 0.99816}
{"mode": "val", "epoch": 6, "iter": 22, "lr": 0.22, "[email protected]": 0.0}

{"mode": "train", "epoch": 7, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.19043, "loss_action_cls": 0.11324, "recall@thr=0.5": 0.82705, "prec@thr=0.5": 0.82227, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.11324, "grad_norm": 0.1336, "time": 0.9999}
{"mode": "val", "epoch": 7, "iter": 22, "lr": 0.22, "[email protected]": 0.0}
{"mode": "train", "epoch": 8, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18619, "loss_action_cls": 0.08463, "recall@thr=0.5": 0.82482, "prec@thr=0.5": 0.81927, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08463, "grad_norm": 0.11848, "time": 0.99716}
{"mode": "val", "epoch": 8, "iter": 22, "lr": 0.22, "[email protected]": 0.0}

{"mode": "train", "epoch": 9, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18562, "loss_action_cls": 0.09073, "recall@thr=0.5": 0.77285, "prec@thr=0.5": 0.77035, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09073, "grad_norm": 0.12449, "time": 0.99849}
{"mode": "val", "epoch": 9, "iter": 22, "lr": 0.22, "[email protected]": 0.0}

{"mode": "train", "epoch": 10, "iter": 20, "lr": 0.22, "memory": 8197, "data_time": 0.18366, "loss_action_cls": 0.09193, "recall@thr=0.5": 0.81924, "prec@thr=0.5": 0.81369, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09193, "grad_norm": 0.09078, "time": 0.99763}
{"mode": "val", "epoch": 10, "iter": 22, "lr": 0.22, "[email protected]": 0.0}

{"mode": "train", "epoch": 11, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18933, "loss_action_cls": 0.09355, "recall@thr=0.5": 0.84336, "prec@thr=0.5": 0.84086, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09355, "grad_norm": 0.08913, "time": 1.00207}
{"mode": "val", "epoch": 11, "iter": 22, "lr": 0.022, "[email protected]": 0.0}

{"mode": "train", "epoch": 12, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18655, "loss_action_cls": 0.09352, "recall@thr=0.5": 0.84199, "prec@thr=0.5": 0.83949, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09352, "grad_norm": 0.09578, "time": 0.99861}
{"mode": "val", "epoch": 12, "iter": 22, "lr": 0.022, "[email protected]": 0.0}

{"mode": "train", "epoch": 13, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18258, "loss_action_cls": 0.09836, "recall@thr=0.5": 0.86856, "prec@thr=0.5": 0.86856, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09836, "grad_norm": 0.07878, "time": 0.99762}
{"mode": "val", "epoch": 13, "iter": 22, "lr": 0.022, "[email protected]": 0.0}

{"mode": "train", "epoch": 14, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18307, "loss_action_cls": 0.08192, "recall@thr=0.5": 0.86619, "prec@thr=0.5": 0.86619, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08192, "grad_norm": 0.07241, "time": 0.99841}
{"mode": "val", "epoch": 14, "iter": 22, "lr": 0.022, "[email protected]": 0.0}

{"mode": "train", "epoch": 15, "iter": 20, "lr": 0.022, "memory": 8197, "data_time": 0.18555, "loss_action_cls": 0.07062, "recall@thr=0.5": 0.84995, "prec@thr=0.5": 0.84995, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07062, "grad_norm": 0.07792, "time": 0.99924}
{"mode": "val", "epoch": 15, "iter": 22, "lr": 0.022, "[email protected]": 0.0}

{"mode": "train", "epoch": 16, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18864, "loss_action_cls": 0.08495, "recall@thr=0.5": 0.86629, "prec@thr=0.5": 0.86629, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08495, "grad_norm": 0.08121, "time": 1.00141}
{"mode": "val", "epoch": 16, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 17, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18965, "loss_action_cls": 0.11092, "recall@thr=0.5": 0.8503, "prec@thr=0.5": 0.8503, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.11092, "grad_norm": 0.06323, "time": 1.00582}
{"mode": "val", "epoch": 17, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 18, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18077, "loss_action_cls": 0.08457, "recall@thr=0.5": 0.85369, "prec@thr=0.5": 0.85369, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08457, "grad_norm": 0.06237, "time": 0.9956}
{"mode": "val", "epoch": 18, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 19, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18342, "loss_action_cls": 0.08996, "recall@thr=0.5": 0.84434, "prec@thr=0.5": 0.84226, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08996, "grad_norm": 0.07551, "time": 0.99802}
{"mode": "val", "epoch": 19, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 20, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18127, "loss_action_cls": 0.08211, "recall@thr=0.5": 0.85747, "prec@thr=0.5": 0.85747, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08211, "grad_norm": 0.06186, "time": 0.99498}
{"mode": "val", "epoch": 20, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 21, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18135, "loss_action_cls": 0.0857, "recall@thr=0.5": 0.84931, "prec@thr=0.5": 0.84931, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0857, "grad_norm": 0.07136, "time": 0.995}
{"mode": "val", "epoch": 21, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 22, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18529, "loss_action_cls": 0.08998, "recall@thr=0.5": 0.86644, "prec@thr=0.5": 0.86208, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08998, "grad_norm": 0.07752, "time": 0.99948}
{"mode": "val", "epoch": 22, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 23, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18675, "loss_action_cls": 0.07464, "recall@thr=0.5": 0.84141, "prec@thr=0.5": 0.84141, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07464, "grad_norm": 0.07109, "time": 1.02437}
{"mode": "val", "epoch": 23, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 24, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.19255, "loss_action_cls": 0.09615, "recall@thr=0.5": 0.87189, "prec@thr=0.5": 0.87189, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09615, "grad_norm": 0.06948, "time": 1.00467}
{"mode": "val", "epoch": 24, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 25, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18252, "loss_action_cls": 0.0939, "recall@thr=0.5": 0.86088, "prec@thr=0.5": 0.86088, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0939, "grad_norm": 0.06941, "time": 0.99516}
{"mode": "val", "epoch": 25, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 26, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18245, "loss_action_cls": 0.09089, "recall@thr=0.5": 0.84902, "prec@thr=0.5": 0.84901, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09089, "grad_norm": 0.05622, "time": 0.99528}
{"mode": "val", "epoch": 26, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 27, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18309, "loss_action_cls": 0.0874, "recall@thr=0.5": 0.87808, "prec@thr=0.5": 0.87808, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0874, "grad_norm": 0.06894, "time": 0.99701}
{"mode": "val", "epoch": 27, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 28, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18577, "loss_action_cls": 0.08544, "recall@thr=0.5": 0.84664, "prec@thr=0.5": 0.84437, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08544, "grad_norm": 0.07643, "time": 0.99881}
{"mode": "val", "epoch": 28, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 29, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18908, "loss_action_cls": 0.10787, "recall@thr=0.5": 0.87369, "prec@thr=0.5": 0.87141, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10787, "grad_norm": 0.05707, "time": 1.00178}
{"mode": "val", "epoch": 29, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 30, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18647, "loss_action_cls": 0.0934, "recall@thr=0.5": 0.8727, "prec@thr=0.5": 0.87042, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.0934, "grad_norm": 0.05735, "time": 0.99853}
{"mode": "val", "epoch": 30, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 31, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18154, "loss_action_cls": 0.07874, "recall@thr=0.5": 0.85874, "prec@thr=0.5": 0.85874, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07874, "grad_norm": 0.06633, "time": 0.99413}
{"mode": "val", "epoch": 31, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 32, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18083, "loss_action_cls": 0.07918, "recall@thr=0.5": 0.86742, "prec@thr=0.5": 0.86492, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07918, "grad_norm": 0.06247, "time": 0.9932}
{"mode": "val", "epoch": 32, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 33, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18088, "loss_action_cls": 0.08861, "recall@thr=0.5": 0.86927, "prec@thr=0.5": 0.86735, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08861, "grad_norm": 0.07271, "time": 0.99552}
{"mode": "val", "epoch": 33, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 34, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.1886, "loss_action_cls": 0.09317, "recall@thr=0.5": 0.86667, "prec@thr=0.5": 0.86667, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09317, "grad_norm": 0.06294, "time": 1.00273}
{"mode": "val", "epoch": 34, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 35, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18746, "loss_action_cls": 0.089, "recall@thr=0.5": 0.87669, "prec@thr=0.5": 0.87669, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.089, "grad_norm": 0.06243, "time": 0.99921}
{"mode": "val", "epoch": 35, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 36, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18179, "loss_action_cls": 0.07702, "recall@thr=0.5": 0.86391, "prec@thr=0.5": 0.86391, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07702, "grad_norm": 0.07411, "time": 0.99609}
{"mode": "val", "epoch": 36, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 37, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18797, "loss_action_cls": 0.08872, "recall@thr=0.5": 0.86088, "prec@thr=0.5": 0.86088, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08872, "grad_norm": 0.07458, "time": 0.99985}
{"mode": "val", "epoch": 37, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 38, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18704, "loss_action_cls": 0.08762, "recall@thr=0.5": 0.87121, "prec@thr=0.5": 0.86843, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08762, "grad_norm": 0.06538, "time": 0.99896}
{"mode": "val", "epoch": 38, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 39, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18852, "loss_action_cls": 0.08822, "recall@thr=0.5": 0.85919, "prec@thr=0.5": 0.85919, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08822, "grad_norm": 0.07977, "time": 1.0016}
{"mode": "val", "epoch": 39, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 40, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18234, "loss_action_cls": 0.09024, "recall@thr=0.5": 0.85601, "prec@thr=0.5": 0.85601, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09024, "grad_norm": 0.06097, "time": 0.99434}
{"mode": "val", "epoch": 40, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 41, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18165, "loss_action_cls": 0.09851, "recall@thr=0.5": 0.84987, "prec@thr=0.5": 0.84737, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09851, "grad_norm": 0.06554, "time": 0.99627}
{"mode": "val", "epoch": 41, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 42, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18597, "loss_action_cls": 0.10595, "recall@thr=0.5": 0.87117, "prec@thr=0.5": 0.87117, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10595, "grad_norm": 0.05842, "time": 0.99769}
{"mode": "val", "epoch": 42, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 43, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.1856, "loss_action_cls": 0.08387, "recall@thr=0.5": 0.86939, "prec@thr=0.5": 0.86939, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08387, "grad_norm": 0.06906, "time": 1.00146}
{"mode": "val", "epoch": 43, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 44, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18118, "loss_action_cls": 0.08536, "recall@thr=0.5": 0.85187, "prec@thr=0.5": 0.85187, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.08536, "grad_norm": 0.0665, "time": 0.9931}
{"mode": "val", "epoch": 44, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 45, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18369, "loss_action_cls": 0.09834, "recall@thr=0.5": 0.84446, "prec@thr=0.5": 0.84169, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09834, "grad_norm": 0.07264, "time": 0.99587}
{"mode": "val", "epoch": 45, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 46, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18497, "loss_action_cls": 0.07137, "recall@thr=0.5": 0.85472, "prec@thr=0.5": 0.85194, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07137, "grad_norm": 0.07303, "time": 0.99785}
{"mode": "val", "epoch": 46, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 47, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18986, "loss_action_cls": 0.07812, "recall@thr=0.5": 0.86687, "prec@thr=0.5": 0.86687, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07812, "grad_norm": 0.06059, "time": 1.00136}
{"mode": "val", "epoch": 47, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 48, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.188, "loss_action_cls": 0.09891, "recall@thr=0.5": 0.85929, "prec@thr=0.5": 0.85929, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.09891, "grad_norm": 0.05919, "time": 0.99993}
{"mode": "val", "epoch": 48, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 49, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.18616, "loss_action_cls": 0.06949, "recall@thr=0.5": 0.85987, "prec@thr=0.5": 0.85987, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.06949, "grad_norm": 0.07458, "time": 0.99806}
{"mode": "val", "epoch": 49, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

{"mode": "train", "epoch": 50, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.1849, "loss_action_cls": 0.07176, "recall@thr=0.5": 0.88101, "prec@thr=0.5": 0.88101, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07176, "grad_norm": 0.06244, "time": 0.99677}
{"mode": "val", "epoch": 50, "iter": 22, "lr": 0.0022, "[email protected]": 0.0}

opened by arvindchandel 34

[Feature] Support TSM-MobileNetV2
TODO list

[x] mobilenetv2 backbone & unittest.

[x] tsm-mobilenetv2 backbone & unittest.

[x] covnert checkpoint from origin repo.

original repo: 30 test crops, 19520 samples, top1/5 accuracy is 69.54%/88.66%

mmaction2 convertion: 10 test crops, 18219 samples, top1/5 accuracy is 69.04%/88.23%.

[x] Refactor mobilenet with mmcls

[x] changelog

[x] training with mmaction2 & update model zoo.

I don't have enough gpus to train on kinetics400, maybe next week i can have a try...

贫穷的泪水

training results of mobilenet-tsm with DenseSampleFrames1x1x8. (origin ckpt get 69.54%/88.66% top1/5 accuracy).

|lr|epochs|gpus|weight decay|top1/5 acuracy| |:-:|:-:|:-:|:-:|:-:| |0.00875|50|7|0.0001|63.75%/85.52%| |0.0025|50|4|0.0001|65.11%/85.99%| |0.0025|100|4|0.0001|66.xx%/86.xx%| |0.004|100|4|0.00004|68.31%/88.00%| |0.0075|100|6|0.00004|68.41%/88.07%|
opened by irvingzhang0512 33
[Improvement] Training custom classes of ava dataset
Target

Training some of the 80 ava classes to save training time and hopefully get better results for selected classes.

TODO

[x] dataset/evaluation codes.

[x] unittest

[x] docs

[x] sample config

[x] model zoo, compare results.

[x] Add input arg topk for BBoxHeadAVA, because num_classes may be smaller than 5.

[x] ~check whether exclude_file_xxx will affect the results.~

results

slowonly_kinetics_pretrained_r50_4*16

|custom classes|mAP(train 80 classes)|mAP (train custom classes only)|selected classes comment| |:-:|-:|-:|-:| |range(1, 15)|0.3460|0.3399|all PERSON_MOVEMENT classes| |[11, 12, 14, 15, 79, 80]|0.7066|0.7011|AP(80 classes ckpt) > 0.6| |[1,4,8,9,13,17,28,49,74]|0.4339|0.4397|AP(80 classes ckpt) in[0.3, 0.6)| |[3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72]|0.1948|0.3311|AP(80 classes ckpt) in[0.1, 0.3)| |[11,12,17,74,79,80]|0.6520|0.6438|> 50000 samples| |[1,8,14,59]|0.4307|0.5549|[5000, 50000) samples| |[3,4,6,9,10,15,27,28,29,38,41,48,49,54,61,64,65,66,67,70,77]|0.2384|0.3269|[1000, 5000) samples| |[22,37,47,51,63,68,72,78]|0.0753|0.3209|[500, 1000) samples| |[2,5,7,13,20,24,26,30,34,36,42,45,46,52,56,57,58,60,62,69,73,75,76]|0.0348|0.1806|[100, 500) samples| |[16,18,19,21,23,25,31,32,33,35,39,40,43,44,50,53,55,71]|0.0169|0.1984|<100 samples|

insights

I think ava dataset suffers from series class imbalance. Training custom classes helps to get better results for classes with fewer samples.
opened by irvingzhang0512 29

Custom Training of SpatioTemporal Model SlowaFast giving mAP 0.0

Tried to train the model with our custom data (over 200+ videos). After training it with 50 apochs. mAP was still 0.0 after every epoch while validation. Can you help me in this?

Note: For annotations, I'm using normalized x1,y1 (top left corner) x2,y2 (bottom-right corner). Is it correct format or I need to change it ?

Below is my custom config file:


custom_classes = [1, 2, 3, 4, 5]
num_classes = 6
model = dict(
    type='FastRCNN',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        pretrained2d=False,
        lateral=False,
        num_stages=4,
        conv1_kernel=(1, 7, 7),
        conv1_stride_t=1,
        pool1_stride_t=1,
        spatial_strides=(1, 2, 2, 1)),
    roi_head=dict(
        type='AVARoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor3D',
            roi_layer_type='RoIAlign',
            output_size=8,
            with_temporal_pool=True),
        bbox_head=dict(
            type='BBoxHeadAVA',
            in_channels=2048,
            num_classes=6,
            multilabel=True,
            topk=(2, 3),
            dropout_ratio=0.5)),
    train_cfg=dict(
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssignerAVA',
                pos_iou_thr=0.9,
                neg_iou_thr=0.9,
                min_pos_iou=0.9),
            sampler=dict(
                type='RandomSampler',
                num=32,
                pos_fraction=1,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=1.0,
            debug=False)),
    test_cfg=dict(rcnn=dict(action_thr=0.002)))
dataset_type = 'AVADataset'
data_root = 'tools/data/SAI/rawframes'
anno_root = 'tools/data/SAI/Annotations'
ann_file_train = 'tools/data/SAI/Annotations/ava_format_train.csv'
ann_file_val = 'tools/data/SAI/Annotations/ava_format_test.csv'
label_file = 'tools/data/SAI/Annotations/action_list.pbtxt'
proposal_file_train = 'tools/data/SAI/Annotations/proposals_train.pkl'
proposal_file_val = 'tools/data/SAI/Annotations/proposals_test.pkl'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
    dict(type='RawFrameDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=256),
    dict(type='Flip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
        ]),
    dict(
        type='Collect',
        keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
        meta_keys=['scores', 'entity_ids'])
]
val_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals']),
    dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
    dict(
        type='Collect',
        keys=['img', 'proposals'],
        meta_keys=['scores', 'img_shape'],
        nested=True)
]
data = dict(
    videos_per_gpu=1,
    workers_per_gpu=4,
    val_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    train_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    test_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    train=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_train.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='RandomRescale', scale_range=(256, 320)),
            dict(type='RandomCrop', size=256),
            dict(type='Flip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(
                type='ToTensor',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
            dict(
                type='ToDataContainer',
                fields=[
                    dict(
                        key=['proposals', 'gt_bboxes', 'gt_labels'],
                        stack=False)
                ]),
            dict(
                type='Collect',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
                meta_keys=['scores', 'entity_ids'])
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_train.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'),
    val=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'),
    test=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'))
optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=1e-05)
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
lr_config = dict(
    policy='step',
    step=[10, 15],
    warmup='linear',
    warmup_by_epoch=True,
    warmup_iters=5,
    warmup_ratio=0.1)
total_epochs = 50
train_ratio = [1, 1]
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(interval=1, save_best='[email protected]')
log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './SAI/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb'
load_from = 'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth'
resume_from = None
find_unused_parameters = False
omnisource = False
module_hooks = []
gpu_ids = range(0, 1)

opened by memona008 28

Whether it is distributed training or not, errors will occur

Thanks for your contribution! When I try to train a model, whether use distributed training, there are errors. The installation is instructed by your install.md and there is no error when I prepare the enviroment.

There is a similar issue, but I have checked that there is no error during the installation(I have reinstalled the conda env)

For single GPU

$ python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py                  
2020-11-07 19:47:14,013 - mmaction - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.5 (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2: GeForce GTX 1080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.1
OpenCV: 4.4.0
MMCV: 1.1.6
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.0
MMAction2: 0.8.0+76819e4
------------------------------------------------------------

2020-11-07 19:47:14,014 - mmaction - INFO - Distributed training: False
2020-11-07 19:47:14,014 - mmaction - INFO - Config: /home/liming/code/video/test/mmaction2/configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False),
    cls_head=dict(
        type='TSNHead',
        num_classes=101,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.4,
        init_std=0.001))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'VideoDataset'
data_root = 'data/ucf101/videos/'
data_root_val = 'data/ucf101/videos/'
split = 1  # official train/test splits. valid numbers: 1, 2, 3
ann_file_train = f'data/ucf101/ucf101_train_split_{split}_videos.txt'
ann_file_val = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
ann_file_test = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=3,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=25,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
# lr = 0.00128 for 8 GPUs * 32 video/gpu, 0.00015 for 3 GPUs * 10 videos/gpu, 5e-5 for 1 GPU * 10 videos/gpu
optimizer = dict(
    type='SGD', lr=0.00048, momentum=0.9,
    weight_decay=0.0005)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[])
total_epochs = 1
checkpoint_config = dict(interval=5)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/tsn_r50_1x1x3_75e_ucf101_split_{split}_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]

2020-11-07 19:47:14,568 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
2020-11-07 19:47:16,547 - mmaction - INFO - Start running, host: liming@liuhong-mac-All-Series, work_dir: /home/liming/code/video/test/mmaction2/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
2020-11-07 19:47:16,547 - mmaction - INFO - workflow: [('train', 1)], max: 1 epochs
2020-11-07 19:47:30,330 - mmaction - INFO - Epoch [1][20/299]   lr: 4.800e-04, eta: 0:03:12, time: 0.689, data_time: 0.153, memory: 8244, top1_acc: 0.0141, top5_acc: 0.0703, loss_cls: 4.6118, loss: 4.6118, grad_norm: 5.5581
2020-11-07 19:47:40,713 - mmaction - INFO - Epoch [1][40/299]   lr: 4.800e-04, eta: 0:02:36, time: 0.519, data_time: 0.000, memory: 8244, top1_acc: 0.0266, top5_acc: 0.0828, loss_cls: 4.5864, loss: 4.5864, grad_norm: 5.5972
2020-11-07 19:47:51,104 - mmaction - INFO - Epoch [1][60/299]   lr: 4.800e-04, eta: 0:02:17, time: 0.520, data_time: 0.000, memory: 8244, top1_acc: 0.0484, top5_acc: 0.0938, loss_cls: 4.5600, loss: 4.5600, grad_norm: 5.6577
2020-11-07 19:48:01,512 - mmaction - INFO - Epoch [1][80/299]   lr: 4.800e-04, eta: 0:02:03, time: 0.520, data_time: 0.000, memory: 8244, top1_acc: 0.0484, top5_acc: 0.1437, loss_cls: 4.5178, loss: 4.5178, grad_norm: 5.6118
2020-11-07 19:48:11,938 - mmaction - INFO - Epoch [1][100/299]  lr: 4.800e-04, eta: 0:01:50, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.0797, top5_acc: 0.1938, loss_cls: 4.4669, loss: 4.4669, grad_norm: 5.7034
2020-11-07 19:48:22,364 - mmaction - INFO - Epoch [1][120/299]  lr: 4.800e-04, eta: 0:01:38, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.0875, top5_acc: 0.2406, loss_cls: 4.4534, loss: 4.4534, grad_norm: 5.7623
2020-11-07 19:48:32,792 - mmaction - INFO - Epoch [1][140/299]  lr: 4.800e-04, eta: 0:01:26, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1156, top5_acc: 0.2781, loss_cls: 4.4031, loss: 4.4031, grad_norm: 5.7466
2020-11-07 19:48:43,221 - mmaction - INFO - Epoch [1][160/299]  lr: 4.800e-04, eta: 0:01:15, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1703, top5_acc: 0.3422, loss_cls: 4.3451, loss: 4.3451, grad_norm: 5.7538
2020-11-07 19:48:53,649 - mmaction - INFO - Epoch [1][180/299]  lr: 4.800e-04, eta: 0:01:04, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1656, top5_acc: 0.3656, loss_cls: 4.3214, loss: 4.3214, grad_norm: 5.7920
2020-11-07 19:49:04,084 - mmaction - INFO - Epoch [1][200/299]  lr: 4.800e-04, eta: 0:00:53, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.1938, top5_acc: 0.3844, loss_cls: 4.2619, loss: 4.2619, grad_norm: 5.8725
2020-11-07 19:49:14,525 - mmaction - INFO - Epoch [1][220/299]  lr: 4.800e-04, eta: 0:00:42, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.2359, top5_acc: 0.3906, loss_cls: 4.1983, loss: 4.1983, grad_norm: 5.8417
2020-11-07 19:49:24,974 - mmaction - INFO - Epoch [1][240/299]  lr: 4.800e-04, eta: 0:00:31, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.1938, top5_acc: 0.4281, loss_cls: 4.1371, loss: 4.1371, grad_norm: 6.0010
2020-11-07 19:49:35,435 - mmaction - INFO - Epoch [1][260/299]  lr: 4.800e-04, eta: 0:00:20, time: 0.523, data_time: 0.000, memory: 8244, top1_acc: 0.1922, top5_acc: 0.4359, loss_cls: 4.0732, loss: 4.0732, grad_norm: 5.9770
2020-11-07 19:49:45,881 - mmaction - INFO - Epoch [1][280/299]  lr: 4.800e-04, eta: 0:00:10, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.2406, top5_acc: 0.4516, loss_cls: 4.0252, loss: 4.0252, grad_norm: 6.1316
[1]    30756 segmentation fault (core dumped)  python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py

For multiple GPUs:

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
2020-11-07 19:44:48,222 - mmaction - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.5 (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2: GeForce GTX 1080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.1
OpenCV: 4.4.0
MMCV: 1.1.6
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.0
MMAction2: 0.8.0+76819e4
------------------------------------------------------------

2020-11-07 19:44:48,223 - mmaction - INFO - Distributed training: True
2020-11-07 19:44:48,223 - mmaction - INFO - Config: /home/liming/code/video/test/mmaction2/configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False),
    cls_head=dict(
        type='TSNHead',
        num_classes=101,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.4,
        init_std=0.001))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'VideoDataset'
data_root = 'data/ucf101/videos/'
data_root_val = 'data/ucf101/videos/'
split = 1  # official train/test splits. valid numbers: 1, 2, 3
ann_file_train = f'data/ucf101/ucf101_train_split_{split}_videos.txt'
ann_file_val = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
ann_file_test = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=3,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=25,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
# lr = 0.00128 for 8 GPUs * 32 video/gpu, 0.00015 for 3 GPUs * 10 videos/gpu, 5e-5 for 1 GPU * 10 videos/gpu
optimizer = dict(
    type='SGD', lr=0.00048, momentum=0.9,
    weight_decay=0.0005)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[])
total_epochs = 1
checkpoint_config = dict(interval=5)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/tsn_r50_1x1x3_75e_ucf101_split_{split}_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]

2020-11-07 19:44:48,776 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
2020-11-07 19:44:49,087 - mmaction - INFO - Start running, host: liming@liuhong-mac-All-Series, work_dir: /home/liming/code/video/test/mmaction2/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
2020-11-07 19:44:49,087 - mmaction - INFO - workflow: [('train', 1)], max: 1 epochs
2020-11-07 19:45:07,472 - mmaction - INFO - Epoch [1][20/100]   lr: 4.800e-04, eta: 0:01:13, time: 0.918, data_time: 0.346, memory: 8333, top1_acc: 0.0281, top5_acc: 0.1016, loss_cls: 4.6039, loss: 4.6039, grad_norm: 3.2696
2020-11-07 19:45:18,411 - mmaction - INFO - Epoch [1][40/100]   lr: 4.800e-04, eta: 0:00:43, time: 0.547, data_time: 0.001, memory: 8333, top1_acc: 0.0437, top5_acc: 0.1385, loss_cls: 4.5756, loss: 4.5756, grad_norm: 3.2719
2020-11-07 19:45:29,362 - mmaction - INFO - Epoch [1][60/100]   lr: 4.800e-04, eta: 0:00:26, time: 0.548, data_time: 0.001, memory: 8333, top1_acc: 0.0818, top5_acc: 0.1781, loss_cls: 4.5440, loss: 4.5440, grad_norm: 3.2529
2020-11-07 19:45:40,321 - mmaction - INFO - Epoch [1][80/100]   lr: 4.800e-04, eta: 0:00:12, time: 0.548, data_time: 0.000, memory: 8333, top1_acc: 0.0688, top5_acc: 0.2083, loss_cls: 4.5135, loss: 4.5135, grad_norm: 3.2930
2020-11-07 19:45:50,957 - mmaction - INFO - Epoch [1][100/100]  lr: 4.800e-04, eta: 0:00:00, time: 0.532, data_time: 0.000, memory: 8333, top1_acc: 0.1174, top5_acc: 0.2649, loss_cls: 4.4753, loss: 4.4753, grad_norm: 3.3248
Traceback (most recent call last):
  File "/home/liming/anaconda3/envs/test/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/liming/anaconda3/envs/test/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/liming/anaconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/liming/anaconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/liming/anaconda3/envs/test/bin/python', '-u', './tools/train.py', '--local_rank=2', 'configs/tsn_r50_1x1x3_75e_ucf101_rgb.py', '--launcher', 'pytorch']' died with <Signals.SIGSEGV: 11>.

Here are my conda env list:

_libgcc_mutex             0.1                        main    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
addict                    2.3.0                    pypi_0    pypi
blas                      1.0                         mkl    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates           2020.10.14                    0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi                   2020.6.20                pypi_0    pypi
cudatoolkit               10.2.89              hfd86e86_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cycler                    0.10.0                   pypi_0    pypi
dataclasses               0.6                      pypi_0    pypi
freetype                  2.10.4               h5ab3b9f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
future                    0.18.2                   pypi_0    pypi
intel-openmp              2020.2                      254    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jpeg                      9b                   h024ee3a_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
kiwisolver                1.3.1                    pypi_0    pypi
lcms2                     2.11                 h396b838_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ld_impl_linux-64          2.33.1               h53a641e_7    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libedit                   3.1.20191231         h14c3975_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi                    3.3                  he6710b0_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng                 9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libpng                    1.6.37               hbc83047_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng              9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libtiff                   4.1.0                h2733197_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuv                     1.40.0               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lz4-c                     1.9.2                heb0550a_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
matplotlib                3.3.2                    pypi_0    pypi
mkl                       2020.2                      256    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service               2.3.0            py38he904b0f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft                   1.2.0            py38h23d657b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random                1.1.1            py38h0573a6f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mmaction2                 0.8.0                     dev_0    <develop>
mmcv-full                 1.1.6                    pypi_0    pypi
ncurses                   6.2                  he6710b0_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ninja                     1.10.1           py38hfd86e86_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy                     1.19.2           py38h54aff64_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base                1.19.2           py38hfa32c7d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
olefile                   0.46                       py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opencv-contrib-python     4.4.0.46                 pypi_0    pypi
openssl                   1.1.1h               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pillow                    8.0.1            py38he98fc37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip                       20.2.4           py38h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyparsing                 3.0.0b1                  pypi_0    pypi
python                    3.8.5                h7579374_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil           2.8.1                    pypi_0    pypi
pytorch                   1.7.0           py3.8_cuda10.2.89_cudnn7.6.5_0    pytorch
pyyaml                    5.3.1                    pypi_0    pypi
readline                  8.0                  h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
setuptools                50.3.0           py38h06a4308_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six                       1.15.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sqlite                    3.33.0               h62c20be_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tk                        8.6.10               hbc83047_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
torchaudio                0.7.0                      py38    pytorch
torchvision               0.8.1                py38_cu102    pytorch
typing_extensions         3.7.4.3                    py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wheel                     0.35.1                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xz                        5.2.5                h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yapf                      0.30.0                   pypi_0    pypi
zlib                      1.2.11               h7b6447c_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zstd                      1.4.5                h9ceee32_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

The whole install steps are:

conda create -n test python=3.8 -y
conda activate test

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

pip install mmcv

git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
pip install -r requirements/build.txt
python setup.py develop

mkdir data
ln -s PATH_TO_DATA data

opened by limingcv 28

[Feature] Support Webcam Demo for Spatio-temporal Action Detection Models
Description

This implementation is based on SlowFast Spatio-temporal Action Detection Webcam Demo.

TODO

[x] Multi-threads for read/display/inference.

Human detector

[x] easy to use abstract class

[x] mmdet

~[ ] yolov4 human detector~: it seems human detector is not the bottleneck for this demo.

[x] MMAction2 stdet models.

Output result

[x] cv2.imshow

[x] write to local video file.

[x] decouple display frame shape and model frame shape.

[x] logging

[x] remvoe global variables

[x] BUG: Unexpected exit when read thread is dead and display thread is alive.

[x] BUG: Ignore sampeling strategy

[x] fix known issue.

[x] Improvement: In SlowFast Webcam Demo, predict_stepsize must in range [clip_len * frame_interval // 2, clip_len * frame_interval]. Find a way to support predict_stepsize in range [0, clip_len * frame_interval]

Docs

[x] Annotations in script

[x] demo/README.md

[x] docs_zh_CN/demo.md

Known issue

config model -> test_cfg -> rcnn -> action_thr should be .0 instead of current default value 0.002. This may cause different bboxes number for different actions.

result = stdet_model(...)[0] previous_shape = None for class_id in range(len(result)): if previous_shape is None: previous_shape = result[class_id].shape else: assert previous_shape == result[class_id].shape, 'This assertion error may be raised.'

This may cause index of range error

https://github.com/open-mmlab/mmaction2/blob/905f07a7128c4d996af13d47d25546ad248ee187/demo/demo_spatiotemporal_det.py#L345-L364

j of result[i][j, 4] may be out of range. The for j in range(proposal.shape[0]) loop are assuming that all of the result[i] has the same shape, aka the same bbox number for different actions.

Usage

Modify --output-fps according to printed log DEBUG:__main__:Read Thread: {duration} ms, {fps} fps.

Modify --predict-stepsize so that the durations for read and inference, which are both printed by logger, are almost the same.

python demo/webcam_demo_spatiotemporal_det.py --show \ --output-fps 15 \ --predict-stepsize 8
opened by irvingzhang0512 26

For a single GPU，the code training hangs...

@innerlee I'm very sorry to disturb you， For a single GPU when I run this command

$ python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py 。。。the code training hangs...
2020-12-06 03:47:12,059 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
2020-12-06 03:47:14,590 - mmaction - INFO - Start running, host: root@user-ubuntu, work_dir: /data6/sky/acd/mmaction2/tools/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
2020-12-06 03:47:14,590 - mmaction - INFO - workflow: [('train', 1)], max: 15 epochs

(pytorch1.4.0+mmcv-full 1.2.1+cuda101), and the --validate option has been tried but no difference.

awaiting response

opened by skyqwe123 24

Still some bugs during AVA training

I reported some bugs during AVA training last time. Finally(I rename the image files manually), I can run the command "./tools/dist_train.sh configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py 4 --validate" in my PC. But during Epoch[1] [120/11524], it raise a bug "FileNotFoundError: [Errno 2] No such file or directory: '/home/avadata/ava/rawframes/7g37N3eoQ9s/img_26368.jpg'". It seems like There are still some bugs in the file name correspondence.
BTW, I find that in the config file I'm using, line89 and line 90 are as follows : line89: # Rename is needed to use mmdet detectors line90: dict(type='Rename', mapping=dict(imgs='img')) These codes maybe use for change the file name from ${video_name}_00001.jpg to img_00001.jpg(I guess). But actually, it does not work for some reasons. And I can not find a module named "Rename" in mmdet. Hope you can check the questions, thanks a lot.

opened by SKBL5694 22

Posec3D Inference on video.

I custom trained posec3d on very small data. And tried to inference it on a video. It is throwing this error

load checkpoint from local path: D:/pycharmprojects/skeleton_action_recognition/Runs/slowonly_r50_u48_240e_ntu120_xsub_keypoint/epoch_10.pth
Traceback (most recent call last):
  File "D:/pycharmprojects/skeleton_action_recognition/inference.py", line 19, in <module>
    results = inference_recognizer(model, video, labels)
  File "D:\pycharmprojects\skeleton_action_recognition\mmaction\apis\inference.py", line 171, in inference_recognizer
    data = test_pipeline(data)
  File "D:\pycharmprojects\skeleton_action_recognition\mmaction\datasets\pipelines\compose.py", line 50, in __call__
    data = t(data)
  File "D:\pycharmprojects\skeleton_action_recognition\mmaction\datasets\pipelines\augmentations.py", line 213, in __call__
    kp = results['keypoint']
KeyError: 'keypoint'

When I debugged, the result variable does not have that key.

Any suggestions?

The Inference file:

import torch

from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint_test.py'
# download the checkpoint from model zoo and put it in `checkpoints/`
checkpoint_file = 'Runs/slowonly_r50_u48_240e_ntu120_xsub_keypoint/epoch_10.pth'

# assign the desired device.
device = 'cuda:0' # or 'cpu'
device = torch.device(device)

 # build the model from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device=device)

# test a single video and show the result:
video = 'demo/Correct_vid/S001C001P001R001A001_rgb3.avi'
labels = 'custom_pose_data/labels.txt'
results = inference_recognizer(model, video, labels)

# show the results
# labels = open('tools/data/kinetics/label_map_k400.txt').readlines()
# labels = [x.strip() for x in labels]
lables = [0]
results = [(labels[k[0]], k[1]) for k in results]

# print(f'The top-5 labels with corresponding scores are:')
# for result in results:
#     print(f'{result[0]}: ', result[1])

The Config file:

model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=1, # change the class here
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))

dataset_type = 'VideoDataset' # PoseDataset'
ann_file_train = 'custom_pose_data/merge2.pkl'
ann_file_val = 'custom_pose_data/merge2.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]

test_pipeline = [
    dict(
        type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True), #num_clips=10
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1., allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False,
        double=True,
        left_kp=left_kp,
        right_kp=right_kp),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=1,
    workers_per_gpu=0,
    test_dataloader=dict(videos_per_gpu=1),
    test=dict(
        type=dataset_type,
        ann_file=None,
        data_prefix=None,
        pipeline=test_pipeline))

Thank You

opened by BakingBrains 20

[Fix] typo of tsm-r50 & sthv2

TYPO

In tsm, num_clips should be the same for train/val/test pipelines.

CANNOT reproduce tsm-r50/sthv2 results

I did some tests with tsm-r50/shtv2 these days and CANNOT reproduce the results from the model zoo.

Generally speaking

ckpt from the model zoo leads to WORSE results than that of model zoo reports.
ckpt trained by myself leas to BETTER results than that of model zoo reports.

Can anyone verify this?

|model|top1/5 accuracy for efficient mode|top1/5 accuracy for accurate mode| |:-:|:-:|:-:| |copy from model zoo|57.86/84.67 |61.12/86.26| |test with the model zoo ckpt by myself|55.56/82.94|56.92/83.92| |trained by myself with the default config, trained ckpt here|58.91/85.10|61.68/86.71|

PS: I'm using videos instead of rawframes. PPS: efficient/accurate test pipelines are listed as follows

# efficient
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]

# accurate
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        twice_sample=True,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]

opened by irvingzhang0512 20

Change the mmaction2 model to onnx failed.

I have trained a .pth file, I want to deploy this model, and I read this: https://github.com/open-mmlab/mmaction2/blob/master/docs/en/tutorials/6_export_model.md

this is my code: python tools/deployment/pytorch2onnx.py configs/recognition/tsm/tsm_r50_video_1x1x8_50e_kinetics400_rgb.py my_code/checkpoints/best_top1_acc_epoch_18.pth --shape 1 1 3 224 224 --verify

but I got the wrong answer:

Traceback (most recent call last): File "tools/deployment/pytorch2onnx.py", line 165, in pytorch2onnx( File "tools/deployment/pytorch2onnx.py", line 69, in pytorch2onnx torch.onnx.export( File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 504, in export _export( File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 1529, in _export graph, params_dict, torch_out = _model_to_graph( File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 1111, in _model_to_graph graph, params, torch_out, module = _create_jit_graph(model, args) File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 987, in _create_jit_graph graph, torch_out = _trace_and_get_graph_from_model(model, args) File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\onnx\utils.py", line 891, in _trace_and_get_graph_from_model trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph( File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\jit_trace.py", line 1184, in _get_trace_graph outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs) File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl return forward_call(*input, **kwargs) File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\jit_trace.py", line 127, in forward graph, out = torch._C._create_graph_by_tracing( File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\jit_trace.py", line 118, in wrapper outs.append(self.inner(*trace_inputs)) File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl return forward_call(*input, **kwargs) File "E:\miniconda3\envs\mmcv\lib\site-packages\torch\nn\modules\module.py", line 1178, in _slow_forward result = self.forward(*input, **kwargs) TypeError: forward_dummy() got multiple values for argument 'softmax'

I don't know why.
onnx

opened by yinghaodang 0
[Feature] TCANet for localization
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily got feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduces changes that break the back-compatibility of this repo? If so, please describe how it breaks the compatibility and how users should modify their codes to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

Pre-commit or other linting tools should be used to fix the potential lint issues.

The modification should be covered by complete unit tests. If not, please add more unit tests to ensure the correctness.

If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.

The documentation should be modified accordingly, like docstring or example tutorials.
opened by hukkai 1
How to train mmaction2 with your own data.

Hi, I've been researching and gathering information on how to perform this task for a couple of days now and I'm not sure if anyone can help me.

I want to explain what I want to do, I have a set of videos which describe some actions that are not included in any existing data set and I want to pass this data as a training model to be able to recognise these actions.

In this case I have proceeded to create a text file with the path to the video and the label of the video.

For example: video1.mp4 1 video2.mp4 2 video1.mp4 3

Where the tags are custom actions.

Entering this to mmaction2 through the tutorial generated an error:

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Then I started to look at how to input the data in a better way but I didn't quite understand how to do it.

I don't know if you can explain me what steps I should follow to be able to train mmaction2 to detect custom actions.

opened by pgutierrezce 10
Skeleton-based Action Recognition Demo

Hello, I'm testing the demo script to predict the skeleton-based action recognition result using a single video. I have 2 doubts about this demo.

python demo/demo_skeleton.py demo/ntu_sample.avi demo/skeleton_demo.mp4
--config configs/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint.py
--checkpoint https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu120_xsub_keypoint/slowonly_r50_u48_240e_ntu120_xsub_keypoint-6736b03f.pth
--det-config demo/faster_rcnn_r50_fpn_2x_coco.py
--det-checkpoint http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth
--det-score-thr 0.9
--pose-config demo/hrnet_w32_coco_256x192.py
--pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth
--label-map tools/data/skeleton/label_map_ntu120.txt

First, the recognition of the action does not match any of my own videos, I have tried to change the config file and checkpoint but nothing and I don't know what to try now.

The second question I have is that I need to be able to extract information from the resulting video, that is, with the skeleton. I need to get metrics of the time a person is standing or not moving, is there a method that does this?

Thanks.

opened by antonio2600 12
Skeleton-based model (PoseC3D) for Real-Time Webcam Inference
Hi! I tried to do real-time webcam inference with PoseC3D model. However, using high-level API of mmdetection and mmpose, I had a bottleneck in their inference time. Processing time for each inference is taking too long even with multi-thread and without the PoseC3D model.

After I saw the Webcam API from mmpose, I realized it could achieve the speed I desire. I wonder if I can implement or register the PoseC3D model as a node in that API so it could be run as an independent thread.

Is there any way to:

Speed up the detection and pose inferences for every frame (referring to this skeleton demo code) so that I can visualize the pose estimation while stacking the pose inference result for further action recognition (using PoseC3D)

Register PoseC3D inference to Webcam API in mmpose as a node so that it could be run there (I tried to manually add the node but it is too complicated and I failed to do so)

Thank you!
opened by juliussin 1

Releases(v1.0.0rc1)

v1.0.0rc1(Oct 14, 2022)
Highlights

Support Video Swin Transformer

New Features

Support Video Swin Transformer (#1939)

Improvements

Add colab tutorial for 1.x (#1956)

Support skeleton-based action recognition demo (#1920)

Bug Fixes

Fix link in doc (#1986, #1967, #1951, #1926,#1944, #1944, #1927, #1925)

Fix CI (#1987, #1930, #1923)

Fix pre-commit hook config (#1971)

Fix TIN config (#1912)

Fix UT for BMN and BSN (#1966)

Fix UT for Recognizer2D (#1937)

Fix BSN and BMN configs for localization (#1913)

Modeify ST-GCN configs (#1913)

Fix typo in migration doc (#1931)

Remove Onnx related tools (#1928)

Update TANet readme (#1916, #1890)

Update 2S-AGCN readme (#1915)

Fix TSN configs (#1905)

Fix configs for detection (#1903)

Fix typo in TIN config (#1904)

Fix PoseC3D readme (#1899)

Fix ST-GCN configs (#1891)

Fix audio recognition readme (#1898)

Fix TSM readme (#1887)

Fix SlowOnly readme (#1889)

Fix TRN readme (#1888)

Fix typo in get_started doc (#1895)

Source code(tar.gz)
Source code(zip)
v1.0.0rc0(Sep 1, 2022)
We are excited to announce the release of MMAction2 v1.0.0rc0. MMAction2 1.0.0beta is the first version of MMAction2 1.x, a part of the OpenMMLab 2.0 projects. Built upon the new training engine.

Highlights

New engines. MMAction2 1.x is based on MMEngine](https://github.com/open-mmlab/mmengine), which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.

Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMAction2 1.x unifies and refactors the interfaces and internal logics of train, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logics to allow the emergence of multi-task/modality algorithms.

More documentation and tutorials. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it here.

Breaking Changes

In this release, we made lots of major refactoring and modifications. Please refer to the migration guide for details and migration instructions.
Source code(tar.gz)
Source code(zip)
v0.24.1(Jul 29, 2022)

This release is meant to fix the compatibility with the latest mmcv v1.6.1
Source code(tar.gz)
Source code(zip)
v0.24.0(May 5, 2022)
Highlights

Support different seeds

New Features

Add lateral norm in multigrid config (#1567)

Add openpose 25 joints in graph config (#1578)

Support MLU Backend (#1608)

Bug and Typo Fixes

Fix local_rank (#1558)

Fix install typo (#1571)

Fix the inference API doc (#1580)

Fix zh-CN demo.md and getting_started.md (#1587)

Remove Recommonmark (#1595)

Fix inference with ndarray (#1603)

Fix the log error when IterBasedRunner is used (#1606)

Source code(tar.gz)
Source code(zip)
v0.23.0(Apr 2, 2022)
Highlights

Support different seeds

Provide multi-node training & testing script

Update error log

New Features

Support different seeds(#1502)

Provide multi-node training & testing script(#1521)

Update error log(#1546)

Documentations

Update gpus in Slowfast readme(#1497)

Fix work_dir in multigrid config(#1498)

Add sub bn docs(#1503)

Add shortcycle sampler docs(#1513)

Update Windows Declaration(#1520)

Update the link for ST-GCN(#1544)

Update install commands(#1549)

Bug and Typo Fixes

Update colab tutorial install cmds(#1522)

Fix num_iters_per_epoch in analyze_logs.py(#1530)

Fix distributed_sampler(#1532)

Fix cd dir error(#1545)

Update arg names(#1548)

Source code(tar.gz)
Source code(zip)
v0.22.0(Mar 7, 2022)
0.22.0 (03/05/2022)

Highlights

Support Multigrid training strategy

Support CPU training

Support audio demo

Support topk customizing in models/heads/base.py

New Features

Support Multigrid training strategy(#1378)

Support STGCN in demo_skeleton.py(#1391)

Support CPU training(#1407)

Support audio demo(#1425)

Support topk customizing in models/heads/base.py(#1452)

Documentations

Add OpenMMLab platform(#1393)

Update links(#1394)

Update readme in configs(#1404)

Update instructions to install mmcv-full(#1426)

Add shortcut(#1433)

Update modelzoo(#1439)

add video_structuralize in readme(#1455)

Update OpenMMLab repo information(#1482)

Bug and Typo Fixes

Update train.py(#1375)

Fix printout bug(#1382)

Update multi processing setting(#1395)

Setup multi processing both in train and test(#1405)

Fix bug in nondistributed multi-gpu training(#1406)

Add variable fps in ava_dataset.py(#1409)

Only support distributed training(#1414)

Set test_mode for AVA configs(#1432)

Support single label(#1434)

Add check copyright(#1447)

Support Windows CI(#1448)

Fix wrong device of class_weight in models/losses/cross_entropy_loss.py(#1457)

Fix bug caused by distributed(#1459)

Update readme(#1460)

Fix lint caused by colab automatic upload(#1461)

Refine CI(#1471)

Update pre-commit(#1474)

Add deprecation message for deploy tool(#1483)

ModelZoo

Support slowfast_steplr(#1421)

Source code(tar.gz)
Source code(zip)
v0.21.0(Dec 31, 2021)
Highlights

Support 2s-AGCN

Support publish models in Windows

Improve some sthv1 related models

Support BABEL

New Features

Support 2s-AGCN(#1248)

Support skip postproc in ntu_pose_extraction(#1295)

Support publish models in Windows(#1325)

Add copyright checkhook in pre-commit-config(#1344)

Documentations

Add MMFlow (#1273)

Revise README.md and add projects.md (#1286)

Add 2s-AGCN in Updates(#1289)

Add MMFewShot(#1300)

Add MMHuman3d(#1304)

Update pre-commit(#1313)

Use share menu from the theme instead(#1328)

Update installation command(#1340)

Bug and Typo Fixes

Update the inference part in notebooks(#1256)

Update the map_location(#1262)

Fix bug that start_index is not used in RawFrameDecode(#1278)

Fix bug in init_random_seed(#1282)

Fix bug in setup.py(#1303)

Fix interrogate error in workflows(#1305)

Fix typo in slowfast config(#1309)

Cancel previous runs that are not completed(#1327)

Fix missing skip_postproc parameter(#1347)

Update ssn.py(#1355)

Use latest youtube-dl(#1357)

Fix test-best(#1362)

ModelZoo

Improve some sthv1 related models(#1306)

Support BABEL(#1332)

Source code(tar.gz)
Source code(zip)
v0.20.0(Oct 30, 2021)
Highlights

Support TorchServe

Add video structuralize demo

Support using 3D skeletons for skeleton-based action recognition

Benchmark PoseC3D on UCF and HMDB

New Features

Support TorchServe (#1212)

Support 3D skeletons pre-processing (#1218)

Support video structuralize demo (#1197)

Documentations

Revise README.md and add projects.md (#1214)

Add CN docs for Skeleton dataset, PoseC3D and ST-GCN (#1228, #1237, #1236)

Add tutorial for custom dataset training for skeleton-based action recognition (#1234)

Bug and Typo Fixes

Fix tutorial link (#1219)

Fix GYM links (#1224)

ModelZoo

Benchmark PoseC3D on UCF and HMDB (#1223)

Add ST-GCN + 3D skeleton model for NTU60-XSub (#1236)

New Contributors

@bit-scientist made their first contribution in https://github.com/open-mmlab/mmaction2/pull/1234

Full Changelog: https://github.com/open-mmlab/mmaction2/compare/v0.19.0...v0.20.0
Source code(tar.gz)
Source code(zip)
v0.19.0(Oct 7, 2021)
Highlights

Support ST-GCN

Refactor the inference API

Add code spell check hook

New Features

Support ST-GCN (#1123)

Improvement

Add label maps for every dataset (#1127)

Remove useless code MultiGroupCrop (#1180)

Refactor Inference API (#1191)

Add code spell check hook (#1208)

Use docker in CI (#1159)

Documentations

Update metafiles to new OpenMMLAB protocols (#1134)

Switch to new doc style (#1160)

Improve the ERROR message (#1203)

Fix invalid URL in getting_started (#1169)

Bug and Typo Fixes

Compatible with new MMClassification (#1139)

Add missing runtime dependencies (#1144)

Fix THUMOS tag proposals path (#1156)

Fix LoadHVULabel (#1194)

Switch the default value of persistent_workers to False (#1202)

Fix _freeze_stages for MobileNetV2 (#1193)

Fix resume when building rawframes (#1150)

Fix device bug for class weight (#1188)

Correct Arg names in extract_audio.py (#1148)

ModelZoo

Add TSM-MobileNetV2 ported from TSM (#1163)

Add ST-GCN for NTURGB+D-XSub-60 (#1123)

Source code(tar.gz)
Source code(zip)
v0.18.0(Sep 2, 2021)
Improvement

Add CopyRight (#1099)

Support NTU Pose Extraction (#1076)

Support Caching in RawFrameDecode (#1078)

Add citations & Support python3.9 CI & Use fixed-version sphinx (#1125)

Documentations

Add Descriptions of PoseC3D dataset (#1053)

Bug and Typo Fixes

Fix SSV2 checkpoints (#1101)

Fix CSN normalization (#1116)

Fix typo (#1121)

Fix new_crop_quadruple bug (#1108)

Source code(tar.gz)
Source code(zip)
v0.17.0(Aug 3, 2021)
Highlights

Support PyTorch 1.9

Support Pytorchvideo Transforms

Support PreciseBN

New Features

Support Pytorchvideo Transforms (#1008)

Support PreciseBN (#1038)

Improvements

Remove redundant augmentations in config files (#996)

Make resource directory to hold common resource pictures (#1011)

Remove deperecated FrameSelector (#1010)

Support Concat Dataset (#1000)

Add to-mp4 option to resize_videos.py (#1021)

Add option to keep tail frames (#1050)

Update MIM support (#1061)

Calculate Top-K accurate and inaccurate classes (#1047)

Bug and Typo Fixes

Fix bug in PoseC3D demo (#1009)

Fix some problems in resize_videos.py (#1012)

Support torch1.9 (#1015)

Remove redundant code in CI (#1046)

Fix bug about persistent_workers (#1044)

Support TimeSformer feature extraction (#1035)

Fix ColorJitter (#1025)

ModelZoo

Add TSM-R50 sthv1 models trained by PytorchVideo RandAugment and AugMix (#1008)

Update SlowOnly SthV1 checkpoints (#1034)

Add SlowOnly Kinetics400 checkpoints trained with Precise-BN (#1038)

Add CSN-R50 from scratch checkpoints (#1045)

TPN Kinetics-400 Checkpoints trained with the new ColorJitter (#1025)

Documentation

Add Chinese translation of feature_extraction.md (#1020)

Fix the code snippet in getting_started.md (#1023)

Fix TANet config table (#1028)

Add description to PoseC3D dataset (#1053)

Source code(tar.gz)
Source code(zip)
v0.16.0(Jul 1, 2021)
Highlights

Support using backbone from pytorch-image-models(timm)

Support PIMS Decoder

Demo for skeleton-based action recognition

Support Timesformer

New Features

Support using backbones from pytorch-image-models(timm) for TSN (#880)

Support torchvision transformations in preprocessing pipelines (#972)

Demo for skeleton-based action recognition (#972)

Support Timesformer (#839)

Improvements

Add a tool to find invalid videos (#907, #950)

Add an option to specify spectrogram_type (#909)

Add json output to video demo (#906)

Add MIM related docs (#918)

Rename lr to scheduler (#916)

Support --cfg-options for demos (#911)

Support number counting for flow-wise filename template (#922)

Add Chinese tutorial (#941)

Change ResNet3D default values (#939)

Adjust script structure (#935)

Add font color to args in long_video_demo (#947)

Polish code style with Pylint (#908)

Support PIMS Decoder (#946)

Improve Metafiles (#956, #979, #966)

Add links to download Kinetics400 validation (#920)

Audit the usage of shutil.rmtree (#943)

Polish localizer related codes(#913)

Bug and Typo Fixes

Fix spatiotemporal detection demo (#899)

Fix docstring for 3D inflate (#925)

Fix bug of writing text to video with TextClip (#952)

Fix mmcv install in CI (#977)

ModelZoo

Add TSN with Swin Transformer backbone as an example for using pytorch-image-models(timm) backbones (#880)

Port CSN checkpoints from VMZ (#945)

Release various checkpoints for UCF101, HMDB51 and Sthv1 (#938)

Support Timesformer (#839)

Update TSM modelzoo (#981)

Source code(tar.gz)
Source code(zip)
v0.15.0(May 31, 2021)
Highlights

Support PoseC3D

Support ACRN

Support MIM

New Features

Support PoseC3D (#786, #890)

Support MIM (#870)

Support ACRN and Focal Loss (#891)

Support Jester dataset (#864)

Improvements

Add metric_options for evaluation to docs (#873)

Support creating a new label map based on custom classes for demos about spatio temporal demo (#879)

Improve document about AVA dataset preparation (#878)

Provide a script to extract clip-level feature (#856)

Bug and Typo Fixes

Fix issues about resume (#877, #878)

Correct the key name of eval_results dictionary for metric 'mmit_mean_average_precision' (#885)

ModelZoo

Support Jester dataset (#864)

Support ACRN and Focal Loss (#891)

Source code(tar.gz)
Source code(zip)
v0.14.0(May 3, 2021)
Highlights

Support TRN

Support Diving48

New Features

Support TRN (#755)

Support Diving48 (#835)

Support Webcam Demo for Spatio-temporal Action Detection Models (#795)

Improvements

Add softmax option for pytorch2onnx tool (#781)

Support TRN (#755)

Test with onnx models and TensorRT engines (#758)

Speed up AVA Testing (#784)

Add self.with_neck attribute (#796)

Update installation document (#798)

Use a random master port (#809)

Update AVA processing data document (#801)

Refactor spatio-temporal augmentation (#782)

Add QR code in CN README (#812)

Add Alternative way to download Kinetics (#817, #822)

Refactor Sampler (#790)

Use EvalHook in MMCV with backward compatibility (#793)

Use MMCV Model Registry (#843)

Bug and Typo Fixes

Fix a bug in pytorch2onnx.py when num_classes <= 4 (#800, #824)

Fix demo_spatiotemporal_det.py error (#803, #805)

Fix loading config bugs when resume (#820)

Make HMDB51 annotation generation more robust (#811)

ModelZoo

Update checkpoint for 256 height in something-V2 (#789)

Support Diving48 (#835)

Source code(tar.gz)
Source code(zip)
v0.13.0(Apr 1, 2021)
Highlights

Support LFB

Support using backbone from MMCls/TorchVision

Add Chinese documentation

New Features

Support LFB (#553)

Support using backbones from MMCls for TSN (#679)

Support using backbones from TorchVision for TSN (#720)

Support Mixup and Cutmix for recognizers (#681)

Support Chinese documentation (#665, #680, #689, #701, #702, #703, #706, #716, #717, #731, #733, #735, #736, #737, #738, #739, #740, #742, #752, #759, #761, #772, #775)

Improvements

Add slowfast config/json/log/ckpt for training custom classes of AVA (#678)

Set RandAugment as Imgaug default transforms (#585)

Add --test-last & --test-best for tools/train.py to test checkpoints after training (#608)

Add fcn_testing in TPN (#684)

Remove redundant recall functions (#741)

Recursively remove pretrained step for testing (#695)

Improve demo by limiting inference fps (#668)

Bug and Typo Fixes

Fix a bug about multi-class in VideoDataset (#723)

Reverse key-value in anet filelist generation (#686)

Fix flow norm cfg typo (#693)

ModelZoo

Add LFB for AVA2.1 (#553)

Add TSN with ResNeXt-101-32x4d backbone as an example for using MMCls backbones (#679)

Add TSN with Densenet161 backbone as an example for using TorchVision backbones (#720)

Add slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb (#690)

Add slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb (#704)

Add slowonly_nl_kinetics_pretrained_r50_4x16x1(8x8x1)_20e_ava_rgb (#730)

Source code(tar.gz)
Source code(zip)
v0.12.0(Mar 1, 2021)
Highlights

Support TSM-MobileNetV2

Support TANet

Support GPU Normalize

New Features

Support TSM-MobileNetV2 (#415)

Support flip with label mapping (#591)

Add seed option for sampler (#642)

Support GPU Normalize (#586)

Support TANet (#595)

Improvements

Training custom classes of ava dataset (#555)

Add CN README in homepage (#592, #594)

Support soft label for CrossEntropyLoss (#625)

Refactor config: Specify train_cfg and test_cfg in model (#629)

Provide an alternative way to download older kinetics annotations (#597)

Update FAQ for

1). data pipeline about video and frames (#598)

2). how to show results (#598)

3). batch size setting for batchnorm (#657)

4). how to fix stages of backbone when finetuning models (#658)

Modify default value of save_best (#600)

Use BibTex rather than latex in markdown (#607)

Add warnings of uninstalling mmdet and supplementary documents (#624)

Support soft label for CrossEntropyLoss (#625)

Bug and Typo Fixes

Fix value of pem_low_temporal_iou_threshold in BSN (#556)

Fix ActivityNet download script (#601)

ModelZoo

Add TSM-MobileNetV2 for Kinetics400 (#415)

Add deeper SlowFast models (#605)

Source code(tar.gz)
Source code(zip)
v0.11.0(Feb 1, 2021)
Highlights

Support imgaug

Support spatial temporal demo

Refactor EvalHook, config structure, unittest structure

New Features

Support imgaug for augmentations in the data pipeline (#492)

Support setting max_testing_views for extremely large models to save GPU memory used (#511)

Add spatial temporal demo (#547, #566)

Improvements

Refactor EvalHook (#395)

Refactor AVA hook (#567)

Add repo citation (#545)

Add dataset size of Kinetics400 (#503)

Add lazy operation docs (#504)

Add class_weight for CrossEntropyLoss and BCELossWithLogits (#509)

add some explanation about the resampling in slowfast (#502)

Modify paper title in README.md (#512)

Add alternative ways to download Kinetics (#521)

Add OpenMMLab projects link in README (#530)

Change default preprocessing to shortedge to 256 (#538)

Add config tag in dataset README (#540)

Add solution for markdownlint installation issue (#497)

Add dataset overview in readthedocs (#548)

Modify the trigger mode of the warnings of missing mmdet (583)

Refactor config structure (#488, #572)

Refactor unittest structure (#433)

Bug and Typo Fixes

Fix a bug about ava dataset validation (#527)

Fix a bug about ResNet pretrain weight initialization (#582)

Fix a bug in CI due to MMCV index (#495)

Remove invalid links of MiT and MMiT (#516)

Fix frame rate bug for AVA preparation (#576)

Source code(tar.gz)
Source code(zip)
v0.10.0(Jan 5, 2021)
Highlights

Support Spatio-Temporal Action Detection (AVA)

Support precise BN

New Features

Support precise BN (#501)

Support Spatio-Temporal Action Detection (AVA) (#351)

Support to return feature maps in inference_recognizer (#458)

Improvements

Add arg stride to long_video_demo.py, to make inference faster (#468)

Support training and testing for Spatio-Temporal Action Detection (#351)

Fix CI due to pip upgrade (#454)

Add markdown lint in pre-commit hook (#255)

Speed up confusion matrix calculation (#465)

Use title case in modelzoo statistics (#456)

Add FAQ documents for easy troubleshooting. (#413, #420, #439)

Support Spatio-Temporal Action Detection with context (#471)

Add class weight for CrossEntropyLoss and BCELossWithLogits (#509)

Add Lazy OPs docs (#504)

Bug and Typo Fixes

Fix typo in default argument of BaseHead (#446)

Fix potential bug about output_config overwrite (#463)

ModelZoo

Add SlowOnly, SlowFast for AVA2.1 (#351)

Source code(tar.gz)
Source code(zip)
v0.9.0(Dec 1, 2020)
Highlights

Support GradCAM utils for recognizers

Support ResNet Audio model

New Features

Automatically add modelzoo statistics to readthedocs (#327)

Support GYM99 data preparation (#331)

Add AudioOnly Pathway from AVSlowFast. (#355)

Add GradCAM utils for recognizer (#324)

Add print config script (#345)

Add online motion vector decoder (#291)

Improvements

Support PyTorch 1.7 in CI (#312)

Support to predict different labels in a long video (#274)

Update docs bout test crops (#359)

Polish code format using pylint manually (#338)

Update unittest coverage (#358, #322, #325)

Add random seed for building filelists (#323)

Update colab tutorial (#367)

set default batch_size of evaluation and testing to 1 (#250)

Rename the preparation docs to README.md (#388)

Move docs about demo to demo/README.md (#329)

Remove redundant code in tools/test.py (#310)

Automatically calculate number of test clips for Recognizer2D (#359)

Bug and Typo Fixes

Fix rename Kinetics classnames bug (#384)

Fix a bug in BaseDataset when data_prefix is None (#314)

Fix a bug about tmp_folder in OpenCVInit (#357)

Fix get_thread_id when not using disk as backend (#354, #357)

Fix the bug of HVU object num_classes from 1679 to 1678 (#307)

Fix typo in export_model.md (#399)

Fix OmniSource training configs (#321)

Fix Issue #306: Bug of SampleAVAFrames (#317)

ModelZoo

Add SlowOnly model for GYM99, both RGB and Flow (#336)

Add auto modelzoo statistics in readthedocs (#327)

Add TSN for HMDB51 pretrained on Kinetics400, Moments in Time and ImageNet. (#372)

Source code(tar.gz)
Source code(zip)
v0.8.0(Oct 31, 2020)
v0.8.0 (31/10/2020)

Highlights

Support OmniSource

Support C3D

Support video recognition with audio modality

Support HVU

Support X3D

New Features

Support AVA dataset preparation (#266)

Support the training of video recognition dataset with multiple tag categories (#235)

Support joint training with multiple training datasets of multiple formats, including images, untrimmed videos, etc. (#242)

Support to specify a start epoch to conduct evaluation (#216)

Implement X3D models, support testing with model weights converted from SlowFast (#288)

Improvements

Set default values of 'average_clips' in each config file so that there is no need to set it explicitly during testing in most cases (#232)

Extend HVU datatools to generate individual file list for each tag category (#258)

Support data preparation for Kinetics-600 and Kinetics-700 (#254)

Add cfg-options in arguments to override some settings in the used config for convenience (#212)

Rename the old evaluating protocol mean_average_precision as mmit_mean_average_precision since it is only used on MMIT and is not the mAP we usually talk about. Add mean_average_precision, which is the real mAP (#235)

Add accurate setting (Three crop * 2 clip) and report corresponding performance for TSM model (#241)

Add citations in each preparing_dataset.md in tools/data/dataset (#289)

Update the performance of audio-visual fusion on Kinetics-400 (#281)

Support data preparation of OmniSource web datasets, including GoogleImage, InsImage, InsVideo and KineticsRawVideo (#294)

Use metric_options dict to provide metric args in evaluate (#286)

Bug Fixes

Register FrameSelector in PIPELINES (#268)

Fix the potential bug for default value in dataset_setting (#245)

Fix the data preparation bug for something-something dataset (#278)

Fix the invalid config url in slowonly README data benchmark (#249)

Validate that the performance of models trained with videos have no significant difference comparing to the performance of models trained with rawframes (#256)

Correct the img_norm_cfg used by TSN-3seg-R50 UCF-101 model, improve the Top-1 accuracy by 3% (#273)

ModelZoo

Add Baselines for Kinetics-600 and Kinetics-700, including TSN-R50-8seg and SlowOnly-R50-8x8 (#259)

Add OmniSource benchmark on MiniKineitcs (#296)

Add Baselines for HVU, including TSN-R18-8seg on 6 tag categories of HVU (#287)

Add X3D models ported from SlowFast (#288)

Source code(tar.gz)
Source code(zip)
v0.7.0(Oct 3, 2020)
Highlights

Support TPN

Support JHMDB, UCF101-24, HVU dataset preparation

support onnx model conversion

New Features

Support the data pre-processing pipeline for the HVU Dataset (#277)

Support real-time action recognition from web camera (#171)

Support onnx (#160)

Support UCF101-24 preparation (#219)

Support evaluating mAP for ActivityNet with CUHK17_activitynet_pred (#176)

Add the data pipeline for ActivityNet, including downloading videos, extracting RGB and Flow frames, finetuning TSN and extracting feature (#190)

Support JHMDB preparation (#220)

ModelZoo

Add finetuning setting for SlowOnly (#173)

Add TSN and SlowOnly models trained with OmniSource, which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8 (#215)

Improvements

Support demo with video url (#165)

Support multi-batch when testing (#184)

Add tutorial for adding a new learning rate updater (#181)

Add config name in meta info (#183)

Remove git hash in __version__ (#189)

Check mmcv version (#189)

Update url with 'https://download.openmmlab.com' (#208)

Update Docker file to support PyTorch 1.6 and update install.md (#209)

Polish readsthedocs display (#217, #229)

Bug Fixes

Fix the bug when using OpenCV to extract only RGB frames with original shape (#184)

Fix the bug of sthv2 num_classes from 339 to 174 (#174, #207)

Source code(tar.gz)
Source code(zip)
v0.6.0(Sep 2, 2020)
Highlights

Support TIN, CSN, SSN, NonLocal

Support FP16 training

New Features

Support NonLocal module and provide ckpt in TSM and I3D (#41)

Support SSN (#33, #37, #52, #55)

Support CSN (#87)

Support TIN (#53)

Support HMDB51 dataset preparation (#60)

Support encoding videos from frames (#84)

Support FP16 training (#25)

Enhance demo by supporting rawframe inference (#59), output video/gif (#72)

ModelZoo

Update Slowfast modelzoo (#51)

Update TSN, TSM video checkpoints (#50)

Add data benchmark for TSN (#57)

Add data benchmark for SlowOnly (#77)

Add BSN/BMN performance results with feature extracted by our codebase (#99)

Improvements

Polish data preparation codes (#70)

Improve data preparation scripts (#58)

Improve unittest coverage and minor fix (#62)

Support PyTorch 1.6 in CI (#117)

Support with_offset for rawframe dataset (#48)

Support json annotation files (#119)

Support multi-class in TSMHead (#104)

Support using val_step() to validate data for each val workflow (#123)

Use xxInit() method to get total_frames and make total_frames a required key (#90)

Add paper introduction in model readme (#140)

Adjust the directory structure of tools/ and rename some scripts files (#142)

Bug Fixes

Fix configs for localization test (#67)

Fix configs of SlowOnly by fixing lr to 8 gpus (#136)

Fix the bug in analyze_log (#54)

Fix the bug of generating HMDB51 class index file (#69)

Fix the bug of using load_checkpoint() in ResNet (#93)

Fix the bug of --work-dir when using slurm training script (#110)

Correct the sthv1/sthv2 rawframes filelist generate command (#71)

CosineAnnealing typo (#47)

Source code(tar.gz)
Source code(zip)
v0.5.0(Jul 21, 2020)

The first release of MMAction2.
Source code(tar.gz)
Source code(zip)

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Related tags

Overview

Introduction

Major Features

Changelog

Benchmark

ModelZoo

Dataset

Installation

Data Preparation

Get Started

FAQ

License

Citation

Contributing

Acknowledgement

Projects in OpenMMLab

Comments

Results

TODO list

Target

TODO

results

insights

Description

TODO

Known issue

Usage

TYPO

CANNOT reproduce tsm-r50/sthv2 results

Motivation

Modification

BC-breaking (Optional)

Use cases (Optional)

Checklist

Releases(v1.0.0rc1)

v1.0.0rc1(Oct 14, 2022)

v1.0.0rc0(Sep 1, 2022)

v0.24.1(Jul 29, 2022)

v0.24.0(May 5, 2022)

v0.23.0(Apr 2, 2022)

v0.22.0(Mar 7, 2022)

0.22.0 (03/05/2022)

v0.21.0(Dec 31, 2021)

v0.20.0(Oct 30, 2021)

New Contributors

v0.19.0(Oct 7, 2021)

v0.18.0(Sep 2, 2021)

v0.17.0(Aug 3, 2021)

v0.16.0(Jul 1, 2021)

v0.15.0(May 31, 2021)

v0.14.0(May 3, 2021)

v0.13.0(Apr 1, 2021)

v0.12.0(Mar 1, 2021)

v0.11.0(Feb 1, 2021)

v0.10.0(Jan 5, 2021)

v0.9.0(Dec 1, 2020)

v0.8.0(Oct 31, 2020)

v0.8.0 (31/10/2020)

v0.7.0(Oct 3, 2020)

Highlights

New Features

ModelZoo

Improvements

Bug Fixes

v0.6.0(Sep 2, 2020)

Highlights

New Features

ModelZoo

Improvements

Bug Fixes

v0.5.0(Jul 21, 2020)

Owner

OpenMMLab

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Mixup for Supervision, Semi- and Self-Supervision Learning Toolbox and Benchmark

OpenMMLab Detection Toolbox and Benchmark

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

OpenMMLab Pose Estimation Toolbox and Benchmark.