OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Overview

Introduction

English | 简体中文


MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.

The master branch works with PyTorch 1.3+.


Action Recognition Results on Kinetics-400

Spatio-Temporal Action Detection Results on AVA-2.1

Major Features

  • Modular design

    We decompose the video understanding framework into different components, so one can easily construct a customized video understanding framework by combining different modules.

  • Support for various datasets

    The toolbox directly supports multiple datasets, including UCF101, Kinetics-[400/600/700], Something-Something V1&V2, Moments in Time, Multi-Moments in Time, THUMOS14, and more.

  • Support for multiple video understanding frameworks

    MMAction2 implements popular frameworks for video understanding:

    • For action recognition, various algorithms are implemented, including TSN, TSM, TIN, R(2+1)D, I3D, SlowOnly, SlowFast, CSN, Non-local, etc.

    • For temporal action localization, we implement BSN, BMN, and SSN.

    • For spatio-temporal action detection, we implement SlowOnly and SlowFast.

  • Well tested and documented

    We provide detailed documentation and an API reference, as well as unit tests.
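The segment-based recognizers listed above (TSN and its variants) share one core idea: split a video into equal-length segments and sample one snippet per segment. Below is a minimal sketch of that sampling scheme; the function name and the fixed center offset are illustrative (training typically uses a random offset per segment), not MMAction2's exact implementation:

```python
def sample_segment_indices(total_frames: int, num_segments: int) -> list:
    """Pick one frame index per equal-length segment (TSN-style, center offset)."""
    seg_len = total_frames / num_segments
    # take the middle frame of each segment; a training pipeline would
    # usually draw a random offset within each segment instead
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]

print(sample_segment_indices(300, 3))  # → [50, 150, 250]
```

Because the indices are spread over the whole clip, even a short snippet count covers long-range temporal structure, which is what makes segment-based models cheap compared to dense sampling.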

Changelog

v0.13.0 was released on 31/03/2021. Please refer to changelog.md for details and release history.

Benchmark

| Model | Input | IO backend | Batch size x GPUs | MMAction2 (s/iter) | MMAction (s/iter) | Temporal-Shift-Module (s/iter) | PySlowFast (s/iter) |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| TSN | 256p rawframes | Memcached | 32x8 | 0.32 | 0.38 | 0.42 | x |
| TSN | 256p dense-encoded video | Disk | 32x8 | 0.61 | x | x | TODO |
| I3D heavy | 256p videos | Disk | 8x8 | 0.34 | x | x | 0.44 |
| I3D | 256p rawframes | Memcached | 8x8 | 0.43 | 0.56 | x | x |
| TSM | 256p rawframes | Memcached | 8x8 | 0.31 | x | 0.41 | x |
| SlowOnly | 256p videos | Disk | 8x8 | 0.32 | TODO | x | 0.34 |
| SlowFast | 256p videos | Disk | 8x8 | 0.69 | x | x | 1.04 |
| R(2+1)D | 256p videos | Disk | 8x8 | 0.45 | x | x | x |

Details can be found in benchmark.
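The s/iter columns above are average wall-clock seconds per training iteration. A framework-independent sketch of how such a figure can be collected (the helper name and warm-up count are our own choices, not part of the benchmark scripts):

```python
import time

def seconds_per_iter(step_fn, num_iters: int = 10, warmup: int = 2) -> float:
    """Average wall-clock seconds per call to step_fn, skipping warm-up iters."""
    for _ in range(warmup):          # warm-up: caches, cudnn autotune, etc.
        step_fn()
    start = time.perf_counter()
    for _ in range(num_iters):
        step_fn()
    return (time.perf_counter() - start) / num_iters

# Trivial stand-in for a training step; a real benchmark would time
# forward + backward + optimizer.step() on a fixed batch.
print(round(seconds_per_iter(lambda: sum(range(10000))), 6))
```

Skipping warm-up iterations matters in practice: the first few steps pay one-off costs (allocator growth, kernel selection) that would otherwise skew small-sample averages.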

ModelZoo

Supported methods for Action Recognition:


Supported methods for Temporal Action Detection:

  • BSN (ECCV'2018)
  • BMN (ICCV'2019)
  • SSN (ICCV'2017)

Supported methods for Spatial Temporal Action Detection:


Results and models are available in the README.md of each method's config directory. A summary can be found in the model zoo page.

We will keep up with the latest progress of the community, and support more popular algorithms and frameworks. If you have any feature requests, please feel free to leave a comment in Issues.

Dataset

Supported datasets:

Supported datasets for Action Recognition:


Supported datasets for Temporal Action Detection:


Supported datasets for Spatial Temporal Action Detection:


Datasets marked with 🔲 are not fully supported yet, but related dataset preparation steps are provided.

Installation

Please refer to install.md for installation.

Data Preparation

Please refer to data_preparation.md for general knowledge of data preparation. The supported datasets are listed in supported_datasets.md.
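For rawframe datasets, the annotation file is typically a plain-text list with one `frame_dir total_frames label` triple per line; data_preparation.md is authoritative for the exact formats. A small parser sketch, assuming that triple layout (the function name and paths are illustrative):

```python
def parse_rawframe_list(text: str):
    """Parse lines of 'frame_dir total_frames label' into dicts."""
    items = []
    for line in text.strip().splitlines():
        frame_dir, total, label = line.split()
        items.append({"frame_dir": frame_dir,
                      "total_frames": int(total),
                      "label": int(label)})
    return items

ann = "some/video_001 120 3\nsome/video_002 96 41"
print(parse_rawframe_list(ann)[0])
# → {'frame_dir': 'some/video_001', 'total_frames': 120, 'label': 3}
```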

Get Started

Please see getting_started.md for the basic usage of MMAction2. There are also tutorials:

A Colab tutorial is also provided. You may preview the notebook here or directly run on Colab.

FAQ

Please refer to FAQ for frequently asked questions.

License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful in your research, please consider citing:

@misc{2020mmaction2,
    title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
    author={MMAction2 Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmaction2}},
    year={2020}
}

Contributing

We appreciate all contributions to improve MMAction2. Please refer to CONTRIBUTING.md in MMCV for more details about the contributing guideline.

Acknowledgement

MMAction2 is an open-source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope the toolbox and benchmark serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new models.

Projects in OpenMMLab

  • MMCV: OpenMMLab foundational library for computer vision.
  • MMClassification: OpenMMLab image classification toolbox and benchmark.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation video understanding toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMPose: OpenMMLab pose estimation toolbox and benchmark.
  • MMEditing: OpenMMLab image and video editing toolbox.
  • MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.
Issues
  • [Improvement] Set RandAugment as Imgaug default transforms.

    Use imgaug to reimplement RandAugment.

    According to VideoMix, RandAugment helps a little.


    Results

    • sthv1 & tsm-r50, 8 V100, 50 epochs

    | configs | top1 acc (efficient/accuracy) | top5 acc (efficient/accuracy) |
    | :- | :-: | :-: |
    | mmaction2 model zoo | 45.58 / 47.70 | 75.02 / 76.12 |
    | testing with model zoo ckpt | 45.47 / 47.55 | 74.56 / 75.79 |
    | training with default config | 45.82 / 47.90 | 74.38 / 76.02 |
    | flip | 47.10 / 48.51 | 75.02 / 76.12 |
    | randaugment | 47.16 / 48.90 | 76.07 / 77.92 |
    | flip+randaugment | 47.85 / 50.31 | 76.78 / 78.18 |

    • Kinetics400, 8 V100, test with 256x256 & three crops

    | Models | top1/5 accuracy | Training loss (epoch 100) | Training time |
    | :-: | :-: | :-: | :-: |
    | TSN-R50-1x1x8-Vanilla | 70.74%/89.37% | 0.8 | 2 days 12 hours |
    | TSN-R50-1x1x8-RandAugment | 71.07%/89.40% | 1.3 | 2 days 22 hours |
    | I3D-R50-32x2x1-Vanilla | 74.48%/91.62% | 1.1 | 3 days 10 hours |
    | I3D-R50-32x2x1-RandAugment | 74.23%/91.45% | 1.5 | 4 days 10 hours |
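    RandAugment itself is a tiny policy: draw N transforms uniformly from a fixed pool and apply each at a shared magnitude M. The PR relies on imgaug's image ops; the toy numeric op pool below is only a stand-in to show the policy, not real image transforms:

```python
import random

# Toy op pool: each op maps (value, magnitude) -> value. Real RandAugment
# draws from image transforms (rotate, shear, color, ...).
OPS = {
    "add":  lambda x, m: x + m,
    "sub":  lambda x, m: x - m,
    "noop": lambda x, m: x,
}

def rand_augment(value, n: int = 2, m: int = 9, rng=random):
    """Apply n ops drawn uniformly from OPS, all at the shared magnitude m."""
    for name in rng.choices(list(OPS), k=n):
        value = OPS[name](value, m)
    return value

random.seed(0)
print(rand_augment(0, n=2, m=9))
```

    The appeal is the two-knob search space (N, M): grid-searching it is far cheaper than learning a full augmentation policy.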

    opened by irvingzhang0512 40
  • What does this training log refer to? And for training SlowFast on new data for a custom activity, is there a minimum sample size to start with?

    I prepared a short sample of custom data in AVA format for 2 activities (sweeping and walking), then trained SlowFast for 50 epochs with clip_len=16 (due to hardware limitations). Sharing the training log (JSON) below; it looks like the model is not learning anything because mAP is consistently 0 for all epochs. What could be the possible reasons?

    Compiler: 10.2\nMMAction2: 0.12.0+13f42bf", "seed": null, "config_name": "custom_slowfast.py", "work_dir": "slowfast_kinetics_pretrained_r50_4x16x1_20e_ava_rgb_clean-data_new_e80", "hook_msgs": {}}

    {"mode": "train", "epoch": 1, "iter": 20, "lr": 0.0562, "memory": 8197, "data_time": 0.18563, "loss_action_cls": 0.16409, "recall@thr=0.5": 0.71278, "prec@thr=0.5": 0.67664, "recall@top3": 0.90636, "prec@top3": 0.30212, "recall@top5": 0.91545, "prec@top5": 0.18309, "loss": 0.16409, "grad_norm": 0.91884, "time": 0.99759}
    {"mode": "val", "epoch": 1, "iter": 22, "lr": 0.0598, "mAP@0.5IOU": 0.0}

    {"mode": "train", "epoch": 2, "iter": 20, "lr": 0.0958, "memory": 8197, "data_time": 0.1842, "loss_action_cls": 0.10098, "recall@thr=0.5": 0.75593, "prec@thr=0.5": 0.74255, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.10098, "grad_norm": 0.34649, "time": 0.98014}
    {"mode": "val", "epoch": 2, "iter": 22, "lr": 0.0994, "mAP@0.5IOU": 0.0}

    [epochs 3-49 omitted; every epoch repeats the same pattern: training recall@thr=0.5 climbs to roughly 0.84-0.88 while validation mAP@0.5IOU stays at 0.0]

    {"mode": "train", "epoch": 50, "iter": 20, "lr": 0.0022, "memory": 8197, "data_time": 0.1849, "loss_action_cls": 0.07176, "recall@thr=0.5": 0.88101, "prec@thr=0.5": 0.88101, "recall@top3": 1.0, "prec@top3": 0.33333, "recall@top5": 1.0, "prec@top5": 0.2, "loss": 0.07176, "grad_norm": 0.06244, "time": 0.99677}
    {"mode": "val", "epoch": 50, "iter": 22, "lr": 0.0022, "mAP@0.5IOU": 0.0}
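    A validation mAP@0.5IOU stuck at exactly 0.0 while training recall/precision climb usually means the evaluator never finds a single true positive, which is often a label-id or annotation mismatch rather than a model that learned nothing. The effect is easy to reproduce with a bare-bones average-precision computation (illustrative only, not the AVA evaluator):

```python
def average_precision(ranked_hits):
    """AP over a ranked list of booleans (True = prediction matches a GT box)."""
    hits, precisions = 0, []
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            hits += 1
            precisions.append(hits / rank)
    # with zero true positives anywhere in the ranking, AP is exactly 0.0
    return sum(precisions) / len(precisions) if precisions else 0.0

print(average_precision([True, False, True]))   # some matches: AP > 0
print(average_precision([False, False, False])) # no match at all: 0.0
```

    So a first debugging step is to verify that the class ids and timestamps in the custom annotation files line up with what the evaluator expects.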
    
    opened by arvindchandel 34
  • [Feature] Support TSM-MobileNetV2

    TODO list

    • [x] mobilenetv2 backbone & unittest.
    • [x] tsm-mobilenetv2 backbone & unittest.
    • [x] convert checkpoint from the original repo.
      • original repo: 30 test crops, 19520 samples, top1/5 accuracy is 69.54%/88.66%
      • mmaction2 conversion: 10 test crops, 18219 samples, top1/5 accuracy is 69.04%/88.23%.
    • [x] Refactor mobilenet with mmcls
    • [x] changelog
    • [x] training with mmaction2 & update model zoo.
      • I don't have enough GPUs to train on Kinetics-400; maybe next week I can have a try...
      • Tears of poverty.

    Training results of MobileNet-TSM with DenseSampleFrames 1x1x8 (the original checkpoint gets 69.54%/88.66% top1/5 accuracy).

    | lr | epochs | gpus | weight decay | top1/5 accuracy |
    | :-: | :-: | :-: | :-: | :-: |
    | 0.00875 | 50 | 7 | 0.0001 | 63.75%/85.52% |
    | 0.0025 | 50 | 4 | 0.0001 | 65.11%/85.99% |
    | 0.0025 | 100 | 4 | 0.0001 | 66.xx%/86.xx% |
    | 0.004 | 100 | 4 | 0.00004 | 68.31%/88.00% |
    | 0.0075 | 100 | 6 | 0.00004 | 68.41%/88.07% |
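    The TSM part of this model is a zero-parameter temporal shift: a fraction of channels takes its value from the next frame, another fraction from the previous frame, and the rest stay put (the paper uses a 1/8 fold on residual branches). A plain-Python sketch on a T x C feature map, as an illustration of the idea rather than the repo's implementation:

```python
def temporal_shift(x, fold_div: int = 8):
    """x: list of T frames, each a list of C channel values.
    Shift 1/fold_div of channels one step back in time, 1/fold_div forward."""
    t, c = len(x), len(x[0])
    fold = c // fold_div
    out = [[0.0] * c for _ in range(t)]
    for i in range(t):
        for j in range(c):
            if j < fold:            # take value from the next frame
                out[i][j] = x[i + 1][j] if i + 1 < t else 0.0
            elif j < 2 * fold:      # take value from the previous frame
                out[i][j] = x[i - 1][j] if i - 1 >= 0 else 0.0
            else:                   # remaining channels unchanged
                out[i][j] = x[i][j]
    return out

frames = [[float(8 * i + j) for j in range(8)] for i in range(4)]  # T=4, C=8
shifted = temporal_shift(frames)
```

    Because the shift adds no parameters or FLOPs on top of the 2D backbone, it pairs naturally with a lightweight backbone like MobileNetV2.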

    opened by irvingzhang0512 33
  • [Improvement] Training custom classes of ava dataset

    Target

    Training on a subset of the 80 AVA classes to save training time and, hopefully, to get better results for the selected classes.

    TODO

    • [x] dataset/evaluation codes.
    • [x] unittest
    • [x] docs
    • [x] sample config
    • [x] model zoo, compare results.
    • [x] Add input arg topk for BBoxHeadAVA, because num_classes may be smaller than 5.
    • [x] ~check whether exclude_file_xxx will affect the results.~

    results

    • slowonly_kinetics_pretrained_r50_4*16

    | custom classes | mAP (train 80 classes) | mAP (train custom classes only) | selected classes comment |
    | :-: | -: | -: | -: |
    | range(1, 15) | 0.3460 | 0.3399 | all PERSON_MOVEMENT classes |
    | [11, 12, 14, 15, 79, 80] | 0.7066 | 0.7011 | AP (80-class ckpt) > 0.6 |
    | [1,4,8,9,13,17,28,49,74] | 0.4339 | 0.4397 | AP (80-class ckpt) in [0.3, 0.6) |
    | [3, 6, 10, 27, 29, 38, 41, 48, 51, 53, 54, 59, 61, 64, 70, 72] | 0.1948 | 0.3311 | AP (80-class ckpt) in [0.1, 0.3) |
    | [11,12,17,74,79,80] | 0.6520 | 0.6438 | > 50000 samples |
    | [1,8,14,59] | 0.4307 | 0.5549 | [5000, 50000) samples |
    | [3,4,6,9,10,15,27,28,29,38,41,48,49,54,61,64,65,66,67,70,77] | 0.2384 | 0.3269 | [1000, 5000) samples |
    | [22,37,47,51,63,68,72,78] | 0.0753 | 0.3209 | [500, 1000) samples |
    | [2,5,7,13,20,24,26,30,34,36,42,45,46,52,56,57,58,60,62,69,73,75,76] | 0.0348 | 0.1806 | [100, 500) samples |
    | [16,18,19,21,23,25,31,32,33,35,39,40,43,44,50,53,55,71] | 0.0169 | 0.1984 | < 100 samples |
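    Training on custom classes essentially remaps the selected original AVA label ids to a contiguous range (the real dataset code also handles a background slot). A sketch of that mapping; the helper name and the 0-based ids are illustrative:

```python
def build_label_map(custom_classes):
    """Map original AVA class ids -> contiguous training ids (0-based)."""
    return {orig: new for new, orig in enumerate(sorted(custom_classes))}

custom = [11, 12, 14, 15, 79, 80]   # e.g. the 'AP > 0.6' subset above
label_map = build_label_map(custom)
print(label_map[79])  # → 4
```

    With such a map, annotations outside the subset are simply dropped, and the classification head can be built with num_classes equal to the subset size.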

    insights

    I think the AVA dataset suffers from severe class imbalance. Training on custom classes helps to get better results for classes with fewer samples.
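    The sample-count buckets in the table above can be reproduced by counting rows per class in an AVA-style annotation csv; a sketch, assuming the standard AVA column layout:

    ```python
    import csv
    from collections import Counter

    def count_samples_per_class(ann_lines):
        """Count annotation rows per action class.

        `ann_lines` is any iterable of csv lines (an open file works);
        columns follow the standard AVA layout:
        video_id, timestamp, x1, y1, x2, y2, action_id, person_id.
        """
        counts = Counter()
        for row in csv.reader(ann_lines):
            counts[int(row[6])] += 1
        return counts
    ```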


    opened by irvingzhang0512 29
  • Whether it is distributed training or not, errors will occur

    Whether it is distributed training or not, errors will occur

    Thanks for your contribution! When I try to train a model, errors occur whether or not I use distributed training. I followed your install.md for the installation, and there was no error when preparing the environment.

    There is a similar issue, but I have checked that there was no error during installation (I have reinstalled the conda env).

    For single GPU

    $ python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py                  
    2020-11-07 19:47:14,013 - mmaction - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.8.5 (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2: GeForce GTX 1080 Ti
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 10.0, V10.0.130
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.0
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.2
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
      - CuDNN 7.6.5
      - Magma 2.5.2
      - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.8.1
    OpenCV: 4.4.0
    MMCV: 1.1.6
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 10.0
    MMAction2: 0.8.0+76819e4
    ------------------------------------------------------------
    
    2020-11-07 19:47:14,014 - mmaction - INFO - Distributed training: False
    2020-11-07 19:47:14,014 - mmaction - INFO - Config: /home/liming/code/video/test/mmaction2/configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
    # model settings
    model = dict(
        type='Recognizer2D',
        backbone=dict(
            type='ResNet',
            pretrained='torchvision://resnet50',
            depth=50,
            norm_eval=False),
        cls_head=dict(
            type='TSNHead',
            num_classes=101,
            in_channels=2048,
            spatial_type='avg',
            consensus=dict(type='AvgConsensus', dim=1),
            dropout_ratio=0.4,
            init_std=0.001))
    # model training and testing settings
    train_cfg = None
    test_cfg = dict(average_clips=None)
    # dataset settings
    dataset_type = 'VideoDataset'
    data_root = 'data/ucf101/videos/'
    data_root_val = 'data/ucf101/videos/'
    split = 1  # official train/test splits. valid numbers: 1, 2, 3
    ann_file_train = f'data/ucf101/ucf101_train_split_{split}_videos.txt'
    ann_file_val = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    ann_file_test = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
    train_pipeline = [
        dict(type='DecordInit'),
        dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='RandomResizedCrop'),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
        dict(type='Flip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs', 'label'])
    ]
    val_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=3,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='CenterCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    test_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=25,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='ThreeCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    data = dict(
        videos_per_gpu=32,
        workers_per_gpu=4,
        train=dict(
            type=dataset_type,
            ann_file=ann_file_train,
            data_prefix=data_root,
            pipeline=train_pipeline),
        val=dict(
            type=dataset_type,
            ann_file=ann_file_val,
            data_prefix=data_root_val,
            pipeline=val_pipeline),
        test=dict(
            type=dataset_type,
            ann_file=ann_file_test,
            data_prefix=data_root_val,
            pipeline=test_pipeline))
    # optimizer
    # lr = 0.00128 for 8 GPUs * 32 video/gpu, 0.00015 for 3 GPUs * 10 videos/gpu, 5e-5 for 1 GPU * 10 videos/gpu
    optimizer = dict(
        type='SGD', lr=0.00048, momentum=0.9,
        weight_decay=0.0005)  # this lr is used for 8 gpus
    optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
    # learning policy
    lr_config = dict(policy='step', step=[])
    total_epochs = 1
    checkpoint_config = dict(interval=5)
    evaluation = dict(
        interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])
    log_config = dict(
        interval=20,
        hooks=[
            dict(type='TextLoggerHook'),
            # dict(type='TensorboardLoggerHook'),
        ])
    # runtime settings
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    work_dir = f'./work_dirs/tsn_r50_1x1x3_75e_ucf101_split_{split}_rgb/'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    
    2020-11-07 19:47:14,568 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
    2020-11-07 19:47:16,547 - mmaction - INFO - Start running, host: [email protected], work_dir: /home/liming/code/video/test/mmaction2/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
    2020-11-07 19:47:16,547 - mmaction - INFO - workflow: [('train', 1)], max: 1 epochs
    2020-11-07 19:47:30,330 - mmaction - INFO - Epoch [1][20/299]   lr: 4.800e-04, eta: 0:03:12, time: 0.689, data_time: 0.153, memory: 8244, top1_acc: 0.0141, top5_acc: 0.0703, loss_cls: 4.6118, loss: 4.6118, grad_norm: 5.5581
    2020-11-07 19:47:40,713 - mmaction - INFO - Epoch [1][40/299]   lr: 4.800e-04, eta: 0:02:36, time: 0.519, data_time: 0.000, memory: 8244, top1_acc: 0.0266, top5_acc: 0.0828, loss_cls: 4.5864, loss: 4.5864, grad_norm: 5.5972
    2020-11-07 19:47:51,104 - mmaction - INFO - Epoch [1][60/299]   lr: 4.800e-04, eta: 0:02:17, time: 0.520, data_time: 0.000, memory: 8244, top1_acc: 0.0484, top5_acc: 0.0938, loss_cls: 4.5600, loss: 4.5600, grad_norm: 5.6577
    2020-11-07 19:48:01,512 - mmaction - INFO - Epoch [1][80/299]   lr: 4.800e-04, eta: 0:02:03, time: 0.520, data_time: 0.000, memory: 8244, top1_acc: 0.0484, top5_acc: 0.1437, loss_cls: 4.5178, loss: 4.5178, grad_norm: 5.6118
    2020-11-07 19:48:11,938 - mmaction - INFO - Epoch [1][100/299]  lr: 4.800e-04, eta: 0:01:50, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.0797, top5_acc: 0.1938, loss_cls: 4.4669, loss: 4.4669, grad_norm: 5.7034
    2020-11-07 19:48:22,364 - mmaction - INFO - Epoch [1][120/299]  lr: 4.800e-04, eta: 0:01:38, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.0875, top5_acc: 0.2406, loss_cls: 4.4534, loss: 4.4534, grad_norm: 5.7623
    2020-11-07 19:48:32,792 - mmaction - INFO - Epoch [1][140/299]  lr: 4.800e-04, eta: 0:01:26, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1156, top5_acc: 0.2781, loss_cls: 4.4031, loss: 4.4031, grad_norm: 5.7466
    2020-11-07 19:48:43,221 - mmaction - INFO - Epoch [1][160/299]  lr: 4.800e-04, eta: 0:01:15, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1703, top5_acc: 0.3422, loss_cls: 4.3451, loss: 4.3451, grad_norm: 5.7538
    2020-11-07 19:48:53,649 - mmaction - INFO - Epoch [1][180/299]  lr: 4.800e-04, eta: 0:01:04, time: 0.521, data_time: 0.000, memory: 8244, top1_acc: 0.1656, top5_acc: 0.3656, loss_cls: 4.3214, loss: 4.3214, grad_norm: 5.7920
    2020-11-07 19:49:04,084 - mmaction - INFO - Epoch [1][200/299]  lr: 4.800e-04, eta: 0:00:53, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.1938, top5_acc: 0.3844, loss_cls: 4.2619, loss: 4.2619, grad_norm: 5.8725
    2020-11-07 19:49:14,525 - mmaction - INFO - Epoch [1][220/299]  lr: 4.800e-04, eta: 0:00:42, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.2359, top5_acc: 0.3906, loss_cls: 4.1983, loss: 4.1983, grad_norm: 5.8417
    2020-11-07 19:49:24,974 - mmaction - INFO - Epoch [1][240/299]  lr: 4.800e-04, eta: 0:00:31, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.1938, top5_acc: 0.4281, loss_cls: 4.1371, loss: 4.1371, grad_norm: 6.0010
    2020-11-07 19:49:35,435 - mmaction - INFO - Epoch [1][260/299]  lr: 4.800e-04, eta: 0:00:20, time: 0.523, data_time: 0.000, memory: 8244, top1_acc: 0.1922, top5_acc: 0.4359, loss_cls: 4.0732, loss: 4.0732, grad_norm: 5.9770
    2020-11-07 19:49:45,881 - mmaction - INFO - Epoch [1][280/299]  lr: 4.800e-04, eta: 0:00:10, time: 0.522, data_time: 0.000, memory: 8244, top1_acc: 0.2406, top5_acc: 0.4516, loss_cls: 4.0252, loss: 4.0252, grad_norm: 6.1316
    [1]    30756 segmentation fault (core dumped)  python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
    

    For multiple GPUs:

    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
    *****************************************
    2020-11-07 19:44:48,222 - mmaction - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.8.5 (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2: GeForce GTX 1080 Ti
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 10.0, V10.0.130
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.0
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.2
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
      - CuDNN 7.6.5
      - Magma 2.5.2
      - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.8.1
    OpenCV: 4.4.0
    MMCV: 1.1.6
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 10.0
    MMAction2: 0.8.0+76819e4
    ------------------------------------------------------------
    
    2020-11-07 19:44:48,223 - mmaction - INFO - Distributed training: True
    2020-11-07 19:44:48,223 - mmaction - INFO - Config: /home/liming/code/video/test/mmaction2/configs/tsn_r50_1x1x3_75e_ucf101_rgb.py
    # model settings
    model = dict(
        type='Recognizer2D',
        backbone=dict(
            type='ResNet',
            pretrained='torchvision://resnet50',
            depth=50,
            norm_eval=False),
        cls_head=dict(
            type='TSNHead',
            num_classes=101,
            in_channels=2048,
            spatial_type='avg',
            consensus=dict(type='AvgConsensus', dim=1),
            dropout_ratio=0.4,
            init_std=0.001))
    # model training and testing settings
    train_cfg = None
    test_cfg = dict(average_clips=None)
    # dataset settings
    dataset_type = 'VideoDataset'
    data_root = 'data/ucf101/videos/'
    data_root_val = 'data/ucf101/videos/'
    split = 1  # official train/test splits. valid numbers: 1, 2, 3
    ann_file_train = f'data/ucf101/ucf101_train_split_{split}_videos.txt'
    ann_file_val = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    ann_file_test = f'data/ucf101/ucf101_val_split_{split}_videos.txt'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
    train_pipeline = [
        dict(type='DecordInit'),
        dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='RandomResizedCrop'),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
        dict(type='Flip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs', 'label'])
    ]
    val_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=3,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='CenterCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    test_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=1,
            frame_interval=1,
            num_clips=25,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='ThreeCrop', crop_size=256),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
    data = dict(
        videos_per_gpu=32,
        workers_per_gpu=4,
        train=dict(
            type=dataset_type,
            ann_file=ann_file_train,
            data_prefix=data_root,
            pipeline=train_pipeline),
        val=dict(
            type=dataset_type,
            ann_file=ann_file_val,
            data_prefix=data_root_val,
            pipeline=val_pipeline),
        test=dict(
            type=dataset_type,
            ann_file=ann_file_test,
            data_prefix=data_root_val,
            pipeline=test_pipeline))
    # optimizer
    # lr = 0.00128 for 8 GPUs * 32 video/gpu, 0.00015 for 3 GPUs * 10 videos/gpu, 5e-5 for 1 GPU * 10 videos/gpu
    optimizer = dict(
        type='SGD', lr=0.00048, momentum=0.9,
        weight_decay=0.0005)  # this lr is used for 8 gpus
    optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
    # learning policy
    lr_config = dict(policy='step', step=[])
    total_epochs = 1
    checkpoint_config = dict(interval=5)
    evaluation = dict(
        interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'])
    log_config = dict(
        interval=20,
        hooks=[
            dict(type='TextLoggerHook'),
            # dict(type='TensorboardLoggerHook'),
        ])
    # runtime settings
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    work_dir = f'./work_dirs/tsn_r50_1x1x3_75e_ucf101_split_{split}_rgb/'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    
    2020-11-07 19:44:48,776 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
    2020-11-07 19:44:49,087 - mmaction - INFO - Start running, host: [email protected], work_dir: /home/liming/code/video/test/mmaction2/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
    2020-11-07 19:44:49,087 - mmaction - INFO - workflow: [('train', 1)], max: 1 epochs
    2020-11-07 19:45:07,472 - mmaction - INFO - Epoch [1][20/100]   lr: 4.800e-04, eta: 0:01:13, time: 0.918, data_time: 0.346, memory: 8333, top1_acc: 0.0281, top5_acc: 0.1016, loss_cls: 4.6039, loss: 4.6039, grad_norm: 3.2696
    2020-11-07 19:45:18,411 - mmaction - INFO - Epoch [1][40/100]   lr: 4.800e-04, eta: 0:00:43, time: 0.547, data_time: 0.001, memory: 8333, top1_acc: 0.0437, top5_acc: 0.1385, loss_cls: 4.5756, loss: 4.5756, grad_norm: 3.2719
    2020-11-07 19:45:29,362 - mmaction - INFO - Epoch [1][60/100]   lr: 4.800e-04, eta: 0:00:26, time: 0.548, data_time: 0.001, memory: 8333, top1_acc: 0.0818, top5_acc: 0.1781, loss_cls: 4.5440, loss: 4.5440, grad_norm: 3.2529
    2020-11-07 19:45:40,321 - mmaction - INFO - Epoch [1][80/100]   lr: 4.800e-04, eta: 0:00:12, time: 0.548, data_time: 0.000, memory: 8333, top1_acc: 0.0688, top5_acc: 0.2083, loss_cls: 4.5135, loss: 4.5135, grad_norm: 3.2930
    2020-11-07 19:45:50,957 - mmaction - INFO - Epoch [1][100/100]  lr: 4.800e-04, eta: 0:00:00, time: 0.532, data_time: 0.000, memory: 8333, top1_acc: 0.1174, top5_acc: 0.2649, loss_cls: 4.4753, loss: 4.4753, grad_norm: 3.3248
    Traceback (most recent call last):
      File "/home/liming/anaconda3/envs/test/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/liming/anaconda3/envs/test/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/liming/anaconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
        main()
      File "/home/liming/anaconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
        raise subprocess.CalledProcessError(returncode=process.returncode,
    subprocess.CalledProcessError: Command '['/home/liming/anaconda3/envs/test/bin/python', '-u', './tools/train.py', '--local_rank=2', 'configs/tsn_r50_1x1x3_75e_ucf101_rgb.py', '--launcher', 'pytorch']' died with <Signals.SIGSEGV: 11>.
    

    Here is my conda env list:

    _libgcc_mutex             0.1                        main    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    addict                    2.3.0                    pypi_0    pypi
    blas                      1.0                         mkl    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ca-certificates           2020.10.14                    0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    certifi                   2020.6.20                pypi_0    pypi
    cudatoolkit               10.2.89              hfd86e86_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    cycler                    0.10.0                   pypi_0    pypi
    dataclasses               0.6                      pypi_0    pypi
    freetype                  2.10.4               h5ab3b9f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    future                    0.18.2                   pypi_0    pypi
    intel-openmp              2020.2                      254    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    jpeg                      9b                   h024ee3a_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    kiwisolver                1.3.1                    pypi_0    pypi
    lcms2                     2.11                 h396b838_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ld_impl_linux-64          2.33.1               h53a641e_7    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libedit                   3.1.20191231         h14c3975_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libffi                    3.3                  he6710b0_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libgcc-ng                 9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libpng                    1.6.37               hbc83047_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libstdcxx-ng              9.1.0                hdf63c60_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libtiff                   4.1.0                h2733197_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    libuv                     1.40.0               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    lz4-c                     1.9.2                heb0550a_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    matplotlib                3.3.2                    pypi_0    pypi
    mkl                       2020.2                      256    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mkl-service               2.3.0            py38he904b0f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mkl_fft                   1.2.0            py38h23d657b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mkl_random                1.1.1            py38h0573a6f_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    mmaction2                 0.8.0                     dev_0    <develop>
    mmcv-full                 1.1.6                    pypi_0    pypi
    ncurses                   6.2                  he6710b0_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    ninja                     1.10.1           py38hfd86e86_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    numpy                     1.19.2           py38h54aff64_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    numpy-base                1.19.2           py38hfa32c7d_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    olefile                   0.46                       py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    opencv-contrib-python     4.4.0.46                 pypi_0    pypi
    openssl                   1.1.1h               h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pillow                    8.0.1            py38he98fc37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pip                       20.2.4           py38h06a4308_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    pyparsing                 3.0.0b1                  pypi_0    pypi
    python                    3.8.5                h7579374_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    python-dateutil           2.8.1                    pypi_0    pypi
    pytorch                   1.7.0           py3.8_cuda10.2.89_cudnn7.6.5_0    pytorch
    pyyaml                    5.3.1                    pypi_0    pypi
    readline                  8.0                  h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    setuptools                50.3.0           py38h06a4308_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    six                       1.15.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    sqlite                    3.33.0               h62c20be_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    tk                        8.6.10               hbc83047_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    torchaudio                0.7.0                      py38    pytorch
    torchvision               0.8.1                py38_cu102    pytorch
    typing_extensions         3.7.4.3                    py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    wheel                     0.35.1                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    xz                        5.2.5                h7b6447c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    yapf                      0.30.0                   pypi_0    pypi
    zlib                      1.2.11               h7b6447c_3    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    zstd                      1.4.5                h9ceee32_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    

    The full installation steps are:

    conda create -n test python=3.8 -y
    conda activate test
    
    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
    
    pip install mmcv
    
    git clone https://github.com/open-mmlab/mmaction2.git
    cd mmaction2
    pip install -r requirements/build.txt
    python setup.py develop
    
    mkdir data
    ln -s PATH_TO_DATA data
    
    opened by limingcv 28
  • [Feature] Support Webcam Demo for Spatio-temporal Action Detection Models

    [Feature] Support Webcam Demo for Spatio-temporal Action Detection Models

    Description

    This implementation is based on SlowFast Spatio-temporal Action Detection Webcam Demo.

    TODO

    • [x] Multi-threading for read/display/inference.
    • Human detector
      • [x] easy to use abstract class
      • [x] mmdet
      • ~[ ] yolov4 human detector~: it seems the human detector is not the bottleneck for this demo.
    • [x] MMAction2 stdet models.
    • Output result
      • [x] cv2.imshow
      • [x] write to local video file.
    • [x] decouple display frame shape and model frame shape.
    • [x] logging
    • [x] remove global variables
    • [x] BUG: Unexpected exit when read thread is dead and display thread is alive.
    • [x] BUG: sampling strategy ignored
    • [x] fix known issue.
    • [x] Improvement: In SlowFast Webcam Demo, predict_stepsize must be in range [clip_len * frame_interval // 2, clip_len * frame_interval]. Find a way to support predict_stepsize in range [0, clip_len * frame_interval]
    • Docs
      • [x] Annotations in script
      • [x] demo/README.md
      • [x] docs_zh_CN/demo.md

    Known issue

    • config model -> test_cfg -> rcnn -> action_thr should be .0 instead of the current default value 0.002. Otherwise, different actions may end up with different numbers of bboxes.
    result = stdet_model(...)[0]
    
    previous_shape = None
    for class_id in range(len(result)):
        if previous_shape is None:
            previous_shape = result[class_id].shape
        else:
            assert previous_shape == result[class_id].shape, 'This assertion error may be raised.'
    
    • This may cause index of range error

    https://github.com/open-mmlab/mmaction2/blob/905f07a7128c4d996af13d47d25546ad248ee187/demo/demo_spatiotemporal_det.py#L345-L364

    j in result[i][j, 4] may be out of range. The for j in range(proposal.shape[0]) loop assumes that every result[i] has the same shape, i.e. the same number of bboxes for every action.
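    While the demo still makes that assumption, a defensive rewrite would take the loop bound per class instead of from one shared proposal count (a sketch; `result` stands for the list of per-class detection arrays returned by the stdet model):

    ```python
    import numpy as np

    def iter_detections(result):
        """Yield (class_id, bbox, score) triples.

        Each result[class_id] is an (n, 5) array of
        [x1, y1, x2, y2, score]; with action_thr > 0 the n may differ
        between classes, so j is bounded per class and can never be
        out of range.
        """
        for class_id, dets in enumerate(result):
            for j in range(dets.shape[0]):
                yield class_id, dets[j, :4], dets[j, 4]
    ```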

    Usage

    • Modify --output-fps according to printed log DEBUG:__main__:Read Thread: {duration} ms, {fps} fps.
    • Modify --predict-stepsize so that the durations for read and inference, which are both printed by the logger, are almost the same.
    python demo/webcam_demo_spatiotemporal_det.py --show \
      --output-fps 15 \
      --predict-stepsize 8
    
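    The predict_stepsize constraint from the TODO list can be computed up front; a sketch, with clip_len and frame_interval taken from the stdet sampling config:

    ```python
    def predict_stepsize_range(clip_len, frame_interval):
        """Range currently accepted for --predict-stepsize.

        One prediction covers a window of clip_len * frame_interval
        frames; per the constraint above, the demo supports step sizes
        from half that window up to the full window.
        """
        window = clip_len * frame_interval
        return window // 2, window

    # e.g. clip_len=8, frame_interval=2 -> valid steps are 8..16
    ```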
    opened by irvingzhang0512 26
  • Custom Training of SpatioTemporal Model SlowFast giving mAP 0.0

    Custom Training of SpatioTemporal Model SlowFast giving mAP 0.0

    Tried to train the model with our custom data (over 200 videos). After training it for 50 epochs, mAP was still 0.0 at every validation. Can you help me with this?

    Note: for annotations, I'm using normalized x1,y1 (top-left corner) and x2,y2 (bottom-right corner). Is this the correct format, or do I need to change it?
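    For reference, AVA-style csv rows do store corner coordinates normalized to [0, 1]; a small conversion helper for absolute pixel boxes (names are illustrative):

    ```python
    def normalize_bbox(x1, y1, x2, y2, img_w, img_h):
        """Convert absolute pixel corners (top-left x1,y1 and
        bottom-right x2,y2) to the normalized [0, 1] coordinates
        expected in AVA-format annotation rows."""
        return x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h
    ```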

    Below is my custom config file:

    
    custom_classes = [1, 2, 3, 4, 5]
    num_classes = 6
    model = dict(
        type='FastRCNN',
        backbone=dict(
            type='ResNet3dSlowOnly',
            depth=50,
            pretrained=None,
            pretrained2d=False,
            lateral=False,
            num_stages=4,
            conv1_kernel=(1, 7, 7),
            conv1_stride_t=1,
            pool1_stride_t=1,
            spatial_strides=(1, 2, 2, 1)),
        roi_head=dict(
            type='AVARoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor3D',
                roi_layer_type='RoIAlign',
                output_size=8,
                with_temporal_pool=True),
            bbox_head=dict(
                type='BBoxHeadAVA',
                in_channels=2048,
                num_classes=6,
                multilabel=True,
                topk=(2, 3),
                dropout_ratio=0.5)),
        train_cfg=dict(
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssignerAVA',
                    pos_iou_thr=0.9,
                    neg_iou_thr=0.9,
                    min_pos_iou=0.9),
                sampler=dict(
                    type='RandomSampler',
                    num=32,
                    pos_fraction=1,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=1.0,
                debug=False)),
        test_cfg=dict(rcnn=dict(action_thr=0.002)))
    dataset_type = 'AVADataset'
    data_root = 'tools/data/SAI/rawframes'
    anno_root = 'tools/data/SAI/Annotations'
    ann_file_train = 'tools/data/SAI/Annotations/ava_format_train.csv'
    ann_file_val = 'tools/data/SAI/Annotations/ava_format_test.csv'
    label_file = 'tools/data/SAI/Annotations/action_list.pbtxt'
    proposal_file_train = 'tools/data/SAI/Annotations/proposals_train.pkl'
    proposal_file_val = 'tools/data/SAI/Annotations/proposals_test.pkl'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
    train_pipeline = [
        dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
        dict(type='RawFrameDecode'),
        dict(type='RandomRescale', scale_range=(256, 320)),
        dict(type='RandomCrop', size=256),
        dict(type='Flip', flip_ratio=0.5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_bgr=False),
        dict(type='FormatShape', input_format='NCTHW', collapse=True),
        dict(type='Rename', mapping=dict(imgs='img')),
        dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
        dict(
            type='ToDataContainer',
            fields=[
                dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
            ]),
        dict(
            type='Collect',
            keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
            meta_keys=['scores', 'entity_ids'])
    ]
    val_pipeline = [
        dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
        dict(type='RawFrameDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_bgr=False),
        dict(type='FormatShape', input_format='NCTHW', collapse=True),
        dict(type='Rename', mapping=dict(imgs='img')),
        dict(type='ToTensor', keys=['img', 'proposals']),
        dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
        dict(
            type='Collect',
            keys=['img', 'proposals'],
            meta_keys=['scores', 'img_shape'],
            nested=True)
    ]
    data = dict(
        videos_per_gpu=1,
        workers_per_gpu=4,
        val_dataloader=dict(
            videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
        train_dataloader=dict(
            videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
        test_dataloader=dict(
            videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
        train=dict(
            type='AVADataset',
            ann_file='tools/data/SAI/Annotations/ava_format_train.csv',
            pipeline=[
                dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
                dict(type='RawFrameDecode'),
                dict(type='RandomRescale', scale_range=(256, 320)),
                dict(type='RandomCrop', size=256),
                dict(type='Flip', flip_ratio=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_bgr=False),
                dict(type='FormatShape', input_format='NCTHW', collapse=True),
                dict(type='Rename', mapping=dict(imgs='img')),
                dict(
                    type='ToTensor',
                    keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
                dict(
                    type='ToDataContainer',
                    fields=[
                        dict(
                            key=['proposals', 'gt_bboxes', 'gt_labels'],
                            stack=False)
                    ]),
                dict(
                    type='Collect',
                    keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
                    meta_keys=['scores', 'entity_ids'])
            ],
            label_file='tools/data/SAI/Annotations/action_list.pbtxt',
            proposal_file='tools/data/SAI/Annotations/proposals_train.pkl',
            person_det_score_thr=0.9,
            num_classes=6,
            custom_classes=[1, 2, 3, 4, 5],
            data_prefix='tools/data/SAI/rawframes'),
        val=dict(
            type='AVADataset',
            ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
            pipeline=[
                dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
                dict(type='RawFrameDecode'),
                dict(type='Resize', scale=(-1, 256)),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_bgr=False),
                dict(type='FormatShape', input_format='NCTHW', collapse=True),
                dict(type='Rename', mapping=dict(imgs='img')),
                dict(type='ToTensor', keys=['img', 'proposals']),
                dict(
                    type='ToDataContainer',
                    fields=[dict(key='proposals', stack=False)]),
                dict(
                    type='Collect',
                    keys=['img', 'proposals'],
                    meta_keys=['scores', 'img_shape'],
                    nested=True)
            ],
            label_file='tools/data/SAI/Annotations/action_list.pbtxt',
            proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
            person_det_score_thr=0.9,
            num_classes=6,
            custom_classes=[1, 2, 3, 4, 5],
            data_prefix='tools/data/SAI/rawframes'),
        test=dict(
            type='AVADataset',
            ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
            pipeline=[
                dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
                dict(type='RawFrameDecode'),
                dict(type='Resize', scale=(-1, 256)),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_bgr=False),
                dict(type='FormatShape', input_format='NCTHW', collapse=True),
                dict(type='Rename', mapping=dict(imgs='img')),
                dict(type='ToTensor', keys=['img', 'proposals']),
                dict(
                    type='ToDataContainer',
                    fields=[dict(key='proposals', stack=False)]),
                dict(
                    type='Collect',
                    keys=['img', 'proposals'],
                    meta_keys=['scores', 'img_shape'],
                    nested=True)
            ],
            label_file='tools/data/SAI/Annotations/action_list.pbtxt',
            proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
            person_det_score_thr=0.9,
            num_classes=6,
            custom_classes=[1, 2, 3, 4, 5],
            data_prefix='tools/data/SAI/rawframes'))
    optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=1e-05)
    optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
    lr_config = dict(
        policy='step',
        step=[10, 15],
        warmup='linear',
        warmup_by_epoch=True,
        warmup_iters=5,
        warmup_ratio=0.1)
    total_epochs = 50
    train_ratio = [1, 1]
    checkpoint_config = dict(interval=1)
    workflow = [('train', 1)]
    evaluation = dict(interval=1, save_best='[email protected]')
    log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    work_dir = './SAI/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb'
    load_from = 'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth'
    resume_from = None
    find_unused_parameters = False
    omnisource = False
    module_hooks = []
    gpu_ids = range(0, 1)
    
    
    
    opened by memona008 26
  • For a single GPU, training hangs...

    For a single GPU, training hangs...

    @innerlee I'm very sorry to disturb you, For a single GPU when I run this command

    $ python tools/train.py configs/tsn_r50_1x1x3_75e_ucf101_rgb.py ... the training hangs after:
    2020-12-06 03:47:12,059 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
    2020-12-06 03:47:14,590 - mmaction - INFO - Start running, host: [email protected], work_dir: /data6/sky/acd/mmaction2/tools/work_dirs/tsn_r50_1x1x3_75e_ucf101_split_1_rgb
    2020-12-06 03:47:14,590 - mmaction - INFO - workflow: [('train', 1)], max: 15 epochs
    

    (PyTorch 1.4.0 + mmcv-full 1.2.1 + CUDA 10.1). The --validate option has also been tried, but it makes no difference.

    awaiting response 
    opened by skyqwe123 24
  • Still some bugs during AVA training

    Still some bugs during AVA training

    I reported some bugs during AVA training last time. Finally (after renaming the image files manually), I can run the command "./tools/dist_train.sh configs/detection/ava/slowfast_kinetics_pretrained_r50_8x8x1_20e_ava_rgb.py 4 --validate" on my PC. But during Epoch [1][120/11524], it raises "FileNotFoundError: [Errno 2] No such file or directory: '/home/avadata/ava/rawframes/7g37N3eoQ9s/img_26368.jpg'". It seems there are still some bugs in the file-name correspondence.
    BTW, in the config file I'm using, lines 89 and 90 read as follows: line 89: # Rename is needed to use mmdet detectors; line 90: dict(type='Rename', mapping=dict(imgs='img')). I guess this code is meant to change the file name from ${video_name}_00001.jpg to img_00001.jpg, but it does not seem to work, and I cannot find a module named "Rename" in mmdet. Hope you can check these questions, thanks a lot.
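For what it's worth, `Rename` in that config operates on the pipeline's results dict, not on files on disk: it remaps the key `imgs` to `img` so that mmdet-style detectors find the input key they expect. A minimal sketch of what it does:

```python
# What dict(type='Rename', mapping=dict(imgs='img')) effectively does to the
# pipeline's results dict -- no image file on disk is ever renamed.
results = {'imgs': ['frame_0', 'frame_1']}
mapping = dict(imgs='img')
for old_key, new_key in mapping.items():
    results[new_key] = results.pop(old_key)
```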

    opened by SKBL5694 22
  • Useless forward Test code

    Useless forward Test code

    mmaction\models\recognizers\recognizer3d.py def forward_test(self, imgs):

    Neither loss nor accuracy is calculated there, so why would I use it? I recommend printing the accuracy after evaluation.

    opened by 944284742 20
  • I want to save the returned features from posec3d

    I want to save the returned features from posec3d

    First of all, thank you for sharing this wonderful project. I was studying your work 'PoseC3D' and got curious about some things.

    After running posec3d, I want to get the feature maps and save them on my computer. When I run posec3d, the mmaction.apis inference_recognizer method returns a recognition-result dict, so in demo_c3d the only thing I can get is the recognition result. The docstring of inference_recognizer says that if I pass the names of the layers via 'outputs', it will return feature maps. So if I set outputs to the layer names, will it return the feature maps after running the posec3d model? Or do I have to change anything else to get the feature maps of posec3d?
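Assuming `inference_recognizer` returns the named layers' feature maps when `outputs` is set, saving them could be sketched like this; the dict below is a stand-in for the returned features, and its shape is illustrative, not PoseC3D's actual one:

```python
import os
import tempfile

import numpy as np

# Stand-in for the feature maps that inference_recognizer(..., outputs=['backbone'])
# would return; the array shape here is illustrative.
returned_features = {'backbone': np.random.rand(1, 512, 12, 7, 7).astype(np.float32)}

save_dir = tempfile.mkdtemp()
for layer_name, fmap in returned_features.items():
    np.save(os.path.join(save_dir, f'{layer_name}_feat.npy'), fmap)

loaded = np.load(os.path.join(save_dir, 'backbone_feat.npy'))
```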

    opened by BBEEEEI 0
  • Issues with training with multiple gpus.

    Issues with training with multiple gpus.

    During the reimplementation of recognition/TSM, when I pass the gpus arg as 2, I get the error AssertionError: MMDataParallel only supports single GPU training, if you need to train with multiple GPUs, please use MMDistributedDataParallel instead.

    So I pass the arg --launcher pytorch as well, and then the following error comes up:

    rank = int(os.environ['RANK'])
    File "/usr/lib/python3.8/os.py", line 675, in __getitem__
      raise KeyError(key) from None
    KeyError: 'RANK'

    Some say it is due to the .pyc files previously generated by Docker...

    Has anyone faced a similar issue?
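The `KeyError: 'RANK'` arises because `--launcher pytorch` expects the distributed environment variables that `torch.distributed.launch` (which `tools/dist_train.sh` wraps) exports for each worker; invoking `tools/train.py` directly leaves them unset. A minimal illustration of the difference:

```python
# torch.distributed.launch / tools/dist_train.sh exports RANK and WORLD_SIZE
# for every worker process; a plain `python tools/train.py ... --launcher
# pytorch` does not, hence the KeyError.
env = {}                              # stand-in for os.environ before launch
try:
    rank = int(env['RANK'])
    launched = True
except KeyError:
    launched = False

env.update(RANK='0', WORLD_SIZE='2')  # what the launcher would export
rank, world_size = int(env['RANK']), int(env['WORLD_SIZE'])
```

In practice, `bash tools/dist_train.sh CONFIG_FILE 2` sets these variables up for you.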

    opened by Laowai01 1
  • about AVA

    about AVA

    Are there any useful annotation tools for annotating AVA-style datasets? Thanks.

    opened by Blueyao17 0
  • does skeleton data based posec3d recognize several actions that happen one after another?

    does skeleton data based posec3d recognize several actions that happen one after another?

    I have a quick question. Let's say I have a 15s video where two people shake hands for 5s, then hug each other for 5s and hop for 5s. Can the model recognize all 3 actions? Initially, I think the model takes one input video and only one class category is drawn as the output.
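One common workaround (a sketch, not a built-in feature of the recognizer): a clip-level model emits a single label per input, so a video with several consecutive actions is usually handled by classifying sliding windows; the 5 s window and stride below just match the example in the question.

```python
# Split a 15 s video into consecutive windows and classify each one
# separately; window/stride values are illustrative.
duration, win, stride = 15, 5, 5
windows = [(start, start + win) for start in range(0, duration - win + 1, stride)]
```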

    opened by bit-scientist 1
  • puzzled about the input size of posec3d

    puzzled about the input size of posec3d

    Hi,

    When I read the paper "Revisiting Skeleton-based Action Recognition", I got puzzled about the input shape of 3D-CNN network.

    As stated in Section 3.2, the 3D heatmap volume is K × T × H × W; however, in Section 4.2, the input size for all 3D-CNN experiments is T × H × W. So what exactly is the input of the network? Did I miss something?

    Thanks!
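One way to reconcile the two sections (a sketch; the concrete numbers below are illustrative, not the paper's exact settings) is that the K joint heatmaps play the role of input channels:

```python
import numpy as np

# Illustrative values: K joints, T frames, H x W spatial size.
K, T, H, W = 17, 48, 56, 56
heatmap_volume = np.zeros((K, T, H, W), dtype=np.float32)
# The K joint heatmaps act as the input channels, so the 3D-CNN still sees a
# standard C x T x H x W tensor; "T x H x W" in the experiment section
# describes only the spatiotemporal extent, with K left implicit as C.
```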

    opened by x2ss 4
  • windows

    windows

    Can mmaction2 be installed in a Windows environment? Are there any installation tutorials? I hope you can solve my problem, thank you.

    opened by yanhan13944047669 1
  • The dataset corresponding to st-gcn was not found

    The dataset corresponding to st-gcn was not found

    I want to train the model st-gcn with 2D keypoints, but I can't find the dataset needed to train the model. How should I get the dataset? I hope you can solve my problem, thank you.

    opened by yanhan13944047669 1
  • BMN model (action localization) doesn't perform temporal action detection, just proposal generation?

    BMN model (action localization) doesn't perform temporal action detection, just proposal generation?

    I want to apply the BMN model to a temporal action localization task, but I found it only generates proposals.
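The common two-stage recipe turns proposals into detections by pairing each proposal with video-level classification scores; a sketch (class names and scores below are purely illustrative):

```python
# Each proposal is (t_start, t_end, proposal_score); combine with
# video-level class scores to get scored detections.
proposals = [(2.0, 7.5, 0.91), (8.0, 12.0, 0.40)]
video_cls = {'HighJump': 0.8, 'LongJump': 0.2}

detections = [
    (start, end, label, p_score * c_score)
    for (start, end, p_score) in proposals
    for label, c_score in video_cls.items()
]
best = max(detections, key=lambda d: d[3])
```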

    opened by fangxu622 1
  • question about input shape during inference

    question about input shape during inference

    It's said that 'the input should be $batch $clip $channel $time $height $width (e.g. 1 1 3 32 224 224)'.

    How do I understand $clip and $time ?
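A sketch of how those six dimensions are commonly read, assuming the recognizer folds batch and clip together before the 3D backbone:

```python
import numpy as np

# 1 x 1 x 3 x 32 x 224 x 224 unpacked:
# batch = videos per forward pass, clip = temporal clips sampled per video,
# channel = 3 (RGB), time = frames per clip, height = width = 224.
batch, clip, channel, time, height, width = 1, 1, 3, 32, 224, 224
x = np.zeros((batch, clip, channel, time, height, width), dtype=np.float32)

# Batch and clip dims are typically merged so the 3D backbone sees
# (batch * clip) x C x T x H x W.
x3d = x.reshape(-1, channel, time, height, width)
```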

    opened by ououoxx 2
  • How to embed it into 3D skeleton based action recognition

    How to embed it into 3D skeleton based action recognition

    Do you support human 3D skeleton pose estimation, i.e. taking a video as input and outputting a 3D skeleton video or a 3D skeleton file? I am currently using the NTU RGB+D 60 dataset for action recognition. I want to combine human 3D skeleton pose estimation with recognition so as to obtain the action classification directly from a video. Thank you for your reply!

    opened by PJJie 1
Releases(v0.20.0)
  • v0.20.0(Oct 30, 2021)

    Highlights

    • Support TorchServe
    • Add video structuralize demo
    • Support using 3D skeletons for skeleton-based action recognition
    • Benchmark PoseC3D on UCF and HMDB

    New Features

    • Support TorchServe (#1212)
    • Support 3D skeletons pre-processing (#1218)
    • Support video structuralize demo (#1197)

    Documentations

    • Revise README.md and add projects.md (#1214)
    • Add CN docs for Skeleton dataset, PoseC3D and ST-GCN (#1228, #1237, #1236)
    • Add tutorial for custom dataset training for skeleton-based action recognition (#1234)

    Bug and Typo Fixes

    ModelZoo

    • Benchmark PoseC3D on UCF and HMDB (#1223)
    • Add ST-GCN + 3D skeleton model for NTU60-XSub (#1236)

    New Contributors

    • @bit-scientist made their first contribution in https://github.com/open-mmlab/mmaction2/pull/1234

    Full Changelog: https://github.com/open-mmlab/mmaction2/compare/v0.19.0...v0.20.0

    Source code(tar.gz)
    Source code(zip)
  • v0.19.0(Oct 7, 2021)

    Highlights

    • Support ST-GCN
    • Refactor the inference API
    • Add code spell check hook

    New Features

    Improvement

    • Add label maps for every dataset (#1127)
    • Remove useless code MultiGroupCrop (#1180)
    • Refactor Inference API (#1191)
    • Add code spell check hook (#1208)
    • Use docker in CI (#1159)

    Documentations

    • Update metafiles to new OpenMMLAB protocols (#1134)
    • Switch to new doc style (#1160)
    • Improve the ERROR message (#1203)
    • Fix invalid URL in getting_started (#1169)

    Bug and Typo Fixes

    • Compatible with new MMClassification (#1139)
    • Add missing runtime dependencies (#1144)
    • Fix THUMOS tag proposals path (#1156)
    • Fix LoadHVULabel (#1194)
    • Switch the default value of persistent_workers to False (#1202)
    • Fix _freeze_stages for MobileNetV2 (#1193)
    • Fix resume when building rawframes (#1150)
    • Fix device bug for class weight (#1188)
    • Correct Arg names in extract_audio.py (#1148)

    ModelZoo

    • Add TSM-MobileNetV2 ported from TSM (#1163)
    • Add ST-GCN for NTURGB+D-XSub-60 (#1123)
    Source code(tar.gz)
    Source code(zip)
  • v0.18.0(Sep 2, 2021)

    Improvement

    • Add CopyRight (#1099)
    • Support NTU Pose Extraction (#1076)
    • Support Caching in RawFrameDecode (#1078)
    • Add citations & Support python3.9 CI & Use fixed-version sphinx (#1125)

    Documentations

    • Add Descriptions of PoseC3D dataset (#1053)

    Bug and Typo Fixes

    • Fix SSV2 checkpoints (#1101)
    • Fix CSN normalization (#1116)
    • Fix typo (#1121)
    • Fix new_crop_quadruple bug (#1108)
    Source code(tar.gz)
    Source code(zip)
  • v0.17.0(Aug 3, 2021)

    Highlights

    • Support PyTorch 1.9
    • Support Pytorchvideo Transforms
    • Support PreciseBN

    New Features

    • Support Pytorchvideo Transforms (#1008)
    • Support PreciseBN (#1038)

    Improvements

    • Remove redundant augmentations in config files (#996)
    • Make resource directory to hold common resource pictures (#1011)
    • Remove deprecated FrameSelector (#1010)
    • Support Concat Dataset (#1000)
    • Add to-mp4 option to resize_videos.py (#1021)
    • Add option to keep tail frames (#1050)
    • Update MIM support (#1061)
    • Calculate Top-K accurate and inaccurate classes (#1047)

    Bug and Typo Fixes

    • Fix bug in PoseC3D demo (#1009)
    • Fix some problems in resize_videos.py (#1012)
    • Support torch1.9 (#1015)
    • Remove redundant code in CI (#1046)
    • Fix bug about persistent_workers (#1044)
    • Support TimeSformer feature extraction (#1035)
    • Fix ColorJitter (#1025)

    ModelZoo

    • Add TSM-R50 sthv1 models trained by PytorchVideo RandAugment and AugMix (#1008)
    • Update SlowOnly SthV1 checkpoints (#1034)
    • Add SlowOnly Kinetics400 checkpoints trained with Precise-BN (#1038)
    • Add CSN-R50 from scratch checkpoints (#1045)
    • TPN Kinetics-400 Checkpoints trained with the new ColorJitter (#1025)

    Documentation

    • Add Chinese translation of feature_extraction.md (#1020)
    • Fix the code snippet in getting_started.md (#1023)
    • Fix TANet config table (#1028)
    • Add description to PoseC3D dataset (#1053)
    Source code(tar.gz)
    Source code(zip)
  • v0.16.0(Jul 1, 2021)

    Highlights

    • Support using backbone from pytorch-image-models(timm)
    • Support PIMS Decoder
    • Demo for skeleton-based action recognition
    • Support Timesformer

    New Features

    • Support using backbones from pytorch-image-models(timm) for TSN (#880)
    • Support torchvision transformations in preprocessing pipelines (#972)
    • Demo for skeleton-based action recognition (#972)
    • Support Timesformer (#839)

    Improvements

    • Add a tool to find invalid videos (#907, #950)
    • Add an option to specify spectrogram_type (#909)
    • Add json output to video demo (#906)
    • Add MIM related docs (#918)
    • Rename lr to scheduler (#916)
    • Support --cfg-options for demos (#911)
    • Support number counting for flow-wise filename template (#922)
    • Add Chinese tutorial (#941)
    • Change ResNet3D default values (#939)
    • Adjust script structure (#935)
    • Add font color to args in long_video_demo (#947)
    • Polish code style with Pylint (#908)
    • Support PIMS Decoder (#946)
    • Improve Metafiles (#956, #979, #966)
    • Add links to download Kinetics400 validation (#920)
    • Audit the usage of shutil.rmtree (#943)
    • Polish localizer-related code (#913)

    Bug and Typo Fixes

    • Fix spatiotemporal detection demo (#899)
    • Fix docstring for 3D inflate (#925)
    • Fix bug of writing text to video with TextClip (#952)
    • Fix mmcv install in CI (#977)

    ModelZoo

    • Add TSN with Swin Transformer backbone as an example for using pytorch-image-models(timm) backbones (#880)
    • Port CSN checkpoints from VMZ (#945)
    • Release various checkpoints for UCF101, HMDB51 and Sthv1 (#938)
    • Support Timesformer (#839)
    • Update TSM modelzoo (#981)
    Source code(tar.gz)
    Source code(zip)
  • v0.15.0(May 31, 2021)

    Highlights

    • Support PoseC3D
    • Support ACRN
    • Support MIM

    New Features

    • Support PoseC3D (#786, #890)
    • Support MIM (#870)
    • Support ACRN and Focal Loss (#891)
    • Support Jester dataset (#864)

    Improvements

    • Add metric_options for evaluation to docs (#873)
    • Support creating a new label map based on custom classes for demos about spatio temporal demo (#879)
    • Improve document about AVA dataset preparation (#878)
    • Provide a script to extract clip-level feature (#856)

    Bug and Typo Fixes

    • Fix issues about resume (#877, #878)
    • Correct the key name of eval_results dictionary for metric 'mmit_mean_average_precision' (#885)

    ModelZoo

    • Support Jester dataset (#864)
    • Support ACRN and Focal Loss (#891)
    Source code(tar.gz)
    Source code(zip)
  • v0.14.0(May 3, 2021)

    Highlights

    • Support TRN
    • Support Diving48

    New Features

    • Support TRN (#755)
    • Support Diving48 (#835)
    • Support Webcam Demo for Spatio-temporal Action Detection Models (#795)

    Improvements

    • Add softmax option for pytorch2onnx tool (#781)
    • Support TRN (#755)
    • Test with onnx models and TensorRT engines (#758)
    • Speed up AVA Testing (#784)
    • Add self.with_neck attribute (#796)
    • Update installation document (#798)
    • Use a random master port (#809)
    • Update AVA processing data document (#801)
    • Refactor spatio-temporal augmentation (#782)
    • Add QR code in CN README (#812)
    • Add Alternative way to download Kinetics (#817, #822)
    • Refactor Sampler (#790)
    • Use EvalHook in MMCV with backward compatibility (#793)
    • Use MMCV Model Registry (#843)

    Bug and Typo Fixes

    • Fix a bug in pytorch2onnx.py when num_classes <= 4 (#800, #824)
    • Fix demo_spatiotemporal_det.py error (#803, #805)
    • Fix loading config bugs when resume (#820)
    • Make HMDB51 annotation generation more robust (#811)

    ModelZoo

    • Update checkpoint for 256 height in something-V2 (#789)
    • Support Diving48 (#835)
    Source code(tar.gz)
    Source code(zip)
  • v0.13.0(Apr 1, 2021)

    Highlights

    • Support LFB
    • Support using backbone from MMCls/TorchVision
    • Add Chinese documentation

    New Features

    Improvements

    • Add slowfast config/json/log/ckpt for training custom classes of AVA (#678)
    • Set RandAugment as Imgaug default transforms (#585)
    • Add --test-last & --test-best for tools/train.py to test checkpoints after training (#608)
    • Add fcn_testing in TPN (#684)
    • Remove redundant recall functions (#741)
    • Recursively remove pretrained step for testing (#695)
    • Improve demo by limiting inference fps (#668)

    Bug and Typo Fixes

    • Fix a bug about multi-class in VideoDataset (#723)
    • Reverse key-value in anet filelist generation (#686)
    • Fix flow norm cfg typo (#693)

    ModelZoo

    • Add LFB for AVA2.1 (#553)
    • Add TSN with ResNeXt-101-32x4d backbone as an example for using MMCls backbones (#679)
    • Add TSN with Densenet161 backbone as an example for using TorchVision backbones (#720)
    • Add slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb (#690)
    • Add slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb (#704)
    • Add slowonly_nl_kinetics_pretrained_r50_4x16x1(8x8x1)_20e_ava_rgb (#730)
    Source code(tar.gz)
    Source code(zip)
  • v0.12.0(Mar 1, 2021)

    Highlights

    • Support TSM-MobileNetV2
    • Support TANet
    • Support GPU Normalize

    New Features

    • Support TSM-MobileNetV2 (#415)
    • Support flip with label mapping (#591)
    • Add seed option for sampler (#642)
    • Support GPU Normalize (#586)
    • Support TANet (#595)

    Improvements

    • Training custom classes of ava dataset (#555)
    • Add CN README in homepage (#592, #594)
    • Support soft label for CrossEntropyLoss (#625)
    • Refactor config: Specify train_cfg and test_cfg in model (#629)
    • Provide an alternative way to download older kinetics annotations (#597)
    • Update FAQ for
      • 1). data pipeline about video and frames (#598)
      • 2). how to show results (#598)
      • 3). batch size setting for batchnorm (#657)
      • 4). how to fix stages of backbone when finetuning models (#658)
    • Modify default value of save_best (#600)
    • Use BibTex rather than latex in markdown (#607)
    • Add warnings of uninstalling mmdet and supplementary documents (#624)
    • Support soft label for CrossEntropyLoss (#625)

    Bug and Typo Fixes

    • Fix value of pem_low_temporal_iou_threshold in BSN (#556)
    • Fix ActivityNet download script (#601)

    ModelZoo

    • Add TSM-MobileNetV2 for Kinetics400 (#415)
    • Add deeper SlowFast models (#605)
    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(Feb 1, 2021)

    Highlights

    • Support imgaug
    • Support spatial temporal demo
    • Refactor EvalHook, config structure, unittest structure

    New Features

    • Support imgaug for augmentations in the data pipeline (#492)
    • Support setting max_testing_views for extremely large models to save GPU memory used (#511)
    • Add spatial temporal demo (#547, #566)

    Improvements

    • Refactor EvalHook (#395)
    • Refactor AVA hook (#567)
    • Add repo citation (#545)
    • Add dataset size of Kinetics400 (#503)
    • Add lazy operation docs (#504)
    • Add class_weight for CrossEntropyLoss and BCELossWithLogits (#509)
    • Add some explanation about the resampling in SlowFast (#502)
    • Modify paper title in README.md (#512)
    • Add alternative ways to download Kinetics (#521)
    • Add OpenMMLab projects link in README (#530)
    • Change default preprocessing to shortedge to 256 (#538)
    • Add config tag in dataset README (#540)
    • Add solution for markdownlint installation issue (#497)
    • Add dataset overview in readthedocs (#548)
    • Modify the trigger mode of the warnings of missing mmdet (#583)
    • Refactor config structure (#488, #572)
    • Refactor unittest structure (#433)

    Bug and Typo Fixes

    • Fix a bug about ava dataset validation (#527)
    • Fix a bug about ResNet pretrain weight initialization (#582)
    • Fix a bug in CI due to MMCV index (#495)
    • Remove invalid links of MiT and MMiT (#516)
    • Fix frame rate bug for AVA preparation (#576)
    Source code(tar.gz)
    Source code(zip)
  • v0.10.0(Jan 5, 2021)

    Highlights

    • Support Spatio-Temporal Action Detection (AVA)
    • Support precise BN

    New Features

    • Support precise BN (#501)
    • Support Spatio-Temporal Action Detection (AVA) (#351)
    • Support to return feature maps in inference_recognizer (#458)

    Improvements

    • Add arg stride to long_video_demo.py, to make inference faster (#468)
    • Support training and testing for Spatio-Temporal Action Detection (#351)
    • Fix CI due to pip upgrade (#454)
    • Add markdown lint in pre-commit hook (#255)
    • Speed up confusion matrix calculation (#465)
    • Use title case in modelzoo statistics (#456)
    • Add FAQ documents for easy troubleshooting. (#413, #420, #439)
    • Support Spatio-Temporal Action Detection with context (#471)
    • Add class weight for CrossEntropyLoss and BCELossWithLogits (#509)
    • Add Lazy OPs docs (#504)

    Bug and Typo Fixes

    • Fix typo in default argument of BaseHead (#446)
    • Fix potential bug about output_config overwrite (#463)

    ModelZoo

    • Add SlowOnly, SlowFast for AVA2.1 (#351)
    Source code(tar.gz)
    Source code(zip)
  • v0.9.0(Dec 1, 2020)

    Highlights

    • Support GradCAM utils for recognizers
    • Support ResNet Audio model

    New Features

    • Automatically add modelzoo statistics to readthedocs (#327)
    • Support GYM99 data preparation (#331)
    • Add AudioOnly Pathway from AVSlowFast. (#355)
    • Add GradCAM utils for recognizer (#324)
    • Add print config script (#345)
    • Add online motion vector decoder (#291)

    Improvements

    • Support PyTorch 1.7 in CI (#312)
    • Support to predict different labels in a long video (#274)
    • Update docs about test crops (#359)
    • Polish code format using pylint manually (#338)
    • Update unittest coverage (#358, #322, #325)
    • Add random seed for building filelists (#323)
    • Update colab tutorial (#367)
    • Set default batch_size of evaluation and testing to 1 (#250)
    • Rename the preparation docs to README.md (#388)
    • Move docs about demo to demo/README.md (#329)
    • Remove redundant code in tools/test.py (#310)
    • Automatically calculate number of test clips for Recognizer2D (#359)

    Bug and Typo Fixes

    • Fix rename Kinetics classnames bug (#384)
    • Fix a bug in BaseDataset when data_prefix is None (#314)
    • Fix a bug about tmp_folder in OpenCVInit (#357)
    • Fix get_thread_id when not using disk as backend (#354, #357)
    • Fix the bug of HVU object num_classes from 1679 to 1678 (#307)
    • Fix typo in export_model.md (#399)
    • Fix OmniSource training configs (#321)
    • Fix Issue #306: Bug of SampleAVAFrames (#317)

    ModelZoo

    • Add SlowOnly model for GYM99, both RGB and Flow (#336)
    • Add auto modelzoo statistics in readthedocs (#327)
    • Add TSN for HMDB51 pretrained on Kinetics400, Moments in Time and ImageNet. (#372)
    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Oct 31, 2020)

    Highlights

    • Support OmniSource
    • Support C3D
    • Support video recognition with audio modality
    • Support HVU
    • Support X3D

    New Features

    • Support AVA dataset preparation (#266)
    • Support the training of video recognition dataset with multiple tag categories (#235)
    • Support joint training with multiple training datasets of multiple formats, including images, untrimmed videos, etc. (#242)
    • Support to specify a start epoch to conduct evaluation (#216)
    • Implement X3D models, support testing with model weights converted from SlowFast (#288)

    Improvements

    • Set default values of 'average_clips' in each config file so that there is no need to set it explicitly during testing in most cases (#232)
    • Extend HVU datatools to generate individual file list for each tag category (#258)
    • Support data preparation for Kinetics-600 and Kinetics-700 (#254)
    • Add cfg-options in arguments to override some settings in the used config for convenience (#212)
    • Rename the old evaluating protocol mean_average_precision as mmit_mean_average_precision since it is only used on MMIT and is not the mAP we usually talk about. Add mean_average_precision, which is the real mAP (#235)
    • Add the accurate setting (three crops × two clips) and report the corresponding performance for the TSM model (#241)
    • Add citations in each preparing_dataset.md in tools/data/dataset (#289)
    • Update the performance of audio-visual fusion on Kinetics-400 (#281)
    • Support data preparation of OmniSource web datasets, including GoogleImage, InsImage, InsVideo and KineticsRawVideo (#294)
    • Use metric_options dict to provide metric args in evaluate (#286)
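    The difference between the two mAP protocols above can be sketched with NumPy (a toy implementation, not the exact MMAction2 code): the "real" mAP averages per-class AP over classes, while the MMIT-style variant averages per-sample AP over samples.

    ```python
    import numpy as np

    def average_precision(scores, labels):
        """AP for one ranking: precision averaged over the ranks of positives."""
        order = np.argsort(-scores)
        labels = labels[order]
        hits = np.cumsum(labels)
        precisions = hits / np.arange(1, len(labels) + 1)
        return precisions[labels == 1].mean()

    def mean_average_precision(scores, labels):
        """Class-wise mAP: AP per class column, averaged over classes."""
        return np.mean([average_precision(scores[:, c], labels[:, c])
                        for c in range(scores.shape[1])])

    def mmit_mean_average_precision(scores, labels):
        """MMIT-style mAP: AP per sample row, averaged over samples."""
        return np.mean([average_precision(scores[i], labels[i])
                        for i in range(scores.shape[0])])

    scores = np.array([[0.1, 0.9],
                       [0.4, 0.6]])
    labels = np.array([[1, 0],
                       [0, 1]])
    print(mean_average_precision(scores, labels))       # 0.5  (over classes)
    print(mmit_mean_average_precision(scores, labels))  # 0.75 (over samples)
    ```

    On the same toy predictions the two protocols disagree, which is exactly why the rename matters for comparability.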

    Bug Fixes

    • Register FrameSelector in PIPELINES (#268)
    • Fix the potential bug for default value in dataset_setting (#245)
    • Fix the data preparation bug for something-something dataset (#278)
    • Fix the invalid config url in slowonly README data benchmark (#249)
    • Validate that the performance of models trained with videos has no significant difference compared to that of models trained with rawframes (#256)
    • Correct the img_norm_cfg used by TSN-3seg-R50 UCF-101 model, improve the Top-1 accuracy by 3% (#273)

    ModelZoo

    • Add Baselines for Kinetics-600 and Kinetics-700, including TSN-R50-8seg and SlowOnly-R50-8x8 (#259)
    • Add OmniSource benchmark on MiniKinetics (#296)
    • Add Baselines for HVU, including TSN-R18-8seg on 6 tag categories of HVU (#287)
    • Add X3D models ported from SlowFast (#288)
  • v0.7.0(Oct 3, 2020)

    Highlights

    • Support TPN
    • Support JHMDB, UCF101-24, HVU dataset preparation
    • Support ONNX model conversion

    New Features

    • Support the data pre-processing pipeline for the HVU Dataset (#277)
    • Support real-time action recognition from web camera (#171)
    • Support ONNX model conversion (#160)
    • Support UCF101-24 preparation (#219)
    • Support evaluating mAP for ActivityNet with CUHK17_activitynet_pred (#176)
    • Add the data pipeline for ActivityNet, including downloading videos, extracting RGB and Flow frames, finetuning TSN and extracting features (#190)
    • Support JHMDB preparation (#220)
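    The ActivityNet mAP evaluation above matches predicted segments to ground truth by temporal IoU. A minimal sketch of that computation (helper name is hypothetical, not MMAction2's API):

    ```python
    def temporal_iou(pred, gt):
        """Temporal IoU between two (start, end) segments in seconds (sketch)."""
        inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
        union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
        return inter / union if union > 0 else 0.0

    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # 0.3333333333333333
    ```

    Detection mAP is then computed at a sweep of tIoU thresholds (ActivityNet uses 0.5:0.05:0.95), counting a prediction as a true positive when its tIoU with an unmatched ground-truth segment exceeds the threshold.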

    ModelZoo

    • Add finetuning setting for SlowOnly (#173)
    • Add TSN and SlowOnly models trained with OmniSource, which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8 (#215)

    Improvements

    • Support demo with video url (#165)
    • Support multi-batch when testing (#184)
    • Add tutorial for adding a new learning rate updater (#181)
    • Add config name in meta info (#183)
    • Remove git hash in __version__ (#189)
    • Check mmcv version (#189)
    • Update url with 'https://download.openmmlab.com' (#208)
    • Update Docker file to support PyTorch 1.6 and update install.md (#209)
    • Polish readthedocs display (#217, #229)

    Bug Fixes

    • Fix the bug when using OpenCV to extract only RGB frames with original shape (#184)
    • Fix the bug of sthv2 num_classes from 339 to 174 (#174, #207)
  • v0.6.0(Sep 2, 2020)

    Highlights

    • Support TIN, CSN, SSN, NonLocal
    • Support FP16 training

    New Features

    • Support NonLocal module and provide ckpt in TSM and I3D (#41)
    • Support SSN (#33, #37, #52, #55)
    • Support CSN (#87)
    • Support TIN (#53)
    • Support HMDB51 dataset preparation (#60)
    • Support encoding videos from frames (#84)
    • Support FP16 training (#25)
    • Enhance the demo by supporting rawframe inference (#59) and video/GIF output (#72)

    ModelZoo

    • Update Slowfast modelzoo (#51)
    • Update TSN, TSM video checkpoints (#50)
    • Add data benchmark for TSN (#57)
    • Add data benchmark for SlowOnly (#77)
    • Add BSN/BMN performance results with feature extracted by our codebase (#99)

    Improvements

    • Polish data preparation codes (#70)
    • Improve data preparation scripts (#58)
    • Improve unittest coverage and minor fix (#62)
    • Support PyTorch 1.6 in CI (#117)
    • Support with_offset for rawframe dataset (#48)
    • Support json annotation files (#119)
    • Support multi-class in TSMHead (#104)
    • Support using val_step() to validate data for each val workflow (#123)
    • Use xxInit() method to get total_frames and make total_frames a required key (#90)
    • Add paper introduction in model readme (#140)
    • Adjust the directory structure of tools/ and rename some scripts files (#142)
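    The with_offset option above shifts which file indices a rawframe clip reads. A sketch of the idea (helper name is hypothetical; `img_{:05}.jpg` is the conventional 1-based template):

    ```python
    def rawframe_paths(frame_dir, total_frames, offset=0,
                       filename_tmpl='img_{:05}.jpg'):
        """Frame i of the clip maps to file index offset + i (1-based)."""
        return [f'{frame_dir}/{filename_tmpl.format(offset + i)}'
                for i in range(1, total_frames + 1)]

    print(rawframe_paths('video_1', 3, offset=120)[0])  # video_1/img_00121.jpg
    ```

    This lets an annotation point a clip at a sub-range of an untrimmed video's extracted frames instead of always starting at frame 1.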

    Bug Fixes

    • Fix configs for localization test (#67)
    • Fix SlowOnly configs by setting the learning rate to match 8 GPUs (#136)
    • Fix the bug in analyze_log (#54)
    • Fix the bug of generating HMDB51 class index file (#69)
    • Fix the bug of using load_checkpoint() in ResNet (#93)
    • Fix the bug of --work-dir when using slurm training script (#110)
    • Correct the sthv1/sthv2 rawframes filelist generation command (#71)
    • Fix a CosineAnnealing typo (#47)
  • v0.5.0(Jul 21, 2020)
