[ICCV 2021 Oral] Deep Evidential Action Recognition

Wentao Bao

Last update: Jan 3, 2023

Related tags

Deep Learning model-calibration uncertainty-quantification action-recognition video-understanding debiasing evidential-deep-learning ood-detection openset-recognition


DEAR (Deep Evidential Action Recognition)

Project | Paper & Supp

Wentao Bao, Qi Yu, Yu Kong

International Conference on Computer Vision (ICCV Oral), 2021.

Introduction
Installation
Datasets
Testing
Training
Model Zoo
Citation

Introduction

We propose the Deep Evidential Action Recognition (DEAR) method to recognize actions in an open world. Specifically, we formulate action recognition from the evidential deep learning (EDL) perspective and propose a novel model calibration method to regularize EDL training. In addition, to mitigate the static bias of video representations, we propose a plug-and-play module that debiases the learned representation through contrastive learning. Trained on the UCF-101 dataset, DEAR achieves significant and consistent performance gains across multiple action recognition models (I3D, TSM, SlowFast, and TPN), with the HMDB-51 or MiT-v2 dataset as the unknown.
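To make the EDL perspective concrete, the sketch below shows how an EDL-style head can turn per-class evidence into expected class probabilities and a single uncertainty score. It follows the usual EDL convention (Dirichlet parameters alpha_k = e_k + 1, uncertainty u = K / S). The function name `edl_outputs` and the example evidence values are illustrative assumptions, not code from this repo.

```python
from typing import List, Tuple


def edl_outputs(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative per-class evidence to (expected probabilities, uncertainty).

    Usual EDL convention (assumed here):
        alpha_k = e_k + 1, S = sum_k alpha_k, p_k = alpha_k / S, u = K / S
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alpha = [e + 1.0 for e in evidence]       # Dirichlet parameters
    strength = sum(alpha)                      # Dirichlet strength S
    probs = [a / strength for a in alpha]      # expected class probabilities
    uncertainty = len(evidence) / strength     # u = K / S, in (0, 1]
    return probs, uncertainty


# Strong evidence for one class: low uncertainty (behaves like a known action).
probs_known, u_known = edl_outputs([40.0, 1.0, 0.0, 0.0])
# Almost no evidence for any class: uncertainty near 1 (behaves like an unknown).
probs_unknown, u_unknown = edl_outputs([0.2, 0.1, 0.0, 0.1])
print(round(u_known, 3), round(u_unknown, 3))
```

In this toy example the second sample receives a much higher uncertainty than the first, which is the signal an open set recognizer can threshold to reject unknown actions.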

Demo

The following figures show the inference results by the SlowFast + DEAR model trained on UCF-101 dataset.

UCF-101 (Known)
HMDB-51 (Unknown)

Installation

This repo is built on the MMAction2 codebase. Because MMAction2 is updated at a fast pace, the requirements and installation steps below follow MMAction2 v0.9.0.

Requirements and Dependencies

Here we list only the requirements and dependencies we used. You are welcome to try the latest versions of the listed software and hardware with the latest MMAction2 codebase.

Linux: Ubuntu 18.04 LTS
GPU: GeForce RTX 3090, A100-SXM4
CUDA: 11.0
GCC: 7.5
Python: 3.7.9
Anaconda: 4.9.2
PyTorch: 1.7.1+cu110
TorchVision: 0.8.2+cu110
OpenCV: 4.4.0
MMCV: 1.2.1
MMAction2: 0.9.0

Installation Steps

The following steps are adapted from the MMAction2 (v0.9.0) installation document. If you encounter problems, refer to the official document for more details, or raise an issue in this repo.

a. Create a conda virtual environment of this repo, and activate it:

conda create -n mmaction python=3.7 -y
conda activate mmaction

b. Install PyTorch and TorchVision following the official instructions, e.g.,

conda install pytorch=1.7.1 cudatoolkit=11.0 torchvision=0.8.2 -c pytorch

c. Install mmcv. We recommend installing the pre-built mmcv-full package as below.

pip install mmcv-full==1.2.1 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.1/index.html

Important: If you have already installed mmcv and want to install mmcv-full, you must uninstall mmcv first by running pip uninstall mmcv. Otherwise, you will get a ModuleNotFoundError.

d. Clone the source code of this repo:

git clone https://github.com/Cogito2012/DEAR.git mmaction2
cd mmaction2

e. Install build requirements and then install DEAR.

pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"

If no error appears in your installation steps, then you are all set!

Datasets

This repo uses standard video action datasets, i.e., UCF-101 for closed set training, and the HMDB-51 and MiT-v2 test sets as two different unknowns. Please follow the default MMAction2 dataset setup steps to set up these three datasets correctly.

Note: You can skip Step 3 (Extract RGB and Flow) in the referenced setup steps, since none of the code related to our paper relies on extracted frames or optical flow. This will save you a large amount of disk space!

Testing

To test our pre-trained models (see the Model Zoo), download a model file and unzip it under work_dirs. Take the I3D-based DEAR model as an example. First, download the pre-trained I3D-based models; the full DEAR model is saved in the folder finetune_ucf101_i3d_edlnokl_avuc_debias. Place the downloaded files according to the following directory tree.

work_dirs    
├── i3d
│    ├── finetune_ucf101_i3d_bnn
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_dnn
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_edlnokl
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_edlnokl_avuc_ced
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_edlnokl_avuc_debias
│    │   └── latest.pth
│    └── finetune_ucf101_i3d_rpl
│        └── latest.pth
├── slowfast
├── tpn_slowonly
└── tsm

a. Closed Set Evaluation.

Top-K accuracy and mean class accuracy will be reported.

cd experiments/i3d
bash evaluate_i3d_edlnokl_avuc_debias_ucf101.sh

b. Get Uncertainty Threshold.

The threshold value of one model will be reported.

cd experiments/i3d
# run the thresholding with BATCH_SIZE=2 on GPU_ID=0
bash run_get_threshold.sh 0 edlnokl_avuc_debias 2
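
One common way to pick such a threshold is to take a high percentile of the uncertainties the model assigns to its own training samples, so that most known samples fall below it. The sketch below illustrates that idea only. The coverage value, the function name `pick_threshold`, and the sample numbers are assumptions, and `run_get_threshold.sh` may compute the threshold differently.

```python
import math
from typing import Sequence


def pick_threshold(train_uncertainties: Sequence[float], coverage: float = 0.95) -> float:
    """Return the smallest observed uncertainty that covers `coverage` of the samples.

    Hypothetical helper: samples with uncertainty above the returned value
    would be treated as unknown.
    """
    if not train_uncertainties:
        raise ValueError("need at least one uncertainty value")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(train_uncertainties)
    # Position of the last sample that still lies inside the covered fraction.
    index = max(0, math.ceil(coverage * len(ordered)) - 1)
    return ordered[index]


# Toy uncertainties for ten training samples (made-up numbers).
uncertainties = [0.02, 0.05, 0.03, 0.10, 0.04, 0.08, 0.06, 0.01, 0.07, 0.60]
threshold = pick_threshold(uncertainties, coverage=0.9)

# Flag test samples whose uncertainty exceeds the threshold as unknown.
is_unknown = [u > threshold for u in [0.05, 0.45]]
print(threshold, is_unknown)
```

A higher coverage keeps more known samples below the threshold at the cost of letting more unknown samples pass as known.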

c. Open Set Evaluation and Comparison.

The open set evaluation metrics and openness curves will be reported.

Note: Make sure the threshold value used for each model is the one reported for that model in step b.

cd experiments/i3d
bash run_openness.sh HMDB  # use HMDB-51 test set as the Unknown
bash run_openness.sh MiT  # use MiT-v2 test set as the Unknown
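
For intuition about the Open Set AUC metric, the sketch below computes an AUROC in which samples from the unknown dataset are the positives and the model's uncertainty is the score. It assumes the metric is a known-versus-unknown AUROC and uses a plain pairwise (Mann-Whitney style) computation. The function name `open_set_auc` and the toy scores are assumptions, and this is not the evaluation code called by `run_openness.sh`.

```python
from typing import Sequence


def open_set_auc(known_scores: Sequence[float], unknown_scores: Sequence[float]) -> float:
    """Probability that a random unknown sample has a higher uncertainty
    than a random known sample; ties count as half a win."""
    if not known_scores or not unknown_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for u in unknown_scores:
        for k in known_scores:
            if u > k:
                wins += 1.0
            elif u == k:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


# Toy uncertainty scores (made-up numbers).
known = [0.05, 0.10, 0.20, 0.40]     # e.g. UCF-101 test samples
unknown = [0.30, 0.60, 0.90]         # e.g. HMDB-51 test samples
auc = open_set_auc(known, unknown)
print(round(auc, 4))
```

An AUC of 1.0 means every unknown sample is more uncertain than every known sample, while 0.5 means the uncertainty carries no information about known versus unknown.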

d. Out-of-Distribution Detection.

The uncertainty distribution figure of a specified model will be reported.

cd experiments/i3d
bash run_ood_detection.sh 0 HMDB edlnokl_avuc_debias

e. Draw Open Set Confusion Matrix

The open set confusion matrix for the chosen unknown dataset will be reported.

cd experiments/i3d
bash run_draw_confmat.sh HMDB  # or MiT

Training

Let's still take the I3D-based DEAR model as an example.

cd experiments/i3d
bash finetune_i3d_edlnokl_avuc_debias_ucf101.sh 0

Since model training is time-consuming, we strongly recommend running the above training script in the background if you are connected over SSH.

nohup bash finetune_i3d_edlnokl_avuc_debias_ucf101.sh 0 >train.log 2>&1 &
# monitor the training status from any new terminal
tail -f train.log

To visualize the training curves (losses, accuracies, etc.) in TensorBoard:

cd work_dirs/i3d/finetune_ucf101_i3d_edlnokl_avuc_debias/tf_logs
tensorboard --logdir=./ --port 6008

TensorBoard will then print the URL http://localhost:6008. Open this address in your web browser (such as Chrome) to monitor the training status.

If you are connected over SSH to a remote server without a monitor, you can view TensorBoard on your local machine by forwarding the port manually:

ssh -L 16008:localhost:6008 {your_remote_name}@{your_remote_ip}

Then open http://localhost:16008 in your local browser to monitor TensorBoard through port 16008.

Model Zoo

The pre-trained weights (checkpoints) are available below.

Model	Checkpoint	Train Config	Test Config	Open maF1 (%)	Open Set AUC (%)	Closed Set ACC (%)
I3D + DEAR	ckpt	train	test	77.24 / 69.98	77.08 / 81.54	93.89
TSM + DEAR	ckpt	train	test	84.69 / 70.15	78.65 / 83.92	94.48
TPN + DEAR	ckpt	train	test	81.79 / 71.18	79.23 / 81.80	96.30
SlowFast + DEAR	ckpt	train	test	85.48 / 77.28	82.94 / 86.99	96.48

Checkpoints of the compared baseline models can be downloaded from Google Drive.

Citation

If you find the code useful in your research, please cite:

@inproceedings{BaoICCV2021DEAR,
  author = "Bao, Wentao and Yu, Qi and Kong, Yu",
  title = "Evidential Deep Learning for Open Set Action Recognition",
  booktitle = "International Conference on Computer Vision (ICCV)",
  year = "2021"
}

License

See Apache-2.0 License

Acknowledgement

In addition to the MMAction2 codebase, this repo contains modified code from:

pytorch-classification-uncertainty: for implementation of the EDL (NeurIPS-2018).
ARPL: for implementation of baseline method RPL (ECCV-2020).
OSDN: for implementation of baseline method OpenMax (CVPR-2016).
bayes-by-backprop: for implementation of the baseline method Bayesian Neural Networks (BNNs).
rebias: for implementation of the HSIC regularizer used in ReBias (ICML-2020).

We sincerely thank the owners of all these great repos!

Comments

About evaluation protocols.

Hello.

Thanks for the interesting work.

I am curious about the design choice of the evaluation protocol, in particular the 10 random trials of unknown class selection.

How does this differ from using the whole HMDB-51/MiT-v2 dataset as the open set?

Since training is conducted on the whole UCF-101 dataset, I would expect that using the HMDB-51/MiT-v2 dataset as a whole makes no difference under your evaluation protocol.

Thank you for the great work.

opened by wjun0830 2
ImportError: cannot import name '__version__' from 'mmaction' (unknown location)

Running

cd experiments/i3d
bash finetune_i3d_edlnokl_avuc_debias_ucf101.sh 0

fails with the following error:

Traceback (most recent call last):
  File "tools/train.py", line 16, in <module>
    from mmaction import __version__
ImportError: cannot import name '__version__' from 'mmaction' (unknown location)
Experiments finished!

opened by syjxxxx 2
Question about the evidential loss term in Eq.(1)

Dear authors,

Thanks a lot for the interesting paper and the open-sourced repo.

It is great to see evidential deep learning (EDL) applied to action recognition. I have a question about the EDL loss term, i.e., Eq. (1) in the DEAR paper. In this repo, the EDL loss with the log function is the default choice for running DEAR. Does the digamma variant also work properly? I ran some experiments training with the digamma EDL loss on the CIFAR100 dataset (based on the implementation in this repo) and found that the loss decreases slowly and with difficulty, and the model is not well optimized. Have you observed the same behavior?

I look forward to your response.

Best, Haiming

opened by HeimingX 2
Computation of HSIC

Hi,

Thank you for your great work. While going through the code, I noticed that here you intend to zero out only the diagonal elements, which is consistent with the original paper "Feature Selection via Dependence Maximization". However, the torch.diag function used here behaves differently depending on whether its input is a vector or a matrix. Since it receives a matrix here, its output is a vector containing the diagonal elements, so kernel_XX - torch.diag(kernel_XX) subtracts each diagonal element from every element in the corresponding column. The fix would be to change this line to kernel_XX - torch.diag(torch.diag(kernel_XX)). Could you please confirm this?
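
The broadcasting behavior described in this comment can be reproduced without PyTorch. The sketch below emulates it on nested lists: `diag_of_matrix` mimics what torch.diag returns for a 2-D input, `diag_from_vector` mimics what it returns for a 1-D input, and `subtract_row_vector` mimics how a 1-D tensor broadcasts across the rows of a matrix. All helper names and the 3x3 example are illustrative assumptions; this is not the repo's HSIC code.

```python
from typing import List

Matrix = List[List[float]]


def diag_of_matrix(m: Matrix) -> List[float]:
    """Emulates torch.diag on a 2-D input: returns the diagonal as a vector."""
    return [m[i][i] for i in range(len(m))]


def diag_from_vector(v: List[float]) -> Matrix:
    """Emulates torch.diag on a 1-D input: builds a diagonal matrix."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def subtract_row_vector(m: Matrix, v: List[float]) -> Matrix:
    """Emulates `matrix - vector` broadcasting: v[j] is subtracted from column j."""
    return [[row[j] - v[j] for j in range(len(row))] for row in m]


def subtract_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise difference of two equally sized matrices."""
    return [[a[i][j] - b[i][j] for j in range(len(a[i]))] for i in range(len(a))]


kernel = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
d = diag_of_matrix(kernel)

# Analogue of kernel_XX - torch.diag(kernel_XX): every column is shifted.
broadcast_result = subtract_row_vector(kernel, d)
# Analogue of kernel_XX - torch.diag(torch.diag(kernel_XX)): only the diagonal is zeroed.
fixed_result = subtract_matrix(kernel, diag_from_vector(d))

print(broadcast_result)
print(fixed_result)
```

In the first result the off-diagonal entries change as well, whereas the second result leaves them untouched and zeroes only the diagonal, which matches the intent stated in the comment.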

opened by Pirazh 1
Evidential Uncertainty Calibration

Very nice paper. I've been facing similar issues as the ones reported in this figure when using evidential uncertainty quantification:

In your work, you discuss we could add an Accuracy vs. Uncertainty loss function that looks like this:

You proposed a new version of it, as shown in the equation below:

I could find the implementation of the equation above here:

https://github.com/Cogito2012/DEAR/blob/2a64f6a4be878a52046f043b50311af7316d3c33/mmaction/models/losses/edl_loss.py#L113-L144

It is unclear to me when to use the disentangle case. Could you please provide me with any insights about when to use one over the other?

Thanks :)

opened by muammar 2
Report an error

I get the following error: FileNotFoundError: [Errno 2] No such file or directory: 'experiments/i3d/results/I3D_EDLNoKLAvUCDebias_EDL_trainset_uncertainties.npz' Experiments finished! Is something wrong, or did I miss a step?

opened by syjxxxx 17
