This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)

DeLightCMU

Last update: Sep 24, 2022

Overview

Elaborative Rehearsal for Zero-shot Action Recognition

This is an official implementation of:

Shizhe Chen and Dong Huang, Elaborative Rehearsal for Zero-shot Action Recognition, ICCV, 2021. arXiv version

By elaborating a new concept and relating it to known concepts, zero-shot action recognition models start to become comparable to supervised models trained on a few samples.

New state-of-the-art results are also achieved on the standard ZSAR benchmarks (Olympics, HMDB51, UCF101), as well as on the first large-scale ZSAR benchmark, which we propose, on the Kinetics database.

Installation

git clone https://github.com/DeLightCMU/ElaborativeRehearsal.git
cd ElaborativeRehearsal
export PYTHONPATH=$(pwd):${PYTHONPATH}

pip install -r requirements.txt

# download pretrained models
bash scripts/download_premodels.sh

Zero-shot Action Recognition (ZSAR)

Extract Features in Video

spatial-temporal features

bash scripts/extract_tsm_features.sh '0,1,2'

object features

bash scripts/extract_object_features.sh '0,1,2'

ZSAR Training and Inference

Baselines: DEVISE, ALE, SJE, DEM, ESZSL and GCN.

# mtype: devise, ale, sje, dem, eszsl
mtype=devise
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --eval_set tst
# evaluate other splits
ksplit=1
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines_eval_splits.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} ${ksplit}

# gcn
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --eval_set tst
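
The five embedding baselines above differ only in the value of mtype, so they can be run from one loop. The following is a minimal dry-run sketch, not part of the repository: baseline_cmds is a hypothetical helper name, and it only prints the train and test commands, with the script and config paths copied from the commands above, so they can be inspected before anything is launched.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the train and test commands
# for one baseline type. baseline_cmds is a hypothetical helper name;
# the script and config paths are copied from the commands above.
baseline_cmds() {
  m="$1"
  cfg="zeroshot/configs/zsl_baseline_${m}_config.yaml"
  echo "CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py ${cfg} ${m} --is_train"
  echo "CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py ${cfg} ${m} --eval_set tst"
}

# The five mtype values listed in the comment above.
for mtype in devise ale sje dem eszsl; do
  baseline_cmds "${mtype}"
done
```

Once the printed commands look right, piping the output of the loop into bash would run them one after another on GPU 0.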

ER-ZSAR and ablations:

# TSM + ED class representation + AttnPool (2nd row in Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_wordembed_config.yaml --is_train --resume_file datasets/Kinetics/zsl220/word.glove42b.th

# TSM + ED class representation + BERT (last row in Table 4(a) and Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_config.yaml --is_train

# Obj + ED class representation + BERT + ER Loss (last row in Table 4(c))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_cptembed.py zeroshot/configs/zsl_cpt_config.yaml --is_train

# ER-ZSAR Full Model
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_ervse.py zeroshot/configs/zsl_ervse_config.yaml --is_train

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ChenHuang2021ER,
  title={Elaborative Rehearsal for Zero-shot Action Recognition},
  author={Shizhe Chen and Dong Huang},
  booktitle = {ICCV},
  year={2021}
}

Acknowledgement

Comments

model's checkpoints on Kinetics for testing

Thanks for your impressive work!

Recently, I tried to re-implement this work with your official codebase, following the same training/test protocols as in this codebase. However, the results on Kinetics I got were lower than the results reported in the paper, as shown below.

I'm trying to find out why this happened. Could you release the model's checkpoints used for testing on Kinetics? This will help me a lot.

Looking forward to your reply, thanks!

opened by Jiaming-Zhou 3

About the Kinetics datasets.

Dear authors,

Thanks for this amazing work, and we are very interested in it.

I am a bit confused about how to run the code: bash scripts/extract_tsm_features.sh '0,1,2'

Should we download the Kinetics dataset ourselves and extract the videos into images?

Can you please give any hints about 'we obtain 220 new action classes outside of Kinetics-400 after cleaning'?

We are very grateful for your help!

Best Wishes.

opened by haoranD 2

Pretrained model link broken

The link to the pretrained model for the Kinetics dataset seems to be broken. Can you please check that? Also, could you kindly release the pretrained models for the other three datasets as well?

opened by Atharva-Chandak 2

Pretrained tsm model cannot be downloaded

Thanks for releasing the code. I found the pretrained tsm model at [https://file.lzhu.me/projects/tsm/models/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense.pth] is missing at the moment. Please fix it : )

opened by acewjh 1

About releasing the collected EDs publicly

Thanks for your great work. Why are Elaborative Descriptions released only for the Kinetics dataset? We would also like the Elaborative Descriptions for HMDB51, UCF101, and Olympic Sports. Looking forward to your reply.

opened by zhiyiGao 1

[Help] Extracting TSM Kinetics features

Hello.

Thanks for the great work and repository.

However, when I run the shell file to extract kinetics features, dataloading does not start. It seems to be stuck at line 114 in extract_video_features.py ("for batch in test_loader:").

Besides, when I was installing the packages, sci==0.1.7 was not installed properly, and that may be the cause.

Can you help me figure out this issue?

opened by wjun0830 0

Paper details

Thanks for your impressive work!

Q1: About the standard contrastive loss in Equation 9. If I am right, the loss is just the cross-entropy loss? Sorry, I am a new learner!

Q2: About the details in Equation 11. You regard $q_c^n$ as the ground-truth object labels. If I am right, $q_c^n$ is the top N objects predicted by BiT? And hand-annotated or true labels are not provided in the paper. I am very confused about the difference between Equation 10 and Equation 11. If I am right, Equation 10 computes the cross-entropy loss of the similarity scores at the video level ($x_{vo} + x_{ov}$ with Z), but Equation 11 computes it at the object level? I do not understand the idea or the meaning of the subscript c in Equation 11.

Looking forward to your reply.

opened by lovelyczli 0

Questions about Elaborative Description and adapting to the datasets

Thank you very much for your excellent work. We have some questions that need your answer.

1. About Elaborative Description

We found some errors in the public EDs, and some descriptions are not very appropriate. Can we manually revise them?

In HMDB51 dataset: “word”: “Pick” “defn”: “detach and remove (a flower, fruit, or vegetable) from where it is growing.”

In UCF101 dataset: “word”: “TableTennisShot” “defn”: “put (food) into the mouth and chew and swallow it.”

2. About adapting to the datasets

The released code targets the proposed Kinetics ZSAR Benchmark. How can we obtain the Kinetics ZSAR Benchmark? Would you give some instructions for downloading and adapting to the other datasets (Olympic Sports, HMDB51 and UCF101)? We now have the HMDB51 and UCF101 datasets. If we run on HMDB51 and UCF101, how should we modify the code? Please point out what needs to be modified. Looking forward to your reply.

opened by zhiyiGao 4
