This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV 2021)

Overview

Elaborative Rehearsal for Zero-shot Action Recognition

This is an official implementation of:

Shizhe Chen and Dong Huang, Elaborative Rehearsal for Zero-shot Action Recognition, ICCV 2021. arXiv version

By elaborating a new concept and relating it to known concepts, we reach the dawn of zero-shot action recognition models that are comparable to supervised models trained on few samples.

New SOTA results are also achieved on the standard ZSAR benchmarks (Olympic Sports, HMDB51, UCF101), as well as on the first large-scale ZSAR benchmark, which we propose on the Kinetics database.

Installation

git clone https://github.com/DeLightCMU/ElaborativeRehearsal.git
cd ElaborativeRehearsal
export PYTHONPATH=$(pwd):${PYTHONPATH}

pip install -r requirements.txt

# download pretrained models
bash scripts/download_premodels.sh
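
As a quick sanity check after installation, you can confirm that the deep-learning framework imports and sees your GPUs. This assumes the pinned requirements include PyTorch, which the CUDA_VISIBLE_DEVICES training commands below rely on:

# hypothetical check, not a script shipped with the repo
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"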

Zero-shot Action Recognition (ZSAR)

Extract Video Features

  1. Spatial-temporal features:
bash scripts/extract_tsm_features.sh '0,1,2'
  2. Object features:
bash scripts/extract_object_features.sh '0,1,2'
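
The quoted argument is presumably the comma-separated list of GPU ids to extract over (an assumption from the scripts' usage above, not documented here); on a single-GPU machine the calls would then reduce to:

# assuming '0,1,2' above is a comma-separated list of GPU ids
bash scripts/extract_tsm_features.sh '0'
bash scripts/extract_object_features.sh '0'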

ZSAR Training and Inference

  1. Baselines: DEVISE, ALE, SJE, DEM, ESZSL and GCN. A convenience loop over the five embedding baselines is sketched after this block.
# mtype: devise, ale, sje, dem, eszsl
mtype=devise
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --eval_set tst
# evaluate other splits
ksplit=1
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines_eval_splits.py zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} ${ksplit}

# gcn
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --is_train
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_kgraphs.py zeroshot/configs/zsl_baseline_kgraph_config.yaml --eval_set tst
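
Since the five embedding baselines share one driver and one config naming pattern, training and test-split evaluation can be composed into a single loop (a convenience sketch built from the commands above, not a script shipped with the repo):

# hypothetical loop over the five embedding baselines
for mtype in devise ale sje dem eszsl; do
  CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py \
    zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --is_train
  CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_baselines.py \
    zeroshot/configs/zsl_baseline_${mtype}_config.yaml ${mtype} --eval_set tst
done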
  2. ER-ZSAR and ablations:
# TSM + ED class representation + AttnPool (2nd row in Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_wordembed_config.yaml --is_train --resume_file datasets/Kinetics/zsl220/word.glove42b.th

# TSM + ED class representation + BERT (last row in Table 4(a) and Table 4(b))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_vse.py zeroshot/configs/zsl_vse_config.yaml --is_train

# Obj + ED class representation + BERT + ER Loss (last row in Table 4(c))
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_cptembed.py zeroshot/configs/zsl_cpt_config.yaml --is_train

# ER-ZSAR Full Model
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_ervse.py zeroshot/configs/zsl_ervse_config.yaml --is_train
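
The commands above only launch training. Assuming zsl_ervse.py shares the CLI of the drivers above (an assumption; the flag is confirmed here only for zsl_baselines.py and zsl_kgraphs.py), test-split evaluation of the full model would look like:

# assumption: zsl_ervse.py supports the same --eval_set flag as the drivers above
CUDA_VISIBLE_DEVICES=0 python zeroshot/driver/zsl_ervse.py zeroshot/configs/zsl_ervse_config.yaml --eval_set tst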

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ChenHuang2021ER,
  title={Elaborative Rehearsal for Zero-shot Action Recognition},
  author={Shizhe Chen and Dong Huang},
  booktitle={ICCV},
  year={2021}
}

Acknowledgement

Comments
  • Model checkpoints on Kinetics for testing

    Thanks for your impressive work!

    Recently, I tried to re-implement this work with your official codebase, following the same training/test protocols. However, the results I got on Kinetics were lower than those reported in the paper, as shown in the image attached to the original issue.

    I'm trying to find out why this happened. Could you release the model's checkpoints used for testing on Kinetics? This will help me a lot.

    Looking forward to your reply, thanks!

    opened by Jiaming-Zhou 3
  • About the Kinetics datasets.

    Dear authors,

    Thanks for this amazing work, and we are very interested in it.

    I am a bit confused about how to run the code: bash scripts/extract_tsm_features.sh '0,1,2'

    1. Shall we download the Kinetics dataset ourselves and extract the videos into frames?
    2. Can you please give any hints about 'we obtain 220 new action classes outside of Kinetics-400 after cleaning'?

    We are very grateful for your help!

    Best Wishes.

    opened by haoranD 2
  • Pretrained model link broken

    The link for the pretrained model for the Kinetics dataset seems to be broken. Can you please check it? Also, could you kindly release the pretrained models for the other three datasets as well?

    opened by Atharva-Chandak 2
  • Pretrained tsm model cannot be downloaded

    Thanks for releasing the code. I found that the pretrained TSM model at https://file.lzhu.me/projects/tsm/models/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense.pth is missing at the moment. Please fix it : )

    opened by acewjh 1
  • About releasing the collected EDs publicly

    Thanks for your great work, but why release Elaborative Descriptions only for the Kinetics dataset? We would also like the Elaborative Descriptions for HMDB51, UCF101, and Olympic Sports. Looking forward to your reply.

    opened by zhiyiGao 1
  • [Help] Extracting TSM Kinetics features

    Hello.

    Thanks for the great work and repository.

    However, when I run the shell script to extract Kinetics features, data loading does not start. It seems to be stuck at line 114 in extract_video_features.py ("for batch in test_loader:").

    Besides, when I was installing packages, sci==0.1.7 was not installed properly, and maybe that is the issue.

    Can you help me figure out this issue?

    opened by wjun0830 0
  • Paper details

    Thanks for your impressive work!

    Q1: About the standard contrastive loss in Equation 9. If I am right, the loss is just the cross-entropy loss? Sorry, I am a new learner!

    Q2: About the details in Equation 11. You regard $q_c^n$ as the ground-truth object labels. If I am right, $q_c^n$ is the top-N objects predicted by BiT? And you cannot provide hand-annotated or true labels in the paper. I am very confused between Equation 10 and Equation 11. If I am right, Equation 10 computes the cross-entropy loss of the similarity score at the video level ($x_{vo} + x_{ov}$ with $Z$), but Equation 11 computes it at the object level? I cannot get the idea and the meaning of the subscript $c$ in Equation 11.

    Looking forward to your reply~

    opened by lovelyczli 0
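    For reference, the "standard contrastive loss" asked about in Q1 commonly takes the form of a softmax cross-entropy over similarity scores. A generic sketch in LaTeX, using generic symbols rather than necessarily the paper's exact Equation 9 notation:

    % generic contrastive (InfoNCE-style) loss: cross-entropy over similarities
    % s(v_i, t_j): similarity of video i and class text j; \tau: a temperature
    \mathcal{L} = -\frac{1}{B} \sum_{i=1}^{B}
      \log \frac{\exp\left(s(v_i, t_{y_i})/\tau\right)}
                {\sum_{j=1}^{C} \exp\left(s(v_i, t_j)/\tau\right)}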
  • Questions about Elaborative Descriptions and adapting to the datasets

    Thank you very much for your excellent work; we have some issues that need your answers.

    1. About Elaborative Descriptions: we found some errors in the public EDs, and some descriptions are not very appropriate. Can we manually revise them? In the HMDB51 dataset: “word”: “Pick”, “defn”: “detach and remove (a flower, fruit, or vegetable) from where it is growing.” In the UCF101 dataset: “word”: “TableTennisShot”, “defn”: “put (food) into the mouth and chew and swallow it.”

    2. About adapting to the datasets: the released code targets the proposed Kinetics ZSAR benchmark. How should we obtain the Kinetics ZSAR benchmark? Would you give some instructions on downloading and adapting to the other datasets (Olympic Sports, HMDB51, and UCF101)? We now have the HMDB51 and UCF101 datasets; if we run on them, how should we modify the code? Please point out what needs to be modified.

    Looking forward to your reply.

    opened by zhiyiGao 4