FS-QAT: Few-Shot Temporal Action Localization using Query Adaptive Transformer
Accepted as a poster at BMVC 2021
This is the official PyTorch implementation of FS-QAT. Our paper is available on arXiv (2110.10552).
Updates
- (October 2021) We released FS-QAT training and inference code for the ActivityNet dataset.
- (October 2021) FS-QAT was accepted at BMVC 2021.
Abstract
Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setting is not only unnatural (actions are typically captured in untrimmed videos) but also ignores background video segments containing vital contextual cues for foreground action segmentation. In this work, we first propose a new FS-TAL setting that uses untrimmed training videos. Further, a novel FS-TAL model is proposed which maximizes knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously. This is achieved by introducing a query adaptive Transformer in the model. Extensive experiments on two action localization benchmarks demonstrate that our method significantly outperforms all state-of-the-art alternatives in both single-domain and cross-domain scenarios.
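To make the new setting concrete, here is a minimal sketch of what one few-shot episode looks like under the proposed untrimmed FS-TAL protocol: untrimmed support videos of a novel class (with their ground-truth action segments, background snippets kept for context) paired with an untrimmed query video to be localized. All names (`FewShotEpisode`, `support_features`, etc.) are illustrative placeholders, not the repository's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple
import torch


@dataclass
class FewShotEpisode:
    """One K-shot episode for a novel action class (illustrative only)."""
    # Snippet-level features of K untrimmed support videos, each of shape (T_s, D).
    support_features: List[torch.Tensor]
    # Ground-truth (start, end) segments inside each support video;
    # background snippets are kept, since they carry contextual cues.
    support_segments: List[List[Tuple[float, float]]]
    # Snippet-level features of the untrimmed query video, shape (T_q, D);
    # the model must predict action segments here.
    query_features: torch.Tensor


def make_episode(support_feats, support_segs, query_feats) -> FewShotEpisode:
    """Bundle untrimmed support and query videos into a single episode."""
    return FewShotEpisode(support_feats, support_segs, query_feats)
```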
Summary
- First few-shot TAL setting to use untrimmed videos for both support and query
- A unified model accommodates both untrimmed and trimmed videos without any design change
- Instead of meta-learning the entire network, only the Transformer is meta-learned, enabling faster adaptation (see the sketch after this list)
- Intra-class variance is handled through this query adaptation
- Promising performance in cross-domain/cross-dataset settings
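The sketch below illustrates the idea of adapting only the Transformer per episode while the shared video encoder stays fixed, which is why adaptation is fast. The module names, shapes, and the simple inner-loop recipe are assumptions for illustration, not the paper's exact architecture or optimization procedure.

```python
import copy
import torch
import torch.nn as nn


class QueryAdaptiveLocalizer(nn.Module):
    def __init__(self, feat_dim: int = 256, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        # Shared snippet encoder learned on base classes (not updated at adaptation time).
        self.encoder = nn.Linear(feat_dim, feat_dim)
        # Transformer that attends from query snippets to support snippets.
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerDecoder(layer, num_layers=num_layers)
        # Per-snippet foreground score head.
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, query_feats: torch.Tensor, support_feats: torch.Tensor) -> torch.Tensor:
        q = self.encoder(query_feats)            # (B, T_q, D)
        s = self.encoder(support_feats)          # (B, T_s, D)
        fused = self.transformer(tgt=q, memory=s)
        return self.head(fused).squeeze(-1)      # (B, T_q) snippet scores


def adapt_transformer(model, support_feats, support_labels, steps: int = 3, lr: float = 1e-3):
    """Inner-loop adaptation that updates only the Transformer weights."""
    adapted = copy.deepcopy(model)
    params = list(adapted.transformer.parameters())  # only these parameters receive updates
    opt = torch.optim.SGD(params, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        # Self-adaptation on the support video(s) of the novel class.
        scores = adapted(support_feats, support_feats)
        loss = loss_fn(scores, support_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted
```

In this reading, the adapted Transformer is then applied to each untrimmed query video, so the model adapts both to the new class and to the individual query, as described in the abstract.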
Qualitative Performance
Training and Evaluation
We apologize for the messy state of the code.
Refactoring will be done soon (delayed due to CVPR workload).
To Train
```bash
python gtad_train_fs.py
```
To Test
```bash
sh test_fs.sh
```
Citation
If you find this project useful for your research, please use the following BibTeX entry.
```bibtex
@misc{nag2021fewshot,
      title={Few-Shot Temporal Action Localization with Query Adaptive Transformer},
      author={Sauradip Nag and Xiatian Zhu and Tao Xiang},
      year={2021},
      eprint={2110.10552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```