LSTC: Boosting Atomic Action Detection with Long-Short-Term Context

Overview

This repository contains the AVA code for our ACM MM 2021 paper: LSTC: Boosting Atomic Action Detection with Long-Short-Term Context.

Installation

See INSTALL.md for details on installing the codebase, including requirements and environment settings.

Data

For data preparation and setup, LSTC strictly follows the processing of PySlowFast. See DATASET.md for details on preparing the data.

Run the code

We take SlowFast-ResNet50 as an example:

  • train the model
python3 tools/run_net.py --cfg config/AVA/SLOWFAST_32x12_R50_LFB.yaml \
    AVA.FEATURE_BANK_PATH 'path/to/feature/bank/folder' \
    TRAIN.CHECKPOINT_FILE_PATH 'path/to/pretrained/backbone' \
    OUTPUT_DIR 'path/to/output/folder'
  • test the model
python3 tools/run_net.py --cfg config/AVA/SLOWFAST_32x12_R50_LFB.yaml \
    AVA.FEATURE_BANK_PATH 'path/to/feature/bank/folder' \
    OUTPUT_DIR 'path/to/output/folder' \
    TRAIN.ENABLE False \
    TEST.ENABLE True
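
The two invocations above differ only in the KEY VALUE overrides appended after --cfg. As a minimal sketch, assuming you want to launch them from a Python script, the argument list can be assembled as below. The helper name build_run_net_cmd is hypothetical and not part of this codebase; the config keys and placeholder paths are the ones shown above.

```python
import shlex


def build_run_net_cmd(cfg_path, overrides, python="python3", script="tools/run_net.py"):
    """Assemble a run_net.py command as an argument list.

    `overrides` maps config keys (e.g. "TRAIN.ENABLE") to values. Each pair is
    appended after --cfg as KEY VALUE, mirroring the commands above.
    NOTE: this helper is a hypothetical convenience, not part of the repository.
    """
    cmd = [python, script, "--cfg", cfg_path]
    for key, value in overrides.items():
        cmd += [key, str(value)]
    return cmd


CFG = "config/AVA/SLOWFAST_32x12_R50_LFB.yaml"

# Training: feature bank, pretrained backbone and output folder.
train_cmd = build_run_net_cmd(
    CFG,
    {
        "AVA.FEATURE_BANK_PATH": "path/to/feature/bank/folder",
        "TRAIN.CHECKPOINT_FILE_PATH": "path/to/pretrained/backbone",
        "OUTPUT_DIR": "path/to/output/folder",
    },
)

# Testing: disable training and enable testing.
test_cmd = build_run_net_cmd(
    CFG,
    {
        "AVA.FEATURE_BANK_PATH": "path/to/feature/bank/folder",
        "OUTPUT_DIR": "path/to/output/folder",
        "TRAIN.ENABLE": False,
        "TEST.ENABLE": True,
    },
)

# Print shell-quoted commands; pass the lists to subprocess.run(...) to execute.
print(shlex.join(train_cmd))
print(shlex.join(test_cmd))
```

Building the command as a list avoids shell line-continuation pitfalls such as a stray space after a trailing backslash.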

To start DDP training from the command line with torch.distributed.launch, set start_method='cmd' in tools/run_net.py.
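
As a rough illustration of the launcher-based start, the sketch below prefixes the training command with PyTorch's torch.distributed.launch module and its --nproc_per_node flag. The GPU count of 8 and the helper name build_ddp_launch_cmd are assumptions made for the example, and how tools/run_net.py consumes the launcher's arguments depends on the start_method setting described above. Recent PyTorch releases mark torch.distributed.launch as deprecated in favor of torchrun, so the exact launcher may differ with your PyTorch version.

```python
import shlex


def build_ddp_launch_cmd(num_gpus, script_args, python="python3"):
    """Prefix a script invocation with PyTorch's distributed launcher.

    NOTE: hypothetical helper for illustration; `num_gpus` is the number of
    processes started on this node (one per GPU).
    """
    launcher = [python, "-m", "torch.distributed.launch", f"--nproc_per_node={num_gpus}"]
    return launcher + list(script_args)


ddp_cmd = build_ddp_launch_cmd(
    8,  # assumed GPU count on a single node
    [
        "tools/run_net.py",
        "--cfg", "config/AVA/SLOWFAST_32x12_R50_LFB.yaml",
        "AVA.FEATURE_BANK_PATH", "path/to/feature/bank/folder",
        "TRAIN.CHECKPOINT_FILE_PATH", "path/to/pretrained/backbone",
        "OUTPUT_DIR", "path/to/output/folder",
    ],
)

# Print the shell-quoted command line for inspection.
print(shlex.join(ddp_cmd))
```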

Resource

The codebase provides the following resources for fast training and validation.

Pretrained backbone on Kinetics

backbone    dataset       model type   link
ResNet50    Kinetics400   Caffe2       Google Drive / Baidu Disk (Code: y1wl)
ResNet101   Kinetics600   Caffe2       Google Drive / Baidu Disk (Code: slde)

Extracted long term feature bank

backbone    feature bank (LMDB)   dimension
ResNet50    Google Drive          1280
ResNet101   Google Drive          2304

Checkpoint file

backbone    checkpoint                               model type
ResNet50    Google Drive / Baidu Disk (Code: fi0s)   pytorch
ResNet101   Google Drive / Baidu Disk (Code: g63o)   pytorch

Acknowledgement

This codebase is built upon PySlowFast.

Citation

If you find this repository helpful for your research, please cite the following paper:

@InProceedings{Yuxi_2021_ACM,
  author = {Li, Yuxi and Zhang, Boshen and Li, Jian and Wang, Yabiao and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Lin, Weiyao},
  title = {LSTC: Boosting Atomic Action Detection with Long-Short-Term Context},
  booktitle = {ACM Conference on Multimedia},
  month = {October},
  year = {2021}
} 
Comments
  • extract the features of custom data sets

    How can I extract the features of custom data sets? Can you give me some advice? I ran the script with an added yaml configuration file and got an error saying the dimensions could not be concatenated. How did you extract the features at that time? Do you need to resize the images? Can the resized features still be used as a feature bank?

    opened by yan-ctrl 0
  • extract the feature bank of the data set

    Hello, when I want to extract the feature bank of the data set, I encountered a new problem. My yaml configuration file and the error I encountered are like this. Is the pre-training weight wrong? I hope you can give me an answer.

    SLOWFAST_32x2_BANK1.yaml:

        TRAIN:
          ENABLE: False
          DATASET: ava
          BATCH_SIZE: 64
          EVAL_PERIOD: 20
          CHECKPOINT_PERIOD: 1
          AUTO_RESUME: True
          CHECKPOINT_FILE_PATH: "/home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/lstc-resnet50.pyth"
        DATA:
          NUM_FRAMES: 32
          SAMPLING_RATE: 2
          TRAIN_JITTER_SCALES: [256, 320]
          TRAIN_CROP_SIZE: 224
          TEST_CROP_SIZE: 224
          INPUT_CHANNEL_NUM: [3, 3]
        DETECTION:
          ENABLE: True
          ALIGNED: True
        AVA:
          FEATURE_EXTRACTION: True
          FRAME_DIR: '/home/tuxiangone/bang/Behavior_Model/SlowFast/data/ava/frames'
          FRAME_LIST_DIR: '/home/tuxiangone/bang/Behavior_Model/SlowFast/data/ava/frame_lists'
          ANNOTATION_DIR: '/home/tuxiangone/bang/Behavior_Model/SlowFast/data/ava/annotations'
          DETECTION_SCORE_THRESH: 0.8
          TRAIN_PREDICT_BOX_LISTS: [
            "ava_train_v2.2.csv",
            "ava_train_predicted_boxes.csv",
          ]
          TEST_PREDICT_BOX_LISTS: ["ava_val_predicted_boxes.csv"]
          TEST_GT_BOX_LISTS: ["ava_val_v2.2.csv"]
          FEATURE_BANK_PATH: "output/feature_bank"
          SLIDING_WINDOW_SIZE: 15
          GATHER_BANK: False
        SLOWFAST:
          ALPHA: 4
          BETA_INV: 8
          FUSION_CONV_CHANNEL_RATIO: 2
          FUSION_KERNEL_SZ: 7
        RESNET:
          ZERO_INIT_FINAL_BN: True
          WIDTH_PER_GROUP: 64
          NUM_GROUPS: 1
          DEPTH: 50
          TRANS_FUNC: bottleneck_transform
          STRIDE_1X1: False
          NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
          SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [2, 2]]
          SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [1, 1]]
        NONLOCAL:
          LOCATION: [[[], []], [[], []], [[], []], [[], []]]
          GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
          INSTANTIATION: dot_product
          POOL: [[[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]]]
        BN:
          USE_PRECISE_STATS: False
          FREEZE: False
          NUM_BATCHES_PRECISE: 200
        SOLVER:
          BASE_LR: 0.1
          LR_POLICY: steps_with_relative_lrs
          STEPS: [0, 10, 15, 20]
          LRS: [1, 0.1, 0.01, 0.001]
          MAX_EPOCH: 20
          MOMENTUM: 0.9
          WEIGHT_DECAY: 1e-7
          WARMUP_EPOCHS: 5.0
          WARMUP_START_LR: 0.000125
          OPTIMIZING_METHOD: sgd
        MODEL:
          NUM_CLASSES: 80
          ARCH: slowfast
          MODEL_NAME: BankContext
          LOSS_FUNC: bce
          DROPOUT_RATE: 0.5
          HEAD_ACT: sigmoid
        TEST:
          ENABLE: True
          DATASET: ava
          BATCH_SIZE: 8
          CHECKPOINT_FILE_PATH: "/home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/SLOWFAST_8x8_R50_KINETICS.pkl"
        DATA_LOADER:
          NUM_WORKERS: 2
          PIN_MEMORY: True
        NUM_GPUS: 1
        NUM_SHARDS: 1
        RNG_SEED: 0
        OUTPUT_DIR: "output/raw_bank"
        CACHE:
          ENABLE: True
        LOG_MODEL_INFO: False

    My error:

        [INFO: checkpoint.py: 401]: loading checkpoint from /home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/SLOWFAST_8x8_R50_KINETICS.pkl
        [09/01 16:48:34][INFO] slowfast.utils.checkpoint: 401: loading checkpoint from /home/tuxiangone/bang/Behavior_Model/LSTC/pretrained/SLOWFAST_8x8_R50_KINETICS.pkl
        Traceback (most recent call last):
          File "tools/extract_feature.py", line 175, in <module>
            launch_job(
          File "/home/tuxiangone/bang/Behavior_Model/LSTC/build/lib/slowfast/utils/misc.py", line 307, in launch_job
            func(cfg=cfg)
          File "tools/extract_feature.py", line 144, in extract_feature
            cu.load_test_checkpoint(cfg, model)
          File "/home/tuxiangone/bang/Behavior_Model/LSTC/build/lib/slowfast/utils/checkpoint.py", line 405, in load_test_checkpoint
            load_checkpoint(
          File "/home/tuxiangone/bang/Behavior_Model/LSTC/build/lib/slowfast/utils/checkpoint.py", line 268, in load_checkpoint
            checkpoint = torch.load(f, map_location="cpu")
          File "/home/tuxiangone/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
            return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
          File "/home/tuxiangone/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/serialization.py", line 777, in _legacy_load
            magic_number = pickle_module.load(f, **pickle_load_args)
        UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte

    opened by yan-ctrl 3
  • module name "lmdb"

    Hello, I got an error during training: no module named "lmdb". Could you please send me this script? I carefully read setup.py, which is basically the same as that of SlowFast, and it contains no information about generating the lmdb package, so training cannot run normally. Can you update this library?

    opened by yan-ctrl 0
  • feature bank (LMDB)

    I like your paper very much. Can you tell me how to get the feature bank (LMDB)? I want to apply your work to my own data set. Do you run the script extract_feature.py directly or extract it like LFB?

    opened by yan-ctrl 9