Codebase for "Revisiting spatio-temporal layouts for compositional action recognition" (Oral at BMVC 2021).

Gorjan

Last update: Dec 15, 2022

Related tags

Deep Learning revisiting-spatial-temporal-layouts

Overview

Revisiting spatio-temporal layouts for compositional action recognition

Codebase for "Revisiting spatio-temporal layouts for compositional action recognition" (Oral at BMVC 2021).

New year's arrival (01.01.2022) > Code release date > CVPR 2021 deadline (17.11.2021)...

Comments

Clarification on Models

Am I right in understanding that currently, you only implement the STLT model, without any of the appearance features?

Is there a timeline for when the use of appearance features will be incorporated? I am particularly interested in either the PBF (hoping to use my own appearance features) or CACNF models (if it can be pretrained end-to-end).

Thanks

opened by michael-camilleri 15
Position Embedding
As I was tweaking the code, I found some design choices in the position embeddings which I cannot understand.

For the STLT, it seems that the position embedding vector does not take into consideration the number of sampled frames. Specifically, although it does use the config parameter config.layout_num_frames, this is never set in train.py (lines 87-96) or inference.py (lines 47-54). This means that the position embedding vector is always of size 256 (configs.py line 109). Is there a reason for this? I have changed this in my code, and ensuring that the position embedding is always config.layout_num_frames + 1 - does this make sense?

For CACNF, the position embedding seems to be dependent on the Batch-Size and Image size itself for the summation with the features to work (models.py line 267). If I change the config.spatial_size or the batch size to less than 16, then there is a dimension mismatch. I hacked this by computing some intermediary values in my code but it is somewhat hard-coded all the same based on my data. Is there a reason for this dependency/architecture?
opened by michael-camilleri 12
Mismatch between appearance_num_frames and feature-size

Hi Gorjan

I am running into an issue with setting the number of frames to sample from each video. To put it into context, I need to classify 1s clips at a time, which amount to 25 frames, and hence, cannot sample more than those. The current setup is 32 frames, and I changed appearance_num_frames to 25. However, this may be interfering with the forward_features() method of TransformerResnet. It seems that the ResNet outputs by default 32 sequence length, and not sure if this can be modified.

The error happens in models.py line 267, when it tries to join it with the position embedding. Any idea how I can rectify? am I interpreting the appearance_num_frames correctly to begin with?

opened by michael-camilleri 7
file missing for training stlt

hello, I tried to train the stlt model but got some errors. I first run create_dataset.py to prepare dataset, after that, I canot find the videoid2size.json file, which however is necessary for running the code. How can I get this json file? Thanks a lot ~

opened by patrolli 4
Saving Trained Model

Hi Gorjan

Once trained, how can I save a CACNF model? does setting the save_model_path store also the backbones (including the fine-tuned Resnet) or do I need to store these (and load them) separately using the resnet_model_path and save_backbone_path?

opened by michael-camilleri 3
0- or 1- based indexing

Just checking

The labels for the classifier, are they 1- or 0-based indexed? i.e. is the smallest label 0 or 1? the reason I am asking is that the associated label file seems to start from 1, but when I did the same for my classes, it gave me indexing errors and I had to do everything starting from 0.

opened by michael-camilleri 2
results on ActionGenome dataset
@gorjanradevski Dear author, I downloaded the STLT model on Action Genome Oracle and tried to run the following code, but the map is 58.75, slightly lower than 60.6 in the paper. Can you tell me why? `(dp-env) wn@node01:~/temp/STLT$ python inference.py --test_dataset_path "data/ActionGenome/val_dataset.json" --batch_size 1 --dataset_type "layout" --model_name "stlt" --checkpoint_path "checkpoints/action_genome_gt_stlt.pt" --dataset_name "action_genome" --labels_path "data/ActionGenome/labels.json" --videoid2size_path "data/ActionGenome/videoid2size.json" INFO:root:Preparing dataset... INFO:root:Inference on 1814 INFO:root:Preparing model... INFO:root:================================== INFO:root:The model's configuration is:

Unique categories: 38

Number of classes: 157

Hidden size: 768

Hidden dropout probability: 0.1

Layer normalization epsilon: 1e-12

Number of attention heads: 12

Number of spatial layers: 4

Number of temporal layers: 8

Max number of layout frames: 256

The backbone path is: None

Freezing the backbone: False INFO:root:================================== INFO:root:Starting inference... 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1814/1814 [00:37<00:00, 48.10it/s] INFO:root:================================= INFO:root:The metrics are: INFO:root:map: 58.75 INFO:root:================================= `
opened by NingWang2049 1

Owner

Gorjan

Third year PhD in Machine Learning at KU Leuven. Interested in multi-modal deep learning and spatial reasoning involving vision and language.

GitHub

[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

Self-Supervised Learning of Image Scale and Orientation Estimation (BMVC 2021) This is the official implementation of the paper "Self-Supervised Learn

17 Nov 10, 2022

Official and maintained implementation of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data" [BMVC 2021].

OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data Christoph Reich, Tim Prangemeier, Özdemir Cetin & Heinz Koeppl | Pr

23 Sep 21, 2022

Source code of our BMVC 2021 paper: AniFormer: Data-driven 3D Animation with Transformer

AniFormer This is the PyTorch implementation of our BMVC 2021 paper AniFormer: Data-driven 3D Animation with Transformer. Haoyu Chen, Hao Tang, Nicu S

7 Oct 22, 2021

Cascading Feature Extraction for Fast Point Cloud Registration (BMVC 2021)

Cascading Feature Extraction for Fast Point Cloud Registration This repository contains the source code for the paper [Arxive link comming soon]. Meth

7 May 26, 2022

Pytorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021)

PGpoints Pytorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021) Hyeontae Son, Young Min Kim Pre

9 Jun 6, 2022

Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

16 Jun 30, 2022

This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021.

inverse_attention This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021. Le

5 Jul 8, 2022

This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivariant Continuous Convolution

Trajectory Prediction using Equivariant Continuous Convolution (ECCO) This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivar

45 Jul 22, 2022

Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge.

KAIROS MineRL BASALT Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL B

37 Oct 30, 2022

Spearmint Bayesian optimization codebase

Spearmint Spearmint is a software package to perform Bayesian optimization. The Software is designed to automatically run experiments (thus the code n

Formerly: Harvard Intelligent Probabilistic Systems Group -- Now at Princeton

1.5k Dec 29, 2022

A general 3D Object Detection codebase in PyTorch.

Det3D is the first 3D Object Detection toolbox which provides off the box implementations of many 3D object detection algorithms such as PointPillars, SECOND, PIXOR, etc, as well as state-of-the-art methods on major benchmarks like KITTI(ViP) and nuScenes(CBGS).

1.4k Jan 5, 2023

Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

210 Dec 28, 2022

AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

AOT-GAN for High-Resolution Image Inpainting Arxiv Paper | AOT-GAN: Aggregated Contextual Transformations for High-Resolution Image Inpainting Yanhong

214 Jan 3, 2023

This is the codebase for Diffusion Models Beat GANS on Image Synthesis.

3k Dec 26, 2022

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Decision Transformer Lili Chen*, Kevin Lu*, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas†, and Igor M

1.4k Jan 7, 2023

Codebase for the Summary Loop paper at ACL2020

Summary Loop This repository contains the code for ACL2020 paper: The Summary Loop: Learning to Write Abstractive Summaries Without Examples. Training

Canny Lab @ The University of California, Berkeley

44 Nov 4, 2022

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022

X-modaler is a versatile and high-performance codebase for cross-modal analytics.

X-modaler X-modaler is a versatile and high-performance codebase for cross-modal analytics. This codebase unifies comprehensive high-quality modules i

910 Dec 28, 2022

Codebase for Diffusion Models Beat GANS on Image Synthesis.

128 Dec 2, 2022

Codebase for "Revisiting spatio-temporal layouts for compositional action recognition" (Oral at BMVC 2021).

Related tags

Overview

Revisiting spatio-temporal layouts for compositional action recognition

Comments

Clarification on Models

Position Embedding

Mismatch between appearance_num_frames and feature-size

file missing for training stlt

Saving Trained Model

0- or 1- based indexing

results on ActionGenome dataset

Owner

Gorjan

[BMVC 2021] Official PyTorch Implementation of Self-supervised learning of Image Scale and Orientation Estimation

Official and maintained implementation of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data" [BMVC 2021].

Source code of our BMVC 2021 paper: AniFormer: Data-driven 3D Animation with Transformer

Cascading Feature Extraction for Fast Point Cloud Registration (BMVC 2021)

Pytorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021)

Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021.

This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivariant Continuous Convolution

Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge.

Spearmint Bayesian optimization codebase

A general 3D Object Detection codebase in PyTorch.

Official codebase for Pretrained Transformers as Universal Computation Engines.

AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

This is the codebase for Diffusion Models Beat GANS on Image Synthesis.

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Codebase for the Summary Loop paper at ACL2020

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

X-modaler is a versatile and high-performance codebase for cross-modal analytics.

Codebase for Diffusion Models Beat GANS on Image Synthesis.