Temporal-Relational CrossTransformers

Last update: Dec 12, 2022

Overview

Temporal-Relational Cross-Transformers (TRX)

This repo contains code for the method introduced in the paper:

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

We provide two ways to use this method. The first is to incorporate it into your own few-shot video framework to allow direct comparisons against your method using the same codebase. This is recommended, as everyone has different systems, data storage etc. The second is a full train/test framework, which you will need to modify to suit your system.

Use within your own few-shot framework (recommended)

TRX_CNN in model.py contains a TRX with multiple cardinalities (i.e. pairs, triples etc.) and a ResNet backbone. It takes in support set videos, support set labels and query videos. It outputs the distances from each query video to each of the query-specific support set prototypes, which are used as logits. Feed these into the loss from utils.py. An example of how it is constructed with the required arguments, and how it is called (with input dimensions etc.), is in main in model.py.
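To make "multiple cardinalities" concrete, the sketch below enumerates the frame-index tuples for pairs and triples. It is illustrative only: it assumes a tuple is an ordered subset of frame indices and uses a hypothetical 8-frame clip. The helper name frame_tuples is not part of the repo.

```python
from itertools import combinations


def frame_tuples(seq_len, cardinality):
    """Return every ordered index tuple of the given size (indices strictly increasing)."""
    return list(combinations(range(seq_len), cardinality))


# Hypothetical clip length of 8 frames (an assumption for illustration).
pairs = frame_tuples(8, 2)
triples = frame_tuples(8, 3)

print(len(pairs), len(triples))  # 28 56
print(pairs[:3])                 # [(0, 1), (0, 2), (0, 3)]
```

The number of tuples grows quickly with cardinality and clip length, which is one reason memory use depends on the settings you choose.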

You can use it with ResNet18 with 84x84 resolution on one GPU, but we recommend distributing the CNN over multiple GPUs so you can use ResNet50, 224x224 and 5 query videos per class. How you do this will depend on your system, but the function distribute shows how we do it.

Use episodic training. That is, construct a random task from the training dataset at each iteration, as in MAML, prototypical nets etc. Average gradients and backpropagate once every 16 training tasks. You can look at the rest of the code for an example of how this is done.
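As a rough illustration of episodic sampling, the sketch below draws one N-way K-shot task from a toy dictionary of video ids and counts how often an optimizer step would occur when updating once every 16 tasks. The names sample_episode, toy and TASKS_PER_BATCH are hypothetical, and the model and loss calls are left as comments, so this is not the repo's training loop.

```python
import random


def sample_episode(videos_by_class, way, shot, query_per_class, rng):
    """Sample one task: `way` classes, each with disjoint support and query videos."""
    classes = rng.sample(sorted(videos_by_class), way)
    support, query = [], []
    for label, cls in enumerate(classes):
        chosen = rng.sample(videos_by_class[cls], shot + query_per_class)
        support += [(video, label) for video in chosen[:shot]]
        query += [(video, label) for video in chosen[shot:]]
    return support, query


# Hypothetical toy dataset: 10 classes with 12 video ids each.
toy = {f"class{c}": [f"c{c}_v{i}" for i in range(12)] for c in range(10)}
rng = random.Random(0)

TASKS_PER_BATCH = 16  # update the weights once every 16 tasks
steps = 0
for task in range(1, 49):
    support, query = sample_episode(toy, way=5, shot=5, query_per_class=5, rng=rng)
    # A real loop would compute the task loss here, divide it by
    # TASKS_PER_BATCH and call backward() so gradients accumulate.
    if task % TASKS_PER_BATCH == 0:
        steps += 1  # a real loop would step the optimizer and zero the gradients

print(len(support), len(query), steps)  # 25 25 3
```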

Use with our framework

The framework includes the training and testing process, data loader, logging and so on. It is fairly system-specific, in particular the data loader, so we recommend using TRX within your own framework (see above).

Download your chosen dataset, and extract frames to be of the form dataset/class/video/frame-number.jpg (8 digits, zero-padded). To prepare your data, zip the dataset folder with no compression. We did this as our filesystem has a large block size and limited number of individual files, which means one large zip file has to be stored in RAM. If you don't have this limitation (hopefully you won't because it's annoying) then you may prefer to use a different data loading process.
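The layout and the uncompressed zip can be produced with Python's standard zipfile module, as sketched below. This is one possible approach, not the authors' script: the helper names, the dummy class_a/video_1 folders, the choice to start frame numbers at 1, and keeping the top-level dataset folder inside the archive are all assumptions.

```python
import os
import tempfile
import zipfile


def frame_name(index):
    """Frame file name with an 8-digit, zero-padded number."""
    return f"{index:08d}.jpg"


def zip_uncompressed(dataset_dir, zip_path):
    """Archive dataset_dir without compression (ZIP_STORED)."""
    parent = os.path.dirname(os.path.abspath(dataset_dir))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for folder, _, files in os.walk(dataset_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # Store paths relative to the parent, e.g. dataset/class/video/frame.jpg
                zf.write(full, os.path.relpath(full, parent))


# Build a tiny dummy dataset in a temporary directory to show the layout.
with tempfile.TemporaryDirectory() as tmp:
    video_dir = os.path.join(tmp, "dataset", "class_a", "video_1")
    os.makedirs(video_dir)
    for i in range(1, 4):
        with open(os.path.join(video_dir, frame_name(i)), "wb") as f:
            f.write(b"placeholder")  # stands in for real JPEG bytes
    zip_path = os.path.join(tmp, "dataset.zip")
    zip_uncompressed(os.path.join(tmp, "dataset"), zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        stored = all(info.compress_type == zipfile.ZIP_STORED for info in zf.infolist())

print(names[0], stored)  # dataset/class_a/video_1/00000001.jpg True
```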

Put your desired splits (we used https://github.com/ffmpbgrnn/CMN for Kinetics and SSv2) in text files. These should be called trainlistXX.txt and testlistXX.txt, where XX is a 0-padded number, e.g. 01. You can have separate text files for evaluating on the validation set, e.g. trainlist01.txt/testlist01.txt to train on the train set and evaluate on the test set, and trainlist02.txt/testlist02.txt to train on the train set and evaluate on the validation set. The number is passed as a command line argument.
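The naming convention can be expressed as a small helper, sketched here under the assumption that the split number is a plain integer. The function name split_files is hypothetical and not taken from the repo.

```python
def split_files(split):
    """Return the (train, test) list file names for a split number, zero-padded to 2 digits."""
    return f"trainlist{split:02d}.txt", f"testlist{split:02d}.txt"


print(split_files(1))  # ('trainlist01.txt', 'testlist01.txt')
print(split_files(2))  # ('trainlist02.txt', 'testlist02.txt')
```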

Modify the distribute function in model.py. We have 4 x 11GB GPUs, so we split the ResNets over the 4 GPUs and leave the cross-transformer part on GPU 0. The ResNets are always split evenly across all GPUs specified, so you might have to split the cross-transformer part, or have the cross-transformer part on its own GPU.

Modify the command line parser in run.py so it has the correct paths and filenames for the dataset zip and split text files.

Acknowledgements

We based our code on CNAPs (logging, training, evaluation etc.). We use torch_videovision for video transforms. We took inspiration from the image-based CrossTransformer and the Temporal-Relational Network.

Comments

Change num_gpus

Thank you very much for your help on the datasets! Can I run it on two 3090 Ti GPUs (24GB each)? I changed args.num_gpus and args.num_workers to 2 and 1 respectively in run.py, but it failed. Did I miss anything else?

I ran this: python run.py -c checkpoint_dir --query_per_class 4 --shot 5 --way 5 --trans_linear_out_dim 1152 --tasks_per_batch 16 --test_iters 75000 --dataset ssv2 --split 3 -lr 0.001 --method resnet50 --img_size 224

Options: Namespace(checkpoint_dir='checkpoint_dir', dataset='ssv2', debug_loader=False, img_size=224, learning_rate=0.001, method='resnet50', num_gpus=2, num_test_tasks=10000, num_workers=1, opt='sgd',

/trx-main/model.py", line 94, in forward class_k = torch.index_select(mh_support_set_ks, 0, self._extract_class_indices(support_labels, c)) RuntimeError: Input, output and indices must be on the current device

Best wishes, Han

opened by faded-TJU 11

About dataset

Thanks a lot for making your code public. I am trying to replicate your results on SSv2 and HMDB51. This is my first foray into video action recognition, and I ran into some problems while working with the datasets.
1. I got SSv2 from TwentyBN (https://20bn.com/datasets/download), but when I run cat 20bn-something-something-v2-?? | tar zx, it fails with an error:
    localhost dataset]$ cat 20bn-something-something-v2-?? | tar zx
    gzip: stdin: not in gzip format
    tar: Child died with signal 13
    tar: Error is not recoverable: exiting now
2. As you said, "Download your chosen dataset, and extract frames to be of the form dataset/class/video/frame-number.jpg". Does this mean that I should put all the extracted frames into the zip file (args.path = os.path.join(args.scratch, "video_datasets/data/hmdb51.zip")), or should this zip file contain all the videos?
3. How can I decompose the videos into frames? I cannot find how ARN handled this.
Any help would be greatly appreciated.

Best wishes!

opened by faded-TJU 4

How could we use validation set?

Thanks for sharing the code. Could you please tell us how we could use the validation set in the code? The code only reads the train and test lists. I would like to select the best model using the validation set and then evaluate that model on the test list. It would be greatly appreciated.

opened by ycbilge 4

Doubts regarding the code

Thanks a lot for making your code public. I am trying to replicate your results on UCF-101 and HMDB. I have the following queries regarding them and the code in general:

I have a similar GPU configuration to the one you reported on your GitHub (4 GPUs of 11GB each), but I am not able to run the code for either HMDB or UCF-101. When I use resnet34 instead of resnet50, the memory requirements decrease and I am able to run it properly. But does the accuracy get affected by doing this? My command for training is:

python3 run.py -c checkpoint_dir --query_per_class 5 --shot 5 --way 5 --trans_linear_out_dim 1152 --test_iters 75000 --dataset ucf --split 3 -lr 0.001 --img_size 224 --scratch new --num_gpus 4 --method resnet34 -r --print_freq 1 --save_freq 1000

In the script, you have used confidence:

https://github.com/tobyperrett/trx/blob/390c20aa8ed7e309ae4949464fdcf015c4009329/run.py#L225

Why have you used 196.0? I checked the paper but couldn't find the reason behind this. It would be great if you could explain it. What is the reason behind the confidence metric?

A general question: when can I say that the model has started overfitting in the context of few-shot video classification? I know that evaluation is performed on 10000 tasks, but in training the default number of tasks is 100K. Should I stop once I see that the losses have started approaching 0.0?

Thanks for clarifying the validation part.

opened by Anirudh257 2

Performance on Kinetics only reaches 78.5%

I used your code as suggested (replacing the data loader with mine, using random sampling and random flips as augmentation, with an image size of 3x112x112), but the 5-way 5-shot accuracy only reaches 78.5% after 10000 meta-training iterations, and 76.9% after 50000. That is far from 85.9%.

Are there any other important steps in your data loading?

PS: the loss function in the code seems to be wrong (it increases to NaN as training proceeds), so I replaced it with a standard cross-entropy loss.

opened by ShiyeLi 2

Nonstandard Implementation

In the few-shot learning setting, a dataset is split into training, validation and testing splits. All models are trained on the training split, the validation set is used for validation, and the test set is ONLY used for computing the final result (because we do not know the specific distribution of the test set). The common evaluation protocol is to select the best model (highest accuracy) on the val set during training, and to run this model on the test set only once. Some works instead select the model after N episodes, but the hyperparameter N is still chosen on the val set.

However, your implementation does NOT use the val set, and the test set is used repeatedly.

This makes the comparison to previous works unfair.

opened by lovelyczli 1

about query_per_class, query_per_test

Thanks for the code again. When you change the parameters for, let's say, 20-way 1-shot, are you also changing query_per_class and query_per_test? Or should we just change the --way and --shot parameters and leave the others at their defaults?

Changing:

    parser.add_argument("--way", type=int, default=5, help="Way of each task.") -> 20
    parser.add_argument("--shot", type=int, default=5, help="Shots per class.") -> 1

Not changing:

    parser.add_argument("--query_per_class", type=int, default=5, help="Target samples (i.e. queries) per class used for training.")
    parser.add_argument("--query_per_class_test", type=int, default=1, help="Target samples (i.e. queries) per class used for testing.")

Thanks for your help!

opened by ycbilge 1

The number of iterations and the steps to drop learning rate?

Thanks for your published work! When I reproduce your results on the Kinetics-100 dataset (with a total of 100k iterations), I find that the accuracy of the model drops later in training. At what point in training does the model reach its best performance? In addition, can you give me more details about the number of training iterations and the steps at which to drop the learning rate on both the SSV2 and Kinetics-100 datasets? Thank you again for your help!

opened by PHDJieFu 1

Can you please provide the scripts that we should use for training each dataset?

Thanks for sharing your code. Can you please provide the scripts that show the argument values you used for training and testing? It would be greatly appreciated.

opened by seyeeet 1

the number of frames chosen

Hello, I would like to ask what frame rate you used to extract frames from the videos for each of the 4 datasets when building them. Thank you very much!

opened by Liu-arch 12

Could you provide checkpoints?

Hello, thanks for the amazing work! It would be great if you could share the trained checkpoint of the model. Could you provide a google drive link or something like that?

Thanks

opened by andrearosasco 0

What is the rationale behind your statement about CTX in the paper?

Hey there. In your paper, you guys have stated CrossTransformers' potential weakness is that relative spatial information is not encoded. (in Related Work)

What is the rationale behind this? Have you guys done some experiments proving this or…?

opened by buncybunny 0

Performance on UCF101 is lower than reported

Thanks for sharing your code. I used your code to train on UCF101 with the suggested hyper-parameters (i.e., lr=0.001, trans_linear_out_dim=1152, img_size=224, tasks_per_batch=16, num_test_tasks=10000) and the same data loader. However, the 5-way 5-shot accuracy only reaches 94.8%, which is lower than your reported 96.1%. Are there any other tricks you used to improve the performance? Or am I using the wrong hyper-parameters?

opened by Jamine-W 4

About dataset

Hi, I was only able to acquire 92% of the Kinetics-100 dataset from this link (https://github.com/ffmpbgrnn/CMN); the rest of the videos were not available.

Could you please send me the download link for the split?

Much thanks!

opened by Yanfei-Qin 3
