Official repo for BMVC2021 paper ASFormer: Transformer for Action Segmentation

Overview

ASFormer: Transformer for Action Segmentation

This repo provides training and inference code for the BMVC 2021 paper: ASFormer: Transformer for Action Segmentation.

Environment

PyTorch == 1.1.0, torchvision == 0.3.0, Python == 3.6, CUDA == 10.1

Reproduce our results

1. Download the dataset data.zip at (https://mega.nz/#!O6wXlSTS!wcEoDT4Ctq5HRq_hV-aWeVF1_JB3cacQBQqOLjCIbc8) or (https://zenodo.org/record/3625992#.Xiv9jGhKhPY). 
2. Unzip the data.zip file to the current folder. There are three datasets in the ./data folder, i.e. ./data/breakfast, ./data/50salads, ./data/gtea
3. Download the pre-trained models at (https://pan.baidu.com/s/1zf-d-7eYqK-IxroBKTxDfg). There are pretrained models for three datasets, i.e. ./models/50salads, ./models/breakfast, ./models/gtea
4. Run python main.py --action=predict --dataset=50salads/gtea/breakfast --split=1/2/3/4/5 to generate predicted results for each split (pick one value for each flag).
5. Run python eval.py --dataset=50salads/gtea/breakfast --split=0/1/2/3/4/5 to evaluate the performance. **NOTE**: split=0 evaluates the average results over all splits, so it must be run after the predictions for every split are complete.
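
Steps 4 and 5 can be scripted. The sketch below only assembles the command lines from the flags shown above and runs them in order, so that the split=0 evaluation always comes last. The helper names are hypothetical, and the per-dataset split counts (5 for 50salads, 4 for gtea and breakfast) are an assumption that is not stated in this README; check them against the split files in ./data before relying on them.

```python
import subprocess
import sys

# Assumed number of cross-validation splits per dataset (not stated in this
# README; verify against the split files shipped in ./data).
NUM_SPLITS = {"50salads": 5, "gtea": 4, "breakfast": 4}


def build_commands(dataset, num_splits):
    """Return the predict command for every split, then the averaging eval."""
    commands = []
    for split in range(1, num_splits + 1):
        commands.append([sys.executable, "main.py", "--action=predict",
                         f"--dataset={dataset}", f"--split={split}"])
    # split=0 averages over all splits, so it has to follow every prediction.
    commands.append([sys.executable, "eval.py",
                     f"--dataset={dataset}", "--split=0"])
    return commands


def run_all(dataset):
    """Run prediction for each split and then the averaged evaluation."""
    for cmd in build_commands(dataset, NUM_SPLITS[dataset]):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling run_all("gtea") from the repository root would then run steps 4 and 5 for that dataset.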

Train your own model

You can also retrain the model yourself with the following command.

python main.py --action=train --dataset=50salads/gtea/breakfast --split=1/2/3/4/5

The training process is very stable in our experiments. It converges quickly and is not sensitive to the number of training epochs.

Demo for using ASFormer as your backbone

In our paper, we replace the original TCN-based backbone model MS-TCN in ASRF with our ASFormer. The new model achieves even higher results on the 50salads dataset than the original ASRF. Code is Here.


If you find our repo useful, please give us a star and cite

@inproceedings{chinayi_ASformer,  
	author={Fangqiu Yi and Hongyu Wen and Tingting Jiang}, 
	booktitle={The British Machine Vision Conference (BMVC)},   
	title={ASFormer: Transformer for Action Segmentation},
	year={2021},  
}

Feel free to raise an issue if you have trouble with our code.

Comments
  • Batch size constraint

    Hello,

    Thank you for your amazing work !

    I was wondering if there is any particular reason for imposing a batch size of 1 in model.py: https://github.com/ChinaYi/ASFormer/blob/89e72d840a3d3eb8f16c270adb5031d648d9fb73/model.py#L138

    In my testing, ASFormer learns fine with bigger batch sizes.

    opened by OliverGuy 6
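  • Question about the attention implementation

    Hello, is the hierarchical attention you describe a form of band attention (as in the figure below), except that the window size grows exponentially with the layer depth? If so, shouldn't the body of the for loop in the following function in model.py be changed to window_mask[:, i, i:i+self.bl] = 1?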

    def construct_window_mask(self):
        window_mask = torch.zeros((1, self.bl, self.bl + 2* (self.bl //2)))
        for i in range(self.bl):
            window_mask[:, :, i:i+self.bl] = 1
        return window_mask.to(device)
    

    image

    opened by ddz16 5
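
The two loop bodies discussed in this issue produce very different masks. As a small illustration, the sketch below builds both for the same block length, using plain Python lists instead of torch tensors and dropping the leading batch dimension. The function names are hypothetical. It only shows what each indexing expression does; which behaviour the authors intend is not settled here.

```python
def window_mask_original(bl):
    """Mirror of the quoted loop: `[:, :, i:i+bl] = 1` writes into every row."""
    width = bl + 2 * (bl // 2)
    mask = [[0] * width for _ in range(bl)]
    for i in range(bl):
        for row in mask:  # all rows receive the same columns
            for j in range(i, min(i + bl, width)):
                row[j] = 1
    return mask


def window_mask_banded(bl):
    """Variant proposed in the issue: `[:, i, i:i+bl] = 1` writes into row i only."""
    width = bl + 2 * (bl // 2)
    mask = [[0] * width for _ in range(bl)]
    for i in range(bl):
        for j in range(i, min(i + bl, width)):
            mask[i][j] = 1
    return mask
```

For bl = 4, the first variant fills every row identically (all columns except the last), while the second gives each row its own window of 4 consecutive columns, shifted by one column per row.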
  • Feature Extraction

    Hi, can you provide more information about the feature extraction? I would like to use this fantastic model on my dataset, but I don't know how to extract the features to feed to the encoder.

    opened by Camillo4eyes 3
  • Results on 50salads do not match Table 5

    Hi, thanks for your work. I was able to train and test the model and achieve performance similar to that reported in the paper when I use both the encoder and the decoder. However, when I don't use the decoder, the results are much worse than those in Table 5 (first row). Do I need to change any settings to get the same performance (especially for Acc)? I notice that without the decoder the Acc drops below 80.

    opened by seyeeet 3
  • Error in evaluation code

    Hi,

    Thanks for sharing the code. I noticed the bg_class in the evaluation code is not properly set.

    The default name of the background class is set to background, which is correct for GTEA but needs to be changed to SIL for Breakfast and to action_start and action_end for 50salads. It seems they were not changed for the results in the paper.

    With the correct class name and the released model, I obtained a lower result:

    |           | F1@10 | F1@25 | F1@50 |
    |-----------|-------|-------|-------|
    | Breakfast | 70.9  | 67.5  | 56.7  |
    | 50salads  | 83.7  | 81.8  | 73.7  |

    opened by ZijiaLewisLu 2
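
To make the point of this issue concrete, here is a minimal sketch of dataset-specific background handling. It is not a copy of eval.py: the function name and the example labels are hypothetical, and the background class names are taken from the issue text above.

```python
# Background labels per dataset, as reported in the issue above.
BG_CLASSES = {
    "gtea": ["background"],
    "breakfast": ["SIL"],
    "50salads": ["action_start", "action_end"],
}


def get_segments(frame_labels, bg_classes):
    """Collapse per-frame labels into (label, start, end) runs, skipping background.

    `end` is exclusive.
    """
    segments = []
    start = 0
    for t in range(1, len(frame_labels) + 1):
        # Close the current run at the end of the sequence or on a label change.
        if t == len(frame_labels) or frame_labels[t] != frame_labels[start]:
            if frame_labels[start] not in bg_classes:
                segments.append((frame_labels[start], start, t))
            start = t
    return segments
```

With a Breakfast-style sequence such as ["SIL", "SIL", "pour_milk", "pour_milk", "SIL"], passing BG_CLASSES["breakfast"] leaves only the pour_milk segment, whereas passing ["background"] keeps the two SIL runs as ordinary segments, which changes segment-level metrics.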
  • Increase the batchsize and the result is hurt

    Hi, thank you for your work. When I try to increase the batch size, the metrics drop a lot. What do you think are the possible reasons?

    the GPU is A100 40g.

    train with default setting, only split 1

    (s1)[83.40807175 81.16591928 72.19730942] 75.934108 83.2241

    just change batch size to 8 lr 0.001 and then train

    (s1)[68.94977169 67.57990868 55.25114155] 63.931922 72.0049

    opened by wlsh1up 2
  • Cannot download the model

    Hi! First, thanks for sharing this repo! I'm struggling to download the model (step 3: Download the pre-trained models at (https://pan.baidu.com/s/1zf-d-7eYqK-IxroBKTxDfg)). The site says that you need to create an account to download the file, but I cannot create an account with a French phone number 😅 Is there any other way to download the pretrained model? Many thanks!

    opened by madmoiselleve 2
  • Long training time

    Hello,

    I am adapting your code for my own dataset, which usually trains relatively fast when using only ASRF, but with your transformer model it takes approximately 10 times longer. Do you see similar behaviour with the 50salads/breakfast/gtea datasets?

    Thank you :)

    opened by Jaakik 2
  • Environment issues

    I installed the environment as you asked: Pytorch == 1.1.0, torchvision == 0.3.0, python == 3.6, CUDA=10.1

    It is certain that the model is loaded because the model size is printed: Model Size: 1130860

    But the problem is:

    Traceback (most recent call last):
      File "main.py", line 99, in <module>
        trainer.predict(model_dir, results_dir, features_path, batch_gen_tst, num_epochs, actions_dict, sample_rate)
      File "/home/cpslabrtx3090/zjb/projects/ASFormer/model.py", line 399, in predict
        self.model.load_state_dict(torch.load(model_dir + "/epoch-" + str(epoch) + ".model"))
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
        return _load(f, map_location, pickle_module, **pickle_load_args)
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
        raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
    RuntimeError: ./models/gtea/split_1/epoch-120.model is a zip archive (did you mean to use torch.jit.load()?)

    Traceback (most recent call last):
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 189, in nti
        n = int(s.strip() or "0", 8)
    ValueError: invalid literal for int() with base 8: 'ld_tenso'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 2299, in next
        tarinfo = self.tarinfo.fromtarfile(self)
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1093, in fromtarfile
        obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1035, in frombuf
        chksum = nti(buf[148:156])
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 191, in nti
        raise InvalidHeaderError("invalid header")
    tarfile.InvalidHeaderError: invalid header

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 556, in _load
        return legacy_load(f)
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 467, in legacy_load
        with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1591, in open
        return func(name, filemode, fileobj, **kwargs)
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1621, in taropen
        return cls(name, mode, fileobj, **kwargs)
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1484, in __init__
        self.firstmember = self.next()
      File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 2311, in next
        raise ReadError(str(e))
    tarfile.ReadError: invalid header

    opened by wolf-bailang 1
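
The "is a zip archive" message is what older PyTorch releases raise when given a checkpoint in the zip-based format that torch.save has written by default since PyTorch 1.6. One plausible reading of this traceback, then, is that the released checkpoints were saved with a newer PyTorch than the 1.1.0 listed under Environment; that is an inference, not something the authors confirm here. The sketch below uses only the standard library to tell which format a checkpoint file is in. The helper name is hypothetical.

```python
import zipfile


def checkpoint_format(path):
    """Return "zip" for the zip-based torch.save format, else "legacy"."""
    return "zip" if zipfile.is_zipfile(path) else "legacy"
```

If a file such as ./models/gtea/split_1/epoch-120.model reports "zip", it should load under PyTorch 1.6 or later. Alternatively, it can be loaded once in a newer PyTorch and saved again with torch.save(..., _use_new_zipfile_serialization=False) so that older releases can read it.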
  • About the randomness of the code

    Hi, thank you for your code: https://github.com/ChinaYi/ASFormer/blob/3940443a7dca336f879c43a42bcf91c5bf7c790f/model.py?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L370 When I change the test interval from 10 to 20, 30, etc., I get different results (such as the training loss) under the same seed. What do you think is the reason? Best regards

    opened by wlsh1up 1
  • The provided models generate lower scores than the paper reported

    Thanks for your nice work. May I confirm one thing? Using your features and pre-trained models (epoch=120), the scores I obtain are lower than those in your BMVC paper for all three datasets. For instance, the edit score and F1@10 on gtea only reach 84.0 and 88.9, which are lower than the 84.6 and 90.1 in your paper. The same holds for the other two datasets, e.g. 50salads: edit=75.7, F1@10=83.4.

    opened by medical-girl 3