Learning Spatio-Temporal Transformer for Visual Tracking

Multimedia Research

Last update: Jan 4, 2023


Overview

STARK

The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking

Highlights

Strongest performance

Tracker	LaSOT (AUC)	GOT-10K (AO)	TrackingNet (AUC)
STARK	67.1	68.8	82.0
TransT	64.9	67.1	81.4
TrDiMP	63.7	67.1	78.4
Siam R-CNN	64.8	64.9	81.2

Real-Time Speed

STARK-ST50 and STARK-ST101 run at 40 FPS and 30 FPS, respectively, on a Tesla V100 GPU.

End-to-End, Post-processing Free

STARK is an end-to-end tracking approach that directly predicts a single accurate bounding box as the tracking result.
In addition, STARK does not use any hyperparameter-sensitive post-processing, which leads to stable performance.

Purely PyTorch-based Code

STARK is implemented purely in PyTorch.

Install the environment

Option 1: Use Anaconda

conda create -n stark python=3.6
conda activate stark
bash install.sh

Option 2: Use the Docker file

We provide a complete Docker setup here

Data Preparation

Put the tracking datasets in ./data. It should look like:

${STARK_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- images
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
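
Before setting the paths, it can help to confirm that the folders match the layout above. The following is a minimal sketch, not part of the STARK code base: the helper name `missing_dataset_dirs` and the `EXPECTED_LAYOUT` table are hypothetical, and the sketch assumes the elided TrackingNet folders run contiguously from TRAIN_0 to TRAIN_11. LaSOT class folders are not checked individually because the listing above is truncated.

```python
from pathlib import Path

# Expected sub-directories, copied from the layout shown above.
# Hypothetical helper data, not part of the STARK repository.
EXPECTED_LAYOUT = {
    "lasot": [],  # one folder per object class; not enumerated here
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "images"],
    # Assumption: TRAIN_0 ... TRAIN_11 are contiguous.
    "trackingnet": [f"TRAIN_{i}" for i in range(12)] + ["TEST"],
}


def missing_dataset_dirs(data_dir):
    """Return the expected directories that do not exist under data_dir."""
    root = Path(data_dir)
    missing = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        dataset_dir = root / dataset
        if not dataset_dir.is_dir():
            # The whole dataset folder is absent; skip its sub-directories.
            missing.append(str(dataset_dir))
            continue
        for sub in subdirs:
            if not (dataset_dir / sub).is_dir():
                missing.append(str(dataset_dir / sub))
    return missing


if __name__ == "__main__":
    for path in missing_dataset_dirs("./data"):
        print("missing:", path)
```

An empty result means every folder named in the layout was found; it says nothing about whether the contents of those folders are complete.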

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing

Train STARK

Training with multiple GPUs using DDP

# STARK-S50
python tracking/train.py --script stark_s --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-S50
# STARK-ST50
python tracking/train.py --script stark_st1 --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST50 Stage1
python tracking/train.py --script stark_st2 --config baseline --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline  # STARK-ST50 Stage2
# STARK-ST101
python tracking/train.py --script stark_st1 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST101 Stage1
python tracking/train.py --script stark_st2 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline_R101  # STARK-ST101 Stage2
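
The STARK-ST commands above come in pairs: stage 2 reuses the stage-1 script and config names through --script_prv and --config_prv. The sketch below assembles those command lines programmatically so the two stages stay consistent. The function names `build_train_cmd` and `stark_st_commands` are hypothetical; only flags that appear in the commands above are used.

```python
import shlex


def build_train_cmd(script, config, nproc_per_node=8, save_dir=".",
                    script_prv=None, config_prv=None):
    """Assemble a tracking/train.py command using only the flags shown above."""
    cmd = [
        "python", "tracking/train.py",
        "--script", script,
        "--config", config,
        "--save_dir", save_dir,
        "--mode", "multiple",
        "--nproc_per_node", str(nproc_per_node),
    ]
    # Stage 2 points back at the stage-1 run via --script_prv / --config_prv.
    if script_prv is not None and config_prv is not None:
        cmd += ["--script_prv", script_prv, "--config_prv", config_prv]
    return cmd


def stark_st_commands(config="baseline"):
    """Return the stage-1 and stage-2 commands for a STARK-ST model, in order."""
    stage1 = build_train_cmd("stark_st1", config)
    stage2 = build_train_cmd("stark_st2", config,
                             script_prv="stark_st1", config_prv=config)
    return [stage1, stage2]


if __name__ == "__main__":
    for cmd in stark_st_commands("baseline_R101"):
        # Printing only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell-quoting problems if a path with spaces is later used for save_dir.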

(Optional) Debug training with a single GPU

python tracking/train.py --script stark_s --config baseline --save_dir . --mode single

Test and evaluate STARK on benchmarks

LaSOT

python tracking/test.py stark_st baseline --dataset lasot --threads 32
python tracking/analysis_results.py # need to modify tracker configs and names

GOT10K-test

python tracking/test.py stark_st baseline_got10k_only --dataset got10k_test --threads 32
python lib/test/utils/transform_got10k.py --tracker_name stark_st --cfg_name baseline_got10k_only

TrackingNet

python tracking/test.py stark_st baseline --dataset trackingnet --threads 32
python lib/test/utils/transform_trackingnet.py --tracker_name stark_st --cfg_name baseline

VOT2020
Before evaluating "STARK+AR" on VOT2020, please install some extra packages following external/AR/README.md

cd external/vot20/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh

VOT2020-LT

cd external/vot20_lt/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh

Test FLOPs, Params, and Speed

# Profiling STARK-S50 model
python tracking/profile_model.py --script stark_s --config baseline
# Profiling STARK-ST50 model
python tracking/profile_model.py --script stark_st2 --config baseline
# Profiling STARK-ST101 model
python tracking/profile_model.py --script stark_st2 --config baseline_R101

Model Zoo

The trained models, the training logs, and the raw tracking results are provided in the model zoo

Acknowledgments

Thanks for the great PyTracking Library, which helps us to quickly implement our ideas.
We use the implementation of the DETR from the official repo https://github.com/facebookresearch/detr.

Comments

Dataloader randomly crashes
Hi.

I found that the training process randomly crashes with RuntimeError: DataLoader worker (pid(s) 36469) exited unexpectedly. Is that normal?

I used the following training command.

python tracking/train.py --script stark_s --config baseline_got10k_only --save_dir . --mode multiple --nproc_per_node 8

thanks!
opened by memoiry 7
A problem about loading checkpoint

When I train the 'st' model, I find that net_type is 'STARKS' but checkpoint_dict['net_type'] is 'LittleBoy_clean_corner', so the check assert net_type == checkpoint_dict['net_type'], 'Network is not of correct type.' always fails.

How can I solve this problem? Thanks!

opened by 1071189147 5

cuda10.2 and 3060 do not match

run: python tracking/video_demo.py stark_s baseline test_video/demo.mp4

cuda10.2:

NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

cuda11.0:

WARNING: You are using tensorboardX instead sis you have a too old pytorch version.
Traceback (most recent call last):
  File "tracking/../lib/train/admin/tensorboard.py", line 4, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
    _load_global_deps()
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tracking/video_demo.py", line 9, in <module>
    from lib.test.evaluation import Tracker
  File "tracking/../lib/test/evaluation/__init__.py", line 1, in <module>
    from .data import Sequence
  File "tracking/../lib/test/evaluation/data.py", line 3, in <module>
    from lib.train.data.image_loader import imread_indexed
  File "tracking/../lib/train/__init__.py", line 1, in <module>
    from .admin.multigpu import MultiGPU
  File "tracking/../lib/train/admin/__init__.py", line 3, in <module>
    from .tensorboard import TensorboardWriter
  File "tracking/../lib/train/admin/tensorboard.py", line 7, in <module>
    from tensorboardX import SummaryWriter
ModuleNotFoundError: No module named 'tensorboardX'

But when I installed tensorboardX:

WARNING: You are using tensorboardX instead sis you have a too old pytorch version.
Traceback (most recent call last):
  File "tracking/../lib/train/admin/tensorboard.py", line 4, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
    _load_global_deps()
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tracking/video_demo.py", line 9, in <module>
    from lib.test.evaluation import Tracker
  File "tracking/../lib/test/evaluation/__init__.py", line 1, in <module>
    from .data import Sequence
  File "tracking/../lib/test/evaluation/data.py", line 3, in <module>
    from lib.train.data.image_loader import imread_indexed
  File "tracking/../lib/train/__init__.py", line 1, in <module>
    from .admin.multigpu import MultiGPU
  File "tracking/../lib/train/admin/__init__.py", line 3, in <module>
    from .tensorboard import TensorboardWriter
  File "tracking/../lib/train/admin/tensorboard.py", line 7, in <module>
    from tensorboardX import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/__init__.py", line 5, in <module>
    from .torchvis import TorchVis
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/torchvis.py", line 11, in <module>
    from .writer import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/writer.py", line 34, in <module>
    import torch
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
    _load_global_deps()
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

opened by Richard-mei 4
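
The first warning in the report above says that the installed PyTorch build lists sm_37 through sm_75, while the GPU reports sm_86. The sketch below shows one simplified way to check such a mismatch. It assumes the usual CUDA binary-compatibility rule (a binary built for compute capability X.y runs on devices X.z with z >= y) and ignores PTX entries such as compute_37, so it is an approximation rather than PyTorch's own check. The helper names are hypothetical.

```python
import re


def parse_sm(arch):
    """Turn "sm_75" into (7, 5); return None for other entries such as "compute_37"."""
    match = re.fullmatch(r"sm_(\d+)(\d)", arch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_compatible_binary(device_capability, arch_list):
    """Simplified check: same major version and a minor version not above the device's."""
    dev_major, dev_minor = device_capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue  # PTX targets are ignored in this sketch
        major, minor = parsed
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Architecture list copied from the warning in the report above.
archs_from_warning = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37".split()
print(has_compatible_binary((8, 6), archs_from_warning))              # False
print(has_compatible_binary((8, 6), archs_from_warning + ["sm_80"]))  # True
```

With PyTorch installed, the two inputs can be read from torch.cuda.get_device_capability() and torch.cuda.get_arch_list().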

About GOT-10k test set results

Hi, thanks for your wonderful work. I notice that Transformer Tracking uses the model trained on all datasets (LaSOT, GOT10K, COCO, TrackingNet) to get the evaluation result on the GOT-10k test set, and the result is much better than that of the model trained on GOT10K only.

However, when I use the STARK-S50 pre-trained model (trained on all datasets) from your model zoo to evaluate the GOT-10k test set, I find that the AO is 0.688, which is only a small improvement over 0.672.

I am confused by this phenomenon. Have you tried evaluating the model trained on all datasets on the GOT-10k test set? Or could you explain why the model trained on all datasets yields so little performance gain?

opened by botaoye 4
How to analyze the model on the GOT10k-val dataset?

Thanks for your work! I trained the model and want to evaluate it on the GOT10k-val dataset to see its performance, but I only see the 'LaSOT', 'otb', 'nfs', 'uav', and 'tc128ce' datasets. How can I evaluate on GOT10k-val? By the way, what is the difference between the analysis_results and analysis_results_ITP files?

opened by 3bobo 3
Training process not utilizing a dynamically updated template

It seems that STARK doesn't mention anything about a dynamically updated template (DUT for short) during training procedure, is it a deliberate design or am I missing something?

I reckon that the DUT is actually something like a short-term memory, and it should not be treated equally as a normal template from the first frame by the transformer, so the DUT should be explicitly included in training. However, this is not how STARK has been implemented.

So I'm curious what's the intuition or reasoning behind STARK's current training protocol of dismissing the DUT?

opened by luowyang 3
Why not use sequential input for the data?

Hi, thanks for your work. I found in your code that "shuffle = True" is set when creating the dataloaders. If the input data is not sequential, how is the template updated every 200 frames? Thanks!

opened by ANdong-star 3
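
One way to read the question above is that shuffling governs how training samples are drawn, whereas a periodic template update is a frame-by-frame decision made while tracking a single video. The sketch below illustrates such an inference-time rule. It is not taken from the STARK code: the function names are hypothetical, the interval of 200 comes from the question, and the confidence threshold of 0.5 is an assumed value for illustration.

```python
def should_update_template(frame_idx, confidence, interval=200, threshold=0.5):
    """Decide whether to refresh the dynamic template at this frame.

    Hypothetical rule: update only on every `interval`-th frame, and only
    when the tracker's confidence for that frame exceeds `threshold`.
    """
    on_schedule = frame_idx > 0 and frame_idx % interval == 0
    return on_schedule and confidence > threshold


def update_frames(confidences, interval=200, threshold=0.5):
    """Return the frame indices at which the template would be updated."""
    return [
        idx for idx, conf in enumerate(confidences)
        if should_update_template(idx, conf, interval, threshold)
    ]


# A 601-frame video that is confident everywhere except at frame 400.
confidences = [0.9] * 601
confidences[400] = 0.1
print(update_frames(confidences))  # [200, 600]
```

Because the rule depends only on the frame index within one video, it does not require the training dataloader to deliver frames in order.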
ModuleNotFoundError: No module named 'lib'

Hi,

I ran the VOT evaluation (Python version) several times, but I still get the error: from lib.test.vot20.stark_vot20 import run_vot_exp ModuleNotFoundError: No module named 'lib'

It seems that the STARK project path is not found, even though I exported it with: export PYTHONPATH=/home/xxxx/projects/transformer/Stark-main:$PYTHONPATH.

I hope one of your answers can help me solve it.

Thanks!

opened by zhanglichao 2
The meaning of "lmdb" in "self.lasot_lmdb_dir"

Hi! Could you please tell me what "lmdb" means in "self.lasot_lmdb_dir" of class EnvironmentSettings? I guess it is the directory of the LaSOT validation set?

opened by ANdong-star 2
Where is a parameter in your code defined?

Hi, thanks for your work! There is a parameter that confuses me. I cannot find the definition of params in class STARK_ST. Could you please tell me where it is? Thanks!

opened by ANdong-star 2
Effect of template choice on transformer
Thanks for sharing! I have some questions around the choice of template. From the paper you cropped 2^2 times the ground truth bounding box, rather than just the actual target bounding box resized to square image. My questions are:

Is the purpose here to include more surrounding information? If so what would be the optimal template size here? Also a factor of 2 would not always include the whole tracking object if aspect ratio is high.

By not specifying the bounding box exactly, I assume the transformer has to learn some segmentation capability? For instance, I noticed that if you change the template crop size a little during inference (while the output size stays the same), the model gives very poor performance. So it seems that some information sensitive to absolute positions is learned in this setting. Would passing the exact coordinates into the transformer help in any way?
opened by waterknows 2
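
The aspect-ratio concern in the comment above can be checked with a little arithmetic. The sketch below assumes a square crop whose side is 2 * sqrt(w * h), so that its area is 2^2 times the box area as the comment describes; the exact cropping formula and the function names are assumptions for illustration, not taken from the source.

```python
import math


def square_crop_side(box_w, box_h, side_factor=2.0):
    """Side of a square crop whose area is side_factor**2 times the box area."""
    return math.ceil(math.sqrt(box_w * box_h) * side_factor)


def crop_covers_box(box_w, box_h, side_factor=2.0):
    """True if a crop centered on the box contains the whole box."""
    return square_crop_side(box_w, box_h, side_factor) >= max(box_w, box_h)


print(square_crop_side(100, 100), crop_covers_box(100, 100))  # 200 True
print(square_crop_side(400, 20), crop_covers_box(400, 20))    # 179 False
```

Under this assumption, the crop contains the whole box only while the longer side is at most about four times the shorter one, which matches the commenter's observation about elongated targets.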
How to create the data folder with all the datasets on it?

Hi guys, according to the readme, I should create a folder called data, just under the root stark folder. This folder should contain different datasets: lasot, got10k, coco and trackingnet.

How can I add all these datasets to that data folder?

opened by salcanmor 0
Where to download STARKST_ep0500.pth.tar ?

Hi guys, I'm trying to run this tracker but it is throwing the error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/salva/submit_STARK_LT-code/checkpoints/train/stark_ref/baseline/STARKST_ep0500.pth.tar'

I cannot find STARKST_ep0500.pth.tar in the links provided in the modelzoo, so, how can I solve this error?

Thanks in advance.

opened by salcanmor 2
The link to the stark-st1 checkpoint file has expired

Hi! The link to the stark-st1 checkpoint file has expired (https://drive.google.com/file/d/1HswUW0oHKjiTL9xR7d2WNW9QOLE040vS/view?usp=sharing). Could you re-upload the first-stage models (baseline / baseline_got10k_only / baseline_R101 / baseline_R101_got10k_only) and share a new link, or send them to me by email: [email protected]? Thank you very much!

opened by kuaiJL 2
Some questions about AR(Alpha Refine).

Have you tried using Alpha-Refine for evaluation on datasets such as GOT-10K and TrackingNet? If so, could you provide that part of the code? Thanks.

opened by RelayZ 0

Owner

Multimedia Research

Multimedia Research at Microsoft Research Asia

