Learning Spatio-Temporal Transformer for Visual Tracking

Overview

STARK

The official implementation of the paper "Learning Spatio-Temporal Transformer for Visual Tracking".

[Figure: STARK framework]

Highlights

The strongest performance

Tracker       LaSOT (AUC)   GOT-10K (AO)   TrackingNet (AUC)
STARK         67.1          68.8           82.0
TransT        64.9          67.1           81.4
TrDiMP        63.7          67.1           78.4
Siam R-CNN    64.8          64.9           81.2
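
For readers unfamiliar with the metrics in the table: AO is based on the mean overlap (IoU) between predicted and ground-truth boxes, and success AUC is the area under the curve of success rate versus overlap threshold. The sketch below is a minimal illustration only, assuming boxes in (x, y, w, h) format and 21 evenly spaced thresholds. The function names are hypothetical, and this is not the official evaluation code of any benchmark.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), an assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Mean IoU over all frames (the idea behind AO)."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(overlaps) / len(overlaps)


def success_auc(preds: Sequence[Box], gts: Sequence[Box],
                num_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    rates = []
    for i in range(num_thresholds):
        t = i / (num_thresholds - 1)
        # A frame counts as a success if its overlap exceeds the threshold
        rates.append(sum(o > t for o in overlaps) / len(overlaps))
    return sum(rates) / len(rates)
```

Both functions return values in [0, 1]; the table reports them as percentages.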

Real-Time Speed

STARK-ST50 and STARK-ST101 run at 40 FPS and 30 FPS, respectively, on a Tesla V100 GPU.

End-to-End, Post-processing Free

STARK is an end-to-end tracking approach that directly predicts one accurate bounding box as the tracking result.
In addition, STARK does not use any hyperparameter-sensitive post-processing, which leads to stable performance.

Purely PyTorch-based Code

STARK is implemented purely in PyTorch.

Install the environment

Option 1: Use Anaconda

conda create -n stark python=3.6
conda activate stark
bash install.sh
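
After installation, a quick sanity check can confirm that the interpreter and key packages are visible inside the new environment. The sketch below is a minimal illustration and not part of the repository: the default package list is an assumption (only PyTorch is confirmed by this README), and the helper name is hypothetical.

```python
import importlib.util
import sys


def check_environment(packages=("torch",)):
    """Report the Python version and which packages can be located."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        # find_spec locates a top-level package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it with the stark environment activated; a False entry means the package is not visible to that interpreter.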

Option 2: Use the Docker file

We provide the complete Docker setup here

Data Preparation

Put the tracking datasets in ./data. It should look like:

${STARK_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- images
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
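
Before setting the paths, it can help to confirm that the data folder matches the layout above. The following is a minimal sketch, not part of the repository: it checks only the directories named explicitly in the layout (entries elided with "..." are not expanded), and the names EXPECTED and missing_dirs are hypothetical.

```python
from pathlib import Path

# Sub-directories named explicitly in the layout; "..." entries are not expanded.
EXPECTED = {
    "lasot": [],
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "images"],
    "trackingnet": ["TEST"],
}


def missing_dirs(data_root):
    """Return expected dataset directories that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for dataset, subdirs in EXPECTED.items():
        for rel in [dataset] + [f"{dataset}/{s}" for s in subdirs]:
            if not (root / rel).is_dir():
                missing.append(rel)
    return missing


if __name__ == "__main__":
    print(missing_dirs("./data"))
```

An empty list means every checked directory is present.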

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing

Train STARK

Training with multiple GPUs using DDP

# STARK-S50
python tracking/train.py --script stark_s --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-S50
# STARK-ST50
python tracking/train.py --script stark_st1 --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST50 Stage1
python tracking/train.py --script stark_st2 --config baseline --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline  # STARK-ST50 Stage2
# STARK-ST101
python tracking/train.py --script stark_st1 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST101 Stage1
python tracking/train.py --script stark_st2 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline_R101  # STARK-ST101 Stage2
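
The two-stage commands above follow one pattern: stage 2 (stark_st2) repeats the stage-1 arguments and adds --script_prv and --config_prv pointing at the stage-1 run. A minimal sketch of that pattern, using only the flags shown above; the helper names are hypothetical and this wrapper is not part of the repository.

```python
def build_train_cmd(script, config, nproc=8, save_dir=".", prev=None):
    """Build the argument list for one tracking/train.py invocation.

    prev is an optional (script, config) pair naming the previous stage.
    """
    cmd = [
        "python", "tracking/train.py",
        "--script", script, "--config", config,
        "--save_dir", save_dir,
        "--mode", "multiple", "--nproc_per_node", str(nproc),
    ]
    if prev is not None:
        cmd += ["--script_prv", prev[0], "--config_prv", prev[1]]
    return cmd


def two_stage_cmds(config, nproc=8):
    """Commands for stage 1 (stark_st1) followed by stage 2 (stark_st2)."""
    return [
        build_train_cmd("stark_st1", config, nproc),
        build_train_cmd("stark_st2", config, nproc, prev=("stark_st1", config)),
    ]


if __name__ == "__main__":
    for cmd in two_stage_cmds("baseline"):
        # Only prints the commands; pass each list to subprocess.run to launch.
        print(" ".join(cmd))
```

Printing the commands first makes it easy to compare them against the lines above before starting a long training run.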

(Optional) Debugging training with a single GPU

python tracking/train.py --script stark_s --config baseline --save_dir . --mode single

Test and evaluate STARK on benchmarks

  • LaSOT
python tracking/test.py stark_st baseline --dataset lasot --threads 32
python tracking/analysis_results.py # need to modify tracker configs and names
  • GOT10K-test
python tracking/test.py stark_st baseline_got10k_only --dataset got10k_test --threads 32
python lib/test/utils/transform_got10k.py --tracker_name stark_st --cfg_name baseline_got10k_only
  • TrackingNet
python tracking/test.py stark_st baseline --dataset trackingnet --threads 32
python lib/test/utils/transform_trackingnet.py --tracker_name stark_st --cfg_name baseline
  • VOT2020
    Before evaluating "STARK+AR" on VOT2020, please install some extra packages following external/AR/README.md
cd external/vot20/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh
  • VOT2020-LT
cd external/vot20_lt/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh
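
The VOT experiments import modules from the project's lib package, which is why the project root is exported on PYTHONPATH above. The sketch below is one way to check this from Python before running exp.sh. It assumes the project root contains a lib directory (as the lib/train and lib/test paths in this README suggest); the helper name is hypothetical.

```python
import os
import sys


def root_is_importable(project_root, search_path=None):
    """True if project_root is on the module search path and contains lib/."""
    search_path = sys.path if search_path is None else search_path
    root = os.path.realpath(project_root)
    # Compare resolved paths so relative entries and symlinks still match
    on_path = any(os.path.realpath(p) == root for p in search_path)
    return on_path and os.path.isdir(os.path.join(root, "lib"))
```

Calling root_is_importable with the path used in the export line, from the same shell session, shows whether the export took effect for that interpreter.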

Test FLOPs, Params, and Speed

# Profiling STARK-S50 model
python tracking/profile_model.py --script stark_s --config baseline
# Profiling STARK-ST50 model
python tracking/profile_model.py --script stark_st2 --config baseline
# Profiling STARK-ST101 model
python tracking/profile_model.py --script stark_st2 --config baseline_R101
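
Speed is reported in frames per second. The sketch below illustrates the general timing pattern (warm-up iterations followed by a timed loop). It is not the logic of tracking/profile_model.py, and the function name and defaults are hypothetical. For GPU models, the timed callable would also need to synchronize the device (for example with torch.cuda.synchronize()) so that queued work is included in the measurement.

```python
import time


def measure_fps(step, warmup=10, iters=100):
    """Time a zero-argument callable and return calls per second."""
    for _ in range(warmup):
        step()  # warm-up runs are excluded from timing
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return iters / elapsed


if __name__ == "__main__":
    fps = measure_fps(lambda: sum(range(10000)))
    print(f"{fps:.1f} calls per second")
```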

Model Zoo

The trained models, the training logs, and the raw tracking results are provided in the model zoo

Acknowledgments

Comments
  • Dataloader randomly crashes

    Hi.

    I found that the training process randomly crashes with RuntimeError: DataLoader worker (pid(s) 36469) exited unexpectedly. Is that normal?

    I use the following training command.

    python tracking/train.py --script stark_s --config baseline_got10k_only --save_dir . --mode multiple --nproc_per_node 8
    

    thanks!

    opened by memoiry 7
  • A problem about loading checkpoint

    When I train the 'st' model, I found that 'net_type' is 'STARKS', but checkpoint_dict['net_type'] is 'LittleBoy_clean_corner', so the check assert net_type == checkpoint_dict['net_type'], 'Network is not of correct type.' always fails.

    How can I solve this problem? Thanks!

    opened by 1071189147 5
  • cuda10.2 and 3060 do not match

    run: python tracking/video_demo.py stark_s baseline test_video/demo.mp4

    cuda10.2:

    NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
    If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
      warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
    

    cuda11.0:

    WARNING: You are using tensorboardX instead sis you have a too old pytorch version.
    Traceback (most recent call last):
      File "tracking/../lib/train/admin/tensorboard.py", line 4, in <module>
        from torch.utils.tensorboard import SummaryWriter
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
        _load_global_deps()
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "tracking/video_demo.py", line 9, in <module>
        from lib.test.evaluation import Tracker
      File "tracking/../lib/test/evaluation/__init__.py", line 1, in <module>
        from .data import Sequence
      File "tracking/../lib/test/evaluation/data.py", line 3, in <module>
        from lib.train.data.image_loader import imread_indexed
      File "tracking/../lib/train/__init__.py", line 1, in <module>
        from .admin.multigpu import MultiGPU
      File "tracking/../lib/train/admin/__init__.py", line 3, in <module>
        from .tensorboard import TensorboardWriter
      File "tracking/../lib/train/admin/tensorboard.py", line 7, in <module>
        from tensorboardX import SummaryWriter
    ModuleNotFoundError: No module named 'tensorboardX'
    

    But when I installed tensorboardX:

    WARNING: You are using tensorboardX instead sis you have a too old pytorch version.
    Traceback (most recent call last):
      File "tracking/../lib/train/admin/tensorboard.py", line 4, in <module>
        from torch.utils.tensorboard import SummaryWriter
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
        _load_global_deps()
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "tracking/video_demo.py", line 9, in <module>
        from lib.test.evaluation import Tracker
      File "tracking/../lib/test/evaluation/__init__.py", line 1, in <module>
        from .data import Sequence
      File "tracking/../lib/test/evaluation/data.py", line 3, in <module>
        from lib.train.data.image_loader import imread_indexed
      File "tracking/../lib/train/__init__.py", line 1, in <module>
        from .admin.multigpu import MultiGPU
      File "tracking/../lib/train/admin/__init__.py", line 3, in <module>
        from .tensorboard import TensorboardWriter
      File "tracking/../lib/train/admin/tensorboard.py", line 7, in <module>
        from tensorboardX import SummaryWriter
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/__init__.py", line 5, in <module>
        from .torchvis import TorchVis
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/torchvis.py", line 11, in <module>
        from .writer import SummaryWriter
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/writer.py", line 34, in <module>
        import torch
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
        _load_global_deps()
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
      File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
    
    opened by Richard-mei 4
  • About GOT-10k test set results

    Hi, thanks for your wonderful work. I notice that Transformer Tracking uses the model trained on all datasets (LaSOT, GOT10K, COCO, TrackingNet) to get its evaluation result on the GOT-10k test set, and the result is much better than that of the model trained on GOT10K only.

    However, when I use the STARK-S50 pre-trained model (trained on all datasets) from your model zoo to evaluate on the GOT-10k test set, I find that the AO is 0.688, which is only a small improvement over 0.672.

    I am confused by this. Have you ever tried evaluating the model trained on all datasets on the GOT-10k test set? Or could you kindly explain why using the model trained on all datasets gives so little performance gain?

    opened by botaoye 4
  • How to analyze the model on the GOT10k-val dataset?

    Thanks for your work! I trained the model and want to evaluate it on the GOT10k-val dataset to see its performance, but I only see the 'LaSOT', 'otb', 'nfs', 'uav', and 'tc128ce' datasets. How can I evaluate on GOT10k-val? By the way, what is the difference between the analysis_results and analysis_results_ITP files?

    opened by 3bobo 3
  • Training process not utilizing a dynamically updated template

    It seems that STARK doesn't mention anything about a dynamically updated template (DUT for short) during the training procedure. Is this a deliberate design, or am I missing something?

    I reckon that the DUT is actually something like a short-term memory, and it should not be treated equally as a normal template from the first frame by the transformer, so the DUT should be explicitly included in training. However, this is not how STARK has been implemented.

    So I'm curious what's the intuition or reasoning behind STARK's current training protocol of dismissing the DUT?

    opened by luowyang 3
  • Why not use sequential input for the data?

    Hi, thanks for your work. I see in your code that "shuffle = True" is set when creating the dataloaders. If the input data is not sequential, how is the template updated every 200 frames? Thanks!

    opened by ANdong-star 3
  • ModuleNotFoundError: No module named 'lib'

    Hi,

    I ran the VOT evaluation (Python version) several times, but I still get this error: from lib.test.vot20.stark_vot20 import run_vot_exp ModuleNotFoundError: No module named 'lib'

    It seems that the STARK project path is not found, even though I exported it with: export PYTHONPATH=/home/xxxx/projects/transformer/Stark-main:$PYTHONPATH.

    Any suggestions on how to solve this would be appreciated.

    Thanks!

    opened by zhanglichao 2
  • The meaning of "lmdb" in "self.lasot_lmdb_dir"

    Hi! Could you please tell me the meaning of "lmdb" in "self.lasot_lmdb_dir" in class EnvironmentSettings? I guess it is the directory of the LaSOT val dataset?

    opened by ANdong-star 2
  • Where is the definition of a parameter in your code?

    Hi, thanks for your work! One parameter confuses me. I can't find the definition of params in class STARK_ST. Could you please tell me where it is? Thanks!

    opened by ANdong-star 2
  • Effect of template choice on transformer

    Thanks for sharing! I have some questions around the choice of template. From the paper you cropped 2^2 times the ground truth bounding box, rather than just the actual target bounding box resized to square image. My questions are:

    1. Is the purpose here to include more surrounding information? If so what would be the optimal template size here? Also a factor of 2 would not always include the whole tracking object if aspect ratio is high.
    2. By not specifying the bounding box exactly I assume the transformer has to learn some segmentation capability? For instance right now I noticed that if you change the template crop size (output size stay the same) a little bit during the inference time, the model would give very poor performance. So it seems that some information sensitive to absolute positions are learned in this setting. Would passing the exact coordinates into the transformer help in any way?
    opened by waterknows 2
  • How to create the data folder with all the datasets in it?

    Hi guys, according to the readme, I should create a folder called data directly under the STARK root folder. This folder should contain the datasets lasot, got10k, coco, and trackingnet.

    How can I add all these datasets to that data folder?

    opened by salcanmor 0
  • Where to download STARKST_ep0500.pth.tar ?

    Hi guys, I'm trying to run this tracker but it is throwing the error:

    FileNotFoundError: [Errno 2] No such file or directory: '/home/salva/submit_STARK_LT-code/checkpoints/train/stark_ref/baseline/STARKST_ep0500.pth.tar'

    I cannot find STARKST_ep0500.pth.tar in the links provided in the modelzoo, so, how can I solve this error?

    Thanks in advance.

    opened by salcanmor 2
  • The link to the stark-st1 checkpoint file has expired

    Hi! The link to the stark-st1 checkpoint file has expired (https://drive.google.com/file/d/1HswUW0oHKjiTL9xR7d2WNW9QOLE040vS/view?usp=sharing). Could you re-upload the first-stage models (baseline / baseline_got10k_only / baseline_R101 / baseline_R101_got10k_only) and share a new link, or send them to me by email? Thank you very much!

    opened by kuaiJL 2
  • Some questions about AR (Alpha-Refine)

    Have you tried using Alpha-Refine for evaluation on datasets such as GOT-10K and TrackingNet? If so, could you provide that part of the code? Thanks.

    opened by RelayZ 0
Owner
Multimedia Research
Multimedia Research at Microsoft Research Asia