Learning Spatio-Temporal Transformer for Visual Tracking

Multimedia Research

Last update: Dec 29, 2022

Overview

STARK

The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking

Hiring research interns for visual transformer projects: [email protected]

Highlights

End-to-End, Post-processing Free

STARK is an end-to-end tracking approach that directly predicts a single accurate bounding box as the tracking result.
In addition, STARK does not use any hyperparameter-sensitive post-processing, which leads to stable performance.

Real-Time Speed

STARK-ST50 and STARK-ST101 run at 40 FPS and 30 FPS, respectively, on a Tesla V100 GPU.

Strong performance

Tracker	LaSOT (AUC)	GOT-10K (AO)	TrackingNet (AUC)
STARK	67.1	68.8	82.0
TransT	64.9	67.1	81.4
TrDiMP	63.7	67.1	78.4
Siam R-CNN	64.8	64.9	81.2

Purely PyTorch-based Code

STARK is implemented purely in PyTorch.

Install the environment

Option 1: Use Anaconda

conda create -n stark python=3.6
conda activate stark
bash install.sh

Option 2: Use the Docker file

We provide the complete Docker setup here

Data Preparation

Put the tracking datasets in ./data. It should look like:

${STARK_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- images
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
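
Before setting the paths, it can help to confirm that the folders really match the layout above. The following is a minimal sketch, not part of the STARK code base: the helper name find_missing_dirs and the EXPECTED_LAYOUT table are hypothetical, and only the directories named explicitly in the layout are checked (the elided "..." entries are ignored).

```python
from pathlib import Path

# Expected sub-directories, copied from the layout above. Only entries that the
# layout names explicitly are listed; the elided ones ("...") are not checked.
EXPECTED_LAYOUT = {
    "lasot": ["airplane", "basketball", "bear"],
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "images"],
    "trackingnet": ["TRAIN_0", "TRAIN_1", "TRAIN_11", "TEST"],
}


def find_missing_dirs(data_dir):
    """Return the expected directories (relative paths) absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        for sub in subdirs:
            if not (root / dataset / sub).is_dir():
                missing.append(str(Path(dataset) / sub))
    return missing


if __name__ == "__main__":
    # "./data" is the location used in this README; adjust it if yours differs.
    for rel_path in find_missing_dirs("./data"):
        print("missing:", rel_path)
```

If the script prints nothing, every directory named in the layout was found.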

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing

Train STARK

Training with multiple GPUs using DDP

# STARK-S50
python tracking/train.py --script stark_s --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-S50
# STARK-ST50
python tracking/train.py --script stark_st1 --config baseline --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST50 Stage1
python tracking/train.py --script stark_st2 --config baseline --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline  # STARK-ST50 Stage2
# STARK-ST101
python tracking/train.py --script stark_st1 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8  # STARK-ST101 Stage1
python tracking/train.py --script stark_st2 --config baseline_R101 --save_dir . --mode multiple --nproc_per_node 8 --script_prv stark_st1 --config_prv baseline_R101  # STARK-ST101 Stage2

(Optional) Debug training with a single GPU

python tracking/train.py --script stark_s --config baseline --save_dir . --mode single

Test and evaluate STARK on benchmarks

LaSOT

python tracking/test.py stark_st baseline --dataset lasot --threads 32
python tracking/analysis_results.py # need to modify tracker configs and names

GOT10K-test

python tracking/test.py stark_st baseline_got10k_only --dataset got10k_test --threads 32
python lib/test/utils/transform_got10k.py --tracker_name stark_st --cfg_name baseline_got10k_only

TrackingNet

python tracking/test.py stark_st baseline --dataset trackingnet --threads 32
python lib/test/utils/transform_trackingnet.py --tracker_name stark_st --cfg_name baseline

VOT2020

Before evaluating "STARK+AR" on VOT2020, please install the extra packages described in external/AR/README.md.

cd external/vot20/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh

VOT2020-LT

cd external/vot20_lt/<workspace_dir>
export PYTHONPATH=<path to the stark project>:$PYTHONPATH
bash exp.sh

Test FLOPs, Params, and Speed

# Profiling STARK-S50 model
python tracking/profile_model.py --script stark_s --config baseline
# Profiling STARK-ST50 model
python tracking/profile_model.py --script stark_st2 --config baseline
# Profiling STARK-ST101 model
python tracking/profile_model.py --script stark_st2 --config baseline_R101

Model Zoo

The trained models, the training logs, and the raw tracking results are provided in the model zoo

Acknowledgments

Thanks for the great PyTracking Library, which helps us to quickly implement our ideas.
We use the implementation of the DETR from the official repo https://github.com/facebookresearch/detr.

Comments

DataLoader randomly crashes

Hi.

I found that the training process randomly crashes with RuntimeError: DataLoader worker (pid(s) 36469) exited unexpectedly. Is that normal?

I use the following training command.

python tracking/train.py --script stark_s --config baseline_got10k_only --save_dir . --mode multiple --nproc_per_node 8

thanks!
opened by memoiry 7

A problem about loading checkpoint

When I train the 'st' model, I found that net_type is 'STARKS' but checkpoint_dict['net_type'] is 'LittleBoy_clean_corner', so the check assert net_type == checkpoint_dict['net_type'], 'Network is not of correct type.' always fails.

How can I solve this problem? Thanks!

opened by 1071189147 5

CUDA 10.2 and RTX 3060 do not match

Run: python tracking/video_demo.py stark_s baseline test_video/demo.mp4

With CUDA 10.2:

NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

With CUDA 11.0:

WARNING: You are using tensorboardX instead sis you have a too old pytorch version.
Traceback (most recent call last):
  File "tracking/../lib/train/admin/tensorboard.py", line 4, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
    _load_global_deps()
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tracking/video_demo.py", line 9, in <module>
    from lib.test.evaluation import Tracker
  File "tracking/../lib/test/evaluation/__init__.py", line 1, in <module>
    from .data import Sequence
  File "tracking/../lib/test/evaluation/data.py", line 3, in <module>
    from lib.train.data.image_loader import imread_indexed
  File "tracking/../lib/train/__init__.py", line 1, in <module>
    from .admin.multigpu import MultiGPU
  File "tracking/../lib/train/admin/__init__.py", line 3, in <module>
    from .tensorboard import TensorboardWriter
  File "tracking/../lib/train/admin/tensorboard.py", line 7, in <module>
    from tensorboardX import SummaryWriter
ModuleNotFoundError: No module named 'tensorboardX'

But when I installed tensorboardX:

WARNING: You are using tensorboardX instead sis you have a too old pytorch version.
Traceback (most recent call last):
  File "tracking/../lib/train/admin/tensorboard.py", line 4, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
    _load_global_deps()
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tracking/video_demo.py", line 9, in <module>
    from lib.test.evaluation import Tracker
  File "tracking/../lib/test/evaluation/__init__.py", line 1, in <module>
    from .data import Sequence
  File "tracking/../lib/test/evaluation/data.py", line 3, in <module>
    from lib.train.data.image_loader import imread_indexed
  File "tracking/../lib/train/__init__.py", line 1, in <module>
    from .admin.multigpu import MultiGPU
  File "tracking/../lib/train/admin/__init__.py", line 3, in <module>
    from .tensorboard import TensorboardWriter
  File "tracking/../lib/train/admin/tensorboard.py", line 7, in <module>
    from tensorboardX import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/__init__.py", line 5, in <module>
    from .torchvis import TorchVis
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/torchvis.py", line 11, in <module>
    from .writer import SummaryWriter
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/tensorboardX/writer.py", line 34, in <module>
    import torch
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
    _load_global_deps()
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/richard/miniconda3/envs/torch1.7/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/richard/miniconda3/envs/torch1.7/lib/python3.6/site-packages/torch/lib/../../../../libcublas.so.11: symbol free_gemm_select version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

opened by Richard-mei 4
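
The first warning in this report lists the architectures the installed PyTorch build was compiled for, and none of them belongs to the sm_8x family of the RTX 3060. The sketch below reproduces that comparison in plain Python. It is a simplified approximation of the compatibility rule (same major version, equal or lower minor version), not PyTorch's own check, and the function names are hypothetical.

```python
def parse_sm_archs(arch_list):
    """Extract (major, minor) pairs from entries such as 'sm_75'."""
    archs = []
    for entry in arch_list:
        kind, _, digits = entry.partition("_")
        # Entries like 'compute_37' denote PTX rather than binary code; skip them.
        if kind == "sm" and digits.isdigit():
            value = int(digits)
            archs.append((value // 10, value % 10))
    return archs


def is_capability_covered(capability, arch_list):
    """Simplified rule: covered if the build ships binary code for the same
    major version with an equal or lower minor version."""
    major, minor = capability
    return any(m == major and n <= minor for m, n in parse_sm_archs(arch_list))


# Architecture list copied from the warning message above.
ARCH_LIST = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37".split()

print(is_capability_covered((8, 6), ARCH_LIST))  # RTX 3060 Laptop GPU (sm_86): False
print(is_capability_covered((7, 5), ARCH_LIST))  # an sm_75 GPU: True
```

On a machine where PyTorch imports successfully, the same two inputs can be read from torch.cuda.get_device_capability() and torch.cuda.get_arch_list(). A build that lists sm_86 is typically one compiled against CUDA 11.1 or newer.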

About GOT-10k test set results

Hi, thanks for your wonderful work. I notice that Transformer Tracking uses the model trained on all datasets (LaSOT, GOT10K, COCO, TrackingNet) to get its evaluation result on the GOT-10k test set, and that result is much better than the one from the model trained on GOT10K only.

However, when I use the STARK-S50 pre-trained model (trained on all datasets) from your model zoo to evaluate on the GOT-10k test set, I find that the AO is 0.688, which is only a small improvement over 0.672.

I am confused by this. Have you ever tried to evaluate the model trained on all datasets on the GOT-10k test set? Or can you kindly explain why using the model trained on all datasets brings so little performance gain?

opened by botaoye 4

How to analyze the model on the GOT10k-val dataset?

Thanks for your work! I trained the model and want to evaluate it on the GOT10k-val dataset to see its performance, but I only see the 'LaSOT', 'otb', 'nfs', 'uav', and 'tc128ce' datasets. How can I evaluate on GOT10k-val? By the way, what is the difference between the analysis_results and analysis_results_ITP files?

opened by 3bobo 3

Training process not utilizing a dynamically updated template

It seems that STARK does not mention a dynamically updated template (DUT for short) in the training procedure. Is this a deliberate design, or am I missing something?

I reckon that the DUT is effectively a short-term memory, and the transformer should not treat it the same as the normal template from the first frame, so the DUT should be explicitly included in training. However, this is not how STARK has been implemented.

So I'm curious: what is the intuition or reasoning behind STARK's current training protocol, which leaves out the DUT?

opened by luowyang 3

Why not use sequential input for the data?

Hi, thanks for your work. I see in your code that "shuffle = True" is set when the dataloaders are created. If the input data is not sequential, how is the template updated every 200 frames? Thanks!

opened by ANdong-star 3

ModuleNotFoundError: No module named 'lib'

Hi,

I ran the VOT evaluation (Python version) several times but still get this error: from lib.test.vot20.stark_vot20 import run_vot_exp ModuleNotFoundError: No module named 'lib'

It seems that the STARK project path is not found, even though I exported it with: export PYTHONPATH=/home/xxxx/projects/transformer/Stark-main:$PYTHONPATH.

Any suggestions on how to solve this would be appreciated.

Thanks!

opened by zhanglichao 2
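
One way to narrow down an error like this is to check, from the same interpreter that raises it, whether any entry on sys.path actually contains a lib directory. The snippet below is a diagnostic sketch only. The helper name roots_providing is hypothetical, and it does not explain why the exported PYTHONPATH might be missing in the process that runs the tracker.

```python
import os
import sys


def roots_providing(package, search_paths):
    """Return the entries of search_paths that contain a directory named package.

    An empty entry stands for the current working directory, as in sys.path.
    """
    return [p for p in search_paths if os.path.isdir(os.path.join(p or ".", package))]


if __name__ == "__main__":
    hits = roots_providing("lib", sys.path)
    if hits:
        print("'lib' can be resolved from:", hits)
    else:
        print("No sys.path entry contains a 'lib' directory; check PYTHONPATH.")
```

The exported path has to be the project root, that is, the directory that contains lib, not the lib directory itself. If the script reports no hit when launched the same way as the evaluation, the variable is probably not reaching that process.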
The meaning of "lmdb" in "self.lasot_lmdb_dir"

Hi! Could you please tell me what "lmdb" means in "self.lasot_lmdb_dir" of class EnvironmentSettings? I guess it is the directory of the LaSOT validation set?

opened by ANdong-star 2

Where is the definition of a parameter in your code?

Hi, thanks for your work! One parameter confuses me: I cannot find the definition of params in class STARK_ST. Could you please tell me where it is? Thanks!

opened by ANdong-star 2

Effect of template choice on transformer

Thanks for sharing! I have some questions about the choice of template. According to the paper, you crop a region 2^2 times the area of the ground-truth bounding box, rather than just the target bounding box resized to a square image. My questions are:

Is the purpose to include more surrounding information? If so, what would be the optimal template size? Also, a factor of 2 would not always include the whole tracked object if the aspect ratio is high.

Because the bounding box is not specified exactly, I assume the transformer has to learn some segmentation capability? For instance, I noticed that if you change the template crop size slightly at inference time (with the output size unchanged), the model performs very poorly. So it seems that some information sensitive to absolute position is learned in this setting. Would passing the exact coordinates into the transformer help in any way?
opened by waterknows 2
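
The aspect-ratio point raised in this question can be checked with a little arithmetic. The sketch below assumes the square crop side is the factor times sqrt(width * height), which is one common way to get a crop with factor^2 times the box area. This thread does not confirm that STARK computes it this way, and the function names are hypothetical.

```python
import math


def square_crop_side(width, height, side_factor):
    """Side of a square crop covering side_factor**2 times the box area
    (assumed formula: side = side_factor * sqrt(width * height), rounded up)."""
    return math.ceil(math.sqrt(width * height) * side_factor)


def box_fits(width, height, side_factor):
    """True if the longer box edge is no larger than the crop side."""
    return max(width, height) <= square_crop_side(width, height, side_factor)


print(square_crop_side(100, 100, 2), box_fits(100, 100, 2))  # 200 True
print(square_crop_side(300, 20, 2), box_fits(300, 20, 2))    # 155 False
```

Under this assumption, and ignoring the rounding, a factor of 2 stops covering the whole box once the aspect ratio exceeds 4, because the longer edge w exceeds 2 * sqrt(w * h) exactly when w / h > 4.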
How to create the data folder with all the datasets in it?

Hi guys, according to the README, I should create a folder called data directly under the STARK root folder. This folder should contain the datasets lasot, got10k, coco, and trackingnet.

How can I add all these datasets to that data folder?

opened by salcanmor 0

Where to download STARKST_ep0500.pth.tar?

Hi guys, I'm trying to run this tracker but it is throwing the error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/salva/submit_STARK_LT-code/checkpoints/train/stark_ref/baseline/STARKST_ep0500.pth.tar'

I cannot find STARKST_ep0500.pth.tar in the links provided in the modelzoo, so, how can I solve this error?

Thanks in advance.

opened by salcanmor 2

The checkpoint link for stark-st1 has expired

Hi! The link to the stark-st1 checkpoint file has expired (https://drive.google.com/file/d/1HswUW0oHKjiTL9xR7d2WNW9QOLE040vS/view?usp=sharing). Could you re-upload the first-stage models (baseline / baseline_got10k_only / baseline_R101 / baseline_R101_got10k_only) and share a new link, or send them to me by email: [email protected]? Thank you very much!

opened by kuaiJL 2

Some questions about AR (Alpha-Refine)

Have you tried using Alpha-Refine for evaluation on datasets such as GOT-10k and TrackingNet? If so, could you provide that part of the code? Thanks.

opened by RelayZ 0

Owner

Multimedia Research

Multimedia Research at Microsoft Research Asia
