Paper: Do different tracking tasks require different appearance models?
[ArXiv] (coming soon) [Project Page] (coming soon)
UniTrack is a simple and Unified framework for versatile visual Tracking tasks.
Tracking is an important problem in computer vision, but it has been fragmented into a multitude of different experimental setups. As a consequence, the literature has fragmented too: novel approaches proposed by the community are usually specialized to fit only one specific setup. To understand to what extent this specialization is actually necessary, we present UniTrack, a solution that addresses multiple different tracking tasks within the same framework. All tasks share the same universal appearance model. UniTrack enjoys the following advantages:
- Does NOT need training on a specific tracking task.
- Achieves good performance on existing tracking tasks, and can thus serve as a strong baseline for each task.
- Can be easily adapted to novel tasks with different setups.
- Can serve as an evaluation platform for testing pre-trained representations (e.g. self-supervised models) on tracking tasks.
Tasks & Framework
Tasks
We classify existing tracking tasks along four axes: (1) single or multiple targets; (2) targets specified by the user or by an automatic detector; (3) observation format (bounding box/mask/pose); (4) class-agnostic or class-specific (e.g. humans/vehicles). We mainly experiment on 5 tasks: SOT, VOS, MOT, MOTS, and PoseTrack. Task setups are summarized in the above figure.
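As a rough illustration, the four axes can be encoded as a small data structure. This is a minimal Python sketch: the class, field, and function names are hypothetical, and the per-task values are our reading of the standard benchmark setups (e.g. DAVIS-2017 VOS follows several user-specified objects per video), not values copied from the figure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Targets(Enum):
    SINGLE = "single"
    MULTIPLE = "multiple"


class Specifier(Enum):
    USER = "user"          # targets given by the user, e.g. in the first frame
    DETECTOR = "detector"  # targets proposed by an automatic detector


class Observation(Enum):
    BOX = "bounding box"
    MASK = "mask"
    POSE = "pose"


@dataclass(frozen=True)
class TaskSetup:
    """One tracking task described along the four axes (hypothetical names)."""
    name: str
    targets: Targets
    specifier: Specifier
    observation: Observation
    class_agnostic: bool


# Assumed setups of the five benchmarks: our reading, not copied from the figure.
TASKS: List[TaskSetup] = [
    TaskSetup("SOT", Targets.SINGLE, Specifier.USER, Observation.BOX, True),
    TaskSetup("VOS", Targets.MULTIPLE, Specifier.USER, Observation.MASK, True),
    TaskSetup("MOT", Targets.MULTIPLE, Specifier.DETECTOR, Observation.BOX, False),
    TaskSetup("MOTS", Targets.MULTIPLE, Specifier.DETECTOR, Observation.MASK, False),
    TaskSetup("PoseTrack", Targets.MULTIPLE, Specifier.DETECTOR, Observation.POSE, False),
]


def tasks_by_specifier(spec: Specifier) -> List[str]:
    """Names of the tasks whose targets are specified by `spec`."""
    return [t.name for t in TASKS if t.specifier is spec]


print(tasks_by_specifier(Specifier.USER))  # ['SOT', 'VOS']
```

Grouping by who specifies the targets separates SOT/VOS from MOT/MOTS/PoseTrack, which roughly mirrors the split between propagation and association.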
Appearance model
An appearance model is the only learnable component in UniTrack. It should provide a universal visual representation, and is usually pre-trained on a large-scale dataset in a supervised or unsupervised manner. Typical examples include ImageNet pre-trained ResNets (supervised) and recent self-supervised models such as MoCo and SimCLR (unsupervised).
Propagation and Association
Propagation and association are the two fundamental algorithmic building blocks in UniTrack. Both take features extracted by the appearance model as input. For propagation, we adopt existing methods such as cross correlation, DCF, and mask propagation. For association, we employ a simple algorithm and develop a novel similarity metric to make full use of the appearance model.
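A minimal sketch of the two building blocks, assuming features have already been extracted by the appearance model: plain multi-channel cross correlation for propagation (locate a template in a search feature map), and cosine similarity plus greedy one-to-one matching for association. The cosine metric and the greedy matcher are stand-ins chosen for brevity; they are not UniTrack's novel similarity metric or its actual association algorithm. All function names and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A feature map indexed as [channel][row][col] (toy stand-in for CNN features).
FeatureMap = Sequence[Sequence[Sequence[float]]]


def cross_correlation(template: FeatureMap, search: FeatureMap) -> List[List[float]]:
    """Propagation: slide a C x h x w template over a C x H x W search map
    ('valid' positions only) and return the (H-h+1) x (W-w+1) response map."""
    c, h, w = len(template), len(template[0]), len(template[0][0])
    H, W = len(search[0]), len(search[0][0])
    return [
        [
            sum(template[k][u][v] * search[k][i + u][j + v]
                for k in range(c) for u in range(h) for v in range(w))
            for j in range(W - w + 1)
        ]
        for i in range(H - h + 1)
    ]


def peak(response: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """(row, col) of the strongest response, i.e. the propagated target location."""
    cells = [(i, j) for i in range(len(response)) for j in range(len(response[0]))]
    return max(cells, key=lambda ij: response[ij[0]][ij[1]])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def greedy_associate(tracks: Sequence[Sequence[float]],
                     dets: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Association: one-to-one (track, detection) matches taken greedily in
    order of decreasing similarity; pairs below `threshold` stay unmatched."""
    pairs = sorted(((cosine(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(dets)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < threshold:
            break  # all remaining pairs are even less similar
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return sorted(matches)
```

In a real tracker the template would be the features of the target region in a reference frame, and track embeddings would be updated over time; those parts are omitted here.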
Results
Below we show results of UniTrack with a simple ImageNet pre-trained ResNet-18 as the appearance model. More results (other tasks/datasets, more visualizations) can be found in results.md.
Qualitative results
Single Object Tracking (SOT) on OTB-2015
Video Object Segmentation (VOS) on DAVIS-2017 val split
Multiple Object Tracking (MOT) on MOT-16 test set private detector track (Detections from FairMOT)
Multiple Object Tracking and Segmentation (MOTS) on MOTS challenge test set (Detections from COSTA_st)
Pose Tracking on PoseTrack-2018 val split (Detections from LightTrack)
Quantitative results
Single Object Tracking (SOT) on OTB-2015
Method | SiamFC | SiamRPN | SiamRPN++ | UDT* | UDT+* | LUDT* | LUDT+* | UniTrack_XCorr* | UniTrack_DCF* |
---|---|---|---|---|---|---|---|---|---|
AUC | 58.2 | 63.7 | 69.6 | 59.4 | 63.2 | 60.2 | 63.9 | 55.5 | 61.8 |
* indicates non-supervised methods
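The AUC column refers to the area under the OTB success plot. As a hedged sketch of how it is commonly computed: take the per-frame IoU between predicted and ground-truth boxes, measure the fraction of frames whose IoU exceeds each threshold, and average over thresholds. The 21 evenly spaced thresholds, the strict `>` comparison, and the (x, y, width, height) box format are assumptions about the toolkit convention; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred: Sequence[Box], gt: Sequence[Box], num_thresholds: int = 21) -> float:
    """Mean success rate (in percent) over evenly spaced IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [k / (num_thresholds - 1) for k in range(num_thresholds)]
    # Success rate at threshold t: fraction of frames with IoU strictly above t.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return 100.0 * sum(rates) / len(rates)
```

With the strict comparison, even a perfect tracker scores 100 * 20/21 (about 95.2) under these assumptions, because no IoU exceeds the final threshold of 1.0.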
Video Object Segmentation (VOS) on DAVIS-2017 val split
Method | SiamMask | FeelVOS | STM | Colorization* | TimeCycle* | UVC* | CRW* | VFS* | UniTrack* |
---|---|---|---|---|---|---|---|---|---|
J-mean | 54.3 | 63.7 | 79.2 | 34.6 | 40.1 | 56.7 | 64.8 | 66.5 | 58.4 |
* indicates non-supervised methods
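J-mean is the region similarity J (the IoU between predicted and ground-truth masks) averaged over the evaluated objects. A minimal sketch, assuming binary masks given as nested lists; scoring two empty masks as 1.0 follows our understanding of the DAVIS evaluation convention and should be checked against the official toolkit. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask, non-zero = foreground


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: IoU of two binary masks (1.0 if both are empty)."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter / union if union else 1.0


def j_mean(objects: List[List[Tuple[Mask, Mask]]]) -> float:
    """J averaged over frames for each object, then over objects, in percent.
    `objects[n]` holds the (prediction, ground truth) mask pairs of object n."""
    per_object = [sum(jaccard(p, g) for p, g in frames) / len(frames)
                  for frames in objects]
    return 100.0 * sum(per_object) / len(per_object)
```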
Multiple Object Tracking (MOT) on MOT-16 test set private detector track
Method | POI | DeepSORT-2 | JDE | CTrack | TubeTK | TraDes | CSTrack | FairMOT* | UniTrack* |
---|---|---|---|---|---|---|---|---|---|
IDF-1 | 65.1 | 62.2 | 55.8 | 57.2 | 62.2 | 64.7 | 71.8 | 72.8 | 71.8 |
IDs | 805 | 781 | 1544 | 1897 | 1236 | 1144 | 1071 | 1074 | 683 |
MOTA | 66.1 | 61.4 | 64.4 | 67.6 | 66.9 | 70.1 | 70.7 | 74.9 | 74.7 |
* indicates methods using the same detections
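For reference, the standard definitions behind the MOTA and IDF-1 columns are given below; the IDs row is the total number of identity switches (IDSW). Here $\mathrm{GT}_t$ is the number of ground-truth objects in frame $t$, and IDTP/IDFP/IDFN are identity-level true positives, false positives, and false negatives under the best one-to-one mapping between predicted and ground-truth trajectories. Both scores are reported in percent; higher is better for MOTA and IDF-1, lower is better for IDs.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t},
\qquad
\mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}}
```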
Multiple Object Tracking and Segmentation (MOTS) on MOTS challenge test set
Method | TrackRCNN | SORTS | PointTrack | GMPHD | COSTA_st* | UniTrack* |
---|---|---|---|---|---|---|
IDF-1 | 42.7 | 57.3 | 42.9 | 65.6 | 70.3 | 67.2 |
IDs | 567 | 577 | 868 | 566 | 421 | 622 |
sMOTA | 40.6 | 55.0 | 62.3 | 69.0 | 70.2 | 68.9 |
* indicates methods using the same detections
Pose Tracking on PoseTrack-2018 val split
Method | MDPN | OpenSVAI | Miracle | KeyTrack | LightTrack* | UniTrack* |
---|---|---|---|---|---|---|
IDF-1 | - | - | - | - | 52.2 | 73.2 |
IDs | - | - | - | - | 3024 | 6760 |
sMOTA | 50.6 | 62.4 | 64.0 | 66.6 | 64.8 | 63.5 |
* indicates methods using the same detections
Getting started
Demo
Update log
[2021.6.24]: Started writing docs, please stay tuned!
Acknowledgement
VideoWalk by Allan A. Jabri
SOT code by Zhipeng Zhang