Track to Detect and Segment: An Online Multi-Object Tracker (CVPR 2021)
Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan
In CVPR, 2021. [Paper] [Project Page] [Demo (YouTube)]
Many thanks to CenterTrack authors for their great framework!
Installation
Please refer to INSTALL.md for installation instructions.
Run Demo
We reuse the demo script from CenterTrack. Before running the demo, first download our trained models: CrowdHuman model (2D tracking), MOT model (2D tracking), or nuScenes model (3D tracking). Then put the models in TraDeS_ROOT/models/ and cd to TraDeS_ROOT/src/. The demo result will be saved as a video in TraDeS_ROOT/results/.
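As a quick sanity check before launching the demo, a sketch like the following can confirm that the checkpoints are in place. The file names are the ones passed to --load_model in the demo commands of this README; the helper name missing_models and the assumption that the script is run from TraDeS_ROOT/src/ are ours, not part of the repository.

```python
from pathlib import Path

# Checkpoint file names as they appear in the demo commands of this README.
MODEL_FILES = ("mot_half.pth", "crowdhuman.pth", "nuscenes.pth")


def missing_models(models_dir, names=MODEL_FILES):
    """Return the checkpoint names that are not present in models_dir."""
    models_dir = Path(models_dir)
    return [name for name in names if not (models_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: run from TraDeS_ROOT/src/, so the models live in ../models/.
    missing = missing_models("../models")
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

Each demo needs only the checkpoint it loads, so a missing file matters only if the corresponding demo is the one being run.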
2D Tracking Demo
Demo for a video clip from the MOT dataset: Run the demo (using the MOT model):
python demo.py tracking --dataset mot --load_model ../models/mot_half.pth --demo ../videos/mot_mini.mp4 --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.4 --inference --clip_len 3 --trades --save_video --resize_video --input_h 544 --input_w 960
Demo for a video clip which we randomly selected from YouTube: Run the demo (using the CrowdHuman model):
python demo.py tracking --load_model ../models/crowdhuman.pth --num_class 1 --demo ../videos/street_2d.mp4 --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.5 --inference --clip_len 2 --trades --save_video --resize_video --input_h 480 --input_w 864
Demo for your own video or image folder: Please specify the file path after --demo and run (using the CrowdHuman model):
python demo.py tracking --load_model ../models/crowdhuman.pth --num_class 1 --demo $path to your video or image folder$ --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.5 --inference --clip_len 2 --trades --save_video --resize_video --input_h $your_input_h$ --input_w $your_input_w$
(Some notes: (i) For 2D tracking, the models can only track persons, since they are trained only on CrowdHuman or MOT. You may train a model on COCO or your own dataset for multi-category 2D object tracking. (ii) --clip_len is set to 3 for MOT; otherwise, it should be 2. You may refer to our paper for this detail. (iii) The CrowdHuman model generalizes better to real-world scenes than the MOT model. Note that both datasets are under non-commercial licenses. (iv) input_h and input_w must be divisible by 32.)
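For your own video, one way to pick values for --input_h and --input_w that satisfy the divisibility requirement is to round the video size up to the next multiple of 32. This is only a minimal sketch: the helper names are hypothetical, and rounding up (rather than down or to the nearest multiple) is our choice, not something the repository prescribes.

```python
def round_up_to_multiple(value, base=32):
    """Round a positive integer up to the next multiple of base."""
    return ((value + base - 1) // base) * base


def demo_input_size(video_h, video_w, base=32):
    """Suggest (input_h, input_w) that are both divisible by base."""
    return round_up_to_multiple(video_h, base), round_up_to_multiple(video_w, base)


print(demo_input_size(540, 960))  # (544, 960)
print(demo_input_size(480, 854))  # (480, 864)
```

The sizes used in the commands above (544x960, 480x864) are all multiples of 32.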
3D Tracking Demo
Demo for a video clip from the nuScenes dataset: Run the demo (using the nuScenes model):
python demo.py tracking,ddd --dataset nuscenes --load_model ../models/nuscenes.pth --demo ../videos/nuscenes_mini.mp4 --pre_hm --track_thresh 0.1 --inference --clip_len 2 --trades --save_video --resize_video --input_h 448 --input_w 800 --test_focal_length 633
(You need to specify test_focal_length for the monocular 3D tracking demo to convert image coordinates back to 3D. The value 633 is half of a typical focal length (~1266) in the nuScenes dataset at its original resolution of 1600x900. The mini demo video has a resolution of 800x448, so we use half the focal length. You don't need to set test_focal_length when testing on the original nuScenes data.)
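The arithmetic behind the value 633 can be sketched as follows, assuming the focal length in pixels scales linearly with the image width when a video is resized. The function name is hypothetical, and the sketch uses only the width ratio.

```python
def scaled_focal_length(focal_length, original_width, demo_width):
    """Scale a focal length (in pixels) by the horizontal resize ratio."""
    return focal_length * demo_width / original_width


# Values from the note above: ~1266 at 1600x900; the mini demo video is 800 wide.
print(scaled_focal_length(1266, 1600, 800))  # 633.0
```

For a clip resized to a different width, the same ratio would give a starting value for --test_focal_length.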
You can also refer to CenterTrack for the usage of the webcam demo (the code is available in this repo, but we have not tested it yet).
Benchmark Evaluation and Training
Please refer to Data.md for dataset preparation.
2D Object Tracking
MOT17 Val | MOTA↑ | IDF1↑ | IDS↓ |
---|---|---|---|
Our Baseline | 64.8 | 59.5 | 1055 |
CenterTrack | 66.1 | 64.2 | 528 |
TraDeS (ours) | 68.2 | 71.7 | 285 |
Test on MOT17 validation set: Place the MOT model in $TraDeS_ROOT/models/ and run:
sh experiments/mot17_test.sh
Train on MOT17 halftrain set: Place the pretrained model in $TraDeS_ROOT/models/ and run:
sh experiments/mot17_train.sh
3D Object Tracking
nuScenes Val | AMOTA↑ | AMOTP↓ | IDSA↓ |
---|---|---|---|
Our Baseline | 4.3 | 1.65 | 1792 |
CenterTrack | 6.8 | 1.54 | 813 |
TraDeS (ours) | 11.8 | 1.48 | 699 |
Test on nuScenes validation set: Place the nuScenes model in $TraDeS_ROOT/models/. The MOT and nuScenes dataset APIs require conflicting versions, so you need to switch between them. The default installed versions are for the MOT dataset. For experiments on the nuScenes dataset, please run:
sh nuscenes_switch_version.sh
sh experiments/nuScenes_test.sh
To switch back to the API versions for MOT experiments, you can run:
sh mot_switch_version.sh
Train on nuScenes train set: Place the pretrained model in $TraDeS_ROOT/models/ and run:
sh experiments/nuScenes_train.sh
Train on Static Images
Following CenterTrack, we use CrowdHuman to pretrain the 2D object tracking model. Only the training set is used.
sh experiments/crowdhuman.sh
The trained model is available at CrowdHuman model.
Instance Segmentation Tracking
Code will be released later on after we clean it up. Our implementation is based on here.
Citation
If you find it useful in your research, please consider citing our paper as follows:
@inproceedings{Wu2021TraDeS,
title={Track to Detect and Segment: An Online Multi-Object Tracker},
author={Wu, Jialian and Cao, Jiale and Song, Liangchen and Wang, Yu and Yang, Ming and Yuan, Junsong},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2021}}