TransTrack: Multiple-Object Tracking with Transformer

License: MIT

Introduction

TransTrack: Multiple-Object Tracking with Transformer

Models

Training data          Training time   Validation MOTA   Download
crowdhuman, mot_half   36h + 1h        65.4              model
crowdhuman             36h             53.8              model
mot_half               8h              61.6              model

Models are also available on Baidu Drive with the extraction code m4iv.

Notes

  • Evaluating the crowdhuman-trained model and the mot-trained model uses different command lines; see Steps.
  • We observe about 1 MOTA of run-to-run noise.
  • If the MOTA of your self-trained model is lower than expected, tuning --track_thresh sometimes gives better performance; see the sketch after this list.
  • The training time is measured on 8 NVIDIA V100 GPUs with a total batch size of 16.
  • We use models pre-trained on ImageNet.
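
The threshold can also be swept automatically. The sketch below is an illustration only: it assumes main_track.py accepts --track_thresh the way demo.py does, and the checkpoint and output paths are placeholders for your own files.

# sweep_track_thresh.py -- hypothetical helper; assumes main_track.py accepts
# --track_thresh (as demo.py does); all paths are placeholders.
import subprocess

for thresh in (0.3, 0.4, 0.5):
    subprocess.run([
        "python3", "main_track.py",
        "--dataset_file", "mot", "--coco_path", "mot",
        "--batch_size", "1", "--eval", "--with_box_refine",
        "--resume", "output/checkpoint.pth",          # placeholder checkpoint
        "--track_thresh", str(thresh),
        "--output_dir", f"output/thresh_{thresh}",
    ], check=True)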

Demo

Installation

The codebase is built on top of Deformable DETR and CenterTrack.

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4
  • Python>=3.7
  • PyTorch>=1.5 and a torchvision build that matches the PyTorch installation; you can install them together from pytorch.org to ensure compatibility
  • OpenCV is optional, needed only for the demo and visualization
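
A minimal sketch to confirm the versions above before building the ops:

# check_env.py -- a minimal environment check for the requirements above.
import sys
import torch
import torchvision

print("python     :", sys.version.split()[0])   # needs >= 3.7
print("pytorch    :", torch.__version__)        # needs >= 1.5
print("torchvision:", torchvision.__version__)
print("cuda       :", torch.version.cuda, "| available:", torch.cuda.is_available())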

Steps

  1. Install and build libs
git clone https://github.com/PeizeSun/TransTrack.git
cd TransTrack
cd models/ops
python setup.py build install
cd ../..
pip install -r requirements.txt
  2. Prepare datasets
mkdir -p crowdhuman/annotations
cp -r /path_to_crowdhuman_dataset/annotations/CrowdHuman_val.json crowdhuman/annotations/CrowdHuman_val.json
cp -r /path_to_crowdhuman_dataset/annotations/CrowdHuman_train.json crowdhuman/annotations/CrowdHuman_train.json
cp -r /path_to_crowdhuman_dataset/CrowdHuman_train crowdhuman/CrowdHuman_train
cp -r /path_to_crowdhuman_dataset/CrowdHuman_val crowdhuman/CrowdHuman_val
mkdir mot
cp -r /path_to_mot_dataset/train mot/train
cp -r /path_to_mot_dataset/test mot/test
python track_tools/convert_mot_to_coco.py

The CrowdHuman dataset is available at CrowdHuman. We provide annotations in JSON format.

The MOT dataset is available at MOT.
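
To verify that track_tools/convert_mot_to_coco.py produced well-formed COCO-style annotations, a check like the following can help; the json filename below is a placeholder for whatever file the converter wrote.

# check_annotations.py -- sanity-check the converted COCO-format annotations.
import json

with open("mot/annotations/train_half.json") as f:   # placeholder filename
    coco = json.load(f)
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")
print("categories:", [c["name"] for c in coco["categories"]])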

  3. Pre-train on CrowdHuman
sh track_exps/crowdhuman_train.sh
python track_tools/crowdhuman_model_to_mot.py

The pre-trained model is available as crowdhuman_final.pth.

  4. Train TransTrack
sh track_exps/crowdhuman_mot_trainhalf.sh
  5. Evaluate TransTrack
sh track_exps/mot_val.sh
sh track_exps/mot_eval.sh
  6. Visualize TransTrack
python track_tools/txt2video.py
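
Roughly, txt2video.py renders per-frame results stored in the standard MOT txt format (frame, id, left, top, width, height, score, ...) onto the frames. A simplified stand-in sketch, with placeholder paths (not the actual script):

# draw_tracks.py -- a simplified stand-in for track_tools/txt2video.py.
# Parses the standard MOT result format (frame, id, left, top, width, height, ...)
# and draws boxes with OpenCV. All paths are placeholders.
from collections import defaultdict
import cv2

per_frame = defaultdict(list)
with open("output/MOT17-02.txt") as f:                 # placeholder result file
    for line in f:
        frame, tid, x, y, w, h = [float(v) for v in line.split(",")[:6]]
        per_frame[int(frame)].append((int(tid), x, y, w, h))

for frame_id, dets in sorted(per_frame.items()):
    img = cv2.imread(f"mot/train/MOT17-02/img1/{frame_id:06d}.jpg")
    for tid, x, y, w, h in dets:
        cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
        cv2.putText(img, str(tid), (int(x), int(y) - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imwrite(f"output/vis/{frame_id:06d}.jpg", img)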

Notes

  • Evaluate the pre-trained CrowdHuman model on MOT:
sh track_exps/det_val.sh
sh track_exps/mot_eval.sh

License

TransTrack is released under the MIT License.

Citing

If you use TransTrack in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{transtrack,
  title   =  {TransTrack: Multiple-Object Tracking with Transformer},
  author  =  {Peize Sun and Yi Jiang and Rufeng Zhang and Enze Xie and Jinkun Cao and Xinting Hu and Tao Kong and Zehuan Yuan and Changhu Wang and Ping Luo},
  journal =  {arXiv preprint arXiv:2012.15460},
  year    =  {2020}
}
Comments
  • KeyError: 'age'

    What caused the following problem during testing? Thanks!

    Traceback (most recent call last):
      File "main_track.py", line 369, in <module>
        main(args)
      File "main_track.py", line 292, in main
        phase='eval', det_val=args.det_val)
      File "/data1/anaconda3/envs/TransTrack/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
        return func(*args, **kwargs)
      File "/data1/PycharmProjects/TransTrack/engine_track.py", line 156, in evaluate
        res_track = tracker.step(results[0])
      File "/data1/PycharmProjects/TransTrack/models/tracker.py", line 123, in step
        if track['age'] < self.max_age:
    KeyError: 'age'
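
    One defensive workaround, sketched below, is to treat a track without an 'age' field as newly created; this is an assumption about models/tracker.py, not the official fix.

    # In models/tracker.py -- defensive sketch, not the official fix:
    # a track missing the 'age' field is treated as newly created (age 0).
    if track.get('age', 0) < self.max_age:
        ...  # keep the track alive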

    opened by davidyang180 13
  • Errors when testing on a custom video that has a long sequence without people appearing

    Hi, when I run mot_test.sh on my own video, it gives this error:

    Test:  [ 6120/17958]  eta: 0:36:03  class_error: 100.00  loss: 0.0854 (3.0055)  loss_ce: 0.0187 (0.1138)  loss_bbox: 0.0000 (0.0711)  loss_giou: 0.0000 (0.3172)  loss_ce_0: 0.0126 (0.0945)  loss_bbox_0: 0.0000 (0.0723)  loss_giou_0: 0.0000 (0.3325)  loss_ce_1: 0.0167 (0.1124)  loss_bbox_1: 0.0000 (0.0727)  loss_giou_1: 0.0000 (0.3258)  loss_ce_2: 0.0161 (0.1037)  loss_bbox_2: 0.0000 (0.0725)  loss_giou_2: 0.0000 (0.3240)  loss_ce_3: 0.0132 (0.0974)  loss_bbox_3: 0.0000 (0.0718)  loss_giou_3: 0.0000 (0.3220)  loss_ce_4: 0.0093 (0.1024)  loss_bbox_4: 0.0000 (0.0729)  loss_giou_4: 0.0000 (0.3265)  loss_ce_unscaled: 0.0093 (0.0569)  class_error_unscaled: 100.0000 (27.9203)  loss_bbox_unscaled: 0.0000 (0.0142)  loss_giou_unscaled: 0.0000 (0.1586)  cardinality_error_unscaled: 500.0000 (498.9012)  loss_ce_0_unscaled: 0.0063 (0.0473)  loss_bbox_0_unscaled: 0.0000 (0.0145)  loss_giou_0_unscaled: 0.0000 (0.1662)  cardinality_error_0_unscaled: 500.0000 (498.9012)  loss_ce_1_unscaled: 0.0084 (0.0562)  loss_bbox_1_unscaled: 0.0000 (0.0145)  loss_giou_1_unscaled: 0.0000 (0.1629)  cardinality_error_1_unscaled: 500.0000 (498.9012)  loss_ce_2_unscaled: 0.0081 (0.0518)  loss_bbox_2_unscaled: 0.0000 (0.0145)  loss_giou_2_unscaled: 0.0000 (0.1620)  cardinality_error_2_unscaled: 500.0000 (498.9012)  loss_ce_3_unscaled: 0.0066 (0.0487)  loss_bbox_3_unscaled: 0.0000 (0.0144)  loss_giou_3_unscaled: 0.0000 (0.1610)  cardinality_error_3_unscaled: 500.0000 (498.9012)  loss_ce_4_unscaled: 0.0046 (0.0512)  loss_bbox_4_unscaled: 0.0000 (0.0146)  loss_giou_4_unscaled: 0.0000 (0.1632)  cardinality_error_4_unscaled: 500.0000 (498.9012)  time: 0.1651  data: 0.0022  max mem: 870
    Traceback (most recent call last):
      File "main_track.py", line 367, in <module>
        main(args)
      File "main_track.py", line 289, in main
        test_stats, coco_evaluator, res_tracks = evaluate(model, criterion, postprocessors, data_loader_val,
      File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/projects/TransTrack/engine_track.py", line 152, in evaluate
        res_track = tracker.step(results[0])
      File "/projects/TransTrack/models/tracker.py", line 110, in step
        track['tracking_id'] = tracks[m1]['tracking_id']
    KeyError: 'tracking_id'
    
    

    I went back to the 6120th frame and found a long sequence with no people starting from around the 5000th frame. Might this be related to the error? Do you have any idea why there is no tracking_id key here?
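
    One hedged guess: after a long stretch with no people, every previous track dies, and a later match can reference a track dict that never received a tracking_id. A defensive sketch for models/tracker.py (assuming self.id_count is the tracker's id counter; not the official fix):

    prev = tracks[m1]
    if 'tracking_id' not in prev:          # defensive guard, an assumption
        self.id_count += 1                 # start a fresh identity
        prev['tracking_id'] = self.id_count
    track['tracking_id'] = prev['tracking_id']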

    opened by Xingyu-Jin 7
  • 4-GPU Training (TITAN RTX)

    Hi, Peize, thank you for your great work proposing the first transformer-based MOT.

    Sadly, my resources are only 4 GPUs (TITAN RTX, 24 GB), so I use a batch size of 4 per GPU (keeping the total batch size of 16).

    python3 -m torch.distributed.launch --nproc_per_node=4 --use_env main_track.py  --output_dir ./output/crowdhuman_mot_trainhalf --dataset_file mot --coco_path mot --batch_size 4  --with_box_refine --resume pre_trained/crowdhuman_final.pth --epochs 20 --lr_drop 10
    

    However, overall performance dropped. In particular, MOTA degraded by 1% (64.4% vs. 65.4% in the paper). Should I adjust other learning parameters? I really want to reproduce the performance with my 4 GPUs. I'd appreciate any advice.
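
    Since the total batch size here is unchanged at 16, a ~1 MOTA gap is within the noise mentioned in the Notes above. If the total batch size does change, the linear scaling rule is a common heuristic (an assumption, not something this repo prescribes):

    # Linear scaling rule sketch; the base values are assumed
    # Deformable-DETR-style defaults, not confirmed by this repo.
    base_lr, base_batch = 2e-4, 16
    total_batch = 4 * 2                    # e.g. 4 GPUs x batch size 2
    scaled_lr = base_lr * total_batch / base_batch
    print(scaled_lr)                       # 1e-4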

    opened by gritYCDA 7
  • Test MOT17 test set, performance 54.1?

    $ python3 main_track.py --output_dir . --dataset_file mot --coco_path mot --batch_size 1 --resume model/619mot17_mot17.pth --eval --with_box_refine --num_queries 500

    Test: Total time: 1:00:17 (1.3641 s / it)
    Accumulating evaluation results... DONE (t=2.52s).
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.541
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.884
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.589
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.080
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.636
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.050
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.345
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.641
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.223
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.573
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.719
    Creating video index for mot.

    opened by yuzhiyiliu 6
  • Memory Leakage after using MultiScaleDeformableAttention

    I'm trying to train the model to reproduce the results. However, I find that GPU memory usage keeps increasing during training, which may be related to some variables not being released. After using the GPUtil library to monitor which line leads to the issue, I find that output = MSDA.ms_deform_attn_forward(value, value_spatial_shapes, sampling_locations, attention_weights, ctx.im2col_step) in ms_deform_attn_func.py leads to the increase in GPU memory usage. I'm wondering whether you have met the same issue?
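
    A minimal helper for localizing this kind of growth, called from inside the training loop (a sketch, not part of the repo):

    import torch

    def log_gpu_memory(step, every=50):
        # Print allocated/reserved CUDA memory every `every` steps.
        if step % every == 0:
            print(step,
                  torch.cuda.memory_allocated() // 2**20, "MiB allocated /",
                  torch.cuda.memory_reserved() // 2**20, "MiB reserved")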

    opened by jzhang538 6
  • DistributedVideoSampler IndexError

    Hi, I'm trying to train with "sh track_exps/crowdhuman_mot_trainhalf.sh" on MOT20 with the pre-trained model "crowdhuman_final.pth". My GPU environment is 8 RTX 3090s, but I keep getting the error below. Anyone with the same issue? Thanks.

    Traceback (most recent call last):
      File "main_track.py", line 390, in <module>
        main(args)
      File "main_track.py", line 195, in main
        sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
      File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
        split_flags = [c[0] for c in chunks]
      File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
        split_flags = [c[0] for c in chunks]
    IndexError: index 0 is out of bounds for axis 0 with size 0
    Done (t=2.44s)
    creating index...
    (the same traceback is printed by each of the other ranks)
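
    The failure mode can be reproduced in isolation (an assumption about the sampler's internals based on the traceback): np.array_split leaves empty chunks when there are fewer videos than ranks, and c[0] then fails exactly as above.

    import numpy as np

    frames = np.arange(4)                  # e.g. first frames of only 4 videos
    chunks = np.array_split(frames, 8)     # 8 ranks -> 4 chunks are empty
    split_flags = [c[0] for c in chunks]   # IndexError: index 0 ... size 0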
    
    
    opened by imhgchoi 5
  • about training logs

    Hi, Peize, could you provide the training log? I don't have 8 V100 devices for training, so I need to reduce the batch size. Your training log would surely help me check the training process.

    Best.

    opened by boringwar 5
  • Hello! I have some questions about the model

    In the paper, it is mentioned that the encoder combines the feature map extracted by the backbone from the current frame with the feature map preserved from the previous frame as its input. But in the implementation, this operation does not seem to happen; instead, the features of two copies of the same current-frame image are concatenated.

    opened by davidyang180 4
  • Pre-training both detection and tracking

    Hi! Thanks for your code and the updates! I have a question: does running track_exps/crowdhuman_train.sh now pre-train both detection and tracking?

    opened by lihanlin99 4
  • Question about learned feature query

    Hi Peize,

    Thanks for the wonderful work of TransTrack!

    I went through your paper and am still a little confused about the learned features used for object detection. As you said in the paper, a learned feature is a set of parameters. So, what are the parameters? Could you explain them in more detail?

    Thanks in advance for your help!
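
    In DETR-style detectors, the learned queries are literally the rows of an nn.Embedding table trained jointly with the rest of the network. A minimal sketch; the sizes are assumptions, with num_queries matching the --num_queries 500 flag used elsewhere in this README:

    import torch.nn as nn

    num_queries, hidden_dim = 500, 256
    query_embed = nn.Embedding(num_queries, hidden_dim)  # the "learned features"
    queries = query_embed.weight                         # [500, 256], fed to the decoder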

    opened by zy1296 4
  • MultiScaleDeformableAttention

    Hello, thanks a lot for sharing the code!

    When running the code, it raised the error "No module named MultiScaleDeformableAttention".

    MultiScaleDeformableAttention is used in models/ops/functions/ms_deform_attn_func.py.

    Is there something I missed? Thanks for replying.
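
    MultiScaleDeformableAttention is the CUDA extension compiled in step 1 of the Steps above (cd models/ops; python setup.py build install). A quick import check, as a sketch:

    try:
        import MultiScaleDeformableAttention as MSDA
        print("ok:", MSDA.__file__)
    except ImportError:
        print("not built -- run the build step in models/ops first")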

    opened by EddieEduardo 3
  • Can NOT reproduce results on MOT20

    Hello, I'm trying to reproduce the results on MOT20, but I get a 5-point drop using the default hyper-parameters from MOT17. Can you provide the training script for MOT20?

    opened by HaojunYuPKU 0
  • Has anybody figured out exporting the trained model to ONNX?

    Issue #22 https://github.com/PeizeSun/TransTrack/issues/22#issue-884151001

    Similar to the bugs faced in the previous closed issue, has anybody figured out a way of exporting the model to ONNX? Have you tried to create an export script as well, and were the problems you faced similar to ours? @PeizeSun @ifzhang @simonwu53 @Abrahamon @iFighting

    opened by JJLimmm 0
  • A problem with object detection

    When I test my demo, I frequently find an object surrounded by more than one regression box, so it seems overlapping regression boxes are not deleted. Why didn't you delete the overlapping regression boxes?
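
    DETR-style detectors are normally run without NMS, since the set-based matching loss discourages duplicates; if duplicates still appear, a post-hoc NMS is one workaround. A sketch using torchvision (not part of TransTrack):

    import torch
    from torchvision.ops import nms

    boxes = torch.tensor([[10., 10., 50., 50.],
                          [12., 11., 52., 49.],    # near-duplicate of the first
                          [100., 80., 160., 200.]])
    scores = torch.tensor([0.9, 0.8, 0.95])
    keep = nms(boxes, scores, iou_threshold=0.5)   # tensor([2, 0])
    print(boxes[keep])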

    opened by 2713286758 0
  • Inference errors after running the demo

    After loading the 671mot17_crowdhuman_mot17.pth model, there are almost no detection boxes in the inference video; only when the threshold is set to 0 does the screen fill with boxes. My parameters are set like this: python demo.py --device cuda --video_input videos/palace.mp4 --demo_output output/ --track_thresh 0.4 --resume pretrained/671mot17_crowdhuman_mot17.pth

    opened by ht138612 2
  • Training question with classification labels on custom data

    I trained on custom data following the format of the MOT17 label files, using classification labels from 1 to 4, but I got an error: 'Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.' in models/deformable_detrtrack_train.py, line 382, at "target_classes_onehot.scatter_(2, target_classes.unsqueeze(-1), 1)". Does anyone have the same bug as me?
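
    A likely cause, inferred from the assertion (an assumption, not confirmed): scatter_ requires every class index to be smaller than the one-hot dimension, so labels 1..4 need the model to be built with enough classes, or the labels remapped to start at the expected base. Reproduced in isolation:

    import torch

    num_classes = 2                                  # model built for too few classes
    target_classes = torch.tensor([[1], [4]])        # label 4 is out of bounds
    onehot = torch.zeros(2, 1, num_classes + 1)
    onehot.scatter_(2, target_classes.unsqueeze(-1), 1)  # raises, as in the issue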

    opened by 2713286758 1