TransTrack: Multiple-Object Tracking with Transformer

License: MIT

Introduction

TransTrack: Multiple-Object Tracking with Transformer

Models

Training data          Training time   Validation MOTA   Download
crowdhuman, mot_half   36h + 1h        65.4              model
crowdhuman             36h             53.8              model
mot_half               8h              61.6              model

Models are also available on Baidu Drive with the extraction code m4iv.

Notes

  • Evaluating the crowdhuman-trained model and the mot-trained model uses different command lines; see Steps.
  • We observe about 1 MOTA of run-to-run noise.
  • If the MOTA of your self-trained model is lower than expected, tuning --track_thresh sometimes gives better performance; see the sketch after this list.
  • The training time is measured on 8 NVIDIA V100 GPUs with a total batch size of 16.
  • We use models pre-trained on ImageNet.
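
The threshold can also be swept automatically. The sketch below is an illustration only: it assumes main_track.py accepts --track_thresh the way demo.py does, and the checkpoint and output paths are placeholders for your own files.

# sweep_track_thresh.py -- hypothetical helper; assumes main_track.py accepts
# --track_thresh (as demo.py does); all paths are placeholders.
import subprocess

for thresh in (0.3, 0.4, 0.5):
    subprocess.run([
        "python3", "main_track.py",
        "--dataset_file", "mot", "--coco_path", "mot",
        "--batch_size", "1", "--eval", "--with_box_refine",
        "--resume", "output/checkpoint.pth",          # placeholder checkpoint
        "--track_thresh", str(thresh),
        "--output_dir", f"output/thresh_{thresh}",
    ], check=True)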

Demo

Installation

The codebase is built on top of Deformable DETR and CenterTrack.

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4
  • Python>=3.7
  • PyTorch>=1.5 and a torchvision build that matches the PyTorch installation; you can install them together from pytorch.org to ensure compatibility
  • OpenCV is optional, needed only for the demo and visualization
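
A minimal sketch to confirm the versions above before building the ops:

# check_env.py -- a minimal environment check for the requirements above.
import sys
import torch
import torchvision

print("python     :", sys.version.split()[0])   # needs >= 3.7
print("pytorch    :", torch.__version__)        # needs >= 1.5
print("torchvision:", torchvision.__version__)
print("cuda       :", torch.version.cuda, "| available:", torch.cuda.is_available())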

Steps

  1. Install and build libs
git clone https://github.com/PeizeSun/TransTrack.git
cd TransTrack
cd models/ops
python setup.py build install
cd ../..
pip install -r requirements.txt
  2. Prepare datasets
mkdir -p crowdhuman/annotations
cp -r /path_to_crowdhuman_dataset/annotations/CrowdHuman_val.json crowdhuman/annotations/CrowdHuman_val.json
cp -r /path_to_crowdhuman_dataset/annotations/CrowdHuman_train.json crowdhuman/annotations/CrowdHuman_train.json
cp -r /path_to_crowdhuman_dataset/CrowdHuman_train crowdhuman/CrowdHuman_train
cp -r /path_to_crowdhuman_dataset/CrowdHuman_val crowdhuman/CrowdHuman_val
mkdir mot
cp -r /path_to_mot_dataset/train mot/train
cp -r /path_to_mot_dataset/test mot/test
python track_tools/convert_mot_to_coco.py

The CrowdHuman dataset is available at CrowdHuman. We provide annotations in JSON format.

The MOT dataset is available at MOT.
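
To verify that track_tools/convert_mot_to_coco.py produced well-formed COCO-style annotations, a check like the following can help; the json filename below is a placeholder for whatever file the converter wrote.

# check_annotations.py -- sanity-check the converted COCO-format annotations.
import json

with open("mot/annotations/train_half.json") as f:   # placeholder filename
    coco = json.load(f)
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")
print("categories:", [c["name"] for c in coco["categories"]])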

  3. Pre-train on CrowdHuman
sh track_exps/crowdhuman_train.sh
python track_tools/crowdhuman_model_to_mot.py

The pre-trained model is available as crowdhuman_final.pth.

  4. Train TransTrack
sh track_exps/crowdhuman_mot_trainhalf.sh
  5. Evaluate TransTrack
sh track_exps/mot_val.sh
sh track_exps/mot_eval.sh
  6. Visualize TransTrack
python track_tools/txt2video.py
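
Roughly, txt2video.py renders per-frame results stored in the standard MOT txt format (frame, id, left, top, width, height, score, ...) onto the frames. A simplified stand-in sketch, with placeholder paths (not the actual script):

# draw_tracks.py -- a simplified stand-in for track_tools/txt2video.py.
# Parses the standard MOT result format (frame, id, left, top, width, height, ...)
# and draws boxes with OpenCV. All paths are placeholders.
from collections import defaultdict
import cv2

per_frame = defaultdict(list)
with open("output/MOT17-02.txt") as f:                 # placeholder result file
    for line in f:
        frame, tid, x, y, w, h = [float(v) for v in line.split(",")[:6]]
        per_frame[int(frame)].append((int(tid), x, y, w, h))

for frame_id, dets in sorted(per_frame.items()):
    img = cv2.imread(f"mot/train/MOT17-02/img1/{frame_id:06d}.jpg")
    for tid, x, y, w, h in dets:
        cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
        cv2.putText(img, str(tid), (int(x), int(y) - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imwrite(f"output/vis/{frame_id:06d}.jpg", img)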

Notes

  • Evaluate the pre-trained CrowdHuman model on MOT:
sh track_exps/det_val.sh
sh track_exps/mot_eval.sh

License

TransTrack is released under the MIT License.

Citing

If you use TransTrack in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{transtrack,
  title   =  {TransTrack: Multiple-Object Tracking with Transformer},
  author  =  {Peize Sun and Yi Jiang and Rufeng Zhang and Enze Xie and Jinkun Cao and Xinting Hu and Tao Kong and Zehuan Yuan and Changhu Wang and Ping Luo},
  journal =  {arXiv preprint arXiv:2012.15460},
  year    =  {2020}
}
Comments
  • KeyError: 'age'

    What caused the following problem during testing? Thanks!

    Traceback (most recent call last):
      File "main_track.py", line 369, in <module>
        main(args)
      File "main_track.py", line 292, in main
        phase='eval', det_val=args.det_val)
      File "/data1/anaconda3/envs/TransTrack/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
        return func(*args, **kwargs)
      File "/data1/PycharmProjects/TransTrack/engine_track.py", line 156, in evaluate
        res_track = tracker.step(results[0])
      File "/data1/PycharmProjects/TransTrack/models/tracker.py", line 123, in step
        if track['age'] < self.max_age:
    KeyError: 'age'
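
    One defensive workaround, sketched below, is to treat a track without an 'age' field as newly created; this is an assumption about models/tracker.py, not the official fix.

    # In models/tracker.py -- defensive sketch, not the official fix:
    # a track missing the 'age' field is treated as newly created (age 0).
    if track.get('age', 0) < self.max_age:
        ...  # keep the track alive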

    opened by davidyang180 13
  • Errors when testing on a custom video that has a long sequence without people appearing

    Hi, when I run mot_test.sh on my own video, it gives this error:

    Test:  [ 6120/17958]  eta: 0:36:03  class_error: 100.00  loss: 0.0854 (3.0055)  loss_ce: 0.0187 (0.1138)  loss_bbox: 0.0000 (0.0711)  loss_giou: 0.0000 (0.3172)  loss_ce_0: 0.0126 (0.0945)  loss_bbox_0: 0.0000 (0.0723)  loss_giou_0: 0.0000 (0.3325)  loss_ce_1: 0.0167 (0.1124)  loss_bbox_1: 0.0000 (0.0727)  loss_giou_1: 0.0000 (0.3258)  loss_ce_2: 0.0161 (0.1037)  loss_bbox_2: 0.0000 (0.0725)  loss_giou_2: 0.0000 (0.3240)  loss_ce_3: 0.0132 (0.0974)  loss_bbox_3: 0.0000 (0.0718)  loss_giou_3: 0.0000 (0.3220)  loss_ce_4: 0.0093 (0.1024)  loss_bbox_4: 0.0000 (0.0729)  loss_giou_4: 0.0000 (0.3265)  loss_ce_unscaled: 0.0093 (0.0569)  class_error_unscaled: 100.0000 (27.9203)  loss_bbox_unscaled: 0.0000 (0.0142)  loss_giou_unscaled: 0.0000 (0.1586)  cardinality_error_unscaled: 500.0000 (498.9012)  loss_ce_0_unscaled: 0.0063 (0.0473)  loss_bbox_0_unscaled: 0.0000 (0.0145)  loss_giou_0_unscaled: 0.0000 (0.1662)  cardinality_error_0_unscaled: 500.0000 (498.9012)  loss_ce_1_unscaled: 0.0084 (0.0562)  loss_bbox_1_unscaled: 0.0000 (0.0145)  loss_giou_1_unscaled: 0.0000 (0.1629)  cardinality_error_1_unscaled: 500.0000 (498.9012)  loss_ce_2_unscaled: 0.0081 (0.0518)  loss_bbox_2_unscaled: 0.0000 (0.0145)  loss_giou_2_unscaled: 0.0000 (0.1620)  cardinality_error_2_unscaled: 500.0000 (498.9012)  loss_ce_3_unscaled: 0.0066 (0.0487)  loss_bbox_3_unscaled: 0.0000 (0.0144)  loss_giou_3_unscaled: 0.0000 (0.1610)  cardinality_error_3_unscaled: 500.0000 (498.9012)  loss_ce_4_unscaled: 0.0046 (0.0512)  loss_bbox_4_unscaled: 0.0000 (0.0146)  loss_giou_4_unscaled: 0.0000 (0.1632)  cardinality_error_4_unscaled: 500.0000 (498.9012)  time: 0.1651  data: 0.0022  max mem: 870
    Traceback (most recent call last):
      File "main_track.py", line 367, in <module>
        main(args)
      File "main_track.py", line 289, in main
        test_stats, coco_evaluator, res_tracks = evaluate(model, criterion, postprocessors, data_loader_val,
      File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/projects/TransTrack/engine_track.py", line 152, in evaluate
        res_track = tracker.step(results[0])
      File "/projects/TransTrack/models/tracker.py", line 110, in step
        track['tracking_id'] = tracks[m1]['tracking_id']
    KeyError: 'tracking_id'
    
    

    I went back to the 6120th frame and found a long sequence with no people starting from around the 5000th frame. Might this be related to the error? Do you have any idea why there is no tracking_id key here?
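
    One hedged guess: after a long stretch with no people, every previous track dies, and a later match can reference a track dict that never received a tracking_id. A defensive sketch for models/tracker.py (assuming self.id_count is the tracker's id counter; not the official fix):

    prev = tracks[m1]
    if 'tracking_id' not in prev:          # defensive guard, an assumption
        self.id_count += 1                 # start a fresh identity
        prev['tracking_id'] = self.id_count
    track['tracking_id'] = prev['tracking_id']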

    opened by Xingyu-Jin 7
  • 4-GPU Training (TITAN RTX)

    Hi, Peize, thank you for your great work proposing the first transformer-based MOT.

    Sadly, my resources are only 4 GPUs (TITAN RTX, 24 GB), so I use a batch size of 4 per GPU (keeping the total batch size of 16).

    python3 -m torch.distributed.launch --nproc_per_node=4 --use_env main_track.py  --output_dir ./output/crowdhuman_mot_trainhalf --dataset_file mot --coco_path mot --batch_size 4  --with_box_refine --resume pre_trained/crowdhuman_final.pth --epochs 20 --lr_drop 10
    

    However, overall performance dropped. In particular, MOTA degraded by 1% (64.4% vs. 65.4% in the paper). Should I adjust other learning parameters? I really want to reproduce the performance with my 4 GPUs. I'd appreciate any advice.
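
    Since the total batch size here is unchanged at 16, a ~1 MOTA gap is within the noise mentioned in the Notes above. If the total batch size does change, the linear scaling rule is a common heuristic (an assumption, not something this repo prescribes):

    # Linear scaling rule sketch; the base values are assumed
    # Deformable-DETR-style defaults, not confirmed by this repo.
    base_lr, base_batch = 2e-4, 16
    total_batch = 4 * 2                    # e.g. 4 GPUs x batch size 2
    scaled_lr = base_lr * total_batch / base_batch
    print(scaled_lr)                       # 1e-4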

    opened by gritYCDA 7
  • Test MOT17 test set, performance 54.1?

    $ python3 main_track.py --output_dir . --dataset_file mot --coco_path mot --batch_size 1 --resume model/619mot17_mot17.pth --eval --with_box_refine --num_queries 500

    Test: Total time: 1:00:17 (1.3641 s / it)
    Accumulating evaluation results... DONE (t=2.52s).
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.541
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.884
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.589
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.080
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.440
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.636
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.050
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.345
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.641
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.223
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.573
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.719
    Creating video index for mot.

    opened by yuzhiyiliu 6
  • Memory Leakage after using MultiScaleDeformableAttention

    I'm trying to train the model to reproduce the results. However, I find that GPU memory usage keeps increasing during training, which may be related to some variables not being released. After using the GPUtil library to monitor which line leads to the issue, I find that output = MSDA.ms_deform_attn_forward(value, value_spatial_shapes, sampling_locations, attention_weights, ctx.im2col_step) in ms_deform_attn_func.py leads to the increase in GPU memory usage. I'm wondering whether you have met the same issue?
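
    A minimal helper for localizing this kind of growth, called from inside the training loop (a sketch, not part of the repo):

    import torch

    def log_gpu_memory(step, every=50):
        # Print allocated/reserved CUDA memory every `every` steps.
        if step % every == 0:
            print(step,
                  torch.cuda.memory_allocated() // 2**20, "MiB allocated /",
                  torch.cuda.memory_reserved() // 2**20, "MiB reserved")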

    opened by jzhang538 6
  • DistributedVideoSampler IndexError

    Hi, I'm trying to train with "sh track_exps/crowdhuman_mot_trainhalf.sh" on MOT20 with the pre-trained model "crowdhuman_final.pth". My GPU environment is 8 RTX 3090s, but I keep getting the error below. Anyone with the same issue? Thanks.

    Traceback (most recent call last):
      File "main_track.py", line 390, in <module>
        main(args)
      File "main_track.py", line 195, in main
        sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
      File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
        split_flags = [c[0] for c in chunks]
      File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
        split_flags = [c[0] for c in chunks]
    IndexError: index 0 is out of bounds for axis 0 with size 0
    Done (t=2.44s)
    creating index...
    (the same traceback is printed by each of the other ranks)
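
    The failure mode can be reproduced in isolation (an assumption about the sampler's internals based on the traceback): np.array_split leaves empty chunks when there are fewer videos than ranks, and c[0] then fails exactly as above.

    import numpy as np

    frames = np.arange(4)                  # e.g. first frames of only 4 videos
    chunks = np.array_split(frames, 8)     # 8 ranks -> 4 chunks are empty
    split_flags = [c[0] for c in chunks]   # IndexError: index 0 ... size 0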
    
    
    opened by imhgchoi 5
  • about training logs

    Hi, Peize, could you provide the training log? I don't have 8 V100 devices for training, so I need to reduce the batch size. Your training log would surely help me check the training process.

    Best.

    opened by boringwar 5
  • Hello! I have some questions about the model

    In the paper, it is mentioned that the encoder combines the feature map extracted by the backbone from the current frame with the feature map preserved from the previous frame as its input. But in the implementation, this operation does not seem to happen; instead, the features of two copies of the same current-frame image are concatenated.

    opened by davidyang180 4
  • Pre-training both detection and tracking

    Hi! Thanks for your code and the updates! I have a question: does running track_exps/crowdhuman_train.sh now pre-train both detection and tracking?

    opened by lihanlin99 4
  • Question about learned feature query

    Hi Peize,

    Thanks for the wonderful work of TransTrack!

    I went through your paper and am still a little confused about the learned features used for object detection. As you said in the paper, a learned feature is a set of parameters. So, what are the parameters? Could you explain them in more detail?

    Thanks in advance for your help!
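
    In DETR-style detectors, the learned queries are literally the rows of an nn.Embedding table trained jointly with the rest of the network. A minimal sketch; the sizes are assumptions, with num_queries matching the --num_queries 500 flag used elsewhere in this README:

    import torch.nn as nn

    num_queries, hidden_dim = 500, 256
    query_embed = nn.Embedding(num_queries, hidden_dim)  # the "learned features"
    queries = query_embed.weight                         # [500, 256], fed to the decoder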

    opened by zy1296 4
  • MultiScaleDeformableAttention

    Hello, thanks a lot for sharing the code!

    When running the code, it raised the error "No module named MultiScaleDeformableAttention".

    MultiScaleDeformableAttention is used in models/ops/functions/ms_deform_attn_func.py.

    Is there something I missed? Thanks for replying.
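
    MultiScaleDeformableAttention is the CUDA extension compiled in step 1 of the Steps above (cd models/ops; python setup.py build install). A quick import check, as a sketch:

    try:
        import MultiScaleDeformableAttention as MSDA
        print("ok:", MSDA.__file__)
    except ImportError:
        print("not built -- run the build step in models/ops first")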

    opened by EddieEduardo 3
  • Can NOT reproduce results on MOT20

    Hello, I'm trying to reproduce the results on MOT20, but I get a 5-point drop using the default hyper-parameters from MOT17. Can you provide the training script for MOT20?

    opened by HaojunYuPKU 0
  • Has anybody figured out exporting the trained model to ONNX?

    Issue #22 https://github.com/PeizeSun/TransTrack/issues/22#issue-884151001

    Similar to the bugs faced in the previous closed issue, has anybody figured out a way of exporting the model to ONNX? Have you tried to create an export script as well, and were the problems you faced similar to ours? @PeizeSun @ifzhang @simonwu53 @Abrahamon @iFighting

    opened by JJLimmm 0
  • A problem with object detection

    When I test my demo, I frequently find an object surrounded by more than one regression box, so it seems overlapping regression boxes are not deleted. Why didn't you delete the overlapping regression boxes?
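
    DETR-style detectors are normally run without NMS, since the set-based matching loss discourages duplicates; if duplicates still appear, a post-hoc NMS is one workaround. A sketch using torchvision (not part of TransTrack):

    import torch
    from torchvision.ops import nms

    boxes = torch.tensor([[10., 10., 50., 50.],
                          [12., 11., 52., 49.],    # near-duplicate of the first
                          [100., 80., 160., 200.]])
    scores = torch.tensor([0.9, 0.8, 0.95])
    keep = nms(boxes, scores, iou_threshold=0.5)   # tensor([2, 0])
    print(boxes[keep])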

    opened by 2713286758 0
  • Inference errors after running the demo

    After loading the 671mot17_crowdhuman_mot17.pth model, there are almost no detection boxes in the inference video; only when the threshold is set to 0 does the screen fill with boxes. My parameters are set like this: python demo.py --device cuda --video_input videos/palace.mp4 --demo_output output/ --track_thresh 0.4 --resume pretrained/671mot17_crowdhuman_mot17.pth

    opened by ht138612 2
  • Training question with classification labels on custom data

    I trained on custom data following the format of the MOT17 label files, using classification labels from 1 to 4, but I got an error: 'Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.' in models/deformable_detrtrack_train.py, line 382, at "target_classes_onehot.scatter_(2, target_classes.unsqueeze(-1), 1)". Does anyone have the same bug as me?
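
    A likely cause, inferred from the assertion (an assumption, not confirmed): scatter_ requires every class index to be smaller than the one-hot dimension, so labels 1..4 need the model to be built with enough classes, or the labels remapped to start at the expected base. Reproduced in isolation:

    import torch

    num_classes = 2                                  # model built for too few classes
    target_classes = torch.tensor([[1], [4]])        # label 4 is out of bounds
    onehot = torch.zeros(2, 1, num_classes + 1)
    onehot.scatter_(2, target_classes.unsqueeze(-1), 1)  # raises, as in the issue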

    opened by 2713286758 1