TrackFormer: Multi-Object Tracking with Transformers

Overview

This repository provides the official implementation of the TrackFormer: Multi-Object Tracking with Transformers paper by Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe and Christoph Feichtenhofer. The codebase builds upon DETR, Deformable DETR and Tracktor.

As the paper is still under submission, this repository will be updated continuously and might at times not reflect the current state of the arXiv paper.

Demo visualizations of the MOT17-03-SDP and MOTS20-07 sequences.

Abstract

The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatiotemporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce TrackFormer, an end-to-end MOT approach based on an encoder-decoder Transformer architecture. Our model achieves data association between frames via attention by evolving a set of track predictions through a video sequence. The Transformer decoder initializes new tracks from static object queries and autoregressively follows existing tracks in space and time with the new concept of identity preserving track queries. Both decoder query types benefit from self- and encoder-decoder attention on global frame-level features, thereby omitting any additional graph optimization and matching or modeling of motion and appearance. TrackFormer represents a new tracking-by-attention paradigm and yields state-of-the-art performance on the task of multi-object tracking (MOT17) and segmentation (MOTS20).

TrackFormer casts multi-object tracking as a set prediction problem performing joint detection and tracking-by-attention. The architecture consists of a CNN for image feature extraction, a Transformer encoder for image feature encoding and a Transformer decoder which applies self- and encoder-decoder attention to produce output embeddings with bounding box and class information.
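
To make this description concrete, the following is a minimal, self-contained sketch of the tracking-by-attention loop. All names (TinyTrackFormer, frame_features, track_queries) and dimensions are illustrative assumptions and not the repository's actual modules or API:

import torch
import torch.nn as nn

class TinyTrackFormer(nn.Module):
    """Toy decoder: track queries carry identities, object queries spawn new tracks."""
    def __init__(self, d_model=32, num_object_queries=10, score_thresh=0.5):
        super().__init__()
        self.object_queries = nn.Parameter(torch.randn(num_object_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.class_head = nn.Linear(d_model, 2)  # toy object/background scores
        self.box_head = nn.Linear(d_model, 4)    # (cx, cy, w, h)
        self.score_thresh = score_thresh

    def forward(self, frame_features, track_queries):
        # Existing track queries are decoded jointly with the static object queries.
        queries = torch.cat([track_queries, self.object_queries.unsqueeze(0)], dim=1)
        hs = self.decoder(queries, frame_features)        # self- and cross-attention
        scores = self.class_head(hs).softmax(-1)[..., 0]  # objectness per query
        boxes = self.box_head(hs).sigmoid()
        # Confident queries (surviving tracks and newly detected objects) become
        # the track queries of the next frame; the rest are dropped.
        keep = scores[0] > self.score_thresh
        return boxes[:, keep], hs[:, keep]

model = TinyTrackFormer()
track_queries = torch.zeros(1, 0, 32)         # no tracks exist before the first frame
for _ in range(3):                            # toy "video" of three frames
    frame_features = torch.randn(1, 100, 32)  # stand-in for CNN + encoder features
    boxes, track_queries = model(frame_features, track_queries)

The actual model adds further logic (e.g. re-identification and separate thresholds for new and existing tracks), but the frame-to-frame hand-over of query embeddings follows this pattern.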

Installation

We refer to our docs/INSTALL.md for detailed installation instructions.

Train TrackFormer

We refer to our docs/TRAIN.md for detailed training instructions.

Evaluate TrackFormer

In order to evaluate TrackFormer on a multi-object tracking dataset, we provide the src/track.py script, which supports several datasets and splits interchangeably via the dataset_name argument (see src/datasets/tracking/factory.py for an overview of all datasets). The default tracking configuration is specified in cfgs/track.yaml. To facilitate the reproducibility of our results, we provide evaluation metrics for both the train and test set.

MOT17

Private detections

python src/track.py with reid

MOT17   MOTA   IDF1   MT     ML    FP      FN       ID Sw.
Train   68.1   67.6   816    207   33549   71937    1935
Test    65.0   63.9   1074   324   70443   123552   3528

Public detections (DPM, FRCNN, SDP)

python src/track.py with \
    reid \
    public_detections=min_iou_0_5 \
    obj_detect_checkpoint_file=models/mots20_train_masks/checkpoint.pth
MOT17   MOTA   IDF1   MT    ML    FP      FN       ID Sw.
Train   67.2   66.9   663   294   14640   94122    1866
Test    62.5   60.7   702   632   32828   174921   3917

MOTS20

python src/track.py with \
    dataset_name=MOTS20-ALL \
    obj_detect_checkpoint_file=models/mots20_train_masks/checkpoint.pth

Our tracking script only applies the MOT17 metrics evaluation but also outputs MOTS20 mask prediction files. To evaluate these, download the official MOTChallengeEvalKit.

MOTS20   sMOTSA   IDF1   FP     FN     IDs
Train    --       --     --     --     --
Test     54.9     63.6   2233   7195   278

Demo

To facilitate the application of TrackFormer, we provide a demo interface which allows for quick processing of a given video sequence.

ffmpeg -i data/snakeboard/snakeboard.mp4 -vf fps=30 data/snakeboard/%06d.png

python src/track.py with \
    dataset_name=DEMO \
    data_root_dir=data/snakeboard \
    output_dir=data/snakeboard \
    write_images=pretty
Snakeboard demo

Publication

If you use this software in your research, please cite our publication:

@InProceedings{meinhardt2021trackformer,
    title={TrackFormer: Multi-Object Tracking with Transformers},
    author={Tim Meinhardt and Alexander Kirillov and Laura Leal-Taixe and Christoph Feichtenhofer},
    year={2021},
    eprint={2101.02702},
    archivePrefix={arXiv},
}
Comments
  • Not able to install MultiscaleDeformableAttention

    Hi,

    I am trying to run a training test for TrackFormer. I am following INSTALL.md to build the environment. When I run python src/trackformer/models/ops/setup.py build --build-base=src/trackformer/models/ops/ install, I get an error saying:

    Traceback (most recent call last):
      File "src/trackformer/models/ops/setup.py", line 62, in <module>
        setup(
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
        return distutils.core.setup(**attrs)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 148, in setup
        return run_commands(dist)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
        dist.run_commands()
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
        self.run_command(cmd)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
        super().run_command(command)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
        cmd_obj.run()
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/dist.py", line 1214, in run_command
        super().run_command(command)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
        cmd_obj.run()
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
        _build_ext.build_ext.run(self)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
        self.build_extensions()
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 580, in build_extensions
        build_ext.build_extensions(self)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
        _build_ext.build_ext.build_extensions(self)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
        self._build_extensions_serial()
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
        self.build_extension(ext)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
        _build_ext.build_extension(self, ext)
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 414, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1135, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/z/home/mahzad-khosh/trackformer/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1413, in _run_ninja_build
        raise RuntimeError(message)
    RuntimeError: Error compiling objects for extension
    

    How can I get this package installed?

    Thanks,

    opened by mkhoshle 16
  • AttributeError: 'ReadOnlyList' object has no attribute 'message'

    Hi, when I load the trained checkpoint, I get this error: AttributeError: 'ReadOnlyList' object has no attribute 'message'

    I think this error is related to Sacred.

    When reading config.yaml, I get this message: message: The configuration is read-only in a captured function!

    the config.yaml :

    lr_linear_proj_names:  !!python/object/new:sacred.config.custom_containers.ReadOnlyList
      listitems:
      - reference_points
      - sampling_offsets
      state:
        message: The configuration is read-only in a captured function!
    

    I'm just following the training instructions, and I don't know much about Sacred. Can anyone help me? Thanks in advance.
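
    A workaround I am considering is to convert the read-only containers into plain Python objects before dumping or re-loading the config, so the state/message fields never end up in config.yaml. This is only a minimal sketch, assuming Sacred's ReadOnlyDict/ReadOnlyList subclass dict/list:

    def to_plain(obj):
        # Recursively turn Sacred's read-only containers into plain dicts/lists.
        if isinstance(obj, dict):
            return {k: to_plain(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [to_plain(v) for v in obj]
        return obj

    # e.g. inside a captured function: yaml.safe_dump(to_plain(_config), f)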

    opened by AzeroGYH 15
  • Expected results after training on the joint set of CrowdHuman and MOT17

    Hey, thank you for your excellent work!

    I trained TrackFormer with your default settings (loading from the pretrained CrowdHuman checkpoint) on the joint set of CrowdHuman and MOT17, but got a result of 73.3 MOTA on MOT17 (I think it is validated on your default dataset mot17_train_cross_val_frame_0_5_to_1_0_coco).

    Does this result correspond to the training set result (74.2 MOTA provided in the README) or the cross-validation result (71.3 MOTA provided in the paper)?

    Thank you in advance.

    opened by FengLi-ust 7
  • The DDP hung up at  torch.nn.parallel.DistributedDataParallel(model)

    Hi, I really enjoyed reading your paper and code. Great work. I am trying to reproduce the results by running your code on HPC (a cluster, one node with 2 GPUs). As mentioned in the README training section, I ran the following command in interactive Slurm mode:

    python -m torch.distributed.launch --nproc_per_node=2 --use_env src/train.py with \
        crowdhuman \
        deformable \
        multi_frame \
        tracking \
        output_dir=models/crowdhuman_deformable_multi_frame

    But my code hangs at the line model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu], find_unused_parameters=True).

    Could you please help me? The following is the output:


    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


    WARNING - root - Changed type of config entry "train_split" from str to NoneType WARNING - train - No observers have been added to this run WARNING - root - Changed type of config entry "train_split" from str to NoneType WARNING - train - No observers have been added to this run INFO - train - Running command 'load_config' INFO - train - Started INFO - train - Running command 'load_config' INFO - train - Started Configuration (modified, added, typechanged, doc): aux_loss = True backbone = 'resnet50' batch_size = 1 bbox_loss_coef = 5.0 clip_max_norm = 0.1 cls_loss_coef = 2.0 coco_and_crowdhuman_prev_frame_rnd_augs = 0.2 coco_min_num_objects = 0 coco_panoptic_path = None coco_path = 'data/coco_2017' coco_person_train_split = None crowdhuman_path = 'data/CrowdHuman' crowdhuman_train_split = 'train_val' dataset = 'mot_crowdhuman' debug = False dec_layers = 6 dec_n_points = 4 deformable = True device = 'cuda' dice_loss_coef = 1.0 dilation = False dim_feedforward = 1024 dist_url = 'env://' dropout = 0.1 enc_layers = 6 enc_n_points = 4 eos_coef = 0.1 epochs = 80 eval_only = False eval_train = False focal_alpha = 0.25 focal_gamma = 2 focal_loss = True freeze_detr = False giou_loss_coef = 2 hidden_dim = 288 load_mask_head_from_model = None lr = 0.0002 lr_backbone = 2e-05 lr_backbone_names = ['backbone.0'] lr_drop = 50 lr_linear_proj_mult = 0.1 lr_linear_proj_names = ['reference_points', 'sampling_offsets'] lr_track = 0.0001 mask_loss_coef = 1.0 masks = False merge_frame_features = False mot_path_train = 'data/MOT17' mot_path_val = 'data/MOT17' multi_frame_attention = True multi_frame_attention_separate_encoder = True multi_frame_encoding = True nheads = 8 no_vis = False num_feature_levels = 4 num_queries = 500 num_workers = 2 output_dir = 'models/crowdhuman_deformable_multi_frame' overflow_boxes = True overwrite_lr_scheduler = False overwrite_lrs = False position_embedding = 'sine' pre_norm = False resume = '' resume_optim = False resume_shift_neuron = False resume_vis = False save_model_interval = 5 seed = 42 set_cost_bbox = 5.0 set_cost_class = 2.0 set_cost_giou = 2.0 start_epoch = 1 track_attention = False track_backprop_prev_frame = False track_prev_frame_range = 5 track_prev_frame_rnd_augs = 0.01 track_prev_prev_frame = False track_query_false_negative_prob = 0.4 track_query_false_positive_eos_weight = True track_query_false_positive_prob = 0.1 tracking = True tracking_eval = True train_split = None two_stage = False val_interval = 5 val_split = 'mot17_train_cross_val_frame_0_5_to_1_0_coco' vis_and_log_interval = 50 vis_port = 8097 vis_server = '' weight_decay = 0.0001 with_box_refine = True world_size = 2 img_transform: max_size = 1333 val_width = 800 INFO - train - Completed after 0:00:00 Namespace(aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5.0, clip_max_norm=0.1, cls_loss_coef=2.0, coco_and_crowdhuman_prev_frame_rnd_augs=0.2, coco_min_num_objects=0, coco_panoptic_path=None, coco_path='data/coco_2017', coco_person_train_split=None, crowdhuman_path='data/CrowdHuman', crowdhuman_train_split='train_val', dataset='mot_crowdhuman', debug=False, dec_layers=6, dec_n_points=4, deformable=True, device='cuda', dice_loss_coef=1.0, dilation=False, dim_feedforward=1024, dist_url='env://', dropout=0.1, enc_layers=6, enc_n_points=4, eos_coef=0.1, epochs=80, eval_only=False, eval_train=False, focal_alpha=0.25, focal_gamma=2, focal_loss=True, freeze_detr=False, giou_loss_coef=2, hidden_dim=288, img_transform=Namespace(max_size=1333, val_width=800), load_mask_head_from_model=None, 
lr=0.0002, lr_backbone=2e-05, lr_backbone_names=['backbone.0'], lr_drop=50, lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_track=0.0001, mask_loss_coef=1.0, masks=False, merge_frame_features=False, mot_path_train='data/MOT17', mot_path_val='data/MOT17', multi_frame_attention=True, multi_frame_attention_separate_encoder=True, multi_frame_encoding=True, nheads=8, no_vis=False, num_feature_levels=4, num_queries=500, num_workers=2, output_dir='models/crowdhuman_deformable_multi_frame', overflow_boxes=True, overwrite_lr_scheduler=False, overwrite_lrs=False, position_embedding='sine', pre_norm=False, resume='', resume_optim=False, resume_shift_neuron=False, resume_vis=False, save_model_interval=5, seed=42, set_cost_bbox=5.0, set_cost_class=2.0, set_cost_giou=2.0, start_epoch=1, track_attention=False, track_backprop_prev_frame=False, track_prev_frame_range=5, track_prev_frame_rnd_augs=0.01, track_prev_prev_frame=False, track_query_false_negative_prob=0.4, track_query_false_positive_eos_weight=True, track_query_false_positive_prob=0.1, tracking=True, tracking_eval=True, train_split=None, two_stage=False, val_interval=5, val_split='mot17_train_cross_val_frame_0_5_to_1_0_coco', vis_and_log_interval=50, vis_port=8097, vis_server='', weight_decay=0.0001, with_box_refine=True, world_size=2) using distributed mode | distributed init (rank 1): env:// Configuration (modified, added, typechanged, doc): aux_loss = True backbone = 'resnet50' batch_size = 1 bbox_loss_coef = 5.0 clip_max_norm = 0.1 cls_loss_coef = 2.0 coco_and_crowdhuman_prev_frame_rnd_augs = 0.2 coco_min_num_objects = 0 coco_panoptic_path = None coco_path = 'data/coco_2017' coco_person_train_split = None crowdhuman_path = 'data/CrowdHuman' crowdhuman_train_split = 'train_val' dataset = 'mot_crowdhuman' debug = False dec_layers = 6 dec_n_points = 4 deformable = True device = 'cuda' dice_loss_coef = 1.0 dilation = False dim_feedforward = 1024 dist_url = 'env://' dropout = 0.1 enc_layers = 6 enc_n_points = 4 eos_coef = 0.1 epochs = 80 eval_only = False eval_train = False focal_alpha = 0.25 focal_gamma = 2 focal_loss = True freeze_detr = False giou_loss_coef = 2 hidden_dim = 288 load_mask_head_from_model = None lr = 0.0002 lr_backbone = 2e-05 lr_backbone_names = ['backbone.0'] lr_drop = 50 lr_linear_proj_mult = 0.1 lr_linear_proj_names = ['reference_points', 'sampling_offsets'] lr_track = 0.0001 mask_loss_coef = 1.0 masks = False merge_frame_features = False mot_path_train = 'data/MOT17' mot_path_val = 'data/MOT17' multi_frame_attention = True multi_frame_attention_separate_encoder = True multi_frame_encoding = True nheads = 8 no_vis = False num_feature_levels = 4 num_queries = 500 num_workers = 2 output_dir = 'models/crowdhuman_deformable_multi_frame' overflow_boxes = True overwrite_lr_scheduler = False overwrite_lrs = False position_embedding = 'sine' pre_norm = False resume = '' resume_optim = False resume_shift_neuron = False resume_vis = False save_model_interval = 5 seed = 42 set_cost_bbox = 5.0 set_cost_class = 2.0 set_cost_giou = 2.0 start_epoch = 1 track_attention = False track_backprop_prev_frame = False track_prev_frame_range = 5 track_prev_frame_rnd_augs = 0.01 track_prev_prev_frame = False track_query_false_negative_prob = 0.4 track_query_false_positive_eos_weight = True track_query_false_positive_prob = 0.1 tracking = True tracking_eval = True train_split = None two_stage = False val_interval = 5 val_split = 'mot17_train_cross_val_frame_0_5_to_1_0_coco' vis_and_log_interval = 50 vis_port = 
8097 vis_server = '' weight_decay = 0.0001 with_box_refine = True world_size = 2 img_transform: max_size = 1333 val_width = 800 INFO - train - Completed after 0:00:00 Namespace(aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5.0, clip_max_norm=0.1, cls_loss_coef=2.0, coco_and_crowdhuman_prev_frame_rnd_augs=0.2, coco_min_num_objects=0, coco_panoptic_path=None, coco_path='data/coco_2017', coco_person_train_split=None, crowdhuman_path='data/CrowdHuman', crowdhuman_train_split='train_val', dataset='mot_crowdhuman', debug=False, dec_layers=6, dec_n_points=4, deformable=True, device='cuda', dice_loss_coef=1.0, dilation=False, dim_feedforward=1024, dist_url='env://', dropout=0.1, enc_layers=6, enc_n_points=4, eos_coef=0.1, epochs=80, eval_only=False, eval_train=False, focal_alpha=0.25, focal_gamma=2, focal_loss=True, freeze_detr=False, giou_loss_coef=2, hidden_dim=288, img_transform=Namespace(max_size=1333, val_width=800), load_mask_head_from_model=None, lr=0.0002, lr_backbone=2e-05, lr_backbone_names=['backbone.0'], lr_drop=50, lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_track=0.0001, mask_loss_coef=1.0, masks=False, merge_frame_features=False, mot_path_train='data/MOT17', mot_path_val='data/MOT17', multi_frame_attention=True, multi_frame_attention_separate_encoder=True, multi_frame_encoding=True, nheads=8, no_vis=False, num_feature_levels=4, num_queries=500, num_workers=2, output_dir='models/crowdhuman_deformable_multi_frame', overflow_boxes=True, overwrite_lr_scheduler=False, overwrite_lrs=False, position_embedding='sine', pre_norm=False, resume='', resume_optim=False, resume_shift_neuron=False, resume_vis=False, save_model_interval=5, seed=42, set_cost_bbox=5.0, set_cost_class=2.0, set_cost_giou=2.0, start_epoch=1, track_attention=False, track_backprop_prev_frame=False, track_prev_frame_range=5, track_prev_frame_rnd_augs=0.01, track_prev_prev_frame=False, track_query_false_negative_prob=0.4, track_query_false_positive_eos_weight=True, track_query_false_positive_prob=0.1, tracking=True, tracking_eval=True, train_split=None, two_stage=False, val_interval=5, val_split='mot17_train_cross_val_frame_0_5_to_1_0_coco', vis_and_log_interval=50, vis_port=8097, vis_server='', weight_decay=0.0001, with_box_refine=True, world_size=2) using distributed mode | distributed init (rank 0): env:// git: sha: d62d81023dbffb4a1820db39ce527b66df6d7b61, status: has uncommited changes, branch: main

    opened by shubham83183 6
  • TypeError: ms_deform_attn_forward(): incompatible function arguments.

    Hi,

    Cheers on the wonderful work. I am trying to run just the evaluation with 'python3 src/track.py with reid'.

    I am getting the following error:

    TypeError: ms_deform_attn_forward(): incompatible function arguments. The following argument types are supported: 1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: at::Tensor, arg5: int) -> at::Tensor Invoked with: tensor([[[[-9.0617e+00, 2.9002e+00, -5.3017e+00, ..., -1.4381e+00, 3.8348e+00, 2.1320e-01], [ 1.0434e+00, -3.8773e-01, -3.6883e+00, ..., -2.8583e+00, -5.9393e-01, 6.8181e-01], [ 1.6065e+00, 7.8195e-01, -2.3155e+00, ..., -2.0958e+00, -1.9994e-01, -1.6163e+00], ..., [-2.0559e+00, 6.3167e-02, 4.4025e+00, ..., 1.9450e+00, -8.6947e-01, 1.3416e+00], [ 2.9230e+00, 1.6198e+00, 3.9162e+00, ..., -1.7625e+00, -6.7662e-01, -2.4316e+00], [-2.7931e+00, -1.3822e-01, -1.1136e+00, ..., 1.2329e-01, 3.1032e+00, -1.0232e+00]],

    ..... device='cuda:0'), 64

    opened by harkiratbehl 6
  • Some confusion about the paper

    Hi, thanks for your great work! I have a question about your paper: in the MOT17 experiment section, is the dataset you used for testing the MOT17 test set, or a part of the training set held out as a test set?

    opened by quxu91 6
  • Evaluate TrackFormer on MOT17 with the problem with numpy

    Hello, when I tried to evaluate TrackFormer on MOT17 with python src/track.py with reid tracker_cfg.public_detections=min_iou_0_5 obj_detect_checkpoint_file=models/mot17_deformable_multi_frame/checkpoint_epoch_50.pth, I got the following problem:

    INFO - main - TRACK SEQ: MOT17-02-DPM
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 600/600 [02:15<00:00,  4.41it/s]
    INFO - main - NUM TRACKS: 96 ReIDs: 13
    INFO - main - RUNTIME: 135.96 s
    ERROR - track - Failed after 0:08:13!
    Traceback (most recent calls WITHOUT Sacred internals):
      File "src/track.py", line 153, in main
        mot_accum = get_mot_accum(results, seq_loader)
      File "/media/HardDisk_new/wh/second_code/trackformer/src/trackformer/util/track_utils.py", line 397, in get_mot_accum
        distance)
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/motmetrics/mot.py", line 252, in update
        rids, cids = linear_sum_assignment(dists)
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/motmetrics/lap.py", line 73, in linear_sum_assignment
        rids, cids = solver(costs)
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/motmetrics/lap.py", line 288, in lsa_solve_lapjv
        from lap import lapjv
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/lap/__init__.py", line 25, in <module>
        from ._lapjv import (
      File "__init__.pxd", line 199, in init lap._lapjv
    ValueError: numpy.ndarray has the wrong size, try recompiling. Expected 80, got 88
    
    

    Using python src/track.py with reid, I got the same problem:

    INFO - main - TRACK SEQ: MOT17-02-DPM
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 600/600 [02:15<00:00,  4.44it/s]
    INFO - main - NUM TRACKS: 133 ReIDs: 25
    INFO - main - RUNTIME: 135.04 s
    ERROR - track - Failed after 0:07:17!
    Traceback (most recent calls WITHOUT Sacred internals):
      File "src/track.py", line 153, in main
        mot_accum = get_mot_accum(results, seq_loader)
      File "/media/HardDisk_new/wh/second_code/trackformer/src/trackformer/util/track_utils.py", line 397, in get_mot_accum
        distance)
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/motmetrics/mot.py", line 252, in update
        rids, cids = linear_sum_assignment(dists)
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/motmetrics/lap.py", line 73, in linear_sum_assignment
        rids, cids = solver(costs)
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/motmetrics/lap.py", line 288, in lsa_solve_lapjv
        from lap import lapjv
      File "/home/wh/anaconda3/envs/trackformer/lib/python3.7/site-packages/lap/__init__.py", line 25, in <module>
        from ._lapjv import (
      File "__init__.pxd", line 199, in init lap._lapjv
    ValueError: numpy.ndarray has the wrong size, try recompiling. Expected 80, got 88
    
    opened by a171232886 5
  • RuntimeError: Error compiling objects for extension

    Hello, after installing all the requirements, I met this error when compiling the module (full log in error.txt):

    /usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’ without object
        __p->_M_set_sharable();
        ~~~~~~~~~^~
    /usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
    ninja: build stopped: subcommand failed.

    I am not sure what I can do to address this issue, so I am asking for help from anyone who has solved this problem. Thank you! My system configuration is as follows: Ubuntu 18.04 LTS, CUDA 10.1, PyTorch 1.5.

    opened by Soyad-yao 5
  • How do you select the initial track queries from the object queries?

    Thank you for the wonderful work. I have read the paper and code, and have a question about track query initialization.

    How do you select the initial track queries from the object queries in the evaluation? In the paper, the following sentences are stated,

    Each valid object detection {b00, b10, . . . } with a classification score above σobject, i.e., output embedding not predicting the background class (crossed), initializes a new track query embedding.

    After reading this, I expected to add the object queries with non-zero class labels to the new track queries. However, when looking at the code, it seems to be extracting only those that match 0.

    new_det_keep = torch.logical_and(
       new_det_scores > self.detection_obj_score_thresh,
       result['labels'][-self.num_object_queries:] == 0)
    

    I believe what is written in the paper is correct, but this implementation is beyond my understanding, could you please tell me what is happening in the implementation? Or if I have extracted the wrong part of the implementation, please let me know the correct part.
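
    To illustrate my reading of that filter with toy numbers (my assumption, since the config uses a focal-loss classification head without an explicit background logit, is that class index 0 denotes the single person class rather than background):

    import torch

    num_object_queries = 5
    detection_obj_score_thresh = 0.4

    new_det_scores = torch.tensor([0.9, 0.1, 0.6, 0.3, 0.8])  # toy scores per object query
    labels = torch.tensor([0, 0, 0, 0, 0])                    # toy predicted class indices

    new_det_keep = torch.logical_and(
        new_det_scores > detection_obj_score_thresh,
        labels[-num_object_queries:] == 0)

    print(new_det_keep)  # tensor([ True, False,  True, False,  True])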

    opened by Tsunehiko 4
  • Why don't the track queries get updated for two_stage?

    https://github.com/timmeinhardt/trackformer/blob/d62d81023dbffb4a1820db39ce527b66df6d7b61/src/trackformer/models/deformable_transformer.py#L180-L230

    I am confused why the track queries don't get updated for the two-stage.

    Also, nice work by the way!

    opened by owen24819 4
  • Using the DEMO code

    This paper is very good! When I use the demo interface code, I always get errors like this:

    $ python src/track.py with \
        dataset_name=DEMO \
        data_root_dir=data/snakeboard \
        output_dir=data/snakeboard \
        write_images=pretty
    

    WARNING - root - Changed type of config entry "write_images" from bool to str WARNING - track - No observers have been added to this run

    In the end, I'll get something like this again: if (isinstance(colors[0], Sized) and len(colors[0]) == 2 IndexError: list index out of range

    I have created a DEMO folder and put the video I'm going to demonstrate into it, but it still reports this error. I guess it's because the video I put in hasn't been converted to COCO format?

    But for a demo, I should be able to input any video and get visual results rather than having to convert it to COCO format.

    I'm very confused. How can I solve this problem?

    opened by Zachein 4
  • Use of args.multi_frame_attention

    Hi @timmeinhardt , thanks so much for this great work!

    While trying to reproduce the results for MOTS20, I noticed some differences between your DeformableDETR and the DETR implementations.

    Could you explain the use of args.multi_frame_attention in the adjusted DeformableDETR? I'm wondering why it is not used in the DETR based model for mask tracking.

    Is multi frame attention not necessary to utilise track queries in the model? I read section 4.2 in the paper, but I'm still a bit confused.

    opened by tragians 3
  • error in ms_deformable_im2col_cuda

    I have installed the MultiScaleDeformableAttention package, but these two errors appear (while the model keeps training):

    error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device
    error in ms_deformable_col2im_coord_cuda: no kernel image is available for execution on the device

    opened by hahapt 1
  • Error while train when get dataset

    Hi, when I train the model with python src/train.py with mot17 deformable multi_frame tracking output_dir=models/mot17_deformable_multi_frame, the following error occurs:

    Traceback (most recent call last):
      File "src/train.py", line 357, in <module>
        train(args)
      File "src/train.py", line 284, in train
        visualizers['train'], args)
      File "/home/ubuntu/track/trackformer/src/trackformer/engine.py", line 119, in train_one_epoch
        for i, (samples, targets) in enumerate(metric_logger.log_every(data_loader, epoch)):
      File "/home/ubuntu/track/trackformer/src/trackformer/util/misc.py", line 230, in log_every
        for obj in iterable:
      File "/home/ubuntu/anaconda3/envs/trackformer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
        data = self._next_data()
      File "/home/ubuntu/anaconda3/envs/trackformer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
        return self._process_data(data)
      File "/home/ubuntu/anaconda3/envs/trackformer/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
        data.reraise()
      File "/home/ubuntu/anaconda3/envs/trackformer/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    IndexError: Caught IndexError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/trackformer/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/ubuntu/anaconda3/envs/trackformer/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/ubuntu/anaconda3/envs/trackformer/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/ubuntu/track/trackformer/src/trackformer/datasets/mot.py", line 49, in __getitem__
        img, target = self._getitem_from_id(idx, random_state, random_jitter=False)
      File "/home/ubuntu/track/trackformer/src/trackformer/datasets/coco.py", line 63, in _getitem_from_id
        img, target = self.prepare(img, target)
      File "/home/ubuntu/track/trackformer/src/trackformer/datasets/coco.py", line 220, in __call__
        masks = convert_coco_poly_to_mask(segmentations, h, w)
      File "/home/ubuntu/track/trackformer/src/trackformer/datasets/coco.py", line 177, in convert_coco_poly_to_mask
        rles = coco_mask.frPyObjects(polygons, height, width)
      File "pycocotools/_mask.pyx", line 292, in pycocotools._mask.frPyObjects
    IndexError: list index out of range

    I found that the images and annotations in the dataset are not paired correctly. But I generated the dataset with this command: python src/generate_coco_from_mot.py, so I don't know what's wrong.

    opened by pjy125175 3
  • What does valid ratio mean?

    Hello,

    In the Deformable Transformer, there is a variable called valid_ratios which is computed from the masks: valid_ratios = torch.stack([self.get_valid_ratio(m) for m in masks], 1). If the masks are None in my case, how am I supposed to calculate it?

    Also, what is the purpose of valid_ratios? I could not find anything about it in the TrackFormer or the original Deformable DETR paper.

    I would appreciate it if you could clarify this.
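
    For reference, my current understanding (a sketch based on the original Deformable DETR code, from memory, so please correct me if it is wrong): the feature maps are padded to the largest image in the batch, the mask marks padded pixels, and the valid ratio is the fraction of each spatial dimension that contains real image content, used to scale the reference points.

    import torch

    def get_valid_ratio(mask):
        # mask: (batch, H, W) bool, True where the feature map is padding
        _, H, W = mask.shape
        valid_h = torch.sum(~mask[:, :, 0], 1)   # number of un-padded rows per sample
        valid_w = torch.sum(~mask[:, 0, :], 1)   # number of un-padded columns per sample
        return torch.stack([valid_w.float() / W, valid_h.float() / H], -1)

    # With no padding (i.e. masks effectively None), every ratio is simply 1.0:
    mask = torch.zeros(2, 32, 48, dtype=torch.bool)
    print(get_valid_ratio(mask))  # tensor([[1., 1.], [1., 1.]])

    So if the masks are None, would a tensor of ones of shape (batch, num_levels, 2) be equivalent?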

    opened by mkhoshle 1
  • Ran into trouble with MOTS

    So my code runs for MOT, but I am getting the following error when I try to run MOTS.

    Command: python src/track.py with dataset_name=MOTS20-ALL obj_detect_checkpoint_file=models/mots20_train_masks/checkpoint.pth

    Error (raised in load_state_dict):
    RuntimeError: Error(s) in loading state_dict for DETRSegmTracking:
        size mismatch for class_embed.weight: copying a param with shape torch.Size([2, 256]) from checkpoint, the shape in current model is torch.Size([21, 256]).
        size mismatch for class_embed.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([21]).

    opened by harkiratbehl 0
  • Making compatible to recent pytorch and python versions

    Hi,

    I have made some minor changes in requirements.txt and INSTALL.md to make the installation work with recent PyTorch and Python versions. I have tested with PyTorch 1.12.1 and Python 3.9.

    Hope this will be helpful to others.

    opened by harkiratbehl 1
Owner
Tim Meinhardt
Ph.D. candidate at the Dynamic Vision and Learning Group, TU Munich