Overview

MOTR: End-to-End Multiple-Object Tracking with TRansformer

This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object Tracking with TRansformer.

Introduction

TL;DR. MOTR is a fully end-to-end multiple-object tracking framework based on Transformer. It directly outputs the tracks within video sequences without any association procedure.

Abstract. The key challenge in the multiple-object tracking (MOT) task is temporal modeling of the object under track. Existing tracking-by-detection methods adopt simple heuristics, such as spatial or appearance similarity. Such methods, in spite of their commonality, are overly simple and insufficient to model complex variations, such as tracking through occlusion. Inherently, existing methods lack the ability to learn temporal variations from data. In this paper, we present MOTR, the first fully end-to-end multiple-object tracking framework. It learns to model the long-range temporal variation of the objects. It performs temporal association implicitly and avoids previous explicit heuristics. Built on Transformer and DETR, MOTR introduces the concept of “track query”. Each track query models the entire track of an object. It is transferred and updated frame-by-frame to perform object detection and tracking in a seamless manner. A temporal aggregation network combined with multi-frame training is proposed to model the long-range temporal relation. Experimental results show that MOTR achieves state-of-the-art performance.
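To make the track-query idea concrete, here is a minimal, hypothetical sketch (not the code in this repository): each tracked object owns a query embedding that is refined against the current frame together with a fixed set of learnable detect queries for new-born objects, and the surviving query embeddings are carried to the next frame. The ToyTrackQueryDecoder class, its dimensions, and the 0.5 keep threshold are illustrative assumptions.

```python
# A toy sketch of the track-query data flow (illustrative only, not MOTR's code).
import torch
import torch.nn as nn


class ToyTrackQueryDecoder(nn.Module):
    """Refines track queries (existing objects) and detect queries (new-born objects)."""

    def __init__(self, dim=256, num_detect_queries=300):
        super().__init__()
        self.detect_queries = nn.Parameter(torch.randn(num_detect_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8)
        self.score_head = nn.Linear(dim, 1)  # objectness score per query
        self.box_head = nn.Linear(dim, 4)    # (cx, cy, w, h) per query

    def forward(self, frame_feats, track_queries):
        # frame_feats: (num_tokens, 1, dim); track_queries: (num_tracks, dim)
        queries = torch.cat([track_queries, self.detect_queries], dim=0).unsqueeze(1)
        refined, _ = self.attn(queries, frame_feats, frame_feats)
        refined = refined.squeeze(1)
        scores = self.score_head(refined).sigmoid().squeeze(-1)
        boxes = self.box_head(refined).sigmoid()
        return refined, scores, boxes


decoder = ToyTrackQueryDecoder()
track_queries = torch.zeros(0, 256)            # no tracks before the first frame
for t in range(3):                             # pretend 3-frame video clip
    frame_feats = torch.randn(1024, 1, 256)    # toy per-frame image features
    refined, scores, boxes = decoder(frame_feats, track_queries)
    keep = scores > 0.5                        # queries that currently see an object
    track_queries = refined[keep].detach()     # carried to the next frame as track queries
    print(f"frame {t}: {keep.sum().item()} active track queries")
```

In MOTR itself the refinement is performed by a Deformable DETR decoder and the newborn/exit handling is learned end-to-end, but the frame-by-frame data flow follows this pattern.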

Main Results

| Method | Dataset | Train Data           | MOTA | IDF1 | IDS  | URL   |
|--------|---------|----------------------|------|------|------|-------|
| MOTR   | MOT16   | MOT17+CrowdHuman Val | 65.8 | 67.1 | 547  | model |
| MOTR   | MOT17   | MOT17+CrowdHuman Val | 66.5 | 67.0 | 1884 | model |

Note:

  1. All MOTR models are trained on 8 NVIDIA Tesla V100 GPUs.
  2. Training takes about 2.5 days for 200 epochs.
  3. Inference runs at about 7.5 FPS at a resolution of 1536x800.
  4. All MOTR models use ResNet50 and are initialized with COCO pre-trained weights.

Installation

The codebase is built on top of Deformable DETR.

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n deformable_detr python=3.7 pip

    Then, activate the environment:

    conda activate deformable_detr
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)

    For example, if your CUDA version is 9.2, you could install PyTorch and torchvision as follows:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt
  • Build MultiScaleDeformableAttention

    cd ./models/ops
    sh ./make.sh
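After the build finishes, a quick import check can confirm that the compiled extension is usable (a minimal sketch, assuming a visible CUDA GPU; MultiScaleDeformableAttention is the module name built by the Deformable DETR op package):

```python
# Verify that the compiled deformable-attention extension can be imported.
import torch
import MultiScaleDeformableAttention as MSDA  # built by ./models/ops/make.sh

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Extension loaded from:", MSDA.__file__)
```

If the ops folder is carried over unchanged from Deformable DETR, its test.py also provides a fuller numerical check of the op.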

Usage

Dataset preparation

Please download the MOT17 and CrowdHuman datasets and organize them as in FairMOT, as follows:

.
├── crowdhuman
│   ├── images
│   └── labels_with_ids
├── MOT15
│   ├── images
│   ├── labels_with_ids
│   ├── test
│   └── train
├── MOT17
│   ├── images
│   ├── labels_with_ids
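Each labels_with_ids directory holds one text file per image in the FairMOT convention: each line is the class (always 0), the track id, and a normalized center-format box. If your labels_with_ids folders are empty, they can be regenerated from the raw MOT ground truth; the sketch below is a simplified stand-in for FairMOT's official gen_labels_* scripts, and the sequence path and image resolution are placeholder assumptions (the real values come from each sequence's seqinfo.ini):

```python
# Simplified sketch of generating FairMOT-style labels_with_ids from a MOT gt.txt.
# This only illustrates the format; for real use, prefer FairMOT's official scripts.
import os
import numpy as np

seq_root = "MOT17/images/train/MOT17-02-SDP"                  # placeholder sequence
img_w, img_h = 1920, 1080                                     # from seqinfo.ini
label_root = "MOT17/labels_with_ids/train/MOT17-02-SDP/img1"
os.makedirs(label_root, exist_ok=True)

# gt.txt columns: frame, id, x, y, w, h, conf, class, visibility (x, y = top-left corner)
gt = np.loadtxt(os.path.join(seq_root, "gt/gt.txt"), dtype=np.float64, delimiter=",")
for frame, tid, x, y, w, h, conf, cls, vis in gt:
    if conf == 0 or cls != 1:        # skip ignored entries and non-pedestrian classes
        continue
    cx, cy = x + w / 2, y + h / 2    # top-left (x, y, w, h) -> center (cx, cy, w, h)
    line = "0 {:d} {:.6f} {:.6f} {:.6f} {:.6f}\n".format(
        int(tid), cx / img_w, cy / img_h, w / img_w, h / img_h)
    with open(os.path.join(label_root, "{:06d}.txt".format(int(frame))), "a") as f:
        f.write(line)
```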

Training and Evaluation

Training on single node

You can download the COCO pre-trained weights from Deformable DETR. Then train MOTR on 8 GPUs as follows:

sh configs/r50_motr_train.sh

Evaluation on MOT15

You can download the pretrained MOTR model (the link is in the "Main Results" section), then run the following command to evaluate it on the MOT15 train set:

sh configs/r50_motr_eval.sh

To visualize the results in a demo video, you can enable 'vis=True' in eval.py like:

det.detect(vis=True)
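If vis=True writes per-frame visualizations to disk (the exact output folder depends on eval.py), a small, hypothetical helper like the one below can stitch the saved frames into a demo video; the frames path, output name, and frame rate are placeholders:

```python
# Hypothetical helper: stitch saved visualization frames into an .mp4 demo video.
import glob
import cv2

frames = sorted(glob.glob("results/vis/*.jpg"))   # placeholder path to saved frames
assert frames, "no visualization frames found"
h, w = cv2.imread(frames[0]).shape[:2]
writer = cv2.VideoWriter("demo.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 20, (w, h))
for f in frames:
    writer.write(cv2.imread(f))
writer.release()
```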

Evaluation on MOT17

You can download the pretrained MOTR model (the link is in the "Main Results" section), then run the following command to evaluate it on the MOT17 test set (and submit to the evaluation server):

sh configs/r50_motr_submit.sh

Citing MOTR

If you find MOTR useful in your research, please consider citing:

@article{zeng2021motr,
  title={MOTR: End-to-End Multiple-Object Tracking with TRansformer},
  author={Zeng, Fangao and Dong, Bin and Wang, Tiancai and Chen, Cheng and Zhang, Xiangyu and Wei, Yichen},
  journal={arXiv preprint arXiv:2105.03247},
  year={2021}
}
Comments
  • evaluation on dancetrack

    Thank you for this great job. I want to know how I can evaluate the results on DanceTrack. I successfully ran this project on the DanceTrack test set, but I cannot get the final metrics such as HOTA, MOTA, etc.

    opened by iTruffle 8
  • Pretrain Model

    Could you please provide the version of the pretrained model? I downloaded r50_deformable_detr-checkpoint.pth, but there is an error when loading the state_dict for MOTR.

    opened by Ahnsun 8
  • About 'memory-optimized version' mentioned in paper

    Hi, I've read your paper from ECCV 2022 and believe that your research is truly meaningful.

    I'm trying to reproduce your experiments. In Section 4.2 of your paper, you mention that you "provide a memory-optimized version that can be trained on NVIDIA 2080 Ti GPUs", but I didn't find any details in this repository.

    Did you release the memory-optimized code? If not, will you release this part?

    Thanks a lot for your contribution.

    opened by HELLORPG 7
  • I want to know the details about dataset processing and coco_pth.

    Hi! When I run r50_motr_train.sh, I run into some problems, as follows:

    (1) As for the dataset: following FairMOT, I put MOT17 and CrowdHuman in the right paths, but the labels I downloaded differ from the required format and labels_with_ids is empty. I don't know how to generate the labels. In other words, what code should I run to generate them?

    (2) As for the pretrained model parameters: I found coco_model_final.pth in Deformable DETR, but there are 5 model .pth files in Main Results and I don't know which one I need.

    If anyone knows the answer, please reply; I'd really appreciate it.

    opened by Z-Yh-June 5
  • Performance on multiclass dataset

    Hi, I found that the performance of the model on the multiclass dataset BDD100K is lower than that of traditional detector-based trackers such as QDTrack, and I think this also happens with other transformer-based trackers such as GTR. In my experiments, mIDF1 could reach about 50 and IDF1 about 70 using QDTrack on BDD100K, but transformer-based trackers do not work as well. However, on single-class tracking datasets like MOT17, transformer-based models work pretty well. Have you done any experiments on this, and do you have any insights into the possible reasons? Please correct me if I am wrong, thank you!

    opened by briannlongzhao 3
  • Questions about track queries (existing) and object queries (new born)

    Thanks for the impressive work! I think both the code and the paper are very interesting and insightful. However, I have a question about the track query and object query design. According to my understanding, when tracking, you concatenate the track queries and the object queries to detect and track existing and new-born objects. I am a little confused about how you prevent the new-born object queries from detecting the existing objects. To better describe this, here is a small example.

    For example, frame_0 (the initial frame) detects 2 objects, and these 2 objects' corresponding query features are concatenated to the new-born queries, which makes the frame_1 (next frame) object queries 302x256, where 302 is the total number of object queries. Since the first 300 object queries' positions and features are randomly initialized, is there a mechanism or module in MOTR that prevents these 300 object queries from re-detecting the 2 objects detected in the initial frame? Thank you so much!

    opened by CeZh 3
  • Growing memory

    Hi, I found that when I use 8 2080Ti GPUs to train this model, the GPU memory occupation is 6/8 at the initial stage, but the GPUs soon run out of memory. Do you have an explanation for this? And what is the suggested setup for training the MOTR model?

    opened by sjtuytc 3
  • test problem

    run: sh configs/r50_motr_submit.sh

    result:

        ....
        filter init box 442 44
        filter init box 443 44
        save init track 443 45
        save init track 444 44
        save init track 449 46
        totally 13 boxes are filtered.
        totally 4107 dts 0 occlusion dts
        ......

    However, no accuracy metrics or errors are printed.

    opened by eeric 3
  • Question

    Hi, thank you for this useful and very interesting work.

    I have a question about the following;

    1. How can I view a demo video showing the detected and tracked objects?
    2. What is the total time used to train the model, and what is the inference speed (in frames per second) during testing?
    3. Where should I put the dataset containing images and labels_with_ids?
    4. Can you share instructions on how to use the model on a custom dataset?

    Thank you.

    opened by NaifahNurya 3
  • Adaptation to DanceTrack

    Hi,

    Thanks for the wonderful work.

    May I ask, compared to the implementation on the MOT17 dataset, what adaptations did you make for DanceTrack? I noticed the change of image resolution during training, etc. Just in case I missed anything. Thank you.

    opened by noahcao 2
  • full evaluation results on MOT17.

    Thanks for the great work.

    Looks like I can't find MOTR's results on the MOT17 leaderboard. Could you please provide the full evaluation results, including HOTA, AssA, Frag, and so on, to give a better sense of its performance?

    Moreover, have you tried MOTR on MOT20 as well?

    opened by noahcao 2
  • Demo encountering Error

    I get the following error when trying to run the demo code:

    Traceback (most recent call last):
      File "demo.py", line 284, in <module>
        detector.run()
      File "demo.py", line 254, in run
        res = self.model.inference_single_image(cur_img.cuda().float(), (self.dataloader.seq_h, self.dataloader.seq_w), track_instances)
      File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
        return func(*args, **kwargs)
      File "/h1/abhishek/blog/MOTR/models/motr.py", line 586, in inference_single_image
        track_instances=track_instances)
      File "/h1/abhishek/blog/MOTR/models/motr.py", line 515, in _forward_single_image
        hs, init_reference, inter_references, enc_outputs_class, enc_outputs_coord_unact = self.transformer(srcs, masks, pos, track_instances.query_pos, ref_pts=track_instances.ref_pts)
      File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 162, in forward
        memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
      File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 266, in forward
        output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
      File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 237, in forward
        src = self.forward_ffn(src)
      File "/h1/abhishek/blog/MOTR/models/deformable_transformer_plus.py", line 225, in forward_ffn
        src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
      File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "/root/miniconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/nn/functional.py", line 1612, in linear
        output = input.matmul(weight.t())
    RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
    

    I have been running your demo in my Docker container with the following configuration:

    FROM nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04
    
    RUN apt-get update
    RUN apt-get install ffmpeg libsm6 libxext6  -y
    
    opened by abhiag719 0
  • About ego-motion

    Thanks for your work! When this algorithm is applied to datasets with ego-motion, how does it perform? And how could it be improved for deployment on mobile robots, taking ego-motion into account?

    opened by noresppon 0
  • I want to know MOTR's performance on the MOT17 train dataset because I got a very high MOTA

    Hi! Congratulations on your nice work. When I try to use the provided model to run eval.py, I get very surprising results: [screenshot]

    Then I used my own model, trained for 70+ epochs on CrowdHuman and MOT17, to evaluate and got the following result: [screenshot]

    I am shocked by its performance on the train dataset, so I wonder: is this normal?

    opened by Soulmate7 0
  • GT Boxes

    Hi, thanks for sharing the code, but I am confused about the bbox format, as follows. In dataset/joint.py, class DetMOTDetection, def _pre_single_frame(self, idx: int):

        if osp.isfile(label_path):
            labels0 = np.loadtxt(label_path, dtype=np.float32).reshape(-1, 6)

            # normalized cewh to pixel xyxy format
            labels = labels0.copy()
            labels[:, 2] = w * (labels0[:, 2] - labels0[:, 4] / 2)
            labels[:, 3] = h * (labels0[:, 3] - labels0[:, 5] / 2)
            labels[:, 4] = w * (labels0[:, 2] + labels0[:, 4] / 2)
            labels[:, 5] = h * (labels0[:, 3] + labels0[:, 5] / 2)

    The dataset I am using is MOT17. It seems that the bbox format provided by the GT files is (x, y, w, h), but in the function above the GT boxes are used as (center_x, center_y, w, h). I am not sure whether there is something wrong with this?

    opened by EddieEduardo 1
  • Attention Map

    Hi, is there any way to generate the output attention maps of the model.transformer.decoder.layers[i].cross_attn layer? When I follow the referenced functions, I eventually get stuck at the MSDA.ms_deform_attn_forward call in the forward method of the MSDeformAttnFunction class, located in ./models/ops/functions/ms_deform_attn_func.py, and I couldn't find any argument to set to True to get the attention maps in the output.

    Referenced code (screenshots omitted): DeformableTransformerDecoderLayer in ./models/deformable_transformer_plus.py, ./models/ops/modules/ms_deform_attn.py, and ./models/ops/functions/ms_deform_attn_func.py.

    opened by amindehnavi 0