Global Tracking Transformers, CVPR 2022

Overview

Global Tracking Transformers

Global Tracking Transformers,
Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl,
CVPR 2022 (arXiv 2203.13250)

Features

  • Object association within a long temporal window (32 frames).

  • Classification after tracking for long-tail recognition.

  • "Detector" of global trajectories.
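
Taken together, these features treat tracking as "detecting" whole trajectories: a transformer takes box features from every frame in the temporal window together with a set of trajectory queries, and scores how well each detection matches each trajectory. Below is a minimal sketch of that association step, written as an editor's illustration with made-up module and parameter names rather than the repository's implementation (the real model additionally allows an "empty" assignment and runs inference in a sliding window):

    import torch
    import torch.nn as nn

    class GlobalAssociationSketch(nn.Module):
        """Trajectory queries cross-attend to all detections in a window and
        yield per-frame soft assignments (illustrative names, not GTR's API)."""
        def __init__(self, dim=256, num_queries=64, nhead=8):
            super().__init__()
            self.queries = nn.Embedding(num_queries, dim)         # one query per candidate trajectory
            self.decoder = nn.TransformerDecoderLayer(dim, nhead, batch_first=True)
            self.proj = nn.Linear(dim, dim)

        def forward(self, det_feats, frame_ids):
            # det_feats: (N, dim) features of all detections in the window
            # frame_ids: (N,) frame index of each detection
            q = self.decoder(self.queries.weight.unsqueeze(0),
                             det_feats.unsqueeze(0)).squeeze(0)    # (Q, dim) updated trajectory queries
            scores = q @ self.proj(det_feats).t()                  # (Q, N) query-detection affinities
            assoc = torch.zeros_like(scores)
            for f in frame_ids.unique():                           # softmax over each frame's detections
                m = frame_ids == f
                assoc[:, m] = scores[:, m].softmax(dim=1)
            return assoc  # detection i joins the trajectory with the highest column score

    # e.g. a 32-frame window with 10 detections per frame:
    feats, fids = torch.randn(320, 256), torch.arange(32).repeat_interleave(10)
    print(GlobalAssociationSketch()(feats, fids).shape)            # torch.Size([64, 320])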

Installation

See installation instructions.

Demo

Run our demo using Colab (no GPU needed).

We use the default detectron2 demo interface. For example, to run the TAO model on an example video (video source: the TAO/YFCC100M dataset), download the model and run:

python demo.py --config-file configs/GTR_TAO_DR2101.yaml --video-input docs/yfcc_v_acef1cb6d38c2beab6e69e266e234f.mp4 --output output/demo_yfcc.mp4 --opts MODEL.WEIGHTS models/GTR_TAO_DR2101.pth

If set up correctly, the tracking visualization is saved to output/demo_yfcc.mp4.

Benchmark evaluation and training

Please first prepare datasets, then check our MODEL ZOO to reproduce results in our paper. We highlight key results below:

  • MOT17 test set

    MOTA  IDF1  HOTA  DetA  AssA  FPS
    75.3  71.5  59.1  61.6  57.0  19.6

  • TAO test set

    Track mAP  FPS
    20.1       11.2
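
Evaluation uses train_net.py with --eval-only. For example, assuming the released TAO checkpoint has been downloaded to models/ (as in the demo above), TAO validation can be run with:

    python train_net.py --config-file configs/GTR_TAO_DR2101.yaml --eval-only MODEL.WEIGHTS models/GTR_TAO_DR2101.pth

MOT17 evaluation works the same way with configs/GTR_MOT_FPN.yaml and the corresponding weights.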

License

The majority of GTR is licensed under the Apache 2.0 license; however, portions of the project are available under separate license terms: trackeval, in gtr/tracking/trackeval/, is licensed under the MIT license, and FairMOT, in gtr/tracking/local_tracker, is under the MIT license. Please see NOTICE for license details. The demo video is from the TAO dataset, which is originally from the YFCC100M dataset. Please be aware of the original dataset license.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2022global,
  title={Global Tracking Transformers},
  author={Zhou, Xingyi and Yin, Tianwei and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2022}
}
Issues
  • Training memory issue & missing file

    Hello, thanks for sharing the source code of this nice work!

    I have tried the TAO training code (GTR_TAO_DR2101.yaml), but full training fails due to an out-of-memory error. Memory usage seems to increase gradually during training until it reaches the limit. As I am currently using an A6000 with 48 GB of GPU memory, this should be enough based on your training spec (4x 32 GB V100 GPUs). Could you give any ideas? My initial workaround is to reduce the video length from 8 to 2.

    Moreover, I cannot find the move_tao_keyframes.py file. Could you please provide this file?

    Thanks,

    opened by tkdtks123 4
  • Error Running Demo

    Hello, I'm having trouble running the inference (the "Demo" section in the README). Below is a notebook link showing the setup and error.

    Here is the link to the notebook.

    Let me know if anything else needs to be provided.

    Much appreciated!

    opened by alckasoc 3
  • Not able to run in x86 in CPU

    Hi @xingyizhou @noahcao, thank you for sharing this work. When I try to run the script on my x86 machine on CPU with $ python demo.py --config-file configs/GTR_TAO_DR2101.yaml --video-input docs/yfcc_v_acef1cb6d38c2beab6e69e266e234f.mp4 --output output/demo_yfcc.mp4 --opts MODEL.WEIGHTS GTR_TAO_DR2101.pth, I get the following error:

    Traceback (most recent call last):
      File "/home/sravan/SAT/Tracker/GTR/demo.py", line 161, in <module>
        for vis_frame in demo.run_on_video(video):
      File "/home/sravan/SAT/Tracker/GTR/gtr/predictor.py", line 147, in run_on_video
        outputs = self.video_predictor(frames)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/gtr/predictor.py", line 103, in __call__
        predictions = self.model(inputs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/gtr/modeling/meta_arch/gtr_rcnn.py", line 61, in forward
        return self.sliding_inference(batched_inputs)
      File "/home/sravan/SAT/Tracker/GTR/gtr/modeling/meta_arch/gtr_rcnn.py", line 81, in sliding_inference
        instances_wo_id = self.inference(
      File "/home/sravan/SAT/Tracker/GTR/gtr/modeling/meta_arch/custom_rcnn.py", line 107, in inference
        features = self.backbone(images.tensor)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/detectron2/modeling/backbone/fpn.py", line 126, in forward
        bottom_up_features = self.bottom_up(x)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/third_party/CenterNet2/centernet/modeling/backbone/res2net.py", line 630, in forward
        x = stage(x)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/SAT/Tracker/GTR/third_party/CenterNet2/centernet/modeling/backbone/res2net.py", line 457, in forward
        sp = self.convs[i](sp, offset, mask)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/detectron2/layers/deform_conv.py", line 474, in forward
        x = modulated_deform_conv(
      File "/home/sravan/anaconda3/lib/python3.9/site-packages/detectron2/layers/deform_conv.py", line 211, in forward
        raise NotImplementedError("Deformable Conv is not supported on CPUs!")
    NotImplementedError: Deformable Conv is not supported on CPUs!

    How can I solve this?
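
    A minimal preflight check (an editor's sketch, not part of the repository) that surfaces this constraint before the model is even built:

        import torch

        # The default GTR backbone uses detectron2's deformable convolutions,
        # which are only implemented for CUDA devices.
        if not torch.cuda.is_available():
            raise SystemExit("No CUDA device found: the default GTR configs cannot run on CPU.")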

    opened by navaravan 2
  • A question about the speed

    Thanks for releasing this great work. May I ask for more details about the speed evaluation?

    For TAO data, as you used the default detectron2, may I know whether the 11.2 FPS includes the detectron2 detection inference time, or only the GTR inference time? Also, since the TAO video sampling rate may not be 30 FPS, does this factor need to be taken into account when converting the inference speed?

    Thanks.

    opened by fandulu 2
  • can't evaluate on MOT17

    Hi Xingyi,

    I believe the guidelines you wrote in the doc have an issue. To be precise, directly evaluating on MOT17 with:

    python train_net.py --config-file configs/GTR_MOT_FPN.yaml --eval-only MODEL.WEIGHTS  output/GTR_MOT/GTR_MOT_FPN/model_0004999.pth
    

    we get the following error: gtr.tracking.trackeval.utils.TrackEvalException: GT file not found for sequence: MOT17-02-FRCNN

    Besides, to evaluate on the self-split half-val, I assume we need a "gt_val_half.txt" file under each sequence's directory?

    Could you help double-check whether your guidelines work with the current version and meet the requirements of the TrackEval library you adopted? I think you may have missed some guidelines about data splitting and preparation.
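
    For reference, a hedged sketch of one way to generate such per-sequence gt_val_half.txt files, keeping only the second half of each sequence's frames from the standard MOT gt.txt (the dataset path and the exact split/renumbering convention are assumptions and should be checked against the data-preparation doc):

        import glob, os

        # Assumed layout (illustrative): datasets/mot/MOT17/train/<SEQ>/gt/gt.txt
        # MOT gt.txt columns: frame,id,x,y,w,h,conf,class,visibility
        for gt_path in glob.glob("datasets/mot/MOT17/train/*/gt/gt.txt"):
            rows = [line.strip().split(",") for line in open(gt_path) if line.strip()]
            half = max(int(r[0]) for r in rows) // 2
            val_rows = [r for r in rows if int(r[0]) > half]  # keep the second half of the frames
            with open(os.path.join(os.path.dirname(gt_path), "gt_val_half.txt"), "w") as f:
                f.writelines(",".join(r) + "\n" for r in val_rows)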

    opened by noahcao 2
  • Typos in training guidelines?

    Hi Xingyi,

    Thanks for the wonderful work. I tried to run the training on MOT17 following the guidelines, but I found some potential typos that get in the way:

    1. Should we rename the MOT17 train directory to trainval? This is not explained in the prepare-datasets doc.
    2. Should the datasets for training be ("mot17_halftrain","crowdhuman_train") instead of ("mot17_halftrain","crowdhuman_amodal_train") in the config file? The latter raises an unregistered-dataset error.
    opened by noahcao 2
  • Joint or separate training

    Nice work! Thank you for sharing the code.

    Is the training of the detector and tracker joint or separate? From the paper (Section 5.2), it seems that the detector is trained first, then frozen, and the tracker is fine-tuned afterwards. Is that the right reading?

    Thanks Gurkirt

    opened by gurkirt 2
  • about lvis version

    Hi there! Thanks for your work.

    Here I have 2 questions about the version of lvis dataset:

    1. Why did you use v1.0 instead of v0.5?
    2. Could you please show me the code that re-maps the labels of v1.0 back to v0.5?

    Looking forward to your reply!

    opened by HanGuangXin 1
  • Add Web Demo & Docker environment

    Hey @xingyizhou ! 👋

    Nice work on the global tracking transformer!

    I'm from Replicate, where we're trying to make machine learning reproducible. We noticed you have registered an account with us, and this pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! View it here: https://replicate.com/xingyizhou/gtr. The Dockerfile can be found under the tab 'run model with docker'. The demo makes it easy for anyone to upload a customised video and see the result effortlessly.

    We usually add some examples to the page for unregistered users (it looks like the screenshot below), but we'd like to invite you to claim the page so you can own it, customise the example gallery as you like, and push any future updates to the web demo; we'll also feature it on our website and tweet about it. You can find the 'Claim this model' button at the top of the page.

    Thank you!

    [screenshot]

    opened by chenxwh 1
  • Difference between GTR_MOT_FPN and GTR_MOTFull_FPN

    Hi, I cannot find details, either here or in the paper, on the differences between these two models. The configurations are identical, but they differ in the training dataset (half vs. full).

    In the paper you said: "We follow CenterTrack [68] and split each training sequence in half. We use the first half for training and the second half for validation". But the results in Table 3 seem to be obtained with GTR_MOTFull_FPN.

    Which of the two models should be considered the "best" one? May I have more information about this?

    Thank you so much in advance.

    opened by pietro-nardelli 0
  • CenterNet vs CenterNet2

    Hi, I see that in the configs some models use CenterNet while others use CenterNet2. Is there any reason for using one over the other in different models? Thank you!

    opened by briannlongzhao 0
  • produce results file on TAO test set

    Hi, could you please show me how to get the results on TAO test set, which will be uploaded to the challenge server?

    I tried the following command, but it didn't work:

    python train_net.py --config-file configs/GTR_TAO_DR2101.yaml --eval-only MODEL.WEIGHTS models/GTR_TAO_DR2101.pth DATASETS.TEST ('tao_test',)

    opened by HanGuangXin 1
  • Reproducing Transformer Fine Tuning - TAO

    I'm following the instructions to reproduce the transformer-head fine-tuning on TAO here: https://github.com/xingyizhou/GTR/blob/master/docs/MODEL_ZOO.md#tao, and I can't seem to get the results reported in the MODEL_ZOO or the paper.

    Here are the steps I'm following:

    1. Download and set up the datasets as described here: https://github.com/xingyizhou/GTR/tree/master/datasets
    2. Download the trained detection model C2_LVISCOCO_DR2101_4x.pth from the link in the third bullet point under the note section in TAO and place it in a models/ directory. The link for the config is broken in this bullet point, but I'm using the C2_LVISCOCO_DR2101_4x.yaml in the configs/ folder.
    3. Run python train_net.py --num-gpus 8 --config-file configs/C2_LVISCOCO_DR2101_4x.yaml MODEL.WEIGHTS models/C2_LVISCOCO_DR2101_4x.pth. This took about 6 days on 8 Titan X GPUs.

    The reason I believe it didn't train properly is that when I run TAO validation on the output model of the training using python train_net.py --config-file configs/GTR_TAO_DR2101.yaml --eval-only MODEL.WEIGHTS output/GTR_TAO_first_train/C2_LVISCOCO_DR2101_4x/model_final.pth, the mAP is 10.6, but when I run TAO validation on the pretrained model GTR_TAO_DR2101.pth downloaded from the MODEL_ZOO (python train_net.py --config-file configs/GTR_TAO_DR2101.yaml --eval-only MODEL.WEIGHTS models/GTR_TAO_DR2101.pth), the output is the correct 22.5 mAP as reported.

    Any ideas why the model training isn't working correctly? Am I using the wrong configurations or something?

    opened by abhik-nd 1
Owner
Xingyi Zhou
CS Ph.D. student at UT Austin.