TrTr: Visual Tracking with Transformer


We propose a novel tracker network based on the Transformer encoder-decoder architecture, a powerful attention mechanism, to capture global and rich contextual interdependencies. In this new architecture, features of the template image are processed by a self-attention module in the encoder to learn strong context information, which is then sent to the decoder to compute cross-attention with the search image features processed by another self-attention module. In addition, we design the classification and regression heads using the output of the Transformer to localize the target based on a shape-agnostic anchor. We extensively evaluate our tracker, TrTr, on several benchmarks, and our method performs favorably against state-of-the-art algorithms.
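The data flow described above (template self-attention in the encoder, search self-attention plus cross-attention in the decoder) can be illustrated with a toy sketch. This is not the TrTr implementation: it uses plain Python lists, a single attention head, and no learned projections, positional encodings, residual connections, or feed-forward layers. The function names and the toy feature values are assumptions made for illustration only.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention: each query attends over all keys
    # and returns a weighted average of the values.
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs

# Toy features (hypothetical): 2 template tokens and 3 search tokens, 2 dims each.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[1.0, 1.0], [0.5, 0.0], [0.0, 2.0]]

# Encoder: self-attention over the template features.
memory = attention(template, template, template)
# Decoder, step 1: self-attention over the search features.
search_ctx = attention(search, search, search)
# Decoder, step 2: cross-attention, search queries against the template memory.
fused = attention(search_ctx, memory, memory)

print(len(fused), len(fused[0]))  # 3 2
```

The fused output has one vector per search token, which is the kind of per-location feature that classification and regression heads could then consume.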

Network architecture of TrTr for visual tracking

Installation

Install dependencies

$ ./install.sh ~/anaconda3 trtr 

note1: this assumes your anaconda installation path is ~/anaconda3.

note2: please select a proper cuda-toolkit version to install PyTorch from conda. The default is 10.1. For an RTX 3090, however, please select 11.0, in which case the installation command becomes $ ./install.sh ~/anaconda3 trtr 11.0.

Activate conda environment

$ conda activate trtr

Quick Start: Using TrTr

Webcam demo

Offline Model

$ python demo.py --tracker.checkpoint networks/trtr_resnet50.pth --use_baseline_tracker

Online Model

$ python demo.py --tracker.checkpoint networks/trtr_resnet50.pth

image sequences (png, jpeg)

add option --video_name ${video_dir}

video (mp4 or avi)

add option --video_name ${video_name}

Benchmarks

Download testing datasets

Please read this README.md to prepare the dataset.

Basic usage

Test tracker

$ cd benchmark
$ python test.py --cfg_file ../parameters/experiment/vot2018/offline.yaml
  • --cfg_file: the YAML file containing the hyper-parameters for each dataset. Please check ./benchmark/parameters/experiment for more YAML files.
    • online model for VOT2018: python test.py --cfg_file ../parameters/experiment/vot2018/online.yaml
    • online model for OTB: python test.py --cfg_file ../parameters/experiment/otb/online.yaml
  • --result_path: optional parameter specifying the directory in which to store the tracking results. The default value is results, which generates ./benchmark/results/${dataset_name}.
  • --model_name: optional parameter specifying the tracker name under the result path. The default value is trtr, which yields the tracker directory ./benchmark/results/${dataset_name}/trtr.
  • --vis: visualize tracking.
  • --repetition: number of repetitions. For example, assign --repetition 15 for the VOT benchmark to follow the official evaluation.
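The options above can be assembled programmatically when several configurations need to be tested in a row. The following is a minimal sketch; the helper names build_test_command and expected_result_dir are hypothetical, and it assumes that --vis is a plain on/off flag that takes no value, which the option list does not state explicitly.

```python
import shlex
from pathlib import Path

def build_test_command(cfg_file, result_path="results", model_name="trtr",
                       repetition=None, vis=False):
    # Assemble the argument list for test.py from the documented options.
    cmd = ["python", "test.py", "--cfg_file", cfg_file,
           "--result_path", result_path, "--model_name", model_name]
    if repetition is not None:
        cmd += ["--repetition", str(repetition)]
    if vis:
        # Assumption: --vis is a boolean switch without an argument.
        cmd.append("--vis")
    return cmd

def expected_result_dir(dataset_name, result_path="results", model_name="trtr"):
    # Directory layout described above: ./benchmark/<result_path>/<dataset>/<model>.
    return Path("benchmark") / result_path / dataset_name / model_name

cmd = build_test_command("../parameters/experiment/vot2018/online.yaml", repetition=15)
print(shlex.join(cmd))
print(expected_result_dir("VOT2018"))
```

The printed command is meant to be run from the benchmark directory, for example by passing the list to subprocess.run(cmd, cwd="benchmark").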

Eval tracker

$ cd benchmark
$ python eval.py
  • --dataset: specifies the benchmark. The default value is VOT2018. For other benchmarks, pass their name, e.g., OTB, VOT2019, UAV, etc.
  • --tracker_path: specifies the result directory. The default value is ./benchmark/results. This corresponds to the --result_path parameter of python test.py.
  • --num: specifies the number of threads used to evaluate multiple tracker results. The default is 1.

(Optional) Hyper-parameter search

$ python hp_search.py --tracker.checkpoint ../networks/trtr_resnet50.pth --tracker.search_sizes 280 --separate --repetition 1  --use_baseline_tracker --tracker.model.transformer_mask True

Train

Download training datasets

Please read this README.md to prepare the training dataset.

Download VOT2018 dataset

  1. Please download the VOT2018 dataset following [this README], which is necessary for testing the model during training.
  2. Alternatively, you can skip this testing process by setting several parameters, which are explained later.

Train with single GPU

$ python main.py  --cfg_file ./parameters/train/default.yaml --output_dir train

note1: please check ./parameters/train/default.yaml for the training parameters.

note2: --output_dir assigns the path where the training result is stored. The above command generates ./train.

note3: you may have to raise the open-file limit: ulimit -n 8192. Adding this to ~/.bashrc may be more convenient.

note4: you can set a larger value for --benchmark_start_epoch than for --epochs to skip the benchmark test, e.g., --benchmark_start_epoch 21 and --epochs 20.
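The skip rule in note4 can be illustrated with a small sketch. It assumes, without confirmation from the training script, that epochs are counted from 0 and that the benchmark test runs at every epoch whose index is at least --benchmark_start_epoch. The function name benchmark_epochs is hypothetical.

```python
def benchmark_epochs(epochs, benchmark_start_epoch):
    # Assumption: epochs are indexed 0 .. epochs-1 and the benchmark test
    # runs at every epoch index >= benchmark_start_epoch.
    return [e for e in range(epochs) if e >= benchmark_start_epoch]

# With --benchmark_start_epoch 21 and --epochs 20 no epoch qualifies,
# so the benchmark test never runs.
print(benchmark_epochs(20, 21))  # []

# With --benchmark_start_epoch 8 and --epochs 10 only the last epochs are tested.
print(benchmark_epochs(10, 8))   # [8, 9]
```

Under this assumption, any start value greater than or equal to the number of epochs skips the benchmark entirely, so the VOT2018 dataset would not be needed during training.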

debug mode for quickly checking the training process:

$ python main.py  --cfg_file ./parameters/train/default.yaml  --batch_size 16 --dataset.paths ./datasets/yt_bb/dataset/Curation  ./datasets/vid/dataset/Curation/ --dataset.video_frame_ranges 3 100  --dataset.num_uses 100 100  --dataset.eval_num_uses 100 100  --resume networks/trtr_resnet50.pth --benchmark_start_epoch 0 --epochs 10

Multi GPUs

multi GPUs in single machine

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --cfg_file ./parameters/train/default.yaml --output_dir train

--nproc_per_node: the number of GPUs to use. The above command uses two GPUs in one machine.

multi GPUs in multi machines

Master Machine

$ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr="${MASTER_IP_ADDRESS}" --master_port=${port} --use_env main.py --cfg_file ./parameters/train/default.yaml --output_dir train  --benchmark_start_epoch 8
  • --nnodes: the number of machines to use. The above command uses two machines.
  • --node_rank: the id of each machine. The master should be 0.
  • --master_addr: the IP address of the master machine
  • --master_port: an open port (e.g., 8080)

Slave1 Machine

$ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr="${MASTER_IP_ADDRESS}" --master_port=${port} --use_env main.py --cfg_file ./parameters/train/default.yaml
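Because the per-machine commands differ only in --node_rank and in the arguments passed to main.py, they can be generated from one template. The sketch below only builds the command lines and does not launch anything. The helper name launch_command is hypothetical, and the address 192.0.2.10 is a placeholder from the documentation address range, not a real master IP.

```python
import shlex

def launch_command(node_rank, nnodes, nproc_per_node, master_addr, master_port,
                   script_args):
    # Build the torch.distributed.launch command for one machine,
    # using the same flags as the commands shown above.
    return ["python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={nproc_per_node}",
            f"--nnodes={nnodes}",
            f"--node_rank={node_rank}",
            f"--master_addr={master_addr}",
            f"--master_port={master_port}",
            "--use_env", "main.py", *script_args]

common = ["--cfg_file", "./parameters/train/default.yaml"]
master_only = ["--output_dir", "train", "--benchmark_start_epoch", "8"]

nnodes, nproc_per_node = 2, 2
for rank in range(nnodes):
    # Rank 0 is the master; it also receives the output and benchmark options.
    args = common + (master_only if rank == 0 else [])
    print(shlex.join(launch_command(rank, nnodes, nproc_per_node,
                                    "192.0.2.10", 8080, args)))

# Total number of training processes across all machines.
print(nnodes * nproc_per_node)  # 4
```

Each printed line is meant to be run on the machine with the matching rank, with the placeholder address replaced by the real master IP.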
Comments
  • About dataset creation for det and yt_bb

    About dataset creation for det and yt_bb

    I tried to download and create the datasets for det and yt_bb. The README says to use par_crop.py and curate_video.py for each, but there is only curate.py in the folder. Also, when I run curate.py for det and yt_bb, the resulting JSON file contains only "{}" in both cases.

    スクリーンショット (45)

    opened by FukushimaDaiki 2
  • Training Problem

    Training Problem

    layer_error

    When I run main.py for training, the following error occurs:

    "RuntimeError: Error(s) in loading state_dict for TRTR:"

    I don't know how to change the argument.

    opened by FukushimaDaiki 2
  • TrTr_Test Problem

    TrTr_Test Problem

    image

    image

    I ran test.py for VOT2018, but it raised an error: Benchmark dataset inference error: 'Configuration check failed:: No action for key 'tracker.model.backbone.transformer.enc_layers' to chenck value'.

    Have you encountered this problem, and how can it be solved?

    opened by DavidZhangdw 2
  • run demo with weights from yolov5

    run demo with weights from yolov5

    I am trying to run the demo with a model trained on yolov5, and I am getting an error: No module named 'models.yolo'. Is TrTr compatible with yolo weights? Do I need to convert the yolo weights (*.pt) to a format compatible with TrTr, and if so, how? Thanks,

    opened by chyphen7 0
  • Test questions after training.

    Test questions after training.

    After training a model, an error occurs during testing: key=lambda x: int(os.path.basename(x)).split('.')[0] ValueError: invalid literal for int() with base 10: 'practical'

    opened by zws198 2
  • code problem

    code problem

    https://github.com/tongtybj/TrTr/blob/master/datasets/augmentation.py Is line 174 wrong? The result is assigned to bbox. Shouldn't it be assigned to mask?

    def _flip_aug(self, image, bbox, mask):
        image = cv2.flip(image, 1)
        width = image.shape[1]
        bbox = Corner(width - 1 - bbox.x2, bbox.y1, width - 1 - bbox.x1, bbox.y2)
        bbox = Corner(width - 1 - mask[2], mask[1], width - 1 - mask[0], mask[3])
        return image, bbox, mask

    opened by Giveupfree 1
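The horizontal flip discussed in the last comment can be sketched in isolation. This is not the repository's code: it assumes that Corner is a four-field tuple of (x1, y1, x2, y2) and that the mask is also given as four corner coordinates, which is inferred only from the indexing in the quoted snippet. The helper name flip_box_horizontally is hypothetical.

```python
from collections import namedtuple

# Assumption: a corner-format box with inclusive pixel coordinates.
Corner = namedtuple("Corner", "x1 y1 x2 y2")

def flip_box_horizontally(box, width):
    # Mirror a box across the vertical centre line of an image that is
    # `width` pixels wide. The left and right edges swap roles.
    return Corner(width - 1 - box[2], box[1], width - 1 - box[0], box[3])

width = 100
bbox = Corner(10, 20, 30, 40)
mask = Corner(0, 5, 49, 60)

# Each box is flipped and assigned back to its own variable.
bbox = flip_box_horizontally(bbox, width)
mask = flip_box_horizontally(mask, width)

print(bbox)  # Corner(x1=69, y1=20, x2=89, y2=40)
print(mask)  # Corner(x1=50, y1=5, x2=99, y2=60)
```

Applying the same flip twice returns the original box, which is a quick sanity check for this kind of augmentation.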
Owner
趙 漠居(Zhao, Moju)
Project Lecturer at the University of Tokyo.