Dataset and code for the papers "DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV 2021) and "Depth-only Object Tracking" (BMVC 2021)

Overview

DeT and DOT

Code and datasets for

  1. "DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV2021)
  2. "Depth-only Object Tracking" (BMVC2021)
@InProceedings{yan2021det,
    author    = {Yan, Song and Yang, Jinyu and Kapyla, Jani and Zheng, Feng and Leonardis, Ales and Kamarainen, Joni-Kristian},
    title     = {DepthTrack: Unveiling the Power of RGBD Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {10725-10733}
}

@InProceedings{yan2021dot,
  title       = {Depth-only Object Tracking},
  author      = {Yan, Song and Yang, Jinyu and Leonardis, Ales and Kamarainen, Joni-Kristian},
  booktitle   = {Proceedings of the British Machine Vision Conference (BMVC)},
  year        = {2021},
  organization= {British Machine Vision Association}
}

The settings are the same as those of PyTracking; please read the PyTracking documentation for details.

Generated Depth

We highly recommend generating high-quality depth data from existing RGB tracking benchmarks, such as LaSOT, GOT-10k, TrackingNet, and COCO.

We show examples of the generated depth here. The first row shows results from HighResDepth for LaSOT RGB images, the second and third rows are from DenseDepth for GOT-10k and COCO RGB images, and the fourth row shows failure cases in which the targets are too close to the background or floor. The last row is from DenseDepth for CDTB RGB images.

Examples of generated depth images

In our paper, we used the DenseDepth monocular depth estimation method. We calculated the Ordinal Error (ORD) on the generated depth for CDTB and our DepthTrack test set; the mean ORD is about 0.386, which is sufficient for training D or RGBD trackers, as we have verified in our works.
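For reference, below is a minimal sketch of the pairwise ordinal-error formulation common in relative-depth estimation work. The random-pair sampling scheme and the equality threshold tau are assumptions; the exact evaluation script used in the paper may differ.

import numpy as np

def ordinal_error(pred, gt, n_pairs=10000, tau=1.02, seed=0):
    # Fraction of randomly sampled pixel pairs whose depth ordering
    # (farther / closer / roughly equal, threshold tau) disagrees with
    # the ordering in the ground-truth depth map.
    rng = np.random.default_rng(seed)
    i, j = rng.integers(0, gt.size, size=(2, n_pairs))
    def rel(a, b):
        r = np.zeros(n_pairs, dtype=int)
        with np.errstate(divide='ignore', invalid='ignore'):
            q = a / b
        r[q > tau] = 1
        r[q < 1.0 / tau] = -1
        return r
    return np.mean(rel(pred.flat[i], pred.flat[j]) != rel(gt.flat[i], gt.flat[j]))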

We also tried the recent HighResDepth method from CVPR 2021, which also performs very well.

@article{alhashim2018high,
  title={High quality monocular depth estimation via transfer learning},
  author={Alhashim, Ibraheem and Wonka, Peter},
  journal={arXiv preprint arXiv:1812.11941},
  year={2018}
}

@inproceedings{miangoleh2021boosting,
  title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
  author={Miangoleh, S Mahdi H and Dille, Sebastian and Mai, Long and Paris, Sylvain and Aksoy, Yagiz},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9685--9694},
  year={2021}
}

We will publish the generated depth maps one by one.

Generated Depth maps for LaSOT

We manually removed bad sequences; in total 646 sequences (some zip files may be broken and will be updated soon) were generated with the DenseDepth method. The original DenseDepth outputs are in the range [0, 1.0]; we multiply them by 2^16 (see the loading sketch below). Please check LaSOT for the RGB images and ground truth.

part01, part02, part03, part04, part05, part06, part07, part08, part09, part10
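A minimal sketch for reading these depth maps back into DenseDepth's original range; the sequence path is illustrative, not a real file in the release.

import numpy as np
from PIL import Image

# 16-bit depth PNG; PIL opens it in mode 'I', so NumPy sees integer values
depth = np.array(Image.open('part01/airplane-1/depth/00000001.png'), dtype=np.float32)
depth /= 2 ** 16  # undo the 2^16 scaling, back to [0, 1]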

The generated depth maps by using HighResDepth will be uploaded soon.

If you find other excellent methods for generating high-quality depth images, please share them.

Architecture

The network architecture is very simple: we add one ResNet-50 feature extractor for the depth input and then merge the RGB and depth feature maps (a code sketch of the merge variants follows the figures). The figures below show

  1. the feature maps for RGB, D inputs and the merged RGBD ones,
  2. the network for RGBD DiMP50, and
  3. RGBD ATOM.

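For illustration, here is a minimal sketch of the three merge variants (Max, Mean, MC). The tensor shapes and the 1x1-convolution design of the MC variant are assumptions rather than the exact DeT code.

import torch
import torch.nn as nn

rgb_feat = torch.randn(1, 1024, 18, 18)  # RGB ResNet-50 feature map (shape assumed)
d_feat = torch.randn(1, 1024, 18, 18)    # depth ResNet-50 feature map

merged_max = torch.maximum(rgb_feat, d_feat)     # DeT-Max: element-wise maximum
merged_mean = 0.5 * (rgb_feat + d_feat)          # DeT-Mean: element-wise mean
mc_layer = nn.Conv2d(2048, 1024, kernel_size=1)  # DeT-MC: merge by convolution
merged_mc = mc_layer(torch.cat([rgb_feat, d_feat], dim=1))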

Download

  1. Download the training set (70 sequences) of the VOT2021 RGBD Challenge from Zenodo (DepthTrack RGBD Tracking Benchmark) and edit the dataset path in local.py. More data will be uploaded soon; we hope to provide a large-scale RGBD training dataset.
http://doi.org/10.5281/zenodo.4716441
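A hypothetical local.py fragment for the path edit above; the attribute names are assumptions, so match whatever ltr/admin/local.py in this repo actually defines.

class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = '/path/to/workspace'    # checkpoints and logs are saved here
        self.depthtrack_dir = '/path/to/depthtrack'  # extracted Zenodo training sequences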
  2. Download the checkpoints for the DeT trackers (also listed in install.sh):
gdown https://drive.google.com/uc\?id\=1djSx6YIRmuy3WFjt9k9ZfI8q343I7Y75 -O pytracking/networks/DeT_DiMP50_Max.pth
gdown https://drive.google.com/uc\?id\=1JW3NnmFhX3ZnEaS3naUA05UaxFz6DLFW -O pytracking/networks/DeT_DiMP50_Mean.pth
gdown https://drive.google.com/uc\?id\=1wcGJc1Xq_7d-y-1nWh6M7RaBC1AixRTu -O pytracking/networks/DeT_DiMP50_MC.pth
gdown https://drive.google.com/uc\?id\=17IIroLZ0M_ZVuxkGN6pVy4brTpicMrn8 -O pytracking/networks/DeT_DiMP50_DO.pth
gdown https://drive.google.com/uc\?id\=17aaOiQW-zRCCqPePLQ9u1s466qCtk7Lh -O pytracking/networks/DeT_ATOM_Max.pth
gdown https://drive.google.com/uc\?id\=15LqCjNelRx-pOXAwVd1xwiQsirmiSLmK -O pytracking/networks/DeT_ATOM_Mean.pth
gdown https://drive.google.com/uc\?id\=14wyUaG-pOUu4Y2MPzZZ6_vvtCuxjfYPg -O pytracking/networks/DeT_ATOM_MC.pth

Install

bash install.sh path-to-anaconda DeT

Train

Using the default DiMP50 or ATOM pretrained checkpoints can reduce the training time.

For example, move the default dimp50.pth into the checkpoints folder and rename it to DiMPNet_Det_EP0050.pth.tar, as sketched below.
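A sketch of that step in Python; the checkpoint directory layout follows PyTracking's workspace convention and is an assumption.

import os
import shutil

dst_dir = 'checkpoints/ltr/dimp/DeT_DiMP50_Max'  # hypothetical workspace path
os.makedirs(dst_dir, exist_ok=True)
shutil.copy('pytracking/networks/dimp50.pth',
            os.path.join(dst_dir, 'DiMPNet_Det_EP0050.pth.tar'))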

python run_training.py bbreg DeT_ATOM_Max
python run_training.py bbreg DeT_ATOM_Mean
python run_training.py bbreg DeT_ATOM_MC

python run_training.py dimp DeT_DiMP50_Max
python run_training.py dimp DeT_DiMP50_Mean
python run_training.py dimp DeT_DiMP50_MC

Test

python run_tracker.py atom DeT_ATOM_Max --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py atom DeT_ATOM_Mean --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py atom DeT_ATOM_MC --dataset_name depthtrack --input_dtype rgbcolormap

python run_tracker.py dimp DeT_DiMP50_Max --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py dimp DeT_DiMP50_Mean --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py dimp DeT_DiMP50_MC --dataset_name depthtrack --input_dtype rgbcolormap
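The rgbcolormap input type feeds the tracker both the RGB image and a colormap-encoded depth image. Below is a minimal sketch of such an encoding; the exact normalization and colormap used in DeT may differ.

import cv2
import numpy as np

def depth_to_colormap(depth):
    # Normalize depth to 8 bits and apply a JET colormap so the depth
    # stream has three channels, like an RGB image.
    d = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.applyColorMap(d, cv2.COLORMAP_JET)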


The original RGB baselines can be evaluated for comparison:

python run_tracker.py dimp dimp50 --dataset_name depthtrack --input_dtype color
python run_tracker.py atom default --dataset_name depthtrack --input_dtype color

Comments
  • About depth image bit?

    In "DepthTrack: Unveiling the Power of RGBD Tracking" it said

    RGB images were stored as 24-bit JPEG with low compression rate and the depth frames as 16-bit PNG.

    I use the following code to read a depth PNG:

    from PIL import Image
    import numpy as np
    # PIL opens a 16-bit grayscale PNG in mode 'I', so NumPy sees int32 values
    a = np.array(Image.open('....../adapter02_indoor/depth/00000001.png'))
    

    I find that its dtype is int32; how should I understand this?

    opened by laisimiao 14
  • per attribute F-scores

    Hello,

    I want to know how to generate the per-attribute F-scores (like Figure 7 in your paper) when analysing the results of trackers. Is this figure generated using the VOT toolkit, and if so, how does it work? At your convenience, would you please help me with this problem?

    Thanks for your assistance.

    opened by pilot00 6
  • running on custom dataset

    Hi! I have already trained a detector on my dataset. What I would like to do now is to run one of your trackers on my dataset, providing the detection results as input. For example, I have successfully run some tracking algorithms such as SORT, providing an input file in the standard MOT format. Can you guide me on how to provide my dataset as input to your method?

    opened by andreaceruti 5
  • Broken LaSOT depth zip list

    The following is the list of broken LaSOT depth zips:

    'kangaroo-10',
     'kangaroo-11',
     'kangaroo-13',
     'kangaroo-17',
     'kangaroo-20',
     'kangaroo-3',
     'kangaroo-4',
     'kangaroo-6',
     'kangaroo-8',
     'kangaroo-9',
     'lion-10',
     'lion-11',
     'lion-12',
     'lion-15',
     'lion-18',
     'lion-2',
     'lion-20',
     'lion-4',
     'lion-7',
     'lion-8',
     'lizard-10',
     'lizard-11',
     'lizard-12',
     'lizard-13',
     'lizard-14',
     'lizard-15',
     'lizard-17',
     'lizard-19',
     'lizard-3',
     'lizard-5',
     'lizard-6',
     'lizard-8',
     'lizard-9',
     'microphone-16',
     'microphone-17',
     'microphone-18',
     'microphone-19',
     'microphone-3',
     'microphone-6',
     'monkey-1',
     'monkey-10',
     'monkey-11',
     'monkey-13',
     'monkey-14',
     'monkey-15',
     'monkey-16',
     'monkey-17',
     'monkey-2',
     'monkey-20',
     'monkey-3',
     'monkey-4',
     'monkey-5',
     'monkey-6',
     'monkey-8',
     'monkey-9',
     'motorcycle-12',
     'motorcycle-14',
     'motorcycle-17',
     'motorcycle-2',
     'motorcycle-20',
     'motorcycle-3',
     'motorcycle-6',
     'motorcycle-7',
     'motorcycle-9',
     'person-1',
     'person-11',
     'person-14',
     'person-15',
     'person-17',
     'person-18',
     'person-5',
     'person-6',
     'person-9',
     'pig-1',
     'pig-10',
     'pig-11',
     'pig-12',
     'pig-13',
     'pig-14',
     'pig-15',
     'pig-16',
     'pig-17',
     'pig-18',
     'pig-19',
     'pig-5',
     'pig-6',
     'pig-7',
     'rabbit-1',
     'rabbit-10',
     'rabbit-11',
     'rabbit-13',
     'rabbit-15',
     'rabbit-16',
     'rabbit-17',
     'rabbit-2',
     'rabbit-20',
     'rabbit-4',
     'rabbit-7',
     'rabbit-8',
     'rabbit-9',
     'robot-1',
     'robot-10',
     'robot-11',
     'robot-13',
     'robot-16',
     'robot-17',
     'robot-18',
     'robot-19',
     'robot-2',
     'robot-4',
     'robot-5',
     'robot-6',
     'robot-7',
     'robot-8',
     'rubicCube-10',
     'rubicCube-11',
     'rubicCube-12',
     'rubicCube-13',
     'rubicCube-14',
     'rubicCube-15',
     'rubicCube-16',
     'rubicCube-17',
     'rubicCube-18',
     'rubicCube-19',
     'rubicCube-20',
     'rubicCube-4',
     'rubicCube-5',
     'rubicCube-8',
     'rubicCube-9'
    

    Your hard work on a fix would be appreciated. BR.

    opened by laisimiao 5
  • How to generate the F-score, Precision and Recall

    Hello,

    It's really great that you guys have disclosed such an excellent work.

    I would like to know how to obtain the F-score, precision, and recall values from the raw results with the '.txt' suffix.

    Thanks a lot!
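    For reference, the evaluation combines tracking precision (Pr) and recall (Re) into an F-score; a minimal sketch of that final step, assuming Pr and Re have already been computed:

    def f_score(pr, re):
        # Harmonic mean of tracking precision and recall
        return 2 * pr * re / (pr + re)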

    opened by pilot00 4
  • Can we use DeT with a detector?

    Hi~ Thanks for the great work and code!

    I ran the code with a Kinect and found that it tracks objects quite well given a user-defined bounding box. I'm wondering if I can use the DeT tracker with some detectors, e.g., YOLO/Mask R-CNN. For example, the detectors predict instance segmentations with bboxes, and the bboxes are tracked by DeT. Finally, DeT assigns these bboxes new IDs or associates them with previous IDs. However, in the current setting the new bbox to track is defined by the user; how can the detections be filtered automatically?

    Is there any script or a hint for doing this? Thank you very much!

    opened by kxhit 2
  • NaN convert

    In depthtrack.py:

    gt = pandas.read_csv(bb_anno_file, delimiter=',', header=None, dtype=np.float32, na_filter=False, low_memory=False).values

    Maybe na_filter should be True?

    opened by ShangGaoG 1
  • A naming error

    The following sequence_list tells us there is a sequence called lock_wild in the DepthTrack test set, but actually it is in the training set link you provide and is called lock01_wild. At the same time, there is a sequence called lock02_indoor in the test set link you provide. https://github.com/xiaozai/DeT/blob/8aa5e2eae91e0f3f88290b4133aadf09aebb34b6/pytracking/evaluation/depthtrackdataset.py#L114

    Please note this.

    opened by laisimiao 1
  • Hi, can you share your version of torch, torchvision? Thanks.

    File "../ltr/data/transforms.py", line 238, in transform_image rgb = tvisf.normalize(rgb, self.mean, self.std, self.inplace) TypeError: normalize() takes 3 positional arguments but 4 were given

    opened by wangxiao5791509 2
  • Questions on DeT training datasets

    Hi,

    Thanks for the brilliant work on RGB-D tracking.

    I am new to the tracking domain. I have several questions regarding the tracking datasets. Can you please help me to clarify several points?

    Firstly, the paper mentions that DeT is first pretrained on Pseudo LaSOT and Pseudo COCO, and then fine-tuned on DepthTrack. I would like to know whether it makes any difference if we directly train on all three datasets.

    Secondly, the GitHub page mentions that using the default DiMP50 or ATOM pretrained checkpoints can reduce the training time. It seems that DiMP and ATOM are pretrained on larger RGB datasets (TrackingNet, GOT-10k, etc.). I would like to know whether these pretrained weights were used to initialize the model to produce the paper results, or whether in the paper the network was trained only on Pseudo LaSOT, Pseudo COCO, and DepthTrack.

    Finally, one question regarding Table 2 (comparison of the original RGB trackers and their DeT variants): how are the RGB baselines trained? Only with RGB images from Pseudo LaSOT, Pseudo COCO, and DepthTrack? Are they only initialized with a pretrained (ImageNet) encoder?

    Sorry to bother you with all these questions... Looking forward to hearing from you. Thanks again

    opened by Zongwei97 6