Dataset and Code for the paper "DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV2021), and "Depth-only Object Tracking" (BMVC2021)

Yan Song

Last update: Dec 15, 2022

Overview

DeT and DOT

Code and datasets for

"DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV2021)
"Depth-only Object Tracking" (BMVC2021)

@InProceedings{yan2021det,
    author    = {Yan, Song and Yang, Jinyu and Kapyla, Jani and Zheng, Feng and Leonardis, Ales and Kamarainen, Joni-Kristian},
    title     = {DepthTrack: Unveiling the Power of RGBD Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {10725-10733}
}

@InProceedings{yan2021dot,
  title       = {Depth-only Object Tracking},
  author      = {Yan, Song and Yang, Jinyu and Leonardis, Ales and Kamarainen, Joni-Kristian},
  booktitle   = {Proceedings of the British Machine Vision Conference (BMVC)},
  year        = {2021},
  organization= {British Machine Vision Association}
}

The settings are the same as those of PyTracking; please read the PyTracking documentation for details.

Generated Depth

We highly recommend generating high-quality depth data from existing RGB tracking benchmarks, such as LaSOT, GOT-10k, TrackingNet, and COCO.

Examples of generated depth are shown here. The first row shows results from HighResDepth for LaSOT RGB images; the second and third rows are from DenseDepth for GOT-10k and COCO RGB images; the fourth row shows failure cases in which the targets are too close to the background or floor. The last row is from DenseDepth for CDTB RGB images.

In our paper, we used the DenseDepth monocular depth estimation method. We computed the Ordinal Error (ORD) of the generated depth on CDTB and our DepthTrack test set; the mean ORD is about 0.386, which is sufficient for training D or RGBD trackers, as we have verified in our works.
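A common formulation of ORD samples point pairs and checks whether the estimated depth preserves the ground-truth ordering (farther, nearer, or roughly equal) of each pair. The sketch below follows that formulation; the random pair sampling, the ratio threshold delta=0.03, and the helper names are assumptions, since the exact evaluation protocol used in the paper is not given here. Both maps must use the same convention (depth, not inverse depth).

```python
import random


def ordinal_label(d_i, d_j, delta=0.03):
    """+1 if depth i exceeds depth j by more than the ratio threshold,
    -1 if it is smaller by more than the threshold, 0 if roughly equal."""
    ratio = d_i / d_j
    if ratio > 1.0 + delta:
        return 1
    if ratio < 1.0 / (1.0 + delta):
        return -1
    return 0


def ordinal_error(pred, gt, num_pairs=10000, delta=0.03, seed=0):
    """Fraction of sampled point pairs whose ordinal relation differs
    between predicted and ground-truth depth (lower is better).

    pred, gt: flat sequences of depth values with the same length.
    Pixels where either value is <= 0 (missing depth) are skipped."""
    valid = [k for k in range(len(gt)) if gt[k] > 0 and pred[k] > 0]
    if len(valid) < 2:
        raise ValueError("need at least two pixels with valid depth")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wrong = 0
    for _ in range(num_pairs):
        i, j = rng.sample(valid, 2)
        if ordinal_label(pred[i], pred[j], delta) != ordinal_label(gt[i], gt[j], delta):
            wrong += 1
    return wrong / num_pairs
```

Because only depth ratios are compared, the metric ignores a global scale factor, which suits monocular estimates whose absolute scale is unknown.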

We also tried the recent HighResDepth method from CVPR 2021, which also performs very well.

@article{alhashim2018high,
  title={High quality monocular depth estimation via transfer learning},
  author={Alhashim, Ibraheem and Wonka, Peter},
  journal={arXiv preprint arXiv:1812.11941},
  year={2018}
}

@inproceedings{miangoleh2021boosting,
  title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
  author={Miangoleh, S Mahdi H and Dille, Sebastian and Mai, Long and Paris, Sylvain and Aksoy, Yagiz},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9685--9694},
  year={2021}
}

We will publish the generated depth maps one by one.

Generated Depth maps for LaSOT

We manually removed bad sequences, leaving 646 sequences in total (some zip files may be broken and will be updated soon), generated with the DenseDepth method. The original DenseDepth outputs are in the range [0, 1.0], and we multiply them by 2^16. Please check LaSOT for the RGB images and groundtruth.
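The scaling described above can be sketched as a pair of encode/decode helpers. This is a minimal sketch, not the script used to produce the released files: the helper names are hypothetical, and because 1.0 * 2^16 = 65536 does not fit in a 16-bit PNG, this sketch clips the top value to 65535 (an assumption of the sketch).

```python
import numpy as np

SCALE = 2 ** 16  # scale factor stated in the text


def encode_depth(depth01):
    """Map a DenseDepth output in [0, 1] to uint16 for saving as a 16-bit PNG.
    Values are rounded and clipped to the uint16 range [0, 65535]."""
    scaled = np.round(np.asarray(depth01, dtype=np.float64) * SCALE)
    return np.clip(scaled, 0, np.iinfo(np.uint16).max).astype(np.uint16)


def decode_depth(depth16):
    """Approximately invert encode_depth back to the [0, 1] range."""
    return np.asarray(depth16).astype(np.float64) / SCALE
```

The resulting uint16 array can be written as a 16-bit PNG, for example with OpenCV's cv2.imwrite.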

part01, part02, part03, part04, part05, part06, part07, part08, part09, part10

The generated depth maps by using HighResDepth will be uploaded soon.

If you find excellent methods for generating high-quality depth images, please share them.

Architecture

The network architecture is actually very simple: one additional ResNet50 feature extractor is added for the depth input, and the RGB and depth feature maps are then merged. The figures below show

the feature maps for the RGB and D inputs and the merged RGBD ones,
the network for RGBD DiMP50, and
the network for RGBD ATOM.
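The checkpoint names DeT_*_Max and DeT_*_Mean suggest that the two branches are merged with an element-wise maximum or mean. A minimal NumPy sketch of those two operations, assuming both backbones produce feature maps of the same (C, H, W) shape; the MC variant is not covered, and the layers at which DeT actually merges features are not specified here. fuse_features is a hypothetical name.

```python
import numpy as np


def fuse_features(rgb_feat, depth_feat, mode="max"):
    """Merge RGB and depth feature maps of identical shape (C, H, W).

    mode="max": element-wise maximum; mode="mean": element-wise average.
    Both are assumptions about what the Max/Mean checkpoint names mean."""
    rgb_feat = np.asarray(rgb_feat, dtype=np.float32)
    depth_feat = np.asarray(depth_feat, dtype=np.float32)
    if rgb_feat.shape != depth_feat.shape:
        raise ValueError(f"shape mismatch: {rgb_feat.shape} vs {depth_feat.shape}")
    if mode == "max":
        return np.maximum(rgb_feat, depth_feat)
    if mode == "mean":
        return (rgb_feat + depth_feat) / 2.0
    raise ValueError(f"unknown fusion mode: {mode}")
```

Both operations keep the merged map the same shape as each input, so the downstream DiMP or ATOM heads can consume it unchanged.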

Download

Download the training dataset (70 sequences) of the VOT2021 RGBD Challenge from Zenodo (DepthTrack RGBD Tracking Benchmark) and edit the path in local.py. More data will be uploaded soon; we hope to provide a large-scale RGBD training dataset.

http://doi.org/10.5281/zenodo.4716441

Download the checkpoints for DeT trackers (in install.sh)

gdown https://drive.google.com/uc\?id\=1djSx6YIRmuy3WFjt9k9ZfI8q343I7Y75 -O pytracking/networks/DeT_DiMP50_Max.pth
gdown https://drive.google.com/uc\?id\=1JW3NnmFhX3ZnEaS3naUA05UaxFz6DLFW -O pytracking/networks/DeT_DiMP50_Mean.pth
gdown https://drive.google.com/uc\?id\=1wcGJc1Xq_7d-y-1nWh6M7RaBC1AixRTu -O pytracking/networks/DeT_DiMP50_MC.pth
gdown https://drive.google.com/uc\?id\=17IIroLZ0M_ZVuxkGN6pVy4brTpicMrn8 -O pytracking/networks/DeT_DiMP50_DO.pth
gdown https://drive.google.com/uc\?id\=17aaOiQW-zRCCqPePLQ9u1s466qCtk7Lh -O pytracking/networks/DeT_ATOM_Max.pth
gdown https://drive.google.com/uc\?id\=15LqCjNelRx-pOXAwVd1xwiQsirmiSLmK -O pytracking/networks/DeT_ATOM_Mean.pth
gdown https://drive.google.com/uc\?id\=14wyUaG-pOUu4Y2MPzZZ6_vvtCuxjfYPg -O pytracking/networks/DeT_ATOM_MC.pth

Install

bash install.sh path-to-anaconda DeT

Train

Using the default DiMP50 or ATOM pretrained checkpoints can reduce the training time.

For example, move the default dimp50.pth into the checkpoints folder and rename it to DiMPNet_Det_EP0050.pth.tar.
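The rename step could be scripted as follows. This is a minimal sketch: install_pretrained is a hypothetical helper, the destination folder is passed in because the exact checkpoints subdirectory is not spelled out here, and the file is copied rather than moved so the original dimp50.pth stays in place.

```python
import shutil
from pathlib import Path


def install_pretrained(src, checkpoint_dir, new_name="DiMPNet_Det_EP0050.pth.tar"):
    """Copy a pretrained checkpoint into checkpoint_dir under new_name,
    creating the folder if needed, and return the destination path."""
    dst_dir = Path(checkpoint_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / new_name
    shutil.copy2(Path(src), dst)  # copy2 also preserves file metadata
    return dst
```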

python run_training.py bbreg DeT_ATOM_Max
python run_training.py bbreg DeT_ATOM_Mean
python run_training.py bbreg DeT_ATOM_MC

python run_training.py dimp DeT_DiMP50_Max
python run_training.py dimp DeT_DiMP50_Mean
python run_training.py dimp DeT_DiMP50_MC

Test

python run_tracker.py atom DeT_ATOM_Max --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py atom DeT_ATOM_Mean --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py atom DeT_ATOM_MC --dataset_name depthtrack --input_dtype rgbcolormap

python run_tracker.py dimp DeT_DiMP50_Max --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py dimp DeT_DiMP50_Mean --dataset_name depthtrack --input_dtype rgbcolormap
python run_tracker.py dimp DeT_DiMP50_MC --dataset_name depthtrack --input_dtype rgbcolormap


python run_tracker.py dimp dimp50 --dataset_name depthtrack --input_dtype color
python run_tracker.py atom default --dataset_name depthtrack --input_dtype color
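The DeT commands above use --input_dtype rgbcolormap, which presumably means RGB frames paired with a colour-mapped depth image; the exact mapping used by DeT is not described here. A minimal sketch of one such mapping, assuming per-frame min/max normalisation of valid depth and a piecewise-linear jet-like palette (both assumptions; OpenCV's cv2.applyColorMap with cv2.COLORMAP_JET on an 8-bit image does something similar). Missing depth (0) is mapped like the nearest valid depth in this sketch.

```python
import numpy as np


def depth_to_colormap(depth, d_min=None, d_max=None):
    """Convert a single-channel depth map (H, W) to an (H, W, 3) uint8 image.

    Valid depth (> 0) is normalised to [0, 1] with d_min/d_max (per-frame
    min/max by default); invalid pixels get 0. The result goes through a
    jet-like palette: each channel is a clipped triangle over x."""
    depth = np.asarray(depth, dtype=np.float64)
    valid = depth > 0
    if d_min is None:
        d_min = depth[valid].min() if valid.any() else 0.0
    if d_max is None:
        d_max = depth[valid].max() if valid.any() else 1.0
    x = np.zeros_like(depth)
    if d_max > d_min:
        x[valid] = np.clip((depth[valid] - d_min) / (d_max - d_min), 0.0, 1.0)
    r = np.clip(1.5 - np.abs(4.0 * x - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * x - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * x - 1.0), 0.0, 1.0)
    rgb = np.stack([r, g, b], axis=-1)
    return (rgb * 255.0).round().astype(np.uint8)
```

Near values come out blue and far values red, giving the depth map three channels like an RGB frame.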

Comments

About depth image bit?
In "DepthTrack: Unveiling the Power of RGBD Tracking" it says:

RGB images were stored as 24-bit JPEG with low compression rate and the depth frames as 16-bit PNG.

I use the following code to read a depth PNG:

from PIL import Image
import numpy as np
a = np.array(Image.open('....../adapter02_indoor/depth/00000001.png'))

And I find its dtype is int32. How should I understand this?
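The int32 dtype comes from the image library rather than the file: depending on the Pillow version, a 16-bit grayscale PNG may be opened in Pillow's 32-bit integer mode "I", which NumPy mirrors as int32, while the stored values still lie in [0, 65535]. A minimal sketch of a safe cast, assuming the PNG really is 16-bit; to_uint16_depth is a hypothetical helper. Alternatively, cv2.imread(path, cv2.IMREAD_UNCHANGED) returns uint16 directly for 16-bit PNGs.

```python
import numpy as np


def to_uint16_depth(arr):
    """Cast a depth array loaded from a 16-bit PNG to uint16, after checking
    that every value actually fits in 16 bits."""
    arr = np.asarray(arr)
    if arr.min() < 0 or arr.max() > np.iinfo(np.uint16).max:
        raise ValueError("values outside the 16-bit range; not a 16-bit depth map")
    return arr.astype(np.uint16)
```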
opened by laisimiao 14

Per-attribute F-scores

Hello,

I want to know how to generate the per-attribute F-scores (like Figure 7 in your paper) when analysing tracker results. Is this figure generated with the VOT toolkit, and if so, how does it work? At your convenience, could you please help me with this problem?

Thanks for your assistance.

opened by pilot00 6

Running on a custom dataset

Hi! I have already trained a detector on my dataset. Now I would like to run one of your trackers on my dataset, using the detection results as input. For example, I have successfully run tracking algorithms such as SORT by providing a file in the MOT standard input format. Can you guide me on how to provide my dataset as input to your method?
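DeT is a single-object tracker initialised with one bounding box, so one way to connect a detector is to take a detection as the initial box. A minimal sketch, assuming detections stored in the MOTChallenge text layout (frame, id, bb_left, bb_top, bb_width, bb_height, conf, ...); read_mot_detections and initial_box are hypothetical helpers, and how the box is passed into DeT's dataset or tracker classes is not shown here.

```python
import csv
from collections import defaultdict


def read_mot_detections(lines):
    """Parse MOT-style rows into {frame: [(x, y, w, h, conf), ...]}."""
    per_frame = defaultdict(list)
    for row in csv.reader(lines):
        if not row:
            continue
        frame = int(float(row[0]))
        x, y, w, h = (float(v) for v in row[2:6])
        conf = float(row[6]) if len(row) > 6 else 1.0
        per_frame[frame].append((x, y, w, h, conf))
    return dict(per_frame)


def initial_box(per_frame, min_conf=0.5):
    """Return (frame, [x, y, w, h]) for the most confident detection in the
    earliest frame that has one with conf >= min_conf, or None."""
    for frame in sorted(per_frame):
        candidates = [d for d in per_frame[frame] if d[4] >= min_conf]
        if candidates:
            x, y, w, h, _ = max(candidates, key=lambda d: d[4])
            return frame, [x, y, w, h]
    return None
```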

opened by andreaceruti 5

Broken lasot depth zip list

The following LaSOT depth zip files are broken:

'kangaroo-10',
 'kangaroo-11',
 'kangaroo-13',
 'kangaroo-17',
 'kangaroo-20',
 'kangaroo-3',
 'kangaroo-4',
 'kangaroo-6',
 'kangaroo-8',
 'kangaroo-9',
 'lion-10',
 'lion-11',
 'lion-12',
 'lion-15',
 'lion-18',
 'lion-2',
 'lion-20',
 'lion-4',
 'lion-7',
 'lion-8',
 'lizard-10',
 'lizard-11',
 'lizard-12',
 'lizard-13',
 'lizard-14',
 'lizard-15',
 'lizard-17',
 'lizard-19',
 'lizard-3',
 'lizard-5',
 'lizard-6',
 'lizard-8',
 'lizard-9',
 'microphone-16',
 'microphone-17',
 'microphone-18',
 'microphone-19',
 'microphone-3',
 'microphone-6',
 'monkey-1',
 'monkey-10',
 'monkey-11',
 'monkey-13',
 'monkey-14',
 'monkey-15',
 'monkey-16',
 'monkey-17',
 'monkey-2',
 'monkey-20',
 'monkey-3',
 'monkey-4',
 'monkey-5',
 'monkey-6',
 'monkey-8',
 'monkey-9',
 'motorcycle-12',
 'motorcycle-14',
 'motorcycle-17',
 'motorcycle-2',
 'motorcycle-20',
 'motorcycle-3',
 'motorcycle-6',
 'motorcycle-7',
 'motorcycle-9',
 'person-1',
 'person-11',
 'person-14',
 'person-15',
 'person-17',
 'person-18',
 'person-5',
 'person-6',
 'person-9',
 'pig-1',
 'pig-10',
 'pig-11',
 'pig-12',
 'pig-13',
 'pig-14',
 'pig-15',
 'pig-16',
 'pig-17',
 'pig-18',
 'pig-19',
 'pig-5',
 'pig-6',
 'pig-7',
 'rabbit-1',
 'rabbit-10',
 'rabbit-11',
 'rabbit-13',
 'rabbit-15',
 'rabbit-16',
 'rabbit-17',
 'rabbit-2',
 'rabbit-20',
 'rabbit-4',
 'rabbit-7',
 'rabbit-8',
 'rabbit-9',
 'robot-1',
 'robot-10',
 'robot-11',
 'robot-13',
 'robot-16',
 'robot-17',
 'robot-18',
 'robot-19',
 'robot-2',
 'robot-4',
 'robot-5',
 'robot-6',
 'robot-7',
 'robot-8',
 'rubicCube-10',
 'rubicCube-11',
 'rubicCube-12',
 'rubicCube-13',
 'rubicCube-14',
 'rubicCube-15',
 'rubicCube-16',
 'rubicCube-17',
 'rubicCube-18',
 'rubicCube-19',
 'rubicCube-20',
 'rubicCube-4',
 'rubicCube-5',
 'rubicCube-8',
 'rubicCube-9'

I appreciate your hard work on fixing this. BR.
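A list like the one above can be produced by testing every archive. A minimal sketch, assuming one zip per sequence named after the sequence (e.g. kangaroo-10.zip) in a single folder; that layout and the helper name find_broken_zips are hypothetical.

```python
import zipfile
from pathlib import Path


def find_broken_zips(folder):
    """Return the names (without .zip) of archives in folder that cannot be
    opened or whose CRC check fails."""
    broken = []
    for path in sorted(Path(folder).glob("*.zip")):
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() returns the first corrupt member name, or None.
                if zf.testzip() is not None:
                    broken.append(path.stem)
        except (zipfile.BadZipFile, EOFError, OSError):
            broken.append(path.stem)
    return broken
```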

opened by laisimiao 5

How to generate the F-score, Precision and Recall

Hello,

It's really great that you have released such excellent work.

How can I obtain the F-score, precision and recall values from the raw results with the '.txt' suffix?

Thanks a lot!
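A minimal per-sequence sketch, assuming the long-term tracking definitions used in the VOT long-term and RGBD challenges: precision averages IoU over frames where the tracker reports a box with confidence at or above a threshold, recall averages IoU over frames where the target is visible, F = 2PR / (P + R), and the F-score is the maximum over thresholds. Official numbers should come from the VOT toolkit, which also averages precision and recall over sequences before computing F; the helper names here are hypothetical.

```python
def iou(a, b):
    """IoU of two [x, y, w, h] boxes; None (no box) gives 0."""
    if a is None or b is None:
        return 0.0
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def pr_re_f(preds, confs, gts, threshold):
    """Precision, recall and F at one confidence threshold for one sequence.

    preds: predicted box per frame (None if nothing reported),
    confs: confidence per frame, gts: ground-truth box (None if absent)."""
    reported = [t for t, (p, c) in enumerate(zip(preds, confs))
                if p is not None and c >= threshold]
    visible = [t for t, g in enumerate(gts) if g is not None]
    if not reported or not visible:
        return 0.0, 0.0, 0.0
    reported_set = set(reported)
    precision = sum(iou(preds[t], gts[t]) for t in reported) / len(reported)
    recall = sum(iou(preds[t], gts[t]) for t in visible if t in reported_set) / len(visible)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def max_f_score(preds, confs, gts):
    """Try every distinct confidence as a threshold and keep the best F."""
    best = (0.0, 0.0, 0.0)
    for tau in sorted(set(confs)):
        result = pr_re_f(preds, confs, gts, tau)
        if result[2] > best[2]:
            best = result
    return best
```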

opened by pilot00 4

Can we use DeT with a detector?

Hi~ Thanks for the great work and code!

I ran the code with a Kinect and found that it tracks objects quite well from a user-defined bounding box. I'm wondering whether I can use the DeT tracker together with detectors, e.g., YOLO or Mask R-CNN. For example, the detectors predict instance segmentations with bboxes, and the bboxes are tracked by DeT. Finally, DeT would assign new IDs to these bboxes or associate them with previous IDs. However, in the current setting the bbox to track is defined by the user; how can the detections be filtered automatically?

Is there any script or a hint for doing this? Thank you very much!
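DeT itself tracks a single target, so associating detections with IDs would be an extra layer on top of it. A minimal sketch of greedy IoU association between boxes currently tracked (one DeT instance per ID, hypothetically) and new detections; associate and the min_iou=0.3 threshold are assumptions, not part of this repository. Unmatched detections could start new tracks.

```python
def iou(a, b):
    """IoU of two [x, y, w, h] boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match tracked boxes to detections, highest IoU first.

    tracks: {track_id: [x, y, w, h]}, detections: list of [x, y, w, h].
    Returns ({track_id: detection_index}, [unmatched detection indices])."""
    pairs = [(iou(box, det), tid, d)
             for tid, box in tracks.items()
             for d, det in enumerate(detections)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    matches, used = {}, set()
    for score, tid, d in pairs:
        if score < min_iou:
            break
        if tid in matches or d in used:
            continue
        matches[tid] = d
        used.add(d)
    unmatched = [d for d in range(len(detections)) if d not in used]
    return matches, unmatched
```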

opened by kxhit 2

NAN convert

In depthtrack.py:

gt = pandas.read_csv(bb_anno_file, delimiter=',', header=None, dtype=np.float32, na_filter=False, low_memory=False).values

Maybe na_filter should be True?
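Independently of the pandas flags, frames where the target is absent can be handled explicitly. A minimal standard-library sketch, assuming absent-target frames are written as nan values in groundtruth.txt (as in the VOT format); read_groundtruth is a hypothetical helper, not the repository's loader.

```python
import csv
import math


def read_groundtruth(lines):
    """Parse 'x,y,w,h' rows into float boxes; rows containing NaN
    (target not visible) become None."""
    boxes = []
    for row in csv.reader(lines):
        if not row:
            continue
        values = [float(v) for v in row]  # float("nan") parses to NaN
        boxes.append(None if any(math.isnan(v) for v in values) else values)
    return boxes
```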

opened by ShangGaoG 1

A naming error

The following sequence_list says there is a sequence called lock_wild in the DepthTrack test set, but it is actually in the training set you provide, where it is called lock01_wild. At the same time, there is a sequence called lock02_indoor in the test set you provide. https://github.com/xiaozai/DeT/blob/8aa5e2eae91e0f3f88290b4133aadf09aebb34b6/pytracking/evaluation/depthtrackdataset.py#L114

Please note this.

opened by laisimiao 1

Hi, can you share your version of torch, torchvision? Thanks.

File "../ltr/data/transforms.py", line 238, in transform_image
    rgb = tvisf.normalize(rgb, self.mean, self.std, self.inplace)
TypeError: normalize() takes 3 positional arguments but 4 were given
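The error indicates that the installed torchvision's normalize accepts only (tensor, mean, std), while the calling code also passes an inplace flag. Installing a torchvision version whose normalize accepts inplace is the direct fix; as a stopgap, the call can adapt to the signature it finds. A minimal sketch; call_normalize is a hypothetical wrapper, and inspect.signature assumes normalize is a plain Python function.

```python
import inspect


def call_normalize(normalize_fn, tensor, mean, std, inplace=False):
    """Call normalize_fn with the inplace flag only if its signature has one."""
    params = inspect.signature(normalize_fn).parameters
    if "inplace" in params:
        return normalize_fn(tensor, mean, std, inplace=inplace)
    return normalize_fn(tensor, mean, std)
```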

opened by wangxiao5791509 2

Questions on DeT training datasets

Hi,

Thanks for the brilliant work on RGB-D tracking.

I am new to the tracking domain. I have several questions regarding the tracking datasets. Can you please help me to clarify several points?

First, the paper mentions that DeT is first pretrained on Pseudo LaSOT and Pseudo COCO and then fine-tuned on DepthTrack. Does it make any difference if we train directly on all three datasets?

Second, the GitHub README mentions that "Using the default DiMP50 or ATOM pretrained checkpoints can reduce the training time." It seems that DiMP and ATOM are pretrained on larger RGB datasets (TrackingNet, GOT-10k, etc.). Were these pretrained weights used to initialize the model to produce the paper results? Or is the network in the paper trained only on Pseudo LaSOT, Pseudo COCO, and DepthTrack?

Finally, a question about Table 2 (Comparison of the original RGB trackers and their DeT variants): how are the RGB baselines trained? Only with RGB images from Pseudo LaSOT, Pseudo COCO, and DepthTrack? Are they only initialized with a pretrained (ImageNet) encoder?

Sorry to bother you with all these questions... Looking forward to hearing from you. Thanks again

opened by Zongwei97 6
