Official implementation of Monocular Quasi-Dense 3D Object Tracking

Visual Intelligence and Systems Group

Last update: Dec 20, 2022

Overview

Monocular Quasi-Dense 3D Object Tracking

Monocular Quasi-Dense 3D Object Tracking (QD-3DT) is an online framework that detects and tracks objects in 3D using quasi-dense object proposals from 2D images.

Monocular Quasi-Dense 3D Object Tracking,
Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun,
arXiv technical report (arXiv 2103.07351) Project Website (QD-3DT)

@article{Hu2021QD3DT,
    author = {Hu, Hou-Ning and Yang, Yung-Hsu and Fischer, Tobias and Yu, Fisher and Darrell, Trevor and Sun, Min},
    title = {Monocular Quasi-Dense 3D Object Tracking},
    journal = {ArXiv:2103.07351},
    year = {2021}
}

Abstract

A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer’s actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform. The object association leverages quasi-dense similarity learning to identify objects in various poses and viewpoints with appearance cues only. After initial 2D association, we further utilize 3D bounding boxes depth-ordering heuristics for robust instance association and motion-based 3D trajectory prediction for re-identification of occluded vehicles. In the end, an LSTM-based object velocity learning module aggregates the long-term trajectory information for more accurate motion extrapolation. Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios. On the Waymo Open benchmark, we establish the first camera-only baseline in the 3D tracking and 3D detection challenges. Our quasi-dense 3D tracking pipeline achieves impressive improvements on the nuScenes 3D tracking benchmark with near five times tracking accuracy of the best vision-only submission among all published methods.

Main results

3D tracking on nuScenes test set

We achieved the best vision-only submission

AMOTA	AMOTP
21.7	1.55

3D tracking on Waymo Open test set

We established the first camera-only baseline on Waymo Open

MOTA/L2	MOTP/L2
0.0001	0.0658

2D vehicle tracking on KITTI test set

MOTA	MOTP
86.44	85.82

Installation

Please refer to INSTALL.md for installation and to DATA.md for dataset preparation.

Get Started

Please see GETTING_STARTED.md for the basic usage of QD-3DT.

MODEL ZOO

Please refer to MODEL_ZOO.md for reproducing the results on the various benchmarks.

Contact

This repo is currently maintained by Hou-Ning Hu (@eborboihuc), Yung-Hsu Yang (@RoyYang0714), and Tobias Fischer (@tobiasfshr).

License

This work is licensed under the BSD 3-Clause License. See LICENSE for details. Third-party datasets and tools are subject to their respective licenses.

Acknowledgements

We thank Jiangmiao Pang for his help in providing the qdtrack codebase in mmdetection. This repo uses py-motmetrics for MOT evaluation, waymo-open-dataset for the Waymo Open 3D detection and 3D tracking tasks, and nuscenes-devkit for nuScenes evaluation and preprocessing.

Comments

ModuleNotFoundError: No module named 'qd3dt.version'
dataset=nuscenes

config_path=configs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter.py

gpu_ids=0

gpu_nums=1

PY_ARGS='--data_split_prefix train --pure_det'

root=. ++ dirname Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter.py

folder=work_dirs/Nusc ++ basename -s .py configs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter.py

config=quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter

cd mmcv ++ pwd

export PYTHONPATH=/root/docker2/qd-3dt/mmcv:

PYTHONPATH=/root/docker2/qd-3dt/mmcv:

cd ..

CUDA_VISIBLE_DEVICES=0

python3 -u ./tools/test_eval_video_exp.py nuscenes configs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter.py ./work_dirs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter/latest.pth ./work_dirs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter/output/output.pkl --data_split_prefix train --pure_det
Traceback (most recent call last):
  File "./tools/test_eval_video_exp.py", line 10, in <module>
    from qd3dt.datasets import build_dataloader, build_dataset
  File "/root/docker2/qd-3dt/qd3dt/__init__.py", line 1, in <module>
    from .version import __version__, short_version
ModuleNotFoundError: No module named 'qd3dt.version'
opened by xhangHU 13
Expected 88 from C header, got 80 from PyObject

I installed the project following the instructions and prepared only the KITTI data. Pre-trained weights are placed in the corresponding folders. I tried test mode with the script below:

./scripts/test_eval_exp.sh kitti configs/KITTI/quasi_dla34_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_subtrain_mod_anchor_ratio_small_strides_GTA.py 0 1 --data_split_prefix subval_dla34_regress_GTA_VeloLSTM --add_ablation_exp all

but got the following error:

Traceback (most recent call last):
  File "./tools/test_eval_video_exp.py", line 10, in <module>
    from qd3dt.datasets import build_dataloader, build_dataset
  File "/home/kid/workspace/qd-3dt/qd3dt/datasets/__init__.py", line 1, in <module>
    from .custom import CustomDataset
  File "/home/kid/workspace/qd-3dt/qd3dt/datasets/custom.py", line 12, in <module>
    from .extra_aug import ExtraAugmentation
  File "/home/kid/workspace/qd-3dt/qd3dt/datasets/extra_aug.py", line 5, in <module>
    from qd3dt.core.evaluation.bbox_overlaps import bbox_overlaps
  File "/home/kid/workspace/qd-3dt/qd3dt/core/__init__.py", line 3, in <module>
    from .evaluation import *  # noqa: F401, F403
  File "/home/kid/workspace/qd-3dt/qd3dt/core/evaluation/__init__.py", line 4, in <module>
    from .coco_utils import coco_eval, fast_eval_recall, results2json
  File "/home/kid/workspace/qd-3dt/qd3dt/core/evaluation/coco_utils.py", line 3, in <module>
    from pycocotools.coco import COCO
  File "/home/kid/anaconda3/envs/3dt/lib/python3.7/site-packages/pycocotools/coco.py", line 55, in <module>
    from . import mask as maskUtils
  File "/home/kid/anaconda3/envs/3dt/lib/python3.7/site-packages/pycocotools/mask.py", line 3, in <module>
    import pycocotools._mask as _mask
  File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

My environment is Ubuntu 18, but I do not think the error is related to the system version.

opened by hahakid 4

AttributeError When Evaluating on nuScenes data

I am currently trying to reproduce the nuScenes results as shown in the Getting Started page, but am running into an error when I try to run the run_eval_nusc.sh script. See the output trace below.

+ python3 -u ./tools/test_eval_video_exp.py nuscenes configs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter.py ./work_dirs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter/latest.pth ./work_dirs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter/output/output.pkl --data_split_prefix val --full_frames
Using agg as matplotlib backend
Starting ./work_dirs/Nusc/quasi_r101_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_scale_no_filter/output_val_box3d_deep_depth_motion_lstm_3dcen ...
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Traceback (most recent call last):
  File "./tools/test_eval_video_exp.py", line 987, in <module>
    main()
  File "./tools/test_eval_video_exp.py", line 960, in main
    best_model(args, out_path)
  File "./tools/test_eval_video_exp.py", line 295, in best_model
    best_model_Nusc(args, out_path)
  File "./tools/test_eval_video_exp.py", line 380, in best_model_Nusc
    run_inference_and_evaluate(args, cfg, out_path_exp)
  File "./tools/test_eval_video_exp.py", line 80, in run_inference_and_evaluate
    run_inference(cfg, args.checkpoint, out_path, show_time=args.show_time)
  File "./tools/test_eval_video_exp.py", line 109, in run_inference
    dataset = build_dataset(cfg.data.test)
  File "/qd-3dt/qd3dt/datasets/builder.py", line 36, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS)
  File "/qd-3dt/qd3dt/utils/registry.py", line 74, in build_from_cfg
    return obj_type(**args)
  File "/qd-3dt/qd3dt/datasets/video/bdd_vid_3d.py", line 20, in __init__
    super(BDDVid3DDataset, self).__init__(**kwargs)
  File "/qd-3dt/qd3dt/datasets/video/video_dataset.py", line 61, in __init__
    super(VideoDataset, self).__init__(*args, **kwargs)
  File "/qd-3dt/qd3dt/datasets/custom.py", line 69, in __init__
    self.img_infos = self.load_annotations(ann_file)
  File "/qd-3dt/qd3dt/datasets/video/video_dataset.py", line 131, in load_annotations
    self.cat_ids = api.getCatIds()
AttributeError: 'NoneType' object has no attribute 'getCatIds'

A series of other errors occur as the script attempts to execute subsequent commands after test_eval_video_exp.py errors out.

I am currently using Docker and have followed the installation and dataset setup instructions, which seem to have succeeded with no issues. Any ideas as to what is causing this error? Thanks.

opened by CWAndersn 2

Information about 2D bounding boxes

In the final result, the generated txt file contains 3D information. I would like to know how the result of part (a) in the figure of your paper is transferred to part (b). Could you tell me where this step is reflected in the code?

opened by xhangHU 2
ValueError: current limit exceeds maximum limit

Hi, thanks for the great work. I am getting the following error:

Traceback (most recent call last):
  File "./tools/test_eval_video_exp.py", line 10, in <module>
    from qd3dt.datasets import build_dataloader, build_dataset
  File "/home/husam/qd-3dt/qd3dt/datasets/__init__.py", line 3, in <module>
    from .loader import GroupSampler, DistributedGroupSampler, build_dataloader
  File "/home/husam/qd-3dt/qd3dt/datasets/loader/__init__.py", line 1, in <module>
    from .build_loader import build_dataloader
  File "/home/husam/qd-3dt/qd3dt/datasets/loader/build_loader.py", line 15, in <module>
    resource.setrlimit(resource.RLIMIT_NOFILE, (65535, rlimit[1]))
ValueError: current limit exceeds maximum limit

when trying to reproduce your results on the KITTI data set using the command: ./scripts/test_eval_exp.sh kitti configs/KITTI/quasi_dla34_dcn_3dmatch_multibranch_conv_dep_dim_cen_clsrot_sep_aug_confidence_subtrain_mod_anchor_ratio_small_strides_GTA.py 0 1 --data_split_prefix subval_dla34_regress_GTA_VeloLSTM --add_ablation_exp all

I carefully followed the steps in the GETTING_STARTED.md file and I think the previous steps succeeded, so I do not know why I am getting this error.
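For context, Python's resource.setrlimit raises this ValueError when the requested soft limit (here 65535) is larger than the process's hard limit (rlimit[1]). The following is a minimal sketch of one way to avoid it by clamping the soft limit to the hard limit. The helper names clamp_soft_limit and raise_nofile_limit are hypothetical and not part of qd3dt, and this is not necessarily how the maintainers would address it.

```python
import resource


def clamp_soft_limit(target, soft, hard, infinity=resource.RLIM_INFINITY):
    """Pick a new soft limit that never exceeds the hard limit.

    Hypothetical helper: returns the value to pass as the soft limit.
    """
    if soft == infinity:
        # The soft limit is already unlimited; leave it alone.
        return soft
    if hard != infinity:
        # A finite hard limit caps what an unprivileged process may request.
        target = min(target, hard)
    # Never lower a soft limit that is already higher than the target.
    return max(soft, target)


def raise_nofile_limit(target=65535):
    """Raise the open-file soft limit toward `target`, clamped to the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = clamp_soft_limit(target, soft, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft, hard


if __name__ == "__main__":
    print(raise_nofile_limit())
```

Alternatively, the hard limit itself can be raised outside Python (for example through the shell or container settings), in which case the original call would no longer fail.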

opened by husamhamu 1
Running FP16 version

Thanks for the impressive work! I have a question about how to run the training script with FP16 precision: how should the config file be updated accordingly?

opened by Ibrahim-Halfaoui 1
Training on simulation and testing on real-world benchmark

Thank you for sharing your interesting work. Have you perhaps tried training your model on the GTA dataset and testing on real-world images? It would be interesting to see how a model trained on synthetic data responds to real-world data.

opened by nikola310 1
Dependencies motmetrics==1.2.0 and nuscenes-devkit==1.1.1 clash

Hi guys,

impressive work! I am in the process of reproducing some results but think I found a dependency issue:

The versions of these two libraries pinned in the requirements file are not compatible. I verified this against the requirements file of nuscenes-devkit 1.1.1. Pip reported:

ERROR: Cannot install -r requirements.txt (line 11) and motmetrics==1.2.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested motmetrics==1.2.0
    nuscenes-devkit 1.1.1 depends on motmetrics<=1.1.3

My solution at the moment is to install nuscenes-devkit 1.1.3 instead but I am not sure yet if that doesn't break something. I will update this ticket if I find something not working.

opened by SaschaHornauer 1

Nuscenes Conversion Process Killed without Error Message

I am currently attempting to reproduce the nuScenes dataset results according to the instructions on the Getting Started page, but encountered the following result during the conversion:

Done loading in 44.116 seconds.
======
Reverse indexing ...
Done reverse indexing in 12.9 seconds.
======
total scene num: 850
exist scene num: 850
train scene: 700, val scene: 150
=====
Converting training set
=====
converting CAM_FRONT
100%|█████████████████████████████████████| 34149/34149 [09:24<00:00, 60.53it/s]
converting CAM_FRONT_RIGHT
100%|█████████████████████████████████████| 34149/34149 [08:31<00:00, 66.74it/s]
converting CAM_BACK_RIGHT
100%|█████████████████████████████████████| 34149/34149 [08:18<00:00, 68.45it/s]
converting CAM_BACK
100%|█████████████████████████████████████| 34149/34149 [09:40<00:00, 58.79it/s]
converting CAM_BACK_LEFT
 47%|█████████████████▏                   | 15902/34149 [14:47<59:00,  5.15it/s]
Killed

The lack of any error message makes it unclear what went wrong and how to avoid this error to get the proper conversion. Any idea why this occurs?

opened by CWAndersn 0

results on validation set

Hi,

Thanks for your excellent work.

I wonder whether you can provide the inference results on the nuScenes validation set (the .json file, which I think can be obtained by running the first part of scripts/run_eval_nusc.sh), in the submission format that can be read by the evaluation tools provided by nuscenes-devkit.

This would make it easier for us to visualize your algorithm and study its failure cases, and easier for other people to analyze its strengths and weaknesses.

Best, Tianyuan

opened by a1600012888 0
About DistributedDataParallel

Hi, I can see that the source code only uses non-distributed training, even when training with multiple GPUs. Is there any special reason why you use non-distributed training?

opened by cijj 0
Coordinate frame for camera pose
Hi everyone,

I am building a data pipeline to run with qd-3dt as follows:

Extract RGB frames from a monocular video (I have the camera intrinsics)

Generate depth maps using a depth detector (packnet-sfm/monodepth2, etc)

Generate camera trajectory pose using RGBD SLAM (ORB-SLAM3)

Pass the camera trajectory and the RGB frames to qd-3dt to get the 3D detections.

The camera trajectory from ORB-SLAM3 has the format [timestamp, tx, ty, tz, qx, qy, qz, qw], where (tx, ty, tz) is the translation and the (qx, qy, qz, qw) is the orientation in the form of a quaternion. The frame axis for these points is (z-forward, y-left and x-down).

What coordinate frame does the camera pose need to be in when we pass it to qd-3dt? I tried rotating the translation vector by 270 degrees XZ to get a (x-forward, y-right, z-down) frame, however, it does not seem to work. The vehicle trajectory is somehow represented upwards (screenshot: https://imgur.com/a/cAl3ptD).

Has anyone converted the TUM camera trajectory to work with this project?
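One general point that may be relevant here: when changing the coordinate frame of a pose, rotating only the translation vector is not enough, because the orientation must be re-expressed in the new frame as well. The sketch below illustrates this with a change-of-basis matrix P, applying t' = P t and R' = P R Pᵀ. The target frame (x-right, y-down, z-forward) is only an assumption chosen for illustration and is not confirmed to be what qd-3dt expects. The names quat_to_rot, convert_pose and SRC_TO_CAM are hypothetical.

```python
def quat_to_rot(qx, qy, qz, qw):
    """Convert a quaternion (x, y, z, w order, as in TUM files) to a 3x3 rotation matrix."""
    n = (qx * qx + qy * qy + qz * qz + qw * qw) ** 0.5
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    """Transpose a 3x3 matrix."""
    return [list(row) for row in zip(*a)]


def matvec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


# Assumed mapping from the source axes (x-down, y-left, z-forward) to an
# assumed target frame (x-right, y-down, z-forward):
#   right = -left (source y), down = source x, forward = source z.
SRC_TO_CAM = [
    [0, -1, 0],
    [1, 0, 0],
    [0, 0, 1],
]


def convert_pose(t, q, basis=SRC_TO_CAM):
    """Re-express a pose (translation t, quaternion q) in the frame defined by `basis`."""
    rot = quat_to_rot(*q)
    rot_new = matmul(matmul(basis, rot), transpose(basis))  # R' = P R P^T
    t_new = matvec(basis, t)                                # t' = P t
    return rot_new, t_new


if __name__ == "__main__":
    rot, trans = convert_pose([1.0, 2.0, 3.0], (0.0, 0.0, 0.0, 1.0))
    print(rot, trans)
```

If the target convention differs from the one assumed here, only SRC_TO_CAM needs to change, provided it remains a proper rotation (determinant +1).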
opened by C-Aniruddh 0
Minimalistic inference example
Hi

Nice work. Congrats!

Would it be possible to provide or give directions as to where to find a minimalistic inference example? Something like

Install (probably using instructions already provided)

Download models (same)

Run something like python predict.py -i input_video.mp4 --output results.json --overlay augm_video.mp4 potentially with some extra arguments to locate the pretrained models and produce results (3D boxes, tracking) + (optionally but would be very nice to have) the video with overlays?

Thank you.
opened by douglas125 0
RuntimeError: CUDA out of memory. Tried to allocate 84.00 MiB (GPU 0; 3.82 GiB total capacity; 2.37 GiB already allocated; 76.44 MiB free; 2.52 GiB reserved in total by PyTorch)

Hi, can you please share with us a way to solve this error:

RuntimeError: CUDA out of memory. Tried to allocate 84.00 MiB (GPU 0; 3.82 GiB total capacity; 2.37 GiB already allocated; 76.44 MiB free; 2.52 GiB reserved in total by PyTorch)

At first I thought it might be a compatibility issue, even though the message is quite clear that this is not the case, so nothing I tried really worked. Now I am having a hard time figuring out how to solve it and would appreciate some help. Thanks

opened by husamhamu 1
CUDA out of memory

I ran your training process on the nuScenes dataset with your default settings. My environment has 4x 3090 GPUs, but I get a "CUDA out of memory" error. How can I adjust the parameters to use less GPU memory?

opened by synsin0 0
Interpreting the Output of the QuasiDense3DSepUncertainty Model
I have been attempting to utilize your model with full 3D monocular tracking on custom data, and for that I would like to make use of the inference api. Although I want to use custom data, I am currently trying to run and visualize the model on the nuscenes dataset to verify that the API is working correctly. I am using the included monocular 3D Detection/Tracking result for nuscenes from the model zoo with the corresponding QuasiDense3DSepUncertainty model.

In order to work with the nuscenes configuration of the model, I had to modify the img_meta created in the api during _prepare_data as shown below. I believe this is necessary because this api was originally intended for a different model configuration.

def _prepare_data(img, calib, pose, img_transform, cfg, device):
    ori_shape = img.shape
    img, img_shape, pad_shape, scale_factor = img_transform(
        img,
        scale=cfg.data.test.img_scale,
        keep_ratio=cfg.data.test.get('resize_keep_ratio', True))
    img = to_tensor(img).to(device).unsqueeze(0)
    img_meta = [
        dict(
            ori_shape=ori_shape,
            img_shape=img_shape,
            pad_shape=pad_shape,
            scale_factor=scale_factor,
            flip=False,
            calib=calib,
            pose=pose,
            img_info=dict(type="TRK", cali=calib, pose=pose))
    ]
    return dict(img=[img], img_meta=[img_meta])

I am now attempting to perform a 3D visualization of the model output, basing my approach on the scripts/plot_tracking.py code. However, the resulting model output is not what I would expect it to be.

results, use_3d_center = inference_detector(model, img_path, calib, pose, nuscenes_categories)
print(len(results["depth_results"]))
print(len(results["alpha_results"]))
print(results["track_results"])

A common output of this code would look like this:

30
30
defaultdict(<class 'list'>, {0: {'bbox': array([ 427.682, 518.581, 446.410, 540.689, 0.056], dtype=float32), 'label': 8}})

My main issue stems from the fact that track_results always seems to include only one item, but tools/general_output.py seems to imply that the number of items should be the same as the length of the other results (depth_results, alpha_results, etc.).

I have found that by associating the 3D information (depth_results, dim_results, alpha_results) with the 2D bbox information output by the model, I can get 3D bboxes that seem to work to an extent, but not with the quality seen when using the inference and detection scripts that read from your converted dataset format. See some examples below:
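To make the association step described above concrete, here is a minimal sketch of lifting a 2D box to a 3D center with the standard pinhole camera model. It assumes the depth is measured along the optical axis and uses the 2D box center as a stand-in for the projected 3D center, which is only an approximation and not necessarily what qd3dt does internally. The function names and the intrinsics in the example are hypothetical.

```python
def bbox_center(bbox):
    """Return the pixel center (u, v) of a box given as [x1, y1, x2, y2, ...]."""
    x1, y1, x2, y2 = bbox[:4]
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at the given depth to camera coordinates (X, Y, Z).

    Pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy, with Z = depth.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


if __name__ == "__main__":
    # Hypothetical intrinsics and detection, for illustration only.
    fx = fy = 1000.0
    cx, cy = 800.0, 450.0
    box = [900.0, 450.0, 1100.0, 650.0, 0.9]  # x1, y1, x2, y2, score
    u, v = bbox_center(box)
    print(backproject(u, v, 10.0, fx, fy, cx, cy))
```

Using the box center instead of the projected 3D center can shift the result noticeably for truncated or close objects, which may be one source of the quality gap described above.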

In short, I would appreciate any insight into the direct usage of the QuasiDense3DSepUncertainty model, which doesn't seem to behave as expected when using the api provided in qd3dt/api/inference.py. It seems, based on the code used to run inference in tools/test_eval_video_exp.py and tools/general_output.py, that the track_results returned in the output should have more items, but instead it only outputs one item every time.

Is my assessment of the track_results output correct? What should the track_results output actually look like? Are there any assumptions that this inference API makes that would cause issues when attempting to use it with this model with full 3D tracking?

Thank you for your time and assistance.
opened by CWAndersn 0
