Delving into Localization Errors for Monocular 3D Object Detection, CVPR'2021

XINZHU.MA

Last update: Jan 4, 2023


Overview

Delving into Localization Errors for Monocular 3D Detection

By Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang.

Introduction

This repository is an official implementation of the paper 'Delving into Localization Errors for Monocular 3D Detection'. In this work, through intensive diagnostic experiments, we quantify the impact introduced by each sub-task and find that ‘localization error’ is the vital factor restricting monocular 3D detection. We also investigate the underlying reasons behind localization errors, analyze the issues they might bring, and propose three strategies.

Usage

Installation

This repo is tested on our local environment (python=3.6, cuda=9.0, pytorch=1.1), and we recommend using Anaconda to create a virtual environment:

conda create -n monodle python=3.6

Then, activate the environment:

conda activate monodle

Install PyTorch:

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch

and other requirements:

pip install -r requirements.txt

Data Preparation

Please download the KITTI dataset and organize the data as follows:

#ROOT
  |data/
    |KITTI/
      |ImageSets/ [already provided in this repo]
      |object/			
        |training/
          |calib/
          |image_2/
          |label/
        |testing/
          |calib/
          |image_2/
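
As a quick sanity check before training, the layout above can be verified with a short script. This is a minimal sketch and not part of the repository: the helper name `missing_kitti_dirs` and the `EXPECTED_DIRS` list are assumptions that simply mirror the tree shown above.

```python
import os
import tempfile

# Sub-directories expected under <ROOT>/data/KITTI, mirroring the tree above.
# This list is an assumption taken from the README layout, not from the code.
EXPECTED_DIRS = [
    "ImageSets",
    "object/training/calib",
    "object/training/image_2",
    "object/training/label",
    "object/testing/calib",
    "object/testing/image_2",
]


def missing_kitti_dirs(root):
    """Return the expected sub-directories that are absent under root/data/KITTI."""
    base = os.path.join(root, "data", "KITTI")
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(base, d))]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        print("missing before:", missing_kitti_dirs(root))
        for d in EXPECTED_DIRS:
            os.makedirs(os.path.join(root, "data", "KITTI", d))
        print("missing after:", missing_kitti_dirs(root))
```

Calling `missing_kitti_dirs` with the real #ROOT path returns an empty list when every directory is in place.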

Training & Evaluation

Move to the workplace and train the network:

 cd #ROOT
 cd experiments/example
 python ../../tools/train_val.py --config config_patchnet.yaml

The model will be evaluated automatically once training completes. If you only want to evaluate your trained model (or the provided pretrained model), you can modify the test part of the configuration in the .yaml file and use the following command:

python ../../tools/train_val.py --config config_patchnet.yaml --e

For ease of use, we also provide a pre-trained checkpoint, which can be used for evaluation directly. See the table below for its performance.

                     AP40@Easy   AP40@Mod.   AP40@Hard
In original paper    17.45       13.66       11.68
In this repo         17.94       13.72       12.10

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Ma_2021_CVPR,
author = {Ma, Xinzhu and Zhang, Yinmin and Xu, Dan and Zhou, Dongzhan and Yi, Shuai and Li, Haojie and Ouyang, Wanli},
title = {Delving into Localization Errors for Monocular 3D Object Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}}

Acknowledgment

This repo benefits from the excellent work CenterNet. Please also consider citing it.

License

This project is released under the MIT License.

Contact

If you have any questions about this project, please feel free to contact [email protected].

Comments

How do I conduct the experiments in Table 1 mentioned in the paper?

I changed the code below, but the result is very bad.

How can I get the Table 1 result?

Code:

def evaluate(label_path, result_path, label_split_file, current_class=0, coco=False, score_thresh=-1):
    dt_annos = kitti.get_label_annos(result_path)
    if score_thresh > 0:
        dt_annos = kitti.filter_annos_low_score(dt_annos, score_thresh)
    val_image_ids = _read_imageset_file(label_split_file)
    gt_annos = kitti.get_label_annos(label_path, val_image_ids)
    dt_annos = kitti.get_label_annos(label_path, val_image_ids)
    if coco:
        return get_coco_eval_result(gt_annos, dt_annos, current_class)
    else:
        return get_official_eval_result(gt_annos, dt_annos, current_class)

Result:

2021-08-13 18:43:24,476 INFO
Car [email protected], 0.70, 0.70:
bbox AP:100.0000, 100.0000, 100.0000
bev AP:0.0865, 0.1172, 0.2115
3d AP:0.0865, 0.1172, 0.2115
aos AP:100.00, 100.00, 100.00
Car [email protected], 0.70, 0.70:
bbox AP:100.0000, 100.0000, 100.0000
bev AP:0.0238, 0.0322, 0.0582
3d AP:0.0238, 0.0322, 0.0582
aos AP:100.00, 100.00, 100.00
Car [email protected], 0.50, 0.50:
bbox AP:100.0000, 100.0000, 100.0000
bev AP:0.0865, 0.1172, 0.2115
3d AP:0.0865, 0.1172, 0.2115
aos AP:100.00, 100.00, 100.00
Car [email protected], 0.50, 0.50:
bbox AP:100.0000, 100.0000, 100.0000
bev AP:0.0238, 0.0322, 0.0582
3d AP:0.0238, 0.0322, 0.0582
aos AP:100.00, 100.00, 100.00

opened by sjg02122 17
Code & Result on nuScenes

Hi, I have reproduced your result on KITTI. I can't find your submission on the nuScenes leaderboard, so I wonder if you have done experiments on the nuScenes dataset. If yes, could you share your accuracy? Will you release the code for nuScenes?

Thanks for your excellent work~

opened by Treemann 11
question about training results?

Hi, @xinzhuma Thanks for your amazing work! I trained your code and got the following results: Do you have any suggestions about the unstable training results? I think you have already made the code reproducible via set_random_seed.

opened by Senwang98 10
The detection results on the val split are poor.

I trained the model on a single GPU, without distributed training. The batch size is set to 8. The detection results are as follows:

Car [email protected], 0.70, 0.70:
bbox AP:89.6069, 86.3804, 78.2012
bev AP:23.5712, 18.8870, 15.9993
3d AP:16.8725, 13.4053, 12.3370
aos AP:88.85, 84.82, 76.00
Car [email protected], 0.70, 0.70:
bbox AP:94.9005, 89.4963, 80.5794
bev AP:21.5961, 16.3269, 13.8877
3d AP:14.9669, 11.0160, 9.5246
aos AP:93.97, 87.69, 78.04

opened by jichaofeng 8
Is the provided pretrained model trained only on Car?

Hi, @xinzhuma Many thanks for the great work! When I run the evaluation, it only tests on Car. When I add Pedestrian and Cyclist, the results are 0. So was the model trained only on Car?

opened by karenyun 7
Support for 30-series GPUs and a random seed bug

Hi, @xinzhuma Q1: It seems that the code can't run on 30x0 GPUs. I guess the code doesn't support CUDA 11 yet.

Q2: Setting the random seed does not make further experiments reproducible. A fixed random seed is already set, but it does not work as expected: each time a model is trained, the final results are different. Do you have any suggestions? Thanks very much! (This is strange, because we always use this method to make models reproducible.)
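
For context on Q2, a seeding helper for PyTorch training commonly looks like the sketch below. It is not the repository's set_random_seed: the name `seed_everything` is hypothetical, and NumPy and PyTorch are imported optionally so the snippet also runs without them. Even with these settings, some GPU operations can remain non-deterministic, which is one possible reason why runs with a fixed seed still differ.

```python
import random

try:  # NumPy and PyTorch are optional so the sketch runs without them.
    import numpy as np
except ImportError:
    np = None
try:
    import torch
except ImportError:
    torch = None


def seed_everything(seed):
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        # Ask cuDNN for deterministic kernels and disable autotuning.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


if __name__ == "__main__":
    seed_everything(444)
    first = [random.random() for _ in range(3)]
    seed_everything(444)
    second = [random.random() for _ in range(3)]
    print("same sequence after reseeding:", first == second)
```

Data-loading worker processes keep their own random state, so a helper like this covers only the main process.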

opened by Senwang98 7
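In the image preprocessing of the training code, x is not negated on flip. Is this a bug?
In the image preprocessing code, x is not negated when the image is flipped, so the projection of box3d onto the image becomes incorrect. Is this a bug? https://github.com/xinzhuma/monodle/blob/273d801a11ca048283abce6cc6a9898e76322f8e/lib/datasets/kitti/kitti_dataset.py#L165

Should the following be added:

object.pos[0] = -object.pos[0]

@xinzhuma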
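
The geometry behind this question can be checked with a small numeric sketch. It assumes a simplified pinhole model (no translation term in the projection matrix) and toy numbers that are not KITTI calibration values. The function names are hypothetical and none of this is taken from the repository. Under these assumptions, negating x reproduces the flipped pixel column only when the principal point is mirrored as well.

```python
def project_u(x, z, fx, cx):
    """Pinhole projection of a camera-frame point onto the image u-axis."""
    return fx * x / z + cx


def flip_u(u, width):
    """Pixel column after a horizontal image flip (0-based pixel indices)."""
    return width - 1 - u


def mirror_cx(cx, width):
    """Principal point column of the horizontally flipped image."""
    return width - 1 - cx


if __name__ == "__main__":
    # Toy values for illustration only.
    width, fx, cx = 1280, 700.0, 600.0
    x, z = 2.0, 20.0

    u = project_u(x, z, fx, cx)
    print("flipped pixel column:       ", flip_u(u, width))
    print("x unchanged, cx unchanged:  ", project_u(x, z, fx, cx))
    print("x negated, cx unchanged:    ", project_u(-x, z, fx, cx))
    print("x negated, cx mirrored:     ", project_u(-x, z, fx, mirror_cx(cx, width)))
```

Only the last variant agrees with the flipped pixel column, unless the principal point already sits exactly at the image center. Whether the dataset code needs such a change depends on how the flipped sample is used downstream, which this sketch does not address.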
opened by DuZzzs 7
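GPU utilization is 0% during multi-GPU training, and training is very slow
Hello, thank you for your excellent work. I ran into the following problems while running the code.

When testing the pretrained model: the pretrained model in the README was trained on multiple GPUs, so testing on a single GPU raises an error. Workaround: add the following at monodle/lib/helpers/save_helper.py:32:

model = torch.nn.DataParallel(model).cuda() # force the model into multi-GPU mode

During training, self.stats['train'] = {} raises an error. Workaround: add self.stats = {} before monodle/lib/helpers/trainer_helper.py:90.

On 2080 Ti cards, training with 2 GPUs and batch_size=16, GPU utilization is 0% most of the time and training is very slow. I did not see torch's DataLoader used in the code, so I am not sure how to set num_workers. Have you run into a similar problem? Thanks.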
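
Regarding the first problem above: torch.nn.DataParallel keeps the wrapped network under an attribute named module, so parameter names saved from a wrapped model carry a "module." prefix. If the single-GPU error comes from that prefix, an alternative to wrapping the model is to strip the prefix before loading. The sketch below is an assumption and not the repository's code: `strip_module_prefix` is a hypothetical helper, and the keys in the example are placeholders.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict whose keys no longer start with prefix."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


if __name__ == "__main__":
    # Plain values stand in for tensors; the key names are placeholders.
    saved = OrderedDict([("module.backbone.weight", 1), ("module.head.bias", 2)])
    print(list(strip_module_prefix(saved).keys()))
```

The cleaned dictionary could then be passed to the unwrapped model's load_state_dict.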
opened by DuZzzs 7
change backbone to DLA102, performance degradation

Hi, @xinzhuma Have you tested other backbones such as DLA102? When I change dla34 to dla102 in the config file, the 3D AP is only 10, a 6% performance degradation compared to the default config setting. Looking forward to your reply.

opened by Senwang98 6
a question about ablation study

We propose a method based on monodle and need to run an ablation study to evaluate the effect of our design. However, training the network with the same code yields a different model each time, so the detection results vary widely. In this case, how should the ablation study be done? Do you have a solution? We train the model on a single GPU.

opened by jichaofeng 5
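About filtering distant samples: if they are filtered out directly without ignore handling, won't they be treated as negative samples, and isn't that a problem?

Hello, thank you for your excellent work. I have some questions about the Training Samples part. In the code, samples are filtered directly by distance, without any ignore handling: https://github.com/xinzhuma/monodle/blob/main/lib/datasets/kitti/kitti_dataset.py#L201

If they are filtered out directly without ignore handling, won't the distant samples be treated as negative samples? Won't that cause problems?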

opened by yjcn 4
Instance level

Hello, when reading the label data, if I change the filter conditions for the instance level, does it mean that the distance range the trained model focuses on changes? And will the level-filtering hyperparameter affect the evaluation process?

opened by shanqiu24 0
Image size

Hello!

Thank you for your work. I have a question. I made my own dataset in KITTI format, but the image size is 1280 * 720. How should I modify the corresponding code?

opened by myfun-deep 0
Q

Why is the accuracy for Car (moderate) reported as 12.28 in the paper, while this repository reports 13.66 and 13.72? Do the tables in this repository describe results on the val set?

opened by shanqiu24 1
Wonder

When I train monodle:

(monodle) G:\WWProject\monodle\experiments\example>python ../../tools/train_val.py --config kitti_example.yaml
2022-10-11 10:46:15,326 INFO ################### Training ##################
2022-10-11 10:46:15,326 INFO Batch Size: 16
2022-10-11 10:46:15,326 INFO Learning Rate: 0.001250

why does it take more than half an hour for the training epoch progress bar to appear? I changed the original monodle to run on a single GPU; mine is a single 3090.

epochs: 0%| | 0/140 [00:00<?, ?it/s]
iters: 0%| | 0/232 [00:00<?, ?it/s]

opened by shanqiu24 0
error in implementing dim_aware_loss

Hi, thanks for your great work. When we used dim_aware_loss to train the dimension branch, we found a bug in dim_aware_loss.

https://github.com/xinzhuma/monodle/blob/e426aa65fdc7ceedcaab0d637acf3d3425d0736c/lib/losses/dim_aware_loss.py#L8

loss /= dimension may change the gradient direction of one of the parameters (h, w, l). For example, when h < 0, the gradient of |h - h*| is negative, so it becomes positive after dividing by h, and this causes the loss to increase when gradient descent updates the parameters. Dividing by dimension should not change the gradient direction of any parameter. We recommend changing this line to loss /= torch.abs(dimension).
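
The sign argument above can be reproduced with a toy scalar example. This is a sketch under stated assumptions, not the repository's loss: it treats the divisor as a constant, uses one scalar instead of a tensor, and the function name `grad_l1_over_dim` and all numbers are made up for illustration.

```python
def grad_l1_over_dim(pred, target, dim):
    """Derivative of |pred - target| / dim w.r.t. pred, with dim held constant."""
    sign = (pred > target) - (pred < target)  # sign of (pred - target)
    return sign / dim


if __name__ == "__main__":
    pred, target, lr = -0.5, 1.5, 0.1  # toy values with a negative prediction

    g_plain = grad_l1_over_dim(pred, target, dim=pred)      # divide by h
    g_abs = grad_l1_over_dim(pred, target, dim=abs(pred))   # divide by |h|

    # A gradient-descent step moves pred by -lr * gradient.
    print("divide by h:   grad", g_plain, "-> new pred", pred - lr * g_plain)
    print("divide by |h|: grad", g_abs, "-> new pred", pred - lr * g_abs)
```

With the negative divisor the update moves the prediction away from the target, while dividing by the absolute value moves it toward the target. This matches the behaviour the issue describes.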

opened by zhaokai5 0

Owner

XINZHU.MA

PhD student at the University of Sydney.

