Delving into Localization Errors for Monocular 3D Object Detection, CVPR'2021

Overview

Delving into Localization Errors for Monocular 3D Object Detection

By Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang.

Introduction

This repository is the official implementation of the paper 'Delving into Localization Errors for Monocular 3D Object Detection'. In this work, through intensive diagnostic experiments, we quantify the impact introduced by each sub-task and find that localization error is the vital factor restricting monocular 3D detection. We further investigate the underlying causes of localization errors, analyze the issues they may bring, and propose three strategies to address them.

[figure: visualization of detection results]

Usage

Installation

This repo is tested in our local environment (python=3.6, cuda=9.0, pytorch=1.1), and we recommend using Anaconda to create a virtual environment:

conda create -n monodle python=3.6

Then, activate the environment:

conda activate monodle

Install PyTorch:

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch

and other requirements:

pip install -r requirements.txt

Data Preparation

Please download the KITTI dataset and organize the data as follows:

#ROOT
  |data/
    |KITTI/
      |ImageSets/ [already provided in this repo]
      |object/			
        |training/
          |calib/
          |image_2/
          |label_2/
        |testing/
          |calib/
          |image_2/
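
To double-check the layout before training, a quick sanity check like the following may help (a sketch; the root path assumes you run it from #ROOT):

    import os

    root = 'data/KITTI/object'  # relative to #ROOT; adjust if your layout differs
    expected = {'training': ['calib', 'image_2', 'label_2'],
                'testing':  ['calib', 'image_2']}
    for split, subdirs in expected.items():
        for d in subdirs:
            path = os.path.join(root, split, d)
            print(('ok       ' if os.path.isdir(path) else 'MISSING  ') + path)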

Training & Evaluation

Move to the workspace and train the network:

 cd #ROOT
 cd experiments/example
 python ../../tools/train_val.py --config kitti_example.yaml

The model will be evaluated automatically when training completes. If you only want to evaluate your trained model (or the provided pretrained model), you can modify the test-part configuration in the .yaml file and use the following command:

python ../../tools/train_val.py --config kitti_example.yaml --e

For ease of use, we also provide a pretrained checkpoint, which can be used for evaluation directly. See the table below for its performance.

                    AP40@Easy   AP40@Mod.   AP40@Hard
In original paper     17.45       13.66       11.68
In this repo          17.94       13.72       12.10

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Ma_2021_CVPR,
author = {Ma, Xinzhu and Zhang, Yinmin and Xu, Dan and Zhou, Dongzhan and Yi, Shuai and Li, Haojie and Ouyang, Wanli},
title = {Delving into Localization Errors for Monocular 3D Object Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}}

Acknowledgment

This repo benefits from the excellent work CenterNet. Please also consider citing it.

License

This project is released under the MIT License.

Contact

If you have any questions about this project, please feel free to contact [email protected].

Comments
  • How do I conduct the experiments in Table 1 mentioned in the paper?

    I changed the code below, but the result is very bad.

    How can I get the Table 1 result?

    Code:

    # excerpt presumably from the kitti_object_eval_python toolkit
    # (`kitti` is its kitti_common module)
    def evaluate(label_path,
                 result_path,
                 label_split_file,
                 current_class=0,
                 coco=False,
                 score_thresh=-1):
        dt_annos = kitti.get_label_annos(result_path)
        if score_thresh > 0:
            dt_annos = kitti.filter_annos_low_score(dt_annos, score_thresh)
        val_image_ids = _read_imageset_file(label_split_file)
        gt_annos = kitti.get_label_annos(label_path, val_image_ids)
        # my modification: overwrite the detections with the ground-truth labels
        dt_annos = kitti.get_label_annos(label_path, val_image_ids)
        if coco:
            return get_coco_eval_result(gt_annos, dt_annos, current_class)
        else:
            return get_official_eval_result(gt_annos, dt_annos, current_class)
    

    Result:

    2021-08-13 18:43:24,476 INFO
    Car AP@0.70, 0.70, 0.70:
    bbox AP: 100.0000, 100.0000, 100.0000
    bev  AP: 0.0865, 0.1172, 0.2115
    3d   AP: 0.0865, 0.1172, 0.2115
    aos  AP: 100.00, 100.00, 100.00
    Car AP_R40@0.70, 0.70, 0.70:
    bbox AP: 100.0000, 100.0000, 100.0000
    bev  AP: 0.0238, 0.0322, 0.0582
    3d   AP: 0.0238, 0.0322, 0.0582
    aos  AP: 100.00, 100.00, 100.00
    Car AP@0.70, 0.50, 0.50:
    bbox AP: 100.0000, 100.0000, 100.0000
    bev  AP: 0.0865, 0.1172, 0.2115
    3d   AP: 0.0865, 0.1172, 0.2115
    aos  AP: 100.00, 100.00, 100.00
    Car AP_R40@0.70, 0.50, 0.50:
    bbox AP: 100.0000, 100.0000, 100.0000
    bev  AP: 0.0238, 0.0322, 0.0582
    3d   AP: 0.0238, 0.0322, 0.0582
    aos  AP: 100.00, 100.00, 100.00

    opened by sjg02122 17
  • Code & Result on nuScenes

    Hi, I have reproduced your result on KITTI. I can't find your submission on the nuScenes leaderboard, so I wonder if you have done experiments on the nuScenes dataset. If yes, could you share your accuracy? Will you release the code for nuScenes?

    Thanks for your excellent work~

    opened by Treemann 11
  • Question about training results

    Hi @xinzhuma, thanks for your amazing work! I trained your code and got the following results: [results screenshot]. Do you have any suggestions about unstable training results? I thought the code had been made reproducible via set_random_seed.

    opened by Senwang98 10
  • The detection results on the val split are poor

    I train the model on a single GPU and never use distributed training. The batch size is set to 8. The detection results are as follows:

    Car AP@0.70, 0.70, 0.70:
    bbox AP: 89.6069, 86.3804, 78.2012
    bev  AP: 23.5712, 18.8870, 15.9993
    3d   AP: 16.8725, 13.4053, 12.3370
    aos  AP: 88.85, 84.82, 76.00
    Car AP_R40@0.70, 0.70, 0.70:
    bbox AP: 94.9005, 89.4963, 80.5794
    bev  AP: 21.5961, 16.3269, 13.8877
    3d   AP: 14.9669, 11.0160, 9.5246
    aos  AP: 93.97, 87.69, 78.04

    opened by jichaofeng 8
  • Does the provided pretrained model only train on Car?

    Hi @xinzhuma, many thanks for the great work! When I run the evaluation, it only tests on Car. When I add Pedestrian and Cyclist, the results are 0. So was the model trained only on Car?

    opened by karenyun 7
  • Support for 30-series GPUs and a random-seed bug

    Hi @xinzhuma. Q1: It seems the code can't run on 30x0 GPUs. I guess it doesn't support CUDA 11 yet. [error screenshot]

    Q2: The random seed can't be used for further experiments. A fixed seed is set, but it does not work as expected: each time I train a model, the final results differ. Do you have any suggestions? Thanks very much! (This is strange, because we always use this method to make models reproducible.)
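
    For reference, fixed seeds alone are often not sufficient: cuDNN may still select non-deterministic kernels between runs. A fuller seeding routine typically looks like the sketch below (illustrative, not necessarily the repo's set_random_seed):

    import random
    import numpy as np
    import torch

    def set_random_seed(seed):
        random.seed(seed)                    # Python RNG
        np.random.seed(seed)                 # NumPy RNG
        torch.manual_seed(seed)              # CPU RNG
        torch.cuda.manual_seed_all(seed)     # all GPU RNGs
        # without these two flags, cuDNN may pick non-deterministic kernels,
        # so results can differ across runs even with all seeds fixed
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False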

    opened by Senwang98 7
  • x is not negated during flip in the training image preprocessing; is this a bug?

    In the image preprocessing code, x is not negated when the image is flipped, so the box3d projection into the image becomes incorrect. Is this a bug? https://github.com/xinzhuma/monodle/blob/273d801a11ca048283abce6cc6a9898e76322f8e/lib/datasets/kitti/kitti_dataset.py#L165

    Should the following be added:

    object.pos[0] = -object.pos[0]
    

    @xinzhuma
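
    For context, a minimal sketch of where the suggested fix would sit inside a horizontal-flip step (the surrounding code is illustrative, not the repo's exact implementation; box2d, pos, and ry follow KITTI conventions):

    import numpy as np

    def random_flip(img, objects):
        img = img[:, ::-1, :].copy()      # flip an HWC image horizontally
        W = img.shape[1]
        for object in objects:
            x1, y1, x2, y2 = object.box2d
            object.box2d = [W - 1 - x2, y1, W - 1 - x1, y2]  # mirror the 2D box
            object.pos[0] = -object.pos[0]   # the suggested fix: negate the 3D center's x
            object.ry = np.pi - object.ry    # heading typically also needs mirroring
        return img, objects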

    opened by DuZzzs 7
  • 0% GPU utilization during multi-GPU training; training is very slow

    Hello, and thank you for your excellent work. I ran into the following problems while running the code.

    1. When testing the pretrained model: the checkpoint in the README was trained on multiple GPUs, and single-GPU testing raises an error. Workaround: at monodle/lib/helpers/save_helper.py:32, add:
        model = torch.nn.DataParallel(model).cuda()  # force the model onto multiple GPUs
    2. During training, self.stats['train'] = {} raises an error. Workaround: add self.stats = {} before monodle/lib/helpers/trainer_helper.py:90.
    3. On two 2080 Ti GPUs with batch_size=16, GPU utilization is 0% most of the time and training is very slow. I didn't see torch's DataLoader used in the code, so I'm not sure how to set num_workers (see the sketch after this list). Have you run into similar problems? Thanks.
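
    For the num_workers question, the standard PyTorch pattern is to pass it when constructing the training loader, e.g. (a runnable sketch; the stand-in dataset replaces the repo's KITTI dataset class):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # stand-in dataset; in the repo this would be the KITTI dataset class
    train_set = TensorDataset(torch.randn(64, 3, 384, 1280))
    train_loader = DataLoader(train_set,
                              batch_size=16,
                              shuffle=True,
                              num_workers=4,     # worker processes for data loading
                              pin_memory=True,   # faster host-to-GPU copies
                              drop_last=True)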
    opened by DuZzzs 7
  • Changing the backbone to DLA102 degrades performance

    Hi @xinzhuma, have you tested other backbones such as DLA102? When I change the config from dla34 to dla102, the 3D AP is only 10, about 6 points lower than with the default config. Looking forward to your reply.

    opened by Senwang98 6
  • A question about the ablation study

    We propose a method based on monodle and need an ablation study to evaluate the effect of our design. We obtain different models when training the network with the same code, so the detection results vary largely. In this case, how should we do the ablation study? Do you have a solution? We train the model on a single GPU.

    opened by jichaofeng 5
  • About distant-sample filtering: distant samples are filtered directly without ignore handling, so they become negatives; won't this cause problems?

    Hello, and thank you for your excellent work. I have some questions about the Training Samples part. In the code, samples are filtered directly by distance, with no ignore handling: https://github.com/xinzhuma/monodle/blob/main/lib/datasets/kitti/kitti_dataset.py#L201

    Filtering distant samples directly instead of marking them as ignore means they are treated as negative samples; won't this cause problems?
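
    For illustration, the difference between the two treatments in a CenterNet-style pipeline could look like the sketch below (names, sizes, and the 65 m threshold are illustrative, not the repo's code):

    import numpy as np
    from collections import namedtuple

    Obj = namedtuple('Obj', ['depth', 'center_uv'])        # hypothetical minimal fields
    objects = [Obj(12.0, (40, 30)), Obj(80.0, (200, 50))]  # one near, one distant

    MAX_DEPTH = 65.0                     # illustrative distance threshold
    H, W = 96, 320                       # illustrative heatmap size

    kept    = [o for o in objects if o.depth <= MAX_DEPTH]
    dropped = [o for o in objects if o.depth >  MAX_DEPTH]

    # "filter" (what the linked code does): only `kept` is rendered into the
    # target heatmap, so pixels of dropped objects count as background/negatives.
    # "ignore" (the alternative asked about): additionally zero-weight those
    # pixels in the loss, so they are neither positive nor negative.
    loss_mask = np.ones((H, W), dtype=np.float32)
    for o in dropped:
        u, v = o.center_uv
        loss_mask[v, u] = 0.0            # the heatmap loss would multiply by this mask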

    opened by yjcn 4
  • Instance level

    Hello, when reading the label data, if I change the instance-level filter conditions, does that mean the trained model focuses on a different distance range? And will the level-filtering hyperparameter affect the evaluation process?

    opened by shanqiu24 0
  • Image size

    Hello!

    Thank you for your work. I have a question: I made my own dataset in KITTI format, but the image size is 1280×720. How should I modify the corresponding code?
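
    One common recipe (a sketch under the assumption that the pipeline consumes KITTI-style P2 projection matrices; names are illustrative): resize images to the resolution the network expects and scale the camera intrinsics to match.

    import numpy as np
    import cv2

    def resize_with_calib(img, P2, target_w, target_h):
        """Resize an image and scale its 3x4 projection matrix to match."""
        h, w = img.shape[:2]
        sx, sy = target_w / w, target_h / h
        img = cv2.resize(img, (target_w, target_h))
        P2 = P2.copy()
        P2[0, :] *= sx    # scales fx, cx and the x-translation term
        P2[1, :] *= sy    # scales fy, cy and the y-translation term
        return img, P2

    # e.g. bring a 1280x720 frame to KITTI-like 1280x384 (aspect ratio changes;
    # alternatively pad/crop to preserve it)
    img = np.zeros((720, 1280, 3), dtype=np.uint8)
    P2 = np.eye(3, 4, dtype=np.float64)
    img, P2 = resize_with_calib(img, P2, 1280, 384)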

    opened by myfun-deep 0
  • Q

    Why is the accuracy of the moderate car 12.28 in the paper, while this repository reports 13.66 and 13.72? Are the tables in this repository describing results on the val set?

    opened by shanqiu24 1
  • Wonder

    Why, when I train monodle:

    (monodle) G:\WWProject\monodle\experiments\example>python ../../tools/train_val.py --config kitti_example.yaml
    2022-10-11 10:46:15,326 INFO ################### Training ##################
    2022-10-11 10:46:15,326 INFO Batch Size: 16
    2022-10-11 10:46:15,326 INFO Learning Rate: 0.001250

    does it take more than half an hour before the training epoch progress bar appears? I changed the original monodle to a single-GPU run; mine is a single 3090.

    epochs: 0%| | 0/140 [00:00<?, ?it/s]
    iters: 0%| | 0/232 [00:00<?, ?it/s]

    opened by shanqiu24 0
  • Error in implementing dim_aware_loss

    Hi, thanks for your great work. When we used dim_aware_loss to train the dimension branch, we found a bug in it.

    https://github.com/xinzhuma/monodle/blob/e426aa65fdc7ceedcaab0d637acf3d3425d0736c/lib/losses/dim_aware_loss.py#L8

    loss /= dimension may change the gradient direction for one of the parameters (h, w, l). For example, when h < 0, the gradient of |h - h*| is negative, so it becomes positive after dividing by h; gradient descent then pushes h in the wrong direction and the error grows. Dividing by dimension should not change the gradient direction of any parameter. We recommend changing this line to loss /= torch.abs(dimension).
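
    A small self-contained repro of the sign flip (h_gt and h_pred are illustrative scalars, and the division assumes dimension is detached, acting as a constant scale):

    import torch

    h_gt = torch.tensor(1.5)
    h_pred = torch.tensor(-0.5, requires_grad=True)
    dim = h_pred.clone().detach()          # detached predicted dimension

    loss = torch.abs(h_pred - h_gt) / dim  # divide by the raw (negative) value
    loss.backward()
    print(h_pred.grad)                     # tensor(2.): a descent step pushes
                                           # h_pred further from h_gt

    h_pred.grad = None
    loss_fixed = torch.abs(h_pred - h_gt) / torch.abs(dim)  # the suggested fix
    loss_fixed.backward()
    print(h_pred.grad)                     # tensor(-2.): descent now moves
                                           # h_pred toward h_gt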

    opened by zhaokai5 0