Progressive Coordinate Transforms for Monocular 3D Object Detection

Last update: Nov 6, 2022

This repository is the official implementation of PCT.

Introduction

In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations for monocular 3D object detection. Specifically, a localization boosting mechanism with a confidence-aware loss is introduced to progressively refine the localization prediction. In addition, a semantic image representation is exploited to compensate for the usage of patch proposals. Despite being lightweight and simple, our strategy allows us to establish a new state-of-the-art among the monocular 3D detectors on the competitive KITTI benchmark. At the same time, our proposed PCT generalizes well to most coordinate-based 3D detection frameworks.

Requirements

Installation

Download this repository (tested with Python 3.7, PyTorch 1.3.1, and Ubuntu 16.04.7). It also depends on packages such as cv2, yaml, and tqdm; install them as follows:

cd #root
pip install -r requirements
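
Before continuing, it can help to confirm that the dependencies import in the active environment. The following is a minimal sketch, not part of the repository; the helper name `find_missing` is hypothetical, and the module list covers only the three packages named above.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # cv2, yaml, and tqdm are the dependencies named in the installation notes.
    missing = find_missing(["cv2", "yaml", "tqdm"])
    print("missing modules:", missing or "none")
```

An empty result means all three packages are importable; any name that is printed still needs to be installed.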

Then, you need to compile the evaluation script:

cd #root/tools/kitti_eval
sh compile.sh

Prepare your data

First, you should download the KITTI dataset, and organize the data as follows (* indicates an empty directory to store the data generated in subsequent steps):


#ROOT
  |data
    |KITTI
      |2d_detections
      |ImageSets
      |pickle_files *
      |object
        |training
          |calib
          |image_2
          |label
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
        |testing
          |calib
          |image_2
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
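
The layout above can be checked with a short script before running the later steps. This is a sketch under assumptions, not a repository tool: the helper name `check_layout` and the two lists are hypothetical, the optional pseudo_lidar and velodyne folders are left out, and the directory names are copied from the tree as printed.

```python
from pathlib import Path

# Directories from the layout above that must already contain data.
REQUIRED_DIRS = [
    "data/KITTI/2d_detections",
    "data/KITTI/ImageSets",
    "data/KITTI/object/training/calib",
    "data/KITTI/object/training/image_2",
    "data/KITTI/object/training/label",
    "data/KITTI/object/testing/calib",
    "data/KITTI/object/testing/image_2",
]
# Directories marked * in the layout: empty at first, filled by later steps.
GENERATED_DIRS = [
    "data/KITTI/pickle_files",
    "data/KITTI/object/training/depth",
    "data/KITTI/object/testing/depth",
]


def check_layout(root):
    """Create the empty (*) directories and return the required ones that are missing."""
    root = Path(root)
    for rel in GENERATED_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    return [rel for rel in REQUIRED_DIRS if not (root / rel).is_dir()]
```

Calling `check_layout(".")` from the repository root returns an empty list once every required folder is in place.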

Second, prepare your depth maps and put them in data/KITTI/object/training/depth. For ease of use, we also provide estimated depth maps, generated with the pretrained models provided by DORN and Pseudo-LiDAR.
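
For readers new to depth-based pipelines, a depth map is commonly turned into 3D coordinates through pinhole back-projection. The sketch below illustrates that formula for a single pixel. It is not the repository's code: the function name and the intrinsics values (fx, fy, cx, cy) are illustrative assumptions rather than values read from a KITTI calib file, and any translation term in the projection matrix is ignored.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with depth `depth` (metres) to camera coordinates (x, y, z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Illustrative intrinsics (assumed, not taken from a real calibration file).
FX, FY, CX, CY = 700.0, 700.0, 600.0, 180.0
point = backproject(u=670.0, v=180.0, depth=10.0, fx=FX, fy=FY, cx=CX, cy=CY)
print(point)  # (1.0, 0.0, 10.0)
```

A pixel 70 columns right of the principal point at 10 m depth lands 1 m to the right of the optical axis, which is the kind of coordinate representation the detector operates on.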

Monocular (DORN)	Stereo (PSMNet)
trainval(~1.6G), test(~1.6G)	trainval(~2.5G)

Then, generate image 2D features for the 2D bounding boxes and put them in data/KITTI/pickle_files/org. We train our 2D detector following the 2D detector in RTM3D. You can also use your own 2D detector for training and inference.

Finally, generate the training data using the provided scripts:

cd #root/tools/data_prepare
python patch_data_prepare_val.py --gen_train --gen_val --gen_val_detection --car_only
mv *.pickle ../../data/KITTI/pickle_files
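
After the files are moved, a quick check that they load can save a failed training run. This is a minimal sketch, assuming the generated files are ordinary Python pickles; the helper name `summarize_pickles` is hypothetical and the internal structure of the files is not documented here, so only the top-level type and length are reported.

```python
import pickle
from pathlib import Path


def summarize_pickles(directory):
    """Load every *.pickle file in `directory` and report its top-level type and size."""
    summary = {}
    for path in sorted(Path(directory).glob("*.pickle")):
        with path.open("rb") as f:
            # Only load files you generated yourself; unpickling runs arbitrary code.
            # If a file holds several consecutive dumps, this reads only the first.
            obj = pickle.load(f)
        size = len(obj) if hasattr(obj, "__len__") else None
        summary[path.name] = (type(obj).__name__, size)
    return summary
```

For example, `summarize_pickles("data/KITTI/pickle_files")` run from the repository root lists each generated file with its type and length.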

Prepare Waymo dataset

We also provide Waymo Usage for monocular 3D detection.

Training

Move to the workspace and train the model (you also need to update the path to the pickle files in the config file):

 cd #root
 cd experiments/pct
 python ../../tools/train_val.py --config config_val.yaml

Evaluation

Generate the results using the trained model:

 python ../../tools/train_val.py --config config_val.yaml --e

and evaluate the generated results using:

../../tools/kitti_eval/evaluate_object_3d_offline_ap11 ../../data/KITTI/object/training/label_2 ./output

../../tools/kitti_eval/evaluate_object_3d_offline_ap40 ../../data/KITTI/object/training/label_2 ./output
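
The two evaluators differ in how many recall positions the average precision is sampled at. The sketch below shows the 11-point and 40-point interpolated AP on a toy precision-recall curve. It illustrates the metric definitions only and is not the code of the compiled evaluator; the function name `interpolated_ap` and the toy curve are assumptions made for illustration.

```python
def interpolated_ap(recalls, precisions, num_points):
    """Interpolated AP over `num_points` recall thresholds.

    num_points=11 samples recall at 0, 0.1, ..., 1.0;
    num_points=40 samples recall at 1/40, 2/40, ..., 1.0 (recall 0 is skipped).
    """
    if num_points == 11:
        thresholds = [i / 10 for i in range(11)]
    elif num_points == 40:
        thresholds = [i / 40 for i in range(1, 41)]
    else:
        raise ValueError("num_points must be 11 or 40")
    total = 0.0
    for t in thresholds:
        # Interpolated precision: best precision at any recall >= t (0 if none).
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates, default=0.0)
    return total / len(thresholds)


# Toy curve: a detector that reaches 50% recall with perfect precision.
recalls, precisions = [0.25, 0.5], [1.0, 1.0]
print(interpolated_ap(recalls, precisions, 11))  # 6/11, about 0.545
print(interpolated_ap(recalls, precisions, 40))  # 0.5
```

The 11-point variant counts the recall-0 threshold, which inflates the score for the same curve; this is why AP11 and AP40 numbers should not be compared with each other.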

Because data preparation is tedious, we provide the generated results for evaluation. Unzip output.zip and then run the evaluation commands above. The results are:

Models	AP3D11@mod.	AP3D11@easy	AP3D11@hard
PatchNet + PCT	27.53 / 34.65	38.39 / 47.16	24.44 / 28.47

Acknowledgements

This code benefits from the excellent work PatchNet and uses the off-the-shelf models provided by DORN and RTM3D.

Citation

@article{wang2021pct,
  title={Progressive Coordinate Transforms for Monocular 3D Object Detection},
  author={Li Wang and Li Zhang and Yi Zhu and Zhi Zhang and Tong He and Mu Li and Xiangyang Xue},
  journal={arXiv preprint arXiv:2108.05793},
  year={2021}
}

Contact

For questions regarding PCT-3D, feel free to post here or directly contact the authors ([email protected]).

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Comments

Waymo Results: mAP for all classes or only for the vehicle class

Hi PCT authors, I have a small query regarding the Waymo results. Table 7 of your paper reports the mAP on the Waymo dataset. Do you report the mAP/mAPH over all classes, or only the mAP/mAPH for the vehicle (car) class?

PS: Another paper, CaDDN, only reports mAP on the vehicle (car) class in its Table 2.

opened by abhi1kumar 4

Waymo evaluation: Metrics of all Level 1 Objects same as Metrics of [0, 30) Level 1 Objects

Hi PCT authors, I am using your waymo_eval.py for evaluating my Waymo model. Here is the output

OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/AP: 0.34
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APH: 0.33
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/AP: 0.02
OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APH: 0.02
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1/AP: 0.34
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1/APH: 0.33
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2/AP: 0.04
RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2/APH: 0.04
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_1/AP: 0.12
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_1/APH: 0.12
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_2/AP: 0.00
RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_2/APH: 0.00
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_1/AP: 0.05
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_1/APH: 0.05
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_2/AP: 0.00
RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_2/APH: 0.00

You should quickly notice that the AP for all Level 1 Vehicle = 0.34 is the same as the AP for [0,30) Level 1 Vehicle = 0.34. This strange behavior also shows up for the Level 1 Vehicle APH and other Level 1 classes (which I have not shown here). Generally, the AP for all Level 1 Vehicle is less than the AP for [0,30) Level 1 Vehicle as correctly reported in Table 7 of your paper.

I am unable to understand this behavior and so wanted to ask if you saw similar stuff on your end.

PS- Level 2 metrics do NOT show this behavior. e.g., in the above output, AP for all Level 2 objects (0.02), is less than AP for [0,30) Level 2 objects (0.04) as expected.

I am using anaconda and following are the packages in my conda environment:

blas                      1.0                         mkl    anaconda
cudatoolkit               10.1.243             h6bb024c_0    anaconda
cudnn                     7.6.5                cuda10.1_0    anaconda
google-auth               1.22.1                     py_0    anaconda
google-auth-oauthlib      0.4.1                      py_2    anaconda
google-pasta              0.2.0                      py_0    anaconda
protobuf                  3.13.0.1         py36he6710b0_1    anaconda
py-opencv                 3.4.2            py36hb342d67_1
python                    3.6.13               h12debd9_1  
tensorboard               2.2.1              pyh532a8cf_0    anaconda
tensorflow                2.1.0           gpu_py36h2e5cdaa_0    anaconda
tensorflow-gpu            2.1.0                h0d30ee6_0    anaconda

opened by abhi1kumar 3

How may I get the files in data/KITTI/pickle_files/org?

Hi, thank you very much for sharing your great work. How can I get the 2D image features needed in data/KITTI/pickle_files/org?

Best

opened by YunzheWu-404 3

How much time does it take to convert Waymo to KITTI format?

Thank you for the amazing work. How long does it take to convert Waymo to KITTI format using the script

python converter.py --save_dir datasets/waymo_open_organized/ --split validation

The validation split seems to take a long time on my machine, so I wanted to confirm.

opened by abhi1kumar 2

About generating 2D detection feature!

Hi, thanks for sharing your great work! Do you share your 2D detection feature file? Or could you tell me which layer's features should be saved in RTM3D?

you need to generate image 2D features for the 2D bounding boxes and put them to data/KITTI/pickle_files/org

opened by rockywind 2

Update for safe yaml loading

opened by Willy0919 0

Update patch_dataset.py for yaml loading

opened by Willy0919 0

fix bugs in #7

opened by Willy0919 0

fix bugs

opened by Willy0919 0

Add all source code

Description of changes: Add all source code, first commit

opened by bryanyzhu 0

about waymo result

Hi, you mentioned that you use AdaBins trained on Waymo. How did you do that, since Waymo doesn't provide ground-truth depth maps? Another question: did you train the whole model (depth completion, 2D detection, and 3D detection) in an end-to-end manner?

opened by mc171819 2

Could you provide the output.zip file or pretrained model checkpoints?

Hi. I noticed that you mentioned in the README.md file that

we provide the generated results for evaluation due to the tedious process of data preparation process. Unzip the output.zip and then execute the above evaluation commands. ...

However, I did not find a link to the result file. Could you share the detection results or the pretrained model with us? Thank you very much.

opened by anti-destiny 0

Some confusion and a request

First of all, thank you for your excellent work. I have some questions and a request.

1. On line 27 of kitti_dataset, you load labels from 'ddmp' rather than the provided "label_2". What preprocessing did you apply to the labels? I did not find an explanation in your paper.
2. As you said in the paper, the performance of the 2D detector has no positive correlation with the final 3D detection accuracy. How should I choose a 2D detector, given that I cannot simply pick the best one?
3. Have you run experiments with other coordinate-based detectors? The paper only reports PatchNet+PCT.
4. Can you provide the 2D detection feature files for training and testing so that I can run the code?

opened by mrsempress 2

ModuleNotFoundError: No module named 'lib.helpers.decorator_helper_level'

When I run python ../../tools/train_val.py --config config_val.yaml, I get the following error.

Traceback (most recent call last):
  File "../../tools/train_val.py", line 19, in <module>
    from lib.helpers.trainer_helper import Trainer
  File "/newnfs/zzwu/08_3d_code/progressive-coordinate-transforms/lib/helpers/trainer_helper.py", line 11, in <module>
    from lib.helpers.decorator_helper_level import decorator_level
ModuleNotFoundError: No module named 'lib.helpers.decorator_helper_level'

opened by rockywind 3
