Progressive Coordinate Transforms for Monocular 3D Object Detection

Overview

This repository is the official implementation of PCT.

Introduction

In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations for monocular 3D object detection. Specifically, a localization boosting mechanism with a confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representations are exploited to compensate for the usage of patch proposals. Despite being lightweight and simple, our strategy establishes a new state of the art among monocular 3D detectors on the competitive KITTI benchmark. At the same time, PCT generalizes well to most coordinate-based 3D detection frameworks.

[architecture overview figure]
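
A minimal, hypothetical sketch of the core idea (the class, function, and tensor names below are ours, not the repository's): each refinement stage consumes image features together with the current 3D center estimate, predicts a residual correction, and emits a confidence score that weights the localization loss.

import torch
import torch.nn as nn

class ProgressiveRefiner(nn.Module):
    """Toy illustration of progressive coordinate refinement."""

    def __init__(self, feat_dim: int, num_stages: int = 3):
        super().__init__()
        # each stage maps (features, current center) -> (dx, dy, dz, confidence logit)
        self.stages = nn.ModuleList(
            [nn.Linear(feat_dim + 3, 4) for _ in range(num_stages)]
        )

    def forward(self, feats: torch.Tensor, center: torch.Tensor):
        confidences = []
        for stage in self.stages:
            out = stage(torch.cat([feats, center], dim=-1))
            center = center + out[..., :3]                 # progressive refinement
            confidences.append(torch.sigmoid(out[..., 3]))
        return center, confidences

def confidence_aware_l1(pred, target, conf, eps=1e-6):
    # weight the regression error by the predicted confidence and add a
    # log penalty so the network cannot trivially drive confidence to zero
    return (conf * (pred - target).abs().sum(-1) - torch.log(conf + eps)).mean()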

Requirements

Installation

Download this repository (tested with Python 3.7, PyTorch 1.3.1, and Ubuntu 16.04.7). There are also some dependencies such as cv2, yaml, and tqdm; please install them accordingly:

cd #root
pip install -r requirements.txt

Then, you need to compile the evaluation script:

cd #root/tools/kitti_eval
sh compile.sh

Prepare your data

First, download the KITTI dataset and organize the data as follows (* indicates an empty directory used to store data generated in subsequent steps):


#ROOT
  |data
    |KITTI
      |2d_detections
      |ImageSets
      |pickle_files *
      |object
        |training
          |calib
          |image_2
          |label_2
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
        |testing
          |calib
          |image_2
          |depth *
          |pseudo_lidar (optional for Pseudo-LiDAR)*
          |velodyne (optional for FPointNet)
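
Before moving on, you can sanity-check the layout with a short script (paths taken from the tree above; a convenience sketch, not part of the repository):

from pathlib import Path

ROOT = Path("data/KITTI/object")
expected = [
    "training/calib", "training/image_2", "training/label_2", "training/depth",
    "testing/calib", "testing/image_2", "testing/depth",
]
missing = [d for d in expected if not (ROOT / d).is_dir()]
print("missing directories:", missing or "none")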

Second, prepare your depth maps and put them in data/KITTI/object/training/depth. For ease of use, we also provide the estimated depth maps (generated from the pretrained models provided by DORN and Pseudo-LiDAR):

Monocular (DORN)                  Stereo (PSMNet)
trainval (~1.6G), test (~1.6G)    trainval (~2.5G)
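
If you generate the optional pseudo_lidar data yourself, the usual recipe is to back-project every valid depth pixel into camera coordinates using the calibration intrinsics. The sketch below assumes depth maps stored as 16-bit PNGs scaled by 256 (the KITTI depth-benchmark convention); verify the actual format of the downloaded files before relying on it.

import cv2
import numpy as np

def depth_to_pseudo_lidar(depth_png, fx, fy, cx, cy):
    # assumed format: 16-bit PNG with depth in meters * 256; 0 means "no measurement"
    depth = cv2.imread(depth_png, cv2.IMREAD_ANYDEPTH).astype(np.float32) / 256.0
    v, u = np.nonzero(depth)            # pixel rows/cols with valid depth
    z = depth[v, u]
    x = (u - cx) * z / fx               # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) points in the camera frame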

Then, generate 2D image features for the 2D bounding boxes and put them in data/KITTI/pickle_files/org. We train the 2D detector following the one used in RTM3D; you can also use your own 2D detector for training and inference.
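
The exact feature format expected under pickle_files/org is defined by the data-preparation scripts. Purely as an illustration of pooling one feature vector per 2D box (every name below is hypothetical, not the repository's API), torchvision's roi_align can be used like this:

import torch
import torchvision
from torchvision.ops import roi_align

# ResNet-18 backbone without the average-pool and FC head
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet18(weights=None).children())[:-2]
)
backbone.eval()

image = torch.rand(1, 3, 384, 1280)                       # a KITTI-sized input
boxes = torch.tensor([[0, 100.0, 120.0, 300.0, 260.0]])   # (batch_idx, x1, y1, x2, y2)

with torch.no_grad():
    fmap = backbone(image)                                # (1, 512, H/32, W/32)
    # spatial_scale maps image coordinates onto the feature map
    roi_feats = roi_align(fmap, boxes, output_size=(7, 7), spatial_scale=1.0 / 32)
    box_feats = roi_feats.mean(dim=(2, 3))                # one 512-d vector per box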

Finally, generate the training data using the provided scripts:

cd #root/tools/data_prepare
python patch_data_prepare_val.py --gen_train --gen_val --gen_val_detection --car_only
mv *.pickle ../../data/KITTI/pickle_files

Prepare Waymo dataset

We also provide a Waymo Usage guide for monocular 3D detection.

Training

Move to the workspace and train the model (you also need to modify the path of the pickle files in the config file):

 cd #root
 cd experiments/pct
 python ../../tools/train_val.py --config config_val.yaml
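
If you want to inspect or verify the pickle-file paths programmatically, the config is plain YAML and can be loaded safely (key names depend on the actual config schema, so treat this as illustrative):

import yaml

with open("config_val.yaml") as f:
    cfg = yaml.safe_load(f)   # safe_load avoids arbitrary object construction
print(cfg)                    # inspect where the pickle-file paths are set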

Evaluation

Generate the results using the trained model:

 python ../../tools/train_val.py --config config_val.yaml --e

and evaluate the generated results using:

../../tools/kitti_eval/evaluate_object_3d_offline_ap11 ../../data/KITTI/object/training/label_2 ./output

or

../../tools/kitti_eval/evaluate_object_3d_offline_ap40 ../../data/KITTI/object/training/label_2 ./output

Because the data preparation process is tedious, we also provide the generated results for evaluation. Unzip output.zip and then execute the above evaluation commands. The results are:

Models            AP3D11@mod.     AP3D11@easy     AP3D11@hard
PatchNet + PCT    27.53 / 34.65   38.39 / 47.16   24.44 / 28.47

Acknowledgements

This code benefits from the excellent work PatchNet, and uses the off-the-shelf models provided by DORN and RTM3D.

Citation

@article{wang2021pct,
  title={Progressive Coordinate Transforms for Monocular 3D Object Detection},
  author={Wang, Li and Zhang, Li and Zhu, Yi and Zhang, Zhi and He, Tong and Li, Mu and Xue, Xiangyang},
  journal={arXiv preprint arXiv:2108.05793},
  year={2021}
}

Contact

For questions regarding PCT-3D, feel free to open an issue here or directly contact the authors ([email protected]).

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Comments
  • Waymo Results: mAP for all classes or only for the vehicle class

    Hi PCT authors, I had a small query regarding the Waymo results. Table 7 of your paper reports the mAP on the Waymo dataset. Do you report the mAP/mAPH of all the classes, or only the mAP/mAPH for the vehicle (car) class?

    PS- Another paper CaDDN only reports mAP on the vehicle (car) class in their Table 2.

    opened by abhi1kumar 4
  • Waymo evaluation: Metrics of all Level 1 Objects same as Metrics of [0, 30) Level 1 Objects

    Hi PCT authors, I am using your waymo_eval.py for evaluating my Waymo model. Here is the output

    OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/AP: 0.34
    OBJECT_TYPE_TYPE_VEHICLE_LEVEL_1/APH: 0.33
    OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/AP: 0.02
    OBJECT_TYPE_TYPE_VEHICLE_LEVEL_2/APH: 0.02
    RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1/AP: 0.34
    RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_1/APH: 0.33
    RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2/AP: 0.04
    RANGE_TYPE_VEHICLE_[0, 30)_LEVEL_2/APH: 0.04
    RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_1/AP: 0.12
    RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_1/APH: 0.12
    RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_2/AP: 0.00
    RANGE_TYPE_VEHICLE_[30, 50)_LEVEL_2/APH: 0.00
    RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_1/AP: 0.05
    RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_1/APH: 0.05
    RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_2/AP: 0.00
    RANGE_TYPE_VEHICLE_[50, +inf)_LEVEL_2/APH: 0.00
    

    You should quickly notice that the AP for all Level 1 Vehicles (0.34) is the same as the AP for [0, 30) Level 1 Vehicles (0.34). This strange behavior also shows up for the Level 1 Vehicle APH and for other Level 1 classes (not shown here). Generally, the AP for all Level 1 Vehicles is less than the AP for [0, 30) Level 1 Vehicles, as correctly reported in Table 7 of your paper.

    I am unable to understand this behavior, so I wanted to ask whether you saw anything similar on your end.

    PS- Level 2 metrics do NOT show this behavior. e.g., in the above output, AP for all Level 2 objects (0.02), is less than AP for [0,30) Level 2 objects (0.04) as expected.

    I am using Anaconda, and the following packages are in my conda environment:

    blas                      1.0                         mkl    anaconda
    cudatoolkit               10.1.243             h6bb024c_0    anaconda
    cudnn                     7.6.5                cuda10.1_0    anaconda
    google-auth               1.22.1                     py_0    anaconda
    google-auth-oauthlib      0.4.1                      py_2    anaconda
    google-pasta              0.2.0                      py_0    anaconda
    protobuf                  3.13.0.1         py36he6710b0_1    anaconda
    py-opencv                 3.4.2            py36hb342d67_1
    python                    3.6.13               h12debd9_1  
    tensorboard               2.2.1              pyh532a8cf_0    anaconda
    tensorflow                2.1.0           gpu_py36h2e5cdaa_0    anaconda
    tensorflow-gpu            2.1.0                h0d30ee6_0    anaconda
    
    opened by abhi1kumar 3
  • How may I get the stuff in data/KITTI/pickle_files/org?

    Hi, thank you very much for sharing your great work. I would like to ask how I may get the 2D image features needed in data/KITTI/pickle_files/org?

    Best

    opened by YunzheWu-404 3
  • How much time does it take to convert Waymo to KITTI format?

    Thank you for the amazing work. I wanted to know how much time it takes to convert Waymo to KITTI format using the script

    python converter.py --save_dir datasets/waymo_open_organized/ --split validation
    

    The validation split seems to take a lot of time on my machine, so I wanted to confirm.

    opened by abhi1kumar 2
  • About generating 2D detection feature!

    Hi, thanks for sharing your great work! Do you share your 2D detection feature file? Or could you tell me which layer's features should be saved in RTM3D?

    you need to generate image 2D features for the 2D bounding boxes and put them to data/KITTI/pickle_files/org

    opened by rockywind 2
  • Update for safe yaml loading

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by Willy0919 0
  • Update for safe yaml loading

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by Willy0919 0
  • Update patch_dataset.py for yaml loading

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by Willy0919 0
  • fix bugs in #7

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by Willy0919 0
  • fix bugs

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by Willy0919 0
  • Add all source code

    Issue #, if available:

    Description of changes: Add all source code, first commit

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by bryanyzhu 0
  • about waymo result

    Hi, you mentioned that you use AdaBins trained on Waymo. How did you do that, since Waymo doesn't provide ground-truth depth maps? Another question: did you train the whole model (depth completion, 2D detection, and 3D detection) in an end-to-end manner?

    opened by mc171819 2
  • Could you provide the output.zip file or pretrained model checkpoints.

    Hi. I noticed that you mentioned in the README.md file that

    we provide the generated results for evaluation due to the tedious process of data preparation process. Unzip the output.zip and then execute the above evaluation commands. ...

    However, I did not find a link to the result file. Could you share the detection results or pretrained models with us? Thank you very much.

    opened by anti-destiny 0
  • Some confusion and a request

    First of all, thank you for your excellent work. But I have some confusion and a request.

    1. At line 27 of kitti_dataset, you load labels from 'ddmp', not the provided "label_2". What preprocessing did you do to the labels? I did not find an explanation in your paper.
    2. As you said in the paper, the performance of the 2D detector has no positive correlation with the final 3D detection accuracy. So how do I choose a 2D detector, given that I cannot choose the best one?
    3. Have you done experiments with other coordinate-based detectors? The paper only reports PatchNet+PCT.
    4. Can you provide the 2D detection feature files for training and testing so that I can run the code?

    opened by mrsempress 2
  • ModuleNotFoundError: No module named 'lib.helpers.decorator_helper_level'

    When I run python ../../tools/train_val.py --config config_val.yaml, I get the following error:

    Traceback (most recent call last):
      File "../../tools/train_val.py", line 19, in <module>
        from lib.helpers.trainer_helper import Trainer
      File "/newnfs/zzwu/08_3d_code/progressive-coordinate-transforms/lib/helpers/trainer_helper.py", line 11, in <module>
        from lib.helpers.decorator_helper_level import decorator_level
    ModuleNotFoundError: No module named 'lib.helpers.decorator_helper_level'

    opened by rockywind 3