Pseudo lidar - (CVPR 2019) Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Overview

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

This paper has been accepted by the Conference on Computer Vision and Pattern Recognition (CVPR) 2019.


by Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell and Kilian Q. Weinberger


Citation

@inproceedings{wang2019pseudo,
  title={Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving},
  author={Wang, Yan and Chao, Wei-Lun and Garg, Divyansh and Hariharan, Bharath and Campbell, Mark and Weinberger, Kilian},
  booktitle={CVPR},
  year={2019}
}

Update

  • 2 July 2020: Added a Jupyter notebook to visualize point clouds; it is in the ./visualization folder.
  • 29 July 2019: submission.py now saves disparities as numpy files instead of png files, and generate_lidar.py has been fixed accordingly.
  • I have slightly modified the official AVOD code, so you can now directly train and test pseudo-LiDAR with AVOD. Please check the code at https://github.com/mileyan/avod_pl.


Introduction

3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies --- a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that data representation (rather than its quality) accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations --- essentially mimicking LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance --- raising the detection accuracy of objects within 30m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo image based approaches.

Usage

1. Overview

We provide guidance and code to train a stereo depth estimator and 3D object detectors using the KITTI object detection benchmark. We also provide our pre-trained models.

2. Stereo depth estimation models

We provide our pre-trained PSMNet model, trained on the Scene Flow dataset and the 3,712 training images of the KITTI detection benchmark.

We also directly provide the pseudo-LiDAR point clouds and the ground planes of the training and testing images, estimated by this pre-trained model.

We also provide code to train your own stereo depth estimator and to prepare the point clouds and ground planes. If you only want to use our pseudo-LiDAR data for 3D object detection, you may skip the following contents and move directly on to the object detection models.

2.1 Dependencies

  • Python 3.5+
  • numpy, scikit-learn, scipy
  • KITTI 3D object detection dataset

2.2 Download the dataset

You need to download the KITTI dataset from here, including the left and right color images, Velodyne point clouds, camera calibration matrices, and training labels. You also need to download the image set files from here. Then organize the data in the following way.

KITTI/object/
    
    train.txt
    val.txt
    test.txt 
    
    training/
        calib/
        image_2/ #left image
        image_3/ #right image
        label_2/
        velodyne/ 

    testing/
        calib/
        image_2/
        image_3/
        velodyne/

The Velodyne point clouds (from LiDAR) are used ONLY as ground truth to train a stereo depth estimator (e.g., PSMNet).

2.3 Generate ground-truth image disparities

Use the script ./preprocessing/generate_disp.py to process all Velodyne files listed in train.txt. These disparities serve as the training ground truth. Alternatively, you can directly download them from disparity. Name this folder disparity and put it inside the training folder.

python generate_disp.py --data_path ./KITTI/object/training/ --split_file ./KITTI/object/train.txt 
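
For intuition: generate_disp.py projects each Velodyne point into the left image using the calibration matrices (Tr_velo_to_cam, R0_rect, P2) to obtain a sparse depth map, and then converts depth z into disparity d = f_u * b / z, where f_u is the horizontal focal length and b is the stereo baseline (roughly 0.54 m on KITTI). Below is a minimal sketch of the depth-to-disparity step only, assuming the projection has already produced a sparse depth map; the function name and default baseline are illustrative, and the actual script reads these values from the calibration files.

import numpy as np

def depth_to_disparity(sparse_depth, f_u, baseline=0.54):
    """Convert a sparse depth map in meters (zeros where no LiDAR return
    landed) into a ground-truth disparity map in pixels: d = f_u * b / z."""
    disparity = np.zeros_like(sparse_depth, dtype=np.float32)
    hit = sparse_depth > 0                      # pixels with a LiDAR depth
    disparity[hit] = f_u * baseline / sparse_depth[hit]
    return disparity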

2.4 Train the stereo model

You can train any stereo disparity model you want. Here we give an example of training PSMNet; the modified code is in the psmnet subfolder. Make sure you follow the README inside that folder to install the correct Python version and libraries. I strongly suggest using conda environments to manage the Python setups, since we will use different Python versions. Download the PSMNet model pre-trained on the Scene Flow dataset from here.

# train psmnet with 4 TITAN X GPUs.
python ./psmnet/finetune_3d.py --maxdisp 192 \
     --model stackhourglass \
     --datapath ./KITTI/object/training/ \
     --split_file ./KITTI/object/train.txt \
     --epochs 300 \
     --lr_scale 50 \
     --loadmodel ./pretrained_sceneflow.tar \
     --savemodel ./psmnet/kitti_3d/  --btrain 12

2.5 Predict the point clouds

Predict the disparities.
# training
python ./psmnet/submission.py \
    --loadmodel ./psmnet/kitti_3d/finetune_300.tar \
    --datapath ./KITTI/object/training/ \
    --save_path ./KITTI/object/training/predict_disparity
# testing
python ./psmnet/submission.py \
    --loadmodel ./psmnet/kitti_3d/finetune_300.tar \
    --datapath ./KITTI/object/testing/ \
    --save_path ./KITTI/object/testing/predict_disparity
Convert the disparities to point clouds.
# training
python ./preprocessing/generate_lidar.py  \
    --calib_dir ./KITTI/object/training/calib/ \
    --save_dir ./KITTI/object/training/pseudo-lidar_velodyne/ \
    --disparity_dir ./KITTI/object/training/predict_disparity \
    --max_high 1
# testing
python ./preprocessing/generate_lidar.py  \
    --calib_dir ./KITTI/object/testing/calib/ \
    --save_dir ./KITTI/object/testing/pseudo-lidar_velodyne/ \
    --disparity_dir ./KITTI/object/testing/predict_disparity \
    --max_high 1

If you want to generate point clouds from depth maps (e.g., produced by DORN) instead of disparity maps, add --is_depth to the command.
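
For reference, the conversion performed by generate_lidar.py boils down to turning each disparity value into a depth (z = f_u * b / d) and back-projecting every pixel through the pinhole model. Here is a minimal sketch of that idea; the function name, the hard-coded baseline, and the height filter in camera coordinates are illustrative simplifications, while the actual script reads the calibration files and maps the points into the Velodyne frame before applying --max_high.

import numpy as np

def disparity_to_pseudo_lidar(disp, f_u, f_v, c_u, c_v, baseline=0.54, max_high=1.0):
    """Illustrative conversion of a disparity map (pixels) into a pseudo-LiDAR
    point cloud in the camera frame (x right, y down, z forward)."""
    h, w = disp.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = f_u * baseline / np.clip(disp, 1e-3, None)   # disparity -> depth (m)
    x = (u - c_u) * z / f_u
    y = (v - c_v) * z / f_v
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # keep points in front of the camera and at most max_high meters above it
    # (the real script applies this cut in the Velodyne frame instead)
    keep = (points[:, 2] > 0) & (-points[:, 1] < max_high)
    return points[keep].astype(np.float32)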

2.6 Generate ground plane

If you want to train an AVOD model for 3D object detection, you need to generate ground planes from the pseudo-LiDAR point clouds.

# training
python ./preprocessing/kitti_process_RANSAC.py \
    --calib ./KITTI/object/training/calib/ \
    --lidar_dir ./KITTI/object/training/pseudo-lidar_velodyne/ \
    --planes_dir ./KITTI/object/training/pseudo-lidar_planes/
# testing
python ./preprocessing/kitti_process_RANSAC.py \
    --calib ./KITTI/object/testing/calib/ \
    --lidar_dir ./KITTI/object/testing/pseudo-lidar_velodyne/ \
    --planes_dir ./KITTI/object/testing/pseudo-lidar_planes/
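
AVOD consumes one ground plane per frame (four plane coefficients in planes/*.txt), and kitti_process_RANSAC.py estimates that plane from each pseudo-LiDAR cloud. The sketch below illustrates the underlying RANSAC idea (random 3-point plane hypotheses, inlier counting); the function name, iteration count, and inlier threshold are illustrative and not the repository's exact settings.

import numpy as np

def fit_ground_plane(points, n_iters=100, threshold=0.05, seed=0):
    """Fit a plane a*x + b*y + c*z + d = 0 to an (N, 3) point cloud with a
    simple RANSAC loop: sample 3 points, hypothesize a plane, count inliers."""
    rng = np.random.default_rng(seed)
    best_plane, best_inliers = None, 0
    for _ in range(n_iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-6:                 # degenerate (collinear) sample
            continue
        normal = normal / norm
        d = -normal.dot(p1)
        dist = np.abs(points @ normal + d)
        inliers = int((dist < threshold).sum())
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, np.append(normal, d)
    return best_plane

A pseudo-LiDAR .bin file (standard KITTI layout of x, y, z, reflectance as float32) can be loaded with np.fromfile(path, dtype=np.float32).reshape(-1, 4)[:, :3] before calling such a function.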

3. Object Detection models

AVOD model

Download the code from https://github.com/kujason/avod and install the Python dependencies.

Follow their README to prepare the data, and then replace (1) the files in velodyne with those in pseudo-lidar_velodyne and (2) the files in planes with those in pseudo-lidar_planes. Note that you should keep the folder names velodyne and planes, as in the example below.
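
One illustrative way to do the swap from the command line, assuming AVOD reads its data from ./KITTI/object (adjust the paths to your setup; the *_original backup names are only suggestions, and the same steps apply to testing/ if you use that split):

# back up the real LiDAR data and ground planes
mv ./KITTI/object/training/velodyne ./KITTI/object/training/velodyne_original
mv ./KITTI/object/training/planes ./KITTI/object/training/planes_original
# reuse the folder names AVOD expects for the pseudo-LiDAR data
cp -r ./KITTI/object/training/pseudo-lidar_velodyne ./KITTI/object/training/velodyne
cp -r ./KITTI/object/training/pseudo-lidar_planes ./KITTI/object/training/planes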

Follow their README to train the pyramid_cars_with_aug_example model. You can also download our pre-trained model and evaluate it directly. However, if you want to submit your results to the leaderboard, you need to train on trainval.txt.

Frustum-PointNets model

Download the code from https://github.com/charlesq34/frustum-pointnets and install the Python dependencies.

Follow their README to prepare the data and then replace files in velodyne with those in pseudo-lidar_velodyne. Note that you should still keep the folder name as velodyne.

Follow their README to train the v1 model. You can also download our pre-trained model and evaluate it directly.

Results

Figure: the main results of our pseudo-LiDAR method on the validation set.

You can download the AVOD validation results from HERE.

Contact

If you have any questions, please feel free to email us.

Yan Wang ([email protected]), Harry Chao ([email protected]), Div Garg ([email protected])

Comments
  • Generate bad disparity map

    Hi, Yan. Thanks for your great work.

    I am using pytorch 1.1.0.

    I followed the README to generate the disparity and point cloud. First, I downloaded your pretrained PSMNet model. Then, I ran the following command to generate the disparity:

    $ python ./psmnet/submission.py --loadmodel ./finetune_300.tar --datapath /mine/KITTI_DAT/training/ --save_path ./training_predict_disparity/
    Number of model parameters: 5224768
    /usr/lib/python3.7/site-packages/torch/nn/functional.py:2457: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
    /usr/lib/python3.7/site-packages/torch/nn/functional.py:2539: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
      "See the documentation of nn.Upsample for details.".format(mode))
    /usr/lib/python3.7/site-packages/torch/nn/functional.py:2539: UserWarning: Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
      "See the documentation of nn.Upsample for details.".format(mode))
    time = 1.18
    000000.png
    time = 0.42
    000001.png
    time = 0.42
    000002.png
    ...
    

    Next, I ran the following command to generate the point cloud *.bin files:

    $ python ./preprocessing/generate_lidar.py --calib_dir /mine/KITTI_DAT/training/calib/ --save_dir ./training_predict_velodyne/ --disparity_dir ./training_predict_disparity/ --max_high 1
    Finish Depth 000000
    Finish Depth 000001
    Finish Depth 000002
    Finish Depth 000003
    Finish Depth 000004
    ...
    

    However, the disparity map I got for training/000002 and the corresponding point cloud look wrong (see the attached images; the RGB input is the left image of 000002).

    I think the generated disparity is bad, but I can't figure out the reason. Thank you.

    opened by godspeed1989 26
  • Poor 3D object detection results using MONODEPTH depth estimation module

    Hi, first of all, thanks for sharing this wonderful work. I am using the monodepth model to directly predict disparity from a single image, and I use that output (saved as numpy files, normalized by the image width) to create the pseudo-LiDAR. I then used your pretrained Frustum-PointNets v1 model weights for object detection evaluation, but unfortunately the results are very poor (10-15% at most). Can you please help me understand the issue? Thanks, Hari

    opened by hari1106 22
  • Calibration data for test data

    1. Are you using the calibration parameters for the test data too, in the same way you used them for point cloud generation? (generate_lidar)
    2. Also, could you give an idea of what the 'Tr_velo_to_cam' parameter is? (kitti_util)

    Thank You.

    opened by zsfVishnu 4
  • ModuleNotFoundError: No module named 'submodule'

    I followed all your instructions but I still get the above error when I run the following command in Anaconda python ./psmnet/submission.py --loadmodel ./psmnet/kitti_3d/finetune_300.tar --datapath ./media/sarim/7AF2CEB6F2CE7643/datasets/kitti/object/training/ --save_path ./media/sarim/7AF2CEB6F2CE7643/datasets/kitti/object/training/predict_disparity

    The full error log is:

    Traceback (most recent call last):
      File "./psmnet/submission.py", line 20, in <module>
        from models import *
      File "/home/sarim/pseudo_lidar/psmnet/models/__init__.py", line 1, in <module>
        from .basic import PSMNet as basic
      File "/home/sarim/pseudo_lidar/psmnet/models/basic.py", line 8, in <module>
        from submodule import *
    ModuleNotFoundError: No module named 'submodule'
    
    opened by sarimmehdi 3
  • Can not install pytorch with Python 2.7

    Is there a way to conda install pytorch==0.4.1 torchvision==0.2.1 cuda92 -c pytorch with Python 2.7 on Windows 10? If not, how can I run the disparity step 2.5 (https://github.com/mileyan/pseudo_lidar)? It is possible to install conda install pytorch==0.4.1 torchvision==0.2.1 cuda92 -c pytorch under Python 3.6, but then I get a lot of errors when I try to run submission.py (see below), since the code is written for Python 2.7, I think.

    Traceback (most recent call last):
      File "submission.py", line 57, in <module>
        test_left_img, test_right_img = DA.dataloader(args.datapath)
      File "\KITTI\psmnet\dataloader\KITTI_submission_loader.py", line 26, in dataloader
        image = [img for img in os.listdir(filepath+left_fold)]
    FileNotFoundError: [WinError 3] The system cannot find the path specified: '/scratch/datasets/kitti2015/testing/image_2/

    opened by zazicky 2
  • Using PL-AVOD pre train to infer on MOT dataset

    Hello,

    Your code works perfectly and I have been able to evaluate your pre-trained model. However, I want to use your pseudo-LiDAR pre-trained AVOD model to generate 3D detections on the KITTI MOT dataset rather than the object detection dataset, so my question is:

    1. Do I need to train from scratch in order to generate the new Pseudo Lidar and planes for the new dataset?

    I thought I could proceed as usual and just provide the different images, but I am asked for the planes folder, and I do not know whether reusing the previous one is correct, even though I am using a pre-trained model and not training. Thanks.

    opened by dmatos2012 2
  • finetune_300.tar not loading, but pretrained_model_KITTI2015.tar works

    Hi, I got a runtime error while running PSMNet with the included pretrained finetune_300.tar; however, there was no error with the official PSMNet release pretrained_model_KITTI2015.tar. I'm wondering what could cause that?

    My system is Python 2.7, Pytorch 0.4, torchvision 0.2. Thanks.

    Error message:

    File "./psmnet/submission.py", line 69, in <module>
        model.load_state_dict(state_dict['state_dict'])
      File "/home/jhuang/.virtualenvs/psmnet/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for DataParallel:
            Unexpected key(s) in state_dict: "module.feature_extraction.firstconv.0.1.num_batches_tracked", xxx
    
    opened by jarvis-huang 2
  • your pre-trained AVOD model doesn't work

    I ran the AVOD code but I get a "no checkpoint" error. I did everything as explained on the AVOD GitHub. Please tell me how to make your AVOD pre-trained checkpoint work. This issue was also reported here: https://github.com/mileyan/pseudo_lidar/issues/19 and remains unanswered.

    opened by sarimmehdi 2
  • Error(s) in loading state_dict for PSMNet:

    Hi, I faced the following issue while running step 2.4 (Train the stereo model):

    python3 ./psmnet/finetune_3d.py --maxdisp 192 \
    >      --model stackhourglass \
    >      --datapath ./KITTI/object/training/ \
    >      --split_file ./KITTI/object/train.txt \
    >      --epochs 300 \
    >      --lr_scale 50 \
    >      --loadmodel ./pretrained_sceneflow.tar \
    >      --savemodel ./psmnet/kitti_3d/  --btrain 12
    ./psmnet/kitti_3d/training.log
    [2019-09-20 15:17:27 finetune_3d.py:77] INFO     load model ./pretrained_sceneflow.tar
    Traceback (most recent call last):
      File "./psmnet/finetune_3d.py", line 79, in <module>
        model.load_state_dict(state_dict['state_dict'])
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 845, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for PSMNet:
            Missing key(s) in state_dict: "feature_extraction.firstconv.0.0.weight", "feature_extraction.firstconv.0.1.weight", "feature_extraction.firstconv.0.1.bias", "feature_extraction.firstconv.0.1.running_mean", "feature_extraction.firstconv.0.1.running_var", "feature_extraction.firstconv.2.0.weight", "feature_extraction
    
    opened by evgdobr 2
  • Can you provide the test result (all txt files) on the validation set?

    I want to carry out a more detailed comparison with your method. If you could provide these detection txt files on the validation set, it would be very helpful.

    Thank you. Hope you receive more citations.

    opened by detectRecog 2
  • There is an imbalance between your GPUs

    When executing the step "train psmnet with 4 TITAN X GPUs", there is an annoying warning caused by the need to use Python 2.7 with PyTorch; in full:

    torch/nn/parallel/data_parallel.py:25: UserWarning: There is an imbalance between your GPUs. You may want to exclude GPU 1 which has less than 75% of the memory or cores of GPU 0. You can do so by setting the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES environment variable.

    I suspect it could prevent the use of multiple GPUs. Luckily, there's an easy fix. Go to the file causing the warning (torch/nn/parallel/data_parallel.py) and add the following line at the top of the file:

    from __future__ import division

    Try again. In case you run into the following issue:

    ImportError: No module named future

    Just install future, e.g. using one of the following commands:

    conda install future
    pip3 install future
    
    opened by kotchin 2
  • visualization code cannot get point cloud output

    Hi, I am using the ipynb file to visualize the point cloud, but I get no output. I am a beginner with Jupyter. Could you show me where the problem is? Thanks!

    opened by rsj007 1
  • Confusion about paper table 5

    Hi, when I reproduced the numbers from the paper, I found that the data in your Table 5 differ from the KITTI leaderboard. Is there any difference in the implementation that causes this?

    opened by Baboom-l 0
  • question about dataloader

    Hello, is it possible not to generate the huge pseudo-LiDAR point files (.bin)? I want to do this conversion when loading the images. I have written a function that transforms an image into a point cloud with the help of a SOTA monocular depth estimator. Where should I put this function so that the pretrained weights are not loaded multiple times when the image data is loaded during training?

    TODO 
    opened by JianWu313 0
  • Training for my own custom data

    Hello, I want to get 3D bounding boxes from a single image, and this repository was suggested to me. However, I have little knowledge about it. Could anyone help me with my question? Is this repository suitable for my goal, i.e., can I generate 3D bounding boxes with it? If yes, how can I build a training dataset from images only, and how can I train on it? Thank you in advance.

    opened by 1208overlord 2