Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR

Overview

This paper has been accepted to the Conference on Robot Learning (CoRL) 2021.

By Ziyue Feng, Longlong Jing, Peng Yin, Yingli Tian, and Bing Li.

arXiv: https://arxiv.org/abs/2109.09628 | YouTube: link | Slides: link

Abstract

Self-supervised monocular depth prediction provides a cost-effective way to obtain the 3D location of each pixel. However, existing approaches usually yield unsatisfactory accuracy, which is critical for autonomous robots. In this paper, we propose a novel two-stage network to advance self-supervised monocular dense depth learning by leveraging low-cost sparse (e.g. 4-beam) LiDAR. Unlike existing methods that use sparse LiDAR mainly in a time-consuming iterative post-processing step, our model fuses monocular image features and sparse LiDAR features to predict initial depth maps. An efficient feed-forward refinement network then corrects the errors in these initial depth maps in pseudo-3D space with real-time performance. Extensive experiments show that our proposed model significantly outperforms all state-of-the-art self-supervised methods, as well as sparse-LiDAR-based methods, on both self-supervised monocular depth prediction and completion tasks. With this accurate dense depth prediction, our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% on the downstream task of monocular 3D object detection on the KITTI leaderboard.

⚙️ Setup

You can install the dependencies with:

conda create -n depth python=3.6.6
conda activate depth
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
pip install tensorboardX==1.4
conda install opencv=3.3.1   # just needed for evaluation
pip install open3d
pip install wandb
pip install scikit-image

We ran our experiments with PyTorch 1.8.0, CUDA 11.1, Python 3.6.6 and Ubuntu 18.04.
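
To quickly verify that the installed PyTorch build matches this setup and sees your GPU (a minimal sanity check, not part of the original setup steps), you can run:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"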

💾 KITTI Data Preparation

Download Data

You need to first download the KITTI RAW dataset and place it in the kitti_data folder.

Our default settings expect that you have converted the png images to jpeg with this command, which also deletes the raw KITTI .png files:

find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

Alternatively, you can skip this conversion step and train from the raw png files by adding the --png flag when training, at the expense of slower load times.
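
For example (a minimal usage sketch, with all other options left at their defaults):

python trainer.py --png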

Preprocess Data

Run the script matching the LiDAR sparsity you want to train with (4-beam by default; the alternatives are commented out):

# bash prepare_1beam_data_for_prediction.sh
# bash prepare_2beam_data_for_prediction.sh
# bash prepare_3beam_data_for_prediction.sh
bash prepare_4beam_data_for_prediction.sh
# bash prepare_r100.sh # random sample 100 LiDAR points
# bash prepare_r200.sh # random sample 200 LiDAR points

Training

By default, models and TensorBoard event files are saved to log/mdp/.
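
If you have TensorBoard installed, you can monitor training by pointing it at that directory:

tensorboard --logdir log/mdp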

Depth Prediction:

python trainer.py                     # train the initial depth prediction network
python inf_depth_map.py --need_path   # export the initial depth maps
python inf_gdc.py                     # run GDC correction on the exported depth maps
python refiner.py                     # train the refinement network

Depth Completion:

Please first download the KITTI Completion dataset.

python completor.py

Monocular 3D Object Detection:

Please first download the KITTI 3D Detection dataset.

python export_detection.py

Then you can train PatchNet on the exported depth maps.

📊 KITTI Evaluation

python evaluate_depth.py
python evaluate_completion.py
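
For example, to evaluate a pretrained ResNet-50 model on the completion task (flags taken from a user report in the Comments below; adjust the weights path to your own checkpoint):

python evaluate_completion.py --load_weights_folder log/res50/models/weights_best --eval_mono --nbeams 4 --num_layers 50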

Citation

@article{feng2021advancing,
  title={Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR},
  author={Feng, Ziyue and Jing, Longlong and Yin, Peng and Tian, Yingli and Li, Bing},
  journal={arXiv preprint arXiv:2109.09628},
  year={2021}
}

Reference

Our code is based on Monodepth2: https://github.com/nianticlabs/monodepth2

Comments
  • Running trainer.py failed

    File "trainer.py", line 272, in process_batch
        inputs[key] = ipt.to(self.device)
    AttributeError: 'list' object has no attribute 'to'

    May I ask how to solve this problem? I tried a few things but failed.

    opened by henrycddj 7
  • Tensor size matching error

    Dear authors, thanks for the great work! I'm trying to train with a custom dataset containing images and Pseudo Dense Representations Generation of size H * W * 1 and have changed the Resnet encoder dimension from 2 to 1 accordingly. However, I'm getting RuntimeError: The size of tensor a (10) must match the size of tensor b (15) at non-singleton dimension 3 at x = input_features[-1] + beam_features[-1] in depth_decoder.py. I guess it's related to scaling as the Pseudo Dense Representations Generation has the original scale while for image it's scaled down. However in your original inputs["2channel"] = self.load_4beam_2channel(folder, frame_index, side, do_flip) it seems that there's no scaling down involved. Do you have any idea what might be the issue? Thanks!

    opened by hdacnw 7
  • Error when running evaluation on a training result

    Thank you for your work!

    I have followed the procedure of Preprocess Data and Depth Prediction. However, when I ran the evaluation on my model, the following error occurred:

    Traceback (most recent call last):
      File "evaluate_depth.py", line 510, in <module>
        evaluate(options.parse())
      File "evaluate_depth.py", line 120, in evaluate
        encoder.load_state_dict({k: v for k, v in encoder_dict.items() if k in model_dict})
      File "/home/chenwei/anaconda3/envs/diff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
            size mismatch for encoder.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
            size mismatch for encoder.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
            size mismatch for encoder.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
            size mismatch for encoder.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64,1, 1]).
            size mismatch for encoder.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for encoder.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for encoder.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for encoder.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for encoder.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
            size mismatch for encoder.layer3.0.conv1.weight: copying a param with shape torch.Size ([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
            size mismatch for encoder.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
            size mismatch for encoder.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for encoder.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for encoder.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for encoder.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for encoder.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
            size mismatch for encoder.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
            size mismatch for encoder.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
            size mismatch for encoder.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for encoder.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for encoder.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for encoder.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for encoder.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3] ).
            size mismatch for encoder.fc.weight: copying a param with shape torch.Size([1000, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 512]).
    

    It seems that the format of the weights does not match the evaluation code. This error also occurred when I tried to run evaluation on the initial model (generated by running python trainer.py). Nevertheless, when I ran your pretrained model, everything was OK.

    opened by Renatusphere 6
  • Error in Preprocessing Data

    Thanks for your work! I'm running "bash prepare_4beam_data_for_prediction.sh". However, the file "/Kitti_RAW_Data/2011_09_26/2011_09_26_drive_0002_sync/4beam/0000000069.bin" seems to be needed.

    I wonder if it's the same as the one in the standard KITTI dataset, which is "/data/Kitti_RAW_Data/2011_09_26/2011_09_26_drive_0002_sync/velodyne_points/data/0000000069.bin". If not, how can I generate these files? Thanks!

    opened by Renatusphere 5
  • No ptc2depth !

    Hello, thanks for your work. But when I run 'bash prepare_4beam_data_for_prediction.sh', gen2channel.py cannot import ptc2depth from kitti_utils.

    opened by WeixuWang 5
  • No such file or directory

    Hello, thanks for your work. But when I run 'bash prepare_4beam_data_for_prediction.sh', some files are missing, such as sparsify.py and the splits folder. Please make sure your code is complete. Thanks for your reply.

    opened by wangcong607 4
  • Error when running evaluation with the pretrained model

    Thank you for your interesting work. I want to verify the performance on the depth completion task. I prepared the pretrained ResNet50 model and the validation data (data_depth_selection). When I run python evaluate_completion.py --load_weights_folder log/res50/models/weights_best --eval_mono --nbeams 4 --num_layers 50, it returns the following error.

    Traceback (most recent call last):
      File "evaluate_completion.py", line 373, in <module>
        evaluate(options.parse())
      File "evaluate_completion.py", line 174, in evaluate
        output = depth_decoder(features, beam_features=beam_features)
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/localDisk/yangjunjie/scripts/3d/FusionDepth/networks/depth_decoder.py", line 70, in forward
        x = input_features[-1] + beam_features[-1]
    RuntimeError: The size of tensor a (20) must match the size of tensor b (38) at non-singleton dimension 3
    

    It seems that the input image is resized while the corresponding depth map keeps its original size.

    opened by mysephi 3
  • about static scenario

    Hello, I have a question about the PoseNet. Can PoseNet handle the situation where the car stops at a road intersection? In this scenario, the front and rear frames are relatively static; what is the output of PoseNet?

    opened by dzin18 2
  • About the depth completion task

    The RMSE performance of the depth completion improved a lot compared to other self-supervised methods, but the depth-completion training script completor.py is not provided.

    opened by dzin18 2
  • Visualization

    Hi Ziyue,

    Thanks for releasing this fantastic work. I have a quick question about your video demo. Can you give some suggestions about point cloud visualization, such as any tools or software?

    Thanks, Hang

    opened by brandleyzhou 1
  • Accurate or normalized depth value in training?

    Thanks for your contribution in FusionDepth!

    I wonder whether the "depth" in trainer.py is normalized or the actual depth value at each pixel. When I print it during training, the values look normalized, generally below 1. But in the reprojection procedure, this depth value is used directly.

    What's more, when calculating the loss, the depth is multiplied by 26. I have no idea what the "26" means.

    opened by Renatusphere 4