Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals

Overview

LapDepth-release

This repository is a PyTorch implementation of the paper "Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals".

Minsoo Song, Seokjae Lim, and Wonjun Kim*
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

Video presentation

Requirements

  • Python >= 3.7
  • Pytorch >= 1.6.0
  • Ubuntu 16.04
  • CUDA 9.2
  • cuDNN (if CUDA available)

Some other required packages: geffnet, path, IPython, blessings, progressbar
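
To check quickly that the environment is set up, a minimal sanity check such as the following can help (a sketch, not part of the repository; it only imports the packages listed above):

    # Minimal environment sanity check (a sketch, not part of the repository).
    import torch

    print("PyTorch:", torch.__version__)                 # expected >= 1.6.0
    print("CUDA available:", torch.cuda.is_available())

    # Extra packages used by this repository
    import geffnet, path, IPython, blessings, progressbar  # noqa: F401
    print("All extra packages import correctly.")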

Pretrained models

You can download the pre-trained models below (the metrics used in the tables are defined in the sketch that follows the list).

  • Trained with KITTI

    • batch 16, SyncBatchNorm, data loss
    cap     a1     a2     a3     Abs Rel   Sq Rel   RMSE    RMSE log
    0-80m   0.965  0.995  0.999  0.059     0.201    2.397   0.090
    0-50m   0.970  0.996  0.999  0.057     0.155    1.788   0.085
  • Trained with KITTI

    • batch 16, GroupNorm, data loss + gradient loss
    cap     a1     a2     a3     Abs Rel   Sq Rel   RMSE    RMSE log
    0-80m   0.961  0.994  0.999  0.059     0.209    2.489   0.091
    0-50m   0.968  0.996  0.999  0.057     0.155    1.807   0.085
  • Trained with NYU Depth V2

    • batch 16, SyncBatchNorm, data loss
    cap     a1     a2     a3     Abs Rel   log10    RMSE    RMSE log
    0-10m   0.895  0.983  0.996  0.105     0.045    0.384   0.135
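
For reference, the column headers above are the standard monocular depth-estimation metrics (a1/a2/a3 are the accuracies under the thresholds 1.25, 1.25², and 1.25³). A minimal NumPy sketch of how they are commonly computed (illustrative only, not this repository's evaluation code):

    import numpy as np

    def depth_metrics(gt, pred):
        """Standard depth metrics over valid pixels; gt and pred are in meters."""
        thresh = np.maximum(gt / pred, pred / gt)
        a1 = (thresh < 1.25).mean()
        a2 = (thresh < 1.25 ** 2).mean()
        a3 = (thresh < 1.25 ** 3).mean()
        abs_rel = np.mean(np.abs(gt - pred) / gt)
        sq_rel = np.mean(((gt - pred) ** 2) / gt)
        rmse = np.sqrt(np.mean((gt - pred) ** 2))
        rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
        log10 = np.mean(np.abs(np.log10(gt) - np.log10(pred)))   # reported for NYU
        return a1, a2, a3, abs_rel, sq_rel, rmse, rmse_log, log10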

Demo images (Single Test Image Prediction)

Make sure you have downloaded a pre-trained model and placed it in the './pretrained/' directory before running the demo.
Demo Command Line:

############### Example of argument usage #####################
## Running demo using a specified image (jpg or png)
python demo.py --model_dir ./pretrained/LDRN_KITTI_ResNext101_pretrained_data.pkl --img_dir ./your/file/path/filename --pretrained KITTI --cuda --gpu_num 0
python demo.py --model_dir ./pretrained/LDRN_NYU_ResNext101_pretrained_data.pkl --img_dir ./your/file/path/filename --pretrained NYU --cuda --gpu_num 0
# output image name => 'out_' + filename

## Running demo using a whole folder of images
python demo.py --model_dir ./pretrained/LDRN_KITTI_ResNext101_pretrained_data.pkl --img_folder_dir ./your/folder/path/folder_name --pretrained KITTI --cuda --gpu_num 0
# output folder name => 'out_' + folder_name

If you are using a model pre-trained on KITTI, add the '--pretrained KITTI' option
(for a model pre-trained on NYU, use '--pretrained NYU').
If you want to run the demo on a GPU, add '--cuda'.
The '--gpu_num' argument is a comma-separated list of the GPU indices you want to use (e.g., 0,1,2,3).
For example, to use only the third of four GPUs, pass '--gpu_num 2'.
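
If you prefer to call a loaded model from Python instead of demo.py, the general inference pattern looks like the sketch below. Treat it as an illustration only: the model construction is left to the caller, and the normalization values are the usual ImageNet statistics, which is an assumption rather than a guarantee about demo.py.

    # Single-image inference sketch (illustrative; demo.py is the actual entry point).
    import torch
    from PIL import Image
    from torchvision import transforms

    def predict_depth(model, img_path, device="cuda"):
        img = Image.open(img_path).convert("RGB")
        preprocess = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats (assumed)
                                 std=[0.229, 0.224, 0.225]),
        ])
        x = preprocess(img).unsqueeze(0).to(device)
        model.to(device).eval()
        with torch.no_grad():
            _, depth = model(x)   # assumes the network returns a tuple whose second element is the depth map
        return depth.squeeze().cpu().numpy()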

Dataset Preparation

We referred to BTS in the data preparation process.

KITTI

1. Official ground truth

  • Download the official KITTI ground truth from the link and create the KITTI dataset directory.
    $ cd ./datasets
    $ mkdir KITTI && cd KITTI
    $ mv ~/Downloads/data_depth_annotated.zip ./
    $ unzip data_depth_annotated.zip

2. Raw dataset

  • Construct the raw KITTI dataset using the following commands.
    $ mv ./datasets/kitti_archives_to_download.txt ./datasets/KITTI
    $ cd ./datasets/KITTI
    $ aria2c -x 16 -i ./kitti_archives_to_download.txt
    $ parallel unzip ::: *.zip

3. Dense ground truth (g.t) dataset
We use the inpainting method from DenseDepth to obtain dense ground truth for the gradient loss.
(If you train the model with the data loss only, without the gradient loss, you do not need dense g.t.)
The inpainted results corresponding to './datasets/KITTI/data_depth_annotated/2011_xx_xx_drive_xxxx_sync/proj_depth/groundtruth/image_02' should be saved in './datasets/KITTI/data_depth_annotated/2011_xx_xx_drive_xxxx_sync/dense_gt/image_02' (a path-mapping sketch follows the directory tree below).
The KITTI data should be organized as below:

|-- datasets
  |-- KITTI
     |-- data_depth_annotated  
        |-- 2011_xx_xx_drive_xxxx_sync
           |-- proj_depth  
              |-- groundtruth            # official G.T folder
        |-- ... (all drives of all days in the raw KITTI)  
     |-- 2011_09_26                      # raw RGB data folder  
        |-- 2011_09_26_drive_xxxx_sync
     |-- 2011_09_29
     |-- ... (all days in the raw KITTI)  
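
The sparse-to-dense path mapping described in step 3 can be scripted as below. This is only a sketch: the paper uses DenseDepth's inpainting, while the fill_nearest helper here is a simple nearest-neighbor stand-in (using SciPy's distance transform) for illustration only.

    # Sketch of producing dense ground truth next to the sparse annotations.
    # `fill_nearest` is a nearest-neighbor stand-in; the paper uses DenseDepth's inpainting.
    import glob
    import os
    import numpy as np
    from PIL import Image
    from scipy import ndimage

    def fill_nearest(sparse):
        """Fill invalid (zero) depth pixels with the nearest valid value."""
        invalid = sparse <= 0
        _, idx = ndimage.distance_transform_edt(invalid, return_indices=True)
        return sparse[tuple(idx)]

    root = "./datasets/KITTI/data_depth_annotated"
    pattern = os.path.join(root, "*_sync", "proj_depth", "groundtruth", "image_02", "*.png")
    for sparse_path in glob.glob(pattern):
        dense_path = sparse_path.replace(os.path.join("proj_depth", "groundtruth"), "dense_gt")
        os.makedirs(os.path.dirname(dense_path), exist_ok=True)
        sparse = np.asarray(Image.open(sparse_path), dtype=np.float32) / 256.0  # KITTI: uint16 / 256 = meters
        dense = fill_nearest(sparse)
        Image.fromarray((dense * 256.0).astype(np.uint16)).save(dense_path)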

NYU Depth V2

1. Training set
Make the NYU dataset directory:

    $ cd ./datasets
    $ mkdir NYU_Depth_V2 && cd NYU_Depth_V2
  • Construct the training data using the following steps:
    • Download the raw NYU Depth V2 dataset (450 GB) from this Link.
    • Extract the raw dataset into './datasets/NYU_Depth_V2'
      (this should create './datasets/NYU_Depth_V2/raw/....').
    • Run './datasets/sync_project_frames_multi_threads.m' to get synchronized data (requires MATLAB;
      this should create './datasets/NYU_Depth_V2/sync/....').
  • Alternatively, you can directly download the whole 'sync' folder from our Google Drive Link into './datasets/NYU_Depth_V2/'.

2. Testing set
Download official nyu_depth_v2_labeled.mat and extract image files from the mat file.

    $ cd ./datasets
    ## Download the official labeled NYU_Depth_V2 mat file
    $ wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
    ## Extract image files from the mat file
    $ python extract_official_train_test_set_from_mat.py nyu_depth_v2_labeled.mat splits.mat ./NYU_Depth_V2/official_splits/
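
The prepared ground-truth depth images are stored as 16-bit PNGs in scaled integer form. Below is a hedged sketch of converting them back to meters; the scale factors follow the common BTS-style convention (256 for KITTI, 1000 for NYU), so check datasets_list.py for the exact values this repository uses.

    # Converting stored 16-bit depth PNGs back to meters (a sketch; verify the
    # scale factors against datasets_list.py).
    import numpy as np
    from PIL import Image

    def load_depth_png(path, depth_scale):
        depth = np.asarray(Image.open(path), dtype=np.float32)
        return depth / depth_scale          # meters; zeros mean "no ground truth"

    # Hypothetical example calls:
    # kitti_depth = load_depth_png("path/to/kitti_groundtruth.png", depth_scale=256.0)
    # nyu_depth   = load_depth_png("path/to/nyu_sync_depth.png", depth_scale=1000.0)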

Evaluation

Make sure you have downloaded a pre-trained model and placed it in the './pretrained/' directory before running the evaluation code.

  • Evaluation Command Line:
# Running evaluation using pre-trained models
## KITTI
python eval.py --model_dir ./pretrained/LDRN_KITTI_ResNext101_pretrained_data.pkl --evaluate --batch_size 1 --dataset KITTI --data_path ./datasets/KITTI --gpu_num 0
## NYU Depth V2
python eval.py --model_dir ./pretrained/LDRN_NYU_ResNext101_pretrained_data.pkl --evaluate --batch_size 1 --dataset NYU --data_path ./datasets/NYU_Depth_V2/official_splits/test --gpu_num 0

### to save image files of the results, add the `--img_save` option
### if you have dense g.t files, add `--img_save` together with `--use_dense_depth`
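
The 0-80 m and 0-50 m caps in the result tables correspond to evaluating only the ground-truth pixels within that depth range. A minimal sketch of such capped evaluation (illustrative, not the repository's eval.py):

    import numpy as np

    def capped_rmse(gt, pred, min_depth=1e-3, max_depth=80.0):
        """RMSE over valid ground-truth pixels within the cap (e.g. 0-80 m or 0-50 m)."""
        mask = (gt > min_depth) & (gt < max_depth)
        gt, pred = gt[mask], np.clip(pred[mask], min_depth, max_depth)
        return np.sqrt(np.mean((gt - pred) ** 2))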

Training

LDRN (Laplacian Depth Residual Network) training

  • Training Command Line:
# KITTI 
python train.py --distributed --batch_size 16 --dataset KITTI --data_path ./datasets/KITTI --gpu_num 0,1,2,3
# NYU
python train.py --distributed --batch_size 16 --dataset NYU --data_path ./datasets/NYU_Depth_V2/sync --epochs 30 --gpu_num 0,1,2,3 
## to train with the gradient loss, add the `--use_dense_depth` option
## if you don't want distributed training, remove the `--distributed` option

The '--gpu_num' argument is a comma-separated list of the GPU indices you want to use (e.g., 0,1,2,3).
For example, to use only the third of four GPUs, pass '--gpu_num 2'.
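
The gradient loss mentioned above compares spatial gradients of the predicted depth with those of the dense ground truth. A hedged PyTorch sketch of one common formulation follows; the exact loss, weighting, and scales used in train.py may differ.

    import torch
    import torch.nn.functional as F

    def gradient_l1_loss(pred, dense_gt):
        """L1 difference of horizontal and vertical depth gradients.
        pred, dense_gt: (B, 1, H, W) depth maps."""
        dx_p = pred[:, :, :, 1:] - pred[:, :, :, :-1]
        dy_p = pred[:, :, 1:, :] - pred[:, :, :-1, :]
        dx_g = dense_gt[:, :, :, 1:] - dense_gt[:, :, :, :-1]
        dy_g = dense_gt[:, :, 1:, :] - dense_gt[:, :, :-1, :]
        return F.l1_loss(dx_p, dx_g) + F.l1_loss(dy_p, dy_g)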

Reference

When using this code in your research, please cite the following paper:

M. Song, S. Lim and W. Kim, "Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals," in IEEE Transactions on Circuits and Systems for Video Technology, doi: 10.1109/TCSVT.2021.3049869.

@ARTICLE{9316778,
  author={M. {Song} and S. {Lim} and W. {Kim}},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals}, 
  year={2021},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TCSVT.2021.3049869}}
Comments
  • Getting metric depth from Depth images

    Hi. First of all thank you for your amazing work. Is there any way you could get a metric distance between the camera and the object from the depth map outputted by this work? I get how you could get the distance when you have a disparity map, but I'm not quite sure if the same concept applies here. Thank you

    opened by gv237-07 5
  • Change backbone to ResNet-18

    Hi there! Thanks for sharing this fantastic work :).

    Because my dataset is small, the available encoder networks are too large for my purpose. Therefore, I would like to use a smaller ResNet-18 instead. However, I am struggling with the code. This is the code for ResNet-101:

    https://github.com/tjqansthd/LapDepth-release/blob/2ea5f29d2e117e765e9da97984bbc5b6de0eddcb/model.py#L338-L383 What would it be like for a ResNet-18?

    Cheers!

    opened by DavidRecasens 4
  • The gradient remains 0

    Hi, great work! But I ran into a problem when training: the output gradient remains 0 when I train on the KITTI dataset (I did not change any code). Have you met this problem before?

    opened by carry-all-coder 3
  • No files to download.

    Hello,

    when I run "aria2c -x 16 -i ./datasets/kitti_archives_to_download.txt", it reports "No files to download." I have checked that the path is correct. I can't seem to get the raw dataset...

    opened by deng29 3
  • Can a model be trained on "generated" GT?

    Hi, I would like to ask about the feasibility of training a model from the provided code using a generated dataset. I have a set of RGB images obtained from several open-source datasets, and I have passed them through the model provided in https://github.com/compphoto/BoostingMonocularDepth to produce ground truth. If I were to use this "generated" dataset, what depth scale and maximum depth should I input? I notice that there is a fixed depth scale and maximum depth for the NYU and KITTI datasets in the code, such as self.depth_scale = 256.0 on line 24 of datasets_list.py

    opened by Oreobun 2
  • I'd like to know the difference between gt and dense gt.

    Thanks for your great work. I'd like to know the difference between gt and dense gt. I get my data from the CARLA simulator, so my gt is float32. Can I use the same ground truth for both gt and dense gt?

    opened by raozhongyu 2
  • OSError: [Errno 22] Invalid argument: Path('checkpoints/sync/04-13-10:14')

    Hello, when I run train.py, the code reaches line 54, “args.save_path.makedirs_p()”, and the following error occurs: OSError: [Errno 22] Invalid argument: Path('checkpoints/sync/04-13-10:14'). I have been unable to solve this problem...

    opened by deng29 2
  • How to get the real distance from depth map

    Hello, I don't understand how to get the real distance from the "out" in demo.py. The model I used was trained on KITTI. Could you please tell me how to get it?

    opened by fgogoto 1
  • Issue in the script model.py during execution of the code

    Hello,

    I just run the train.py script and I am getting the following error:

    File "/home/raai/sparse-to-dense/train_main.py", line 356, in main()

    File "/home/raai/sparse-to-dense/train_main.py", line 269, in main model = LDRN()

    File "/home/raai/sparse-to-dense/model_lapdepth.py", line 912, in init self.decoder = Lap_decoder_lv6(self.encoder.dimList)

    File "/home/raai/sparse-to-dense/model_lapdepth.py", line 740, in init self.ASPP = Dilated_bottleNeck_lv6(norm, act, dimList[4])

    IndexError: list index out of range

    opened by aliadeeba98 1
  • How to get the ground truth depth map?

    Thank you very much for your outstanding work. I am very interested in it, but I have a doubt about how to get the real depth map mentioned in your paper. The real depth maps in the KITTI dataset are sparse.

    opened by Pattern6 1
  • How to evaluate the depth image for any real-world images?

    Hi, thank you for sharing your excellent work. How can I evaluate depth for arbitrary real-world images? I tried the command line "python eval.py --model_dir ./pretrained/LDRN_KITTI_ResNext101_pretrained_data.pkl --evaluate --batch_size 1 --dataset KITTI --data_path ./datasets/KITTI --gpu_num 0 --img_save", but it does not work. If you have any suggestions, please let me know. I appreciate your help.

    opened by angleboy8 1
  • Image cropping & Performance on different dataset

    I am currently trying the network trained on the NYU dataset on a separate dataset (TUM RGBD desk see: https://vision.in.tum.de/data/datasets/rgbd-dataset/download#freiburg2_desk)

    I would have expected to get a similar performance but the RMSE is significantly higher (TUM: 0.73 vs NYU: 0.38)

    Both datasets were recorded using Microsoft Kinect Sensors and are comprised of indoor scenery.

    I am trying to figure out why the model cannot generalize well to the TUM dataset.

    I built my own data loading logic which follows the same steps as the one defined for the NYU dataset:

     def predict(img_path):
      img = Image.open(img_path)
      img = img.crop((40,42,616,474))
      img = np.asarray(img, dtype = np.float32) / 255
      img = img.transpose((2, 0, 1))
      img = torch.from_numpy(img)
      img = normalize(img)
    
      img.cuda()
      _, org_h, org_w = img.shape
      img = img.unsqueeze(0)
      img_flip = torch.flip(img,[3])
    
      with torch.no_grad():
          _, out = Model(img)
          _, out_flip = Model(img_flip)
          out_flip = torch.flip(out_flip,[3])
          out = 0.5*(out + out_flip)
    
      pred = out[0,:,:]
      return pred
    

    Do you have any idea why the performance varies so drastically while using the same sensors and similar scenery?

    Also, I would appreciate it if you could help me understand the different image crops in the evaluation: Prediction: pred_uncropped[42:474, 40:616] = pred

    Mask for groundtruth and prediction: crop_mask[45:471, 41:601] = 1

    Thank you for any help & guidance!

    opened by DanielRoeder1 0
Owner
Minsoo Song
B.S. degree, Department of Electrical and Electronics Engineering, Konkuk University (2014.03 ~)