[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Overview

Single Image Depth Prediction with Wavelet Decomposition

Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambetov

CVPR 2021

[Link to paper]

[Demo GIFs: qualitative results on KITTI and NYUv2]

We introduce WaveletMonoDepth, which improves efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.

[Video: 5-minute CVPR presentation]

🧑‍🏫 Methodology

WaveletMonoDepth was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build upon a baseline codebase. Both baselines share a common encoder-decoder architecture, and we modify their decoders to predict wavelet coefficients.

Wavelet predictions are sparse, so they can be computed only at relevant image locations, saving a large amount of unnecessary computation.

[Figure: our architecture]

The network is first trained with dense convolutions in the decoder until convergence; the dense convolutions are then replaced with sparse ones. This two-stage procedure is needed because the network must first learn to predict sparse wavelet coefficients before sparse convolutions can be used.
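
To make this concrete, here is a minimal sketch, not the authors' implementation (sparse_conv3x3 and mask below are hypothetical names), of evaluating a 3x3 convolution only at masked locations by gathering the input patches there:

import torch
import torch.nn.functional as F

def sparse_conv3x3(x, weight, bias, mask):
    # x: (N, C_in, H, W); weight: (C_out, C_in, 3, 3); bias: (C_out,)
    # mask: (N, H, W) boolean, True where the output should be computed.
    n, c_in, h, w = x.shape
    c_out = weight.shape[0]
    patches = F.unfold(x, kernel_size=3, padding=1)  # (N, C_in*9, H*W)
    flat_mask = mask.view(n, -1)
    out = x.new_zeros(n, c_out, h * w)               # non-masked outputs stay 0
    for b in range(n):
        cols = patches[b][:, flat_mask[b]]           # patches at masked pixels
        out[b][:, flat_mask[b]] = weight.view(c_out, -1) @ cols + bias[:, None]
    return out.view(n, c_out, h, w)

x = torch.randn(1, 16, 48, 160)
weight, bias = torch.randn(32, 16, 3, 3), torch.zeros(32)
mask = torch.rand(1, 48, 160) > 0.9                  # compute at ~10% of pixels
y = sparse_conv3x3(x, weight, bias, mask)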

🗂 Environment Requirements 🗂

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to set up a new environment:

conda env create -f environment.yml
conda activate wavelet-mdp

Our work uses Pytorch Wavelets, a great package from Fergal Cotter which implements the Inverse Discrete Wavelet Transform (IDWT) we rely on, and a lot more! To install Pytorch Wavelets, simply run:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .
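
For reference, here is a minimal round trip through the package's forward and inverse transforms (the tensor sizes below are arbitrary):

import torch
from pytorch_wavelets import DWTForward, DWTInverse

# One-level Haar decomposition and its inverse: the input is recovered from
# the low-frequency band plus the three high-frequency bands (LH, HL, HH).
dwt = DWTForward(J=1, wave='haar', mode='zero')
idwt = DWTInverse(wave='haar', mode='zero')

x = torch.randn(1, 1, 192, 320)             # e.g. a depth map
ll, highs = dwt(x)                          # ll: (1, 1, 96, 160); highs[0]: (1, 1, 3, 96, 160)
x_rec = idwt((ll, highs))                   # inverse transform (IDWT)
print(torch.allclose(x, x_rec, atol=1e-5))  # True up to numerical error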

🚗 🚦 KITTI 🌳 🛣

Depth Hints was used as a baseline for KITTI.

Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in their repositories first.

Setup, Training and Evaluation

Please see the KITTI directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores using dense convolutions to predict wavelet coefficients. Download links coming soon!

| Model name    | Training modality   | Resolution | abs_rel | RMSE  | δ<1.25 | Weights     | Eigen Predictions |
| ------------- | ------------------- | ---------- | ------- | ----- | ------ | ----------- | ----------------- |
| Ours Resnet18 | Stereo + DepthHints | 640 x 192  | 0.106   | 4.693 | 0.876  | Coming soon | Coming soon       |
| Ours Resnet50 | Stereo + DepthHints | 640 x 192  | 0.105   | 4.625 | 0.879  | Coming soon | Coming soon       |
| Ours Resnet18 | Stereo + DepthHints | 1024 x 320 | 0.102   | 4.452 | 0.890  | Coming soon | Coming soon       |
| Ours Resnet50 | Stereo + DepthHints | 1024 x 320 | 0.097   | 4.387 | 0.891  | Coming soon | Coming soon       |

🎚 Playing with sparsity

The most interesting part is that we can exploit the sparsity of the predicted wavelet coefficients to trade accuracy for efficiency. We do so by tuning a threshold on the coefficient magnitudes (see the sketch after this list):

  • low threshold values lead to high accuracy but a high number of computations,
  • high threshold values lead to highly efficient computation, as convolutions are evaluated at only a few pixel locations, with minimal impact on accuracy.
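
A minimal sketch of this thresholding (sparsity_mask is a hypothetical name; the actual masking logic lives in the decoder code):

import torch

def sparsity_mask(coeffs, threshold):
    # coeffs: (N, 3, H, W) predicted LH/HL/HH coefficients at one scale.
    # A pixel is kept if any of its three bands has a large magnitude.
    return coeffs.abs().amax(dim=1) > threshold  # (N, H, W) boolean

coeffs = torch.randn(1, 3, 96, 320)
for t in (0.5, 1.0, 2.0):
    mask = sparsity_mask(coeffs, t)
    print(f"threshold {t}: computing at {mask.float().mean():.1%} of pixels")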

[Figure: sparsification on KITTI]

Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.

[Figure: KITTI scores]

Our wavelet-based method allows us to greatly reduce the amount of computation in the decoder at a minimal expense in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores versus FLOPs.
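
As a back-of-the-envelope estimate (not the exact FLOPs counter used in our notebooks), the cost of a K x K convolution scales linearly with the fraction p of output locations at which it is evaluated:

# Multiply-accumulates counted as 2 FLOPs; p is the fraction of the
# H x W output locations where the convolution is evaluated.
def conv_flops(c_in, c_out, k, h, w, p=1.0):
    return 2 * c_in * c_out * k * k * h * w * p

dense = conv_flops(64, 64, 3, 96, 320)            # full dense pass
sparse = conv_flops(64, 64, 3, 96, 320, p=0.10)   # coefficients at 10% of pixels
print(f"dense: {dense / 1e9:.2f} GFLOPs, sparse: {sparse / 1e9:.2f} GFLOPs")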

[Figure: scores vs. FLOPs on KITTI]

🪑 🛁 NYUv2 🛋 🚪

Dense Depth was used as a baseline for NYUv2. We used the experimental PyTorch implementation of DenseDepth. Compared to the original paper, we made a few modifications:

  • we supervise depth directly instead of supervising disparity
  • we do not use SSIM (see the loss sketch after this list)
  • we use DenseNet161 as the encoder instead of DenseNet169
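
For illustration only, a direct depth supervision loss without an SSIM term might look like the sketch below; the exact terms and weights are defined in the NYUv2 training code, and depth_loss and grad_weight are hypothetical names:

import torch

# Hedged sketch: an L1 penalty on depth plus an L1 penalty on horizontal
# and vertical depth gradients (DenseDepth's loss minus its SSIM term).
def depth_loss(pred, gt, grad_weight=1.0):
    l1 = (pred - gt).abs().mean()
    dx = (pred[..., :, 1:] - pred[..., :, :-1]) - (gt[..., :, 1:] - gt[..., :, :-1])
    dy = (pred[..., 1:, :] - pred[..., :-1, :]) - (gt[..., 1:, :] - gt[..., :-1, :])
    return l1 + grad_weight * (dx.abs().mean() + dy.abs().mean())

pred, gt = torch.rand(1, 1, 480, 640), torch.rand(1, 1, 480, 640)
print(depth_loss(pred, gt))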

Setup, Training and Evaluation

Please see the NYUv2 directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores and associated trained models, using dense convolutions to predict wavelet coefficients.

| Model name | Encoder     | Resolution | abs_rel | RMSE   | δ<1.25 | ε_acc  | Weights     | Eigen Predictions |
| ---------- | ----------- | ---------- | ------- | ------ | ------ | ------ | ----------- | ----------------- |
| Baseline   | DenseNet    | 640 x 480  | 0.1277  | 0.5479 | 0.8430 | 1.7170 | Coming soon | Coming soon       |
| Ours       | DenseNet    | 640 x 480  | 0.1258  | 0.5515 | 0.8451 | 1.8070 | Coming soon | Coming soon       |
| Baseline   | MobileNetv2 | 640 x 480  | 0.1772  | 0.6638 | 0.7419 | 1.8911 | Coming soon | Coming soon       |
| Ours       | MobileNetv2 | 640 x 480  | 0.1727  | 0.6776 | 0.7380 | 1.9732 | Coming soon | Coming soon       |

🎚 Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at a minimal cost in performance.

[Figure: sparsification on NYUv2]

Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.

[Figure: NYUv2 scores]

🎮 Try it yourself!

Try using our Jupyter notebooks to visualize results at different levels of sparsity, and to compute the resulting computational savings in FLOPs. Notebooks can be found in <DATASET>/sparsity_test_notebook.ipynb, where <DATASET> is either KITTI or NYUv2.

✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Firman, Michael and
               Watson, Jamie and
               Lepetit, Vincent and
               Turmukhambetov, Daniyar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month     = {June},
  year      = {2021}
}

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.

Comments
  • What is the significance of "self.is_test"?

    Hi, I have gone through the flow, and I can see that "self.is_test" always remains False, hence control never enters the "if" condition: https://github.com/nianticlabs/wavelet-monodepth/blob/5bc193957056a5bab6cbc1e052f6a443279f8335/NYUv2/data.py#L133

    Could you please tell me why you either multiply or divide by 1000?

    Another point: as I am working with the smaller version of the NYU dataset, i.e. "nyu_depth_v2_labeled.mat", is this divide-or-multiply step also applicable to that dataset?

    opened by tanmayGIT 10
  • Can not get desired performance using wavelet-decomposition

    Hi,

    When I run the code on KITTI, I cannot get the scores reported in the paper. I ran the code with stereo training, at 1024x320 resolution with wavelet decomposition; the command is shown below: "train.py --data_path --log_dir --encoder_type resnet --num_layers 50 --width 1024 --height 320 --frame_ids 0 --use_stereo --split eigen_full --num_epochs 300 --use_depth_hints --depth_hint_path --use_wavelets"

    I evaluated the model at epoch 20 and got the result "abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 | & 0.1146 & 0.8996 & 4.8552 & 0.2024 & 0.8582 & 0.9519 & 0.9785 \", which differs from the scores reported in the paper: "0.097 & 0.718 & 4.387 & 0.184 & 0.891 & 0.962 & 0.982"

    I use pytorch 1.7.1 (with cuda 10.1) , torchvision 0.8.2 (with cuda 10.1), pytorch-wavelets 1.3.0, numpy 1.19.5, opencv 3.4.2, pillow 6.2.1, scikit-learn 0.24.2, on python 3.7.10. The setting is consistent with that suggested by the github repository.

    So I wonder how to reproduce the desired scores; is there anything wrong with my settings? Thank you for your advice :D

    opened by ruili3 8
  • Pytorch warning on lr_scheduler.step() and PIL Image error on Dataloader

    Hello,

    Thanks for the impressive work! I've cloned the code and set up the environment as required (pytorch 1.7.1, torchvision 0.8.2). When running the code on KITTI (w/o depth hints), I encountered a few problems.

    1. There is a warning: 'UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.' Considering I use torch 1.7.1, do I need to change the order in run_epoch.py as suggested by the warning?

    2. An error shows at line 231 of mono_dataset.py: inputs[("depth_gt", scale)] = self.resizescale, raising 'TypeError: img should be PIL Image. Got <class 'numpy.ndarray'>'. I transformed 'depth_gt' to PIL Image format and the problem was settled. I wonder whether this error is specific to my setup, or how you handle this problem.

    3. There is another small warning when using grid_sample(): 'UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.' Do I need to change the 'align_corners' parameter or just leave it unchanged?

    I wonder how you settled these issues in your implementation. Thank you a lot!

    opened by ruili3 4
  • Training process was stuck when setting resolution to 192x640

    Hi,

    Here comes a new problem. When I trained the model at a resolution of 192x640 with the command train.py --data_path <path> --log_dir <path> --encoder_type resnet --num_layers 50 --width 640 --height 192 --frame_ids 0 --use_stereo --split eigen_full --num_epochs 30 --use_depth_hints --depth_hint_path <path> --use_wavelets, the training process got stuck. The logging message for the first epoch never showed, and no other message or error was reported. It's pretty weird, because the resolution is not supposed to interfere with the training process. Can you please help debug this? Thank you!

    opened by ruili3 2
  • about the function of coefficients, e.g. (2 ** 3)(2 ** 2)(2 ** 1)

    Hello, what is the function of (2 ** 2) when obtaining 'h'?

    h = (2 ** 2) * self.wave1(x_d1).unsqueeze(1)

    And in other lines, including ll = (2 ** 3) * self.wave1_ll(x_d1) and h = (2 ** 1) * self.wave2(x_d2).unsqueeze(1)?

    Could you please explain the function of these factors? Thank you in advance.

    opened by c-yn 1
  • the loss is very small

    Hello, when I use the training command without depth hints, the loss is very small; I'm not sure whether this is a normal phenomenon. The loss is displayed as 0.0000.

    opened by wvc1208 0
  • What's the version of Pytorch_Wavelets?

    Hi,

    Thanks for the brilliant work! I notice the inverse DWT you use in the code is IDWT(). However, I didn't find its usage in the current documentation of the Pytorch Wavelets library. I wonder which version of Pytorch Wavelets you used in the paper? And if I want to use the DWT transform under your code settings, am I supposed to use a function like DWT(img)?

    Thanks a lot!

    opened by ruili3 0