[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Niantic Labs

Last update: Jan 2, 2023

Related tags

Deep Learning computer-vision kitti-dataset depth-estimation wavelets nyu-depth-v2 cvpr2021

Overview

Single Image Depth Prediction with Wavelet Decomposition

Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambetov

CVPR 2021

[Link to paper]

We introduce WaveletMonoDepth, which improves efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.

🧑‍🏫 Methodology

WaveletMonoDepth was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build our code upon a baseline code. Both baselines share a common encoder-decoder architecture, and we modify their decoder to provide a wavelet prediction.

Wavelet predictions are sparse, so they only need to be computed at relevant locations, which saves a lot of unnecessary computation.

The network is first trained with dense convolutions in the decoder until convergence; the dense convolutions are then replaced with sparse ones.

This is because the network first needs to learn to predict sparse wavelet coefficients before we can use sparse convolutions.
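To illustrate why wavelet coefficients of a depth map tend to be sparse, here is a minimal sketch of a single-level 2D Haar transform in pure Python. It is not the repository's implementation (which relies on Pytorch Wavelets); the function names `haar_dwt2` and `haar_idwt2`, the sub-band naming, and the toy depth map are assumptions made for illustration only.

```python
# Illustrative single-level 2D Haar transform (a sketch, not the repository's code).

def haar_dwt2(img):
    """Split an even-sized image into a low-pass map and three detail maps."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * cols for _ in range(rows)]
    lh = [[0.0] * cols for _ in range(rows)]
    hl = [[0.0] * cols for _ in range(rows)]
    hh = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # scaled local average
            lh[i][j] = (a - b + c - d) / 2  # change across columns
            hl[i][j] = (a + b - c - d) / 2  # change across rows
            hh[i][j] = (a - b - c + d) / 2  # diagonal change
    return ll, (lh, hl, hh)


def haar_idwt2(ll, details):
    """Invert haar_dwt2: rebuild the full-resolution image from its sub-bands."""
    lh, hl, hh = details
    rows, cols = len(ll), len(ll[0])
    img = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


# Toy depth map (hypothetical values): a flat surface with one vertical depth edge.
depth = [[2.0, 2.0, 2.0, 5.0] for _ in range(4)]

ll, details = haar_dwt2(depth)
nonzero = sum(v != 0.0 for band in details for row in band for v in row)
total = sum(len(row) for band in details for row in band)
reconstructed = haar_idwt2(ll, details)

print(f"{nonzero} of {total} detail coefficients are non-zero")
print("perfect reconstruction:", reconstructed == depth)
```

In this toy case the detail coefficients are non-zero only where the depth edge falls, which is the property that sparse convolutions exploit.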

🗂 Environment Requirements 🗂

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to set up a new environment:

conda env create -f environment.yml
conda activate wavelet-mdp

Our work uses Pytorch Wavelets, a great package from Fergal Cotter which implements the Inverse Discrete Wavelet Transform (IDWT) used in our work, and a lot more! To install Pytorch Wavelets, simply run:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

🚗 🚦 KITTI 🌳 🛣

Depth Hints was used as a baseline for KITTI.

Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in their repositories first.

⚙ Setup, Training and Evaluation

Please see the KITTI directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores using dense convolutions to predict wavelet coefficients. Download links coming soon!

Model name	Training modality	Resolution	abs_rel	RMSE	δ<1.25	Weights	Eigen Predictions
`Ours Resnet18`	Stereo + DepthHints	640 x 192	0.106	4.693	0.876	Coming soon	Coming soon
`Ours Resnet50`	Stereo + DepthHints	640 x 192	0.105	4.625	0.879	Coming soon	Coming soon
`Ours Resnet18`	Stereo + DepthHints	1024 x 320	0.102	4.452	0.890	Coming soon	Coming soon
`Ours Resnet50`	Stereo + DepthHints	1024 x 320	0.097	4.387	0.891	Coming soon	Coming soon
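For readers unfamiliar with the column names, the following sketch computes abs_rel, RMSE and δ<1.25 using their standard definitions. It is not taken from the repository's evaluation scripts; the function name `depth_metrics`, the masking rule for invalid ground truth, and the sample values are assumptions for illustration.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth metrics over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error, in the same unit as the depth values.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose prediction is within a factor 1.25 of the truth.
    delta_125 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return abs_rel, rmse, delta_125


# Hypothetical flattened depth values; the last pixel has no ground truth.
pred = [2.0, 4.0, 4.4, 12.0, 7.0]
gt = [2.0, 4.0, 4.0, 8.0, 0.0]

abs_rel, rmse, delta_125 = depth_metrics(pred, gt)
print(f"abs_rel={abs_rel:.4f} rmse={rmse:.4f} delta<1.25={delta_125:.2f}")
```

Lower is better for abs_rel and RMSE, while higher is better for δ<1.25.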

🎚 Playing with sparsity

The most interesting part, however, is that we can exploit the sparsity of the predicted wavelet coefficients to trade off performance against efficiency, at a minimal cost in performance. We do so by tuning the threshold:

low threshold values lead to high performance but a high number of computations,
high threshold values lead to highly efficient computation, as convolutions are computed at only a few pixel locations, with minimal impact on performance.

Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.

Our wavelet-based method allows us to greatly reduce the number of computations in the decoder at a minimal expense in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores vs FLOPs.
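As a rough illustration of how the computational saving scales, the sketch below counts multiply-accumulate operations (MACs) for one convolution layer evaluated densely versus at a fraction of the pixels. It is a back-of-the-envelope estimate, not the counting used in the notebooks: the function name `conv_macs` and the layer dimensions are hypothetical, and bias terms and any sparse gather/scatter overhead are ignored.

```python
def conv_macs(height, width, c_in, c_out, kernel, active_fraction=1.0):
    """Multiply-accumulates for a kernel x kernel convolution.

    Only `active_fraction` of the output pixels are assumed to be computed.
    """
    active_pixels = round(height * width * active_fraction)
    return active_pixels * c_in * c_out * kernel * kernel


# Hypothetical decoder layer: 48 x 160 feature map, 64 -> 32 channels, 3x3 kernel.
dense = conv_macs(48, 160, 64, 32, 3)
sparse = conv_macs(48, 160, 64, 32, 3, active_fraction=0.10)

print(f"dense:  {dense:,} MACs")
print(f"sparse: {sparse:,} MACs ({sparse / dense:.0%} of dense)")
```

Under these assumptions the cost of a layer is proportional to the fraction of active pixels, so the overall saving depends on how sparse each decoder level is.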

🪑 🛁 NYUv2 🛋 🚪

Dense Depth was used as a baseline for NYUv2. Note that we used the experimental PyTorch implementation of DenseDepth. Compared to the original paper, we made a few modifications:

we supervise depth directly instead of supervising disparity
we do not use SSIM
we use DenseNet161 as encoder instead of DenseNet169

⚙ Setup, Training and Evaluation

Please see the NYUv2 directory of this repository for details on how to train and evaluate our method.

📊 Results and 📦 Trained models

Please find below the scores and associated trained models, using dense convolutions to predict wavelet coefficients.

Model name	Encoder	Resolution	abs_rel	RMSE	δ<1.25	ε_acc	Weights	Eigen Predictions
`Baseline`	DenseNet	640 x 480	0.1277	0.5479	0.8430	1.7170	Coming soon	Coming soon
`Ours`	DenseNet	640 x 480	0.1258	0.5515	0.8451	1.8070	Coming soon	Coming soon
`Baseline`	MobileNetv2	640 x 480	0.1772	0.6638	0.7419	1.8911	Coming soon	Coming soon
`Ours`	MobileNetv2	640 x 480	0.1727	0.6776	0.7380	1.9732	Coming soon	Coming soon

🎚 Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at a minimal cost in performance.

Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.
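A relative score loss of this kind can be computed as sketched below. The helper name `relative_score_loss` is an assumption, the text does not state which score is used, and the sparse scores in the example are hypothetical values chosen only to show the arithmetic; the dense scores are taken from the DenseNet "Ours" row of the table above.

```python
def relative_score_loss(dense_score, sparse_score, higher_is_better=True):
    """Relative degradation of the sparse score with respect to the dense one."""
    if higher_is_better:
        return (dense_score - sparse_score) / dense_score
    return (sparse_score - dense_score) / dense_score


# delta<1.25 (higher is better): dense score from the table, sparse score hypothetical.
loss_delta = relative_score_loss(0.8451, 0.8440)

# abs_rel (lower is better): dense score from the table, sparse score hypothetical.
loss_abs_rel = relative_score_loss(0.1258, 0.1259, higher_is_better=False)

print(f"relative loss on delta<1.25: {loss_delta:.3%}")
print(f"relative loss on abs_rel:    {loss_abs_rel:.3%}")
```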

🎮 Try it yourself!

Try using our Jupyter notebooks to visualize results with different levels of sparsity, as well as compute the resulting computational saving in FLOPs. Notebooks can be found in <DATASET>/sparsity_test_notebook.ipynb where <DATASET> is either KITTI or NYUv2.

✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Michael Firman and
               Jamie Watson and
               Vincent Lepetit and
               Daniyar Turmukhambetov},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2021}
}

👩‍⚖️ License

Comments

What is the significance of "self.is_test" ?

Hi, I have understood the flow well, and I can see that "self.is_test" always remains False, so control never enters the "if" branch: https://github.com/nianticlabs/wavelet-monodepth/blob/5bc193957056a5bab6cbc1e052f6a443279f8335/NYUv2/data.py#L133

Could you please tell me why you either multiply or divide by 1000?

Another point: I am working with the smaller version of the NYU dataset, i.e. "nyu_depth_v2_labeled.mat". Is this multiply-or-divide step also applicable to that dataset?

opened by tanmayGIT 10

Can not get desired performance using wavelet-decomposition

Hi,

When I run the code on KITTI, I cannot get the scores reported in the paper. I ran the code with stereo training, at 1024x320 resolution and with wavelet decomposition, using the command below: "train.py --data_path --log_dir --encoder_type resnet --num_layers 50 --width 1024 --height 320 --frame_ids 0 --use_stereo --split eigen_full --num_epochs 300 --use_depth_hints --depth_hint_path --use_wavelets"

I evaluated the model at epoch 20 and got the result: " abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 | & 0.1146 & 0.8996 & 4.8552 & 0.2024 & 0.8582 & 0.9519 & 0.9785 \", which is different from the scores reported in the paper: "0.097 & 0.718 & 4.387 & 0.184 & 0.891 & 0.962 & 0.982"

I use pytorch 1.7.1 (with cuda 10.1) , torchvision 0.8.2 (with cuda 10.1), pytorch-wavelets 1.3.0, numpy 1.19.5, opencv 3.4.2, pillow 6.2.1, scikit-learn 0.24.2, on python 3.7.10. The setting is consistent with that suggested by the github repository.

So I wonder how to reproduce the desired scores. Is there anything wrong with my settings? Thank you for your advice :D

opened by ruili3 8

Pytorch warning on lr_scheduler.step() and PIL Image error on Dataloader

Hello,

Thanks for the impressive work! I've cloned the code and set up the environment as required (pytorch 1.7.1, torchvision 0.8.2). When running the code on KITTI (w/o depth hints), I encountered a few problems.

There is a warning 'UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.' Given that I use torch 1.7.1, do I need to change the order in run_epoch.py as suggested by the warning?

An error shows in line 231 of mono_dataset.py: inputs[("depth_gt", scale)] = self.resizescale 'TypeError: img should be PIL Image. Got <class 'numpy.ndarray'>' I transform 'depth_gt' to PIL Image format and the problem is settled. I wonder if the error is an individual case for me, or how do you handle this problem.

There is another small error when using grid_sample(): 'UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.' Do I need to change the 'align_corners' parameter or just leave it unchanged?

I wonder how you settled these issues in your implementation. Thank you a lot!

opened by ruili3 4

Training process was stuck when setting resolution to 192x640

Hi,

Here comes a new problem. When I trained the model with the resolution of 192*640 with command train.py --data_path <path> --log_dir <path> --encoder_type resnet --num_layers 50 --width 640 --height 192 --frame_ids 0 --use_stereo --split eigen_full --num_epochs 30 --use_depth_hints --depth_hint_path <path> --use_wavelets , the training process was stuck. The logging message of the first epoch did not show. There was no other message nor error reported. It's pretty weird because the resolution is not supposed to interfere with the training process. Can you please help to debug that? Thank you!

opened by ruili3 2

about the function of coefficients , e.g. (2 ** 3)(2 ** 2)(2 ** 1)

Hello, what is the function of 2 ** 2 when obtaining 'h'?

h = (2 ** 2) * self.wave1(x_d1).unsqueeze(1)

and other examples, including

ll = (2 ** 3) * self.wave1_ll(x_d1)
h = (2 ** 1) * self.wave2(x_d2).unsqueeze(1)

Could you please explain the purpose of these factors? Thank you in advance.

opened by c-yn 1

the loss is very small

Hello, when I use the training command without depth hints, the loss is very small. I'm not sure whether this is normal. The loss is displayed as 0.0000

opened by wvc1208 0

What's the version of Pytorch_Wavelets?

Hi,

Thanks for the brilliant work! I notice the inverse DWT you use in the code is IDWT(). However, I didn't find its usage in the current documentation of the Pytorch Wavelets library. Which version of Pytorch Wavelets did you use in the paper? And if I want to use the DWT transformation under your code settings, am I supposed to use a function like DWT(img)?

Thanks a lot!

opened by ruili3 0

Owner

Niantic Labs

Building technologies and ideas that move us

GitHub

Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals

LapDepth-release This repository is a Pytorch implementation of the paper "Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals" M

205 Dec 30, 2022

Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su [arXiv] [project site] DistDe

122 Dec 13, 2022

Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

13 Dec 10, 2022

(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry Official implementation of the paper Multi-View Depth Est

138 Dec 28, 2022

[ICCV 2021] Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

EPCDepth EPCDepth is a self-supervised monocular depth estimation model, whose supervision is coming from the other image in a stereo pair. Details ar

110 Dec 23, 2022

the official code for ICRA 2021 Paper: "Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation"

G2S This is the official code for ICRA 2021 Paper: Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation by Hemang

4 Jul 27, 2022

Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021) Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma. We address the pr

33 Jun 27, 2022

ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

18 Nov 6, 2022

ONNX-PackNet-SfM: Python scripts for performing monocular depth estimation using the PackNet-SfM model in ONNX

Python scripts for performing monocular depth estimation using the PackNet-SfM model in ONNX

14 Dec 9, 2022

Official implementation of the network presented in the paper "M4Depth: A motion-based approach for monocular depth estimation on video sequences"

M4Depth This is the reference TensorFlow implementation for training and testing depth estimation models using the method described in M4Depth: A moti

76 Jan 3, 2023

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging This repository contains an implementation

1.1k Jan 2, 2023

This repo is for Self-Supervised Monocular Depth Estimation with Internal Feature Fusion(arXiv), BMVC2021

DIFFNet This repo is for Self-Supervised Monocular Depth Estimation with Internal Feature Fusion(arXiv), BMVC2021 A new backbone for self-supervised d

3 Oct 22, 2021

SimpleDepthEstimation - An unified codebase for NN-based monocular depth estimation methods

SimpleDepthEstimation Introduction This is an unified codebase for NN-based monocular depth estimation methods, the framework is based on detectron2 (

8 Dec 13, 2022

Pytorch implementation of forward and inverse Haar Wavelets 2D

9 Oct 30, 2022

The implemention of Video Depth Estimation by Fusing Flow-to-Depth Proposals

Flow-to-depth (FDNet) video-depth-estimation This is the implementation of paper Video Depth Estimation by Fusing Flow-to-Depth Proposals Jiaxin Xie,

32 Jun 14, 2022

ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021)

ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021) Project Page | Video | Paper | Data We present a novel metho

65 Nov 28, 2022

Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

EgoNet Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo inclu

138 Dec 9, 2022

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. (CVPR 2021)

GDR-Net This repo provides the PyTorch implementation of the work: Gu Wang, Fabian Manhardt, Federico Tombari, Xiangyang Ji. GDR-Net: Geometry-Guided

169 Jan 7, 2023

Code and datasets for the paper "Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction" (RA-L, 2021)

Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction This is the code for the paper Combining E

69 Dec 26, 2022
