[ICCV 2021] Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Overview

EPCDepth

EPCDepth is a self-supervised monocular depth estimation model whose supervision comes from the other image in a stereo pair. Details are described in our paper:

Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation

Rui Peng, Ronggang Wang, Yawen Lai, Luyang Tang, Yangang Cai

ICCV 2021 (arxiv)

Compared with prior methods, EPCDepth produces more accurate and sharper results. In the last example, the depth of the person in the second red box should be greater than that of the road sign, because the road sign occludes the person. Only our model accurately captures this occlusion cue.

Setup

1. Recommended environment

  • PyTorch 1.1
  • Python 3.6

2. KITTI data

You can download the raw KITTI dataset (about 175GB) by running:

wget -i dataset/kitti_archives_to_download.txt -P <your kitti path>/
cd <your kitti path>
unzip "*.zip"

Then, we recommend converting the png images to jpeg with this command:

find <your kitti path>/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

Alternatively, you can skip this conversion step by manually changing the image suffix from .jpg to .png in dataset/kitti_dataset.py. Note that our pre-trained models were trained on jpg images, so test performance on png images will decrease slightly.
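If you are unsure which suffix your copy of the dataset currently uses, the following minimal sketch (a hypothetical helper, not part of the repository) simply counts jpg vs. png files so you can tell whether the conversion has completed:

import pathlib

# Hypothetical sanity check (not part of the repository): count jpg vs. png
# images under the KITTI root to see whether the png -> jpg conversion ran.
kitti_root = pathlib.Path("<your kitti path>")  # replace with your dataset location
num_jpg = sum(1 for _ in kitti_root.rglob("*.jpg"))
num_png = sum(1 for _ in kitti_root.rglob("*.png"))
print(f"jpg images: {num_jpg}, png images: {num_png}")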

3. Prepare depth hint

Once you have downloaded the KITTI dataset as in the previous step, you need to prepare the depth hint by running:

python precompute_depth_hints.py --data_path <your kitti path>

The generated depth hints will be saved to <your kitti path>/depth_hints. Again, pay attention to the image suffix.
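As a rough sanity check, assuming the hints are written as per-image .npy arrays under <your kitti path>/depth_hints (adjust the pattern if your layout differs), you can load one file and inspect its value range:

import numpy as np
from pathlib import Path

# Assumption: depth hints are saved as per-image .npy arrays under
# <your kitti path>/depth_hints; adjust the glob pattern if your layout differs.
hint_dir = Path("<your kitti path>/depth_hints")
hint_files = sorted(hint_dir.rglob("*.npy"))
print(f"found {len(hint_files)} depth hint files")
if hint_files:
    hint = np.load(hint_files[0])
    print("shape:", hint.shape, "range:", hint.min(), "-", hint.max())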

📊 Evaluation

1. Download models

Download our pretrained models and put them in <your model path>.

Pre-trained | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25
--- | --- | --- | --- | --- | --- | --- | --- | ---
model18_lr |  | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888
model18_lr |  | 192x640 | resnet18 (pt) | d2 | 0.100 | 0.712 | 4.462 | 0.886
model18 |  | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899
model18 |  | 320x1024 | resnet18 (pt) | d2 | 0.0920 | 0.655 | 4.268 | 0.898
model50 |  | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901
model50 |  | 320x1024 | resnet50 (pt) | d2 | 0.0905 | 0.629 | 4.187 | 0.900

Note: pt means pre-trained on ImageNet, and the low-resolution results differ slightly from those reported in the paper.

2. KITTI evaluation

This operation will save the estimated disparity map to <your disparity save path>. To recreate the results from our paper, run:

python main.py \
    --val --data_path <your kitti path> --resume <your model path>/model18.pth.tar \
    --use_full_scale --post_process --output_scale 0 --disps_path <your disparity save path>

The saved disparities are stored in numpy format with shape (N, H, W).
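For example, a minimal snippet to load and inspect the saved array (the filename follows whatever you passed to --disps_path):

import numpy as np

# Load the disparities saved by the evaluation run above;
# adjust the filename to whatever was written to --disps_path.
disps = np.load("<your disparity save path>/disps.npy")
print(disps.shape)                      # expected (N, H, W): one map per test image
print(disps.dtype, disps.min(), disps.max())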

3. NYUv2 evaluation

We validate the generalization ability on the NYU-Depth-V2 dataset using the model trained on the KITTI dataset. Download the testing data nyu_test.tar.gz and unzip it to <your nyuv2 testing data path>. All evaluation code is in the nyuv2Testing folder. Run:

python nyuv2_testing.py \
    --data_path <your nyuv2 testing data path> \
    --resume <your model path>/model50.pth.tar --post_process \
    --save_dir <your nyuv2 disparity save path>

By default, only the visualizations (in png format) of the predicted disparity and the ground truth will be saved to <your nyuv2 disparity save path>.

📦 KITTI Results

You can download our precomputed disparity predictions from the following links:

Disparity | PP | HxW | Backbone | Output Scale | Abs Rel | Sq Rel | RMSE | δ < 1.25
--- | --- | --- | --- | --- | --- | --- | --- | ---
disps18_lr |  | 192x640 | resnet18 (pt) | d0 | 0.0998 | 0.722 | 4.475 | 0.888
disps18 |  | 320x1024 | resnet18 (pt) | d0 | 0.0925 | 0.671 | 4.297 | 0.899
disps50 |  | 320x1024 | resnet50 (pt) | d0 | 0.0905 | 0.646 | 4.207 | 0.901

🖼 Visualization

To visualize the disparity maps saved during the KITTI evaluation (or any other disparities stored in numpy format), run:

python main.py --vis --disps_path <your disparity save path>/disps50.npy

The visualized depth map will be saved to <your disparity save path>/disps_vis in png format.
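If you prefer to inspect a single map without the repository script, here is a minimal stand-alone matplotlib sketch (not part of the repository) along the same lines:

import numpy as np
import matplotlib.pyplot as plt

# Stand-alone visualization of a single predicted disparity map.
disps = np.load("<your disparity save path>/disps50.npy")  # shape (N, H, W)
plt.imshow(disps[0], cmap="magma")  # brighter = larger disparity = closer to the camera
plt.axis("off")
plt.savefig("disp0_vis.png", bbox_inches="tight", dpi=200)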

Training

To train the model from scratch, run:

python main.py \
    --data_path <your kitti path> --model_dir <checkpoint save dir> \
    --logs_dir <tensorboard save dir> --pretrained --post_process \
    --use_depth_hint --use_spp_distillation --use_data_graft \
    --use_full_scale --gpu_ids 0

🔧 Suggestion

  1. The magnitude of performance improvement: Data Grafting > Full-Scale > Self-Distillation. We noticed that the improvement from self-distillation becomes insignificant when the model capacity is large. Exploring more accurate self-distillation label extraction methods and better self-distillation strategies is therefore a promising direction for future work.
  2. In our experience, self-supervised monocular depth estimation models with larger backbones converge less stably. We suggest verifying your ideas on the small backbone first, and then adjusting the learning rate appropriately when training on the large backbone.
  3. We found that a pure RSU encoder performs better than the traditional ResNet encoder, but unfortunately there is no RSU encoder pre-trained on ImageNet. We therefore believe that pre-training an RSU encoder on ImageNet and using it to replace the ResNet encoder in this model could yield a large performance improvement.

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{epcdepth,
    title = {Excavating the Potential Capacity of Self-Supervised Monocular Depth Estimation},
    author = {Peng, Rui and Wang, Ronggang and Lai, Yawen and Tang, Luyang and Cai, Yangang},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
    year = {2021}
}

Acknowledgements

Our depth hint module refers to DepthHints, the NYUv2 pre-processing refers to P2Net, and the RSU block refers to U2Net.

Comments
  • Question about warping of right image

    Hi @prstrive,

    Thank you for your great work. I have one question regarding the warping process, specifically about the variable 'stereo_T'. In this segment of code: https://github.com/prstrive/EPCDepth/blob/84119c806741334b652749ee953e3eab60a3718c/dataset/kitti_dataset.py#L164 Why multiply by 0.1 instead of the camera baseline, which is 0.54 for KITTI?

    Thank you in advance.

    Best,

    opened by SwagJ 4
  • Distortion between near cars and adjacent environment

    Thanks to the authors for the interesting idea in the paper. In my test, the portion of the reconstructed point cloud containing nearby cars is distorted, which means the disparity between obvious nearby cars and the environment background is not predicted distinctly. I guess three reasons may cause this. First, the encoding part of the network is not deep enough and the semantics are not learned well, so the difference between the environment and the vehicles may not be well judged. Second, the disparity decoder contains down-sampled parts, so the disparity of a car and the adjacent environment may fall into the same grid of the output feature map. Third, the photometric loss covers large surrounding regions of the image such as the sky, so the fine-grained loss is submerged. Please tell me if you have ever encountered this situation.

    opened by darknightking 3
  • Getting different test results on the KITTI

    1. I downloaded your pre-trained model named "model18_lr" from: https://drive.google.com/file/d/1Z60MI_UdTHfoSFSFwLI39yfe8njEN6Kp/view?usp=sharing .

    2. I saved the estimated disparity map by your script:

    python main.py --val --data_path --resume /model18_192x640.pth.tar --use_full_scale --post_process --output_scale 0 --disps_path

    3. I tested the depth map using the script provided by monodepth2 ( https://github.com/nianticlabs/monodepth2/blob/master/evaluate_depth.py ). The command is: python evaluate_depth.py --data_path <dataset_dir> --eval_mono --ext_disp_to_eval <saved_depth_map> --post_process.

    The result is:
    Mono evaluation - using median scaling
    Scaling ratios | med: 6.675 | std: 0.085

    abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3
    0.169 | 0.981 | 5.269 | 0.241 | 0.745 | 0.943 | 0.978

    It is not good. Is there anything I have missed? Thank you!

    opened by guogangok 2
  • Artifact appears as the training goes on

    Hi, dear author, I really appreciate your awesome work! It is more stable and performs better than depth estimation with monocular video.

    However, I met a problem when training EPCDepth on my own dataset. When the model is trained for only 3 epochs, the performance is good. However, when I train for more epochs (such as 20), artifacts appear on the predicted disparity map (see the figures attached to the original issue).

    What could possibly lead to this result? Could you give me some advice? Thank you!

    opened by adelebei 2
  • TypeError: expected str, bytes or os.PathLike object, not NoneType

    Epoch 0/20: N/A% 00/5650 || Elapsed Time: 0:00:00, ETA: --:--:--, LR: -, Loss: ------
    Traceback (most recent call last):
      File "main.py", line 55, in <module>
        model.main()
      File "/home/ji322906/EPCDepth/model.py", line 90, in main
        train_loss = self.train_epoch(epoch)
      File "/home/ji322906/EPCDepth/model.py", line 197, in train_epoch
        for batch, data in enumerate(self.train_loader):
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
        data = self._next_data()
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
        return self._process_data(data)
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
        data.reraise()
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/ji322906/EPCDepth/dataset/kitti_dataset.py", line 151, in __getitem__
        data["curr"] = self.transform(self.get_img(folder, frame_idx, side), is_flip, False, color_aug)
      File "/home/ji322906/EPCDepth/dataset/kitti_dataset.py", line 88, in get_img
        img_path = os.path.join(self.data_path, folder, "image_0{}/data".format(self.side_map[side]), "{:010d}{}".format(frame_idx, ".png"))
      File "/home/ji322906/.conda/envs/jihyungkim94/lib/python3.8/posixpath.py", line 76, in join
        a = os.fspath(a)
    TypeError: expected str, bytes or os.PathLike object, not NoneType

    How do I fix it??

    opened by jihyungkim94 2
  • Getting different test results on the KITTI

    Hi, first of all, thanks for your excellent work! When I tried to reproduce your results on the KITTI test set with your code and pretrained weights, I got results different from those reported in this repository. Specifically, I tested model50 with:

    python main.py --val --data_path <kitti path> --resume <model path>/model50.tar --use_full_scale --post_process --output_scale 0 --disps_path <disparity save path> --num_layer 50 --batch_size 4

    And the results are:

    From | Abs Rel | Sq Rel | RMSE | δ < 1.25
    --- | --- | --- | --- | ---
    This Repository | 0.091 | 0.646 | 4.207 | 0.901
    My Reproduction | 0.096 | 0.669 | 4.254 | 0.888

    Note that the extension of the images in my KITTI dataset is .png, and you mentioned that 'Our pre-trained model is trained in jpg, and the test performance on png will slightly decrease.' Are the differences just caused by the image extension, or have I misunderstood something else?

    opened by ZM-Zhou 1
  • I sincerely congratulate you on publishing such an excellent article. After reading it, I encountered a problem when running the code; I hope you can help take a look

    Traceback (most recent call last):
      File "main.py", line 55, in <module>
        model.main()
      File "/hpcfiles/users/hx/EPCDepth-main/model.py", line 89, in main
        train_loss = self.train_epoch(epoch)
      File "/hpcfiles/users/hx/EPCDepth-main/model.py", line 189, in train_epoch
        progressbar.Timer(), ",", progressbar.ETA(), ",", progressbar.Variable('LR', width=1), ",",
    AttributeError: module 'progressbar' has no attribute 'Variable'

    opened by xiaoyudanaa 1
  • Error ('tuple' object is not callable) in color augmentation

    Hello, this is great work! I am facing an issue with random color augmentation:

        out = self.color_aug(x)
        TypeError: 'tuple' object is not callable

    Could you please take a look?

    opened by akashchavan15 1