# ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021)
Project Page | Video | Paper | Data
We present a novel method for multi-view depth estimation from a single video, which is a critical task in various applications such as perception, reconstruction, and robot navigation. Although previous learning-based methods have demonstrated compelling results, most estimate depth maps of individual video frames independently, without taking into consideration the strong geometric and temporal coherence among the frames. Moreover, current state-of-the-art (SOTA) models mostly adopt a fully 3D convolutional network for cost regularization and therefore incur a high computational cost, limiting their deployment in real-world applications. Our method achieves temporally coherent depth estimation by using a novel Epipolar Spatio-Temporal (EST) transformer to explicitly associate geometric and temporal correlations across multiple estimated depth maps. Furthermore, to reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network consisting of a 2D context-aware network and a 3D matching network, which learn 2D context information and 3D disparity cues separately.
This is the official repository for the paper.
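As a rough illustration of the hybrid 2D/3D design described above, here is a minimal PyTorch sketch with placeholder layer sizes of our own choosing; it is not the network from the paper, only the general pattern of a 2D context branch alongside a 3D cost-volume matching branch:

```python
# Illustrative sketch only -- NOT the ESTDepth architecture.
# A 2D branch learns per-frame context; a small 3D branch regularizes a
# plane-sweep cost volume and produces a depth-hypothesis distribution.
import torch
import torch.nn as nn

class Hybrid2D3DSketch(nn.Module):
    def __init__(self, feat_ch=32, ndepths=64):
        super().__init__()
        self.ndepths = ndepths
        # 2D context-aware branch (placeholder depth/width)
        self.context2d = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Lightweight 3D matching branch over a (B, C, D, H, W) cost volume
        self.match3d = nn.Sequential(
            nn.Conv3d(feat_ch, 8, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, 3, padding=1),
        )

    def forward(self, image, cost_volume):
        context = self.context2d(image)                # (B, C, H, W) context cues
        logits = self.match3d(cost_volume).squeeze(1)  # (B, D, H, W)
        prob = torch.softmax(logits, dim=1)
        # Soft argmin over the D depth hypotheses -> expected hypothesis index
        hyp = torch.arange(self.ndepths, dtype=prob.dtype, device=prob.device)
        depth_index = (prob * hyp.view(1, -1, 1, 1)).sum(dim=1)
        return context, depth_index
```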
## Table of contents

- [Requirements and Installation](#requirements-and-installation)
- [Dataset](#dataset)
- [Train a new model](#train-a-new-model)
- [Evaluation](#evaluation)
- [License](#license)
- [Citation](#citation)
## Requirements and Installation
The code is implemented in PyTorch and has been tested on the following system:
- Python 3.6
- PyTorch 1.2.0
- Nvidia apex library (optional)
- NVIDIA GPU (RTX 2080 Ti) with CUDA 10.1
To install, first clone this repo and then install all dependencies:

```bash
conda env create -f environment.yml
```
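A quick way to confirm the environment matches the versions listed above (a generic check, not part of the repo):

```python
# Sanity-check the installed PyTorch / CUDA versions
import torch
print(torch.__version__)          # tested with 1.2.0
print(torch.version.cuda)         # tested with 10.1
print(torch.cuda.is_available())  # should be True on an NVIDIA GPU
```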
Optionally, install apex to enable synchronized batch normalization:

```bash
git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
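For reference, apex's synchronized batch normalization is typically enabled by converting a model's BatchNorm layers before distributed training; the sketch below is generic apex usage, not the repo's training code (which handles this behind the `--sync_bn` flag):

```python
# Generic apex SyncBN usage (illustrative, not the repo's training code)
import torch.nn as nn
from apex.parallel import convert_syncbn_model

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
model = convert_syncbn_model(model)  # swaps nn.BatchNorm* for apex SyncBN
```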
## Dataset
Our experiments use the following datasets. Please also cite the original papers if you use any of them in your work.

| Dataset | Notes on Dataset Split |
|---|---|
| ScanNet | see `./data/scannet_split/` |
| 7-Scenes | see `./data/7scenes/test.txt` |
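The split files can be read with plain Python; the snippet below assumes one scene/sequence name per line (check the files under `./data/` for the exact format):

```python
# Hypothetical reader -- assumes one entry per line in the split file
with open("./data/7scenes/test.txt") as f:
    test_scenes = [line.strip() for line in f if line.strip()]
print(f"{len(test_scenes)} test sequences")
```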
## Train a new model
In the training stage, our model takes a sequence of 5 frames as input, with a total batch size of 4 sequences spread across 4 GPUs (one sequence per GPU, matching `--batch_size 1` below). We use the following command to train a model:
```bash
python -m torch.distributed.launch --nproc_per_node=4 train_hybrid.py --using_apex --sync_bn \
    --datapath /userhome/35/xxlong/dataset/scannet_whole/ \
    --testdatapath /userhome/35/xxlong/dataset/scannet_test/ \
    --reloadscan True \
    --batch_size 1 --seq_len 5 --mode train --summary_freq 10 \
    --epochs 7 --lr 0.00004 --lrepochs 2,4,6,8:2 \
    --logdir ./logs/hybrid_res50_ndepths64 \
    --resnet 50 --ndepths 64 --IF_EST_transformer False \
    --depth_min 0.1 --depth_max 10. | tee -a ./logs/hybrid_res50_ndepths64/log.txt
```
Alternatively, run the provided script:

```bash
bash train_hybrid.sh
```
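The `--lrepochs` flag appears to follow the MVSNet-style `milestones:factor` convention, i.e. `2,4,6,8:2` divides the learning rate by 2 at epochs 2, 4, 6, and 8 (our reading; see the argument parsing in `train_hybrid.py` for the authoritative behavior). A minimal sketch of that schedule:

```python
# Sketch of the assumed "milestones:factor" schedule behind --lrepochs
def lr_at_epoch(spec, base_lr, epoch):
    milestones, factor = spec.split(":")
    milestones = [int(m) for m in milestones.split(",")]
    # divide base_lr by factor once for every milestone already passed
    return base_lr / float(factor) ** sum(epoch >= m for m in milestones)

print([lr_at_epoch("2,4,6,8:2", 4e-5, e) for e in range(7)])
# [4e-05, 4e-05, 2e-05, 2e-05, 1e-05, 1e-05, 5e-06]
```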
## Evaluation
Once the model is trained, the following commands evaluate it on the test images. Our model has two testing modes: **Joint** and **ESTM**.

For **Joint** mode, run:

```bash
bash eval_hybrid.sh
```

For **ESTM** mode, run:

```bash
bash eval_hybrid_seq.sh
```
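For reference, depth accuracy in this setting is commonly reported with metrics such as absolute relative error, RMSE, and inlier ratios; below is a minimal NumPy version (a generic implementation, not the repo's evaluation code):

```python
import numpy as np

def depth_metrics(pred, gt, depth_min=0.1, depth_max=10.0):
    # Evaluate only pixels with valid ground truth inside the depth
    # range used in training (--depth_min / --depth_max above).
    mask = (gt > depth_min) & (gt < depth_max)
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": np.mean(ratio < 1.25)}
```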
## License
ESTDepth is MIT-licensed. The license applies to the pre-trained models as well.
## Citation

If you find our work useful, please cite:
```bibtex
@InProceedings{Long_2021_CVPR,
    author    = {Long, Xiaoxiao and Liu, Lingjie and Li, Wei and Theobalt, Christian and Wang, Wenping},
    title     = {Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {8258-8267}
}
```