Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Kranti Kumar Parida

Last update: Jun 27, 2022


Overview


Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma.

We address the problem of estimating depth from multimodal audio-visual data. Inspired by the ability of animals such as bats and dolphins to infer the distance of objects through echolocation, we propose an end-to-end deep-learning-based pipeline that uses RGB images, binaural echoes and the estimated material properties of objects in a scene to estimate depth.
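The echolocation intuition rests on a simple relation: an echo that returns after a round-trip delay t comes from a reflector at distance d = c * t / 2, where c is the speed of sound. The sketch below only illustrates this physics, assuming c = 343 m/s (air at room temperature); the function names are hypothetical, and the pipeline itself learns depth from binaural echoes end to end rather than timing individual reflections.

```python
# Assumed speed of sound in air at about 20 degrees C, in metres per second.
SPEED_OF_SOUND = 343.0


def echo_distance(round_trip_delay_s: float, c: float = SPEED_OF_SOUND) -> float:
    """Distance to a reflector given the emit-to-echo delay.

    Sound travels to the object and back, so the one-way distance
    is half of the total path length c * t.
    """
    return c * round_trip_delay_s / 2.0


def range_resolution(sampling_rate_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Range change corresponding to a one-sample shift in echo delay."""
    return echo_distance(1.0 / sampling_rate_hz, c)


if __name__ == "__main__":
    # An echo arriving 10 ms after emission comes from about 1.7 m away.
    print(f"{echo_distance(0.010):.3f} m")  # 1.715 m
    # At 16 kHz, one sample of delay corresponds to roughly 1 cm of range.
    print(f"{range_resolution(16000) * 100:.2f} cm")  # 1.07 cm
```

The range-resolution helper shows why the audio sampling rate matters: each sample of delay at 16 kHz maps to about a centimetre of distance.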

[Project] [Paper]

Requirements

The code is tested with

- Python 3.6 
- PyTorch 1.6.0
- Numpy 1.19.5

Dataset

Replica-VisualEchoes can be obtained from here. We use the 128x128 image resolution in our experiments.

MatterportEchoes is an extension of the existing Matterport3D dataset. To obtain the raw frames, please forward us the acceptance of your access request from the authors of the Matterport3D dataset. We will release the procedure for obtaining the frames and echoes using habitat-sim and soundspaces in the near future.

Pre-trained Model

We provide pre-trained models for both datasets here. For each dataset, the four parts of the model are saved individually as rgbdepth_*, audiodepth_*, material_* and attention_*, where * is the dataset name, i.e. replica or mp3d.
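Judging by the checkpoint names, the model has an image-based depth branch, an echo-based depth branch, a material network and an attention network. As a rough illustration of how an attention network can combine two depth predictions, here is a minimal NumPy sketch of per-pixel convex blending. The function name, the sigmoid parameterisation and the two-way blend are assumptions made for illustration; they are not taken from the released code.

```python
import numpy as np


def fuse_depths(depth_echo: np.ndarray,
                depth_visual: np.ndarray,
                attention_logits: np.ndarray) -> np.ndarray:
    """Blend two H x W depth maps with a per-pixel soft attention weight.

    attention_logits is a hypothetical H x W map of unnormalised scores.
    A sigmoid turns it into a weight alpha in (0, 1) for the echo branch,
    and the visual branch receives the complementary weight 1 - alpha,
    so every fused pixel lies between the two input depths.
    """
    if not (depth_echo.shape == depth_visual.shape == attention_logits.shape):
        raise ValueError("all inputs must share the same H x W shape")
    alpha = 1.0 / (1.0 + np.exp(-attention_logits))  # element-wise sigmoid
    return alpha * depth_echo + (1.0 - alpha) * depth_visual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    echo = rng.uniform(0.5, 10.0, size=(128, 128))    # stand-in echo depth
    visual = rng.uniform(0.5, 10.0, size=(128, 128))  # stand-in visual depth
    logits = rng.normal(size=(128, 128))              # stand-in attention scores
    fused = fuse_depths(echo, visual, logits)
    print(fused.shape)  # (128, 128)
```

A convex blend keeps the fused depth within the range of its inputs, which lets the attention map decide, pixel by pixel, which modality to trust.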

Training

To train the model, first download the pre-trained material network from the link above.

python train.py \
--validation_on \
--dataset mp3d \
--img_path path_to_img_folder \
--metadatapath path_to_metadata \
--audio_path path_to_audio_folder \
--checkpoints_dir path_to_save_checkpoints \
--init_material_weight path_to_pre-trained_material_net
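To make the role of each flag in the training command explicit, here is a hypothetical argparse reconstruction that covers only the flags shown above. It is a sketch for orientation: the real option definitions live in the repository's options/ package, and their defaults, types and help strings may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the README commands."""
    parser = argparse.ArgumentParser(description="audio-visual depth (sketch)")
    parser.add_argument("--dataset", choices=["replica", "mp3d"], required=True,
                        help="dataset to train or evaluate on")
    parser.add_argument("--img_path", required=True,
                        help="folder with the RGB frames")
    parser.add_argument("--metadatapath",
                        help="folder with metadata such as scene split files")
    parser.add_argument("--audio_path", required=True,
                        help="folder with the echo recordings")
    parser.add_argument("--checkpoints_dir", required=True,
                        help="where checkpoints are saved (train) or loaded (test)")
    parser.add_argument("--init_material_weight",
                        help="weights of the pre-trained material network")
    parser.add_argument("--validation_on", action="store_true",
                        help="run validation during training")
    return parser


if __name__ == "__main__":
    # Parse the example training command from the README.
    opt = build_parser().parse_args([
        "--validation_on", "--dataset", "mp3d",
        "--img_path", "path_to_img_folder",
        "--metadatapath", "path_to_metadata",
        "--audio_path", "path_to_audio_folder",
        "--checkpoints_dir", "path_to_save_checkpoints",
        "--init_material_weight", "path_to_pre-trained_material_net",
    ])
    print(opt.dataset, opt.validation_on)  # mp3d True
```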

Evaluation

To evaluate the method with the pre-trained models, download the dataset and the models for the corresponding dataset.

Evaluation for Replica dataset

python test.py \
--img_path path_to_img_folder \
--audio_path path_to_audio_data \
--checkpoints_dir path_to_the_pretrained_model \
--dataset replica

Evaluation for Matterport3D dataset

python test.py \
--img_path path_to_img_folder \
--audio_path path_to_audio_data \
--checkpoints_dir path_to_the_pretrained_model \
--dataset mp3d
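The README does not list the evaluation metrics. The errors commonly reported for depth estimation (RMSE, mean absolute relative error, mean log10 error and the threshold accuracies delta < 1.25^k) can be computed as in the sketch below. Clipping predictions to max_depth mirrors the --max_depth 10.0 flag in a user's test command further down, but the masking and clipping rules here are assumptions; test.py defines the exact protocol.

```python
import numpy as np


def depth_metrics(pred: np.ndarray, gt: np.ndarray, max_depth: float = 10.0) -> dict:
    """Standard depth-estimation errors over valid ground-truth pixels.

    Pixels with gt <= 0 or gt > max_depth are treated as invalid (an
    assumption). Predictions are clipped to [1e-3, max_depth] so that the
    log and ratio terms stay finite.
    """
    valid = (gt > 0) & (gt <= max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    p = np.clip(pred[valid], 1e-3, max_depth)
    g = gt[valid]
    ratio = np.maximum(p / g, g / p)  # symmetric ratio for the delta metrics
    return {
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "log10": float(np.mean(np.abs(np.log10(p) - np.log10(g)))),
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }


if __name__ == "__main__":
    gt = np.full((128, 128), 2.0)
    # A prediction that overestimates every pixel by 10 percent.
    print(depth_metrics(gt * 1.1, gt))
```

Averaging per image and then over the test set, or pooling all pixels at once, gives slightly different numbers, which is one thing worth checking when results do not reproduce.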

License and Citation

This software is released under the MIT License.

@inproceedings{parida2021beyond,
  title={Beyond Image to Depth: Improving Depth Prediction using Echoes},
  author={Parida, Kranti and Srivastava, Siddharth and Sharma, Gaurav},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2021}
}

Acknowledgement

Some portions of the code are adapted from code by Ruohan Gao. Thanks, Ruohan!

Comments

I have not been able to reproduce your results.
Thank you for your excellent work. There is no training code in the Git project, only the test code, so I wrote a trainer myself. However, I have not been able to reproduce your results. I used my own training checkpoint during testing, with the following command:

python test.py \
--gpu_ids 0 \
--dataset mp3d \
--img_path data/echoes_obs/mp3d/scene_observations_128.pkl \
--audio_path data/echoes_navigable \
--checkpoints_dir data/models/mp3d_128 \
--audio_sampling_rate 16000 \
--max_depth 10.0

The final result is:

Can you give me some suggestions? Thanks a lot.
opened by yyf17
Code for the modal fusion

Sorry to disturb you. I could not find the modal fusion part in the code, although the paper describes modal fusion. Is the code fully public?

opened by helloful
3ms and 5ms sweep sound sources for mp3d

Thank you for your patient help and guidance. The 3 ms and 5 ms sound source signals at a sampling rate of 44100 Hz can be obtained from the VisualEchoes Git project (https://github.com/facebookresearch/VisualEchoes). How did you obtain the 3 ms and 5 ms source signals at a sampling rate of 16000 Hz?

Thanks a lot.

opened by yyf17
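One plausible route, assuming the 16 kHz sources are simply resampled versions of the 44.1 kHz VisualEchoes signals (the authors have not confirmed this), is polyphase resampling: 16000/44100 reduces to 160/441, and scipy.signal.resample_poly applies an anti-aliasing low-pass filter as part of the conversion. The sweep generated below is a hypothetical stand-in, with made-up start and end frequencies; in practice you would load the actual VisualEchoes source signal and pass it to resample_source.

```python
from math import gcd

import numpy as np
from scipy.signal import chirp, resample_poly


def resample_source(signal: np.ndarray, orig_sr: int = 44100,
                    target_sr: int = 16000) -> np.ndarray:
    """Polyphase resampling; 44100 -> 16000 Hz uses up=160, down=441."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(signal, up=target_sr // g, down=orig_sr // g)


def make_sweep(duration_s: float, sr: int,
               f0: float = 20.0, f1: float = 20000.0) -> np.ndarray:
    """Hypothetical linear sweep standing in for the real source signal."""
    t = np.arange(int(duration_s * sr)) / sr
    return chirp(t, f0=f0, t1=duration_s, f1=f1, method="linear")


if __name__ == "__main__":
    for ms in (3, 5):
        src = make_sweep(ms / 1000, 44100)
        out = resample_source(src)
        print(f"{ms} ms: {len(src)} -> {len(out)} samples")
    # 3 ms: 132 -> 48 samples
    # 5 ms: 220 -> 80 samples
```

Content above the new 8 kHz Nyquist limit is filtered out, so the 16 kHz version is not just a decimated copy of the original sweep.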
On the partitioning of the mp3d dataset
Thank you for your excellent work. I have a question: how do you split the mp3d dataset? I saw in your code that the split is defined by 3 files (see the code below), but I cannot find these files in the repository. Could you provide them? Thanks a lot.

options/base_options.py:

train_scenes_file = os.path.join(self.opt.metadatapath, 'mp3d_scenes_train.txt')
val_scenes_file = os.path.join(self.opt.metadatapath, 'mp3d_scenes_val.txt')
test_scenes_file = os.path.join(self.opt.metadatapath, 'mp3d_scenes_test.txt')
opened by yyf17
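If you recreate the split files yourself, a small loader and sanity check like the one below can catch scenes that leak across splits. It is a sketch that assumes each file lists one scene ID per line; the actual file format expected by the repository is not documented here, and the function names are hypothetical.

```python
import os
from itertools import combinations
from typing import Dict, List, Set


def load_scene_splits(metadatapath: str) -> Dict[str, List[str]]:
    """Read mp3d_scenes_{train,val,test}.txt, assuming one scene ID per line."""
    splits = {}
    for split in ("train", "val", "test"):
        path = os.path.join(metadatapath, f"mp3d_scenes_{split}.txt")
        with open(path) as f:
            # Strip whitespace and skip blank lines.
            splits[split] = [line.strip() for line in f if line.strip()]
    return splits


def overlapping_scenes(splits: Dict[str, List[str]]) -> Set[str]:
    """Scene IDs that appear in more than one split (should be empty)."""
    shared = set()
    for a, b in combinations(splits, 2):
        shared |= set(splits[a]) & set(splits[b])
    return shared
```

A non-empty result from overlapping_scenes would mean test scenes were also seen during training, which inflates reported accuracy.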
No training code

Hello, thank you for your excellent work. There is no training code in the Git project, only the test code. Can you share your training code? Thanks a lot.

opened by yyf17
When will the MatterportEchoes dataset be released?

Hello, thanks for your excellent work! Could you please tell us roughly when the MatterportEchoes dataset will be released? I would really like to follow up on your work. Thanks!

opened by qqsh0214
run.sh

For the run.sh script:

CUDA_VISIBLE_DEVICES=2 python train.py \
--validation_on \
--dataset mp3d \
--img_path /data1/kranti/audio-visual-depth/dataset/visual_echoes/images/mp3d_split_wise \
--metadatapath /data1/kranti/audio-visual-depth/dataset/visual_echoes/metadata/mp3d \
--audio_path /data1/kranti/audio-visual-depth/dataset/visual_echoes/echoes/mp3d/echoes_navigable \
--checkpoints_dir /data1/kranti/audio-visual-depth/checkpoints \
--init_material_weight ./checkpoints/material_pre_trained_minc.pth

Could you please let me know how to get the images/ and the metadata/?

opened by catherine-qian


