Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Overview

Beyond Image to Depth: Improving Depth Prediction using Echoes (CVPR 2021)

Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma.

We address the problem of estimating depth from multi-modal audio-visual data. Inspired by the ability of animals, such as bats and dolphins, to infer the distance of objects via echolocation, we propose an end-to-end deep-learning pipeline that uses RGB images, binaural echoes, and estimated material properties of the objects within a scene for the task of depth estimation.

[Project] [Paper]

[teaser figure]
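
For intuition, the following is a minimal PyTorch sketch of the attention-weighted fusion idea: a per-pixel attention map (in the paper, informed by the estimated material properties) blends the depth predicted from the RGB image with the depth predicted from the echoes. The tensor shapes and names here are illustrative assumptions, not the exact architecture of this repository.

import torch

# Illustrative B x 1 x H x W depth maps from the two branches.
depth_rgb = torch.rand(4, 1, 128, 128)   # depth predicted from the RGB image
depth_echo = torch.rand(4, 1, 128, 128)  # depth predicted from binaural echoes

# In the paper the attention map is predicted from material/visual features;
# a random map in [0, 1] stands in for it here.
attention = torch.sigmoid(torch.randn(4, 1, 128, 128))

# Per-pixel convex combination of the two depth estimates.
depth_fused = attention * depth_rgb + (1.0 - attention) * depth_echo
print(depth_fused.shape)  # torch.Size([4, 1, 128, 128])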

Requirements

The code is tested with:

- Python 3.6 
- PyTorch 1.6.0
- Numpy 1.19.5

Dataset

Replica-VisualEchoes can be obtained from here. We use the 128x128 image resolution for our experiments.

MatterportEchoes is an extension of the existing Matterport3D dataset. To obtain the raw frames, please forward your access-request acceptance from the authors of the Matterport3D dataset. We will release the procedure for obtaining the frames and echoes using habitat-sim and SoundSpaces in the near future.
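
Once the echoes are available, a typical preprocessing step in echo-based pipelines is to load the binaural waveform and convert each channel into a spectrogram before feeding it to the audio network. Below is a minimal sketch assuming 16 kHz binaural WAV files; the file path and STFT parameters are placeholder assumptions, not this repository's exact settings.

import torch
import torchaudio

# Hypothetical path; the actual dataset layout may differ.
waveform, sr = torchaudio.load("echoes/sample_echo.wav")  # shape: (2, num_samples)
assert sr == 16000  # sampling rate assumed to be 16 kHz

# Magnitude spectrogram per channel; n_fft and hop_length are illustrative.
spec = torch.stft(waveform, n_fft=512, hop_length=160,
                  window=torch.hann_window(512), return_complex=True)
mag = spec.abs()  # shape: (2, freq_bins, time_frames), one spectrogram per ear
print(mag.shape)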

Pre-trained Model

We provide pre-trained models for both datasets here. For each dataset, four parts of the model are saved individually, named rgbdepth_*, audiodepth_*, material_* and attention_*, where * is the dataset name, i.e. replica or mp3d.
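
As a rough guide, the four parts could be loaded with plain PyTorch as sketched below; the naming follows the convention above, but the .pth extension and the checkpoints/ directory are assumptions, and the actual sub-network classes live in this repository's model code.

import torch

dataset = "replica"  # or "mp3d"
parts = ["rgbdepth", "audiodepth", "material", "attention"]

# Load each saved part; map_location keeps this runnable on CPU-only machines.
state_dicts = {
    part: torch.load(f"checkpoints/{part}_{dataset}.pth", map_location="cpu")
    for part in parts
}

# Each state dict can then be loaded into the corresponding sub-network, e.g.
# model.rgbdepth.load_state_dict(state_dicts["rgbdepth"]).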

Training

To train the model, first download the pre-trained material net from the link above.

python train.py \
--validation_on \
--dataset mp3d \
--img_path path_to_img_folder \
--metadatapath path_to_metadata \
--audio_path path_to_audio_folder \
--checkpoints_dir path_to_save_checkpoints \
--init_material_weight path_to_pre-trained_material_net

Evaluation

To evaluate the method using the pre-trained models, download the models for the corresponding dataset along with the dataset itself.

  • Evaluation for the Replica dataset
python test.py \
--img_path path_to_img_folder \
--audio_path path_to_audio_data \
--checkpoints_dir path_to_the_pretrained_model \
--dataset replica
  • Evaluation for the Matterport3D dataset
python test.py \
--img_path path_to_img_folder \
--audio_path path_to_audio_data \
--checkpoints_dir path_to_the_pretrained_model \
--dataset mp3d
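
Depth-estimation papers, including this one, are typically evaluated with metrics such as RMSE, absolute relative error, and δ-threshold accuracy. The following is a minimal NumPy sketch of these standard definitions, computed over valid (non-zero) ground-truth pixels; it illustrates the conventional formulas, not necessarily the exact evaluation code in test.py.

import numpy as np

def depth_metrics(pred, gt):
    # Evaluate only pixels with valid ground-truth depth.
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))  # root mean squared error
    abs_rel = np.mean(np.abs(pred - gt) / gt)  # absolute relative error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)             # accuracy within 25% of gt
    return {"RMSE": rmse, "ABS_REL": abs_rel, "DELTA1": delta1}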

License and Citation

This software is released under the MIT License.

@inproceedings{parida2021beyond,
  title={Beyond Image to Depth: Improving Depth Prediction using Echoes},
  author={Parida, Kranti and Srivastava, Siddharth and Sharma, Gaurav},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Acknowledgement

Some portions of the code are adapted from Ruohan Gao. Thanks, Ruohan!

Comments
  • I have not been able to reproduce your results.

    Thank you for your excellent work. There is no training code in the Git project, only the test code, so I wrote a trainer myself, but I have not been able to reproduce your results. I used my own training checkpoint during the test. The test command used is:

    python test.py \
    --gpu_ids 0 \
    --dataset mp3d \
    --img_path data/echoes_obs/mp3d/scene_observations_128.pkl \
    --audio_path data/echoes_navigable \
    --checkpoints_dir data/models/mp3d_128 \
    --audio_sampling_rate 16000 \
    --max_depth 10.0
    

    The final result is attached as a screenshot.

    Can you give me some suggestions? Thanks a lot.

    opened by yyf17 7
  • The code of modal fusion

    I am very sorry to disturb you. I could not find the modal fusion part in the code, but there is modal fusion in the paper. Is your code completely public?

    opened by helloful 2
  • 3ms and 5ms sweep sound sources for mp3d

    Thank you for your patient help and guidance. The 3 ms and 5 ms sound-source signals with a sampling frequency of 44100 Hz can be obtained from the Git project VisualEchoes (https://github.com/facebookresearch/VisualEchoes). How did you get the 3 ms and 5 ms sweep signals with a sampling frequency of 16000 Hz?

    Thanks a lot.

    opened by yyf17 2
  • on the partition problem of dataset mp3d

    Thank you for your excellent and hard work. I have a question: how do you split the mp3d dataset? I saw in your code that the split is read from 3 files (see the code below), but I can't find these 3 files in the repository. Can you provide them? Thanks a lot.

    "options/base_options.py"

        train_scenes_file = os.path.join(self.opt.metadatapath, 'mp3d_scenes_train.txt')
        val_scenes_file = os.path.join(self.opt.metadatapath, 'mp3d_scenes_val.txt')
        test_scenes_file = os.path.join(self.opt.metadatapath, 'mp3d_scenes_test.txt')
    
    opened by yyf17 2
  • No training code

    Hello, thank you for your excellent work. There is no training code in the Git project, only the test code. Can you share your training code? Thanks a lot.

    opened by yyf17 2
  • How long will the MatterportEchoes dataset be released?

    Hello, thanks for your excellent and hard work! Could you please tell us the approximate time when the MatterportEchoes dataset will be released? I really want to follow your work. Thanks!

    opened by qqsh0214 1
  • run.sh

    For the run.sh script:

    CUDA_VISIBLE_DEVICES=2 python train.py \
    --validation_on \
    --dataset mp3d \
    --img_path /data1/kranti/audio-visual-depth/dataset/visual_echoes/images/mp3d_split_wise \
    --metadatapath /data1/kranti/audio-visual-depth/dataset/visual_echoes/metadata/mp3d \
    --audio_path /data1/kranti/audio-visual-depth/dataset/visual_echoes/echoes/mp3d/echoes_navigable \
    --checkpoints_dir /data1/kranti/audio-visual-depth/checkpoints \
    --init_material_weight ./checkpoints/material_pre_trained_minc.pth

    Could you please let me know how to obtain the images/ and metadata/ directories?

    opened by catherine-qian 2