PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Overview

Unsupervised Depth Completion with Calibrated Backprojection Layers

PyTorch implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers

Published in ICCV 2021 (ORAL)

[publication] [arxiv] [poster] [talk]

The models have been tested on Ubuntu 16.04 and 20.04 using Python 3.5, 3.6, 3.7 and PyTorch 1.2, 1.3

Authors: Alex Wong

If this work is useful to you, please cite our paper:

@inproceedings{wong2021unsupervised,
  title={Unsupervised Depth Completion with Calibrated Backprojection Layers},
  author={Wong, Alex and Soatto, Stefano},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={12747--12756},
  year={2021}
}

Table of Contents

  1. About sparse to dense depth completion
  2. About Calibrated Backprojection Network
  3. Setting up
  4. Downloading pretrained models
  5. Running KBNet
  6. Training KBNet
  7. Related projects
  8. License and disclaimer

About sparse-to-dense depth completion

Given a sparse point cloud and an image, the goal is to infer the dense point cloud. The sparse point cloud can be obtained either from computational methods such as SfM (Structure-from-Motion) or from active sensors such as lidar or structured-light sensors. Commonly, it is projected onto the image plane as a sparse depth map or 2.5D representation, in which case methods in this domain predict a dense depth map. Here are some examples of dense point clouds output by our method:

[Figure: Image | Sparse Point Cloud | Output Point Cloud]
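To make the 2.5D representation concrete, below is a minimal sketch (not taken from this repository) of projecting a point cloud in the camera frame onto the image plane as a sparse depth map; the function name, array shapes, and the 3x3 intrinsics K are illustrative assumptions.

import numpy as np

def project_to_sparse_depth(points_xyz, K, height, width):
    # points_xyz : (N, 3) array of 3D points in the camera frame
    # K : 3x3 camera intrinsics; height, width : output image size
    depth_map = np.zeros((height, width), dtype=np.float32)

    # Keep only points in front of the camera
    front = points_xyz[:, 2] > 0
    x, y, z = points_xyz[front, 0], points_xyz[front, 1], points_xyz[front, 2]

    # Central perspective projection: u = fx * x / z + cx, v = fy * y / z + cy
    u = np.round(K[0, 0] * x / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * y / z + K[1, 2]).astype(int)

    # Keep only projections that land inside the image
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth_map[v[valid], u[valid]] = z[valid]

    return depth_map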

To follow the literature and benchmarks for this task, you may visit: Awesome State of Depth Completion

About Calibrated Backprojection Network

The motivation:

(1) In the scenes above of the copyroom and the outdoor bench, the point cloud produced by XIVO is on the order of hundreds of points. When projected onto the image plane as a 2.5D range map, the sparse points cover only 0.05% of the image space, so typically only a single measurement, and in most cases none, is present within a local neighborhood. This not only hinders learning, since conventional convolutions become ineffective and produce mostly zero activations, but also increases the sensitivity of the model to variations in the range sensor and feature detector used to produce the point cloud.

(2) Typically the same sensor platform is used to collect the training set, so the model tends to overfit to the sensor setup. This is exacerbated in the unsupervised learning paradigm which leverages a photometric reconstruction loss as a supervisory signal. Because image reconstruction requires reprojection from one frame to another, this implicitly bakes in the intrinsic camera calibration parameters and limits generalization.

Our solution:

(1) To address the sparsity problem, we propose to project the point cloud onto the image plane as a sparse range map and to learn a dense or quasi-dense representation via a sparse-to-dense pooling (S2D) module. S2D performs min and max pooling with various kernel sizes to densify the input and capture the scene structure at multiple scales, as in the figure below.

There exist trade-offs between detail and density (the denser, the less detail) and between the preservation of near and far structures (min pooling biases structures close to the camera, max pooling biases structures far from the camera). These trade-offs are learned by three 1-by-1 convolutional layers, and the resulting multi-scale depth features are fused back into the original sparse depth map to yield a dense or quasi-dense representation.
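A simplified PyTorch sketch of this idea is shown below. It is not the repository's implementation; the pool sizes, channel counts, and class name are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseToDensePoolSketch(nn.Module):
    # Illustrative stand-in for the S2D module: multi-scale min/max pooling
    # of the sparse depth map, fused by 1x1 convolutions.
    def __init__(self, pool_sizes=(3, 5, 7), n_filter=8):
        super().__init__()
        self.pool_sizes = pool_sizes
        n_branches = 2 * len(pool_sizes)  # one min and one max branch per kernel size
        # Three 1x1 convolutions learn the trade-offs between the pooled branches
        self.fuse = nn.Sequential(
            nn.Conv2d(n_branches, n_filter, kernel_size=1), nn.LeakyReLU(0.2),
            nn.Conv2d(n_filter, n_filter, kernel_size=1), nn.LeakyReLU(0.2))
        self.project = nn.Conv2d(n_filter + 1, n_filter, kernel_size=1)

    def forward(self, sparse_depth):
        # sparse_depth : (B, 1, H, W), zero where no measurement is available
        branches = []
        for k in self.pool_sizes:
            pad = k // 2
            # Max pooling favors structures far from the camera (larger depths win)
            branches.append(F.max_pool2d(sparse_depth, k, stride=1, padding=pad))
            # Min pooling over the valid (nonzero) depths favors near structures;
            # missing pixels are masked with -inf before pooling the negated map
            neg = torch.where(sparse_depth > 0, -sparse_depth,
                              torch.full_like(sparse_depth, float('-inf')))
            min_pooled = -F.max_pool2d(neg, k, stride=1, padding=pad)
            branches.append(torch.where(torch.isfinite(min_pooled), min_pooled,
                                        torch.zeros_like(min_pooled)))
        features = self.fuse(torch.cat(branches, dim=1))
        # Fuse the multi-scale features back with the original sparse depth
        return self.project(torch.cat([features, sparse_depth], dim=1))

For a 480-by-640 sparse depth map with, say, 1500 valid points, this sketch produces a (B, 8, 480, 640) quasi-dense feature map that downstream convolutions can operate on.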

(2) To address the generalization problem, we propose to take the image, the projected sparse point cloud, and the calibration matrix as input. We introduce a calibrated backprojection (KB) layer that maps the camera intrinsics, the input image, and the imputed depth onto the 3D scene in a canonical frame of reference. This can be thought of as a form of spatial Euclidean positional encoding of the image.

Calibration, therefore, can be changed depending on the camera used, allowing us to use different calibrations at training and test time, which significantly improves generalization.
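The geometric operation that a KB layer builds on can be sketched as follows: each pixel (u, v) with imputed depth z is lifted to the 3D point z * K^-1 [u, v, 1]^T in the camera frame, giving a 3-channel, calibration-aware positional encoding of the image. This is an illustration of the geometry only, not the repository's KBNet code; the function name and tensor shapes are assumptions.

import torch

def backproject_to_3d(depth, K):
    # depth : (B, 1, H, W) imputed depth; K : (B, 3, 3) camera intrinsics
    B, _, H, W = depth.shape
    device, dtype = depth.device, depth.dtype

    # Homogeneous pixel coordinates [u, v, 1] for every pixel, shape (B, 3, H*W)
    u = torch.arange(W, device=device, dtype=dtype).view(1, 1, 1, W).expand(B, 1, H, W)
    v = torch.arange(H, device=device, dtype=dtype).view(1, 1, H, 1).expand(B, 1, H, W)
    ones = torch.ones_like(u)
    pixels = torch.cat([u, v, ones], dim=1).view(B, 3, H * W)

    # Rays through each pixel: K^-1 [u, v, 1]^T, then scale by depth
    rays = torch.matmul(torch.inverse(K), pixels)
    points = rays.view(B, 3, H, W) * depth

    return points  # (B, 3, H, W) 3D coordinates, a spatial positional encoding

Because the calibration matrix enters only through this backprojection, the same trained weights can be paired with a different K at test time, which is what enables the cross-sensor generalization described above.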

Our network, Calibrated Backprojection Network (KBNet), goes counter to the current trend of learning everything with generic architectures like Transformers, including what we already know about basic Euclidean geometry. Our model has strong inductive bias in our KB layer, which incorporates the calibration matrix directly into the architecture to yield an RGB representation lifted into scene topology via 3D positional encoding.

Not only do these design choices improve generalization across sensor platforms; by incorporating a basic geometric image formation model, based on Euclidean transformations in 3D and central perspective projection onto 2D, we can also reduce the model size while still achieving the state of the art.

To demonstrate the effectiveness of our method, we trained a model on the VOID dataset, which is captured by an Intel RealSense, and tested it on NYU v2, which is collected with a Microsoft Kinect.

Setting up your virtual environment

We will create a virtual environment with the necessary dependencies

virtualenv -p /usr/bin/python3.7 kbnet-py37env
source kbnet-py37env/bin/activate
pip install opencv-python scipy scikit-learn scikit-image matplotlib gdown numpy gast Pillow pyyaml
pip install torch==1.3.0 torchvision==0.4.1 tensorboard==2.3.0

Setting up your datasets

For datasets, we will use KITTI for outdoors and VOID for indoors. We will also use NYUv2 to demonstrate our generalization capabilities.

mkdir data
ln -s /path/to/kitti_raw_data data/
ln -s /path/to/kitti_depth_completion data/
ln -s /path/to/void_release data/
ln -s /path/to/nyu_v2 data/

In case you do not already have KITTI and VOID datasets downloaded, we provide download scripts for them:

bash bash/setup_dataset_kitti.sh
bash bash/setup_dataset_void.sh

The bash/setup_dataset_void.sh script downloads the VOID dataset using gdown. However, gdown intermittently fails. As a workaround, you may download them via:

https://drive.google.com/open?id=1GGov8MaBKCEcJEXxY8qrh8Ldt2mErtWs
https://drive.google.com/open?id=1c3PxnOE0N8tgkvTgPbnUZXS6ekv7pd80
https://drive.google.com/open?id=14PdJggr2PVJ6uArm9IWlhSHO2y3Q658v

which will give you three files void_150.zip, void_500.zip, void_1500.zip.

Assuming you are in the root of the repository, run the following to construct the same dataset structure as the setup script above:

mkdir void_release
unzip -o void_150.zip -d void_release/
unzip -o void_500.zip -d void_release/
unzip -o void_1500.zip -d void_release/
bash bash/setup_dataset_void.sh unpack-only

For more detailed instructions on downloading and using VOID and obtaining the raw rosbags, you may visit the VOID dataset webpage.

Downloading our pretrained models

To use our pretrained models trained on KITTI and VOID, you can download them from Google Drive:

gdown https://drive.google.com/uc?id=1C2RHo6E_Q8TzXN_h-GjrojJk4FYzQfRT
unzip pretrained_models.zip

Note: gdown fails intermittently and complains about permission. If that happens, you may also download the models via:

https://drive.google.com/file/d/1C2RHo6E_Q8TzXN_h-GjrojJk4FYzQfRT/view?usp=sharing

Once you unzip the file, you will find a directory called pretrained_models containing the following file structure:

pretrained_models
|---- kitti
      |---- kbnet-kitti.pth
      |---- posenet-kitti.pth
|---- void
      |---- kbnet-void1500.pth
      |---- posenet-void1500.pth

We also provide our PoseNet models that were trained jointly with our Calibrated Backprojection Network (KBNet) so that you may fine-tune from them without having to relearn pose from scratch.
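If you want to inspect or restore the weights outside of the provided scripts, a minimal sketch with plain PyTorch follows. Whether each .pth file is a raw state_dict or a wrapped checkpoint dictionary is an assumption to verify; the run scripts (which take the files via arguments such as --depth_model_restore_path) remain the supported way to load them.

import torch

# Load the downloaded checkpoint on CPU to inspect its contents
checkpoint = torch.load('pretrained_models/void/kbnet-void1500.pth', map_location='cpu')

# The file may be a raw state_dict or a dictionary wrapping one under a key
# such as 'model_state_dict' (an assumption; check the printed keys)
if isinstance(checkpoint, dict) and 'model_state_dict' in checkpoint:
    state_dict = checkpoint['model_state_dict']
else:
    state_dict = checkpoint

print(list(state_dict.keys())[:5])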

The pretrained weights should reproduce the numbers we reported in our paper. The tables below give the comprehensive numbers:

For KITTI:

Evaluation set       MAE     RMSE    iMAE    iRMSE
Validation        260.44  1126.85    1.03     3.20
Testing (online)  256.76  1069.47    1.02     2.95

For VOID:

Evaluation set               MAE     RMSE    iMAE    iRMSE
VOID 1500 (0.5% density)   39.80    95.86   21.16    49.72
VOID 500 (0.15% density)   77.70   172.49   38.87    85.59
VOID 150 (0.05% density)  131.54   263.54   66.84   128.29
NYU v2 (generalization)   117.18   218.67   23.01    47.96
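For reference, these metrics can be computed as in the sketch below. This is not the repository's evaluation code; it assumes depth maps in meters, MAE/RMSE reported in millimeters, iMAE/iRMSE in 1/km, and evaluation only at pixels with valid ground truth.

import numpy as np

def depth_completion_metrics(pred, gt):
    # pred, gt : dense predicted and ground-truth depth maps in meters
    valid = gt > 0                       # evaluate only where ground truth exists
    pred, gt = pred[valid], gt[valid]

    mae   = np.mean(np.abs(pred - gt)) * 1000.0                       # mm
    rmse  = np.sqrt(np.mean((pred - gt) ** 2)) * 1000.0               # mm
    imae  = np.mean(np.abs(1.0 / pred - 1.0 / gt)) * 1000.0           # 1/km
    irmse = np.sqrt(np.mean((1.0 / pred - 1.0 / gt) ** 2)) * 1000.0   # 1/km
    return mae, rmse, imae, irmse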

Running KBNet

To run our pretrained model on the KITTI validation set, you may use

bash bash/kitti/run_kbnet_kitti_validation.sh

Our run scripts will log all of the hyper-parameters used, as well as the evaluation scores, to the location specified by the output_path argument. The expected output should be:

Evaluation results:
     MAE      RMSE      iMAE     iRMSE
 260.447  1126.855     1.035     3.203
     +/-       +/-       +/-       +/-
  92.735   398.888     0.285     1.915
Total time: 13187.93 ms  Average time per sample: 15.19 ms

Our model runs fairly fast; the number reported in the paper is 16 ms per KITTI image on an Nvidia 1080Ti GPU. The timing above is just slightly faster than the reported number.

To run our pretrained model on the KITTI test set, you may use

bash bash/kitti/run_kbnet_kitti_testing.sh

To get our numbers, you will need to submit the outputs to the KITTI online benchmark.

To run our pretrained model on the VOID 1500 test set of 0.5% density, you may use

bash bash/void/run_kbnet_void1500.sh

You should expect the output:

Evaluation results:
     MAE      RMSE      iMAE     iRMSE
  39.803    95.864    21.161    49.723
     +/-       +/-       +/-       +/-
  27.521    67.776    24.340    62.204
Total time: 10399.33 ms  Average time per sample: 13.00 ms

We note that for all of the following experiments, we use our model trained on the denser (VOID 1500) data and test it at various density levels.

Similar to the above, for the VOID 500 (0.15%) test set, you can run:

bash bash/void/run_kbnet_void500.sh

and the VOID 150 (0.05%) test set:

bash bash/void/run_kbnet_void150.sh

To use our model trained on VOID and test it on NYU v2:

bash bash/void/run_kbnet_nyu_v2.sh

Training KBNet

To train KBNet on the KITTI dataset, you may run

bash bash/kitti/train_kbnet_vkitti.sh

To train KBNet on the VOID dataset, you may run

bash bash/void/train_kbnet_void1500.sh

Note that while we do not train on VOID 500 or 150 (hence no hyper-parameters are provided), if you are interested you may modify the training paths to train on VOID 500:

--train_image_path training/void/void_train_image_500.txt \
--train_sparse_depth_path training/void/void_train_sparse_depth_500.txt \
--train_intrinsics_path training/void/void_train_intrinsics_500.txt \

and on VOID 150:

--train_image_path training/void/void_train_image_150.txt \
--train_sparse_depth_path training/void/void_train_sparse_depth_150.txt \
--train_intrinsics_path training/void/void_train_intrinsics_150.txt \

To monitor your training progress, you may use Tensorboard

tensorboard --logdir trained_kbnet/kitti/kbnet_model
tensorboard --logdir trained_kbnet/void1500/kbnet_model

Related projects

You may also find the following projects useful:

  • ScaffNet: Learning Topology from Synthetic Data for Unsupervised Depth Completion. An unsupervised sparse-to-dense depth completion method that first learns a map from sparse geometry to an initial dense topology from synthetic data (where ground truth comes for free) and amends the initial estimation by validating against the image. This work is published in the Robotics and Automation Letters (RA-L) 2021 and the International Conference on Robotics and Automation (ICRA) 2021.
  • AdaFrame: An Adaptive Framework for Learning Unsupervised Depth Completion. An adaptive framework for learning unsupervised sparse-to-dense depth completion that balances data fidelity and regularization objectives based on model performance on the data. This work is published in the Robotics and Automation Letters (RA-L) 2021 and the International Conference on Robotics and Automation (ICRA) 2021.
  • VOICED: Unsupervised Depth Completion from Visual Inertial Odometry. An unsupervised sparse-to-dense depth completion method, developed by the authors. The paper introduces Scaffolding for depth completion and a light-weight network to refine it. This work is published in the Robotics and Automation Letters (RA-L) 2020 and the International Conference on Robotics and Automation (ICRA) 2020.
  • VOID: the dataset from Unsupervised Depth Completion from Visual Inertial Odometry. A dataset, developed by the authors, containing indoor and outdoor scenes with non-trivial 6 degrees of freedom camera motion. The dataset is published along with this work in the Robotics and Automation Letters (RA-L) 2020 and the International Conference on Robotics and Automation (ICRA) 2020.
  • XIVO: The Visual-Inertial Odometry system developed at UCLA Vision Lab. This work is built on top of XIVO. The VOID dataset used by this work also leverages XIVO to obtain sparse points and camera poses.
  • GeoSup: Geo-Supervised Visual Depth Prediction. A single image depth prediction method developed by the authors, published in the Robotics and Automation Letters (RA-L) 2019 and the International Conference on Robotics and Automation (ICRA) 2019. This work was awarded Best Paper in Robot Vision at ICRA 2019.
  • AdaReg: Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction. A single image depth prediction method that introduces adaptive regularization. This work was published in the proceedings of Conference on Computer Vision and Pattern Recognition (CVPR) 2019.

We also have works on adversarial attacks against depth estimation methods and on medical image segmentation:

  • Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations. Adversarial perturbations for stereo depth estimation, published in the Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 2021.
  • Targeted Attacks for Monodepth: Targeted Adversarial Perturbations for Monocular Depth Prediction. Targeted adversarial perturbations for monocular depth estimation, published in the proceedings of Neural Information Processing Systems (NeurIPS) 2020.
  • SPiN: Small Lesion Segmentation in Brain MRIs with Subpixel Embedding. A subpixel architecture for segmenting ischemic stroke brain lesions in MRI images, published in the Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI) Brain Lesion Workshop 2021 as an oral paper.

License and disclaimer

This software is property of the UC Regents, and is provided free of charge for research purposes only. It comes with no warranties, expressed or implied, according to these terms and conditions. For commercial use, please contact UCLA TDG.

Comments
  • Question on coordinate frames for pose data

    Hello, [Photo_loss figure]

    In the above, the relative pose g(tau)(t) belonging to SE(3), refers to the transformation from the world frame to the camera frame right? That is, the pose is wrt the camera frame.

    opened by rakshith95 10
  • uneven GPU memory caused by multi-gpu training

    Hi, Alex, Thanks for your nice work. I'm facing the problem of uneven GPU memories when training the model with multiple GPUs. It costs much more memory on GPU#0 than others. I think the main reason is that DataParallel can only compute losses on GPU#0. Would you give some advice to balance the GPU memory? Thanks in advance.

    opened by lqzhao 9
  • Questionable depth map on VOID ground truth + inference

    Hey there! Thank you for the work! I tried it out on my own sparse depth map + rgb image, and it didn't perform too well at all. I also visualized some ground truth data from the VOID dataset, and found that some pointclouds look similarly bad. The copyroom folder looks fine, but the first image from birthplace_of_internet already looks bad. I can understand my own dataset could be problematic concerning sparse depth map resolution, but after looking at some ground truth data, I'm wondering if the problem lies elsewhere.

    Any idea why my custom dataset would look like this? The "bad" ground truth from VOID still looks better than my results. Here is a link to the visualization of a VOID and a custom point cloud.

    I ran kbnet on both a Python 3.7 venv with the given dependency versions and a 3.9 venv with newer library versions.

    I visualized everything using Open3D:

        import cv2
        import numpy as np
        import open3d as o3d

        image_opencv = cv2.imread(image_file)  # shape is (height, width, 3)
        image = o3d.io.read_image(image_file)
        depth_image = o3d.io.read_image(depth_file)
        K = np.loadtxt(intrinsics_file)
        # Note: PinholeCameraIntrinsic expects (width, height, fx, fy, cx, cy)
        intrinsic = o3d.camera.PinholeCameraIntrinsic(image_opencv.shape[0], image_opencv.shape[1], K[0][0], K[1][1], K[0][2], K[1][2])
        rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(image, depth_image, convert_rgb_to_intensity=False)
        pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, intrinsic)
        o3d.visualization.draw_geometries([pcd])
    
    
    opened by DornAres 7
  • Question about RuntimeError: inverse_cuda: For batch 0: U( , ) is zero, singular U.

    Hi, Alex, thank you for your excellent work. I have some problems when running the pretrained model and training the model. I haven't changed the code, but the following errors were reported: RuntimeError: inverse_cuda: For batch 0: U( , ) is zero, singular U. (The values in parentheses are different each time I run it.) Have you met this error before, and how can I solve it? Thanks in advance.

    opened by yxx623 7
  • Could you provide the results after the setup_dataset_kitti.py, please?

    Thank you very much for providing this code! But I still have a small question: the path you provide in setup_dataset_kitti.py doesn't seem to be a directory path, so there were a lot of empty txt files after I ran the code. In line 258: sequence = sparse_depth_paths[0].split(os.sep)[5]; sequence_date = sequence[0:10]. When I debug the code, the sequence is 'data', so the sequence_date is also 'data', which I think is wrong. I'm looking forward to your answer.

    opened by Jue-Jue-511 7
  • What is the 'input_channels_depth' parameter for?

    Hello, in run_kbnet.py you have an argument 'input_channels_depth' whose default is 2; I'm not sure what this means. In networks.py, it's written that it is the "number of input channels for depth branch", but why would the depth have 2 channels?

    opened by rakshith95 5
  • about NYUv2 data

    Hi Alex, I noticed that you updated the script for downloading the NYUv2 dataset. Thanks. I downloaded the raw data from the NYUv2 official website weeks ago, but I found that the unzipped data contains many files with extensions like '.dump', '.pgm', and '.ppm', and file names like those in the attached image.

    However, the setup python file of NYUv2 seems to only accept the files with '.png' extension. https://github.com/alexklwong/calibrated-backprojection-network/blob/73e3943169b3baf0e5b60e1b3378337245a03464/setup/setup_dataset_nyu_v2.py#L246

    So my question is: did I download the correct NYUv2 data, and how can I set up the data for your code? Thanks in advance!

    opened by lqzhao 5
  • Question about CUDA 11.0 and PyTorch 1.7.0

    Thank you for your excellent work.

    I would like to ask whether you used CUDA 11.0 when testing on Ubuntu 20.04. When I train the network with CUDA 11.0 + PyTorch 1.7 on an RTX 3090, the loss does not decrease normally. I cannot find the reason. Could you help me?

    opened by zhangguanghui1 5
  • camera intrinsic matrix

    Hello, I have a question about the values in "K.txt"

    In the original VOID dataset, the intrinsic parameters provided here are:

    "f_x": 514.638,
    "f_y": 518.858,
    "c_x": 315.267,
    "c_y": 247.358,
    

    However, in the "K.txt":

    5.471833965147203571e+02 0.000000000000000000e+00 3.176305425559989430e+02
    0.000000000000000000e+00 5.565094509450176474e+02 2.524727249693490592e+02
    0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00

    where, as far as I know, K =

    f_x   0    c_x
    0     f_y  c_y
    0     0    1

    They are somewhat different.

    Q1. Is the camera's distortion model (radtan) already applied in "K.txt"?

    Q2. The second question is: why are the intrinsic parameters different across the different sequences? Did you use a different sensor setup in each sequence? (In your paper, it is written that a D435i was used for data acquisition.) If so, which intrinsics should be used for real usage, like VIO?

    Many thanks in advance

    opened by zinuok 5
  • Role of 'datasets.load_image_triplet' in validation

    Hello, when image validation is run, the dataloader for inference in line 764 of kbnet.py is initialized as:

        dataloader = torch.utils.data.DataLoader(
            datasets.KBNetInferenceDataset(
                image_paths=image_paths,
                sparse_depth_paths=sparse_depth_paths,
                intrinsics_paths=intrinsics_paths),
            batch_size=1,
            shuffle=False,
            num_workers=1,
            drop_last=False) 
    

    in which case the KBNetInferenceDataset class is initialized with the default use_image_triplet=True, and tries to fetch and split triplet of images. I understand its function in the training, but why is it so in the validation?

    opened by rakshith95 5
  • Are the absolute poses in the void dataset used in training?

    Hello, Though the absolute poses are available for each frame in the VOID dataset, it looks like PoseNet is used for getting the poses between cameras. Is there a particular reason for this?

    opened by rakshith95 4
  • NYU performance

    Hi Alex, thanks for your great work. I'm reproducing the NYU v2 (generalization) experiments. I followed the instructions you provided to prepare the NYU data. When I used your pre-trained model kbnet-void1500.pth to evaluate on NYU v2, I got these errors:

    (kbnet) zlq@ivg-SYS-7048GR-TR:/home/disk2/code/calibrated-backprojection-network$ bash bash/void/run_knet_nyu_v2_test.sh
    usage: run_kbnet.py [-h] --image_path IMAGE_PATH --sparse_depth_path SPARSE_DEPTH_PATH --intrinsics_path INTRINSICS_PATH [--ground_truth_path GROUND_TRUTH_PATH] [--input_channels_image INPUT_CHANNELS_IMAGE] [--input_channels_depth INPUT_CHANNELS_DEPTH] [--normalized_image_range NORMALIZED_IMAGE_RANGE [NORMALIZED_IMAGE_RANGE ...]] [--outlier_removal_kernel_size OUTLIER_REMOVAL_KERNEL_SIZE] [--outlier_removal_threshold OUTLIER_REMOVAL_THRESHOLD] [--min_pool_sizes_sparse_to_dense_pool MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL [MIN_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]] [--max_pool_sizes_sparse_to_dense_pool MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL [MAX_POOL_SIZES_SPARSE_TO_DENSE_POOL ...]] [--n_convolution_sparse_to_dense_pool N_CONVOLUTION_SPARSE_TO_DENSE_POOL] [--n_filter_sparse_to_dense_pool N_FILTER_SPARSE_TO_DENSE_POOL] [--n_filters_encoder_image N_FILTERS_ENCODER_IMAGE [N_FILTERS_ENCODER_IMAGE ...]] [--n_filters_encoder_depth N_FILTERS_ENCODER_DEPTH [N_FILTERS_ENCODER_DEPTH ...]] [--resolutions_backprojection RESOLUTIONS_BACKPROJECTION [RESOLUTIONS_BACKPROJECTION ...]] [--n_filters_decoder N_FILTERS_DECODER [N_FILTERS_DECODER ...]] [--deconv_type DECONV_TYPE] [--min_predict_depth MIN_PREDICT_DEPTH] [--max_predict_depth MAX_PREDICT_DEPTH] [--weight_initializer WEIGHT_INITIALIZER] [--activation_func ACTIVATION_FUNC] [--min_evaluate_depth MIN_EVALUATE_DEPTH] [--max_evaluate_depth MAX_EVALUATE_DEPTH] [--output_path OUTPUT_PATH] [--save_outputs] [--keep_input_filenames] [--depth_model_restore_path DEPTH_MODEL_RESTORE_PATH] [--device DEVICE]
    run_kbnet.py: error: unrecognized arguments: --avg_pool_sizes_sparse_to_dense_pool 0 --encoder_type knet_v1 fusion_conv_previous sparse_to_dense_pool_v1 --input_type sparse_depth validity_map 3 3 3 0 --n_resolutions_encoder_intrinsics 0 1 2 3 --skip_types image depth --decoder_type multi-scale --output_kernel_size 3 --outlier_removal_method remove

    So I deleted the unrecognized arguments and ran it again; this time I got these numbers:

    Evaluation results:
         MAE      RMSE      iMAE     iRMSE
     122.836   228.426    24.147    50.003
         +/-       +/-       +/-       +/-
      71.550   130.133    16.920    36.531
    

    I know the numbers are close to the results reported in this repo, but I think I might be able to reproduce your reported results exactly if the unrecognized arguments were taken into account. My question is: how can I reproduce results that are closer to your reported results? Is there something wrong with my setup?

    My environment is as follows:
    torch                  1.3.0
    torchvision            0.4.1
    Python 3.7.12
    CUDA 10.2
    

    Thank you in advance.

    opened by lqzhao 5
  • Train with Poses

    Changes made to enable training with pose data from visual odometry (available in the VOID dataset). The option is enabled with the command line parameter --train_pose_paths.

    The pose paths and pose triplets have to be generated in the same way as the image paths. These changes are not pushed here since there were multiple other changes to those files that were required for another application.

    opened by rakshith95 1