ICRA 2021 "Towards Precise and Efficient Image Guided Depth Completion"

Overview

PENet: Precise and Efficient Depth Completion

This repo is the PyTorch implementation of our ICRA 2021 paper "Towards Precise and Efficient Image Guided Depth Completion", developed by Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiaojin Gong at Zhejiang University and Huawei Shanghai.

Create a new issue for any code-related questions. Feel free to contact me directly at [email protected] for any paper-related questions.

Results

  • The proposed full model ranked 1st on the KITTI depth completion online leaderboard at the time of submission.
  • It infers much faster than most of the top-ranked methods.
  • Both ENet and PENet can be trained fully on two 11GB GPUs.
  • Our network is trained on the KITTI dataset alone, with no pretraining on Cityscapes or any similar driving dataset (synthetic or real).

Method

A Strong Two-branch Backbone

Revisiting the popular two-branch architecture

The two-branch backbone is designed to thoroughly exploit color-dominant and depth-dominant information from their respective branches and to fuse the two modalities effectively. Note that it is the depth prediction produced by the color-dominant branch, not a guidance map like those in DeepLiDAR and FusionNet, that is fed into the depth-dominant branch.
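
To make the data flow concrete, below is a minimal, hypothetical sketch of the two-branch forward pass; the function and module names (two_branch_forward, cd_branch, dd_branch) are illustrative placeholders, not the actual definitions in model.py.

import torch

def two_branch_forward(rgb, sparse_depth, cd_branch, dd_branch):
    # Color-dominant (CD) branch: RGB + sparse depth -> dense depth + confidence
    cd_depth, cd_conf = cd_branch(torch.cat((rgb, sparse_depth), dim=1))
    # Depth-dominant (DD) branch consumes the CD depth *prediction*, not a guidance map
    dd_depth, dd_conf = dd_branch(torch.cat((sparse_depth, cd_depth), dim=1))
    # Confidence-weighted fusion of the two predictions (illustrative normalization)
    weights = torch.softmax(torch.cat((cd_conf, dd_conf), dim=1), dim=1)
    return weights[:, :1] * cd_depth + weights[:, 1:] * dd_depth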

Geometric Convolutional Layer

To encode 3D geometric information, the geometric convolutional layer simply augments a conventional convolutional layer by concatenating a 3D position map to the layer's input.
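
As a rough, hypothetical sketch (not the exact layer defined in basic.py), a geometric convolution can be written as a standard convolution whose input is concatenated with a 3-channel (X, Y, Z) position map:

import torch
import torch.nn as nn

class GeometricConv2d(nn.Module):
    # Sketch: a plain convolution whose input is augmented with a 3D position map.
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 3, out_channels, kernel_size, stride, padding)

    def forward(self, x, pos_xyz):
        # pos_xyz: (B, 3, H, W) coordinates back-projected from depth and camera intrinsics
        return self.conv(torch.cat((x, pos_xyz), dim=1))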

Dilated and Accelerated CSPN++

Dilated CSPN

We introduce a dilation strategy, similar to the well-known dilated convolutions, to enlarge the propagation neighborhoods.
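
A minimal sketch, assuming a k x k neighborhood gathered with torch.nn.functional.unfold, of how dilation enlarges the propagation neighborhood (the released implementation differs in detail):

import torch
import torch.nn.functional as F

def gather_dilated_neighbors(depth, kernel_size=3, dilation=2):
    # depth: (B, 1, H, W) -> (B, k*k, H, W) dilated neighbors of every pixel
    pad = dilation * (kernel_size - 1) // 2
    cols = F.unfold(depth, kernel_size, dilation=dilation, padding=pad)  # (B, k*k, H*W)
    return cols.view(depth.size(0), kernel_size * kernel_size, depth.size(2), depth.size(3))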

Accelerated CSPN

We design an implementation that makes the propagation from each neighbor truly parallel, which greatly accelerates the propagation procedure.
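
Continuing the sketch above, one propagation step can then be expressed as a single affinity-weighted sum over all neighbors at once, rather than looping over them sequentially (again illustrative, not the exact released implementation):

def cspn_step(depth, affinity, kernel_size=3, dilation=2):
    # depth:    (B, 1, H, W) current depth estimate
    # affinity: (B, k*k, H, W) normalized propagation weights (sum to 1 per pixel)
    neighbors = gather_dilated_neighbors(depth, kernel_size, dilation)  # (B, k*k, H, W)
    return (affinity * neighbors).sum(dim=1, keepdim=True)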

Contents

  1. Dependency
  2. Data
  3. Trained Models
  4. Commands
  5. Citation

Dependency

Our released implementation is tested on:

  • Ubuntu 16.04
  • Python 3.7.4 (Anaconda 2019.10)
  • PyTorch 1.3.1 / torchvision 0.4.2
  • NVIDIA CUDA 10.0.130
  • 4x NVIDIA GTX 2080 Ti GPUs
Install the additional Python packages with pip:

pip install numpy matplotlib Pillow
pip install scikit-image
pip install opencv-contrib-python==3.4.2.17

Data

  • Download the KITTI Depth Dataset and KITTI Raw Dataset from their websites. The overall data directory is structured as follows:
├── kitti_depth
|   ├── depth
|   |   ├──data_depth_annotated
|   |   |  ├── train
|   |   |  ├── val
|   |   ├── data_depth_velodyne
|   |   |  ├── train
|   |   |  ├── val
|   |   ├── data_depth_selection
|   |   |  ├── test_depth_completion_anonymous
|   |   |  |── test_depth_prediction_anonymous
|   |   |  ├── val_selection_cropped
├── kitti_raw
|   ├── 2011_09_26
|   ├── 2011_09_28
|   ├── 2011_09_29
|   ├── 2011_09_30
|   ├── 2011_10_03

Trained Models

Download our pre-trained models:

Commands

A complete list of training options is available with

python main.py -h

Training

Training Pipeline

Here we adopt a multi-stage training strategy to train the backbone, DA-CSPN++, and the full model progressively. However, end-to-end training is feasible as well.

  1. Train ENet (Part Ⅰ)
CUDA_VISIBLE_DEVICES="0,1" python main.py -b 6 -n e
# -b for batch size
# -n for network model
  2. Train DA-CSPN++ (Part Ⅱ)
CUDA_VISIBLE_DEVICES="0,1" python main.py -b 6 -f -n pe --resume [enet-checkpoint-path]
# -f for freezing the parameters in the backbone
# --resume for initializing the parameters from the checkpoint
  3. Train PENet (Part Ⅲ)
CUDA_VISIBLE_DEVICES="0,1" python main.py -b 10 -n pe -he 160 -w 576 --resume [penet-checkpoint-path]
# -he, -w for the image size after random cropping

Evaluation

CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n p --evaluate [enet-checkpoint-path]
CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n pe --evaluate [penet-checkpoint-path]
# test the trained model on the val_selection_cropped data

Test

CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n pe --evaluate [penet-checkpoint-path] --test
# generate and save results of the trained model on the test_depth_completion_anonymous data

Citation

If you use our code or method in your work, please cite the following:

@inproceedings{hu2020PENet,
	title={Towards Precise and Efficient Image Guided Depth Completion},
	author={Hu, Mu and Wang, Shuling and Li, Bin and Ning, Shiyu and Fan, Li and Gong, Xiaojin},
	booktitle={ICRA},
	year={2021}
}

Related Repositories

The original code framework is adapted from "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera", developed by Fangchang Ma, Guilherme Venturelli Cavalheiro, and Sertac Karaman at MIT.

The CoordConv part is adapted from "An intriguing failing of convolutional neural networks and the CoordConv solution".

Comments
  • Experimental results using the NYU dataset

    Hello, I am still training PENet on the NYU dataset; please take a look. The third image in this graph is the predicted result, right? I think this also proves that the network can run on the NYU dataset, is that correct? I want to know whether this network is suitable for a densely labeled dataset. Thanks, and looking forward to your reply.

    opened by yuyu19970716 11
  • Use other datasets to train PENet

    Hello, I want to use the PENet architecture for depth completion on a dataset I have at hand, so I found the HandNet dataset, which contains depth and image data collected by RealSense depth cameras. I would especially like to know which parts of the code I need to modify. Thank you in advance! Looking forward to your reply!

    opened by yuyu19970716 11
  • Questioning Inference Speed

    Good day,

    First of all, congratulations on your work and paper. The idea of separating depth-dominant and color-dominant branches is interesting. Also, thank you for releasing the source code to the public. I have been replicating your code the past few days, and so far inferencing has been straightforward (I am getting RMSE scores at around ~760).

    However, correct me if I'm wrong, but I think there might be a mistake in the inference time computation. In main.py line 213/216, this is where the predictions are generated from the ENet/PENet models, after which gpu_time is computed. I tried adding a print(pred) function call (see the image below).

    I got very different inference times with and without the print(pred) function call. I ran this on a machine with RTX 2080Ti, i7-9700k, CUDA 11.2, torch==1.3.1, torchvision==0.4.2. Below are my runtimes:

    [image] Original code: a bit faster than your official runtime, presumably due to my newer CUDA version(?)

    [image] Modified code: much slower when print(pred) was added

    My understanding is that calling pred = model(batch_data) does not yet run the model prediction; the model inference only actually runs when you call result.evaluate() in line 268 (i.e. lazy execution).

    This results in a nearly x10 increase in inference time (i.e. 151ms vs 17ms). Can you confirm that this also happens in your environment?
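
    For reference, a minimal sketch of timing GPU inference with explicit synchronization (CUDA kernels are launched asynchronously, so reading the clock without torch.cuda.synchronize() can under-report the true latency); the names here are illustrative, not taken from main.py:

    import time
    import torch

    def timed_inference(model, batch_data, warmup=5, iters=50):
        model.eval()
        with torch.no_grad():
            for _ in range(warmup):
                model(batch_data)              # warm-up launches
            torch.cuda.synchronize()           # finish warm-up work before starting the clock
            start = time.time()
            for _ in range(iters):
                pred = model(batch_data)
            torch.cuda.synchronize()           # wait for all kernels before stopping the clock
        return (time.time() - start) / iters, pred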

    opened by wdjose 7
  • the difference intrinsic parameters between train and test

    Thank you for your great work. I found that the KITTI dataset has different camera intrinsic parameters in the train and test sets, but in test mode it looks like only the train-set camera intrinsics are used. Is that a problem? And does the method work well on cameras it was not trained on? Thank you very much.

    opened by q5390498 4
  • Training with Multiple GPU's

    Hi, thanks for the code. I am using 4 GPUs to train the model and the training pass goes well. However, when it starts validation I receive the error given below.

    RuntimeError: Caught RuntimeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
        output = module(*input, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 545, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/nazir/PENet_ICRA2021/model.py", line 440, in forward
        sparsed_feature3 = self.depth_layer3(sparsed_feature2_plus, geo_s2, geo_s3) # b 64 88 304
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 545, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/nazir/PENet_ICRA2021/basic.py", line 312, in forward
        x = torch.cat((x, g1), 1)
    RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 176 and 160 in dimension 2 at /tmp/pip-req-build-akjifb_7/aten/src/THC/generic/THCTensorMath.cu:71
    

    I assume this is due to the multi-gpu training. I see that you have also used 2 GPUs for training, did you see this issue while training?

    Thanks,

    opened by danishnazir 4
  • About the training data

    Thanks for your code! I have a question about the KITTI raw data. Do I need to download all of it? I know that in monocular depth estimation, generally only a part of it needs to be downloaded. Is there a download list that I can refer to? Thanks!

    opened by yuqJin 4
  • Modify the backbone network

    Hi! I have a question for you! Recently I changed PENet's backbone from ENet to attention-unet, and I only used part of the KITTI dataset. Training felt a little slow, so I modified the learning rate when training the backbone so that the network could learn faster. Training ENet goes fine (blue is the training set, yellow is the validation set). But when I used the trained backbone for the second stage of training, the network overfit severely. I wonder if this has something to do with the learning rate? We can see that CSPN++ trains very poorly and the validation error is large. What could be the reason? Below is the learning rate I used when training ENet.

    opened by yuyu19970716 3
  • the point cloud looks a bit plausible and noisy

    PENet is efficient and the output depth image seems very nice, but the point cloud transformed from the depth inferred by PENet looks plausible and noisy. Is there something wrong with my matlab code used for transformation? Thank you. depth2cloud.zip

    opened by wmj-ustc 3
  • Projection result confirmation

    Hello,

    I'm now working on cross-modality detection tasks in 3D space. Since SFDNet uses this method for depth completion, I tried this repo as well. The following is the output of PENet and the result I obtained when projecting it back to 3D. Does it look correct? I know that the 3D positions recovered from the depth map suffer from artifacts, as you discuss in this issue: https://github.com/JUGGHM/PENet_ICRA2021/issues/3, but it looks far more severe than I expected. Thanks for any help.

    opened by Orbis36 2
  • Normalization Input Data

    Hi, and thank you very much for your great repo. As mentioned in #43 before, you did not normalize the input data. However, as far as I know, it is common practice to do so for more efficient training and to keep weights and biases small. May I ask why you decided not to normalize your data?

    opened by MarSpit 2
  • Cannot load ENet Model

    Hi,

    I am trying to load the e.pth.tar model and I am not able to do it. All I am doing is

    import torch
    torch.load('e.pth.tar', 'cuda:0')
    

    and it gives me this error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/core_uc/.local/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/home/core_uc/.local/lib/python3.6/site-packages/torch/serialization.py", line 787, in _legacy_load
        result = unpickler.load()
    ModuleNotFoundError: No module named 'metrics'
    

    Would you know what could be the issue? I did pip3 install metrics just in case, but it did not work.

    opened by duda1202 2
  • Model mismatch at inference time

    Hello, when I use my own depth map and RGB image (h, w = 900x1600) for inference and change the "--convolutional-layer-encoding" parameter to std, the model weights do not match the backbone when loading the checkpoint. I see that during training the "--convolutional-layer-encoding" parameter defaults to xyz, so should I train a model with "--convolutional-layer-encoding" set to "std" from scratch? Besides, I wonder whether changing this parameter will affect the final performance of the model.

    opened by lilkeker 0
  • some questions about the implement of CSPN

    Hi, thank you for your brilliant code, it's very helpful. I have a question about the sample and stitch steps of the PENet_C2_train model (the sample operation at line 921 and the stitch operation at line 965). Where can I read about them? I noticed that some issues (https://github.com/JUGGHM/PENet_ICRA2021/issues/16, https://github.com/JUGGHM/PENet_ICRA2021/issues/59) call it DE-CSPN++, but I could not find the paper. Is it a variant of RA-CSPN?

    opened by yudmoe 1
  • broken PNG file

    Thank you for your excellent work. I encountered the following problem during training. I checked the PNG image and it looked normal, but I still got the "broken PNG file" error below during training. Looking forward to your reply.

    Traceback (most recent call last):
      File "main.py", line 474, in <module>
        main()
      File "main.py", line 447, in main
        iterate("train", args, train_loader, model, optimizer, logger, epoch)  # train for one epoch
      File "main.py", line 191, in iterate
        for i, batch_data in enumerate(loader):
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 663, in __next__
        data = self._next_data()
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1356, in _next_data
        return self._process_data(data)
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1382, in _process_data
        data.reraise()
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/_utils.py", line 475, in reraise
        raise exception
    SyntaxError: Caught SyntaxError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/hwp/PENet/dataloaders/kitti_loader.py", line 359, in __getitem__
        rgb, sparse, target = self.__getraw__(index)
      File "/home/hwp/PENet/dataloaders/kitti_loader.py", line 351, in __getraw__
        (self.paths['rgb'][index] is not None and (self.args.use_rgb or self.args.use_g)) else None
      File "/home/hwp/PENet/dataloaders/kitti_loader.py", line 158, in rgb_read
        rgb_png = np.array(img_file, dtype='uint8')  # in the range [0,255]
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/Image.py", line 675, in __array_interface__
        new["data"] = self.tobytes()
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/Image.py", line 718, in tobytes
        self.load()
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/ImageFile.py", line 235, in load
        s = read(self.decodermaxblock)
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/PngImagePlugin.py", line 896, in load_read
        cid, pos, length = self.png.read()
      File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/PngImagePlugin.py", line 166, in read
        raise SyntaxError(f"broken PNG file (chunk {repr(cid)})")
      File "<string>", line None
    SyntaxError: broken PNG file (chunk b'\x00\x00\x00\x00')

    opened by Fululu627 1
  • How to infer PENet for KITTI object task?

    Hi @JUGGHM, I am doing 3D detection with the KITTI dataset. I have converted the LiDAR data (.bin) into sparse depth maps (.png). How can I get dense depth maps using PENet? (It seems that PENet supports only one input shape, while the KITTI 3D detection dataset has different image shapes.) Can you give me some advice? Thanks.

    opened by Senwang98 3
  • lightweight deployment of PENet network

    Hi! I am now considering lightweight deployment of the PENet network. Do you have any suggestions? The current model architecture is relatively large, so what is your opinion? Very much looking forward to your answer!

    opened by yuyu19970716 0
  • How to use the sparse depth?

    Thank you for your outstanding contribution! I want to know how the color-dominant branch combines the point cloud with the image before it is sent to the network. Does the LiDAR point cloud only keep valid points, and does the color image take the same valid points as the point cloud? How can a dense depth map be obtained this way? This question has been bothering me; I hope you can answer it. Thank you again!

    opened by Pattern6 3