ICRA 2021 "Towards Precise and Efficient Image Guided Depth Completion"

Last update: Dec 25, 2022

Overview

PENet: Precise and Efficient Depth Completion

This repo is the PyTorch implementation of our paper to appear in ICRA 2021, "Towards Precise and Efficient Image Guided Depth Completion", developed by Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xiaojin Gong at Zhejiang University and Huawei Shanghai.

Create a new issue for any code-related questions. Feel free to contact me directly at [email protected] for any paper-related questions.

Results

The proposed full model ranks 1st in the KITTI depth completion online leaderboard at the time of submission.
It infers much faster than most of the top ranked methods.

Both ENet and PENet can be trained thoroughly on two 11 GB GPUs.
Our network is trained on the KITTI dataset alone; it is not pretrained on Cityscapes or other similar driving datasets (either synthetic or real).

Method

A Strong Two-branch Backbone

Revisiting the popular two-branch architecture

The two-branch backbone is designed to thoroughly exploit color-dominant and depth-dominant information from their respective branches and make the fusion of two modalities effective. Note that it is the depth prediction result obtained from the color-dominant branch that is input to the depth-dominant branch, not a guidance map like those in DeepLiDAR and FusionNet.

Geometric Convolutional Layer

To encode 3D geometric information, the geometric convolutional layer augments a conventional convolutional layer by concatenating a 3D position map to the layer's input.
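As a minimal sketch of the idea, assuming a pinhole camera with intrinsics fx, fy, cx, cy, the position map can be obtained by back-projecting each pixel with its depth value and then appending the resulting X, Y, Z channels to the layer's input channels. The function names below are hypothetical and plain Python lists stand in for tensors; this is not the code used in this repo.

```python
def position_map(depth, fx, fy, cx, cy):
    """Back-project a depth map into X, Y, Z channels (pinhole model, assumed).

    depth is a list of rows; 0 marks a pixel without a measurement,
    which therefore maps to the origin.
    """
    height, width = len(depth), len(depth[0])
    x_map = [[0.0] * width for _ in range(height)]
    y_map = [[0.0] * width for _ in range(height)]
    for v in range(height):          # v: row index (image y)
        for u in range(width):       # u: column index (image x)
            z = depth[v][u]
            x_map[v][u] = (u - cx) * z / fx
            y_map[v][u] = (v - cy) * z / fy
    return [x_map, y_map, depth]


def concat_channels(features, geometry):
    """Channel-wise concatenation: both arguments are lists of 2D maps."""
    return features + geometry
```

A convolution applied to the concatenated channels then sees both the learned features and the 3D position of each pixel.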

Dilated and Accelerated CSPN++

Dilated CSPN

We introduce a dilation strategy, similar to the well-known dilated convolutions, to enlarge the propagation neighborhoods.
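To illustrate what enlarging the neighborhood means, the sketch below lists the neighbor offsets of a k x k propagation kernel for a given dilation rate. The helper name is hypothetical and the code only mirrors how dilated convolutions space their taps; it is not taken from this repo.

```python
def neighbor_offsets(kernel_size, dilation=1):
    """Return (dy, dx) offsets of the neighbors of a pixel, excluding the pixel itself.

    With dilation d, the taps of a k x k kernel are spaced d pixels apart,
    so the neighborhood spans a larger window with the same number of neighbors.
    """
    half = kernel_size // 2
    return [
        (dy * dilation, dx * dilation)
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
        if (dy, dx) != (0, 0)
    ]
```

For example, a 3 x 3 kernel with dilation 2 still has eight neighbors, but they lie on the border of a 5 x 5 window.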

Accelerated CSPN

We design an implementation that makes the propagation from each neighbor truly parallel, which greatly accelerates the propagation procedure.
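One way to read this is that each neighbor contributes a shifted copy of the whole depth map, so the contributions of different neighbors do not depend on each other and can be computed independently. The sketch below writes a single propagation step in that form, using the update rule of the original CSPN (center weight equal to one minus the sum of neighbor weights) as an assumption. Function names are hypothetical, plain lists stand in for tensors, and the loop here is sequential; it only shows the structure that makes parallel execution possible.

```python
def shift(grid, dy, dx, fill=0.0):
    """Return a copy where out[y][x] = grid[y + dy][x + dx], or fill outside the map."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else fill
         for x in range(w)]
        for y in range(h)
    ]


def propagate_once(depth, affinity):
    """One propagation step written as a sum over independently shifted maps.

    affinity maps each (dy, dx) offset to a per-pixel weight map.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    total = [[0.0] * w for _ in range(h)]
    for (dy, dx), weights in affinity.items():
        shifted = shift(depth, dy, dx)   # independent of all other neighbors
        for y in range(h):
            for x in range(w):
                out[y][x] += weights[y][x] * shifted[y][x]
                total[y][x] += weights[y][x]
    for y in range(h):
        for x in range(w):
            out[y][x] += (1.0 - total[y][x]) * depth[y][x]   # center weight
    return out
```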

Contents

Dependency
Data
Trained Models
Commands
Citation

Dependency

Our released implementation is tested on:

Ubuntu 16.04
Python 3.7.4 (Anaconda 2019.10)
PyTorch 1.3.1 / torchvision 0.4.2
NVIDIA CUDA 10.0.130
4x NVIDIA RTX 2080 Ti GPUs

pip install numpy matplotlib Pillow
pip install scikit-image
pip install opencv-contrib-python==3.4.2.17

Data

Download the KITTI Depth Dataset and KITTI Raw Dataset from their websites. The overall data directory is structured as follows:

├── kitti_depth
|   ├── depth
|   |   ├── data_depth_annotated
|   |   |  ├── train
|   |   |  ├── val
|   |   ├── data_depth_velodyne
|   |   |  ├── train
|   |   |  ├── val
|   |   ├── data_depth_selection
|   |   |  ├── test_depth_completion_anonymous
|   |   |  ├── test_depth_prediction_anonymous
|   |   |  ├── val_selection_cropped

├── kitti_raw
|   ├── 2011_09_26
|   ├── 2011_09_28
|   ├── 2011_09_29
|   ├── 2011_09_30
|   ├── 2011_10_03
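Before training, it can help to confirm that the layout above is in place. The following is a small sketch using only the standard library; the function name and the idea of passing a single data root are assumptions for illustration, not part of this repo.

```python
from pathlib import Path

# Directories taken from the tree above, relative to the data root.
EXPECTED_DIRS = [
    "kitti_depth/depth/data_depth_annotated/train",
    "kitti_depth/depth/data_depth_annotated/val",
    "kitti_depth/depth/data_depth_velodyne/train",
    "kitti_depth/depth/data_depth_velodyne/val",
    "kitti_depth/depth/data_depth_selection/test_depth_completion_anonymous",
    "kitti_depth/depth/data_depth_selection/test_depth_prediction_anonymous",
    "kitti_depth/depth/data_depth_selection/val_selection_cropped",
    "kitti_raw/2011_09_26",
    "kitti_raw/2011_09_28",
    "kitti_raw/2011_09_29",
    "kitti_raw/2011_09_30",
    "kitti_raw/2011_10_03",
]


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

Calling missing_dirs with the path of the data root returns an empty list when every directory is present.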

Trained Models

Download our pre-trained models:

PENet (i.e., the proposed full model with dilation_rate=2): Download Here
ENet (i.e., the backbone): Download Here

Commands

A complete list of training options is available with

python main.py -h

Training

Here we adopt a multi-stage training strategy to train the backbone, DA-CSPN++, and the full model progressively. However, end-to-end training is feasible as well.

Train ENet (Part Ⅰ)

CUDA_VISIBLE_DEVICES="0,1" python main.py -b 6 -n e
# -b for batch size
# -n for network model

Train DA-CSPN++ (Part Ⅱ)

CUDA_VISIBLE_DEVICES="0,1" python main.py -b 6 -f -n pe --resume [enet-checkpoint-path]
# -f for freezing the parameters in the backbone
# --resume for initializing the parameters from the checkpoint

Train PENet (Part Ⅲ)

CUDA_VISIBLE_DEVICES="0,1" python main.py -b 10 -n pe -he 160 -w 576 --resume [penet-checkpoint-path]
# -he, -w for the image size after random cropping

Evaluation

CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n e --evaluate [enet-checkpoint-path]
CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n pe --evaluate [penet-checkpoint-path]
# test the trained model on the val_selection_cropped data
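For reference, KITTI depth completion results are usually reported as errors over the pixels that have ground truth, with RMSE as the main ranking metric. The sketch below shows that computation in plain Python, assuming ground-truth value 0 marks a pixel without a label. The function name is hypothetical, and the repo's own evaluation code may differ in units and in the set of metrics it reports.

```python
import math


def rmse_mae(pred, target):
    """RMSE and MAE over pixels whose ground-truth depth is positive.

    pred and target are lists of rows of the same shape.
    """
    errors = [
        p - t
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
        if t > 0          # skip pixels without ground truth
    ]
    if not errors:
        raise ValueError("target contains no valid pixels")
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae
```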

Test

CUDA_VISIBLE_DEVICES="0" python main.py -b 1 -n pe --evaluate [penet-checkpoint-path] --test
# generate and save results of the trained model on the test_depth_completion_anonymous data

Citation

If you use our code or method in your work, please cite the following:

@inproceedings{hu2020PENet,
	title={Towards Precise and Efficient Image Guided Depth Completion},
	author={Hu, Mu and Wang, Shuling and Li, Bin and Ning, Shiyu and Fan, Li and Gong, Xiaojin},
	booktitle={ICRA},
	year={2021}
}

Related Repositories

The original code framework is derived from "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera". It is developed by Fangchang Ma, Guilherme Venturelli Cavalheiro, and Sertac Karaman at MIT.

The CoordConv part is derived from "An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution".

Comments

Experimental results using the NYU dataset

Hello, I am still training PENet on the NYU dataset; please help me take a look. The third image in this figure is the predicted result, right? I think this also proves that the network can run on the NYU dataset, is that correct? I want to know whether this network is suitable for a densely labeled dataset. Thanks. Looking forward to your reply.

opened by yuyu19970716 11

Use other datasets to train PENet

Hello, I now want to use the PENet network architecture for depth completion. Because I only need the dataset at hand, I found the HandNet dataset, which contains depth data and image data collected by RealSense-series depth cameras. I especially want to know where I need to modify the code. Thank you in advance! Looking forward to your reply!

opened by yuyu19970716 11

Questioning Inference Speed

Good day,

First of all, congratulations on your work and paper. The idea of separating depth-dominant and color-dominant branches is interesting. Also, thank you for releasing the source code to the public. I have been replicating your code the past few days, and so far inferencing has been straightforward (I am getting RMSE scores at around ~760).

However, correct me if I'm wrong but I think there might be a mistake in the inference time computation. In main.py line 213/216, this is where the predictions are generated from the ENet/PENet models, after which gpu_time is computed. I tried adding a print(pred) function call (see in the image below).

I got very different inference times with and without the print(pred) function call. I ran this on a machine with RTX 2080Ti, i7-9700k, CUDA 11.2, torch==1.3.1, torchvision==0.4.2. Below are my runtimes:

original code - a bit faster than your official runtime presumably due to my newer CUDA version(?)

modified code - much slower when print(pred) was added

My understanding is that calling pred = model(batch_data) does not yet run the model prediction; the model inference only actually runs when you call result.evaluate() in line 268 (i.e. lazy execution):

This results in a nearly 10x increase in inference time (i.e. 151 ms vs 17 ms). Can you confirm that this also happens in your environment?

opened by wdjose 7

Different intrinsic parameters between train and test

Thank you for your great work. I found that the KITTI dataset has different intrinsic parameters in the train set and the test set. But in test mode, it looks like only the train set camera intrinsic parameters are used. Is there any problem? And does the method work well on cameras it was not trained on? Thank you very much.

opened by q5390498 4

Training with Multiple GPUs

Hi, thanks for the code. I am using 4 GPUs to train the model, and the training pass goes well. However, when it starts validation I receive the error given below.

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 545, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nazir/PENet_ICRA2021/model.py", line 440, in forward
    sparsed_feature3 = self.depth_layer3(sparsed_feature2_plus, geo_s2, geo_s3) # b 64 88 304
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 545, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nazir/PENet_ICRA2021/basic.py", line 312, in forward
    x = torch.cat((x, g1), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 176 and 160 in dimension 2 at /tmp/pip-req-build-akjifb_7/aten/src/THC/generic/THCTensorMath.cu:71

I assume this is due to the multi-gpu training. I see that you have also used 2 GPUs for training, did you see this issue while training?

Thanks,

opened by danishnazir 4

About the training data

Thanks for your code! I have a question about the KITTI raw data. Do I need to download all the raw data? I know that in the monocular depth estimation, generally only a part of it needs to be downloaded. Is there a download list that I can refer to? thanks!

opened by yuqJin 4

Modify the backbone network

Hi! I have a question for you! Recently I changed PENet's backbone network ENet to attention-unet, and I only used part of Kitti's dataset. I felt that the training was a little slow, and then I modified the learning rate when training the backbone network so that the network could learn faster. It's ok when I train ENet. (Blue is the training set, yellow is the validation set) But when I used the trained backbone network for the second stage of training, the network experienced severe overfitting. I wonder if this has something to do with the learning rate? We can see that cspn++ is trained very poorly. The validation set error is large. What is the reason for this? Below is the learning rate when I train ENet.

opened by yuyu19970716 3

The point cloud looks a bit plausible and noisy

PENet is efficient and the output depth image seems very nice, but the point cloud transformed from the depth inferred by PENet looks plausible and noisy. Is there something wrong with my matlab code used for transformation? Thank you. depth2cloud.zip

opened by wmj-ustc 3

Projection result confirmation

Hello,

I'm now working on cross-modality detection tasks in 3D space. Since SFDNet uses this method for depth completion, I tried this repo as well. The following is the output of PENet and the result I obtained when I projected it back. Does it look correct? I know that the 3D positions recovered from the depth map suffer from the artifacts you discussed in this issue: https://github.com/JUGGHM/PENet_ICRA2021/issues/3. But they look far more severe than I expected. Thanks for any help.

opened by Orbis36 2

Normalization Input Data

Hi and thank you very much for your great repo. As it was mentioned in #43 before, you did not normalize the input data. However, as far as I know it is common practice to do so for a more efficient training and keeping weights and biases small. May I ask why you decided to not normalize your data?

opened by MarSpit 2

Cannot load ENet Model

Hi,

I am trying to load the e.pth.tar model and I am not able to do it. All I am doing is

import torch
torch.load('e.pth.tar', 'cuda:0')

and it gives me this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/core_uc/.local/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/core_uc/.local/lib/python3.6/site-packages/torch/serialization.py", line 787, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'metrics'

Would you know what could be the issue? I did pip3 install metrics just in case but it did not work.

opened by duda1202 2

Model mismatch at inference time

Hello, when I use my own depth map and RGB image (h x w is 900x1600) for inference, I change the "--convolutional-layer-encoding" parameter to std, and then the model does not match the weights in the backbone part when loading the weights. I see that during your training the "--convolutional-layer-encoding" parameter defaults to xyz, so should I train a model with "--convolutional-layer-encoding" set to "std" from scratch? Besides, I wonder whether changing this parameter will affect the final performance of the model.

opened by lilkeker 0

Some questions about the implementation of CSPN

Hi, thank you for your brilliant code, it's very helpful. I have a question about the sample and stitch steps of PENet_C2_train in the model: the sample operation in line 921 and the stitch operation in line 965. Where can I read the paper about it? I noticed that some issues (https://github.com/JUGGHM/PENet_ICRA2021/issues/16, https://github.com/JUGGHM/PENet_ICRA2021/issues/59) called it DE-CSPN++, but I did not find the paper. Is it a variant of RA-CSPN?

opened by yudmoe 1

Broken PNG file

Thank you for your excellent work. I encountered the following problems during training. I checked the PNG image and it was normal, but I reported the following error "broken PNG file" during training. Look forward to your reply

Traceback (most recent call last):
  File "main.py", line 474, in <module>
    main()
  File "main.py", line 447, in main
    iterate("train", args, train_loader, model, optimizer, logger, epoch) # train for one epoch
  File "main.py", line 191, in iterate
    for i, batch_data in enumerate(loader):
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 663, in __next__
    data = self._next_data()
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1356, in _next_data
    return self._process_data(data)
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1382, in _process_data
    data.reraise()
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/_utils.py", line 475, in reraise
    raise exception
SyntaxError: Caught SyntaxError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hwp/PENet/dataloaders/kitti_loader.py", line 359, in __getitem__
    rgb, sparse, target = self.__getraw__(index)
  File "/home/hwp/PENet/dataloaders/kitti_loader.py", line 351, in __getraw__
    (self.paths['rgb'][index] is not None and (self.args.use_rgb or self.args.use_g)) else None
  File "/home/hwp/PENet/dataloaders/kitti_loader.py", line 158, in rgb_read
    rgb_png = np.array(img_file, dtype='uint8') # in the range [0,255]
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/Image.py", line 675, in __array__
    new["data"] = self.tobytes()
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/Image.py", line 718, in tobytes
    self.load()
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/ImageFile.py", line 235, in load
    s = read(self.decodermaxblock)
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/PngImagePlugin.py", line 896, in load_read
    cid, pos, length = self.png.read()
  File "/home/hwp/anaconda3/envs/pytorch/lib/python3.7/site-packages/PIL/PngImagePlugin.py", line 166, in read
    raise SyntaxError(f"broken PNG file (chunk {repr(cid)})")
  File "<string>", line None
SyntaxError: broken PNG file (chunk b'\x00\x00\x00\x00')

opened by Fululu627 1

How to infer PENet for KITTI object task?

Hi @JUGGHM, I am doing 3D detection with the KITTI dataset. I have converted the lidar data (.bin) into sparse depth maps (.png). How can I get dense depth maps using PENet? (It seems that PENet supports only one input shape, and the KITTI 3D detection dataset has different image shapes.) Can you give me some advice? Thanks

opened by Senwang98 3

Lightweight deployment of PENet network

Hi! I now want to consider a lightweight deployment of the PENet network. Do you have any suggestions for lightweight deployment? The current network architecture is relatively large, so what is your opinion? Very much looking forward to your answer!

opened by yuyu19970716 0

How to use the sparse depth?

Thank you for your outstanding contribution! I want to know how the color-dominant branch is combined with the point cloud and sent to the network. Does the radar point cloud only take valid points, and does the color image also take the same valid points as the point cloud? How can we get a dense depth map in this way? This question has been bothering me. I hope you can answer it for me. Thank you again!

opened by Pattern6 3
