PyTorch implementation of Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection

Overview

DDMP-3D

PyTorch implementation of "Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection", a CVPR 2021 paper.

Introduction

The objective of this paper is to learn context- and depth-aware feature representations to solve the problem of monocular 3D object detection. We make the following contributions: (i) rather than appealing to the complicated pseudo-LiDAR based approach, we propose a depth-conditioned dynamic message propagation (DDMP) network to effectively integrate multi-scale depth information with the image context; (ii) this is achieved by first adaptively sampling context-aware nodes in the image context and then dynamically predicting hybrid depth-dependent filter weights and affinity matrices for propagating information; (iii) by augmenting a center-aware depth encoding (CDE) task, our method successfully alleviates the inaccurate depth prior; (iv) we thoroughly demonstrate the effectiveness of our proposed approach and show state-of-the-art results among monocular-based approaches on the KITTI benchmark dataset.

(Figure: overview of the DDMP-3D architecture.)
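
To make the propagation step described above concrete, below is a minimal, self-contained PyTorch sketch of one depth-conditioned message-propagation layer. It is not the authors' implementation: the module name, the number of sampled nodes, and the use of grid_sample for offset-based sampling are illustrative assumptions that only mirror the idea of sampling context-aware nodes and weighting their messages with depth-predicted affinities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthConditionedMessagePropagation(nn.Module):
    """Illustrative sketch of one DDMP-style propagation step (not the authors' code).

    For every pixel we (1) predict K sampling offsets from the image features,
    (2) gather feature nodes at those offsets, and (3) weight the gathered
    messages with affinities predicted from the depth features.
    """

    def __init__(self, channels, num_nodes=9):
        super().__init__()
        self.num_nodes = num_nodes
        self.offset_pred = nn.Conv2d(channels, 2 * num_nodes, 3, padding=1)
        self.affinity_pred = nn.Conv2d(channels, num_nodes, 3, padding=1)
        self.out_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, img_feat, depth_feat):
        b, c, h, w = img_feat.shape
        k = self.num_nodes

        # Base sampling grid in normalized [-1, 1] coordinates (x, y order for grid_sample).
        ys = torch.linspace(-1, 1, h, device=img_feat.device).view(h, 1).expand(h, w)
        xs = torch.linspace(-1, 1, w, device=img_feat.device).view(1, w).expand(h, w)
        base_grid = torch.stack((xs, ys), dim=-1)                        # (h, w, 2)

        # Offsets predicted from the image context: one (dx, dy) pair per node.
        offsets = self.offset_pred(img_feat)                             # (b, 2k, h, w)
        offsets = offsets.view(b, k, 2, h, w).permute(0, 1, 3, 4, 2)     # (b, k, h, w, 2)

        # Affinities (message weights) predicted from the depth features.
        affinity = torch.softmax(self.affinity_pred(depth_feat), dim=1)  # (b, k, h, w)

        messages = torch.zeros_like(img_feat)
        for i in range(k):
            grid = (base_grid.unsqueeze(0) + offsets[:, i]).clamp(-1, 1)  # (b, h, w, 2)
            sampled = F.grid_sample(img_feat, grid, align_corners=True)   # (b, c, h, w)
            messages = messages + affinity[:, i:i + 1] * sampled

        return img_feat + self.out_proj(messages)


# Tiny smoke test on random tensors.
if __name__ == "__main__":
    layer = DepthConditionedMessagePropagation(channels=64)
    img, dep = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(layer(img, dep).shape)  # torch.Size([2, 64, 32, 32])
```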

Requirements

Installation

Our code is based on DGMN; please refer to its installation instructions for compiling maskrcnn-benchmark. A rough sketch of the typical steps is shown after the settings list below.

  • My settings

    conda activate maskrcnn_benchmark 
      (maskrcnn_benchmark)  conda list
      python				3.8.5
      pytorch				1.4.0          
      cudatoolkit				10.0.130  
      torchfile				0.1.0
      torchvision				0.5.0
      apex					0.1 
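
As a rough orientation (not a substitute for the linked DGMN / maskrcnn-benchmark instructions), the environment and compilation steps typically look like the following. The environment name, package versions, and directory names are assumptions based on the settings listed above:

```bash
# Sketch of a typical setup; follow the DGMN / maskrcnn-benchmark install guide for the authoritative steps.
conda create -n maskrcnn_benchmark python=3.8 -y
conda activate maskrcnn_benchmark
conda install pytorch=1.4 torchvision=0.5 cudatoolkit=10.0 -c pytorch -y
pip install ninja yacs cython matplotlib tqdm opencv-python

# apex (mixed-precision utilities used by maskrcnn-benchmark)
git clone https://github.com/NVIDIA/apex.git
cd apex && python setup.py install --cuda_ext --cpp_ext && cd ..

# compile the maskrcnn-benchmark fork shipped with DGMN (directory name assumed)
cd maskrcnn-benchmark
python setup.py build develop
```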

Data preparation

Download and unzip the full KITTI detection dataset to the folder /path/to/kitti/. Then place a softlink (or the actual data) in data/kitti/. There are two widely used training/validation splits for the KITTI dataset. Here we only show the setup for split1; split2 can be set up accordingly.

cd DDMP-3D
ln -s /path/to/kitti data/kitti
ln -s /path/to/kitti/testing data/kitti_split1/testing

Our method uses DORN (or other monocular depth models) to extract depth maps for all images. You can download and unzip the depth maps extracted by DORN here and put them (or a softlink) in the folder data/kitti/depth_2/. (You can also change the path in the script setup_depth.py.) Additionally, we also generate the xyz map (x and y are the coordinates along the x and y axes of the 2D image plane, and z is the depth value), save it as pickle files, and then handle it like the depth map.
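
For reference, a minimal sketch of how such an xyz map could be produced from a DORN depth map is shown below. The helper name, the 16-bit-PNG/256 depth encoding, and the (3, H, W) layout are assumptions for illustration, not necessarily the exact format used by the released pickle files:

```python
import pickle

import numpy as np
from PIL import Image


def depth_to_xyz_map(depth_path, out_path):
    """Stack per-pixel (x, y) image coordinates with the depth value z and pickle the result."""
    # Assumes the depth map is stored as a 16-bit PNG scaled by 256 (a common KITTI convention).
    depth = np.asarray(Image.open(depth_path), dtype=np.float32) / 256.0
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    xyz = np.stack((xs, ys, depth), axis=0)  # (3, H, W): x, y on the image plane, z = depth
    with open(out_path, "wb") as f:
        pickle.dump(xyz, f)


# Example (hypothetical file names):
# depth_to_xyz_map("data/kitti/depth_2/000000.png", "data/kitti/depth_2/000000.pkl")
```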

Then use the following scripts to extract the data splits, which use softlinks to the above directory for efficient storage.

python data/kitti_split1/setup_split.py
python data/kitti_split1/setup_depth.py

Next, build the KITTI devkit eval for split1.

sh data/kitti_split1/devkit/cpp/build.sh

Lastly, build the NMS modules:

cd lib/nms
make

Training

You can change batch_size according to the number of GPUs; the default is 8 GPUs with batch_size = 5 on a Tesla V100 (32 GB).

If you want to use a ResNet backbone pre-trained on the COCO dataset, it can be downloaded from git or Google Drive; the default is the ImageNet-pretrained PyTorch model, which we downloaded and saved to data/. You can also set use_corner and corner_in_3d to False for quicker training.

See the configurations in scripts/config/config.py and scripts/train.py for details.
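
For orientation, the options mentioned above look roughly like this inside the config file (a sketch; the attribute names follow the text above, and the exact defaults in scripts/config/config.py may differ):

```python
# Inside scripts/config/config.py (illustrative values, not the shipped defaults).
conf.batch_size = 5        # batch size used with the default 8-GPU setup
conf.use_corner = False    # set to False to skip the corner branch for quicker training
conf.corner_in_3d = False  # set to False to skip 3D corner supervision for quicker training
```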

sh train.sh

Testing

Generate the results using:

python scripts/test.py

We provide the generated results for evaluation, since the data preparation process is tedious. Unzip output.zip and then execute the evaluation commands above. We show the results reported in the paper and in the supplementary material. Additionally, we also trained a model that replaces the depth map (which only contains the z value) with xyz coordinates (x and y are the values along the x and y axes of the 2D image plane), which achieves the best performance. You can download the best model from Google Drive.

| Models | AP3D11 @ mod. | AP3D11 @ easy | AP3D11 @ hard |
| --- | --- | --- | --- |
| model in paper | 23.13 / 27.46 | 31.14 / 37.71 | 19.45 / 24.53 |
| model in supp. | 23.17 / 27.85 | 32.40 / 42.05 | 19.35 / 24.91 |
| model with coordinates (xyz), config | 23.53 / 28.16 | 30.21 / 38.78 | 19.72 / 24.80 |

Acknowledgements

We thank D4LCN and DGMN for their great work and repositories.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{wang2021depth,
  title={Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection},
  author={Wang, Li and Du, Liang and Ye, Xiaoqing and Fu, Yanwei and Guo, Guodong and Xue, Xiangyang and Feng, Jianfeng and Zhang, Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={454--463},
  year={2021}
}

Contact

For questions regarding DDMP-3D, feel free to post here or directly contact the authors ([email protected]).


Comments
  • RuntimeError:  view size is not compatible with input tensor's size and stride ...


    Hi,

    Thank you for the great work!

    When I try to train on a single GPU (3090, 24 GB), I get the following error:

    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
    Traceback (most recent call last):
      File "scripts/train.py", line 215, in <module>
        main(sys.argv[1:])
      File "scripts/train.py", line 149, in main
        total_loss.backward()
      File "/home/songlin/anaconda3/envs/py36/lib/python3.6/site-packages/torch/_tensor.py", line 255, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/songlin/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 149, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
      File "/home/songlin/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/function.py", line 87, in apply
        return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
      File "/home/songlin/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/function.py", line 204, in wrapper
        outputs = fn(ctx, *args)
      File "/home/songlin/Projects/dgmn/maskrcnn_benchmark/layers/dcn/deform_conv_func.py", line 128, in backward
        cur_im2col_step
    RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
    

    Here is what I changed in train.sh:

    CUDA_VISIBLE_DEVICES=0 ....
    

    and changed batch_size to 2 in config/depth_ddmp.py

    conf.batch_size = 2 # 5*8
    

    Then I run

    sh train.sh
    

    Is there anything I missed? I think my memory is enough for batch_size=2, but the error above still happens. By the way, training runs successfully with batch_size=1 and only takes about 9 GB of memory.

    opened by songlin 9
  • Some bugs in your code


    Thanks for your great work!

    I have some questions.

    1. When I use sh train.sh, I get the following error:

    File "/data1/czy/AAAI/DDMP/DDMP-3D/lib/rpn_util.py", line 154, in generate_anchors
        raise ValueError('Non-used anchor #{} found'.format(aind))
    ValueError: Non-used anchor #0 found

    2. But when I use sh train_xyz.sh, there is no error.

    3. When I run sh train_dep_xyz.sh, I get: no such file 'training/depth_2/2189.pkl'. Can you provide the pkl files of the xyz map?

    opened by czy341181 3
  • ImportError: cannot import name 'DeformUnfold'


    Hi, I installed maskrcnn-benchmark by following the installation steps here.

    But when I try to run

    sh train.sh
    

    I get this error:

    Traceback (most recent call last):
      File "scripts/train.py", line 213, in <module>
        main(sys.argv[1:])
      File "scripts/train.py", line 87, in main
        rpn_net, optimizer, scheduler = init_training_model(conf, paths.output, conf_name)
      File "/home/songlin/Projects/DDMP-3D/lib/core.py", line 66, in init_training_model
        network = absolute_import(dst_path)
      File "/home/songlin/Projects/DDMP-3D/lib/util.py", line 98, in absolute_import
        spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 678, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/songlin/Projects/DDMP-3D/output/depth_ddmp/Adaptive_block2_resnet_depth50_ddmp/resnet_ddmp.py", line 7, in <module>
        from maskrcnn_benchmark.layers import DeformUnfold
    ImportError: cannot import name 'DeformUnfold'
    

    I searched the keyword DeformUnfold in the original maskrcnn_benchmark project, but found nothing.

    Please help. Thank you

    opened by songlin 2