DDMP-3D
PyTorch implementation of "Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection", a paper accepted to CVPR 2021.
Introduction
The objective of this paper is to learn context- and depth-aware feature representations to solve the problem of monocular 3D object detection. We make the following contributions: (i) rather than appealing to the complicated pseudo-LiDAR based approach, we propose a depth-conditioned dynamic message propagation (DDMP) network to effectively integrate the multi-scale depth information with the image context; (ii) this is achieved by first adaptively sampling context-aware nodes in the image context and then dynamically predicting hybrid depth-dependent filter weights and affinity matrices for propagating information; (iii) by augmenting a center-aware depth encoding (CDE) task, our method successfully alleviates the inaccurate depth prior; (iv) we thoroughly demonstrate the effectiveness of our proposed approach and show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
Requirements
Installation
Our code is based on DGMN; please refer to its installation instructions for the maskrcnn-benchmark compilation.
My settings
conda activate maskrcnn_benchmark
(maskrcnn_benchmark) conda list
python       3.8.5
pytorch      1.4.0
cudatoolkit  10.0.130
torchfile    0.1.0
torchvision  0.5.0
apex         0.1
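After activating the environment, a quick sanity check can confirm that the versions listed above are the ones actually picked up. This is a minimal sketch using standard PyTorch APIs; the file name is just an example.

```python
# sanity_check_env.py -- confirm the environment matches the versions listed above
import torch
import torchvision

print("pytorch        :", torch.__version__)         # expect 1.4.0
print("torchvision    :", torchvision.__version__)   # expect 0.5.0
print("cuda toolkit   :", torch.version.cuda)         # expect 10.0
print("cuda available :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device         :", torch.cuda.get_device_name(0))
```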
Data preparation
Download and unzip the full KITTI detection dataset to the folder /path/to/kitti/. Then place a softlink (or the actual data) in data/kitti/. There are two widely used training/validation splits for the KITTI dataset. Here we only show the setup for split1; split2 can be configured accordingly.
cd DDMP-3D
ln -s /path/to/kitti data/kitti
ln -s /path/to/kitti/testing data/kitti_split1/testing
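The commands above assume the standard KITTI object-detection layout. Before running the setup scripts, you can verify the expected folders with a small sketch like the one below (the sub-folder names are the standard KITTI ones; adjust if your copy differs):

```python
# check_kitti_layout.py -- verify the standard KITTI object detection layout under data/kitti
import os

root = "data/kitti"
expected = [
    "training/image_2",   # left color images
    "training/label_2",   # ground-truth labels
    "training/calib",     # calibration files
    "testing/image_2",
    "testing/calib",
]
for sub in expected:
    path = os.path.join(root, sub)
    print(("ok    " if os.path.isdir(path) else "MISSING ") + path)
```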
Our method uses DORN (or other monocular depth models) to extract depth maps for all images. You can download and unzip the depth maps extracted by DORN here and put them (or a softlink) in the folder data/kitti/depth_2/. (You can also change the path in the script setup_depth.py.) Additionally, we also generate an xyz map (where x and y are the values along the x and y axes of the 2D image plane, and z is the depth value), save it as pickle files, and then handle it in the same way as the depth maps.
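A minimal sketch of how such an xyz map can be built from a depth map and saved as a pickle file is shown below. The file names, the 16-bit PNG depth encoding, and the output folder are illustrative assumptions, not the exact scripts or layout used in this repo.

```python
# make_xyz_map.py -- build an "xyz map" as described above: per-pixel (x, y) image-plane
# coordinates stacked with the depth value z, saved as a pickle file.
# Paths and the /256 depth scaling are assumptions for illustration only.
import os
import pickle
import numpy as np
from PIL import Image

depth = np.array(Image.open("data/kitti/depth_2/000000.png"), dtype=np.float32) / 256.0  # assumed 16-bit PNG encoding
h, w = depth.shape
xs, ys = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))

xyz = np.stack([xs, ys, depth], axis=0)  # shape (3, H, W): x, y on the image plane plus depth z

os.makedirs("data/kitti/xyz_2", exist_ok=True)  # assumed output folder
with open("data/kitti/xyz_2/000000.pkl", "wb") as f:
    pickle.dump(xyz, f)
```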
Then use the following scripts to extract the data splits, which use softlinks to the above directory for efficient storage.
python data/kitti_split1/setup_split.py
python data/kitti_split1/setup_depth.py
Next, build the KITTI devkit eval for split1.
sh data/kitti_split1/devkit/cpp/build.sh
Lastly, build the nms modules.
cd lib/nms
make
Training
You can change batch_size according to the number of GPUs. Default: 8 GPUs with batch_size = 5 on a Tesla V100 (32 GB).
If you want to use a ResNet backbone pre-trained on the COCO dataset, it can be downloaded from git or Google Drive. By default we use the ImageNet-pretrained PyTorch model, which we downloaded and saved under data/. You can also set use_corner and corner_in_3d to False for quicker training.
See the configurations in scripts/config/config.py and scripts/train.py for details.
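For reference, here is a sketch of the kind of settings discussed above. The field names mirror a D4LCN-style config and are assumptions; check the actual scripts/config/config.py for the real names and values.

```python
# Illustrative sketch of the training settings discussed above (field names assumed;
# see scripts/config/config.py for the real configuration).
from types import SimpleNamespace

conf = SimpleNamespace()
conf.batch_size = 5                    # default: 8 GPUs x batch_size = 5 on V100 (32 GB)
conf.pretrained = "data/resnet50.pth"  # backbone weights saved under data/ (file name assumed)
conf.use_corner = False                # set both to False for quicker training
conf.corner_in_3d = False
```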
sh train.sh
Testing
Generate the results using:
python scripts/test.py
We provide the generated results for evaluation, since the data preparation process is tedious. Unzip output.zip and then execute the above evaluation commands. We show the results reported in the paper and in the supplementary material. Additionally, we also trained a model that replaces the depth map (which contains only the z value) with the xyz coordinates (x and y being the values along the x and y axes of the 2D image plane); this model achieves the best performance. You can download the best model on Google Drive.
Models | AP3D11@mod. | AP3D11@easy | AP3D11@hard |
---|---|---|---|
model in paper | 23.13 / 27.46 | 31.14 / 37.71 | 19.45 / 24.53 |
model in supp | 23.17 / 27.85 | 32.40 / 42.05 | 19.35 / 24.91 |
model with coordinates (xyz), config | 23.53 / 28.16 | 30.21 / 38.78 | 19.72 / 24.80 |
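If you want to inspect the generated results before running the evaluation, each output file follows the standard KITTI label format (class, truncation, occlusion, alpha, 2D box, 3D dimensions, 3D location, rotation_y, score). A small sketch for reading one file is below; the results path is an assumption.

```python
# inspect_results.py -- print detections from one KITTI-format result file.
# The path below is an assumption; point it at the unzipped output.
kitti_fields = ["type", "truncated", "occluded", "alpha",
                "x1", "y1", "x2", "y2",   # 2D box (pixels)
                "h", "w", "l",            # 3D dimensions (m)
                "x", "y", "z",            # 3D location in camera coordinates (m)
                "rotation_y", "score"]

with open("output/data/000003.txt") as f:
    for line in f:
        det = dict(zip(kitti_fields, line.split()))
        print(det["type"], "score:", det["score"], "depth z:", det["z"])
```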
Acknowledgements
We thank D4LCN and DGMN for their great work and repositories.
Citation
If you find this project useful in your research, please consider citing:
@inproceedings{wang2021depth,
title={Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection},
author={Wang, Li and Du, Liang and Ye, Xiaoqing and Fu, Yanwei and Guo, Guodong and Xue, Xiangyang and Feng, Jianfeng and Zhang, Li},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={454--463},
year={2021}
}
Contact
For questions regarding DDMP-3D, feel free to post an issue here or contact the authors directly ([email protected]).