[CVPR 2022 Oral] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University)

Last update: Jan 4, 2023

Related tags

Deep Learning pytorch levenberg-marquardt cvpr pose-estimation 6dof gauss-newton monocular perspective-n-point 3d-object-detection

Overview

EPro-PnP

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
In CVPR 2022 (Oral). [paper]
Hansheng Chen*¹˒², Pichao Wang†², Fan Wang², Wei Tian†¹, Lu Xiong¹, Hao Li²

¹Tongji University, ²Alibaba Group
*Part of work done during an internship at Alibaba Group.
†Corresponding Authors: Pichao Wang, Wei Tian.

Introduction

EPro-PnP is a probabilistic Perspective-n-Points (PnP) layer for end-to-end 6DoF pose estimation networks. Broadly speaking, it is essentially a continuous counterpart of the widely used categorical Softmax layer, and is theoretically generalizable to other learning models with nested optimization.

Given the layer input: an $N$-point correspondence set $X = \{x_i^{3D}, x_i^{2D}, w_i^{2D}\}_{i=1}^{N}$ consisting of 3D object coordinates $x_i^{3D} \in \mathbb{R}^3$, 2D image coordinates $x_i^{2D} \in \mathbb{R}^2$, and 2D weights $w_i^{2D} \in \mathbb{R}_+^2$, a conventional PnP solver searches for an optimal pose $y^*$ (rigid transformation in SE(3)) that minimizes the weighted reprojection error. Previous work tries to backpropagate through the PnP operation, yet $y^*$ is inherently non-differentiable due to the inner $\arg\min$ operation. This leads to convergence issues if all the components in $X$ must be learned by the network.
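For reference, here is a minimal NumPy sketch of the objective a conventional PnP solver minimizes. It assumes a pinhole camera with intrinsics K, a pose given as (R, t), and the residual f_i(y) = w_i^{2D} ∘ (π(R x_i^{3D} + t) - x_i^{2D}) with ∘ the elementwise product; the function names `project` and `weighted_reprojection_cost` are hypothetical and not taken from the EPro-PnP code.

```python
import numpy as np

def project(K, R, t, x3d):
    """Pinhole projection pi(R x + t) of (N, 3) object points to (N, 2) pixels."""
    cam = x3d @ R.T + t                  # object frame -> camera frame
    uvw = cam @ K.T                      # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]      # perspective division

def weighted_reprojection_cost(K, R, t, x3d, x2d, w2d):
    """0.5 * sum_i || w_i * (pi(R x_i + t) - x_i^2D) ||^2 with elementwise weights."""
    f = w2d * (project(K, R, t, x3d) - x2d)   # (N, 2) weighted residuals f_i(y)
    return 0.5 * np.sum(f ** 2)

# Example: 2D points generated from a known pose have zero cost at that pose.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
x3d = np.random.default_rng(0).uniform(-1.0, 1.0, size=(8, 3))
x2d = project(K, R, t, x3d)
w2d = np.ones((8, 2))
cost_true = weighted_reprojection_cost(K, R, t, x3d, x2d, w2d)
cost_shifted = weighted_reprojection_cost(K, R, t + np.array([0.1, 0.0, 0.0]), x3d, x2d, w2d)
```

A PnP solver returns the (R, t) minimizing this cost; the weights w2d let the network down-weight unreliable correspondences.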

In contrast, our probabilistic PnP layer outputs a posterior distribution of pose, whose probability density can be derived for proper backpropagation. The distribution is approximated via Monte Carlo sampling. With EPro-PnP, the correspondences can be learned from scratch altogether by minimizing the KL divergence between the predicted and target pose distributions.
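The Softmax analogy can be made concrete in a discrete toy setting. If the pose space were a finite set of candidate poses y_j with cost_j = ½Σ_i‖f_i(y_j)‖², then the posterior exp(-cost_j) / Σ_k exp(-cost_k) is exactly a softmax over negative costs, and the KL divergence to a one-hot (Dirac) target reduces to cross-entropy, i.e. -log p(target). This sketch is only an illustration of the analogy; EPro-PnP itself works with continuous poses, and the helper names are hypothetical.

```python
import math

def log_softmax_neg_cost(costs):
    """log p_j = -cost_j - log sum_k exp(-cost_k), computed stably (log-sum-exp)."""
    m = min(costs)
    log_z = -m + math.log(sum(math.exp(-(c - m)) for c in costs))
    return [-c - log_z for c in costs]

def nll_of_target(costs, target_index):
    """KL(one-hot target || p) equals -log p(target) for a one-hot target."""
    return -log_softmax_neg_cost(costs)[target_index]

costs = [0.5, 2.0, 4.0]              # hypothetical costs of three candidate poses
log_probs = log_softmax_neg_cost(costs)
loss_correct = nll_of_target(costs, 0)
loss_wrong = nll_of_target(costs, 2)
```

Because discrete probabilities never exceed 1, this loss is always non-negative; in the continuous case the sum becomes an integral over poses, which is the part that requires Monte Carlo approximation.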

Models

We release two distinct networks trained with EPro-PnP:

EPro-PnP-6DoF for 6DoF pose estimation
EPro-PnP-Det for 3D object detection

Use EPro-PnP in Your Own Model

We provide a demo on the usage of the EPro-PnP layer.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{epropnp, 
  author = {Hansheng Chen and Pichao Wang and Fan Wang and Wei Tian and Lu Xiong and Hao Li}, 
  title = {EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation}, 
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
  year = {2022}
}

Comments

About sampling points during training

Hi, authors!

Recently I've been following your work EPro-PnP-6DoF.

I have a question about sampling points during training. I noticed that you randomly sample 512 points from the output x2d/x3d images (code here). However, this may also sample some background points, which could disturb the pose estimation.

Have you ever considered this or conducted any experiments?

opened by shanice-l 7
unstable (or not stable enough) translation of far objects

Hi, this is really fantastic work. I tried the 4DoF model and it works seamlessly, especially when cars are nearby and not truncated. However, I noticed that for objects somewhat far from the camera (e.g. > 80 meters), the translation becomes unstable. For example, an object might be 85 meters away in frame n-1 but 80 or 90 meters away in frame n. I understand that the relative precision might still be acceptable, but the absolute error cannot be ignored. So I'm wondering whether there is any approach to eliminate the error for far objects (at either training or inference time)?

opened by qinyq 6
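A back-of-the-envelope check of why far objects jitter, assuming depth is constrained mainly by the object's apparent size under a pinhole camera (a simplification of what a PnP-based detector does). An object of height H at distance Z spans h = f·H/Z pixels, so |dZ/dh| = Z²/(f·H): the depth change per pixel of error grows quadratically with distance. The focal length (1266 px) and car height (1.5 m) below are illustrative assumptions, not values from the source.

```python
def depth_error_per_pixel(distance_m, focal_px, object_height_m):
    """Depth change caused by a 1-pixel error in apparent object height.

    Apparent height h = f * H / Z, hence |dZ/dh| = Z**2 / (f * H).
    """
    return distance_m ** 2 / (focal_px * object_height_m)

FOCAL_PX = 1266.0        # assumed focal length in pixels
CAR_HEIGHT_M = 1.5       # assumed object height in meters
errors = {z: depth_error_per_pixel(z, FOCAL_PX, CAR_HEIGHT_M) for z in (20, 40, 80)}
```

Under these assumptions a single pixel corresponds to roughly 3.4 m of depth at 80 m, four times the value at 40 m, which is consistent in magnitude with the frame-to-frame jumps described above.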
Use EPro-PnP to replace EPnP

Hi, thanks for your great work! I have some questions about the performance of EPro-PnP: can I use EPro-PnP as a faster PnP solver to replace a traditional one (e.g. EPnP with RANSAC via cv2.solvePnPRansac)? I tried using EPro-PnP instead of EPnP as a post-processing module for a 2D-3D registration problem (image to point cloud), but I found that EPro-PnP performs well in easy mode and produces large errors in difficult mode (with some false correspondences), while EPnP with RANSAC handles these false correspondences more robustly. Can you give me some suggestions on how to avoid large errors? Thanks a lot.

opened by junshengzhou 5
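One standard way to make a least-squares reprojection cost less sensitive to false correspondences, without RANSAC, is to wrap each residual in a robust kernel such as the Huber loss. The sketch below shows how Huber bounds an outlier's influence; whether and how EPro-PnP's own cost applies a robust kernel is not stated here, and the threshold delta=2.0 and the residual values are arbitrary illustrations.

```python
def huber_rho(sq_err, delta):
    """Huber cost of a squared residual norm s = ||r||^2.

    Quadratic (s / 2) for ||r|| <= delta and linear beyond it, so a gross
    outlier contributes O(||r||) instead of O(||r||^2).
    """
    r = sq_err ** 0.5
    if r <= delta:
        return 0.5 * sq_err
    return delta * (r - 0.5 * delta)

residual_norms = [0.5, 0.8, 0.3, 50.0]    # last entry: a false correspondence
squared_total = sum(0.5 * r * r for r in residual_norms)
robust_total = sum(huber_rho(r * r, delta=2.0) for r in residual_norms)
```

Here the outlier dominates the squared cost (1250 of about 1250.5) but contributes only 98 under Huber. A robust kernel only down-weights outliers, so with a large fraction of false matches a hypothesize-and-verify scheme such as RANSAC can still be more reliable.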
A question about going from equation (1) to equation (2).

Hi Hansheng, excellent work! In equation (2), the likelihood function p(X|y) is defined as $p(X|y)=e^{-\frac{1}{2}\sum_{i=1}^{N}\|f_i(y)\|^2}$. We can view p(X|y) as the joint probability of $p(X_1|y), \ldots, p(X_N|y)$, so $p(X_i|y)=e^{-\frac{1}{2}\|f_i(y)\|^2}$. $\|f_i(y)\|$ is the reprojection error, and its value ranges from 0 to infinity. Letting $x = \|f_i(y)\|$, we get $\int_{0}^{+\infty}e^{-\frac{1}{2}x^2}\,dx=\frac{\sqrt{2\pi}}{2}\neq 1$, so $p(X_i|y)$ is not a pdf. Is my derivation correct? If it is, the statement p(X|y) is not proper here, and the following equation cannot use Bayes' theorem to obtain p(y|X). Perhaps I'm splitting hairs, but it really confused me.

opened by TheCuriousJoe 4
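A quick numerical check of the integral in the question, using a plain midpoint rule. It confirms √(2π)/2 = √(π/2) ≈ 1.2533 over the half-line. It also shows that if the exponent is read as a density over the 2D point itself (integrating over the image plane ℝ² rather than over the scalar norm), the normalizer for unit weights is 2π, a constant that does not depend on the pose y. This check does not by itself settle how the paper intends the likelihood to be normalized.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

gauss = lambda x: math.exp(-0.5 * x * x)
half_line = integrate(gauss, 0.0, 12.0)      # the integral in the question (tail beyond 12 is negligible)
full_line = integrate(gauss, -12.0, 12.0)    # integral over the real line, equals sqrt(2*pi)
plane = full_line ** 2                       # separable 2D integral over R^2, equals 2*pi
```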
Questions about the Jacobian matrix
Congratulations on winning the CVPR 2022 Best Student Paper award, and thanks for sharing your work!

However, the computation of the Jacobian matrix confused me a lot. In your code (https://github.com/tjiiv-cprg/EPro-PnP/blob/1545131fde2b47ec4c135776d575e11feeb0f7cd/epropnp/camera.py#L121), when dof equals 6:

d_x3dcam_d_rot = skew(x3d_rot * 2)
jac = torch.cat((d_x2d_d_x3dcam, d_x2d_d_x3dcam @ d_x3dcam_d_rot), dim=-1)

Could you please tell me how you derived the Jacobian matrix? I find d_x3dcam_d_rot a little difficult to understand, and I hope your reply can help me figure out why x3d_rot is multiplied by 2 and why only the rotation is considered.

Additionally, I referred to some materials when trying to compute the Jacobian matrix. For example, this article (https://zhuanlan.zhihu.com/p/482540286) computes the Jacobian of the reprojection error in this way:

I think if I follow this equation, the code should be:

x3d_cam = x3d @ quaternion_to_rot_mat(pose[..., 3:]).transpose(-1, -2) + pose[..., None, :3]
jac = torch.cat((d_x2d_d_x3dcam, d_x2d_d_x3dcam @ skew(x3d_cam)), dim=-1)

This differs from your algorithm, which puzzles me a lot. I look forward to your reply, and thank you again for your great work.

opened by lyj9494 4
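The factor of 2 can be checked numerically under a stated convention. The sketch below assumes, hypothetically (this is not confirmed to be EPro-PnP's convention), that rotation is a quaternion updated by a left increment δq = (1, δ) and that translation is updated additively and separately. Then R(δq) ≈ I + 2[δ]×, so ∂x3d_cam/∂δ = -2[x3d_rot]×: the 2 comes from the quaternion half-angle, and only the rotated object point appears because the translation increment has its own (identity) block. The sign depends on the skew-matrix convention; all helper names are ours.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def quat_to_rot(q):
    """Rotation matrix of quaternion q = (w, x, y, z); q is normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def quat_mul(a, b):
    """Hamilton product a * b, both quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def x3d_cam(delta, q, t, x3d):
    """Camera-frame point after a LEFT increment dq = (1, delta) applied to q."""
    dq = np.concatenate(([1.0], delta))
    return quat_to_rot(quat_mul(dq, q)) @ x3d + t

def numeric_jacobian(q, t, x3d, eps=1e-6):
    """Central-difference d x3d_cam / d delta at delta = 0, shape (3, 3)."""
    cols = []
    for k in range(3):
        d = np.zeros(3)
        d[k] = eps
        cols.append((x3d_cam(d, q, t, x3d) - x3d_cam(-d, q, t, x3d)) / (2 * eps))
    return np.stack(cols, axis=1)

q = np.array([0.9, 0.1, -0.3, 0.2])
t = np.array([0.5, -0.2, 4.0])
x3d = np.array([0.3, -0.1, 0.2])
x3d_rot = quat_to_rot(q) @ x3d
J_num = numeric_jacobian(q, t, x3d)
J_analytic = -skew(2 * x3d_rot)    # -2 [x3d_rot]_x under this convention
```

By contrast, the SE(3) left-perturbation derivation common in SLAM texts perturbs the whole transform, so the translation is rotated as well and the camera-frame point appears in the Jacobian instead; that is one plausible source of the difference between the two formulas.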
RuntimeError: CUDA error: invalid argument, same error for demo and test process

Hello, I get an error when running both the demo and the test code, like this:

File "/home/EPro-PnP/EPro-PnP/EPro-PnP-Det/epropnp_det/ops/pnp/levenberg_marquardt.py", line 292, in center_based_init
    camera.cam_mats).transpose(-1, -2)
File "/home/EPro-PnP/EPro-PnP/EPro-PnP-Det/epropnp_det/ops/pnp/levenberg_marquardt.py", line 18, in solve_wrapper
    return torch.linalg.solve(A, b)
RuntimeError: CUDA error: invalid argument

Is something wrong with my installation, or is it some other problem? Thanks a lot!

opened by uljnzkit 4
discard file

I downloaded the nuScenes sub-datasets "Full dataset (v1.0)" and "Teaser dataset (v0.1) -- Deprecated" to pre-process the data. When I run the following command: "python tools/data_converter/nuscenes_converter.py data/nuscenes --version v1.0-trainval", I get an error: "FileNotFoundError: file "data/nuscenes/samples/LIDAR_TOP/n015-2018-08-02-17-16-37+0800__LIDAR_TOP__1533201470448696.pcd.bin" does not exist". How can I solve it? Thanks!

opened by legendship 3
A quick question about pose ambiguity

Hi, when I read the paper, I got the intuition that pose ambiguity can be handled by multiple modes learned by the model. However, right after that, you state that empirically a Dirac-like target distribution works best, resulting in the simplified KL divergence in Eq. (5).

How should I reconcile the intuition with the empirical finding?

Thanks,

opened by ZhiyLiu 3
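With a Dirac-like target t(y) = δ(y - y^{gt}), the KL divergence reduces to the negative log posterior of the ground-truth pose. The derivation below assumes a uniform pose prior, so that p(y|X) ∝ exp(-½Σ‖f_i(y)‖²) in the notation of equation (2); the entropy term of the Dirac target is formally divergent but does not depend on the network, so it is kept only as a constant.

```latex
\begin{aligned}
D_{\mathrm{KL}}\big(t(y)\,\|\,p(y \mid X)\big)
  &= \int t(y)\log t(y)\,\mathrm{d}y - \int t(y)\log p(y \mid X)\,\mathrm{d}y \\
  &= -\log p(y^{gt} \mid X) + \text{const} \\
  &= \frac{1}{2}\sum_{i=1}^{N}\big\|f_i(y^{gt})\big\|^2
     + \log\int \exp\Big(-\frac{1}{2}\sum_{i=1}^{N}\|f_i(y)\|^2\Big)\,\mathrm{d}y + \text{const}
\end{aligned}
```

The second term is the log-normalizer over all poses, which is the integral that the Monte Carlo sampling mentioned in the Introduction approximates.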
Negative Monte Carlo pose loss during training

Dear authors,

Thanks for releasing your excellent work.

I'm training EPro-PnP-6DoF following the instructions you provided. I noticed that loss_mc sometimes becomes negative, making the overall loss negative as well. I believe this is caused by the logarithm in the computation of L_pred, since the log of a number between 0 and 1 is negative. Is this behavior expected? Could you please elaborate a little on the logic behind it?

Thanks in advance for your reply!

opened by jinhong-ni 2
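A 1D toy showing why a negative-log-likelihood pose loss can be negative: p(y|X) is a probability density, not a probability, so it can exceed 1 when the predicted distribution is sharp. The log-normalizer log Z, with Z = ∫exp(-cost(y))dy, is itself negative whenever Z < 1. The cost, σ = 0.01, and the Gaussian proposal are hypothetical, and the normalizer is estimated by plain importance sampling, which is simpler than whatever sampling scheme EPro-PnP actually uses.

```python
import math
import random

def cost(y, y0=0.0, sigma=0.01):
    """Toy 1D reprojection cost 0.5 * ((y - y0) / sigma) ** 2 (hypothetical)."""
    return 0.5 * ((y - y0) / sigma) ** 2

def mc_pose_nll(y_gt, n=20_000, proposal_std=0.02, seed=0):
    """-log p(y_gt | X) = cost(y_gt) + log Z, where Z = integral of exp(-cost(y)) dy
    is estimated by importance sampling from the proposal N(0, proposal_std**2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = rng.gauss(0.0, proposal_std)
        q = math.exp(-0.5 * (y / proposal_std) ** 2) / (proposal_std * math.sqrt(2 * math.pi))
        total += math.exp(-cost(y)) / q          # importance weight
    log_z = math.log(total / n)                  # negative here, since Z < 1
    return cost(y_gt) + log_z

loss_at_mode = mc_pose_nll(0.0)      # sharp distribution, target at the mode
loss_off_mode = mc_pose_nll(0.05)    # target five standard deviations away
```

Analytically, Z = σ√(2π) ≈ 0.025 here, so the loss at the mode is about log(0.025) ≈ -3.69; it only turns positive once the target's cost outweighs that term.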
Test Accuracy all zeros

Hello, I used your trained model to test 6D pose estimation on the LineMOD dataset, but all the accuracies are zeros or None. What's wrong with my code?

opened by jiangziben 2
The question about output of 'monte_carlo_forward'

Hello, and thanks for releasing your source code. I want to know the difference between "pose_opt" and "pose_opt_plus". If I want to use the pose in my own loss, which one should I pick? Thanks for your answer!

return pose_opt, cost, pose_opt_plus, pose_samples, pose_sample_logweights, cost_init

opened by dream-chaser 2
"The VOC2012 is only used as background data"? What do you mean?
Hello, I want to ask a question.

When debugging the code of the EPro-PnP-6DoF project, I saw the line rgb = self.change_bg(rgb, msk) in lm.py and didn't understand it. Why do I need to change the background color?

Under the training path "EPro-PnP-6DoF/exp/epropnp_basic/XX/train_vis_0/", I found that all the "xxx.png" files are strange pictures: the background comes from the VOC dataset, and in the middle is one of the 13 training objects, e.g. a desk lamp or a seat appearing on someone's arm, or a toy appearing on a person's face, which looks extremely incongruous. Is this normal?

opened by 1545344006 2
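Pasting the object onto random real images is a common augmentation so that the network cannot rely on the training images' original backgrounds; such composites are expected to look physically implausible. A minimal sketch of mask-based background replacement follows. The function names are hypothetical, and this is not the repository's change_bg, whose details (e.g. how the VOC image is resized or cropped) are not shown here.

```python
import numpy as np

def random_crop(image, h, w, rng):
    """Random (h, w) window from a larger (H, W, 3) background image."""
    H, W = image.shape[:2]
    y = rng.integers(0, H - h + 1)
    x = rng.integers(0, W - w + 1)
    return image[y:y + h, x:x + w]

def replace_background(rgb, mask, background):
    """Keep pixels where mask is True; take every other pixel from background.

    rgb, background: (H, W, 3) arrays of the same shape; mask: (H, W) bool.
    """
    return np.where(mask[..., None], rgb, background)

rng = np.random.default_rng(0)
rgb = np.full((4, 4, 3), 200, dtype=np.uint8)          # stand-in object image
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                                   # object occupies the center
background = random_crop(np.zeros((10, 12, 3), dtype=np.uint8), 4, 4, rng)
augmented = replace_background(rgb, mask, background)
```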
Is there any existing work that applies this fantastic algorithm to the BOP test?

Hi, first, congratulations on your great work! I found that the dataset structure in your "data preparation" section differs from that of the official BOP dataset: the official BOP dataset may not contain VOC2012 as background, and the contents of LM differ from yours. So I wonder how to test EPro-PnP on the BOP dataset (using the BOP toolkit). Are there any suggestions or existing work? Thanks a lot!

opened by minghuiwsw 1
Norm Factor Intuition
Hello, first I want to thank you for the excellent work! I have three questions about the norm factor (global scaling) learned in EPro-PnP-6DoF's rotation head.

What is the reason for learning it in the first place? The paper states that it is a global concentration of the predicted pose distribution, but I'm still not sure what that means. Is it some kind of mean or median?

What is the effect of including it in the Monte Carlo pose loss? I saw that you use it to divide the loss in the code.

Is it possible to obtain the norm factor other than by learning it?

Thank you very much!

opened by korizona 1
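One way to read "global concentration": if a single scale s multiplies all 2D weights, every residual f_i is multiplied by s and the cost by s², which sharpens or flattens the pose distribution without moving its mode (argmin of s²·C(y) equals argmin of C(y)). The sketch assumes, as a hypothetical simplification, a scalar pose y and residuals linear in y, so the distribution is exactly Gaussian.

```python
import math

def posterior_std(scale, weights, slopes):
    """Std of p(y) ∝ exp(-0.5 * sum_i (scale * w_i * (a_i * y - b_i)) ** 2).

    The exponent is quadratic in y, so p(y) is Gaussian with precision
    scale**2 * sum_i (w_i * a_i)**2; the offsets b_i only shift the mean.
    """
    precision = scale ** 2 * sum((w * a) ** 2 for w, a in zip(weights, slopes))
    return 1.0 / math.sqrt(precision)

weights, slopes = [1.0, 1.0], [1.0, 1.0]      # hypothetical per-point weights and slopes
std_s1 = posterior_std(1.0, weights, slopes)
std_s2 = posterior_std(2.0, weights, slopes)  # doubling the scale halves the spread
```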
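When making my custom dataset, how do I generate the .pkl files under the real_train path?

Hello, there is a question that has puzzled me for a long time, so I came here specifically to ask you. I'm eager for your answer, and thank you in advance.

Our application is grasping a specified target object with a robotic arm, so we need to accurately estimate the target's pose relative to the camera coordinate frame. Our images are generated through a "Creator modeling + Vega simulation" pipeline, i.e., all images are synthetic. During simulation, we can also output the target's pose under the current viewpoint (convertible into a 4×4 homogeneous transformation matrix containing rotation and translation components). When making a LineMOD-style custom dataset, we ran into the following problem:

How do we generate data like "lm\real_train\ape\000005-coor.pkl"? In other words, how are files like "000005-coor.pkl" produced?

I analyzed it carefully: the coor array in "000005-coor.pkl" has shape (57, 44, 3), which matches the data in "000005-box.txt" in the same directory:

Next, we counted the number of nonzero rows in coor (we assumed this data represents the object's vertices) and found that it does not equal the total number of nonzero pixels in "000005-label.png" in the same directory:

Based on this discrepancy, I made a further inference: we may need to transform the original model file (models\ape\ape.ply) according to the object's pose (as given by "000005-pose.txt" in the same directory) in order to generate a coor.pkl file consistent with "000005-label.png" and the other accompanying data. To this end, I tried MATLAB's pctransform function, but unfortunately I have not yet managed to call it successfully.

Or is there no explicit correspondence between coor.pkl and the original model file ape.ply?

Is my line of analysis correct? Or what approach should I take to figure out how the coor.pkl data is produced? Could you please point me in the right direction? Many thanks.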

opened by TopPseudoExpert 3
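One common way to produce per-pixel object-coordinate maps (an assumption here; the source does not say how the LineMOD coor.pkl files were generated) is to render the model at the annotated pose and store, for every covered pixel, the model-frame 3D coordinate of the visible surface point. The sketch approximates rendering by splatting mesh vertices with a z-buffer, so it leaves holes between vertices that a triangle rasterizer would fill; a sparse splat will generally not cover exactly the same pixels as a segmentation mask. The function name and the box layout (x0, y0, w, h) are hypothetical.

```python
import numpy as np

def object_coordinate_map(verts, R, t, K, box):
    """Splat model-frame vertices into an (h, w, 3) map of object coordinates.

    verts: (N, 3) model-frame points; R: (3, 3); t: (3,); K: (3, 3) intrinsics;
    box: (x0, y0, w, h) crop in pixels. Uncovered pixels stay zero.
    """
    x0, y0, w, h = box
    cam = verts @ R.T + t                  # model frame -> camera frame
    front = cam[:, 2] > 0                  # drop points behind the camera
    cam, pts = cam[front], verts[front]
    uvw = cam @ K.T                        # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int) - x0
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int) - y0
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    coor = np.zeros((h, w, 3))
    depth = np.full((h, w), np.inf)
    for ui, vi, zi, p in zip(u[inside], v[inside], cam[inside, 2], pts[inside]):
        if zi < depth[vi, ui]:             # z-buffer: keep the nearest point
            depth[vi, ui] = zi
            coor[vi, ui] = p               # store the MODEL-frame coordinate
    return coor

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
verts = np.array([[0.0, 0.0, -1.0],        # nearer to the camera: visible
                  [0.0, 0.0, 1.0],         # same pixel, farther away: hidden
                  [0.0, 0.0, -5.0]])       # behind the camera: dropped
coor = object_coordinate_map(verts, np.eye(3), np.array([0.0, 0.0, 2.0]), K, (40, 40, 20, 20))
```

With a 4×4 homogeneous pose T from the simulator, R = T[:3, :3] and t = T[:3, 3]; the model vertices and t must use the same length unit.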

Related projects

[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

This repository is the official PyTorch implementation of ContextPose: Context Modeling in 3D Human Pose Estimation: A Unified Perspective (CVPR 2021).

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral)

[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. In CVPR 2022.

Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

[CVPR 2022] PyTorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions" paper

Official repository for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'21, Oral Presentation)

Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Code for "Single-view robot pose and joint angle estimation via render & compare", CVPR 2021 (Oral).

Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

Code for "PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation" CVPR 2019 oral

Affine / perspective transformation in Pose Estimation with Tensorflow 2

Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Monocular 3D pose estimation. OpenVINO. CPU inference or iGPU (OpenCL) inference.