Overview
This repository contains the code for the paper "Single-Stage 6D Object Pose Estimation" by Yinlin Hu, Pascal Fua, Wei Wang, and Mathieu Salzmann, CVPR 2020.
Most recent 6D pose estimation frameworks first rely on a deep network to establish correspondences between 3D object keypoints and 2D image locations and then use a variant of a RANSAC-based Perspective-n-Point (PnP) algorithm. This two-stage process, however, is suboptimal: First, it is not end-to-end trainable. Second, training the deep network relies on a surrogate loss that does not directly reflect the final 6D pose estimation task.
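For concreteness, the second stage of such a pipeline typically looks like the following minimal sketch, which applies OpenCV's RANSAC-based PnP solver to placeholder data (the keypoints, predicted 2D locations, and camera matrix are illustrative, not from this repository):

import numpy as np
import cv2

# Placeholder data: eight 3D bounding-box corners in the object frame,
# their predicted 2D image locations, and a pinhole camera matrix.
points_3d = np.random.rand(8, 3).astype(np.float32)
points_2d = (np.random.rand(8, 2) * 640.0).astype(np.float32)
camera_matrix = np.array([[600.0, 0.0, 320.0],
                          [0.0, 600.0, 240.0],
                          [0.0, 0.0, 1.0]], dtype=np.float32)

# Stage two of the two-stage pipeline: RANSAC-based PnP. No gradient flows
# back through this step, which is the end-to-end training limitation
# discussed above.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    points_3d, points_2d, camera_matrix, distCoeffs=None)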
In this work, we introduce a deep architecture that directly regresses 6D poses from correspondences. It takes as input a group of candidate correspondences for each 3D keypoint and accounts for the fact that the order of the correspondences within each group is irrelevant, while the order of the groups, that is, of the 3D keypoints, is fixed. Our architecture is generic and can thus be exploited in conjunction with existing correspondence-extraction networks so as to yield single-stage 6D pose estimation frameworks. Our experiments demonstrate that these single-stage frameworks consistently outperform their two-stage counterparts in terms of both accuracy and speed.
Figure 1: Motivation. Consider the modern 6D pose estimation algorithm of SegDrivenPose that uses a deep network to predict several 2D correspondences for each of the eight 3D corners of the pitcher's bounding box. (a) Because it minimizes the average 2D error of these correspondences, two instances of such a framework could produce correspondences that differ but have the same average accuracy, such as the green and the red ones. As evidenced by the projected green and red reference frames, applying a RANSAC-based PnP algorithm to these two sets of correspondences can yield substantially different poses. (b) Even when using only the set of green correspondences, simply changing their order causes a RANSAC-based PnP algorithm to return different solutions.
Figure 2: Overall architecture for single-stage 6D object pose estimation. After establishing 3D-to-2D correspondences with a segmentation-driven CNN for 6D pose, we use three main modules to infer the pose directly from these correspondence clusters: a local feature extraction module with shared network parameters, a feature aggregation module operating within each cluster, and a global inference module consisting of simple fully-connected layers that estimates the final pose as a quaternion and a translation. The color in the CNN outputs indicates the direction of the 2D offset from the grid cell center to the corresponding projected 3D bounding box corner.
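A minimal PyTorch sketch of these three modules is given below, assuming each of the K 3D keypoints comes with a cluster of N candidate correspondences encoded as 5-vectors (3D keypoint plus 2D location); all layer widths and the max-pooling aggregation are illustrative choices, not the paper's exact configuration:

import torch
import torch.nn as nn

class PoseFromCorrespondences(nn.Module):
    # Sketch of the three modules: shared local feature extraction,
    # order-invariant aggregation within each cluster, and fully-connected
    # global inference of a quaternion and a translation.
    def __init__(self, num_keypoints=8, feat_dim=128):
        super().__init__()
        # Local feature extraction: a shared MLP applied to every
        # 5-vector correspondence independently.
        self.local = nn.Sequential(
            nn.Linear(5, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Global inference from the concatenated per-cluster features.
        self.global_head = nn.Sequential(
            nn.Linear(num_keypoints * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 7),  # 4D quaternion + 3D translation
        )

    def forward(self, corrs):
        # corrs: (B, K, N, 5) -- K keypoint clusters of N candidates each.
        feats = self.local(corrs)             # (B, K, N, feat_dim)
        # Max-pooling over N makes the order within each cluster irrelevant;
        # the K clusters keep their fixed order through concatenation.
        pooled = feats.max(dim=2).values      # (B, K, feat_dim)
        out = self.global_head(pooled.flatten(1))
        quat = nn.functional.normalize(out[:, :4], dim=1)  # unit quaternion
        trans = out[:, 4:]
        return quat, trans

Max-pooling is one standard symmetric function for achieving the within-cluster permutation invariance; the aggregation actually used in the paper may differ.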
How to Use
This repository contains the code for the core network, which infers the pose from keypoint correspondences. It is straightforward to merge with correspondence-extraction networks such as SegDrivenPose or PVNet to obtain an end-to-end 6D pose framework.
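As a hedged illustration of that combination (the CorrespondenceCNN stub below is a hypothetical stand-in for a SegDrivenPose- or PVNet-style front end, and PoseFromCorrespondences refers to the sketch above):

import torch
import torch.nn as nn

class CorrespondenceCNN(nn.Module):
    # Hypothetical stand-in for a segmentation-driven correspondence
    # extractor; it only maps an image to (B, K, N, 5) correspondence
    # clusters so that the end-to-end pipeline below runs.
    def __init__(self, num_keypoints=8, candidates=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=4, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_keypoints * candidates * 5)
        self.cluster_shape = (num_keypoints, candidates, 5)

    def forward(self, image):
        feat = self.backbone(image).flatten(1)
        return self.head(feat).view(image.size(0), *self.cluster_shape)

# End-to-end: correspondence extraction followed by direct pose regression,
# so a single pose loss can be backpropagated through both networks.
extractor = CorrespondenceCNN()
pose_net = PoseFromCorrespondences(num_keypoints=8)
quat, trans = pose_net(extractor(torch.rand(1, 3, 480, 640)))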
Citing
@inproceedings{hu2020singlestagepose,
  title={Single-Stage 6D Object Pose Estimation},
  author={Yinlin Hu and Pascal Fua and Wei Wang and Mathieu Salzmann},
  booktitle={CVPR},
  year={2020}
}