Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

NVIDIA Research Projects

Last update: Dec 27, 2022

CenterPose

Overview

This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image" by Lin et al. (full citation below). In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation, which operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions. These quantities are estimated sequentially, leveraging the recent idea of convGRU to propagate information from easier tasks to more difficult ones. We favor simplicity in our design choices: generic cuboid vertex coordinates, a single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark of real images, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the single-stage approach of MobilePose and 7.1% higher than the related two-stage approach). The algorithm runs at 15 fps on an NVIDIA GTX 1080Ti GPU.
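As a rough illustration of the 3D IoU metric mentioned above, the sketch below computes intersection-over-union for two axis-aligned 3D boxes. This is a simplification: the Objectron evaluation works with oriented boxes, and the function name and box representation here are illustrative assumptions that are not part of this repository.

```python
def _volume(lo, hi):
    """Volume of an axis-aligned box given its min and max corners."""
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= (b - a)
    return vol


def aabb_iou_3d(box_a, box_b):
    """IoU of two axis-aligned boxes, each given as (min_xyz, max_xyz).

    Hypothetical helper for illustration only; oriented boxes are not handled.
    """
    (a_lo, a_hi), (b_lo, b_hi) = box_a, box_b
    inter = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_lo, a_hi, b_lo, b_hi):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0  # the boxes do not overlap along this axis
        inter *= overlap
    union = _volume(a_lo, a_hi) + _volume(b_lo, b_hi) - inter
    return inter / union


if __name__ == "__main__":
    unit = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    shifted = ((0.5, 0.0, 0.0), (1.5, 1.0, 1.0))
    print(aabb_iou_3d(unit, shifted))
```

A higher value means the predicted cuboid overlaps the ground-truth cuboid more closely; identical boxes give 1 and disjoint boxes give 0.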

Installation

The code was tested on Ubuntu 16.04 with Anaconda Python 3.6 and PyTorch 1.1.0. Newer versions should also work, although the accuracy may differ somewhat. NVIDIA GPUs are needed for both training and testing.

Clone this repo:

CenterPose_ROOT=/path/to/clone/CenterPose
git clone https://github.com/NVlabs/CenterPose.git $CenterPose_ROOT

Create an Anaconda environment (or use your own virtual environment) and install the dependencies:

conda create -n CenterPose python=3.6
conda activate CenterPose
pip install -r requirements.txt
conda install -c conda-forge eigenpy

Compile the deformable convolutional layer

git submodule init
git submodule update
cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
./make.sh

[Optional] If you want to use a higher version of PyTorch, you need to download the latest version of DCNv2 and compile the library.

git submodule set-url src/lib/models/networks/DCNv2 https://github.com/jinfagang/DCNv2_latest.git
git submodule sync
git submodule update --init --recursive --remote
cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
./make.sh

Download our pre-trained models for CenterPose and move all the .pth files to $CenterPose_ROOT/models/CenterPose/. We currently provide models for 9 categories: bike, book, bottle, camera, cereal_box, chair, cup, laptop, and shoe.

Prepare training/testing data

We save all the training/testing data under $CenterPose_ROOT/data/.

For the Objectron dataset, we created our own data pre-processor to extract the data for training/testing. Refer to the data directory for more details.

Demo

We provide demos for images, videos, webcam input, and image folders. See $CenterPose_ROOT/images/CenterPose.

For category-level 6-DoF object pose estimation on images, videos, or image folders, run:

cd $CenterPose_ROOT/src
python demo.py --demo /path/to/image/or/folder/or/video --arch dlav1_34 --load_model ../path/to/model

You can also enable --debug 4 to save all the intermediate and final outputs.

For the webcam demo (you may want to specify the camera intrinsics via --cam_intrinsic), run:

cd $CenterPose_ROOT/src
python demo.py --demo webcam --arch dlav1_34 --load_model ../path/to/model
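For readers unfamiliar with camera intrinsics, here is a minimal sketch of the standard pinhole model: a 3x3 matrix built from focal lengths and the principal point, used to project a 3D point in the camera frame to pixel coordinates. The function names and numeric values are hypothetical, and the exact format that --cam_intrinsic expects is not covered here; check $CenterPose_ROOT/src/lib/opts.py for that.

```python
def intrinsic_matrix(fx, fy, cx, cy):
    """Pinhole intrinsic matrix K as a nested list (row-major)."""
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


def project(K, point_xyz):
    """Project a 3D point (camera frame, z pointing forward) to pixels (u, v)."""
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


if __name__ == "__main__":
    # Hypothetical values for a 640x480 webcam; calibrate your own camera.
    K = intrinsic_matrix(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    print(project(K, (0.5, 0.0, 2.0)))
```

Accurate intrinsics matter because the 6-DoF pose is recovered from 2D keypoints, so an incorrect focal length or principal point would skew the estimated pose.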

Training

We follow the approach of CenterNet for training the DLA network, reducing the learning rate by 10x after epochs 90 and 120, and stopping after 140 epochs.
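The step schedule described above can be sketched as a small function. This is an illustration rather than the repository's training code: the base learning rate is a hypothetical value, and it assumes 1-indexed epochs with each drop taking effect from the epoch after the milestone.

```python
def lr_at_epoch(epoch, base_lr, milestones=(90, 120), factor=0.1):
    """Learning rate used during a given 1-indexed epoch.

    Assumption: the rate is multiplied by `factor` once each milestone
    epoch has finished, so epoch 91 is the first to use the reduced rate.
    """
    drops = sum(1 for m in milestones if epoch > m)
    return base_lr * (factor ** drops)


if __name__ == "__main__":
    base_lr = 1e-3  # hypothetical value, not taken from the repository
    for epoch in (1, 90, 91, 120, 121, 140):
        print(epoch, lr_at_epoch(epoch, base_lr))
```

With these assumptions, epochs 1 to 90 use the base rate, epochs 91 to 120 use one tenth of it, and epochs 121 to 140 use one hundredth.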

For debugging, you can set all the local training parameters directly in the $CenterPose_ROOT/src/main_CenterPose.py script, or pass them on the command line instead. More options are listed in $CenterPose_ROOT/src/lib/opts.py.

To start a new training job, simply do the following, which will use default parameter settings:

cd $CenterPose_ROOT/src
python main_CenterPose.py

The results will be saved in $CenterPose_ROOT/exp/object_pose/$dataset_$category_$arch_$time, e.g., objectron_bike_dlav1_34_2021-02-27-15-33.
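To make the naming pattern concrete, the sketch below rebuilds the example directory name from its parts. The helper name is hypothetical and the timestamp format is inferred from the example above, so treat it as an assumption rather than the repository's actual code.

```python
from datetime import datetime


def experiment_dir_name(dataset, category, arch, when):
    """Compose <dataset>_<category>_<arch>_<time> for an experiment folder.

    The year-month-day-hour-minute timestamp layout is an assumption
    inferred from the README example.
    """
    stamp = when.strftime("%Y-%m-%d-%H-%M")
    return f"{dataset}_{category}_{arch}_{stamp}"


if __name__ == "__main__":
    name = experiment_dir_name(
        "objectron", "bike", "dlav1_34", datetime(2021, 2, 27, 15, 33)
    )
    print(name)
```

Reading the name from left to right therefore tells you the dataset, the object category, the network architecture, and when the training job started.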

You can then use TensorBoard to visualize the training process:

cd $path/to/folder
tensorboard --logdir=logs --host=XX.XX.XX.XX

Evaluation

We evaluate our method on the Objectron dataset. Please refer to the objectron_eval directory for more details.

Citation

Please cite CenterPose if you use this repository in your publications:

@article{lin2021single,
  title={Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image},
  author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A and Birchfield, Stan},
  journal={arXiv preprint arXiv:2109.06161},
  year={2021}
}

License

CenterPose is licensed under the NVIDIA Source Code License - Non-commercial.

Comments

Question about loss function

I noticed that you use 2D keypoints and relative cuboid dimensions for supervision. Could I also use the 6-DoF pose for supervision? Would this 6-DoF loss backpropagate correctly? Does the PnP algorithm affect backpropagation? Thanks for your reply!

opened by xiaoxin-Crayon 5

Question about scale factor on translation vector

Hello, thanks for this awesome work!

I have a question related to the scale factor: from your paper it's crystal clear to me that it's possible to recover just the relative size of the estimated 3D bounding box.

However, I was wondering whether the translation vector of the 6D pose is estimated at absolute scale (given the correct intrinsic parameters) or not.

I remain at your disposal for further clarification about my question.

opened by AlbertoRemus 2

Missing parameters when running demo.py

Hi, thanks for your code. I ran into a problem when trying to run demo.py. I loaded the pre-trained weights for 'chair' (chair_v1_140.pth) and used the images in the chair folder, but it reported that some parameters were missing, as shown below. I would appreciate your help.

Screenshot of the error: https://user-images.githubusercontent.com/95119853/188465002-d4c4eecb-409f-4c25-bc42-8a40723d66af.png

opened by NeilLHY 1

Question about experiments

Great work and a nicely documented repository.

First, is there a plan to update DCNv2 to the latest version so that it also works with the newest PyTorch release?

Second, I was wondering whether any follow-up work using the NOCS dataset is planned.

opened by HannahHaensen 1

Evaluation problem

Could you please provide a detailed procedure for evaluating the model? I ran the evaluation with the code you provided, but it does not output any results. Thanks.

opened by YC0315 0

ImportError: dlopen: cannot load any more object with static TLS

When executing "python main_CenterPose.py", an error is reported: ImportError: dlopen: cannot load any more object with static TLS. Have you encountered this?

opened by YC0315 1

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

(CenterPose) dell1804@dell1804-G3-3590:~/center_pose_ws/CenterPose/src$ python demo.py --demo ../data/book.jpg --arch dlav1_34 --load_model ../models/CenterPose/book_v1_140.pth
/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  FutureWarning)
Fix size testing.
training chunk_sizes: [1]
The output will be saved to /home/dell1804/center_pose_ws/CenterPose/src/lib/../../exp/object_pose/default
heads {'hm': 1, 'wh': 2, 'hps': 16, 'reg': 2, 'hm_hp': 8, 'hp_offset': 2, 'scale': 3}
Creating model...
loaded ../models/CenterPose/book_v1_140.pth, epoch 140
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
Traceback (most recent call last):
  File "demo.py", line 156, in <module>
    demo(opt, meta)
  File "demo.py", line 83, in demo
    ret = detector.run(image_name, meta_inp=meta)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/detectors/base_detector.py", line 474, in run
    images, self.pre_images, pre_hms, pre_hm_hp, pre_inds, return_time=True)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/detectors/object_pose.py", line 135, in process
    output = self.model(images, pre_images, pre_hms, pre_hm_hp)[-1]
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 531, in forward
    x = self.dla_up(x)
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 441, in forward
    ida(layers, len(layers) - i - 2, len(layers))
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 415, in forward
    layers[i] = upsample(project(layers[i]))
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 387, in forward
    x = self.conv(x)
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/DCNv2/dcn_v2.py", line 128, in forward
    self.deformable_groups)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/DCNv2/dcn_v2.py", line 31, in forward
    ctx.deformable_groups)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

(CenterPose) dell1804@dell1804-G3-3590:~/center_pose_ws/CenterPose/src$ nvidia-smi
/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help for usage information.

/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help for usage information.

Sun Oct 9 11:10:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P8     2W /  N/A |   1083MiB /  3911MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1165      G   /usr/lib/xorg/Xorg                226MiB |
|    0   N/A  N/A      1846      G   /usr/bin/gnome-shell               50MiB |
|    0   N/A  N/A      3778      G   ...428520904353170423,131072       72MiB |
|    0   N/A  N/A     24592      C   python                            727MiB |
+-----------------------------------------------------------------------------+

Python 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:41) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.1.0'

opened by dbdxnuliba 1

Question about obj_scale_loss

Hello, thank you for the nice work! I have a question about obj_scale_loss. Why do you use different forms of the scale loss in the training and validation phases? Specifically, in the training phase:
https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/trains/object_pose.py#L116-L117
https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/models/losses.py#L167
and in the validation phase:
https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/trains/object_pose.py#L126-L128
https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/models/losses.py#L174-L176

torch.abs(target * mask - pred * mask) and torch.abs((1 * mask - pred * mask) / target_rmzero) do not produce the same values. I would like to know what the "relative loss" in the validation phase means and why it is used only in the validation phase.

opened by comvee 0

How can I use realsense camera D435 to run the demo.py?

Thanks for your state-of-the-art work. The instructions show how to run demo.py with a webcam, and I ran it successfully. Next, I want to run demo.py with a RealSense D435 camera. Could you provide any instructions?

opened by TiRaMisu1024 0

Owner

NVIDIA Research Projects

GitHub

Single-Stage 6D Object Pose Estimation, CVPR 2020

Overview This repository contains the code for the paper Single-Stage 6D Object Pose Estimation. Yinlin Hu, Pascal Fua, Wei Wang and Mathieu Salzmann.

89 Dec 26, 2022

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019

PoseNet of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image" Introduction This repo is official Py

677 Dec 25, 2022

Official PyTorch implementation of CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds

CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds Introduction This is the official PyTorch implementation of o

96 Dec 7, 2022

This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

367 Dec 27, 2022

PoseCamera is python based SDK for human pose estimation through RGB webcam.

PoseCamera PoseCamera is python based SDK for human pose estimation through RGB webcam. Install install posecamera package through pip pip install pos

7 Jul 20, 2021

PN-Net a neural field-based framework for depth estimation from single-view RGB images.

PN-Net We present a neural field-based framework for depth estimation from single-view RGB images. Rather than representing a 2D depth map as a single

1 Oct 2, 2021

FishNet: One Stage to Detect, Segmentation and Pose Estimation

FishNet FishNet: One Stage to Detect, Segmentation and Pose Estimation Introduction In this project, we combine target detection, instance segmentatio

1 Oct 5, 2022

CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

CoReNet CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objec

80 Dec 25, 2022

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and testing data for various deep learning projects such as 6D object pose estimation projects singleshotpose, as well as object detection and instance segmentation projects.

305 Dec 16, 2022

Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

CMIC-Retrieval Code for Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning. ICCV 2021. Introduction In this wo

42 Nov 17, 2022

[ICRA 2022] CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation

This is the official implementation of our paper: Bowen Wen, Wenzhao Lian, Kostas Bekris, and Stefan Schaal. "CaTGrasp: Learning Category-Level Task-R

199 Jan 4, 2023

DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

143 Dec 22, 2022

3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

0 Feb 6, 2022

"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (CVPRW 2022) Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Z

274 Jan 5, 2023
