Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

Overview

CenterPose

Overview

This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image" by Lin et al. (full citation below). In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation, which operates on unknown object instances within a known category using a single RGB image input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, a single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark of real images, outperforming state-of-the-art methods for 3D IoU metric (27.6% higher than the single-stage approach of MobilePose and 7.1% higher than the related two-stage approach). The algorithm runs at 15 fps on an NVIDIA GTX 1080Ti GPU.

Installation

The code was tested on Ubuntu 16.04, with Anaconda Python 3.6 and PyTorch 1.1.0. Higher versions should be possible with some accuracy difference. NVIDIA GPUs are needed for both training and testing.

  1. Clone this repo:

    CenterPose_ROOT=/path/to/clone/CenterPose
    git clone https://github.com/NVlabs/CenterPose.git $CenterPose_ROOT
    
  2. Create an Anaconda environment or create your own virtual environment

    conda create -n CenterPose python=3.6
    conda activate CenterPose
    pip install -r requirements.txt
    conda install -c conda-forge eigenpy
    
  3. Compile the deformable convolutional layer

    git submodule init
    git submodule update
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    

    [Optional] If you want to use a higher version of PyTorch, you need to download the latest version of DCNv2 and compile the library.

    git submodule set-url https://github.com/jinfagang/DCNv2_latest.git src/lib/models/networks/DCNv2
    git submodule sync
    git submodule update --init --recursive --remote
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    
  4. Download our pre-trained models for CenterPose and move all the .pth files to $CenterPose_ROOT/models/CenterPose/. We currently provide models for 9 categories: bike, book, bottle, camera, cereal_box, chair, cup, laptop, and shoe.

  5. Prepare training/testing data

    We save all the training/testing data under $CenterPose_ROOT/data/.

    For the Objectron dataset, we created our own data pre-processor to extract the data for training/testing. Refer to the data directory for more details.

Demo

We provide supporting demos for image, videos, webcam, and image folders. See $CenterPose_ROOT/images/CenterPose

For category-level 6-DoF object estimation on images/video/image folders, run:

cd $CenterPose_ROOT/src
python demo.py --demo /path/to/image/or/folder/or/video --arch dlav1_34 --load_model ../path/to/model

You can also enable --debug 4 to save all the intermediate and final outputs.

For the webcam demo (You may want to specify the camera intrinsics via --cam_intrinsic), run

cd $CenterPose_ROOT/src
python demo.py --demo webcam --arch dlav1_34 --load_model ../path/to/model

Training

We follow the approach of CenterNet for training the DLA network, reducing the learning rate by 10x after epoch 90 and 120, and stopping after 140 epochs.

For debug purposes, you can put all the local training params in the $CenterPose_ROOT/src/main_CenterPose.py script. You can also use the command line instead. More options are in $CenterPose_ROOT/src/lib/opts.py.

To start a new training job, simply do the following, which will use default parameter settings:

cd $CenterPose_ROOT/src
python main_CenterPose.py

The result will be saved in $CenterPose_ROOT/exp/object_pose/$dataset_$category_$arch_$time ,e.g., objectron_bike_dlav1_34_2021-02-27-15-33

You could then use tensorboard to visualize the training process via

cd $path/to/folder
tensorboard --logdir=logs --host=XX.XX.XX.XX

Evaluation

We evaluate our method on the Objectron dataset, please refer to the objectron_eval directory for more details.

Citation

Please cite grasp_primitiveShape if you use this repository in your publications:

@article{lin2021single,
  title={Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image},
  author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A and Birchfield, Stan},
  journal={arXiv preprint arXiv:2109.06161},
  year={2021}
}

Licence

CenterPose is licensed under the NVIDIA Source Code License - Non-commercial.

Comments
  • Question about loss function

    Question about loss function

    I noticed that you use 2d keypoints & relative cuboid dimensions for supervision, could i also use the 6-DOF pose for surpervision? this 6-DOF loss could backward correctly?Does PNP algorithm affect back propagation? thanks for your reply!

    opened by xiaoxin-Crayon 5
  • Question about scale factor on translation vector

    Question about scale factor on translation vector

    Hello, thanks for this awesome work!

    I have a question related to the scale factor: from your paper it's crystal clear to me that it's possible to recover just the relative size of the estimated 3D bounding box.

    However I was wondering whether the translation vector of the 6D pose is estimated with absolute scale (provided the correct intrinsic parameters) or not

    I remain at your disposal for further clarification about my question.

    opened by AlbertoRemus 2
  • lack some parameters to run demo.py

    lack some parameters to run demo.py

    Hi, thanks for your code. I encountered a problem when I try to run demo.py. I loaded the pre-trained parameters of 'chair' (chair_v1_140.pth), and used the images in the chair folder. But it prompted that some parameters were missing, as shown below. I would appreciate your help.

    F4`)BX](https://user-images.githubusercontent.com/95119853/188465002-d4c4eecb-409f-4c25-bc42-8a40723d66af.png)

    opened by NeilLHY 1
  • Question about experiements

    Question about experiements

    Great work and nice documented repository.

    First is there a plan to update DCNv2 to the latest working then also on the newest PyTorch version?

    Second, I have been wondering if there is any subsequent work planed using the NOCS dataset?

    opened by HannahHaensen 1
  • eval problem

    eval problem

    Could you please provide a detailed procedure for evaluating the model? Because I tested it according to the code you provided and can't output any results, thanks.

    opened by YC0315 0
  • ImportError: dlopen: cannot load any more object with static TLS

    ImportError: dlopen: cannot load any more object with static TLS

    When executing "python main_CenterPose.py", an error is reported: ImportError: dlopen: cannot load any more object with static TLS. Have you encountered it?

    opened by YC0315 1
  • RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

    RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

    (CenterPose) dell1804@dell1804-G3-3590:~/center_pose_ws/CenterPose/src$ python demo.py --demo ../data/book.jpg --arch dlav1_34 --load_model ../models/CenterPose/book_v1_140.pth /home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead. FutureWarning) Fix size testing. training chunk_sizes: [1] The output will be saved to /home/dell1804/center_pose_ws/CenterPose/src/lib/../../exp/object_pose/default heads {'hm': 1, 'wh': 2, 'hps': 16, 'reg': 2, 'hm_hp': 8, 'hp_offset': 2, 'scale': 3} Creating model... loaded ../models/CenterPose/book_v1_140.pth, epoch 140 THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument Traceback (most recent call last): File "demo.py", line 156, in demo(opt, meta) File "demo.py", line 83, in demo ret = detector.run(image_name, meta_inp=meta) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/detectors/base_detector.py", line 474, in run images, self.pre_images, pre_hms, pre_hm_hp, pre_inds, return_time=True) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/detectors/object_pose.py", line 135, in process output = self.model(images, pre_images, pre_hms, pre_hm_hp)[-1] File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call result = self.forward(*input, **kwargs) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 531, in forward x = self.dla_up(x) File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call result = self.forward(*input, **kwargs) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 441, in forward ida(layers, len(layers) - i - 2, len(layers)) File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call result = self.forward(*input, **kwargs) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 415, in forward layers[i] = upsample(project(layers[i])) File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call result = self.forward(*input, **kwargs) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 387, in forward x = self.conv(x) File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call result = self.forward(*input, **kwargs) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/DCNv2/dcn_v2.py", line 128, in forward self.deformable_groups) File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/DCNv2/dcn_v2.py", line 31, in forward ctx.deformable_groups) RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

    (CenterPose) dell1804@dell1804-G3-3590:~/center_pose_ws/CenterPose/src$ nvidia-smi /usr/bin/nvidia-modprobe: unrecognized option: "-s"

    ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help for usage information.

    /usr/bin/nvidia-modprobe: unrecognized option: "-s"

    ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help for usage information.

    Sun Oct 9 11:10:47 2022
    +-----------------------------------------------------------------------------+ | NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A | | N/A 50C P8 2W / N/A | 1083MiB / 3911MiB | 14% Default | | | | N/A | +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 1165 G /usr/lib/xorg/Xorg 226MiB | | 0 N/A N/A 1846 G /usr/bin/gnome-shell 50MiB | | 0 N/A N/A 3778 G ...428520904353170423,131072 72MiB | | 0 N/A N/A 24592 C python 727MiB | +-----------------------------------------------------------------------------+

    Python 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:41) [GCC 9.4.0] on linux Type "help", "copyright", "credits" or "license" for more information.

    import torch torch.version '1.1.0'

    opened by dbdxnuliba 1
  • Question about obj_scale_loss

    Question about obj_scale_loss

    Hello, thank you for the nice work! I have a question about obj_scale_loss. Why do you use different forms of the scale loss in training and validation phase? Specifically, in the training phase: https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/trains/object_pose.py#L116-L117 https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/models/losses.py#L167 and in the validation phase: https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/trains/object_pose.py#L126-L128 https://github.com/NVlabs/CenterPose/blob/6c89d420b33bd01c14c13f509af08bfe3d8b2fe7/src/lib/models/losses.py#L174-L176

    torch.abs(target * mask - pred * mask) and torch.abs((1 * mask - pred * mask) / target_rmzero) does not produce same values. I want to know the meaning of the "relative loss" in the validation phase and why it is only used in the validation phase.

    opened by comvee 0
  • How can I use realsense camera D435 to run the demo.py?

    How can I use realsense camera D435 to run the demo.py?

    Thanks for your state -of-the-art. the instructions show using webcam to run demo.py and I run it successfully. further I want to run demo.py using realsense camera D435 , can I get any instructions?

    opened by TiRaMisu1024 0
Owner
NVIDIA Research Projects
NVIDIA Research Projects
Single-Stage 6D Object Pose Estimation, CVPR 2020

Overview This repository contains the code for the paper Single-Stage 6D Object Pose Estimation. Yinlin Hu, Pascal Fua, Wei Wang and Mathieu Salzmann.

CVLAB @ EPFL 89 Dec 26, 2022
Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019

PoseNet of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image" Introduction This repo is official Py

Gyeongsik Moon 677 Dec 25, 2022
Official PyTorch implementation of CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds

CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds Introduction This is the official PyTorch implementation of o

Yijia Weng 96 Dec 7, 2022
This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression Introduction In this paper, we are interested in the bottom-up paradigm of estima

HRNet 367 Dec 27, 2022
PoseCamera is python based SDK for human pose estimation through RGB webcam.

PoseCamera PoseCamera is python based SDK for human pose estimation through RGB webcam. Install install posecamera package through pip pip install pos

WonderTree 7 Jul 20, 2021
PN-Net a neural field-based framework for depth estimation from single-view RGB images.

PN-Net We present a neural field-based framework for depth estimation from single-view RGB images. Rather than representing a 2D depth map as a single

null 1 Oct 2, 2021
FishNet: One Stage to Detect, Segmentation and Pose Estimation

FishNet FishNet: One Stage to Detect, Segmentation and Pose Estimation Introduction In this project, we combine target detection, instance segmentatio

null 1 Oct 5, 2022
CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

CoReNet CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objec

Google Research 80 Dec 25, 2022
Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and testing data for various deep learning projects such as 6D object pose estimation projects singleshotpose, as well as object detection and instance segmentation projects.

null 305 Dec 16, 2022
Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

CMIC-Retrieval Code for Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning. ICCV 2021. Introduction In this wo

null 42 Nov 17, 2022
[ICRA 2022] CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation

This is the official implementation of our paper: Bowen Wen, Wenzhao Lian, Kostas Bekris, and Stefan Schaal. "CaTGrasp: Learning Category-Level Task-R

Bowen Wen 199 Jan 4, 2023
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 6, 2022
"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (CVPRW 2022) Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Z

Yuanhao Cai 274 Jan 5, 2023
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Build Type Linux MacOS Windows Build Status OpenPose has represented the first real-time multi-person system to jointly detect human body, hand, facia

null 25.7k Jan 9, 2023
Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition"

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021. For details of the model and experiments, please see our paper.

tricktreat 87 Dec 16, 2022
Code for our NeurIPS 2021 paper Mining the Benefits of Two-stage and One-stage HOI Detection

CDN Code for our NeurIPS 2021 paper "Mining the Benefits of Two-stage and One-stage HOI Detection". Contributed by Aixi Zhang*, Yue Liao*, Si Liu, Mia

null 71 Dec 14, 2022
Code for Mining the Benefits of Two-stage and One-stage HOI Detection

Status: Archive (code is provided as-is, no updates expected) PPO-EWMA [Paper] This is code for training agents using PPO-EWMA and PPG-EWMA, introduce

OpenAI 33 Dec 15, 2022
Virtual Dance Reality Stage: a feature that offers you to share a stage with another user virtually

Portrait Segmentation using Tensorflow This script removes the background from an input image. You can read more about segmentation here Setup The scr

null 291 Dec 24, 2022