SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement

Overview

This repository implements the approach described in SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement (WACV 2022).

Iterative registration using SporeAgent:
The initial pose from PoseCNN (purple) and the final pose using SporeAgent (blue) on the LINEMOD (left, cropped) and YCB-Video (right) datasets.

Scene-level Plausibility:
The initial scene configuration from PoseCNN (left) results in an implausible pose of the target object (gray). Refinement using SporeAgent (right) results in a plausible scene configuration where the intersecting points (red) are resolved and the object rests on its supported points (cyan).

                       LINEMOD                               YCB-Video
                       AD < 0.10d  AD < 0.05d  AD < 0.02d    ADD AUC  AD AUC  ADI AUC
PoseCNN                      62.7        26.9         3.3       51.5    61.3     75.2
Point-to-Plane ICP           92.6        79.8        29.9       68.2    79.2     87.8
w/ VeREFINE                  96.1        85.8        32.5       70.1    81.0     88.8
Multi-hypothesis ICP         99.3        89.9        35.6       77.4    86.6     92.6
SporeAgent                   99.3        93.7        50.3       79.0    88.8     93.6

Comparison on LINEMOD and YCB-Video:
The initial pose and segmentation estimates are computed using PoseCNN. We compare our approach to vanilla Point-to-Plane ICP (from Open3D), to Point-to-Plane ICP augmented by the simulation-based VeREFINE approach, and to the ICP-based multi-hypothesis approach used for refinement in PoseCNN.
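
For reference, the AD metrics above follow the standard ADD/ADI definitions: ADD averages the distances between corresponding model points under the ground-truth and the estimated pose, while ADI (used for symmetric objects) averages nearest-neighbor distances instead; a pose counts as correct at threshold k·d if this error is below the fraction k of the object diameter d. A minimal sketch of these metrics (illustrative, not code from this repository):

# Illustrative ADD/ADI pose-error metrics (not this repository's code).
import numpy as np
from scipy.spatial import cKDTree

def add_error(pts, R_gt, t_gt, R_est, t_est):
    """Mean distance between corresponding model points (ADD)."""
    p_gt = pts @ R_gt.T + t_gt
    p_est = pts @ R_est.T + t_est
    return np.linalg.norm(p_gt - p_est, axis=1).mean()

def adi_error(pts, R_gt, t_gt, R_est, t_est):
    """Mean nearest-neighbor distance (ADI), for symmetric objects."""
    p_gt = pts @ R_gt.T + t_gt
    p_est = pts @ R_est.T + t_est
    dists, _ = cKDTree(p_est).query(p_gt, k=1)
    return dists.mean()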

Dependencies

The code has been tested on Ubuntu 16.04 and 20.04 with Python 3.6 and CUDA 10.2. To set up the Python environment, use Anaconda and the provided YAML file:

conda env create -f environment.yml --name sporeagent
conda activate sporeagent

The BOP Toolkit is additionally required. The BOP_PATH in config.py needs to be changed to the respective clone directory and the packages required by the BOP Toolkit need to be installed.
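
For orientation, the path settings referenced throughout this README might look as follows in config.py (a sketch: the variable names are taken from this README, all concrete paths are placeholders, and the actual file may contain further settings):

# Sketch of the path settings referenced in this README; all paths are placeholders.
BOP_PATH = "/path/to/bop_toolkit"            # clone of the BOP Toolkit
LM_PATH = "/path/to/bop_datasets/lm"         # PATH_TO_BOP_LM (see Datasets)
YCBV_PATH = "/path/to/bop_datasets/ycbv"     # PATH_TO_BOP_YCBV (see Datasets)
POSECNN_LM_RESULTS_PATH = "/path/to/posecnn_lm_results"
POSECNN_YCBV_RESULTS_PATH = "/path/to/YCB_Video_toolbox"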

The YCB-Video Toolbox is required for experiments on the YCB-Video dataset.

Datasets

We use the dataset versions prepared for the BOP challenge. The required files can be downloaded to a directory of your choice using the following bash script:

export SRC=http://ptak.felk.cvut.cz/6DB/public/bop_datasets
export DATASET=ycbv                       # either "lm" or "ycbv"
wget $SRC/${DATASET}_base.zip             # Base archive with dataset info, camera parameters, etc.
wget $SRC/${DATASET}_models.zip           # 3D object models.
wget $SRC/${DATASET}_test_all.zip         # All test images.
unzip ${DATASET}_base.zip                 # Contains the folder $DATASET.
unzip ${DATASET}_models.zip -d $DATASET   # Unpacks to $DATASET.
unzip ${DATASET}_test_all.zip -d $DATASET # Unpacks to $DATASET.

For training on YCB-Video, the ${DATASET}_train_real.zip archive is additionally required.
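
After extraction, the expected BOP layout can be sanity-checked; a minimal sketch, assuming the standard BOP structure with models/ and test/ inside the dataset folder:

# Quick sanity check of the extracted BOP layout (placeholder path).
from pathlib import Path

dataset_dir = Path("/path/to/bop_datasets/ycbv")
for sub in ("models", "test"):
    assert (dataset_dir / sub).is_dir(), f"missing {dataset_dir / sub}"
print("BOP dataset layout looks complete.")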

In addition, we have prepared point clouds sampled within the ground-truth masks (for training) and within the segmentation masks computed using PoseCNN (for evaluation) for the LINEMOD and YCB-Video datasets. The samples for evaluation also include the initial pose estimates from PoseCNN.

LINEMOD

Extract the prepared samples into PATH_TO_BOP_LM/sporeagent/ and set LM_PATH in config.py to the base directory, i.e., PATH_TO_BOP_LM. Download the PoseCNN results and the corresponding image set definitions provided with DeepIM and extract both into POSECNN_LM_RESULTS_PATH. Finally, since the BOP challenge uses a different train/test split than the compared methods, the appropriate target file found here needs to be placed in the PATH_TO_BOP_LM directory.

To compute the AD scores using the BOP Toolkit, BOP_PATH/scripts/eval_bop19.py needs to be adapted as follows (see the combined sketch after this list):

  • to use ADI for symmetric objects and ADD otherwise, with thresholds of 2/5/10% of the object diameter, change p['errors'] to

{
  'n_top': -1,
  'type': 'ad',
  'correct_th': [[0.02], [0.05], [0.1]]
}

  • to use the correct test targets, change p['targets_filename'] to 'test_targets_add.json'
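
Put together, the two adaptations might look as follows (a sketch; in the BOP Toolkit, p['errors'] is a list of error definitions, so the single AD entry above is wrapped in a list):

# Sketch of the adapted parameter block in BOP_PATH/scripts/eval_bop19.py.
# Thresholds are fractions of the object diameter (2/5/10%).
p['errors'] = [{
    'n_top': -1,
    'type': 'ad',
    'correct_th': [[0.02], [0.05], [0.1]],
}]
p['targets_filename'] = 'test_targets_add.json'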

YCB-Video

Extract the prepared samples into PATH_TO_BOP_YCBV/reagent/ and set YCBV_PATH in config.py to the base directory, i.e., PATH_TO_BOP_YCBV. Clone the YCB-Video Toolbox to POSECNN_YCBV_RESULTS_PATH, extract results_PoseCNN_RSS2018.zip and copy test_data_list.txt to the same directory. Set POSECNN_YCBV_RESULTS_PATH in config.py to this directory. Additionally, place the meshes in the canonical frame (models_eval_canonical) in the PATH_TO_BOP_YCBV directory.

To compute the ADD/AD/ADI AUC scores using the YCB-Video Toolbox, replace the respective files in the toolbox with the ones provided in sporeagent/ycbv_toolbox.

Pretrained models

Weights for both datasets can be found here. Download and copy them to sporeagent/weights/.
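
A minimal sketch for inspecting a downloaded checkpoint, assuming standard PyTorch weight files (the file name below is a placeholder):

# Inspect a downloaded checkpoint; the file name is a placeholder.
import torch

state = torch.load("sporeagent/weights/ycbv.pth", map_location="cpu")
if isinstance(state, dict):  # e.g., a plain state dict
    print(f"{len(state)} entries, e.g.: {list(state)[:3]}")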

Training

For LINEMOD: python registration/train.py --dataset=lm

For YCB-Video: python registration/train.py --dataset=ycbv

Evaluation

Note that we precompute the normal images used for pose scoring on the first run and store them to disk.
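
For illustration, such normal images can be derived from the datasets' depth images, e.g., with Open3D; a sketch in which the path, intrinsics, and depth scale are placeholders (this is not the repository's actual precomputation code):

# Illustrative only: estimating normals from a depth image with Open3D.
import numpy as np
import open3d as o3d

depth = o3d.io.read_image("/path/to/ycbv/test/000048/depth/000001.png")
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 1066.8, 1067.5, 313.0, 241.3)
pcd = o3d.geometry.PointCloud.create_from_depth_image(depth, intrinsic, depth_scale=10000.0)
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))
pcd.orient_normals_towards_camera_location(np.zeros(3))  # make normals face the camera
normals = np.asarray(pcd.normals)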

LINEMOD

The results for LINEMOD are computed using the BOP Toolkit. The evaluation script exports the required file by running

python registration/eval.py --dataset=lm

which can then be processed via

python BOP_PATH/scripts/eval_bop19.py --result_filenames=PATH_TO_CSV_WITH_RESULTS
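
The two steps can also be chained from Python (a sketch; the Toolkit path and the name of the exported CSV are placeholders):

# Run export and BOP evaluation in sequence; all paths are placeholders.
import subprocess

subprocess.run(["python", "registration/eval.py", "--dataset=lm"], check=True)
subprocess.run(["python", "/path/to/bop_toolkit/scripts/eval_bop19.py",
                "--result_filenames=/path/to/exported_results.csv"], check=True)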

YCB-Video

The results for YCB-Video are computed using the YCB-Video Toolbox. The evaluation script exports the results in BOP format by running

python registration/eval.py --dataset=ycbv

which can then be parsed into the format used by the YCB-Video Toolbox by running

python utility/parse_matlab.py

In MATLAB, run evaluate_poses_keyframe.m to generate the per-sample results and plot_accuracy_keyframe.m to compute the statistics.

Citation

If you use this repository in your publications, please cite

@inproceedings{bauer2022sporeagent,
    title={SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement},
    author={Bauer, Dominik and Patten, Timothy and Vincze, Markus},
    booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year={2022},
    pages={654-662}
}