On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

Overview

This repository contains the evaluation code and alternative pseudo ground truth poses as used in our ICCV 2021 paper.

[Video overview]

Pseudo Ground Truth for 7Scenes and 12Scenes

We generated alternative SfM-based pseudo ground truth (pGT) using COLMAP to supplement the original D-SLAM-based pseudo ground truth of 7Scenes and 12Scenes.

Pose Files

Please find our SfM pose files in the folder pgt. The pGT files are organized by dataset, individual scene, and test/training split. Each file contains one line per image in the following format:

rgb_file qw qx qy qz tx ty tz f

Entries q and t represent the pose as a quaternion and a translation vector. The pose maps world coordinates to camera coordinates, i.e., p_cam = R(q) p_world + t. This is the same convention used by COLMAP. The entry f represents the focal length of the RGB sensor; it was re-estimated by COLMAP and can differ slightly per scene.
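For illustration, here is a minimal Python sketch (not part of the repository; the file name and values are made up) that parses one pose line and applies the world-to-camera mapping defined above:

import numpy as np

def quat_to_rotmat(qw, qx, qy, qz):
    # Convert a unit quaternion (w, x, y, z) into a 3x3 rotation matrix.
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qw*qz),     2*(qx*qz + qw*qy)],
        [2*(qx*qy + qw*qz),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qw*qx)],
        [2*(qx*qz - qw*qy),     2*(qy*qz + qw*qx),     1 - 2*(qx*qx + qy*qy)]])

line = "seq-01/frame-000000.color.png 1.0 0.0 0.0 0.0 0.1 0.2 0.3 525.0"
tokens = line.split()
rgb_file = tokens[0]
qw, qx, qy, qz, tx, ty, tz, f = map(float, tokens[1:])
R, t = quat_to_rotmat(qw, qx, qy, qz), np.array([tx, ty, tz])
p_world = np.array([0.0, 0.0, 1.0])  # some point in world coordinates
p_cam = R @ p_world + t              # p_cam = R(q) p_world + t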

We also provide the original D-SLAM pseudo ground truth in this format to be used with our evaluation code below.

Full Reconstructions

The COLMAP 3D models are available here:

Note that the Google Drive folder that currently hosts the reconstructions has a daily download limit. We are currently looking into alternative hosting options.

License Information

Since the 3D models and pose files are derived from the original datasets, they are released under the same licenses as the 7Scenes and 12Scenes datasets. Before using the data, please check these licenses (see the websites of the datasets or the README.md files that come with the 3D models).

Evaluation Code

The main results of our paper can be reproduced using evaluate_estimates.py. The script calculates either the pose error (max of rotation and translation error) or the DCRE error (dense reprojection error). The script prints the recall at a custom threshold to the console, and produces a cumulative error plot as a PDF file.
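As a rough illustration of the pose error (a sketch under common conventions, not necessarily the exact implementation of the script), the rotation error can be measured as the angle between the estimated and ground-truth rotations, and the translation error as the distance between the camera centers:

import numpy as np

def rotation_error_deg(q_est, q_gt):
    # Angle between two unit quaternions (w, x, y, z); abs() handles the
    # sign ambiguity q ~ -q, min() guards against numerical noise.
    d = min(1.0, abs(float(np.dot(q_est, q_gt))))
    return 2.0 * np.degrees(np.arccos(d))

def translation_error_m(R_gt, t_gt, R_est, t_est):
    # Distance between camera centers c = -R^T t (poses map world to camera).
    return float(np.linalg.norm(-R_gt.T @ t_gt + R_est.T @ t_est))

An estimate is then typically counted as correct at a threshold such as (5cm, 5deg) if both errors stay below it.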

As input, the script expects a configuration file that points to estimated poses of potentially multiple algorithms and to the pseudo ground truth that these estimates should be compared to. We provide estimated poses of all methods shown in our paper (ActiveSearch, HLoc, R2D2 and DSAC*) in the folder estimates.
These pose files follow the same format as our pGT files described previously, but omit the final f entry.

Furthermore, we provide example config files corresponding to the main experiments in our paper.
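The config files are plain JSON, so if you are unsure which fields the script expects, you can simply inspect one of the provided examples, e.g.:

import json

# Pretty-print the provided 7Scenes SfM-pGT config to see its structure.
with open("config_7scenes_sfm_pgt.json") as fp:
    print(json.dumps(json.load(fp), indent=2))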

Call python evaluate_estimates.py --help for all available options.

For evaluation on 7Scenes, using our SfM pGT, call:

python evaluate_estimates.py config_7scenes_sfm_pgt.json

This produces a new file config_7scenes_sfm_pgt_pose_err.pdf.

For the corresponding plot using the original D-SLAM pGT, call:

python evaluate_estimates.py config_7scenes_dslam_pgt.json

Interpreting the Results

The two plots show very different rankings across methods. Yet, as we discuss in our paper, both plots are valid, since neither version of the pGT is clearly superior to the other. Furthermore, it appears plausible that any version of the pGT is only trustworthy up to a certain accuracy threshold. However, it is non-obvious, and currently unknown, how to determine such a trust threshold. We thus strongly discourage drawing any conclusions from the smaller thresholds alone (beyond the observation that a method might be overfitting to the imperfections of the pseudo ground truth).

We advise always evaluating methods under both versions of the pGT, and showing both evaluation results side by side, unless there are specific reasons why one version of the pGT is preferred.

DCRE Computation

DCRE computation is triggered with the option --error_type dcre_max or --error_type dcre_mean (see our paper for details). The DCRE needs access to the original 7Scenes or 12Scenes data, since it requires depth maps. We provide two utility scripts, setup_7scenes.py and setup_12scenes.py, that will download and unpack the associated datasets. Make sure to check each dataset's license (see License Information above) before downloading and using the data.
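To illustrate the idea behind the DCRE (a hedged numpy sketch, not the repository's GPU implementation; normalization, e.g. by the image diagonal, is omitted, and points behind the estimated camera are simply dropped): each pixel with valid depth is lifted to a 3D world point using the ground-truth pose, re-projected with the estimated pose, and the 2D displacement is aggregated over the image.

import numpy as np

def dcre(depth, K, R_gt, t_gt, R_est, t_est, reduce="max"):
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.reshape(-1)
    u, v = u.reshape(-1).astype(float), v.reshape(-1).astype(float)
    ok = z > 0                                   # skip invalid depth readings
    u, v, z = u[ok], v[ok], z[ok]
    # Back-project pixels to camera space, then to world space via the GT pose.
    x = (u - K[0, 2]) / K[0, 0] * z
    y = (v - K[1, 2]) / K[1, 1] * z
    p_cam = np.stack([x, y, z])
    p_world = R_gt.T @ (p_cam - t_gt[:, None])   # inverts p_cam = R p_world + t
    # Re-project the world points with the estimated pose.
    q = R_est @ p_world + t_est[:, None]
    front = q[2] > 0                             # drop points behind the camera
    u0, v0, q = u[front], v[front], q[:, front]
    u2 = q[0] / q[2] * K[0, 0] + K[0, 2]
    v2 = q[1] / q[2] * K[1, 1] + K[1, 2]
    err = np.sqrt((u2 - u0) ** 2 + (v2 - v0) ** 2)  # per-pixel displacement
    return err.max() if reduce == "max" else err.mean()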

Note I: The original depth files of 7Scenes are not calibrated, but the DCRE requires calibrated files. The setup script applies the Kinect calibration parameters found here to register depth to RGB. This essentially involves re-rendering the depth maps, which is implemented in native Python and, due to the large frame count in 7Scenes, takes a long time (several hours). However, this step has to be done only once.
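The core of this registration step can be sketched as follows (a heavily simplified sketch, not the setup script's actual implementation; depth is assumed to be a float map in meters, and K_depth, K_rgb, and T_depth_to_rgb stand in for the Kinect calibration parameters mentioned above):

import numpy as np

def register_depth_to_rgb(depth, K_depth, K_rgb, T_depth_to_rgb):
    # Re-render a depth map from the depth sensor's view into the RGB view.
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.reshape(-1)
    u, v = u.reshape(-1), v.reshape(-1)
    ok = z > 0                                  # skip invalid depth readings
    u, v, z = u[ok], v[ok], z[ok]
    # Back-project to 3D in the depth camera frame.
    x = (u - K_depth[0, 2]) / K_depth[0, 0] * z
    y = (v - K_depth[1, 2]) / K_depth[1, 1] * z
    pts = np.stack([x, y, z, np.ones_like(z)])  # 4 x N homogeneous points
    # Transform into the RGB camera frame and project with the RGB intrinsics.
    X, Y, Z = (T_depth_to_rgb @ pts)[:3]
    front = Z > 0                               # keep points in front of the RGB camera
    X, Y, Z = X[front], Y[front], Z[front]
    u2 = np.round(X / Z * K_rgb[0, 0] + K_rgb[0, 2]).astype(int)
    v2 = np.round(Y / Z * K_rgb[1, 1] + K_rgb[1, 2]).astype(int)
    inside = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    out = np.zeros_like(depth)
    order = np.argsort(-Z[inside])              # write far points first, near points win
    out[v2[inside][order], u2[inside][order]] = Z[inside][order]
    return out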

Note II: The DCRE computation by evaluate_estimates.py is implemented on the GPU and reasonably fast. However, due to the large frame count in 7Scenes, it can still take considerable time. The parameter --error_max_images limits the maximum number of frames used to calculate recall and cumulative errors. The default value of 1000 provides a good trade-off between accuracy and speed. Use --error_max_images -1 to use all images, which is the most accurate option but slow for 7Scenes.
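For example, to run the most accurate (but slow) DCRE evaluation on the 7Scenes SfM pGT:

python evaluate_estimates.py config_7scenes_sfm_pgt.json --error_type dcre_max --error_max_images -1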

Uploading Your Method's Estimates

We are happy to include updated evaluation results, or results of new methods, in this repository. This enables easy comparison across methods with unified evaluation code as the field progresses.

If you want your results included, please provide estimates of your method under both pGT versions via a pull request. Add your estimate files to a custom sub-folder under estimates_external, following the pose file convention described above. Please also include a text file that links your results to a publication or technical report, or describes how you obtained them.

estimates_external
├── someone_elses_method
└── your_method
    ├── info_your_method.txt
    ├── dslam
    │   ├── 7scenes
    │   │   ├── chess_your_method.txt
    │   │   ├── fire_your_method.txt
    │   │   ├── ...
    │   └── 12scenes
    │       ├── ...
    └── sfm
        ├── ...

Dependencies

This code requires the following Python packages; we tested it with the package versions given in parentheses:

pytorch (1.6.0)
opencv (3.4.2)
scikit-image (0.16.2)

The repository contains an environment.yml for use with Conda:

conda env create -f environment.yml
conda activate pgt

License Information

Our evaluation code and data utility scripts are based on parts of DSAC*, and we provide our code under the same BSD-3 license.

Citation

If you are using either the evaluation code or the Structure-from-Motion pseudo GT for the 7Scenes or 12Scenes datasets, please cite the following work:

@InProceedings{Brachmann2021ICCV,
    author = {Brachmann, Eric and Humenberger, Martin and Rother, Carsten and Sattler, Torsten},
    title = {{On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization}},
    booktitle = {International Conference on Computer Vision (ICCV)},
    year = {2021},
}

Comments
  • A specific PnP implementation

    Hi,

    I would like to ask: what implementation of the PnP algorithm did you use to compute the estimated poses of your localization algorithm? I'm asking because I am having some problems with mine (correct 2D-3D matches but large differences w.r.t. the ground-truth poses). I am currently using OpenCV's solvePnPRansac. Thanks.

    opened by sontung 5
  • Request for complete pipeline for ground truth data generation

    Hi, thank you for your responses to #1 and #2. Based on the information provided, I tried to create an SfM model and subsequently poses for the reference as well as the query images. SuperPoint and SuperGlue are used for feature extraction and feature matching, with exhaustive matching.

    When I evaluated PixLoc, a localization network, on the produced data, the performance was very poor (results below).

    Cambridge ShopFacade scene:

    [12/17/2021 20:35:54 pixloc INFO] Evaluate scene ShopFacade_ours: /home/ajay/pixloc/outputs/results/pixloc_Cambridge_ShopFacade_ours.txt
    [12/17/2021 20:35:54 pixloc.utils.eval INFO]
    Median errors: 5.241m, 65.716deg
    Percentage of test images localized within:
            1cm, 1deg : 0.00%
            2cm, 2deg : 0.00%
            3cm, 3deg : 0.00%
            5cm, 5deg : 0.00%
            25cm, 2deg : 0.00%
            50cm, 5deg : 0.99%
            500cm, 10deg : 2.97%
    

    Can you please help by sharing full pipeline used for producing ground truth pose data?

    opened by patelajaychh 3
  • How to keep camera pose of reference images fixed while running mapper with query images?

    Thanks for the quick response to #1!

    According to the reply to #1, --fix_existing_images is set to True while creating poses for query images.

    Passage from the paper:

    First, we reconstruct the scene with SfM using only the training images. Next, we continue the reconstruction process with the test images while keeping the training
    camera poses fixed.
    

    According to this passage, doesn't this mean that --fix_existing_images should be set to False? Or am I missing something?

    opened by patelajaychh 1
  • How to generate poses for query images using SFM model built on reference images.

    Hi, I want to generate poses for query (new) images of a scene using the SfM model built on the reference images only. One way I found is to use image_registrator in COLMAP (not sure if this is the correct way).

    But I'm getting the error below when running this command:

    colmap image_registrator     \
    --database_path  /data/outputs/hloc/King_Seq1_ref_only/sfm_superpoint+superglue_with_query/database_with_query.db     \
    --input_path  /data/outputs/hloc/King_Seq1_ref_only/sfm_superpoint+superglue   \
    --output_path  /data/outputs/hloc/King_Seq1_ref_only/sfm_superpoint+superglue_with_query
    

    Standard output:

    ==============================================================================
    Loading database
    ==============================================================================
    
    Loading cameras... 261 in 0.000s
    Loading matches... 33929 in 0.125s
    Loading images... 261 in 0.019s (connected 261)
    Building correspondence graph... in 2.984s (ignored 682)
    
    Elapsed time: 0.052 [minutes]
    
    F1213 16:50:19.615032 97173 reconstruction.cc:81] Check failed: existing_image.Name() == image.second.Name() (seq1_frame00261.png vs. seq1_frame00211.png)
    *** Check failure stack trace: ***
        @     0x7fd54172d0cd  google::LogMessage::Fail()
        @     0x7fd54172ef33  google::LogMessage::SendToLog()
        @     0x7fd54172cc28  google::LogMessage::Flush()
        @     0x7fd54172f999  google::LogMessageFatal::~LogMessageFatal()
        @     0x55b891c61a6d  (unknown)
        @     0x55b891dc8d4e  (unknown)
        @     0x55b891badaf5  (unknown)
        @     0x55b891b9ca0e  (unknown)
        @     0x7fd53d3afb97  __libc_start_main
        @     0x55b891ba66aa  (unknown)
    Aborted (core dumped)
    

    The paper "On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization" describes a similar technique (without naming the COLMAP function) for finding ground-truth poses for query images. This is why I'm posting this issue here.

    Am I going about this the correct way? If not, what is the correct method?

    opened by patelajaychh 1