Geometry-Aware Learning of Maps for Camera Localization (CVPR2018)

NVIDIA Research Projects

Last update: Nov 26, 2022


This is the PyTorch implementation of our CVPR 2018 paper

"Geometry-Aware Learning of Maps for Camera Localization" - CVPR 2018 (Spotlight). Samarth Brahmbhatt, Jinwei Gu, Kihwan Kim, James Hays, and Jan Kautz

A four-minute video summary (click below for the video)

Citation

If you find this code useful for your research, please cite our paper

@inproceedings{mapnet2018,
  title={Geometry-Aware Learning of Maps for Camera Localization},
  author={Samarth Brahmbhatt and Jinwei Gu and Kihwan Kim and James Hays and Jan Kautz},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}

Documentation
Setup
Data
Running the code
FAQ
License

Documentation

Setup

MapNet uses a Conda environment that makes it easy to install all dependencies.

Install miniconda with Python 2.7.
Create the Conda environment from the provided file: conda env create -f environment.yml.
Activate the environment: conda activate mapnet_release.
Note that our code has been tested with PyTorch v0.4.1 (the environment.yml file should take care of installing the appropriate version).

Data

We support the 7Scenes and Oxford RobotCar datasets right now. You can also write your own PyTorch dataloader for other datasets and put it in the dataset_loaders directory. Refer to this README file for more details.

The datasets live in the data/deepslam_data directory. We provide skeletons with symlinks to get you started. Let us call your 7Scenes download directory 7SCENES_DIR and your main RobotCar download directory (in which you untar all the downloads from the website) ROBOTCAR_DIR. You will need to make the following symlinks:

cd data/deepslam_data && ln -s 7SCENES_DIR 7Scenes && ln -s ROBOTCAR_DIR RobotCar_download
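If you prefer to create the symlinks from Python (for example in a setup script), the shell command above can be sketched as follows. The helper name link_datasets and its arguments are hypothetical and not part of this repository; only the link names 7Scenes and RobotCar_download come from the instructions above.

```python
import os


def link_datasets(deepslam_dir, seven_scenes_dir, robotcar_dir):
    """Create the 7Scenes and RobotCar_download symlinks inside deepslam_dir.

    Hypothetical helper: mirrors the `ln -s` command from the README.
    """
    links = {
        "7Scenes": seven_scenes_dir,
        "RobotCar_download": robotcar_dir,
    }
    created = []
    for name, target in sorted(links.items()):
        link_path = os.path.join(deepslam_dir, name)
        # Leave existing entries untouched so the helper can be re-run safely.
        if os.path.lexists(link_path):
            continue
        os.symlink(os.path.abspath(target), link_path)
        created.append(name)
    return created
```

Calling link_datasets("data/deepslam_data", SEVEN_SCENES_DIR, ROBOTCAR_DIR) from the repository root would then play the role of the one-line shell command, where the two uppercase names stand for your own download directories.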

Special instructions for RobotCar: (only needed for RobotCar data)

Download this fork of the dataset SDK, and run cd scripts && ./make_robotcar_symlinks.sh after editing the ROBOTCAR_SDK_ROOT variable in it appropriately.
For each sequence, you need to download the stereo_centre, vo and gps tar files from the dataset website (more details in this comment).
The directory for each 'scene' (e.g. full) has .txt files defining the train/test split. While training MapNet++, you must put the sequences for self-supervised learning (dataset T in the paper) in the test_split.txt file. The dataloader for the MapNet++ models will use both images and ground-truth pose from sequences in train_split.txt and only images from the sequences in test_split.txt.
To make training faster, we pre-processed the images using scripts/process_robotcar_images.py. This script undistorts the images using the camera models provided by the dataset, and scales them such that the shortest side is 256 pixels.
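To illustrate the scaling step only, here is a minimal sketch of how a target size with a 256-pixel shortest side can be computed. The function scaled_size is hypothetical and is not taken from scripts/process_robotcar_images.py; the 1280x960 input is just an example resolution.

```python
def scaled_size(width, height, shortest_side=256):
    """Return (new_width, new_height) so that the shorter side equals shortest_side.

    The aspect ratio is preserved up to rounding to whole pixels.
    """
    scale = float(shortest_side) / min(width, height)
    return int(round(width * scale)), int(round(height * scale))


# Example: a landscape image whose shorter side is its height.
print(scaled_size(1280, 960))  # (341, 256)
```

The resulting size can be passed to whichever image library you use for resizing.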

Running the code

Demo/Inference

The trained models for all experiments presented in the paper can be downloaded here. The inference script is scripts/eval.py. Here are some examples, assuming the models are downloaded in scripts/logs. Please go to the scripts folder to run the commands.

7Scenes

MapNet++ with pose-graph optimization (i.e., MapNet+PGO) on heads:

$ python eval.py --dataset 7Scenes --scene heads --model mapnet++ \
--weights logs/7Scenes_heads_mapnet++_mapnet++_7Scenes/epoch_005.pth.tar \
--config_file configs/pgo_inference_7Scenes.ini --val --pose_graph
Median error in translation = 0.12 m
Median error in rotation    = 8.46 degrees

To evaluate on the train split, remove the --val flag.
To save the results to disk without showing them on screen (useful for scripts), add the --output_dir ../results/ flag.
See this README file for more information on hyper-parameters and which config files to use.

MapNet++ on heads:

$ python eval.py --dataset 7Scenes --scene heads --model mapnet++ \
--weights logs/7Scenes_heads_mapnet++_mapnet++_7Scenes/epoch_005.pth.tar \
--config_file configs/mapnet.ini --val
Median error in translation = 0.13 m
Median error in rotation    = 11.13 degrees

MapNet on heads:

$ python eval.py --dataset 7Scenes --scene heads --model mapnet \
--weights logs/7Scenes_heads_mapnet_mapnet_learn_beta_learn_gamma/epoch_250.pth.tar \
--config_file configs/mapnet.ini --val
Median error in translation = 0.18 m
Median error in rotation    = 13.33 degrees

PoseNet (CVPR2017) on heads:

$ python eval.py --dataset 7Scenes --scene heads --model posenet \
--weights logs/7Scenes_heads_posenet_posenet_learn_beta_logq/epoch_300.pth.tar \
--config_file configs/posenet.ini --val
Median error in translation = 0.19 m
Median error in rotation    = 12.15 degrees

RobotCar

MapNet++ with pose-graph optimization on loop:

$ python eval.py --dataset RobotCar --scene loop --model mapnet++ \
--weights logs/RobotCar_loop_mapnet++_mapnet++_RobotCar_learn_beta_learn_gamma_2seq/epoch_005.pth.tar \
--config_file configs/pgo_inference_RobotCar.ini --val --pose_graph
Mean error in translation = 6.74 m
Mean error in rotation    = 2.23 degrees

MapNet++ on loop:

$ python eval.py --dataset RobotCar --scene loop --model mapnet++ \
--weights logs/RobotCar_loop_mapnet++_mapnet++_RobotCar_learn_beta_learn_gamma_2seq/epoch_005.pth.tar \
--config_file configs/mapnet.ini --val
Mean error in translation = 6.95 m
Mean error in rotation    = 2.38 degrees

MapNet on loop:

$ python eval.py --dataset RobotCar --scene loop --model mapnet \
--weights logs/RobotCar_loop_mapnet_mapnet_learn_beta_learn_gamma/epoch_300.pth.tar \
--config_file configs/mapnet.ini --val
Mean error in translation = 9.84 m
Mean error in rotation    = 3.96 degrees
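The translation and rotation errors printed above are per-frame errors summarized over the evaluated sequence (median for 7Scenes, mean for RobotCar in these examples). The sketch below shows one common way to compute such errors from predicted and ground-truth poses. It is an illustration under the assumption that rotations are unit quaternions in (w, x, y, z) order, and it is not claimed to match scripts/eval.py line by line; all function names are hypothetical.

```python
import math
from statistics import mean, median


def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and ground-truth positions."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(t_pred, t_true)))


def rotation_error_deg(q_pred, q_true):
    """Angle in degrees between two unit quaternions (w, x, y, z).

    The absolute value handles the q / -q ambiguity (both encode the same
    rotation); the clamp guards acos against rounding slightly above 1.
    """
    dot = abs(sum(p * g for p, g in zip(q_pred, q_true)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))


def summarize(errors, use_median=True):
    """Summarize per-frame errors with the median (default) or the mean."""
    return median(errors) if use_median else mean(errors)
```

For example, the identity quaternion (1, 0, 0, 0) and a 90-degree rotation about the z axis differ by 90 degrees under rotation_error_deg.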

Train

The executable script is scripts/train.py. Please go to the scripts folder to run these commands. For example:

PoseNet on chess from 7Scenes: python train.py --dataset 7Scenes --scene chess --config_file configs/posenet.ini --model posenet --device 0 --learn_beta --learn_gamma

MapNet on chess from 7Scenes: python train.py --dataset 7Scenes --scene chess --config_file configs/mapnet.ini --model mapnet --device 0 --learn_beta --learn_gamma
MapNet++ is finetuned on top of a trained MapNet model: python train.py --dataset 7Scenes --checkpoint <trained_mapnet_model.pth.tar> --scene chess --config_file configs/mapnet++_7Scenes.ini --model mapnet++ --device 0 --learn_beta --learn_gamma

For example, we can train a MapNet++ model on heads starting from a pretrained MapNet model:

$ python train.py --dataset 7Scenes \
--checkpoint logs/7Scenes_heads_mapnet_mapnet_learn_beta_learn_gamma/epoch_250.pth.tar \
--scene heads --config_file configs/mapnet++_7Scenes.ini --model mapnet++ \
--device 0 --learn_beta --learn_gamma

For MapNet++ training, you will need visual odometry (VO) data (or other sensory inputs such as noisy GPS measurements). For 7Scenes, we provide preprocessed VO computed with the DSO method. For RobotCar, we use the provided stereo_vo. If you plan to use your own VO data (especially from a monocular camera) for MapNet++ training, you will first need to align the VO with the world coordinate frame (for rotation and scale). Please refer to the "Align VO" section below for more detailed instructions.

The meanings of various command-line parameters are documented in scripts/train.py. The values of various hyperparameters are defined in a separate .ini file. We provide some examples in the scripts/configs directory, along with a README file explaining some hyper-parameters.
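As a rough illustration of how hyperparameters in an .ini file can be read, here is a minimal sketch using Python's configparser module. The section name [training] and the exact set of keys are assumptions made for this example; check the files in scripts/configs and the accompanying README for the real layout. The example uses Python 3 syntax, whereas the repository itself targets Python 2.7.

```python
import configparser

# Hypothetical config text; the section name is an assumption for this sketch.
EXAMPLE_INI = """
[training]
n_epochs = 300
batch_size = 20
do_val = yes
seed = 7
"""


def load_training_options(text):
    """Parse the hypothetical [training] section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["training"]
    return {
        "n_epochs": section.getint("n_epochs"),
        "batch_size": section.getint("batch_size"),
        # getboolean accepts yes/no, on/off, true/false and 1/0.
        "do_val": section.getboolean("do_val"),
        "seed": section.getint("seed"),
    }


print(load_training_options(EXAMPLE_INI))
```

Keeping hyperparameters in a separate file like this lets you rerun the same training command with different settings without editing the script.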

If you have visdom = yes in the config file, you will need to start a Visdom server for logging the training progress:

python -m visdom.server -env_path=scripts/logs/

Network Attention Visualization

The plot_activations.py script calculates the network attention visualizations and saves them in a video.

For the MapNet model trained on chess in 7Scenes:

$ python plot_activations.py --dataset 7Scenes --scene chess \
--weights <filename.pth.tar> --device 1 --val --config_file configs/mapnet.ini \
--output_dir ../results/

Check here for an example video of computed network attention of PoseNet vs. MapNet++.

Other Tools

Align VO to the ground truth poses

This has to be done before using VO in MapNet++ training. The executable script is scripts/align_vo_poses.py.

For the first sequence from chess in 7Scenes: python align_vo_poses.py --dataset 7Scenes --scene chess --seq 1 --vo_lib dso. Note that alignment for 7Scenes needs to be done separately for each sequence, so the --seq flag is required.
For all of 7Scenes, you can also use the script align_vo_poses_7scenes.sh. The script stores the information at the proper location in data.

Mean and stdev pixel statistics across a dataset

These statistics must be calculated before any training. Use scripts/dataset_mean.py, which also saves the information at the proper location. We provide pre-computed values for RobotCar and 7Scenes.
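For readers computing such statistics for a new dataset, the idea can be sketched as a single streaming pass that accumulates per-channel sums and sums of squares. This is an assumption-laden illustration, not the logic of scripts/dataset_mean.py: the function channel_stats is hypothetical, images are represented as plain lists of (r, g, b) tuples, and the standard deviation is taken over all pixels of all images.

```python
import math


def channel_stats(images):
    """Per-channel mean and standard deviation over all pixels of all images.

    images: iterable of images, each an iterable of (r, g, b) pixel tuples.
    """
    count = 0
    total = [0.0, 0.0, 0.0]
    total_sq = [0.0, 0.0, 0.0]
    for image in images:
        for pixel in image:
            count += 1
            for c in range(3):
                total[c] += pixel[c]
                total_sq[c] += pixel[c] ** 2
    mean = [s / count for s in total]
    # Population variance E[x^2] - E[x]^2, clamped at 0 against rounding error.
    std = [math.sqrt(max(0.0, sq / count - m * m))
           for sq, m in zip(total_sq, mean)]
    return mean, std
```

Because only running sums are kept, the whole dataset never has to be held in memory at once.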

Calculate pose translation statistics

This script calculates the mean and stdev of the pose translations and saves them automatically to the appropriate files: python calc_pose_stats.py --dataset 7Scenes --scene redkitchen. This information is needed to normalize the pose regression targets, so this script must be run before any training. We provide pre-computed values for RobotCar and 7Scenes.
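To make the normalization step concrete, the sketch below computes per-axis translation statistics and uses them to normalize and de-normalize a translation target. The function names are hypothetical and the code is not taken from calc_pose_stats.py; it also assumes every axis has a non-zero standard deviation.

```python
from statistics import mean, pstdev


def translation_stats(translations):
    """Per-axis mean and population stdev of a list of (x, y, z) translations."""
    columns = list(zip(*translations))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def normalize(t, mean_t, std_t):
    """Map a translation to zero-mean, unit-variance regression targets."""
    return [(v - m) / s for v, m, s in zip(t, mean_t, std_t)]


def denormalize(t_norm, mean_t, std_t):
    """Invert normalize() to recover a translation in world units."""
    return [v * s + m for v, m, s in zip(t_norm, mean_t, std_t)]
```

A network trained on normalized targets predicts values in the normalized space, so its outputs need denormalize() before errors are reported in meters.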

Plot the ground truth and VO poses for debugging

python plot_vo_poses.py --dataset 7Scenes --scene heads --vo_lib dso --val. To save the output instead of displaying it on screen, add the --output_dir ../results/ flag.

Process RobotCar GPS

The scripts/process_robotcar_gps.py script must be run before using GPS for MapNet++ training. It converts the csv file into a format usable for training.

Demosaic and undistort RobotCar images

This is advisable to do beforehand to speed up training. The scripts/process_robotcar_images.py script will do that and save the output images to a centre_processed directory in the stereo directory. After the script finishes, you must rename this directory to centre so that the dataloader uses these undistorted and demosaiced images.

FAQ

Collection of issues and resolution comments that might be useful:

Reproducing results: #36, #38
Pose normalization, pose stats: #35, #37
Data: #26
Hyperparameters (beta, gamma, etc): README, #42, #41, #31

License

Comments

basic question

Any ideas how to overcome the below?

Loaded weights from logs/7Scenes_heads_mapnet++_mapnet++_7Scenes/epoch_005.pth.tar
Running mapnet++ on VAL data
Traceback (most recent call last):
  File "eval.py", line 124, in <module>
    vo_func=vo_func, no_duplicates=False, **kwargs)
  File "../dataset_loaders/composite.py", line 45, in __init__
    self.dset = SevenScenes(*args, real=self.real, **kwargs)
  File "../dataset_loaders/seven_scenes.py", line 52, in __init__
    with open(split_file, 'r') as f:
IOError: [Errno 2] No such file or directory: '../data/deepslam_data/7Scenes/heads/TestSplit.txt'

opened by AndreV84 14
Output graphs are so jumbled?

Sir, I ran your code to evaluate the pretrained MapNet++ on the Heads and Office scenes of 7Scenes, and I get jumbled output compared to the output shown in your research paper.

Shouldn't the output look like the one in your research paper, as below?

opened by DRAhmadFaraz 11
Argument name seems wrong in process_robotcar_images.py

https://github.com/NVlabs/geomapnet/blob/e18b75684ae61ca502e7df0cb99edf45770ac57b/scripts/process_robotcar_images.py#L38

It seems the argument here should be scene=args.scene rather than sequence=args.scene, according to dataset_loaders/robotcar.py: https://github.com/NVlabs/geomapnet/blob/e18b75684ae61ca502e7df0cb99edf45770ac57b/dataset_loaders/robotcar.py#L19

opened by mrbulb 11
Every time I restart training from the beginning, the obtained model gives a larger error.

Dear Mr. Samarth Brahmbhatt, thank you for your code! It is fantastic. However, I trained MapNet from scratch three times (without resuming from a checkpoint), from epoch 0 to 100 each time, and used eval.py to test the trained models.

I trained in strict accordance with the provided experimental parameters and environment. The three training runs are completely independent and do not affect each other.
So the test results (for example, using eval.py to test epoch_100.pth.tar from the first, second, and third runs) should be almost the same. Unfortunately, the model from the second run (e.g. epoch_100.pth.tar) gave an obviously larger error than the one from the first run, and the third run was in turn worse than both the second and the first.

I feel very confused and would like to ask for your opinion. Your suggestions would be very helpful to me. Thank you in advance!

Best, Jialu
reproduce_results

opened by jialuwang123321 10
Why are mean_t and std_t for 7-Scenes set to zeros(3) and ones(3) in the SevenScenes dataset?

Hey, I noticed that the pose stats of 7-Scenes, namely the mean_t and std_t are simply set to zeros(3) and ones(3). But when I use the pose stats computed by myself (not equal to zeros(3) and ones(3)) for training and evaluation, the accuracy decreased a lot.

opened by ZhouJiaHuan 6
What is the difference between the 4x4 pose matrix in frame00xxx-pose.txt and the 12-value poses from DSO (Direct Sparse Odometry)?

@samarth-robo Sir, I want to ask: what is the difference between the frame00xxx-pose.txt file, which holds a 4x4 pose matrix for each image, and the DSO (Direct Sparse Odometry) txt files, which hold 12 pose values for each image?

Does your code need to compare the results against both of them as ground truth? Why does your code need two pose files for the comparison? What is the difference between these two files?

Kindly guide me. I will be thankful to you. Best regards

opened by DRAhmadFaraz 6
Why does the loss become negative on MapNet?

@samarth-robo, as you can see in this screenshot, the loss turns negative. What does that mean? This happens when I train MapNet on my NVIDIA RTX 2080 Ti GPU with the parameters below:

n_epochs = 300
batch_size = 20
do_val = yes
seed = 7
shuffle = yes
num_workers = 5
snapshot = 50
val_freq = 50
max_grad_norm = 0

opened by DRAhmadFaraz 5
Overfitting problem
@samarth-robo Hello~ I have run this code on my own dataset with MapNet as the model. All settings are the same (learning rate, step, skip, etc.), but the result always converges to the same point. Is this overfitting? I think the reason might be one of the following:

The learning rate is still too big?

The skip parameter is too small? (Because my robot moves slowly)

Data repeatability is too high (the images are very similar but the poses differ, because the paths in my training data are close together)

Following is the ground truth of my training data. Which do you think is the problem? I really need your help, thanks.
opened by m5823779 5
A question about the initialization of Beta in mapnet.ini

Hi Samarth, I am implementing your paper and I have a question about the initialization of Beta in mapnet.ini. I found that you set Beta = -3.0 in mapnet.ini, whereas in your paper (Section 3.5, Implementation Details) you choose Beta = 0.0.

Could you tell me which is the better initialization for Beta, and whether there is a reason for choosing it? Thanks, waiting for your reply.
hyperparameters

opened by novasky0709 4
How to get the DSO (Direct Sparse Odometry) poses for our own dataset

@samarth-robo, Respected Sir, I have successfully installed your code and it works fine on the 7-Scenes dataset.

Now I want to run your code on my own images. I have extracted frame00xxx-pose.txt for every image using Visual-SFM, but your code also needs the 12-value DSO (Direct Sparse Odometry) poses for the comparison. Can you please guide me on how to extract the DSO poses from monocular images that also have a 4x4 pose for each frame?

Or can you please explain the format of the DSO file values?

Are these values also rotation and translation? What is the format of the text files named "dso_poses/seq-xx.txt"? What do the 12 values represent, and which values correspond to which camera parameters?

Waiting for your kind response. Best regards.

opened by DRAhmadFaraz 4
How to run this code on our own dataset?

@samarth-robo Respected Sir, this code runs on the "7-Scenes" dataset. If we want to run it on our custom images, how will the code evaluate and predict the results?

The "7-Scenes" dataset needs two things as input: (1) an RGB image and (2) a "frame-000001.pose.txt" file containing a 4x4 pose matrix.

So for our own custom images, can you please guide me on how to get this 4x4 pose matrix ("frame-000001.pose.txt") for every RGB image?

I will be thankful to you and waiting for your kind response.

Regards

opened by DRAhmadFaraz 4
Inference with python3 + pytorch1.6 gives worse results

Hello, excuse me. For some special reason I have to use python3.6 + pytorch1.6 for inference with MapNet, but to my surprise the results in this case are worse than inference with the environment from the environment.yml file. What is the reason for this? If you can answer, I will be very grateful.

opened by cccccv-cm 2
Problems of RGB-D Dataset 7-Scenes

Hi, thanks for sharing such nice work. I have some trouble downloading the 7-Scenes dataset. The download links at https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/ do not work. Can you share your data? Thanks.

opened by heng94 1
About interpolate_pose.py file

Hi,

Thanks for your great work. May I know what kind of algorithm is used in pose interpolation? def interpolate_poses(pose_timestamps, abs_poses, requested_timestamps, origin_timestamp):

opened by LZL-CS 0
Weird results when training with provided script on RobotCar loop scene.

Hi there, @samarth-robo. Thanks for your solid work and your well-structured code, I could produce even better results than the numbers in the original paper by loading pre-trained model weights.

However, when I retrained on the RobotCar dataset and loop scene by using the provided script and config file (from latest version): python train.py --dataset RobotCar --scene loop --config_file configs/mapnet.ini --model mapnet --device 1 --learn_beta --learn_gamma

The results become weird and errors are much larger than I expected.

It's worth noting that I executed the script on a node with 8 NVIDIA RTX 2080 Ti GPUs. When I used pytorch-0.4.1, which is specified in your environment.yml, CUDA reported an error: "THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument". Thus I ran the script in both the pytorch 0.4.1 and 1.0.1 environments. However, the errors are very large in both.

Besides, I also noticed that some of the preprocessed images are over-exposed (some are almost all white and barely contain any information). Is this normal?

like 1403774724292807.png, ... 1403774724917727.png at the beginning of 2014-06-26-09-24-58 and other sequences.

opened by KongYuJL 4
Extraction of Pose data

Hey, I know that you have addressed this a number of times on this forum, but I have a few doubts of my own.

First, I would like to know what exactly I would have to introduce when I implement the algorithm with my own dataset. For example, the seven scenes dataset contains three types of data: RGB images, depth images, and a pose text file. How do you obtain said text file for frames of a custom dataset?

Beyond that, what else would I have to change in the repo if I want to run it on my personal dataset? While moving through the issues, I noticed a lot of queries about the DSO files and xxxx_vo_stats.pkl files. Any help I could get would go a long way. Thanks in advance.

opened by rohanphil 1

Owner

NVIDIA Research Projects

GitHub https://goo.gl/mRB3Au.

