PointNav-VO

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

Project Page | Paper

Table of Contents

  • Setup
  • Reproduce
  • Use VO as a Drop-in Module
  • Train Your Own VO
  • Citation

Setup

Install Dependencies

conda env create -f environment.yml
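
The environment defined by environment.yml should be named pointnav-vo (the name activated by the evaluation command later in this README); activate it before continuing:

conda activate pointnav-vo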

Install Habitat

This repo has been tested with the following commits of habitat-lab and habitat-sim:

habitat-lab == d0db1b55be57abbacc5563dca2ca14654c545552
habitat-sim == 020041d75eaf3c70378a9ed0774b5c67b9d3ce99
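
A possible way to pin both repos to these commits (a sketch only; the clone locations and the habitat-lab install command are assumptions, not steps prescribed by this repo):

git clone https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
git checkout d0db1b55be57abbacc5563dca2ca14654c545552
pip install -e .   # assumed install command for habitat-lab
cd ..

git clone https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim
git checkout 020041d75eaf3c70378a9ed0774b5c67b9d3ce99
# habitat-sim is then built from source with the headless flag shown below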

Note: to align with the Habitat Challenge 2020 settings (see Step 36 in the Dockerfile), we compiled habitat-sim without CUDA support when installing it:

python setup.py install --headless

There was a discrepancy between the noise models in the CPU and GPU versions, which has since been fixed (see this issue). Therefore, to reproduce the results in the paper with our pre-trained weights, you need to use the CPU version's noise model.

Download Data

We need two datasets to run this repo:

  1. Gibson scene dataset
  2. PointGoal Navigation splits; specifically, pointnav_gibson_v2.zip.

Please follow Habitat's instructions to download them. We assume all data is placed under ./dataset with the following structure:

.
+-- dataset
|  +-- Gibson
|  |  +-- gibson
|  |  |  +-- Adrian.glb
|  |  |  +-- Adrian.navmesh
|  |  |  ...
|  +-- habitat_datasets
|  |  +-- pointnav
|  |  |  +-- gibson
|  |  |  |  +-- v2
|  |  |  |  |  +-- train
|  |  |  |  |  +-- val
|  |  |  |  |  +-- val_mini
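
A possible way to lay the data out as above (a sketch only; the Gibson scenes and pointnav_gibson_v2.zip are downloaded per Habitat's instructions, and the exact internal layout of the zip may require adjusting the paths):

mkdir -p dataset/Gibson/gibson dataset/habitat_datasets/pointnav/gibson/v2
# copy the downloaded Gibson .glb / .navmesh files into dataset/Gibson/gibson/
unzip pointnav_gibson_v2.zip -d dataset/habitat_datasets/pointnav/gibson/v2/
# the goal is to end up with dataset/habitat_datasets/pointnav/gibson/v2/{train,val,val_mini}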

Reproduce

Download the pretrained checkpoints of the RL navigation policy and the VO module from this link. Put them under pretrained_ckpts with the following structure:

.
+-- pretrained_ckpts
|  +-- rl
|  |  +-- no_tune
|  |  |  +-- rl_no_tune.pth
|  |  +-- tune_vo
|  |  |  +-- rl_tune_vo.pth
|  +-- vo
|  |  +-- act_forward.pth
|  |  +-- act_left_right_inv_joint.pth

Run the following command to reproduce the navigation results. On an Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz with an NVIDIA GeForce GTX 1080 Ti, evaluating all 994 episodes with the VO-tuned navigation policy takes around 4.5 hours.

cd /path/to/this/repo
export POINTNAV_VO_ROOT=$PWD

export NUMBA_NUM_THREADS=1 && \
export NUMBA_THREADING_LAYER=workqueue && \
conda activate pointnav-vo && \
python ${POINTNAV_VO_ROOT}/launch.py \
--repo-path ${POINTNAV_VO_ROOT} \
--n_gpus 1 \
--task-type rl \
--noise 1 \
--run-type eval \
--addr 127.0.1.1 \
--port 8338

Use VO as a Drop-in Module

We provide a class BaseRLTrainerWithVO in base_trainer_with_vo.py that contains all functions necessary to compute odometry. Specifically, you can use _compute_local_delta_states_from_vo to compute odometry from adjacent observations. The code structure will look something like:

# estimate the relative pose change (local delta states) between adjacent observations
local_delta_states = _compute_local_delta_states_from_vo(prev_obs, cur_obs, action)
# shift the goal into the agent's new egocentric frame using that estimate
cur_goal = compute_goal_pos(prev_goal, local_delta_states)

For a better sense of how to use this function, please refer to challenge2020_agent.py, the agent we used in the Habitat Challenge 2020.
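
As a rough sketch of that pattern (hypothetical wrapper code, not the repo's actual agent; the class name, attribute names, and the way the trainer and the compute_goal_pos helper are passed in are all assumptions):

class EgoGoalTracker:
    """Hypothetical helper that keeps the goal up to date from VO estimates."""

    def __init__(self, vo_trainer, compute_goal_pos, initial_goal):
        self.vo_trainer = vo_trainer              # a BaseRLTrainerWithVO instance
        self.compute_goal_pos = compute_goal_pos  # the helper used in the snippet above
        self.cur_goal = initial_goal              # goal in the agent's egocentric frame
        self.prev_obs = None                      # observation from the previous step

    def update(self, cur_obs, action):
        if self.prev_obs is not None:
            # VO estimate of the pose change between the two adjacent observations
            local_delta_states = self.vo_trainer._compute_local_delta_states_from_vo(
                self.prev_obs, cur_obs, action
            )
            # re-express the goal relative to the agent's new pose
            self.cur_goal = self.compute_goal_pos(self.cur_goal, local_delta_states)
        self.prev_obs = cur_obs
        return self.cur_goal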

Train Your Own VO

See details in TRAIN.md

Citation

Please cite the following paper if you find our model useful. Thanks!

Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, and Alexander Schwing. The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation. ICCV 2021.

@inproceedings{ZhaoICCV2021,
  title={{The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation}},
  author={Xiaoming Zhao and Harsh Agrawal and Dhruv Batra and Alexander Schwing},
  booktitle={Proc. ICCV},
  year={2021},
}
Comments
  • VO model training strategy

    This is a follow-up question to the issue "VO model training #10". You mentioned that for VO training, "For the 1-million-entry dataset we used in the paper, it needs 4-5 days to complete the whole training on a GeForce RTX 2080 Ti."

    1. Does the 4-5 day training period cover move_forward, turn_left, turn_right, and the joint training of turn_left and turn_right, or does each action take 4-5 days? Can you please clarify?

    2. Is there a way to train all the actions together?

    opened by AshwiniUthir 6
  • Parallel training Not working

    We tried to run multiple trainings in parallel by cloning datasets/vo_datasets into multiple copies. But even then, the training stopped with the following error:

    Traceback (most recent call last):
      File "./pointnav_vo/run.py", line 347, in <module>
        main()
      File "./pointnav_vo/run.py", line 75, in main
        run_exp(**vars(args))
      File "./pointnav_vo/run.py", line 313, in run_exp
        trainer.train()
      File "/home/praneeth/PointNav-VO/pointnav_vo/vo/engine/vo_cnn_regression_geo_invariance_engine.py", line 841, in train
        batch_data = next(train_iter)
      File "/home/praneeth/anaconda3/envs/pointnav-vo/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
        data = self._next_data()
      File "/home/praneeth/anaconda3/envs/pointnav-vo/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
        idx, data = self._get_data()
      File "/home/praneeth/anaconda3/envs/pointnav-vo/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 928, in _get_data
        raise RuntimeError('DataLoader timed out after {} seconds'.format(self._timeout))
    RuntimeError: DataLoader timed out after 300 seconds

    Initially we thought this might be because both trainings were accessing the same dataset, since the dataset is not loaded into memory; that is why we cloned the dataset in the first place, pointing one training at dataset1 and the other at dataset2. We made these changes in the configs/vo/vo_pointnav.yaml config file. But even then we faced the same issue, and the training stopped after some epochs.

    Do you think there is a chance that the config file is also not kept in memory and is being re-read after every epoch? Or do you feel there is anything in the code that might cause this error?

    I'd like to know what we should do if we need to run multiple trainings in parallel, with one GPU allocated to each training.

    opened by Praneethcruzon 3
  • Multi GPU training not working

    Multi-GPU training of the VO model is broken/not working. The n-gpu argument simply doesn't do anything. Tweaking the code to run on multiple GPUs using PyTorch parallelism also doesn't work; the scripts hung indefinitely, which seems to be a resource allocation issue.

    Is there a possibility of running VO model training across multiple GPUs? We could really make use of it to speed it up.

    opened by Praneethcruzon 3
  • VO model training

    Can you please let me know how long the VO model training takes to run (with the default configuration)? How long does it take to create one checkpoint?

    opened by AshwiniUthir 3
  • Recreating pretrained checkpoints

    How can I recreate the pretrained_ckpts for rl and vo that you have provided (rl_no_tune.pth, rl_tune_vo.pth, act_forward.pth, act_left_right_inv_joint.pth)? When I retrained the RL network, it created more than one hundred checkpoints (I manually stopped it at 103). Where can I specify the number of checkpoints to be created?

    opened by AshwiniUthir 3
  • Is the RL pretrained model correct?

    I downloaded the pretrained RL model from https://drive.google.com/drive/folders/1HG_d-PydxBBiDSnqG_GXAuG78Iq3uGdr. I found that backbone.conv1.0.weight has shape [32, 1, 7, 7]. Is the pretrained model trained with depth only? If it used RGB-D, the weight shape would need to be [32, 4, 7, 7].

    opened by cyj5030 2
  • Understanding of accumulated prediction error

    Thanks a lot for sharing this great work. I have a doubt about the following. From my understanding, the visual odometry module estimates the one-step transformation H(C_t -> C_{t+1}), and the estimation error is nonzero. If we test the pre-trained model on a trajectory of N steps, the estimation error at step 0 may affect the prediction errors at steps 1, 2, 3, ..., N-1. It seems that a larger accumulated estimation error would be reached at a larger step number. Is this correct? In your test experiments, is there any phenomenon that demonstrates this claim?
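
    In symbols (LaTeX notation; this merely restates the concern above, not notation from the paper): with per-step VO estimates \hat{H}_t \approx H(C_t \to C_{t+1}), the pose after N steps comes from the composition

    \hat{T}_N = \hat{H}_{N-1} \cdots \hat{H}_1 \hat{H}_0,

    so an error in any single \hat{H}_t enters every later pose estimate, and the accumulated drift generally grows with N.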

    opened by AgentEXPL 2
  • Evaluation with noise-free sensors

    Hi,

    Thanks for your work~

    Did you evaluate the provided model with noise-free sensors (e.g., the GaussianNoiseModel in RGB) or with the GPS+Compass sensor? I found unsatisfactory results after doing this, which confused me a lot. Is there any explanation, or did I do something wrong? Thanks a lot~

    opened by StOnEGiggity 2
  • Requesting checkpoints for models that don't require action input

    Hey @Xiaoming-Zhao, thanks for this amazing contribution and for sharing the code.

    I see that the pretrained models you have shared only include the action-conditioned ones. Could you share the ones that do not require action input?

    Thanks!

    opened by mukulkhanna 2
  • Simulator.cpp(65)::~Simulator : Deconstructing Simulator

    Environment creation successful
    [20:41:53:580240]:[Physics] PhysicsManager.h(503)::addArticulatedObjectFromURDF : Not implemented in base PhysicsManager.
    [20:41:53:580266]:[Core] ManagedContainerBase.h(205)::getObjectHandleByID : Unknown ArticulatedObject managed object ID: -1 . Aborting
    [20:41:53:580296]:[Core] ManagedContainerBase.h(329)::checkExistsWithMessage : ::getObjectCopyByID : Unknown ArticulatedObject managed object handle : . Aborting
    [20:41:53:580356]:[Physics] PhysicsManager.cpp(50)::~PhysicsManager : Deconstructing PhysicsManager
    [20:41:53:580456]:[Scene] SceneManager.h(25)::~SceneManager : Deconstructing SceneManager
    [20:41:53:580463]:[Scene] SceneGraph.h(25)::~SceneGraph : Deconstructing SceneGraph
    [20:41:53:580758]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:580858]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:580998]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:581013]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:581049]:[Scene] SemanticScene.h(47)::~SemanticScene : Deconstructing SemanticScene
    [20:41:53:584969]:[Gfx] Renderer.cpp(72)::~Impl : Deconstructing Renderer
    [20:41:53:584996]:[Gfx] WindowlessContext.h(17)::~WindowlessContext : Deconstructing WindowlessContext
    Traceback (most recent call last):
      File "examples/example.py", line 28, in <module>
        example()
      File "examples/example.py", line 17, in example
        observations = env.reset()  # noqa: F841
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/env.py", line 253, in reset
        self.reconfigure(self._config)
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/env.py", line 339, in reconfigure
        self._sim.reconfigure(self._config.SIMULATOR)
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 171, in reconfigure
        self.robot.reconfigure()
      File "/home/desktop-obs-68/miniconda3/envs/pointnav-vo/lib/python3.7/site-packages/habitat_sim/robots/fetch_robot.py", line 75, in reconfigure
        super().reconfigure()
      File "/home/desktop-obs-68/miniconda3/envs/pointnav-vo/lib/python3.7/site-packages/habitat_sim/robots/mobile_manipulator.py", line 161, in reconfigure
        self.sim_obj.auto_clamp_joint_limits = True
    AttributeError: 'NoneType' object has no attribute 'auto_clamp_joint_limits'
    [20:41:53:712115]:[Sim] Simulator.cpp(65)::~Simulator : Deconstructing Simulator

    This issue occurred while executing python examples/example.py.

    It should be noted that we used the latest versions of Habitat-lab and Habitat-sim, as we ran into a missing config file issue with the recommended commits of the repos.

    opened by PraneethRavichandran 1
  • ZeroDivisionError: float division by zero

    694th chunk size: 256

    [177700 / 1000000] ep: 4008, 0.56s / episode; remain: 10326.45s

    [177750 / 1000000] ep: 4009, 0.56s / episode; remain: 10325.04s

    [177800 / 1000000] ep: 4009, 0.56s / episode; remain: 10323.50s

    [177850 / 1000000] ep: 4010, 0.56s / episode; remain: 10322.19s

    Traceback (most recent call last):
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 687, in <module>
        main()
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 682, in main
        obs_transformer=obs_transformer,
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 549, in generate_datasets
        obs_transformer=obs_transformer,
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 401, in generate_one_dataset
        prev_obs = env.reset()
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/env.py", line 259, in reset
        observations=observations,
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/embodied_task.py", line 162, in reset_measures
        measure.reset_metric(*args, **kwargs)
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/tasks/nav/nav.py", line 599, in reset_metric
        episode=episode, task=task, *args, **kwargs
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/tasks/nav/nav.py", line 620, in update_metric
        self._start_end_episode_distance, self._agent_episode_distance
    ZeroDivisionError: float division by zero


    A little context about the issue: during installation we had to use the newest versions of Habitat-lab and Habitat-sim. In addition, only the nightly installation succeeded: conda install habitat-sim -c conda-forge -c aihabitat-nightly.

    The issue occurred when we ran generate_datasets.py with the following arguments:

    --config_f /home/desktop-obs-68/projects/visual-odometry/PointNav-VO/configs/point_nav_habitat_challenge_2020.yaml
    --train_scene_dir ./dataset/habitat_datasets/pointnav/gibson/v2/train/content
    --val_scene_dir ./dataset/habitat_datasets/pointnav/gibson/v2/val/content
    --save_dir ./dataset/vo_dataset
    --data_version v2
    --vis_size_w 341
    --vis_size_h 192
    --obs_transform none
    --act_type -1
    --rnd_p 1.0
    --N_list 1000000
    --name_list train

    opened by PraneethRavichandran 1