PointNav-VO

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

Project Page | Paper

Table of Contents

  • Setup
  • Reproduce
  • Use VO as a Drop-in Module
  • Train Your Own VO
  • Citation

Setup

Install Dependencies

conda env create -f environment.yml
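
The environment defined by environment.yml should be named pointnav-vo (the name activated by the evaluation command later in this README); activate it before continuing:

conda activate pointnav-vo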

Install Habitat

This repo has been tested with the following commits of habitat-lab and habitat-sim:

habitat-lab == d0db1b55be57abbacc5563dca2ca14654c545552
habitat-sim == 020041d75eaf3c70378a9ed0774b5c67b9d3ce99
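
A possible way to pin both repos to these commits (a sketch only; the clone locations and the habitat-lab install command are assumptions, not steps prescribed by this repo):

git clone https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
git checkout d0db1b55be57abbacc5563dca2ca14654c545552
pip install -e .   # assumed install command for habitat-lab
cd ..

git clone https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim
git checkout 020041d75eaf3c70378a9ed0774b5c67b9d3ce99
# habitat-sim is then built from source with the headless flag shown below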

Note: to align with the Habitat Challenge 2020 settings (see Step 36 in the Dockerfile), we compiled habitat-sim without CUDA support when installing it:

python setup.py install --headless

There was a discrepancy between the noise models in the CPU and GPU versions, which has since been fixed (see this issue). Therefore, to reproduce the results in the paper with our pre-trained weights, you need to use the CPU version's noise model.

Download Data

We need two datasets to run this repo:

  1. Gibson scene dataset
  2. PointGoal Navigation splits; specifically, pointnav_gibson_v2.zip.

Please follow Habitat's instructions to download them. We assume all data is placed under ./dataset with the following structure:

.
+-- dataset
|  +-- Gibson
|  |  +-- gibson
|  |  |  +-- Adrian.glb
|  |  |  +-- Adrian.navmesh
|  |  |  ...
|  +-- habitat_datasets
|  |  +-- pointnav
|  |  |  +-- gibson
|  |  |  |  +-- v2
|  |  |  |  |  +-- train
|  |  |  |  |  +-- val
|  |  |  |  |  +-- val_mini
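
A possible way to lay the data out as above (a sketch only; the Gibson scenes and pointnav_gibson_v2.zip are downloaded per Habitat's instructions, and the exact internal layout of the zip may require adjusting the paths):

mkdir -p dataset/Gibson/gibson dataset/habitat_datasets/pointnav/gibson/v2
# copy the downloaded Gibson .glb / .navmesh files into dataset/Gibson/gibson/
unzip pointnav_gibson_v2.zip -d dataset/habitat_datasets/pointnav/gibson/v2/
# the goal is to end up with dataset/habitat_datasets/pointnav/gibson/v2/{train,val,val_mini}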

Reproduce

Download the pretrained checkpoints of the RL navigation policy and the VO module from this link. Put them under pretrained_ckpts with the following structure:

.
+-- pretrained_ckpts
|  +-- rl
|  |  +-- no_tune
|  |  |  +-- rl_no_tune.pth
|  |  +-- tune_vo
|  |  |  +-- rl_tune_vo.pth
|  +-- vo
|  |  +-- act_forward.pth
|  |  +-- act_left_right_inv_joint.pth

Run the following command to reproduce the navigation results. On an Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz with an NVIDIA GeForce GTX 1080 Ti, evaluating all 994 episodes with the VO-tuned navigation policy takes around 4.5 hours.

cd /path/to/this/repo
export POINTNAV_VO_ROOT=$PWD

export NUMBA_NUM_THREADS=1 && \
export NUMBA_THREADING_LAYER=workqueue && \
conda activate pointnav-vo && \
python ${POINTNAV_VO_ROOT}/launch.py \
--repo-path ${POINTNAV_VO_ROOT} \
--n_gpus 1 \
--task-type rl \
--noise 1 \
--run-type eval \
--addr 127.0.1.1 \
--port 8338

Use VO as a Drop-in Module

We provide a class BaseRLTrainerWithVO in base_trainer_with_vo.py that contains all functions necessary to compute odometry. Specifically, you can use _compute_local_delta_states_from_vo to compute odometry from adjacent observations. The code structure will look something like:

# estimate the relative pose change (local delta states) between adjacent observations
local_delta_states = _compute_local_delta_states_from_vo(prev_obs, cur_obs, action)
# shift the goal into the agent's new egocentric frame using that estimate
cur_goal = compute_goal_pos(prev_goal, local_delta_states)

For a better sense of how to use this function, please refer to challenge2020_agent.py, the agent we used in the Habitat Challenge 2020.
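
As a rough sketch of that pattern (hypothetical wrapper code, not the repo's actual agent; the class name, attribute names, and the way the trainer and the compute_goal_pos helper are passed in are all assumptions):

class EgoGoalTracker:
    """Hypothetical helper that keeps the goal up to date from VO estimates."""

    def __init__(self, vo_trainer, compute_goal_pos, initial_goal):
        self.vo_trainer = vo_trainer              # a BaseRLTrainerWithVO instance
        self.compute_goal_pos = compute_goal_pos  # the helper used in the snippet above
        self.cur_goal = initial_goal              # goal in the agent's egocentric frame
        self.prev_obs = None                      # observation from the previous step

    def update(self, cur_obs, action):
        if self.prev_obs is not None:
            # VO estimate of the pose change between the two adjacent observations
            local_delta_states = self.vo_trainer._compute_local_delta_states_from_vo(
                self.prev_obs, cur_obs, action
            )
            # re-express the goal relative to the agent's new pose
            self.cur_goal = self.compute_goal_pos(self.cur_goal, local_delta_states)
        self.prev_obs = cur_obs
        return self.cur_goal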

Train Your Own VO

See details in TRAIN.md

Citation

Please cite the following paper if you find our model useful. Thanks!

Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, and Alexander Schwing. The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation. ICCV 2021.

@inproceedings{ZhaoICCV2021,
  title={{The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation}},
  author={Xiaoming Zhao and Harsh Agrawal and Dhruv Batra and Alexander Schwing},
  booktitle={Proc. ICCV},
  year={2021},
}
Comments
  • VO model training strategy

    This is a follow-up question to the issue "VO model training #10". You mentioned that for VO training, "For the 1-million-entry dataset we used in the paper, it needs 4-5 days to complete the whole training on a GeForce RTX 2080 Ti."

    1. Does the 4-5 day training period cover move_forward, turn_left, turn_right, and the joint training of turn_left and turn_right, or does each action take 4-5 days? Can you please clarify?

    2. Is there a way to train all the actions together?

    opened by AshwiniUthir 6
  • Parallel training Not working

    We tried to run multiple trainings in parallel by cloning datasets/vo_datasets into multiple copies. But even then, the training stopped with the following error:

    Traceback (most recent call last):
      File "./pointnav_vo/run.py", line 347, in <module>
        main()
      File "./pointnav_vo/run.py", line 75, in main
        run_exp(**vars(args))
      File "./pointnav_vo/run.py", line 313, in run_exp
        trainer.train()
      File "/home/praneeth/PointNav-VO/pointnav_vo/vo/engine/vo_cnn_regression_geo_invariance_engine.py", line 841, in train
        batch_data = next(train_iter)
      File "/home/praneeth/anaconda3/envs/pointnav-vo/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
        data = self._next_data()
      File "/home/praneeth/anaconda3/envs/pointnav-vo/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 974, in _next_data
        idx, data = self._get_data()
      File "/home/praneeth/anaconda3/envs/pointnav-vo/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 928, in _get_data
        raise RuntimeError('DataLoader timed out after {} seconds'.format(self._timeout))
    RuntimeError: DataLoader timed out after 300 seconds

    Initially we thought this might be because both trainings were accessing the same dataset, since the dataset is not loaded into memory; that is why we cloned the dataset in the first place, pointing one training at dataset1 and the other at dataset2. We made these changes in the configs/vo/vo_pointnav.yaml config file. But even then we faced the same issue, and the training stopped after some epochs.

    Do you think there is a chance that the config file is also not kept in memory and is being re-read after every epoch? Or do you feel there is anything in the code that might cause this error?

    I'd like to know what we should do if we need to run multiple trainings in parallel, with one GPU allocated to each training.

    opened by Praneethcruzon 3
  • Multi GPU training not working

    Multi-GPU training of the VO model is broken/not working. The n-gpu argument simply doesn't do anything. Tweaking the code to run on multiple GPUs using PyTorch parallelism also doesn't work; the scripts hung indefinitely, which seems to be a resource allocation issue.

    Is there a possibility of running VO model training across multiple GPUs? We could really make use of it to speed it up.

    opened by Praneethcruzon 3
  • VO model training

    Can you please let me know how long the VO model training takes to run (with the default configuration)? How long does it take to create one checkpoint?

    opened by AshwiniUthir 3
  • Recreating pretrained checkpoints

    How can I recreate the pretrained_ckpts for rl and vo that you have provided (rl_no_tune.pth, rl_tune_vo.pth, act_forward.pth, act_left_right_inv_joint.pth)? When I retrained the RL network, it created more than one hundred checkpoints (I manually stopped it at 103). Where can I specify the number of checkpoints to be created?

    opened by AshwiniUthir 3
  • Is the RL pretrained model correct?

    I downloaded the pretrained RL model from https://drive.google.com/drive/folders/1HG_d-PydxBBiDSnqG_GXAuG78Iq3uGdr. I found that backbone.conv1.0.weight has shape [32, 1, 7, 7]. Is the pretrained model trained with depth only? If it used RGB-D, the weight shape would need to be [32, 4, 7, 7].

    opened by cyj5030 2
  • Understanding of accumulated prediction error

    Thanks a lot for sharing this great work. I have a doubt about the following. From my understanding, the visual odometry module estimates the one-step transformation H(C_t -> C_{t+1}), and the estimation error is nonzero. If we test the pre-trained model on a trajectory of N steps, the estimation error at step 0 may affect the prediction errors at steps 1, 2, 3, ..., N-1. It seems that a larger accumulated estimation error would be reached at a larger step number. Is this correct? In your test experiments, is there any phenomenon that demonstrates this claim?
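
    In symbols (LaTeX notation; this merely restates the concern above, not notation from the paper): with per-step VO estimates \hat{H}_t \approx H(C_t \to C_{t+1}), the pose after N steps comes from the composition

    \hat{T}_N = \hat{H}_{N-1} \cdots \hat{H}_1 \hat{H}_0,

    so an error in any single \hat{H}_t enters every later pose estimate, and the accumulated drift generally grows with N.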

    opened by AgentEXPL 2
  • Evaluation with noise-free sensors

    Hi,

    Thanks for your work~

    Did you evaluate the provided model with noise-free sensors (e.g., the GaussianNoiseModel in RGB) or with the GPS+Compass sensor? I found unsatisfactory results after doing this, which confused me a lot. Is there any explanation, or did I do something wrong? Thanks a lot~

    opened by StOnEGiggity 2
  • Requesting checkpoints for models that don't require action input

    Hey @Xiaoming-Zhao, thanks for this amazing contribution and for sharing the code.

    I see that the pretrained models you have shared only include the action-conditioned ones. Could you share the ones that do not require action input?

    Thanks!

    opened by mukulkhanna 2
  • Simulator.cpp(65)::~Simulator : Deconstructing Simulator

    Environment creation successful
    [20:41:53:580240]:[Physics] PhysicsManager.h(503)::addArticulatedObjectFromURDF : Not implemented in base PhysicsManager.
    [20:41:53:580266]:[Core] ManagedContainerBase.h(205)::getObjectHandleByID : Unknown ArticulatedObject managed object ID: -1 . Aborting
    [20:41:53:580296]:[Core] ManagedContainerBase.h(329)::checkExistsWithMessage : ::getObjectCopyByID : Unknown ArticulatedObject managed object handle : . Aborting
    [20:41:53:580356]:[Physics] PhysicsManager.cpp(50)::~PhysicsManager : Deconstructing PhysicsManager
    [20:41:53:580456]:[Scene] SceneManager.h(25)::~SceneManager : Deconstructing SceneManager
    [20:41:53:580463]:[Scene] SceneGraph.h(25)::~SceneGraph : Deconstructing SceneGraph
    [20:41:53:580758]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:580858]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:580998]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:581013]:[Sensor] Sensor.cpp(69)::~Sensor : Deconstructing Sensor
    [20:41:53:581049]:[Scene] SemanticScene.h(47)::~SemanticScene : Deconstructing SemanticScene
    [20:41:53:584969]:[Gfx] Renderer.cpp(72)::~Impl : Deconstructing Renderer
    [20:41:53:584996]:[Gfx] WindowlessContext.h(17)::~WindowlessContext : Deconstructing WindowlessContext
    Traceback (most recent call last):
      File "examples/example.py", line 28, in <module>
        example()
      File "examples/example.py", line 17, in example
        observations = env.reset()  # noqa: F841
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/env.py", line 253, in reset
        self.reconfigure(self._config)
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/env.py", line 339, in reconfigure
        self._sim.reconfigure(self._config.SIMULATOR)
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 171, in reconfigure
        self.robot.reconfigure()
      File "/home/desktop-obs-68/miniconda3/envs/pointnav-vo/lib/python3.7/site-packages/habitat_sim/robots/fetch_robot.py", line 75, in reconfigure
        super().reconfigure()
      File "/home/desktop-obs-68/miniconda3/envs/pointnav-vo/lib/python3.7/site-packages/habitat_sim/robots/mobile_manipulator.py", line 161, in reconfigure
        self.sim_obj.auto_clamp_joint_limits = True
    AttributeError: 'NoneType' object has no attribute 'auto_clamp_joint_limits'
    [20:41:53:712115]:[Sim] Simulator.cpp(65)::~Simulator : Deconstructing Simulator

    This issue occurred while executing python examples/example.py.

    It should be noted that we used the latest versions of Habitat-lab and Habitat-sim, as we ran into a missing config file issue with the recommended commits of the repos.

    opened by PraneethRavichandran 1
  • ZeroDivisionError: float division by zero

    694th chunk size: 256

    [177700 / 1000000] ep: 4008, 0.56s / episode; remain: 10326.45s

    [177750 / 1000000] ep: 4009, 0.56s / episode; remain: 10325.04s

    [177800 / 1000000] ep: 4009, 0.56s / episode; remain: 10323.50s

    [177850 / 1000000] ep: 4010, 0.56s / episode; remain: 10322.19s

    Traceback (most recent call last):
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 687, in <module>
        main()
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 682, in main
        obs_transformer=obs_transformer,
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 549, in generate_datasets
        obs_transformer=obs_transformer,
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/pointnav_vo/vo/dataset/generate_datasets.py", line 401, in generate_one_dataset
        prev_obs = env.reset()
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/env.py", line 259, in reset
        observations=observations,
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/core/embodied_task.py", line 162, in reset_measures
        measure.reset_metric(*args, **kwargs)
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/tasks/nav/nav.py", line 599, in reset_metric
        episode=episode, task=task, *args, **kwargs
      File "/home/desktop-obs-68/projects/visual-odometry/PointNav-VO/habitat-lab/habitat/tasks/nav/nav.py", line 620, in update_metric
        self._start_end_episode_distance, self._agent_episode_distance
    ZeroDivisionError: float division by zero


    A little context about the issue: during installation we had to use the newest versions of Habitat-lab and Habitat-sim. In addition, only the nightly installation succeeded: conda install habitat-sim -c conda-forge -c aihabitat-nightly.

    The issue occurred when we ran generate_datasets.py with the following arguments:

    --config_f /home/desktop-obs-68/projects/visual-odometry/PointNav-VO/configs/point_nav_habitat_challenge_2020.yaml
    --train_scene_dir ./dataset/habitat_datasets/pointnav/gibson/v2/train/content
    --val_scene_dir ./dataset/habitat_datasets/pointnav/gibson/v2/val/content
    --save_dir ./dataset/vo_dataset
    --data_version v2
    --vis_size_w 341
    --vis_size_h 192
    --obs_transform none
    --act_type -1
    --rnd_p 1.0
    --N_list 1000000
    --name_list train

    opened by PraneethRavichandran 1