Pathdreamer: A World Model for Indoor Navigation

Overview

This repository hosts the open-source code for Pathdreamer, presented at ICCV 2021.

Paper | Project Webpage | Colab Demo

Video Results

(Embedded video omitted.)

Setup instructions

Environment

Set up a virtualenv and install the required libraries:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Add the Pathdreamer library to PYTHONPATH:

export PYTHONPATH=$PYTHONPATH:/home/path/to/pathdreamer_root/
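
If the path is set correctly, the following import succeeds; if not, Python raises ModuleNotFoundError: No module named 'pathdreamer' (the error reported in the comments below):

# Minimal sanity check that the Pathdreamer root is on PYTHONPATH.
import pathdreamer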

Downloading Pretrained Checkpoints

We provide pretrained checkpoints, which can be downloaded by running:

wget https://storage.googleapis.com/gresearch/pathdreamer/ckpt.tar -P data/
tar -xf data/ckpt.tar --directory data/

The checkpoints will be extracted to the data/ckpt directory. Two checkpoints are provided: one for the Stage 1 model (Structure Generator) and one for the Stage 2 model (Image Generator).
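
At inference time the two stages are chained: the Structure Generator predicts semantics and depth for the next viewpoint, the Image Generator renders an RGB panorama from that structure, and the predictions are fed back so later viewpoints stay consistent. The outline below is an illustrative sketch only; every name in it is a hypothetical placeholder rather than the repository's actual API (see the Colab demo for the real entry points).

# Hypothetical sketch of the two-stage inference loop; every name below is a
# placeholder, NOT the actual Pathdreamer API. The stubs exist only to make
# the outline self-contained and runnable.

def encode_observations(observations):
    # Placeholder: Pathdreamer accumulates previous observations into a
    # 3D state that conditions future predictions.
    return list(observations)

def structure_generator(state, position):
    # Placeholder for Stage 1: predict semantic segmentation and depth for
    # the viewpoint at `position`, conditioned on the accumulated state.
    return f'semantics@{position}', f'depth@{position}'

def image_generator(semantics, depth):
    # Placeholder for Stage 2: render a photorealistic RGB panorama from
    # the predicted structure.
    return f'rgb({semantics}, {depth})'

def rollout(observations, path):
    state = encode_observations(observations)
    predictions = []
    for position in path:
        semantics, depth = structure_generator(state, position)  # Stage 1
        rgb = image_generator(semantics, depth)                  # Stage 2
        state.append((position, semantics, depth))  # feed predictions back in
        predictions.append(rgb)
    return predictions

print(rollout(['initial_pano'], [(1.0, 0.0), (2.0, 0.0)]))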

Colab Demo

Pathdreamer_Example_Colab.ipynb [click to launch in Google Colab] shows how to set up and run the pretrained Pathdreamer model for inference. It includes examples of synthesizing image sequences and continuous video sequences for arbitrary navigation trajectories.

Citation

If you find this work useful, please consider citing:

@inproceedings{koh2021pathdreamer,
  title={Pathdreamer: A World Model for Indoor Navigation},
  author={Koh, Jing Yu and Lee, Honglak and Yang, Yinfei and Baldridge, Jason and Anderson, Peter},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

License

Pathdreamer is released under the Apache 2.0 license. The Matterport3D dataset is governed by the Matterport3D Terms of Use.

Disclaimer

Not an official Google product.

Comments
  • Questions about positions input and equirectangular image

    Thanks for your great work! I believe this work could promote the development of model-based methods for VLN! May I ask some questions? 1. Predicting a further panorama needs the positions and orientations of the agent, but I note that in your demo, only the positions are input to the model. How does this work? Does the agent implicitly calculate the orientations? 2. How do you get the equirectangular image in the Matterport3D Simulator (the simulator is based on a skybox image for each viewpoint)? Do you have any scripts? (See the conversion sketch after these comments.)

    opened by MarSaKi 4
  • UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

    While running both the image generation and video generation models on Colab, I run into this error even though I selected a GPU with a MirroredStrategy. (Screenshot omitted.)

    Please help with this cuDNN error. At first glance, it seems to have something to do with the sample_noise=False argument. (See the general workaround sketched after these comments.)

    opened by Raghvender1205 4
  • test_pathdreamer.py does not work

    https://github.com/google-research/pathdreamer/blob/dc607faf3a6d3011ddd2e4723d53122235774167/test_pathdreamer.py#L15

    Running the "test_pathdreamer.py" gets the following result. How to get through this?

    ModuleNotFoundError: No module named 'pathdreamer'

    opened by AgentEXPL 1
  • How to change the camera setting and train a new model?

    I want to change the camera setting, e.g., make the camera look more toward the ceiling rather than at a horizontal plane. To do this, I need to make some changes. It would be a great help to know which file is used for training a new model.

    opened by AgentEXPL 0
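
Notes on the comments above

On the skybox question: converting a cubemap (skybox) to an equirectangular panorama is a standard resampling step: map each output pixel to a direction on the unit sphere, pick the cube face along the dominant axis, and sample that face. The NumPy sketch below assumes a particular face naming and orientation convention; the actual Matterport3D skybox layout may differ, so the face order and signs would need checking against the data.

import numpy as np

def cube_to_equirect(faces, out_h, out_w):
    """faces: dict of six (S, S, 3) arrays keyed by 'front', 'back',
    'left', 'right', 'up', 'down' (an assumed convention)."""
    S = faces['front'].shape[0]
    jj, ii = np.meshgrid(np.arange(out_w), np.arange(out_h))
    theta = (jj + 0.5) / out_w * 2 * np.pi - np.pi  # longitude in [-pi, pi)
    phi = np.pi / 2 - (ii + 0.5) / out_h * np.pi    # latitude in (-pi/2, pi/2)
    x = np.cos(phi) * np.sin(theta)
    y = np.sin(phi)
    z = np.cos(phi) * np.cos(theta)
    ax, ay, az = np.abs(x), np.abs(y), np.abs(z)
    # Clamp denominators to avoid divide-by-zero outside each face's mask.
    dx, dy, dz = np.maximum(ax, 1e-9), np.maximum(ay, 1e-9), np.maximum(az, 1e-9)
    out = np.zeros((out_h, out_w, 3), dtype=faces['front'].dtype)
    # (face name, dominant-axis mask, u, v); signs are an assumed convention.
    lookups = [
        ('right', (ax >= ay) & (ax >= az) & (x > 0), -z / dx, -y / dx),
        ('left',  (ax >= ay) & (ax >= az) & (x <= 0),  z / dx, -y / dx),
        ('up',    (ay > ax) & (ay >= az) & (y > 0),    x / dy,  z / dy),
        ('down',  (ay > ax) & (ay >= az) & (y <= 0),   x / dy, -z / dy),
        ('front', (az > ax) & (az > ay) & (z > 0),     x / dz, -y / dz),
        ('back',  (az > ax) & (az > ay) & (z <= 0),   -x / dz, -y / dz),
    ]
    for name, mask, u, v in lookups:
        # Map face coordinates from [-1, 1] to pixel indices (nearest neighbour).
        cols = np.clip(((u[mask] + 1) / 2 * (S - 1)).round().astype(int), 0, S - 1)
        rows = np.clip(((v[mask] + 1) / 2 * (S - 1)).round().astype(int), 0, S - 1)
        out[mask] = faces[name][rows, cols]
    return out

On the cuDNN error: "Failed to get convolution algorithm" in TensorFlow 2 is frequently a GPU memory allocation problem rather than a model bug. A common general workaround (a generic TF2 remedy, not a confirmed fix for this specific issue) is to enable GPU memory growth before any ops run:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front;
# this must run before any GPU ops are executed.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)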