PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Last update: Nov 21, 2022


Overview

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation

This repository is the PyTorch implementation of our paper:

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
International Conference on Robotics and Automation (ICRA), 2021

[Project Page] [arXiv] [GitHub]

Installation

Clone the current repository and required submodules:

git clone https://github.com/GT-RIPL/robo-vln
cd robo-vln
  
export robovln_rootdir=$PWD
    
git submodule init 
git submodule update

Habitat and Other Dependencies

Install robo-vln dependencies as follows:

conda create -n habitat python=3.6 cmake=3.14.0
# activate the new environment so the packages below are installed into it
conda activate habitat
cd $robovln_rootdir
python -m pip install -r requirements.txt

We use modified versions of Habitat-Sim and Habitat-API to support continuous control/action spaces in the Habitat simulator. Details on the continuous action space and on converting the discrete VLN dataset into a continuous control formulation can be found in our paper. The modified Habitat-Sim and Habitat-API versions are installed from the environments directory as follows:

# installs both habitat-api and habitat_baselines
cd $robovln_rootdir/environments/habitat-lab
python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all
	
# Install habitat-sim
cd $robovln_rootdir/environments/habitat-sim
python setup.py install --headless --with-cuda

Data

Similar to Habitat-API, we expect a data folder (or symlink) with a particular structure in the top-level directory of this project.

Matterport3D

We use Matterport3D (MP3D) photo-realistic scene reconstructions to train and evaluate our agent. Robo-VLN uses a total of 90 Matterport3D scenes. The official Matterport3D download link and associated instructions are available on the project webpage. To download the scenes needed for robo-vln, run the following command:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract this data to data/scene_datasets/mp3d such that it has the form data/scene_datasets/mp3d/{scene}/{scene}.glb.
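The expected layout can be checked with a short script before training. This is a minimal sketch; the helper name find_missing_scenes is hypothetical and not part of robo-vln, and it only checks that each scene folder contains a .glb file with the same name.

```python
from pathlib import Path


def find_missing_scenes(mp3d_root):
    """Return names of scene folders that lack a matching {scene}/{scene}.glb file."""
    root = Path(mp3d_root)
    missing = []
    for scene_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (scene_dir / f"{scene_dir.name}.glb").is_file():
            missing.append(scene_dir.name)
    return missing


if __name__ == "__main__":
    root = Path("data/scene_datasets/mp3d")
    if root.is_dir():
        scene_count = sum(1 for p in root.iterdir() if p.is_dir())
        print(f"{scene_count} scene folders found")
        print(f"folders without a matching .glb: {find_missing_scenes(root)}")
    else:
        print(f"{root} does not exist yet")
```

Run it from the top-level directory of the project; an empty list means every scene folder has its .glb file.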

Dataset

The Robo-VLN dataset is a continuous control formulation of the VLN-CE dataset by Krantz et al., which was ported over from the Room-to-Room (R2R) dataset created by Anderson et al. Details on converting the discrete VLN dataset into a continuous control formulation can be found in our paper.

Dataset	Path to extract	Size
robo_vln_v1.zip	`data/datasets/robo_vln_v1`	76.9 MB

Robo-VLN Dataset

The dataset robo_vln_v1 contains the train, val_seen, and val_unseen splits.

train: 7739 episodes
val_seen: 570 episodes
val_unseen: 1224 episodes

Format of {split}.json.gz

{
    'episodes': [
        {
            'episode_id': 4991,
            'trajectory_id': 3279,
            'scene_id': 'mp3d/JeFG25nYj2p/JeFG25nYj2p.glb',
            'instruction': {
                'instruction_text': 'Walk past the striped area rug...',
                'instruction_tokens': [2384, 1589, 2202, 2118, 133, 1856, 9]
            },
            'start_position': [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            'start_rotation': [0, 0.3332950713608026, 0, 0.9428225683587541],
            'goals': [
                {
                    'position': [3.360340118408203, 0.09358400106430054, 3.07817006111145], 
                    'radius': 3.0
                }
            ],
            'reference_path': [
                [10.257800102233887, 0.09358400106430054, -2.379739999771118], 
                [9.434900283813477, 0.09358400106430054, -1.3061100244522095],
                ...
                [3.360340118408203, 0.09358400106430054, 3.07817006111145],
            ],
            'info': {'geodesic_distance': 9.65537166595459},
        },
        ...
    ],
    'instruction_vocab': {
        'word_list': [..., 'orchids', 'order', 'orient', ...],
        'word2idx_dict': {
            ...,
            'orchids': 1505,
            'order': 1506,
            'orient': 1507,
            ...
        },
        'itos': [..., 'orchids', 'order', 'orient', ...],
        'stoi': {
            ...,
            'orchids': 1505,
            'order': 1506,
            'orient': 1507,
            ...
        },
        'num_vocab': 2504,
        'UNK_INDEX': 1,
        'PAD_INDEX': 0,
    }
}
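A split file in this format can be inspected with the standard library alone. The following is a minimal sketch, not code from the repository: load_split and summarize_split are hypothetical helper names, and the sketch assumes the file on disk is gzip-compressed JSON with the keys shown above.

```python
import gzip
import json


def load_split(path):
    """Read a gzip-compressed JSON split file and return the parsed dict."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def summarize_split(data):
    """Return (number of episodes, number of distinct scenes, vocabulary size)."""
    episodes = data["episodes"]
    scenes = {ep["scene_id"] for ep in episodes}
    return len(episodes), len(scenes), data["instruction_vocab"]["num_vocab"]


def instruction_lengths(data):
    """Return the token count of every instruction, in episode order."""
    return [len(ep["instruction"]["instruction_tokens"]) for ep in data["episodes"]]
```

For example, summarize_split(load_split(path)) on the train split should report the episode count listed above, where path points to wherever you extracted the corresponding {split}.json.gz file.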

Format of {split}_gt.json.gz

{
    '4991': {
        'actions': [
          ...
          [-0.999969482421875, 1.0],
          [-0.9999847412109375, 0.15731772780418396],
          ...
          ],
        'forward_steps': 325,
        'locations': [
            [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            ...
            [-12.644463539123535, 0.1518409252166748, 4.2241311073303220]
        ]
    },
    ...
}
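The ground-truth file can be summarized per episode in the same way. This is a sketch under the assumption that the file is gzip-compressed JSON keyed by episode id strings, as in the example above; load_gt, path_length and summarize_gt are hypothetical names, and the path length here is simply the sum of straight-line distances between consecutive recorded locations.

```python
import gzip
import json
import math


def load_gt(path):
    """Read a gzip-compressed JSON ground-truth file and return the parsed dict."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def path_length(locations):
    """Sum of straight-line distances between consecutive [x, y, z] locations."""
    return sum(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        for p, q in zip(locations, locations[1:])
    )


def summarize_gt(gt, episode_id):
    """Collect basic statistics for one episode of the ground-truth dict."""
    entry = gt[str(episode_id)]  # keys are strings such as '4991'
    return {
        "num_actions": len(entry["actions"]),
        "num_locations": len(entry["locations"]),
        "forward_steps": entry["forward_steps"],
        "path_length": path_length(entry["locations"]),
    }
```

Comparing num_actions with num_locations for a few episodes is a quick way to confirm that a downloaded file was not truncated.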

Depth Encoder Weights

Similar to VLN-CE, our learning-based models utilize a depth encoder pretrained on a large-scale point-goal navigation task, i.e., DDPPO. Specifically, we use the DDPPO features of the ResNet50 from the original paper. The pretrained network can be downloaded here. Extract the contents of ddppo-models.zip to data/ddppo-models/{model}.pth.

Training and reproducing results

We use the run.py script to train and evaluate all of our baseline models. Use run.py along with a configuration file and a run type (either train or eval) to train or evaluate:

python run.py --exp-config path/to/config.yaml --run-type {train | eval}

For lists of modifiable configuration options, see the default task config and experiment config files.

Evaluating Models

All models can be evaluated using python run.py --exp-config path/to/config.yaml --run-type eval. The relevant config entries for evaluation are:

EVAL_CKPT_PATH_DIR  # path to a checkpoint or a directory of checkpoints
EVAL.USE_CKPT_CONFIG  # if True, use the config saved in the checkpoint file
EVAL.SPLIT  # which dataset split to evaluate on (typically val_seen or val_unseen)
EVAL.EPISODE_COUNT  # how many episodes to evaluate

If EVAL.EPISODE_COUNT is equal to or greater than the number of episodes in the evaluation dataset, all episodes will be evaluated. If EVAL_CKPT_PATH_DIR is a directory, one checkpoint will be evaluated at a time. If there are no more checkpoints to evaluate, the script will poll the directory every few seconds looking for a new one. Each config file listed in the next section is capable of both training and evaluating the model it is accompanied by.
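The two behaviours described above (capping the episode count, and waiting for new checkpoints in a directory) can be illustrated with a small sketch. This is not the repository's implementation: all function names are hypothetical, and the .pth extension, the oldest-first ordering by modification time and the polling interval are assumptions made for the example.

```python
import os
import time


def resolve_episode_count(requested, dataset_size):
    """Cap the requested count at the dataset size, so a large value means all episodes."""
    return min(requested, dataset_size)


def next_checkpoint(ckpt_dir, already_evaluated):
    """Return the oldest unevaluated .pth file in ckpt_dir, or None if there is none."""
    candidates = [
        os.path.join(ckpt_dir, name)
        for name in os.listdir(ckpt_dir)
        if name.endswith(".pth")  # assumed checkpoint extension
    ]
    candidates = [c for c in candidates if c not in already_evaluated]
    if not candidates:
        return None
    return min(candidates, key=os.path.getmtime)


def poll_for_checkpoint(ckpt_dir, already_evaluated, interval_s=2.0, max_polls=None):
    """Check the directory repeatedly until a new checkpoint appears or max_polls is reached."""
    polls = 0
    while max_polls is None or polls < max_polls:
        ckpt = next_checkpoint(ckpt_dir, already_evaluated)
        if ckpt is not None:
            return ckpt
        time.sleep(interval_s)
        polls += 1
    return None
```

With these definitions, requesting 5000 episodes on val_unseen resolves to 1224, the size of that split.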

Off-line Data Buffer

All our models require an off-line data buffer for training. To collect the continuous control dataset for both the train and val_seen splits, run the following commands before training. Note that storing the data takes some time on a single GPU, and you should set aside roughly 1.5 TB of hard-disk space for data collection.

Collect data buffer for train split:

python run.py --exp-config robo_vln_baselines/config/paper_configs/robovln_data_train.yaml --run-type train

Collect data buffer for val_seen split:

python run.py --exp-config robo_vln_baselines/config/paper_configs/robovln_data_val.yaml --run-type train
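Because the buffer is large, it can be worth checking free disk space before starting collection. The following is a minimal sketch using only the standard library; has_enough_space is a hypothetical helper, and the threshold simply restates the approximate 1.5 TB figure quoted above in decimal terabytes.

```python
import shutil

# Approximate space quoted in this README, expressed in bytes (decimal terabytes).
REQUIRED_BYTES = int(1.5 * 1000 ** 4)


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    target = "."  # assumption: run from a directory on the drive that will hold the buffer
    free_tb = shutil.disk_usage(target).free / 1000 ** 4
    print(f"free space at {target!r}: {free_tb:.2f} TB")
    if not has_enough_space(target):
        print("less than ~1.5 TB free; data collection may run out of space")
```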

CUDA

We use 2 GPUs to train our hierarchical model (hierarchical_cma.yaml). To train it, make 2 GPUs visible to the training process as follows:

CUDA_VISIBLE_DEVICES=0,1 python run.py --exp-config robo_vln_baselines/config/paper_configs/hierarchical_cma.yaml --run-type train

Models/Results From the Paper

Model	val_seen SPL	val_unseen SPL	Config
Seq2Seq	0.34	0.30	seq2seq_robo.yaml
PM	0.27	0.24	seq2seq_robo_pm.yaml
CMA	0.25	0.25	cma.yaml
HCM (Ours)	0.43	0.40	hierarchical_cma.yaml

	Legend
Seq2Seq	Sequence-to-Sequence. Please see our paper for the modifications made to the model to match the continuous action spaces in robo-vln.
PM	Progress Monitor
CMA	Cross-Modal Attention model. Please see our paper for the modifications made to the model to match the continuous action spaces in robo-vln.
HCM	Hierarchical Cross-Modal Agent Module (the proposed hierarchical VLN model from our paper).
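The SPL columns in the results table presumably refer to Success weighted by Path Length, the standard navigation metric: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the binary success indicator, l_i the shortest-path length from start to goal and p_i the length of the path the agent actually took. The sketch below computes that standard definition; it is not the evaluation code of this repository, and the function name spl is chosen for the example.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Mean over episodes of S_i * l_i / max(p_i, l_i).

    Assumes every shortest-path length l_i is strictly positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += float(s) * l / max(p, l)
    return total / len(successes)
```

An episode that succeeds along the shortest path contributes 1, a success along a path twice as long contributes 0.5, and a failure contributes 0 regardless of path length.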

Pretrained Model

We provide a pretrained model for our best Hierarchical Cross-Modal Agent (HCM). The pretrained model can be downloaded as follows:

Pre-trained Model	Size
HCM_Agent.pth	691 MB

Citation

If you find this repository useful, please cite our paper:

@inproceedings{irshad2021hierarchical,
title={Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation},
author={Muhammad Zubair Irshad and Chih-Yao Ma and Zsolt Kira},
booktitle={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year={2021},
url={https://arxiv.org/abs/2104.10674}
}

Acknowledgments

This code is built upon the implementation from VLN-CE.


Comments

lmdb.Error

Thank you for your very interesting research and sharing code. I encountered the following error when I run the command "python run.py --exp-config robo_vln_baselines/config/paper_configs/robovln_data_train.yaml --run-type train": Then I went to check the storage directory of the trajectory data and found that the data has reached 2T, as shown below: data/trajectories_dirs/robo-vln/train/trajectories.lmdb Finally, I checked the hard disk and found that the hard disk still had free space. Please tell me what should I do, thank you.

opened by yyf17 8
Cannot cloning into /robo-vln/environments/habitat-sim/src/deps/imgui
In the last command to install

python setup.py install --headless --with-cuda

but it took too long for me to wait. I tried rerunning setup.py, but it still didn't work.

Please help me solve this problem.
opened by dee-ex 3
no trainer for Random Agent Error.
@zubair-irshad Thank you for your very interesting research and sharing code. When I run the cmd as following:

CUDA_VISIBLE_DEVICES=0,1 python run.py --exp-config robo_vln_baselines/config/nonlearning.yaml --run-type train

I encountered the following error:
opened by yyf17 1
No such file or directory: 'data/datasets/robo_vln/embeddings.json.gz'

When I run the script for Off-line Data Buffer, it raises an error No such file or directory: 'data/datasets/robo_vln/embeddings.json.gz' I didn't see any instruction for the extract dir 'data/datasets/robo_vln' (just have the dir 'data/datasets/robo_vln_v1'), could you give me some advice? Thanks.

opened by zl9501 1
