Vision-and-Language Navigation in Continuous Environments using Habitat

Overview

Vision-and-Language Navigation in Continuous Environments (VLN-CE)

Project Website | VLN-CE Challenge | RxR-Habitat Challenge

Official implementations:

  • Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments (paper)
  • Waypoint Models for Instruction-guided Navigation in Continuous Environments (paper, README)

Vision and Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained agent navigation. This repo is a launching point for interacting with the VLN-CE task and provides both baseline agents and training methods. Both the Room-to-Room (R2R) and the Room-Across-Room (RxR) datasets are supported. VLN-CE is implemented using the Habitat platform.

VLN-CE comparison to VLN

Setup

This project is developed with Python 3.6. If you are using miniconda or anaconda, you can create an environment:

conda create -n vlnce python=3.6
conda activate vlnce

VLN-CE uses Habitat-Sim 0.1.7 which can be built from source or installed from conda:

conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless

Then install Habitat-Lab:

git clone --branch v0.1.7 git@github.com:facebookresearch/habitat-lab.git
cd habitat-lab
# installs both habitat and habitat_baselines
python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all

Now you can install VLN-CE:

git clone git@github.com:jacobkrantz/VLN-CE.git
cd VLN-CE
python -m pip install -r requirements.txt
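
As a quick sanity check (a suggestion, not part of the repo), you can confirm that both packages import inside the activated vlnce environment:

# sanity check -- suggestion only, not included in the repo
import habitat      # habitat-lab
import habitat_sim  # habitat-sim

print("habitat-lab and habitat-sim imported successfully")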

Data

Scenes: Matterport3D

Matterport3D (MP3D) scene reconstructions are used. The official Matterport3D download script (download_mp.py) can be accessed by following the instructions on their project webpage. The scene data can then be downloaded:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract such that it has the form data/scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 scenes.
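
To verify the extraction, a short check like the following (a suggestion, not part of the repo) counts the scene meshes and should report 90:

# suggestion only, not included in the repo
from glob import glob

scene_files = glob("data/scene_datasets/mp3d/*/*.glb")
print(f"Found {len(scene_files)} MP3D scenes")  # expect 90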

Episodes: Room-to-Room (R2R)

The R2R_VLNCE dataset is a port of the Room-to-Room (R2R) dataset created by Anderson et al. for use with the Matterport3DSimulator (MP3D-Sim). For details on the porting process from MP3D-Sim to the continuous reconstructions used in Habitat, please see our paper. We provide two versions of the dataset, R2R_VLNCE_v1-2 and R2R_VLNCE_v1-2_preprocessed. R2R_VLNCE_v1-2 contains the train, val_seen, val_unseen, and test splits. R2R_VLNCE_v1-2_preprocessed runs with our models out of the box. It additionally includes instruction tokens mapped to GloVe embeddings, ground truth trajectories, and a data augmentation split (envdrop) that is ported from R2R-EnvDrop. The test split does not contain episode goals or ground truth paths. For more details on the dataset contents and format, see our project page.

Dataset Extract path Size
R2R_VLNCE_v1-2.zip data/datasets/R2R_VLNCE_v1-2 3 MB
R2R_VLNCE_v1-2_preprocessed.zip data/datasets/R2R_VLNCE_v1-2_preprocessed 345 MB

Downloading the dataset:

# R2R_VLNCE_v1-2
gdown https://drive.google.com/uc?id=1YDNWsauKel0ht7cx15_d9QnM6rS4dKUV
# R2R_VLNCE_v1-2_preprocessed
gdown https://drive.google.com/uc?id=18sS9c2aRu2EAL4c7FyG29LDAm2pHzeqQ
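
The archives can then be extracted to the paths listed in the table above, for example with a short script like the one below (illustration only; plain unzip works just as well). Depending on whether an archive already contains a top-level folder, adjust the target so the final layout matches the table:

# illustration only; verify the final layout matches the extract paths above
import zipfile

for name in ["R2R_VLNCE_v1-2", "R2R_VLNCE_v1-2_preprocessed"]:
    with zipfile.ZipFile(f"{name}.zip") as zf:
        zf.extractall("data/datasets")
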
Encoder Weights

Baseline models encode depth observations using a ResNet pre-trained on PointGoal navigation. Those weights can be downloaded from here (672M). Extract the contents to data/ddppo-models/{model}.pth.
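
To confirm the download is intact (a suggestion, not part of the repo), the checkpoint can be loaded on the CPU; whatever file name you extracted under data/ddppo-models/ will be picked up:

# suggestion only, not included in the repo
import glob

import torch

for path in glob.glob("data/ddppo-models/*.pth"):
    checkpoint = torch.load(path, map_location="cpu")
    print(path, "loaded; top-level entries:", len(checkpoint))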

Episodes: Room-Across-Room (RxR)

Download: RxR_VLNCE_v0.zip

The Room-Across-Room dataset was ported to continuous environments for the RxR-Habitat Challenge hosted at the CVPR 2021 Embodied AI Workshop. The dataset has train, val_seen, val_unseen, and test_challenge splits with both Guide and Follower trajectories ported. The starter code expects files in this structure:

data/datasets
├─ RxR_VLNCE_v0
|   ├─ train
|   |    ├─ train_guide.json.gz
|   |    ├─ train_guide_gt.json.gz
|   |    ├─ train_follower.json.gz
|   |    ├─ train_follower_gt.json.gz
|   ├─ val_seen
|   |    ├─ val_seen_guide.json.gz
|   |    ├─ val_seen_guide_gt.json.gz
|   |    ├─ val_seen_follower.json.gz
|   |    ├─ val_seen_follower_gt.json.gz
|   ├─ val_unseen
|   |    ├─ val_unseen_guide.json.gz
|   |    ├─ val_unseen_guide_gt.json.gz
|   |    ├─ val_unseen_follower.json.gz
|   |    ├─ val_unseen_follower_gt.json.gz
|   ├─ test_challenge
|   |    ├─ test_challenge_guide.json.gz
|   ├─ text_features
|   |    ├─ ...

The baseline models for RxR-Habitat use precomputed BERT instruction features which can be downloaded from here and saved to data/datasets/RxR_VLNCE_v0/text_features/rxr_{split}/{instruction_id}_{language}_text_features.npz.
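
To inspect one of these files (illustration only; the split, instruction id, and language tag below are placeholders, and the stored array keys may differ), you can print its contents with numpy:

# illustration only; the path components below are placeholders
import numpy as np

path = "data/datasets/RxR_VLNCE_v0/text_features/rxr_train/000000_en_text_features.npz"
data = np.load(path)
for key in data.files:
    print(key, data[key].shape)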

RxR-Habitat Challenge (RxR Data)

RxR Challenge Teaser GIF

The RxR-Habitat Challenge uses the new Room-Across-Room (RxR) dataset which:

  • contains multilingual instructions (English, Hindi, Telugu),
  • is an order of magnitude larger than existing datasets, and
  • uses varied paths to break a shortest-path-to-goal assumption.

The challenge was hosted at the CVPR 2021 Embodied AI Workshop. While the official challenge is over, the leaderboard remains open and we encourage submissions on this difficult task! For guidelines and access, please visit: ai.google.com/research/rxr/habitat.

Generating Submissions

Submissions are made by running an agent locally and submitting a jsonlines file (.jsonl) containing the agent's trajectories. Starter code for generating this file is provided in the function BaseVLNCETrainer.inference(). Here is an example of generating predictions for English using the Cross-Modal Attention baseline:

python run.py \
  --exp-config vlnce_baselines/config/rxr_baselines/rxr_cma_en.yaml \
  --run-type inference

If you use different models for different languages, you can merge their predictions with scripts/merge_inference_predictions.py. Submissions are only accepted if they contain all episodes from all three languages in the test-challenge split. Starter code for this challenge was originally hosted in the rxr-habitat-challenge branch but is now under continual development in master.
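
Conceptually, merging just combines the per-language jsonlines records into one file; the repo's scripts/merge_inference_predictions.py should be used for real submissions, but a rough sketch (with placeholder file names) looks like this:

# rough sketch with placeholder file names; use scripts/merge_inference_predictions.py
# for actual submissions
input_files = ["predictions_en.jsonl", "predictions_hi.jsonl", "predictions_te.jsonl"]

with open("predictions_merged.jsonl", "w") as out:
    for path in input_files:
        with open(path) as f:
            for line in f:
                if line.strip():
                    out.write(line.strip() + "\n")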

VLN-CE Challenge (R2R Data)

The VLN-CE Challenge is live and taking submissions for public test set evaluation. This challenge uses the R2R data ported in the original VLN-CE paper.

To submit to the leaderboard, you must run your agent locally and submit a JSON file containing the generated agent trajectories. Starter code for generating this JSON file is provided in the function BaseVLNCETrainer.inference(). Here is an example of generating this file using the pretrained Cross-Modal Attention baseline:

python run.py \
  --exp-config vlnce_baselines/config/r2r_baselines/test_set_inference.yaml \
  --run-type inference

Predictions must be in a specific format. Please visit the challenge webpage for guidelines.

Baseline Performance

The baseline model for the VLN-CE task is the cross-modal attention model trained with progress monitoring, DAgger, and augmented data (CMA_PM_DA_Aug). As evaluated on the leaderboard, this model achieves the following (TL: trajectory length, NE: navigation error, OS: oracle success rate, SR: success rate, SPL: success weighted by path length):

Split TL NE OS SR SPL
Test 8.85 7.91 0.36 0.28 0.25
Val Unseen 8.27 7.60 0.36 0.29 0.27
Val Seen 9.06 7.21 0.44 0.34 0.32

This model was originally presented with a val_unseen performance of 0.30 SPL; however, the leaderboard evaluates this same model at 0.27 SPL. The model was trained and evaluated on a hardware + Habitat build that gave slightly different results, as is the case for the other paper experiments. Going forward, the leaderboard contains the performance metrics that should be used for official comparison. In our tests, the installation procedure for this repo gives nearly identical evaluation to the leaderboard, but we recognize that compute hardware along with the version and build of Habitat are factors in reproducibility.

For push-button replication of all VLN-CE experiments, see here.

Starter Code

The run.py script controls training and evaluation for all models and datasets:

python run.py \
  --exp-config path/to/experiment_config.yaml \
  --run-type {train | eval | inference}

For example, a random agent can be evaluated on 10 val-seen episodes of R2R using this command:

python run.py --exp-config vlnce_baselines/config/r2r_baselines/nonlearning.yaml --run-type eval

For lists of modifiable configuration options, see the default task config and experiment config files.

Training Agents

The DaggerTrainer class is the standard trainer and supports teacher forcing or dataset aggregation (DAgger). This trainer saves trajectories consisting of RGB, depth, ground-truth actions, and instructions to disk to avoid time spent in simulation.

The RecollectTrainer class performs teacher forcing using the ground truth trajectories provided in the dataset rather than a shortest path expert. Also, this trainer does not save episodes to disk, instead opting to recollect them in simulation.

Both trainers inherit from BaseVLNCETrainer.

Evaluating Agents

Evaluation on validation splits can be done by running python run.py --exp-config path/to/experiment_config.yaml --run-type eval. If EVAL.EPISODE_COUNT == -1, all episodes will be evaluated. If EVAL_CKPT_PATH_DIR is a directory, each checkpoint will be evaluated one at a time.

CUDA

CUDA will be used by default if it is available. We find it favorable to use one GPU for the model and several GPUs for simulation.

SIMULATOR_GPU_IDS: [0]  # list of GPU IDs to run simulations
TORCH_GPU_ID: 0  # GPU for pytorch-related code (the model)
NUM_ENVIRONMENTS: 1  # Each GPU runs NUM_ENVIRONMENTS environments

The simulator and torch code do not need to run on the same device. For faster training and evaluation, we recommend running with as many NUM_ENVIRONMENTS as will fit on your GPU while assuming 1 CPU core per env.
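
As a rough illustration of the one-CPU-core-per-environment rule of thumb (an assumption for sizing, not repo code):

# rough sizing heuristic, not part of the repo: roughly one CPU core per
# simulator environment; GPU memory may impose a lower limit in practice
import multiprocessing

print("Upper bound on NUM_ENVIRONMENTS from CPU cores:", multiprocessing.cpu_count())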

License

The VLN-CE codebase is MIT licensed. Trained models and task datasets are considered data derived from the MP3D scene dataset. Matterport3D-based task datasets and trained models are distributed under the Matterport3D Terms of Use and the CC BY-NC-SA 3.0 US license.

Citing

If you use VLN-CE in your research, please cite the following paper:

@inproceedings{krantz_vlnce_2020,
  title={Beyond the Nav-Graph: Vision and Language Navigation in Continuous Environments},
  author={Jacob Krantz and Erik Wijmans and Arjun Majumdar and Dhruv Batra and Stefan Lee},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}

If you use the RxR-Habitat data, please additionally cite the following paper:

@inproceedings{ku2020room,
  title={Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding},
  author={Ku, Alexander and Anderson, Peter and Patel, Roma and Ie, Eugene and Baldridge, Jason},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={4392--4412},
  year={2020}
}
Comments
  • ValueError: Type mismatch (<class 'habitat.config.default.Config'> vs. <class 'yacs.config.CfgNode'>) with values (DATASET:

    Hello, I'm running python run.py --exp-config vlnce_baselines/config/nonlearning.yaml --run-type eval and the following error occurred:

    ValueError: Type mismatch (<class 'habitat.config.default.Config'> vs. <class 'yacs.config.CfgNode'>) with values (DATASET:

    I don't understand why this error occurs. I look forward to your reply.

    opened by W-xf 5
  • [RxR-Habitat] What kind of GPU is needed to train the cma policy with the original config?

    Hi there,

    While running the starter code for the RxR challenge, I found that a single NVIDIA 2080 Ti GPU's VRAM (11 GiB) could only fit batch_size 1 with the CMA policy and max_traj_len 250. Although we could set effective_batch_size, it is set to -1 while batch_size is 3 in the original config. So I'm wondering what kind of GPU is needed to train the CMA policy with the original config?

    Also, I found that with batch_size 1, max_traj_len 250, preload_size 30, and 9 environments simulated, the RAM used is more than 40 GiB. Is that normal?

    Thanks!

    opened by wz0919 3
  • Bring back v0.1.4 ShortestPathFollower

    In response to #7.

    The dataset path generation and pruning was performed using the Habitat-Lab v0.1.4 ShortestPathFollower. The VLN-CE paper also used this follower as an oracle. Habitat v0.1.5 updated the ShortestPathFollower to slightly different behavior. For compatibility with the oracle used for dataset generation and in the VLN-CE paper, we are bringing back the v0.1.4 ShortestPathFollower (ShortestPathFollowerCompat) as default. To instead use the Habitat path follower in the VLNOracleActionSensor, update the task config to include:

      TASK:
        VLN_ORACLE_ACTION_SENSOR:
          USE_ORIGINAL_FOLLOWER: False
    
    opened by jacobkrantz 3
  • [RxR-VLNCE Challenge] How to get the error log from the RxR leaderboard

    I have tried several times to submit our results, but the status of our attempts is "error". (We have used the standard RxR task config.) I want to know if there is any way to check the error log file.

    Thanks for your attention to this matter! Best regards,

    opened by MarSaKi 2
  • How to use multiple GPUs to train dagger models?

    Thanks for the great work.

    When I run training using dagger_trainer.py, I found that a large part of the training time is spent 1) collecting data and 2) training the model on the collected data. The first process can be sped up by adding more simulator GPUs (SIMULATOR_GPU_IDS). However, the second process can only use one GPU (TORCH_GPU_ID) by default.

    Is there any easy way to use multiple GPUs to speed up the second process, or should I use torch.distributed to adapt the code myself?

    Many thanks!

    opened by PeihaoChen 2
  • [RxR-Habitat] Eval baseline reproduction

    Hello, I tried to reproduce the baseline performance using the same yaml file listed in the README:

    python run.py \
      --exp-config vlnce_baselines/config/rxr_configs/rxr_cma_en.yaml \
      --run-type train
    

    My experimental setup used 4 TITAN X GPUs with 4 environments each. Referring to issue #17, I set batch_size: 1 and effective_batch_size: 3 to train successfully. No other changes were made to the codebase.

    After evaluating my saved checkpoint, I found the following metrics (all averaged across episodes):

    steps_taken: 350.443718 path_length: 6.737881 distance_to_goal: 11.082229 success: 0.066503 oracle_success: 0.180703 spl: 0.055868 ndtw: 0.358719

    Comparing these to Table 2 entry for Seq2Seq w/ RGBD, Instructions, and History, I found my performance to be significantly lower for the matching metrics.

    Is this the right config to be used to match the relevant baseline?

    opened by nikwalia 2
  • How to get panoramas?

    Hi, I find that when I use env.observation I only get one image. However, the CMA model should use panoramas, right? Does anyone know how to get the panoramas?

    opened by Mingxiao-Li 2
  • Hope to provide more detailed content about embeddings.json.gz

    Hello, I am very interested in your research. I hope to get the details about embeddings.json.gz: the correspondence among words - word embedding - instruction_tokens. I would be very grateful if I could get your reply.

    opened by Dominique-github 2
  • habitat-sim problem: Platform::WindowlessEglApplication::tryCreateContext(): no EGL devices found

    When I want to try this code, I must install habitat-sim first, but I encounter a big bug. I followed habitat-sim issue #288, but it did not solve my problem. Does anyone have suggestions?

    I follow the codes:

    conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless
    
    git clone --branch v0.1.7 git@github.com:facebookresearch/habitat-lab.git
    
    python -m pip install -r requirements.txt
    python -m pip install -r habitat_baselines/rl/requirements.txt
    python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
    python setup.py develop --all
    
    wget http://dl.fbaipublicfiles.com/habitat/habitat-test-scenes.zip
    unzip habitat-test-scenes.zip
    
    python examples/example.py
    

    Then get the following errors:

    I1010 18:31:04.590993 11476 SceneGraph.h:93] Created DrawableGroup: Platform::WindowlessEglApplication::tryCreateContext(): unable to find EGL device for CUDA device 0 WindowlessContext: Unable to create windowless context

    OR

    I1207 08:31:49.998020 1190 SceneGraph.h:93] Created DrawableGroup: Platform::WindowlessEglApplication::tryCreateContext(): no EGL devices found, likely a driver issue; enable --magnum-gpu-validation to see additional info WindowlessContext: Unable to create windowless context

    I can confirm that my NVIDIA driver is correct, cuda is also OK, since I can run other CNN-related codes well.

    In fact, I can run Habitat correctly on my PC, but I always encounter this error in the docker container on my server.

    I have tried multiple versions of the NVIDIA driver and CUDA, as well as some possibly relevant library versions, such as libgl on my server, following habitat-sim issue #288.

    The following are all the differences I can see between the PC and the server's docker:

    With ldconfig -N -v | grep libEGL, in the server's docker:

    /sbin/ldconfig.real: Can't stat /usr/local/cuda/compat/lib: No such file or directory /sbin/ldconfig.real: Path /usr/local/cuda/lib64' given more than once /sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory /sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory /sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory /sbin/ldconfig.real: Path/lib/x86_64-linux-gnu' given more than once /sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once /sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.27.so is the dynamic linker, ignoring /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.100 is empty, not checked. /sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.440.100 is empty, not checked. libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0 libEGL.so.1 -> libEGL.so.1.0.0 libEGL_nvidia.so.0 -> libEGL_nvidia.so.470.141.03

    BUT in the PC:

    /sbin/ldconfig.real: Path /lib/x86_64-linux-gnu' given more than once /sbin/ldconfig.real: Path/usr/lib/x86_64-linux-gnu' given more than once /sbin/ldconfig.real: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link /sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring libEGL.so.1 -> libEGL.so.1.1.0 libEGL_nvidia.so.0 -> libEGL_nvidia.so.440.44

    It seems to be a problem from this difference? Who has some good solutions, please help me. Thanks.

    opened by FutureGoingOn 1
  • The ddppo download link is broken.

    "Baseline models encode depth observations using a ResNet pre-trained on PointGoal navigation. Those weights can be downloaded from here (672M). Extract the contents to data/ddppo-models/{model}.pth." The ddppo download link is broken.

    opened by sunqiang85 1
  • Bump tensorflow from 1.13.1 to 2.7.2

    Bumps tensorflow from 1.13.1 to 2.7.2.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.7.2

    Release 2.7.2

    This releases introduces several vulnerability fixes:

    TensorFlow 2.7.1

    Release 2.7.1

    This releases introduces several vulnerability fixes:

    • Fixes a floating point division by 0 when executing convolution operators (CVE-2022-21725)
    • Fixes a heap OOB read in shape inference for ReverseSequence (CVE-2022-21728)
    • Fixes a heap OOB access in Dequantize (CVE-2022-21726)
    • Fixes an integer overflow in shape inference for Dequantize (CVE-2022-21727)
    • Fixes a heap OOB access in FractionalAvgPoolGrad (CVE-2022-21730)
    • Fixes an overflow and divide by zero in UnravelIndex (CVE-2022-21729)
    • Fixes a type confusion in shape inference for ConcatV2 (CVE-2022-21731)
    • Fixes an OOM in ThreadPoolHandle (CVE-2022-21732)
    • Fixes an OOM due to integer overflow in StringNGrams (CVE-2022-21733)
    • Fixes more issues caused by incomplete validation in boosted trees code (CVE-2021-41208)
    • Fixes an integer overflows in most sparse component-wise ops (CVE-2022-23567)
    • Fixes an integer overflows in AddManySparseToTensorsMap (CVE-2022-23568)

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.7.2

    This releases introduces several vulnerability fixes:

    Release 2.6.4

    This releases introduces several vulnerability fixes:

    • Fixes a code injection in saved_model_cli (CVE-2022-29216)
    • Fixes a missing validation which causes TensorSummaryV2 to crash (CVE-2022-29193)
    • Fixes a missing validation which crashes QuantizeAndDequantizeV4Grad (CVE-2022-29192)
    • Fixes a missing validation which causes denial of service via DeleteSessionTensor (CVE-2022-29194)
    • Fixes a missing validation which causes denial of service via GetSessionTensor (CVE-2022-29191)
    • Fixes a missing validation which causes denial of service via StagePeek (CVE-2022-29195)
    • Fixes a missing validation which causes denial of service via UnsortedSegmentJoin (CVE-2022-29197)
    • Fixes a missing validation which causes denial of service via LoadAndRemapMatrix (CVE-2022-29199)
    • Fixes a missing validation which causes denial of service via SparseTensorToCSRSparseMatrix (CVE-2022-29198)
    • Fixes a missing validation which causes denial of service via LSTMBlockCell (CVE-2022-29200)
    • Fixes a missing validation which causes denial of service via Conv3DBackpropFilterV2 (CVE-2022-29196)
    • Fixes a CHECK failure in depthwise ops via overflows (CVE-2021-41197)
    • Fixes issues arising from undefined behavior stemming from users supplying invalid resource handles (CVE-2022-29207)
    • Fixes a segfault due to missing support for quantized types (CVE-2022-29205)
    • Fixes a missing validation which results in undefined behavior in SparseTensorDenseAdd (CVE-2022-29206)

    ... (truncated)

    Commits
    • dd7b8a3 Merge pull request #56034 from tensorflow-jenkins/relnotes-2.7.2-15779
    • 1e7d6ea Update RELEASE.md
    • 5085135 Merge pull request #56069 from tensorflow/mm-cp-52488e5072f6fe44411d70c6af09e...
    • adafb45 Merge pull request #56060 from yongtang:curl-7.83.1
    • 01cb1b8 Merge pull request #56038 from tensorflow-jenkins/version-numbers-2.7.2-4733
    • 8c90c2f Update version numbers to 2.7.2
    • 43f3cdc Update RELEASE.md
    • 98b0a48 Insert release notes place-fill
    • dfa5cf3 Merge pull request #56028 from tensorflow/disable-tests-on-r2.7
    • 501a65c Disable timing out tests
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • R2RBaseline

    Hi! Thank you for publishing this great work! I followed the example in https://github.com/jacobkrantz/VLN-CE#vln-ce-challenge-r2r-data and tried to reproduce predictions.json. However, there are two main issues in the current codebase when running python run.py --exp-config vlnce_baselines/config/r2r_baselines/test_set_inference.yaml --run-type inference:

    1. AttributeError: 'InstructionData' object has no attribute 'instruction_id'
    File "~/VLN-CE/vlnce_baselines/common/base_il_trainer.py", line 511, in inference
        k = current_episodes[i].instruction.instruction_id
    

    This can be resolved by adding FORMAT: r2r in vlnce_baselines/config/r2r_baselines/test_set_inference.yaml

    2. RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
    my ENV:
    ubuntu 20.04
    torch=1.7.0+cu110
    
      File "~/anaconda3/envs/vlnce/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
        _VF._pack_padded_sequence(input, lengths, batch_first)
    RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
    

    As set in https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pack_padded_sequence.html: torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True) lengths (Tensor or list(int)) – list of sequence lengths of each batch element (must be on the CPU if provided as a tensor).

    This can be solved by changing the code in vlnce_baselines/models/encoders/instruction_encoder.py L78

            - lengths = (lengths != 0.0).long().sum(dim=1)
            + lengths = (lengths != 0.0).long().sum(dim=1).cpu()
    
    opened by MuMuJun97 1
  • The number of ground-truth actions does not match the number of steps in R2R_VLNCE_v1-3 gt. dataset

    I have downloaded the preprocessed R2R datasets from this official website. In {split}_gt.json.gz, the field 'actions' contains ground-truth actions, which should produce the coordinates stored in the field 'locations'. However, the numbers of elements in these two fields are not equal.

    Could anyone give me a hint on how to relate these 2 fields? Thanks

    opened by ZJULiHongxin 0
  • Bump tensorflow from 1.13.1 to 2.9.3

    Bumps tensorflow from 1.13.1 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This releases introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • DefaultCPUAllocator: can't allocate memory: you tried to allocate 1589575680 bytes.

    Hi Jacob,

    Following the setup instructions in https://github.com/jacobkrantz/VLN-CE to run the model, with

    python run.py --exp-config=vlnce_baselines/config/rxr_baselines/rxr_cma_en.yml --run-type=train
    

    I got the following error:

      File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/wangsu/KrantzVLNCE/habitat-lab/VLN-CE/vlnce_baselines/models/encoders/resnet_encoders.py", line 199, in forward
        resnet_output = self.cnn(normalize(rgb_observations))
      File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/container.py", line 141, in forward
        input = module(input)
      File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 446, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
        self.padding, self.dilation, self.groups)
    RuntimeError: [enforce fail at CPUAllocator.cpp:68] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1589575680 bytes. Error code 12 (Cannot allocate memory)
    

    My machine however does have enough memory:

                  total        used        free      shared  buff/cache   available
    Mem:    27390640128   574816256 25532850176     8773632  1282973696 26407305216
    

    Could you help look into this please? Thanks!

    opened by wangsu-google-language 0
  • Instruction encoder

    Could you please provide details on your instruction encoder? I would like to test the agent on new data, but you only supply pre-computed text weights. Thanks

    opened by idansc 0
  • VLNCE questions

    Thanks for the incredible effort in putting this dataset together! I was wondering how I can find the continuous trajectory actions/camera poses for each episode. If I look into "RxR_VLNCE_v0/train/train_guide.json.gz", each episode has a trajectory_id field. Does this correspond to the keys in "RxR_VLNCE_v0/train/train_guide_gt.json.gz"? Or is it episode_id that corresponds?

    In addition, where can I find the camera poses (location and rotation) for each trajectory? There is an "actions" field in "RxR_VLNCE_v0/train/train_guide_gt.json.gz"; how do the action integers map to actions (1 forward, 2 turn left, 3 turn right)? What does the "locations" field mean there? I would really appreciate it if you could help me understand the field structure a bit better.

    Thanks!

    opened by mbautistamartin 0
Owner
Jacob Krantz
PhD student at Oregon State University