Recurrent VLN-BERT

Code for the paper: A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould

[Paper & Appendices | GitHub]

Prerequisites

Installation

Install the Matterport3D Simulator. Please find the versions of packages in our environment here.
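
A quick way to confirm the simulator built correctly is to import its Python bindings. A minimal sketch (MatterSim is the module name the simulator's build is expected to expose):

    # Sanity check (sketch): succeeds only if the Matterport3D Simulator
    # bindings are on the current Python path.
    import MatterSim
    print('MatterSim imported from', MatterSim.__file__)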

Install Pytorch-Transformers. In particular, we use this version (the same as used by OSCAR) in our experiments.
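
If the package is installed correctly, the import used by the code should succeed. A minimal sanity check, assuming the OSCAR-style package layout (a transformers package with a bundled pytorch_transformers subpackage, as imported in r2r_src/vlnbert/vlnbert_init.py):

    # Sketch: verify the BERT classes the code relies on are importable.
    # If this raises ModuleNotFoundError, a different 'transformers' package
    # is likely shadowing the OSCAR version on your path.
    from transformers.pytorch_transformers import BertConfig, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    print(tokenizer.tokenize('walk past the sofa and stop at the stairs'))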

Data Preparation

Please follow the instructions below to prepare the data directories:

Initial OSCAR and PREVALENT weights

Please refer to vlnbert_init.py to set up the directories; a quick check of the downloaded weights is sketched after the list below.

  • Pre-trained OSCAR weights
    • Download the base-no-labels following this guide.
  • Pre-trained PREVALENT weights
    • Download the pytorch_model.bin from here.
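
As a quick check that the downloaded weights are usable, you can load them with PyTorch and inspect a few tensors. A minimal sketch, where the path is a placeholder for wherever you saved the PREVALENT weights (see vlnbert_init.py for the directories the code expects):

    # Sketch of a weights sanity check; the path below is an assumed placeholder.
    import torch

    WEIGHTS = 'Prevalent/pretrained_model/pytorch_model.bin'  # adjust to your layout

    state_dict = torch.load(WEIGHTS, map_location='cpu')
    print('%d tensors loaded' % len(state_dict))
    for name, tensor in list(state_dict.items())[:5]:
        print(name, tuple(tensor.shape))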

Trained Network Weights

R2R Navigation

Please read Peter Anderson's VLN paper for details of the R2R navigation task.

Reproduce Testing Results

To replicate the performance reported in our paper, load the trained network weights and run validation:

bash run/test_agent.bash

You can switch between the OSCAR-based and the PREVALENT-based VLN models by changing the arguments vlnbert (oscar or prevalent) and load (the path to the trained model, e.g. snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen).

Training

Navigator

To train the network from scratch, simply run:

bash run/train_agent.bash

The trained Navigator will be saved under snap/.
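
Checkpoints follow the snap/<name>/state_dict/ layout (the released weights, e.g. snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen, use the same structure). A small sketch, assuming that layout, to list what training has written so far:

    # Sketch: list saved checkpoints, assuming the snap/<name>/state_dict/ layout.
    from pathlib import Path

    for ckpt in sorted(Path('snap').glob('*/state_dict/*')):
        print(ckpt)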

Citation

If you use or discuss our Recurrent VLN-BERT, please cite our paper:

@article{hong2020recurrent,
  title={A Recurrent Vision-and-Language BERT for Navigation},
  author={Hong, Yicong and Wu, Qi and Qi, Yuankai and Rodriguez-Opazo, Cristian and Gould, Stephen},
  journal={arXiv preprint arXiv:2011.13922},
  year={2020}
}
Comments
  • Unable to test code

    Hello Yicong,

    Can you please add a section to the README about using the Matterport3DSimulator Docker image with your code? The documentation is missing details on where to put the ResNet zip, the PREVALENT JSON, and the PyTorch model, and it is unclear how Matterport3DSimulator works with your code.

    Thanks

    opened by gmuraleekrishna 7
  • the data file R2R_test.json wasn't used when testing?

    Hi, Yicong! I have reproduced this codebase. While running run/test_agent.bash, I noticed that the data file R2R_test.json wasn't used by the test. So I set the parameter 'submit' to 1 and rewrote the file 'id_paths.json' to test without any other changes, and I got the following results:

    Optimizer: Using AdamW
    Namespace(IMAGENET_FEATURES='img_features/ResNet-152-imagenet.tsv', angle_feat_size=128, aug=None, batchSize=16, description='VLNBERT-test-Prevalent', dropout=0.5, epsilon=0.1, featdropout=0.4, feature_size=2048, features='places365', feedback='sample', gamma=0.9, ignoreid=-100, iters=300000, load='snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen', loadOptim=False, log_dir='snap/VLNBERT-test-Prevalent', lr=1e-05, maxAction=15, maxInput=80, ml_weight=0.2, name='VLNBERT-test-Prevalent', normalize_loss='total', optim='adamW', optimizer=<class 'torch.optim.adamw.AdamW'>, submit=1, teacher='final', teacher_weight=1.0, test_only=0, train='validlistener', vlnbert='prevalent', weight_decay=0.0, zero_init=False)

    Start loading the image feature ... (~50 seconds)
    Finish Loading the image feature from img_features/ResNet-152-places365.tsv in 54.7334 seconds
    The feature size is 2048
    Loading navigation graphs for 61 scans
    R2RBatch loaded with 14039 instructions, using splits: train
    The feature size is 2048
    Loading navigation graphs for 59 scans
    R2RBatch loaded with 1501 instructions, using splits: val_train_seen
    The feature size is 2048
    Loading navigation graphs for 56 scans
    R2RBatch loaded with 1021 instructions, using splits: val_seen
    The feature size is 2048
    Loading navigation graphs for 11 scans
    R2RBatch loaded with 2349 instructions, using splits: val_unseen
    The feature size is 2048
    Loading navigation graphs for 18 scans
    R2RBatch loaded with 4173 instructions, using splits: test

    Initalizing the VLN-BERT model ...
    Loaded the listener model at iter 114000 from snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen
    result length 1501
    Env name: val_train_seen, nav_error: 0.8354, oracle_error: 0.6634, steps: 5.1845, lengths: 10.0276, success_rate: 0.9394, oracle_rate: 0.9520, spl: 0.9124
    result length 1021
    Env name: val_seen, nav_error: 2.8968, oracle_error: 1.9405, steps: 5.5436, lengths: 11.1379, success_rate: 0.7228, oracle_rate: 0.7826, spl: 0.6775
    result length 2349
    Env name: val_unseen, nav_error: 3.9255, oracle_error: 2.5431, steps: 6.1243, lengths: 12.0028, success_rate: 0.6279, oracle_rate: 0.7024, spl: 0.5688
    result length 4173
    Env name: test, nav_error: 9.0420, oracle_error: 0.0000, steps: 6.1107, lengths: 12.3490, success_rate: 0.0357, oracle_rate: 1.0000, spl: 0.0000

    I am really shocked by the results on the test data. Did I make a mistake somewhere?

    opened by LiHui1116 6
  • ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

    Hello, I was trying to run the model with bash run/test_agent.bash as instructed in your README, but I get the following error:

    Optimizer: Using AdamW
    To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
    Traceback (most recent call last):
      File "r2r_src/train.py", line 13, in <module>
        from agent import Seq2SeqAgent
      File "/Recurrent-VLN-BERT/r2r_src/agent.py", line 21, in <module>
        import model_OSCAR, model_PREVALENT
      File "/Recurrent-VLN-BERT/r2r_src/model_OSCAR.py", line 7, in <module>
        from vlnbert.vlnbert_init import get_vlnbert_models
      File "/Recurrent-VLN-BERT/r2r_src/vlnbert/vlnbert_init.py", line 3, in <module>
        from transformers.pytorch_transformers import (BertConfig, BertTokenizer)
    ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

    I have transformers and pytorch-transformers installed, as well as the old pytorch-pretrained-bert, and I am unsure of what is causing this. Any help? Thanks in advance.

    opened by Jaluco 3
  • Mismatch between weights?

    Hi there,

    Congratulations on your CVPR paper and on releasing your code. I was wondering whether you could clarify the structure of the checkpoints you released. I'm interested in the OSCAR version of your model and tried to load it. However, it looks like the following parameters cannot be found:

    'img_projection.weight', 'img_projection.bias'
    

    I inspected the VLNBert class in the file vlnbert_OSCAR.py, and it looks like there is no module called img_projection; instead, there seems to be one in the vlnbert_PREVALENT.py file. In addition, even in the original OSCAR codebase I cannot find any mention of an img_projection layer (https://github.com/microsoft/Oscar/blob/master/oscar/modeling/modeling_bert.py). Could you please verify that the released model checkpoints are correct and correspond to the right models?

    Thanks, Alessandro

    opened by aleSuglia 3
  • Is it possible to have a branch for REVERIE?

    Hello,

    Thanks a lot for maintaining your open-source code!

    As mentioned in #9, is it possible to have the models and code available for REVERIE? I would like to make a fair comparison with your approach.

    opened by volkancirik 2
  • Details about the no init. OSCAR model

    Hi Yicong, I wonder how you initialized the "no init." OSCAR model to get the results reported in the paper. Did you initialize all the parameters randomly, or did you use some pretrained weights, e.g., initializing the language part with BERT pretrained weights?

    opened by Jackie-Chou 2
  • Why don't you use ‘speaker’ during training?

    Hi! I don't see any code for a 'speaker', a useful way to perform data augmentation for R2R. I am wondering why you removed the speaker part from your code. Or have you run experiments showing that using a speaker doesn't work well with your method? Thanks a lot!

    opened by CrystalSixone 2
  • The vocab size

    Hi, yicong,

    Thanks for your great work! I found that the vocab size of R2R is 991, but the vocab size of the PREVALENT augmented data is 1101. Additionally, the PREVALENT instructions are generated by a speaker model trained on the R2R dataset. Do you have any idea about this?

    Thanks,

    opened by MarSaKi 2
  • Failed to build Matterport3D Simulator

    Hi Yicong,

    This is not directly related to your code, but I've spent hours trying to follow the Matterport3DSimulator repo to build it, and I have encountered issues both with and without Docker.

    With Docker, MatterSim can be built, but it is only available to the system Python; since I use Anaconda on the lab server, importing MatterSim fails in my Anaconda environment.

    Without Docker, the build fails on line 59 of src/lib/NavGraph.cpp: CV_LOAD_IMAGE_ANYDEPTH is not defined in the scope. I only downloaded matterport_skybox_images, and this might be the problem (however, the README in Matterport3DSimulator says matterport_skybox_images is all you need to get the simulator to build and work). I wonder what data you downloaded from the Matterport3D dataset?

    Best, Jason

    opened by jasonppy 2
  • Why split instructions?

    Hi Yicong,

    Thanks for open-sourcing your code!

    I wonder why you split instructions in r2r_src/env.py, lines 129 to 142:

    # Split multiple instructions into separate entries
    for j, instr in enumerate(item['instructions']):
        try:
            new_item = dict(item)
            new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
            new_item['instructions'] = instr
    
            ''' BERT tokenizer '''
            instr_tokens = tokenizer.tokenize(instr)
            padded_instr_tokens, num_words = pad_instr_tokens(instr_tokens, args.maxInput)
            new_item['instr_encoding'] = tokenizer.convert_tokens_to_ids(padded_instr_tokens)
    
            if new_item['instr_encoding'] is not None:  # Filter the wrong data
                self.data.append(new_item)
                scans.append(item['scan'])
        except:
            continue
    

    This is done for the original path-instruction data but not for prevalent_aug.json, and I wonder why. I understand that the instructions in the original data are a bit long, but if you split them into separate VLN jobs while the desired path is always the complete path, how can an agent (or human) possibly follow them?

    Best, Jason

    opened by jasonppy 2
  • Specify license for the code

    Hello,

    Thanks again for your codebase. It was very useful indeed, and congratulations on your accepted paper. I was wondering whether you could please add a license to the codebase so that it is clear how this code can be used by third parties.

    Thanks, Alessandro

    opened by aleSuglia 2