Fine-Grained R2R

Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP 2020 paper Sub-Instruction Aware Vision-and-Language Navigation.

Code of the navigator will be released soon.

This dataset enriches the benchmark Room-to-Room (R2R) dataset by dividing each instruction into sub-instructions and pairing every sub-instruction with its corresponding viewpoints in the path.

  • The copyright resides with the authors of the paper Sub-Instruction Aware Vision-and-Language Navigation.
  • This dataset is built upon the Room-to-Room (R2R) dataset; we refer readers to its repository for more details.

Data

The Fine-Grained R2R data enriches the R2R dataset with sub-instructions and their corresponding sub-paths. The overall instruction and trajectory of each sample remain the same.

  • For paths in the train, validation seen, and validation unseen splits, we add two new entries:

    • new_instructions: A list of sub-instructions produced by the Chunking Function from the complete instructions. The list is stored as a string; use import ast and ast.literal_eval() to read it back as a list (see the sketch after this list).
    • chunk_view: A list of sub-paths corresponding to the sub-instructions, where each number in the list is the index of a viewpoint in the ground-truth path. Indices start at 1.
  • Some sub-instructions, such as those referring to camera rotation or a STOP action, may match a single viewpoint.

  • For the test unseen split, we only provide the sub-instructions, not the sub-paths.
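
For illustration, a minimal sketch in Python of reading these entries; the file name FGR2R_train.json is an assumption, so point it at the released JSON file:

    import ast
    import json

    # Load one split of the Fine-Grained R2R data.
    # NOTE: the file name below is an assumption; adjust it to the released data.
    with open('FGR2R_train.json') as f:
        data = json.load(f)

    sample = data[0]

    # Both new entries are serialized as strings, so parse them back into
    # (possibly nested) Python lists with ast.literal_eval().
    sub_instructions = ast.literal_eval(sample['new_instructions'])
    sub_paths = ast.literal_eval(sample['chunk_view'])

    print(sub_instructions)
    print(sub_paths)  # viewpoint indices into the ground-truth path, starting at 1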

Source

The code of the proposed Chunking Function for generating sub-instructions.

  • Install the StanfordNLP package (v0.1.2 in our experiments) and download the English models for the neural pipeline (see the setup sketch after this list).

  • Run make_subinstr.py to generate data with sub-instructions from the original R2R data.

  • The generated files were then sent to Amazon Mechanical Turk (AMT) for annotation of the sub-paths.
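
A minimal setup sketch for reference; the sample sentence is arbitrary and only verifies that the pipeline loads:

    # Install the version used in our experiments first:
    #     pip install stanfordnlp==0.1.2
    import stanfordnlp

    # One-time download of the English models for the neural pipeline.
    stanfordnlp.download('en')

    # Check that the pipeline loads and parses a sentence.
    nlp = stanfordnlp.Pipeline(lang='en')
    doc = nlp("Walk past the sofa and stop at the door.")
    doc.sentences[0].print_dependencies()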

Reference

If you use or discuss the Fine-Grained R2R dataset in your work, please cite our paper:

@article{hong2020sub,
  title={Sub-Instruction Aware Vision-and-Language Navigation},
  author={Hong, Yicong and Rodriguez-Opazo, Cristian and Wu, Qi and Gould, Stephen},
  journal={arXiv preprint arXiv:2004.02707},
  year={2020}
}

Contact

If you have any questions regarding the dataset or the publication, please open an issue in this repository or email [email protected].
