Code for Multimodal Neural SLAM for Interactive Instruction Following

Overview

This repository contains the code for Multimodal Neural SLAM for Interactive Instruction Following.

Code structure

The code is adapted from E.T.; most of the training and data-processing files are currently in the ET/notebooks folder and the et_train folder.

Dependency

Inherited from the E.T. repo, the package depends on:

  • numpy
  • pandas
  • opencv-python
  • tqdm
  • vocab
  • revtok
  • Pillow
  • sacred
  • etaprogress
  • scikit-video
  • lmdb
  • gtimer
  • filelock
  • networkx
  • termcolor
  • torch==1.7.1
  • torchvision==0.8.2
  • tensorboardX==1.8
  • ai2thor==2.1.0
  • E.T. (https://github.com/alexpashevich/E.T.)

MaskRCNN Fine-tuning

To fine-tune the MaskRCNN module used in solving the Alfred challenge, we provide code adapted from the official PyTorch tutorial.

Setup

We assume the environment and code structure of the E.T. model are already set up, with this repo serving as an extension; that said, the fine-tuning code should work as a standalone unit.

Training Data Generation

Given a traj_data.json file (e.g., the 45K one used in E.T. joint training here), run python -m alfred.gen.render_trajs as in E.T. to render the training inputs (raw images) and the ground-truth labels (instance segmentation masks) for all frames recorded in the traj_data.json files. Make sure the flag for generating instance-level segmentation masks is set to True.
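
After rendering, it is worth checking that every frame received both outputs. Below is a small hypothetical helper, not part of the repo: the raw_images and instance_masks directory names are assumptions, so adjust them to whatever layout render_trajs actually produces.

```python
# Hypothetical sanity check: the raw_images/ and instance_masks/ directory
# names are assumptions -- adjust to the layout render_trajs produces.
from pathlib import Path

def check_rendered_traj(traj_dir):
    """Verify every rendered frame has a matching instance segmentation mask."""
    frames = sorted((Path(traj_dir) / "raw_images").glob("*.png"))
    mask_names = {p.name for p in (Path(traj_dir) / "instance_masks").glob("*.png")}
    missing = [f.name for f in frames if f.name not in mask_names]
    if missing:
        print(f"{traj_dir}: {len(missing)} frames missing masks, e.g. {missing[:3]}")
    return not missing
```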

Pre-processing Instance Segmentation Masks

The rendered instance segmentation masks need to be preprocessed so that the data format matches the one used in the official PyTorch tutorial. Specifically, each generated mask uses a different RGB color per instance, which is mapped to the unique instance index in the frame as well as a label index for its semantic class. The mapping is constructed by looking up traj['scene']['color_to_object_type'] in each of the json dictionaries. The code also supports collecting training data only from certain subgoals (such as PickupObject in Alfred). Note that there are some bugs in the mask rendering process that create artifacts (small regions in the ground-truth labels that correspond to no actual object). This can be fixed by keeping only instance masks larger than a certain area (e.g., > 10 pixels, as in alfred/data/maskrcnn.py); see the sketch below.
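
As a rough illustration of this preprocessing (a sketch, not the repo's actual implementation), the snippet below splits an RGB mask into per-instance binary masks and filters out tiny artifact regions. The key format of color_to_object_type and the class_to_idx mapping are assumptions; check the json dictionaries for the real key format.

```python
import numpy as np

def masks_from_rgb(mask_rgb, color_to_object_type, class_to_idx, min_area=10):
    """Split an HxWx3 RGB instance mask into binary masks and label indices.

    color_to_object_type: traj['scene']['color_to_object_type'] mapping an
    RGB color to a semantic class name (key format assumed to be a tuple).
    class_to_idx: assumed mapping from class name to label index.
    """
    colors = np.unique(mask_rgb.reshape(-1, 3), axis=0)
    instance_masks, labels = [], []
    for color in colors:
        binary = np.all(mask_rgb == color, axis=-1)
        # Drop rendering artifacts: tiny regions with no actual object.
        if binary.sum() <= min_area:
            continue
        obj_type = color_to_object_type.get(tuple(color.tolist()))
        if obj_type is None or obj_type not in class_to_idx:
            continue
        instance_masks.append(binary)
        labels.append(class_to_idx[obj_type])
    return instance_masks, labels
```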

Training

Run python -m alfred.maskrcnn.train, which first loads the pre-trained model provided by E.T. and then fine-tunes it on the pre-processed data described above.
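
For reference, the standard head replacement from the official PyTorch tutorial that this setup follows looks like the sketch below; loading the E.T. checkpoint (assumed to happen via load_state_dict) is not shown.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_finetune_model(num_classes):
    # Start from a COCO pre-trained Mask R-CNN; swap in the E.T.
    # checkpoint via model.load_state_dict(...) if available.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # Replace the box predictor for the new class count (incl. background).
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Replace the mask predictor as well.
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model
```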

Evaluation

We follow the MSCOCO evaluation protocol, widely used for object detection and instance segmentation, which outputs average precision and recall at multiple scales. The evaluation function call evaluate(model, data_loader_test, device=device) in alfred/maskrcnn/train.py serves as an example.
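
A minimal usage sketch, assuming the tutorial's detection helper files (engine.py and friends from the official PyTorch tutorial) are on the path and that model and data_loader_test are already constructed:

```python
# Assumes engine.py from the PyTorch detection tutorial is importable,
# and that `model` and `data_loader_test` are defined as in train.py.
import torch
from engine import evaluate

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
# Prints COCO-style AP/AR at IoU 0.50:0.95, 0.50, 0.75 and at
# small/medium/large scales, for both boxes ("bbox") and masks ("segm").
evaluate(model, data_loader_test, device=device)
```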
