Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models Benchmark and Efficient Evaluation
This repository hosts the code related to the paper:
Marco Rosano, Antonino Furnari, Luigi Gulino, Corrado Santoro and Giovanni Maria Farinella, "Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models Benchmark and Efficient Evaluation". Submitted to "Robotics and Autonomous Systems" (RAS), 2022.
For more details please see the project web page at https://iplab.dmi.unict.it/EmbodiedVN.
Overview
This code is built on top of the Habitat-api/Habitat-lab project. Please see the Habitat project page for more details.
This repository provides the following components:
- The implementation of the proposed tool, integrated with Habitat, to train visual navigation models on synthetic observations and test them on realistic episodes containing real-world images. This allows the estimation of real-world performance while avoiding the physical deployment of the robotic agent;
- The official PyTorch implementation of the proposed visual navigation models, which follow different strategies to combine a range of visual mid-level representations (a minimal fusion sketch is shown after this list);
- The synthetic 3D model of the proposed environment, acquired using the Matterport 3D scanner and used to perform the navigation episodes at train and test time;
- The photorealistic 3D model that contains real-world images of the proposed environment, labeled with their pose (X, Z, Angle). The sparse 3D reconstruction was performed using the COLMAP Structure from Motion tool and was then aligned with the Matterport virtual 3D map;
- An integration with CycleGAN to train and evaluate navigation models with Habitat on sim2real-adapted images;
- The checkpoints of the best performing navigation models.
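As a rough illustration of the simplest fusion strategy (channel-wise concatenation of the mid-level feature maps followed by a small CNN encoder), a hypothetical PyTorch sketch is shown below. Names, channel counts, and shapes are illustrative assumptions and do not reproduce the exact architectures benchmarked in the paper.

```python
import torch
import torch.nn as nn

class SimpleFusionEncoder(nn.Module):
    """Toy encoder that fuses several mid-level feature maps by
    channel-wise concatenation followed by a small CNN (illustrative only)."""

    def __init__(self, num_representations: int, channels_per_rep: int = 8, out_dim: int = 512):
        super().__init__()
        in_channels = num_representations * channels_per_rep
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, midlevel_features):
        # midlevel_features: list of tensors, each of shape (B, C, H, W)
        fused = torch.cat(midlevel_features, dim=1)
        return self.cnn(fused)

# Example: fuse two hypothetical feature maps (e.g. "normal" and "keypoints3d").
feats = [torch.randn(4, 8, 16, 16), torch.randn(4, 8, 16, 16)]
embedding = SimpleFusionEncoder(num_representations=2)(feats)
print(embedding.shape)  # torch.Size([4, 512])
```

In the repository's models, the fused embedding feeds the navigation policy; the actual fusion variants ("simple", SE attention, mid fusion, ...) are selected through the configuration files described later.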
Installation
Requirements
- Python >= 3.7; use version 3.7 to avoid possible issues.
- Other requirements will be installed via `pip` in the following steps.
Steps
- (Optional) Create an Anaconda environment and install everything in it (`conda create -n fusion-habitat python=3.7; conda activate fusion-habitat`).
- Install the Habitat simulator following the official repo instructions. Development and testing were done on commit `bfbe9fc30a4e0751082824257d7200ad543e4c0e`, installing the simulator "from source" and launching the `./build.sh --headless --with-cuda` command (guide). Please consider following these suggestions if you encounter issues while installing the simulator.
- Install the customized Habitat-lab (this repo):
  ```bash
  git clone https://github.com/rosanom/mid-level-fusion-nav.git
  cd mid-level-fusion-nav/
  pip install -r requirements.txt
  python setup.py develop --all # install habitat and habitat_baselines
  ```
- Download our dataset (journal version) from here, and extract it to the repository folder (`mid-level-fusion-nav/`). Inside the `data` folder you should see this structure:
  ```
  datasets/pointnav/orangedev/v1/...
  real_images/orangedev/...
  scene_datasets/orangedev/...
  orangedev_checkpoints/...
  ```
- (Optional, to check if the software works properly) Download the test scenes data and extract the zip file to the repository folder (`mid-level-fusion-nav/`). To verify that the tool was successfully installed, run `python examples/benchmark.py` or `python examples/example.py`.
Data Structure
All data can be found inside the `mid-level-fusion-nav/data/` folder:
- the `datasets/pointnav/orangedev/v1/...` folder contains the generated train and validation navigation episode files;
- the `real_images/orangedev/...` folder contains the real-world images of the proposed environment and the `csv` file with their pose information (obtained with COLMAP);
- the `scene_datasets/orangedev/...` folder contains the 3D mesh of the proposed environment;
- `orangedev_checkpoints/` is the folder where the checkpoints are saved during training. Place a checkpoint file here if you want to resume the training process or evaluate the model. The system will load the most recent checkpoint file (see the sketch below).
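Regarding the "most recent checkpoint" behaviour mentioned above, a minimal sketch of that kind of lookup is shown below. This is a hypothetical helper for illustration, not the repository's actual loading code, and it assumes checkpoints are stored as `.pth` files.

```python
import glob
import os

def latest_checkpoint(ckpt_dir: str = "data/orangedev_checkpoints/") -> str:
    """Return the path of the most recently modified .pth checkpoint (illustrative helper)."""
    checkpoints = glob.glob(os.path.join(ckpt_dir, "*.pth"))
    if not checkpoints:
        raise FileNotFoundError(f"No checkpoints found in {ckpt_dir}")
    return max(checkpoints, key=os.path.getmtime)

print(latest_checkpoint())
```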
Config Files
There are two configuration files: `habitat_domain_adaptation/configs/tasks/pointnav_orangedev.yaml` and `habitat_domain_adaptation/habitat_baselines/config/pointnav/ddppo_pointnav_orangedev.yaml`.
In the first file you can change the robot's properties, the sensors used by the agent, and the dataset used in the experiment. You should not need to modify it.
In the second file you can decide:
- whether to evaluate the navigation models using RGB or mid-level representations;
- the set of mid-level representations to use;
- the fusion architecture to use;
- whether to train or evaluate the models using real images, or using the CycleGAN sim2real-adapted observations.
```yaml
...
EVAL_W_REAL_IMAGES: True
EVAL_CKPT_PATH_DIR: "data/orangedev_checkpoints/"
SIM_2_REAL: False # use CycleGAN for sim2real image adaptation?
USE_MIDLEVEL_REPRESENTATION: True
MIDLEVEL_PARAMS:
  ENCODER: "simple" # "simple", "SE_attention", "mid_fusion", ...
  FEATURE_TYPE: ["normal"] # ["normal", "keypoints3d", "curvature", "depth_zbuffer"]
...
```
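For a quick sanity check of these settings, the merged configuration can be inspected from Python with the `habitat_baselines` config loader. This is a hypothetical usage sketch: it assumes the customized `habitat_baselines` in this repository registers the keys shown above (`EVAL_W_REAL_IMAGES`, `MIDLEVEL_PARAMS`, ...) and that the config path is given relative to the repository root (adjust it to your checkout if needed).

```python
# Hypothetical sketch: load and inspect the DD-PPO experiment config.
from habitat_baselines.config.default import get_config

config = get_config(
    "habitat_baselines/config/pointnav/ddppo_pointnav_orangedev.yaml"
)
print(config.EVAL_W_REAL_IMAGES)            # e.g. True
print(config.MIDLEVEL_PARAMS.ENCODER)       # e.g. "simple"
print(config.MIDLEVEL_PARAMS.FEATURE_TYPE)  # e.g. ["normal"]
```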
CycleGAN Integration (baseline)
In order to use CycleGAN on Habitat for the sim2real domain adaptation during training or evaluation, follow the steps suggested in the repository of our previous release.
Train and Evaluation
To train the navigation model using the DD-PPO RL algorithm, run:
```bash
sh habitat_baselines/rl/ddppo/single_node_orangedev.sh
```
To evaluate the navigation model using the DD-PPO RL algorithm, run:
```bash
sh habitat_baselines/rl/ddppo/single_node_orangedev_eval.sh
```
For more information about the DD-PPO RL algorithm, please check out the habitat-lab dd-ppo repo page.
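For reference, the standard `habitat_baselines` entry point builds a trainer from the experiment config and then calls `train()` or `eval()`. The sketch below is a simplified, hypothetical view of what the launch scripts presumably end up invoking; the actual DD-PPO scripts also handle the distributed/multi-process setup, so this snippet is illustrative rather than a drop-in replacement.

```python
# Simplified, hypothetical view of the habitat_baselines launch logic.
from habitat_baselines.common.baseline_registry import baseline_registry
from habitat_baselines.config.default import get_config

config = get_config(
    "habitat_baselines/config/pointnav/ddppo_pointnav_orangedev.yaml"
)
trainer_cls = baseline_registry.get_trainer(config.TRAINER_NAME)  # e.g. "ddppo"
trainer = trainer_cls(config)
trainer.train()  # or trainer.eval() to evaluate the checkpoints in EVAL_CKPT_PATH_DIR
```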
License
The code in this repository, the 3D models and the images of the proposed environment are MIT licensed. See the LICENSE file for details.
The trained models and the task datasets are considered data derived from the corresponding scene datasets.
- Matterport3D based task datasets and trained models are distributed with Matterport3D Terms of Use and under CC BY-NC-SA 3.0 US license.
- Gibson based task datasets, the code for generating such datasets, and trained models are distributed with Gibson Terms of Use and under CC BY-NC-SA 3.0 US license.
Acknowledgements
This research is supported by OrangeDev s.r.l., by Next Vision s.r.l., by the project MEGABIT - PIAno di inCEntivi per la RIcerca di Ateneo 2020/2022 (PIACERI) – linea di intervento 2, DMI - University of Catania, and by the grant MIUR AIM - Attrazione e Mobilità Internazionale Linea 1 - AIM1893589 - CUP E64118002540007.