ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

Apple

Last update: Jan 5, 2023


Overview

ARKitScenes

This repo accompanies the research paper ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data, and contains the data, scripts to visualize and process assets, and the training code described in our paper.


Paper

ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

If you use this data or source code, please cite:

@inproceedings{
dehghan2021arkitscenes,
title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
year={2021},
url={https://openreview.net/forum?id=tjZjv_qh_CE}
}

Overview

ARKitScenes is not only the first RGB-D dataset captured with a now widely available depth sensor, but also the largest indoor scene understanding dataset ever collected. In addition to the raw and processed data, ARKitScenes includes high-resolution depth maps captured using a stationary laser scanner, as well as manually labeled 3D oriented bounding boxes for a large taxonomy of furniture. We further provide helper scripts for two downstream tasks: 3D object detection and RGB-D guided upsampling. We hope that our dataset can help push the boundaries of existing state-of-the-art methods and introduce new challenges that better represent real-world scenarios.

Key features

• ARKitScenes is the first RGB-D dataset captured with the widely available Apple LiDAR scanner. Along with the raw data we provide the camera pose and surface reconstruction for each scene.

• ARKitScenes is the largest indoor 3D dataset consisting of 5,047 captures of 1,661 unique scenes.

• We provide high-quality ground truth for (a) registered RGB-D frames and (b) oriented bounding boxes of room-defining objects.

Below is an overview of RGB-D datasets and their ground truth assets compared with ARKitScenes. HR and LR represent High Resolution and Low Resolution respectively, and are available for a subset of 2,257 captures of 841 unique scenes.

Data collection

The figure below shows (a) the iPad Pro scanning setup, (b) the mesh overlay used to assist data collection with the iPad Pro, and (c) an example of one of the scan patterns captured with the iPad Pro; the red markers show the chosen locations of the stationary laser scanner in that room.

Data download

To download the data, please follow the data documentation.

Tasks

Here we provide the two tasks described in our paper: 3D Object Detection (3DOD) and depth upsampling.

3DOD

Depth upsampling

License

The ARKitScenes dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/. For queries regarding a commercial license, contact [email protected]. If you have any other questions, raise an issue in the repository and contact [email protected].

Comments

Question about raw Faro high-resolution XYZRGB point cloud

Thank you for the impressive large-scale indoor dataset collecting work. It's a significant dataset with many possibilities for high-level application scenarios, and I like it.

It is thoughtful to generate ground-truth high-resolution depth maps by discarding geometry for which a direct line of sight from the novel viewpoint cannot be guaranteed.

But I think that access to the raw Faro high-resolution XYZRGB point cloud would give the dataset even more potential, for example for point cloud completion tasks.

Would it be possible for us to get access to more of the raw data collected in your well-designed data collection process? It would let us explore more meaningful settings for 3D understanding in such a large-scale, real indoor dataset.

opened by Gofinge 10
Question about camera orientation (portrait and landscape)
I am trying to extract frames from the raw dataset and am running into confusion about the orientation of the images, which varies between portrait and landscape from video to video.

Here are a few questions I have on this topic.

Is there any annotation or other way of determining the correct orientation of the raw images, annotations and intrinsics? As far as I can tell, most seem to be rotated by -90 degrees, but not all. Videos in landscape mode are mostly not rotated, but in my tests they were occasionally upside down.

Are the raw videos always in the "correct" orientation? They seem to be at a glance, so I have assumed this for now.

Is it known whether the camera operators switched between landscape and portrait modes in the middle of a video? If this is not known, was it intentional?

Thank you and sorry if this is covered somewhere in the code that I missed.
opened by garrickbrazil 7
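A minimal sketch of one way to normalize orientation once the needed rotation is known: rotate the image 90 degrees clockwise with NumPy and update the pinhole intrinsics so that projections stay consistent. It assumes a 3x3 K without skew and pixel coordinates measured from the centre of the top-left pixel (with a corner-based convention the new cx would be h - cy instead). The function name is hypothetical, and deciding how many rotations a given video needs (for example from the gravity direction in the pose) is left open.

```python
import numpy as np

def rotate_image_and_intrinsics_cw(image, K):
    """Rotate an image 90 degrees clockwise and update its 3x3 pinhole intrinsics.

    ASSUMPTIONS: no skew in K, and pixel (u, v) = (column, row) with (0, 0)
    at the centre of the top-left pixel.
    """
    h = image.shape[0]                  # original height becomes the new width
    rotated = np.rot90(image, k=-1)     # k=-1 rotates clockwise
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Old pixel (u, v) lands at (h - 1 - v, u), so the focal lengths swap
    # and the principal point moves accordingly.
    K_rot = np.array([
        [fy, 0.0, (h - 1) - cy],
        [0.0, fx, cx],
        [0.0, 0.0, 1.0],
    ])
    # Make the view contiguous so downstream libraries can consume it directly.
    return np.ascontiguousarray(rotated), K_rot
```

Calling it two or three times gives 180 and 270 degree rotations. Points in the rotated camera frame relate to the original ones by (x, y, z) -> (-y, x, z), which is why fx and fy swap.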
Question about .traj

Thank you for the large-scale indoor dataset. I noticed that only one in six images has pose parameters in the 'lowres_wide.traj' file. Are there more posed images? Do 'vga_wide.traj' files exist?

opened by tau-yihouxiang 7
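A sketch of reading a .traj file into 4x4 poses. The line layout is an assumption here (a timestamp, an axis-angle rotation in radians, then a translation, together mapping world points into the camera); check it against the repository's own helper scripts before relying on it. Poses exist only at the timestamps listed in the file, so frames in between would have to be matched or interpolated. All function names are hypothetical.

```python
import numpy as np

def rotvec_to_matrix(rotvec):
    """Rodrigues' formula: axis-angle vector (radians) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])   # cross-product matrix of the unit axis
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def parse_traj_line(line):
    """Parse one .traj line into (timestamp, 4x4 camera-to-world pose).

    ASSUMED layout: "timestamp rx ry rz tx ty tz", where (rx, ry, rz) is an
    axis-angle rotation and (tx, ty, tz) a translation mapping world -> camera.
    """
    values = [float(x) for x in line.split()]
    if len(values) != 7:
        raise ValueError(f"expected 7 values, got {len(values)}")
    timestamp = values[0]
    world_to_cam = np.eye(4)
    world_to_cam[:3, :3] = rotvec_to_matrix(np.array(values[1:4]))
    world_to_cam[:3, 3] = values[4:7]
    # Invert to get the camera-to-world pose (camera position in world coordinates).
    cam_to_world = np.linalg.inv(world_to_cam)
    return timestamp, cam_to_world

def load_traj(path):
    """Read a whole .traj file into a list of (timestamp, pose) tuples."""
    with open(path) as f:
        return [parse_traj_line(line) for line in f if line.strip()]
```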
highres_depth in raw dataset

Dear ARKitScenes developers,

Thanks for the excellent work! I am particularly interested in using high-resolution depth maps in the raw dataset. The ones in the upsampling subset are too sparse for my use case, unfortunately.

However, after downloading one of the sequences in the raw dataset with the command below, I couldn't find highres_depth. Could you please advise if I missed anything?

python3 download_data.py raw --split Training --video_id 41048190 --download_dir /tmp/Datasets_ssd/

On a separate note, I also wonder whether there are any plans to release the point clouds captured by the Faro laser scanner, and, if it can be disclosed, how you computed the confidence for the depth maps.

opened by likojack 6
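To quantify how sparse a highres_depth map is, one can measure the fraction of pixels that carry a value. This sketch assumes depth is stored as 16-bit integers in millimetres with 0 meaning "no measurement"; both the encoding and the function names are assumptions, not confirmed by the thread.

```python
import numpy as np

def valid_depth_fraction(depth, invalid_value=0):
    """Fraction of pixels in a depth map that carry a measurement.

    ASSUMPTION: missing depth is encoded as `invalid_value` (0 by default).
    """
    depth = np.asarray(depth)
    if depth.size == 0:
        return 0.0
    return float(np.count_nonzero(depth != invalid_value)) / depth.size

def depth_to_meters(depth_png, scale=1000.0):
    """Convert integer depth (ASSUMED millimetres) to float metres; zeros become NaN."""
    depth = depth_png.astype(np.float32) / scale
    depth[depth_png == 0] = np.nan
    return depth
```

With Pillow, `valid_depth_fraction(np.array(Image.open(path)))` would report the density of a single PNG.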
Encounter 403 Forbidden When Downloading Mesh File

Hi, thanks for the great work again! I am trying to download the mesh file of each scan produced by ARKit (the low-resolution one). Unfortunately, I get 403 Forbidden errors when downloading these files (with the download list limited to meshes only), and I am worried about data loss. The following image shows some of my logs.

opened by Gofinge 3
Camera pose alignment

Hi,

Thanks for releasing this dataset. Since the high-resolution depth map is too sparse for me, I'm trying to re-render high-resolution depth maps using the given camera parameters and mesh. I suppose the '.traj' file contains the extrinsics of each frame. What convention do the rotation and translation follow: do they represent x, y, z or some other coordinate system? Which axis faces the front of the camera? What are the near and far distances of each depth map?

Best,

opened by yifanjiang19 3
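A rough sketch of re-rendering a depth map from the mesh: transform the mesh vertices into the camera, project them with the intrinsics, and keep the nearest depth per pixel. It assumes an OpenCV-style camera (x right, y down, +z forward), a 4x4 camera-to-world pose and hypothetical near/far limits; the thread does not confirm the dataset's conventions. Splatting vertices only approximates a true triangle rasterizer, so holes remain where vertices are sparse.

```python
import numpy as np

def render_vertex_depth(vertices, cam_to_world, K, width, height, near=0.1, far=10.0):
    """Splat mesh vertices into a depth map with a z-buffer (0 = no depth).

    ASSUMPTIONS: OpenCV-style camera (+z forward), 4x4 camera-to-world pose,
    3x3 pinhole K. Only vertices are projected; triangles are not rasterized.
    """
    world_to_cam = np.linalg.inv(cam_to_world)
    pts_h = np.c_[vertices, np.ones(len(vertices))]     # N x 4 homogeneous points
    pts_cam = (world_to_cam @ pts_h.T).T[:, :3]          # N x 3 in the camera frame
    z = pts_cam[:, 2]
    keep = (z > near) & (z < far)                        # near/far clipping
    pts_cam, z = pts_cam[keep], z[keep]
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    depth = np.full((height, width), np.inf)
    # np.minimum.at keeps the nearest surface when several vertices hit one pixel.
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```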
ARCamera poses

Hi, thanks for your great work! I'm working on a data collection app with ARKit. I notice that you estimate the ground truth poses instead of using ARCamera poses (ARCamera.transform). Why? Is it because the ARCamera poses are inaccurate?

opened by lawpdas 3
Question about the depth data collected from Apple device.

Hello, this is really nice work. However, as far as I know, the LiDAR in Apple devices can only acquire 9x64 points at a time. So I wonder how you acquire the depth map in real time. Is it generated by fusing the depth from the LiDAR sensor with other information (such as RGB and IMU) through the "sceneDepth" API?

opened by CodeLHY 3
How to get 2D bboxes from 3D bboxes in the online single-frame experiment?

Thanks for releasing this fruitful dataset. I am trying to train a 2D object detector. How can I get the 2D bboxes? What is the "urc" coordinate system that you are using in "data_prepare_online.py"?

opened by moaljazaery 3
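One common way to derive a 2D box from a 3D oriented box is to project its eight corners into the image and take their extremes. The sketch below assumes the box is given as centre, full extents and a 3x3 rotation in the same world frame as a 4x4 world-to-camera pose, with +z as the viewing direction; it does not interpret the "urc" convention from data_prepare_online.py, and the function names are hypothetical. The result is an upper bound on the object's true 2D extent and ignores occlusion.

```python
import itertools
import numpy as np

def obb_corners(center, size, rotation):
    """8 corners (8 x 3) of an oriented box from its centre, full extents and 3x3 rotation."""
    half = np.asarray(size, dtype=float) / 2.0
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))  # 8 x 3 sign patterns
    return (signs * half) @ np.asarray(rotation, dtype=float).T + np.asarray(center, dtype=float)

def project_obb_to_2d(center, size, rotation, world_to_cam, K, width, height):
    """Axis-aligned [x_min, y_min, x_max, y_max] enclosing the projected box.

    Returns None if any corner is behind the camera (a fuller version would
    clip the box against the near plane). ASSUMES +z is the viewing direction.
    """
    corners = obb_corners(center, size, rotation)
    corners_h = np.c_[corners, np.ones(8)]
    cam = (world_to_cam @ corners_h.T).T[:, :3]
    if np.any(cam[:, 2] <= 0):
        return None
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]          # perspective division
    x_min, y_min = uv.min(axis=0)
    x_max, y_max = uv.max(axis=0)
    # Clip to the image; a box entirely outside the frame collapses to zero area.
    x_min, x_max = np.clip([x_min, x_max], 0, width)
    y_min, y_max = np.clip([y_min, y_max], 0, height)
    return np.array([x_min, y_min, x_max, y_max])
```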
The upsampling dataset

Hi, @PeterZheFu

I downloaded the upsampling dataset; it contains four types of assets: confidence, lowres_depth, highres_depth and wide_rgb.

Is highres_depth the ground truth or the output of the upsampling network (MSG/MSPF)?

If I want to train the network, how do I generate the metadata.csv file? What does this file contain?

opened by MemoryNode 3

Downloading of mesh files

Hi, when I try to download mesh files with

python3 download_data.py raw --split Validation --video_id 48458667 --download_dir . --raw_dataset_assets mesh

I always get errors like this:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>S35X8MRZXWAHDNCD</RequestId><HostId>RUdvpESfdl/jZ20rR9qMA67MmnhqACL1JTE6CXWbyn1qXnqby7vU6nYra1HBRIUX2ypgy2mxLfM=</HostId></Error>

opened by noamwies 3

Watertight textured mesh

Hi,

Thank you for sharing the dataset. In Section 3.2 of the paper, you mentioned that a watertight textured mesh could be generated by stereographic projection. Could you please share these meshes or the related code?

opened by imbinwang 0
Add wide and wide_intrinsics to the default assets list
wide and wide_intrinsics are missing from the default assets list, so they are not downloaded when downloading the raw dataset. This change adds them to the list.
opened by cy94 0
Problems with annotation-files
Since I found missing objects in the object annotation files, I suggest starting a thread where we collect such issues.

In the sets 40776203 and 40776204 (Training) the bed in the corner is not labeled. It has the following transformation:

in 40776203: center=[ 3.087, -1.471, -1.220 ], dimension=[ 2.320, 1.680, 0.461], rot=[-0.527, 0, 0]

in 40776204: center=[ 0.585, -1.064, -1.141 ], dimension=[ 2.320, 1.680, 0.461], rot=[ 1.043, 0, 0]
opened by M-G-A 2
Transformation between ARKit mesh and Point Cloud

Hello,

It looks like the 3D bounding boxes can only be used on the ARKit reconstructions. Could you provide the transformation between this mesh and the point clouds, so that the bounding boxes can be used on the point cloud as well? Could you also explain the difference between data['segments']['obbAligned'] and data['segments']['obb'] in the annotations file?

Thanks!

opened by cy94 5
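If a rigid transform between the ARKit mesh frame and the point cloud frame were available, mapping a box across would be straightforward: the centre moves like a point, the box axes rotate, and the extents stay the same. The function below is a hypothetical sketch of that step only; it does not provide the transform itself, and it assumes a 4x4 matrix with no scaling.

```python
import numpy as np

def transform_obb(center, size, rotation, T):
    """Map an oriented box (centre, full extents, 3x3 axes) through a rigid 4x4 transform T."""
    T = np.asarray(T, dtype=float)
    R_t, t = T[:3, :3], T[:3, 3]
    # Reject scaling or shearing: extents are only preserved by rigid motions.
    if not np.allclose(R_t.T @ R_t, np.eye(3), atol=1e-6) or not np.isclose(np.linalg.det(R_t), 1.0, atol=1e-6):
        raise ValueError("T must be rigid (orthonormal rotation block with det = +1)")
    new_center = R_t @ np.asarray(center, dtype=float) + t   # centre moves like a point
    new_rotation = R_t @ np.asarray(rotation, dtype=float)   # box axes rotate with T
    new_size = np.asarray(size, dtype=float).copy()          # lengths are unchanged
    return new_center, new_size, new_rotation
```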
Duplicate scans / multiple rooms in some scenes
Hi team,

Thanks for releasing the Faro point clouds. It looks like some scenes, such as 421006, have two copies of the point cloud, each containing four scans:

169952, 169953, 169955, 169962

170029, 170030, 170034, 170036

A visualization:

Additionally, some scenes, such as 422013, have two rooms that are far apart:

Can you let us know how to handle these?
opened by cy94 1
About lowres_wide.traj for ultra and vga images
Hello, thank you for sharing! Could you please provide some details about this dataset, such as:

Whether lowres_wide.traj can be used as the extrinsics of the images in the vga_wide and ultra_wide directories...

The total size of the raw data downloaded with the download_data.py script; it seems to be over 7TB...
opened by linzhenyuyuchen 1
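A sketch of the usual workaround when poses exist only at some timestamps: for each image, take the pose with the nearest timestamp if it lies within a tolerance. The function name and the 0.05 s default are hypothetical. For a physically different camera such as ultra_wide, a matched pose would still need the fixed extrinsic between the two cameras, which this sketch does not provide.

```python
import bisect

def nearest_pose(timestamps, poses, query_ts, max_gap=0.05):
    """Return the pose whose timestamp is closest to query_ts, or None if too far.

    `timestamps` must be sorted ascending and aligned with `poses`.
    `max_gap` (seconds) is a HYPOTHETICAL tolerance; tune it to the frame rates involved.
    """
    if not timestamps:
        return None
    i = bisect.bisect_left(timestamps, query_ts)
    # The nearest timestamp is either just before or just after the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    best = min(candidates, key=lambda j: abs(timestamps[j] - query_ts))
    if abs(timestamps[best] - query_ts) > max_gap:
        return None
    return poses[best]
```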

Owner

Apple


PyTorch implementation of ShapeConv: Shape-aware Convolutional Layer for RGB-D Indoor Semantic Segmentation.

Shape-aware Convolutional Layer (ShapeConv) PyTorch implementation of ShapeConv: Shape-aware Convolutional Layer for RGB-D Indoor Semantic Segmentatio

82 Dec 29, 2022

The Habitat-Matterport 3D Research Dataset - the largest-ever dataset of 3D indoor spaces.

Habitat-Matterport 3D Dataset (HM3D) The Habitat-Matterport 3D Research Dataset is the largest-ever dataset of 3D indoor spaces. It consists of 1,000

62 Dec 27, 2022

HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR. CVPR 2022

HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDAR. CVPR 2022 [Project page | Video] Getting sta

51 Nov 29, 2022

Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

66 Nov 16, 2022

Pathdreamer: A World Model for Indoor Navigation

Pathdreamer: A World Model for Indoor Navigation This repository hosts the open source code for Pathdreamer, to be presented at ICCV 2021. Paper | Pro

122 Jan 4, 2023

DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

143 Dec 22, 2022

3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

0 Feb 6, 2022

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

Real-ESRGAN Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data Ported from https://github.com/xinntao/Real-ESRGAN Depend

44 Dec 27, 2022

Towards Part-Based Understanding of RGB-D Scans

Towards Part-Based Understanding of RGB-D Scans (CVPR 2021) We propose the task of part-based scene understanding of real-world 3D environments: from

26 Nov 23, 2022

A Planar RGB-D SLAM which utilizes Manhattan World structure to provide optimal camera pose trajectory while also providing a sparse reconstruction containing points, lines and planes, and a dense surfel-based reconstruction.

ManhattanSLAM Authors: Raza Yunus, Yanyan Li and Federico Tombari ManhattanSLAM is a real-time SLAM library for RGB-D cameras that computes the camera

117 Dec 28, 2022


Edge-aware Guidance Fusion Network for RGB-Thermal Scene Parsing

The first dataset on shadow generation for the foreground object in real-world scenes.

HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021)

Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Face Synthetics dataset is a collection of diverse synthetic face images with ground truth labels.

CCPD: a diverse and well-annotated dataset for license plate detection and recognition

Implementation of CVPR'21: RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction

Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021)