ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data and contains the data, scripts to visualize and process assets, and training code described in our paper.

[images and demo video: ARKitScenes_screen_720p.mov]

Paper

ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

If you use this data or source code, please cite:

@inproceedings{
dehghan2021arkitscenes,
title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
year={2021},
url={https://openreview.net/forum?id=tjZjv_qh_CE}
}

Overview

ARKitScenes is not only the first RGB-D dataset captured with a now widely available depth sensor, but also the largest indoor scene understanding dataset ever collected. In addition to the raw and processed data, ARKitScenes includes high-resolution depth maps captured using a stationary laser scanner, as well as manually labeled 3D oriented bounding boxes for a large taxonomy of furniture. We further provide helper scripts for two downstream tasks: 3D object detection and RGB-D guided upsampling. We hope that our dataset can help push the boundaries of existing state-of-the-art methods and introduce new challenges that better represent real-world scenarios.

Key features

• ARKitScenes is the first RGB-D dataset captured with the widely available Apple LiDAR scanner. Along with the raw data, we provide the camera pose and surface reconstruction for each scene.

• ARKitScenes is the largest indoor 3D dataset, consisting of 5,047 captures of 1,661 unique scenes.

• We provide high-quality ground truth: (a) registered RGB-D frames and (b) oriented bounding boxes of room-defining objects.

Below is an overview of RGB-D datasets and their ground truth assets compared with ARKitScenes. HR and LR stand for high resolution and low resolution, respectively; the HR ground truth is available for a subset of 2,257 captures of 841 unique scenes.

image

Data collection

In the figure below, we provide (a) an illustration of the iPad Pro scanning setup, (b) the mesh overlay used to assist data collection with the iPad Pro, and (c) an example of one of the scan patterns captured with the iPad Pro; the red markers show the chosen locations of the stationary laser scanner in that room.

image

Data download

To download the data, please follow the data documentation.
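
For example, a single capture from the raw dataset can be fetched with the download script; the invocation below (the same one that appears in the issues further down) uses an example video id and download directory:

    python3 download_data.py raw --split Training --video_id 41048190 --download_dir /tmp/Datasets_ssd/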

Tasks

Here we provide code for the two tasks mentioned in our paper, namely 3D Object Detection (3DOD) and depth upsampling.

3DOD

Depth upsampling
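
As a point of reference for this task, below is a generic joint bilateral upsampling baseline in pure numpy. This is not the upsampling method evaluated in the paper; it only illustrates the idea of using a high-resolution RGB frame to guide the interpolation of a low-resolution depth map, and all parameter values are illustrative:

    import numpy as np

    def joint_bilateral_upsample(depth_lr, rgb_hr, radius=3, sigma_s=2.0, sigma_r=0.1):
        """Upsample depth_lr (h, w) to the resolution of rgb_hr (H, W, 3), RGB in [0, 1]."""
        H, W = rgb_hr.shape[:2]
        h, w = depth_lr.shape
        sy, sx = h / H, w / W                        # high-res -> low-res scale
        out = np.zeros((H, W))
        for y in range(H):
            for x in range(W):
                y0 = min(int(round(y * sy)), h - 1)  # nearest low-res sample
                x0 = min(int(round(x * sx)), w - 1)
                num = den = 0.0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xx = y0 + dy, x0 + dx
                        if not (0 <= yy < h and 0 <= xx < w):
                            continue
                        # spatial weight on the low-res grid
                        ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                        # range weight from the RGB guide
                        gy = min(int(yy / sy), H - 1)
                        gx = min(int(xx / sx), W - 1)
                        diff = rgb_hr[y, x] - rgb_hr[gy, gx]
                        wr = np.exp(-float(diff @ diff) / (2 * sigma_r ** 2))
                        num += ws * wr * depth_lr[yy, xx]
                        den += ws * wr
                out[y, x] = num / den
        return out

Such a baseline gives a quick qualitative reference point to compare learned upsampling methods against.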

License

The ARKitScenes dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/. For queries regarding a commercial license, contact [email protected]. If you have any other questions, raise an issue in the repository or contact [email protected].

Comments
  • Question about raw Faro high-resolution XYZRGB point cloud

    Thank you for the impressive work collecting this large-scale indoor dataset. It's a significant dataset with many possibilities for high-level application scenarios, and I like it.

    It is thoughtful to generate the ground truth high-resolution depth maps by discarding geometry for which a direct line of sight from the novel viewpoint cannot be guaranteed.

    But I think that access to the raw Faro high-resolution XYZRGB point clouds would give the dataset even more potential, e.g., for point cloud completion tasks.

    Would it be possible for us to get access to more of the raw data collected in your well-designed data collection process? It would let us explore more meaningful settings for 3D understanding in such a large-scale real indoor dataset.

    opened by Gofinge 10
  • Question about camera orientation (portrait and landscape)

    I am trying to extract frames from the Raw dataset and running into trouble/confusion related to the orientation of the images, which varies between Portrait and Landscape modes from video to video.

    Here are a few questions I have on this topic.

    1. Is there any annotation or way of determining what the correct orientation is for the Raw images/annotations/intrinsics? Most seem to be rotated by -90 degrees, but not all, as far as I can tell. It seems that videos in Landscape mode are mostly not rotated but could occasionally be upside down in my tests.

    2. Are the Raw videos always in the "correct" orientation? They seem to be at a glance, so I have assumed this for now.

    3. Is it known whether the camera operators switch between Landscape and Portrait modes in the middle of a video? If it's not known, was it intentional?

    Thank you and sorry if this is covered somewhere in the code that I missed.
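
    For what it's worth, rotating a frame also requires remapping the pinhole intrinsics and composing the pose with a roll about the optical axis. A minimal sketch for a 90-degree clockwise rotation (whether +90 or -90 applies to a given video is exactly the ambiguity raised above):

        import numpy as np

        def rotate_90_cw(image, fx, fy, cx, cy):
            """Rotate an (H, W[, C]) image 90 deg clockwise and adjust intrinsics."""
            H = image.shape[0]
            rotated = np.rot90(image, k=-1)  # pixel (row y, col x) -> (row x, col H - 1 - y)
            # focal lengths swap axes; the principal point moves with the pixels;
            # the camera pose must additionally be composed with a roll about the
            # optical axis: R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
            return rotated, fy, fx, H - 1 - cy, cx  # image, fx', fy', cx', cy'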

    opened by garrickbrazil 7
  • Question about .traj

    Thank you for the large-scale indoor dataset. I noticed that only about 1/6 of the images have pose parameters in the 'lowres_wide.traj' file. Are there more posed images? Do 'vga_wide.traj' files exist?

    opened by tau-yihouxiang 7
  • highres_depth in raw dataset

    Dear ARKitScenes developers,

    Thanks for the excellent work! I am particularly interested in using high-resolution depth maps in the raw dataset. The ones in the upsampling subset are too sparse for my use case, unfortunately.

    However, after I downloaded one of the sequences in the raw dataset using the command below, I couldn't find highres_depth. Could you please advise if I missed anything?

        python3 download_data.py raw --split Training --video_id 41048190 --download_dir /tmp/Datasets_ssd/

    On a separate note, I also wonder if there are any plans to release the point clouds captured by the Faro laser scanner, and how you computed the confidence for the depth maps, if that can be disclosed.

    opened by likojack 6
  • Encounter 403 Forbidden When Downloading Mesh File

    Hi, thanks for the great work again! I am trying to download the mesh file of each scan produced by ARKit (the low-resolution one). Unfortunately, I encounter 403 Forbidden when downloading these files (limiting the download file list to mesh only), and I worry about data loss. The following image shows some of my logs.

    image

    opened by Gofinge 3
  • Camera pose alignment

    Hi,

    Thanks for releasing this dataset. Since the highres depth map is too sparse for me, I'm trying to re-render a highres depth map using the given camera parameters and mesh. I suppose the '.traj' file contains the extrinsics of each frame. I'm just wondering about the convention of the rotation and translation: do they represent x, y, z or some other coordinate system? Which axis faces the front of the camera? And what are the near and far distances of each depth map?

    Best,
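
    For anyone else rendering from the mesh: assuming each .traj line is "timestamp rx ry rz tx ty tz" with (rx, ry, rz) an axis-angle rotation (this appears to match the repo's trajectory helpers, but verify against the documentation; whether the result is camera-to-world or world-to-camera is part of the question above), a line can be converted to a 4x4 pose like this:

        import numpy as np

        def axis_angle_to_matrix(rvec):
            """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
            theta = np.linalg.norm(rvec)
            if theta < 1e-12:
                return np.eye(3)
            k = rvec / theta
            K = np.array([[0.0, -k[2], k[1]],
                          [k[2], 0.0, -k[0]],
                          [-k[1], k[0], 0.0]])
            return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

        def traj_line_to_pose(line):
            """Parse one .traj line into (timestamp, 4x4 pose matrix)."""
            ts, rx, ry, rz, tx, ty, tz = map(float, line.split())
            pose = np.eye(4)
            pose[:3, :3] = axis_angle_to_matrix(np.array([rx, ry, rz]))
            pose[:3, 3] = [tx, ty, tz]
            return ts, pose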

    opened by yifanjiang19 3
  • ARCamera poses

    Hi, thanks for your great work! I'm working on a data collection app with ARKit. I noticed that you estimate the ground truth poses instead of using the ARCamera poses (ARCamera.transform). Why? Is it because the ARCamera poses are inaccurate?

    opened by lawpdas 3
  • Question about the depth data collected from Apple device.

    Hello, this is really nice work! However, as far as I know, the LiDAR in Apple devices can only acquire 9x64 points at a time, so I wonder how you can acquire the depth map in real time. Is it generated by fusing the depth information from the LiDAR sensor with other information (such as RGB and IMU) through the "sceneDepth" API?

    opened by CodeLHY 3
  • How to get the 2D bboxes from 3D bboxes in the Online Single Frame experiment?

    Thanks for releasing this fruitful dataset. I am trying to train a 2D object detector. How can I get the 2D bounding boxes? And what is the "urc" coordinate that you use in "data_prepare_online.py"?
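
    One generic way to get 2D boxes (not necessarily what data_prepare_online.py does) is to project the eight corners of each 3D box with the pinhole intrinsics and take the min/max of the projected points; a sketch, assuming corners already in the camera frame with positive depth:

        import numpy as np

        def bbox_2d_from_corners(corners_cam, fx, fy, cx, cy):
            """corners_cam: (8, 3) box corners in camera coordinates, z > 0."""
            z = corners_cam[:, 2]
            u = fx * corners_cam[:, 0] / z + cx
            v = fy * corners_cam[:, 1] / z + cy
            return u.min(), v.min(), u.max(), v.max()  # x_min, y_min, x_max, y_max

    Clipping the result to the image bounds and discarding boxes behind the camera are left out for brevity.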

    opened by moaljazaery 3
  • The upsampling dataset

    Hi, @PeterZheFu

    I downloaded the upsampling dataset; there are 4 types of assets: confidence, lowres_depth, highres_depth and wide_rgb.

    Is highres_depth the ground truth, or the output of the upsampling network (MSG/MSPF)?

    If I want to train the network, how do I generate the metadata.csv file, and what does this file contain?

    opened by MemoryNode 3
  • Downloading of mesh files

    Hi, when I try to download mesh files

    python3 download_data.py raw --split Validation --video_id 48458667 --download_dir . --raw_dataset_assets mesh
    

    I always get errors like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>S35X8MRZXWAHDNCD</RequestId><HostId>RUdvpESfdl/jZ20rR9qMA67MmnhqACL1JTE6CXWbyn1qXnqby7vU6nYra1HBRIUX2ypgy2mxLfM=</HostId></Error>
    
    opened by noamwies 3
  • Watertight textured mesh

    Hi,

    Thank you for sharing the dataset. In Section 3.2 of the paper, you mention that a watertight textured mesh could be generated by stereographic projection. Could you share these meshes or the related code, please?

    opened by imbinwang 0
  • Add wide and wide_intrinsics to the default assets list

    • wide and wide_intrinsics are missing from the default assets list, hence they are not downloaded when downloading the raw dataset. Added these to the list.
    opened by cy94 0
  • Problems with annotation-files

    Since I found missing objects in the object annotation files, I suggest starting a thread where we collect such issues.

    In the sets 40776203 and 40776204 (Training) the bed in the corner is not labeled. It has the following transformation:

    • in 40776203: center=[ 3.087, -1.471, -1.220 ], dimension=[ 2.320, 1.680, 0.461], rot=[-0.527, 0, 0]
    • in 40776204: center=[ 0.585, -1.064, -1.141 ], dimension=[ 2.320, 1.680, 0.461 ], rot=[ 1.043, 0, 0 ]

    [images: bed_missing, bed_included]
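
    For reference, the eight corners of a box given in this form can be reconstructed as below, under the assumption that rot holds Euler angles in radians applied as Rz @ Ry @ Rx; the actual annotation convention should be checked against the repo's box utilities:

        import numpy as np

        def obb_corners(center, dimension, rot):
            """Corners of an oriented box from center, full extents, Euler angles."""
            rx, ry, rz = rot
            Rx = np.array([[1, 0, 0],
                           [0, np.cos(rx), -np.sin(rx)],
                           [0, np.sin(rx), np.cos(rx)]])
            Ry = np.array([[np.cos(ry), 0, np.sin(ry)],
                           [0, 1, 0],
                           [-np.sin(ry), 0, np.cos(ry)]])
            Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                           [np.sin(rz), np.cos(rz), 0],
                           [0, 0, 1]])
            R = Rz @ Ry @ Rx
            # unit-cube corner signs, scaled by half-extents, rotated, translated
            signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                                           for sy in (-1, 1)
                                           for sz in (-1, 1)])
            return signs * (np.asarray(dimension) / 2.0) @ R.T + np.asarray(center)
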
    opened by M-G-A 2
  • Transformation between ARKit mesh and Point Cloud

    Hello,

    It looks like the 3D bounding boxes can only be used on the ARKit reconstructions. Could you provide the transformation between this mesh and the point clouds, so that the bounding boxes can be used on the point cloud as well? Could you also explain the difference between data['segments']['obbAligned'] and data['segments']['obb'] in the annotations file?

    image

    Thanks!
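
    If such a 4x4 transform T (mesh frame to point cloud frame) becomes available, applying it to box corners or to an (N, 3) point cloud is straightforward; a minimal sketch:

        import numpy as np

        def transform_points(points, T):
            """Apply a 4x4 homogeneous transform T to an (N, 3) point array."""
            return points @ T[:3, :3].T + T[:3, 3]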

    opened by cy94 5
  • Duplicate scans / multiple rooms in some scenes

    Hi team,

    Thanks for releasing the Faro point clouds. It looks like some scenes, such as 421006, have 2 copies of the point cloud, each containing 4 scans:

    • 169952, 169953, 169955, 169962
    • 170029, 170030, 170034, 170036

    A visualization:

    image

    Additionally, some scenes like 422013 have 2 rooms that are far apart:

    image

    Can you let us know how to handle these?

    opened by cy94 1
  • About lowres_wide.traj for ultra and vga images

    Hello, thank you for sharing! Could you please provide some details about this dataset, such as:

    1. Whether lowres_wide.traj can be used as the extrinsics for the images in the vga_wide and ultra_wide directories...
    2. The total size of the raw data downloaded with the script download_data.py; it seemed to be over 7TB...
    opened by linzhenyuyuchen 1