Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

NonCuboidRoom

Paper

Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

Cheng Yang*, Jia Zheng*, Xili Dai, Rui Tang, Yi Ma, Xiaojun Yuan.

[Preprint] [Supplementary Material]

(*: Equal contribution)

Installation

The code is tested with Ubuntu 16.04, PyTorch v1.5, CUDA 10.1 and cuDNN v7.6.

# create conda env
conda create -n layout python=3.6
# activate conda env
conda activate layout
# install pytorch
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.1 -c pytorch
# install dependencies
pip install -r requirements.txt

Data Preparation

Structured3D Dataset

Please download the Structured3D dataset and our processed 2D line annotations. The directory structure should look like this:

data
└── Structured3D
    ├── Structured3D
    │   ├── scene_00000
    │   ├── scene_00001
    │   ├── scene_00002
    │   └── ...
    └── line_annotations.json

SUN RGB-D Dataset

Please download the SUN RGB-D dataset, our processed 2D line annotations for the SUN RGB-D dataset, and the layout annotations of the NYUv2 303 dataset. The directory structure should look like this:

data
└── SUNRGBD
    ├── SUNRGBD
    │   ├── kv1
    │   ├── kv2
    │   ├── realsense
    │   └── xtion
    ├── sunrgbd_train.json      // our extracted 2D line annotations of SUN RGB-D train set
    ├── sunrgbd_test.json       // our extracted 2D line annotations of SUN RGB-D test set
    └── nyu303_layout_test.npz  // 2D ground truth layout annotations provided by NYUv2 303 dataset
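
Before training, it can help to confirm that both trees match the layouts shown above. The following is a minimal sketch using only the Python standard library. The `missing_entries` helper and the `EXPECTED` table are illustrative names that are not part of this repository; the root folder `data` and the entry names are copied from the trees above.

```python
from pathlib import Path

# Expected entries, copied from the directory trees above.
EXPECTED = {
    "Structured3D": ["Structured3D", "line_annotations.json"],
    "SUNRGBD": [
        "SUNRGBD/kv1",
        "SUNRGBD/kv2",
        "SUNRGBD/realsense",
        "SUNRGBD/xtion",
        "sunrgbd_train.json",
        "sunrgbd_test.json",
        "nyu303_layout_test.npz",
    ],
}


def missing_entries(data_root, dataset):
    """Return the expected paths that are absent under data_root/dataset."""
    base = Path(data_root) / dataset
    return [rel for rel in EXPECTED[dataset] if not (base / rel).exists()]


if __name__ == "__main__":
    # "data" is the root folder used in the trees above.
    for name in EXPECTED:
        missing = missing_entries("data", name)
        print(name, "OK" if not missing else f"missing: {missing}")
```

An empty list means every expected entry was found. The check only looks at names, not at file contents.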

Pre-trained Models

You can download our pre-trained models here:

  • The model trained on the Structured3D dataset.
  • The model trained on the SUN RGB-D dataset and the NYUv2 303 dataset.

Structured3D Dataset

To train the model on the Structured3D dataset, run this command:

python train.py --model_name s3d --data Structured3D

To evaluate the model on the Structured3D dataset, run this command:

python test.py --pretrained DIR --data Structured3D

NYUv2 303 Dataset

To train the model on the SUN RGB-D dataset and the NYUv2 303 dataset, run these commands:

# First, fine-tune the model on the SUN RGB-D dataset
python train.py --model_name sunrgbd --data SUNRGBD --pretrained Structure3D_DIR --split all --lr_step []
# Then fine-tune the model on the NYUv2 subset
python train.py --model_name nyu --data SUNRGBD --pretrained SUNRGBD_DIR --split nyu --lr_step [] --epochs 10
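
The two stages can also be scripted. The sketch below builds each command as an argument list and runs it without a shell, so the literal `[]` passed to `--lr_step` cannot be interpreted as a glob pattern by the shell. Only flags that appear in the commands above are used. `finetune_commands` and `run_stage` are hypothetical helpers, and both checkpoint paths must be supplied by the caller because it is not stated here where `train.py` saves its checkpoints.

```python
import subprocess
import sys


def finetune_commands(s3d_ckpt, sunrgbd_ckpt):
    """Build the two fine-tuning commands as argument lists.

    s3d_ckpt:     checkpoint from Structured3D training (caller-supplied path)
    sunrgbd_ckpt: checkpoint produced by the first stage (caller-supplied path)
    """
    stage1 = [
        sys.executable, "train.py",
        "--model_name", "sunrgbd",
        "--data", "SUNRGBD",
        "--pretrained", s3d_ckpt,
        "--split", "all",
        "--lr_step", "[]",
    ]
    stage2 = [
        sys.executable, "train.py",
        "--model_name", "nyu",
        "--data", "SUNRGBD",
        "--pretrained", sunrgbd_ckpt,
        "--split", "nyu",
        "--lr_step", "[]",
        "--epochs", "10",
    ]
    return stage1, stage2


def run_stage(cmd):
    """Run one stage; raises CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

A caller would run `run_stage` on the first command, locate the checkpoint it produced, and only then build and run the second command with that path.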

To evaluate the model on the NYUv2 303 dataset, run this command:

python test.py --pretrained DIR --data NYU303

Inference on Custom Data

To run inference on custom images, run this command:

python test.py --pretrained DIR --data CUSTOM
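
Custom images usually come with their own camera intrinsics. One of the issue reports further down this page, for example, uses K = [[600, 0, 320], [0, 600, 240], [0, 0, 1]] for a 640x480 image. How the CUSTOM loader reads or resizes images is not documented here, so the following is only a general sketch of how a pinhole intrinsic matrix changes when an image is resized. `scale_intrinsics` is a hypothetical helper, not a function from this repository, and the 320x240 target size is chosen purely for illustration.

```python
def scale_intrinsics(K, src_size, dst_size):
    """Rescale a 3x3 pinhole intrinsic matrix for a resized image.

    K is a nested list [[fx, s, cx], [0, fy, cy], [0, 0, 1]].
    Sizes are (width, height) in pixels.
    This simple scaling ignores the half-pixel offset that some
    resize conventions introduce.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return [
        [K[0][0] * sx, K[0][1] * sx, K[0][2] * sx],  # first row scales with width
        [K[1][0] * sy, K[1][1] * sy, K[1][2] * sy],  # second row scales with height
        [0.0, 0.0, 1.0],
    ]


# Intrinsics quoted in the issue report, for a 640x480 image.
K = [[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]]
# Hypothetical target size, for illustration only.
K_half = scale_intrinsics(K, (640, 480), (320, 240))
print(K_half)
```

Cropping, unlike resizing, shifts only the principal point (cx, cy) and leaves the focal lengths unchanged, so it needs a different adjustment.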

Citation

@article{NonCuboidRoom,
  title   = {Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image},
  author  = {Cheng Yang and
             Jia Zheng and
             Xili Dai and
             Rui Tang and
             Yi Ma and
             Xiaojun Yuan},
  journal = {CoRR},
  volume  = {abs/2104.07986},
  year    = {2021}
}

LICENSE

The code is released under the MIT license. Portions of the code are borrowed from HRNet-Object-Detection and CenterNet.

Acknowledgements

We would like to thank Lei Jin for providing us with the code for parsing the layout annotations in the SUN RGB-D dataset.

Comments
  • Curved Depth Map

    Hi, I am trying to create a visualizer for the room layout, using a custom image for inference. The results look good when I check the segmentation overlaid on my image. The main issue is that when I plot the result with Open3D, the depth map looks curved. Do you have any explanation for this?

    This is the overlaid image: [image: overlay_img]

    This is my Open3D visualization of the polygons: [two screenshots]

    The main issue is the curvature, which is due to the depth map extracted by the network. Any idea how to solve it? Thanks

    good first issue question 
    opened by marcomiglionico94 12
  • How to calculate the value of metric PE without semantic information?

    Hello, thanks for the impressive work of you and your team!

    The outputs of your network are planes, lines, and plane parameters, and no semantic information is included, so I am very confused about how the PE metric is computed. I look forward to your reply.

    Thanks again.

    question 
    opened by Hui-Yao 4
  • Something wrong with testing Structured3D dataset

    I downloaded the Structured3D dataset and the 2D line annotations, which contain 6280 labels, but scene_03499_142535_2 does not exist in my downloaded dataset. I also found some totally wrong labels like these (the second column is GT): scene_03253_533743_4 [image: 1710_select], scene_03440_527_3 [image: 2755_select], scene_03394_4372_0 [image: 1256_select]

    question 
    opened by zhangjingxian1998 2
  • Question regarding the network design choice

    Hi author,

    I have a question regarding the network design.

    In the planar detection part (Section 3.1), you state that "Each channel of the center likelihood map C represents different categories", so it looks like you solve planar detection and classification of wall/ceiling/floor together via the center likelihood map.

    This confuses me. The offset map is still H×W×2 instead of H×W×6, so it looks like wall, ceiling, and floor share the same offset. I do not quite understand this design, since the offsets for wall, ceiling, and floor would normally differ. Could you comment on this? And would it make more sense to decouple them into keypoint detection + classification instead?

    Thank you for considering my question.

    question 
    opened by Cli98 2
  • Future Improvements Ideas

    Hello guys, I really appreciate the work, and I was finally able to reconstruct and visualize my room from a 2D picture.

    [screenshot]

    I was wondering if you have any directions or ideas on how to combine predictions from different images of the same room to obtain a complete layout reconstruction. The idea would be to take 3 or 4 partially overlapping pictures that cover the whole room and then build a full 3D reconstruction, maybe by using the camera intrinsics and extrinsics and then joining the extracted planes. This is just an idea, but I wanted to know if you have thought about it and whether you think it is feasible. Thanks a lot.

    question 
    opened by marcomiglionico94 1
  • Depth Pixelwise

    I was reviewing the code and noticed that dt_params3d_pixelwise is not used. See line 326 in test.py:

        # post process on output feature map size, and extract planes, lines, plane params instance and plane params pixelwise
        dt_planes, dt_lines, dt_params3d_instance, dt_params3d_pixelwise = post_process(x, Mnms=1)

    The function ConvertLayout in reconstruction.py has a parameter called pixelwise, which is set to None. If I try to pass dt_params3d_pixelwise as the pixelwise parameter of ConvertLayout, I get an error.

        # opt results
        seg, depth, img, polys = ConvertLayout(
            inputs['img'][i], ups, downs, attribution,
            K=inputs['intri'][i].cpu().numpy(),
            pwalls=params_layout, pfloor=pfloor, pceiling=pceiling,
            ixy1map=inputs['ixy1map'][i].cpu().numpy(),
            valid=inputs['iseg'][i].cpu().numpy(),
            oxy1map=inputs['oxy1map'][i].cpu().numpy(),
            pixelwise=None
        )

    So my questions are:

    • What is dt_params3d_pixelwise used for?
    • What should be passed to ConvertLayout as the pixelwise parameter when it is not set to None?

    Thanks

    question 
    opened by marcomiglionico94 1
  • Structured 3D Dataset Corrupted

    I tried to download the Structured3D dataset, but when I try to extract the zip file it says it is corrupted. I tried several times, but the result is always the same. Has anyone had the same problem?

    good first issue 
    opened by marcomiglionico94 1
  • Pipeline for inference on custom data

    I would like to know the steps for running inference on custom data. More precisely, if we run the command given in the README, it will use the custom dataset. What needs to be modified in the dataset and in cfg.yaml so that we can run the model on custom data?

    opened by LucBourrat1 0
  • Inference on custom data

    I am running inference with the Structure3D_pretrained.pt model downloaded from this repo. The custom images come from the InteriorNet dataset, which was also introduced by KuJiaLe. The image size is (480, 640), and the camera intrinsic matrix is [[600, 0, 320], [0, 600, 240], [0, 0, 1]].

    Some of the results output by the model seem reasonable, but others are hard to accept. Should I change the intrinsic matrix, or how should I modify the hyperparameter settings? Any suggestion would be appreciated.

    In the following pictures, red is the GT edge and green is the model prediction: [images: comparison_1509_1602_3FO4II8OLSP6_Dining_room_6, comparison_1201_1301_3FO4IGB214OC_Dining_room_17, comparison_807_909_3FO4IDX9JIBX_Bathroom_2, comparison_910_1011_3FO4IEQVRLOT_Kids_room_5, comparison_2607_2702_3FO4ILIXGI9L_Dining_room_13]

    opened by Hui-Yao 2
  • Wrong Predictions using custom Images

    I tried to use the Structured3D pretrained model on some custom images taken with my phone and found online. I noticed that the predictions are not very accurate, and I wanted to know if there is any preprocessing step that needs to be applied to the images. Here are some results I obtained:

    [images: 0_select (2), 0_select (3)]

    Any help would be appreciated

    enhancement question 
    opened by marcomiglionico94 5
  • Having a hard time getting the right output as explained in the paper.

    Hello. I really enjoyed reading your paper and am so excited to test your code.

    Here are some outputs obtained with your pretrained models, both Structure3D and NYU303: [images: 0_select (1), 0_select (2), 0_select]

    I am having a hard time getting the right output as explained in the paper. It happens with both pretrained models.

    • Do the pretrained models place any restrictions on input images?
    • The two models output different results. Which model do you recommend?
    • Do you have any restrictions on the environment settings?

    Could you please share some more explanation of how to use the pretrained models you provided? It would also be great if you could share some sample images you used for inference.

    Thank you.

    question 
    opened by DHDanielSuh 5
  • Inclination is incorrect

    Thanks for the amazing work. I tried running test.py on the NYU dataset and blended the segmentation with the image to check the alignment. It seems to be off. Can you suggest how to fix it? Please find the example below.

    [images: blended segmentation and image 9; segmentation and image 9]

    question 
    opened by Mps24-7uk 9
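
On the curved depth map report above, one thing worth checking is how the depth values are interpreted during back-projection. For a pinhole camera, a plane n · X = d in the camera frame has z-depth z = d / (n · K⁻¹[u, v, 1]ᵀ) at pixel (u, v), and back-projecting with X = z K⁻¹[u, v, 1]ᵀ puts every point back on that plane. If the same values are instead treated as distances along the viewing ray, a flat wall bends toward the camera at the image borders. The sketch below illustrates this under stated assumptions: the plane convention n · X = d and all helper names are illustrative and may not match the parameterization used by this repository, and the intrinsics are the ones quoted in the custom-data report above.

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy):
    """Return K^-1 [u, v, 1]^T for a zero-skew pinhole camera."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def plane_z_depth(u, v, normal, offset, fx, fy, cx, cy):
    """z-depth at pixel (u, v) of the plane normal . X = offset.

    The plane convention is an assumption made for this sketch.
    """
    r = pixel_ray(u, v, fx, fy, cx, cy)
    return offset / sum(n * c for n, c in zip(normal, r))


def backproject_z(u, v, z, fx, fy, cx, cy):
    """Back-project a z-depth value: X = z * K^-1 [u, v, 1]^T."""
    r = pixel_ray(u, v, fx, fy, cx, cy)
    return tuple(z * c for c in r)


def backproject_ray_length(u, v, dist, fx, fy, cx, cy):
    """Back-project a value interpreted as distance along the viewing ray."""
    r = pixel_ray(u, v, fx, fy, cx, cy)
    norm = math.sqrt(sum(c * c for c in r))
    return tuple(dist * c / norm for c in r)


# A wall facing the camera, 3 units away: normal (0, 0, 1), offset 3.
fx = fy = 600.0
cx, cy = 320.0, 240.0
wall = ((0.0, 0.0, 1.0), 3.0)

for u, v in [(320, 240), (0, 0), (639, 479)]:
    z = plane_z_depth(u, v, *wall, fx, fy, cx, cy)
    flat = backproject_z(u, v, z, fx, fy, cx, cy)
    bent = backproject_ray_length(u, v, z, fx, fy, cx, cy)
    # flat[2] stays at 3.0 for every pixel; bent[2] shrinks away from the center.
    print((u, v), round(flat[2], 3), round(bent[2], 3))
```

If a visualizer shows curvature of this kind, comparing the two back-projection variants on a single known-flat surface is a quick way to tell which interpretation the depth map expects.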