Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

Last update: Dec 15, 2022

Overview

NonCuboidRoom

Paper

Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

Cheng Yang*, Jia Zheng*, Xili Dai, Rui Tang, Yi Ma, Xiaojun Yuan.

[Preprint] [Supplementary Material]

(*: Equal contribution)

Installation

The code is tested with Ubuntu 16.04, PyTorch v1.5, CUDA 10.1 and cuDNN v7.6.

# create conda env
conda create -n layout python=3.6
# activate conda env
conda activate layout
# install pytorch
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.1 -c pytorch
# install dependencies
pip install -r requirements.txt
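
After the steps above, it can help to confirm that the active environment matches the versions the code was tested with. The following is a minimal sketch, not part of the repository; the helper name `report_environment` is hypothetical, and it only reads version attributes that PyTorch exposes.

```python
import sys


def report_environment():
    """Collect the versions that the README says the code was tested with."""
    info = {"python": "%d.%d" % sys.version_info[:2]}
    try:
        import torch  # only importable once the conda install step has run
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None on CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    info["gpu_visible"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print("%-12s %s" % (key, value))
```

Run it inside the `layout` environment and compare the printed values with Python 3.6, PyTorch 1.5, CUDA 10.1 and cuDNN 7.6.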

Data Preparation

Structured3D Dataset

Please download the Structured3D dataset and our processed 2D line annotations. The directory structure should look like:

data
└── Structured3D
    ├── Structured3D
    │   ├── scene_00000
    │   ├── scene_00001
    │   ├── scene_00002
    │   └── ...
    └── line_annotations.json
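
A quick way to check the download against the tree above is to count the scene folders and the annotation entries. This is a sketch under assumptions: the helper name `summarize_structured3d` is hypothetical, and because the schema of `line_annotations.json` is not described here, the code assumes only that its top level is a list or a dict and reports its size.

```python
import json
import os


def summarize_structured3d(root="data/Structured3D"):
    """Return (number of scene folders, number of annotation entries or None)."""
    scene_dir = os.path.join(root, "Structured3D")
    scenes = [
        name for name in sorted(os.listdir(scene_dir))
        if name.startswith("scene_") and os.path.isdir(os.path.join(scene_dir, name))
    ]
    ann_path = os.path.join(root, "line_annotations.json")
    n_entries = None  # stays None when the annotation file is missing
    if os.path.isfile(ann_path):
        with open(ann_path) as f:
            annotations = json.load(f)
        # Schema not documented here, so only the top-level size is reported.
        n_entries = len(annotations)
    return len(scenes), n_entries


if __name__ == "__main__":
    print(summarize_structured3d())
```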

SUN RGB-D Dataset

Please download the SUN RGB-D dataset, our processed 2D line annotations for the SUN RGB-D dataset, and the layout annotations of the NYUv2 303 dataset. The directory structure should look like:

data
└── SUNRGBD
    ├── SUNRGBD
    │   ├── kv1
    │   ├── kv2
    │   ├── realsense
    │   └── xtion
    ├── sunrgbd_train.json      // our extracted 2D line annotations of SUN RGB-D train set
    ├── sunrgbd_test.json       // our extracted 2D line annotations of SUN RGB-D test set
    └── nyu303_layout_test.npz  // 2D ground truth layout annotations provided by NYUv2 303 dataset
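
Since this layout mixes folders and files from three separate downloads, a small script can list whatever is still missing before training starts. The sketch below is illustrative only; the name `missing_entries` is hypothetical, and the expected paths are copied from the tree above.

```python
import os

# Paths copied from the directory tree above, relative to the dataset root.
EXPECTED_SUNRGBD = [
    "SUNRGBD/kv1",
    "SUNRGBD/kv2",
    "SUNRGBD/realsense",
    "SUNRGBD/xtion",
    "sunrgbd_train.json",
    "sunrgbd_test.json",
    "nyu303_layout_test.npz",
]


def missing_entries(root="data/SUNRGBD", expected=EXPECTED_SUNRGBD):
    """Return the expected relative paths that do not exist under root."""
    return [
        rel for rel in expected
        if not os.path.exists(os.path.join(root, *rel.split("/")))
    ]


if __name__ == "__main__":
    for rel in missing_entries():
        print("missing:", rel)
```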

Pre-trained Models

You can download our pre-trained models here:

The model trained on Structured3D dataset.
The model trained on SUN RGB-D dataset and NYUv2 303 dataset.

Structured3D Dataset

To train the model on the Structured3D dataset, run this command:

python train.py --model_name s3d --data Structured3D

To evaluate the model on the Structured3D dataset, run this command:

python test.py --pretrained DIR --data Structured3D

NYUv2 303 Dataset

To train the model on the SUN RGB-D dataset and NYUv2 303 dataset, run this command:

# First fine-tune the model on the SUN RGB-D dataset
python train.py --model_name sunrgbd --data SUNRGBD --pretrained Structure3D_DIR --split all --lr_step []
# Then fine-tune the model on the NYUv2 subset
python train.py --model_name nyu --data SUNRGBD --pretrained SUNRGBD_DIR --split nyu --lr_step [] --epochs 10
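
The two stages above can also be driven from Python. The sketch below is an assumption-laden convenience wrapper, not part of the repository: `finetune_commands` and `run_all` are hypothetical names, and the two checkpoint arguments stand in for the `Structure3D_DIR` and `SUNRGBD_DIR` placeholders, whose actual locations are not stated here. Passing arguments as a list keeps `[]` literal, which matters because some shells treat square brackets as glob characters.

```python
import subprocess
import sys


def finetune_commands(s3d_ckpt, sunrgbd_ckpt):
    """Build the two fine-tuning commands from the README as argument lists."""
    stage1 = [sys.executable, "train.py", "--model_name", "sunrgbd",
              "--data", "SUNRGBD", "--pretrained", s3d_ckpt,
              "--split", "all", "--lr_step", "[]"]
    stage2 = [sys.executable, "train.py", "--model_name", "nyu",
              "--data", "SUNRGBD", "--pretrained", sunrgbd_ckpt,
              "--split", "nyu", "--lr_step", "[]", "--epochs", "10"]
    return [stage1, stage2]


def run_all(commands):
    """Run the stages in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A typical call would be `run_all(finetune_commands("Structure3D_DIR", "SUNRGBD_DIR"))`, with the placeholders replaced by real checkpoint paths.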

To evaluate the model on the NYUv2 303 dataset, run this command:

python test.py --pretrained DIR --data NYU303

Inference on the customized data

To predict the results of customized images, run this command:

python test.py --pretrained DIR --data CUSTOM

Citation

@article{NonCuboidRoom,
  title   = {Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image},
  author  = {Cheng Yang and
             Jia Zheng and
             Xili Dai and
             Rui Tang and
             Yi Ma and
             Xiaojun Yuan},
  journal = {CoRR},
  volume  = {abs/2104.07986},
  year    = {2021}
}

LICENSE

The code is released under the MIT license. Portions of the code are borrowed from HRNet-Object-Detection and CenterNet.

Acknowledgements

We would like to thank Lei Jin for providing us the code for parsing the layout annotations in SUN RGB-D dataset.

Comments

Curved Depth Map

Hi, I am trying to create a visualizer for the room layout. I am using a custom image for inference. The results look good when I check the segmentation overlaid on my image. The main issue is that when I plot the result with Open3D, the depth map looks curved. Do you have any explanation for this?

This is the overlaid image:

This is my Open3D visualization of the polygons:

The main issue is the curvature, which comes from the depth map extracted by the network. Any idea how to solve it? Thanks
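
One way to narrow down a problem like this is a sanity check on the geometry: depth derived from a plane and back-projected with the same pinhole intrinsics must land on that plane again. The sketch below is generic pinhole math, not the repository's code. It assumes the plane is written as normal . X = offset in the camera frame and that depth means distance along the optical axis; the repository's own parameterization and sign convention may differ. If points from a predicted depth map fail this check, the depth convention (z-depth versus ray length) and the intrinsics are two places to look.

```python
def pixel_ray(u, v, fx, fy, cx, cy):
    """Ray K^-1 [u, v, 1]^T for a pinhole camera, with unit z component."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def plane_depth(u, v, normal, offset, fx, fy, cx, cy):
    """z-depth at pixel (u, v) of the plane normal . X = offset (camera frame)."""
    ray = pixel_ray(u, v, fx, fy, cx, cy)
    denom = sum(n * r for n, r in zip(normal, ray))
    # denom is zero when the ray is parallel to the plane; not handled here.
    return offset / denom


def backproject(u, v, z, fx, fy, cx, cy):
    """3D point for pixel (u, v) when z is depth along the optical axis."""
    ray = pixel_ray(u, v, fx, fy, cx, cy)
    return tuple(z * r for r in ray)


def plane_residual(point, normal, offset):
    """Signed deviation of a 3D point from the plane; zero means it lies on it."""
    return sum(n * p for n, p in zip(normal, point)) - offset
```

Evaluating `plane_residual` over a grid of pixels gives a quick flatness measure without any plotting.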
good first issue question

opened by marcomiglionico94 12
How to calculate the value of metric PE without semantic information?

Hello, thanks to you and your team for the impressive work!

The outputs of your network are planes, lines, and plane parameters, and no semantic information is included in the output, so I am very confused about how the PE metric is computed. I look forward to your reply.

Thanks again.
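
For context, one common way to score a segmentation whose region ids carry no semantic meaning is to match predicted ids to ground-truth ids one-to-one and count the remaining mismatched pixels. Whether the authors' evaluation does exactly this is not stated in this thread, so the sketch below is an assumption, and `pixel_error` is a hypothetical helper. It takes two flat sequences of per-pixel labels and uses brute-force matching, which is only practical for the handful of planes in a room.

```python
from itertools import permutations


def pixel_error(pred, gt):
    """Fraction of mismatched pixels under the best one-to-one relabeling of pred."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    pred_ids = sorted(set(pred))
    gt_ids = sorted(set(gt))
    # Pad with None so every predicted id gets a target even when pred has more ids.
    k = max(len(pred_ids), len(gt_ids))
    targets = gt_ids + [None] * (k - len(gt_ids))
    best_correct = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(1 for p, g in zip(pred, gt) if mapping[p] == g)
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / float(len(gt))
```

For larger label sets, the same matching can be solved with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` on a label co-occurrence matrix.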
question

opened by Hui-Yao 4
Something wrong with testing Structured3D dataset

I downloaded the Structured3D dataset and the 2D line annotations, which contain 6280 labels, but scene_03499_142535_2 does not exist in my downloaded dataset. I also found some completely wrong labels, for example (the second column is GT): scene_03253_533743_4, scene_03440_527_3, scene_03394_4372_0.
question

opened by zhangjingxian1998 2
question regarding the network design choice

Hi author,

I have a question regarding the network design.

From the planar detection part (Section 3.1), you state that "Each channel of the center likelihood map C represents different categories", so it looks like you solve plane detection and wall/ceiling/floor classification together via the center likelihood map.

This confuses me. The offset output still has H×W×2 channels instead of H×W×6, so it looks like wall, ceiling, and floor share the same offset. I do not quite understand this design, since the offsets for wall, ceiling, and floor would normally not be the same. Could you comment on this? Would it make more sense to decouple them into keypoint detection plus classification instead?

Thank you for considering my question.
question

opened by Cli98 2
Future Improvements Ideas

Hello guys, I really appreciate the work, and I was finally able to reconstruct and visualize my room from a 2D picture.

I was wondering if you have any directions or ideas on how we could combine predictions from different images of the same room to obtain a complete layout reconstruction of the full room. The idea would be to take 3 or 4 partially overlapping pictures that cover the whole room and then build a full 3D reconstruction of the room, maybe by using the camera intrinsics and extrinsics and then joining the extracted planes. This is just an idea, but I wanted to know if you have thought about it and whether you think it is feasible. Thanks a lot.
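
The geometric core of that idea can be sketched as follows: planes predicted in each camera frame are moved into a shared world frame with the camera extrinsics, after which planes from different views can be compared and merged. This is a sketch of the standard plane transform, not something the repository provides. It assumes planes are written as normal . X = offset with unit normals oriented consistently, that the pose is camera-to-world (X_world = R X_cam + t), and the helper names and tolerances are hypothetical.

```python
def plane_to_world(normal_c, offset_c, R, t):
    """Map the plane normal_c . X = offset_c from camera to world coordinates.

    R is a 3x3 rotation as nested lists and t a translation, with
    X_world = R @ X_cam + t.
    """
    normal_w = tuple(sum(R[i][j] * normal_c[j] for j in range(3)) for i in range(3))
    offset_w = offset_c + sum(n * ti for n, ti in zip(normal_w, t))
    return normal_w, offset_w


def same_plane(p, q, angle_tol=0.05, dist_tol=0.05):
    """Rough test that two world-frame planes (unit normal, offset) coincide.

    The tolerances are arbitrary illustration values, not tuned settings.
    """
    (n1, d1), (n2, d2) = p, q
    cos = sum(a * b for a, b in zip(n1, n2))
    return cos > 1.0 - angle_tol and abs(d1 - d2) < dist_tol
```

Planes that pass `same_plane` across views could then be averaged or refit, while the remaining ones extend the layout.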
question

opened by marcomiglionico94 1
Depth Pixelwise
I was reviewing the code and noticed that dt_params3d_pixelwise is not used. See line 326 in test.py:

# post process on output feature map size, and extract planes, lines, plane params instance and plane params pixelwise
dt_planes, dt_lines, dt_params3d_instance, dt_params3d_pixelwise = post_process(x, Mnms=1)

In the function ConvertLayout in reconstruction.py there is a parameter called pixelwise, which is set to None. If I pass dt_params3d_pixelwise as the pixelwise argument of ConvertLayout, I get an error.

# opt results
seg, depth, img, polys = ConvertLayout(
    inputs['img'][i], ups, downs, attribution,
    K=inputs['intri'][i].cpu().numpy(),
    pwalls=params_layout, pfloor=pfloor, pceiling=pceiling,
    ixy1map=inputs['ixy1map'][i].cpu().numpy(),
    valid=inputs['iseg'][i].cpu().numpy(),
    oxy1map=inputs['oxy1map'][i].cpu().numpy(),
    pixelwise=None
)

So my questions are:

What is dt_params3d_pixelwise used for?

What should be passed to ConvertLayout as the pixelwise parameter when it is not set to None?

Thanks
question
opened by marcomiglionico94 1
Structured 3D Dataset Corrupted

I tried to download the Structured3D dataset, but when I try to extract the zip file it says it is corrupted. I tried several times, but the result is always the same. Has anyone had the same problem?
good first issue

opened by marcomiglionico94 1
Pipeline for inference on custom data

I would like to know the steps for running inference on custom data. More precisely, if we run the command given in the README, it will use the custom dataset. What needs to be modified in the dataset and in cfg.yaml so that we can run the model on custom data?

opened by LucBourrat1 0
Inference on custom data

I am running inference with the Structure3D_pretrained.pt model downloaded from this repo. The custom image comes from the InteriorNet dataset, which was also introduced by KuJiaLe. The image size is (480, 640), and the camera intrinsic matrix is [[600, 0, 320], [0, 600, 240], [0, 0, 1]].

Some of the results output by the model seem reasonable, but others are hard to accept. Should I change the intrinsic matrix, or how should I modify the hyperparameter settings? Any suggestions would be appreciated.
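
One thing worth checking in a setup like this is whether the intrinsics still match the image after any resizing. Whether test.py resizes custom images, and to what size, is not stated in this thread, so the following is only a sketch of the general rule: when an image is resized, the focal lengths and principal point scale with it. The helper name `scale_intrinsics` and the target size in the usage note are hypothetical.

```python
def scale_intrinsics(K, src_size, dst_size):
    """Rescale a 3x3 pinhole matrix when an image is resized.

    src_size and dst_size are (height, width). The half-pixel shift that
    some resize conventions introduce is ignored.
    """
    sy = dst_size[0] / float(src_size[0])
    sx = dst_size[1] / float(src_size[1])
    return [
        [K[0][0] * sx, K[0][1] * sx, K[0][2] * sx],  # x row scales with width
        [K[1][0] * sy, K[1][1] * sy, K[1][2] * sy],  # y row scales with height
        [0.0, 0.0, 1.0],
    ]
```

For example, resizing a (480, 640) image with fx = fy = 600 and principal point (320, 240) to (720, 1280) gives fx = 1200, fy = 900 and principal point (640, 360).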

In the following pictures, red is the GT edge and green is predicted by the model:

opened by Hui-Yao 2
Wrong Predictions using custom Images

I tried to use the Structured3D pretrained model on some custom images taken with my phone and found online. I noticed that the predictions are not very accurate, and I wanted to know whether any preprocessing step needs to be applied to the images. Here are some results I obtained:

Any help would be appreciated
enhancement question

opened by marcomiglionico94 5
Having a hard time getting the right output as explained in the paper.
Hello. I really enjoyed reading your paper and am so excited to test your code.

Here are some outputs that used your pretrained model, both Structure3D and NYU303.

I am having a hard time getting the right output as explained in the paper. It was happening for both pretrained models.

Do you have any restrictions on input images by pretrained models?

The two models output different results. Which model do you recommend?

Do you have any restrictions on the environment settings?

Could you please share some more explanations on how to use the pretrained model you provided? And it would be great if you can share some sample images you used for the inference.

Thank you.
question
opened by DHDanielSuh 5
Inclination is incorrect

Thanks for the amazing work. I tried running test.py on the NYU dataset and blended the segmentation with the image to check the alignment. It seems to be off. Can you suggest how to fix it? Please find the example below.

Blended segmentation and image

segmentation and image
question

opened by Mps24-7uk 9

