Code for "Neural 3D Scene Reconstruction with the Manhattan-world Assumption" CVPR 2022 Oral

Overview

News

  • 05/10/2022 To make the comparison on ScanNet easier, we provide all quantitative and qualitative results of the baselines here, including COLMAP, COLMAP*, ACMP, NeRF, UNISURF, NeuS, and VolSDF.
  • 05/10/2022 To make it easier for follow-up works to compare with our model, we provide our quantitative and qualitative results, as well as the trained models on ScanNet, here.
  • 05/10/2022 We have uploaded our processed ScanNet scene data to OneDrive.

Neural 3D Scene Reconstruction with the Manhattan-world Assumption

Project Page | Video | Paper


[introduction figure]

Neural 3D Scene Reconstruction with the Manhattan-world Assumption
Haoyu Guo*, Sida Peng*, Haotong Lin, Qianqian Wang, Guofeng Zhang, Hujun Bao, Xiaowei Zhou
CVPR 2022 (Oral Presentation)


Setup

Installation

conda env create -f environment.yml
conda activate manhattan

Data preparation

Download the ScanNet scene data evaluated in the paper from OneDrive / Google Drive / BaiduNetDisk (password: ap9k) and extract it into data/. Make sure that the path is consistent with the config file.

Instructions for running on custom data are coming soon!
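
For reference, a rough (and possibly incomplete) sketch of the expected per-scene layout, inferred from the preprocessing script quoted in the comments below rather than from an authoritative specification; treat the folder names as assumptions and defer to the config files:

data/
  <scene_id>/
    intrinsic.txt        # camera intrinsics
    images/              # per-frame RGB images, <frame_id>.png
    pose/                # per-frame camera poses, <frame_id>.txt
    depth_patchmatch/    # COLMAP patch-match depth maps, <frame_id>.npy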

Usage

Training

python train_net.py --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050

Mesh extraction

python run.py --type mesh_extract --output_mesh result.obj --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050
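
The command above writes the reconstructed surface to result.obj. As a quick sanity check (a hedged sketch; trimesh is not a dependency of this repository and is assumed to be installed separately), the mesh can be loaded and inspected with:

import trimesh

mesh = trimesh.load('result.obj', force='mesh')  # mesh written by the command above
print(mesh.vertices.shape, mesh.faces.shape)     # vertex and face counts
print('watertight:', mesh.is_watertight)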

Evaluation

python run.py --type evaluate --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{guo2022manhattan,
  title={Neural 3D Scene Reconstruction with the Manhattan-world Assumption},
  author={Guo, Haoyu and Peng, Sida and Lin, Haotong and Wang, Qianqian and Zhang, Guofeng and Bao, Hujun and Zhou, Xiaowei},
  booktitle={CVPR},
  year={2022}
}

Acknowledgement

  • Thanks to Lior Yariv for her excellent work VolSDF.
  • Thanks to Jianfei Guo for his implementation of VolSDF, neurecon.
  • Thanks to Johannes Schönberger for his excellent work COLMAP.
  • Thanks to Shaohui Liu for his customized implementation of COLMAP as a submodule of NerfingMVS.
Comments
  • Why does a bug occur when using COLMAP to reconstruct depth?

    Why does a bug occur when using COLMAP to reconstruct depth?

    Because I use COLMAP to reconstruct the camera poses, I thought I could directly reuse the COLMAP model from that pose reconstruction, so I modified the code in run.py. But when I do this, the code that reads information from the fuse.ply.vis file raises an error. What is the reason for this? Or, what is the purpose of the code that reads from fuse.ply.vis? Could you add comments to it? Thanks! Here is the main part of my modified run.py:

    # originally written as a loop: for scene_id in ['214']:
    scene_id = '214'
    source = f'/data8T/ydf/manhattan_sdf/data/tmp/{scene_id}/'  # TODO: modify this to your path
    target = f'/data8T/ydf/manhattan_sdf/data/{scene_id}'

    os.makedirs(f'{target}/images', exist_ok=True)
    os.makedirs(f'{target}/pose', exist_ok=True)
    os.makedirs(f'{target}/depth_patchmatch', exist_ok=True)
    
    if not os.path.exists(f'{source}/images/10.jpg'):
        sortFile(source)
    
    colmap_path = "colmap"
    
    with open(f'{source}/colmap_output.txt', 'a') as f:
        feature_extractor_args = [
            colmap_path, 'feature_extractor',
            '--database_path', os.path.join(source, "database.db"),
            '--image_path', os.path.join(source, "images"),
            '--ImageReader.single_camera', '1',
            # '--ImageReader.mask_path', os.path.join(basedir, args.masks),
        ]
        feat_output = (subprocess.check_output(feature_extractor_args, universal_newlines=True))
        f.writelines(feat_output)
        print('Features extracted')
    
        exhaustive_matcher_args = [
            colmap_path, 'exhaustive_matcher',
            '--database_path', os.path.join(source, "database.db"),
        ]
        exh_output = (subprocess.check_output(exhaustive_matcher_args, universal_newlines=True))
        f.writelines(exh_output)
        print('Exhaustive matched')
    
        p = os.path.join(source, 'sparse')
        if not os.path.exists(p):
            os.makedirs(p)
    
        mapper_args = [
            colmap_path, 'hierarchical_mapper',
            '--database_path', os.path.join(source, "database.db"),
            '--image_path', os.path.join(source, "images"),
            # '--ImageReader.mask_path', os.path.join(basedir, args.masks),
            '--output_path', os.path.join(source, 'sparse'),
        ]
    
        map_output = (subprocess.check_output(mapper_args, universal_newlines=True))
        f.writelines(map_output)
        print('Sparse map created')
    
        os.makedirs(f'{source}/dense/', exist_ok=True)
    
        dense_args = [
            colmap_path, 'image_undistorter',
            '--image_path', os.path.join(source, "images"),
            # '--ImageReader.mask_path', os.path.join(basedir, args.masks),
            '--input_path', os.path.join(source, 'sparse', '0'),
            '--output_path', os.path.join(source, 'dense'),
            '--output_type', 'COLMAP',
        ]
    
        dense_output = (subprocess.check_output(dense_args, universal_newlines=True))
        f.writelines(dense_output)
        print('Dense map created')
    
        patch_args = [
            colmap_path, 'patch_match_stereo',
            '--workspace_path', os.path.join(source, 'dense'),
            '--workspace_format', 'COLMAP',
            '--PatchMatchStereo.cache_size', '64',
        ]
    
        patch_output = (subprocess.check_output(patch_args, universal_newlines=True))
        f.writelines(patch_output)
        print('Patch match stereo succeed')
    
        stereo_args = [
            colmap_path, 'stereo_fusion',
            '--workspace_path', os.path.join(source, 'dense'),
            '--workspace_format', 'COLMAP',
            '--input_type', 'geometric',
            '--output_path', os.path.join(source, 'dense', 'fused.ply'),
            '--StereoFusion.cache_size', '64',
        ]
    
        stereo_output = (subprocess.check_output(stereo_args, universal_newlines=True))
        f.writelines(stereo_output)
        print('Stereo fusion succeed')
    
    load_save_pose(source)
    npy2one(source)
    
    id_list = os.listdir(f'{source}/images')
    id_list = [id[:-4] for id in id_list if id.endswith('0.jpg')]
    id_list.sort(key=lambda _: int(_))
    
    pose_dict = dict()
    for id in id_list:
        pose_dict[id] = np.loadtxt(source + f'pose/{id}.txt')
    
    id_list = [id for id in id_list if not np.isinf(pose_dict[id]).any()]
    id_list.sort()
    
    translation_list = []
    for id in id_list:
        translation_list.append(pose_dict[id][None, :3, 3])
    translation_list = np.concatenate(translation_list)
    translation_center = (translation_list.max(axis=0) + translation_list.min(axis=0)) / 2
    translation_list -= translation_center
    max_cam_norm = np.linalg.norm(translation_list, axis=1).max()
    scale = (scale_radius / max_cam_norm / 1.1)
    
    for id in id_list:
        pose_dict[id][:3, 3] -= translation_center
        pose_dict[id][:3, 3] *= scale
    
    with open(f'{source}/offset.txt', 'w') as f:
        f.write(f'{translation_center}')
    
    with open(f'{source}/scale.txt', 'w') as f:
        f.write(f'{scale}')
    
    os.system(f'cp {source}/intrinsic.txt {target}/intrinsic.txt')
    
    for id in tqdm(id_list):
        color = cv2.imread(f'{source}/images/{id}.jpg')
        color = cv2.resize(color, (width, height))
        cv2.imwrite(f'{target}/images/{id}.png', color)
        np.savetxt(f'{target}/pose/{id}.txt', pose_dict[id])
    
    intrinsic = np.loadtxt(f'{target}/intrinsic.txt')
    
    images_bin_path = f'{source}/sparse/0/images.bin'
    images = read_images_binary(images_bin_path)
    names = [_[1].name for _ in images.items()]
    
    shape = (height, width)
    
    ply_vis_path = f'{source}/dense/fused.ply.vis'
    assert os.path.exists(ply_vis_path)
    masks = [np.zeros(shape, dtype=np.uint8) for name in names]
    load_point_vis(ply_vis_path, masks)
    
    for name, mask in tqdm(zip(names, masks)):
        depth_bin_path = f'{source}/dense/stereo/depth_maps/{name}.geometric.bin'
        if not os.path.exists(depth_bin_path):
            continue
        depth_fname = depth_bin_path
        depth = read_array(depth_fname)
        depth[mask == 0] = 0
        np.save(f'{target}/depth_patchmatch/{name[:-4]}.npy', depth)
    

    Here is the error message:

    Traceback (most recent call last):
      File "/home/ydf/manhattan_sdf/docs/run_colmap/run.py", line 179, in <module>
        load_point_vis(ply_vis_path, masks)
      File "/home/ydf/manhattan_sdf/docs/run_colmap/run.py", line 33, in load_point_vis
        idx, u, v = struct.unpack('<III', f.read(4 * 3))
    struct.error: unpack requires a buffer of 12 bytes

    The reason is that, when the file is read the way your code expects, the file size does not match, so there is not enough data left to read.
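
    A likely cause: stock COLMAP's stereo_fusion writes fused.ply.vis as a uint64 point count followed, for each point, by a uint32 count of visible images and that many uint32 image indices, without per-observation pixel coordinates. The load_point_vis used here unpacks (idx, u, v) triplets, which presumably matches the .vis files produced by the customized COLMAP mentioned in the Acknowledgements, so it runs out of bytes on a file written by stock COLMAP. A minimal sketch of a parser for the stock format (not the format this repository's script expects) is:

    import struct

    def load_point_vis_stock(path):
        # fused.ply.vis from stock COLMAP: uint64 num_points, then for each point a
        # uint32 count of visible images followed by that many uint32 image indices.
        visibility = []
        with open(path, 'rb') as f:
            num_points = struct.unpack('<Q', f.read(8))[0]
            for _ in range(num_points):
                num_visible = struct.unpack('<I', f.read(4))[0]
                image_idxs = struct.unpack(f'<{num_visible}I', f.read(4 * num_visible))
                visibility.append(image_idxs)
        return visibility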

    opened by Cat-Lover-Yes 8
  • Run on my own data

    Run on my own data

    Could you please tell me the detailed steps for running the code on our own data? I am confused about how to prepare the data, such as the intrinsic.txt. I also don't know why the error below happens [image]: my pictures are all 756×504, but I still get this error and I don't know how to fix it. I hope to receive your reply, thanks!

    opened by DanielLiking 6
  • One issue regarding image rendering with pre-trained model

    One issue regarding image rendering with pre-trained model

    Hi @ghy0324,

    Thanks for sharing your pre-trained model.

    I tested one model (0050_00.pth) to render images at training camera poses.

    Please check the sample image below. The left figure is the GT image and the right figure is the rendered image.

    [Screenshot from 2022-06-13 19-35-58]

    The rendered image is still very blurry using your pre-trained model. I only changed N_rays = 512 due to memory issues on my machine. Is it by design to trade image loss (L_img in Eq. 5) for better geometry?

    Any idea why this happens?

    Thanks.

    opened by jinmaodaomaye2021 5
  • GPU-memory?

    GPU-memory?

    I use a TITAN RTX (24 GB) for training, but CUDA out of memory still occurs at around step 2454:

    eta: 0:05:38 epoch: 2 step: 2454 rgb_loss: 0.1126 psnr: 15.9341 depth_loss: 0.0696 joint_loss: 0.1813 cross_entropy_loss: 0.7693 eikonal_loss: 0.0659 loss: 0.5825 beta: 0.0238 theta: -0.0845 rgb_weight: 1.0000 depth_weight: 1.0000 depth_loss_clamp_weight: 0.5000 joint_weight: 0.0500 ce_weight: 0.5000 eikonal_weight: 0.1000 data: 0.0226 batch: 0.6272 lr: 0.000456 max_mem: 22170

    What should I do? Thanks!

    opened by mtk380 5
  • Different offset & scale values when performing the camera normalization by myself

    Different offset & scale values when performing the camera normalization by myself

    Hi, thanks for releasing the code!

    I attempted to do the camera normalization step according to this instruction, but I cannot obtain the same offset and scale values that you provide. For instance, the provided offset and scale of scene0050_00 are [4.24910, 2.30138, 1.15986] and [0.40543], while the parameters I got are [4.2678, 2.2656, 1.2732] and [0.4167]. I directly borrowed the code from VolSDF with some modifications to fit the format of the ScanNet annotations.

    They are slightly different; could you explain where the difference comes from, and will it affect the final performance much?

    Thanks a lot.
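
    For reference, the preprocessing script quoted in the first comment above derives both values from the camera centers, roughly as sketched below (scale_radius, pose_dict and id_list are defined in that script); small differences in which frames are kept, e.g. frames with invalid poses being dropped, will shift both the offset and the scale:

    import numpy as np

    centers = np.stack([pose_dict[i][:3, 3] for i in id_list])  # camera centers of the kept frames
    offset = (centers.max(axis=0) + centers.min(axis=0)) / 2    # center of their bounding box
    max_norm = np.linalg.norm(centers - offset, axis=1).max()
    scale = scale_radius / max_norm / 1.1                       # keep cameras inside the radius with ~10% margin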

    opened by OasisYang 4
  • Problem on data preparation

    Problem on data preparation

    Hi, thanks for your wonderful work! I would like to train on newly recorded sequences, so I wonder when the 'Data preparation' instructions will be published? Looking forward to your early reply.

    opened by rainfall1998 4
  • About the depth loss L_d

    About the depth loss L_d

    Hi, thanks for your wonderful work. I am not familiar with this area, so I have a question about the calculation of the depth loss L_d (Eq. 7). Why are only the camera rays going through image pixels that have depth values estimated by COLMAP used for calculating the depth loss? Why not use the depth of all pixels?

    opened by botaoye 4
  • Questions for preprocessing custom data

    Questions for preprocessing custom data

    Thank you for your nice work!

    I want to apply this work to my own data.

    I have some images with their own camera poses, and I can also get new poses from COLMAP.

    However, regardless of the source of the poses, their axes are not aligned with the dominant axes (x, y, and z).

    As far as I understand from your paper, the data must be axis-aligned according to the Manhattan-world assumption.

    Is it okay to just use the COLMAP procedure you provide in order to apply your code to my data?

    Thank you!

    opened by ghvision 2
  • Question about the difference between the pose of ScanNet and the processed data.

    Question about the difference between the pose of ScanNet and the processed data.

    Thank you for your excellent work! I am now trying to train a new ScanNet scene with your model. I have found that there are some differences between the poses from ScanNet and the poses in your processed data (in the translation part). Could you share your pose processing method with me? Thanks~

    opened by LinZhuoChen 2
  • Question about the depth loss

    Question about the depth loss

    Hi, thanks for your beautiful work. I recently read your paper; it is amazing and enlightening. While reading your source code, I became confused about depth_loss_clamp:

        if 'depth_loss_clamp' in loss_weights:
            depth_loss = depth_loss.clamp(max=loss_weights['depth_loss_clamp'])

    Why should the depth loss be clamped?

    opened by Liuyveg 2
  • Problem on semantic segmentation evaluation

    Problem on semantic segmentation evaluation

    Hi, I have evaluated the predicted semantics of scene0050_00 of ScanNet. I think this sequence has a worse segmentation result than the others, judging from the semantic images. However, the IoU_f, IoU_w, IoU_m I obtained are 0.73332, 0.76606, 0.78970, which is much higher than Table 3 in the paper (about 0.62, 0.52, 0.57).

    The GT I used is based on the ScanNet semantic labels, where floor contains labels 1, 161, 52 and wall contains labels 3, 140. The evaluation code is based on semantic_nerf's calculate_segmentation_metrics, with ignore_label=-1.

    Is there anything wrong? If possible, could I get your calculation formula and evaluation code for the semantic evaluation?

    Thanks for your work and looking forward to your reply!
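
    For reference, a minimal sketch of a per-class IoU computation under the label mapping described above (everything that is neither floor nor wall is treated as "other"; the helper names are illustrative and this is not the authors' evaluation code):

    import numpy as np

    FLOOR_IDS = [1, 161, 52]   # ScanNet label ids mapped to "floor", as quoted above
    WALL_IDS = [3, 140]        # ScanNet label ids mapped to "wall"

    def to_three_classes(labels):
        out = np.zeros_like(labels)          # 0 = other
        out[np.isin(labels, FLOOR_IDS)] = 1  # 1 = floor
        out[np.isin(labels, WALL_IDS)] = 2   # 2 = wall
        return out

    def class_iou(pred3, gt3, cls, valid):
        p = (pred3 == cls) & valid
        g = (gt3 == cls) & valid
        union = np.logical_or(p, g).sum()
        return np.logical_and(p, g).sum() / union if union > 0 else float('nan')

    # usage: valid = gt_raw != -1; gt3 = to_three_classes(gt_raw); pred3 = to_three_classes(pred_raw)
    # iou_floor = class_iou(pred3, gt3, 1, valid); iou_wall = class_iou(pred3, gt3, 2, valid)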

    opened by rainfall1998 2
  • Training and running on custom dataset

    Training and running on custom dataset

    Thank you for this amazing work.

    I wanted to run this on a custom dataset. I prepared everything as recommended here. However, I am stuck at the part where we need a ground-truth mesh for training. How do I get that?

    opened by bkhanal-11 2
  • Facing issue while training.

    Facing issue while training.

    Hi, thank you for sharing your wonderful work.

    I've followed the steps given in the setup. When I proceed to the training step and run python train_net.py --cfg_file configs/scannet/0050.yaml gpus 0, exp_name scannet_0050, I get RuntimeError: Function 'MmBackward' returned nan values in its 1th output.

    How should I solve this error?

    opened by richas46 3