DeepPanoContext (DPC)

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

[Project Page (with interactive results)] [Paper]
Cheng Zhang, Zhaopeng Cui, Cai Chen, Shuaicheng Liu, Bing Zeng, Hujun Bao, Yinda Zhang
Introduction
This repo contains the data generation, data preprocessing, training, testing, evaluation, and visualization code of our ICCV 2021 paper.
Install
Install the necessary tools and create the conda environment (install Anaconda first if it is not already available):
```bash
sudo apt install xvfb ninja-build freeglut3-dev libglew-dev meshlab
conda env create -f environment.yaml
conda activate Pano3D
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
python project.py build
```
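
The detectron2 wheel above targets CUDA 10.1 and PyTorch 1.7, so it can be worth confirming that your environment provides matching versions before building. A minimal version-printing sketch (not part of the original instructions):

```bash
# Print the PyTorch version, the CUDA version it was built against,
# and whether a GPU is visible; then confirm detectron2 imports.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import detectron2; print(detectron2.__version__)"
```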
- When running `python project.py build`, the script will run `external/build_gaps.sh`, which asks for your password because it needs sudo privilege for `apt-get install`. Please make sure you are running with a user with sudo privilege. If not, please ask your administrator to install these libraries, comment out the corresponding lines, and then run `python project.py build`. A sketch for checking which system packages are already present is given after this list.
- If you encounter a `/usr/bin/ld: cannot find -lGL` problem when building GAPS, please follow this issue.
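
If you cannot run the sudo steps yourself, the following sketch checks whether the system packages from the install command above are already present. Note that `external/build_gaps.sh` may install additional packages that are not listed here; this only covers the ones named in this README.

```bash
# Check which of the system packages listed above are already installed
for pkg in xvfb ninja-build freeglut3-dev libglew-dev meshlab; do
    if dpkg -s "$pkg" >/dev/null 2>&1; then
        echo "OK       $pkg"
    else
        echo "MISSING  $pkg"
    fi
done
```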
Since the dataloader loads a large number of variables, before training, please follow this to raise the open file descriptor limits of your system. For example, to permanently change the setting, edit `/etc/security/limits.conf` with a text editor and add the following lines (a quick way to check and temporarily raise the limit is sketched after the example):

```
* hard nofile 500000
* soft nofile 500000
root hard nofile 500000
root soft nofile 500000
```
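
As a quick sanity check before editing system files, you can inspect the current limit and raise the soft limit for the current shell only; this is a standard shell sketch, not something required by the original instructions:

```bash
# Show the current soft limit on open file descriptors
ulimit -n

# Raise the soft limit for this shell session only
# (may fail if the hard limit is lower; editing /etc/security/limits.conf
# as above is still needed for a permanent change)
ulimit -n 500000
```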
Demo
Download the pretrained checkpoints of the detector, the layout estimation network, and the other modules, then unzip the folder `out` into the root directory of this project. Since the given checkpoints are trained with the current, refactored version of our code, the results are slightly better than those reported in our paper.
Please run the following command to predict on the given example in `demo/input` with our full model:

```bash
CUDA_VISIBLE_DEVICES=0 WANDB_MODE=dryrun python main.py configs/pano3d_igibson.yaml --model.scene_gcn.relation_adjust True --mode test
```

Or run without relation optimization:

```bash
CUDA_VISIBLE_DEVICES=0 WANDB_MODE=dryrun python main.py configs/pano3d_igibson.yaml --mode test
```

The results will be saved to `out/pano3d/<demo_id>`. If nothing goes wrong, you should get the following results:
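
The `<demo_id>` folder name is generated at run time, so if you are unsure which directory belongs to the run you just finished, a simple shell sketch (not part of the original instructions) is to list the most recently modified run directory:

```bash
# Show the most recently modified run directory under out/pano3d,
# i.e. the <demo_id> folder created by the demo command above
ls -dt out/pano3d/*/ | head -n 1
```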
Data preparation
Our data is rendered with iGibson. Here, we follow their Installation guide to download the iGibson dataset, then render and preprocess the data with our code.

- Download the iGibson dataset with:

  ```bash
  python -m gibson2.utils.assets_utils --download_ig_dataset
  ```

- Render panoramas with:

  ```bash
  python -m utils.render_igibson_scenes --renders 10 --random_yaw --random_obj --horizon_lo --world_lo
  ```

  The rendered dataset should be in `data/igibson/`.

- Make models watertight and render/crop single object images:

  ```bash
  python -m utils.preprocess_igibson_obj --skip_mgn
  ```

  The processed results should be in `data/igibson_obj/`.

- (Optional) Before proceeding to the training steps, you can visualize the dataset ground truth of `data/igibson/` with:

  ```bash
  python -m utils.visualize_igibson
  ```

  Results (`visual.png` and `render.png`) should be saved to the folder of each camera, like `data/igibson/Pomaria_0_int/00007`. A quick sanity check of the rendered data layout is sketched after this list.
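
If you want to verify how much data was actually rendered, the following sketch counts the per-camera folders in each scene; it only relies on the `data/igibson/<scene>/<camera>` layout mentioned above.

```bash
# Count camera folders per scene under data/igibson/
for scene in data/igibson/*/; do
    count=$(find "$scene" -mindepth 1 -maxdepth 1 -type d | wc -l)
    echo "$(basename "$scene"): $count cameras"
done
```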
Training and Testing
Preparation
- We use the pretrained weights of Implicit3DUnderstanding for fine-tuning the Bdb3d Estimation Network (BEN) and LIEN+LDIF. Please download the pretrained checkpoint and unzip it into `out/total3d/20110611514267/`.

- We use wandb for logging and visualizing experiments. You can follow their quickstart guide to sign up for a free account and log in on your machine with `wandb login`. The training and testing results will be uploaded to your project "deeppanocontext".

- Hint: the `<XXX_id>` in the commands below needs to be replaced with the `XXX_id` trained in the previous steps.

- Hint: in the steps below, when training or testing with `main.py`, you can override yaml configurations with command line parameters:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python main.py configs/layout_estimation_igibson.yaml --train.epochs 100
  ```

  This might be helpful when debugging or tuning hyper-parameters. A combined example with the environment variables used throughout this README is sketched after this list.
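
For instance, combining the override mechanism with the `CUDA_VISIBLE_DEVICES` and `WANDB_MODE=dryrun` environment variables already used in the demo commands gives a convenient pattern for quick, offline debugging runs; the override value below is an arbitrary example, not a recommended setting:

```bash
# Debug run on GPU 0 without uploading logs to wandb,
# overriding the number of training epochs from the command line
CUDA_VISIBLE_DEVICES=0 WANDB_MODE=dryrun python main.py configs/layout_estimation_igibson.yaml --train.epochs 100
```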
First Stage
2D Detector
- Train the 2D detector (Mask RCNN) with:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python train_detector.py
  ```

  The trained weights will be saved to `out/detector/detector_mask_rcnn`.

- (Optional) When training the 2D detector, you can visualize the training process with:

  ```bash
  tensorboard --logdir out/detector/detector_mask_rcnn --bind_all --port 6006
  ```

- (Optional) Evaluate with:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python test_detector.py
  ```

  The results will be saved to `out/detector/detector_mask_rcnn/evaluation_{train/test}`. Alternatively, you can visualize the prediction results on the test set with:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python test_detector.py --visualize --split test
  ```

  The visualization will be saved to the folder that contains the model weights file.

- (Optional) Visualize BFoV detection results:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python main.py configs/detector_2d_igibson.yaml --mode qtest --log.vis_step 1
  ```

  The visualization will be saved to `out/detector/<detector_test_id>`.
Layout Estimation
Train the layout estimation network (HorizonNet) with:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py configs/layout_estimation_igibson.yaml
```

The checkpoint and visualization results will be saved to `out/layout_estimation/<layout_estimation_id>/` (with the best checkpoint at `model_best.pth`).
Save First Stage Outputs
- Save the predictions of the 2D detector and LEN as a dataset for stage 2 training:

  ```bash
  CUDA_VISIBLE_DEVICES=0 WANDB_MODE=dryrun python main.py configs/first_stage_igibson.yaml --mode qtest --weight out/layout_estimation/<layout_estimation_id>/model_best.pth
  ```

  The first stage outputs should be saved to `data/igibson_stage1` (a quick check is sketched after this list).

- (Optional) Visualize the stage 1 dataset with:

  ```bash
  python -m utils.visualize_igibson --dataset data/igibson_stage1 --skip_render
  ```
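
Before starting the second stage, it can be worth confirming that the stage 1 dataset was actually written; a minimal shell sketch that only assumes the output path named above:

```bash
# Confirm the stage 1 outputs exist and get a rough idea of their size
du -sh data/igibson_stage1
find data/igibson_stage1 -mindepth 1 -maxdepth 1 | wc -l
```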
Second Stage
Object Reconstruction
Train the object reconstruction network (LIEN+LDIF) with:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py configs/ldif_igibson.yaml
```

The checkpoint and visualization results will be saved to `out/ldif/<ldif_id>`.
Bdb3D Estimation
Train the bdb3d estimation network (BEN) with:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py configs/bdb3d_estimation_igibson.yaml
```

The checkpoint and visualization results will be saved to `out/bdb3d_estimation/<bdb3d_estimation_id>`.
Relation SGCN
- Train Relation SGCN without the relation branch:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --model.scene_gcn.output_relation False --model.scene_gcn.loss BaseLoss --weight out/bdb3d_estimation/<bdb3d_estimation_id>/model_best.pth out/ldif/<ldif_id>/model_best.pth
  ```

  The checkpoint and visualization results will be saved to `out/relation_scene_gcn/<relation_sgcn_wo_rel_id>`.

- Train Relation SGCN with the relation branch:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --weight out/relation_scene_gcn/<relation_sgcn_wo_rel_id>/model_best.pth --train.epochs 20
  ```

  The checkpoint and visualization results will be saved to `out/relation_scene_gcn/<relation_sgcn_id>`.

- Fine-tune Relation SGCN end-to-end with relation optimization:

  ```bash
  CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --weight out/relation_scene_gcn/<relation_sgcn_id>/model_best.pth --model.scene_gcn.relation_adjust True --train.batch_size 1 --val.batch_size 1 --device.num_workers 2 --train.freeze shape_encoder shape_decoder --model.scene_gcn.loss_weights.bdb3d_proj 1.0 --model.scene_gcn.optimize_steps 20 --train.epochs 10
  ```

  The checkpoint and visualization results will be saved to `out/relation_scene_gcn/<relation_sgcn_ro_id>`.
Test Full Model
Run:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py configs/relation_scene_gcn_igibson.yaml --weight out/relation_scene_gcn/<relation_sgcn_ro_id>/model_best.pth --log.path out/relation_scene_gcn --resume False --finetune True --model.scene_gcn.relation_adjust True --mode qtest --model.scene_gcn.optimize_steps 100
```

The visualization results will be saved to `out/relation_scene_gcn/<relation_sgcn_ro_test_id>`.
Citation
If you find our work and code helpful, please consider citing:
```
@misc{zhang2021deeppanocontext,
    title={DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization},
    author={Cheng Zhang and Zhaopeng Cui and Cai Chen and Shuaicheng Liu and Bing Zeng and Hujun Bao and Yinda Zhang},
    year={2021},
    eprint={2108.10743},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

@InProceedings{Zhang_2021_CVPR,
    author = {Zhang, Cheng and Cui, Zhaopeng and Zhang, Yinda and Zeng, Bing and Pollefeys, Marc and Liu, Shuaicheng},
    title = {Holistic 3D Scene Understanding From a Single Image With Implicit Representation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2021},
    pages = {8833-8842}
}
```
We thank the following great works:
- Total3DUnderstanding for their well-structured code, on which we build our network.
- Coop for their dataset. We used their processed dataset with 2D detector predictions.
- LDIF for their novel representation method. We ported their LDIF decoder from Tensorflow to PyTorch.
- Graph R-CNN for their scene graph design. We adopted their GCN implementation to construct our SGCN.
- Occupancy Networks for their modified version of the mesh-fusion pipeline.
If you find them helpful, please cite:
```
@InProceedings{Nie_2020_CVPR,
    author = {Nie, Yinyu and Han, Xiaoguang and Guo, Shihui and Zheng, Yujian and Chang, Jian and Zhang, Jian Jun},
    title = {Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}

@inproceedings{huang2018cooperative,
    title={Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation},
    author={Huang, Siyuan and Qi, Siyuan and Xiao, Yinxue and Zhu, Yixin and Wu, Ying Nian and Zhu, Song-Chun},
    booktitle={Advances in Neural Information Processing Systems},
    pages={206--217},
    year={2018}
}

@inproceedings{genova2020local,
    title={Local Deep Implicit Functions for 3D Shape},
    author={Genova, Kyle and Cole, Forrester and Sud, Avneesh and Sarna, Aaron and Funkhouser, Thomas},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={4857--4866},
    year={2020}
}

@inproceedings{yang2018graph,
    title={Graph r-cnn for scene graph generation},
    author={Yang, Jianwei and Lu, Jiasen and Lee, Stefan and Batra, Dhruv and Parikh, Devi},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    pages={670--685},
    year={2018}
}

@inproceedings{mescheder2019occupancy,
    title={Occupancy networks: Learning 3d reconstruction in function space},
    author={Mescheder, Lars and Oechsle, Michael and Niemeyer, Michael and Nowozin, Sebastian and Geiger, Andreas},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    pages={4460--4470},
    year={2019}
}
```