CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

Overview

CoReNet

CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objects live in a single consistent 3D coordinate frame relative to the camera, and they do not intersect in 3D. You can find more information in the following paper: CoReNet: Coherent 3D scene reconstruction from a single RGB image.

This repository contains source code, dataset pointers, and instructions for reproducing the results in the paper. If you find our code, data, or the paper useful, please consider citing

@InProceedings{popov20eccv,
  title="CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image",
  author="Popov, Stefan and Bauszat, Pablo and Ferrari, Vittorio", 
  booktitle="Computer Vision -- ECCV 2020",
  year="2020",
  doi="10.1007/978-3-030-58536-5_22"
}

Table of Contents

Installation

The code in this repository has been verified to work on Ubuntu 18.04 with the following dependencies:

# General APT packages
sudo apt install \
  python3-pip python3-virtualenv python python3.8-dev g++-8 \
  ninja-build git libboost-container-dev unzip

# NVIDIA related packages
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /"
sudo apt install \
    nvidia-driver-455 nvidia-utils-455 `#driver, CUDA+GL libraries, utils` \
    cuda-runtime-10-1 cuda-toolkit-10-2 libcudnn7 `# Cuda and CUDNN`

To install CoReNet, you need to clone the code from GitHub and create a python virtual environment.

# Clone CoReNet
mkdir -p ~/prj/corenet
cd ~/prj/corenet
git clone https://github.com/google-research/corenet.git .

# Setup a python virtual environment
python3.8 -m virtualenv --python=/usr/bin/python3.8 venv_38
. venv_38/bin/activate
pip install -r requirements.txt

All instructions below assume that CoReNet lives in ~/prj/corenet, that this is the current working directory, and that the virtual environment is activated. You can also run CoReNet using the supplied docker file: ~/prj/corenet/Dockerfile.

Datasets

The CoReNet paper introduced several datasets with synthetic scenes. To reproduce the experiments in the paper you need to download them, using:

cd ~/prj/corenet
mkdir -p ~/prj/corenet/data/raw
for n in single pairs triplets; do  
  for s in train val test; do
    wget "https://storage.googleapis.com/gresearch/corenet/${n}.${s}.tar" \
      -O "data/raw/${n}.${s}.tar" 
    tar -xvf "data/raw/${n}.${s}.tar" -C data/ 
  done 
done

For each scene, these datasets provide the objects placement, a good view point, and two images rendered from it with a varying degree of realism. To download the actual object geometry, you need to download ShapeNetCore.v2.zip from ShapeNet's original site, unpack it, and convert the 3D meshes to CoReNet's binary format:

echo "Please download ShapeNetCore.v2.zip from ShapeNet's original site and "
echo "place it in ~/prj/corenet/data/raw/ before running the commands below"

cd ~/prj/corenet
unzip data/raw/ShapeNetCore.v2.zip -d data/raw/
PYTHONPATH=src python -m preprocess_shapenet \
  --shapenet_root=data/raw/ShapeNetCore.v2 \
  --output_root=data/shapenet_meshes

Models from the paper

To help reproduce the results from the CoReNet paper, we offer 5 pre-trained models from it (h5, h7, m7, m9, and y1; details below and in the paper). You can download and unpack these using:

cd ~/prj/corenet
wget https://storage.googleapis.com/gresearch/corenet/paper_tf_models.tgz \
  -O data/raw/paper_tf_models.tgz
tar xzvf data/raw/paper_tf_models.tgz -C data/

You can evaluate the downloaded models against their respective test sets using:

MODEL=h7  # Set to one of: h5, h7, m7, m9, y1

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 \
tf_model_eval --config_path=configs/paper_tf_models/${MODEL}.json5

To run on multiple GPUs in parallel, set --nproc_per_node to the number of desired GPUs. You can use CUDA_VISIBLE_DEVICES to control which GPUs exactly to use. CUDA_HOME, PATH, and FILL_VOXELS_CUDA_FLAGS control the just-in-time compiler for the voxelization operation.

Upon completion, quantitative results will be stored in ~/prj/corenet/output/paper_tf_models/${MODEL}/voxel_metrics.csv. Qualitative results will be available in ~/prj/corenet/output/paper_tf_models/${MODEL}/ in the form of PNG files.

This table summarizes the model attributes and their performance. More details can be found in the paper.

model dataset realism native resolution mean IoU
h5 single low 128 x 128 x 128 57.9%
h7 single high 128 x 128 x 128 59.1%
y1 single low 32 x 32 x 32 53.3%
m7 pairs high 128 x 128 x 128 43.1%
m9 triplets high 128 x 128 x 128 43.9%

Note that all models are evaluated on a grid resolution of 128 x 128 x 128, independent of their native resolution (see section 3.5 in the paper). The performance computed with this code matches the one reported in the paper for h5, h7, m7, and m9. For y1, the performance here is slightly higher (+0.2% IoU), as we no longer have the exact checkpoint used in the paper.

You can also run these models on individual images interactively, using the corenet_demo.ipynb notebook. For this, you need to also pip install jupyter-notebook in your virtual environment.

Training and evaluating a new model

We offer PyTorch code for training and evaluating models. To train a model, you need to (once) import the starting ResNet50 checkpoint:

cd ~/prj/corenet
PYTHONPATH=src python -m import_resnet50_checkpoint

Then run:

MODEL=h7  # Set to one of: h5, h7, m7, m9 

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 \
train --config_path=configs/models/h7.json5

Again, use --nproc_per_node and CUDA_VISIBLE_DEVICES to control parallel execution on multiple GPUs, CUDA_HOME, PATH, and FILL_VOXELS_CUDA_FLAGS control just-in-time compilation.

You can also evaluate individual checkpoints, for example:

cd ~/prj/corenet
ulimit -n 4096
OMP_NUM_THREADS=2 CUDA_HOME=/usr/local/cuda-10.2 PYTHONPATH=src \
TF_CPP_MIN_LOG_LEVEL=1 PATH="${PATH}:${CUDA_HOME}/bin" \
FILL_VOXELS_CUDA_FLAGS=-ccbin=/usr/bin/gcc-8 \
python -m dist_launch --nproc_per_node=1 eval \
  --cpt_path=output/models/h7/cpt/persistent/state_000000000.cpt \
  --output_path=output/eval_cpt_example \
  --eval_names_regex="short.*" \
  -jq '(.. | .config? | select(.num_qualitative_results != null) | .num_qualitative_results) |= 4' \

The -jq option limits the number of qualitative results to 4 (see also Further details section)

We currently offer checkpoints trained with this code for models h5, h7, m7, and m9, in this .tgz. These checkpoints achieve slightly better performance than the paper (see table below). This is likely due to a different distributed training strategy (synchronous here vs. asynchronous in the paper) and a different ML framework (PyTorch vs. TensorFlow in the paper).

h5 h7 m7 m9
mean IoU 60.2% 61.6% 45.0% 46.9%

Further details

Configuration files

The evaluation and training scripts are configured using JSON5 files that map to the TfModelEvalPipeline and TrainPipeline dataclasses in src/corenet/configuration.py. You can find description of the different configuration options in code comments, starting from these two classes.

You can also modify the configuration on the fly, through jq queries, as well as defines that change entries in the string_templates section. For example, the following options change the number of workers, and the prefetch factor of the data loaders, as well as the location of the data and the output directories:

... \
-jq "'(.. | .data_loader? | select(. != null) | .num_data_workers) |= 12'" \
    "'(.. | .data_loader? | select(. != null) | .prefetch_factor) |= 4'" \
-D 'data_dir=gs://some_gcs_bucket/data' \
   'output_dir=gs://some_gcs_bucket/output/models'

Dataset statistics

The table below summarizes the number of scenes in each dataset

single pairs triplets
train 883084 319981 80000
val 127286 45600 11400
test 246498 91194 22798

Licenses

The code and the checkpoints are released under the Apache 2.0 License. The datasets, the documentation, and the configuration files are licensed under the Creative Commons Attribution 4.0 International License.

Comments
  • How to get

    How to get "mesh_visible_fractions"

    Hi, After reading the Data formats and coordinate systems document. I want to know how the value of "mesh_visible_fractions" mentioned in "Scene fields" is obtained. thanks.

    opened by njfugit 2
  • "Camera transform" matrix

    Hi, thanks for the great work and detailed documentation.

    While checking the data, I was wondering how the "camera_transform" is chosen and what's the purpose of the homogeneous part.

    I understand that the 3x3 part of the 4x4 matrix is the FOV, but what is the purpose of the 4th column [-0.8660, 0.8660, 0.8664, 0.8666]?

    Thanks a lot!

    opened by xheon 2
  • How to use the model trained by pytorch and get opengl_image, pbrt_image

    How to use the model trained by pytorch and get opengl_image, pbrt_image

    Hello, I have successfully run the demo. I have some questions, how to use the model trained by pytorch in the demo file? Do I need to convert it into a model file in pb format? If I want to make my own data set, how are the files opengl_image.npy and pbrt_image.npy generated? This will be of great help to my follow-up work.Looking forward to your reply, thank you.

    opened by njfugit 0
  • Reconstruction  of building

    Reconstruction of building

    Hello, First of all thank you for the amazing work and making it public. I have a small question. I was experimenting with CoreNet for building reconstruction and none of the saved models worked. So, I just wanted to know that whether have you used any of the building data available in ShapeNet for training any of the models? And if not, then if I want to have a model that reconstructs buildings from images, do I need to take care or change any specific part of the code, except for creating a building dataset? Thank you

    opened by bhowmickarka 1
  • Why do you use `BatchRenorm` instead of `nn.BatchNorm3d`?

    Why do you use `BatchRenorm` instead of `nn.BatchNorm3d`?

    I found that you implemented a BatchRenorm module in your code. I wonder that why you did't use nn.BatchNorm3d of pytorch directly? I hope you can explain this detail, thanks a lot!

    opened by bluestyle97 1
Owner
Google Research
Google Research
A Planar RGB-D SLAM which utilizes Manhattan World structure to provide optimal camera pose trajectory while also providing a sparse reconstruction containing points, lines and planes, and a dense surfel-based reconstruction.

ManhattanSLAM Authors: Raza Yunus, Yanyan Li and Federico Tombari ManhattanSLAM is a real-time SLAM library for RGB-D cameras that computes the camera

null 117 Dec 28, 2022
OcclusionFusion: realtime dynamic 3D reconstruction based on single-view RGB-D

OcclusionFusion (CVPR'2022) Project Page | Paper | Video Overview This repository contains the code for the CVPR 2022 paper OcclusionFusion, where we

Wenbin Lin 193 Dec 15, 2022
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

CenterPose Overview This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation fro

NVIDIA Research Projects 188 Dec 27, 2022
Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and testing data for various deep learning projects such as 6D object pose estimation projects singleshotpose, as well as object detection and instance segmentation projects.

null 305 Dec 16, 2022
Towards uncontrained hand-object reconstruction from RGB videos

Towards uncontrained hand-object reconstruction from RGB videos Yana Hasson, Gül Varol, Ivan Laptev and Cordelia Schmid Project page Paper Table of Co

Yana 69 Dec 27, 2022
Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019

PoseNet of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image" Introduction This repo is official Py

Gyeongsik Moon 677 Dec 25, 2022
Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

Richard Wang 443 Dec 6, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 7 May 29, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 4 Nov 16, 2021
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 6, 2022
TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction TSDF++ is a novel multi-object TSDF formulation that can encode mult

ETHZ ASL 130 Dec 29, 2022
Code of U2Fusion: a unified unsupervised image fusion network for multiple image fusion tasks, including multi-modal, multi-exposure and multi-focus image fusion.

U2Fusion Code of U2Fusion: a unified unsupervised image fusion network for multiple image fusion tasks, including multi-modal (VIS-IR, medical), multi

Han Xu 129 Dec 11, 2022
Official implementation of "SinIR: Efficient General Image Manipulation with Single Image Reconstruction" (ICML 2021)

SinIR (Official Implementation) Requirements To install requirements: pip install -r requirements.txt We used Python 3.7.4 and f-strings which are in

null 47 Oct 11, 2022
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set —— PyTorch implementation This is an unofficial offici

Sicheng Xu 833 Dec 28, 2022
Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

NonCuboidRoom Paper Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image Cheng Yang*, Jia Zheng*, Xili Dai, Rui Tang, Yi Ma, Xiao

null 67 Dec 15, 2022
This repository contains several image-to-image translation models, whcih were tested for RGB to NIR image generation. The models are Pix2Pix, Pix2PixHD, CycleGAN and PointWise.

RGB2NIR_Experimental This repository contains several image-to-image translation models, whcih were tested for RGB to NIR image generation. The models

null 5 Jan 4, 2023
Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images (ICCV 2021)

Table of Content Introduction Getting Started Datasets Installation Experiments Training & Testing Pretrained models Texture fine-tuning Demo Toward R

VinAI Research 42 Dec 5, 2022