Towards Part-Based Understanding of RGB-D Scans

Overview

Towards Part-Based Understanding of RGB-D Scans (CVPR 2021)

We propose the task of part-based scene understanding of real-world 3D environments: from an RGB-D scan of a scene, we detect objects, and for each object predict its decomposition into geometric part masks, which, composed together, form the complete geometry of the observed object.


Download Paper (.pdf)

Demo samples

[Demo images: Part-Based Scene Understanding]

Get started

The core of this repository is a network that takes preprocessed scan voxel crops as input and produces voxelized part trees. However, data preparation is a substantial step before actual training and inference can be launched, so we release already-prepared training data and a checkpoint for inference. If you want to launch training with our data, please follow the steps below:

  1. Clone the repo: git clone https://github.com/alexeybokhovkin/part-based-scan-understanding.git

  2. Download the data and/or the checkpoint:
    ScanNet MLCVNet crops (finetune) [894M]
    ScanNet clean crops (pretraining) [995M]
    PartNet GT trees [103M]
    Parts priors [169M]
    Checkpoint [19M]

  3. For training, prepare an augmented version of the ScanNet crops with the script dataproc/prepare_rot_aug_data.py. After this, create a folder with all the necessary dataset metadata using the script dataproc/gather_all_shapes.py

  4. Create a config file similar to configs/config_gnn_scannet_allshapes.yaml (you need to provide paths to several directories and files)

  5. Launch training with train_gnn_scannet.py
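Putting the steps together, here is a minimal shell sketch of the full pipeline. Only the script names and the config file are documented in this README; the working directories, archive names, and any command-line flags below are assumptions for illustration, so check each script's argument parser before running:

```bash
# 1. Clone the repository
git clone https://github.com/alexeybokhovkin/part-based-scan-understanding.git
cd part-based-scan-understanding

# 2. Unpack the downloaded archives into a data directory
#    (archive names and target layout are placeholders)
mkdir -p data
# e.g. tar -xzf scannet_mlcvnet_crops.tar.gz -C data/

# 3. Prepare rotation-augmented ScanNet crops, then gather dataset metadata
#    (both scripts may require path arguments; see their argparse definitions)
python dataproc/prepare_rot_aug_data.py
python dataproc/gather_all_shapes.py

# 4. Edit configs/config_gnn_scannet_allshapes.yaml to point at your data paths

# 5. Launch training (how the config is passed is an assumption; the script
#    may take it as a flag or read it from a fixed location)
python train_gnn_scannet.py
```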

Citation

If you use this framework, please cite:

@article{Bokhovkin2020TowardsPU,
  title={Towards Part-Based Understanding of RGB-D Scans},
  author={Alexey Bokhovkin and V. Ishimtsev and Emil Bogomolov and D. Zorin and A. Artemov and Evgeny Burnaev and Angela Dai},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.02094}
}