OcclusionFusion: real-time dynamic 3D reconstruction based on single-view RGB-D

OcclusionFusion (CVPR'2022)

Project Page | Paper | Video

Overview

This repository contains the code for the CVPR 2022 paper OcclusionFusion, where we introduce a novel method to calculate occlusion-aware 3D motion to guide dynamic 3D reconstruction.

In our technique, the motion of visible regions is first estimated and combined with temporal information to infer the motion of the occluded regions through an LSTM-involved graph neural network.
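
As a rough illustration only (this is not the network released in this repository; the layer sizes, feature dimensions, and variable names below are placeholders), combining graph convolutions with a per-node recurrent state in PyTorch Geometric might look like:

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

# Hypothetical sketch: graph convolutions propagate visible-node motion over the
# node graph, while an LSTM carries each node's temporal history across frames.
class MotionGraphLSTM(nn.Module):
    def __init__(self, in_dim=6, hidden_dim=64, out_dim=3):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, out_dim)  # per-node 3D motion

    def forward(self, node_feats, edge_index, lstm_state=None):
        # node_feats: (N, in_dim) visible-node motion plus visibility/confidence
        x = torch.relu(self.conv1(node_feats, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        # treat each node as a length-1 sequence and carry the recurrent state forward
        x, lstm_state = self.lstm(x.unsqueeze(1), lstm_state)
        return self.head(x.squeeze(1)), lstm_state

# toy usage: 100 nodes, 300 random edges
model = MotionGraphLSTM()
motion, state = model(torch.randn(100, 6), torch.randint(0, 100, (2, 300)))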

Currently, we provide a pretrained model and a demo. Code for data pre-processing, network training and evaluation will be available soon.

Setup

We use Python 3.8.10, PyTorch 1.8.0, and PyTorch Geometric 1.7.2.

conda create -n occlusionfu python==3.8.10
conda activate occlusionfu
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
pip install torch-scatter==2.0.8 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html
pip install torch-sparse==0.6.12 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html
pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html
pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html
pip install torch-geometric==1.7.2
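
To verify the environment (assuming a CUDA 10.2-capable GPU and driver are available), a quick sanity check like the following should import cleanly and report the versions listed above:

import torch
import torch_geometric

print(torch.__version__)            # expected: 1.8.0
print(torch_geometric.__version__)  # expected: 1.7.2
print(torch.cuda.is_available())    # True if the CUDA 10.2 build can see a GPU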

Running the demo

Run the demo with the pretrained model and prepared inputs:

python demo.py

Visualize the input and output:

python visualize.py

The default setting of visualize.py renders the network's input and output to a video as follows. You can also change the setting to view the network's input and output in the Open3D viewer.
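
If you would rather inspect the results interactively yourself, a minimal Open3D snippet along the following lines displays an N x 3 array of node positions as a point cloud (the file name and array layout here are placeholders; see visualize.py for the actual data format):

import numpy as np
import open3d as o3d

nodes = np.load("output_nodes.npy")  # placeholder path, not a file shipped with the demo
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(nodes.astype(np.float64))
o3d.visualization.draw_geometries([pcd])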

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{lin2022occlusionfusion,
    title={OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction},
    author={Lin, Wenbin and Zheng, Chengwei and Yong, Jun-Hai and Xu, Feng},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}
Comments
  • Complete node graph creation in DeepDeform and Live Demo

    Thanks for sharing this amazing work! I have a question regarding the complete node graph used as input to the occlusion-aware motion estimation network.

    Unlike DeformingThings4D, where the complete object surface is known, datasets like DeepDeform (and the live demo) only provide a front view. In these cases, what is the input to the module?

    1. Is the complete object surface precomputed (maybe by DynamicFusion)?
    2. Or is only the graph extracted from the front-view RGB-D image at frame t_0 used, with all confidence and visibility scores computed on this graph and no graph update made during the motion estimation step?
    opened by shubhMaheshwari 8
  • About the dataset version of FlyingThings3D

    Thanks for your wonderful work. I am trying to reproduce your modification of the optical-flow model RAFT and stumbled upon a choice between the full version and the subset version (FlowNet2.0 uses the subset), so I want to ask which version you used?

    opened by phamtrongthang123 1
  • How to decrease memory usage?

    This is a really great project. We tried to run the demo on our PC (NVIDIA GeForce RTX 3080 Ti), but we encountered the error below. Can anyone tell me how to resolve this issue? We have tried setting max_split_size_mb, but it did not work.

    File "E:\projects\OcclusionFusion\model.py", line 89, in forward
      feature7 = self.layer72(feature7, edge_indexes[0])
    ...
    RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 12.00 GiB total capacity; 11.10 GiB already allocated; 0 bytes free; 11.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
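
    As a general PyTorch note (not specific to this repository), the usual first steps are to make sure inference runs under torch.no_grad() so activations are not kept for backpropagation, and to set the allocator hint from the error message before launching the process:

    import torch

    model = torch.nn.Linear(8, 8)   # stand-in for the demo's network
    inputs = torch.randn(4, 8)      # stand-in for the demo's input tensors

    with torch.no_grad():           # no autograd buffers are stored
        output = model(inputs)

    torch.cuda.empty_cache()        # release cached blocks between frames

    # The allocator hint is an environment variable set before the process starts, e.g.
    #   PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python demo.py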
    
    opened by luxious 0
  • Can the volume be measured from the result of the 3D reconstruction?

    Hello! First, thanks for your excellent work. I want to ask whether this algorithm can measure the volume of the real object directly. I look forward to your reply. Thanks.

    opened by Jie-Huangi 0
  • Background subtraction

    Hi! Thank you for the great work! Your real-time results look amazing!

    I have a question about the depth image data. It seems that in all of your reconstructed results, the backgrounds (for example, the walls) are removed. May I ask how you do it? Optical flow can solve part of the problem, but optical flow is computed from color images, right? I assume it is not perfect. For example, if you pick all the points with flow magnitude u^2+v^2 > 1, there will always be some background pixels included in the masked area. Do you set a threshold on the input depth values so that the background is subtracted at the very beginning? Do you remove it after you compute the optical flow? Or do you fuse everything into the canonical model anyway and just not visualize it in the experiments? Or something else?
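
    For reference, the kind of masking described above might be sketched as follows; the thresholds, image size, and arrays are arbitrary placeholders rather than the authors' pipeline:

    import numpy as np

    depth = np.random.uniform(0.3, 4.0, size=(480, 640))  # stand-in depth frame, in meters
    flow = np.random.randn(480, 640, 2)                    # stand-in optical flow (u, v) per pixel

    near_mask = depth < 2.0                                       # hypothetical depth cutoff for the background
    motion_mask = (flow[..., 0] ** 2 + flow[..., 1] ** 2) > 1.0   # the flow-magnitude test mentioned above
    foreground = near_mask & motion_mask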

    In some other cases, the person may not move very drastically, so the optical flow may miss a large part of the person. Do you run into similar problems? Any idea how I could solve this?

    Thanks again.

    opened by BoomFan 0
  • Code for geometry fusion

    Hi! Thanks for your great work! Your results look amazing!

    I've tried your demo code and got the output nodes. The results look super smooth!

    I'm very interested in reproducing the whole pipeline of your algorithm. Apparently, some parts are still missing. For example, the inputs of your demo code are the visible nodes, and they are already matched with the complete node graph. In your paper, you mention that the geometry fusion part is based on DynamicFusion. So I'm wondering whether you plan to release the code that fuses the motion nodes into the canonical volume? I know the code might be messy, so if these TSDF-related parts are not in your release plan, could you point me to the repository that is closest to your implementation?
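
    (For context, the per-voxel update at the heart of DynamicFusion/KinectFusion-style TSDF fusion is a weighted running average of truncated signed distances. The sketch below is generic textbook material, not code from this repository.)

    import numpy as np

    def update_tsdf(tsdf, weight, d_new, w_new=1.0, w_max=100.0):
        """Fuse one new truncated-signed-distance observation into per-voxel arrays."""
        fused = (tsdf * weight + d_new * w_new) / np.maximum(weight + w_new, 1e-6)
        return fused, np.minimum(weight + w_new, w_max)

    # toy usage on a tiny 8x8x8 volume
    tsdf = np.zeros((8, 8, 8))
    weight = np.zeros((8, 8, 8))
    tsdf, weight = update_tsdf(tsdf, weight, d_new=np.random.uniform(-1, 1, (8, 8, 8)))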

    I did find a very early implementation of DynamicFusion here: https://github.com/mihaibujanca/dynamicfusion. But it seems that its dependencies are outdated. Could you offer me some guidance on the choice of the overall pipeline, please?

    Thanks again.

    opened by BoomFan 0
  • Question on multi-person reconstruction

    Hi, can OcclusionFusion support multi-person reconstruction? If not, is it feasible to crop each person's bounding box via detection and then forward multiple tensors through your model?

    opened by entropyfeng 3
"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (CVPRW 2022) Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Z

Yuanhao Cai 251 Sep 23, 2022
(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry Official implementation of the paper Multi-View Depth Est

Bae, Gwangbin 104 Sep 29, 2022
CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

CoReNet CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objec

Google Research 75 Sep 23, 2022
PN-Net a neural field-based framework for depth estimation from single-view RGB images.

PN-Net We present a neural field-based framework for depth estimation from single-view RGB images. Rather than representing a 2D depth map as a single

null 1 Oct 2, 2021
SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images

SymmetryNet SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images ACM Transactions on Gra

null 25 Apr 9, 2022
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 132 Sep 13, 2022
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 6, 2022
MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera

MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera

Felix Wimbauer 461 Sep 30, 2022
Dynamic View Synthesis from Dynamic Monocular Video

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer This repository contains code to compute depth from a

Intelligent Systems Lab Org 1.8k Sep 23, 2022
Dynamic View Synthesis from Dynamic Monocular Video

Dynamic View Synthesis from Dynamic Monocular Video Project Website | Video | Paper Dynamic View Synthesis from Dynamic Monocular Video Chen Gao, Ayus

Chen Gao 126 Sep 22, 2022
Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images (ICCV 2021)

Table of Content Introduction Getting Started Datasets Installation Experiments Training & Testing Pretrained models Texture fine-tuning Demo Toward R

VinAI Research 40 Oct 2, 2022
Code for "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" paper

UNICORN ?? Webpage | Paper | BibTex PyTorch implementation of "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" pap

null 99 Sep 22, 2022
Blender add-on: Add to Cameras menu: View → Camera, View → Add Camera, Camera → View, Previous Camera, Next Camera

Blender add-on: Camera additions In 3D view, it adds these actions to the View|Cameras menu: View → Camera : set the current camera to the 3D view Vie

German Bauer 11 Feb 8, 2022
Towards uncontrained hand-object reconstruction from RGB videos

Towards uncontrained hand-object reconstruction from RGB videos Yana Hasson, Gül Varol, Ivan Laptev and Cordelia Schmid Project page Paper Table of Co

Yana 61 Sep 13, 2022
Dynamic Realtime Animation Control

Our project is targeted at making an application that dynamically detects the user’s expressions and gestures and projects it onto an animation software which then renders a 2D/3D animation realtime that gets broadcasted live.

Harsh Avinash 10 Aug 1, 2022
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

CenterPose Overview This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation fro

NVIDIA Research Projects 167 Sep 26, 2022
Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

NonCuboidRoom Paper Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image Cheng Yang*, Jia Zheng*, Xili Dai, Rui Tang, Yi Ma, Xiao

null 64 Sep 14, 2022
Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019

PoseNet of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image" Introduction This repo is official Py

Gyeongsik Moon 649 Sep 27, 2022
TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction TSDF++ is a novel multi-object TSDF formulation that can encode mult

ETHZ ASL 122 Sep 18, 2022