Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images (ICCV 2021)

VinAI Research

Last update: Dec 5, 2022

Related tags

Deep Learning computer-vision deep-learning python3 pytorch unsupervised-learning 3d-reconstruction 3d-graphics iccv 3d-objects anaconda3 vinai 3d-object-reconstruction iccv2021 iccv21 phong-shading iccv-2021

Overview

Table of Content

Introduction
Getting Started
- Datasets
- Installation
Experiments

Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images

Recovering the 3D structure of an object from a single image is a challenging task due to its ill-posed nature. One approach is to utilize the plentiful photos of the same object category to learn a strong 3D shape prior for the object. We propose a general framework without symmetry constraint, called LeMul, that effectively Learns from Multi-image datasets for more flexible and reliable unsupervised training of 3D reconstruction networks. It employs loose shape and texture consistency losses based on component swapping across views.

Details of the model architecture and experimental results can be found in our following paper.

@inproceedings{ho2021lemul,
      title={Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images},
      author={Long-Nhat Ho and Anh Tran and Quynh Phung and Minh Hoai},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
      year={2021}
}

Please CITE our paper whenever our model implementation is used to help produce published results or incorporated into other software.

Getting Started

Datasets

CelebA face dataset. Please download the original images (img_celeba.7z) from their website and run celeba_crop.py in data/ to crop the images.
Synthetic face dataset generated using Basel Face Model. This can be downloaded using the script download_synface.sh provided in data/.
Cat face dataset composed of Cat Head Dataset and Oxford-IIIT Pet Dataset (license). This can be downloaded using the script download_cat.sh provided in data/.
CASIA WebFace dataset. You can download the original dataset from backup links such as the Google Drive link on this page. Decompress, and run casia_data_split.py in data/ to re-organize the images.

Please remember to cite the corresponding papers if you use these datasets.

Installation:

# clone the repo
git clone https://github.com/VinAIResearch/LeMul.git
cd LeMul

# install dependencies
conda env create -f environment.yml

Experiments

Training and Testing

Check the configuration files in experiments/ and run experiments, eg:

# Training
python run.py --config experiments/train_multi_CASIA.yml --gpu 0 --num_workers 4

# Testing
python run.py --config experiments/test_multi_CASIA.yml --gpu 0 --num_workers 4

Texture fine-tuning

With collection-style datasets such as CASIA, you can fine-tune the texture estimation network after training. Check the configuration file experiments/finetune_CASIA.yml as an example. You can run it with the command:

python run.py --config experiments/finetune_CASIA.yml --gpu 0 --num_workers 4

Pretrained Models

Pretrained models can be found here: Google Drive Please download and place pretrained models in ./pretrained folder.

Demo

After downloading pretrained models and preparing input image folder, you can run demo, eg:

python demo/demo.py --input demo/human_face_cropped --result demo/human_face_results --checkpoint pretrained/casia_checkpoint028.pth

Options:

--config path-to-training-config-file.yml: input the config file used in training (recommended)
--detect_human_face: enable automatic human face detection and cropping using MTCNN. You need to install facenet-pytorch before using this option. This only works on human face images
--gpu: enable GPU
--render_video: render 3D animations using neural_renderer (GPU is required)

To replicate the results reported in the paper with the model pretrained on the CASIA dataset, use the --detect_human_face option with images in folder demo/images/human_face and skip that flag with images in demo/images/human_face_cropped.

Comments

Questions about evaluation metrics

Thank you for your generous open source.

Now that I have been through the training process, and obtain a model trained for 70 epochs.

So what I wonder is how to output or where I can find these evaluation metrics to measure the performance of my trained model, and if any ground truth data needed for this evaluation process.

Looking forward to your reply and help, much thanks.

opened by Yual16078 1

SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images

SymmetryNet SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images ACM Transactions on Gra

26 Dec 5, 2022

CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

CoReNet CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image. It produces coherent reconstructions, where all objec

80 Dec 25, 2022

Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery (ICCV 2021)

Change is Everywhere Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery by Zhuo Zheng, Ailong Ma, Liangpei Zhang and Yanfei

125 Dec 13, 2022

PanopticBEV - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images This r

63 Dec 16, 2022

Script that receives an Image (original) and a set of images to be used as "pixels" in reconstruction of the Original image using the set of images as "pixels"

picinpics Script that receives an Image (original) and a set of images to be used as "pixels" in reconstruction of the Original image using the set of

1 Oct 24, 2021

PN-Net a neural field-based framework for depth estimation from single-view RGB images.

A Planar RGB-D SLAM which utilizes Manhattan World structure to provide optimal camera pose trajectory while also providing a sparse reconstruction containing points, lines and planes, and a dense surfel-based reconstruction.

ManhattanSLAM Authors: Raza Yunus, Yanyan Li and Federico Tombari ManhattanSLAM is a real-time SLAM library for RGB-D cameras that computes the camera

117 Dec 28, 2022

Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images (ICCV 2021)

Related tags

Overview

Table of Content

Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images

Getting Started

Datasets

Installation:

Experiments

Training and Testing

Texture fine-tuning

Pretrained Models

Demo

You might also like...

SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images

CoReNet is a technique for joint multi-object 3D reconstruction from a single RGB image.

Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery (ICCV 2021)

PanopticBEV - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Script that receives an Image (original) and a set of images to be used as "pixels" in reconstruction of the Original image using the set of images as "pixels"

PN-Net a neural field-based framework for depth estimation from single-view RGB images.

Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

A Planar RGB-D SLAM which utilizes Manhattan World structure to provide optimal camera pose trajectory while also providing a sparse reconstruction containing points, lines and planes, and a dense surfel-based reconstruction.

Comments

Questions about evaluation metrics

Owner

VinAI Research

Toward Spatially Unbiased Generative Models (ICCV 2021)

Simple Tensorflow implementation of Toward Spatially Unbiased Generative Models (ICCV 2021)

[ICCV'21] UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

Code for "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" paper

OcclusionFusion: realtime dynamic 3D reconstruction based on single-view RGB-D

An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testingAn image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

Blender add-on: Add to Cameras menu: View → Camera, View → Add Camera, Camera → View, Previous Camera, Next Camera

This repo is official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021).

pytorch implementation for Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network arXiv:1609.04802

[NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images