Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi-Object Scenes

Overview


This repository contains the source code for Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi-Object Scenes, my work as a visiting scholar at the Stanford AI Lab.
Special thanks to Christopher Choy and Prof. Silvio Savarese.

Contents

  1. Introduction
  2. Dataset
  3. Models
  4. Experiments
  5. Evaluations
  6. Installation
  7. References

Introduction

Generative models built on Generative Adversarial Networks or Variational Auto-Encoders are among the hottest topics in deep learning and computer vision. They not only enable high-quality generation, but also open up many possibilities for representation learning, feature extraction, and unsupervised application to recognition tasks through their probabilistic spaces and manifolds.
In particular, 3D multi-object generative models, which allow us to synthesize a variety of novel 3D multi-object scenes and to recognize their shapes, objects, and layouts, are extremely important for AR/VR and graphics.
However, 3D generative models are still underdeveloped. Basic generative models of single objects have been published [1], [2], but multi-object models have not. Therefore I built an end-to-end 3D multi-object generative model using a novel generative adversarial network architecture.

Dataset

I used the ground-truth voxel data of the SUNCG dataset [3].
http://suncg.cs.princeton.edu/

I modified this dataset as follows.

  • Downsized the volumes from 240x144x240 to 80x48x80.
  • Removed the trimming by camera angles.
  • Kept only the scenes with more than 10,000 occupied voxels.

As a result, over 185K scenes were gathered, labeled with 12 classes (empty, ceiling, floor, wall, window, chair, bed, sofa, table, tvs, furn, objs).
  

This dataset is extremely sparse: on average, around 92% of the voxels in a scene belong to the empty class. It also covers a wide variety of scene types such as living rooms, bathrooms, bedrooms, dining rooms, and garages.

Models

Network Architecture

The network architecture of this work is a fully convolutional refined auto-encoding generative adversarial network, inspired by 3DGAN [1], alphaGAN [4], and SimGAN [5]. The fully convolutional latent layer and the multi-object classification output are novel as generative-model architectures.

This network combines a variational auto-encoder with generative adversarial networks. The KL-divergence loss of the variational auto-encoder is replaced by an adversarial auto-encoder using a code discriminator, following the alphaGAN architecture [4]. In addition, generated scenes are refined by a refiner [5]. In this work, the latent space has shape 5x3x5x16 and is computed by a fully convolutional layer. Fully convolutional layers extract features more locally, as in semantic segmentation, which improves reconstruction performance.
The adversarial auto-encoder loosens the constraint on the latent distribution and treats it as implicit, and the generator is trained to fool the discriminator through the adversarial game. As a result, this architecture improves both reconstruction and generation performance. In addition, the refiner smooths the object shapes and makes them look more realistic.

  

-Encoder

The basic architecture of the encoder is similar to the discriminator network of 3DGAN [1]. The difference is the last layer, which is a 1x1x1 fully convolutional layer.
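
The snippet below is a minimal TensorFlow 1.x sketch of such a 1x1x1 convolutional head producing a 5x3x5x16 latent volume; the function and layer names are illustrative, not the repository's actual variables.

import tensorflow as tf

def encode_to_latent(features):
    """Map encoder feature maps to a 5x3x5x16 spatial latent code.

    `features` is assumed to be a [batch, 5, 3, 5, C] tensor produced by the
    strided 3D convolutions of the encoder body.
    """
    # A 1x1x1 convolution acts per spatial location, so the latent code keeps
    # its 5x3x5 spatial layout instead of being flattened into a single vector.
    z = tf.layers.conv3d(features, filters=16, kernel_size=1, strides=1,
                         padding='same', name='enc_latent_1x1x1')
    return z  # shape: [batch, 5, 3, 5, 16]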

-Generator

The basic architecture of the generator is also similar to 3DGAN [1], as in the figure above. The differences are the last layer, which has 12 channels and is activated by softmax, and the first layer, which takes the latent space in flattened form.
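
As one hedged sketch, the softmax head could look like the following in TensorFlow 1.x; the input shape and kernel size are assumptions, and only the 12-channel softmax output is taken from the description above.

import tensorflow as tf

def generator_head(features):
    """Final generator layer: 12 output channels, one per voxel class.

    `features` is assumed to be a [batch, 40, 24, 40, C] tensor from the
    transposed-convolution stack; one last stride-2 deconvolution brings it
    to the 80x48x80 output resolution.
    """
    logits = tf.layers.conv3d_transpose(features, filters=12, kernel_size=4,
                                        strides=2, padding='same',
                                        name='gen_logits')
    # Softmax over the channel axis gives each voxel a distribution over the
    # 12 semantic classes (empty, ceiling, floor, wall, ...).
    return tf.nn.softmax(logits, axis=-1)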

-Discriminator

The basic architecture of the discriminator is also similar to 3DGAN [1]. The difference is the normalization layers, which use layer normalization.

-Code Discriminator

The code discriminator is the same as in alphaGAN [4]: it has 2 hidden layers of 750 dimensions each.
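
A minimal sketch of such a code discriminator, assuming the 5x3x5x16 latent is flattened to 1200 dimensions before the two 750-unit hidden layers (the scope and activation choices are assumptions):

import tensorflow as tf

def code_discriminator(z, reuse=False):
    """Score whether a latent code came from the encoder or from the prior."""
    with tf.variable_scope('code_discriminator', reuse=reuse):
        h = tf.layers.flatten(z)                       # 5*3*5*16 = 1200 dims
        h = tf.nn.leaky_relu(tf.layers.dense(h, 750))  # hidden layer 1
        h = tf.nn.leaky_relu(tf.layers.dense(h, 750))  # hidden layer 2
        logit = tf.layers.dense(h, 1)                  # real/fake code score
    return logit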

 

-Refiner

The basic architecture of the refiner is similar to SimGAN [5] and is composed of 4 ResNet blocks. The number of channels is 32 in order to reduce memory usage.
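
A hedged sketch of this refiner, with four residual blocks of 32 channels; the kernel sizes and the final projection back to 12 classes are assumptions:

import tensorflow as tf

def resnet_block(x, channels=32, name='res_block'):
    """One 3D residual block of the kind the refiner stacks four times."""
    with tf.variable_scope(name):
        h = tf.layers.conv3d(x, channels, kernel_size=3, padding='same')
        h = tf.nn.relu(h)
        h = tf.layers.conv3d(h, channels, kernel_size=3, padding='same')
        return tf.nn.relu(h + x)  # identity skip keeps the coarse structure

def refiner(voxels):
    """Project to 32 channels, apply 4 residual blocks, project back to 12 classes."""
    h = tf.layers.conv3d(voxels, 32, kernel_size=3, padding='same')
    for i in range(4):
        h = resnet_block(h, name='res_block_%d' % i)
    logits = tf.layers.conv3d(h, 12, kernel_size=3, padding='same')
    return tf.nn.softmax(logits, axis=-1)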

Loss Functions 

  • Reconstruction loss

    The occupancy weight is normalized within every batch to emphasize the importance of small objects, and a hyperparameter weights the relative importance of false positives against false negatives (a hedged sketch of this weighting follows this list).

  • GAN loss

  • Distribution GAN loss
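
The following is a hedged sketch of the occupancy weighting mentioned under the reconstruction loss; it is not the repository's exact formula, but one plausible form of a per-batch occupancy-normalized, false-positive/false-negative weighted voxel cross-entropy.

import tensorflow as tf

def reconstruction_loss(x_onehot, x_recon, alpha=0.8):
    """Occupancy-weighted voxel-wise cross-entropy (illustrative only).

    x_onehot : [batch, 80, 48, 80, 12] one-hot ground-truth voxels
    x_recon  : [batch, 80, 48, 80, 12] softmax output of the generator/refiner
    alpha    : relative weight of false negatives vs. false positives (assumed)
    """
    eps = 1e-8
    # Per-class occupancy in this batch; rare (small-object) classes get larger weights.
    occupancy = tf.reduce_mean(x_onehot, axis=[0, 1, 2, 3])      # shape [12]
    class_weight = 1.0 / (occupancy + eps)
    class_weight = class_weight / tf.reduce_sum(class_weight)    # normalize per batch

    # alpha emphasizes missed occupied voxels; (1 - alpha) penalizes voxels
    # predicted where the ground truth is empty.
    pos = -alpha * x_onehot * tf.log(x_recon + eps)
    neg = -(1.0 - alpha) * (1.0 - x_onehot) * tf.log(1.0 - x_recon + eps)
    loss = tf.reduce_sum(class_weight * (pos + neg), axis=-1)
    return tf.reduce_mean(loss)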

Optimization

  • Encoder

  • Generator and Refiner

  • Discriminator

  • Code Discriminator

    A hyperparameter weights the reconstruction loss in the objectives above.

Experiments

The Adam optimizer was used for each of the sub-networks with a learning rate of 0.0001. The network was first trained for 75,000 iterations without the refiner; the refiner was then inserted and the network trained for a further 25,000 iterations. The batch size was 20 for the first stage (base networks) and 8 for the second stage (refiner).
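
A minimal sketch of this setup, assuming one Adam optimizer per sub-network and illustrative loss/variable names (the actual graph construction in main.py may differ):

import tensorflow as tf

LEARNING_RATE = 1e-4  # used for every sub-network

def build_train_ops(losses, var_lists):
    """Create one Adam optimizer per sub-network.

    `losses` and `var_lists` are dicts keyed by sub-network name.
    """
    train_ops = {}
    for name in ['encoder', 'generator_refiner', 'discriminator', 'code_discriminator']:
        opt = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)
        train_ops[name] = opt.minimize(losses[name], var_list=var_lists[name])
    return train_ops

# Two-stage schedule: 75,000 iterations with batch size 20 and no refiner,
# then insert the refiner and train 25,000 more iterations with batch size 8.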

Learning curves


Visualization

-Reconstruction  

Here are the reconstruction results using the encoder, generator, and refiner.

Almost all voxels are reconstructed, although small objects have disappeared. The shapes are also smoothed by the refiner. Numerical evaluations using IoU and mAP are given below.

-Generation from normal distribution

Here are the results of generation from the normal distribution using the generator and refiner.

As the figure above shows, the FCR-alphaGAN architecture worked better than a standard fully convolutional VAE architecture, but it was still not enough to produce realistic scene objects. A likely reason is that the encoder did not generalize to the prior distribution and that the probabilistic space is extremely complicated because of the sparsity of the dataset. To solve this problem, the probabilistic space should be separated into per-object and layout components.

Reconstruction Performance

Here are the numerical evaluations of reconstruction performance.

-Intersection over Union (IoU)

IoU is defined as in [6]. The bar chart shows the IoU performance of each class, and the line chart shows the overall IoU performance.
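
A minimal numpy sketch of the per-class IoU computation, assuming the ground truth and prediction are 80x48x80 arrays of class indices:

import numpy as np

def class_iou(gt, pred, num_classes=12):
    """Per-class IoU between ground-truth and predicted voxel class maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(gt == c, pred == c).sum()
        union = np.logical_or(gt == c, pred == c).sum()
        ious.append(inter / float(union) if union > 0 else float('nan'))
    return ious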

-mean Average Precision (mAP)

The bar chart and the line chart show the same breakdown as for IoU.
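
A sketch of how per-class average precision could be computed with sklearn.metrics (listed in the prerequisites); the array layout is an assumption and the repository's evaluation code may differ:

import numpy as np
from sklearn.metrics import average_precision_score

def class_average_precision(gt_onehot, pred_prob, num_classes=12):
    """Average precision per class from voxel-wise class probabilities.

    gt_onehot : [N, 80, 48, 80, 12] one-hot ground truth
    pred_prob : [N, 80, 48, 80, 12] softmax scores
    """
    aps = []
    for c in range(num_classes):
        y_true = gt_onehot[..., c].ravel()
        y_score = pred_prob[..., c].ravel()
        aps.append(average_precision_score(y_true, y_score))
    return aps  # mAP is the mean over the classes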

These results suggest the following:

  • The fully convolutional latent space improves reconstruction performance even though the number of latent dimensions is the same.
  • The alphaGAN architecture also improves reconstruction performance.

Evaluations

Interpolation

Here are the interpolation results. A GIF of the interpolation is posted at the top of this document.

The latent space walkthrough gives smooth transitions between scenes.
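
A minimal sketch of how such a walkthrough can be produced: linearly interpolate between two encoded latent volumes and decode each step with the generator and refiner (the step count is illustrative).

import numpy as np

def interpolate_latents(z_a, z_b, steps=8):
    """Linear interpolation between two 5x3x5x16 latent codes."""
    return [(1.0 - t) * z_a + t * z_b for t in np.linspace(0.0, 1.0, steps)]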

Interpretation of latent space

The charts below show 2D mappings, obtained by SVD, of 200 encoded samples. The grayscale gradations follow a 1D SVD embedding of the centroid coordinates of each scene. The left chart uses the fully convolutional latent space; the right uses a standard 1D latent vector of 1200 dimensions.

The fully convolutional chart follows the 1D embedding of the centroid coordinates from lower right to upper left. This means the fully convolutional latent space is more closely related to spatial context than that of the standard VAE.
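
A sketch of the SVD projection used for such charts, assuming the 200 latent codes are flattened to 1200-dimensional vectors; centering and taking the top two right singular vectors is the usual SVD/PCA recipe, while the plotting details are omitted.

import numpy as np

def svd_2d_embedding(latents):
    """Project encoded samples to 2D with SVD.

    latents : [200, 1200] array of flattened latent codes (5*3*5*16 = 1200).
    """
    centered = latents - latents.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered.dot(vt[:2].T)  # [200, 2] coordinates for the scatter plot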

The figures below show the effects of the individual spatial dimensions of the 5x3x5 latent space. Normally distributed noise was added to each individual dimension, and the amount of change from the original scene is shown in red.

This shows that, with the fully convolutional structure, each spatial dimension of the latent space is tied to the generation of the corresponding region of the scene.
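
A minimal sketch of this perturbation, assuming the latent code is a numpy array of shape 5x3x5x16 and the noise scale is illustrative; decoding the perturbed code and diffing it against the original scene gives the red change maps described above.

import numpy as np

def perturb_spatial_dim(z, i, j, k, sigma=1.0):
    """Add Gaussian noise at one spatial position of the 5x3x5x16 latent code."""
    z_noisy = z.copy()
    z_noisy[i, j, k, :] += np.random.normal(0.0, sigma, size=z.shape[-1])
    return z_noisy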

Suggestions of future work

-Revise the dataset

This dataset is extremely sparse and highly varied. Floors and small objects appear in a huge variety of positions, and some small parts, such as the legs of chairs, broke apart in the dataset because of the downsizing. This makes the latent space very hard to predict. It would therefore be important to revise the dataset, for example by limiting the variety or adjusting the positions of objects.

-Redefine the latent space

In this work, I defined a single latent space that contains all information, such as the shapes and positions of every object. As a result, some small objects disappeared in the generated models, and many unrealistic objects were generated. To solve this, it would be important to redefine the latent space, for example by separating it into per-object and layout components. However, that requires handling a larger variety of objects and taking multiple instances of objects into account.

Installation

This package requires Python 2.7. If you don't have the following prerequisites, install them with pip, apt-get, etc. before cloning this repository.
At least 12 GB of GPU memory is also required.

Prerequisites

The following is my environment.

  • Base
 - tensorflow 1.12.0
 - numpy 1.15.1
 - easydict 1.9
  • Evaluation
 - scikit-learn 0.19.2 (sklearn.metrics)
  • Visualization
 - vtk 8.1.2

Download

  • Download the repository and go to the directory.
    $ git clone https://github.com/yunishi3/3D-FCR-alphaGAN.git
    $ cd 3D-FCR-alphaGAN

  • Download and unzip the dataset (about 57 GB).
    $ wget http://yunishi.s3.amazonaws.com/3D_FCRaGAN/Scenevox.tar.gz
    $ tar xfvz Scenevox.tar.gz

Training

$ python main.py --mode train

Evaluation

If you want to use the pretrained model, you can download the checkpoint files with the following instructions.

  • Download and unzip the pretrained checkpoint.
    It contains checkpoint10000*, corresponding to confirmation epoch 10000.
    $ wget http://yunishi.s3.amazonaws.com/3D_FCRaGAN/Checkpt.tar.gz
    $ tar xfvz Checkpt.tar.gz

Alternatively, if you want to evaluate your own trained model, replace confirmation epoch 10000 with the confirmation epoch you want to check.

-Evaluate reconstruction performance on the test data and create visualization files.

$ python main.py --mode evaluate_recons --conf_epoch 10000

After execution, the following files are created in the eval directory (a loading sketch follows the list).

  • real.npy : Reference models chosen as test data.
  • recons.npy : Reconstructed models before refinement, encoded and decoded from the reference models.
  • recons_refine.npy : Reconstructed models after refinement.
  • generate.npy : Models generated from the normal distribution, before refinement.
  • generate_refine.npy : Generated models after refinement.
  • AP(_refine).csv : Mean average precision results for the reconstructed models.
  • IoU(_refine).csv : Intersection over Union results for the reconstructed models.
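
A minimal sketch of loading and comparing these outputs with numpy; the array layout (scene index first, then an 80x48x80 grid of class labels) is an assumption, not a documented format.

import numpy as np

real = np.load('eval/real.npy')
recons = np.load('eval/recons_refine.npy')

# Fraction of voxels whose class matches the reference, per scene.
per_scene_acc = (real == recons).reshape(real.shape[0], -1).mean(axis=1)
print(per_scene_acc.mean())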

-Evaluate the interpolation

$ python main.py --mode evaluate_interpolate --conf_epoch 10000

After execution, you get interpolation files like the interpolation results shown above. Note that many large files will be created.

-Evaluate the effect of individual spatial dimensions

$ python main.py --mode evaluate_noise --conf_epoch 10000

After execution, you get noise files like the latent-space interpretation results shown above. Note that many large files will be created.

Visualization

In order to visualize the npy files created by the evaluation process, you need Python VTK. See [7] for the details of this code; I only modified the original code to fit 3D multi-object scenes.

  • Go to the eval directory.
    $ cd eval

  • Visualize all and save the png files.
    $ python screenshot.py ***.npy

  • Visualize only 1 file.
    $ python visualize.py ***.npy -i 1
    You can change which model is visualized with the index option -i.

References

[1]Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum; Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling; arXiv:1610.07584v1
[2]Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston; Generative and Discriminative Voxel Modeling with Convolutional Neural Networks; arXiv:1608.04236v2
[3]Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser; Semantic Scene Completion from a Single Depth Image; arXiv:1611.08974v1
[4]Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed; Variational Approaches for Auto-Encoding Generative Adversarial Networks; arXiv:1706.04987v1
[5]Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb; Learning from Simulated and Unsupervised Images through Adversarial Training; arXiv:1612.07828v1
[6]Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese; 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction; arXiv:1604.00449v1
[7]https://github.com/zck119/3dgan-release/tree/master/visualization/python
