Unofficial implementation (replicates paper results!) of MINER: Multiscale Implicit Neural Representations in pytorch-lightning

Overview

MINER_pl

Unofficial implementation of MINER: Multiscale Implicit Neural Representations in pytorch-lightning.

image

📖 Ref readings

⚠️ Main differences w.r.t. the original paper before continue:

  • In the pseudo code on page 8, where the author states Weight sharing for images, it means finer level networks are initialized with coarser level network weights. However, I did not find the correct way to implement this. Therefore, I initialize the network weights from scratch for all levels.
  • The paper says it uses sinusoidal activation (does he mean SIREN? I don't know), but I use gaussian activation (in hidden layers) with trainable parameters (per block) like my experiments in the other repo. In finer levels where the model predicts laplacian pyramids, I use sinusoidal activation x |-> sin(ax) with trainable parameters a (per block) as output layer (btw, this performs significantly better than simple tanh). Moreover, I precompute the maximum amplitude for laplacian residuals, and use it to scale the output, and I find it to be better than without scaling.
  • I experimented with a common trick in coordinate mlp: positional encoding and find that using it can increase training time/accuracy with the same number of parameters (by reducing 1 layer). This can be turned on/off by specifying the argument --use_pe. The optimal number of frequencies depends on the patch size, the larger patch sizes, the more number of frequencies you need and vice versa.
  • Some difference in the hyperparameters: the default learning rate is 3e-2 instead of 5e-4. Optimizer is RAdam instead of Adam. Block pruning happens when the loss is lower than 1e-4 (i.e. when PSNR>=40) for image and 5e-3 for occupancy rather than 2e-7.

💻 Installation

  • Run pip install -r requirements.txt.
  • Download the images from Acknowledgement or prepare your own images into a folder called images.
  • Download the meshes from Acknowledgement or prepare your own meshes into a folder called meshes.

🔑 Training

image

Pluto example:

python train.py \
    --task image --path images/pluto.png \
    --input_size 4096 4096 --patch_size 32 32 --batch_size 256 --n_scales 4 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 200 \
    --exp_name pluto4k_4scale

Tokyo station example:

python train.py \
    --task image --path images/tokyo-station.jpg \
    --input_size 6000 4000 --patch_size 25 25 --batch_size 192 --n_scales 5 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 50 150 \
    --exp_name tokyo6k_5scale
Image (size) Train time (s) GPU mem (MiB) #Params (M) PSNR
Pluto (4096x4096) 53 3171 9.16 42.14
Pluto (8192x8192) 106 6099 28.05 45.09
Tokyo station (6000x4000) 68 6819 35.4 42.48
Shibuya (7168x2560) 101 8967 17.73 37.78
Shibuya (14336x5120) 372 8847 75.42 39.32
Shibuya (28672x10240) 890 10255 277.37 41.93
Shibuya (28672x10240)* 1244 6277 98.7 37.59

*paper settings (6 scales, each network has 4 layer with 9 hidden units)

The original image will be resized to img_wh for reconstruction. You need to make sure img_wh divided by 2^(n_scales-1) (the resolution at the coarsest level) is still a multiple of patch_wh.


mesh

First, convert the mesh to N^3 occupancy grid by

python preprocess_mesh.py --N 512 --M 1 --T 1 --path <path/to/mesh> 

This will create N^3 occupancy to be regressed by the neural network. For detailed options, please see preprocess_mesh.py. Typically, increase M or T if you find the resulting occupancy bad.

Next, start training (bunny example):

python train.py \
    --task mesh --path occupancy/bunny_512.npy \
    --input_size 512 --patch_size 16 --batch_size 512 --n_scales 4 \
    --use_pe --n_freq 5 --n_layers 2 --n_hidden 8 \
    --loss_thr 5e-3 --b_chunks 512 \
    --num_epochs 50 50 50 150 \
    --exp_name bunny512_4scale

For full options, please see here. Some important options:

  • If your GPU memory is not enough, try reducing batch_size.
  • By default it will not log intermediate images to tensorboard to save time. To visualize image reconstruction and active blocks, add --log_image argument.

You are recommended to monitor the training progress by

tensorboard --logdir logs

where you can see training curves and images.

🟥 🟩 🟦 Block decomposition

To reconstruct the image using trained model and to visualize block decomposition per scale like Fig. 4 in the paper, see image_test.ipynb or mesh_test.ipynb

Examples:

💡 Implementation tricks

  • Setting num_workers=0 in dataloader increased the speed a lot.
  • As suggested in training details on page 4, I implement parallel block inference by defining parameters of shape (n_blocks, n_in, n_out) and use @ operator (same as torch.bmm) for faster inference.
  • To perform block pruning efficiently, I create two copies of the same network, and continually train and prune one of them while copying the trained parameters to the target network (somehow like in reinforcement learning, e.g. DDPG). This allows the network as well as the optimizer to shrink, therefore achieve higher memory and speed performance.
  • In validation, I perform inference in chunks like NeRF, and pass each chunk to cpu to reduce GPU memory usage.

💝 Acknowledgement

Further readings

During a stream, my audience suggested me to test on this image with random pixels:

random

The default 32x32 patch size doesn't work well, since the texture varies too quickly inside a patch. Decreasing to 16x16 and increasing network hidden units make the network converge right away to 43.91 dB under a minute. Surprisingly, with the other image reconstruction SOTA instant-ngp, the network is stuck at 17 dB no matter how long I train.

ngp-random

Is this a possible weakness of instant-ngp? What effect could it bring to real application? You are welcome to test other methods to reconstruct this image!

You might also like...
Unofficial pytorch implementation of paper
Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

One-Shot Free-View Neural Talking Head Synthesis Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Vide

RGBD-Net - This repository contains a pytorch lightning implementation for the 3DV 2021 RGBD-Net paper.
RGBD-Net - This repository contains a pytorch lightning implementation for the 3DV 2021 RGBD-Net paper.

[3DV 2021] We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network.

An essential implementation of BYOL in PyTorch + PyTorch Lightning
An essential implementation of BYOL in PyTorch + PyTorch Lightning

Essential BYOL A simple and complete implementation of Bootstrap your own latent: A new approach to self-supervised Learning in PyTorch + PyTorch Ligh

Neural Scene Flow Fields using pytorch-lightning, with potential improvements
Neural Scene Flow Fields using pytorch-lightning, with potential improvements

nsff_pl Neural Scene Flow Fields using pytorch-lightning. This repo reimplements the NSFF idea, but modifies several operations based on observation o

Spiking Neural Network for Computer Vision using SpikingJelly framework and Pytorch-Lightning

Spiking Neural Network for Computer Vision using SpikingJelly framework and Pytorch-Lightning

The LaTeX and Python code for generating the paper, experiments' results and visualizations reported in each paper is available (whenever possible) in the paper's directory
The LaTeX and Python code for generating the paper, experiments' results and visualizations reported in each paper is available (whenever possible) in the paper's directory

This repository contains the software implementation of most algorithms used or developed in my research. The LaTeX and Python code for generating the

A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.

NeRF-pytorch NeRF (Neural Radiance Fields) is a method that achieves state-of-the-art results for synthesizing novel views of complex scenes. Here are

Implementation of the method proposed in the paper
Implementation of the method proposed in the paper "Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation"

Neural Descriptor Fields (NDF) PyTorch implementation for training continuous 3D neural fields to represent dense correspondence across objects, and u

Neural implicit reconstruction experiments for the Vector Neuron paper
Neural implicit reconstruction experiments for the Vector Neuron paper

Neural Implicit Reconstruction with Vector Neurons This repository contains code for the neural implicit reconstruction experiments in the paper Vecto

Comments
  • Official MINER implementation

    Official MINER implementation

    Greetings,

    Thank you for the interest in our work. I have also benefited from several other of your repositories including the one on coordinate MLPs.

    We now have an official implementation for MINER here: https://github.com/vishwa91/miner . Would be great if you could add it to the README.

    Once again, thanks a lot for replicating our results!

    Regards.

    opened by vishwa91 1
Owner
AI葵
AI R&D in computer vision. Doing VTuber about DL algorithms. Check my channel! If you find my works helpful, please consider sponsoring! 我有在做VTuber,歡迎訂閱我的頻道!
AI葵
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields.

This repository contains the code release for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. This implementation is written in JAX, and is a fork of Google's JaxNeRF implementation. Contact Jon Barron if you encounter any issues.

Google 605 Dec 1, 2022
Code for the paper "Implicit Representations of Meaning in Neural Language Models"

Implicit Representations of Meaning in Neural Language Models Preliminaries Create and set up a conda environment as follows: conda create -n state-pr

Belinda Li 39 Nov 3, 2022
Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Gyeongjae Choi 17 Sep 23, 2021
A simple, unofficial implementation of MAE using pytorch-lightning

Masked Autoencoders in PyTorch A simple, unofficial implementation of MAE (Masked Autoencoders are Scalable Vision Learners) using pytorch-lightning.

Connor Anderson 19 Sep 29, 2022
Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yu

UT-Austin Robot Perception and Learning Lab 62 Nov 29, 2022
Code for Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations

Implementation for Iso-Points (CVPR 2021) Official code for paper Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations paper |

Yifan Wang 66 Nov 8, 2022
Unofficial pytorch-lightning implement of Mip-NeRF

mipnerf_pl Unofficial pytorch-lightning implement of Mip-NeRF, Here are some results generated by this repository (pre-trained models are provided bel

Jianxin Huang 153 Nov 30, 2022
This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"

Differentiable Volumetric Rendering Paper | Supplementary | Spotlight Video | Blog Entry | Presentation | Interactive Slides | Project Page This repos

null 690 Nov 30, 2022
PyTorch framework, for reproducing experiments from the paper Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks. Code, based on the PyTorch framework, for reprodu

Asaf 2 Jun 27, 2022