Supplemental learning materials for "Fourier Feature Networks and Neural Volume Rendering"


This repository is a companion to a lecture given at the University of Cambridge Engineering Department, which is available for viewing here. In it you will find the code to reproduce all of the visualizations and experiments shared in the lecture, as well as a Jupyter Notebook providing interactive lecture notes covering the following topics:

  1. 1D Signal Reconstruction
  2. 2D Image Regression
  3. Volume Raycasting
  4. 3D Volume Rendering with NeRF

Getting Started

In this section I will outline how to run the various experiments. Before I begin, it is worth noting that while the defaults are all reasonable and will produce the results you see in the lecture, it can be very educational to play around with different hyperparameter values and observe the results.

In order to run the various experiments, you will first need to install the requirements for the repository, ideally in a virtual environment. I recommend using Python 3.7 or later. As this code relies heavily upon PyTorch, you should install the correct version for your platform. The guide here is very useful and I suggest you follow it closely. You may also find this site helpful if you are working on Windows. Once that is done, you can run the following:

pip install wheel
pip install -r requirements.txt

You should now be ready to run any of the experiment scripts in this repository.

Fourier Feature Networks

This repository contains implementations of the research presented in Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains and NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Those who use this code should be sure to cite them, and to also take a look at our own work in this space, FastNeRF: High-Fidelity Neural Rendering at 200FPS.

Fourier Feature Networks address an inherent difficulty in teaching neural nets to model complex signals from low-frequency inputs. They do this by introducing Fourier features as a preprocessing step, which encodes the low-frequency inputs so as to introduce higher-frequency information, as seen below for the 1D case:

1D Fourier Feature Network
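
As a rough illustration of the encoding step described above, here is a minimal NumPy sketch of two common Fourier feature encodings: a positional encoding with power-of-two frequencies, and a Gaussian encoding with randomly drawn frequencies. The function names, the `sigma` parameter and the `2 * pi` scaling are assumptions made for this example; the repository's own implementation may differ in its details.

```python
import numpy as np


def positional_encoding(x, num_frequencies):
    """Encode (N, D) inputs with sines/cosines at power-of-two frequencies."""
    x = np.asarray(x, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_frequencies)          # (F,)
    angles = 2 * np.pi * x[..., None] * freqs          # (N, D, F)
    angles = angles.reshape(len(x), -1)                # (N, D * F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def gaussian_encoding(x, num_frequencies, sigma, seed=0):
    """Encode (N, D) inputs using frequencies drawn from N(0, sigma^2)."""
    x = np.asarray(x, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # hypothetical fixed random projection; it is not trained
    b = rng.normal(scale=sigma, size=(x.shape[1], num_frequencies))
    angles = 2 * np.pi * x @ b                         # (N, F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


# a single time value per sample, as in the 1D case
t = np.linspace(0, 1, 8).reshape(-1, 1)
pos = positional_encoding(t, num_frequencies=4)
gauss = gaussian_encoding(t, num_frequencies=16, sigma=10.0)
print(pos.shape, gauss.shape)
```

Either encoding turns each scalar input into a vector of features that oscillate at a range of frequencies, which is what the network then consumes in place of the raw value.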

Ultimately the Fourier features replace the featurizer, or kernel, that the neural net would otherwise need to learn. As shown above, Fourier Feature Networks can be used to predict a 1-D signal from a single floating point value indicating time. They can also be used to predict image pixel values from their position and, most intriguingly, predict color and opacity from 3D position and view direction, i.e. to model a radiance field. The ability to do that allows the creation of rendered neural avatars, like the one below:

neural_rendered_avatar.mp4

It also allows for neurally rendered objects which have believable material properties and view-dependent effects.

The code contained in this repository is intended for use as supplemental learning materials to the lecture. The Lecture Notes in particular will provide a walkthrough of the technical content. This README is focused more on how to run these scripts to reproduce experimental results and/or run your own experiments using this code.

Data

As in the lecture, you can access any of a variety of datasets for use in running these (or your own) experiments:

1D Datasets

The SignalDataset class can take any function mapping a single input to a single output. Feel free to experiment. Here is an example of how to create one:

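import numpy as np
# assumes the repository's package has already been imported as ffn

def _multifreq(x):
    # sum of three sinusoids at different frequencies, offset by 2
    return np.sin(x) + 0.5*np.sin(2*x) - 0.2*np.cos(5*x) + 2

num_samples = 32
sample_rate = 8
dataset = ffn.SignalDataset.create(_multifreq, num_samples, sample_rate)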

2D Datasets

The PixelDataset class can take any path to an image. Create one like this:

dataset = ffn.PixelDataset.create(path_to_image_file, color_space="RGB",
                                   size=512)

3D Datasets

This is where the library becomes a bit picky about input data. The RayDataset supports a set format for data, and we provide several datasets in this format to play with. These datasets are not stored in the repo, but the library will automatically download them to the data folder when you first request them, which you can do like so:

dataset = ffn.RayDataset.load("antinous_400.npz", split="train", num_samples=64)

We recommend you use one of the following (all datasets are provided in 400 and 800 versions):

| Name | Image Size | # Train | # Val | # Test | Description |
|------|------------|---------|-------|--------|-------------|
| antinous_(size) | (size)x(size) | 100 | 7 | 13 | Renders of a sculpture kindly provided by the Fitzwilliam Museum. Does not include view-dependent effects. |
| rubik_(size) | (size)x(size) | 100 | 7 | 13 | This work is based on "Rubik's Cube" (https://sketchfab.com/3d-models/rubiks-cube-d7d8aa83720246c782bca30dbadebb98) by BeyondDigital (https://sketchfab.com/BeyondDigital) licensed under CC-BY-4.0 (http://creativecommons.org/licenses/by/4.0/). Does not include view-dependent effects. |
| lego_(size) | (size)x(size) | 100 | 7 | 13 | Physically based renders of a lego build, provided by the NeRF authors. |
| trex_(size) | (size)x(size) | 100 | 7 | 13 | This work is based on "The revenge of the traditional toys" (https://sketchfab.com/3d-models/the-revenge-of-the-traditional-toys-d2dd1ee7948343308cd732c665ef1337) by Bastien Genbrugge (https://sketchfab.com/bastienBGR) licensed under CC-BY-4.0 (http://creativecommons.org/licenses/by/4.0/). Rendered with PBR and thus includes multiple view-dependent effects. |
| benin_(size) | (size)x(1.5*size) | 74 | 10 | 0 | Free moving, hand-held photographs of a bronze statue of a rooster from Benin, kindly provided by Jesus College, Cambridge. |
| matthew_(size) | (size)x(size) | 26 | 5 | 0 | Photographs of me, taken by a 31 camera fixed rig. |

If you want to bring your own data, the format we support is an NPZ with the following tensors:

| Name | Shape | dtype | Description |
|------|-------|-------|-------------|
| images | (C, D, D, 4) | uint8 | Tensor of camera images with RGBA pixel values. Alpha value indicates a mask around the object (where appropriate). |
| intrinsics | (C, 3, 3) | float32 | Tensor of camera intrinsics (i.e. projection) matrices |
| extrinsics | (C, 4, 4) | float32 | Tensor of camera extrinsics (i.e. camera to world) matrices |
| bounds | (4, 4) | float32 | Rigid transform indicating the bounds of the volume to be rendered. Will be used to transform a unit cube. |
| split_counts | (3) | int32 | Number of cameras (in order) for train, val and test data splits. |

where C is the number of cameras and D is the image resolution. You may find it helpful to use the provided datasets as a reference.
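
To make the format above concrete, here is a minimal sketch that writes an NPZ with the listed tensors using NumPy and reads it back. All values are placeholders: the file name, the focal length, the camera placement and the identity bounds are assumptions for illustration only, and the camera axis conventions the library expects are not specified here, so treat the provided datasets as the authoritative reference.

```python
import os
import tempfile

import numpy as np

num_cameras, resolution = 4, 8                       # C and D in the table
split_counts = np.array([2, 1, 1], dtype=np.int32)   # train, val, test

# RGBA images; an alpha of 255 marks every pixel as inside the object mask
images = np.zeros((num_cameras, resolution, resolution, 4), dtype=np.uint8)
images[..., 3] = 255

# placeholder pinhole intrinsics, repeated for every camera
focal = float(resolution)
center = resolution / 2
intrinsic = np.array([[focal, 0, center],
                      [0, focal, center],
                      [0, 0, 1]], dtype=np.float32)
intrinsics = np.tile(intrinsic, (num_cameras, 1, 1))

# placeholder camera-to-world matrices (identity rotation, offset along z)
extrinsics = np.tile(np.eye(4, dtype=np.float32), (num_cameras, 1, 1))
extrinsics[:, 2, 3] = -3.0

# identity transform: the rendered volume is the untransformed unit cube
bounds = np.eye(4, dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "my_dataset_8.npz")
np.savez(path, images=images, intrinsics=intrinsics, extrinsics=extrinsics,
         bounds=bounds, split_counts=split_counts)

loaded = np.load(path)
for name in loaded.files:
    print(name, loaded[name].shape, loaded[name].dtype)
```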

Experiments

These experiments form the basis of the results that you may have already seen in the lecture. With a sufficiently powerful GPU (or access to one in Azure or another cloud service) you should be able to reproduce all the animations and videos you have seen. In this section, I will provide a brief guide to how to use the different scripts that you will find in the root directory of the repo.

1D Signal Regression

The 1D Signal Regression script can be invoked like so:

python train_signal_regression.py multifreq outputs/multifreq

You should see a window pop up that looks like the image below:

1D Signal Training

2D Image Regression

To get started with 2D Image Regression, run the following command:

python train_image_regression.py cat.jpg mlp outputs/cat_mlp

A window should pop up as the system trains that looks like this:

Image Regression

At the end it will show you the result, which as you will have come to expect from the lecture is severely lacking in detail due to the lack of high-frequency gradients. Try running the same script with positional or gaussian in place of mlp to see how using Fourier features dramatically improves the quality. Your results should look like what you see below:

ir_pos.mp4

Feel free to pass the script your own images and see what happens!

Ray Sampling

As preparation for working with volume rendering, it can be useful to get a feel for the training data. To do so, run:

python test_ray_sampling.py lego_400.npz lego_400_rays.html

This should download the dataset into the data directory and then create a scenepic showing what the ray sampling data looks like. Notice how the rays pass from the camera through the pixels and into the volume. Try running this script again with --stratified to see what happens when we add some uniform noise to the samples. Here is an example of what this can look like:

ray_sampling_crop.mp4
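
The idea behind the ray sampling shown above can be sketched in a few lines of NumPy. This is not the repository's implementation: the function names are hypothetical, and the sketch assumes a pinhole camera looking along +z with rays cast through pixel centres. The stratified option splits each ray into equal bins and adds uniform noise within each bin, rather than always sampling the bin centre.

```python
import numpy as np


def pixel_rays(intrinsics, camera_to_world, resolution):
    """Return world-space origins and unit directions, one ray per pixel."""
    coords = np.arange(resolution) + 0.5                 # pixel centres
    u, v = np.meshgrid(coords, coords)
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(u.size)], axis=-1)
    directions = pixels @ np.linalg.inv(intrinsics).T    # camera space
    directions = directions @ camera_to_world[:3, :3].T  # world space
    directions /= np.linalg.norm(directions, axis=-1, keepdims=True)
    origins = np.broadcast_to(camera_to_world[:3, 3], directions.shape)
    return origins, directions


def sample_depths(num_rays, num_samples, near, far, stratified=False, seed=0):
    """Pick one depth per bin: the centre, or a uniform draw if stratified."""
    edges = np.linspace(near, far, num_samples + 1)
    lower, upper = edges[:-1], edges[1:]
    if stratified:
        offsets = np.random.default_rng(seed).uniform(
            size=(num_rays, num_samples))
    else:
        offsets = np.full((num_rays, num_samples), 0.5)
    return lower + (upper - lower) * offsets


resolution = 8
intrinsics = np.array([[8.0, 0, 4], [0, 8.0, 4], [0, 0, 1]])
camera_to_world = np.eye(4)

origins, directions = pixel_rays(intrinsics, camera_to_world, resolution)
depths = sample_depths(len(directions), 16, near=2.0, far=6.0, stratified=True)

# 3D sample positions along every ray: (num_rays, num_samples, 3)
positions = origins[:, None, :] + depths[..., None] * directions[:, None, :]
print(positions.shape)
```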

Voxel-based Volume Rendering

Just like in the lecture, we'll start with voxel-based rendering. Run the following command:

python train_voxels.py lego_400.npz 128 outputs/lego_400_vox128

You should be able to train a voxel representation of a radiance field.
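
For readers who want to see how samples along a ray turn into a pixel, here is a minimal NumPy sketch of the compositing step, following the quadrature rule from the NeRF paper. It assumes the model outputs a non-negative density per sample; the function name and the treatment of the final interval as effectively infinite are choices made for this example, and the repository's ray caster may be organised differently.

```python
import numpy as np


def composite(colors, densities, depths):
    """Blend per-sample colors along each ray into a pixel color and depth.

    colors: (R, S, 3), densities: (R, S), depths: (R, S) sorted per ray.
    """
    # distance between neighbouring samples; last interval is open-ended
    deltas = np.diff(depths, axis=-1)
    last = np.full_like(depths[..., :1], 1e10)
    deltas = np.concatenate([deltas, last], axis=-1)

    # probability that the ray is absorbed within each interval
    alphas = 1.0 - np.exp(-densities * deltas)

    # probability that the ray reaches each sample without being absorbed
    survival = np.cumprod(1.0 - alphas, axis=-1)
    transmittance = np.concatenate(
        [np.ones_like(survival[..., :1]), survival[..., :-1]], axis=-1)

    weights = transmittance * alphas
    rgb = (weights[..., None] * colors).sum(axis=-2)
    depth = (weights * depths).sum(axis=-1)
    return rgb, depth, weights


# one ray with four samples; only the third sample is dense (and red)
depths = np.array([[1.0, 2.0, 3.0, 4.0]])
densities = np.array([[0.0, 0.0, 50.0, 0.0]])
colors = np.zeros((1, 4, 3))
colors[0, 2] = [1.0, 0.0, 0.0]

rgb, depth, weights = composite(colors, densities, depths)
print(rgb, depth)
```

In this toy case nearly all of the weight lands on the third sample, so the rendered color is close to red and the rendered depth is close to 3.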

Note: You may have trouble running this script (and the ones that follow) if your computer does not have a GPU with enough memory. See Running on Azure ML for information on how to run these experiments in the cloud.

If you look in the train and val folders in the output directory you can see images produced during training showing how the model improves over time. There is also a visualization of the model provided in the voxels.html scenepic. Here is an example of an image produced by the Ray Caster:

Raycaster Training Image

All of the 3D methods will produce these images when in default training mode. They show (in row major order): rendered image, depth, training/val image, and per-pixel error. You can also ask the script to make a video of the training process. For example, run:

python train_voxels.py lego_400.npz 128 outputs/lego_400_vox128 --make-video

It will produce the frames of the following video:

lego_400_vox128_train.mp4

Another way to visualize what the model has learned is to produce a voxelization of the model. This is different from the voxel-based volume rendering, in which multiple voxels contribute to a single sample. Rather, it is a sparse octree containing voxels at the places the model has determined are solid, thus providing a rough sense of how the model is producing the rendered images. You can produce a scenepic showing this via the following command:

python voxelize_model.py outputs/lego_400_vox128/voxels.pt lego_400.npz lego_400_voxels.html

This will work for any of the volumetric rendering models.

Tiny NeRF

The first neural rendering technique we looked at was so-called "Tiny" NeRF, in which the view direction is not incorporated but we only focus on the 3D position within the volume. You can train Tiny NeRF models using the following command:

python train_tiny_nerf.py lego_400.npz mlp outputs/lego_400_mlp/

Substitute positional or gaussian for mlp, as before, to try out different modes of Fourier encoding. You'll notice again the same low-resolution results for MLP and similarly improved results when Fourier features are introduced. Here is a side-by-side comparison of mlp and positional training for our datasets (top row is nearest training image to the orbit camera). Your results should be similar.

tiny_nerf_pos.mp4

NeRF

In the results above you may have noticed that specularities and transparency were not quite right. This is because those effects depend on the view direction, that is, where the camera is located in relation to the position being rendered. NeRF incorporates this by adding a novel structure to the fairly simple model we've used so far:

NeRF Diagram

First, the model is deeper, allowing it to encode more information about the radiance field (note the skip connection to address signal attenuation with depth). However, the key structural difference is that the ray direction is added before the final layer. A subtle but important point is that the opacity is predicted without the view direction, to encourage structural consistency.
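
The structure described above can be illustrated with a small NumPy forward pass using random, untrained weights. This is only a sketch: the class name, layer sizes and skip position are assumptions, not the repository's model, and in practice the positions and directions would be Fourier encoded before being passed in. What it does show is the key property: opacity is computed before the view direction is introduced, so it cannot depend on it.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Randomly initialised weights and zero bias for one linear layer."""
    scale = np.sqrt(2.0 / in_dim)
    return rng.normal(scale=scale, size=(in_dim, out_dim)), np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


class ViewDependentField:
    """Hypothetical NeRF-style model: position trunk, then direction."""

    def __init__(self, pos_dim, dir_dim, hidden=32, depth=4, skip_at=2):
        self.skip_at = skip_at
        self.trunk = []
        in_dim = pos_dim
        for i in range(depth):
            if i == skip_at:
                in_dim += pos_dim        # skip connection re-adds the input
            self.trunk.append(dense(in_dim, hidden))
            in_dim = hidden
        self.opacity_head = dense(hidden, 1)
        self.color_hidden = dense(hidden + dir_dim, hidden // 2)
        self.color_head = dense(hidden // 2, 3)

    def __call__(self, positions, directions):
        x = positions
        for i, (w, b) in enumerate(self.trunk):
            if i == self.skip_at:
                x = np.concatenate([x, positions], axis=-1)
            x = relu(x @ w + b)
        # opacity depends on position features only
        w, b = self.opacity_head
        opacity = relu(x @ w + b)[..., 0]
        # the view direction joins just before the color output
        w, b = self.color_hidden
        h = relu(np.concatenate([x, directions], axis=-1) @ w + b)
        w, b = self.color_head
        color = 1.0 / (1.0 + np.exp(-(h @ w + b)))
        return color, opacity


model = ViewDependentField(pos_dim=3, dir_dim=3)
positions = rng.uniform(-1, 1, size=(5, 3))
view_a = np.tile([0.0, 0.0, 1.0], (5, 1))
view_b = np.tile([1.0, 0.0, 0.0], (5, 1))

color_a, opacity_a = model(positions, view_a)
color_b, opacity_b = model(positions, view_b)
print(np.allclose(opacity_a, opacity_b))
```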

The other major difference from what has come before is that NeRF samples the volume in a different way. The technique performs two tiers of sampling. First, the authors sample using a coarse network, which determines which parts of the space are opaque, and then they use that to create a second set of samples which are used to train a fine network. For the purpose of this lecture, we do something very similar in spirit, which is to use the voxel model we trained above as the coarse model. You can see how this changes the sampling of the volume by running the test_ray_sampling.py script again:

python test_ray_sampling.py lego_400.npz lego_400_fine.html --opacity-model lego_400_vox128.pt

You should now be able to see how additional samples are clustering near the location of the model, as opposed to being only evenly distributed over the volume. This helps the NeRF model to learn detail. Try passing in --stratified again to see the effects for random sampling as well. The video below displays the results of different kinds of sampling, but you should explore it for yourself as well:

sampling.mp4
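
One common way to concentrate samples where a coarse model reports opacity is inverse transform sampling over the coarse weights, as in the NeRF paper. The sketch below handles a single ray in NumPy; the function name and the example weights are hypothetical, and the repository may derive its fine samples from the voxel model in a different way.

```python
import numpy as np


def sample_fine_depths(bin_edges, weights, num_fine, seed=0):
    """Draw depths along one ray in proportion to per-bin coarse weights."""
    rng = np.random.default_rng(seed)
    pdf = weights + 1e-5                 # keep empty bins samplable
    pdf = pdf / pdf.sum()
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])

    # invert the piecewise-linear CDF at uniformly drawn values
    u = rng.uniform(size=num_fine)
    idx = np.searchsorted(cdf, u, side="right") - 1
    idx = np.clip(idx, 0, len(pdf) - 1)
    frac = (u - cdf[idx]) / pdf[idx]
    widths = bin_edges[idx + 1] - bin_edges[idx]
    return np.sort(bin_edges[idx] + frac * widths)


# eight coarse bins between depth 2 and 6; the surface sits near depth 4
bin_edges = np.linspace(2.0, 6.0, 9)
coarse_weights = np.array([0.0, 0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0])

fine_depths = sample_fine_depths(bin_edges, coarse_weights, num_fine=64)
near_surface = np.mean((fine_depths >= 3.5) & (fine_depths <= 5.0))
print(near_surface)
```

Most of the fine samples land in the three bins that carry the coarse weight, which mirrors the clustering visible in the scenepic.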

NOTE: the Tiny NeRF model can also take advantage of fine sampling using an opacity model. Try it out!

You can train the NeRF model with the following command:

python train_nerf.py lego_400.npz outputs/lego_400_nerf --opacity-model lego_400_vox128.pt

While this model can train for many more steps than 50000 and continue to improve, you should already be able to see the increase in quality over the other models from adding in view direction. Here are some sample render orbits from the NeRF model:

antinous_800_nerf.mp4
lego_800_nerf.mp4
trex_800_nerf.mp4

You can produce these orbit videos yourself by calling, for example:

python orbit_video.py antinous_800_nerf.pt 800 outputs/antinous_render --opacity-model antinous_800_vox128.pt

Give it a try! That's it for the main experimental scripts. All of them have descriptive help statements, so be sure to explore your options and see what you can learn.

Running on Azure ML

It is outside of the scope of this lecture (or repository) to describe in detail how to get access to cloud computing resources for machine learning via Azure ML. However, there are some amazing resources out there already. For the purpose of this repository, all you need to do is complete this Quickstart Tutorial and download the config.json associated with your workspace into the root of the repository. You can then run any of the training scripts in Azure ML using the submit_aml_run.py script, like so:

python submit_aml_run.py cat <compute> train_image_regression.py "cat.jpg mlp outputs"

Here, cat is the experiment name (you can choose anything here) that will group different runs together, and <compute> should be replaced with the name of the compute target you want to use to run the experiment (which will need to have a GPU available). Finally, you provide the script name (in this case, train_image_regression.py, which I suggest you use while you are getting your workspace up and running) and the arguments to the script as a string. If you get an error, make certain you've run:

pip install -r requirements-azureml.txt

If everything is working, you should receive a link that lets you monitor the experiment and view the output images and results in your browser.

Releases(v1.1.0)
  • v1.1.0(Nov 26, 2021)

    Various improvements in this release:

    1. New trex_400 and trex_800 datasets, perfect for testing view-dependent effects, self-occlusion, and thin structures.
    2. cuda device flag is now configurable everywhere. I don't recommend running this on CPU, but the scripts/notebook no longer require a GPU
    3. The volume_raycasting animation is much cleaner and easier to follow now
    4. OcTree algorithms are a bit cleaner
    5. RayDataset handles the Sparse sampling mode correctly now
    6. Miscellaneous bug fixes
  • v1.0.0(Nov 23, 2021)

Owner
Matthew A Johnson