Pytorch Implementation of paper "Noisy Natural Gradient as Variational Inference"

Tony JiHyun Kim

Last update: Dec 2, 2022

Related tags

Deep Learning NoisyNaturalGradient

Overview

Noisy Natural Gradient as Variational Inference

PyTorch implementation of Noisy Natural Gradient as Variational Inference.

Requirements

Python 3
Pytorch
visdom

Comments

This paper is about how to optimize bayesian neural network which has matrix variate gaussian distribution.
This implementation contains Noisy Adam optimizer which is for Fully Factorized Gaussian(FFG) distribution, and Noisy KFAC optimizer which is for Matrix Variate Gaussian(MVG) distribution.
These optimizers only work with bayesian network which has specific structure that I will mention below.
Currently only linear layer is available.

Experimental comments

I addded a lr scheduler to noisy KFAC because loss is exploded during training. I guess this happens because of slight approximation.
For MNIST training noisy KFAC is 15-20x slower than noisy Adam, as mentioned in paper.
I guess the noisy KFAC needs more epochs to train simple neural network structure like 2 linear layers.

Usage

Currently only MNIST dataset are currently supported, and only fully connected layer is implemented.

Options

model : Fully Factorized Gaussian(FFG) or Matrix Variate Gaussian(MVG)
n : total train dataset size. need this value for optimizer.
eps : parameter for optimizer. Default to 1e-8.
initial_size : initial input tensor size. Default to 784, size of MNIST data.
label_size : label size. Default to 10, size of MNIST label.

More details in option_parser.py

Train

$ python train.py --model=FFG --batch_size=100 --lr=1e-3 --dataset=MNIST
$ python train.py --model=MVG --batch_size=100 --lr=1e-2 --dataset=MNIST --n=60000

Visualize

To visualize intermediate results and loss plots, run python -m visdom.server and go to the URL http://localhost:8097

Test

$ python test.py --epoch=20

Training Graphs

1. MNIST

network is consist of 2 linear layers.
FFG optimized by noisy Adam : epoch 20, lr 1e-3

MVG optimized by noisy KFAC : epoch 100, lr 1e-2, decay 0.1 for every 30 epochs
Need to tune learning rate.

Implementation detail

Optimizing parameter procedure is consists of 2 steps, Calculating gradient and Applying to bayeisan parameters.
Before forward, network samples parameters with means & variances.
Usually calling step function updates parameters, but not this case. After calling step function, you have to update bayesian parameters. Look at the ffg_model.py

TODOs

More benchmark cases
Supports bayesian convolution
Implement Block Tridiagonal Covariance, which is dependent between layers.

Code reference

Visualization code(visualizer.py, utils.py) references to pytorch-CycleGAN-and-pix2pix(https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) by Jun-Yan Zhu

Author

Tony Kim

An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" in Pytorch.

GLOM An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" for MNIST Dataset. To understand this

50 Oct 19, 2022

Official implementation of our CVPR2021 paper "OTA: Optimal Transport Assignment for Object Detection" in Pytorch.

OTA: Optimal Transport Assignment for Object Detection This project provides an implementation for our CVPR2021 paper "OTA: Optimal Transport Assignme

217 Jan 3, 2023

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

307 Jan 3, 2023

[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PointDSC repository PyTorch implementation of PointDSC for CVPR'2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency",

153 Dec 14, 2022

PyTorch implementation of the Deep SLDA method from our CVPRW-2020 paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis"

Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis This is a PyTorch implementation of the Deep Streaming Linear Discriminant

41 Dec 25, 2022

Pytorch Implementation of paper "Noisy Natural Gradient as Variational Inference"

Related tags

Overview

Noisy Natural Gradient as Variational Inference

Requirements

Comments

Experimental comments

Usage

Options

Train

Visualize

Test

Training Graphs

1. MNIST

Implementation detail

TODOs

Code reference

Author

You might also like...

An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" in Pytorch.

Official implementation of our CVPR2021 paper "OTA: Optimal Transport Assignment for Object Detection" in Pytorch.

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PyTorch implementation of the Deep SLDA method from our CVPRW-2020 paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis"

The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement.

Pytorch implementation of our paper under review — Lottery Jackpots Exist in Pre-trained Models

Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(2021) paper

Owner

Tony JiHyun Kim

HashNeRF-pytorch - Pure PyTorch Implementation of NVIDIA paper on Instant Training of Neural Graphics primitives

ALBERT-pytorch-implementation - ALBERT pytorch implementation

The LaTeX and Python code for generating the paper, experiments' results and visualizations reported in each paper is available (whenever possible) in the paper's directory

Official PyTorch implementation for paper Context Matters: Graph-based Self-supervised Representation Learning for Medical Images

A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduced the 67.8% Top1 Acc on ImageNet.

This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

Official pytorch implementation of paper "Image-to-image Translation via Hierarchical Style Disentanglement".

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

Official pytorch implementation of paper "Inception Convolution with Efficient Dilation Search" (CVPR 2021 Oral).

Official implementation of our paper "LLA: Loss-aware Label Assignment for Dense Pedestrian Detection" in Pytorch.