Python package facilitating the use of Bayesian Deep Learning methods with Variational Inference for PyTorch

Overview

PyVarInf

PyVarInf provides facilities to easily train your PyTorch neural network models using variational inference.

Bayesian Deep Learning with Variational Inference

Bayesian Deep Learning

Assume we have a dataset D = {(x1, y1), ..., (xn, yn)} where the x's are the inputs and the y's the outputs. The problem is to predict the y's from the x's. Further assume that p(D|θ) is the output of a neural network with weights θ. The network loss is defined as

Usually, when training a neural network, we try to find the parameter θ* which minimizes Ln(θ).

In Bayesian Inference, the problem is instead to study the posterior distribution of the weights given the data. Assume we have a prior α over ℝd. The posterior is

This can be used for model selection, or prediction with Bayesian Model Averaging.

Variational Inference

It is usually impossible to analytically compute the posterior distribution, especially with models as complex as neural networks. Variational Inference adress this problem by approximating the posterior p(θ|D) by a parametric distribution q(θ|φ) where φ is a parameter. The problem is then not to learn a parameter θ* but a probability distribution q(θ|φ) minimizing

F is called the variational free energy.

This idea was originally introduced for deep learning by Hinton and Van Camp [5] as a way to use neural networks for Minimum Description Length [3]. MDL aims at minimizing the number of bits used to encode the whole dataset. Variational inference introduces one of many data encoding schemes. Indeed, F can be interpreted as the total description length of the dataset D, when we first encode the model, then encode the part of the data not explained by the model:

  • LC(φ) = KL(q(.|φ)||α) is the complexity loss. It measures (in nats) the quantity of information contained in the model. It is indeed possible to encode the model in LC(φ) nats, with the bits-back code [4].
  • LE(φ) = Eθ ~ q(θ|φ)[Ln(θ)] is the error loss. It measures the necessary quantity of information for encoding the data D with the model. This code length can be achieved with a Shannon-Huffman code for instance.

Therefore F(φ) = LC(φ) + LE(φ) can be rephrased as an MDL loss function which measures the total encoding length of the data.

Practical Variational Optimisation

In practice, we define φ = (µ, σ) in ℝd x ℝd, and q(.|φ) = N(µ, Σ) the multivariate distribution where Σ = diag(σ12, ..., σd2), and we want to find the optimal µ* and σ*.

With this choice of a gaussian posterior, a Monte Carlo estimate of the gradient of F w.r.t. µ and σ can be obtained with backpropagation. This allows to use any gradient descent method used for non-variational optimisation [2]

Overview of PyVarInf

The core feature of PyVarInf is the Variationalize function. Variationalize takes a model as input and outputs a variationalized version of the model with gaussian posterior.

Definition of a variational model

To define a variational model, first define a traditional PyTorch model, then use the Variationalize function :

import pyvarinf
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.bn1 = nn.BatchNorm2d(10)
        self.bn2 = nn.BatchNorm2d(20)

    def forward(self, x):
        x = self.bn1(F.relu(F.max_pool2d(self.conv1(x), 2)))
        x = self.bn2(F.relu(F.max_pool2d(self.conv2(x), 2)))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x)

model = Net()
var_model = pyvarinf.Variationalize(model)
var_model.cuda()

Optimisation of a variational model

Then, the var_model can be trained that way :

optimizer = optim.Adam(var_model.parameters(), lr=0.01)

def train(epoch):
    var_model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = var_model(data)
        loss_error = F.nll_loss(output, target)
	# The model is only sent once, thus the division by
	# the number of datapoints used to train
        loss_prior = var_model.prior_loss() / 60000
        loss = loss_error + loss_prior
        loss.backward()
        optimizer.step()

for epoch in range(1, 500):
    train(epoch)

Available priors

In PyVarInf, we have implemented four families of priors :

Gaussian prior

The gaussian prior is N(0,Σ), with Σ the diagonal matrix diag(σ12, ..., σd2) defined such that 1/σi is the square root of the number of parameters in the layer, following the standard initialisation of neural network weights. It is the default prior, and do not have any parameter. It can be set with :

var_model.set_prior('gaussian')

Conjugate priors

The conjugate prior is used if we assume that all the weights in a given layer should be distributed as a gaussian, but with unknown mean and variance. See [6] for more details. This prior can be set with

var_model.set_prior('conjugate', n_mc_samples, alpha_0, beta_0, mu_0, kappa_0)

There are five parameters that have to bet set :

  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • mu_0, the prior sample mean
  • kappa_0, the number of samples used to estimate the prior sample mean
  • alpha_0 and beta_0, such that variance was estimated from 2 alpha_0 observations with sample mean mu_0 and sum of squared deviations 2 beta_0

Conjugate prior with known mean

The conjugate prior with known mean is similar to the conjugate prior. It is used if we assume that all the weights in a given layer should be distributed as a gaussian with a known mean but unknown variance. It is usefull in neural networks model when we assume that the weights in a layer should have mean 0. See [6] for more details. This prior can be set with :

var_model.set_prior('conjugate_known_mean', n_mc_samples, mean, alpha_0, beta_0)

Four parameters have to be set:

  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • mean, the known mean
  • alpha_0 and beta_0 defined as above

Mixture of two gaussian

The idea of using a mixture of two gaussians is defined in [1]. This prior can be set with:

var_model.set_prior('mixtgauss', n_mc_samples, sigma_1, sigma_2, pi)
  • n_mc_samples, the number of samples used in the Monte Carlo estimation of the prior loss and its gradient.
  • sigma_1 and sigma_2 the std of the two gaussians
  • pi the probability of the first gaussian

Requirements

This module requires Python 3. You need to have PyTorch installed for PyVarInf to work (as PyTorch is not readily available on PyPi). To install PyTorch, follow the instructions described here.

References

  • [1] Blundell, Charles, Cornebise, Julien, Kavukcuoglu, Koray, and Wierstra, Daan. Weight Uncertainty in Neural Networks. In International Conference on Machine Learning, pp. 1613–1622, 2015.
  • [2] Graves, Alex. Practical Variational Inference for Neural Networks. In Neural Information Processing Systems, 2011.
  • [3] Grünwald, Peter D. The Minimum Description Length principle. MIT press, 2007.
  • [4] Honkela, Antti and Valpola, Harri. Variational Learning and Bits-Back Coding: An Information-Theoretic View to Bayesian Learning. IEEE transactions on Neural Networks, 15(4), 2004.
  • [5] Hinton, Geoffrey E and Van Camp, Drew. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. In Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993.
  • [6] Murphy, Kevin P. Conjugate Bayesian analysis of the Gaussian distribution., 2007.
You might also like...
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.

Knodle (Knowledge-supervised Deep Learning Framework) - a new framework for weak supervision with neural networks. It provides a modularization for se

PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

HNECV: Heterogeneous Network Embedding via Cloud model and Variational inference
HNECV: Heterogeneous Network Embedding via Cloud model and Variational inference

HNECV This repository provides a reference implementation of HNECV as described in the paper: HNECV: Heterogeneous Network Embedding via Cloud model a

TensorFlow implementation of
TensorFlow implementation of "Variational Inference with Normalizing Flows"

[TensorFlow 2] Variational Inference with Normalizing Flows TensorFlow implementation of "Variational Inference with Normalizing Flows" [1] Concept Co

 Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution
Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution

Deep Learning: Architectures & Methods Project: Deep Learning for Audio Super-Resolution Figure: Example visualization of the method and baseline as a

Use deep learning, genetic programming and other methods to predict stock and market movements

StockPredictions Use classic tricks, neural networks, deep learning, genetic programming and other methods to predict stock and market movements. Both

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Data-depth-inference - Data depth inference with python
Data-depth-inference - Data depth inference with python

Welcome! This readme will guide you through the use of the code in this reposito

Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.
Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Yolov5 running on TorchServe (GPU compatible) ! This is a dockerfile to run TorchServe for Yolo v5 object detection model. (TorchServe (PyTorch librar

Comments
  • [FIX] Param naming for torch>.5

    [FIX] Param naming for torch>.5

    Bug was raising the following error:

    Traceback (most recent call last):
      File "/home/diviyan/conda/lib/python3.6/site-packages/pyvarinf/vi.py", line 232, in __init__
        learn_mean, learn_rho)
      File "/home/diviyan/conda/lib/python3.6/site-packages/pyvarinf/vi.py", line 274, in _variationalize_module
        learn_mean, learn_rho)
      File "/home/diviyan/conda/lib/python3.6/site-packages/pyvarinf/vi.py", line 274, in _variationalize_module
        learn_mean, learn_rho)
      File "/home/diviyan/conda/lib/python3.6/site-packages/pyvarinf/vi.py", line 259, in _variationalize_module
        dico[name].mean)
      File "/home/diviyan/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 138, in register_parameter
        raise KeyError("parameter name can't contain \".\"")
    KeyError: 'parameter name can\'t contain "."'
    

    Best, Diviyan

    opened by Diviyan-Kalainathan 1
  • Error with LSTMs on GPU

    Error with LSTMs on GPU

    Hi @ctallec,

    Great package! Thanks for writing it.

    I recently found a bug that I wanted to seek your advice on. When I Variationalize an LSTM model, I see an error when porting it to CUDA. For example, here's the working version:

    net = nn.LSTM(10, 10)
    net.cuda()
    

    This works fine. Then,

    net = nn.LSTM(10, 10)
    net = pyvarinf.Variationalize(net)
    net.cuda()
    

    This gives the bug:

    ---------------------------------------------------------------------------
    
    StopIteration                             Traceback (most recent call last)
    
    <ipython-input-14-0c80e37b52e0> in <module>()
          1 net = nn.LSTM(10, 10)
          2 net = pyvarinf.Variationalize(net)
    ----> 3 net.cuda()
    
    3 frames
    
    /usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in flatten_parameters(self)
         94         Otherwise, it's a no-op.
         95         """
    ---> 96         any_param = next(self.parameters()).data
         97         if not any_param.is_cuda or not torch.backends.cudnn.is_acceptable(any_param):
         98             return
    
    StopIteration: 
    

    Any idea what this is from? Maybe I shouldn't be using this with an RNN-like model?

    Thanks! Miles

    opened by MilesCranmer 1
  • TypeError: super() takes at least 1 argument (0 given)

    TypeError: super() takes at least 1 argument (0 given)

    I used python2.7 to test the example:

    import pyvarinf import torch import torch.nn as nn import torch.nn.functional as F import torch.optim as optim class Net(nn.Module): def init(self): super(Net, self).init() self.conv1 = nn.Conv2d(1, 10, kernel_size=5) self.conv2 = nn.Conv2d(10, 20, kernel_size=5) self.fc1 = nn.Linear(320, 50) self.fc2 = nn.Linear(50, 10) self.bn1 = nn.BatchNorm2d(10) self.bn2 = nn.BatchNorm2d(20) def forward(self, x): x = self.bn1(F.relu(F.max_pool2d(self.conv1(x), 2))) x = self.bn2(F.relu(F.max_pool2d(self.conv2(x), 2))) x = x.view(-1, 320) x = F.relu(self.fc1(x)) x = self.fc2(x) return F.log_softmax(x) model = Net() var_model = pyvarinf.Variationalize(model) var_model.cuda()

    But Error report: TypeError: super() takes at least 1 argument (0 given)

    opened by tlok666 3
Owner
null
LBK 20 Dec 2, 2022
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

Bayesian Methods for Hackers Using Python and PyMC The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chap

Cameron Davidson-Pilon 25.1k Jan 2, 2023
mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms.

mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms. It provides easily interchangeable modeling and planning components, and a set of utility functions that allow writing model-based RL algorithms with only a few lines of code.

Facebook Research 724 Jan 4, 2023
A variational Bayesian method for similarity learning in non-rigid image registration (CVPR 2022)

A variational Bayesian method for similarity learning in non-rigid image registration We provide the source code and the trained models used in the re

daniel grzech 14 Nov 21, 2022
Facilitating Database Tuning with Hyper-ParameterOptimization: A Comprehensive Experimental Evaluation

A Comprehensive Experimental Evaluation for Database Configuration Tuning This is the source code to the paper "Facilitating Database Tuning with Hype

DAIR Lab 9 Oct 29, 2022
【CVPR 2021, Variational Inference Framework, PyTorch】 From Rain Generation to Rain Removal

From Rain Generation to Rain Removal (CVPR2021) Hong Wang, Zongsheng Yue, Qi Xie, Qian Zhao, Yefeng Zheng, and Deyu Meng [PDF&&Supplementary Material]

Hong Wang 48 Nov 23, 2022
Pytorch Implementation of paper "Noisy Natural Gradient as Variational Inference"

Noisy Natural Gradient as Variational Inference PyTorch implementation of Noisy Natural Gradient as Variational Inference. Requirements Python 3 Pytor

Tony JiHyun Kim 119 Dec 2, 2022
PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

PyTorch-LIT PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices. With

Amin Rezaei 157 Dec 11, 2022
Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit

pgmpy 2.2k Jan 3, 2023