TyXe: Pyro-based BNNs for Pytorch users
TyXe aims to simplify the process of turning Pytorch neural networks into Bayesian neural networks by leveraging the model definition and inference capabilities of Pyro. Our core design principle is to cleanly separate the construction of neural architecture, prior, inference distribution and likelihood, enabling a flexible workflow where each component can be exchanged independently. Defining a BNN in TyXe takes as little as 5 lines of code:
net = nn.Sequential(nn.Linear(1, 50), nn.Tanh(), nn.Linear(50, 1))
prior = tyxe.priors.IIDPrior(dist.Normal(0, 1))
likelihood = tyxe.likelihoods.HomoskedasticGaussian(scale=0.1)
inference = tyxe.guides.AutoNormal
bnn = tyxe.VariationalBNN(net, prior, likelihood, inference)
In the following, we assume that you (roughly) know what a BNN is mathematically.
Motivating example
Standard neural networks give us a single function that fits the data, but many different ones are typically plausible. With only a single fit, we don't know for what inputs the model is 'certain' (because there is training data nearby) and where it is uncertain.
Figure: maximum likelihood fit (left) and posterior samples (right).
Implementing the former can be achieved easily in a few lines of Pytorch code, but training a BNN that gives a distribution over different fits is typically more complicated and is specifically what we aim to simplify.
Training
Constructing a BNN object has been shown in the example above. For fitting the posterior approximation, we provide a high-level .fit method similar to libraries such as scikit-learn or keras:
optim = pyro.optim.Adam({"lr": 1e-3})
bnn.fit(data_loader, optim, num_epochs)
Prediction & evaluation
We further provide .predict and .evaluate methods. .predict makes predictions based on multiple samples from the approximate posterior and averages them based on the observation model; .evaluate additionally returns an error measure and the log likelihood:
predictions = bnn.predict(x_test, num_samples)
error, log_likelihood = bnn.evaluate(x_test, y_test, num_samples)
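As a rough illustration of what averaging "based on the observation model" can mean for a Gaussian likelihood, the sketch below combines per-sample predictions for a single test point. It is plain Python rather than TyXe code: the helper name `aggregate_gaussian_predictions` and its arguments are assumptions made for illustration, and TyXe's own aggregation may differ in detail.

```python
import math

def aggregate_gaussian_predictions(sample_means, target, scale):
    """Combine predictions from several posterior samples for one test point.

    sample_means: network outputs, one per posterior sample (hypothetical input)
    target: observed value y
    scale: fixed standard deviation of the Gaussian noise model
    """
    n = len(sample_means)
    # Predictive mean: average of the per-sample means.
    mean = sum(sample_means) / n
    # Log density of the target under each sample's Gaussian.
    log_probs = [
        -0.5 * ((target - m) / scale) ** 2
        - math.log(scale)
        - 0.5 * math.log(2 * math.pi)
        for m in sample_means
    ]
    # Log of the averaged likelihood, computed stably via log-sum-exp.
    top = max(log_probs)
    log_likelihood = (
        top + math.log(sum(math.exp(lp - top) for lp in log_probs)) - math.log(n)
    )
    squared_error = (mean - target) ** 2
    return mean, squared_error, log_likelihood

mean, err, ll = aggregate_gaussian_predictions([0.9, 1.1, 1.0], target=1.0, scale=0.1)
```

Averaging the likelihoods before taking the log, rather than averaging the log likelihoods, is what makes the result a Monte Carlo estimate of the posterior predictive density.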
Local reparameterization
We implement local reparameterization for factorized Gaussians as a poutine, which reduces gradient noise during training. Because it is a poutine, it can be enabled or disabled during both training and prediction with a context manager:
with tyxe.poutine.local_reparameterization():
    bnn.fit(data_loader, optim, num_epochs)
    bnn.predict(x_test, num_predictions)
At the moment, this poutine does not work with the AutoNormal and AutoDiagonalNormal guides in pyro, since those draw the weights from a Delta distribution, so you need to use tyxe.guides.ParameterwiseDiagonalNormal as your guide.
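To make the idea concrete, here is a minimal, dependency-free sketch of local reparameterization for a single linear layer with a factorized Gaussian over its weights. Instead of sampling every weight, it samples each pre-activation from the Gaussian that those weights induce. The function `local_reparam_linear` and its argument layout are hypothetical and only illustrate the technique; this is not TyXe's poutine implementation.

```python
import math
import random

def local_reparam_linear(x, w_mean, w_std, rng):
    """Sample the outputs of y = x @ W for W[i][j] ~ Normal(w_mean[i][j], w_std[i][j]).

    Each output is a sum of independent Gaussians, so it is itself Gaussian with
    mean sum_i x[i] * w_mean[i][j] and variance sum_i (x[i] * w_std[i][j]) ** 2.
    """
    outputs = []
    for j in range(len(w_mean[0])):
        mean = sum(x_i * row[j] for x_i, row in zip(x, w_mean))
        var = sum((x_i * row[j]) ** 2 for x_i, row in zip(x, w_std))
        # One noise draw per output unit instead of one per weight.
        outputs.append(mean + math.sqrt(var) * rng.gauss(0.0, 1.0))
    return outputs

rng = random.Random(0)
x = [1.0, 2.0]
w_mean = [[0.5, -1.0], [0.25, 0.0]]  # 2 inputs, 2 outputs
w_std = [[0.1, 0.2], [0.1, 0.2]]
sample = local_reparam_linear(x, w_mean, w_std, rng)
```

Because the noise is drawn per input and output unit, every data point in a mini-batch receives independent noise, which is where the reduction in gradient variance comes from.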
MCMC
We provide a unified interface to pyro's MCMC implementations. Simply use the tyxe.MCMC_BNN class and provide a kernel instead of the guide:
kernel = pyro.infer.mcmc.NUTS
bnn = tyxe.MCMC_BNN(net, prior, likelihood, kernel)
Any parameters that pyro's MCMC class accepts can be passed through the keyword arguments of the .fit method.
Continual learning
Because our design cleanly separates the prior from the guide, architecture and likelihood, it is easy to update the prior in a continual setting. For example, you can construct a tyxe.priors.DictPrior by extracting the distributions over all weights and biases from a ParameterwiseDiagonalNormal instance using the get_detached_distributions method and pass it to bnn.update_prior to implement Variational Continual Learning in a few lines of code. See examples/vcl.py for a basic example on split-MNIST and split-CIFAR.
Network architectures
We don't implement any layer classes. You construct your network in Pytorch and then turn it into a BNN, which makes it easy to apply the same prior and inference strategies to different neural networks.
Inference
For inference, we mainly provide an equivalent to pyro's AutoDiagonalNormal that is compatible with local reparameterization in tyxe.guides. This module also contains a few helper functions for initialization of Gaussian mean parameters, e.g. to the values of a pre-trained network. It should be possible to use any of pyro's autoguides for variational inference. See examples/resnet.py for a few options as well as initializing to pre-trained weights.
Priors
The priors can be found in tyxe.priors. We currently only support placing priors on the parameters. Through the expose and hide arguments in the init method, you can specify layers, types of layers and specific parameters over which you want to place a prior. This helps, for example, in learning the parameters of BatchNorm layers deterministically.
Likelihoods
tyxe.likelihoods contains classes that wrap the most common torch.distributions for specifying noise models of the data.
Installation
We recommend installing TyXe using conda with the provided environment.yml, which also installs all the dependencies for the examples except for Pytorch3d, which needs to be added manually. The environment assumes that you are using CUDA 11.0. If this is not the case, simply change the cudatoolkit and dgl-cuda versions before running:
conda env create -f environment.yml
conda activate tyxe
pip install -e .
Citation
If you use TyXe, please consider citing:
@article{ritter2021tyxe,
author = {Hippolyt Ritter and
Theofanis Karaletsos
},
title = {TyXe: Pyro-based Bayesian neural nets for Pytorch},
journal = {International Conference on Probabilistic Programming (ProbProg)},
volume = {},
pages = {},
year = {2020},
url = {https://arxiv.org/abs/2110.00276}
}