A Python tutorial on Bayesian modeling techniques (PyMC3)

Overview

Bayesian Modelling in Python

Welcome to "Bayesian Modelling in Python" - a tutorial for those interested in learning how to apply Bayesian modelling techniques in Python (PyMC3). This tutorial doesn't aim to be a Bayesian statistics tutorial, but rather a programming cookbook for those who understand the fundamentals of Bayesian statistics and want to learn how to build Bayesian models using Python. The tutorial sections and topics are listed below.

Contents

  • Introduction

    • Motivation for learning Bayesian statistics
    • Loading and parsing Hangout chat data
  • Section 1: Estimating model parameters

    • Frequentist technique for estimating parameters of a Poisson model (optimization routine)
    • Bayesian technique for estimating parameters of a Poisson model (MCMC)
  • Section 2: Model checking & comparison

    • Posterior predictive check
    • Bayes factor
  • Section 3: Hierarchical modeling

    • Model pooling (separate models)
    • Partial pooling (hierarchical models)
    • Shrinkage effect of partial pooling
  • Section 4: Bayesian regression

    • Bayesian fixed effects Poisson regression
    • Bayesian mixed effects Poisson regression
  • Section 5: Bayesian survival analysis

    • Survival model theory
    • Cox proportional hazard model
    • Aalen's additive hazard model
  • Section 6: Bayesian A/B tests

    • Bayesian test of proportions
    • Bayesian t-test (BEST)

Contributions

  • All contributions are more than welcome. They can be minor (spelling, better explanations, improved code/charts) or major (contribute a full section).
  • If you would like to contribute, please create a pull request in GitHub. Happy to discuss ideas before you begin working on the addition.
  • I would especially welcome any contributions that address: survival analysis, mixture models, time series models or A/B experiments.
  • If you're not familiar with GitHub - please email me at [email protected].

Motivation for learning Bayesian statistics

Statistics is a topic that never resonated with me throughout university. The frequentist techniques that we were taught (p-values, etc.) felt contrived, and ultimately I turned my back on statistics as a topic that I wasn't interested in.

That was until I stumbled upon Bayesian statistics - a branch of statistics quite different from the traditional frequentist statistics that most universities teach. I was inspired by a number of different publications, blogs & videos that I would highly recommend to any newcomers to Bayesian stats. They include:

I created this tutorial in the hope that others find it useful and it helps them learn Bayesian techniques just like the above resources helped me. I hope you find it useful and I'd welcome any corrections/comments/contributions from the community.

Note

This tutorial is actively being worked on. I'm keen to get feedback and welcome ideas/contributions.

Comments
  • Section 2 pymc3 error

    Section 2 - Model Check II - Bayes Factor

    Model fails

    14 
      15     y_like = pm.DensityDist('y_like',
    ---> 16              lambda value: pm.switch(tau, 
      17                  pm.Poisson.dist(mu_p).logp(value),
      18                  pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
    
    AttributeError: module 'pymc3' has no attribute 'switch'
    
    

    Found a fix suggested by Cameron Davidson-Pilon, which looks like it was intended to be merged into your code (but it seems it wasn't?). The suggested revision to the lambda works fine.

      y_like = pm.DensityDist('y_like',
                   lambda value: pm.math.switch(tau, 
                       pm.Poisson.dist(mu_p).logp(value),
                       pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
                   ),
                   observed=messages['time_delay_seconds'].values)
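
To make the role of `switch` concrete, here is a minimal pure-Python sketch (not PyMC3 code) of the quantity the lambda evaluates: a choice between a Poisson and a negative binomial log-likelihood, driven by the model index `tau`. The helper names and the sample data are hypothetical, and the negative binomial is written in the mean/dispersion (`mu`, `alpha`) form, which is assumed to match `pm.NegativeBinomial`.

```python
import math


def poisson_logp(value, mu):
    """Log-probability of a Poisson(mu) count."""
    return value * math.log(mu) - mu - math.lgamma(value + 1)


def negative_binomial_logp(value, mu, alpha):
    """Log-probability of a negative binomial count with mean mu and dispersion alpha."""
    return (
        math.lgamma(value + alpha)
        - math.lgamma(value + 1)
        - math.lgamma(alpha)
        + alpha * math.log(alpha / (mu + alpha))
        + value * math.log(mu / (mu + alpha))
    )


def switched_logp(tau, values, mu_p, mu_nb, alpha):
    """Total log-likelihood: Poisson when tau is truthy, negative binomial otherwise.

    This mirrors the argument order of switch(condition, if_true, if_false).
    """
    if tau:
        return sum(poisson_logp(v, mu_p) for v in values)
    return sum(negative_binomial_logp(v, mu_nb, alpha) for v in values)


# Hypothetical response times in seconds, for illustration only.
delays = [3, 7, 12, 18, 25, 40, 5, 9]

logp_poisson = switched_logp(1, delays, mu_p=15.0, mu_nb=15.0, alpha=2.0)
logp_negbin = switched_logp(0, delays, mu_p=15.0, mu_nb=15.0, alpha=2.0)
print("Poisson log-likelihood:", logp_poisson)
print("Negative binomial log-likelihood:", logp_negbin)
```

In the model itself `tau` is a random variable, so the sampler visits both branches, and the share of samples spent in each branch is what the Bayes factor calculation relies on.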
    
    opened by ghost 5
  • Section 2 fails when trying to sample the model

    I am using pymc3 version 3.0 together with anaconda 4.0.0 & python 2.7 and theano 0.8.2. In the section 2 notebook:

    with pm.Model() as model:
        mu = pm.Uniform('mu', lower=0, upper=100)
        y_est = pm.Poisson('y_est', mu=mu, observed=messages['time_delay_seconds'].values)
    
        y_pred = pm.Poisson('y_pred', mu=mu)
    
        start = pm.find_MAP()
        step = pm.Metropolis()
        trace = pm.sample(50000, step, start=start, progressbar=True)
    

    Leaving out the y_pred = ... statement removes the crash but also removes the purpose of the example...
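
For readers unsure what `y_pred` contributes, the following is a rough pure-Python sketch of the same idea: a random-walk Metropolis sampler for `mu` under a Uniform(0, 100) prior that also records one posterior predictive draw per iteration. It is neither the notebook's code nor a fix for the crash; the function names, proposal step size and data are assumptions made for illustration.

```python
import math
import random


def poisson_loglike(mu, data):
    """Poisson log-likelihood of the data, dropping terms that do not depend on mu."""
    return sum(data) * math.log(mu) - len(data) * mu


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method (adequate for modest mu)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def metropolis_with_predictive(data, draws, lower=0.0, upper=100.0, step=1.0, seed=0):
    """Random-walk Metropolis for mu ~ Uniform(lower, upper), with one predictive draw per step."""
    rng = random.Random(seed)
    mu = sum(data) / len(data)  # start near the maximum-likelihood estimate
    current = poisson_loglike(mu, data)
    mu_trace, y_pred_trace = [], []
    for _ in range(draws):
        proposal = mu + rng.gauss(0.0, step)
        if lower < proposal < upper:  # proposals outside the prior support are rejected
            candidate = poisson_loglike(proposal, data)
            if rng.random() < math.exp(min(0.0, candidate - current)):
                mu, current = proposal, candidate
        mu_trace.append(mu)
        # The predictive draw uses the current mu, so it carries the posterior uncertainty.
        y_pred_trace.append(sample_poisson(mu, rng))
    return mu_trace, y_pred_trace


# Hypothetical response times in seconds, for illustration only.
delays = [12, 18, 9, 22, 15, 14, 20, 11, 17, 16]
mu_trace, y_pred_trace = metropolis_with_predictive(delays, draws=5000)

burnin = 1000
posterior_mean = sum(mu_trace[burnin:]) / len(mu_trace[burnin:])
predictive_mean = sum(y_pred_trace[burnin:]) / len(y_pred_trace[burnin:])
print("posterior mean of mu:", posterior_mean)
print("posterior predictive mean:", predictive_mean)
```

The predictive draws are wider than the posterior of `mu` because they combine parameter uncertainty with Poisson sampling noise, which is what makes them useful for a posterior predictive check.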

    Since I'm trying to learn pymc3, I currently have no idea where to start. Something goes wrong in Theano (stack trace):

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-12-76dad9c91fda> in <module>()
          7     start = pm.find_MAP()
          8     step = pm.Metropolis()
    ----> 9     trace = pm.sample(50000, step, start=start, progressbar=True)
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\sampling.pyc in sample(draws, step, start, trace, chain, njobs, tune, progressbar, model, random_seed)
        148         sample_func = _sample
        149 
    --> 150     return sample_func(**sample_args)
        151 
        152 
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\sampling.pyc in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed)
        157     progress = progress_bar(draws)
        158     try:
    --> 159         for i, strace in enumerate(sampling):
        160             if progressbar:
        161                 progress.update(i)
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\sampling.pyc in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
        239         if i == tune:
        240             step = stop_tuning(step)
    --> 241         point = step.step(point)
        242         strace.record(point)
        243         yield strace
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\step_methods\compound.pyc in step(self, point)
         12     def step(self, point):
         13         for method in self.methods:
    ---> 14             point = method.step(point)
         15         return point
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\step_methods\arraystep.pyc in step(self, point)
        116         bij = DictToArrayBijection(self.ordering, point)
        117 
    --> 118         apoint = self.astep(bij.map(point))
        119         return bij.rmap(apoint)
        120 
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\pymc3\step_methods\metropolis.pyc in astep(self, q0)
        123             q = q0 + delta
        124 
    --> 125         q_new = metrop_select(self.delta_logp(q, q0), q, q0)
        126 
        127         if q_new is q:
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\theano\compile\function_module.pyc in __call__(self, *args, **kwargs)
        869                     node=self.fn.nodes[self.fn.position_of_error],
        870                     thunk=thunk,
    --> 871                     storage_map=getattr(self.fn, 'storage_map', None))
        872             else:
        873                 # old-style linkers raise their own exceptions
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\theano\gof\link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
        312         # extra long error message in that case.
        313         pass
    --> 314     reraise(exc_type, exc_value, exc_trace)
        315 
        316 
    
    C:\Users\egbert\AppData\Local\Continuum\Anaconda\py27_64\lib\site-packages\theano\compile\function_module.pyc in __call__(self, *args, **kwargs)
        857         t0_fn = time.time()
        858         try:
    --> 859             outputs = self.fn()
        860         except Exception:
        861             if hasattr(self.fn, 'position_of_error'):
    
    opened by elypma 4
  • Section II Check2: Bayes Factor fails

    Running Windows 10 x64, Anaconda, Python 3.5, Theano 0.9.0dev4, PyMC3 3.0rc4, updated 10-Dec-2016. I have confirmed that pm.switch(tau, ...) is not found; perhaps an alternative syntax is now used. The code section and error output follow. franc

    with pm.Model() as model:

        # Index to true model
        prior_model_prob = 0.5
        #tau = pm.DiscreteUniform('tau', lower=0, upper=1)
        tau = pm.Bernoulli('tau', prior_model_prob)

        # Poisson parameters
        mu_p = pm.Uniform('mu_p', 0, 60)

        # Negative Binomial parameters
        alpha = pm.Exponential('alpha', lam=0.2)
        mu_nb = pm.Uniform('mu_nb', lower=0, upper=60)

        y_like = pm.DensityDist('y_like',
                 lambda value: pm.switch(tau,
                     pm.Poisson.dist(mu_p).logp(value),
                     pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
                 ),
                 observed=messages['time_delay_seconds'].values)

        start = pm.find_MAP()
        step1 = pm.Metropolis([mu_p, alpha, mu_nb])
        step2 = pm.ElemwiseCategorical(vars=[tau], values=[0,1])
        trace = pm.sample(200000, step=[step1, step2], start=start)
    

    _ = pm.traceplot(trace[burnin:], varnames=['tau'])


    AttributeError                            Traceback (most recent call last)
    in ()
         18 pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)
         19 ),
    ---> 20 observed=messages['time_delay_seconds'].values)
         21
         22 start = pm.find_MAP()

    C:\Anaconda3\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
         29 data = kwargs.pop('observed', None)
         30 dist = cls.dist(*args, **kwargs)
    ---> 31 return model.Var(name, dist, data)
         32 else:
         33 raise TypeError("Name needs to be a string but got: %s" % name)

    C:\Anaconda3\lib\site-packages\pymc3\model.py in Var(self, name, dist, data)
        301 else:
        302 var = ObservedRV(name=name, data=data,
    --> 303 distribution=dist, model=self)
        304 self.observed_RVs.append(var)
        305 if var.missing_values:

    C:\Anaconda3\lib\site-packages\pymc3\model.py in __init__(self, type, owner, index, name, data, distribution, model)
        584 self.missing_values = data.missing_values
        585
    --> 586 self.logp_elemwiset = distribution.logp(data)
        587 self.model = model
        588 self.distribution = distribution

    in (value)
         14
         15 y_like = pm.DensityDist('y_like',
    ---> 16 lambda value: pm.switch(tau,
         17 pm.Poisson.dist(mu_p).logp(value),
         18 pm.NegativeBinomial.dist(mu_nb, alpha).logp(value)

    AttributeError: module 'pymc3' has no attribute 'switch'

    opened by unkufunku 2
  • Fixing ElementwiseCategorical call

    In Section 2, I received an error when running the Bayes Factor cell. I changed the call from step2 = pm.ElemwiseCategoricalStep(vars=[tau], values=[0,1]) to step2 = pm.ElemwiseCategorical(vars=[tau], values=[0,1]), and it now seems to work as intended.

    opened by zjost 1
  • PyMC3 API update - vars and varnames

    Changes:

    • API compatibility (it was NOT working with the current PyMC3)
      • in pm.traceplot: vars to varnames
      • in pm.ElemwiseCategoricalStep: var=tau to vars=[tau]

    Also, for some reason I had to remove ax=ax from pm.autocorrplot (Section 1). This changed the plot's appearance, but keeping the argument resulted in an error. (I have up-to-date matplotlib et al. on OS X, but the problem may also be on my side.)

    Tested on an old OS X machine, so run times are 2-4x longer.

    opened by stared 1
  • Asking questions of the posterior predictive distribution

    Add examples of questions that can be asked of the posterior predictive distribution for the hierarchical negative binomial model, such as:

    • What is the probability that I will respond to David in less than 20 seconds?
    • Who are the people I am most likely to respond to?
    • etc.
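
As a rough illustration of the first question, the sketch below estimates the probability of a reply within 20 seconds by simulating from a negative binomial posterior predictive distribution. It is plain Python rather than PyMC3, and every number in it is made up: the posterior draws of `mu` and `alpha`, the second person's name, and the helper functions are all hypothetical stand-ins for values that would come from the fitted model's trace.

```python
import math
import random


def sample_poisson(mu, rng):
    """Draw a Poisson(mu) variate; large rates are split into chunks and summed."""
    count = 0
    while mu > 0:
        chunk = min(mu, 30.0)
        threshold = math.exp(-chunk)
        product = rng.random()
        while product > threshold:  # Knuth's multiplication method on each chunk
            count += 1
            product *= rng.random()
        mu -= chunk
    return count


def sample_negative_binomial(mu, alpha, rng):
    """Gamma-Poisson mixture: rate ~ Gamma(shape=alpha, scale=mu/alpha), then Poisson(rate)."""
    rate = rng.gammavariate(alpha, mu / alpha)
    return sample_poisson(rate, rng)


def prob_reply_within(draws, seconds, rng):
    """Share of posterior predictive samples strictly below the given number of seconds."""
    hits = sum(1 for mu, alpha in draws if sample_negative_binomial(mu, alpha, rng) < seconds)
    return hits / len(draws)


rng = random.Random(42)

# Hypothetical posterior draws of (mu, alpha) per person, for illustration only.
posterior = {
    "David": [(rng.gauss(15.0, 1.0), rng.gauss(2.0, 0.1)) for _ in range(2000)],
    "Person B": [(rng.gauss(40.0, 2.0), rng.gauss(2.0, 0.1)) for _ in range(2000)],
}

probs = {name: prob_reply_within(draws, 20, rng) for name, draws in posterior.items()}
quickest = max(probs, key=probs.get)
print(probs)
print("most likely to get a reply within 20 seconds:", quickest)
```

Ranking people by this probability is only one possible reading of the second question; answering who is most likely to receive a reply at all would need a model of the recipient itself.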
    opened by markdregan 1
  • Research new functionality added to PyMC when sampling from the posterior predictive distribution

    Check out: http://pymc-devs.github.io/pymc3/posterior_predictive/

    This might be a flexible solution when sampling from the posterior predictive for various covariates.

    opened by markdregan 0
Owner
Mark Regan
PM @Google Assistant