Bayesian Modeling and Computation in Python

Open access and Code

This repository contains the open access version of the text and the code examples in the book. All of this can be more easily viewed at www.bayesiancomputationbook.com

See a mistake?

If you spot a mistake, please open an issue on the issue tracker.

Environment installation

To run the code you will need to install the correct packages in a computational environment. We have provided instructions below for common options.

Conda

conda env create -f environment.yml
conda activate bmcp
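
Once the environment is active, a quick import check can confirm that the packages the code relies on are present. The helper below is a minimal sketch and not part of the repository: `missing_packages` is a hypothetical name, and the package list only contains names mentioned in the issues on this page (their import names are assumed), so adjust it to match environment.yml.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list only (assumed import names); adjust to match environment.yml.
    wanted = ["bambi", "tensorflow_probability", "patsy"]
    missing = missing_packages(wanted)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only looks packages up without importing them, so it stays fast even when heavy libraries are installed. It expects top-level names, not dotted submodule paths.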

Colab

The book code can also be run using Google Colab. https://colab.research.google.com

More instructions to come soon

Comments
  • add markdown files for chapter 1, 2, 3, and frontmatter sections

    This PR includes a first draft of the .md files for chapters 1 and 2. It also includes a BibTeX file (with all citations in the book), the figures for chapters 1 and 2, and a couple of edits to config files.

    I think the .md is functional in the sense that equations are numbered, figures have captions, elements are cross-referenced (except for links to chapters that are not yet available), the coolboxes are styled, code is highlighted (although the style is not the same as the one used in the book), and tables are properly rendered. There is probably still room for improvement.

    I was not able to automatically cross-reference code blocks, so instead of consecutive numbers (as in the book) I am using ad hoc text references. Tables need to be rewritten.

    opened by aloctavodia 9
  • Manuscript Error:

    Manuscript code requires bambi package, please add to the environment.yml file.

    Description

    The bambi library is used on page xix (Code 0.2) but is not included in the environment.yml file, so running the code on page xix raises a runtime error.

    Reference page xix

    errata 
    opened by aaelony 8
  • ValueError: `as_list()` is not defined on an unknown TensorShape.

    Hi there, I'm encountering an issue in Chapter 06 at Code Block 6.14, when it comes to the section of the code where we're using generate_gam_ar_latent():

    mcmc_samples, sampler_stats = run_mcmc(
        1000, gam_with_latent_ar, n_chains=4, num_adaptation_steps=1000,
        seed=tf.constant([36245, 734565], dtype=tf.int32),
        observed=co2_by_month_training_data.T
    )
    

    When I run the MCMC function on this model, I get the following error:

    ------------------------------------------------------------------
    ValueError                       Traceback (most recent call last)
    <timed exec> in <module>
    
    ~\anaconda3\envs\pm3env\lib\site-packages\tensorflow\python\util\traceback_utils.py in error_handler(*args, **kwargs)
        151     except Exception as e:
        152       filtered_tb = _process_traceback_frames(e.__traceback__)
    --> 153       raise e.with_traceback(filtered_tb) from None
        154     finally:
        155       del filtered_tb
    
    ~\anaconda3\envs\pm3env\lib\site-packages\tensorflow_probability\python\experimental\mcmc\windowed_sampling.py in windowed_adaptive_nuts(n_draws, joint_dist, n_chains, num_adaptation_steps, current_state, init_step_size, dual_averaging_kwargs, max_tree_depth, max_energy_diff, unrolled_leapfrog_steps, parallel_iterations, trace_fn, return_final_kernel_results, discard_tuning, chain_axis_names, seed, **pins)
        709       'unrolled_leapfrog_steps': unrolled_leapfrog_steps,
        710       'parallel_iterations': parallel_iterations}
    --> 711   return _windowed_adaptive_impl(
        712       n_draws=n_draws,
        713       joint_dist=joint_dist,
    
    ~\anaconda3\envs\pm3env\lib\site-packages\tensorflow_probability\python\experimental\mcmc\windowed_sampling.py in _windowed_adaptive_impl(n_draws, joint_dist, kind, n_chains, proposal_kernel_kwargs, num_adaptation_steps, current_state, dual_averaging_kwargs, trace_fn, return_final_kernel_results, discard_tuning, seed, chain_axis_names, **pins)
        883       samplers.sanitize_seed(seed), n=2)
        884   (target_log_prob_fn, initial_transformed_position, bijector,
    --> 885    step_broadcast, batch_shape, shard_axis_names) = _setup_mcmc(
        886        joint_dist,
        887        n_chains=n_chains,
    
    ~\anaconda3\envs\pm3env\lib\site-packages\tensorflow_probability\python\experimental\mcmc\windowed_sampling.py in _setup_mcmc(model, n_chains, init_position, seed, **pins)
        241                                    tf.nest.flatten(batch_shape))
        242 
    --> 243   lp_static_shape = tensorshape_util.concatenate(n_chains, batch_shape)
        244 
        245   if not tensorshape_util.is_fully_defined(batch_shape):
    
    ~\anaconda3\envs\pm3env\lib\site-packages\tensorflow_probability\python\internal\tensorshape_util.py in concatenate(x, other)
        131       dimensions in `x` and `other`.
        132   """
    --> 133   return _cast_tensorshape(tf.TensorShape(x).concatenate(other), type(x))
        134 
        135 
    
    ~\anaconda3\envs\pm3env\lib\site-packages\tensorflow_probability\python\internal\tensorshape_util.py in _cast_tensorshape(x, x_type)
         72     # as the shape, which we don't want.
         73     return np.array(as_list(x), dtype=np.int32)
    ---> 74   return x_type(as_list(x))
         75 
         76 
    
    ~\anaconda3\envs\pm3env\lib\site-packages\tensorflow_probability\python\internal\tensorshape_util.py in as_list(x)
         62     ValueError: If `x` has unknown rank.
         63   """
    ---> 64   return tf.TensorShape(x).as_list()
         65 
         66 
    
    ValueError: as_list() is not defined on an unknown TensorShape.
    

    Is there something that I need to fix in the generate_gam_ar_latent() function itself?

    def generate_gam_ar_latent(training=True):
    
        @tfd.JointDistributionCoroutine
        def gam_with_latent_ar():
            seasonality, trend, noise_sigma = yield from gam_trend_seasonality()
            
            # Latent AR(1)
            ar_sigma = yield root(tfd.HalfNormal(.1, name='ar_sigma'))
            rho = yield root(tfd.Uniform(-1., 1., name='rho'))
            
            def ar_func(y):
                loc = tf.concat([tf.zeros_like(y[..., :1]), y[..., :-1]],
                                axis=-1) * rho[..., None]
                return tfd.Independent(
                    tfd.Normal(loc=loc, scale=ar_sigma[..., None]),
                    reinterpreted_batch_ndims=1
                )
            
            temporal_error = yield tfd.Autoregressive(
                distribution_fn=ar_func,
                sample0=tf.zeros_like(trend),
                num_steps=trend.shape[-1],
                name='temporal_error'
            )
    
            # Linear prediction
            y_hat = seasonality + trend + temporal_error
            
            if training:
                y_hat = y_hat[..., :co2_by_month_training_data.shape[0]]
    
            # Likelihood
            observed = yield tfd.Independent(
                tfd.Normal(y_hat, noise_sigma[..., None]),
                reinterpreted_batch_ndims=1,
                name='observed'
            )
    
        return gam_with_latent_ar
    

    I upgraded to TensorFlow Probability 0.15.0 and now my Kernel just dies; is there any possible known solution to this? Many thanks for this incredible book!
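
To make the latent AR(1) term in the model above concrete: `ar_func` shifts the series by one step and scales it by `rho`, so each value is centered on `rho` times the previous one, and the first value is centered on zero. The pure-Python sketch below reproduces only that recursion. `ar1_locations` and `simulate_ar1` are hypothetical names, and this is neither the TensorFlow Probability implementation nor a fix for the error reported above.

```python
import random


def ar1_locations(y, rho):
    """Mean of each step given the previous value: [0, rho*y[0], ..., rho*y[-2]]."""
    return [0.0] + [rho * value for value in y[:-1]]


def simulate_ar1(num_steps, rho, sigma, seed=0):
    """Draw one AR(1) path: y[t] ~ Normal(rho * y[t-1], sigma), starting from 0."""
    rng = random.Random(seed)
    path = []
    previous = 0.0
    for _ in range(num_steps):
        previous = rng.gauss(rho * previous, sigma)
        path.append(previous)
    return path
```

For example, `ar1_locations([1.0, 2.0, 3.0], 0.5)` gives the three step means for a fixed series, which mirrors the `tf.concat` and multiplication by `rho` inside `ar_func`.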

    opened by scroobiustrip 6
  • Solutions to selected problems of CH3

    Solutions to selected problems of CH3 are at the bottom of the notebook. The top of the notebook contains the code from the original chapter, including some snippets that are useful for the solutions at the bottom.

    Problem 3H17 is incomplete but left for reference in case it may be of help to others (feel free to delete it if you want to keep full solutions only).

    opened by ikarosilva 5
  • SARIMAX - Time Series Models

    Hey, thanks once again for the great book. I was just revisiting the section on the TensorFlow Probability implementation of SARIMAX in Chapter 6, and I have a couple of quick questions:

    1. Is it possible to use / include a demo of how you'd make out of sample forecast predictions using the SARIMAX models? (Probably requires more effort, but similar to the forecast examples given using the STS models at the end?)

    2. While it's stated that we can't combine the seasonal_order with the design_matrix - is it possible to model external regressors (e.g. holidays, covariates etc) in addition to using the fourier_seasonality() basis function, and just add a couple extra columns to the design matrix, or is something more clever / sophisticated required?

    Many thanks again!
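
The second question comes down to a mechanical step: stacking extra regressor columns next to a Fourier basis. The sketch below shows only that step in plain Python. `fourier_features` and `append_columns` are hypothetical helpers, not the book's `fourier_seasonality()`, and whether the SARIMAX model in Chapter 6 handles the widened design matrix without further changes remains a question for the authors.

```python
import math


def fourier_features(num_steps, period, order):
    """Rows of [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..order."""
    rows = []
    for t in range(num_steps):
        row = []
        for k in range(1, order + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


def append_columns(design_matrix, extra_columns):
    """Append one value per extra regressor to every row of the design matrix."""
    return [row + [column[t] for column in extra_columns]
            for t, row in enumerate(design_matrix)]


# Made-up example: monthly data with a 0/1 indicator for the last month of each year.
seasonal = fourier_features(num_steps=24, period=12, order=2)
holiday = [1.0 if t % 12 == 11 else 0.0 for t in range(24)]
design = append_columns(seasonal, [holiday])
```

Each row of `design` then holds four seasonal columns followed by the indicator column.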

    question 2nd_Edition 
    opened by scroobiustrip 5
  • More errata fixes

    Zen of Python Footnote error

    • [x] Changed manuscript
    • [x] Changed code notebook reference
    • [x] Added to Errata

    New York Times truncated sentence

    • [x] Changed manuscript
    • [x] Changed open access
    • [x] Added to Errata

    Misspelled name and uncapitalized plinko

    • [x] Changed manuscript
    • [x] Changed open access
    • [x] Added to Errata

    Bad grammar in Figure 9.17 caption title

    • [x] Changed manuscript
    • [x] Changed open access
    • [x] Added to Errata

    Missing vertical line in Figure 3.3 https://github.com/BayesianModelingandComputationInPython/BookCode_Edition1/issues/106

    • [x] Changed manuscript
    • [x] Changed code notebook reference
    • [x] Added to Errata
    opened by canyon289 4
  • Update chp_10.md / Section 10.2.2

    Clean up equation and make code example consistent

    • \text for text within equations
    • \texttt to set the variable names in monospace as in the following code example
    • use same names in code example as in equation (also fixes bug in code example)
    opened by st-- 4
  • Replace salad with sandwich in list of stable categories in hierarchical modeling

    In the unpooled estimate the mean of the $\sigma$ estimate for salads is 21.3, whereas in the hierarchical estimate the mean of the same parameter estimate is now 25.5, and has been “pulled” up by the means of the pizza and sandwiches category. Moreover, the estimates of the pizza and salad categories in the hierarchical category, while regressed towards the mean slightly, remain largely the same as the unpooled estimates.

    In the Hierarchical models section, the text lists salad as a category notably influenced by the hyperprior, then immediately lists salads as a stable category only lightly affected by the hyperprior. I think this is meant to be sandwiches instead of salads, since that category is much more stable.
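
The "pull" discussed here is the usual shrinkage effect of partial pooling. As an illustration only, the sketch below uses the textbook normal-normal case with known variances, where the pooled estimate of a group mean is a precision-weighted average of the group's own mean and the shared mean. This is a simplification and not the model from the book (which concerns the $\sigma$ parameters), and `partial_pooling_mean` is a hypothetical name.

```python
def partial_pooling_mean(group_mean, n_obs, obs_sigma, prior_mean, prior_sigma):
    """Posterior mean of a group mean under a Normal prior with known variances.

    The result is a precision-weighted average of the group's own sample mean
    and the shared prior mean.
    """
    data_precision = n_obs / obs_sigma ** 2
    prior_precision = 1.0 / prior_sigma ** 2
    return ((data_precision * group_mean + prior_precision * prior_mean)
            / (data_precision + prior_precision))
```

A group with few observations has low data precision and is pulled strongly toward the shared mean, while a group with many observations stays close to its own estimate, which is the pattern the issue describes for the stable categories.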

    opened by paw-lu 3
  • Manuscript Error: (Grammar) Missing subject in intro to Hierarchical Models

    Description

    This phrase in the intro to Hierarchical Models is missing a subject:

    https://github.com/BayesianModelingandComputationInPython/BookCode_Edition1/blob/1ae02289e4b9961ad56e06de7700b1a1119d1856/markdown/chp_04.md?plain=1#L976-L979

    I'm assuming it's meant to be something like hyperparameter. eg

    The partial refers to the idea that groups that do not share one fixed parameter, but share a hyperparameter which describes the distribution of for the parameters of the prior itself.

    Reference

    Author Section

    Do not close until

    • [ ] Added to Errata
    • [ ] Fixed in Open Access
    • [ ] Fixed in latex source

    Once again, enjoying the book here, thanks!

    errata 
    opened by paw-lu 3
  • Using prior instead of hyperprior for group level prediction in section 4.6.2

    Model model_hierarchical_salad_sales_predictions in listing 4.17 is

    @tfd.JointDistributionCoroutine
    def out_of_sample_prediction_model():
        model = yield root(non_centered_model)
        β = model.beta_offset * model.beta_sigma[..., None] + model.beta_mu[..., None]
        
        β_group = yield tfd.Normal(
            model.beta_mu, model.beta_sigma, name="group_beta_prediction")
        group_level_prediction = yield tfd.Normal(
            β_group * out_of_sample_customers,
            model.sigma_prior,
            name="group_level_prediction")
        for l in [2, 4]:
            yield tfd.Normal(
                tf.gather(β, l, axis=-1) * out_of_sample_customers,
                tf.gather(model.sigma, l, axis=-1),
                name=f"location_{l}_prediction")
    

    In the distribution for group_level_prediction, model.sigma_prior is used for the standard deviation. However, model.sigma_prior is the distribution of the hyperprior, not the prior that goes into the model.

    I don't understand this. Shouldn't the standard deviation in group_level_prediction be a half-normal distribution whose scaling parameter in turn is influenced by the hyperprior, like this (see s = ... as well as the line afterwards):

    @tfd.JointDistributionCoroutine
    def out_of_sample_prediction_model():
        model = yield root(non_centered_model)
        β = model.beta_offset * model.beta_sigma[..., None] + model.beta_mu[..., None]
        
        β_group = yield tfd.Normal(
            model.beta_mu, model.beta_sigma, name="group_beta_prediction")
        s = yield tfd.HalfNormal(model.sigma_prior, name="s")
        group_level_prediction = yield tfd.Normal(
            β_group * out_of_sample_customers,
            s,
            name="group_level_prediction")
        for l in [2, 4]:
            yield tfd.Normal(
                tf.gather(β, l, axis=-1) * out_of_sample_customers,
                tf.gather(model.sigma, l, axis=-1),
                name=f"location_{l}_prediction")
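
The difference between the two versions can be sketched with plain ancestral sampling: the first passes the hyperprior draw straight in as the standard deviation, while the proposal adds one more level and draws a new group-level `s` whose scale is that hyperprior draw. The stdlib-only code below illustrates the two-level draw. `half_normal`, `draw_group_sigma`, and `hyper_scale` are hypothetical names, and this makes no claim about which version the book intends.

```python
import random


def half_normal(rng, scale):
    """Draw from HalfNormal(scale) as the absolute value of a Normal(0, scale) draw."""
    return abs(rng.gauss(0.0, scale))


def draw_group_sigma(rng, hyper_scale):
    """Two-level draw: sigma_prior ~ HalfNormal(hyper_scale), then s ~ HalfNormal(sigma_prior)."""
    sigma_prior = half_normal(rng, hyper_scale)
    s = half_normal(rng, sigma_prior)
    return sigma_prior, s


# Made-up hyper_scale, for illustration only.
rng = random.Random(0)
sigma_prior, s = draw_group_sigma(rng, hyper_scale=1.0)
```

Using `sigma_prior` directly corresponds to the original listing; using `s` corresponds to the proposed change.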
    
    question 2nd_Edition 
    opened by JeanLuc001 3
  • add license

    This adds a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Not sure this is the best option. Just following other CRC books.

    opened by aloctavodia 3
  • Manuscript Error: Typo in word marginalize

    Description

    Marginalize should be marginalizing

    Reference image

    Author Section

    Do not close until

    • [ ] Added to Errata
    • [ ] Fixed in Open Access
    • [ ] Fixed in latex source
    errata 
    opened by canyon289 0
  • Manuscript Error: Extra word added

    Description

    may many

    Need to delete may

    Reference image

    Author Section

    Do not close until

    • [ ] Added to Errata
    • [ ] Fixed in Open Access
    • [ ] Fixed in latex source
    errata 
    opened by canyon289 0
  • Eq. (11.58) in section 11.6.1 has a sign error

    Description

    In section 11.6.1, I think Eq. (11.58) is not correct: the "+" should be "-". The following Eqs. (11.59) - (11.61) should be corrected accordingly. Thanks.

    Reference

    Section 11.6.1

    Author Section

    Do not close until

    • [x] Added to Errata
    • [ ] Fixed in Open Access
    • [ ] Fixed in latex source
    errata 
    opened by yingzhuo1994 0
  • Manuscript Error: code may not be consistent with figure

    Description

    The code in Listing 5.1 splines_patsy line 1 may not be consistent with figure 5.6. We generate 500 data points using "x = np.linspace(0., 1., 500)" in the code. However, there are 20 data points in figure 5.6.

    Reference Link to page for chp5 Splines

    image

    Author Section

    Do not close until

    • [ ] Added to Errata
    • [ ] Fixed in Open Access
    • [ ] Fixed in latex source
    errata 
    opened by Yan9564 0
  • Manuscript Error: Equation (5.2) /markdown/chp_05.html

    Maybe there is a typo below Equation (5.2).

    This is known as polynomial regression. At first it may seem that Expression (5.2) is representing a multiple linear regression of the covariates $ \bm X, \bm X^2, \ldots +\bm X^m $.

    should be

    This is known as polynomial regression. At first it may seem that Expression (5.2) is representing a multiple linear regression of the covariates $ \bm X, \bm X^2, \ldots , \bm X^m $
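
The covariate list in the corrected sentence can be read as the columns of a design matrix built from a single variable. A minimal sketch, assuming a plain list of scalar inputs (`polynomial_features` is a hypothetical helper, not code from the book):

```python
def polynomial_features(x, degree):
    """Return rows [x_i, x_i**2, ..., x_i**degree] for each x_i in x."""
    return [[value ** power for power in range(1, degree + 1)] for value in x]
```

Every column is a deterministic function of the same `x`, which is why the resemblance to a multiple linear regression on separate covariates is only superficial.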

    opened by Yan9564 0