Doing Bayesian Data Analysis - Python/PyMC3 versions of the programs described in Doing Bayesian Data Analysis by John K. Kruschke

Overview

Doing_bayesian_data_analysis

This repository contains Python versions of the R programs described in the great book Doing Bayesian Data Analysis (first edition) by John K. Kruschke (AKA the puppy book).

All the code is adapted from Kruschke's book, except hpd.py, which is taken (without modification) from the PyMC project.

The names of the programs are the same as those used in the book, except that each begins with a number indicating the chapter. All programs are written in Python, and the PyMC3 module is used instead of BUGS/JAGS.

Thanks to Brian Naughton, the code is also available as an IPython notebook.

Second edition

If you are interested in the PyMC3 code for the second edition of Doing Bayesian Data Analysis, please check this repository.

Comments
  • Two way anova

    Hi! I was wondering if you managed to get the Salary program (two-way ANOVA) working. I'm trying really hard, but I can't get an answer similar to the one in the book (for example, there's no main effect on FT1, FT2 and FT3, and there are many convergence problems).

    opened by gabrielelanaro 5
  • 09_FilconPyMC.py -- Choice of step method

    Why is NUTS used as the step method for kappa? Using Metropolis appears to produce the same results, but runs faster.

    I am new to pymc3 and trying to understand when to use NUTS.

    opened by cwnoyes 2
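As background for the step-method question above, here is a minimal sketch of what a random-walk Metropolis step does. It is written in plain Python and is not the model from 09_FilconPyMC.py: the coin-flip data, the uniform prior, and the names `log_posterior` and `metropolis` are assumptions made for illustration. Each Metropolis step is cheap because it only evaluates the log posterior. NUTS also uses gradients, which costs more per step but usually explores correlated or high-dimensional posteriors with far fewer steps. On a small, well-behaved model such as this one, both approaches should agree.

```python
import math
import random


def log_posterior(theta, heads, flips):
    """Unnormalized log posterior of a coin bias under a uniform Beta(1, 1) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero probability outside the unit interval
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads, flips, n_samples=20000, burn=2000, step_sd=0.2, seed=0):
    """Random-walk Metropolis sampler for the coin bias (illustrative only)."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, flips)
    samples = []
    for i in range(n_samples + burn):
        # Propose a symmetric Gaussian jump around the current value.
        proposal = theta + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn:
            samples.append(theta)
    return samples


heads, flips = 14, 20  # hypothetical data: 14 heads in 20 flips
samples = metropolis(heads, flips)
estimate = sum(samples) / len(samples)
exact = (heads + 1) / (flips + 2)  # mean of the exact Beta(15, 7) posterior
print(f"Metropolis mean: {estimate:.3f}, exact mean: {exact:.3f}")
```

Comparing the sampled mean with the exact Beta posterior mean is one simple way to check that a cheaper step method is still giving the right answer.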
  • docs: Fix a few typos

    There are small typos in:

    • 06_BernGrid.py
    • 08_BernTwoGrid.py
    • 15_YmetricXsinglePyMC.py
    • 16_SimpleRobustLinearRegressionPyMC.py

    Fixes:

    • Should read approximation rather than aproximation.
    • Should read precision rather than precission.
    • Should read biased rather than baised.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 1
  • Linear regression (chapter 16)

    https://github.com/aloctavodia/Doing_bayesian_data_analysis/blob/master/16_SimpleLinearRegressionPyMC.py

    Hi! I've tried to do linear regression by myself, using the following model (very similar to the one in the book), but no matter what I do, I can't manage to get a straight fit.

    I've seen that in your code you do a transformation. Why is that? Is there any way to get the result without resorting to the transformation?

    import numpy as np
    import matplotlib.pyplot as plt
    import pymc as pm  # PyMC 2 API (pm.MCMC, pm.deterministic)

    true_intercept = 10
    true_slope = 0.5

    x = np.linspace(0, 10, 100)
    y = true_slope * x + true_intercept
    # Jitter x and y with uniform noise
    x += 0.5 * (np.random.rand(len(x)) * 2.0 - 1.0)
    y += 1.0 * (np.random.rand(len(x)) * 2.0 - 1.0)
    plt.scatter(x, y)
    plt.show()

    # Linear regression: vague normal priors on intercept and slope
    b0 = pm.Normal('b0', 0.0, tau=1e-5, value=0)
    b1 = pm.Normal('b1', 0.0, tau=1e-5, value=0)

    @pm.deterministic
    def mu(x=x, b1=b1, b0=b0):
        return x * b1 + b0

    # Precision of the observation noise
    tau = pm.Gamma('tau', 0.01, 0.01)

    # The original passed an undefined name `data`; y is assumed to be intended
    resp = pm.Normal('y', mu, tau, value=y, observed=True)

    mcmc = pm.MCMC([b0, b1, mu, tau, resp])
    mcmc.sample(40000, burn=20000, thin=10)
    
    opened by gabrielelanaro 1
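On the transformation asked about in the linear regression issue above: assuming it is the standardization used in the book's chapter 16 program, the idea is to fit the model to z-scores of x and y, where the slope and intercept are much less correlated and therefore easier to sample, and then map the coefficients back to the original units. The sketch below shows only that back-transformation. It uses ordinary least squares as a stand-in for the MCMC estimate, and the helper names `standardize`, `least_squares` and `to_original_scale` are hypothetical.

```python
import random
import statistics


def standardize(values):
    """Return z-scores together with the mean and standard deviation used."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values], mean, sd


def least_squares(x, y):
    """Ordinary least-squares (intercept, slope); a stand-in for a posterior estimate."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def to_original_scale(z0, z1, x_mean, x_sd, y_mean, y_sd):
    """Map intercept z0 and slope z1 from the standardized scale to raw units."""
    b1 = z1 * y_sd / x_sd
    b0 = z0 * y_sd + y_mean - z1 * y_sd * x_mean / x_sd
    return b0, b1


random.seed(1)
x = [10 * i / 99 for i in range(100)]
y = [0.5 * xi + 10 + random.uniform(-1, 1) for xi in x]  # true slope 0.5, intercept 10

zx, x_mean, x_sd = standardize(x)
zy, y_mean, y_sd = standardize(y)

z0, z1 = least_squares(zx, zy)  # fit on the standardized scale
b0, b1 = to_original_scale(z0, z1, x_mean, x_sd, y_mean, y_sd)
raw_b0, raw_b1 = least_squares(x, y)  # direct fit on the raw scale, for comparison
print(f"back-transformed: b0={b0:.3f}, b1={b1:.3f}")
print(f"raw fit:          b0={raw_b0:.3f}, b1={raw_b1:.3f}")
```

The back-transformed coefficients match the raw-scale fit, so standardizing changes how easily the sampler moves, not the answer. In an MCMC setting the same mapping would be applied to every posterior draw.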
  • Python 2 and 3 compatibility fix, and multiprocessing

    • fixed IPython/Kruschkes_Doing_Bayesian_Data_Analysis_in_PyMC3.ipynb
      • StringIO and BytesIO in the 18_ section require an import from six to work in both Python 2 and 3
    • fixed 18_*.py
      • subplot count starts at 1, not 0
    • added _process_all.py, changed some *.py
      • script for processing with ipyparallel (requires 'ipycluster start' action before)
      • adapted imports because of namespace peculiarities of multiprocessing function handling

    opened by prismv 0
  • IPython notebook of all the current scripts

    Hi Osvaldo, as we discussed, here is a pull request for the ipython notebook I created. This is my first foray into committing into someone else's repo, so I have no idea if I am doing the pull request right... Best, Brian

    opened by hgbrian 0
  • Add a Gitter chat badge to README.md

    opened by gitter-badger 0
Owner
Osvaldo Martin