Statistical-Rethinking-with-Python-and-PyMC3 - Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

Overview

Statistical Rethinking with Python and PyMC3

This repository has been deprecated in favour of this one. Please check that repository for updates, and open issues or send pull requests there.

Statistical Rethinking is an incredibly good introductory book on Bayesian statistics. It follows a Jaynesian and practical approach, with very good examples and clear explanations.

In this repository we ported the code in the book (originally in R and Stan) to PyMC3. We are trying to keep the examples as close as possible to those in the book, while at the same time expressing them in the most Pythonic and PyMC3onic way we can.

Display notebooks

View Jupyter notebooks in nbviewer

Contributing

If you want to contribute, please send your pull request to this. All contributions are welcome!

Installing the dependencies

To install the dependencies to run these notebooks, you can use Anaconda. Once you have installed Anaconda, run:

conda env create -f environment.yml

to install all the dependencies into an isolated environment. You can switch to this environment by running:

source activate stat-rethink-pymc3

Creative Commons License
Statistical Rethinking with Python and PyMC3 by All Contributors is licensed under a Creative Commons Attribution 4.0 International License.

Comments
  • New home for this repo

    What about transferring this repository into an organization, maybe inside pymc-devs, together with other books/docs? Thoughts, @junpenglao, @AustinRochford, @twiecki?

    opened by aloctavodia 20
  • Conda Environment File

    It might be helpful for contributors to have a conda environment file so they can quickly create a new environment and get things rolling. I have one I'll paste below, but I haven't checked it against all of the notebooks to make sure everything necessary is included and anything extraneous is not.

    Here's mine:

    name: pymc
    channels:
    - conda-forge/label/rc
    - conda-forge
    - defaults
    dependencies:
    - cycler=0.10.0=py36_0
    - freetype=2.7=1
    - joblib=0.11=py36_0
    - libgpuarray=0.6.2=np112py36_0
    - libpng=1.6.28=0
    - mako=1.0.6=py36_0
    - matplotlib=2.0.0=np112py36_3
    - nose=1.3.7=py36_2
    - pandas=0.19.2=np112py36_1
    - patsy=0.4.1=py36_0
    - pyparsing=2.2.0=py36_0
    - pytz=2017.2=py36_0
    - theano=0.9.0=py36_0
    - tqdm=4.11.2=py36_0
    - pymc3=3.1rc3=py36_0
    - appnope=0.1.0=py36_0
    - bleach=1.5.0=py36_0
    - decorator=4.0.11=py36_0
    - entrypoints=0.2.2=py36_1
    - h5py=2.7.0=np112py36_0
    - hdf5=1.8.17=1
    - html5lib=0.999=py36_0
    - ipykernel=4.6.1=py36_0
    - ipython=5.3.0=py36_0
    - ipython_genutils=0.2.0=py36_0
    - jinja2=2.9.6=py36_0
    - jsonschema=2.5.1=py36_0
    - jupyter_client=5.0.1=py36_0
    - jupyter_core=4.3.0=py36_0
    - markupsafe=0.23=py36_2
    - mistune=0.7.4=py36_0
    - mkl=2017.0.1=0
    - mkl-service=1.1.2=py36_3
    - nbconvert=5.1.1=py36_0
    - nbformat=4.3.0=py36_0
    - notebook=5.0.0=py36_0
    - numpy=1.12.1=py36_0
    - openssl=1.0.2k=1
    - pandocfilters=1.4.1=py36_0
    - path.py=10.1=py36_0
    - pexpect=4.2.1=py36_0
    - pickleshare=0.7.4=py36_0
    - pip=9.0.1=py36_1
    - prompt_toolkit=1.0.14=py36_0
    - ptyprocess=0.5.1=py36_0
    - pygments=2.2.0=py36_0
    - python=3.6.1=0
    - python-dateutil=2.6.0=py36_0
    - pyzmq=16.0.2=py36_0
    - readline=6.2=2
    - scipy=0.19.0=np112py36_0
    - seaborn=0.7.1=py36_0
    - setuptools=27.2.0=py36_0
    - simplegeneric=0.8.1=py36_1
    - six=1.10.0=py36_0
    - sqlite=3.13.0=0
    - terminado=0.6=py36_0
    - testpath=0.3=py36_0
    - tk=8.5.18=0
    - tornado=4.4.2=py36_0
    - traitlets=4.3.2=py36_0
    - wcwidth=0.1.7=py36_0
    - wheel=0.29.0=py36_0
    - xz=5.2.2=1
    - zlib=1.2.8=3
    - pip:
      - ipython-genutils==0.2.0
      - jupyter-client==5.0.1
      - jupyter-core==4.3.0
      - prompt-toolkit==1.0.14
      - pygpu==0.6.2
    prefix: /Users/Peter/anaconda/envs/pymc
    
    opened by pmbaumgartner 5
  • First draft of Chapter 11

    Parts of Chapter 11. Two notes:

    • This PyMC3 pull request will make ordinal regression simpler, so I'll wait on that.
    • The book uses a nonstandard parametrization of the beta(-binomial) distribution, which requires some care (and reading the R rethinking source) to replicate, which is slowing me down a bit.
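
As a rough illustration of the reparametrization issue in the second note, the sketch below converts a mean/dispersion pair into the two usual shape parameters of a beta distribution. The mapping alpha = pbar * theta, beta = (1 - pbar) * theta is an assumption about what the book's parametrization looks like, and the names pbar, theta, and beta_shapes_from_mean are hypothetical; check the R rethinking source before relying on it.

```python
import numpy as np


def beta_shapes_from_mean(pbar, theta):
    """Convert a mean (pbar) and a dispersion (theta) into beta shape parameters.

    Assumed mapping (not confirmed by this repository):
    alpha = pbar * theta, beta = (1 - pbar) * theta.
    """
    alpha = pbar * theta
    beta = (1.0 - pbar) * theta
    return alpha, beta


# Hypothetical values, chosen only for illustration
alpha, beta = beta_shapes_from_mean(pbar=0.3, theta=10.0)

# Draws from the resulting beta distribution should average close to pbar
rng = np.random.default_rng(1)
draws = rng.beta(alpha, beta, size=20000)
sample_mean = draws.mean()
```

Under this mapping the mean of the distribution is alpha / (alpha + beta) = pbar, so the sample mean gives a quick sanity check on the conversion.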
    opened by AustinRochford 4
  • Code in 4.54,4.58

    I can't understand this piece of code in 4.54 and 4.58:

    weigth_seq = np.arange(25, 71)
    # Given that we have a lot of samples we can use less of them for plotting (or we can use all!)
    chain_N_thinned = chain_N[::10]
    mu_pred = np.zeros((len(weigth_seq), len(chain_N_thinned)*chain_N.nchains))
    for i, w in enumerate(weigth_seq):
        mu_pred[i] = chain_N_thinned['alpha'] + chain_N_thinned['beta'] * w
    
    

    What are chain_N[::10] and chain_N.nchains doing? What is mu_pred supposed to be?
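
One way to read the snippet above is sketched here with plain NumPy arrays standing in for the trace. It assumes that slicing a PyMC3 trace with [::10] keeps every 10th draw of each chain and that indexing the trace by variable name concatenates the chains; the array names and the simulated values are hypothetical. Under those assumptions, mu_pred holds one row per weight value and one column per retained posterior sample of the mean height.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trace: 2 chains with 1000 draws each (simulated, not real output)
n_chains, n_draws = 2, 1000
alpha = rng.normal(114.0, 2.0, size=(n_chains, n_draws))
beta = rng.normal(0.9, 0.04, size=(n_chains, n_draws))

# Thinning: keep every 10th draw of each chain, then pool the chains
alpha_thinned = alpha[:, ::10].ravel()
beta_thinned = beta[:, ::10].ravel()

# One row per weight, one column per retained sample
weight_seq = np.arange(25, 71)
mu_pred = np.zeros((len(weight_seq), alpha_thinned.size))
for i, w in enumerate(weight_seq):
    # Posterior samples of the mean height at weight w
    mu_pred[i] = alpha_thinned + beta_thinned * w
```

The number of columns, 100 retained draws times 2 chains, plays the role of len(chain_N_thinned)*chain_N.nchains in the original code.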

    opened by bluesky314 3
  • New code for Statistical Rethinking 2nd edition

    I've been working through the not yet released 2nd edition of McElreath's book and have been using this repository to help guide me along the way. It's been extremely helpful, thanks! As of about ch. 6 there is enough difference between the editions that it has become quite difficult to use. Any interest in getting a head start on the 2nd edition code translations, so that when he releases it the code will be available in Python as well? The 2nd edition can be found on his website with a password given in one of the first 2019 YouTube videos.

    opened by Mind-The-Data 3
  • Error in code 10.10

    If I run this cell:

    rt = chimp_ensemble['pulled_left']
    pred_mean = np.zeros((1000, 4))
    cond = d.condition.unique()
    prosoc_l = d.prosoc_left.unique()
    
    for i in range(len(rt)):
        
        tmp = []
        for cp in cond:
            for pl in prosoc_l:
                tmp.append(np.mean(rt[i][(d.prosoc_left==pl) & (d.chose_prosoc==cp)]))
        pred_mean[i, :] = tmp
            
    ticks = range(4)
    mp = pred_mean.mean(0)
    hpd = pm.hpd(pred_mean)
    
    plt.figure(figsize=(13,6))
    plt.fill_between(ticks, hpd[:,1], hpd[:,0], alpha=0.25, color='k')
    plt.plot(mp, color='k')
    plt.xticks(ticks, ("0/0","1/0","0/1","1/1"))
    chimps = d.groupby(['actor', 'prosoc_left', 'condition']).agg('mean')['pulled_left'].values.reshape(7, -1)
    for i in range(7):
        plt.plot(chimps[i], 'C0')
    
    plt.ylim(0, 1.1);
    

    I get this error:

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-18-509a6637b6be> in <module>()
         11     for cp in cond:
         12         for pl in prosoc_l:
    ---> 13             tmp.append(np.mean(rt[i][(d.prosoc_left==pl) & (d.chose_prosoc==cp)]))
         14     pred_mean[i, :] = tmp
         15 
    
    IndexError: too many indices for array
    

    which is very strange. Fortunately, after an hour, I found a fix. Change the loop to this one:

    
    for i in range(len(rt)):
        
        tmp = []
        if rt[i].shape == ():
            continue
        for cp in cond:
            for pl in prosoc_l:
                tmp.append(np.mean(rt[i][(d.prosoc_left==pl) & (d.chose_prosoc==cp)]))
        pred_mean[i, :] = tmp
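
A minimal sketch of why the guard on rt[i].shape might help, assuming the failing entries are 0-dimensional arrays rather than one value per observation (the cause is not confirmed in this thread). Indexing a 0-dimensional NumPy array with a 1-dimensional boolean mask raises an IndexError, while a 1-dimensional array accepts the same mask. The names mask, draw_ok, draw_scalar, and masked_mean are hypothetical.

```python
import numpy as np

mask = np.array([True, False, True])

draw_ok = np.array([1.0, 0.0, 1.0])  # one value per observation
draw_scalar = np.array(1.0)          # 0-dimensional array, shape == ()


def masked_mean(draw, mask):
    """Return the mean over the masked entries, or None for a scalar draw."""
    if np.ndim(draw) == 0:
        # Same idea as the `rt[i].shape == ()` check: skip scalar entries
        return None
    return float(np.mean(draw[mask]))


# Boolean-mask indexing fails on the 0-dimensional array
try:
    draw_scalar[mask]
    raised = False
except IndexError:
    raised = True
```

Note that skipping entries with `continue` leaves the corresponding rows of pred_mean at their initial zeros, which would affect the mean and interval computed afterwards.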
    
    opened by rosgori 3
  • Enhancement of the notebooks

    Since almost all the chapters are ported (I will also work on Chapter 14), maybe it would be good to:

    • [ ] add legends to the figures in the early chapters.
    • [ ] reproduce the rest of the figures shown in the book. Some of the figures are related to the analysis without code presented.
    opened by junpenglao 3
  • Why should both betas have same distribution in quadratic regression?

    In 4.66, we set beta to generate two numbers for our quadratic model, so both are generated from the same distribution. But the book generates them from two different distributions. Why would we do this? Doesn't this decrease the flexibility of the model?

    with pm.Model() as m_4_5:
        alpha = pm.Normal('alpha', mu=178, sd=100)
        
        beta = pm.Normal('beta', mu=0, sd=10, shape=2) # shape of 2 generates two betas
        sigma = pm.Uniform('sigma', lower=0, upper=50)
        mu = pm.Deterministic('mu', alpha + beta[0] * d.weight_std + beta[1] * d.weight_std2)
        height = pm.Normal('height', mu=mu, sd=sigma, observed=d.height)
        trace_4_5 = pm.sample(1000, tune=1000)
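
To illustrate what a vector-valued prior implies, here is a NumPy sketch of prior draws only (not the PyMC3 model itself). It assumes that shape=2 gives two separate coefficients that share the same prior family and hyperparameters but are drawn independently; the array names are hypothetical. It also shows how broadcasting a vector of scales would give each coefficient its own prior width.

```python
import numpy as np

rng = np.random.default_rng(42)

# Prior draws analogous to a Normal(mu=0, sd=10) prior with shape=2:
# each row is one joint draw of (beta[0], beta[1])
beta_prior = rng.normal(loc=0.0, scale=10.0, size=(5000, 2))

# The two columns share a distribution but take different, uncorrelated values
corr = np.corrcoef(beta_prior[:, 0], beta_prior[:, 1])[0, 1]

# A per-coefficient scale is broadcast across the columns
# (the values 10 and 1 are hypothetical)
beta_prior_diff = rng.normal(loc=0.0, scale=np.array([10.0, 1.0]), size=(5000, 2))
```

Sharing a prior does not tie the two coefficients to the same value; each one still gets its own posterior once the data are observed.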
    
    opened by bluesky314 2
  • Proposal for Code 5.55 (Ch 5)

    It seems to me that Code 5.55, as it presently stands, does not reflect the "index variable" approach introduced in the book. Hence, I propose the following code:

    with pm.Model() as m5_16_alt:
        a = pm.Normal('a',mu = 0.6, sd=10, shape=len(d['clade_id'].unique()))
        mu = pm.Deterministic('mu', a[d['clade_id'].values])
        sigma = pm.Uniform('sigma', lower= 0 , upper= 10)
        kcal_per_g = pm.Normal('kcal_per_g', mu = mu, sd=sigma, observed = d['kcal.per.g'])
        trace_5_16_alt = pm.sample(1000, tune=1000) 
    

    The proposed code includes a shape parameter for the variable a and uses the index variable the way the author of the book intended. Is my reasoning on this correct? The summary produces the following output (excerpt):

    a:
    
      Mean             SD               MC Error         89% HPD interval
      -------------------------------------------------------------------
      
      0.544            0.044            0.001            [0.482, 0.620]
      0.713            0.044            0.001            [0.641, 0.781]
      0.788            0.054            0.002            [0.709, 0.882]
      0.506            0.059            0.002            [0.407, 0.593]
    
    sigma:
    
      Mean             SD               MC Error         89% HPD interval
      -------------------------------------------------------------------
      
      0.131            0.019            0.001            [0.101, 0.159]
    

    Compare this output with the book (page 159).
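
The lookup a[d['clade_id'].values] in the proposal can be illustrated with plain NumPy. This is a sketch under assumptions: the clade labels and the intercept values below are hypothetical stand-ins, not the real data or the fitted estimates. Each observation carries an integer index, and indexing the vector of intercepts with that index array picks the intercept of the observation's group.

```python
import numpy as np

# Hypothetical group label for each observation
clade = np.array(["Ape", "Ape", "New World Monkey", "Strepsirrhine", "Old World Monkey"])

# Build the index variable: sorted unique labels plus an integer code per row
clades, clade_id = np.unique(clade, return_inverse=True)

# One intercept per group (illustrative numbers only)
a = np.array([0.54, 0.71, 0.79, 0.51])

# Index-variable lookup: one intercept per observation
mu = a[clade_id]
```

In the model, a would be a vector-valued random variable with one entry per group instead of a fixed array, but the indexing step works the same way.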

    opened by sarajcev 2
  • update notebook, WIP on Chp10

    I am working to finish Chp 10 and meanwhile also did some cleanup of the finished notebooks:

    1. Add column names to the pm.compare output, so that the dataframe and compare plot have better labels (this feature should be added to the main pymc3 repo as well).
    2. Use the new init (jitter+adapt_diag) and increase the tuning samples (this partly gets rid of the warning about the acceptance probability not matching the target).
    3. Use retina display across notebooks.
    4. Remove pm.effective_n and pm.gelman_rubin, as they are now included in pm.summary.
    5. Change pm.df_summary to pm.summary.

    opened by junpenglao 2
Owner
Osvaldo Martin