Statistical Rethinking course winter 2022

Overview

Statistical Rethinking (2022 Edition)

Instructor: Richard McElreath

Lectures: Uploaded <Playlist> and pre-recorded, two per week

Discussion: Online, Fridays 3pm-4pm Central European Time

Purpose

This course teaches data analysis, but it focuses on scientific models first. The unfortunate truth about data is that nothing much can be done with it until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with high-dimensional, imperfect data of the kind that biologists and social scientists face.

Format

Online, flipped instruction. The lectures are pre-recorded. We'll meet online once a week for an hour to work through the solutions to the assigned problems.

We'll use the 2nd edition of my book, <Statistical Rethinking>. I'll provide a PDF of the book to enrolled students.

Registration: Please sign up via <[COURSE IS FULL SORRY]>. I've also set aside 100 audit tickets at the same link, for people who want to participate, but who don't need graded work and course credit.

Calendar & Topical Outline

There are 10 weeks of instruction. Links to lecture recordings will appear in this table. Weekly problem sets are assigned on Fridays and due the next Friday, when we discuss the solutions in the weekly online meeting.

Lecture playlist on YouTube: <Statistical Rethinking 2022>

| Week ## | Meeting date | Reading | Lectures |
| ------- | ------------ | ------- | -------- |
| Week 01 | 07 January | Chapters 1, 2 and 3 | [1] <The Golem of Prague> <(Slides)> |
| | | | [2] <Bayesian Inference> <(Slides)> |
| Week 02 | 14 January | Chapters 4 and 5 | [3] <Basic Regression> <(Slides)> |
| | | | [4] <Categories & Curves> <(Slides)> |
| Week 03 | 21 January | Chapters 5 and 6 | [5] <Elemental Confounds> <(Slides)> |
| | | | [6] <Good & Bad Controls> <(Slides)> |
| Week 04 | 28 January | Chapters 7 and 8 | [7] Overfitting |
| | | | [8] Interactions |
| Week 05 | 04 February | Chapters 9, 10 and 11 | [9] Markov chain Monte Carlo |
| | | | [10] Binomial GLMs |
| Week 06 | 11 February | Chapters 11 and 12 | [11] Poisson GLMs |
| | | | [12] Ordered Categories |
| Week 07 | 18 February | Chapter 13 | [13] Multilevel Models |
| | | | [14] Multi-Multilevel Models |
| Week 08 | 25 February | Chapter 14 | [15] Varying Slopes |
| | | | [16] Gaussian Processes |
| Week 09 | 04 March | Chapter 15 | [17] Measurement Error |
| | | | [18] Missing Data |
| Week 10 | 11 March | Chapters 16 and 17 | [19] Beyond GLMs: State-space Models, ODEs |
| | | | [20] Horoscopes |

Coding

This course involves a lot of scripting. Students can engage with the material using either the original R code examples or one of several conversions to other computing environments. The conversions are not always exact, but they are rather complete. Each option is listed below.

Original R Flavor

To use the original R code examples from the print book, install the rethinking R package. The code is all on GitHub at https://github.com/rmcelreath/rethinking/ and there are additional details about the package there, including information about using the more up-to-date cmdstanr instead of rstan as the underlying MCMC engine.

R + Tidyverse + ggplot2 + brms

The <Tidyverse/brms> conversion is very high quality and complete through Chapter 14.

Python and PyMC3

The <Python/PyMC3> conversion is quite complete.

Julia and Turing

The <Julia/Turing> conversion is not as complete, but is growing fast and presents the Rethinking examples in multiple Julia engines, including the great <TuringLang>.

Other

There are several other conversions. See the full list at https://xcelab.net/rm/statistical-rethinking/.

Homework and solutions

I will also post problem sets and solutions. Check the folders at the top of the repository.

Comments
  • wrong model in lecture 9 code?

    I'm trying to reproduce what you did in lecture 9. Getting stuck at the marginal/counterfactual example. Your (updated) code is this:

    # simulate as if all apps from women
    p_G1 <- link(m2,data=list(
        D=rep(1:6,times=apps_per_dept),
        N=rep(1,total_apps),
        G=rep(1,total_apps)))

    But m2 is the model from the simulated data, and if I understand correctly, here we are trying to mimic the real data. So I think it should be mGD. And indeed, in your script you have this:

    # OLD WRONG CODE!
    #p_G1 <- link( mGD , data=list(N=dat$N,D=dat$D,G=rep(1,12)) )

    I think mGD is the right model. Unfortunately, when I use mGD with the updated code above, my result figure doesn't look like yours (it also doesn't look like yours when ...). Instead of the main peak at around 0.1, I get the main peak at 0 and a minor one at 0.2. I'm not sure what's going on and am trying to figure things out. Any pointers appreciated. Thanks!

    opened by andreashandel 7
  • error still in Lecture 9 marginalized effect example?

    Hi, I think you said that the error from lecture 9 is fixed, but I'm wondering if an error is still there on slides 75 and 76?

    For example on slide 75, there's this code where link refers to model m2:

    # simulate as if all apps from women
    p_G1 <- link(m2,data=list(
    D=rep(1:6,times=apps_per_dept),
    N=rep(1,total_apps),
    G=rep(1,total_apps)))
    

    But in the lecture code, the model now refers to mGD.

    # simulate as if all apps from women
    p_G1 <- link(mGD,data=list(
        D=rep(1:6,times=apps_per_dept),
        N=rep(1,total_apps),
        G=rep(1,total_apps)))
    
    opened by benslack19 2
  • many base R plots not rendering scatterplot points in slides

    opened by wesslen 2
  • which book to buy?

    which book to buy?

    Statistical Rethinking: A Bayesian Course with Examples in R and STAN Hardcover – March 16 2020 by Richard McElreath (Author) ? https://www.amazon.ca/Statistical-Rethinking-Bayesian-Course-Examples/dp/036713991X/ref=sr_1_2?gclid=Cj0KCQiA8vSOBhCkARIsAGdp6RRjxXH1pEPmwImkgZcRpJt3Dg4TtOtwemUP5kBetPmUsQKQW2WtA2IaAnkWEALw_wcB&hvadid=229990286577&hvdev=c&hvlocphy=9000920&hvnetw=g&hvqmt=e&hvrand=14759278627849499121&hvtargid=kwd-300572196203&hydadcr=3317_10311015&keywords=statistical+rethinking&qid=1641911963&sr=8-2

    opened by Sandy4321 2
  • This event has sold out online

    Hi Richard McElreath, I am a recent undergrad with an interest in applied statistics. I would like to learn more about statistics, but the website says "This event has sold out online". Can you please let me in to attend your lectures?

    opened by VellalaVineethKumar 2
  • Unable to replicate a plot in Lecture 09

    I was unable to replicate the plot of the hypothetical effect of manipulating the perception of applicants' gender. Using the code provided in the repository, I got something like this: [image] instead of the one shown in the lecture: [image]

    opened by ellen-ying 1
  • Add Chapter 5 to week 2?

    Given the discussion of DAGs and categorical variables in the homework, would it make sense to add Chapter 5 to this week's reading on the README.md schedule?

    opened by benslack19 1
  • Possibility to publish our solutions to exercises from the 2nd edition book?

    Dear Prof. McElreath,

    My colleagues and I are working through the exercises, and we are wondering whether we are allowed to make our solutions publicly available, for example on a blogdown website?

    Regards, Mikhael

    opened by mdmanurung 1
  • Question about multiplication in Bayesian Inference

    Hi Richard @rmcelreath ,

    Thanks for this great course! I have been reading chapter two of the book and I can't see why, for the marble example, successive multiplications produce the same result as a single number-of-ways calculation. In the example we calculate the number of ways to draw, with replacement, a blue marble followed by a white and then another blue marble (BWB), given that there are four marbles in the bag. I can't seem to get the same result if I do the calculation in three stages like:

    Posterior = (Number of ways / Total number of ways) * Prior

    Assuming Prior is 1 at the beginning.

    | p    | Blue     | White    | Blue     | Product |
    | ---- | -------- | -------- | -------- | ------- |
    | 0.25 | 0.17 (1) | 0.50 (3) | 0.17 (1) | 0.0145  |
    | 0.50 | 0.33 (2) | 0.33 (2) | 0.33 (2) | 0.0360  |
    | 0.75 | 0.50 (3) | 0.17 (1) | 0.50 (3) | 0.0425  |

    Calculate in one go:

    | p    | BWB      |
    | ---- | -------- |
    | 0.25 | 0.15 (3) |
    | 0.50 | 0.40 (8) |
    | 0.75 | 0.45 (9) |

    We can see that, for example, for p = 0.75, doing successive draw / multiply steps gives a posterior probability of 0.0425, but if we do this in one go as in the second table, we get 9 out of 20 ways = 0.45, which doesn't match 0.0425. Assuming the normalization (denominator) is always 1 for every stage?

    For the proportion-of-water globe example, I also tried to see whether, if the probabilities are calculated individually, the product is the same as a single calculation. If we had the sequence WWWLL:

    | E     | Likelihood | Posterior              | How Posterior is calculated                                                 |
    | ----- | ---------- | ---------------------- | --------------------------------------------------------------------------- |
    | W     | p          | 2p                     | (p * 1) / integral from 0 to 1 {p * 1}                                      |
    | WW    | p          | 3p^2                   | (p * 2p) / integral from 0 to 1 {p * 2p}                                    |
    | WWW   | p          | 4p^3                   | (p * 3p^2) / integral from 0 to 1 {p * 3p^2}                                |
    | WWWL  | 1-p        | 20p^3 - 20p^4          | ((1-p) * 4p^3) / integral from 0 to 1 {(1-p) * 4p^3}                        |
    | WWWLL | 1-p        | 60p^3 - 120p^4 + 60p^5 | ((1-p) * (20p^3 - 20p^4)) / integral from 0 to 1 {(1-p) * (20p^3 - 20p^4)}  |

    And this is different from what I get if I use the binomial distribution formula directly. The result is:

    $\frac{5!}{2!\,3!} \, p^3 (1-p)^2 = 10p^3 - 20p^4 + 10p^5$

    Same question in stack exchange: https://math.stackexchange.com/questions/4503794/bayesian-inference-multiplication

    opened by JerryCBH 0
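
    A note on the question above, offered as a sketch rather than as the author's answer: the two routes appear to disagree only because one of them has not been normalized. In the marble example, the per-draw fractions in the first table are exact sixths (1/6, 2/6, 3/6), and their products still have to be divided by their sum over the three candidate values of p. In the globe example, the binomial expression is a likelihood, so it has to be divided by its integral over p (assuming the flat prior used in the table) before it can be compared with the sequentially updated posterior.

    ```latex
    \begin{align*}
    % Marbles (BWB): multiply the per-draw fractions for each conjecture
    p = 0.25:&\quad \tfrac{1}{6}\cdot\tfrac{3}{6}\cdot\tfrac{1}{6} = \tfrac{3}{216} \\
    p = 0.50:&\quad \tfrac{2}{6}\cdot\tfrac{2}{6}\cdot\tfrac{2}{6} = \tfrac{8}{216} \\
    p = 0.75:&\quad \tfrac{3}{6}\cdot\tfrac{1}{6}\cdot\tfrac{3}{6} = \tfrac{9}{216} \\
    % Normalize over the three conjectures: the common factor 1/216 cancels
    \text{sum} &= \tfrac{3 + 8 + 9}{216} = \tfrac{20}{216}, \qquad
    \tfrac{3}{20} = 0.15, \quad \tfrac{8}{20} = 0.40, \quad \tfrac{9}{20} = 0.45 \\[1ex]
    % Globe (WWWLL): normalize the binomial likelihood over p in [0, 1]
    \int_0^1 10\,p^3 (1-p)^2 \, dp &= 10 \cdot \frac{3!\,2!}{6!} = \frac{1}{6} \\
    \frac{10\,p^3 (1-p)^2}{1/6} &= 60\,p^3 (1-p)^2 = 60p^3 - 120p^4 + 60p^5
    \end{align*}
    ```

    After normalization the staged products reproduce the 0.15, 0.40, 0.45 of the one-go table, and the normalized binomial expression equals the last posterior in the WWWLL table. Constant factors such as 1/216 or the binomial coefficient 10 drop out, which is why the order and grouping of the multiplications does not matter.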
  • External registration open?

    D'oh! APOLOGIES! I now see the tweet I saw was for 2022 not 2023. Missed my chance. I'm blaming COVID.

    << Ignore the following -- and feel free to delete this "issue" if someone knows how >>

    Sorry if I simply missed a memo but I'm hoping to register for the course. Opening it to external participants was mentioned in a tweet a couple of weeks ago, but I've not seen any follow-up.

    Does anyone know if this is a possibility? And, if so, is there a link to the registration process?

    Thanks very much for any leads!

    opened by rbalshaw 0
  • Phoenix Wright?!

    I am taking the course right now and I have just come across the Phoenix Wright meme on colliders. Thanks for bringing back good old memories! Loving the course so far!

    opened by tinosai 0
  • lppd CV equation (text p218)

    Thank you for your great book, slides and YouTube lectures. I am working my way through your book (2nd edition). The lppd CV equation, on page 218 of your book and on the Lecture 7 slide, looks inconsistent with the lppd equation on page 210 and the lppd IS equation on page 218. I think "log" should be put before "1/S". Am I wrong?

    opened by mitsuoxv 0
  • Question about interpretation of the individual intercepts in m11.4

    Model m11.4 allows each monkey to have its own intercept but a common treatment effect. I am not sure about the interpretation of the individual intercept when the treatment variable has index contrast. Does the intercept indicate the logit(p left) of an individual monkey when there is no treatment, and does this make sense when the smallest coding value of treatment is 1?

    Sorry if the answer is obvious, but I haven't been able to wrap my head around this.

    Thank you.

    opened by thai1491 1
  • No residual plot code for Chapter 5 (Figure 5-4)

    Hi professor @rmcelreath , thanks for the amazing book! Loving it!

    I would like to know if it is possible to share some code for the residual plots of Chapter 5, more specifically Figure 5-4.

    I am trying to replicate it, but am running into some difficulties. I will keep trying anyway.

    All the best, Edu

    opened by edumagol 0
  • More dramatic gain from partial pooling

    First of all, thanks for the book and the video course. The motivation behind multilevel models is clear: partial pooling is an "adaptive compromise" between no pooling and complete pooling. In video lecture 12 (https://speakerdeck.com/rmcelreath/statistical-rethinking-2022-lecture-12?slide=40) the "gain" from partial pooling is shown using cross-validation. But the cross-validation score of partial pooling is very similar to that of complete pooling.

    1. For this particular example, are there any other (more convincing?) arguments (other than cross-validation) for using partial pooling over complete pooling?
    2. Is it possible to create a "simple" example in which we observe a more "dramatic" U-shaped cross-validation line?

    Thanks in advance.

    opened by armanboyaci 0
Owner
Richard McElreath