Statistical Rethinking course winter 2022

Richard McElreath

Last update: Dec 31, 2022


Overview

Statistical Rethinking (2022 Edition)

Instructor: Richard McElreath

Lectures: Pre-recorded and uploaded to <Playlist>, two per week

Discussion: Online, Fridays 3pm-4pm Central European Time

Purpose

This course teaches data analysis, but it focuses on scientific models first. The unfortunate truth about data is that nothing much can be done with it until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with high-dimensional, imperfect data of the kind that biologists and social scientists face.

Format

Online, flipped instruction. The lectures are pre-recorded. We'll meet online once a week for an hour to work through the solutions to the assigned problems.

We'll use the 2nd edition of my book, <Statistical Rethinking>. I'll provide a PDF of the book to enrolled students.

Registration: Please sign up via <[COURSE IS FULL SORRY]>. I've also set aside 100 audit tickets at the same link, for people who want to participate, but who don't need graded work and course credit.

Calendar & Topical Outline

There are 10 weeks of instruction. Links to lecture recordings will appear in this table. Weekly problem sets are assigned on Fridays and due the next Friday, when we discuss the solutions in the weekly online meeting.

Lecture playlist on Youtube: <Statistical Rethinking 2022>

Week	Meeting date	Reading	Lectures
Week 01	07 January	Chapters 1, 2 and 3	[1] <The Golem of Prague> <(Slides)> [2] <Bayesian Inference> <(Slides)>
Week 02	14 January	Chapters 4 and 5	[3] <Basic Regression> <(Slides)> [4] <Categories & Curves> <(Slides)>
Week 03	21 January	Chapters 5 and 6	[5] <Elemental Confounds> <(Slides)> [6] <Good & Bad Controls> <(Slides)>
Week 04	28 January	Chapters 7 and 8	[7] Overfitting [8] Interactions
Week 05	04 February	Chapters 9, 10 and 11	[9] Markov chain Monte Carlo [10] Binomial GLMs
Week 06	11 February	Chapters 11 and 12	[11] Poisson GLMs [12] Ordered Categories
Week 07	18 February	Chapter 13	[13] Multilevel Models [14] Multi-Multilevel Models
Week 08	25 February	Chapter 14	[15] Varying Slopes [16] Gaussian Processes
Week 09	04 March	Chapter 15	[17] Measurement Error [18] Missing Data
Week 10	11 March	Chapters 16 and 17	[19] Beyond GLMs: State-space Models, ODEs [20] Horoscopes

Coding

This course involves a lot of scripting. Students can engage with the material using either the original R code examples or one of several conversions to other computing environments. The conversions are not always exact, but they are rather complete. Each option is listed below.

Original R Flavor

For those who want to use the original R code examples in the print book, you need to install the rethinking R package. The code is all on GitHub at https://github.com/rmcelreath/rethinking/, and there are additional details about the package there, including information about using the more up-to-date cmdstanr instead of rstan as the underlying MCMC engine.
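A minimal setup sketch, assuming the dependency list and commands given in the rethinking repository's README at the time of writing; the repository's own instructions take precedence if they have changed. The cmdstanr lines are only needed if you want cmdstanr as the MCMC engine.

```r
# Dependencies used by rethinking (as listed in the package README; check the repo for updates)
install.packages(c("coda", "mvtnorm", "devtools", "loo", "dagitty", "shape"))

# Install the rethinking package itself from GitHub
devtools::install_github("rmcelreath/rethinking")

# Optional: cmdstanr as the MCMC engine, installed from the Stan R package repository
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
cmdstanr::install_cmdstan()  # downloads and builds CmdStan; needs a working C++ toolchain
```

These commands download packages over the network, so they are meant to be run once interactively rather than inside a script.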

R + Tidyverse + ggplot2 + brms

The <Tidyverse/brms> conversion is very high quality and complete through Chapter 14.

Python and PyMC3

The <Python/PyMC3> conversion is quite complete.

Julia and Turing

The <Julia/Turing> conversion is not as complete, but is growing fast and presents the Rethinking examples in multiple Julia engines, including the great <TuringLang>.

Other

There are several other conversions. See the full list at https://xcelab.net/rm/statistical-rethinking/.

Homework and solutions

I will also post problem sets and solutions. Check the folders at the top of the repository.

Comments

wrong model in lecture 9 code?

I'm trying to reproduce what you did in lecture 9. Getting stuck at the marginal/counterfactual example. Your (updated) code is this:

# simulate as if all apps from women
p_G1 <- link(m2,data=list( D=rep(1:6,times=apps_per_dept), N=rep(1,total_apps), G=rep(1,total_apps)))

But m2 is the model from the simulated data, and if I understand correctly, here we are trying to mimic the real data. So I think it should be mGD. And indeed, in your script you have this:

# OLD WRONG CODE!
#p_G1 <- link( mGD , data=list(N=dat$N,D=dat$D,G=rep(1,12)) )

I think mGD is the right model. Unfortunately when I use mGD with the updated code above, my result figure doesn't look like yours (it also doesn't look like yours when. Instead of the main peak at around 0.1, I get the main peak at 0 and a minor one at 0.2. I'm not sure what's going on and trying to figure things out. Any pointers appreciated. Thanks!
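Separately from the question of which fitted model to pass to `link()`, a minimal sketch of the marginalization step itself may help. It simulates hypothetical admissions data (all numbers, and the coding 1 = women, 2 = men, are assumptions), fits a department-stratified logistic regression with base R `glm()` as a stand-in for the lecture's Bayesian model, and then predicts every application as if it came from women and then from men, keeping each application's department. Unlike `link()`, which returns posterior samples, `predict()` gives point estimates, so this yields a single contrast rather than a distribution.

```r
set.seed(9)
apps_per_dept <- c(1500, 1000, 900, 700, 500, 400)  # hypothetical application counts
total_apps <- sum(apps_per_dept)
D <- rep(1:6, times = apps_per_dept)                 # department of each application
G <- sample(1:2, total_apps, replace = TRUE)         # 1 = women, 2 = men (assumed coding)
alpha <- c(0.5, 0.2, -0.3, -0.6, -1.0, -1.5)         # hypothetical department log-odds
beta <- c(-0.3, 0)                                   # hypothetical direct gender effect
A <- rbinom(total_apps, size = 1, prob = plogis(alpha[D] + beta[G]))
dat <- data.frame(A = A, D = factor(D), G = factor(G, levels = 1:2))

# Logistic regression stratified by department (stand-in for the lecture's model)
fit <- glm(A ~ D + G, family = binomial, data = dat)

# Counterfactual data: same department mix, gender set to 1 (then 2) for every application
nd_G1 <- data.frame(D = factor(D), G = factor(rep(1, total_apps), levels = 1:2))
nd_G2 <- data.frame(D = factor(D), G = factor(rep(2, total_apps), levels = 1:2))
p_G1 <- predict(fit, newdata = nd_G1, type = "response")
p_G2 <- predict(fit, newdata = nd_G2, type = "response")

# Marginal contrast: average the per-application difference over the department mix
marginal_contrast <- mean(p_G1 - p_G2)
true_contrast <- mean(plogis(alpha[D] + beta[1]) - plogis(alpha[D] + beta[2]))
print(c(estimated = marginal_contrast, true = true_contrast))
```

The department distribution enters only through `rep(1:6, times = apps_per_dept)`, which is the same device the lecture code uses.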

opened by andreashandel 7
error still in Lecture 9 marginalized effect example?
Hi, I think you said that the error from lecture 9 is fixed, but I'm wondering if an error is still there on slides 75 and 76?

For example on slide 75, there's this code where link refers to model m2:

# simulate as if all apps from women
p_G1 <- link(m2,data=list( D=rep(1:6,times=apps_per_dept), N=rep(1,total_apps), G=rep(1,total_apps)))

But in the lecture code, the model now refers to mGD.

# simulate as if all apps from women
p_G1 <- link(mGD,data=list( D=rep(1:6,times=apps_per_dept), N=rep(1,total_apps), G=rep(1,total_apps)))
opened by benslack19 2
many base R plots not rendering scatterplot points in slides
FYI many base R scatterplots are not rendering in the pdf slides.

Examples:

Lecture 3 slides 25-28

Lecture 3 slide 38

Lecture 3 slide 40

Lecture 4 slide 6

Lecture 4 slide 8

Lecture 5 slide 6

Lecture 5 slides 13-4

Lecture 6 slide 87

The list above doesn't cover every slide with this issue; it's just a few examples for illustration.
opened by wesslen 2
which book to buy?

which book to buy?

Statistical Rethinking: A Bayesian Course with Examples in R and STAN Hardcover – March 16 2020 by Richard McElreath (Author) ? https://www.amazon.ca/Statistical-Rethinking-Bayesian-Course-Examples/dp/036713991X/ref=sr_1_2?gclid=Cj0KCQiA8vSOBhCkARIsAGdp6RRjxXH1pEPmwImkgZcRpJt3Dg4TtOtwemUP5kBetPmUsQKQW2WtA2IaAnkWEALw_wcB&hvadid=229990286577&hvdev=c&hvlocphy=9000920&hvnetw=g&hvqmt=e&hvrand=14759278627849499121&hvtargid=kwd-300572196203&hydadcr=3317_10311015&keywords=statistical+rethinking&qid=1641911963&sr=8-2

opened by Sandy4321 2
This event has sold out online

Hi Richard McElreath, I am a recent undergraduate with an interest in applied statistics. I am keen to learn more about statistics, but the website says "This event has sold out online". Could you please let me attend your lectures?

opened by VellalaVineethKumar 2
Unable to replicate a plot in Lecture 09

I was unable to replicate the plot of the hypothetical effect of manipulating the perception of applicants' gender. Using the code provided in the repository, I got a different plot [image not preserved] from the one shown in the lecture [image not preserved].

opened by ellen-ying 1
Add Chapter 5 to week 2?

Given the discussion of DAGs and categorical variables in the homework, would it make sense to add Chapter 5 to this week's reading on the README.md schedule?

opened by benslack19 1
Possibility to publish our solutions to exercises from the 2nd edition book?

Dear Prof. McElreath,

My colleagues and I are working through the exercises, and we are wondering whether we are allowed to make our solutions publicly available, for example on a blogdown website.

Regards, Mikhael

opened by mdmanurung 1
Question about multiplication in Bayesian Inference

Hi Richard (@rmcelreath),

Thanks for this great course! I have been reading Chapter 2 of the book, and I can't see why, in the marble example, successive multiplications produce the same result as a single count of the ways. In the example we count the ways to draw, with replacement, a blue marble, then a white one, then another blue one (BWB), given four marbles in the bag. I can't get the same result when I do the calculation in three stages like this:

Posterior = (Number of ways / Total number of ways) * Prior

Assuming Prior is 1 at the beginning.

| p | Blue | White | Blue | Product |
|------|----------|----------|----------|--------|
| 0.25 | 0.17 (1) | 0.50 (3) | 0.17 (1) | 0.0145 |
| 0.50 | 0.33 (2) | 0.33 (2) | 0.33 (2) | 0.0360 |
| 0.75 | 0.50 (3) | 0.17 (1) | 0.50 (3) | 0.0425 |

Calculate in one go:

| p | BWB |
|------|----------|
| 0.25 | 0.15 (3) |
| 0.50 | 0.40 (8) |
| 0.75 | 0.45 (9) |

For example, for p = 0.75, successive draw-and-multiply steps give a posterior probability of 0.0425, but doing it in one go (second table) gives 9 out of 20 ways = 0.45, which doesn't match 0.0425. Here I am assuming the normalization (denominator) is always 1 at every stage. Is that right?
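The marble arithmetic above can be checked directly. A minimal sketch, assuming a flat prior over the three conjectures in the tables: the one-shot count, step-by-step updating with renormalization after each draw, and the product of per-draw proportions normalized once at the end all give the same posterior. Up to rounding, the products in the first table (0.0145, 0.0360, 0.0425) are proportional to 3, 8 and 9.

```r
# Conjectures: the bag of four marbles holds 1, 2 or 3 blue ones (p = 0.25, 0.50, 0.75).
# Bags with 0 or 4 blue marbles have zero ways to produce B, W, B, so they are omitted.
n_blue <- 1:3
ways_blue  <- n_blue       # ways to draw a blue marble
ways_white <- 4 - n_blue   # ways to draw a white marble
draws <- list(ways_blue, ways_white, ways_blue)  # the sequence B, W, B

# (1) Count all ways in one go, then normalise
ways_bwb  <- ways_blue * ways_white * ways_blue  # 3, 8, 9
post_once <- ways_bwb / sum(ways_bwb)            # 0.15, 0.40, 0.45

# (2) Update one draw at a time: multiply the current plausibilities by the
#     ways for the new draw, then renormalise so they sum to 1
post_seq <- rep(1, 3)  # flat prior
for (w in draws) {
  post_seq <- post_seq * w
  post_seq <- post_seq / sum(post_seq)
}

# (3) Multiply the per-draw proportions (as in the first table) and
#     normalise once at the end
props     <- lapply(draws, function(w) w / sum(w))
post_prod <- Reduce(`*`, props)
post_prod <- post_prod / sum(post_prod)

print(rbind(post_once, post_seq, post_prod))
```

All three rows print 0.15, 0.40, 0.45: normalizing at each step or only at the end changes the scale, not the relative plausibilities.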

For the proportion-of-water globe example, I also tried to check whether calculating the probability for each observation individually gives the same product as a single calculation. Take the sequence WWWLL:

| Evidence | Likelihood | Posterior | How the posterior is calculated |
|----------|------------|-----------|---------------------------------|
| W | $p$ | $2p$ | $\frac{p \cdot 1}{\int_0^1 p \cdot 1 \, dp}$ |
| WW | $p$ | $3p^2$ | $\frac{p \cdot 2p}{\int_0^1 p \cdot 2p \, dp}$ |
| WWW | $p$ | $4p^3$ | $\frac{p \cdot 3p^2}{\int_0^1 p \cdot 3p^2 \, dp}$ |
| WWWL | $1-p$ | $20p^3 - 20p^4$ | $\frac{(1-p) \cdot 4p^3}{\int_0^1 (1-p) \cdot 4p^3 \, dp}$ |
| WWWLL | $1-p$ | $60p^3 - 120p^4 + 60p^5$ | $\frac{(1-p)(20p^3 - 20p^4)}{\int_0^1 (1-p)(20p^3 - 20p^4) \, dp}$ |

This differs from the result I get using the binomial distribution formula directly:

$\frac{5!}{2!\,3!}\, p^3 (1-p)^2 = 10p^3 - 20p^4 + 10p^5$

Same question in stack exchange: https://math.stackexchange.com/questions/4503794/bayesian-inference-multiplication
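The same check for the globe example, as a grid-approximation sketch assuming a flat prior (the 1001-point grid is an arbitrary choice): updating one toss at a time with renormalization and a single binomial update give the same posterior, and the two polynomials in the question differ only by a constant factor of 6, the normalizing constant.

```r
# Grid of candidate proportions of water
p_grid <- seq(0, 1, length.out = 1001)

# Sequential updating over W, W, W, L, L with a flat prior,
# renormalising after each observation
post_seq <- rep(1, length(p_grid))
for (obs in c("W", "W", "W", "L", "L")) {
  lik <- if (obs == "W") p_grid else 1 - p_grid
  post_seq <- post_seq * lik
  post_seq <- post_seq / sum(post_seq)
}

# One-shot update with the binomial likelihood for 3 W in 5 tosses
post_once <- dbinom(3, size = 5, prob = p_grid)
post_once <- post_once / sum(post_once)

# The binomial likelihood 10 p^3 (1-p)^2 integrates to 1/6 over p, so the
# normalised posterior is 6 times it: 60 p^3 (1-p)^2, a Beta(4, 3) density
lik_area <- integrate(function(p) dbinom(3, size = 5, prob = p), 0, 1)$value
print(lik_area)
```

The binomial formula gives a likelihood, which need not integrate to 1 over p; dividing by its area turns it into the posterior density from the table.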

opened by JerryCBH 0
External registration open?

D'oh! APOLOGIES! I now see the tweet I saw was for 2022 not 2023. Missed my chance. I'm blaming COVID.

<< Ignore the following -- and feel free to delete this "issue" if someone knows how >>

Sorry if I simply missed a memo but I'm hoping to register for the course. Opening it to external participants was mentioned in a tweet a couple of weeks ago, but I've not seen any follow-up.

Does anyone know if this is a possibility? And, if so, is there a link to the registration process?

Thanks very much for any leads!

opened by rbalshaw 0
Phoenix Wright?!

I am taking the course right now and I have just come across the Phoenix Wright meme on colliders. Thanks for bringing back good old memories! Loving the course so far!

opened by tinosai 0
lppd CV equation (text p218)

Thank you for your great book, slides and YouTube lectures. I am struggling to read through your book (2nd edition). The lppd_CV equation on page 218 of your book and on a Lecture 7 slide looks inconsistent with the lppd equation on page 210 and the lppd_IS equation on page 218. I think "log" should be placed before "1/S". Am I wrong?
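The question turns on where the log sits relative to the average over the S posterior samples. The lppd definition takes the log of the average likelihood, $\text{lppd}(y, \Theta) = \sum_i \log \frac{1}{S} \sum_s p(y_i \mid \Theta_s)$. A minimal sketch with hypothetical data and stand-in posterior draws shows that the two placements give different numbers; it does not settle which form the book intends for lppd_CV.

```r
# Numerically stable log of the mean of exp(x)
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# Hypothetical example: N observations, S stand-in posterior draws of a normal mean
set.seed(7)
N <- 20
S <- 1000
y <- rnorm(N, mean = 1, sd = 1)
mu_draws <- rnorm(S, mean = mean(y), sd = 1 / sqrt(N))

# S x N matrix of pointwise log-likelihoods log p(y_i | theta_s)
ll <- sapply(seq_len(N), function(i) dnorm(y[i], mean = mu_draws, sd = 1, log = TRUE))

# Log outside the average over draws (the lppd form), summed over observations
lppd <- sum(apply(ll, 2, log_mean_exp))
# Log inside the average (average of log-likelihoods), summed over observations
mean_log <- sum(colMeans(ll))

print(c(lppd = lppd, mean_log = mean_log))
```

By Jensen's inequality the log of an average is always at least the average of the logs, so the two versions can only agree when all draws give the same likelihood.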

opened by mitsuoxv 0
Question about interpretation of the individual intercepts in m11.4

In model m11.4, the model allows each monkey to have their own intercept but common treatment effect. I am not sure about the interpretation of the individual intercept when the treatment variable has index contrast. Does the intercept indicate the logit(p left) of an individual monkey when there is no treatment, and does this make sense when the smallest coding value of treatment is 1?

Sorry if the answer is obvious, but I haven't been able to wrap my head around this.

Thank you.
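Assuming the linear model is $\text{logit}(p_i) = \alpha_{\text{actor}[i]} + \beta_{\text{treatment}[i]}$, there is no "no treatment" category for an intercept to describe. A minimal sketch with made-up parameter values (not estimates from m11.4) shows that the likelihood only constrains each sum: shifting every intercept up and every treatment parameter down by the same amount leaves all predictions unchanged, so in a Bayesian fit the individual values are pinned down only by the priors.

```r
# Hypothetical per-actor and per-treatment parameters (not m11.4 estimates)
a <- c(-0.5, 2.0, -1.0, -1.0, -0.5, 0.5, 1.5)  # one intercept per actor
b <- c(0.0, 0.5, -0.5, 0.3)                    # one parameter per treatment index

# Every actor x treatment combination
grid <- expand.grid(actor = 1:7, treatment = 1:4)
p1 <- plogis(a[grid$actor] + b[grid$treatment])

# Shift all a up by k and all b down by k: only a[j] + b[t] enters the model
k <- 2
p2 <- plogis((a + k)[grid$actor] + (b - k)[grid$treatment])
print(all.equal(p1, p2))
```

This is why an actor's intercept is easiest to read once it is combined with a treatment parameter, for example as the predicted probability for that actor under each treatment.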

opened by thai1491 1
No residual plot code for Chapter 5 (Figure 5-4)

Hi Professor @rmcelreath, thanks for the amazing book! Loving it!

I would like to know if it is possible to share some code for the residual plots in Chapter 5, more specifically Figure 5-4.

I am trying to replicate it, but finding some difficulties. I will keep trying anyway.

All the best, Edu
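A predictor residual plot takes two steps: regress one predictor on the other, then plot the outcome against the residuals. A minimal sketch in that spirit, assuming base R `lm()` in place of the book's `quap()`/`link()` workflow and simulated stand-ins (all coefficients hypothetical) for the standardized WaffleDivorce variables:

```r
set.seed(5)
n <- 50
A <- rnorm(n)                       # median age at marriage (simulated, standardized)
M <- -0.7 * A + rnorm(n, sd = 0.7)  # marriage rate, partly caused by A (simulated)
D <- -0.6 * A + rnorm(n, sd = 0.8)  # divorce rate (simulated)

# Step 1: regress the predictor of interest (M) on the other predictor (A)
fit_MA <- lm(M ~ A)
M_resid <- residuals(fit_MA)        # marriage rate not explained by age

# Step 2: plot the outcome against those residuals
plot(M_resid, D, xlab = "Marriage rate residuals", ylab = "Divorce rate (std)")
abline(v = 0, lty = 2)              # states whose marriage rate is exactly as expected
abline(lm(D ~ M_resid))             # association of D with the leftover variation in M
```

Swapping the roles of `M` and `A` in step 1 gives the companion plot for age at marriage.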

opened by edumagol 0
More dramatic gain from partial pooling

First of all, thanks for the book and the video course. The motivation behind multilevel models is clear: partial pooling is an "adaptive compromise" between no pooling and complete pooling. In video lecture 12 (https://speakerdeck.com/rmcelreath/statistical-rethinking-2022-lecture-12?slide=40) we see the "gain" from partial pooling measured by cross-validation. But the cross-validation score of partial pooling is very similar to that of complete pooling.

For this particular example, are there any other (more convincing?) arguments (other than cross-validation) to use partial pooling against complete pooling?

Is it possible to create a "simple" example in which we observe a more "dramatic" U-shaped cross-validation line?

Thanks in advance.
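This is not the lecture's cross-validation example but a minimal simulation sketch, assuming a normal model with known group-level spread `tau` and noise `sigma` (a real multilevel model would estimate both). It compares each estimator with the true group means; when groups are small and unequal in size, partial pooling tends to beat both extremes.

```r
set.seed(12)
J <- 500                                # number of groups
tau <- 1                                # true spread of group means (assumed known)
sigma <- 2                              # observation noise (assumed known)
n_j <- sample(1:10, J, replace = TRUE)  # small, unequal group sizes
theta <- rnorm(J, mean = 0, sd = tau)   # true group means

# Observed group averages: the mean of n_j draws has sd sigma / sqrt(n_j)
ybar <- rnorm(J, mean = theta, sd = sigma / sqrt(n_j))
prec_data <- n_j / sigma^2              # precision of each group average

# No pooling: each group's own average
est_none <- ybar
# Complete pooling: one precision-weighted grand mean for every group
mu_hat <- sum(prec_data * ybar) / sum(prec_data)
est_complete <- rep(mu_hat, J)
# Partial pooling: shrink toward the grand mean, more strongly for small groups
shrink <- prec_data / (prec_data + 1 / tau^2)
est_partial <- mu_hat + shrink * (ybar - mu_hat)

# Mean squared error against the true group means
mse <- function(est) mean((est - theta)^2)
errors <- c(none = mse(est_none), complete = mse(est_complete), partial = mse(est_partial))
print(round(errors, 3))
```

Raising `n_j` or shrinking `sigma` makes no pooling competitive, while making `tau` large makes complete pooling poor; the gap is widest between those extremes.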
opened by armanboyaci 0

Owner

Richard McElreath

