Astrostatistics
Davide Gerosa
University of Milano-Bicocca, 2022.
Schedule
- Introduction
- Probability and Statistics I
- Probability and Statistics II
- Probability and Statistics III
- Classical/Frequentist Statistical Inference: I
- Classical/Frequentist Statistical Inference: II
- Classical/Frequentist Statistical Inference: III
Aims
The use of statistics is ubiquitous in astronomy and astrophysics. Modern advances are made possible by the application of increasingly sophisticated tools, often dubbed "data mining", "machine learning", and "artificial intelligence". This class provides an introduction to (some of) these statistical techniques in a very practical fashion, pairing formal derivations with hands-on computational applications. Although examples will be taken almost exclusively from the realm of astronomy, this class is appropriate for all Physics students interested in machine learning.
Important
Data mining and machine learning are computational subjects. One does not understand how to treat scientific data by reading equations on the blackboard: you will need to get your hands dirty (and this is the fun part!). You are required to come to class with a computer or any other device you can code on (larger than a smartphone, I would say...). Each class will pair theoretical explanations with hands-on exercises and demonstrations. These are the key content of the course, so please engage with them as much as possible.
Conceptual map of the class
Credits: Steve Taylor (Vanderbilt)
Textbook and Resources
The main textbook we will be using is:
"Statistics, Data Mining, and Machine Learning in Astronomy", Ivezić, Connolly, VanderPlas, and Gray. Princeton University Press, 2012.
It's a wonderful book that I keep referring to in my research. The library has a few copies. What I really like about it is that the authors provide the code behind every figure: astroml.org/book_figures. The best way to approach these topics is to study the introduction in the book, then grab the code and play with it. Make sure you get the updated edition (the one with a black cover, not orange), because all the examples have been updated to Python 3.
There are many other good resources in astrostatistics; here is a partial list. Some of them are free.
- "Statistical Data Analysis", Cowan. Oxford Science Publications, 1997.
- "Data Analysis: A Bayesian Tutorial", Sivia and Skilling. Oxford Science Publications, 2006.
- "Bayesian Data Analysis", Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin. Chapman & Hall, 2013. Free!
- "Python Data Science Handbook", VanderPlas. O'Reilly Media, 2016. Free!
- "Practical Statistics for Astronomers", Wall and Jenkins. Cambridge University Press, 2003.
- "Bayesian Logical Data Analysis for the Physical Sciences", Gregory. Cambridge University Press, 2005.
- "Modern Statistical Methods for Astronomy", Feigelson and Babu. Cambridge University Press, 2012.
- "Information Theory, Inference, and Learning Algorithms", MacKay. Cambridge University Press, 2003. Free!
- "Data analysis recipes". These are free chapters of a book by Hogg et al. that is not yet finished:
  - "Choosing the binning for a histogram" [arXiv:0807.4820]
  - "Fitting a model to data" [arXiv:1008.4686]
  - "Probability calculus for inference" [arXiv:1205.4446]
  - "Using Markov Chain Monte Carlo" [arXiv:1710.06068]
  - "Products of multivariate Gaussians in Bayesian inferences" [arXiv:2005.14199]
We will make heavy use of the Python programming language. If you need to refresh your Python skills, here are some catch-up resources and online tutorials. A strong Python programming background is essential in modern astrophysics!
- "Lectures on scientific computing with Python", R. Johansson et al.
- "Python Programming for Scientists", T. Robitaille et al.
- "Learning Scientific Programming with Python", Hill, Cambridge University Press, 2020. Supporting code: scipython.com.
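As a rough self-check of the Python level assumed here, the sketch below shows the kind of short NumPy script you should feel comfortable reading and modifying. It is an illustration only and is not taken from the course material: the function name `sample_summary`, its parameters, and the seed are hypothetical choices. It draws Gaussian samples and compares the sample mean and standard deviation with the true values.

```python
import numpy as np


def sample_summary(n_samples, mu=0.0, sigma=1.0, seed=42):
    """Draw Gaussian samples; return (sample mean, sample std, standard error of the mean)."""
    # Hypothetical helper for illustration: a seeded generator keeps the run reproducible.
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=mu, scale=sigma, size=n_samples)

    mean = samples.mean()
    std = samples.std(ddof=1)        # ddof=1 gives the unbiased variance estimator
    sem = std / np.sqrt(n_samples)   # the error on the mean shrinks as 1/sqrt(N)
    return mean, std, sem


if __name__ == "__main__":
    # The estimates should approach mu=0 and sigma=1 as the sample grows.
    for n in (10, 1000, 100000):
        mean, std, sem = sample_summary(n)
        print(f"N={n:>6d}  mean={mean:+.4f}  std={std:.4f}  sem={sem:.4f}")
```

If every line of this is clear to you, your Python is likely sufficient to get started; if not, the resources listed above are a good place to catch up.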
Credits
This class draws heavily from many others that came before it. Credit goes to:
- Stephen Taylor (Vanderbilt University), friend and collaborator: github.com/VanderbiltAstronomy/astr_8070_s21.
- Gordon Richards (Drexel University): github.com/gtrichards/PHYS_440_540.
- Jake Vanderplas (University of Washington): github.com/jakevdp/ESAC-stats-2014.
- Zeljko Ivezic (University of Washington): github.com/uw-astr-302-w18/astr-302-w18.
- Andy Connolly (University of Washington): cadence.lsst.org/introAstroML/.
- Karen Leighly (University of Oklahoma): seminar.ouml.org/.
- Adam Miller (Northwestern University): github.com/LSSTC-DSFP/LSSTC-DSFP-Sessions/.
- Jo Bovy (University of Toronto): astro.utoronto.ca/~bovy/teaching.html.
- Thomas Wiecki (PyMC Labs): twiecki.github.io/blog/2015/11/10/mcmc-sampling.