Overview
This project uses historical baseball games data to calculate an Elo-like rating for MLB teams based on regular season match ups. The Elo rating system was originally developed for ranking chess players but also can be applied to other individual and team sports. In particular it works well for measuring performance of baseball teams because of the large number of games played per season.
Wikipedia has more technical details on the rating system: https://en.m.wikipedia.org/wiki/Elo_rating_system
A similar analysis was done by 538: https://projects.fivethirtyeight.com/complete-history-of-mlb/
This repo consists of three primary pieces of code:
Parser.py
imports and cleans the game-level data. It performs basic calculations, such as determining game winners.Elo.py
calculates Elo ratings for a given set of games dataVisualization.R
plots the output in a variety of interesting ways
Data
Source: https://www.retrosheet.org/gamelogs/index.html
This data was compiled by Retosheet.org. The only stipulation for use of this data is prominent display of this statement:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".
The formats of the data can be found in the Formats.py
module. The Parser.py
script imports the annual files, applies some cleaning and formatting, and stores the consolidated data in a SQLite database for later use.
Examples
This graph shows the distribution of each team's Elo rating over the course of the decade 2010-2019. The ratings are weighted by in-season days.
This graph shows the progress of the 5 teams in the AL West over the 5-year period starting in 2014.
Each team was also ranked by their Elo relative to the rest of the league on each day. The graph below shows the time spent at each rank for the 5 teams in the AL West. The gradient indicates the years that those ranks occured.
Future improvements
- Finish tuning parameters for best model performance
- Expand analysis to earlier time periods (requires handling of teams entering/leaving league)