Python code for working with NFL play-by-play data.

Last update: Jan 5, 2023

Overview

nfl_data_py

nfl_data_py is a Python library for interacting with NFL data sourced from nflfastR, nfldata, dynastyprocess, and Draft Scout.

Includes import functions for play-by-play data, weekly data, seasonal data, rosters, win totals, scoring lines, officials, draft picks, draft pick values, schedules, team descriptive info, combine results and id mappings across various sites.

Installation

Use the package manager pip to install nfl_data_py.

pip install nfl_data_py

Usage

import nfl_data_py as nfl

Working with play-by-play data

nfl.import_pbp_data(years, columns, downcast=True, cache=False, alt_path=None)

Returns play-by-play data for the years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30%. Will slow down initial load speed ~50%

cache : optional, determines whether to pull pbp data from github repo or local cache generated by nfl.cache_pbp()

alt_path : optional, required if nfl.cache_pbp() is called using an alternate path to the default cache

nfl.see_pbp_cols()

Returns list of columns available in play-by-play dataset
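A minimal usage sketch for the play-by-play import, based only on the signature documented above. The helper names `season_list` and `load_pbp` are hypothetical and not part of nfl_data_py; `load_pbp` assumes the package is installed and that network access is available.

```python
EARLIEST_SEASON = 1999  # earliest play-by-play season, per the parameter notes above


def season_list(first, last):
    """Return an inclusive list of seasons, rejecting years before 1999."""
    if first < EARLIEST_SEASON:
        raise ValueError(f"play-by-play data starts in {EARLIEST_SEASON}")
    if last < first:
        raise ValueError("last season must not precede first season")
    return list(range(first, last + 1))


def load_pbp(first, last, columns=None):
    """Hypothetical wrapper around nfl.import_pbp_data (needs network access)."""
    # Imported lazily so season_list() can be used without the package installed.
    import nfl_data_py as nfl

    return nfl.import_pbp_data(season_list(first, last), columns, downcast=True)
```

For example, `load_pbp(2020, 2021, columns=["epa", "week"])` would request two seasons restricted to two columns. Validating the years up front gives a clearer error than a failed download.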

Working with weekly data

nfl.import_weekly_data(years, columns, downcast)

Returns weekly data for the years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30%. Will slow down initial load speed ~50%

nfl.see_weekly_cols()

Returns list of columns available in weekly dataset

Working with seasonal data

nfl.import_seasonal_data(years)

Returns seasonal data, including various calculated market share stats

years : required, list of years to pull data for (earliest available is 1999)

Additional data imports

nfl.import_rosters(years, columns)

Returns roster information for years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

nfl.import_win_totals(years)

Returns win total lines for years specified

years : optional, list of years to pull

nfl.import_sc_lines(years)

Returns scoring lines for years specified

years : optional, list of years to pull

nfl.import_officials(years)

Returns official information by game for the years specified

years : optional, list of years to pull

nfl.import_draft_picks(years)

Returns list of draft picks for the years specified

years : optional, list of years to pull

nfl.import_draft_values()

Returns relative values by generic draft pick according to various popular valuation methods

nfl.import_team_desc()

Returns dataframe with color/logo/etc information for all NFL teams

nfl.import_schedules(years)

Returns dataframe with schedule information for years specified

years : required, list of years to pull data for (earliest available is 1999)

nfl.import_combine_data(years, positions)

Returns dataframe with combine results for years and positions specified

years : optional, list or range of years to pull data from

positions : optional, list of positions to be pulled (standard format - WR/QB/RB/etc.)

nfl.import_ids(columns, ids)

Returns dataframe with mapped ids for all players across most major NFL and fantasy football data platforms

columns : optional, list of columns to return

ids : optional, list of ids to return

nfl.import_ngs_data(stat_type, years)

Returns dataframe with specified NGS data

stat_type : required, type of data (passing, rushing, receiving)

years : optional, list of years to return data for

nfl.import_depth_charts(years)

Returns dataframe with depth chart data

years : optional, list of years to return data for

nfl.import_injuries(years)

Returns dataframe of injury reports

years : optional, list of years to return data for

nfl.import_qbr(years, level, frequency)

Returns dataframe with QBR history

years : optional, years to return data for

level : optional, competition level to return data for, nfl or college, default nfl

frequency : optional, frequency to return data for, weekly or season, default season

nfl.import_pfr_passing(years)

Returns dataframe of PFR passing data

years : optional, years to return data for

nfl.import_snap_counts(years)

Returns dataframe with snap count records

years : optional, list of years to return data for

Additional features

nfl.cache_pbp(years, downcast=True, alt_path=None)

Caches play-by-play data locally to speed up subsequent loads. Years that have already been cached are overwritten, so when using the cache in-season, re-cache once per week to pick up the most recent data

years : required, list or range of years to cache

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30%. Will slow down initial load speed ~50%

alt_path : optional, alternate path to store the pbp cache; the default is a program-created folder in the user's Local directory
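The weekly re-caching advice above can be sketched as a small staleness check. This is only an illustration: `cache_is_stale` and `refresh_cache` are hypothetical helpers, the seven-day threshold is an assumption drawn from "once per week", and the caller is assumed to track when the cache was last built.

```python
from datetime import datetime, timedelta


def cache_is_stale(last_cached, now=None, max_age=timedelta(days=7)):
    """Return True when the cache is at least `max_age` old."""
    now = now or datetime.now()
    return now - last_cached >= max_age


def refresh_cache(years, last_cached, alt_path=None):
    """Re-run nfl.cache_pbp only when the cache is stale; True if refreshed."""
    if not cache_is_stale(last_cached):
        return False
    # Lazy import: requires nfl_data_py to be installed and network access.
    import nfl_data_py as nfl

    nfl.cache_pbp(years, downcast=True, alt_path=alt_path)
    return True
```

After a refresh, `nfl.import_pbp_data(years, cache=True)` reads from the local copy, passing the same `alt_path` if one was used.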

nfl.clean_nfl_data(df)

Runs descriptive data (team name, player name, etc.) through various cleaning processes

df : required, dataframe to be cleaned

Recognition

I'd like to recognize Ben Baldwin, Sebastian Carl, and Lee Sharpe for making this data freely available and easy to access. I'd also like to thank Tan Ho, who has been an invaluable resource as I've worked through this project, and Josh Kazan for the resources and assistance he's provided.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Comments

Unable to import data

I am trying to run:

import nfl_data_py as nfl
nfl.import_seasonal_data([2010])

but I get the following error:

OverflowError: value too large
Exception ignored in: 'fastparquet.cencoding.read_bitpacked'

I have tried uninstalling and reinstalling fastparquet, which did not work, and the same issue has arisen in my other function calls.

More specifically:

line 530, in read_col
part[defi == max_defi] = dic[val]
IndexError: index 8389024 is out of bounds for axis 0 with size 94518

opened by samob917 6

Caching

I think the library would benefit from having a caching strategy for downloaded or processed files. One option is to download the files outside of pandas and use an http cache. This tends to be temporary, no more than 7 days. Another option is to give the user the option to read from and save to a cache, which could just involve reading/writing parquet files from the existing data directory, which would be more permanent, or to the system tmp directory, which is more transient. Let me know if either option is of interest.
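The second option (read from and save to a cache directory) could look roughly like the sketch below. It is not the library's implementation: `cached_fetch`, the `fetch` callable, and the `nfl_cache_demo` folder name are all hypothetical, and the cache stores raw bytes rather than parsed parquet data.

```python
import tempfile
from pathlib import Path


def cached_fetch(name, fetch, cache_dir=None):
    """Return cached bytes for `name`, calling `fetch()` only on a cache miss."""
    # Default to a transient folder under the system temp directory.
    if cache_dir is None:
        directory = Path(tempfile.gettempdir()) / "nfl_cache_demo"
    else:
        directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)

    path = directory / name
    if path.exists():
        return path.read_bytes()

    data = fetch()  # e.g. a function that downloads the file
    path.write_bytes(data)
    return data
```

Passing a permanent `cache_dir` gives the longer-lived behaviour described above, while the default temp location is the transient variant.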

opened by sansbacon 6

read_parquet engine parameter: 'fastparquet' vs. default 'auto' argument

Pandas will try pyarrow and fall back to fastparquet if pyarrow is unavailable. Is there a specific reason you are specifying engine='fastparquet'? If not, I think it would be better to leave it as the default 'auto' argument.
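The fallback order described here can be imitated with the standard library to see which engine 'auto' would likely select on a given machine. This is a sketch of the idea, not pandas' own code; `pick_parquet_engine` is a hypothetical helper.

```python
import importlib.util


def pick_parquet_engine(candidates=("pyarrow", "fastparquet")):
    """Return the first candidate module that is installed, in priority order."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    raise ImportError(f"none of {candidates} is installed")
```

Calling `pick_parquet_engine()` returns "pyarrow" when it is installed and "fastparquet" otherwise, which mirrors the order given in the comment above.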

opened by sansbacon 3

ID Tables Small Issue?
Hi, i'm a newb and this is my first issue post so please delete if this is wrong. I suspect your code is fine and that the source data has some bugs, but i'm not sure who to alert to help them out.

I believe there are some slight data issues on the player ID table.

probably some more than listed below, but i have to run for now. these are pretty small things, doubtful they will be relevant to anything imminent for anyone... i ran into it on a merge i was doing on gsis_id that didn't like my many-to-one relationship because of it. i'm happy to share and/or try to help fix if manual edits are an option (i stink at coding though)

gsis_id has four duplications: 00-0020270, 00-0019641, 00-0016098, and 00-0029435. On further research, each of these same cases also has a duplication of the pff_id. There are no other pff_id duplications.

espn_ids have some duplication: 17257, 2578554, 2574009, 5774, 5730, 12771, 2516049, 13490, 2574010, 14660, 2582138, 16094, 17101. i'm not sure the source of all of these - some seem to be extremely similar names but different people and others seem to be typos (e.g. 5774 one of them should simply be 15774).

yahoo_ids have one dup: 33495. the one from Duke should be 33542
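A duplicate check of this kind can be reproduced with a short helper before attempting a many-to-one merge. The sketch below uses plain Python lists; `duplicated_ids` is a hypothetical name, the sample list is made up for illustration (it reuses ids mentioned above), and `None` entries are skipped on the assumption that a missing id should not count as a duplicate.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the ids that occur more than once, in first-seen order."""
    counts = Counter(i for i in ids if i is not None)
    return [i for i, n in counts.items() if n > 1]


# Made-up arrangement of ids, for illustration only.
sample = ["00-0020270", "00-0019641", "00-0020270", None, "00-0019641", "00-0016098"]
print(duplicated_ids(sample))  # ['00-0020270', '00-0019641']
```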
opened by robertryanharrison 2

.import_ngs_data() HTTPError when filtering for year(s)

HTTPError: HTTP Error 404: Not Found

I receive a 404: not found error when I try to import ngs data from a specific year:

This works just fine and returns the dataframe:

ngs_data = nfl.import_ngs_data('passing')
ngs_data

When I add a specific year argument, this breaks:

ngs_data = nfl.import_ngs_data('passing', [2020])
ngs_data

Yes: I realize that nfl.import_ngs_data('passing') is comprehensive of all seasons that are available and I can filter with that.

I don't know much about these different file formats, but when I look in nflverse-data/releases/nextgen_stats it looks like there are no .parquet files specific to a season (only .qs, .csv.gz, .rds), only the larger ones (sans season) called in the first block:

So I think something just needs to be changed around here?

Thanks for putting together this awesome library!! It's been a huge help as I get ready for the season.
opened by jbf302 2

What version of Python is this for? Getting an error

I have tried to install it for both Python 3.8 and 3.6 and I am getting this error:

Collecting pandas>1 (from nfl_data_py) Could not find a version that satisfies the requirement pandas>1 (from nfl_data_py) (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3)

opened by epmck 2

Reduce memory usage of multi-year pbp dataframe
If you load 20 years of data into one dataframe, you could start pushing up against memory limits on certain users' computers. You can reduce memory usage about 30% by converting float64 values to float32 when loading the yearly play-by-play data.

cols = df.select_dtypes(include=[np.float64]).columns
df.loc[:, cols] = df.loc[:, cols].astype(np.float32)

On my computer, this reduced the memory usage of a single year from 129.7 MB to 94.5 MB. I don't think the lost precision is going to matter for anything we are doing with football stats.
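The figures above can be checked with a little arithmetic. A float64 value takes 8 bytes and a float32 value 4, so downcasting halves the float columns; the sketch below assumes that only those columns changed size, and `float64_share` is a hypothetical helper name.

```python
def float64_share(before_mb, after_mb):
    """Estimate the fraction of memory originally held by float64 columns.

    Halving those columns saves half of their share, so the share is
    twice the relative saving.
    """
    saved = before_mb - after_mb
    return 2 * saved / before_mb


before, after = 129.7, 94.5  # MB, from the measurement quoted above
reduction = (before - after) / before
print(f"reduction: {reduction:.1%}")                          # reduction: 27.1%
print(f"float64 share: {float64_share(before, after):.1%}")   # float64 share: 54.3%
```

Under that assumption, a little over half of the original frame was float64 data, which is consistent with the "about 30%" estimate.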

If you are interested, I can submit a pull request that implements this change. You could also make it optional with the default being to downcast but allow the user to override if they want np.float64 as the dtype.
opened by sansbacon 2

HTTP Error 404: Not Found

There's a chance the links have changed in nflfastR. This link doesn't seem to work: data = pandas.read_parquet(r'https://github.com/nflverse/nflfastR-data/raw/master/data/player_stats.parquet', engine='auto')

Using Python v3.10.7 with nfl_data_py v0.2.5

Call: nfl.import_weekly_data([2021, 2022])

Error:

HTTPError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15024/1768614258.py in
----> 1 nfl.import_weekly_data([2021, 2022])

c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\nfl_data_py\__init__.py in import_weekly_data(years, columns, downcast)
    215
    216 # read weekly data
--> 217 data = pandas.read_parquet(r'https://github.com/nflverse/nflfastR-data/raw/master/data/player_stats.parquet', engine='auto')
    218 data = data[data['season'].isin(years)]
    219

c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parquet.py in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, **kwargs)
    493 impl = get_engine(engine)
    494
--> 495 return impl.read(
    496 path,
    497 columns=columns,

c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parquet.py in read(self, path, columns, use_nullable_dtypes, storage_options, **kwargs)
    230 to_pandas_kwargs["split_blocks"] = True # type: ignore[assignment]
    231
--> 232 path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
    233 path,
    234 kwargs.pop("filesystem", None),
...
--> 643 raise HTTPError(req.full_url, code, msg, hdrs, fp)
    644
    645 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

opened by andre03051 1

Pandas version incompatibility

Per discussion in nflverse discord.

Release 0.2.7 updated Pandas to a version not compatible with Python 3.6. This requirement should be reverted if possible. Alternatively if functionality in the updated Pandas is found to be needed, then the package metadata should be updated to specify it requires Python >= 3.7.
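The second alternative amounts to declaring a minimum interpreter version. In package metadata this is usually done with the setuptools `python_requires=">=3.7"` keyword; the same rule can also be expressed as a runtime check, sketched below with a hypothetical helper `python_is_supported`.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version proposed in the issue above


def python_is_supported(version_info=None):
    """Return True when the (major, minor) version meets MIN_PYTHON."""
    current = tuple(version_info or sys.version_info)[:2]
    return current >= MIN_PYTHON
```

With no argument the check uses the running interpreter; passing a tuple such as `(3, 6, 15)` makes it easy to test other versions.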

opened by alecglen 1

Pandas .append method deprecation

nfl.import_pbp_data gives the warning below. The method should be updated to use concat rather than append.

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
plays = plays.append(raw)

opened by martydertz 1

import_pbp_data function not working

When trying to run the function import_pbp_data, I receive the following error message: No such file or directory: [path]

Code:

import nfl_data_py as nfl
data = nfl.import_pbp_data([2021])

opened by Josephhero 1

Missing EPA data

In week 6 of 2021, the Chicago Bears lost 24-14 to the Green Bay Packers: https://www.nfl.com/games/packers-at-bears-2021-reg-6. When trying to load the EPA data for this game, I get an empty dataframe. Could someone let me know if I am doing something wrong?

import nfl_data_py as nfl

seasons = [2021]
cols = ['epa', 'week', 'possession_team']

nfl.cache_pbp(seasons, downcast=True, alt_path=None)
data = nfl.import_pbp_data(seasons, cols, cache=True)

print(data.loc[(data['possession_team'] == 'CHI') & (data['week'] == 5)])
print(data.loc[(data['possession_team'] == 'CHI') & (data['week'] == 6)])

opened by pphili 2

Changed downcast code to use df[cols] instead of df.loc[:, cols]

pandas was giving me the following FutureWarning when importing data:

/nfl_data_py/__init__.py:137: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt
to set the values inplace instead of always setting a new array. To retain the old behavior, use either
`df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
    plays.loc[:, cols] = plays.loc[:, cols].astype(numpy.float32)

This change seems to have eliminated the warning.

opened by wstrausser 0

import_win_totals() years not optional

The GitHub README and PyPI docs say years is optional, but years is required. This is a bit difficult because not all years are in the data, so it's hard to know why 2022 doesn't show.

Suggest making argument years=None and adding the following in the method/lambda: if years is None: years = []
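The suggested default could behave as in the sketch below, where an omitted or empty `years` means "return everything". This is an illustration on plain dictionaries rather than the library's code: `filter_by_years` is a hypothetical helper and the `season` key is an assumed column name.

```python
def filter_by_years(rows, years=None):
    """Return rows whose 'season' is in `years`; None or empty means all rows."""
    if years is None:
        years = []
    if not years:
        return list(rows)
    wanted = set(years)
    return [row for row in rows if row["season"] in wanted]


# Made-up rows, for illustration only.
rows = [{"season": 2020, "team": "CHI"}, {"season": 2021, "team": "CHI"}]
print(filter_by_years(rows))          # both rows
print(filter_by_years(rows, [2021]))  # only the 2021 row
```

Requesting a year that is not in the data, such as 2022 here, returns an empty list rather than raising, so callers may still want to check which seasons are present.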

Let me know how I can contribute. --David

opened by DavidAllyn68 0

Data Dictionary

Hey guys - thanks for pulling this all together!!! Do you have a data dictionary explaining the columns? Most are self explanatory, but a few are cryptic (e.g. in weekly_data dakota, pacr, racr, wopr, and ..._epa).

opened by DavidAllyn68 2

Missing Personnel Data Week 4 NFL

It appears that the values for formation data (offense/defense), players on the play (offense/defense), men in the box, # of pass rushers, etc. are unavailable for 2022 Week 4 data (which is otherwise available). Is this a known issue, an intentional omission, or something else? Where is this data sourced from?

Wondering if it will be available in the near future or if that is a result of an availability issue

opened by jwald3 1

Releases(v0.3.0)

v0.3.0(Aug 20, 2022)

Added import functionality for participation, contract, officials, and player data made previously available through nflReadR
v0.2.11(Aug 20, 2022)
Actually fixed the compatibility issue between Python and pandas that was not resolved in 0.2.9

Dropped python 3.5 support from nfl_data_py to allow for parquet file usage

Fixed position filtering for combine data

v0.2.8(Jul 30, 2022)

Fixed deprecation warning for import_pbp, adjust import_ngs to handle years correctly.
v0.2.7(Jun 4, 2022)

- Now getting data from updated data sources
- Fixed bug that was impeding the import_weekly_data() function
v0.2.6(Mar 15, 2022)

- Cache functionality should work on all systems
- PFR passing and snaps data redirected to new data location
v0.2.4(Aug 28, 2021)

Added functionality for caching pbp data locally to speed up load process by 4-5x.
v0.2.0(Aug 11, 2021)
New release includes functions for pulling:

NGS data

snap counts

depth charts

injury reports

PFR passing stats

And cleaned up repo.
0.1.6(Aug 7, 2021)

- data imports can now use either pyarrow or fastparquet
- import_schedules() directs to new file with more data
- clean_df() includes feature for replacing 'NA' with np.nan
v0.1.5(Aug 2, 2021)

v0.1.4(Aug 2, 2021)

Added default downcasting of float64s to float32 to reduce memory usage. Will slow down initial data load. Can be turned off by setting downcast=False in import_weekly_data() or import_pbp_data().
v0.1.3(Jul 31, 2021)

Added pulls for combine data and mapping table with ids for a variety of NFL/fantasy sites
v0.1.2(Jul 29, 2021)

v0.1.0(Jul 29, 2021)

Added functions for pulling betting lines, officials data, and draft pick data.
v0.0.5(Jul 27, 2021)

Added error checking + functions to pull schedule and team descriptive data.
v0.0.4(Jul 26, 2021)

First public version of nfl_data_py
