sportsdataverse python package

Overview

sportsdataverse-py

See CHANGELOG.md for details.

The goal of sportsdataverse-py is to provide the community with a Python package for working with sports data as a companion to the cfbfastR, hoopR, and wehoop R packages. Beyond easing data aggregation and tidying, sportsdataverse-py also supports benchmarking open-source expected points and win probability metrics for American football.

Installation

sportsdataverse-py can be installed via pip:

pip install sportsdataverse

or from the repo (which may at times be more up to date):

git clone https://github.com/saiemgilani/sportsdataverse-py
cd sportsdataverse-py
pip install -e .
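
After either install route, it can help to confirm which version the environment actually picked up. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name and is not part of sportsdataverse.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("sportsdataverse")
    print(version if version is not None else "sportsdataverse is not installed")
```

Returning None instead of raising keeps the check usable in scripts that only want to report the state of the environment.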

Our Authors

Citations

To cite the sportsdataverse-py Python package in publications, use:

BibTeX Citation

@misc{gilani_sdvpy_2021,
  author = {Gilani, Saiem},
  title = {sportsdataverse-py: The SportsDataverse's Python Package for Sports Data.},
  url = {https://sportsdataverse-py.sportsdataverse.org},
  year = {2021}
}

Comments

  • Cannot pull 2021 data from load_cfb_pbp

    Trying to access pbp data for the 2021 season leads to a HTTPError. Is there any way to make sure the data is up-to-date? I would love to be able to utilize this package during the season as well. Thank you!

    bug enhancement 
    opened by JayTheriault 4
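
Until the data for a season is published, one workaround is to skip seasons whose files cannot be fetched rather than letting the whole load fail. The sketch below assumes the failure surfaces as `urllib.error.HTTPError`, as the report suggests; `load_available_seasons` is a hypothetical helper, not part of sportsdataverse.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.error import HTTPError


def load_available_seasons(
    loader: Callable[[int], object], seasons: Iterable[int]
) -> Tuple[Dict[int, object], List[int]]:
    """Call loader(season) for each season, skipping seasons that raise HTTPError.

    Returns (loaded, missing): data keyed by season, and the seasons that failed.
    """
    loaded: Dict[int, object] = {}
    missing: List[int] = []
    for season in seasons:
        try:
            loaded[season] = loader(season)
        except HTTPError:
            # Assumption: an HTTPError means the season file is not available.
            missing.append(season)
    return loaded, missing
```

With the package installed, `loader` could wrap a call such as `sportsdataverse.cfb.load_cfb_pbp` for a single season; the exact arguments that function accepts should be checked against the installed version.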
  • ncaa march madness tournament scores not pulling

    First of all, this is an amazing package, and thank you for all of your hard work. It seems that the NCAA tournament games are missing from the schedule, play by play, and team boxscores.

    At some point, I can try my hand at a pull request, but I thought I'd open an issue first.

    opened by btatkinson 3
  • Sg/nfl loaders

    opened by saiemgilani 2
  • armstjc/fix-retrosheet-bugs RC 4

    • Fixed a glitch in multiple Retrosheet functions where the 1st row would be cut off.

    • Added failsafe conditions for Retrosheet and Retrosplits functions where the season inputted was equal to the current year, but Retrosheet and/or Retrosplits doesn't have a file currently for that season.

    • Re-wrote parts of the Retrosheet and Retrosplits functions to operate faster.

    • Bumped up version to 0.0.27

    opened by armstjc 1
  • Armstjc/add retrosheet Release Candidate 2

    • Adds Retrosheet data functionality to the sportsdataverse-py Python package.
    • Adds Retrosplits data functionality to the sportsdataverse-py Python package.
    • Cleaned up pre-existing code for MLBAM functions in sportsdataverse-py.
    • Updated docs to add references to the new, MLB-related functions.
    enhancement 
    opened by armstjc 1
  • 29 example sdvcfbcfb teams call results in error

    An edge case was identified in the sportsdataverse.cfb.cfb_teams() function, where Python would throw a 'module' object is not callable error for the user.

    One fix is to rename the function to sportsdataverse.cfb.get_cfb_teams() so that this error no longer occurs.

    opened by armstjc 1
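
One common way this error arises is when a name resolves to a module rather than to the function of the same name inside it. Whether that is exactly what happened in sportsdataverse is an assumption; the sketch below only reproduces the error message with a stand-in module built in memory.

```python
import types

# Stand-in for a module named cfb_teams that contains a function of the same
# name (hypothetical layout, used only to reproduce the error message).
cfb_teams = types.ModuleType("cfb_teams")


def _cfb_teams_function():
    return ["team data would be returned here"]


cfb_teams.cfb_teams = _cfb_teams_function

# Calling the module object itself fails with a TypeError ...
try:
    cfb_teams()
except TypeError as err:
    error_message = str(err)

# ... while calling the function stored inside the module works.
result = cfb_teams.cfb_teams()
```

Giving the function a name that differs from its module, as proposed above, removes the ambiguity regardless of how the package's imports are arranged.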
  • Cannot pull pbp data for Washington Football Team post 2018

    Since there is no mascot for the Washington Football Team post 2018, it produces the following key error:

    KeyError                                  Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_17088/3575107685.py in
    ----> 1 nfl = sportsdataverse.nfl.NFLPlayProcess("401326425").espn_nfl_pbp()

    ~\Anaconda3\lib\site-packages\sportsdataverse\nfl\nfl_pbp.py in espn_nfl_pbp(self)
        112         )
        113         awayTeamMascot = str(
    --> 114             pbp_txt["header"]["competitions"][0]["competitors"][1]["team"]["name"]
        115         )
        116         homeTeamName = str(

    KeyError: 'name'

    bug question 
    opened by peternavin 1
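
The traceback shows a chain of direct key lookups, so one missing `name` field is enough to abort the whole call. A minimal sketch of a more tolerant lookup follows; `get_nested` is a hypothetical helper, and the payload is a heavily trimmed, made-up stand-in with placeholder values rather than real ESPN output.

```python
from typing import Any, Sequence, Union

Key = Union[str, int]


def get_nested(data: Any, path: Sequence[Key], default: Any = None) -> Any:
    """Walk dicts and lists along path; return default if any step is missing."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical payload: the second competitor's team has no "name" field.
pbp_txt = {
    "header": {
        "competitions": [
            {
                "competitors": [
                    {"team": {"name": "Example Mascot"}},
                    {"team": {"location": "Example City"}},
                ]
            }
        ]
    }
}

away_path = ["header", "competitions", 0, "competitors", 1, "team", "name"]
# Falls back to an empty string instead of raising KeyError: 'name'.
awayTeamMascot = str(get_nested(pbp_txt, away_path, default=""))
```

Falling back to an empty string is only one possible choice; another would be to reuse a different field as the mascot when `name` is absent.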
  • Assists recorded in boxscores missing in play-by-play

    For the December 22, 2021, game between Wyoming and Stanford (game Id: 401372551), the sportsdataverse.mbb.load_mbb_player_boxscore(seasons=[2022]) data for this game records Hunter Maldonado (player Id: 4280267) of Wyoming as getting 8 assists. However, when going through the rows for this game from sportsdataverse.mbb.load_mbb_pbp(seasons=[2022]), Maldonado would only be credited with 6 assists, based upon the number of occurrences of "Assisted by Hunter Maldonado" in the text column, as well as the number of times his Id appears in the participants_1_athlete_id column. In particular, he should have been credited with assists on jumpers from Graham Ike with 17:16 remaining in the 1st half and 1:15 remaining in the 2nd half. Overall, Maldonado only gets credited with 190 assists in the pbp file, but he should have 197 assists for the season so far (following Wyoming's win over UNLV in the MWC conference tournament, their last game in the dataset).

    opened by perceptualJonathan 0
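
The comparison described above can be sketched as a small counting routine over play-by-play rows. It assumes each row is a dict with the `text` and `participants_1_athlete_id` columns named in the report; `count_assists` is a hypothetical helper, and the sample rows are made up purely to exercise the function.

```python
from typing import Dict, Iterable


def count_assists(
    rows: Iterable[Dict[str, object]], player_name: str, player_id: int
) -> Dict[str, int]:
    """Count assists two ways: by the play text and by the participant id column."""
    phrase = f"Assisted by {player_name}"
    by_text = 0
    by_id = 0
    for row in rows:
        if phrase in str(row.get("text", "")):
            by_text += 1
        if row.get("participants_1_athlete_id") == player_id:
            by_id += 1
    return {"by_text": by_text, "by_id": by_id}


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"text": "Player A made Jumper. Assisted by Hunter Maldonado.",
     "participants_1_athlete_id": 4280267},
    {"text": "Player A made Jumper.", "participants_1_athlete_id": None},
    {"text": "Player B made Layup. Assisted by Hunter Maldonado.",
     "participants_1_athlete_id": 4280267},
]
counts = count_assists(sample_rows, "Hunter Maldonado", 4280267)
```

If both counts agree with each other but fall short of the boxscore total, the play-by-play rows themselves are likely missing the assist attribution, which matches the discrepancy reported here. In the real data the id column may be stored as a string or float, so the equality check might need a type conversion first.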
  • Bring in all games from cfbd

    sportsdataverse.cfb.load_cfb_schedule only brings in games as far back as 2002, but cfbd actually has games back to ~1860. I don't see a reason not to bring them all in, even if there's no accompanying play-by-play.

    enhancement 
    opened by christophermclement 2
Owner
Saiem Gilani
Sports Analytics, healthcare and interesting problems