A repository with scraping code and soccer dataset from understat.com.

douglasbc

Last update: Jan 3, 2023


Overview

UNDERSTAT - SHOTS DATASET

As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goals (xG) stats for every shot taken in the top 5 leagues in Europe, as well as the Russian league.

After watching an awesome tutorial by McKay Johns (great channel btw, loads of resources for beginners in soccer analytics), I decided to write some code to scrape all the shots data available on Understat. As a result, I generated this dataset, which contains shots data from the 2014/2015 season through every match played so far in the 2020/2021 season, for the top division in the following countries:

England - EPL

Spain - La Liga

Germany - Bundesliga

Italy - Serie A

France - Ligue 1

Russia - RFPL
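For anyone curious how shots data can be pulled from a match page, here is a minimal sketch of a commonly used approach. It is not necessarily how the code in "scraping" does it. It assumes that an Understat match page (assumed URL pattern `https://understat.com/match/<id>`) embeds its shots in an inline script as `var shotsData = JSON.parse('...')`, with the JSON text hex-escaped (`\x7B` for `{` and so on), and that the decoded object holds home and away shots under `"h"` and `"a"`. The page layout can change at any time, so treat these as assumptions.

```python
import json
import re
import urllib.request

# Assumption: match pages contain  var shotsData = JSON.parse('<hex-escaped JSON>');
SHOTS_PATTERN = re.compile(r"var\s+shotsData\s*=\s*JSON\.parse\('(.*?)'\)", re.DOTALL)


def fetch_match_html(match_id: int) -> str:
    """Download a match page (the URL pattern is an assumption)."""
    url = f"https://understat.com/match/{match_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def extract_shots(html: str) -> dict:
    """Return the decoded shotsData object embedded in a match page."""
    match = SHOTS_PATTERN.search(html)
    if match is None:
        raise ValueError("shotsData not found in page")
    # Turn \xNN escapes back into characters, then parse the JSON text.
    # Non-ASCII player names may need extra handling depending on how they are escaped.
    decoded = match.group(1).encode("utf-8").decode("unicode_escape")
    return json.loads(decoded)
```

Usage would look like `shots = extract_shots(fetch_match_html(match_id))`, after which `shots["h"]` and `shots["a"]` (assumed keys) hold one dict per shot. Keeping the parsing separate from the download makes it easy to test against saved HTML.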

Besides shots data, I also managed to scrape very detailed season stats on every single player that took part in these matches.

The datasets are split into one folder per league, so each folder has 7 .csv files for shots data and 7 .csv files for players data (one per season since 14/15). The full dataset, with every league and season combined, is also available in the "datasets" folder.

I plan on updating the datasets every day, but I also uploaded the Python code that generates and updates them. Feel free to play with it and suggest improvements (hit me up on Twitter). To update the datasets yourself, save the "scraping" and "datasets" folders in the same parent folder, run Python with that parent folder as the current working directory, and then run the update.py script located in "scraping".
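Since each league folder holds one file per season, a common first step is to stack those files into a single table. Below is a minimal sketch using only the standard library. The `*shots*.csv` file-name pattern and the folder path in the usage line are hypothetical, since the actual file names aren't listed here; adjust them to match the repository.

```python
import csv
from pathlib import Path


def load_league_rows(folder, pattern="*shots*.csv"):
    """Read every per-season CSV in a league folder that matches `pattern`.

    Assumption: the "*shots*.csv" pattern is hypothetical; change it to match
    the real shots (or players) file names in the league folder.
    """
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    rows = []
    for path in files:
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name  # remember which season file the row came from
                rows.append(row)
    return rows
```

For example, `load_league_rows("datasets/EPL")` (hypothetical path) would return every shot from every season in that folder as a list of dicts, each tagged with the file it came from.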

Most of the columns in the datasets are self-explanatory, but some aren't, so I uploaded a couple of .pdf files to the "documentation" folder that explain every column.
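As a quick example of what the shots data is good for, the sketch below sums each player's expected goals (xG) and compares it with actual goals. The column names `player`, `xG` and `result`, and the value `"Goal"` for a scored shot, are assumptions here; check the PDFs in "documentation" for the real names before using it.

```python
from collections import defaultdict


def xg_by_player(shots):
    """Sum expected goals (xG) and count shots and goals per player.

    Assumption: each shot row has "player", "xG" and "result" keys, and a
    scored shot has result == "Goal".
    """
    totals = defaultdict(lambda: {"shots": 0, "xG": 0.0, "goals": 0})
    for shot in shots:
        entry = totals[shot["player"]]
        entry["shots"] += 1
        entry["xG"] += float(shot["xG"])  # CSV values arrive as strings
        if shot["result"] == "Goal":
            entry["goals"] += 1
    return dict(totals)
```

It accepts any list of dicts, such as the rows produced by `csv.DictReader`, so goals minus xG per player is one subtraction away.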


Comments

Error when running update.py

Hi and thanks for the interesting repo.

I tried to update the shots data with update.py, but I receive the following error:

No such file or directory: 'XXX\scraping\scraping\empty_url_update.txt'

I thought the path looked wrong because the folder "scraping" appears twice, so I tried editing matchscraper.py to use the parent directory as the working directory:

#CWD = os.getcwd()
CWD = os.path.dirname(os.getcwd())

This seems to scrape because it prints all the new match ids in the console and gives no errors, but the datasets do not seem to be updated. Any ideas?

Many thanks!

opened by dataslug1
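The doubled `scraping\scraping` in the error presumably comes from joining the current working directory with "scraping" while Python was already running inside "scraping". One way to make such paths independent of where Python is launched is to resolve them from the script's own location. This is a minimal sketch, not the repository's code; `project_paths` is a hypothetical helper, and it assumes the layout the README describes, with the script in `<root>/scraping` and the data in `<root>/datasets`.

```python
import os


def project_paths(script_file):
    """Resolve project folders from a script's location instead of os.getcwd().

    Assumption (hypothetical layout): the script lives in <root>/scraping and
    the data lives in <root>/datasets.
    """
    scraping_dir = os.path.dirname(os.path.abspath(script_file))
    root = os.path.dirname(scraping_dir)
    return {
        "root": root,
        "scraping": scraping_dir,
        "datasets": os.path.join(root, "datasets"),
    }
```

Inside a script, `paths = project_paths(__file__)` followed by `os.path.join(paths["scraping"], "empty_url_update.txt")` points at the same file regardless of the working directory.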
