This is a web scraping project that uses BeautifulSoup and Python to scrape basic information on all the Test matches played through January 2022.

Souradeep Banerjee

Last update: Oct 10, 2022

Related tags

Data Visualization jupyter-notebook pandas python3 web-scraping beautifulsoup bs4 beautifulsoup4 requests-library-python

Overview

Scraping-test-matches-data

This is a web scraping project that uses BeautifulSoup and Python to scrape basic information on all the Test matches played through January 2022.

To see the code, open the test-match-records.ipynb file.

The data is scraped from the ESPNCricinfo Stats website using BeautifulSoup and written to CSV files using pandas.
Link to the source page: https://stats.espncricinfo.com/ci/content/records/307847.html

On the source page, the data is arranged by year. The scraper first extracts the links to the individual year pages, then scrapes each of those pages to collect the data for all Test matches.
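The two-stage scrape described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the HTML snippets, the link paths, the column names and the helper names extract_year_links and parse_matches are all assumptions made for the example, and the real ESPNCricinfo markup may well differ. It parses inline sample HTML so that it runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://stats.espncricinfo.com"

# Hypothetical stand-in for the year index page (real markup may differ).
SAMPLE_INDEX_HTML = """
<div>
  <a href="/year/1877.html">1877</a>
  <a href="/year/1879.html">1879</a>
  <a href="/records/index.html">Records home</a>
</div>
"""

# Hypothetical stand-in for a single year's results page.
SAMPLE_YEAR_HTML = """
<table>
  <tr><th>Team 1</th><th>Team 2</th><th>Winner</th><th>Ground</th></tr>
  <tr><td>Australia</td><td>England</td><td>Australia</td><td>Melbourne</td></tr>
</table>
"""


def extract_year_links(html, base_url=BASE_URL):
    """Stage 1: map each four-digit year label to an absolute URL."""
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    for anchor in soup.find_all("a", href=True):
        label = anchor.get_text(strip=True)
        # Keep only anchors whose visible text looks like a year.
        if label.isdigit() and len(label) == 4:
            links[int(label)] = urljoin(base_url, anchor["href"])
    return links


def parse_matches(html):
    """Stage 2: turn the first table on a year page into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells and is skipped
            rows.append(dict(zip(headers, cells)))
    return rows


year_links = extract_year_links(SAMPLE_INDEX_HTML)
matches = parse_matches(SAMPLE_YEAR_HTML)
```

In a live run, the HTML strings would instead come from fetching the source page and each URL in year_links, for example with the requests library.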

From the scraped year links, the By_Year folder is created, containing one CSV file for each year's matches. These CSV files are then read and combined into a master CSV file containing all the matches, stored as All_Matches.csv.
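One way the per-year files might be combined with pandas is sketched below. The function name combine_year_files, the column names and the sample rows are assumptions for illustration only; the notebook may do this differently. The example works in a temporary directory so that it does not touch any real files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_year_files(by_year_dir, output_path):
    """Read every CSV in by_year_dir and write one combined CSV."""
    csv_paths = sorted(Path(by_year_dir).glob("*.csv"))
    frames = [pd.read_csv(path) for path in csv_paths]
    all_matches = pd.concat(frames, ignore_index=True)
    all_matches.to_csv(output_path, index=False)
    return all_matches


with tempfile.TemporaryDirectory() as tmp:
    by_year = Path(tmp) / "By_Year"
    by_year.mkdir()

    # Illustrative rows only; the real files hold the scraped data.
    pd.DataFrame(
        [{"Team 1": "Australia", "Team 2": "England", "Ground": "Melbourne", "Year": 1877}]
    ).to_csv(by_year / "1877.csv", index=False)
    pd.DataFrame(
        [{"Team 1": "Australia", "Team 2": "England", "Ground": "Melbourne", "Year": 1879}]
    ).to_csv(by_year / "1879.csv", index=False)

    master_path = Path(tmp) / "All_Matches.csv"
    all_matches = combine_year_files(by_year, master_path)
    master_exists = master_path.exists()
```

Sorting the file paths keeps the combined rows in year order as long as the files are named by year, which is an assumption of this sketch.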

The All_Matches.csv file is then used to split the data into further folders such as By_Ground, By_Team and By_Hosting_Nation.
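The splitting step could look something like the sketch below, which groups the master table by one column and writes one CSV file per distinct value. The helper name split_by_column, the "Ground" column and the sample rows are hypothetical; the notebook's actual column names and file naming may differ.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_by_column(df, column, out_dir):
    """Write one CSV per distinct value of `column` into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for value, group in df.groupby(column):
        # Replace spaces so the value makes a tidy file name.
        file_name = f"{str(value).replace(' ', '_')}.csv"
        group.to_csv(out_dir / file_name, index=False)
        written.append(file_name)
    return sorted(written)


# Illustrative stand-in for the contents of All_Matches.csv.
all_matches = pd.DataFrame(
    [
        {"Team 1": "Australia", "Team 2": "England", "Ground": "Melbourne"},
        {"Team 1": "Australia", "Team 2": "England", "Ground": "Melbourne"},
        {"Team 1": "England", "Team 2": "Australia", "Ground": "The Oval"},
    ]
)

with tempfile.TemporaryDirectory() as tmp:
    ground_files = split_by_column(all_matches, "Ground", Path(tmp) / "By_Ground")
    melbourne = pd.read_csv(Path(tmp) / "By_Ground" / "Melbourne.csv")
```

The same helper could be pointed at a host-nation column to fill By_Hosting_Nation. A By_Team split would need extra handling, since each match involves two teams.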

You may find some anomalies in the CSV files in the Host Team column. Those anomalies are explained in the Jupyter Notebook.

The above dataset is also uploaded to Kaggle: https://www.kaggle.com/bong952/test-matches-played-from-1877-jan-2022
The Jupyter notebook was originally posted and edited on Jovian: https://jovian.ai/ash007online/test-match-records

This is my first web scraping project. Kindly give it a star if you like it!


