Scraping script for stats on covid19 pandemic status in Chiba prefecture, Japan

Overview

About

千葉県の地域別の詳細感染者統計(Excelファイル) をCSVに変換し、かつ地域別の日時感染者集計値を出力するスクリプトです。

Requirement

  • POSIX互換なシェル, e.g. GNU Bash (1)
  • curl (1)
  • python >= 3.8
  • pandas >= 1.1.3 (debian derivatives: python3-pandas >= 1.1.3)
  • xlrd >= 1.2.0 (debian derivatives: python3-xlrd >= 1.2.0)

上記以外のバージョンは動作保証の対象外となります。

Usage

取得~変換まで一括

fetchを含む全工程を一括で実施するconv.sh allが便利です。

サーバに過度な負荷をかけることのないよう、手動で行うことをおすすめします。

./conv.sh all

ファイル取得

昨日付で公開された地域別感染者数を含むxlsxファイルを取得します。 サーバに過度な負荷をかけることのないよう、手動で行うことをおすすめします。

./conv.sh fetch

取得ファイルの変換

conv.sh target FILE で千葉県の感染者データの解析結果をout配下に出力します。 実体としてはconv.py プラグインを呼び出しており、このスクリプトは千葉県専用の実装です。

./conv.sh target data/1013kansensya.xslx

Testing

本スクリプトは、変換後のデータ形式のみをテスト対象としています。 conv.py へのコミットを行う場合には、生成データ(data.csv, data-analyzed.csv) の形式を検証頂きますようお願いします。

データ形式テストには shellspec と GNU grep (1) が必要です。

データの正確性については、現時点で十分に確認できていません。ご協力いただける方はイシューを立てていただけますでしょうか。

Credit

千葉県庁公式のコロナ統計公表ページ:「新型コロナウイルス感染症患者等の県内発生状況について」のページ内リンクより取得したxlsxファイルを利用しています。 感染症対策に尽力されている行政職員、医療従事者の皆様に心より敬意を表します。

fixture配下のテスト用データについては千葉県の公表統計に属するため、CC-BY-4.0 にてライセンスされますfixture配下を除く本リポジトリの素材はCC-BY-SA-4.0 にて Conv4Japan Contributor によりライセンスされます。

You might also like...
🥫 The simple, fast, and modern web scraping library
🥫 The simple, fast, and modern web scraping library

About gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies. I

Transistor, a Python web scraping framework for intelligent use cases.
Transistor, a Python web scraping framework for intelligent use cases.

Web data collection and storage for intelligent use cases. transistor About The web is full of data. Transistor is a web scraping framework for collec

A pure-python HTML screen-scraping library

Scrapely Scrapely is a library for extracting structured data from HTML pages. Given some example web pages and the data to be extracted, scrapely con

A repository with scraping code and soccer dataset from understat.com.
A repository with scraping code and soccer dataset from understat.com.

UNDERSTAT - SHOTS DATASET As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goa

Minimal set of tools to conduct stealthy scraping.

Stealthy Scraping Tools Do not use puppeteer and playwright for scraping. Explanation. We only use the CDP to obtain the page source and to get the ab

crypto currency scraping
crypto currency scraping

SCRYPTO What ? Crypto currencies scraping (At the moment, only bitcoin and ethereum crypto currencies are supported) How ? A python script is running

Scraping Thailand COVID-19 data from the DDC's tableau dashboard

Scraping COVID-19 data from DDC Dashboard Scraping Thailand COVID-19 data from the DDC's tableau dashboard. Data is updated at 07:30 and 08:00 daily.

A Web Scraping Program.

Web Scraping AUTHOR: Saurabh G. MTech Information Security, IIT Jammu. If you find this repository useful. I would appreciate if you Star it and Fork

PyQuery-based scraping micro-framework.

demiurge PyQuery-based scraping micro-framework. Supports Python 2.x and 3.x. Documentation: http://demiurge.readthedocs.org Installing demiurge $ pip

Owner
Conv4Japan
Convert, convert and CONVERT for neighbors!
Conv4Japan
A powerful annex BUBT, BUBT Soft, and BUBT website scraping script.

Annex Bubt Scraping Script I think this is the first public repository that provides free annex-BUBT, BUBT-Soft, and BUBT website scraping API script

Md Imam Hossain 4 Dec 3, 2022
Poolbooru gelscraper - a simple python script for scraping images off gelbooru pools.

poolbooru_gelscraper a simple python script for scraping images off gelbooru pools. modules required:requests_html, and os by default saves files with

savantshuia 1 Jan 2, 2022
API which uses discord to scrape NameMC searches/droptime/dropping status of minecraft names

NameMC Scrape API This is an api to scrape NameMC using message previews generated by discord. NameMC makes it a pain to scrape their website, but som

Twilak 2 Dec 22, 2021
OSTA web scraper, for checking the status of school buses in Ottawa

OSTA-La-Vista OSTA web scraper, for checking the status of school buses in Ottawa. Getting Started Using a Raspberry Pi, download Python 3, and option

null 1 Jan 28, 2022
Web Scraping Framework

Grab Framework Documentation Installation $ pip install -U grab See details about installing Grab on different platforms here http://docs.grablib.

null 2.3k Jan 4, 2023
Visual scraping for Scrapy

Portia Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web pag

Scrapinghub 8.7k Jan 5, 2023
Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy Overview Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pag

Scrapy project 45.5k Jan 7, 2023
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Jan 8, 2023
Async Python 3.6+ web scraping micro-framework based on asyncio

Ruia ??️ Async Python 3.6+ web scraping micro-framework based on asyncio. ⚡ Write less, run faster. Overview Ruia is an async web scraping micro-frame

howie.hu 1.6k Jan 1, 2023
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

trafilatura: Web scraping tool for text discovery and retrieval Description Trafilatura is a Python package and command-line tool which seamlessly dow

Adrien Barbaresi 704 Jan 6, 2023