wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information

Overview

Python based Wikidata framework for easy dataframe extraction

wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information. The goal is to create an intuitive interface so that Wikidata can function as a common read-write repository for public statistics.

Installation

wikirepo can be downloaded from PyPI via pip or sourced directly from this repository:

pip install wikirepo
git clone https://github.com/andrewtavis/wikirepo.git
cd wikirepo
python setup.py install
import wikirepo

Data

wikirepo's data structure is built around Wikidata.org. Human-readable access to Wikidata statistics is achieved by converting requests into Wikidata's item identifiers (QIDs) and property identifiers (PIDs), with the Python package wikidata serving as the basis for data loading and indexing. See the documentation for a structured overview of the currently available properties.
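
As a rough illustration of the two identifier types, the sketch below classifies strings by their shape. The helper classify_wikidata_id is hypothetical and not part of wikirepo; the QIDs are the ones used in the examples of this README, and P1082 is Wikidata's population property.

```python
import re

# Wikidata identifiers: "Q" + digits for items, "P" + digits for properties.
_ID_PATTERN = re.compile(r"^([QP])[1-9]\d*$")


def classify_wikidata_id(value):
    """Return "item", "property", or None for a candidate identifier string."""
    match = _ID_PATTERN.match(value)
    if match is None:
        return None
    return "item" if match.group(1) == "Q" else "property"


for candidate in ["Q183", "Q30", "P1082", "Germany"]:
    print(candidate, classify_wikidata_id(candidate))
```

A label such as "Germany" is not an identifier itself, which is why a label-to-QID conversion step is needed before data can be loaded.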

Query Data

wikirepo's main access function, wikirepo.data.query, returns a pandas.DataFrame of locations and property data across time.

Each query needs the following inputs:

  • locations: the locations that data should be queried for
    • Strings are accepted for Earth, continents, and countries
    • Get all country names with wikirepo.data.incl_lctn_lbls(lctn_lvls='country')
    • The user can also pass Wikidata QIDs directly
  • depth: the geographic level of the given locations to query
    • A depth of 0 is the locations themselves
    • Greater depths correspond to lower geographic levels (states of countries, etc.)
    • A dictionary of locations is generated for lower depths (see second example below)
  • timespan: start and end datetime.date objects defining when data should come from
    • If not provided, then the most recent data will be retrieved and annotated with the date it is from
  • interval: yearly, monthly, weekly, or daily as strings
  • Further arguments: the names of modules in wikirepo/data directories
    • Each name is passed to the argument that corresponds to its directory (for example, modules in wikirepo/data/economic are passed to economic_props)
    • Data will be queried for these properties for the given locations, depth, timespan, and interval, with the results merged as dataframe columns
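
The query examples in this README pass the property arguments as None, a single string, or a list of strings. A minimal sketch of how such inputs could be normalized follows; as_prop_list is a hypothetical helper, and how wikirepo handles these arguments internally is not shown here.

```python
def as_prop_list(props):
    """Normalize None, a single name, or a list of names to a list."""
    if props is None:
        return []
    if isinstance(props, str):
        return [props]
    return list(props)


# Argument values taken from the first query example in this README.
requested = {
    "demographic_props": ["population", "life_expectancy"],
    "economic_props": "median_income",
    "climate_props": None,
}
normalized = {arg: as_prop_list(value) for arg, value in requested.items()}
print(normalized)
```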

Queries can also access information in Wikidata sub-pages for locations. For example, if the inflation rate is not found on the location's main page, then wikirepo checks the location's economic topic page, because inflation_rate.py is found in wikirepo/data/economic (see Germany and economy of Germany).
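
The fallback described above can be pictured as a two-step lookup. The sketch below uses plain dictionaries with placeholder values rather than real Wikidata entities, and find_property is a hypothetical name, not a wikirepo function.

```python
def find_property(prop, main_page, topic_page):
    """Return (value, source) from the main page, else from the topic page."""
    if prop in main_page:
        return main_page[prop], "main page"
    if prop in topic_page:
        return topic_page[prop], "topic page"
    return None, None


# Placeholder data, not real Wikidata values.
germany = {"population": "<value>"}
economy_of_germany = {"inflation_rate": "<value>"}

print(find_property("inflation_rate", germany, economy_of_germany))
print(find_property("population", germany, economy_of_germany))
```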

wikirepo further provides a unique dictionary class, EntitiesDict, that stores all loaded Wikidata entities during a query. This speeds up data retrieval, as entities are loaded once and then accessed in the EntitiesDict object for any other needed properties.
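
To illustrate the load-once idea only, here is a minimal caching dictionary. CachingEntities and fake_loader are assumptions made for this sketch; the actual EntitiesDict implementation may work differently.

```python
class CachingEntities(dict):
    """Dictionary that loads a missing entity once and then reuses it."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader  # function mapping a QID to its entity data

    def __missing__(self, qid):
        # Called by dict lookups only when the key is not stored yet.
        entity = self._loader(qid)
        self[qid] = entity
        return entity


load_calls = []


def fake_loader(qid):
    # Stand-in for a network request to Wikidata.
    load_calls.append(qid)
    return {"qid": qid}


entities = CachingEntities(fake_loader)
entities["Q183"]  # loaded through the loader
entities["Q183"]  # served from the dictionary
entities["Q30"]  # loaded through the loader
print(load_calls)
```

Each QID reaches the loader once, so later property lookups for the same location avoid a repeated request.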

Examples of wikirepo.data.query follow:

Querying Information for Given Countries

import wikirepo
from wikirepo.data import wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
# Strings must match their Wikidata English page names
countries = ["Germany", "United States of America", "People's Republic of China"]
# countries = ["Q183", "Q30", "Q148"] # we could also pass QIDs
# data.incl_lctn_lbls(lctn_lvls='country') # or all countries
depth = 0
timespan = (date(2009, 1, 1), date(2010, 1, 1))
interval = "yearly"

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=countries,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props=["population", "life_expectancy"],
    economic_props="median_income",
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props=None,
    institutional_props="human_dev_idx",
    political_props="executive",
    misc_props=None,
    verbose=True,
)

col_order = [
    "location",
    "qid",
    "year",
    "executive",
    "population",
    "life_exp",
    "human_dev_idx",
    "median_income",
]
df = df[col_order]

df.head(6)
location                    qid   year  executive       population   life_exp  human_dev_idx  median_income
Germany                     Q183  2010  Angela Merkel   8.1752e+07   79.9878   0.921          33333
Germany                     Q183  2009  Angela Merkel   nan          79.8366   0.917          nan
United States of America    Q30   2010  Barack Obama    3.08746e+08  78.5415   0.914          43585
United States of America    Q30   2009  George W. Bush  nan          78.3902   0.91           nan
People's Republic of China  Q148  2010  Wen Jiabao      1.35976e+09  75.236    0.706          nan
People's Republic of China  Q148  2009  Wen Jiabao      nan          75.032    0.694          nan

Querying Information for all US Counties

# Note: >3000 regions, expect a 45 minute runtime
import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
country = "United States of America"
# country = "Q30" # we could also pass its QID
depth = 2  # 2 for counties, 1 for states and territories
sub_lctns = True  # for all
# Only valid sub-locations given the timespan will be queried
timespan = (date(2016, 1, 1), date(2018, 1, 1))
interval = "yearly"

us_counties_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
    verbose=True,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=us_counties_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props="population",
    economic_props=None,
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props="area",
    institutional_props="capital",
    political_props=None,
    misc_props=None,
    verbose=True,
)

df[df["population"].notnull()].head(6)
location                  sub_lctn    sub_sub_lctn         qid      year  population   area_km2  capital
United States of America  California  Alameda County       Q107146  2018  1.6602e+06   2127      Oakland
United States of America  California  Contra Costa County  Q108058  2018  1.14936e+06  2078      Martinez
United States of America  California  Marin County         Q108117  2018  263886       2145      San Rafael
United States of America  California  Napa County          Q108137  2018  141294       2042      Napa
United States of America  California  San Mateo County     Q108101  2018  774155       1919      Redwood City
United States of America  California  Santa Clara County   Q110739  2018  1.9566e+06   3377      San Jose

Upload Data (WIP)

wikirepo.data.upload will be the core of the eventual wikirepo upload feature. The goal is to record edits that a user makes to a previously queried or baseline dataframe such that these changes can then be pushed back to Wikidata. With the addition of Wikidata login credentials as a wikirepo feature (WIP), the unique information in the edited dataframe could then be uploaded to Wikidata for all to use.

The same process used to query information from Wikidata could be reversed for the upload process. Dataframe columns could be linked to their corresponding Wikidata properties. Whether the time qualifier is a point in time or a span defined by start time and end time could be derived from the variables defined in the module header, and other qualifiers needed for proper data indexing could also be included. Source information could be added in columns that correspond to the given property edits.

Pseudocode for how this process could function follows:

In the first example, changes are made to a df.copy() of a queried dataframe. pandas is then used to compare the new and original dataframes after the user has added information that they have access to.

import pandas as pd
import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

credentials = wd_utils.login()

ents_dict = wd_utils.EntitiesDict()
country = "Country Name"
depth = 2
sub_lctns = True
timespan = (date(2000,1,1), date(2018,1,1))
interval = 'yearly'

lctns_dict = lctn_utils.gen_lctns_dict()

df = wikirepo.data.query()
df_copy = df.copy()

# The user checks for NaNs and adds data

df_edits = pd.concat([df, df_copy]).drop_duplicates(keep=False)

wikirepo.data.upload(df_edits, credentials)
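
Note that concatenating the two frames and dropping all duplicates keeps both the original and the edited version of every changed row. If only the edited rows are wanted, one option is an element-wise comparison, sketched here with a small placeholder frame. The 2009 population value is illustrative only and is not taken from Wikidata.

```python
import pandas as pd

# Placeholder frame standing in for a queried dataframe.
df = pd.DataFrame(
    {
        "location": ["Germany", "Germany", "Germany"],
        "year": [2010, 2009, 2008],
        "population": [81752000.0, float("nan"), float("nan")],
    }
)
df_copy = df.copy()
df_copy.loc[1, "population"] = 81902000.0  # illustrative edit by the user

# NaN never equals NaN, so cells missing in both frames are masked out.
differs = df != df_copy
both_missing = df.isna() & df_copy.isna()
changed_rows = (differs & ~both_missing).any(axis=1)

df_edits = df_copy[changed_rows]
print(df_edits)
```

This relies on both frames sharing the same index and columns, which holds when the edits are made on a df.copy().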

In the next example, data.data_utils.gen_base_df is used to create a dataframe whose dimensions match a time series that the user has access to. The data is then added to the column for the property it belongs to. Source information could further be added via a structured dictionary generated for the user.

import wikirepo
from wikirepo.data import data_utils, wd_utils
from datetime import date

credentials = wd_utils.login()

locations = "Country Name"
depth = 0
# The user defines the time parameters based on their data
timespan = (date(1995,1,2), date(2010,1,3)) # (first Monday, last Sunday)
interval = 'weekly'

base_df = data_utils.gen_base_df()
base_df['data'] = data_for_matching_time_series

source_data = wd_utils.gen_source_dict('Source Information')
base_df['data_source'] = [source_data] * len(base_df)

wikirepo.data.upload(base_df, credentials)
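
For the data column to line up with the base dataframe, the number of weekly periods in the timespan has to match the length of the user's series. The sketch below counts Monday-to-Sunday weeks with the standard library; weekly_starts is a hypothetical helper, and how gen_base_df builds its rows is not shown here.

```python
from datetime import date, timedelta


def weekly_starts(start, end):
    """List the first day of each 7-day period that fits in [start, end]."""
    starts = []
    current = start
    while current + timedelta(days=6) <= end:
        starts.append(current)
        current += timedelta(weeks=1)
    return starts


# A short span for illustration: Monday 1995-01-02 to Sunday 1995-01-29.
weeks = weekly_starts(date(1995, 1, 2), date(1995, 1, 29))
print(weeks)
print(all(day.weekday() == 0 for day in weeks))  # Monday is weekday 0
```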

Put simply: a full-featured wikirepo.data.upload function would realize the potential of a single read-write repository for all public information.

Maps (WIP)

wikirepo/maps is a further goal of the project, as it combines wikirepo's focus on easy-to-access open source data with quick, high-level analytics.

Query Maps

As in wikirepo.data.query, passing the locations, depth, timespan and interval arguments could access GeoJSON files stored on Wikidata, thus providing mapping files in parallel to the user's data. These files could then be leveraged using existing Python plotting libraries to provide detailed presentations of geographic analysis.
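
For readers unfamiliar with the format, the sketch below parses a small GeoJSON FeatureCollection with the standard library. The region name, year, and coordinates are placeholders, and nothing here reflects how wikirepo.maps would eventually return its files.

```python
import json

# Hypothetical GeoJSON for a square region; coordinates are placeholders.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"location": "Example Region", "year": 2018},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]]
      }
    }
  ]
}
"""

collection = json.loads(geojson_text)
for feature in collection["features"]:
    # A polygon's first ring is its outer boundary and ends where it starts.
    ring = feature["geometry"]["coordinates"][0]
    print(feature["properties"]["location"], len(ring), "points in outer ring")
```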

Upload Maps

Similar to the potential of adding statistics through wikirepo.data.upload, GeoJSON map files could also be uploaded to Wikidata using appropriate arguments. Maps that vary by locations, depth, timespan, and interval would allow all wikirepo users to get the exact mapping file that they need for their given task.

Examples

wikirepo can be used as a foundation for countless projects, with its usefulness and practicality only improving as more properties are added and more data is uploaded to Wikidata.

Current usage examples include:

  • Sample notebooks for the Python package poli-sci-kit show how to use wikirepo as a basis for political election and parliamentary appointment analysis, with those notebooks being found in the examples for poli-sci-kit or on Google Colab
  • Pull requests with other examples will gladly be accepted

To-Do

Please see the contribution guidelines if you are interested in contributing to this project. Work that is in progress or could be implemented includes:

Expanding wikirepo

  • Creating an outline of the package's structure for the readme (see issue)

  • Integrating current Python tools with wikirepo structures for uploads to Wikidata

  • Adding a query of property descriptions to data.data_utils.incl_dir_idxs (see issue)

  • Adding multiprocessing support to the wikirepo.data.query process and data.lctn_utils.gen_lctns_dict

  • Potentially converting wikirepo.data.query and data.lctn_utils.gen_lctns_dict over to generated Wikidata SPARQL queries

  • Optimizing wikirepo.data.query:

    • Potentially converting EntitiesDict and LocationsDict to slotted object classes for memory savings
    • Deriving and optimizing other slow parts of the query process
  • Adding access to GeoJSON files for mapping via wikirepo.maps.query

  • Designing and adding GeoJSON files indexed by time properties to Wikidata

  • Creating, improving and sharing examples

  • Improving tests for greater code coverage

  • Improving code quality by refactoring large functions and checking conventions
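
As a sketch of what the SPARQL item in the list above might involve, the function below builds a query string for one property across several items. build_values_query is a hypothetical name; the query uses the wd: and wdt: prefixes of the Wikidata Query Service and ignores time qualifiers, which a real conversion would need to handle.

```python
def build_values_query(qids, pid):
    """Build a SPARQL query selecting one property for several items."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?value WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"  ?item wdt:{pid} ?value .\n"
        "}"
    )


# QIDs from the country example; P1082 is Wikidata's population property.
query = build_values_query(["Q183", "Q30", "Q148"], "P1082")
print(query)
```

The sketch only generates text and sends no request.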

Expanding Wikidata

The growth of wikirepo's database relies on that of Wikidata. Through data.wd_utils.dir_to_topic_page, wikirepo can access properties on location sub-pages, which allows statistics on any topic to be linked. Beyond including entries for already existing properties (see this issue), the following are examples of property types that could be added:

  • Climate statistics could be added to data/climate

    • This would allow for easy modeling of global warming and its effects
    • Planning would be needed to decide whether intervals shorter than daily are necessary or whether daily averages suffice
  • Properties for electoral polling and results for locations

    • This would allow direct access to all needed election information in a single function call
  • A property that links political parties and their regions in data/political

    • For easy professional presentation of electoral results (ex: loading in party hex colors, abbreviations, and alignments)
  • data/demographic properties such as:

    • age, education, religious, and linguistic diversities across time
  • data/economic properties such as:

    • female workforce participation, workforce industry diversity, wealth diversity, and total working age population across time
  • Distinct properties for Freedom House and Press Freedom indexes, as well as other descriptive metrics

Similar Projects

Python

JavaScript

Java

Powered By


Wikimedia           Wikibase           Wikidata
Comments
  • Create concise requirement and env files

    This issue is for creating concise versions of requirements.txt and environment.yml for wikirepo. It would be great if these files were created by hand with specific version numbers or generated in a way so that sub-dependencies don't always need to be updated.

    As of now both files are being created with the following commands in the package's conda virtual environment:

    pip list --format=freeze > requirements.txt  
    conda env export --no-builds | grep -v "^prefix: " > environment.yml
    

    wikirepo and other obviously unneeded packages are then removed from these files before being uploaded.

    Any insights or help would be much appreciated!

    help wanted good first issue question 
    opened by andrewtavis 7
  • Remove unused packages in requirements

    Hello, this is a follow-up to issue https://github.com/andrewtavis/wikirepo/issues/17.

    Please review.

    Regarding setup.py, is there a reason to require plotting packages such as matplotlib and seaborn?

    opened by kination 2
  • Create package structure outline

    wikirepo as a project has many modules that interconnect and are funneled to two functions - wikirepo.data.query and lctn_utils.gen_lctns_dict. It would be helpful for users and potential contributors to have a visual representation of the package that details the overarching structure and the purpose of various components. This outline could then be added to the readme in the To-Do section, potentially in a drop down.

    An initial test of this could be as simple as a directory outline that has a bit more detail about the given components - say by using *, **, †, ‡ and other symbols to indicate where a description could be found.

    A discussion of how to best present the package structure is more than welcome, and contributions would further be very appreciated!

    documentation good first issue question 
    opened by andrewtavis 0
  • Suggest properties for wikirepo

    Please use this issue to suggest Wikidata properties that could be added to wikirepo. With the suggestion it would be great to get the following:

    • The link to the property page on Wikidata
    • A suggestion of which category (demographic, economic, etc) the property should go into
    • [Optional] how the query script should be written (see examples/add_property to make suggestions for how the module header should be structured)

    Accepted property suggestions would then be converted to good first issues for wikirepo. Pull requests with new properties following the process of examples/add_property would also gladly be accepted! Documentation could also be written for such issues or PRs, or could be a separate issue.

    Thanks for your interest in supporting this project :)

    good first issue question 
    opened by andrewtavis 2
  • Add descriptions to data.data_utils.incl_dir_idxs

    The function data.data_utils.incl_dir_idxs is how a user can find what indexes are available for a given type of data - demographic, economic, etc. It would be great if data.data_utils.incl_dir_idxs would have an option to also provide a description for the index. This could be directly queried from Wikidata.

    enhancement good first issue 
    opened by andrewtavis 0
Releases(v1.0.0)
  • v1.0.0(Dec 28, 2021)

  • v0.1.1.5(Mar 28, 2021)

    Changes include:

    • An src structure has been adopted for easier testing and to fix wheel distribution issues
    • Code quality is now checked with Codacy
    • Extensive code formatting to improve quality and style
    • Fixes to vulnerabilities through exception use
  • v0.1.0(Feb 23, 2021)

    First stable release of wikirepo

    Changes include:

    • Full documentation of the package

    • Virtual environment files

    • Bug fixes

    • Extensive testing of all modules with GH Actions and Codecov

    • Code of conduct and contribution guidelines

  • v0.0.2(Dec 8, 2020)

    The minimum viable product of wikirepo:

    • Users are able to query data from Wikidata given locations, depth, time_lvl, and timespan arguments

    • String arguments are accepted for Earth, continents, countries and disputed territories

    • Data for greater depths can be retrieved by creating a dictionary given initial starting locations and going to greater depths using the contains administrative territorial entity property

    • Data is formatted and loaded into a pandas dataframe for further manipulation

    • All available social science properties on Wikidata have had modules created for them

    • Estimated load times and progress are given

    • The project's scope and general roadmap have been defined and detailed in the README

Owner
Andrew Tavis McAllister
Data scientist focussing on NLP, causal inference and recommendation engines. Humboldt University of Berlin (MS); University of Oregon (BA).