MapReader
A computer vision pipeline for the semantic exploration of maps at scale
MapReader is an end-to-end computer vision (CV) pipeline designed by the Living with Machines project. It has two main components: preprocessing/annotation and training/inference.
MapReader provides a set of tools to:
- load images/maps stored locally or retrieve maps via webservers (e.g., tileservers serving maps from OpenStreetMap (OSM), the National Library of Scotland (NLS), or elsewhere).
  ⚠️ Refer to the credits and re-use terms section if you are using digitized maps or metadata provided by NLS.
- preprocess images/maps (e.g., divide them into patches, resample the images, remove borders outside the neatline, or reproject the map); a minimal patching sketch follows this list.
- annotate images/maps or their patches (i.e. slices of an image/map) using an interactive annotation tool.
- train, fine-tune, and evaluate various CV models.
- predict labels (i.e., model inference) on large sets of images/maps.
- Other functionalities include:
  - plot images/maps/patches using, e.g., matplotlib, cartopy, Google Earth, and kepler.gl.
  - compute the mean and standard deviation of pixel intensities of image patches.
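For orientation, here is a minimal, generic sketch of the patch-based workflow these tools implement: slice a map sheet into patches and run a CV model over each patch. It deliberately uses plain PIL/PyTorch rather than the MapReader API, and the file path, patch size, and ImageNet-pretrained ResNet are placeholder assumptions; see the tutorials below for the actual interface.

```python
# Generic sketch of the patch-based workflow (NOT the MapReader API).
# Assumes a local map sheet at "maps/sheet_001.png" (placeholder path) and uses
# an ImageNet-pretrained ResNet purely as a stand-in for a fine-tuned classifier.
from PIL import Image
import torch
from torchvision import models, transforms

PATCH_SIZE = 100  # patch size in pixels (placeholder value)

# 1. Load a map sheet and slice it into square, non-overlapping patches.
sheet = Image.open("maps/sheet_001.png").convert("RGB")
width, height = sheet.size
patches = [
    sheet.crop((left, top, left + PATCH_SIZE, top + PATCH_SIZE))
    for top in range(0, height - PATCH_SIZE + 1, PATCH_SIZE)
    for left in range(0, width - PATCH_SIZE + 1, PATCH_SIZE)
]

# 2. Predict a label for each patch with a (placeholder) CV model.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

with torch.no_grad():
    # For a real map sheet you would batch the patches to limit memory use.
    batch = torch.stack([preprocess(p) for p in patches])
    predicted_labels = model(batch).argmax(dim=1)

print(f"{len(patches)} patches, first predicted class indices: {predicted_labels.tolist()[:10]}")
```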
Below is an example of MapReader CV model output (see the paper on MapReader for more details):
British 'railspace' and buildings as predicted by a MapReader computer vision model. ~30.5M patches from ~16K nineteenth-century Ordnance Survey map sheets were used (courtesy of the National Library of Scotland). (a) Predicted railspace; (b) predicted buildings; (c) and (d) predicted railspace (red) and buildings (black) in and around Middlesbrough and London, respectively. MapReader extracts information from large images or a set of images at a patch level, as depicted in the insets. For both railspace and buildings, we removed those patches that had no other neighboring patches with the same label within a distance of 250 meters.
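The last step in that caption (dropping isolated patches) can be approximated with a simple neighbourhood filter. The sketch below is not the code used for the figure; it assumes you already have patch centroids in a metric CRS (metres) and one predicted label per patch, and it keeps a patch only if at least one other patch with the same label lies within 250 m.

```python
# Sketch of the neighbourhood filter described above (not the original code).
import numpy as np
from scipy.spatial import cKDTree

def filter_isolated_patches(centroids, labels, target_label, radius_m=250.0):
    """Boolean mask: True for patches with `target_label` that have at least
    one *other* patch of the same label within `radius_m` metres."""
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    same = labels == target_label
    pts = centroids[same]
    tree = cKDTree(pts)
    # Count neighbours within the radius; each point counts itself once.
    counts = np.array([len(tree.query_ball_point(p, r=radius_m)) for p in pts])
    keep = np.zeros(len(labels), dtype=bool)
    keep[np.flatnonzero(same)] = counts > 1  # > 1 excludes the patch itself
    return keep

# Example with made-up coordinates (in metres) and labels:
coords = [(0, 0), (100, 0), (10_000, 10_000)]
labels = ["railspace", "railspace", "railspace"]
print(filter_isolated_patches(coords, labels, "railspace"))  # [ True  True False]
```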
Table of contents
- Installation and setup
- Tutorials are organized in Jupyter Notebooks as follows:
  - Classification
    - classification_one_inch_maps_001
      - Goal: train/fine-tune PyTorch CV classifiers on historical maps.
      - Dataset: the OS one-inch, 2nd edition layer from the National Library of Scotland.
      - Data access: via tileserver.
      - Annotations: done on map patches (i.e., slices of each map).
      - Classifier: train/fine-tune PyTorch CV models.
- How to cite MapReader
- Credits and re-use terms
- Digitized maps: MapReader can retrieve maps from NLS via tileserver. Read the re-use terms in this section.
- Metadata: the metadata files are stored at mapreader/persistent_data. Read the re-use terms in this section.
- Acknowledgements
Installation
Set up a conda environment
We strongly recommend installation via Anaconda:
- Create a new environment for mapreader called mr_py38:

  ```bash
  conda create -n mr_py38 python=3.8
  ```

- Activate the environment:

  ```bash
  conda activate mr_py38
  ```
Method 1
- Install mapreader:

  ```bash
  pip install git+https://github.com/Living-with-machines/MapReader.git
  ```
- We have provided some Jupyter Notebooks to show how different components in MapReader can be run. To allow the newly created mr_py38 environment to show up in the notebooks:

  ```bash
  python -m ipykernel install --user --name mr_py38 --display-name "Python (mr_py38)"
  ```
- Continue with the Tutorials!
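Optionally, as a quick sanity check (not part of the official instructions), confirm that the package can be imported from the new environment:

```bash
python -c "import mapreader; print('MapReader imported successfully')"
```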
Method 2
- Clone the mapreader source code:

  ```bash
  git clone https://github.com/Living-with-machines/MapReader.git
  ```
- Install using poetry:

  ```bash
  cd /path/to/MapReader
  poetry install
  poetry shell
  ```
- Continue with the Tutorials!
How to cite MapReader
Please consider acknowledging MapReader if it helps you to obtain results and figures for publications or presentations, by citing:
Link: https://arxiv.org/abs/2111.15592
Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen and Katherine McDonough (2021), MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale, arXiv:2111.15592.
and in BibTeX:
```bibtex
@misc{hosseini2021mapreader,
    title={MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale},
    author={Kasra Hosseini and Daniel C. S. Wilson and Kaspar Beelen and Katherine McDonough},
    year={2021},
    eprint={2111.15592},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
Credits and re-use terms
Digitized maps
MapReader can retrieve maps from NLS (National Library of Scotland) via webservers. For all the digitized maps (retrieved or locally stored), please note the re-use terms:
Metadata
We have provided some metadata files in mapreader/persistent_data. For all these files, please note the re-use terms:
Acknowledgements
This work was supported by Living with Machines (AHRC grant AH/S01179X/1) and The Alan Turing Institute (EPSRC grant EP/N510129/1). Living with Machines, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.