
xwrf

A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset objects. The primary objective of xwrf is to replicate crucial I/O functionality from the wrf-python package in a way that is more convenient for users and provides seamless integration with the rest of the Pangeo software stack.


This code is highly experimental! Let the buyer beware ⚠️ ;)

Installation

xwrf may be installed with pip:

python -m pip install git+https://github.com/NCAR/xwrf.git

What is it?

The native WRF output files are not CF-compliant, which makes them awkward to use with tools like xarray. This package provides a simple interface for reading WRF output files into xarray Dataset objects using xarray's flexible and extensible I/O backend API. For example, the following code reads in a WRF output file:

In [1]: import xarray as xr

In [2]: path = "./tests/sample-data/wrfout_d03_2012-04-22_23_00_00_subset.nc"

In [3]: ds = xr.open_dataset(path, engine="xwrf")

In [4]: # or

In [5]: # ds = xr.open_dataset(path, engine="wrf")

In [6]: ds
Out[6]:
<xarray.Dataset>
Dimensions:  (Time: 1, south_north: 546, west_east: 480)
Coordinates:
    XLONG    (south_north, west_east) float32 ...
    XLAT     (south_north, west_east) float32 ...
Dimensions without coordinates: Time, south_north, west_east
Data variables:
    Q2       (Time, south_north, west_east) float32 ...
    PSFC     (Time, south_north, west_east) float32 ...
Attributes: (12/86)
    TITLE:                            OUTPUT FROM WRF V3.3.1 MODEL
    START_DATE:                      2012-04-20_00:00:00
    SIMULATION_START_DATE:           2012-04-20_00:00:00
    WEST-EAST_GRID_DIMENSION:        481
    SOUTH-NORTH_GRID_DIMENSION:      547
    BOTTOM-TOP_GRID_DIMENSION:       32
    ...                              ...
    NUM_LAND_CAT:                    24
    ISWATER:                         16
    ISLAKE:                          -1
    ISICE:                           24
    ISURBAN:                         1
    ISOILWATER:                      14

In addition to xr.open_dataset, xwrf also allows reading in multiple WRF output files at once via the xr.open_mfdataset function:

ds = xr.open_mfdataset(list_of_files, engine="xwrf", parallel=True,
                       concat_dim="Time", combine="nested")

Why not just a preprocess function?

One can achieve the same functionality with a preprocess function. However, there are some additional I/O features that wrf-python implements under the hood that we think would be worth implementing as part of a backend engine instead of a regular preprocess function.
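For illustration, here is a minimal preprocess-function version of one such cleanup step (a hypothetical sketch, not xwrf's actual implementation): promoting WRF's 2-D XLAT/XLONG variables to coordinates before concatenation.

```python
import xarray as xr

def preprocess(ds):
    """Hypothetical cleanup: promote WRF's 2-D latitude/longitude
    variables to xarray coordinates, if present."""
    coord_names = [name for name in ("XLAT", "XLONG") if name in ds]
    return ds.set_coords(coord_names)

# Would be passed to xarray as:
# ds = xr.open_mfdataset(list_of_files, preprocess=preprocess,
#                        combine="nested", concat_dim="Time")
```

A backend engine can do this and more (lazy loading, attribute decoding) in one place, which is the motivation for the engine approach.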

Comments
  • First Release Blog Post

    First Release Blog Post

    Description

    I think that once we have a first release of xwrf, we should write a blog post demonstrating its use. It would be great if one of our WRF expert collaborators could spearhead this blog. Any volunteers?

    Implementation

    Personally, I think that a Jupyter Notebook is a good medium for a demonstration, and the notebook can be easily converted to a markdown doc for a blog-post.

    Tests

    N/A

    Questions

Before embarking on this, though, we need to complete the features that we want in the first release. That said, I wouldn't want to delay the release for too long. Earlier is better, even if incomplete.

    enhancement 
    opened by kmpaul 32
  • Implementation of salem-style x, y, and z coordinates

    Implementation of salem-style x, y, and z coordinates

    Change Summary

    As alluded to in #2, including dimension coordinates in the grid mapping/projection space is a key feature for integrating with other tools in the ecosystem like metpy and xgcm. In this (draft) PR, I've combined code ported from salem with some of my own one-off scripts and what already exists in xwrf to meet this goal. In particular, this introduces a pyproj dependency (for CRS handling and transforming the domain center point from lon/lat to easting/northing). Matching the assumptions already present in xwrf and salem, this implementation assumes we do not have a moving domain (which simplifies things greatly). Also, this implements the c_grid_axis_shift attr as-needed, so xgcm should be able to interpret our coords automatically, eliminating the need for direct handling (like #5) in xwrf.
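Setting aside the pyproj CRS handling, the dimension coordinates themselves are just an evenly spaced grid centred on the (transformed) domain centre point. A pure-Python sketch of that arithmetic (hypothetical helper; in practice the spacing and size would come from the DX/DY and *_GRID_DIMENSION attrs):

```python
def projection_coords(center, spacing, size):
    """Evenly spaced 1-D projection coordinates (e.g. easting in metres)
    centred on the transformed domain centre point."""
    origin = center - spacing * (size - 1) / 2
    return [origin + spacing * i for i in range(size)]
```

For a staggered axis, the same grid would be shifted by half a grid spacing, which is what the c_grid_axis_shift attr communicates to xgcm.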

    ~~Also, because it existed in salem and my scripts alongside the dimension coordinate handling, I also included my envisioned diagnostic field calculations. These are deliberately limited to only those four fields that require WRF-specific handling:~~

    • ~~ 'T' going to potential temperature has a magic number offset of 300 K~~
    • ~~ 'P' and 'PB' combine to form pressure, and are not otherwise used~~
    • ~~ 'PH' and 'PHB' combine to form geopotential, and are not otherwise used~~
    • ~~ Geopotential to geopotential height conversion depends on a particular value of g (9.81 m s**-2) that may not match the value used elsewhere~~

    ~~Unless I'm missing something, any other diagnostics should be derivable using these or other existing fields in a non-WRF-specific way (and so, fit outside of xwrf). If the netcdf4 backend already handles Dask chunks, then this should "just work" as it is currently written. However, I'm not sure how this should behave with respect to lazy-loading when chunks are not specified, so that is definitely a discussion to have in relation to #10.~~

    ~~Right now, no tests are included, as this is just a draft implementation to get the conversation started on how we want to approach these features. So, please do share your thoughts and ask questions!~~
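Although struck out above, the four WRF-specific conversions listed are simple enough to sketch in a few lines (pure-Python stand-ins; the real fields are multi-dimensional arrays):

```python
# WRF-specific diagnostic conversions (scalar stand-ins for array fields).
T_OFFSET = 300.0  # WRF stores perturbation potential temperature (K offset)
G = 9.81          # gravitational acceleration assumed by WRF (m s**-2)

def potential_temperature(T):
    """'T' plus the magic 300 K offset gives potential temperature."""
    return T + T_OFFSET

def pressure(P, PB):
    """Perturbation plus base-state pressure."""
    return P + PB

def geopotential(PH, PHB):
    """Perturbation plus base-state geopotential."""
    return PH + PHB

def geopotential_height(PH, PHB):
    """Geopotential divided by g gives geopotential height."""
    return geopotential(PH, PHB) / G
```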

    Related issue number

    • Closes #3
    • Closes #11

    Checklist

    • [x] Unit tests for the changes exist
    • [x] Tests pass on CI
    • [ ] Documentation reflects the changes where applicable
    enhancement 
    opened by jthielen 31
  • First Release?

    First Release?

    Now that we have xwrf in a usable state, should we consider cutting its first release soon (later this week or next week)? We already have the infrastructure in place for automatically publishing the package to PyPI. One missing piece is the documentation. The infrastructure for authoring the docs is already in place (uses markdown via myst + furo theme, and the current template follows this documentation system guide). I am opening this issue to keep track of other outstanding issues that need to be addressed before the first release. Feel free to add to this list (cc @ncar-xdev/xwrf)

    • [x] Update documentation
    • [x] Publish to PyPI
    • [x] Publish to conda-forge
    opened by andersy005 27
  • Tutorial

    Tutorial

    Change Summary

    Tutorial showing xWRF usage.

    Related issue number

    • Towards #69

    Checklist

    • [x] Unit tests for the changes exist
    • [x] Tests pass on CI
    • [x] Documentation reflects the changes where applicable
    opened by lpilz 20
  • Tutorial on xWRF

    Tutorial on xWRF

    What is your issue?

    The aim of this issue is to track progress on creating a tutorial for xWRF. Here is the start of a list of features to be presented. Please feel free to add to this list - I'll work on implementing it over the coming days.

    • [x] general parsing/coordinate transformation (what does xwrf do?)
    • [x] interface to metpy via unit CF-conventions and pint
    • [x] destaggering data using xgcm
    • [x] vertically interpolating data using xgcm
    • [x] plotting
    opened by lpilz 17
  • Update of tutorials for v0.0.2

    Update of tutorials for v0.0.2

    Change Summary

    Added a tutorial for using xgcm with dask-data.

    Related issue number

    Closes #69

    Checklist

    • [x] Documentation reflects the changes where applicable
    documentation 
    opened by lpilz 13
  • First draft

    First draft "destagger" function

    Change Summary

    Here's an attempt at a "destaggering" function. This is based on the function in WRF-python (https://github.com/NCAR/wrf-python/blob/22fb45c54f5193b849fdff0279445532c1a6c89f/src/wrf/destag.py).

    I've tested it on "east_west_stag" and "north_south_stag" coordinates. The function takes an xarray DataArray and guesses the name of the staggered coordinate (it ends in "_stag"). If there is more than one (I don't think there is in WRF?), a NotImplementedError is raised.

    I'm also not sure if this should ultimately look like this at all, but I wanted to go ahead and throw this code out there.
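A minimal pure-Python sketch of that guessing-plus-averaging logic (hypothetical helpers; the real function operates on xarray DataArrays along a named dimension):

```python
def find_staggered_dim(dims):
    """Guess the staggered dimension: by WRF convention its name ends in
    "_stag". Raise if there is more than one (not handled here)."""
    stag = [d for d in dims if d.endswith("_stag")]
    if len(stag) > 1:
        raise NotImplementedError("more than one staggered dimension")
    return stag[0] if stag else None

def destagger_1d(values):
    """Average each pair of adjacent staggered points (length n -> n - 1)."""
    return [(a + b) / 2 for a, b in zip(values, values[1:])]
```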

    Related issue number

    This is related to issue #35

    Checklist

    I don't have any unit tests to check this -- I'm open to ideas on how to make unit tests (do they need to be on "real" data?). Maybe that's a separate issue.

    • [ ] Unit tests for the changes exist
    • [ ] Tests pass on CI
    • [ ] Documentation reflects the changes where applicable

    I'm new to collaborating on open-source projects, and writing code for wide usage, so any feedback is welcome!

    enhancement 
    opened by bsu-wrudisill 13
  • [MISC]: Curate sample datasets

    [MISC]: Curate sample datasets

    What is your issue?

    We currently don't have great sample datasets to use for testing and documentation. It's worth curating small, exemplar datasets. We could emulate the approach used by fatiando/ensaio or xarray's tutorial module. These datasets should probably be hosted in a separate GitHub repository.

    • Option 1: A separate data package (xwrf_data)
    import xwrf_data
    import xwrf
    import xarray as xr
    
    fname = xwrf_data.fetch_foo_dataset()
    ds = xr.open_dataset(fname).wrf.diag_and_destagger()
    
    • Option 2: Tutorial module within xwrf
    import xwrf
    import xarray as xr
    
    ds = xwrf.tutorial.open_dataset('foo_dataset').wrf.diag_and_destagger()
    

    Cc @ncar-xdev/xwrf

    enhancement 
    opened by andersy005 12
  • Division of Features in Top-Level API

    Division of Features in Top-Level API

    While detailed API discussions will be ongoing based on https://github.com/NCAR/xwrf/discussions/13 and other issues/discussions that follow from that, https://github.com/NCAR/xwrf/pull/14#issuecomment-977066277 and https://github.com/NCAR/xwrf/pull/14#issuecomment-977157649 raised a more high-level API point that would be good to clear up first: what features go into the xwrf backend, and what goes elsewhere (such as a .wrf accessor)?

    Original comments:


    If so, I think this means we can't have direct Dask operations within the backend, but would rather need to design custom backend arrays that play nicely with the Dask chunking xarray itself does, or re-evaluate the approach for derived quantities so that they are outside the backend. Perhaps the intake-esm approach could help in that regard at least?

    Wouldn't creating custom backend arrays be overkill? Assuming we want to support reading files via the Python-netCDF4 library, we might be able to write a custom data store that borrows from xarray's NetCDF4DataStore: https://github.com/pydata/xarray/blob/5db40465955a30acd601d0c3d7ceaebe34d28d11/xarray/backends/netCDF4_.py#L291. With this custom datastore, we would have more control over what to do with variables, dimensions, attrs before passing them to xarray. Wouldn't this suffice for the data loading (without the derived quantities)?

    I think there's value in keeping the backend plugin simple (e.g. performing simple tasks such as decoding coordinates, fixing attributes/metadata, etc) and everything else outside the backend. Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

    Some of the benefits of deriving quantities outside the backend are that this approach:

    (1) doesn't obfuscate what's going on; (2) gives users the opportunity to fix aspects of the dataset that xwrf might miss during data loading, before passing the cleaned dataset to the quantity-deriving functionality; (3) removes the requirement for deriving quantities to be a lazy operation, i.e. if your dataset is in memory, deriving the quantity is done eagerly...

    Originally posted by @andersy005 in https://github.com/NCAR/xwrf/issues/14#issuecomment-977066277


    Some of the benefits of deriving quantities outside the backend are that this approach:

    Also, wouldn't it be beneficial for deriving quantities to be backend-agnostic? I'm imagining cases in which the data have been post-processed and saved in a different format (e.g. Zarr) and you still want to be able to use the same code for deriving quantities on the fly.

    Originally posted by @andersy005 in https://github.com/NCAR/xwrf/issues/14#issuecomment-977072366


    Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

    This sounds like it factors directly into the "keep the solutions as general as possible (so that maybe also MPAS can profit from it)" discussion. However, I feel we have to think about the user perspective too. I don't have any set opinions on this, and we should definitely discuss it, maybe in a larger group too. Here are some thoughts so far:

    I think the reason users like wrf-python is that it is an easy one-stop shop for getting WRF output to work with Python - this is especially true because many users are scientists rather than software engineers or programmers. I take from this that it would be prudent to keep the UX as easy as possible, and I think this is what the backend approach does really well: users just add the engine='xwrf' kwarg and then it just works (TM), meaning it provides them with CF-compliant, de-WRFified meteo data. Also, given that the de-WRFification of the variable data is not too difficult (it's basically just adding fields for three variables), I think the overhead in complexity wouldn't be too great. However, while I do see that it breaks the conceptual barrier between data loading (plus decoding etc.) and computation, this breakage would be required in order to provide the user with meteo data rather than raw WRF fields.

    @andersy005 do you already have some other ideas on how one could handle this elegantly?

    Also, should we move this discussion to a separate issue maybe?

    Originally posted by @lpilz in https://github.com/NCAR/xwrf/issues/14#issuecomment-977157649

    opened by jthielen 10
  • Coordinate UX

    Coordinate UX

    I think this is pretty straightforward, as we just need the lat, lon and time coordinates; all others can be discarded. Unstaggering will be done in the variable initialization. However, we should be aware of moving-nest runs and keep the time-dependence of lat and lon for those occasions.

    enhancement 
    opened by lpilz 9
  • Create xWRF logo

    Create xWRF logo

    What is your issue?

    It would be nice to have a minimalistic logo for the project. Does anyone have, or know someone with, design skills? :) This would be good for the overall branding of the project once we start advertising it after the first release.

    • https://github.com/ncar-xdev/xwrf/issues/51

    Cc @ncar-xdev/xwrf

    opened by andersy005 8
  • [Bug]: ValueError when using MetPy to calculate geostrophic winds

    [Bug]: ValueError when using MetPy to calculate geostrophic winds

    What happened?

    I'm trying to use the MetPy function mpcalc.geostrophic_wind() to calculate geostrophic winds from a wrfout file.

    I'm getting "ValueError: Must provide dx/dy arguments or input DataArray with latitude/longitude coordinates", along with a warning, "warnings.warn('More than one ' + axis + ' coordinate present for variable'".

    I don't know what's causing the problem.

    Minimal Complete Verifiable Example

    import metpy.calc as mpcalc
    import xarray as xr
    import xwrf
    
    # Open the NetCDF file
    filename = "wrfout_d01_2016-10-04_12:00:00"
    ds = xr.open_dataset(filename).xwrf.postprocess()
    
    # Extract the geopotential height
    z = ds['geopotential_height']
    
    # Compute the geostrophic wind
    geo_wind_u, geo_wind_v = mpcalc.geostrophic_wind(z)
    

    Relevant log output

    /mnt/iusers01/fatpou01/sees01/w34926hb/.conda/envs/metpy_env/lib/python3.9/site-packages/metpy/xarray.py:355: UserWarning: More than one latitude coordinate present for variable "geopotential_height".
      warnings.warn('More than one ' + axis + ' coordinate present for variable'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/iusers01/fatpou01/sees01/w34926hb/.conda/envs/metpy_env/lib/python3.9/site-packages/metpy/xarray.py", line 1508, in wrapper
        raise ValueError('Must provide dx/dy arguments or input DataArray with '
    ValueError: Must provide dx/dy arguments or input DataArray with latitude/longitude coordinates.
    

    Environment

    System Information
    ------------------
    xWRF commit : None
    python      : 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:58:50)
    [GCC 10.3.0]
    python-bits : 64
    OS          : Linux
    OS-release  : 3.10.0-1127.19.1.el7.x86_64
    machine     : x86_64
    processor   : x86_64
    byteorder   : little
    LC_ALL      : None
    LANG        : en_GB.UTF-8
    LOCALE      : ('en_GB', 'UTF-8')
    
    Installed Python Packages
    -------------------------
    cf_xarray   : 0.7.5
    dask        : 2022.11.0
    donfig      : 0.7.0
    matplotlib  : 3.6.2
    metpy       : 1.3.1
    netCDF4     : 1.6.2
    numpy       : 1.23.5
    pandas      : 1.5.1
    pint        : 0.20.1
    pooch       : v1.6.0
    pyproj      : 3.4.0
    xarray      : 2022.11.0
    xgcm        : 0.8.0
    xwrf        : 0.0.2
    

    Anything else we need to know?

    No response

    bug waiting for response 
    opened by starforge 3
  • [MISC]: Plot in metpy tutorial is missing

    [MISC]: Plot in metpy tutorial is missing

    What is your issue?

    On https://xwrf.readthedocs.io/en/latest/tutorials/metpy.html, the Skew-T plot is missing. @andersy005 is this an intermittent sphinx issue or do we have a misconfiguration somewhere?

    opened by lpilz 1
  • More comprehensive unit harmonization

    More comprehensive unit harmonization

    Change Summary

    Unit harmonization is improved by:

    • using a better map parsed from WRF Registries (yes, all of them, but not WPS)
      • translations are generated manually using a custom external tool
      • includes all versions from WRFv4.0 onwards
      • makes bracket cleaning superfluous
    • extracting this map from the config yaml to avoid clutter

    Related issue number

    Checklist

    • [x] Unit tests for the changes exist
    • [x] Tests pass on CI
    • [x] Documentation reflects the changes where applicable
    enhancement 
    opened by lpilz 4
  • [FEATURE]: Add functionality to organize WRF data into a DataTree

    [FEATURE]: Add functionality to organize WRF data into a DataTree

    Description

    WRF output can easily have a couple hundred data variables in a dataset, which is not ideal for interactive exploration of a dataset's contents. With DataTree, we would have a tree-like hierarchical data structure for xarray which could be used for this.

    From @lpilz in https://github.com/xarray-contrib/xwrf/issues/10:

    • Which diagnostics do we want to provide and do we want to expose them in a DataTree eventually?

    One suggestion might be:

    DataTree("root")
    |-- DataNode("2d_variables")
    |   |-- DataArrayNode("sea_surface_temperature")
    |   |-- DataArrayNode("surface_temperature")
    |   |-- DataArrayNode("surface_air_pressure")
    |   |-- DataArrayNode("air_pressure_at_sea_level")
    |   |-- DataArrayNode("air_temperature_at_2m") (?)
    |   ....
    |-- DataNode("3d_variables")
        |-- DataArrayNode("air_temperature")
        |-- DataArrayNode("air_pressure")
        |-- DataArrayNode("northward_wind")
        |-- DataArrayNode("eastward_wind")
        ....
    
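A pure-Python sketch of the grouping step behind such a tree (hypothetical helper; a real implementation would build DataTree nodes rather than plain dicts). It buckets variable names by their number of non-time dimensions:

```python
def group_by_rank(variables):
    """Bucket variable names as "2d_variables"/"3d_variables" etc. by the
    number of non-time dimensions (a stand-in for building the DataTree)."""
    groups = {}
    for name, dims in variables.items():
        spatial = [d for d in dims if d != "Time"]
        groups.setdefault(f"{len(spatial)}d_variables", []).append(name)
    return groups
```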

    Implementation

    This would likely become a new accessor method, such as .xwrf.organize().

    Tests

    After xwrf.postprocess(), we have a post-processed dataset (likely with many data variables). Then, after xwrf.organize(), we would have a DataTree with a (yet to be decided) tree-like grouping of data variables. Calling xwrf.organize() without xwrf.postprocess() would fail.

    Questions

    What form of hierarchy would we want to have, and how deep?

    • 2d_variables vs. 3d_variables?
    • semantic grouping of variables, such as thermodynamic, grid_metrics, kinematic, accumulated, etc.?
    • Parse the WRF Registry somehow and assign groups based on that?
    • some other strategy?
    enhancement 
    opened by jthielen 0
  • [META]: Support for unexpected/non-pristine wrfout datasets

    [META]: Support for unexpected/non-pristine wrfout datasets

    What is your issue?

    As encountered in #36 and https://github.com/xarray-contrib/xwrf-data/pull/34 (and perhaps elsewhere), there may be several unexpected factors (old versions, tweaked registries, subsetting, etc.) that could result in xWRF's standard functionality being unsupported or failing. While this is definitely not something to prioritize for immediate releases, it would still be nice to make as much of xWRF's functionality as possible available to users whose WRF datasets "break" xWRF's norms. So, I propose this as a meta-issue to

    • track such unexpected/non-pristine examples
    • work towards features to enable extended compatibility and/or custom application of atomized functionality outside of the standard postprocess()
    • discuss any high-level design strategies to improve the experience of xWRF in these situations

    Running list of sub-issues

    (feel free to add/modify)

    • [ ] Missing latitude/longitude coordinates (xref #36)
      • Could be addressed by (one or both of)
        • Convenience methods to merge in coordinates from geo_em files
        • Recompute lat/lon from projection coordinates
    • [ ] Dataset grid definition attributes partially invalid due to spatial subsetting prior to postprocessing (xref https://github.com/xarray-contrib/xwrf-data/pull/34; local issue TBD)
      • Could be addressed by (one or both of)
        • Reference lat/lon being derived from XLAT/XLONG corner(s) rather than CEN_LON/CEN_LAT attrs
        • Require user input of needed info if some sanity check fails (which would also lead to support for completely missing attrs, not just CEN_LON/CEN_LAT being rendered invalid)
    enhancement 
    opened by jthielen 0
  • [MISC]: More careful consideration of different xarray options

    [MISC]: More careful consideration of different xarray options

    What is your issue?

    Test expected results under different xarray options

    In the spirit of improving the quality of our tests (xref #60), it would be nice to implement tests where different relevant xarray options are enabled (using set_options as a context manager). This would likely make it easier to catch issues like #96.

    Xarray options in issue reports

    Not sure of the best way to do this (bundle it into xwrf.show_versions()? Add another copy-paste box to the issue template?), but it could help with debugging if we knew the state of xarray.get_options().

    maintenance 
    opened by jthielen 0
Releases (v0.0.2)
  • v0.0.2 (Sep 21, 2022)

    What's Changed

    • Add destaggering functionality by @jthielen in https://github.com/xarray-contrib/xwrf/pull/93
    • Fix destagger attrs by @lpilz in https://github.com/xarray-contrib/xwrf/pull/97
    • Fix staggered coordinate destaggering for dataarray destagger method by @jthielen in https://github.com/xarray-contrib/xwrf/pull/101
    • Added earth-relative wind field calculation to base diagnostics by @lpilz in https://github.com/xarray-contrib/xwrf/pull/100
    • Clean up _destag_variable with respect to types and terminology by @jthielen in https://github.com/xarray-contrib/xwrf/pull/103
    • Changed wrfout file (cf. xwrf-data/#34) by @lpilz in https://github.com/xarray-contrib/xwrf/pull/102
    • More unit harmonization by @lpilz in https://github.com/xarray-contrib/xwrf/pull/105
    • Fixing a further coords attrs fail. by @lpilz in https://github.com/xarray-contrib/xwrf/pull/107
    • Clear c_grid_axis_shift from attrs when destaggering by @jthielen in https://github.com/xarray-contrib/xwrf/pull/106
    • Update of tutorials for v0.0.2 by @lpilz in https://github.com/xarray-contrib/xwrf/pull/89

    Full Changelog: https://github.com/xarray-contrib/xwrf/compare/v0.0.1...v0.0.2

  • v0.0.1 (Sep 9, 2022)

    This is the first packaged release of xWRF (a lightweight interface for working with the Weather Research and Forecasting (WRF) model output in xarray). Features in this release include:

    • An xwrf Dataset accessor with a postprocess method that can perform the following operations
      • Rename dimensions to match the CF conventions.
      • Rename variables to match the CF conventions.
      • Rename variable attributes to match the CF conventions.
      • Convert units to Pint-friendly units.
      • Decode times.
      • Include projection coordinates.
      • Collapse time dimension.
    • A tutorial module with several sample datasets
    • Documentation with several examples/tutorials

    Thank you to the following contributors for their efforts towards this release!

    • @andersy005
    • @lpilz
    • @jthielen
    • @kmpaul
    • @dcherian
    • @jukent

    Full Changelog: https://github.com/xarray-contrib/xwrf/commits/v0.0.1

Owner
National Center for Atmospheric Research
NCAR is sponsored by the National Science Foundation and managed by the University Corporation for Atmospheric Research.