A Unified Framework for Hydrology

Unified Framework for Hydrology - Community Organisation

Last update: Jan 1, 2023

Overview

Unified Framework for Hydrology

The Python package unifhy (Unified Framework for Hydrology) is a hydrological modelling framework which combines interchangeable modelling components for the surface layer, subsurface, and open water parts of the terrestrial water cycle. It is designed to foster collaborations between land surface, hydrological, and groundwater modelling communities.

Comments

Review Z dimension for `SpaceDomain`

At the moment, an optional Z dimension can be included in SpaceDomain. The original intention was to be able to check the Z axis of a dataset provided to the Component (e.g. measurement level). But having a third dimension is problematic at the science level: the component developer may be given either a 2D or a 3D array. If the Z dimension were limited to a size of 1, a systematic "squeeze" could be performed before giving the data array to the component, but size 1 is not enforced at the moment.

This third dimension is important nonetheless. For instance, layers could be needed to take into account different vegetation heights or soil moisture layers. When this is about holding multi-layer states, it is already accommodated by the 'divisions' key in _states_info (so no need for a Z dimension in SpaceDomain). If this is about getting multi-layer inputs, then there is a gap at the moment, and a Z dimension may need to remain in SpaceDomain.

This is also problematic for Fortran and C extensions if the rank of the array is varying between 2 and 3.

I am opening this issue to register the problem and start a discussion on it. Until this problem is solved, it may be good to remove the possibility of a Z dimension in SpaceDomain, so that science component code can assume it is only ever given a 2D array.
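A minimal sketch of the systematic "squeeze" mentioned above, assuming the data arrive as a NumPy array with Z as the leading axis. The helper name `squeeze_z` and its `z_axis` argument are hypothetical and not part of the framework.

```python
import numpy as np


def squeeze_z(array, z_axis=0):
    """Drop the Z axis of a (Z, Y, X) array when it has size 1.

    Hypothetical helper: it raises if the Z axis is longer than 1,
    so that the component is only ever given a (Y, X) array.
    """
    array = np.asarray(array)
    if array.ndim == 2:
        # already (Y, X): nothing to do
        return array
    if array.ndim != 3:
        raise ValueError(f"expected a 2D or 3D array, got {array.ndim}D")
    if array.shape[z_axis] != 1:
        raise ValueError(
            f"Z axis must be of size 1 to be squeezed, got {array.shape[z_axis]}"
        )
    return np.squeeze(array, axis=z_axis)


field_3d = np.zeros((1, 4, 5))  # (Z, Y, X) with a single level
field_2d = squeeze_z(field_3d)  # shape becomes (4, 5)
```

Raising on a Z axis longer than 1, rather than silently picking a level, keeps the rank seen by Fortran and C extensions fixed at 2.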
invalid

opened by ThibHlln 7
Rename interface transfers to be CF-compliant

The current names for variables transferred through the interface via the Exchanger are not based on the CF-convention standard names http://cfconventions.org/standard-names.html - it would be good if they were.

For some of them, a match seems possible; for others, the list falls short and we will need to request that these be added. See the table below for the ones for which I found a match.

| current cm4twc name | CF standard name equivalent? | CF definition (in v77) |
| --- | --- | --- |
| throughfall [kg m-2 s-1] | canopy_throughfall_flux | "Canopy" means the vegetative covering over a surface. The canopy is often considered to be the outer surfaces of the vegetation. Plant height and the distribution, orientation and shape of plant leaves within a canopy influence the atmospheric environment and many plant processes within the canopy. Reference: AMS Glossary http://glossary.ametsoc.org/wiki/Canopy. "Throughfall" is the part of the precipitation flux that reaches the ground directly through the vegetative canopy, through spaces in the canopy, and as drip from the leaves, twigs, and stems (but not including snowmelt). In accordance with common usage in geophysical disciplines, "flux" implies per unit area, called "flux density" in physics. |
| snowmelt [kg m-2 s-1] | surface_snow_melt_flux | Surface snow refers to the snow on the solid ground or on surface ice cover, but excludes, for example, falling snowflakes and snow on plants. The surface called "surface" means the lower boundary of the atmosphere. In accordance with common usage in geophysical disciplines, "flux" implies per unit area, called "flux density" in physics. |
| transpiration [kg m-2 s-1] | transpiration_flux | In accordance with common usage in geophysical disciplines, "flux" implies per unit area, called "flux density" in physics. |
| evaporation_soil_surface [kg m-2 s-1] | water_evaporation_flux_from_soil | "Water" means water in all phases. Evaporation is the conversion of liquid or solid into vapor. (The conversion of solid alone into vapor is called "sublimation".) In accordance with common usage in geophysical disciplines, "flux" implies per unit area, called "flux density" in physics. |
| evaporation_ponded_water [kg m-2 s-1] | ? | - |
| evaporation_openwater [kg m-2 s-1] | ? | - |
| direct_throughfall [kg m-2 s-1] | ? | - |
| surface_runoff [kg m-2 s-1] | surface_runoff_flux | The surface called "surface" means the lower boundary of the atmosphere. Runoff is the liquid water which drains from land. If not specified, "runoff" refers to the sum of surface runoff and subsurface drainage. In accordance with common usage in geophysical disciplines, "flux" implies per unit area, called "flux density" in physics. |
| subsurface_runoff [kg m-2 s-1] | subsurface_runoff_flux | Runoff is the liquid water which drains from land. If not specified, "runoff" refers to the sum of surface runoff and subsurface drainage. In accordance with common usage in geophysical disciplines, "flux" implies per unit area, called "flux density" in physics. |
| soil_water_stress [1] | ? | - |
| water_level [m] | water_surface_height_above_reference_datum* | 'Water surface height above reference datum' means the height of the upper surface of a body of liquid water, such as sea, lake or river, above an arbitrary reference datum. The altitude of the datum should be provided in a variable with standard name water_surface_reference_datum_altitude. The surface called "surface" means the lower boundary of the atmosphere. |

* probably not the right match as the reference datum would be the river bed, so it would vary spatially

By the way @HughesAG, I remember you told me once that "subsurface runoff" was not very popular within the groundwater community, would "subsurface drainage" be a better term? This is the opportunity to make this right. :-)

For the ones with a match, we need to agree on the match. For the ones without a match, we need to come up with a CF-inspired name with an associated definition and units ready for submission to the CF-conventions.
invalid

opened by ThibHlln 6
Gather 'driving'/'ancillary' data into a single 'inputs' item
resolve #7

The definition of a Component used to distinguish between 'driving_data' and 'ancillary_data', but this distinction was rather ambiguous (where would climatology data fit in?), community-specific (mostly UM world?), and limited (no time dimension allowed for ancillary).

A component is now defined by just one item 'inputs' for the data given to it. Each input must be given a 'units' metadata (as was already the case) and a 'kind' metadata (newly added). The 'kind' can be:

'dynamic': data for every spatial element of component's SpaceDomain, and for every time step of component's TimeDomain (i.e. both time and space dimensions are expected for the data array)

'static': data for every spatial element of component's SpaceDomain (i.e. only space dimensions are expected for the data array)

'climatologic': data for every spatial element of component's SpaceDomain, and for a given number of sub-periods in a year period (i.e. both time and space dimensions expected for the data array, but length of time dimension not equal to number of time steps) – sub-periods defined in an additional 'frequency' metadata, e.g. 'seasonal', 'monthly', 'day_of_year', timedelta(days=7), etc.

The definition of a component's inputs would look like this:

    inputs_info = {
        'rainfall': {
            'units': 'kg m-2 s-1',
            'kind': 'dynamic'
        },
        'elevation': {
            'units': 'm above sea level',
            'kind': 'static'
        },
        'leaf_area_index': {
            'units': '1',
            'kind': 'climatologic',
            'frequency': 'monthly'
        }
    }

The distinction of 'inputs' into kinds allows for some checks on the compatibility between the data given and what the component needs. For 'dynamic' a full space and time check can be done, for 'static' a space check can be done (a time dimension may or may not exist, but if it does, it must be of size one), and for 'climatologic' a space check can be done alongside a check on the length of the time dimension compared to the expectation.
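The per-kind checks described above could be sketched as follows. This is an illustration only, not the framework's implementation: the function `check_input_shape` and its arguments are hypothetical, and it compares plain shape tuples rather than actual data arrays or coordinates.

```python
def check_input_shape(kind, shape, space_shape, n_time=None, n_periods=None):
    """Return True if a data array shape fits what an input 'kind' expects.

    Hypothetical helper: `shape` is the shape of the data given,
    `space_shape` the shape of the component's SpaceDomain, `n_time` the
    number of time steps in its TimeDomain, and `n_periods` the number of
    sub-periods implied by 'frequency' (e.g. 12 for 'monthly').
    """
    shape, space_shape = tuple(shape), tuple(space_shape)
    if kind == 'dynamic':
        # full time and space check
        return shape == (n_time,) + space_shape
    if kind == 'static':
        # space check; a time dimension is tolerated only if of size one
        return shape in (space_shape, (1,) + space_shape)
    if kind == 'climatologic':
        # space check, plus time length must match the number of sub-periods
        return shape == (n_periods,) + space_shape
    raise ValueError(f"unknown kind: {kind!r}")


space = (4, 5)  # (Y, X)
ok_dynamic = check_input_shape('dynamic', (365, 4, 5), space, n_time=365)
ok_static = check_input_shape('static', (1, 4, 5), space)
ok_monthly = check_input_shape('climatologic', (12, 4, 5), space, n_periods=12)
```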
opened by ThibHlln 6
Add support for input time series/climatology/time-invariant in place of driving/ancillary

Currently, driving data and ancillary data feature one single difference: the former has a mandatory time dimension, and the latter cannot feature a time dimension. Typically driving data is expected to be with dimensions (T, Z, Y, X), and ancillary data is expected to be with dimensions (Z, Y, X).

But some ancillary data may need a time dimension, even if not necessarily at the time resolution of the actual component (otherwise it can be argued it is no longer ancillary data). Leaf area index is an example of ancillary data that may need to take a different value e.g. seasonally or monthly.
functionality

opened by ThibHlln 3
Release v0.1.1
Bug fixes

fix intermittent data filenames loss in YAML configuration file (#80)

fix relative import bug for advanced test suite (#82)

prevent unwanted installation of tests package alongside unifhy (#83)

Documentation

add data to run tutorial and revise tutorial accordingly (#81)

revise installation instructions to recommend conda over pip because of esmpy dependency not available on PyPI (#81)
opened by ThibHlln 2
Rename states for prognostics?

I had named "states" those model variables that need initial values to start model integration. These variables are the ones being allocated memory, and being recorded in dump files to allow for model resuming after crashing. This is a name commonly used in hydrological models for such variables.

However, in atmospheric models and by extension land surface models, these variables are named "prognostics".

Now, in physics/thermodynamics, "state variables" are the ones you find in "equations of state", and it appears that not all "state variables" meet the requirements of needing initial values to start model integration, which to me seems to explain why the term "prognostic" is used (i.e. not all state variables are prognostics).

So I am questioning which term we should use: I don't think "prognostic" will speak to most hydrologists, but "state" may be misleading if it leads to confusion with the physics.
question

opened by ThibHlln 2
Remove science components from framework
resolve #45

The science components are removed from the package. The subpackage components only contained module component.py and its utils, so the module is moved to package root, and its utils are put with the other package utils.

The existing science components now live in their own repositories:

https://github.com/cm4twc-org/cm4twccontrib-artemis

https://github.com/cm4twc-org/cm4twccontrib-rfm

The dummy components remain in the package to run the tests.

The documentation is updated to reflect the externalisation of the science components, including a review of the workflows to share science component contributions.

A template for component contribution now exists:

https://github.com/cm4twc-org/cm4twccontrib-template

This PR also includes some API changes:

Component et al. move from cm4twc.components to cm4twc.component

Component class attributes _land_sea_mask and _flow_direction become class attributes _requires_land_sea_mask and _requires_flow_direction, respectively

MetaComponent class properties land_sea_mask and flow_direction become class methods requires_land_sea_mask() and requires_flow_direction()

MetaComponent class properties to explore component's definition (e.g. inwards_info, inputs_info, etc.) become inwards_metadata, inputs_metadata, etc.
opened by ThibHlln 2
Add user-customisable divisions for component states

At the moment, a Component state can be divided into a vector (e.g. useful for soil layers) through the use of the key 'divisions' in _states_info. The value for this key is expected to be an integer. This dictionary is expected to be created/modified by a component contributor, but some components may support a customisable value (e.g. customisable number of soil layers), which would directly impact the memory to be allocated for this component state.

This could be addressed by setting this integer value as a component constant. However, the way the default value for a constant is set at the moment (i.e. as a keyword argument in the component's run method) is problematic because it is not easy to communicate this value to the component for its _instantiate_states method. So maybe it would be better to make it a component parameter. However, the distinction between parameter and constant in the framework has always been that the former is subject to tuning while the latter is not, and I wonder whether e.g. the number of soil layers should really be subject to tuning.

These two options try to reuse existing concepts/functionality. If neither is suited for this purpose, a third option, adding a new concept and potentially requiring new functionality, could be required.
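To make the memory implication concrete, here is a rough sketch of how a user-supplied number of divisions could feed into state allocation. It assumes NumPy arrays with the divisions as a trailing axis; the function `allocate_states`, its `overrides` argument, and the example state names are hypothetical, not the framework's API.

```python
import numpy as np

# hypothetical state definitions, as a component contributor might write them
states_info = {
    'soil_moisture': {'units': 'kg m-2', 'divisions': 4},
    'snowpack': {'units': 'kg m-2'},
}


def allocate_states(states_info, space_shape, overrides=None):
    """Allocate one array per state, with a trailing axis for divisions.

    `overrides` (hypothetical) maps a state name to a user-chosen number
    of divisions that takes precedence over the contributor's default.
    """
    overrides = overrides or {}
    arrays = {}
    for name, info in states_info.items():
        divisions = overrides.get(name, info.get('divisions', 1))
        if divisions == 1:
            shape = tuple(space_shape)
        else:
            shape = tuple(space_shape) + (divisions,)
        arrays[name] = np.zeros(shape)
    return arrays


default_states = allocate_states(states_info, (4, 5))
custom_states = allocate_states(states_info, (4, 5), {'soil_moisture': 10})
```

Whichever concept ends up holding the value (constant, parameter, or something new), it has to be known before allocation, which is the crux of the difficulty described above.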
functionality

opened by ThibHlln 2
filename not saved in YAML configuration file

There seems to be a bug in unifhy when the files contained in the unifhy.DataSet are small enough so that they can fit in memory. This seems to be linked to the documented behaviour of cf.Field.get_filenames:

The file names in normalised, absolute form. If all of the data are in memory then an empty set is returned.

This results in the filenames attribute of a given unifhy.Variable being an empty set, which ultimately leads to an empty sequence of filenames being saved in the YAML file, so that a to_yaml > from_yaml workflow fails.

It would be good to check with cf-python whether there is another functionality that keeps track of filenames, or if it makes sense for their package to offer such functionality. If not, it will be up to unifhy to keep track of them.
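If it ends up being up to unifhy to keep track of the filenames, one possible approach is to fall back on the paths requested at read time whenever the field reports none. The sketch below uses only the standard library; the function `resolve_filenames` is hypothetical and does not call cf-python.

```python
import os


def resolve_filenames(reported, requested):
    """Prefer the filenames reported by the field, else the requested ones.

    `reported` stands for what the field returns (an empty set when all
    the data are in memory); `requested` stands for the paths the user
    originally asked to read. Both names are hypothetical.
    """
    source = reported if reported else requested
    # normalise to absolute paths, in a stable order for the YAML file
    return sorted(os.path.abspath(path) for path in source)


# all data in memory: the field reports nothing, so fall back on the request
filenames = resolve_filenames(set(), ['rainfall.nc', 'elevation.nc'])
```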
bug

opened by ThibHlln 1
Separate science components from framework
I suggest splitting the framework functionalities from the science it holds by removing the science components from this repository, and by creating distinct repositories for each (set of) science component(s).

The main motivations are:

have a clean separation of concerns between technical aspects and science aspects

keep the installation of the framework lightweight (i.e. no need for a user to install all compatible science components if they are not going to use them all, the user can cherry-pick the science components they want to use)

simplify the installation of the framework for users not interested in Fortran/C-based science components (since these require compilation, they represent a potential source for complications)

give science component contributors their own space (i.e. repository) where they can have e.g. development branches, and where they can handle releases of new versions themselves without the need to wait for a newer version of the framework to be released – compatibility between framework and science components will be preserved through the specification of the version(s) of cm4twc these science components can run with in their dependencies, e.g. in their requirements.txt

Note, the framework would keep its "dummy" components for testing purposes.

This may slightly complicate the installation instructions and the import statements, but I think it is really a minor inconvenience in comparison with the benefits listed above. See below for comparisons between having everything in one repository and having separate repositories.

installation

one repository (current)

pip install cm4twc

separate repositories (proposed)

pip install cm4twc cm4twccontrib-artemis cm4twccontrib-rfm

usage

one repository (current)

    import cm4twc

    sl = cm4twc.components.surfacelayer.Artemis(...)
    ss = cm4twc.components.subsurface.Artemis(...)
    ow = cm4twc.components.openwater.RFM(...)
    md = cm4twc.Model(sl, ss, ow, ...)

separate repositories (proposed)

    import cm4twc
    import cm4twccontrib.artemis
    import cm4twccontrib.rfm

    sl = cm4twccontrib.artemis.SurfaceLayerComponent(...)
    ss = cm4twccontrib.artemis.SubSurfaceComponent(...)
    ow = cm4twccontrib.rfm.OpenWaterComponent(...)
    md = cm4twc.Model(sl, ss, ow, ...)

The science components would need to follow some packaging and naming conventions (as shown in the example above), namely:

follow suggestions in PEP 423: "use the ${MAINPROJECT}contrib.${PROJECT} pattern to store community contributions", where MAINPROJECT would be cm4twc and where PROJECT should be the name of the model from which the science component(s) originate(s)

follow PEP 8's package naming convention "Python packages should also have short, all-lowercase names, although the use of underscores is discouraged", so e.g. acronyms should be lowercased.

Note, while PEP 423 has a "deferred" status, it is based on what sphinx is already doing for community contributions, and appears to be good practice.

In our case, Artemis and RFM projects (e.g. on PyPI) would be named cm4twccontrib-artemis and cm4twccontrib-rfm, respectively. And they would be imported as cm4twccontrib.artemis and cm4twccontrib.rfm, respectively. Repositories for these models have already been created https://github.com/cm4twc-org/cm4twccontrib-artemis, https://github.com/cm4twc-org/cm4twccontrib-rfm.
science
opened by ThibHlln 1
Cache remapping weights

resolve #43

Following a PR in cf-python https://github.com/NCAS-CMS/cf-python/pull/223, the regrids/regridc methods of a cf.Field can now return a RegridOperator so that the remapping weights are calculated then, and stored in the RegridOperator which can be re-used on demand later on.

This PR makes use of this new feature in the set_up of the Exchanger, where the RegridOperator is computed once if remapping is required, and then used in get_transfer.

This PR requires cf-python>=3.10.0 to work, so I expect our tests to fail because this version has not been released on PyPI yet. While their PR is merged in their master branch already, I would rather not modify our GitHub Actions workflow to install from their repo rather than from PyPI.
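The compute-once, reuse-later pattern described in this PR can be illustrated with a small stand-alone sketch. It does not use cf-python: the class `CachedRemapper` and the weights function are hypothetical stand-ins for the Exchanger and the RegridOperator, and the weights are a plain (destination x source) matrix.

```python
class CachedRemapper:
    """Hypothetical sketch: compute remapping weights once, then reuse them."""

    def __init__(self, compute_weights):
        self._compute_weights = compute_weights  # expensive callable
        self._weights = None

    def set_up(self, remapping_required=True):
        # weights are only computed if remapping is required, and only once
        if remapping_required and self._weights is None:
            self._weights = self._compute_weights()

    def get_transfer(self, values):
        if self._weights is None:
            # no remapping required: pass the values through unchanged
            return list(values)
        # apply the cached weights as a matrix-vector product
        return [sum(w * v for w, v in zip(row, values)) for row in self._weights]


calls = []


def compute_weights():
    calls.append(1)  # count how many times the expensive step runs
    return [[0.5, 0.5]]  # two source cells averaged into one destination cell


remapper = CachedRemapper(compute_weights)
remapper.set_up()
first = remapper.get_transfer([2.0, 4.0])
second = remapper.get_transfer([6.0, 8.0])
```

Here the weights function runs a single time in `set_up`, however many transfers follow.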

opened by ThibHlln 1
Consider functionality to allow alternative treatment of soil water stress

JULES has a simple representation of the impact of soil moisture on photosynthesis. Other models may not be able to use a stress factor between 0 and 1.
enhancement

opened by rich-HJ 0
add short_name for variables

CF standard names should be retained because they reduce the ambiguity in what the variables are about. In the code, however, they make for very long names, which impedes readability. So alongside standard_name, it would be good to have a short_name, using those in the AMIP column of the CF standard name list if they exist, or new ones otherwise (maybe look at ISIMIP). At the moment, the contributors are likely to bind the argument names (CF names) to new bespoke shorter names in the code.
enhancement

opened by ThibHlln 0
Support multiple flow directions

At the moment, only one flow_direction per space domain is permitted, but one can easily imagine situations where surface flow direction may differ from subsurface flow direction, so it would be good to support more than one flow_direction.
enhancement

opened by ThibHlln 0
North-South orientation of `Grid`

At the moment, the latitude (Y) coordinates are in increasing order (i.e. from -90 degrees North to +90 degrees North), which results in [Y, X] arrays featuring the southernmost points on the first row, and the northernmost points on the last row. This is not necessarily intuitive, e.g. when "printing" the array the World is "upside down".

So we need to decide whether we want to stick with this, or enforce the opposite (i.e. latitude coordinate should be in decreasing order).
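For illustration, switching between the two orientations amounts to reversing the Y axis of both the coordinate and the data. The sketch below assumes NumPy arrays and made-up values; it does not use the Grid class.

```python
import numpy as np

lat = np.array([-60.0, 0.0, 60.0])  # increasing order: south first
data = np.array([[1, 1],
                 [2, 2],
                 [3, 3]])  # [Y, X], first row is the southernmost

# reverse the Y axis so that the first row becomes the northernmost
lat_north_first = lat[::-1]
data_north_first = np.flip(data, axis=0)
```

Whichever convention is chosen, the coordinate and the data must be flipped together, otherwise values end up attached to the wrong latitude.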
question

opened by ThibHlln 0
Should transfers be impacted by the component solver history?

At the moment, the framework components feature a "solver_history" which corresponds to the number of time steps a given call to the run method needs to have access to (e.g. a simple explicit backward Euler method will require a history of 2 [for t and t-1], but other methods may be multi-step and also require t-2, t-3, etc.).

Solver history is not currently a feature made very public because it only has impacts on the depth of the array storing the various timesteps of the component states. But I am wondering whether this should also impact on the depth of the arrays for the transfers between components, e.g. if a daily component needs to know about its internal temperature yesterday and the day before yesterday, does it also need to know about say the radiation from another component for the other component's timesteps overlapping with yesterday and the day before yesterday?

And if it should impact on the transfers, should the framework give the component the transfers for all the overlapping timesteps from the other components, or do some averaging over the time steps and give only one value?
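The two options in the last question (give all overlapping values, or average them) can be compared with a small sketch. It assumes the supplier's time step divides the receiver's time step by an integer ratio; the function `aggregate_transfer` and the example values are hypothetical.

```python
def aggregate_transfer(values, ratio, method='mean'):
    """Group supplier values into receiver time steps.

    `values` are the supplier's values in chronological order and `ratio`
    is the number of supplier time steps per receiver time step.
    With method 'mean', one averaged value is given per receiver step;
    with method 'all', every overlapping value is given.
    """
    if len(values) % ratio != 0:
        raise ValueError("number of values must be a multiple of ratio")
    groups = [values[i:i + ratio] for i in range(0, len(values), ratio)]
    if method == 'mean':
        return [sum(group) / ratio for group in groups]
    if method == 'all':
        return groups
    raise ValueError(f"unknown method: {method!r}")


# 6-hourly radiation over two days, given to a daily component
radiation = [0.0, 100.0, 200.0, 100.0, 0.0, 120.0, 240.0, 120.0]
daily_means = aggregate_transfer(radiation, ratio=4)
daily_all = aggregate_transfer(radiation, ratio=4, method='all')
```

With a solver history of 3, the daily component would need the entries for yesterday and the day before yesterday in addition to today's, in whichever of the two forms is chosen.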
question

opened by ThibHlln 0
Support 3D components

This involves reactivating the vertical dimension of Grid (see 6aca7f2f336c14d0e732515a4829250fe26953f7 and #28 for context) and then adding functionality for component contributors to express whether they need/support a vertical dimension.
enhancement

opened by ThibHlln 0

Releases(v0.1.1)

v0.1.1(Apr 17, 2022)
Released on 2021-12-07.

Bug fixes

fix intermittent data filenames loss in YAML configuration file (#80)

fix relative import bug for advanced test suite (#82)

prevent unwanted installation of tests package alongside unifhy (#83)

Documentation

add data to run tutorial and revise tutorial accordingly (#81)

revise installation instructions to recommend conda over pip because of esmpy dependency not available on PyPI (#81)

v0.1.0(Dec 7, 2021)
Released on 2021-12-07.

Functional changes

constrain temporal and spatial resolutions of components forming a model to be integer multiples of one another (#67)

enforce two-dimensional spatial domains for components (#69)

allow components to use/produce only parts of the standardised transfers through the framework interfaces (#76)

API changes

add units requirement for component parameters and constants (#21)

move Component and its subclasses from subpackage components to package root (#46)

rename component class attributes _flow_direction and _land_sea_mask in component definition to _requires_flow_direction and _requires_land_sea_mask, respectively (#46)

remove science components (Artemis and RFM) from framework (#45)

remove vertical dimension (i.e. altitude) in LatLonGrid, RotatedLatLonGrid, and BritishNationalGrid (#69)

replace State dunder methods __getitem__ and __setitem__ with get_timestep and set_timestep methods (#71)

include component inputs as arguments given to initialise method (#75)

revise/refine component inward and outward transfers (#76)

Bug fixes

fix dump file update bug due to missing 'divisions' dimension (#32)

fix model identifier renaming not propagating to its components' identifiers (#48)

fix impossibility to run Model using a Component on the BritishNationalGrid (#51)

fix failed aggregation of fields with no standard name in DataSet (#52)

apply land_sea_mask to underlying field of Grid to be used in remapping (#59)

Enhancements

add support for arrays for component parameters (#21)

add support for multiple divisions for component states (#39)

add support for customisable divisions for component states (#31)

add time slice for I/O operations (user customisable) (#42)

cache remapping weights at initialisation (#44)

add cell_area property to SpaceDomain that can be provided by the user or else automatically computed for Grid (#61)

add initialised_states property to Component to allow component contributors not to overwrite user-defined initial conditions (#75)

add shelf attribute to Component to allow the communication of data between component methods (#75)

add _inwards and _outwards component class attributes to allow contributors to declare what interface transfers their component use and produce, respectively (#76)

Dependencies

change dependency cf-python>=3.11.0

drop support for Python 3.6

Documentation

document 'divisions' for component states in preparation page (#35)

move API reference page to doc tree root

move science library page to doc tree root

add support page

add change log page

add logo for package


Owner

Unified Framework for Hydrology - Community Organisation

GitHub https://unifhy-org.github.io/unifhy
