Simulating ground motion fields conditioned on station data
Rationale
Given an earthquake rupture model, a site model, and a set of ground shaking intensity models (GSIMs), the OpenQuake scenario
calculator simulates a set of ground motion fields (GMFs) at the target sites for the requested set of intensity measure types (IMTs). This set of GMFs can then be used in the scenario_damage
and scenario_risk
calculators to estimate the distribution of potential damage, economic losses, fatalities, and other consequences. The scenario calculators are useful for simulating both historical and hypothetical earthquakes.
For historical events, ground motion recordings and macroseismic intensity data ("ground truth") can provide additional constraints on the ground shaking estimates in the region affected by the earthquake. The U.S. Geological Survey ShakeMap system (Worden et al., 2020[^1], Wald et al., 2022[^2]) provides near-real time estimates of ground shaking for earthquakes, conditioned on station and macroseismic data where available. Support to read USGS ShakeMaps and simulate GMFs based on the mean and standard deviation values estimated by the ShakeMap — for peak ground acceleration (PGA) and spectral acceleration (SA) at 0.3s, 1.0s, and 3.0s — was added to the OpenQuake-engine in #3545 and #3606, and a link to the scenario_damage
and scenario_risk
calculators was added in #3608, based on the methodology described by Silva & Horspool (2019)[^3]. Further support for reading ShakeMaps from sources other than the USGS was added as discussed in #6572. Support for MMI was added in #6660, and some performance improvements were introduced in #6624.
Starting from USGS ShakeMap v4.1, the conditioning of the ground shaking is based on the methodology developed by Engler et al. (2022)[^4], which preserves the partition of the (conditioned) within-event and between-event residuals, thus enabling more accurate propagation of the uncertainties through a Monte Carlo simulation approach.
The ability to run OpenQuake scenario damage and loss calculations starting from ShakeMaps from multiple providers has proved to be a useful feature for the assessment of damage and losses for past earthquakes, development and calibration of fragility and vulnerability models, and testing of probabilistic seismic risk models[^5][^6]. However, there are still many instances where the user may wish to consider a different set of assumptions in the ground motion conditioning process compared to the ShakeMap service providers. For instance, users may be interested in:
- testing the impact of using different rupture geometries and source parameters from published literature for an event[^7]
- including additional recording station data that they have obtained, or excluding certain outliers based on criteria they deem relevant
- employing a different Ground-Motion–Intensity Conversion Equation (GMICE) to convert macroseismic data to ground motion intensities or choosing to ignore the macroseismic data altogether
- testing the behavior of individual GMPEs compared to the station data or using a weighted average GMPE for the tectonic region based on a probabilistic hazard model
- incorporating a custom vs30 grid or using custom site-amplification functions if microzonation studies are available for the affected area
- obtaining estimates for IMT(s) that are not directly available from the ShakeMap provider(s)
- employing different models for the spatial cross-correlation of the within-event residuals and cross-correlation of the IMs for the between-event residuals
For the reasons mentioned above, it would be beneficial to include within the OpenQuake-engine, a Conditioned GMF calculator tightly coupled with the existing scenario calculator[^8], that would:
- accept user-provided station data in addition to the usual inputs for a scenario calculation including an earthquake rupture model, a site model, and a single GSIM or a GSIM logic tree
- calculate the conditioned mean and conditioned between-event and within-event covariances for the target sites for the requested IMTs, for each GSIM given the station data, based on the procedure outlined in Engler et al. (2022)[^4]
- simulate the requested number of GMFs at the target sites
- proceed to damage / loss / consequence calculations if desired
New inputs
Station data csv file
In addition to the usual inputs for a scenario calculation including an earthquake rupture model, a site model, and a single GSIM or a GSIM logic tree, the user would also need to provide a csv file containing the observed intensity values for one or more intensity measure types at a set of ground motion recording stations.
The path to this input file could be specified in the job file using a new parameter station_data_file
, eg.:
[station_data]
station_data_file = stationlist_all.csv
The csv file snippet below illustrates the information that would be contained within such a station data file:
|STATION_ID |STATION_NAME|LONGITUDE |LATITUDE |STATION_TYPE|PGA_VALUE|PGA_LN_SIGMA|SA(0.3)_VALUE|SA(0.3)_LN_SIGMA|SA(1.0)_VALUE|SA(1.0)_LN_SIGMA|
|-----------------------|------------|-----------|---------|------------|---------|------------|-------------|----------------|-------------|----------------|
|VIGA |LAS VIGAS |-99.23326 |16.75870|seismic |0.3550|0.0 |0.5262 |0.0 |0.1012 |0.0 |
|VNTA |LA VENTA |-99.81885 |16.91426 |seismic |0.2061|0.0 |0.3415 |0.0 |0.1051 |0.0 |
|COYC |COYUCA |-100.08996|16.99778|seismic |0.1676|0.0 |0.2643 |0.0 |0.0872 |0.0 |
|⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |
|UTM_14Q_041_186|NA |-99.79820 |16.86687 |macroseismic|0.6512 |0.8059 |0.9535 |1.0131 |0.4794 |1.0822 |
|UTM_14Q_041_185|NA |-99.79761 |16.77656 |macroseismic|0.5797 |0.8059 |0.8766 |1.0131 |0.4577 |1.0822 |
|UTM_14Q_040_186|NA |-99.89182 |16.86655 |macroseismic|0.4770 |0.8059 |0.7220 |1.0131 |0.3223 |1.0822 |
|⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |⋮ |
Mandatory fields
- STATION_ID: string; subject to the same validity checks as the id fields in other input files
- LONGITUDE, LATITUDE; or LON, LAT: floats; valid longitude and latitude values
- STATION_TYPE: string; currently the only two valid options are 'seismic' and 'macroseismic'
- <IMT>_VALUE, <IMT>_LN_SIGMA / <IMT>_STDDEV: floats; for each IMT observed at the recording stations, two values should be provided – for IMTs that are assumed to be lognormally distributed (eg. PGV, PGA, SA), these would be the median and lognormal standard deviation using the column headers <IMT>_VALUE, <IMT>_LN_SIGMA respectively; and for other IMTs (eg., MMI), these would simply be the mean and standard deviation using the column headers <IMT>_VALUE, <IMT>_STDDEV respectively
Optional fields
- STATION_NAME: string; free form and not subject to the same constraints as the STATION_ID field, the optional STATION_NAME field can contain information that aids in identifying a particular station
- Other fields: could contain notes about the station, flags indicating outlier status for the values reported by the station, site information, etc., but these optional fields will not be read by the station_data_file parser
Site model file for the station sites
All of the site parameters required by the GMMs used in the conditioned scenario calculation will also need to be provided for the set of sites in the station_data_file. This could be accomplished in various ways –
A: Use the existing site model csv file parser to read the site model for the station locations at the same time as the site model file for the hazard / exposure sites, eg:
[site_params]
site_model_file = site_model.csv site_model_stations.csv
The advantage in this case would be that the already existing site model parser can be used directly, without needing to manage two separate site models under the hood. The drawback is that the two site models will get merged and it could become difficult to separate the stations from the hazard/exposure sites in downstream parts of the calculation.
B: Read the site model for the station locations separately from the site model file for the hazard / exposure sites, eg:
[site_params]
site_model_file = site_model.csv
station_site_model_file = site_model_stations.csv
The advantage in this case would be that the site model for the stations can be kept separate from the the site model file for the hazard / exposure sites. The site model for the stations will be used only during the conditioning process, whereas the site model file for the hazard / exposure sites will be used for the ground motion simulation. The drawback is the requirement of adding a new input parameter station_site_model_file
and separate management of two site model files, which might be a non-trivial task.
Ground motion models
Users can choose to specify one or more GSIMs for the conditioned scenario calculation using any of the following options:
- A single GMM, eg.
gsim = ChiouYoungs2014
- A GSIM logic tree file, eg.
gsim_logic_tree_file = gsim_logic_tree.xml
- A weighted average GSIM, eg.
gsim_logic_tree_file = gsim_weighted_avg.xml
, where the file gsim_weighted_avg.xml
can be constructed using the modifiable GMPE structure for AvgGMPE as shown in the example below:
<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns:gml="http://www.opengis.net/gml"
xmlns="http://openquake.org/xmlns/nrml/0.4">
<logicTree logicTreeID='lt1'>
<logicTreeBranchingLevel branchingLevelID="bl1">
<logicTreeBranchSet
branchSetID="bs1"
uncertaintyType="gmpeModel"
applyToTectonicRegionType="Active Shallow Crust">
<logicTreeBranch branchID="br1">
<uncertaintyModel>
[AvgGMPE]
b1.AbrahamsonEtAl2014.weight=0.22
b2.BooreEtAl2014.weight=0.22
b3.CampbellBozorgnia2014.weight=0.22
b4.ChiouYoungs2014.weight=0.22
b5.Idriss2014.weight=0.12
</uncertaintyModel>
<uncertaintyWeight>
1.0
</uncertaintyWeight>
</logicTreeBranch>
</logicTreeBranchSet>
</logicTreeBranchingLevel>
</logicTree>
</nrml>
For a regular GSIM logic tree, the conditioning steps will be undertaken for each of the GSIMs in the logic tree file separately, and GMFs will be generated for each of the GSIMs separately as well. This can be useful when the modeller is evaluating or comparing the predictions from different GSIMs for a scenario.
If a weighted average GSIM is provided, the conditioning and GMF simulation will occur only for the blended GSIM. This can be useful when the modeller is evaluating or trying to calibrate the performance of the set of GSIMs for the tectonic region as a whole against one or more scenarios, perhaps looking at the GSIM logic tree from a probabilistic hazard model for the region.
Note: Each of the GSIMs specified for a conditioned GMF calculation must provide the within-event and between-event standard deviations separately. If a GSIM of interest provides only the total standard deviation, a (non-ideal) workaround might be for the user to specify the ratio between the within-event and between-event standard deviations (example), which the engine will use to add the between and within standard deviations to the GSIM.
Calculation steps
The broad calculation steps are as follows, with more detailed descriptions of each step provided subsequently:
- Read the usual scenario/scenario_damage/scenario_risk inputs
- If the
station_data_file
is present in the job file,
- Read the station data input file
- Trigger the ConditionedGmfComputer instead of the regular GmfComputer
- Calculate and store the conditioned between-event residual mean at every station site
- Calculate and store the nominal event bias
- Calculate the conditioned mean and covariance of the ground motion at the target sites
- Simulate and store the requested number of ground motion fields at the target sites
- Proceed to damage / loss / consequence calculations if desired
Reading the station data input file
This would require adding a function get_station_data(..)
to commonlib/readinput.py, to extract the following three pieces of information from the station data csv file:
- station_data: a dataframe containing the mean and standard deviation values, in lognormal space or otherwise, for the set of valid IMTs in the csv file
- station_sites: a dataframe containing {station_id, lon, lat} for the list of stations in the csv file
- imts: the set of valid intensity measure types for which intensity observations are available in the csv file
The ConditionedGmfComputer
The conditoned GMF calculator should essentially follow the conditioning steps outlined in Appendix B of Engler et al. (2022)[^4], see attached pdf of Appendix B for the equations. In short, the main steps are for each target IMT:
- From station_data, get the mean observations (recorded values at the stations, "yD") and any additional sigma ("var_addon_D") for the observations that are uncertain, which might arise if the values for this particular IMT were not directly recorded, but obtained by conversion equations or cross-correlation functions
- Compute the predicted mean intensity ("mu_yD") and uncertainty components ("phi_D" and "tau_D") at the observation points, from the specified GMM(s)
- Compute the raw residuals at the observation points, by subtracting the predicted intensity from the observed intensity (zeta_D = yD - mu_yD)
- Compute the station data within-event covariance matrix ("cov_WD_WD"), and add on the additional variance of the residuals for the cases where the station data is uncertain
- Compute the (pseudo)-inverse of the station data within-event covariance matrix ("cov_WD_WD_inv")
- Compute the cross-intensity measure correlations for the observed intensity measures ("corr_HD_HD")
- Using Bayes rule, compute the posterior distribution (i.e., mean "mu_HD_yD" and covariance "cov_HD_HD_yD") of the normalized between-event residual, employing Engler et al. (2022), eqns B8 and B9 (also eqns B18 and B19)
- Compute the distribution of the conditional between-event residual ("mu_BD_yD" and "cov_BD_BD_yD")
- Compute the nominal bias and its variance as the means of the conditional between-event residual mean and covariance, display this information in the calculation log, and also store it as one of the outputs of the calculation
- From the GMMs, compute the predicted mean ("mu_Y") and stddevs ("phi_Y" and "tau_Y") of the intensity at the target sites
- Compute the mean of the conditional between-event residual for the target sites ("mu_BY_yD"), eqn. B18 and B19a
- Compute the within-event covariance matrices for the target sites and observation sites ("cov_WY_WD" and "cov_WD_WY")
- Compute the within-event covariance matrix for the target sites (apriori, "cov_WY_WY")
- Compute the regression coefficient matrix ("RC" = cov_WY_WD × cov_WD_WD_inv)
- Compute the scaling matrix "C" for the conditioned between-event covariance matrix, eqn. B22
- Compute the conditioned within-event covariance matrix for the target sites ("cov_WY_WY_wD"), eqn. B21
- Compute the "conditioned between-event" covariance matrix for the target sites ("cov_BY_BY_yD"), last term of eqn. B17
- Compute the conditioned mean of the ground motion at the target sites ("mu_Y_yD"), eqn. B16
- Compute the conditional covariance of the ground motion at the target sites ("cov_Y_Y_yD"), eqn. B17
- Use the conditioned mean vector (mu_Y_yD) and standard deviation matrices (cov_WY_WY_wD and cov_BY_BY_yD) computed in the preceding two steps to simulate the requested number of ground motion fields at the target sites
Outputs
- Nominal event bias (from step 9 above): one bias value for each IMT, for every GSIM used in the calculation
- Conditioned between-event residual mean (from step 8 above): one bias value for each station site, for each IMT, for every GSIM used in the calculation (useful if the between-event uncertainties are heteroscedastic, depending upon either Vs30, the median estimated ground motions, or both)
- Simulated ground motion fields
- All other existing outputs of the
scenario
calculator
Verification tests
Since the implementation is expected to closely follow Engler et al. (2022), the results should match for the set of eleven simplified one-dimensional verification tests devised by Worden et al. (2020) to check the USGS ShakeMap implementation of the Engler et al. (2022) equations.
Future improvements
The conditioning process requires the specification of a spatial correlation model of the within-event residuals between two points for the intensity measures involved in the calculation, a model for the cross-measure correlation of the within-event residuals, and a model for the cross-measure correlation of the between-event residuals. Assuming a conditional independence of the cross-measure correlation and the spatial correlation of a given intensity measure, the spatial cross-correlation of two different IMs at two different sites can be obtained as the product of the cross-correlation of two IMs at the same location and the spatial correlation due to the distance between sites. Given the limited set of correlation models available in the engine currently, the choice of the three aforementioned correlation models could be hardcoded in the initial implementation of the conditioned GMF calculator, using JB2009CorrelationModel
as the spatial correlation model of the within-event residuals, BakerJayaram2008
as the cross-measure correlation model for the within-event residuals, and GodaAtkinson2009
as the cross-measure correlation model for the between-event residuals.
Ideally, the choice of the different correlation models should be exposed to the user through parameters in the job file. Direct spatial cross-correlation models for the within-event residuals[^9][^10] could also be used instead of separate models for the spatial correlation and cross-measure correlation. Supporting these choices would entail a substantial refactoring of the existing correlation.py and cross_correlation.py modules of the engine, and is thus left for a future issue.
References
[^1]: Worden, C. B., Thompson, E. M., Hearne, M. G., & Wald, D. J. (2020). ShakeMap Manual Online: technical manual, user’s guide, and software guide, U. S. Geological Survey. URL: http://usgs.github.io/shakemap/. DOI: https://doi.org/10.5066/F7D21VPQ.
[^2]: Wald, D. J., Worden, C. B., Thompson, E. M., & Hearne, M. G. (2022). ShakeMap operations, policies, and procedures. Earthquake Spectra, 38(1), 756–777. DOI: https://doi.org/10.1177/87552930211030298
[^3]: Silva, V., & Horspool, N. (2019). Combining USGS ShakeMaps and the OpenQuake-engine for damage and loss assessment. Earthquake Engineering and Structural Dynamics, 48(6), 634–652. DOI: https://doi.org/10.1002/eqe.3154
[^4]: Engler, D. T., Worden, C. B., Thompson, E. M., & Jaiswal, K. S. (2022). Partitioning Ground Motion Uncertainty When Conditioned on Station Data. Bulletin of the Seismological Society of America, 112(2), 1060–1079. DOI: https://doi.org/10.1785/0120210177
[^5]: Riga, E., Karatzetzou, A., Apostolaki, S., Crowley, H., & Pitilakis, K. D. (2021). Verification of seismic risk models using observed damages from past earthquake events. Bulletin of Earthquake Engineering (Vol. 19). DOI: https://doi.org/10.1007/s10518-020-01017-5
[^6]: Crowley, H., Silva, V., Kalakonas, P., Martins, L., Weatherill, G. A., Pitilakis, K. D., Riga, E., Borzi, B., & Faravelli, M. (2020). Verification of the European Seismic Risk Model (ESRM20). In 17th World Conference on Earthquake Engineering. Sendai, Japan.
[^7]: de Pro‐Díaz, Y., Vilanova, S., & Canora, C. (2022). Ranking Earthquake Sources Using Spatial Residuals of Seismic Scenarios: Methodology Application to the 1909 Benavente Earthquake. Bulletin of the Seismological Society of America. DOI: https://doi.org/10.1785/0120220067
[^8]: The GMPE Strong Motion Modeller's Toolkit (gmpe-smtk) includes an example of conditional simulation of ground motion fields using hazardlib in conditional_simulation.py, though the current issue envisages a much tighter coupling with the engine's scenario calculator architecture, using the formulation from Appendix B of Engler et al. (2022)[^4] to compute the conditioned mean and within-event and between-event residuals, and permitting users to input station data directly.
[^9]: Loth, C., & Baker, J. W. (2013). A spatial cross‐correlation model of spectral accelerations at multiple periods. Earthquake Engineering & Structural Dynamics, 42(3), 397–417. DOI: https://doi.org/10.1002/eqe2212
[^10]: Du, W., & Ning, C. L. (2021). Modeling spatial cross-correlation of multiple ground motion intensity measures (SAs, PGA, PGV, Ia, CAV, and significant durations) based on principal component and geostatistical analyses. Earthquake Spectra, 37(1), 486–504. DOI: https://doi.org/10.1177/8755293020952442
proposal