TenSIR
Parameter estimation on epidemic data under the SIR model using a novel algorithm for differentiated uniformization of Markov transition rate matrices in tensor representation.
This repository contains the code for the paper.
Data
We used the data from the Austrian BMSGPK on the COVID-19 pandemic from March 2020 to August 2020. A CSV file containing the data used by us can be found here if the API is subject to change in the future.
Results
Kernel density estimation plot of points generated by Hamilton Monte Carlo simulation
The x marks the least squares estimate after grid search using the default deterministic model (system of ODEs).
Susceptible and infected people to/with COVID-19 in Austria during the early months of the pandemic
Reproducing the results
Prerequisites
- Python 3.7+ with Pip (tested with Python 3.9 and 3.10)
Setup
We advise you to use a virtual environment for running the code. After you activated it change to the source
directory and run
pip install -r requirements.txt
Generating points
To exactly reproduce our results, one should use the generate-points.py script as
python generate-points.py <month> <run>
where <month>
is a number from 3 (March 2020) to 8 (August 2020) and <run>
specifies a number for an independent HMC run. The random number generator is seeded uniquely for each run by seed = month * 1000 + run
. For the HMC simulation, we did 10 runs (with numbers 0 - 9) for each month (3 - 8) resulting in 60 runs total.
Note that the script assumes 48 CPU threads. This can be changed in the script, though diminishing returns are expected for thread counts greater than 60. More runs can of course be computed independently in parallel.
The parameters for all simulations were as follows (as seen in generate-points.py):
- Initial parameter
Theta0 = (0.1, 0.1)
(*) - Covariance matrix
M = diag(2)
- "Learning rate"
epsilon = 0.05
- Leapfrog count
L = 5
per generated point - Simulation until 100 points are accepted for each run
- Discard the first 10% of accepted points as "burn-in" before plotting
(*) In our framework we use the convention Theta = (alpha, beta)
and theta = (log(alpha), log(beta))
where alpha
, beta
are the parameters of the SIR model.
Leveraging HPC clusters
Especially for months March, April and August the simulation can take quite some time. If there is access to a compute cluster that uses slurm the slurm-job-template.sh can be utilized. Note that the venv must be setup on the target architecture.