Library of Stan Models for Survival Analysis

Hammer Lab

Last update: Jan 6, 2023

Related tags

Machine Learning survivalstan

Overview

survivalstan: Survival Models in Stan

author: Jacki Novik

Overview

Library of Stan Models for Survival Analysis

Features:

Variety of standard survival models
- Weibull, Exponential, and Gamma parameterizations
- PEM models with variety of baseline hazards
- PEM model with varying-coefficients (by group)
- PEM model with time-varying-effects
Extensible framework - bring your own Stan code, or edit the models above
Uses pandas data frames & patsy formulas
Graphical posterior predictive checking (currently PEM models only)
Plot posterior estimates of key parameters using seaborn
Annotate posterior draws of parameter estimates, format as pandas dataframes
Works with extensions to pystan, such as stancache or pystan-cache

Support

Documentation is available online.

For help, please reach out to us on gitter.

Installation / Usage

Install using pip, as:

$ pip install survivalstan

Or, you can clone the repo:

$ git clone https://github.com/hammerlab/survivalstan.git
$ pip install .

Contributing

Please contribute to survivalstan development by letting us know if you encounter any bugs or have specific feature requests.

In addition, we welcome contributions of:

Stan code for survival models
Worked examples, as jupyter notebooks or markdown documents

Usage examples

There are several examples included in the example-notebooks, roughly one corresponding to each model.

If you are not sure where to start, Test pem_survival_model with simulated data.ipynb contains the most explanatory text. Many of the other notebooks are sparse on explanation, but do illustrate variations on the different models.

For basic usage:

import survivalstan
import stanity
import seaborn as sb
import matplotlib.pyplot as plt
import statsmodels

## load flchain test data from R's `survival` package
dataset = statsmodels.datasets.get_rdataset(package = 'survival', dataname = 'flchain' )
d  = dataset.data.query('futime > 7')
d.reset_index(level = 0, inplace = True)

## e.g. fit Weibull survival model
testfit_wei = survivalstan.fit_stan_survival_model(
	model_cohort = 'Weibull model',
	model_code = survivalstan.models.weibull_survival_model,
	df = d,
	time_col = 'futime',
	event_col = 'death',
	formula = 'age + sex',
	iter = 3000,
	chains = 4,
	make_inits = survivalstan.make_weibull_survival_model_inits
	)

## coefplot for Weibull coefficient estimates
sb.boxplot(x = 'value', y = 'variable', data = testfit_wei['coefs'])

## or, use plot_coefs
survivalstan.utils.plot_coefs([testfit_wei])

## print summary of MCMC draws from posterior for each parameter
print(testfit_wei['fit'])


## e.g. fit Piecewise-exponential survival model 
dlong = survivalstan.prep_data_long_surv(d, time_col = 'futime', event_col = 'death')
testfit_pem = survivalstan.fit_stan_survival_model(
	model_cohort = 'PEM model',
	model_code = survivalstan.models.pem_survival_model,
	df = dlong,
	sample_col = 'index',
	timepoint_end_col = 'end_time',
	event_col = 'end_failure',
	formula = 'age + sex',
	iter = 3000,
	chains = 4,
	)

## print summary of MCMC draws from posterior for each parameter
print(testfit_pem['fit'])

## coefplot for PEM model results
sb.boxplot(x = 'value', y = 'variable', data = testfit_pem['coefs'])

## plot baseline hazard (only PEM models)
survivalstan.utils.plot_coefs([testfit_pem], element='baseline')

## posterior-predictive checking (only PEM models)
survivalstan.utils.plot_pp_survival([testfit_pem])

## e.g. compare models using PSIS-LOO
stanity.loo_compare(testfit_wei['loo'], testfit_pem['loo'])

## compare coefplots 
sb.boxplot(x = 'value', y = 'variable', hue = 'model_cohort',
    data = testfit_pem['coefs'].append(testfit_wei['coefs']))
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

## (or, use survivalstan.utils.plot_coefs)
survivalstan.utils.plot_coefs([testfit_wei, testfit_pem])

Comments

Feature/add patsy formulas
Some code refactoring in order to support using patsy formulas for LHS as well as RHS spec. Closes #52

Models can now use the following syntax:

survivalstan.fit_stan_survival_model( formula = 'surv(event_status=end_failure, time=t) ~ age + sex', data = d, model = survivalstan.models.exp_survival_model, ... )

Optional parameters to surv include group= and subject=.

The previous syntax is still also supported:

survivalstan.fit_stan_survival_model( event_col = 'end_failure', time_col = 't', formula = ' ~ age + sex', ... )

Moved a fair amount of data-processing into the survivalstan.formulas module, which extends patsy syntax to support the above. For example, the patsy.ModelDesc module includes the stan-data dict which gets passed to pystan.stan for sampling.

This ipynb tests early versions of this code and may help demonstrate how the elements in formulas.py fit together.
opened by jburos 15
stanity requirement

git cloneing this repo and then pip install -e . does not result in stanity being installed, even though it seems like it should based on the setup.py code. Not sure why?
bug

opened by tavinathanson 6
v0.1.2 release
Enhancements:

Improve plot_coefs & supporting functions

support a wider variety of coefficient types (element="baseline", element="grp_coefs", etc)

add trans option, so that one can plot exp(beta) or beta

support different types of grp_coefs, such as vector of vectors vs matrix

some bug-fixes

make fit_stan_survival_model more robust to failures in post-processing

updated TVC model

based on random-walk baseline hazard model

use random-walk (cumulative delta) to model TVCs

estimate hazard on log-scale for stability

New features:

Added pem-gamma model

Graphical posterior-predictive checking, via:

survivalstan.utils.plot_pp_survival(models=[my_model])

(optionally overlay observed values using plot_observed_survival(..))

Get posterior-predictive data via:

survivalstan.utils.prep_pp_data

updated tvc, pem-randomwalk & pem-gamma to support ppcheck functions
opened by jburos 5
0.1.1 release
Several fixes & cleanup items, in preparation for first real release.

bug fixes

Fix bug estimating null model (#8)

enhancements

add utils for plot_coefs

modify plot_coefs to accept arbitrary parameter names to extract (e.g. baseline_raw)

add utils for extract_params_long

refactor some parts of fit_stan_survival_model

precompute t_dur & t_obs & pass these in as data instead of using transformed data block

new models

modified default pem_survival_model - is the unstructured model.

added exp_survival_model.stan

added pem_survival_model_unstructured.stan

added pem_survival_model_randomwalk.stan

remove varying_coefs2 and varying_coefs3 models for now.
opened by jburos 5
more options
Few small changes:

Don't fail if psis-loo doesn't run after model estimation

Add drop_intercept param to fit_stan_survival_model, since not all models want intercept terms

New example model notebook containing gamma survival model, in bring-your-own model form
opened by jburos 4
add patsy formulas for Surv() syntax
This is a revised PR for the add-patsy-formulas feature (PR #60).

Since then:

Improved code formatting

fix log_t_dur spec (#71)

fix dataframe merge error on pp_surv (#76)

Comments from previous PR:

Some code refactoring in order to support using patsy formulas for LHS as well as RHS spec. Closes #52

Models can now use the following syntax:

survivalstan.fit_stan_survival_model( formula = 'surv(event_status=end_failure, time=t) ~ age + sex', data = d, model = survivalstan.models.exp_survival_model, ... )

Optional parameters to surv include group= and subject=.

The previous syntax is still also supported:

survivalstan.fit_stan_survival_model( event_col = 'end_failure', time_col = 't', formula = ' ~ age + sex', ... )

Moved a fair amount of data-processing into the survivalstan.formulas module, which extends patsy syntax to support the above. For example, the patsy.ModelDesc module includes the stan-data dict which gets passed to pystan.stan for sampling.

This ipynb tests early versions of this code and may help demonstrate how the elements in formulas.py fit together.
opened by jburos 3
Time and event column types

Following the PEM example with my own data, I got unreasonable results using survivalstan.utils.plot_observed_survival.

After casting my time column to float (was int before) and the event column to boolean (was 0/1 before), everything worked.

Is this intentional?
bug

opened by adam-haber 3
refactor code to anticipate joint modeling work
Changes in this PR:

pull data-prep into a separate class: SurvivalStanData. Preparation of these data is done via that class's __init__ method.

is a class because meta-data about the processing step (e.g. coefficient names, etc) are needed in the fit-stan object to support variety of post-processing.

new parameter input_data, allowing user to prepare the SurvivalStanData object separately & pass in as an argument.

minor bugfix for case where data are provided in "long" format & contain only a single failure time.
opened by jburos 3
Dataframe Merge error in utils.plot_pp_survival (Test pem_survival_model with simulated data notebook)

In the notebook, when I ran the code : survivalstan.utils.plot_pp_survival([testfit], by='sex') I got the following error:

ValueError Traceback (most recent call last) in () ----> 1 survivalstan.utils.plot_pp_survival([testfit], by='sex')

~/.conda/envs/test-env/lib/python3.5/site-packages/survivalstan/utils.py in plot_pp_survival(models, time_element, event_element, num_ticks, step_size, ticks_at, time_col, event_col, fill, by, alpha, pal, subplot, **kwargs) 326 pp_surv = prep_pp_survival_data(models, time_element=time_element, 327 event_element=event_element, time_col=time_col, --> 328 event_col=event_col, by=by) 329 print(pp_surv.head()) 330 if by:

~/.conda/envs/test-env/lib/python3.5/site-packages/survivalstan/utils.py in prep_pp_survival_data(models, time_element, event_element, time_col, event_col, by, **kwargs) 260 by=None, 261 **kwargs): --> 262 pp_data = prep_pp_data(models, time_element=time_element, event_element=event_element, time_col=time_col, event_col=event_col, **kwargs) 263 groups = ['iter', 'model_cohort'] 264 if by:

~/.conda/envs/test-env/lib/python3.5/site-packages/survivalstan/utils.py in prep_pp_data(models, time_element, event_element, event_col, time_col, **kwargs) 250 def prep_pp_data(models, time_element='y_hat_time', event_element='y_hat_event', event_col='event_status', time_col='event_time', **kwargs): 251 data = [_prep_pp_data_single_model(model=model, event_element=event_element, time_element=time_element, event_col=event_col, time_col=time_col, **kwargs) --> 252 for model in models] 253 data = pd.concat(data) 254 data.sort_values([time_col, 'iter'], inplace=True)

~/.conda/envs/test-env/lib/python3.5/site-packages/survivalstan/utils.py in (.0) 250 def prep_pp_data(models, time_element='y_hat_time', event_element='y_hat_event', event_col='event_status', time_col='event_time', **kwargs): 251 data = [_prep_pp_data_single_model(model=model, event_element=event_element, time_element=time_element, event_col=event_col, time_col=time_col, **kwargs) --> 252 for model in models] 253 data = pd.concat(data) 254 data.sort_values([time_col, 'iter'], inplace=True)

~/.conda/envs/test-env/lib/python3.5/site-packages/survivalstan/utils.py in _prep_pp_data_single_model(model, time_element, event_element, sample_col, time_col, event_col, join_with, sample_id_col) 244 #pp_data['index'] = pp_data['index'].astype(int) 245 if join_with: --> 246 pp_data = pd.merge(pp_data, model[join_with], on=sample_col, suffixes=['','_original']) 247 return pp_data 248

~/.conda/envs/test-env/lib/python3.5/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate) 59 right_index=right_index, sort=sort, suffixes=suffixes, 60 copy=copy, indicator=indicator, ---> 61 validate=validate) 62 return op.get_result() 63

~/.conda/envs/test-env/lib/python3.5/site-packages/pandas/core/reshape/merge.py in init(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate) 553 # validate the merge keys dtypes. We may need to coerce 554 # to avoid incompat dtypes --> 555 self._maybe_coerce_merge_keys() 556 557 # If argument passed to validate,

~/.conda/envs/test-env/lib/python3.5/site-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self) 984 elif (not is_numeric_dtype(lk) 985 and (is_numeric_dtype(rk) and not is_bool_dtype(rk))): --> 986 raise ValueError(msg) 987 elif is_datetimelike(lk) and not is_datetimelike(rk): 988 raise ValueError(msg)

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Resolution: I debugged the code and in the utils.py, the below function was producing a dataframe pp_data in which the index column had a type str: _prep_pp_data_single_model(model, time_element='y_hat_time', event_element='y_hat_event', sample_col=None, time_col='event_time', event_col='event_status', join_with='df_all', sample_id_col=None):

To solve this I just changed the type to int by using below line and the code ran fine without any errors: pp_data = pd.merge(pp_event_time, pp_event_status, on=['iter', sample_col, 'model_cohort']) pp_data['index'] = pp_data['index'].astype(int) if join_with: pp_data = pd.merge(pp_data, model[join_with], on=sample_col, suffixes=['','_original']) return pp_data
bug documentation

opened by prabhjot12 2
NameError: FIT_FUN
Hey all -

First, thank you for all the work that goes into this. This is making my job so much easier.

I'm imitating this: https://github.com/hammerlab/survivalstan/blob/master/example-notebooks/Test%20pem_survival_model_timevarying%20with%20simulated%20data.ipynb

I'm getting error "FIT_FUN". I'm imitating everything fine on new data until this part. I'm using the the time varying weibull survival instead of the PEM in the example. i.e.:

fit = survivalstan.fit_stan_survival_model( model_cohort = 'test model', model_code = survivalstan.models.weibull_survival_model_varying_coefs, df = df, sample_col = 'MRN', timepoint_end_col = 'Time', event_col = 'Death', formula = '~ CFR_Global + Etiology_CMP + Tacro + Cyclo + Syro + Evero + DM + HTN + eGFR + BMI + Age + AMR_Y/N', iter = 10000, chains = 4, seed = 8128, FIT_FUN = stancache.cached_stan_file, )

I'll keep staring at it, but is there something stupid I'm missing?

Thanks all... Andre
opened by drezap 2

Unknown compilation issue

I ran into a compilation issue when running the following on my Mac, from the README (seems to work on Linux):

## e.g. fit Weibull survival model
testfit_wei = survivalstan.fit_stan_survival_model(
    model_cohort = 'Weibull model',
    model_code = survivalstan.models.weibull_survival_model,
    df = d,
    time_col = 'futime',
    event_col = 'death',
    formula = 'age + sex',
    iter = 3000,
    chains = 4,
    make_inits = survivalstan.make_weibull_survival_model_inits
    )

I'll try to post the stack trace and associated errors when I reproduce.

bug

opened by tavinathanson 2

model fit works when running on shell, but runs into error when executing as script

i'm trying to build a bayes model object using the survivalstan package, the model object compiles properly when the script below is executed directly on a python shell (copy + paste method into the interpreter). however, when I execute the entire script at once using the command line format, I get a variable error (see below)

File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code exec(code, run_globals) File "C:\Users\thomasc\Documents\WeibullMethod\weibull_bayes_demo.py", line 117, in model = main() File "C:\Users\thomasc\Documents\WeibullMethod\weibull_bayes_demo.py", line 112, in main make_inits = survivalstan.make_weibull_survival_model_inits) File "C:\ProgramData\Anaconda3\lib\site-packages\survivalstan-0.1.2.8-py3.7.egg\survivalstan\survivalstan.py", line 167, in fit_stan_survival_model File "C:\ProgramData\Anaconda3\lib\site-packages\stanity-0.1.4-py3.7.egg\stanity\fit.py", line 37, in fit File "C:\ProgramData\Anaconda3\lib\site-packages\pystan\api.py", line 447, in stan n_jobs=n_jobs, **kwargs) File "C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py", line 813, in sampling ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs) File "C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py", line 87, in _map_parallel pool.close()

how can I prevent this error? I do not wish to have to interact with a python shell directly to train a model. code below to train model, directly same as the example code for this package this compiles if copy and pasted directly into a shell, but fails if saved as a .py script and executed from command line as python

Library of Stan Models for Survival Analysis

Related tags

Overview

survivalstan: Survival Models in Stan

Overview

Support

Installation / Usage

Contributing

Usage examples

Comments

bug fixes

enhancements

new models

In the notebook, when I ran the code : survivalstan.utils.plot_pp_survival([testfit], by='sex') I got the following error: