scikit-survival is a Python module for survival analysis built on top of scikit-learn.

Sebastian Pölsterl

Last update: Jan 4, 2023

Overview

scikit-survival

scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizing the power of scikit-learn, e.g., for pre-processing or doing cross-validation.

About Survival Analysis

The objective in survival analysis (also referred to as time-to-event or reliability analysis) is to establish a connection between covariates and the time of an event. What distinguishes survival analysis from traditional machine learning is that parts of the training data can only be partially observed: they are censored.

For instance, in a clinical study, patients are often monitored for a particular time period, and events occurring in this period are recorded. If a patient experiences an event, the exact time of the event can be recorded, and the patient's record is uncensored. In contrast, right-censored records refer to patients who remained event-free during the study period; it is unknown whether an event occurred after the study ended. Consequently, survival analysis demands models that take this unique characteristic of such a dataset into account.
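
To make the role of censoring concrete, here is a minimal pure-Python sketch of the Kaplan-Meier estimator. It is not scikit-survival's implementation; the function name `kaplan_meier` and the toy data are assumptions made for illustration only. Censored subjects never count as events, but they stay in the risk set until the time at which they were last observed.

```python
from collections import Counter


def kaplan_meier(events, times):
    """Return (time, survival probability) pairs at each distinct event time.

    events[i] is True if the event was observed for subject i and
    False if the record is right-censored at times[i].
    """
    deaths = Counter(t for e, t in zip(events, times) if e)
    leaving = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Only observed events reduce the survival estimate.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored records both leave the risk set after time t.
        at_risk -= leaving[t]
    return curve


# Toy data (made up): subjects 2 and 5 are right-censored.
events = [True, False, True, True, False]
times = [2, 3, 4, 5, 6]
curve = kaplan_meier(events, times)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Dropping the censored records instead would shrink the risk set too early and bias the estimate downwards, which is why dedicated survival models are needed.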

Requirements

Python 3.7 or later
ecos
joblib
numexpr
numpy 1.16 or later
osqp
pandas 0.25 or later
scikit-learn 0.24
scipy 1.0 or later
C/C++ compiler

Installation

The easiest way to install scikit-survival is to use Anaconda by running:

conda install -c sebp scikit-survival

Alternatively, you can install scikit-survival from source following this guide.

Examples

The user guide provides in-depth information on the key concepts of scikit-survival, an overview of available survival models, and hands-on examples in the form of Jupyter notebooks.

Help and Support

Documentation

HTML documentation for the latest release: https://scikit-survival.readthedocs.io/en/stable/
HTML documentation for the development version (master branch): https://scikit-survival.readthedocs.io/en/latest/
For a list of notable changes, see the release notes.

Bug reports

If you encountered a problem, please submit a bug report.

Questions

If you have a question on how to use scikit-survival, please use GitHub Discussions.
For general theoretical or methodological questions on survival analysis, please use Cross Validated.

Contributing

New contributors are always welcome. Please have a look at the contributing guidelines on how to get started and to make sure your code complies with our guidelines.

References

Please cite the following paper if you are using scikit-survival.

S. Pölsterl, "scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn," Journal of Machine Learning Research, vol. 21, no. 212, pp. 1–6, 2020.

@article{sksurv,
  author  = {Sebastian P{\"o}lsterl},
  title   = {scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {212},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-729.html}
}

Comments

CoxPH SurvivalAnalysis and Singular Matrix Error

I'm going through the tutorial that uses the veterans lung cancer study, and I am using the same code for Cox regression on my own dataset. My problem is calculating the days to graft failure after a transplant. The dataset has about 900 features after encoding and other preprocessing steps, and 130K rows. I prepared the data for Cox regression (data_x is a dataframe and data_y is a numpy array of status and survival_in_days) and took a sample of it to run. However, when I run the Cox regression, I get the error LinAlgError: Matrix is Singular. I manipulated my data in different ways, but I could not understand what the problem is or how to solve it.
awaiting response

opened by sarahysh12 22

Explain how to interpret output of .predict() in API doc

(I also posted this as a question on Stack Overflow: https://stackoverflow.com/q/47274356/1870832 )

I'm confused how to interpret the output of .predict from a fitted CoxnetSurvivalAnalysis model in scikit-survival. I've read through the notebook Intro to Survival Analysis in scikit-survival and the API reference, but can't find an explanation. Below is a minimal example of what leads to my confusion:

import pandas as pd
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.linear_model import CoxnetSurvivalAnalysis

# load data
data_X, data_y = load_veterans_lung_cancer()

# one-hot-encode categorical columns in X
categorical_cols = ['Celltype', 'Prior_therapy', 'Treatment']

X = data_X.copy()
for c in categorical_cols:
    dummy_matrix = pd.get_dummies(X[c], prefix=c, drop_first=False)
    X = pd.concat([X, dummy_matrix], axis=1).drop(c, axis=1)

# display final X to fit Cox Elastic Net model on
del data_X
print(X.head(3))

so here's the X going into the model:

   Age_in_years  Celltype  Karnofsky_score  Months_from_Diagnosis  \
0          69.0  squamous             60.0                    7.0   
1          64.0  squamous             70.0                    5.0   
2          38.0  squamous             60.0                    3.0   

  Prior_therapy Treatment  
0            no  standard  
1           yes  standard  
2            no  standard

...moving on to fitting model and generating predictions:

# Fit Model
coxnet = CoxnetSurvivalAnalysis()
coxnet.fit(X, data_y)

# What are these predictions?
preds = coxnet.predict(X)

preds has same number of records as X, but their values are wayyy different than the values in data_y, even when predicted on the same data they were fit on.

print(preds.mean()) 
print(data_y['Survival_in_days'].mean())

output:

-0.044114643249153422
121.62773722627738

So what exactly are preds? Clearly .predict means something pretty different here than in scikit-learn, but I can't figure out what. The API Reference says it returns "The predicted decision function," but what does that mean? And how do I get to the predicted estimate in months yhat for a given X? I'm new to survival analysis so I'm obviously missing something.
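
One way to read such scores, assuming (as is common for Cox-type models, and not stated in the text above) that they are risk scores on an arbitrary scale where a larger value means an earlier expected event, is to look only at how they rank subjects. The sketch below computes a simple concordance index in pure Python; the function name and the toy data are made up for illustration and this is not scikit-survival's implementation.

```python
def concordance_index(events, times, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable if subject i had an observed event and
    times[i] < times[j]. It is concordant if i also has the higher risk.
    Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier one in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (made up): the third subject is censored.
events = [True, True, False, True]
times = [10, 50, 80, 120]
risk = [0.9, 0.1, -0.4, -1.2]

c_index = concordance_index(events, times, risk)
c_shifted = concordance_index(events, times, [r + 100 for r in risk])
c_reversed = concordance_index(events, times, [-r for r in risk])
print(c_index, c_shifted, c_reversed)
```

Shifting every score by a constant leaves the index unchanged, which illustrates why such scores need not resemble the observed times: under this assumption only their order carries information.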

opened by MaxPowerWasTaken 21

During install: error: command '/usr/bin/clang' failed with exit code 1

Python version: Python 3.10.3

OS: OSX 12.4 (Proc: M1 chip)

When trying to pip install (tried versions 0.17 and 0.18):

      222 warnings and 4 errors generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

The errors seem to be:

      In file included from sksurv/linear_model/_coxnet.cpp:801:
      In file included from sksurv/linear_model/src/coxnet_wrapper.h:21:
      sksurv/linear_model/src/coxnet/coxnet.h:139:23: error: expected unqualified-id
                  if (!std::isfinite(exp_xw[k])) {
                            ^

      In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
      In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
      sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:753:12: error: reference to unresolved using declaration
          return isnan EIGEN_NOT_A_MACRO (x);
                 ^

      In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
      In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
      sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:738:12: error: reference to unresolved using declaration
          return isinf EIGEN_NOT_A_MACRO (x);
                 ^

      In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
      In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
      sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:723:12: error: reference to unresolved using declaration
          return isfinite EIGEN_NOT_A_MACRO (x);
                 ^

Happy to provide more details if needed

opened by tpilewicz 13

0.12.0: from sksurv.ensemble import RandomSurvivalForest fails

Upon upgrading to 0.12.0

>>> from sksurv.ensemble import RandomSurvivalForest
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/__init__.py", line 2, in <module>
    from .forest import RandomSurvivalForest  # noqa: F401
  File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/forest.py", line 14, in <module>
    from ..tree import SurvivalTree
  File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/__init__.py", line 1, in <module>
    from .tree import SurvivalTree  # noqa: F401
  File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/tree.py", line 14, in <module>
    from ._criterion import LogrankCriterion
  File "_splitter.pxd", line 34, in init sksurv.tree._criterion
ValueError: sklearn.tree._splitter.Splitter size changed, may indicate binary incompatibility. Expected 368 from C header, got 360 from PyObject
>>>
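
When reporting an error like this, it helps to list the installed versions of the packages involved. The following is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is just an example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["scikit-survival", "scikit-learn", "numpy"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```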

opened by gregchu 13

Fix a variety of build problems.
Checklist

[x] py.test passes

[x] documentation renders correctly

What does this implement/fix? Explain your changes

In LLVM, this project was not compiling properly. With these changes, the project seems to compile fine.
opened by llpamies 10
viz of ensemble models

Hi!

would you have any advice on how to visualize decision path / decision trees from the ensemble survival model methods (either RF or Gradient Boosting)?

opened by ad05bzag 10
Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis
The documentation of CoxPHSurvivalAnalysis says:

Cox proportional hazards model.

And the documentation of CoxnetSurvivalAnalysis says:

Cox's proportional hazard's model with elastic net penalty.

So I assume the two classes implement the same model, and should return the same results when set with the same model parameters and given the same data. However, I see different results. Why? Also, what are the differences between them?

Codes:

from sksurv.linear_model import CoxPHSurvivalAnalysis, CoxnetSurvivalAnalysis
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder

X_, y = load_veterans_lung_cancer()
X = OneHotEncoder().fit_transform(X_)

# try to match the model parameters wherever possible
f = CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000)
g = CoxnetSurvivalAnalysis(alphas=[0.5], alpha_min_ratio=1, n_alphas=1, l1_ratio=1e-16, tol=1e-09, normalize=False)

print(f)
print(g)

f.fit(X, y)
g.fit(X, y)

print(f.coef_)
print(g.coef_[:,0])

Output:

CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000, tol=1e-09, verbose=0)
CoxnetSurvivalAnalysis(alpha_min_ratio=0.0001, alphas=[0.5], copy_X=True, l1_ratio=1e-16, max_iter=100000, n_alphas=1, normalize=False, penalty_factor=None, tol=1e-09, verbose=False)
[-8.34518623e-03 -7.21105070e-01 -2.80434400e-01 -1.11234345e+00 -3.26083027e-02 -1.93213436e-04 6.22726190e-02 2.90289950e-01]
[-0.00346722 -0.05117406 0.06044394 -0.16433136 -0.03300373 0.0003172 -0.00881617 0.06956854]

What I've gathered:

CoxPHSurvivalAnalysis is sksurv's own implementation of Cox Proportional Hazard model, and supports ridge (L2) regularization.

CoxnetSurvivalAnalysis is a wrapper of some C++ extension codes used by R's glmnet package, and supports elastic net (L1 and L2) regularization.

In the test files, CoxPHSurvivalAnalysis is tested with the Rossi dataset, while CoxnetSurvivalAnalysis is tested with the Breast Cancer dataset.

The two classes have different constructor signatures and methods (eg, only CoxPHSurvivalAnalysis has predict_survival_function).

Would it be a nice feature to have consolidated constructor signatures and methods for the two classes, and to have them tested on the same dataset for validation or comparison?

Thanks.
opened by leihuang 10
Add `apply` and `decision_path` to `SurvivalTree`
Checklist

[x] closes #290

[x] py.test passes

[x] tests are included

[x] code is well formatted

[x] documentation renders correctly

What does this implement/fix? Explain your changes

Add apply and decision_path to SurvivalForest to also enable the same methods for RandomSurvivalForest and ExtraSurvivalTrees.
opened by Vincent-Maladiere 8
RandomSurvivalForest - predict_survival_function
Describe the bug

I am trying to predict the survival function for my data using RandomSurvivalForest. Although the method works, it doesn't return the times for each of the steps in the survival function. Each list containing the survival function has a length equal to or lower than the number of unique times in our "y", hence we can't deduce which point in time each step belongs to.

For example, the returned arrays are shorter than the number of unique times:

import numpy as np
from sksurv.datasets import load_whas500
from sksurv.ensemble import RandomSurvivalForest

X, y = load_whas500()
times = sorted(np.unique(y["lenfol"]))
n_times = len(times)
# n_times = 395

estimator = RandomSurvivalForest().fit(X, y)
surv_funcs = estimator.predict_survival_function(X.iloc[:5])
surv_funcs[0]
# array([0.9975 , 0.9975 , 0.9975 , 0.9975 , 0.9975 ,
#        0.9975 , 0.9975 , 0.995 , 0.98883333, 0.98883333,...

len(surv_funcs[0])
# 162

Additionally, if you follow the example given in the documentation of RandomSurvivalForest, you will get an error, since the result of predict_survival_function is a 1-D array, unlike the result of the same function in CoxnetSurvivalAnalysis or CoxPHSurvivalAnalysis. This is the error you get:

import matplotlib.pyplot as plt
from sksurv.datasets import load_whas500
from sksurv.ensemble import RandomSurvivalForest

X, y = load_whas500()
estimator = RandomSurvivalForest().fit(X, y)
surv_funcs = estimator.predict_survival_function(X.iloc[:5])

for fn in surv_funcs:
    plt.step(fn.x, fn(fn.x), where="post")

plt.ylim(0, 1)
plt.show()

AttributeError: 'numpy.ndarray' object has no attribute 'x'
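
The missing piece described here is the pairing of each survival probability with a time point. As a rough illustration, assuming the sorted time points are known and have the same length as the array of probabilities, a step function can be sketched in pure Python as follows. `SimpleStepFunction` is a hypothetical class written for this example (it is not the library's own step-function class), and the numbers are made up.

```python
from bisect import bisect_right


class SimpleStepFunction:
    """Right-continuous step function defined by sorted x and matching y."""

    def __init__(self, x, y):
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        self.x = list(x)
        self.y = list(y)

    def __call__(self, t):
        # Index of the last breakpoint that is <= t.
        idx = bisect_right(self.x, t) - 1
        if idx < 0:
            raise ValueError("t lies before the first time point")
        return self.y[idx]


# Made-up time points and survival probabilities of equal length.
time_points = [5, 12, 30, 47]
survival = [0.9975, 0.995, 0.9888, 0.97]
fn = SimpleStepFunction(time_points, survival)
print(fn(12), fn(29), fn(100))
```

With `x` and `y` stored together, plotting code of the form `plt.step(fn.x, fn.y, where="post")` no longer has to guess which time each value belongs to.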
opened by felipe0216 8

Error when using PIP to install scikit-survival 0.13 that uses PEP 517

Describe the bug

A clear and concise description of what the bug is.

Code Sample to Reproduce the Bug

# Insert your code here that produces the bug.
# This example should be as succinct as possible and self-contained,
# i.e., not rely on external data.
# We are going to copy-paste your code and we expect to get the same result as you.
# It should run in a fresh python session, and so include all relevant imports.

Expected Results A clear and concise description of what you expected to happen.

Actual Results Please paste or specifically describe the actual output or traceback.

Versions Please execute the following snippet and paste the output below.

import sklearn; sklearn.show_versions()
import sksurv; print("sksurv:", sksurv.__version__)
import cvxopt; print("cvxopt:", cvxopt.__version__)
import cvxpy; print("cvxpy:", cvxpy.__version__)
import numexpr; print("numexpr:", numexpr.__version__)
import osqp; print("osqp:", osqp.OSQP().version())

opened by SurajitTest 8

Loss Function "ipcwls" in GradientBoostingSurvivalAnalysis leads to error

Hi

I was trying to train a time-to-failure model using machine sensor data. I chose the loss function 'ipcwls', which, as per the docs, weights the observations by their censoring weights. Although I'm not aware of the theory behind it, it seemed like a reasonable choice. But the code fails when calling fit() with the error message "input contains nan infinity or a value too large for dtype float64".

FYI, all of my X variables are scaled and take continuous values within a ±50 range. Quite a few have small values close to zero (5-6 decimal places). Is the choice of loss function leading to a division-by-zero situation? I need some clarity on this and on when this loss function should not be used.

Thanks, Soham

opened by Soham2112 8
n_iter_no_change in GradientBoostingSurvivalAnalysis
Describe the bug

The documentation for the parameter "n_estimators_" of GradientBoostingSurvivalAnalysis says "The number of estimators as selected by early stopping (if n_iter_no_change is specified)." However, GradientBoostingSurvivalAnalysis does not accept n_iter_no_change as an argument.

Code Sample to Reproduce the Bug

from sksurv.ensemble import GradientBoostingSurvivalAnalysis
GradientBoostingSurvivalAnalysis(n_iter_no_change = 10)

Actual Results

TypeError: GradientBoostingSurvivalAnalysis.__init__() got an unexpected keyword argument 'n_iter_no_change'

Versions

System:
    python: 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
   machine: Linux-5.15.0-56-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.2.0
          pip: 22.3.1
   setuptools: 65.5.0
        numpy: 1.23.4
        scipy: None
       Cython: 0.29.32
       pandas: 1.5.1
   matplotlib: 3.6.2
       joblib: 1.2.0
threadpoolctl: 3.1.0
opened by TristanFauvel 0
Added conditional property to expose time scale predictions
Checklist

[X] closes #324

[X] py.test passes

[ ] tests are included

[X] code is well formatted

[X] documentation renders correctly

Added a decorator for properties that are only available if a check returns true. Unfortunately, the decorator provided by scikit-learn only works for functions.

@sebp I am not sure what to test exactly, maybe a test which tests whether pipelines correctly patch the property and functions through? I also think this should not show up in the documentation, as it is internal?
opened by Finesim97 5

SciKit-Learn Pipeline not patched with "_predict_risk_score"

Describe the bug

In my own evaluation code I check for '_predict_risk_score' to see whether models return their predictions on the time scale or the risk scale, but this doesn't work when the estimator is wrapped in a pipeline.

# Insert your code here that produces the bug.
from sklearn.pipeline import Pipeline
from sksurv.linear_model.aft import IPCRidge
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder
from sksurv.base import SurvivalAnalysisMixin


data_x, data_y = load_veterans_lung_cancer()


data_x_prep = OneHotEncoder().fit_transform(data_x)
model_direct = IPCRidge().fit(data_x_prep, data_y)


pipe = Pipeline([('encode', OneHotEncoder()),
                 ('model', IPCRidge())])
pipe.fit(data_x, data_y)


# Are equal
print(model_direct.predict(data_x_prep.head()))
print(pipe.predict(data_x.head()))


# Steal super method
# This does not match, because ...
print(SurvivalAnalysisMixin.score(model_direct, data_x_prep, data_y))
print(SurvivalAnalysisMixin.score(pipe, data_x, data_y))


# ... the property is not patched through
# if this returns true, the scores are treated as being on the time scale
print(not getattr(model_direct, "_predict_risk_score", True))
print(not getattr(pipe, "_predict_risk_score", True))


# The second one should also be true!

Expected Results

A Pipeline object should also have the corresponding property set, as this might break evaluation code.

Actual Results

The property is not available. It should be possible to just add it to the __init__.py, but I am not sure how well it works together with the @property decorator. I am currently finishing my master's thesis, but I should be able to work out a PR on the 5th of December while testing the behaviour.
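
Until the property is forwarded, evaluation code could look through the wrapper itself. The sketch below is one possible workaround, not part of scikit-survival: `predicts_on_time_scale` is a hypothetical helper, and it assumes that pipeline-like objects expose a `steps` list of (name, estimator) tuples, as scikit-learn's Pipeline does. The small classes only stand in for real estimators.

```python
def predicts_on_time_scale(estimator):
    """Return True if the (final) estimator predicts on the time scale.

    Pipeline-like objects are assumed to expose a ``steps`` list of
    (name, estimator) tuples; the last step is inspected in that case.
    """
    steps = getattr(estimator, "steps", None)
    final = steps[-1][1] if steps else estimator
    # Same check as in the report above: a missing attribute means risk scale.
    return not getattr(final, "_predict_risk_score", True)


class TimeScaleModel:
    """Stand-in for a model that predicts on the time scale."""
    _predict_risk_score = False


class RiskScoreModel:
    """Stand-in for a model that predicts risk scores (attribute absent)."""


class FakePipeline:
    """Stand-in for a pipeline with a ``steps`` attribute."""

    def __init__(self, steps):
        self.steps = steps


pipe = FakePipeline([("encode", object()), ("model", TimeScaleModel())])
print(predicts_on_time_scale(TimeScaleModel()))  # direct model
print(predicts_on_time_scale(pipe))              # wrapped model
print(predicts_on_time_scale(RiskScoreModel()))
```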

Versions (Not running the newest version cough)

System:
    python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03)  [GCC 9.4.0]
executable: /home/jovyan/master-thesis/env/bin/python
   machine: Linux-5.10.0-15-amd64-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.1.2
          pip: 22.2.2
   setuptools: 65.4.0
        numpy: 1.23.3
        scipy: 1.9.1
       Cython: None
       pandas: 1.5.0
   matplotlib: 3.6.0
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jovyan/master-thesis/env/lib/libopenblasp-r0.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Zen
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-9f9f5dbc.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Zen
    num_threads: 48
sksurv: 0.18.0

enhancement

opened by Finesim97 2

Bug in nonparametric.py when calling IPCRidge

Describe the bug

Calling IPCRidge.fit fails with an assertion error at

assert (Ghat > 0).all()

and prints nothing further. I found that passing reverse=False to kaplan_meier_estimator inside the function ipc_weights in the file nonparametric.py, as shown below, makes the error go away. Error message:

AssertionError                            Traceback (most recent call last)
Input In [74], in <cell line: 5>()
      2 set_config(display="text")  # displays text representation of estimators
      4 estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
----> 5 estimator.fit(data_x,data_y)

File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/linear_model/aft.py:90, in IPCRidge.fit(self, X, y)
     72 """Build an accelerated failure time model.
     73 
     74 Parameters
   (...)
     86 self
     87 """
     88 event, time = check_array_survival(X, y)
---> 90 weights = ipc_weights(event, time)
     91 super().fit(X, numpy.log(time), sample_weight=weights)
     93 return self

File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/nonparametric.py:323, in ipc_weights(event, time)
    320 idx = numpy.searchsorted(unique_time, time[event])
    321 Ghat = p[idx]
--> 323 assert (Ghat > 0).all()
    325 weights = numpy.zeros(time.shape[0])
    326 weights[event] = 1.0 / Ghat

AssertionError:

Code Sample to Reproduce the Bug

used code:

estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
estimator.fit(data_x,data_y)

Here is what I changed in the nonparametric.py in the line unique_time, p = kaplan_meier_estimator(event, time, reverse=False) -- changed True to False

def ipc_weights(event, time):
    """Compute inverse probability of censoring weights

    Parameters
    ----------
    event : array, shape = (n_samples,)
        Boolean event indicator.

    time : array, shape = (n_samples,)
        Time when a subject experienced an event or was censored.

    Returns
    -------
    weights : array, shape = (n_samples,)
        inverse probability of censoring weights

    See also
    --------
    CensoringDistributionEstimator
        An estimator interface for estimating inverse probability
        of censoring weights for unseen time points.
    """
    if event.all():
        return np.ones(time.shape[0])

    unique_time, p = kaplan_meier_estimator(event, time, reverse=False)

    idx = np.searchsorted(unique_time, time[event])
    Ghat = p[idx]

    assert (Ghat > 0).all()

    weights = np.zeros(time.shape[0])
    weights[event] = 1.0 / Ghat

    return weights
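
To see how the weights come about and how a zero value of Ghat can arise, here is a pure-Python sketch of inverse probability of censoring weights. It mirrors the structure of the function above but is not the library code: the helper names are hypothetical, the data are made up, and the handling of tied times (observed events leave the risk set before censored records are counted) is an assumption of this sketch.

```python
from collections import Counter


def censoring_survival(events, times):
    """Kaplan-Meier estimate G(t) of the censoring distribution.

    Censored records play the role of "events" here. At tied times,
    observed events are removed from the risk set first (an assumption
    of this sketch).
    """
    n_event = Counter(t for e, t in zip(events, times) if e)
    n_censored = Counter(t for e, t in zip(events, times) if not e)
    at_risk = len(times)
    prob = 1.0
    g = {}
    for t in sorted(set(times)):
        at_risk_censoring = at_risk - n_event.get(t, 0)
        if n_censored.get(t, 0) > 0:
            prob *= 1.0 - n_censored[t] / at_risk_censoring
        g[t] = prob
        at_risk -= n_event.get(t, 0) + n_censored.get(t, 0)
    return g


def ipc_weights_sketch(events, times):
    """Weight 1 / G(t) for observed events, 0 for censored records."""
    g = censoring_survival(events, times)
    weights = []
    for e, t in zip(events, times):
        if not e:
            weights.append(0.0)
        elif g[t] <= 0.0:
            raise ValueError(f"censoring survival is zero at event time {t}")
        else:
            weights.append(1.0 / g[t])
    return weights


weights = ipc_weights_sketch([True, False, True, True], [1, 2, 3, 4])
print([round(w, 3) for w in weights])
```

In this sketch, a zero arises when an event shares its time with the last remaining censored record, for example `ipc_weights_sketch([True, True, False], [1, 5, 5])`, which raises a `ValueError` instead of dividing by zero.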

Machine and packages versions used:

Last updated: 2022-11-08T08:59:04.111247-05:00

Python implementation: CPython
Python version       : 3.10.5
IPython version      : 8.4.0

Compiler    : Clang 13.0.1 
OS          : Darwin
Release     : 21.6.0
Machine     : arm64
Processor   : arm
CPU cores   : 10
Architecture: 64bit

matplotlib: 3.5.2
numpy     : 1.22.4
pandas    : 1.4.4
json      : 2.0.9

bug

opened by fbarfi 4

Suggestions for StepFunction
I have 2 minor suggestions for StepFunction that I would like to see:

Different argument name for 'x' in init and call. In addition, current API reference is missing.

Sort the arrays inside the function.

Thanks.
awaiting response
opened by drproduck 1
KM_variance_estimator
Checklist

[x] py.test passes

[x] tests are included

[x] code is well formatted

[ ] documentation renders correctly

What does this implement/fix? Explain your changes

Hi @sebp, I added Greenwood's estimate of the KM variance to nonparametric.py (this is a prerequisite for implementing some goodness-of-fit tests). NB: I ran tox -e py310-docs but for some reason the new function does not appear in the API doc. Best,
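
For reference, Greenwood's formula estimates the variance of the Kaplan-Meier estimate as S(t)^2 times the sum of d_i / (n_i (n_i - d_i)) over event times up to t, where d_i is the number of events and n_i the number at risk. The following pure-Python sketch illustrates the formula on made-up data; it is not the code from this pull request, and the function name is hypothetical.

```python
from collections import Counter
from math import sqrt


def kaplan_meier_with_variance(events, times):
    """Return (time, S(t), Var[S(t)]) at each distinct event time."""
    deaths = Counter(t for e, t in zip(events, times) if e)
    leaving = Counter(times)
    at_risk = len(times)
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            survival *= 1.0 - d / at_risk
            # The term is undefined when everyone at risk has the event;
            # S(t) is 0 then, so the variance estimate is 0 as well.
            if at_risk > d:
                greenwood_sum += d / (at_risk * (at_risk - d))
            rows.append((t, survival, survival ** 2 * greenwood_sum))
        at_risk -= leaving[t]
    return rows


# Toy data (made up): the third subject is censored.
rows = kaplan_meier_with_variance([True, True, False, True], [1, 2, 3, 4])
for t, s, var in rows:
    print(f"t={t}: S(t)={s:.3f}, standard error={sqrt(var):.4f}")
```

Before the first censored record, the result agrees with the binomial variance S(t)(1 - S(t))/n, which is a quick sanity check for the formula.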
opened by TristanFauvel 3

Releases(v0.19.0.post1)

v0.19.0.post1(Oct 24, 2022)

This release raises the install requirement of scikit-learn to 1.1.2 to avoid binary incompatibility with previous releases (#316).

Full Changelog: https://github.com/sebp/scikit-survival/compare/v0.19.0...v0.19.0.post1
v0.19.0(Oct 23, 2022)
This release adds sksurv.tree.SurvivalTree.apply() and sksurv.tree.SurvivalTree.decision_path(), and support for sparse matrices to sksurv.tree.SurvivalTree. Moreover, it fixes build issues with scikit-learn 1.1.2 and on macOS with ARM64 CPU.

Bug fixes

Fix build issue with scikit-learn 1.1.2, which is binary-incompatible with previous releases from the 1.1 series.

Fix build from source on macOS with ARM64 by specifying numpy 1.21.0 as install requirement for that platform (#313).

Enhancements

sksurv.tree.SurvivalTree: Add sksurv.tree.SurvivalTree.apply() and sksurv.tree.SurvivalTree.decision_path() (#290).

sksurv.tree.SurvivalTree: Add support for sparse matrices (#290).

Full Changelog: https://github.com/sebp/scikit-survival/compare/v0.18.0...v0.19.0
v0.18.0(Aug 15, 2022)
This release adds support for scikit-learn 1.1, which includes more informative error messages. Support for Python 3.7 has been dropped, and the minimum supported versions of dependencies are updated to

numpy 1.17.3

Pandas 1.0.5

scikit-learn 1.1.0

scipy 1.3.2

Enhancements

Add n_iter_ attribute to all estimators in sksurv.svm (#277).

Add return_array argument to all models providing predict_survival_function and predict_cumulative_hazard_function (#268).

Deprecations

The loss_ attribute of ComponentwiseGradientBoostingSurvivalAnalysis and GradientBoostingSurvivalAnalysis has been deprecated.

The default for the max_features argument has been changed from 'auto' to 'sqrt' for RandomSurvivalForest and ExtraSurvivalTrees. 'auto' and 'sqrt' have the same effect.

Full Changelog: https://github.com/sebp/scikit-survival/compare/v0.17.2...v0.18.0
v0.17.2(Apr 24, 2022)
This release fixes several issues with packaging scikit-survival.

Bug fixes

Added backward support for gcc-c++ by @navashiva (#255).

Do not install C/C++ and Cython source files.

Add packaging to build requirements in pyproject.toml.

Exclude generated API docs from source distribution.

Add Python 3.10 to classifiers.

Documentation

Use permutation_importance from sklearn instead of eli5.

Build documentation with Sphinx 4.4.0.

Fix missing documentation for classes in sksurv.meta.

New Contributors

@navashiva made their first contribution in https://github.com/sebp/scikit-survival/pull/255

Full Changelog: https://github.com/sebp/scikit-survival/compare/v0.17.1...v0.17.2
v0.17.1(Mar 5, 2022)

This release adds support for Python 3.10.

Full Changelog: https://github.com/sebp/scikit-survival/compare/v0.17.0...v0.17.1
v0.17.0(Jan 9, 2022)
This release adds support for scikit-learn 1.0, which includes support for feature names. If you pass a pandas dataframe to fit, the estimator will set a feature_names_in_ attribute containing the feature names. When a dataframe is passed to predict, it is checked that the column names are consistent with those passed to fit. See the scikit-learn release highlights for details.

Bug fixes

Fix a variety of build problems with LLVM (#243).

Enhancements

Add support for feature_names_in_ and n_features_in_ to all estimators and transforms.

Add sksurv.preprocessing.OneHotEncoder.get_feature_names_out.

Update bundled version of Eigen to 3.3.9.

Backwards incompatible changes

Drop min_impurity_split parameter from sksurv.ensemble.GradientBoostingSurvivalAnalysis.

base_estimators and meta_estimator attributes of sksurv.meta.Stacking do not contain fitted models anymore, use estimators_ and final_estimator_, respectively.

Deprecations

The normalize parameter of sksurv.linear_model.IPCRidge is deprecated and will be removed in a future version. Instead, use a scikit-learn pipeline: make_pipeline(StandardScaler(with_mean=False), IPCRidge()).

v0.16.0(Oct 30, 2021)
This release adds support for changing the evaluation metric that is used in estimators' score method. This is particularly useful for hyper-parameter optimization using scikit-learn's GridSearchCV. You can now use sksurv.metrics.as_concordance_index_ipcw_scorer, sksurv.metrics.as_cumulative_dynamic_auc_scorer, or sksurv.metrics.as_integrated_brier_score_scorer to adjust the score method to your needs. A detailed example is available in the User Guide.

Moreover, this release adds sksurv.ensemble.ExtraSurvivalTrees to fit an ensemble of randomized survival trees, and improves the speed of sksurv.compare.compare_survival() significantly. The documentation has been extended by a section on the time-dependent Brier score.

Bug fixes

Columns are dropped in sksurv.column.encode_categorical() despite allow_drop=False (#199).

Ensure sksurv.column.categorical_to_numeric() always returns series with int64 dtype.

Enhancements

Add sksurv.ensemble.ExtraSurvivalTrees ensemble (#195).

Faster speed for sksurv.compare.compare_survival() (#215).

Add wrapper classes sksurv.metrics.as_concordance_index_ipcw_scorer, sksurv.metrics.as_cumulative_dynamic_auc_scorer, and sksurv.metrics.as_integrated_brier_score_scorer to override the default score method of estimators (#192).

Remove use of deprecated numpy dtypes.

Remove use of inplace in pandas’ set_categories.

Documentation

Remove comments and code suggesting log-transforming times prior to training Survival SVM (#203).

Add documentation for max_samples parameter to sksurv.ensemble.ExtraSurvivalTrees and sksurv.ensemble.RandomSurvivalForest (#217).

Add section on time-dependent Brier score (#220).

Add section on using alternative metrics for hyper-parameter optimization.

v0.15.0(Mar 20, 2021)
This release adds support for scikit-learn 0.24 and Python 3.9. scikit-survival now requires at least pandas 0.25 and scikit-learn 0.24. Moreover, if sksurv.ensemble.GradientBoostingSurvivalAnalysis or sksurv.ensemble.ComponentwiseGradientBoostingSurvivalAnalysis are fit with loss='coxph', predict_cumulative_hazard_function and predict_survival_function are now available. sksurv.metrics.cumulative_dynamic_auc now supports evaluating time-dependent predictions, for instance for a sksurv.ensemble.RandomSurvivalForest as illustrated in the User Guide.

Bug fixes

Allow passing pandas data frames to all fit and predict methods (#148).

Allow sparse matrices to be passed to sksurv.ensemble.GradientBoostingSurvivalAnalysis.predict.

Fix example in user guide using GridSearchCV to determine alphas for CoxnetSurvivalAnalysis (#186).

Enhancements

Add score method to sksurv.meta.Stacking, sksurv.meta.EnsembleSelection, and sksurv.meta.EnsembleSelectionRegressor (#151).

Add support for predict_cumulative_hazard_function and predict_survival_function to sksurv.ensemble.GradientBoostingSurvivalAnalysis and sksurv.ensemble.ComponentwiseGradientBoostingSurvivalAnalysis if model was fit with loss='coxph'.

Add support for time-dependent predictions to sksurv.metrics.cumulative_dynamic_auc. See the User Guide for an example (#134).

Backwards incompatible changes

The score method of sksurv.linear_model.IPCRidge, sksurv.svm.FastSurvivalSVM, and sksurv.svm.FastKernelSurvivalSVM (if rank_ratio is smaller than 1) now converts predictions on log(time) scale to risk scores prior to computing the concordance index.

Support for cvxpy and cvxopt solver in sksurv.svm.MinlipSurvivalAnalysis and sksurv.svm.HingeLossSurvivalSVM has been dropped. The default solver is now ECOS, which was used by cvxpy (the previous default) internally. Therefore, results should be identical.

Dropped the presort argument from sksurv.tree.SurvivalTree and sksurv.ensemble.GradientBoostingSurvivalAnalysis.

The X_idx_sorted argument in sksurv.tree.SurvivalTree.fit has been deprecated in scikit-learn 0.24 and has no effect now.

predict_cumulative_hazard_function and predict_survival_function of sksurv.ensemble.RandomSurvivalForest and sksurv.tree.SurvivalTree now return an array of sksurv.functions.StepFunction objects by default. Use return_array=True to get the old behavior.

Support for Python 3.6 has been dropped.

Increase minimum supported versions of dependencies. We now require:

| Package      | Minimum Version |
|--------------|-----------------|
| pandas       | 0.25.0          |
| scikit-learn | 0.24.0          |
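The change to the score method of sksurv.linear_model.IPCRidge and the survival SVMs listed above can be illustrated with a small sketch. It assumes that a prediction on the log(time) scale is turned into a risk score by negating it (a longer predicted time means a lower risk), and it uses a simplified, plain-Python concordance index that ignores tied times. The function name concordance_index and the toy data are made up for this example; this is not the library's implementation.

```python
from itertools import combinations


def concordance_index(event, time, risk):
    """Simplified Harrell's c-index for right-censored data (illustrative only)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        # Order the pair so that sample a has the shorter observed time.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        # Comparable only if the times differ and the earlier sample had an event.
        if time[a] == time[b] or not event[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0   # higher risk, shorter time: concordant
        elif risk[a] == risk[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


event = [True, True, False, True]
time = [2.0, 5.0, 6.0, 9.0]
pred_log_time = [0.5, 2.2, 2.5, 2.0]  # larger value = longer predicted survival

# Feeding log(time) predictions in directly inverts the ranking ...
c_raw = concordance_index(event, time, pred_log_time)
# ... whereas negating them first yields a proper risk score.
c_risk = concordance_index(event, time, [-p for p in pred_log_time])
```

Without ties in the risk scores, the two values sum to one, which is why skipping the conversion reports a misleadingly low concordance for a good model.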

v0.14.0(Oct 7, 2020)
This release features a complete overhaul of the documentation, with a new visual design and several interactive notebooks in the User Guide.

In addition, it includes important bug fixes. It fixes several bugs in sksurv.linear_model.CoxnetSurvivalAnalysis where predict, predict_survival_function, and predict_cumulative_hazard_function returned wrong values if features of the training data were not centered. Moreover, the score function of sksurv.ensemble.ComponentwiseGradientBoostingSurvivalAnalysis and sksurv.ensemble.GradientBoostingSurvivalAnalysis will now correctly compute the concordance index if loss='ipcwls' or loss='squared'.

Bug fixes

sksurv.column.standardize() modified data in-place. Data is now always copied.

sksurv.column.standardize() works with integer numpy arrays now.

sksurv.column.standardize() used biased standard deviation for numpy arrays (ddof=0), but unbiased standard deviation for pandas objects (ddof=1). It always uses ddof=1 now. Therefore, the output, if the input is a numpy array, will differ from that of previous versions.

Fixed sksurv.linear_model.CoxnetSurvivalAnalysis.predict_survival_function() and sksurv.linear_model.CoxnetSurvivalAnalysis.predict_cumulative_hazard_function(), which returned wrong values if features of training data were not already centered. This adds an offset_ attribute that accounts for non-centered data and is added to the predicted risk score. Therefore, the outputs of predict, predict_survival_function, and predict_cumulative_hazard_function will be different to previous versions for non-centered data (#139).

Rescale coefficients of sksurv.linear_model.CoxnetSurvivalAnalysis if normalize=True.

Fix score function of sksurv.ensemble.ComponentwiseGradientBoostingSurvivalAnalysis and sksurv.ensemble.GradientBoostingSurvivalAnalysis if loss='ipcwls' or loss='squared' is used. Previously, it returned 1.0 - true_cindex.
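The ddof change to sksurv.column.standardize() listed above is easy to see with plain NumPy. The snippet below only demonstrates the two standard deviation estimates on a small integer array; it does not call sksurv, and the variable names are chosen for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4])  # integer input

biased = x.std(ddof=0)    # divides by n
unbiased = x.std(ddof=1)  # divides by n - 1

# Standardize with the unbiased estimate on a float copy, so x stays untouched.
z = (x.astype(float) - x.mean()) / unbiased
```

For small samples the two estimates differ noticeably (here roughly 1.118 versus 1.291), which is why results for NumPy input changed between versions.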

Enhancements

Add sksurv.show_versions() that prints the version of all dependencies.

Add support for pandas 1.1.

Include interactive notebooks in documentation on readthedocs.

Add user guide on penalized Cox models.

Add user guide on gradient boosted models.

v0.13.1(Jul 4, 2020)
This release fixes warnings that were introduced with 0.13.0.

Bug fixes

Explicitly pass return_array=True in sksurv.tree.SurvivalTree.predict to avoid FutureWarning.

Fix error when fitting sksurv.tree.SurvivalTree with non-float dtype for time (#127).

Fix RuntimeWarning: invalid value encountered in true_divide in sksurv.nonparametric.kaplan_meier_estimator.

Fix PendingDeprecationWarning about use of matrix when fitting sksurv.svm.FastSurvivalSVM if optimizer is PRSVM or simple.

v0.13.0(Jun 28, 2020)
The highlights of this release include the addition of sksurv.metrics.brier_score and sksurv.metrics.integrated_brier_score and compatibility with scikit-learn 0.23.

predict_survival_function and predict_cumulative_hazard_function of sksurv.ensemble.RandomSurvivalForest and sksurv.tree.SurvivalTree can now return an array of sksurv.functions.StepFunction objects, as sksurv.linear_model.CoxPHSurvivalAnalysis already does, by specifying return_array=False. This will be the default behavior starting with 0.14.0.
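To make the idea of a step function that can be evaluated at one or several time points concrete, here is a minimal stand-in written in plain Python. The class name SimpleStepFunction, the right-continuous convention, and the choice to raise an error outside the observed range are assumptions of this sketch, not a description of sksurv.functions.StepFunction.

```python
from bisect import bisect_right


class SimpleStepFunction:
    """Step function whose value at t is y[i] for the largest x[i] <= t."""

    def __init__(self, x, y):
        if len(x) != len(y) or len(x) == 0:
            raise ValueError("x and y must be non-empty and of equal length")
        if any(a >= b for a, b in zip(x, x[1:])):
            raise ValueError("x must be strictly increasing")
        self.x = list(x)
        self.y = list(y)

    def __call__(self, t):
        # Accept a single number or any iterable of numbers.
        if isinstance(t, (int, float)):
            return self._evaluate(t)
        return [self._evaluate(v) for v in t]

    def _evaluate(self, t):
        if t < self.x[0] or t > self.x[-1]:
            raise ValueError(f"{t} is outside the domain [{self.x[0]}, {self.x[-1]}]")
        return self.y[bisect_right(self.x, t) - 1]


# A toy survival curve with steps at t = 1, 3, and 7.
surv = SimpleStepFunction([1.0, 3.0, 7.0], [0.9, 0.6, 0.2])
single = surv(3.0)
several = surv([1.0, 2.5, 7.0])
```

Returning one function object per sample, instead of a dense array, lets the caller decide at which time points to evaluate each curve.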

Note that this release fixes a bug in estimating inverse probability of censoring weights (IPCW), which will affect all estimators relying on IPCW.

Enhancements

Make build system compatible with PEP-517/518.

Added sksurv.metrics.brier_score and sksurv.metrics.integrated_brier_score (#101).

sksurv.functions.StepFunction can now be evaluated at multiple points in a single call.

Update documentation on usage of predict_survival_function and predict_cumulative_hazard_function (#118).

The default value of alpha_min_ratio of sksurv.linear_model.CoxnetSurvivalAnalysis will now depend on the n_samples/n_features ratio. If n_samples > n_features, the default value is 0.0001. If n_samples <= n_features, the default value is 0.01.

Add support for scikit-learn 0.23 (#119).
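The alpha_min_ratio rule listed above can be written down in a few lines. The first function restates the rule from these notes; the second shows one common way such a ratio is used, namely a log-spaced grid from a largest penalty down to alpha_max * alpha_min_ratio. Both function names and the grid construction are assumptions of this sketch; whether the library builds its path exactly this way is not stated here.

```python
def default_alpha_min_ratio(n_samples, n_features):
    """0.0001 if there are more samples than features, otherwise 0.01."""
    return 0.0001 if n_samples > n_features else 0.01


def alpha_path(alpha_max, alpha_min_ratio, n_alphas=5):
    """Log-spaced grid from alpha_max down to alpha_max * alpha_min_ratio."""
    if n_alphas < 2:
        raise ValueError("n_alphas must be at least 2")
    return [alpha_max * alpha_min_ratio ** (k / (n_alphas - 1)) for k in range(n_alphas)]


ratio = default_alpha_min_ratio(n_samples=200, n_features=20)
alphas = alpha_path(alpha_max=1.0, alpha_min_ratio=ratio, n_alphas=5)
```

With more features than samples, the larger ratio keeps the smallest penalty further from zero, where the unpenalized problem would be ill-posed.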

Deprecations

predict_survival_function and predict_cumulative_hazard_function of sksurv.ensemble.RandomSurvivalForest and sksurv.tree.SurvivalTree will return an array of sksurv.functions.StepFunction in the future (as sksurv.linear_model.CoxPHSurvivalAnalysis does). For the old behavior, use return_array=True.

Bug fixes

Fix deprecation of importing joblib via sklearn.

Fix estimation of censoring distribution for tied times with events. When estimating the censoring distribution, by specifying reverse=True when calling sksurv.nonparametric.kaplan_meier_estimator, we now consider events to occur before censoring. At a time point where events and censoring are tied, samples with an event are no longer considered at risk and are subtracted from the denominator of the Kaplan-Meier estimator. The change affects all functions relying on inverse probability of censoring weights, namely:

sksurv.nonparametric.CensoringDistributionEstimator

sksurv.nonparametric.ipc_weights

sksurv.linear_model.IPCRidge

sksurv.metrics.cumulative_dynamic_auc

sksurv.metrics.concordance_index_ipcw

Throw an exception when trying to estimate c-index from uncomparable data (#117).

Estimators in sksurv.svm will now throw an exception when trying to fit a model to data with uncomparable pairs.
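The fix for tied times in the censoring distribution, described above, can be sketched in plain Python. The function below is a hypothetical, simplified Kaplan-Meier estimator written for this example; it follows the rule stated in these notes (with reverse=True, samples with an event at a tied time leave the risk set before censoring is counted) and is not the code of sksurv.nonparametric.kaplan_meier_estimator.

```python
def kaplan_meier(event, time, reverse=False):
    """Return (unique_times, probabilities) of a Kaplan-Meier estimate.

    With reverse=True, censoring plays the role of the event, which
    estimates the censoring distribution instead of the survival function.
    """
    unique_times = sorted(set(time))
    prob = 1.0
    probs = []
    for t in unique_times:
        at_risk = sum(1 for u in time if u >= t)
        n_events = sum(1 for e, u in zip(event, time) if u == t and e)
        n_censored = sum(1 for e, u in zip(event, time) if u == t and not e)
        if reverse:
            at_risk -= n_events      # events are taken to occur before censoring
            n_occurred = n_censored
        else:
            n_occurred = n_events
        # Skip the update if nobody is left at risk (avoids division by zero).
        if at_risk > 0:
            prob *= 1.0 - n_occurred / at_risk
        probs.append(prob)
    return unique_times, probs


event = [True, True, False, False]
time = [1, 2, 2, 3]

times_surv, surv = kaplan_meier(event, time)                 # survival function
times_cens, cens = kaplan_meier(event, time, reverse=True)   # censoring distribution
```

At t = 2 one sample has an event and one is censored. Removing the event from the risk set first gives a censoring probability step of 1/2 rather than 1/3, which in turn changes every weight derived from it.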

v0.12.0(Apr 15, 2020)
This release adds support for scikit-learn 0.22, thereby dropping support for older versions. Moreover, the regularization strength of the ridge penalty in sksurv.linear_model.CoxPHSurvivalAnalysis can now be set per feature. If you want one or more features to enter the model unpenalized, set the corresponding penalty weights to zero. Finally, sklearn.pipeline.Pipeline will now be automatically patched to add support for predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports it.
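As a rough illustration of what a per-feature ridge penalty means, the sketch below computes a weighted penalty term for a coefficient vector. The function name ridge_penalty, the 0.5 scaling factor, and the example coefficients are assumptions made for this example; the exact scaling used inside sksurv.linear_model.CoxPHSurvivalAnalysis is not stated in these notes and may differ.

```python
def ridge_penalty(coef, alpha):
    """0.5 * sum_j alpha_j * coef_j**2, with alpha a scalar or one weight per feature."""
    if isinstance(alpha, (int, float)):
        alpha = [alpha] * len(coef)
    if len(alpha) != len(coef):
        raise ValueError("alpha must be a scalar or have one entry per feature")
    return 0.5 * sum(a * w * w for a, w in zip(alpha, coef))


coef = [2.0, -1.0, 0.5]

uniform = ridge_penalty(coef, 1.0)                # every feature penalized equally
first_free = ridge_penalty(coef, [0.0, 1.0, 1.0])  # first feature enters unpenalized
```

A zero weight removes that coefficient from the penalty entirely, so its size no longer affects the regularization term.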

Deprecations

Add scikit-learn's deprecation of presort in sksurv.tree.SurvivalTree and sksurv.ensemble.GradientBoostingSurvivalAnalysis.

Add warning that default alpha_min_ratio in sksurv.linear_model.CoxnetSurvivalAnalysis will depend on the ratio of the number of samples to the number of features in the future (#41).

Enhancements

Add references to API doc of sksurv.ensemble.GradientBoostingSurvivalAnalysis (#91).

Add support for pandas 1.0 (#100).

Add ccp_alpha parameter for Minimal Cost-Complexity Pruning to sksurv.ensemble.GradientBoostingSurvivalAnalysis.

Patch sklearn.pipeline.Pipeline to add support for predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports it.

Allow per-feature regularization for sksurv.linear_model.CoxPHSurvivalAnalysis (#102).

Clarify API docs of sksurv.metrics.concordance_index_censored (#96).

v0.11(Dec 21, 2019)
This release adds sksurv.tree.SurvivalTree and sksurv.ensemble.RandomSurvivalForest, which are based on the log-rank split criterion. It also adds the OSQP solver as option to sksurv.svm.MinlipSurvivalAnalysis and sksurv.svm.HingeLossSurvivalSVM, which will replace the now deprecated cvxpy and cvxopt options in a future release.

This release removes support for sklearn 0.20 and requires sklearn 0.21.

Deprecations

The cvxpy and cvxopt options for solver in sksurv.svm.MinlipSurvivalAnalysis and sksurv.svm.HingeLossSurvivalSVM are deprecated and will be removed in a future version. Choosing osqp is the preferred option now.

Enhancements

Add support for pandas 0.25.

Add OSQP solver option, which has no additional dependencies, to sksurv.svm.MinlipSurvivalAnalysis and sksurv.svm.HingeLossSurvivalSVM.

Fix issue when using cvxpy 1.0.16 or later.

Explicitly specify utf-8 encoding when reading README.rst (#89).

Add sksurv.tree.SurvivalTree and sksurv.ensemble.RandomSurvivalForest (#90).

Bug fixes

Exclude Cython-generated files from source distribution because they are not forward compatible.

v0.10(Sep 2, 2019)
This release adds the ties argument to sksurv.linear_model.CoxPHSurvivalAnalysis to choose between Breslow’s and Efron’s likelihood in the presence of tied event times. Moreover, sksurv.compare.compare_survival() has been added, which implements the log-rank hypothesis test for comparing the survival function of 2 or more groups.
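The difference between Breslow's and Efron's handling of ties shows up in the denominator of the partial likelihood at a tied event time. The sketch below computes the log of that denominator for a single event time using the textbook formulas; the function name log_denominator and the toy numbers are made up for this example, and it is not taken from the library's optimizer.

```python
import math


def log_denominator(risk_set_scores, tied_event_scores, ties="breslow"):
    """Log of the partial-likelihood denominator at one event time.

    risk_set_scores: exp(x @ beta) for every sample still at risk
    tied_event_scores: exp(x @ beta) for the samples with an event at this time
    """
    d = len(tied_event_scores)
    risk_sum = sum(risk_set_scores)
    tied_sum = sum(tied_event_scores)
    if ties == "breslow":
        # Every tied event sees the full risk set.
        return d * math.log(risk_sum)
    if ties == "efron":
        # Tied events are removed from the risk set gradually.
        return sum(math.log(risk_sum - (l / d) * tied_sum) for l in range(d))
    raise ValueError("ties must be 'breslow' or 'efron'")


risk_set = [1.0, 3.0, 2.0, 4.0]  # four samples at risk
tied = [1.0, 3.0]                # two of them have an event at the same time

breslow = log_denominator(risk_set, tied, ties="breslow")
efron = log_denominator(risk_set, tied, ties="efron")
```

With a single event at a time point (d = 1) both variants coincide, so the choice only matters when event times are tied.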

Enhancements

Update API doc of predict function of boosting estimators (#75).

Clarify documentation for GradientBoostingSurvivalAnalysis (#78).

Implement Efron’s likelihood for handling tied event times.

Implement log-rank test for comparing survival curves.

Add support for scipy 1.3.1 (#66).

Bug fixes

Re-add baseline_survival_ and cum_baseline_hazard_ attributes to sksurv.linear_model.CoxPHSurvivalAnalysis (#76).

v0.9(Jul 26, 2019)
This release adds support for sklearn 0.21 and pandas 0.24.

Enhancements

Add reference to IPCRidge (#65).

Use scipy.special.comb instead of deprecated scipy.misc.comb.

Add support for pandas 0.24 and drop support for 0.20.

Add support for scikit-learn 0.21 and drop support for 0.20 (#71).

Explain use of intercept in ComponentwiseGradientBoostingSurvivalAnalysis (#68).

Bump Eigen to 3.3.7.

Bug fixes

Disallow scipy 1.3.0 due to scipy regression (#66).

v0.8(May 1, 2019)
Enhancements

Add sksurv.linear_model.CoxnetSurvivalAnalysis.predict_survival_function and sksurv.linear_model.CoxnetSurvivalAnalysis.predict_cumulative_hazard_function (#46).

Add sksurv.nonparametric.SurvivalFunctionEstimator and sksurv.nonparametric.CensoringDistributionEstimator that wrap sksurv.nonparametric.kaplan_meier_estimator and provide a predict_proba method for evaluating the estimated function on test data.

Implement censoring-adjusted C-statistic proposed by Uno et al. (2011) in sksurv.metrics.concordance_index_ipcw.

Add estimator of cumulative/dynamic AUC of Uno et al. (2007) in sksurv.metrics.cumulative_dynamic_auc.

Add flchain dataset (see sksurv.datasets.load_flchain).

Bug fixes

The tied_time return value of sksurv.metrics.concordance_index_censored now correctly reflects the number of comparable pairs that share the same time and that are used in computing the concordance index.

Fix a bug in sksurv.metrics.concordance_index_censored where a pair with risk estimates within tolerance was counted both as concordant and tied.

v0.7(Feb 27, 2019)
This release adds support for Python 3.7 and sklearn 0.20.

Changes:

Add support for sklearn 0.20 (#48).

Migrate to py.test (#50).

Explicitly request ECOS solver for sksurv.svm.MinlipSurvivalAnalysis and sksurv.svm.HingeLossSurvivalSVM.

Add support for Python 3.7 (#49).

Add support for cvxpy >=1.0.

Add support for numpy 1.15.

v0.6.0(Oct 7, 2018)
This release adds support for numpy 1.14 and pandas up to 0.23. In addition, the new class sksurv.util.Surv makes it easier to construct a structured array from numpy arrays, lists, or a pandas data frame.

Changes:

Support numpy 1.14 and pandas 0.22, 0.23 (#36).

Enable support for cvxopt with Python 3.5+ on Windows (requires cvxopt >=1.1.9).

Add max_iter parameter to sksurv.svm.MinlipSurvivalAnalysis and sksurv.svm.HingeLossSurvivalSVM.

Fix score function of sksurv.svm.NaiveSurvivalSVM to use concordance index.

sksurv.linear_model.CoxnetSurvivalAnalysis now throws an exception if coefficients get too large (#47).

Add sksurv.util.Surv class to ease constructing a structured array (#26).
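For readers unfamiliar with structured arrays, the snippet below builds by hand the kind of array that sksurv.util.Surv is meant to simplify, using only NumPy. The field names "event" and "time" and the toy values are choices made for this sketch.

```python
import numpy as np

event = [True, False, True]
time = [12.0, 30.5, 7.2]

# One record per sample: a boolean event indicator and the observed time.
y = np.empty(len(time), dtype=[("event", bool), ("time", float)])
y["event"] = event
y["time"] = time

n_events = int(y["event"].sum())
longest_follow_up = float(y["time"].max())
```

Keeping the indicator and the time in one array means both travel together through slicing and cross-validation splits.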

v0.5(Dec 9, 2017)

This release adds support for scikit-learn 0.19 and pandas 0.21. In turn, support for older versions is dropped, namely Python 3.4, scikit-learn 0.18, and pandas 0.18.
v0.4(Oct 29, 2017)

This release adds sksurv.linear_model.CoxnetSurvivalAnalysis which implements an efficient algorithm to fit Cox's proportional hazards model with LASSO, ridge, and elastic net penalty. Moreover, it includes support for Windows with Python 3.5 and later by making the cvxopt package optional.
v0.3(Aug 1, 2017)

This release adds predict_survival_function and predict_cumulative_hazard_function to sksurv.linear_model.CoxPHSurvivalAnalysis, which return the survival function and cumulative hazard function using Breslow's estimator.

Moreover, it fixes a build error on Windows (#3) and adds the sksurv.preprocessing.OneHotEncoder class, which can be used in a scikit-learn pipeline.
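Breslow's estimator, mentioned above, can be sketched in plain Python from its textbook definition: the cumulative baseline hazard increases at each event time by the number of events divided by the sum of exp(linear predictor) over the samples still at risk. The function names and toy data below are made up for this example; this is an illustration, not the code of sksurv.linear_model.CoxPHSurvivalAnalysis.

```python
import math


def breslow_cumulative_baseline_hazard(event, time, linear_predictor):
    """Return (event_times, H0), where H0 accumulates
    n_events / sum(exp(linear_predictor) over samples at risk) at each event time."""
    exp_lp = [math.exp(v) for v in linear_predictor]
    event_times = sorted({t for t, e in zip(time, event) if e})
    cumulative = 0.0
    hazards = []
    for t in event_times:
        n_events = sum(1 for u, e in zip(time, event) if e and u == t)
        risk_sum = sum(w for u, w in zip(time, exp_lp) if u >= t)
        cumulative += n_events / risk_sum
        hazards.append(cumulative)
    return event_times, hazards


def survival_probability(h0, linear_predictor):
    """S(t | x) = exp(-H0(t) * exp(x @ beta)) for one baseline hazard value."""
    return math.exp(-h0 * math.exp(linear_predictor))


event = [True, True, False]
time = [1.0, 2.0, 3.0]
linear_predictor = [0.0, 0.0, 0.0]  # x @ beta for each training sample

event_times, h0 = breslow_cumulative_baseline_hazard(event, time, linear_predictor)
s_at_2 = survival_probability(h0[-1], 0.0)
```

The survival function for a new sample then follows by scaling the baseline hazard with that sample's exp(linear predictor).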
v0.2(May 29, 2017)

This release adds support for Python 3.6, and pandas 0.19 and 0.20.