PySurvival is an open source python package for Survival Analysis modeling

Overview

PySurvival

What is PySurvival?

PySurvival is an open source Python package for Survival Analysis modeling, the modeling concept used to analyze or predict when an event is likely to happen. It is built upon the most commonly used machine learning packages, such as NumPy, SciPy and PyTorch.

PySurvival is compatible with Python 2.7-3.7.
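
Because the supported range is narrow, a quick interpreter check before installing can save a failed build. The helper below is a minimal sketch and not part of PySurvival: `is_supported_python` is a hypothetical name, and it assumes that "2.7-3.7" means Python 2.7 plus every 3.x release up to and including 3.7.

```python
import sys


def is_supported_python(version_info=None):
    """Return True if the interpreter falls in the README's stated range."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    # Assumption: every 3.x release up to and including 3.7 is covered.
    return major == 3 and minor <= 7


if __name__ == "__main__":
    if not is_supported_python():
        print("Python {}.{} is outside the documented 2.7-3.7 range".format(
            sys.version_info[0], sys.version_info[1]))
```

Running the check inside the environment you intend to install into tells you whether you are within the documented range before any compilation starts.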

Check out the documentation here


Content

PySurvival provides a very easy way to navigate between theoretical knowledge on Survival Analysis and detailed tutorials on how to conduct a full analysis, build and use a model. Indeed, the package contains:


Installation

If you have already installed a working version of gcc, the easiest way to install PySurvival is with pip:

pip install pysurvival

The full description of the installation steps can be found here.


Get Started

PySurvival has a simple API, built to provide the best user experience when it comes to modeling. Here's a quick modeling example to get you started:

# Loading the modules
from pysurvival.models.semi_parametric import CoxPHModel
from pysurvival.models.multi_task import LinearMultiTaskModel
from pysurvival.datasets import Dataset
from pysurvival.utils.metrics import concordance_index

# Loading and splitting a simple example into train/test sets
X_train, T_train, E_train, X_test, T_test, E_test = \
    Dataset('simple_example').load_train_test()

# Building a CoxPH model
coxph_model = CoxPHModel()
coxph_model.fit(X=X_train, T=T_train, E=E_train, init_method='he_uniform',
                l2_reg=1e-4, lr=.4, tol=1e-4)

# Building a MTLR model
mtlr = LinearMultiTaskModel()
mtlr.fit(X=X_train, T=T_train, E=E_train, init_method='glorot_uniform',
         optimizer='adam', lr=8e-4)

# Checking the model performance
c_index1 = concordance_index(model=coxph_model, X=X_test, T=T_test, E=E_test)
print("CoxPH model c-index = {:.2f}".format(c_index1))

c_index2 = concordance_index(model=mtlr, X=X_test, T=T_test, E=E_test)
print("MTLR model c-index = {:.2f}".format(c_index2))
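
To make the reported metric more concrete, here is a minimal sketch of a plain Harrell-style concordance index on toy data. It is not PySurvival's `concordance_index` implementation, which may treat censoring and ties differently; the function name `harrell_c_index` and the toy arrays are made up for illustration.

```python
def harrell_c_index(risk, T, E):
    """Fraction of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when subject i had an observed event
    (E[i] == 1) strictly before subject j's recorded time. The pair is
    concordant when the earlier failure was given the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(T)
    for i in range(n):
        if not E[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if T[i] < T[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data: times, event indicators (1 = event observed, 0 = censored)
T = [2, 4, 6, 8]
E = [1, 1, 0, 1]
good_risk = [0.9, 0.7, 0.2, 0.4]  # earlier events get higher risk
bad_risk = [0.1, 0.2, 0.3, 0.4]   # ordering is exactly reversed

print(harrell_c_index(good_risk, T, E))
print(harrell_c_index(bad_risk, T, E))
```

A value near 1 means the model ranks subjects in the same order as their observed event times, 0.5 corresponds to random ranking, and values near 0 indicate a reversed ranking.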

Citation and License

Citation

If you use PySurvival in your research, we would greatly appreciate it if you could use the following citation:

@Misc{ pysurvival_cite,
  author =    {Stephane Fotso and others},
  title =     {PySurvival: Open source package for Survival Analysis modeling},
  year =      {2019--},
  url = "https://www.pysurvival.io/"
}

License

Copyright 2019 Square Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • RSF implementation seems to hang when predicting

    I've installed pysurvival using brew for gcc and pip on my MacBook Pro (macOS 10.13.6) and have been able to train an RSF model in a Jupyter Notebook (though this took several minutes of high CPU activity). The training data has around 70 factors and 5000 rows.

    I'm now trying to work with the model but when I call e.g.

    risks = rsf.predict_risk(X_test)

    the notebook just hangs indefinitely with no sign of CPU activity.

    opened by simonthorogood 4
  • Cannot import a custom class that depends on pysurvival

    Hello,

    I have the following class:

    from pysurvival.models.multi_task import NeuralMultiTaskModel
    
    import joblib
    import numpy as np
    
    from ml_models.preprocessing.one_hot_encoding import PreProcessingWOneHot
    from ml_models.templates.model import Model
    from pysurvival.utils import save_model, load_model
    
    from util import get_my_logger
    
    
    class PySurvival(Model):
        def __init__(self):
            super().__init__()
            self.pre_processing = PreProcessingWOneHot()
    
        def build_survival_model(self, parm: dict, row_data: dict, target: np.array) -> (np.ndarray, np.ndarray, np.float64, np.float64, np.ndarray):
            """
    
            Args:
                row_data: list
                target: np.array
                parm: what parameters that I need to pass to models
    
            Returns:
                hold_out_y: target variable for hold out set
                predict_proba: predicted score on hold out set
                logloss: logloss on hold out
                auc: auc on hold out
                feature_score_array: feature score of features
            """
            structure = parm.pop('structure')
            self.model_instance = NeuralMultiTaskModel(bins=parm.pop('bins'), structure=structure)
            self.logger.info("building training data started")
            train_x, time, event = self.build_data_survival(row_data, target)
            self.logger.info("time is {0}".format(time[:10]))
            self.logger.info("event is {0}".format(event[:10]))
            self.logger.info("target is {0}".format(target[:10]))
    
            self.logger.info("final model building starting")
            self.model_instance.fit(train_x, time, event, **parm)
    
            hazard, density, survival = self.model_instance.predict(train_x)
            risk = self.model_instance.predict_risk(train_x)
    
            return {"time": time, "event": event, "hazard": hazard, "density": density, "survival": survival, "risk": risk}
    

    In another file I import that class as follows:

    from ml_models.templates.py_survival.py_survival_model import PySurvival

    However, this doesn't work; it throws the following error:

    Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

    However, it works if I import pysurvival first, before importing the class, as follows:

    import pysurvival
    from ml_models.templates.py_survival.py_survival_model import PySurvival
    

    Do you know what is happening? Any help is appreciated.

    This is a great package. Thank you for making it open source.

    opened by kush99993s 2
  • Tutorial - Employee Retention - Dropping low salary feature

    I don't fully understand how the salary feature is handled in the Employee Retention tutorial. It appears to be an ordinal feature with three categories: low, medium and high. What happens here is that:

    1. The salary feature is one-hot encoded - Why wouldn't an ordinal encoding work here, considering the tree model?

    2. The correlation is then tested on the "low" and "medium" columns, which is very negative - Isn't this quite expected, considering it's a categorical feature?

    3. The "low" column is dropped - Doesn't that mean that we effectively grouped "high" and "low" salary together?
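
The second and third questions can be checked on toy data. The sketch below is not taken from the tutorial: the salary values are made up, the helpers `one_hot` and `pearson` are hypothetical, and it assumes the encoding step had already dropped "high" as the reference level (as a drop-first style encoding would), so only the "low" and "medium" columns exist before "low" is removed.

```python
def one_hot(values, levels):
    """Return {level: [0/1 indicator per row]} for the requested levels."""
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def pearson(x, y):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Made-up sample of the three-level salary feature
salary = ["low", "low", "medium", "high", "medium", "low", "high", "medium"]

# Assumption: "high" is the reference level and has no column of its own
encoded = one_hot(salary, ["low", "medium"])

# Indicator columns of the same feature are mutually exclusive, so their
# correlation is negative by construction
corr = pearson(encoded["low"], encoded["medium"])
print("corr(low, medium) =", corr)

# After dropping "low", only the "medium" indicator remains
remaining = encoded["medium"]
codes_by_level = {
    level: {code for value, code in zip(salary, remaining) if value == level}
    for level in ("low", "medium", "high")
}
print(codes_by_level)
```

Under these assumptions, "low" and "high" rows both end up with a 0 in the only remaining salary column, so the model can no longer tell them apart. If the encoding had instead kept all three columns before dropping "low", the two levels would stay distinguishable.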

    opened by Olof-Hojvall 1
  • Error when installing on Mac

    I recently bought a MacBook Pro with a 2 GHz Intel Core i5 and 16 GB of RAM, and installed the latest Python version, 3.9.0. I am trying to install PySurvival on my new laptop following the usual procedure, but I get a lot of error messages when running the command pip3 install pysurvival

    How can I solve this issue, please?

    ERROR: Command errored out with exit status 1:
       command: /usr/local/opt/python@3.9/bin/python3.9 /usr/local/lib/python3.9/site-packages/pip install --ignore-installed --no-user --prefix /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-build-env-gazj7qjl/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' setuptools setuptools_scm wheel
           cwd: None
      Complete output (3595 lines):
      Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
      Collecting cython>=0.29
        Using cached Cython-0.29.21-py2.py3-none-any.whl (974 kB)
      Collecting numpy==1.16.0
        Using cached numpy-1.16.0.zip (5.1 MB)
      Collecting setuptools
        Using cached setuptools-50.3.2-py3-none-any.whl (785 kB)
      Collecting setuptools_scm
        Using cached setuptools_scm-4.1.2-py2.py3-none-any.whl (27 kB)
      Collecting wheel
        Using cached wheel-0.35.1-py2.py3-none-any.whl (33 kB)
      Building wheels for collected packages: numpy
        Building wheel for numpy (setup.py): started
        Building wheel for numpy (setup.py): still running...
        Building wheel for numpy (setup.py): finished with status 'error'
        ERROR: Command errored out with exit status 1:
         command: /usr/local/opt/python@3.9/bin/python3.9 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/setup.py'"'"'; __file__='"'"'/private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-wheel-obu2ycmj
             cwd: /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/
        Complete output (3194 lines):
        Running from numpy source directory.
        /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/numpy/distutils/misc_util.py:476: SyntaxWarning: "is" with a literal. Did you mean "=="?
          return is_string(s) and ('*' in s or '?' is s)
        blas_opt_info:
        blas_mkl_info:
        customize UnixCCompiler
          libraries mkl_rt not found in ['/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib', '/usr/local/lib', '/usr/lib']
          NOT AVAILABLE
      
        blis_info:
        customize UnixCCompiler
          libraries blis not found in ['/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib', '/usr/local/lib', '/usr/lib']
          NOT AVAILABLE
      
    
     
    
    opened by elopezfune 0
  • Installation fails

    I am trying to install the library on my computer with pip install pysurvival, but I get the following error message:

          /usr/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
            446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
          error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> pysurvival
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
    
    
    I am running Ubuntu 22.04.1 LTS with Python version 3.10.06.
    
    opened by elopezfune 1
  • installation fails for python 3.10.7

    Unable to install pysurvival.

    Here is the version information:

    OS: macOS Monterey 12.3.1

    % python3 --version
    Python 3.10.7
    % pip3 --version
    pip 22.3 from /Users/<somewhere>/lib/python3.10/site-packages/pip (python 3.10)
    % gcc --version
    Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/usr/include/c++/4.2.1
    Apple clang version 13.0.0 (clang-1300.0.27.3)
    Target: arm64-apple-darwin21.4.0
    Thread model: posix
    InstalledDir: /Library/Developer/CommandLineTools/usr/bin
    

    Here are the errors reported during installation:

    % pip3 install pysurvival
    ....
          building 'pysurvival.models._non_parametric' extension
          clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/usr/local/opt/llvm/include -I/Users/matthew/venv/python3.10.7_20220906/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c pysurvival/cpp_extensions/_non_parametric.cpp -o build/temp.macosx-10.9-universal2-3.10/pysurvival/cpp_extensions/_non_parametric.o -std=c++11 -O3
          pysurvival/cpp_extensions/_non_parametric.cpp:8246:5: error: expression is not assignable
              ++Py_REFCNT(o);
              ^ ~~~~~~~~~~~~
          pysurvival/cpp_extensions/_non_parametric.cpp:8248:5: error: expression is not assignable
              --Py_REFCNT(o);
              ^ ~~~~~~~~~~~~
          pysurvival/cpp_extensions/_non_parametric.cpp:8510:5: error: expression is not assignable
              ++Py_REFCNT(o);
              ^ ~~~~~~~~~~~~
          pysurvival/cpp_extensions/_non_parametric.cpp:8512:5: error: expression is not assignable
              --Py_REFCNT(o);
              ^ ~~~~~~~~~~~~
          pysurvival/cpp_extensions/_non_parametric.cpp:8947:71: error: no member named 'tp_print' in '_typeobject'
            __pyx_type_10pysurvival_6models_15_non_parametric__KaplanMeierModel.tp_print = 0;
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
          pysurvival/cpp_extensions/_non_parametric.cpp:8971:66: error: no member named 'tp_print' in '_typeobject'
            __pyx_type_10pysurvival_6models_15_non_parametric__KernelModel.tp_print = 0;
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
          pysurvival/cpp_extensions/_non_parametric.cpp:9661:22: warning: '_PyUnicode_get_wstr_length' is deprecated [-Wdeprecated-declarations]
                              (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                               ^
          /Library/Frameworks/Python.framework/Versions/3.10/include/python3.10/cpython/unicodeobject.h:261:7: note: expanded from macro 'PyUnicode_GET_SIZE'
                PyUnicode_WSTR_LENGTH(op) :                    \
                ^
          /Library/Frameworks/Python.framework/Versions/3.10/include/python3.10/cpython/unicodeobject.h:451:35: note: expanded from macro 'PyUnicode_WSTR_LENGTH'
          #define PyUnicode_WSTR_LENGTH(op) _PyUnicode_get_wstr_length((PyObject*)op)
    
    opened by mmp3 1
  • Estimator used for p-value approximation

    Hello! There were several approximations proposed in the original paper here, but which method was used to approximate the p-value for Conditional Survival Forests? Additionally, is the final p-value used for comparison corrected for multiple hypothesis testing using Bonferroni's correction? Thanks ahead.
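
For reference, the Bonferroni correction mentioned in the question multiplies each raw p-value by the number of tests performed and caps the result at 1. The sketch below shows only that arithmetic on made-up p-values; the function name `bonferroni` is hypothetical, and nothing here indicates whether or how PySurvival's Conditional Survival Forest applies such a correction.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p-values from three candidate split tests
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
print(adjusted)

# A candidate stays significant at level alpha only if its adjusted
# p-value is still below alpha
alpha = 0.05
significant = [p < alpha for p in adjusted]
print(significant)
```

With three tests, only the smallest raw p-value remains below 0.05 after adjustment, which is the kind of difference the question about the final comparison is getting at.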

    opened by ninahua 0
  • about: local variable 'step_size_min' referenced before assignment in pysurvival

    The error is raised by the step function in the rprop.py file in the user's torch directory. You only need to initialize the variable named in the error in this function, similar to:

        step_size_min = []
        def step(self, closure=None):
            ......
            F.rprop(params, grads, prevs, step_sizes, step_size_min=step_size_min,
                    step_size_max=step_size_max, etaminus=etaminus, etaplus=etaplus)

    opened by dadekandrew2010 3