PySurvival is an open source python package for Survival Analysis modeling


Last update: Dec 27, 2022

Related tags

Machine Learning python machine-learning deep-learning numpy pytorch survival-analysis

Overview

PySurvival

What is PySurvival?

PySurvival is an open source Python package for Survival Analysis modeling - the modeling approach used to analyze or predict when an event is likely to happen. It is built upon the most commonly used machine learning packages, such as NumPy, SciPy and PyTorch.
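To make the "predict when an event happens" idea concrete: survival data pairs each observed duration T with an event indicator E, where E = 0 marks a censored subject whose event was never seen. The classic non-parametric estimate of the survival curve is the Kaplan-Meier product, which multiplies (1 - d/n) over the event times up to t, with d events and n subjects still at risk at each time. Below is a minimal NumPy sketch of that textbook formula; the function name kaplan_meier is hypothetical, and this is not PySurvival's implementation (the package provides its own Kaplan-Meier model).

```python
import numpy as np


def kaplan_meier(T, E):
    """Kaplan-Meier survival estimate at each distinct observed event time.

    T: observed durations; E: 1 if the event happened, 0 if the subject was censored.
    """
    T = np.asarray(T, dtype=float)
    E = np.asarray(E, dtype=int)
    event_times = np.unique(T[E == 1])           # sorted distinct event times
    survival = np.empty(len(event_times))
    s = 1.0
    for k, t in enumerate(event_times):
        n_at_risk = np.sum(T >= t)               # still under observation just before t
        n_events = np.sum((T == t) & (E == 1))   # events observed exactly at t
        s *= 1.0 - n_events / n_at_risk
        survival[k] = s
    return event_times, survival


times, surv = kaplan_meier(T=[2, 3, 3, 5, 8, 9], E=[1, 1, 0, 1, 0, 1])
print(times)  # [2. 3. 5. 9.]
print(surv)
```

On this toy data the curve steps down at times 2, 3, 5 and 9; the subjects censored at times 3 and 8 shrink the risk set without being counted as events.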

PySurvival is compatible with Python 2.7-3.7.

Check out the documentation here

Content

PySurvival makes it easy to move between the theory of Survival Analysis and detailed tutorials on how to conduct a full analysis and build and use a model. The package contains:

More than 10 models, ranging from the Cox Proportional Hazard model and the Neural Multi-Task Logistic Regression to the Random Survival Forest
Summaries of the theory behind each model, as well as API descriptions and examples.
Tutorials showing in great detail how to perform exploratory data analysis, survival modeling, cross-validation and prediction, for churn modeling and credit risk, to name a few.
Performance metrics to assess the models' abilities, such as the C-index and the Brier score
Simple ways to load and save models
... and more!
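The C-index listed above scores how well a model ranks subjects by risk. A minimal sketch of Harrell's version, assuming that a higher risk score should mean an earlier event: among pairs where the subject with the shorter time actually had the event, it counts the fraction in which that subject also received the higher risk (ties count as half). The function harrell_c_index is hypothetical; PySurvival's concordance_index takes a fitted model rather than raw risk scores and may treat ties differently.

```python
import numpy as np


def harrell_c_index(risk, T, E):
    """Harrell's C-index: the share of comparable pairs ranked correctly by risk."""
    risk = np.asarray(risk, dtype=float)
    T = np.asarray(T, dtype=float)
    E = np.asarray(E, dtype=int)
    concordant, comparable = 0.0, 0
    for i in range(len(T)):
        if E[i] != 1:
            continue                      # a censored subject cannot be the earlier one
        for j in range(len(T)):
            if T[i] < T[j]:               # i had the event while j was still at risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0     # higher risk, earlier event: concordant
                elif risk[i] == risk[j]:
                    concordant += 0.5     # tied predictions count as half
    return concordant / comparable if comparable else float("nan")


c = harrell_c_index(risk=[0.9, 0.7, 0.6, 0.1], T=[1, 3, 2, 4], E=[1, 0, 1, 1])
print(round(c, 2))  # 0.8
```

Here five pairs are comparable and four are ranked correctly, giving 0.8; a value of 0.5 corresponds to random ranking and 1.0 to a perfect one.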

Installation

If you already have a working version of gcc installed, the easiest way to install PySurvival is with pip:

pip install pysurvival

The full description of the installation steps can be found here.

Get Started

Thanks to its simple API, PySurvival is built to provide the best user experience when it comes to modeling. Here's a quick modeling example to get you started:

# Loading the modules
from pysurvival.models.semi_parametric import CoxPHModel
from pysurvival.models.multi_task import LinearMultiTaskModel
from pysurvival.datasets import Dataset
from pysurvival.utils.metrics import concordance_index

# Loading and splitting a simple example into train/test sets
X_train, T_train, E_train, X_test, T_test, E_test = \
	Dataset('simple_example').load_train_test()

# Building a CoxPH model
coxph_model = CoxPHModel()
coxph_model.fit(X=X_train, T=T_train, E=E_train, init_method='he_uniform', 
                l2_reg = 1e-4, lr = .4, tol = 1e-4)

# Building a MTLR model
mtlr = LinearMultiTaskModel()
mtlr.fit(X=X_train, T=T_train, E=E_train, init_method = 'glorot_uniform', 
           optimizer ='adam', lr = 8e-4)

# Checking the model performance
c_index1 = concordance_index(model=coxph_model, X=X_test, T=T_test, E=E_test )
print("CoxPH model c-index = {:.2f}".format(c_index1))

c_index2 = concordance_index(model=mtlr, X=X_test, T=T_test, E=E_test )
print("MTLR model c-index = {:.2f}".format(c_index2))
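The CoxPHModel in the example above fits a Cox proportional hazards model, which is classically estimated by minimizing the negative log partial likelihood; the l2_reg argument suggests a ridge-style penalty on the coefficients. A minimal NumPy sketch of that objective is below, assuming no tied event times and a penalty of 0.5 * l2_reg * ||beta||^2. Both assumptions and the function name are illustrative only; PySurvival's exact penalty scaling, tie handling and optimizer (controlled by init_method, lr and tol) are not reproduced here.

```python
import numpy as np


def cox_neg_log_partial_likelihood(beta, X, T, E, l2_reg=1e-4):
    """Penalized negative log partial likelihood of a Cox model (assumes no tied event times)."""
    X = np.asarray(X, dtype=float)
    T = np.asarray(T, dtype=float)
    E = np.asarray(E, dtype=int)
    beta = np.asarray(beta, dtype=float)
    eta = X @ beta                                 # linear predictor x_i . beta
    nll = 0.0
    for i in np.flatnonzero(E == 1):               # only observed events contribute
        risk_set = T >= T[i]                       # subjects still at risk at T[i]
        nll -= eta[i] - np.logaddexp.reduce(eta[risk_set])  # stable log-sum-exp
    return nll + 0.5 * l2_reg * np.dot(beta, beta)  # assumed ridge penalty scaling


X = np.array([[2.0], [1.0], [0.0]])
T = np.array([1.0, 2.0, 3.0])
E = np.array([1, 1, 1])
loss_zero = cox_neg_log_partial_likelihood(np.zeros(1), X, T, E)   # equals log(3 * 2 * 1)
loss_good = cox_neg_log_partial_likelihood(np.array([1.0]), X, T, E)
print(loss_good < loss_zero)  # True
```

A coefficient that gives the earliest events the highest linear predictor yields a lower loss than beta = 0, which is the direction a gradient-based optimizer would move in.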

Citation and License

Citation

If you use PySurvival in your research, we would greatly appreciate it if you could cite it as follows:

@Misc{ pysurvival_cite,
  author =    {Stephane Fotso and others},
  title =     {PySurvival: Open source package for Survival Analysis modeling},
  year =      {2019--},
  url = "https://www.pysurvival.io/"
}

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments

RSF implementation seems to hang when predicting

I've installed pysurvival on my MacBook Pro (macOS 10.13.6), using brew for gcc and pip for the package, and have been able to train an RSF model in a Jupyter Notebook (though this took several minutes of high CPU activity). The training data has around 70 factors and 5000 rows.

I'm now trying to work with the model, but when I call, for example,

risks = rsf.predict_risk(X_test)

the notebook just hangs indefinitely with no sign of CPU activity.

opened by simonthorogood 4

Cannot import custom class that depends on pysurvival

Hello,

I have the following class:

from pysurvival.models.multi_task import NeuralMultiTaskModel

import joblib
import numpy as np

from ml_models.preprocessing.one_hot_encoding import PreProcessingWOneHot
from ml_models.templates.model import Model
from pysurvival.utils import save_model, load_model

from util import get_my_logger


class PySurvival(Model):
    def __init__(self):
        super().__init__()
        self.pre_processing = PreProcessingWOneHot()

    def build_survival_model(self, parm: dict, row_data: dict, target: np.ndarray) -> dict:
        """Fit a NeuralMultiTaskModel and return its predictions on the training data.

        Args:
            parm: parameters to pass to the model ('structure' and 'bins' are popped out first)
            row_data: dict
            target: np.ndarray

        Returns:
            dict with the keys "time", "event", "hazard", "density", "survival" and "risk"
        """
        structure = parm.pop('structure')
        self.model_instance = NeuralMultiTaskModel(bins=parm.pop('bins'), structure=structure)
        self.logger.info("building training data started")
        train_x, time, event = self.build_data_survival(row_data, target)
        self.logger.info("time is {0}".format(time[:10]))
        self.logger.info("event is {0}".format(event[:10]))
        self.logger.info("target is {0}".format(target[:10]))

        self.logger.info("final model building starting")
        self.model_instance.fit(train_x, time, event, **parm)

        hazard, density, survival = self.model_instance.predict(train_x)
        risk = self.model_instance.predict_risk(train_x)

        return {"time": time, "event": event, "hazard": hazard, "density": density, "survival": survival, "risk": risk}

In another file, I import the class as follows:

from ml_models.templates.py_survival.py_survival_model import PySurvival

However, this doesn't work; it fails with the following error:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

However, it works if I import pysurvival before importing the class, as follows:

import pysurvival
from ml_models.templates.py_survival.py_survival_model import PySurvival

Do you know what is happening? Any help is appreciated.

This is a great package. Thank you for making it open source.

opened by kush99993s 2

Tutorial - Employee Retention - Dropping low salary feature

I don't fully understand how the salary feature is handled in the Employee Retention tutorial. It appears to be an ordinal feature with 3 categories: low, medium and high. Here is what happens:

The salary feature is one-hot encoded - Why wouldn't an ordinal encoding work here, considering the tree model?

The correlation is then tested on the "low" and "medium" columns, and it turns out to be strongly negative - Isn't this to be expected, considering it's a categorical feature?

The "low" column is dropped - Doesn't that mean that we effectively grouped "high" and "low" salary together?
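The question contrasts two encodings of an ordered categorical feature. The sketch below shows both for the three salary levels; the helper names are hypothetical and this is not the tutorial's code. Ordinal encoding keeps the order in a single integer column, while one-hot encoding creates one 0/1 column per level, optionally dropping one level as the reference.

```python
SALARY_LEVELS = ["low", "medium", "high"]  # assumed order low < medium < high


def ordinal_encode(value):
    """Single integer column that keeps the order: low=0, medium=1, high=2."""
    return SALARY_LEVELS.index(value)


def one_hot_encode(value, drop=None):
    """One 0/1 column per level, optionally dropping one level as the reference."""
    return {f"salary_{level}": int(value == level) for level in SALARY_LEVELS if level != drop}


rows = ["low", "medium", "high"]
print([ordinal_encode(v) for v in rows])  # [0, 1, 2]
for v in rows:
    print(v, one_hot_encode(v, drop="low"))
# low {'salary_medium': 0, 'salary_high': 0}
# medium {'salary_medium': 1, 'salary_high': 0}
# high {'salary_medium': 0, 'salary_high': 1}
```

Because the full set of one-hot columns always sums to 1, the columns are negatively correlated by construction. Dropping one of them still keeps all three levels distinguishable (the dropped level becomes the all-zero row), but only as long as the columns for every other level are kept.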

opened by Olof-Hojvall 1

Error when installing on Mac

I recently bought a MacBook Pro (2 GHz Intel Core i5, 16 GB of RAM) and installed the latest version of Python, 3.9.0. I am trying to install PySurvival on my new laptop following the usual procedure, but I get a lot of error messages when running the command pip3 install pysurvival:

How can I solve this issue?

ERROR: Command errored out with exit status 1:
   command: /usr/local/opt/python@3.9/bin/python3.9 /usr/local/lib/python3.9/site-packages/pip install --ignore-installed --no-user --prefix /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-build-env-gazj7qjl/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' setuptools setuptools_scm wheel
       cwd: None
  Complete output (3595 lines):
  Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
  Collecting cython>=0.29
    Using cached Cython-0.29.21-py2.py3-none-any.whl (974 kB)
  Collecting numpy==1.16.0
    Using cached numpy-1.16.0.zip (5.1 MB)
  Collecting setuptools
    Using cached setuptools-50.3.2-py3-none-any.whl (785 kB)
  Collecting setuptools_scm
    Using cached setuptools_scm-4.1.2-py2.py3-none-any.whl (27 kB)
  Collecting wheel
    Using cached wheel-0.35.1-py2.py3-none-any.whl (33 kB)
  Building wheels for collected packages: numpy
    Building wheel for numpy (setup.py): started
    Building wheel for numpy (setup.py): still running...
    Building wheel for numpy (setup.py): finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /usr/local/opt/python@3.9/bin/python3.9 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/setup.py'"'"'; __file__='"'"'/private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-wheel-obu2ycmj
         cwd: /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/
    Complete output (3194 lines):
    Running from numpy source directory.
    /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/numpy/distutils/misc_util.py:476: SyntaxWarning: "is" with a literal. Did you mean "=="?
      return is_string(s) and ('*' in s or '?' is s)
    blas_opt_info:
    blas_mkl_info:
    customize UnixCCompiler
      libraries mkl_rt not found in ['/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
  
    blis_info:
    customize UnixCCompiler
      libraries blis not found in ['/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE

opened by elopezfune 0

Installation fails

I am trying to install the library on my computer with pip install pysurvival, but I get the following error message:

      /usr/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> pysurvival

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.


I am running Ubuntu 22.04.1 LTS with Python 3.10.6.

opened by elopezfune 1

Installation fails for Python 3.10.7

Unable to install pysurvival.

Here is the version information:

OS: macOS Monterey 12.3.1

% python3 --version
Python 3.10.7
% pip3 --version
pip 22.3 from /Users/<somewhere>/lib/python3.10/site-packages/pip (python 3.10)
% gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/usr/include/c++/4.2.1
Apple clang version 13.0.0 (clang-1300.0.27.3)
Target: arm64-apple-darwin21.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Here are the errors reported during installation:

% pip3 install pysurvival
....
      building 'pysurvival.models._non_parametric' extension
      clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/usr/local/opt/llvm/include -I/Users/matthew/venv/python3.10.7_20220906/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c pysurvival/cpp_extensions/_non_parametric.cpp -o build/temp.macosx-10.9-universal2-3.10/pysurvival/cpp_extensions/_non_parametric.o -std=c++11 -O3
      pysurvival/cpp_extensions/_non_parametric.cpp:8246:5: error: expression is not assignable
          ++Py_REFCNT(o);
          ^ ~~~~~~~~~~~~
      pysurvival/cpp_extensions/_non_parametric.cpp:8248:5: error: expression is not assignable
          --Py_REFCNT(o);
          ^ ~~~~~~~~~~~~
      pysurvival/cpp_extensions/_non_parametric.cpp:8510:5: error: expression is not assignable
          ++Py_REFCNT(o);
          ^ ~~~~~~~~~~~~
      pysurvival/cpp_extensions/_non_parametric.cpp:8512:5: error: expression is not assignable
          --Py_REFCNT(o);
          ^ ~~~~~~~~~~~~
      pysurvival/cpp_extensions/_non_parametric.cpp:8947:71: error: no member named 'tp_print' in '_typeobject'
        __pyx_type_10pysurvival_6models_15_non_parametric__KaplanMeierModel.tp_print = 0;
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
      pysurvival/cpp_extensions/_non_parametric.cpp:8971:66: error: no member named 'tp_print' in '_typeobject'
        __pyx_type_10pysurvival_6models_15_non_parametric__KernelModel.tp_print = 0;
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
      pysurvival/cpp_extensions/_non_parametric.cpp:9661:22: warning: '_PyUnicode_get_wstr_length' is deprecated [-Wdeprecated-declarations]
                          (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                           ^
      /Library/Frameworks/Python.framework/Versions/3.10/include/python3.10/cpython/unicodeobject.h:261:7: note: expanded from macro 'PyUnicode_GET_SIZE'
            PyUnicode_WSTR_LENGTH(op) :                    \
            ^
      /Library/Frameworks/Python.framework/Versions/3.10/include/python3.10/cpython/unicodeobject.h:451:35: note: expanded from macro 'PyUnicode_WSTR_LENGTH'
      #define PyUnicode_WSTR_LENGTH(op) _PyUnicode_get_wstr_length((PyObject*)op)

opened by mmp3 1

Estimator used for p-value approximation

Hello! Several approximations were proposed in the original paper here; which method was used to approximate the p-value for Conditional Survival Forests? Additionally, is the final p-value used for comparison corrected for multiple hypothesis testing with the Bonferroni correction? Thanks in advance.
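For the second question, the generic Bonferroni correction tests each of m hypotheses at level alpha / m, or equivalently multiplies each p-value by m and caps it at 1. The sketch below is that textbook procedure with a hypothetical function name; whether PySurvival's Conditional Survival Forest applies it to its split p-values is exactly what is being asked and is not confirmed here.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjusted p-values and reject/keep decisions at family-wise level alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]   # same as testing each p at alpha / m
    rejected = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, rejected


adjusted, rejected = bonferroni([0.01, 0.02, 0.2])
print(rejected)  # [True, False, False]
```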

opened by ninahua 0

about: local variable 'step_size_min' referenced before assignment in pysurvival

The error is raised by the step function in the rprop.py file of the installed torch package. You only need to initialize the variable that triggers the error inside this function, for example:

def step(self, closure=None):
    step_size_min = []
    ......
    F.rprop(params, grads, prevs, step_sizes, step_size_min=step_size_min,
            step_size_max=step_size_max, etaminus=etaminus, etaplus=etaplus)
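The error in this comment is Python's UnboundLocalError: a local variable that is assigned only inside a loop is never bound if every iteration skips the assignment, so reading it afterwards fails. The simplified, hypothetical functions below mirror that pattern (grad values of None stand for parameters that get skipped, and the step-size values are placeholders); they are not torch's actual code. Initializing the variable before the loop, as the workaround suggests, removes the error.

```python
def step_broken(grads, step_sizes=(1e-6, 50.0)):
    # Hypothetical: step_size_min is bound only when a non-None grad is seen.
    for grad in grads:
        if grad is None:           # parameters without gradients are skipped
            continue
        step_size_min = step_sizes[0]
    return step_size_min           # UnboundLocalError if every grad was None


def step_fixed(grads, step_sizes=(1e-6, 50.0)):
    step_size_min = None           # bound up front, as the workaround suggests
    for grad in grads:
        if grad is None:
            continue
        step_size_min = step_sizes[0]
    return step_size_min


try:
    step_broken([None, None])
except UnboundLocalError as exc:
    print("broken:", type(exc).__name__)  # broken: UnboundLocalError
print("fixed:", step_fixed([None, None]))  # fixed: None
```

Newer Python versions word the message differently, but the exception type is the same.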

opened by dadekandrew2010 3

Owner


GitHub https://www.pysurvival.io/

MICOM is a Python package for metabolic modeling of microbial communities

Welcome MICOM is a Python package for metabolic modeling of microbial communities currently developed in the Gibbons Lab at the Institute for Systems

57 Dec 21, 2022

ArviZ is a Python package for exploratory analysis of Bayesian models

ArviZ (pronounced "AR-vees") is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, model checking, comparison and diagnostics

1.3k Jan 5, 2023

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

1.8k Jan 3, 2023

Open source time series library for Python

PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible array

2k Jan 2, 2023

Empyrial is a Python-based open-source quantitative investment library dedicated to financial institutions and retail investors

By Investors, For Investors. Want to read this in Chinese? Click here Empyrial is a Python-based open-source quantitative investment library dedicated

640 Dec 31, 2022

SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.

SageMaker Python SDK SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the S

1.8k Jan 1, 2023

Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

3.3k Jan 3, 2023

A python library for Bayesian time series modeling

PyDLM Welcome to pydlm, a flexible time series modeling library for python. This library is based on the Bayesian dynamic linear model (Harrison and W

438 Dec 17, 2022

Pyomo is an object-oriented algebraic modeling language in Python for structured optimization problems.

Pyomo is a Python-based open-source software package that supports a diverse set of optimization capabilities for formulating and analyzing optimization models. Pyomo can be used to define symbolic problems, create concrete problem instances, and solve these instances with standard solvers.

1.4k Dec 28, 2022

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

23.3k Dec 31, 2022

An open-source library of algorithms to analyse time series in GPU and CPU.

216 Dec 30, 2022

Nixtla is an open-source time series forecasting library.

Nixtla Nixtla is an open-source time series forecasting library. We are helping data scientists and developers to have access to open source state-of-

401 Jan 8, 2023

Pytools is an open source library containing general machine learning and visualisation utilities for reuse

pytools is an open source library containing general machine learning and visualisation utilities for reuse, including: Basic tools for API developmen

26 Nov 6, 2022

Data Version Control or DVC is an open-source tool for data science and machine learning projects

Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj

2 Jul 29, 2021

MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

The collaboration platform for Machine Learning MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine

1.4k Dec 27, 2022

Uplift modeling and causal inference with machine learning algorithms

Disclaimer This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to chang

3.7k Jan 7, 2023

Automated modeling and machine learning framework FEDOT

This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML). It can build custom modeling pipelines for different real-world processes in an automated way using an evolutionary approach. FEDOT supports classification (binary and multiclass), regression, clustering, and time series prediction tasks.

National Center for Cognitive Research of ITMO University

148 Jul 5, 2021

A Pythonic framework for threat modeling

pytm: A Pythonic framework for threat modeling Introduction Traditional threat modeling too often comes late to the party, or sometimes not at all. In

644 Dec 20, 2022

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

4.2k Dec 29, 2022