Survival analysis in Python

Overview

What is survival analysis and why should I learn it? Survival analysis was originally developed and heavily applied by the actuarial and medical communities. Its purpose is to answer, under uncertainty, why events occur now versus later (where events might refer to deaths, disease remission, etc.). This is great for researchers interested in measuring lifetimes: they can answer questions like, what factors might influence deaths?

But outside of medicine and actuarial science, there are many other interesting and exciting applications of survival analysis. For example:

  • SaaS providers are interested in measuring subscriber lifetimes, or the time to some first action.
  • Inventory stock-outs are censoring events for the true "demand" of a good.
  • Sociologists are interested in measuring the lifetimes of political parties, relationships, or marriages.
  • A/B tests can determine how long it takes different groups to perform an action.

lifelines is a pure Python implementation of the best parts of survival analysis.
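
A minimal Kaplan-Meier fit looks like this (toy durations and event flags; plotting assumes matplotlib is installed):

    from lifelines import KaplanMeierFitter

    T = [5, 6, 6, 2.5, 4, 4]   # durations until the event or censoring
    E = [1, 0, 0, 1, 1, 1]     # 1 = event observed, 0 = censored

    kmf = KaplanMeierFitter()
    kmf.fit(T, event_observed=E)
    print(kmf.survival_function_)     # estimated S(t) at each observed time
    print(kmf.median_survival_time_)  # estimated median lifetime
    kmf.plot_survival_function()      # survival curve with confidence intervals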

Documentation and intro to survival analysis

If you are new to survival analysis, are wondering why it is useful, or are interested in lifelines examples, API, and syntax, please read the Documentation and Tutorials page.

Contact

Roadmap

You can find the roadmap for lifelines here.

Development

See our Contributing guidelines.

Comments
  • Some failing CoxPH tests

    Noticed some strange behavior with CoxPH. I always have normalized data, which did not seem to work well with the implementation. The failures here seem unrelated to that, though.

    The added tests currently fail with the following trace:

    Traceback (most recent call last):
      File "/home/jonas/workspacepython/lifelines/lifelines/tests/test_suite.py", line 937, in test_crossval_normalized
        event_col='E', k=3)
      File "/home/jonas/workspacepython/lifelines/lifelines/utils.py", line 311, in k_fold_cross_validation
        fitter.fit(training_data, duration_col=duration_col, event_col=event_col)
      File "/home/jonas/workspacepython/lifelines/lifelines/estimation.py", line 998, in fit
        include_likelihood=include_likelihood)
      File "/home/jonas/workspacepython/lifelines/lifelines/estimation.py", line 938, in _newton_rhaphson
        delta = solve(-hessian, step_size * gradient.T)
      File "/home/jonas/anaconda3/lib/python3.4/site-packages/numpy/linalg/linalg.py", line 381, in solve
        r = gufunc(a, b, signature=signature, extobj=extobj)
      File "/home/jonas/anaconda3/lib/python3.4/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
        raise LinAlgError("Singular matrix")
    numpy.linalg.linalg.LinAlgError: Singular matrix

    However, note that I have commented out one of the datasets because it seems to cause the cross-validation to end up in an infinite loop of some kind; the tests never finish (I only waited ~2 minutes).

    Doing similar things with R works with no problem.

    Doing the following is a fast way to check the results:

    python -m unittest lifelines.tests.test_suite.CoxRegressionTests
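
    For context, the failing call has this shape (synthetic placeholder data standing in for the normalized dataset; this sketch shows the API, not a guaranteed reproduction):

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter
    from lifelines.utils import k_fold_cross_validation

    # placeholder normalized covariates plus duration/event columns
    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(100, 3), columns=["x1", "x2", "x3"])
    df["T"] = np.random.exponential(10, size=100)
    df["E"] = np.random.binomial(1, 0.7, size=100)

    # mirrors the call in the traceback above
    scores = k_fold_cross_validation(CoxPHFitter(), df, duration_col="T", event_col="E", k=3)
    print(scores)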
    
    opened by spacecowboy 37
  • Add concordance index function

    This commit includes a function for calculating Harrell's concordance index, which can be calculated in R using 'Hmisc'. The function is implemented in Fortran, with a small Python wrapper. The reason for this is that calculating the C-index is an O(n^2) process, so it quickly becomes unacceptably slow in pure Python. Comparing a pure Python implementation with the Fortran version on arrays of length 1000, Python required 434 ms while Fortran did it in 4.73 ms: almost a factor of 100 difference.

    As a consequence of the addition of the Fortran module, the setup script now utilizes numpy's setup function which will handle the compilation of the native code.

    In addition, a small unit test has been added. To run the unit tests, it is likely necessary to compile the native code first with:

    python setup.py build_ext --inplace
    

    I'm not sure how you want to organize the source code, so I opted for naming the file "_statistics.f90" which compiles to a module "_statistics". The function inside is then imported and wrapped in "statistics.py". My thinking is that any "module.py" might have some native code related to it in a "_module.f90" or "_module.c" file.

    As a reference, here is a pure python version of the function:

    import numpy as np


    def concordance_index(event_times, predicted_event_times, event_observed=None):
        """
        Calculates the concordance index (C-index) between two series
        of event times. The first is the real survival times from
        the experimental data, and the other is the predicted survival
        times from a model of some kind.
    
        The concordance index is a value between 0 and 1 where,
        0.5 is the expected result from random predictions,
        1.0 is perfect concordance and,
        0.0 is perfect anti-concordance (multiply predictions with -1 to get 1.0)
    
        Parameters:
          event_times: a (nx1) array of observed survival times.
          predicted_event_times: a (nx1) array of predicted survival times.
          event_observed: a (nx1) array of censorship flags, 1 if observed,
                          0 if not. Default assumes all observed.
    
        Returns:
          c-index: a value between 0 and 1.
        """
        event_times = np.array(event_times, dtype=float)
        predicted_event_times = np.array(predicted_event_times, dtype=float)
    
        if event_observed is None:
            event_observed = np.ones(event_times.shape[0], dtype=float)
    
        if event_times.shape != predicted_event_times.shape:
            raise ValueError("Event times arrays must have the same shape!")
    
        def valid_comparison(time_a, time_b, event_a, event_b):
            """True if times can be compared."""
            if event_a and event_b:
                return True
            elif event_a and time_a < time_b:
                return True
            elif event_b and time_b < time_a:
                return True
            else:
                return False
    
        def concordance_value(time_a, time_b, pred_a, pred_b):
            if pred_a == pred_b:
                # Same as random
                return 0.5
            elif time_a < time_b and pred_a < pred_b:
                return 1.0
            elif time_b < time_a and pred_b < pred_a:
                return 1.0
            else:
                return 0.0
    
        paircount = 0.0
        csum = 0.0
    
        for a, (time_a, pred_a, event_a) in enumerate(zip(event_times,
                                                          predicted_event_times,
                                                          event_observed)):
            # Don't want to double count
            for b in range(a + 1, len(event_times)):
                time_b = event_times[b]
                pred_b = predicted_event_times[b]
                event_b = event_observed[b]
    
                if valid_comparison(time_a, time_b, event_a, event_b):
                    paircount += 1.0
                    csum += concordance_value(time_a, time_b, pred_a, pred_b)
    
        # guard: avoid ZeroDivisionError when no pairs are comparable (e.g., all censored)
        if paircount == 0:
            raise ZeroDivisionError("No admissible pairs in the dataset.")
        return csum / paircount
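
    A quick sanity check of the function above: perfectly ordered predictions score 1.0 and reversed predictions score 0.0:

    times = [1, 2, 3, 4, 5]
    print(concordance_index(times, [1, 2, 3, 4, 5]))  # 1.0, perfect concordance
    print(concordance_index(times, [5, 4, 3, 2, 1]))  # 0.0, perfect anti-concordance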
    
    opened by spacecowboy 26
  • Better alignment and sizing of at_risk_counts

    By setting ha = "center" you get nicer alignment with the x ticks.

    ha = "right"

    image

    ha = "center"

    image

    I also hacked together a way to adjust the font size by adding an integer parameter x to the function and then adding:

    ax2.set_xlabel("At risk", fontsize = x)

    There is probably a nicer way to incorporate that into the arguments, though.

    edit: Or it's very possible that there was already a way to adjust the font size and I just couldn't figure it out!
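
    For anyone who wants to experiment, here is a small self-contained matplotlib sketch of both suggestions (the helper name and the font_size parameter are hypothetical, not lifelines API):

    import matplotlib.pyplot as plt

    def add_at_risk_row(ax, ticks, counts, font_size=10):
        # hypothetical helper: ha="center" aligns each count with its x tick,
        # and font_size is a parameter instead of being hard-coded
        ax2 = ax.twiny()
        ax2.set_xlim(ax.get_xlim())
        ax2.xaxis.set_ticks_position("bottom")
        ax2.xaxis.set_label_position("bottom")
        ax2.spines["bottom"].set_position(("outward", 40))
        ax2.set_xticks(ticks)
        ax2.set_xticklabels(counts, ha="center", fontsize=font_size)
        ax2.set_xlabel("At risk", fontsize=font_size)
        return ax2

    fig, ax = plt.subplots()
    ax.step([0, 2, 4, 6], [1.0, 0.8, 0.5, 0.3], where="post")
    add_at_risk_row(ax, ticks=[0, 2, 4, 6], counts=[100, 80, 50, 30], font_size=8)
    plt.show()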

    installation plotting 
    opened by NickCEBM 24
  • Multiple comparisons testing

    Multiple comparisons corrections with something like Bonferroni would be useful. This would also require generating p-values for the logrank statistic from the chi-squared distribution.
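
    For illustration, a manual Bonferroni correction over pairwise logrank tests could look like this (toy data; logrank_test returns an object with a p_value attribute):

    from itertools import combinations
    from lifelines.statistics import logrank_test

    # toy groups: label -> (durations, event flags)
    groups = {
        "a": ([5, 6, 7, 8], [1, 1, 0, 1]),
        "b": ([4, 4, 9, 10], [1, 0, 1, 1]),
        "c": ([2, 3, 3, 11], [1, 1, 1, 0]),
    }
    tests = {
        (g1, g2): logrank_test(groups[g1][0], groups[g2][0],
                               event_observed_A=groups[g1][1],
                               event_observed_B=groups[g2][1])
        for g1, g2 in combinations(groups, 2)
    }
    alpha = 0.05 / len(tests)  # Bonferroni: divide alpha by the number of comparisons
    for pair, res in tests.items():
        print(pair, res.p_value, res.p_value < alpha)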

    enhancement 
    opened by waltonjones 20
  • CoxPHFitter Error

    Not sure if this is the right place for this, but I am having an issue getting the CoxPH methods to work. I am new to survival analysis, so I assume something is wrong with my data set-up. I am getting a "delta contains nan's, convergence halted" error during coxph.fit(). Can anyone shed some light on why this is happening?

    Thanks,

    ValueError                                Traceback (most recent call last)
          3 cphf1 = CoxPHFitter()
    ----> 4 cphf1.fit(X, 'T', 'E')
          5 cphf1.print_summary()

    /lifelines/fitters/coxph_fitter.pyc in fit(self, df, duration_col, event_col, show_progress, initial_beta, include_likelihood, strata)
        313 hazards_ = self._newton_rhaphson(df, T, E, initial_beta=initial_beta,
        314                                  show_progress=show_progress,
    --> 315                                  include_likelihood=include_likelihood)
        316
        317 self.hazards_ = pd.DataFrame(hazards_.T, columns=df.columns,

    lifelines/fitters/coxph_fitter.pyc in _newton_rhaphson(self, X, T, E, initial_beta, step_size, precision, show_progress, include_likelihood)
        223 delta = solve(-h, step_size * g.T)
        224 if np.any(np.isnan(delta)):
    --> 225     raise ValueError("delta contains nan value(s). Convergence halted.")
        226
        227 # Save these as pending result

    ValueError: delta contains nan value(s). Convergence halted.
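
    A common workaround for this error is a small ridge penalty, which stabilizes the Hessian when covariates are collinear, constant, or badly scaled (a sketch with toy data standing in for X):

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    # toy stand-in for X: two covariates plus duration 'T' and event 'E' columns
    rng = np.random.default_rng(1)
    X = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
    X["T"] = rng.exponential(10, size=50)
    X["E"] = rng.integers(0, 2, size=50)

    cph = CoxPHFitter(penalizer=0.1)  # the penalty regularizes the Newton steps
    cph.fit(X, duration_col='T', event_col='E', show_progress=True)
    cph.print_summary()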

    convergence issue 
    opened by slipss 16
  • nlogn concordance index algorithm (first pass)

    My dataset is about 200k rows, so even the Fortran concordance option takes more than a minute because it's an n^2 algorithm. I wrote a faster (n log n) version. On 100k rows of fake data, this takes the time down from 52s (previous fast n^2 Fortran version) to 4s (current pure-Python n log n version).

    Right now it introduces a dependency on another library (blist) because I didn't want to write the order statistic tree myself. Unfortunately, blist's order statistic trees might be O(log^2 n) instead of O(log n) for RANK operations, so right now this might be O(n log^2 n) in practice. I also suspect blist is slower than a purpose-built order statistic tree would be. Anyway, because of the dependency issue, I wanted to run this by the maintainers before I go any further with it. What do you recommend?

    PS It's also not quite correct yet--it disagrees with the Fortran implementation on the full Cox model concordance test, and they both disagree with what I get in R. I still have to track this down.
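
    To illustrate the core idea without the blist dependency, here is a Fenwick-tree sketch for the special case of no censoring and no ties (the real implementation additionally handles censored pairs and tied predictions):

    import numpy as np

    def concordance_no_censoring(event_times, predictions):
        # Sort by true event time; a pair is then concordant iff the earlier
        # death also has the smaller prediction. Counting previously inserted
        # predictions with smaller rank is a Fenwick-tree prefix sum, giving
        # O(n log n) overall. Assumes at least two samples and distinct values.
        order = np.argsort(event_times)
        ranks = np.argsort(np.argsort(predictions)) + 1  # 1-based prediction ranks
        tree = np.zeros(len(predictions) + 1)
        concordant, pairs = 0.0, 0
        for i, idx in enumerate(order):
            r = ranks[idx]
            j = r - 1                        # prefix sum over ranks < r
            while j > 0:
                concordant += tree[j]
                j -= j & (-j)
            pairs += i                       # all earlier items are comparable
            while r <= len(predictions):     # insert this prediction's rank
                tree[r] += 1
                r += r & (-r)
        return concordant / pairs

    print(concordance_no_censoring([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0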

    opened by benkuhn 16
  • Speeding up Aalen Additive Regression

    Hi, I've been working on a project for a few months now, and one problem I have is that it can take about 4 days to run on 340k rows with about 6 features.

    I know lifelines isn't necessarily designed for this, and I've discovered that the ridge regression solve step is the biggest bottleneck: 60% of the compute time happens there.

    Are there alternative algorithms I can use like mini-batch say? Rather than the ridge regression?

    performance 
    opened by springcoil 15
  • Create aalen_johansen_fitter.py

    Adding an Aalen-Johansen fitter, as I mentioned in #413. Still needs some cleaning up. Items still needed: a standard error estimator, tests, a check of how well jitter() works, documentation and formatting that match the rest of lifelines, and a written example.

    How it works is as follows: estimate an overall survival curve, calculate the discrete-time hazards for the event of interest (event_ind), then calculate the cumulative density function. The survival function is used to generate the discrete-time hazard: take the minus-log transform of S(t) and of S(t-), where t- is the event time right before t, then subtract the two quantities. To estimate F(t, j), multiply S(t-) by the discrete-time hazard and an indicator for event type j (see the sketch below).

    Potential addition: warn users not to calculate survival times from this (it only generates the cumulative density function / risk), since the interpretation of those survival times is not straightforward.

    Some discussion and examples: https://www.duo.uio.no/bitstream/handle/10852/10287/stat-res-03-97.pdf?sequence=1 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5557056/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4325676/
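
    For readers following along, a minimal NumPy/pandas sketch of the estimator described above (not the PR's code; event_types uses 0 for censored and positive integers for competing event types):

    import numpy as np
    import pandas as pd

    def aalen_johansen_cif(durations, event_types, event_of_interest):
        # F(t, j) accumulates S(t-) * h_j(t): the overall survival just before t
        # times the discrete-time hazard of event type j at t.
        durations = np.asarray(durations, dtype=float)
        event_types = np.asarray(event_types)
        rows, surv_prev = [], 1.0
        for t in np.unique(durations[event_types > 0]):
            n = (durations >= t).sum()                            # number at risk just before t
            d_all = ((durations == t) & (event_types > 0)).sum()  # events of any type at t
            d_j = ((durations == t) & (event_types == event_of_interest)).sum()
            rows.append((t, surv_prev * d_j / n))                 # increment of F(t, j)
            surv_prev *= 1.0 - d_all / n                          # update overall S(t)
        out = pd.DataFrame(rows, columns=["t", "dF"])
        out["F"] = out["dF"].cumsum()
        return out[["t", "F"]]

    print(aalen_johansen_cif([2, 3, 3, 5, 7], [1, 2, 0, 1, 0], event_of_interest=1))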

    opened by pzivich 14
  • Provide Initial Guess to Regression Fitters

    It would be great to be able to provide an initial guess point (warm-start) to the regression fitters, such as WeibullAFTFitter. I'm referring to this line:

    https://github.com/CamDavidsonPilon/lifelines/blob/d9d3f9f9acb832d03166e39556770c3374c868a4/lifelines/fitters/__init__.py#L1018

    I've been comparing this particular fitter to R's survreg, and for some datasets, their solutions don't agree at all. I'd like to provide the same initial values to both codes and hopefully get the same solution.
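
    For what it's worth, recent lifelines versions expose an initial_point kwarg on the regression fitters' fit; if your version has it, a warm start would look roughly like this (the all-zeros starting vector and its length of 9 are assumptions tied to load_rossi's 7 covariates plus the lambda_ and rho_ intercepts):

    import numpy as np
    from lifelines import WeibullAFTFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()
    aft = WeibullAFTFitter()
    guess = np.zeros(9)  # one entry per fitted parameter; check aft.summary for ordering
    aft.fit(df, duration_col="week", event_col="arrest", initial_point=guess)
    print(aft.params_)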

    opened by bacalfa 12
  • ImportError: LogNormalFitter, LogLogisticFitter, PiecewiseExponentialFitter

    I am having issues importing the following fitters:

    from lifelines import LogNormalFitter
    from lifelines import LogLogisticFitter
    from lifelines import PiecewiseExponentialFitter
    

    The error message is:

    ImportError: cannot import name 'LogNormalFitter' from 'lifelines' (C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\lifelines\__init__.py)
    

    Any ideas what the problem might be?

    Thank you, c.
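
    A quick first check: these fitters were added in relatively recent lifelines releases, so an older installed version (e.g., one pinned by Anaconda) raises exactly this ImportError:

    import lifelines
    print(lifelines.__version__)  # if old, `pip install -U lifelines` should fix the imports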

    installation 
    opened by cerenaaa 12
  • Model serialization?

    First off - AWESOME library. I've been using it for a few of my projects and it's been a lifesaver.

    However, I'm looking to save models, and I can't seem to find any documentation on how. My specific use case is just wanting to save the Kaplan-Meier estimator and use it to make predictions later. I can of course save the data frame of the survival function, but I'd rather pickle (or otherwise serialize) the model and reload it in a different module, performing the analysis/fitting in one place and using the model later. Is there another way besides exporting the survival function? (I'd like to use the predict method...)

    Thanks!
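
    For reference, fitted lifelines estimators are ordinary Python objects, so a pickle round-trip like the following typically works (a minimal sketch with toy data):

    import pickle
    from lifelines import KaplanMeierFitter

    kmf = KaplanMeierFitter().fit([5, 6, 7, 8], [1, 1, 0, 1], label="km")
    with open("kmf.pkl", "wb") as f:
        pickle.dump(kmf, f)

    # later, in a different module
    with open("kmf.pkl", "rb") as f:
        kmf2 = pickle.load(f)
    print(kmf2.predict(6))  # survival probability at t=6 from the reloaded model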

    opened by dwilson1988 12
  • Bugs of Incorrect Calculation of Baseline Hazard & Baseline Cumulative Hazard

    Hello, I am a senior data scientist at Prudential Financial. While working on a project involving Cox proportional hazards models, I found that the baseline hazard and baseline cumulative hazard are calculated incorrectly by the CoxPHFitter module within the lifelines package. Before jumping into the bug details, I want to share version information: I was using lifelines 0.27.0, but when I checked the source code on GitHub again this morning (version 0.27.4, I believe), I still saw the same bugs.

    The bugs originate in the _fit_model function of the SemiParametricPHFitter class, starting at line 1252 of coxph_fitter.py. Notice that the standardized data are supplied to _fit_model for further estimation. This is not a problem for the Cox coefficient estimates (the params_ in the function), because they are restored to their original scale by dividing by the standard deviations of the original data, as in line 1399. However, no similar correction is applied to the predicted_partial_hazards_ calculated in line 1392. I notice that line 1393 uses a matrix multiplication of the standardized data with the uncorrected Cox coefficients to avoid the scale issue, but the location issue is never handled. As a result, it is as if the raw data were shifted, with a direction and extent that depend on their original mean values, and the effect of that shift is incorrectly transferred to the baseline hazard of the Cox model. This makes all subsequent calculations of the baseline hazard and baseline cumulative hazard incorrect.

    The fix is straightforward: use the unstandardized data for the baseline hazard calculations. I have appended some code below as a quick fix; adding it at line 1270 should generate the correct baseline hazard and baseline cumulative hazard. However, this is not ideal, since the incorrect calculations are not removed but merely overwritten. If desired, I would be very happy to work with the lifelines development team on a permanent and neater fix.

    predicted_partial_hazards_ = (
        pd.DataFrame(np.exp(dot(X.values, self.params_)), columns=["P"]).assign(T=T.values, E=E.values, W=weights.values).set_index(X.index)
    )
    self.baseline_hazard_ = self._compute_baseline_hazards(predicted_partial_hazards_)
    self.baseline_cumulative_hazard_ = self._compute_baseline_cumulative_hazard(self.baseline_hazard_)
    
    opened by bofeng2018 1
  • Is Generalized Gamma still having convergence problems?

    I was fitting the generalized gamma distribution to my survival data; however, an hour later the code was still running and had not converged, so I gave up and stopped the fit. I wrote the negative log-likelihood of the generalized gamma myself and used scipy.optimize.minimize with the Nelder-Mead method to find the maximum-likelihood parameters; it converged in two minutes. I would like to know if the generalized gamma is still having convergence problems, and why.

    import numpy as np
    from scipy.special import gamma, gammainc
    from scipy.optimize import minimize

    # tempo_censura: DataFrame with columns 'tempo' (time) and 'censura' (1 = event, 0 = censored)
    def neg_log_Gama_Generalizada(params):
        """Negative log-likelihood of the generalized gamma distribution."""
        if min(params) < 0:
            return np.inf
        gama, k, alpha = params
        falha = np.array(tempo_censura[tempo_censura['censura'] == 1]['tempo'])  # observed failure times
        tc = np.array(tempo_censura[tempo_censura['censura'] == 0]['tempo'])     # censored times
        pdf = gama * falha**(gama*k - 1) * np.exp(-(falha/alpha)**gama) / (gamma(k) * alpha**(gama*k))
        sf = 1 - gammainc(k, (tc/alpha)**gama)  # gammainc is the regularized lower incomplete gamma
        log_vero = np.sum(np.log(pdf)) + np.sum(np.log(sf))  # log-likelihood
        return -log_vero

    res_gg = minimize(neg_log_Gama_Generalizada, [1, 1, 1], method='Nelder-Mead')
    res_gg.x
    

    array([15.55077716, 0.03200795, 17.09975645])

    opened by MichelMiler 0
  • Feature Request: Cause-Specific Hazards Models

    From this open issue, it seems there isn't much support for competing risks models in lifelines. I find myself working on a competing risks problem for which I'd like to use a cause-specific hazards model. As mentioned in this source:

    Cause-specific hazard models can be fit in any statistical software package that permits estimation of the conventional Cox proportional hazards model. One simply treats those subjects who experience a competing event as being censored at the time of the occurrence of the competing event.

    So actually achieving this model using two instances of CoxPHFitter isn't horrendous, but it's a bit of a pain not to be able to call, e.g., a single survival function. It seems like the implementation could be straightforward based on that quote. Is there any interest in a contribution adding such a model to lifelines?
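
    For concreteness, a sketch of the two-fitter workaround implied by the quote (toy data; a built-in model would wrap this behind a single interface):

    import pandas as pd
    from lifelines import CoxPHFitter

    df = pd.DataFrame({
        "T": [5, 6, 7, 8, 4, 9, 3, 10],
        "event_type": [1, 2, 0, 1, 2, 1, 0, 2],  # 0 = censored; 1, 2 = competing events
        "age": [50, 61, 45, 38, 66, 52, 49, 58],
    })

    def fit_cause_specific(df, cause):
        # treat subjects who experience a competing event as censored at that time
        d = df.assign(E=(df["event_type"] == cause).astype(int)).drop(columns="event_type")
        return CoxPHFitter().fit(d, duration_col="T", event_col="E")

    models = {cause: fit_cause_specific(df, cause) for cause in (1, 2)}
    print(models[1].params_)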

    opened by anthonymichaelclark 1
  • KaplanMeierFitter: Index Error when adding at_risk_counts

    Python 3.8 (conda env), lifelines 0.27.1

    Using the intro on the docs website: https://lifelines.readthedocs.io/en/latest/Survival%20analysis%20with%20lifelines.html

    kmf = KaplanMeierFitter().fit(T, E, label="all_regimes")
    kmf.plot_survival_function(at_risk_counts=True)
    plt.tight_layout()
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    Input In [88], in <cell line: 2>()
          1 kmf = KaplanMeierFitter().fit(T, E, label="all_regimes")
    ----> 2 kmf.plot_survival_function(at_risk_counts=True)
          3 plt.tight_layout()
    
    File ~/.conda/envs/survival/lib/python3.8/site-packages/lifelines/fitters/kaplan_meier_fitter.py:453, in KaplanMeierFitter.plot_survival_function(self, **kwargs)
        451 """Alias of ``plot``"""
        452 if not CensoringType.is_interval_censoring(self):
    --> 453     return _plot_estimate(self, estimate="survival_function_", **kwargs)
        454 else:
        455     # hack for now.
        456     def safe_pop(dict, key):
    
    File ~/.conda/envs/survival/lib/python3.8/site-packages/lifelines/plotting.py:961, in _plot_estimate(cls, estimate, loc, iloc, show_censors, censor_styles, ci_legend, ci_force_lines, ci_only_lines, ci_no_lines, ci_alpha, ci_show, at_risk_counts, logx, ax, **kwargs)
        950         plot_estimate_config.ax.fill_between(
        951             x,
        952             lower,
       (...)
        957             step=step,
        958         )
        960 if at_risk_counts:
    --> 961     add_at_risk_counts(cls, ax=plot_estimate_config.ax)
        962     plt.tight_layout()
        964 return plot_estimate_config.ax
    
    File ~/.conda/envs/survival/lib/python3.8/site-packages/lifelines/plotting.py:512, in add_at_risk_counts(labels, rows_to_show, ypos, xticks, ax, at_risk_count_from_start_of_period, *fitters, **kwargs)
        505     event_table_slice = f.event_table.assign(at_risk=lambda x: x.at_risk - x.removed)
        507 event_table_slice = (
        508     event_table_slice.loc[:tick, ["at_risk", "censored", "observed"]]
        509     .agg({"at_risk": lambda x: x.tail(1).values, "censored": "sum", "observed": "sum"})  # see #1385
        510     .rename({"at_risk": "At risk", "censored": "Censored", "observed": "Events"})
        511 )
    --> 512 tmp = [int(c) for c in event_table_slice.loc[rows_to_show]]
        513 print(tmp)
        514 counts.extend([int(c) for c in event_table_slice.loc[rows_to_show]])
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:879, in _LocationIndexer.__getitem__(self, key)
        876 axis = self.axis or 0
        878 maybe_callable = com.apply_if_callable(key, self.obj)
    --> 879 return self._getitem_axis(maybe_callable, axis=axis)
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1099, in _LocIndexer._getitem_axis(self, key, axis)
       1096     if hasattr(key, "ndim") and key.ndim > 1:
       1097         raise ValueError("Cannot index with multidimensional key")
    -> 1099     return self._getitem_iterable(key, axis=axis)
       1101 # nested tuple slicing
       1102 if is_nested_tuple(key, labels):
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1037, in _LocIndexer._getitem_iterable(self, key, axis)
       1034 self._validate_key(key, axis)
       1036 # A collection of keys
    -> 1037 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
       1038 return self.obj._reindex_with_indexers(
       1039     {axis: [keyarr, indexer]}, copy=True, allow_dups=True
       1040 )
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1254, in _LocIndexer._get_listlike_indexer(self, key, axis, raise_missing)
       1251 else:
       1252     keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
    -> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
       1255 return keyarr, indexer
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1298, in _LocIndexer._validate_read_indexer(self, key, indexer, axis, raise_missing)
       1296 if missing == len(indexer):
       1297     axis_name = self.obj._get_axis_name(axis)
    -> 1298     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
       1300 # We (temporarily) allow for some missing keys with .loc, except in
       1301 # some cases (e.g. setting) in which "raise_missing" will be False
       1302 if raise_missing:
    
    KeyError: "None of [Index(['At risk', 'Censored', 'Events'], dtype='object')] are in the [index]"
    
    opened by tobiasweede 5
  • survival_difference_at_fixed_point_in_time_test documentation

    Hey everyone, newbie here!

    The survival_difference_at_fixed_point_in_time_test documentation does not explain that it performs a test using the chi-squared distribution, and the example could be improved by adding an interpretation of the result.
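
    Something like the following could anchor the docs example (toy data; the test statistic is chi-squared distributed, and a small p-value suggests the survival fractions at the chosen time differ):

    from lifelines import KaplanMeierFitter
    from lifelines.statistics import survival_difference_at_fixed_point_in_time_test

    kmf_a = KaplanMeierFitter().fit([3, 5, 7, 9, 11], [1, 1, 0, 1, 1], label="A")
    kmf_b = KaplanMeierFitter().fit([2, 4, 4, 6, 8], [1, 1, 1, 1, 0], label="B")

    result = survival_difference_at_fixed_point_in_time_test(5.0, kmf_a, kmf_b)
    print(result.test_statistic, result.p_value)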

    docs 
    opened by nasserboan 1
Releases(v0.27.4)
  • v0.27.4(Nov 17, 2022)

  • v0.27.3(Sep 25, 2022)

    0.27.3

    New features
    • Fixed and silenced a lot of warnings
    Bug fixes
    • Migrate to newer Pandas Styler for to_latex
    API Changes
    • There were way too many functions on the summary objects, so I've hidden to_* on them.
    Source code(tar.gz)
    Source code(zip)
  • v0.27.2(Sep 8, 2022)

  • v0.27.1(Jun 26, 2022)

    0.27.1 - 2022-06-26

    New features
    • all fit_ methods now accept a fit_options dict that allows one to pass kwargs to the underlying fitting algorithm.
    API Changes
    • step_size is removed from Cox models' fit. See fit_options above.
    Bug fixes
    • fixed Cox models when a "trivial" matrix was passed in (one with no covariates)
    Source code(tar.gz)
    Source code(zip)
  • v0.27.0(Mar 15, 2022)

    0.27.0 - 2022-03-15

    Dropping Python3.6 support.

    Bug fixes
    • Fix late entry in add_at_risk_counts.
    New features
    • add_at_risk_counts has a new flag to determine to use start or end-of-period at risk counts.
    • new column in fitters' summary that displays the value the parameter is being compared against.
    API Changes
    • plot_lifetimes's duration arg now has the interpretation of "relative time the subject died (since birth)", instead of the old "time observed for". These interpretations differ when there is late entry.
    Source code(tar.gz)
    Source code(zip)
  • v0.26.4(Nov 30, 2021)

  • v0.26.3(Sep 16, 2021)

  • v0.26.2(Sep 15, 2021)

  • v0.26.1(Sep 15, 2021)

    0.26.1 - 2021-09-15

    API Changes
    • t_0 in logrank_test now will not remove data, but will instead censor all subjects that experience the event afterwards.
    • updated the status column in lifelines.datasets.load_lung to more standard coding: 0 is censored, 1 is event.
    Bug fixes
    • Fix using formulas with AalenAdditiveFitter.predict_cumulative_hazard
    • Fix using formulas with CoxPHFitter.score
    Source code(tar.gz)
    Source code(zip)
  • 0.26.0(May 27, 2021)

    0.26.0 - 2021-05-26

    New features
    • .BIC_ is now present on fitted models.
    • CoxPHFitter with spline baseline can accept pre-computed knot locations.
    • Left-censoring fitting in KaplanMeierFitter is now "expected". That is, predict always predicts the survival function (as does every other model), confidence_interval_ is always the CI for the survival function (as in every other model), and so on. In summary: the API for estimates doesn't change depending on the type of censoring in your dataset.
    Bug fixes
    • Fixed an annoying bug where at-risk table labels were not aligning properly when data spanned large ranges. See merging PR for details.
    • Fixed a bug in find_best_parametric_model where the wrong BIC value was being computed.
    • Fixed regression bug when using an array as a penalizer in Cox models.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.11-2(Apr 13, 2021)

    0.25.11 - 2021-04-06

    A previous release (on Github) was missing correct metadata and was deleted.

    Bug fixes
    • Fix integer-valued categorical variables in regression model predictions.
    • numpy > 1.20 is allowed.
    • Bug fix in the elastic-net penalty for Cox models that wasn't weighting the terms correctly.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.10(Mar 3, 2021)

  • v0.25.9(Feb 5, 2021)

  • v0.25.8(Jan 22, 2021)

    0.25.8 - 2021-01-22

    Important: we dropped Patsy as our formula framework and adopted Formulaic. While the latter is less mature than Patsy, we feel the core capabilities are satisfactory and it provides new opportunities.

    New features
    • Parametric models with formulas are able to be serialized now.
    • a _scipy_callback function is available to use in fitting algorithms.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.7(Dec 9, 2020)

    0.25.7 - 2020-12-09

    API Changes
    • Adding cumulative_hazard_at_times to NelsonAalenFitter
    Bug fixes
    • Fixed error in CoxPHFitter when entry time == event time.
    • Fixed formulas in AFT interval censoring regression.
    • Fixed concordance_index_ when no events observed
    • Fixed label being overwritten in ParametricUnivariate models
    Source code(tar.gz)
    Source code(zip)
  • v0.25.6(Oct 26, 2020)

    0.25.6 - 2020-10-26

    New features
    • Parametric Cox models can now handle left and interval censored datasets.
    Bug fixes
    • "improved" the output of add_at_risk_counts by removing a call to plt.tight_layout() - this works better when you are calling add_at_risk_counts on multiple axes, but it is recommended you call plt.tight_layout() at the very end of your script.
    • Fix bug in KaplanMeierFitter's interval censoring where max(lower bound) < min(upper bound).

    Source code(tar.gz)
    Source code(zip)
  • v0.25.5(Sep 25, 2020)

    0.25.5 - 2020-09-23

    API Changes
    • check_assumptions now returns a list of lists of axes that can be manipulated
    Bug fixes
    • fixed error when using plot_partial_effects with categorical data in AFT models
    • improved warning when Hessian matrix contains NaNs.
    • fixed performance regression in interval censoring fitting in parametric models
    • weights weren't being applied properly in the NPMLE
    Source code(tar.gz)
    Source code(zip)
  • v0.25.4(Aug 26, 2020)

    0.25.4 - 2020-08-26

    New features
    • New baseline estimator for Cox models: piecewise
    • Performance improvements for parametric models' log_likelihood_ratio_test() and print_summary()
    • Better step-size defaults for Cox model -> more robust convergence.
    Bug fixes
    • fix check_assumptions when using formulas.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.3(Aug 24, 2020)

    0.25.3 - 2020-08-24

    New features
    • survival_difference_at_fixed_point_in_time_test now accepts fitters instead of raw data, meaning that you can use this function on left, right or interval censored data.
    API Changes
    • See note on survival_difference_at_fixed_point_in_time_test above.
    Bug fixes
    • fix StatisticalResult printing in notebooks
    • fix Python error when calling plot_covariate_groups
    • fix dtype mismatches in plot_partial_effects_on_outcome.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.2(Aug 9, 2020)

    0.25.2 - 2020-08-08

    New features
    • Spline CoxPHFitter can now use strata.
    API Changes
    • a small parameterization change of the spline CoxPHFitter. The linear term in the spline part was moved to a new Intercept term in the beta_.
    • n_baseline_knots in the spline CoxPHFitter now refers to all knots, and not just interior knots (this was confusing to me, the author). So add 2 to n_baseline_knots to recover a model identical to the previous one.
    Bug fixes
    • fix splines CoxPHFitter when predict_hazard was called.
    • fix some exception imports I missed.
    • fix log-likelihood p-value in splines CoxPHFitter
    Source code(tar.gz)
    Source code(zip)
  • v0.25.1(Aug 1, 2020)

    0.25.1 - 2020-08-01

    Bug fixes
    • ok actually ship the out-of-sample calibration code
    • fix labels=False in add_at_risk_counts
    • allow specific rows to be shown in add_at_risk_counts
    • put patsy as a proper dependency.
    • suppress some Pandas 1.1 warnings.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.0(Jul 27, 2020)

    0.25.0 - 2020-07-27

    New features
    • Formulas! lifelines now supports R-like formulas in regression models. See docs here.
    • plot_covariate_groups now can plot other y-values like hazards and cumulative hazards (default: survival function).
    • CoxPHFitter now accepts late entries via entry_col.
    • calibration.survival_probability_calibration now works with out-of-sample data.
    • print_summary now accepts a column argument to filter down the displayed values. This helps with clutter in notebooks, latex, or on the terminal.
    • add_at_risk_counts now follows the cool new KMunicate suggestions
    API Changes
    • With the introduction of formulas, all models can use formulas under the hood.
      • For both custom regression models and non-AFT regression models, this means that you no longer need to add a constant column to your DataFrame (instead add a 1 as a formula string in the regressors dict). You may also need to remove the T and E columns from regressors. I've updated the models in the examples folder with examples of this new model building.
    • Unfortunately, if using formulas, your model will not be able to be pickled. This is a problem with an upstream library, and I hope to have it resolved in the near future.
    • plot_covariate_groups has been deprecated in favour of plot_partial_effects_on_outcome.
    • The baseline in plot_covariate_groups has changed from the mean observation (including dummy-encoded categorical variables) to the median for ordinal (including continuous) variables and the mode for categorical variables.
    • Previously, lifelines used the label "_intercept" when it added a constant column in regressions. To align with Patsy, we now use "Intercept".
    • In AFT models, ancillary_df kwarg has been renamed to ancillary. This reflects the more general use of the kwarg (not always a DataFrame, but could be a boolean or string now, too).
    • Some column names in datasets shipped with lifelines have changed.
    • The never used "lifelines.metrics" is deleted.
    • With the introduction of formulas, plot_covariate_groups (now called plot_partial_effects_on_outcome) behaves differently for transformed variables. Users no longer need to add "derivatives" features, and encoding is done implicitly. See docs here.
    • all exceptions and warnings have moved to lifelines.exceptions
    Bug fixes
    • The p-value of the log-likelihood ratio test for the CoxPHFitter with splines was returning the wrong result because the degrees of freedom was incorrect.
    • better print_summary logic in IDEs and Jupyter exports. Previously it would not display properly.
    • p-values have been corrected in the SplineFitter. Previously, the "null hypothesis" was not coefficient=0 but coefficient=0.01; this is now set to the former.
    • fixed NaN bug in survival_table_from_events with intervals when no events would occur in an interval.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.16(Jul 9, 2020)

    0.24.16 - 2020-07-09

    New features
    • improved algorithm choice for large DataFrames in Cox models. You should see a significant performance boost.
    Bug fixes
    • fixed utils.median_survival_time not accepting Pandas Series.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.15(Jul 7, 2020)

    0.24.15 - 2020-07-07

    Bug fixes
    • fixed an edge case in KaplanMeierFitter where a really late entry would occur after the rest of the population had died.
    • fixed plot in BreslowFlemingHarringtonFitter
    • fixed bug where using conditional_after and times in CoxPHFitter("spline") prediction methods would be ignored.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.14(Jul 2, 2020)

    0.24.14 - 2020-07-02

    Bug fixes
    • fixed a bug where using conditional_after and times in prediction methods would result in a shape error
    • fixed a bug where score was not able to be used in splined CoxPHFitter
    • fixed a bug where some columns would not be displayed in print_summary
    Source code(tar.gz)
    Source code(zip)
  • v0.24.13(Jun 22, 2020)

    0.24.13 - 2020-06-22

    Bug fixes
    • fixed a bug where CoxPHFitter would ignore supplied alpha levels for confidence intervals
    • fixed a bug where CoxPHFitter would fail when working with sklearn_adapter
    Source code(tar.gz)
    Source code(zip)
  • v0.24.12(Jun 20, 2020)

  • v0.24.11(Jun 18, 2020)

    0.24.11 - 2020-06-17

    New features
    • new spline regression model CRCSplineFitter based on the paper "A flexible parametric accelerated failure time model" by Michael J. Crowther, Patrick Royston, Mark Clements.
    • new survival probability calibration tool lifelines.calibration.survival_probability_calibration to help validate regression models. Based on “Graphical calibration curves and the integrated calibration index (ICI) for survival models” by P. Austin, F. Harrell, and D. van Klaveren.
    API Changes
    • (and bug fix) scalar parameters in regression models were not being penalized by penalizer; we now penalize everything except intercept terms in linear relationships.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.10(Jun 17, 2020)

    0.24.10

    New features
    • New improvements when using splines model in CoxPHFitter - it should offer much better prediction and baseline-hazard estimation, including extrapolation and interpolation.
    API Changes
    • Related to above: the fitted spline parameters are now available in the .summary and .print_summary methods.
    Bug fixes
    • fixed a bug in initialization of some interval-censoring models -> better convergence.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.9(Jun 5, 2020)

    0.24.9 - 2020-06-05

    New features
    • Faster NPMLE for interval censored data
    • New weightings available in the logrank_test: wilcoxon, tarone-ware, peto, fleming-harrington. Thanks @sean-reed
    • new interval censored dataset: lifelines.datasets.load_mice
    Bug fixes
    • Cleared up some mislabeling in plot_loglogs. Thanks @sean-reed!
    • tuples are now able to be used as input in univariate models.
    Source code(tar.gz)
    Source code(zip)
Owner
Cameron Davidson-Pilon
CEO of Pioreactor. Former Director of Data Science @Shopify. Author of Bayesian Methods for Hackers and DataOrigami.