Epidemiology analysis package

Overview

zEpid

zEpid is an epidemiology analysis package providing easy-to-use tools for epidemiologists coding in Python 3.5+. The purpose of this library is to provide a toolset to make epidemiology e-z. A variety of calculations and plots can be generated through its functions. For a sample walkthrough of what this library is capable of, see the tutorials available at https://github.com/pzivich/Python-for-Epidemiologists

A few highlights: basic epidemiology calculations, easy creation of functional form assessment plots and effect measure plots, and causal inference tools. Implemented estimators include: inverse probability of treatment weights, inverse probability of censoring weights, inverse probability of missing weights, augmented inverse probability of treatment weights, the time-fixed g-formula, the Monte Carlo g-formula, the iterative conditional g-formula, and targeted maximum likelihood estimation (TMLE). Additionally, generalizability/transportability tools are available, including inverse probability of sampling weights, the g-transport formula, and doubly robust generalizability/transportability formulas.

If you have any requests for items to be included, please contact me and I will work on adding any requested features. You can contact me either through GitHub (https://github.com/pzivich), email (gmail: zepidpy), or twitter (@zepidpy).

Installation

Installing:

You can install zEpid using `pip install zepid`

Dependencies:

pandas >= 0.18.0, numpy, statsmodels >= 0.7.0, matplotlib >= 2.0, scipy, tabulate

Module Features

Measures

Calculate measures directly from a pandas DataFrame object. Implemented measures include: risk ratio, risk difference, odds ratio, incidence rate ratio, incidence rate difference, number needed to treat, sensitivity, specificity, population attributable fraction, and attributable community risk.

Measures can be directly calculated from a pandas DataFrame object or using summary data.
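
The summary-data versions of these measures reduce to simple 2×2-table arithmetic. A plain-Python sketch of that arithmetic (not the zEpid API; the counts are invented for illustration):

```python
# 2x2 table: a = exposed cases, b = exposed non-cases,
#            c = unexposed cases, d = unexposed non-cases

def risk_ratio(a, b, c, d):
    """Risk ratio: risk among the exposed divided by risk among the unexposed."""
    r1 = a / (a + b)  # risk in the exposed
    r0 = c / (c + d)  # risk in the unexposed
    return r1 / r0

def risk_difference(a, b, c, d):
    """Risk difference: risk among the exposed minus risk among the unexposed."""
    return a / (a + b) - c / (c + d)

# Example: 50/250 exposed cases versus 25/250 unexposed cases
print(risk_ratio(50, 200, 25, 225))       # 2.0
print(risk_difference(50, 200, 25, 225))  # 0.1
```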

Other handy features include: splines, a Table 1 generator, interaction contrasts, interaction contrast ratios, positive predictive value, negative predictive value, a screening cost analyzer, counternull p-values, and conversions between odds and proportions.
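
For example, the odds/proportion conversions are the standard identities. A plain-Python sketch (not the zEpid function signatures):

```python
def proportion_to_odds(p):
    """Convert a proportion (risk) to odds."""
    return p / (1 - p)

def odds_to_proportion(odds):
    """Convert odds back to a proportion."""
    return odds / (1 + odds)

print(proportion_to_odds(0.2))   # 0.25
print(odds_to_proportion(0.25))  # 0.2
```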

For guided tutorials with Jupyter Notebooks: https://github.com/pzivich/Python-for-Epidemiologists/blob/master/3_Epidemiology_Analysis/a_basics/1_basic_measures.ipynb

Graphics

Uses matplotlib in the background to generate some useful plots. Implemented plots include: functional form assessment (with statsmodels output), p-value function plots, spaghetti plots, effect measure plots (forest plots), receiver operating characteristic (ROC) curves, dynamic risk plots, and L'Abbé plots.

For examples see: http://zepid.readthedocs.io/en/latest/Graphics.html

Causal

The causal branch includes various estimators for causal inference with observational data. Details on currently implemented estimators are below:

G-Computation Algorithm

The current implementation includes the time-fixed exposure g-formula, the Monte Carlo g-formula, and the iterative conditional g-formula.
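
As a sketch of the idea behind the time-fixed g-formula, here is nonparametric standardization in plain Python (zEpid's implementation is model-based; this is not its API, and the toy data are invented):

```python
from collections import defaultdict

def standardized_risk(data, a):
    """Mean outcome under 'set everyone to A=a', standardized over L.

    data is a list of (A, L, Y) records with binary treatment A, binary
    outcome Y, and a discrete confounder L (all names are illustrative).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for A, L, Y in data:
        if A == a:
            totals[L] += Y
            counts[L] += 1
    # E_L[ E[Y | A=a, L] ]: average the stratum-specific risks over the
    # observed marginal distribution of L
    return sum(totals[L] / counts[L] for _, L, _ in data) / len(data)

# Toy data: 8 observations of (A, L, Y)
data = [(1, 0, 1), (1, 0, 0), (0, 0, 0), (0, 0, 0),
        (1, 1, 1), (1, 1, 1), (0, 1, 1), (0, 1, 0)]
risk_difference = standardized_risk(data, 1) - standardized_risk(data, 0)
print(risk_difference)  # 0.5
```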

Inverse Probability Weights

The current implementation includes inverse probability of treatment weights (IPTW), inverse probability of censoring weights (IPCW), and inverse probability of missing weights (IPMW). Diagnostics are also available for IPTW. IPMW supports monotone missing data.
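
The weight calculation itself can be sketched in plain Python (illustrative only; zEpid's IPTW class estimates the propensity scores from a user-specified model, and all names here are made up):

```python
def iptw_weights(treatment, propensity, stabilized=True):
    """Inverse probability of treatment weights from known propensity scores.

    In practice the propensity scores come from a fitted model; here they
    are passed in directly to keep the sketch self-contained.
    """
    pr_treated = sum(treatment) / len(treatment)  # marginal Pr(A=1), for stabilization
    weights = []
    for a, ps in zip(treatment, propensity):
        if a == 1:
            numerator, denominator = pr_treated, ps
        else:
            numerator, denominator = 1 - pr_treated, 1 - ps
        weights.append(numerator / denominator if stabilized else 1 / denominator)
    return weights

A = [1, 1, 0, 0]
ps = [0.8, 0.5, 0.5, 0.2]
print(iptw_weights(A, ps))  # [0.625, 1.0, 1.0, 0.625]
```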

Augmented Inverse Probability Weights

The current implementation includes the augmented IPTW estimator described by Funk et al. (2011, AJE).

Targeted Maximum Likelihood Estimator

TMLE can be estimated through a standard logistic regression model or through user-input functions. Alternatively, users can input machine learning algorithms to estimate the probabilities. Supported machine learning algorithms include those from sklearn.

Generalizability / Transportability

For generalizing results or transporting them to a different target population, several estimators are available. These include inverse probability of sampling weights, the g-transport formula, and doubly robust formulas.

Tutorials for the usage of these estimators are available at: https://github.com/pzivich/Python-for-Epidemiologists/tree/master/3_Epidemiology_Analysis/c_causal_inference

G-estimation of Structural Nested Mean Models

Single time-point g-estimation of structural nested mean models is supported.

Sensitivity Analyses

Includes a trapezoidal distribution generator and corrected risk ratios.
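
The corrected risk ratio follows the standard simple bias analysis formula for an unmeasured binary confounder. A plain-Python sketch of that formula (parameter names are illustrative and may differ from zEpid's exact signature):

```python
def corrected_rr(rr_obs, rr_cd, p1, p0):
    """Simple bias analysis for an unmeasured binary confounder.

    rr_obs -- observed (confounded) risk ratio
    rr_cd  -- confounder-disease risk ratio
    p1, p0 -- prevalence of the confounder among the exposed / unexposed
    """
    bias = (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)
    return rr_obs / bias

# Equal confounder prevalence in both groups -> no confounding, RR unchanged
print(corrected_rr(2.0, 3.0, 0.5, 0.5))  # 2.0
# Confounder more common among the exposed -> observed RR is biased upward
print(corrected_rr(2.0, 2.0, 0.8, 0.2))
```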

Tutorials are available at: https://github.com/pzivich/Python-for-Epidemiologists/tree/master/3_Epidemiology_Analysis/d_sensitivity_analyses

Comments
  • Confusing about the time in MonteCarloGFormula

    Hi, I am confused about how to use the time and history data. It seems that we should fit a model with data from questionnaires k-1 and k-2, but I couldn't find such a code implementation. I found you used data from time k only while fitting your model.

    #176 exposure_model #207 outcome_model #274 add_covariate_model

    question 
    opened by Jnnocent 14
  • SingleCrossFit `invalid value encountered in log`

    @pzivich, when using SingleCrossfitTMLE for a continuous outcome with the sm.Gaussian GLM class, I encountered the following error:

    xxx/lib/python3.7/site-packages/zepid/causal/doublyrobust/crossfit.py:1663: RuntimeWarning: invalid value encountered in log
      log = sm.GLM(ys, np.column_stack((h1ws, h0ws)), offset=np.log(probability_to_odds(py_os)),
    xxx/lib/python3.7/site-packages/zepid/causal/doublyrobust/crossfit.py:1669: RuntimeWarning: invalid value encountered in log
      ystar0 = np.append(ystar0, logistic.cdf(np.log(probability_to_odds(py_ns)) - epsilon[1] / pa0s))
    

    Here is how I defined the estimators for the super learner, as well as the parameter input.

    link_i = sm.genmod.families.links.identity()
    SL_glm = GLMSL(family=sm.families.family.Gaussian(link=link_i))

    sctmle = SingleCrossfitTMLE(dataset=df, exposure='treatment', outcome='paid_amt', continuous_bound=0.01)
    sctmle.exposure_model('gender_cd_F + prospective_risk + age_nbr', GLMSL(family=sm.families.family.Binomial()), bound=[0.01, 0.99])
    sctmle.outcome_model('gender_cd_F + prospective_risk + age_nbr', SL_glm)
    sctmle.fit(n_splits=2, n_partitions=3, random_state=12345, method='median')
    sctmle.summary()
    

    If I use any other ML estimator, such as Lasso, GBM, or RandomForest from sklearn, for the outcome model, it works fine. The error only occurs with the GLMSL family.

    Could you share any idea of the reason of this error and how I can fix this issue? Much appreciated!

    opened by miaow27 8
  • Add G-formula

    One lofty goal is to implement the g-formula. Would need to code two versions: time-fixed and time-varying. The chapter by Robins & Hernán is a good reference. I have code that implements the g-formula using pandas. It is reasonably fast.

    TODO: generalize to a class, allow input models then predict, need to determine how to allow users to input custom treatment regimes (all/none/natural course are easy to do), compare results (https://www.ncbi.nlm.nih.gov/pubmed/25140837)

    Time-fixed version will be relatively easy to write up

    Time-varying will need the ability to specify a large amount of models and specify the order in which the models are fit.

    Note: I am also considering reorganizing in v0.2.0 so that IPW/g-formula/doubly robust will all be contained within a folder called causal, rather than adding to the current ipw folder.

    enhancement 
    opened by pzivich 8
  • generalize branch

    In the future, I think a zepid.causal.generalize branch would be a useful addition. This branch would contain some generalizability and transportability tools. Specifically, the g-transport formula, inverse probability of sampling weights, inverse odds of sampling weights, and doubly robust generalizability.

    Generally, I think I can repurpose a fair bit of the existing code. I need to consider how best to handle the distinction between generalizability (sample from target population) and transportability (sample not from target population). I am imagining that the user would pass in two data sets, the data to estimate the model on, and the other data set to generalize to.

    As far as I know, these tools are largely lacking in all other commonly used epi software. Plus, this is becoming an increasingly hot topic in epi (and I think it will catch on more widely once people recognize you can go from your biased sample to a full population under an additional set of assumptions).

    Resources:

    • g-transport and IPSW estimators: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5466356/

    • inverse odds of sampling: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860052/

    • Doubly Robust estimator: https://arxiv.org/pdf/1902.06080.pdf

    • General introduction: https://arxiv.org/pdf/1805.00550.pdf

    Notes: Some record of Issa's presentation at CIRG. This is the more difficult doubly robust estimator. It applies when only a subset of the cohort has some measure needed for transportability. Rather than throwing out the individuals who don't have X2 measured, you can use the process in the arXiv paper. For the nested-nested population, the robust estimator has three parts. b_a(X_1, S) is harder to estimate, but you can use the following algorithm:

    1. model Y as a function of X={X1, X2} among S=1 and A=a

    2. Predict \hat{Y} among those with D=1

    3. Model \hat{Y} as X1, S in D=1

    4. Predict \hat{Y*} in D={0, 1}

    It is also hard to estimate Pr(S=1|X) because X2 is only observed for a subset. Can use an m-estimator to obtain it. Can do this by a weighted regression with weight 1 for D=1 & S=1 and 1/C(X1, S) for D=1 & S=0. This is a little less obvious to me but seems doable.

    enhancement Short-term Causal inference 
    opened by pzivich 7
  • TMLE & Machine Learning

    TMLE is not guaranteed to attain nominal coverage when used with machine learning. A simulation paper showing major problems is: https://arxiv.org/abs/1711.07137. As a result, I don't feel that TMLE can continue to be supported with machine learning, especially since it implies the confidence intervals are way too narrow (sometimes resulting in 0% coverage). I know this is a divergence from R's tmleverse, but I would rather enforce best practices/standards than allow incorrect use of methods.

    Due to this issue, I will be dropping support for TMLE with machine learning. In place of this, I plan on adding CrossfitTMLE which will support machine learning approaches. The crossfitting will result in valid confidence intervals / inference.

    Tentative plan:

    • In v0.8.0, TMLE will throw a warning when using the custom_model argument.

    • Once the Crossfit-AIPW and Crossfit-TMLE are available (v0.9.0), TMLE will lose that functionality. If users want to use TMLE with machine learning, they will need to use a prior version

    bug change Short-term Causal inference 
    opened by pzivich 6
  • G-estimation of Structural Nested Models

    Add SNMs to the zepid.causal branch. After this addition, all of Robins' g-methods will be implemented in zEpid.

    SNMs are discussed in the Causal Inference book (https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) and The Chapter. SAS code for search-based and closed-form solvers is available at the site. Ideally both will be implemented. Will start with the time-fixed estimator.

    enhancement Long-term wishlist Causal inference 
    opened by pzivich 6
  • v0.8.0

    Update for version 0.8.0. Below are listed the planned additions

    • [x] ~Update how the weights argument works in applicable causal models (IPTW, AIPW, g-formula)~ #106 No longer using this approach

    Inverse Probability of Treatment Weights

    • [x] Changing entire structure #102

    • [x] Figure out why new structure is giving slightly different values for positivity calculations...

    • [x] Add g-bounds option to truncate weights

    • [x] Update tests for new structure

    • [x] Weight argument behaves slightly different (diagnostics now available for either IPTW alone or with other weights)

    • [x] New summary function for results

    • [x] ~Allowing for continuous treatments in estimation of MSM~ ...for later

    • [x] ~Plots available for binary or continuous treatments~ ...for later

    Inverse Probability of Censoring Weights

    • [x] ~Correction for pooled logistic weight calculation with late-entry occurring~ Raise ValueError if late-entry is detected. The user will need to do some additional work

    • [x] Create better documentation for when late-entry occurs for this model

    G-formula

    • [x] Add diagnostics (prediction accuracy of model)

    • [x] Add run_diagnostics()

    Augmented IPTW

    • [x] Add g-bounds

    • [x] Add diagnostics for weights and outcome model

    • [x] Add run_diagnostics()

    TMLE

    • [x] New warning when using machine learning algorithms to estimate nuisance functions #109

    • [x] Add diagnostics for weights and outcome model

    • [x] Add run_diagnostics()

    S-value

    • [x] Add calculator for S-values, a (potentially) more informative measure than p-values #107

    ReadTheDocs Documentation

    • [x] Add S-value

    • [x] Update IPTW

    • [x] Make sure run_diagnostics() and bound are sufficiently explained

    opened by pzivich 5
  • refactor spline so an anonymous function can be returned for use elsewhere

    Previously my code might look like:

    rossi_with_splines[['age0', 'age1']] = spline(rossi_with_splines, var='age', term=2, restricted=True)
    cph = CoxPHFitter().fit(rossi_with_splines.drop('age', axis=1), 'week', 'arrest')
    
    # this part is nasty
    df = rossi_with_splines.drop_duplicates('age').sort_values('age')[['age', 'age0', 'age1']].set_index('age')
    (df * cph.hazards_[['age0', 'age1']].values).sum(1).plot()
    

    vs

    spline_transform, _ = create_spline_transform(df['age'], term=2, restricted=True)
    rossi_with_splines[['age0', 'age1']] = spline_transform(rossi_with_splines['age'])
    
    cph = CoxPHFitter().fit(rossi_with_splines.drop('age', axis=1), 'week', 'arrest')
    
    ages_to_consider = np.arange(20, 45)
    y = spline_transform(ages_to_consider).dot(cph.hazards_[['age0', 'age1']].values)
    plot(ages_to_consider, y)
    
    opened by CamDavidsonPilon 5
  • v0.5.0

    Features to be implemented:

    • [x] Replace AIPW with the more specific AIPTW #57

    • [x] Add support for monotone IPMW #55

    • [ ] ~~Add support for nonmonotone IPMW #55~~ As I have read further into this, it gets a little complicated (even for the unconditional scenario). Will save for later implementation

    • [ ] Add support for continuous treatments in TimeFixedGFormula #49

    • [ ] ~~Add stratify option to measures #56~~

    • [x] TMLE with continuous outcomes #39

    • [x] TMLE with missing data #39 (only applies to missing outcome data)

    • [ ] ~~Add support for stochastic interventions into TMLE #52~~ Above two changes to TMLE will take precedence. Stochastic treatments are to be added later

    • [ ] ~~Add support for permutation weighting (TBD depending on complexity)~~ Will open a new branch for this project. No idea on how long implementation may take

    • [x] Incorporate random error in MonteCarloRR

    Maintenance

    • [x] Add Python 3.7 support

    • [x] Check to see if matplotlib 3 breaks anything. LGTM via test_graphics_manual.py

    • [x] Magic-g warning updates for g-formula #63

    opened by pzivich 5
  • Add interference

    Later addition, but since statsmodels 0.9.0 has GLIMMIX, I would like to add something to deal with interference for the causal branch. I don't have any part of this worked out, so I will need to take some time to really learn what is happening in these papers

    References: https://www.ncbi.nlm.nih.gov/pubmed/21068053 https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.12184 https://github.com/bsaul/inferference

    Branch plan:

    ---causal
          |
          -interference
    

    Verification: inferference the R package has some datasets that I can compare results with

    Other: Will need to update requirements to need statsmodels 0.9.0

    enhancement Long-term wishlist Causal inference 
    opened by pzivich 5
  • Enhancements to Monte-Carlo g-formula

    As noted in #73 and #77 there are some further optional enhancements I can add to MonteCarloGFormula

    Items to add:

    • [x] Censoring model

    • [ ] Competing risk model

    Testing:

    • [x] Test censoring model works as intended (compare to Keil 2014)

    • [ ] Test competing risks. May be easiest to simulate up a quick data set to compare. Don't have anything on hand

    The updates to Monte-Carlo g-formula will be added to a future update (haven't decided which version they will make it into)

    Optional:

    • [x] Reduce memory burden of unneeded replicates

    I sometimes run into a MemoryError when replicating Keil et al 2014 with many resamples. A potential way out of this is to "throw away" the observations that are not the final observation for that individual. Can add option low_memory=True to throw out those unnecessary observations. User could return the full simulated dataframe with False.

    enhancement 
    opened by pzivich 4
  • Unable to install latest 0.9.0 version through pip

    Using the latest version of pip (22.2.2), I am unable to install the most recent zEpid 0.9.0 release on Python 3.7.0:

    pip install -Iv zepid==0.9.0

    ERROR: Could not find a version that satisfies the requirement zepid==0.9.0 (from versions: 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.5, 0.1.6, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.8.2)
    ERROR: No matching distribution found for zepid==0.9.0

    opened by aidanberry12 7
  • Saving DAGs programmatically

    I had corresponded with @pzivich over email and am posting our communication here for the benefit of other users.

    JD.

    Is it possible to program saving figures of directed acyclic graphs (DAGs) using zEpid? E.g. using the M-bias DAG code in the docs, typing plt.savefig('dag.png') only saves a blank PNG. To save it to disk, I'd need to plot the figure then manually click-and-point on the pop-up to save it.

    PZ.

    Unfortunately, saving the drawn DAGs isn't too easy. In the background, I use NetworkX to organize and plot the diagram. NetworkX uses matplotlib, but it doesn't return the matplotlib axes object. So while you can tweak parts of the graph in various ways, NetworkX doesn't allow you to directly access the drawn part of the image. Normally this isn't a problem, but when it gets wrapped up in a class object that returns the matplotlib axes (which is what DirectedAcyclicGraph.draw_dag(...) does) it can lead to the issues you noted.

    Currently, the best work-around is to generate the image by hand. Below is some code that should do the trick to match what is output by DirectedAcyclicGraph

    import networkx as nx
    import matplotlib.pyplot as plt
    from zepid.causal.causalgraph import DirectedAcyclicGraph
    
    dag = DirectedAcyclicGraph(exposure='X', outcome="Y")
    dag.add_arrows((('X', 'Y'),
                    ('U1', 'X'), ('U1', 'B'),
                    ('U2', 'B'), ('U2', 'Y')
                   ))
    
    fig = plt.figure(figsize=(6, 5))
    ax = plt.subplot(1, 1, 1)
    positions = nx.spectral_layout(dag.dag)
    nx.draw_networkx(dag.dag, positions, node_color="#d3d3d3", node_size=1000, edge_color='black',
                     linewidths=1.0, width=1.5, arrowsize=15, ax=ax, font_size=12)
    plt.axis('off')
    plt.savefig("filename.png", format='png', dpi=300)
    plt.close()
    
    

    Thanks Paul for the advice!

    For the longer term, it seems useful to build this or something similar into zEpid graphics to programmatically save (complex) DAGs in Python for publication. Possibly using position values from DAGs generated in dagitty, which is handy for quickly graphing and analysing complex DAGs. Just a thought.

    Cheers

    opened by joannadiong 11
  • Add Odds Ratio and other estimands for AIPTW and TMLE

    Currently, AIPTW only returns the RD and RR. TMLE returns those as well as the OR. I should add support for the OR with AIPTW (even though I am not a huge fan of the OR when we have nicer estimands).

    I should also add support for all / none, and things like ATT and ATU for TMLE and AIPTW both. Basically I need to look up the influence curve info in the TMLE book(s)

    enhancement Causal inference 
    opened by pzivich 0
  • MonteCarloGFormula

    Currently you need to set np.random.seed outside of the function for reproducibility (which isn't good). I should use a RandomState approach similar to the one the cross-fit estimators use.
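
The suggested pattern, sketched with the standard library's random.Random (zEpid would presumably use NumPy's RandomState, but the idiom of passing a seeded generator object instead of setting global state is the same):

```python
import random

def simulate_draws(n, generator=None):
    """Draw n uniform values from a caller-supplied generator object,
    rather than from the seeded global state."""
    rng = generator if generator is not None else random.Random()
    return [rng.random() for _ in range(n)]

# Reproducible without touching random.seed / np.random.seed globally
a = simulate_draws(3, random.Random(12345))
b = simulate_draws(3, random.Random(12345))
print(a == b)  # True
```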

    bug Causal inference 
    opened by pzivich 0
  • Update documentation (and possibly re-organize)

    I wrote most of the ReadTheDocs documentation 2-3 years ago now. It is dated (and my understanding has expanded), so I should go back and review everything after the v0.9.0 release

    Here are some things to consider

    • Use a different split than time-fixed and time-varying exposures
    • Add a futures section (rather than having embedded in documents)
    • Update the LIPTW / SIPTW info (once done)
    • Replace Chat Gitter button with GitHub Discussions
    • Add SuperLearner page to docs
    enhancement help wanted Website 
    opened by pzivich 2
Releases
  • latest-version(Oct 23, 2022)

  • v0.9.0(Dec 30, 2020)

    The 0.9.x series drops support for Python 3.5.x. Only Python 3.6+ is now supported. Support has also been added for Python 3.8.

    Cross-fit estimators have been implemented for better causal inference with machine learning. Cross-fit estimators include SingleCrossfitAIPTW, DoubleCrossfitAIPTW, SingleCrossfitTMLE, and DoubleCrossfitTMLE. Currently, functionality is limited to treatment and outcome nuisance models only (i.e., no model for missing data). These estimators also do not accept weighted data (since most of sklearn does not support weights).

    Super-learner functionality has been added via SuperLearner. Additions also include the empirical mean (EmpiricalMeanSL), generalized linear models (GLMSL), and step-wise backward/forward selection via AIC (StepwiseSL). These new estimators are wrappers that are compatible with SuperLearner and mimic some of the R SuperLearner functionality.

    Directed Acyclic Graphs have been added via DirectedAcyclicGraph. These analyze the graph for sufficient adjustment sets, and can be used to display the graph. These rely on an optional NetworkX dependency.

    AIPTW now supports the custom_model optional argument for user-input models. This is the same as TMLE now.

    zipper_plot function for creating zipper plots has been added.

    Housekeeping: bound has been updated to new procedure, updated how print_results displays to be uniform, created function to check missingness of input data in causal estimators, added warning regarding ATT and ATU variance for IPTW, and added back observation IDs for MonteCarloGFormula

    Future plans: TimeFixedGFormula will be deprecated in favor of two estimators with different labels. This will more clearly delineate ATE versus stochastic effects. The replacement estimators are to be added

  • v0.8.1(Oct 3, 2019)

    Added support for pygam's LogisticGAM for TMLE with custom models (Thanks darrenreger!)

    Removed warning for TMLE with custom models following updates to Issue #109 I plan on creating a smarter warning system that flags non-Donsker class machine learning algorithms and warns the user. I still need to think through how to do this.

  • v0.8.0(Jul 17, 2019)

    Major changes to IPTW. IPTW now supports calculation of a marginal structural model directly.

    Greater support for censored data in IPTW, AIPTW, and GEstimationSNM

    Addition of s-values

  • v0.7.2(May 19, 2019)

  • v0.7.1(May 3, 2019)

  • v0.6.0(Mar 31, 2019)

    MonteCarloGFormula now includes a separate censoring_model() function for informative censoring. Additionally, I added a low memory option to reduce the memory burden during the Monte-Carlo procedure

    IterativeCondGFormula has been refactored to accept only data in a wide format. This allows for me to handle more complex treatment assignments and specify models correctly. Additional tests have been added comparing to R's ltmle

    There is a new branch in zepid.causal. This is the generalize branch. It contains various tools for generalizing or transporting estimates from a biased sample to the target population of interest. Options available are inverse probability of sampling weights for generalizability (IPSW), inverse odds of sampling weights for transportability (IPSW), the g-transport formula (GTransportFormula), and doubly-robust augmented inverse probability of sampling weights (AIPSW)

    RiskDifference now calculates the Fréchet probability bounds

    TMLE now allows for specified bounds on the Q-model predictions. Additionally, avoids error when predicted continuous values are outside the bounded values.

    AIPTW now has confidence intervals for the risk difference based on influence curves

    spline now uses numpy.percentile to allow for older versions of NumPy. Additionally, new function create_spline_transform returns a general function for splines, which can be used within other functions

    Lots of documentation updates for all functions. Additionally, summary() functions are starting to be updated. Currently, only stylistic changes

  • v0.4.3(Feb 8, 2019)

  • v0.3.2(Nov 5, 2018)

    MAJOR CHANGES:

    TMLE now allows estimation of risk ratios and odds ratios. Estimation procedure is based on tmle.R

    TMLE variance formula has been modified to match tmle.R rather than other resources. This is beneficial for future implementation of missing data adjustment. Also would allow for mediation analysis with TMLE (not a priority for me at this time).

    TMLE now includes an option to place bounds on predicted probabilities using the bound option. Default is to use all predicted probabilities. Either symmetrical or asymmetrical truncation can be specified.

    TimeFixedGFormula now allows weighted data as an input. For example, IPMW can be integrated into the time-fixed g-formula estimation. Estimation for weighted data uses statsmodels GEE. As a result of the difference between GLM and GEE, the check of the number of dropped data was removed.

    TimeVaryGFormula now allows weighted data as an input. For example, sampling weights can be integrated into the time-varying g-formula estimation. Estimation for weighted data uses statsmodels GEE.

    MINOR CHANGES:

    Added Sciatica Trial data set. Mertens, BJA, Jacobs, WCH, Brand, R, and Peul, WC. Assessment of patient-specific surgery effect based on weighted estimation and propensity scoring in the re-analysis of the Sciatica Trial. PLOS One 2014. Future plan is to replicate this analysis if possible.

    Added data from Freireich EJ et al., "The Effect of 6-Mercaptopurine on the Duration of Steroid-induced Remissions in Acute Leukemia: A Model for Evaluation of Other Potentially Useful Therapy" Blood 1963

    TMLE now allows general sklearn algorithms. Fixed issue where predict_proba() is used to generate probabilities within sklearn rather than predict. Looking at this, I am probably going to clean up the logic behind this and the rest of custom_model functionality in the future

    AIPW object now contains risk_difference and risk_ratio to match RiskRatio and RiskDifference classes

  • v0.3.0(Aug 27, 2018)

  • v0.2.1(Aug 13, 2018)

  • v0.2.0(Aug 7, 2018)

    BIG CHANGES:

    IPW all moved to zepid.causal.ipw. zepid.ipw is no longer supported

    IPTW, IPCW, IPMW are now their own classes rather than functions. This was done since diagnostics are easier for IPTW and the user can access items directly from the models this way.

    Addition of TimeVaryGFormula to fit the g-formula for time-varying exposures/confounders

    effect_measure_plot() is now EffectMeasurePlot() to conform to PEP

    ROC_curve() is now roc(). Also 'probability' was changed to 'threshold', since it now allows any continuous variable for threshold determinations

    MINOR CHANGES:

    Added sensitivity analysis as proposed by Fox et al. 2005 (MonteCarloRR)

    Updated Sensitivity and Specificity functionality. Added Diagnostics, which calculates both sensitivity and specificity.

    Updated dynamic risk plots to avoid a merging warning. The input timeline is converted to an integer (×100000), merged, then converted back

    Updated spline to use np.where rather than list comprehension

    Summary data calculators are now within zepid.calc.utils

    FUTURE CHANGES:

    All pandas effect/association measure calculations will be migrating from functions to classes in a future version. This will better meet PEP syntax guidelines and allow users to extract elements/print results. Still deciding on the setup for this... No changes are coming to summary measure calculators (aside from possibly name changes). Intended as part of v0.3.0

    Addition of Targeted Maximum Likelihood Estimation (TMLE). No current timeline developed

    Addition of IPW for Interference settings. No current timeline but hopefully before 2018 ends

    Further conforming to PEP guidelines (my bad)

  • v0.1.6(Jul 16, 2018)

    See CHANGELOG for the full list of details

    Briefly,

    Added the causal branch, added the time-fixed g-formula, added a doubly robust estimator, and fixed some errors

  • v0.1.5(Jul 11, 2018)

  • v0.1.3(Jul 2, 2018)

  • v0.1.2(Jun 25, 2018)

Owner
Paul Zivich
Epidemiology post-doc working in epidemiologic methods and infectious diseases.

Tyler Hayes 41 Dec 25, 2022
Py-FEAT: Python Facial Expression Analysis Toolbox

Py-FEAT is a suite for facial expressions (FEX) research written in Python. This package includes tools to detect faces, extract emotional facial expressions (e.g., happiness, sadness, anger), facial muscle movements (e.g., action units), and facial landmarks, from videos and images of faces, as well as methods to preprocess, analyze, and visualize FEX data.

Computational Social Affective Neuroscience Laboratory 147 Jan 6, 2023
[CVPR 2021 Oral] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis [arxiv|pdf|v

Yinan He 78 Dec 22, 2022
ivadomed is an integrated framework for medical image analysis with deep learning.

Repository on the collaborative IVADO medical imaging project between the Mila and NeuroPoly labs.

null 144 Dec 19, 2022
Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.

SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom

useGalaxy.eu 17 Aug 13, 2022
A static analysis library for computing graph representations of Python programs suitable for use with graph neural networks.

python_graphs This package is for computing graph representations of Python programs for machine learning applications. It includes the following modu

Google Research 258 Dec 29, 2022
Code for reproducing our analysis in the paper titled: Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

Image Crop Analysis This is a repo for the code used for reproducing our Image Crop Analysis paper as shared on our blog post. If you plan to use this

Twitter Research 239 Jan 2, 2023
Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX

SYMPAIS: Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis Overview | Installation | Documentation | Examples | Notebo

Yicheng Luo 4 Sep 13, 2022