Epidemiology analysis package

Paul Zivich

Last update: Jan 8, 2023

Related tags

Deep Learning data-science tmle epidemiology odds-ratio g-formula risk-ratio ipw inverse-probability-weights epidemiology-analysis risk-difference aipw g-computation targeted-maximum-likelihood incidence-rate-ratio g-estimation

Overview

zEpid

zEpid is an epidemiology analysis package, providing easy-to-use tools for epidemiologists coding in Python 3.5+. The purpose of this library is to provide a toolset to make epidemiology e-z. A variety of calculations and plots can be generated through various functions. For a sample walkthrough of what this library is capable of, please look at the tutorials available at https://github.com/pzivich/Python-for-Epidemiologists

A few highlights: basic epidemiology calculations, functional form assessment plots, effect measure plots, and causal inference tools. Implemented estimators include: inverse probability of treatment weights, inverse probability of censoring weights, inverse probability of missing weights, augmented inverse probability of treatment weights, time-fixed g-formula, Monte Carlo g-formula, iterative conditional g-formula, and targeted maximum likelihood (TMLE). Additionally, generalizability/transportability tools are available, including: inverse probability of sampling weights, g-transport formula, and doubly robust generalizability/transportability formulas.

If you have any requests for items to be included, please contact me and I will work on adding any requested features. You can contact me either through GitHub (https://github.com/pzivich), email (gmail: zepidpy), or twitter (@zepidpy).

Installation

Installing:

You can install zEpid using pip install zepid

Dependencies:

pandas >= 0.18.0, numpy, statsmodels >= 0.7.0, matplotlib >= 2.0, scipy, tabulate

Module Features

Measures

Calculate measures directly from a pandas dataframe object. Implemented measures include; risk ratio, risk difference, odds ratio, incidence rate ratio, incidence rate difference, number needed to treat, sensitivity, specificity, population attributable fraction, attributable community risk

Measures can be directly calculated from a pandas DataFrame object or using summary data.
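
As a rough illustration of the summary-data route (this is not zEpid's own API; the function name two_by_two_measures and the example counts are hypothetical), the sketch below computes the risk ratio, risk difference, and odds ratio from 2x2 counts with Wald-type 95% confidence limits.

```python
import math


def two_by_two_measures(a, b, c, d, z=1.96):
    """Point estimates and Wald confidence limits from a 2x2 table.

    a: exposed with outcome       b: exposed without outcome
    c: unexposed with outcome     d: unexposed without outcome
    """
    n1, n0 = a + b, c + d
    r1, r0 = a / n1, c / n0

    rr = r1 / r0
    rd = r1 - r0
    odds_ratio = (a * d) / (b * c)

    # Standard large-sample (Wald) standard errors
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    se_rd = math.sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def ratio_ci(estimate, se_log):
        # Ratio measures get their limits on the log scale
        return (estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log))

    return {
        "risk_ratio": (rr, *ratio_ci(rr, se_log_rr)),
        "risk_difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "odds_ratio": (odds_ratio, *ratio_ci(odds_ratio, se_log_or)),
    }


# Hypothetical counts: 25/100 exposed and 10/100 unexposed had the outcome
results = two_by_two_measures(a=25, b=75, c=10, d=90)
for name, (estimate, lower, upper) in results.items():
    print(f"{name}: {estimate:.3f} (95% CL: {lower:.3f}, {upper:.3f})")
```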

Other handy features include; splines, Table 1 generator, interaction contrast, interaction contrast ratio, positive predictive value, negative predictive value, screening cost analyzer, counternull p-values, convert odds to proportions, convert proportions to odds
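
Two of the conversions named above are simple enough to sketch by hand. The helper names below are hypothetical and are not taken from zEpid; the predictive values follow from Bayes' theorem given sensitivity, specificity, and prevalence.

```python
def odds_to_proportion(odds):
    """Convert odds to a proportion."""
    return odds / (1 + odds)


def proportion_to_odds(proportion):
    """Convert a proportion (strictly below 1) to odds."""
    return proportion / (1 - proportion)


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical screening test: 90% sensitive, 80% specific, 10% prevalence
ppv, npv = predictive_values(sensitivity=0.9, specificity=0.8, prevalence=0.1)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```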

For guided tutorials with Jupyter Notebooks: https://github.com/pzivich/Python-for-Epidemiologists/blob/master/3_Epidemiology_Analysis/a_basics/1_basic_measures.ipynb

Graphics

Uses matplotlib in the background to generate some useful plots. Implemented plots include; functional form assessment (with statsmodels output), p-value function plots, spaghetti plot, effect measure plot (forest plot), receiver-operator curve, dynamic risk plots, and L'Abbe plots

For examples see: http://zepid.readthedocs.io/en/latest/Graphics.html

Causal

The causal branch includes various estimators for causal inference with observational data. Details on currently implemented estimators are below:

G-Computation Algorithm

Current implementation includes; time-fixed exposure g-formula, Monte Carlo g-formula, and iterative conditional g-formula
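
To make the time-fixed idea concrete, here is a minimal sketch of the nonparametric g-formula (standardization) for a binary treatment A, a binary confounder L, and a binary outcome Y. It is not zEpid's implementation, which fits regression models; the data and the function name g_formula_risk are made up for illustration.

```python
from collections import defaultdict

# Hypothetical cell counts: (L, A, number of people, number with Y=1)
counts = [(0, 1, 20, 4), (0, 0, 60, 6), (1, 1, 60, 30), (1, 0, 20, 6)]
rows = [(l, a, y) for l, a, n, events in counts
        for y in [1] * events + [0] * (n - events)]


def g_formula_risk(rows, treatment):
    """Standardized risk: sum over l of E[Y | A=treatment, L=l] * Pr(L=l)."""
    n_by_l = defaultdict(int)     # people in each confounder stratum
    n_cell = defaultdict(int)     # people in the stratum with A == treatment
    events = defaultdict(int)     # outcomes among those people
    for l, a, y in rows:
        n_by_l[l] += 1
        if a == treatment:
            n_cell[l] += 1
            events[l] += y
    total = len(rows)
    return sum((events[l] / n_cell[l]) * (n_by_l[l] / total) for l in n_by_l)


risk_treated = g_formula_risk(rows, treatment=1)
risk_untreated = g_formula_risk(rows, treatment=0)
print("Standardized risk difference:", round(risk_treated - risk_untreated, 3))
```

In this toy data the crude risk difference (34/80 - 12/80) is larger than the standardized one because treatment is more common in the higher-risk stratum.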

Inverse Probability Weights

Current implementation includes; IP Treatment W, IP Censoring W, IP Missing W. Diagnostics are also available for IPTW. IPMW supports monotone missing data
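
A minimal sketch of inverse probability of treatment weighting, assuming a single binary confounder L so the propensity score can be estimated by stratum proportions. This is only an illustration with made-up data and a hypothetical function name (iptw_risk); zEpid's classes estimate the weights with regression models.

```python
from collections import Counter

# Hypothetical cell counts: (L, A, number of people, number with Y=1)
counts = [(0, 1, 20, 4), (0, 0, 60, 6), (1, 1, 60, 30), (1, 0, 20, 6)]
rows = [(l, a, y) for l, a, n, events in counts
        for y in [1] * events + [0] * (n - events)]


def iptw_risk(rows, treatment):
    """Weighted risk among those with A == treatment, weights = 1 / Pr(A=a | L)."""
    n_by_l = Counter(l for l, _, _ in rows)
    treated_by_l = Counter(l for l, a, _ in rows if a == 1)
    weighted_events = 0.0
    weighted_n = 0.0
    for l, a, y in rows:
        if a != treatment:
            continue
        propensity = treated_by_l[l] / n_by_l[l]          # Pr(A=1 | L=l)
        weight = 1 / propensity if a == 1 else 1 / (1 - propensity)
        weighted_events += weight * y
        weighted_n += weight
    return weighted_events / weighted_n


rd_iptw = iptw_risk(rows, 1) - iptw_risk(rows, 0)
print("IPTW risk difference:", round(rd_iptw, 3))
```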

Augmented Inverse Probability Weights

Current implementation includes the augmented-IPTW estimator described by Funk et al 2011 AJE
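
The double-robustness property of augmented IPTW can be sketched on toy data. The code below is an illustrative, hand-rolled estimator (the names aipw_risk, saturated_propensity, and so on are hypothetical, and the data are made up); it is not the zEpid class. It shows that the estimate is unchanged when either the outcome model or the propensity model is deliberately wrong, as long as the other is correct.

```python
# Hypothetical cell counts: (L, A, number of people, number with Y=1)
counts = [(0, 1, 20, 4), (0, 0, 60, 6), (1, 1, 60, 30), (1, 0, 20, 6)]
rows = [(l, a, y) for l, a, n, events in counts
        for y in [1] * events + [0] * (n - events)]

n_by_l, treated_by_l = {}, {}
for l, a, n, _ in counts:
    n_by_l[l] = n_by_l.get(l, 0) + n
    if a == 1:
        treated_by_l[l] = n
stratum_risk = {(l, a): events / n for l, a, n, events in counts}


def saturated_propensity(l):
    return treated_by_l[l] / n_by_l[l]      # correctly specified Pr(A=1 | L)


def saturated_outcome(l, a):
    return stratum_risk[(l, a)]             # correctly specified E[Y | A, L]


def wrong_propensity(l):
    return 0.5                              # ignores L entirely


def wrong_outcome(l, a):
    return 0.3                              # ignores both A and L


def aipw_risk(rows, treatment, propensity, outcome_model):
    """Mean of I(A=a) * (Y - m(L, a)) / Pr(A=a | L) + m(L, a)."""
    total = 0.0
    for l, a, y in rows:
        p_a = propensity(l) if treatment == 1 else 1 - propensity(l)
        m = outcome_model(l, treatment)
        indicator = 1 if a == treatment else 0
        total += indicator * (y - m) / p_a + m
    return total / len(rows)


both_right = aipw_risk(rows, 1, saturated_propensity, saturated_outcome)
outcome_wrong = aipw_risk(rows, 1, saturated_propensity, wrong_outcome)
propensity_wrong = aipw_risk(rows, 1, wrong_propensity, saturated_outcome)
print(both_right, outcome_wrong, propensity_wrong)
```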

Targeted Maximum Likelihood Estimator

TMLE can be estimated through a standard logistic regression model or through user-input functions. Alternatively, users can input machine learning algorithms to estimate probabilities. Supported machine learning algorithms include those from sklearn

Generalizability / Transportability

For generalizing results or transporting to a different target population, several estimators are available. These include inverse probability of sampling weights, g-transport formula, and doubly robust formulas

Tutorials for the usage of these estimators are available at: https://github.com/pzivich/Python-for-Epidemiologists/tree/master/3_Epidemiology_Analysis/c_causal_inference

G-estimation of Structural Nested Mean Models

Single time-point g-estimation of structural nested mean models is supported.

Sensitivity Analyses

Includes trapezoidal distribution generator, corrected Risk Ratio
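
A minimal sketch of how these two pieces can fit together in a Monte Carlo sensitivity analysis. It assumes the classic external-adjustment formula for a single binary unmeasured confounder and draws the confounder-outcome risk ratio from a trapezoidal distribution by inverse CDF; the function names and all input values are hypothetical, and zEpid's own routines may differ in detail.

```python
import math
import random


def trapezoidal_sample(rng, a, b, c, d):
    """Draw from a trapezoidal distribution (min a, flat top b..c, max d)."""
    height = 2.0 / (d + c - a - b)           # density on the flat part
    left_area = height * (b - a) / 2.0       # mass of the rising triangle
    right_area = height * (d - c) / 2.0      # mass of the falling triangle
    u = rng.random()
    if u < left_area:
        return a + math.sqrt(2.0 * u * (b - a) / height)
    if u <= 1.0 - right_area:
        return b + (u - left_area) / height
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / height)


def corrected_risk_ratio(rr_observed, rr_confounder, p1, p0):
    """Divide the observed RR by the bias from an unmeasured binary confounder.

    p1, p0: assumed confounder prevalence among the exposed and unexposed.
    """
    bias = (p1 * (rr_confounder - 1) + 1) / (p0 * (rr_confounder - 1) + 1)
    return rr_observed / bias


rng = random.Random(2023)                    # private, seeded generator
draws = sorted(
    corrected_risk_ratio(2.0, trapezoidal_sample(rng, 1.5, 1.7, 2.3, 2.5), p1=0.6, p0=0.4)
    for _ in range(5000)
)
median = draws[len(draws) // 2]
lower, upper = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"Corrected RR: {median:.2f} (2.5th-97.5th percentiles: {lower:.2f}, {upper:.2f})")
```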

Tutorials are available at: https://github.com/pzivich/Python-for-Epidemiologists/tree/master/3_Epidemiology_Analysis/d_sensitivity_analyses

Comments

Confused about the time in MonteCarloGFormula

Hi, I am confused about how to use the time and history data. It seems that we should fit a model using data from questionnaires k-1 and k-2, but I couldn't find such a code implementation. I found that you used only the data at time k when fitting your model.

#176 exposure_model #207 outcome_model #274 add_covariate_model
question

opened by Jnnocent 14

SingleCrossFit `invalid value encountered in log`

@pzivich, when using SingleCrossfitTMLE for a continuous outcome with the sm.Gaussian GLM class, I encountered the following error:

xxx/lib/python3.7/site-packages/zepid/causal/doublyrobust/crossfit.py:1663: RuntimeWarning: invalid value encountered in log
  log = sm.GLM(ys, np.column_stack((h1ws, h0ws)), offset=np.log(probability_to_odds(py_os)),
xxx/lib/python3.7/site-packages/zepid/causal/doublyrobust/crossfit.py:1669: RuntimeWarning: invalid value encountered in log
  ystar0 = np.append(ystar0, logistic.cdf(np.log(probability_to_odds(py_ns)) - epsilon[1] / pa0s))

Here is how I defined the estimator for the super learner, as well as the parameter input.

link_i = sm.genmod.families.links.identity()
SL_glm = GLMSL(family = sm.families.family.Gaussian(link=link_i))
GLMSL(family = sm.families.family.Binomial())

sctmle = SingleCrossfitTMLE(dataset = df, exposure='treatment', outcome='paid_amt', continuous_bound = 0.01)
sctmle.exposure_model('gender_cd_F + prospective_risk + age_nbr', GLMSL(family = sm.families.family.Binomial()), bound=[0.01, 0.99])
sctmle.outcome_model('gender_cd_F + prospective_risk + age_nbr', SL_glm)
sctmle.fit(n_splits = 2, n_partitions=3, random_state=12345, method = 'median')
sctmle.summary()

If I use any other ML estimator such as Lasso, GBM, or RandomForest from sklearn for the outcome model, it works fine. The error only occurs when using the GLMSL family.

Could you share any ideas about the cause of this error and how I can fix it? Much appreciated!

opened by miaow27 8

Add G-formula

One lofty goal is to implement the G-formula. Would need to code two versions; time-fixed and time-varying. The Chapter by Robins & Hernan is good reference. I have code that implements the g-formula using pandas. It is reasonably fast.

TODO: generalize to a class, allow input models then predict, need to determine how to allow users to input custom treatment regimes (all/none/natural course are easy to do), compare results (https://www.ncbi.nlm.nih.gov/pubmed/25140837)

Time-fixed version will be relatively easy to write up

Time-varying will need the ability to specify a large amount of models and specify the order in which the models are fit.

Note: I am also considering reorganizing in v0.2.0 so that IPW/g-formula/doubly robust will all be contained within a folder called causal, rather than adding to the current ipw folder
enhancement

opened by pzivich 8
generalize branch
In the future, I think a zepid.causal.generalize branch would be a useful addition. This branch would contain some generalizability and transportability tools. Specifically, the g-transport formula, inverse probability of sampling weights, inverse odds of sampling weights, and doubly robust generalizability.

Generally, I think I can repurpose a fair bit of the existing code. I need to consider how best to handle the distinction between generalizability (sample from target population) and transportability (sample not from target population). I am imagining that the user would pass in two data sets, the data to estimate the model on, and the other data set to generalize to.

As far as I know, these resources are largely lacking in all other commonly used epi software. Plus this is becoming an increasingly hot topic in epi (and I think it will catch on more widely once people recognize you can go from your biased sample to a full population under an additional set of assumptions)

Resources:

g-transport and IPSW estimators: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5466356/

inverse odds of sampling: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860052/

Doubly Robust estimator: https://arxiv.org/pdf/1902.06080.pdf

General introduction: https://arxiv.org/pdf/1805.00550.pdf

Notes: Some record of Issa's presentation at CIRG. This is the more difficult doubly robust estimator. It is when only a subset of the cohort has some measure needed for transportability. Rather than throwing out the individuals who don't have X2 measures, you can use the process in the arXiv paper. For the nested-nested population, the robust estimator has three parts. b_a(X_1, S) is harder to estimate but you can use the following algorithm

model Y as a function of X={X1, X2} among S=1 and A=a

Predict \hat{Y} among those with D=1

Model \hat{Y} as X1, S in D=1

Predict \hat{Y*} in D={0, 1}

Also hard to estimate Pr(S=1|X) because X2 is only observed for a subset. Can use an m-estimator to obtain it. Can do this by a weighted regression with 1 for D=1 & S=1 and 1/C(X1, S) for D=1 & S=0. This is a little less obvious to me but seems doable
enhancement Short-term Causal inference
opened by pzivich 7
TMLE & Machine Learning
TMLE is not guaranteed to attain nominal coverage when used with machine learning. A simulation paper showing major problems is: https://arxiv.org/abs/1711.07137 As a result, I don't feel like TMLE can continue to be supported with machine learning, especially since it implies the confidence intervals are way too narrow (sometimes resulting in 0% coverage). I know this is a divergence from R's tmleverse, but I would rather enforce the best practice/standards than allow incorrect use of methods

Due to this issue, I will be dropping support for TMLE with machine learning. In place of this, I plan on adding CrossfitTMLE which will support machine learning approaches. The crossfitting will result in valid confidence intervals / inference.

Tentative plan:

In v0.8.0, TMLE will throw a warning when using the custom_model argument.

Once the Crossfit-AIPW and Crossfit-TMLE are available (v0.9.0), TMLE will lose that functionality. If users want to use TMLE with machine learning, they will need to use a prior version

bug change Short-term Causal inference
opened by pzivich 6
G-estimation of Structural Nested Models

Add SNM to the zepid.causal branch. After this addition, all of Robins' g-methods will be implemented in zEpid.

SNM are discussed in the Causal Inference book (https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) and The Chapter. SAS code for search-based and closed-form solvers is available at the site. Ideally will have both implemented. Will start with time-fixed estimator
enhancement Long-term wishlist Causal inference

opened by pzivich 6
v0.8.0
Update for version 0.8.0. Below are listed the planned additions

[x] ~Update how the weights argument works in applicable causal models (IPTW, AIPW, g-formula)~ #106 No longer using this approach

Inverse Probability of Treatment Weights

[x] Changing entire structure #102

[x] Figure out why new structure is giving slightly different values for positivity calculations...

[x] Add g-bounds option to truncate weights

[x] Update tests for new structure

[x] Weight argument behaves slightly different (diagnostics now available for either IPTW alone or with other weights)

[x] New summary function for results

[x] ~Allowing for continuous treatments in estimation of MSM~ ...for later

[x] ~Plots available for binary or continuous treatments~ ...for later

Inverse Probability of Censoring Weights

[x] ~Correction for pooled logistic weight calculation with late-entry occurring~ Raise ValueError if late-entry is detected. The user will need to do some additional work

[x] Create better documentation for when late-entry occurs for this model

G-formula

[x] Add diagnostics (prediction accuracy of model)

[x] Add run_diagnostics()

Augmented IPTW

[x] Add g-bounds

[x] Add diagnostics for weights and outcome model

[x] Add run_diagnostics()

TMLE

[x] New warning when using machine learning algorithms to estimate nuisance functions #109

[x] Add diagnostics for weights and outcome model

[x] Add run_diagnostics()

S-value

[x] Add calculator for S-values, a (potentially) more informative measure than p-values #107

ReadTheDocs Documentation

[x] Add S-value

[x] Update IPTW

[x] Make sure run_diagnostics() and bound are sufficiently explained
opened by pzivich 5

refactor spline so an anonymous function can be returned for use elsewhere

Previously my code might look like:

rossi_with_splines[['age0', 'age1']] = spline(rossi_with_splines, var='age', term=2, restricted=True)
cph = CoxPHFitter().fit(rossi_with_splines.drop('age', axis=1), 'week', 'arrest')

# this part is nasty
df =rossi_with_splines.drop_duplicates('age').sort_values('age')[['age', 'age0', 'age1']].set_index('age')
(df * cph.hazards_[['age0', 'age1']].values).sum(1).plot()

spline_transform, _ = create_spline_transform(df['age'], term=2, restricted=True)
rossi_with_splines[['age0', 'age1']] = spline_transform(rossi_with_splines['age'])

cph = CoxPHFitter().fit(rossi_with_splines.drop('age', axis=1), 'week', 'arrest')

ages_to_consider = np.arange(20, 45)
y = spline_transform(ages_to_consider).dot(cph.hazards_[['age0', 'age1']].values)
plot(ages_to_consider, y)

opened by CamDavidsonPilon 5

v0.5.0
Version 0.5.0

Features to be implemented:

[x] Replace AIPW with the more specific AIPTW #57

[x] Add support for monotone IPMW #55

[ ] ~~Add support for nonmonotone IPMW #55~~ As I have read further into this, it gets a little complicated (even for the unconditional scenario). Will save for later implementation

[ ] Add support for continuous treatments in TimeFixedGFormula #49

[ ] ~~Add stratify option to measures #56~~

[x] TMLE with continuous outcomes #39

[x] TMLE with missing data #39 (only applies to missing outcome data)

[ ] ~~Add support for stochastic interventions into TMLE #52~~ Above two changes to TMLE will take precedence. Stochastic treatments are to be added later

[ ] ~~Add support for permutation weighting (TBD depending on complexity)~~ Will open a new branch for this project. No idea on how long implementation may take

[x] Incorporate random error in MonteCarloRR

Maintenance

[x] Add Python 3.7 support

[x] Check to see if matplotlib 3 breaks anything. LGTM via test_graphics_manual.py

[x] Magic-g warning updates for g-formula #63
opened by pzivich 5
Add interference
Later addition, but since statsmodels 0.9.0 has GLIMMIX, I would like to add something to deal with interference for the causal branch. I don't have any part of this worked out, so I will need to take some time to really learn what is happening in these papers

References: https://www.ncbi.nlm.nih.gov/pubmed/21068053 https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.12184 https://github.com/bsaul/inferference

Branch plan:

---causal | -interference

Verification: inferference the R package has some datasets that I can compare results with

Other: Will need to update requirements to need statsmodels 0.9.0
enhancement Long-term wishlist Causal inference
opened by pzivich 5
Enhancements to Monte-Carlo g-formula
As noted in #73 and #77 there are some further optional enhancements I can add to MonteCarloGFormula

Items to add:

[x] Censoring model

[ ] Competing risk model

Testing:

[x] Test censoring model works as intended (compare to Keil 2014)

[ ] Test competing risks. May be easiest to simulate up a quick data set to compare. Don't have anything on hand

The updates to Monte-Carlo g-formula will be added to a future update (haven't decided which version they will make it into)

Optional:

[x] Reduce memory burden of unneeded replicants

I sometimes run into a MemoryError when replicating Keil et al 2014 with many resamples. A potential way out of this is to "throw away" the observations that are not the final observation for that individual. Can add option low_memory=True to throw out those unnecessary observations. User could return the full simulated dataframe with False.
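
The "throw away" idea can be sketched without any zEpid internals: keep only the row with the largest time for each simulated individual. The column names id, t, and Y and the function name below are hypothetical stand-ins for whatever the simulated data actually contain.

```python
def keep_final_observations(rows):
    """Keep only each individual's last simulated row (largest time)."""
    final = {}
    for row in rows:
        current = final.get(row["id"])
        if current is None or row["t"] > current["t"]:
            final[row["id"]] = row
    return list(final.values())


# Hypothetical long-format simulation output: one row per person-time
simulated = [
    {"id": 1, "t": 1, "Y": 0}, {"id": 1, "t": 2, "Y": 0}, {"id": 1, "t": 3, "Y": 1},
    {"id": 2, "t": 1, "Y": 0}, {"id": 2, "t": 2, "Y": 0},
]
reduced = keep_final_observations(simulated)
print(len(simulated), "rows reduced to", len(reduced))
```

For a cumulative outcome, the final row per individual is enough to compute the risk at the end of follow-up, which is why the intermediate rows can be dropped when the full simulated history is not needed.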
enhancement
opened by pzivich 4
Unable to install latest 0.9.0 version through pip

Using the latest version of pip 22.2.2 I am unable to install the most recent zEpid 0.9.0 release on python 3.7.0

pip install -Iv zepid==0.9.0

ERROR: Could not find a version that satisfies the requirement zepid==0.9.0 (from versions: 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.5, 0.1.6, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.8.2) ERROR: No matching distribution found for zepid==0.9.0

opened by aidanberry12 7
Saving DAGs programmatically
I had corresponded with @pzivich over email and am posting our communication here for the benefit of other users.

JD.

Is it possible to program saving figures of directed acyclic graphs (DAGs) using zEpid? E.g. using the M-bias DAG code in the docs, typing plt.savefig('dag.png') only saves a blank PNG. To save it to disk, I'd need to plot the figure then manually click-and-point on the pop-up to save it.

PZ.

Unfortunately, saving the drawn DAGs isn't too easy. In the background, I use NetworkX to organize and plot the diagram. NetworkX uses matplotlib, but it doesn't return the matplotlib axes object. So while you can tweak parts of the graph in various ways, NetworkX doesn't allow you to directly access the drawn part of the image. Normally, this isn't a problem, but when it gets wrapped up in a class object that returns the matplotlib axes (which is what DirectedAcyclicGraph.draw_dag(...) does), it can lead to the issues you noted.

Currently, the best work-around is to generate the image by hand. Below is some code that should do the trick to match what is output by DirectedAcyclicGraph

import networkx as nx
import matplotlib.pyplot as plt
from zepid.causal.causalgraph import DirectedAcyclicGraph

dag = DirectedAcyclicGraph(exposure='X', outcome="Y")
dag.add_arrows((('X', 'Y'),
                ('U1', 'X'), ('U1', 'B'),
                ('U2', 'B'), ('U2', 'Y')))

fig = plt.figure(figsize=(6, 5))
ax = plt.subplot(1, 1, 1)
positions = nx.spectral_layout(dag.dag)
nx.draw_networkx(dag.dag, positions, node_color="#d3d3d3", node_size=1000, edge_color='black', linewidths=1.0, width=1.5, arrowsize=15, ax=ax, font_size=12)
plt.axis('off')
plt.savefig("filename.png", format='png', dpi=300)
plt.close()

Thanks Paul for the advice!

For the longer term, it seems useful to build this or something similar into zEpid graphics to programmatically save (complex) DAGs in Python for publication. Possibly using position values from DAGs generated in dagitty, which is handy to quickly graph and analyse complex DAGs. Just a thought.

Cheers
opened by joannadiong 11
Add Odds Ratio and other estimands for AIPTW and TMLE

Currently AIPTW only returns RD and RR. TMLE returns those and OR as well. I should add support for OR with AIPTW (even though I am not a huge fan of OR when we have nicer estimands)

I should also add support for all / none, and things like ATT and ATU for TMLE and AIPTW both. Basically I need to look up the influence curve info in the TMLE book(s)
enhancement Causal inference

opened by pzivich 0
MonteCarloGFormula

Currently you need to set np.random.seed outside of the function for reproducibility (which isn't good). I should use a RandomState approach similar to the one the cross-fit estimators use
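
The pattern being described is a generator owned by the function rather than a global seed. The sketch below uses the standard library's random.Random as a stand-in for NumPy's RandomState so that it runs without dependencies; the function name and the seed argument are hypothetical and do not reflect the actual MonteCarloGFormula signature.

```python
import random


def simulate_outcomes(probabilities, seed=None):
    """Simulate binary outcomes with a private generator.

    The caller's global random state is neither needed nor modified.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in probabilities]


probs = [0.1, 0.5, 0.9, 0.3, 0.7]
first = simulate_outcomes(probs, seed=12345)
second = simulate_outcomes(probs, seed=12345)
print(first == second)  # identical seeds give identical simulations
```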
bug Causal inference

opened by pzivich 0
Update documentation (and possibly re-organize)
I wrote most of the ReadTheDocs documentation 2-3 years ago now. It is dated (and my understanding has expanded), so I should go back and review everything after the v0.9.0 release

Here are some things to consider

Use a different split than time-fixed and time-varying exposures

Add a futures section (rather than having embedded in documents)

Update the LIPTW / SIPTW info (once done)

Replace Chat Gitter button with GitHub Discussions

Add SuperLearner page to docs

enhancement help wanted Website
opened by pzivich 2

Releases(latest-version)

latest-version(Oct 23, 2022)

Release of v0.9.1
Source code(tar.gz)
Source code(zip)
v0.9.0(Dec 30, 2020)

v0.9.0

The 0.9.x series drops support of Python 3.5.x. Only Python 3.6+ are now supported. Support has also been added for Python 3.8

Cross-fit estimators have been implemented for better causal inference with machine learning. Cross-fit estimators include SingleCrossfitAIPTW, DoubleCrossfitAIPTW, SingleCrossfitTMLE, and DoubleCrossfitTMLE. Currently functionality is limited to treatment and outcome nuisance models only (i.e. no model for missing data). These estimators also do not accept weighted data (since most of sklearn does not support weights)
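
The core of cross-fitting is that each observation's nuisance prediction comes from a model fit on a different part of the sample. A minimal sketch of that sample-splitting step follows; the function names and toy nuisance model are hypothetical, and the zEpid estimators add further steps (such as repeating over multiple partitions) that are not shown here.

```python
import random


def cross_fit_predictions(x, y, fit, n_splits=2, seed=0):
    """Out-of-fold predictions: each unit is predicted by a model fit without it."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_splits] for k in range(n_splits)]
    predictions = [None] * len(x)
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predictions[i] = predict(x[i])
    return predictions


def fit_stratum_means(x_train, y_train):
    """Toy nuisance model: mean outcome within each level of x."""
    totals, counts = {}, {}
    for xi, yi in zip(x_train, y_train):
        totals[xi] = totals.get(xi, 0) + yi
        counts[xi] = counts.get(xi, 0) + 1
    overall = sum(y_train) / len(y_train)
    # Fall back to the overall mean for levels unseen in the training split
    return lambda xi: totals[xi] / counts[xi] if xi in counts else overall


x = [i % 2 for i in range(20)]
y = [1 if i % 3 == 0 else 0 for i in range(20)]
out_of_fold = cross_fit_predictions(x, y, fit_stratum_means, n_splits=2, seed=0)
print(out_of_fold[:4])
```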

Super-learner functionality has been added via SuperLearner. Additions also include empirical mean (EmpiricalMeanSL), generalized linear model (GLMSL), and step-wise backward/forward selection via AIC (StepwiseSL). These new estimators are wrappers that are compatible with SuperLearner and mimic some of the R superlearner functionality.

Directed Acyclic Graphs have been added via DirectedAcyclicGraph. These analyze the graph for sufficient adjustment sets, and can be used to display the graph. These rely on an optional NetworkX dependency.

AIPTW now supports the custom_model optional argument for user-input models. This is the same as TMLE now.

zipper_plot function for creating zipper plots has been added.

Housekeeping: bound has been updated to new procedure, updated how print_results displays to be uniform, created function to check missingness of input data in causal estimators, added warning regarding ATT and ATU variance for IPTW, and added back observation IDs for MonteCarloGFormula

Future plans: TimeFixedGFormula will be deprecated in favor of two estimators with different labels. This will more clearly delineate ATE versus stochastic effects. The replacement estimators are to be added
v0.8.2(Jul 12, 2020)

v0.8.1(Oct 3, 2019)

Added support for pygam's LogisticGAM for TMLE with custom models (Thanks darrenreger!)

Removed warning for TMLE with custom models following updates to Issue #109 I plan on creating a smarter warning system that flags non-Donsker class machine learning algorithms and warns the user. I still need to think through how to do this.
v0.8.0(Jul 17, 2019)

v0.8.0

Major changes to IPTW. IPTW now supports calculation of a marginal structural model directly.

Greater support for censored data in IPTW, AIPTW, and GEstimationSNM

Addition of s-values
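
For reference, the S-value is the binary surprisal of a p-value, s = -log2(p), read as bits of information against the test hypothesis. A minimal sketch (the function name s_value is hypothetical, not necessarily the name used in zEpid):

```python
import math


def s_value(p_value):
    """Shannon information (in bits) against the test hypothesis."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log2(p_value)


for p in (0.5, 0.25, 0.05):
    print(f"p={p}: S={s_value(p):.2f} bits")
```

A p-value of 0.05 corresponds to a little over 4 bits, roughly as surprising as seeing four heads in a row from a fair coin.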
v0.7.2(May 19, 2019)

Allows access to estimated standard errors for TMLE and AIPTW

Fixed label for RiskDifference summary
v0.7.1(May 3, 2019)

Addition of StochasticIPTW class
v0.7.0(Apr 24, 2019)

v0.6.0(Mar 31, 2019)

MonteCarloGFormula now includes a separate censoring_model() function for informative censoring. Additionally, I added a low memory option to reduce the memory burden during the Monte-Carlo procedure

IterativeCondGFormula has been refactored to accept only data in a wide format. This allows for me to handle more complex treatment assignments and specify models correctly. Additional tests have been added comparing to R's ltmle

There is a new branch in zepid.causal. This is the generalize branch. It contains various tools for generalizing or transporting estimates from a biased sample to the target population of interest. Options available are inverse probability of sampling weights for generalizability (IPSW), inverse odds of sampling weights for transportability (IPSW), the g-transport formula (GTransportFormula), and doubly-robust augmented inverse probability of sampling weights (AIPSW)

RiskDifference now calculates the Frechet probability bounds

TMLE now allows for specified bounds on the Q-model predictions. Additionally, avoids error when predicted continuous values are outside the bounded values.

AIPTW now has confidence intervals for the risk difference based on influence curves

spline now uses numpy.percentile to allow for older versions of NumPy. Additionally, new function create_spline_transform returns a general function for splines, which can be used within other functions

Lots of documentation updates for all functions. Additionally, summary() functions are starting to be updated. Currently, only stylistic changes
v0.5.0(Feb 23, 2019)

v0.4.3(Feb 8, 2019)

Updates to TimeVaryGFormula, TMLE, Love plot. Addition of L'Abbe plots
v0.4.1(Jan 11, 2019)

v0.4.0(Dec 26, 2018)

v0.3.2(Nov 5, 2018)

MAJOR CHANGES:

TMLE now allows estimation of risk ratios and odds ratios. Estimation procedure is based on tmle.R

TMLE variance formula has been modified to match tmle.R rather than other resources. This is beneficial for future implementation of missing data adjustment. Also would allow for mediation analysis with TMLE (not a priority for me at this time).

TMLE now includes an option to place bounds on predicted probabilities using the bound option. Default is to use all predicted probabilities. Either symmetrical or asymmetrical truncation can be specified.

TimeFixedGFormula now allows weighted data as an input. For example, IPMW can be integrated into the time-fixed g-formula estimation. Estimation for weighted data uses statsmodels GEE. As a result of the difference between GLM and GEE, the check of the number of dropped data was removed.

TimeVaryGFormula now allows weighted data as an input. For example, sampling weights can be integrated into the time-varying g-formula estimation. Estimation for weighted data uses statsmodels GEE.

MINOR CHANGES:

Added Sciatica Trial data set. Mertens, BJA, Jacobs, WCH, Brand, R, and Peul, WC. Assessment of patient-specific surgery effect based on weighted estimation and propensity scoring in the re-analysis of the Sciatica Trial. PLOS One 2014. Future plan is to replicate this analysis if possible.

Added data from Freireich EJ et al., "The Effect of 6-Mercaptopurine on the Duration of Steroid-induced Remissions in Acute Leukemia: A Model for Evaluation of Other Potentially Useful Therapy" Blood 1963

TMLE now allows general sklearn algorithms. Fixed issue where predict_proba() is used to generate probabilities within sklearn rather than predict. Looking at this, I am probably going to clean up the logic behind this and the rest of custom_model functionality in the future

AIPW object now contains risk_difference and risk_ratio to match RiskRatio and RiskDifference classes
v0.3.1(Oct 8, 2018)

v0.3.0(Aug 27, 2018)

Release 0.3.0

Major changes include; migration to class objects for basic pandas summary measures, addition of a simple/naive TMLE
v0.2.1(Aug 13, 2018)

Fine tuning TimeVaryGFormula to decrease run-time substantially
v0.2.0(Aug 7, 2018)

BIG CHANGES:

IPW all moved to zepid.causal.ipw. zepid.ipw is no longer supported

IPTW, IPCW, IPMW are now their own classes rather than functions. This was done since diagnostics are easier for IPTW and the user can access items directly from the models this way.

Addition of TimeVaryGFormula to fit the g-formula for time-varying exposures/confounders

effect_measure_plot() is now EffectMeasurePlot() to conform to PEP

ROC_curve() is now roc(). Also 'probability' was changed to 'threshold', since it now allows any continuous variable for threshold determinations

MINOR CHANGES:

Added sensitivity analysis as proposed by Fox et al. 2005 (MonteCarloRR)

Updated Sensitivity and Specificity functionality. Added Diagnostics, which calculates both sensitivity and specificity.

Updated dynamic risk plots to avoid merging warning. Input timeline is converted to an integer (x100000), merged, then converted back

Updated spline to use np.where rather than list comprehension

Summary data calculators are now within zepid.calc.utils

FUTURE CHANGES:

All pandas effect/association measure calculations will be migrating from functions to classes in a future version. This will better meet PEP syntax guidelines and allow users to extract elements/print results. Still deciding on the setup for this... No changes are coming to summary measure calculators (aside from possibly name changes). Intended as part of v0.3.0

Addition of Targeted Maximum Likelihood Estimation (TMLE). No current timeline developed

Addition of IPW for Interference settings. No current timeline but hopefully before 2018 ends

Further conforming to PEP guidelines (my bad)
v0.1.6(Jul 16, 2018)

See CHANGELOG for the full list of details

Briefly,

Added causal branch

Added time-fixed g-formula

Added double-robust estimator

Updated some fixes to errors
v0.1.5(Jul 11, 2018)

Version 0.1.5

Addition of dynamic risk plots

Added user option for late entries into ipcw_prep()
v0.1.3(Jul 2, 2018)

Updates to zepid. Added ROC curve and allows user-specification of the censoring indicator column for ipcw()
v0.1.2(Jun 25, 2018)

First release of zepid

Owner

Paul Zivich

Epidemiology post-doc working in epidemiologic methods and infectious diseases.

GitHub http://zepid.readthedocs.org


78 Dec 22, 2022

ivadomed is an integrated framework for medical image analysis with deep learning.

Repository on the collaborative IVADO medical imaging project between the Mila and NeuroPoly labs.

144 Dec 19, 2022

Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.

SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom

17 Aug 13, 2022

A static analysis library for computing graph representations of Python programs suitable for use with graph neural networks.

python_graphs This package is for computing graph representations of Python programs for machine learning applications. It includes the following modu

258 Dec 29, 2022

Code for reproducing our analysis in the paper titled: Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

Image Crop Analysis This is a repo for the code used for reproducing our Image Crop Analysis paper as shared on our blog post. If you plan to use this

239 Jan 2, 2023

Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX

SYMPAIS: Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis Overview | Installation | Documentation | Examples | Notebo

4 Sep 13, 2022
