[HELP REQUESTED] Generalized Additive Models in Python

Overview

pyGAM

Generalized Additive Models in Python.

Documentation

Installation

pip install pygam

scikit-sparse

To speed up optimization on large models with constraints, it helps to have scikit-sparse installed because it contains a slightly faster, sparse version of Cholesky factorization. The import from scikit-sparse references nose, so you'll need that too.

The easiest way is to use Conda:
conda install -c conda-forge scikit-sparse nose
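To confirm that the optional packages are visible to the interpreter you actually run pyGAM with, a small check like the following may help. It is only a sketch: it assumes scikit-sparse is importable under the module name `sksparse`, and the helper `has_module` is a hypothetical name, not part of pyGAM.

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be found by the current interpreter."""
    return importlib.util.find_spec(name) is not None


# scikit-sparse installs the `sksparse` module; nose is the extra import noted above.
status = {name: has_module(name) for name in ("sksparse", "nose")}
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a package shows as missing here but is installed elsewhere, the usual cause is that it was installed into a different environment than the one running the script.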

scikit-sparse docs

Contributing - HELP REQUESTED

Contributions are most welcome!

You can help pyGAM in many ways including:

  • Working on a known bug.
  • Trying it out and reporting bugs or what was difficult.
  • Helping improve the documentation.
  • Writing new distributions and link functions.
  • If you need some ideas, please take a look at the issues.

To start:

  • Fork the project and cut a new branch.
  • Install the testing dependencies:
conda install pytest numpy pandas scipy pytest-cov cython
pip install --upgrade pip
pip install -r requirements.txt

It helps to add a sym-link of the forked project to your python path. To do this, you should install flit:

  • pip install flit
  • Then, from the main project folder (i.e. .../pyGAM), run: flit install -s

Make some changes and write a test...

  • Test your contribution (e.g. from .../pyGAM): py.test -s
  • When you are happy with your changes, make a pull request into the master branch of the main project.

About

Generalized Additive Models (GAMs) are smooth semi-parametric models of the form:

g(E[y|X]) = β_0 + f_1(X_1) + f_2(X_2) + ... + f_p(X_p)

where X.T = [X_1, X_2, ..., X_p] are independent variables, y is the dependent variable, and g() is the link function that relates our predictor variables to the expected value of the dependent variable.

The feature functions f_i() are built using penalized B splines, which allow us to automatically model non-linear relationships without having to manually try out many different transformations on each variable.
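As a rough illustration of what "penalized B splines" means, the sketch below evaluates a B-spline basis with the Cox-de Boor recursion and computes a difference penalty on the coefficients in the style of P-splines. The function names, the uniform knot layout, and the example values are assumptions made for illustration; pyGAM's own implementation may differ.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at the scalar x."""
    # Degree-0 basis functions are indicators of the half-open knot intervals.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    # Cox-de Boor recursion: build degree d from degree d - 1.
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            new.append(left + right)
        basis = new
    return basis


def difference_penalty(coefs, order=2):
    """Sum of squared order-th differences of adjacent coefficients."""
    diffs = list(coefs)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(d * d for d in diffs)


degree, n_intervals = 3, 4
h = 1.0 / n_intervals
# Uniform knots on [0, 1], extended by `degree` intervals on each side.
knots = [(i - degree) * h for i in range(n_intervals + 2 * degree + 1)]
values = bspline_basis(0.3, knots, degree)

smooth_penalty = difference_penalty([0.0, 1.0, 2.0, 3.0])  # linear trend
wiggly_penalty = difference_penalty([0.0, 1.0, 0.0, 1.0])  # zig-zag
```

A feature function is then a weighted sum of the basis values, and the penalty discourages coefficient sequences that wiggle: a linear trend is not penalized by a second-order difference penalty, while a zig-zag is.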

GAMs extend generalized linear models by allowing non-linear functions of features while maintaining additivity. Since the model is additive, it is easy to examine the effect of each X_i on Y individually while holding all other predictors constant.

The result is a very flexible model, where it is easy to incorporate prior knowledge and control overfitting.
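The additive structure described above can be sketched in a few lines. This is not pyGAM code: the feature functions, the intercept, and the choice of a log link are made-up values used only to show how the pieces combine.

```python
import math


def additive_predict(x, intercept, feature_functions, inverse_link):
    """Apply the inverse link to intercept + sum_i f_i(x_i)."""
    eta = intercept + sum(f(x_i) for f, x_i in zip(feature_functions, x))
    return inverse_link(eta)


# Hypothetical feature functions f_1 and f_2 (in a GAM these would be fitted splines).
feature_functions = [lambda v: 0.5 * v, lambda v: math.sin(v)]
intercept = 1.0

# With a log link g, the inverse link is exp.
mu = additive_predict([2.0, 0.0], intercept, feature_functions, math.exp)

# Additivity: on the link scale, changing X_1 shifts the predictor by
# f_1(new) - f_1(old), whatever the value of X_2.
shift_a = (math.log(additive_predict([3.0, 0.0], intercept, feature_functions, math.exp))
           - math.log(additive_predict([2.0, 0.0], intercept, feature_functions, math.exp)))
shift_b = (math.log(additive_predict([3.0, 1.5], intercept, feature_functions, math.exp))
           - math.log(additive_predict([2.0, 1.5], intercept, feature_functions, math.exp)))
```

Both shifts are equal because the contribution of X_1 does not depend on X_2, which is what makes each feature's effect easy to examine on its own.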

Citing pyGAM

Please consider citing pyGAM if it has helped you in your research or work:

Daniel Servén, & Charlie Brummitt. (2018, March 27). pyGAM: Generalized Additive Models in Python. Zenodo. DOI: 10.5281/zenodo.1208723

BibTex:

@misc{daniel_serven_2018_1208723,
  author       = {Daniel Servén and
                  Charlie Brummitt},
  title        = {pyGAM: Generalized Additive Models in Python},
  month        = mar,
  year         = 2018,
  doi          = {10.5281/zenodo.1208723},
  url          = {https://doi.org/10.5281/zenodo.1208723}
}

References

  1. Simon N. Wood, 2006
    Generalized Additive Models: an introduction with R

  2. Hastie, Tibshirani, Friedman
    The Elements of Statistical Learning
    http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

  3. James, Witten, Hastie and Tibshirani
    An Introduction to Statistical Learning
    http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf

  4. Paul Eilers & Brian Marx, 1996
    Flexible Smoothing with B-splines and Penalties
    http://www.stat.washington.edu/courses/stat527/s13/readings/EilersMarx_StatSci_1996.pdf

  5. Kim Larsen, 2015
    GAM: The Predictive Modeling Silver Bullet
    http://multithreaded.stitchfix.com/assets/files/gam.pdf

  6. Deva Ramanan, 2008
    UCI Machine Learning: Notes on IRLS
    http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/homework/irls_notes.pdf

  7. Paul Eilers & Brian Marx, 2015
    International Biometric Society: A Crash Course on P-splines
    http://www.ibschannel2015.nl/project/userfiles/Crash_course_handout.pdf

  8. Keiding, Niels, 1991
    Age-specific incidence and prevalence: a statistical perspective

Comments
  • [WIP] Simulate from posterior of the coefficients and smoothing parameters

    Summary

    This is a work in progress to implement #113. This PR implements the following:

    1. a new method for GAM called simulate_from_coef_posterior_conditioned_on_smoothing_parameters that simulates from the posterior distribution over the parameters conditioned on the smoothing parameters using np.random.multivariate_normal(self.coef_, self.statistics_['cov'], size=n_simulations).
    2. a new method for every distribution called sample that draws random samples from the given array of expected values.

    Some questions to consider

    1. Right now, Distribution.sample does not take a size argument. In every sample method, the size argument is always None, so that numpy simply broadcasts across the array mu and draws one random sample for each mu. This is all we need if we just use distribution.sample(mu) from the method GAM.simulate_from_coef_posterior_conditioned_on_smoothing_parameters: we draw a certain number of random samples of the coefficients, and for each of those samples of the coefficients we draw one sample from the distribution of the response. Perhaps one may want to add more control over how many samples from the response distribution one would like for each sample from the posterior of the coefficients.
    2. For the hepatitis_A_bulgaria dataset with a LinearGAM, simulate_from_coef_posterior_conditioned_on_smoothing_parameters gives the warning RuntimeWarning: covariance is not positive-semidefinite self.coef_, self.statistics_['cov'], size=n_simulations). (This happens no matter whether constraints is 'monotonic_inc', 'concave', or None.) The method still returns random samples. The default behavior of numpy.random.multivariate_normal is to raise a warning (check_valid='warn'). Is this behavior good enough? I am not familiar with how severe this problem is.
    3. Should we rename simulate_from_coef_posterior_conditioned_on_smoothing_parameters? It's long but descriptive. It'd be nice to have it called just simulate_from_posterior, and if we implement the bootstrap samples to simulate from the distribution over the smoothing parameters, too, then that could become just an optional argument in this catch-all method simulate_from_posterior.
    4. I made sample an abstract method for the Distribution class. Feel free to remove that abstract method and the two imports (from abc import ABCMeta, abstractmethod) if you don't want to bother to force subclasses of Distribution to implement these methods (because we don't expect new distributions to be written?).
    5. simulate_from_coef_posterior_conditioned_on_smoothing_parameters checks that the model is already fitted and that n_simulations is not <= 0. I also copied and pasted the code for X = check_X(X, ...). Is this checking good enough?
    6. Should we write tests? We could verify shapes of return values and maybe also verify the random samples are close enough to what we expect.
    7. Should we implement bootstrap samples to get some uncertainty over the smoothing parameters? That may be somewhat involved to make a good API, given how computationally expensive it could be, so perhaps that is left for another PR.
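On question 2, one way to see how far a covariance matrix is from positive semidefinite is to look at its smallest eigenvalue, and one possible remedy is to clip negative eigenvalues before sampling. The sketch below uses a made-up 2x2 matrix and a hypothetical helper `clip_to_psd`; whether clipping is appropriate for a fitted GAM's covariance is an open assumption, not something this PR establishes.

```python
import numpy as np


def clip_to_psd(cov, floor=0.0):
    """Symmetrize `cov` and clip its eigenvalues from below at `floor`."""
    sym = (cov + cov.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.clip(eigvals, floor, None)) @ eigvecs.T


# Made-up indefinite matrix (its determinant is negative).
cov = np.array([[1.0, 0.9],
                [0.9, 0.5]])
min_eig_before = np.linalg.eigvalsh(cov).min()

repaired = clip_to_psd(cov)
min_eig_after = np.linalg.eigvalsh(repaired).min()

rng = np.random.default_rng(0)
# check_valid accepts 'warn' (the default), 'raise', or 'ignore'.
draws = rng.multivariate_normal(np.zeros(2), repaired, size=5, check_valid="warn")
```

If the smallest eigenvalue is only slightly negative (on the order of floating-point error), the warning is likely harmless; a large negative value would suggest a problem in how the covariance is computed.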

    Example

    Here's an example of simulate_from_coef_posterior_conditioned_on_smoothing_parameters on the first example in the README:

    example

    The extra code added was:

    response_simulations = gam.simulate_from_coef_posterior_conditioned_on_smoothing_parameters(
        XX, 100)
    for response in response_simulations:
        plt.scatter(XX, response, alpha=.01, color='k')
    

    The light-opacity black disks are random samples from the posterior.

    I made the first axis of the return value the simulations, so you can loop over simulations with response in response_simulations, or compute averages across simulations with np.mean(response_simulations, axis=0), and so on. This convention (that axis 0 is the simulation index) matches that used by PyMC3's sample function.
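The shape convention can be illustrated without a fitted model. In this sketch the model matrix, coefficients, and covariance are random stand-ins, and the response is drawn from a normal distribution with an identity link and a made-up scale; none of these values come from pyGAM.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_coefs, n_simulations = 50, 4, 100

modelmat = rng.normal(size=(n_samples, n_coefs))  # stand-in for the model matrix
coef = np.array([1.0, -0.5, 0.25, 2.0])           # stand-in for coef_
cov = 0.01 * np.eye(n_coefs)                      # stand-in for statistics_['cov']

# One row per simulated coefficient vector: (n_simulations, n_coefs).
coef_draws = rng.multivariate_normal(coef, cov, size=n_simulations)

# Expected values for each simulation: (n_simulations, n_samples).
mu = coef_draws @ modelmat.T

# One response draw per expected value, so axis 0 stays the simulation index.
response_simulations = rng.normal(loc=mu, scale=0.1)

mean_response = response_simulations.mean(axis=0)  # average across simulations
```

Looping with `for response in response_simulations` then yields one array of length `n_samples` per simulation.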

    opened by cbrummitt 32
  • PoissonGAM fails with dimension mismatch warning depending on n_splines

    Using a grid search and several options for n_splines, some fits fail due to a dimension mismatch.

    gam = PoissonGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))
    
    ....
    
      return (mu**y) * np.exp(-mu) / sp.misc.factorial(y)
     50% (3 of 6) |#############              | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (120,240) and (239,120) not aligned: 240 (dim 1) != 239 (dim 0)
    on model:
    PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
       constraints=None, dtype='numerical', fit_intercept=True, 
       fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
       n_splines=7, penalties='auto', spline_order=3, tol=0.0001)
    skipping...
    
      warnings.warn(msg)
     66% (4 of 6) |##################         | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (137,260) and (259,123) not aligned: 260 (dim 1) != 259 (dim 0)
    on model:
    PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
       constraints=None, dtype='numerical', fit_intercept=True, 
       fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
       n_splines=8, penalties='auto', spline_order=3, tol=0.0001)
    skipping...
    
    ...
    
    

    Training a LinearGAM model using the same dataset and grid search options does not give rise to the same error.

    gam = LinearGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))
    100% (6 of 6) |###########################| Elapsed Time: 0:00:00 Time: 0:00:00
    

    Does this occur because there are more coefficients than data? If so, a more informative warning would be helpful.

    bug 
    opened by maxpagels 18
  • any offsets?

    Hi, First of all, this is a great package! Is it possible to declare an offset or exposure variable? Meaning: a regressor with coefficient fixed to 1.
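For context, an offset in a log-link count model is usually the log of the exposure, added to the linear predictor with its coefficient fixed at 1. The arithmetic can be sketched as below; the function name and the numbers are made up, and this does not show pyGAM's API (the v0.3.0 release notes state only that PoissonGAM accepts exposure).

```python
import math


def expected_count(eta, exposure):
    """Log-link mean with log(exposure) entering as an offset (coefficient 1)."""
    return math.exp(eta + math.log(exposure))


rate_eta = math.log(0.02)  # linear predictor for a rate of 0.02 events per unit

mu_100 = expected_count(rate_eta, 100.0)
mu_200 = expected_count(rate_eta, 200.0)
```

Because the coefficient is fixed at 1, doubling the exposure doubles the expected count while the modeled rate stays the same.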

    opened by ric70x7 14
  • scikit-sparse installed but not detected?

    I'm trying to use LogisticGAM with a really sparse pandas dataframe. But I'm getting this warning:

    /home/echo66/.local/share/virtualenvs/pygam-505CBPMV/lib/python3.6/site-packages/pygam/utils.py:78: UserWarning: Could not import Scikit-Sparse or Suite-Sparse.
    This will slow down optimization for models with monotonicity/convexity penalties and many splines.
    See installation instructions for installing Scikit-Sparse and Suite-Sparse via Conda.
      warnings.warn(msg)
    
    opened by echo66 11
  • add constraints

    some penalties should really be constraints. for example, monotonic smoothing and harmonic smoothing should be hard constraints.

    perhaps they might also be added as penalties, but a basic application would be as constraints.

    opened by dswah 10
  • How to get rid of the progress bar in pyGAM program?

    Very good and effective library!
    I wanted to embed pyGAM in my own Python program, but it shows a progress bar in each loop, which makes the program take a long time to run. If I could turn the progress bar off, the program would run much faster. (A screenshot of the issue is below.) How can I get rid of the progress bar without changing other functionality? Could you help me? Thanks in advance!

    issue

    enhancement good first issue 
    opened by sunshine1204 9
  • Add method for simulating from the posterior (or just add an example to the documentation)

    Estimating the mean and confidence intervals (using prediction_intervals) is great. In some cases, it can be useful to simulate from the posterior distribution of the model's coefficients. An example is given in pages 242–243 of [1].

    I think the following code snippet does the trick for a LinearGAM:

    import numpy as np


    def simulate_from_posterior(linear_gam, X, n_simulations):
        """Simulate from the posterior of a LinearGAM a certain number of times.
    
        Inputs
        ------
        linear_gam : pyGAM.LinearGAM
    
        X : array of shape (n_samples, n_features)
    
        n_simulations : int
            The number of simulations from the posterior to compute
    
        Returns
        -------
        simulations : array of shape (n_samples, n_simulations)
        """
        beta_replicates = np.random.multivariate_normal(
            linear_gam.coef_, linear_gam.statistics_['cov'], size=n_simulations)
        return linear_gam._modelmat(X).dot(beta_replicates.T)
    

    I'm not sure if this should be added as an example in the documentation or added to the code (or both).

    To implement this in general, I think we'd want to add a method for each Distribution that draws a certain number of samples (called sample or random_variate?), so we'd have a NormalDist.sample, BinomialDist.sample, and so on. Then the GAM.simulate could just call self.dist.sample(self.coef_, self.statistics['cov'], size=n_simulations)? I'm not sure yet how to best handle the link functions for these simulations...

    As pointed out on pages 256–257 of [1], this procedure simulates the coefficients conditioned on the smoothing parameters, lambda (lam). To actually simulate from the coefficients, one may use bootstrap samples to get simulations of the coefficients and of the smoothing parameters; an example is given on page 257 of [1].

    [1] S. Wood. Generalized Additive Models: An Introduction with R (First Edition). Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2006.

    opened by cbrummitt 9
  • Added Sphinx-based documentation, updated requirements.txt

    Hi, I discovered your project last week and have already used it to great effect, and thought you might like some docs. :smile:

    Commit message is below; I've hosted a copy of the HTML docs here, if you think this is worthwhile it's probably best to use readthedocs (hence my choice of theme). A few more details on the changes:

    • I recreated the text and code from the README as a jupyter notebook; only the image pygam_basis.png needed including as all the others were generated in the code
      • It's easy to add a link to the notebook's source on github from the notebook itself, so people can download and run it themselves
      • This uses nbsphinx, which is awesome
    • The API docs are largely generated with autodoc, which uses .. autoX:: directives; in the cases of links and callbacks I documented each class individually; for distributions and penalties you can see these are done at the module level with automodule::.
    • Numpy-style docstrings are a great choice--they mean you can use napoleon to generate documentation from them; the scikit-learn docs are a great example of what can be created automatically from the docstrings.

    There's a lot more to be done, in particular incorporating points from Pablo's post in more involved examples for classification and regression, deciding a better structure for the documentation overall, and of course working on the docstrings in the code. I didn't want to spend too much time on it before checking that you thought it worthwhile!

    Commit message follows


    Sphinx-based documentation has been added to the project; this comes from two main sources:

    • Main modules added using autodoc and napoleon
      • These are split into two sections, GAM classes and other helper classes and functions
      • Each GAM has its own page, while the helper classes and functions are grouped by module
      • Minor changes to docstrings in pygam.py to fix formatting issues in the output docs; there are a lot more of these to tackle but in general everything is pretty readable
    • The text and code from the README as a jupyter notebook, imported using nbsphinx

    Updated requirements.txt with a Documentation section, detailing the different modules necessary to build the documentation.

    opened by badge 8
  • LogisticGAM not converging

    I keep getting the error below when trying to train a LogisticGAM. This error does not happen when I use sklearn LogisticRegression or RandomForest.

    /opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/links.py:149: RuntimeWarning: divide by zero encountered in true_divide
      return dist.levels/(mu*(dist.levels - mu))
    /opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/pygam.py:888: RuntimeWarning: invalid value encountered in multiply
      return sp.sparse.diags((self.link.gradient(mu, self.distribution)**2 * self.distribution.V(mu=mu))**-0.5)
    /opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/pygam.py:907: RuntimeWarning: invalid value encountered in greater_equal
      mask = (np.abs(weights) >= np.sqrt(EPS)) * (weights != np.nan)

    opened by jeweinberg 8
  • Score method

    Hi @dswah, I have added a GAM.score method to the base GAM class, following on from issue #102. I have also added a simple test that checks the score method runs without crashing and that the score, which is R^2, is <= 1.

    Currently it only calculates R^2, which is the default score in scikit-learn for regression models. I can also add the accuracy score specifically to the LogisticGAM class.

    I am quite new to github and this whole open source concept, so I would appreciate any feedback! Let me know if the code makes sense or if I am completely off. Thanks!
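For reference, the R^2 used as scikit-learn's default regression score is 1 - SS_res / SS_tot. A minimal pure-Python sketch follows; the function name and the data are illustrative, not the code in this PR, and it assumes the targets are not all identical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

score_good = r_squared(y_true, [1.1, 1.9, 3.2, 3.8])
score_perfect = r_squared(y_true, y_true)
score_mean = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
score_bad = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])   # worse than the mean
```

The score is at most 1, equals 0 for a model that always predicts the mean, and can be negative for a model that does worse than that, so a test asserting `score <= 1` holds for any predictions.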

    opened by JodesL 7
  • constraint doesn't work for s() when there is tensor term te() in the model

    using the "chicago example"

    from pygam import PoissonGAM, s, te
    from pygam.datasets import chicago

    X, y = chicago(return_X_y=True)
    gam = PoissonGAM(s(0, n_splines=200) + te(3, 1) + s(2)).fit(X, y)
    gam_test = PoissonGAM(s(0, constraints='monotonic_inc') + te(3, 1) + s(2)).fit(X, y)


    TypeError                                 Traceback (most recent call last)
    in ()
          6 gam3 = PoissonGAM(s(0, n_splines=200) + te(3, 1) + s(2)).fit(X, y)
          7
    ----> 8 gam_test = PoissonGAM(s(0, constraints='monotonic_inc') + te(3, 1) + s(2)).fit(X, y)

    ...

    anaconda/lib/python2.7/site-packages/pygam/terms.py in build_constraints(self, coef, constraint_lam, constraint_l2)
        363         if constraint is None:
        364             constraint = 'none'
    --> 365         if constraint in CONSTRAINTS:
        366             constraint = CONSTRAINTS[constraint]
        367

    TypeError: unhashable type: 'list'

    bug 
    opened by jzang18 7
  • When Creating X Grid for 3D Plotting or Derivative Calculation, How Can We Include Exposure?

    I'm having trouble understanding how to incorporate exposure when predicting on an X grid. The grid is needed when plotting or taking derivatives. Any advice or guidance here would be appreciated!

    opened by eddietaylor 0
  • One-hot encoding factor term

    Hi,

    I have an array as: [[ '1234' 0.123 'GitHub']]

    and I want to pass the third feature as a factor term f(2, coding = 'one-hot'). However, the encoding fails returning this error:

    ValueError: X data must be type int or float, but found type: <class 'numpy.object_'> Try transforming data with a LabelEncoder first.

    This is raised as a consequence of utils.check_array.

    Any suggestion to overcome this issue?
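Following the hint in the error message, one possible workaround is to map the string column to integer codes before building the feature array. The sketch below is a pure-Python stand-in for a label encoder: the helper name is hypothetical, and the second and third rows are made up to extend the single row quoted above. Whether the factor term then behaves as desired is not confirmed here.

```python
def label_encode(values):
    """Map each distinct value to an integer code, in sorted order."""
    categories = sorted(set(values))
    codes = {category: i for i, category in enumerate(categories)}
    return [codes[v] for v in values], categories


rows = [
    ["1234", 0.123, "GitHub"],
    ["5678", 0.456, "GitLab"],  # made-up row
    ["9012", 0.789, "GitHub"],  # made-up row
]

encoded_col, categories = label_encode([row[2] for row in rows])

# Every column is now numeric, which is what the type check asks for.
numeric_rows = [[float(row[0]), float(row[1]), code]
                for row, code in zip(rows, encoded_col)]
```

Keeping `categories` around makes it possible to translate the integer codes back to the original labels when interpreting the fitted factor term.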

    opened by ilagith 0
  • Pass callbacks argument to LinearGAM super init - fixes #291

    callbacks argument to be passed to LinearGAM's super().__init__() call.

    This basically addresses issue #291 regarding cloning LinearGAM estimator with sklearn functions, like cross_validate.

    opened by miguelfmc 0
  • Can't use flit to install pygam - pyproject.toml does not exist

    I have been trying to install pygam from source using flit, as indicated in the docs, but I am unable to do so.

    I have been running the following (from the project's root directory):

    pip install flit
    flit install -s

    And after running flit install I get the following error:

    Config file pyproject.toml does not exist

    So it seems like the config toml file for installation is missing.

    Environment details: Python 3.6.13, Flit 3.7.1

    opened by miguelfmc 1
  • Difference between prediction intervals and partial dependence (question)

    Hi, thanks for the great package. Unfortunately, I couldn't fully understand the difference between prediction intervals and partial dependence when I have a model of y ~ s(0), meaning only one feature. These two functions produce different intervals with the same std requested (.95). Thanks in advance, Yifat

    opened by Yifath7 0
Releases(v0.8.0)
  • v0.8.0(Oct 31, 2018)

    New Features

    • cyclic p-splines: you can now train models with periodic features by using the 'cp' basis like so:
    GAM(s(0, basis='cp'))
    
    • factor smooths now allow dummy coding, via:
    GAM(f(0, coding='dummy'))
    

    Models using this coding scheme are more statistically interpretable, and computationally less expensive, than those using one-hot encodings.

    Bug Fixes

    • models can mix constrained terms and un-constrained tensor-terms
    • tensor terms can be constrained
  • v0.7.2(Oct 29, 2018)

    Bug Fixes

    • Fix bug in the not-None element existence check in terms.py, thanks @BeefOnionDumplings!
    • Added a warning issued in summary indicating that there is likely a bug in the p-values
  • v0.7.1(Sep 22, 2018)

    Bug Fixes

    • fixed bug where np.int64 did not count as integers. the following no longer fails:
    LinearGAM().gridsearch(X, y, n_splines=np.arange(5, 10)).summary()
    
  • v0.7.0(Sep 20, 2018)

  • v0.6.3(Sep 17, 2018)

    New Features

    • gridsearch(...) allows searching across a predefined grid of points, without doing the cartesian product, when grid is a np.ndarray of shape (n_points, len(flatten(gam.lam))). This is useful for RandomSearchCV - style behavior.
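The difference between the two search modes can be sketched with plain Python. Lists of tuples stand in for the np.ndarray of shape (n_points, len(flatten(gam.lam))) described above; the candidate values and the number of terms are made up.

```python
import itertools

lam_candidates = [0.1, 1.0, 10.0]
n_terms = 2  # assumed number of smoothing parameters in the model

# Cartesian product: every combination of candidates across terms.
cartesian = list(itertools.product(lam_candidates, repeat=n_terms))

# Predefined grid of points: only the listed rows are evaluated.
points = [(0.1, 10.0), (1.0, 1.0), (10.0, 0.1)]

print(len(cartesian), "models with the cartesian product")
print(len(points), "models with the predefined points")
```

The product grows as `len(lam_candidates) ** n_terms`, so for models with many terms a hand-picked or randomly sampled set of points keeps the number of fits under control.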

    Bug Fixes

    • estimate_r_squared(X, y) no longer raises AttributeError
    • dtype=auto no longer allowed for terms
    • intercept.lam = None
  • v0.6.1(Sep 13, 2018)

    New Features

    • easier global arguments for terms
    GAM(s(0) + s(1), n_splines=10).fit(X, y)
    

    will broadcast n_splines=10 to all terms

    Bug Fixes

    • fixed inconsistencies in GAM instantiation, where
    GAM(lam=0.6).gridsearch(X, y)
    

    worked for multi-dimensional X

    but not

    GAM(lam=0.6).gridsearch(X, y)
    
  • 0.6.0(Sep 9, 2018)

    New Features

    • tensor product terms and feature interactions. On top of that, construction is more precise and less verbose:
    GAM(te(0, s(1, n_splines=5))).fit(X, y)
    
    • the partial_dependence() method can return meshgrids to help you make 3D plots of interaction terms
    • ExpectileGAM: for creating a non-parametric description of a distribution. Instead of just modeling the mean of a response, we can model any quantile using
    ExpectileGAM().fit_quantile(X, y, quantile=0.25) 
    

    Breaking Changes

    • GAM construction is different but much simpler. check out the docstrings for help.
    • generate_X_grid and partial_dependence methods require you to specify term= instead of feature=
  • v0.5.5(Jul 7, 2018)

    New Features

    • all GAM classes have a verbose argument. this makes them compatible with sklearn GridSearchCV and RandomizedSearchCV
    • add toy_classification dataset
    • move generate_X_grid to GAM method

    Bug Fixes

    • users should get a more pythonic experience with partial_dependence by never needing to index with i+1
    • _initial_estimate() method no longer fails on value nudge for purely integer observations
    • regenerate images
    • bugs in readme
    • fixes bug where poorly conditioned matrix would fail when using skcholmod
    • make2d should not be verbose in initial_estimate()
  • v0.5.4(Jun 29, 2018)

    Bug Fixes

    • PoissonGAM no longer produces -inf log-likelihoods when using non-integer exposure.
    • PoissonGAM checks exposure, weight, and y array shapes before fitting.
  • v0.5.3(Jun 28, 2018)

    Bug Fixes

    • datasets are loadable like:
    from pygam.datasets import cake
    X, y = cake(return_X_y=True)
    
    • better model initializations for complex models by using the solution to linear unpenalized problem. This makes the second order PIRLS optimizer less likely to diverge by overshooting the maximum likelihood estimate.
    • ReadMe call for collaboration, examples reference dataset loaders, fix typos
  • v0.5.2(Apr 22, 2018)

    Bug Fixes

    • bug fix in p-value for models with unknown variance. f-statistic was sensitive to estimated variance when it should be invariant.
    • typos
  • v0.5.1(Apr 6, 2018)

    New Features:

    • p-values!
    • you can now see p-values in the model summary. each feature function will have a p-value, and a code describing its level of significance.

    image

    Bug Fixes

    • improving documentation
  • v0.4.2(Apr 4, 2018)

    Bug Fixes

    • use scipy stats log-pdfs for computing log-likelihoods
    • disable progress bars in gridsearch by setting progress=False
    • add verbosity attribute to GAMs to control warnings
  • v0.4.1(Mar 27, 2018)

    Bug Fixes:

    • allow for changing SVD shapes during PIRLS iterations due to changing mask shapes
    • change coefficient initialization to constant model
    • change GammaGAM and InvGaussGAM to use non-canonical log-links by default.
  • v0.4.0(Jan 22, 2018)

  • v0.3.0(Sep 15, 2017)

    New Features:

    • GAMs accept weights in fitting, gridsearch, likelihood, statistics...
    • PoissonGAM accepts exposure

    Changes

    • better handling of PIRLS weights
    • check for isfinite(...).all() in check_X, check_y

    Bug Fixes

    • constant covariates won't break SVD