[HELP REQUESTED] Generalized Additive Models in Python

daniel servén

Last update: Jan 5, 2023


Overview

pyGAM

Generalized Additive Models in Python.

Documentation

Installation

pip install pygam

scikit-sparse

To speed up optimization on large models with constraints, it helps to have scikit-sparse installed because it contains a slightly faster, sparse version of Cholesky factorization. The import from scikit-sparse references nose, so you'll need that too.

The easiest way is to use Conda:
conda install -c conda-forge scikit-sparse nose

scikit-sparse docs

Contributing - HELP REQUESTED

Contributions are most welcome!

You can help pyGAM in many ways including:

Working on a known bug.
Trying it out and reporting bugs or what was difficult.
Helping improve the documentation.
Writing new distributions and link functions.
If you need some ideas, please take a look at the issues.

To start:

Fork the project and cut a new branch.
Then install the testing dependencies:

conda install pytest numpy pandas scipy pytest-cov cython
pip install --upgrade pip
pip install -r requirements.txt

It helps to add a symlink of the forked project to your Python path. To do this, install flit:

pip install flit
Then, from the main project folder (i.e. .../pyGAM), run: flit install -s

Make some changes and write a test...

Test your contribution (e.g. from .../pyGAM): py.test -s
When you are happy with your changes, make a pull request into the master branch of the main project.

About

Generalized Additive Models (GAMs) are smooth semi-parametric models of the form:

$$ g\big(\mathbb{E}[y \mid X]\big) = \beta_0 + f_1(X_1) + f_2(X_2) + \dots + f_p(X_p) $$

where X.T = [X_1, X_2, ..., X_p] are independent variables, y is the dependent variable, and g() is the link function that relates our predictor variables to the expected value of the dependent variable.

The feature functions f_i() are built using penalized B-splines, which allow us to automatically model non-linear relationships without having to manually try out many different transformations on each variable.

GAMs extend generalized linear models by allowing non-linear functions of features while maintaining additivity. Since the model is additive, it is easy to examine the effect of each X_i on y individually while holding all other predictors constant.

The result is a very flexible model, where it is easy to incorporate prior knowledge and control overfitting.
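To make the penalized B-spline idea concrete, here is a minimal numpy sketch of a single P-spline smoother in the style of Eilers & Marx (1996): a cubic B-spline basis on equally spaced knots, a second-order difference penalty on neighbouring coefficients, and a Cholesky solve of the penalized normal equations. It is an illustration under those assumptions, not pyGAM's implementation; the function names `bspline_basis` and `fit_pspline` are hypothetical.

```python
import numpy as np


def bspline_basis(x, n_splines=20, degree=3):
    """Evaluate n_splines B-spline basis functions at x (Cox-de Boor recursion).

    Knots are equally spaced and extended `degree` intervals beyond
    [min(x), max(x)], so the basis sums to one everywhere on that range.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    n_inner = n_splines - degree                  # knot intervals covering [lo, hi]
    step = (hi - lo) / n_inner
    t = lo + step * np.arange(-degree, n_inner + degree + 1)

    # degree 0: indicator of the knot interval containing each x
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, degree + 1):
        n = len(t) - 1 - k                        # number of degree-k functions
        left = (x[:, None] - t[:n]) / (t[k:k + n] - t[:n])
        right = (t[k + 1:k + 1 + n] - x[:, None]) / (t[k + 1:k + 1 + n] - t[1:1 + n])
        B = left * B[:, :n] + right * B[:, 1:n + 1]
    return B


def fit_pspline(x, y, lam=1.0, n_splines=20, degree=3):
    """Penalized least squares: minimize ||y - B c||^2 + lam * ||D c||^2."""
    B = bspline_basis(x, n_splines, degree)
    D = np.diff(np.eye(n_splines), n=2, axis=0)   # second-order differences of c
    A = B.T @ B + lam * (D.T @ D)                 # symmetric positive definite
    L = np.linalg.cholesky(A)                     # A = L @ L.T
    coef = np.linalg.solve(L.T, np.linalg.solve(L, B.T @ y))
    return coef, B @ coef


# noisy sine curve as toy data
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
coef, fitted = fit_pspline(x, y, lam=1.0)
```

Larger `lam` pulls neighbouring coefficients together; as `lam` grows, the fit approaches a straight line, because the second-difference penalty does not penalize linear trends.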

Citing pyGAM

Please consider citing pyGAM if it has helped you in your research or work:

Daniel Servén, & Charlie Brummitt. (2018, March 27). pyGAM: Generalized Additive Models in Python. Zenodo. DOI: 10.5281/zenodo.1208723

BibTex:

@misc{daniel\_serven\_2018_1208723,
  author       = {Daniel Servén and
                  Charlie Brummitt},
  title        = {pyGAM: Generalized Additive Models in Python},
  month        = mar,
  year         = 2018,
  doi          = {10.5281/zenodo.1208723},
  url          = {https://doi.org/10.5281/zenodo.1208723}
}

References

Simon N. Wood, 2006
Generalized Additive Models: an introduction with R
Hastie, Tibshirani, Friedman
The Elements of Statistical Learning
http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
James, Witten, Hastie and Tibshirani
An Introduction to Statistical Learning
http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf
Paul Eilers & Brian Marx, 1996
Flexible Smoothing with B-splines and Penalties
http://www.stat.washington.edu/courses/stat527/s13/readings/EilersMarx_StatSci_1996.pdf
Kim Larsen, 2015
GAM: The Predictive Modeling Silver Bullet
http://multithreaded.stitchfix.com/assets/files/gam.pdf
Deva Ramanan, 2008
UCI Machine Learning: Notes on IRLS
http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/homework/irls_notes.pdf
Paul Eilers & Brian Marx, 2015
International Biometric Society: A Crash Course on P-splines
http://www.ibschannel2015.nl/project/userfiles/Crash_course_handout.pdf
Keiding, Niels, 1991
Age-specific incidence and prevalence: a statistical perspective

Comments

[WIP] Simulate from posterior of the coefficients and smoothing parameters
Summary

This is a work in progress to implement #113. This PR implements the following:

a new method for GAM called simulate_from_coef_posterior_conditioned_on_smoothing_parameters that simulates from the posterior distribution over the parameters conditioned on the smoothing parameters using np.random.multivariate_normal(self.coef_, self.statistics_['cov'], size=n_simulations).

a new method for every distribution called sample that draws random samples from the given array of expected values.

Some questions to consider

Right now, Distribution.sample does not take a size argument. In every sample method, the size argument is always None, so that numpy simply broadcasts across the array mu and draws one random sample for each mu. This is all we need if we just use distribution.sample(mu) from the method GAM.simulate_from_coef_posterior_conditioned_on_smoothing_parameters: we draw a certain number of random samples of the coefficients, and for each of those samples of the coefficients we draw one sample from the distribution of the response. Perhaps one may want to add more control over how many samples from the response distribution one would like for each sample from the posterior of the coefficients.

For the hepatitis_A_bulgaria dataset with a LinearGAM, simulate_from_coef_posterior_conditioned_on_smoothing_parameters gives the warning "RuntimeWarning: covariance is not positive-semidefinite", raised at the line self.coef_, self.statistics_['cov'], size=n_simulations). (This happens whether constraints is 'monotonic_inc', 'concave', or None.) The method still returns random samples. The default behavior of numpy.random.multivariate_normal is to issue a warning (check_valid='warn'). Is this behavior good enough? I am not sure how severe this problem is.
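When the warning comes only from tiny negative eigenvalues introduced by floating-point round-off, one common workaround is to symmetrize the covariance, clip negative eigenvalues to zero, and draw the Gaussian samples from the eigendecomposition directly. The sketch below assumes that situation; it does not help if the covariance is badly wrong, so it is worth checking `np.linalg.eigvalsh(cov).min()` first. `numpy.random.multivariate_normal` also accepts `check_valid='raise'` or `'ignore'` if the default `'warn'` is not what you want. The helper name `sample_gaussian_eig` and the toy covariance are hypothetical.

```python
import numpy as np


def sample_gaussian_eig(mean, cov, size, rng):
    """Draw `size` samples from N(mean, cov), tolerating round-off indefiniteness."""
    cov = (cov + cov.T) / 2                        # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)          # drop tiny negative eigenvalues
    root = eigvecs * np.sqrt(eigvals)              # cov is approx. root @ root.T
    z = rng.standard_normal((size, len(mean)))
    return mean + z @ root.T                       # axis 0 indexes the draws


rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0])
# perfectly correlated coefficients plus round-off: one eigenvalue is about -1e-9
cov = np.array([[1.0, 1.0 + 1e-9],
                [1.0 + 1e-9, 1.0]])
draws = sample_gaussian_eig(mean, cov, size=100_000, rng=rng)
```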

Should we rename simulate_from_coef_posterior_conditioned_on_smoothing_parameters? It's long but descriptive. It'd be nice to have it called just simulate_from_posterior, and if we implement the bootstrap samples to simulate from the distribution over the smoothing parameters, too, then that could become just an optional argument in this catch-all method simulate_from_posterior.

I made sample an abstract method for the Distribution class. Feel free to remove that abstract method and the two imports (from abc import ABCMeta, abstractmethod) if you don't want to bother to force subclasses of Distribution to implement these methods (because we don't expect new distributions to be written?).
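A minimal sketch of that design, assuming an `rng` argument is added for reproducible draws (the PR itself draws from NumPy's global random state, and these classes are illustrative rather than pyGAM's code): the abstract base class cannot be instantiated until a subclass implements `sample`, and each `sample` leaves `size=None` so NumPy draws one value per entry of `mu`.

```python
from abc import ABCMeta, abstractmethod

import numpy as np


class Distribution(metaclass=ABCMeta):
    """Illustrative base class: subclasses must implement sample(mu, rng)."""

    @abstractmethod
    def sample(self, mu, rng):
        """Return one random draw per element of mu (same shape as mu)."""


class NormalDist(Distribution):
    def __init__(self, scale=1.0):
        self.scale = scale

    def sample(self, mu, rng):
        # size=None: numpy broadcasts over mu and draws one value per mean
        return rng.normal(loc=mu, scale=self.scale)


class PoissonDist(Distribution):
    def sample(self, mu, rng):
        return rng.poisson(lam=mu)


rng = np.random.default_rng(0)
mu = np.array([[1.0, 5.0, 10.0],
               [2.0, 4.0, 8.0]])   # e.g. 2 coefficient draws x 3 data points
normal_draws = NormalDist(scale=0.1).sample(mu, rng)
poisson_draws = PoissonDist().sample(mu, rng)
```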

simulate_from_coef_posterior_conditioned_on_smoothing_parameters checks that the model is already fitted and that n_simulations is not <= 0. I also copied and pasted the code for X = check_X(X, ...). Is this checking good enough?

Should we write tests? We could verify shapes of return values and maybe also verify the random samples are close enough to what we expect.

Should we implement bootstrap samples to get some uncertainty over the smoothing parameters? That may be somewhat involved to make a good API, given how computationally expensive it could be, so perhaps that is left for another PR.

Example

Here's an example of simulate_from_coef_posterior_conditioned_on_smoothing_parameters on the first example in the README:

The extra code added was:

response_simulations = gam.simulate_from_coef_posterior_conditioned_on_smoothing_parameters(XX, 100)
for response in response_simulations:
    plt.scatter(XX, response, alpha=.01, color='k')

The light-opacity black disks are random samples from the posterior.

I made the first axis of the return value the simulations, so you can loop over simulations with response in response_simulations, or compute averages across simulations with np.mean(response_simulations, axis=0), and so on. This convention (that axis 0 is the simulation index) matches that used by PyMC3's sample function.
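The same recipe can be written independently of any particular model class. The sketch below is an outline under stated assumptions rather than the PR's code: it takes a model matrix, the coefficient mean and covariance, an inverse link, and a response sampler (all supplied by the caller; `simulate_response` is a hypothetical name), draws coefficients, maps them through the inverse link, and draws one response per expected value, keeping axis 0 as the simulation index.

```python
import numpy as np


def simulate_response(model_matrix, coef, cov, n_simulations, inverse_link,
                      sample_response, rng):
    """Hypothetical helper: simulate responses conditioned on the smoothing parameters.

    Returns (coef_draws, mu, response); axis 0 of each array indexes simulations.
    """
    coef_draws = rng.multivariate_normal(coef, cov, size=n_simulations)  # (n_sim, n_coef)
    mu = inverse_link(coef_draws @ model_matrix.T)                       # (n_sim, n_samples)
    response = sample_response(mu, rng)                                  # one draw per mu
    return coef_draws, mu, response


# toy Gaussian example with an identity link: intercept + slope on 50 points
rng = np.random.default_rng(42)
XX = np.linspace(0, 1, 50)
model_matrix = np.column_stack([np.ones_like(XX), XX])
coef = np.array([1.0, 2.0])
cov = np.array([[0.01, 0.0],
                [0.0, 0.02]])
scale = 0.5

coef_draws, mu, response_simulations = simulate_response(
    model_matrix, coef, cov, n_simulations=100,
    inverse_link=lambda eta: eta,
    sample_response=lambda m, r: r.normal(m, scale),
    rng=rng,
)
mean_response = np.mean(response_simulations, axis=0)  # average over simulations
```

For a log-link model one would pass `inverse_link=np.exp` and a Poisson sampler such as `lambda m, r: r.poisson(m)` instead.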
opened by cbrummitt 32

PoissonGAM fails with dimension mismatch warning depending on n_splines

Using a grid search and several options for n_splines, some fits fail due to a dimension mismatch.

gam = PoissonGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))

....

  return (mu**y) * np.exp(-mu) / sp.misc.factorial(y)
 50% (3 of 6) |#############              | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (120,240) and (239,120) not aligned: 240 (dim 1) != 239 (dim 0)
on model:
PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
   constraints=None, dtype='numerical', fit_intercept=True, 
   fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
   n_splines=7, penalties='auto', spline_order=3, tol=0.0001)
skipping...

  warnings.warn(msg)
 66% (4 of 6) |##################         | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (137,260) and (259,123) not aligned: 260 (dim 1) != 259 (dim 0)
on model:
PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
   constraints=None, dtype='numerical', fit_intercept=True, 
   fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
   n_splines=8, penalties='auto', spline_order=3, tol=0.0001)
skipping...

...

Training a LinearGAM model using the same dataset and grid search options does not give rise to the same error.

gam = LinearGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))
100% (6 of 6) |###########################| Elapsed Time: 0:00:00 Time: 0:00:00

Does this occur because there are more coefficients than data? If so, a more informative warning would be helpful.

bug

opened by maxpagels 18

any offsets?

Hi, first of all, this is a great package! Is it possible to declare an offset or exposure variable, i.e. a regressor whose coefficient is fixed to 1?
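In a GLM with a log link, an exposure enters as an offset: log(mu) = log(exposure) + X @ beta, i.e. a column log(exposure) whose coefficient is fixed to 1, so mu = exposure * exp(X @ beta). The pyGAM release notes (v0.3.0) say PoissonGAM accepts exposure; the sketch below only illustrates the offset idea on a plain Poisson GLM fitted by IRLS in numpy, with simulated data, and `poisson_irls` is a hypothetical name.

```python
import numpy as np


def poisson_irls(X, y, exposure=None, n_iter=100, tol=1e-10):
    """Fit log(mu) = log(exposure) + X @ beta by iteratively reweighted least squares."""
    n, p = X.shape
    offset = np.zeros(n) if exposure is None else np.log(exposure)  # coefficient fixed to 1
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = offset + X @ beta
        mu = np.exp(eta)
        z = X @ beta + (y - mu) / mu        # working response, offset removed
        w = mu                              # IRLS weights for the canonical log link
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
exposure = rng.uniform(1, 5, n)            # e.g. time at risk per observation
true_beta = np.array([0.5, -1.0])
y = rng.poisson(exposure * np.exp(X @ true_beta))
beta_hat = poisson_irls(X, y, exposure=exposure)
```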

opened by ric70x7 14

scikit-sparse installed but not detected?

I'm trying to use LogisticGAM with a really sparse pandas dataframe. But I'm getting this warning:

/home/echo66/.local/share/virtualenvs/pygam-505CBPMV/lib/python3.6/site-packages/pygam/utils.py:78: UserWarning: Could not import Scikit-Sparse or Suite-Sparse.
This will slow down optimization for models with monotonicity/convexity penalties and many splines.
See installation instructions for installing Scikit-Sparse and Suite-Sparse via Conda.
  warnings.warn(msg)
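One frequent cause is that scikit-sparse was installed into a different environment than the one running Python (the path above points to a pipenv virtualenv, while the README suggests installing scikit-sparse with Conda). A quick check, assuming pyGAM looks for `sksparse.cholmod`, the module where scikit-sparse provides its Cholesky routines (pyGAM's exact import in `utils.py` may differ):

```python
import sys


def cholmod_available():
    """Report the running interpreter and whether sksparse.cholmod imports from it."""
    print("python:", sys.executable)
    try:
        from sksparse.cholmod import cholesky  # noqa: F401
    except ImportError as exc:
        print("sksparse.cholmod not importable here:", exc)
        return False
    print("sksparse.cholmod is importable")
    return True


available = cholmod_available()
```

If it returns False, the scikit-sparse install did not land in the environment shown by `sys.executable`.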

opened by echo66 11

add constraints

Some penalties should really be constraints. For example, monotonic smoothing and harmonic smoothing should be hard constraints.

Perhaps they could also be added as penalties, but a basic application would be as constraints.

opened by dswah 10
How to get rid of the progress bar in pyGAM program?

Very good and effective library!
I want to embed pyGAM in my own Python program, but it displays a progress bar in each loop, which makes it take a long time to run. If I could remove the progress bar, the program would run much faster. (The screenshot of the issue is below.) How can I get rid of the progress bar in pyGAM without changing any other functionality? Could you help me? Thanks in advance!

enhancement good first issue

opened by sunshine1204 9
Add method for simulating from the posterior (or just add an example to the documentation)
Estimating the mean and confidence intervals (using prediction_intervals) is great. In some cases, it can be useful to simulate from the posterior distribution of the model's coefficients. An example is given in pages 242–243 of [1].

I think the following code snippet does the trick for a LinearGAM:

import numpy as np

def simulate_from_posterior(linear_gam, X, n_simulations):
    """Simulate from the posterior of a LinearGAM a certain number of times.

    Inputs
    ------
    linear_gam : pyGAM.LinearGAM
    X : array of shape (n_samples, n_features)
    n_simulations : int
        The number of simulations from the posterior to compute

    Returns
    -------
    simulations : array of shape (n_samples, n_simulations)
    """
    beta_replicates = np.random.multivariate_normal(
        linear_gam.coef_, linear_gam.statistics_['cov'], size=n_simulations)
    return linear_gam._modelmat(X).dot(beta_replicates.T)

I'm not sure if this should be added as an example in the documentation or added to the code (or both).

To implement this in general, I think we'd want to add a method for each Distribution that draws a certain number of samples (called sample or random_variate?), so we'd have NormalDist.sample, BinomialDist.sample, and so on. Then GAM.simulate could just call self.dist.sample(self.coef_, self.statistics_['cov'], size=n_simulations)? I'm not sure yet how best to handle the link functions for these simulations...

As pointed out on pages 256–257 of [1], this procedure simulates the coefficients conditioned on the smoothing parameters, lambda (lam). To simulate the coefficients without conditioning on lam, one may use bootstrap samples to get simulations of both the coefficients and the smoothing parameters; an example is given on page 257 of [1].

[1] S. Wood. Generalized Additive Models: An Introduction with R (First Edition). Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2006.
opened by cbrummitt 9
Added Sphinx-based documentation, updated requirements.txt
Hi, I discovered your project last week and have already used it to great effect, and thought you might like some docs. :smile:

The commit message is below. I've hosted a copy of the HTML docs here; if you think this is worthwhile, it's probably best to use readthedocs (hence my choice of theme). A few more details on the changes:

I recreated the text and code from the README as a Jupyter notebook; only the image pygam_basis.png needed to be included, as all the others are generated by the code

It's easy to add a link to the notebook's source on github from the notebook itself, so people can download and run it themselves

This uses nbsphinx, which is awesome

The API docs are largely generated with autodoc, which uses .. autoX:: directives; in the cases of links and callbacks I documented each class individually; for distributions and penalties you can see these are done at the module level with automodule::.

Numpy-style docstrings are a great choice--they mean you can use napoleon to generate documentation from them; the scikit-learn docs are a great example of what can be created automatically from the docstrings.

There's a lot more to be done, in particular incorporating points from Pablo's post in more involved examples for classification and regression, deciding a better structure for the documentation overall, and of course working on the docstrings in the code. I didn't want to spend too much time on it before checking that you thought it worthwhile!

Commit message follows

Sphinx-based documentation has been added to the project; this comes from two main sources:

Main modules added using autodoc and napoleon

These are split into two sections, GAM classes and other helper classes and functions

Each GAM has its own page, while the helper classes and functions are grouped by module

Minor changes to docstrings in pygam.py to fix formatting issues in the output docs; there are a lot more of these to tackle but in general everything is pretty readable

The text and code from the README, recreated as a Jupyter notebook and imported using nbsphinx

Updated requirements.txt with a Documentation section, detailing the different modules necessary to build the documentation.
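For reference, a minimal `docs/conf.py` wiring up the pieces mentioned above might look like the sketch below. The file path, the `sphinx_rtd_theme` theme (a guess at the Read the Docs theme mentioned earlier), and the exclude pattern are assumptions, not taken from the PR.

```python
# docs/conf.py: minimal Sphinx configuration sketch (assumed layout)
project = "pyGAM"

extensions = [
    "sphinx.ext.autodoc",    # API pages from docstrings via .. auto*:: directives
    "sphinx.ext.napoleon",   # parse NumPy-style docstrings for autodoc
    "nbsphinx",              # render Jupyter notebooks (e.g. the README notebook)
]

napoleon_numpy_docstring = True
html_theme = "sphinx_rtd_theme"
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```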
opened by badge 8
LogisticGAM not converging

I keep getting the error below when trying to train a LogisticGAM. This error does not happen when I use sklearn LogisticRegression or RandomForest.

/opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/links.py:149: RuntimeWarning: divide by zero encountered in true_divide
  return dist.levels/(mu*(dist.levels - mu))
/opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/pygam.py:888: RuntimeWarning: invalid value encountered in multiply
  return sp.sparse.diags((self.link.gradient(mu, self.distribution)**2 * self.distribution.V(mu=mu))**-0.5)
/opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/pygam.py:907: RuntimeWarning: invalid value encountered in greater_equal
  mask = (np.abs(weights) >= np.sqrt(EPS)) * (weights != np.nan)
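The first warning comes from the logit link's derivative, levels / (mu * (levels - mu)), which divides by zero once a fitted mean hits exactly 0 or `levels`; the resulting inf/NaN values then propagate into the IRLS weights in the later warnings. Fitted probabilities of exactly 0 or 1 can be a sign of (quasi-)complete separation, for example a feature that perfectly splits the classes, so that is worth checking first. As an illustration only (not pyGAM's code; `logit_gradient` is a hypothetical name), clipping mu strictly inside (0, levels) keeps the derivative finite:

```python
import numpy as np

EPS = np.finfo(float).eps


def logit_gradient(mu, levels=1.0, clip=True):
    """d(eta)/d(mu) for the logit link eta = log(mu / (levels - mu))."""
    mu = np.asarray(mu, dtype=float)
    if clip:
        # keep mu strictly inside (0, levels) so the denominator is never zero
        mu = np.clip(mu, EPS * levels, levels * (1 - EPS))
    return levels / (mu * (levels - mu))


safe = logit_gradient([0.0, 0.5, 1.0])
with np.errstate(divide="ignore"):
    unsafe = logit_gradient([0.0, 0.5, 1.0], clip=False)   # inf at both ends
```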

opened by jeweinberg 8
Score method

Hi @dswah, I have added a GAM.score method to the base GAM class, following from issue #102. I have also added a simple test that checks that the score method runs without crashing and that the score, which is R^2, is <= 1.

Currently it only calculates R^2, which is the default score in scikit-learn for regression models. I can also add an accuracy score specifically to the LogisticGAM class.
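For reference, R^2 as scikit-learn defines it for regression is 1 - SS_res / SS_tot: it equals 1 for a perfect fit, 0 for a model that always predicts the mean of y, and can be arbitrarily negative for worse models, which is why a test can only check the upper bound. A minimal numpy sketch of both scores (function names are illustrative, not the PR's code):

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of predicted class labels that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


perfect = r2_score([1, 2, 3], [1, 2, 3])       # 1.0
mean_only = r2_score([1, 2, 3], [2, 2, 2])     # 0.0
worse = r2_score([1, 2, 3], [3, 2, 1])         # -3.0
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])     # 0.75
```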

I am quite new to github and this whole open source concept, so I would appreciate any feedback! Let me know if the code makes sense or if I am completely off. Thanks!

opened by JodesL 7
constraint doesn't work for s() when there is tensor term te() in the model

using the "chicago example"

from pygam import PoissonGAM, s, te
from pygam.datasets import chicago

X, y = chicago(return_X_y=True)

gam = PoissonGAM(s(0, n_splines=200) + te(3, 1) + s(2)).fit(X, y)
gam_test = PoissonGAM(s(0, constraints='monotonic_inc') + te(3, 1) + s(2)).fit(X, y)

TypeError                                 Traceback (most recent call last)
in ()
      6 gam3 = PoissonGAM(s(0, n_splines=200) + te(3, 1) + s(2)).fit(X, y)
      7
----> 8 gam_test = PoissonGAM(s(0, constraints='monotonic_inc') + te(3, 1) + s(2)).fit(X, y)

...

anaconda/lib/python2.7/site-packages/pygam/terms.py in build_constraints(self, coef, constraint_lam, constraint_l2)
    363 if constraint is None:
    364     constraint = 'none'
--> 365 if constraint in CONSTRAINTS:
    366     constraint = CONSTRAINTS[constraint]
    367

TypeError: unhashable type: 'list'
bug

opened by jzang18 7
When Creating X Grid for 3D Plotting or Derivative Calculation, How Can We Include Exposure?

I'm having trouble understanding how to incorporate exposure when predicting on an X grid. The grid is needed when plotting or taking derivatives. Any advice or guidance here would be appreciated!

opened by eddietaylor 0
One-hot encoding factor term

Hi,

I have an array as: [[ '1234' 0.123 'GitHub']]

and I want to pass the third feature as a factor term f(2, coding='one-hot'). However, the encoding fails with this error:

ValueError: X data must be type int or float, but found type: <class 'numpy.object_'>
Try transforming data with a LabelEncoder first.

The error is raised by utils.check_array.

Any suggestion to overcome this issue?
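As the error message suggests, the string column has to be integer-coded before it reaches pyGAM. A minimal sketch without scikit-learn, using `numpy.unique(..., return_inverse=True)`, which produces the same kind of sorted integer codes as `LabelEncoder`. The second and third rows are made-up data for illustration, and it is an assumption (not verified here) that `f(2, coding='one-hot')` then accepts the resulting float array.

```python
import numpy as np

X_raw = np.array([["1234", 0.123, "GitHub"],
                  ["5678", 0.456, "GitLab"],    # hypothetical extra rows
                  ["9012", 0.789, "GitHub"]], dtype=object)

# integer-code the categorical column; `levels` keeps the mapping back to labels
levels, codes = np.unique(X_raw[:, 2].astype(str), return_inverse=True)

X = np.column_stack([
    X_raw[:, 0].astype(float),   # numeric strings -> float
    X_raw[:, 1].astype(float),
    codes,                       # 0 = "GitHub", 1 = "GitLab"
]).astype(float)
```

`levels[codes]` maps the codes back to the original labels, which is handy when labelling plots of the factor term.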

opened by ilagith 0
Pass callbacks argument to LinearGAM super init - fixes #291

Passes the callbacks argument to LinearGAM's super().__init__() call.

This basically addresses issue #291 regarding cloning LinearGAM estimator with sklearn functions, like cross_validate.

opened by miguelfmc 0
Can't use flit to install pygam - pyproject.toml does not exist

I have been trying to install pygam from source using flit, as indicated in the docs, but I am unable to do so.

I have been running the following (from the project's root directory):

pip install flit
flit install -s

And after running flit install I get the following error:

Config file pyproject.toml does not exist

So it seems like the config toml file for installation is missing.

Environment details: Python 3.6.13, Flit 3.7.1

opened by miguelfmc 1
Difference between prediction intervals and partial dependence (question)

Hi, thanks for the great package! Unfortunately, I couldn't fully understand the difference between prediction intervals and partial dependence when I have a model y ~ s(0), i.e. only one feature. The two functions produce different intervals even though I request the same interval width (.95). Thanks in advance, Yifat
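In general (independently of pyGAM), a confidence interval for a fitted function reflects only the uncertainty in the estimated mean, while a prediction interval for a new observation also includes the noise variance, so at the same nominal width the prediction interval is wider. If `partial_dependence` reports a confidence interval for the term (my reading of its `width` parameter, not verified here), different intervals are expected. The ordinary-least-squares sketch below, with simulated data and a normal quantile of 1.96, shows the two standard errors side by side:

```python
import numpy as np

# simulated data for y ~ x with Gaussian noise (sigma = 0.5)
rng = np.random.default_rng(1)
n = 200
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])      # estimated noise variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients

x0 = np.array([1.0, 0.5])                      # design row for x = 0.5
fit0 = x0 @ beta
se_mean = np.sqrt(x0 @ cov_beta @ x0)          # uncertainty of the fitted mean
se_pred = np.sqrt(se_mean ** 2 + sigma2)       # plus noise of a new observation
z = 1.96                                       # ~95% two-sided normal quantile

confidence_interval = (fit0 - z * se_mean, fit0 + z * se_mean)
prediction_interval = (fit0 - z * se_pred, fit0 + z * se_pred)
```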

opened by Yifath7 0

Releases(v0.8.0)

v0.8.0(Oct 31, 2018)
New Features

cyclic p-splines: you can now train models with periodic features by using the 'cp' basis like so:

GAM(s(0, basis='cp'))

factor smooths now allow dummy coding, via:

GAM(f(0, coding='dummy'))

Models using this coding scheme are more statistically interpretable and computationally less expensive than those using one-hot encodings.
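To see where the savings come from: one-hot coding uses one column per level, while dummy coding drops one level and lets the intercept absorb it as the baseline. With an intercept, the one-hot columns sum to the intercept column and are therefore collinear, so without a penalty their coefficients would not be identifiable. A numpy sketch (helper names are illustrative, not pyGAM's):

```python
import numpy as np


def one_hot(codes, n_levels):
    """One indicator column per level."""
    return np.eye(n_levels)[codes]


def dummy(codes, n_levels):
    """Drop the first level's column; that level becomes the baseline."""
    return one_hot(codes, n_levels)[:, 1:]


codes = np.array([0, 2, 1, 2])
intercept = np.ones((len(codes), 1))
with_one_hot = np.hstack([intercept, one_hot(codes, 3)])   # 4 columns, rank 3
with_dummy = np.hstack([intercept, dummy(codes, 3)])       # 3 columns, rank 3
```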

Bug Fixes

models can mix constrained terms and un-constrained tensor-terms

tensor terms can be constrained

v0.7.2(Oct 29, 2018)
Bug Fixes

Fixed a bug in terms.py in the check for non-None elements, thanks @BeefOnionDumplings!

Added a warning issued in summary indicating that there is likely a bug in the p-values

v0.7.1(Sep 22, 2018)
Bug Fixes

fixed bug where np.int64 did not count as integers. the following no longer fails:

LinearGAM().gridsearch(X, y, n_splines=np.arange(5, 10)).summary()
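The underlying issue is a Python 3 detail: NumPy integer scalars such as `np.int64` do not subclass `int`, so an `isinstance(n, int)` check rejects values taken from `np.arange`. NumPy does register its integer types with `numbers.Integral`, so a check along these lines accepts both (a sketch; `is_integer` is a hypothetical helper, not pyGAM's code):

```python
import numbers

import numpy as np


def is_integer(value):
    """True for Python ints and NumPy integer scalars, False for bools and floats."""
    return isinstance(value, numbers.Integral) and not isinstance(value, bool)


n_splines = np.arange(5, 10)[0]              # a NumPy integer scalar, not a Python int
print(isinstance(n_splines, int))            # False
print(is_integer(n_splines))                 # True
```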
v0.7.0(Sep 20, 2018)
New Features

Documentation!!!

Bug Fixes

removed WIP method randomsearch

v0.6.3(Sep 17, 2018)
New Features

gridsearch(...) allows searching across a predefined grid of points, without doing the cartesian product, when grid is a np.ndarray of shape (n_points, len(flatten(gam.lam))). This is useful for RandomizedSearchCV-style behavior.
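To illustrate the difference: a cartesian product over a per-parameter grid grows exponentially with the number of smoothing parameters, while a predefined array of shape (n_points, n_lam) fixes the number of candidate models up front. The sketch assumes a model with 3 smoothing parameters and a log-uniform range of 1e-3 to 1e3, both made up; the random array is the kind of grid the note above describes passing to `gridsearch`.

```python
import itertools

import numpy as np

n_lam = 3                                   # assumed: len(flatten(gam.lam)) == 3
lam_values = np.logspace(-3, 3, 11)         # 11 candidate values per parameter

# cartesian product: every combination, 11 ** 3 = 1331 candidate models
cartesian = np.array(list(itertools.product(lam_values, repeat=n_lam)))

# predefined grid: 50 log-uniform random points, shape (n_points, n_lam)
rng = np.random.default_rng(0)
random_points = 10 ** rng.uniform(-3, 3, size=(50, n_lam))
# e.g. gam.gridsearch(X, y, lam=random_points)  # evaluates only these 50 points
```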

Bug Fixes

estimate_r_squared(X, y) no longer raises AttributeError

dtype=auto no longer allowed for terms

intercept.lam = None

v0.6.1(Sep 13, 2018)
New Features

easier global arguments for terms

GAM(s(0) + s(1), n_splines=10).fit(X, y)

will broadcast n_splines=10 to all terms

Bug Fixes

fixed inconsistencies in GAM instantiation, where

GAM(lam=0.6).gridsearch(X, y)

worked for multi-dimensional X

but not

GAM(lam=0.6).gridsearch(X, y)
0.6.0(Sep 9, 2018)
New Features

tensor product terms and feature interactions. On top of that, construction is more precise and less verbose:

GAM(te(0, s(1, n_splines=5))).fit(X, y)

the partial_dependence() method can return meshgrids to help you make 3D plots of interaction terms

ExpectileGAM: for creating a non-parametric description of a distribution. Instead of just modeling the mean of a response, we can model any quantile using

ExpectileGAM().fit_quantile(X, y, quantile=0.25)
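Expectiles are the asymmetric least-squares analogue of quantiles: residuals above the fit get weight tau and residuals below get weight 1 - tau. One plausible way to hit a target quantile, and the approach this sketch assumes (it is not taken from pyGAM's source), is to bisect on tau until the desired fraction of observations lies at or below the fitted expectile. The sketch uses a constant model (no features), plain numpy, simulated data, and hypothetical function names.

```python
import numpy as np


def expectile(y, tau, n_iter=200, tol=1e-12):
    """Sample tau-expectile: minimizer of sum(|tau - 1(y <= m)| * (y - m) ** 2)."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(n_iter):
        w = np.where(y > m, tau, 1.0 - tau)      # asymmetric squared-error weights
        m_new = np.sum(w * y) / np.sum(w)        # weighted mean = next estimate
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def tau_for_quantile(y, quantile, tol=1e-6):
    """Bisect on tau until a fraction `quantile` of y lies at or below the expectile."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        tau = (lo + hi) / 2
        if np.mean(y <= expectile(y, tau)) < quantile:
            lo = tau                             # expectile too low: raise tau
        else:
            hi = tau
    return (lo + hi) / 2


rng = np.random.default_rng(0)
y = rng.normal(size=1000)
tau = tau_for_quantile(y, 0.25)
m25 = expectile(y, tau)
```

At tau = 0.5 the weights are equal, so the expectile reduces to the ordinary mean.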

Breaking Changes

GAM construction is different but much simpler. check out the docstrings for help.

generate_X_grid and partial_dependence methods require you to specify term= instead of feature=

v0.5.5(Jul 7, 2018)
New Features

all GAM classes have a verbose argument. This makes them compatible with sklearn GridSearchCV and RandomizedSearchCV

add toy_classification dataset

move generate_X_grid to GAM method

Bug Fixes

users should get a more pythonic experience with partial_dependence by never needing to index with i+1

_initial_estimate() method no longer fails on value nudge for purely integer observations

regenerate images

bugs in readme

fixes bug where poorly conditioned matrix would fail when using skcholmod

make2d should not be verbose in initial_estimate()

v0.5.4(Jun 29, 2018)
Bug Fixes

PoissonGAM no longer produces -inf log-likelihoods when using non-integer exposure.

PoissonGAM checks exposure, weight, and y array shapes before fitting.
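Related background: computing the Poisson log-likelihood with log-gamma, log(y!) = lgamma(y + 1), instead of a factorial keeps it finite for non-integer responses (for example counts divided by exposure) and avoids overflow for large counts. The release note does not say how the fix was implemented, so the following is only a sketch of that standard technique using the standard library; `poisson_logpmf` is a hypothetical name.

```python
import math


def poisson_logpmf(y, mu):
    """log of mu**y * exp(-mu) / y!, with log(y!) computed as lgamma(y + 1)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


direct = math.log(3.0 ** 2 * math.exp(-3.0) / math.factorial(2))
via_lgamma = poisson_logpmf(2, 3.0)
non_integer = poisson_logpmf(2.5, 3.0)     # finite, no factorial of 2.5 needed
large = poisson_logpmf(1000, 900.0)        # 1000! would overflow a float
```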

v0.5.3(Jun 28, 2018)
Bug Fixes

datasets are loadable like:

from pygam.datasets import cake
X, y = cake(return_X_y=True)

better model initializations for complex models, using the solution to the linear, unpenalized problem. This makes the second-order PIRLS optimizer less likely to diverge by overshooting the maximum likelihood estimate.

ReadMe call for collaboration, examples reference dataset loaders, fix typos

v0.5.2(Apr 22, 2018)
Bug Fixes

bug fix in p-value for models with unknown variance. f-statistic was sensitive to estimated variance when it should be invariant.

typos

v0.5.1(Apr 6, 2018)
New Features:

p-values!

you can now see p-values in the model summary. Each feature function will have a p-value, and a code describing its level of significance.

Bug Fixes

improving documentation

v0.4.2(Apr 4, 2018)
Bug Fixes

use scipy stats log-pdfs for computing log-likelihoods

disable progress bars in gridsearch by setting progress=False

add verbosity attribute to GAMs to control warnings

v0.4.1(Mar 27, 2018)
Bug Fixes:

allow for changing SVD shapes during PIRLS iterations due to changing mask shapes

change coefficient initialization to constant model

change GammaGAM and InvGaussGAM to use non-canonical log-links by default.

v0.4.0(Jan 22, 2018)
New Features

all GAMs have a sample() method that samples:

response variables,

model coefficients,

and expected values from the posterior probability thanks to @cbrummitt !!! :1st_place_medal:

all distributions have a sample(mu) method

Bug fixes

can now raise to negative power

confidence and prediction intervals use correct degrees of freedom

all public methods that accept data check for finite data

Improvements

fixes to documentation

v0.3.0(Sep 15, 2017)
New Features:

GAMs accept weights in fitting, gridsearch, likelihood, statistics...

PoissonGAM accepts exposure

Changes

better handling of PIRLS weights

check for isfinite(...).all() in check_X, check_y

Bug Fixes

constant covariates won't break SVD


Owner

daniel servén

GitHub https://pygam.readthedocs.io

Model factory is a ML training platform to help engineers to build ML models at scale

Model Factory Machine learning today is powering many businesses today, e.g., search engine, e-commerce, news or feed recommendation. Training high qu

16 Sep 23, 2022

A collection of interactive machine-learning experiments: 🏋️models training + 🎨models demo

Interactive Machine Learning experiments: 🏋️models training + 🎨models demo

1.4k Jan 6, 2023

AtsPy: Automated Time Series Models in Python (by @firmai)

Automated Time Series Models in Python (AtsPy) SSRN Report Easily develop state of the art time series models to forecast univariate data series. Simp

465 Jan 2, 2023

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Petastorm Contents Petastorm Installation Generating a dataset Plain Python API Tensorflow API Pytorch API Spark Dataset Converter API Analyzing petas

1.6k Dec 31, 2022

ArviZ is a Python package for exploratory analysis of Bayesian models

ArviZ (pronounced "AR-vees") is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, model checking, comparison and diagnostics

1.3k Jan 5, 2023

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

152 Jan 2, 2023

FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically

FLAML - Fast and Lightweight AutoML

2.2k Jan 9, 2023

Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

95 Dec 28, 2022

MIT-Machine Learning with Python–From Linear Models to Deep Learning

MIT-Machine Learning with Python–From Linear Models to Deep Learning | One of the 5 courses in MIT MicroMasters in Statistics & Data Science Welcome t

2 Aug 23, 2022

SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.

SageMaker Python SDK SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the S

1.8k Jan 1, 2023

Formulae is a Python library that implements Wilkinson's formulas for mixed-effects models.

formulae formulae is a Python library that implements Wilkinson's formulas for mixed-effects models. The main difference with other implementations li

34 Dec 21, 2022

Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

Highly interpretable, sklearn-compatible classifier based on decision rules This is a scikit-learn compatible wrapper for the Bayesian Rule List class

482 Nov 19, 2022

Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

Time series analysis today is an important cornerstone of quantitative science in many disciplines, including natural and life sciences as well as eco

129 Dec 24, 2022

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Auto_TS: Auto_TimeSeries Automatically build multiple Time Series models using a Single Line of Code. Now updated with Dask. Auto_timeseries is a comp

519 Jan 3, 2023

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

Hivemind: decentralized deep learning in PyTorch Hivemind is a PyTorch library to train large neural networks across the Internet. Its intended usage

1.3k Jan 8, 2023

A Lucid Framework for Transparent and Interpretable Machine Learning Models.

Currently a Beta-Version lucidmode is an open-source, low-code and lightweight Python framework for transparent and interpretable machine learning mod

15 Aug 12, 2022

Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way

Apache Liminals goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

121 Dec 28, 2022

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking. TF-DF is a TensorFlow wrapper around the Yggdrasil Decision Forests C++ libraries. Models trained with TF-DF are compatible with Yggdrasil Decision Forests' models, and vice versa.

538 Jan 1, 2023

Evidently helps analyze machine learning models during validation or production monitoring

Evidently helps analyze machine learning models during validation or production monitoring. The tool generates interactive visual reports and JSON profiles from pandas DataFrame or csv files. Currently 6 reports are available.

3.1k Jan 7, 2023
