GAM timeseries modeling with auto-changepoint detection. Inspired by Facebook Prophet and implemented in PyMC3

Overview

pm-prophet


PyMC3-based universal time series prediction and decomposition library (inspired by Facebook Prophet). While Facebook Prophet is a well-defined model, pm-prophet allows total flexibility in the choice of priors and is therefore potentially suited to a wider class of estimation problems.

⚠️ Only supports Python 3


Installing pm-prophet

PM-Prophet installation is straightforward using pip: pip install pmprophet

Note that the key dependency of pm-prophet is PyMC3, a library that depends on Theano.

Key Features

  • Nowcasting & Forecasting
  • Intercept, growth
  • Regressors
  • Holidays
  • Additive & multiplicative seasonality
  • Fitting and plotting
  • Custom choice of priors (not in the original Facebook Prophet model)
  • Changepoints in growth
  • Automatic changepoint location detection (not in the original Facebook Prophet model)
  • Fitting with NUTS/ADVI/Metropolis
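Changepoints in growth can be pictured with Facebook Prophet's continuous piecewise-linear trend: the base growth rate k is adjusted by delta_j from each changepoint s_j onwards, and the offset is shifted by -s_j * delta_j so the curve does not jump at s_j. The NumPy sketch below illustrates that formulation only; it assumes Prophet's parameterisation (pm-prophet's internal one may differ), and piecewise_linear_trend is a hypothetical helper, not part of the library.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Prophet-style continuous piecewise-linear trend (illustrative sketch).

    t            : array of time values
    k, m         : base growth rate and offset (intercept)
    changepoints : locations s_j where the growth rate may change
    deltas       : rate adjustments delta_j applied from s_j onwards
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    delta = np.asarray(deltas, dtype=float)
    # A[i, j] = 1 once time t_i has reached changepoint s_j
    A = (t[:, None] >= s[None, :]).astype(float)
    # gamma_j = -s_j * delta_j shifts the offset so the trend stays continuous
    gamma = -s * delta
    return (k + A @ delta) * t + (m + A @ gamma)

# Growth rate 1 until t = 0.5, then 1 + (-2) = -1 afterwards
t = np.linspace(0.0, 1.0, 11)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[0.5], deltas=[-2.0])
```

The trend rises as t up to the changepoint and then falls as 1 - t, meeting at 0.5 without a jump.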

Experimental warning ⚠️

  • Note that automatic changepoint detection is experimental

Differences with Prophet:

  • Saturating growth is not implemented
  • Uncertainty estimation is different
  • All components (including seasonality) need to be explicitly added to the model
  • By design, pm-prophet places a big emphasis on posteriors and uncertainty estimates, and therefore it won't use MAP for its estimates.
  • While Facebook Prophet is a well-defined, fixed model, pm-prophet allows total flexibility in the choice of priors and is therefore potentially suited to a wider class of estimation problems
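Because pm-prophet works with posterior draws rather than a MAP point estimate, interval summaries are naturally computed from those draws. A minimal sketch, assuming an array of posterior predictive draws of shape (n_draws, n_timesteps) and assuming that an alpha such as the alpha=0.2 passed to predict() in the Peyton Manning example corresponds to a central (1 - alpha) interval. credible_band is a hypothetical helper and the draws are synthetic; this is not pm-prophet's own code.

```python
import numpy as np

def credible_band(draws, alpha=0.2):
    """Per-time-step median and central (1 - alpha) interval of posterior draws.

    draws: array of shape (n_draws, n_timesteps)
    """
    lower = np.quantile(draws, alpha / 2, axis=0)
    median = np.median(draws, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    return lower, median, upper

# Synthetic "posterior predictive" draws, for illustration only
rng = np.random.default_rng(1)
n_draws, n_timesteps = 4000, 30
draws = rng.normal(loc=np.linspace(0.0, 1.0, n_timesteps), scale=0.1,
                   size=(n_draws, n_timesteps))

lower, median, upper = credible_band(draws, alpha=0.2)  # 80% band per time step
```

Unlike a single MAP value, the band widens or narrows with the spread of the posterior at each time step.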

Peyton Manning example

Predicting the Peyton Manning timeseries:

import pandas as pd
from pmprophet.model import PMProphet, Sampler

df = pd.read_csv("examples/example_wp_log_peyton_manning.csv")
df = df.head(180)

# Fit both growth and intercept
m = PMProphet(df, growth=True, intercept=True, n_changepoints=25, changepoints_prior_scale=.01, name='model')

# Add monthly seasonality (order: 3)
m.add_seasonality(seasonality=30, fourier_order=3)

# Add weekly seasonality (order: 3)
m.add_seasonality(seasonality=7, fourier_order=3)

# Fit the model (using NUTS)
m.fit(method=Sampler.NUTS)

ddf = m.predict(60, alpha=0.2, include_history=True, plot=True)
m.plot_components(
    intercept=False,
)

(Figures: Model, Seasonality-7, Seasonality-30, Growth, Change Points)
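The add_seasonality(seasonality=..., fourier_order=...) calls suggest that each seasonal component is a truncated Fourier series with the given period and order, as in Facebook Prophet. The sketch below builds such a basis with NumPy and combines it with a trend both additively and multiplicatively. It assumes Prophet's formulation (2 * fourier_order cosine/sine columns; multiplicative seasonality as trend * (1 + s(t))) and one time step per day; pm-prophet's internals may differ, and fourier_features is a hypothetical helper.

```python
import numpy as np

def fourier_features(t, period, order):
    """Fourier basis: cos(2*pi*n*t/period) and sin(2*pi*n*t/period) for n = 1..order."""
    t = np.asarray(t, dtype=float)
    n = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t, n) / period       # shape (len(t), order)
    return np.hstack([np.cos(angles), np.sin(angles)])   # shape (len(t), 2 * order)

t = np.arange(28)  # 28 time steps, assumed to be days
X_week = fourier_features(t, period=7, order=3)    # cf. add_seasonality(seasonality=7, fourier_order=3)
X_month = fourier_features(t, period=30, order=3)  # cf. add_seasonality(seasonality=30, fourier_order=3)

rng = np.random.default_rng(0)
# Small coefficients, in the spirit of the default Laplace(0, 0.05) seasonality prior
beta = rng.laplace(0.0, 0.05, size=X_week.shape[1])
seasonal = X_week @ beta

trend = 5.0 + 0.1 * t
additive = trend + seasonal                # y = trend + s(t)
multiplicative = trend * (1.0 + seasonal)  # y = trend * (1 + s(t))
```

With the additive form the seasonal swing has a fixed size; with the multiplicative form it scales with the trend.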

Custom Priors

One of the main reasons PMProphet was built is to allow custom priors in the modeling.

The default priors are:

  Variable       Prior         Parameters
  regressors     Laplace       loc: 0, scale: 2.5
  holidays       Laplace       loc: 0, scale: 2.5
  seasonality    Laplace       loc: 0, scale: 0.05
  growth         Laplace       loc: 0, scale: 10
  changepoints   Laplace       loc: 0, scale: 2.5
  intercept      Normal        loc: y.mean, scale: 2 * y.std
  sigma          Half-Cauchy   tau: 10
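To get a feel for how tight these defaults are, one can draw from each prior with NumPy and look at where most of the mass lies. This is a prior-only illustration, not pm-prophet code: the series y is made up, the Half-Cauchy "tau: 10" is assumed to be its scale parameter, and inside the model the priors may act on a rescaled y.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical target series, used only for the intercept prior's loc and scale
y = rng.normal(loc=10.0, scale=3.0, size=365)

prior_draws = {
    "regressors":   rng.laplace(0.0, 2.5, size=n),
    "holidays":     rng.laplace(0.0, 2.5, size=n),
    "seasonality":  rng.laplace(0.0, 0.05, size=n),
    "growth":       rng.laplace(0.0, 10.0, size=n),
    "changepoints": rng.laplace(0.0, 2.5, size=n),
    "intercept":    rng.normal(y.mean(), 2 * y.std(), size=n),
    # Half-Cauchy with scale 10 (assumed meaning of tau): |10 * standard Cauchy|
    "sigma":        10.0 * np.abs(rng.standard_cauchy(size=n)),
}

for name, draws in prior_draws.items():
    lo, hi = np.percentile(draws, [5, 95])
    print(f"{name:>12}: 90% of prior mass in [{lo:.3g}, {hi:.3g}]")
```

For a Laplace(0, b) prior, 90% of the mass lies within plus or minus b * ln(10), about 2.3 * b, so the default seasonality prior keeps each Fourier coefficient roughly within plus or minus 0.115 a priori.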

You can change the model priors by inspecting and modifying the distributions stored in m.priors, which is a dictionary of the form {prior: pymc3-distribution}.

In the example below, we model an additive time series and impose a "positive coefficients" constraint by using an Exponential distribution instead of a Laplace distribution as the prior for the regressors.

import pandas as pd
import numpy as np
import pymc3 as pm
from pmprophet.model import PMProphet, Sampler

n_timesteps = 100
n_regressors = 20

regressors = np.random.normal(size=(n_timesteps, n_regressors))
coeffs = np.random.exponential(size=n_regressors) + np.random.normal(size=n_regressors)
# Note that min(coeffs) could be negative due to the white noise

regressors_names = [str(i) for i in range(n_regressors)]

df = pd.DataFrame()
df['y'] = np.dot(regressors, coeffs)
df['ds'] = pd.date_range('2017-01-01', periods=n_timesteps)
for idx, regressor in enumerate(regressors_names):
    df[regressor] = regressors[:, idx]

m = PMProphet(df, growth=False, intercept=False, n_changepoints=0, name='model')

with m.model:
    # Remember to append the _<model-name> suffix to custom prior names
    m.priors['regressors'] = pm.Exponential('regressors_%s' % m.name, 1, shape=n_regressors)

for regressor in regressors_names:
    m.add_regressor(regressor)

m.fit(
    draws=10 ** 4,
    method=Sampler.NUTS,
)
m.plot_components()

(Figure: Regressors)
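The Exponential prior restricts the regressor coefficients to non-negative values. As a quick non-Bayesian sanity check on the same kind of synthetic data, the sketch below swaps in a different technique: non-negative least squares solved by projected gradient descent. It is only a point-estimate analogue of the positivity constraint, not what pm-prophet does, and nnls_projected_gradient is a hypothetical helper.

```python
import numpy as np

def nnls_projected_gradient(A, b, n_iter=5000):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        # Gradient step, then projection onto the non-negative orthant
        x = np.maximum(0.0, x - step * grad)
    return x

# Synthetic data generated the same way as in the custom-priors example (noise-free)
rng = np.random.default_rng(0)
n_timesteps, n_regressors = 100, 20
X = rng.normal(size=(n_timesteps, n_regressors))
coeffs = rng.exponential(size=n_regressors) + rng.normal(size=n_regressors)
y = X @ coeffs

ols = np.linalg.lstsq(X, y, rcond=None)[0]  # unconstrained: recovers coeffs, signs included
nonneg = nnls_projected_gradient(X, y)      # constrained: every estimate is >= 0

print("true negative coefficients:", np.sum(coeffs < 0))
print("negative OLS estimates:    ", np.sum(ols < 0))
print("negative NNLS estimates:   ", np.sum(nonneg < 0))
```

Where the true coefficient is negative, the constrained fit can only push the estimate to zero, which mirrors the trade-off of imposing a positive prior on data that violates it.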

Automatic changepoint detection (⚠️ experimental)

pm-prophet is equipped with a non-parametric truncated Dirichlet Process, which allows it to automatically detect changepoints in the trend.

To enable it, simply initialize the model with auto_changepoints=True as follows:

from pmprophet.model import PMProphet, Sampler
import pandas as pd

df = pd.read_csv("examples/example_wp_log_peyton_manning.csv")
df = df.head(180)
m = PMProphet(df, auto_changepoints=True, growth=True, intercept=True, name='model')
m.fit(method=Sampler.METROPOLIS, draws=2000)
m.predict(60, alpha=0.2, include_history=True, plot=True)
m.plot_components(
    intercept=False,
)

Here, n_changepoints is interpreted as the truncation point of the Dirichlet Process.
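The truncated Dirichlet Process can be read off the model snippet quoted in the Heteroskedasticity issue further down: a concentration alpha ~ Gamma(1, 1), one beta_j ~ Beta(1, alpha) per candidate changepoint, stick-breaking weights w_j = beta_j * prod_{i<j}(1 - beta_i) (which is what tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]]) * beta computes), and tt.switch zeroing any weight not above 1e-4. Below is a NumPy re-implementation of that computation for illustration; truncated_stick_breaking is a hypothetical name, and the sampled alpha and beta stand in for values PyMC3 would infer.

```python
import numpy as np

def truncated_stick_breaking(beta, threshold=1e-4):
    """Stick-breaking weights for a truncated Dirichlet Process.

    beta: break fractions in (0, 1), one per changepoint (length = truncation point).
    """
    beta = np.asarray(beta, dtype=float)
    # Stick length left before break j: prod_{i<j} (1 - beta_i), with 1 for j = 0
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - beta)[:-1]])
    w = remaining * beta
    # Mirror tt.switch(tt.gt(x, 1e-4), x, 0): keep weights above the threshold, zero the rest
    return np.where(w > threshold, w, 0.0)

rng = np.random.default_rng(0)
k = 25                                   # truncation point, i.e. n_changepoints
alpha = rng.gamma(shape=1.0, scale=1.0)  # Gamma(1, 1) concentration
beta = rng.beta(1.0, alpha, size=k)
w = truncated_stick_breaking(beta)
print("non-zero changepoint weights:", np.count_nonzero(w))
```

In the quoted model code each changepoint's Laplace-distributed rate adjustment is multiplied by its weight, so changepoints whose weight is zeroed drop out of the trend.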

pm-prophet then decides which changepoint values make sense and assigns an individual weight to each. A call to plot_components() will reveal the changepoint map:

(Figure: changepoint map)

A few caveats exist:

  • It's slow to fit since it's a non-parametric model
  • For best results use NUTS as method
  • It will likely require more than the default number of draws to converge
Comments
  • small fix for larger dataframes

    when the size of the dataframe becomes too large (for example if you don't run df = df.head(180) in the peyton_manning.ipynb example) this max function will produce the incorrect amount of draws in predict. This fix gets around this problem

    opened by JoeyFaulkner 8
  • Categorical Variables

    Can your package work with multivariate categorical time series, and with multivariate time series where some features are categorical and some are continuous?

    opened by Sandy4321 7
  • Example not working on Pandas 1.0

    I am really excited about this library but cannot get it to work past fitting the model because of the error AttributeError: 'DataFrame' object has no attribute 'ix'.

    I have tried downgrading pandas to 0.19 and 0.23 but it still won't work. Would love to get this working.

    opened by zwelitunyiswa 6
  • Example code not working in Python 2.7

    I am trying to reproduce the example using example_wp_log_peyton_manning.csv with Python 2.7. I have all of the correct dependencies from the requirements file, but I keep getting the error

    ValueError: Mass matrix contains zeros on the diagonal. 
    The derivative of RV `intercept_model`.ravel()[0] is zero.
    The derivative of RV `changepoints_model`.ravel()[0] is zero.
    The derivative of RV `seasonality_model`.ravel()[2] is zero.
    The derivative of RV `seasonality_model`.ravel()[3] is zero.
    The derivative of RV `seasonality_model`.ravel()[4] is zero.
    The derivative of RV `seasonality_model`.ravel()[5] is zero.
    The derivative of RV `sigma_model_log__`.ravel()[0] is zero.
    

    The code I used is attached (test.txt, with references to matplotlib removed). Versions: theano == 1.0.3, numpy == 1.15.3, pandas == 0.23.4, pymc3 == 3.5

    opened by danna-naser 6
  • license

    hey there! could you attach a license to this repo? Am I free to fork it and use it? (https://help.github.com/articles/licensing-a-repository/) Thanks!

    opened by JoeyFaulkner 4
  • Documentation scratch plus some minor changes

    Changes:

    • I added a scratch for the documentation (large parts blatantly copied from the fbprophet docs),
    • Fixed a minor bug (there was an if changepoints and changepoints check),
    • added a licence (MIT is ok?),
    • fit method now returns self (consistent with sklearn and fbprophet),
    • I added the requirements.txt file.

    Suggestions:

    • Some of the methods should be re-named into "internal" format (e.g. fit_growth -> _fit_growth) or documented,
    • Documentation needs improvements (I lack enough understanding of the package yet for this).
    opened by twolodzko 4
  • Updated readme for project direction?

    As it is, it is hard to gain a sense of where this project is going, or what needs to still be done (and what exactly has been accomplished), which an updated readme would solve.

    Not only would this improve readability, but help others help out with contributions as well.

    opened by michael-ziedalski 3
  • ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

    When running the examples from model.py and readme I get the following error:

    ValueError: Of the three parameters: start, end, and periods, exactly two must be specified

    opened by twolodzko 3
  • AttributeError: 'str' object has no attribute 'value'

    Trying to run the same simple model with version 0.2.1, I get the error below.

    m = PMProphet(df_city, name='model')
    
    # Fit the model (using NUTS)
    m.fit(method='NUTS')
    
    ~/.pyenv/versions/3.6.7/lib/python3.6/site-packages/pmprophet/model.py in fit(self, draws, chains, trace_size, method, map_initialization, finalize, step_kwargs, sample_kwargs)
        610             if draws:
        611                 if method != Sampler.ADVI:
    --> 612                     step_method = method.value(**step_kwargs)
        613                     self.trace = pm.sample(
        614                         draws,
    
    AttributeError: 'str' object has no attribute 'value'
    
    opened by clausherther 2
  • UnboundLocalError: local variable 'w' referenced before assignment

    I tried to run a very simple model with weekly data from a pandas dataframe and get the below error. I'm not sure how to troubleshoot this. I get the same error if I add more explicit params for changepoints etc.

    FYI, this exact dataset has been fit with fbprophet successfully.

    Version: pmprophet-0.2 Python: Python 3.6.7

    m = PMProphet(df_city, name='model')
    m.fit(method='NUTS')
    
    ~/.pyenv/versions/3.6.7/lib/python3.6/site-packages/pmprophet/model.py in generate_priors(self)
        243                 else:
        244                     k = len(self.changepoints)
    --> 245                 cgpt = pm.Deterministic('cgpt', w * pm.Laplace('cgpt_inner', 0, self.changepoints_prior_scale, shape=k))
        246                 self.priors['changepoints'] = pm.Deterministic('changepoints_%s' % self.name, cgpt)
        247             if self.intercept and 'intercept' not in self.priors:
    
    UnboundLocalError: local variable 'w' referenced before assignment
    
    opened by clausherther 2
  • additive_seasonality *= self.data.y.max()

    I've been reading through the code and I met this line at model.py, line 293: additive_seasonality *= self.data.y.max() and in a few other places around the code. Could anyone comment on its exact purpose?

    This seems to me to be related to data standardisation, but data standardisation is also done later, in the model specification: observed=(self.data['y'] - self.data['y'].mean()) / self.data['y'].std()

    At the very least, shouldn't it be additive_seasonality *= self.data.y.abs().max() to account for the possibility that, say, the range of y is [-1, 0] and thus y.max() == 0?

    opened by dsvolk 2
  • PyMC3 has been renamed PyMC

    Hi, PyMC3 has been renamed PyMC. If this affects you and you have questions, or you want someone to direct your rage at, I'm available! Do let me know how I, or any of the PyMC devs, can help.

    Ravin

    opened by canyon289 0
  • AttributeError: 'DataFrame' object has no attribute 'ix'

    Hi-- first off. thank you for making this. i have been looking for a pymc3 based prophet alternative for a while (not a fan of the installation process for pystan)..

    I am trying to run the Peyton Manning example from the README, and the underlying module is using the deprecated ix indexer. Is there any way we could upgrade the module for the latest version of pandas? I've never submitted a pull request, but I can potentially help.

    opened by rambam613 1
  • [Feature Request] Simplify API for regressors

    1. Can the prior definition for each regressor be moved within the add_regressor() method by passing the distribution object as a parameter?

    2. Can a regressor_coefficients() method be created similar to what Prophet has to get the distribution of coefficients?

    opened by rohan-gt 0
  • Heteroskedasticity

    Hi,

    Great package. I was considering attempting something similar, but was very grateful to find your implementation!

    I'm modelling data that is generally periodic and trending but is heteroskedastic and rather than transform the heteroskedasticity out of the data as is usually recommended, I would like to model it specifically. My sense is that the GAM approach used by Prophet could model the data very well if I could model the varying variance explicitly.

    My initial desire is to try to model the variance itself, as opposed to the original timeseries, and see if I can identify the points where the variance changes.

    To that end, I'm trying to better understand how the truncated dirichlet process used herein and copied below accomplishes this. In particular, I don't understand the use of tt.extra_ops.cumprod(1 - beta)[:-1] is this the stickbreaking function?

    Also, how does the switch work exactly at tt.switch(tt.gt(x, 1e-4), x, 0)

                        k = self.n_changepoints
                        alpha = pm.Gamma("alpha_%s" % self.name, 1.0, 1.0)
                        beta = pm.Beta("beta_%s" % self.name, 1.0, alpha, shape=k)
                        w1 = pm.Deterministic(
                            "w1_%s" % self.name,
                            tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
                            * beta,
                        )
                        w, _ = theano.map(
                            fn=lambda x: tt.switch(tt.gt(x, 1e-4), x, 0), sequences=[w1]
                        )
                        self.w = pm.Deterministic("w_%s" % self.name, w)
    
    

    Can I access the probabilities of a given time step being a changepoint after sampling?

    Thanks in advance

    opened by arainboldt 0
  • Multiple extra data

    Hello, I use PMProphet with several features. Since I don't know the future value of the features, I generate a lot of scenarios. Is there any way to pass all these scenarios to model so that the uncertainty of the result consists of two (uncertainty of the model coefficients and uncertainty of the features)? Or is there any way to introduce such functionality using the capabilities of pymc3 (without making a loop through the scenarios) ?

    opened by sungulnara2000 0
Owner
Luca Giacomel