Hierarchical Time Series Forecasting using Prophet

Overview

htsprophet

Hierarchical Time Series Forecasting using Prophet

Credit to Rob J. Hyndman and research partners as much of the code was developed with the help of their work.

https://www.otexts.org/fpp

https://robjhyndman.com/publications/

Credit to Facebook and their fbprophet package.

https://facebookincubator.github.io/prophet/

It was my intention to make some of the code look similar to certain sections in the Prophet and (Hyndman's) hts packages.

Downloading

  1. pip install htsprophet

If you'd like to just skip to coding with the package, runHTS.py should help you with that, but if you like reading, the following should help you understand how I built htsprophet and how it works.

Part I: The Data

I originally used Redfin traffic data to build this package.

I pulled the data so that date was in the first column, my layers were the middle columns, and the number I wanted to forecast was in the last column.

I made a function called makeWeekly() , that rolls up your data into the weekly level. It’s not a necessary function, it was mostly just convenient for me.

So the data looked like this:

Date Platform Medium BusinessMarket Sessions
1100 B.C. Stone Tablet Land Birmingham 23234
... Car Phone Air Auburn 2342
... Sea Evanston 233
... Seattle 445
... 46362

I then ran my orderHier() function with just this dataframe as its input.

NOTE: you cannot run this function if you have more than 4 columns in the middle (in between Date and Sessions for ex.)

To run this function, you specify the data, and how you want your middle columns to be ordered.

So orderHier(data, 2, 1, 3) means you want the second column after date to be the first level of the hierarchy.

Our example would look like this:

Alt text

Date Total Land Air Sea Land_Stone tablet Land_Car Phone Air_Stone tablet
1100 B.C. 24578 23135 555 888 23000 135 550
1099 B.C. 86753 86654 44 55 2342 84312 22
... ... ... ... ... ... ... ...
*All numbers represent the number of sessions for each node in the Hierarchy

If you have more than 4 categorical columns, then you must get the data in this format on your own while also producing the list of lists called nodes

Nodes – describes the structure of the hierarchy.

Here it would equal [[3],[2,2,2],[4,4,4,4,4,4]]

There are 3 nodes in the first level: Land, Air, Sea.

There are 2 children for each of those nodes: Stone tablet, Car phone.

There are 4 business markets for each of those nodes: Tokyo, Hamburg etc.

If you use the orderHier function, nodes will be the second output of the function.

Part II: Prophet Inputs

Anything that you would specify in Prophet you can specify in hts().

It’s flexible and will allow you to input a dataframe of values for inputs like cap, capF, and changepoints.

All of these inputs are specified when you call hts, and after that you just let it run.

The following is the description of inputs and outputs for hts as well as the specified defaults:

Parameters
----------------
 y - dataframe of time-series data
           Layout:
               0th Col - Time instances
               1st Col - Total of TS
               2nd Col - One of the children of the Total TS
               3rd Col - The other child of the Total TS
               ...
               ... Rest of the 1st layer
               ...
               Xth Col - First Child of the 2nd Col
               ...
               ... All of the 2nd Col's Children
               ...
               X+Yth Col - First Child of the 3rd Col
               ...
               ..
               .   And so on...

 h - number of step ahead forecasts to make (int)

 nodes - a list or list of lists of the number of child nodes at each level
 Ex. if the hierarchy is one total with two child nodes that comprise it, the nodes input would be [2]
 
 method – (String)  the type of hierarchical forecasting method that the user wants to use. 
            Options:
            "OLS" - optimal combination using ordinary least squares (Default), 
            "WLSS" - optimal combination using structurally weighted least squares, 
            "WLSV" - optimal combination using variance weighted least squares, 
            "FP" - forcasted proportions (top-down)
            "PHA" - proportions of historical averages (top-down)
            "AHP" - average historical proportions (top-down)
            "BU" - bottom-up (simple addition)
            "CVselect" - select which method is best for you based on 3-fold Cross validation (longer run time)
 
 freq - (Time Frequency) input for the forecasting function of Prophet 
 
 include_history - (Boolean) input for the forecasting function of Prophet
 
 transform - (None or "BoxCox") Do you want to transform your data before fitting the prophet function? If yes, type "BoxCox"
            
 cap - (Dataframe or Constant) carrying capacity of the input time series.  If it is a dataframe, then
                               the number of columns must equal len(y.columns) - 1
                               
 capF - (Dataframe or Constant) carrying capacity of the future time series.  If it is a dataframe, then
                                the number of columns must equal len(y.columns) - 1
 
 changepoints - (DataFrame or List) changepoints for the model to consider fitting. If it is a dataframe, then
                                    the number of columns must equal len(y.columns) - 1
 
 n_changepoints - (constant or list) changepoints for the model to consider fitting. If it is a list, then
                                     the number of items must equal len(y.columns) - 1
 skipFitting - (Boolean) if y is already a dictionary of dataframes, set this to True, and DO NOT run with method = "cvSelect" or transform = "BoxCox"
 
 numThreads - (int) number of threads you want to use when running cvSelect. Note: 14 has shown to decrease runtime by 10 percent 
 
 All other inputs - see Prophet
 
Returns
-----------------
 ynew - a dictionary of DataFrames with predictions, seasonalities and trends that can all be plotted

Don’t forget to specify the frequency if you’re not using daily data.

All other functions should be self-explanatory.

Part III: Room For Improvement

  1. Prediction intervals
Comments
  • Issue With runHTS.py

    Issue With runHTS.py

    Hi,

    Seems to be a bug when running runHTS.py in python 3. There is an issue with holidays producing ValueError: holidays must be a DataFrame with 'ds' and 'holiday' columns, despite it being clearly defined.

    opened by adwarner 3
  • Call fbProphet objects with keyword arguments

    Call fbProphet objects with keyword arguments

    The fbProphet objects in fitForecast.py are currently instantiated with positional arguments. Changing it to keyword arguments makes it less vulnerable to API changes. This fixes #3 and #6.

    opened by emorfam 0
  • Problem dataframe

    Problem dataframe

    d = {'Total': [100],'GP48_1': [100], 'GP48_2': [100] , 'GP48_1_GP48_cogenpct': [100], 'GP48_1_GP48_coalpct': [100] , 'GP48_1_GP48_CCpct': [100], 'GP48_2_GP48_windpct': [100] , 'GP48_2_GP48_biomasspct': [100],'GP48_2_GP48_thermosolarpct': [100] , 'GP48_2_GP48_PVpct': [100], 'GP48_2_GP48_nuclearpct': [100] , 'GP48_2_GP48_hydrototalpct': [100] } capDataset=pd.DataFrame(d,columns=['Total','GP48_1', 'GP48_2', 'GP48_1_GP48_cogenpct', 'GP48_1_GP48_coalpct', 'GP48_1_GP48_CCpct', 'GP48_2_GP48_windpct', 'GP48_2_GP48_biomasspct', 'GP48_2_GP48_thermosolarpct', 'GP48_2_GP48_PVpct', 'GP48_2_GP48_nuclearpct', 'GP48_2_GP48_hydrototalpct'])

    pred = hts(data3, 1, nodes, method = "BU",cap=100,capF=capDataset)

    opened by junajo10 0
  • runHTS.py does not work

    runHTS.py does not work

    When trying to run the example script I get the following error:

    INFO:matplotlib.font_manager:font search path ['/opt/conda/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/ttf', '/opt/conda/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/afm', '/opt/conda/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts'] INFO:matplotlib.font_manager:generated new fontManager

    TypeError Traceback (most recent call last) in 44 # NOTE: CVselect takes a while, so if you want results in minutes instead of half-hours pick a different method 45 ## ---> 46 myDict = hts(data2, 52, nodes, holidays = holidays, method = "FP", transform = "BoxCox") 47 ## 48 # This output is a dictionary of dataframes, so you can do any further analysis that you may want. It also allows you to plot the forecasts.

    /opt/conda/lib/python3.6/site-packages/htsprophet/hts.py in hts(y, h, nodes, method, freq, transform, include_history, cap, capF, changepoints, n_changepoints, yearly_seasonality, weekly_seasonality, holidays, seasonality_prior_scale, holidays_prior_scale, changepoint_prior_scale, mcmc_samples, interval_width, uncertainty_samples, skipFitting, numThreads) 270 ynew = fitForecast(y, h, sumMat, nodes, method, freq, include_history, cap, capF, changepoints, n_changepoints,
    271 yearly_seasonality, weekly_seasonality, holidays, seasonality_prior_scale, holidays_prior_scale,
    --> 272 changepoint_prior_scale, mcmc_samples, interval_width, uncertainty_samples, boxcoxT, skipFitting) 273 ## 274 # Inverse boxcox the data

    /opt/conda/lib/python3.6/site-packages/htsprophet/fitForecast.py in fitForecast(y, h, sumMat, nodes, method, freq, include_history, cap, capF, changepoints, n_changepoints, yearly_seasonality, weekly_seasonality, holidays, seasonality_prior_scale, holidays_prior_scale, changepoint_prior_scale, mcmc_samples, interval_width, uncertainty_samples, boxcoxT, skipFitting) 72 growth = 'linear' 73 m = Prophet(growth, changepoints1, n_changepoints1, yearly_seasonality, weekly_seasonality, holidays, seasonality_prior_scale,
    ---> 74 holidays_prior_scale, changepoint_prior_scale, mcmc_samples, interval_width, uncertainty_samples) 75 else: 76 growth = 'logistic'

    /opt/conda/lib/python3.6/site-packages/fbprophet/forecaster.py in init(self, growth, changepoints, n_changepoints, changepoint_range, yearly_seasonality, weekly_seasonality, daily_seasonality, holidays, seasonality_mode, seasonality_prior_scale, holidays_prior_scale, changepoint_prior_scale, mcmc_samples, interval_width, uncertainty_samples) 140 self.component_modes = None 141 self.train_holiday_names = None --> 142 self.validate_inputs() 143 144 def validate_inputs(self):

    /opt/conda/lib/python3.6/site-packages/fbprophet/forecaster.py in validate_inputs(self) 147 raise ValueError( 148 "Parameter 'growth' should be 'linear' or 'logistic'.") --> 149 if ((self.changepoint_range < 0) or (self.changepoint_range > 1)): 150 raise ValueError("Parameter 'changepoint_range' must be in [0, 1]") 151 if self.holidays is not None:

    TypeError: '<' not supported between instances of 'str' and 'int'

    opened by XanderHorn 5
  • can the nodes values not be same ?

    can the nodes values not be same ?

    in your post the nodes has this structure :[[3],[2,2,2],[4,4,4,4,4,4]] which is a strict rules in general we will have bottom layer not same leaf values

    like [[3],[2,1,3],[4,1,1,1,1,4]] ?? can this be possible ???

    opened by wentixiaogege 0
  • Already have  forecasts how to generate a set of revised forecasts that would aggregate consistently with the hierarchical structure

    Already have forecasts how to generate a set of revised forecasts that would aggregate consistently with the hierarchical structure

    I already generated independent base forecast for each series in the hierarchy, As these base forecasts are independently generated they will not be aggregate consistent" (i.e., they will not add up according to the hierarchical structure).I just want to generates a set of revised forecasts that would aggregate consistently with the hierarchical structure. Can use you py funciton? I looking forward to hearing from u , thanks

    opened by jnuvenus 2
Owner
Collin Rooney
Collin Rooney
Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.

Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.

Facebook Research 4.1k Dec 29, 2022
Stock Price Prediction Bank Jago Using Facebook Prophet Machine Learning & Python

Stock Price Prediction Bank Jago Using Facebook Prophet Machine Learning & Python Overview Bank Jago has attracted investors' attention since the end

Najibulloh Asror 3 Feb 10, 2022
A python library for easy manipulation and forecasting of time series.

Time Series Made Easy in Python darts is a python library for easy manipulation and forecasting of time series. It contains a variety of models, from

Unit8 5.2k Jan 4, 2023
Time series forecasting with PyTorch

Our article on Towards Data Science introduces the package and provides background information. Pytorch Forecasting aims to ease state-of-the-art time

Jan Beitner 2.5k Jan 2, 2023
ETNA is an easy-to-use time series forecasting framework.

ETNA is an easy-to-use time series forecasting framework. It includes built in toolkits for time series preprocessing, feature generation, a variety of predictive models with unified interface - from classic machine learning to SOTA neural networks, models combination methods and smart backtesting. ETNA is designed to make working with time series simple, productive, and fun.

Tinkoff.AI 674 Jan 7, 2023
Nixtla is an open-source time series forecasting library.

Nixtla Nixtla is an open-source time series forecasting library. We are helping data scientists and developers to have access to open source state-of-

Nixtla 401 Jan 8, 2023
ETNA – time series forecasting framework

ETNA Time Series Library Predict your time series the easiest way Homepage | Documentation | Tutorials | Contribution Guide | Release Notes ETNA is an

Tinkoff.AI 675 Jan 8, 2023
Continuously evaluated, functional, incremental, time-series forecasting

timemachines Autonomous, univariate, k-step ahead time-series forecasting functions assigned Elo ratings You can: Use some of the functionality of a s

Peter Cotton 343 Jan 4, 2023
A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

null 2.3k Jan 5, 2023
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Facebook 15.4k Jan 7, 2023
Open source time series library for Python

PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible array

Ross Taylor 2k Jan 2, 2023
Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

Blue Yonder GmbH 7k Jan 6, 2023
A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

The Alan Turing Institute 6k Jan 6, 2023
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Dec 22, 2022
A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

null 2.3k Dec 29, 2022
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Amazon Web Services - Labs 3.3k Jan 3, 2023
STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks

STUMPY STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of tim

TD Ameritrade 2.5k Jan 6, 2023
A Python package for time series classification

pyts: a Python package for time series classification pyts is a Python package for time series classification. It aims to make time series classificat

Johann Faouzi 1.4k Jan 1, 2023
Python module for machine learning time series:

seglearn Seglearn is a python package for machine learning time series or sequences. It provides an integrated pipeline for segmentation, feature extr

David Burns 536 Dec 29, 2022