Lightning ⚡️ fast forecasting with statistical and econometric models.

Nixtla

Last update: Dec 29, 2022

Related tags

Machine Learning python machine-learning statistics timeseries time-series econometrics forecasting arima

Overview

Statistical ⚡️ Forecast

StatsForecast offers a collection of widely used univariate time series forecasting models, including exponential smoothing and automatic ARIMA modeling, optimized for high performance using numba.

🔥 Highlights

Fastest and most accurate auto_arima in Python and R.
New!: Replace FB-Prophet in two lines of code and gain speed and accuracy. Check the experiments here.
New!: Distributed computation in clusters with ray. (Forecast 1M series in 30min)
New!: Good Ol' sklearn syntax with AutoARIMA().fit(y).predict(h=7).

🎊 Features

Inclusion of exogenous variables and prediction intervals.
Out-of-the-box implementation of exponential smoothing, Croston, seasonal naive, random walk with drift and TSB.
20x faster than pmdarima.
1.5x faster than R.
500x faster than Prophet.
Compiled to high performance machine code through numba.
1,000,000 series in 30 min with ray.
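Two of the baseline models listed above are simple enough to sketch in a few lines. The snippet below is an illustration only, written in plain Python; the function names `seasonal_naive_sketch` and `rw_drift_sketch` are hypothetical and are not the library's numba-compiled implementations.

```python
def seasonal_naive_sketch(y, h, season_length):
    """Repeat the last observed season for h steps ahead."""
    last_season = y[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


def rw_drift_sketch(y, h):
    """Extend the last value along the average historical slope."""
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + drift * step for step in range(1, h + 1)]


y = [10.0, 12.0, 14.0, 11.0, 13.0, 15.0]
print(seasonal_naive_sketch(y, h=4, season_length=3))
print(rw_drift_sketch(y, h=2))
```

Both functions only need the tail of the series, which is part of why such baselines are cheap to run over many series.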

Missing something? Please open an issue or write to us.

📖 Why?

Current Python alternatives for statistical models are slow and inaccurate. So we created a library that can be used to forecast in production environments or as benchmarks. StatsForecast includes an extensive battery of models that can efficiently fit thousands of time series.

🔬 Accuracy

We compared accuracy and speed against: pmdarima, Rob Hyndman's forecast package and Facebook's Prophet. We used the Daily, Hourly and Weekly data from the M4 competition.

The following table summarizes the results. Our auto_arima has the lowest error (measured by the MASE loss) on all three datasets and the shortest run time on Daily and Hourly; on Weekly, the original R implementation is faster (0.22 vs. 0.42) but less accurate.

dataset	metric	nixtla	pmdarima [1]	auto_arima_r	prophet
M4-Daily	MASE	3.26	3.35	4.46	14.26
M4-Daily	time	1.41	27.61	1.81	514.33
M4-Hourly	MASE	0.92	---	1.02	1.78
M4-Hourly	time	12.92	---	23.95	17.27
M4-Weekly	MASE	2.34	2.47	2.58	7.29
M4-Weekly	time	0.42	2.92	0.22	19.82

[1] The model auto_arima from pmdarima had problems with Hourly data. An issue was opened in their repo.
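For readers unfamiliar with the metric in the table above, MASE divides the out-of-sample mean absolute error by the in-sample mean absolute error of a (seasonal) naive forecast. The sketch below is a minimal plain-Python version; the function name `mase` and the `season_length` default are assumptions, and the exact scaling used in the experiments may differ.

```python
def mase(y_train, y_test, y_pred, season_length=1):
    """Mean absolute scaled error of y_pred against y_test."""
    n = len(y_train)
    # In-sample errors of the naive forecast y[t - season_length]
    naive_errors = [
        abs(y_train[t] - y_train[t - season_length])
        for t in range(season_length, n)
    ]
    scale = sum(naive_errors) / len(naive_errors)
    # Out-of-sample mean absolute error of the model being evaluated
    mae = sum(abs(a - f) for a, f in zip(y_test, y_pred)) / len(y_test)
    return mae / scale


print(mase([1, 2, 3, 4, 5], [6, 7], [5, 9]))
```

A value below 1 means the forecast beats the naive benchmark's average in-sample error; a value above 1 means it does worse.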

The following table summarizes the data details.

group	n_series	mean_length	std_length	min_length	max_length
Daily	4,227	2,371	1,756	107	9,933
Hourly	414	901	127	748	1,008
Weekly	359	1,035	707	93	2,610

⏲ Computational efficiency

We measured the computational time against the number of time series. The following graph shows the results. As we can see, the fastest model is our auto_arima.

Nixtla vs Prophet

You can reproduce the results here.

External regressors

Results with external regressors are qualitatively similar to those reported above. You can find the complete experiments here.

👾 Less code

📖 Documentation

Here is a link to the documentation.

🧬 Getting Started

Example Jupyter Notebook

💻 Installation

PyPI

You can install the released version of StatsForecast from the Python package index with:

pip install statsforecast

(Installing inside a Python virtual environment or a conda environment is recommended.)

Conda

You can also install the released version of StatsForecast from conda with:

conda install -c conda-forge statsforecast

(Installing inside a Python virtual environment or a conda environment is recommended.)

Dev Mode

If you want to make some modifications to the code and see the effects in real time (without reinstalling), follow the steps below:

git clone https://github.com/Nixtla/statsforecast.git
cd statsforecast
pip install -e .

🧬 How to use

The following example needs ipython and matplotlib as additional packages. If they are not installed, install them via your preferred method, e.g. pip install ipython matplotlib.

import numpy as np
import pandas as pd
from IPython.display import display, Markdown

import matplotlib.pyplot as plt
from statsforecast import StatsForecast
from statsforecast.models import seasonal_naive, auto_arima
from statsforecast.utils import AirPassengers

horizon = 12
ap_train = AirPassengers[:-horizon]
ap_test = AirPassengers[-horizon:]

series_train = pd.DataFrame(
    {
        'ds': pd.date_range(start='1949-01-01', periods=ap_train.size, freq='M'),
        'y': ap_train
    },
    index=pd.Index([0] * ap_train.size, name='unique_id')
)

fcst = StatsForecast(
    series_train, 
    models=[(auto_arima, 12), (seasonal_naive, 12)], 
    freq='M', 
    n_jobs=1
)
forecasts = fcst.forecast(12, level=(80, 95))

forecasts['y_test'] = ap_test

fig, ax = plt.subplots(1, 1, figsize = (20, 7))
df_plot = pd.concat([series_train, forecasts]).set_index('ds')
df_plot[['y', 'y_test', 'auto_arima_season_length-12_mean', 'seasonal_naive_season_length-12']].plot(ax=ax, linewidth=2)
ax.fill_between(df_plot.index, 
                df_plot['auto_arima_season_length-12_lo-80'], 
                df_plot['auto_arima_season_length-12_hi-80'],
                alpha=.35,
                color='green',
                label='auto_arima_level_80')
ax.fill_between(df_plot.index, 
                df_plot['auto_arima_season_length-12_lo-95'], 
                df_plot['auto_arima_season_length-12_hi-95'],
                alpha=.2,
                color='green',
                label='auto_arima_level_95')
ax.set_title('AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
for label in (ax.get_xticklabels() + ax.get_yticklabels()):
    label.set_fontsize(20)

Adding external regressors

series_train['trend'] = np.arange(1, ap_train.size + 1)
series_train['intercept'] = np.ones(ap_train.size)
series_train['month'] = series_train['ds'].dt.month
series_train = pd.get_dummies(series_train, columns=['month'], drop_first=True)

# `display` is imported from IPython.display above
display(series_train.head())

ds	y	trend	intercept	month_2	month_3	month_4	month_5
1949-01-31 00:00:00	112	1	1	0	0	0	0
1949-02-28 00:00:00	118	2	1	1	0	0	0
1949-03-31 00:00:00	132	3	1	0	1	0	0
1949-04-30 00:00:00	129	4	1	0	0	1	0
1949-05-31 00:00:00	121	5	1	0	0	0	1

xreg_test = pd.DataFrame(
    {
        'ds': pd.date_range(start='1960-01-01', periods=ap_test.size, freq='M')
    },
    index=pd.Index([0] * ap_test.size, name='unique_id')
)

xreg_test['trend'] = np.arange(133, ap_test.size + 133)
xreg_test['intercept'] = np.ones(ap_test.size)
xreg_test['month'] = xreg_test['ds'].dt.month
xreg_test = pd.get_dummies(xreg_test, columns=['month'], drop_first=True)

fcst = StatsForecast(
    series_train, 
    models=[(auto_arima, 12), (seasonal_naive, 12)], 
    freq='M', 
    n_jobs=1
)
forecasts = fcst.forecast(12, xreg=xreg_test, level=(80, 95))

forecasts['y_test'] = ap_test

🔨 How to contribute

See CONTRIBUTING.md.

📃 References

The auto_arima model is based on (translated from) the R implementation included in the forecast package developed by Rob Hyndman.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

fede 💻	José Morales 💻 🚧	Sugato Ray 💻	Jeff Tackes 🐛	darinkist 🤔	Alec Helyar 💬	Dave Hirschfeld 💬
mergenthaler 💻	Kin 💻	Yasslight90 🤔	asinig 🤔	Philip Gillißen 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Comments

Adding `settings.ini` to PyPI source
The setup.py file reads in the config values from settings.ini. So, absence of settings.ini file in the source distribution (*.tar.gz file), leads to failure in installation of statsforecast.

[x] Currently (v0.3.0) the PyPI source does not include the settings.ini file. This PR fixes that.

[ ] ~~Changes to README.md file~~:

[ ] ~~Fixed some of the formatting errors~~.

[ ] ~~Fixed some broken URLs~~.

The relative URLs do not render properly on PyPI. Converted them from relative to absolute URLs.

Closes #24
ready to merge
opened by sugatoray 13

Comparing ETS with Statsmodels ExponentialSmoothing

Hi guys,

I was playing a little with ETS to see whether we could include it in Darts. For the time being I'm having a hard time getting it to outperform statsmodels in terms of runtime (I haven't looked at accuracy). Is there any special trick (or a special regime to be in) in order for the statsforecast version to run faster?

Here is the small benchmark I ran:

import numpy as np
import time

from statsforecast.models import ETS as SF_ETS
import statsmodels.tsa.holtwinters as hw

values = np.random.rand(100,)

# First, we make a dry run for jit:
model_sf = SF_ETS()
model_sf.fit(values)
_ = model_sf.predict(10)

model_sm = hw.ExponentialSmoothing(values)
sm_res = model_sm.fit()
_ = sm_res.forecast(10)

# Now, a small benchmark:
tic = time.time()
for _ in range(100):
    model_sf = SF_ETS()
    model_sf.fit(values)
    _ = model_sf.predict(10)
    
print("Time taken by SF ETS : {:.2f} s.".format(time.time() - tic))

tic = time.time()
for _ in range(100):
    model_sm = hw.ExponentialSmoothing(values)
    sm_res = model_sm.fit()
    _ = sm_res.forecast(10)
    
print("Time taken by SM ETS : {:.2f} s.".format(time.time() - tic))

And I get results like

Time taken by SF ETS : 1.54 s.
Time taken by SM ETS : 0.38 s.
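The pattern used in this benchmark (one untimed dry run so that JIT compilation is excluded, then a timed loop) can be wrapped in a small helper. This is a sketch under assumptions: `time_after_warmup` and `workload` are hypothetical names, and `time.perf_counter` is swapped in for `time.time` because it is a monotonic clock intended for measuring intervals.

```python
import time


def time_after_warmup(fn, repeats=100, warmup=1):
    """Return total seconds for `repeats` calls of fn, after untimed warm-up calls."""
    for _ in range(warmup):
        fn()  # untimed: any JIT compilation or caching happens here
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


def workload():
    # Hypothetical stand-in for "fit a model and predict"
    return sum(i * i for i in range(1000))


elapsed = time_after_warmup(workload, repeats=50)
print("Time taken : {:.4f} s.".format(elapsed))
```

Passing each library's fit-and-predict step as `fn` keeps the warm-up and timing logic identical for both sides of the comparison.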

opened by hrzn 12

:sparkles: add plotly-resampler as plotting engine
This PR adds "plotly-resampler" as option for the plotting engine of StatsForecast.plot method, see #342

Would love to hear your feedback on this!

Example use of plotly-resampler (in the ElectricityLoadForecasting.ipynb notebook)

Note that plotly-resampler properties can be passed in the resampler_kwargs argument. :arrow_down: illustrates how for example the number of shown samples can be changed!

StatsForecast.plot(df, engine="plotly-resampler", resampler_kwargs={"default_n_shown_samples": 3000, "show_dash": {"mode": "inline"}})

PS: also fixed a minor typo in the CONTRIBUTING.md file
opened by jvdd 9
fix: make logging config local to package

The core notebook created a logging config when the notebook was run, but also when the generated core.py file was imported. This prevented anyone who imports statsforecast from setting their own logging config in the most commonly used way.

More details in the linked issue. Fixes Nixtla/statsforecast#275

opened by JeroenPeterBos 7

"Frequency too high" for anything finer than monthly, but I don't have enough data to sample monthly

Describe the bug

I'm trying to follow the "Getting started" example with my own data, which happens to be sampled hourly. So I set season_length = 365 * 24, but got the following error:

...

File /opt/conda/lib/python3.9/site-packages/statsforecast/models.py:245, in ETS.forecast(self, y, h, X, X_future, fitted)
    237 def forecast(
    238         self, 
    239         y: np.ndarray, # time series
   (...)
    243         fitted: bool = False, # return fitted values?
    244     ):
--> 245     mod = ets_f(y, m=self.season_length, model=self.model)
    246     fcst = forecast_ets(mod, h)
    247     keys = ['mean']

File /opt/conda/lib/python3.9/site-packages/statsforecast/ets.py:937, in ets_f(y, m, model, damped, alpha, beta, gamma, phi, additive_only, blambda, biasadj, lower, upper, opt_crit, nmse, bounds, ic, restrict, allow_multiplicative_trend, use_initial_values, maxit)
    935 if m > 24:
    936     if seasontype in ['A', 'M']:
--> 937         raise ValueError('Frequency too high')
    938     elif seasontype == 'Z':
    939         warnings.warn(
    940             "I can't handle data with frequency greater than 24. " 
    941             "Seasonality will be ignored."
    942         )

ValueError: Frequency too high

If I resample daily and set season_length = 365, I too get a ValueError: Frequency too high. Same goes with a weekly resample and season_length = 52.
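The guard visible in the traceback explains all three failures: 365 * 24, 365 and 52 are each greater than 24. The sketch below mirrors that check in plain Python so it can be run before fitting. The name `resolve_seasontype` is hypothetical, and returning "N" (no seasonality) in the warning branch is an assumption, since the traceback does not show what happens after the warning.

```python
import warnings

MAX_SEASON_LENGTH = 24  # limit visible in the traceback (`if m > 24`)


def resolve_seasontype(m, seasontype="Z"):
    """Mirror the season-length guard shown in the traceback."""
    if m > MAX_SEASON_LENGTH:
        if seasontype in ("A", "M"):
            raise ValueError("Frequency too high")
        if seasontype == "Z":
            warnings.warn("Seasonality will be ignored.")
            return "N"  # assumption: fall back to a non-seasonal model
    return seasontype


for m in (12, 24, 52, 365, 365 * 24):
    try:
        resolve_seasontype(m, "A")
        print(m, "accepted")
    except ValueError as err:
        print(m, "rejected:", err)
```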

I only have 2 incomplete years of data, so I can't resample monthly: I get various versions of

ValueError: order must be 3 non-negative integers, got (0, 0, 0)

To Reproduce

(I can work on a reproducer if it helps, it will take me some time)

Expected behavior

The models work with higher frequencies.

Desktop (please complete the following information):

OS: Ubuntu 22.04
Browser: Firefox
Version: 1.0.0

opened by astrojuanlu 6

Stability of the API

Hi, great work on statsforecast! This package looks very nice. I'm considering integrating a couple of the models in Darts (https://github.com/unit8co/darts). I'm wondering about your future plans: do you intend to maintain this package in the long term? How likely are API changes in future releases?

Also as a side note - I took a quick look at the Croston method, and it looks like the method accepts h and future_xreg, which I'm not sure is intended as those are not used.

In general I think slightly more documentation on your different models could be helpful for users :)
question

opened by hrzn 6
Compute residuals

I'm currently forecasting a set of daily time series, and I was wondering whether there is a way to get the predictions on the training data, which are used to compute the residuals (the difference between actuals and predictions on the training set). The StatsForecast class offers no way to do that. I'm mainly interested in obtaining them with the auto_arima approach, but it could be extended to the remaining approaches as well.

Is it possible to add a method or attribute to get them?
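To make the request concrete, the sketch below computes in-sample fitted values and residuals for a plain naive model (each fitted value is the previous observation). It is an illustration only: `naive_fitted_and_residuals` is a hypothetical helper, not a StatsForecast method, and an ARIMA model would produce its fitted values differently.

```python
def naive_fitted_and_residuals(y):
    """Return (fitted, residuals) for a one-step-ahead naive model."""
    # The first observation has no predecessor, so it has no fitted value
    fitted = [None] + list(y[:-1])
    residuals = [None if f is None else a - f for a, f in zip(y, fitted)]
    return fitted, residuals


fitted, residuals = naive_fitted_and_residuals([112, 118, 132, 129])
print(fitted)
print(residuals)
```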

Thank you
question arima

opened by asinig 6
[question] Model summary table for ARIMA model

Hi! I was wondering if you have implemented (or are planning to implement) a model summary table for the ARIMA model that contains the coefficients, their p-values, etc.?

Like https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.summary.html

Many thanks!
enhancement arima

opened by darinkist 6
[docs] HTML in index notebook messes up online docs

This is how the docs render locally for me with the latest changes to the readme. I believe this is due to the embedded html, maybe we can try to achieve that formatting without using html.

bug documentation

opened by jmoralez 6
ValueError: math domain error

I get ValueError: math domain error caused by tmp['bic'] = tmp['aic'] + npar*(math.log(nstar) - 2) in statsforecast/arima.py, line 1225.

I guess nstar is not > 0
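The guess is consistent with how `math.log` behaves: it raises ValueError for any non-positive argument. The sketch below reproduces that and shows one possible guard around the quoted BIC expression. `bic_from_aic` is a hypothetical helper, and returning infinity for `nstar <= 0` is an assumption about how such a model could be treated, not the library's fix.

```python
import math


def bic_from_aic(aic, npar, nstar):
    """BIC from AIC via the quoted expression; inf when nstar is not positive."""
    if nstar <= 0:
        return math.inf  # assumption: rank the model as unusable instead of raising
    return aic + npar * (math.log(nstar) - 2)


try:
    math.log(0)
except ValueError as err:
    print("math.log(0) raised ValueError:", err)

print(bic_from_aic(10.0, 2, 1))
print(bic_from_aic(10.0, 2, 0))
```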

opened by dalpozz 6
Exception: no model able to be fitted Error on AutoARIMA

I am trying to solve a timeseries problem with intermittent zero demand in the timeframe(Monthly data). I am getting this warning/error.

/opt/conda/lib/python3.7/site-packages/statsforecast/arima.py:866: RuntimeWarning: divide by zero encountered in log
  return 0.5 * np.log(res)
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:443: RuntimeWarning: divide by zero encountered in double_scalars
  l0 = l0 / b0
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:443: RuntimeWarning: divide by zero encountered in double_scalars
  l0 = l0 / b0
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:448: RuntimeWarning: invalid value encountered in float_scalars
  b0 = max(y_sa[1] / y_sa[0], 1e-3)
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:448: RuntimeWarning: invalid value encountered in float_scalars
  b0 = max(y_sa[1] / y_sa[0], 1e-3)
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:443: RuntimeWarning: divide by zero encountered in double_scalars
  l0 = l0 / b0
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:448: RuntimeWarning: invalid value encountered in float_scalars
  b0 = max(y_sa[1] / y_sa[0], 1e-3)
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:443: RuntimeWarning: divide by zero encountered in double_scalars
  l0 = l0 / b0
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:448: RuntimeWarning: invalid value encountered in float_scalars
  b0 = max(y_sa[1] / y_sa[0], 1e-3)
/opt/conda/lib/python3.7/site-packages/statsforecast/ets.py:448: RuntimeWarning: invalid value encountered in double_scalars
  b0 = max(y_sa[1] / y_sa[0], 1e-3)

and throws an error showing

Exception: no model able to be fitted

Any thoughts on how this can be resolved? How can I use your package for this?

Regards Shravan
bug

opened by shravankoninti 5

[FugueBackend: Dask Distributed] Result is empty when using remote dask cluster

What happened + What you expected to happen

I wanted to try distributed computation using dask. I followed the docs https://nixtla.github.io/statsforecast/distributed.fugue.html and it's working fine using dask local client.

However, when I attempted to use a remote dask cluster, I got empty result:

from dask.distributed import Client
from fugue_dask import DaskExecutionEngine
from statsforecast import StatsForecast
from statsforecast.models import Naive
from statsforecast.utils import generate_series
from statsforecast.distributed.fugue import FugueBackend
import pandas as pd

# Instantiate FugueBackend with DaskExecutionEngine
dask_client = Client('tcp://***:***')
engine = DaskExecutionEngine(dask_client=dask_client)
remote_backend = FugueBackend(engine=engine, as_local=True)
Y_df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet')


from statsforecast.models import (
    AutoARIMA,
    HoltWinters,
    CrostonClassic as Croston, 
    HistoricAverage,
    DynamicOptimizedTheta as DOT,
    SeasonalNaive
)


# Create a list of models and instantiation parameters
models = [
    AutoARIMA(season_length=24),
    HoltWinters(),
    Croston(),
    SeasonalNaive(season_length=24),
    HistoricAverage(),
    DOT(season_length=24)
]

# Instantiate StatsForecast class as sf
sf = StatsForecast(
    df=Y_df, 
    models=models,
    freq='H', 
    n_jobs=-1,
    fallback_model = SeasonalNaive(season_length=7),
    backend=remote_backend
)

forecasts_df = sf.forecast(h=48, level=[90])
print(forecasts_df.size) # returns 0

I didn't get any errors, but the result is empty

Versions / Dependencies

Local setup:

OS: mac
Python: 3.10
py packages:

dask==2022.12.1
dask-cloudprovider==2022.10.0
datasetsforecast==0.0.7
distributed==2022.12.1
fugue==0.7.3
fugue-sql-antlr==0.1.1
s3fs==2022.11.0
statsforecast==1.4.0
pandas==1.5.2

Dask cluster:

scheduler & worker: dask, version 2022.12.1
Python: 3.8

Reproduction script

Create a remote dask cluster
Use the code above to connect to dask scheduler, and run the sample

Issue Severity

High: It blocks me from completing my task.

bug

opened by ibyter 1

[Core] Make StatsForecast.forecast_fitted_values() possible when using a Fugue backend

Description

When using StatsForecast.forecast_fitted_values() while the backend parameter in the StatsForecast object is set to my Fugue backend, I get the following error: NotImplementedError: Execution with a distributed backend only supports forecast and cross_validation methods. Try setting the backend parameter to None.

Use case

I would like to use StatsForecast.forecast_fitted_values() with my Fugue backend so I can use the fitted values in reconciliation approaches provided by the hierarchicalforecast package, while training the models for the hierarchy quickly using Spark.
enhancement

opened by wregter 0
[AutoETS] add support for exogenous variables

Description

Currently AutoARIMA supports exogenous variables, which is great because it allows for including some external information in your model without losing yourself in complexity or giving up on the effectiveness of 'simple' statistical techniques.

In addition to this, it might prove worthwhile to add this feature to AutoETS.

There's this paper in which they implement it: https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp02-15.pdf

Additionally, in this post Stephen Kolassa suggests a couple of ways of implementing it, among which doing it similarly to AutoARIMA (regression with ARMA errors): https://stats.stackexchange.com/questions/220830/holt-winters-with-exogenous-regressors-in-r

Hyndman also shares some of his thoughts on the problem (old post though): https://robjhyndman.com/hyndsight/ets-regressors/

Maybe, we can get Osman to share his code with us.

Could be that there's other people implementing it too. But I haven't come across it yet.

Use case

No response
enhancement

opened by Beerstabr 0
[Core] Make train_ds accessible to models

Description

Models in statsforecast, mlforecast, and neuralforecast currently can only receive train_y but not train_ds.

Other models such as uber/orbit can process arbitrary sparse training data (e.g. train_ds=[2010, 2011, 2014, 2016]).

Use case

I am trying to build an adapter for the orbit API to be integrated into the nixtla model ecosystem (implement custom adapter class with the fit, predict)
enhancement

opened by Elijas 0

[FugueBackend] Forecast with Exogenous variables fails using a Spark backend

What happened + What you expected to happen

When trying to use the FugueBackend class to distribute my exogenous forecasts, I encounter the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<command-2016733390302825> in <cell line: 13>()
     11 spark = SparkSession.builder.getOrCreate()
     12 backend = FugueBackend(spark, {"fugue.spark.use_pandas_udf":True})
---> 13 forecasts = forecast(
     14     spark.createDataFrame(df),
     15     models=[ETS()],

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/statsforecast/distributed/utils.py in forecast(df, models, freq, h, X_df, level, parallel)
     20 ):
     21     backend = parallel if parallel is not None else ParallelBackend()
---> 22     return backend.forecast(df, models, freq, h=h, X_df=X_df, level=level)
     23 
     24 # %% ../../nbs/distributed.utils.ipynb 6

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/statsforecast/distributed/fugue.py in forecast(self, df, models, freq, **kwargs)
     81         schema = schema + ",AutoARIMA_lo_99:float, AutoARIMA_hi_99:float"
     82         print(schema)
---> 83         return transform(
     84             df,
     85             self._forecast_series,

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/fugue/interfaceless.py in transform(df, using, schema, params, partition, callback, ignore_errors, engine, engine_conf, force_output_fugue_dataframe, persist, as_local, save_path, checkpoint)
    134     else:
    135         src = dag.df(df)
--> 136     tdf = src.transform(
    137         using=using,
    138         schema=schema,

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/fugue/workflow/workflow.py in transform(self, using, schema, params, pre_partition, ignore_errors, callback)
    539         if pre_partition is None:
    540             pre_partition = self.partition_spec
--> 541         df = self.workflow.transform(
    542             self,
    543             using=using,

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/fugue/workflow/workflow.py in transform(self, using, schema, params, pre_partition, ignore_errors, callback, *dfs)
   1954         tf._has_rpc_client = not isinstance(callback, EmptyRPCHandler)  # type: ignore
   1955         tf.validate_on_compile()
-> 1956         return self.process(
   1957             *dfs,
   1958             using=RunTransformer,

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/fugue/workflow/workflow.py in process(self, using, schema, params, pre_partition, *dfs)
   1615             using = _PROCESSOR_REGISTRY.get(using)
   1616         _dfs = self._to_dfs(*dfs)
-> 1617         task = Process(
   1618             len(_dfs),
   1619             processor=using,

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/fugue/workflow/_tasks.py in __init__(self, input_n, processor, schema, params, pre_partition, deterministic, lazy, input_names)
    314     ):
    315         self._processor = _to_processor(processor, schema)
--> 316         self._processor._params = ParamDict(params)
    317         self._processor._partition_spec = PartitionSpec(pre_partition)
    318         self._processor.validate_on_compile()

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/triad/collections/dict.py in __init__(self, data, deep)
    175     def __init__(self, data: Any = None, deep: bool = True):
    176         super().__init__()
--> 177         self.update(data, deep=deep)
    178 
    179     def __setitem__(  # type: ignore

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/triad/collections/dict.py in update(self, other, on_dup, deep)
    262         for k, v in to_kv_iterable(other):
    263             if on_dup == ParamDict.OVERWRITE or k not in self:
--> 264                 self[k] = copy.deepcopy(v) if deep else v
    265             elif on_dup == ParamDict.THROW:
    266                 raise KeyError(f"{k} exists in dict")

/usr/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/usr/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/usr/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/usr/lib/python3.9/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    268     if state is not None:
    269         if deep:
--> 270             state = deepcopy(state, memo)
    271         if hasattr(y, '__setstate__'):
    272             y.__setstate__(state)

/usr/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/usr/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/usr/lib/python3.9/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    268     if state is not None:
    269         if deep:
--> 270             state = deepcopy(state, memo)
    271         if hasattr(y, '__setstate__'):
    272             y.__setstate__(state)

/usr/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/usr/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/usr/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    159                     reductor = getattr(x, "__reduce_ex__", None)
    160                     if reductor is not None:
--> 161                         rv = reductor(4)
    162                     else:
    163                         reductor = getattr(x, "__reduce__", None)

/databricks/spark/python/pyspark/context.py in __getnewargs__(self)
    493     def __getnewargs__(self) -> NoReturn:
    494         # This method is called when attempting to pickle SparkContext, which is always an error:
--> 495         raise RuntimeError(
    496             "It appears that you are attempting to reference SparkContext from a broadcast "
    497             "variable, action, or transformation. SparkContext can only be used on the driver, "

RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

Versions / Dependencies

statsforecast: 1.0.0
fugue: 0.7.3
python: 3.9
OS: Ubuntu

Reproduction script

import pandas as pd  # needed for the pd.DataFrame calls below

from statsforecast.distributed.utils import forecast
from statsforecast.distributed.fugue import FugueBackend
from statsforecast.models import ETS
from statsforecast.core import StatsForecast

from pyspark.sql import SparkSession
df = pd.DataFrame({"ds": [1, 2, 3,4,5,6,7,8,9], "y": [1,2,3,4,5,6,7,8,9], "x":[1,2,3,4,5,6,7,8,9]})
df["unique_id"] = 1

X_df = pd.DataFrame({"ds":[4], "x":[4], "unique_id":1})
spark = SparkSession.builder.getOrCreate()
backend = FugueBackend(spark, {"fugue.spark.use_pandas_udf":True})
forecasts = forecast(
    spark.createDataFrame(df),
    models=[ETS()],
    X_df=spark.createDataFrame(X_df),
    h=1,
    freq="D",
    parallel=backend)

Issue Severity

High: It blocks me from completing my task.

bug

opened by jstammers 0

AutoARIMA, AutoETS' documentation missing `ic` options (`aic`, `bic`, `aicc`).

Description

Hi! I'm new to this library. I cannot find the possible values for each parameter. For example, in https://nixtla.github.io/statsforecast/models.html#autoarima, I can see the defaults, but if I wanted to change the ic parameter from 'aicc' to some other value, I don't know what that other value would be. Is it possible to expose all the possible values for each parameter? Thanks, Victor
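For context, the three criteria named in the title (`aic`, `aicc`, `bic`) differ only in how they penalize the number of parameters. The sketch below uses the standard textbook definitions; `information_criteria` is a hypothetical helper, and whether the library counts `npar` and the effective sample size `nstar` in exactly this way is an assumption.

```python
import math


def information_criteria(loglik, npar, nstar):
    """Return AIC, AICc and BIC for a fitted model.

    loglik: maximized log-likelihood
    npar:   number of estimated parameters
    nstar:  effective number of observations (must exceed npar + 1)
    """
    aic = -2.0 * loglik + 2.0 * npar
    # Small-sample correction; vanishes as nstar grows
    aicc = aic + 2.0 * npar * (npar + 1) / (nstar - npar - 1)
    # Equivalent to -2 * loglik + npar * log(nstar)
    bic = aic + npar * (math.log(nstar) - 2.0)
    return {"aic": aic, "aicc": aicc, "bic": bic}


print(information_criteria(loglik=-50.0, npar=3, nstar=50))
```

Model selection picks the candidate with the lowest value of the chosen criterion, so switching `ic` can change which model is selected when the candidates differ in parameter count.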

Link

No response
documentation

opened by victor-guerrero 1

Releases(v1.4.0)

v1.4.0(Dec 1, 2022)
What's Changed

feat: Added prediction intervals for insample and ETS models in https://github.com/Nixtla/statsforecast/pull/328

[FEAT] Add plot anomalies option in https://github.com/Nixtla/statsforecast/pull/341

[DOCS] Improve README and docs page index in https://github.com/Nixtla/statsforecast/pull/344

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.3.2...v1.4.0
v1.3.2(Nov 28, 2022)
What's Changed

[FIX] Improvements to StatsForecast's plot method in https://github.com/Nixtla/statsforecast/pull/312

[FEAT] Add plotly as engine to StatsForecast's plot method in https://github.com/Nixtla/statsforecast/pull/313

[FEAT] Add autowidth to plotly engine in https://github.com/Nixtla/statsforecast/pull/314

feat: add new documentation in https://github.com/Nixtla/statsforecast/pull/317

[FIX] ETS for intermittent series in https://github.com/Nixtla/statsforecast/pull/315

[FIX] Theta for intermittent series in https://github.com/Nixtla/statsforecast/pull/316

[FEAT] Rename ETS to AutoETS in https://github.com/Nixtla/statsforecast/pull/318

[FEAT] Change library to newest black formatting in https://github.com/Nixtla/statsforecast/pull/320

[FIX] Add new plot method to mstl example in https://github.com/Nixtla/statsforecast/pull/324

[FIX] Build docs for Theta model in https://github.com/Nixtla/statsforecast/pull/322

[FIX] Isolate elements for all subplots plotly in https://github.com/Nixtla/statsforecast/pull/323

Fix/multiple seas docs in https://github.com/Nixtla/statsforecast/pull/325

[FEAT] Add mstl experiment in https://github.com/Nixtla/statsforecast/pull/326

[FIX] Prevent futurewarning series indexing in https://github.com/Nixtla/statsforecast/pull/327

Fix sidebar in https://github.com/Nixtla/statsforecast/pull/331

feat: Improved tutorial on Cross-Validation in https://github.com/Nixtla/statsforecast/pull/333

Feat/improve prediction intervals in https://github.com/Nixtla/statsforecast/pull/336

fix: Improved AutoARIMA plot in https://github.com/Nixtla/statsforecast/pull/334

docs: ERCOT electricity demand peak forecasting in https://github.com/Nixtla/statsforecast/pull/335

docs: fix peak demand plot in https://github.com/Nixtla/statsforecast/pull/339

New Contributors

@cchallu made their first contribution in https://github.com/Nixtla/statsforecast/pull/335

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.3.1...v1.3.2
v1.3.1(Nov 17, 2022)
What's Changed

[FEAT] Add plot method to StatsForecast class in https://github.com/Nixtla/statsforecast/pull/305

[FEAT] New Issues Templates in https://github.com/Nixtla/statsforecast/pull/307

[FIX] make logging config local to package in https://github.com/Nixtla/statsforecast/pull/275

[FIX] Error when ds column is object in https://github.com/Nixtla/statsforecast/pull/309

New Contributors

@JeroenPeterBos made their first contribution in https://github.com/Nixtla/statsforecast/pull/275

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.3.0...v1.3.1
v1.3.0(Nov 15, 2022)
What's Changed

[FIX] Use conda env for ray tests in https://github.com/Nixtla/statsforecast/pull/297

[FIX] Source code broken links in https://github.com/Nixtla/statsforecast/pull/293

[FIX] Sparse models with zero-valued time series in https://github.com/Nixtla/statsforecast/pull/294

[FIX] Add explicit optional argument (PEP-484) in https://github.com/Nixtla/statsforecast/pull/301

[FIX] SeasonalNaive in https://github.com/Nixtla/statsforecast/pull/302

[FEAT] Add exogenous variables to fugue's backend in https://github.com/Nixtla/statsforecast/pull/300

[FEAT] Add Theta methods in https://github.com/Nixtla/statsforecast/pull/299

[FEAT] Add MSTL example and comparison in https://github.com/Nixtla/statsforecast/pull/295

[FEAT] Add backend argument to StatsForecast class in https://github.com/Nixtla/statsforecast/pull/303

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.2.1...v1.3.0
v1.2.1(Nov 2, 2022)
What's Changed

[FEAT]: Add fallback model to cross validation in https://github.com/Nixtla/statsforecast/pull/289

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.2.0...v1.2.1
v1.2.0(Oct 31, 2022)
What's Changed

[FEAT] MSTL model in https://github.com/Nixtla/statsforecast/pull/284

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.1.3...v1.2.0
v1.1.3(Oct 25, 2022)
What's Changed

[FEAT] Add progress bar for sequential tasks in https://github.com/Nixtla/statsforecast/pull/280

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.1.2...v1.1.3
v1.1.2(Oct 24, 2022)
What's Changed

[FEAT] Improve navbar docs in https://github.com/Nixtla/statsforecast/pull/262

[FEAT] Add ETS to spark results in https://github.com/Nixtla/statsforecast/pull/264

[FEAT] Improve CES results in https://github.com/Nixtla/statsforecast/pull/265

[FEAT] Add fallback model to distributed backends in https://github.com/Nixtla/statsforecast/pull/277

[FIX] Backend docs in https://github.com/Nixtla/statsforecast/pull/278

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.1.1...v1.1.2
v1.1.1(Oct 5, 2022)
What's Changed

[FEAT] Add Distributed post in https://github.com/Nixtla/statsforecast/pull/257

[FEAT] Fallback Model in https://github.com/Nixtla/statsforecast/pull/259

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.1.0...v1.1.1
v1.1.0(Sep 28, 2022)
What's Changed

[FIX] License in https://github.com/Nixtla/statsforecast/pull/191

[FIX] Add hide statement for ets cells in https://github.com/Nixtla/statsforecast/pull/192

[FEAT] New experiments neuralprophet in https://github.com/Nixtla/statsforecast/pull/195

[FIX] use ubuntu to deploy docs in https://github.com/Nixtla/statsforecast/pull/197

[FIX] Broken links in https://github.com/Nixtla/statsforecast/pull/203

[FEAT] Add linters and update contributing instructions in https://github.com/Nixtla/statsforecast/pull/205

[FIX] nbdev latest changes in https://github.com/Nixtla/statsforecast/pull/208

[FIX] python3.7 ci error in https://github.com/Nixtla/statsforecast/pull/214

fixing the argument name for external regressors in the example notebook in https://github.com/Nixtla/statsforecast/pull/200

[FIX] #210 in https://github.com/Nixtla/statsforecast/pull/213

Docstring based documentation in https://github.com/Nixtla/statsforecast/pull/209

[FIX] nbdev version until next release in https://github.com/Nixtla/statsforecast/pull/225

[FEAT] Prediction intervals for fitted values in https://github.com/Nixtla/statsforecast/pull/228

[FEAT] Add anomaly detection example in https://github.com/Nixtla/statsforecast/pull/229

[FEAT] Add single anomaly plot in https://github.com/Nixtla/statsforecast/pull/230

[FEAT] Add exogenous var use case and install instructions in https://github.com/Nixtla/statsforecast/pull/231

[FEAT] M5 scalability comparison in https://github.com/Nixtla/statsforecast/pull/232

Intervals for some simple methods in https://github.com/Nixtla/statsforecast/pull/201

[FEAT] Add prediction intervals example in https://github.com/Nixtla/statsforecast/pull/239

[FEAT] Auto CES model in https://github.com/Nixtla/statsforecast/pull/238

[FIX] nbdev releases in https://github.com/Nixtla/statsforecast/pull/251

[FEAT] Add CES + ETS ensemble results in https://github.com/Nixtla/statsforecast/pull/252

[FIX] nbdev deploy to GitHub pages in https://github.com/Nixtla/statsforecast/pull/253

New Contributors

@jattenberg made their first contribution in https://github.com/Nixtla/statsforecast/pull/200

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v1.0.0...v1.1.0
v1.0.0(Aug 15, 2022)
What's Changed

Add FugueBackend in https://github.com/Nixtla/statsforecast/pull/157

[FEAT] Add neuralprophet experiment in https://github.com/Nixtla/statsforecast/pull/181

[FEAT] nbdev2 integration in https://github.com/Nixtla/statsforecast/pull/186

[BREAKING CHANGE] SKLearn syntax in https://github.com/Nixtla/statsforecast/pull/184

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v0.7.1...v1.0.0
v0.7.1(Jul 23, 2022)
What's Changed

[FEAT] Fitted df returns in-sample values in https://github.com/Nixtla/statsforecast/pull/158

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v0.7.0...v0.7.1
v0.7.0(Jul 21, 2022)
What's Changed

[Fix]: prevent arima RuntimeWarnings in https://github.com/Nixtla/statsforecast/pull/136

[BREAKING CHANGE] Fitted Values Computation in https://github.com/Nixtla/statsforecast/pull/137

Models now return a dict with mean and fitted values instead of a numpy array.
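
As a rough illustration of the new return shape, the sketch below uses a toy naive model written in plain Python. The function name and the key names `mean` and `fitted` are assumptions inferred from the note above, not taken from the library's source.

```python
import math


def naive_forecast(y, h):
    """Toy naive model (hypothetical): repeat the last observation h times."""
    if not y:
        raise ValueError("y must contain at least one observation")
    mean = [y[-1]] * h
    # In-sample one-step-ahead fit: each value is predicted by its
    # predecessor, so the first fitted value is undefined (NaN).
    fitted = [math.nan] + list(y[:-1])
    # Assumed key names: "mean" for forecasts, "fitted" for in-sample values.
    return {"mean": mean, "fitted": fitted}


out = naive_forecast([3.0, 5.0, 4.0], h=2)
print(out["mean"])
print(out["fitted"])
```

Returning a dict lets a model expose forecasts and in-sample values together, and leaves room for further entries without changing the call signature.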

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v0.6.0...v0.7.0
v0.6.0(Jul 19, 2022)
What's Changed

[FEAT] Add ETS model and experiments in https://github.com/Nixtla/statsforecast/pull/142

[BREAKING CHANGE] Deprecate python3.6 in https://github.com/Nixtla/statsforecast/pull/146

[FEAT] Ray experiment ets in https://github.com/Nixtla/statsforecast/pull/145

[FEAT] Readme updates to include ets in https://github.com/Nixtla/statsforecast/pull/148

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v0.5.6...v0.6.0
v0.5.6(Jun 27, 2022)
What's Changed

[DOCS] Typo fixes by @ryanrussell in https://github.com/Nixtla/statsforecast/pull/117

[FEAT] Add fugue example by @goodwanghan in https://github.com/Nixtla/statsforecast/pull/111

[FEAT] add cross validation functionality by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/120

[FIX] #121 fitting autoarima on constant time series causes typeerror by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/122

[DOCS] add shagn as a contributor for bug by @allcontributors in https://github.com/Nixtla/statsforecast/pull/124

[FEAT] add integer ds compatibility for cross validation by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/123

[FEAT] Add n_windows argument for cross_validation method by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/131

[EXP] Add benchmarks at scale experiment by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/134
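
The cross-validation entries in this release refer to evaluating a model over several windows at the end of a series. A minimal sketch of how such windows can be laid out is shown below, assuming a rolling-origin scheme; the helper `cv_windows` and its `step_size` argument are hypothetical and only `n_windows` is named in the notes above.

```python
def cv_windows(n_obs, h, n_windows, step_size=None):
    """Return (train_end, test_end) index pairs for rolling-origin evaluation.

    Each window trains on y[:train_end] and is scored on y[train_end:test_end].
    The last window always ends at the final observation.
    """
    # Assumption: by default, consecutive windows are shifted by the horizon.
    step = h if step_size is None else step_size
    first_cutoff = n_obs - h - (n_windows - 1) * step
    if first_cutoff <= 0:
        raise ValueError("series too short for the requested windows")
    return [
        (first_cutoff + i * step, first_cutoff + i * step + h)
        for i in range(n_windows)
    ]


windows = cv_windows(n_obs=20, h=3, n_windows=3)
print(windows)  # [(11, 14), (14, 17), (17, 20)]
```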

New Contributors

@ryanrussell made their first contribution in https://github.com/Nixtla/statsforecast/pull/117

@goodwanghan made their first contribution in https://github.com/Nixtla/statsforecast/pull/111

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v0.5.5...v0.5.6
v0.5.5(May 9, 2022)
What's Changed

ARIMA level/quantile compatibility, missing nbdev_flow, protected gif by @kdgutier in https://github.com/Nixtla/statsforecast/pull/102

Add dependency hint for quick intro by @guerda in https://github.com/Nixtla/statsforecast/pull/106

[FEAT] Add AutoARIMA adapter for Prophet by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/114

New Contributors

@kdgutier made their first contribution in https://github.com/Nixtla/statsforecast/pull/102

@guerda made their first contribution in https://github.com/Nixtla/statsforecast/pull/106

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v0.5.4...v0.5.5
v0.5.4(May 2, 2022)
What's Changed

feat: add issues template by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/93

refactor: use Pool instead of ProcessPoolExecutor by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/96

Feat: add ray integration by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/98

fix: add automatic n_jobs behavior by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/99

Creation of forecast dates improvement by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/101

Ray experiment by @FedericoGarza in https://github.com/Nixtla/statsforecast/pull/103

Update README.md by @mergenthaler in https://github.com/Nixtla/statsforecast/pull/104

Full Changelog: https://github.com/Nixtla/statsforecast/compare/v0.5.3...v0.5.4
v0.5.3(Apr 12, 2022)
What's Changed

New features

summary method for the AutoARIMA class requested in #31.

representational string for the AutoARIMA fitted model, requested in #83.

Bug Fixes

[BUG] croston_sba #88 fixed in #89.

v0.5.2(Mar 19, 2022)
Added predict_in_sample method for AutoARIMA.

Users can now compute in sample forecasts including prediction intervals.

v0.5.1(Mar 11, 2022)
Now: Good Ol' sklearn syntax with model = AutoARIMA(); model.fit(y); model.predict(10).

Bug fixes.
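
The sklearn-style pattern mentioned in this release can be sketched with a toy model in plain Python. `ToySeasonalNaive` is a hypothetical stand-in, not a class from the library; it only shows the convention that `fit` stores state and returns the model itself, which is what allows calls to be chained.

```python
class ToySeasonalNaive:
    """Hypothetical stand-in model with an sklearn-style fit/predict interface."""

    def __init__(self, season_length=1):
        self.season_length = season_length

    def fit(self, y):
        if len(y) < self.season_length:
            raise ValueError("need at least one full season of data")
        # Keep only the last observed season; trailing underscore marks
        # an attribute learned during fit, following sklearn convention.
        self.last_season_ = list(y[-self.season_length:])
        return self  # returning self enables model.fit(y).predict(h)

    def predict(self, h):
        # Repeat the last season cyclically for h steps ahead.
        return [self.last_season_[i % self.season_length] for i in range(h)]


model = ToySeasonalNaive(season_length=2)
model.fit([1, 2, 3, 4])
forecast = model.predict(3)
print(forecast)  # [3, 4, 3]
```

Because `fit` returns the instance, the same result can be written in one expression as `ToySeasonalNaive(season_length=2).fit([1, 2, 3, 4]).predict(3)`.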

v0.5.0(Mar 7, 2022)
Notable changes

Inclusion of prediction intervals for auto_arima.

statsforecast is now installable from conda-forge (`conda install -c conda-forge statsforecast`), thanks to @sugatoray.

v0.4.0(Mar 1, 2022)
Notable changes

Inclusion of exogenous variables for auto_arima.

The StatsForecast class now handles exogenous variables.

This release allows developers to include more models that use exogenous variables.

Bug fixes.
