A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

Overview

pmdarima


Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time series analysis capabilities. This includes:

  • The equivalent of R's auto.arima functionality
  • A collection of statistical tests of stationarity and seasonality
  • Time series utilities, such as differencing and inverse differencing
  • Numerous endogenous and exogenous transformers and featurizers, including Box-Cox and Fourier transformations
  • Seasonal time series decompositions
  • Cross-validation utilities
  • A rich collection of built-in time series datasets for prototyping and examples
  • Scikit-learn-esque pipelines to consolidate your estimators and promote productionization

Pmdarima wraps statsmodels under the hood, but is designed with an interface that's familiar to users coming from a scikit-learn background.
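
For example, the stationarity tests and differencing utilities listed above can be used on their own. A minimal sketch using the bundled wineind dataset (names as found in recent pmdarima versions; defaults may differ):

import pmdarima as pm
from pmdarima.arima import ADFTest, ndiffs
from pmdarima.utils import diff

# Load a built-in dataset
y = pm.datasets.load_wineind()

# Augmented Dickey-Fuller test: is differencing suggested?
p_val, should_diff = ADFTest(alpha=0.05).should_diff(y)

# Estimate the differencing order and apply it (diff requires differences > 0)
d = ndiffs(y, test='adf')
y_diff = diff(y, differences=d) if d > 0 else y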

Installation

pip

Pmdarima has binary and source distributions for Windows, Mac and Linux (manylinux) on pypi under the package name pmdarima and can be downloaded via pip:

pip install pmdarima

conda

Pmdarima also has Mac and Linux builds available via conda and can be installed like so:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install pmdarima

Note: We do not maintain our own Conda binaries; they are maintained at https://github.com/conda-forge/pmdarima-feedstock. See that repo for further documentation on working with Pmdarima on Conda.

Quickstart Examples

Fitting a simple auto-ARIMA on the wineind dataset:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Load/split your data
y = pm.datasets.load_wineind()
train, test = train_test_split(y, train_size=150)

# Fit your model
model = pm.auto_arima(train, seasonal=True, m=12)

# make your forecasts
forecasts = model.predict(test.shape[0])  # predict N steps into the future

# Visualize the forecasts (blue=train, green=forecasts)
x = np.arange(y.shape[0])
plt.plot(x[:150], train, c='blue')
plt.plot(x[150:], forecasts, c='green')
plt.show()

Wineind example
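
As a possible follow-up (a sketch, assuming the model and test split from the example above): once new observations arrive, they can be folded into the fitted model with update() before forecasting further ahead.

# Add the newly observed values to the existing fit, then forecast the next year
model.update(test)
next_year = model.predict(n_periods=12)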

Fitting a more complex pipeline on the sunspots dataset, serializing it, and then loading it from disk to make predictions:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
from pmdarima.pipeline import Pipeline
from pmdarima.preprocessing import BoxCoxEndogTransformer
import pickle

# Load/split your data
y = pm.datasets.load_sunspots()
train, test = train_test_split(y, train_size=2700)

# Define and fit your pipeline
pipeline = Pipeline([
    ('boxcox', BoxCoxEndogTransformer(lmbda2=1e-6)),  # lmbda2 avoids negative values
    ('arima', pm.AutoARIMA(seasonal=True, m=12,
                           suppress_warnings=True,
                           trace=True))
])

pipeline.fit(train)

# Serialize your model just like you would in scikit:
with open('model.pkl', 'wb') as pkl:
    pickle.dump(pipeline, pkl)
    
# Load it and make predictions seamlessly:
with open('model.pkl', 'rb') as pkl:
    mod = pickle.load(pkl)
    print(mod.predict(15))
# [25.20580375 25.05573898 24.4263037  23.56766793 22.67463049 21.82231043
# 21.04061069 20.33693017 19.70906027 19.1509862  18.6555793  18.21577243
# 17.8250318  17.47750614 17.16803394]
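
The cross-validation utilities mentioned in the overview work with these estimators in a scikit-learn-like way. A minimal sketch (the ARIMA order and CV settings below are illustrative choices, not recommendations):

import numpy as np
import pmdarima as pm
from pmdarima import model_selection

y = pm.datasets.load_wineind()

# Rolling-origin cross-validation: score 12-step-ahead forecasts, advancing 12 obs at a time
cv = model_selection.RollingForecastCV(h=12, step=12)
estimator = pm.ARIMA(order=(1, 1, 2), seasonal_order=(0, 1, 1, 12), suppress_warnings=True)
scores = model_selection.cross_val_score(estimator, y, scoring='smape', cv=cv)
print("Average SMAPE: %.3f" % np.average(scores))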

Availability

Pmdarima is available on PyPI as pre-built wheel files for Python 3.6+ for the following platforms:

  • Mac (64-bit)
  • Linux (64-bit manylinux)
  • Windows (32 & 64-bit)

If a wheel doesn't exist for your platform, you can still pip install and it will build from the source distribution tarball; however, you'll need cython>=0.29 and gcc (Mac/Linux) or MinGW (Windows) in order to build the package from source.

Note that legacy versions (<1.0.0) are available under the name "pyramid-arima" and can be pip installed via:

# Legacy warning:
$ pip install pyramid-arima
# python -c 'import pyramid;'

However, this is not recommended.

Documentation

All of your questions and more (including examples and guides) can be answered by the pmdarima documentation. If not, always feel free to file an issue.

Comments
  • Import error from pyramid.arima

    Import error from pyramid.arima

    # trying to import auto_arima using the command below: from pyramid.arima import auto_arima

    #error: from pyramid.arima._arima import C_Approx

    ImportError: cannot import name 'C_Approx'

    Versions

    Windows-7-6.1.7601-SP1 Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] Pyramid 0.6.2 NumPy 1.12.1 SciPy 1.0.0 Scikit-Learn 0.20.dev0 Statsmodels 0.8.0

    :confused: : cannot replicate 
    opened by munitech4u 31
  • ModuleNotFoundError: No module named 'pmdarima'

    ModuleNotFoundError: No module named 'pmdarima'

    Already updated the required packages to install pmdarima, and then installed the package, but the package still cannot be imported.

    [1] import pmdarima as pm
    Traceback (most recent call last):
    
      File "<ipython-input-13-05191c3ae93f>", line 1, in <module>
        import pmdarima as pm
    
    ModuleNotFoundError: No module named 'pmdarima'
    

    Tried uninstalling the package and reinstalling it, but pmdarima still cannot be imported. What's the problem?

    help wanted :information_source: : more info needed 
    opened by weianuk 28
  • ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required ?

    ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required ?

    Question

    I have a timeline with number of some reports per week. When I try to fit a model with auto_arima, I receive an error ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required. (Find the complete error trace below).

    Since I am not a statistician, I do not understand what that error tells me in relation to my data. I am supposing it has something to do with seasonality given the error trace. Since the data is per week, I set m=52. From my programming understanding, this seems to be the reason for the error. When choosing a smaller m the error disappears. But I cannot really choose another m due to weekly data.

    In essence my question is what does auto_arima try to tell me here?

    Even though I can see that the error originates from the check_array function in sklearn's validation.py, perhaps that exception could be caught and instead a "domain-specific" exception could be raised by pmdarima?

    I built a small example that reproduces my issue:

    import pmdarima as pm
    import matplotlib.pyplot as plt
    
    
    week_number = ["201751", "201752", "201801", "201802", "201803", "201804", "201805", 
                   "201806", "201807", "201808", "201809", "201810", "201811", "201812", 
                   "201813", "201814", "201815", "201816", "201817", "201818", "201819", 
                   "201820", "201821", "201822", "201823", "201824", "201825", "201826", 
                   "201827", "201828", "201829", "201830", "201831", "201832", "201833", 
                   "201834", "201835", "201836", "201837", "201838", "201839", "201840", 
                   "201841", "201842", "201843", "201844", "201845", "201846", "201847", 
                   "201848", "201849", "201850", "201851", "201852", "201853", "201900", 
                   "201901", "201902", "201903", "201904", "201905", "201906", "201907", 
                   "201908", "201909", "201910", "201911", "201912", "201913", "201914", 
                   "201915", "201916", "201917", "201918", "201919", "201920", "201921", 
                   "201922", "201923", "201924", "201925", "201926", "201927", "201928", 
                   "201929", "201930", "201931", "201932"]
    reports_per_week = [1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 6, 2, 1, 0,
                        2, 0, 1, 0, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 6,
                        0, 0, 0, 0, 0, 1, 3, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0]
    
    
    plt.plot(week_number, reports_per_week)
    plt.xticks(range(0, len(week_number), 15))
    plt.show()
    
    model = pm.auto_arima(reports_per_week, trace=True, seasonal=True, m=52)
    
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-78-b33795e0a9d1> in <module>
         25 plt.xticks(range(0, len(week_number), 15))
         26 
    ---> 27 model = pm.auto_arima(data.reports_per_week, trace=True, seasonal=True, m=52)
    
    /usr/local/anaconda3/lib/python3.7/site-packages/pmdarima/arima/auto.py in auto_arima(y, exogenous, start_p, d, start_q, max_p, max_d, max_q, start_P, D, start_Q, max_P, max_D, max_Q, max_order, m, seasonal, stationary, information_criterion, alpha, test, seasonal_test, stepwise, n_jobs, start_params, trend, method, maxiter, offset_test_args, seasonal_test_args, suppress_warnings, error_action, trace, random, random_state, n_fits, return_valid_fits, out_of_sample_size, scoring, scoring_args, with_intercept, sarimax_kwargs, **fit_args)
        398             if seasonal_test_args is not None else dict()
        399         D = nsdiffs(xx, m=m, test=seasonal_test, max_D=max_D,
    --> 400                     **seasonal_test_args)
        401 
        402         if D > 0 and exogenous is not None:
    
    /usr/local/anaconda3/lib/python3.7/site-packages/pmdarima/arima/utils.py in nsdiffs(x, m, max_D, test, **kwargs)
        108         if is_constant(x):
        109             return D
    --> 110         dodiff = testfunc(x)
        111 
        112     return D
    
    /usr/local/anaconda3/lib/python3.7/site-packages/pmdarima/arima/seasonality.py in estimate_seasonal_differencing_term(self, x)
        580 
        581         # Get the critical value for m
    --> 582         stat = self._compute_test_statistic(x)
        583         crit_val = self._calc_ocsb_crit_val(self.m)
        584         return int(stat > crit_val)
    
    /usr/local/anaconda3/lib/python3.7/site-packages/pmdarima/arima/seasonality.py in _compute_test_statistic(self, x)
        520             for lag_term in range(1, maxlag + 1):  # 1 -> maxlag (incl)
        521                 try:
    --> 522                     fit = self._fit_ocsb(x, m, lag_term, maxlag)
        523                     fits.append(fit)
        524                     icvals.append(icfunc(fit))
    
    /usr/local/anaconda3/lib/python3.7/site-packages/pmdarima/arima/seasonality.py in _fit_ocsb(x, m, lag, max_lag)
        466         """Fit the linear model used to compute the test statistic"""
        467         y_first_order_diff = diff(x, m)
    --> 468         y = diff(y_first_order_diff)
        469         ylag = OCSBTest._gen_lags(y, lag)
        470 
    
    /usr/local/anaconda3/lib/python3.7/site-packages/pmdarima/utils/array.py in diff(x, lag, differences)
        299         raise ValueError('lag and differences must be positive (> 0) integers')
        300 
    --> 301     x = check_array(x, ensure_2d=False, dtype=DTYPE, copy=False)
        302     fun = _diff_vector if x.ndim == 1 else _diff_matrix
        303     res = x
    
    /usr/local/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
        584                              " minimum of %d is required%s."
        585                              % (n_samples, array.shape, ensure_min_samples,
    --> 586                                 context))
        587 
        588     if ensure_min_features > 0 and array.ndim == 2:
    
    ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.
    

    Versions (if necessary)

    I provide the version info just to be sure in case it becomes relevant.

    System:
        python: 3.7.6 (default, Jan  8 2020, 13:42:34)  [Clang 4.0.1 (tags/RELEASE_401/final)]
    executable: /usr/local/anaconda3/bin/python
       machine: Darwin-18.7.0-x86_64-i386-64bit
    
    Python dependencies:
            pip: 20.0.2
     setuptools: 46.0.0.post20200309
        sklearn: 0.22.1
    statsmodels: 0.11.0
          numpy: 1.18.1
          scipy: 1.4.1
         Cython: 0.29.15
         pandas: 1.0.1
         joblib: 0.14.1
       pmdarima: 1.6.1
    

    Thank you for any help!
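
    A minimal sketch of one workaround, consistent with the observation above that the failure comes from the seasonal differencing step. Whether weekly seasonality should be modeled at all for counts this sparse is a separate question.

    # Disabling the seasonal search skips the OCSB test that raises the error here;
    # explicitly passing D (rather than leaving D=None) with m=52 would also bypass nsdiffs.
    model = pm.auto_arima(reports_per_week, trace=True, seasonal=False)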

    :grey_question: : question 
    opened by HelgeCPH 24
  • First-party conda support

    First-party conda support

    We have tried to implement first-party conda support several times (see #174, #263, #298, #300, and finally #325) and have not been able to figure it out.

    This issue is to track any discussion around conda support. We are very interested in outside help from anyone who may be a conda expert.

    help wanted feature request 
    opened by aaronreidsmith 15
  • auto_arima's error_action="ignore" does not work when alternative training methods are specified

    auto_arima's error_action="ignore" does not work when alternative training methods are specified

    Describe the bug

    If I specify error_action="ignore" in auto_arima when method is anything other than the default of CSS-MLE, errors from statsmodels are no longer ignored.

    To Reproduce
    Execute the following python code (this is using pmdarima 1.5.3)

    import pmdarima
    
    y = pmdarima.datasets.load_wineind()
    model = pmdarima.auto_arima(y=y, error_action="ignore", suppress_warnings=True, trace=True, method='CSS')
    

    Versions

    System:
        python: 3.6.10 |Anaconda, Inc.| (default, Jan 7 2020, 15:18:16) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\adgustaf\AppData\Local\Continuum\anaconda3\envs\pmdarima_v153\python.exe
       machine: Windows-10-10.0.18362-SP0

    Python dependencies:
            pip: 20.0.2
     setuptools: 46.0.0.post20200309
        sklearn: 0.22.2.post1
    statsmodels: 0.11.1
          numpy: 1.18.1
          scipy: 1.4.1
         Cython: 0.29.15
         pandas: 1.0.1
         joblib: 0.14.1
       pmdarima: 1.5.3

    Expected behavior

    I expect the training to complete just as when the default method of CSS-MLE is applied.

    Actual behavior

    Fitting fails.

    Additional context

    None

    wontfix 
    opened by adgustaf 15
  • number of residuals from auto_arima is different from what I get from R

    number of residuals from auto_arima is different from what I get from R

    I am comparing the results of auto-ARIMA between R (the forecast package) and Python (the pmdarima package). One issue I am seeing is that the lengths of the residuals in R and Python differ when d is not zero. For example, in the code shown below, the length of the residuals in R is the same as the time series, but in Python it is the length of the time series minus d. I also see significant changes in the order I get from R vs. Python. What are the causes of these differences?

    R

    set.seed(250)
    ts1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 200)
    set.seed(270)
    ts2 = arima.sim(list(order = c(1,2,0), ar = 0.7), n = 200)
    my_fit1 <- auto.arima(ts1, stepwise = FALSE)
    resid1 = as.numeric(residuals(my_fit1))
    my_fit2 <- auto.arima(ts2, stepwise = FALSE)
    resid2 = as.numeric(residuals(my_fit2))
    #to use with Python
     ts1 = data.frame(ts1 = ts1)
     write.csv(ts1, 'ts1.csv', row.names = F)
     ts2 = data.frame(ts2 = ts2)
     write.csv(ts2, 'ts2.csv', row.names = F)
    

    Python

    from pmdarima.arima import auto_arima
    import pandas as pd

    ts1 = pd.read_csv('ts1.csv')
    ts2 = pd.read_csv('ts2.csv')
    my_fit1 = auto_arima(y=ts1.ts1,
                         start_p=1,
                         start_q=1,
                         max_p=3,
                         max_q=3,
                         seasonal=False,
                         d=None,
                         error_action='ignore',
                         suppress_warnings=True,
                         stepwise=False)

    my_fit2 = auto_arima(y=ts2.ts2,
                         start_p=1,
                         start_q=1,
                         max_p=3,
                         max_q=3,
                         seasonal=False,
                         d=None,
                         error_action='ignore',
                         suppress_warnings=True,
                         stepwise=False)
    
    :grey_question: : question 
    opened by fissehab 15
  • Failed to build 1.4.0 from tarball

    Failed to build 1.4.0 from tarball

    Describe the bug

    Getting a ValueError while installing pmdarima using pip3: "Failed building wheel for pmdarima".

    To Reproduce
    Steps to reproduce the behavior: Tried installing pmdarima in termux app running on Android 9, Redmi Note 7 pro, MIUI 10.2.

    pip install pmdarima
    

    Versions

    Linux-4.14.81-perf+-aarch64-with-libc Python 3.7.5 (default, Oct 23 2019, 08:30:10) [Clang 8.0.7 (https://android.googlesource.com/toolchain/clang b55f2d4ebfd35bf6 NumPy 1.17.3 SciPy 1.3.1

    Expected behavior

    Actual behavior

    Additional context

    2019-12-03T21:46:55,026 Collecting pmdarima
    2019-12-03T21:46:55,028   Created temporary directory: /data/data/com.termux/files/usr/tmp/pip-unpack-5t6czeam
    2019-12-03T21:46:55,062   Using cached https://files.pythonhosted.org/packages/1a/4f/6851c8d37551efcb8cfe12539f42f0f1b42a2d28a7275f1e1f6bdd6956a2/pmdarima-1.4.0.tar.gz
    2019-12-03T21:46:55,208   Added pmdarima from https://files.pythonhosted.org/packages/1a/4f/6851c8d37551efcb8cfe12539f42f0f1b42a2d28a7275f1e1f6bdd6956a2/pmdarima-1.4.0.tar.gz#sha256=d91f15bc1ab700ad5587671db48e208b606a6852c71544a32af30bf4d4f69782 to build tracker '/data/data/com.termux/files/usr/tmp/pip-req-tracker-j_a7hzre'
    2019-12-03T21:46:55,208     Running setup.py (path:/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py) egg_info for package pmdarima
    2019-12-03T21:46:55,209     Running command python setup.py egg_info
    2019-12-03T21:46:56,011     Requirements: ['Cython>=0.29', 'joblib>=0.11', 'numpy>=1.16', 'pandas>=0.19', 'scikit-learn>=0.19', 'scipy>=1.3', 'six>=1.5', 'statsmodels>=0.10.0']
    2019-12-03T21:46:56,011     Adding extra setuptools args
    2019-12-03T21:46:56,012     running egg_info
    2019-12-03T21:46:56,013     creating /data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info
    2019-12-03T21:46:56,013     writing /data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info/PKG-INFO
    2019-12-03T21:46:56,014     writing dependency_links to /data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info/dependency_links.txt
    2019-12-03T21:46:56,014     writing requirements to /data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info/requires.txt
    2019-12-03T21:46:56,015     writing top-level names to /data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info/top_level.txt
    2019-12-03T21:46:56,015     writing manifest file '/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info/SOURCES.txt'
    2019-12-03T21:46:56,343     reading manifest file '/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info/SOURCES.txt'
    2019-12-03T21:46:56,343     reading manifest template 'MANIFEST.in'
    2019-12-03T21:46:56,347     Partial import of pmdarima during the build process.
    2019-12-03T21:46:56,347     /data/data/com.termux/files/usr/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'configuration'
    2019-12-03T21:46:56,347       warnings.warn(msg)
    2019-12-03T21:46:56,348     warning: no files found matching '*.pyd' under directory 'pmdarima/__check_build'
    2019-12-03T21:46:56,348     warning: no files found matching '*.so' under directory 'pmdarima/__check_build'
    2019-12-03T21:46:56,348     warning: no files found matching '*.dylib' under directory 'pmdarima/__check_build'
    2019-12-03T21:46:56,349     warning: no files found matching '*.dll' under directory 'pmdarima/__check_build'
    2019-12-03T21:46:56,352     warning: no files found matching '*.pyd' under directory 'pmdarima/arima'
    2019-12-03T21:46:56,353     warning: no files found matching '*.so' under directory 'pmdarima/arima'
    2019-12-03T21:46:56,354     warning: no files found matching '*.dylib' under directory 'pmdarima/arima'
    2019-12-03T21:46:56,355     warning: no files found matching '*.dll' under directory 'pmdarima/arima'
    2019-12-03T21:46:56,356     warning: no files found matching '*.pyd' under directory 'pmdarima/preprocessing/exog'
    2019-12-03T21:46:56,357     warning: no files found matching '*.so' under directory 'pmdarima/preprocessing/exog'
    2019-12-03T21:46:56,358     warning: no files found matching '*.dylib' under directory 'pmdarima/preprocessing/exog'
    2019-12-03T21:46:56,359     warning: no files found matching '*.dll' under directory 'pmdarima/preprocessing/exog'
    2019-12-03T21:46:56,360     warning: no files found matching '*.pyx' under directory 'pmdarima/compat'
    2019-12-03T21:46:56,361     warning: no files found matching '*.pyd' under directory 'pmdarima/compat'
    2019-12-03T21:46:56,362     warning: no files found matching '*.pyx' under directory 'pmdarima/datasets'
    2019-12-03T21:46:56,364     warning: no files found matching '*.pyd' under directory 'pmdarima/datasets'
    2019-12-03T21:46:56,366     warning: no files found matching '*.pyx' under directory 'pmdarima/utils'
    2019-12-03T21:46:56,367     warning: no files found matching '*.pyd' under directory 'pmdarima/utils'
    2019-12-03T21:46:56,369     writing manifest file '/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/pip-egg-info/pmdarima.egg-info/SOURCES.txt'
    2019-12-03T21:46:56,436   Source in /data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima has version 1.4.0, which satisfies requirement pmdarima from https://files.pythonhosted.org/packages/1a/4f/6851c8d37551efcb8cfe12539f42f0f1b42a2d28a7275f1e1f6bdd6956a2/pmdarima-1.4.0.tar.gz#sha256=d91f15bc1ab700ad5587671db48e208b606a6852c71544a32af30bf4d4f69782
    2019-12-03T21:46:56,438   Removed pmdarima from https://files.pythonhosted.org/packages/1a/4f/6851c8d37551efcb8cfe12539f42f0f1b42a2d28a7275f1e1f6bdd6956a2/pmdarima-1.4.0.tar.gz#sha256=d91f15bc1ab700ad5587671db48e208b606a6852c71544a32af30bf4d4f69782 from build tracker '/data/data/com.termux/files/usr/tmp/pip-req-tracker-j_a7hzre'
    2019-12-03T21:46:56,531 Requirement already satisfied: Cython>=0.29 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pmdarima) (0.29.14)
    2019-12-03T21:46:56,538 Requirement already satisfied: joblib>=0.11 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pmdarima) (0.14.0)
    2019-12-03T21:46:56,545 Requirement already satisfied: numpy>=1.16 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pmdarima) (1.17.3)
    2019-12-03T21:46:56,551 Requirement already satisfied: pandas>=0.19 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pmdarima) (0.25.3)
    2019-12-03T21:46:56,573 Requirement already satisfied: scikit-learn>=0.19 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pmdarima) (0.21.3)
    2019-12-03T21:46:56,590 Requirement already satisfied: scipy>=1.3 in /data/data/com.termux/files/usr/lib/python3.7/site-packages/scipy-1.3.1-py3.7-linux-aarch64.egg (from pmdarima) (1.3.1)
    2019-12-03T21:46:56,596 Requirement already satisfied: six>=1.5 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pmdarima) (1.12.0)
    2019-12-03T21:46:56,601 Requirement already satisfied: statsmodels>=0.10.0 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pmdarima) (0.10.1)
    2019-12-03T21:46:56,640 Requirement already satisfied: pytz>=2017.2 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pandas>=0.19->pmdarima) (2019.3)
    2019-12-03T21:46:56,649 Requirement already satisfied: python-dateutil>=2.6.1 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from pandas>=0.19->pmdarima) (2.8.1)
    2019-12-03T21:46:56,659 Requirement already satisfied: patsy>=0.4.0 in /data/data/com.termux/files/usr/lib/python3.7/site-packages (from statsmodels>=0.10.0->pmdarima) (0.5.1)
    2019-12-03T21:46:56,670 Building wheels for collected packages: pmdarima
    2019-12-03T21:46:56,671   Created temporary directory: /data/data/com.termux/files/usr/tmp/pip-wheel-vcnsyaf2
    2019-12-03T21:46:56,672   Destination directory: /data/data/com.termux/files/usr/tmp/pip-wheel-vcnsyaf2
    2019-12-03T21:46:56,672   Running command /data/data/com.termux/files/usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"'; __file__='"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /data/data/com.termux/files/usr/tmp/pip-wheel-vcnsyaf2 --python-tag cp37
    2019-12-03T21:46:57,170   Partial import of pmdarima during the build process.
    2019-12-03T21:46:57,170   Requirements: ['Cython>=0.29', 'joblib>=0.11', 'numpy>=1.16', 'pandas>=0.19', 'scikit-learn>=0.19', 'scipy>=1.3', 'six>=1.5', 'statsmodels>=0.10.0']
    2019-12-03T21:46:57,171   Adding extra setuptools args
    2019-12-03T21:46:58,511   Traceback (most recent call last):
    2019-12-03T21:46:58,511     File "<string>", line 1, in <module>
    2019-12-03T21:46:58,512     File "/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py", line 250, in <module>
    2019-12-03T21:46:58,512       do_setup()
    2019-12-03T21:46:58,512     File "/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py", line 246, in do_setup
    2019-12-03T21:46:58,512       setup(**metadata)
    2019-12-03T21:46:58,512     File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/core.py", line 137, in setup
    2019-12-03T21:46:58,513       config = configuration()
    2019-12-03T21:46:58,513     File "/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py", line 164, in configuration
    2019-12-03T21:46:58,513       config.add_subpackage(DISTNAME)
    2019-12-03T21:46:58,514     File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 1035, in add_subpackage
    2019-12-03T21:46:58,515       caller_level = 2)
    2019-12-03T21:46:58,515     File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 1004, in get_subpackage
    2019-12-03T21:46:58,515       caller_level = caller_level + 1)
    2019-12-03T21:46:58,515     File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
    2019-12-03T21:46:58,516       config = setup_module.configuration(*args)
    2019-12-03T21:46:58,516     File "pmdarima/setup.py", line 36, in configuration
    2019-12-03T21:46:58,516       config.add_subpackage('model_selection/tests')
    2019-12-03T21:46:58,516     File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 1035, in add_subpackage
    2019-12-03T21:46:58,517       caller_level = 2)
    2019-12-03T21:46:58,517     File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 997, in get_subpackage
    2019-12-03T21:46:58,517       caller_level = caller_level+1)
    2019-12-03T21:46:58,517     File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 779, in __init__
    2019-12-03T21:46:58,518       raise ValueError("%r is not a directory" % (package_path,))
    2019-12-03T21:46:58,518   ValueError: 'pmdarima/model_selection/tests' is not a directory
    2019-12-03T21:46:58,616   ERROR: Failed building wheel for pmdarima
    2019-12-03T21:46:58,619   Running setup.py clean for pmdarima
    2019-12-03T21:46:58,621   Running command /data/data/com.termux/files/usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"'; __file__='"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all
    2019-12-03T21:46:59,348   Partial import of pmdarima during the build process.
    2019-12-03T21:46:59,350   Requirements: ['Cython>=0.29', 'joblib>=0.11', 'numpy>=1.16', 'pandas>=0.19', 'scikit-learn>=0.19', 'scipy>=1.3', 'six>=1.5', 'statsmodels>=0.10.0']
    2019-12-03T21:46:59,397   running clean
    2019-12-03T21:46:59,403   'build/lib' does not exist -- can't clean it
    2019-12-03T21:46:59,404   'build/bdist.linux-aarch64' does not exist -- can't clean it
    2019-12-03T21:46:59,405   'build/scripts-3.7' does not exist -- can't clean it
    2019-12-03T21:46:59,406   Removing directory: __pycache__
    2019-12-03T21:46:59,408   Removing directory: __pycache__
    2019-12-03T21:46:59,409   Removing directory: __pycache__
    2019-12-03T21:46:59,411   Removing directory: __pycache__
    2019-12-03T21:46:59,449 Failed to build pmdarima
    2019-12-03T21:47:00,195 Installing collected packages: pmdarima
    2019-12-03T21:47:00,196   Created temporary directory: /data/data/com.termux/files/usr/tmp/pip-record-hqdqajee
    2019-12-03T21:47:00,197     Running command /data/data/com.termux/files/usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"'; __file__='"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /data/data/com.termux/files/usr/tmp/pip-record-hqdqajee/install-record.txt --single-version-externally-managed --compile
    2019-12-03T21:47:00,713     Partial import of pmdarima during the build process.
    2019-12-03T21:47:00,713     Requirements: ['Cython>=0.29', 'joblib>=0.11', 'numpy>=1.16', 'pandas>=0.19', 'scikit-learn>=0.19', 'scipy>=1.3', 'six>=1.5', 'statsmodels>=0.10.0']
    2019-12-03T21:47:00,713     Adding extra setuptools args
    2019-12-03T21:47:01,542     Traceback (most recent call last):
    2019-12-03T21:47:01,542       File "<string>", line 1, in <module>
    2019-12-03T21:47:01,542       File "/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py", line 250, in <module>
    2019-12-03T21:47:01,542         do_setup()
    2019-12-03T21:47:01,542       File "/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py", line 246, in do_setup
    2019-12-03T21:47:01,542         setup(**metadata)
    2019-12-03T21:47:01,542       File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/core.py", line 137, in setup
    2019-12-03T21:47:01,542         config = configuration()
    2019-12-03T21:47:01,543       File "/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py", line 164, in configuration
    2019-12-03T21:47:01,543         config.add_subpackage(DISTNAME)
    2019-12-03T21:47:01,543       File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 1035, in add_subpackage
    2019-12-03T21:47:01,543         caller_level = 2)
    2019-12-03T21:47:01,544       File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 1004, in get_subpackage
    2019-12-03T21:47:01,544         caller_level = caller_level + 1)
    2019-12-03T21:47:01,544       File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 941, in _get_configuration_from_setup_py
    2019-12-03T21:47:01,544         config = setup_module.configuration(*args)
    2019-12-03T21:47:01,545       File "pmdarima/setup.py", line 36, in configuration
    2019-12-03T21:47:01,545         config.add_subpackage('model_selection/tests')
    2019-12-03T21:47:01,545       File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 1035, in add_subpackage
    2019-12-03T21:47:01,545         caller_level = 2)
    2019-12-03T21:47:01,545       File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 997, in get_subpackage
    2019-12-03T21:47:01,546         caller_level = caller_level+1)
    2019-12-03T21:47:01,546       File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/numpy/distutils/misc_util.py", line 779, in __init__
    2019-12-03T21:47:01,546         raise ValueError("%r is not a directory" % (package_path,))
    2019-12-03T21:47:01,546     ValueError: 'pmdarima/model_selection/tests' is not a directory
    2019-12-03T21:47:01,644 Cleaning up...
    2019-12-03T21:47:01,644   Removing source in /data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima
    2019-12-03T21:47:01,653 Removed build tracker '/data/data/com.termux/files/usr/tmp/pip-req-tracker-j_a7hzre'
    2019-12-03T21:47:01,653 ERROR: Command errored out with exit status 1: /data/data/com.termux/files/usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"'; __file__='"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /data/data/com.termux/files/usr/tmp/pip-record-hqdqajee/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
    2019-12-03T21:47:01,654 Exception information:
    2019-12-03T21:47:01,654 Traceback (most recent call last):
    2019-12-03T21:47:01,654   File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 153, in _main
    2019-12-03T21:47:01,654     status = self.run(options, args)
    2019-12-03T21:47:01,654   File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 455, in run
    2019-12-03T21:47:01,654     use_user_site=options.use_user_site,
    2019-12-03T21:47:01,654   File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 62, in install_given_reqs
    2019-12-03T21:47:01,654     **kwargs
    2019-12-03T21:47:01,654   File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 888, in install
    2019-12-03T21:47:01,654     cwd=self.unpacked_source_directory,
    2019-12-03T21:47:01,654   File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/pip/_internal/utils/subprocess.py", line 275, in runner
    2019-12-03T21:47:01,654     spinner=spinner,
    2019-12-03T21:47:01,654   File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/pip/_internal/utils/subprocess.py", line 242, in call_subprocess
    2019-12-03T21:47:01,654     raise InstallationError(exc_msg)
    2019-12-03T21:47:01,654 pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /data/data/com.termux/files/usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"'; __file__='"'"'/data/data/com.termux/files/usr/tmp/pip-install-lf0h5m6i/pmdarima/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /data/data/com.termux/files/usr/tmp/pip-record-hqdqajee/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
    
    :beetle: : bug 
    opened by RSwarnkar 14
  • How can I load trained models with Python 2.7?

    How can I load trained models with Python 2.7?

    Description

    I want to use a trained model in AWS Lambda and then deploy it on AWS Sagemaker, for local use. The problem is that AWS Lambda in Sagemaker only allows Python 2.7, so the highest pmdarima version I can use for prediction is 0.9.0, but I run into issue #107 when reading the trained model. So my question is: is there anything I can do to use pmdarima 0.9.0 without hitting issue #107?

    My idea is to train a model on Sagemaker, and then deploy it on AWS Greengrass for local predictions. So I need to run the predictions on Python 2.7.

    Versions

    Windows-10-10.0.17763 ('Python', '2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]') ('pmdarima', '0.9.0') ('NumPy', '1.16.2') ('SciPy', '1.2.1') ('Scikit-Learn', '0.20.3') ('Statsmodels', '0.9.0')

    opened by venegas09 14
  • Scipy factorial ImportError on Kaggle

    Scipy factorial ImportError on Kaggle

    Kaggle doesn't have the pyramid library, so I need to install it using !pip install pyramid-arima and !pip install pyramid.

    But after that, when I want to import auto_arima: from pyramid.arima import auto_arima

    Kaggle shows me an error:

    ImportError: cannot import name 'factorial'
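
    Given the legacy note in the Availability section above (pyramid-arima only exists for versions < 1.0.0), one likely resolution is to install the renamed pmdarima package instead of pyramid; a sketch of what that looks like in a Kaggle notebook:

    !pip install pmdarima

    from pmdarima import auto_arima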

    user error dependency-issue 
    opened by storabian 13
  • predict_in_sample of auto_arima produces fitted-values fluctuating around zero

    predict_in_sample of auto_arima produces fitted-values fluctuating around zero

    Description

    predict_in_sample of auto_arima produces fitted values fluctuating around zero and does not follow the real data pattern (see the blue line in the actual results). The expected result was produced by sm.ARIMA using the same parameters as the auto-ARIMA model.

    Steps/Code to Reproduce

    model_auto = pm.auto_arima(array, start_p=0, start_q=0, max_p=10, max_q=10, max_d=3,
                               error_action="ignore", seasonal=False, D=None, trace=True,
                               stepwise=True, enforce_stationarity=False,
                               enforce_invertibility=False, maxiter=5000)
    model_auto.summary()
    model_auto.fit(array)
    preds = model_auto.predict_in_sample(array)
    plt.plot(preds, color='blue')

    from statsmodels.tsa.arima_model import ARIMA
    model = ARIMA(array, order=(1, 1, 2)).fit(disp=0)
    predict = model.predict(typ='levels')
    plt.plot(array, color='lightblue')
    plt.plot(predict, color='green')


    Expected Results

    Screen Shot 2019-05-08 at 5 19 07 PM

    Actual Results

    Screen Shot 2019-05-08 at 5 15 20 PM

    Versions

    :beetle: : bug 
    opened by JahangirVajedsamiei 13
  • AttributeError: module 'pyramid' has no attribute '__version__'

    AttributeError: module 'pyramid' has no attribute '__version__'

    Description

    I was trying to read about ARIMA models via this link

    https://medium.com/@josemarcialportilla/using-python-and-auto-arima-to-forecast-seasonal-time-series-90877adff03c

    and while training the model I got the following error

    Steps/Code to Reproduce

    Expected Results

    The resulting best model parameters should give us an AIC value of 1771.29.

    Actual Results


    AttributeError                            Traceback (most recent call last)
    in ()
          5                          error_action='ignore',
          6                          suppress_warnings=True,
    ----> 7                          stepwise=True)

    /anaconda3/lib/python3.6/site-packages/pyramid/arima/auto.py in auto_arima(y, exogenous, start_p, d, start_q, max_p, max_d, max_q, start_P, D, start_Q, max_P, max_D, max_Q, max_order, m, seasonal, stationary, information_criterion, alpha, test, seasonal_test, stepwise, n_jobs, start_params, trend, method, transparams, solver, maxiter, disp, callback, offset_test_args, seasonal_test_args, suppress_warnings, error_action, trace, random, random_state, n_fits, return_valid_fits, out_of_sample_size, scoring, scoring_args, **fit_args)
        626
        627     # fit a baseline p, d, q model and then a null model
    --> 628     stepwise_wrapper.fit_increment_k_cache_set(True)  # p, d, q, P, D, Q
        629     stepwise_wrapper.fit_increment_k_cache_set(
        630         True, p=0, q=0, P=0, Q=0)  # null model

    /anaconda3/lib/python3.6/site-packages/pyramid/arima/auto.py in fit_increment_k_cache_set(self, expr, p, q, P, Q)
        779                 out_of_sample_size=self.out_of_sample_size,
        780                 scoring=self.scoring,
    --> 781                 scoring_args=self.scoring_args)
        782
        783     # use the orders as a key to be hashed for

    /anaconda3/lib/python3.6/site-packages/pyramid/arima/auto.py in _fit_arima(x, xreg, order, seasonal_order, start_params, trend, method, transparams, solver, maxiter, disp, callback, fit_params, suppress_warnings, trace, error_action, out_of_sample_size, scoring, scoring_args)
        846                 out_of_sample_size=out_of_sample_size, scoring=scoring,
        847                 scoring_args=scoring_args)
    --> 848                 .fit(x, exogenous=xreg, **fit_params)
        849
        850     # for non-stationarity errors or singular matrices, return None

    /anaconda3/lib/python3.6/site-packages/pyramid/arima/arima.py in fit(self, y, exogenous, **fit_args)
        375         # As of version 0.7.2, start saving the version with the model so
        376         # we can track changes over time.
    --> 377         self.pkg_version_ = pyramid.__version__
        378         return self
        379

    AttributeError: module 'pyramid' has no attribute '__version__'

    Versions

    1 import pyramid; print("Pyramid", pyramid.__version__)

    AttributeError: module 'pyramid' has no attribute '__version__'


    :beetle: : bug namespace collision 
    opened by nikhilsharma010 12
  • How to make `predict_in_sample()`  derived from the first `d` term of original data when differencing order (`d`) isn't zero

    How to make `predict_in_sample()` derived from the first `d` term of original data when differencing order (`d`) isn't zero

    Describe the question you have

    I think the usage of predict_in_sample() in pmdarima (Python pkg) is the same as fitted() in forecast (R pkg).

    But I find that the output of predict_in_sample() in pmdarima (Python pkg) is different from the output of fitted() in forecast (R pkg) when the differencing order isn't zero.

    I use the following Python code to generate the output of predict_in_sample() for three different differencing orders (d):

    import numpy as np
    from pmdarima.arima import ARIMA, auto_arima
    
    for diff_ord in range(1,4):
        model = ARIMA(order=(2,diff_ord,1), out_of_sample_size=0, mle_regression=True, suppress_warnings=True)
    
        ori_time_series = np.array([0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
                                    0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
                                    -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154])
    
        model = model.fit(ori_time_series)
        pred_in_sample = np.array(model.predict_in_sample())
        print(f"pred_in_sample: {list(pred_in_sample)}")
    

    I copy the output of the above Python code and paste it into an R script to compare predict_in_sample() in pmdarima with fitted() in forecast (R pkg). R script:

    par(mfrow=c(3,2))
    
    resid_py_arr = rbind(c(0.5074603452679165, -0.34007254404457005, 0.5791635465131197, -0.2603422455105438, -0.11715725264909946, -0.9906800274596668, 0.21001659120388472, 0.3475615524863871, 0.7611416545018854, -0.8381194770281715, 0.2623376398458589, -0.23206647709890738, 0.4220053862092831, 0.06219916252952257, -0.4222181747488597, 0.038538708232362495, -0.08313995790478867, -0.26009526647620496, -0.48253573807639416, 0.5137650759426555, 0.34360142378578173),
                         c(0.49261470998182477, -0.6081910499091976, 1.0569346650372444, -0.6076380990855064, -0.2378976168316683, -0.8352597431282236, 1.0357942689078028, 0.631678563908118, 0.2767179073401864, -1.6808983058879592, 0.5653496038522947, 0.09552482041758414, 0.3659123662710821, -0.3491598976266268, -0.6745863302200333, 0.27064401578860775, 0.05909012652824919, -0.3734973292295873, -0.4340483262207756, 0.9099684919779161, 0.09388976669825833),
                         c(0.4940135127419584, -0.8586664744669461, 1.3118624972776918, -2.3436128660405227, -0.11312237780455078, -0.2567555216685367, 1.8351322442185336, 0.16659999993838948, -0.7634839186197411, -2.006759899752263, 1.8390440448281202, 0.3681746393692201, -0.30073609275686164, -0.6107051995729558, -0.5143875708651311, 0.8646807292504737, 0.03706843182642394, -0.7386026479980223, -0.1930232885395262, 1.3558665622526413, -0.579683995736004))
    
    fitted_val_py_arr = rbind(c(-0.007880175267916512, 0.49169989404457004, 0.28841210348688023, 0.5696976455105438, 0.32260810264909945, 0.5077959474596667, 0.47801250879611523, 0.4985613475136129, 0.07088064549811457, 0.1008287770281715, 0.3425456601541411, 0.6408112270989074, 0.15507516379071695, 0.21370781747047746, 0.21008435474885973, 0.3850643917676375, 0.4155697579047887, 0.18344750047620495, 0.27881111807639414, 0.4178551640573445, 0.23041397621421822),
                              c(0.006965460018175224, 0.7598183999091976, -0.18935901503724428, 0.9169934990855064, 0.4433484668316683, 0.35237566312822355, -0.34776516890780296, 0.21444433609188196, 0.5553043926598136, 0.9436076058879592, 0.0395336961477053, 0.31321992958241585, 0.21116818372891794, 0.6250668776266268, 0.4624525102200333, 0.15295908421139226, 0.2733396734717508, 0.2968495632295873, 0.23032370622077558, 0.02165174802208386, 0.48012563330174163),
                              c(0.005566657258041537, 1.0102938244669462, -0.44428684727769174, 2.6529682660405225, 0.3185732278045508, -0.22612855833146328, -1.1471031442185335, 0.6795229000616105, 1.595506218619741, 1.2694691997522631, -1.2341607448281202, 0.040570110630779865, 0.8778166427568617, 0.8866121795729558, 0.30225375086513107, -0.44107762925047367, 0.29536136817357606, 0.6619548819980223, -0.010701331460473806, -0.4242463222526414, 1.153699395736004))
    
    y<-c(0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
         0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
         -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154)  # The statistical part of the question is understanding that the in-sample one-step-ahead forecasts of an ARIMA model are actually the fitted values of that model. In R, the method fitted applied on model output object normally returns the fitted values of the model. However, the method is not applicable to the output of function arima. There is a workaround: fitted values equal original values minus residuals. Residuals can be extracted from a fitted object using the method residuals (and that applies to the output of function arima).
    
    for (dif_ord in seq(1:3)) {
      #  Better still, use the forecast package which does have a fitted method for outputs from Arima and auto.arima. – Rob Hyndman Feb 26, 2016 at 9:49
      #install.packages('forecast')
      library(forecast)
      fit.model.2 <- Arima(y, order = c(2, dif_ord, 1))
      
      resid_r_forecast_arima <- residuals(fit.model.2)
      resid_py <- resid_py_arr[dif_ord,]
    
      plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
      axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA)  # Add horizontal grid 
      axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA)  # Add vertical grid
      lines(resid_r_forecast_arima, col=2, lty=2)
      lines(resid_py, col=3, lty=3)
      legend(13, -1, c("origin", "resid_r", "resid_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=0.9)
      mtext(paste("Check residual series trend for diff order:", dif_ord))
      
      
      
      fitted_val_r_forecast_arima <- fitted(fit.model.2)
      fitted_val_py <- fitted_val_py_arr[dif_ord,]
      plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
      axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA)  # Add horizontal grid 
      axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA)  # Add vertical grid
      lines(fitted_val_r_forecast_arima, col=2, lty=2)
      lines(fitted_val_py, col=3, lty=3)
      legend(13, -1, c("origin", "fitted_r", "fitted_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=1)
      mtext(paste("Check fitted series trend for diff order:", dif_ord))
    }
    

    output image of R script: image

    • I observe that the first d (difference order) terms differ hugely, and then the two series get close after the first d terms.
    • And I think the output of forecast (R package) is correct, based on:
      • the response in https://github.com/alkaline-ml/pmdarima/issues/140#issuecomment-926592471

        This produces some junky values for the first d indices when predicting in-sample.

      • and from observing the first d terms of the original time series (black line) and the output of fitted() (red line)

    My questions are:

    • Do I have any misunderstanding in the above description? If not,
    • How should I adjust the parameters of predict_in_sample() in pmdarima to get the same output as fitted() in forecast?
      • I have tried the parameter start, but it only shortens the length of the output of predict_in_sample().

    Versions (if necessary)

    No response

    :grey_question: : question 
    opened by theabc50111 0
  • fit(), update(), and predict() still take exog as argument

    fit(), update(), and predict() still take exog as argument

    Describe the bug

    I read that as of 2.0, exog was replaced with X as the argument name when fitting, updating, or predicting. But because of **fit_args these functions still accept exog as an argument without it having any impact. This can be very confusing and a dangerous source of mistakes for users who upgraded from a pre-2.0 version to 2.0 or later, because it produces wrong or unexpected predictions.

    To Reproduce

    General behavior, independent of specific configurations or code.
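
    A small sketch contrasting the intended 2.x call style with the reported pitfall (hypothetical random data; the behavior of the ignored keyword is as described in this report):

    import numpy as np
    import pmdarima as pm

    rng = np.random.RandomState(42)
    y = rng.rand(100)
    X = rng.rand(100, 2)

    model = pm.ARIMA(order=(1, 0, 0), suppress_warnings=True)

    # Intended 2.x usage: exogenous features are passed via the `X` keyword
    model.fit(y, X=X)
    preds = model.predict(n_periods=10, X=rng.rand(10, 2))

    # Per the report above, calling fit/update/predict with `exog=...` instead is
    # absorbed by **fit_args and silently ignored rather than raising an error.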

    Versions

    >>> import pmdarima; pmdarima.show_versions()
    
    System:
        python: 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0]
    executable: /usr/bin/python3
       machine: Linux-5.15.0-53-generic-x86_64-with-glibc2.35
    
    Python dependencies:
            pip: 22.3.1
     setuptools: 59.6.0
        sklearn: 1.1.2
    statsmodels: 0.13.2
          numpy: 1.23.3
          scipy: 1.8.1
         Cython: 0.29.32
         pandas: 1.5.0
         joblib: 1.2.0
       pmdarima: 2.0.1
    >>> 
    >>> # For pmdarima versions <1.5.2 use this:
    >>> import platform; print(platform.platform())
    Linux-5.15.0-53-generic-x86_64-with-glibc2.35
    >>> import sys; print("Python", sys.version)
    Python 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0]
    >>> import pmdarima; print("pmdarima", pmdarima.__version__)
    pmdarima 2.0.1
    >>> import numpy; print("NumPy", numpy.__version__)
    NumPy 1.23.3
    >>> import scipy; print("SciPy", scipy.__version__)
    SciPy 1.8.1
    >>> import sklearn; print("Scikit-Learn", sklearn.__version__)
    Scikit-Learn 1.1.2
    >>> import statsmodels; print("Statsmodels", statsmodels.__version__)
    Statsmodels 0.13.2
    

    Expected Behavior

    It should throw an error when still using exog.

    Actual Behavior

    It processes exogenous data passed to these functions as exog, but this exogenous data has no impact on the model.

    Additional Context

    No response

    :beetle: : bug 
    opened by Zepp3 0
  • [MRG] Update CI/CD dependencies and bump citation version

    [MRG] Update CI/CD dependencies and bump citation version

    Description

    This PR:

    • Fixes deprecations in our GitHub Actions workflows by changing echo "::set-output name=foo::bar" to echo "foo=bar" >> $GITHUB_OUTPUT
    • Bumps our setup-python action to v4
    • Bumps our setup-qemu-action to v2
    • Changes actions/checkout to use a pinned version (v3) instead of master
    • Updates our badges and test_tagging workflow to use python 3.11
      • As an aside, do we still need the test_tagging workflow?
    • Bumps cibuildwheel to 2.11.2
    • Updates our CITATION.cff to the latest tag
      • I think I am going to write a workflow that does this on a release. Seems like a cleaner option

    Type of change

    • [X] Documentation change
    • [X] CI/CD Change

    How Has This Been Tested?

    • [X] Tests still pass on GHA

    Checklist:

    • [X] I have performed a self-review of my own code
    • [X] I have commented my code, particularly in hard-to-understand areas
    • [X] I have made corresponding changes to the documentation
    • [X] My changes generate no new warnings
    • [X] I have added tests that prove my fix is effective or that my feature works
    • [X] New and existing unit tests pass locally with my changes
    doc cicd 
    opened by aaronreidsmith 0
  • Batch functionality

    Batch functionality

    Is your feature request related to a problem? Please describe.

    I would like to call ARIMA on batches of data, i.e. data with the shape [batchsize, timesteps], and thereby generate output that has the shape [batchsize, output_size].

    Describe the solution you'd like

    The auto_arima.predict() function would be cooler if it also took batched inputs.

    Describe alternatives you've considered

    Currently I am doing it with a very ugly for-loop:

    import numpy as np
    import pmdarima as pm

    output_list = []
    for i in range(batchsize):
        model = pm.auto_arima(data[i, :])
        output = model.predict(output_size)
        output_list.append(output)
    output = np.array(output_list)
    

    However, this is not very Pythonic and quite slow; maybe I have just missed a batch functionality?
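
    One possible interim approach (a sketch, not a built-in pmdarima feature; data and output_size as above): fit the per-series models in parallel with joblib, which pmdarima already depends on.

    import numpy as np
    from joblib import Parallel, delayed
    import pmdarima as pm

    def fit_and_forecast(series, n_periods):
        # Fit one auto-ARIMA per row and forecast n_periods steps ahead
        model = pm.auto_arima(series, suppress_warnings=True, error_action='ignore')
        return model.predict(n_periods)

    # data has shape [batchsize, timesteps]; output has shape [batchsize, output_size]
    output = np.array(
        Parallel(n_jobs=-1)(
            delayed(fit_and_forecast)(row, output_size) for row in data
        )
    )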

    Additional Context

    I hope I haven't missed that this feature already exists. Thanks a lot! :)

    feature request 
    opened by Jostarndt 0
  • Support `statsmodels.tsa.arima.model.ARIMA`

    Support `statsmodels.tsa.arima.model.ARIMA`

    Describe the question you have

    Hi, may I ask whether there is any reason that statsmodels.tsa.arima.model.ARIMA is not supported in pmdarima?

    This post from statsmodels mentioned that:

    Finally, there is a new model class ARIMA, which is meant to (eventually) be a single point of entry for all ARIMA-type models (including SARIMAX models). It is a subclass of statespace.SARIMAX but allows fitting using any of the estimators above.

    And in my opinion ARIMA could be more performant than SARIMAX, because the different methods (estimators) provided by ARIMA.fit() could potentially speed up the fitting process (see the inline comment in statsmodels.tsa.arima.model.ARIMA).

    Reference of ARIMA's fit method: https://www.statsmodels.org/devel/generated/statsmodels.tsa.arima.model.ARIMA.fit.html#statsmodels.tsa.arima.model.ARIMA.fit

    So would you consider adding / replacing the current implementation with statsmodels.tsa.arima.model.ARIMA?

    Please let me know if you have a different opinion.

    Thanks!

    Versions (if necessary)

    No response

    :grey_question: : question 
    opened by jasmineliaw 0
  • Huge difference between auto_arima with and without seasonality

    Huge difference between auto_arima with and without seasonality

    Describe the question you have

    Hi, I am forecasting several time series with auto_arima, once without seasonality and once with seasonality. This is part of an automated algorithm which compares both forecasts and then decides which is the better forecast. The results for one time series are as follows:

     ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=1063.124, Time=0.00 sec
     ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=1065.064, Time=0.03 sec
     ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=1065.069, Time=0.06 sec
     ARIMA(0,0,0)(0,0,0)[0]             : AIC=1064.208, Time=0.00 sec
     ARIMA(1,0,1)(0,0,0)[0] intercept   : AIC=inf, Time=0.09 sec
    
    Best model:  ARIMA(0,0,0)(0,0,0)[0] intercept
    Total fit time: 0.235 seconds
    Performing stepwise search to minimize aic
     ARIMA(0,1,0)(0,1,0)[12]             : AIC=895.902, Time=0.03 sec
     ARIMA(1,1,0)(1,1,0)[12]             : AIC=861.360, Time=0.50 sec
     ARIMA(0,1,1)(0,1,1)[12]             : AIC=inf, Time=0.56 sec
     ARIMA(1,1,0)(0,1,0)[12]             : AIC=879.416, Time=0.02 sec
     ARIMA(1,1,0)(2,1,0)[12]             : AIC=859.139, Time=4.96 sec
     ARIMA(1,1,0)(3,1,0)[12]             : AIC=857.942, Time=10.18 sec
     ARIMA(1,1,0)(3,1,1)[12]             : AIC=inf, Time=5.66 sec
     ARIMA(1,1,0)(2,1,1)[12]             : AIC=inf, Time=5.40 sec
     ARIMA(0,1,0)(3,1,0)[12]             : AIC=inf, Time=5.11 sec
     ARIMA(2,1,0)(3,1,0)[12]             : AIC=858.513, Time=3.30 sec
     ARIMA(1,1,1)(3,1,0)[12]             : AIC=inf, Time=4.75 sec
     ARIMA(0,1,1)(3,1,0)[12]             : AIC=inf, Time=3.49 sec
     ARIMA(2,1,1)(3,1,0)[12]             : AIC=inf, Time=17.40 sec
     ARIMA(1,1,0)(3,1,0)[12] intercept   : AIC=859.944, Time=8.52 sec
    
    Best model:  ARIMA(1,1,0)(3,1,0)[12]
    Total fit time: 739.036 seconds
    
    What I am wondering is why the difference in total fit time between the two is so large; the total fit time of auto_arima with seasonality is more than 700 seconds. This happens regularly for other time series as well.
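
    If the concern is runtime rather than the model ranking itself, one option (a rough sketch; the limits and dataset are illustrative) is to cap the stepwise search with pmdarima's StepwiseContext:

    import pmdarima as pm
    from pmdarima.arima import StepwiseContext

    y = pm.datasets.load_wineind()  # stand-in for one of the series above

    # Stop the seasonal search after ~60 seconds or 15 candidate fits,
    # whichever comes first
    with StepwiseContext(max_dur=60, max_steps=15):
        model = pm.auto_arima(y, seasonal=True, m=12, stepwise=True,
                              suppress_warnings=True, trace=True)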
    
    
    Versions (if necessary)

    No response

    :grey_question: : question 
    opened by tobiasderoos 1
Releases(v2.0.2)
  • v2.0.2(Nov 28, 2022)

  • v2.0.2rc1(Nov 28, 2022)

  • v2.0.1(Aug 23, 2022)

  • v2.0.1rc1(Aug 23, 2022)

  • v2.0.0(Aug 20, 2022)

    Potentially breaking changes:

    • Use of the exogenous keyword (deprecated in 1.8.0) will now raise a TypeError (see the migration sketch after this list)
    • Use of the sarimax_kwargs keyword (deprecated in 1.5.1) will now raise a TypeError
    • A falsey value for ARIMA's method argument (deprecated pre-1.5.0) will now raise a ValueError
    • A falsey value for ARIMA's maxiter argument will now raise a ValueError (warning since 1.5.0)
    • pmdarima is no longer built for 32-bit architectures
    • macOS images are built using macOS 11 instead of macOS 10.15
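
    A minimal migration sketch for the first item above (the variable names and exogenous features are illustrative):

    import numpy as np
    import pmdarima as pm

    y = pm.datasets.load_wineind()
    X = np.random.RandomState(0).normal(size=(y.shape[0], 2))   # dummy exogenous features
    X_future = np.random.RandomState(1).normal(size=(12, 2))

    # Pre-2.0 code passed `exogenous=X`; that keyword now raises a TypeError
    model = pm.auto_arima(y, X=X, seasonal=True, m=12, suppress_warnings=True)
    forecasts = model.predict(n_periods=12, X=X_future)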

    Other changes:

    • Bump numpy dependency to >= 1.21
    • Expose fittedvalues in the public API. See https://github.com/alkaline-ml/pmdarima/issues/493
    • Add support for ARM64 architecture. See https://github.com/alkaline-ml/pmdarima/issues/434
    • Introduce a new arg, preserve_series, to pmdarima.utils.check_endog that will either preserve or squeeze a Pandas Series object, so index information can be retained (see the sketch after this list)
    • Update Cython pinned version to include !=0.29.31
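
    A small usage sketch of the new preserve_series flag (the index is illustrative):

    import pandas as pd
    from pmdarima.utils import check_endog

    y = pd.Series([1.0, 2.0, 3.0],
                  index=pd.date_range("2022-01-01", periods=3, freq="D"))

    squeezed = check_endog(y)                         # numpy array; the index is dropped
    as_series = check_endog(y, preserve_series=True)  # pandas Series; the index is retained
    print(type(squeezed), type(as_series))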
  • v1.8.5(Feb 22, 2022)

  • v1.8.4(Nov 5, 2021)

  • v1.8.3(Sep 24, 2021)

    Version 1.8.3

    • Fix a bug in tsdisplay where a value of lag_max larger than the length of the series would create a cryptic numpy broadcasting error. This precondition will still cause an error, but now it is one the user can better understand. See #440

    • Change numpy pin to numpy>=1.19.3 (and build on lowest supported version) to no longer limit users' NumPy versions. This addresses #449

    • Fix a bug where scikit-learn version 1.0.0 was raising ValueError when calling if_delegate_has_method, addressing #454

  • v1.8.3rc0(Jul 22, 2021)

  • v1.8.2(Apr 19, 2021)

  • v1.8.2rc3(Apr 19, 2021)

  • v1.8.2rc1(Apr 19, 2021)

  • v1.8.2rc2(Apr 19, 2021)

  • v1.8.1(Apr 18, 2021)

    Version 1.8.1

    • Address issue 370 where iterables were not accepted in the ARIMA order.

    • Address issue 407 where the LogEndogTransformer could not be cloned in a pipeline.

    • No longer pin Cython to <0.29.18

    • Add support for Python 3.9

  • v1.8.0(Dec 2, 2020)

    Version 1.8.0

    • Wheels are no longer built for pmdarima on Python <3.6, and backward-compatibility is no longer guaranteed for older python versions.

    • The exogenous argument has been deprecated in favor of X - See the RFC and the PR for more information. Beginning in version 2.0, the exogenous argument will raise an error.

    • Migrate random searches into the auto-solvers interface

    • Random searches now perform unit root tests to prevent models with near non-invertible parameters

    • The default value of suppress_warnings has changed to True. The primary reason for this is that most warnings emitted come from unit root tests, which are very noisy. DeprecationWarnings and other warnings generated from user input will still be emitted.

    • Move ModelFitWarning from pmdarima.arima.warnings to pmdarima.warnings

    • Fix a bug where the pmdarima.model_selection.RollingForecastCV could produce too few splits for the given input data.

    • Change pin for setuptools from <50.0.0 to !=50.0.0, addressing #401

    • Change pin for statsmodels from <0.12.0 to !=0.12.0, addressing #376

  • v1.7.1(Sep 2, 2020)

    Version 1.7.1

    • Pins statsmodels to <0.12 to work around an issue with single-step forecasts when an exog array is supplied
    • Fixes new issues introduced by latest setuptools
    • Deprecate Python 3.5 support, which will be removed in the next release cycle
  • v1.7.0(Aug 4, 2020)

    v1.7.0

    • Address issue #341, where a combination of large m and D values could difference an array down to a size too small to test for stationarity with the ADF test

    • Fix issue #351 where a large value of m could prevent the seasonality test from completing.

    • Fix issue #354 where models with near non-invertible roots could still be considered as candidate best-fits.

    • Remove legacy pickling behavior that separates the statsmodels object from the pmdarima object. This breaks backwards compatibility with versions pre-1.2.0.

    • Change default with_intercept in pmdarima.arima.auto_arima to 'auto' rather than True. This will behave much like the current behavior, where a truthiness check will still return True, but allows the stepwise search to selectively change it to False in the presence of certain differencing conditions.

    • Inverse endog transformation is now supported when return_conf_int=True on pipeline predictions

    • Fix a bug where the pmdarima.model_selection.SlidingWindowForecastCV could produce too few splits for the given input data.

    • Permit custom scoring metrics to be passed for out-of-sample scoring, as requested in #368

  • v1.6.1(May 19, 2020)

  • v1.6.0(May 1, 2020)

    • Support newest versions of matplotlib
    • Add new level of auto_arima error actions: "trace" which will warn for errors while dumping the original stacktrace.
    • New featurizer: pmdarima.preprocessing.DateFeaturizer. This can be used to create dummy and ordinal exogenous features and is useful when modeling pseudo-seasonal trends or time series with holes in them.
    • Removes first-party conda distributions (see #326)
    • Raise a ValueError in arima.predict_in_sample when start < d
  • v1.6.0rc1(Apr 30, 2020)

  • v1.5.3(Feb 14, 2020)

    Version 1.5.3

    • Adds first-party conda distributions as requested in #173
      • Due to dependency limitations, we only support 64-bit architectures and Python 3.6 or 3.7
    • Adds Python 3.8 support as requested in #199
    • Added pmdarima.datasets.load_gasoline
    • Added integer levels of verbosity in the trace argument
    • Added support for statsmodels 0.11+
    • Added pmdarima.model_selection.cross_val_predict, as requested in #291
  • v1.5.2(Dec 17, 2019)

    Version 1.5.2

    • Added pmdarima.show_versions as a utility for issue filing
    • Fixed deprecation for check_is_fitted in newer versions of scikit-learn
    • Adds the pmdarima.datasets.load_sunspots() method with R’s sunspots dataset
    • Adds the pmdarima.model_selection.train_test_split() method
    • Fix bug where 1.5.1 documentation was labeled version “0.0.0”.
    • Fix bug reported in #271, where the use of threading.local to store stepwise context information may have broken job schedulers.
    • Fix bug reported in #272, where the new default value of max_order can cause a ValueError even in default cases when stepwise=False.
  • v1.5.1(Dec 6, 2019)

  • v1.5.0(Dec 6, 2019)

    • No longer use statsmodels' ARIMA or ARMA class under the hood; only use the SARIMAX model, which cuts back on a lot of errors/warnings we saw in the past. (#211)

    • Defaults in the ARIMA class that have changed as a result of #211:

      • maxiter is now 50 (was None)
      • method is now 'lbfgs' (was None)
      • seasonal_order is now (0, 0, 0, 0) (was None)
      • max_order is now 5 (was 10) and is no longer used as a constraint when stepwise=True
    • Correct bug where aicc always added 1 (for constant) to degrees of freedom, even when df_model accounted for the constant term.

    • New pmdarima.arima.auto.StepwiseContext feature for more control over fit duration (introduced by @kpsunkara in #221).

    • Adds the pmdarima.preprocessing.LogEndogTransformer class as discussed in #205

    • Exogenous arrays are no longer cast to numpy array by default, and will pass pandas frames through to the model. This keeps variable names intact in the summary #222

    • Added the prefix param to exogenous featurizers to allow the addition of meaningful names to engineered features.

    • Added polyroot test of near non-invertibility when stepwise=True. Models that are near non-invertible will be deprioritized in model selection, as requested in #208

    • Removes pmdarima.arima.ARIMA.add_new_samples, which was previously deprecated. Use pmdarima.arima.ARIMA.update instead.

    • The following args have been deprecated from the pmdarima.arima.ARIMA class as well as pmdarima.arima.auto_arima and any other calling methods/classes:

      • disp[1]
      • callback[1]
      • transparams
      • solver
      • typ

      [1] These can still be passed to the fit method via **fit_kwargs, but should no longer be passed to the model constructor.

    • Added diff_inv function that is in parity with R's implementation, as requested in #180

    • Added decompose function that is in parity with R's implementation, as requested in #190

  • v1.4.0(Nov 6, 2019)

    • Fixes #191, an issue where the OCSB test could raise "ValueError: negative dimensions are not allowed"

    • Add option to automatically inverse-transform endogenous transformations when predicting from pipelines (#197)

    • Add predict_in_sample to pipeline (#196)

    • Parameterize dtype option in datasets module

    • Adds the model_selection submodule, which defines several different cross-validation classes as well as CV functions (see the usage sketch after this list):

      • pmdarima.model_selection.RollingForecastCV
      • pmdarima.model_selection.SlidingWindowForecastCV
      • pmdarima.model_selection.cross_validate
      • pmdarima.model_selection.cross_val_score
    • Adds the pmdarima.datasets.load_taylor dataset
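
    A brief usage sketch of the cross-validation utilities listed above (written against the present-day API; the model order and window sizes are illustrative):

    import pmdarima as pm
    from pmdarima import model_selection

    y = pm.datasets.load_wineind()
    est = pm.ARIMA(order=(1, 1, 1), seasonal_order=(0, 1, 1, 12),
                   suppress_warnings=True)

    # Roll a 12-step forecast window forward through the series
    cv = model_selection.RollingForecastCV(initial=120, step=12, h=12)
    scores = model_selection.cross_val_score(est, y, cv=cv, scoring='smape')
    print(scores.mean())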

  • v1.3.0(Sep 7, 2019)

    v1.3.0

    • Adds a new dataset for stock prediction, along with an associated example (load_msft)
    • Fixes a bug in predict_in_sample, as addressed in #140.
    • Numpy 1.16+ is now required
    • Statsmodels 0.10.0+ is now required
    • Added sarimax_kwargs to ARIMA constructor and auto_arima function. This fixes #146
  • v1.2.1(Jun 12, 2019)

    This is a patch release specifically to get around the statsmodels issue:

    https://github.com/statsmodels/statsmodels/issues/5747

    This pins scipy at 1.2 until statsmodels releases 0.10.0 (at some point in June 2019). Additionally, deprecation warnings arising from the scikit-learn dependency are fixed.

  • v1.2.0(Apr 27, 2019)

    v1.2.0

    • Adds the OCSBTest of seasonality, as discussed in #88
    • Default value of seasonal_test changes from "ch" to "ocsb" in auto_arima
    • Default value of test changes from "ch" to "ocsb" in nsdiffs
    • Adds benchmarking notebook and capabilities in pytest plugins
    • Removes the following environment variables, which are now deprecated:
      • PMDARIMA_CACHE and PYRAMID_ARIMA_CACHE
      • PMDARIMA_CACHE_WARN_SIZE and PYRAMID_ARIMA_CACHE_WARN_SIZE
      • PYRAMID_MPL_DEBUG
      • PYRAMID_MPL_BACKEND
    • Deprecates the is_stationary method in tests of stationarity. This will be removed in v1.4.0. Use should_diff instead.
    • Adds two new datasets: airpassengers & austres
    • When using out_of_sample_size, the out-of-sample predictions are now stored under the oob_preds_ attribute.
    • Adds a number of transformer classes including:
      • BoxCoxEndogTransformer
      • FourierFeaturizer
    • Adds a Pipeline class resembling that of scikit-learn's, which allows the stacking of transformers together.
    • Adds a class wrapper for auto_arima: AutoARIMA. This allows auto-ARIMA to be used with pipelines.
  • v1.1.1(Mar 26, 2019)

    v1.1.1 is a patch release in response to #104

    • Deprecates the ARIMA.add_new_observations method. This method was originally designed to support updating the endogenous/exogenous arrays with new observations without changing the model parameters, but achieving this behavior for each of statsmodels' ARMA, ARIMA and SARIMAX classes proved nearly impossible, given the extremely complex internals of statsmodels estimators.

    • Replace ARIMA.add_new_observations with ARIMA.update. This allows the user to update the model with new observations by taking maxiter new steps from the existing model coefficients and allowing the MLE to converge to an updated set of model parameters (see the sketch after this list).

    • Change default maxiter to None, using 50 for seasonal models and 500 for non-seasonal models (as statsmodels does). The default value used to be 50 for all models.

    • New behavior in ARIMA.fit allows start_params and maxiter to be passed as **fit_args, overriding the use of their corresponding instance attributes.
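
    A brief sketch of the resulting update workflow (written with the present-day keyword names; the split is illustrative):

    import pmdarima as pm
    from pmdarima.model_selection import train_test_split

    y = pm.datasets.load_wineind()
    train, new_obs = train_test_split(y, train_size=150)

    model = pm.auto_arima(train, seasonal=True, m=12, suppress_warnings=True)
    model.update(new_obs)  # take additional MLE steps from the current coefficients
    print(model.predict(n_periods=12))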

  • v1.1.0(Dec 27, 2018)

    Release 1.1.0 adds:

    • ARIMA.plot_diagnostics method, as requested in #49
    • Adds new arg to ARIMA constructor and auto_arima: with_intercept (default is True).
    • New default for trend is no longer 'c', it is None.
    • Added to_dict method to ARIMA class to address #54
    • The 'PMDARIMA_CACHE' and 'PMDARIMA_CACHE_WARN_SIZE' environment variables are now deprecated, since they no longer need to be used. They will be removed in v1.2.0
    • Added versioned documentation. All releases' doc (from 0.9.0 onward) is now available at alkaline-ml.com/pmdarima/<version>
    • Python 3.7 support(!!)