
Overview


skforecast


Time series forecasting with scikit-learn regressors.

Skforecast is a Python library that makes it easy to use scikit-learn regressors as multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (pipelines, CatBoost, LightGBM, XGBoost, Ranger, ...).

Documentation: https://joaquinamatrodrigo.github.io/skforecast/

Version 0.4 has undergone a major code refactoring. The main changes affect input-output formats (only pandas Series and DataFrames are allowed, although numpy arrays are used internally for performance) and model validation methods (unified into backtesting with and without refit). See the Changelog for details.


Installation

pip install skforecast

Specific version:

pip install skforecast==0.4.1

Latest (unstable):

pip install git+https://github.com/JoaquinAmatRodrigo/skforecast#master

The most common error when importing the library is:

cannot import name 'mean_absolute_percentage_error' from 'sklearn.metrics'

This occurs when the installed scikit-learn version is lower than 0.24. Try upgrading scikit-learn with:

pip3 install -U scikit-learn

There is a known problem when installing statsmodels 0.13 in Google Colab. To avoid this dependency issue, install statsmodels 0.12.2 before installing skforecast:

pip install statsmodels==0.12.2
pip install skforecast

Dependencies

  • numpy>=1.20.1
  • pandas>=1.2.2
  • tqdm>=4.57.0
  • scikit-learn>=1.0.1
  • statsmodels>=0.12.2

Features

  • Create recursive autoregressive forecasters from any scikit-learn regressor
  • Create multi-output autoregressive forecasters from any scikit-learn regressor
  • Grid search to find optimal hyperparameters
  • Grid search to find optimal lags (predictors)
  • Include exogenous variables as predictors
  • Include custom predictors (rolling mean, rolling variance ...)
  • Backtesting
  • Prediction intervals estimated by bootstrapping
  • Get predictor importance

Introduction

A time series is a sequence of data points arranged chronologically and, in principle, equally spaced in time. Time series forecasting is the use of a model to predict future values based on previously observed values, optionally including other external variables.

When working with time series, it is seldom necessary to predict only the next element in the series (t+1). Instead, the most common goal is to predict a whole future interval (t+1, ..., t+n) or a point far ahead in time (t+n). Several strategies make it possible to generate this type of multi-step prediction.

Recursive multi-step forecasting

Since the value of t(n-1) is required to predict t(n), and t(n-1) is unknown, predictions must be made recursively, each new prediction relying on the previous one. This process is known as recursive forecasting or recursive multi-step forecasting.
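
The idea can be sketched in a few lines of plain numpy and scikit-learn. This is not the skforecast API, just an illustrative sketch; model, y_train and n_lags are assumed names (a fitted scikit-learn regressor trained on lag features, the training series and the number of lags).

# Conceptual sketch of recursive multi-step forecasting (illustrative only)
# ==============================================================================
import numpy as np

def recursive_forecast(model, y_train, n_lags, steps):
    window = list(y_train[-n_lags:])   # last observed values
    predictions = []
    for _ in range(steps):
        # Most recent value first: lag_1, lag_2, ..., lag_n
        X = np.array(window[-n_lags:])[::-1].reshape(1, -1)
        y_hat = model.predict(X)[0]    # one-step-ahead prediction
        predictions.append(y_hat)
        window.append(y_hat)           # feed the prediction back as a lag
    return np.array(predictions)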

Figure: Recursive multi-step forecasting.


The main challenge when using scikit-learn models for recursive multi-step forecasting is transforming the time series into a matrix in which each value of the series is related to the time window (lags) that precedes it. This forecasting strategy can be easily implemented with the ForecasterAutoreg and ForecasterAutoregCustom classes.

Figure: Time series transformation into a matrix of 5 lags and a vector with the value of the series that follows each row of the matrix.

Figure: Time series transformation including an exogenous variable.
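
The lag-matrix transformation itself is simple. A minimal numpy sketch of what skforecast performs internally (toy series, 5 lags; illustrative only):

# Sketch of the lag-matrix transformation (illustrative only)
# ==============================================================================
import numpy as np

y = np.arange(10, dtype=float)   # toy series
n_lags = 5

# Column k holds lag_(k+1); each row is the window that precedes the target.
X = np.column_stack([y[n_lags - k - 1 : len(y) - k - 1] for k in range(n_lags)])
y_target = y[n_lags:]            # value that follows each window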



Direct multi-step forecasting

This strategy consists of training a different model for each step. For example, to predict the next 5 values of a time series, 5 different models are trained, one for each step. As a result, the predictions are independent of each other. This forecasting strategy can be easily implemented with the ForecasterAutoregMultiOutput class (changed in version 0.1.9).

Figure: Time series transformation into the matrices needed to train a direct multi-step forecaster.
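
A conceptual sketch of the direct strategy with plain scikit-learn (not the ForecasterAutoregMultiOutput internals): one independent regressor is trained per forecast horizon.

# Sketch of direct multi-step forecasting (illustrative only)
# ==============================================================================
import numpy as np
from sklearn.linear_model import Ridge

y = np.sin(np.arange(100) / 5)   # toy series
n_lags, horizon = 5, 3

models = []
for step in range(1, horizon + 1):
    # Each model learns to predict y[t + step] from the same lag window.
    X = np.column_stack(
        [y[n_lags - k - 1 : len(y) - k - horizon] for k in range(n_lags)]
    )
    target = y[n_lags + step - 1 : len(y) - horizon + step]
    models.append(Ridge().fit(X, target))

# Predict the next values from the last observed window (lag_1 first).
last_window = y[-n_lags:][::-1].reshape(1, -1)
preds = np.array([m.predict(last_window)[0] for m in models])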

Multiple output forecasting

Certain models, for example LSTM neural networks, are capable of simultaneously predicting several values of a sequence (one-shot). This strategy is not implemented in skforecast.



Getting started

Autoregressive forecaster

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data['y']
data = data.sort_index()

# Split train-test
# ==============================================================================
steps = 36
data_train = data[:-steps]
data_test  = data[-steps:]

# Plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(9, 4))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
ax.legend()

# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                regressor = RandomForestRegressor(random_state=123),
                lags      = 15
             )

forecaster.fit(y=data_train)
forecaster
================= 
ForecasterAutoreg 
================= 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15] 
Window size: 15 
Included exogenous: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2005-06-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False} 
Creation date: 2022-01-02 16:50:21 
Last fit date: 2022-01-02 16:50:21 
Skforecast version: 0.4.2 
# Predict
# ==============================================================================
predictions = forecaster.predict(steps=36)
predictions.head(3)
2005-07-01    0.921840
2005-08-01    0.954921
2005-09-01    1.101716
Freq: MS, Name: pred, dtype: float64
# Plot predictions
# ==============================================================================
fig, ax=plt.subplots(figsize=(9, 4))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
predictions.plot(ax=ax, label='predictions')
ax.legend()

# Prediction error
# ==============================================================================
error_mse = mean_squared_error(
                y_true = data_test,
                y_pred = predictions
            )
print(f"Test error (mse): {error_mse}")
Test error (mse): 0.00429855684785846
# Feature importance
# ==============================================================================
forecaster.get_feature_importance()
| feature   |   importance |
|-----------|--------------|
| lag_1     |   0.0123397  |
| lag_2     |   0.0851603  |
| lag_3     |   0.0134071  |
| lag_4     |   0.00437446 |
| lag_5     |   0.00318805 |
| lag_6     |   0.00343593 |
| lag_7     |   0.00313612 |
| lag_8     |   0.00714094 |
| lag_9     |   0.00783053 |
| lag_10    |   0.0127507  |
| lag_11    |   0.00901919 |
| lag_12    |   0.807098   |
| lag_13    |   0.00481128 |
| lag_14    |   0.0163282  |
| lag_15    |   0.0099792  |

Autoregressive forecaster with exogenous predictors

# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o_exog.csv')
data = pd.read_csv(url, sep=',', header=0, names=['datetime', 'y', 'exog_1', 'exog_2'])

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data.sort_index()

# Plot
# ==============================================================================
fig, ax=plt.subplots(figsize=(9, 4))
data.plot(ax=ax);

# Split train-test
# ==============================================================================
steps = 36
data_train = data.iloc[:-steps, :]
data_test  = data.iloc[-steps:, :]
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                regressor = RandomForestRegressor(random_state=123),
                lags      = 15
             )

forecaster.fit(
    y    = data_train['y'],
    exog = data_train[['exog_1', 'exog_2']]
)

forecaster
================= 
ForecasterAutoreg 
================= 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15] 
Window size: 15 
Included exogenous: True 
Type of exogenous variable: <class 'pandas.core.frame.DataFrame'> 
Exogenous variables names: ['exog_1', 'exog_2'] 
Training range: [Timestamp('1992-04-01 00:00:00'), Timestamp('2005-06-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False} 
Creation date: 2022-01-02 16:51:34 
Last fit date: 2022-01-02 16:51:34 
Skforecast version: 0.4.2 

   
# Feature importance
# ==============================================================================
forecaster.get_feature_importance()
| feature   |   importance |
|-----------|--------------|
| lag_1     |   0.0133541  |
| lag_2     |   0.0611202  |
| lag_3     |   0.00908617 |
| lag_4     |   0.00272094 |
| lag_5     |   0.00247847 |
| lag_6     |   0.00315493 |
| lag_7     |   0.00217887 |
| lag_8     |   0.00815443 |
| lag_9     |   0.0103189  |
| lag_10    |   0.0205869  |
| lag_11    |   0.00703555 |
| lag_12    |   0.773389   |
| lag_13    |   0.00458297 |
| lag_14    |   0.0181272  |
| lag_15    |   0.00873237 |
| exog_1    |   0.0103638  |
| exog_2    |   0.0446156  |

Autoregressive forecaster with custom predictors

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoregCustom import ForecasterAutoregCustom
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data['y']
data = data.sort_index()

# Split train-test
# ==============================================================================
steps = 36
data_train = data[:-steps]
data_test  = data[-steps:]
# Custom function to create predictors
# ==============================================================================
def create_predictors(y):
    '''
    Create first 10 lags of a time series.
    Calculate moving average with window 20.
    '''
    
    lags = y[-1:-11:-1]
    mean = np.mean(y[-20:])
    predictors = np.hstack([lags, mean])
    
    return predictors
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoregCustom(
                    regressor      = RandomForestRegressor(random_state=123),
                    fun_predictors = create_predictors,
                    window_size    = 20
                )

forecaster.fit(y=data_train)
forecaster
======================= 
ForecasterAutoregCustom 
======================= 
Regressor: RandomForestRegressor(random_state=123) 
Predictors created with function: create_predictors 
Window size: 20 
Included exogenous: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2005-06-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False} 
Creation date: 2022-01-02 16:52:12 
Last fit date: 2022-01-02 16:52:12 
Skforecast version: 0.4.2
# Predict
# ==============================================================================
predictions = forecaster.predict(steps=36)
predictions.head(3)
2005-07-01    0.926598
2005-08-01    0.948202
2005-09-01    1.020947
Freq: MS, Name: pred, dtype: float64
# Feature importance
# ==============================================================================
forecaster.get_feature_importance()
| feature             |   importance |
|---------------------|--------------|
| custom_predictor_0  |    0.53972   |
| custom_predictor_1  |    0.119097  |
| custom_predictor_2  |    0.0464036 |
| custom_predictor_3  |    0.0241653 |
| custom_predictor_4  |    0.0305667 |
| custom_predictor_5  |    0.0151391 |
| custom_predictor_6  |    0.0428832 |
| custom_predictor_7  |    0.012742  |
| custom_predictor_8  |    0.018938  |
| custom_predictor_9  |    0.108639  |
| custom_predictor_10 |    0.0417066 |

Prediction intervals

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data['y']
data = data.sort_index()

# Split train-test
# ==============================================================================
steps = 36
data_train = data[:-steps]
data_test  = data[-steps:]
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                    regressor = make_pipeline(StandardScaler(), Ridge()),
                    lags      = 15
                )

forecaster.fit(y=data_train)
forecaster
================= 
ForecasterAutoreg 
================= 
Regressor: Pipeline(steps=[('standardscaler', StandardScaler()), ('ridge', Ridge())]) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15] 
Window size: 15 
Included exogenous: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2005-06-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'standardscaler__copy': True, 'standardscaler__with_mean': True, 'standardscaler__with_std': True, 'ridge__alpha': 1.0, 'ridge__copy_X': True, 'ridge__fit_intercept': True, 'ridge__max_iter': None, 'ridge__normalize': 'deprecated', 'ridge__positive': False, 'ridge__random_state': None, 'ridge__solver': 'auto', 'ridge__tol': 0.001} 
Creation date: 2022-01-02 16:53:00 
Last fit date: 2022-01-02 16:53:00 
Skforecast version: 0.4.2 
# Prediction intervals
# ==============================================================================
predictions = forecaster.predict_interval(
                    steps    = steps,
                    interval = [5, 95],
                    n_boot   = 500
              )


fig, ax=plt.subplots(figsize=(9, 4))
data_test.plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax, label='predictions')
ax.fill_between(
    predictions.index,
    predictions['lower_bound'],
    predictions['upper_bound'],
    alpha=0.5
)
ax.legend()
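
Conceptually, the intervals come from bootstrapping residuals: resample the forecaster's stored in-sample residuals, add them to the point predictions, and take empirical quantiles. A simplified sketch of the idea, not the exact skforecast implementation (which propagates resampled residuals through the recursive predictions); it assumes the in_sample_residuals attribute stored by the forecaster after fitting:

# Simplified sketch of bootstrapped prediction intervals (illustrative only)
# ==============================================================================
rng = np.random.default_rng(123)
residuals = forecaster.in_sample_residuals   # assumed: stored when the forecaster is fitted
point_preds = predictions['pred'].to_numpy()

boot = np.stack([
    point_preds + rng.choice(residuals, size=len(point_preds), replace=True)
    for _ in range(500)
])
lower, upper = np.percentile(boot, [5, 95], axis=0)  # 90% interval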

Backtesting

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import backtesting_forecaster
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data['y']
data = data.sort_index()

# Split train-backtest
# ==============================================================================
n_backtest = 36*3  # Last 9 years are used for backtest
data_train = data[:-n_backtest]
data_backtest = data[-n_backtest:]

# Plot
# ==============================================================================
fig, ax=plt.subplots(figsize=(9, 4))
data_train.plot(ax=ax, label='train')
data_backtest.plot(ax=ax, label='backtest')
ax.legend()
# Backtest forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                regressor = RandomForestRegressor(random_state=123),
                lags      = 15 
             )

metric, predictions_backtest = backtesting_forecaster(
                                    forecaster = forecaster,
                                    y          = data,
                                    initial_train_size = len(data_train),
                                    steps      = 12,
                                    metric     = 'mean_squared_error',
                                    refit      = True,
                                    verbose    = True
                               )
Information of backtesting process
----------------------------------
Number of observations used for initial training: 96
Number of observations used for backtesting: 108
    Number of folds: 9
    Number of steps per fold: 12

Data partition in fold: 0
    Training:   1991-07-01 00:00:00 -- 1999-06-01 00:00:00
    Validation: 1999-07-01 00:00:00 -- 2000-06-01 00:00:00
Data partition in fold: 1
    Training:   1991-07-01 00:00:00 -- 2000-06-01 00:00:00
    Validation: 2000-07-01 00:00:00 -- 2001-06-01 00:00:00
Data partition in fold: 2
    Training:   1991-07-01 00:00:00 -- 2001-06-01 00:00:00
    Validation: 2001-07-01 00:00:00 -- 2002-06-01 00:00:00
Data partition in fold: 3
    Training:   1991-07-01 00:00:00 -- 2002-06-01 00:00:00
    Validation: 2002-07-01 00:00:00 -- 2003-06-01 00:00:00
Data partition in fold: 4
    Training:   1991-07-01 00:00:00 -- 2003-06-01 00:00:00
    Validation: 2003-07-01 00:00:00 -- 2004-06-01 00:00:00
Data partition in fold: 5
    Training:   1991-07-01 00:00:00 -- 2004-06-01 00:00:00
    Validation: 2004-07-01 00:00:00 -- 2005-06-01 00:00:00
Data partition in fold: 6
    Training:   1991-07-01 00:00:00 -- 2005-06-01 00:00:00
    Validation: 2005-07-01 00:00:00 -- 2006-06-01 00:00:00
Data partition in fold: 7
    Training:   1991-07-01 00:00:00 -- 2006-06-01 00:00:00
    Validation: 2006-07-01 00:00:00 -- 2007-06-01 00:00:00
Data partition in fold: 8
    Training:   1991-07-01 00:00:00 -- 2007-06-01 00:00:00
    Validation: 2007-07-01 00:00:00 -- 2008-06-01 00:00:00
fig, ax = plt.subplots(figsize=(9, 4))
data_backtest.plot(ax=ax, label='backtest')
predictions_backtest.plot(ax=ax, label='predictions')
ax.legend()

Model tuning

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster
from sklearn.ensemble import RandomForestRegressor
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data['y']
data = data.sort_index()

# Split train-test
# ==============================================================================
steps = 24
data_train = data.loc[: '2001-01-01']
data_val = data.loc['2001-01-01' : '2006-01-01']
data_test  = data.loc['2006-01-01':]

# Plot
# ==============================================================================
fig, ax=plt.subplots(figsize=(9, 4))
data_train.plot(ax=ax, label='train')
data_val.plot(ax=ax, label='validation')
data_test.plot(ax=ax, label='test')
ax.legend()

# Grid search hyperparameters and lags
# ==============================================================================
forecaster = ForecasterAutoreg(
                regressor = RandomForestRegressor(random_state=123),
                lags      = 12 # Placeholder, the value will be overwritten
             )

# Regressor hyperparameters
param_grid = {'n_estimators': [50, 100],
              'max_depth': [5, 10, 15]}

# Lags used as predictors
lags_grid = [3, 10, [1, 2, 3, 20]]

results_grid = grid_search_forecaster(
                        forecaster  = forecaster,
                        y           = data.loc[:'2006-01-01'],
                        param_grid  = param_grid,
                        lags_grid   = lags_grid,
                        steps       = 12,
                        refit       = True,
                        metric      = 'mean_squared_error',
                        initial_train_size = len(data_train),
                        return_best = True,
                        verbose     = False
                    )

results_grid
Number of models compared: 18
| lags                            | params                                 |    metric |   max_depth |   n_estimators |
|---------------------------------|----------------------------------------|-----------|-------------|----------------|
| [ 1  2  3  4  5  6  7  8  9 10] | {'max_depth': 5, 'n_estimators': 50}   | 0.0334486 |           5 |             50 |
| [ 1  2  3  4  5  6  7  8  9 10] | {'max_depth': 10, 'n_estimators': 50}  | 0.0392212 |          10 |             50 |
| [ 1  2  3  4  5  6  7  8  9 10] | {'max_depth': 15, 'n_estimators': 100} | 0.0392658 |          15 |            100 |
| [ 1  2  3  4  5  6  7  8  9 10] | {'max_depth': 5, 'n_estimators': 100}  | 0.0395258 |           5 |            100 |
| [ 1  2  3  4  5  6  7  8  9 10] | {'max_depth': 10, 'n_estimators': 100} | 0.0402408 |          10 |            100 |
| [ 1  2  3  4  5  6  7  8  9 10] | {'max_depth': 15, 'n_estimators': 50}  | 0.0407645 |          15 |             50 |
| [ 1  2  3 20]                   | {'max_depth': 15, 'n_estimators': 100} | 0.0439092 |          15 |            100 |
| [ 1  2  3 20]                   | {'max_depth': 5, 'n_estimators': 100}  | 0.0449923 |           5 |            100 |
| [ 1  2  3 20]                   | {'max_depth': 5, 'n_estimators': 50}   | 0.0462237 |           5 |             50 |
| [1 2 3]                         | {'max_depth': 5, 'n_estimators': 50}   | 0.0486662 |           5 |             50 |
| [ 1  2  3 20]                   | {'max_depth': 10, 'n_estimators': 100} | 0.0489914 |          10 |            100 |
| [ 1  2  3 20]                   | {'max_depth': 10, 'n_estimators': 50}  | 0.0501932 |          10 |             50 |
| [1 2 3]                         | {'max_depth': 15, 'n_estimators': 100} | 0.0505563 |          15 |            100 |
| [ 1  2  3 20]                   | {'max_depth': 15, 'n_estimators': 50}  | 0.0512172 |          15 |             50 |
| [1 2 3]                         | {'max_depth': 5, 'n_estimators': 100}  | 0.0531229 |           5 |            100 |
| [1 2 3]                         | {'max_depth': 15, 'n_estimators': 50}  | 0.0602604 |          15 |             50 |
| [1 2 3]                         | {'max_depth': 10, 'n_estimators': 50}  | 0.0609513 |          10 |             50 |
| [1 2 3]                         | {'max_depth': 10, 'n_estimators': 100} | 0.0673343 |          10 |            100 |

Using forecaster in production

A trained model may be deployed in production to generate predictions regularly. Suppose predictions have to be generated on a weekly basis, for example, every Monday. By default, when the predict method is called on a trained forecaster object, predictions start right after the last training observation. The model could therefore be retrained weekly, just before the first prediction is needed, and then its predict method called. This strategy, although simple, may not be feasible for several reasons:

  • Model training is very expensive and cannot be run as often as needed.

  • The history with which the model was trained is no longer available.

  • The prediction frequency is so high that there is no time to train the model between predictions.

In these scenarios, the model must be able to predict at any time, even if it has not been recently trained.

Every model generated using skforecast has a last_window argument in its predict method. Using this argument, it is possible to provide only the past values needed to create the autoregressive predictors (lags) and thus generate predictions without retraining the model.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])

data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data_train = data.loc[:'2005-01-01']
data_train.tail()
                  y
date
2004-09-01  1.13443
2004-10-01  1.18101
2004-11-01  1.21604
2004-12-01  1.25724
2005-01-01  1.17069
forecaster = ForecasterAutoreg(
                    regressor = RandomForestRegressor(),
                    lags = 5
                )

forecaster.fit(y=data_train['y'])
forecaster.predict(steps=3)
2005-02-01    0.927480
2005-03-01    0.756215
2005-04-01    0.692595
Freq: MS, Name: pred, dtype: float64

As expected, the predictions follow directly from the end of the training data. When last_window is provided, the forecaster uses it to generate the lags needed as predictors and starts predicting from the end of that window.

forecaster.predict(steps=3, last_window=data['y'].tail(5))
2008-07-01    0.803853
2008-08-01    0.870858
2008-09-01    0.905003
Freq: MS, Name: pred, dtype: float64

Since the provided last_window contains values from 2008-02-01 to 2008-06-01, the forecaster is able to create the needed lags and predict the next 3 steps.

WARNING:
Note that the length of last_window must be enough to include the maximum lag used by the forecaster. For example, if the forecaster uses lags 1, 24, 48 and 72, last_window must include the last 72 values of the series.
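
For example, a short sketch of this requirement, reusing the data from above with hypothetical lags 1, 24, 48 and 72 (illustrative only):

# Hypothetical illustration: forecaster using lags 1, 24, 48 and 72
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = [1, 24, 48, 72]
             )
forecaster.fit(y=data_train['y'])

# last_window must contain at least max(lags) = 72 values
forecaster.predict(steps=3, last_window=data['y'].tail(72))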

Examples and tutorials

English

Spanish

Donating

If you found skforecast useful, you can support us with a donation. Your contribution will help to continue developing and improving this project. Many thanks!


Licence

joaquinAmatRodrigo/skforecast is licensed under the MIT License, a short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Comments
  • Bayesian Optimization

    I am trying to tune the model using scikit-optimize, but a bunch of errors come up. I think it would be a good idea to implement Bayesian search in this library too.

    question 
    opened by CalenDario13 11
  • IndexError When lags is greater than number of steps skforecast==0.4.3

    Another beginner question - what are the conditions for refit = True?

    I get the error below:

    d:\programy\miniconda3\lib\site-packages\skforecast\ForecasterAutoreg\ForecasterAutoreg.py
    in _recursive_predict(self, steps, last_window, exog)
        405
        406     for i in range(steps):
    --> 407         X = last_window[-self.lags].reshape(1, -1)
        408         if exog is not None:
        409             X = np.column_stack((X, exog[i, ].reshape(1, -1)))

    IndexError: index -6 is out of bounds for axis 0 with size 4

    In case it matters from the input side, I have the following data:

    data.shape       (50,)
    data_train.shape (37,)
    data_test.shape  (13,)
    steps = 13
    initial lags: lags = int(data_train.shape[0]*0.4) = 14

    The whole grid search looks like this:

    forecaster_rf = ForecasterAutoreg(
                        regressor = XGBRegressor(verbosity=1),
                        lags = lags
                 )
    
    param_grid = {
                'gamma': [0.5, 1, 1.5, 2, 5],
                'subsample': [0.6, 0.8, 1.0],
                'colsample_bytree': [0.6, 0.8, 1.0],
                'max_depth': np.arange(2, 22, 2)
                }
    
    lags_grid = [6, 12, lags, [1, 3, 6, 12, lags]]
    

    The lags below throw an error too: lags_grid = np.arange(1, 3, 1) and lags_grid = [1].

    metric = mean_squared_log_error
    
    results_grid = grid_search_forecaster(
                            forecaster         = forecaster_rf,
                            y                  = data_train,
                            param_grid         = param_grid,
                            steps              = steps,
                            metric             = metric,
                            refit              = True,
                            initial_train_size = int(len(data_train)*0.5),
                            return_best        = True,
                            verbose            = True
                       )
    

    Originally posted by @spike8888 in https://github.com/JoaquinAmatRodrigo/skforecast/issues/137#issuecomment-1108727110

    bug 
    opened by JoaquinAmatRodrigo 9
  • ValueError for backtesting_forecaster when interval is provided

    Hi !

    I'm trying to use your backtesting_forecaster, and when I ask for prediction intervals it raises a ValueError.

    When no intervals are requested, everything works perfectly:

    [in]:

    if __name__ == "__main__":
        import pandas as pd
        from skforecast.ForecasterAutoreg import ForecasterAutoreg
        from skforecast.model_selection import backtesting_forecaster
        from sklearn.ensemble import RandomForestRegressor
    
        y_train = pd.Series([479.157, 478.475, 481.205, 492.467, 490.42, 508.166, 523.182,
                             499.634, 495.88, 494.174, 494.174, 490.078, 490.078, 495.539,
                             488.713, 485.3, 493.491, 492.126, 493.832, 485.983, 481.887,
                             474.379, 433.084, 456.633, 477.451, 468.919, 484.959, 471.99,
                             486.324, 498.61, 517.381, 485.3, 480.864, 485.983, 484.276,
                             490.761, 490.078, 494.515, 495.88, 493.15, 491.443, 490.42,
                             485.3, 485.3, 486.665, 467.895, 441.616, 469.601, 477.11,
                             486.324, 485.3, 489.054, 494.856, 513.968, 544.683, 557.31,
                             574.374, 603.383, 617.034, 621.812, 627.273, 612.598, 598.605,
                             610.891, 598.605, 563.112, 542.635, 536.492, 499.634, 456.633,
                             431.037, 453.903, 464.141, 454.244, 456.633, 476.768, 495.88,
                             523.524, 537.516, 577.787, 600.994, 616.693, 631.71, 636.487,
                             621.471, 635.805, 625.908, 616.011, 581.2, 565.842, 553.556,
                             570.279, 514.992, 483.253, 460.046, 469.26, 475.745, 478.816,
                             482.57, 506.801, 510.896])
    
    
    
        backtesting_forecaster(
            forecaster=ForecasterAutoreg(regressor=RandomForestRegressor(random_state=42), lags=10),
            y=y_train,
            steps=24,
            metric="mean_absolute_percentage_error",
            initial_train_size=14,
            n_boot=50,
        )
    

    [out]:

    (array([0.07647964]),
               pred
     14   493.40924
     15   493.17717
     16   492.99968
     17   492.98603
     18   492.69932
     ..         ...
     96   492.98603
     97   492.98603
     98   492.98603
     99   492.98603
     100  492.98603
     
     [87 rows x 1 columns])
    

    However, asking for intervals leads to a ValueError:

    [in]:

    if __name__ == "__main__":
        import pandas as pd
        from skforecast.ForecasterAutoreg import ForecasterAutoreg
        from skforecast.model_selection import backtesting_forecaster
        from sklearn.ensemble import RandomForestRegressor
    
        y_train = pd.Series([479.157, 478.475, 481.205, 492.467, 490.42, 508.166, 523.182,
                             499.634, 495.88, 494.174, 494.174, 490.078, 490.078, 495.539,
                             488.713, 485.3, 493.491, 492.126, 493.832, 485.983, 481.887,
                             474.379, 433.084, 456.633, 477.451, 468.919, 484.959, 471.99,
                             486.324, 498.61, 517.381, 485.3, 480.864, 485.983, 484.276,
                             490.761, 490.078, 494.515, 495.88, 493.15, 491.443, 490.42,
                             485.3, 485.3, 486.665, 467.895, 441.616, 469.601, 477.11,
                             486.324, 485.3, 489.054, 494.856, 513.968, 544.683, 557.31,
                             574.374, 603.383, 617.034, 621.812, 627.273, 612.598, 598.605,
                             610.891, 598.605, 563.112, 542.635, 536.492, 499.634, 456.633,
                             431.037, 453.903, 464.141, 454.244, 456.633, 476.768, 495.88,
                             523.524, 537.516, 577.787, 600.994, 616.693, 631.71, 636.487,
                             621.471, 635.805, 625.908, 616.011, 581.2, 565.842, 553.556,
                             570.279, 514.992, 483.253, 460.046, 469.26, 475.745, 478.816,
                             482.57, 506.801, 510.896])
    
    
    
        backtesting_forecaster(
            forecaster=ForecasterAutoreg(regressor=RandomForestRegressor(random_state=42), lags=10),
            y=y_train,
            steps=24,
            metric="mean_absolute_percentage_error",
            initial_train_size=14,
            interval=[95],
            n_boot=50,
        )
    

    [out]:

    Traceback (most recent call last):
      File "...\lib\site-packages\IPython\core\interactiveshell.py", line 3398, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-6-9cd2f471a2e7>", line 22, in <cell line: 22>
        backtesting_forecaster(
      File "...\lib\site-packages\skforecast\model_selection\model_selection.py", line 925, in backtesting_forecaster
        metric_value, backtest_predictions = _backtesting_forecaster_no_refit(
      File "...\lib\site-packages\skforecast\model_selection\model_selection.py", line 705, in _backtesting_forecaster_no_refit
        pred = forecaster.predict_interval(
      File "...\lib\site-packages\skforecast\ForecasterAutoreg\ForecasterAutoreg.py", line 757, in predict_interval
        predictions = pd.DataFrame(
      File "...\lib\site-packages\pandas\core\frame.py", line 694, in __init__
        mgr = ndarray_to_mgr(
      File "...\lib\site-packages\pandas\core\internals\construction.py", line 351, in ndarray_to_mgr
        _check_values_indices_shape_match(values, index, columns)
      File "...\dev\lib\site-packages\pandas\core\internals\construction.py", line 422, in _check_values_indices_shape_match
        raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
    ValueError: Shape of passed values is (24, 2), indices imply (24, 3)
    
    enhancement good first issue 
    opened by tkaraouzene 8
  • Pip Installation Fails on Macbook Pro M1

    When trying to install skforecast using pip install skforecast, the install process fails with the following output:

          
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for numpy
    Failed to build numpy
    ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
    [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error
    
    × pip subprocess to install build dependencies did not run successfully.
    │ exit code: 1
    ╰─> See above for output.
    

    Unfortunately, the full output is too large to include; however, this seems to happen when trying to install the build dependencies for statsmodels.

    Setup: MacBook Pro M1 MacOS Monterey 12.6

    Python 3.9.13 (main, Oct 5 2022, 11:08:35) Clang 14.0.0 (clang-1400.0.29.102)

    opened by bernhard-kaindl 7
  • How to disable verbosity on Skforecast?

    TLDR

    as the title says, is there a way to pass a verbosity parameter to the fit() function ?

    I would love to not have INFOs printed. There are a lot and kinda slow down the code a little bit.

    question 
    opened by tulbureandreit 5
  • Question: Cross-validation strategy too expensive

    Hi,

    I think you are doing a great job with skforecast, and the tutorials that accompany it are very instructive. Before starting to use skforecast, the cross-validation I do (using LightGBM, for example) relies on TimeSeriesSplit to create, say, 5 partitions, so the cross-validation process looks roughly like this (the first column indicates the training part and the second the validation part): [0,1,2,3,4] [5] [0,1,2,3,4,5] [6] [0,1,2,3,4,5,6] [7] [0,1,2,3,4,5,6,7] [8] [0,1,2,3,4,5,6,7,8] [9]

    In the example above I would have 5 sets of validated predictions, and the final metric would be, for example, the mean of the metrics obtained in each of the validations. Any model selection test would therefore be subject to this cross-validation loop. With 5 partitions or "folds", the model is retrained 5 times, which makes this cross-validation loop usable, for example, when tuning hyperparameters.

    When using the skforecast library, I see that a "fold" is created (in the case of ForecasterAutoregDirect) with the size of the number of steps. In my case, this almost always makes the "refit=True" option unfeasible, since the number of "folds" is very high and therefore many retrainings are performed.

    Please, could you tell me if there is a way to use the "refit=True" option without so many retrainings?

    Best regards and many thanks

    question 
    opened by benitocm 4
  • Plot the curve for learning of an lgbm forecaster

    Hello! I am new to skforecast's ForecasterAutoreg. I tried to implement an LGBM model and make predictions 4 steps ahead. I managed to visualize the predictions, but I am struggling to plot the learning curve to see how the model performs on the training dataframe. Below is the code I have used.

    forecaster = ForecasterAutoreg(
        regressor = LGBMRegressor(**lgbm_trial_params),
        lags      = [1, 2, 3, 4]
    )

    cols = [col for col in df.columns if col not in ['date', 'Qty']]
    exog = df[cols]

    forecaster.fit(y=df['Qty'][:-4], exog=exog[:-4])
    predictions = forecaster.predict(steps=4, exog=exog[:-4])

    Qty represents the target variable I want to predict and my dataframe is structured on weekly data with month, week_num and last_month_average as features.

    Here is a printscreen of the structure of my dataframe:

    [screenshot omitted]

    I did not manage to find anything useful searching the internet, so any help will be much appreciated. Thanks!

    question 
    opened by DariusMargineanNicolae 4
  • How to incorporate pipeline in bayesian_optuna_search?

    Hi there, recently I have tried skforecast version 0.5.x with _bayesian_optuna_search. In order to make sklearn's SVR work properly, I have to add a StandardScaler and intend to use a Pipeline.

    However, after several trials, I cannot get the pipeline to work in _bayesian_optuna_search. Are there any solutions?

    question 
    opened by andreale28 4
  • Support for multiple metrics in model_selection

    Add support for multiple metrics when backtesting.

    • If metric is a str or a callable, the behavior is the same as before.
    • If metric is a list, multiple metrics are calculated. Only the first metric is used to select the best parameters (see the sketch below).
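
    A sketch of how the proposed interface might be called (names and values are illustrative, assuming a fitted setup like the ones in the README examples):

    metrics, predictions = backtesting_forecaster(
                               forecaster = forecaster,
                               y          = data,
                               initial_train_size = len(data_train),
                               steps      = 12,
                               # a list of metrics returns one value per metric
                               metric     = ['mean_squared_error', 'mean_absolute_error'],
                               refit      = False
                           )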
    opened by Pablo-Davila 4
  • How to fill future known information from one of the exogenous variables?

    Hi developers,

    I have a question about filling in known future information for one of the exogenous variables. For example, I have data with y as the target variable and three exogenous variables X1, X2 and X3. I have known future values for X3, but no future information for X1 and X2. Using either the direct or the recursive method, how can I integrate the exogenous variable with known future values (X3) together with those without (X1, X2) in the framework of this package?

    question 
    opened by kennis222 3
  • About get_coef in older version

    Hi developers,

    The function "get_coef" is deprecated since version 4.3.0, but I would like to use the get_coef with an older version since it easier to interpret to people without machine learning or statistical knowledge compared to use the "impurity-based feature importance" . However, when I use the function "get_coef" in the older version either 4.2.0 or 4.1.0, the program reports error with "AttributeError: 'ForecasterAutoregMultiOutput' object has no attribute 'get_coef'"

    Here is the code that calls the function:

    coef_ = []
    for i, j in zip(range(1, 7), pd.date_range(start='2022-01', periods=6, freq='M')):
        df = pd.DataFrame()
        df['feature'] = model_direct.get_coef(step=i)['feature']
        df['coef'] = model_direct.get_coef(step=i)['coef']
        df['date'] = pd.to_datetime(j)
        coef_.append(df)

    Thanks.

    question 
    opened by kennis222 3
  • Question: forecaster.predict is not same as backtesting_forecaster predictions

    Hello, I have a question. Why are the forecast values different even when the last_window data of forecaster.predict and the initial train data of backtesting_forecaster are set to the same period (using the same forecaster)? Did I miss anything?

    opened by relakuman 1
  • forecaster.fit with LightGBM freezes

    Hi,

    To be very short:

    I tried to fit a LightGBM model with the forecaster and it freezes every time, with no RAM or memory issues.

    Code:

    param_grid = {'n_estimators': 2000, 'boosting_type': 'dart', 'max_depth': 45,
                  'learning_rate': 0.01, 'num_leaves': 25, 'lambda_l1': 0.1,
                  'lambda_l2': 0.5, 'min_child_samples': 50}

    lags_lightgbm = 16
    forecaster = ForecasterAutoreg(
        regressor=LGBMRegressor(**param_grid), lags=lags_lightgbm
    )

    cols = [col for col in df_to_process_lightgbm.columns if col not in ["ds", "y"]]
    exog = df_to_process_lightgbm[cols]

    forecaster.fit(y=df_to_process_lightgbm["y"], exog=exog)  # <-- here it freezes

    predictions = forecaster.predict(steps=8, exog=exog)
    predictions = predictions.values

    Environment: Ubuntu 20.04

    certifi                  2022.9.24
    charset-normalizer       2.1.1
    click                    8.1.3
    cliff                    4.1.0
    cmaes                    0.9.0
    cmd2                     2.4.2
    cmdstanpy                1.0.8
    colorlog                 6.7.0
    comm                     0.1.2
    contourpy                1.0.6
    convertdate              2.4.0
    cycler                   0.11.0
    Cython                   0.29.32
    debugpy                  1.6.4
    decorator                5.1.1
    entrypoints              0.4
    ephem                    4.1.3
    executing                1.2.0
    fonttools                4.38.0
    frozenlist               1.3.3
    greenlet                 2.0.1
    hijri-converter          2.2.4
    idna                     3.4
    importlib-metadata       5.1.0
    ipykernel                6.19.2
    ipython                  8.7.0
    ipywidgets               8.0.3
    isoweek                  1.3.3
    jedi                     0.18.2
    jmespath                 1.0.1
    joblib                   1.2.0
    kiwisolver               1.4.4
    lightgbm                 3.3.3
    LunarCalendar            0.0.9
    Mako                     1.2.4
    MarkupSafe               2.1.1
    matplotlib               3.5.0
    matplotlib-inline        0.1.6
    multidict                6.0.2
    mypy-extensions          0.4.3
    mysql-connector-python   8.0.31
    nest-asyncio             1.5.6
    numpy                    1.23.0
    nvidia-cublas-cu11       11.10.3.66
    nvidia-cuda-nvrtc-cu11   11.7.99
    nvidia-cuda-runtime-cu11 11.7.99
    nvidia-cudnn-cu11        8.5.0.96
    optuna                   2.10.0
    packaging                22.0
    pandas                   1.4.0
    parso                    0.8.3
    pathspec                 0.10.2
    patsy                    0.5.3
    pbr                      5.11.0
    pexpect                  4.8.0
    pickleshare              0.7.5
    Pillow                   9.3.0
    pip                      22.3.1
    platformdirs             2.6.0
    prettytable              3.5.0
    prompt-toolkit           3.0.36
    protobuf                 3.20.1
    psutil                   5.9.4
    ptyprocess               0.7.0
    pure-eval                0.2.2
    pyaml                    21.10.1
    Pygments                 2.13.0
    PyMeeus                  0.5.12
    pyparsing                3.0.9
    pyperclip                1.8.2
    python-dateutil          2.8.2
    python-slugify           6.1.2
    pytz                     2022.6
    PyYAML                   6.0
    pyzmq                    24.0.1
    s3transfer               0.6.0
    scikit-learn             1.1.0
    scikit-optimize          0.9.0
    scipy                    1.9.3
    seaborn                  0.11.0
    setuptools               65.6.3
    setuptools-git           1.2
    setuptools-scm           7.0.5
    six                      1.16.0
    skforecast               0.5.1
    stack-data               0.6.2
    statsmodels              0.13.0
    stevedore                4.1.1
    tenacity                 8.1.0
    text-unidecode           1.3
    threadpoolctl            3.1.0
    tomli                    2.0.1
    torch                    1.12.0
    torch-lr-finder          0.2.1
    tornado                  6.2
    tqdm                     4.64.0
    traitlets                5.7.1
    typing_extensions        4.4.0
    urllib3                  1.26.13
    wcwidth                  0.2.5
    wheel                    0.38.4
    widgetsnbextension       4.0.4
    yarl                     1.8.1
    
    opened by tulbureandreit 2
  • Forecasting Future Unknown Data

    Hello,

    I divided the data into three parts: train, validation and test. I used backtesting_forecaster to predict the validation data and obtained an RMSE from the predictions and the validation data. Now I want to predict the test data, without feeding the test data into the model, using backtesting_forecaster, and calculate an RMSE again. Then I want to forecast future unknown data. How can I do that?

    Thank you.

    question 
    opened by yagmurn 5
  • Backtesting with overlap in the validation sets [parameter defining forecast origin shift]

    Hi guys. I was thinking about performing backtesting with overlap in the validation sets. That could be done by having a parameter to define how many periods the forecast origin should advance/shift.

    I'll leave an image below that shows the desired backtesting method.

    What do you guys think?

    [screenshot omitted]

    opened by flaviojfpereira 3
  • Question: is it possible to use early stopping of LightGBM, CatBoost?

    Hi,

    I am using GBT algorithms as the base regressor for the forecaster. I am interested in using the early stopping feature of these kinds of algorithms. Is it possible?

    In the case of HistGradientBoostingRegressor I think it is easier because early stopping is configured differently.

    Thank you in advance

    question 
    opened by benitocm 5