Probabilistic time series modeling in Python

Amazon Web Services - Labs

Last update: Jan 3, 2023

Related tags

Machine Learning machine-learning deep-learning time-series mxnet pytorch neural-networks forecasting time-series-prediction time-series-forecasting

Overview

GluonTS - Probabilistic Time Series Modeling in Python

GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (incubating).

GluonTS provides utilities for loading and iterating over time series datasets, state of the art models ready to be trained, and building blocks to define your own models and quickly experiment with different solutions.

Installation

GluonTS requires Python 3.6 or newer, and the easiest way to install it is via pip:

pip install --upgrade mxnet~=1.7 gluonts

Dockerfiles

Dockerfiles compatible with Amazon Sagemaker can be found in the examples/dockerfiles folder.

Quick start guide

This simple example illustrates how to train a model from GluonTS on some data, and then use it to make predictions. As a first step, we need to collect some data: in this example we will use the volume of tweets mentioning the AMZN ticker symbol.

import pandas as pd
url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0)

The first 100 data points look as follows:

import matplotlib.pyplot as plt
df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()

We can now prepare a training dataset for our model to train on. Datasets in GluonTS are essentially iterable collections of dictionaries: each dictionary represents a time series with possibly associated features. For this example, we only have one entry, specified by the "start" field which is the timestamp of the first datapoint, and the "target" field containing time series data. For training, we will use data up to midnight on April 5th, 2015.

from gluonts.dataset.common import ListDataset
training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min"
)
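
To make the "iterable of dictionaries" layout concrete, here is a minimal sketch in plain Python that does not import GluonTS. The values and the helper last_timestamp are illustrative assumptions, not part of the library.

```python
from datetime import datetime, timedelta

# A dataset is an iterable of dictionaries; each one describes one time series.
# The values below are made up for illustration.
toy_dataset = [
    {"start": datetime(2015, 1, 1, 0, 0), "target": [57.0, 43.0, 55.0, 64.0]},
]


def last_timestamp(entry, step):
    """Timestamp of the final observation, given a fixed spacing `step`."""
    return entry["start"] + (len(entry["target"]) - 1) * step


for entry in toy_dataset:
    end = last_timestamp(entry, timedelta(minutes=5))
    print(entry["start"], len(entry["target"]), end)
```

Only "start" and "target" appear here; additional feature fields would be further keys in the same dictionary.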

A forecasting model in GluonTS is a predictor object. One way of obtaining predictors is by training a corresponding estimator. Instantiating an estimator requires specifying the frequency of the time series that it will handle, as well as the number of time steps to predict. In our example we're using 5-minute data, so freq="5min", and we will train a model to predict the next hour, so prediction_length=12. We also specify some minimal training options.

from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer

estimator = DeepAREstimator(freq="5min", prediction_length=12, trainer=Trainer(epochs=10))
predictor = estimator.train(training_data=training_data)
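
The value prediction_length=12 is simply the one-hour horizon divided by the 5-minute sampling step. A small sketch of that arithmetic, using only the standard library (the helper name steps_for_horizon is hypothetical):

```python
from datetime import timedelta


def steps_for_horizon(horizon, step):
    """Number of forecast steps needed to cover `horizon` at spacing `step`."""
    n, remainder = divmod(horizon, step)
    if remainder:
        raise ValueError("horizon must be a whole multiple of the sampling step")
    return n


prediction_length = steps_for_horizon(timedelta(hours=1), timedelta(minutes=5))
print(prediction_length)  # 12
```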

During training, useful information about the progress will be displayed. To get a full overview of the available options, please refer to the documentation of DeepAREstimator (or other estimators) and Trainer.

We're now ready to make predictions: we will forecast the hour following midnight on April 15th, 2015.

test_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-15 00:00:00"]}],
    freq = "5min"
)

from gluonts.dataset.util import to_pandas

for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-60:].plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')

Note that the forecast is displayed in terms of a probability distribution: the shaded areas represent the 50% and 90% prediction intervals, respectively, centered around the median (dark green line).
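
As a rough illustration of what such intervals mean, the sketch below derives central 50% and 90% intervals from a set of sampled values for one future time step. It uses only the standard library, the samples are synthetic, and the quantile helper is an assumption for illustration; the library's own plotting code may compute quantiles differently.

```python
import random


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def central_interval(samples, coverage):
    """Bounds of the interval holding `coverage` percent of the samples, centered on the median."""
    tail = (1 - coverage / 100) / 2
    ordered = sorted(samples)
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


random.seed(0)
# Hypothetical sampled forecasts for a single future time step.
samples = [random.gauss(50.0, 10.0) for _ in range(1000)]
median = quantile(sorted(samples), 0.5)
for coverage in (50.0, 90.0):
    low, high = central_interval(samples, coverage)
    print(f"{coverage:.0f}% interval: [{low:.1f}, {high:.1f}], median {median:.1f}")
```

The 50% interval spans the 25th to 75th percentiles and the 90% interval spans the 5th to 95th, so the wider band always contains the narrower one.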

Further examples

The following are good entry-points to understand how to use many features of GluonTS:

Quick Start Tutorial: a quick start guide.
Extended Forecasting Tutorial: a detailed tutorial on forecasting using GluonTS.
evaluate_model.py: how to train a model and compute evaluation metrics.
benchmark_m4.py: how to evaluate and compare multiple models on multiple datasets.

The following modules illustrate how custom models can be implemented:

gluonts.model.seasonal_naive: how to implement simple models using just NumPy and Pandas.
gluonts.model.simple_feedforward: how to define a trainable, Gluon-based model.

Contributing

If you wish to contribute to the project, please refer to our contribution guidelines.

Citing

If you use GluonTS in a scientific publication, we encourage you to add the following references to the related papers:

@article{gluonts_jmlr,
  author  = {Alexander Alexandrov and Konstantinos Benidis and Michael Bohlke-Schneider
    and Valentin Flunkert and Jan Gasthaus and Tim Januschowski and Danielle C. Maddix
    and Syama Rangapuram and David Salinas and Jasper Schulz and Lorenzo Stella and
    Ali Caner Türkmen and Yuyang Wang},
  title   = {{GluonTS: Probabilistic and Neural Time Series Modeling in Python}},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {116},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/19-820.html}
}

@article{gluonts_arxiv,
  author  = {Alexandrov, A. and Benidis, K. and Bohlke-Schneider, M. and
    Flunkert, V. and Gasthaus, J. and Januschowski, T. and Maddix, D. C.
    and Rangapuram, S. and Salinas, D. and Schulz, J. and Stella, L. and
    Türkmen, A. C. and Wang, Y.},
  title   = {{GluonTS: Probabilistic Time Series Modeling in Python}},
  journal = {arXiv preprint arXiv:1906.05264},
  year    = {2019}
}

Video

Neural Time Series with GluonTS

Description

In the past, when I used gluon-ts-0.5.0, num_workers>1 could not be set. I saw the code lostella released in #898 the other day and pulled it down for use. num_workers can now be set to a value greater than 1, but there are some problems. Problem 1: training ends when the epoch count reaches about 200. Problem 2: a CUDA initialization error occurs when training starts for the next target.

I made a demo to reproduce the problem, and some of the data will be uploaded. Thanks! @lostella

data.zip

To Reproduce

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import os
import mxnet as mx
import numpy as np
import pandas as pd
from gluonts.dataset import common
from gluonts.evaluation import Evaluator
from gluonts.evaluation.backtest import make_evaluation_predictions
from gluonts.model import deepar
from gluonts.trainer import Trainer



def model_train(df):
    param_list = {
            'epochs': [10000],
            'num_layers': [4],
            'learning_rate': [1e-2],
            'mini_batch_size': [32],
            'num_cells': [40],
            'cell_type': ['lstm'],
        }
    prediction_length = 12
    freq = '2H'
    re_day = 7
    train_time = df.iloc[-1 + (-24) * re_day].monitor_time
    end_time = df.iloc[-1].monitor_time
    test_time = pd.date_range(start=train_time, end=end_time, freq='H')[1:]
    df = df.set_index('monitor_time')
    model_i = 0
    a = []
    for i, _ in enumerate(test_time):
        a.append({"start": df.index[0], "target": df.Measured[:str(test_time[i])]})
    data = common.ListDataset([{"start": df.index[0],
                                    "target": df.Measured[:train_time]}],
                                  freq=freq)

    val_data = common.ListDataset(a, freq=freq)
    for epochs_i in param_list['epochs']:
        for batch_i in param_list['mini_batch_size']:
            for lr_i in param_list['learning_rate']:
                for cells_i in param_list['num_cells']:
                    for layers_i in param_list['num_layers']:
                        for type_i in param_list['cell_type']:
                            estimator = deepar.DeepAREstimator(
                                prediction_length=prediction_length,
                                context_length=prediction_length,
                                freq=freq,
                                num_layers=layers_i,
                                num_cells=cells_i,
                                cell_type=type_i,

                                trainer=Trainer(
                                    ctx=mx.gpu(),
                                    epochs=epochs_i,
                                    learning_rate=lr_i,
                                    hybridize=True,
                                    batch_size=batch_i,
                                ),
                            )

                            predictor = estimator.train(training_data=data, num_workers=2, num_prefetch=96)
                            forecast_it, ts_it = make_evaluation_predictions(val_data, predictor=predictor,
                                                                             num_samples=100)
                            forecasts = list(forecast_it)
                            tss = list(ts_it)

                            evaluator = Evaluator(quantiles=[0.5], seasonality=2016)
                            agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(val_data))

                            if model_i == 0:
                                df_metrics = pd.DataFrame(columns=list(agg_metrics))

                            values_metrics = []
                            for k in agg_metrics:
                                values_metrics.append(agg_metrics[k])

                            df_metrics.loc[model_i, :] = values_metrics
                            model_i = model_i + 1


    best_model_ind = np.argmin(df_metrics['RMSE'].values)
    print('The best model index is {}, mae {}, rmese {}'.format(
        best_model_ind, df_metrics.loc[best_model_ind, 'abs_error'] / prediction_length,
        df_metrics.loc[best_model_ind, 'RMSE']))
    return df_metrics, best_model_ind

def file_name_get(item, spe_file):
    for root, dirs, files in os.walk(spe_file):
        file = []
        for i in files:
            if item in i:
                file.append(i)
        return file


if __name__=='__main__':
    data_file = 'data'
    files = file_name_get('data', data_file)
    for file in files:
        df = pd.read_csv(file)
        df_metrics, best_model_ind = model_train(df)

Error message or code output

100%|███| 50/50 [00:01<00:00, 31.63it/s, epoch=194/10000, avg_epoch_loss=-.0183]
100%|███| 50/50 [00:01<00:00, 31.55it/s, epoch=195/10000, avg_epoch_loss=0.0884]
WARNING:root:Serializing RepresentableBlockPredictor instances does not save the prediction network structure in a backwards-compatible manner. Be careful not to use this method in production.
Running evaluation: 100%|████████████████████| 336/336 [00:02<00:00, 112.13it/s]

Train process 0 ,Epochs 10000, Batch_size: 32, Learning_rate: 0.01, Num_cells: 40, Num_layers: 4, cell_type: lstm
0%| | 0/50 [00:00<?, ?it/s][06:25:22] src/engine/threaded_engine_perdevice.cc:101: Ignore CUDA Error [06:25:22] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess: CUDA: initialization error
Stack trace:
[bt] (0) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6b8b5b) [0x7ff3de97fb5b]
[bt] (1) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37ab842) [0x7ff3e1a72842]
[bt] (2) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37ceece) [0x7ff3e1a95ece]
[bt] (3) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37c19d1) [0x7ff3e1a889d1]
[bt] (4) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37b74a1) [0x7ff3e1a7e4a1]
[bt] (5) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37b83f4) [0x7ff3e1a7f3f4]
[bt] (6) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Chunk::~Chunk()+0x3c2) [0x7ff3e1cada42]
[bt] (7) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6bc30a) [0x7ff3de98330a]
[bt] (8) /home/cjk/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(MXNDArrayFree+0x54) [0x7ff3e19e89c4]

Environment

Operating system: Ubuntu 18.04.2
Python version: Python 3.7.4
GluonTS version: the code released in #898
MXNet version: mxnet-cu101 1.6.0
CPU cores: 14
GPU information: 3b:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1) b1:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)

bug

opened by k-user 22

Remove mandatory `freq` attribute of `Predictor`.

Issue #, if available:

Description of changes:

Follow-up changes to #1997

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup
BREAKING

opened by kashif 20
Implemented model iteration averaging to reduce model variance
Issue #, if available:

Description of changes:

In model_iteration_averaging.py, implemented model averaging across iterations during training instead of epochs after training

Implemented 3 different averaging triggers: NTA (NTA_V1 is the ICLR version: https://openreview.net/pdf?id=SyyGPP0TZ, NTA_V2 is the arxiv version: https://arxiv.org/pdf/1708.02182.pdf), and Alpha Suffix (https://arxiv.org/pdf/1109.5647.pdf)

Integrated both epoch averaging and iteration averaging in Trainer (mx/trainer/_base.py)

Wrote test in test/trainer/test_model_iteration_averaging.py
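
For readers unfamiliar with suffix averaging, the idea behind the Alpha Suffix trigger can be sketched as averaging the parameter vectors from the last alpha fraction of iterations. This is a minimal standalone illustration with made-up numbers, not the code in model_iteration_averaging.py, which may keep a running average and handle the trigger differently.

```python
def alpha_suffix_average(iterates, alpha):
    """Average the last `alpha` fraction of parameter vectors (0 < alpha <= 1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    count = max(1, round(alpha * len(iterates)))
    suffix = iterates[-count:]
    dim = len(suffix[0])
    return [sum(vec[i] for vec in suffix) / count for i in range(dim)]


# Hypothetical parameter vectors after each of 10 training iterations.
iterates = [[float(t), 2.0 * t] for t in range(1, 11)]
print(alpha_suffix_average(iterates, 0.2))  # averages iterations 9 and 10
```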

The overall goal is to reduce the model variance. We test iteration averaging on DeepAR anomaly detection (examples\anomaly_detection.py, electricity data). We train the model with 20 different random seeds and report the variance on the same batch of target sequences (take the variance at each timestamp, then average over the entire sequence and all samples). The results are as follows:

| | n or alpha | var | var/mean | std | std/mean | RMSE |
|-----------------|--------------|---------|------------|---------|------------|---------|
| SelectNBestMean | 1 | 9552.24 | 0.508395 | 22.5279 | 0.0318269 | 414.924 |
| SelectNBestMean | 5 | 8236.13 | 0.41966 | 19.9947 | 0.0253164 | 411.92 |
| NTA_V1 | 5 | 5888.36 | 0.387781 | 16.7624 | 0.0253107 | 412.792 |
| NTA_V2 | 5 | 6422.11 | 0.394004 | 17.7947 | 0.0237186 | 416.328 |
| Alpha_Suffix | 0.2 | 5877.92 | 0.384664 | 16.6868 | 0.030484 | 408.711 |
| Alpha_Suffix | 0.4 | 5814.86 | 0.378298 | 16.6081 | 0.0290987 | 409.952 |

Although we haven't tuned the hyperparameters, we've already obtained smaller variance and better RMSE.
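
The variance figure described above (variance across seeds at each timestamp, then averaged over the sequence) can be sketched as follows. The numbers are invented, and the use of the population variance is an assumption, since the description does not say which estimator was used.

```python
from statistics import mean, pvariance


def average_seed_variance(forecasts_by_seed):
    """Variance across seeds at each time step, averaged over the sequence.

    `forecasts_by_seed[s][t]` is the forecast of seed `s` at time step `t`.
    """
    per_step = [pvariance(values) for values in zip(*forecasts_by_seed)]
    return mean(per_step)


# Three hypothetical seeds, four time steps each.
runs = [
    [10.0, 12.0, 11.0, 13.0],
    [11.0, 12.0, 10.0, 15.0],
    [12.0, 12.0, 12.0, 14.0],
]
print(average_seed_variance(runs))
```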

opened by xcgoner 20
Predictions are way too high when modeling an intermittent count series with DeepAR and NegBin distribution?

I'm trying to model a simulated series of weekly seasonal intermittent sales, with values between 0 and 4. I generated 5 years of simulated data:

I trained a DeepAR model with the output distribution set to Negative Binomial (all other settings were the default settings), on 3 years, and generated predictions for the next two. I got the following results (plotting the [70.0, 80.0, 95.0] predictions intervals):

Increasing the number of training epochs doesn't change anything: the loss falls to its lowest value around the 8th to 10th epoch and hovers more or less around there, whether I train for 10 or 100 epochs. I thought training on 3 years and testing on 2 might be too ambitious, so I tried a 4y/1y split instead, and the results got much worse, and downright strange, this time with values climbing into the 100s, even though the largest historical value the series ever reaches is 4 (I'm using the same input series, but it seems flat now because the scale is completely skewed by how large the predictions are):

I'm wondering if I am doing anything wrong? Are there any special settings for DeepAR when applied to intermittent series?

For comparison, the DeepAREstimator worked pretty well out of the box for more traditional series (using Student's distribution), for example:

Details:

Train data: [{'start': Timestamp('2014-01-05 00:00:00', freq='W-SUN'), 'target': array([1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 2., 0., 0., 1., 2., 2., 1., 4., 1., 2., 1., 0., 0., 2., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 2., 1., 2., 0., 1., 1., 2., 3., 2., 2., 1., 1., 3., 4., 1., 1., 0., 0., 3., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 2., 1., 1., 0., 1., 0., 1., 2., 2., 1., 2., 3., 3., 1., 2., 2., 0., 0., 2., 0., 3., 0., 1., 2., 0., 1., 1.], dtype=float32), 'source': SourceContext(source='list_data', row=1)}]

Test data: {'start': Timestamp('2017-01-08 00:00:00', freq='W-SUN'), 'target': array([2., 1., 2., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 2., 3., 1., 0., 3., 2., 1., 0., 0., 2., 2., 2., 1., 0., 2., 0., 2., 2., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 2., 0., 0., 4., 1., 2., 2., 1., 3., 1., 2., 1., 2., 1., 2., 3., 3., 1., 2., 0.], dtype=float32), 'source': SourceContext(source='list_data', row=1)}

Estimator used:

estimator = DeepAREstimator(freq="W", prediction_length=105, trainer=Trainer(epochs=10), distr_output=NegativeBinomialOutput())
predictor = estimator.train(training_data=training_data)
question

opened by SkanderHn 20
PyTorch implementation of DeepAR
Work in progress, open for comments.

This ports the PyTorch implementation of DeepAR from PyTorchTS (cc @kashif), with some changes:

The estimator class was slightly refactored, and in particular the way data loaders are set up is more in line with other estimators (but I want to try out a few things here, this is giving me some thoughts)

No specific "trainer" class was implemented, and instead the estimator relies on PyTorch Lightning for this.

The network is now down to a single class implementing both loss computation and sample paths prediction, following torch's .training convention

A thin extension to the network provides the interface used by Lightning

A few surrounding, related changes are also included.

Some open questions:

should the dtype and device be specified at constructor time for the estimator? Or is it something we want to pass to the train method?

the base estimator class is really PyTorch Lightning oriented: should it be called PyTorchLightningEstimator?

we would now have gluonts.model containing existing models (mxnet based) and gluonts.torch.model containing this one; should the mxnet ones be moved to gluonts.mx.model for the sake of symmetry?

TODOs (partial list, probably):

[x] cover also validation data in tests

[x] remove the input_size parameter from the estimator (this should probably be inferred from the other ones)

[x] re-include the option to pseudo-shuffle batches at training time

[x] improve tests (also serde and so on)

[x] open issue on the left-over features of the model, and make it release-blocking

[x] run experiments to check the model accuracy

new feature
opened by lostella 19

potential bottleneck in training

Description

I profiled my training, which was taking too long, and here is what I believe is the part that takes the longest:

Profile stats for: run_training_epoch
         309984 function calls (302234 primitive calls) in 82.921 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   82.921   82.921 base.py:168(run)
    520/2    0.000    0.000   82.921   41.460 {built-in method builtins.next}
        2    0.000    0.000   82.792   41.396 fetching.py:271(_fetch_next_batch)
        4    0.000    0.000   82.792   20.698 apply_func.py:73(apply_to_collection)
        2    0.000    0.000   82.792   41.396 supporters.py:547(__next__)
        2    0.000    0.000   82.792   41.396 supporters.py:555(request_next_batch)
        2    0.000    0.000   82.792   41.396 itertools.py:174(__iter__)
        2    0.000    0.000   82.792   41.396 dataloader.py:639(__next__)
        2    0.001    0.000   82.792   41.396 dataloader.py:680(_next_data)
        2    0.001    0.001   82.791   41.395 fetch.py:24(fetch)
      512    0.000    0.000   82.777    0.162 util.py:140(__iter__)
  992/512    0.001    0.000   82.777    0.162 _base.py:102(__iter__)
 5312/512    0.018    0.000   82.776    0.162 _base.py:121(__call__)
      512    0.005    0.000   82.773    0.162 _base.py:174(__call__)
      479    0.001    0.000   81.289    0.170 itertools.py:68(__iter__)
      479    0.013    0.000   78.146    0.163 feature.py:354(map_transform)
      479    0.014    0.000   76.209    0.159 feature.py:367(<listcomp>)
     2395    0.026    0.000   73.799    0.031 extension.py:67(fget)
    14851    0.015    0.000   73.575    0.005 {built-in method builtins.getattr}
     2395    0.020    0.000   73.559    0.031 period.py:97(f)
     2395   73.505    0.031   73.505    0.031 {pandas._libs.tslibs.period.get_period_field_arr}
        1    0.000    0.000   42.221   42.221 training_epoch_loop.py:157(advance)
...

To reproduce kindly train the pytorch DeepAR estimator with:

...
trainer_kwargs=dict(..., profiler="advanced"),
...

and train with num_workers=0

bug

opened by kashif 18

Add `TimeLimitCallback` to `mx/trainer` callbacks.

Issue #, if available:

Description of changes: Add TimeLimitCallback so that users can set a time limit on the training process.
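
To illustrate the general idea of a time-limit callback, here is a standalone sketch. The class TimeLimit, its method names, and the toy loop are all hypothetical stand-ins written for this example; they are not the code added by this PR and do not use the GluonTS callback base class.

```python
import time


class TimeLimit:
    """Hypothetical stand-in: signals that training should stop once a time budget is spent."""

    def __init__(self, time_limit_seconds, clock=time.monotonic):
        self.time_limit_seconds = time_limit_seconds
        self.clock = clock
        self.start = None

    def on_train_start(self):
        self.start = self.clock()

    def on_epoch_end(self):
        """Return True to continue training, False once the budget is exhausted."""
        return self.clock() - self.start < self.time_limit_seconds


def toy_training_loop(callback, max_epochs):
    """Run up to `max_epochs` epochs and return how many were completed."""
    callback.on_train_start()
    for epoch in range(max_epochs):
        # ... one epoch of training would run here ...
        if not callback.on_epoch_end():
            return epoch + 1
    return max_epochs


# A fake clock that advances one "second" per call keeps the example deterministic.
ticks = iter(range(100))
print(toy_training_loop(TimeLimit(3, clock=lambda: next(ticks)), max_epochs=10))  # 3
```

Checking the budget only at epoch boundaries means a long epoch can overshoot the limit; a finer-grained check would need a hook inside the batch loop.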

BREAKING new feature

opened by yx1215 18
Predict for future date without target value

I need to predict for future dates, with some dates missing between the end of the training data and the date I want to predict, so I won't have any target values for them. When I use NaN for the target series, my forecast is mostly 0.
question

opened by ManikandanThangavelu 18
Use `pd.Period` instead of `pd.Timestamp`.

Issue #, if available:

Description of changes:

enhancement BREAKING other change

opened by jaheba 17
Multiprocessing data loader.
Issue, if available: With a multiprocessing data loader we should overcome data loading bottlenecks. Will fix issue https://github.com/awslabs/gluon-ts/issues/682.

Description of changes:

Datasets use the class attributes of MPWorkerInfo to get information about their multiprocessing environment.

Datasets are replicated among workers (only the object reference though, not the physical dataset); this happens exactly once at the beginning of training

Datasets are not cached by default (caching not implemented so far)

Data loading can now be done in a multiprocessing fashion by specifying the number of workers, this works for training set and validation set (for inference there is some bug for now, but that has the least impact on performance of all datasets)

Parallelisation for datasets: modulo based, i.e. every num_workers-th time series is assigned to the corresponding worker. However, this does not guarantee that the batches will always be sampled from equidistant locations for training, since some workers could potentially be slower or faster

The data loaders return batches of transformed samples of batch_size in the requested context. The transformation is done according to the provided transformation.

There is no threading support (wouldn't make sense since we are also doing computation heavy transformations), there is no memory pinning support (not necessary since we load the batches into the right context right away)

Which exact batches and samples one gets is nondeterministic if num_workers > 0
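
The modulo-based assignment described above can be sketched with the standard library alone. The helper worker_shard is hypothetical and does not use MPWorkerInfo; it only shows which entries each worker would see.

```python
from itertools import islice


def worker_shard(dataset, worker_id, num_workers):
    """Yield every `num_workers`-th entry, starting at offset `worker_id`."""
    return islice(dataset, worker_id, None, num_workers)


series_ids = list(range(10))  # stand-ins for time series entries
shards = [list(worker_shard(series_ids, w, 3)) for w in range(3)]
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Each entry lands in exactly one shard, but since workers run at different speeds, the order in which their batches arrive is not fixed.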

Future extensibility:

the main functions to modify will be the batching function and the stacking function; the transformation can already be replaced by any that produces a list of samples when applied to a dataset

any dataset that makes use of the MPWorkerInfo class can be effectively parallelized

Missing functionality:

Shuffling (beyond a single batch)

Dataset caching

Correct documentation

Current bugs:

No mp support for windows due to pickling error.

No mp support for InferenceDataLoader due to pickling error.

Possible improvements

Create named tuple for all the different data the worker processes use

Only pass subset of dataset to worker

Switch away from Pool to something that allows for more fine-grained control, like manually creating Processes as seen in Pytorch's data-loader or make use of libraries like Ray using Actors

opened by AaronSpieler 17
Multivariate time series forecasting question

My apologies in advance for the ignorant questions; while I’m not necessarily new to deep learning, I’m fairly new to time series forecasting, especially when using deep learning techniques for it.

Due to the fact gluon-ts is making use of DL based approaches, dealing with non-stationarity in training datasets is not necessary, unlike when using AR/MA and VAR based models, correct? This appears to be outlined here.

Also, I am working with a multivariate time series dataset in which the target/dependent variable is related and/or dependent on other features/independent variables. So, while I’m only trying to predict one target variable, the relationship between this target variable and the other features is important; consequently, this leads to two questions.

First, since the relationship between the target variable and other features is important, are the most applicable models deepvar and gpvar or will other models in gluon-ts work and I’m just thinking too much in terms of classical time series forecasting?

Second, if I’m using deepvar or gpvar, I’m assuming that when making the dataset, the target should be a vector of vectors which include my target variable and the other features, right? However, if I’m thinking too much in terms of classical time series forecasting, target should be a vector of the target variable and I should store the other features as vectors of vectors in either dynamic_feat or cat, right?

Again, I’m sorry for my ignorance. Thanks in advance for any assistance you provide.
question

opened by CMobley7 17
TemporalFusionTransformer implementation in PyTorch
Description of changes:

Unlike the MXNet implementation of TFT, the PyTorch versions of the model and estimator are compatible with the dataset schema used by DeepAR. For example, if we construct the estimator as follows

estimator = TemporalFusionTransformerEstimator(
    ...,
    dynamic_dims=[2, 5, 7],
    dynamic_cardinalities=[6, 5, 5, 4, 2],
    past_dynamic_dims=[3, 3],
    past_dynamic_cardinalities=[5, 2, 4],
)

it will expect to receive a dataset where each time series has the keys

"feat_dynamic_real": shape [..., sum(dynamic_dims)] (inside the network this feature will get partitioned into 3 chunks of dim 2, 5, 7)

"feat_dynamic_cat": shape [..., len(dynamic_cardinalities)]

"past_feat_dynamic_real": shape [..., sum(past_dynamic_dims)]

"past_feat_dynamic_cat": shape [..., len(past_dynamic_cardinalities)]

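As a rough picture of the partitioning mentioned for "feat_dynamic_real", the sketch below splits one flat feature row into consecutive chunks of sizes 2, 5 and 7. The helper split_by_dims is hypothetical and works on plain lists; the network itself operates on tensors and may do this differently.

```python
from itertools import accumulate


def split_by_dims(features, dims):
    """Split one flat feature vector into consecutive chunks of the given sizes."""
    if len(features) != sum(dims):
        raise ValueError("feature length must equal sum(dims)")
    bounds = [0, *accumulate(dims)]
    return [features[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


dynamic_dims = [2, 5, 7]
row = list(range(sum(dynamic_dims)))  # one time step of "feat_dynamic_real", 14 values
chunks = split_by_dims(row, dynamic_dims)
print([len(c) for c in chunks])  # [2, 5, 7]
```
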
To do:

[ ] Move QuantileOutput and TFTInstanceSplitter to another folder?

[ ] Benchmarking

[ ] Add tests

opened by shchur 2

Wrong splitting when generating rolling datasets on multivariate DataFrame

Description

When calling generate_rolling_dataset on a multivariate dataset (dataset = PandasDataset(dict(df_wide))), it generates time series of different lengths in rolled (rolled = generate_rolling_dataset(dataset=dataset, strategy = StepStrategy(prediction_length=1))). Let the lengths be [5, 6, 7, 8, 9] and the target dim (i.e., the number of variates) be 3 for simplicity.

If I use MultivariateGrouper on rolled, it splits rolled into several data entries (in multivariate_grouper). We hope that the series in each data entry share the same length, while series in different data entries have different lengths. That is, we would hopefully get five data entries (with lengths 5, 6, 7, 8, 9), each containing three time series (one per variate). We denote this as: (5,5,5), (6,6,6), ..., (9,9,9).

However, the series in each data entry have different lengths, like (5,6,7),(5,7,8),(5,7,9)...., meaning that these series are not assigned as we hope.
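
The grouping hoped for above can be written down in a few lines. This is only a sketch of the expected outcome on made-up data, using a hypothetical helper group_by_length; it is not how MultivariateGrouper is implemented.

```python
from collections import defaultdict


def group_by_length(series_list):
    """Map each series length to the list of series having that length."""
    groups = defaultdict(list)
    for series in series_list:
        groups[len(series)].append(series)
    return dict(groups)


# Three variates, each rolled to lengths 5..9 (hypothetical values).
rolled = [[float(v)] * n for v in range(3) for n in range(5, 10)]
groups = group_by_length(rolled)
print({n: len(g) for n, g in sorted(groups.items())})  # {5: 3, 6: 3, 7: 3, 8: 3, 9: 3}
```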

To Reproduce

df_wide =

           || series 1 || series2
2000-01-01 || 0        || 1
2000-01-02 || 0        || 1
2000-01-03 || 0        || 1
2000-01-05 || 1        || 0
...
2000-01-09 || 1        || 0

dataset = PandasDataset(dict(df_wide))
rolled = generate_rolling_dataset(dataset=dataset, strategy = StepStrategy(prediction_length=2),start_time="2000-01-04")

test_grouper = MultivariateGrouper(num_test_dates=5, max_target_dim=2)
dataset_test = test_grouper(rolled)

Error message or code output


/home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py:197: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return {FieldName.TARGET: np.array([funcs(data) for data in dataset])}
Traceback (most recent call last):
......
  File "/home/cheryl/workspace/diff/Ongoing/dataload.py", line 80, in for_pred
    dataset_test = test_grouper(rolled)
  File "/home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py", line 87, in __call__
    return self._group_all(dataset)
  File "/home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py", line 125, in _group_all
    grouped_dataset = self._prepare_test_data(dataset)
  File "/home/cheryl/anaconda3/envs/diff/lib/python3.7/site-packages/gluonts/dataset/multivariate_grouper.py", line 162, in _prepare_test_data
    list(dataset_at_test_date), dtype=np.float32
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (21,) + inhomogeneous part.

Environment

Operating system: Linux Ubuntu
Python version: 3.7
GluonTS version: 0.10.0
MXNet version:


bug

opened by cherylLbt 2

Integrating N-Linear and D-Linear

Are there any plans to integrate those models as proposed in

Zeng, A., Chen, M., Zhang, L., & Xu, Q. (2022). Are Transformers Effective for Time Series Forecasting?. arXiv preprint arXiv:2205.13504.

Darts already implemented them. These models are also probabilistic.

https://unit8co.github.io/darts/generated_api/darts.models.forecasting.dlinear.html#r844e17822ca3-1
enhancement

opened by baniasbaabe 0
add TDformer

This is the official implementation of the paper "First De-trend then Attend: Rethinking Attention for Time-Series Forecasting" (https://openreview.net/pdf?id=GLc8Rhney0e), which was an internship project at AWS and has been accepted at the NeurIPS 2022 All Things Attention workshop.
new feature

opened by xiyuanzh 0
Consolidate `DeepNPTSEstimator`
Description of changes:

Align DeepNPTSEstimator.train arguments to those of Estimator and PyTorchLightningEstimator in particular (move many of its arguments to the constructor)

Make dropout_rate not Optional: it's a float that can be zero

Consolidate tests for DeepNPTSEstimator with similar ones for other estimator classes

Rename DeepNPTSMultiStepPredictor -> DeepNPTSMultiStepNetwork to avoid confusion (since it's not a Predictor subclass)

Fix docstrings that ended up formatted wrong for some reason

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

enhancement BREAKING
opened by lostella 1
calculate_seasonal_error bug for multivariate data
Description

calculate_seasonal_error is buggy when the inputs are multivariate data. past_data = extract_past_data(time_series, forecast) results in a (dim, time) shaped array rather than a (time,) shaped array, which is then passed into calculate_seasonal_error, which calls the following code snippet: https://github.com/awslabs/gluonts/blob/cac5e6b91cbd545c0811b20849290e9215c1b822/src/gluonts/evaluation/_base.py#L323-L326 Rather than calculating the seasonal error along the time axis, it takes differences between the multivariate dimensions.

https://github.com/awslabs/gluonts/blob/4fef7e26470d15096b11b005be846dedf87fb736/src/gluonts/evaluation/metrics.py#L49-L50

To Reproduce

from datetime import timedelta

import pandas as pd
import numpy as np

from gluonts.evaluation import MultivariateEvaluator
from gluonts.model.forecast import QuantileForecast

dim = 5
time = 20
past = 40

forecast = QuantileForecast(
    forecast_arrays=np.random.randn(3, dim, time),
    forecast_keys=['mean', '0.95', '0.05'],
    start_date=pd.to_datetime('today').to_period('H')
)

time_series = pd.DataFrame(
    zip(*[np.arange(past + time) for _ in range(dim)]),
    pd.period_range(start=pd.to_datetime('today') - timedelta(hours=past), periods=past+time, freq='H')
)

evaluator = MultivariateEvaluator(seasonality=1)
evaluator.get_metrics_per_ts(time_series, forecast)

Error message or code output

'seasonal_error': 0.0

Desired message or code output

'seasonal_error': 1.0
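The difference between the two outputs can be illustrated with plain NumPy, without GluonTS. The sketch below assumes the seasonal error is the mean absolute difference between the data and its copy shifted by the seasonality; both function names are hypothetical, and the second one is only one possible way to difference along the time axis, not the fix adopted by the library.

```python
import numpy as np


def seasonal_error_first_axis(past_data, seasonality=1):
    """Difference along axis 0, whatever the rank of the array."""
    y_t = past_data[:-seasonality]
    y_tm = past_data[seasonality:]
    return float(np.mean(np.abs(y_t - y_tm)))


def seasonal_error_time_axis(past_data, seasonality=1):
    """Difference along the last axis, assumed here to be time."""
    y_t = past_data[..., :-seasonality]
    y_tm = past_data[..., seasonality:]
    return float(np.mean(np.abs(y_t - y_tm)))


dim, past = 5, 40
# Shape (dim, time): every dimension holds the same ramp 0, 1, 2, ...
past_data = np.tile(np.arange(past, dtype=float), (dim, 1))

buggy = seasonal_error_first_axis(past_data)  # compares dimensions with each other
fixed = seasonal_error_time_axis(past_data)   # compares consecutive time steps
print(buggy, fixed)  # 0.0 1.0
```

For a univariate (time,) array both functions agree, which is consistent with the problem only showing up for multivariate inputs.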

Environment

GluonTS version: '0.11.4'

bug
opened by gorold 5

Releases(v0.11.7)

v0.11.7(Jan 3, 2023)
Backporting fixes:

Make serde.dataclass always kw-only. (#2428 by @jaheba)

Fix serde.dataclass inheritance handling. (#2512 by @jaheba)

Fix QuantileForecast.quantile in case only mean is stored (#2513 by @lostella)

Remove mypy plugin for dataclass. (#2514 by @jaheba)

GH Actions: Use authenticated requests for just. (#2522 by @jaheba)

Fix aggregate_valid for non-numerical columns (#2526 by @lostella)

v0.11.6(Dec 20, 2022)
Backporting fixes:

itertools.select. #2426 by @jaheba

Fix dataclass handling of member inheritance. #2492 by @jaheba

Fix: sort dataset keys in error message when importing non-existing dataset #2497 by @lostella

Fix DateSplitter for multiples of base frequencies #2500 by @lostella

Fix docstrings according to docformatter #2501 by @lostella

Add examples to docstring for periods_between #2504 by @lostella

Cap numpy compatibility in mxnet extra requirements #2506 by @lostella

Pin docformatter version. #2507 by @jaheba

Docs: Fix install instructions. #2508 by @jaheba

v0.11.5(Dec 13, 2022)
What's Changed

Backports for v0.11.5. by @jaheba in https://github.com/awslabs/gluonts/pull/2491

Full Changelog: https://github.com/awslabs/gluonts/compare/v0.11.4...v0.11.5
v0.11.4(Dec 5, 2022)
Backports:

Fix pandas issue with inferring start of X frequency. (#2462 by @jaheba)

v0.11.3(Nov 24, 2022)
Backporting fixes:

Add test cases for PandasDataset, fix missing assertion (#2453 by @lostella)

Speed up PandasDataset further (#2441 by @lostella)

Fix MANIFEST.in (#2456 by @lostella)

v0.11.3rc1(Nov 24, 2022)
Backporting fixes:

Add test cases for PandasDataset, fix missing assertion (#2453 by @lostella)

Speed up PandasDataset further (#2441 by @lostella)

Fix MANIFEST.in (#2456 by @lostella)

v0.11.2(Nov 21, 2022)
Backporting fixes:

Fix rotbaum random seed and num_samples argument. (#2408 by @sighellan)

Hierarchical: Make sure the input S matrix is of right dtype (#2415 by @rshyamsundar)

Mypy fixes (#2427 by @jaheba)

Speed up PandasDataset for long dataframes (#2435 by @lostella)

Fix frequency inference in PandasDataset (#2442 by @lostella)

Tests: Change Python versions. (#2448 by @jaheba)

v0.11.1(Oct 28, 2022)
Backporting fixes:

Fix dominick dataset bug. (#2364 by @haskarb)

Remove strange quoting marks from docstrings (#2368 by @lostella)

Consistent use of term "prediction interval" (#2373 by @codingWhale13)

Fix MQCMM ignoring zero-seed. (#2379 by @sighellan)

v0.10.8(Oct 28, 2022)
Backporting fixes:

Fix numerical bug in BinnedUniforms (#2344 by @moudheus)

Fix dominick dataset bug. (#2364 by @haskarb)

Fix MQCMM ignoring zero-seed. (#2379 by @sighellan)

v0.11.0(Oct 10, 2022)
Overview

Incremental training

Estimators are now re-trainable on new data, using the train_from method. This accepts a previously trained model (predictor), and new data to train on, and can greatly reduce training time if combined with early stopping. The feature is integrated with gluonts.shell-based SageMaker containers, and can be used by specifying the additional model channel to point to the output of a previous training job. More info in #2249.

New models

Two models are added in this release:

DeepVARHierarchicalEstimator, a hierarchical extension to DeepVAREstimator; learn more about how to use this in this tutorial.

DeepNPTSEstimator, a global extension to NPTS, where sampling probabilities are learned from data; learn more on how to use this estimator here.

Deprecated import paths and options

This release moves MXNet-based models from gluonts.model to gluonts.mx.model; the old import paths continue working in this release, but are deprecated and will be removed in the next release. For example, now the MXNet-based DeepAREstimator should be imported from gluonts.mx (or gluonts.mx.model.deepar).

We also removed deprecated options for learning rate reduction in the gluonts.mx.Trainer class: these can now be controlled via the LearningRateReduction callback.

Dataset splitting functionality (experimental)

We updated the functionality to split time series datasets (along the time axis) for training/validation/test purposes. Now this functionality can be easily accessed via the split function (from gluonts.dataset.split import split); learn more about this here.

This feature is experimental and subject to future changes.
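The idea of splitting along the time axis and producing (input, output) pairs for testing can be sketched with plain NumPy. This is only an illustration of the concept under simple assumptions (a single univariate series, non-overlapping windows); the helper names split_at_offset and rolling_test_pairs are hypothetical and are not part of gluonts.dataset.split.

```python
import numpy as np


def split_at_offset(target, offset):
    """Split a series along time: everything before `offset` is training data."""
    return target[:offset], target[offset:]


def rolling_test_pairs(target, offset, prediction_length, windows):
    """Yield (input, output) pairs: the input is the history up to a cut point,
    the output is the next `prediction_length` values."""
    n = len(target)
    start = offset if offset >= 0 else n + offset
    for w in range(windows):
        cut = start + w * prediction_length
        if cut + prediction_length > n:
            break  # not enough data left for a full window
        yield target[:cut], target[cut:cut + prediction_length]


target = np.arange(48.0)
train, _ = split_at_offset(target, -36)  # hold out the last 36 points
pairs = list(rolling_test_pairs(target, -36, prediction_length=12, windows=3))

for inp, out in pairs:
    print(len(inp), out[0], out[-1])
```

Each pair keeps the full history as input, so later windows see more data than earlier ones; whether windows should overlap is a design choice this sketch does not address.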

Changelog

Breaking changes

Breaking: Update data splitters to return (input, output) pairs in the test split (#2031 by @npnv)

Breaking: Move MXNet-based models to mx.model. (#2126 by @Hongqing-work)

Convert time-features into functions. (#2149 by @jaheba)

Remove deprecated args from mx.Trainer. (#2153 by @jaheba)

Reduce sdist size. (#2199 by @jaheba)

Remove core.exception module. (#2202 by @jaheba)

Remove core.ty. (#2203 by @jaheba)

Update gluonts.dataset.split code, test, docs (#2223 by @lostella)

Remove gluonts_forecasters entrypoint mechanic. (#2278 by @jaheba)

Enable 'python -m gluonts'. (#2292 by @jaheba)

New features / major improvements

Interrupting mx.Trainer stops training. (#2131 by @Hongqing-work)

Expose evaluator aggregation_strategy functions (#2198 by @kashif)

Add data preparation utility for hierarchical time series and a tutorial notebook (#2206 by @rshyamsundar)

Add Deep NPTS model (#1835 by @rshyamsundar)

Improve arrow reading performance. (#2217 by @mr-1993)

Allow DeepVAR model to use (global) dynamic features (#2226 by @rshyamsundar)

Hierarchical: Allow use of external dynamic features and add a section in the tutorial (#2253 by @rshyamsundar)

Add serde.dataclass. (#2166 by @jaheba)

R: Add Python wrapper for calling R's hierarchical methods (#1685 by @rshyamsundar)

Add learning rate and weight decay arguments to PyTorch estimators (#2289)

Added LR scheduler to DeepAR Pytorch (#2287 by @shubhamkapoor)

Add LR scheduling patience option to MQF2 (#2291 by @lostella)

Add incremental training (#2249 by @lostella)

Add input size and type information to DeepARModel, and example_input_array to DeepARLightningModule. (#2307 by @jgasthaus)

Add dataset.schema.translate. (#2304 by @jaheba)

Add forecast_start to entry-wise metrics in evaluator (#2312 by @lostella)

Bug fixes / minor improvements

Fix DatasetCollection (#2135 by @rsnirwan)

Fix PandasDataset for Python 3.9 (#2141 by @lostella)

Make PandasDataset faster (#2148 by @lostella)

Ignore divide warnings in evaluation. (#2159 by @jaheba)

Fix Prophet wrapper to work with Timestamp instead of Period (#2182 by @lostella)

Fix dtype for "item_id" column in metrics dataframe (#2183 by @lostella)

Fix recursive case for gluonts.mx.batchify.stack (#2184 by @lostella)

Fix item_id values in ConstantValuePredictor (#2192 by @codingWhale13)

Fixup Patience class. (#2197 by @jaheba)

Fix dataset arrow writer tool. (#2196 by @jaheba)

Fix SymbolBlock serde issue (#2187 by @lostella)

Add item id to Uber TLC dataset (#2214 by @mvanness354)

Fix r_forecast wrapper to shift start date when truncating time series (#2216 by @abdulfatir)

Fix dtype bug in piecewise_linear and add a test (#2224 by @rshyamsundar)

Fix bug in to_quantile_forecast (#2225 by @eugeneteoh)

Fix gluonts.mx.trainer.Trainer in case of empty data loader (#2228 by @lostella)

Fix feed-forward models when features are provided (#2238 by @lostella)

update SplicedBinnedPareto demos from nursery version to gluonts version (#2250 by @elenaehrlich)

Improve len() for ParquetFile. (#2261 by @jaheba)

Move max_idle_transform usage to GluonEstimator. (#2262 by @jaheba)

Optimize TimeSeriesSlice performance (#2259 by @lostella)

Fix ignore hidden files when generating datasets (#2263 by @kashif)

Fix: set max idle transforms in PyTorch estimators (#2266 by @lostella)

Fix QuantileForecast.plot() to use DateTimeIndex (#2269 by @abdulfatir)

Fix serde dataclass eventual. (#2277 by @jaheba)

Fix gluonts.dataset.split for multivariate case (#2314 by @lostella)

Improve TestData class in gluonts.dataset.split (#2315 by @lostella)

Simplify make_evaluation_predictions (#2309 by @lostella)

Fix MQCNN for kernel_size=1 (#2321 by @lostella)

Simplify unbatching in forecast-generator. (#2334 by @jaheba)

Fix numerical bug in BinnedUniforms (#2344 by @moudheus)

Documentation

Docs: Make notebook templates. (#2122 by @jaheba)

Docs: Rework installation section. (#2130 by @jaheba)

Docs: Fix running tutorials for publishing docs. (#2138 by @jaheba)

Docs: Update hyperparameter tuning with optuna notebook. (#2137 by @npnv)

Fix issues with hyperparameter tuning tutorial (#2143 by @lostella)

Apply black to notebooks. (#2144 by @jaheba)

Docs: Simplify wide DataFrame example (#2150 by @lostella)

Docs: fix links in models table (#2156 by @lostella)

Add 'Background' section to docs. (#2129 by @jaheba)

Docs: Add info about version guarantees. (#2161 by @jaheba)

Docs: fix tutorial after breaking changes in trainer class (#2179 by @lostella)

Add tutorial with data splitting examples (#2157 by @npnv)

Fix: add missing link to splitting tutorial (#2185 by @lostella)

Fix: ensure last cell of tutorials runs (#2186 by @lostella)

Fixes to the dataset splitting tutorial (#2189 by @npnv)

Update TSBench readme with paper reference (#2191 by @geoalgo)

Update Available models table with the hierarchical model (#2209 by @rshyamsundar)

Fix broken links in Available-models table (#2211 by @rshyamsundar)

Add logo to README. (#2248 by @jaheba)

New logo. (#2243 by @jaheba)

Use brand colors in docs. (#2257 by @jaheba)

Docs: Reformatting table, badge colors. (#2258 by @jaheba)

Docs: update contribution guidelines and dev setup (#2270 by @lostella)

Add Github footer icon to docs. (#2285 by @jaheba)

Docs: Custom Pygments style for dark theme. (#2290 by @jaheba)

Fix README quick examples (#2297 by @lostella)

Fix text in Quick Start Tutorial (#2300 by @sighellan)

Update README and tutorial (#2311 by @lostella)

Turn on apidoc generation (#2332 by @jaheba)

Add info on how to use 'just' (#2339 by @codingWhale13)

Small documentation improvements (#2343 by @codingWhale13)

Test / setup changes

add python 3.9 to test workflows (#2136)

Tests: Move mx model test. (#2158 by @jaheba)

Test: Use spawn method for shell server tests. (#2177 by @jaheba)

Remove holidays and matplotlib from core dependencies. (#2055 by @jaheba)

Update minimal version for nbconvert. (#2233 by @jaheba)

Hierarchical: Add a test for to_dataset method (#2265 by @rshyamsundar)

Fix mypy and black commands in pre-commit githook (#2271 by @abdulfatir)

Update project_urls. (#2274 by @jaheba)

Move _version to meta. (#2293 by @jaheba)

Remove setup-requires. (#2295 by @jaheba)

Remove pytest.ini. (#2298 by @jaheba)

Speed up smoke tests (#2341 by @lostella)

v0.10.7(Sep 27, 2022)
Backporting fixes:

Add Github footer icon to docs. (#2285 by @jaheba)

Docs: Custom Pygments style for dark theme. (#2290 by @jaheba)

Fix README quick examples (#2297 by @lostella)

Fix text in Quick Start Tutorial (#2300 by @sighellan)

Update README and tutorial (#2311 by @lostella)

Fix MQCNN for kernel_size=1 (#2321 by @lostella)

v0.10.6(Sep 6, 2022)
Backporting fixes:

Improve len() for ParquetFile. (#2261 by @jaheba)

Max idle transform fix (#2262 by @jaheba)

Fix ignore hidden files when generating datasets (#2263 by @kashif)

Fix: set max idle transforms in PyTorch estimators (#2266 by @lostella)

Fix QuantileForecast.plot() to use DateTimeIndex (#2269 by @abdulfatir)

v0.10.5(Aug 26, 2022)
Backporting fixes:

Fix broken links in Available-models table (#2211 by @rshyamsundar)

Fix r_forecast wrapper to shift start date when truncating time series (#2216 by @abdulfatir)

Improve arrow reading performance (#2217 by @mr-1993)

Fix dtype bug in piecewise_linear and add a test (#2224 by @rshyamsundar)

Fix bug in to_quantile_forecast (#2225 by @eugeneteoh)

Fix gluonts.mx.trainer.Trainer in case of empty data loader (#2228 by @lostella)

Fix feed-forward models when features are provided (#2238 by @lostella)

Full changelog: https://github.com/awslabs/gluon-ts/compare/v0.10.4...v0.10.5
v0.9.9(Aug 26, 2022)
Backporting fixes:

Fix r_forecast wrapper to shift start date when truncating time series (#2216 by @abdulfatir)

Fix dtype bug in piecewise_linear and add a test (#2224 by @rshyamsundar)

Fix bug in to_quantile_forecast (#2225 by @eugeneteoh)

Fix gluonts.mx.trainer.Trainer in case of empty data loader (#2228 by @lostella)

Fix feed-forward models when features are provided (#2238 by @lostella)

Full Changelog: https://github.com/awslabs/gluon-ts/compare/v0.9.8...v0.9.9
v0.10.4(Aug 14, 2022)
Backporting fixes:

Fix SymbolBlock serde issue (#2187 by @lostella)

Fix dataset arrow writer tool. (#2196 by @jaheba)

Expose evaluator aggregation_strategy functions (#2198 by @kashif)

Update Available models table with the hierarchical model (#2209 by @rshyamsundar)

v0.9.8(Aug 14, 2022)
Backporting fixes:

Fix SymbolBlock serde issue (#2187 by @lostella)

v0.10.3(Aug 8, 2022)
Backporting fixes:

Fix Prophet wrapper to work with Timestamp instead of Period (#2182 by @lostella)

Fix dtype for "item_id" column in metrics dataframe (#2183 by @lostella)

Fix recursive case for gluonts.mx.batchify.stack (#2184 by @lostella)

Fix: ensure last cell of tutorials runs (#2186 by @lostella)

Fix item_id values in ConstantValuePredictor (#2192 by @codingWhale13)

v0.9.7(Aug 8, 2022)
Backporting fixes:

Fix dtype for "item_id" column in metrics dataframe (https://github.com/awslabs/gluon-ts/pull/2183 by @lostella)

Fix recursive case for gluonts.mx.batchify.stack (https://github.com/awslabs/gluon-ts/pull/2184 by @lostella)

Fix item_id values in ConstantValuePredictor (https://github.com/awslabs/gluon-ts/pull/2192 by @codingWhale13)

v0.10.2(Jul 14, 2022)
Backport fixes:

Make PandasDataset faster (#2148 by @lostella)

Interrupting mx.Trainer stops training. (#2131 by @Hongqing-work)

Ignore divide warnings in evaluation. (#2159 by @jaheba)

v0.10.1(Jul 8, 2022)
Backporting fixes:

Docs: Make notebook templates. (#2122 by @jaheba)

Docs: Rework installation section. (#2130 by @jaheba)

Fix DatasetCollection for Python 3.9. (#2135 by @rsnirwan)

Docs: Fix running tutorials for publishing docs. (#2138 by @jaheba)

Fix PandasDataset for Python 3.9 (#2141 by @lostella)

Fix issues with hyperparameter tuning tutorial (#2143 by @lostella)

Docs: Apply black to notebooks. (#2144 by @jaheba)

v0.10.0(Jun 30, 2022)
Overview

Arrow based datasets

We have added support for Parquet-files, as well as Arrow's binary format. This is an opt-in feature, requiring pyarrow to be installed. Use pip install 'gluonts[pro]' or pip install 'gluonts[arrow]' to ensure the correct version is installed.

FileDataset has been reworked to support .parquet and .arrow files. Previously, it had assumed all files to use jsonlines. To continue using jsonlines, ensure that the files use one of the .json, .jsonl, .json.gz, or .jsonl.gz suffixes.

Depending on the dataset size and shape, Arrow can be much faster than the json variant. In more extreme cases we saw speedups of more than 100x when using arrow vs jsonlines (see #2003 for some examples).

To convert a given dataset into arrow, you can use the gluonts.dataset.arrow utility:

python -m gluonts.dataset.arrow write </path/to/dataset> my-dataset.arrow
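Since the file suffix now determines how a file is read, it can help to check suffixes before pointing FileDataset at a directory. The sketch below is a hypothetical stand-alone helper (guess_format is not a GluonTS function) that mirrors the suffix rules stated above; the actual dispatch logic inside GluonTS may differ.

```python
from pathlib import Path

# Suffixes that the release notes list as jsonlines files.
JSONL_SUFFIXES = (".json", ".jsonl", ".json.gz", ".jsonl.gz")


def guess_format(path):
    """Return 'parquet', 'arrow', 'jsonlines', or None based on the file name."""
    name = Path(path).name.lower()
    if name.endswith(".parquet"):
        return "parquet"
    if name.endswith(".arrow"):
        return "arrow"
    if name.endswith(JSONL_SUFFIXES):
        return "jsonlines"
    return None


for example in ["train/data.json.gz", "my-dataset.arrow", "test/part-0.parquet", "notes.txt"]:
    print(example, "->", guess_format(example))
```

A file that maps to None here would be worth renaming or converting before use.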

PandasDataset

We have added support for pandas.DataFrame and pandas.Series as well. You can now directly model data given in a DataFrame using gluonts.dataset.pandas.PandasDataset. In this tutorial we describe in depth how you can use PandasDataset to speed up modelling using GluonTS.

Changelog

New Features

#1631 - Add TimeLimitCallback to mx/trainer callbacks. (by @yx1215)

#1780 - adding MQF2 (Multi-horizon) (by @KelvinKan)

#1903 - Added QuarterlyBegin time feature (by @kashif)

#1924 - Porting SimpleFeedForwardEstimator to PyTorch (by @lostella)

#1925 - DeepAR PyTorch: make samplers configurable (by @lostella)

#1935 - added support for pandas dataframes (by @rsnirwan)

#1962 - Add support for beta-NLL loss (by @kashif)

#1982 - Add Uber-TLC dataset to dataset repository. (by @Hongqing-work)

#1990 - Add info cli. (by @jaheba)

#1987 - Add HP tuning example with Optuna (by @npnv)

#2000 - Add arrow-based dataset. (by @vafl, @lostella, @jaheba)

#2002 - add ND for item_metrics (by @melopeo)

#2006 - Added support of "long" RTS, making short RTS be "past_feat_dynamic_real" (by @zoolhasson)

#2061 - Add DatasetWriter. (by @jaheba)

#2074 - Add support for second frequency. (by @kashif)

Breaking Changes

#1917 - Breaking: Fix return types of features (by @lostella)

#1941 - Breaking: Update dependency fbprophet -> prophet (by @lostella)

#1946 - Breaking: Split incremental quantile output into separate class (by @lostella)

#1965 - Breaking: reorg torch package, shorten import paths (by @lostella)

#1980 - Use pd.Period instead of pd.Timestamp. (by @jaheba)

#1997 - Remove freq argument from Forecast. (by @kashif)

#2011 - Remove dct_reduce. (by @jaheba)

#2017 - Remove mandatory freq attribute of Predictor. (by @kashif)

#2018 - Remove multiprocessing dataloader. (by @jaheba)

#2019 - Rework FileDataset. (by @jaheba)

#2053 - Add dataset_writer to get_dataset. (by @Hongqing-work)

#2070 - Add jsonl.encode_json, remove serialize_data_entry. (by @jaheba)

Bug Fixes / Minor Improvements

#1704 - Settings._let will pop element it added instead of just the last one. (by @jaheba)

#1905 - Fix typing issues in torch estimators, update base estimators docstrings (by @lostella)

#1909 - Fix the use of the scaling parameter in Transformer model (by @StanislasGuinel)

#1916 - Fix AddTimeFeatures transformation for multiples of base frequencies (by @lostella)

#1920 - Fix: use broadcast_lesser in place of comparisons in ISQF (by @vincentqb)

#1931 - Fix dummy estimator (by @canerturkmen)

#1933 - Fix Pytorch Lightning tutorial. (by @jaheba)

#1938 - Fixed autograd inplace operations error in Transformed Distribution (by @shubhamkapoor)

#1950 - Fix: Hard threshold positive distribution parameters (by @lostella)

#1952 - Fix forecast keys (quantiles) output by TemporalFusionTransformer (by @lostella)

#1968 - Fix: use of num_parallel_samples in deepAR (by @kashif)

#1969 - Fix: torch DeepAR observed indicator in multivariate case (by @kashif)

#1975 - use FieldName (by @kashif)

#1983 - Documentation: add docstrings for torch-based models (by @lostella)

#1986 - Fix OffsetSplitter for negative offsets (by @lostella)

#1989 - Pin protobuf version. (by @jaheba)

#1991 - Remove packaged pytorch-ts from gluonts.nursery.SCott (by @lostella)

#1999 - Documentation: fix and speed up tutorials (by @lostella)

#2004 - Refactor splitter assertion and add error message (by @RSNirwan)

#2005 - Rework itertools, add col-to-row and row-to-col functions. (by @jaheba)

#2008 - Re-add cache for parsing 'pd.Period'. (by @jaheba)

#2013 - Update website template, clean up homepage and tutorials (by @lostella)

#2014 - Expose Estimator, Predictor, Forecast in gluonts.model. (by @jaheba)

#2015 - Fix mean in AffineTransformedDistribution (by @stailx)

#2016 - Fix torch affine transformed distribution (by @lostella)

#2020 - Remove unnecessary files from docs folder, update gitignore (by @lostella)

#2021 - Update references to dev branch. (by @lostella)

#2024 - Fix README. Use DataFramesDataset. (by @jaheba)

#2025 - Make HP tuning tutorial more accurate (by @jaheba)

#2028 - Re-add support for Python 3.6 (by @jaheba)

#2029 - Add support for nan values in Rotbaum (by @zoolhasson)

#2035 - Simplify lag values computation in torch DeepAR (by @lostella)

#2036 - Minor improvements to the hierarchical model (by @rshyamsundar)

#2047 - Make Quantile derive from pydantic.BaseModel. (by @jaheba)

#2050 - Add concepts section to docs. (by @jaheba)

#2051 - Add tutorial on DataFramesDataset (by @RSNirwan)

#2057 - Add optional parameter time_axis to forecast_start. (by @melopeo)

#2062 - Fix type annotations for predict_to_numpy (by @lostella)

#2066 - Always pass freq explicitly to pd.period_range. (by @kashif)

#2068 - Docs: simplify call to evaluator (by @lostella)

#2092 - Fix: DistributionLoss not encodable. (by @jaheba)

#2098 - Add Airtraffic dataset. (by @jaheba)

#2108 - Fixup trainer in case of non-finite loss. (by @jaheba)

#2121 - Change default behavior for TrainDatasets overwrite (by @nklingen)

v0.9.6(Jun 30, 2022)
Backporting fixes:

Fix: DistributionLoss not encodable (#2092 by @jaheba)

v0.10.0rc1(Jun 24, 2022)
Overview

Arrow based datasets

We have added support for Parquet-files, as well as Arrow's binary format. This is an opt-in feature, requiring pyarrow to be installed. Use pip install 'gluonts[pro]' or pip install 'gluonts[arrow]' to ensure the correct version is installed.

FileDataset has been reworked to support .parquet and .arrow files. Previously, it had assumed all files to use jsonlines. To continue using jsonlines, ensure that the files use one of the .json, .jsonl, .json.gz, or .jsonl.gz suffixes.

Depending on the dataset size and shape, Arrow can be much faster than the json variant. In more extreme cases we saw speedups of more than 100x when using arrow vs jsonlines (see #2003 for some examples).

To convert a given dataset into arrow, you can use the gluonts.dataset.arrow utility:

python -m gluonts.dataset.arrow write </path/to/dataset> my-dataset.arrow

PandasDataset

We have added support for pandas.DataFrame and pandas.Series as well. You can now directly model data given in a DataFrame using gluonts.dataset.pandas.PandasDataset. In this tutorial we describe in depth how you can use PandasDataset to speed up modelling using GluonTS.

Changelog

New Features

#1631 - Add TimeLimitCallback to mx/trainer callbacks. (by @yx1215)

#1780 - adding MQF2 (Multi-horizon) (by @KelvinKan)

#1903 - Added QuarterlyBegin time feature (by @kashif)

#1924 - Porting SimpleFeedForwardEstimator to PyTorch (by @lostella)

#1925 - DeepAR PyTorch: make samplers configurable (by @lostella)

#1935 - added support for pandas dataframes (by @rsnirwan)

#1962 - Add support for beta-NLL loss (by @kashif)

#1982 - Add Uber-TLC dataset to dataset repository. (by @Hongqing-work)

#1990 - Add info cli. (by @jaheba)

#1987 - Add HP tuning example with Optuna (by @npnv)

#2000 - Add arrow-based dataset. (by @vafl, @lostella, @jaheba)

#2002 - add ND for item_metrics (by @melopeo)

#2006 - Added support of "long" RTS, making short RTS be "past_feat_dynamic_real" (by @zoolhasson)

#2061 - Add DatasetWriter. (by @jaheba)

#2074 - Add support for second frequency. (by @kashif)

Breaking Changes

#1917 - Breaking: Fix return types of features (by @lostella)

#1941 - Breaking: Update dependency fbprophet -> prophet (by @lostella)

#1946 - Breaking: Split incremental quantile output into separate class (by @lostella)

#1965 - Breaking: reorg torch package, shorten import paths (by @lostella)

#1980 - Use pd.Period instead of pd.Timestamp. (by @jaheba)

#1997 - Remove freq argument from Forecast. (by @kashif)

#2011 - Remove dct_reduce. (by @jaheba)

#2018 - Remove multiprocessing dataloader. (by @jaheba)

#2019 - Rework FileDataset. (by @jaheba)

#2053 - Add dataset_writer to get_dataset. (by @Hongqing-work)

#2070 - Add jsonl.encode_json, remove serialize_data_entry. (by @jaheba)

Bug Fixes / Minor Improvements

#1704 - Settings._let will pop element it added instead of just the last one. (by @jaheba)

#1905 - Fix typing issues in torch estimators, update base estimators docstrings (by @lostella)

#1909 - Fix the use of the scaling parameter in Transformer model (by @StanislasGuinel)

#1916 - Fix AddTimeFeatures transformation for multiples of base frequencies (by @lostella)

#1920 - Fix: use broadcast_lesser in place of comparisons in ISQF (by @vincentqb)

#1931 - Fix dummy estimator (by @canerturkmen)

#1933 - Fix Pytorch Lightning tutorial. (by @jaheba)

#1938 - Fixed autograd inplace operations error in Transformed Distribution (by @shubhamkapoor)

#1950 - Fix: Hard threshold positive distribution parameters (by @lostella)

#1952 - Fix forecast keys (quantiles) output by TemporalFusionTransformer (by @lostella)

#1968 - Fix: use of num_parallel_samples in deepAR (by @kashif)

#1969 - Fix: torch DeepAR observed indicator in multivariate case (by @kashif)

#1975 - use FieldName (by @kashif)

#1983 - Documentation: add docstrings for torch-based models (by @lostella)

#1986 - Fix OffsetSplitter for negative offsets (by @lostella)

#1989 - Pin protobuf version. (by @jaheba)

#1991 - Remove packaged pytorch-ts from gluonts.nursery.SCott (by @lostella)

#1999 - Documentation: fix and speed up tutorials (by @lostella)

#2004 - Refactor splitter assertion and add error message (by @RSNirwan)

#2005 - Rework itertools, add col-to-row and row-to-col functions. (by @jaheba)

#2008 - Re-add cache for parsing 'pd.Period'. (by @jaheba)

#2013 - Update website template, clean up homepage and tutorials (by @lostella)

#2014 - Expose Estimator, Predictor, Forecast in gluonts.model. (by @jaheba)

#2015 - Fix mean in AffineTransformedDistribution (by @stailx)

#2016 - Fix torch affine transformed distribution (by @lostella)

#2020 - Remove unnecessary files from docs folder, update gitignore (by @lostella)

#2021 - Update references to dev branch. (by @lostella)

#2024 - Fix README. Use DataFramesDataset. (by @jaheba)

#2025 - Make HP tuning tutorial more accurate (by @jaheba)

#2028 - Re-add support for Python 3.6 (by @jaheba)

#2029 - Add support for nan values in Rotbaum (by @zoolhasson)

#2035 - Simplify lag values computation in torch DeepAR (by @lostella)

#2036 - Minor improvements to the hierarchical model (by @rshyamsundar)

#2047 - Make Quantile derive from pydantic.BaseModel. (by @jaheba)

#2050 - Add concepts section to docs. (by @jaheba)

#2051 - Add tutorial on DataFramesDataset (by @RSNirwan)

#2057 - Add optional parameter time_axis to forecast_start. (by @melopeo)

#2062 - Fix type annotations for predict_to_numpy (by @lostella)

#2068 - Docs: simplify call to evaluator (by @lostella)

v0.9.5(Jun 14, 2022)
Re-add support for Python 3.6 in v0.9.x. (#2032 by @jaheba)

Backporting fixes:

Fix: use of num_parallel_samples in deepAR (#1968 by @kashif)

Fix: torch DeepAR observed indicator in multivariate case (#1969 by @kashif)

Fix OffsetSplitter for negative offsets (#1986 by @lostella)

Fix mean in AffineTransformedDistribution (#2015 by @stailx)

v0.9.4(Apr 28, 2022)
Backporting fixes:

Fix: Hard threshold positive distribution parameters (#1950 by @lostella)

Fix forecast keys (quantiles) output by TemporalFusionTransformer (#1952 by @lostella)

v0.9.3(Apr 12, 2022)
Backporting fixes:

Fix: use broadcast_lesser in place of comparisons in ISQF (#1920 by @vincentqb)

Fix dummy estimator (#1931 by @canerturkmen)

Fix Pytorch Lightning tutorial (#1933 by @jaheba)

Fixed autograd inplace operations error in Transformed Distribution (#1938 by @shubhamkapoor)

v0.9.2(Mar 16, 2022)
Backporting fixes:

Fix AddTimeFeatures transformation for multiples of base frequencies (https://github.com/awslabs/gluon-ts/pull/1916)

Update docs requirements (https://github.com/awslabs/gluon-ts/pull/1919)

v0.9.1(Feb 28, 2022)
Backporting fixes:

Added QuarterlyBegin time feature (#1903)

Fix the use of the scaling parameter in Transformer model (#1909)

v0.9.0(Feb 18, 2022)
Changelog

New Features

Add ckpt_path argument to PyTorchLightningEstimator. (#1872)

Add TSBench (#1865)

add SCott code to nursery (#1827)

Add dynamic code for shell. (#1821)

Adding torch.isqf (#1815)

Add tsbench readme placeholder (#1808)

Adding ISQF distribution class (#1746)

Adding IQF to remove quantile crossing and required retraining for ne… (#1693)

Hierarchical Forecaster: End-to-End model based on DeepVAR (#1665)

Adding gluonts.torch.piecewise_linear (#1663)

Add quantile regression mode to AutoGluon-based TabularEstimator (#1611)

Add dummy estimator to trivial models (#1602)

Bug Fixes

Add file path argument to m5 dataset generation (#1896)

Fix negative binomial parameter map (#1893)

Fix negative binomial sampling (#1884)

Fixes for Monash Forecasting Repository datasets (#1879)

Fix serde.flat type handling. (#1851)

Fix datesplitter. (#1850)

Changed metadata creation function (#1847)

Check equality of transformations. (#1844)

Fix samples scaling in PyTorch DeepAR (#1836)

Fix _version for cases when git is not installed. (#1825)

Fixed data leakage bug in implementation of dynamic real and categorical features (#1809)

Fix for #1725: reverse breaking changes to data loader and handle all-zero batches (#1779)

Upgrade pytorch and pytorch-lightning requirements and some fixes. (#1765)

Fix torch NOPScaler shape. (#1752)

Convert batchify list to np array (#1732)

Fix gluonts.json; added bdump/bdumps. (#1721)

Fix scaling for PyTorch negative binomial output (#1702)

Fix frequency string conversion from ts format, add test (#1652)

Fix NegativeBinomial constructor args in NegativeBinomialOutput (torch) (#1651)

Add batch_size attribute to MQCNNEstimator and MQRNNEstimator (#1645)

Add additional datasets from the Monash Time Series Forecasting Repository (#1632)

Breaking Changes

Extend default quantiles for MQ* Estimators to match MSIS quantiles. (#1866)

Changed metadata creation function (#1847)

Remove support module. (#1792)

Set minimum Python version to 3.7. (#1791)

Exceptions cleanup. (#1615)

Other Changes & Improvements

Update mypy to 0.910. (#1875)

Bump ujson from 4.3.0 to 5.1.0 in /src/gluonts/nursery/tsbench (#1869)

Update black to v22. (#1867)

Fix docstring typo in feature.py (#1863)

Fix scott checks. (#1845)

Remove requirement for @validated in from_hyperparameters. (#1826)

Fix test collect ignore. (#1817)

Split tests into one workflow for each framework. (#1805)

Mark transformer as flaky. (#1801)

Mark empirical_distribution test as flaky. (#1798)

Use of int/float/object over np.int/float/object for dtype. (#1795)

Rework tests. (#1786)

Update typing_extension version. (#1785)

Use of independent random seed. (#1767)

Upgrade pytorch and pytorch-lightning requirements and some fixes. (#1765)

Remove sphinx-autobuild sphinx-autorun, update sphinx version. (#1745)

Exclude bin folders from apidoc. (#1744)

Don't run doctest on nursery. (#1743)

Hierarchical: Compute relative reconciliation error and add tests (#1722)

Fixing doc build from mqcnn-iqf commit (#1699)

Replace miniver with custom versioning code. (#1662)

Cap numba<0.54, ipykernel<6.2.0 (#1661)

Removed assert for cardinality and static feats (#1659)
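
As a small illustration of the dtype change listed above (#1795): the NumPy aliases np.int, np.float and np.object were deprecated in NumPy 1.20 and removed in 1.24, and the builtin Python types can be passed as dtype instead. This is general NumPy behaviour rather than something stated in these release notes, and the variable names below are hypothetical, not taken from the GluonTS code base.

```python
import numpy as np

# Builtin Python types are accepted wherever NumPy expects a dtype.
# The array names here are illustrative only.
counts = np.zeros(3, dtype=int)      # platform default integer type
values = np.zeros(3, dtype=float)    # 64-bit floating point
labels = np.empty(3, dtype=object)   # arbitrary Python objects

print(counts.dtype, values.dtype, labels.dtype)
```

Code that still spells these as np.int or np.float fails with an AttributeError on NumPy 1.24 and later, so switching to the builtins keeps it working across NumPy versions.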

v0.8.1(Aug 12, 2021)
Backporting fixes:

Loosen RTOL in test/distribution/test_flows.py to make test_flow_invertibility pass (#1604)

Add batch_size attribute to MQCNNEstimator and MQRNNEstimator (#1645)

Fix NegativeBinomial constructor args in NegativeBinomialOutput (torch) (#1651)

Fix frequency string conversion from ts format, add test (adapted from #1652)
