N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

Overview

Recent progress in neural forecasting instigated significant improvements in the accuracy of large-scale forecasting systems. Yet, extremely long-horizon forecasting remains a very difficult task. Two common challenges afflicting long-horizon forecasting are the volatility of the predictions and their computational complexity. In this paper we introduce N-HiTS, which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable our method to assemble its predictions sequentially, selectively emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We conduct an extensive empirical evaluation demonstrating the advantages of N-HiTS over the state-of-the-art long-horizon forecasting methods. On an array of multivariate forecasting tasks, our method provides an average accuracy improvement of 25% over the latest Transformer architectures while reducing the computational time by orders of magnitude.

N-HiTS architecture. The model is composed of several MLPs with ReLU nonlinearities. Blocks are connected via the doubly residual stacking principle, with the backcast y[t-L:t, l] and forecast y[t+1:t+H, l] outputs of the l-th block. Multi-rate input pooling, hierarchical interpolation, and backcast residual connections together induce the specialization of the additive predictions in different signal bands, reducing memory footprint and compute time and improving architecture parsimony and accuracy.
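
The mechanics in the caption can be sketched in a few lines of NumPy. This is a minimal, untrained illustration under stated assumptions, not the repository's implementation: each block max-pools its input with a hypothetical kernel (`pool_kernels`), a random ReLU MLP stands in for the trained layers, the backcast is emitted at full resolution, and the forecast is produced as `max(H // r, 1)` knots that are linearly interpolated up to the horizon (`downsamples` holds the hypothetical ratios r). The backcast/forecast split follows the n_theta expression quoted in the comments below; linear interpolation is an assumption.

```python
import numpy as np


def max_pool_1d(x, kernel):
    """Non-overlapping max pooling; the tail is padded with its last value."""
    pad = (-len(x)) % kernel
    x = np.concatenate([x, np.repeat(x[-1], pad)])
    return x.reshape(-1, kernel).max(axis=1)


def random_mlp(x, sizes, rng):
    """Untrained ReLU MLP with random weights (a stand-in for a trained block)."""
    n_layers = len(sizes) - 1
    for i in range(n_layers):
        w = rng.standard_normal((sizes[i], sizes[i + 1])) / np.sqrt(sizes[i])
        x = x @ w
        if i < n_layers - 1:  # no ReLU on the output layer
            x = np.maximum(x, 0.0)
    return x


def nhits_block(residual, horizon, pool_kernel, downsample, hidden, rng):
    lookback = len(residual)
    n_knots = max(horizon // downsample, 1)
    pooled = max_pool_1d(residual, pool_kernel)  # multi-rate input pooling
    theta = random_mlp(pooled, [len(pooled), hidden, hidden, lookback + n_knots], rng)
    backcast, knots = theta[:lookback], theta[lookback:]
    # Hierarchical interpolation: spread the knots evenly over the horizon
    # and linearly interpolate to obtain one value per forecast step.
    knot_positions = np.linspace(0, horizon - 1, n_knots)
    forecast = np.interp(np.arange(horizon), knot_positions, knots)
    return backcast, forecast


def nhits_forward(y_in, horizon, pool_kernels, downsamples, hidden=32, seed=0):
    rng = np.random.default_rng(seed)
    residual = np.asarray(y_in, dtype=float)
    forecast = np.zeros(horizon)
    for kernel, ratio in zip(pool_kernels, downsamples):
        backcast, block_forecast = nhits_block(residual, horizon, kernel, ratio, hidden, rng)
        residual = residual - backcast        # backcast residual connection
        forecast = forecast + block_forecast  # forecasts are summed across blocks
    return forecast


y = np.sin(np.arange(480) / 10.0)  # lookback window L = 480
y_hat = nhits_forward(y, horizon=96, pool_kernels=[8, 4, 1], downsamples=[24, 12, 1])
print(y_hat.shape)  # (96,)
```

Because each block subtracts its backcast from the residual and adds its forecast to the running sum, blocks with coarse pooling and few knots can only contribute smooth, low-frequency components, while the last block (r = 1) can add high-frequency detail.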

Long Horizon Datasets Results

Run N-HiTS experiment from console

To replicate the results of the paper, in particular to produce the forecasts for N-HiTS, run the following:

  1. make init
  2. make get_dataset to download the data.
  3. Run the N-HiTS experiment:
     make run_module module="python -m nhits_multivariate --hyperopt_max_evals 10 --experiment_id run_1"

     To use a GPU, add gpu=0 to the command:
     make run_module module="python -m nhits_multivariate --hyperopt_max_evals 10 --experiment_id run_1" gpu=0
  4. Evaluate the results for a dataset using:
     make run_module module="python -m evaluation --dataset ETTm2 --horizon -1 --model NHITS --experiment run_1"

Alternatively, run all evaluations at once:

for dataset in ETTm2 ECL Exchange traffic weather ili; do
  make run_module module="python -m evaluation --dataset $dataset --horizon -1 --model NHITS --experiment run_1"
done
Comments
  • different settings (nhits vs. autoformer)

    Hi! Thank you for sharing your source code.

    I have some questions about the settings of NHITS and Autoformer.

    I think there might be some unfair comparisons in your Table 2, because you compared against Autoformer's reported results but used different settings for the NHITS model.

    Q1: the length of the history window. You use 5*args.horizon for NHITS, but for Autoformer a shorter length is used (say, 1*args.horizon). Here args.horizon=96.

    When using a history length of 5*96, your reported result for ECL-96 is 0.147 (I can reproduce this by re-running your released code). Autoformer's reported result is 0.201 (using only a 96-length window).

    I tried some experiments and got the following results:

    using the same setting for NHITS (96-length window), the result for ECL-96 is MSE: 0.1902 / MAE: 0.2739

    It seems the length of the history window is an important hyperparameter.

    By the way, using a 5*96-length window for the NBeats model, I get a much better result for ECL-96: MSE: 0.1340 / MAE: 0.2311

    Q2: the split of the train/val/test sets. You use masks (train_mask_df, valid_mask_df, test_mask_df) to indicate the train/valid/test parts. However, in Autoformer's setting (see https://github.com/thuml/Autoformer) the borders are

    border1s = [0, num_train - self.seq_len, len(df_raw) - num_test - self.seq_len]
    border2s = [num_train, num_train + num_vali, len(df_raw)]

    Here, it seems you did not use the overlapping part, as in [num_train - self.seq_len, num_train + num_vali].

    So my question is whether the same number of test samples is used for evaluation. If not, I think it might be unfair to directly compare against Autoformer's results in your Table 2.

    opened by ResearcherLifeng 2
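
    A minimal sketch of why the split matters for the sample count, assuming a sliding window with stride 1 and the hypothetical 70/10/20 train/valid/test proportions below; the border formulas are the ones quoted above, and the helper names are illustrative only. Prepending seq_len points of history to the test span (the overlap) adds exactly seq_len evaluation windows.

```python
def split_borders(n_rows, seq_len, train_pct=70, test_pct=20):
    """Borders in the form quoted above; integer arithmetic avoids float rounding."""
    num_train = n_rows * train_pct // 100
    num_test = n_rows * test_pct // 100
    num_vali = n_rows - num_train - num_test
    border1s = [0, num_train - seq_len, n_rows - num_test - seq_len]
    border2s = [num_train, num_train + num_vali, n_rows]
    return border1s, border2s


def n_windows(span_len, seq_len, pred_len):
    """Number of (input, target) windows that fit in a span with stride 1."""
    return max(span_len - seq_len - pred_len + 1, 0)


seq_len, pred_len = 96, 96
border1s, border2s = split_borders(10_000, seq_len)
with_overlap = n_windows(border2s[2] - border1s[2], seq_len, pred_len)
num_test = border2s[2] - border1s[2] - seq_len
without_overlap = n_windows(num_test, seq_len, pred_len)
print(with_overlap, without_overlap)  # 1905 1809
```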
  • Clarification regarding data pre-processing

    Hello,

    I was trying to run N-HiTS. Can you please let me know the motivation behind this data preprocessing step: https://github.com/cchallu/n-hits/blob/main/src/data/datasets/ett.py#L45

    Could you also provide some insight into how the above-mentioned code is used in the program?

    Thank you

    opened by vageeshmaiya 2
  • About training procedures and doc

    Update: Additional questions

    • Your data pipeline seems quite non-traditional to me. At each training step, you randomly sample 256 windows from one time series as model input, and a training epoch is finished once each series has been sampled once. I understand that it's a univariate model, but I don't see why you leave coverage of the entire training span to chance.

    • I tried an ablation feeding the data in a multivariate fashion, i.e. inputting the history of all variables, rolling windows along the time dimension, and learning (N, S) -> (N, T) where N == num_series. The result was bad on the traffic dataset. Could you help explain?

    • The paper says that the lr is halved three times during training. However, you misconfigured your pl_module. The default lr_schedule interval is epoch (ref. https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#configure-optimizers), which means that you actually kept training with the initial lr until the end.

    • You chose 1000 training steps, which is conservative given your data feeding. For example, on the traffic dataset each time series is covered at most twice. Training for more steps slightly improved on your reported results, at least on the traffic dataset.

    I hope these could help improve your model (Of course the metric presented is already impressive enough :).

    ===============================================

    Thank you for this amazing work. I found these typos and doc issues:

    https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L398-L405

    1. While documented as a multiplier, n_time_in is actually the final lookback period.

    https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L248-L250

    2. n_layers in nhits_multivariate.py should be [ 3*[2] ] rather than 9, since its elements are indexed across 3 stacks.

    3. loss_hypar should be an int, such as 7 or 24, judging from its context.

    4. There is bypassed logic for exogenous variables in the nhits model. Can it be put to work now?

    opened by Guan-t7 2
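
    The lr_schedule point above can be simulated without any framework. PyTorch Lightning's lr_scheduler configuration has an "interval" key that defaults to "epoch" and can be set to "step". The sketch assumes a StepLR-style schedule with step_size = max_steps // n_decays and gamma = 0.5, and a hypothetical epoch length of 500 steps; the function names are illustrative.

```python
def lr_after(scheduler_calls, lr0, step_size, gamma=0.5):
    """Learning rate of a StepLR-style schedule after `scheduler_calls` calls to step()."""
    return lr0 * gamma ** (scheduler_calls // step_size)


def final_lr(max_steps, steps_per_epoch, n_decays, lr0, interval):
    step_size = max_steps // n_decays
    if interval == "step":      # scheduler stepped after every optimizer step
        calls = max_steps
    elif interval == "epoch":   # scheduler stepped only at the end of each epoch
        calls = max_steps // steps_per_epoch
    else:
        raise ValueError(f"unknown interval: {interval}")
    return lr_after(calls, lr0, step_size)


lr0 = 1e-3
print(final_lr(1000, 500, 3, lr0, "step"))   # 0.000125 (halved three times)
print(final_lr(1000, 500, 3, lr0, "epoch"))  # 0.001 (never decayed)
```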
  • About your data processing + final result

    In your code, it seems you normalized the original data, but when you calculated the MSE and MAE, it seems you did not transform the values back to the original scale. Are the MSE and MAE wrong in that case?

    opened by tianzhou2011 2
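
    Whether metrics are computed on the normalized or the original scale changes their magnitude, so comparisons only make sense on the same scale. A minimal NumPy sketch with synthetic data (all numbers are made up): for a single series z-scored with the training mean and standard deviation, the errors satisfy e_z = e / sigma, so MSE scales by sigma**2 and MAE by sigma. With several series of different scales there is no single conversion factor.

```python
import numpy as np


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


rng = np.random.default_rng(0)
y_train = rng.normal(50.0, 8.0, size=500)         # synthetic training span
mu, sigma = y_train.mean(), y_train.std()

y_test = rng.normal(50.0, 8.0, size=100)          # synthetic test targets
y_hat = y_test + rng.normal(0.0, 2.0, size=100)   # synthetic forecasts

z_test, z_hat = (y_test - mu) / sigma, (y_hat - mu) / sigma  # normalized scale

print(mse(z_test, z_hat) * sigma**2, mse(y_test, y_hat))  # equal up to rounding
print(mae(z_test, z_hat) * sigma, mae(y_test, y_hat))     # equal up to rounding
```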
  • I am confused about this line: n_theta = (input_size + max(h//n_freq_downsample[i], 1) )

    I am confused about this line:
    n_theta = (input_size + max(h//n_freq_downsample[i], 1) )

    I think n_theta is larger than input_size, but doesn't freq_downsample mean that the output size should be smaller than input_size, in order to do interpolation (up-sampling)? Can you help me? Thank you!

    opened by signalworker123 1
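
    A small sketch of the arithmetic behind that line, assuming (as the expression suggests, and as the later "Is backcast interpolated?" comment observes) that the first input_size outputs are the backcast at full resolution and only the remaining max(h // r, 1) outputs are forecast knots that get up-sampled. The values of input_size, h and n_freq_downsample are hypothetical, and linear interpolation is assumed.

```python
import numpy as np

input_size, h = 480, 96
n_freq_downsample = [24, 12, 1]   # hypothetical ratios, one per stack

for r in n_freq_downsample:
    n_knots = max(h // r, 1)          # forecast coefficients, fewer than h when r > 1
    n_theta = input_size + n_knots    # backcast part + forecast part
    print(r, n_knots, n_theta)
# 24 4 484
# 12 8 488
# 1 96 576

# Up-sampling: 4 knots spread evenly over the horizon, linearly interpolated to h points.
knots = np.array([0.0, 1.0, 0.0, -1.0])
forecast = np.interp(np.arange(h), np.linspace(0, h - 1, len(knots)), knots)
print(forecast.shape)  # (96,)
```

    Under these assumptions, n_theta exceeds input_size because it also carries the forecast knots; the down-sampling applies only to the forecast part, which is then interpolated back up to h.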
  • Question on n_time_in

    Hi,

    Thank you for publishing your code, and thanks for your interesting paper. I am now trying to use your code, but I am not sure whether I need to update the hyperopt space for n_time_in.

    The current setting in nhits_multivariate is 'n_time_in': hp.choice('n_time_in', [5*args.horizon]), which results in 960 inputs for a horizon length of 192. Is this the value used in your experiments, or should I change it to 96?

    Thanks

    opened by aminshabani 1
  • Clarification regarding implementation

    Hi! I have some questions about the following line of code, which seems to make your implemented model somewhat different from what you specified in your paper: https://github.com/cchallu/n-hits/blob/7b12bd3cef2e444d50803f8776b55e9606a4a1b6/src/models/nhits/nhits.py#L330 My understanding is that this line specifies that the final forecast also includes the first value of the lookback window, meaning you are predicting the "change in value" rather than the actual time series value. Is there any reason for doing this? Thank you for your time!

    opened by gorold 1
  • Reproducing Results

    Hello,

    I downloaded the repository to my computer and tried to reproduce the results that were published in the paper for the traffic dataset with a prediction window of length 96. I ran the code with the following args:

    --hyperopt_max_evals 10 --experiment_id run_1

    But the results were 0.504 for the MSE and 0.311 for the MAE, which is significantly worse than what I expected. Is there anything else that needs to be done before running the code and training the model in order to reproduce the results?

    Thanks in advance!

    opened by camerons1967 3
  • I can't see the model's detail in the code

    I want to see the model's details in the code, but I found that the PyTorch Lightning code can't be debugged in PyCharm; it just runs. How can I see how the training data flows through the model? That would help me understand the model better. Thank you.

    opened by signalworker123 7
  • Follow up to "change-in-level" forecast

    Hi @cchallu, this is a follow-up to issue #8. I was also wondering whether it would be better to use the last value of the lookback window instead of the first, mainly because the first value is sometimes masked.

    opened by gorold 0
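
    The two options discussed in this and the earlier "Clarification regarding implementation" comment can be sketched as follows. This is a hypothetical helper, not code from the repository: it assumes the network outputs a "change-in-level" forecast and that the lookback window comes with a 0/1 mask marking observed values. Adding the first value uses whatever sits in position 0, even if it is a masked padding value; adding the last observed value avoids that.

```python
import numpy as np


def add_level(delta_forecast, insample_y, insample_mask, use="last"):
    """Convert a change-in-level forecast into a level forecast (hypothetical helper)."""
    insample_y = np.asarray(insample_y, dtype=float)
    if use == "first":
        level = insample_y[0]                  # first value of the lookback window
    elif use == "last":
        observed = np.flatnonzero(insample_mask)
        if observed.size == 0:
            raise ValueError("no observed values in the lookback window")
        level = insample_y[observed[-1]]       # last observed (mask == 1) value
    else:
        raise ValueError(f"unknown option: {use}")
    return level + np.asarray(delta_forecast, dtype=float)


y = [0.0, 2.0, 3.0, 5.0]      # position 0 is padding
mask = [0, 1, 1, 1]
delta = [0.5, 1.0]
print(add_level(delta, y, mask, use="first"))  # [0.5 1. ]
print(add_level(delta, y, mask, use="last"))   # [5.5 6. ]
```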
  • Clarification regarding data normalization

    Hello,

    I was trying to run N-HiTS with my own data using the shared Colab notebook.

    I tried to normalize the original ETTm2 dataset and compared it with the data used in your N-HiTS model.

    The size of df_train is 46641, and I followed the information given in section 4.1: "Each set is normalized with the train data mean and standard deviation."

    def normalize(df_csv, df_train):
        result = df_csv.copy()
        columns_names = list(df_csv.columns)
        for feature_name in columns_names[1:]:  # skip the first (date) column
            result[feature_name] = (df_csv[feature_name] - df_train[feature_name].mean()) / df_train[feature_name].std()
        return result

    My function returns different results compared to yours. Mine (HUFL column, dtype: float64):

        date                  HUFL
        2016-07-01 00:00:00    0.126520
        2016-07-01 00:15:00   -0.023339
        2016-07-01 00:30:00   -0.098268
        2016-07-01 00:45:00   -0.431177
        2016-07-01 01:00:00   -0.231432

    and yours:

        unique_id | ds                  | y
        HUFL      | 2016-07-01 00:00:00 | -0.041413
        HUFL      | 2016-07-01 00:15:00 | -0.185467
        HUFL      | 2016-07-01 00:30:00 | -0.257495
        HUFL      | 2016-07-01 00:45:00 | -0.577510
        HUFL      | 2016-07-01 01:00:00 | -0.385501

    Can you please tell me more about the data normalization process?

    Thanks and regards,

    Sophie

    opened by JiahuiSophieHU 0
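
    A minimal pandas sketch of the procedure quoted from section 4.1, applied to the long (unique_id, ds, y) layout shown above: each series is z-scored with the mean and standard deviation of its own training span. The cut-off train_end, the ddof argument, and the toy values are illustrative assumptions. Two details that change the numbers are exactly which rows count as training data and the standard deviation convention (pandas defaults to ddof=1, NumPy to ddof=0), so the sketch makes both explicit.

```python
import pandas as pd


def normalize_long(df, train_end, ddof=1):
    """Z-score each series using statistics from rows with ds < train_end."""
    train = df[df["ds"] < train_end]
    grouped = train.groupby("unique_id")["y"]
    mean = df["unique_id"].map(grouped.mean())         # per-series training mean
    std = df["unique_id"].map(grouped.std(ddof=ddof))  # per-series training std
    return df.assign(y=(df["y"] - mean) / std)


df = pd.DataFrame({
    "unique_id": ["HUFL"] * 6,
    "ds": pd.date_range("2016-07-01", periods=6, freq="15min"),
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # toy values, not real ETTm2 data
})
out = normalize_long(df, train_end=pd.Timestamp("2016-07-01 00:45:00"))
print(out["y"].round(6).tolist())  # [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```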
  • Is backcast interpolated?

    https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L55-L68

    https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L263-L266

    https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L156-L157

    According to these code blocks, it seems that interpolation is used only for synthesizing the forecast, and the backcast is generated directly by the MLP. But Eq. 3 in Section 3.3 of your paper states that the forecast and backcast are interpolated in a similar way. Is there any reason behind this discrepancy?

    Thank you for your time!

    opened by Guan-t7 0
Owner
Cristian Challu
SE3 Pose Interp - Interpolate camera pose or trajectory in SE3, pose interpolation, trajectory interpolation

SE3 Pose Interpolation Pose estimated from SLAM system are always discrete, and

Ran Cheng 4 Dec 15, 2022
Spectral Temporal Graph Neural Network (StemGNN in short) for Multivariate Time-series Forecasting

Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting This repository is the official implementation of Spectral Temporal Gr

Microsoft 306 Dec 29, 2022
LBK 20 Dec 2, 2022
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting This is the origin Pytorch implementation of Informer in the followin

Haoyi 3.1k Dec 29, 2022
Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Non-AR Spatial-Temporal Transformer Introduction Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series For

Chen Kai 66 Nov 28, 2022
tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai

timeseriesAI 2.8k Jan 8, 2023
This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.

TSForecasting This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the tim

Rakshitha Godahewa 80 Dec 30, 2022
Code for the CIKM 2019 paper "DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting".

Dual Self-Attention Network for Multivariate Time Series Forecasting 20.10.26 Update: Due to the difficulty of installation and code maintenance cause

Kyon Huang 223 Dec 16, 2022
The GitHub repository for the paper: “Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction“.

SCINet This is the original PyTorch implementation of the following work: Time Series is a Special Sequence: Forecasting with Sample Convolution and I

null 386 Jan 1, 2023
The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting".

IGMTF The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting". Requirements The framework

Wentao Xu 24 Dec 5, 2022
Time Series Forecasting with Temporal Fusion Transformer in Pytorch

Forecasting with the Temporal Fusion Transformer Multi-horizon forecasting often contains a complex mix of inputs – including static (i.e. time-invari

Nicolás Fornasari 6 Jan 24, 2022
Event-forecasting - Event Forecasting Algorithms With Python

event-forecasting Event Forecasting Algorithms Theory Correlating events in comp

Intellia ICT 4 Feb 15, 2022
Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).

What is judgyprophet? judgyprophet is a Bayesian forecasting algorithm based on Prophet, that enables forecasting while using information known by the

AstraZeneca 56 Oct 26, 2022
An implementation of the [Hierarchical (Sig-Wasserstein) GAN] algorithm for large dimensional Time Series Generation

Hierarchical GAN for large dimensional financial market data Implementation This repository is an implementation of the [Hierarchical (Sig-Wasserstein

null 11 Nov 29, 2022
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation Ported from https://github.com/hzwer/arXiv2020-RIFE Dependencies NumPy

null 49 Jan 7, 2023
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real Time Video Interpolation arXiv | YouTube | Colab | Tutorial | Demo Table of Contents Introduction Collection Usage Evaluation Training and

hzwer 3k Jan 4, 2023
RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation YouTube | BiliBili 16X interpolation results from two input images: Introd

旷视天元 MegEngine 28 Dec 9, 2022
LONG-TERM SERIES FORECASTING WITH QUERYSELECTOR – EFFICIENT MODEL OF SPARSEATTENTION

Query Selector Here you can find code and data loaders for the paper https://arxiv.org/pdf/2107.08687v1.pdf . Query Selector is a novel approach to sp

MORAI 62 Dec 17, 2022
Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Autoformer (NeurIPS 2021) Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting Time series forecasting is a c

THUML @ Tsinghua University 847 Jan 8, 2023