N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

Cristian Challu

Last update: Jan 4, 2023


Overview


Recent progress in neural forecasting instigated significant improvements in the accuracy of large-scale forecasting systems. Yet, extremely long-horizon forecasting remains a very difficult task. Two common challenges afflicting long-horizon forecasting are the volatility of the predictions and their computational complexity. In this paper we introduce N-HiTS, which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable our method to assemble its predictions sequentially, selectively emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We conduct an extensive empirical evaluation demonstrating the advantages of N-HiTS over the state-of-the-art long-horizon forecasting methods. On an array of multivariate forecasting tasks, our method provides an average accuracy improvement of 25% over the latest Transformer architectures while reducing the computational time by orders of magnitude.

N-HiTS architecture. The model is composed of several MLPs with ReLU nonlinearities. Blocks are connected via the doubly residual stacking principle, with the backcast y[t-L:t, l] and the forecast y[t+1:t+H, l] as the outputs of the l-th block. Multi-rate input pooling, hierarchical interpolation, and backcast residual connections together induce the specialization of the additive predictions in different signal bands, reducing memory footprint and compute time while improving architecture parsimony and accuracy.
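A toy NumPy sketch of the three ingredients named in the caption (multi-rate input pooling, hierarchical interpolation, doubly residual stacking). It is not the repository's implementation: each block's MLP is replaced by a single random linear map, pooling uses a stride equal to the kernel size, interpolation is linear, and the kernel sizes [8, 4, 1] and downsampling factors [24, 12, 1] are illustrative values. The function names are hypothetical.

```python
import numpy as np


def max_pool_1d(x, kernel):
    """Non-overlapping max pooling (stride == kernel); a ragged tail is dropped."""
    n = len(x) // kernel
    return x[: n * kernel].reshape(n, kernel).max(axis=1)


def toy_block(residual, horizon, pool_kernel, downsample, rng):
    """One simplified N-HiTS-style block; the MLP is replaced by a random linear map."""
    pooled = max_pool_1d(residual, pool_kernel)      # multi-rate input sampling
    n_knots = max(horizon // downsample, 1)          # coarse forecast resolution
    weights = rng.normal(scale=0.1, size=(len(residual) + n_knots, len(pooled)))
    theta = weights @ pooled
    backcast = theta[: len(residual)]
    knots = theta[len(residual):]
    # hierarchical interpolation: linearly up-sample the knots to the full horizon
    forecast = np.interp(np.linspace(0.0, 1.0, horizon),
                         np.linspace(0.0, 1.0, n_knots), knots)
    return backcast, forecast


def toy_nhits(y_lookback, horizon, pool_kernels, downsamples, seed=0):
    """Doubly residual stacking: subtract each backcast, sum each forecast."""
    rng = np.random.default_rng(seed)
    residual = np.asarray(y_lookback, dtype=float).copy()
    forecast = np.zeros(horizon)
    for kernel, downsample in zip(pool_kernels, downsamples):
        backcast, block_forecast = toy_block(residual, horizon, kernel, downsample, rng)
        residual = residual - backcast            # remove what this block explained
        forecast = forecast + block_forecast      # add this block's contribution
    return forecast, residual


y = np.sin(np.arange(480) / 10.0)
forecast, residual = toy_nhits(y, horizon=96, pool_kernels=[8, 4, 1], downsamples=[24, 12, 1])
print(forecast.shape, residual.shape)
```

Blocks with few knots can only emit smooth, low-frequency forecast components, which is the intuition behind the band specialization described in the caption.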

Long Horizon Datasets Results

Run N-HiTS experiment from console

To replicate the results of the paper, in particular to produce the forecasts for N-HiTS, run the following:

make init
make get_dataset   # downloads the data

make run_module module="python -m nhits_multivariate --hyperopt_max_evals 10 --experiment_id run_1"

To use a GPU, add gpu=0 to the last command:

make run_module module="python -m nhits_multivariate --hyperopt_max_evals 10 --experiment_id run_1" gpu=0

Evaluate results for a dataset using:

make run_module module="python -m evaluation --dataset ETTm2 --horizon -1 --model NHITS --experiment run_1"

Alternatively, run all evaluations at once:

for dataset in ETTm2 ECL Exchange traffic weather ili;
 do make run_module module="python -m evaluation --dataset $dataset --horizon -1 --model NHITS --experiment run_1";
done

Comments

different settings (nhits vs. autoformer)

Hi! Thank you for sharing your source code.

I have some questions about the settings of NHITS and Autoformer.

I think some comparisons in your Table 2 might be unfair: you compare against Autoformer's reported results but use different settings for the NHITS model.

Q1: The length of the history window. You use 5*args.horizon for NHITS, but the Autoformer results use a shorter length (1*args.horizon). Here args.horizon=96.

With a history length of 5*96, your reported ECL-96 result is 0.147 (I can reproduce this by re-running your released code). Autoformer's reported result is 0.201, using only a 96-length window.

I ran some experiments and got the following results:

Using the same setting for NHITS (96-length window), the ECL-96 result is MSE: 0.1902 / MAE: 0.2739.

It seems the history window length is an important hyperparameter.

By the way, using a 5*96-length window for the NBeats model, I get a much better ECL-96 result: MSE: 0.1340 / MAE: 0.2311.

Q2: The train/val/test split. You use masks (train_mask_df, valid_mask_df, test_mask_df) to mark the train/valid/test parts. However, in Autoformer's setting (see https://github.com/thuml/Autoformer) the borders are:

border1s = [0, num_train - self.seq_len, len(df_raw) - num_test - self.seq_len]
border2s = [num_train, num_train + num_vali, len(df_raw)]

Here, it seems you did not use the overlapping parts such as [num_train - self.seq_len, num_train + num_vali].

So my question is whether the same number of test samples is used for evaluation. If not, I think directly comparing against Autoformer's reported results in your Table 2 might be unfair.
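To make Q2 concrete, the sketch below counts test windows with and without the seq_len overlap. It assumes the fractions that, to our reading, Autoformer's custom-dataset loader uses (train 70%, test 20%, validation the remainder; its ETT loaders use fixed month-based borders instead) and a sliding-window count of segment_len - seq_len - pred_len + 1. The function names and toy sizes are hypothetical.

```python
def autoformer_style_borders(n, seq_len, train_frac=0.7, test_frac=0.2):
    """Borders in the style quoted above; the 0.7/0.2 fractions are an assumption."""
    num_train = int(n * train_frac)
    num_test = int(n * test_frac)
    num_vali = n - num_train - num_test
    border1s = [0, num_train - seq_len, n - num_test - seq_len]
    border2s = [num_train, num_train + num_vali, n]
    return border1s, border2s


def n_windows(segment_len, seq_len, pred_len):
    """Number of (lookback, horizon) windows that fit in a segment."""
    return max(segment_len - seq_len - pred_len + 1, 0)


def count_test_windows(n, seq_len, pred_len, overlap=True):
    border1s, border2s = autoformer_style_borders(n, seq_len)
    # with overlap, the first test lookback may reach back into the validation span
    start = border1s[2] if overlap else border2s[1]
    return n_windows(border2s[2] - start, seq_len, pred_len)


print(count_test_windows(100, seq_len=10, pred_len=5, overlap=True))
print(count_test_windows(100, seq_len=10, pred_len=5, overlap=False))
```

With the overlap the test span gains exactly seq_len extra windows (16 versus 6 here), so the two protocols score different numbers of samples.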

opened by ResearcherLifeng 2
Clarification regarding data pre-processing

Hello,

I was trying to run N-HiTS. Could you please explain the motivation behind this data preprocessing step: https://github.com/cchallu/n-hits/blob/main/src/data/datasets/ett.py#L45

Could you also explain how the above-mentioned code is used in the program?

Thank you

opened by vageeshmaiya 2
About training procedures and doc
Update: Additional questions

Your data pipeline seems quite unconventional to me. At each training step, you randomly sample 256 windows from one time series as model input, and a training epoch is finished once each series has been sampled once. I understand that it's a univariate model, but I don't see why you leave coverage of the entire training span to chance.
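A rough way to quantify the coverage concern: if each visit to a series draws 256 window start points uniformly with replacement (an assumption about the sampler), the expected fraction of distinct windows seen after k draws from W possible windows is 1 - (1 - 1/W)^k. The series length of 10,000 is hypothetical; input_size = 480 and horizon = 96 follow the 5*96 setting discussed in the first issue.

```python
import numpy as np


def n_train_windows(series_len, input_size, horizon):
    """Distinct (lookback, horizon) windows available in one series."""
    return max(series_len - input_size - horizon + 1, 0)


def expected_coverage(n_windows, n_draws):
    """Expected fraction of distinct windows hit by uniform draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n_windows) ** n_draws


def simulated_coverage(n_windows, n_draws, seed=0):
    """Monte Carlo check of expected_coverage."""
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, n_windows, size=n_draws)
    return np.unique(starts).size / n_windows


windows = n_train_windows(series_len=10_000, input_size=480, horizon=96)
draws = 2 * 256  # the series is visited twice, 256 windows per visit
print(windows, expected_coverage(windows, draws), simulated_coverage(windows, draws))
```

For this hypothetical series, two visits touch only about 5% of the distinct windows, although consecutive windows overlap heavily, so the fraction of individual time steps seen is much higher.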

I tried an ablation that feeds the data in multivariate fashion: the input is the history of all variables, windows roll along the time dimension, and the model learns (N, S) -> (N, T), where N == num_series. The results were poor on the traffic dataset. Could you help explain why?

The paper says the learning rate is halved three times during training. However, your pl_module is misconfigured: the default lr_scheduler interval is "epoch" (ref. https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#configure-optimizers), which means that training actually kept the initial learning rate until the end.
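The effect of the scheduler interval can be simulated without Lightning. The sketch assumes a StepLR-style schedule with gamma = 0.5 and step_size = max_steps // n_decays (an assumption about how the halving is configured) and, following the remark above, one series per training step, so an epoch on the traffic dataset's 862 series is 862 steps. In PyTorch Lightning, returning {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler, "interval": "step"}} from configure_optimizers makes the scheduler advance after every optimizer step instead of every epoch. The function lr_at is hypothetical.

```python
def lr_at(step, lr0, max_steps, n_decays, interval, steps_per_epoch, gamma=0.5):
    """Learning rate seen at optimizer step `step` under a StepLR-style schedule.

    The scheduler's internal counter advances once per optimizer step when
    interval == "step" and once per epoch when interval == "epoch".
    """
    step_size = max_steps // n_decays
    if interval == "step":
        counter = step
    elif interval == "epoch":
        counter = step // steps_per_epoch
    else:
        raise ValueError(f"unknown interval: {interval!r}")
    return lr0 * gamma ** (counter // step_size)


for interval in ("step", "epoch"):
    lrs = [lr_at(s, 1e-3, 1000, 3, interval, steps_per_epoch=862) for s in (0, 400, 700, 999)]
    print(interval, lrs)
```

With interval="step" the rate halves three times over 1000 steps; with interval="epoch" the counter only reaches 1, so the rate never changes.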

You chose 1000 training steps, which is conservative given your data feeding. For example, on the traffic dataset each series is visited at most twice. Training for more steps slightly improved on your reported results, at least on the traffic dataset.

I hope these help improve your model (of course, the reported metrics are already impressive :).

===============================================
Thank you for this amazing work. I found these typos and documentation issues:

https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L398-L405

While documented as a multiplier, n_time_in is actually the final lookback length.

https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L248-L250

n_layers in nhits_multivariate.py should be [ 3*[2] ] rather than 9, since its elements are indexed per stack across the 3 stacks.

From its context, loss_hypar should be an int such as 7 or 24.

The nhits model contains bypassed logic for exogenous variables. Can it be put to work now?
opened by Guan-t7 2
About your data processing + final result

In your code, it seems you normalized the original data, but when you calculated the MSE and MAE, you didn't transform the values back to the original scale. Are the MSE and MAE wrong in that case?
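For a single series z-scored with train statistics (mean, std), the mean cancels in every error, so MSE on the original scale equals std² times MSE on the normalized scale and MAE equals std times normalized MAE. The statistics and values below are hypothetical; the point is that normalized metrics are a rescaling rather than an error, and are comparable only against baselines reported on the same scale.

```python
import numpy as np


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


mean, std = 50.0, 10.0  # hypothetical train-set statistics for one series
y_true = np.array([48.0, 55.0, 61.0, 40.0])
y_pred = np.array([50.0, 53.0, 58.0, 45.0])
z_true, z_pred = (y_true - mean) / std, (y_pred - mean) / std

print(mse(y_true, y_pred), std**2 * mse(z_true, z_pred))  # same value up to rounding
print(mae(y_true, y_pred), std * mae(z_true, z_pred))
```

When many series with different std are averaged, normalized and original-scale metrics weight the series differently, so the two conventions do not rank methods identically.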

opened by tianzhou2011 2
I am confused about this line: n_theta = (input_size + max(h//n_freq_downsample[i], 1) )

I am confused about this line:
n_theta = (input_size + max(h//n_freq_downsample[i], 1) )

I think n_theta is larger than input_size, but doesn't freq_downsample mean that the output size should be smaller than the input size, so that interpolation (up-sampling) can be applied? Can you help me? Thank you!
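A sketch of what the quoted line implies, assuming theta packs the backcast first and the forecast knots after it, and assuming linear interpolation. n_theta exceeds input_size because theta carries both parts; n_freq_downsample only shrinks the forecast part to max(h // r, 1) knots, and interpolation then up-samples those knots back to h points. The values input_size = 480 and h = 96 follow the 5*96 setting discussed above; [24, 12, 1] is an illustrative choice, and theta_sizes and split_theta are hypothetical helpers.

```python
import numpy as np


def theta_sizes(input_size, h, n_freq_downsample):
    """(backcast length, forecast knots, n_theta) per stack, following the quoted line."""
    sizes = []
    for r in n_freq_downsample:
        n_knots = max(h // r, 1)
        sizes.append((input_size, n_knots, input_size + n_knots))
    return sizes


def split_theta(theta, input_size, h):
    """Split theta into a backcast and a forecast up-sampled from its knots."""
    backcast = theta[:input_size]   # one value per lookback step, not interpolated
    knots = theta[input_size:]      # coarse forecast with fewer than h points
    forecast = np.interp(np.linspace(0.0, 1.0, h),
                         np.linspace(0.0, 1.0, len(knots)), knots)
    return backcast, forecast


print(theta_sizes(input_size=480, h=96, n_freq_downsample=[24, 12, 1]))
backcast, forecast = split_theta(np.arange(484.0), input_size=480, h=96)
print(backcast.shape, forecast.shape, forecast[0], forecast[-1])
```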

opened by signalworker123 1
Question on n_time_in

Hi,

Thank you for publishing your code and for your interesting paper. I am now trying to use your code, but I am not sure whether I need to update the hyperopt space for n_time_in.

The current setting in nhits_multivariate is 'n_time_in': hp.choice('n_time_in', [5*args.horizon]), which gives 960 inputs for a horizon length of 192. Is this the setting used in your experiments, or should I change it to 96?

Thanks

opened by aminshabani 1
Clarification regarding implementation

Hi! I have some questions about the following line of code, which seems to make your implementation somewhat different from what your paper specifies: https://github.com/cchallu/n-hits/blob/7b12bd3cef2e444d50803f8776b55e9606a4a1b6/src/models/nhits/nhits.py#L330 My understanding is that this line makes the final forecast also include the first value of the lookback window, meaning you are predicting the "change in value" rather than the actual time series value. Is there any reason for doing this? Thank you for your time!
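The level adjustment discussed in this issue and its follow-up can be sketched as follows. The names and the masked-point handling are hypothetical: if the summed block forecasts are shifted by a value taken from the lookback window, the blocks are effectively trained to predict the change from that value, and if the anchor point is masked (for example, padding) the shift injects a meaningless level, which is the concern raised in the follow-up issue.

```python
import numpy as np


def lookback_level(insample_y, insample_mask=None, anchor="first"):
    """Level taken from the first or last unmasked point of the lookback window."""
    y = np.asarray(insample_y, dtype=float)
    if insample_mask is None:
        mask = np.ones(y.shape, dtype=bool)
    else:
        mask = np.asarray(insample_mask, dtype=bool)
    valid = np.flatnonzero(mask)
    if valid.size == 0:
        return 0.0  # nothing observed: fall back to no level shift
    if anchor == "first":
        return float(y[valid[0]])
    if anchor == "last":
        return float(y[valid[-1]])
    raise ValueError(f"unknown anchor: {anchor!r}")


def level_adjusted_forecast(block_forecast_sum, insample_y, insample_mask=None, anchor="first"):
    """Blocks predict a change relative to the level; add the level back."""
    level = lookback_level(insample_y, insample_mask, anchor)
    return np.asarray(block_forecast_sum, dtype=float) + level


y = np.array([0.0, 5.0, 7.0, 9.0])
mask = np.array([0, 1, 1, 1])  # first point masked, e.g. padding
print(level_adjusted_forecast([1.0, 2.0], y))                   # first value, no mask
print(level_adjusted_forecast([1.0, 2.0], y, mask, anchor="first"))
print(level_adjusted_forecast([1.0, 2.0], y, mask, anchor="last"))
```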

opened by gorold 1
Reproducing Results

Hello,

I downloaded the repository to my computer and tried to reproduce the results that were published in the paper for the traffic dataset with a prediction window of length 96. I ran the code with the following args:

--hyperopt_max_evals 10 --experiment_id run_1

But the results were 0.504 for the MSE and 0.311 for the MAE, which is significantly worse than I expected. Is there anything else that needs to be done before running the code and training the model in order to reproduce the results?

Thanks in advance!

opened by camerons1967 3
I can't see the model's detail in the code

I want to see the model's details in the code, but I found that PyTorch Lightning can't be debugged in PyCharm; it just runs. How can I see how the training data flows through the model? That would help me understand the model better. Thank you.

opened by signalworker123 7
Follow up to "change-in-level" forecast

Hi @cchallu, this is a follow-up to issue #8. I was wondering whether it would be better to use the last value of the lookback window instead of the first, mainly because the first value is sometimes masked.

opened by gorold 0
Clarification regarding data normalization

Hello,

I was trying to run N-HiTS with my own data using the shared Colab notebook.

I tried to normalize the original ETTm2 dataset and compared it with the data used in your N-HiTS model.

The size of df_train is 46641, and I followed the information given in Section 4.1: "Each set is normalized with the train data mean and standard deviation."

def normalize(df_csv, df_train):
    result = df_csv.copy()
    columns_names = list(df_csv.columns)
    # skip the first column (the date) and z-score every feature with train statistics
    for feature_name in columns_names[1:]:
        result[feature_name] = (df_csv[feature_name] - df_train[feature_name].mean()) / df_train[feature_name].std()
    return result

My function returns different results from yours:

date                 HUFL
2016-07-01 00:00:00   0.126520
2016-07-01 00:15:00  -0.023339
2016-07-01 00:30:00  -0.098268
2016-07-01 00:45:00  -0.431177
2016-07-01 01:00:00  -0.231432
Name: HUFL, dtype: float64

and yours:

unique_id | ds                  | y
HUFL      | 2016-07-01 00:00:00 | -0.041413
HUFL      | 2016-07-01 00:15:00 | -0.185467
HUFL      | 2016-07-01 00:30:00 | -0.257495
HUFL      | 2016-07-01 00:45:00 | -0.577510
HUFL      | 2016-07-01 01:00:00 | -0.385501

Can you please tell me more about the data normalization process?

Thanks and regards,

Sophie

opened by JiahuiSophieHU 0
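One way to diagnose the mismatch from the two outputs alone: if both columns are z-scores of the same raw values, z_own = (x - m_own) / s_own and z_repo = (x - m_repo) / s_repo, so z_repo = (s_own / s_repo) * z_own + (m_own - m_repo) / s_repo is exactly affine in z_own. Fitting a line to the five HUFL values quoted in this issue tests that; the variable names are illustrative.

```python
import numpy as np

# HUFL, first five 15-minute steps of 2016-07-01, copied from the issue
z_own = np.array([0.126520, -0.023339, -0.098268, -0.431177, -0.231432])
z_repo = np.array([-0.041413, -0.185467, -0.257495, -0.577510, -0.385501])

# slope = s_own / s_repo, intercept = (m_own - m_repo) / s_repo
slope, intercept = np.polyfit(z_own, z_repo, 1)
residuals = z_repo - (slope * z_own + intercept)
print(f"slope={slope:.4f} intercept={intercept:.4f} max|resid|={np.abs(residuals).max():.1e}")
```

On these values the fit is essentially exact, with a slope of about 0.961 and an intercept of about -0.163. The repository's statistics therefore imply a standard deviation roughly 4% larger and a higher mean than those computed from df_train, which is consistent with the statistics being computed over a different span of data; the fit alone cannot tell which span was used.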
Is backcast interpolated?

https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L55-L68

https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L263-L266

https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L156-L157

According to these code blocks, it seems that interpolation is used only to synthesize the forecast, while the backcast is generated directly by the MLP. But Eq. 3 in Section 3.3 of your paper states that the forecast and the backcast are interpolated in a similar way. Is there a reason for this discrepancy?

Thank you for your time!

opened by Guan-t7 0

