I'll use an example here and generalize whenever needed. Let's say we have the following dataset that we train a timeseries predictor on:
```
time,gb,target,aux
1,A,7,foo
2,A,10,foo
3,A,12,bar
4,A,14,bar
2,B,5,foo
4,B,9,foo
```
In this case `target` is what we are predicting, `gb` is the column we are grouping on, and we are ordering by `time`. `aux` is an unrelated column that's not timeseries in nature and is just used "normally".
We train a predictor with a window of `n`.

Then let's say we have an input stream that looks something like this:
```
time,gb,target,aux
6,A,33,foo
7,A,54,foo
```
### Caching
First, we will need to store, for each value of the column `gb`, the last `n` records. So, for example, if `n == 1` we would save only the last row in the data above; if `n == 2` we would save both. When new rows come in, we un-cache the older rows.
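A minimal sketch of that cache, assuming rows arrive as dicts keyed by column name (the names `cache` and `observe` are illustrative, not part of native): a `deque(maxlen=n)` per group value handles the eviction of older rows automatically.

```python
from collections import defaultdict, deque

# Keep only the last n records per value of `gb`; deque(maxlen=n)
# drops the oldest row when a new one pushes the length past n.
n = 2
cache = defaultdict(lambda: deque(maxlen=n))

def observe(row):
    """Cache an incoming row under its group-by value."""
    cache[row["gb"]].append(row)

for row in [
    {"time": 3, "gb": "A", "target": 12, "aux": "bar"},
    {"time": 4, "gb": "A", "target": 14, "aux": "bar"},
    {"time": 4, "gb": "B", "target": 9, "aux": "foo"},
]:
    observe(row)

print([r["time"] for r in cache["A"]])  # last n == 2 rows for group A: [3, 4]
```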
### Inferring
Second, when a new datapoint comes into the input stream, we'll need to "infer" that the prediction we have to make is actually for the "next" datapoint. Which is to say that when `7, A, 54, foo` comes in, we need to infer that we actually need to make predictions for:

```
8, A, <this is what we are predicting>, foo
```
The challenge here is how we infer that the next timestamp is `8`. One simple way to do this is to compute the delta from the previous record, but that's an issue for the first observation (since we don't have a previous record to subtract from, unless we cache part of the training data). Alternatively, we could add a feature to native, to either:

a) Provide a `delta` argument for each group by (representing by how much we increment the order column[s])
b) Have an argument when doing timeseries prediction that tells it to predict for the "next" row and then do the inference under the covers.
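A rough sketch of option (a), with a fallback to inferring the delta from the cache when one isn't supplied (the function name and row shape are illustrative assumptions, not native's API):

```python
def next_timestamp(cached_rows, delta=None):
    """Return the timestamp the next prediction should be made for.

    cached_rows: the per-group cache, ordered by time.
    delta: optional explicit increment for the order column (option a).
    """
    last = cached_rows[-1]["time"]
    if delta is not None:
        # Option (a): an explicit per-group delta was provided.
        return last + delta
    if len(cached_rows) >= 2:
        # Fallback: infer the delta from the two most recent cached rows.
        return last + (last - cached_rows[-2]["time"])
    raise ValueError("need an explicit delta or at least two cached rows")

stream_a = [{"time": 6, "gb": "A"}, {"time": 7, "gb": "A"}]
print(next_timestamp(stream_a))               # inferred delta of 1 -> 8
print(next_timestamp(stream_a[:1], delta=1))  # explicit delta -> 7
```

Note this fallback still fails for a group with a single observation and no `delta`, which is exactly the first-observation issue described above.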
@paxcema let me know which of these features would be easy to implement in native, since you're now the resident timeseries expert.