A universal framework for learning timestamp-level representations of time series

Overview

TS2Vec

This repository contains the official implementation for the paper Learning Timestamp-Level Representations for Time Series with Hierarchical Contrastive Loss.

Requirements

The recommended requirements for TS2Vec are specified as follows:

  • Python 3.8
  • scipy==1.6.1
  • torch==1.8.1
  • numpy==1.19.2
  • pandas==1.0.1
  • scikit_learn==0.24.1

The dependencies can be installed by:

pip install -r requirements.txt

Data

The datasets can be obtained and put into the datasets/ folder as follows:

  • 128 UCR datasets should be put into datasets/UCR/ so that each data file can be located by datasets/UCR/<dataset_name>/<dataset_name>_*.csv.
  • 30 UEA datasets should be put into datasets/UEA/ so that each data file can be located by datasets/UEA/<dataset_name>/<dataset_name>_*.arff.
  • 3 ETT datasets should be placed at datasets/ETTh1.csv, datasets/ETTh2.csv and datasets/ETTm1.csv.
  • Electricity dataset should be resampled into hourly data of 321 clients over the last 3 years and placed at datasets/electricity.csv.
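
A rough preprocessing sketch for the electricity data is given below (a minimal sketch only, assuming the raw LD2011_2014.txt file from the UCI archive, which is ';'-separated with ',' as the decimal mark; the exact client selection and resampling rule used in the paper may differ):

import pandas as pd

# Assumed raw format: 15-minute readings per client, ';'-separated, ',' as decimal mark
raw = pd.read_csv('LD2011_2014.txt', sep=';', decimal=',', index_col=0, parse_dates=True)
hourly = raw.resample('1H').sum()    # aggregate 15-minute readings into hourly values
hourly = hourly.loc['2012-01-01':]   # keep roughly the last 3 years
# Selecting the 321 clients mentioned above is not shown here
hourly.to_csv('datasets/electricity.csv')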

Usage

To train and evaluate TS2Vec on a dataset, run the following command:

python train.py <dataset_name> <run_name> --archive <archive> --batch-size <batch_size> --repr-dims <repr_dims> --gpu <gpu> --eval

The detailed descriptions of the arguments are as follows:

  • dataset_name: The dataset name.
  • run_name: The folder name used to save the model, outputs and evaluation metrics; it can be set to any word.
  • archive: The archive that the dataset belongs to; one of UCR, UEA, forecast_csv or forecast_csv_univar.
  • batch_size: The batch size (defaults to 8).
  • repr_dims: The representation dimensions (defaults to 320).
  • gpu: The GPU number used for training and inference (defaults to 0).
  • eval: Whether to perform evaluation after training.

(For descriptions of more arguments, run python train.py -h.)
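
For example, to train and evaluate on the UCR dataset ECG200 (the dataset and run names below are only illustrative), one could run:

python train.py ECG200 my_run --archive UCR --batch-size 8 --repr-dims 320 --gpu 0 --eval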

After training and evaluation, the trained encoder, output and evaluation metrics can be found in training/DatasetName__RunName_Date_Time/.

Scripts: The scripts for reproduction are provided in the scripts/ folder.

Code Example

from ts2vec import TS2Vec
import datautils

# Load the ECG200 dataset from UCR archive
train_data, train_labels, test_data, test_labels = datautils.load_UCR('ECG200')
# (Both train_data and test_data have a shape of n_instances x n_timestamps x n_features)

# Train a TS2Vec model
model = TS2Vec(
    input_dims=1,
    device=0,
    output_dims=320
)
loss_log = model.fit(
    train_data,
    verbose=True
)

# Compute timestamp-level representations for test set
test_repr = model.encode(test_data)  # n_instances x n_timestamps x output_dims

# Compute instance-level representations for test set
test_repr = model.encode(test_data, encoding_window='full_series')  # n_instances x output_dims

# Sliding inference for test set
test_repr = model.encode(
    test_data,
    casual=True,
    sliding_length=1,
    sliding_padding=50
)  # n_instances x n_timestamps x output_dims
# (The timestamp t's representation vector is computed using the observations located in [t-50+1, t])
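
The instance-level representations can be fed to any off-the-shelf classifier. Below is a minimal sketch continuing the example above, using a scikit-learn SVM purely for illustration (the repository's own evaluation code lives in the tasks/ folder):

from sklearn.svm import SVC

# Instance-level representations for train and test sets
train_repr = model.encode(train_data, encoding_window='full_series')  # n_instances x output_dims
test_repr = model.encode(test_data, encoding_window='full_series')

# Fit a simple classifier on top of the frozen representations
clf = SVC(C=1.0, kernel='rbf').fit(train_repr, train_labels)
print('Test accuracy:', clf.score(test_repr, test_labels))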
Comments
  • Questions about the encoder

    Questions about the encoder

    (attached screenshot: 微信图片_20220825151336) Hi, I'd like to ask about the dilated convolutions in the encoder.

    1. My understanding from the code is that the a1-b1 subsequence and the a2-b2 subsequence are fed into the encoder separately to obtain two subsequence representations, and contrastive learning is then applied to their overlapping segment a2~b1. Is that correct?
    2. Are the dilated convolutions in the encoder causal, i.e., does the representation at each point only use data at and before that timestamp?
    3. If they are causal, can I assume that when the a2-b2 subsequence is fed into the encoder, the representations learned for the a2-b1 segment do not use any information from the b1-b2 segment?

    Any clarification would be much appreciated, thanks a lot!

    opened by guobing21 11
  • How is the number of training epochs controlled?

    How is the number of training epochs controlled?

    Hi, congratulations on the SOTA results. Two small questions:

    1. I'm not very familiar with unsupervised learning. Under supervised learning I mainly detect over- and underfitting via early stopping on a validation set. Here the task is decoupled into feature extraction plus a discriminative network, so it seems I can only roughly judge from the shape of the loss curve of the feature-extraction stage, and that objective is not the objective of the overall task. How should I decide how many epochs to train for? Is there a risk of over- or underfitting?
    2. My task is time-series classification and all samples are labeled. Compared with maximizing the difference between a sequence and another randomly sampled sequence, would it be better to deliberately sample the negative from a different class?

    Thanks very much!

    opened by kpmokpmo 4
  • Simple sin wave results

    Simple sin wave results

    sinwave.csv I am using simple sine waves to test the algorithm: s1 is a long wave, s2 a medium wave, s3 a short wave, and s4 = s1 + s2 + s3. The model can predict s1/s2/s3 successfully, but for s4 it performs poorly, even compared to an LSTM. Could you share some insights on this? I've tried the default hyper-parameters and also tried tuning them, with no significant improvement. Thanks.

    opened by huangtarn 3
  • data shape, loading custom data, possible lookahead

    data shape, loading custom data, possible lookahead

    Hi, I am trying to test on my own datasets, which are multivariate time series. I load the data into a DataFrame and then create the slices for train, validation and test, just mimicking the existing code.

    There is a point where my n x m data, where m is the number of features (covariate time series), is expanded to 1 x n x m. The comments in your code say "number of instances x timestamps x features". What does "instances" mean in this context?

    I am worried that my results are perhaps too good to be true, and I am trying to make sure I understand where lookahead might occur.

    opened by gminorcoles 2
  • clarification on the sliding length and padding

    clarification on the sliding length and padding

    Thank you for your great contribution. I was unable to understand the difference in usage between the sliding length and the sliding padding. For example, if I wanted to utilize X days for a forecasting problem, what would the proper usage of the parameters be?

    Thank you in advance.

    sliding_length sliding_padding

    Note: I noticed on my dataset that using 24 >= sliding_length > 1 yields better results; however, for sliding_length > 24 a size-mismatch error occurs at evaluation. Increasing the padding had less impact than the length, so it would be great if you could clarify the proper usage.

    opened by m13ammed 2
  • where can I download 'electricity.csv'

    where can I download 'electricity.csv'

    Hi Zhihan,

    Thanks for the very useful repository. Could you point me to where I can download 'electricity.csv'? From the link in the README (Electricity dataset), I can only get a file named LD2011_2014.txt.zip, and I'm not sure how to convert it to 'electricity.csv'.

    opened by hehaodele 2
  • Question about padding in downstream tasks

    Question about padding in downstream tasks

    Hi, I have two questions about the code and hope you can clarify them, thanks!

    1. In the downstream forecasting task, padding is set to 200. Does this padding have a counterpart in pre-training, i.e., should padding also be applied during pre-training? If so, must the pre-training padding and the downstream padding be the same? How was the value 200 chosen? The electricity dataset you use is very long and padding is set to 200; if the data is not that long, say only 100 points, what would be a reasonable padding? padding = 200; t = time.time(); all_repr = model.encode(data, casual=True, sliding_length=1, sliding_padding=padding, batch_size=256)
    2. When generating the training data, why are the first padding samples dropped? train_features, train_labels = generate_pred_samples(train_repr, train_data, pred_len, drop=padding)

    Any clarification would be much appreciated, thanks!

    opened by guobing21 1
  • drop=padding in forecasting

    drop=padding in forecasting

    Hi,

    Is there any reason you set drop equal to the padding length for training in forecasting, but not for valid and test? This would train the forecasting function with complete history only.

    https://github.com/yuezhihan/ts2vec/blob/12a737e6561878452fffb68c81c98d24628f274a/tasks/forecasting.py#L46

    opened by opsuisppn 1
  • what is difference between n_instance and n_features?

    what is difference between n_instance and n_features?

    In ts2vec.py, fit() requires train_data of shape (n_instances, n_timestamps, n_features). What is the difference between n_instances and n_features? I think n_features means the number of variables (e.g., a univariate time series has n_features=1). Is n_instances the same as the window size, or something else? Thank you.

    opened by Haebuk 1
  • CUDA out of memory

    CUDA out of memory

    I run python3 train.py ETTm1 mytest --loader forecast_csv

    and got the following error. Could you please help me? Thanks.

    ##############################
    Dataset: ETTm1
    Arguments: Namespace(batch_size=8, dataset='ETTm1', epochs=None, eval=False, gpu=0, irregular=0, iters=None, loader='forecast_csv', lr=0.001, max_threads=None, max_train_length=3000, repr_dims=320, run_name='binh', save_every=None, seed=None)
    Loading data... done
    Traceback (most recent call last):
      File "train.py", line 120, in <module>
        loss_log = model.fit(
      File "/home/binh/experiments/ts2vec/ts2vec.py", line 137, in fit
        loss.backward()
      File "/home/binh/.local/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/binh/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: CUDA out of memory. Tried to allocate 380.00 MiB (GPU 0; 3.82 GiB total capacity; 1.80 GiB already allocated; 254.62 MiB free; 2.25 GiB reserved in total by PyTorch)

    opened by Thanh-Binh 1
  • Format of Yahoo dataset for pre-processing

    Format of Yahoo dataset for pre-processing

    Thank you so much for continuously open-sourcing your findings! I noticed that the downloaded data from Yahoo seems to be in a different format than the one required for preprocess_yahoo.py. Will it be possible for you to look into this? Thank you very much!

    Yahoo follows the format of A1/real_1.csv ... A2/synthetic_1.csv ...

    While the required format is path/1 ... path/367, which seems to contain dictionaries.

    opened by gorold 1
  • The dataset about ETT

    The dataset about ETT

    Hi, it is very nice work, but I have a question about the multivariate time series forecasting results. The GitHub repo https://github.com/zhouhaoyi/ETDataset only offers the ETT-small dataset, which is a univariate time series. I don't know how to use a multivariate ETT dataset to run this code. Thank you very much. Best wishes

    opened by fuyuyuputao 0
  • Questions about downstream tasks

    Questions about downstream tasks

    Hi, I'd like to ask about the downstream tasks.

    Compute timestamp-level representations for test set

    test_repr = model.encode(test_data) # n_instances x n_timestamps x output_dims

    Compute instance-level representations for test set

    test_repr = model.encode(test_data, encoding_window='full_series') # n_instances x output_dims

    1. Can n_instances x n_timestamps x output_dims be understood as batch_size * t * channels?

    2. Is there any performance difference between the timestamp-level and the instance-level representations on classification tasks? Also, I see the code raises the input dimension to 320, which seems different from traditional feature extraction as I understand it.

    opened by sunuo1997 0
  • Rounding error concerning max_train_length

    Rounding error concerning max_train_length

    Hi, I think there is a rounding error concerning the max_train_length

    https://github.com/yuezhihan/ts2vec/blob/631bd533aab3547d1310f4e02a20f3eb53de26be/ts2vec.py#L77-L80

    To crop the data into sequences whose length is at most <max_train_length>, the number of sections should be rounded up.

    For example, in the ETTh1 dataset, cropping the train slice of length 8640 with max_train_length = 201 results in 42 sections of length 206 instead of 43 sections of length 201 (see the numeric sketch after this comment).

    opened by RichardAffolter 0
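
A quick numeric check of the rounding described above (an illustration only, not the repository's code):

import math

n_timestamps, max_train_length = 8640, 201                  # values taken from the comment above
floor_sections = n_timestamps // max_train_length           # 42 -> sections of length ~206
ceil_sections = math.ceil(n_timestamps / max_train_length)  # 43 -> sections of length <= 201
print(floor_sections, ceil_sections)  # 42 43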
  • Training iterations

    Training iterations

    Thanks for putting out this paper; it sounds very promising. Would you mind clarifying the number of iterations you use for self-supervised training? You mention in the paper that you use 600 for datasets larger than 100,000 (with batch size 8). That seems incredibly low; are you sure you don't mean 600 epochs?

    Any clarification would be great, thank you!

    opened by TKassis 0
  • How to use your approach for downstream forecasting tasks

    How to use your approach for downstream forecasting tasks

    Summary

    Thanks for making the code available. I really like the idea of first learning the embeddings in a self-supervised manner and then using a simpler model for forecasting. However, I am struggling with how to use the learned embeddings for the forecasting part.

    Problem Description

    Say you are tasked with forecasting a monthly univariate time series Y = (y1, ..., yT), which is historically available from January 2010 until December 2020. The task is to forecast 2021, with a forecasting horizon of h=12 months. Based on your framework, we use the TCN encoder to learn the embeddings for January 2010 until December 2020. For training the downstream forecasting model, say a ridge regression model, we use the final timestamp of the learned representations. So far so good.

    @yuezhihan & @linytsysu My question is: given the representations and the trained ridge model, how do we forecast 2021, since the data, and hence the representations, are only available until the end of 2020? More specifically, what are the features for the ridge model used to forecast 2021? (See the sketch after this comment.)

    In your Paper, Section C.2 you state that

    For each task, we only use the training set to train the representation model, and apply the model to the testing set to get representations

    Does this mean you show the actual test-data to the model, create the representations/embeddings based on the test-data and then use these to fit the same test-data? Isn't this a simple interpolation of the test-data, using the representations instead of the actuals, rather than forecasting?

    I highly appreciate your comments on this. Many thanks.

    opened by StatMixedML 18
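
A hypothetical sketch of the protocol the question above refers to, reusing model and train_data from the code example earlier on this page (an illustration only; the repository's actual evaluation code lives in the tasks/ folder, and the horizon, padding and alpha values here are assumptions):

import numpy as np
from sklearn.linear_model import Ridge

h = 12  # forecasting horizon from the question

# Causal, per-timestamp representations of the training series (shape: 1 x T x dims)
repr_t = model.encode(train_data, casual=True, sliding_length=1, sliding_padding=200)[0]

# Pair the representation at timestamp t with the next h observations y_{t+1..t+h}
# (assumes a univariate series: feature index 0)
T = train_data.shape[1]
feats = np.stack([repr_t[t] for t in range(T - h)])                      # (T-h) x dims
targets = np.stack([train_data[0, t+1:t+1+h, 0] for t in range(T - h)])  # (T-h) x h

ridge = Ridge(alpha=0.1).fit(feats, targets)

# To forecast beyond the last observation, the features are simply the
# representation at the final observed timestamp
forecast = ridge.predict(repr_t[-1:])  # shape: 1 x h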
  • A part about random cropping in the code that I don't understand

    A part about random cropping in the code that I don't understand

    For the part below, why subtract crop_eleft from crop_right instead of subtracting crop_left?

    out1 = self._net(take_per_row(x, crop_offset + crop_eleft, crop_right - crop_eleft))
    out1 = out1[:, -crop_l:]
    out2 = self._net(take_per_row(x, crop_offset + crop_left, crop_eright - crop_left))
    out2 = out2[:, :crop_l]
    opened by meihuameii 2
Owner
Zhihan Yue