TICC is a python solver for efficiently segmenting and clustering a multivariate time series

Last update: Dec 12, 2022

Related tags

Deep Learning TICC

Overview

TICC

TICC is a python solver for efficiently segmenting and clustering a multivariate time series. It takes as input a T-by-n data matrix, a regularization parameter lambda and smoothness parameter beta, the window size w and the number of clusters k. TICC breaks the T timestamps into segments where each segment belongs to one of the k clusters. The total number of segments is affected by the smoothness parameter beta. It does so by running an EM algorithm where TICC alternately assigns points to clusters using a dynamic programming algorithm and updates the cluster parameters by solving a Toeplitz Inverse Covariance Estimation problem.

For details about the method and implementation see the paper [1].

Download & Setup

Download the source code, by running in the terminal:

git clone https://github.com/davidhallac/TICC.git

Using TICC

The TICC-constructor takes the following parameters:

window_size: the size of the sliding window
number_of_clusters: the number of underlying clusters 'k'
lambda_parameter: sparsity of the Markov Random Field (MRF) for each of the clusters. The sparsity of the inverse covariance matrix of each cluster.
beta: The switching penalty used in the TICC algorithm. Same as the beta parameter described in the paper.
maxIters: the maximum iterations of the TICC algorithm before convergence. Default value is 100.
threshold: convergence threshold
write_out_file: Boolean. Flag indicating if the computed inverse covariances for each of the clusters should be saved.
prefix_string: Location of the folder to which you want to save the outputs.

The TICC.fit(input_file)-function runs the TICC algorithm on a specific dataset to learn the model parameters.

input_file: Location of the data matrix of size T-by-n.

An array of cluster assignments for each time point is returned in the form of a dictionary with keys being the cluster_id (from 0 to k-1) and the values being the cluster MRFs.

Example Usage

See example.py.

References

[1] D. Hallac, S. Vare, S. Boyd, and J. Leskovec Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 215--223

Comments

Warning and Can't converge problem
Hi David,

I applied your algorithm to my data set and found two issues:

I sometimes obtain the warning like this: /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:2476: RuntimeWarning: Degrees of freedom <= 0 for slice warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning) Is it serious warning? should I stop or just ignore it.

Oscillation between solutions ('\n\n\nITERATION ###', 15) ('OPTIMIZATION for Cluster #', 0, 'DONE!!!') ('OPTIMIZATION for Cluster #', 2, 'DONE!!!') ('OPTIMIZATION for Cluster #', 3, 'DONE!!!') ('length of the cluster ', 0, '------>', 739) ('length of the cluster ', 1, '------>', 0) ('length of the cluster ', 2, '------>', 20) ('length of the cluster ', 3, '------>', 20) ('length of the cluster ', 4, '------>', 0) UPDATED THE OLD COVARIANCE beginning the smoothening ALGORITHM ('cluster that is zero is:', 1, 'selected cluster instead is:', 0) ('cluster that is zero is:', 4, 'selected cluster instead is:', 3) ('length of cluster #', 0, '-------->', 739) ('length of cluster #', 1, '-------->', 20) ('length of cluster #', 2, '-------->', 0) ('length of cluster #', 3, '-------->', 0) ('length of cluster #', 4, '-------->', 20) Done writing the figure

('\n\n\nITERATION ###', 16) ('OPTIMIZATION for Cluster #', 0, 'DONE!!!') ('OPTIMIZATION for Cluster #', 1, 'DONE!!!') ('OPTIMIZATION for Cluster #', 4, 'DONE!!!') ('length of the cluster ', 0, '------>', 739) ('length of the cluster ', 1, '------>', 20) ('length of the cluster ', 2, '------>', 0) ('length of the cluster ', 3, '------>', 0) ('length of the cluster ', 4, '------>', 20) UPDATED THE OLD COVARIANCE beginning the smoothening ALGORITHM ('cluster that is zero is:', 2, 'selected cluster instead is:', 0) ('cluster that is zero is:', 3, 'selected cluster instead is:', 4) ('length of cluster #', 0, '-------->', 739) ('length of cluster #', 1, '-------->', 0) ('length of cluster #', 2, '-------->', 20) ('length of cluster #', 3, '-------->', 20) ('length of cluster #', 4, '-------->', 0) Done writing the figure

The runs continuously oscillate between two outcomes (739, 0, 20, 20, 0) and (739, 20, 0, 0, 20). Should I stop the run and choose one of them? What happens here and how to avoid this?

Thanks,
opened by vutlle 8
Implement `predict()` method?
Hello, great work on your paper.

Results of the algorithm look promising. Unfortunately, it seems like I'll have to call solve() method with all training dataset to cluster any new data every time. That's not efficient.

Now I have few questions:

Is it possible to implement predict method on an already trained model?

Could we add load_model and save_model methods?

Will you be following scikit-learn interfaces in the future (__init__,fit, predict, etc)?

I could definitely contribute but after 2 hours of attempting to refactor the code, it seems like there are plenty of dependent variables and super long methods. How can I help?
opened by mohataher 8
Using TICC for online clustering

Hi David,

Can we use TICC for online clustering time series? For instance, we want to identify the state the car is in during driving, given a set of learned states using TICC for batch learning.

Thanks,

opened by vutlle 6
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I was following the readme.md and when I was executing this (cluster_assignment, cluster_MRFs) = TICC_solver.solve(window_size = 10,number_of_clusters = 5, lambda_parameter = 11e-2, beta = 400, maxIters = 70, threshold = 2e-5, write_out_file = True, input_file = "data.csv", prefix_string = "output_folder/") it throws me error: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

I am using anaconda python 2.7 with cvxpy and snap just installed today.

opened by rguo12 6

where is the paper?

Please give the name of paper or url, Thanks!

BTW: windows （64bit）after run generate_synthetic_data.py, run TICC.py, find this in output:

RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()

opened by mikewin 6

Eigenvalues did not converge

Hello! because I am trying to test my data in your example.py but I got an error: numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge Do you know what's wrong with this? Thank you very much

opened by yuyingg0921 5
Example stops at iteration zero

Hi Could you please help me with this. As I tried to run example, I constantly received similar result: it printed ITERATION ### 0 and did not proceed. I will be beyond grateful if you could help me figuring out what is going wrong.

opened by elmiraberjisian 5
generate synthetic data

Hi,

I want to ask about how do yo generate synthetic data based on previous samples? In your paper, you mentioned that "The data is then drawn one sample at a time, conditioned on the values of the previous w − 1 samples.". I know you generate w samples (n * w array) using "numpy.random.multinomial", but how do you generate w samples based on previous w − 1 samples? It means how do I generate 1 sample given w - 1 samples?

Thank you for your time.

opened by blacksnail789521 4
Some of the value in "optRes" will be nan.

Hi, Could you please help me with this.

I have a problem running your code on my dataset. When I'm in Iteration 0 and calling the function "train_clusters", the return value "optRes" seems to contain some nan. So when the function "optimize_clusters" got the value "optRes" from the function "train_clusters", it raised the error.

I did print out the value of "val" every time. Here is the output and the error information: ################################################## ITERATION ### 0 /usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:497: RuntimeWarning: Degrees of freedom <= 0 for slice /usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py:3093: RuntimeWarning: divide by zero encountered in double_scalars c *= 1. / np.float64(fact) /usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py:3093: RuntimeWarning: invalid value encountered in multiply c *= 1. / np.float64(fact) val: [0.2301 0.1450 -0.0000 ..., 0.9513 -0.0000 0.3904] OPTIMIZATION for Cluster # 0 DONE!!! val: [0.7449 0.0000 -0.0000 ..., 1.0000 -0.0000 0.5509] OPTIMIZATION for Cluster # 1 DONE!!! val: [0.9931 0.0000 0.0000 ..., 0.7676 -0.0000 0.5487] OPTIMIZATION for Cluster # 2 DONE!!! val: [0.4735 0.0000 0.0000 ..., 0.7480 0.0000 0.3804] OPTIMIZATION for Cluster # 3 DONE!!! val: [0.1999 0.1088 0.0000 ..., 0.8651 0.0000 0.5592] OPTIMIZATION for Cluster # 4 DONE!!! val: [0.0202 0.0314 0.0143 ..., 0.7245 0.0000 0.1511] OPTIMIZATION for Cluster # 5 DONE!!! val: [nan nan nan ..., nan nan nan] OPTIMIZATION for Cluster # 6 DONE!!! /usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:567: RuntimeWarning: invalid value encountered in less /usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:567: RuntimeWarning: invalid value encountered in greater

LinAlgErrorTraceback (most recent call last) in fit(self, input_file) 274 #print("opt_res: " + str(opt_res)) 275 self.optimize_clusters(computed_covariance, len_train_clusters, log_det_values, opt_res, --> 276 train_cluster_inverse) 277 278 # update old computed covariance

in optimize_clusters(self, computed_covariance, len_train_clusters, log_det_values, optRes, train_cluster_inverse) 467 X2 = S_est 468 #print(S_est) --> 469 u, _ = np.linalg.eig(S_est) 470 cov_out = np.linalg.inv(X2) 471

/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in eig(a) 1126 _assertRankAtLeast2(a) 1127 _assertNdSquareness(a) -> 1128 _assertFinite(a) 1129 t, result_t = _commonType(a) 1130

/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _assertFinite(*arrays) 215 for a in arrays: 216 if not (isfinite(a).all()): --> 217 raise LinAlgError("Array must not contain infs or NaNs") 218 219 def _isEmpty2d(arr):

LinAlgError: Array must not contain infs or NaNs ##################################################

I assume that there are some problems when the program calculate (use) the ADMM_solver. Do you know what's wrong with this? Thank you for your time.

Btw, I did normalize my dataset and set lambda to 0.1, 1, ...

opened by blacksnail789521 4

`TestStringMethods.test_multiExample` fails when run alone

I have a had a look at your updated code. Interestingly enough, when running the two tests cases in TestStringMethods together, they both pass.

However, when running TestStringMethods.test_multiExample, it fails. I'm not sure why exactly. Do you have an explanation for this?

I ran that using Mac Pro machine, Python 3.6.4 (miniconda). Which machine and Python version did you use?

When test case passes, this is a snippet of the output

`# PASSED`
lam_sparse 0.11
switch_penalty 600
num_cluster 5
num stacked 5
completed getting the data



ITERATION ### 0
OPTIMIZATION for Cluster # 0 DONE!!!
OPTIMIZATION for Cluster # 1 DONE!!!
OPTIMIZATION for Cluster # 2 DONE!!!
OPTIMIZATION for Cluster # 3 DONE!!!
OPTIMIZATION for Cluster # 4 DONE!!!
length of the cluster  0 ------> 2851.       ` # <---- these numbers are different than the failed one.`
length of the cluster  1 ------> 2698
length of the cluster  2 ------> 2094
length of the cluster  3 ------> 7522
length of the cluster  4 ------> 4438
UPDATED THE OLD COVARIANCE
beginning the smoothening ALGORITHM
length of cluster # 0 --------> 2131
length of cluster # 1 --------> 1886
length of cluster # 2 --------> 3072
length of cluster # 3 --------> 7915
length of cluster # 4 --------> 4599
Done writing the figure

And that is a snippet when it fails

 `# FAILED`
lam_sparse 0.11
switch_penalty 600
num_cluster 5
num stacked 5
completed getting the data



ITERATION ### 0
OPTIMIZATION for Cluster # 0 DONE!!!
OPTIMIZATION for Cluster # 1 DONE!!!
OPTIMIZATION for Cluster # 2 DONE!!!
OPTIMIZATION for Cluster # 3 DONE!!!
OPTIMIZATION for Cluster # 4 DONE!!!
length of the cluster  0 ------> 7851.   `# <-----  really higher than when it passed`
length of the cluster  1 ------> 2246
length of the cluster  2 ------> 2261
length of the cluster  3 ------> 2613
length of the cluster  4 ------> 4632
UPDATED THE OLD COVARIANCE
beginning the smoothening ALGORITHM
length of cluster # 0 --------> 7957
length of cluster # 1 --------> 1617
length of cluster # 2 --------> 3102
length of cluster # 3 --------> 3280
length of cluster # 4 --------> 3647
Done writing the figure

The failed test case has successfully convergred, but still fails

CONVERGED!!! BREAKING EARLY!!!



TRAINING F1 score: -1 -1 -1


Failure
Traceback (most recent call last):
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/Users/motaher/dev/PycharmProjects/ORIGINAL_TICC/UnitTest.py", line 33, in test_multiExample
    self.assertEqual(sum(val), 0)
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 829, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 822, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: 45059.0 != 0


Ran 1 test in 53.887s

FAILED (failures=1)

That could be due to the fact that states in stored file and states coming from the algorithms were in different values?

opened by mohataher 4

switch to python 3

Switch TICC solver to python 3

The mammoth changes in TICC_helper are just converting all the indentation to spaces instead of a mix of tabs and spaces (which python 3 will break on)

For some reason this change breaks unit tests, but only slightly? The assignments stay the same, but the cluster MRFs are ever so slightly different: 5.9496 vs 5.9497 on the second diagonal entry (which shouldn't matter anyway?).

opened by scoutsaachi 4
how to get betweenness centrality for each sensor based on MRF network?

how to get betweenness centrality for each sensor based on MRF network, I’m trying to use networkx which is a Python library for studying graphs and networks to calculate betweenness centrality,but I’m confused about how to use it based on MRF network,at the same time, I couldn’t find the part in your code(https://github.com/davidhallac/TICC),could you be so kind as to give me the code about how to get betweenness centrality for each sensor based on MRF network?

opened by ykchasingwind 0
smoothen_clusters: LLE of last points not computed ?

Hi all,

I had a question regarding the smoothen_clusters function:

If I understand well, this function compute the LLE for all points. Hower, the follwoing condition avoid to compute the LLE of the last points... Any particular reason for that ? Since the vector LLE_all_points_clusters is of size [clustered_points_len, self.number_of_clusters], the last point have a LLE fixed to zero... which semms strange!

if point + self.window_size - 1 < complete_D_train.shape[0]:

Thanks in advance, Flo

opened by Ravisik 0
How to use the trained model on unseen data

It's not clear to me how to use TICC after training on test data.

Say I have a dataset X with 100 samples and I want to train on the first 50 and test it on the remaining 50. Could you elaborate on how to do that?

So basically when you have new points how are those assigned to the cluster? From what I understand I need to calculate the LLE matrix of dimension n_samples x n _clusters and then run the updateClusters function which is the dynamic programming algorithm that, given the trained inverse covariances for each cluster, works out the optimal cluster assignment for each point.

My question is do I need to concatenate the trained LLE with the new LLE deriving from the test data points? Because the results will change depending on if I do that or not.

Many thanks Gio

opened by gioxc88 2
difference in the optimization of Graphical Lasso between the paper_code TICC.py and in TICC_solver.py

I noticed that the TICC_solver.py and the TICC.py in the paper_code folder have two different implementation of the optimization for the Graphical Lasso.

For TICC_solver.py you use a custom class called ADMMSolver and the process seems pretty straightforward whereas in the TICC.py you use a class called TGraphV which also uses the ADMMSolver but it's a much more complicated implementation and brings additional dependencies like cvxpy and snap

I was wondering if there is any major difference between the two or anything else the user should be aware of

opened by gioxc88 0
difference between TICC_solver.py and TICC.py in paper_code folder?

Hello and thank you for the wonderful paper I noticed that the two implementation are quite different.

In the paper_code the implementation seems more complicated and it uses the SolverCrossTime and cvxpy which are not used in TICC_solver.py (as it only uses ADMM_solver.

Would please clarify on this aspect?

Many thanks Gio

opened by gioxc88 0

TICC is a python solver for efficiently segmenting and clustering a multivariate time series

Related tags

Overview

TICC

Download & Setup

Using TICC

Example Usage

References

Comments

Owner

Spectral Temporal Graph Neural Network (StemGNN in short) for Multivariate Time-series Forecasting

Deep learning-based approach to discovering Granger causality networks in multivariate time series

Code for the CIKM 2019 paper "DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting".

USAD - UnSupervised Anomaly Detection on multivariate time series

Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."

This repo contains the code required to train the multivariate time-series Transformer.

"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOTR: Segmenting Objects with Transformers [ICCV 2021]

Time-series-deep-learning - Developing Deep learning LSTM, BiLSTM models, and NeuralProphet for multi-step time-series forecasting of stock price.

Sudoku solver - A sudoku solver with python

Simple Linear 2nd ODE Solver GUI - A 2nd constant coefficient linear ODE solver with simple GUI using euler's method

Graph Regularized Residual Subspace Clustering Network for hyperspectral image clustering

Awesome Deep Graph Clustering is a collection of SOTA, novel deep graph clustering methods

Multivariate Boosted TRee

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.

2D Time independent Schrodinger equation solver for arbitrary shape of well