TICC is a python solver for efficiently segmenting and clustering a multivariate time series

Related tags

Deep Learning TICC
Overview

TICC

TICC is a python solver for efficiently segmenting and clustering a multivariate time series. It takes as input a T-by-n data matrix, a regularization parameter lambda and smoothness parameter beta, the window size w and the number of clusters k. TICC breaks the T timestamps into segments where each segment belongs to one of the k clusters. The total number of segments is affected by the smoothness parameter beta. It does so by running an EM algorithm where TICC alternately assigns points to clusters using a dynamic programming algorithm and updates the cluster parameters by solving a Toeplitz Inverse Covariance Estimation problem.

For details about the method and implementation see the paper [1].

Download & Setup

Download the source code, by running in the terminal:

git clone https://github.com/davidhallac/TICC.git

Using TICC

The TICC-constructor takes the following parameters:

  • window_size: the size of the sliding window
  • number_of_clusters: the number of underlying clusters 'k'
  • lambda_parameter: sparsity of the Markov Random Field (MRF) for each of the clusters. The sparsity of the inverse covariance matrix of each cluster.
  • beta: The switching penalty used in the TICC algorithm. Same as the beta parameter described in the paper.
  • maxIters: the maximum iterations of the TICC algorithm before convergence. Default value is 100.
  • threshold: convergence threshold
  • write_out_file: Boolean. Flag indicating if the computed inverse covariances for each of the clusters should be saved.
  • prefix_string: Location of the folder to which you want to save the outputs.

The TICC.fit(input_file)-function runs the TICC algorithm on a specific dataset to learn the model parameters.

  • input_file: Location of the data matrix of size T-by-n.

An array of cluster assignments for each time point is returned in the form of a dictionary with keys being the cluster_id (from 0 to k-1) and the values being the cluster MRFs.

Example Usage

See example.py.

References

[1] D. Hallac, S. Vare, S. Boyd, and J. Leskovec Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 215--223

Comments
  • Warning and Can't converge problem

    Warning and Can't converge problem

    Hi David,

    I applied your algorithm to my data set and found two issues:

    1. I sometimes obtain the warning like this: /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:2476: RuntimeWarning: Degrees of freedom <= 0 for slice warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning) Is it serious warning? should I stop or just ignore it.

    2. Oscillation between solutions ('\n\n\nITERATION ###', 15) ('OPTIMIZATION for Cluster #', 0, 'DONE!!!') ('OPTIMIZATION for Cluster #', 2, 'DONE!!!') ('OPTIMIZATION for Cluster #', 3, 'DONE!!!') ('length of the cluster ', 0, '------>', 739) ('length of the cluster ', 1, '------>', 0) ('length of the cluster ', 2, '------>', 20) ('length of the cluster ', 3, '------>', 20) ('length of the cluster ', 4, '------>', 0) UPDATED THE OLD COVARIANCE beginning the smoothening ALGORITHM ('cluster that is zero is:', 1, 'selected cluster instead is:', 0) ('cluster that is zero is:', 4, 'selected cluster instead is:', 3) ('length of cluster #', 0, '-------->', 739) ('length of cluster #', 1, '-------->', 20) ('length of cluster #', 2, '-------->', 0) ('length of cluster #', 3, '-------->', 0) ('length of cluster #', 4, '-------->', 20) Done writing the figure

      ('\n\n\nITERATION ###', 16) ('OPTIMIZATION for Cluster #', 0, 'DONE!!!') ('OPTIMIZATION for Cluster #', 1, 'DONE!!!') ('OPTIMIZATION for Cluster #', 4, 'DONE!!!') ('length of the cluster ', 0, '------>', 739) ('length of the cluster ', 1, '------>', 20) ('length of the cluster ', 2, '------>', 0) ('length of the cluster ', 3, '------>', 0) ('length of the cluster ', 4, '------>', 20) UPDATED THE OLD COVARIANCE beginning the smoothening ALGORITHM ('cluster that is zero is:', 2, 'selected cluster instead is:', 0) ('cluster that is zero is:', 3, 'selected cluster instead is:', 4) ('length of cluster #', 0, '-------->', 739) ('length of cluster #', 1, '-------->', 0) ('length of cluster #', 2, '-------->', 20) ('length of cluster #', 3, '-------->', 20) ('length of cluster #', 4, '-------->', 0) Done writing the figure

      The runs continuously oscillate between two outcomes (739, 0, 20, 20, 0) and (739, 20, 0, 0, 20). Should I stop the run and choose one of them? What happens here and how to avoid this?

    Thanks,

    opened by vutlle 8
  • Implement `predict()` method?

    Implement `predict()` method?

    Hello, great work on your paper.

    Results of the algorithm look promising. Unfortunately, it seems like I'll have to call solve() method with all training dataset to cluster any new data every time. That's not efficient.

    Now I have few questions:

    1. Is it possible to implement predict method on an already trained model?
    2. Could we add load_model and save_model methods?
    3. Will you be following scikit-learn interfaces in the future (__init__,fit, predict, etc)?

    I could definitely contribute but after 2 hours of attempting to refactor the code, it seems like there are plenty of dependent variables and super long methods. How can I help?

    opened by mohataher 8
  • Using TICC for online clustering

    Using TICC for online clustering

    Hi David,

    Can we use TICC for online clustering time series? For instance, we want to identify the state the car is in during driving, given a set of learned states using TICC for batch learning.

    Thanks,

    opened by vutlle 6
  • IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

    I was following the readme.md and when I was executing this (cluster_assignment, cluster_MRFs) = TICC_solver.solve(window_size = 10,number_of_clusters = 5, lambda_parameter = 11e-2, beta = 400, maxIters = 70, threshold = 2e-5, write_out_file = True, input_file = "data.csv", prefix_string = "output_folder/") it throws me error: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

    I am using anaconda python 2.7 with cvxpy and snap just installed today.

    opened by rguo12 6
  • where is the paper?

    where is the paper?

    Please give the name of paper or url, Thanks!

    BTW: windows (64bit)after run generate_synthetic_data.py, run TICC.py, find this in output:

    RuntimeError: 
                Attempt to start a new process before the current process
                has finished its bootstrapping phase.
    
                This probably means that you are on Windows and you have
                forgotten to use the proper idiom in the main module:
    
                    if __name__ == '__main__':
                        freeze_support()
    
    opened by mikewin 6
  • Eigenvalues did not converge

    Eigenvalues did not converge

    Hello! because I am trying to test my data in your example.py but I got an error: numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge Do you know what's wrong with this? Thank you very much

    opened by yuyingg0921 5
  • Example stops at iteration zero

    Example stops at iteration zero

    Hi Could you please help me with this. As I tried to run example, I constantly received similar result: it printed ITERATION ### 0 and did not proceed. I will be beyond grateful if you could help me figuring out what is going wrong.

    opened by elmiraberjisian 5
  • generate synthetic data

    generate synthetic data

    Hi,

    I want to ask about how do yo generate synthetic data based on previous samples? In your paper, you mentioned that "The data is then drawn one sample at a time, conditioned on the values of the previous w − 1 samples.". I know you generate w samples (n * w array) using "numpy.random.multinomial", but how do you generate w samples based on previous w − 1 samples? It means how do I generate 1 sample given w - 1 samples?

    Thank you for your time.

    opened by blacksnail789521 4
  • Some of the value in

    Some of the value in "optRes" will be nan.

    Hi, Could you please help me with this.

    I have a problem running your code on my dataset. When I'm in Iteration 0 and calling the function "train_clusters", the return value "optRes" seems to contain some nan. So when the function "optimize_clusters" got the value "optRes" from the function "train_clusters", it raised the error.

    I did print out the value of "val" every time. Here is the output and the error information: ################################################## ITERATION ### 0 /usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:497: RuntimeWarning: Degrees of freedom <= 0 for slice /usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py:3093: RuntimeWarning: divide by zero encountered in double_scalars c *= 1. / np.float64(fact) /usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py:3093: RuntimeWarning: invalid value encountered in multiply c *= 1. / np.float64(fact) val: [0.2301 0.1450 -0.0000 ..., 0.9513 -0.0000 0.3904] OPTIMIZATION for Cluster # 0 DONE!!! val: [0.7449 0.0000 -0.0000 ..., 1.0000 -0.0000 0.5509] OPTIMIZATION for Cluster # 1 DONE!!! val: [0.9931 0.0000 0.0000 ..., 0.7676 -0.0000 0.5487] OPTIMIZATION for Cluster # 2 DONE!!! val: [0.4735 0.0000 0.0000 ..., 0.7480 0.0000 0.3804] OPTIMIZATION for Cluster # 3 DONE!!! val: [0.1999 0.1088 0.0000 ..., 0.8651 0.0000 0.5592] OPTIMIZATION for Cluster # 4 DONE!!! val: [0.0202 0.0314 0.0143 ..., 0.7245 0.0000 0.1511] OPTIMIZATION for Cluster # 5 DONE!!! val: [nan nan nan ..., nan nan nan] OPTIMIZATION for Cluster # 6 DONE!!! /usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:567: RuntimeWarning: invalid value encountered in less /usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:567: RuntimeWarning: invalid value encountered in greater

    LinAlgErrorTraceback (most recent call last) in fit(self, input_file) 274 #print("opt_res: " + str(opt_res)) 275 self.optimize_clusters(computed_covariance, len_train_clusters, log_det_values, opt_res, --> 276 train_cluster_inverse) 277 278 # update old computed covariance

    in optimize_clusters(self, computed_covariance, len_train_clusters, log_det_values, optRes, train_cluster_inverse) 467 X2 = S_est 468 #print(S_est) --> 469 u, _ = np.linalg.eig(S_est) 470 cov_out = np.linalg.inv(X2) 471

    /usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in eig(a) 1126 _assertRankAtLeast2(a) 1127 _assertNdSquareness(a) -> 1128 _assertFinite(a) 1129 t, result_t = _commonType(a) 1130

    /usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _assertFinite(*arrays) 215 for a in arrays: 216 if not (isfinite(a).all()): --> 217 raise LinAlgError("Array must not contain infs or NaNs") 218 219 def _isEmpty2d(arr):

    LinAlgError: Array must not contain infs or NaNs ##################################################

    I assume that there are some problems when the program calculate (use) the ADMM_solver. Do you know what's wrong with this? Thank you for your time.

    Btw, I did normalize my dataset and set lambda to 0.1, 1, ...

    opened by blacksnail789521 4
  • `TestStringMethods.test_multiExample` fails when run alone

    `TestStringMethods.test_multiExample` fails when run alone

    I have a had a look at your updated code. Interestingly enough, when running the two tests cases in TestStringMethods together, they both pass.

    However, when running TestStringMethods.test_multiExample, it fails. I'm not sure why exactly. Do you have an explanation for this?

    I ran that using Mac Pro machine, Python 3.6.4 (miniconda). Which machine and Python version did you use?

    When test case passes, this is a snippet of the output

    `# PASSED`
    lam_sparse 0.11
    switch_penalty 600
    num_cluster 5
    num stacked 5
    completed getting the data
    
    
    
    ITERATION ### 0
    OPTIMIZATION for Cluster # 0 DONE!!!
    OPTIMIZATION for Cluster # 1 DONE!!!
    OPTIMIZATION for Cluster # 2 DONE!!!
    OPTIMIZATION for Cluster # 3 DONE!!!
    OPTIMIZATION for Cluster # 4 DONE!!!
    length of the cluster  0 ------> 2851.       ` # <---- these numbers are different than the failed one.`
    length of the cluster  1 ------> 2698
    length of the cluster  2 ------> 2094
    length of the cluster  3 ------> 7522
    length of the cluster  4 ------> 4438
    UPDATED THE OLD COVARIANCE
    beginning the smoothening ALGORITHM
    length of cluster # 0 --------> 2131
    length of cluster # 1 --------> 1886
    length of cluster # 2 --------> 3072
    length of cluster # 3 --------> 7915
    length of cluster # 4 --------> 4599
    Done writing the figure
    

    And that is a snippet when it fails

     `# FAILED`
    lam_sparse 0.11
    switch_penalty 600
    num_cluster 5
    num stacked 5
    completed getting the data
    
    
    
    ITERATION ### 0
    OPTIMIZATION for Cluster # 0 DONE!!!
    OPTIMIZATION for Cluster # 1 DONE!!!
    OPTIMIZATION for Cluster # 2 DONE!!!
    OPTIMIZATION for Cluster # 3 DONE!!!
    OPTIMIZATION for Cluster # 4 DONE!!!
    length of the cluster  0 ------> 7851.   `# <-----  really higher than when it passed`
    length of the cluster  1 ------> 2246
    length of the cluster  2 ------> 2261
    length of the cluster  3 ------> 2613
    length of the cluster  4 ------> 4632
    UPDATED THE OLD COVARIANCE
    beginning the smoothening ALGORITHM
    length of cluster # 0 --------> 7957
    length of cluster # 1 --------> 1617
    length of cluster # 2 --------> 3102
    length of cluster # 3 --------> 3280
    length of cluster # 4 --------> 3647
    Done writing the figure
    
    

    The failed test case has successfully convergred, but still fails

    CONVERGED!!! BREAKING EARLY!!!
    
    
    
    TRAINING F1 score: -1 -1 -1
    
    
    Failure
    Traceback (most recent call last):
      File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
        yield
      File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 605, in run
        testMethod()
      File "/Users/motaher/dev/PycharmProjects/ORIGINAL_TICC/UnitTest.py", line 33, in test_multiExample
        self.assertEqual(sum(val), 0)
      File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 829, in assertEqual
        assertion_func(first, second, msg=msg)
      File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 822, in _baseAssertEqual
        raise self.failureException(msg)
    AssertionError: 45059.0 != 0
    
    
    Ran 1 test in 53.887s
    
    FAILED (failures=1)
    

    That could be due to the fact that states in stored file and states coming from the algorithms were in different values?

    opened by mohataher 4
  • switch to python 3

    switch to python 3

    Switch TICC solver to python 3

    The mammoth changes in TICC_helper are just converting all the indentation to spaces instead of a mix of tabs and spaces (which python 3 will break on)

    For some reason this change breaks unit tests, but only slightly? The assignments stay the same, but the cluster MRFs are ever so slightly different: 5.9496 vs 5.9497 on the second diagonal entry (which shouldn't matter anyway?).

    opened by scoutsaachi 4
  • how to get betweenness centrality for each sensor based on MRF network?

    how to get betweenness centrality for each sensor based on MRF network?

    how to get betweenness centrality for each sensor based on MRF network, I’m trying to use networkx which is a Python library for studying graphs and networks to calculate betweenness centrality,but I’m confused about how to use it based on MRF network,at the same time, I couldn’t find the part in your code(https://github.com/davidhallac/TICC),could you be so kind as to give me the code about how to get betweenness centrality for each sensor based on MRF network?

    opened by ykchasingwind 0
  • smoothen_clusters: LLE of last points not computed ?

    smoothen_clusters: LLE of last points not computed ?

    Hi all,

    I had a question regarding the smoothen_clusters function:

    If I understand well, this function compute the LLE for all points. Hower, the follwoing condition avoid to compute the LLE of the last points... Any particular reason for that ? Since the vector LLE_all_points_clusters is of size [clustered_points_len, self.number_of_clusters], the last point have a LLE fixed to zero... which semms strange!

    if point + self.window_size - 1 < complete_D_train.shape[0]:

    Thanks in advance, Flo

    opened by Ravisik 0
  • How to use the trained model on unseen data

    How to use the trained model on unseen data

    It's not clear to me how to use TICC after training on test data.

    Say I have a dataset X with 100 samples and I want to train on the first 50 and test it on the remaining 50. Could you elaborate on how to do that?

    So basically when you have new points how are those assigned to the cluster? From what I understand I need to calculate the LLE matrix of dimension n_samples x n _clusters and then run the updateClusters function which is the dynamic programming algorithm that, given the trained inverse covariances for each cluster, works out the optimal cluster assignment for each point.

    My question is do I need to concatenate the trained LLE with the new LLE deriving from the test data points? Because the results will change depending on if I do that or not.

    Many thanks Gio

    opened by gioxc88 2
  • difference in the optimization of Graphical Lasso between the paper_code TICC.py and in TICC_solver.py

    difference in the optimization of Graphical Lasso between the paper_code TICC.py and in TICC_solver.py

    I noticed that the TICC_solver.py and the TICC.py in the paper_code folder have two different implementation of the optimization for the Graphical Lasso.

    For TICC_solver.py you use a custom class called ADMMSolver and the process seems pretty straightforward whereas in the TICC.py you use a class called TGraphV which also uses the ADMMSolver but it's a much more complicated implementation and brings additional dependencies like cvxpy and snap

    I was wondering if there is any major difference between the two or anything else the user should be aware of

    opened by gioxc88 0
  • difference between TICC_solver.py and TICC.py in paper_code folder?

    difference between TICC_solver.py and TICC.py in paper_code folder?

    Hello and thank you for the wonderful paper I noticed that the two implementation are quite different.

    In the paper_code the implementation seems more complicated and it uses the SolverCrossTime and cvxpy which are not used in TICC_solver.py (as it only uses ADMM_solver.

    Would please clarify on this aspect?

    Many thanks Gio

    opened by gioxc88 0
Owner
www.viaduct.ai
null
Spectral Temporal Graph Neural Network (StemGNN in short) for Multivariate Time-series Forecasting

Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting This repository is the official implementation of Spectral Temporal Gr

Microsoft 306 Dec 29, 2022
Ian Covert 130 Jan 1, 2023
Code for the CIKM 2019 paper "DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting".

Dual Self-Attention Network for Multivariate Time Series Forecasting 20.10.26 Update: Due to the difficulty of installation and code maintenance cause

Kyon Huang 223 Dec 16, 2022
USAD - UnSupervised Anomaly Detection on multivariate time series

USAD - UnSupervised Anomaly Detection on multivariate time series Scripts and utility programs for implementing the USAD architecture. Implementation

null 116 Jan 4, 2023
Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."

Spacetimeformer Multivariate Forecasting This repository contains the code for the paper, "Long-Range Transformers for Dynamic Spatiotemporal Forecast

QData 440 Jan 2, 2023
This repo contains the code required to train the multivariate time-series Transformer.

Multi-Variate Time-Series Transformer This repo contains the code required to train the multivariate time-series Transformer. Download the data The No

Gregory Duthé 4 Nov 24, 2022
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

MEGVII Research 179 Jan 2, 2023
SOTR: Segmenting Objects with Transformers [ICCV 2021]

SOTR: Segmenting Objects with Transformers [ICCV 2021] By Ruohao Guo, Dantong Niu, Liao Qu, Zhenbo Li Introduction This is the official implementation

null 186 Dec 20, 2022
Time-series-deep-learning - Developing Deep learning LSTM, BiLSTM models, and NeuralProphet for multi-step time-series forecasting of stock price.

Stock Price Prediction Using Deep Learning Univariate Time Series Predicting stock price using historical data of a company using Neural networks for

Abdultawwab Safarji 7 Nov 27, 2022
Sudoku solver - A sudoku solver with python

sudoku_solver A sudoku solver What is Sudoku? Sudoku (Japanese: 数独, romanized: s

Sikai Lu 0 May 22, 2022
Simple Linear 2nd ODE Solver GUI - A 2nd constant coefficient linear ODE solver with simple GUI using euler's method

Simple_Linear_2nd_ODE_Solver_GUI Description It is a 2nd constant coefficient li

:) 4 Feb 5, 2022
Graph Regularized Residual Subspace Clustering Network for hyperspectral image clustering

Graph Regularized Residual Subspace Clustering Network for hyperspectral image clustering

Yaoming Cai 5 Jul 18, 2022
Awesome Deep Graph Clustering is a collection of SOTA, novel deep graph clustering methods

ADGC: Awesome Deep Graph Clustering ADGC is a collection of state-of-the-art (SOTA), novel deep graph clustering methods (papers, codes and datasets).

yueliu1999 297 Dec 27, 2022
Multivariate Boosted TRee

Multivariate Boosted TRee What is MBTR MBTR is a python package for multivariate boosted tree regressors trained in parameter space. The package can h

SUPSI-DACD-ISAAC 61 Dec 19, 2022
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano <https:

null 9.6k Dec 31, 2022
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano <https:

null 9.6k Jan 6, 2023
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano <https:

null 9.3k Feb 12, 2021
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.

Knodle (Knowledge-supervised Deep Learning Framework) - a new framework for weak supervision with neural networks. It provides a modularization for se

null 93 Nov 6, 2022
2D Time independent Schrodinger equation solver for arbitrary shape of well

Schrodinger Well Python Python solver for timeless Schrodinger equation for well with arbitrary shape https://imgur.com/a/jlhK7OZ Pictures of circular

WeightAn 24 Nov 18, 2022