MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

MINIROCKET

arXiv:2012.08791 (preprint)

Until recently, the most accurate methods for time series classification were limited by high computational complexity. ROCKET achieves state-of-the-art accuracy with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate ROCKET into a new method, MINIROCKET, making it up to 75 times faster on larger datasets, and making it almost deterministic (and optionally, with additional computational expense, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in less than 10 minutes. MINIROCKET is significantly faster than any other method of comparable accuracy (including ROCKET), and significantly more accurate than any other method of even roughly-similar computational expense. As such, we suggest that MINIROCKET should now be considered and used as the default variant of ROCKET.

Please cite as:

@article{dempster_etal_2020,
  author  = {Dempster, Angus and Schmidt, Daniel F and Webb, Geoffrey I},
  title   = {{MINIROCKET}: A Very Fast (Almost) Deterministic Transform for Time Series Classification},
  year    = {2020},
  journal = {arXiv:2012.08791}
}

sktime* / Multivariate

MINIROCKET (including a basic multivariate implementation) is also available through sktime. See the examples.

* for larger datasets (10,000+ training examples), the sktime methods should be integrated with SGD or similar as per softmax.py (replace calls to fit(...) and transform(...) from minirocket.py with calls to the relevant sktime methods as appropriate)

Results

* num_training_examples does not include the validation set of 2,048 training examples, but the transform time for the validation set is included in time_training_seconds

Requirements*

  • Python, NumPy, pandas
  • Numba (0.50+)
  • scikit-learn or similar
  • PyTorch or similar (for larger datasets)

* all pre-packaged with or otherwise available through Anaconda

Code

minirocket.py

minirocket_dv.py (MINIROCKETDV)

softmax.py (PyTorch / 10,000+ Training Examples)

minirocket_multivariate.py (equivalent to sktime/MiniRocketMultivariate)

minirocket_variable.py (variable-length input; experimental)

Important Notes

Compilation

The functions in minirocket.py and minirocket_dv.py are compiled by Numba on import, which may take some time. By default, the compiled functions are now cached, so this should only happen once (i.e., on the first import).

Input Data Type

Input data should be of type np.float32. Alternatively, you can change the Numba signatures to accept, e.g., np.float64.
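Passing a float64 array (the NumPy default) or a non-array object such as a pandas DataFrame to the compiled functions typically fails with a Numba "No matching definition for argument type(s)" error. A minimal sketch of preparing the input, assuming the data starts out as a 2d NumPy array of shape (num_examples, series_length); X_raw is a hypothetical placeholder for your own data:

```python
import numpy as np

# hypothetical stand-in for your own data: 5 series of length 100
# (NumPy creates float64 by default)
X_raw = np.random.default_rng(0).normal(size=(5, 100))

# cast to float32 and ensure a C-contiguous layout, matching the
# array(float32, 2d, C) argument type that the Numba signatures expect
X_training = np.ascontiguousarray(X_raw, dtype=np.float32)

print(X_training.dtype, X_training.shape, X_training.flags["C_CONTIGUOUS"])
```

For a pandas DataFrame, converting with to_numpy() first and then casting as above should have the same effect.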

Normalisation

Unlike ROCKET, MINIROCKET does not require the input time series to be normalised. (However, whether or not it makes sense to normalise the input time series may depend on your particular application.)

Examples

MINIROCKET

import numpy as np
from minirocket import fit, transform
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# note:
# * input time series do *not* need to be normalised
# * input data should be np.float32

parameters = fit(X_training)

X_training_transform = transform(X_training, parameters)

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)

X_test_transform = transform(X_test, parameters)

predictions = classifier.predict(X_test_transform)
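Note that the normalize argument of RidgeClassifierCV was deprecated in scikit-learn 1.0 and removed in 1.2, so the classifier call in this example fails on recent versions. One possible substitute, sketched here, standardises the transformed features with StandardScaler inside a pipeline. This is an assumption about a reasonable replacement, not the authors' code, and it may not give numerically identical results to normalize = True. The random arrays are hypothetical stand-ins for the output of transform(...) and for the labels:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# hypothetical stand-ins for transform(X_training, parameters) and Y_training
X_training_transform = rng.normal(size=(40, 84)).astype(np.float32)
Y_training = np.repeat([0, 1], 20)
X_training_transform[Y_training == 1] += 1.0  # shift one class so the toy data is separable

# standardise each feature, then fit the ridge classifier with cross-validated alpha
classifier = make_pipeline(
    StandardScaler(),
    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
)
classifier.fit(X_training_transform, Y_training)

predictions = classifier.predict(X_training_transform)
```

Because the scaler is part of the pipeline, classifier.predict(...) applies the training-set mean and standard deviation to the test features automatically.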

MINIROCKETDV

import numpy as np
from minirocket_dv import fit_transform
from minirocket import transform
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# note:
# * input time series do *not* need to be normalised
# * input data should be np.float32

parameters, X_training_transform = fit_transform(X_training)

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)

X_test_transform = transform(X_test, parameters)

predictions = classifier.predict(X_test_transform)

PyTorch / 10,000+ Training Examples

from softmax import train, predict

model_etc = train("InsectSound_TRAIN_shuffled.csv", num_classes = 10, training_size = 22952)
# note: 22,952 = 25,000 - 2,048 (validation)

predictions, accuracy = predict("InsectSound_TEST.csv", *model_etc)

Variable-Length Input (Experimental)

import numpy as np
from minirocket_variable import fit, transform, filter_by_length
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# note:
# * input time series do *not* need to be normalised
# * input data should be np.float32

# special instructions for variable-length input:
# * concatenate variable-length input time series into a single 1d numpy array
# * provide another 1d array with the lengths of each of the input time series
# * input data should be np.float32 (as above); lengths should be np.int32

# optionally, use a different reference length when setting dilation (default is
# the length of the longest time series), and use fit(...) with time series of
# at least this length, e.g.:
# >>> reference_length = X_training_lengths.mean()
# >>> X_training_1d_filtered, X_training_lengths_filtered = \
# >>> filter_by_length(X_training_1d, X_training_lengths, reference_length)
# >>> parameters = fit(X_training_1d_filtered, X_training_lengths_filtered, reference_length)

parameters = fit(X_training_1d, X_training_lengths)

X_training_transform = transform(X_training_1d, X_training_lengths, parameters)

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)

X_test_transform = transform(X_test_1d, X_test_lengths, parameters)

predictions = classifier.predict(X_test_transform)
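The special instructions for variable-length input can be sketched as follows. This is only an illustration of building the concatenated array and the lengths array with NumPy; the three short series are hypothetical, and the code does not call minirocket_variable itself:

```python
import numpy as np

# hypothetical variable-length time series (lengths 3, 5, and 2)
series = [
    np.array([0.1, 0.2, 0.3]),
    np.array([1.0, 1.1, 1.2, 1.3, 1.4]),
    np.array([2.0, 2.1]),
]

# concatenate into a single 1d float32 array
X_training_1d = np.concatenate(series).astype(np.float32)

# 1d int32 array holding the length of each series, in the same order
X_training_lengths = np.array([len(s) for s in series], dtype=np.int32)

# start offsets, useful for recovering an individual series as a sanity check
offsets = np.concatenate(([0], np.cumsum(X_training_lengths)[:-1]))
second = X_training_1d[offsets[1] : offsets[1] + X_training_lengths[1]]
```

The lengths should sum to the size of the concatenated array, which is a cheap check to run before calling fit(...).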

Acknowledgements

We thank Professor Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. Figures in our paper showing mean ranks were produced using code from Ismail Fawaz et al. (2019).

🚀 🚀 🚀
Comments
  • starting with "wide" data

    If I start with the wide data format, a 2d array of samples (rows) by sensor readings (columns), what is the right way to transform that to fit the requirements of this library?

    opened by BrannonKing 7
  • TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    Thank you very much, once again, for this great piece of software. Very much appreciated! I'm trying to use it with my data but unfortunately, I always get the following error if I attempt to fit my input with "parameters = fit(x_trainScaled)":

    TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    Here are some, probably, relevant characteristics of my input:

        print(x_trainScaled.shape)
        print(x_trainScaled.dtype)
    

    returns:

    (3000, 3000)
    float64
    

    // edit:

    This is the whole traceback:

      File "minirocket\code\minirocket.py", line 130, in fit
        biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
      File "\lib\site-packages\numba\dispatcher.py", line 500, in _explain_matching_error
        raise TypeError(msg)
    
    opened by Huii 7
  • Example of CSV file reading

    Hello, I'm trying to figure out what minirocket expects as data on input. I keep on getting TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    My data has following format:

    timestamp,close
    1619773130596,54559.47
    1619773134938,54563.93
    1619773139226,54554.23
    1619773143564,54564.34
    

    And I read it like this:

    dataset = pd.read_csv(filename, usecols = [0, 1], header=0)
    dataset = dataset.dropna()
    dataset.columns = dataset.columns.to_series().apply(lambda x: x.strip())
    
    opened by jumpingfella 5
  • Some questions about the multivariate version

    Hello, I have been reading the code for multivariate MINIROCKET, and the way the channels are combined does not make sense to me. With Conv(x), where x is channel 0, and Conv(y), where y is channel 1, combining the channels by summing just gives Conv(x+y). Why not change np.sum to np.prod, giving Conv(x*y)?

    opened by Presburger 3
  • Unlabeled data

    Hello, thanks for your excellent work. I have a question. In your response to 'starting with "wide" data', you say the data can be unlabeled, depending on the task: "(You don't need labels necessarily, depending on your task.)" When reading your article and the README, I also noticed that you mention the parameters are the same for different data, is that right? (I am not sure I understood this correctly, and I can't find where that information is.) So my question is: can I apply your work to my unlabeled data? If so, how should I set "Y_training" in the example code? Thanks!

    opened by hyjocean 2
  • Feature Size

    Thank you so much for making your work available! I have a quick question about the feature size. It looks like the minimum number of features is 84. Is there any harm in extracting 84 features and using only a subset of them?

    opened by tdincer 2
  • minirocket_multivariate extremely slow

    I am using a large dataset (10,000+ examples) and pass the data to the model in batches. I do not cache the data, so I run transform every time I pass data into the model, on every epoch. I run this same setup for both

    minirocket.py with input shape (32768,99) and

    minirocket_multivariate.py with input shape (32768,1,99) so the number of channel is 1.

    I find that the minirocket_multivariate.py version runs significantly slower on every transform() call than minirocket.py.

    Is there a potential bug in the code?

    opened by turmeric-blend 2
  • X_validation not transformed properly?

    Hi, in softmax.py, if the data is split into multiple chunks, X_validation is only transformed using the biases fitted on the first chunk. The biases differ between chunks, but the transform is applied only once.

    if epoch == 0 and chunk_index == 0: # only run once <---

        parameters = fit(X_training, args["num_features"]) # returns: dilations, num_features_per_dilation, biases

        # transform validation data
        X_validation_transform = transform(X_validation, parameters)
    

    Would transforming X_validation with each chunk's biases improve performance?

    EDIT:

    The same applies to the later part, where X_validation_transform is only normalised with the mean and standard deviation from the first chunk:

    if epoch == 0 and chunk_index == 0:

        # per-feature mean and standard deviation
        f_mean = X_training_transform.mean(0)
        f_std = X_training_transform.std(0) + 1e-8

        # normalise validation features
        X_validation_transform = (X_validation_transform - f_mean) / f_std
        X_validation_transform = torch.FloatTensor(X_validation_transform)
    
    opened by turmeric-blend 2
  • datatype

    When I use my own data with minirocket in PyCharm, I get a data type error:

    Traceback (most recent call last):
      File "E:/PycharmProjects/minirocket-main/code/traintest.py", line 46, in <module>
        parameters = fit(X_training)
      File "E:\PycharmProjects\minirocket-main\code\minirocket.py", line 130, in fit
        biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
      File "E:\ProgramData\Anaconda3\envs\deepl\lib\site-packages\numba\core\dispatcher.py", line 703, in _explain_matching_error
        raise TypeError(msg)
    TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    How can I get this to work?

    opened by dfx1822375 13
  • Can't set random_state when doing a gridsearchCV

    Dependencies

    import numpy as np
    from sklearn.linear_model import RidgeClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV
    
    from sktime.datasets import load_basic_motions
    from sktime.transformations.panel.rocket import MiniRocketMultivariate
    

    Make train/test split and set up pipeline

    X_train, y_train = load_basic_motions(split="train", return_X_y=True)
    
    model = Pipeline([
        ('minirocket', MiniRocketMultivariate(random_state=42)), 
        ('ridge_clf', RidgeClassifier(random_state=42)),
    ])
    

    Fit 1 model

    model.fit(X_train, y_train)

    Works fine.

    Now do a gridsearch for alpha value

    parameters = {
      'ridge_clf__alpha': [0.1, 1, 10],
    }
    
    model_cv = GridSearchCV(model, parameters)
    
    model_cv.fit(X_train, y_train)
    

    "RuntimeError: Cannot clone object MiniRocketMultivariate(random_state=42), as the constructor either does not set or modifies parameter random_state"

    opened by StijnBr 3
  • Extending Documentation of minirocket multivariate

    Hello,

    The implementations of multivariate MINIROCKET (both here and in sktime) mention that it is a naive extension of the univariate version, but do not explain any further what is actually happening under the hood. Looking directly at the source code for this version does not help much either, as it is fairly hard to read.

    Could you extend the documentation on the repository with a (coarse) description of how the algorithm was extended to handle multivariate data and/or add some comments to the source code in that regard?

    Thanks!

    opened by bdudzik 1