MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification



arXiv:2012.08791 (preprint)

Until recently, the most accurate methods for time series classification were limited by high computational complexity. ROCKET achieves state-of-the-art accuracy with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate ROCKET into a new method, MINIROCKET, making it up to 75 times faster on larger datasets, and making it almost deterministic (and optionally, with additional computational expense, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in less than 10 minutes. MINIROCKET is significantly faster than any other method of comparable accuracy (including ROCKET), and significantly more accurate than any other method of even roughly-similar computational expense. As such, we suggest that MINIROCKET should now be considered and used as the default variant of ROCKET.

Please cite as:

  author  = {Dempster, Angus and Schmidt, Daniel F and Webb, Geoffrey I},
  title   = {{MINIROCKET}: A Very Fast (Almost) Deterministic Transform for Time Series Classification},
  year    = {2020},
  journal = {arXiv:2012.08791}

sktime* / Multivariate

MINIROCKET (including a basic multivariate implementation) is also available through sktime. See the examples.

* for larger datasets (10,000+ training examples), the sktime methods should be integrated with SGD or similar, as in the PyTorch example below (replace the calls to fit(...) and transform(...) with calls to the relevant sktime methods as appropriate)


* num_training_examples does not include the validation set of 2,048 training examples, but the transform time for the validation set is included in time_training_seconds


  • Python, NumPy, pandas
  • Numba (0.50+)
  • scikit-learn or similar
  • PyTorch or similar (for larger datasets)

* all pre-packaged with or otherwise available through Anaconda

Code (MINIROCKETDV) (PyTorch / 10,000+ Training Examples) (equivalent to sktime/MiniRocketMultivariate) (variable-length input; experimental)

Important Notes


The Numba-accelerated functions are compiled on import, which may take some time. By default, the compiled functions are now cached, so this should only happen once (i.e., on the first import).

Input Data Type

Input data should be of type np.float32. Alternatively, you can change the Numba signatures to accept, e.g., np.float64.
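
As a minimal sketch of the conversion, the snippet below casts an array to np.float32 before it is passed to fit(...) or transform(...). The helper name as_float32_2d is hypothetical (it is not part of this repository), and the (examples x time) layout is an assumption based on the 2d array signature that appears in the Numba error messages.

```python
import numpy as np

def as_float32_2d(X):
    """Hypothetical helper: return X as a C-contiguous 2d np.float32 array."""
    # np.ascontiguousarray accepts nested lists as well as existing arrays
    X = np.ascontiguousarray(X, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError(f"expected a 2d array (examples x time), got {X.ndim}d")
    return X

# NumPy creates float64 arrays by default, which the compiled signatures reject
X_float64 = np.random.default_rng(0).normal(size=(5, 100))

X_training = as_float32_2d(X_float64)
print(X_training.dtype, X_training.shape)
```

Casting once, immediately after loading the data, avoids repeating the conversion before every call.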


Unlike ROCKET, MINIROCKET does not require the input time series to be normalised. (However, whether or not it makes sense to normalise the input time series may depend on your particular application.)



import numpy as np
from minirocket import fit, transform
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# note:
# * input time series do *not* need to be normalised
# * input data should be np.float32

parameters = fit(X_training)

X_training_transform = transform(X_training, parameters)

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)

X_test_transform = transform(X_test, parameters)

predictions = classifier.predict(X_test_transform)
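
The normalize argument used with RidgeClassifierCV above was removed in scikit-learn 1.2. One possible substitute on newer versions, sketched below, is to standardise the transformed features in a pipeline. This is an assumption about a reasonable replacement rather than the authors' method, and it is not numerically identical to normalize = True. The random features stand in for the output of transform(...).

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# synthetic stand-ins for the transformed features (two shifted classes)
Y_training = np.repeat([0, 1], 50)
X_training_transform = rng.normal(size=(100, 20)) + Y_training[:, None]
Y_test = np.repeat([0, 1], 10)
X_test_transform = rng.normal(size=(20, 20)) + Y_test[:, None]

# standardise each feature, then fit the ridge classifier
classifier = make_pipeline(
    StandardScaler(),
    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
)
classifier.fit(X_training_transform, Y_training)

predictions = classifier.predict(X_test_transform)
```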


import numpy as np
from minirocket_dv import fit_transform
from minirocket import transform
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# note:
# * input time series do *not* need to be normalised
# * input data should be np.float32

parameters, X_training_transform = fit_transform(X_training)

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)

X_test_transform = transform(X_test, parameters)

predictions = classifier.predict(X_test_transform)

PyTorch / 10,000+ Training Examples

from softmax import train, predict

model_etc = train("InsectSound_TRAIN_shuffled.csv", num_classes = 10, training_size = 22952)
# note: 22,952 = 25,000 - 2,048 (validation)

predictions, accuracy = predict("InsectSound_TEST.csv", *model_etc)

Variable-Length Input (Experimental)

import numpy as np
from minirocket_variable import fit, transform, filter_by_length
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# note:
# * input time series do *not* need to be normalised
# * input data should be np.float32

# special instructions for variable-length input:
# * concatenate variable-length input time series into a single 1d numpy array
# * provide another 1d array with the lengths of each of the input time series
# * input data should be np.float32 (as above); lengths should be np.int32

# optionally, use a different reference length when setting dilation (default is
# the length of the longest time series), and use fit(...) with time series of
# at least this length, e.g.:
# >>> reference_length = X_training_lengths.mean()
# >>> X_training_1d_filtered, X_training_lengths_filtered = \
# >>> filter_by_length(X_training_1d, X_training_lengths, reference_length)
# >>> parameters = fit(X_training_1d_filtered, X_training_lengths_filtered, reference_length)

parameters = fit(X_training_1d, X_training_lengths)

X_training_transform = transform(X_training_1d, X_training_lengths, parameters)

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)

X_test_transform = transform(X_test_1d, X_test_lengths, parameters)

predictions = classifier.predict(X_test_transform)
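
The special instructions for variable-length input can be sketched as follows: a list of series with different lengths is flattened into one 1d np.float32 array, with a matching np.int32 array of lengths. The example series and their lengths (50, 30, 80) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical ragged input: three time series of different lengths
series = [rng.normal(size=n) for n in (50, 30, 80)]

# one flat float32 array holding all series back to back
X_training_1d = np.concatenate(series).astype(np.float32)

# one int32 entry per series, in the same order
X_training_lengths = np.array([len(s) for s in series], dtype=np.int32)

# the i-th series can be recovered from cumulative offsets
offsets = np.concatenate(([0], np.cumsum(X_training_lengths)))
second_series = X_training_1d[offsets[1]:offsets[2]]
```

The order of the lengths array must match the order in which the series were concatenated, since the lengths are the only record of where each series starts and ends.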


We thank Professor Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. Figures in our paper showing mean ranks were produced using code from Ismail Fawaz et al. (2019).

  • starting with "wide" data

    If I start with the wide data format, a 2d array of samples (rows) by sensor readings (columns), what is the right way to transform that to fit the requirements of this library?

    opened by BrannonKing 7
  • TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    Thank you very much, once again, for this great piece of software. Very much appreciated! I'm trying to use it with my data but unfortunately, I always get the following error if I attempt to fit my input with "parameters = fit(x_trainScaled)":

    TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    Here are some, probably, relevant characteristics of my input:



    (3000, 3000)

    // edit:

    This is the whole traceback:

      File "minirocket\code\", line 130, in fit
        biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
      File "\lib\site-packages\numba\", line 500, in _explain_matching_error
        raise TypeError(msg)
    opened by Huii 7
  • Example of CSV file reading

    Hello, I'm trying to figure out what minirocket expects as data on input. I keep on getting TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    My data has the following format:


    And I read it like this:

    dataset = pd.read_csv(filename, usecols = [0, 1], header=0)
    dataset = dataset.dropna()
    dataset.columns = dataset.columns.to_series().apply(lambda x: x.strip())
    opened by jumpingfella 5
  • Some questions about the multivariate version

    Hello, I have been reading the code for the multivariate version of MINIROCKET. The way the channels are combined does not make sense to me. Given Conv(x), where x is channel 0, and Conv(y), where y is channel 1, combining the channels just gives Conv(x + y). Why not change the np.sum so that it gives Conv(x * y)?

    opened by Presburger 3
  • Unlabeled data

    Hello, thanks for your excellent work. I have a question. In your response in "starting with "wide" data", you say that the data can be unlabeled, depending on the task: "(You don't need labels necessarily, depending on your task.)" When reading your article and the README, I also noticed that you mention that the parameters are the same across different datasets, is that right? (I am not sure I have understood this correctly, and I can't find where that information is.) So my question is: can I apply your work to my unlabeled data? If so, how should I set "Y_training" in the example code? Thanks!

    opened by hyjocean 2
  • Feature Size

    Thank you so much for making your work available! I have a quick question about the feature size. It looks like the minimum feature size is 84. Is there any harm in extracting 84 features and using only a subset of them?

    opened by tdincer 2
  • minirocket_multivariate extremely slow

    I am using a large dataset (10,000+) and I pass the data into the model in batches. I do not cache the data, and I run transform every time I pass data into the model, on every epoch. I run this same setup for both with input shape (32768,99) and with input shape (32768,1,99), so the number of channels is 1.

    I find that the version runs significantly slower on every transform() relative to

    Is there a potential bug in the code?

    opened by turmeric-blend 2
  • X_validation not transformed properly?

    Hi, if the data is split into multiple chunks, then X_validation is only transformed with the first chunk's biases. The biases differ between chunks, but the transform is only applied once:

    if epoch == 0 and chunk_index == 0: # only run once <---
       parameters = fit(X_training, args["num_features"]) # returns: dilations, num_features_per_dilation, biases
       # transform validation data
       X_validation_transform = transform(X_validation, parameters)

    would transforming the X_validation with each chunk's biases improve performance?


    similarly for the latter part (where X_validation_transform is only normalised with mean and std values from the first chunk):

    if epoch == 0 and chunk_index == 0:
                        # per-feature mean and standard deviation
                        f_mean = X_training_transform.mean(0)
                        f_std = X_training_transform.std(0) + 1e-8
                        # normalise validation features
                        X_validation_transform = (X_validation_transform - f_mean) / f_std
                        X_validation_transform = torch.FloatTensor(X_validation_transform)
    opened by turmeric-blend 2
  • datatype

    When I use my data with minirocket in PyCharm, I get a problem with the data type:

    Traceback (most recent call last):
      File "E:/PycharmProjects/minirocket-main/code/", line 46, in
        parameters = fit(X_training)
      File "E:\PycharmProjects\minirocket-main\code\", line 130, in fit
        biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
      File "E:\ProgramData\Anaconda3\envs\deepl\lib\site-packages\numba\core\", line 703, in _explain_matching_error
        raise TypeError(msg)
    TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

    How can I fix this?

    opened by dfx1822375 13
  • Can't set random_state when doing a gridsearchCV


    import numpy as np
    from sklearn.linear_model import RidgeClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV
    from sktime.datasets import load_basic_motions
    from sktime.transformations.panel.rocket import MiniRocketMultivariate

    Make train/test split and set up pipeline

    X_train, y_train = load_basic_motions(split="train", return_X_y=True)
    model = Pipeline([
        ('minirocket', MiniRocketMultivariate(random_state=42)),
        ('ridge_clf', RidgeClassifier(random_state=42)),
    ])

    Fit 1 model

    model.fit(X_train, y_train)

    Works fine

    Now do a gridsearch for alpha value

    parameters = {
      'ridge_clf__alpha': [0.1, 1, 10],
    }
    model_cv = GridSearchCV(model, parameters)
    model_cv.fit(X_train, y_train)

    "RuntimeError: Cannot clone object MiniRocketMultivariate(random_state=42), as the constructor either does not set or modifies parameter random_state"

    opened by StijnBr 3
  • Extending Documentation of minirocket multivariate

    The implementations of minirocket multivariate (both here and in sktime) mention that it is a naive extension of the univariate version, but do not give any clearer explanation of what is actually happening under the hood. Looking directly at the source code for this version does not help much either, as it is fairly hard to read.

    Could you extend the documentation on the repository with a (coarse) description of how the algorithm was extended to handle multivariate data and/or add some comments to the source code in that regard?


    opened by bdudzik 1
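
The normalisation step quoted in the "X_validation not transformed properly?" issue above standardises the validation features with the per-feature mean and standard deviation of the training features. A minimal NumPy sketch of that step follows; the random arrays are stand-ins for transformed features, and the PyTorch conversion from the quoted snippet is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for transformed training and validation features
X_training_transform = rng.normal(loc=3.0, scale=2.0, size=(256, 8)).astype(np.float32)
X_validation_transform = rng.normal(loc=3.0, scale=2.0, size=(64, 8)).astype(np.float32)

# per-feature mean and standard deviation, computed on training features only
f_mean = X_training_transform.mean(0)
f_std = X_training_transform.std(0) + 1e-8  # avoid division by zero

# the same statistics are applied to both sets
X_training_norm = (X_training_transform - f_mean) / f_std
X_validation_norm = (X_validation_transform - f_mean) / f_std
```

Reusing the training statistics keeps the validation features on the same scale as the features the model was trained on.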
