ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

Last update: Dec 26, 2022

Overview

ROCKET + MINIROCKET

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels.

Data Mining and Knowledge Discovery / arXiv:1910.13051 (preprint)

Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 ‘bake off’ datasets in the UCR archive in < 2 h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.

Please cite as:

@article{dempster_etal_2020,
  author = {Dempster, Angus and Petitjean, Fran\c{c}ois and Webb, Geoffrey I},
  title = {ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels},
  year = {2020},
  journal = {Data Mining and Knowledge Discovery},
  doi = {https://doi.org/10.1007/s10618-020-00701-z}
}

`sktime`

An implementation of ROCKET (with basic multivariate capability) is available through sktime. See the examples.

MINIROCKET NEW

MINIROCKET is up to 75× faster than ROCKET on larger datasets.

Results

UCR Archive

Scalability

Code

Requirements

Python;
Numba;
NumPy;
scikit-learn (or equivalent).

Example

import numpy as np
from rocket_functions import generate_kernels, apply_kernels
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# generate random kernels
kernels = generate_kernels(X_training.shape[-1], 10_000)

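# transform training set and train classifier
X_training_transform = apply_kernels(X_training, kernels)
# note: the `normalize` argument was deprecated in scikit-learn 1.0 and removed in 1.2;
# on newer versions, drop it and standardize the features beforehand (e.g. with StandardScaler)
classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)
classifier.fit(X_training_transform, Y_training)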

# transform test set and predict
X_test_transform = apply_kernels(X_test, kernels)
predictions = classifier.predict(X_test_transform)
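
To make the transform step more concrete, the following is a minimal NumPy sketch of what applying a single random kernel involves, assuming the design described in the paper: each kernel is convolved with the series (with dilation) and summarized by two features, the proportion of positive values (PPV) and the maximum value. The function name `apply_one_kernel` and its signature are hypothetical and are not taken from `rocket_functions`; padding is omitted for brevity.

```python
import numpy as np

def apply_one_kernel(x, weights, bias, dilation):
    """Convolve a 1-D series with one dilated kernel; return (ppv, max)."""
    length = len(weights)
    span = (length - 1) * dilation   # distance covered by the dilated kernel
    num_outputs = len(x) - span      # "valid" positions only (no padding)
    if num_outputs <= 0:
        raise ValueError("series is too short for this kernel and dilation")
    outputs = np.empty(num_outputs)
    for i in range(num_outputs):
        # take every `dilation`-th point starting at position i
        window = x[i : i + span + 1 : dilation]
        outputs[i] = np.dot(window, weights) + bias
    ppv = np.mean(outputs > 0)       # proportion of positive values
    return ppv, outputs.max()

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
weights = np.array([-1.0, 0.0, 1.0])
ppv, max_value = apply_one_kernel(x, weights, bias=-1.5, dilation=2)
```

Under this two-features-per-kernel assumption, 10,000 kernels yield 20,000 features per series, which is what the linear classifier is trained on.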

Reproducing the Experiments

`reproduce_experiments_ucr.py`

Arguments:
-d --dataset_names : txt file of dataset names
-i --input_path    : parent directory for datasets
-o --output_path   : path for results
-n --num_runs      : number of runs (optional, default 10)
-k --num_kernels   : number of kernels (optional, default 10,000)

Examples:
> python reproduce_experiments_ucr.py -d bakeoff.txt -i ./Univariate_arff -o ./
> python reproduce_experiments_ucr.py -d additional.txt -i ./Univariate_arff -o ./ -n 1 -k 1000
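
The argument list above maps naturally onto Python's standard `argparse` module. The following is a sketch of how such a parser could be declared, not a copy of the script's actual code; the `build_parser` helper and the `type`/`required` choices are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(description="UCR experiments (illustrative sketch)")
    parser.add_argument("-d", "--dataset_names", required=True, help="txt file of dataset names")
    parser.add_argument("-i", "--input_path", required=True, help="parent directory for datasets")
    parser.add_argument("-o", "--output_path", required=True, help="path for results")
    parser.add_argument("-n", "--num_runs", type=int, default=10, help="number of runs")
    parser.add_argument("-k", "--num_kernels", type=int, default=10_000, help="number of kernels")
    return parser

# parse the same arguments as the second example command
args = build_parser().parse_args(
    ["-d", "additional.txt", "-i", "./Univariate_arff", "-o", "./", "-n", "1", "-k", "1000"]
)
```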

`reproduce_experiments_scalability.py`

Arguments:
-tr --training_path : training dataset (csv)
-te --test_path     : test dataset (csv)
-o  --output_path   : path for results
-k  --num_kernels   : number of kernels

Examples:
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 100
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 1000

Acknowledgements

We thank Professor Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. Figures in our paper showing the ranking of different classifiers and variants of ROCKET were produced using code from Ismail Fawaz et al. (2019).

🚀

Comments

Results_ucr_resamples

Hi, I have been studying your paper these days, and it has been very enlightening. I have a question about Results_ucr_resamples. I do not know how to reproduce the results or what "resample" means. Could you please give me some help?

opened by peter943 7

Memory Error

First of all: Thank you very much for providing this new method! If it works, it works very well and leads to good results (i.e. leads to results comparable to other methods within minutes instead of weeks).

Unfortunately, it doesn't work all the time for me. Especially if I use 10,000 kernels as recommended in your paper. This is what I end up with on a Windows 10 machine equipped with an NVIDIA 1080 TI GPU and 32 GB RAM:

Algorithms.trainAndEvaluateROCKET(...)
  File "PathToMyProject\MyClass.py", line 180, in trainAndEvaluateROCKET
    classifier.fit(X_training_transform,trainY)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1815, in fit
    _BaseRidgeCV.fit(self, X, Y, sample_weight=sample_weight)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1528, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1436, in fit
    X_mean, *decomposition = decompose(X, y, sqrt_sw)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1348, in _svd_decompose_design_matrix
    U, s, _ = linalg.svd(X, full_matrices=0)
  File "PathToMyAnacondaFolder\lib\site-packages\scipy\linalg\decomp_svd.py", line 129, in svd
    full_matrices=full_matrices, overwrite_a=overwrite_a)
MemoryError

My testing data-set consists of 3,278 time series (with a length of 1041 amplitude values each) and the training data-set consists of 43,264 time series (again with a length of 1041 amplitude values each). Any help is highly appreciated.

Edit: I forgot to mention that it works perfectly fine for 100 or 1,000 kernels.
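
A rough size estimate helps explain why 1,000 kernels may fit in memory while 10,000 may not. The sketch below assumes two features per kernel (as described in the paper) and 8-byte floats; the helper name `transformed_size_gib` is hypothetical. It counts only a single copy of the transformed training matrix, whereas the SVD in the traceback needs additional working memory on top of that.

```python
def transformed_size_gib(num_series, num_kernels, features_per_kernel=2, bytes_per_value=8):
    """Size in GiB of one dense copy of the transformed feature matrix."""
    num_values = num_series * num_kernels * features_per_kernel
    return num_values * bytes_per_value / 2**30

# 43,264 training series, as in the report above
for num_kernels in (100, 1_000, 10_000):
    size = transformed_size_gib(43_264, num_kernels)
    print(f"{num_kernels:>6} kernels: {size:6.2f} GiB")
```

Under these assumptions the 10,000-kernel matrix is roughly 6.4 GiB per copy, so a handful of temporary copies during the decomposition could plausibly approach the 32 GB mentioned above.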

opened by Huii 6

Fixing the random state

Hi,

I just wanted to let you know that it is possible to fix the random seed with Numba. The only change would be to add a seed parameter to the generate_kernels function:

@njit
def generate_kernels(input_length, num_kernels, seed=42):
    np.random.seed(seed)

This way, the results would be perfectly reproducible (as it is, there is randomness when the kernels are generated that cannot be fixed).
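
The suggestion above relies on the fact that Numba keeps its own random state, so the seed has to be set inside a jitted function; calling `np.random.seed` from ordinary Python code does not affect it. A minimal sketch follows; `sample_weights` is a hypothetical stand-in for the kernel generator, and the `try`/`except` fallback only exists so that the snippet also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without Numba
    def njit(func):
        return func

@njit
def sample_weights(length, seed):
    # inside a jitted function this seeds Numba's generator
    # (with the fallback above it seeds NumPy's global generator instead)
    np.random.seed(seed)
    return np.random.normal(0.0, 1.0, length)

first = sample_weights(9, 42)
second = sample_weights(9, 42)
print(np.array_equal(first, second))
```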

Best, Johann

opened by johannfaouzi 5

Can output class-probabilities? TS must be same length?

Hi,

I would like to know whether, instead of the predicted class, I can obtain the probability of belonging to that class. Also, does it allow time series of different lengths?

regards, Ferran

opened by Ferran-pv 2

Application of kernels to multivariate data

Hi,

I notice that the ROCKET transform implemented in sktime now supports the transformation of multivariate datasets. I was wondering whether you could detail how it is applied?

Many thanks.

opened by MJFlynn 2

Per-feature normalization

Hi Angus,

First of all congrats on your new TSC method. I've used it and the results are very good, and super fast compared to other methods!

I've noticed a discrepancy between the bake-off code and the scalable code you've shared. In the scalable code, you perform per-feature normalization, whereas in the bake-off code you don't. Is there any reason for that? I've run a few tests with some bake-off datasets and have seen that this per-feature normalization hurts performance.
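
For reference, per-feature normalization here means standardizing each column of the transformed feature matrix using statistics from the training set only. A minimal NumPy sketch, with hypothetical function names and an assumed small epsilon `eps` to avoid division by zero (this is not the code from either script):

```python
import numpy as np

def fit_feature_stats(X_train, eps=1e-8):
    """Per-column mean and standard deviation from the training features."""
    return X_train.mean(axis=0), X_train.std(axis=0) + eps

def normalize_features(X, mean, std):
    """Standardize each feature (column) with the training statistics."""
    return (X - mean) / std

X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_test = np.array([[3.0, 30.0]])

mean, std = fit_feature_stats(X_train)
X_train_norm = normalize_features(X_train, mean, std)
X_test_norm = normalize_features(X_test, mean, std)  # reuse training statistics
```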

opened by oguiza 2

Satellite Image Time Series (SITS) dataset

Hi, I'd like to reproduce the scalability experiment using the SITS dataset mentioned in the paper. Is this dataset publicly available? If so, where can I find it?

Thanks =)

opened by stevcabello 1

Added support for multichannel Timeseries

This PR adds support for multichannel time series input to ROCKET by adding a new channel axis to the weights array. It also adds checks to determine whether the input is single-channel or multichannel, and casts to np.float32 throughout. An example of a two-channel time series regression is provided in a notebook.
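
One way to picture the channel axis described here: the kernel weights become a 2-D array of shape `(num_channels, length)`, and the kernel output at each position sums over channels as well as over time. The sketch below is an illustration under that assumption, not the code from the PR; the function name is hypothetical, and dilation and padding are left out.

```python
import numpy as np

def apply_multichannel_kernel(x, weights, bias):
    """x: (num_channels, num_timesteps); weights: (num_channels, length)."""
    x = np.asarray(x, dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    if x.shape[0] != weights.shape[0]:
        raise ValueError("x and weights must have the same number of channels")
    length = weights.shape[1]
    num_outputs = x.shape[1] - length + 1
    outputs = np.empty(num_outputs, dtype=np.float32)
    for i in range(num_outputs):
        # element-wise product over all channels and time steps in the window
        outputs[i] = np.sum(x[:, i : i + length] * weights) + bias
    return outputs

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 0.0, 1.0]])
weights = np.array([[1.0, -1.0],
                    [2.0, 0.0]])
outputs = apply_multichannel_kernel(x, weights, bias=0.0)
```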

opened by tcapelle 0

How to handle NaN values in the UCR data sets?

Hello,

I recently learned about the amazing Rocket and MiniRocket. When I used them to classify some UCRArchive2018 datasets (such as 'DodgerLoopGame'), NaN values representing missing values caused errors. How do you handle these values? Is it appropriate to simply delete samples with NaN values?

Thanks for your help!

opened by ShaowuChen 1
