ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

Overview

ROCKET + MINIROCKET

Data Mining and Knowledge Discovery / arXiv:1910.13051 (preprint)

Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 ‘bake off’ datasets in the UCR archive in < 2 h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.

Please cite as:

@article{dempster_etal_2020,
  author = {Dempster, Angus and Petitjean, Fran\c{c}ois and Webb, Geoffrey I},
  title = {ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels},
  year = {2020},
  journal = {Data Mining and Knowledge Discovery},
  doi = {https://doi.org/10.1007/s10618-020-00701-z}
}

sktime

An implementation of ROCKET (with basic multivariate capability) is available through sktime. See the examples.

MINIROCKET *NEW*

MINIROCKET is up to 75× faster than ROCKET on larger datasets.

Results

UCR Archive

Scalability

Code

rocket_functions.py

Requirements

  • Python;
  • Numba;
  • NumPy;
  • scikit-learn (or equivalent).

Example

import numpy as np

from rocket_functions import generate_kernels, apply_kernels
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# generate random kernels
kernels = generate_kernels(X_training.shape[-1], 10_000)

# transform training set and train classifier
X_training_transform = apply_kernels(X_training, kernels)
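# note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2;
# on newer versions, drop the argument and scale the features beforehand
classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True)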
classifier.fit(X_training_transform, Y_training)

# transform test set and predict
X_test_transform = apply_kernels(X_test, kernels)
predictions = classifier.predict(X_test_transform)
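
To make the transform step above more concrete, the following is a minimal NumPy sketch of what applying one random convolutional kernel to one series could look like, based on the description in the paper (kernel length, mean-centred weights, bias, dilation, optional padding, and two features per kernel: the proportion of positive values and the maximum). It is not the code in rocket_functions.py; the names `random_kernel` and `apply_kernel` are hypothetical and the sampling details are assumptions.

```python
import numpy as np

def random_kernel(input_length, rng):
    """Sample one kernel (illustrative; assumes input_length is well above 11)."""
    length = rng.choice([7, 9, 11])
    weights = rng.normal(0.0, 1.0, length)
    weights -= weights.mean()  # mean-centre the weights
    bias = rng.uniform(-1.0, 1.0)
    # dilation sampled on an exponential scale, capped so the kernel fits the series
    max_exponent = np.log2((input_length - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exponent))
    # padding is switched on or off at random
    padding = ((length - 1) * dilation) // 2 if rng.integers(2) == 1 else 0
    return weights, bias, dilation, padding

def apply_kernel(x, weights, bias, dilation, padding):
    """Return (proportion of positive values, maximum) for one series and one kernel."""
    x = np.pad(x, padding)
    span = (len(weights) - 1) * dilation
    num_outputs = len(x) - span
    # dilated convolution written as an explicit sum over kernel taps
    out = np.full(num_outputs, bias)
    for j, w in enumerate(weights):
        out += w * x[j * dilation : j * dilation + num_outputs]
    return np.mean(out > 0), out.max()

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0.0, 10.0, 100))
features = [apply_kernel(x, *random_kernel(len(x), rng)) for _ in range(5)]
print(features)
```

With 10,000 kernels this would give 20,000 features per series, which is the input to the linear classifier.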

Reproducing the Experiments

reproduce_experiments_ucr.py

Arguments:
-d --dataset_names : txt file of dataset names
-i --input_path    : parent directory for datasets
-o --output_path   : path for results
-n --num_runs      : number of runs (optional, default 10)
-k --num_kernels   : number of kernels (optional, default 10,000)

Examples:
> python reproduce_experiments_ucr.py -d bakeoff.txt -i ./Univariate_arff -o ./
> python reproduce_experiments_ucr.py -d additional.txt -i ./Univariate_arff -o ./ -n 1 -k 1000
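
For readers adapting the script, the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch of the interface only, assuming the flags and defaults given in the list; the actual script may declare them differently, and `build_parser` is a hypothetical name.

```python
import argparse

def build_parser():
    """Declare the command-line interface described in the argument list."""
    parser = argparse.ArgumentParser(description="Run ROCKET on UCR datasets (sketch).")
    parser.add_argument("-d", "--dataset_names", required=True,
                        help="txt file of dataset names")
    parser.add_argument("-i", "--input_path", required=True,
                        help="parent directory for datasets")
    parser.add_argument("-o", "--output_path", required=True,
                        help="path for results")
    parser.add_argument("-n", "--num_runs", type=int, default=10,
                        help="number of runs")
    parser.add_argument("-k", "--num_kernels", type=int, default=10_000,
                        help="number of kernels")
    return parser

# parse the second example invocation shown above
args = build_parser().parse_args(
    ["-d", "additional.txt", "-i", "./Univariate_arff", "-o", "./", "-n", "1", "-k", "1000"]
)
print(args.dataset_names, args.num_runs, args.num_kernels)
```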

reproduce_experiments_scalability.py

Arguments:
-tr --training_path : training dataset (csv)
-te --test_path     : test dataset (csv)
-o  --output_path   : path for results
-k  --num_kernels   : number of kernels

Examples:
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 100
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 1000

Acknowledgements

We thank Professor Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. Figures in our paper showing the ranking of different classifiers and variants of ROCKET were produced using code from Ismail Fawaz et al. (2019).

🚀
Comments
  • Results_ucr_resamples

    Hi, I have been reading your paper recently, and it is very enlightening. I have a question about Results_ucr_resamples: I do not know how to reproduce the results, or what "resample" means. Could you please give me some help?

    opened by peter943 7
  • Memory Error

    First of all: Thank you very much for providing this new method! If it works, it works very well and leads to good results (i.e. leads to results comparable to other methods within minutes instead of weeks).

    Unfortunately, it doesn't work all the time for me. Especially if I use 10,000 kernels as recommended in your paper. This is what I end up with on a Windows 10 machine equipped with an NVIDIA 1080 TI GPU and 32 GB RAM:

    Algorithms.trainAndEvaluateROCKET(...)
      File "PathToMyProject\MyClass.py", line 180, in trainAndEvaluateROCKET
        classifier.fit(X_training_transform,trainY)
      File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1815, in fit
        _BaseRidgeCV.fit(self, X, Y, sample_weight=sample_weight)
      File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1528, in fit
        estimator.fit(X, y, sample_weight=sample_weight)
      File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1436, in fit
        X_mean, *decomposition = decompose(X, y, sqrt_sw)
      File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1348, in _svd_decompose_design_matrix
        U, s, _ = linalg.svd(X, full_matrices=0)
      File "PathToMyAnacondaFolder\lib\site-packages\scipy\linalg\decomp_svd.py", line 129, in svd
        full_matrices=full_matrices, overwrite_a=overwrite_a)
    MemoryError
    

    My testing data-set consists of 3,278 time series (with a length of 1041 amplitude values each) and the training data-set consists of 43,264 time series (again with a length of 1041 amplitude values each). Any help is highly appreciated.

    // edit: I forgot to mention that it works perfectly fine for 100 or 1,000 kernels.

    opened by Huii 6
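
A rough size estimate helps explain why 100 or 1,000 kernels fit in memory here while 10,000 may not. The sketch below assumes two features per kernel and 8-byte (float64) values in the transformed matrix; both are assumptions about the implementation, and `transform_size_gib` is a hypothetical helper.

```python
def transform_size_gib(num_examples, num_kernels, features_per_kernel=2, bytes_per_value=8):
    """Approximate size of the transformed feature matrix in GiB."""
    num_values = num_examples * num_kernels * features_per_kernel
    return num_values * bytes_per_value / 2**30

# training set from the report above: 43,264 series
for num_kernels in (100, 1_000, 10_000):
    size = transform_size_gib(43_264, num_kernels)
    print(f"{num_kernels:>6} kernels: {size:.2f} GiB")
```

Under these assumptions the training matrix alone is about 6.4 GiB for 10,000 kernels. The SVD in the traceback, called with `full_matrices=0`, returns a `U` matrix of the same shape, so peak usage is likely to be a multiple of that figure.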
  • Fixing the random state

    Hi,

    I just wanted to let you know that it is possible to fix the random seed with Numba. The only change would be to add a seed parameter to the generate_kernels function:

    @njit
    def generate_kernels(input_length, num_kernels, seed=42):
        np.random.seed(seed)
    

    This way, the results would be perfectly reproducible (as it is, there is randomness when the kernels are generated that cannot be fixed).

    Best, Johann

    opened by johannfaouzi 5
  • Can output class-probabilities? TS must be same length?

    Hi,

    I would like to know if, instead of the predicted class, I can know the probability of being in that class. Also, if it allows for time series with different lengths.

    regards, Ferran

    opened by Ferran-pv 2
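
On the first question: `RidgeClassifierCV` does not provide `predict_proba`, but it does provide `decision_function`. One possible workaround, sketched below, is to pass those scores through a softmax. The result is a normalised score, not a calibrated probability, and `scores_to_pseudo_probabilities` is a hypothetical helper, not part of this repository or of scikit-learn.

```python
import numpy as np

def scores_to_pseudo_probabilities(scores):
    """Softmax over decision scores; rows sum to 1 but are not calibrated."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim == 1:
        # binary case: decision_function returns a single score per example
        scores = np.column_stack([-scores, scores])
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# stand-in for classifier.decision_function(X_test_transform) with three classes
example_scores = np.array([[2.0, 1.0, 0.0], [-0.5, 0.3, 0.1]])
print(scores_to_pseudo_probabilities(example_scores))
```

If calibrated probabilities matter, a classifier that models them directly (for example logistic regression on the transformed features) may be a better fit.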
  • Application of kernels to multivariate data

    Hi,

    I notice that the ROCKET transform implemented in sktime now supports the transformation of multivariate datasets. I was wondering whether you could detail how it is applied?

    Many thanks.

    opened by MJFlynn 2
  • Per-feature normalization

    Hi Angus,

    First of all congrats on your new TSC method. I've used it and the results are very good, and super fast compared to other methods!

    I've noticed a discrepancy between the bake-off code and the scalability code you've shared. In the scalability code, you perform per-feature normalization, while in the bake-off code you don't. Is there any reason for that? I've run a few tests with some bake-off datasets and have seen that this per-feature normalization hurts performance.

    opened by oguiza 2
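
For reference, per-feature normalization of the transformed features usually means standardising each column with statistics taken from the training set. The sketch below shows that generic form; it is not taken from either script, and the helper names and the `eps` guard are assumptions.

```python
import numpy as np

def fit_feature_scaler(X_training_transform, eps=1e-8):
    """Compute per-feature mean and standard deviation on the training features."""
    mean = X_training_transform.mean(axis=0)
    std = X_training_transform.std(axis=0) + eps  # eps avoids division by zero
    return mean, std

def apply_feature_scaler(X_transform, mean, std):
    """Standardise features using statistics computed on the training set."""
    return (X_transform - mean) / std

X_training_transform = np.array([[1.0, 2.0], [3.0, 6.0]])
mean, std = fit_feature_scaler(X_training_transform)
print(apply_feature_scaler(X_training_transform, mean, std))
```

The same `mean` and `std` would be reused for the test features, so that no test-set statistics leak into training.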
  • Satellite Image Time Series (SITS) dataset

    Hi, I'd like to reproduce the scalability experiment using the SITS dataset mentioned in the paper. Is this dataset publicly available? If so, where can I find it?

    Thanks =)

    opened by stevcabello 1
  • Added support for multichannel Timeseries

    This PR adds support for multichannel time series input to ROCKET. It does this by adding a new channel axis to the weights array. It also adds some checks to verify whether the input is single-channel or multichannel, and adds casting to np.float32 everywhere. An example of a two-channel time series regression is also provided in a notebook.

    opened by tcapelle 0
  • How to handle NaN values in the UCR data sets?

    Hello,

    I just learned about the amazing Rocket and MiniRocket recently. When I used them to classify some UCRArchive2018 datasets (such as 'DodgerLoopGame'), NaN values representing missing values caused errors. I wonder how you handle these values. Is it appropriate to simply delete samples with NaN values?

    Thanks for your help!

    opened by ShaowuChen 1
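
One common alternative to deleting samples is to fill missing values before applying the transform, for instance by linear interpolation. The sketch below illustrates that option with NumPy; it is not the authors' stated approach, `interpolate_nans` is a hypothetical helper, and whether interpolation is appropriate depends on the dataset.

```python
import numpy as np

def interpolate_nans(series):
    """Fill NaN values by linear interpolation; edges take the nearest valid value."""
    series = np.asarray(series, dtype=float).copy()
    missing = np.isnan(series)
    if missing.all():
        raise ValueError("series contains only NaN values")
    idx = np.arange(len(series))
    series[missing] = np.interp(idx[missing], idx[~missing], series[~missing])
    return series

print(interpolate_nans([1.0, np.nan, 3.0, np.nan, np.nan, 6.0]))
```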