Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

Overview

GENetic DIscovery of Shapelets

In the time series classification domain, shapelets are small subseries that are discriminative for a certain class. It has been shown that by projecting the original dataset to a distance space, where each axis corresponds to the distance to a certain shapelet, classifiers are able to achieve state-of-the-art results on a plethora of datasets.
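The projection to a distance space can be illustrated with a minimal sketch in plain Python. It assumes the distance between a series and a shapelet is the smallest Euclidean distance over all windows of the shapelet's length, which is the usual definition in the shapelet literature; GENDIS itself may use a different or normalized distance. The function names shapelet_distance and distance_transform are hypothetical and not part of the gendis package.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any equally long window."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def distance_transform(dataset, shapelets):
    """Map every series to a feature vector holding one distance per shapelet."""
    return [[shapelet_distance(ts, sh) for sh in shapelets] for ts in dataset]


# Toy data (assumption): the first series contains a peak, the second is flat.
dataset = [
    [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
shapelets = [[1.0, 2.0, 1.0]]  # a hand-picked "peak" shapelet

features = distance_transform(dataset, shapelets)
# The series with the peak is at distance 0.0; the flat one is at about 2.45.
print(features)
```

Each row of features can be fed to any standard classifier; a small distance means the shapelet occurs somewhere in the series.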

This repository contains an implementation of GENDIS, an algorithm that searches for a set of shapelets in a genetic fashion. The algorithm is insensitive to its parameters (such as population size, crossover and mutation probability, ...) and can quickly extract a small set of shapelets that achieves predictive performance similar to, or better than, that of other shapelet techniques.
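To make "in a genetic fashion" more concrete, here is a minimal, generic genetic search loop. It is only a sketch under stated assumptions: the tournament selection, the elitism step, the toy bit-string problem, and every function name are illustrative and are not taken from the GENDIS implementation, which evolves sets of shapelets rather than bit strings. Only the parameter names population_size, iterations, crossover_prob and mutation_prob mirror the tutorial below.

```python
import random


def evolve(fitness, random_individual, crossover, mutate,
           population_size=50, iterations=25,
           crossover_prob=0.3, mutation_prob=0.3, seed=0):
    """Generic genetic search loop; returns the fittest individual found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(iterations):
        offspring = []
        while len(offspring) < population_size:
            # Tournament selection: the fitter of two random individuals is a parent.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            child = list(parent_a)
            if rng.random() < crossover_prob:
                child = crossover(parent_a, parent_b, rng)
            if rng.random() < mutation_prob:
                child = mutate(child, rng)
            offspring.append(child)
        # Elitism: the best individual so far always survives.
        offspring[0] = best
        population = offspring
        best = max(population, key=fitness)
    return best


# Toy problem (assumption): maximize the number of ones in a 20-bit string.
def random_bits(rng, n=20):
    return [rng.randint(0, 1) for _ in range(n)]


def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one_bit(individual, rng):
    child = list(individual)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


best = evolve(sum, random_bits, one_point_crossover, flip_one_bit)
print(sum(best), "of", len(best), "bits set")
```

Because of the elitism step, the best fitness never decreases from one generation to the next.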

Installation

We currently support Python 3.5 and Python 3.6. There are two ways to install GENDIS:

  1. Clone the repository https://github.com/IBCNServices/GENDIS.git and run (python3 -m) pip install -r requirements.txt
  2. GENDIS is hosted on PyPI. Run (python3 -m) pip install gendis to add gendis to your dist-packages, so that you can use it from anywhere.

Make sure NumPy and Cython are already installed (pip install numpy and pip install Cython), since the setup script requires them.

Tutorial & Example

1. Loading & preprocessing the datasets

As a first step, we need to construct at least a matrix with time series (X_train) and a vector with labels (y_train). Test data can be loaded as well, in order to evaluate the pipeline at the end.

import pandas as pd
# Read in the data files (replace the placeholders with the paths to your CSV files)
train_df = pd.read_csv('<TRAIN_DATA_FILE>')
test_df = pd.read_csv('<TEST_DATA_FILE>')
# Split into feature matrices and label vectors
X_train = train_df.drop('target', axis=1)
y_train = train_df['target']
X_test = test_df.drop('target', axis=1)
y_test = test_df['target']
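The loading code above implies a particular file layout. The sketch below spells it out with a tiny in-memory CSV: one row per time series, one column per time step, plus a 'target' column with the class label. The column names t0 to t3 and the values are made up for illustration; only the 'target' column name comes from the tutorial.

```python
import io

import pandas as pd

# Hypothetical toy file: four series of four time steps each, two classes.
csv_text = """t0,t1,t2,t3,target
0.0,1.0,0.0,-1.0,a
0.1,0.9,0.1,-0.8,a
1.0,1.0,1.0,1.0,b
0.9,1.1,1.0,0.9,b
"""

train_df = pd.read_csv(io.StringIO(csv_text))

# Same split as in the tutorial: features are all columns except 'target'.
X_train = train_df.drop('target', axis=1)
y_train = train_df['target']

print(X_train.shape)     # (number of series, number of time steps)
print(y_train.tolist())  # one label per series
```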

2. Creating a GeneticExtractor object

Construct the object. For a list of all possible parameters and their descriptions, please refer to the documentation in the code.

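from gendis.genetic import GeneticExtractor
# max_len caps the shapelet length; here it is half the series length (the number of columns of X_train)
genetic_extractor = GeneticExtractor(population_size=50, iterations=25, verbose=True,
                                     mutation_prob=0.3, crossover_prob=0.3,
                                     wait=10, max_len=X_train.shape[1] // 2)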

3. Fit the GeneticExtractor and construct distance matrix

shapelets = genetic_extractor.fit(X_train, y_train)
distances_train = genetic_extractor.transform(X_train)
distances_test = genetic_extractor.transform(X_test)

4. Fit ML classifier on constructed distance matrix

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
lr = LogisticRegression()
lr.fit(distances_train, y_train)

print('Accuracy = {}'.format(accuracy_score(y_test, lr.predict(distances_test))))

Example notebook

A simple example is provided in this notebook

Data

All datasets in this repository were downloaded from timeseriesclassification.com. Please cite them appropriately when using any dataset.

Paper experiments

In order to reproduce the results from the corresponding paper, please check out this directory.

Tests

We provide a few doctests and unit tests. To run the doctests: python3 -m doctest -v <FILE>, where <FILE> is the Python file you want to run the doctests from. To run unit tests: nose2 -v
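For readers unfamiliar with doctests, the sketch below shows what such a test looks like. The function sliding_windows is a hypothetical helper written for this illustration and is not part of the gendis package. Saved to a file, it can be checked with the doctest command given above.

```python
def sliding_windows(series, length):
    """Return all contiguous windows of `length` items from `series`.

    The examples below are executed and checked by doctest.

    >>> sliding_windows([1, 2, 3, 4], 2)
    [[1, 2], [2, 3], [3, 4]]
    >>> sliding_windows([1, 2], 3)
    []
    """
    return [series[i:i + length] for i in range(len(series) - length + 1)]
```

doctest runs every line that starts with >>> and compares the printed result with the line that follows it.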

Contributing, Citing and Contact

If you have any questions, are experiencing bugs in the GENDIS implementation, or would like to contribute, please feel free to create an issue or pull request in this repository, or contact me at gilles(dot)vandewiele(at)ugent(dot)be

If you use GENDIS in your work, please use the following citation:

@article{vandewiele2021gendis,
  title={GENDIS: Genetic Discovery of Shapelets},
  author={Vandewiele, Gilles and Ongenae, Femke and De Turck, Filip},
  journal={Sensors},
  volume={21},
  number={4},
  pages={1059},
  year={2021},
  publisher={Multidisciplinary Digital Publishing Institute}
}

Comments

  • Run "genetic_extractor.fit(X_train, y_train)" failed

    When I run "example.ipynb", it never finishes and shows no error message; the kernel just stays busy. It seems to hang on this call: "genetic_extractor.fit(X_train, y_train)". How can I fix it?

    opened by Lichat 10
  • GENDIS import problem

    Hi,

    I have problems importing modules from gendis. When I run this code, I get the following error:

    from gendis.genetic import GeneticExtractor
    genetic_extractor = GeneticExtractor(population_size=50, iterations=25, verbose=False, 
                                         normed=False, add_noise_prob=0.3, add_shapelet_prob=0.3, 
                                         wait=10, plot='notebook', remove_shapelet_prob=0.3, 
                                         crossover_prob=0.66, n_jobs=4)
    

    ERROR:

    Traceback (most recent call last):
      File "/Users/maria/PycharmProjects/gendis_new/main.py", line 1, in <module>
        from gendis.genetic import GeneticExtractor
      File "/Users/maria/PycharmProjects/gendis_new/gendis/genetic.py", line 35, in <module>
        from gendis.pairwise_dist import _pdist
    ImportError: No module named 'gendis.pairwise_dist'
    
    Process finished with exit code 1
    

    I’ve installed gendis successfully with Python 3.5. OS: Mojave 10.14.3

    opened by mmar30 9
  • GENDIS Install problem

    I am trying to install gendis with 'pip install gendis' from my command prompt, but the installation fails. I am using Python 3.6.5.

    The output ends with this error:

    " ---------------------------------------- Command ""c:\users\humaun rashid\anaconda3\python.exe" -u -c "import setuptools, tokenize;__file__='C:\Users\HUMAUN~1\AppData\Local\Temp\pip-install-m90pf7zc\tslearn\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\HUMAUN~1\AppData\Local\Temp\pip-record-zwcfs0k2\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\HUMAUN~1\AppData\Local\Temp\pip-install-m90pf7zc\tslearn"

    opened by humaun21 7
  • from gendis.genetic import GeneticExtractor and from genetic import GeneticExtractor don't work

    Hi there,

    Thanks for creating this great package. I have just installed it and am trying to explore it. However, when I follow the tutorial, I get the errors below:

    from gendis.genetic import GeneticExtractor
    Traceback (most recent call last):
      File "C:\Anaconda\envs\te\gendis\genetic.py", line 33, in <module>
        from fitness import logloss_fitness
    ModuleNotFoundError: No module named 'fitness'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Anaconda\envs\te\gendis\fitness.py", line 6, in <module>
        from gendis.pairwise_dist import _pdist
    ModuleNotFoundError: No module named 'gendis.pairwise_dist'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "", line 1, in
      File "C:\Anaconda\envs\te\gendis\genetic.py", line 35, in <module>
        from gendis.fitness import logloss_fitness
      File "C:\Anaconda\envs\te\gendis\fitness.py", line 8, in <module>
        from pairwise_dist import _pdist
    ModuleNotFoundError: No module named 'pairwise_dist'

    from genetic import GeneticExtractor
    Traceback (most recent call last):
      File "", line 1, in
    ModuleNotFoundError: No module named 'genetic'

    However, I am able to import gendis itself as shown below, although I only see the private members of the module. Could you please help?

    import gendis
    gendis.
    gendis.__cached__ gendis.__format__( gendis.__loader__ gendis.__reduce_ex__(
    gendis.__class__( gendis.__ge__( gendis.__lt__( gendis.__repr__(
    gendis.__delattr__( gendis.__getattribute__( gendis.__name__ gendis.__setattr__(
    gendis.__dict__ gendis.__gt__( gendis.__ne__( gendis.__sizeof__(
    gendis.__dir__( gendis.__hash__( gendis.__new__( gendis.__spec__
    gendis.__doc__ gendis.__init__( gendis.__package__ gendis.__str__(
    gendis.__eq__( gendis.__init_subclass__( gendis.__path__ gendis.__subclasshook__(
    gendis.__file__ gendis.__le__( gendis.__reduce__(

    best pandaa

    opened by texaspandaa 5
  • Installation Problem

    Hello Gilles, I am a graduate student currently trying to use GENDIS for a machine learning project. However, when I used command "!pip install gendis", it returns the "ResolutionImpossible" error since GENDIS requires matplotlib version 2.1.2, but now I have matplotlib version 3.5.2. I tried "!pipx install matplotlib==2.1.2" to install the earlier version, and also attempted the clone repository alternative, but both also failed. Any idea on how to get over this problem?

    opened by Haoxiang-Sun-BioEng 2
  • Error: No module named 'gendis'

    The following error is reported when I run the first cell in example.ipynb. I would appreciate it if you could give me some guidance.

    opened by jadew0321 2
  • Data inputs for univariate time series

    Hi, I have a univariate time series with timestamps as a starting point, plus an array with the target variable. I am not sure how the data input is processed in the provided example. What format does GENDIS expect as input? Thanks!

    opened by aron-alarik 2
  • Multi-processing

    Hi, your package is great! One thing: I can't get multiprocessing to work. I set n_jobs > 1, but it doesn't appear to start more jobs. I have 50+ cores and it uses just one. Any help is much appreciated. Thanks:

    cpus = 30
    st = GeneticExtractor(verbose=True, population_size=30, iterations=10, plot=None, n_jobs=cpus)
    
    opened by jmrichardson 2
  • Support for custom fitness function

    Allow users to write their own custom fitness function. The interface should be something like:

    def fitness(D, y):
      """Calculate the fitness of a candidate solution.
      
      Parameters
      ---------------
      D: 2D array-like. 
        array of distances
      y: 1D array-like
        array with ground truth
    
      Returns
      -----------
      x: float
        the score the genetic algorithm tries to maximize
      """
      return x
    
    enhancement 
    opened by GillesVandewiele 0
  • Update documentations + readthedocs hosting

    Re-check all docs. An example of a mistake is the fact that the wait parameter is not explained in the docs. While we're at it, host a readthedocs as well.

    opened by GillesVandewiele 0
  • Bump numpy from 1.14.5 to 1.22.0 in /gendis/docs

    Bumps numpy from 1.14.5 to 1.22.0.

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.16.0 to 1.22.0

    Bumps numpy from 1.16.0 to 1.22.0.

    dependencies 
    opened by dependabot[bot] 0
  • Extension to multivariate time series

    Hi, thanks for your work! Actually, I would like to extract a set of shapelets from a collection of multivariate time series. Does the current version of the code provide this functionality, or do I need to extract shapelets for each variable individually?

    opened by jadew0321 1
  • Installation problems with up-to-date libraries

    Is this package still supported? It seems that the requirements in the installation procedure are outdated and interfere with an up-to-date environment, making the installation difficult and inconvenient in a regular situation.

    opened by aendrs 7
  • TypeError: __init__() got an unexpected keyword argument 'add_noise_prob'

    Hi, I am getting this error when running this code:

    from gendis.genetic import GeneticExtractor
    genetic_extractor = GeneticExtractor(population_size=50, iterations=25, verbose=False,
                                         normed=False, add_noise_prob=0.3, add_shapelet_prob=0.3,
                                         wait=10, plot='notebook', remove_shapelet_prob=0.3,
                                         crossover_prob=0.66, n_jobs=4, max_len=len(X_train_pyts) // 2)
    

    Traceback (most recent call last):
      File "C:\Users\john\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "", line 5, in
        crossover_prob=0.66, n_jobs=4, max_len=len(X_train_pyts) // 2)
    TypeError: __init__() got an unexpected keyword argument 'add_noise_prob'

    I installed by clone method. Thanks for the help

    opened by jmrichardson 2
  • Matrix profiling for initialization

    Keogh et al recently published a paper that shows how shapelets can heuristically (but fast) be determined through the matrix profile.

    Implement this init operator in GENDIS.

    enhancement 
    opened by GillesVandewiele 0
Owner
IDLab Services
Internet and Data Lab research group from Ghent University