A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

Target

Last update: Dec 26, 2022

Related tags

Machine Learning python data-science time-series pypi motif python3 pip motif-discovery pypi-packages timeseries-analysis pip3 matrix-profile timeseries-segmentation

Overview

matrixprofile-ts

matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keogh and Mueen research groups at UC-Riverside and the University of New Mexico. Current implementations include MASS, STMP, STAMP, STAMPI, STOMP, SCRIMP++, and FLUSS.

Read the Target blog post here.

Further academic description can be found here.

The PyPi page for matrixprofile-ts is here

Installation
Quick start
Examples
Algorithm Comparison
Matrix Profile in Other Languages
Contact
Citations

Installation

Major releases of matrixprofile-ts are available on the Python Package Index:

pip install matrixprofile-ts

Details about each release can be found here.

Quick start

>>> from matrixprofile import *
>>> import numpy as np
>>> a = np.array([0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0])
>>> matrixProfile.stomp(a,4)
(array([0., 0., 0., 0., 0., 0., 0., 0., 0.]), array([4., 5., 6., 7., 0., 1., 2., 3., 0.]))

Note that SCRIMP++ is highly recommended for calculating the Matrix Profile due to its speed and anytime ability.
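
To make the quick-start output easier to read, the following is a minimal brute-force sketch of what a matrix profile contains: for every subsequence of length m, the z-normalized Euclidean distance to its nearest non-trivial neighbour, and that neighbour's index. It uses only the standard library and is not how the library computes the profile (STOMP and SCRIMP++ are far faster). The function name `brute_force_matrix_profile` and the exclusion zone of m // 2 are assumptions made for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; return None if it is constant (std == 0)."""
    m = len(window)
    mean = sum(window) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / m)
    if std == 0:
        return None
    return [(x - mean) / std for x in window]


def brute_force_matrix_profile(ts, m):
    """Return (profile, index) by comparing every pair of subsequences."""
    n_sub = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    excl = m // 2  # assumed exclusion zone: skip trivial matches near i
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl or subs[i] is None or subs[j] is None:
                continue
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


a = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
profile, index = brute_force_matrix_profile(a, 4)
print(profile)  # every subsequence has an exact repeat, so all distances are 0.0
print(index)    # [4, 5, 6, 7, 0, 1, 2, 3, 0]
```

On this periodic signal the sketch agrees with the quick-start result: every distance is zero and each index points at the first exact repeat outside the exclusion zone.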

Examples

Jupyter notebooks containing various examples of how to use matrixprofile-ts can be found under docs/examples.

As a basic introduction, we can take a synthetic signal and use STOMP to calculate the corresponding Matrix Profile (this is the same synthetic signal as in the Golang Matrix Profile library). Code for this example can be found here.

There are several items of note:

The Matrix Profile value jumps at each phase change. High Matrix Profile values are associated with "discords": time series behavior that hasn't been observed before.
Repeated patterns in the data (or "motifs") lead to low Matrix Profile values.
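
As a rough illustration of the two notes above, the sketch below reads a discord and a motif location straight off a matrix profile: the largest finite value marks the most unusual subsequence, the smallest marks the best-repeated one. The profile values are made up for the example, and `discord_and_motif` is a hypothetical helper, not part of the library.

```python
import math


def discord_and_motif(profile):
    """Return (discord_start, motif_start) from a list of profile values."""
    finite = [(v, i) for i, v in enumerate(profile) if math.isfinite(v)]
    if not finite:
        raise ValueError("profile has no finite values")
    discord = max(finite)[1]  # highest distance to any neighbour
    motif = min(finite)[1]    # lowest distance to its nearest neighbour
    return discord, motif


# Illustrative values only: index 3 stands out, index 1 repeats closely.
example_profile = [0.2, 0.1, 0.3, 2.5, 0.4, 0.15]
print(discord_and_motif(example_profile))
```

In practice a threshold or a top-k selection would replace the single maximum, and the right choice depends on the data.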

We can introduce an anomaly to the end of the time series and use STAMPI to detect it.

The Matrix Profile has spiked in value, highlighting the (potential) presence of a new behavior. Note that Matrix Profile anomaly detection capabilities will depend on the nature of the data, as well as the selected subquery length parameter. As with any algorithm, it's important to try out different parameter values.
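
The incremental idea can be sketched in a few lines: when a new point arrives, only the newest subsequence needs to be compared against the earlier ones, and the profile is updated in both directions. This is a standard-library illustration of that idea under stated assumptions (an exclusion zone of m // 2, a hypothetical `append_point` helper, and a made-up signal); it is not the library's STAMPI implementation.

```python
import math


def znorm(window):
    """Z-normalize a window; return None if it is constant (std == 0)."""
    m = len(window)
    mean = sum(window) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / m)
    if std == 0:
        return None
    return [(x - mean) / std for x in window]


def distance(a, b):
    """Euclidean distance between z-normalized windows (inf if either is constant)."""
    if a is None or b is None:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def append_point(ts, profile, index, value, m):
    """Append one value to ts and update profile/index in place."""
    ts.append(value)
    new_i = len(ts) - m            # start of the newest subsequence
    new_sub = znorm(ts[new_i:])
    excl = m // 2                  # assumed exclusion zone
    best, best_j = math.inf, -1
    for j in range(max(0, new_i - excl)):
        d = distance(new_sub, znorm(ts[j:j + m]))
        if d < best:
            best, best_j = d, j
        if d < profile[j]:         # the new subsequence may be a closer match for j
            profile[j], index[j] = d, new_i
    profile.append(best)
    index.append(best_j)


m = 4
# Made-up data: a periodic signal followed by four anomalous points.
stream = [0.0, 1.0, 2.0, 1.0] * 6 + [3.0, 3.0, 3.0, 0.0]
ts = stream[:m]
profile, index = [math.inf], [-1]
for value in stream[m:]:
    append_point(ts, profile, index, value, m)

print(profile[:21])  # periodic part: every subsequence has an exact repeat
print(profile[21:])  # subsequences touching the anomaly have no close match
```

Only the last four subsequences, the ones that overlap the appended anomaly, end up with non-zero profile values.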

Algorithm Comparison

This section compares the run times of the matrix profile algorithms and discusses when to use each one. The timings were measured on the synthetic sample data set.

For a more comprehensive runtime comparison, please review the notebook docs/examples/Algorithm Comparison.ipynb.

All time comparisons were run on a 4-core 2.8 GHz processor with 16 GB of memory. The operating system used was Ubuntu 18.04 LTS 64-bit.

Algorithm	Time to Complete	Description
STAMP	310 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)	STAMP is an anytime algorithm that lets you sample the data set to get an approximate solution. Our implementation provides you with the option to specify the sampling size in percent format.
STOMP	79.8 ms ± 473 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)	STOMP computes an exact solution in a very efficient manner. When you have a historic time series that you would like to examine, STOMP is typically the quickest at giving an exact solution.
SCRIMP++	59 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)	SCRIMP++ merges the concepts of STAMP and STOMP to provide an anytime algorithm that enables "interactive analysis speed". Essentially, it provides an exact or approximate solution in a very timely manner. To obtain an approximate solution, our implementation lets you specify the maximum number of seconds you are willing to wait. If you want the exact solution, it can provide that as well. The original authors of this algorithm suggest that SCRIMP++ can be used in all use cases.
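
The "anytime" property described in the table can be illustrated with a small sampling sketch: distance profiles are computed for a random fraction of the subsequences only, and every distance found updates both endpoints. The result can only overestimate the exact profile, and it becomes exact once every subsequence has been visited. The function name, the `fraction` and `seed` parameters, and the exclusion zone of m // 2 are assumptions for this example, not the library's API.

```python
import math
import random


def znorm(window):
    """Z-normalize a window; return None if it is constant (std == 0)."""
    m = len(window)
    mean = sum(window) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / m)
    if std == 0:
        return None
    return [(x - mean) / std for x in window]


def distance(a, b):
    """Euclidean distance between z-normalized windows (inf if either is constant)."""
    if a is None or b is None:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sampled_matrix_profile(ts, m, fraction, seed=0):
    """Approximate profile from a random fraction of the distance profiles."""
    n_sub = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    excl = m // 2  # assumed exclusion zone
    profile = [math.inf] * n_sub
    order = list(range(n_sub))
    random.Random(seed).shuffle(order)  # visit subsequences in random order
    n_eval = max(1, int(round(fraction * n_sub)))
    for i in order[:n_eval]:
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            d = distance(subs[i], subs[j])
            profile[i] = min(profile[i], d)
            profile[j] = min(profile[j], d)  # distances are symmetric
    return profile


ts = [math.sin(0.7 * i) + 0.1 * (i % 5) for i in range(40)]
exact = sampled_matrix_profile(ts, 5, fraction=1.0)
approx = sampled_matrix_profile(ts, 5, fraction=0.25)
print(max(a - e for a, e in zip(approx, exact)))  # worst-case overestimate
```

Raising `fraction` trades run time for accuracy, which is the same trade-off the sampling size and time-budget options expose.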

Matrix Profile in Other Languages

Contact

Frankie Cancino
Andrew Van Benschoten

Citations

Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen and Eamonn Keogh. IEEE ICDM 2016.
Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh. IEEE ICDM 2016.
Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. Hoang Anh Dau and Eamonn Keogh. KDD'17, Halifax, Canada.
Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speed. Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Kaveh Kamgar and Eamonn Keogh. ICDM 2018.
Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. Shaghayegh Gharghabi, Yifei Ding, Chin-Chia Michael Yeh, Kaveh Kamgar, Liudmila Ulanova and Eamonn Keogh. ICDM 2017.

Comments

Feature fluss
FLUSS - semantic segmentation using the MPI

This PR adds the FLUSS algorithm which provides Fast Low-cost Unipotent Semantic Segmentation. It is defined in pseudo-code in MP paper VIII, which I've added as a source in the Readme as well.

This PR changes the following items:

New: matrixprofile/fluss.py defines the fluss function. It returns the Corrected Arc Curve, based on the matrix profile index and the subsequence length.

New: tests/test_fluss.py defines a functionality test.

Changed: matrixprofile/__init__.py now includes fluss.

Changed: docs/examples/Matrix_Profile_Tutorial.ipynb now includes an example (plot) of FLUSS.

Changed: README.md now includes a reference to the algorithm and the paper.
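
For readers unfamiliar with the Corrected Arc Curve mentioned above, here is a standard-library sketch of the idea as described in the Matrix Profile VIII paper: draw an arc from each subsequence to its nearest neighbour, count how many arcs pass over each position, and divide by the count expected if arcs were placed at random (a parabola peaking at n/2). Low values suggest a regime boundary. This is not the code in matrixprofile/fluss.py; the function name, the `edge` parameter, and the toy matrix profile index are assumptions.

```python
def corrected_arc_curve(mpi, edge):
    """Sketch of a corrected arc curve from a matrix profile index."""
    n = len(mpi)
    mark = [0] * n
    for i, j in enumerate(mpi):
        lo, hi = min(i, int(j)), max(i, int(j))
        mark[lo] += 1   # an arc starts covering positions at lo
        mark[hi] -= 1   # and stops covering them at hi
    cac = []
    crossings = 0
    for k in range(n):
        crossings += mark[k]                 # running number of arcs over k
        ideal = 2.0 * k * (n - k) / n        # idealized (random) arc count
        cac.append(min(crossings / ideal, 1.0) if ideal > 0 else 1.0)
    for k in range(min(edge, n)):            # edges have too few arcs to trust
        cac[k] = 1.0
        cac[n - 1 - k] = 1.0
    return cac


# Toy index: positions 0-5 only point inside 0-5, positions 6-11 inside 6-11.
toy_mpi = [5, 4, 5, 0, 1, 0, 11, 10, 11, 6, 7, 6]
cac = corrected_arc_curve(toy_mpi, edge=1)
print(cac.index(min(cac)))  # position with the fewest arcs crossing it
```

Because no arc connects the two halves of the toy index, the curve bottoms out at the boundary between them.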

A nice future feature would be to implement the anytime version of this algorithm (which does produce slightly different results). For now, I think this is already a useful addition. If there are things that need to be improved, please let me know and I'll have a look at them!
opened by mpieters93 18
Readme is a bit misleading

Hi, Very interesting concept thanks for coding it and sharing it. In the readme, you mention:

"We can introduce an anomaly to the end of the time series and use STAMPI to detect it"

And then you conclude:

"The Matrix Profile has spiked in value, highlighting the (potential) presence of a new behavior"

I am a bit puzzled by this. Yes, the matrix profile has spiked, but so did the data, so I do not see the additional value of STAMPI in this example. Overall, I am a bit confused about how to interpret the data, especially if you'd like to do it automatically for anomaly detection.

Here is a picture of the example with some questions. For the first one, the number seems pretty high, which means that the z-normalized Euclidean distance is high (correct me if I am wrong). However, I could argue that there was just a changepoint and this is the normal "new behavior" (i.e. there was an earthquake at the beginning of the data and then it went off). Following this logic, I would interpret the black square as outliers rather than "normal", but there the MP has values close to 0, which would mean (if I understand correctly) that it should not be seen as outliers. Finally, at the end I am unsure why I see an initial spike, or the data going back to 0.

I am sure this comes from my lack of understanding of the MP, but it might be nice to add a more detailed description of this chart, as well as a function or a heuristic in the readme to automatically detect outliers using the MP. Best,

opened by zippeurfou 10
What is the difference between matrixprofile and matrixprofile-ts?

I have no idea what the difference between these two packages is. Both are available via pip, but matrixprofile-ts is no longer being updated. Strangely, matrixprofile-ts has more stars. Can anyone answer my question?

opened by RexKing6 8
Strange behaviour when testing with constant values

Hi, thank you for sharing the Matrix Profile code. It is very helpful.

I have time series sample data with a frequency of one point per hour, and it works perfectly with a period of 24, as you can see in the image below.

the code used was this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matrixprofile import matrixProfile

df_Sample = pd.read_csv('Sample.csv', sep=',')
df_Sample = df_Sample.drop(columns=['Index'])
new_df = df_Sample[:1156]
a = new_df.values.squeeze()
m = 24
profile = matrixProfile.stomp(a, m)
# Pad with NaN so the profile columns have the same length as the data
new_df['profile'] = np.append(profile[0], np.zeros(m - 1) + np.nan)
new_df['profile_index'] = np.append(profile[1], np.zeros(m - 1) + np.nan)
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(15, 10))
new_df['Value'].plot(ax=ax1, style='.', title='Sample')
new_df['profile'].plot(ax=ax2, c='r', title='Matrix Profile')
plt.show()

But when I add additional time series data (that is constant) this happens:

The code is the same but instead of "new_df= df_Sample[:1156]" I defined "new_df= df_Sample".

I was expecting a different result. It seems that the constant values are destroying all the previous analyses. Is this a bug?

Thanks in advance,

David

Sample.zip

opened by dvdcoliveira 8
Definition and explanation of parameters

Can anyone provide definitions of the parameters used? The questions are:

(1) How does one determine the samples to be excluded?

    Computes the top k motifs from a matrix profile

    Parameters
    ----------
    ts: time series used to calculate mp
    mp: tuple, (matrix profile numpy array, matrix profile indices)
    max_motifs: the maximum number of motifs to discover
    ex_zone: the number of samples to exclude and set to Inf on either side of a found motif
        defaults to m/2

    Returns tuple (motifs, distances)
    motifs: a list of lists of indexes representing the motif starting locations.
    distances: list of minimum distances for each motif
    """
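
To show what an exclusion zone does in practice, here is a simplified sketch: after each motif location is picked, the profile values within `ex_zone` samples on either side are set to infinity so that the next pick cannot be a near-duplicate of the same pattern. This is only an illustration of the docstring's wording, not the library's motifs function (which also gathers each motif's neighbours); `top_k_motif_starts` and the profile values are hypothetical.

```python
import math


def top_k_motif_starts(profile, k, ex_zone):
    """Pick up to k low-profile locations, masking ex_zone samples around each."""
    work = list(profile)  # copy so the caller's profile is left untouched
    found = []
    for _ in range(k):
        best = min(range(len(work)), key=lambda i: work[i])
        if math.isinf(work[best]):
            break  # everything left is excluded
        found.append(best)
        lo = max(0, best - ex_zone)
        hi = min(len(work), best + ex_zone + 1)
        for i in range(lo, hi):
            work[i] = math.inf
    return found


# Illustrative values: indices 1 and 2 are overlapping views of one pattern.
example_profile = [0.9, 0.1, 0.15, 0.8, 0.7, 0.2, 0.25, 0.9]
print(top_k_motif_starts(example_profile, k=2, ex_zone=2))  # with exclusion
print(top_k_motif_starts(example_profile, k=2, ex_zone=0))  # without exclusion
```

With an exclusion zone of 2 the second pick moves to a different region; with no exclusion zone it lands right next to the first. A zone of about m/2 is the default named in the docstring, and a larger zone gives fewer, more widely separated motifs.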

opened by radokotorov 7
Dealing with missing values

I tried to use the matrix profile to analyse data with missing values, but unfortunately I get an empty graph. Is it possible to analyse data with missing values with this implementation? The paper states that the matrix profile should still produce some analysis even with missing data.

opened by Modestas96 7
Pattern Recognition

I would like to use your algorithm but I have one question. My problem is this: I have a query (pattern of interest) and I would like to find in a time series the patterns that are the closest to the query. Should I use in this case the matrix profile (SCRIMP++) or MASS? Thank you!

opened by AlexiaArtemis 6

Tutorial code running into issue: 'unicode' object is not callable

Using exactly the Tutorial code under Python 2.7 in a miniconda environment, I got an error when running the following section ("Calculate the Matrix Profile"). Is this just me? Can anyone help, please?

m = 32
mp = matrixProfile.stomp(pattern,m)

TypeError                                 Traceback (most recent call last)
<ipython-input-3-d3196b066bd3> in <module>()
      1 m = 32
----> 2 mp = matrixProfile.stomp(pattern,m)

/Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/matrixProfile.pyc in stomp(tsA, m, tsB)
    270     tsB: Time series to compare the query against. Note that, if no value is provided, tsB = tsA by default.
    271     """
--> 272     return _matrixProfile_stomp(tsA,m,order.linearOrder,distanceProfile.STOMPDistanceProfile,tsB)
    273 
    274 

/Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/matrixProfile.pyc in _matrixProfile_stomp(tsA, m, orderClass, distanceProfileFunction, tsB)
    166 
    167         #Need to pass in the previous sliding dot product for subsequent distance profile calculations
--> 168         (distanceProfile,querySegmentsID),dot_prev = distanceProfileFunction(tsA,idx,m,tsB,dot_first,dp,mean,std)
    169 
    170         if idx == 0:

/Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/distanceProfile.pyc in STOMPDistanceProfile(tsA, idx, m, tsB, dot_first, dp, mean, std)
    116     #Calculate the first distance profile via MASS
    117     if idx == 0:
--> 118         distanceProfile = np.real(np.sqrt(mass(query,tsB).astype(complex)))
    119 
    120         #Currently re-calculating the dot product separately as opposed to updating all of the mass function...

/Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/utils.pyc in mass(query, ts)
    172     q_std = np.std(query)
    173     mean, std = movmeanstd(ts,m)
--> 174     dot = slidingDotProduct(query,ts)
    175 
    176     #res = np.sqrt(2*m*(1-(dot-m*mean*q_mean)/(m*std*q_std)))

/Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/utils.pyc in slidingDotProduct(query, ts)
    122 
    123 
--> 124     query = np.pad(query,(0,n-m+ts_add-q_add),'constant')
    125 
    126     #Determine trim length for dot product. Note that zero-padding of the query has no effect on array length, which is solely determined by the longest vector

/Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/numpy/lib/arraypad.pyc in pad(array, pad_width, mode, **kwargs)
   1383                                 pad_width[iaxis],
   1384                                 iaxis,
-> 1385                                 kwargs)
   1386         return newmat
   1387 

/Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args, **kwargs)
     89     outshape = asarray(arr.shape).take(indlist)
     90     i.put(indlist, ind)
---> 91     res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
     92     #  if res is a number, then we have a smaller output array
     93     if isscalar(res):

TypeError: 'unicode' object is not callable

opened by dingzj 6

Python 2 compatibility

We've seen interest in making the library compatible with Python 2. We will explore possible avenues, but if you have any thoughts on the subject please reach out.

opened by vanbenschoten 5
Duplicate license files: remove License.md?

Looks like there are both a LICENSE as well as a License.md file in the repo. The LICENSE file is the authoritative full-length Apache license, while the License.md is the short-form version, suitable for top-of-file comment header, for example, but not a substitute for the full license.

It would be good to remove License.md since it's the short-form version, and it's currently keeping the repo from getting the automatic "Apache 2.0" license badge on the overview page courtesy of GitHub's automatic license detector, due to the presence of 2 license files.

Thanks!

opened by mbrukman 4
MASS yields different results than brute force search

I observed different results when calculating the distance profile using the brute force search algorithm and the function matrixprofile.MASS.distance_profile(). Here is a sample of my code:

import numpy as np

# calculate by brute force
# (query and serie are 1-D NumPy arrays defined elsewhere)
query_ = (query - query.mean()) / query.std(ddof=0)  # z-normalize the query
len_m = query.shape[0]
dist = []
for index in range(0, serie.shape[0] - len_m + 1):
    sub = serie[index:index + len_m]
    sub = (sub - sub.mean()) / sub.std(ddof=0)  # z-normalize each window
    dist.append(np.sqrt(np.sum(np.power(sub - query_, 2))))
dist = np.array(dist)

This is less of an issue and more me trying to validate the work of a colleague who has already implemented a version of matrix profile. Given it has already been implemented by this group I thought it would be best to open up dialogue. Thanks!
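
A check of this kind can be sketched without the library. The version below compares the brute-force z-normalized distance with the dot-product form that appears as a comment in the library's utils.py traceback elsewhere on this page, sqrt(2*m*(1-(dot-m*mean*q_mean)/(m*std*q_std))). Both use the population standard deviation (ddof=0). The function names and sample data are assumptions, and the plain loop stands in for the FFT-based sliding dot product that MASS uses.

```python
import math


def mean_std(window):
    """Population mean and standard deviation (ddof=0)."""
    m = len(window)
    mu = sum(window) / m
    sd = math.sqrt(sum((x - mu) ** 2 for x in window) / m)
    return mu, sd


def brute_force_profile(query, series):
    """Distance profile by explicitly z-normalizing every window."""
    m = len(query)
    q_mu, q_sd = mean_std(query)
    qn = [(x - q_mu) / q_sd for x in query]
    out = []
    for i in range(len(series) - m + 1):
        w = series[i:i + m]
        mu, sd = mean_std(w)
        out.append(math.sqrt(sum(((x - mu) / sd - y) ** 2 for x, y in zip(w, qn))))
    return out


def dot_product_profile(query, series):
    """Same distances via the dot-product identity."""
    m = len(query)
    q_mu, q_sd = mean_std(query)
    out = []
    for i in range(len(series) - m + 1):
        w = series[i:i + m]
        mu, sd = mean_std(w)
        dot = sum(x * y for x, y in zip(w, query))
        corr = (dot - m * mu * q_mu) / (m * sd * q_sd)
        out.append(math.sqrt(max(0.0, 2 * m * (1 - corr))))  # clamp rounding noise
    return out


series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0, 3.0, 2.0, 5.0]
query = [3.0, 2.0, 5.0]
brute = brute_force_profile(query, series)
fast = dot_product_profile(query, series)
print(max(abs(b - f) for b, f in zip(brute, fast)))
```

The two agree up to floating-point error. Near exact matches the dot-product form can differ by around 1e-8, because the square root amplifies rounding noise in the correlation, so comparisons of this kind need a tolerance rather than exact equality. A mismatch in the standard deviation convention (ddof=0 versus ddof=1) would produce much larger differences.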

opened by youzzefb 4
Bump setuptools from 39.1.0 to 65.5.1
Bumps setuptools from 39.1.0 to 65.5.1.

Release notes

Sourced from setuptools's releases.

No release notes were provided for v65.5.1, v65.5.0, v65.4.1, v65.4.0, v65.3.0, v65.2.0, v65.1.1, v65.1.0, v65.0.2, v65.0.1, v65.0.0, v64.0.3, v64.0.2, v64.0.1, v64.0.0, v63.4.3, or v63.4.2.

... (truncated)

Changelog

Sourced from setuptools's changelog.

v65.5.1

Misc

#3638: Drop a test dependency on the mock package, always use unittest.mock (by hroncok).

#3659: Fixed REDoS vector in package_index.

v65.5.0

Changes

#3624: Fixed editable install for multi-module/no-package src-layout projects.

#3626: Minor refactorings to support distutils using stdlib logging module.

Documentation changes

#3419: Updated the example version numbers to be compliant with PEP-440 on the "Specifying Your Project’s Version" page of the user guide.

Misc

#3569: Improved information about conflicting entries in the current working directory and editable install (in documentation and as an informational warning).

#3576: Updated version of validate_pyproject.

v65.4.1

Misc

#3613: Fixed encoding errors in expand.StaticModule when system default encoding doesn't match expectations for source files.

#3617: Merge with pypa/distutils@6852b20 including fix for pypa/distutils#181.

v65.4.0

Changes

#3609: Merge with pypa/distutils@d82d926 including support for DIST_EXTRA_CONFIG in pypa/distutils#177.

v65.3.0

... (truncated)

Commits

a462cb5 Bump version: 65.5.0 → 65.5.1

de35d8b Merge pull request #3656 from bmorris3/typos

58e23de Update changelog. Ref #3659.

43a9c9b Limit the amount of whitespace to search/backtrack. Fixes #3659.

5791343 Add test capturing failed expectation. Ref #3659.

1f97905 ⚫ Fade to black.

6254567 Remove workaround for emacs.

729b180 ⚫ Fade to black.

c068081 Typo corrections

f777a40 Suppress deprecation warning in --rsyncdir. Workaround for #3655.

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Bump numpy from 1.16.2 to 1.22.0
Bumps numpy from 1.16.2 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Use of 'is' for non-object.

Minor, but keeps popping up in tooling we're doing:

In matrixprofile/matrixProfile.py the use of is (instead of ==) to compare n_threads to non-object -1 prompts SyntaxWarning: "is" with a literal.
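
A small sketch of the suggested fix: comparing with == tests the value, whereas is tests object identity and only appears to work for small integers because of an interpreter caching detail. The helper below is hypothetical (the library's actual handling of n_threads is not shown on this page), and treating -1 as "use all available cores" is an assumption.

```python
import os


def resolve_thread_count(n_threads):
    """Hypothetical helper: map -1 to the number of available cores."""
    # `n_threads is -1` triggers 'SyntaxWarning: "is" with a literal'
    # on Python 3.8+; `==` compares by value and is always safe here.
    if n_threads == -1:
        return os.cpu_count() or 1
    return n_threads


print(resolve_thread_count(4))
print(resolve_thread_count(-1) >= 1)
```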

opened by gbartlet 1
ValueError: Length of values does not match length of index

In:

mp = matrixProfile.stomp(pattern,m)
mtfs ,motif_d = motifs.motifs(pattern, mp, max_motifs=10)

self._set_item(key, value)
value = self._sanitize_column(key, value)
value = sanitize_index(value, self.index, copy=False)

Any idea how to solve this?

opened by hn2 1
matrixProfile.stomp() gives nan and inf values

The code below gives nan and inf values; am I using this incorrectly?

seconds = np.arange(30)
traffic_light = np.array([0]*15 + [1]*5 + [2]*10)
brake_0 = np.array([0]*15 + [1, 2, 3, 4, 5] + [8, 10, 10, 10, 8, 6, 4, 2, 0, 0])

matrixProfile.stomp(traffic_light, 3)
matrixProfile.stomp(brake_0, 3)
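
One likely contributor, offered as an observation rather than a confirmed diagnosis: the Matrix Profile compares z-normalized subsequences, and z-normalizing a constant window means dividing by a standard deviation of zero. The standard-library sketch below counts how many length-3 windows of the traffic_light signal are constant; `count_constant_windows` is a hypothetical helper, not a library function.

```python
import math


def count_constant_windows(ts, m):
    """Count length-m windows whose standard deviation is exactly zero."""
    count = 0
    for i in range(len(ts) - m + 1):
        window = ts[i:i + m]
        mean = sum(window) / m
        std = math.sqrt(sum((x - mean) ** 2 for x in window) / m)
        if std == 0:
            count += 1  # z-normalizing this window would divide by zero
    return count


traffic_light = [0] * 15 + [1] * 5 + [2] * 10
n_windows = len(traffic_light) - 3 + 1
print(count_constant_windows(traffic_light, 3), "of", n_windows)  # 24 of 28
```

With 24 of the 28 windows constant, most pairwise distances for this signal are undefined under z-normalization, so a longer window or a signal with more variation is needed for a meaningful profile.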

opened by ltbd78 5

