A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

Overview

matrixprofile-ts

matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keogh and Mueen research groups at UC-Riverside and the University of New Mexico. Current implementations include MASS, STMP, STAMP, STAMPI, STOMP, SCRIMP++, and FLUSS.

Read the Target blog post here.

Further academic description can be found here.

The PyPI page for matrixprofile-ts is here.

Installation

Major releases of matrixprofile-ts are available on the Python Package Index:

pip install matrixprofile-ts

Details about each release can be found here.

Quick start

>>> from matrixprofile import *
>>> import numpy as np
>>> a = np.array([0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0])
>>> matrixProfile.stomp(a,4)
(array([0., 0., 0., 0., 0., 0., 0., 0., 0.]), array([4., 5., 6., 7., 0., 1., 2., 3., 0.]))

Note that SCRIMP++ is highly recommended for calculating the Matrix Profile due to its speed and anytime ability.
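
To make the returned tuple concrete, here is a minimal brute-force sketch in plain NumPy. It is not the library's STOMP implementation: the function name `naive_matrix_profile` and the `m // 2` exclusion zone are assumptions made for illustration, and the code assumes no subsequence is constant. The first returned array holds the z-normalized distance from each subsequence to its nearest neighbor, and the second holds that neighbor's index.

```python
import numpy as np

def naive_matrix_profile(ts, m):
    """Brute-force self-join, O(n^2 * m). For illustration only."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    # Z-normalize every length-m subsequence (assumes none is constant).
    subs = np.array([ts[i:i + m] for i in range(n_sub)])
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)

    excl = m // 2  # assumed exclusion zone that skips trivial self-matches
    profile = np.empty(n_sub)
    index = np.empty(n_sub, dtype=int)
    for i in range(n_sub):
        dist = np.linalg.norm(subs - subs[i], axis=1)
        dist[max(0, i - excl):i + excl + 1] = np.inf
        index[i] = np.argmin(dist)
        profile[i] = dist[index[i]]
    return profile, index

a = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
profile, index = naive_matrix_profile(a, 4)
print(profile)
print(index)
```

On the Quick start array, every subsequence has an identical copy four samples away, so all distances are zero and the index array points to those copies.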

Examples

Jupyter notebooks containing various examples of how to use matrixprofile-ts can be found under docs/examples.

As a basic introduction, we can take a synthetic signal and use STOMP to calculate the corresponding Matrix Profile (this is the same synthetic signal as in the Golang Matrix Profile library). Code for this example can be found here.

[Figure: the synthetic signal and its Matrix Profile]

There are several items of note:

  • The Matrix Profile value jumps at each phase change. High Matrix Profile values are associated with "discords": time series behavior that hasn't been observed before.

  • Repeated patterns in the data (or "motifs") lead to low Matrix Profile values.

We can introduce an anomaly at the end of the time series and use STAMPI to detect it:

[Figure: the signal with the added anomaly and its updated Matrix Profile]

The Matrix Profile has spiked in value, highlighting the (potential) presence of a new behavior. Note that the Matrix Profile's ability to detect anomalies depends on the nature of the data, as well as on the selected subquery length parameter. As with any algorithm, it is important to try out different parameter values.
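
The effect of the subquery length can be explored with a small experiment. The sketch below does not use the library; `naive_profile` is a hypothetical brute-force helper, and the sine signal, the anomaly at samples 300 to 309, and the candidate lengths are all made up for illustration. It reports where the highest Matrix Profile value (the top discord) falls for each length.

```python
import numpy as np

def naive_profile(ts, m):
    """Brute-force z-normalized matrix profile (illustration only)."""
    n_sub = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n_sub)], dtype=float)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    excl = m // 2  # assumed exclusion zone around each subsequence
    profile = np.empty(n_sub)
    for i in range(n_sub):
        dist = np.linalg.norm(subs - subs[i], axis=1)
        dist[max(0, i - excl):i + excl + 1] = np.inf
        profile[i] = dist.min()
    return profile

t = np.arange(400)
signal = np.sin(2 * np.pi * t / 50)   # repeating pattern with period 50
signal[300:310] += 2.0                # injected anomaly (hypothetical)

discords = {}
for m in (16, 32, 64):
    profile = naive_profile(signal, m)
    discords[m] = int(np.argmax(profile))
    print("m =", m, "top discord starts at", discords[m])
```

For each length, the top discord is a subsequence that overlaps the injected anomaly, but its exact start index shifts with `m`, which is why it is worth comparing several values on real data.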

Algorithm Comparison

This section lists the Matrix Profile algorithms and the time each takes to compute. It also discusses when to use one versus another. The timing comparison is based on the synthetic sample data set.

For a more comprehensive runtime comparison, please review the notebook docs/examples/Algorithm Comparison.ipynb.

All time comparisons were run on a 4-core 2.8 GHz processor with 16 GB of memory. The operating system used was Ubuntu 18.04 LTS (64-bit).

| Algorithm | Time to Complete | Description |
|-----------|------------------|-------------|
| STAMP | 310 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) | STAMP is an anytime algorithm that lets you sample the data set to get an approximate solution. Our implementation provides you with the option to specify the sampling size in percent format. |
| STOMP | 79.8 ms ± 473 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) | STOMP computes an exact solution in a very efficient manner. When you have a historic time series that you would like to examine, STOMP is typically the quickest at giving an exact solution. |
| SCRIMP++ | 59 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) | SCRIMP++ merges the concepts of STAMP and STOMP together to provide an anytime algorithm that enables "interactive analysis speed". Essentially, it provides an exact or approximate solution in a very timely manner. Our implementation allows you to specify the maximum number of seconds you are willing to wait for an approximate solution. If you want the exact solution, it is able to provide that as well. The original authors of this algorithm suggest that SCRIMP++ can be used in all use cases. |
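
The "anytime" property mentioned in the table can be illustrated with a short sketch. This is not the library's STAMP code: `anytime_profile`, its `sampling` and `seed` parameters, and the test signal are assumptions made for illustration. Distance profiles are computed in random order and merged with an element-wise minimum, so stopping early yields an upper bound on the exact profile that tightens as more of the data is processed.

```python
import numpy as np

def znorm_subsequences(ts, m):
    """Stack and z-normalize all length-m subsequences (assumes none is constant)."""
    subs = np.array([ts[i:i + m] for i in range(len(ts) - m + 1)], dtype=float)
    return (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)

def anytime_profile(ts, m, sampling=0.2, seed=0):
    """Approximate self-join profile from a random fraction of distance profiles."""
    subs = znorm_subsequences(ts, m)
    n_sub = len(subs)
    excl = m // 2  # assumed exclusion zone
    rng = np.random.default_rng(seed)
    n_iter = max(1, int(round(sampling * n_sub)))
    order = rng.permutation(n_sub)[:n_iter]

    profile = np.full(n_sub, np.inf)
    for i in order:
        dist = np.linalg.norm(subs - subs[i], axis=1)
        dist[max(0, i - excl):i + excl + 1] = np.inf
        # The self-join is symmetric, so one distance profile improves every entry.
        profile = np.minimum(profile, dist)
    return profile

rng = np.random.default_rng(42)
ts = np.sin(np.linspace(0, 20 * np.pi, 300)) + 0.1 * rng.standard_normal(300)
exact = anytime_profile(ts, 20, sampling=1.0)   # every distance profile: exact result
approx = anytime_profile(ts, 20, sampling=0.2)  # 20% of the distance profiles
print("mean gap to exact:", float(np.mean(approx - exact)))
```

With `sampling=1.0` every distance profile is visited and the result is exact; smaller values trade accuracy for run time.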

Matrix Profile in Other Languages

Contact

Citations

  1. Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh (2016). Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. IEEE ICDM 2016

  2. Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh (2016). IEEE ICDM 2016

  3. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. Hoang Anh Dau and Eamonn Keogh. KDD'17, Halifax, Canada.

  4. Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speed. Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Kaveh Kamgar and Eamonn Keogh, ICDM 2018.

  5. Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. Shaghayegh Gharghabi, Yifei Ding, Chin-Chia Michael Yeh, Kaveh Kamgar, Liudmila Ulanova, and Eamonn Keogh. ICDM 2017.

Comments
  • Feature fluss

    FLUSS - semantic segmentation using the MPI

    This PR adds the FLUSS algorithm which provides Fast Low-cost Unipotent Semantic Segmentation. It is defined in pseudo-code in MP paper VIII, which I've added as a source in the Readme as well.

    This PR changes the following items:

    1. New: matrixprofile/fluss.py defines the fluss function. It returns the Corrected Arc Curve, based on the matrix profile index and the subsequence length.
    2. New: tests/test_fluss.py defines a functionality test.
    3. Changed: matrixprofile/__init__.py now includes fluss.
    4. Changed: docs/examples/Matrix_Profile_Tutorial.ipynb now includes an example (plot) of FLUSS.
    5. Changed: README.md now includes a reference to the algorithm and the paper.
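
For readers unfamiliar with the Corrected Arc Curve, the idea can be sketched as follows. This is a rough illustration based on the description in paper VIII, not the code in matrixprofile/fluss.py; the function name, the `edge_factor` parameter, and the toy matrix profile index are assumptions. Each index is linked to its nearest neighbor by an "arc", the number of arcs crossing each position is counted, and the count is divided by an idealized parabola so that low values suggest a regime boundary.

```python
import numpy as np

def corrected_arc_curve(mpi, m, edge_factor=5):
    """Sketch of a corrected arc curve from a matrix profile index."""
    mpi = np.asarray(mpi, dtype=int)
    n = len(mpi)
    idx = np.arange(n)
    start = np.minimum(idx, mpi)
    end = np.maximum(idx, mpi)

    # Count arcs covering each position with a difference array.
    marks = np.zeros(n + 1)
    np.add.at(marks, start, 1)
    np.add.at(marks, end, -1)
    arc_curve = np.cumsum(marks)[:n]

    # Idealized arc count for random neighbors: a parabola with height n / 2.
    ideal = 2.0 * idx * (n - idx) / n

    cac = np.ones(n)
    edge = edge_factor * m  # the edges are unreliable, so they are left at 1
    inner = slice(edge, n - edge)
    cac[inner] = np.minimum(arc_curve[inner] / ideal[inner], 1.0)
    return cac

# Toy index: the first 100 positions only match each other, as do the last 100.
half = 100
first = (np.arange(half) + 20) % half
mpi = np.concatenate([first, first + half])
cac = corrected_arc_curve(mpi, m=5)
boundary = int(np.argmin(cac))
print("suggested regime boundary near index", boundary)
```

In the toy index no arc crosses from one half to the other, so the curve reaches its minimum at the junction between the two halves.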

    A nice future feature would be to implement the anytime version of this algorithm (which does produce slightly different results). For now, I think this is already a useful addition. If there are things that need to be improved, please let me know and I'll have a look at them!

    opened by mpieters93 18
  • Readme is a bit misleading

    Hi, Very interesting concept thanks for coding it and sharing it. In the readme, you mention:

    "We can introduce an anomaly to the end of the time series and use STAMPI to detect it"

    And then you conclude:

    "The Matrix Profile has spiked in value, highlighting the (potential) presence of a new behavior"

    I am a bit puzzled by this. Yes, the matrix profile has spiked, but so did the data. I do not see in this example the additional value of STAMPI. Overall, I am a bit confused about how to interpret the data, especially if you'd like to do it automatically for anomaly detection. Here is a picture of the example with some questions.

    [Image: annotated screenshot of the example, 2019-01-11]

    In the first one, the number seems pretty high, so this means that the z-normalized Euclidean distance is high (correct me if I am wrong). However, I could argue that there was just a changepoint and this is the normal "new behavior" (i.e. there was an earthquake at the beginning of the data and then it went off). Following this logic, I would interpret the black square more as outliers than "normal", but there the MP has values close to 0, which would mean (if I understand correctly) that it should not be seen as outliers. Finally, at the end I am unsure why I see an initial spike or the data going back to 0.

    I am sure this comes from my lack of understanding of the MP, but it might be nice to add a more detailed description of this chart, as well as a function or a heuristic in the readme to automatically detect outliers using the MP. Best,

    opened by zippeurfou 10
  • What is the difference between matrixprofile and matrixprofile-ts?

    I have no idea what the difference between these two packages is. Both can be installed with pip, but matrixprofile-ts no longer receives updates. Strangely, matrixprofile-ts has more stars. Can anyone answer my question?

    opened by RexKing6 8
  • Strange behaviour when testing with constant values

    Hi, thank you for sharing the Matrix Profile code. It is very helpful.

    I have sample time series data with a frequency of one point per hour, and it works perfectly with a period of 24, as you can see in the image below.

    Figure_3

    the code used was this:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matrixprofile import matrixProfile

    df_Sample = pd.read_csv('Sample.csv', sep=',')
    df_Sample = df_Sample.drop(columns=['Index'])
    new_df = df_Sample[:1156]
    a = new_df.values.squeeze()
    m = 24
    profile = matrixProfile.stomp(a, m)
    # Pad with NaN so the profile columns have the same length as the data
    new_df['profile'] = np.append(profile[0], np.zeros(m - 1) + np.nan)
    new_df['profile_index'] = np.append(profile[1], np.zeros(m - 1) + np.nan)
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(15, 10))
    new_df['Value'].plot(ax=ax1, style='.', title='Sample')
    new_df['profile'].plot(ax=ax2, c='r', title='Matrix Profile')
    plt.show()

    But when I add additional time series data (that is constant) this happens: Figure_1

    The code is the same but instead of "new_df= df_Sample[:1156]" I defined "new_df= df_Sample".

    I was expecting a different result. It seems that the constant values are destroying all the previous analyses. Is this a bug?

    Thanks in advance,

    David

    Sample.zip

    opened by dvdcoliveira 8
  • Definition and explanation of parameters

    Can anyone provide definitions of the parameters used? The questions are:

    (1) How does one determine the samples to be excluded?

    """
    Computes the top k motifs from a matrix profile

    Parameters
    ----------
    ts: time series to used to calculate mp
    mp: tuple, (matrix profile numpy array, matrix profile indices)
    max_motifs: the maximum number of motifs to discover
    ex_zone: the number of samples to exclude and set to Inf on either side of a found motifs
        defaults to m/2

    Returns tuple (motifs, distances)
    motifs: a list of lists of indexes representing the motif starting locations.
    distances: list of minimum distances for each motif
    """
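
The role of an exclusion zone in this kind of search can be sketched in a few lines. This is a simplified illustration of the idea in the docstring, not the library's motif code: `top_k_minima` and the toy profile values are hypothetical. After each pick, the neighbors within `ex_zone` samples on either side are set to Inf so that the next pick is not a trivial shift of the previous one.

```python
import numpy as np

def top_k_minima(profile, k, ex_zone):
    """Pick up to k lowest profile values, masking ex_zone samples around each pick."""
    work = np.array(profile, dtype=float)
    picks = []
    for _ in range(k):
        idx = int(np.argmin(work))
        if not np.isfinite(work[idx]):
            break  # everything left has been excluded
        picks.append(idx)
        work[max(0, idx - ex_zone):idx + ex_zone + 1] = np.inf
    return picks

profile = np.array([5.0, 0.1, 0.2, 4.0, 3.0, 0.3, 6.0, 0.15, 2.0])
picks = top_k_minima(profile, k=3, ex_zone=1)
print(picks)  # [1, 7, 5]
```

Without the exclusion zone the third pick would be index 2, which is directly adjacent to the first pick at index 1 and therefore describes almost the same subsequence.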

    opened by radokotorov 7
  • Dealing with missing values

    I tried to use the matrix profile to analyse data with missing values, but unfortunately I get an empty graph. Is it possible to analyse data with missing values with this implementation? The paper states that the matrix profile should still produce some analysis even with missing data.

    opened by Modestas96 7
  • Pattern Recognition

    I would like to use your algorithm but I have one question. My problem is this: I have a query (pattern of interest) and I would like to find in a time series the patterns that are the closest to the query. Should I use in this case the matrix profile (SCRIMP++) or MASS? Thank you!

    opened by AlexiaArtemis 6
  • Tutorial code running into issue: 'unicode' object is not callable

    Using exactly the tutorial code under Python 2.7 in a miniconda environment, I got the error below when running the "Calculate the Matrix Profile" section. Is this just me? Can anyone help, please?

    m = 32
    mp = matrixProfile.stomp(pattern,m)
    

    TypeError                                 Traceback (most recent call last)
    <ipython-input-3-d3196b066bd3> in <module>()
          1 m = 32
    ----> 2 mp = matrixProfile.stomp(pattern,m)
    
    /Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/matrixProfile.pyc in stomp(tsA, m, tsB)
        270     tsB: Time series to compare the query against. Note that, if no value is provided, tsB = tsA by default.
        271     """
    --> 272     return _matrixProfile_stomp(tsA,m,order.linearOrder,distanceProfile.STOMPDistanceProfile,tsB)
        273 
        274 
    
    /Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/matrixProfile.pyc in _matrixProfile_stomp(tsA, m, orderClass, distanceProfileFunction, tsB)
        166 
        167         #Need to pass in the previous sliding dot product for subsequent distance profile calculations
    --> 168         (distanceProfile,querySegmentsID),dot_prev = distanceProfileFunction(tsA,idx,m,tsB,dot_first,dp,mean,std)
        169 
        170         if idx == 0:
    
    /Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/distanceProfile.pyc in STOMPDistanceProfile(tsA, idx, m, tsB, dot_first, dp, mean, std)
        116     #Calculate the first distance profile via MASS
        117     if idx == 0:
    --> 118         distanceProfile = np.real(np.sqrt(mass(query,tsB).astype(complex)))
        119 
        120         #Currently re-calculating the dot product separately as opposed to updating all of the mass function...
    
    /Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/utils.pyc in mass(query, ts)
        172     q_std = np.std(query)
        173     mean, std = movmeanstd(ts,m)
    --> 174     dot = slidingDotProduct(query,ts)
        175 
        176     #res = np.sqrt(2*m*(1-(dot-m*mean*q_mean)/(m*std*q_std)))
    
    /Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/matrixprofile/utils.pyc in slidingDotProduct(query, ts)
        122 
        123 
    --> 124     query = np.pad(query,(0,n-m+ts_add-q_add),'constant')
        125 
        126     #Determine trim length for dot product. Note that zero-padding of the query has no effect on array length, which is solely determined by the longest vector
    
    /Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/numpy/lib/arraypad.pyc in pad(array, pad_width, mode, **kwargs)
       1383                                 pad_width[iaxis],
       1384                                 iaxis,
    -> 1385                                 kwargs)
       1386         return newmat
       1387 
    
    /Users/dev/miniconda2/envs/dsf/lib/python2.7/site-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args, **kwargs)
         89     outshape = asarray(arr.shape).take(indlist)
         90     i.put(indlist, ind)
    ---> 91     res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
         92     #  if res is a number, then we have a smaller output array
         93     if isscalar(res):
    
    TypeError: 'unicode' object is not callable
    
    opened by dingzj 6
  • Python 2 compatibility

    We've seen interest in making the library compatible with Python 2. We will explore possible avenues, but if you have any thoughts on the subject please reach out.

    opened by vanbenschoten 5
  • Duplicate license files: remove License.md?

    Looks like there are both a LICENSE as well as a License.md file in the repo. The LICENSE file is the authoritative full-length Apache license, while the License.md is the short-form version, suitable for top-of-file comment header, for example, but not a substitute for the full license.

    It would be good to remove License.md since it's the short-form version, and it's currently keeping the repo from getting the automatic "Apache 2.0" license badge on the overview page courtesy of GitHub's automatic license detector, due to the presence of 2 license files.

    Thanks!

    opened by mbrukman 4
  • MASS yields different results than brute force search

    I observed different results when calculating the distance profile using the brute force search algorithm and the function matrixprofile.MASS.distance_profile(). Here is a sample of my code:

    # calculate by brute force
    query_ = (query - query.mean()) / query.std(ddof=0)
    len_m = query.shape[0]
    dist = []
    for index in range(0, serie.shape[0] - len_m + 1):
        sub = serie[index:index + len_m]
        sub = (sub - sub.mean()) / sub.std(ddof=0)
        dist.append(np.sqrt(np.sum(np.power(sub - query_, 2))))
    dist = np.array(dist)

    This is less of an issue and more me trying to validate the work of a colleague who has already implemented a version of matrix profile. Given it has already been implemented by this group I thought it would be best to open up dialogue. Thanks!
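
One way to validate such a comparison is to check the brute-force result against the closed-form z-normalized distance that appears as a comment in the library's `mass` function, `sqrt(2*m*(1-(dot-m*mean*q_mean)/(m*std*q_std)))`. The sketch below is not the library's MASS implementation: it computes the sliding dot product with `np.correlate` rather than an FFT, and the function names and random test data are assumptions. Both functions use the population standard deviation (`ddof=0`), which the formula requires.

```python
import numpy as np

def brute_force_profile(query, series):
    """Z-normalize each window explicitly and take the Euclidean distance."""
    m = len(query)
    q = (query - query.mean()) / query.std()
    out = []
    for i in range(len(series) - m + 1):
        sub = series[i:i + m]
        sub = (sub - sub.mean()) / sub.std()
        out.append(np.sqrt(np.sum((sub - q) ** 2)))
    return np.array(out)

def dot_product_profile(query, series):
    """Same distances from the sliding dot product and window statistics."""
    m = len(query)
    dot = np.correlate(series, query, mode="valid")  # sliding dot product
    windows = np.array([series[i:i + m] for i in range(len(series) - m + 1)])
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)  # population std (ddof=0)
    inner = 2 * m * (1 - (dot - m * mean * query.mean()) / (m * std * query.std()))
    return np.sqrt(np.maximum(inner, 0.0))  # clip tiny negative rounding errors

rng = np.random.default_rng(0)
series = rng.standard_normal(200)
query = rng.standard_normal(16)
brute = brute_force_profile(query, series)
fast = dot_product_profile(query, series)
print("max abs difference:", float(np.max(np.abs(brute - fast))))
```

If the two disagree on real data, common things to check are a sample versus population standard deviation (`ddof=1` versus `ddof=0`) and whether one side returns squared distances.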

    opened by youzzefb 4
  • Bump setuptools from 39.1.0 to 65.5.1

    Bumps setuptools from 39.1.0 to 65.5.1.

    Changelog

    Sourced from setuptools's changelog.

    v65.5.1

    Misc

    • #3638: Drop a test dependency on the mock package, always use unittest.mock (by hroncok)
    • #3659: Fixed REDoS vector in package_index.

    v65.5.0

    Changes

    • #3624: Fixed editable install for multi-module/no-package src-layout projects.
    • #3626: Minor refactorings to support distutils using stdlib logging module.

    Documentation changes

    • #3419: Updated the example version numbers to be compliant with PEP-440 on the "Specifying Your Project’s Version" page of the user guide.

    Misc

    • #3569: Improved information about conflicting entries in the current working directory and editable install (in documentation and as an informational warning).
    • #3576: Updated version of validate_pyproject.

    v65.4.1

    Misc

    • #3613: Fixed encoding errors in expand.StaticModule when system default encoding doesn't match expectations for source files.
    • #3617: Merge with pypa/distutils@6852b20 including fix for pypa/distutils#181.

    v65.4.0

    Changes

    v65.3.0

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.16.2 to 1.22.0

    Bumps numpy from 1.16.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Use of 'is' for non-object.

    Minor, but keeps popping up in tooling we're doing:

    In matrixprofile/matrixProfile.py the use of is (instead of ==) to compare n_threads to non-object -1 prompts SyntaxWarning: "is" with a literal.

    opened by gbartlet 1
  • ValueError: Length of values does not match length of index

    In:

    mp = matrixProfile.stomp(pattern,m)
    mtfs ,motif_d = motifs.motifs(pattern, mp, max_motifs=10)

    self._set_item(key, value)
    value = self._sanitize_column(key, value)
    value = sanitize_index(value, self.index, copy=False)

    Any idea how to solve this?

    opened by hn2 1
  • matrixProfile.stomp() gives nan and inf values

    The following code below gives nan and inf values; am I using this incorrectly?

    seconds = np.arange(30)
    traffic_light = np.array([0]*15 + [1]*5 + [2]*10)
    brake_0 = np.array([0]*15 + [1, 2, 3, 4, 5] + [8, 10, 10, 10, 8, 6, 4, 2, 0, 0])

    matrixProfile.stomp(traffic_light, 3)
    matrixProfile.stomp(brake_0, 3)
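
A possible explanation, assuming the distance is z-normalized as discussed elsewhere in these threads: a window whose values are all equal has a standard deviation of zero, so normalizing it divides by zero. The sketch below does not call the library; it only counts the constant windows in `traffic_light` for a window length of 3 and shows that normalizing them produces NaN.

```python
import numpy as np

traffic_light = np.array([0] * 15 + [1] * 5 + [2] * 10, dtype=float)
m = 3

# All length-m windows of the series
windows = np.array([traffic_light[i:i + m] for i in range(len(traffic_light) - m + 1)])
stds = windows.std(axis=1)
n_constant = int((stds == 0).sum())

# Z-normalizing a constant window computes 0 / 0, which is NaN
with np.errstate(divide="ignore", invalid="ignore"):
    z = (windows - windows.mean(axis=1, keepdims=True)) / stds[:, None]
has_nan = bool(np.isnan(z).any())

print(n_constant, "of", len(windows), "windows are constant; NaN present:", has_nan)
```

Here 24 of the 28 windows are constant, so step-like signals such as this one are a hard case for any z-normalized distance. A longer window or a small amount of added noise are common workarounds, though whether they suit the data is a judgment call.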

    opened by ltbd78 5
Owner
Target
Target's official GitHub organization