Anomaly Detection and Correlation library

Last update: Jan 1, 2023

luminol

Overview

Luminol is a lightweight Python library for time series data analysis. Its two major functions are anomaly detection and correlation, and it can be used to investigate possible causes of an anomaly. Once you have collected time series data, Luminol can:

Given a time series, detect whether the data contains any anomaly and return a time window in which the anomaly occurred, a timestamp at which the anomaly reaches its peak severity, and a score indicating how severe the anomaly is compared to others in the time series.
Given two time series, help find their correlation coefficient. Because the correlation mechanism allows some room for shifting, you can correlate two peaks that are slightly apart in time.
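
As a rough illustration of the first item, the sketch below groups per-point anomaly scores into windows and reports the peak timestamp and peak score of each window. It is not Luminol's implementation: `Window`, `find_windows`, and the fixed threshold are hypothetical names and choices made for this example, and the scores are rounded from the output shown in the Example section.

```python
from typing import Dict, List, NamedTuple


class Window(NamedTuple):
    start: int    # first timestamp whose score exceeds the threshold
    end: int      # last timestamp in the same consecutive run
    peak: int     # timestamp with the highest score inside the run
    score: float  # the highest score inside the run


def find_windows(scores: Dict[int, float], threshold: float) -> List[Window]:
    """Group consecutive above-threshold scores into anomaly windows."""
    windows: List[Window] = []
    run: List[int] = []  # timestamps of the run currently being built

    def close_run() -> None:
        peak = max(run, key=lambda t: scores[t])
        windows.append(Window(run[0], run[-1], peak, scores[peak]))

    for ts in sorted(scores):
        if scores[ts] > threshold:
            run.append(ts)
        elif run:
            close_run()
            run = []
    if run:  # close a run that extends to the end of the series
        close_run()
    return windows


# Hypothetical input: rounded scores for a nine-point series.
scores = {0: 0.0, 1: 0.87, 2: 1.57, 3: 2.14, 4: 1.71,
          5: 2.91, 6: 1.17, 7: 0.94, 8: 0.75}
windows = find_windows(scores, threshold=1.5)
print(windows)
```

With a threshold of 1.5, the points at timestamps 2 through 5 form a single window whose peak is at timestamp 5.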

Luminol is configurable in the sense that you can choose which specific algorithm to use for anomaly detection or correlation. In addition, the library does not rely on any predefined threshold on the values of a time series. Instead, it assigns each data point an anomaly score and identifies anomalies using the scores.

By using the library, we can establish a logic flow for root cause analysis. For example, suppose there is a spike in network latency:

Anomaly detection discovers the spike in the network latency time series.
Get the anomaly period of the spike and correlate it with other system metrics (GC, IO, CPU, etc.) in the same time range.
Get a ranked list of correlated metrics; the root cause candidates are likely to be at the top.
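
The flow above can be sketched in plain Python. This is an illustration under assumptions, not Luminol's code: `pearson` and `rank_candidates` are hypothetical helpers, the metric names and values are invented for the example, and all series are assumed to share the same timestamps.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Plain Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(
    target: Dict[int, float],
    metrics: Dict[str, Dict[int, float]],
    window: Tuple[int, int],
) -> List[Tuple[str, float]]:
    """Correlate each metric with the target inside the window, best first."""
    start, end = window
    stamps = [t for t in sorted(target) if start <= t <= end]
    target_values = [target[t] for t in stamps]
    ranked = []
    for name, series in metrics.items():
        values = [series[t] for t in stamps]  # assumes aligned timestamps
        ranked.append((name, pearson(target_values, values)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented data: latency spikes at timestamps 2 and 3, and so does GC pause time.
latency = {0: 10, 1: 11, 2: 50, 3: 55, 4: 12}
metrics = {
    "gc_pause": {0: 1, 1: 1, 2: 9, 3: 10, 4: 1},
    "cpu": {0: 40, 1: 42, 2: 41, 3: 39, 4: 40},
}
ranking = rank_candidates(latency, metrics, window=(0, 4))
for name, coefficient in ranking:
    print(name, round(coefficient, 3))
```

In this made-up data set, `gc_pause` ranks first because it rises and falls together with the latency spike.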

Investigating possible ways to automate root cause analysis is one of the main reasons we developed this library, and it will be a fundamental part of future work.

Installation

Make sure you have Python, pip, and numpy, then install directly through pip:

pip install luminol

The most up-to-date version of the library is 0.4.

Quick Start

This is a quick start guide for using luminol for time series analysis.

Import the library:

import luminol

Conduct anomaly detection on a single time series ts:

detector = luminol.anomaly_detector.AnomalyDetector(ts)
anomalies = detector.get_anomalies()

If there is an anomaly, correlate the first anomaly period with a secondary time series ts2 and print the correlation coefficient:

if anomalies:
    time_period = anomalies[0].get_time_window()
    correlator = luminol.correlator.Correlator(ts, ts2, time_period)
    # Print inside the if block: correlator only exists when an anomaly was found.
    print(correlator.get_correlation_result().coefficient)

These are very simple uses of luminol. For information about parameter types, return types, and optional parameters, please refer to the API section.

Modules

Modules in Luminol refer to customized classes developed for better data representation: Anomaly, CorrelationResult, and TimeSeries.

Anomaly

class luminol.modules.anomaly.Anomaly
It contains these attributes:

self.start_timestamp: # epoch seconds representing the start of the anomaly period.
self.end_timestamp: # epoch seconds representing the end of the anomaly period.
self.anomaly_score: # a score indicating how severe this anomaly is.
self.exact_timestamp: # epoch seconds indicating when the anomaly reaches its peak severity.

It has these public methods:

get_time_window(): returns a tuple (start_timestamp, end_timestamp).

CorrelationResult

class luminol.modules.correlation_result.CorrelationResult
It contains these attributes:

self.coefficient: # correlation coefficient.
self.shift: # the amount of shift needed to get the above coefficient.
self.shifted_coefficient: # a correlation coefficient with shift taken into account.

TimeSeries

class luminol.modules.time_series.TimeSeries

__init__(self, series)

series(dict): timestamp -> value

It has various handy methods for manipulating time series, including the generators iterkeys, itervalues, and iteritems. It also supports binary operations such as add and subtract. Please refer to the code and inline comments for more information.

API

The library contains two classes: AnomalyDetector and Correlator, and there are two sets of APIs, one corresponding to each class. There are also customized modules for better data representation. The Modules section in this documentation may provide useful information as you walk through the APIs.

AnomalyDetector

class luminol.anomaly_detector.AnomalyDetector

__init__(self, time_series, baseline_time_series=None, score_only=False, score_threshold=None,
         score_percentile_threshold=None, algorithm_name=None, algorithm_params=None,
         refine_algorithm_name=None, refine_algorithm_params=None)

time_series: The metric you want to conduct anomaly detection on. It can have the following three types:

1. string: # path to a csv file
2. dict: # timestamp -> value
3. luminol.modules.time_series.TimeSeries

baseline_time_series: an optional baseline time series of one of the types mentioned above.
score_only(bool): if True, anomaly scores for the time series will be available, while anomaly periods will not be identified.
score_threshold: if passed, anomaly scores above this value will be identified as anomalies. It overrides score_percentile_threshold.
score_percentile_threshold: if passed, anomaly scores above this percentile will be identified as anomalies. It cannot override score_threshold.
algorithm_name(string): if passed, the specific algorithm will be used to compute anomaly scores.
algorithm_params(dict): additional parameters for algorithm specified by algorithm_name.
refine_algorithm_name(string): if passed, the specific algorithm will be used to compute the time stamp of severity within each anomaly period.
refine_algorithm_params(dict): additional parameters for the algorithm specified by refine_algorithm_name.
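
The precedence between score_threshold and score_percentile_threshold described above can be pictured with a small stand-alone sketch. It makes assumptions that the text does not confirm: `resolve_threshold` is a hypothetical helper, the percentile is taken on a 0 to 100 scale, and the nearest-rank method is used. Luminol may compute its cutoff differently.

```python
import math
from typing import Dict, Optional


def resolve_threshold(
    scores: Dict[int, float],
    score_threshold: Optional[float] = None,
    score_percentile_threshold: Optional[float] = None,
) -> Optional[float]:
    """Pick the score cutoff: an explicit threshold wins over a percentile."""
    if score_threshold is not None:
        return score_threshold
    if score_percentile_threshold is not None:
        ordered = sorted(scores.values())
        # Nearest-rank percentile on a 0-100 scale (an assumption of this sketch).
        rank = max(1, math.ceil(score_percentile_threshold / 100 * len(ordered)))
        return ordered[rank - 1]
    return None  # no cutoff requested by the caller


# Hypothetical scores for a nine-point series.
scores = {0: 0.0, 1: 0.87, 2: 1.57, 3: 2.14, 4: 1.71,
          5: 2.91, 6: 1.17, 7: 0.94, 8: 0.75}

from_percentile = resolve_threshold(scores, score_percentile_threshold=75)
from_both = resolve_threshold(scores, score_threshold=1.5,
                              score_percentile_threshold=75)
print(from_percentile, from_both)
```

When both arguments are given, the explicit value is returned and the percentile is ignored, which mirrors the override rule stated in the parameter list.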

Available algorithms and their additional parameters are:

1.  'bitmap_detector': # behaves well for huge data sets, and it is the default detector.
    {
      'precision'(4): # how many sections to categorize values,
      'lag_window_size'(2% of the series length): # lagging window size,
      'future_window_size'(2% of the series length): # future window size,
      'chunk_size'(2): # chunk size.
    }
2.  'default_detector': # used when other algorithms fail, not meant to be explicitly used.
3.  'derivative_detector': # meant to be used when abrupt changes of value are of main interest.
    {
      'smoothing_factor'(0.2): # smoothing factor used to compute exponential moving averages
                                # of derivatives.
    }
4.  'exp_avg_detector': # meant to be used when values are in a roughly stationary range,
                        # and it is the default refine algorithm.
    {
      'smoothing_factor'(0.2): # smoothing factor used to compute exponential moving averages.
      'lag_window_size'(20% of the series length): # lagging window size.
      'use_lag_window'(False): # if True, a lagging window of size lag_window_size will be used.
    }
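
To give a feel for what a smoothing factor does, here is a minimal sketch of scoring points by their deviation from an exponential moving average. It is a simplified illustration, not the code behind 'exp_avg_detector': `ema_deviation_scores` is a hypothetical function, and seeding the average with the first value is a choice made for this example.

```python
from typing import List


def ema_deviation_scores(values: List[float],
                         smoothing_factor: float = 0.2) -> List[float]:
    """Score each point by its distance from the running exponential average."""
    scores = []
    ema = values[0]  # seed the average with the first observation
    for value in values:
        scores.append(abs(value - ema))  # deviation from the average so far
        # A larger smoothing factor makes the average follow new values faster.
        ema = smoothing_factor * value + (1 - smoothing_factor) * ema
    return scores


scores = ema_deviation_scores([1, 1, 1, 5, 1])
print(scores)
```

The jump to 5 gets the largest score, and the point after it still scores above zero because the spike has pulled the average upward.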

It may seem vague for the meanings of some parameters above. Here are some useful insights:

The AnomalyDetector class has the following public methods:

get_all_scores(): returns an anomaly score time series of type TimeSeries.
get_anomalies(): returns a list of Anomaly objects.

Correlator

class luminol.correlator.Correlator

__init__(self, time_series_a, time_series_b, time_period=None, use_anomaly_score=False,
         algorithm_name=None, algorithm_params=None)

time_series_a: a time series, for its type, please refer to time_series for AnomalyDetector above.
time_series_b: a time series, for its type, please refer to time_series for AnomalyDetector above.
time_period(tuple): the time period over which to correlate the two time series.
use_anomaly_score(bool): if True, the anomaly scores of the time series will be used to compute the correlation coefficient instead of the original data in the time series.
algorithm_name: if passed, the specific algorithm will be used to calculate correlation coefficient.
algorithm_params: any additional parameters for the algorithm specified by algorithm_name.

Available algorithms and their additional parameters are:

1.  'cross_correlator': # when correlating two time series, it tries to shift the series around so that it
                       # can catch spikes that are slightly apart in time.
    {
      'max_shift_seconds'(60): # maximal allowed shift room in seconds,
      'shift_impact'(0.05): # weight of shift in the shifted coefficient.
    }
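
The idea of a shift room and a shift-weighted coefficient can be sketched as follows. This is not the 'cross_correlator' implementation: `correlate_with_shift` is a hypothetical function, shifts are counted in samples rather than seconds, both inputs are assumed to be equal-length lists, and the linear penalty based on shift_impact is an assumption of this example.

```python
import math
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlate_with_shift(a: List[float], b: List[float], max_shift: int = 2,
                         shift_impact: float = 0.05) -> Tuple[float, int, float]:
    """Return (coefficient, shift, shifted_coefficient) for the best shift.

    A positive shift means b lags a by that many samples.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        # Keep only the overlapping part of the two series for this shift.
        if shift >= 0:
            xs, ys = a[:len(a) - shift], b[shift:]
        else:
            xs, ys = a[-shift:], b[:len(b) + shift]
        coefficient = pearson(xs, ys)
        # Assumed penalty: lose up to shift_impact of the coefficient at max_shift.
        penalty = shift_impact * abs(shift) / max_shift if max_shift else 0.0
        shifted = coefficient * (1 - penalty)
        if best is None or shifted > best[2]:
            best = (coefficient, shift, shifted)
    return best


# Invented data: the spike in b arrives one sample after the spike in a.
a = [0, 0, 1, 5, 1, 0, 0, 0]
b = [0, 0, 0, 1, 5, 1, 0, 0]
result = correlate_with_shift(a, b)
print(result)
```

Without shifting, the two spikes barely overlap and the coefficient is low. Shifting by one sample aligns them, and the penalty then lowers the shifted coefficient slightly to reflect that a shift was needed.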

The Correlator class has the following public methods:

get_correlation_result(): returns a CorrelationResult object.
is_correlated(threshold=0.7): if the coefficient is above the passed-in threshold, returns a CorrelationResult object. Otherwise, returns False.

Example

Calculate anomaly scores.

from luminol.anomaly_detector import AnomalyDetector

ts = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}

my_detector = AnomalyDetector(ts)
score = my_detector.get_all_scores()
for timestamp, value in score.iteritems():
    print(timestamp, value)

""" Output:
0 0.0
1 0.873128250131
2 1.57163085024
3 2.13633686334
4 1.70906949067
5 2.90541813415
6 1.17154110935
7 0.937232887479
8 0.749786309983
"""

Correlate ts1 with ts2 on every anomaly.

from luminol.anomaly_detector import AnomalyDetector
from luminol.correlator import Correlator

ts1 = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}
ts2 = {0: 0, 1: 0.5, 2: 1, 3: 0.5, 4: 1, 5: 0, 6: 1, 7: 1, 8: 1}

my_detector = AnomalyDetector(ts1, score_threshold=1.5)
score = my_detector.get_all_scores()
anomalies = my_detector.get_anomalies()
for a in anomalies:
    time_period = a.get_time_window()
    my_correlator = Correlator(ts1, ts2, time_period)
    if my_correlator.is_correlated(threshold=0.8):
        print("ts2 correlates with ts1 at time period (%d, %d)" % time_period)

""" Output:
ts2 correlates with ts1 at time period (2, 5)
"""

Contributing

Clone source and install package and dev requirements:

pip install -r requirements.txt
pip install pytest pytest-cov pylama

Tests and linting run with:

python -m pytest --cov=src/luminol/ src/luminol/tests/
python -m pylama -i E501 src/luminol/

Comments

Python 3.6 doesn't run your examples

Looks like the support for Python 3 is not completely developed yet. With Python 3.6 the examples you have don't run:

from luminol.anomaly_detector import AnomalyDetector

What's the easy fix for this?

opened by sjjpo2002 8

latest version (0.4) not published to pypi?

Hello! I see that all of @brennv 's PRs have been merged in (see issue #22 ), and the package version has been incremented here in the repo, but PyPi has not yet been updated to v0.4.

@RiteshMaheshwari , could I ask you for one last favor: Publish the latest version of luminol to PyPi, so we can reap the benefits of all those recent commits? Again, if you're not the person to tag / nag, please point me in the right direction. Thanks for your help!

opened by bdewilde 4

pep8

Any time when suggesting pep8 changes we are reminded of this. With that in mind, I'd like to suggest some aesthetic changes for convention and maintainability. In our Travis runs we check pep8 compliance as the last step. The last results before this PR are shown here:

https://travis-ci.org/linkedin/luminol/jobs/273116296#L663

This first commit serves only to make imports more explicit. It resolves all the warnings that read: W0401 'from luminol.constants import *' used; unable to detect undefined names [pyflakes]

I'd like to add additional commits to this PR if folks are open to it.

opened by brennv 4

Fixed typos in README

Hi,

I fixed a few small typos in the README.md file. Please check and let me know if there are any issues.

Thanks!

Also, it would be good if we specify the latest version of dependencies in requirements.txt file.

opened by vicky002 4

problem with import

I have installed the package via pip (using Python 2.7.12). When I import the module luminol, it lacks all of the basic functions. I've attached a print screen. Can you help me?

opened by guyoh 3

error in AnomalyDetector instantiation

I'm trying to run the Quick Start example, and in the very first command that instantiates a detector I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The command run is

detector = anomaly_detector.AnomalyDetector(ts)

Could this be caused by a change in the Pandas library?

I installed the last version of luminol, 0.3.1, with pip and I'm using Python 2.7.11 from the Anaconda distribution version 4.1.0 on Kubuntu 15.10.

This is the full traceback of the error:

<class 'pandas.core.series.Series'>
Traceback (most recent call last):
  File "open_heat_treatments.py", line 97, in <module>
    main()
  File "open_heat_treatments.py", line 90, in main
    detector = anomaly_detector.AnomalyDetector(ts)
  File "/home/dp/anaconda2/lib/python2.7/site-packages/luminol/anomaly_detector.py", line 44, in __init__
    self.time_series = self._load(time_series)
  File "/home/dp/anaconda2/lib/python2.7/site-packages/luminol/anomaly_detector.py", line 69, in _load
    if not time_series:
  File "/home/dp/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 892, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

opened by ghost 2

Citing luminol

Hi,

I would like to cite luminol for an academic publication, but was unable to find a list of authors other than "Naarad Developers". Would it be possible to let me know how you would like me to cite the package?

Thank you very much in advance.

Best wishes, Alex

opened by Fisch-Alex 1

update-readme

[x] fix formatting and indentation issues around code blocks

[x] fix examples as discussed here

[x] add section on contributing and running tests

[x] add badges

opened by brennv 1

Python3 roadmap

Continuing the discussion from #15. I think adding travis and getting tests passing would be a good start. That way we can watch follow-on PRs pass or fail.

[x] ~~fix tests~~ #20

[x] ~~add travis~~ #11

[x] ~~fix spacing~~ #25

[x] ~~clean up~~ #27

[x] ~~add Python 3 support~~ #28

[x] ~~maybe more pep8 fun~~ #29 and/or refactor tests

[x] ~~update readme~~ #30 ~~and bump version~~ #31

[ ] pypi release

opened by brennv 1

Example 1 Put anomaly scores in a list is giving error

getting ValueError: (22, 'Invalid argument') in line: t_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp))

actually, the error is due to value of timestamp provided as argument in time.localtime(timestamp)

using SAR-device.sdb.await.csv provided in luminol/demo/src/static/data/.

Using Python 2.7. Please Help.

opened by ashish-r 1

update signt_test algo

use self.scale for alpha to make it consistent with offset. For example if we are detecting 'percent_threshold_lower' 5 percent, we do not have to write -5.

opened by bl44 1

Import anomaly_detector and correlator into luminol

With this, the example from README works.

In other words with proposed change one can import luminol and later detector = luminol.anomaly_detector.AnomalyDetector(ts), currently anomaly_detector is not visible as a member of luminol module.

opened by sjachim 0

Warn user on automatic modify of algorithm or parameters

Please throw a warning message to the user when automatically modifying parameters or algorithms. Doing this silently makes it extremely difficult to debug and fine-tune.

https://github.com/linkedin/luminol/blob/42e4ab969b774ff98f902d064cb041556017f635/src/luminol/algorithms/anomaly_detector_algorithms/bitmap_detector.py#L60-L73

https://github.com/linkedin/luminol/blob/42e4ab969b774ff98f902d064cb041556017f635/src/luminol/anomaly_detector.py#L91-L104

opened by devinaconley 0

error in diff_percent_threshold.py

The code in the enumerate loop should be baseline_value = self.baseline_time_series[timestamp] instead of baseline_value = self.baseline_time_series[i]. Otherwise it raises a "timestamp does not exist in time series object" exception.

opened by khushalvora 0

Package Definition

Hi,

This is not really an issue but a couple of questions. Take the example code that calculates the anomaly scores:

from luminol.anomaly_detector import AnomalyDetector

ts = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}

my_detector = AnomalyDetector(ts)
score = my_detector.get_all_scores()
for timestamp, value in score.iteritems():
    print(timestamp, value)

Does it calculate the scores as they come like a real-time anomaly detection instead of looking at what the value is before? Is there a way to tune the parameters of the above code as well like the window size and chunk size? If so, how?

Thank you very much.

opened by priencesstan 0

Owner

LinkedIn
