LogDeep is an open-source, deep-learning-based log analysis toolkit for automated anomaly detection.

Overview

Introduction

LogDeep is an open-source, deep-learning-based log analysis toolkit for automated anomaly detection.

Framework of logdeep

Note: This repo does not include log parsing. If you need it, please see logparser.

Major features

  • Modular Design

  • Supports multiple log event features out of the box

  • State of the art (including results from DeepLog, LogAnomaly, RobustLog, ...)

Models

| Model | Paper reference |
| --- | --- |
| DeepLog | [CCS'17] DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning |
| LogAnomaly | [IJCAI'19] LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs |
| RobustLog | [FSE'19] Robust Log-Based Anomaly Detection on Unstable Log Data |

Requirements

  • python >= 3.6
  • pytorch >= 1.1.0

Quick start

git clone https://github.com/donglee-afar/logdeep.git
cd logdeep

For an example of building your own log dataset, see SAMPLING_EXAMPLE.md.

Train & Test DeepLog example

cd demo
# Train
python deeplog.py train
# Test
python deeplog.py test

The output results, key parameters, and training logs are saved under the result/ path.
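As background for the test step, the DeepLog paper treats a log key as anomalous when the key that actually occurs next is not among the model's top-ranked predictions. The sketch below illustrates only that decision rule in plain Python. The function name `is_anomalous`, the parameter `num_candidates`, and the example scores are hypothetical, and whether demo/deeplog.py implements the check in exactly this way is an assumption.

```python
from typing import Sequence


def is_anomalous(scores: Sequence[float], next_event: int, num_candidates: int) -> bool:
    """Flag next_event if it is not among the num_candidates highest-scoring event IDs.

    scores[i] is the model's score for event ID i (hypothetical values here).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return next_event not in ranked[:num_candidates]


# Made-up scores over four event IDs; IDs 1 and 2 rank highest.
scores = [0.05, 0.60, 0.25, 0.10]
expected_key_flagged = is_anomalous(scores, next_event=2, num_candidates=2)
unexpected_key_flagged = is_anomalous(scores, next_event=0, num_candidates=2)
```

With these made-up scores, event 2 is within the top two candidates and is not flagged, while event 0 is flagged. A larger `num_candidates` makes the rule more tolerant and typically trades recall for precision.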

DIY your own pipeline

Here is an example of the key parameters of the loganomaly model, which are set in demo/loganomaly.py.
Try modifying these parameters to build a new model!

# Sample
options['sample'] = "sliding_window"
options['window_size'] = 10

# Features
options['sequentials'] = True
options['quantitatives'] = True
options['semantics'] = False

Model = loganomaly(input_size=options['input_size'],
                   hidden_size=options['hidden_size'],
                   num_layers=options['num_layers'],
                   num_keys=options['num_classes'])
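To make the `sliding_window`, `window_size`, `sequentials`, and `quantitatives` options more concrete, here is a minimal sketch of what sliding-window sampling and a count-vector feature can look like for one session of event IDs. It is not the code in logdeep's sampler: the helper names `sliding_windows` and `count_vector`, the example session, and the assumption that event IDs are zero-indexed integers below `num_classes` are all illustrative.

```python
from collections import Counter
from typing import List, Tuple


def sliding_windows(events: List[int], window_size: int) -> List[Tuple[List[int], int]]:
    """Return (window, next_event) pairs: each window is used to predict the event after it."""
    pairs = []
    for start in range(len(events) - window_size):
        window = events[start:start + window_size]
        pairs.append((window, events[start + window_size]))
    return pairs


def count_vector(window: List[int], num_classes: int) -> List[int]:
    """Quantitative feature: how many times each event ID occurs in the window."""
    counts = Counter(window)
    return [counts.get(event_id, 0) for event_id in range(num_classes)]


# Hypothetical session of parsed event IDs (not taken from the HDFS data).
session = [5, 5, 5, 22, 11, 9, 11, 9, 11, 9, 26, 26]
pairs = sliding_windows(session, window_size=10)
first_window, first_label = pairs[0]
first_counts = count_vector(first_window, num_classes=28)
```

In this sketch the window itself plays the role of the sequential feature and the count vector plays the role of the quantitative feature. A 12-event session with a window of 10 yields two training pairs, so short sessions contribute few samples.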

Benchmark results

HDFS
| Model | Feature | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| DeepLog (unsupervised) | seq | 0.9583 | 0.9330 | 0.9454 |
| LogAnomaly (unsupervised) | seq+quan | 0.9690 | 0.9825 | 0.9757 |
| RobustLog (supervised) | semantic | 0.9216 | 0.9586 | 0.9397 |
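The F1 column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns as a sanity check. The short sketch below does that for the HDFS rows. The helper name `f1_score` is illustrative and is not part of logdeep.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values copied from the HDFS benchmark rows.
reported = {
    "DeepLog": (0.9583, 0.9330),
    "LogAnomaly": (0.9690, 0.9825),
    "RobustLog": (0.9216, 0.9586),
}
recomputed = {name: f1_score(p, r) for name, (p, r) in reported.items()}
```

The recomputed values agree with the reported F1 scores up to the rounding of the four-digit precision and recall figures.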

Comments
  • Question about obtaining the benchmark result

    Thank you for all the amazing work you've done!

    I successfully ran the training and prediction process of the DeepLog model using the same HDFS data file that you are using (from loghub).

    I'm using Drain as my parsing tool to get the structured log data, and I ended up with 48 unique event IDs in the template. I'm using around 5000 sessions for training, and the training and validation losses converged to 0.2 (starting from 0.8) after 300+ epochs. I didn't change the default parameter settings in the deeplog.py file except for the number of classes (48 in my case).

    The results I got from prediction do not look as promising as the benchmark.

    I'm not sure why. Is it because of the parsing tool?

    Any ideas or suggestions for improving the model results are welcome!

    opened by cherishwsx 5
  • Question about LogAnomaly

    Thank you for your work. It was very helpful.

    I have two questions about LogAnomaly.

    First, the LogAnomaly paper has a Template2Vec section, but I can't find that part in your code. There is a count vector part, but the sequence part does not seem to have Template2Vec applied.

    Second, attention is implemented in logdeep/models/lstm.py, but it is not used.

    Again, thanks for your work. :)

    opened by yunsangq 2
  • Problems About RobustLog

    Hi, thank you for this awesome toolkit!

    What confuses me is that the RobustLog model does not seem to include the attention layer mentioned in the paper. Would you mind explaining the reason? I'm an ML/DL newbie. Thanks in advance!

    opened by rhanqtl 2
  • Question about the BGL dataset

    Hi, thank you for making this amazing project.

    I have some questions about using the BGL dataset for training and testing. logdeep/dataset/sample.py uses 'hdfs/event2semantic_vec.json', and I have no idea what this file is for. When I use the BGL dataset, how do I generate this file? Or do I not need it? If it is not needed, how should I modify sample.py?

    Looking forward to your reply.

    opened by Yanyyy 1
  • DeepLog HDFS original unparsed data

    Hello, I started to use this repo and it is great! But I need to see the original training and test data (normal and abnormal). I understand that the data was generated from the CSV 'HDFS_100k.log_structured', but how was it generated? And what exactly is the format of the date and time in this file?

    Thanks!

    opened by shimonShouei 0
  • One-hot encoding?

    It looks like the input to the DeepLog model is just numerical token sequences. Why isn't a one-hot encoding used as the input? The numbers in the sequences represent operations, so they are nominal, not ordinal.

    opened by htcml 0
  • A data processing problem

    Hi @donglee-afar: I read the answers in the issues, but I still don't understand the data processing workflow. I don't know how to convert "sequece_hdfs.csv" to "hdfs_train" in the logdeep project.

    Could you please help me with it? 😀 Can you provide reference code and a workflow?

    Thank you very much.

    opened by heyd7fc 1
  • Question about DeepLog for Apache logs

    How would I go about using DeepLog for Apache logs?

    192.168.0.14 - - [15/Sep/2021:07:28:39 -0400] "GET /media/plg_system_popup/js/jquery.js HTTP/1.1" 200 293755 "https://192.168.0.52/" "Mozilla /5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"

    opened by Levi-zz 0
  • An error occurs when the terminal command line runs

    (python38) houjingwen@MacBook-Pro-2 demo % python3 loganomaly.py train
    Traceback (most recent call last):
      File "loganomaly.py", line 11, in <module>
        from logdeep.models.lstm import loganomaly,deeplog,robustlog
      File "/Users/houjingwen/Desktop/logdeep-master/logdeep/models/lstm.py", line 1, in <module>
        import torch
    ModuleNotFoundError: No module named 'torch'

    opened by Houjingwen 0
Owner
donglee (Kaggle Master)