This tool parses log data and allows defining analysis pipelines for anomaly detection.

Overview

logdata-anomaly-miner

This tool parses log data and allows defining analysis pipelines for anomaly detection. It was designed to run the analysis with limited resources and the lowest possible permissions, making it suitable for production server use.

AECID Demo – Anomaly Detection with aminer and Reporting to IBM QRadar

Requirements

In order to install logdata-anomaly-miner, a Linux system with Python >= 3.6 is required. Debian-based distributions are currently recommended.

See requirements.txt for further module dependencies.

Installation

Debian

There are Debian packages for logdata-anomaly-miner in the official Debian/Ubuntu repositories.

apt-get update && apt-get install logdata-anomaly-miner

From source

The following command will install the latest stable release:

cd $HOME
wget https://raw.githubusercontent.com/ait-aecid/logdata-anomaly-miner/main/scripts/aminer_install.sh
chmod +x aminer_install.sh
./aminer_install.sh

Docker

For installation with Docker see: Deployment with Docker

Getting started

Here are some resources to read in order to get started with configurations:

Publications

Publications and talks:

A complete list of publications can be found at https://aecid.ait.ac.at/further-information/.

Contribution

We're happily taking patches and other contributions. Please see the following links for how to get started:

Bugs

If you encounter any bugs, please create an issue on GitHub.

Security

If you discover any security-related issues, read SECURITY.md first and then report them.

License

GPL-3.0

Comments
  • Multiline support

    Since issue 372 was closed, I open a new issue for multiline support. See https://github.com/ait-aecid/logdata-anomaly-miner/issues/372

    As I mentioned in the issue, it would be good to have an optional EOL parameter in the config to support simple multiline logs that are clearly separable, e.g., by \n\n that otherwise does not occur. We could also think about supporting more advanced multiline logs, in particular JSON-formatted logs where each JSON object spans several lines rather than a single one. This could be solved by counting brackets, i.e., the ByteStreamAtomizer increases a counter (initially set to 0) for every "{" and decreases it for every "}" (or any other user-defined characters), and passes a log_atom to the parser every time this counter reaches 0.
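The bracket-counting idea can be sketched in a few lines. This is a standalone illustration of the proposal, not aminer's actual ByteStreamAtomizer interface; the function name and signature are made up:

```python
def split_json_stream(data: bytes, open_ch: int = ord("{"), close_ch: int = ord("}")):
    """Split a byte stream into atoms by counting bracket depth.

    The counter starts at 0, increases for every '{' and decreases for
    every '}'; an atom is emitted each time the counter returns to 0.
    """
    atoms = []
    depth = 0
    start = None
    for pos, byte in enumerate(data):
        if byte == open_ch:
            if depth == 0:
                start = pos  # a new atom begins at the first opening bracket
            depth += 1
        elif byte == close_ch:
            depth -= 1
            if depth == 0 and start is not None:
                atoms.append(data[start:pos + 1])
                start = None
    return atoms

# Two pretty-printed JSON objects, each spanning several lines:
stream = b'{\n  "a": {"b": 1}\n}\n{\n  "c": 2\n}\n'
assert split_json_stream(stream) == [b'{\n  "a": {"b": 1}\n}', b'{\n  "c": 2\n}']
```

The same loop would work for any user-defined open/close characters, as suggested above.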

    enhancement 
    opened by landauermax 15
  • Allowlist and blocklist for detector path lists

    allowlisted_paths in ECD should be named blocklisted_paths, since these paths are not considered for detection.

    allowlisted_paths should also exist, but do the opposite: analysis should only be carried out when the log atom's match dictionary contains one of the allowlisted_paths.

    The attribute paths should overrule these lists.

    This feature should be available for all detectors that may be analyzing all available parser matches, such as the VTD.
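The proposed semantics can be sketched as a path-selection helper. The parameter names follow the issue text, not the shipped detector API, and the function itself is hypothetical:

```python
def select_paths(match_paths, paths=None, allowlisted_paths=None, blocklisted_paths=None):
    """Return the parser paths of a log atom that a detector should analyze.

    Proposed semantics:
      - an explicitly configured `paths` list overrules both other lists,
      - with `allowlisted_paths`, analysis only happens if at least one
        allowlisted path occurs in the atom's match dictionary,
      - `blocklisted_paths` are never considered for detection.
    """
    if paths:  # explicit paths overrule the lists
        return [p for p in paths if p in match_paths]
    if allowlisted_paths and not any(p in match_paths for p in allowlisted_paths):
        return []  # no allowlisted path present: skip this atom entirely
    if blocklisted_paths:
        return [p for p in match_paths if p not in blocklisted_paths]
    return list(match_paths)

atom_paths = ["/model/user", "/model/ip", "/model/ts"]
assert select_paths(atom_paths, blocklisted_paths=["/model/ts"]) == ["/model/user", "/model/ip"]
assert select_paths(atom_paths, allowlisted_paths=["/model/other"]) == []
```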

    enhancement 
    opened by landauermax 15
  • Fix import warnings

    /usr/lib/python3.6/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from spec or package, falling back on name and path

    return f(*args, **kwds)

    should not occur when running the aminer.

    bug 
    opened by 4cti0nfi9ure 15
  • %z makes parsing way too slow

    When using the %z in the parsing model (see slow.txt), I get around 50 lines per second. Without it I get around 1000 lines per second (see fast.txt). There is something wrong with parsing %z in the DateTimeModelElement.
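The relative cost of %z can be sanity-checked with the standard library alone. Note this benchmarks datetime.strptime, not aminer's DateTimeModelElement, so the absolute numbers will differ from the 50 vs. 1000 lines per second reported above:

```python
import timeit
from datetime import datetime

# Parse the same timestamp with and without the timezone directive.
with_z = timeit.timeit(
    lambda: datetime.strptime("2021-03-01 12:00:00 +0100", "%Y-%m-%d %H:%M:%S %z"),
    number=5000)
without_z = timeit.timeit(
    lambda: datetime.strptime("2021-03-01 12:00:00", "%Y-%m-%d %H:%M:%S"),
    number=5000)
print(f"with %z: {with_z:.3f}s  without %z: {without_z:.3f}s")
```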

    fast.txt slow.txt train.log config.py.txt

    bug high 
    opened by landauermax 12
  • added nullable functionality to JsonModelElements.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guidelines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Request merges into the "development"-branch

    Fixes #1061 Fixes #1074

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 11
  • Create backups of persistency

    There should be a command line parameter that backs up the persistency at regular intervals. Also, there should be a remote control command that saves the persistency when executed.

    The persistency should be copied into a directory /var/lib/aminer/backup/yyyy-mm-dd-hh-mm-ss/...

    There should also be the possibility to restore the persistency, configs, config settings, etc. by remote control.
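A minimal sketch of the requested backup behaviour. The function and its defaults are hypothetical; only the directory layout follows the issue:

```python
import shutil
import time
from pathlib import Path

def backup_persistency(src="/var/lib/aminer", dst_root="/var/lib/aminer/backup"):
    """Copy the persistency tree into a timestamped backup directory.

    The target layout follows the issue: dst_root/yyyy-mm-dd-hh-mm-ss/...
    The backup directory itself is excluded to avoid recursive copies.
    """
    target = Path(dst_root) / time.strftime("%Y-%m-%d-%H-%M-%S")
    shutil.copytree(src, target, ignore=shutil.ignore_patterns("backup"))
    return target
```

A restore command could then copy the newest timestamped directory back over the persistency tree.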

    enhancement 
    opened by landauermax 11
  • Tabs in logs

    My log file contains tabulators (e.g. System name:\tTESTNAME). However, the byte strings in the parsing models cannot interpret these tabulators (\t): FixedDataModelElement('fixed1', b'System name:\t'),

    How can I make it possible for the tabs to be interpreted correctly?
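In a Python config, a tab inside a byte literal already matches as expected; the escape is more likely lost when the model is written in a YAML config, where a single-quoted '\t' stays two literal characters. A small illustration (the decoding step is a general sketch, not aminer's YAML loader):

```python
# In a Python config, \t inside a byte literal is a real tab character:
fixed = b'System name:\t'
assert b'System name:\tTESTNAME'.startswith(fixed)

# A single-quoted YAML scalar delivers the two characters '\' and 't'
# instead; decoding the escape sequence recovers the real tab:
raw = 'System name:\\t'
decoded = raw.encode().decode('unicode_escape').encode()
assert decoded == fixed
```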

    opened by tschohanna 10
  • Add overall output for aminer

    There should be a way to write everything that the AMiner outputs to a file. For example, at the beginning of the config a parameter StandardOutput: "/etc/aminer/output.txt" can be set, and all output (anomalies, errors, etc.) is then written there in addition to the usual output components. By default, it should be None and not write anything.
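The proposed behaviour amounts to tee-ing the output stream. A rough sketch (StandardOutput is the parameter proposed above; the wrapper class itself is hypothetical):

```python
class TeeStream:
    """Duplicate everything written to a stream into an optional log file.

    Sketch of the proposed StandardOutput parameter: with path=None the
    wrapper writes nothing extra, matching the proposed default.
    """

    def __init__(self, stream, path=None):
        self.stream = stream
        self.logfile = open(path, "a") if path is not None else None

    def write(self, text):
        self.stream.write(text)
        if self.logfile is not None:
            self.logfile.write(text)
            self.logfile.flush()

    def flush(self):
        self.stream.flush()
```

Wrapping the event handlers' output stream in such an object would capture anomalies and errors in one place without changing the existing components.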

    enhancement 
    opened by landauermax 10
  • Warning if two detectors persist on same file

    It is possible to define two detectors of the same type that end up persisting to the same file - this can especially happen by accident when the "Default" name is used. We should not prevent this completely, but at least print a warning when two or more detectors persist to the same file.
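The check could look roughly like this. The helper is hypothetical and assumes the persistence file is derived from the (detector type, persistence id) pair, mirroring the issue's warn-don't-forbid proposal:

```python
import warnings
from collections import Counter

def warn_on_shared_persistence(detectors):
    """Warn when two or more detectors would persist to the same file.

    `detectors` is an iterable of (detector_type, persistence_id) pairs.
    Duplicates are reported with a warning instead of being rejected.
    """
    clashes = [key for key, count in Counter(detectors).items() if count > 1]
    for det_type, persistence_id in clashes:
        warnings.warn(f"{det_type}: persistence_id {persistence_id!r} is used "
                      "by more than one detector instance")
    return clashes
```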

    enhancement 
    opened by landauermax 9
  • AtomFilterMatchAction YAML support

    There should be a way to use a MatchRule so that only logs that match are forwarded to a specific detector, using the AtomFilterMatchAction. This can be done in Python configs, but not in YAML configs. Also, tests and documentation are missing.

    enhancement high 
    opened by landauermax 8
  • Paths to JSON list elements

    I have this sample data:

    root@user-5:/home/ubuntu# cat file3.log 
    {"a": ["success", "a.png"]}
    {"a": ["success", "b.png"]}
    {"a": ["fail", "c.png"]}
    {"a": ["success", "c.png"]}
    

    The values in the list should be detected with a value detector. They should not be mixed, i.e., the first and second element in the list are independent.

    I use the following config to parse the file:

    LearnMode: True
    
    LogResourceList:
      - "file:///home/ubuntu/file3.log"
    
    Parser:  
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - x
    
    Input:
            timestamp_paths: None
            verbose: True
            json_format: True
    
    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
    EventHandlers:
            - id: stpe
              json: true
              type: StreamPrinterEventHandler
    

    Note that I use a value detector on the list. The result is as follows:

    root@user-5:/home/ubuntu# cat /var/lib/aminer/NewMatchPathValueDetector/test 
    ["bytes:a.png", "bytes:c.png", "bytes:b.png"]
    

    Only the last element of each array has been learned, but I also want to learn the first element in the array.

    I propose to model all elements of the lists as their own elements, so that the parser looks like this:

    Parser:
           - id: y
             type: FixedWordlistDataModelElement
             name: 'y'
             args:
               - 'success'
               - 'fail'
                 
           - id: x
             type: VariableByteDataModelElement
             name: 'x'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - y
                 - x
    

    and the analysis could look like this, where each element can be addressed individually by an analysis component:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/x'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/y'
              learn_mode: true
              persistence_id: test
    

    The current implementation uses a single element to model all elements of the list. This can also be convenient and should be possible by introducing a new element called ListOfElements. It should parse any number of elements in the list with the specified parsing model element. For example, the list of elements here is a list of variable byte elements:

    Parser:
           - id: loe
             type: ListOfElements
             name: 'loe'
             args: z
                 
           - id: z
             type: VariableByteDataModelElement
             name: 'z'
             args: '.abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGGHIJKLMNOPQRSTUVWXYZ'
    
           - id: json
             start: True
             type: JsonModelElement
             name: 'model'
             key_parser_dict:
               "a": 
                 - loe
    

    The ListOfElements element should then assign the index of the element in the JSON list at the end of the path. For example, the following paths can be used in the analysis section:

    Analysis:
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/0'
              learn_mode: true
              persistence_id: test
    
            - id: vd
              type: NewMatchPathValueDetector
              paths:
                  - '/model/loe/1'
              learn_mode: true
              persistence_id: test
    
    enhancement medium 
    opened by landauermax 8
  • extended FrequencyDetector wiki tests.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guidelines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Request merges into the "development"-branch

    Fixes #1008 Fixes #1009

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • fixed test26 so no fix definition number has to be added.

    Make sure these boxes are signed before submitting your Pull Request -- thank you.

    Must haves

    • [x] I have read and followed the contributing guidelines at https://github.com/ait-aecid/logdata-anomaly-miner/wiki/Git-development-workflow
    • [x] Issues exist for this PR
    • [x] I added related issues using the "Fixes #"-notations
    • [x] This Pull-Request merges into the "development"-branch

    Fixes #1181

    Submission specific

    • [ ] This PR introduces breaking changes
    • [ ] My change requires a change to the documentation
    • [ ] I have updated the documentation accordingly
    • [ ] I have added tests to cover my changes
    • [ ] All new and existing tests passed

    Describe changes:

    opened by ernstleierzopf 0
  • Random test fails when new detector is added

    When adding a new detector and running the tests, they usually fail at test26_filter_config_errors in YamlConfigTest.py because there is an integer that needs to be incremented. For example, see PR #1180, where this had to be fixed when adding a new detector. It is hard to spot why this test fails, as it has nothing to do with the added detector and is not an indicator of something that needs to be fixed. I therefore suggest modifying this test case so that it passes no matter what integer comes after the "definition" keyword. Adding new detectors in the future should then not require updating this test.
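The suggested fix boils down to wildcarding the counter in the assertion. A sketch with a made-up error message (the real expected string lives in YamlConfigTest.py):

```python
import re

# Match the message regardless of which integer follows "definition",
# so adding a detector no longer breaks the assertion:
pattern = re.compile(r"in definition \d+\b")

assert pattern.search("Error in definition 26")
assert pattern.search("Error in definition 27")  # still passes after adding a detector
assert not pattern.search("Error in definition x")
```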

    test medium 
    opened by landauermax 0
  • Add possibility to run some LogResources as json input and some as normal text input.

    LogResourceList:
    
       - url: "file:///var/log/apache2/access.log"
       - url: "unix:///var/lib/akafka/aminer.sock"
     type: json  # configures the byte stream
     parser_id: kafka_audit_logs  # configures the associated parser
    
    
    Parser:
       - id: kafka_audit_logs
         type: AuditDingsParser
    
       - id: ApacheAccessModel
         start: true
    
    opened by ernstleierzopf 0
  • Shorten the build-time for docker builds

    Currently the complete docker image is built at once. This takes a lot of time for each build. We could shorten the build time by inheriting from a pre-built image.

    enhancement 
    opened by whotwagner 0
Releases (V2.5.1)
  • V2.5.1(May 17, 2022)

    Bugfixes:

    • EFD: Fixed problem that appears with empty windows
    • Fixed index out of range if matches are empty in JsonModelElement array.
    • EFD: Enabled immediate detection without training, if both limits are set
    • EFD: Fixed bug related to auto_include_flag
    • Remove spaces in aminer logo
    • ParserCounter: Fixed do_timer
    • Fixed code to allow the usage of AtomFilterMatchAction in yaml configs
    • Fixed JsonModelElement when json object is null
    • Fix incorrect message of charset detector
    • Fix match list handling for json objects

    Changes:

    • Added nullable functionality to JsonModelElements
    • Added include-directive to supervisord.conf
    • ETD: Output warning when count first exceeds range
    • EFD: Added option to output anomaly when the count first exceeds the range
    • VTD: Added variable type 'range'
    • EFD: Added the function reset_counter
    • EFD: Added option to set the lower and upper limit of the range interval
    • Enhance EFD to consider multiple time windows
    • VTD: Changed the value of parameter num_updates_until_var_reduction that makes it track all variables from False to 0
    • PAD: Used scipy's binom_test to decide whether the model should be reinitialized when fewer anomalies occur than expected
    • Add ParsedLogAtom to aminer parser to ensure compatibility with lower versions
    • Added script to add build-id to the version-string
    • Support for installations from source in install-script
    • Fixed and standardized the persistence time of various detectors
    • Refactoring
    • Improve performance
    • Improve output handling
    • Improved testing
    Source code(tar.gz)
    Source code(zip)
  • V2.5.0(Dec 6, 2021)

    Bugfixes:

    • Fixed bug in YamlConfig

    Changes:

    • Added supervisord to docker
    • Moved unparsed atom handlers to analysis(yamlconfig)
    • Moved new_match_path_detector to analysis(yamlconfig)
    • Refactor: merged all UnparsedHandlers into one python-file
    • Added remotecontrol-command for reopening eventhandlers
    • Added config-parameters for logrotation
    • Improved testing
    Source code(tar.gz)
    Source code(zip)
  • V2.4.2(Nov 24, 2021)

    Bugfixes:

    • PVTID: Fixed output format of previously appeared times
    • VTD: Fixed bugs (static -> discrete)
    • VTD: Fixed persistency-bugs
    • Fixed %z performance issues
    • Fixed error where optional keys with an array type are not parsed when being null
    • Fixed issues with JsonModelElement
    • Fixed persistence handling for ValueRangeDetector
    • PTSAD: Fixed a bug that occurs when the ETD stops saving the values of an analyzed path
    • ETD: Fixed the problem when entries of the match_dictionary are not of type MatchElement
    • Fixed error where json data instead of array was parsed successfully.

    Changes:

    • Added multiple parameters to VariableCorrelationDetector
    • Improved VTD
    • PVTID: Renamed parameter time_window_length to time_period_length
    • PVTID: Added check if atom time is None
    • Enhanced output of MTTD and PVTID
    • Improved docker-compose-configuration
    • Improved testing
    • Enhanced PathArimaDetector
    • Improved documentation
    • Improved KernelMsgParsingModel
    • Added pretty print for json output
    • Added the PathArimaDetector
    • TSA: Added functionality to discard arima models with too few log lines per time step
    • TSA: improved confidence calculation
    • TSA: Added the option to force the period length
    • TSA: Automatic selection of the pause area of the ACF
    • Extended EximGenericParsingModel
    • Extended AudispdParsingModel
    Source code(tar.gz)
    Source code(zip)
  • V2.4.1(Jul 23, 2021)

    Bugfixes:

    • Fixed issues with array of arrays in JsonParser
    • Fixed problems with invalid json-output
    • Fixed ValueError in DTME
    • Fixed error with parsing floats in scientific notation with the JsonModelElement.
    • Fixed issue with paths in JsonModelElement
    • Fixed error with \x encoded json
    • Fixed error where EMPTY_ARRAY and EMPTY_OBJECT could not be parsed from the yaml config
    • Fixed a bug in the TSA when encountering a new event type
    • Fixed systemd script
    • Fixed encoding errors when reading yaml configs

    Changes:

    • Add entropy detector
    • Add charset detector
    • Add value range detector
    • Improved ApacheAccessModel, AudispdParsingModel
    • Refactoring
    • Improved documentation
    • Improved testing
    • Improved schema for yaml-config
    • Added EMPTY_STRING option to the JsonModelElement
    • Implemented check to report unparsed atom if ALLOW_ALL is used with data with a type other than list or dict
    Source code(tar.gz)
    Source code(zip)
  • V2.4.0(Jun 10, 2021)

    Bugfixes:

    • Fixed error in JsonModelElement
    • Fixed problems with umlauts in JsonParser
    • Fixed problems with the start element of the ElementValueBranchModelElement
    • Fixed issues with the stat and debug command line parameters
    • Fixed issues if posix acl are not supported by the filesystem
    • Fixed issues with output for non ascii characters
    • Modified kafka-version

    Changes:

    • Improved command-line-options install-script
    • Added documentation
    • Improved VTD CM-Test
    • Improved unit-tests
    • Refactoring
    • Added TSAArimaDetector
    • Improved ParserCount
    • Added the PathValueTimeIntervalDetector
    • Implemented offline mode
    • Added PCA detector
    • Added timeout-parameter to ESD
    Source code(tar.gz)
    Source code(zip)
  • V2.3.1(Apr 8, 2021)

  • V2.3.0(Mar 31, 2021)

    Bugfixes:

    • Changed pyyaml-version to 5.4
    • NewMatchIdValueComboDetector: Fix allow multiple values per id path
    • ByteStreamLineAtomizer: fixed encoding error
    • Fixed too many open directory-handles
    • Added close() function to LogStream

    Changes:

    • Added EventFrequencyDetector
    • Added EventSequenceDetector
    • Added JsonModelElement
    • Added tests for Json-Handling
    • Added command line parameter for update checks
    • Improved testing
    • Split yaml-schemas into multiple files
    • Improved support for yaml-config
    • YamlConfig: set verbose default to true
    • Various refactoring
    Source code(tar.gz)
    Source code(zip)
  • V2.2.3(Feb 5, 2021)

  • V2.2.2(Jan 29, 2021)

  • V2.2.1(Jan 26, 2021)

    Bugfixes:

    • Fixed warnings due to files in Persistency-Directory
    • Fixed ACL-problems in dockerfile and autocreate /var/lib/aminer/log

    Changes:

    • Added simple test for dockercontainer
    • Negate result of the timeout-command. 1 is okay. 0 must be an error
    • Added bullseye-tests
    • Make tmp-dir in debian-bullseye-test and debian-buster-test unique
    Source code(tar.gz)
    Source code(zip)
  • V2.2.0(Dec 23, 2020)

    Changes:

    • Added Dockerfile
    • Added checks for acl of persistency directory
    • Added VariableCorrelationDetector
    • Added tool for managing multiple persistency files
    • Added suppress-list for output
    • Added suspend-mode to remote-control
    • Added requirements.txt
    • Extended documentation
    • Extended yaml-configuration-support
    • Standardized command line parameters
    • Removed --Foreground cli parameter
    • Fixed security warnings by removing functions that allow race-conditions
    • Refactoring
    • Ethically correct naming of variables
    • Enhanced testing
    • Added statistic outputs
    • Enhanced status info output
    • Changed global learn_mode behavior
    • Added RemoteControlSocket to yaml-config
    • Reimplemented the default mailnotificationhandler

    Bugfixes:

    • Fixed typos in documentation
    • Fixed issue with the AtomFilter in the yaml-config
    • Fixed order of ETD in yaml-config
    • Fixed various issues in persistency
    Source code(tar.gz)
    Source code(zip)
  • V2.1.0(Nov 5, 2020)

    • Changes:
      • Added VariableTypeDetector, EventTypeDetector and EventCorrelationDetector
      • Added support for unclean format strings in the DateTimeModelElement
      • Added timezones to the DateTimeModelElement
      • Enhanced ApacheAccessModel
      • Yamlconfig: added support for kafka stream
      • Removed cpu limit configuration
      • Various refactoring
      • Yamlconfig: added support for more detectors
      • Added new command-line-parameters
      • Renamed executables to aminer.py and aminerremotecontrol.py
      • Run aminer in foreground-mode per default
      • Added various unit-tests
      • Improved yamlconfig and checks
      • Added start-config for parser to yamlconfig
      • Renamed config templates
      • Removed imports from __init__.py for better modularity
      • Created AnalysisComponentsPerformanceTests for the EventTypeDetector
      • Extended demo-config
      • Renamed whitelist to allowlist
      • Added warnings for non-existent resources
      • Changed default of auto_include_flag to false
    • Bugfixes:
      • Fixed some exit() in forks
      • Fixed debian files
      • Fixed JSON output of the AffectedLogAtomValues in all detectors
      • Fixed normal output of the NewMatchPathValueDetector
      • Fixed recurring alerting in MissingMatchPathValueDetector
    Source code(tar.gz)
    Source code(zip)
  • V2.0.2(Jul 17, 2020)

    • Changes:
      • Added help parameters
      • Added help-screen
      • Added version parameter
      • Added path and value filter
      • Change time model of ApacheAccessModel for arbitrary time zones
      • Update link to documentation
      • Added SECURITY.md
      • Refactoring
      • Updated man-page
      • Added unit-tests for loadYamlconfig
    • Bugfixes:
      • Fixed header comment type in schema file
      • Fix debian files
    Source code(tar.gz)
    Source code(zip)
  • V2.0.1(Jun 24, 2020)

    • Changes:
      • Updated documentation
      • Updated testcases
      • Updated demos
      • Updated debian files
      • Added copyright headers
      • Added executable bit to AMiner
    Source code(tar.gz)
    Source code(zip)
  • V2.0.0(May 29, 2020)

    • Changes:
      • Updated documentation
      • Added functions getNameByComponent and getIdByComponent to AnalysisChild.py
      • Update DefaultMailNotificationEventHandler.py to python3
      • Extended AMinerRemoteControl
      • Added support for configuration in yaml format
      • Refactoring
      • Added KafkaEventHandler
      • Added JsonConverterHandler
      • Added NewMatchIdValueComboDetector
      • Enabled multiple default timestamp paths
      • Added debug feature ParserCount
      • Added unit and integration tests
      • Added installer script
      • Added VerboseUnparsedHandler
    • Bugfixes including:
      • Fixed dependencies in Debian packaging
      • Fixed typo in various analysis components
      • Fixed import of ModelElementInterface in various parsing components
      • Fixed issues with byte/string comparison
      • Fixed issue in DecimalIntegerValueModelElement, when parsing integer including sign and padding character
      • Fixed unnecessary long blocking time in SimpleMultisourceAtomSync
      • Changed minimum matchLen in DelimitedDataModelElement to 1 byte
      • Fixed timezone offset in ModuloTimeMatchRule
      • Minor bugfixes
    Source code(tar.gz)
    Source code(zip)
Owner
AECID
Automatic Event Correlation for Incident Detection