One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

Overview

One Stop Anomaly Shop (OSAS)

Quick start guide

Step 1: Get/build the docker image

Option 1: Use precompiled image (might not reflect latest changes):

docker pull tiberiu44/osas:latest
docker image tag tiberiu44/osas:latest osas:latest

Option 2: Build the image locally

git clone https://github.com/adobe/OSAS.git
cd OSAS
docker build . -f docker/osas-elastic/Dockerfile -t osas:latest

Step 2: Once you have the Docker image, you can start OSAS by typing:

docker run -p 8888:8888/tcp -p 5601:5601/tcp -v <ABSOLUTE PATH TO DATA FOLDER>:/app osas

IMPORTANT NOTE: Please modify the above command by adding the absolute path to your data folder in the appropriate location
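
For example, if your data folder is /home/jane/osas-data (a hypothetical path), the command becomes:

docker run -p 8888:8888/tcp -p 5601:5601/tcp -v /home/jane/osas-data:/app osas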

After OSAS has started (it might take 1-2 minutes) you can use your browser to access the standard endpoints mapped in the run command: the OSAS console on port 8888 and Kibana on port 5601.

For debugging (in case you need to):

docker run -p 8888:8888/tcp -p 5601:5601/tcp -v <ABSOLUTE PATH TO DATA FOLDER>:/app -ti osas /bin/bash

Building the test pipeline

This guide will take you through all the necessary steps to configure, train and run your own pipeline on your own dataset.

Prerequisite: Add your own CSV dataset to your data folder (the one provided in the docker run command)

Once you have started your docker image, use the OSAS console to gain CLI access to all the tools.

In what follows, we assume that your dataset is called dataset.csv. Please update the commands as necessary in case you use a different name/location.

Be sure you are running scripts in the root folder of OSAS:

cd /osas

Step 1: Build a custom pipeline configuration file - this can be done fully manually or by bootstrapping it with our conf autogenerator script:

python3 osas/main/autoconfig.py --input-file=/app/dataset.csv --output-file=/app/dataset.conf

The above command will generate a custom configuration file for your dataset. It will try to guess field types and optimal combinations between fields. You can edit the generated file (which should be available in the shared data folder) using your favourite editor.

Standard templates for label generator types are:

[LG_MULTINOMIAL]
generator_type = MultinomialField
field_name = <FIELD_NAME>
absolute_threshold = 10
relative_threshold = 0.1

[LG_TEXT]
generator_type = TextField
field_name = <FIELD_NAME>
lm_mode = char
ngram_range = (3, 5)

[LG_NUMERIC]
generator_type = NumericField
field_name = <FIELD_NAME>

[LG_MULTINOMIAL_COMBINER]
generator_type = MultinomialFieldCombiner
field_names = ['<FIELD_1>', '<FIELD_2>', ...]
absolute_threshold = 10
relative_threshold = 0.1

[LG_KEYWORD]
generator_type = KeywordBased
field_name = <FIELD_NAME>
keyword_list = ['<KEYWORD_1>', '<KEYWORD_2>', '<KEYWORD_3>', ...]

[LG_REGEX]
generator_type = KnowledgeBased
field_name = <FIELD_NAME>
rules_and_labels_tuple_list = [('<REGEX_1>','<LABEL_1>'), ('<REGEX_2>','<LABEL_2>'), ...]

You can use the above templates to add as many label generators as you want. Just make sure that the header IDs are unique in the configuration file.
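
For illustration, a minimal configuration for a hypothetical login dataset with fields username, country and cmd (the field names and header IDs below are placeholders, not part of OSAS) could look like this:

[LG_USER_COUNTRY]
generator_type = MultinomialFieldCombiner
field_names = ['username', 'country']
absolute_threshold = 10
relative_threshold = 0.1

[LG_COMMAND]
generator_type = TextField
field_name = cmd
lm_mode = char
ngram_range = (3, 5)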

Step 2: Train the pipeline

python3 osas/main/train_pipeline.py --conf-file=/app/dataset.conf --input-file=/app/dataset.csv --model-file=/app/dataset.json

The above command will generate a pretrained pipeline using the previously created configuration file and the dataset.

Step 3: Run the pipeline on a dataset

python3 osas/main/run_pipeline.py --conf-file=/app/dataset.conf --model-file=/app/dataset.json --input-file=/app/dataset.csv --output-file=/app/dataset-out.csv

The above command will run the pretrained pipeline on any compatible dataset. In the example, we run the pipeline on the training data, but you can use previously unseen data. It will generate an output file with labels and anomaly scores and it will also import your data into Elasticsearch/Kibana. To view the results, just use the web interface.

Pipeline explained

The pipeline sequentially applies all label generators on the raw data, collects the labels and uses an anomaly scoring algorithm to generate anomaly scores. There are two main component classes: LabelGenerator and ScoringAlgorithm.
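
A minimal sketch of that flow (illustrative only; the class and method names below are assumptions, not the actual OSAS API):

# Illustrative sketch of the two-phase pipeline; not the actual OSAS API.
from typing import Dict, List


class LabelGenerator:
    def fit(self, dataset: List[Dict]) -> None:
        """Learn statistics (counts, means, language models) from training data."""
        raise NotImplementedError

    def generate_labels(self, entry: Dict) -> List[str]:
        """Return labels such as 'NORMAL' or 'OUTLIER' for one data entry."""
        raise NotImplementedError


class ScoringAlgorithm:
    def fit(self, labeled_dataset: List[List[str]]) -> None:
        """Learn a model over the label combinations seen in training."""
        raise NotImplementedError

    def score(self, labels: List[str]) -> float:
        """Return an anomaly score for one entry's label set."""
        raise NotImplementedError


def run(generators, scorer, dataset):
    # Phase (a): pre-labeling - every generator labels every entry
    all_labels = [[label for g in generators for label in g.generate_labels(entry)]
                  for entry in dataset]
    # Phase (b): anomaly scoring over the collected labels
    return [scorer.score(labels) for labels in all_labels]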

Label generators

NumericField

  • This type of LabelGenerator handles numerical fields. It computes the mean and standard deviation over the training data and generates labels according to the distance between the current value and the mean (distance <= sigma: NORMAL, sigma < distance <= 2*sigma: BORDERLINE, distance > 2*sigma: OUTLIER); see the sketch after the parameter list.

Params:

  • field_name: what field to look for in the data object
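
A minimal sketch of the sigma rule (label names as described above; measuring the distance from the training mean):

import statistics

def numeric_label(value, train_values):
    # Mean and standard deviation are learned from the training data
    mean = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    distance = abs(value - mean)
    if distance <= sigma:
        return 'NORMAL'
    if distance <= 2 * sigma:
        return 'BORDERLINE'
    return 'OUTLIER'

print(numeric_label(10.5, [9, 10, 11, 10, 9, 11]))  # NORMAL
print(numeric_label(30.0, [9, 10, 11, 10, 9, 11]))  # OUTLIER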

TextField

  • This type of LabelGenerator handles text fields. It builds an n-gram based language model and computes the perplexity of newly observed data. It also holds statistics (mean and stdev) over the perplexity of the training data (perplexity <= sigma: NORMAL, sigma < perplexity <= 2*sigma: BORDERLINE, perplexity > 2*sigma: OUTLIER); see the sketch after the parameter list.

Params:

  • field_name: What field to look for
  • lm_mode: Type of LM to build: char or token
  • ngram_range: N-gram range to use for computation
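
For intuition, a minimal character n-gram scoring sketch (illustrative; the actual OSAS language model implementation may differ):

import math
from collections import Counter

def char_ngrams(text, n):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train_lm(corpus, n=3):
    # Count n-grams and their (n-1)-gram prefixes over the training corpus
    ngrams, prefixes = Counter(), Counter()
    for text in corpus:
        ngrams.update(char_ngrams(text, n))
        prefixes.update(char_ngrams(text, n - 1))
    return ngrams, prefixes

def perplexity(text, ngrams, prefixes, n=3, alpha=1.0):
    # Add-alpha smoothed per-character perplexity
    log_prob, count = 0.0, 0
    vocab = len(ngrams) + 1
    for gram in char_ngrams(text, n):
        p = (ngrams[gram] + alpha) / (prefixes[gram[:-1]] + alpha * vocab)
        log_prob += math.log(p)
        count += 1
    return math.exp(-log_prob / max(count, 1))

ngrams, prefixes = train_lm(['GET /index.html', 'GET /home', 'POST /login'])
print(perplexity('GET /index', ngrams, prefixes))  # low: similar to training data
print(perplexity('xq7#zz!!', ngrams, prefixes))    # high: unusual character patterns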

MultinomialField

  • This type of LabelGenerator handles fields with discrete value sets. It computes the probability of seeing a specific value and alerts based on relative and absolute thresholds; see the sketch after the parameter list.

Params

  • field_name: What field to use
  • absolute_threshold: Minimum absolute number of occurrences below which an alert is triggered
  • relative_threshold: Minimum relative frequency below which an alert is triggered
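
A sketch of the thresholding idea (the label names and the exact combination of the two thresholds are assumptions for illustration):

from collections import Counter

def multinomial_label(value, train_values, absolute_threshold=10, relative_threshold=0.1):
    counts = Counter(train_values)
    seen = counts[value]
    # Alert when the value occurred too rarely in the training data
    if seen < absolute_threshold and seen / len(train_values) < relative_threshold:
        return 'RARE'
    return 'NORMAL'

history = ['US'] * 95 + ['RO'] * 5
print(multinomial_label('US', history))  # NORMAL
print(multinomial_label('RO', history))  # RARE: 5 < 10 and 0.05 < 0.1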

MultinomialFieldCombiner

  • This type of LabelGenerator handles fields with discrete value sets and builds advanced features by combining values across the same dataset entry. It computes the probability of seeing a specific combination of values and alerts based on relative and absolute thresholds.

Params

  • field_names: What fields to combine
  • absolute_threshold: Minimum absolute number of occurrences below which an alert is triggered
  • relative_threshold: Minimum relative frequency below which an alert is triggered

KeywordBased

  • This is a rule-based label generator. It applies a simple tokenization procedure to the input text, dropping special characters and numbers and splitting on whitespace. It then looks for a specific set of keywords and generates labels accordingly; see the sketch after the parameter list.

Params:

  • field_name: What field to use
  • keyword_list: The list of keywords to look for
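
The tokenize-and-match step could look roughly like this (a sketch, not the OSAS source):

import re

def keyword_labels(text, keyword_list):
    # Drop special characters and numbers, then split on whitespace
    tokens = re.sub(r'[^a-zA-Z\s]', ' ', text).lower().split()
    return [kw for kw in keyword_list if kw.lower() in tokens]

print(keyword_labels('sudo rm -rf /var/log', ['sudo', 'wget', 'rm']))  # ['sudo', 'rm']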

Scoring algorithms

OSAS has four unsupervised anomaly detection algorithms:

  • IFAnomaly: n-hot encoding, singular value decomposition, isolation forest (IF)

  • LOFAnomaly: n-hot encoding, singular value decomposition, local outlier factor (LOF)

  • SVDAnomaly: n-hot encoding, singular value decomposition, inverted transform, input reconstruction error

  • StatisticalNGramAnomaly: compute label n-gram probabilities, compute anomaly score as a sum of negative log likelihood
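
As a rough illustration of the first three (n-hot encode the label sets, reduce with SVD, then apply a detector), here is a scikit-learn sketch; the label names are made up and the actual OSAS implementations may differ in detail:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import IsolationForest

# Each entry is the set of labels produced by the label generators
train_labels = [['UID_NORMAL', 'CMD_NORMAL'],
                ['UID_NORMAL', 'CMD_NORMAL'],
                ['UID_RARE', 'CMD_OUTLIER']]

binarizer = MultiLabelBinarizer()              # n-hot encoding
X = binarizer.fit_transform(train_labels)
svd = TruncatedSVD(n_components=2)             # dimensionality reduction
X_reduced = svd.fit_transform(X)
detector = IsolationForest(random_state=0).fit(X_reduced)

# In scikit-learn, lower (more negative) scores indicate stronger anomalies
print(detector.score_samples(X_reduced))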

Comments
  • Security Vulnerability Found

    Security Vulnerability Found

    Absolute Path Traversal due to incorrect use of send_file call

    A path traversal attack (also known as directory traversal) aims to access files and directories that are stored outside the web root folder. By manipulating variables that reference files with “dot-dot-slash (../)” sequences and its variations or by using absolute file paths, it may be possible to access arbitrary files and directories stored on file system including application source code or configuration and critical system files. This attack is also known as “dot-dot-slash”, “directory traversal”, “directory climbing” and “backtracking”.

    Common Weakness Enumeration category

    CWE - 36

    Root Cause Analysis

    The os.path.join call is unsafe for use with untrusted input. When the os.path.join call encounters an absolute path, it discards all the parameters it has processed until that point and starts working from the new absolute path. Please see the example below.

    >>> import os.path
    >>> static = "path/to/mySafeStaticDir"
    >>> malicious = "/../../../../../etc/passwd"
    >>> os.path.join(static, malicious)
    '/../../../../../etc/passwd'
    

    Since the "malicious" parameter represents an absolute path, the result of os.path.join ignores the static directory completely. Hence, untrusted input passed via the os.path.join call to flask.send_file can lead to path traversal attacks.

    In this case, the problem occurs due to the following code: https://github.com/adobe/OSAS/blob/f97ea247543d5386b83b078745469f3e8a727047/osas/webserver.py#L62

    Here, the filename parameter is attacker-controlled. This parameter passes through the unsafe os.path.join call, making the effective directory and filename passed to the send_file call attacker-controlled. This leads to a path traversal attack.

    Proof of Concept

    The bug can be verified using a proof of concept similar to the one shown below.

    curl --path-as-is 'http://<domain>/osas/static//../../../../etc/passwd'
    

    Remediation

    This can be fixed by preventing the flow of untrusted data to the vulnerable send_file function. In case the application logic necessitates this behaviour, one can either use werkzeug.utils.safe_join to join untrusted paths or replace flask.send_file calls with flask.send_from_directory calls.
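
    A sketch of the second option (assuming a route that serves files from a static directory):

    from flask import Flask, send_from_directory

    app = Flask(__name__)

    @app.route('/osas/static/<path:filename>')
    def serve_static(filename):
        # send_from_directory resolves the path inside 'static' and
        # rejects attempts to escape it with a 404 error
        return send_from_directory('static', filename)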

    Common Vulnerability Scoring System Vector

    The attack can be carried out over the network. A complex non-standard configuration or a specialized condition is not required for the attack to succeed. No user interaction is required. The attack can affect components outside the scope of the target module. It can be used to gain access to confidential files like passwords, login credentials and other secrets, but it cannot be directly used to effect a change on a system resource, so it has limited to no impact on integrity. Using this attack vector, an attacker may make multiple requests for huge files such as a database, which can lead to a partial denial of service; however, the impact on availability is quite low in this case. Taking this into account, an appropriate CVSS v3.1 vector would be

    AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:L (https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator?vector=AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:L&version=3.1)

    This gives it a base score of 9.3/10 and a severity rating of critical.

    References

    This bug was found using CodeQL by GitHub.

    opened by porcupineyhairs 1
  • Supervised classifiers support

    Supervised classifiers support

    Added supervised classifier support for OSAS pipeline

    Description

    • Added an anomaly classifier for users that have a supervised dataset
    • Fixed a README typo for the run and train pipelines
    • Fixed a hyperlink typo that caused a bug going from train_pipeline -> run_pipeline endpoints in the web server

    Related Issue

    Motivation and Context

    How Has This Been Tested?

    Manually tested with a test dataset and adjusted for different types of inputs

    Screenshots (if appropriate):

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    opened by wilsontang06 1
  • Updated readme, fixed a typing error

    Updated readme, fixed a typing error

    Description

    Related Issue

    A typing error

    Motivation and Context

    To try, test and improve the project and contribute to it.

    The typing error could potentially consume a lot of the developer's time trying to find the bug.

    How Has This Been Tested?

    Screenshots (if appropriate):

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [ ] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    invalid 
    opened by devangi2000 1
  • chore : reduce docker layers and cleanup files after use to reduce size

    chore : reduce docker layers and cleanup files after use to reduce size

    Description

    reduce docker layers and cleanup files after use to reduce size

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [ ] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    • [ ] I have added tests to cover my changes.
    • [ ] All new and existing tests passed.
    enhancement 
    opened by Rajpratik71 1
  • chore: Use --no-cache-dir flag to pip in Dockerfiles, to save space

    chore: Use --no-cache-dir flag to pip in Dockerfiles, to save space

    Description and Motivation and Context

    Using the --no-cache-dir flag with pip install makes sure that packages downloaded by pip are not cached on the system. This is a best practice that ensures packages are fetched from the repository instead of a local cached copy. Further, in the case of Docker containers, restricting caching reduces the image size. The saving depends on the number of Python packages multiplied by their respective sizes; e.g. for heavy packages with many dependencies, not caching pip packages reduces the image size considerably.
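
    A typical pattern (an illustrative Dockerfile line; the actual OSAS Dockerfiles may differ):

    RUN pip install --no-cache-dir -r requirements.txt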

    More detailed information can be found at

    https://medium.com/sciforce/strategies-of-docker-images-optimization-2ca9cc5719b6

    Signed-off-by: Pratik Raj [email protected]

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [ ] I have added tests to cover my changes.
    • [ ] All new and existing tests passed.
    enhancement 
    opened by Rajpratik71 1
  • Dev.numeric tweaks

    Dev.numeric tweaks

    Made tweaks to numeric field

    Description

    Added a "spike" mode to the numeric field. The numeric field can now detect anomalies based on a spike in numbers, either by ratio or by a fixed amount, and it can use standard deviation, spike detection, or both.

    Related Issue

    New use case for numeric fields

    Motivation and Context

    More specific granularity control over numeric fields

    How Has This Been Tested?

    Tested on sample and real datasets

    Screenshots (if appropriate):

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by wilsontang06 0
  • Dev.alertlist

    Dev.alertlist

    Enable OSAS to support static rules

    Description

    I added another CLI tool apply_rules.py that enables the user to add labels and modify anomaly scores, based on static rules.

    Related Issue

    This change is based on an internal request.

    Motivation and Context

    This change enables post-pipeline execution of rules (via boolean logic). It is useful for bringing more human knowledge into the pipelines.

    How Has This Been Tested?

    Tested on a dummy dataset to see if the rules are matched and applied correctly

    Screenshots (if appropriate):

    Types of changes

    • [-] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [-] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [-] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by tiberiu44 0
  • Dev.numeric group by

    Dev.numeric group by

    This update adds group_by support for multinomial, numeric and combiner fields

    Description

    Added an optional field (group_by) that can be specified for MultinomialField, MultinomialFieldCombiner and NumericalField, which changes the behaviour of OSAS to build the statistical models around mini-groups of data. This enables better statistical modeling.

    Related Issue

    This PR is based on an internal change request

    Motivation and Context

    Previously, OSAS had issues modeling and tagging anomalies for under-represented classes. For instance, if you tried to build a model for login anomalies based on username and origin country (MultinomialField), or average CPU/memory usage based on host (NumericalField), you would find it difficult to cope with users that have a small number of events compared to the other users. An example could be a dataset with 99 users that each have 5000 events and one user with only 10 events. Though all of that user's logins could originate from the same country, they will always be tagged as anomalies, because they are under-represented in the overall dataset. With the group_by option, you can simply group the login country by the username, and the statistical models will be relative per user, thus better modeling anomalies.
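
    Conceptually, the configuration for the login example could look like this (the group_by value syntax shown here is an assumption for illustration):

    [LG_LOGIN_COUNTRY]
    generator_type = MultinomialField
    field_name = country
    group_by = username
    absolute_threshold = 10
    relative_threshold = 0.1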

    How Has This Been Tested?

    This change has been validated by our TH team internally, using real datasets. We checked that the statistical models are correctly built and that the tags are assigned as expected.

    Screenshots (if appropriate):

    Types of changes

    • [-] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [-] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [-] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by tiberiu44 0
  • Dev.group by label

    Dev.group by label

    Description

    This PR contains the following updates:

    • Added description of group_by operations in the labels
    • Added graceful Elastic Search error handling
    • Added option to skip elastic search push
    • Added sanity check for execution of run_pipeline. Either --output-file must be specified or --no-elastic must be removed

    Related Issue

    Internal request

    Motivation and Context

    • group_by information was missing from the labels, which could result in ambiguous label readings
    • Elastic should not be mandatory for simple pipelines; adding the --no-elastic option was the only way to support this without breaking changes

    How Has This Been Tested?

    I ran the pipeline several times to see that labels are generated correctly, and I manually checked the execution sanity checks by trying different parameter combinations. Finally, I verified the graceful handling of Elastic errors by enabling the Elastic push and stopping my local instance.

    Screenshots (if appropriate):

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [-] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [-] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by tiberiu44 0
  • Multikey group_by + formatting of output JSON

    Multikey group_by + formatting of output JSON

    Description

    Extra option for multikey group_by

    Motivation and Context

    Right now, group_by only supports a single attribute. Combining multiple attribute values was internally requested.

    How Has This Been Tested?

    I tested this by building pipelines with all possible group_by cases: None, single key, multi key

    Types of changes

    • [-] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [-] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [-] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [-] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by tiberiu44 0
  • Dev.group by

    Dev.group by

    Added an optional group_by attribute for MultinomialField and MultinomialFieldCombiner. This allows OSAS to compute relative statistics instead of global ones. For instance, the value of the pair (USER, ASN) (you can imagine what this means) might be rare when compared to an entire dataset, but if you factor in something like ACCOUNT_ID (again, try to imagine what this does), then it might not be that rare.

    Related Issue

    This should fix #10.

    Motivation and Context

    It provides better anomaly estimates for multinomial values, by enabling the user to configure group_by values.

    How Has This Been Tested?

    Screenshots (if appropriate):

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [-] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by tiberiu44 0
Owner
Adobe, Inc.
Open source from Adobe
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Tanuj Sur 4 Jul 1, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Dec 30, 2022
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

ASYML 2.3k Jan 7, 2023
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Texar-PyTorch is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar

ASYML 726 Dec 30, 2022
Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Indobenchmark Toolkit Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources fo

Samuel Cahyawijaya 11 Aug 26, 2022
LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language

LegalNLP - Natural Language Processing Methods for the Brazilian Legal Language ⚖️ The library of Natural Language Processing for Brazilian legal lang

Felipe Maia Polo 125 Dec 20, 2022
A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This

Robert Bogan Kang 3 May 25, 2022
This repository contains all the source code that is needed for the project : An Efficient Pipeline For Bloom’s Taxonomy Using Natural Language Processing and Deep Learning

Pipeline For NLP with Bloom's Taxonomy Using Improved Question Classification and Question Generation using Deep Learning This repository contains all

Rohan Mathur 9 Jul 17, 2021
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Jan 2, 2023
DELTA is a deep learning based natural language and speech processing platform.

DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p

DELTA 1.5k Dec 26, 2022
FedNLP: A Benchmarking Framework for Federated Learning in Natural Language Processing

FedNLP is a research-oriented benchmarking framework for advancing federated learning (FL) in natural language processing (NLP). It uses the FedML repository as a git submodule. In other words, FedNLP only focuses on advanced models and datasets, while FedML supports various federated optimizers (e.g., FedAvg) and platforms (Distributed Computing, IoT/Mobile, Standalone).

FedML-AI 216 Nov 27, 2022
Deep Learning for Natural Language Processing - Lectures 2021

This repository contains slides for the course "20-00-0947: Deep Learning for Natural Language Processing" (Technical University of Darmstadt, Summer term 2021).

null 0 Feb 21, 2022
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing We released the 2.0.0 version with TF2 Support. If you

Eliyar Eziz 2.3k Dec 29, 2022