(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)

Overview

IsoTree

Fast and multi-threaded implementation of Extended Isolation Forest, Fair-Cut Forest, SCiForest (a.k.a. Split-Criterion iForest), and regular Isolation Forest, for outlier/anomaly detection, plus additions for imputation of missing values, distance/similarity calculation between observations, and handling of categorical data. Written in C++ with interfaces for Python and R. An additional wrapper for Ruby can be found here.

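The new concepts in this software are described in:

  • Distance approximation using Isolation Forests
  • Imputing missing values with unsupervised random trees
  • Revisiting randomized choices in isolation forests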

Description

Isolation Forest is an algorithm originally developed for outlier detection that consists in splitting sub-samples of the data according to some attribute/feature/column at random. The idea is that, the rarer the observation, the more likely it is that a random uniform split on some feature would put outliers alone in one branch, and the fewer splits it will take to isolate an outlier observation like this. The concept is extended to splitting hyperplanes in the extended model (i.e. splitting by more than one column at a time), and to guided (not entirely random) splits in the SCiForest model that aim at isolating outliers faster and finding clustered outliers.
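The isolation idea can be illustrated with a small, standard-library-only sketch. It is not the library's implementation: it uses the whole dataset instead of sub-samples, only axis-parallel splits, and the helper names (`isolation_depth`, `average_depth`) and the deliberately extreme point (8, 8) are assumptions made for this example.

```python
import random

def isolation_depth(points, target, rng, max_depth=50):
    """Count random axis-parallel splits needed to leave `target` alone.

    `target` must be one of `points`.
    """
    current = list(points)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        # Only columns that still vary within this node can be split.
        splittable = [j for j in range(len(target))
                      if min(p[j] for p in current) < max(p[j] for p in current)]
        if not splittable:
            break
        col = rng.choice(splittable)
        lo = min(p[col] for p in current)
        hi = max(p[col] for p in current)
        threshold = rng.uniform(lo, hi)
        # Follow the branch that contains the target point.
        if target[col] <= threshold:
            current = [p for p in current if p[col] <= threshold]
        else:
            current = [p for p in current if p[col] > threshold]
        depth += 1
    return depth

def average_depth(points, target, n_trees=200, seed=1):
    """Average the isolation depth of `target` over many random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(points, target, rng) for _ in range(n_trees))
    return total / n_trees

# 100 standard-normal points plus one extreme point.
data_rng = random.Random(0)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
outlier = (8.0, 8.0)
data = inliers + [outlier]

# Compare against the inlier closest to the origin.
central = min(inliers, key=lambda p: p[0] ** 2 + p[1] ** 2)
outlier_depth = average_depth(data, outlier)
inlier_depth = average_depth(data, central)
print("average depth, outlier:", outlier_depth)
print("average depth, central inlier:", inlier_depth)
```

The extreme point is isolated in fewer splits on average than the central one, which is the property the outlier score is built on.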

Note that this is a black-box model that will not produce explanations or importances - for a different take on explainable outlier detection see OutlierTree.

[Image: example plots]

(Code to produce these plots can be found in the R examples in the documentation)

Comparison against other libraries

The folder timings contains a speed comparison against other Isolation Forest implementations in Python (SciKit-Learn, EIF) and R (IsolationForest, isofor, solitude). From the benchmarks, IsoTree tends to be at least 1 order of magnitude faster than the libraries compared against in both single-threaded and multi-threaded mode.

Example timings for 100 trees and different sample sizes, CovType dataset - see the link above for full benchmark and details:

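| Library      | Model | Time (s) 256 | Time (s) 1024 | Time (s) 10k |
|--------------|-------|--------------|---------------|--------------|
| isotree      | orig  | 0.00161      | 0.00631       | 0.0848       |
| isotree      | ext   | 0.00326      | 0.0123        | 0.168        |
| eif          | orig  | 0.149        | 0.398         | 4.99         |
| eif          | ext   | 0.16         | 0.428         | 5.06         |
| h2o          | orig  | 9.33         | 11.21         | 14.23        |
| h2o          | ext   | 1.06         | 2.07          | 17.31        |
| scikit-learn | orig  | 8.3          | 8.01          | 6.89         |
| solitude     | orig  | 32.612       | 34.01         | 41.01        |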

Example AUC as outlier detector in typical datasets (notebook to produce results here):

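  • Satellite dataset:

| Library      | AUC defaults | AUC grid search |
|--------------|--------------|-----------------|
| isotree      | 0.70         | 0.84            |
| eif          | -            | 0.714           |
| scikit-learn | 0.687        | 0.74            |
| h2o          | 0.662        | 0.748           |

  • Annthyroid dataset:

| Library      | AUC defaults | AUC grid search |
|--------------|--------------|-----------------|
| isotree      | 0.80         | 0.982           |
| eif          | -            | 0.808           |
| scikit-learn | 0.836        | 0.836           |
| h2o          | 0.80         | 0.80            |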

(Disclaimer: these are rather small datasets and thus these AUC estimates have high variance)

Non-random splits

While the original idea behind isolation forests consisted in deciding splits uniformly at random, it's possible to get better performance at detecting outliers in some datasets (particularly those with multimodal distributions) by determining splits according to an information gain criterion instead. The idea is described in "Revisiting randomized choices in isolation forests" along with some comparisons of different split guiding criteria.
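As a rough illustration of a guided split, the sketch below picks the threshold on a single column that maximizes a "pooled" gain. It assumes the gain is the reduction in standard deviation with each branch weighted by its size; that definition, and the helper names `pooled_gain` and `best_split`, are assumptions for this example and not necessarily the exact criterion the library uses.

```python
from statistics import pstdev

def pooled_gain(values, threshold):
    """Reduction in standard deviation after a split, weighting branches by size."""
    left = [v for v in values if v <= threshold]
    right = [v for v in values if v > threshold]
    pooled_sd = (len(left) * pstdev(left) + len(right) * pstdev(right)) / len(values)
    return pstdev(values) - pooled_sd

def best_split(values):
    """Try every midpoint between consecutive distinct values, keep the best one."""
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return max(candidates, key=lambda t: pooled_gain(values, t))

# A bimodal column: one cluster in [0, 0.9], another in [10, 10.9].
bimodal = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
threshold = best_split(bimodal)
print("chosen threshold:", threshold)
```

On this bimodal column the chosen threshold falls in the gap between the two clusters, whereas a uniformly random threshold could land inside either cluster.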

Distance / similarity calculations

The general idea was extended to produce a distance (alternatively, a similarity) between observations according to how many random splits it takes to separate them. The idea is described in "Distance approximation using Isolation Forests".
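A minimal sketch of the separation-depth idea, using only the standard library. It counts how many random axis-parallel splits it takes until two given points fall into different branches. The paper defines how such depths are turned into a standardized distance; this sketch stops at the raw average, and the function names and the sample points are assumptions for illustration.

```python
import random

def separation_depth(points, a, b, rng, max_depth=50):
    """Count random splits until `a` and `b` land in different branches.

    Both `a` and `b` must be members of `points`.
    """
    current = list(points)
    depth = 0
    while depth < max_depth:
        # Only columns that still vary within this node can be split.
        splittable = [j for j in range(len(a))
                      if min(p[j] for p in current) < max(p[j] for p in current)]
        if not splittable:
            break
        col = rng.choice(splittable)
        lo = min(p[col] for p in current)
        hi = max(p[col] for p in current)
        threshold = rng.uniform(lo, hi)
        depth += 1
        a_left = a[col] <= threshold
        b_left = b[col] <= threshold
        if a_left != b_left:
            break  # this split separated the pair
        # Keep only the branch that still holds both points.
        current = [p for p in current if (p[col] <= threshold) == a_left]
    return depth

def average_separation(points, a, b, n_trees=200, seed=1):
    """Average the separation depth of the pair over many random trees."""
    rng = random.Random(seed)
    total = sum(separation_depth(points, a, b, rng) for _ in range(n_trees))
    return total / n_trees

data_rng = random.Random(0)
background = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
origin = (0.0, 0.0)
neighbour = (0.05, 0.05)   # very close to the origin
distant = (4.0, 4.0)       # far from the origin
data = background + [origin, neighbour, distant]

near_depth = average_separation(data, origin, neighbour)
far_depth = average_separation(data, origin, distant)
print("average separation depth, near pair:", near_depth)
print("average separation depth, far pair:", far_depth)
```

Points that are close together survive more splits before being separated, so a larger average depth corresponds to a smaller distance.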

Imputation of missing values

The model can also be used to impute missing values in a fashion similar to kNN, by taking the values from observations in the terminal nodes of each tree in which an observation with missing values falls at prediction time, combining the non-missing values of the other observations as a weighted average according to the depth of the node and the number of observations that fall there. This is not related to how the model handles missing values internally, but is rather meant as a faster way of imputing by similarity. Quality is usually not as good as chained equations, but the method is a lot faster and more scalable. It is recommended to use non-random splits when the model is used as an imputer. Details are described in "Imputing missing values with unsupervised random trees".
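The weighted-average step can be sketched as follows. The text above only says that the weights depend on node depth and node size; the specific weight used here (depth divided by size, so that deeper and smaller nodes count more) is an assumption for this example, as are the function name and the made-up node summaries.

```python
def impute_from_nodes(nodes):
    """Combine terminal-node means into one imputed value.

    Each entry is (node_mean, node_depth, node_size), one per tree, where
    node_mean is the mean of the non-missing values in that terminal node.
    The weight depth / size is an assumption made for this sketch.
    """
    weights = [depth / size for _, depth, size in nodes]
    weighted_sum = sum(w * mean for w, (mean, _, _) in zip(weights, nodes))
    return weighted_sum / sum(weights)

# Hypothetical terminal nodes reached by one observation in three trees.
nodes = [(5.0, 8, 2), (5.4, 6, 3), (9.0, 2, 40)]
imputed = impute_from_nodes(nodes)
print("imputed value:", imputed)
```

With these numbers the result stays close to the two deep, small nodes (about 5.17), and the shallow node holding 40 observations contributes very little.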

Highlights

There are already many implementations of isolation forests available for both Python and R (such as the one from the original paper's authors or the one in SciKit-Learn), but at the time of writing, all of them lack some important functionality and/or offer sub-optimal speed. This particular implementation offers the following:

  • Implements the extended model (with splitting hyperplanes) and split-criterion model (with non-random splits).
  • Can handle missing values (but performance with them is not so good).
  • Can handle categorical variables (one-hot/dummy encoding does not produce the same result).
  • Can use a mixture of random and non-random splits, and can split by weighted/pooled gain (in addition to simple average).
  • Can produce approximated pairwise distances between observations according to how many steps it takes on average to separate them down the tree.
  • Can produce missing value imputations according to observations that fall on each terminal node.
  • Can work with sparse matrices.
  • Supports sample/observation weights, either as sampling importance or as distribution density measurement.
  • Supports user-provided column sample weights.
  • Can sample columns randomly with weights given by kurtosis.
  • Uses exact formula (not approximation as others do) for harmonic numbers at lower sample and remainder sizes, and a higher-order approximation for larger sizes.
  • Can fit trees incrementally to user-provided data samples.
  • Produces serializable model objects with reasonable file sizes.
  • Can convert the models to treelite format (Python-only and depending on the parameters that are used) (example here).
  • Can translate the generated trees into SQL statements.
  • Fast and multi-threaded C++ code with an ISO C interface, which is architecture-agnostic, multi-platform, and with the only external dependency (Robin-Map) being optional. Can be wrapped in languages other than Python/R/Ruby.

(Note that categoricals, NAs, and density-like sample weights, are treated heuristically with different options as there is no single logical extension of the original idea to them, and having them present might degrade performance/accuracy for regular numerical non-missing observations)
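To make the harmonic-number highlight concrete: the original isolation forest paper normalizes depths by the expected path length c(n) = 2 H(n-1) - 2 (n-1) / n, where H is the harmonic number, and approximates H(i) as ln(i) plus the Euler-Mascheroni constant. The sketch below compares the exact sum with that approximation and with a higher-order one. The extra terms used here are the standard asymptotic expansion and are an assumption; the terms the library actually uses may differ.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic_exact(n):
    """H(n) as a direct sum; cheap for small n."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_basic(n):
    """First-order approximation: ln(n) + gamma."""
    return math.log(n) + EULER_GAMMA

def harmonic_higher_order(n):
    """Adds the next two terms of the asymptotic expansion (assumed here)."""
    return math.log(n) + EULER_GAMMA + 1.0 / (2 * n) - 1.0 / (12 * n * n)

def expected_path_length(n, harmonic=harmonic_exact):
    """c(n) = 2 H(n-1) - 2 (n-1) / n, defined as 0 for n <= 1."""
    if n <= 1:
        return 0.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n

for n in (2, 5, 10, 256):
    print(n,
          expected_path_length(n),
          expected_path_length(n, harmonic_basic),
          expected_path_length(n, harmonic_higher_order))
```

The gap between the exact value and the first-order approximation is largest at small n, which is where remainder sizes in terminal nodes tend to fall.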

Installation

  • Python:
pip install isotree

or if that fails:

pip install --no-use-pep517 isotree

Note for macOS users: on macOS, the Python version of this package might compile without multi-threading capabilities. In order to enable multi-threading support, first install OpenMP:

brew install libomp

And then reinstall this package: pip install --force-reinstall isotree.


  • R:
install.packages("isotree")
  • C and C++:
git clone --recursive https://www.github.com/david-cortes/isotree.git
cd isotree
mkdir build
cd build
cmake -DUSE_MARCH_NATIVE=1 ..
cmake --build .

### for a system-wide install in linux
sudo make install
sudo ldconfig

(Will build as a shared object - linkage is then done with -lisotree)

Be aware that the snippet above includes the option -DUSE_MARCH_NATIVE=1, which makes the build use the highest available CPU instruction set (e.g. AVX2) and will produce objects that might not run on older CPUs. To build more "portable" objects, remove this option from the cmake command.

The package has an optional dependency on the Robin-Map library, which is added to this repository as a linked submodule. If this library is not found under /src, the build will use the compiler's own hashmaps, which are less optimal.

  • Ruby:

See external repository with wrapper.

Sample usage

Warning: default parameters in this implementation are very different from default parameters in others such as SciKit-Learn's, and these defaults won't scale to large datasets (see documentation for details).

  • Python:

(Library is SciKit-Learn compatible)

import numpy as np
from isotree import IsolationForest

### Random data from a standard normal distribution
np.random.seed(1)
n = 100
m = 2
X = np.random.normal(size = (n, m))

### Will now add obvious outlier point (3, 3) to the data
X = np.r_[X, np.array([3, 3]).reshape((1, m))]

### Fit a small isolation forest model
iso = IsolationForest(ntrees = 10, nthreads = 1)
iso.fit(X)

### Check which row has the highest outlier score
pred = iso.predict(X)
print("Point with highest outlier score: ",
      X[np.argsort(-pred)[0], ])
  • R:

(see documentation for more examples - help(isotree::isolation.forest))

### Random data from a standard normal distribution
library(isotree)
set.seed(1)
n <- 100
m <- 2
X <- matrix(rnorm(n * m), nrow = n)

### Will now add obvious outlier point (3, 3) to the data
X <- rbind(X, c(3, 3))

### Fit a small isolation forest model
iso <- isolation.forest(X, ntrees = 10, nthreads = 1)

### Check which row has the highest outlier score
pred <- predict(iso, X)
cat("Point with highest outlier score: ",
    X[which.max(pred), ], "\n")
  • C++:

The package comes with two different C++ interfaces: (a) a struct-based interface, which exposes the library's full functionality but performs few checks on the inputs it receives and is perhaps a bit difficult to use due to the large number of arguments that its functions require; and (b) a scikit-learn-like interface, in which the model is exposed as a single class with methods like 'fit' and 'predict'. The second interface is less flexible than the struct-based one but easier to use, and its function signatures rule out some potential errors from invalid parameter combinations.

See files: isotree_cpp_ex.cpp for an example with the struct-based interface; and isotree_cpp_oop_ex.cpp for an example with the scikit-learn-like interface.

Note that the second interface does not expose all the functionalities - for example, it only supports inputs of classes 'double' and 'int', while the struct-based interface also supports 'float'/'size_t'.

  • C:

See file isotree_c_ex.c.

Note that the C interface is a simple wrapper over the scikit-learn-like C++ interface, but using only ISO C bindings for better compatibility and easier wrapping in other languages.

  • Ruby

See external repository with wrapper.

Examples

  • Python: example notebook here (also an example as imputer in an sklearn pipeline here, and an example converting to treelite here).
  • R: examples available in the documentation (help(isotree::isolation.forest), link to CRAN).
  • C and C++: see short examples in the section above.
  • Ruby: see external repository with wrapper.

Documentation

  • Python: documentation is available at ReadTheDocs.
  • R: documentation is available internally in the package (e.g. help(isolation.forest)) and in CRAN.
  • C++: documentation is available in the public header (include/isotree.hpp) and in the source files. See also the header for the scikit-learn-like interface (include/isotree_oop.hpp).
  • C: the interface is not documented per se, but the same documentation from the C++ header applies to it. See also its header for some non-comprehensive comments about the parameters that functions take (include/isotree_c.h).
  • Ruby: see external repository with wrapper for the syntax and the Python docs for details about the parameters.

Help wanted

The package does not currently have any functionality for visualizing trees. Pull requests adding such functionality would be welcome.

References

  • Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008.
  • Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 6.1 (2012): 3.
  • Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended Isolation Forest." arXiv preprint arXiv:1811.02141 (2018).
  • Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "On detecting clustered anomalies using SCiForest." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2010.
  • https://sourceforge.net/projects/iforest/
  • https://math.stackexchange.com/questions/3388518/expected-number-of-paths-required-to-separate-elements-in-a-binary-tree
  • Quinlan, J. Ross. C4.5: Programs for machine learning. Elsevier, 2014.
  • Cortes, David. "Distance approximation using Isolation Forests." arXiv preprint arXiv:1910.12362 (2019).
  • Cortes, David. "Imputing missing values with unsupervised random trees." arXiv preprint arXiv:1911.06646 (2019).
  • Cortes, David. "Revisiting randomized choices in isolation forests." arXiv preprint arXiv:2110.13402 (2021).
Comments
  • R Crashes when running isolation.forest ndim possibly the issue

    Hi, I believe I have narrowed it down to the ndim parameter. Each time I run isolation.forest with the ndim parameter > 1, the R session aborts immediately. Can you please help me understand why this happens?

    opened by tararae7 50
  • Error in fit_model

    I get this error when running the following model, but I don't always get the error. Sometimes it runs fine.

    Error in fit_model(pdata$X_num, pdata$X_cat, unname(pdata$ncat), pdata$Xc, : negative length vectors are not allowed

    isotree_mdl2 <- isolation.forest(df,
                                     ntrees = 400,
                                     sample_size = 256,
                                     ndim = 1,
                                     prob_pick_pooled_gain = 0,
                                     prob_pick_avg_gain = 0,
                                     penalize_range = FALSE,
                                     missing_action = "fail",
                                     nthreads = parallel::detectCores() - 9)
      
    opened by tararae7 31
  • Trapped signals

    Hi David, it looks like running fit_iforest adds a signal handler that causes signals to be ignored. You can test it out with a basic Flask app.

    from flask import Flask
    import numpy as np
    from isotree import IsolationForest
    app = Flask(__name__)
    
    @app.route('/')
    def hello_world():
        X = np.random.normal(size = (100, 2))
        iso = IsolationForest(ntrees = 10, ndim = 2, nthreads = 1)
        iso.fit(X)
        return 'Hello, World!'
    

    Start the server and visit the page. When trying to exit with ctrl+C, it prints:

    ^CError: procedure was interrupted
    ^CError: procedure was interrupted
    ^CError: procedure was interrupted
    ^CError: procedure was interrupted
    ^CError: procedure was interrupted
    
    opened by ankane 18
  • isolation.forest() is not reproducible whenever `nthreads > 1`

    Hi @david-cortes, thanks for a great package. I'm writing a book on tree-based methods and am including a section on isolation forests using your package (which works really well). I've noticed, however, that the anomaly scores are not reproducible (at least for me) when specifying the seed via set.seed() or the random_seed argument. Reproducible example below:

    library(isotree)
    
    
    # Generate fake data (no anomalies)
    set.seed(101)
    X <- as.data.frame(matrix(rnorm(5 * 100), ncol = 5))
    
    # Fit an isolation forest
    ifo <- isolation.forest(X, random_seed = 102)
    
    # Compute anomaly scores
    head(scores <- predict(ifo, newdata = X))
    # [1] 0.4002608 0.4996714 0.5253563 0.4303659 0.4204118 0.4323855
    
    #
    # Run again, but notice different scores with same seed
    #
    
    # Generate fake data (no anomalies)
    set.seed(101)
    X <- as.data.frame(matrix(rnorm(5 * 100), ncol = 5))
    
    # Fit an isolation forest
    ifo <- isolation.forest(X, random_seed = 102)
    
    # Compute anomaly scores
    head(scores <- predict(ifo, newdata = X))
    # [1] 0.3950409 0.4929140 0.5302152 0.4239435 0.4225947 0.4325836
    

    Is this a bug, or am I missing something?

    opened by bgreenwell 13
  • Reproductibility problems with Extented Isolation Forest

    Hi @david-cortes, thank you for this great package.

    I'm currently using isotree to fit an extended isolation forest model. My issue is the following: I created, fitted and tested for anomaly detection an instance of IsolationForest with: (ndim=2, max_samples = int(len(data)/20), ntrees=500, ntry=1, random_seed=0,max_depth=12, missing_action="fail", coefs="normal", standardize_data=True, penalize_range=True,n_threads=2,bootstrap=False,prob_pick_pool_gain=1)

    After this I implemented the same model with the same hyperparameters in another script of mine. However, when looking at the scores after fitting this model to the same data as the previous one, I find different values. The values are really close to the ones obtained previously but are still different. I wonder whether there is a randomness factor that I didn't control through my parameters (I thought fixing the random seed would suffice) or if it is a real issue. Many thanks in advance for your assistance.

    opened by Harmadah 8
  • problem saving (exporting) model with imputer

    Hi. I upgraded to the latest version (Successfully installed isotree-0.1.31) so the imputer model can get saved alongside the main model. I am getting the following error:

    iso.export_model(model_save_folder + preprocess_missing_model_file_name, use_cpp=True)


    RuntimeError Traceback (most recent call last) in
          5 res = _files_fct.save_object_to_pkl(missing_imputer, model_save_folder + preprocess_missing_model_file_name)
          6 else:
    ----> 7 iso.export_model(model_save_folder + preprocess_missing_model_file_name, use_cpp=True)
          8 del iso
          9 gc.collect()

    /anaconda/envs/isotree_2_missing/lib/python3.8/site-packages/isotree/__init__.py in export_model(self, file, use_cpp)
       1458 with open(file + ".metadata", "w") as of:
       1459 json.dump(metadata, of, indent=4)
    -> 1460 self._cpp_obj.serialize_obj(file, use_cpp, self.ndim > 1, has_imputer=self.build_imputer)
       1461 return self
       1462

    isotree/cpp_interface.pyx in isotree._cpp_interface.isoforest_cpp_obj.serialize_obj()

    RuntimeError: Failed to write 3064 bytes to output stream! Wrote 864

    Any idea? It's a large model - could this be the problem?

    Thank you

    opened by tmontana 8
  • Installing isotree compatibility issues

    Hi David,

    We are having issues installing isotree on our R Server. I could install it fine locally on my laptop, but when I requested to have it installed on our R Studio Server we received incompatibility errors. I am being told it's possibly a C++ issue. Are you familiar with any incompatibilities like this when trying to install the package?

    opened by tararae7 8
  • Difficulty installing Isotree for R on Linux

    I am trying to install isotree for R on Linux but I am getting the following error:

    In file included from Rwrapper.cpp:75:0:
    isotree.h:224:12: error: ‘isinf’ is already declared in this scope
     using std::isinf;
     ^
    isotree.h:225:12: error: ‘isnan’ is already declared in this scope
     using std::isnan;
     ^
    make: *** [Rwrapper.o] Error 1
    ERROR: compilation failed for package ‘isotree’
    * removing ‘/home/myname/R/x86_64-pc-linux-gnu-library/4.0/isotree’
    Warning in install.packages : installation of package ‘isotree’ had non-zero exit status

    The downloaded source packages are in ‘/tmp/RtmpGs2W4l/downloaded_packages’

    I'm out of my depth with this error and don't know how to troubleshoot it. I'd be very grateful if you had any suggestions.

    opened by jgarrigan 7
  • Problem with saving trained isolation forest when `categ_cols` is not None.

    Hello David,

    I am facing another issue when I am trying to save a trained iso forest with categ_cols not None. When I do not provide any categorical column numbers (when its None), the model is saved, but when this is not the case, I get this error:

      File "/scratch/vsahil/data-drift-explanation/GOAD/goad-pyenv/lib/python3.6/site-packages/isotree/__init__.py", line 2132, in export_model
        json.dump(metadata, of, indent=4)
      File "/usr/lib64/python3.6/json/__init__.py", line 179, in dump
        for chunk in iterable:
      File "/usr/lib64/python3.6/json/encoder.py", line 430, in _iterencode
        yield from _iterencode_dict(o, _current_indent_level)
      File "/usr/lib64/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/usr/lib64/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/usr/lib64/python3.6/json/encoder.py", line 325, in _iterencode_list
        yield from chunks
      File "/usr/lib64/python3.6/json/encoder.py", line 437, in _iterencode
        o = _default(o)
      File "/usr/lib64/python3.6/json/encoder.py", line 180, in default
        o.__class__.__name__)
    TypeError: Object of type 'int32' is not JSON serializable
    
    
    Do you have any clue why this is happening and how can I circumvent this problem? I verified that all the values in the columns marked as categorical are integers with values starting at 0 (I am passing a numpy array in the `fit` function). 
    opened by vsahil 6
  • pip install issue: cc1plus no such file or directory

    Issue

    I cannot pip install isotree. Is there a way to fix it without sudo? Is there a pre-built wheel anywhere? Thanks!

    Command and output

    pip install isotree --no-cache-dir
    Defaulting to user installation because normal site-packages is not writeable
    Looking in indexes: ...
    Collecting isotree
      Downloading 
    ...tar.gz (288 kB)
         ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.1/288.1 kB 15.1 MB/s eta 0:00:00
      Installing build dependencies ... done
      Getting requirements to build wheel ... done
      Preparing metadata (pyproject.toml) ... done
    Building wheels for collected packages: isotree
      Building wheel for isotree (pyproject.toml) ... error
      error: subprocess-exited-with-error
      
      × Building wheel for isotree (pyproject.toml) did not run successfully.
      │ exit code: 1
      ╰─> [29 lines of output]
          /tmp/pip-build-env-v6v2li_1/overlay/lib/python3.9/site-packages/setuptools/_distutils/extension.py:134: UserWarning: Unknown Extension options: 'install_requires'
            warnings.warn(msg)
          running bdist_wheel
          running build
          running build_py
          creating build
          creating build/lib.linux-x86_64-cpython-39
          creating build/lib.linux-x86_64-cpython-39/isotree
          copying isotree/__init__.py -> build/lib.linux-x86_64-cpython-39/isotree
          running build_ext
          g++: error: unrecognized command line option ‘-std=c++17’
          g++: error: unrecognized command line option ‘-std=gnu++14’
          --- Checking compiler support for option '-fopenmp'
          --- Checking compiler support for '__restrict' qualifier
          --- Checking compiler support for option '-O3'
          --- Checking compiler support for option '-fno-math-errno'
          --- Checking compiler support for option '-fno-trapping-math'
          --- Checking compiler support for option '-std=c++17'
          --- Checking compiler support for option '-std=gnu++14'
          --- Checking compiler support for option '-std=c++11'
          --- Checking compiler support for option '-flto'
          cythoning isotree/cpp_interface.pyx to isotree/cpp_interface.cpp
          building 'isotree._cpp_interface' extension
          creating build/temp.linux-x86_64-cpython-39
          creating build/temp.linux-x86_64-cpython-39/isotree
          creating build/temp.linux-x86_64-cpython-39/src
          gcc -pthread -B /opt/deep_learning/conda/envs/my_env/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/deep_learning/conda/envs/my_env/include -fPIC -O2 -isystem /opt/deep_learning/conda/envs/my_env/include -march=x86-64 -fPIC -D_USE_XOSHIRO -D_FOR_PYTHON -DSUPPORTS_RESTRICT=1 -D_USE_ROBIN_MAP -I/tmp/pip-build-env-v6v2li_1/overlay/lib/python3.9/site-packages/numpy/core/include -I. -I./src -I/opt/deep_learning/conda/envs/my_env/include/python3.9 -c isotree/cpp_interface.cpp -o build/temp.linux-x86_64-cpython-39/isotree/cpp_interface.o -fopenmp -O3 -fno-math-errno -fno-trapping-math -std=c++11 -flto
          gcc: error trying to exec 'cc1plus': execvp: No such file or directory
          error: command '/usr/bin/gcc' failed with exit code 1
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for isotree
    Failed to build isotree
    ERROR: Could not build wheels for isotree, which is required to install pyproject.toml-based projects
    

    System information

    Ubuntu 2018.03 on a managed cloud computing service within a docker container. I don't have the ability to sudo install stuff and there is a reluctance to change the docker image.

    opened by rmurphy2718 5
  • Same category is always imputed when enough trees are grown

    See this example:

    from sklearn.datasets import load_iris
    import pandas as pd
    import numpy as np
    iris = pd.concat(load_iris(return_X_y=True, as_frame=True), axis=1)
    iris["target"] = iris["target"].astype("category")
    
    amp_iris = iris.copy()
    na_where = {}
    for c in iris.columns:
        na_where[c] = sorted(np.random.choice(amp_iris.shape[0], size=25, replace=False))
        amp_iris.loc[na_where[c],c] = np.NaN
    
    # Only class 0 was imputed
    from isotree import IsolationForest
    imputer = IsolationForest(
        ntrees=100,
        build_imputer=True,
        ndim=1,
        missing_action="impute"
    )
    imp_iris = imputer.fit_transform(amp_iris)
    t = "target"
    imp_iris.loc[na_where[t], t].unique()
    
    # Use less trees, process is much more accurate
    imputer = IsolationForest(
        ntrees=10,
        build_imputer=True,
        ndim=1,
        missing_action="impute"
    )
    imp_iris = imputer.fit_transform(amp_iris)
    (imp_iris.loc[na_where[t], t] == iris.loc[na_where[t], t]).mean()
    

    Using any number of trees over 100 caused only the first class (0) to ever be imputed. Using only 10 trees usually makes the imputation much more accurate. I tried playing around with different max_depths, but to no avail. Are there any obvious parameters I am missing to make the categorical imputation more accurate?

    opened by AnotherSamWilson 3