Machine learning with logical rules in Python

Overview


skope-rules

Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.

Skope-rules aims to learn logical, interpretable rules for "scoping" a target class, i.e., detecting instances of this class with high precision.

Skope-rules is a trade-off between the interpretability of a decision tree and the modeling power of a random forest.

See the AUTHORS.rst file for a list of contributors.


Installation

You can install the latest release with pip:

pip install skope-rules

Quick Start

SkopeRules can be used to describe classes with logical rules:

from sklearn.datasets import load_iris
from skrules import SkopeRules

dataset = load_iris()
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=feature_names)

X, y = dataset.data, dataset.target
for idx, species in enumerate(dataset.target_names):
    clf.fit(X, y == idx)
    rules = clf.rules_[0:3]
    print("Rules for iris", species)
    for rule in rules:
        print(rule)
    print()
    print(20*'=')
    print()

SkopeRules can also be used as a predictor if you use the "score_top_rules" method:

from sklearn.datasets import load_boston
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot as plt
from skrules import SkopeRules

dataset = load_boston()
clf = SkopeRules(max_depth_duplication=None,
                 n_estimators=30,
                 precision_min=0.2,
                 recall_min=0.01,
                 feature_names=dataset.feature_names)

X, y = dataset.data, dataset.target > 25
X_train, y_train = X[:len(y)//2], y[:len(y)//2]
X_test, y_test = X[len(y)//2:], y[len(y)//2:]
clf.fit(X_train, y_train)
y_score = clf.score_top_rules(X_test) # Get a risk score for each test example
precision, recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall curve')
plt.show()

For more examples and use cases please check our documentation. You can also check the demonstration notebooks.

Links with existing literature

The main advantage of decision rules is that they offer interpretable models. The problem of generating such rules has been widely studied in machine learning; see e.g. RuleFit [1], SLIPPER [2], LRI [3], MLRules [4].

A decision rule is a logical expression of the form "IF conditions THEN response". In a binary classification setting, an instance that satisfies the conditions of the rule is assigned to one of the two classes; an instance that does not satisfy the conditions remains unassigned.
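For instance, the condition part of a rule can be applied to a feature matrix as a boolean query. A minimal pandas sketch (the column names and thresholds are illustrative, not rules learned by the package):

```python
import pandas as pd

# Toy feature matrix with column names similar to the Quick Start example.
X = pd.DataFrame({'petal_length': [1.4, 4.7, 5.1],
                  'petal_width': [0.2, 1.4, 1.9]})

# Condition part of a rule "IF conditions THEN response".
rule = 'petal_length <= 2.45 and petal_width <= 0.8'

# Instances satisfying the conditions are assigned to the class;
# the others remain unassigned by this rule.
assigned = X.query(rule)
print(assigned.index.tolist())  # [0]
```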

  1. In [2, 3, 4], rule induction is done by treating each decision rule as a base classifier in an ensemble, which is built by greedily minimizing a loss function.
  2. In [1], rules are extracted from an ensemble of trees; a weighted combination of these rules is then built by solving an L1-regularized optimization problem over the weights, as described in [5].

In this package, we use the second approach. Rules are extracted from a tree ensemble, which lets us take advantage of existing fast algorithms (such as bagged decision trees or gradient boosting) for building the ensemble. Duplicated or overly similar rules are then removed, based on a similarity threshold over their supports. The main goal of this package is to provide rules that satisfy precision and recall conditions. It still implements a scoring (decision_function) method, but one that does not solve the L1-regularized optimization problem of [1]; instead, weights are simply proportional to each rule's out-of-bag (OOB) precision.
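The exact similarity measure over supports is not spelled out here; one common choice, shown purely as an assumption, is the Jaccard similarity of the boolean support masks:

```python
import numpy as np

def jaccard(support_a, support_b):
    """Jaccard similarity of two boolean rule supports: fraction of
    samples covered by both rules among those covered by either."""
    inter = np.logical_and(support_a, support_b).sum()
    union = np.logical_or(support_a, support_b).sum()
    return inter / union if union else 0.0

# Two near-duplicate rules covering almost the same samples would be
# merged under, say, a 0.6 threshold, keeping the rule with the
# better OOB precision.
a = np.array([True, True, True, False, False])
b = np.array([True, True, False, False, False])
print(jaccard(a, b))  # 2/3
```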

This package also offers convenient methods to compute predictions with the k most precise rules (cf. the score_top_rules() and predict_top_rules() methods).

[1] Friedman and Popescu, Predictive learning via rule ensembles, Technical Report, 2005.

[2] Cohen and Singer, A simple, fast, and effective rule learner, National Conference on Artificial Intelligence, 1999.

[3] Weiss and Indurkhya, Lightweight rule induction, ICML, 2000.

[4] Dembczyński, Kotłowski and Słowiński, Maximum Likelihood Rule Ensembles, ICML, 2008.

[5] Friedman and Popescu, Gradient directed regularization, Technical Report, 2004.

Dependencies

skope-rules requires:

  • Python (>= 2.7 or >= 3.3)
  • NumPy (>= 1.10.4)
  • SciPy (>= 0.17.0)
  • Pandas (>= 0.18.1)
  • Scikit-Learn (>= 0.17.1)

For running the examples, Matplotlib (>= 1.1.1) is required.

Documentation

You can access the full project documentation here.

You can also check the notebooks/ folder, which contains some usage examples.

Comments
  • TerminatedWorkerError

    I keep running into a TerminatedWorkerError when running clf.fit with skope rules. I seem to have ample memory so I'm unsure what's going on. Any potential ideas?

    Traceback (most recent call last):
      File "experiment.py", line 171, in <module>
        result = process(topic)
      File "experiment.py", line 95, in process
        clf.fit(features, training_data_labels)
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/skrules/skope_rules.py", line 312, in fit
        clf.fit(X, y)
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 244, in fit
        return self._fit(X, y, self.max_samples, sample_weight=sample_weight)
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 378, in _fit
        for i in range(n_jobs))
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 930, in __call__
        self.retrieve()
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 833, in retrieve
        self._output.extend(job.get(timeout=self.timeout))
      File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 521, in wrap_future_result
        return future.result(timeout=timeout)
      File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
    sklearn.externals.joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}
    
    opened by AlJohri 37
  • Remove unnecessary checks for numpy/scipy in setup.py.

    These deps are listed in install_requires, so they will get installed by pip.

    Addresses issue #19.

    Interesting project, thanks for your dev efforts folks :-)

    opened by timstaley 7
  • Skope Rules should accept any kind of feature name

    SkopeRules uses the pandas.eval method to evaluate semantic rules. This leads to errors when feature names contain special characters (e.g. (, ), =, -). For example:

    from sklearn.datasets import load_iris
    from skrules import SkopeRules
    dataset = load_iris()
    
    X, y, features_names = dataset.data, dataset.target, dataset.feature_names
    y = (y == 0)  # Predicting the first species vs all
    clf = SkopeRules(max_depth_duplication=2,
                     n_estimators=30,
                     precision_min=0.3,
                     recall_min=0.1,
                     feature_names=features_names)
    clf.fit(X, y)
    

    will lead to the following error:

    Traceback (most recent call last):
      File "main.py", line 20, in <module>
        clf.fit(X, y)
      File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in fit
        for r in set(rules_from_tree)]
      File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in <listcomp>
        for r in set(rules_from_tree)]
      File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 600, in _eval_rule_perf
        detected_index = list(X.query(rule).index)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2297, in query
        res = self.eval(expr, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2366, in eval
        return _eval(expr, inplace=inplace, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 290, in eval
        truediv=truediv)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 732, in __init__
        self.terms = self.parse()
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 749, in parse
        return self._visitor.visit(self.expr)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 310, in visit
        node = ast.fix_missing_locations(ast.parse(clean))
      File "/usr/local/Cellar/python3/3.6.4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ast.py", line 35, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        petal length (cm )<=2.5999999046325684
    

    Skope Rules should accept any kind of feature name. This means we have to transform feature names for computation and transform them back at the end.
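    The proposed fix can be sketched as a reversible renaming pass (the helpers below are hypothetical, not the actual skope-rules implementation):

```python
import re

def sanitize_feature_names(feature_names):
    """Replace arbitrary feature names with identifiers that are safe
    for pandas.eval/query; also return the inverse mapping."""
    safe_names, inverse = [], {}
    for i, name in enumerate(feature_names):
        safe = '__C%d__' % i  # unambiguous placeholder identifier
        safe_names.append(safe)
        inverse[safe] = name
    return safe_names, inverse

def restore_names(rule, inverse):
    """Substitute the placeholders in a rule string back to the
    original feature names for display."""
    pattern = re.compile('|'.join(map(re.escape, inverse)))
    return pattern.sub(lambda m: inverse[m.group(0)], rule)

safe, inv = sanitize_feature_names(['petal length (cm)', 'petal width (cm)'])
rule = safe[0] + ' <= 2.6 and ' + safe[1] + ' > 0.8'
print(restore_names(rule, inv))  # petal length (cm) <= 2.6 and petal width (cm) > 0.8
```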

    opened by floriangardin 3
  • Fix import error in modern Python

    The collections.Iterable alias was removed in Python 3.10 and the typing.Iterable alias is deprecated; fall back to an explicit import from collections.abc.
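    The fallback can be written as a guarded import, a standard compatibility pattern (not necessarily the exact patch):

```python
try:
    # Python < 3.10 still exposes the alias in `collections`.
    from collections import Iterable
except ImportError:
    # Removed in Python 3.10; the canonical home is `collections.abc`.
    from collections.abc import Iterable

print(isinstance([1, 2, 3], Iterable))  # True
```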

    opened by patrick-nicholson-czi 1
  • ImportError: cannot import name 'Iterable' from 'collections'

    Python 3.10
    skope-rules==1.0.1
    

    Error

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    Input In [1], in <module>
         15 from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
         16 from interpret.glassbox import ExplainableBoostingClassifier
    ---> 17 from skrules import SkopeRules
    
    File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/__init__.py:1, in <module>
    ----> 1 from .skope_rules import SkopeRules
          2 from .rule import Rule, replace_feature_name
          4 __all__ = ['SkopeRules', 'Rule']
    
    File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/skope_rules.py:2, in <module>
          1 import numpy as np
    ----> 2 from collections import Counter, Iterable
          3 import pandas
          4 import numbers
    
    ImportError: cannot import name 'Iterable' from 'collections' (/t/pyenv/versions/3.10.2/lib/python3.10/collections/__init__.py)
    
    
    opened by elcolie 1
  • Fix/update tests

    Description

    This PR contains some fixes to get tests working and build passing, primarily around updating tests and imports to handle deprecation of various sklearn testing functions and other imports.

    Instead of pinning a new sklearn version, have tried to maintain compatibility with a bunch of try-except blocks, but would be happy to hear thoughts on this approach. May also be worthwhile to add different sklearn versions in the travis CI build.

    opened by AndrewTanQB 1
  • issue in mask indexing

    Hi, thank you for sharing this great package.

    However, I think I might have found a mistake in the mask indexing.

    mask = ~samples

    samples is a NumPy integer array, so applying ~ gives the bitwise NOT, -(value+1), rather than a set complement. For example, with samples = np.array([1, 2, 3, 4]), ~samples yields [-2, -3, -4, -5].

    please check this issue.

    Thanks!
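    For reference, a sketch of a correct OOB mask, counting which indices were never drawn (n is a hypothetical total sample count):

```python
import numpy as np

samples = np.array([1, 2, 3, 4])  # indices drawn with replacement
n = 6                             # total number of training samples

# Count how often each index was drawn; never-drawn indices are OOB.
counts = np.bincount(samples, minlength=n)
oob_mask = counts == 0
print(np.flatnonzero(oob_mask))   # [0 5]
```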

    opened by stat17-hb 1
  • Release new version to pypi.org?

    There are a number of useful commits on the master branch, e.g. https://github.com/scikit-learn-contrib/skope-rules/pull/24.

    It's been more than 1.5 years since the last release. Would it be possible for you to upload a new package to pypi.org?

    opened by ecederstrand 1
  • Any variable name can be used in "feature_names"

    Now any variable name can be used in the "feature_names" list parameter of SkopeRules. I decoupled the feature names from the internal query logic.

    opened by floriangardin 1
  • conda-forge package

    It would be nice to add a skope-rules package to conda-forge https://conda-forge.org/ (in addition to pypi)

    P.S. You can use grayskull https://github.com/conda-incubator/grayskull to generate a boilerplate for the conda recipe.

    opened by candalfigomoro 2
  • cannot import name 'ScopeRules' from 'skrules'

    Hi!

    The package import statement, exactly as described in the package readme, does not work (screenshots omitted).

    six is imported. What should I do to make the package work?

    opened by avraam-inside 1
  • Questions about how to use and interpret rules?

    1. Can SkopeRules be used for multiclass classification, or only binary classification?

    2. How do I interpret the outputted decision rules? Do the top-k rules in the example notebook correspond to the rules that best classify the test data, ordered in descending order by precision? If I want to classify new test data, do I consider the top-1 rule, the majority vote from the top-k rules, or some other approach?

    3. If I want to understand the underlying method and how rules are computed, is Predictive Learning via Rule Ensembles by Friedman and Popescu the closest work?

    opened by preethiseshadri518 0
  • Not compatible with sklearn v1?

    Minimal example:

    >>> import sklearn
    >>> sklearn.__version__
    '1.0.1'
    >>> import skrules
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-3-195b491d5645> in <module>
    ----> 1 import skrules
    
    ~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/__init__.py in <module>
    ----> 1 from .skope_rules import SkopeRules
          2 from .rule import Rule, replace_feature_name
          3 
          4 __all__ = ['SkopeRules', 'Rule']
    
    ~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/skope_rules.py in <module>
         10 from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
         11 from sklearn.ensemble import BaggingClassifier, BaggingRegressor
    ---> 12 from sklearn.externals import six
         13 from sklearn.tree import _tree
         14 
    
    ImportError: cannot import name 'six' from 'sklearn.externals' (/home/mwintner/.virtualenvs/risk-modeling/lib/python3.9/site-packages/sklearn/externals/__init__.py)
    

    According to some stackoverflow sources like this one, six is not in sklearn.externals beyond sklearn v0.23.

    opened by mwintner-fn 1
  • The oob score

    I think the oob score computed in the fit function is wrong.

    The authors get the OOB sample indices with "mask = ~samples", and then apply X[mask, :] to get the OOB samples. I tested this and found many identical elements between samples and X[mask, :], and that the training samples and the masked samples have the same length. For example, if we have 100 samples in total and 80 are used to train the model, the number of OOB samples should be 100 - 80 = 20 (without considering replacement).

    I also looked at the OOB sampling implementation of random forests, and found the following code:

    random_instance = check_random_state(random_state)
    sample_indices = random_instance.randint(0, samples, max_samples)  # get the indices of training samples
    sample_counts = np.bincount(sample_indices, minlength=len(samples))
    unsampled_mask = sample_counts == 0
    indices_range = np.arange(len(samples))
    unsampled_indices = indices_range[unsampled_mask]  # get the indices of oob samples

    Then unsampled_indices contains the true OOB sample indices.

    opened by wjj5881005 0
Releases(v1.0.1)
Owner: scikit-learn compatible projects