machine learning with logical rules in Python

Last update: Dec 31, 2022

Related tags

Sklearn Utilities skope-rules

Overview

skope-rules

Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.

Skope-rules aims at learning logical, interpretable rules for "scoping" a target class, i.e. detecting with high precision instances of this class.

Skope-rules is a trade off between the interpretability of a Decision Tree and the modelization power of a Random Forest.

See the AUTHORS.rst file for a list of contributors.

Installation

You can get the latest sources with pip :

pip install skope-rules

Quick Start

SkopeRules can be used to describe classes with logical rules :

from sklearn.datasets import load_iris
from skrules import SkopeRules

dataset = load_iris()
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=feature_names)

for idx, species in enumerate(dataset.target_names):
    X, y = dataset.data, dataset.target
    clf.fit(X, y == idx)
    rules = clf.rules_[0:3]
    print("Rules for iris", species)
    for rule in rules:
        print(rule)
    print()
    print(20*'=')
    print()

SkopeRules can also be used as a predictor if you use the "score_top_rules" method :

from sklearn.datasets import load_boston
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot as plt
from skrules import SkopeRules

dataset = load_boston()
clf = SkopeRules(max_depth_duplication=None,
                 n_estimators=30,
                 precision_min=0.2,
                 recall_min=0.01,
                 feature_names=dataset.feature_names)

X, y = dataset.data, dataset.target > 25
X_train, y_train = X[:len(y)//2], y[:len(y)//2]
X_test, y_test = X[len(y)//2:], y[len(y)//2:]
clf.fit(X_train, y_train)
y_score = clf.score_top_rules(X_test) # Get a risk score for each test example
precision, recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall curve')
plt.show()

For more examples and use cases please check our documentation. You can also check the demonstration notebooks.

Links with existing literature

The main advantage of decision rules is that they are offering interpretable models. The problem of generating such rules has been widely considered in machine learning, see e.g. RuleFit [1], Slipper [2], LRI [3], MLRules[4].

A decision rule is a logical expression of the form "IF conditions THEN response". In a binary classification setting, if an instance satisfies conditions of the rule, then it is assigned to one of the two classes. If this instance does not satisfy conditions, it remains unassigned.

In [2, 3, 4], rules induction is done by considering each single decision rule as a base classifier in an ensemble, which is built by greedily minimizing some loss function.
In [1], rules are extracted from an ensemble of trees; a weighted combination of these rules is then built by solving a L1-regularized optimization problem over the weights as described in [5].

In this package, we use the second approach. Rules are extracted from tree ensemble, which allow us to take advantage of existing fast algorithms (such as bagged decision trees, or gradient boosting) to produce such tree ensemble. Too similar or duplicated rules are then removed, based on a similarity threshold of their supports.. The main goal of this package is to provide rules verifying precision and recall conditions. It still implement a score (decision_function) method, but which does not solve the L1-regularized optimization problem as in [1]. Instead, weights are simply proportional to the OOB associated precision of the rule.

This package also offers convenient methods to compute predictions with the k most precise rules (cf score_top_rules() and predict_top_rules() functions).

[1] Friedman and Popescu, Predictive learning via rule ensembles,Technical Report, 2005.

[2] Cohen and Singer, A simple, fast, and effective rule learner, National Conference on Artificial Intelligence, 1999.

[3] Weiss and Indurkhya, Lightweight rule induction, ICML, 2000.

[4] Dembczyński, Kotłowski and Słowiński, Maximum Likelihood Rule Ensembles, ICML, 2008.

[5] Friedman and Popescu, Gradient directed regularization, Technical Report, 2004.

Dependencies

skope-rules requires:

Python (>= 2.7 or >= 3.3)
NumPy (>= 1.10.4)
SciPy (>= 0.17.0)
Pandas (>= 0.18.1)
Scikit-Learn (>= 0.17.1)

For running the examples Matplotlib >= 1.1.1 is required.

Documentation

You can access the full project documentation here

You can also check the notebooks/ folder which contains some examples of utilization.

Comments

TerminatedWorkerError

I keep running into a TerminatedWorkerError when running clf.fit with skope rules. I seem to have ample memory so I'm unsure what's going on. Any potential ideas?

Traceback (most recent call last):
  File "experiment.py", line 171, in <module>
    result = process(topic)
  File "experiment.py", line 95, in process
    clf.fit(features, training_data_labels)
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/skrules/skope_rules.py", line 312, in fit
    clf.fit(X, y)
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 244, in fit
    return self._fit(X, y, self.max_samples, sample_weight=sample_weight)
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 378, in _fit
    for i in range(n_jobs))
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 930, in __call__
    self.retrieve()
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
sklearn.externals.joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}

opened by AlJohri 37

Remove unnecessary checks for numpy/scipy in setup.py.

These deps are listing in install_requires, so will get installed by pip.

Addresses issue #19.

Interesting project, thanks for your dev efforts folks :-)

opened by timstaley 7

Skope Rules should accept any kind of feature name

SkopeRules uses pandas.eval method for evaluating semantic rules. It leads to error when features have meaningful characters in their name (eg: (,)=- ). For example :

from sklearn.datasets import load_iris
from skrules import SkopeRules
dataset = load_iris()

X, y, features_names = dataset.data, dataset.target, dataset.feature_names
y = (y == 0)  # Predicting the first specy vs all
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=features_names)
clf.fit(X, y)

will lead to following error :

Traceback (most recent call last):
  File "main.py", line 20, in <module>
    clf.fit(X, y)
  File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in fit
    for r in set(rules_from_tree)]
  File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in <listcomp>
    for r in set(rules_from_tree)]
  File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 600, in _eval_rule_perf
    detected_index = list(X.query(rule).index)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2297, in query
    res = self.eval(expr, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2366, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 290, in eval
    truediv=truediv)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 732, in __init__
    self.terms = self.parse()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 749, in parse
    return self._visitor.visit(self.expr)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 310, in visit
    node = ast.fix_missing_locations(ast.parse(clean))
  File "/usr/local/Cellar/python3/3.6.4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    petal length (cm )<=2.5999999046325684

Skope Rules should accept any kind of feature name. It means we have to transform feature name for computation and transforming it back at the end.

opened by floriangardin 3

Fix import error in modern Python

collections.Iterable alias was removed in Python 3.10 and typing.Iterable alias is marked as deprecated; fallback to explicit import from collections.abc.

opened by patrick-nicholson-czi 1

ImportError: cannot import name 'Iterable' from 'collections'

Python 3.10
skope-rules==1.0.1

Error

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [1], in <module>
     15 from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
     16 from interpret.glassbox import ExplainableBoostingClassifier
---> 17 from skrules import SkopeRules

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/__init__.py:1, in <module>
----> 1 from .skope_rules import SkopeRules
      2 from .rule import Rule, replace_feature_name
      4 __all__ = ['SkopeRules', 'Rule']

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/skope_rules.py:2, in <module>
      1 import numpy as np
----> 2 from collections import Counter, Iterable
      3 import pandas
      4 import numbers

ImportError: cannot import name 'Iterable' from 'collections' (/t/pyenv/versions/3.10.2/lib/python3.10/collections/__init__.py)

opened by elcolie 1

Fix/update tests

Description

This PR contains some fixes to get tests working and build passing, primarily around updating tests and imports to handle deprecation of various sklearn testing functions and other imports.

Instead of pinning a new sklearn version, have tried to maintain compatibility with a bunch of try-except blocks, but would be happy to hear thoughts on this approach. May also be worthwhile to add different sklearn versions in the travis CI build.

opened by AndrewTanQB 1
issue in mask indexing

Hi, thank you for sharing this great package.

However, I think I might find a mistake in the mask indexing.

mask = ~samples

samples is numpy array, and when you put ~, you can get -(value+1).

ex. samples = np.array([1,2,3,4]) ~samples [-2, -3, -4, -5]

please check this issue.

Thanks!

opened by stat17-hb 1
Release new version to pypi.org?

There are a number of useful commits on the master branch, e.g. https://github.com/scikit-learn-contrib/skope-rules/pull/24.

It's been more than 1.5 years since the last release. Would it be possible for you to upload a new package to pypi.org?

opened by ecederstrand 1
Any variable name can be used in "feature_names"

Now any variable name can be used in the "feature_names" list parameter of Skope Rules. I decoupled the feature names from the internal queries logic.

opened by floriangardin 1
conda-forge package

It would be nice to add a skope-rules package to conda-forge https://conda-forge.org/ (in addition to pypi)

P.S. You can use grayskull https://github.com/conda-incubator/grayskull to generate a boilerplate for the conda recipe.

opened by candalfigomoro 2
cannot import name 'ScopeRules' from 'skrules'

Hi!

The package import spell, which is clearly described in the package readme, does not work

Six imported. What should I do to make the package work?

opened by avraam-inside 1
Questions about how to use and interpret rules?
Can SkopeRules be used for multiclass classification or only binary classification.

How do I interpret the outputted decision rules? Do the top-k rules in the example notebook correspond to the rules that best classify the test data, ordered in descending order by precision? If I want to classify new test data, do I consider the top-1 rule, the majority vote from the top-k rules, or some other approach?

If I want to understand the underlying method and how rules are computed, is Predictive Learning via Rule Ensembles by Friedman and Popescu the closest work?
opened by preethiseshadri518 0

Not compatible with sklearn v1?

Minimal example:

>>> import sklearn
>>> sklearn.__version__
1.0.1
>>> import skrules
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-195b491d5645> in <module>
----> 1 import skrules

~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/__init__.py in <module>
----> 1 from .skope_rules import SkopeRules
      2 from .rule import Rule, replace_feature_name
      3 
      4 __all__ = ['SkopeRules', 'Rule']

~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/skope_rules.py in <module>
     10 from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
     11 from sklearn.ensemble import BaggingClassifier, BaggingRegressor
---> 12 from sklearn.externals import six
     13 from sklearn.tree import _tree
     14 

ImportError: cannot import name 'six' from 'sklearn.externals' (/home/mwintner/.virtualenvs/risk-modeling/lib/python3.9/site-packages/sklearn/externals/__init__.py)

According to some stackoverflow sources like this one, six is not in sklearn.externals beyond sklearn v0.23.

opened by mwintner-fn 1

The oob score

I think the oob score computed in the fit function is wrong.

The authors get the oob sample indices by "mask = ~samples", and then apply X[mask, :] to get the oob samples. Actually, I test the case and found that there are many same elements between samples and X[mask,:], and the length of training samples and mask samples are the same. For example, if we totally have 100 samples, when 80 samples are used to train the model, then the length of oob samples should be 100-80=20 (without considering replacement).

I also turn to the implementation of sampling oob of randomforest, and I found following codes:

random_instance = check_random_state(random_state) sample_indices = random_instance.randint(0, samples, max_samples) # get the indices of training samples sample_counts = np.bincount(sample_indices, minlength=len(samples)) unsampled_mask = sample_counts == 0 indices_range = np.arange(len(samples)) unsampled_indices = indices_range[unsampled_mask] # get the indices of oob samples

then the unsampled_indices is the truely oob sample indices.

opened by wjj5881005 0

Releases(v1.0.1)

v1.0.1(Dec 11, 2020)

Source code(tar.gz)
Source code(zip)

Owner

scikit-learn compatible projects

GitHub http://skope-rules.readthedocs.io

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

4.2k Dec 28, 2022

(AAAI' 20) A Python Toolbox for Machine Learning Model Combination

combo: A Python Toolbox for Machine Learning Model Combination Deployment & Documentation & Stats Build Status & Coverage & Maintainability & License

606 Dec 21, 2022

Large-scale linear classification, regression and ranking in Python

lightning lightning is a library for large-scale linear classification, regression and ranking in Python. Highlights: follows the scikit-learn API con

1.6k Dec 31, 2022

Multivariate imputation and matrix completion algorithms implemented in Python

A variety of matrix completion and imputation algorithms implemented in Python 3.6. To install: pip install fancyimpute Do not use conda. We don't sup

1.1k Dec 18, 2022

A Python library for dynamic classifier and ensemble selection

DESlib DESlib is an easy-to-use ensemble learning library focused on the implementation of the state-of-the-art techniques for dynamic classifier and

425 Dec 18, 2022

Topological Data Analysis for Python🐍

Scikit-TDA is a home for Topological Data Analysis Python libraries intended for non-topologists. This project aims to provide a curated library of TD

373 Dec 24, 2022

Fully Automated YouTube Channel ▶️with Added Extra Features.

Fully Automated Youtube Channel ▒█▀▀█ █▀▀█ ▀▀█▀▀ ▀▀█▀▀ █░░█ █▀▀▄ █▀▀ █▀▀█ ▒█▀▀▄ █░░█ ░░█░░ ░▒█░░ █░░█ █▀▀▄ █▀▀ █▄▄▀ ▒█▄▄█ ▀▀▀▀ ░░▀░░ ░▒█░░ ░▀▀▀ ▀▀▀░

249 Jan 2, 2023

Deep Learning and Logical Reasoning from Data and Knowledge

Logic Tensor Networks (LTN) Logic Tensor Network (LTN) is a neurosymbolic framework that supports querying, learning and reasoning with both rich data

171 Dec 29, 2022

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage

0 Sep 5, 2021

Mastermind-Game - A game to test programming and logical skills

Bem vindo ao jogo Mastermind! O jogo consiste em adivinhar uma senha que será ge

0 Jan 27, 2022

One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

One Stop Anomaly Shop (OSAS) Quick start guide Step 1: Get/build the docker image Option 1: Use precompiled image (might not reflect latest changes):

148 Dec 26, 2022

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl

6.1k Jan 5, 2023

Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir

6 Jun 30, 2022

Arbitrium is a cross-platform, fully undetectable remote access trojan, to control Android, Windows and Linux and doesn't require any firewall exceptions or port forwarding rules

About: Arbitrium is a cross-platform is a remote access trojan (RAT), Fully UnDetectable (FUD), It allows you to control Android, Windows and Linux an

861 Feb 18, 2021