Extra blocks for scikit-learn pipelines.

vincent d warmerdam

Last update: Dec 30, 2022

Overview

scikit-lego

We love scikit-learn, but very often we find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a package that offers code quality and testing. This project started as a collaboration between multiple companies in the Netherlands but has since received contributions from around the globe. It was initiated by Matthijs Brouns and Vincent D. Warmerdam as a tool to teach people how to contribute to open source.

Note that we're not formally affiliated with the scikit-learn project at all, but we aim to strictly adhere to their standards.

The same holds with lego. LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this project.

Installation

Install scikit-lego via pip with

python -m pip install scikit-lego

Via conda with

conda install -c conda-forge scikit-lego

Alternatively, to edit and contribute you can fork/clone and run:

python -m pip install -e ".[dev]"
python setup.py develop

Documentation

The documentation can be found at https://scikit-lego.netlify.app.

Usage

We offer custom metrics, models and transformers. You can import them just like you would in scikit-learn.

# the scikit learn stuff we love
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# the scikit-lego stuff we add
from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier

...

mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])

...

Features

Here's a list of features that this library currently offers:

sklego.datasets.load_abalone loads in the abalone dataset
sklego.datasets.load_arrests loads in a dataset with fairness concerns
sklego.datasets.load_chicken loads in the joyful chickweight dataset
sklego.datasets.load_heroes loads a heroes of the storm dataset
sklego.datasets.load_hearts loads a dataset about hearts
sklego.datasets.load_penguins loads a lovely dataset about penguins
sklego.datasets.fetch_creditcard fetch a fraud dataset from openml
sklego.datasets.make_simpleseries make a simulated timeseries
sklego.pandas_utils.add_lags adds lag values in a pandas dataframe
sklego.pandas_utils.log_step a useful decorator to log your pipeline steps
sklego.dummy.RandomRegressor dummy benchmark that predicts random values
sklego.linear_model.DeadZoneRegressor experimental feature that has a deadzone in the cost function
sklego.linear_model.DemographicParityClassifier logistic classifier constrained on demographic parity
sklego.linear_model.EqualOpportunityClassifier logistic classifier constrained on equal opportunity
sklego.linear_model.ProbWeightRegression linear model that treats coefficients as probabilistic weights
sklego.linear_model.LowessRegression locally weighted linear regression
sklego.linear_model.LADRegression least absolute deviation regression
sklego.linear_model.ImbalancedLinearRegression punish over/under-estimation of a model directly
sklego.naive_bayes.GaussianMixtureNB classifies by training a 1D GMM per column per class
sklego.naive_bayes.BayesianGaussianMixtureNB classifies by training a bayesian 1D GMM per class
sklego.mixture.BayesianGMMClassifier classifies by training a bayesian GMM per class
sklego.mixture.BayesianGMMOutlierDetector detects outliers based on a trained bayesian GMM
sklego.mixture.GMMClassifier classifies by training a GMM per class
sklego.mixture.GMMOutlierDetector detects outliers based on a trained GMM
sklego.meta.ConfusionBalancer experimental feature that allows you to balance the confusion matrix
sklego.meta.DecayEstimator adds decay to the sample_weight that the model accepts
sklego.meta.EstimatorTransformer adds a model output as a feature
sklego.meta.OutlierClassifier turns outlier models into classifiers for gridsearch
sklego.meta.GroupedPredictor can split the data into runs and run a model on each
sklego.meta.GroupedTransformer can split the data into runs and run a transformer on each
sklego.meta.SubjectiveClassifier experimental feature to add a prior to your classifier
sklego.meta.Thresholder meta model that allows you to gridsearch over the threshold
sklego.meta.RegressionOutlierDetector meta model that finds outliers by adding a threshold to regression
sklego.meta.ZeroInflatedRegressor predicts zero or applies a regression based on a classifier
sklego.preprocessing.ColumnCapper limits extreme values of the model features
sklego.preprocessing.ColumnDropper drops a column from pandas
sklego.preprocessing.ColumnSelector selects columns based on column name
sklego.preprocessing.InformationFilter transformer that can de-correlate features
sklego.preprocessing.IdentityTransformer returns the same data, allows for concatenating pipelines
sklego.preprocessing.OrthogonalTransformer makes all features linearly independent
sklego.preprocessing.PandasTypeSelector selects columns based on pandas type
sklego.preprocessing.PatsyTransformer applies a patsy formula
sklego.preprocessing.RandomAdder adds randomness in training
sklego.preprocessing.RepeatingBasisFunction repeating feature engineering, useful for timeseries
sklego.preprocessing.DictMapper assign numeric values on categorical columns
sklego.preprocessing.OutlierRemover experimental method to remove outliers during training
sklego.model_selection.KlusterFoldValidation experimental feature that does K folds based on clustering
sklego.model_selection.TimeGapSplit timeseries Kfold with a gap between train/test
sklego.pipeline.DebugPipeline adds debug information to make debugging easier
sklego.pipeline.make_debug_pipeline shorthand function to create a debuggable pipeline
sklego.metrics.correlation_score calculates correlation between model output and feature
sklego.metrics.equal_opportunity_score calculates equal opportunity metric
sklego.metrics.p_percent_score proxy for model fairness with regards to sensitive attribute
sklego.metrics.subset_score calculate a score on a subset of your data (meant for fairness tracking)

New Features

We want to be rather open in what we accept, but we do demand three things before a feature is added to the project:

any new feature contributes towards a demonstrable real-world use case
any new feature passes standard unit tests (we use the ones from scikit-learn)
the feature has been discussed in the issue list beforehand

We automate all of our testing and use pre-commit hooks to keep the code working.

Comments

[FEATURE] Bayesian Kernel Density Classifier

I've been using this Bayesian kernel density classifier for a few years and I thought I should move it out from my poorly organized project to this one here.

The prior is $P(y=0)$. I primarily use it for spatial problems.

It is similar to the GMM Classifier with only 2 caveats I can think of.

Hyperparameters are easier to decide on.
Scaling is worse, I believe because the KDE part scales linearly with the sample size.

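import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neighbors import KernelDensity


# noinspection PyPep8Naming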
class BayesianKernelDensityClassifier(BaseEstimator, ClassifierMixin):
    """
    Bayesian Classifier that uses Kernel Density Estimations to generate the joint distribution
    Parameters:
        - bandwidth: float
        - kernel: for scikit learn KernelDensity
    """
    def __init__(self, bandwidth=0.2, kernel='gaussian'):
        self.classes_, self.models_, self.priors_logp_ = [None] * 3
        self.bandwidth = bandwidth
        self.kernel = kernel

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        training_sets = [X[y == yi] for yi in self.classes_]
        self.models_ = [KernelDensity(bandwidth=self.bandwidth, kernel=self.kernel).fit(x_subset)
                        for x_subset in training_sets]

        self.priors_logp_ = [np.log(x_subset.shape[0] / X.shape[0]) for x_subset in training_sets]
        return self

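    def predict_proba(self, X):
        # Column j holds log p(x | class j); adding the log prior gives the log joint
        logp = np.array([model.score_samples(X) for model in self.models_]).T
        result = np.exp(logp + self.priors_logp_)
        # Normalise each row so the class probabilities sum to 1
        return result / result.sum(1, keepdims=True)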

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), 1)]
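
The posterior calculation in the class above can also be looked at in isolation. The following is a minimal sketch, not the author's code: the function name `kde_class_posteriors` and the toy data are made up for illustration, and the normalisation is done in log space (a swapped-in log-sum-exp step) because exponentiating very negative log densities can underflow to zero for points far from the training data.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def kde_class_posteriors(X_train, y_train, X_new, bandwidth=0.2):
    """Return (classes, posteriors) using one KDE per class and Bayes' rule."""
    classes = np.unique(y_train)
    log_joint = np.empty((X_new.shape[0], classes.size))
    for j, c in enumerate(classes):
        X_c = X_train[y_train == c]
        kde = KernelDensity(bandwidth=bandwidth, kernel="gaussian").fit(X_c)
        # log p(x | y=c) + log P(y=c)
        log_prior = np.log(X_c.shape[0] / X_train.shape[0])
        log_joint[:, j] = kde.score_samples(X_new) + log_prior
    # Subtract the row maximum before exponentiating (log-sum-exp trick)
    log_joint -= log_joint.max(axis=1, keepdims=True)
    posteriors = np.exp(log_joint)
    posteriors /= posteriors.sum(axis=1, keepdims=True)
    return classes, posteriors


# Two well-separated blobs, one per class (illustrative data only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

classes, proba = kde_class_posteriors(X, y, np.array([[-2.0, -2.0], [2.0, 2.0]]))
print(classes[proba.argmax(axis=1)])
```

Each row of `proba` sums to one, and the predicted class is the column with the largest posterior.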

enhancement

opened by arose13 26

[FEATURE] A light version that does not depend on cvxpy (and others?)
Please explain clearly what you'd like to see added.

[x] convince us of the use-case, we're open to many suggestions but we prefer to solve problems with pipelines that are at least somewhat general

When using the package in a Docker container without C installed, installation (can?) fail on CVXPY

[x] ~~add a screenshot if applicable (ML stuff is hard to explain with words, pictures say 1000 words)~~

[x] ~~make sure that the feature you want is not already supported by sklearn~~

Happy to work on this if you agree
enhancement
opened by pim-hoeven 19
Bugfix for #158

Fixed the bug caused when using the grouped_estimator with a string column as grouping variable.

Solution: Try the checks without adjustments, if that fails: remove the grouping column from the array or dataframe.

opened by pim-hoeven 18
Timegap optimize simplify
replace two parameters df, date_col by one date_serie

ensure no mutation by working on a copy

check that the indices match and that the index is unique

optimise the iloc/loc/get_loc with a join and index

add plotting function with unit test

add summary function with unit test

update notebook doc
opened by stephanecollot 17

[BUG] Stacking classifier cannot use Thresholder function - no .predict_proba

Description:

I'm able to use the Thresholder on sklearn's voting classifier, but not on the stacking classifier. It throws this error, which I believe is in error. StackingClassifier does have predict_proba. Maybe I'm misunderstanding the use case, but this seems to fit.

ValueError: The Thresholder meta model only works on classifcation models with .predict_proba.

Code for reproduction (using the sklearn sample data for StackingClassifier):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import StackingClassifier
from sklego.meta import Thresholder

X, y = load_iris(return_X_y=True)
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('svr', make_pipeline(StandardScaler(), LinearSVC(random_state=42)))]
clf = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
clf.fit(X, y)

a = Thresholder(clf, threshold=0.2)
a.fit(X, y)
a.predict(X)

Full trace:

ValueError                                Traceback (most recent call last)
<ipython-input-26-1b89dbfa16b8> in <module>
     16 
     17 a = Thresholder(clf, threshold=0.2)
---> 18 a.fit(X_train_std, np.ceil(y_train[targets[2]]))
     19 a.predict(X_train_std)

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sklego\meta\thresholder.py in fit(self, X, y, sample_weight)
     54         self.estimator_ = clone(self.model)
     55         if not isinstance(self.estimator_, ProbabilisticClassifier):
---> 56             raise ValueError(
     57                 "The Thresholder meta model only works on classifcation models with .predict_proba."
     58             )

ValueError: The Thresholder meta model only works on classifcation models with .predict_proba.

bug

opened by L-Marriott 16

GMM methods - classification and outlier detection

As far as outlier detection is concerned, this is the current flow:

import numpy as np
import pandas as pd
import plotnine as p9

from sklego.mixture import GMMOutlierDetector

X = np.random.normal(-10, 1, (2000, 2))
mod = GMMOutlierDetector(n_components=1, threshold=0.99).fit(X)

df = pd.DataFrame({"x1": X[:, 0], "x2": X[:, 1],
                   "loglik": mod.gmm.score_samples(X), 
                   "prediction": mod.predict(X).astype(str)})

(p9.ggplot() + p9.geom_point(data=df, mapping=p9.aes("x1", "x2", color="loglik")))

(p9.ggplot() + p9.geom_point(data=df, mapping=p9.aes("x1", "x2", color="prediction")))
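
To make the flow above more concrete, here is a rough sketch of likelihood-based outlier flagging using only scikit-learn's `GaussianMixture`. It assumes that `threshold=0.99` means "treat the 1% of training points with the lowest log-likelihood as outliers"; that reading, and the -1/1 labelling, are assumptions made for illustration rather than a statement about how `GMMOutlierDetector` is implemented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
X = rng.normal(-10, 1, (2000, 2))

# Fit a single-component mixture and score every training point
gmm = GaussianMixture(n_components=1, random_state=42).fit(X)
loglik = gmm.score_samples(X)

# Assumed rule: the lowest 1% of training log-likelihoods are outliers
cutoff = np.quantile(loglik, 1 - 0.99)
prediction = np.where(loglik < cutoff, -1, 1)  # -1 = outlier, 1 = inlier

print((prediction == -1).sum(), "points flagged out of", len(X))
```

Colouring a scatter plot by `loglik` and by `prediction`, as in the plots above, shows the flagged points sitting on the outer edge of the cloud.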

opened by koaning 16

Add log names and types
I feel that by copying the log_step function many times and slightly changing the logging section I'm repeating a lot of code. Do you have any suggestion to avoid this?

This is WIP, some TODOs:

rename log_step -> log_shape.

tests, docs.

log_dtypes
opened by david26694 15
[FEATURE] Pass additional parameters to fit underlying estimator in `EstimatorTransformer`

In EstimatorTransformer the underlying estimator is being fitted without the ability to pass along additional arguments to self.estimator_.fit.

This limits use cases for EstimatorTransformer. For example, if the underlying estimator is an XGBClassifier we would like to be able to pass eval_set to monitor validation performance and enable early stopping. This is currently not possible. Adding *args, **kwargs should fix this issue.

https://github.com/koaning/scikit-lego/blob/b4d087f0131ff164e6feebf238356ba6512b3635/sklego/meta/estimator_transformer.py#L31
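
A minimal sketch of what the request could look like, assuming only keyword arguments are forwarded. The class name `FitParamsEstimatorTransformer` is hypothetical and this is not the library's implementation; it only illustrates passing `**fit_params` through to the wrapped estimator, here with `sample_weight` standing in for something like `eval_set`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression


class FitParamsEstimatorTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical variant: exposes predictions as a feature and forwards
    extra keyword arguments to the wrapped estimator's fit method."""

    def __init__(self, estimator, predict_func="predict"):
        self.estimator = estimator
        self.predict_func = predict_func

    def fit(self, X, y, **fit_params):
        self.estimator_ = clone(self.estimator)
        # Anything the caller passes (sample_weight, eval_set, ...) goes straight through
        self.estimator_.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        output = getattr(self.estimator_, self.predict_func)(X)
        return np.asarray(output).reshape(len(X), -1)


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
weights = np.array([1.0, 1.0, 1.0, 0.0])  # give the outlying last sample zero weight

transformer = FitParamsEstimatorTransformer(LinearRegression())
features = transformer.fit(X, y, sample_weight=weights).transform(X)
print(features.ravel())
```

Inside a scikit-learn `Pipeline`, such parameters would typically be routed with the usual `stepname__parameter` naming when calling `fit`.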
enhancement

opened by CarloLepelaars 14
Add Repeating Basis Functions
I worked on this a while ago in PR #147, but I started from a fresh branch because I decided to limit the scope only to repeating basis functions (#20), excluding the spanning basis functions (#29).

Feedback adapted so far:

Write a transformer when X contains only one column

Write a wrapper which selects one column, when X has more than one column

Added basic tests

Added docstring

To Do:

Write documentation

Could someone review the code? I'll work on the documentation in the meantime as well.

Tagging people involved in PR #147: @kayhoogland @koaning @MBrouns
opened by RensDimmendaal 14
Fix start of train split in TimeGapSplit and added n_split parameter

Addresses changes in #192 and #232. I am currently working on a time series problem with vibration data where I needed functionality like that suggested in #232, so I decided to add it here.

I tried to explain the changes in functionality in the docstring:

Each validation fold doesn't overlap. The entire 'window' moves by 1 valid_duration until there is not enough data. If this would lead to more splits than specified with n_splits, the 'window' moves by the valid_duration times the ratio of possible splits to requested splits

n_possible_splits = (total_length - train_duration - gap_duration) // valid_duration
time_shift = valid_duration * n_possible_splits / n_splits

so the CV spans the whole dataset. If train_duration is not passed but n_splits is, the training duration is increased to

train_duration = total_length - (self.gap_duration + self.valid_duration * self.n_splits)

such that shifting the entire window by one validation duration spans the whole training set.
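
The arithmetic in that docstring can be checked with a few lines of plain Python. This is only a sketch of the quoted formulas using unit-less integers; the helper names `window_shift` and `default_train_duration` are made up here, and the branch that falls back to a shift of one `valid_duration` is my reading of the description.

```python
def window_shift(total_length, train_duration, valid_duration, gap_duration, n_splits=None):
    """Return (n_possible_splits, time_shift) following the formulas quoted above."""
    n_possible_splits = (total_length - train_duration - gap_duration) // valid_duration
    if n_splits is None or n_possible_splits <= n_splits:
        # Not more folds than requested: move by exactly one validation duration
        return n_possible_splits, valid_duration
    # Too many possible folds: stretch the shift so the folds still span the data
    time_shift = valid_duration * n_possible_splits / n_splits
    return n_possible_splits, time_shift


def default_train_duration(total_length, valid_duration, gap_duration, n_splits):
    """Training length used when only n_splits is given."""
    return total_length - (gap_duration + valid_duration * n_splits)


print(window_shift(100, 30, 10, 5, n_splits=3))        # (6, 20.0)
print(default_train_duration(100, 10, 5, n_splits=3))  # 65
```

With 100 time steps, a training window of 30, a gap of 5 and validation folds of 10, six folds would fit, so asking for three folds doubles the shift to 20.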

The changes are also added to the docs notebook for visualization.

opened by rpauli 13
[FEATURE] WontPredict: meta model.

In the world of hype @MBrouns and myself came up with a very normal thought.

One way to accomplish this is by introduction of a new meta model: WontPredict.

This thread is meant as a place to discuss the implementation.
enhancement

opened by koaning 13
[FEATURE] Adding the MRMR (Maximum Relevance Minimum Redundancy) feature selection
Hi!

The feature selection methods that scikit-learn offers are quite naive. MRMR seems like a more advanced and reasonable approach to select informative and non-redundant features as described here.

Long story short:

Pick a feature that is most informative in some metric (e.g. F-statistic).

Pick the next feature that is very informative, but doesn't correlate with the previous feature too much (e.g. the average absolute Pearson correlation between the current feature and the feature selected in the first step).

Pick the next feature that is very informative, but doesn't correlate with the previous 2 features too much.

Pick the next feature that is very informative, but doesn't correlate with the previous 3 features too much.

(repeat until K features selected)

Here, K, the measure of information and correlation can be specified by the user.
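
The greedy procedure described above can be sketched in a few lines. This is an illustration under assumptions, not a proposed implementation: the function name `mrmr_select` is hypothetical, relevance is the F-statistic from `f_regression`, redundancy is the mean absolute Pearson correlation with the already selected features, and the two are combined as a quotient (a difference would be another option).

```python
import numpy as np
from sklearn.feature_selection import f_regression


def mrmr_select(X, y, k):
    """Greedy MRMR sketch: relevance = F-statistic, redundancy = mean |Pearson r|."""
    relevance, _ = f_regression(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Step 1: start with the single most informative feature
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        # Average absolute correlation of each candidate with the selected set
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        scores = relevance[remaining] / np.maximum(redundancy, 1e-12)
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Columns 0 and 1 are near-duplicates; column 2 carries independent signal
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 500))
X = np.column_stack([a, a + 0.01 * rng.normal(size=500), b])
y = a + b + 0.1 * rng.normal(size=500)

print(mrmr_select(X, y, k=2))
```

On this toy data the selection should contain column 2 plus only one of the two near-duplicate columns, which is the behaviour a plain top-K filter on the F-statistic would not guarantee.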
enhancement
opened by Garve 7
[FEATURE] allow for all kwargs when using @log_step
Hi,

When using @log_step to debug a pandas pipeline, the decorated function must accept the DataFrame as a positional argument (df: pd.DataFrame).

However, if the user passes all the parameters as kwargs, there is an error.

It would be useful if @log_step checked the first kwarg and, if it is a pd.DataFrame, converted it into a positional arg. A possible implementation, to run at the start of def wrapper(), is as follows:

_kwargs = {**kwargs}
first_arg = next(iter(_kwargs))
if isinstance(_kwargs[first_arg], pd.DataFrame) and len(args) == 0:
    args = args + (_kwargs.pop(first_arg),)
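
As a self-contained illustration of that idea, here is a small sketch of a logging decorator that promotes a leading DataFrame keyword to a positional argument. The decorator name `log_shape` and the example function `drop_missing` are hypothetical; this is not the `log_step` implementation from the library.

```python
import functools

import pandas as pd


def log_shape(func):
    """Hypothetical logging decorator that also accepts the frame as a keyword."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs = dict(kwargs)  # work on a copy
        if not args and kwargs:
            first_key = next(iter(kwargs))
            if isinstance(kwargs[first_key], pd.DataFrame):
                # Promote the leading DataFrame keyword to a positional argument
                args = (kwargs.pop(first_key),)
        result = func(*args, **kwargs)
        in_shape = args[0].shape if args else None
        print(f"[{func.__name__}] {in_shape} -> {result.shape}")
        return result

    return wrapper


@log_shape
def drop_missing(df, subset=None):
    return df.dropna(subset=subset)


frame = pd.DataFrame({"a": [1.0, None, 3.0], "b": [1, 2, 3]})
cleaned = drop_missing(df=frame, subset=["a"])  # all arguments passed as keywords
```

Because keyword arguments keep their call order, the check only looks at the first keyword, mirroring the suggestion above.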
enhancement
opened by sephib 6
[FEATURE] - Grid search across model parameters AND thresholds with Thresholder() without refitting
Thanks for this great set of extensions to sklearn.

The Thresholder() model is quite close to something I've been looking for for a while.

I'm looking to include threshold optimisation as part of a broader parameter search.

I can perhaps best describe the desired behaviour as follows

for each parameters in grid:
    fit model with parameters
    for each threshold in thresholds:
        evaluate model

However, if I pass a model that has not yet been fit to Thresholder(), then, even with refit=False, the same model is fit also for each threshold.

Is there an easy way around this? Thinking about this the best way to achieve this would be tinkering with the GridSearchCV code, but perhaps you have an idea and would also find this interesting?
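
Outside of GridSearchCV, the desired loop can be written by hand. The sketch below is one possible workaround under simplifying assumptions: it uses a single train/validation split instead of cross-validation, a `LogisticRegression` as a stand-in model, and F1 as the metric. Each parameter set is fitted once and its predicted probabilities are reused for every threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import ParameterGrid, train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, stratify=y, random_state=0)

param_grid = ParameterGrid({"C": [0.1, 1.0, 10.0]})
thresholds = np.linspace(0.1, 0.9, 9)

results = []
n_fits = 0
for params in param_grid:
    # Fit once per parameter set
    model = LogisticRegression(max_iter=1000, **params).fit(X_train, y_train)
    n_fits += 1
    proba = model.predict_proba(X_valid)[:, 1]
    for threshold in thresholds:
        # Re-use the same probabilities for every threshold; no refit needed
        predictions = (proba >= threshold).astype(int)
        score = f1_score(y_valid, predictions, zero_division=0)
        results.append({**params, "threshold": float(threshold), "f1": score})

best = max(results, key=lambda r: r["f1"])
print(f"{n_fits} fits, {len(results)} evaluations, best: {best}")
```

The model is fitted three times while 27 parameter/threshold combinations are evaluated.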

Thanks!
enhancement
opened by mcallaghan 1
Selectors : allow results in empty dataframe
Before working on a large PR, please check with @koaning or @MBrouns that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.

Description

Suppose you want to build a semi-automatic Pipeline. The pipeline may look like this:

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklego.preprocessing import IdentityTransformer, PandasTypeSelector

# Note: StringIndexer is neither defined nor imported in the original snippet.
transformer = Pipeline([
    ('features', FeatureUnion(n_jobs=1, transformer_list=[
        # Part 1
        ('boolean', Pipeline([
            ('selector', PandasTypeSelector(include='bool')),
        ])),  # booleans close

        ('numericals', Pipeline([
            ('selector', PandasTypeSelector(include='number')),
            ('scaler', StandardScaler()),
            ('add_pca', FeatureUnion([
                ('orig', IdentityTransformer()),
                ('pca', PCA(2))
            ]))
        ])),  # numericals close

        # Part 2
        ('categoricals', Pipeline([
            ('selector', PandasTypeSelector(include='category')),
            ('labeler', StringIndexer()),
            ('encoder', OneHotEncoder(handle_unknown='ignore')),
        ]))  # categoricals close
    ])),  # features close
])  # pipeline close

There may be boolean, numerical and categorical variables, but any of them may also be absent. The current behaviour is to raise an exception when a selector returns an empty DataFrame. I think we can expose a parameter that lets the user choose.
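
A rough sketch of what such a switch could look like, written against plain pandas and scikit-learn. The class name `LenientTypeSelector` and the `allow_empty` parameter are hypothetical and only illustrate the proposal; they are not part of the library.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class LenientTypeSelector(BaseEstimator, TransformerMixin):
    """Hypothetical dtype selector with a switch for empty selections."""

    def __init__(self, include=None, exclude=None, allow_empty=False):
        self.include = include
        self.exclude = exclude
        self.allow_empty = allow_empty

    def fit(self, X, y=None):
        selected = X.select_dtypes(include=self.include, exclude=self.exclude)
        if selected.shape[1] == 0 and not self.allow_empty:
            # Default behaviour: an empty selection is an error
            raise ValueError(f"No columns of type {self.include!r} found.")
        self.columns_ = list(selected.columns)
        return self

    def transform(self, X):
        # With allow_empty=True this may be a frame with zero columns
        return X[self.columns_]


df = pd.DataFrame({"age": [31, 45], "income": [2.5, 3.1]})
empty = LenientTypeSelector(include="bool", allow_empty=True).fit_transform(df)
print(empty.shape)
```

Whether the downstream steps of a FeatureUnion cope with a zero-column block is a separate question that such a change would still need to test.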

Fixes # (issue)

Type of change

[ ] Bug fix (non-breaking change which fixes an issue)

[x ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

[ ] My code follows the style guidelines (flake8)

[ ] I have commented my code, particularly in hard-to-understand areas

[ ] I have made corresponding changes to the documentation (also to the readme.md)

[ ] I have added tests that prove my fix is effective or that my feature works

[ ] I have added tests to check whether the new feature adheres to the sklearn convention

[ ] New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @koaning or @mbrouns.
opened by eromoe 3
[WIP] `get_feature_names_out` for `sklego.preprocessing`.
This PR solves issue #543 and implements get_feature_names_out for all relevant transformers in sklego.preprocessing (i.e. transformers that do not contain the TrainOnlyTransformerMixin).

Functionality is implemented through adding the _ClassNamePrefixFeaturesOutMixin to the class and making sure self._n_features_out is defined in .fit. This is also generally how scikit-learn implements get_feature_names_out for many of its transformers (Example). Unit tests are added for all new functionality.

P.S. Don't pay attention to the commit history before October 10th. These changes have already been merged into koaning/scikit-lego/main, but is still displayed here as commit history. Will try to fix this. Suggestions to remove these redundant commits from the commit history of this CarloLepelaars/scikit-lego/ fork are welcome.

[x] ~~Find alternative solution for using _ClassNamePrefixFeaturesOutMixin so it works with scikit-learn on Python 3.7. (Remove Python 3.7. support)~~

[x] Add implementation of get_feature_names_out to contributing guidelines so people implement this for each new preprocessor that is not TrainOnly.

[x] Remove Python 3.7. GitHub Actions pipelines and update Optional dependencies GitHub Actions pipeline to use Python 3.8.

[x] Add general unit test that checks if get_feature_names_out can be called for all relevant preprocessors and EstimatorTransformer.
opened by CarloLepelaars 14

Releases(0.6.14)

0.6.14(Nov 2, 2022)

Added support for GroupTimeSeriesSplit in https://github.com/koaning/scikit-lego/pull/540.
0.6.13(Sep 10, 2022)

Added check_X flags for GroupedPredictor and IdentityTransformer. Add support for multi-output models in the EstimatorTransformer.

Thanks for the PRs @kamilm @CarloLepelaars @skylarbpayne!
0.6.12(Jun 5, 2022)

Fixed a bug related to the decision_function of the grouped predictor.
0.6.11(Apr 20, 2022)

Fixed a few minor bugs. One relating to a numeric overflow in our GMM models and another relating to the Thresholder and the StackingClassifier.
0.6.10(Mar 12, 2022)

check_array is now optional in grouped models.
0.6.9(Dec 9, 2021)

Our loggers now capture errors as well. Great for production use-cases. https://github.com/koaning/scikit-lego/pull/493
0.6.8(Jul 3, 2021)

Keep compatibility with new Pandas version.
0.6.7(May 7, 2021)

Quantile regression.
0.6.6(Apr 21, 2021)

Make the package cite-able.
0.6.2(Oct 25, 2020)

Fixed some sklearn compatibility bugs for 0.24.0 release. Removed deprecated tools from guides in docs.
0.6.1(Sep 22, 2020)

Revamped the pandas logging utilities.
0.6.0(Sep 7, 2020)
added optional dependencies to compensate for cvxpy

added regression outlier detection model

0.5.2(Jul 31, 2020)

GroupedTransformers and renamed GroupedEstimator to GroupedPredictor
0.5.1(Jul 8, 2020)

The creditcard and penguin datasets. Also a DictMapper.
0.5.0(May 31, 2020)

Added PCA/UMAP methods for outlier detection.
0.4.4(May 26, 2020)

We now have support for Lowess Regression!
0.4.3(May 13, 2020)
fixed bugs for scikit-learn v0.23.0

added the hearts dataset

0.4.2(May 3, 2020)
we've added more submodules to keep the code clean

we've gotten a nice update on the TimeGapSplit crossvalidator, thanks @rpauli

0.4.0(Feb 12, 2020)

added a dataset for fairness

added kernel classifiers

added monotonic interval encoders
0.3.4(Jan 17, 2020)
added a constrained linear model sklego.linear_model.ProbWeightRegression

added support for grouped metrics

bugfixes and precommit hooks

0.3.3(Oct 23, 2019)

major fix... gmm detectors were broken.
0.3.2(Oct 18, 2019)
added support for bayesian-gmm models

fixes keyword arguments of gmm models in package

made some preprocessing features

added some experimental models ready for field testing

0.3.1(Sep 22, 2019)

confusionbalancer shrinkage in grouped estimator
0.2.1(Jul 25, 2019)

GroupedEstimator fix for non-numeric groups

ColumnDropper

ci/cd fix for pandas update
0.2.0(Jul 19, 2019)

FairClassifier

InformationFilter
0.1.8(Jun 19, 2019)
orthogonal projections

thresholder meta model

correlation metric

fairness metric

decay estimator

small bugfixes

0.1.7(May 26, 2019)
Added a GMMNaiveBayes Classifier.

Bugfix for Estimator Transformer

Added a performance improvement for ColumnCapper

0.1.6(May 5, 2019)

we got our first pull request from brazil!
0.1.5(Apr 24, 2019)

grouped estimator got added as well as documented with pretty pictures!
0.1.4(Apr 7, 2019)

we added a patsy preprocessor.

Owner

vincent d warmerdam

Solving problems involving data. Mostly NLP these days. AskMeAnything[tm].

GitHub https://scikit-lego.netlify.app

