Extra blocks for scikit-learn pipelines.

Overview

scikit-lego

We love scikit-learn, but very often we find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a package that offers code quality and testing. This project started as a collaboration between multiple companies in the Netherlands but has since received contributions from around the globe. It was initiated by Matthijs Brouns and Vincent D. Warmerdam as a tool to teach people how to contribute to open source.

Note that we're not formally affiliated with the scikit-learn project at all, but we aim to strictly adhere to their standards.

The same holds for LEGO. LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this project.

Installation

Install scikit-lego via pip with

python -m pip install scikit-lego

Via conda with

conda install -c conda-forge scikit-lego

Alternatively, to edit and contribute you can fork/clone and run:

python -m pip install -e ".[dev]"
python setup.py develop

Documentation

The documentation can be found here.

Usage

We offer custom metrics, models and transformers. You can import them just like you would in scikit-learn.

# the scikit-learn stuff we love
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# the scikit-lego stuff we add
from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier

...

mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])

...

Features

Here's a list of features that this library currently offers:

  • sklego.datasets.load_abalone loads in the abalone dataset
  • sklego.datasets.load_arrests loads in a dataset with fairness concerns
  • sklego.datasets.load_chicken loads in the joyful chickweight dataset
  • sklego.datasets.load_heroes loads a heroes of the storm dataset
  • sklego.datasets.load_hearts loads a dataset about hearts
  • sklego.datasets.load_penguins loads a lovely dataset about penguins
  • sklego.datasets.fetch_creditcard fetches a fraud dataset from OpenML
  • sklego.datasets.make_simpleseries makes a simulated time series
  • sklego.pandas_utils.add_lags adds lag values to a pandas DataFrame
  • sklego.pandas_utils.log_step a useful decorator to log your pipeline steps
  • sklego.dummy.RandomRegressor dummy benchmark that predicts random values
  • sklego.linear_model.DeadZoneRegressor experimental feature that has a deadzone in the cost function
  • sklego.linear_model.DemographicParityClassifier logistic classifier constrained on demographic parity
  • sklego.linear_model.EqualOpportunityClassifier logistic classifier constrained on equal opportunity
  • sklego.linear_model.ProbWeightRegression linear model that treats coefficients as probabilistic weights
  • sklego.linear_model.LowessRegression locally weighted linear regression
  • sklego.linear_model.LADRegression least absolute deviation regression
  • sklego.linear_model.ImbalancedLinearRegression punishes over/under-estimation of a model directly
  • sklego.naive_bayes.GaussianMixtureNB classifies by training a 1D GMM per column per class
  • sklego.naive_bayes.BayesianGaussianMixtureNB classifies by training a Bayesian 1D GMM per class
  • sklego.mixture.BayesianGMMClassifier classifies by training a Bayesian GMM per class
  • sklego.mixture.BayesianGMMOutlierDetector detects outliers based on a trained Bayesian GMM
  • sklego.mixture.GMMClassifier classifies by training a GMM per class
  • sklego.mixture.GMMOutlierDetector detects outliers based on a trained GMM
  • sklego.meta.ConfusionBalancer experimental feature that allows you to balance the confusion matrix
  • sklego.meta.DecayEstimator adds decay to the sample_weight that the model accepts
  • sklego.meta.EstimatorTransformer adds a model output as a feature
  • sklego.meta.OutlierClassifier turns outlier models into classifiers for gridsearch
  • sklego.meta.GroupedPredictor splits the data into groups and fits a model on each group
  • sklego.meta.GroupedTransformer splits the data into groups and fits a transformer on each group
  • sklego.meta.SubjectiveClassifier experimental feature to add a prior to your classifier
  • sklego.meta.Thresholder meta model that allows you to gridsearch over the threshold
  • sklego.meta.RegressionOutlierDetector meta model that finds outliers by adding a threshold to regression
  • sklego.meta.ZeroInflatedRegressor predicts zero or applies a regression based on a classifier
  • sklego.preprocessing.ColumnCapper limits extreme values of the model features
  • sklego.preprocessing.ColumnDropper drops columns from a pandas DataFrame
  • sklego.preprocessing.ColumnSelector selects columns based on column name
  • sklego.preprocessing.InformationFilter transformer that can de-correlate features
  • sklego.preprocessing.IdentityTransformer returns the same data, allows for concatenating pipelines
  • sklego.preprocessing.OrthogonalTransformer makes all features linearly independent
  • sklego.preprocessing.PandasTypeSelector selects columns based on pandas type
  • sklego.preprocessing.PatsyTransformer applies a patsy formula
  • sklego.preprocessing.RandomAdder adds randomness in training
  • sklego.preprocessing.RepeatingBasisFunction repeating feature engineering, useful for timeseries
  • sklego.preprocessing.DictMapper assigns numeric values to categorical columns
  • sklego.preprocessing.OutlierRemover experimental method to remove outliers during training
  • sklego.model_selection.KlusterFoldValidation experimental feature that does K folds based on clustering
  • sklego.model_selection.TimeGapSplit timeseries Kfold with a gap between train/test
  • sklego.pipeline.DebugPipeline adds debug information to make debugging easier
  • sklego.pipeline.make_debug_pipeline shorthand function to create a debuggable pipeline
  • sklego.metrics.correlation_score calculates correlation between model output and feature
  • sklego.metrics.equal_opportunity_score calculates the equal opportunity metric
  • sklego.metrics.p_percent_score proxy for model fairness with regard to a sensitive attribute
  • sklego.metrics.subset_score calculates a score on a subset of your data (meant for fairness tracking)
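
The fairness metrics above compare how often a model predicts the positive class for different groups. As an illustration, the p% score is commonly defined as the smaller of the two ratios between the positive-prediction rates of the rows with and without a binary sensitive attribute, so 1.0 means parity. The NumPy sketch below works on plain prediction arrays; `p_percent_sketch` is a hypothetical helper, not `sklego.metrics.p_percent_score`, which is designed around a fitted model and a sensitive column.

```python
import numpy as np


def p_percent_sketch(y_pred, sensitive, positive_target=1):
    """p% score: min(P(y_hat=1 | z=1) / P(y_hat=1 | z=0), and its inverse).

    Returns 1.0 when both groups get positive predictions at the same rate
    and approaches 0.0 as the rates diverge.
    """
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive).astype(bool)
    positive = y_pred == positive_target
    rate_z1 = positive[sensitive].mean()   # positive rate inside the sensitive group
    rate_z0 = positive[~sensitive].mean()  # positive rate outside it
    if rate_z1 == 0 or rate_z0 == 0:
        # convention used in this sketch: a zero rate gives the worst score
        return 0.0
    return float(min(rate_z1 / rate_z0, rate_z0 / rate_z1))
```

With `y_pred = [1, 1, 0, 0, 1, 0, 0, 0]` and the first four rows marked sensitive, the positive rates are 0.5 and 0.25, so the score is 0.5.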

New Features

We want to be fairly open about what we accept, but we require three things before a feature is added to the project:

  1. any new feature contributes towards a demonstrable real-world use case
  2. any new feature passes standard unit tests (we use the ones from scikit-learn)
  3. the feature has been discussed in the issue list beforehand

We automate all of our testing and use pre-commit hooks to keep the code working.

Comments
  • [FEATURE] Bayesian Kernel Density Classifier

    • I've been using this Bayesian kernel density classifier for a few years and I thought I should move it out from my poorly organized project to this one here.

    The prior is $P(y=0)$. I primarily use it for spatial problems.

    It is similar to the GMM Classifier with only 2 caveats I can think of.

    • Hyperparameters are easier to decide on.
    • Scaling is worse, I believe because scoring with a KDE grows linearly with the training sample size.
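    import numpy as np
    from scipy.special import logsumexp
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.neighbors import KernelDensity


    # noinspection PyPep8Naming
    class BayesianKernelDensityClassifier(BaseEstimator, ClassifierMixin):
        """
        Bayesian classifier that uses kernel density estimates of P(X | y) per class,
        combined with the empirical class priors P(y), to compute P(y | X).

        Parameters:
            - bandwidth: float, passed to scikit-learn's KernelDensity
            - kernel: str, kernel name passed to scikit-learn's KernelDensity
        """

        def __init__(self, bandwidth=0.2, kernel='gaussian'):
            # scikit-learn convention: __init__ only stores hyperparameters,
            # fitted attributes (ending in "_") are created in fit()
            self.bandwidth = bandwidth
            self.kernel = kernel

        def fit(self, X, y):
            X, y = np.asarray(X), np.asarray(y)
            self.classes_ = np.unique(y)  # np.unique already returns sorted values
            training_sets = [X[y == yi] for yi in self.classes_]
            # one KDE per class models log P(X | y = class)
            self.models_ = [KernelDensity(bandwidth=self.bandwidth, kernel=self.kernel).fit(x_subset)
                            for x_subset in training_sets]
            # log prior log P(y = class) from the class frequencies
            self.priors_logp_ = np.log([x_subset.shape[0] / X.shape[0] for x_subset in training_sets])
            return self

        def predict_proba(self, X):
            # joint log-likelihood log P(X | y) + log P(y), shape (n_samples, n_classes)
            logp = np.array([model.score_samples(X) for model in self.models_]).T + self.priors_logp_
            # normalise in log space: exp(logp) can underflow to 0 far away from the data
            return np.exp(logp - logsumexp(logp, axis=1, keepdims=True))

        def predict(self, X):
            return self.classes_[np.argmax(self.predict_proba(X), axis=1)]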
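
    The same Bayes rule, P(y | x) proportional to P(x | y) P(y), can be written out in plain NumPy to show what the class above computes. This sketch assumes an isotropic Gaussian kernel with one scalar bandwidth shared by all classes; `gaussian_kde_logpdf` and `kde_bayes_proba` are hypothetical helpers for illustration, not a replacement for the scikit-learn based class.

```python
import numpy as np


def gaussian_kde_logpdf(X_eval, X_train, bandwidth):
    """Log density of an isotropic Gaussian KDE fitted on X_train, evaluated at X_eval."""
    d = X_train.shape[1]
    # squared distances between every evaluation point and every training point
    sq_dist = ((X_eval[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = -0.5 * sq_dist / bandwidth**2 - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    # log of the average kernel value, computed stably (log-mean-exp)
    m = log_kernel.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_kernel - m).mean(axis=1, keepdims=True))).ravel()


def kde_bayes_proba(X_train, y_train, X_eval, bandwidth=0.2):
    """Posterior P(y | x) proportional to P(x | y) * P(y), with one KDE per class."""
    classes = np.unique(y_train)
    log_joint = np.column_stack([
        gaussian_kde_logpdf(X_eval, X_train[y_train == c], bandwidth)
        + np.log(np.mean(y_train == c))  # log prior from the class frequencies
        for c in classes
    ])
    # normalise each row in log space before exponentiating
    log_joint -= log_joint.max(axis=1, keepdims=True)
    proba = np.exp(log_joint)
    return classes, proba / proba.sum(axis=1, keepdims=True)
```

    The pairwise distance matrix makes the cost per prediction grow linearly with the number of training points, which matches the scaling caveat above.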
    
    enhancement 
    opened by arose13 26
  • [FEATURE] A light version that does not depend on cvxpy (and others?)

    Please explain clearly what you'd like to see added.

    • [x] convince us of the use-case, we're open to many suggestions but we prefer to solve problems with pipelines that are at least somewhat general: when using the package in a Docker container without a C compiler installed, installation may fail on CVXPY.

    • [x] ~~add a screenshot if applicable (ML stuff is hard to explain with words, pictures say 1000 words)~~

    • [x] ~~make sure that the feature you want is not already supported by sklearn~~

    Happy to work on this if you agree
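
    A common pattern for making a heavy dependency such as cvxpy optional is to import it lazily and to fail with an actionable message only when a feature that needs it is actually used. A minimal sketch, assuming a pip "extra" is used for the optional dependencies; the helper `require` and the extra names are hypothetical, not part of scikit-lego:

```python
import importlib


def require(module_name, extra):
    """Import an optional dependency, or raise an ImportError explaining how to get it.

    `extra` is a hypothetical pip extra name used only to build the hint message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: python -m pip install 'scikit-lego[{extra}]'"
        ) from err


# inside an estimator that needs cvxpy, import only when fit() runs, e.g.:
# cp = require("cvxpy", extra="cvxpy")
```

    With this approach the light install only breaks the estimators that truly need the solver, and everything else imports cleanly.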

    enhancement 
    opened by pim-hoeven 19
  • Bugfix for #158

    Fixes the bug that occurs when using the grouped_estimator with a string column as the grouping variable.

    Solution: run the checks without adjustments first; if that fails, remove the grouping column from the array or DataFrame and check again.
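
    The fallback described above can be sketched with NumPy, assuming the "checks" amount to requiring a fully numeric array (a simplification of scikit-learn's input validation). `check_numeric` and `check_with_group_fallback` are hypothetical names:

```python
import numpy as np


def check_numeric(X):
    """Stand-in for input validation: every value must convert to float."""
    return np.asarray(X, dtype=float)


def check_with_group_fallback(X, group_col):
    """Validate X as-is; if that fails, retry with the grouping column removed.

    Returns the validated array and a flag telling whether the grouping column
    was dropped. If the second attempt fails too, its error propagates.
    """
    try:
        return check_numeric(X), False
    except (TypeError, ValueError):
        # e.g. a string grouping column cannot be converted to float
        X_obj = np.asarray(X, dtype=object)
        return check_numeric(np.delete(X_obj, group_col, axis=1)), True
```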

    opened by pim-hoeven 18
  • Timegap optimize simplify

    • replace the two parameters df and date_col with a single date_serie
    • ensure no mutation by working on a copy
    • check if index match and if index is unique
    • optimise the iloc/loc/get_loc with a join and index
    • add plotting function with unit test
    • add summary function with unit test
    • update notebook doc
    opened by stephanecollot 17
  • [BUG] Stacking classifier cannot use Thresholder function - no .predict_proba

    Description:

    I'm able to use the Thresholder on sklearn's VotingClassifier, but not on the StackingClassifier. It throws the error below, which I believe is a mistake, since StackingClassifier does have predict_proba. Maybe I'm misunderstanding the use case, but this seems to fit.

    ValueError: The Thresholder meta model only works on classifcation models with .predict_proba.

    Code for reproduction (using the sklearn sample data for StackingClassifier):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.ensemble import StackingClassifier
    from sklego.meta import Thresholder
    
    # note: Thresholder is meant for two-class problems, while iris has three classes
    X, y = load_iris(return_X_y=True)
    estimators = [
        ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
        ('svr', make_pipeline(StandardScaler(), LinearSVC(random_state=42)))]
    clf = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
    clf.fit(X, y)
    
    a = Thresholder(clf, threshold=0.2)
    a.fit(X, y)
    a.predict(X)
    

    Full trace:

    ValueError                                Traceback (most recent call last)
    <ipython-input-26-1b89dbfa16b8> in <module>
         16 
         17 a = Thresholder(clf, threshold=0.2)
    ---> 18 a.fit(X_train_std, np.ceil(y_train[targets[2]]))
         19 a.predict(X_train_std)
    
    ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sklego\meta\thresholder.py in fit(self, X, y, sample_weight)
         54         self.estimator_ = clone(self.model)
         55         if not isinstance(self.estimator_, ProbabilisticClassifier):
    ---> 56             raise ValueError(
         57                 "The Thresholder meta model only works on classifcation models with .predict_proba."
         58             )
    
    ValueError: The Thresholder meta model only works on classifcation models with .predict_proba.
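
    The error is raised by the `isinstance(self.estimator_, ProbabilisticClassifier)` check visible in the trace. A duck-typed alternative asks whether the estimator actually exposes `predict_proba`, and the thresholding itself then only needs the positive-class probability column. A minimal NumPy sketch, assuming a binary classifier whose second probability column belongs to the positive class; `has_predict_proba` and `threshold_predict` are hypothetical helpers, not scikit-lego's implementation:

```python
import numpy as np


def has_predict_proba(estimator):
    """Duck-typed check: does this estimator expose a callable predict_proba?"""
    # getattr with a default also covers methods that raise AttributeError when unavailable
    return callable(getattr(estimator, "predict_proba", None))


def threshold_predict(proba, classes, threshold=0.5):
    """Pick classes[1] when P(classes[1]) >= threshold, else classes[0] (binary only)."""
    proba = np.asarray(proba)
    if proba.shape[1] != 2:
        raise ValueError("thresholding a single probability only makes sense for two classes")
    return np.where(proba[:, 1] >= threshold, classes[1], classes[0])
```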
    
    bug 
    opened by L-Marriott 16
  • GMM methods - classification *and* outlier detection

    As far as outlier detection is concerned, this is the current flow:

    import numpy as np
    import pandas as pd
    import plotnine as p9
    
    from sklego.mixture import GMMOutlierDetector
    
    X = np.random.normal(-10, 1, (2000, 2))
    mod = GMMOutlierDetector(n_components=1, threshold=0.99).fit(X)
    
    df = pd.DataFrame({"x1": X[:, 0], "x2": X[:, 1],
                       "loglik": mod.gmm.score_samples(X), 
                       "prediction": mod.predict(X).astype(str)})
    
    (p9.ggplot() + p9.geom_point(data=df, mapping=p9.aes("x1", "x2", color="loglik")))
    

    (p9.ggplot() + p9.geom_point(data=df, mapping=p9.aes("x1", "x2", color="prediction")))
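
    With n_components=1 the mixture is a single Gaussian, which makes the idea easy to sketch in NumPy: fit a mean and covariance, score every point by its log-likelihood, and flag points whose score falls below a quantile of the training scores. This assumes a quantile-based threshold where threshold=0.99 flags roughly the lowest 1% of training likelihoods; check the GMMOutlierDetector docstring for the exact semantics. `gaussian_loglik` and `fit_gaussian_outlier` are hypothetical names:

```python
import numpy as np


def gaussian_loglik(X, mean, cov):
    """Log-density of a multivariate normal for every row of X."""
    d = X.shape[1]
    diff = X - mean
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    mahal = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances
    return -0.5 * (mahal + logdet + d * np.log(2 * np.pi))


def fit_gaussian_outlier(X, threshold=0.99):
    """Fit one Gaussian and return a predict function with a likelihood-quantile cut-off."""
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    cutoff = np.quantile(gaussian_loglik(X, mean, cov), 1 - threshold)

    def predict(X_new):
        # scikit-learn outlier convention: -1 for outliers, 1 for inliers
        return np.where(gaussian_loglik(X_new, mean, cov) < cutoff, -1, 1)

    return predict
```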
    

    opened by koaning 16
  • Add log names and types

    I feel that by copying the log_step function many times and slightly changing the logging section I'm repeating a lot of code. Do you have any suggestion to avoid this?

    This is WIP, some TODOs:

    • rename log_step -> log_shape.
    • tests, docs.
    • log_dtypes
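
    One way to avoid copying log_step for every new kind of log line is a decorator factory that takes the "what to log" part as functions. A minimal pure-Python sketch; `log_with`, `n_rows` and the `print_fn` parameter are hypothetical and not scikit-lego's API (for pandas, describers could report df.shape or df.dtypes):

```python
import functools
import time


def log_with(*describers, print_fn=print):
    """Build a logging decorator from any number of describer functions.

    Each describer maps the step's output to a short string, so a new kind of
    log line only needs a new describer, not a new copy of the decorator.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            details = " ".join(describe(result) for describe in describers)
            print_fn(f"[{func.__name__}] time={elapsed:.4f}s {details}")
            return result
        return wrapper
    return decorator


def n_rows(result):
    """Hypothetical describer: number of rows in the step's output."""
    return f"n_rows={len(result)}"


@log_with(n_rows)
def drop_negatives(values):
    return [v for v in values if v >= 0]


drop_negatives([1, -2, 3])  # prints a line such as "[drop_negatives] time=0.0000s n_rows=2"
```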
    opened by david26694 15
  • [FEATURE] Pass additional parameters to fit underlying estimator in `EstimatorTransformer`

    In EstimatorTransformer the underlying estimator is being fitted without the ability to pass along additional arguments to self.estimator_.fit.

    This limits use cases for EstimatorTransformer. For example, if the underlying estimator is an XGBClassifier we would like to be able to pass eval_set to monitor validation performance and enable early stopping. This is currently not possible. Adding *args, **kwargs should fix this issue.

    https://github.com/koaning/scikit-lego/blob/b4d087f0131ff164e6feebf238356ba6512b3635/sklego/meta/estimator_transformer.py#L31
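
    Forwarding extra keyword arguments from the wrapper's fit to the wrapped estimator's fit is a small change. A minimal sketch of the idea with a hypothetical `EstimatorTransformerSketch` class; it uses copy.deepcopy where scikit-lego would use sklearn.base.clone, and assumes transform returns the wrapped model's predict output as a single column:

```python
import copy

import numpy as np


class EstimatorTransformerSketch:
    """Wrap an estimator so that its predictions become a feature column."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **fit_params):
        # work on a copy so the user's estimator object stays untouched
        self.estimator_ = copy.deepcopy(self.estimator)
        # forward extra options (e.g. eval_set for XGBoost) unchanged
        self.estimator_.fit(X, y, **fit_params)
        return self

    def transform(self, X):
        return np.asarray(self.estimator_.predict(X)).reshape(-1, 1)
```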

    enhancement 
    opened by CarloLepelaars 14
  • Add Repeating Basis Functions

    I worked on this a while ago in PR #147, but I started from a fresh branch because I decided to limit the scope only to repeating basis functions (#20), excluding the spanning basis functions (#29).

    Feedback addressed so far:

    • Wrote a transformer for when X contains only one column
    • Wrote a wrapper that selects one column when X has more than one column
    • Added basic tests
    • Added docstring

    To Do:

    • Write documentation

    Could someone review the code? I'll work on the documentation in the meantime as well.

    Tagging people involved in PR #147: @kayhoogland @koaning @MBrouns
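
    For readers new to the idea: repeating basis functions encode a periodic input (day of year, hour of day) as a set of Gaussian bumps placed at evenly spaced centres on a circle, so values near the end of the period land close to values near the start. A NumPy sketch under those assumptions; `repeating_basis` and its width scaling are illustrative and not necessarily what the transformer in this PR computes:

```python
import numpy as np


def repeating_basis(x, n_periods=12, input_range=(0, 365), width=1.0):
    """Map each value in x to n_periods Gaussian bumps that wrap around input_range."""
    x = np.asarray(x, dtype=float)
    lo, hi = input_range
    # position of each value within one period, scaled to [0, 1)
    pos = ((x - lo) / (hi - lo)) % 1.0
    centres = np.arange(n_periods) / n_periods  # evenly spaced on the circle
    # circular distance: going around the end of the period is allowed
    dist = np.abs(pos[:, None] - centres[None, :])
    dist = np.minimum(dist, 1.0 - dist)
    # bump width expressed as a fraction of the spacing between centres
    sigma = width / n_periods
    return np.exp(-0.5 * (dist / sigma) ** 2)
```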

    opened by RensDimmendaal 14
  • Fix start of train split in TimeGapSplit and added n_split parameter

    Addresses changes in #192 and #232. I am currently working on a time series problem with vibration data where I needed functionality like the one suggested in #232, so I decided to add it here.

    I tried to explain the changes in functionality in the docstring:

    Validation folds do not overlap. The entire 'window' moves by one valid_duration until there is not enough data left. If this would lead to more splits than specified with n_splits, the window instead moves by valid_duration times the ratio of possible to requested splits, so that the CV spans the whole dataset:

        n_possible_splits = (total_length - train_duration - gap_duration) // valid_duration
        time_shift = valid_duration * n_possible_splits / n_splits

    If n_splits is passed but train_duration is not, the training duration is increased to

        train_duration = total_length - (gap_duration + valid_duration * n_splits)

    such that shifting the entire window by one validation duration at a time spans the whole dataset.

    The changes are also added to the docs notebook for visualization.
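
    The window arithmetic above can be checked with plain numbers. This sketch follows the formulas in the docstring for unit-less durations and returns (train_start, train_end, valid_start, valid_end) tuples rather than index arrays; `time_gap_windows` is a hypothetical helper, not the TimeGapSplit API:

```python
def time_gap_windows(total_length, valid_duration, gap_duration, n_splits,
                     train_duration=None):
    """Return (train_start, train_end, valid_start, valid_end) for each fold."""
    if train_duration is None:
        # grow the training window so n_splits shifts of valid_duration cover everything
        train_duration = total_length - (gap_duration + valid_duration * n_splits)
    n_possible_splits = (total_length - train_duration - gap_duration) // valid_duration
    if n_possible_splits > n_splits:
        # spread the requested folds over the whole dataset
        time_shift = valid_duration * n_possible_splits / n_splits
    else:
        time_shift = valid_duration
    windows = []
    for i in range(min(n_splits, n_possible_splits)):
        train_start = i * time_shift
        train_end = train_start + train_duration
        valid_start = train_end + gap_duration
        windows.append((train_start, train_end, valid_start, valid_start + valid_duration))
    return windows
```

    For example, `time_gap_windows(100, 10, 5, 4)` derives train_duration = 55 and returns four folds whose last validation window ends exactly at 100.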

    opened by rpauli 13
  • [FEATURE] WontPredict: meta model.

    In the world of hype, @MBrouns and I came up with a very normal thought.

    [Screenshot, 2019-10-28]

    One way to accomplish this is to introduce a new meta model: WontPredict.

    This thread is meant as a place to discuss the implementation.
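
    One possible reading of the idea is a wrapper that only returns a prediction when the underlying classifier is confident enough and abstains otherwise. A NumPy sketch of that decision rule, assuming access to predict_proba output and an abstain label of your choosing; `predict_or_abstain`, `min_confidence` and `abstain_label` are hypothetical names for discussion, not a settled design:

```python
import numpy as np


def predict_or_abstain(proba, classes, min_confidence=0.8, abstain_label=None):
    """Return the most likely class per row, or abstain_label when its probability is too low."""
    proba = np.asarray(proba)
    best = proba.argmax(axis=1)
    confident = proba.max(axis=1) >= min_confidence
    # object dtype so the abstain label (e.g. None) can sit next to real classes
    out = np.asarray(classes, dtype=object)[best]
    out[~confident] = abstain_label
    return out
```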

    enhancement 
    opened by koaning 13
  • [FEATURE] Adding the MRMR (Maximum Relevance Minimum Redundancy) feature selection

    Hi!

    The feature selection methods that scikit-learn offers are quite naive. MRMR seems like a more advanced and reasonable approach to selecting informative and non-redundant features, as described here.

    Long story short:

    1. Pick a feature that is most informative in some metric (e.g. F-statistic).
    2. Pick the next feature that is very informative, but doesn't correlate too much with the feature selected in step 1 (e.g. the absolute Pearson correlation between the two; in later steps, the average absolute correlation with all previously selected features).
    3. Pick the next feature that is very informative, but doesn't correlate with the previous 2 features too much.
    4. Pick the next feature that is very informative, but doesn't correlate with the previous 3 features too much.
    5. (repeat until K features selected)

    Here, K and the measures of information and correlation can be specified by the user.
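
    The greedy steps above can be sketched in NumPy. This version assumes absolute Pearson correlation for both relevance and redundancy and scores each candidate by relevance minus its average redundancy (a difference-style MRMR criterion); other measures can be passed in as functions. `mrmr_select` and `abs_corr` are hypothetical names:

```python
import numpy as np


def abs_corr(u, v):
    """Absolute Pearson correlation between two 1D arrays."""
    return abs(np.corrcoef(u, v)[0, 1])


def mrmr_select(X, y, k, relevance=abs_corr, redundancy=abs_corr):
    """Greedily pick k column indices of X, trading relevance to y against redundancy."""
    n_features = X.shape[1]
    rel = np.array([relevance(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(rel))]  # step 1: the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # average redundancy with everything picked so far
            red = np.mean([redundancy(X[:, j], X[:, s]) for s in selected])
            score = rel[j] - red
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

    A near-duplicate of an already selected feature gets a redundancy close to 1, so it loses to a less relevant but independent feature.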

    enhancement 
    opened by Garve 7
  • [FEATURE] allow for all kwargs when using @log_step

    Hi,

    When using @log_step to debug a pandas pipeline, the decorated function must receive the pd.DataFrame (df) as a positional argument.

    However, if the user passes all the parameters as keyword arguments, an error is raised.

    It would be useful if @log_step checked the first keyword argument and, if it is a pd.DataFrame, converted it into a positional argument. A possible implementation, inside wrapper() before the decorated function is called:

        _kwargs = {**kwargs}
        if _kwargs and len(args) == 0:
            first_arg = next(iter(_kwargs))
            if isinstance(_kwargs[first_arg], pd.DataFrame):
                args = args + (_kwargs.pop(first_arg),)
        kwargs = _kwargs
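
    A more general alternative to inspecting only the first keyword argument is to bind the call to the decorated function's signature, which finds the first parameter's value however it was passed and works for any type, not only DataFrames. A minimal standard-library sketch; `first_argument` and `log_rows` are hypothetical helpers, not scikit-lego's log_step:

```python
import functools
import inspect


def first_argument(func, args, kwargs):
    """Return the value bound to func's first parameter, positional or keyword."""
    signature = inspect.signature(func)
    bound = signature.bind(*args, **kwargs)
    bound.apply_defaults()  # also covers a first parameter that has a default
    first_name = next(iter(signature.parameters))
    return bound.arguments[first_name]


def log_rows(func):
    """Toy log_step-like decorator that also works when df=... is passed as a keyword."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        data = first_argument(func, args, kwargs)
        result = func(*args, **kwargs)
        print(f"[{func.__name__}] rows in={len(data)} rows out={len(result)}")
        return result
    return wrapper


@log_rows
def head(df, n=2):
    return df[:n]


head(df=[1, 2, 3])  # prints "[head] rows in=3 rows out=2"
```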
    
    
    
    enhancement 
    opened by sephib 6
  • [FEATURE] - Grid search across model parameters AND thresholds with Thresholder() without refitting

    Thanks for this great set of extensions to sklearn.

    The Thresholder() model is quite close to something I've been looking for for a while.

    I'm looking to include threshold optimisation as part of a broader parameter search.

    I can perhaps best describe the desired behaviour as follows:

    for each parameters in grid:
        fit model with parameters
        for each threshold in thresholds:
            evaluate model
    

    However, if I pass a model that has not yet been fit to Thresholder(), then, even with refit=False, the same model is fit also for each threshold.
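
    The desired loop can be sketched by caching one set of predicted probabilities per parameter setting and sweeping the thresholds over that cache, so each setting is fitted exactly once. A minimal NumPy sketch; `fit_predict_proba` is a hypothetical callable that fits a model with the given parameters and returns positive-class probabilities on validation data, and accuracy stands in for whatever metric you use:

```python
import numpy as np


def search_params_and_thresholds(fit_predict_proba, param_grid, thresholds, y_true):
    """Return {(params_key, threshold): accuracy} with one fit per parameter setting."""
    y_true = np.asarray(y_true)
    results = {}
    for params in param_grid:
        proba = np.asarray(fit_predict_proba(**params))  # the only fit for this setting
        key = tuple(sorted(params.items()))
        for threshold in thresholds:
            # re-thresholding cached probabilities is cheap: no refit needed
            y_pred = (proba >= threshold).astype(int)
            results[(key, threshold)] = float((y_pred == y_true).mean())
    return results
```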

    Is there an easy way around this? Thinking about it, the best way to achieve this would be to tinker with the GridSearchCV code, but perhaps you have an idea and would also find this interesting?

    Thanks!

    enhancement 
    opened by mcallaghan 1
  • Selectors : allow results in empty dataframe

    Before working on a large PR, please check with @koaning or @MBrouns that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.

    Description

    Suppose you want to build a semi-automatic pipeline. It might look like this:

    
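    import pandas as pd
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklego.preprocessing import IdentityTransformer, PandasTypeSelector
    # StringIndexer is assumed to be defined elsewhere (it is not a scikit-learn class)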
    
    transformer = Pipeline([
        ('features', FeatureUnion(n_jobs=1, transformer_list=[
            # Part 1
            ('boolean', Pipeline([
                ('selector', PandasTypeSelector(include='bool')),
            ])),  # booleans close
            
            ('numericals', Pipeline([
                ('selector', PandasTypeSelector(include='number')),
                ('scaler', StandardScaler()),
                ('add_pca', FeatureUnion([
                    ('orig', IdentityTransformer()),
                    ('pca', PCA(2))
                ]))
            ])),  # numericals close
            
            # Part 2
            ('categoricals', Pipeline([
                ('selector', PandasTypeSelector(include='category')),
                ('labeler', StringIndexer()),
                ('encoder', OneHotEncoder(handle_unknown='ignore')),
            ]))  # categoricals close
        
        ])),  # features close
    ])  # pipeline close
    

    There may be boolean, numerical and categorical variables, but any of these groups may also be missing. The current behaviour is to raise an exception when a selector returns an empty DataFrame; I think we could expose a parameter that lets the user choose.

    Fixes # (issue)

    Type of change

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

    Checklist:

    • [ ] My code follows the style guidelines (flake8)
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation (also to the readme.md)
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [ ] I have added tests to check whether the new feature adheres to the sklearn convention
    • [ ] New and existing unit tests pass locally with my changes

    If you feel your PR is ready for a review, ping @koaning or @mbrouns.

    opened by eromoe 3
  • [WIP] `get_feature_names_out` for `sklego.preprocessing`.

    This PR solves issue #543 and implements get_feature_names_out for all relevant transformers in sklego.preprocessing (i.e. transformers that do not contain the TrainOnlyTransformerMixin).

    Functionality is implemented through adding the _ClassNamePrefixFeaturesOutMixin to the class and making sure self._n_features_out is defined in .fit. This is also generally how scikit-learn implements get_feature_names_out for many of its transformers (Example). Unit tests are added for all new functionality.

    P.S. Don't pay attention to the commit history before October 10th. These changes have already been merged into koaning/scikit-lego/main, but are still displayed here as commit history. I will try to fix this. Suggestions for removing these redundant commits from the commit history of this CarloLepelaars/scikit-lego fork are welcome.

    • [x] ~~Find alternative solution for using _ClassNamePrefixFeaturesOutMixin so it works with scikit-learn on Python 3.7 (Remove Python 3.7 support)~~
    • [x] Add implementation of get_feature_names_out to contributing guidelines so people implement this for each new preprocessor that is not TrainOnly.
    • [x] Remove Python 3.7 GitHub Actions pipelines and update the Optional dependencies GitHub Actions pipeline to use Python 3.8.
    • [x] Add general unit test that checks if get_feature_names_out can be called for all relevant preprocessors and EstimatorTransformer.
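
    For context, the class-name-prefix convention produces output names such as meancenterer0, meancenterer1, ...: the lower-cased class name followed by the output column index, with the count taken from _n_features_out set in fit. A minimal pure-Python sketch of that convention; `ClassNamePrefixSketchMixin` and `MeanCenterer` are hypothetical, and the real scikit-learn mixin also validates input feature names:

```python
import numpy as np


class ClassNamePrefixSketchMixin:
    """Mimics class-name-prefix naming: '<lowercased class name><index>'."""

    def get_feature_names_out(self, input_features=None):
        # input_features is accepted for API compatibility but ignored in this sketch
        if not hasattr(self, "_n_features_out"):
            raise AttributeError("call fit() before get_feature_names_out()")
        prefix = type(self).__name__.lower()
        return np.asarray([f"{prefix}{i}" for i in range(self._n_features_out)], dtype=object)


class MeanCenterer(ClassNamePrefixSketchMixin):
    """Hypothetical transformer: the only naming requirement is setting _n_features_out in fit."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self._n_features_out = X.shape[1]  # what get_feature_names_out relies on
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_
```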
    opened by CarloLepelaars 14
Releases(0.6.14)
Owner
vincent d warmerdam
Solving problems involving data. Mostly NLP these days. AskMeAnything[tm].