Skoot is a lightweight Python library of machine learning transformer classes that interact with scikit-learn and pandas.

Taylor G Smith

Last update: Aug 20, 2022

Related tags

Machine Learning python data-science machine-learning scikit-learn pandas imbalanced-data skutil

Overview

skoot

Skoot is a lightweight Python library of machine learning transformer classes that interact with scikit-learn and pandas. Its objective is to expedite the data munging and pre-processing tasks that tend to consume so much of data science practitioners' time. See the documentation for more info.

Note that skoot is the preferred alternative to the now-deprecated skutil library.

Two minutes to model-readiness

Real-world data is nasty. Most data scientists spend the majority of their time on data cleansing tasks. With skoot, we can automate away much of the bespoke hacking that consumes that time.

In this example, we'll examine a common dataset (the adult dataset from the UCI machine learning repo) that requires significant pre-processing.

from skoot.datasets import load_adult_df
from skoot.feature_selection import FeatureFilter
from skoot.decomposition import SelectivePCA
from skoot.preprocessing import DummyEncoder
from skoot.utils.dataframe import get_numeric_columns
from skoot.utils.dataframe import get_categorical_columns
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# load the dataset with the skoot-native loader & split it
adult = load_adult_df(tgt_name="target")
y = adult.pop("target")
X_train, X_test, y_train, y_test = train_test_split(
    adult, y, random_state=42, test_size=0.2)
    
# get numeric and categorical feature names
num_cols = get_numeric_columns(X_train).columns
obj_cols = get_categorical_columns(X_train).columns

# exclude "education-num" from num_cols, since the pipeline drops that column
num_cols = num_cols[~(num_cols == "education-num")]
    
# build a pipeline
pipe = Pipeline([
    # drop the ordinal column that duplicates "education"
    ("dropper", FeatureFilter(cols=["education-num"])),
    
    # decompose the numeric features with PCA
    ("pca", SelectivePCA(cols=num_cols)),
    
    # dummy encode the categorical features
    ("dummy", DummyEncoder(cols=obj_cols, handle_unknown="ignore")),
    
    # and a simple classifier class
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42))
])

pipe.fit(X_train, y_train)

# produce predictions
preds = pipe.predict(X_test)
print("Test accuracy: %.3f" % accuracy_score(y_test, preds))
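
The `handle_unknown="ignore"` argument in the pipeline above matters because a test split can contain categorical levels that never appeared in training. As a rough, dependency-free illustration of the idea (this is not skoot's implementation; the helper names `fit_dummy_levels` and `dummy_encode` are made up here, and the all-zeros behaviour for unseen levels is an assumption modelled on scikit-learn's `OneHotEncoder(handle_unknown="ignore")`), dummy encoding with ignored unknowns might look like this:

```python
def fit_dummy_levels(rows, cols):
    """Record the sorted set of levels seen for each categorical column."""
    return {c: sorted({row[c] for row in rows}) for c in cols}


def dummy_encode(rows, levels):
    """One-hot encode the fitted columns; unseen levels become all zeros."""
    encoded = []
    for row in rows:
        # Pass through every column that was not fitted as categorical.
        out = {k: v for k, v in row.items() if k not in levels}
        for col, known in levels.items():
            for level in known:
                out[f"{col}_{level}"] = 1 if row[col] == level else 0
        encoded.append(out)
    return encoded


train = [
    {"age": 39, "workclass": "State-gov"},
    {"age": 50, "workclass": "Private"},
]
test = [{"age": 28, "workclass": "Never-worked"}]  # level unseen in training

levels = fit_dummy_levels(train, cols=["workclass"])
print(dummy_encode(train, levels)[0])
# {'age': 39, 'workclass_Private': 0, 'workclass_State-gov': 1}
print(dummy_encode(test, levels)[0])
# {'age': 28, 'workclass_Private': 0, 'workclass_State-gov': 0}
```

Because the encoded columns are fixed at fit time, the train and test frames end up with the same width, which is what the downstream classifier needs.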

For more tutorials, check out the documentation.

Comments

Windows: pip install not working

Hi, I can't install skoot via either pip or Anaconda.

> pip install skoot
Collecting skoot
  Could not find a version that satisfies the requirement skoot (from versions: )
No matching distribution found for skoot

Any ideas why that might be? Thank you!

opened by r0f1 2

Bump django from 1.11 to 1.11.29 in /build_tools/doc
Bumps django from 1.11 to 1.11.29.

Commits

f1e3017 [1.11.x] Bumped version for 1.11.29 release.

02d97f3 [1.11.x] Fixed CVE-2020-9402 -- Properly escaped tolerance parameter in GIS f...

e643833 [1.11.x] Pinned PyYAML < 5.3 in test requirements.

d0e3eb8 [1.11.x] Added CVE-2020-7471 to security archive.

9a62ed5 [1.11.x] Post-release version bump.

e09f09b [1.11.x] Bumped version for 1.11.28 release.

001b063 [1.11.x] Fixed CVE-2020-7471 -- Properly escaped StringAgg(delimiter) parameter.

7fd1ca3 [1.11.x] Fixed timezones tests for PyYAML 5.3+.

121115d [1.11.x] Added CVE-2019-19844 to the security archive.

2c4fb9a [1.11.x] Post-release version bump.

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 1
Bump django from 1.11 to 1.11.28 in /build_tools/doc
Bumps django from 1.11 to 1.11.28.

Commits

e09f09b [1.11.x] Bumped version for 1.11.28 release.

001b063 [1.11.x] Fixed CVE-2020-7471 -- Properly escaped StringAgg(delimiter) parameter.

7fd1ca3 [1.11.x] Fixed timezones tests for PyYAML 5.3+.

121115d [1.11.x] Added CVE-2019-19844 to the security archive.

2c4fb9a [1.11.x] Post-release version bump.

358973a [1.11.x] Bumped version for 1.11.27 release.

f4cff43 [1.11.x] Fixed CVE-2019-19844 -- Used verified user email for password reset ...

a235574 [1.11.x] Refs #31073 -- Added release notes for 02eff7ef60466da108b1a33f1e4dc...

e8fdf00 [1.11.x] Fixed #31073 -- Prevented CheckboxInput.get_context() from mutating ...

4f15016 [1.11.x] Post-release version bump.

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 1
Bump django from 1.11 to 1.11.23 in /build_tools/doc
Bumps django from 1.11 to 1.11.23.

Commits

9748977 [1.11.x] Bumped version for 1.11.23 release.

869b34e [1.11.x] Fixed CVE-2019-14235 -- Fixed potential memory exhaustion in django....

ed682a2 [1.11.x] Fixed CVE-2019-14234 -- Protected JSONField/HStoreField key and inde...

52479ac [1.11.x] Fixed CVE-2019-14233 -- Prevented excessive HTMLParser recursion in ...

42a66e9 [1.11.X] Fixed CVE-2019-14232 -- Adjusted regex to avoid backtracking issues ...

693046e [1.11.x] Added stub release notes for security releases.

6d054b5 [1.11.x] Added CVE-2019-12781 to the security release archive.

7c849b9 [1.11.x] Post-release version bump.

480380c [1.11.x] Bumped version for 1.11.22 release.

32124fc [1.11.x] Fixed CVE-2019-12781 -- Made HttpRequest always trust SECURE_PROXY_S...

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 1
Wrapped classes still reference sklearn user-guide

The "See Also" section of wrapped sklearn estimators still references sklearn user_guide refs. We need to monkey patch "Selective" (or whatever prefix we are using) in front of them so they link in our documentation.
bug

opened by tgsmith61591 1
Bump django from 1.11 to 2.2.24 in /build_tools/doc
Bumps django from 1.11 to 2.2.24.

Commits

2da029d [2.2.x] Bumped version for 2.2.24 release.

f27c38a [2.2.x] Fixed CVE-2021-33571 -- Prevented leading zeros in IPv4 addresses.

053cc95 [2.2.x] Fixed CVE-2021-33203 -- Fixed potential path-traversal via admindocs'...

6229d87 [2.2.x] Confirmed release date for Django 2.2.24.

f163ad5 [2.2.x] Added stub release notes and date for Django 2.2.24.

bed1755 [2.2.x] Changed IRC references to Libera.Chat.

63f0d7a [2.2.x] Refs #32718 -- Fixed file_storage.test_generate_filename and model_fi...

5fe4970 [2.2.x] Post-release version bump.

61f814f [2.2.x] Bumped version for 2.2.23 release.

b8ecb06 [2.2.x] Fixed #32718 -- Relaxed file name validation in FileField.

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
scipy._lib_version not found when building package

problem: error saying scipy._lib_version is missing when building skoot

cause: scipy._lib_version was removed in scipy 1.5.0 --> https://github.com/scipy/scipy/pull/11290 (downgrading to scipy 1.4.0 helps)

Thanks!

opened by AgroSimi 0
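
Going only by the report above (the build breaks from scipy 1.5.0 onward, and 1.4.0 works), a pre-build guard could be sketched as follows. The function names `release_tuple` and `scipy_too_new_for_skoot` are hypothetical and not part of skoot or scipy; the only library call used is the standard `importlib.metadata.version`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_text):
    """Leading numeric components of a version, e.g. '1.5.0rc1' -> (1, 5, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_text)
    if match is None:
        raise ValueError(f"unparseable version: {version_text!r}")
    return tuple(int(part) for part in match.group().split("."))


def scipy_too_new_for_skoot(version_text):
    """True for 1.5.0 and later; pre-release suffixes are ignored."""
    return release_tuple(version_text) >= (1, 5)


try:
    installed = version("scipy")
except PackageNotFoundError:
    installed = None  # scipy is not installed in this environment

if installed is None:
    print("scipy is not installed")
elif scipy_too_new_for_skoot(installed):
    print(f"scipy {installed} found; the report suggests downgrading to 1.4.0")
else:
    print(f"scipy {installed} predates the reported breakage")
```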
pip install Skoot on Mac keeps failing with ERROR: Could not find a version that satisfies the requirement skoot (from versions: none).

Description

pip install skoot on Mac keeps failing with:

ERROR: Could not find a version that satisfies the requirement skoot (from versions: none)
ERROR: No matching distribution found for skoot

Steps/Code to Reproduce

pip install skoot
using python version: Python 2.7.17
using pip version: pip 19.3.1

Expected Results

No errors thrown, successful installation of Skoot

Actual Results

ERROR: Could not find a version that satisfies the requirement skoot (from versions: none)
ERROR: No matching distribution found for skoot

Versions

platform - Darwin-19.2.0-x86_64-i386-64bit
sys - ('Python', '2.7.17 (default, Oct 24 2019, 12:57:47) \n[GCC 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.33.8)]')
Skoot - (not able to install)
numpy - ("NumPy", numpy.version)
scipy - ('SciPy', '1.2.3')
sklearn - scikit-learn->sklearn (1.16.6)

opened by lakshmikrish-97 8
[MRG] Mac builds

This PR adds builds for Mac. Currently, it does not deploy to PyPI; we still need the deploy-vars group on ADO. Since we decided to just do Mac + Linux for now, this branched off of add-azure... We can use that branch to play around with Windows, or create a new one.

opened by aaronreidsmith 1
Package Roadmap

Is skoot still an active project? Or is there a successor to this concept? Looking to build something similar for my specific workflow, but maybe it would be mutually beneficial to contribute to this project.

opened by MattConflitti 2
String fields with typos

Description

TODO: Create a transformer that can map values in text fields to known "good" values using Levenshtein distance or some other method.
enhancement

opened by tgsmith61591 0
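
The transformer proposed in the "String fields with typos" issue does not exist in skoot as far as this page shows, so the following is only a sketch of one way it could work. The class name `TypoCorrector` and its `known_values` and `max_distance` parameters are hypothetical, and the distance function is a plain dynamic-programming Levenshtein implementation rather than anything taken from the library.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class TypoCorrector:
    """Hypothetical transformer: snap strings to the nearest known value."""

    def __init__(self, known_values, max_distance=2):
        self.known_values = list(known_values)
        self.max_distance = max_distance

    def fit(self, values, y=None):
        return self  # nothing to learn; the vocabulary is supplied up front

    def transform(self, values):
        out = []
        for value in values:
            best = min(self.known_values, key=lambda k: levenshtein(value, k))
            # Leave the value alone when nothing is close enough.
            close = levenshtein(value, best) <= self.max_distance
            out.append(best if close else value)
        return out


corrector = TypoCorrector(["Private", "Self-employed", "State-gov"])
print(corrector.fit(None).transform(["Privte", "State-gvo", "Unknown"]))
# ['Private', 'State-gov', 'Unknown']
```

The `max_distance` cutoff is a design choice: without it, every unfamiliar string would be forced onto some known value, which would silently hide genuinely new categories.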

Releases(0.20.0)

0.20.0(Jul 25, 2019)

Initial release of Skoot. Wheels are only built for Linux for this release.

Owner

Taylor G Smith

Data scientist, ML engineer and all-around hacker. Java was once my first love, but I've long since converted to the cult of Python.

GitHub https://alkaline-ml.com/skoot

To design and implement the Identification of Iris Flower species using machine learning using Python and the tool Scikit-Learn.

1 Jan 11, 2022

Painless Machine Learning for python based on scikit-learn

PlainML Painless Machine Learning Library for python based on scikit-learn. Install pip install plainml Example from plainml import KnnModel, load_ir

1 Aug 6, 2022

Automated Machine Learning with scikit-learn

auto-sklearn auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. Find the documentation here

6.7k Jan 7, 2023

Relevance Vector Machine implementation using the scikit-learn API.

scikit-rvm scikit-rvm is a Python module implementing the Relevance Vector Machine (RVM) machine learning technique using the scikit-learn API. Quicks

204 Nov 18, 2022

scikit-fem is a lightweight Python 3.7+ library for performing finite element assembly.

scikit-fem is a lightweight Python 3.7+ library for performing finite element assembly. Its main purpose is the transformation of bilinear forms into sparse matrices and linear forms into vectors.

297 Dec 13, 2022

Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021

Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

pandas-method-chaining pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code. It is a fork from pandas-v

5 May 14, 2022

Iris species predictor app is used to classify iris species created using python's scikit-learn, fastapi, numpy and joblib packages.

Iris Species Predictor Iris species predictor app is used to classify iris species using their sepal length, sepal width, petal length and petal width

5 Apr 5, 2022

Penguins species predictor app is used to classify penguins species created using python's scikit-learn, fastapi, numpy and joblib packages.

Penguins Classification App Penguins species predictor app is used to classify penguins species using their island, sex, bill length (mm), bill depth

3 Apr 5, 2022

K-Means clustering example with Python and Scikit-learn

Unsupervised-Machine-Learning Flat Clustering K-Means clustering example with Python and Scikit-learn Flat clustering Clustering algorithms group a se

1 Dec 13, 2021

Predicting Baseball Metric Clusters: Clustering Application in Python Using scikit-learn

Clustering Clustering Application in Python Using scikit-learn This repository contains the prediction of baseball metric clusters using MLB Statcast

2 Apr 18, 2022

Pandas Machine Learning and Quant Finance Library Collection

148 Dec 7, 2022

A collection of Scikit-Learn compatible time series transformers and tools.

tsfeast A collection of Scikit-Learn compatible time series transformers and tools. Installation Create a virtual environment and install: From PyPi p

0 Mar 30, 2022

Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets

Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets Datasets Used: Iris dataset,

2 Nov 18, 2021

icepickle is to allow a safe way to serialize and deserialize linear scikit-learn models

icepickle It's a cooler way to store simple linear models. The goal of icepickle is to allow a safe way to serialize and deserialize linear scikit-lea

24 Dec 9, 2022

A scikit-learn based module for multi-label et. al. classification

scikit-multilearn scikit-multilearn is a Python module capable of performing multi-label learning tasks. It is built on-top of various scientific Pyth

802 Jan 1, 2023

Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models

Highly interpretable, sklearn-compatible classifier based on decision rules This is a scikit-learn compatible wrapper for the Bayesian Rule List class

482 Nov 19, 2022

Distributed scikit-learn meta-estimators in PySpark

sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn

282 Dec 9, 2022

Scikit-Learn useful pre-defined Pipelines Hub

Scikit-Pipes Scikit-Learn useful pre-defined Pipelines Hub Usage: Install scikit-pipes It's advised to install sklearn-genetic using a virtual env, in

1 Apr 26, 2022
