A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Machinalis

Last update: Nov 5, 2022

Related tags

Feature Engineering featureforge

Overview

Feature Forge

This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.). It is particularly helpful if you use scikit-learn, although it can also be used with other algorithms.

Most machine learning problems involve a step of feature definition and preprocessing. Feature Forge helps you with:

Defining and documenting features
Testing your features against specified cases and against randomly generated cases (stress-testing). This helps you make your application more robust against invalid or malformed input data. It also helps you check that low-relevance results in feature analysis are actually caused by a bad feature, and not by a subtle bug in your feature code.
Evaluating your features on a data set, producing a feature evaluation matrix. The evaluator has a robust mode that allows you some tolerance both for invalid data and buggy features.
Experimentation: running, registering, classifying and reproducing experiments for determining best settings for your problems.
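
The testing and evaluation ideas above can be illustrated without relying on Feature Forge's own API. The following is a minimal plain-Python sketch. The names word_count, check_specified_cases, stress_test and evaluate are hypothetical and are not part of the library, and the robust mode shown here (skipping samples on which a feature raises) is an assumption made for illustration.

```python
import random
import string


def word_count(text):
    """Hypothetical feature: number of whitespace-separated words."""
    return len(text.split())


def check_specified_cases(feature, cases):
    """Return (input, expected, actual) triples where the feature is wrong."""
    failures = []
    for value, expected in cases:
        actual = feature(value)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures


def stress_test(feature, n_cases=200, seed=0):
    """Feed random strings to the feature; the output must be a non-negative int."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + " \t\n"
    for _ in range(n_cases):
        length = rng.randint(0, 30)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        result = feature(text)
        if not isinstance(result, int) or result < 0:
            return False
    return True


def evaluate(features, data, robust=False):
    """Build an evaluation matrix: one row per sample, one column per feature.

    Assumption for this sketch: in robust mode, a sample on which any
    feature raises is skipped instead of aborting the whole evaluation.
    """
    matrix = []
    for sample in data:
        try:
            matrix.append([feature(sample) for feature in features])
        except Exception:
            if not robust:
                raise
    return matrix


failures = check_specified_cases(word_count, [("", 0), ("one two", 2)])
survived_stress_test = stress_test(word_count)
# The None sample makes word_count raise, so robust mode drops that row.
matrix = evaluate([word_count, len], ["a b c", None, "hello"], robust=True)
```

Keeping the specified cases, the random cases and the evaluation as separate steps makes it easier to tell a bad feature from a buggy one.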

Installation

Just pip install featureforge.

Documentation

Documentation is available at http://feature-forge.readthedocs.org/en/latest/

Contact information

Javier Mansilla <[email protected]> (jmansilla at github)
Daniel Moisset <[email protected]> (dmoisset at github)
Rafael Carrascosa <[email protected]> (rafacarrascosa at github)

Any contributions or suggestions are welcome. The official channel for them is GitHub pull requests or issues.

Changelog

0.1.7:

StatsManager API change (order of arguments swapped).
For experimentation, enabled a way of booking experiments forever.

0.1.6:

Bug fixes related to sparse matrices.
Small documentation improvements.
Reduced default logging verbosity.

0.1.5:

Using sparse numpy matrices by default.

0.1.4:

Removed the need to use a forked version of the Schema library.

0.1.3:

Added support for running and generating stats for experiments

0.1.2:

Fixing installer dependencies

0.1.1:

Added support for python 3
Added support for bag-of-words features

0.1:

Initial release

Comments

Test failing, schema validates integer as str

There is a test failing: test_feature_flattener.TestFeatureMappingFlattener. It is related to the fact that schema validates the integer 1 as a str without raising an error.

bug

opened by rafacarrascosa 1

Abusive memory usage

The following script consumes all 4 GB of RAM on my laptop:

from featureforge.vectorizer import Vectorizer

data = [i for i in range(20000)]
feature = lambda x: str(x)

vectorizer = Vectorizer([feature])
X = vectorizer.fit_transform(data, None)

I suspect this is a bug.
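
One way to reason about the memory figure is sketched below. It rests on an assumption that the report does not confirm: that each distinct string value becomes its own column in a dense float64 matrix. Under that assumption, 20000 samples with 20000 distinct values give a 20000 x 20000 matrix. The helper name dense_matrix_bytes is hypothetical.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed by a dense matrix of fixed-size values (8 bytes per float64)."""
    return n_rows * n_cols * bytes_per_value


n_samples = 20000          # len(data) in the script above
n_distinct_values = 20000  # str(x) is different for every x
size = dense_matrix_bytes(n_samples, n_distinct_values)
size_in_gb = size / 1e9    # gigabytes, taking 1 GB as 10**9 bytes
```

That comes to 3.2 GB for the matrix alone, which would be consistent with exhausting a 4 GB machine.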

bug

opened by rafacarrascosa 1

Does not install with pip3

root@5da98a0113fa:/# pip install featureforge
bash: pip: command not found
root@5da98a0113fa:/# pip3 install featureforge
Downloading/unpacking featureforge
  Downloading featureforge-0.1.6.tar.gz
  Running setup.py (path:/tmp/pip_build_root/featureforge/setup.py) egg_info for package featureforge
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
        long_description = open(os.path.join(base_path, 'README.rst')).read()
      File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
        long_description = open(os.path.join(base_path, 'README.rst')).read()
      File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/featureforge
Storing debug log for failure in /root/.pip/pip.log
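
The traceback shows README.rst being read with the default encoding, which was ASCII in this environment. A common remedy for this class of error is to pass an explicit encoding when opening the file. The sketch below reproduces the failure on a temporary stand-in file and shows the explicit-encoding variant. It is illustrative only and not necessarily the change the project made.

```python
import io
import os
import tempfile

# Create a stand-in README containing a non-ASCII character.
# U+00A0 (no-break space) is encoded in UTF-8 as the bytes 0xc2 0xa0.
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.rst")
with io.open(readme_path, "w", encoding="utf-8") as handle:
    handle.write(u"Feature\u00a0Forge\n")

# Decoding the file as ASCII fails, as in the traceback above.
try:
    with io.open(readme_path, encoding="ascii") as handle:
        handle.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding as UTF-8 succeeds regardless of the platform default.
with io.open(readme_path, encoding="utf-8") as handle:
    long_description = handle.read()
```

io.open accepts the encoding argument on both Python 2 and Python 3, which matters for a setup.py that has to run on both.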

opened by timrichd 6

in stats manager, booking_duration=None is not supported

This code from the documentation is not working because of this:

>>> from featureforge.experimentation.stats_manager import StatsManager
>>> sm = StatsManager(None, 'Your-database-name')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/francolq/.virtualenvs/lq-research/local/lib/python2.7/site-packages/featureforge/experimentation/stats_manager.py", line 62, in __init__
    self.booking_delta = timedelta(seconds=booking_duration)
TypeError: unsupported type for timedelta seconds component: NoneType
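
The error comes from passing None directly to timedelta. A minimal sketch of one way to accept None follows, assuming None is meant to stand for a booking that never expires. The helper make_booking_delta is hypothetical and is not the library's actual fix.

```python
from datetime import timedelta


def make_booking_delta(booking_duration):
    """Hypothetical helper: None means no expiry, otherwise a duration in seconds."""
    if booking_duration is None:
        return None
    return timedelta(seconds=booking_duration)


# Passing None straight to timedelta reproduces the reported TypeError.
try:
    timedelta(seconds=None)
    raised = False
except TypeError:
    raised = True

forever = make_booking_delta(None)
ten_minutes = make_booking_delta(600)
```

Code that later compares against the delta would need to treat None as a special case.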

fixed-on-develop

opened by francolq 1

stats manager should allow storing intermediate results

In a very long experiment, I would like to be able to submit results incrementally. This is useful if the experiment fails later, or if I want to run queries to see how it is going.
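
As an illustration of the requested behaviour, here is a minimal in-memory sketch. The class InMemoryResultStore, its methods, the experiment id and the result values are all hypothetical, made up for the example, and do not reflect the StatsManager API.

```python
class InMemoryResultStore:
    """Hypothetical store that accepts partial results for a running experiment."""

    def __init__(self):
        self._results = {}

    def submit_partial(self, experiment_id, partial):
        """Merge newly available results into what was stored before."""
        self._results.setdefault(experiment_id, {}).update(partial)

    def get(self, experiment_id):
        """Return a copy of everything stored so far for the experiment."""
        return dict(self._results.get(experiment_id, {}))


store = InMemoryResultStore()
store.submit_partial("exp-1", {"fold_1_accuracy": 0.81})  # made-up value
store.submit_partial("exp-1", {"fold_2_accuracy": 0.79})  # made-up value
so_far = store.get("exp-1")
```

If the experiment fails after the second call, the two stored values are still available for querying.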

opened by francolq 4

Include feature name in OutputValueError / InputValueError

Whenever a feature output / input check fails, there is no indication of which feature is to blame. This information is necessary in an environment with tens of features or more.
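
A minimal sketch of the idea follows: wrap each feature so that a failed output check reports the feature's name. FeatureError, checked and broken_length are hypothetical names and do not correspond to the library's OutputValueError / InputValueError implementation.

```python
class FeatureError(ValueError):
    """Hypothetical error that records which feature failed."""

    def __init__(self, feature_name, message):
        super().__init__("{}: {}".format(feature_name, message))
        self.feature_name = feature_name


def checked(feature, expected_type):
    """Wrap a feature so that a wrong output type names the offending feature."""
    name = getattr(feature, "__name__", repr(feature))

    def wrapper(value):
        result = feature(value)
        if not isinstance(result, expected_type):
            raise FeatureError(
                name,
                "expected {}, got {!r}".format(expected_type.__name__, result),
            )
        return result

    return wrapper


def broken_length(text):
    return str(len(text))  # deliberate bug: returns a str instead of an int


try:
    checked(broken_length, int)("abc")
    message = None
except FeatureError as error:
    message = str(error)
```

With many features in play, the name in the message points directly at the function to inspect.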

opened by rafacarrascosa 0

Experiment runner should take an optional argv argument

It's customary when providing APIs for runners to accept an optional argv argument to use instead of sys.argv. This makes it easier to build custom runners and to override or default arguments. It also makes the runner's argument parsing easier to unit test.

As an example of this API pattern in other places, you can take a look at https://github.com/docopt/docopt#api or https://docs.python.org/2/library/unittest.html#unittest.main
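
A minimal sketch of the pattern using argparse from the standard library follows. The function main and its config and --workers arguments are hypothetical and are not the Feature Forge runner's actual interface.

```python
import argparse


def main(argv=None):
    """Hypothetical runner entry point.

    When argv is None, argparse falls back to sys.argv[1:].
    """
    parser = argparse.ArgumentParser(description="Run an experiment (sketch).")
    parser.add_argument("config", help="path to a configuration file")
    parser.add_argument("--workers", type=int, default=1)
    return parser.parse_args(argv)


# A test or a custom runner can pass its own argument list.
args = main(["experiments.json", "--workers", "4"])
```

Because the argument list is a parameter, a unit test can exercise the parsing without patching sys.argv.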

opened by dmoisset 0

