A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Overview

Feature Forge

This library provides a set of tools that are useful in many machine learning applications (classification, clustering, regression, etc.). It is particularly helpful if you use scikit-learn, although it can also work with a different algorithm.

Most machine learning problems involve a step of feature definition and preprocessing. Feature Forge helps you with:

  • Defining and documenting features
  • Testing your features against specified cases and against randomly generated cases (stress-testing). This helps you make your application more robust against invalid or malformed input data. It also helps you check that low-relevance results in feature analysis are actually because the feature is bad, and not because there's a slight bug in your feature code.
  • Evaluating your features on a data set, producing a feature evaluation matrix. The evaluator has a robust mode that allows you some tolerance both for invalid data and buggy features.
  • Experimentation: running, registering, classifying and reproducing experiments for determining best settings for your problems.
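
The ideas in the list above can be illustrated without the library itself. The following is a minimal sketch in plain Python and does not use Feature Forge's API: the feature functions, the `random_point` generator and the `evaluate` helper with its `tolerant` flag are all hypothetical names chosen for illustration. It shows features as small documented functions, checks against specified and randomly generated cases, and a tolerant evaluation that skips invalid data points.

```python
import random
import string


def title_length(point):
    """Feature: number of characters in the 'title' field."""
    return len(point["title"])


def words_per_title(point):
    """Feature: number of whitespace-separated words in the 'title' field."""
    return len(point["title"].split())


def random_point(rng):
    """Generate a random data point for stress-testing (hypothetical helper)."""
    length = rng.randint(0, 40)
    alphabet = string.ascii_letters + "  "
    return {"title": "".join(rng.choice(alphabet) for _ in range(length))}


def evaluate(features, points, tolerant=False):
    """Build a matrix with one row per data point and one column per feature.

    With tolerant=True, a point on which any feature raises is skipped
    instead of aborting the whole evaluation.
    """
    matrix = []
    for point in points:
        try:
            matrix.append([feature(point) for feature in features])
        except Exception:
            if not tolerant:
                raise
    return matrix


features = [title_length, words_per_title]

# Specified cases: known input, known expected output.
assert title_length({"title": "abc"}) == 3
assert words_per_title({"title": "two words"}) == 2

# Stress test: outputs must be non-negative integers on random input.
rng = random.Random(0)
for _ in range(200):
    point = random_point(rng)
    for feature in features:
        value = feature(point)
        assert isinstance(value, int) and value >= 0

# Tolerant evaluation skips the malformed point (it has no 'title' key).
data = [{"title": "hello world"}, {"name": "oops"}, {"title": ""}]
matrix = evaluate(features, data, tolerant=True)
print(matrix)  # [[11, 2], [0, 0]]
```

Without `tolerant=True`, the same call raises `KeyError` on the second data point, which is the behaviour you would want while debugging a feature.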

Installation

Just pip install featureforge.

Documentation

Documentation is available at http://feature-forge.readthedocs.org/en/latest/

Contact information

Feature Forge is copyright 2014 Machinalis (http://www.machinalis.com/). Its primary authors are:

Contributions and suggestions are welcome. The official channel for them is GitHub pull requests or issues.

Changelog

0.1.7:
  • StatsManager API change (order of arguments swapped).
  • For experimentation, enabled a way of booking experiments forever.
0.1.6:
  • Bug fixes related to sparse matrices.
  • Small documentation improvements.
  • Reduced default logging verbosity.
0.1.5:
  • Using sparse numpy matrices by default.
0.1.4:
  • Removed the need for a forked version of the Schema library.
0.1.3:
  • Added support for running and generating stats for experiments
0.1.2:
  • Fixing installer dependencies
0.1.1:
  • Added support for python 3
  • Added support for bag-of-words features
0.1:
  • Initial release
Comments
  • Test failing, schema validates integer as str

    There is a test failing: test_feature_flattener.TestFeatureMappingFlattener. It is related to the fact that schema validates the integer 1 as a str without raising an error.

    bug 
    opened by rafacarrascosa 1
  • Abusive memory usage

    The following script consumes all 4 GB of RAM on my laptop:

    from featureforge.vectorizer import Vectorizer
    
    data = [i for i in range(20000)]
    feature = lambda x: str(x)
    
    vectorizer = Vectorizer([feature])
    X = vectorizer.fit_transform(data, None)
    

    I suspect this is a bug.

    bug 
    opened by rafacarrascosa 1
  • Does not install with pip3

    root@5da98a0113fa:/# pip install featureforge
    bash: pip: command not found
    root@5da98a0113fa:/# pip3 install featureforge
    Downloading/unpacking featureforge
      Downloading featureforge-0.1.6.tar.gz
      Running setup.py (path:/tmp/pip_build_root/featureforge/setup.py) egg_info for package featureforge
        Traceback (most recent call last):
          File "<string>", line 17, in <module>
          File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
            long_description = open(os.path.join(base_path, 'README.rst')).read()
          File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
            return codecs.ascii_decode(input, self.errors)[0]
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
    
      File "<string>", line 17, in <module>
    
      File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
    
        long_description = open(os.path.join(base_path, 'README.rst')).read()
    
      File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    
        return codecs.ascii_decode(input, self.errors)[0]
    
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
    
    ----------------------------------------
    Cleaning up...
    Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/featureforge
    Storing debug log for failure in /root/.pip/pip.log
    
    opened by timrichd 6
  • in stats manager, booking_duration=None is not supported

    This code from the documentation is not working because of this:

    >>> from featureforge.experimentation.stats_manager import StatsManager
    >>> sm = StatsManager(None, 'Your-database-name')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/francolq/.virtualenvs/lq-research/local/lib/python2.7/site-packages/featureforge/experimentation/stats_manager.py", line 62, in __init__
        self.booking_delta = timedelta(seconds=booking_duration)
    TypeError: unsupported type for timedelta seconds component: NoneType
    
    fixed-on-develop 
    opened by francolq 1
  • stats manager should allow storing intermediate results

    In a very long experiment, I would like to be able to submit results incrementally. This is useful if the experiment fails later, or if I want to run queries to see how it is going.

    opened by francolq 4
  • Include feature name in OutputValueError / InputValueError

    Whenever a feature output / input check fails, there's no indication as to which feature is to blame. It's necessary to know this in an environment with tens of features or more.

    opened by rafacarrascosa 0
  • Experiment runner should take an optional argv argument

    It's customary when providing APIs for runners to provide an optional argv argument to use instead of sys.argv. This allows building custom runners more easily or overriding/defaulting arguments. It also makes the runner's argument parsing easier to unit test.

    As an example of this API pattern in other places, you can take a look at https://github.com/docopt/docopt#api or https://docs.python.org/2/library/unittest.html#unittest.main

    opened by dmoisset 0
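
The optional argv pattern requested in the last issue above can be sketched as follows. This is a generic illustration using the standard library's argparse, not Feature Forge's actual runner: `build_parser`, `main`, the `config` positional argument and the `--verbose` flag are hypothetical names chosen for the example.

```python
import argparse


def build_parser():
    """Create the command-line parser (hypothetical options)."""
    parser = argparse.ArgumentParser(description="Run one experiment (illustrative).")
    parser.add_argument("config", help="path to an experiment configuration file")
    parser.add_argument("--verbose", action="store_true", help="print progress")
    return parser


def main(argv=None):
    """Parse argv if given; otherwise argparse falls back to sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    return {"config": args.config, "verbose": args.verbose}


# A custom runner or a unit test can pass its own argument list
# without touching sys.argv.
options = main(["experiments.json", "--verbose"])
print(options)  # {'config': 'experiments.json', 'verbose': True}
```

Because `parse_args(None)` reads `sys.argv[1:]`, calling `main()` with no arguments keeps the usual command-line behaviour, while tests and wrappers can supply defaults or overrides explicitly.
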
Owner
Machinalis