A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Overview

Feature Forge

This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.), and is particularly helpful if you use scikit-learn (although it can also work with other algorithms).

Most machine learning problems involve a step of feature definition and preprocessing. Feature Forge helps you with:

  • Defining and documenting features
  • Testing your features against specified cases and against randomly generated cases (stress-testing). This helps make your application more robust against invalid or malformed input data. It also helps you check that a low relevance score during feature analysis is really because the feature is bad, and not because of a subtle bug in your feature code.
  • Evaluating your features on a data set, producing a feature evaluation matrix. The evaluator has a robust mode that tolerates both invalid data and buggy features.
  • Experimentation: running, registering, classifying and reproducing experiments to determine the best settings for your problem.
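The stress-testing and robust-evaluation ideas listed above can be illustrated in plain Python. This is a minimal sketch, not Feature Forge's API: `InvalidInput`, `word_count`, `random_input`, `stress_test`, `evaluate` and `max_error_rate` are hypothetical names, and the policy of dropping a feature once it fails on more than half of the samples is an assumption made for illustration.

```python
import random
import string


class InvalidInput(ValueError):
    """Raised by a feature to cleanly reject malformed input (hypothetical)."""


def word_count(text):
    """Example feature: number of whitespace-separated tokens in a text."""
    if not isinstance(text, str):
        raise InvalidInput("expected str, got %s" % type(text).__name__)
    return len(text.split())


def random_input(rng):
    """Draw a random value of a random type, many of them invalid."""
    kind = rng.randrange(4)
    if kind == 0:
        return "".join(rng.choice(string.printable) for _ in range(rng.randrange(50)))
    if kind == 1:
        return rng.randint(-1000, 1000)
    if kind == 2:
        return None
    return [rng.random() for _ in range(rng.randrange(5))]


def stress_test(feature, output_type, n_cases=1000, seed=0):
    """Return the random inputs on which `feature` misbehaved.

    Rejecting an input with InvalidInput is fine; any other exception, or a
    result of the wrong type, is reported as a probable bug.
    """
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        value = random_input(rng)
        try:
            result = feature(value)
        except InvalidInput:
            continue
        except Exception:
            failures.append(value)
            continue
        if not isinstance(result, output_type):
            failures.append(value)
    return failures


_FAILED = object()  # marker for a feature that raised on a sample


def evaluate(features, samples, robust=True, max_error_rate=0.5):
    """Return (kept_features, matrix) with one row per surviving sample.

    With robust=False the first exception propagates. With robust=True a
    feature failing on more than `max_error_rate` of the samples is dropped
    as buggy, then samples on which a kept feature failed are skipped as
    invalid data.
    """
    table = []
    errors = [0] * len(features)
    for sample in samples:
        row = []
        for i, feature in enumerate(features):
            try:
                row.append(feature(sample))
            except Exception:
                if not robust:
                    raise
                errors[i] += 1
                row.append(_FAILED)
        table.append(row)
    n_samples = max(len(samples), 1)
    keep = [i for i, e in enumerate(errors) if e / n_samples <= max_error_rate]
    matrix = [[row[i] for i in keep] for row in table
              if all(row[i] is not _FAILED for i in keep)]
    return [features[i] for i in keep], matrix
```

The two-stage policy in `evaluate` (first drop features that fail too often, then drop samples that still fail) is one simple way to tell a buggy feature apart from bad data; a real evaluator may weigh these cases differently.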

Installation

Just run pip install featureforge.

Documentation

Documentation is available at http://feature-forge.readthedocs.org/en/latest/

Contact information

Feature Forge is copyright 2014 Machinalis (http://www.machinalis.com/). Its primary authors are:

Contributions and suggestions are welcome; the official channel for them is submitting GitHub pull requests or issues.

Changelog

0.1.7:
  • StatsManager API change (order of arguments swapped).
  • For experimentation, enabled a way of booking experiments forever.
0.1.6:
  • Bug fixes related to sparse matrices.
  • Small documentation improvements.
  • Reduced default logging verbosity.
0.1.5:
  • Using sparse numpy matrices by default.
0.1.4:
  • Removed the need to use a forked version of the Schema library.
0.1.3:
  • Added support for running and generating stats for experiments
0.1.2:
  • Fixed installer dependencies
0.1.1:
  • Added support for Python 3
  • Added support for bag-of-words features
0.1:
  • Initial release
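Version 0.1.5 switched to sparse matrices by default. A rough estimate shows why this matters for string-valued features such as the one in the "Abusive memory usage" report below. This sketch assumes each distinct string becomes its own one-hot column stored as float64 (a common encoding, not confirmed here as exactly what `Vectorizer` does) and ignores Python object overhead; `dense_bytes` and `csr_bytes` are hypothetical helpers.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Memory for a dense matrix whose entries are `itemsize` bytes each."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, value_size=8, index_size=4):
    """Approximate memory for a CSR sparse matrix.

    CSR stores one value and one column index per non-zero entry, plus a
    row-pointer array with n_rows + 1 entries.
    """
    return nnz * (value_size + index_size) + (n_rows + 1) * index_size


# Assumed one-hot encoding of 20000 distinct strings: a 20000 x 20000
# matrix with exactly one non-zero entry per row.
n = 20000
print(dense_bytes(n, n))    # 3200000000 bytes, about 3.2 GB
print(csr_bytes(n, nnz=n))  # 320004 bytes, about 0.3 MB
```

Under these assumptions the dense matrix alone needs about 3.2 GB, which is consistent with a 4 GB machine running out of memory, while the sparse version needs roughly 10,000 times less.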
Comments
  • Test failing, schema validates integer as str


    There is a failing test: test_feature_flattener.TestFeatureMappingFlattener. It is related to the fact that schema validates the integer 1 as a str without raising an error.

    bug 
    opened by rafacarrascosa 1
  • Abusive memory usage


    The following script consumes all 4 GB of RAM on my laptop:

    from featureforge.vectorizer import Vectorizer
    
    data = [i for i in range(20000)]
    feature = lambda x: str(x)
    
    vectorizer = Vectorizer([feature])
    X = vectorizer.fit_transform(data, None)
    

    I suspect this is a bug.

    bug 
    opened by rafacarrascosa 1
  • Does not install with pip3


    root@5da98a0113fa:/# pip install featureforge
    bash: pip: command not found
    root@5da98a0113fa:/# pip3 install featureforge
    Downloading/unpacking featureforge
      Downloading featureforge-0.1.6.tar.gz
      Running setup.py (path:/tmp/pip_build_root/featureforge/setup.py) egg_info for package featureforge
        Traceback (most recent call last):
          File "<string>", line 17, in <module>
          File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
            long_description = open(os.path.join(base_path, 'README.rst')).read()
          File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
            return codecs.ascii_decode(input, self.errors)[0]
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
    
      File "<string>", line 17, in <module>
    
      File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
    
        long_description = open(os.path.join(base_path, 'README.rst')).read()
    
      File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    
        return codecs.ascii_decode(input, self.errors)[0]
    
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
    
    ----------------------------------------
    Cleaning up...
    Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/featureforge
    Storing debug log for failure in /root/.pip/pip.log
    
    opened by timrichd 6
  • in stats manager, booking_duration=None is not supported


    The following code from the documentation does not work because of this:

    >>> from featureforge.experimentation.stats_manager import StatsManager
    >>> sm = StatsManager(None, 'Your-database-name')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/francolq/.virtualenvs/lq-research/local/lib/python2.7/site-packages/featureforge/experimentation/stats_manager.py", line 62, in __init__
        self.booking_delta = timedelta(seconds=booking_duration)
    TypeError: unsupported type for timedelta seconds component: NoneType
    
    fixed-on-develop 
    opened by francolq 1
  • stats manager should allow storing intermediate results


    In a very long experiment, I would like to be able to submit results incrementally. This is useful if the experiment fails later, or if I want to run queries to see how it is going.

    opened by francolq 4
  • Include feature name in OutputValueError / InputValueError


    Whenever a feature output / input check fails, there's no indication of which feature is to blame. Knowing this is necessary in an environment with tens of features or more.

    opened by rafacarrascosa 0
  • Experiment runner should take an optional argv argument


    It's customary, when providing APIs for runners, to provide an optional argv argument to use instead of sys.argv. This makes it easier to build custom runners or to override/default arguments. It also makes the runner's argument parsing easier to unit test.

    As an example of this API pattern in other places, you can take a look at https://github.com/docopt/docopt#api or https://docs.python.org/2/library/unittest.html#unittest.main

    opened by dmoisset 0
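The pip3 failure reported above comes from `setup.py` calling `open()` without an encoding: Python 3 then decodes with the locale's preferred encoding, which is ASCII in a bare container like the one in the report, while byte 0xc2 is the lead byte of a two-byte UTF-8 character (for example "©" or a non-breaking space). A minimal sketch of a fix, assuming README.rst is UTF-8; `read_long_description` is a hypothetical helper, not part of the released `setup.py`.

```python
import io


def read_long_description(path):
    """Read a UTF-8 text file the same way on Python 2 and Python 3.

    io.open with an explicit encoding does not depend on the locale, so a
    README containing non-ASCII characters no longer breaks installation in
    environments whose default encoding is ASCII.
    """
    with io.open(path, encoding="utf-8") as readme:
        return readme.read()


# Hypothetical use inside setup.py:
# long_description = read_long_description(os.path.join(base_path, "README.rst"))
```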
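For the `booking_duration=None` issue, one possible guard is to handle None before calling `timedelta()`. This sketch assumes None is meant to mean "book forever" (the option mentioned in the 0.1.7 changelog), which the report does not state explicitly; `booking_delta` and `booking_expired` are hypothetical names, not StatsManager's internals.

```python
from datetime import datetime, timedelta


def booking_delta(booking_duration):
    """Convert a duration in seconds to a timedelta, passing None through.

    timedelta(seconds=None) raises TypeError, which is the failure in the
    report; here None is kept as a sentinel (assumed to mean "never expires").
    """
    if booking_duration is None:
        return None
    return timedelta(seconds=booking_duration)


def booking_expired(booked_at, now, delta):
    """True if a booking made at `booked_at` has expired by `now`."""
    if delta is None:
        return False  # assumption: no delta means the booking lasts forever
    return now - booked_at > delta


start = datetime(2014, 1, 1, 12, 0, 0)
later = start + timedelta(hours=2)
print(booking_expired(start, later, booking_delta(3600)))  # True
print(booking_expired(start, later, booking_delta(None)))  # False
```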
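For the request to include the feature name in input/output errors, here is a sketch of an error type that records which feature rejected a value. `FeatureInputError` and `checked` are hypothetical names chosen for illustration; Feature Forge's own `InputValueError` / `OutputValueError` may be structured differently.

```python
class FeatureInputError(ValueError):
    """Hypothetical input-validation error that records the feature's name."""

    def __init__(self, feature_name, value, reason):
        self.feature_name = feature_name
        self.value = value
        message = "feature %r rejected input %r: %s" % (feature_name, value, reason)
        super(FeatureInputError, self).__init__(message)


def checked(feature, input_type):
    """Wrap `feature` so an input of the wrong type raises FeatureInputError."""
    name = getattr(feature, "__name__", repr(feature))

    def wrapper(value):
        if not isinstance(value, input_type):
            raise FeatureInputError(name, value, "expected %s" % input_type.__name__)
        return feature(value)

    wrapper.__name__ = name
    return wrapper


def text_length(text):
    """Example feature: number of characters in a text."""
    return len(text)


safe_length = checked(text_length, str)
try:
    safe_length(42)
except FeatureInputError as error:
    print(error)  # feature 'text_length' rejected input 42: expected str
```

With tens of features, the `feature_name` attribute also lets calling code group or count failures per feature instead of parsing the message.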
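The optional `argv` pattern requested for the experiment runner is straightforward with `argparse`, because `ArgumentParser.parse_args(None)` falls back to `sys.argv[1:]`. A sketch with a hypothetical runner; its `config` and `--dbname` arguments are invented for illustration and are not Feature Forge's actual options.

```python
import argparse


def main(argv=None):
    """Parse runner arguments from `argv`, defaulting to sys.argv[1:].

    A real runner would go on to run the experiment; this sketch only
    returns the parsed arguments.
    """
    parser = argparse.ArgumentParser(description="Hypothetical experiment runner")
    parser.add_argument("config", help="path to the experiment configuration")
    parser.add_argument("--dbname", default="experiments")
    # parse_args(None) reads sys.argv[1:], so command-line behaviour is
    # unchanged while callers and tests can pass an explicit list.
    return parser.parse_args(argv)


args = main(["settings.json", "--dbname", "test-db"])
print(args.config, args.dbname)  # settings.json test-db
```

Tests can then call `main([...])` directly instead of patching `sys.argv`.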
Owner
Machinalis
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

3.8k Feb 13, 2021
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

4.9k Jan 3, 2023
Scikit-learn compatible estimation of general graphical models

skggm : Gaussian graphical models using the scikit-learn API In the last decade, learning networks that encode conditional independence relationships

213 Jan 2, 2023
A scikit-learn-compatible module for estimating prediction intervals.

MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals using your favourite sklearn

SimAI 584 Dec 27, 2022
Python package for Bayesian Machine Learning with scikit-learn API

Python package for Bayesian Machine Learning with scikit-learn API Installing & Upgrading package pip install https://github.com/AmazaspShumik/sklearn

Amazasp Shaumyan 482 Jan 4, 2023
SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.

SciKit-Learn Laboratory This Python package provides command-line utilities to make it easier to run machine learning experiments with scikit-learn. O

ETS 528 Nov 25, 2022
scikit-learn: machine learning in Python

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started

scikit-learn 52.5k Jan 8, 2023
Automatically download the cwru data set, and then divide it into training data set and test data set

Automatically download the cwru data set, and then divide it into training data set and test data set.

6 Jun 27, 2022
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper] Authors: Chenhang He, Ruihuang Li, Shuai Li, L

Billy HE 141 Dec 30, 2022
scikit-learn inspired API for CRFsuite

sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface similar to scikit-learn. sklearn_crfsuite.CRF i

417 Dec 20, 2022
Genetic Programming in Python, with a scikit-learn inspired API

Welcome to gplearn! gplearn implements Genetic Programming in Python, with a scikit-learn inspired and compatible API. While Genetic Programming (GP)

Trevor Stephens 1.3k Jan 3, 2023
Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

Robert Martin 1.3k Dec 29, 2022
Regression Metrics Calculation Made easy for tensorflow2 and scikit-learn

Regression Metrics Installation To install the package from the PyPi repository you can execute the following command: pip install regressionmetrics I

Ashish Patel 11 Dec 16, 2022
A real-time speech emotion recognition application using Scikit-learn and gradio

Speech-Emotion-Recognition-App A real-time speech emotion recognition application using Scikit-learn and gradio. Requirements librosa==0.6.3 numpy sou

Son Tran 6 Oct 4, 2022
Genetic feature selection module for scikit-learn

sklearn-genetic Genetic feature selection module for scikit-learn Genetic algorithms mimic the process of natural selection to search for optimal valu

Manuel Calzolari 260 Dec 14, 2022
Use evolutionary algorithms instead of gridsearch in scikit-learn

sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter

rsteca 709 Jan 3, 2023
SigOpt wrappers for scikit-learn methods

SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In

SigOpt 73 Sep 30, 2022
Convert scikit-learn models to PyTorch modules

sk2torch sk2torch converts scikit-learn models into PyTorch modules that can be tuned with backpropagation and even compiled as TorchScript. Problems

Alex Nichol 101 Dec 16, 2022