A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Machinalis

Last update: Nov 5, 2022


Overview

Feature Forge

This library provides a set of tools that are useful in many machine learning applications (classification, clustering, regression, etc.). It is particularly helpful if you use scikit-learn, although it can also work with a different algorithm.

Most machine learning problems involve a step of feature definition and preprocessing. Feature Forge helps you with:

Defining and documenting features
Testing your features against specified cases and against randomly generated cases (stress-testing). This helps you make your application more robust against invalid or misformatted input data. It also helps you check that low-relevance results in feature analysis are actually because the feature is bad, and not because there's a slight bug in your feature code.
Evaluating your features on a data set, producing a feature evaluation matrix. The evaluator has a robust mode that allows you some tolerance both for invalid data and buggy features.
Experimentation: running, registering, classifying and reproducing experiments for determining best settings for your problems.
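
The testing idea in the list above can be illustrated without the library. What follows is a minimal sketch in plain Python, not Feature Forge's own API: the feature word_count and the helpers check_cases and stress_test are hypothetical names chosen for this example.

```python
import random
import string


def word_count(text):
    """Hypothetical feature: number of whitespace-separated words."""
    if not isinstance(text, str):
        raise ValueError("word_count expects a string")
    return len(text.split())


def check_cases(feature, cases):
    """Return the (input, expected, actual) triples that do not match."""
    return [(x, expected, feature(x)) for x, expected in cases
            if feature(x) != expected]


def stress_test(feature, n_cases=200, seed=0):
    """Feed randomly generated strings and check an output invariant."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + " \t\n"
    for _ in range(n_cases):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 50)))
        value = feature(text)
        # Invariant: a non-negative integer no larger than the input length.
        assert isinstance(value, int) and 0 <= value <= len(text)
    return n_cases


# Specified cases first, then randomly generated ones.
failures = check_cases(word_count, [("", 0), ("one", 1), ("two  words", 2)])
checked = stress_test(word_count)
```

The specified cases pin down the expected behaviour, while the random cases only check a property that must hold for every input, which is how this kind of test tends to catch crashes on unusual data.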

Installation

Just pip install featureforge.

Documentation

Documentation is available at http://feature-forge.readthedocs.org/en/latest/

Contact information

Javier Mansilla (jmansilla at github)
Daniel Moisset (dmoisset at github)
Rafael Carrascosa (rafacarrascosa at github)

Contributions and suggestions are welcome. The official channel for them is GitHub pull requests or issues.

Changelog

0.1.7:

StatsManager API change (order of arguments swapped).
For experimentation, enabled a way of booking experiments forever.

0.1.6:

Bug fixes related to sparse matrices.
Small documentation improvements.
Reduced default logging verbosity.

0.1.5:

Using sparse numpy matrices by default.

0.1.4:

Removed the need for a forked version of the Schema library.

0.1.3:

Added support for running and generating stats for experiments

0.1.2:

Fixing installer dependencies

0.1.1:

Added support for python 3
Added support for bag-of-words features

0.1:

Initial release

Comments

Test failing, schema validates integer as str

There is a test failing: test_feature_flattener.TestFeatureMappingFlattener. It is related to the fact that schema validates the integer 1 as a str without raising an error.
bug

opened by rafacarrascosa 1

Abusive memory usage

The following script consumes all 4 GB of RAM on my laptop:

from featureforge.vectorizer import Vectorizer

data = [i for i in range(20000)]
feature = lambda x: str(x)

vectorizer = Vectorizer([feature])
X = vectorizer.fit_transform(data, None)

I suspect this is a bug.
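
One possible explanation, stated here as an assumption rather than a confirmed diagnosis: if each distinct string returned by the feature becomes its own column (one-hot encoding) and the result is stored as a dense float64 matrix, the script asks for a 20000 x 20000 array. The arithmetic below estimates that cost and compares it with storing only the non-zero entries. The per-entry byte sizes are assumptions too.

```python
n_samples = 20000      # len(data) in the script above
n_columns = 20000      # one column per distinct string (assumed one-hot encoding)
bytes_per_float = 8    # float64

# Dense storage keeps every cell, including the zeros.
dense_bytes = n_samples * n_columns * bytes_per_float

# Sparse storage keeps one value plus two 4-byte indices per non-zero cell
# (a rough coordinate-format estimate); each row has exactly one non-zero.
sparse_bytes = n_samples * (bytes_per_float + 2 * 4)

print("dense:  %.1f GB" % (dense_bytes / 1e9))
print("sparse: %.2f MB" % (sparse_bytes / 1e6))
```

Under these assumptions the dense matrix alone needs about 3.2 GB, which would be consistent with exhausting a 4 GB machine.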

bug

opened by rafacarrascosa 1

Does not install with pip3

root@5da98a0113fa:/# pip install featureforge
bash: pip: command not found
root@5da98a0113fa:/# pip3 install featureforge
Downloading/unpacking featureforge
  Downloading featureforge-0.1.6.tar.gz
  Running setup.py (path:/tmp/pip_build_root/featureforge/setup.py) egg_info for package featureforge
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
        long_description = open(os.path.join(base_path, 'README.rst')).read()
      File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
    long_description = open(os.path.join(base_path, 'README.rst')).read()
  File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/featureforge
Storing debug log for failure in /root/.pip/pip.log
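
The traceback shows setup.py reading README.rst with the default codec, which is ASCII in this environment. A common remedy, sketched below as an assumption about the fix rather than the project's actual patch, is to pass an explicit encoding when opening the file. The temporary file and its contents are made up for the demonstration.

```python
import io
import os
import tempfile

# Write a stand-in README containing a non-ASCII character. U+00A0 encodes
# to the bytes 0xC2 0xA0 in UTF-8, the same lead byte as in the traceback.
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.rst")
with io.open(readme_path, "w", encoding="utf-8") as f:
    f.write(u"Feature\u00a0Forge\n")

# Reading as ASCII reproduces the failure regardless of the locale.
try:
    with io.open(readme_path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Reading with an explicit UTF-8 encoding does not depend on the locale.
with io.open(readme_path, encoding="utf-8") as f:
    long_description = f.read()
```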

opened by timrichd 6

in stats manager, booking_duration=None is not supported

This code from the documentation is not working because of this:

>>> from featureforge.experimentation.stats_manager import StatsManager
>>> sm = StatsManager(None, 'Your-database-name')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/francolq/.virtualenvs/lq-research/local/lib/python2.7/site-packages/featureforge/experimentation/stats_manager.py", line 62, in __init__
    self.booking_delta = timedelta(seconds=booking_duration)
TypeError: unsupported type for timedelta seconds component: NoneType
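
The error comes from passing None straight to timedelta. Below is a minimal sketch of one way to accept it, assuming None is meant as "book forever" (an interpretation, not confirmed behaviour of the library). make_booking_delta and is_expired are hypothetical helpers, not part of StatsManager.

```python
from datetime import datetime, timedelta


def make_booking_delta(booking_duration):
    """Return a timedelta, or None when the booking should never expire."""
    if booking_duration is None:
        return None  # assumed meaning: no expiry
    return timedelta(seconds=booking_duration)


def is_expired(booked_at, now, booking_delta):
    """A booking with no delta never expires."""
    if booking_delta is None:
        return False
    return now - booked_at > booking_delta


# Example timestamps one day apart.
start = datetime(2015, 1, 1)
later = datetime(2015, 1, 2)
expired_after_a_minute = is_expired(start, later, make_booking_delta(60))
expired_when_forever = is_expired(start, later, make_booking_delta(None))
```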

fixed-on-develop

opened by francolq 1

stats manager should allow storing intermediate results

In a very long experiment, I would like to be able to submit results incrementally. This is useful if the experiment fails later, or if I want to make queries to see how it is going.

opened by francolq 4

Include feature name in OutputValueError / InputValueError

Whenever a feature output / input check fails, there's no indication as to which feature is to blame. It's necessary to know this in an environment with tens of features or more.
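
One way to attach the feature name can be sketched in plain Python. The exception class and the evaluate helper below are stand-ins written for this example; they are not the library's own OutputValueError or evaluation code.

```python
class FeatureOutputError(ValueError):
    """Stand-in error that records which feature failed."""

    def __init__(self, feature_name, message):
        super().__init__("%s: %s" % (feature_name, message))
        self.feature_name = feature_name


def evaluate(feature, datapoint, expected_type):
    """Run a feature and name it in the error if its output is invalid."""
    value = feature(datapoint)
    if not isinstance(value, expected_type):
        name = getattr(feature, "__name__", repr(feature))
        raise FeatureOutputError(
            name, "expected %s, got %r" % (expected_type.__name__, value))
    return value


def broken_length(text):
    return str(len(text))  # deliberate bug: returns a string, not an int


try:
    evaluate(broken_length, "abc", int)
    message = None
except FeatureOutputError as error:
    message = str(error)
```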

opened by rafacarrascosa 0

Experiment runner should take an optional argv argument

It's customary when providing APIs for runners to provide an optional argv argument to use instead of sys.argv. This allows building custom runners more easily or overriding/defaulting arguments. It also makes the runner's argument parsing easier to unit test.

As an example of this API pattern in other places, you can take a look at https://github.com/docopt/docopt#api or https://docs.python.org/2/library/unittest.html#unittest.main
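
The pattern can be sketched with the standard library's argparse. The runner name main and the --config and --dry-run options are hypothetical; the point is only that parse_args(None) falls back to sys.argv[1:].

```python
import argparse


def main(argv=None):
    """Parse options from argv, or from sys.argv[1:] when argv is None."""
    parser = argparse.ArgumentParser(description="Hypothetical experiment runner")
    parser.add_argument("--config", default="experiments.json")
    parser.add_argument("--dry-run", action="store_true")
    return parser.parse_args(argv)


# A unit test or a custom runner passes its own argument list.
options = main(["--config", "custom.json", "--dry-run"])
```

Because the default is None, existing command-line use keeps working unchanged, while tests can call main with an explicit list.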

opened by dmoisset 0

Owner

Machinalis

GitHub

A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

3.8k Feb 13, 2021

A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

4.9k Jan 3, 2023

Scikit-learn compatible estimation of general graphical models

skggm : Gaussian graphical models using the scikit-learn API In the last decade, learning networks that encode conditional independence relationships

213 Jan 2, 2023

A scikit-learn-compatible module for estimating prediction intervals.

|Anaconda|_ MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals using your favourite sklearn

584 Dec 27, 2022

Python package for Bayesian Machine Learning with scikit-learn API

Python package for Bayesian Machine Learning with scikit-learn API Installing & Upgrading package pip install https://github.com/AmazaspShumik/sklearn

482 Jan 4, 2023

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.

SciKit-Learn Laboratory This Python package provides command-line utilities to make it easier to run machine learning experiments with scikit-learn. O

528 Nov 25, 2022

scikit-learn: machine learning in Python

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started

52.5k Jan 8, 2023

Automatically download the cwru data set, and then divide it into training data set and test data set

Automatically download the cwru data set, and then divide it into training data set and test data set.

6 Jun 27, 2022

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper] Authors: Chenhang He, Ruihuang Li, Shuai Li, L

141 Dec 30, 2022

scikit-learn inspired API for CRFsuite

sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i

417 Dec 20, 2022

Genetic Programming in Python, with a scikit-learn inspired API

Welcome to gplearn! gplearn implements Genetic Programming in Python, with a scikit-learn inspired and compatible API. While Genetic Programming (GP)

1.3k Jan 3, 2023

Using python and scikit-learn to make stock predictions

MachineLearningStocks in python: a starter project and guide EDIT as of Feb 2021: MachineLearningStocks is no longer actively maintained MachineLearni

1.3k Dec 29, 2022

Regression Metrics Calculation Made easy for tensorflow2 and scikit-learn

Regression Metrics Installation To install the package from the PyPi repository you can execute the following command: pip install regressionmetrics I

11 Dec 16, 2022

A real-time speech emotion recognition application using Scikit-learn and gradio

Speech-Emotion-Recognition-App A real-time speech emotion recognition application using Scikit-learn and gradio. Requirements librosa==0.6.3 numpy sou

6 Oct 4, 2022

Genetic feature selection module for scikit-learn

sklearn-genetic Genetic feature selection module for scikit-learn Genetic algorithms mimic the process of natural selection to search for optimal valu

260 Dec 14, 2022

Use evolutionary algorithms instead of gridsearch in scikit-learn

sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter

709 Jan 3, 2023

SigOpt wrappers for scikit-learn methods

SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In

73 Sep 30, 2022

Convert scikit-learn models to PyTorch modules

sk2torch sk2torch converts scikit-learn models into PyTorch modules that can be tuned with backpropagation and even compiled as TorchScript. Problems

101 Dec 16, 2022

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl

6.1k Jan 5, 2023
