MLPrimitives

Pipelines and primitives for machine learning and data science.

Overview

This repository contains primitive annotations to be used by the MLBlocks library, as well as the necessary Python code to make some of them fully compatible with the MLBlocks API requirements.

There is also a collection of custom primitives contributed directly to this library, which either combine third party tools or implement new functionalities from scratch.

Why did we create this library?

  • Too many libraries in a fast-growing field
  • Huge societal need to build machine learning apps
  • Domain expertise resides in several different places (knowledge of math)
  • No documented information about hyperparameters, behavior...

Installation

Requirements

MLPrimitives has been developed and tested on Python 3.6, 3.7 and 3.8.

Although it is not strictly required, the use of a virtualenv is highly recommended in order to avoid interfering with other software installed on the system where MLPrimitives is run.
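
For example, a virtualenv can be created and activated like this before installing (POSIX shell commands; the environment name mlprimitives-venv is just an example):

python3 -m venv mlprimitives-venv
source mlprimitives-venv/bin/activate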

Install with pip

The easiest and recommended way to install MLPrimitives is using pip:

pip install mlprimitives

This will pull and install the latest stable release from PyPI.

If you want to install from source or contribute to the project, please read the Contributing Guide.

Quickstart

This section is a short series of tutorials to help you get started with MLPrimitives.

In the following steps you will learn how to load and run a primitive on some data.

Later on you will learn how to evaluate and improve the performance of a primitive by tuning its hyperparameters.

Running a Primitive

In this first tutorial, we will be executing a single primitive for data transformation.

1. Load a Primitive

The first step in order to run a primitive is to load it.

This will be done using the mlprimitives.load_primitive function, which will load the indicated primitive as an MLBlock object from MLBlocks.

In this case, we will load the mlprimitives.custom.feature_extraction.CategoricalEncoder primitive.

from mlprimitives import load_primitive

primitive = load_primitive('mlprimitives.custom.feature_extraction.CategoricalEncoder')

2. Load some data

The CategoricalEncoder is a transformation primitive which applies one-hot encoding to all the categorical columns of a pandas.DataFrame.

So, in order to be able to run our primitive, we will first load some data that contains categorical columns.

This can be done with the mlprimitives.datasets.load_census function:

from mlprimitives.datasets import load_census

dataset = load_census()

This dataset object has an attribute data which contains a table with several categorical columns.

We can have a look at this table by executing dataset.data.head(), which will return a table like this (shown transposed here, with the dataset columns as rows):

                             0                    1                   2
age                         39                   50                  38
workclass            State-gov     Self-emp-not-inc             Private
fnlwgt                   77516                83311              215646
education            Bachelors            Bachelors             HS-grad
education-num               13                   13                   9
marital-status   Never-married   Married-civ-spouse            Divorced
occupation        Adm-clerical      Exec-managerial   Handlers-cleaners
relationship     Not-in-family              Husband       Not-in-family
race                     White                White               White
sex                       Male                 Male                Male
capital-gain              2174                    0                   0
capital-loss                 0                    0                   0
hours-per-week              40                   13                  40
native-country   United-States        United-States       United-States

3. Fit the primitive

In order to run our primitive, we first need to fit it.

This is the process in which the primitive analyzes the data to detect which columns are categorical.

This is done by calling its fit method and passing dataset.data as X.

primitive.fit(X=dataset.data)

4. Produce results

Once the primitive is fit, we can process the data by calling the produce method of the primitive instance, again passing the data as X.

transformed = primitive.produce(X=dataset.data)

After this is done, we can see how the transformed data contains the newly generated one-hot vectors (again shown transposed):

                                                0      1       2       3       4
age                                            39     50      38      53      28
fnlwgt                                      77516  83311  215646  234721  338409
education-num                                  13     13       9       7      13
capital-gain                                 2174      0       0       0       0
capital-loss                                    0      0       0       0       0
hours-per-week                                 40     13      40      40      40
workclass= Private                              0      0       1       1       1
workclass= Self-emp-not-inc                     0      1       0       0       0
workclass= Local-gov                            0      0       0       0       0
workclass= ?                                    0      0       0       0       0
workclass= State-gov                            1      0       0       0       0
workclass= Self-emp-inc                         0      0       0       0       0
...                                             ...    ...     ...     ...     ...

Tuning a Primitive

In this short tutorial we will teach you how to evaluate the performance of a primitive and improve it by modifying its hyperparameters.

To do so, we will load a primitive that can learn from the transformed data that we just generated and later on make predictions based on new data.

1. Load another primitive

First of all, we will load the xgboost.XGBClassifier primitive that we will use afterwards.

primitive = load_primitive('xgboost.XGBClassifier')

2. Split the dataset

Before evaluating the primitive's performance, we need to split the data into two parts: train, which the primitive will use to learn, and test, which we will use to make the predictions that will later be evaluated.

In order to do this, we will take the first 75% of rows from the transformed data obtained above and call them X_train, and then set the remaining 25% of rows as X_test.

train_size = int(len(transformed) * 0.75)
X_train = transformed.iloc[:train_size]
X_test = transformed.iloc[train_size:]

Similarly, we need to obtain the y_train and y_test variables containing the corresponding output values.

y_train = dataset.target[:train_size]
y_test = dataset.target[train_size:]

3. Fit the new primitive

Once we have split the data, we can fit the primitive by passing X_train and y_train to its fit method.

primitive.fit(X=X_train, y=y_train)

4. Make predictions

Once the primitive has been fitted, we can produce predictions using the X_test data as input.

predictions = primitive.produce(X=X_test)

5. Evaluate the performance

We can now evaluate how good the predictions from our primitive are by using the score method of the dataset object, passing it both the expected output and the output predicted by the primitive:

dataset.score(y_test, predictions)

This will output a float value between 0 and 1 indicating how good the predictions are, where 0 is the worst possible score and 1 the best.

In this case we will obtain a score around 0.866.

6. Set new hyperparameter values

In order to improve the performance of our primitive we will try to modify a couple of its hyperparameters.

First we will see which hyperparameter values the primitive has by calling its get_hyperparameters method.

primitive.get_hyperparameters()

which will return a dictionary like this:

{
    "n_jobs": -1,
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
    "gamma": 0,
    "min_child_weight": 1
}

Next, we will see what the valid values for each of those hyperparameters are by calling its get_tunable_hyperparameters method:

primitive.get_tunable_hyperparameters()

For example, we will see that the max_depth hyperparameter has the following specification:

{
    "type": "int",
    "default": 3,
    "range": [
        3,
        10
    ]
}

Next, we will choose a valid value, for example 7, and set it on the primitive using the set_hyperparameters method:

primitive.set_hyperparameters({'max_depth': 7})

7. Re-evaluate the performance

Once the new hyperparameter value has been set, we repeat the fit/produce/score cycle to evaluate the performance with this new value:

primitive.fit(X=X_train, y=y_train)
predictions = primitive.produce(X=X_test)
dataset.score(y_test, predictions)

This time we should obtain a score around 0.724. Note that changing a hyperparameter does not always improve the score; each candidate value has to be evaluated with this same cycle.
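
To explore the whole valid range of a hyperparameter instead of a single value, the same cycle can be wrapped in a loop, using only the calls shown above:

for max_depth in range(3, 11):
    primitive.set_hyperparameters({'max_depth': max_depth})
    primitive.fit(X=X_train, y=y_train)
    predictions = primitive.produce(X=X_test)
    print(max_depth, dataset.score(y_test, predictions))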

What's Next?

Do you want to learn more about the project, how to contribute to it, or browse the API Reference? Please check the corresponding sections of the documentation!

Comments
  • mlprimitives/candidates/timseries.py lacks ability to return sequences of desired length

    I want to push all the final touches and changes that were made after MLPrimitives release v0.1.4 in order to make the LSTM pipeline primitives work as expected, since there are bugs that prevent them from being used now.

    I will push the changes to the primitives that are currently in the working demo as tested on private servers and repos.

    wontfix 
    opened by itinawi 7
  • Add DSP primitives

    I'd like to add primitives for Digital Signal Processing (DSP). These primitives will be used to detect anomalies in the telemetry data received from different satellites as part of project Orion, but they can be used to detect anomalies in any system.

    Filename: dsp.py
    Classname: freqAnalysis
    Methods:

    • fitFreqMax
    • fitFreqStdDev
    • produceFreqCompare
    • produceBPF

    There are some more methods used internally in the class: windowDesign, nextPowerOf2

    I also wrote the first JSON file: dsp.stdDevFreqAnalysis.json. This file points to the fitFreqStdDev method for data fitting and to the produceFreqCompare method for making predictions. Later I'll write more JSON files pointing to the other methods in the freqAnalysis class.

    I also wrote two more classes for testing and scoring:

    Filename: dsp_utils.py
    Classname: SIGEN
    Methods:

    • sigen
    • anomaliesGenerator
    • noiseGenerator
    • signalGenerator
    • show

    Classname: EVAL
    Methods:

    • score
    • evaluate
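
    For reference, a rough sketch of the kind of fit/produce pair described above (hypothetical names and plain numpy, not the actual code from dsp.py):

    import numpy as np

    def fit_freq_std_dev(X):
        # Learn the mean and standard deviation of the magnitude spectrum
        # across a set of training windows (one window per row of X).
        spectra = np.abs(np.fft.rfft(X, axis=1))
        return spectra.mean(axis=0), spectra.std(axis=0)

    def produce_freq_compare(X, mean, std, k=3.0):
        # Flag windows whose spectrum deviates from the learned mean by
        # more than k standard deviations in any frequency bin.
        spectra = np.abs(np.fft.rfft(X, axis=1))
        return (np.abs(spectra - mean) > k * std).any(axis=1)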

    new primitives 
    opened by rjdiez 7
  • Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor

    • MLPrimitives version:0.1.10
    • Python version:N/A
    • Operating System:N/A

    Description

    There is a typo in line 74 of MLPrimitives/mlprimitives/jsons/keras.Sequential.LSTMTimeSeriesRegressor.json: it says bastch_size and should say batch_size. Otherwise, the batch size of the training cannot be tuned.

    What I Did

    Simply correct the typo and it works :)
    
    bug approved 
    opened by DanielCalvoCerezo 4
  • Potential dependency conflicts between mlprimitives and numpy

    Hi, as shown in the following full dependency graph of mlprimitives: mlprimitives requires numpy >=1.15.2,<1.17 and statsmodels >=0.9.0,<1 (statsmodels 0.11.1 will be installed, i.e., the newest version satisfying the version constraint), and the direct dependency statsmodels 0.11.1 transitively introduces numpy >=1.14.

    Obviously, there are multiple version constraints set for numpy in this project. According to pip's “first found wins” installation strategy, numpy 1.16.6 (i.e., the newest version satisfying the constraint >=1.15.2,<1.17) is the version actually installed.

    Although the first-found version numpy 1.16.6 satisfies the later dependency constraint (numpy >=1.15.2,<1.17), the installed version is very close to the upper bound of the numpy constraint specified by statsmodels 0.11.1.

    Once statsmodels is upgraded, its newest version will be installed. This can easily cause a dependency conflict (build failure) if the upgraded statsmodels version requires a higher version of numpy, violating the other version constraint >=1.15.2,<1.17.

    According to the release history of statsmodels, it habitually raises its numpy constraint in recent releases. For instance, statsmodels 0.11.0rc1 raised the numpy constraint from >=1.11 to >=1.14, and the next statsmodels version raised it from >=1.14 to >=1.15.

    As such, this is a friendly warning about a potential dependency conflict issue in mlprimitives.

    Dependency tree

    mlprimitives  - 0.2.4
    | +- docutils(install version:0.15.2 version range:>=0.10,<0.16)
    | +- featuretools(install version:0.11.0 version range:<0.12,>=0.6.1)
    | | +- backports.tempfile(install version:1.0 version range:>=1.0)
    | | +- click(install version:7.1.1 version range:>=7.0.0)
    | | +- cloudpickle(install version:1.3.0 version range:>=0.4.0)
    | | +- dask(install version:2.14.0 version range:>=1.1.0)
    | | +- dask(install version:1.2.2 version range:<2.0.0)
    | | +- distributed(install version:2.14.0 version range:>=1.24.2)
    | | | +- click (install version:7.1.1 version range:>=6.6)
    | | | +- cloudpickle (install version:1.3.0 version range:>=0.2.2)
    | | | +- dask (install version:2.14.0 version range:>=2.9.0)
    | | | +- msgpack (install version:1.0.0 version range:>=0.6.0)
    | | | +- psutil (install version:5.7.0 version range:>=5.0)
    | | | +- pyyaml(install version:5.3.1 version range:*)
    | | | +- setuptools(install version:46.1.3 version range:*)
    | | +- distributed(install version:1.28.1 version range:<2.0.0)
    | | | +- click(install version:7.1.1 version range:>=6.6)
    | | | +- cloudpickle(install version:1.3.0 version range:>=0.2.2)
    | | | +- dask(install version:2.14.0 version range:>=0.18.0)
    | | | +- futures(install version:3.3.0 version range:*)
    | | | +- msgpack(install version:1.0.0 version range:*)
    | | | +- psutil(install version:5.7.0 version range:>=5.0)
    | | | +- pyyaml(install version:5.3.1 version range:*)
    | | | +- singledispatch(install version:3.4.0.3 version range:*)
    | | | | +- six(install version:1.14.0 version range:*)
    | | | +- six(install version:1.14.0 version range:*)
    | | +- funcsigs(install version:1.0.2 version range:>=1.0.2)
    | | +- future(install version:0.18.2 version range:>=0.16.0)
    | | +- numpy(install version:1.16.6 version range:>=1.13.3)
    | | +- pandas(install version:0.24.2 version range:>=0.23.0)
    | | +- pathlib(install version:1.0.1 version range:>=1.0.1)
    | | +- psutil(install version:5.7.0 version range:>=5.4.8)
    | | +- pyyaml(install version:5.3.1 version range:>=3.12)
    | | +- s3fs(install version:0.4.2 version range:>=0.2.2)
    | | | +- botocore(install version:1.15.39 version range:>=1.12.91)
    | | | | +- docutils(install version:0.15.2 version range:>=0.10,<0.16)
    | | | | +- jmespath(install version:0.10.0 version range:>=0.7.1,<1.0.0)
    | | | | +- python-dateutil(install version:2.8.1 version range:>=2.1,<3.0.0)
    | | | | +- urllib3(install version:1.25.9 version range:>=1.20,<1.26)
    | | | +- fsspec(install version:0.7.2 version range:>=0.6.0)
    | | +- scikit-learn(install version:0.20.4 version range:>=0.20.0)
    | | +- scikit-learn(install version:0.20.4 version range:<0.21,>=0.20.0)
    | | +- smart-open(install version:1.11.1 version range:>=1.8.4)
    | | | +- boto(install version:2.49.0 version range:*)
    | | | +- boto3(install version:1.12.39 version range:*)
    | | | | +- botocore(install version:1.15.49 version range:>=1.15.39,<1.16.0)
    | | | | +- jmespath(install version:0.10.0 version range:>=0.7.1,<1.0.0)
    | | | | +- s3transfer(install version:0.3.3 version range:>=0.3.0,<0.4.0)
    | | | +- requests(install version:2.23.0 version range:*)
    | | | | +- certifi(install version:2020.4.5.1 version range:>=2017.4.17)
    | | | | +- chardet(install version:3.0.4 version range:>=3.0.2,<4)
    | | | | +- idna(install version:2.9 version range:>=2.5,<3)
    | | | | +- urllib3(install version:1.25.9 version range:>=1.21.1,<1.26)
    | | +- tqdm(install version:4.45.0 version range:>=4.32.0)
    | +- iso639(install version:0.1.4 version range:<0.2,>=0.1.4)
    | +- keras(install version:2.3.1 version range:<3,>=2.1.6)
    | | +- h5py(install version:2.10.0 version range:*)
    | | +- keras_applications(install version: version range:>=1.0.6)
    | | +- keras_preprocessing(install version: version range:>=1.0.5)
    | | +- numpy(install version:1.16.6 version range:>=1.9.1)
    | | +- pyyaml(install version:5.3.1 version range:*)
    | | +- scipy(install version:1.4.1 version range:>=0.14)
    | | +- six(install version:1.14.0 version range:>=1.9.0)
    | +- langdetect(install version:1.0.8 version range:<2,>=1.0.7)
    | | +- six(install version:1.14.0 version range:*)
    | +- lightfm(install version:1.15 version range:>=1.15,<2)
    | | +- 1-15(install version: version range:*)
    | | +- numpy(install version:1.16.6 version range:*)
    | | +- requests(install version:2.23.0 version range:*)
    | | | +- certifi(install version:2020.4.5.1 version range:>=2017.4.17)
    | | | +- chardet(install version:3.0.4 version range:>=3.0.2,<4)
    | | | +- idna(install version:2.9 version range:>=2.5,<3)
    | | | +- urllib3(install version:1.25.9 version range:>=1.21.1,<1.26)
    | | +- scipy(install version:1.4.1 version range:>=0.17.0)
    | +- mlblocks(install version:0.3.4 version range:>=0.3.4,<0.4)
    | +- networkx(install version:2.4 version range:>=2.0,<3)
    | | +- decorator(install version:4.4.2 version range:>=4.3.0)
    | +- nltk(install version:3.5 version range:>=3.3,<4)
    | +- numpy(install version:1.16.6 version range:>=1.15.2,<1.17)
    | +- opencv-python(install version:4.2.0.34 version range:<5,>=3.4.0.12)
    | +- pandas(install version:0.24.2 version range:>=0.23.4,<0.25)
    | +- python-louvain(install version:0.13 version range:>=0.10,<0.14)
    | | +- networkx(install version:2.4 version range:*)
    | | | +- decorator(install version:4.4.2 version range:>=4.3.0)
    | +- scikit-image(install version:0.14.5 version range:<0.15,>=0.13.1)
    | +- scikit-learn(install version:0.20.4 version range:>=0.20.0,<0.21)
    | +- scipy(install version:1.4.1 version range:<2,>=1.1.0)
    | +- setuptools(install version:46.1.3 version range:>=41.0.0)
    | +- statsmodels(install version:0.11.1 version range:<1,>=0.9.0)
    | | +- numpy(install version:1.16.6 version range:>=1.14)
    | | +- pandas(install version:0.24.2 version range:>=0.21)
    | | +- patsy(install version:0.5.1 version range:>=0.5)
    | | | +- numpy(install version:1.16.6 version range:>=1.4)
    | | | +- six(install version:1.14.0 version range:*)
    | | +- scipy(install version:1.4.1 version range:>=1.0)
    | +- tensorflow(install version:1.15.2 version range:<2,>=1.11.0)
    | +- xgboost(install version:0.90 version range:<1,>=0.72.1)
    

    Thanks for your help. Best, Neolith

    opened by NeolithEra 3
  • Add LSTM-CycleGAN primitive for time series anomaly detection

    resolve #200

    Add prototype of LSTM-CycleGAN and corresponding error-calculation primitives.

    This GAN architecture makes it possible to encode and decode a time-series signal. For the error calculation, a combination of the reconstruction error and the score from the critic network is used.

    Since the model is still under development, the primitives are located in the candidates folder.
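
    As a sketch of that error-calculation idea (hypothetical names, not the actual candidate code):

    import numpy as np

    def score_anomalies(x, x_hat, critic_score, alpha=0.5):
        # Combine the pointwise reconstruction error with the critic
        # score into a single anomaly score per time step.
        reconstruction_error = np.abs(x - x_hat)
        return alpha * reconstruction_error + (1 - alpha) * critic_score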

    opened by AlexanderGeiger 3
  • Add new primitive: Arima model

    Description

    ARIMA models are often used to describe time series data, so we should add an ARIMA primitive for time series forecasting. We can use the statsmodels library.

    The primitive takes an array as input, on which an ARIMA model is fitted. The forecast method then returns an array of predictions.
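
    As a rough illustration (not the actual code from the branch), such an adapter might look like this, assuming the newer statsmodels.tsa.arima.model.ARIMA API:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    class ARIMAForecaster:
        # Hypothetical adapter: fit an ARIMA(p, d, q) model on a
        # 1-d array and forecast the next `steps` values.
        def __init__(self, p=1, d=0, q=0, steps=1):
            self.order = (p, d, q)
            self.steps = steps

        def fit(self, X):
            self.result = ARIMA(np.asarray(X), order=self.order).fit()

        def forecast(self):
            return self.result.forecast(self.steps)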

    What I Did

    I started implementing this primitive for testing purposes in the arima branch on my fork, which you can check out. Concretely, I included an adapter for statsmodels and a Primitive JSON file.

    Any feedback on the primitive itself and the implementation would be highly appreciated.

    new primitives approved 
    opened by AlexanderGeiger 3
  • Add Primitives for Error calculation, smoothing, and thresholding

    I want to add primitives for dynamic error calculation, smoothing, and thresholding.

    I will be adding the code for these functions to MLPrimitives as its own python file, to be called MLPrimitives/mlprimitives/dynamic_error_thresholding.py

    The implementation follows Section 3.2 of https://arxiv.org/pdf/1802.04431.pdf

    dynamic_error_thresholding.py software architecture:

    It will be a Python class called ErrorThresholder, which is initialized with:

    • y_true (np array): array of test targets corresponding to the true values to be predicted at the end of each sequence
    • y_hat (np array): predicted test values for each timestep in y_test
    • smoothed (bool): if False, return unsmoothed errors (used for assessing the quality of predictions)

    Some parameters have default values:

    • BATCH_SIZE = 70 # number of values to evaluate in each batch
    • WINDOW_SIZE = 30 # number of trailing batches to use in error calculation
    • SMOOTHING_PERC = 0.05 # determines the window size used in EWMA smoothing (percentage of total values for the channel)

    Methods:

    • get_errors(inputs: y_true and y_hat from self, smoothed (boolean)): returns errors (a list of errors). Calculates the difference between predicted telemetry values and actual values, then smooths the residuals using EWMA to encourage identification of sustained errors/anomalies; a sketch follows this list.

    • process_errors(inputs: y_true, y_hat, smoothed_errors): returns sequence_anomalies (a list of anomalies detected from the errors) and anomaly_scores (a score for each anomaly sequence).

    • other helper methods for these two functions, as needed
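
    A minimal sketch of the smoothing step described above (hypothetical signature; pandas' ewm is one common way to compute the EWMA):

    import numpy as np
    import pandas as pd

    def get_errors(y_true, y_hat, smoothed=True, smoothing_perc=0.05):
        # Pointwise residuals between true and predicted values.
        errors = np.abs(np.asarray(y_true) - np.asarray(y_hat))
        if smoothed:
            # Smoothing window sized as a percentage of the total number of values.
            span = max(1, int(smoothing_perc * len(errors)))
            errors = pd.Series(errors).ewm(span=span).mean().values
        return errors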

    new primitives 
    opened by itinawi 3
  • Add sklearn.neighbors primitives

    I would like to add some primitives from sklearn.neighbors

    Primitives to add:

    • sklearn.neighbors.KNeighborsClassifier
    • sklearn.neighbors.KNeighborsClassifier_proba
    • sklearn.neighbors.KNeighborsRegressor
    new primitives 
    opened by wsnalice 3
  • Issue 180 improve find anomalies primitive

    Addresses the changes mentioned in #180 to improve the custom.timeseries_anomalies.find_anomalies primitive

    • Add optional threshold for unusually low errors
    • Add possibility to use overlapping thresholds
    • Once an anomalous region is found, increase the region by a predefined size to make the pruning afterwards more stable
    • Fix the calculation of anomaly score to work with consecutive and overlapping sequences
    opened by AlexanderGeiger 2
  • Add anomaly threshold calculation for batches of errors

    Description

    The calculation of the threshold ε in the timeseries_anomalies.py primitive should happen in batches, i.e. the threshold should be calculated for each batch of errors separately. Currently we are using the whole array of errors to calculate one single threshold.
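
    A minimal sketch of the proposed behavior (hypothetical names, assuming a mean + k * std threshold per batch):

    import numpy as np

    def batch_thresholds(errors, batch_size=70, k=4.0):
        # Compute one threshold per batch of errors instead of a
        # single global threshold over the whole array.
        thresholds = []
        for start in range(0, len(errors), batch_size):
            batch = np.asarray(errors[start:start + batch_size])
            thresholds.append(batch.mean() + k * batch.std())
        return thresholds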

    opened by AlexanderGeiger 2
  • Predict output probabilities using predict_proba

    • MLPrimitives version: 0.1.5
    • Python version: 3.7.1
    • Operating System: Amazon Linux (4.14.88-88.76.amzn2.x86_64)

    Description

    I have a use case where I need classification models to output probabilities instead of the predicted class.

    For example: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.predict_proba

    To work around this, I had to create a custom block for each model and insert a new keyword for predict as shown below.

    from sklearn.ensemble import RandomForestClassifier

    class RandomForestBlock(object):
        def __init__(self, **kwargs):
            self.kwargs = kwargs
            self.model = None

        def fit(self, X, y):
            self.model = RandomForestClassifier(**self.kwargs)
            self.model.fit(X, y)

        def predict(self, X, prob=False):
            if prob:
                # Return the probability of the positive class.
                return self.model.predict_proba(X)[:, 1]
            return self.model.predict(X)

    I was wondering if there is a better way to do this without writing custom code. This will probably be a pretty common use case.
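
    One pattern consistent with the _proba primitives requested elsewhere in this tracker (e.g. sklearn.neighbors.KNeighborsClassifier_proba) is a second JSON annotation whose produce method points at predict_proba instead of predict. A hypothetical sketch of such an annotation:

    {
        "name": "sklearn.ensemble.RandomForestClassifier_proba",
        "primitive": "sklearn.ensemble.RandomForestClassifier",
        "fit": {
            "method": "fit",
            "args": [
                {"name": "X", "type": "ndarray"},
                {"name": "y", "type": "ndarray"}
            ]
        },
        "produce": {
            "method": "predict_proba",
            "args": [
                {"name": "X", "type": "ndarray"}
            ],
            "output": [
                {"name": "y", "type": "ndarray"}
            ]
        }
    }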

    question 
    opened by sarin1991 2
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing the input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks that all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
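
    The kind of check such a patch applies looks roughly like this (a sketch, not the exact patch from the pull request):

    import os
    import tarfile

    def safe_extractall(tar, path='.'):
        # Refuse to extract members that would land outside `path`
        # (directory path traversal, CVE-2007-4559).
        base = os.path.realpath(path)
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(path, member.name))
            if os.path.commonpath([base, target]) != base:
                raise ValueError('Unsafe path in tar archive: ' + member.name)
        tar.extractall(path)

    # usage: with tarfile.open('archive.tar') as tar: safe_extractall(tar, 'dest')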

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 1
  • Upgrade ``featuretools``

    • MLPrimitives version: 0.3.2
    • Python version: 3.8
    • Operating System: macOS

    Description

    Featuretools released new stable versions, reaching v1.0.0. I would like to update MLPrimitives so that the new versions fall within the supported range.

    opened by sarahmish 0
  • Loss/Validation Plot Callback

    • MLPrimitives version: 0.2.5
    • Python version: 3.6.9
    • Operating System: Ubuntu 18.04.6

    Description

    Is it possible to plot training and validation losses during the .fit() call? I saw there is pull request https://github.com/MLBazaar/MLPrimitives/pull/163, but I did not see any examples of using a callback to plot losses during training.

    Any other recommendation on how to visualize whether models are converging during training would be most appreciated.
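
    In plain Keras, one way to do this is a custom callback that records the losses each epoch and plots them when training ends. A sketch using standard tf.keras (not an MLPrimitives API):

    import matplotlib.pyplot as plt
    from tensorflow import keras

    class LossPlotCallback(keras.callbacks.Callback):
        # Record training/validation loss per epoch and plot at the end.

        def on_train_begin(self, logs=None):
            self.history = {'loss': [], 'val_loss': []}

        def on_epoch_end(self, epoch, logs=None):
            logs = logs or {}
            for key in self.history:
                if key in logs:
                    self.history[key].append(logs[key])

        def on_train_end(self, logs=None):
            for key, values in self.history.items():
                plt.plot(values, label=key)
            plt.xlabel('epoch')
            plt.ylabel('loss')
            plt.legend()
            plt.show()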

    Thank you

    Chris

    opened by cjtaylo-csu 0
  • Allow build layer to recognize layers imported from tensorflow keras

    • MLPrimitives version: 0.3.3.dev0
    • Python version: 3.7.0
    • Operating System: macOS

    Description

    Allow mlprimitives.adapters.keras.build_layer to also recognize layers imported from tensorflow.keras.

    What I Did

    Current Version:

    if issubclass(layer_class, keras.layers.wrappers.Wrapper):
    

    Suggested Changes:

    if issubclass(layer_class, tf.keras.layers.Wrapper) or issubclass(layer_class, keras.layers.wrappers.Wrapper):
    
    opened by lcwong0928 0
  • tensorflow `get_config` error

    • MLPrimitives version: 0.3.0
    • Python version: 3.6

    Description

    The current version of MLPrimitives will automatically install tensorflow 2.3.4.

    This version will encounter the following issue:

    /usr/local/lib/python3.6/site-packages/keras/backend.py in <module>
         34 from tensorflow.core.protobuf import config_pb2
         35 from tensorflow.python.eager import context
    ---> 36 from tensorflow.python.eager.context import get_config
         37 from tensorflow.python.framework import config
         38 from keras import backend_config
    
    ImportError: cannot import name 'get_config'
    

    This error is caused by this piece of code in mlprimitives/adapters/keras.py:

    
    import logging
    import tempfile
    
    import keras        # this is the line causing the error
    import numpy as np
    

    Solution

    Simply replace

    import keras
    

    with

    from tensorflow import keras
    
    opened by dyuliu 0
Releases
  • v0.3.2(Nov 9, 2021)

  • v0.3.1(Oct 7, 2021)

  • v0.3.0(Jan 9, 2021)

    New Primitives

    • Add primitive sklearn.naive_bayes.GaussianNB - Issue #242 by @sarahmish
    • Add primitive sklearn.linear_model.SGDClassifier - Issue #241 by @sarahmish

    Primitive Improvements

    • Add offset to rolling_window_sequence primitive - Issue #251 by @skyeeiskowitz
    • Rename the time_index column to time - Issue #252 by @pvk-developer
    • Update featuretools dependency - Issue #250 by @pvk-developer

    General Improvements

  • v0.2.5(Jul 29, 2020)

    Primitive Improvements

    • Accept timedelta window_size in cutoff_window_sequences - Issue #239 by @joanvaquer

    Bug Fixes

    • ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow - Issue #237 by @joanvaquer

    New Primitives

    • Add pandas.DataFrame.set_index primitive - Issue #222 by @JDTheRipperPC
  • v0.2.4(Jan 30, 2020)

    New Primitives

    • Add RangeScaler and RangeUnscaler primitives - Issue #232 by @csala

    Primitive Improvements

    • Extract input_shape from X in keras.Sequential - Issue #223 by @csala

    Bug Fixes

    • mlprimitives.custom.text.TextCleaner fails if text is empty - Issue #228 by @csala
    • Error when loading the reviews dataset - Issue #230 by @csala
    • Curate dependencies: specify an explicit prompt-toolkit version range - Issue #224 by @csala
  • v0.2.3(Nov 14, 2019)

    New Primitives

    • Add primitive to make window_sequences based on cutoff times - Issue #217 by @csala
    • Create a keras LSTM based TimeSeriesClassifier primitive - Issue #218 by @csala
    • Add pandas DataFrame primitives - Issue #214 by @csala
    • Add featuretools.EntitySet.normalize_entity primitive - Issue #209 by @csala

    Primitive Improvements

    • Make featuretools.EntitySet.entity_from_dataframe entityset arg optional - Issue #208 by @csala
    • Add text regression dataset - Issue #206 by @csala

    Bug Fixes

    • pandas.DataFrame.resample crash when grouping by integer columns - Issue #211 by @csala

  • v0.2.2(Oct 8, 2019)

    New Primitives

    • Add primitives for GAN based time-series anomaly detection - Issue #200 by @AlexanderGeiger
    • Add numpy.reshape and numpy.ravel primitives - Issue #197 by @AlexanderGeiger
    • Add feature selection primitive based on Lasso - Issue #194 by @csala

    Primitive Improvements

    • feature_extraction.CategoricalEncoder support dtype category - Issue #196 by @csala
  • v0.2.1(Sep 9, 2019)

    New Primitives

    • Timeseries Intervals to Mask Primitive - Issue #186 by @AlexanderGeiger
    • Add new primitive: Arima model - Issue #168 by @AlexanderGeiger

    Primitive Improvements

    • Curate PCA primitive hyperparameters - Issue #190 by @AlexanderGeiger
    • Add option to drop rolling window sequences - Issue #186 by @AlexanderGeiger

    Bug Fixes

    • scikit-image==0.14.3 crashes when installed on Mac - Issue #188 by @csala
  • v0.2.0(Jul 11, 2019)

    New Features

    • Publish the pipelines as an entry_point Issue #175 by @csala

    Primitive Improvements

    • Improve pandas.DataFrame.resample primitive Issue #177 by @csala
    • Improve feature_extractor primitives Issue #183 by @csala
    • Improve find_anomalies primitive Issue #180 by @AlexanderGeiger

    Bug Fixes

    • Typo in the primitive keras.Sequential.LSTMTimeSeriesRegressor Issue #176 by @DanielCalvoCerezo
  • v0.1.10(May 23, 2019)

    New Features

    • Add function to run primitives without a pipeline Issue #43 by @csala

    New Pipelines

    • Add pipelines for all the MLBlocks examples Issue #162 by @csala

    Primitive Improvements

    • Add Early Stopping to keras.Sequential.LSTMTimeSeriesRegressor primitive Issue #156 by @csala
    • Make FeatureExtractor primitives accept Numpy arrays Issue #165 by @csala
    • Add window size and pruning to the timeseries_anomalies.find_anomalies primitive Issue #160 by @csala
  • v0.1.9(Apr 25, 2019)

    New Features

    • Add a single table binary classification dataset Issue #141 by @csala

    New Primitives

    • Add Multilayer Perceptron (MLP) primitive for binary classification Issue #140 by @Hector-hedb12
    • Add primitive for Sequence classification with LSTM Issue #150 by @Hector-hedb12
    • Add VGG-like convnet primitive Issue #149 by @Hector-hedb12
    • Add Multilayer Perceptron (MLP) primitive for multi-class softmax classification Issue #139 by @Hector-hedb12
    • Add primitive to count feature matrix columns Issue #146 by @csala

    Primitive Improvements

    • Add additional fit and predict arguments to keras.Sequential Issue #161 by @csala
    • Add suport for keras.Sequential Callbacks Issue #159 by @csala
    • Add fixed hyperparam to control keras.Sequential verbosity Issue #143 by @csala
  • v0.1.8(Apr 25, 2019)

    New Primitives

    • mlprimitives.custom.timeseries_preprocessing.time_segments_average - Issue #137

    New Features

    • Add target_index output in timseries_preprocessing.rolling_window_sequences - Issue #136
  • v0.1.7(Mar 16, 2019)

    General Improvements

    • Validate JSON format in make lint - Issue #133
    • Add demo datasets - Issue #131
    • Improve featuretools.dfs primitive - Issue #127

    New Primitives

    • pandas.DataFrame.resample - Issue #123
    • pandas.DataFrame.unstack - Issue #124
    • featuretools.EntitySet.add_relationship - Issue #126
    • featuretools.EntitySet.entity_from_dataframe - Issue #126

    Bug Fixes

    • Bug in timeseries_anomalies.py - Issue #119
  • v0.1.6(Feb 28, 2019)

    General Improvements

    • Add Contributing Documentation
    • Remove upper bound in pandas version given new release of featuretools v0.6.1
    • Improve LSTMTimeSeriesRegressor hyperparameters

    New Primitives

    • mlprimitives.candidates.dsp.SpectralMask
    • mlprimitives.custom.timeseries_anomalies.find_anomalies
    • mlprimitives.custom.timeseries_anomalies.regression_errors
    • mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences
    • mlprimitives.custom.timeseries_preprocessing.time_segments_average
    • sklearn.linear_model.ElasticNet
    • sklearn.linear_model.Lars
    • sklearn.linear_model.Lasso
    • sklearn.linear_model.MultiTaskLasso
    • sklearn.linear_model.Ridge
  • v0.1.5(Feb 12, 2019)

    New Primitives

    • sklearn.impute.SimpleImputer
    • sklearn.preprocessing.MinMaxScaler
    • sklearn.preprocessing.MaxAbsScaler
    • sklearn.preprocessing.RobustScaler
    • sklearn.linear_model.LinearRegression

    General Improvements

    • Separate curated from candidate primitives
    • Setup entry_points in setup.py to improve compatibility with MLBlocks
    • Add a test-pipelines command to test all the existing pipelines
    • Clean sklearn example pipelines
    • Change the author entry to a contributors list
    • Change the name of mlblocks_primitives folder
    • Fix installation instructions

    Bug Fixes

    • Fix LSTMTimeSeriesRegressor primitive. Issue #90
    • Fix timeseries primitives. Issue #91
    • Negative index anomalies in timeseries_errors. Issue #89
    • Keep pandas version below 0.24.0. Issue #87
  • v0.1.4(Jan 4, 2019)

    New Primitives

    • mlprimitives.timeseries primitives for timeseries data preprocessing
    • mlprimitives.timeseries_errors primitives for timeseries anomaly detection
    • keras.Sequential.LSTMTimeSeriesRegressor
    • sklearn.neighbors.KNeighbors Classifier and Regressor
    • several sklearn.decomposition primitives
    • several sklearn.ensemble primitives

    Bug Fixes

    • Fix typo in mlprimitives.text.TextCleaner primitive
    • Fix bug in index handling in featuretools.dfs primitive
    • Fix bug in SingleLayerCNNImageClassifier annotation
    • Remove old validation tags from JSON annotations
  • v0.1.3(Oct 22, 2018)

  • v0.1.2(Oct 10, 2018)

    New Features

    • Add pipeline specification language and Evaluation utilities.
    • Add pipelines for graph, text and tabular problems.
    • New primitives ClassEncoder and ClassDecoder
    • New primitives UniqueCounter and VocabularyCounter

    Bug Fixes

    • Fix TrivialPredictor bug when working with numpy arrays
    • Change XGB default learning rate and number of estimators
  • v0.1.1(Sep 21, 2018)

    New Features

    • Add more keras.applications primitives.
    • Add a Text Cleanup primitive.

    Bug Fixes

    • Add keywords to keras.preprocessing primitives.
    • Fix the image_transform method.
    • Add epoch as a fixed hyperparameter for keras.Sequential primitives.
Owner: MLBazaar (The Machine Learning Bazaar)