A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.

Neuraxio

Last update: Dec 24, 2022

Related tags

Deep Learning machine-learning framework deep-learning pipeline scikit-learn python-library parallel pipeline-framework hyperparameters hyperparameter-optimization hyperparameter-tuning hyperparameter-search neuraxle

Overview

Neuraxle Pipelines

Code Machine Learning Pipelines - The Right Way.

Neuraxle is a Machine Learning (ML) library for building machine learning pipelines.

Component-Based: Build encapsulated steps, then compose them to build complex pipelines.
Evolving State: Each pipeline step can fit, and evolve through the learning process.
Hyperparameter Tuning: Optimize your pipelines using AutoML, where each pipeline step has its own hyperparameter space.
Compatible: Use your favorite machine learning libraries inside and outside Neuraxle pipelines.
Production Ready: Pipeline steps can manage how they are saved by themselves, and the lifecycle of the objects allows for train and test modes.
Streaming Pipeline: Transform data in many pipeline steps at the same time in parallel using multiprocessing Queues.
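
As a rough illustration of the component-based and evolving-state ideas listed above, here is a minimal sketch in plain Python. It does not use Neuraxle itself: MeanCenterer, Scaler, and TinyPipeline are hypothetical names invented for this example, and the real library's classes and signatures may differ.

```python
from typing import List, Sequence


class MeanCenterer:
    """Hypothetical step: learns the mean during fit, subtracts it in transform."""

    def __init__(self):
        self.mean = 0.0

    def fit(self, data_inputs: Sequence[float], expected_outputs=None):
        self.mean = sum(data_inputs) / len(data_inputs)
        return self

    def transform(self, data_inputs: Sequence[float]) -> List[float]:
        return [x - self.mean for x in data_inputs]


class Scaler:
    """Hypothetical stateless step: multiplies every value by a fixed factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def fit(self, data_inputs, expected_outputs=None):
        return self  # nothing to learn

    def transform(self, data_inputs):
        return [x * self.factor for x in data_inputs]


class TinyPipeline:
    """Hypothetical pipeline: fits each step in order on the previous step's output."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, data_inputs, expected_outputs=None):
        for step in self.steps:
            data_inputs = step.fit(data_inputs, expected_outputs).transform(data_inputs)
        return self

    def transform(self, data_inputs):
        for step in self.steps:
            data_inputs = step.transform(data_inputs)
        return data_inputs


pipeline = TinyPipeline([MeanCenterer(), Scaler(factor=2.0)]).fit([1.0, 2.0, 3.0])
result = pipeline.transform([4.0])
```

Each step keeps its own state (here, the learned mean), and the pipeline only composes steps that share the same fit/transform interface.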

Documentation

You can find the Neuraxle documentation on the website.

The documentation is divided into several sections:

Installation

Simply do:

pip install neuraxle

Examples

We have several examples on the website.

For example, you can build a time series processing pipeline as such:

p = Pipeline([
    TrainOnly(DataShuffler()),
    WindowTimeSeries(),
    MiniBatchSequentialPipeline([
        Tensorflow2ModelStep(
            create_model=create_model,
            create_optimizer=create_optimizer,
            create_loss=create_loss
        ).set_hyperparams(HyperparameterSpace({
            'hidden_dim': 12,
            'layers_stacked_count': 2,
            'lambda_loss_amount': 0.0003,
            'learning_rate': 0.001,
            'window_size_future': sequence_length,
            'output_dim': output_dim,
            'input_dim': input_dim
        })).set_hyperparams_space(HyperparameterSpace({
            'hidden_dim': RandInt(6, 750),
            'layers_stacked_count': RandInt(1, 4),
            'lambda_loss_amount': Uniform(0.0003, 0.001),
            'learning_rate': Uniform(0.001, 0.01),
            'window_size_future': FixedHyperparameter(sequence_length),
            'output_dim': FixedHyperparameter(output_dim),
            'input_dim': FixedHyperparameter(input_dim)
        }))
    ])
])

# Load data
X_train, y_train, X_test, y_test = generate_classification_data()

# The pipeline will learn on the data and acquire state.
p = p.fit(X_train, y_train)

# Once it learned, the pipeline can process new and
# unseen data for making predictions.
y_test_predicted = p.predict(X_test)

You can also tune your hyperparameters using AutoML algorithms such as the TPE:

auto_ml = AutoML(
    pipeline=pipeline,
    hyperparams_optimizer=TreeParzenEstimatorHyperparameterSelectionStrategy(
        number_of_initial_random_step=10,
        quantile_threshold=0.3,
        number_good_trials_max_cap=25,
        number_possible_hyperparams_candidates=100,
        prior_weight=0.,
        use_linear_forgetting_weights=False,
        number_recent_trial_at_full_weights=25
    ),
    validation_splitter=ValidationSplitter(test_size=0.20),
    scoring_callback=ScoringCallback(accuracy_score, higher_score_is_better=True),
    callbacks=[
        MetricCallback(f1_score, higher_score_is_better=True),
        MetricCallback(precision, higher_score_is_better=True),
        MetricCallback(recall, higher_score_is_better=True)
    ],
    n_trials=7,
    epochs=10,
    hyperparams_repository=HyperparamsJSONRepository(cache_folder='cache'),
    refit_trial=True,
)

# Load data and launch the AutoML loop.
X_train, y_train, X_test, y_test = generate_classification_data()
auto_ml = auto_ml.fit(X_train, y_train)

# Get the model from the best trial, and make predictions using predict.
best_pipeline = auto_ml.get_best_model()
y_pred = best_pipeline.predict(X_test)

Why Neuraxle ?

Most research projects don't ever get to production. However, you want your project to be production-ready and already adaptable (clean) by the time you finish it. You also want things to be simple so that you can get started quickly. Read more about the why of Neuraxle here.

Community

For technical questions, please post them on StackOverflow using the neuraxle tag. The StackOverflow question will automatically be posted in Neuraxio's slack workspace in the #Neuraxle channel.

For suggestions, feature requests, and error reports, please open an issue.

For contributors, we recommend using the PyCharm code editor and letting it manage the virtual environment, with the default code auto-formatter and pytest as the test runner. To contribute, first fork the project, make your changes, and then open a pull request against the main repository. Please make your pull request(s) editable so that we can, for example, add you to the list of contributors if you didn't add the entry yourself. Ensure that all tests pass before opening a pull request. You also agree that your contributions will be licensed under the Apache 2.0 License, which is required for everyone to be able to use your open-source contributions.

Finally, you can also join our Slack workspace and our Gitter to collaborate with us. We <3 collaborators. You can also subscribe to our mailing list, where we will post updates and news.

License

Neuraxle is licensed under the Apache License, Version 2.0.

Citation

You may cite our extended abstract that was presented at the Montreal Artificial Intelligence Symposium (MAIS) 2019. Here is the bibtex code to cite:

@misc{neuraxle,
author = {Chevalier, Guillaume and Brillant, Alexandre and Hamel, Eric},
year = {2019},
month = {09},
pages = {},
title = {Neuraxle - A Python Framework for Neat Machine Learning Pipelines},
doi = {10.13140/RG.2.2.33135.59043}
}

Contributors

Thanks to everyone who contributed to the project:

Guillaume Chevalier: https://github.com/guillaume-chevalier
Alexandre Brillant: https://github.com/alexbrillant
Éric Hamel: https://github.com/Eric2Hamel
Jérôme Blanchet: https://github.com/JeromeBlanchet
Michaël Lévesque-Dion: https://github.com/mlevesquedion
Philippe Racicot: https://github.com/Vaunorage
Neurodata: https://github.com/NeuroData-ltd
Klaimohelmi: https://github.com/Klaimohelmi
Vincent Antaki: https://github.com/vincent-antaki

Supported By

We thank these organisations for generously supporting the project:

Neuraxio Inc.: https://github.com/Neuraxio

Umanéo Technologies Inc.: https://www.umaneo.com/

Solution Nexam Inc.: https://nexam.io/

La Cité, LP: https://www.lacitelp.com/

Kimoby: https://www.kimoby.com/

Comments

Recursive dict compress feature

What it is

My pull request does: This change will try to compress the flat list of hyperparams into a more structured representation. Please refer to issue #486 for more information. I haven't added any tests for now; I wanted to make sure this implementation is reviewed first.

How it works

I coded it this way: Current implementation will try to read the all_hps = pipeline.get_hyperparams() output and try to compress this particular output into more structured and readable format.

Example usage

Here is how you can use this new code as an end user:

>>> all_hps = pipeline.get_hyperparams()
>>> all_hps_shortened = all_hps.compress()
>>> print(type(all_hps_shortened))
<class 'neuraxle.hyperparams.space.CompressedHyperparameterSamples'>
>>> pprint(all_hps_shortened)
[
    {
        "step_name": "step1",
        "hyperparams": {'copy': True, 'iterated_power': 'auto'},
        "ancestor_steps": ["root"]
    },
    {
        "step_name": "step2",
        "hyperparams": {'copy': True, 'iterated_power': 'auto'},
        "ancestor_steps": ["root"]
    }
]
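
To make the described transformation concrete, here is a minimal stdlib-only sketch of how a flat hyperparameter dict could be grouped into one record per step. It is not the code of this pull request: compress_flat_hyperparams is a hypothetical helper, and it assumes flat keys are step names joined by "__" with the hyperparameter name last.

```python
from collections import OrderedDict


def compress_flat_hyperparams(flat_hps, root_name="root", separator="__"):
    """Group flat 'ancestor__step__param' keys into one record per step."""
    grouped = OrderedDict()
    for flat_key, value in flat_hps.items():
        # Everything before the last separator is the path to the step.
        *path, param_name = flat_key.split(separator)
        grouped.setdefault(tuple(path), {})[param_name] = value

    records = []
    for step_path, hyperparams in grouped.items():
        if step_path:
            step_name = step_path[-1]
            ancestors = [root_name] + list(step_path[:-1])
        else:
            # A key without separator belongs to the root itself.
            step_name, ancestors = root_name, []
        records.append({
            "step_name": step_name,
            "hyperparams": hyperparams,
            "ancestor_steps": ancestors,
        })
    return records


flat = OrderedDict([
    ("step1__copy", True),
    ("step1__iterated_power", "auto"),
    ("step2__copy", True),
    ("step2__iterated_power", "auto"),
])
compressed = compress_flat_hyperparams(flat)
```

With this input, the result has the same shape as the list printed above: one dict per step, with its hyperparameters and the names of its ancestors.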

opened by Rohith295 20

Feature: Add support for string column names in ColumnTransformer for when we use a pandas df as an input
Raw data is a mix of text columns, numerical columns, date columns, and category (string-based) columns.

Pre-processed data is numerical data with: one-hot encoding of text columns, one-hot encoding of category data, and one-hot encoding of numerical data (binning).

The pipeline should manage the enlargement of the initial columns, and different types of pre-processing based on different data types.

Keeping track of column name changes is useful (but complicated to handle).

The key to AutoML is mostly the pre-processing part.
enhancement
opened by arita37 16
Feature: Add Binomial distributions to hyperparams/distributions.py

I think it would be a neat addition to generate discrete-value hyperparams. You could redefine Boolean (which should really be called Bernoulli in my opinion) to be Binomial(1,p).

In the meantime, it is possible to use Choice with custom probabilities to model this distribution.

Edit: Side question. Let's say that, for my random hyperparameter search, I'd really like to use learning_rate = 10**uniform(-2,-4). That doesn't seem to be an option in the current state of the framework; I'd have to create my own class ExpUniform (which maybe could already be included). Would there be a not-too-complex way to introduce operators on these distributions instead? Or, even if a complex option is on the table, are all attempts at doing this fundamentally overkill and computationally wasteful for the conceptual scope of the distributions in this file?
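
For readers who want to see what the two requested distributions amount to, here is a minimal sketch using only Python's random module. Binomial and ExpUniform below are hypothetical stand-alone classes written for this illustration; they are not the classes in hyperparams/distributions.py, and the rvs(rng) signature is an assumption of this example.

```python
import random


class Binomial:
    """Hypothetical distribution: number of successes in n Bernoulli(p) trials."""

    def __init__(self, n: int, p: float):
        if n < 0 or not 0.0 <= p <= 1.0:
            raise ValueError("n must be >= 0 and p must be in [0, 1]")
        self.n = n
        self.p = p

    def rvs(self, rng: random.Random) -> int:
        # Sum of n independent Bernoulli(p) draws.
        return sum(1 for _ in range(self.n) if rng.random() < self.p)


class ExpUniform:
    """Hypothetical distribution: base ** u, with u uniform between two exponents."""

    def __init__(self, low_exp: float, high_exp: float, base: float = 10.0):
        # Accept the exponents in either order, as in 10**uniform(-2, -4).
        self.low_exp = min(low_exp, high_exp)
        self.high_exp = max(low_exp, high_exp)
        self.base = base

    def rvs(self, rng: random.Random) -> float:
        return self.base ** rng.uniform(self.low_exp, self.high_exp)


rng = random.Random(42)
bernoulli_draws = [Binomial(1, 0.5).rvs(rng) for _ in range(100)]
learning_rates = [ExpUniform(-2, -4).rvs(rng) for _ in range(100)]
```

Binomial(1, p) reproduces a Bernoulli draw, and ExpUniform(-2, -4) samples learning rates between roughly 1e-4 and 1e-2, spread evenly on a logarithmic scale.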
enhancement wontfix

opened by vincent-antaki 9
Service assertion overhaul, MixinForBaseTransformer, minor fix in AutoML and ForceHandleMixin
What it is

Changes in Mixin

Adds a private method, called at initialization, in ForceHandleMixin to ensure that the basic functions _X_data_container inherited from _TransformerStep and _FittableStep are redefined. Previously, failure to override these methods led to infinite recursion.

Defines MixinForBaseTransformer, from which all other mixins inherit. This mixin asserts, at initialization, that the instance inherits from BaseTransform and that it has been initialized. The purpose of this change is to ensure that base classes are initialized in the proper order (i.e. steps before mixins). (fixes #412)

NonTransformableMixin defines a _transform_data_container method.

Changes in AutoML class

Defined _transform_data_container and _fit_transform_data_container to raise a NotImplementedError (instead of recursing infinitely).

Fixed the undefined local variable error that occurred when a non-terminal exception raised during the first trial was caught by the try-except in the main loop.

Added a flag in AutoML's constructor to define whether it should raise all errors that happen during a trial or only the ones previously defined (EOFError, KeyboardInterrupt, SystemError, SystemExit). This is useful when we want all errors to be raised (such as in unit tests).

Overhaul of the service assertion mechanism

There are now two different ways to make a service assertion, both of which are applicable to any step.

SomeStep().assert_has_service_at_execution() wraps the SomeStep() instance with a LocalServiceAssertionWrapper and tests the presence of the service right before SomeStep's execution, in the will_process call of LocalServiceAssertionWrapper.

SomeStep().assert_has_service() wraps SomeStep() with the GlobalyRetrievableWrapper, which also tests the presence of the service at execution (like the LocalServiceAssertionWrapper).

GlobalServiceAssertionExecutorMixin allows retrieval of all GlobalyRetrievableWrapper instances within the pipeline and tests the presence of the required services.

StepWithContext now inherits from GlobalServiceAssertionExecutorMixin and thus tests services at the root of the pipeline (i.e. at the beginning of pipeline execution).

Reason for this change: the previous version, which only asserted the presence of a service in StepWithContext's will_process, did not allow asserting the presence of a service that is registered during the execution of a pipeline.

Other minor changes

_HasSavers now has an add_saver function

Fixed the initialization order of TruncableSteps (fixes #422, mentioned in #369)

Identity now inherits from NonFittableMixin and BaseStep instead of BaseTransformer

test_could_have_context.py is now test_service_assertions.py (and has been extended)

Removed function MetaStepMixin._ensure_proper_mixin_init_order since it's now redundant with the MixinForBaseTransformer interface.

examples/getting_started/plot_non_fittable_mixin.py has been adjusted to deal with the ForceHandleMixin changes

fixes #403

fixes #404

cla-signed
opened by vincent-antaki 9
Find a way to pass validation data during fit?
Hi @alexbrillant @Eric2Hamel,

We'd need a way to have a validation curve during training to be able to do this:

Early stopping in deep learning models when the validation loss starts to rise

Being able to plot train/val curves to debug the model's training and convergence

Being able to send train/val curves to the AutoML algorithm (as model features that can be retrieved with model.introspect() after training), so that the AutoML algorithm can also see things as we do when we look at those charts.

How should we change the API of Neuraxle in the next big release, 0.3.0? Neuraxle is currently at 0.2.1. The ability to plot the validation curve is something we procrastinated for too long.

Suggestion: the MiniBatchSequentialPipeline could be responsible for doing this by calling .fit on train, then .predict on the validation data, then train again... and so forth. Any other ideas? I also thought of:

Doing the train/validation split at each mini-batch, inside the fit of the model, but this is very bad.

Adding 2 more arguments to fit, but this is bad.

Adding 1 more argument to transform, but this is bad and we should be using predict instead.

So the only solution to me looks like making the MiniBatchSequentialPipeline class manage that. However, how should a user tell the MiniBatchSequentialPipeline what is validation data? Would the MiniBatchSequentialPipeline instead just need a ValidationSplitWrapper (#174) and instead let the wrapper do the alternation of calling fit then predict then fit then predict again?

Thank you and Best Regards, Guillaume
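
The alternation suggested above (fit on train, predict on validation, then repeat) can be sketched independently of any Neuraxle class. The function below is a hypothetical illustration, not a proposed API: fit_one_epoch and evaluate are caller-supplied callables, and the validation losses in the demo are made-up numbers used only to exercise the early-stopping logic.

```python
from typing import Callable, List, Tuple


def fit_with_validation(
    fit_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    max_epochs: int,
    patience: int,
) -> Tuple[List[float], int]:
    """Alternate training and validation; stop once the validation loss
    has not improved for `patience` consecutive epochs (patience >= 1).

    Returns the validation curve and the index of the best epoch.
    """
    val_curve: List[float] = []
    best_epoch = 0
    for epoch in range(max_epochs):
        fit_one_epoch()               # train on the training data
        val_curve.append(evaluate())  # score on the validation data
        if val_curve[epoch] < val_curve[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break                     # validation loss keeps rising: stop early
    return val_curve, best_epoch


# Demo with simulated (made-up) validation losses.
simulated_val_losses = iter([1.0, 0.8, 0.7, 0.75, 0.9, 0.95])
epochs_trained = []
val_curve, best_epoch = fit_with_validation(
    fit_one_epoch=lambda: epochs_trained.append(1),
    evaluate=lambda: next(simulated_val_losses),
    max_epochs=6,
    patience=2,
)
```

The returned validation curve is the kind of per-epoch signal that could be plotted or handed to an AutoML algorithm, whichever class ends up owning the loop.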
enhancement help wanted invalid question
opened by guillaume-chevalier 9

Bug: Cannot load training states from checkpoints to resume training

Describe the bug When training AutoML on ResumablePipeline, logs are full of messages like these: UserWarning: Cannot Load Step /models/resumable_pipeline/AutoML/ResumablePipeline (ResumablePipeline:ResumablePipeline) With Step Saver JoblibStepSaver. saver.class.name))

To Reproduce

HP_real = HyperparameterSpace({
    "learning_rate": Uniform(1e-5, 1),
    "max_depth": RandInt(2, 4),
    "n_estimators": Choice([30, 60, 90, 100, 130])
})

pipeline_sk = ResumablePipeline([
    # A Pipeline is composed of multiple chained steps. Steps
    # can alter the data before passing it to the next steps.
    AddFeatures([
        PCA(n_components=2),
        FastICA(n_components=2),
    ]),
    RidgeModelStacking([
        RandomForestRegressor(),
        GradientBoostingRegressor(warm_start=False, min_samples_leaf=2, random_state=42)  # validation_fraction = 0.2
    ])
], cache_folder=resumable_pipeline_folder).set_hyperparams_space(HP_real)

time_a = time.time()
auto_ml = AutoML(
    pipeline_sk,
    #AutoMLContainer  = AutoMLContainer(main_scoring_metric_name = "mse"),
    refit_trial=True,
    n_trials=int(epochs),
    cache_folder_when_no_handle=resumable_pipeline_folder,
    validation_splitter=ValidationSplitter(test_set_ratio),
    hyperparams_optimizer=RandomSearchHyperparameterSelectionStrategy(),
    scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
    callbacks=[
        MetricCallback('mse', metric_function=mean_squared_error, higher_score_is_better=False)
    ]
)
auto_ml = auto_ml.fit(train, y_train)  # if you use custom label encoder, then fit takes in the whole data at a time

Expected behavior I am expecting that there would be no errors such as these. The goal is to be able to warm start training from old checkpoints.

bug invalid

opened by alarivarmann 8

Incomplete Documentation Strings (Docstrings)

Many classes have incomplete docstrings. Sometimes, there are also typos.

The docstrings in classes are used with the sphinx documentation builder to build the website's complete documentation API: https://www.neuraxle.neuraxio.com/stable/api.html
help wanted good first issue invalid wontfix

opened by guillaume-chevalier 8
Finally fix checkpoints for good: step saving by summary, and resume/should_resume refactor
Step Saving Checkpoints Are 100% valid now !! 🚀

Moved all resume, and should resume for steps with children inside _HasChildrenMixin

Simplified ResumablePipeline algo ALOT

Fixed step saving checkpoints : step saving by summary id !

Added summary id to save, and load (this is a breaking change :/ we could maybe do something to avoid it.... like method signatures ?) or we could live with it...

WORK IN PROGRESS TODO: docstrings validate/avoid breaking change weird merge some typings broke

This needed to be done... I felt really inspired and did this real fast tonight lol If we want to ship properly, we need to have this done because it is written in the readme...
wontfix cla-signed
opened by alexbrillant 7
Documentation: Add examples for the neuraxle.steps.output_handlers
Can you add an example in the documentation for:

neuraxle.steps.output_handlers.InputAndOutputTransformerMixin neuraxle.steps.output_handlers.OutputTransformerWrapper

Especially on how to access the X and Y data and return them.

In my use case I want to set Y based on X values (financial forecasting)

thx!
question wontfix documentation
opened by stefvra 7

Added feature to compress hps

What it is

My pull request does: This change will try to compress the flat list of hyperparams into a more structured representation. Please refer to the following PR for more information. I haven't added any tests for now; I wanted to make sure this implementation is reviewed first.

How it works

I coded it this way: Current implementation will try to read the all_hps = pipeline.get_hyperparams() output and try to compress this particular output into more structured and readable format.

Example usage

Here is how you can use this new code as an end user:

>>> all_hps = pipeline.get_hyperparams()
>>> all_hps_shortened = all_hps.compress()
>>> print(type(all_hps_shortened))
<class 'neuraxle.hyperparams.space.CompressedHyperparameterSamples'>
>>> pprint(all_hps_shortened)
[
    {
        "step_name": "step1",
        "hyper_parameters": OrderedDict([('copy', True), ('iterated_power', 'auto')]),
        "ancestor_steps": ["root"]
    },
    {
        "step_name": "step1",
        "hyper_parameters": OrderedDict([('copy', True), ('iterated_power', 'auto')]),
        "ancestor_steps": ["root"]
    }
]

opened by Rohith295 6

Feature: RecursiveDict.compress() to shorten paths to steps and their hyperparams
Is your feature request related to a problem? Please describe. Hyperparam names are too long in nested steps

Describe the solution you'd like A way to compress the names so as to make them shorter. More specifically, I think that an automated algorithm for all existing ML pipelines could be built. That would be to do something like:

all_hps = pipeline.get_hyperparams()
all_hps_shortened = all_hps.compress()
pprint(all_hps_shortened)

Then we'd see something like this in the pprint:

{
    "*__MetaStep__*__SKLearnWrapper_LinearRegression__C": 1000,
    "*__SomeStep__hyperparam3": value,
    "*__SKLearnWrapper_BoostedTrees__count": 10
}

That is, the unique paths to some steps were compressed using the star (*) operator. The Star operator means "one or more steps between". But the way the paths are compressed would be lossless, in the sense that the original names could ALWAYS be retrieved given the original pipeline's tree structure.
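
One possible way to get such a lossless compression is to keep, for each path, the shortest suffix that is unique among all paths. The sketch below is only an illustration under that assumption: compress_paths and decompress_path are hypothetical helpers, and they support a single leading "*" rather than the several wildcards shown in the example above.

```python
def compress_paths(flat_keys, separator="__", wildcard="*", min_parts=2):
    """Map each full key to its shortest suffix that is unique among all keys.

    At least `min_parts` trailing parts (step name + hyperparameter name)
    are kept so that the shortened key stays readable.
    """
    split_keys = [key.split(separator) for key in flat_keys]
    compressed = {}
    for key, parts in zip(flat_keys, split_keys):
        start = min(min_parts, len(parts))
        for length in range(start, len(parts) + 1):
            suffix = parts[-length:]
            matches = [p for p in split_keys if p[-length:] == suffix]
            if len(matches) == 1:
                break
        shortened = separator.join(suffix)
        if length < len(parts):
            # Something was dropped in front: mark it with the wildcard.
            shortened = wildcard + separator + shortened
        compressed[key] = shortened
    return compressed


def decompress_path(shortened, flat_keys, separator="__", wildcard="*"):
    """Recover the full key from a shortened one, given all original keys."""
    prefix = wildcard + separator
    if not shortened.startswith(prefix):
        return shortened
    suffix = shortened[len(prefix):].split(separator)
    matches = [k for k in flat_keys if k.split(separator)[-len(suffix):] == suffix]
    if len(matches) != 1:
        raise ValueError(f"{shortened!r} does not identify exactly one path")
    return matches[0]


keys = [
    "Pipeline__MetaStep__SKLearnWrapper_LinearRegression__C",
    "Pipeline__SomeStep__hyperparam3",
    "Pipeline__A__Model__C",
    "Pipeline__B__Model__C",
]
short = compress_paths(keys)
```

When two steps share the same name and hyperparameter (Model__C here), the suffix grows until the first distinguishing parent is included, which is close in spirit to the "additional idea" mentioned further down in this issue.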

Describe alternatives you've considered Using custom ways to flush words and compress them. That seems good, but it doesn't seem to generalize to all pipelines that could exist.

Additional context Hyperparameter names were said to be too long as well in #478

Additional idea For hyperparameters, given the fact that in the future every model may need to name its expected hyperparams, then it may be possible to use their name only and directly if there is no other step with the same hyperparams. If another step uses the same hyperparam names, then compression with the "*" could go up in the tree to find the first non-common parent name or something.

More ideas are needed to be sure we do this the right way.
enhancement help wanted good first issue question
opened by guillaume-chevalier 6
Bug: Test files missing from `sdist` on PyPI
Describe the bug

Currently the testing_neuraxle directory is missing from the sdist on PyPI.

To Reproduce

curl -LO https://pypi.io/packages/source/n/neuraxle/neuraxle-0.8.1.tar.gz
tar xzf neuraxle-0.8.1.tar.gz
ls -1 neuraxle-0.8.1/testing_neuraxle

Expected behavior

Ideally the tests would be included in the sdist to make it easier to test after building/installing the package.

Suggested Fix

One solution would be to include a MANIFEST.in and add...

recursive-include testing_neuraxle *.py

...as well as anything else that might be needed to run the tests (like other directories or other extension types).
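
A check like the reproduction steps above can also be scripted. The following is a small sketch using only the standard tarfile module; sdist_contains_directory and build_demo_sdist are hypothetical helper names, and the demo archive is built in memory for illustration rather than downloaded from PyPI.

```python
import io
import tarfile


def sdist_contains_directory(sdist_file, directory_name):
    """Return True if any member of the .tar.gz sdist lives under `directory_name`."""
    with tarfile.open(fileobj=sdist_file, mode="r:gz") as archive:
        for member in archive.getnames():
            # Members look like '<package>-<version>/<directory>/...'.
            parts = member.split("/")
            if len(parts) > 1 and parts[1] == directory_name:
                return True
    return False


def build_demo_sdist(file_names):
    """Build an in-memory .tar.gz holding empty files (illustration only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in file_names:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            archive.addfile(info, io.BytesIO(b""))
    buffer.seek(0)
    return buffer


with_tests = build_demo_sdist([
    "neuraxle-0.8.1/setup.py",
    "neuraxle-0.8.1/testing_neuraxle/test_example.py",
])
without_tests = build_demo_sdist(["neuraxle-0.8.1/setup.py"])
has_tests = sdist_contains_directory(with_tests, "testing_neuraxle")
lacks_tests = not sdist_contains_directory(without_tests, "testing_neuraxle")
```

To check a real sdist, open the downloaded file in binary mode and pass the file object to sdist_contains_directory.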

Additional context

This came up in trying to build & test a Conda package for conda-forge.

xref: https://github.com/conda-forge/staged-recipes/pull/16566#issuecomment-1217118836

cc @guillaume-chevalier (for awareness)
bug invalid
opened by jakirkham 0
Feature: Additional arguments to fit method in BaseStep

The problem: Currently the neuraxle BaseStep has a fit method signature with only 2 parameters (data_inputs, expected_outputs). In libraries like keras it is possible to have additional arguments being passed to the fit method. This could be things like validation generators if the main data_inputs is a data generator as well.

This means that if we want to wrap a keras model that takes two data generators in a subclass of BaseStep, then it wouldn't be a straightforward implementation.

Solution: It would be extremely useful if an additional **kwargs were added to the base step fit method (in one or more of the Mixin classes) to enable passing arbitrary arguments to the custom estimator implementations.
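
To illustrate the request, here is a minimal sketch of a step whose fit method forwards extra keyword arguments to a wrapped model. KwargsAwareStep and RecordingModel are hypothetical classes written for this example; the sketch does not show how BaseStep is actually implemented, and RecordingModel merely stands in for a Keras-like model.

```python
class RecordingModel:
    """Stand-in for a Keras-like model; it only records what fit received."""

    def __init__(self):
        self.received_kwargs = {}

    def fit(self, x, y=None, **kwargs):
        self.received_kwargs = kwargs


class KwargsAwareStep:
    """Hypothetical step whose fit forwards extra keyword arguments to the wrapped model."""

    def __init__(self, wrapped_model):
        self.wrapped_model = wrapped_model

    def fit(self, data_inputs, expected_outputs=None, **kwargs):
        # Anything beyond (data_inputs, expected_outputs) is passed through untouched.
        self.wrapped_model.fit(data_inputs, expected_outputs, **kwargs)
        return self


model = RecordingModel()
step = KwargsAwareStep(model).fit(
    [1, 2, 3],
    [0, 1, 0],
    validation_data=([4], [1]),
    epochs=5,
)
```

The trade-off is that arbitrary keyword arguments make the fit signature less uniform across steps, so a pipeline that chains steps would need a rule for which step receives which argument.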
enhancement

opened by subramaniam20jan 4
Bug: StepSaverCallback & BestModelCheckpoint not working in version 0.7.0

Describe the bug From version 0.7.0 and later, these callbacks don't have access to the pipeline anymore since the pipeline isn't stored anymore in the TrialSplit, but just in the AutoML class.

To Reproduce Update to 0.7.0 and run these tests that are skipped.

Expected behavior Callbacks are supposed to log models using the repo.

Suggested Fix Possibly have the TrialSplits contain their trained model again, instead of letting the Trainer call the model.

Additional context 0.7.0 is not merged nor deployed at the moment of writing this issue.
bug invalid

opened by guillaume-chevalier 0
Testing: test the ParallelFeatureUnion with the tests of the FeatureUnion, and test the ParallelColumnTransformer with the tests of the ColumnTransformer.
There is currently no ParallelColumnTransformer, and there is a lack of tests for ParallelFeatureUnion.

Suggested fix:

Parametrize FeatureUnion tests and ColumnTransformer tests.

invalid documentation
opened by guillaume-chevalier 0
Feature: Recursive Context instead of list context for different service cache levels

Is your feature request related to a problem? Please describe. As the context acts like a stack of function calls, with each level saving its own "variables" in the context (e.g.: Memory, as per #443), it might be good that a service added in the middle of the pipeline be scoped and local to just that context.pushed level (like a programming language stack / Assembly CPU Stack that is cleared upon stack push).

Describe the solution you'd like When pushing and popping to context, use a recursive context data structure instead of lists of parents and a single service dict. This was done originally in the very first releases of Neuraxle, but that was changed for reasons I don't know. It seems that the original intuition was possibly better.

Describe alternatives you've considered Singly-linked list (stack data structure / design pattern) for the context's parents, and services. Could set global v.s. local services as well (like programming languages' global v.s. local scopes when declaring/using variables).
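
The singly-linked alternative can be sketched in a few lines. ScopedContext below is a hypothetical class written for illustration; it is not Neuraxle's ExecutionContext, and its method names are assumptions of this example.

```python
class ScopedContext:
    """Hypothetical context node: local services plus a link to the parent scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.services = {}

    def push(self, name):
        """Open a child scope; services registered there stay local to it."""
        return ScopedContext(name, parent=self)

    def register_service(self, key, service):
        self.services[key] = service

    def get_service(self, key):
        """Look in this scope first, then walk up through the parents."""
        scope = self
        while scope is not None:
            if key in scope.services:
                return scope.services[key]
            scope = scope.parent
        raise KeyError(key)

    def path(self):
        """Names from the root scope down to this scope."""
        names = []
        scope = self
        while scope is not None:
            names.append(scope.name)
            scope = scope.parent
        return list(reversed(names))


root = ScopedContext("root")
root.register_service("logger", "root-logger")   # global: visible everywhere
child = root.push("step_a")
child.register_service("memory", {"k": 1})       # local: visible in step_a only
sibling = root.push("step_b")
```

Here child sees both "logger" and "memory", while sibling sees only "logger", which mirrors the global versus local scoping of variables in programming languages.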

Additional context None
enhancement

opened by guillaume-chevalier 1
Feature: Have a step store using metaclass-based registration and DSL-based declaration.
Is your feature request related to a problem? Please describe. Meta-describing a pipeline from, say, a loaded configuration json would require composing the objects back together. To do so from object names, we'd need an object store in which to save the classes upon creating them, including custom ones.

Describe the solution you'd like BaseStep to register objects to the global store using a descriptor.

Describe alternatives you've considered The "Pattern: language integrated registration" of the article found below as well. But it adds too much boilerplate.

Additional context See this article, especially the sections:

Pattern: metaclass based registration

Pattern: DSL-based declaration

Metaclass-based registration could be used as well for specifying hyperparameter distributions and more objects. This could be like Orion strings in the Orion framework.
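
As a rough sketch of the metaclass-based registration pattern mentioned above, the example below records every step class in a global store and rebuilds steps from a JSON configuration. All names (StepRegistry, RegisteredStep, AddN, MultiplyBy, build_steps) and the configuration layout are hypothetical; nothing here is Neuraxle's actual API.

```python
import json


class StepRegistry(type):
    """Metaclass that records every class created with it, keyed by class name."""

    registry = {}

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        if bases:  # skip the base class itself, register only its subclasses
            mcs.registry[name] = cls
        return cls


class RegisteredStep(metaclass=StepRegistry):
    def __init__(self, **hyperparams):
        self.hyperparams = hyperparams


class AddN(RegisteredStep):
    def transform(self, data_inputs):
        return [x + self.hyperparams["n"] for x in data_inputs]


class MultiplyBy(RegisteredStep):
    def transform(self, data_inputs):
        return [x * self.hyperparams["factor"] for x in data_inputs]


def build_steps(config):
    """Rebuild step objects from a list of {'class': ..., 'hyperparams': ...} dicts."""
    return [
        StepRegistry.registry[item["class"]](**item.get("hyperparams", {}))
        for item in config
    ]


config = json.loads(
    '[{"class": "AddN", "hyperparams": {"n": 1}},'
    ' {"class": "MultiplyBy", "hyperparams": {"factor": 3}}]'
)
data = [1, 2]
for step in build_steps(config):
    data = step.transform(data)
```

Custom steps register themselves simply by subclassing, which is what keeps the boilerplate low compared with explicit registration calls.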
enhancement
opened by guillaume-chevalier 0

Releases(0.8.1)

0.8.1(Aug 16, 2022)
General improvements and bugfixes:

Update FlattenForEach that wasn't up to date.

Use _ids, data_inputs, and expected_outputs more often in DataContainer instead of shorthand ids, di, and eo, which were auto-filling values too often when looping over things in flow classes and output handlers.

Changed InputAndOutputTransformerMixin to IdsAndInputAndOutputTransformerMixin and derived classes to also process IDs more often in triplets rather than duo tuples of di and eo.

Fixed bugs in DataContainer.

Easy to read string representation of the DataContainer (DACT) is now possible for easier print debugging at a glimpse.

Repair some skipped tests.

Cleaned some docstrings.

Add __str__ and __repr__ functionalities to context to show its services and parents in detail upon printing.

_TruncableMixin that is common to _TruncableSteps and _TruncableService.

Introducing _TruncableServiceWithBodyMixin for .body and .joiner that is easier. Also fix FlattenForEach.

Added different copy constructors to services depending on the AutoML train/val phase.

Add the .mutate(...) function again in the services and steps.

Add the .will_mutate_to(...) function again in the services and steps.

Rename copy() to _copy() in the services and ExecutionContext, because a copy method is already defined in some core Python data structures. This renaming avoids conflicts between the core Python libraries and Neuraxle when defining services that also inherit from core data structures.

Improvements to the _repr to make step strings less bloated when debugging: removed steps names and steps hyperparams when names are redundant with class names and hyperparams empty. Also sometimes the str will be a compact one-liner when the children of a truncable step are of length 1.

0.8.0(Jul 22, 2022)
New major version number, since the changes added in 0.7.1 and 0.7.2 and the following changes in 0.8.0 make debugging and usage of parallelism much easier:

Lots of upgrades to the parallelization module of Neuraxle, which is now much safer and works with logging to be able to debug exceptions and bubble up exceptions properly in unit tests as well. This quality of life improvement saves lots of debugging time even in the short term.

Timeouts on tests to prevent deadlocks from hanging the unit test runner.

Pickling checks are done on the parallelized services, so that having picklable services avoids deadlocking the multithreading queues.

It also repairs several bugs that impacted the parallelism and were sometimes causing deadlocks. Only a few race conditions seem to remain.

Moving some modules around and removing some dependencies between modules for lighter workflow and clearer file names relatively to contained classes.
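
The idea behind the pickling checks mentioned in these notes can be shown with the standard pickle module alone. find_unpicklable is a hypothetical helper written for this illustration and is not the check Neuraxle performs internally.

```python
import pickle


def find_unpicklable(services):
    """Return the names of the services that cannot be pickled."""
    failures = []
    for name, service in services.items():
        try:
            pickle.dumps(service)
        except Exception:  # pickle can raise several exception types
            failures.append(name)
    return failures


services = {
    "config": {"batch_size": 32},
    # Lambdas cannot be pickled by the standard pickle module.
    "callback": lambda x: x + 1,
}
unpicklable = find_unpicklable(services)
```

Running such a check before handing objects to worker processes turns a silent hang into an early, readable error.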

0.7.2(Jul 22, 2022)
Minor release to add AutoML Report classes used to generate statistics:

A new file called reporting.py in the AutoML module allows generating statistics on optimization rounds and other related objects.

Each report wraps a dataclass of the same subclass level as itself, so that it can dig into the dataclass to observe it, generate statistics, and query its information.

Dataclasses represent the results of an AutoML optimization round, or even of multiple rounds.

These AutoML reports are used to get information from the nested dataclasses, for example to create visuals.

Just pass the dataclass to the reporting class, and do function calls.

Example usage: BaseReport.from_dc(some_auto_ml_dataclass)

Then call the methods for the statistics you want to compute for reporting.

0.7.1(Jul 22, 2022)
Minor version release that allows for usage of SQLAlchemy ORM for hyperparameter repositories:

Recursive tree table that joins on itself to load the vanilla data classes in depth.

To represent the nodes, SQL polymorphism is used by joining on other tables.

Abstract databases can use many technologies (e.g.: open to be implemented in SQLite, PostgreSQL, MySQL, and more).

Some utility functions were added to dataclasses, such as ".tree()".

0.7.0(Apr 15, 2022)
Major changes to the AutoML module are done in this version to improve its capabilities considerably:

Feature: 3rd Version for AutoML module.

Feature: ability to print epochs "i/n", and also the iter "j/m" of batches inside an epoch

Fix: ValidationSplitWrapper should use the same code as ValidationSplitter.

Fix: Cleanup all of the scipy distributions that are already available within the regular distribution module

Feature: Ctor of Hp Space should validate that each object is of type Distribution.

Feature: ExecutionContext could inherit from some steps mixins like a TruncableStep, and ExecutionContext Services could inherit from some mixins as well.

Feature: Typing in meta steps ! (wrappers' subtypes) - like C++ data structures' content templates

Feature: Improving Context to handle Assertions and Logging differently depending on the context.

Fix: If using savers in ParallelQueuedPipeline

Fix: Have the hyperparams setters (and space setters) of the TruncableSteps and MetaStepMixin throw errors when the name of the substep doesn't exist

Feature: DeprecatedMixin

Feature: Create _HasAssertsMixin to do things like self._assert(1 == 1, error_msg, context, level=None).

Feature: Ditch IDs in the data container, and ditch re-hashing mechanisms

Feature: Trainer class should be passed as an argument to AutoML

Feature: Add BaseStep.config, BaseStep.get_config() and BaseStep.set_config()

Fix: ForceHandlers things require overriding fit_transform

Fix: higher_score_is_better=False while it should be true

Fix: TPE LogNormal distributions
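Two of the validation items in the list above (the hyperparameter space constructor checking for Distribution objects, and the setters raising an error for unknown substep names) can be sketched as below. Every class here is a hypothetical, simplified stand-in written for illustration. None of it is Neuraxle's implementation or API.

```python
from typing import Dict, Iterable


class Distribution:
    """Hypothetical base class for hyperparameter distributions."""


class Uniform(Distribution):
    def __init__(self, low: float, high: float):
        self.low, self.high = low, high


class Space:
    """Hypothetical hyperparameter space that validates its values on construction."""

    def __init__(self, **dists):
        for name, dist in dists.items():
            if not isinstance(dist, Distribution):
                raise TypeError(
                    f"'{name}' must be a Distribution, got {type(dist).__name__}"
                )
        self.dists: Dict[str, Distribution] = dict(dists)


class Steps:
    """Hypothetical container of named substeps, each with its own hyperparams."""

    def __init__(self, names: Iterable[str]):
        self.hyperparams: Dict[str, dict] = {name: {} for name in names}

    def set_hyperparams(self, params: Dict[str, dict]) -> "Steps":
        for name, values in params.items():
            if name not in self.hyperparams:
                # Fail loudly instead of silently ignoring a misspelled name.
                raise KeyError(f"No substep named '{name}'")
            self.hyperparams[name] = dict(values)
        return self


space = Space(lr=Uniform(0.0, 1.0))  # accepted: the value is a Distribution
steps = Steps(["scaler", "model"]).set_hyperparams({"model": {"lr": 0.1}})
```

With checks like these, passing a raw number such as `Space(lr=0.1)` or a misspelled substep name fails immediately rather than surfacing later during a search.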

0.6.1(Oct 17, 2021)
Make Neuraxle work under Windows. It may sometimes raise errors due to file lock issues at deletions / teardowns, but things are mostly working.

Add python 3.9 support.

Removed support for pickle checkpoints and kept only joblib checkpoints to simplify maintenance.

Add vscode to gitignore.

Little debugging and refactor of streaming pipeline for python 3.9 support.

Fixed the usage of savers in the threading and processes code, after discovering the bug during the refactor.

API-breaking change to SequentialQueuedPipeline and ParallelQueuedFeatureUnion: use_threading was changed to use_processes to be clearer.

Some changes to the logging and to objects' print behavior due to the changes in parallelism and python version update.

Moves the new_trial function to the base HyperparamsRepository class, thus removing duplicated code.

Improve the implementation of __str__ and __repr__ in BaseStep and in the TruncableSteps.

0.6.0(Jun 29, 2021)
Added support for parallel trial execution

Added separate logger by trial

Change in setup behaviour

SKLearnWrapper supports Ensemble methods

Added Time Series Processing example

Updated / Cleaned a few examples

And many more quality of life upgrades

0.5.7(Feb 25, 2021)

Update service assertion interface, extended build to python 3.8, a few minor problems fixed.
0.5.6(Feb 22, 2021)

0.5.5(Sep 15, 2020)

Update H1, H2, H3 in intro page
0.5.4(Sep 11, 2020)
Fix update_hyperparams for SKLearnWrapper (critical)

0.5.3(Sep 10, 2020)
Tree Parzen Estimators

Scipy Distributions Support

0.5.2(Jul 20, 2020)
Partial fit in SKLearnWrapper

Setup with context

All the changes from 0.5.1 (was not uploaded to pypi)

0.5.1(Jul 20, 2020)
Fix warnings.warn for python 3.6 & python 3.7

Add with_context method to BaseStep to force the use of a given context in the pipeline

Use dependency injection by setting the execution context's service locator: ExecutionContext().set_service_locator({ BaseType: instance })
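The service-locator style of dependency injection mentioned above can be sketched as follows. The `Context` class is a simplified, hypothetical stand-in for an execution context. The `set_service_locator` name mirrors the call shown above, while `get_service`, `BaseStorage`, `InMemoryStorage`, and `LoadStep` are assumptions introduced only for this example.

```python
from typing import Any, Dict, Type


class Context:
    """Hypothetical, simplified stand-in for an execution context."""

    def __init__(self):
        self._services: Dict[Type, Any] = {}

    def set_service_locator(self, services: Dict[Type, Any]) -> "Context":
        # Map each base type to the concrete instance that should serve it.
        self._services = dict(services)
        return self

    def get_service(self, base_type: Type) -> Any:
        # Hypothetical lookup helper: steps ask for an abstraction, not an implementation.
        return self._services[base_type]


class BaseStorage:
    def load(self, key: str) -> str:
        raise NotImplementedError


class InMemoryStorage(BaseStorage):
    def __init__(self, data: Dict[str, str]):
        self.data = data

    def load(self, key: str) -> str:
        return self.data[key]


class LoadStep:
    def transform(self, key: str, context: Context) -> str:
        # The step only knows about BaseStorage; the context decides which one it gets.
        storage = context.get_service(BaseStorage)
        return storage.load(key)


context = Context().set_service_locator({BaseStorage: InMemoryStorage({"a": "1"})})
result = LoadStep().transform("a", context)
print(result)
```

Swapping `InMemoryStorage` for another `BaseStorage` subclass changes the behavior of `LoadStep` without touching its code, which is the point of registering services by base type.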

0.5.0(Jul 10, 2020)
Added streaming pipelines for parallel processing.

Implemented neat .apply() logic in the steps to recursively apply any function if it exists.

Refactored the BaseStep so that it inherits from multiple smaller classes, which separates the logic and allows for creating various steps such as the BaseTransformer that doesn't need any fitting: the BaseStep is thus a BaseTransformer with additional fitting behavior that is added by inheriting another mixin.

Some bugs were fixed.
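The two design ideas in this release, a recursive `.apply()` and a step built as a transformer plus a fitting mixin, can be sketched as below. The classes are hypothetical and much simpler than Neuraxle's real ones. `Transformer`, `FitMixin`, `Step`, `Sequence`, `get_children`, and `describe` are names assumed for illustration.

```python
class Transformer:
    """Hypothetical minimal base: transforms data, has nothing to fit."""

    def transform(self, data):
        return data

    def get_children(self):
        return []

    def apply(self, method_name, *args, **kwargs):
        """Call `method_name` on this step and on all nested steps that define it."""
        results = []
        method = getattr(self, method_name, None)
        if callable(method):
            results.append(method(*args, **kwargs))
        for child in self.get_children():
            results.extend(child.apply(method_name, *args, **kwargs))
        return results


class FitMixin:
    """Hypothetical mixin that adds fitting behavior."""

    def fit(self, data):
        return self


class Step(FitMixin, Transformer):
    """A step is a transformer with fitting behavior added by the mixin."""


class MultiplyBy(Transformer):
    def __init__(self, factor):
        self.factor = factor

    def transform(self, data):
        return [x * self.factor for x in data]


class MeanCenter(Step):
    def __init__(self):
        self.mean = None

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean for x in data]

    def describe(self):
        return f"mean={self.mean}"


class Sequence(Step):
    def __init__(self, steps):
        self.steps = steps

    def get_children(self):
        return self.steps

    def fit(self, data):
        for step in self.steps:
            if isinstance(step, FitMixin):
                step.fit(data)
            data = step.transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data


pipeline = Sequence([MultiplyBy(2), MeanCenter()]).fit([1, 2, 3])
print(pipeline.transform([1, 2, 3]))
print(pipeline.apply("describe"))
```

Here `apply("describe")` skips the steps that have no `describe` method and collects results only from those that do, which mirrors the "if it exists" behavior described above.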

0.4.1(Jul 10, 2020)

0.4.0(Jul 10, 2020)

0.3.4(Jul 10, 2020)

0.3.3(Jul 10, 2020)

0.3.2(Jul 10, 2020)

0.3.1(Jan 16, 2020)

0.3.0(Dec 25, 2019)

0.2.2(Dec 19, 2019)

0.2.1(Dec 25, 2019)

0.2.0(Oct 29, 2019)

0.1.1(Dec 25, 2019)

0.1.0(Dec 25, 2019)
