A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to a production environment easily.

Overview

Neuraxle Pipelines

Code Machine Learning Pipelines - The Right Way.

Badges: build status (GitHub Actions), Gitter, PyPI license, PyPI downloads, latest GitHub release.

Neuraxle Logo

Neuraxle is a Machine Learning (ML) library for building machine learning pipelines.

  • Component-Based: Build encapsulated steps, then compose them to build complex pipelines.
  • Evolving State: Each pipeline step can fit and evolve its state through the learning process.
  • Hyperparameter Tuning: Optimize your pipelines using AutoML, where each pipeline step has its own hyperparameter space (see the sketch after this list).
  • Compatible: Use your favorite machine learning libraries inside and outside Neuraxle pipelines.
  • Production Ready: Pipeline steps can manage how they are saved by themselves, and the lifecycle of the objects allows for train and test modes.
  • Streaming Pipeline: Transform data in many pipeline steps at the same time in parallel using multiprocessing Queues.
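
As a quick illustration of steps owning their own hyperparameter space, here is a minimal sketch of a custom step. It assumes the BaseStep base class and the set_hyperparams / set_hyperparams_space methods used in the examples below; the ScaleBy step itself and its 'factor' hyperparameter are hypothetical.

from neuraxle.base import BaseStep
from neuraxle.hyperparams.distributions import Uniform
from neuraxle.hyperparams.space import HyperparameterSamples, HyperparameterSpace


class ScaleBy(BaseStep):
    # Hypothetical step that multiplies every input by its 'factor' hyperparameter.
    def fit(self, data_inputs, expected_outputs=None):
        return self  # nothing to learn here; a real step would update its state

    def transform(self, data_inputs):
        factor = self.get_hyperparams()['factor']
        return [x * factor for x in data_inputs]


step = ScaleBy().set_hyperparams(
    HyperparameterSamples({'factor': 1.0})  # current value
).set_hyperparams_space(
    HyperparameterSpace({'factor': Uniform(0.5, 2.0)})  # space searched by AutoML
)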

Documentation

You can find the Neuraxle documentation on the website.

The documentation is divided into several sections, all listed on the website.

Installation

Simply do:

pip install neuraxle

Examples

We have several examples on the website.

For example, you can build a time series processing pipeline as such:

p = Pipeline([
    TrainOnly(DataShuffler()),
    WindowTimeSeries(),
    MiniBatchSequentialPipeline([
        Tensorflow2ModelStep(
            create_model=create_model,
            create_optimizer=create_optimizer,
            create_loss=create_loss
        ).set_hyperparams(HyperparameterSamples({
            'hidden_dim': 12,
            'layers_stacked_count': 2,
            'lambda_loss_amount': 0.0003,
            'learning_rate': 0.001,
            'window_size_future': sequence_length,
            'output_dim': output_dim,
            'input_dim': input_dim
        })).set_hyperparams_space(HyperparameterSpace({
            'hidden_dim': RandInt(6, 750),
            'layers_stacked_count': RandInt(1, 4),
            'lambda_loss_amount': Uniform(0.0003, 0.001),
            'learning_rate': Uniform(0.001, 0.01),
            'window_size_future': FixedHyperparameter(sequence_length),
            'output_dim': FixedHyperparameter(output_dim),
            'input_dim': FixedHyperparameter(input_dim)
        }))
    ])
])

# Load data
X_train, y_train, X_test, y_test = generate_classification_data()

# The pipeline will learn on the data and acquire state.
p = p.fit(X_train, y_train)

# Once it learned, the pipeline can process new and
# unseen data for making predictions.
y_test_predicted = p.predict(X_test)

You can also tune your hyperparameters using AutoML algorithms such as the TPE:

auto_ml = AutoML(
    pipeline=pipeline,
    hyperparams_optimizer=TreeParzenEstimatorHyperparameterSelectionStrategy(
        number_of_initial_random_step=10,
        quantile_threshold=0.3,
        number_good_trials_max_cap=25,
        number_possible_hyperparams_candidates=100,
        prior_weight=0.,
        use_linear_forgetting_weights=False,
        number_recent_trial_at_full_weights=25
    ),
    validation_splitter=ValidationSplitter(test_size=0.20),
    scoring_callback=ScoringCallback(accuracy_score, higher_score_is_better=True),
    callbacks=[
        MetricCallback(f1_score, higher_score_is_better=True),
        MetricCallback(precision_score, higher_score_is_better=True),
        MetricCallback(recall_score, higher_score_is_better=True)
    ],
    n_trials=7,
    epochs=10,
    hyperparams_repository=HyperparamsJSONRepository(cache_folder='cache'),
    refit_trial=True,
)

# Load data, and launch the AutoML loop!
X_train, y_train, X_test, y_test = generate_classification_data()
auto_ml = auto_ml.fit(X_train, y_train)

# Get the model from the best trial, and make predictions using predict.
best_pipeline = auto_ml.get_best_model()
y_pred = best_pipeline.predict(X_test)
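
To inspect which hyperparameters the best trial ended up with, something like the following should work. This is a small sketch: get_hyperparams() is the same accessor used elsewhere on this page, and to_flat_dict() is assumed to be available on the returned hyperparameter samples.

from pprint import pprint

# Flat view of the best pipeline's hyperparameters, one entry per step__param path.
pprint(best_pipeline.get_hyperparams().to_flat_dict())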

Why Neuraxle?

Most research projects don't ever get to production. However, you want your project to be production-ready and already adaptable (clean) by the time you finish it. You also want things to be simple so that you can get started quickly. Read more about the why of Neuraxle here.

Community

For technical questions, please post them on StackOverflow using the neuraxle tag. The StackOverflow question will automatically be posted in Neuraxio's Slack workspace, in the #Neuraxle channel.

For suggestions, feature requests, and error reports, please open an issue.

For contributors, we recommend using the PyCharm code editor and letting it manage the virtual environment, with the default code auto-formatter, and using pytest as a test runner. To contribute, first fork the project, make your changes, and then open a pull request in the main repository. Please make your pull request(s) editable, so that we can, for example, add you to the list of contributors if you didn't add the entry yourself. Ensure that all tests pass before opening a pull request. You also agree that your contributions will be licensed under the Apache 2.0 License, which is required for everyone to be able to use your open-source contributions.

Finally, you can also join our Slack workspace and our Gitter to collaborate with us. We <3 collaborators. You can also subscribe to our mailing list, where we will post updates and news.

License

Neuraxle is licensed under the Apache License, Version 2.0.

Citation

You may cite our extended abstract that was presented at the Montreal Artificial Intelligence Symposium (MAIS) 2019. Here is the BibTeX entry to cite:

@misc{neuraxle,
    author = {Chevalier, Guillaume and Brillant, Alexandre and Hamel, Eric},
    year = {2019},
    month = {09},
    pages = {},
    title = {Neuraxle - A Python Framework for Neat Machine Learning Pipelines},
    doi = {10.13140/RG.2.2.33135.59043}
}

Contributors

Thanks to everyone who contributed to the project:

Supported By

We thank these organisations for generously supporting the project:

Comments
  • Recursive dict compress feature

    Recursive dict compress feature

    What it is

    My pull request does the following: it compresses the flat list of hyperparams into a more structured representation. Please refer to issue #486 for more information. I haven't added any tests for now; I wanted to make sure this implementation is reviewed first.

    How it works

    I coded it this way: the current implementation reads the output of all_hps = pipeline.get_hyperparams() and compresses it into a more structured and readable format.

    Example usage

    Here is how you can use this new code as an end user:

    Note:
    Please make dimensions and types clear to the reader,
    e.g. in the event fictitious data is processed in this code example.
    >>> all_hps = pipeline.get_hyperparams()
    >>> all_hps_shortened = all_hps.compress()
    >>> print(type(all_hps_shortened))
    <class 'neuraxle.hyperparams.space.CompressedHyperparameterSamples'>
    >>> pprint(all_hps_shortened)
    [
        {
            "step_name": "step1",
            "hyperparams": {'copy': True, 'iterated_power': 'auto'},
            "ancestor_steps": ["root"]
        },
        {
            "step_name": "step2",
            "hyperparams": {'copy': True, 'iterated_power': 'auto'},
            "ancestor_steps": ["root"]
        }
    ]
    
    opened by Rohith295 20
  • Feature: Add support for string column names in ColumnTransformer for when we use a pandas df as an input

    Feature: Add support for string column names in ColumnTransformer for when we use a pandas df as an input

    Raw data is a mix of text columns, numerical columns, date columns, and category (string-based) columns.

    Pre-processed data is numerical data with: one-hot encoding of the text columns, one-hot encoding of the category data, and one-hot encoding (binning) of the numerical data.

    1. The pipeline should manage the enlargement of the initial columns, and different types of pre-processing based on the different data types.

    2. Keeping track of column name changes is useful (but complicated to handle).

    The key to AutoML is mostly the pre-processing part.
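
    For illustration, here is a hypothetical sketch of what the requested API could look like. String column names are exactly the feature being asked for here, so this does not necessarily work today; the ColumnTransformer import path, tuple order, and n_dimension argument are assumptions.

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from neuraxle.pipeline import Pipeline
    from neuraxle.steps.column_transformer import ColumnTransformer  # assumed path

    df = pd.DataFrame({'age': [25, 32, 47], 'city': ['QC', 'MTL', 'QC']})

    p = Pipeline([
        ColumnTransformer([
            ('age', StandardScaler()),   # numerical column selected by name (requested feature)
            ('city', OneHotEncoder()),   # categorical column selected by name (requested feature)
        ], n_dimension=2),
    ])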

    enhancement 
    opened by arita37 16
  • Feature: Add Binomial distributions to hyperparams/distributions.py

    Feature: Add Binomial distributions to hyperparams/distributions.py

    I think it would be a neat addition to generate discrete-valued hyperparams. You could redefine Boolean (which should really be called Bernoulli, in my opinion) to be Binomial(1, p).

    In the meantime, it is possible to use Choice with custom probabilities to model this distribution.

    Edit: side question. Let's say that, for my random hyperparameter search, I'd really like to use learning_rate = 10**uniform(-2, -4). That doesn't seem to be an option within the current state of the framework; I'd have to create my own class ExpUniform (which maybe should already be included). Would there be a not-too-complex way to introduce operators on these distributions instead? Or, even if a complex option is on the table, are all attempts at doing this fundamentally overkill and computationally wasteful for the conceptual scope of the distributions in this file?
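
    For what it's worth, here is a small sketch of one existing workaround, assuming the LogUniform distribution shipped in neuraxle.hyperparams.distributions samples uniformly in log space between its bounds, which would cover learning_rate = 10**uniform(-4, -2):

    from neuraxle.hyperparams.distributions import LogUniform
    from neuraxle.hyperparams.space import HyperparameterSpace

    space = HyperparameterSpace({
        # Intended to be equivalent to learning_rate = 10 ** uniform(-4, -2).
        'learning_rate': LogUniform(1e-4, 1e-2),
    })
    sample = space.rvs()  # e.g. {'learning_rate': 0.0037}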

    enhancement wontfix 
    opened by vincent-antaki 9
  • Service assertion overhaul, MixinForBaseTransformer, minor fix in AutoML and ForceHandleMixin

    Service assertion overhaul, MixinForBaseTransformer, minor fix in AutoML and ForceHandleMixin

    What it is

    Changes in Mixin

    • Adds a private method, called at initialization, in ForceHandleMixin to ensure that the basic _X_data_container functions inherited from _TransformerStep and _FittableStep are redefined. Previously, failing to override these methods led to infinite recursion.
    • Defines MixinForBaseTransformer, from which all other mixins inherit. This mixin asserts, at initialization, that the instance inherits from BaseTransformer and that it has been initialized. This change ensures that the initialization of the base classes is done in the proper order (i.e. steps before mixins). (fixes #412)
    • NonTransformableMixin defines a _transform_data_container method.

    Changes in AutoML class

    • Defined _transform_data_container and _fit_transform_data_container to raise a NotImplementedError (instead of recursing infinitely)
    • Fixed the undefined-local-variable error that would happen when a non-terminal exception raised during the first trial was caught by the try-catch in the main loop
    • Added a flag in AutoML's constructor to define whether it should raise all errors that happen during a trial, or only the ones previously defined (EOFError, KeyboardInterrupt, SystemError, SystemExit). This is useful when we want all errors to be raised (such as in unit tests)

    Overhaul of the service assertion mechanism

    There are now two different ways to make service assertions, both of which are applicable to any step.

    • SomeStep().assert_has_service_at_execution() wraps the SomeStep() instance with a LocalServiceAssertionWrapper and tests the presence of the service right before SomeStep's execution, in LocalServiceAssertionWrapper's will_process call
    • SomeStep().assert_has_service() wraps SomeStep() with the GlobalyRetrievableWrapper, which also tests the presence of the service at execution (like the LocalServiceAssertionWrapper)
    • GlobalServiceAssertionExecutorMixin allows retrieval of all GlobalyRetrievableWrapper instances within the pipeline and tests the presence of the required services.
    • StepWithContext now inherits from GlobalServiceAssertionExecutorMixin and thus tests services at the root of the pipeline (i.e. at the beginning of pipeline execution)

    Reason for this change: the previous version, which only asserted the presence of services in a StepWithContext's will_process call, did not allow asserting the presence of a service that is registered during the execution of a pipeline.
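
    A hedged sketch of how the two assertion styles described above could be used, based only on the method names in this PR; the exact arguments, signatures, and import paths may differ, and SomeService is a placeholder:

    from neuraxle.base import ExecutionContext, Identity

    class SomeService:
        pass

    # Assert right before the step executes (LocalServiceAssertionWrapper):
    step_local = Identity().assert_has_service_at_execution(SomeService)

    # Assert at the root of the pipeline execution (GlobalyRetrievableWrapper):
    step_global = Identity().assert_has_service(SomeService)

    # The service itself is registered on the execution context, as in the
    # ExecutionContext().set_service_locator({...}) example from the 0.5.1 release notes below.
    context = ExecutionContext().set_service_locator({SomeService: SomeService()})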

    Other minor changes

    • _HasSavers now has an add_saver function
    • Fixed the initialization order of TruncableSteps (fixes #422, mentioned in #369)
    • Identity now inherits from NonFittableMixin and BaseStep instead of BaseTransformer
    • test_could_have_context.py is now test_service_assertions.py (and has been extended)
    • Removed the function MetaStepMixin._ensure_proper_mixin_init_order since it is now redundant with the MixinForBaseTransformer interface.
    • examples/getting_started/plot_non_fittable_mixin.py has been adjusted to deal with the ForceHandleMixin changes
    • fixes #403
    • fixes #404
    cla-signed 
    opened by vincent-antaki 9
  • Find a way to pass validation data during fit?

    Find a way to pass validation data during fit?

    Hi @alexbrillant @Eric2Hamel,

    We'd need a way to have a validation curve during training to be able to do this:

    • Early stopping in deep learning models when the validation loss starts to rise
    • Being able to plot train/val curves to debug the model's training and convergence
    • Being able to send train/val curves to the AutoML algorithm (as model features that can be retrieved by model.introspect() after training), so that the AutoML algorithm can also perceive what we see when we look at those charts.

    How should we change the API of Neuraxle in the next big release, 0.3.0? Neuraxle is currently at 0.2.1. The ability to plot the validation curve is something we have procrastinated on for too long.

    Suggestion: the MiniBatchSequentialPipeline could be responsible for doing this by calling .fit on train, then .predict on the validation data, then train again... and so forth. Any other ideas? I also thought of:

    • Doing the train/validation split at each mini-batch, inside the fit of the model, but this is very bad.
    • Adding 2 more arguments to fit, but this is bad.
    • Adding 1 more argument to transform, but this is bad and we should be using predict instead.

    So the only solution to me looks like making the MiniBatchSequentialPipeline class manage that. However, how should a user tell the MiniBatchSequentialPipeline what is validation data? Would the MiniBatchSequentialPipeline instead just need a ValidationSplitWrapper (#174) and instead let the wrapper do the alternation of calling fit then predict then fit then predict again?
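
    For illustration only, here is a generic sketch in plain Python (not the Neuraxle API) of the alternation suggested above: fit on the training mini-batches, then score on held-out validation data at each epoch, stopping early once the validation score stops improving.

    def fit_with_validation(model, train_batches, x_val, y_val, n_epochs, score):
        # train_batches: list of (x_batch, y_batch) tuples; score: higher is better.
        history = []
        for _ in range(n_epochs):
            for x_batch, y_batch in train_batches:
                model = model.fit(x_batch, y_batch)
            val_score = score(y_val, model.predict(x_val))
            history.append(val_score)
            if len(history) > 1 and val_score <= max(history[:-1]):
                break  # early stopping: validation score stopped improving
        return model, history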

    Thank you and Best Regards, Guillaume

    enhancement help wanted invalid question 
    opened by guillaume-chevalier 9
  • Bug: Cannot load training states from  checkpoints to resume training

    Bug: Cannot load training states from checkpoints to resume training

    Describe the bug When training AutoML on ResumablePipeline, logs are full of messages like these: UserWarning: Cannot Load Step /models/resumable_pipeline/AutoML/ResumablePipeline (ResumablePipeline:ResumablePipeline) With Step Saver JoblibStepSaver. saver.__class__.__name__))

    To Reproduce

    HP_real = HyperparameterSpace({
        "learning_rate": Uniform(1e-5, 1),
        "max_depth": RandInt(2, 4),
        "n_estimators": Choice([30, 60, 90, 100, 130])
    })

    pipeline_sk = ResumablePipeline([
        # A Pipeline is composed of multiple chained steps. Steps
        # can alter the data before passing it to the next steps.
        AddFeatures([
            PCA(n_components=2),
            FastICA(n_components=2),
        ]),
        RidgeModelStacking([
            RandomForestRegressor(),
            GradientBoostingRegressor(warm_start=False, min_samples_leaf=2, random_state=42)  # validation_fraction = 0.2
        ])
    ], cache_folder=resumable_pipeline_folder).set_hyperparams_space(HP_real)

    time_a = time.time()
    auto_ml = AutoML(
        pipeline_sk,
        # AutoMLContainer = AutoMLContainer(main_scoring_metric_name="mse"),
        refit_trial=True,
        n_trials=int(epochs),
        cache_folder_when_no_handle=resumable_pipeline_folder,
        validation_splitter=ValidationSplitter(test_set_ratio),
        hyperparams_optimizer=RandomSearchHyperparameterSelectionStrategy(),
        scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
        callbacks=[
            MetricCallback('mse', metric_function=mean_squared_error, higher_score_is_better=False)
        ]
    )
    auto_ml = auto_ml.fit(train, y_train)  # if you use custom label encoder, then fit takes in the whole data at a time
    

    Expected behavior I am expecting that there would be no errors such as these. The goal is to be able to warm-start training from old checkpoints.

    bug invalid 
    opened by alarivarmann 8
  • Incomplete Documentation Strings (Docstrings)

    Incomplete Documentation Strings (Docstrings)

    Many classes have incomplete docstrings. Sometimes, there are also typos.

    The docstrings in classes are used with the sphinx documentation builder to build the website's complete documentation API: https://www.neuraxle.neuraxio.com/stable/api.html

    help wanted good first issue invalid wontfix 
    opened by guillaume-chevalier 8
  • Finally fix checkpoints for good: step saving by summary, and resume/should_resume refactor

    Finally fix checkpoints for good: step saving by summary, and resume/should_resume refactor

    Step Saving Checkpoints Are 100% valid now !! 🚀

    • Moved all of the resume and should_resume logic for steps with children inside _HasChildrenMixin
    • Simplified the ResumablePipeline algorithm A LOT
    • Fixed step saving checkpoints: step saving by summary id!
    • Added the summary id to save and load (this is a breaking change :/ we could maybe do something to avoid it, like method signatures? Or we could live with it...)

    WORK IN PROGRESS. TODO: docstrings; validate/avoid the breaking change; weird merge; some typings broke.

    This needed to be done... I felt really inspired and did this real fast tonight, lol. If we want to ship properly, we need to have this done because it is written in the readme...

    wontfix cla-signed 
    opened by alexbrillant 7
  • Documentation: Add examples for the neuraxle.steps.output_handlers

    Documentation: Add examples for the neuraxle.steps.output_handlers

    Can you add an example in the documentation for:

    neuraxle.steps.output_handlers.InputAndOutputTransformerMixin
    neuraxle.steps.output_handlers.OutputTransformerWrapper
    

    Especially on how to access the X and Y data and return them.

    In my use case I want to set Y based on X values (financial forecasting)

    thx!

    question wontfix documentation 
    opened by stefvra 7
  • Added feature to compress hps

    Added feature to compress hps

    What it is

    My pull request does the following: it compresses the flat list of hyperparams into a more structured representation. Please refer to the following PR for more information. I haven't added any tests for now; I wanted to make sure this implementation is reviewed first.

    How it works

    I coded it this way: the current implementation reads the output of all_hps = pipeline.get_hyperparams() and compresses it into a more structured and readable format.

    Example usage

    Here is how you can use this new code as an end user:

    Note:
    Please make dimensions and types clear to the reader,
    e.g. in the event fictitious data is processed in this code example.
    
    >>> all_hps = pipeline.get_hyperparams()
    >>> all_hps_shortened = all_hps.compress()
    >>> print(type(all_hps_shortened))
    <class 'neuraxle.hyperparams.space.CompressedHyperparameterSamples'>
    >>> pprint(all_hps_shortened)
    [
        {
            "step_name": "step1",
            "hyper_parameters": OrderedDict([('copy', True), ('iterated_power', 'auto')]),
            "ancestor_steps": ["root"]
        },
        {
            "step_name": "step1",
            "hyper_parameters": OrderedDict([('copy', True), ('iterated_power', 'auto')]),
            "ancestor_steps": ["root"]
        }
    ]
    
    
    opened by Rohith295 6
  • Feature: RecursiveDict.compress() to shorten paths to steps and their hyperparams

    Feature: RecursiveDict.compress() to shorten paths to steps and their hyperparams

    Is your feature request related to a problem? Please describe. Hyperparam names are too long in nested steps

    Describe the solution you'd like A way to compress the names so as to make them shorter. More specifically, I think that an automated algorithm for all existing ML pipelines could be built. That would be to do something like:

    all_hps = pipeline.get_hyperparams()
    all_hps_shortened = all_hps.compress()
    pprint(all_hps_shortened)
    

    Then we'd see something like this in the pprint:

    {
        "*__MetaStep__*__SKLearnWrapper_LinearRegression__C": 1000,
        "*__SomeStep__hyperparam3": value,
        "*__SKLearnWrapper_BoostedTrees__count": 10
    }
    

    That is, the unique paths to some steps are compressed using the star (*) operator. The star operator means "one or more steps in between". But the way the paths are compressed would be lossless, in the sense that the original names could ALWAYS be retrieved given the original pipeline's tree structure.

    Describe alternatives you've considered Using custom ways to flush words and compress them. That seems good, but it doesn't seem to generalize to all pipelines that could exist.

    Additional context Hyperparameter names were said to be too long as well in #478

    Additional idea For hyperparameters, given that in the future every model may need to name its expected hyperparams, it may be possible to use their names directly if no other step has the same hyperparams. If another step uses the same hyperparam names, then compression with the "*" could go up the tree to find the first non-common parent name or something.

    More ideas are needed to be sure we do this the right way.

    enhancement help wanted good first issue question 
    opened by guillaume-chevalier 6
  • Bug: Test files missing from `sdist` on PyPI

    Bug: Test files missing from `sdist` on PyPI

    Describe the bug

    Currently the testing_neuraxle directory is missing from the sdist on PyPI.

    To Reproduce

    curl -LO https://pypi.io/packages/source/n/neuraxle/neuraxle-0.8.1.tar.gz
    tar xzf neuraxle-0.8.1.tar.gz
    ls -1 neuraxle-0.8.1/testing_neuraxle
    

    Expected behavior

    Ideally the tests would be included in the sdist to make it easier to test after building/installing the package.

    Suggested Fix

    One solution would be to include a MANIFEST.in and add...

    recursive-include testing_neuraxle *.py
    

    ...as well as anything else that might be needed to run the tests (like other directories or other extension types).
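
    As a quick Python check (an illustration; the file name matches the curl command above), you can verify whether a given sdist actually ships the test files:

    import tarfile

    with tarfile.open("neuraxle-0.8.1.tar.gz") as sdist:
        test_files = [n for n in sdist.getnames() if "/testing_neuraxle/" in n]
    print(f"{len(test_files)} test files found in the sdist")  # 0 today; should be > 0 once MANIFEST.in is fixed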

    Additional context

    This came up in trying to build & test a Conda package for conda-forge.

    xref: https://github.com/conda-forge/staged-recipes/pull/16566#issuecomment-1217118836


    cc @guillaume-chevalier (for awareness)

    bug invalid 
    opened by jakirkham 0
  • Feature: Additional arguments to fit method in BaseStep

    Feature: Additional arguments to fit method in BaseStep

    The problem: Currently the neuraxle BaseStep has a fit method signature with only 2 parameters (data_inputs, expected_outputs). In libraries like keras it is possible to have additional arguments being passed to the fit method. This could be things like validation generators if the main data_inputs is a data generator as well.

    This means that if we want to wrap a keras model that takes two data generators in a subclass of BaseStep, it wouldn't be a straightforward implementation.

    Solution: It would be extremely useful if an additional **kwargs were added to the base step's fit method (in one or more of the mixin classes) to enable passing arbitrary arguments to custom estimator implementations.
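
    A hypothetical sketch of what the requested signature could enable (this is not the current API; self.model and the keras-style validation_data argument are placeholders):

    from neuraxle.base import BaseStep

    class KerasModelStep(BaseStep):
        # Hypothetical: extra keyword arguments are forwarded to the wrapped estimator.
        def fit(self, data_inputs, expected_outputs=None, **kwargs):
            self.model.fit(data_inputs, expected_outputs, **kwargs)  # e.g. validation_data=(x_val, y_val)
            return self

        def transform(self, data_inputs):
            return self.model.predict(data_inputs)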

    enhancement 
    opened by subramaniam20jan 4
  • Bug: StepSaverCallback & BestModelCheckpoint not working in version 0.7.0

    Bug: StepSaverCallback & BestModelCheckpoint not working in version 0.7.0

    Describe the bug From version 0.7.0 and later, these callbacks no longer have access to the pipeline, since the pipeline is no longer stored in the TrialSplit but only in the AutoML class.

    To Reproduce Update to 0.7.0 and run these tests that are skipped.

    Expected behavior Callbacks are supposed to log models using the repo.

    Suggested Fix Possibly have the TrialSplits contain their trained model again, instead of letting the Trainer call the model.

    Additional context 0.7.0 is not merged nor deployed at the moment of writing this issue.

    bug invalid 
    opened by guillaume-chevalier 0
  • Testing: test the ParallelFeatureUnion with the tests of the FeatureUnion, and test the ParallelColumnTransformer with the tests of the ColumnTransformer.

    Testing: test the ParallelFeatureUnion with the tests of the FeatureUnion, and test the ParallelColumnTransformer with the tests of the ColumnTransformer.

    There is currently no ParallelColumnTransformer, and there is a lack of tests for ParallelFeatureUnion.

    Suggested fix:

    • Parametrize FeatureUnion tests and ColumnTransformer tests.
    invalid documentation 
    opened by guillaume-chevalier 0
  • Feature: Recursive Context instead of list context for different service cache levels

    Feature: Recursive Context instead of list context for different service cache levels

    Is your feature request related to a problem? Please describe. As the context acts like a stack of function calls, with each level having its own "variables"-like storage in the context (e.g. Memory, as per #443), it might be good that, when adding a service in the middle of the pipeline, the service is scoped and local to just this pushed context level (like a programming language stack / assembly CPU stack that is cleared upon stack push).

    Describe the solution you'd like When pushing and popping to the context, use a recursive context data structure instead of lists of parents and a single service dict. This was done originally in the very first releases of Neuraxle, but that was changed for reasons I don't know. It seems that the original intuition was possibly better.

    Describe alternatives you've considered A singly-linked list (stack data structure / design pattern) for the context's parents and services. Could set global vs. local services as well (like programming languages' global vs. local scopes when declaring/using variables).

    Additional context None
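
    As a generic illustration (plain Python, not Neuraxle's actual ExecutionContext), a recursive context in which each pushed level has its own local services and lookups fall back to the parent scope could look like this:

    class ScopedContext:
        def __init__(self, parent=None):
            self.parent = parent
            self.services = {}

        def push(self):
            return ScopedContext(parent=self)  # new local scope

        def register(self, key, service):
            self.services[key] = service  # local to this scope only

        def get(self, key):
            if key in self.services:
                return self.services[key]
            if self.parent is not None:
                return self.parent.get(key)  # walk up to outer scopes
            raise KeyError(key)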

    enhancement 
    opened by guillaume-chevalier 1
  • Feature: Have a step store using metaclass-based registration and DSL-based declaration.

    Feature: Have a step store using metaclass-based registration and DSL-based declaration.

    Is your feature request related to a problem? Please describe. Meta-describing a pipeline from, say, a loaded configuration JSON would require composing the objects back together. To do so from object names, we'd need an object store to save the objects to upon creating them, including custom objects.

    Describe the solution you'd like BaseStep to register objects to the global store using a descriptor.

    Describe alternatives you've considered The "Pattern: language-integrated registration" from the article linked below as well, but it adds too much boilerplate.

    Additional context See this article, especially the sections:

    • Pattern: metaclass based registration
    • Pattern: DSL-based declaration

    Metaclass-based registration could be used as well for specifying hyperparameter distributions and more objects. This could be like Orion strings in the Orion framework.
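
    A generic illustration of the metaclass-based registration pattern referenced above (not an existing Neuraxle API): every declared step subclass gets recorded in a global store by name, so a pipeline could later be re-composed from a configuration file.

    STEP_STORE = {}

    class StepRegistry(type):
        def __init__(cls, name, bases, namespace):
            super().__init__(name, bases, namespace)
            STEP_STORE[name] = cls  # register each declared step class by name

    class RegisteredStep(metaclass=StepRegistry):
        pass

    class MyCustomStep(RegisteredStep):
        pass

    assert STEP_STORE['MyCustomStep'] is MyCustomStep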

    enhancement 
    opened by guillaume-chevalier 0
Releases(0.8.1)
  • 0.8.1(Aug 16, 2022)

    General improvements and bugfixes:

    • Update FlattenForEach that wasn't up to date.
    • Use _ids, data_inputs, and expected_outputs more often in DataContainer instead of shorthand ids, di, and eo, which were auto-filling values too often when looping over things in flow classes and output handlers.
    • Changed InputAndOutputTransformerMixin to IdsAndInputAndOutputTransformerMixin and derived classes to also process IDs more often in triplets rather than duo tuples of di and eo.
    • Fixed bugs in DataContainer.
    • Easy to read string representation of the DataContainer (DACT) is now possible for easier print debugging at a glimpse.
    • Repair some skipped tests.
    • Cleaned some docstrings.
    • Add __str__ and __repr__ functionalities to context to show its services and parents in detail upon printing.
    • _TruncableMixin that is common to _TruncableSteps and _TruncableService.
    • Introducing _TruncableServiceWithBodyMixin for .body and .joiner that is easier. Also fix FlattenForEach.
    • Added different copy constructors to services depending on the AutoML train/val phase.
    • Add the .mutate(...) function again in the services and steps.
    • Add the .will_mutate_to(...) function again in the services and steps.
    • Rename copy() to _copy() in the services and ExecutionContext to bypass the fact that the copy method was already defined in some python core data structures. This renaming avoids conflicts between these functionalities of the core python libs and those of Neuraxle when defining services that inherit from core data structures.
    • Improvements to the __repr__ to make step strings less bloated when debugging: removed step names and step hyperparams when the names are redundant with the class names and the hyperparams are empty. Also, the str will sometimes be a compact one-liner when a truncable step has only one child.
    Source code(tar.gz)
    Source code(zip)
  • 0.8.0(Jul 22, 2022)

    New major version number since new changes added in 0.7.1 and 0.7.2 and the following changes of 0.8.0 that makes debugging and usage of parallelism much easier:

    • Lots of upgrades to the parallelization module of Neuraxle, which is now much safer and works with logging to be able to debug exceptions and bubble them up properly in unit tests as well. This quality-of-life improvement saves lots of debugging time, even in the short term.
    • Timeouts on tests to prevent deadlocks from hanging the unit tester.
    • Pickling checks on the parallelized services to avoid deadlocking the multithreading queues, by ensuring the services are picklable.
    • It also repairs several bugs that impacted parallelism and sometimes caused deadlocks. Only a few race conditions seem to remain.
    • Moved some modules around and removed some dependencies between modules for a lighter workflow and clearer file names relative to the contained classes.
    Source code(tar.gz)
    Source code(zip)
  • 0.7.2(Jul 22, 2022)

    Minor release to add AutoML Report classes used to generate statistics:

    • A new file called reporting.py in the AutoML module allows generating statistics on optimization rounds and other related objects.
    • Such a report contains a dataclass of the same subclass level as itself, so as to be able to dig into the dataclass and observe it, for instance to generate statistics and query its information.
    • Dataclasses represent the results of an AutoML optimization round, even multiple rounds.
    • These AutoML reports are used to get information from the nested dataclasses, such as to create visuals.
    • Just pass the dataclass to the reporting class, and do function calls.
    • Example usage: BaseReport.from_dc(some_auto_ml_dataclass)
    • Then call the methods for the statistics you want to compute for reporting.
    Source code(tar.gz)
    Source code(zip)
  • 0.7.1(Jul 22, 2022)

    Minor version release that allows for usage of SQLAlchemy ORM for hyperparameter repositories:

    • Recursive tree table that joins on itself to load the vanilla data classes in depth.
    • To represent the nodes, SQL polymorphism is used by joining on other tables.
    • Abstract databases can use many technologies (e.g.: open to be implemented in SQLite, PostgreSQL, MySQL, and more).
    • Some utility functions were added to dataclasses, such as ".tree()".
    Source code(tar.gz)
    Source code(zip)
  • 0.7.0(Apr 15, 2022)

    Major changes to the AutoML module are done in this version to improve its capabilities considerably:

    • Feature: 3rd Version for AutoML module.
    • Feature: ability to print epochs "i/n", and also the iter "j/m" of batches inside an epoch
    • Fix: ValidationSplitWrapper should use the same code as ValidationSplitter.
    • Fix: Cleanup all of the scipy distributions that are already available within the regular distribution module
    • Feature: Ctor of Hp Space should validate that each object is of type Distribution.
    • Feature: ExecutionContext could inherit from some steps mixins like a TruncableStep, and ExecutionContext Services could inherit from some mixins as well.
    • Feature: Typing in meta steps ! (wrappers' subtypes) - like C++ data structures' content templates
    • Feature: Improving Context to handle Assertions and Logging differently depending on the context.
    • Fix: If using savers in ParallelQueuedPipeline
    • Fix: Have the hyperparams setters (and space setters) of the TruncableSteps and MetaStepMixin throw errors when the name of the substep doesn't exist
    • Feature: DeprecatedMixin
    • Feature: Create _HasAssertsMixin to do things like self._assert(1 == 1, error_msg, context, level=None).
    • Feature: Ditch IDs in the data container, and ditch re-hashing mechanisms
    • Feature: Trainer class should be passed as an argument to AutoML
    • Feature: Add BaseStep.config, BaseStep.get_config() and BaseStep.set_config()
    • Fix: ForceHandlers things require overriding fit_transform
    • Fix: higher_score_is_better=False while it should be true
    • Fix: TPE LogNormal distributions
    Source code(tar.gz)
    Source code(zip)
  • 0.6.1(Oct 17, 2021)

    • Make Neuraxle work under Windows. It may sometimes error with file lock issues at deletions / teardowns, but that is a detail and things are mostly working.
    • Add python 3.9 support.
    • Removed support for pickle checkpoints and kept only joblib checkpoints to simplify maintenance.
    • Add vscode to gitignore.
    • Little debugging and refactor of streaming pipeline for python 3.9 support.
    • Usage of savers fixed in the threading and processes code after discovering the bug doing the refactor.
    • API breaking change to SequentialQueuedPipeline and ParallelQueuedFeatureUnion such that use_threading changed to use_processes, to be more clear.
    • Some changes to the logging and to objects' print behavior due to the changes in parallelism and python version update.
    • Moves the new_trial function to the base HyperparamsRepository class, thus removing duplicated code.
    • Improve the implementation of __str__ and __repr__ in BaseStep and in the TruncableSteps.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.0(Jun 29, 2021)

    • Added support for parallel trial execution
    • Added separate logger by trial
    • Change in setup behaviour
    • SKLearnWrapper supports Ensemble methods
    • Added Time Series Processing example
    • Updated / cleaned a few examples
    • And many more quality of life upgrades
    Source code(tar.gz)
    Source code(zip)
  • 0.5.7(Feb 25, 2021)

  • 0.5.5(Sep 15, 2020)

  • 0.5.4(Sep 11, 2020)

  • 0.5.3(Sep 10, 2020)

  • 0.5.2(Jul 20, 2020)

  • 0.5.1(Jul 20, 2020)

    • Fix warnings.warn for python 3.6 & python 3.7
    • Add with_context method to BaseStep to force the use of a given context in the pipeline
    • Use dependency injection by setting the execution context service locator: ExecutionContext().set_service_locator({ BaseType: instance })
    Source code(tar.gz)
    Source code(zip)
  • 0.5.0(Jul 10, 2020)

    • Added streaming pipelines for parallel processing.
    • Implemented neat .apply() logic in the steps to recursively apply any function if it exists.
    • Refactored the BaseStep so as to make it inherit from multiple smaller classes to separate the logic and allow for creating various steps such as the BaseTransformer that doesn't need any fitting: the BaseStep is thus a BaseTransformer with additional fitting behavior that is added by inheriting another mixin.
    • Some bugs were fixed.
    Source code(tar.gz)
    Source code(zip)