This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML)

Overview


This repository contains FEDOT - an open-source framework for automated modeling and machine learning (AutoML). It can build custom modeling pipelines for different real-world processes in an automated way using an evolutionary approach. FEDOT supports classification (binary and multiclass), regression, clustering, and time series prediction tasks.

The structure of the modeling pipeline that can be optimised by FEDOT

The main feature of the framework is the complex management of interactions between various blocks of pipelines. First of all, this includes the stage of machine learning model design. FEDOT allows you to not just choose the best type of the model, but to create a complex (composite) model. It allows you to combine several models of different complexity, which helps you to achieve better modeling quality than when using any of these models separately. Within the framework, we describe composite models in the form of a graph defining the connections between data preprocessing blocks and model blocks.
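The graph form of a composite model can be illustrated with a minimal sketch. It does not use the FEDOT API: the `Node` class and the toy operations (`scale`, `double`, `shift`, `average`) are hypothetical names introduced only for this example. One preprocessing block feeds two toy models, whose outputs are combined by an ensemble node.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class Node:
    """One block of a composite model: a preprocessing step or a model."""
    name: str
    # Receives the outputs of all parent nodes (or the raw input for a
    # primary node) and returns a new vector.
    operation: Callable[[Sequence[Vector]], Vector]
    parents: List["Node"] = field(default_factory=list)

    def run(self, raw_input: Vector) -> Vector:
        if not self.parents:
            # Primary node: works directly on the input data.
            return self.operation([raw_input])
        # Secondary node: works on the outputs of its parents.
        parent_outputs = [parent.run(raw_input) for parent in self.parents]
        return self.operation(parent_outputs)


def scale(inputs: Sequence[Vector]) -> Vector:
    """Toy preprocessing: divide every value by the largest absolute value."""
    values = inputs[0]
    largest = max(abs(v) for v in values) or 1.0
    return [v / largest for v in values]


def double(inputs: Sequence[Vector]) -> Vector:
    """Toy 'model' A: predicts twice its input."""
    return [2.0 * v for v in inputs[0]]


def shift(inputs: Sequence[Vector]) -> Vector:
    """Toy 'model' B: predicts its input plus one."""
    return [v + 1.0 for v in inputs[0]]


def average(inputs: Sequence[Vector]) -> Vector:
    """Toy ensemble: averages the parents' predictions element-wise."""
    return [sum(column) / len(column) for column in zip(*inputs)]


# scaling -> (model_a, model_b) -> ensemble
scaling = Node("scaling", scale)
model_a = Node("model_a", double, parents=[scaling])
model_b = Node("model_b", shift, parents=[scaling])
root = Node("ensemble", average, parents=[model_a, model_b])

prediction = root.run([1.0, 2.0, 4.0])
print(prediction)
```

In this sketch a shared parent is re-evaluated for every child; a real implementation would more likely cache node outputs.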

The framework is not limited to specific AutoML tasks (such as pre-processing of input data, feature selection, or optimization of model hyperparameters), but allows you to solve a more general structural learning problem - for a given data set, a solution is built in the form of a graph (DAG), the nodes of which are represented by ML models, pre-processing procedures, and data transformation.
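The idea of structural learning with an evolutionary approach can be sketched with a toy example. This is not FEDOT's optimizer: the operation names, the `toy_fitness` function (a stand-in for a validation metric), and the mutation scheme are assumptions made for illustration, and the candidates are simple linear chains rather than general DAGs.

```python
import random
from typing import Callable, List, Tuple

Pipeline = Tuple[str, ...]

# Hypothetical operation names used only for this illustration.
OPERATIONS = ["scaling", "pca", "ridge", "tree", "knn"]
MODELS = {"ridge", "tree", "knn"}


def toy_fitness(pipeline: Pipeline) -> float:
    """Stand-in for a validation metric: higher is better."""
    score = 0.0
    if pipeline[-1] in MODELS:
        score += 1.0  # the chain should end with a model
    if "scaling" in pipeline[:-1]:
        score += 0.5  # preprocessing before the final node is rewarded
    return score - 0.1 * len(pipeline)  # penalize structural complexity


def mutate(pipeline: Pipeline, rng: random.Random) -> Pipeline:
    """Return a structurally changed copy: add, remove, or replace one node."""
    nodes = list(pipeline)
    choice = rng.choice(["add", "remove", "replace"])
    if choice == "add" or len(nodes) == 1:
        # Single-node chains always grow, so a chain never becomes empty.
        nodes.insert(rng.randrange(len(nodes) + 1), rng.choice(OPERATIONS))
    elif choice == "remove":
        nodes.pop(rng.randrange(len(nodes)))
    else:
        nodes[rng.randrange(len(nodes))] = rng.choice(OPERATIONS)
    return tuple(nodes)


def evolve(fitness: Callable[[Pipeline], float], generations: int = 30,
           population_size: int = 10, seed: int = 0) -> Pipeline:
    """Run a small elitist evolutionary search and return the best chain."""
    rng = random.Random(seed)
    population: List[Pipeline] = [
        (rng.choice(OPERATIONS),) for _ in range(population_size)
    ]
    for _ in range(generations):
        offspring = [mutate(parent, rng) for parent in population]
        # Drop duplicates while keeping order, then keep the fittest candidates.
        candidates = list(dict.fromkeys(population + offspring))
        population = sorted(candidates, key=fitness, reverse=True)[:population_size]
    return population[0]


best = evolve(toy_fitness)
print(best, round(toy_fitness(best), 2))
```

Because parents compete with their offspring, the best fitness in the population never decreases from one generation to the next.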

The project is maintained by the research team of the Natural Systems Simulation Lab, which is a part of the National Center for Cognitive Research of ITMO University.

The intro video about Fedot is available here:

Introducing Fedot

FEDOT features

The main features of the framework are as follows:

  • The FEDOT architecture is highly flexible and therefore the framework can be used to automate the creation of mathematical models for various problems, types of data, and models;
  • FEDOT already supports popular ML libraries (scikit-learn, keras, statsmodels, etc.), but you can also integrate custom tools into the framework if necessary;
  • Pipeline optimization algorithms are not tied to specific data types or tasks, but you can use special templates for a specific task class or data type (time series forecasting, NLP, tabular data, etc.) to increase the efficiency;
  • The framework is not limited to machine learning: models related to specific areas (for example, ODE- or PDE-based models) can be embedded into pipelines;
  • Additional methods for hyperparameter tuning can be seamlessly integrated into FEDOT (in addition to those already supported);
  • The resulting pipelines can be exported in a human-readable JSON format, which allows you to achieve reproducibility of the experiments.

Thus, compared to other frameworks, FEDOT:

  • Is not limited to specific modeling tasks and aims at versatility and expandability;
  • Allows managing the complexity of models and thereby achieving better results;
  • Allows building models that use input data of various kinds (texts, images, tables, etc.) and consist of different types of models.
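The JSON export mentioned in the feature list above can be illustrated with a small round trip. The schema below (a `nodes` list with `id`, `operation`, `params`, and `parents` fields) is a hypothetical one chosen for this sketch; it is not FEDOT's actual export format.

```python
import json

# Hypothetical description of a small pipeline: nodes refer to parents by id.
pipeline = {
    "nodes": [
        {"id": 0, "operation": "scaling", "params": {}, "parents": []},
        {"id": 1, "operation": "ridge", "params": {"alpha": 1.0}, "parents": [0]},
        {"id": 2, "operation": "tree", "params": {"max_depth": 3}, "parents": [0]},
        {"id": 3, "operation": "linear", "params": {}, "parents": [1, 2]},
    ]
}


def export_pipeline(description: dict) -> str:
    """Serialize the pipeline description to human-readable JSON."""
    return json.dumps(description, indent=2, sort_keys=True)


def import_pipeline(text: str) -> dict:
    """Load a description and check that every parent reference resolves."""
    description = json.loads(text)
    known_ids = {node["id"] for node in description["nodes"]}
    for node in description["nodes"]:
        missing = set(node["parents"]) - known_ids
        if missing:
            raise ValueError(f"node {node['id']} has unknown parents: {sorted(missing)}")
    return description


exported = export_pipeline(pipeline)
restored = import_pipeline(exported)
print(exported)
```

Storing the structure and hyperparameters as plain text is what makes an experiment reproducible: the same description can be reloaded and refitted later.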

Installation

Common installation:

$ pip install fedot

In order to work with FEDOT source code:

$ git clone https://github.com/nccr-itmo/FEDOT.git
$ cd FEDOT
$ pip install -r requirements.txt
$ pytest -s test

How to use

FEDOT provides a high-level API that makes its capabilities easy to use. At the moment, the API supports classification and regression tasks only; support for time series forecasting and clustering will be implemented soon (you can still solve these tasks via advanced initialization, see below). Input data must be provided either as NumPy arrays or as CSV files.

To use the API, follow these steps:

  1. Import the Fedot class:

from fedot.api.main import Fedot

  2. Initialize the Fedot object and define the type of modeling problem. It provides a fit/predict interface:
  • fedot.fit runs the optimization and returns the resulting composite model;
  • fedot.predict returns the prediction for the given input data;
  • fedot.get_metrics estimates the quality of predictions using selected metrics.

NumPy arrays, pandas data frames, and file paths can be used as sources of input data.

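# train_data and test_data are assumed to be prepared beforehand;
# each exposes .features and .target
model = Fedot(problem='classification')

# Run the optimization and fit the resulting composite model
model.fit(features=train_data.features, target=train_data.target)
# Predict for unseen data
prediction = model.predict(features=test_data.features)

# Estimate the quality of the predictions
metrics = model.get_metrics()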

For more advanced approaches, please see the Examples & Tutorials section.
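The snippet above uses train_data and test_data without defining them. One possible way to prepare such objects is sketched below with the standard library only. The `Dataset` container, the `load_csv` and `holdout_split` helpers, and the inline CSV are assumptions made for this example and are not part of FEDOT; before passing the values to Fedot you would likely convert them to NumPy arrays.

```python
import csv
import io
from typing import List, NamedTuple, Tuple


class Dataset(NamedTuple):
    """Hypothetical container exposing .features and .target."""
    features: List[List[float]]
    target: List[str]


def load_csv(text: str, target_column: str) -> Dataset:
    """Parse CSV text into numeric features and a label column."""
    reader = csv.DictReader(io.StringIO(text))
    features, target = [], []
    for row in reader:
        target.append(row.pop(target_column))
        features.append([float(value) for value in row.values()])
    return Dataset(features, target)


def holdout_split(data: Dataset, test_fraction: float = 0.25) -> Tuple[Dataset, Dataset]:
    """Keep the last rows for testing.

    No shuffling or stratification is done here; real data usually needs both.
    """
    n_test = max(1, round(len(data.target) * test_fraction))
    split = len(data.target) - n_test
    train = Dataset(data.features[:split], data.target[:split])
    test = Dataset(data.features[split:], data.target[split:])
    return train, test


CSV_TEXT = """x1,x2,label
0.1,1.0,a
0.2,0.9,a
0.8,0.2,b
0.9,0.1,b
"""

data = load_csv(CSV_TEXT, target_column="label")
train_data, test_data = holdout_split(data)
print(len(train_data.features), len(test_data.features))
```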

Examples & Tutorials

Jupyter notebooks with tutorials are located in the examples repository. There you can find the following guides:

Notebooks are issued with the corresponding release versions (the default version is 'latest').

Also, external examples are available:

Extended examples:

Also, several video tutorials are available (in Russian).

Publications about FEDOT

We have also published several posts and news items devoted to different aspects of the framework:

In English:

In Russian:

  • General concepts of evolutionary design for composite pipelines - habr.com
  • Automated time series forecasting with FEDOT - habr.com
  • Details of FEDOT-based solution for Emergency DataHack - habr.com

Project structure

The latest stable release of FEDOT is on the master branch.

The repository includes the following directories:

  • The core package contains the main classes and scripts. It is the core of the FEDOT framework
  • The examples package includes several how-to use cases where you can start to discover how FEDOT works
  • All unit and integration tests can be found in the test directory
  • The sources of the documentation are in the docs directory

Also, you can check the benchmarking repository, which was developed to compare FEDOT against some well-known AutoML frameworks.

Current R&D and future plans

Currently, we are working on new features and trying to improve the performance and the user experience of FEDOT. The major ongoing tasks and plans:

  • Effective and ready-to-use pipeline templates for certain tasks and data types;
  • Integration with GPU via Rapids framework;
  • Alternative optimization methods of fixed-shaped pipelines;
  • Integration with MLFlow for import and export of the pipelines;
  • Improvement of high-level API.

Also, we are working on several research tasks related to AutoML time-series benchmarking and multi-modal modeling.

Any contribution is welcome. Our R&D team is open for cooperation with other scientific teams as well as with industrial partners.

Documentation

The general description is available in FEDOT.Docs repository.

Also, a detailed FEDOT API description is available in the Read the Docs.

Contribution Guide

  • The contribution guide is available in the repository.

Acknowledgments

We acknowledge the contributors for their important impact and the participants of the numerous scientific conferences and workshops for their valuable advice and suggestions.

Side projects

  • The prototype of web-GUI for FEDOT is available in FEDOT.WEB repository.

Contacts

Supported by

Citation

@article{nikitin2021automated,
  title = {Automated evolutionary approach for the design of composite machine learning pipelines},
  author = {Nikolay O. Nikitin and Pavel Vychuzhanin and Mikhail Sarafanov and Iana S. Polonskaia and Ilia Revin and Irina V. Barabanova and Gleb Maximov and Anna V. Kalyuzhnaya and Alexander Boukhanovsky},
  journal = {Future Generation Computer Systems},
  year = {2021},
  issn = {0167-739X},
  doi = {https://doi.org/10.1016/j.future.2021.08.022}
}
@inproceedings{polonskaia2021multi,
  title = {Multi-Objective Evolutionary Design of Composite Data-Driven Models},
  author = {Polonskaia, Iana S. and Nikitin, Nikolay O. and Revin, Ilia and Vychuzhanin, Pavel and Kalyuzhnaya, Anna V.},
  booktitle = {2021 IEEE Congress on Evolutionary Computation (CEC)},
  year = {2021},
  pages = {926-933},
  doi = {10.1109/CEC45853.2021.9504773}
}

Other papers - in ResearchGate.

Comments
  • Visualization of the operations used in pipelines


    Adds visualization of the operations used in the evolutionary process. Changes:

    • Added operation_kde plot to show operations by generations.
    • Added operation_animated_bar to show operations by generations with (or without) changing fitness.
    • Modified fitness_box plot. Now it visually fits the mentioned visuals and supports pct_best parameter.

    Examples of visuals:

    • KDE plot [image: kde_best_20]
    • Animated bar plot [image: _new_test_fitness]
    • The same with explicitly hidden fitness [image: _new_test]
    • Modified fitness box plot [image: fitness_box]

    opened by MorrisNein 14
  • Preprocessing refactor


    Changed the core architecture. Preprocessing operations now can be placed in separate nodes. Important changes:

    1. The Model abstraction is now replaced with an Operation that has two descendant classes: DataOperation and Model;
    2. Preprocessing operations, both simple (such as scaling) and advanced (such as lagged transformation for time series), can now be used in nodes just as models could before;
    3. New tuning (optimization of hyperparameters in nodes) is implemented. There are two classes, ChainTuner and SequentialTuner, both of which use the hyperopt library;
    4. A new mutation operator was implemented. It is now possible to change hyperparameters in the nodes during chain composing;
    5. It is now possible to feed different data sources to primary nodes. In particular, this functionality is used with exogenous time series; an example is available in examples;
    6. A new low-level abstraction, "Implementation", has appeared in the core. This abstraction includes custom models that are implemented in FEDOT;
    7. New operations have been implemented and added to the data_operations repository, such as feature selection, exclusion of anomalous values, smoothing of time series, and much more.
    opened by Dreamlone 13
  • + pipeline node operations cache support


    Example showing the meaning of using an operations cache (haven't tested with multiprocessing yet): examples/advanced/pipelines_caching.py

    If you have no time to wait for the results (up to 8 minutes by default), here is the result with example_number=1, timeout=1., n_partitions=10: [image]

    Usages of the cache were added to:

    • fedot/api/main.py
    • fedot/core/composer/gp_composer/gp_composer.py
    • fedot/core/validation/compose/metric_estimation.py
    • fedot/api/api_utils/api_composer.py

    Class responsible for the caching: fedot/core/composer/cache.py


    Other changes contain mostly readability/style/performance fixes plus full logger support.

    opened by IIaKyJIuH 12
  • Feature/pipeline explanations


    What's new:

    1. An inner repo fedot/explainability for the corresponding experiments.
    2. Explainer abstract class, implementing an interface.
    3. SurrogateExplainer class for building surrogate explanation models. The only supported surrogate at the moment is the decision tree (for both classification and regression tasks).
    4. explain method of the Fedot class for creating explanations -- instances of Explainer successors.
    enhancement cases test 
    opened by MorrisNein 12
  • NLP init


    • New method InputData.from_text(), to which you can pass meta_file.csv containing text or a path to directories with text files
    • New TextData class, where all the NLP utils are located. It is not intended to be used directly.

    The current idea is: text files -> feature extraction (produce tabular data, not text) -> pass to model/chain

    • [x] Finish BatchLoader for creation of meta_file.csv for collections of data (images, text)

    • [x] Finish the text files -> meta_file.csv

    • [x] Add tests

    • [x] add packed data && unpacking script

    opened by BarabanovaIrina 11
  • enabled `logging_level` option


    1. Fixed Log initialization uniqueness (in collaboration with @maypink)
    2. Got rid of the useless logging_level_opt option (@maypink)
    3. Enabled usage with multiprocessing: separate file writes, log level preserved
    4. Added show_progress option to tuner
    5. Made a pytest fixture that "resets" singletons before each test to preserve their pattern
    6. Minor fixes: caching, sorting, docstrings, logical fixes
    opened by IIaKyJIuH 10
  • 735-improving-fedot-documentation (structure)


    Partial improvement.

    1. Changed the structure of the documentation according to this document
    2. Added docstrings to classes/functions/props and improved existing ones
    3. Created copy_doc decorator to copy docstrings for logically identical functions
    4. Improved code blocks, typings, variable names
    opened by IIaKyJIuH 10
  • Encoding bug


    • add import/export for the "one_hot_encoding" operation
    • check whether categories in test data are contained in train data
    • fix issues with categorical expansion
    • fix testing time

    Right now the preprocessing pipeline is:

    • convert values to one type in columns
    • fill missing values
    • one hot encoding for categorical

    Closed issues: #400 #399 #412

    opened by MAGLeb 10
  • Sensitivity


    Approaches to the structural analysis of a composite model have been implemented. At the moment, a Node can be:

    • Simply deleted while keeping its subtree,
    • Tuned,
    • Replaced with nodes:
      • custom ones (pass a list of models directly)
      • random ones (pass the number of nodes to generate)
      • otherwise, all models available for the task will be applied.
    • Assessed for the sensitivity of model hyperparameters using Sobol indices

    This can be done:

    • via class NodeAnalysis, which can analyze one node with several approaches.
    • via class ChainStructureAnalysis, which can analyze several nodes with several approaches

    fedot.utilities.define_metric_by_task contains MetricByTask. If no metric is specified, the default metric for the Task defined inside InputData is used.

    Class diagram: [image: Screenshot 2021-01-28 at 18 39 20]

    Mini-tutorial: https://fedot.readthedocs.io/en/latest/fedot/features/sensitivity_analysis.html

    UPD: as part of this PR, the collection of code coverage information was disabled in the manual_build action.

    opened by BarabanovaIrina 10
  • Remote evaluation feature implemented for pipelines in composer


    What's new:

    'Remote' module:

    1. run_pipeline.py script is added. It receives the pipeline and data description as input and returns the fitted pipeline. This script is intended to be called on the remote node
    2. ComputationalSetup class added to implement the remote fit of pipelines
    3. Batch evaluation of the pipelines added to the composer if ComputationalSetup is initialized.
    4. Logic for REST requests to a remote server with computational resources.

    Other:

    1. Indices corrected for Data. Now, the indices are preserved during transformations and are not created from scratch.
    2. New notebook added for industrial case
    3. Minor fixes in forecasting

    Known disadvantages:

    1. For now, the ComputationalSetup is DataMall-specific. A refactoring to support different strategies is expected.
    2. The client should be moved to an external repository as exported from PyPI.
    enhancement in progress 
    opened by nicl-nno 9
  • Very strict dependency requirements


    I'm very excited to try out this package, but its strict dependency requirements are giving me headaches when setting up my conda environment.

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    fedot 0.3.1 requires scikit-learn==0.24.1, but you have scikit-learn 0.24.2 which is incompatible.
    fedot 0.3.1 requires scikit-optimize==0.7.4, but you have scikit-optimize 0.9.dev0 which is incompatible.
    fedot 0.3.1 requires xgboost==1.0.1, but you have xgboost 1.4.2 which is incompatible.
    

    Is it possible to relax the requirements?

    dependencies 
    opened by bacalfa 9
  • Metric evaluation error: y_true and y_pred contain different number of classes


    The error "Metric evaluation error: y_true and y_pred contain different number of classes 45, 85" is raised for the click_prediction_small dataset.

    Looks like class stratification is failing.

    bug 
    opened by nicl-nno 2
  • Caching performance is worse than the one from the earlier versions


    As of FEDOT v0.6.0, the performance of the cache (both for operations and for preprocessors) has regressed significantly compared with previous versions. This can be seen from the quality metrics on classification datasets: they are better when the cache is turned off. [image]

    It is important to speed up the cache so that using it is worthwhile (it is enabled by default).

    opened by IIaKyJIuH 1
  • Bug with filter operation


    Colab: https://colab.research.google.com/drive/1MRgwB9sCrFeXRme0eGSZ0KeBRAMlnfOk?usp=sharing
    Data: https://drive.google.com/file/d/1Msx8sUVm6jh6L-QH20aJrw47DvvKk-HM/view?usp=sharing

    Exception in run_regression_example(visualise)
         22 **composer_params)
         23
    ---> 24 auto_model.fit(features=train, target='target')
         25 prediction = auto_model.predict(features=test)
         26 if visualise:

    /usr/local/lib/python3.8/dist-packages/fedot/api/main.py in fit(self, features, target, predefined_model)
        181 # Final fit for obtained pipeline on full dataset
        182 if self.history and not self.history.is_empty() or not self.current_pipeline.is_fitted:
    --> 183 self._train_pipeline_on_full_dataset(recommendations, full_train_not_preprocessed)
        184 self.params.api_params['logger'].message('Final pipeline was fitted')
        185 else:

    /usr/local/lib/python3.8/dist-packages/fedot/api/main.py in _train_pipeline_on_full_dataset(self, recommendations, full_train_not_preprocessed)
        456 {k: v for k, v in recommendations.items()
        457 if k != 'cut'})
    --> 458 self.current_pipeline.fit(
        459 full_train_not_preprocessed,
        460 n_jobs=self.params.api_params['n_jobs'],

    /usr/local/lib/python3.8/dist-packages/fedot/core/pipelines/pipeline.py in fit(self, input_data, time_constraint, n_jobs)
        139
        140 if time_constraint is None:
    --> 141 train_predicted = self._fit(input_data=copied_input_data)
        142 else:
        143 train_predicted = self._fit_with_time_limit(input_data=copied_input_data, time=time_constraint)

    /usr/local/lib/python3.8/dist-packages/fedot/core/pipelines/pipeline.py in _fit(self, input_data, process_state_dict, fitted_operations)
        102 with Timer() as t:
        103 computation_time_update = not self.root_node.fitted_operation or self.computation_time is None
    --> 104 train_predicted = self.root_node.fit(input_data=input_data)
        105 if computation_time_update:
        106 self.computation_time = round(t.minutes_from_start, 3)

    /usr/local/lib/python3.8/dist-packages/fedot/core/pipelines/node.py in fit(self, input_data, **kwargs)
        389 self.log.debug(f'Trying to fit secondary node with operation: {self.operation}')
        390
    --> 391 secondary_input = self._input_from_parents(input_data=input_data, parent_operation='fit')
        392
        393 return super().fit(input_data=secondary_input)

    /usr/local/lib/python3.8/dist-packages/fedot/core/pipelines/node.py in _input_from_parents(self, input_data, parent_operation)
        429 parent_nodes = self._nodes_from_with_fixed_order()
        430
    --> 431 parent_results, _ = _combine_parents(parent_nodes, input_data,
        432 parent_operation)
        433

    /usr/local/lib/python3.8/dist-packages/fedot/core/pipelines/node.py in _combine_parents(parent_nodes, input_data, parent_operation)
        471 parent_results.append(prediction)
        472 elif parent_operation == 'fit':
    --> 473 prediction = parent.fit(input_data=input_data)
        474 parent_results.append(prediction)
        475 else:

    /usr/local/lib/python3.8/dist-packages/fedot/core/pipelines/node.py in fit(self, input_data, **kwargs)
        301 else:
        302 self.node_data = input_data
    --> 303 return super().fit(input_data)
        304
        305 def unfit(self):

    /usr/local/lib/python3.8/dist-packages/fedot/core/pipelines/node.py in fit(self, input_data)
        179 self.fit_time_in_seconds = round(t.seconds_from_start, 3)
        180 else:
    --> 181 operation_predict = self.operation.predict_for_fit(fitted_operation=self.fitted_operation,
        182 data=input_data,
        183 params=self._parameters)

    /usr/local/lib/python3.8/dist-packages/fedot/core/operations/operation.py in predict_for_fit(self, fitted_operation, data, params, output_mode)
        112 for example, is the operation predict probabilities or class labels
        113 """
    --> 114 return self._predict(fitted_operation, data, params, output_mode, is_fit_stage=True)
        115
        116 def _predict(self, fitted_operation, data: InputData, params: Optional[OperationParameters] = None,

    /usr/local/lib/python3.8/dist-packages/fedot/core/operations/operation.py in _predict(self, fitted_operation, data, params, output_mode, is_fit_stage)
        122
        123 if is_fit_stage:
    --> 124 prediction = self._eval_strategy.predict_for_fit(
        125 trained_operation=fitted_operation,
        126 predict_data=data)

    /usr/local/lib/python3.8/dist-packages/fedot/core/operations/evaluation/regression.py in predict_for_fit(self, trained_operation, predict_data)
         84 :return:
         85 """
    ---> 86 prediction = trained_operation.transform_for_fit(predict_data)
         87 converted = self._convert_to_output(prediction, predict_data)
         88 return converted

    /usr/local/lib/python3.8/dist-packages/fedot/core/operations/evaluation/operation_implementations/data_operations/sklearn_filters.py in transform_for_fit(self, input_data)
         59 mask = self.operation.inlier_mask_
         60 if mask is not None:
    ---> 61 input_data = update_data(input_data, mask)
         62 else:
         63 self.log.info("Filtering Algorithm: didn't fit correctly. Return all objects")

    /usr/local/lib/python3.8/dist-packages/fedot/core/operations/evaluation/operation_implementations/data_operations/sklearn_filters.py in update_data(input_data, mask)
        231 old_idx = modified_input_data.idx
        232
    --> 233 modified_input_data.features = old_features[mask]
        234 modified_input_data.target = old_target[mask]
        235 modified_input_data.idx = np.array(old_idx)[mask]

    IndexError: boolean index did not match indexed array along dimension 0; dimension is 68 but corresponding boolean dimension is 55

    bug 
    opened by valer1435 0
  • Proposed method of installation on MAC M1 and GPU utilisation


    What is the proposed method of installation on Apple silicon (M1) Macs?

    After a lot of trials, my installation worked with the following set of commands in an x86 architecture conda env.

    In a new Python env:

    conda config --env --set subdir osx-64
    conda install python=3.8.13
    pip install fedot
    brew install libomp
    conda install -c conda-forge lightgbm

    Also, I can't seem to get any GPU activity. Is there any config script I can include in code for M1?

    opened by gdamianakos 1
Releases(v0.6.1)
  • v0.6.1(Dec 12, 2022)

    Hi, folks! We're making a new minor release with a number of improvements. This is an important release in the sense that it is the last release of self-contained FEDOT. The next major release will mark the separation of the optimizer core into a separate project.

    New features, better quality & changes in API

    • More intuitive predict interface for time series forecasting (#930)
    • Pipeline save/load now has more intuitive behavior (#971)
    • Early stopping criteria can now take the timeout into consideration, and not only the number of iterations (early_stopping_timeout API parameter)
    • Graph nodes now can be accessed by name or uid (#982)
    • Tuner speed is better due to better initial params in the search space (#985)

    Enhancements and fixes:

    • Fix inplace modification of data during data definition (resolves #943)
    • Fix regression preprocessing (#955)
    • Fewer evaluation errors during population selection in corner cases (#956)
    • Fix getting suitable operations for multi ts (#981)
    • Integration tests are fixed & passing now
    • More minor fixes & minor class interface refactorings
    • Important fix for multi-objective optimization (#996)

    Documentation is extended

    Architectural refactorings are continued:

    • Better PipelineAdapter (#941)
    • Abstracting the optimiser core (most tasks in issue #713 are done). Notably, the Serializer subsystem is now extendable (#969)
  • v0.6.0(Oct 18, 2022)

    Hi everyone! We released a new major version of FEDOT - 0.6.0

    It includes a lot of major changes:

    • Improvement of API for multi-modal datasets and models;
    • New PipelineBuilder (#597) – that simplifies manual construction of ML Pipelines;
    • Joblib was embedded as a multiprocessing backend (#843). Data exchange between processes minimized (#926);
    • Stratified k-fold strategy embedded for cases with imbalanced data;
    • New visualization of graphs, pipelines and optimisation history;

    Also, this release contains a lot of architectural refactorings of the framework:

    • New Graph Adapter subsystem (#876);
    • Merging two different implementations of the evolutionary optimizer (parameter-free & usual) into one EvoGraphOptimizer (#687)
    • Architectural refactorings of the Graph hierarchy (#750)
    • Introduce notions of Objective & Fitness (#654) – classes that substitute simple float metric values & abstract single vs. multi-objective metrics
    • Refactored parameter classes – for more intuitive segregation of different parameters controlling optimization process (#852)
    • Refactored DataMerger facility
    • Refactoring of selection operator implementation (#918)

    Also, there are various bug-fixes related to ML operations, evolutionary operators & internal Graph operations.

  • v0.5.1(Feb 22, 2022)

    The most important changes:

    • Cache support for the cross-validation implemented;
    • AutoML can be run without a time limit;
    • Graph operators improved;
    • Multi-task pipelines processing improved;
    • Custom parameters support for external optimizer;
    • Time series processing improved;
    • Multimodal table processing improved;
    • Lightweight docker prepared;
    • Isolation Forest added as new operation
    • Major and minor bugs are fixed.
  • v0.5.0(Dec 31, 2021)

    Hi everyone!

    We released a new major version of FEDOT - 0.5.0. It includes several major changes.

    The new version is available and can be imported via pip: pip install fedot==0.5.0

    The most important changes:

    • Preprocessing for tabular features and target variables improved dramatically.
    • API is refactored and improved (presets, parameters, etc). The postfix "_tun" has been removed from the presets, so now you have to specify composer_params={'with_tuning': True} to set tuning. Important changes in preset names: light - best_quality, ultra_light - fast_train.
    • Support for external optimisers is implemented.
    • Zero-code console interface is implemented.
    • Surrogate decision trees for pipeline interpretation are added.
    • Custom model support is implemented.
    • Better presets and models for time series forecasting (derivatives, polynomial models, cuts, etc)
    • Better integration with FEDOT.Web
    • Prototype for remote infrastructure support
    • Evolutionary optimiser improved (stopping criterion, progress bar, better mutations, etc)
    • Major and minor bugs are fixed.
  • v0.4.1(Oct 8, 2021)

    We released a new major version of FEDOT - 0.4.1. It includes several large changes, features, and fixes.

    The new version is available and can be imported via pip: pip install fedot==0.4.1

    The most important changes:

    • Major bugs fixed for evolutionary composing: we got rid of many annoying problems related to fitness evaluation and mutations
    • Multi-variate time series forecasting improved
    • Torch-based LSTM model added
    • Encoding stage for categorical features implemented
    • Docker containers updated, GPU example improved
    • Export of pipelines improved
    • Processing of hyperparameters improved
    • API refactored
  • v0.4.0(Aug 18, 2021)

    We released a new major version of FEDOT - 0.4.0. It includes several large changes, features, and fixes.

    The new version is available and can be imported via pip: pip install fedot==0.4.0

    The most important changes:

    Infrastructure:

    • Docker version added;
    • GPU support added;
    • Requirements became more flexible

    Optimizer:

    • Evolutionary optimizer generalized to allow the application to the custom non-ML tasks
    • Mutation schemes improved for a more explainable evolution process;
    • History saving extended

    Time series:

    • Cross-validation for the time series implemented;
    • Sparse lagged transformation for time series implemented to improve performance;

    Common:

    • API updated and simplified;
    • Processing of categorical features improved;
    • Fixes and improvements for the hyperparameters tuning;
    • The ‘chain’ term is replaced with ‘pipeline’ for better understandability.

    Utilities:

    • Sensitivity analysis improved
  • v0.3.1(May 30, 2021)

    During the last month, we have merged several major features and fixed a bunch of bugs. Some of them are experimental and should be tested extensively in real-world cases. But we have tried our best and covered it with unit tests.

    The new version (fedot == 0.3.1) is available and can be imported via pip.

    The most important features:

    • ML pipelines for multi-modal datasets
    • Decompose operation in ML pipelines
    • Cross-validation in Composer
    • Add Memory and Time profilers
    • Memory consumption improving

    For details, see the post in repository: https://github.com/nccr-itmo/FEDOT/discussions/317

  • 0.3.0(May 10, 2021)

    Hello everyone!

    Our team has finally finished preparing a new major release, fedot == 0.3.0. Thanks to the whole dev team who worked on it! It is available and can be installed via pip: https://pypi.org/project/fedot/0.3.0/. The most important changes:

    • Extended data operations and their automatic optimization

    Previously, Fedot (Chain objects) allowed one to automatically build ML pipelines including models, but data operations (like scaling or gap-filling) were embedded in the nodes and could be changed only manually. In the latest release we significantly refactored the core logic of the framework, so data operations are now fully supported as separate nodes. This extends the overall search space of suitable ML pipelines.

    • New AutoML for time-series forecasting

    Now Fedot supports not the only manual building of ML pipelines for time-series forecasting but also in an automated mode via Composer! Fedot allow one to build pipelines and forecast time-series for a given window size and forecasting length. Also, it is possible to use exogen variables for forecasting. To check all features, see examplesin the repository.

    Our early studies showed that this is a promising approach that can advance the AutoML field for time series. We are actively benchmarking well-known SOTA frameworks for time-series forecasting, and new results will be published in the near future. You can also check our recent preprint on gap filling in time series using the Fedot framework.
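
    The roles of the window size and the forecast length can be sketched with a generic lagged transformation, a common way to turn a series into a supervised learning table. The function name and its parameters are assumptions made for illustration; the sketch does not reproduce how Fedot does this internally.

```python
# Hypothetical helper, not part of Fedot: a generic lagged-window transform.

def make_lagged_dataset(series, window_size, forecast_length):
    """Split a series into (features, targets) pairs.

    Each feature row holds `window_size` consecutive values, and the
    matching target row holds the next `forecast_length` values.
    """
    n_samples = len(series) - window_size - forecast_length + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    features = [list(series[i:i + window_size]) for i in range(n_samples)]
    targets = [
        list(series[i + window_size:i + window_size + forecast_length])
        for i in range(n_samples)
    ]
    return features, targets


series = list(range(10))  # 0, 1, ..., 9
features, targets = make_lagged_dataset(series, window_size=4, forecast_length=2)
# The first row uses values 0..3 as features and values 4..5 as targets.
```

    Any regression model can then be trained on such a table, which is why the window size directly affects the pipeline's quality.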

    • Black-box optimization of ML pipeline hyperparameters

    During the experiments, we found that our previous hyperparameter tuning was ineffective (it also did not work for preprocessing nodes). Therefore, we significantly refactored the tuning module, which now provides several schemes for black-box optimization of ML pipeline hyperparameters. For details, check the tuning module sources and the examples.
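
    Black-box optimization treats the pipeline as a function that maps a set of hyperparameters to a score, without using its internals. The sketch below uses plain random search as the simplest example of this idea; it is not one of the schemes from Fedot's tuning module, and the objective, parameter names, and ranges are all made up for illustration.

```python
# Hypothetical illustration of black-box tuning; not Fedot's tuner.
import random


def random_search(objective, search_space, n_trials=50, seed=0):
    """Sample parameters uniformly and keep the set with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        score = objective(params)  # the only information the optimizer sees
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Stand-in for a pipeline validation error (lower is better)."""
    return (params["alpha"] - 0.3) ** 2 + (params["scale"] - 2.0) ** 2


# One hypothetical parameter per node: a model's and a preprocessing node's.
space = {"alpha": (0.0, 1.0), "scale": (0.0, 5.0)}
best_params, best_score = random_search(toy_objective, space, n_trials=200, seed=42)
```

    Since the optimizer only calls the objective, parameters of preprocessing nodes and model nodes can be tuned in the same search space.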

    • Multi-Objective AutoML for pipelines

    Several months ago, during a team discussion, we formulated a hypothesis: "Most AutoML frameworks try to maximize only one metric, prediction quality. But can we optimize several metrics (pipeline complexity, for instance) simultaneously?" So we conducted research in which evolutionary multi-objective optimization algorithms (such as NSGA-II and SPEA-2) were adapted to the AutoML task. We concluded that it is a promising feature and integrated it into Fedot. The preprint is available, and you can also check the example of how to use multi-objective optimization via the Fedot API.
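
    Algorithms such as NSGA-II and SPEA-2 are built on Pareto dominance: a pipeline is kept if no other pipeline is at least as good on every objective and strictly better on one. The sketch below shows only this dominance filter, not the full evolutionary algorithms, and the candidate names and numbers are invented for illustration.

```python
# Hypothetical illustration of Pareto dominance; candidates are made up.

def dominates(a, b):
    """Return True if objectives `a` dominate `b` (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        candidate
        for candidate in candidates
        if not any(
            dominates(other["objectives"], candidate["objectives"])
            for other in candidates
        )
    ]


# Objectives: (prediction error, number of nodes in the pipeline).
candidates = [
    {"name": "single_model", "objectives": (0.20, 1)},
    {"name": "two_level", "objectives": (0.15, 3)},
    {"name": "deep_ensemble", "objectives": (0.14, 7)},
    {"name": "bloated", "objectives": (0.18, 6)},
]

front = pareto_front(candidates)
front_names = [candidate["name"] for candidate in front]
```

    Here "bloated" is dropped because "two_level" has both a lower error and fewer nodes, while the other three represent different trade-offs between quality and complexity.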

    • New input data support for image classification

    Earlier, we announced that images would be supported in Fedot. We have made several changes to InputData, and pipelines for image classification can now be built manually. We also added several CNN architectures and an example of their usage. Composer should also work for image classification, but we have not yet tested this functionality extensively.

    Also, we have fixed a bunch of bugs and improved the Fedot API.

    Thanks to everyone who is following our progress! Any issues and user reports are welcome. Cya!

  • v0.2.1(Mar 12, 2021)

    Greetings to everyone who follows our team and FEDOT development progress!

    Today, we released a new version of fedot == 0.2.1.

    Here is the list of the main changes:

    • The main API is updated. The basic how-to-use guide is available at https://github.com/nccr-itmo/FEDOT/blob/master/notebooks/intro_to_automl.ipynb
    • Support of the pandas dataframes is added
    • Logging is improved
    • The sensitivity analysis of the composite model (chain) structure is added. The description is available in https://fedot.readthedocs.io/en/latest/fedot/features/sensitivity_analysis.html

    The new version can be installed with pip install fedot==0.2.1

    Our team is very interested in any user feedback, and new issues are extremely welcome! Thank you!

  • v0.2.0(Jan 21, 2021)

    Greetings to everyone who follows our team and FEDOT development progress!

    Last week, we released a new version, fedot == 0.2.0. A bunch of bugs in the framework were fixed and merged into the master (main) and release branches. Here is the list of the main changes:

    • NLP tasks are now supported; a simple example of text classification was added (see here)
    • The first version of the fedot high-level API was implemented; see the readme for instructions
    • Fixed several bugs with chain import/export
    • Composer should now work correctly for time-series tasks
    • Embedded visualization of composing and the resulting chains was improved; see the example here
Owner
National Center for Cognitive Research of ITMO University