Automated Machine Learning with scikit-learn

Overview

auto-sklearn

auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

Find the documentation at https://automl.github.io/auto-sklearn/

Automated Machine Learning in four lines of code

import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)
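
The snippet above assumes X_train, y_train, and X_test already exist. A self-contained variant (a sketch using a small scikit-learn dataset; the time limits are illustrative, not recommendations):

import sklearn.datasets
import sklearn.model_selection

import autosklearn.classification

# Any scikit-learn-style data split works; breast_cancer keeps the run small.
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,  # total search budget in seconds
    per_run_time_limit=30,        # budget per candidate pipeline
)
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)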

Relevant publications

Efficient and Robust Automated Machine Learning
Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter
Advances in Neural Information Processing Systems 28 (2015)
http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf

Auto-Sklearn 2.0: The Next Generation
Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer and Frank Hutter
arXiv:2007.04074 [cs.LG], 2020 https://arxiv.org/abs/2007.04074

Comments
  • Can't install auto-sklearn due to pyrfr dependency

    Hi all, I'm trying to benchmark auto-sklearn against the same data sets as TPOT on my local cluster (running Linux), but I can't get pyrfr to install when I pip install auto-sklearn.

    Is there an earlier version of auto-sklearn that I can use to perform this benchmark?

    opened by rhiever 39
  • Changes show_models() function to return a dictionary of models in ensemble

    Summary

    Currently the show_models() function returns a str that has to be manually parsed, with no way to access the models. I have changed it so that it returns a dictionary of the models in the ensemble and their information. This helps fix issue #1298 and the issues mentioned inside that thread.

    What's changed

    Calling show_models() will return a dictionary keyed by model_id. Each entry is a model dictionary which contains the following:

    • model_id
    • rank
    • ensemble_weight
    • data_preprocessor
    • balancing
    • feature_preprocessor
    • regressor or classifier (the autosklearn-wrapped model)
    • sklearn_model

    Example

    import sklearn.datasets
    import sklearn.metrics
    import sklearn.model_selection
    
    import autosklearn.regression
    
    X, y = sklearn.datasets.load_diabetes(return_X_y=True)
    
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)
    
    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        tmp_folder='/tmp/autosklearn_regression_example_tmp',
    )
    automl.fit(X_train, y_train, dataset_name='diabetes')
    
    ensemble_dict = automl.show_models()
    print(ensemble_dict)
    

    Output:

    {
    25: {'model_id': 25.0, 'rank': 1.0, 'ensemble_weight': 0.46, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7ff2c06588d0>, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7ff2c057bd50>, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7ff2c057ba90>, 'sklearn_model': SGDRegressor(alpha=0.0006517033225329654, epsilon=0.012150149892783745,
                 eta0=0.016444224834275295, l1_ratio=1.7462342366289323e-09,
                 loss='epsilon_insensitive', max_iter=16, penalty='elasticnet',
                 power_t=0.21521743568582094, random_state=1,
                 tol=0.002431731981071206, warm_start=True)}, 
    6: {'model_id': 6.0, 'rank': 2.0, 'ensemble_weight': 0.32, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7ff2c05b3f50>, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7ff2c065c990>, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7ff2c057ba10>, 'sklearn_model': ARDRegression(alpha_1=0.0003701926442639788, alpha_2=2.2118001735899097e-07,
                  copy_X=False, lambda_1=1.2037591637980971e-06,
                  lambda_2=4.358378124977852e-09,
                  threshold_lambda=1136.5286041327277, tol=0.021944240404849075)}, ....
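
    With the dictionary form, entries can be accessed directly instead of parsing a string. A small usage sketch, assuming the structure shown above:

    # Hypothetical usage of the dictionary returned by show_models().
    best_id = min(ensemble_dict, key=lambda k: ensemble_dict[k]['rank'])
    entry = ensemble_dict[best_id]
    print(entry['ensemble_weight'])  # weight of this model in the ensemble
    print(entry['sklearn_model'])    # the fitted scikit-learn estimator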
    
    opened by UserFindingSelf 31
  • Segmentation fault

    When trying the sample code:

    import autosklearn.classification
    import sklearn.model_selection
    import sklearn.datasets
    import sklearn.metrics
    X, y = sklearn.datasets.load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)
    automl = autosklearn.classification.AutoSklearnClassifier()
    automl.fit(X_train, y_train)
    y_hat = automl.predict(X_test)
    print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))
    

    Something goes wrong at the automl.fit line: I get "Segmentation fault" and the Python console exits.

    >automl.fit(X_train, y_train)
    /home/work/.pyenv/versions/py3_env/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py:197: RuntimeWarning: Mean of empty slice
      Y_train_pred = np.nanmean(Y_train_pred_full, axis=0)
    [WARNING] [2019-05-29 16:26:32,672:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!
    [WARNING] [2019-05-29 16:26:32,680:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!
    Segmentation fault
    (py3_env) [work@*** ~]$ [WARNING] [2019-05-29 16:26:34,685:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!
    [WARNING] [2019-05-29 16:26:36,689:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!
    

    Has anyone come across this problem?

    opened by ChangbingChen 28
  • Error during installation of pyrfr

    I got the following error when installing pyrfr. What is the installation directory here? My current directory?

    
    (ml_env) ubuntu@ip-10-0-0-10:~/random_forest_run/build$ curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip install
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   230  100   230    0     0   2334      0 --:--:-- --:--:-- --:--:--  4893
    Requirement already satisfied: unittest2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: six>=1.4 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from unittest2)
    Requirement already satisfied: argparse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from unittest2)
    Requirement already satisfied: traceback2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from unittest2)
    Requirement already satisfied: linecache2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from traceback2->unittest2)
    Requirement already satisfied: setuptools in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg
    Requirement already satisfied: nose in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: six in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: Cython in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: numpy<1.12,>=1.9.0 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: scipy>=0.14.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: numpy>=1.8.2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from scipy>=0.14.1)
    Requirement already satisfied: scikit-learn==0.17.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: lockfile in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: joblib in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: psutil in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: pyyaml in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: ConfigArgParse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: liac-arff in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: pandas in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: python-dateutil>=2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pandas)
    Requirement already satisfied: pytz>=2011k in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pandas)
    Requirement already satisfied: numpy>=1.7.0 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pandas)
    Requirement already satisfied: six>=1.5 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from python-dateutil>=2->pandas)
    Requirement already satisfied: xgboost==0.4a30 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: scipy in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from xgboost==0.4a30)
    Requirement already satisfied: numpy in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from xgboost==0.4a30)
    Requirement already satisfied: ConfigSpace<0.4,>=0.3.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: typing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
    Requirement already satisfied: numpy in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
    Requirement already satisfied: argparse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
    Requirement already satisfied: pyparsing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
    Requirement already satisfied: pynisher>=0.4 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: docutils>=0.3 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pynisher>=0.4)
    Requirement already satisfied: psutil in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pynisher>=0.4)
    Requirement already satisfied: setuptools in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg (from pynisher>=0.4)
    
    Collecting pyrfr
      Using cached pyrfr-0.4.0.tar.gz
    Building wheels for collected packages: pyrfr
      Running setup.py bdist_wheel for pyrfr ... error
      Complete output from command /home/ubuntu/anaconda3/envs/ml_env/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-axcsozeu/pyrfr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpmdz8gclmpip-wheel- --python-tag cp36:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.6
      creating build/lib.linux-x86_64-3.6/pyrfr
      copying pyrfr/__init__.py -> build/lib.linux-x86_64-3.6/pyrfr
      running build_ext
      building '_regression' extension
      swigging pyrfr/regression.i to pyrfr/regression_wrap.cpp
      swig -python -c++ -I${CMAKE_SOURCE_DIR}/include -I./include -o pyrfr/regression_wrap.cpp pyrfr/regression.i
      creating build/temp.linux-x86_64-3.6
      creating build/temp.linux-x86_64-3.6/pyrfr
      gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I${CMAKE_SOURCE_DIR}/include -I./include -I/home/ubuntu/anaconda3/envs/ml_env/include/python3.6m -c pyrfr/regression_wrap.cpp -o build/temp.linux-x86_64-3.6/pyrfr/regression_wrap.o -O2 -std=c++11
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
      g++ -pthread -shared -L/home/ubuntu/anaconda3/envs/ml_env/lib -Wl,-rpath=/home/ubuntu/anaconda3/envs/ml_env/lib,--no-as-needed build/temp.linux-x86_64-3.6/pyrfr/regression_wrap.o -L/home/ubuntu/anaconda3/envs/ml_env/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/_regression.cpython-36m-x86_64-linux-gnu.so
      building '_util' extension
      swigging pyrfr/util.i to pyrfr/util_wrap.cpp
      swig -python -c++ -I${CMAKE_SOURCE_DIR}/include -I./include -o pyrfr/util_wrap.cpp pyrfr/util.i
      gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I${CMAKE_SOURCE_DIR}/include -I./include -I/home/ubuntu/anaconda3/envs/ml_env/include/python3.6m -c pyrfr/util_wrap.cpp -o build/temp.linux-x86_64-3.6/pyrfr/util_wrap.o -O2 -std=c++11
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
      g++ -pthread -shared -L/home/ubuntu/anaconda3/envs/ml_env/lib -Wl,-rpath=/home/ubuntu/anaconda3/envs/ml_env/lib,--no-as-needed build/temp.linux-x86_64-3.6/pyrfr/util_wrap.o -L/home/ubuntu/anaconda3/envs/ml_env/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/_util.cpython-36m-x86_64-linux-gnu.so
      installing to build/bdist.linux-x86_64/wheel
      running install
      running install_lib
      creating build/bdist.linux-x86_64
      creating build/bdist.linux-x86_64/wheel
      copying build/lib.linux-x86_64-3.6/_regression.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
      creating build/bdist.linux-x86_64/wheel/pyrfr
      copying build/lib.linux-x86_64-3.6/pyrfr/__init__.py -> build/bdist.linux-x86_64/wheel/pyrfr
      copying build/lib.linux-x86_64-3.6/_util.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
      running install_egg_info
      running egg_info
      writing pyrfr.egg-info/PKG-INFO
      writing dependency_links to pyrfr.egg-info/dependency_links.txt
      writing top-level names to pyrfr.egg-info/top_level.txt
      warning: manifest_maker: standard file '-c' not found
      
      reading manifest file 'pyrfr.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no files found matching '*.pxd' under directory 'pyrfr'
      warning: no files found matching '*.pyx' under directory 'pyrfr'
      warning: manifest_maker: MANIFEST.in, line 11: unknown action 'CMakeList.txt'
      
      writing manifest file 'pyrfr.egg-info/SOURCES.txt'
      Copying pyrfr.egg-info to build/bdist.linux-x86_64/wheel/pyrfr-0.4.0-py3.6.egg-info
      running install_scripts
      Checking .pth file support in build/bdist.linux-x86_64/wheel/
      /home/ubuntu/anaconda3/envs/ml_env/bin/python -E -c pass
      TEST FAILED: build/bdist.linux-x86_64/wheel/ does NOT support .pth files
      error: bad install directory or PYTHONPATH
      creating build/bdist.linux-x86_64/wheel/pyrfr
      copying build/lib.linux-x86_64-3.6/pyrfr/__init__.py -> build/bdist.linux-x86_64/wheel/pyrfr
      copying build/lib.linux-x86_64-3.6/_util.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
      running install_egg_info
      running egg_info
      writing pyrfr.egg-info/PKG-INFO
      writing dependency_links to pyrfr.egg-info/dependency_links.txt
      writing top-level names to pyrfr.egg-info/top_level.txt
      warning: manifest_maker: standard file '-c' not found 
    
      reading manifest file 'pyrfr.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
    
    
          You are attempting to install a package to a directory that is not
          on PYTHONPATH and which Python does not read ".pth" files from.  The
          installation directory you specified (via --install-dir, --prefix, or
          the distutils default setting) was:
      
          build/bdist.linux-x86_64/wheel/
    
    
      and your PYTHONPATH environment variable currently contains:
      
          ''
      
      Here are some of your options for correcting the problem:
      
      * You can choose a different installation directory, i.e., one that is
        on PYTHONPATH or supports .pth files
      
      * You can add the installation directory to the PYTHONPATH environment
        variable.  (It must then also be on PYTHONPATH whenever you run
        Python and want to use the package(s) you are installing.)
      
      * You can set up the installation directory to support ".pth" files by
        using one of the approaches described here:
      
        https://setuptools.readthedocs.io/en/latest/easy_install.html#custom-installation-locations
      
      
      Please make the appropriate changes for your system and try again.
      
      ----------------------------------------
      Failed building wheel for pyrfr
      Running setup.py clean for pyrfr
    
    Failed to build pyrfr
    Installing collected packages: pyrfr
      Running setup.py install for pyrfr ... done
      Could not find .egg-info directory in install record for pyrfr from https://pypi.python.org/packages/21/4c/58533c51ab301f61d3521dc4cd29ba8145eed8f11b84f70aba9fd28f6aca/pyrfr-0.4.0.tar.gz#md5=70ccd2527bd85c18b8a65c7498d5e0de
    Successfully installed pyrfr-0.4.0
    Requirement already satisfied: smac==0.3.0 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
    Requirement already satisfied: numpy>=1.7.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: scipy>=0.18.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: scikit-learn in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: pyrfr in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: ConfigSpace>=0.3.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: pynisher>=0.4.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: psutil in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: typing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: setuptools in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg (from smac==0.3.0)
    Requirement already satisfied: Cython in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: six in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
    Requirement already satisfied: pyparsing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace>=0.3.1->smac==0.3.0)
    Requirement already satisfied: argparse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace>=0.3.1->smac==0.3.0)
    Requirement already satisfied: docutils>=0.3 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pynisher>=0.4.1->smac==0.3.0)
    
    
    
    
    opened by ajing 21
  • making data preprocessing step configurable with two options no preprocessing and feature type split

    New approach for #900: the data preprocessing step of the AutoML system will have two options:

    • use the feature_type split method (the existing implementation)
    • disable the step by selecting no_preprocessing

    Introduced new parameters:

    • include_data_preprocessors
    • exclude_data_preprocessors

    If neither parameter is set by the user, 'no_preprocessing' is added to exclude_data_preprocessors by default, so only the existing FeatureTypeSplit component is used.
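
    A sketch of how the proposed parameters might be used; the parameter and component names are taken from this PR's description, not from a released API:

    import autosklearn.classification

    # Keep the existing behavior explicitly (component name assumed from the PR).
    automl = autosklearn.classification.AutoSklearnClassifier(
        include_data_preprocessors=['feature_type'],
    )

    # Or disable data preprocessing entirely.
    automl = autosklearn.classification.AutoSklearnClassifier(
        include_data_preprocessors=['no_preprocessing'],
    )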

    opened by rabsr 18
  • add pyrfr's gcc requirements to installation guide

    Trying to install auto-sklearn, and more specifically pyrfr, in an anaconda environment failed due to a wrong version of GCC. This happened even though - and, as it turns out, especially because - I followed the instruction to conda install gcc swig given in the installation guide.

    Building pyrfr failed for me with GCC 4.8.5 (which was installed through conda) and worked with the system GCC 7.1.1. The pyrfr readme hints at using GCC 5.4 or 6.2. Sadly, I can't identify the minimum working GCC version for building pyrfr right now, but I would recommend adding a corresponding hint to the installation instructions on the website.

    opened by LGro 18
  • Document AutoSklearnClassifier constructor options

    Dear auto-sklearn team, I have just learned about this project and am very excited to try to include it into my modelling flow!

    It seems the command line option names for autosklearn are not the same as what the AutoSklearnClassifier() constructor accepts. So I have kind of reverse engineered a few, but I still cannot figure out whether it is possible to specify task_type, for example: task_type="binary.classification" is rejected by the AutoSklearnClassifier() constructor.
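
    For reference, the constructor takes keyword arguments rather than the command-line option names, and the task type is inferred from the target rather than passed explicitly. A sketch with a few commonly used options from more recent versions (the API has changed since this issue was opened):

    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=3600,  # total optimization budget in seconds
        per_run_time_limit=360,        # budget per candidate pipeline
        seed=1,                        # for reproducibility
    )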

    I understand this is a very young project and is actively worked on. I will be happy to supply you with feedback from the field, as I am actively running modelling experiments on various datasets available at my company. Currently I am successfully using scikit-learn with SGDClassifier for one of them. Is there a better place to reach you, such as a forum or a chat, to ask questions or give feedback?

    opened by Motorrat 18
  • Improve user method of seeing pipelines generated

    Currently, the easiest way for a user to see the pipelines included in the ensemble is estimator.show_models(), which just returns a str that needs to be manually parsed and looked through. There could definitely be a nicer format to view any such pipeline and provide easy access.

    Good first issue maintenance 
    opened by eddiebergman 17
  • Error creating dummy predictions: {'error': 'Result queue is empty', 'configuration_origin': 'DUMMY'}

    Greetings,

    I am trying to run autosklearn 0.6.0 on an HPC infrastructure. I am using the kr-vs-kp dataset (https://www.openml.org/d/3) and I have made all the values of the dataset numeric. The error I get is:

    Unhandled exception in thread started by
    libgcc_s.so.1 must be installed for pthread_cancel to work
    Traceback (most recent call last):
      File "main_script.py", line 41, in <module>
        autosklearn_script.autosklearn_func(train_data, train_target, test_data, test_target, dataset_info_dict, task, time_threshold)
      File "/users/pr004/ixanthos/test_files/autosklearn_script.py", line 31, in autosklearn_func
        automl.fit(train_data, train_target, metric=autosklearn.metrics.roc_auc)
      File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/estimators.py", line 664, in fit
        dataset_name=dataset_name,
      File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/estimators.py", line 337, in fit
        self._automl[0].fit(**kwargs)
      File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 996, in fit
        load_models=load_models,
      File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 208, in fit
        only_return_configuration_space=only_return_configuration_space,
      File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 384, in _fit
        num_run = self._do_dummy_prediction(datamanager, num_run)
      File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 313, in _do_dummy_prediction
        raise ValueError("Dummy prediction failed: %s " % str(additional_info))
    ValueError: Dummy prediction failed: {'error': 'Result queue is empty', 'configuration_origin': 'DUMMY'}

    Any ideas?

    Thank you in advance.

    bug 
    opened by iXanthos 17
  • Unable to process large data set??

    I have a data set with more than 100k records. When I try to fit it with AutoSklearnRegressor, it always throws a warning, which seems to mean I cannot get the expected output.

    However, if the number of records is small enough (say, less than 20k), it executes without any warning or error. Could you advise on this situation? I am using version 0.2.

    Sample code

    import autosklearn.regression
    import numpy as np
    
    x = np.random.randint(2, size=(250000,100))
    y = np.random.randint(2, size=(250000,1))
    
    
    feature_types = (['numerical'] * 100)
    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=120, per_run_time_limit=30
    )
    automl.fit(x, y, dataset_name='boston', feat_type=feature_types)
    

    The warning output is:

    [WARNING] [2017-10-26 11:21:28,580:AutoMLSMBO(1)::boston] Could not find meta-data directory /home/anaconda3/lib/python3.5/site-packages/autosklearn/metalearning/files/r2_regression_dense
    
    /home/anaconda3/lib/python3.5/site-packages/autosklearn/smbo.py:737: RuntimeWarning: invalid value encountered in true_divide
      (1. - dataset_minimum))
    /home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
      return (self.a < x) & (x < self.b)
    /home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
      return (self.a < x) & (x < self.b)
    /home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:1735: RuntimeWarning: invalid value encountered in greater_equal
      cond2 = (x >= self.b) & cond0
    /home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:876: RuntimeWarning: invalid value encountered in greater_equal
    
    opened by mkcedward 17
  • Speed and time budgets

    In a text classification task, SGDClassifier needs just a few minutes to reach the same result as auto-sklearn running for 20 hours. A smaller time budget for auto-sklearn resulted in absolute or relative failure in prediction.

    I wonder if there is a strategy to try the fastest algorithms first and, if time runs out, at least use their results?

    Another question is about the recommended per_run_time_limit value. Is there a rule of thumb choosing it?

    SGDClassifier (Test FR): Precision: 0.20, Recall: 0.53, F1: 0.29
    Auto-sklearn: Precision: 0.28, Recall: 0.31, F1: 0.29

    classifier.fit(X_train, y_train, metric='f1_metric')
    AutoSklearnClassifier(
        time_left_for_this_task=72000,
        per_run_time_limit=19000,
        ml_memory_limit=10000)
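
    One informal heuristic (not an official recommendation) is to set per_run_time_limit to a small fraction of the total budget, so that many candidate pipelines can be evaluated; a sketch:

    import autosklearn.classification

    total_budget = 3600  # seconds for the entire search
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=total_budget,
        # Informal rule of thumb: allow at least ~10 evaluations.
        per_run_time_limit=total_budget // 10,
    )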

    opened by Motorrat 17
  • Bump actions/stale from 6 to 7

    Bumps actions/stale from 6 to 7.

    Release notes

    Sourced from actions/stale's releases.

    v7.0.0

    ⚠️ This version contains breaking changes ⚠️

    What's Changed

    Breaking Changes

    • In this release we prevent this action from managing the stale label on items included in exempt-issue-labels and exempt-pr-labels
    • We decided that this is outside of the scope of this action, and to be left up to the maintainer

    New Contributors

    Full Changelog: https://github.com/actions/stale/compare/v6...v7.0.0

    v6.0.1

    Update @actions/core to 1.10.0 #839

    Full Changelog: https://github.com/actions/stale/compare/v6.0.0...v6.0.1

    Changelog

    Sourced from actions/stale's changelog.

    Changelog

    [7.0.0]

    :warning: Breaking change :warning:

    [6.0.1]

    Update @actions/core to v1.10.0 (#839)

    [6.0.0]

    :warning: Breaking change :warning:

    Issues/PRs default close-issue-reason is now not_planned (#789)

    [5.1.0]

    Don't process stale issues right after they're marked stale (#772). Add close-issue-reason option (#764). Various dependabot/dependency updates.

    4.1.0 (2021-07-14)

    4.0.0 (2021-07-14)

    Bug Fixes

    • dry-run: forbid mutations in dry-run (#500) (f1017f3), closes #499
    • logs: coloured logs (#465) (5fbbfba)
    • operations: fail fast the current batch to respect the operations limit (#474) (5f6f311), closes #466
    • label comparison: make label comparison case insensitive #517, closes #516
    • filtering comments by actor could have strange behavior: "stale" comments are now detected based on whether the message is the stale message, not on who made the comment (#519); fixes #441, #509, #518

    Breaking Changes

    ... (truncated)

    Commits
    • 6f05e42 draft release for v7.0.0 (#888)
    • eed91cb Update how stale handles exempt items (#874)
    • 10dc265 Merge pull request #880 from akv-platform/update-stale-repo
    • 9c1eb3f Update .md files and allign build-test.yml with the current test.yml
    • bc357bd Update .github/workflows/release-new-action-version.yml
    • 690ede5 Update .github/ISSUE_TEMPLATE/bug_report.md
    • afbcabf Merge branch 'main' into update-stale-repo
    • e364411 Update name of codeql.yml file
    • 627cef3 fix print outputs step (#859)
    • 975308f Merge pull request #876 from jongwooo/chore/use-cache-in-check-dist
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 1
  • [Question] How to get values of categorical variables from a fit model?

    How can one get back the values of categorical variables from a fitted model? Let's say that I have a model where one of the features is a categorical variable, is there a way to get back the values of that variable that were observed during training?

    My use case is I have a multi-model system that I am building (multiple auto-sklearn models, not ensemble) and I want to implement some logic for deciding which model to use depending on if a certain categorical value was observed during training. In scikit-learn this could be easily accessed from the categories_ attribute of a OneHotEncoder, but given the complex nature of auto-sklearn classes and use of ensembles I'm not sure where to begin looking.

    Alternatively, one could set an encoder to error on unknown values and build logic around catching these errors. This doesn't work for auto-sklearn either, because the "missing" category is always created, so models will always predict successfully on missing values without any sign that an unknown categorical value was involved.
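
    For reference, the plain scikit-learn behavior contrasted with above; a minimal sketch:

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    enc = OneHotEncoder(handle_unknown='error')  # 'error' is the default
    enc.fit(np.array([['red'], ['green'], ['blue']]))
    print(enc.categories_)  # categories observed during training

    # With handle_unknown='error', an unseen value raises and can be caught.
    try:
        enc.transform(np.array([['purple']]))
    except ValueError as err:
        print('unknown category:', err)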

    Any help here would be appreciated.

    Running version 0.14.7

    opened by eliwoods 0
  • [Question] How to know the data and feature preprocessing used in the ensemble?

    Hi, the method AutoSklearnClassifier().show_models() displays the models found in the ensemble. I wonder if it is possible to know exactly which data or feature preprocessing steps were applied before training the model. The method show_models gives only the object:

    {'model_id': 2, 
    'rank': 1, 
    'cost': 0.04255319148936165, 
    'ensemble_weight': 0.04, 
    'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f704fb2dee0>, 
    'balancing': Balancing(random_state=1), 
    'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f70a7e4e7f0>,
    'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f70a7e4e1c0>, 
    'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
                           random_state=1, warm_start=True)}
    

    and it is not clear which steps they are. Is it possible to get the preprocessing steps in such a case?
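
    One possible way to drill down, assuming the *Choice wrappers expose a choice attribute holding the selected component (as in recent auto-sklearn versions; automl here is a hypothetical fitted estimator):

    # Hedged sketch: print the concrete component behind each Choice object.
    for model_id, info in automl.show_models().items():
        dp = info.get('data_preprocessor')
        fp = info.get('feature_preprocessor')
        print(model_id,
              type(dp.choice).__name__ if dp is not None else None,
              type(fp.choice).__name__ if fp is not None else None)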

    Many thanks!

    opened by beantr96 0
  • Bump actions/checkout from 3.1.0 to 3.2.0

    Bumps actions/checkout from 3.1.0 to 3.2.0.

    Release notes

    Sourced from actions/checkout's releases.

    v3.2.0

    What's Changed

    New Contributors

    Full Changelog: https://github.com/actions/checkout/compare/v3...v3.2.0

    Changelog

    Sourced from actions/checkout's changelog.

    Changelog

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 1
  • Multilabel Classification with AutoSklearn2Classifier - sgd component not valid error

    Hello Auto-sklearn team,

    Describe the bug

    AutoSklearn2Classifier with a y-variable of type multilabel-indicator errors with the component sgd. Demonstrated below with example data.

    To Reproduce

    Steps to reproduce the behavior:

    1. Using the sample multilabel classification data from https://automl.github.io/auto-sklearn/master/examples/20_basic/example_multilabel_classification.html
    2. autosklearn.classification.AutoSklearnClassifier is working fine with y-variable of type multilabel-indicator
    3. Importing AutoSklearn2Classifier (from autosklearn.experimental.askl2 import AutoSklearn2Classifier)
    4. AutoSklearn2Classifier works fine with y-variable of type binary (e.g. using just the first column of the y-matrix)
    5. but AutoSklearn2Classifier with y-variable of type multilabel-indicator is producing the error below

    Actual behavior, stacktrace or logfile

    Text-based error: ValueError: The provided component 'sgd' for the key 'classifier' in the 'include' argument is not valid. The supported components for the step 'classifier' for this task are ['bernoulli_nb', 'decision_tree', 'extra_trees', 'gaussian_nb', 'k_nearest_neighbors', 'lda', 'liblinear_svc', 'mlp', 'multinomial_nb', 'passive_aggressive', 'qda', 'random_forest']

    Environment and installation:

    • Google Colab Jupyter Notebook
    • Python 3.8.16
    • Auto-sklearn version 0.15.0

    Best, Anthony

    opened by anthonyromyn 0
Releases(v0.14.7)
  • v0.14.7(Aug 18, 2022)

    Version 0.14.7

    • HOTFIX #1445: Locks ConfigSpace to <0.5.0 and smac to <1.3. Adds upper bounds on automl packages to help prevent further issues.

    Contributors v0.14.7


    • Eddie Bergman

    Note, this release was generated later but has been on PyPI for a while

    Source code(tar.gz)
    Source code(zip)
  • v0.14.6(Feb 18, 2022)

    Version 0.14.6

    • HOTFIX #1407: Catches keyword arguments in SingleThreadedClient so they don't get passed to its executing func.

    Contributors v0.14.6


    • Eddie Bergman
    Source code(tar.gz)
    Source code(zip)
  • v0.14.4(Jan 25, 2022)

    Version 0.14.4

    • Fix #1356: SVR degree hyperparameter now only active with "poly" kernel.
    • Add #1311: Black format checking (non-strict).
    • Maint #1306: Run history is now saved every iteration
    • Doc #1309: Updated the docs FAQ to include many use cases and the manual for early introductions
    • Doc #1322: Fix typo in contribution guide
    • Maint #1326: Add isort checker (non-strict)
    • Maint #1238, #1346, #1368, #1370: Update warnings in tests
    • Maint #1325: Test workflow can now be manually triggered
    • Maint #1332: Update docstring and typing of include and exclude params
    • Add #1260: Support for Python 3.10
    • Add #1318: First update to use the shared backend in a new submodule automl_common (https://github.com/automl/automl_common)
    • Fix #1339: Resolve dependency issues with sphinx_toolbox
    • Fix #1335: Fix an issue where some regression algorithms gave incorrect output dimensions, as raised in #1297
    • Doc #1340: Update example for predefined splits
    • Fix #1329: Fix random state not being passed to the ConfigurationSpace
    • Maint #1348: Stop double triggering of github workflows
    • Doc #1349: Rename OSX to macOS in docs
    • Add #1321: Change show_models() to produce actual pipeline objects and not a str
    • Maint #1361: Remove flaky dependency
    • Maint #1366: Make SimpleClassificationPipeline tests more deterministic
    • Maint #1367: Update test values for MLPRegressor with newer numpy

    Contributors v0.14.4


    • Eddie Bergman
    • Matthias Feurer
    • Katharina Eggensperger
    • UserFindingSelf
    • partev
    Source code(tar.gz)
    Source code(zip)
    auto-sklearn-0.14.4.tar.gz(6.10 MB)
  • v0.14.1(Nov 9, 2021)

    Version 0.14.1

    • FIX #1248: Allow for sparse y_test.
    • FIX #1259: Fix an issue that could result in setup.py not working due to relative paths being chosen.
    • MAINT #1261: Include a CITATION.cff file
    • MAINT #1263: Make unit test deterministic.
    • DOC #1269: Fix example on extending data preprocessing.
    • DOC #1270: Remove >>> from code examples in the documentation.
    • DOC #1271: Fix a typo in an example in the documentation.
    • DOC #1282: Add a contribution guide.

    Contributors

    • Edward Bergman
    • Michael Becker
    • Katharina Eggensperger
    Source code(tar.gz)
    Source code(zip)
  • v0.14.0(Sep 14, 2021)

    Version 0.14.0

    • ADD #900: Make data preprocessing more configurable, for example allow to completely disable it.
    • ADD #1128: Adds new functionality to retrieve data for an accuracy over time plot from Auto-sklearn without additional code.
    • FIX #1149: Stops Auto-sklearn from printing weird warnings (Exception ignored in [...]) at shutdown.
    • FIX #1169: Fixes a bug which made cross-validation and multi-output regression incompatible.
    • FIX #1170: Make all preprocessing techniques deterministic.
    • FIX #1190: Fixes a bug which could make predictive probabilities contain too few classes in case one class was only present a single time.
    • FIX #1209: Pass random states to pipeline objects.
    • FIX #1204: Add support for sparse data in Auto-sklearn 2.0.
    • FIX #1210: Add support for sparse y labels.
    • FIX #1245: Fixes a bug which could result in Auto-sklearn crashing in case a class was present only once.
    • DOC #532,#1242: Simplify installation instructions.
    • DOC #1144: Document installation via conda
    • DOC #1195,#1201,#1214: Fix a few typos and links. Make some http links https links.
    • DOC #1200: Fixes variable name in an example.
    • DOC #1229: Improve code formatting in the documentation.
    • DOC #1235: Improve docker startup command so it also work on Windows.
    • MAINT #1198: Use latest Ubuntu LTS (20:04) for github actions.
    • MAINT #1231: The command make linkcheck no longer builds the documentation, speeding up link-checking.
    • MAINT #1233: Enable regression testing with 3 classification and 3 regression datasets on github actions.
    • MAINT #1239: Increase the timeout for github actions to 60 minutes.

    Contributors v0.14.0

    • Pieter Gijsbers
    • Taneli Mielikäinen
    • Rohit Agarwal
    • hnishi
    • Francisco Rivera Valverde
    • Eddie Bergman
    • Satyam Jha
    • Joel Jose
    • Oli
    • Matthias Feurer
    Source code(tar.gz)
    Source code(zip)
  • v0.13.0(Jul 28, 2021)

    Version 0.13.0

    • ADD #1100: Provide access to the callbacks of SMAC.
    • ADD #1185: New leaderboard functionality to visualize models (see the sketch after this list)
    • FIX #1133: Refer to the correct attribute in an error message.
    • FIX #1154: Allow running Auto-sklearn on a 32-bit system.
    • MAINT #924: Instead of passing classes for the resampling strategy one has now to pass objects.
    • MAINT #1108: Limit the number of threads used by numpy and/or scikit-learn via threadpoolctl.
    • MAINT #1135: Simplify the internal workflow of pandas handling. This results in pandas data being passed directly to scikit-learn models instead of being internally converted into a numpy array. However, this should impact neither the behavior nor the performance of Auto-sklearn.
    • MAINT #1157: Drop support for Python 3.6, enable support for Python 3.9.
    • MAINT #1159: Remove the output directory argument to the classifier and regressor. Despite the name, the output directory was not used and was a leftover from participating in the AutoML challenges.
    • MAINT #1187: Bump requires SMAC version to at least 0.14.
    • DOC #1109: Add an FAQ.
    • DOC #1126: Add new examples on how to use scikit-learn's inspect module.
    • DOC #1136: Add a new example on how to perform multi-output regression.
    • DOC #1152: Enable link checking when building the documentation.
    • DOC #1158: New example on how to configure the logger for Auto-sklearn.
    • DOC #1165: Improve the readme page.
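
    As an illustration of the new leaderboard functionality (a sketch; the exact columns shown in the comment are indicative, not guaranteed):

    import sklearn.datasets
    import sklearn.model_selection

    import autosklearn.classification

    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = \
        sklearn.model_selection.train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=60,  # short, illustrative budget
    )
    automl.fit(X_train, y_train)
    # One row per model considered, with rank, ensemble weight, cost, ...
    print(automl.leaderboard())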

    Contributors v0.13.0

    • Francisco Rivera Valverde
    • Matthias Feurer
    • JJ Ben-Joseph
    • Isaac Chung
    • Katharina Eggensperger
    • bitsbuffer
    • Eddie Bergman
    • olehb007
    Source code(tar.gz)
    Source code(zip)
  • v0.12.7(Jul 20, 2021)

    Version 0.12.7

    • ADD #1178: Reduce precision if dataset is too large for given memory limit.
    • ADD #1179: Improve Auto-sklearn 2.0 meta-data by providing new meta-data for the metrics roc_auc and logloss.
    • DOC: Fix reference to arXiv paper
    • MAINT #1134,#1142,#1143: Improvements to the stale bot - the stale bot now marks issues labeled with feedback required as stale if there is nothing happening for 30 days. After another 7 days it then closes the issue.
    • MAINT: Added a new issue template for questions.
    • MAINT #1168: Upper-bound scipy to 1.6.3 as 1.7.0 is incompatible with SMAC.
    • MAINT #1173: Update the license files to be recognized by github.

    Contributors v0.12.7

    Source code(tar.gz)
    Source code(zip)
  • v0.12.6(Apr 17, 2021)

    Version 0.12.6

    • ADD #886: Provide new function which allows fitting only a single configuration.
    • DOC #1070: Clarify example on how successive halving and Bayesian optimization play together.
    • DOC #1112: Fix typo.
    • DOC #1122: Add Python 3 to the installation command for Ubuntu.
    • FIX #1114: Fix a bug which made printing dummy models fail.
    • FIX #1117: Fix a bug which previously made memory_limit=None fail.
    • FIX #1121: Fix an edge case which could decrease performance in Auto-sklearn 2.0 when using cross-validation with iterative fitting.
    • FIX #1123: Fix a bug in autosklearn.metrics.calculate_score for metrics/scores which need to be minimized, where the function previously returned the loss and not the score.
    • FIX #1115/#1124: Fix a bug which would prevent Auto-sklearn from computing meta-features in the multiprocessing case.

    Contributors v0.12.6

    • Francisco Rivera Valverde
    • stock90975
    • Lucas Nildaimon dos Santos Silva
    • Matthias Feurer
    • Rohit Agarwal
    Source code(tar.gz)
    Source code(zip)
  • v0.12.5(Mar 29, 2021)

  • v0.12.4(Mar 29, 2021)

    Version 0.12.4

    • ADD #660: Enable scikit-learn's power transformation for input features.
    • MAINT: Bump the pyrfr minimum dependency to 0.8.1 to automatically download wheels from pypi if possible.
    • FIX #732: Add a missing size check into the GMEANS clustering used for the NeurIPS 2015 paper.
    • FIX #1050: Add missing arguments to the AutoSklearn2Classifier signature.
    • FIX #1072: Fixes a bug where the AutoSklearn2Classifier could not be created due to trying to cache to the wrong directory.

    Contributors v0.12.4

    • Matthias Feurer
    • Francisco Rivera
    • Maximilian Greil
    • Pepe Berba
    Source code(tar.gz)
    Source code(zip)
  • v0.12.3(Feb 17, 2021)

    Version 0.12.3

    • FIX #1061: Fixes a bug where the model could not be printed in a jupyter notebook.
    • FIX #1075: Fixes a bug where the ensemble builder would wrongly prune good models for loss functions (i.e. functions that need to be minimized, such as logloss or mean_squared_error).
    • FIX #1079: Fixes a bug where AutoMLClassifier.cv_results and AutoMLRegressor.cv_results could rank results in opposite order for loss functions (i.e. functions that need to be minimized, such as logloss or mean_squared_error).
    • FIX: Fixes a bug in offline meta-data generation that could lead to a deadlock.
    • MAINT #1076: Uses the correct multiprocessing context for computing meta-features
    • MAINT: Cleanup readme and main directory

    Contributors v0.12.3

    • Matthias Feurer
    • ROHIT AGARWAL
    • Francisco Rivera
    Source code(tar.gz)
    Source code(zip)
  • v0.12.2(Feb 3, 2021)

    Version 0.12.2

    • ADD #1045: New example demonstrating how to log multiple metrics during a run of Auto-sklearn.
    • DOC #1052: Add links to mybinder
    • DOC #1059: Improved the example on manually starting workers for Auto-sklearn.
    • FIX #1046: Add the final result of the ensemble builder to the ensemble builder trajectory.
    • MAINT: Two log outputs of level warning about metadata were reduced to the info loglevel, as they are not actionable for the user.
    • MAINT #1062: Use threads for local dask workers and forkserver to start subprocesses to reduce overhead.
    • MAINT #1053: Remove the restriction to guard single-core Auto-sklearn by __name__ == "__main__" again.

    Contributors v0.12.2

    • Matthias Feurer
    • ROHIT AGARWAL
    • Francisco Rivera
    • Katharina Eggensperger
    Source code(tar.gz)
    Source code(zip)
  • v0.12.1(Apr 13, 2021)

    Version 0.12.1

    • ADD: A new heuristic which gives a warning and subsamples the data if it is too large for the given memory_limit.
    • ADD #1024: Tune scikit-learn's MLPClassifier and MLPRegressor.
    • MAINT #1017: Improve the logging server introduced in release 0.12.0.
    • MAINT #1024: Move to scikit-learn 0.24.X.
    • MAINT #1038: Use new datasets for regression and classification and also update the metadata used for Auto-sklearn 1.0.
    • MAINT #1040: Minor speed improvements in the ensemble selection algorithm.

    Contributors v0.12.1

    • Matthias Feurer
    • Katharina Eggensperger
    • Francisco Rivera
    Source code(tar.gz)
    Source code(zip)
  • v0.12.1rc1(Dec 22, 2020)

    Version 0.12.1

    • ADD: A new heuristic which gives a warning and subsamples the data if it is too large for the given memory_limit.
    • ADD #1024: Tune scikit-learn's MLPClassifier and MLPRegressor.
    • MAINT #1017: Improve the logging server introduced in release 0.12.0.
    • MAINT #1024: Move to scikit-learn 0.24.X.
    • MAINT #1038: Use new datasets for regression and classification and also update the metadata used for Auto-sklearn 1.0.
    • MAINT #1040: Minor speed improvements in the ensemble selection algorithm.

    Contributors v0.12.1

    • Matthias Feurer
    • Katharina Eggensperger
    • Francisco Rivera
    Source code(tar.gz)
    Source code(zip)
  • v0.12.0(Dec 8, 2020)

    Version 0.12.0

    • BREAKING: Auto-sklearn must now be guarded by __name__ == "__main__" due to the use of the spawn multiprocessing context (see the sketch after this list).
    • ADD #1026: Adds improved meta-data for Auto-sklearn 2.0, which results in strongly improved performance.
    • MAINT #984 and #1008: Move to scikit-learn 0.23.X
    • MAINT #1004: Move from travis-ci to github actions.
    • MAINT 8b67af6: Drop the requirement on the lockfile package.
    • FIX #990: Fixes a bug that made Auto-sklearn fail if there are missing values in a pandas DataFrame.
    • FIX #1007, #1012 and #1014: Log multiprocessing output via a new log server. Remove several potential deadlocks related to the joint use of multi-processing, multi-threading and logging.
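
    A minimal sketch of the guard now required when running Auto-sklearn as a script (the spawn context re-imports the main module in child processes, so fitting must not happen at import time):

    import sklearn.datasets

    import autosklearn.classification

    def main():
        X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
        automl = autosklearn.classification.AutoSklearnClassifier(
            time_left_for_this_task=60,  # illustrative budget
        )
        automl.fit(X, y)

    if __name__ == "__main__":  # required with the spawn multiprocessing context
        main()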

    Contributors v0.12.0

    • Matthias Feurer
    • ROHIT AGARWAL
    • Francisco Rivera
    Source code(tar.gz)
    Source code(zip)
  • v0.11.1(Nov 11, 2020)

    Version 0.11.1

    • FIX #989: Fixes a bug where y was not passed to all data preprocessors which made 3rd party category encoders fail.
    • FIX #1001: Fixes a bug which could make Auto-sklearn fail at random.
    • MAINT #1000: Introduce a minimal version for dask.distributed.
    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(Nov 6, 2020)

    Version 0.11.0

    • ADD #992: Move ensemble building from being a separate process to a job submitted to the dask cluster. This allows for better control of the memory used in multiprocessing settings. This change also removes the arguments ensemble_memory_limit and ml_memory_limit and replaces them by the single argument memory_limit (a sketch follows this list).
    • FIX #905: Make AutoSklearn2Classifier picklable.
    • FIX #970: Fix a bug where Auto-sklearn would fail if categorical features are passed as a Pandas Dataframe.
    • MAINT #772: Improve error message in case of dummy prediction failure.
    • MAINT #948: Finally use Pandas >= 1.0.
    • MAINT #973: Improve meta-data by running meta-data generation for more time and separately for important metrics.
    • MAINT #997: Improve memory handling in the ensemble building process. This allows building ensembles for larger datasets.
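
    A sketch of the consolidated argument (value in MB per job; the number is illustrative, not a recommendation):

    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        memory_limit=4096,  # replaces ensemble_memory_limit and ml_memory_limit
    )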

    Contributors v0.11.0

    • Matthias Feurer
    • Francisco Rivera
    • Karl Leswing
    • ROHIT AGARWAL
    Source code(tar.gz)
    Source code(zip)
  • v0.10.0(Sep 26, 2020)

    Version 0.10.0

    • ADD #325: Allow to separately optimize metrics for metadata generation.
    • ADD #946: New dask backend for parallel Auto-sklearn.
    • BREAKING #947: Drop Python3.5 support.
    • BREAKING #946: Remove shared model mode for parallel Auto-sklearn.
    • FIX #351: No longer pass un-picklable logger instances to the target function.
    • FIX #840: Fixes a bug which prevented computing metadata for regression datasets. Also adds a unit test for regression metadata computation.
    • FIX #897: Allow custom splitters to be used with multi-output regression.
    • FIX #951: Fixes a lot of bugs in the regression pipeline that caused bad performance for regression datasets.
    • FIX #953: Re-add liac-arff as a dependency.
    • FIX #956: Fixes a bug which could cause Auto-sklearn not to find a model on disk which is part of the ensemble.
    • FIX #961: Fixes a bug which caused Auto-sklearn to load bad meta-data for metrics which cannot be computed on multiclass datasets (especially ROC_AUC).
    • DOC #498: Improve the example on resampling strategies by showing how to pass scikit-learn's splitter objects to Auto-sklearn.
    • DOC #670: Demonstrate how to give access to training accuracy.
    • DOC #872: Improve an example on how to obtain the best model.
    • DOC #940: Improve documentation of the docker image.
    • MAINT: Improve the docker file by setting environment variables that restrict BLAS and OMP to only use a single core.
    • MAINT #949: Replace pip by pip3 in the installation guidelines.
    • MAINT #280, #535, #956: Update meta-data and include regression meta-data again.

    Contributors v0.10.0

    • Francisco Rivera
    • Matthias Feurer
    • felixleungsc
    • Chu-Cheng Fu
    • Francois Berenger
    Source code(tar.gz)
    Source code(zip)
  • v.0.8.0(Jul 8, 2020)

    Version 0.8.0

    • ADD #803: multi-output regression
    • ADD #893: new Auto-sklearn mode Auto-sklearn 2.0

    Contributors

    • Chu-Cheng Fu
    • Matthias Feurer
    Source code(tar.gz)
    Source code(zip)
  • v.0.7.1(Jul 3, 2020)

    Version 0.7.1

    • ADD #764: support for automatic per_run_time_limit selection
    • ADD #864: add the possibility to predict with cross-validation
    • ADD #874: support to limit the disk space consumption
    • MAINT #862: improved documentation and render examples in web page
    • MAINT #869: removal of competition data manager support
    • MAINT #870: memory improvements when building ensemble
    • MAINT #882: memory improvements when performing ensemble selection
    • FIX #701: scaling factors for metafeatures should not be learned using test data
    • FIX #715: allow unlimited ML memory
    • FIX #771: improved worst possible result calculation
    • FIX #843: default value for SelectPercentileRegression
    • FIX #852: clip probabilities within [0-1]
    • FIX #854: improved tmp file naming
    • FIX #863: SMAC exceptions also registered in log file
    • FIX #876: allow Auto-sklearn model to be cloned
    • FIX #879: allow 1-D binary predictions

    Contributors

    • Matthias Feurer
    • Xiaodong DENG
    • Francisco Rivera
    Source code(tar.gz)
    Source code(zip)
  • v.0.7.0(May 7, 2020)

    Version 0.7.0

    • ADD #785: user control to reduce the hard drive memory required to store ensembles
    • ADD #794: iterative fit for gradient boosting
    • ADD #795: add successive halving evaluation strategy
    • ADD #814: new sklearn.metrics.balanced_accuracy_score instead of custom metric
    • ADD #815: new experimental evaluation mode called iterative_cv
    • MAINT #774: move from scikit-learn 0.21.X to 0.22.X
    • MAINT #791: move from smac 0.8 to 0.12
    • MAINT #822: make autosklearn modules PEP8 compliant
    • FIX #733: fix for n_jobs=-1
    • FIX #739: remove unnecessary warning
    • FIX #769: fixed error in calculation of meta features
    • FIX #778: support for python 3.8
    • FIX #781: support for pandas 1.x

    Contributors

    • Andrew Nader
    • Gui Miotto
    • Julian Berman
    • Katharina Eggensperger
    • Matthias Feurer
    • Maximilian Peters
    • Rong-Inspur
    • Valentin Geffrier
    • Francisco Rivera
  • v.0.6.0(Jan 3, 2020)

    Version 0.6.0

    • MAINT: move from scikit-learn 0.19.X to 0.21.X
    • MAINT #688: allow for pyrfr version 0.8.X
    • FIX #680: Remove unnecessary print statement
    • FIX #600: Remove unnecessary warning

    Contributors

    • Guilherme Miotto
    • Matthias Feurer
    • Jin Woo Ahn
  • v.0.5.2(May 13, 2019)

    Version 0.5.2

    • FIX #669: Correctly handle arguments to the AutoMLRegressor
    • FIX #667: Auto-sklearn works with numpy 1.16.3 again.
    • ADD #676: Allow brackets [ ] inside the temporary and output directory paths.
    • ADD #424: (Experimental) scripts to reproduce the results from the original Auto-sklearn paper.

    Contributors

    • Jin Woo Ahn (@ahn1340)
    • Herilalaina Rakotoarison (@herilalaina)
    • Matthias Feurer (@mfeurer)
    • yazanobeidi (@yazanobeidi)
  • v.0.4.2(Dec 13, 2018)

    Version 0.4.2

    • Fixes #538: Remove rounding errors when giving a training set fraction for holdout.
    • Fixes #558: Ensemble script now uses less memory and the memory limit can be given to Auto-sklearn.
    • Fixes #585: Auto-sklearn’s ensemble script produced wrong results when called directly (and not via one of Auto-sklearn’s estimator classes).
    • Fixes an error in the ensemble script which made it non-deterministic.
    • MAINT #569: Rename a hyperparameter so that it no longer shares its name with a scikit-learn hyperparameter that has a different meaning.
    • MAINT #592: backwards compatible requirements.txt
    • MAINT #588: Fix SMAC version to 0.8.0
    • MAINT: remove dependency on the six package
    • MAINT: upgrade to XGBoost 0.80

    Contributors

    • Taneli Mielikäinen (@tmielika)
    • Matthias Feurer (@mfeurer)
    • Diogo Bastos (@diogo-bastos)
    • Zeyi Wen (@zeyiwen)
    • Teresa Conceição (@teresaconc)
    • Jin Woo Ahn (@ahn1340)
  • v.0.4.1(Nov 12, 2018)

    Changes

    • Added examples on how to extend Auto-sklearn with a custom classifier, regressor, and preprocessor (see the sketch after this list).
    • Auto-sklearn now requires a numpy version between 1.9.0 and 1.14.5, because higher versions caused Travis failures.
    • Examples now use sklearn.datasets.load_breast_cancer() instead of sklearn.datasets.load_digits() to reduce memory usage for the Travis build.
    • Fixes FutureWarnings about using a non-tuple sequence for indexing.
    • Fixes #500: fixes ensemble builder to correctly evaluate model score with any metrics. See PR #522.
    • Fixes #482 and #491: Users can now set up a custom logger configuration by passing a dictionary (e.g. loaded from a YAML file) to logging_config.
    • Fixes #566: ensembles are now sorted correctly.
    • Fixes #293: Auto-sklearn now checks whether an appropriate target type was given for classification and regression before calling fit().
    • Travis CI now runs flake8 to enforce the PEP8 style guide, and deployment now uses Travis CI instead of Circle CI.
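
    For the extension examples mentioned above, here is a minimal sketch of a custom classifier component, assuming the add_classifier registration hook and the AutoSklearnClassificationAlgorithm base class from autosklearn.pipeline.components; the wrapped dummy model and the property values are illustrative, not the shipped example.

    from ConfigSpace.configuration_space import ConfigurationSpace

    from autosklearn.pipeline.components.base import AutoSklearnClassificationAlgorithm
    from autosklearn.pipeline.components.classification import add_classifier
    from autosklearn.pipeline.constants import DENSE, PREDICTIONS, SPARSE, UNSIGNED_DATA


    class MostFrequentClassifier(AutoSklearnClassificationAlgorithm):
        """Illustrative component wrapping scikit-learn's DummyClassifier."""

        def __init__(self, random_state=None):
            self.estimator = None
            self.random_state = random_state

        def fit(self, X, y):
            import sklearn.dummy
            self.estimator = sklearn.dummy.DummyClassifier(strategy='most_frequent')
            self.estimator.fit(X, y)
            return self

        def predict(self, X):
            return self.estimator.predict(X)

        def predict_proba(self, X):
            return self.estimator.predict_proba(X)

        @staticmethod
        def get_properties(dataset_properties=None):
            return {
                'shortname': 'MostFrequent',
                'name': 'Most Frequent Class Classifier',
                'handles_regression': False,
                'handles_classification': True,
                'handles_multiclass': True,
                'handles_multilabel': False,
                'is_deterministic': True,
                'input': (DENSE, SPARSE, UNSIGNED_DATA),
                'output': (PREDICTIONS,),
            }

        @staticmethod
        def get_hyperparameter_search_space(dataset_properties=None):
            # No tunable hyperparameters in this sketch.
            return ConfigurationSpace()


    # Register the component so it becomes part of Auto-sklearn's search space.
    add_classifier(MostFrequentClassifier)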

    Contributors

    • Matthias Feurer
    • Manuel Streuhofer
    • Taneli Mielikäinen
    • Katharina Eggensperger
    • Jin Woo Ahn
  • v.0.4.0(Jun 19, 2018)

Owner
AutoML-Freiburg-Hannover