Automated Machine Learning with scikit-learn

AutoML-Freiburg-Hannover

Last update: Jan 7, 2023

Related tags

Machine Learning scikit-learn hyperparameter-optimization bayesian-optimization hyperparameter-tuning automl automated-machine-learning smac meta-learning hyperparameter-search metalearning

Overview

auto-sklearn

auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

Find the documentation here

Automated Machine Learning in four lines of code

import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)

Relevant publications

Efficient and Robust Automated Machine Learning
Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter
Advances in Neural Information Processing Systems 28 (2015)
http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf

Auto-Sklearn 2.0: The Next Generation
Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer and Frank Hutter
arXiv:2007.04074 [cs.LG], 2020
https://arxiv.org/abs/2007.04074

Comments

Can't install auto-sklearn due to pyrfr dependency

Hi all, I'm trying to benchmark auto-sklearn against the same data sets as TPOT on my local cluster (running Linux), but I can't get pyrfr to install when I pip install auto-sklearn.

Is there an earlier version of auto-sklearn that I can use to perform this benchmark?

opened by rhiever 39

Changes show_models() function to return a dictionary of models in ensemble

Summary

Currently the show_models() function returns a str that has to be parsed manually, with no way to access the models themselves. I have changed it so that it returns a dictionary of the models in the ensemble and their information. This helps fix issue #1298 and the issues mentioned inside that thread.

What's changed

Calling show_models() returns a dictionary keyed by model_id. Each entry is a model dictionary that contains the following:

model_id
rank
ensemble_weight
data preprocessor
balancing
feature_preprocessor
regressor or classifier (autosklearn wrapped model)
sklearn model

Example

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection  # needed for train_test_split below

import autosklearn.regression
import matplotlib.pyplot as plt

X, y = sklearn.datasets.load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_regression_example_tmp',
)
automl.fit(X_train, y_train, dataset_name='diabetes')

ensemble_dict = automl.show_models()
print(ensemble_dict)

Output:

{
25: {'model_id': 25.0, 'rank': 1.0, 'ensemble_weight': 0.46, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7ff2c06588d0>, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7ff2c057bd50>, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7ff2c057ba90>, 'sklearn_model': SGDRegressor(alpha=0.0006517033225329654, epsilon=0.012150149892783745,
             eta0=0.016444224834275295, l1_ratio=1.7462342366289323e-09,
             loss='epsilon_insensitive', max_iter=16, penalty='elasticnet',
             power_t=0.21521743568582094, random_state=1,
             tol=0.002431731981071206, warm_start=True)}, 
6: {'model_id': 6.0, 'rank': 2.0, 'ensemble_weight': 0.32, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7ff2c05b3f50>, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7ff2c065c990>, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7ff2c057ba10>, 'sklearn_model': ARDRegression(alpha_1=0.0003701926442639788, alpha_2=2.2118001735899097e-07,
              copy_X=False, lambda_1=1.2037591637980971e-06,
              lambda_2=4.358378124977852e-09,
              threshold_lambda=1136.5286041327277, tol=0.021944240404849075)}, ....
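A dictionary shaped like the output above can be traversed with ordinary dict operations. The following is a minimal sketch, not part of the pull request: `summarize_ensemble` is a hypothetical helper, and the two classes are stand-ins so that the snippet runs without auto-sklearn installed.

```python
def summarize_ensemble(ensemble):
    """Return (model_id, rank, weight, estimator class name) tuples, sorted by rank."""
    rows = []
    for model_id, info in ensemble.items():
        estimator = info.get("sklearn_model")
        rows.append(
            (model_id, int(info["rank"]), info["ensemble_weight"], type(estimator).__name__)
        )
    return sorted(rows, key=lambda row: row[1])


# Stand-in classes (assumption): real entries hold fitted scikit-learn estimators.
class SGDRegressor:
    pass


class ARDRegression:
    pass


# Values copied from the output above; only the keys used by the helper are kept.
example = {
    6: {"model_id": 6.0, "rank": 2.0, "ensemble_weight": 0.32, "sklearn_model": ARDRegression()},
    25: {"model_id": 25.0, "rank": 1.0, "ensemble_weight": 0.46, "sklearn_model": SGDRegressor()},
}

for model_id, rank, weight, name in summarize_ensemble(example):
    print(f"rank {rank}: model {model_id} ({name}), weight {weight:.2f}")
```

With a real fitted estimator, the same helper would presumably be called as summarize_ensemble(automl.show_models()).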

opened by UserFindingSelf 31

Segmentation fault

When I try the sample code:

import autosklearn.classification
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_hat = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))

Something goes wrong at the automl.fit line: I get "Segmentation fault" and the Python console exits.

>automl.fit(X_train, y_train)
/home/work/.pyenv/versions/py3_env/lib/python3.6/site-packages/autosklearn/evaluation/train_evaluator.py:197: RuntimeWarning: Mean of empty slice
  Y_train_pred = np.nanmean(Y_train_pred_full, axis=0)
[WARNING] [2019-05-29 16:26:32,672:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!
[WARNING] [2019-05-29 16:26:32,680:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!
Segmentation fault
(py3_env) [work@*** ~]$ [WARNING] [2019-05-29 16:26:34,685:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!
[WARNING] [2019-05-29 16:26:36,689:EnsembleBuilder(1):d74860caaa557f473ce23908ff7ba369] No models better than random - using Dummy Score!

Has anyone else come across this problem?

opened by ChangbingChen 28

Error during installation of pyrfr

I got the following error when installing pyrfr. What is the installation directory here? My current directory?


(ml_env) ubuntu@ip-10-0-0-10:~/random_forest_run/build$ curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip install
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   230  100   230    0     0   2334      0 --:--:-- --:--:-- --:--:--  4893
Requirement already satisfied: unittest2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: six>=1.4 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from unittest2)
Requirement already satisfied: argparse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from unittest2)
Requirement already satisfied: traceback2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from unittest2)
Requirement already satisfied: linecache2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from traceback2->unittest2)
Requirement already satisfied: setuptools in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg
Requirement already satisfied: nose in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: six in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: Cython in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: numpy<1.12,>=1.9.0 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: scipy>=0.14.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: numpy>=1.8.2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from scipy>=0.14.1)
Requirement already satisfied: scikit-learn==0.17.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: lockfile in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: joblib in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: psutil in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: pyyaml in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: ConfigArgParse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: liac-arff in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: pandas in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: python-dateutil>=2 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pandas)
Requirement already satisfied: pytz>=2011k in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pandas)
Requirement already satisfied: numpy>=1.7.0 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pandas)
Requirement already satisfied: six>=1.5 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from python-dateutil>=2->pandas)
Requirement already satisfied: xgboost==0.4a30 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: scipy in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from xgboost==0.4a30)
Requirement already satisfied: numpy in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from xgboost==0.4a30)
Requirement already satisfied: ConfigSpace<0.4,>=0.3.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: typing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
Requirement already satisfied: numpy in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
Requirement already satisfied: argparse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
Requirement already satisfied: pyparsing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace<0.4,>=0.3.1)
Requirement already satisfied: pynisher>=0.4 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: docutils>=0.3 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pynisher>=0.4)
Requirement already satisfied: psutil in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pynisher>=0.4)
Requirement already satisfied: setuptools in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg (from pynisher>=0.4)

Collecting pyrfr
  Using cached pyrfr-0.4.0.tar.gz
Building wheels for collected packages: pyrfr
  Running setup.py bdist_wheel for pyrfr ... error
  Complete output from command /home/ubuntu/anaconda3/envs/ml_env/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-axcsozeu/pyrfr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpmdz8gclmpip-wheel- --python-tag cp36:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.6
  creating build/lib.linux-x86_64-3.6/pyrfr
  copying pyrfr/__init__.py -> build/lib.linux-x86_64-3.6/pyrfr
  running build_ext
  building '_regression' extension
  swigging pyrfr/regression.i to pyrfr/regression_wrap.cpp
  swig -python -c++ -I${CMAKE_SOURCE_DIR}/include -I./include -o pyrfr/regression_wrap.cpp pyrfr/regression.i
  creating build/temp.linux-x86_64-3.6
  creating build/temp.linux-x86_64-3.6/pyrfr
  gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I${CMAKE_SOURCE_DIR}/include -I./include -I/home/ubuntu/anaconda3/envs/ml_env/include/python3.6m -c pyrfr/regression_wrap.cpp -o build/temp.linux-x86_64-3.6/pyrfr/regression_wrap.o -O2 -std=c++11
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
  g++ -pthread -shared -L/home/ubuntu/anaconda3/envs/ml_env/lib -Wl,-rpath=/home/ubuntu/anaconda3/envs/ml_env/lib,--no-as-needed build/temp.linux-x86_64-3.6/pyrfr/regression_wrap.o -L/home/ubuntu/anaconda3/envs/ml_env/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/_regression.cpython-36m-x86_64-linux-gnu.so
  building '_util' extension
  swigging pyrfr/util.i to pyrfr/util_wrap.cpp
  swig -python -c++ -I${CMAKE_SOURCE_DIR}/include -I./include -o pyrfr/util_wrap.cpp pyrfr/util.i
  gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I${CMAKE_SOURCE_DIR}/include -I./include -I/home/ubuntu/anaconda3/envs/ml_env/include/python3.6m -c pyrfr/util_wrap.cpp -o build/temp.linux-x86_64-3.6/pyrfr/util_wrap.o -O2 -std=c++11
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
  g++ -pthread -shared -L/home/ubuntu/anaconda3/envs/ml_env/lib -Wl,-rpath=/home/ubuntu/anaconda3/envs/ml_env/lib,--no-as-needed build/temp.linux-x86_64-3.6/pyrfr/util_wrap.o -L/home/ubuntu/anaconda3/envs/ml_env/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/_util.cpython-36m-x86_64-linux-gnu.so
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64
  creating build/bdist.linux-x86_64/wheel
  copying build/lib.linux-x86_64-3.6/_regression.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
  creating build/bdist.linux-x86_64/wheel/pyrfr
  copying build/lib.linux-x86_64-3.6/pyrfr/__init__.py -> build/bdist.linux-x86_64/wheel/pyrfr
  copying build/lib.linux-x86_64-3.6/_util.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
  running install_egg_info
  running egg_info
  writing pyrfr.egg-info/PKG-INFO
  writing dependency_links to pyrfr.egg-info/dependency_links.txt
  writing top-level names to pyrfr.egg-info/top_level.txt
  warning: manifest_maker: standard file '-c' not found
  
  reading manifest file 'pyrfr.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no files found matching '*.pxd' under directory 'pyrfr'
  warning: no files found matching '*.pyx' under directory 'pyrfr'
  warning: manifest_maker: MANIFEST.in, line 11: unknown action 'CMakeList.txt'
  
  writing manifest file 'pyrfr.egg-info/SOURCES.txt'
  Copying pyrfr.egg-info to build/bdist.linux-x86_64/wheel/pyrfr-0.4.0-py3.6.egg-info
  running install_scripts
  Checking .pth file support in build/bdist.linux-x86_64/wheel/
  /home/ubuntu/anaconda3/envs/ml_env/bin/python -E -c pass
  TEST FAILED: build/bdist.linux-x86_64/wheel/ does NOT support .pth files
  error: bad install directory or PYTHONPATH
  creating build/bdist.linux-x86_64/wheel/pyrfr
  copying build/lib.linux-x86_64-3.6/pyrfr/__init__.py -> build/bdist.linux-x86_64/wheel/pyrfr
  copying build/lib.linux-x86_64-3.6/_util.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
  running install_egg_info
  running egg_info
  writing pyrfr.egg-info/PKG-INFO
  writing dependency_links to pyrfr.egg-info/dependency_links.txt
  writing top-level names to pyrfr.egg-info/top_level.txt
  warning: manifest_maker: standard file '-c' not found 

  reading manifest file 'pyrfr.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'


      You are attempting to install a package to a directory that is not
      on PYTHONPATH and which Python does not read ".pth" files from.  The
      installation directory you specified (via --install-dir, --prefix, or
      the distutils default setting) was:
  
      build/bdist.linux-x86_64/wheel/


  and your PYTHONPATH environment variable currently contains:
  
      ''
  
  Here are some of your options for correcting the problem:
  
  * You can choose a different installation directory, i.e., one that is
    on PYTHONPATH or supports .pth files
  
  * You can add the installation directory to the PYTHONPATH environment
    variable.  (It must then also be on PYTHONPATH whenever you run
    Python and want to use the package(s) you are installing.)
  
  * You can set up the installation directory to support ".pth" files by
    using one of the approaches described here:
  
    https://setuptools.readthedocs.io/en/latest/easy_install.html#custom-installation-locations
  
  
  Please make the appropriate changes for your system and try again.
  
  ----------------------------------------
  Failed building wheel for pyrfr
  Running setup.py clean for pyrfr

Failed to build pyrfr
Installing collected packages: pyrfr
  Running setup.py install for pyrfr ... done
  Could not find .egg-info directory in install record for pyrfr from https://pypi.python.org/packages/21/4c/58533c51ab301f61d3521dc4cd29ba8145eed8f11b84f70aba9fd28f6aca/pyrfr-0.4.0.tar.gz#md5=70ccd2527bd85c18b8a65c7498d5e0de
Successfully installed pyrfr-0.4.0
Requirement already satisfied: smac==0.3.0 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages
Requirement already satisfied: numpy>=1.7.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: scipy>=0.18.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: scikit-learn in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: pyrfr in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: ConfigSpace>=0.3.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: pynisher>=0.4.1 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: psutil in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: typing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: setuptools in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg (from smac==0.3.0)
Requirement already satisfied: Cython in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: six in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from smac==0.3.0)
Requirement already satisfied: pyparsing in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace>=0.3.1->smac==0.3.0)
Requirement already satisfied: argparse in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from ConfigSpace>=0.3.1->smac==0.3.0)
Requirement already satisfied: docutils>=0.3 in /home/ubuntu/anaconda3/envs/ml_env/lib/python3.6/site-packages (from pynisher>=0.4.1->smac==0.3.0)

opened by ajing 21

making data preprocessing step configurable with two options no preprocessing and feature type split

New approach for #900: the data preprocessing step of the AutoML system will have two options:

use feature_type split method (existing implementation)

disable step by selecting no_preprocessing

Introduced new parameters:

include_data_preprocessors

exclude_data_preprocessors

If neither parameter is set by the user, 'no_preprocessing' is added to exclude_data_preprocessors by default, so only the existing FeatureTypeSplit component is used.
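The defaulting rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code from the pull request: the function name, the component name 'feature_type', and the error raised when both parameters are given are all hypothetical.

```python
# Assumed component names: 'no_preprocessing' comes from the description above,
# 'feature_type' is a guess at the name of the existing FeatureTypeSplit option.
AVAILABLE = ("feature_type", "no_preprocessing")


def resolve_data_preprocessors(include=None, exclude=None):
    """Return the data preprocessors left in the search space."""
    if include is not None and exclude is not None:
        # Assumption: the two parameters are treated as mutually exclusive.
        raise ValueError("Set only one of include_data_preprocessors "
                         "and exclude_data_preprocessors")
    if include is None and exclude is None:
        # Default described above: exclude 'no_preprocessing'.
        exclude = ["no_preprocessing"]
    if include is not None:
        return [name for name in AVAILABLE if name in include]
    return [name for name in AVAILABLE if name not in exclude]


print(resolve_data_preprocessors())                              # default behaviour
print(resolve_data_preprocessors(include=["no_preprocessing"]))  # disable the step
```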
opened by rabsr 18

add pyrfr's gcc requirements to installation guide

Trying to install auto-sklearn and more specifically pyrfr in an anaconda environment failed due to a wrong version of GCC. This happened even though - and as it turns out especially because - I followed the instruction to conda install gcc swig given in the installation guide.

Building pyrfr failed for me with GCC 4.8.5 (which was installed through conda) and worked with the system GCC 7.1.1. The pyrfr readme hints at using GCC 5.4 or 6.2. Sadly I can't identify the minimum working version of GCC to build pyrfr right now, but I would recommend adding a corresponding hint to the installation instructions on the website.
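A quick check of the compiler version before building could look like the sketch below. The helper names are hypothetical, and the threshold of 5.4 is only an assumption taken from the pyrfr readme hint mentioned above, since the true minimum is not known.

```python
import re
import subprocess

# Assumed threshold based on the readme hint; the real minimum is not established.
ASSUMED_MINIMUM = (5, 4, 0)


def parse_gcc_version(text):
    """Turn a version string such as '4.8.5' or '7' into a (major, minor, patch) tuple."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def gcc_is_probably_new_enough(text, minimum=ASSUMED_MINIMUM):
    """Compare a GCC version string against the assumed minimum."""
    return parse_gcc_version(text) >= minimum


def installed_gcc_version():
    """Ask the gcc found on PATH for its version (requires gcc to be installed)."""
    result = subprocess.run(
        ["gcc", "-dumpversion"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# The two versions reported above: the conda GCC and the system GCC.
for version in ("4.8.5", "7.1.1"):
    print(version, gcc_is_probably_new_enough(version))
```

Inside a conda environment, installed_gcc_version() reports whichever gcc comes first on PATH, which is the situation described above where the conda-provided compiler shadowed the system one.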

opened by LGro 18

Document AutoSklearnClassifier constructor options

Dear auto-sklearn team, I have just learned about this project and am very excited to try to include it into my modelling flow!

It seems the command line option names for autosklearn are not the same as what AutoSklearnClassifier() constructor would accept. So I have kind of reverse engineered a few but I still can not figure out whether it is possible to specify task_type for example. task_type="binary.classification" is rejected by the AutoSklearnClassifier() constructor.

I understand this is a very young project that is actively being worked on. I will be happy to supply you with feedback from the field, as I am actively running modelling experiments on various datasets available at my company. Currently I am successfully using scikit-learn with SGDClassifier for one. Is there a better way to connect with you, in a forum or a chat somewhere, to ask questions or give feedback?

opened by Motorrat 18

Improve user method of seeing pipelines generated

Currently, the easiest way for a user to see the pipelines included in the ensemble is through estimator.show_models() which just returns a str which needs to be manually parsed and looked through. There could definitely be a nicer format to view any such pipeline and provide easy access.
Good first issue maintenance

opened by eddiebergman 17

Error creating dummy predictions: {'error': 'Result queue is empty', 'configuration_origin': 'DUMMY'}

Error creating dummy predictions: {'error': 'Result queue is empty', 'configuration_origin': 'DUMMY'}

Greetings,

I am trying to run autosklearn 0.6.0 on an HPC infrastructure. I am using the kr-vs-kp dataset (https://www.openml.org/d/3) and I have made all the values of the dataset numeric. The error I get is:

Unhandled exception in thread started by
libgcc_s.so.1 must be installed for pthread_cancel to work
Traceback (most recent call last):
  File "main_script.py", line 41, in <module>
    autosklearn_script.autosklearn_func(train_data,train_target,test_data,test_target,dataset_info_dict,task,time_threshold)
  File "/users/pr004/ixanthos/test_files/autosklearn_script.py", line 31, in autosklearn_func
    automl.fit(train_data, train_target, metric=autosklearn.metrics.roc_auc)
  File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/estimators.py", line 664, in fit
    dataset_name=dataset_name,
  File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/estimators.py", line 337, in fit
    self._automl[0].fit(**kwargs)
  File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 996, in fit
    load_models=load_models,
  File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 208, in fit
    only_return_configuration_space=only_return_configuration_space,
  File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 384, in _fit
    num_run = self._do_dummy_prediction(datamanager, num_run)
  File "/apps/applications/pytorch/1.3.1/lib/python3.6/site-packages/autosklearn/automl.py", line 313, in _do_dummy_prediction
    raise ValueError("Dummy prediction failed: %s " % str(additional_info))
ValueError: Dummy prediction failed: {'error': 'Result queue is empty', 'configuration_origin': 'DUMMY'}

Any ideas?

Thank you in advance.
bug

opened by iXanthos 17

Unable to process large data set??

I have a data set with more than 100k records. When I try to fit it with AutoSklearnRegressor, it always throws a warning, and as a result I cannot get the expected output.

However, if the number of records is small enough (say, fewer than 20k), it runs without any warning or error. Could you advise on this situation? I am using version 0.2.

Sample code

import autosklearn.regression
import numpy as np

x = np.random.randint(2, size=(250000,100))
y = np.random.randint(2, size=(250000,1))


feature_types = (['numerical'] * 100)
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120, per_run_time_limit=30
)
automl.fit(x, y, dataset_name='boston', feat_type=feature_types)

The exception is

[WARNING] [2017-10-26 11:21:28,580:AutoMLSMBO(1)::boston] Could not find meta-data directory /home/anaconda3/lib/python3.5/site-packages/autosklearn/metalearning/files/r2_regression_dense

/home/anaconda3/lib/python3.5/site-packages/autosklearn/smbo.py:737: RuntimeWarning: invalid value encountered in true_divide
  (1. - dataset_minimum))
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:1735: RuntimeWarning: invalid value encountered in greater_equal
  cond2 = (x >= self.b) & cond0
/home/anaconda3/lib/python3.5/site-packages/scipy/stats/_distn_infrastructure.py:876: RuntimeWarning: invalid value encountered in greater_equal

opened by mkcedward 17

Speed and time budgets

In a text classification task, SGDClassifier needs just a few minutes to reach the same result as auto-sklearn, which I let run for 20 hours. Smaller time budgets for auto-sklearn resulted in absolute or relative failure in prediction.

I wonder if there is a strategy to try the fastest algorithms first and, if time runs out, at least use their results?

Another question is about the recommended per_run_time_limit value. Is there a rule of thumb for choosing it?

SGDClassifier
Precision: 0.20 Test FR Recall: 0.53 F1: 0.29

Auto-sklearn
Precision: 0.28 Recall: 0.31 F1: 0.29

classifier.fit(X_train, y_train, metric='f1_metric')

AutoSklearnClassifier(
    time_left_for_this_task=72000,
    per_run_time_limit=19000,
    ml_memory_limit=10000)
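To see how the two time settings quoted above interact, here is a small arithmetic sketch. It is not an official rule of thumb: the helper name is hypothetical, and it ignores parallel jobs, ensemble building and any other overhead. It only counts how many sequential runs fit into the total budget if every run uses its full limit.

```python
def worst_case_full_runs(time_left_for_this_task, per_run_time_limit):
    """Number of sequential runs that fit if each run consumes its whole limit."""
    if per_run_time_limit <= 0:
        raise ValueError("per_run_time_limit must be positive")
    return time_left_for_this_task // per_run_time_limit


# Settings from the post: 72000 s in total, 19000 s per run.
print(worst_case_full_runs(72000, 19000))  # 3

# A hypothetical smaller per-run limit of one tenth of the total budget.
print(worst_case_full_runs(72000, 7200))   # 10
```

With the posted settings, a handful of slow configurations could consume the whole 20 hours, which may be one reason a smaller per-run limit is worth trying.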

opened by Motorrat 17

Bump actions/stale from 6 to 7

Bumps actions/stale from 6 to 7.

Release notes

Sourced from actions/stale's releases.

v7.0.0

⚠️ This version contains breaking changes ⚠️

What's Changed

Allow daysBeforeStale options to be float by @irega in actions/stale#841

Use cache in check-dist.yml by @jongwooo in actions/stale#876

fix print outputs step in existing workflows by @irega in actions/stale#859

Update issue and PR templates, add/delete workflow files by @IvanZosimov in actions/stale#880

Update how stale handles exempt items by @johnsudol in actions/stale#874

Breaking Changes

In this release we prevent this action from managing the stale label on items included in exempt-issue-labels and exempt-pr-labels

We decided that this is outside of the scope of this action, and to be left up to the maintainer

New Contributors

@irega made their first contribution in actions/stale#841

@jongwooo made their first contribution in actions/stale#876

@IvanZosimov made their first contribution in actions/stale#880

@johnsudol made their first contribution in actions/stale#874

Full Changelog: https://github.com/actions/stale/compare/v6...v7.0.0

v6.0.1

Update @actions/core to 1.10.0 #839

Full Changelog: https://github.com/actions/stale/compare/v6.0.0...v6.0.1

Changelog

Sourced from actions/stale's changelog.

Changelog

[7.0.0]

:warning: Breaking change :warning:

Allow daysBeforeStale options to be float by @irega in actions/stale#841

Use cache in check-dist.yml by @jongwooo in actions/stale#876

fix print outputs step in existing workflows by @irega in actions/stale#859

Update issue and PR templates, add/delete workflow files by @IvanZosimov in actions/stale#880

Update how stale handles exempt items by @johnsudol in actions/stale#874

[6.0.1]

Update @actions/core to v1.10.0 (#839)

[6.0.0]

:warning: Breaking change :warning:

Issues/PRs default close-issue-reason is now not_planned(#789)

[5.1.0]

Don't process stale issues right after they're marked stale [Add close-issue-reason option]#764 #772 Various dependabot/dependency updates

4.1.0 (2021-07-14)

Features

Ability to exempt draft PRs

4.0.0 (2021-07-14)

Features

options: simplify config by removing skip stale message options (#457) (6ec637d), closes #405 #455

output: print output parameters (#458) (3e6d35b)

Bug Fixes

dry-run: forbid mutations in dry-run (#500) (f1017f3), closes #499

logs: coloured logs (#465) (5fbbfba)

operations: fail fast the current batch to respect the operations limit (#474) (5f6f311), closes #466

label comparison: make label comparison case insensitive #517, closes #516

filtering comments by actor could have strange behavior: "stale" comments are now detected based on if the message is the stale message not who made the comment(#519), fixes #441, #509, #518

Breaking Changes

... (truncated)

Commits

6f05e42 draft release for v7.0.0 (#888)

eed91cb Update how stale handles exempt items (#874)

10dc265 Merge pull request #880 from akv-platform/update-stale-repo

9c1eb3f Update .md files and allign build-test.yml with the current test.yml

bc357bd Update .github/workflows/release-new-action-version.yml

690ede5 Update .github/ISSUE_TEMPLATE/bug_report.md

afbcabf Merge branch 'main' into update-stale-repo

e364411 Update name of codeql.yml file

627cef3 fix print outputs step (#859)

975308f Merge pull request #876 from jongwooo/chore/use-cache-in-check-dist

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 1

[Question] How to get values of categorical variables from a fit model?

How can one get back the values of categorical variables from a fitted model? Let's say that I have a model where one of the features is a categorical variable, is there a way to get back the values of that variable that were observed during training?

My use case is I have a multi-model system that I am building (multiple auto-sklearn models, not ensemble) and I want to implement some logic for deciding which model to use depending on if a certain categorical value was observed during training. In scikit-learn this could be easily accessed from the categories_ attribute of a OneHotEncoder, but given the complex nature of auto-sklearn classes and use of ensembles I'm not sure where to begin looking.

Alternatively, one could set an encoder to error on unknown values, and build logic around catching these errors. This doesn't work for auto-sklearn either because the "missing" category is always created, so models will always successfully predict on missing values without any sign that it was on an unknown categorical value.

Any help here would be appreciated.

Running version 0.14.7

opened by eliwoods 0

[Question] How to know the data and feature preprocessing used in the ensemble?

Hi, the method AutoSklearnClassifier().show_models() displays the models found in the ensemble. I wonder if it is possible to know exactly which data or feature preprocessing steps were applied before training each model. The method show_models() returns only the following object:

{'model_id': 2, 
'rank': 1, 
'cost': 0.04255319148936165, 
'ensemble_weight': 0.04, 
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f704fb2dee0>, 
'balancing': Balancing(random_state=1), 
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f70a7e4e7f0>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f70a7e4e1c0>, 
'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)}

and it is not clear which steps these are. Is it possible to get the preprocessing steps in such a case?
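One possible way to look inside such an entry is sketched below. It assumes that each *Choice wrapper exposes the selected component through a .choice attribute; that attribute is an assumption here and should be checked against the installed auto-sklearn version. The helper summarize_steps and the stand-in classes are hypothetical and exist only so the sketch runs without auto-sklearn.

```python
from types import SimpleNamespace

STEPS = ("data_preprocessor", "feature_preprocessor", "classifier")


def summarize_steps(entry, steps=STEPS):
    """Map each pipeline step to the class name of the component it wraps."""
    summary = {}
    for step in steps:
        wrapper = entry.get(step)
        if wrapper is None:
            continue
        # Assumption: the wrapper stores the selected component as `.choice`.
        selected = getattr(wrapper, "choice", wrapper)
        summary[step] = type(selected).__name__
    return summary


# Stand-in components (hypothetical names, not auto-sklearn classes).
class FakePCA:
    pass


class FakeRandomForest:
    pass


entry = {
    "model_id": 2,
    "feature_preprocessor": SimpleNamespace(choice=FakePCA()),
    "classifier": SimpleNamespace(choice=FakeRandomForest()),
}
step_names = summarize_steps(entry)
```

With a fitted estimator, the same helper could presumably be applied to each value of the dictionary returned by show_models().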

Many thanks!

opened by beantr96 0

Bump actions/checkout from 3.1.0 to 3.2.0

Bumps actions/checkout from 3.1.0 to 3.2.0.

Release notes

Sourced from actions/checkout's releases.

v3.2.0

What's Changed

Add GitHub Action to perform release by @rentziass in actions/checkout#942

Fix status badge by @ScottBrenner in actions/checkout#967

Replace datadog/squid with ubuntu/squid Docker image by @cory-miller in actions/checkout#1002

Wrap pipeline commands for submoduleForeach in quotes by @jokreliable in actions/checkout#964

Update @actions/io to 1.1.2 by @cory-miller in actions/checkout#1029

Upgrading version to 3.2.0 by @vmjoseph in actions/checkout#1039

New Contributors

@ScottBrenner made their first contribution in actions/checkout#967

@cory-miller made their first contribution in actions/checkout#1002

@jokreliable made their first contribution in actions/checkout#964

@vmjoseph made their first contribution in actions/checkout#1039

Full Changelog: https://github.com/actions/checkout/compare/v3...v3.2.0

Commits

755da8c 3.2.0 (#1039)

26d48e8 Update @actions/io to 1.1.2 (#1029)

bf08527 wrap pipeline commands for submoduleForeach in quotes (#964)

5c3ccc2 Replace datadog/squid with ubuntu/squid Docker image (#1002)

1f9a0c2 README - fix status badge (#967)

8230315 Add workflow to update a main version (#942)

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

dependencies
opened by dependabot[bot] 1

Multilabel Classification with AutoSklearn2Classifier - sgd component not valid error

Hello Auto-sklearn team,

Describe the bug

AutoSklearn2Classifier raises an error about the component sgd when the y-variable is of type multilabel-indicator. This is demonstrated below with example data.

To Reproduce

Steps to reproduce the behavior:

Using the sample multilabel classification data from https://automl.github.io/auto-sklearn/master/examples/20_basic/example_multilabel_classification.html

autosklearn.classification.AutoSklearnClassifier is working fine with y-variable of type multilabel-indicator

Importing AutoSklearn2Classifier (from autosklearn.experimental.askl2 import AutoSklearn2Classifier)

AutoSklearn2Classifier works fine with y-variable of type binary (e.g. using just the first column of the y-matrix)

but AutoSklearn2Classifier with y-variable of type multilabel-indicator is producing the error below
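The target types named in the steps above can be checked with scikit-learn's type_of_target. The sketch below uses a small made-up label matrix rather than the dataset from the linked example, and it only illustrates the difference between the two target types; it does not reproduce the error.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up label matrix: three samples, three binary labels each.
y_multilabel = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])

# Using only the first column turns the problem into a binary one.
y_binary = y_multilabel[:, 0]

multilabel_kind = type_of_target(y_multilabel)
binary_kind = type_of_target(y_binary)
print(multilabel_kind, binary_kind)
```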

Actual behavior, stacktrace or logfile

Text-based error: ValueError: The provided component 'sgd' for the key 'classifier' in the 'include' argument is not valid. The supported components for the step 'classifier' for this task are ['bernoulli_nb', 'decision_tree', 'extra_trees', 'gaussian_nb', 'k_nearest_neighbors', 'lda', 'liblinear_svc', 'mlp', 'multinomial_nb', 'passive_aggressive', 'qda', 'random_forest']

Environment and installation:

Google Colab Jupyter Notebook

Python 3.8.16

Auto-sklearn version 0.15.0

Best, Anthony
opened by anthonyromyn 0

Releases(v0.14.7)

v0.14.7(Aug 18, 2022)
Version 0.14.7

HOTFIX #1445: Locks ConfigSpace to <0.5.0 and smac to <1.3. Adds upper bounds on automl packages to help prevent further issues.

Contributors v0.14.7

Eddie Bergman

Note, this release was generated later but has been on PyPI for a while
v0.14.6(Feb 18, 2022)
Version 0.14.6

HOTFIX #1407: Catches keyword arguments in SingleThreadedClient so they don't get passed to its executing func.

Contributors v0.14.6

Eddie Bergman

v0.14.5(Jan 25, 2022)

auto-sklearn-0.14.5.tar.gz(6.12 MB)
v0.14.4(Jan 25, 2022)
Version 0.14.4

Fix #1356: SVR degree hyperparameter now only active with "poly" kernel.

Add #1311: Black format checking (non-strict).

Maint #1306: Run history is now saved every iteration

Doc #1309: Updated the doc faqs to include many use cases and the manual for early introductions

Doc #1322: Fix typo in contribution guide

Maint #1326: Add isort checker (non-strict)

Maint #1238, #1346, #1368, #1370: Update warnings in tests

Maint #1325: Test workflow can now be manually triggered

Maint #1332: Update docstring and typing of include and exclude params

Add #1260: Support for Python 3.10

Add #1318: First update to use the shared backend in a new submodule automl_common (https://github.com/automl/automl_common)

Fix #1339: Resolve dependency issues with sphinx_toolbox

Fix #1335: Fix issue where some regression algorithms gave incorrect output dimensions as raised in #1297

Doc #1340: Update example for predefined splits

Fix #1329: Fix random state not being passed to the ConfigurationSpace

Maint #1348: Stop double triggering of github workflows

Doc #1349: Rename OSX to macOS in docs

Add #1321: Change show_models() to produce actual pipeline objects and not a str

Maint #1361: Remove flaky dependency

Maint #1366: Make SimpleClassificationPipeline tests more deterministic

Maint #1367: Update test values for MLPRegressor with newer numpy

Contributors v0.14.4

Eddie Bergman

Matthias Feurer

Katharina Eggensperger

UserFindingSelf

partev

auto-sklearn-0.14.4.tar.gz(6.10 MB)
v0.14.3(Dec 25, 2021)

auto-sklearn-0.14.3.tar.gz(6.00 MB)
v0.14.1(Nov 9, 2021)
Version 0.14.1

FIX #1248: Allow for sparse y_test.

FIX #1259: Fix an issue that could result in setup.py not working due to relative paths being chosen.

MAINT #1261: Include a CITATION.cff file

MAINT #1263: Make unit test deterministic.

DOC #1269: Fix example on extending data preprocessing.

DOC #1270: Remove >>> from code examples in the documentation.

DOC #1271: Fix a typo in an example in the documentation.

DOC #1282: Add a contribution guide.

Contributors

Edward Bergman

Michael Becker

Katharina Eggensperger

v0.14.0(Sep 14, 2021)
Version 0.14.0

ADD #900: Make data preprocessing more configurable, for example allow to completely disable it.

ADD #1128: Adds new functionality to retrieve data for an accuracy over time plot from Auto-sklearn without additional code.

FIX #1149: Stops Auto-sklearn from printing weird warnings (Exception ignored in [...]) at shutdown.

FIX #1169: Fixes a bug which made cross-validation and multi-output regression incompatible.

FIX #1170: Make all preprocessing techniques deterministic.

FIX #1190: Fixes a bug which could make predictive probabilities contain too few classes in case one class was only present a single time.

FIX #1209: Pass random states to pipeline objects.

FIX #1204: Add support for sparse data in Auto-sklearn 2.0.

FIX #1210: Add support for sparse y labels.

FIX #1245: Fixes a bug which could result in Auto-sklearn crashing in case a class was present only once.

DOC #532,#1242: Simplify installation instructions.

DOC #1144: Document installation via conda

DOC #1195,#1201,#1214: Fix a few typos and links. Make some http links https links.

DOC #1200: Fixes variable name in an example.

DOC #1229: Improve code formatting in the documentation.

DOC #1235: Improve docker startup command so it also works on Windows.

MAINT #1198: Use latest Ubuntu LTS (20.04) for github actions.

MAINT #1231: The command make linkcheck no longer builds the documentation, speeding up link-checking.

MAINT #1233: Enable regression testing with 3 classification and 3 regression datasets on github actions.

MAINT #1239: Increase the timeout for github actions to 60 minutes.

Contributors v0.14.0

Pieter Gijsbers

Taneli Mielikäinen

Rohit Agarwal

hnishi

Francisco Rivera Valverde

Eddie Bergman

Satyam Jha

Joel Jose

Oli

Matthias Feurer

v0.13.0(Jul 28, 2021)
Version 0.13.0

ADD #1100: Provide access to the callbacks of SMAC.

ADD #1185: New leaderboard functionality to visualize models

FIX #1133: Refer to the correct attribute in an error message.

FIX #1154: Allow running Auto-sklearn on a 32-bit system.

MAINT #924: Instead of passing classes for the resampling strategy one has now to pass objects.

MAINT #1108: Limit the number of threads used by numpy and/or scikit-learn via threadpoolctl.

MAINT #1135: Simplify internal workflow of pandas handling. This results in pandas objects being passed directly to scikit-learn models instead of being internally converted into a numpy array. However, this should impact neither the behavior nor the performance of Auto-sklearn.

MAINT #1157: Drop support for Python 3.6, enable support for Python 3.9.

MAINT #1159: Remove the output directory argument to the classifier and regressor. Despite the name, the output directory was not used and was a leftover from participating in the AutoML challenges.

MAINT #1187: Bump required SMAC version to at least 0.14.

DOC #1109: Add an FAQ.

DOC #1126: Add new examples on how to use scikit-learn's inspect module.

DOC #1136: Add a new example on how to perform multi-output regression.

DOC #1152: Enable link checking when building the documentation.

DOC #1158: New example on how to configure the logger for Auto-sklearn.

DOC #1165: Improve the readme page.
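The thread limiting mentioned in MAINT #1108 relies on the threadpoolctl package. A minimal sketch of its context manager is shown below; it illustrates the package's general usage on a made-up matrix product and is not taken from Auto-sklearn's code.

```python
import numpy as np
from threadpoolctl import threadpool_limits

rng = np.random.RandomState(0)
a = rng.rand(64, 64)

# Inside the block, supported native thread pools (BLAS, OpenMP)
# are restricted to a single thread.
with threadpool_limits(limits=1):
    product = a @ a
```

The limit is lifted again when the with block exits.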

Contributors v0.13.0

Francisco Rivera Valverde

Matthias Feurer

JJ Ben-Joseph

Isaac Chung

Katharina Eggensperger

bitsbuffer

Eddie Bergman

olehb007

v0.12.7(Jul 20, 2021)
Version 0.12.7

ADD #1178: Reduce precision if dataset is too large for given memory limit.

ADD #1179: Improve Auto-sklearn 2.0 meta-data by providing new meta-data for the metrics roc_auc and logloss.

DOC: Fix reference to arXiv paper

MAINT #1134,#1142,#1143: Improvements to the stale bot - the stale bot now marks issues labeled with feedback required as stale if there is nothing happening for 30 days. After another 7 days it then closes the issue.

MAINT: Added a new issue template for questions.

MAINT #1168: Upper-bound scipy to 1.6.3 as 1.7.0 is incompatible with SMAC.

MAINT #1173: Update the license files to be recognized by github.
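The effect of reducing precision (ADD #1178) can be illustrated with plain NumPy. The sketch below only shows how casting from float64 to float32 halves the memory footprint of an array; the array shape is made up, and the heuristic Auto-sklearn uses to decide when to reduce precision is not shown.

```python
import numpy as np

# Made-up dataset: 1000 samples with 20 float64 features.
X = np.ones((1000, 20), dtype=np.float64)

# Casting to float32 halves the number of bytes per value.
X_reduced = X.astype(np.float32)

bytes_before = X.nbytes
bytes_after = X_reduced.nbytes
print(bytes_before, bytes_after)
```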

Contributors v0.12.7
v0.12.6(Apr 17, 2021)
Version 0.12.6

ADD #886: Provide new function which allows fitting only a single configuration.

DOC #1070: Clarify example on how successive halving and Bayesian optimization play together.

DOC #1112: Fix typo.

DOC #1122: Add Python 3 to the installation command for Ubuntu.

FIX #1114: Fix a bug which made printing dummy models fail.

FIX #1117: Fix a bug which previously made memory_limit=None fail.

FIX #1121: Fix an edge case which could decrease performance in Auto-sklearn 2.0 when using cross-validation with iterative fitting.

FIX #1123: Fix a bug in autosklearn.metrics.calculate_score for metrics/scores which need to be minimized, where the function previously returned the loss and not the score.

FIX #1115/#1124: Fix a bug which would prevent Auto-sklearn from computing meta-features in the multiprocessing case.
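The difference between a loss and a score for metrics that need to be minimized (FIX #1123) can be illustrated with scikit-learn's own scorer convention. The sketch below uses make_scorer on made-up data; it shows scikit-learn's sign flip only and makes no claim about the internals of autosklearn.metrics.calculate_score.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_squared_error

# Made-up regression data; the dummy model always predicts the mean (1.0).
X = np.zeros((4, 1))
y = np.array([0.0, 0.0, 2.0, 2.0])
model = DummyRegressor(strategy="mean").fit(X, y)

# The raw metric is a loss: lower is better.
loss = mean_squared_error(y, model.predict(X))

# A scorer built with greater_is_better=False negates the loss,
# so that higher is better.
neg_mse = make_scorer(mean_squared_error, greater_is_better=False)
score = neg_mse(model, X, y)
```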

Contributors v0.12.6

Francisco Rivera Valverde

stock90975

Lucas Nildaimon dos Santos Silva

Matthias Feurer

Rohit Agarwal

v0.12.5(Mar 29, 2021)
Version 0.12.5

MAINT: Remove Cython and numpy as installation requirements.

Contributors 0.12.5

Matthias Feurer

v0.12.4(Mar 29, 2021)
Version 0.12.4

ADD #660: Enable scikit-learn's power transformation for input features.

MAINT: Bump the pyrfr minimum dependency to 0.8.1 to automatically download wheels from pypi if possible.

FIX #732: Add a missing size check into the GMEANS clustering used for the NeurIPS 2015 paper.

FIX #1050: Add missing arguments to the AutoSklearn2Classifier signature.

FIX #1072: Fixes a bug where the AutoSklearn2Classifier could not be created due to trying to cache to the wrong directory.
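The power transformation enabled by ADD #660 refers to scikit-learn's PowerTransformer. A minimal standalone sketch on made-up skewed data follows; the hyperparameters shown are scikit-learn's defaults written out explicitly, not necessarily the ones Auto-sklearn selects.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up, strongly right-skewed feature.
rng = np.random.RandomState(0)
X = rng.lognormal(size=(200, 1))

# Yeo-Johnson transform followed by standardization to zero mean, unit variance.
transformer = PowerTransformer(method="yeo-johnson", standardize=True)
X_transformed = transformer.fit_transform(X)
```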

Contributors v0.12.4

Matthias Feurer

Francisco Rivera

Maximilian Greil

Pepe Berba

v0.12.3(Feb 17, 2021)
Version 0.12.3

FIX #1061: Fixes a bug where the model could not be printed in a jupyter notebook.

FIX #1075: Fixes a bug where the ensemble builder would wrongly prune good models for loss functions (i.e. functions that need to be minimized, such as logloss or mean_squared_error).

FIX #1079: Fixes a bug where AutoMLClassifier.cv_results and AutoMLRegressor.cv_results could rank results in opposite order for loss functions (i.e. functions that need to be minimized, such as logloss or mean_squared_error).

FIX: Fixes a bug in offline meta-data generation that could lead to a deadlock.

MAINT #1076: Uses the correct multiprocessing context for computing meta-features

MAINT: Cleanup readme and main directory

Contributors v0.12.3

Matthias Feurer

ROHIT AGARWAL

Francisco Rivera

v0.12.2(Feb 3, 2021)
Version 0.12.2

ADD #1045: New example demonstrating how to log multiple metrics during a run of Auto-sklearn.

DOC #1052: Add links to mybinder

DOC #1059: Improved the example on manually starting workers for Auto-sklearn.

FIX #1046: Add the final result of the ensemble builder to the ensemble builder trajectory.

MAINT: Two log outputs of level warning about metadata were reduced to the info loglevel as they are not actionable for the user.

MAINT #1062: Use threads for local dask workers and forkserver to start subprocesses to reduce overhead.

MAINT #1053: Remove the restriction to guard single-core Auto-sklearn by __name__ == "__main__" again.

Contributors v0.12.2

Matthias Feurer

ROHIT AGARWAL

Francisco Rivera

Katharina Eggensperger

v0.12.1(Apr 13, 2021)
Version 0.12.1

ADD: A new heuristic which gives a warning and subsamples the data if it is too large for the given memory_limit.

ADD #1024: Tune scikit-learn's MLPClassifier and MLPRegressor.

MAINT #1017: Improve the logging server introduced in release 0.12.0.

MAINT #1024: Move to scikit-learn 0.24.X.

MAINT #1038: Use new datasets for regression and classification and also update the metadata used for Auto-sklearn 1.0.

MAINT #1040: Minor speed improvements in the ensemble selection algorithm.

Contributors v0.12.1

Matthias Feurer

Katharina Eggensperger

Francisco Rivera

v0.12.1rc1(Dec 22, 2020)
Version 0.12.1

ADD: A new heuristic which gives a warning and subsamples the data if it is too large for the given memory_limit.

ADD #1024: Tune scikit-learn's MLPClassifier and MLPRegressor.

MAINT #1017: Improve the logging server introduced in release 0.12.0.

MAINT #1024: Move to scikit-learn 0.24.X.

MAINT #1038: Use new datasets for regression and classification and also update the metadata used for Auto-sklearn 1.0.

MAINT #1040: Minor speed improvements in the ensemble selection algorithm.

Contributors v0.12.1

Matthias Feurer

Katharina Eggensperger

Francisco Rivera

v0.12.0(Dec 8, 2020)
Version 0.12.0

BREAKING: Auto-sklearn must now be guarded by __name__ == "__main__" due to the use of the spawn multiprocessing context.

ADD #1026: Adds improved meta-data for Auto-sklearn 2.0 which results in strongly improved performance.

MAINT #984 and #1008: Move to scikit-learn 0.23.X

MAINT #1004: Move from travis-ci to github actions.

MAINT 8b67af6: drop the requirement on the lockfile package.

FIX #990: Fixes a bug that made Auto-sklearn fail if there are missing values in a pandas DataFrame.

FIX #1007, #1012 and #1014: Log multiprocessing output via a new log server. Remove several potential deadlocks related to the joint use of multi-processing, multi-threading and logging.

Contributors v0.12.0

Matthias Feurer

ROHIT AGARWAL

Francisco Rivera

v0.11.1(Nov 11, 2020)
Version 0.11.1

FIX #989: Fixes a bug where y was not passed to all data preprocessors which made 3rd party category encoders fail.

FIX #1001: Fixes a bug which could make Auto-sklearn fail at random.

MAINT #1000: Introduce a minimal version for dask.distributed.

v0.11.0(Nov 6, 2020)
Version 0.11.0

ADD #992: Move ensemble building from being a separate process to a job submitted to the dask cluster. This allows for better control of the memory used in multiprocessing settings. This change also removes the arguments ensemble_memory_limit and ml_memory_limit and replaces them by the single argument memory_limit.

FIX #905: Make AutoSklearn2Classifier picklable.

FIX #970: Fix a bug where Auto-sklearn would fail if categorical features are passed as a Pandas Dataframe.

MAINT #772: Improve error message in case of dummy prediction failure.

MAINT #948: Finally use Pandas >= 1.0.

MAINT #973: Improve meta-data by running meta-data generation for more time and separately for important metrics.

MAINT #997: Improve memory handling in the ensemble building process. This allows building ensembles for larger datasets.

Contributors v0.11.0

Matthias Feurer

Francisco Rivera

Karl Leswing

ROHIT AGARWAL

v0.10.0(Sep 26, 2020)
Version 0.10.0

ADD #325: Allow to separately optimize metrics for metadata generation.

ADD #946: New dask backend for parallel Auto-sklearn.

BREAKING #947: Drop Python3.5 support.

BREAKING #946: Remove shared model mode for parallel Auto-sklearn.

FIX #351: No longer pass un-picklable logger instances to the target function.

FIX #840: Fixes a bug which prevented computing metadata for regression datasets. Also adds a unit test for regression metadata computation.

FIX #897: Allow custom splitters to be used with multi-output regression.

FIX #951: Fixes a lot of bugs in the regression pipeline that caused bad performance for regression datasets.

FIX #953: Re-add liac-arff as a dependency.

FIX #956: Fixes a bug which could cause Auto-sklearn not to find a model on disk which is part of the ensemble.

FIX #961: Fixes a bug which caused Auto-sklearn to load bad meta-data for metrics which cannot be computed on multiclass datasets (especially ROC_AUC).

DOC #498: Improve the example on resampling strategies by showing how to pass scikit-learn's splitter objects to Auto-sklearn.

DOC #670: Demonstrate how to give access to training accuracy.

DOC #872: Improve an example on how to obtain the best model.

DOC #940: Improve documentation of the docker image.

MAINT: Improve the docker file by setting environment variables that restrict BLAS and OMP to only use a single core.

MAINT #949: Replace pip by pip3 in the installation guidelines.

MAINT #280, #535, #956: Update meta-data and include regression meta-data again.

Contributors v0.10.0

Francisco Rivera

Matthias Feurer

felixleungsc

Chu-Cheng Fu

Francois Berenger

v.0.8.0(Jul 8, 2020)
Version 0.8.0

ADD #803: multi-output regression

ADD #893: new Auto-sklearn mode Auto-sklearn 2.0

Contributors

Chu-Cheng Fu

Matthias Feurer

v.0.7.1(Jul 3, 2020)
Version 0.7.1

ADD #764: support for automatic per_run_time_limit selection

ADD #864: add the possibility to predict with cross-validation

ADD #874: support to limit the disk space consumption

MAINT #862: improved documentation and render examples in web page

MAINT #869: removal of competition data manager support

MAINT #870: memory improvements when building ensemble

MAINT #882: memory improvements when performing ensemble selection

FIX #701: scaling factors for metafeatures should not be learned using test data

FIX #715: allow unlimited ML memory

FIX #771: improved worst possible result calculation

FIX #843: default value for SelectPercentileRegression

FIX #852: clip probabilities within [0-1]

FIX #854: improved tmp file naming

FIX #863: SMAC exceptions also registered in log file

FIX #876: allow Auto-sklearn model to be cloned

FIX #879: allow 1-D binary predictions

Contributors

Matthias Feurer

Xiaodong DENG

Francisco Rivera

v.0.7.0(May 7, 2020)
Version 0.7.0

ADD #785: user control to reduce the hard drive memory required to store ensembles

ADD #794: iterative fit for gradient boosting

ADD #795: add successive halving evaluation strategy

ADD #814: new sklearn.metrics.balanced_accuracy_score instead of custom metric

ADD #815: new experimental evaluation mode called iterative_cv

MAINT #774: move from scikit-learn 0.21.X to 0.22.X

MAINT #791: move from smac 0.8 to 0.12

MAINT #822: make autosklearn modules PEP8 compliant

FIX #733: fix for n_jobs=-1

FIX #739: remove unnecessary warning

FIX #769: fixed error in calculation of meta features

FIX #778: support for python 3.8

FIX #781: support for pandas 1.x

Contributors

Andrew Nader

Gui Miotto

Julian Berman

Katharina Eggensperger

Matthias Feurer

Maximilian Peters

Rong-Inspur

Valentin Geffrier

Francisco Rivera

v.0.6.0(Jan 3, 2020)
Version 0.6.0

MAINT: move from scikit-learn 0.19.X to 0.21.X

MAINT #688: allow for pyrfr version 0.8.X

FIX #680: Remove unnecessary print statement

FIX #600: Remove unnecessary warning

Contributors

Guilherme Miotto

Matthias Feurer

Jin Woo Ahn

v.0.5.2(May 13, 2019)
Version 0.5.2

FIX #669: Correctly handle arguments to the AutoMLRegressor

FIX #667: Auto-sklearn works with numpy 1.16.3 again.

ADD #676: Allow brackets [ ] inside the temporary and output directory paths.

ADD #424: (Experimental) scripts to reproduce the results from the original Auto-sklearn paper.

Contributors

Jin Woo Ahn (@ahn1340)

Herilalaina Rakotoarison (@herilalaina)

Matthias Feurer (@mfeurer)

yazanobeidi (@yazanobeidi)

v.0.4.2(Dec 13, 2018)
Version 0.4.2

Fixes #538: Remove rounding errors when giving a training set fraction for holdout.

Fixes #558: Ensemble script now uses less memory and the memory limit can be given to Auto-sklearn.

Fixes #585: Auto-sklearn’s ensemble script produced wrong results when called directly (and not via one of Auto-sklearn’s estimator classes).

Fixes an error in the ensemble script which made it non-deterministic.

MAINT #569: Rename hyperparameter to have a different name than a scikit-learn hyperparameter with different meaning.

MAINT #592: backwards compatible requirements.txt

MAINT #588: Fix SMAC version to 0.8.0

MAINT: remove dependency on the six package

MAINT: upgrade to XGBoost 0.80

Contributors

Taneli Mielikäinen (@tmielika)

Matthias Feurer (@mfeurer)

Diogo Bastos (@diogo-bastos)

Zeyi Wen (@zeyiwen)

Teresa Conceição (@teresaconc)

Jin Woo Ahn (@ahn1340)

v.0.4.1(Nov 12, 2018)
Changes

Added examples on how to extend Auto-sklearn with a custom classifier, regressor, and preprocessor.

Auto-sklearn now requires numpy version between 1.9.0 and 1.14.5, due to higher versions causing travis failure.

Examples now use sklearn.datasets.load_breast_cancer() instead of sklearn.datasets.load_digits() to reduce memory usage for travis build.

Fixes future warnings on non-tuple sequence for indexing.

Fixes #500: fixes ensemble builder to correctly evaluate model score with any metrics. See PR #522.

Fixes #482 and #491: Users can now set up custom logger configuration by passing a dictionary created by a yaml file to logging_config.

Fixes #566: ensembles are now sorted correctly.

Fixes #293: Auto-sklearn checks if appropriate target type was given for classification and regression before call to fit().

Travis-ci now runs flake8 to enforce pep8 style guide, and uses travis-ci instead of circle-ci for deployment.
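The logger configuration dictionary mentioned for #482 and #491 follows the schema of Python's standard logging.config module. The sketch below writes such a dictionary out literally (it is what loading an equivalent yaml file would typically yield) and applies it with dictConfig. The handler and formatter names are hypothetical, and passing the dictionary to Auto-sklearn's logging_config argument is not shown.

```python
import logging
import logging.config

# Dictionary in the standard logging.config schema (version 1).
logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
        },
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}

# Apply the configuration to the standard library's logging system.
logging.config.dictConfig(logging_config)

logger = logging.getLogger("example")
logger.info("logging configured")
```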

Contributors

Matthias Feurer

Manuel Streuhofer

Taneli Mielikäinen

Katharina Eggensperger

Jin Woo Ahn

v.0.4.0(Jun 19, 2018)

v.0.3.0(Jan 5, 2018)

v.0.2.1(Oct 4, 2017)
