SigOpt wrappers for scikit-learn methods

Overview

SigOpt + scikit-learn Interfacing

This package implements interfaces and wrappers for using SigOpt and scikit-learn together.

Getting Started

Install the sigopt_sklearn Python modules with pip install sigopt_sklearn.

Sign up for an account at https://sigopt.com. To use the interfaces, you'll need your API token from the API tokens page.
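
Rather than pasting the token into source files, it can be read from the environment. A minimal sketch, assuming a hypothetical environment variable named SIGOPT_API_TOKEN (the variable name and the load_client_token helper are inventions of this example; the library does not look for them itself):

```python
import os

# SIGOPT_API_TOKEN is a hypothetical variable name chosen for this sketch;
# sigopt_sklearn does not read it on its own.
def load_client_token(env=os.environ, var_name="SIGOPT_API_TOKEN"):
    """Return the token stored under var_name, or raise a clear error."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            "Set %s to your SigOpt API token before running." % var_name)
    return token

# Demonstration with an explicit mapping instead of the real environment
client_token = load_client_token(env={"SIGOPT_API_TOKEN": "abc123"})
```

In a real script, call load_client_token() with no arguments and pass the result as client_token in the examples below.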

SigOptSearchCV

The simplest use case for SigOpt in conjunction with scikit-learn is optimizing estimator hyperparameters using cross validation. A short example that tunes the parameters of an SVM on a small dataset is provided below:

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'

iris = datasets.load_iris()

# define parameter domains
svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

# define sklearn estimator
svr = svm.SVC()

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5,
    client_token=client_token, n_jobs=5, n_iter=20)

# perform CV search for best parameters, then refit the estimator
# on all data using the best found configuration
clf.fit(iris.data, iris.target)

# clf.predict() now uses best found estimator
# clf.best_score_ contains CV score for best found estimator
# clf.best_params_ contains best found param configuration

The objective optimized by default is the default score associated with an estimator. A custom objective can be used by passing the scoring option to the SigOptSearchCV constructor. Shown below is an example that uses the f1_score already implemented in sklearn:

from sklearn.metrics import f1_score, make_scorer

# iris has three classes, so f1_score needs an explicit averaging mode
f1_scorer = make_scorer(f1_score, average='macro')

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5, scoring=f1_scorer,
    client_token=client_token, n_jobs=5, n_iter=50)

# perform CV search for best parameters
clf.fit(iris.data, iris.target)
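
scikit-learn also accepts a plain callable with the signature scorer(estimator, X, y) wherever a scoring argument is expected; that SigOptSearchCV forwards such a callable unchanged is an assumption here. The sketch below writes one out by hand (mean per-class recall) and exercises it on a stand-in estimator. Both balanced_accuracy_scorer and MajorityClassifier are names made up for this example.

```python
from collections import Counter

def balanced_accuracy_scorer(estimator, X, y):
    """Mean per-class recall; higher is better, as scoring callables expect."""
    predictions = estimator.predict(X)
    hits = Counter()
    totals = Counter()
    for truth, guess in zip(y, predictions):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

class MajorityClassifier:
    """Stand-in estimator that always predicts the most common training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

X_demo = [[0], [1], [2], [3]]
y_demo = [0, 0, 0, 1]
model = MajorityClassifier().fit(X_demo, y_demo)

# Recall is 1.0 for class 0 and 0.0 for class 1, so the mean is 0.5
score = balanced_accuracy_scorer(model, X_demo, y_demo)
```

A callable like this would be passed as scoring=balanced_accuracy_scorer in place of f1_scorer.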

XGBoostClassifier

SigOptSearchCV also works with XGBoost's XGBClassifier wrapper. A hyperparameter search over XGBClassifier models can be done using the same interface:

import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
iris = datasets.load_iris()

xgb_params = {
  'learning_rate': (0.01, 0.5),
  'n_estimators': (10, 50),
  'max_depth': (3, 10),
  'min_child_weight': (6, 12),
  'gamma': (0, 0.5),
  'subsample': (0.6, 1.0),
  'colsample_bytree': (0.6, 1.)
}

xgbc = XGBClassifier()

clf = SigOptSearchCV(xgbc, xgb_params, cv=5,
    client_token=client_token, n_jobs=5, n_iter=70, verbose=1)

clf.fit(iris.data, iris.target)

SigOptEnsembleClassifier

This class concurrently trains and tunes several classification models within sklearn to facilitate model selection efforts when investigating new datasets.

You'll need to install the sigopt_sklearn library with the extra requirements of xgboost for this aspect of the library to work:

pip install sigopt_sklearn[ensemble]

A short example using an activity recognition dataset is provided below. We also have a video tutorial outlining how to run this example here:

SigOpt scikit-learn Tutorial

# Human Activity Recognition Using Smartphones
# https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip
unzip UCI\ HAR\ Dataset.zip
cd UCI\ HAR\ Dataset

import numpy as np
import pandas as pd
from sigopt_sklearn.ensemble import SigOptEnsembleClassifier

def load_datafile(filename):
  X = []
  with open(filename, 'r') as f:
    for l in f:
      X.append(np.array([float(v) for v in l.split()]))
  X = np.vstack(X)
  return X

X_train = load_datafile('train/X_train.txt')
y_train = load_datafile('train/y_train.txt').ravel()
X_test = load_datafile('test/X_test.txt')
y_test = load_datafile('test/y_test.txt').ravel()

# fit and tune several classification models concurrently
# find your SigOpt client token here : https://sigopt.com/tokens
sigopt_clf = SigOptEnsembleClassifier()
sigopt_clf.parallel_fit(X_train, y_train, est_timeout=(40 * 60),
    client_token='<YOUR_CLIENT_TOKEN>')

# compare model performance on hold out set
ensemble_train_scores = [est.score(X_train,y_train) for est in sigopt_clf.estimator_ensemble]
ensemble_test_scores = [est.score(X_test,y_test) for est in sigopt_clf.estimator_ensemble]
data = sorted(zip([est.__class__.__name__
                        for est in sigopt_clf.estimator_ensemble], ensemble_train_scores, ensemble_test_scores),
                        reverse=True, key=lambda x: (x[2], x[1]))
pd.DataFrame(data, columns=['Classifier ALGO.', 'Train ACC.', 'Test ACC.'])

CV Fold Timeouts

SigOptSearchCV performs evaluations on CV folds in parallel using joblib. Timeouts are now supported in the master branch of joblib, and SigOpt can use this timeout information to learn to avoid hyperparameter configurations that are too slow:

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
dataset = datasets.fetch_20newsgroups_vectorized()
X = dataset.data
y = dataset.target

# define parameter domains
svc_parameters  = {
  'kernel': ['linear', 'rbf'],
  'C': (0.5, 100),
  'max_iter': (10, 200),
  'tol': (1e-6, 1e-2)
}
svr = svm.SVC()

# SVM fitting can be quite slow, so we set a timeout of 180 seconds
# for each fold fit. SigOpt will then avoid configurations that are too slow
clf = SigOptSearchCV(svr, svc_parameters, cv=5, cv_timeout=180,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
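
The idea behind a per-fold timeout can be illustrated without SigOpt or joblib: run each evaluation with a deadline and report a timeout as a failure rather than a score. This is a standard-library sketch of the concept only. It is not how joblib or sigopt_sklearn implement timeouts, and evaluate_with_timeout, fast_fit and slow_fit are names invented here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def evaluate_with_timeout(func, timeout_seconds):
    """Run func(); return (score, False), or (None, True) if it ran too long."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(func)
        try:
            return future.result(timeout=timeout_seconds), False
        except TimeoutError:
            # A thread cannot be killed, so the worker still runs to completion
            # when the pool shuts down; only its result is discarded.
            return None, True

def fast_fit():
    return 0.9

def slow_fit():
    time.sleep(0.3)  # stands in for a slow hyperparameter configuration
    return 0.95

fast_result = evaluate_with_timeout(fast_fit, timeout_seconds=1.0)
slow_result = evaluate_with_timeout(slow_fit, timeout_seconds=0.05)
```

Reporting the timed-out evaluation as a failure, instead of dropping it silently, is what lets an optimizer learn which regions of the search space are too slow.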

Categoricals

SigOptSearchCV supports categorical parameters specified as a list of strings, as the kernel parameter is in the SVM example:

svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}
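
The examples in this document follow a simple convention: a list (or a dictionary of named values) denotes a categorical parameter, and a two-element tuple denotes a numeric range. The sketch below checks a domain dictionary against that convention before a search is started. describe_domains is a hypothetical helper, not part of sigopt_sklearn, and treating all-integer bounds as an integer parameter is an assumption of this example.

```python
def describe_domains(param_domains):
    """Map each parameter name to 'categorical', 'int' or 'double'."""
    kinds = {}
    for name, domain in param_domains.items():
        if isinstance(domain, (list, dict)):
            if not domain:
                raise ValueError("%s: categorical domain is empty" % name)
            kinds[name] = "categorical"
        elif isinstance(domain, tuple) and len(domain) == 2:
            low, high = domain
            if not low < high:
                raise ValueError("%s: bounds must be (min, max)" % name)
            # Assumption: integer bounds mean an integer-valued parameter
            all_ints = isinstance(low, int) and isinstance(high, int)
            kinds[name] = "int" if all_ints else "double"
        else:
            raise ValueError("%s: unsupported domain %r" % (name, domain))
    return kinds

kinds = describe_domains({
    "kernel": ["linear", "rbf"],
    "C": (0.5, 100),
    "max_iter": (10, 200),
})
```

A check like this catches reversed bounds such as (1e-2, 1e-6) locally, before any experiment is created.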

SigOpt also supports non-string valued categorical parameters, specified as a dictionary that maps a string name to each value. An example is the hidden_layer_sizes parameter in the MLPRegressor example below:

from sklearn.neural_network import MLPRegressor

parameters = {
  'activation': ['relu', 'tanh', 'logistic'],
  'solver': ['lbfgs', 'adam'],
  'alpha': (0.0001, 0.01),
  'learning_rate_init': (0.001, 0.1),
  'power_t': (0.001, 1.0),
  'beta_1': (0.8, 0.999),
  'momentum': (0.001, 1.0),
  'beta_2': (0.8, 0.999),
  'epsilon': (0.00000001, 0.0001),
  'hidden_layer_sizes': {
    'shallow': (100,),
    'medium': (10, 10),
    'deep': (10, 10, 10, 10)
  }
}
nn = MLPRegressor()
clf = SigOptSearchCV(nn, parameters, cv=5, cv_timeout=240,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
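
One way to picture the dictionary form: the optimizer only ever sees the string keys, and each suggested key is swapped for its mapped value before the estimator is configured. The sketch below shows that translation step in isolation. resolve_assignments is a hypothetical helper and the suggestion is made up; neither reproduces the library's internals.

```python
def resolve_assignments(assignments, param_domains):
    """Replace suggested keys of dict-valued domains with their mapped values."""
    resolved = {}
    for name, value in assignments.items():
        domain = param_domains.get(name)
        if isinstance(domain, dict):
            resolved[name] = domain[value]  # e.g. 'deep' -> (10, 10, 10, 10)
        else:
            resolved[name] = value
    return resolved

domains = {
    "activation": ["relu", "tanh", "logistic"],
    "alpha": (0.0001, 0.01),
    "hidden_layer_sizes": {
        "shallow": (100,),
        "medium": (10, 10),
        "deep": (10, 10, 10, 10),
    },
}

# A made-up suggestion, shaped as an optimizer might return it
suggestion = {"activation": "tanh", "alpha": 0.003, "hidden_layer_sizes": "deep"}
estimator_params = resolve_assignments(suggestion, domains)
```

The resolved dictionary can then be applied with the estimator's set_params method, which every scikit-learn estimator provides.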
Comments
  • cross-platform timeout

    Adapted from StackOverflow

    confirmed that toy functions (sleep $n) return either 0 (success) or -9 (pkill) on timeout.

    Does our testing framework include MacOS (which was the motivating use-case for this change)? Or Windows?

    opened by anderson-dan-w 9
  • Allow the option to use Connection objects instead of client tokens w…

    Allow the option to use Connection objects instead of client tokens with SigOptSearchCV. Some code hygiene as well.

    Most important change is https://github.com/sigopt/sigopt_sklearn/compare/local_connections?expand=1#diff-17df7515f991ddca25b43305a4f5874eR276.

    @iandewancker @mikemccourt

    opened by samuela 7
  • Fix broken build

    Last passing test was January 31, 2019. Unfortunately, re-running it causes the build to fail, most likely due to changing underlying dependencies.

    Now tests break with an xgboost error. Since January 31, 2019, versions 0.82 and 0.90 have been released, so try walking back until we see passing tests

    opened by alexandraj777 4
  • SigOpt_sklearn incompatible with scikit-learn 0.21

    To reproduce:

    pip install sigopt_sklearn
    pip install sigopt_sklearn[ensemble]
    pip uninstall scikit-learn
    pip install "scikit-learn>=0.21"
    

    Then, run ensemble example: https://app.sigopt.com/docs/overview/scikit_learn

    Observe that fit_params has been removed as an argument to BaseSearchCV in sklearn version 0.21

    Also note that it appears our current version of this library also requires versions more recent than 0.17. Versions 0.17 and 0.18 ran into errors of missing libraries, as well.

    opened by alexandraj777 1
  • ensemble throwing error of file not found for timeout

    sigopt_clf.parallel_fit(X_work, y_work, est_timeout=(40 * 60),
        client_token=token)
    

    for the ensemble classifier is throwing:

    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
    <ipython-input-10-60218f176dcc> in <module>()
          3 sigopt_clf = SigOptEnsembleClassifier()
          4 sigopt_clf.parallel_fit(X_work, y_work, est_timeout=(40 * 60),
    ----> 5     client_token=token)
    
    /usr/local/lib/python3.6/site-packages/sigopt_sklearn/ensemble.py in parallel_fit(self, X, y, client_token, est_timeout)
         64         "--X_file", build_args['X_file'], "--y_file", build_args['y_file'],
         65         "--client_token", client_token,
    ---> 66         "--output_file", build_args['output_file']
         67       ]))
         68     exit_codes = [p.wait() for p in sigopt_procs]
    
    /usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
        707                                 c2pread, c2pwrite,
        708                                 errread, errwrite,
    --> 709                                 restore_signals, start_new_session)
        710         except:
        711             # Cleanup if the child failed starting.
    
    /usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
       1342                         if errno_num == errno.ENOENT:
       1343                             err_msg += ': ' + repr(err_filename)
    -> 1344                     raise child_exception_type(errno_num, err_msg, err_filename)
       1345                 raise child_exception_type(err_msg)
       1346 
    
    FileNotFoundError: [Errno 2] No such file or directory: 'timeout': 'timeout'
    
    
    opened by geoHeil 1
  • AttributeError: 'NoneType' object has no attribute 'id'

    https://github.com/sigopt/sigopt-sklearn/blob/6d5de20d2fae04cdb1e732e00f5ff83ccb5fa4d1/sigopt_sklearn/search.py#L262

    Creating SigOpt experiment:  XGBClassifier (sklearn)
    Traceback (most recent call last):
      File "/snap/pycharm-professional/254/plugins/python/helpers/pydev/pydevd.py", line 1483, in _exec
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/snap/pycharm-professional/254/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/newdriver/Work/Novelis/SigOpt_Eagle/classification_tester.py", line 25, in <module>
        clf.fit(iris.data, iris.target)
      File "/home/newdriver/Work/Novelis/SigOpt_Eagle/venv/lib/python3.7/site-packages/sigopt_sklearn/search.py", line 450, in fit
        return self._fit(X, y=y, groups=groups, **fit_params)
      File "/home/newdriver/Work/Novelis/SigOpt_Eagle/venv/lib/python3.7/site-packages/sigopt_sklearn/search.py", line 337, in _fit
        self.experiment = self._create_sigopt_exp(self.sigopt_connection)
      File "/home/newdriver/Work/Novelis/SigOpt_Eagle/venv/lib/python3.7/site-packages/sigopt_sklearn/search.py", line 262, in _create_sigopt_exp
        exp_url = 'https://sigopt.com/experiment/{0}'.format(self.experiment.id)
    AttributeError: 'NoneType' object has no attribute 'id'
    
    opened by siyujianNovelis 0
  • Unable to Import Modules

    Hello Team,

    I am running into errors while trying to import modules from the library.

    Import statement: from sigopt_sklearn.search import SigOptSearchCV

    Error:

    ImportError                               Traceback (most recent call last)
    <ipython-input-2-031260a92d5b> in <module>
    ----> 1 from sigopt_sklearn.search import SigOptSearchCV
    
    ~\Anaconda3\envs\sigopt\lib\site-packages\sigopt_sklearn\search.py in <module>
         12 import numpy
         13 from joblib import Parallel, delayed
    ---> 14 from joblib.func_inspect import getfullargspec
         15 
         16 try:
    
    ImportError: cannot import name 'getfullargspec' from 'joblib.func_inspect' (C:\******\Anaconda3\envs\sigopt\lib\site-packages\joblib\func_inspect.py)
    


    Import statement: from sigopt_sklearn.ensemble import SigOptEnsembleClassifier

    Error:

    OSError                                   Traceback (most recent call last)
    <ipython-input-3-1a36f5fb4231> in <module>
    ----> 1 from sigopt_sklearn.ensemble import SigOptEnsembleClassifier
    
    ~\Anaconda3\envs\sigopt\lib\site-packages\sigopt_sklearn\ensemble.py in <module>
         11 
         12 import numpy as np
    ---> 13 from sklearn.base import ClassifierMixin
         14 from sklearn.utils.validation import check_array
         15 
    
    ~\Anaconda3\envs\sigopt\lib\site-packages\sklearn\__init__.py in <module>
         62 else:
         63     from . import __check_build
    ---> 64     from .base import clone
         65     from .utils._show_versions import show_versions
         66 
    
    ~\Anaconda3\envs\sigopt\lib\site-packages\sklearn\base.py in <module>
         10 
         11 import numpy as np
    ---> 12 from scipy import sparse
         13 from .externals import six
         14 from .utils.fixes import signature
    
    ~\Anaconda3\envs\sigopt\lib\site-packages\scipy\__init__.py in <module>
        128 
        129     # Allow distributors to run custom init code
    --> 130     from . import _distributor_init
        131 
        132     from scipy._lib import _pep440
    
    ~\Anaconda3\envs\sigopt\lib\site-packages\scipy\_distributor_init.py in <module>
         57             os.chdir(libs_path)
         58             for filename in glob.glob(os.path.join(libs_path, '*dll')):
    ---> 59                 WinDLL(os.path.abspath(filename))
         60         finally:
         61             os.chdir(owd)
    
    ~\Anaconda3\envs\sigopt\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
        362 
        363         if handle is None:
    --> 364             self._handle = _dlopen(self._name, mode)
        365         else:
        366             self._handle = handle
    
    OSError: [WinError 126] The specified module could not be found
    

    /cc @ethanchaNovelis @amitdingareNovelis

    opened by AbhijeetK-Novelis 1