hgboost - Hyperoptimized Gradient Boosting
Star it if you like it!
hgboost is short for Hyperoptimized Gradient Boosting. It is a Python package for hyperparameter optimization of xgboost, catboost, and lightboost using cross-validation, evaluating the results on an independent validation set. hgboost can be applied to both classification and regression tasks.
hgboost is fun because:
* 1. Hyperoptimizes the parameter space using a Bayesian approach.
* 2. Determines the best scoring model(s) using k-fold cross-validation.
* 3. Evaluates the best model on an independent validation set.
* 4. Fits the best model on the entire input dataset.
* 5. Works for both classification and regression.
* 6. Creates a super-hyperoptimized model by ensembling all individually optimized models.
* 7. Returns the model, search space, and test/evaluation results.
* 8. Makes insightful plots.
Documentation
- API Documentation: https://erdogant.github.io/hgboost/
- Github: https://github.com/erdogant/hgboost/
Schematic overview of hgboost
Installation Environment
- Install hgboost from PyPI (recommended). hgboost is compatible with Python 3.6+ and runs on Linux, macOS, and Windows.
- A new environment is recommended and can be created as follows:
conda create -n env_hgboost python=3.6
conda activate env_hgboost
Install the latest version of hgboost from PyPI
pip install hgboost
Force an update to the latest version
pip install -U hgboost
Install from the GitHub source
pip install git+https://github.com/erdogant/hgboost#egg=master
Import hgboost package
import hgboost
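To verify the installation, print the installed version. The __version__ attribute is an assumption based on common Python packaging convention and is not confirmed in the hgboost docs:
print(hgboost.__version__)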
Classification example for xgboost, catboost and lightboost:
# Load library
from hgboost import hgboost
# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)
# Import the example dataset (Titanic)
df = hgb.import_example()
# Recode the target into descriptive string labels
y = df['Survived'].values
y = y.astype(str)
y[y=='1']='survived'
y[y=='0']='dead'
# Remove the target from the features and encode the remaining variables
del df['Survived']
X = hgb.preprocessing(df)
# Fit catboost by hyperoptimization and cross-validation
results = hgb.catboost(X, y, pos_label='survived')
# Fit lightboost by hyperoptimization and cross-validation
results = hgb.lightboost(X, y, pos_label='survived')
# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')
# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100% |----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evaluate best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameter settings.
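The returned results object is a dictionary with the optimization output. A minimal sketch for inspecting it; the 'params' key name is an assumption based on the hgboost documentation, so check results.keys() for your installed version:
# Inspect the keys of the returned results dictionary
print(results.keys())
# Best hyperparameters found during the search ('params' key is an assumption)
print(results['params'])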
# Plot searched parameter space
hgb.plot_params()
# Plot summary results
hgb.plot()
# Plot the best tree
hgb.treeplot()
# Plot the validation results
hgb.plot_validation()
# Plot the cross-validation results
hgb.plot_cv()
# Use the trained model to make new predictions.
y_pred, y_proba = hgb.predict(X)
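As a quick sanity check, the predictions can be scored with scikit-learn. A minimal sketch, assuming scikit-learn is installed; note that scoring on the same X that was used for fitting gives an optimistic estimate:
# Sketch: compare the predicted labels against the true labels (in-sample, thus optimistic)
from sklearn.metrics import accuracy_score
print('Accuracy: %.3f' % accuracy_score(y, y_pred))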
Create an ensemble model for classification
from hgboost import hgboost
hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)
# Import data
df = hgb.import_example()
y = df['Survived'].values
del df['Survived']
X = hgb.preprocessing(df, verbose=0)
results = hgb.ensemble(X, y, pos_label=1)
# use the predictor
y_pred, y_proba = hgb.predict(X)
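The class probabilities can be scored as well. A minimal sketch, assuming y_proba follows the scikit-learn predict_proba convention with one column per class and the positive class last; verify the column order for your hgboost version:
# Sketch: AUC of the ensemble for the positive class (column order is an assumption)
import numpy as np
from sklearn.metrics import roc_auc_score
proba_pos = y_proba[:, -1] if np.ndim(y_proba) == 2 else y_proba
print('AUC: %.3f' % roc_auc_score(y, proba_pos))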
Create an ensemble model for regression
from hgboost import hgboost
import numpy as np
hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)
# Import the example dataset (Titanic)
df = hgb.import_example()
y = df['Age'].values
del df['Age']
# Keep only the samples with a known target value
I = ~np.isnan(y)
X = hgb.preprocessing(df, verbose=0)
X = X.loc[I,:]
y = y[I]
results = hgb.ensemble(X, y, methods=['xgb_reg','ctb_reg','lgb_reg'])
# use the predictor
y_pred, y_proba = hgb.predict(X)
# Plot the ensemble regression validation results
hgb.plot_validation()
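The regression predictions can be scored in the same way. A minimal sketch with scikit-learn; as before, scoring on the data used for fitting is optimistic:
# Sketch: in-sample mean absolute error of the regression ensemble
from sklearn.metrics import mean_absolute_error
print('MAE: %.2f years' % mean_absolute_error(y, y_pred))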
References
* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
Licence
- See LICENSE for details.
Coffee
- If you wish to buy me a coffee for this work, it is very much appreciated :)