XGBoost + Optuna

abhishek thakur

Last update: Dec 31, 2022

Related tags

Machine Learning autoxgb

Overview

AutoXGB

XGBoost + Optuna: no brainer

auto train xgboost directly from CSV files
auto tune xgboost using optuna
auto serve best xgboot model using fastapi

NOTE: PRs are currently not accepted. If there are issues/problems, please create an issue.

Installation

Install using pip

pip install autoxgb

Usage

Training a model using AutoXGB is a piece of cake. All you need is some tabular data.

Parameters

###############################################################################
### required parameters
###############################################################################

# path to training data
train_filename = "data_samples/binary_classification.csv"

# path to output folder to store artifacts
output = "output"

###############################################################################
### optional parameters
###############################################################################

# path to test data. if specified, the model will be evaluated on the test data
# and test_predictions.csv will be saved to the output folder
# if not specified, only OOF predictions will be saved
# test_filename = "test.csv"
test_filename = None

# task: classification or regression
# if not specified, the task will be inferred automatically
# task = "classification"
# task = "regression"
task = None

# an id column
# if not specified, the id column will be generated automatically with the name `id`
# idx = "id"
idx = None

# target columns are list of strings
# if not specified, the target column be assumed to be named `target`
# and the problem will be treated as one of: binary classification, multiclass classification,
# or single column regression
# targets = ["target"]
# targets = ["target1", "target2"]
targets = ["income"]

# features columns are list of strings
# if not specified, all columns except `id`, `targets` & `kfold` columns will be used
# features = ["col1", "col2"]
features = None

# categorical_features are list of strings
# if not specified, categorical columns will be inferred automatically
# categorical_features = ["col1", "col2"]
categorical_features = None

# use_gpu is boolean
# if not specified, GPU is not used
# use_gpu = True
# use_gpu = False
use_gpu = True

# number of folds to use for cross-validation
# default is 5
num_folds = 5

# random seed for reproducibility
# default is 42
seed = 42

# number of optuna trials to run
# default is 1000
# num_trials = 1000
num_trials = 100

# time_limit for optuna trials in seconds
# if not specified, timeout is not set and all trials are run
# time_limit = None
time_limit = 360

# if fast is set to True, the hyperparameter tuning will use only one fold
# however, the model will be trained on all folds in the end
# to generate OOF predictions and test predictions
# default is False
# fast = False
fast = False

Python API

To train a new model, you can run:

from autoxgb import AutoXGB


# required parameters:
train_filename = "data_samples/binary_classification.csv"
output = "output"

# optional parameters
test_filename = None
task = None
idx = None
targets = ["income"]
features = None
categorical_features = None
use_gpu = True
num_folds = 5
seed = 42
num_trials = 100
time_limit = 360
fast = False

# Now its time to train the model!
axgb = AutoXGB(
    train_filename=train_filename,
    output=output,
    test_filename=test_filename,
    task=task,
    idx=idx,
    targets=targets,
    features=features,
    categorical_features=categorical_features,
    use_gpu=use_gpu,
    num_folds=num_folds,
    seed=seed,
    num_trials=num_trials,
    time_limit=time_limit,
    fast=fast,
)
axgb.train()

CLI

Train the model using the autoxgb train command. The parameters are same as above.

autoxgb train \
 --train_filename datasets/30train.csv \
 --output outputs/30days \
 --test_filename datasets/30test.csv \
 --use_gpu

You can also serve the trained model using the autoxgb serve command.

autoxgb serve --model_path outputs/mll --host 0.0.0.0 --debug

To know more about a command, run:

`autoxgb <command> --help`

autoxgb train --help


usage: autoxgb <command> [<args>] train [-h] --train_filename TRAIN_FILENAME [--test_filename TEST_FILENAME] --output
                                        OUTPUT [--task {classification,regression}] [--idx IDX] [--targets TARGETS]
                                        [--num_folds NUM_FOLDS] [--features FEATURES] [--use_gpu] [--fast]
                                        [--seed SEED] [--time_limit TIME_LIMIT]

optional arguments:
  -h, --help            show this help message and exit
  --train_filename TRAIN_FILENAME
                        Path to training file
  --test_filename TEST_FILENAME
                        Path to test file
  --output OUTPUT       Path to output directory
  --task {classification,regression}
                        User defined task type
  --idx IDX             ID column
  --targets TARGETS     Target column(s). If there are multiple targets, separate by ';'
  --num_folds NUM_FOLDS
                        Number of folds to use
  --features FEATURES   Features to use, separated by ';'
  --use_gpu             Whether to use GPU for training
  --fast                Whether to use fast mode for tuning params. Only one fold will be used if fast mode is set
  --seed SEED           Random seed
  --time_limit TIME_LIMIT
                        Time limit for optimization

Comments

module 'pyarrow.lib' has no attribute 'MonthDayNanoIntervalArray'

Getting error while using TPS November data on Kaggle conda env (my GPU is on)

https://www.kaggle.com/yogeshkalauni/tps-nov-21-auto-xgboost-error

Getting error while using pip install in Kaggle kernel.

Collecting autoxgb
  Downloading autoxgb-0.2.1-py3-none-any.whl (20 kB)
Collecting scikit-learn==1.0.1
  Downloading scikit_learn-1.0.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (23.2 MB)
     |████████████████████████████████| 23.2 MB 1.3 MB/s eta 0:00:01
Requirement already satisfied: optuna==2.10.0 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (2.10.0)
Collecting pyarrow==6.0.0
  Downloading pyarrow-6.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.5 MB)
     |████████████████████████████████| 25.5 MB 43.9 MB/s eta 0:00:01
Requirement already satisfied: pydantic==1.8.2 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (1.8.2)
Collecting loguru==0.5.3
  Downloading loguru-0.5.3-py3-none-any.whl (57 kB)
     |████████████████████████████████| 57 kB 4.9 MB/s  eta 0:00:01
Collecting xgboost==1.5.0
  Downloading xgboost-1.5.0-py3-none-manylinux2014_x86_64.whl (173.5 MB)
     |████████████████████████████████| 173.5 MB 66 kB/s s eta 0:00:01    |██████████████████              | 97.9 MB 59.6 MB/s eta 0:00:02
Collecting pandas==1.3.4
  Downloading pandas-1.3.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
     |████████████████████████████████| 11.3 MB 46.0 MB/s eta 0:00:01
Requirement already satisfied: fastapi==0.70.0 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (0.70.0)
Requirement already satisfied: uvicorn==0.15.0 in /opt/conda/lib/python3.7/site-packages (from autoxgb) (0.15.0)
Collecting numpy==1.21.3
  Downloading numpy-1.21.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     |████████████████████████████████| 15.7 MB 39.9 MB/s eta 0:00:01
Collecting joblib==1.1.0
  Downloading joblib-1.1.0-py2.py3-none-any.whl (306 kB)
     |████████████████████████████████| 306 kB 39.9 MB/s eta 0:00:01
Requirement already satisfied: starlette==0.16.0 in /opt/conda/lib/python3.7/site-packages (from fastapi==0.70.0->autoxgb) (0.16.0)
Requirement already satisfied: scipy!=1.4.0 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (1.7.1)
Requirement already satisfied: cliff in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (3.9.0)
Requirement already satisfied: colorlog in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (6.5.0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (21.0)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (4.62.3)
Requirement already satisfied: alembic in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (1.7.4)
Requirement already satisfied: cmaes>=0.8.2 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (0.8.2)
Requirement already satisfied: sqlalchemy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (1.4.25)
Requirement already satisfied: PyYAML in /opt/conda/lib/python3.7/site-packages (from optuna==2.10.0->autoxgb) (5.4.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.7/site-packages (from pandas==1.3.4->autoxgb) (2.8.0)
Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas==1.3.4->autoxgb) (2021.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.7/site-packages (from pydantic==1.8.2->autoxgb) (3.10.0.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn==1.0.1->autoxgb) (2.2.0)
Requirement already satisfied: anyio<4,>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from starlette==0.16.0->fastapi==0.70.0->autoxgb) (3.3.0)
Requirement already satisfied: click>=7.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn==0.15.0->autoxgb) (8.0.1)
Requirement already satisfied: asgiref>=3.4.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn==0.15.0->autoxgb) (3.4.1)
Requirement already satisfied: h11>=0.8 in /opt/conda/lib/python3.7/site-packages (from uvicorn==0.15.0->autoxgb) (0.12.0)
Requirement already satisfied: sniffio>=1.1 in /opt/conda/lib/python3.7/site-packages (from anyio<4,>=3.0.0->starlette==0.16.0->fastapi==0.70.0->autoxgb) (1.2.0)
Requirement already satisfied: idna>=2.8 in /opt/conda/lib/python3.7/site-packages (from anyio<4,>=3.0.0->starlette==0.16.0->fastapi==0.70.0->autoxgb) (2.10)
Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from click>=7.0->uvicorn==0.15.0->autoxgb) (4.8.1)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.0->optuna==2.10.0->autoxgb) (2.4.7)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas==1.3.4->autoxgb) (1.16.0)
Requirement already satisfied: greenlet!=0.4.17 in /opt/conda/lib/python3.7/site-packages (from sqlalchemy>=1.1.0->optuna==2.10.0->autoxgb) (1.1.1)
Requirement already satisfied: Mako in /opt/conda/lib/python3.7/site-packages (from alembic->optuna==2.10.0->autoxgb) (1.1.5)
Requirement already satisfied: importlib-resources in /opt/conda/lib/python3.7/site-packages (from alembic->optuna==2.10.0->autoxgb) (5.2.2)
Requirement already satisfied: PrettyTable>=0.7.2 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (2.2.0)
Requirement already satisfied: cmd2>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (2.2.0)
Requirement already satisfied: autopage>=0.4.0 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (0.4.0)
Requirement already satisfied: stevedore>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (3.4.0)
Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from cliff->optuna==2.10.0->autoxgb) (5.6.0)
Requirement already satisfied: colorama>=0.3.7 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (0.4.4)
Requirement already satisfied: attrs>=16.3.0 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (21.2.0)
Requirement already satisfied: pyperclip>=1.6 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (1.8.2)
Requirement already satisfied: wcwidth>=0.1.7 in /opt/conda/lib/python3.7/site-packages (from cmd2>=1.0.0->cliff->optuna==2.10.0->autoxgb) (0.2.5)
Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->click>=7.0->uvicorn==0.15.0->autoxgb) (3.5.0)
Requirement already satisfied: MarkupSafe>=0.9.2 in /opt/conda/lib/python3.7/site-packages (from Mako->alembic->optuna==2.10.0->autoxgb) (2.0.1)
Installing collected packages: numpy, joblib, xgboost, scikit-learn, pyarrow, pandas, loguru, autoxgb
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
  Attempting uninstall: joblib
    Found existing installation: joblib 1.0.1
    Uninstalling joblib-1.0.1:
      Successfully uninstalled joblib-1.0.1
  Attempting uninstall: xgboost
    Found existing installation: xgboost 1.4.2
    Uninstalling xgboost-1.4.2:
      Successfully uninstalled xgboost-1.4.2
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.23.2
    Uninstalling scikit-learn-0.23.2:
      Successfully uninstalled scikit-learn-0.23.2
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 5.0.0
    Uninstalling pyarrow-5.0.0:
      Successfully uninstalled pyarrow-5.0.0
  Attempting uninstall: pandas
    Found existing installation: pandas 1.3.3
    Uninstalling pandas-1.3.3:
      Successfully uninstalled pandas-1.3.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-io 0.18.0 requires tensorflow-io-gcs-filesystem==0.18.0, which is not installed.
explainable-ai-sdk 1.3.2 requires xai-image-widget, which is not installed.
dask-cudf 21.8.3 requires cupy-cuda114, which is not installed.
cudf 21.8.3 requires cupy-cuda110, which is not installed.
beatrix-jupyterlab 3.1.1 requires google-cloud-bigquery-storage, which is not installed.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.3 which is incompatible.
tfx-bsl 1.3.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.14.0 which is incompatible.
tfx-bsl 1.3.0 requires numpy<1.20,>=1.16, but you have numpy 1.21.3 which is incompatible.
tfx-bsl 1.3.0 requires pyarrow<3,>=1, but you have pyarrow 6.0.0 which is incompatible.
tensorflow 2.6.0 requires numpy~=1.19.2, but you have numpy 1.21.3 which is incompatible.
tensorflow 2.6.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
tensorflow 2.6.0 requires typing-extensions~=3.7.4, but you have typing-extensions 3.10.0.2 which is incompatible.
tensorflow-transform 1.3.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.14.0 which is incompatible.
tensorflow-transform 1.3.0 requires numpy<1.20,>=1.16, but you have numpy 1.21.3 which is incompatible.
tensorflow-transform 1.3.0 requires pyarrow<3,>=1, but you have pyarrow 6.0.0 which is incompatible.
tensorflow-io 0.18.0 requires tensorflow<2.6.0,>=2.5.0, but you have tensorflow 2.6.0 which is incompatible.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.3 which is incompatible.
numba 0.54.0 requires numpy<1.21,>=1.17, but you have numpy 1.21.3 which is incompatible.
matrixprofile 1.1.10 requires protobuf==3.11.2, but you have protobuf 3.18.1 which is incompatible.
hypertools 0.7.0 requires scikit-learn!=0.22,<0.24,>=0.19.1, but you have scikit-learn 1.0.1 which is incompatible.
dask-cudf 21.8.3 requires dask<=2021.07.1,>=2021.6.0, but you have dask 2021.9.1 which is incompatible.
dask-cudf 21.8.3 requires pandas<1.3.0dev0,>=1.0, but you have pandas 1.3.4 which is incompatible.
cudf 21.8.3 requires pandas<1.3.0dev0,>=1.0, but you have pandas 1.3.4 which is incompatible.
apache-beam 2.32.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.4 which is incompatible.
apache-beam 2.32.0 requires numpy<1.21.0,>=1.14.3, but you have numpy 1.21.3 which is incompatible.
apache-beam 2.32.0 requires pyarrow<5.0.0,>=0.15.1, but you have pyarrow 6.0.0 which is incompatible.
apache-beam 2.32.0 requires typing-extensions<3.8.0,>=3.7.0, but you have typing-extensions 3.10.0.2 which is incompatible.
Successfully installed autoxgb-0.2.1 joblib-1.1.0 loguru-0.5.3 numpy-1.21.3 pandas-1.3.4 pyarrow-6.0.0 scikit-learn-1.0.1 xgboost-1.5.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

from autoxgb import AutoXGB


# required parameters:
train_filename = "../input/tabular-playground-series-nov-2021/train.csv"
output = "outputt"

# optional parameters
test_filename = '../input/tabular-playground-series-nov-2021/test.csv'
task = 'classification'
idx = None
targets = ["target"]
features = None
categorical_features = None
use_gpu = True
num_folds = 5
seed = 42
num_trials = 100
time_limit = 7*60*60
fast = False

# Now its time to train the model!
axgb = AutoXGB(
    train_filename=train_filename,
    output=output,
    test_filename=test_filename,
    task=task,
    idx=idx,
    targets=targets,
    features=features,
    categorical_features=categorical_features,
    use_gpu=use_gpu,
    num_folds=num_folds,
    seed=seed,
    num_trials=num_trials,
    time_limit=time_limit,
    fast=fast,
)
axgb.train()

2021-11-01 07:03:06.106 | INFO     | autoxgb.autoxgb:__post_init__:42 - Output directory: outputt
2021-11-01 07:03:06.108 | WARNING  | autoxgb.autoxgb:__post_init__:49 - No id column specified. Will default to `id`.
2021-11-01 07:03:06.110 | INFO     | autoxgb.autoxgb:_process_data:149 - Reading training data
2021-11-01 07:03:22.502 | INFO     | autoxgb.utils:reduce_memory_usage:50 - Mem. usage decreased to 117.30 Mb (74.9% reduction)
2021-11-01 07:03:22.583 | INFO     | autoxgb.autoxgb:_determine_problem_type:140 - Problem type: binary_classification
2021-11-01 07:03:38.131 | INFO     | autoxgb.utils:reduce_memory_usage:50 - Mem. usage decreased to 105.06 Mb (74.8% reduction)
2021-11-01 07:03:38.132 | INFO     | autoxgb.autoxgb:_create_folds:58 - Creating folds
2021-11-01 07:03:38.248 | INFO     | autoxgb.autoxgb:_process_data:170 - Encoding target(s)
2021-11-01 07:03:38.282 | INFO     | autoxgb.autoxgb:_process_data:195 - Found 0 categorical features.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_38/3565386527.py in <module>
     37     fast=fast,
     38 )
---> 39 axgb.train()

/opt/conda/lib/python3.7/site-packages/autoxgb/autoxgb.py in train(self)
    244 
    245     def train(self):
--> 246         self._process_data()
    247         best_params = train_model(self.model_config)
    248         logger.info("Training complete")

/opt/conda/lib/python3.7/site-packages/autoxgb/autoxgb.py in _process_data(self)
    210                     test_fold[categorical_features] = ord_encoder.transform(test_fold[categorical_features].values)
    211                 categorical_encoders[fold] = ord_encoder
--> 212             fold_train.to_feather(os.path.join(self.output, f"train_fold_{fold}.feather"))
    213             fold_valid.to_feather(os.path.join(self.output, f"valid_fold_{fold}.feather"))
    214             if self.test_filename is not None:

/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    205                 else:
    206                     kwargs[new_arg_name] = new_arg_value
--> 207             return func(*args, **kwargs)
    208 
    209         return cast(F, wrapper)

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in to_feather(self, path, **kwargs)
   2517         from pandas.io.feather_format import to_feather
   2518 
-> 2519         to_feather(self, path, **kwargs)
   2520 
   2521     @doc(

/opt/conda/lib/python3.7/site-packages/pandas/io/feather_format.py in to_feather(df, path, storage_options, **kwargs)
     44     """
     45     import_optional_dependency("pyarrow")
---> 46     from pyarrow import feather
     47 
     48     if not isinstance(df, DataFrame):

/opt/conda/lib/python3.7/site-packages/pyarrow/feather.py in <module>
     23                          concat_tables, schema)
     24 import pyarrow.lib as ext
---> 25 from pyarrow import _feather
     26 from pyarrow._feather import FeatherError  # noqa: F401
     27 from pyarrow.vendored.version import Version

/opt/conda/lib/python3.7/site-packages/pyarrow/_feather.pyx in init pyarrow._feather()

AttributeError: module 'pyarrow.lib' has no attribute 'MonthDayNanoIntervalArray'

opened by Ankitkalauni 4

TypeError: __init__() got an unexpected keyword argument 'handle_unknown'

Hey I got this error and no idea where it is from. My manual xgb model works without problems with exactly the same dataset.

I have no idea how to use github so sorry if this does not match your standards.

opened by bigcharless 3

AttributeError: dlsym(0x7fd108ca6760, XGDMatrixCreateFromDense): symbol not found

As per the subject, I am getting the error when I am running in local:


2021-11-01 15:45:04.651 | INFO     | autoxgb.autoxgb:__post_init__:42 - Output directory: output3
2021-11-01 15:45:04.652 | WARNING  | autoxgb.autoxgb:__post_init__:49 - No id column specified. Will default to `id`.
2021-11-01 15:45:04.653 | INFO     | autoxgb.autoxgb:_process_data:149 - Reading training data
2021-11-01 15:45:04.885 | INFO     | autoxgb.utils:reduce_memory_usage:48 - Mem. usage decreased to 2.19 Mb (76.0% reduction)
2021-11-01 15:45:04.891 | INFO     | autoxgb.autoxgb:_determine_problem_type:140 - Problem type: multi_class_classification
2021-11-01 15:45:04.892 | INFO     | autoxgb.autoxgb:_create_folds:58 - Creating folds
2021-11-01 15:45:04.922 | INFO     | autoxgb.autoxgb:_process_data:170 - Encoding target(s)
2021-11-01 15:45:04.931 | INFO     | autoxgb.autoxgb:_process_data:195 - Found 0 categorical features.
2021-11-01 15:45:05.054 | INFO     | autoxgb.autoxgb:_process_data:236 - Model config: train_filename='train.csv' test_filename=None idx='id' targets=['label'] problem_type=<ProblemType.multi_class_classification: 2> output='output3' features=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'y1', 'z1', 'z2', 'z3', 'z4'] num_folds=5 use_gpu=False seed=42 categorical_features=[] num_trials=100 time_limit=360 fast=False
2021-11-01 15:45:05.054 | INFO     | autoxgb.autoxgb:_process_data:237 - Saving model config
2021-11-01 15:45:05.055 | INFO     | autoxgb.autoxgb:_process_data:241 - Saving encoders
[I 2021-11-01 15:45:05,230] A new study created in RDB with name: autoxgb
[W 2021-11-01 15:45:05,339] Trial 0 failed because of the following error: AttributeError('dlsym(0x7fd108ca6760, XGDMatrixCreateFromDense): symbol not found')
Traceback (most recent call last):
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/autoxgb/utils.py", line 172, in optimize
    model.fit(
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/core.py", line 506, in inner_f
    return f(**kwargs)
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/sklearn.py", line 1231, in fit
    train_dmatrix, evals = _wrap_evaluation_matrices(
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/sklearn.py", line 286, in _wrap_evaluation_matrices
    train_dmatrix = create_dmatrix(
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/sklearn.py", line 1245, in <lambda>
    create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/core.py", line 506, in inner_f
    return f(**kwargs)
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/core.py", line 616, in __init__
    handle, feature_names, feature_types = dispatch_data_backend(
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/data.py", line 707, in dispatch_data_backend
    return _from_pandas_df(data, enable_categorical, missing, threads,
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/data.py", line 299, in _from_pandas_df
    return _from_numpy_array(data, missing, nthread, feature_names,
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/data.py", line 179, in _from_numpy_array
    _LIB.XGDMatrixCreateFromDense(
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/Users/A124661/opt/anaconda3/envs/deep_py38/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: dlsym(0x7fd108ca6760, XGDMatrixCreateFromDense): symbol not found
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/pp/ym01m3sx0hg3my_gzpsdl8680000gp/T/ipykernel_728/1462055845.py in <module>
     16     fast=fast,
     17 )
---> 18 axgb.train()

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/autoxgb/autoxgb.py in train(self)
    245     def train(self):
    246         self._process_data()
--> 247         best_params = train_model(self.model_config)
    248         logger.info("Training complete")
    249         self.predict(best_params)

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/autoxgb/utils.py in train_model(model_config)
    211         load_if_exists=True,
    212     )
--> 213     study.optimize(optimize_func, n_trials=model_config.num_trials, timeout=model_config.time_limit)
    214     return study.best_params
    215 

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/optuna/study/study.py in optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    398             )
    399 
--> 400         _optimize(
    401             study=self,
    402             func=func,

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     64     try:
     65         if n_jobs == 1:
---> 66             _optimize_sequential(
     67                 study,
     68                 func,

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    161 
    162         try:
--> 163             trial = _run_trial(study, func, catch)
    164         except Exception:
    165             raise

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    262 
    263     if state == TrialState.FAIL and func_err is not None and not isinstance(func_err, catch):
--> 264         raise func_err
    265     return trial
    266 

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
    211 
    212     try:
--> 213         value_or_values = func(trial)
    214     except exceptions.TrialPruned as e:
    215         # TODO(mamu): Handle multi-objective cases.

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/autoxgb/utils.py in optimize(trial, xgb_model, use_predict_proba, eval_metric, model_config)
    170 
    171         else:
--> 172             model.fit(
    173                 xtrain,
    174                 ytrain,

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    504         for k, arg in zip(sig.parameters, args):
    505             kwargs[k] = arg
--> 506         return f(**kwargs)
    507 
    508     return inner_f

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
   1229 
   1230         model, feval, params = self._configure_fit(xgb_model, eval_metric, params)
-> 1231         train_dmatrix, evals = _wrap_evaluation_matrices(
   1232             missing=self.missing,
   1233             X=X,

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/sklearn.py in _wrap_evaluation_matrices(missing, X, y, group, qid, sample_weight, base_margin, feature_weights, eval_set, sample_weight_eval_set, base_margin_eval_set, eval_group, eval_qid, create_dmatrix, enable_categorical, label_transform)
    284 
    285     """
--> 286     train_dmatrix = create_dmatrix(
    287         data=X,
    288         label=label_transform(y),

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/sklearn.py in <lambda>(**kwargs)
   1243             eval_group=None,
   1244             eval_qid=None,
-> 1245             create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
   1246             enable_categorical=self.enable_categorical,
   1247             label_transform=label_transform,

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    504         for k, arg in zip(sig.parameters, args):
    505             kwargs[k] = arg
--> 506         return f(**kwargs)
    507 
    508     return inner_f

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, group, qid, label_lower_bound, label_upper_bound, feature_weights, enable_categorical)
    614             return
    615 
--> 616         handle, feature_names, feature_types = dispatch_data_backend(
    617             data,
    618             missing=self.missing,

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/data.py in dispatch_data_backend(data, missing, threads, feature_names, feature_types, enable_categorical)
    705         return _from_tuple(data, missing, threads, feature_names, feature_types)
    706     if _is_pandas_df(data):
--> 707         return _from_pandas_df(data, enable_categorical, missing, threads,
    708                                feature_names, feature_types)
    709     if _is_pandas_series(data):

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/data.py in _from_pandas_df(data, enable_categorical, missing, nthread, feature_names, feature_types)
    297     data, feature_names, feature_types = _transform_pandas_df(
    298         data, enable_categorical, feature_names, feature_types)
--> 299     return _from_numpy_array(data, missing, nthread, feature_names,
    300                              feature_types)
    301 

~/opt/anaconda3/envs/deep_py38/lib/python3.8/site-packages/xgboost/data.py in _from_numpy_array(data, missing, nthread, feature_names, feature_types)
    177     config = bytes(json.dumps(args), "utf-8")
    178     _check_call(
--> 179         _LIB.XGDMatrixCreateFromDense(
    180             _array_interface(data),
    181             config,

~/opt/anaconda3/envs/deep_py38/lib/python3.8/ctypes/__init__.py in __getattr__(self, name)
    384         if name.startswith('__') and name.endswith('__'):
    385             raise AttributeError(name)
--> 386         func = self.__getitem__(name)
    387         setattr(self, name, func)
    388         return func

~/opt/anaconda3/envs/deep_py38/lib/python3.8/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
    389 
    390     def __getitem__(self, name_or_ordinal):
--> 391         func = self._FuncPtr((name_or_ordinal, self))
    392         if not isinstance(name_or_ordinal, int):
    393             func.__name__ = name_or_ordinal

AttributeError: dlsym(0x7fd108ca6760, XGDMatrixCreateFromDense): symbol not found

opened by chandrad 2

Add single column regression.
add example for single-column regression (I think we should name it single-output and multi-output because single/multi "column" is a bit misleading)

add dummy test data for multi col regression (exactly same as train data with targets removed)

some minor changes in setup.py
opened by INF800 0
Use `suggest_float` instead of `suggest_loguniform`.
Thank you for your great project! I just found a point I could contribute although I'm not sure if this repository receives pull requests. Please feel free to close the PR if you don't need it.

Motivation

Trial.suggest_loguniform will be deprecated according to https://github.com/optuna/optuna/issues/2939.

It recommends Trial.suggest_float(..., log=True) instead of it.

Changes

This PR simply replaces Trial.suggest_loguniform with Trial.suggest_float.
opened by toshihikoyanase 0
TypeError: argument of type 'method' is not iterable

Hi, I try autoxgb but very quickly it returns that error:

2022-03-08 09:22:21.269 | INFO | autoxgb.autoxgb:post_init:42 - Output directory: output2 2022-03-08 09:22:21.276 | WARNING | autoxgb.autoxgb:post_init:49 - No id column specified. Will default to id. 2022-03-08 09:22:21.283 | INFO | autoxgb.autoxgb:_process_data:149 - Reading training data

TypeError Traceback (most recent call last) in () 37 fast=fast, 38 ) ---> 39 axgb.train() 10 frames /usr/local/lib/python3.7/dist-packages/autoxgb/autoxgb.py in train(self) 244 245 def train(self): --> 246 self._process_data() 247 best_params = train_model(self.model_config) 248 logger.info("Training complete")

/usr/local/lib/python3.7/dist-packages/autoxgb/autoxgb.py in _process_data(self) 148 def _process_data(self): 149 logger.info("Reading training data") --> 150 train_df = pd.read_csv(self.train_filename) 151 train_df = reduce_memory_usage(train_df) 152 problem_type = self._determine_problem_type(train_df)

/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs) 309 stacklevel=stacklevel, 310 ) --> 311 return func(*args, **kwargs) 312 313 return wrapper

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options) 584 kwds.update(kwds_defaults) 585 --> 586 return _read(filepath_or_buffer, kwds) 587 588

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py in _read(filepath_or_buffer, kwds) 480 481 # Create the parser. --> 482 parser = TextFileReader(filepath_or_buffer, **kwds) 483 484 if chunksize or iterator:

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py in init(self, f, engine, **kwds) 809 self.options["has_index_names"] = kwds["has_index_names"] 810 --> 811 self._engine = self._make_engine(self.engine) 812 813 def close(self):

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py in _make_engine(self, engine) 1038 ) 1039 # error: Too many arguments for "ParserBase" -> 1040 return mapping[engine](self.f, **self.options) # type: ignore[call-arg] 1041 1042 def _failover_to_python(self):

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/c_parser_wrapper.py in init(self, src, **kwds) 49 50 # open handles ---> 51 self._open_handles(src, kwds) 52 assert self.handles is not None 53

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/base_parser.py in _open_handles(self, src, kwds) 227 memory_map=kwds.get("memory_map", False), 228 storage_options=kwds.get("storage_options", None), --> 229 errors=kwds.get("encoding_errors", "strict"), 230 ) 231

/usr/local/lib/python3.7/dist-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options) 583 584 # read_csv does not know whether the buffer is opened in binary/text mode --> 585 if _is_binary_mode(path_or_buf, mode) and "b" not in mode: 586 mode += "b" 587

/usr/local/lib/python3.7/dist-packages/pandas/io/common.py in _is_binary_mode(handle, mode) 960 # classes that expect bytes 961 binary_classes = (BufferedIOBase, RawIOBase) --> 962 return isinstance(handle, binary_classes) or "b" in getattr(handle, "mode", mode)

TypeError: argument of type 'method' is not iterable

Don't know why..??? Thank you all for your help

My code: from autoxgb import AutoXGB

required parameters:

train_filename = df.iloc[:round(df.shape[0]*.8)] output = "output2"

optional parameters

test_filename = None task = None idx = df.index targets = ["Goal"] features = None categorical_features = None use_gpu = False num_folds = 5 seed = 42 num_trials = 100 time_limit = 360 fast = False

Now its time to train the model!

axgb = AutoXGB( train_filename=train_filename, output=output, test_filename=test_filename, task=task, idx=idx, targets=targets, features=features, categorical_features=categorical_features, use_gpu=use_gpu, num_folds=num_folds, seed=seed, num_trials=num_trials, time_limit=time_limit, fast=fast, ) axgb.train()

opened by ludomare 1
Time_limit . Doesn't stop the training

I have been dealing with TPS in Kaggle and I have tried auto xgboost. I have set the time limit to 3600*4. But the training didn't stop at 4 hours. Now is at 6.5 hours and still going. Is anything I am doing wrong?

ps. the first trial took 4 hours to complete

opened by Kyrpel 0
best params and model
Hi,

Thanks for building a very useful package. I have two simple questions:

How come only the following params are tuned: {'colsample_bytree': 0.18270180565544739, 'early_stopping_rounds': 401, 'learning_rate': 0.013529250923369278, 'max_depth': 6, 'n_estimators': 20000, 'reg_alpha': 0.0019387086612090178, 'reg_lambda': 5.879563892375361e-08, 'subsample': 0.8925701729066172} what about gamma and other xgBoost parameters? Are they assumed to be default values?

How do I access the best model from the output directory? I plugged in the above best params in my xgb model, but didn't get the same result as autoxgb result showed. Is there a way to access these models and/or the best model in the output directory, so I can run the model on any data to see the results?

thank you so much, any help will be greatly appreciated.

p.s. any docs on how to use the output files? There are lot of useful info there, but don't know how to access them smartly.
opened by mchen172 0

Owner

abhishek thakur

Kaggle: www.kaggle.com/abhishek

GitHub

The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

9 Sep 9, 2022

Tools for Optuna, MLflow and the integration of both.

HPOflow - Sphinx DOC Tools for Optuna, MLflow and the integration of both. Detailed documentation with examples can be found here: Sphinx DOC Table of

17 Nov 20, 2022

LightGBM + Optuna: no brainer

AutoLGBM LightGBM + Optuna: no brainer auto train lightgbm directly from CSV files auto tune lightgbm using optuna auto serve best lightgbm model usin

22 Dec 15, 2022

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Auto_TS: Auto_TimeSeries Automatically build multiple Time Series models using a Single Line of Code. Now updated with Dask. Auto_timeseries is a comp

519 Jan 3, 2023

Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Call of Duty World League: Search & Destroy Outcome Predictions Growing up as an avid Call of Duty player, I was always curious about what factors led

2 Jan 18, 2022

Mortality risk prediction for COVID-19 patients using XGBoost models

Mortality risk prediction for COVID-19 patients using XGBoost models Using demographic and lab test data received from the HM Hospitales in Spain, I b

1 Jan 19, 2022

XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.

92 Dec 14, 2022

Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022

An AutoML Library made with Optuna and PyTorch Lightning

An AutoML Library made with Optuna and PyTorch Lightning Installation Recommended pip install -U gradsflow From source pip install git+https://github.

294 Dec 17, 2022

easyopt is a super simple yet super powerful optuna-based Hyperparameters Optimization Framework that requires no coding.

9 Feb 4, 2022

The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

9 Sep 9, 2022

Tools for Optuna, MLflow and the integration of both.

HPOflow - Sphinx DOC Tools for Optuna, MLflow and the integration of both. Detailed documentation with examples can be found here: Sphinx DOC Table of

17 Nov 20, 2022

Numerai tournament example scripts using NN and optuna

numerai_NN_example Numerai tournament example scripts using pytorch NN, lightGBM and optuna https://numer.ai/tournament Performance of my model based

12 Oct 10, 2022

LightGBM + Optuna: no brainer

AutoLGBM LightGBM + Optuna: no brainer auto train lightgbm directly from CSV files auto tune lightgbm using optuna auto serve best lightgbm model usin

22 Dec 15, 2022

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl

6.1k Jan 5, 2023

Improving XGBoost survival analysis with embeddings and debiased estimators

xgbse: XGBoost Survival Embeddings "There are two cultures in the use of statistical modeling to reach conclusions from data

242 Dec 30, 2022

A fast xgboost feature selection algorithm

BoostARoota A Fast XGBoost Feature Selection Algorithm (plus other sklearn tree-based classifiers) Why Create Another Algorithm? Automated processes l

187 Dec 22, 2022

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Auto_TS: Auto_TimeSeries Automatically build multiple Time Series models using a Single Line of Code. Now updated with Dask. Auto_timeseries is a comp

519 Jan 3, 2023

Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Call of Duty World League: Search & Destroy Outcome Predictions Growing up as an avid Call of Duty player, I was always curious about what factors led

2 Jan 18, 2022

Mortality risk prediction for COVID-19 patients using XGBoost models

Mortality risk prediction for COVID-19 patients using XGBoost models Using demographic and lab test data received from the HM Hospitales in Spain, I b

1 Jan 19, 2022

XGBoost + Optuna

Related tags

Overview

AutoXGB

Installation

Usage

Parameters

Python API

CLI

Comments

Motivation

Changes

2022-03-08 09:22:21.269 | INFO | autoxgb.autoxgb:post_init:42 - Output directory: output2 2022-03-08 09:22:21.276 | WARNING | autoxgb.autoxgb:post_init:49 - No id column specified. Will default to id. 2022-03-08 09:22:21.283 | INFO | autoxgb.autoxgb:_process_data:149 - Reading training data

required parameters:

optional parameters

Now its time to train the model!

Owner

abhishek thakur

The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

Tools for Optuna, MLflow and the integration of both.

LightGBM + Optuna: no brainer

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Mortality risk prediction for COVID-19 patients using XGBoost models

XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.

Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

An AutoML Library made with Optuna and PyTorch Lightning

easyopt is a super simple yet super powerful optuna-based Hyperparameters Optimization Framework that requires no coding.

The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

Tools for Optuna, MLflow and the integration of both.

Numerai tournament example scripts using NN and optuna

LightGBM + Optuna: no brainer

Improving XGBoost survival analysis with embeddings and debiased estimators

A fast xgboost feature selection algorithm

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Mortality risk prediction for COVID-19 patients using XGBoost models

2022-03-08 09:22:21.269 | INFO | autoxgb.autoxgb:post_init:42 - Output directory: output2 2022-03-08 09:22:21.276 | WARNING | autoxgb.autoxgb:post_init:49 - No id column specified. Will default to `id`. 2022-03-08 09:22:21.283 | INFO | autoxgb.autoxgb:_process_data:149 - Reading training data