Hummingbird compiles trained ML models into tensor computation for faster inference.

Microsoft

Last update: Dec 30, 2022

Overview

Hummingbird

Introduction

Hummingbird is a library for compiling trained traditional ML models into tensor computations. Hummingbird allows users to seamlessly leverage neural network frameworks (such as PyTorch) to accelerate traditional ML models. Thanks to Hummingbird, users can benefit from: (1) all the current and future optimizations implemented in neural network frameworks; (2) native hardware acceleration; (3) a single platform supporting both traditional and neural network models; and (4) all of this without having to re-engineer their models.

Currently, you can use Hummingbird to convert your trained traditional ML models into PyTorch, TorchScript, ONNX, and TVM. Hummingbird supports a variety of ML models and featurizers. These models include scikit-learn Decision Trees and Random Forests, as well as LightGBM and XGBoost classifiers and regressors. Support for other neural network backends and models is on our roadmap.

Hummingbird also provides a convenient uniform "inference" API following the Sklearn API. This allows swapping Sklearn models with Hummingbird-generated ones without having to change the inference code. By converting the models to PyTorch and TorchScript it also becomes possible to serve them using TorchServe.

How Hummingbird Works

Hummingbird works by reconfiguring algorithmic operators so that we can perform more regular computations, which are amenable to vectorized and GPU execution. Each operator is slightly different, and we incorporate multiple strategies. This example explains one of Hummingbird's strategies for translating a decision tree into tensors involving GEMM (GEneral Matrix Multiply), where we implement the traversal of the tree using matrix multiplications. (GEMM is one of the three tree conversion strategies we currently support.)

Simple decision tree

In this example, the decision tree has four decision nodes (orange), and five leaf nodes (blue). The tree takes a feature vector with five elements as input. For example, assume that we want to calculate the output of this observation:

Step 1: Multiply the input tensor with tensor A (computed from the decision tree model above), which captures the relationship between input features and internal nodes. Then compare the result with tensor B, which holds the threshold value of each internal node (orange), to create the input path tensor that represents the path from the input to the internal nodes. In this case, the tree model has 4 conditions and the input vector has 5 features; therefore, the shape of tensor A is 5x4 and that of tensor B is 1x4.

Step 2: The input path tensor is multiplied with tensor C, which captures, for each pair of internal node and leaf node, whether the internal node is an ancestor of the leaf and, if so, whether the leaf lies in its left or right subtree (left = 1, right = -1, otherwise 0). The result is then checked for equality against tensor D, which holds, for each leaf, the number of left-child steps on the path from the tree root to that leaf. This creates the output path tensor that represents the path from the internal nodes to the output. In this case, the tree model has 5 leaves and 4 conditions; therefore, the shape of tensor C is 4x5 and that of tensor D is 1x5.

Step 3: The output path tensor is multiplied with tensor E, which maps each leaf node to its prediction value, to infer the final prediction. In this case, the tree model has 5 leaves; therefore, the shape of tensor E is 5x1.
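The three steps can be reproduced on a toy tree with NumPy. This is a minimal sketch, not Hummingbird's implementation: the tree's splits, thresholds and leaf values below are hypothetical (the figure's actual tree cannot be recovered from the text), and we assume that a `<` test sends a sample to the left child.

```python
import numpy as np

# Hypothetical tree with 4 internal nodes (N0..N3) and 5 leaves (leaf0..leaf4):
#   N0: x[0] < 0.5 ? N1 : N2
#   N1: x[2] < 2.0 ? leaf0 : leaf1
#   N2: x[1] < 1.0 ? N3 : leaf4
#   N3: x[4] < 3.0 ? leaf2 : leaf3

# A (5 features x 4 internal nodes): A[f, n] = 1 if node n tests feature f.
A = np.zeros((5, 4))
A[0, 0] = A[2, 1] = A[1, 2] = A[4, 3] = 1.0
# B (1 x 4): threshold of each internal node.
B = np.array([[0.5, 2.0, 1.0, 3.0]])
# C (4 internal nodes x 5 leaves): +1 if the leaf is in the node's left subtree,
# -1 if it is in the right subtree, 0 if the node is not an ancestor of the leaf.
C = np.array([
    [1,  1, -1, -1, -1],
    [1, -1,  0,  0,  0],
    [0,  0,  1,  1, -1],
    [0,  0,  1, -1,  0],
], dtype=float)
# D (1 x 5): number of left-child steps on the root-to-leaf path of each leaf.
D = np.array([[2, 1, 2, 1, 0]], dtype=float)
# E (5 leaves x 1): hypothetical prediction value stored in each leaf.
E = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])


def gemm_predict(X):
    """Evaluate the tree for every row of X using only matrix products and comparisons."""
    input_path = (X @ A < B).astype(float)             # Step 1: 1 where a node's condition holds
    output_path = (input_path @ C == D).astype(float)  # Step 2: one-hot vector selecting the leaf
    return output_path @ E                             # Step 3: value of the selected leaf


X = np.array([
    [0.2, 5.0, 1.0, 0.0, 9.0],  # N0 left, N1 left -> leaf0
    [0.9, 0.5, 0.0, 0.0, 4.0],  # N0 right, N2 left, N3 right -> leaf3
])
print(gemm_predict(X).ravel())
```

Every row of the batch goes through the same three dense operations, with no data-dependent branching, which is what makes this formulation suitable for vectorized and GPU execution.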

And now Hummingbird has compiled a tree-based model using the GEMM strategy! For more details, please see Figure 3 of our paper.

Thank you to Chien Vu for contributing the graphics and descriptions in his blog for this example!

Installation

Hummingbird was tested on Python >= 3.6 on Linux, Windows and MacOS machines. (Python 3.5 is supported up to hummingbird-ml==0.2.1.) It is recommended to use a virtual environment (see: python3 venv doc or Using Python environments in VS Code).

Hummingbird requires PyTorch >= 1.4.0. Please go here for instructions on how to install PyTorch based on your platform and hardware.
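Before installing, you can check the stated minimums (Python >= 3.6, PyTorch >= 1.4.0) from Python itself. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper names are hypothetical, and the hand-rolled version parser is a simplification (the `packaging` library's `Version` class handles more cases).

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def version_tuple(version: str) -> tuple:
    """Simplified parser: '1.13.1+cu117' -> (1, 13, 1); short versions are padded to 3 parts."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # drop suffixes such as 'rc1' in '0rc1'
            digits += ch
        if not digits:
            break  # stop at pieces such as 'post2' or 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def torch_meets_minimum(minimum: str = "1.4.0") -> bool:
    """True if a 'torch' distribution is installed with a version >= minimum."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


print("Python >= 3.6:", sys.version_info >= (3, 6))
print("PyTorch >= 1.4.0:", torch_meets_minimum())
```

Comparing integer tuples rather than raw strings avoids the classic mistake where "1.10" sorts before "1.4".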

Once PyTorch is installed, you can get Hummingbird from pip with:

pip install hummingbird-ml

If you require the optional dependencies lightgbm and xgboost, you can use:

pip install hummingbird-ml[extra]

See also Troubleshooting for common problems.

Examples

See the notebooks section for examples that demonstrate use and speedups.

In general, Hummingbird syntax is very intuitive and minimal. To run your traditional ML model on DNN frameworks, you only need to import hummingbird.ml and add convert(model, 'dnn_framework') to your code. Below is an example using a scikit-learn random forest model and PyTorch as target framework.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert, load

# Create some random data for binary classification
num_classes = 2
X = np.random.rand(100000, 28)
y = np.random.randint(num_classes, size=100000)

# Create and train a model (scikit-learn RandomForestClassifier in this case)
skl_model = RandomForestClassifier(n_estimators=10, max_depth=10)
skl_model.fit(X, y)

# Use Hummingbird to convert the model to PyTorch
model = convert(skl_model, 'pytorch')

# Run predictions on CPU
model.predict(X)

# Run predictions on GPU (requires a CUDA-enabled PyTorch build and a GPU)
model.to('cuda')
model.predict(X)

# Save the model
model.save('hb_model')

# Load the model back
model = load('hb_model')

Documentation

The API documentation is here.

You can also read about Hummingbird in our blog post here.

For more details on the vision and on the technical details related to Hummingbird, please check our papers:

Tensors: An abstraction for general data processing. Dimitrios Koutsoukos, Supun Nakandala, Konstantinos Karanasos, Karla Saur, Gustavo Alonso, Matteo Interlandi. PVLDB 2021.
A Tensor Compiler for Unified Machine Learning Prediction Serving. Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi. OSDI 2020.
Compiling Classical ML Pipelines into Tensor Computations for One-size-fits-all Prediction Serving. Supun Nakandala, Gyeong-In Yu, Markus Weimer, Matteo Interlandi. System for ML Workshop. NeurIPS 2019

Contributing

We welcome contributions! Please see the guide on Contributing.

Also, see our roadmap of planned features.

Community

Join our community!

For more formal enquiries, you can contact us.

Authors

Supun Nakandala
Matteo Interlandi
Karla Saur

License

MIT License

Comments

[WIP] Add sklearn's HistGradientBoosting
Closes #64:

This PR adds sklearn's HistGradientBoostingClassifier to hummingbird.

It modifies the two functions convert_sklearn_gbdt_classifier() and get_parameters_for_sklearn_common() to handle the HistGradientBoostingClassifier model attributes.

It also updates the documentation to include the new HistGradientBoostingClassifier.
opened by ahmedkrmn 27
Support lightgbm >= 3

setup.py requires lightgbm < 3. I'm using the current version of lightgbm, so hummingbird reports that it's not installed. Is there a blocking incompatibility, or is the version requirement just out of date?

opened by memeplex 18

AttributeError: 'XGBClassifier' object has no attribute 'raw_operator'

Code:

hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config={"n_features":18})

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/f2/9tbmpg411hndwc482xn850br0000gn/T/ipykernel_2889/1708670718.py in <module>
      1 # Use Hummingbird to convert the model to PyTorch
----> 2 hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config={"n_features":18})

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in convert_sklearn_xgb_classifier(operator, device, extra_config)
    102              Please pass "n_features:N" as extra configuration to the converter or fill a bug report.'
    103         )
--> 104     tree_infos = operator.raw_operator.get_booster().get_dump()
    105     n_classes = operator.raw_operator.n_classes_
    106 

AttributeError: 'XGBClassifier' object has no attribute 'raw_operator'

XGB version: 1.6.1
Hummingbird version: 0.4.7

Any idea about this issue? What other configurations are required to make this work?

opened by dintellect 17

Add float64 support in hummingbird
Addresses issue #51

This PR:

Add support for float64 input data based on discussion in issue #51

It appears PyTorch expects both the inputs and the model parameters (weights) to have the same data type. The weights are float32, while in this failure case the input is float64. To make the types match, it is easier (and computationally more efficient) to downcast the inputs.

Add a few tests around the same change.

Pending:

I did validate other supported operators as well (Sklearn GBDT, HGBDT; XGB, etc). Should I add tests for them as well?

I feel we can update the notebooks (and other examples) as well removing the casting from float64 to float32. This makes the example simpler and less mysterious. Should I do that?

Validated that we can now use both float32 and float64 data.
opened by KranthiGV 16
Fixing test flakiness

Hi,

The test test_tree_regressors_multioutput_regression is flaky: it failed 41 out of the 3,000 times I ran it. It seems the absolute difference can be much higher than the current threshold (1e-5).

Empirically, I observed a maximum absolute difference of up to 3.7. Based on my experiments, the 99th percentile seems to be close to 4.5. Hence, I set the bound accordingly.

Please let me know if this makes sense. I am assuming there are no bugs in this case. I would be happy to incorporate any other suggestions that you may have.

Thanks!
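The percentile-based bound described in this PR can be computed from repeated runs of the test. This is a minimal sketch with NumPy: the input is assumed to hold one maximum absolute difference per run, and the function name is hypothetical, not part of the Hummingbird test suite.

```python
import numpy as np


def percentile_tolerance(max_abs_diffs, q=99.0):
    """Return the q-th percentile of the per-run maximum absolute differences.

    Roughly (100 - q)% of the observed runs exceed this value, so a test using
    it as a tolerance would still fail at about that rate.
    """
    diffs = np.asarray(max_abs_diffs, dtype=float)
    if diffs.size == 0:
        raise ValueError("need at least one run")
    return float(np.percentile(diffs, q))


print(percentile_tolerance(np.arange(101)))
```

A percentile bound trades strictness for stability; a bound at or above the largest observed difference is the only choice that would have passed every recorded run.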

opened by sleepy-owl 14
pip install of version 0.2.0 doesn't work
I tried to install the new hummingbird 0.2.0 release in a new venv, and the installation failed. Either the hummingbird wheel specifies a PyTorch version that seems to have a broken wheel, or the hummingbird wheel is missing the full PyTorch installation command (which is: pip install torch===1.7.0 torchvision===0.8.1 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html). It's also not possible to get this exact install command from the official PyTorch website, which only offers versions 1.7.1 and 1.6.0, so you cannot install the required PyTorch version just by looking there.

Right now you have to find the pip install torch===1.7.0 torchvision===0.8.1 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html command somewhere on the internet and install this specific version before you can install hummingbird-ml.

Collecting hummingbird-ml
  Downloading hummingbird_ml-0.2.0-py2.py3-none-any.whl (151 kB)
Collecting numpy<=1.19.4,>=1.15
  Using cached numpy-1.19.4-cp37-cp37m-win_amd64.whl (12.9 MB)
Collecting onnxconverter-common<=1.7.0,>=1.6.0
  Using cached onnxconverter_common-1.7.0-py2.py3-none-any.whl (64 kB)
Collecting scikit-learn<=0.23.2,>=0.21.3
  Using cached scikit_learn-0.23.2-cp37-cp37m-win_amd64.whl (6.8 MB)
Collecting joblib>=0.11
  Downloading joblib-1.0.0-py3-none-any.whl (302 kB)
Collecting scipy>=0.19.1
  Downloading scipy-1.6.0-cp37-cp37m-win_amd64.whl (32.5 MB)
Collecting threadpoolctl>=2.0.0
  Using cached threadpoolctl-2.1.0-py3-none-any.whl (12 kB)
Collecting torch<=1.7.0,>=1.4.*
  Downloading torch-1.7.0-cp37-cp37m-win_amd64.whl (184.0 MB)
ERROR: torch has an invalid wheel, .dist-info directory not found
opened by speedfreakw 13
float64 issue

At the moment, HB only works with float32. You must cast float64 to float32 for correct results (with the GEMM algorithm you will get an error). We need to fix this.

Ex: in scikit-learn-random-forest-example.ipynb we must cast X as follows: X = X[0:nrows].astype('|f4')
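The cast in this workaround can be wrapped in a small helper so it happens once, before prediction. This is a minimal sketch, assuming NumPy input; the function name is hypothetical and not part of Hummingbird. Note that `'|f4'` is NumPy's type string for a 4-byte float, i.e. `np.float32`.

```python
import numpy as np


def as_float32_input(X):
    """Return X as float32 if it is float64; leave every other dtype unchanged.

    Hypothetical helper: the converted models hold float32 weights, so float64
    input is downcast here instead of at every call site.
    """
    X = np.asarray(X)
    if X.dtype == np.float64:
        return X.astype(np.float32)
    return X


X = np.random.rand(4, 3)               # NumPy creates float64 by default
X32 = as_float32_input(X)
print(X.dtype, "->", X32.dtype)
print(np.dtype("|f4") == np.float32)   # '|f4' names the same type as float32
```

Downcasting keeps roughly 7 significant decimal digits, which is the precision the float32 weights already have.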

opened by ksaur 13
Add delete location bool param to PyTorchContainer load() call

What? Adds delete_unzip_location_folder param to load() method of PyTorchSklearnContainer to avoid implicit deletion of model artifact supplied to load() method. The changes in this PR are backward compatible.

Why? Related issue: https://github.com/microsoft/hummingbird/issues/557

opened by akshjain83 12
Performance Issues Using the TVM Backend

I am running inference on an XGBoost model using Hummingbird on a desktop CPU target. I've installed Pytorch, Hummingbird and TVM into a conda environment (TVM was built from source and linked to llvm-10). I have trained models serialized into XGBoost JSONs and am creating XGBoost sklearn objects from them. I am able to compile these models using Hummingbird and run them. My python code to run on PyTorch is as follows:

xgb_model = xgb.XGBRegressor()
xgb_model.load_model(model_json)
hb_model = convert(xgb_model, 'pytorch')
# ...
pred = hb_model.predict(batch)

My python code to compile the model using the TVM backend is the following:

xgb_model = xgb.XGBRegressor()
xgb_model.load_model(model_json)
hb_model = convert(xgb_model, 'tvm', test_input=inputs[0:batch_size])
# ...
pred = hb_model.predict(batch)

Is this the right way to compile models (especially using TVM)? Do I need to enable any other TVM features (eg. auto-tuning) through the Hummingbird API? For my models, the performance of the inference with both the PyTorch and the TVM backends are roughly the same which makes me think that I may be using the TVM backend wrong. I was able to verify that the predict methods are computing the correct predictions.

opened by asprasad 11
KernelPCA + PyTorch

Hello, I'm trying to use the GPU with the PyTorch backend to speed up a Kernel PCA operation. However, when I convert to PyTorch, the .transform() function takes about 9x longer to run. Additionally, I'm not seeing any GPU utilization at all:

Sklearn: 0.8 seconds
PyTorch + CPU: 7.8 seconds
PyTorch + GPU (supposedly, but again, not seeing any GPU utilization): 7.9 seconds

Would it be possible for you to look into this? Have already checked that CUDA is configured correctly with torch.cuda.is_available(). Thanks!
bug

opened by carolinemckee 11
CUDA out of memory

Hi there,

When I test Hummingbird on GPU with more and deeper trees (it works fine with fewer trees), I get an OOM error:

"CUDA out of memory. Tried to allocate 5.96 GiB (GPU 0; 11.91 GiB total capacity; 6.47 GiB already allocated; 4.74 GiB free; 6.48 GiB reserved in total by PyTorch)"

I'm using:
CUDA 10.1
PyTorch 1.5.1+cu101
RandomForestRegressor from sklearn
total number of samples: 800k
number of features: 100
number of trees: 1000
average tree depth: 15
average number of nodes per tree: 3700

Is it expected? Any idea how to make it work for more and deeper trees?

Thanks!
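One common way to lower peak memory at inference time is to score the data in chunks rather than in a single call. This is a minimal sketch, assuming a model whose `predict` accepts a NumPy row slice and returns an array; `predict_in_batches` is a hypothetical helper, not a Hummingbird API. It only shrinks the per-call intermediate tensors: if the converted model's own parameters do not fit in GPU memory, batching will not help.

```python
import numpy as np


def predict_in_batches(model, X, batch_size=10_000):
    """Run model.predict on consecutive row chunks of X and concatenate the results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    if len(X) == 0:
        return model.predict(X)
    chunks = [
        # Each call only materializes intermediates sized for batch_size rows.
        model.predict(X[start:start + batch_size])
        for start in range(0, len(X), batch_size)
    ]
    return np.concatenate(chunks, axis=0)
```

With the converted model from this report, usage would look like `preds = predict_in_batches(hb_model, X, batch_size=50_000)`; the right batch size depends on the GPU and the model, so it has to be found by trial.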

opened by zhanjiezhu 11

MissingConverter: Unable to find converter

---------------------------------------------------------------------------
MissingConverter                          Traceback (most recent call last)
/var/folders/f2/9tbmpg411hndwc482xn850br0000gn/T/ipykernel_27005/3005074338.py in <module>
----> 1 hb_model = convert(clf, 'torch',X_train[0:1])

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in convert(model, backend, test_input, device, extra_config)
    442     """
    443     assert constants.REMAINDER_SIZE not in extra_config
--> 444     return _convert_common(model, backend, test_input, device, extra_config)
    445 
    446 

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_common(model, backend, test_input, device, extra_config)
    403         return _convert_sparkml(model, backend_formatted, test_input, device, extra_config)
    404 
--> 405     return _convert_sklearn(model, backend_formatted, test_input, device, extra_config)
    406 
    407 

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_sklearn(model, backend, test_input, device, extra_config)
    106     # We modify the scikit learn model during translation.
    107     model = deepcopy(model)
--> 108     topology = parse_sklearn_api_model(model, extra_config)
    109 
    110     # Convert the Topology object into a PyTorch model.

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_parse.py in parse_sklearn_api_model(model, extra_config)
     63     # Parse the input scikit-learn model into a topology object.
     64     # Get the outputs of the model.
---> 65     outputs = _parse_sklearn_api(topology, model, inputs)
     66 
     67     # Declare output variables.

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_parse.py in _parse_sklearn_api(topology, model, inputs)
    228     tmodel = type(model)
    229     if tmodel in sklearn_api_parsers_map:
--> 230         outputs = sklearn_api_parsers_map[tmodel](topology, model, inputs)
    231     else:
    232         outputs = _parse_sklearn_single_model(topology, model, inputs)

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_parse.py in _parse_sklearn_pipeline(topology, model, inputs)
    274     """
    275     for step in model.steps:
--> 276         inputs = _parse_sklearn_api(topology, step[1], inputs)
    277     return inputs
    278 

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_parse.py in _parse_sklearn_api(topology, model, inputs)
    228     tmodel = type(model)
    229     if tmodel in sklearn_api_parsers_map:
--> 230         outputs = sklearn_api_parsers_map[tmodel](topology, model, inputs)
    231     else:
    232         outputs = _parse_sklearn_single_model(topology, model, inputs)

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_parse.py in _parse_sklearn_column_transformer(topology, model, inputs)
    451                 )
    452         else:
--> 453             var_out = _parse_sklearn_api(topology, model_obj, transform_inputs)[0]
    454             if model.transformer_weights is not None and name in model.transformer_weights:
    455                 # Create a Multiply node

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_parse.py in _parse_sklearn_api(topology, model, inputs)
    230         outputs = sklearn_api_parsers_map[tmodel](topology, model, inputs)
    231     else:
--> 232         outputs = _parse_sklearn_single_model(topology, model, inputs)
    233 
    234     return outputs

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_parse.py in _parse_sklearn_single_model(topology, model, inputs)
    250         raise RuntimeError("Parameter model must be an object not a " "string '{0}'.".format(model))
    251 
--> 252     alias = get_sklearn_api_operator_name(type(model))
    253     this_operator = topology.declare_logical_operator(alias, model)
    254     this_operator.inputs = inputs

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/supported.py in get_sklearn_api_operator_name(model_type)
    463     """
    464     if model_type not in sklearn_api_operator_name_map:
--> 465         raise MissingConverter("Unable to find converter for model type {}.".format(model_type))
    466     return sklearn_api_operator_name_map[model_type]
    467 

MissingConverter: Unable to find a converter for model type <class 'sklearn.preprocessing._encoders.OrdinalEncoder'>.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter implemented.
Please fill an issue at https://github.com/microsoft/hummingbird.

Which scikit-learn pipeline operators does Hummingbird support?

enhancement

opened by dintellect 3

FLOPs counting for the converted model

Could you please share some suggestions on FLOPs counting for the converted model?

I have tried:
thop: https://github.com/Lyken17/pytorch-OpCounter
flop-counter: https://github.com/sovrasov/flops-counter.pytorch
pthflops: https://github.com/1adrianb/pytorch-estimate-flops
torchprofile: https://github.com/zhijian-liu/torchprofile
deepspeed: https://www.deepspeed.ai/tutorials/flops-profiler/

It seems none of them supports counting FLOPs for the converted models. Any advice would be highly appreciated, thanks!

opened by ChuniHiro 1

Kernel crashing while converting

RF was previously defined.

import os
from hummingbird.ml import convert

# Can crash sometimes
# Convert model to PyTorch with Hummingbird-ml
RFconv = convert(RF, 'pytorch')

# Save the converted model
RFconv.save(os.path.join(ResultsFolder, "RFmodel_V3"))

When I run this on macOS, it crashes the Jupyter kernel; this doesn't happen on Windows.

opened by EmanuelCastanho 4

Need new test dataset for xgb

Build workflow run

==================================== ERRORS ====================================
_______________ ERROR collecting tests/test_xgboost_converter.py _______________
ImportError while importing test module '/home/runner/work/hummingbird/hummingbird/tests/test_xgboost_converter.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_xgboost_converter.py:8: in <module>
    from sklearn.datasets import load_boston
/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/site-packages/sklearn/datasets/__init__.py:156: in __getattr__
    raise ImportError(msg)
E   ImportError: 
E   `load_boston` has been removed from scikit-learn since version 1.2.

opened by ksaur 0

Should SKLearn operators be assumed to produce a single output?

See https://github.com/microsoft/hummingbird/blob/main/hummingbird/ml/_parse.py#L256

Consider models that implement both predict and predict_proba; these return both labels and probabilities as outputs. With the current logic, we cannot name the outputs in the Hummingbird conversion step (i.e., with the output_names argument to extra_config) and instead have to perform some ONNX graph surgery afterwards.

opened by stillmatic 0

Releases(v0.4.7)

v0.4.7(Nov 29, 2022)
What's Changed

This patch release fixes a bug in ONNX conversion and allows support of varying batch sizes by @stillmatic in https://github.com/microsoft/hummingbird/pull/654

Fixes deprecations in https://github.com/microsoft/hummingbird/pull/655

New Contributors

Thank you to @stillmatic for catching and fixing this bug (https://github.com/microsoft/hummingbird/issues/653) and for the additional maintenance work!

Full Changelog: https://github.com/microsoft/hummingbird/compare/v0.4.6...v0.4.7
Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.7.tar.gz(533.74 KB)
hummingbird_ml-0.4.7-py2.py3-none-any.whl(158.37 KB)
v0.4.6(Nov 10, 2022)
What's Changed

Features

Adds Lasso, Ridge and ElasticNet by @fd0r in https://github.com/microsoft/hummingbird/pull/625

Added support for more decision conditions in trees and ONNX conversion by @grafail in https://github.com/microsoft/hummingbird/pull/631

Add support for Tweedie, Poisson and Gamma regressors by @interesaaat in https://github.com/microsoft/hummingbird/pull/650

Maintenance and fixes

Remove pinned version for onnxconverter-common by @interesaaat in https://github.com/microsoft/hummingbird/pull/618

Bump action/cache to v3 by @mshr-h in https://github.com/microsoft/hummingbird/pull/619

Fix deprecation warnings for sklearn/scipy by @mshr-h in https://github.com/microsoft/hummingbird/pull/610

deprecating email due to spammers.... by @ksaur in https://github.com/microsoft/hummingbird/pull/621

updating ubuntu version by @ksaur in https://github.com/microsoft/hummingbird/pull/623

Fix linear models conversion when fit_intercept set to False by @RomanBredehoft in https://github.com/microsoft/hummingbird/pull/630

Corrects some typos by @RomanBredehoft in https://github.com/microsoft/hummingbird/pull/628

allow derived types of DataFrame by @liangfu in https://github.com/microsoft/hummingbird/pull/637

[TVM] Unify load params interface by @liangfu in https://github.com/microsoft/hummingbird/pull/640

Update TVM to 0.10 by @mshr-h in https://github.com/microsoft/hummingbird/pull/642

Update to pytorch 1.13 by @interesaaat in https://github.com/microsoft/hummingbird/pull/646

Update actions/checkout and actions/setup-python by @mshr-h in https://github.com/microsoft/hummingbird/pull/647

update codecov/codecov-action to v3 by @mshr-h in https://github.com/microsoft/hummingbird/pull/648

New Contributors

@fd0r made their first contribution in https://github.com/microsoft/hummingbird/pull/625

@RomanBredehoft made their first contribution in https://github.com/microsoft/hummingbird/pull/630

@grafail made their first contribution in https://github.com/microsoft/hummingbird/pull/631

@liangfu made their first contribution in https://github.com/microsoft/hummingbird/pull/637

Special Thanks

Thank you to @mshr-h for the continued support!!

Full Changelog: https://github.com/microsoft/hummingbird/compare/v0.4.5...v0.4.6
Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.6.tar.gz(532.37 KB)
hummingbird_ml-0.4.6-py2.py3-none-any.whl(158.58 KB)
v0.4.5(Aug 5, 2022)
What's Changed

Update _decomposition_implementations.py by @interesaaat in https://github.com/microsoft/hummingbird/pull/578

revise example in readme by @vumichien in https://github.com/microsoft/hummingbird/pull/579

Fix problem with new onnx and protobuf by @interesaaat in https://github.com/microsoft/hummingbird/pull/582

Bump TVM to v0.8 by @mshr-h in https://github.com/microsoft/hummingbird/pull/581

Fix the things broken by SKL==1.1.1 by @ksaur in https://github.com/microsoft/hummingbird/pull/588

Use onnxmltools>=1.6.0,<=1.11.0 instead of onnxmltools>=1.6.0 by @mshr-h in https://github.com/microsoft/hummingbird/pull/592

Deprecating Python3.7; updating sklearn-onnx version by @ksaur in https://github.com/microsoft/hummingbird/pull/593

deprecate torch1.7, push to macOS11 by @ksaur in https://github.com/microsoft/hummingbird/pull/594

Fix doc gen by @ksaur in https://github.com/microsoft/hummingbird/pull/596

Fix broken link by @mshr-h in https://github.com/microsoft/hummingbird/pull/599

Fixing documentation generation bug by @ksaur in https://github.com/microsoft/hummingbird/pull/598

Update Dockerfile by @mshr-h in https://github.com/microsoft/hummingbird/pull/601

Use a Microsoft compliant image for docker by @interesaaat in https://github.com/microsoft/hummingbird/pull/602

testing torch==1.12 by @ksaur in https://github.com/microsoft/hummingbird/pull/603

Use n_features_in_ instead of n_features_ by @mshr-h in https://github.com/microsoft/hummingbird/pull/604

Use python -m pip instead of the pip executable by @mshr-h in https://github.com/microsoft/hummingbird/pull/605

check for Sklearn model NotFitted before conversion by @SangamSwadiK in https://github.com/microsoft/hummingbird/pull/607

Upgrade prophet to v1.1 by @mshr-h in https://github.com/microsoft/hummingbird/pull/608

Add Python 3.10 by @mshr-h in https://github.com/microsoft/hummingbird/pull/586

Bump TVM to v0.9 by @mshr-h in https://github.com/microsoft/hummingbird/pull/609

Pinning onnxconverter-common to avoid dep by @ksaur in https://github.com/microsoft/hummingbird/pull/614

Use pre-installed LLVM for building TVM on macOS-11 by @mshr-h in https://github.com/microsoft/hummingbird/pull/615

New Contributors

@vumichien made their first contribution in https://github.com/microsoft/hummingbird/pull/579

@mshr-h made their first contribution in https://github.com/microsoft/hummingbird/pull/581

@SangamSwadiK made their first contribution in https://github.com/microsoft/hummingbird/pull/607

Special thanks

Extra special thanks to @mshr-h for the work maintaining our version dependencies and pipeline!

Full Changelog: https://github.com/microsoft/hummingbird/compare/v0.4.4...v0.4.5
Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.5.tar.gz(531.49 KB)
hummingbird_ml-0.4.5-py2.py3-none-any.whl(158.65 KB)
v0.4.4(Apr 25, 2022)
This minor release includes bug fixes for performance and external dependency updates.

What's Changed

Verified compatibility with Torch 1.11 released March 10 by @ksaur in https://github.com/microsoft/hummingbird/pull/572

Fix xgboost tests for xgb > 1.6.0 by @interesaaat in https://github.com/microsoft/hummingbird/pull/576

Fix for KernelPCA using GPU by @interesaaat in https://github.com/microsoft/hummingbird/pull/575

Full Changelog: https://github.com/microsoft/hummingbird/compare/v0.4.3...v0.4.4

New Contributors

Thanks to @carolinemckee for the bug report on KPCA #574

Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.4.tar.gz(93.60 KB)
hummingbird_ml-0.4.4-py2.py3-none-any.whl(177.22 KB)
v0.4.3(Mar 11, 2022)
This minor release includes bug fixes and external dependency updates.

What's Changed

Fixed a problem with pandas with the latest xgb models in https://github.com/microsoft/hummingbird/pull/562

Minor changes to tests related to verifying that new versions of torch, scikit-learn, and Python3.9 work in #554, #560

allow tree_implementation="gemm" with onnx backend by @jfrery in https://github.com/microsoft/hummingbird/pull/566

New Contributors

@jfrery made their first contribution in https://github.com/microsoft/hummingbird/pull/566

@shubh0508 contributed a bug fix in https://github.com/microsoft/hummingbird/pull/568

Full Changelog: https://github.com/microsoft/hummingbird/compare/v0.4.2...v0.4.3
Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.3.tar.gz(93.58 KB)
hummingbird_ml-0.4.3-py2.py3-none-any.whl(177.22 KB)
v0.4.2(Dec 14, 2021)
This minor release includes a new operator, some improvements, and some fixes due to external dependency updates.

New Operator:

Added support for PLSRegressor (#549)

Improvements:

Better delete (with location) for saved models (#558)

Doc updates: Installation instructions for Fedora-based distros (#543) and readme update (#550)

External dependency management:

Hummingbird now works with scikit-learn==1.0.x (#545) and the current version of scipy (scipy==1.7.x) (#552)

Note that there is currently an open issue (#556) with Onnxruntime and torch==1.10.x on Linux/Mac that causes the tests to hang (for static length dimensions) and fail (for dynamic dimensions). This will be fixed in a subsequent release.

Credits:

Thanks to @akshjain83 and @bibhabasumohapatra for their contributions!
Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.2.tar.gz(93.24 KB)
hummingbird_ml-0.4.2-py2.py3-none-any.whl(176.85 KB)
v0.4.1(Aug 31, 2021)
This patch release includes a new operator and some documentation fixes.

New Operator:

Added support for lightgbm booster (#540)

Documentation Improvements:

Added TorchServe support to documentation (#537)

Updated sklearn_year_with_train.ipynb (#532)

Credits:

Thanks to @marsupialtail, @ananiask8, and @parulnith for their contributions!
Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.1.tar.gz(92.76 KB)
hummingbird_ml-0.4.1-py2.py3-none-any.whl(176.47 KB)
v0.4.0(Jun 22, 2021)
This release includes integration with Prophet, bug/usability fixes, and versioning fixes.

We are excited to announce trend prediction support for Prophet! See our notebook for examples.

New features:

Prophet integration (trend prediction) (#519)

Bug fixes/Usability fixes:

Fix several numerical precision issues in tree-based models (#511)

Better assertions and error messages (#514/#521)

Versioning fixes:

Remove constraint on dependencies version (#523)

Special thanks:

Extra special thanks to @xadupre for helping us resolve library dependency issues

Thanks to @pnhathuy07 for helping us clean up our asserts

Thanks to @scnakandala for the ongoing contributions

Source code(tar.gz)
Source code(zip)
hummingbird-ml-0.4.0.tar.gz(92.41 KB)
hummingbird_ml-0.4.0-py2.py3-none-any.whl(176.28 KB)
v0.3.1(Apr 24, 2021)
This patch release includes several improvements related to load/store of a model:

Improvements:

Better error messages on load/save problems (#499)

Auto-clean temp directory on load/save (#502)

Switched from dill to pickle for model load/save, enabling the use of Spark broadcast to share the model (#498)
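Both load/save changes above can be illustrated with the standard library alone. The sketch below is hypothetical and is not Hummingbird's save/load code: it pickles a stand-in model object (the `TinyModel` class and `save_and_reload` helper are assumptions for illustration) into a `tempfile.TemporaryDirectory`, which is deleted automatically when the `with` block exits, and shows that the pickled object round-trips. PySpark serializes broadcast values with pickle, which is presumably why a pickle-friendly format matters for #498.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical stand-in for a converted model (not a Hummingbird class)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        # Linear score per row: dot(weights, row) + bias.
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias for row in rows]


def save_and_reload(model):
    """Pickle `model` into a temp dir, load it back, and report whether the dir still exists."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "model.pkl")
        with open(path, "wb") as f:
            pickle.dump(model, f)
        with open(path, "rb") as f:
            restored = pickle.load(f)
    # The directory (and model.pkl inside it) is removed once the `with` block exits.
    return restored, os.path.exists(tmp_dir)


model = TinyModel([0.5, -1.0], 2.0)
restored, dir_still_exists = save_and_reload(model)
print(restored.predict([[2.0, 1.0]]))  # [2.0]
print(dir_still_exists)  # False
```

Using a context-managed temporary directory means cleanup happens even if loading raises, which is the behavior "auto-clean" describes.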

Other Notes:

The most recent release of libomp appears to have broken the pipeline for macOS with Python 3.6 and 3.7. We fixed this by pinning to an older version of libomp (#500). While not directly related to Hummingbird, this information may be useful to macOS users on older Python versions who want to use the Hummingbird LightGBM converters.

Credits:

Thanks to @dbanda for feedback on and testing of our load/store.
hummingbird-ml-0.3.1.tar.gz(90.53 KB)
hummingbird_ml-0.3.1-py2.py3-none-any.whl(173.68 KB)
v0.3.0 (Apr 13, 2021)
This release includes many new operators, version upgrades, and minor bug fixes.

New Features:

ONNX-ML Imputer (#459)

SKL Bagging Classifier/Regressor (#490)

SKL GridSearchCV and RandomizedSearchCV (#476)

SKL KMeans (#472)

SKL MeanShift (#473)

SKL StackingClassifier and StackingRegressor (#471)

SKL RidgeCV and LinearSVR (#470)

New example notebooks (#462) (#461)

Versioning changes:

Bumped numpy from 1.19.4 to <=1.20.*

Bumped torch to 1.8.1

Notable bug fixes:

Fixed an error when a pandas DataFrame is passed as input to predict but no test_input is provided (#487)

Credits:

Thanks to our first-time contributor @rathijit for fixes to the README.
hummingbird-ml-0.3.0.tar.gz(89.54 KB)
hummingbird_ml-0.3.0-py2.py3-none-any.whl(172.17 KB)
v0.2.2 (Mar 9, 2021)
This release includes several bug fixes, version upgrades, and minor feature upgrades.

Versioning changes:

Upgrading to Torch 1.8.0

Deprecating Python 3.5

Notable bug fixes:

Fix saving to PyTorch for ONNX models with trees (#425)

Fix bug with degenerate trees (#426)

Fix bloated models when saving (#429)

Also included are fixes to test flakiness and error messages.

Feature upgrades:

Add support for modified Huber loss in SGD (#415)

Simplify the discretizer logic (#448)
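For context on #415: in scikit-learn, `SGDClassifier(loss="modified_huber")` supports `predict_proba`, and in the binary case the positive-class probability is the decision score clipped to [-1, 1] and rescaled to [0, 1]. The function below is a pure-Python sketch of that formula for the binary case only; the name `modified_huber_proba` is hypothetical, and this is not Hummingbird's converter, which operates on tensors.

```python
def modified_huber_proba(scores):
    """Binary-case probabilities from decision scores, following scikit-learn's
    modified_huber formula: p(positive) = (clip(score, -1, 1) + 1) / 2."""
    probs = []
    for s in scores:
        p_pos = (min(max(s, -1.0), 1.0) + 1.0) / 2.0
        probs.append([1.0 - p_pos, p_pos])  # [negative class, positive class]
    return probs


print(modified_huber_proba([-3.0, 0.0, 0.5, 2.0]))
# [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
```

Because the formula is just a clip and an affine rescale, it maps directly onto elementwise tensor operations.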

Credits:

Thanks to our first-time contributors @qin-xiong and @sleepy-owl
hummingbird-ml-0.2.2.tar.gz(86.85 KB)
hummingbird_ml-0.2.2-py2.py3-none-any.whl(167.38 KB)
v0.2.1 (Jan 4, 2021)
Patch release

Version bump:

Upgraded from torch 1.7.0 to torch 1.7.1

Also included:

Refactoring of '_container' structure

TVM loading

TVM padding

hummingbird-ml-0.2.1.tar.gz(83.56 KB)
hummingbird_ml-0.2.1-py2.py3-none-any.whl(161.28 KB)
v0.2.0 (Dec 29, 2020)
Announcing: TVM Support

This release adds support for TVM (#236), giving us our fastest speeds yet!

New Features

Adding save/load features for models (#399)

BatchContainer for the batch-by-batch prediction use case (#377)

Native support for string features (#396)

Add batch_benchmark option to do benchmark on a single batch (#369)
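The BatchContainer entry (#377) concerns running prediction over a large input one fixed-size batch at a time. The sketch below shows the general pattern in plain Python; `predict_in_batches` and its `batch_size` parameter are hypothetical names, not Hummingbird's API, and the real container works on converted tensor models. Full batches are processed first, followed by a smaller remainder batch when the input length is not a multiple of the batch size.

```python
def predict_in_batches(predict_fn, rows, batch_size):
    """Apply `predict_fn` to consecutive slices of `rows` and concatenate the results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    outputs = []
    batch_sizes = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]  # the last slice may be shorter (the remainder)
        batch_sizes.append(len(batch))
        outputs.extend(predict_fn(batch))
    return outputs, batch_sizes


def double(batch):
    # Toy stand-in for a model's predict: one output per input row.
    return [2 * x for x in batch]


preds, sizes = predict_in_batches(double, list(range(10)), batch_size=4)
print(preds)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(sizes)  # [4, 4, 2]
```

Keeping the batch size fixed for all but the last call is useful when a compiled model is specialized to a particular input shape.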

New Operators

Binarizer for ONNX-ML (#353)

Feature Vectorizer for ONNX-ML (#385)

Label Encoder for ONNX-ML and SKL (#374)

Credits

Thank you to @scnakandala and @masahi for their ongoing contributions!
hummingbird-ml-0.2.0.tar.gz(82.18 KB)
hummingbird_ml-0.2.0-py2.py3-none-any.whl(147.81 KB)
v0.1.0 (Oct 30, 2020)
Announcing: Integration with PySpark.ML (pyspark)

In this release, we added PySpark.ML support, which will open new doors for collaboration in the Spark space! (#310)

So far, we support Bucketizer, VectorAssembler, and LogisticRegressionModel. We look forward to adding more operators soon!

Announcing: Pandas Dataframes Support

This release also adds support for Pandas Dataframes both at conversion time and inference time (#300).

New Features

We added benchmarks (#328, #330, #331) from our OSDI paper

We also have a variety of features and improvements to the user experience:

Added the ability to set the number of threads in the model container (#319)

Removed the requirement to provide input schemas for ONNX models (#334)

Added support for ONNX models with multiple inputs (#339)

Added batching to output containers (#323)

New Operators

scikit-learn KNeighbors Classifier (#296) and Regressor (#303)

Credits

Thank you to @scnakandala for SparkML and the additional scikit-learn operators! Thank you to @vumichien for contributing the README diagrams.
hummingbird-ml-0.1.0.tar.gz(73.08 KB)
hummingbird_ml-0.1.0-py2.py3-none-any.whl(113.84 KB)
v0.0.6 (Sep 10, 2020)
This release adds a huge batch of new operators to scikit-learn, including pipeline support! It also includes bug fixes.

New Features

Basic support for scikit-learn Pipelines, including FeatureUnion and ColumnTransformer (#251)
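Conceptually, a scikit-learn Pipeline applies each transformer in order and then the final estimator, while FeatureUnion runs several transformers on the same input and concatenates their outputs column-wise. The sketch below mimics those semantics with hypothetical plain-Python step functions (`pipeline`, `feature_union`, `scale`, `square`, and `threshold` are all illustrative names) so the composition is visible. It uses neither scikit-learn nor Hummingbird; a converter would instead translate each step into tensor operations and chain them.

```python
def pipeline(steps, final_predict):
    """Return a predict function that runs `steps` (each maps a list of rows to a list of rows)
    in order, then applies `final_predict` to the transformed rows."""
    def predict(rows):
        for step in steps:
            rows = step(rows)
        return final_predict(rows)
    return predict


def feature_union(*transformers):
    """Run each transformer on the same rows and concatenate their outputs per row."""
    def transform(rows):
        parts = [t(rows) for t in transformers]
        # zip(*parts) pairs up the outputs for each row; sum(..., []) concatenates the lists.
        return [sum(row_parts, []) for row_parts in zip(*parts)]
    return transform


# Hypothetical steps, for illustration only.
def scale(rows):
    return [[x / 10.0 for x in row] for row in rows]


def square(rows):
    return [[x * x for x in row] for row in rows]


def threshold(rows):
    return [int(sum(row) > 1.0) for row in rows]


predict = pipeline([feature_union(scale, square)], threshold)
print(predict([[1.0, 2.0], [0.5, 0.5]]))  # [1, 0]
```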

New Operators - scikit-learn

Binarizer (#258)

KBinsDiscretizer (#285)

Matrix Decomposition Operators (#277)

FastICA

KernelPCA

PCA

TruncatedSVD

MissingIndicator (#268)

MLPClassifier (#260)

MLPRegressor (#288)

Other Classifiers: (#260)

BernoulliNB

GaussianNB

MultinomialNB

SelectKBest chi2 support (#262)

SelectPercentile (#263)

SimpleImputer (#267)

PolynomialFeatures (#269)
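As an example of why operators like KBinsDiscretizer fit tensor computation well: with ordinal encoding, scikit-learn assigns each value the index of its bin, which (up to a small tolerance that some scikit-learn versions add near bin edges) equals the number of inner bin edges less than or equal to the value. That count is an elementwise comparison followed by a sum, with no data-dependent branching. The pure-Python sketch below uses a hypothetical `ordinal_bin_index` helper for a single feature with sorted edges; it is not Hummingbird's implementation.

```python
def ordinal_bin_index(values, bin_edges):
    """Bin index per value = number of inner edges <= value.

    `bin_edges` holds all edges for one feature, sorted ascending, including the
    outer min/max edges (the per-feature layout scikit-learn stores in `bin_edges_`).
    """
    inner_edges = bin_edges[1:-1]
    # Each comparison yields True/False; summing the booleans counts the edges passed.
    return [sum(edge <= v for edge in inner_edges) for v in values]


edges = [0.0, 1.0, 2.0, 3.0]  # three bins: [0, 1), [1, 2), [2, 3]
print(ordinal_bin_index([-0.5, 0.0, 0.99, 1.0, 2.5, 3.0, 7.0], edges))
# [0, 0, 0, 1, 2, 2, 2]
```

Values outside the fitted range land in the first or last bin automatically, matching the clipping scikit-learn applies.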

Other updates:

We now support xgboost>=0.90 (#253)

Added optimized_execution to the TorchScript backend (#276)

We now allow older versions of scikit-learn for compatibility reasons (#274)

Bug fixes:

Logistic Regression with the lbfgs option (#261)

Empty trees and multiclass (#265)

Fix isolation forest for sklearn <= 0.21 (#290)

Credits

A big thank you to our contributors: @scnakandala and @zhanjiezhu. Extra big thanks to @scnakandala for all of the scikit-learn converters!
hummingbird-ml-0.0.6.tar.gz(48.55 KB)
hummingbird_ml-0.0.6-py2.py3-none-any.whl(76.07 KB)
v0.0.5 (Aug 21, 2020)
This release adds TorchScript as a backend, removes the problematic auto-installation of PyTorch, improves the syntax for the ONNX converter, enhances the example notebooks, and upgrades third-party library versions. Hummingbird now provides the same Sklearn API across backends for executing inference.

Announcing: Integration with TorchScript

This release adds TorchScript as a backend (#225).

Users can convert models with:

hummingbird.ml.convert(model, "torchscript", X)

Installation changes:

After several reports from users across multiple platforms (in terms of both OS and underlying hardware), we changed the Hummingbird installer to require users to install PyTorch before installing Hummingbird (#246). This lets users select the right PyTorch version for their platform, which should simplify installation and avoid issues caused by having the wrong PyTorch version installed.

ONNX API

For ONNX, we changed the API to have a more seamless experience, allowing users to interact with ONNX models in a consistent way with other models in Hummingbird (#231).

Instead of having to instantiate the ONNX Runtime session themselves:

- session = ort.InferenceSession(onnx_model.SerializeToString())
- onnx_pred = session.run(output_names, inputs)

The user can now just call predict, predict_proba, transform, etc. as with other Hummingbird conversions.

+ onnx_pred = onnx_model.predict(X)
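The change in #231 hides session handling behind the same predict interface used by the other backends. The sketch below shows that wrapper pattern with a stub object whose `run(output_names, input_feed)` method mimics the call shape of ONNX Runtime's `InferenceSession.run`. `OnnxLikeContainer`, `StubSession`, and the input name `"input"` are hypothetical, so the example runs without ONNX Runtime installed and is not Hummingbird's actual container.

```python
class StubSession:
    """Hypothetical stand-in for onnxruntime.InferenceSession: run() returns a list of outputs."""

    def run(self, output_names, input_feed):
        x = input_feed["input"]
        labels = [int(sum(row) > 0) for row in x]
        return [labels]  # one entry per model output


class OnnxLikeContainer:
    """Hypothetical wrapper exposing an Sklearn-style predict() over a session."""

    def __init__(self, session, input_name="input", output_names=None):
        self._session = session
        self._input_name = input_name
        self._output_names = output_names  # None requests all outputs, as in ONNX Runtime

    def predict(self, X):
        # Build the input feed, run the session, and return the first output (the labels).
        outputs = self._session.run(self._output_names, {self._input_name: X})
        return outputs[0]


onnx_model = OnnxLikeContainer(StubSession())
print(onnx_model.predict([[1.0, -0.5], [-2.0, 0.5]]))  # [1, 0]
```

Keeping the input name and output selection inside the container is what lets calling code stay identical across backends.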

New Operators

OneHotEncoder - integers (#197)

Support for Tweedie distribution in LGBM (#242)

Miscellaneous

The target opset for ONNX is now 11 (#214)

The target PyTorch version is now 1.6.0, except with Python 3.5, where it remains at 1.5.1 for compatibility reasons (#213)

Docs are now auto-generated (#223)

Credits

Thanks to @KranthiGV for the updated LGBM ONNX notebook example
hummingbird_ml-0.0.5-py2.py3-none-any.whl(58.75 KB)
v0.0.4 (Jul 21, 2020)
This release adds several new operators for both scikit-learn and ONNX.

New Features

float64 support [#186]

Better Windows installation support [#179]

New Operators - scikit-learn

IsolationForest [#191]

LGBMRanker [#173]

OneHotEncoder [#193]

Scaler [#171]

XGBRanker [#189]

New Operators - ONNX

ArrayFeatureExtractor [#198]

Linear Classifier/Regressor [#190, #194]

Normalizer [#188]

Scaler [#196]

Credits

This release would not have been possible without the following contributors: @ahmedkrmn, @KranthiGV, @TuanNguyen27, @zhanjiezhu
hummingbird-ml-0.0.4.tar.gz(35.51 KB)
hummingbird_ml-0.0.4-py2.py3-none-any.whl(53.82 KB)
hummingbird_ml-0.0.4-py2.py3-none-win32.whl(99.51 KB)
hummingbird_ml-0.0.4-py2.py3-none-win_amd64.whl(99.51 KB)
v0.0.3 (Jun 19, 2020)
This release adds several cool new features and bug fixes to Hummingbird!

API Changes

When selecting the backend to use for conversion, we renamed pytorch to torch (to match the module name). [#142]

New Operators

HistGradientBoostingRegressor [#135]

LinearRegression [#140]

LinearSVC [#140]

LogisticRegression [#140]

LogisticRegressionCV [#140]

Normalizer [#126]

New Features

A transform method was added to the PyTorch container to match the Sklearn transformer API. [#148]

Support for ONNX models as input (at the moment this feature only works in combination with the lightgbm_converter in ONNXMLTools) [#142]

Generation of ONNX models as output (at the moment this feature only works when an ONNX model is passed as input) [#142]

Credits

This release would not have been possible without the following contributors: @ahmedkrmn, @jspisak, and @TuanNguyen27.
hummingbird_ml-0.0.3-py2.py3-none-any.whl(41.47 KB)
v0.0.2 (Jun 10, 2020)
This release adds several new operators, an updated API, and contains several documentation fixes.

New Operators

DecisionTreeRegressor [#102]

ExtraTreesRegressor [#91]

GradientBoostingRegressor [#88]

HistGradientBoostingClassifier [#87]

Credits

Special thanks to the following contributors: @KranthiGV (DecisionTreeRegressor), @mmbhatk (ExtraTreesRegressor), @bfgray3 (GradientBoostingRegressor), and @ahmedkrmn (HistGradientBoostingClassifier)
hummingbird_ml-0.0.2-py2.py3-none-any.whl(27.06 KB)
v0.0.1 (May 7, 2020)

This is the first release for Hummingbird! In this release, Hummingbird supports conversion from scikit-learn, LightGBM and XGBoost models to PyTorch. Currently supported models are listed here.
hummingbird_ml-0.0.1-py2.py3-none-any.whl(48.47 KB)