pyLightGBM: Python binding for Microsoft LightGBM

Features:
- Regression, Classification (binary, multi-class)
- Feature importance (`clf.feature_importance()`)
- Early stopping (`clf.best_round`)
- Works with scikit-learn: `GridSearchCV`, `cross_val_score`, etc.
- Silent mode (`verbose=False`)
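A minimal sketch of those accessors, assuming a local LightGBM executable (the path below is a placeholder) and a toy dataset; the validation set here is just the training set, only to exercise the early-stopping API:

```python
from sklearn import datasets
from pylightgbm.models import GBMClassifier

# placeholder path to the LightGBM executable; adjust for your machine
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"

X, y = datasets.make_classification(n_samples=200, n_features=10)

# verbose=False is silent mode; early_stopping_round activates clf.best_round
clf = GBMClassifier(exec_path=path_to_exec, min_data_in_leaf=1,
                    early_stopping_round=5, verbose=False)
clf.fit(X, y, test_data=[(X, y)])  # toy setup: training set reused as validation

print(clf.feature_importance())  # importance score per feature
print(clf.best_round)            # best iteration found by early stopping
```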
Installation
Install the latest version of Microsoft LightGBM, then install the wrapper:
```
pip install git+https://github.com/ArdalanM/pyLightGBM.git
```
Examples
- Regression:

```python
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMRegressor

# full path to the LightGBM executable (on Windows include .exe)
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"

X, y = datasets.load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

clf = GBMRegressor(exec_path=path_to_exec,
                   num_iterations=100, early_stopping_round=10,
                   num_leaves=10, min_data_in_leaf=10)

clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
print("Mean Squared Error:", metrics.mean_squared_error(y_test, clf.predict(x_test)))
```
- Binary Classification:

```python
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMClassifier

# full path to the LightGBM executable (on Windows include .exe)
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"

X, y = datasets.make_classification(n_samples=200, n_features=10)
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

clf = GBMClassifier(exec_path=path_to_exec, min_data_in_leaf=1)
clf.fit(x_train, y_train, test_data=[(x_test, y_test)])

y_pred = clf.predict(x_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
```
- Grid Search:

```python
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMClassifier

# full path to the LightGBM executable (on Windows include .exe)
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"

X, y = datasets.make_classification(n_samples=1000, n_features=10)

gbm = GBMClassifier(exec_path=path_to_exec,
                    metric='binary_error', early_stopping_round=10, bagging_freq=10)

param_grid = {'learning_rate': [0.1, 0.04], 'bagging_fraction': [0.5, 0.9]}
scorer = metrics.make_scorer(metrics.accuracy_score, greater_is_better=True)

clf = model_selection.GridSearchCV(gbm, param_grid, scoring=scorer, cv=2)
clf.fit(X, y)

print("Best score:", clf.best_score_)
print("Best params:", clf.best_params_)
```
Notebooks
- Using pyLightGBM for Kaggle competition (Allstate Claims Severity)
- [Bayesian global optimization with pyLightGBM using data from Kaggle competition (Allstate Claims Severity)](https://github.com/ArdalanM/pyLightGBM/blob/master/notebooks/bayesian%20optimization_example_kaggle_allstate.ipynb)
Available parameters (default values):
- application="regression"
- num_iterations=10
- learning_rate=0.1
- num_leaves=127
- tree_learner="serial"
- num_threads=1
- min_data_in_leaf=100
- metric='l2'
- is_training_metric=False
- feature_fraction=1.
- feature_fraction_seed=2
- bagging_fraction=1.
- bagging_freq=0
- bagging_seed=3
- metric_freq=1
- early_stopping_round=0
- max_bin=255
- is_unbalance=False
- num_class=1
- boosting_type='gbdt'
- min_sum_hessian_in_leaf=10
- drop_rate=0.01
- drop_seed=4
- max_depth=-1
- lambda_l1=0.
- lambda_l2=0.
- min_gain_to_split=0.
- verbose=True
- model=None
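As an illustration only, a hedged sketch combining several of these parameters (DART boosting with L2 regularization); the executable path is a placeholder and the values are arbitrary:

```python
from pylightgbm.models import GBMRegressor

# placeholder path to the LightGBM executable
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"

# all parameter names come from the list above
model = GBMRegressor(exec_path=path_to_exec,
                     application="regression",
                     boosting_type='dart',   # DART instead of the default 'gbdt'
                     drop_rate=0.05,         # fraction of trees dropped per iteration
                     num_iterations=200,
                     learning_rate=0.05,
                     num_leaves=31,
                     lambda_l2=1.0,          # L2 regularization
                     verbose=False)
```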