Python binding for Microsoft LightGBM

Overview

pyLightGBM: python binding for Microsoft LightGBM

Features:

  • Regression, Classification (binary, multi class)
  • Feature importance (clf.feature_importance())
  • Early stopping (clf.best_round)
  • Works with scikit-learn: GridSearchCV, cross_val_score, etc.
  • Silent mode (verbose=False)

Installation

Install the latest version of Microsoft LightGBM, then install the wrapper:

 pip install git+https://github.com/ArdalanM/pyLightGBM.git
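
Before training, it can help to confirm that the path handed to the wrapper points at the LightGBM binary itself and not at the folder that contains it. The sketch below is not part of pyLightGBM: `resolve_lightgbm_exec` is a hypothetical helper. It assumes the path comes either from an argument or from the `LIGHTGBM_EXEC` environment variable named in the wrapper's warning message, and it only checks that the file exists and is executable.

```python
import os


def resolve_lightgbm_exec(exec_path=None):
    """Return an absolute path to the LightGBM binary or raise a clear error.

    Hypothetical helper for illustration, not part of pyLightGBM.
    """
    # Fall back to the LIGHTGBM_EXEC environment variable when no path is given.
    candidate = exec_path or os.environ.get("LIGHTGBM_EXEC")
    if not candidate:
        raise ValueError("pass exec_path or set LIGHTGBM_EXEC")

    # Expand "~" and make the path absolute.
    candidate = os.path.abspath(os.path.expanduser(candidate))

    if os.path.isdir(candidate):
        raise ValueError("%s is a directory, expected the lightgbm binary" % candidate)
    if not os.path.isfile(candidate):
        raise ValueError("no such file: %s" % candidate)
    if not os.access(candidate, os.X_OK):
        raise ValueError("%s is not executable" % candidate)
    return candidate
```

The returned value could then be passed as `exec_path` to `GBMRegressor` or `GBMClassifier`. On Windows the path would include `.exe`, as the example comments note.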

Examples

  • Regression:
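import os

import numpy as np
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMRegressor

# full path to lightgbm executable (on Windows include .exe)
# "~" is expanded explicitly; it is not expanded when a program is launched without a shell
# named exec_path because `exec` is a keyword in Python 2 and a builtin in Python 3
exec_path = os.path.expanduser("~/Documents/apps/LightGBM/lightgbm")

X, y = datasets.load_diabetes(return_X_y=True)
clf = GBMRegressor(exec_path=exec_path,
                   num_iterations=100, early_stopping_round=10,
                   num_leaves=10, min_data_in_leaf=10)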

x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
print("Mean Square Error: ", metrics.mean_squared_error(y_test, clf.predict(x_test)))
  • Binary Classification:
import os

import numpy as np
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMClassifier

# full path to lightgbm executable (on Windows include .exe)
# "~" is expanded explicitly; it is not expanded when a program is launched without a shell
exec_path = os.path.expanduser("~/Documents/apps/LightGBM/lightgbm")

X, Y = datasets.make_classification(n_samples=200, n_features=10)
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=0.2)

clf = GBMClassifier(exec_path=exec_path, min_data_in_leaf=1)
clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
y_pred = clf.predict(x_test)
print("Accuracy: ", metrics.accuracy_score(y_test, y_pred))
  • Grid Search:
import os

import numpy as np
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMClassifier

# full path to lightgbm executable (on Windows include .exe)
# "~" is expanded explicitly; it is not expanded when a program is launched without a shell
exec_path = os.path.expanduser("~/Documents/apps/LightGBM/lightgbm")

X, Y = datasets.make_classification(n_samples=1000, n_features=10)

gbm = GBMClassifier(exec_path=exec_path,
                    metric='binary_error', early_stopping_round=10, bagging_freq=10)

param_grid = {'learning_rate': [0.1, 0.04], 'bagging_fraction': [0.5, 0.9]}

scorer = metrics.make_scorer(metrics.accuracy_score, greater_is_better=True)
clf = model_selection.GridSearchCV(gbm, param_grid, scoring=scorer, cv=2)

clf.fit(X, Y)

print("Best score: ", clf.best_score_)
print("Best params: ", clf.best_params_)
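
To get a feel for how much work the grid search example schedules, the parameter grid can be expanded by hand. The sketch below uses only the standard library and does not call LightGBM. It enumerates the same `param_grid` and counts fits, assuming scikit-learn's default `refit=True`, which adds one final fit with the best candidate on the full data.

```python
from itertools import product

param_grid = {'learning_rate': [0.1, 0.04], 'bagging_fraction': [0.5, 0.9]}
cv = 2

# Cartesian product of all value lists, one dict per candidate setting.
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[name] for name in names))]

for params in candidates:
    print(params)

# Each candidate is trained once per fold, plus one refit of the best candidate.
n_fits = len(candidates) * cv + 1
print("candidates:", len(candidates), "total fits:", n_fits)
```

Here the grid yields 4 candidates and 9 fits in total. Since the tracebacks on this page show `fit` launching the LightGBM executable as a subprocess, each of those fits presumably starts a separate LightGBM process, which is worth keeping in mind for larger grids.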

Notebooks

Available parameters (default values):

  • application="regression"
  • num_iterations=10
  • learning_rate=0.1
  • num_leaves=127
  • tree_learner="serial"
  • num_threads=1
  • min_data_in_leaf=100
  • metric='l2'
  • is_training_metric=False
  • feature_fraction=1.
  • feature_fraction_seed=2
  • bagging_fraction=1.
  • bagging_freq=0
  • bagging_seed=3
  • metric_freq=1
  • early_stopping_round=0
  • max_bin=255
  • is_unbalance=False
  • num_class=1
  • boosting_type='gbdt'
  • min_sum_hessian_in_leaf=10
  • drop_rate=0.01
  • drop_seed=4
  • max_depth=-1
  • lambda_l1=0.
  • lambda_l2=0.
  • min_gain_to_split=0.
  • verbose=True
  • model=None
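
The tracebacks in the comments on this page show the wrapper invoking the binary as `lightgbm config=<file>`, so these parameters end up in a LightGBM configuration file. As a rough illustration only (not the wrapper's actual code), the sketch below writes a parameter dict in `key=value` form, one entry per line. `write_config` is a hypothetical helper, and the handling of booleans and `None` is an assumption.

```python
import os
import tempfile


def write_config(params, path):
    """Write params as key=value lines (hypothetical helper)."""
    lines = []
    for key in sorted(params):
        value = params[key]
        if value is None:
            continue  # assumption: unset parameters are left out
        if isinstance(value, bool):
            value = str(value).lower()  # assumption: booleans written as true/false
        lines.append("%s=%s" % (key, value))
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


# A few of the defaults listed above.
params = {"application": "regression", "num_iterations": 10,
          "learning_rate": 0.1, "num_leaves": 127,
          "is_unbalance": False, "model": None}

conf_path = write_config(params, os.path.join(tempfile.mkdtemp(), "train.conf"))
with open(conf_path) as handle:
    print(handle.read())
```

Inspecting a file like this by hand is one way to check which values actually reach the executable when a run behaves unexpectedly.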
Comments
  • IOError when calling fit()

    OS: Windows 10 x64. Cloned and built it just a few hours ago.

    est = GBMRegressor(exec_path="O:/Coding/LightGBM/", 
                       config='', 
                       application='regression', 
                       num_iterations=2500, 
                       learning_rate=0.1, 
                       num_leaves=127, 
                       tree_learner='serial', 
                       num_threads=4, 
                       min_data_in_leaf=100, 
                       metric='l2', 
                       feature_fraction=1.0, 
                       feature_fraction_seed=2, 
                       bagging_fraction=1.0, 
                       bagging_freq=0, 
                       bagging_seed=3, 
                       metric_freq=1, 
                       early_stopping_round=0)
    est.fit(X, y, test_data=[(X_holdout, y_holdout)])
    

    IOError                                   Traceback (most recent call last)
    in ()
    ----> 1 est.fit(X, y, test_data=[(X_holdout, y_holdout)])

    C:\Users\ihopethiswillfi\Anaconda2\lib\site-packages\pylightgbm-0.2-py2.7.egg\pylightgbm\models.pyc in fit(self, X, y, test_data)
         71         os.system("{} config={}".format(self.exec_path, self.config))
         72
    ---> 73         with open(self.param['output_model'], mode='rb') as file:
         74             self.model = file.read()
         75

    IOError: [Errno 2] No such file or directory: 'c:\users\ihopethiswillfi\appdata\local\temp\tmpksw8jv\LightGBM_model.txt'

    opened by ihopethiswillfi 6
  • how can i use this package

    Hello. This is my first time using C++ source and header files with Python.

    To use this package, should I put the LightGBM source into LighGBM/lightgbm? After that, can I build a classifier?

    P.S. Is the directory name misspelled (LighGBM instead of LightGBM)?

    opened by lai-bluejay 6
  • max_bin and early_stopping_rounds

    Thank you @ArdalanM for creating the wrapper, which looks great!

    Is it possible to:

    1. add max_bin to the parameters
    2. add a verbose/silent flag to control whether LightGBM's running messages are printed, which would also help to:
    3. extract the best round from the running messages when early stopping is used.

    Thanks,

    opened by yychenca 4
  • result file not found

    A clf.predict(X) seems to cause a FileNotFoundError: [Errno 2] No such file or directory: 'pathToLightGBM/lightgbm_models/32028_1476974477/LightGBM_predict_result_32028_1476974523.txt' for me. I just downloaded the package from github and created a GBMClassifier - not sure if install via pip ... would be required.

    edit

    I am using a mac / osx 10.11.6

    but I can see the files in the finder:

    edit2

    I noticed that no LightGBM_predict_result_32028_1476974809.txt result file was created, only 3 other files.

    opened by geoHeil 4
  • file format issues

    Sometimes I get "input format error, should be LibSVM" but have no real clue how to debug it. Do you have any suggestions?

    The strange thing is that the training worked just fine:

    [LightGBM] [Info] 3.972398 seconds elapsed, finished 59 iteration
    [LightGBM] [Info] 4.042168 seconds elapsed, finished 60 iteration
    [LightGBM] [Info] Finish train
    

    The problem occurs when I try to predict new values.

    opened by geoHeil 3
  • GBM predict with returned non-zero exit status 1 error

    Hello ArdalanM, it's great and simple to use GBM with this Python wrapper.

    I am running your notebook regression_example_kaggle_allstate.ipynb and get an error on the gbmr.predict line. The output message is:

    <ipython-input-10-0c40d9335a9b> in <module>()
         22 
         23 gbmr.fit(X_train, y_train, test_data=[(X_valid, y_valid)])
    ---> 24 print("Mean Square Error: ", metrics.mean_absolute_error(y_true=(np.exp(y_valid)-1), y_pred=(np.exp(gbmr.predict(X_valid))-1)))
    
    /usr/local/lib/python2.7/dist-packages/pylightgbm/models.pyc in predict(self, X)
        122 
        123         process = subprocess.check_output([self.exec_path, "config={}".format(conf_filepath)],
    --> 124                                           universal_newlines=True)
        125 
        126         if self.verbose:
    
    /usr/lib/python2.7/subprocess.pyc in check_output(*popenargs, **kwargs)
        572         if cmd is None:
        573             cmd = popenargs[0]
    --> 574         raise CalledProcessError(retcode, cmd, output=output)
        575     return output
        576 
    
    CalledProcessError: Command '['/home/lyz/Workspace/Github/LightGBM/lightgbm', 'config=/tmp/tmphdAl6R/predict.conf']' returned non-zero exit status 1
    

    Really appreciate your help.

    opened by finlay-liu 3
  • 'LIGHTGBM_EXEC' environment variable, cannot be found

    The examples show:

    path_to_exec = "~/Documents/apps/LightGBM/lightgbm"
    

    This path does not exist in the package

    ls /Users/me/LightGBM/
    CMakeLists.txt	README.md	docs		include		python-package	tests
    LICENSE		build		examples	pmml		src		windows
    

    ...and when I try to build a model I get an error

    from pylightgbm.models import GBMClassifier
    
    df = pd.read_csv('my_data.csv')
    
    params = {'exec_path': path_to_exec,
          'num_iterations': 1000, 'learning_rate': 0.01,
          'min_data_in_leaf': 1, 'num_leaves': 5,
          'metric': 'binary_error', 'verbose': False,
          'early_stopping_round': 20}
    
    GBMClassifier(params).fit(df['X_var'], df['y_var'])
    
    pyLightGBM is looking for 'LIGHTGBM_EXEC' environment variable, cannot be found.
    exec_path will be deprecated in favor of environment variable
    
    /Users/me/anaconda/lib/python2.7/site-packages/pylightgbm/models.pyc in fit(self, X, y, test_data, init_scores)
        129 
        130             process = subprocess.Popen([self.exec_path, "config={}".format(conf_filepath)],
    --> 131                                        stdout=subprocess.PIPE, bufsize=1)
        132 
        133         else:
    
    /Users/me/anaconda/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
        708                                 p2cread, p2cwrite,
        709                                 c2pread, c2pwrite,
    --> 710                                 errread, errwrite)
        711         except Exception:
        712             # Preserve original exception in case os.close raises.
    
    /Users/me/anaconda/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
       1333                         raise
       1334                 child_exception = pickle.loads(data)
    -> 1335                 raise child_exception
       1336 
       1337 
    
    OSError: [Errno 13] Permission denied
    
    opened by cbjorgol 2
  • new param max_depth added in LightGBM

    Hi,

    LightGBM has a new setting to control overfitting on small datasets: max_depth

    https://github.com/Microsoft/LightGBM/pull/35

    thx for adding

    opened by gugatr0n1c 2
  • PermissionError: [WinError 32] The process cannot access "LightGBM_model.txt"

    The model trains and then breaks at the last instance.

    The model does output a prediction when called to do so

    [LightGBM] [Info] 0.018901 seconds elapsed, finished iteration 99
    [LightGBM] [Info] 0.019084 seconds elapsed, finished iteration 100
    [LightGBM] [Info] Finished training
    
    ... Lots of whitespace ...
    
    ---------------------------------------------------------------------------
    PermissionError                           Traceback (most recent call last)
    <ipython-input-11-f6bccd7e15ae> in <module>()
         17 #x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)
         18 
    ---> 19 clf.fit(X, y)
         20 print("Mean Square Error: ", metrics.mean_squared_error(y, clf.predict(X)))
    
    C:\Anaconda\envs\py35\lib\site-packages\pylightgbm\models.py in fit(self, X, y, test_data)
        110         with open(self.param['output_model'], mode='r') as file:
        111             self.model = file.read()
    --> 112             shutil.rmtree(tmp_dir)
        113 
        114         if test_data and self.param['early_stopping_round'] > 0:
    
    C:\Anaconda\envs\py35\lib\shutil.py in rmtree(path, ignore_errors, onerror)
        486             os.close(fd)
        487     else:
    --> 488         return _rmtree_unsafe(path, onerror)
        489 
        490 # Allow introspection of whether or not the hardening against symlink
    
    C:\Anaconda\envs\py35\lib\shutil.py in _rmtree_unsafe(path, onerror)
        381                 os.unlink(fullname)
        382             except OSError:
    --> 383                 onerror(os.unlink, fullname, sys.exc_info())
        384     try:
        385         os.rmdir(path)
    
    C:\Anaconda\envs\py35\lib\shutil.py in _rmtree_unsafe(path, onerror)
        379         else:
        380             try:
    --> 381                 os.unlink(fullname)
        382             except OSError:
        383                 onerror(os.unlink, fullname, sys.exc_info())
    
    PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\admin\\AppData\\Local\\Temp\\tmpehya862g\\LightGBM_model.txt'
    

    Model prediction call,

    [LightGBM] [Info] Finished loading parameters
    [LightGBM] [Info] Finished loading 100 models
    [LightGBM] [Info] Finished initializing prediction
    [LightGBM] [Info] Finished prediction
    
    ... Lots of whitespace ...
    
    Mean Square Error:  668.323460051
    

    Just wondering if this error affects the model or what can be done to stop the error being thrown? Any other info regarding this please specify. Thanks.

    opened by JoshuaC3 2
  • Error in installing Via Ipython terminal

    Hello,

    Whenever I try installing it from the IPython terminal or the Anaconda command prompt, it throws an error saying the procedure entry point SSL_COMP_free_compression_methods could not be located.

    The screenshot is attached. Would you possibly know how to fix this error? I have been unable to find it by searching keywords on Google, so apologies if it's not entirely related. Any help is appreciated.

    opened by mike-m123 1
  • Cannot train for n_iteration greater than 2600?

    Hi. It seems that I cannot train the model when setting num_iterations greater than 3000. Setting it to 5000 throws an error:

    [LightGBM] [Info] 209.081899 seconds elapsed, finished iteration 2600
    Traceback (most recent call last):
      File "/home/lemma/miniconda2/lib/python2.7/site-packages/pylightgbm/models.py", line 143, in fit
        with open(self.param['output_model'], mode='r') as file:
    IOError: [Errno 2] No such file or directory: '/tmp/tmpNEm79g/LightGBM_model.txt'
    Command exited with non-zero status 1

    Does this need fixing, or is there no need because the result will be the same no matter how large num_iterations is? Below 3000 it works fine.

    opened by kroscek 1
  • pyLightGBM is looking for LIGHTGBM_EXEC environment Variable

    Hello,

    I have been trying to run the code of pylightgbm but it raises the exception that

    " pyLightGBM is looking for 'LIGHTGBM_EXEC' environment variable, cannot be found. exec_path will be deprecated in favor of environment variable "

    I have also tried pointing the path at the installed lightgbm package in the library, and at the lightgbm and pylightgbm files downloaded from their respective GitHub sources, but none of them seem to provide the 'LIGHTGBM_EXEC' file or folder.

    I think that is why the exception keeps being raised. Is there a workaround for this issue?

    I have also looked at the following link, which I think could provide a workaround, but the last two lines are not clear when using the conda prompt. Which source does the pip install of the requirements point to? The requirements are certainly not present in the pep8 package that was downloaded. Why is setup.py used at the end, and which package does it install, considering pip was already used to install packages?

    https://github.com/ArdalanM/pyLightGBM/blob/master/.travis.yml

      • conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION
      • source activate test-environment
      • pip install pytest pytest-cov python-coveralls pytest-xdist coverage #we need this version of coverage for coveralls.io to work
      • pip install pep8 pytest-pep8
      • pip install -r requirements.txt
      • python setup.py install

    Any suggestion, feedback is welcome

    opened by mike-m123 1
  • Direct using of categorical features

    opened by sashulyak 1
  • comparison with original lightGBM python package

    Hi, what is the difference between pyLightGBM and https://github.com/Microsoft/LightGBM/tree/master/python-package? Is this package still maintained?

    opened by geoHeil 1
  • python example error

    clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
    Traceback (most recent call last):
      File "", line 1, in
      File "/home/apps/duane_tmp/anaconda2/lib/python2.7/site-packages/pylightgbm/models.py", line 131, in fit
        stdout=subprocess.PIPE, bufsize=1)
      File "/home/apps/duane_tmp/anaconda2/lib/python2.7/subprocess.py", line 711, in __init__
        errread, errwrite)
      File "/home/apps/duane_tmp/anaconda2/lib/python2.7/subprocess.py", line 1343, in _execute_child
        raise child_exception
    OSError: [Errno 2] No such file or directory

    Hello, when I try the Python example, the line clf.fit(x_train, y_train, test_data=[(x_test, y_test)]) gives me the error above. Could you help me fix it? Thanks!

    opened by Duanexiao 0
  • FileNotFoundError after validation

    I got the error after validation. Here is the code.

    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split
    from pylightgbm.models import GBMClassifier
    from sklearn.metrics import roc_auc_score
    
    diabetes = load_diabetes()
    
    split_ind = 200
    
    X = diabetes['data']
    y = diabetes['target']
    
    X = pd.DataFrame(X).add_prefix('c')
    y = pd.Series(y)
    y = (y>150)*1
    
    X_train, X_test = X[:split_ind], X[split_ind:]
    y_train, y_test = y[:split_ind], y[split_ind:]
    
    exec = "~/LightGBM/lightgbm"
    
    clf = GBMClassifier(exec_path=exec, 
                        num_iterations=3000,
                        metric='auc',
                        early_stopping_round=20)
    
    clf.fit(X_train, y_train, 
            test_data=[(X_test, y_test)])
    clf.param['num_iterations'] = clf.best_round # Also .set_params wouldn't work
    clf.fit(X_train, y_train)
    
    

    The second clf.fit raises "FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/~'", because in fit the 'with process.stdout:' block does not write LightGBM_model.

    bug 
    opened by KazukiOnodera 6
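
Several of the reports above fail at the same step: after the binary exits, the wrapper opens `LightGBM_model.txt` in a temporary directory, and in one traceback `shutil.rmtree` runs while that file is still open, which Windows refuses. A minimal sketch of a more defensive version of that step follows. `read_model_and_cleanup` is a hypothetical name and this is not the wrapper's code. It only illustrates checking that the model file exists and deleting the directory after the file handle has been closed.

```python
import os
import shutil
import tempfile


def read_model_and_cleanup(tmp_dir, model_name="LightGBM_model.txt"):
    """Return the model text and remove tmp_dir (hypothetical helper)."""
    model_path = os.path.join(tmp_dir, model_name)
    try:
        if not os.path.isfile(model_path):
            # A missing file suggests the LightGBM process failed before saving.
            raise RuntimeError("LightGBM wrote no model file at %s" % model_path)
        with open(model_path, mode="r") as handle:
            model = handle.read()
        # The with block has ended, so the file handle is closed here.
        return model
    finally:
        # Runs after the file is closed, on success and on failure.
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Stand-in for a finished training run: a temp dir holding a model file.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "LightGBM_model.txt"), "w") as handle:
    handle.write("tree\n")

model_text = read_model_and_cleanup(tmp_dir)
print(repr(model_text))
```

Raising a descriptive error when the file is missing would not fix the underlying failure, but it points at the LightGBM run itself instead of at a later `open` call.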
Owner
Ardalan