pywFM is a Python wrapper for Steffen Rendle's factorization machines library libFM

Overview

pywFM

pywFM is a Python wrapper for Steffen Rendle's libFM. libFM is a Factorization Machine library:

Factorization machines (FM) are a generic approach that allows to mimic most factorization models by feature engineering. This way, factorization machines combine the generality of feature engineering with the superiority of factorization models in estimating interactions between categorical variables of large domain. libFM is a software implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least squares (ALS) optimization as well as Bayesian inference using Markov Chain Monte Carlo (MCMC).
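
For reference, the second-order factorization machine model behind libFM predicts, for a feature vector x with p features,

\hat{y}(x) = w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} \langle v_j, v_{j'} \rangle \, x_j x_{j'}

where w_0 is the global bias (the k0 option below), the w_j are per-feature weights (k1), and the v_j are k-dimensional factor vectors whose inner products model the pairwise interactions (k2).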

For more information regarding Factorization machines and libFM, read Steffen Rendle's paper: Factorization Machines with libFM, in ACM Trans. Intell. Syst. Technol., 3(3), May. 2012

Don't forget to acknowledge libFM (i.e. cite the paper Factorization Machines with libFM) if you publish results produced with this software.

Motivation

While using Python implementations of Factorization Machines, I felt that the existing ones (pyFM and fastFM) had many flaws. Then I thought: why reinvent the wheel? Why not use the original libFM?

Sure, it's not native Python, yada yada... But at least we have a bulletproof, battle-tested implementation to build on.

Installing

First, clone and compile the libFM repository and set an environment variable pointing to the libFM bin folder:

git clone https://github.com/srendle/libfm /home/libfm
cd /home/libfm/
# taking advantage of a bug to allow us to save model #ShameShame
git reset --hard 91f8504a15120ef6815d6e10cc7dee42eebaab0f
make all
export LIBFM_PATH=/home/libfm/bin/

Make sure you compile the source from the libfm repository at this specific commit, since pywFM needs the save_model option. Beware that both the installers and the source code on libfm.org predate this commit. I know this is extremely hacky, but after a fix was deployed, libFM only allows the save_model option for SGD or ALS. I don't know exactly why, because it was working well before.

If you use Jupyter, take a look at the following issue for some extra notes on getting libFM to work.
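
If your notebook does not inherit the shell environment, one possible workaround (a sketch; the path below is only an example, point it at your own libFM bin folder) is to set the variable from Python before constructing the FM object:

import os

# example path -- adjust to wherever you compiled libFM
os.environ['LIBFM_PATH'] = '/home/libfm/bin/'

import pywFM  # pywFM reads LIBFM_PATH when FM(...) is instantiated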

Then, install pywFM using pip:

pip install pywFM

Binary installers for the latest released version are available at the Python package index.

Dependencies

  • numpy
  • scipy
  • sklearn
  • pandas

Example

Very simple example taken from Steffen Rendle's paper: Factorization Machines with libFM.

import pywFM
import numpy as np
import pandas as pd

features = np.matrix([
#     Users  |     Movies     |    Movie Ratings   | Time | Last Movies Rated
#    A  B  C | TI  NH  SW  ST | TI   NH   SW   ST  |      | TI  NH  SW  ST
    [1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
    [1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
    [1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
    [0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
    [0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
    [0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
    [0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]
])
target = [5, 3, 1, 4, 5, 1, 5]

fm = pywFM.FM(task='regression', num_iter=5)

# split features and target for train/test
# first 5 are train, last 2 are test
model = fm.run(features[:5], target[:5], features[5:], target[5:])
print(model.predictions)
# you can also get the model weights
print(model.weights)

You can also pass a numpy array, a scipy sparse matrix, or even a pandas DataFrame as the features input.
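
For instance, a minimal sketch (reusing features and target from the example above) that feeds the same data as a scipy sparse matrix:

from scipy import sparse

sparse_features = sparse.csr_matrix(features)  # same data, sparse representation
model = fm.run(sparse_features[:5], target[:5], sparse_features[5:], target[5:])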

Prediction on new data

The current approach is to pass the new data as x_test, y_test in the run method call. libFM only uses the test target values to report statistics about its predictions; they are not used when training the model. So you can set y_test to dummy values and simply collect the predictions with model.predictions (and disregard the reported prediction statistics, since those will be meaningless). For more info, check the libFM manual.
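
A minimal sketch of that workaround (x_train, y_train and x_new are placeholders; x_new holds the new, unlabeled rows and zeros act as dummy targets):

import numpy as np

dummy_y = np.zeros(x_new.shape[0])  # placeholder targets, not used for training
model = fm.run(x_train, y_train, x_new, dummy_y)
new_predictions = model.predictions  # usable; ignore the reported test statistics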

Running against a new dataset using something like a predict method is not supported yet. Pending feature request: https://github.com/jfloff/pywFM/issues/7

Feel free to PR that change ;)

Usage

Don't forget to acknowledge libFM (i.e. cite the paper Factorization Machines with libFM) if you publish results produced with this software.

FM: Class that wraps libFM parameters. For more information read libFM manual
Parameters
----------
task : string, MANDATORY
        regression: for regression
        classification: for binary classification
num_iter: int, optional
    Number of iterations
    Defaults to 100
init_stdev : double, optional
    Standard deviation for initialization of 2-way factors
    Defaults to 0.1
k0 : bool, optional
    Use bias.
    Defaults to True
k1 : bool, optional
    Use 1-way interactions.
    Defaults to True
k2 : int, optional
    Dimensionality of 2-way interactions.
    Defaults to 8
learning_method: string, optional
    sgd: parameter learning with SGD
    sgda: parameter learning with adaptive SGD
    als: parameter learning with ALS
    mcmc: parameter learning with MCMC
    Defaults to 'mcmc'
learn_rate: double, optional
    Learning rate for SGD
    Defaults to 0.1
r0_regularization: int, optional
    bias regularization for SGD and ALS
    Defaults to 0
r1_regularization: int, optional
    1-way regularization for SGD and ALS
    Defaults to 0
r2_regularization: int, optional
    2-way regularization for SGD and ALS
    Defaults to 0
rlog: bool, optional
    Enable/disable rlog output
    Defaults to True.
verbose: bool, optional
    How much information to print
    Defaults to False.
seed: int, optional
    seed used to reproduce the results
    Defaults to None.
silent: bool, optional
    Completely silences all libFM output
    Defaults to False.
temp_path: string, optional
    Sets the path for libFM temporary files. Useful when dealing with large data.
    Defaults to None (default mkstemp behaviour)
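
For example, a sketch of a non-default configuration using the parameters documented above (the values are illustrative only):

import pywFM

# ALS learning with explicit regularization and 4 two-way factors
fm = pywFM.FM(task='regression',
              learning_method='als',
              num_iter=50,
              init_stdev=0.1,
              k2=4,
              r0_regularization=0.1,
              r1_regularization=0.1,
              r2_regularization=0.1,
              seed=17)
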
FM.run: run factorization machine model against train and test data

Parameters
----------
x_train : {array-like, matrix}, shape = [n_train, n_features]
    Training data
y_train : numpy array of shape [n_train]
    Target values
x_test: {array-like, matrix}, shape = [n_test, n_features]
    Testing data
y_test : numpy array of shape [n_test]
    Testing target values
x_validation_set: optional, {array-like, matrix}, shape = [n_validation, n_features]
    Validation data (only for SGDA)
y_validation_set: optional, numpy array of shape [n_validation]
    Validation target data (only for SGDA)

Return
-------
Returns `namedtuple` with the following properties:

predictions: array [n_samples of x_test]
   Predicted target values per element in x_test.
global_bias: float
    If k0 is True, returns the model's global bias w0
weights: array [n_features]
    If k1 is True, returns the model's weight for each feature, Wj
pairwise_interactions: numpy matrix [n_features x k2]
    Matrix with pairwise interactions Vj,f
rlog: pandas dataframe [nrow = num_iter]
    `pandas` DataFrame with measurements about each iteration
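
Putting it together, a short sketch of reading those properties after a run (x_train, y_train, x_test and y_test are placeholders):

model = fm.run(x_train, y_train, x_test, y_test)

print(model.predictions)            # one prediction per row of x_test
print(model.global_bias)            # w0, if k0 is True
print(model.weights)                # per-feature weights Wj, if k1 is True
print(model.pairwise_interactions)  # n_features x k2 matrix of Vj,f
print(model.rlog.head())            # per-iteration measurements (pandas DataFrame)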

Docker

This repository includes Dockerfile for development and for running pywFM.

  • Run pywFM examples (Dockerfile): if you are only interested in running the examples, you can use the pre-built image available on Docker Hub:
# to run examples/simple.py (the one in this README).
docker run --rm -v "$(pwd)":/home/pywfm -w /home/pywfm -ti jfloff/pywfm python examples/simple.py
  • Development of pywFM (Dockerfile): useful if you want to make changes to the repo. Dockerfile defaults to bash.
# to build image
docker build --rm=true -t jfloff/pywfm-dev .
# to run image
docker run --rm -v "$(pwd)":/home/pywfm-dev -w /home/pywfm-dev -ti jfloff/pywfm-dev

Future work

  • Improve the save_model / load_model so we can have a more defined init-fit-predict cycle (perhaps we could inherit from sklearn.BaseEstimator)
  • Can we contribute to libFM repo so save_model is enabled for all learning methods (namely MCMC)?
  • Look into a shared-library solution to reduce I/O overhead

I'm no factorization machines expert; this library was just an effort to make libFM usable from Python as quickly as possible. Feel free to suggest features or enhancements, to point out issues, and of course to post PRs.

License

MIT (see LICENSE.txt file)

Comments
  • FM.run in Example code fails on Windows

    The model = fm.run(features[:5], target[:5], features[5:], target[5:]) line in the https://github.com/jfloff/pywFM example fails with the following output. The same error happens both with libFM compiled from source and with the binaries from http://www.libfm.org/libfm-1.40.windows.zip.

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    in ()
         20 # split features and target for train/test
         21 # first 5 are train, last 2 are test
    ---> 22 model = fm.run(features[:5], target[:5], features[5:], target[5:])
         23 print(model.predictions)
         24 # you can also get the model weights

    C:\Miniconda2\lib\site-packages\pywFM\__init__.pyc in run(self, x_train, y_train, x_test, y_test, x_validation_set, y_validation_set)
        228 # parses rlog into
        229 import pandas as pd
    --> 230 rlog = pd.read_csv(rlog_path, sep='\t')
        231 os.close(rlog_fd)
        232 os.remove(rlog_path)

    C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
        527         skip_blank_lines=skip_blank_lines)
        528
    --> 529     return _read(filepath_or_buffer, kwds)
        530
        531 parser_f.__name__ = name

    C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
        293
        294 # Create the parser.
    --> 295 parser = TextFileReader(filepath_or_buffer, **kwds)
        296
        297 if (nrows is not None) and (chunksize is not None):

    C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
        610 self.options['has_index_names'] = kwds['has_index_names']
        611
    --> 612 self._make_engine(self.engine)
        613
        614 def _get_options_with_defaults(self, engine):

    C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
        745 def _make_engine(self, engine='c'):
        746 if engine == 'c':
    --> 747     self._engine = CParserWrapper(self.f, **self.options)
        748 else:
        749     if engine == 'python':

    C:\Miniconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
       1117 kwds['allow_leading_cols'] = self.index_col is not False
       1118
    -> 1119 self._reader = _parser.TextReader(src, **kwds)
       1120
       1121 # XXX

    pandas\parser.pyx in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)()

    ValueError: No columns to parse from file

    opened by pablojrios 23
  • Cannot produce Test(ll) results locally

    Hi,

    I've been testing the pywFM package, and my question involves understanding how model.predictions links to the information produced in the output.

    My specific example: if I run libFM with a train and test dataset, I can see in the output that test(ll) drops to 0.515385. But if I take the predictions and compute the log loss against the test labels, I get a value of 8.134375875846, where I should get 0.515385.

    For clarity please see the thread i started on Kaggle which also enables you to download the data and reproduce the error.

    Full example code: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/forums/t/19319/help-with-libfm/110652#post110652

    opened by mpearmain 19
  • Python3?

    Hi,

    Does pywFM support Python3? I tried to run the example and got this error

    model = fm.run(features[:5], target[:5], features[5:], target[5:])
    /bin/sh: /usr/local/lib/python3.5/site-packages/pywFM/libfm/bin/libFM: cannot execute binary file
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-8-85e95dc4c2fd> in <module>()
    ----> 1 model = fm.run(features[:5], target[:5], features[5:], target[5:])
    
    /usr/local/lib/python3.5/site-packages/pywFM/__init__.py in run(self, x_train, y_train, x_test, y_test)
        184         # parses rlog into
        185         import pandas as pd
    --> 186         rlog = pd.read_csv(rlog_path, sep='\t')
        187
        188         # removes temporary output file after using
    
    /usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
        496                     skip_blank_lines=skip_blank_lines)
        497
    --> 498         return _read(filepath_or_buffer, kwds)
        499
        500     parser_f.__name__ = name
    
    /usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
        273
        274     # Create the parser.
    --> 275     parser = TextFileReader(filepath_or_buffer, **kwds)
        276
        277     if (nrows is not None) and (chunksize is not None):
    
    /usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
        588             self.options['has_index_names'] = kwds['has_index_names']
        589
    --> 590         self._make_engine(self.engine)
        591
        592     def _get_options_with_defaults(self, engine):
    
    /usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
        729     def _make_engine(self, engine='c'):
        730         if engine == 'c':
    --> 731             self._engine = CParserWrapper(self.f, **self.options)
        732         else:
        733             if engine == 'python':
    
    /usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
       1101         kwds['allow_leading_cols'] = self.index_col is not False
       1102
    -> 1103         self._reader = _parser.TextReader(src, **kwds)
       1104
       1105         # XXX
    
    pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)()
    
    ValueError: No columns to parse from file
    
    enhancement 
    opened by yitang 17
  • problem of running in jupyter notebook

    I followed the instructions as written on GitHub and can successfully run the test code. However, I get the following error when running in my Jupyter notebook:


    OSError                                   Traceback (most recent call last)
    in ()
         16 target = [5, 3, 1, 4, 5, 1, 5]
         17
    ---> 18 fm = pywFM.FM(task='regression', num_iter=5)
         19
         20 # split features and target for train/test

    /usr/local/lib/python2.7/dist-packages/pywFM/__init__.pyc in __init__(self, task, num_iter, init_stdev, k0, k1, k2, learning_method, learn_rate, r0_regularization, r1_regularization, r2_regularization, rlog, verbose, seed, silent, temp_path)

    OSError: LIBFM_PATH is not set. Please install libFM and set the path variable (https://github.com/jfloff/pywFM#installing).

    Actually, I have already set LIBFM_PATH in my ~/.bashrc file like this: export LIBFM_PATH=$HOME/local/libfm/bin. I don't know why the Jupyter notebook cannot find this path.

    opened by kkhuang1990 8
  • an Error in Pandas

    Hi. I tried your sample code, and an error occurred at "model = fm.run(features[:5], target[:5], features[5:], target[5:])". It says "EmptyDataError: No columns to parse from file".

    Could you tell me how to solve this problem??

    opened by naaktslaktauge 7
  • Path not set

    Hi, I am trying to use this wrapper and I am getting the error

    OSError                                   Traceback (most recent call last)
    <ipython-input-1-6bd4815fa3b6> in <module>()
         16 target = [5, 3, 1, 4, 5, 1, 5]
         17 
    ---> 18 fm = pywFM.FM(task='regression', num_iter=5)
         19 
         20 # split features and target for train/test
    
    /usr/local/lib/python3.5/dist-packages/pywFM/__init__.py in __init__(self, task, num_iter, init_stdev, k0, k1, k2, learning_method, learn_rate, r0_regularization, r1_regularization, r2_regularization, rlog, verbose, silent, temp_path)
        103         self.__libfm_path = os.environ.get('LIBFM_PATH')
        104         if self.__libfm_path is None:
    --> 105             raise OSError("`LIBFM_PATH` is not set. Please install libFM and set the path variable (https://github.com/jfloff/pywFM#installing).")
        106 
        107     def run(self, x_train, y_train, x_test, y_test, x_validation_set=None, y_validation_set=None):
    
    OSError: `LIBFM_PATH` is not set. Please install libFM and set the path variable (https://github.com/jfloff/pywFM#installing).
    

    I followed the install instructions exactly, specifically making sure I did the export path part correctly. What might be going wrong here?

    opened by JoshuaC3 6
  • Predict for new data.

    Say I trained the model with

    fm.run(train_x, train_y, val_x, val_y)

    How do I run prediction on another dataset?

    pred_y = fm.run(test_x)

    The run method expects y_test as input, which doesn't make sense at all: run(self, x_train, y_train, x_test, y_test, x_validation_set=None, y_validation_set=None, meta=None)

    opened by vi3k6i5 5
  • Problem with  -save_model on Windows Running toy example

    If I run the toy example:

     import pywFM
     import numpy as np
     import pandas as pd
    
     features = np.matrix([
     #     Users  |     Movies     |    Movie Ratings   | Time | Last Movies Rated
     #    A  B  C | TI  NH  SW  ST | TI   NH   SW   ST  |      | TI  NH  SW  ST
        [1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
        [1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
        [1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
        [0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
        [0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
        [0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
        [0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]
     ])
     target = [5, 3, 1, 4, 5, 1, 5]
    
     fm = pywFM.FM(task='regression', num_iter=5)
    
     model = fm.run(features[:5], target[:5], features[5:], target[5:])
    

    I got this error

    ERROR: the parameter save_model does not exist
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pywFM\__init__.py", line 231, in run
        rlog = pd.read_csv(rlog_path, sep='\t')
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 498, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 275, in _read
        parser = TextFileReader(filepath_or_buffer, **kwds)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 590, in __init__
        self._make_engine(self.engine)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 731, in _make_engine
        self._engine = CParserWrapper(self.f, **self.options)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1103, in __init__
        self._reader = _parser.TextReader(src, **kwds)
      File "pandas\parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)
    ValueError: No columns to parse from file
    

    So it looks like -save_model doesn't work. My temporary solution is to change lines 162 and 163 in __init__.py from:

                    "-verbosity %d" % self.__verbose,
                    "-save_model %s" % model_path]
    

    to:

                    "-verbosity %d" % self.__verbose]
                    #"-save_model %s" % model_path]
    

    The model then runs and you can use its predictions, but the model isn't saved.

    opened by fedro86 4
  • problem of regression results

    I ran your example code as follows

    import pywFM
    import numpy as np
    import pandas as pd

    features = np.matrix([
        [1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
        [1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
        [1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
        [0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
        [0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
        [0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
        [0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]
    ])
    target = [5, 3, 1, 4, 5, 1, 5]

    fm = pywFM.FM(task='regression', num_iter=1000, verbose=True)

    model = fm.run(features[:5], target[:5], features[5:], target[5:])
    print(model.predictions)
    print(model.weights)

    and the predictions are not so good, like this:

    [3.71942, 3.4779]
    [-0.36734, -1.25636, 1.04973, -2.0381, -2.07228, 0.0822247, -0.202202, -1.26609, -2.40143, -0.568957, -2.13888, -1.41459, 0.36015, 0.787539, -0.303377]

    Actually the ground truth is [1, 5]. I tried even more iterations and the result is still not so good. How about the result of your implementation?

    thx

    opened by kkhuang1990 2
  • Problem on Windows Running toy example

    Hi, I installed pywFM in an Anaconda environment. I pip-installed it, then set the LIBFM_PATH environment variable. Then I ran the program:

    `

    import pywFM
    import numpy as np
    import pandas as pd
    
    features = np.matrix([
    #     Users  |     Movies     |    Movie Ratings   | Time | Last Movies Rated
    #    A  B  C | TI  NH  SW  ST | TI   NH   SW   ST  |      | TI  NH  SW  ST
        [1, 0, 0,  1,  0,  0,  0,   0.3, 0.3, 0.3, 0,     13,   0,  0,  0,  0 ],
        [1, 0, 0,  0,  1,  0,  0,   0.3, 0.3, 0.3, 0,     14,   1,  0,  0,  0 ],
        [1, 0, 0,  0,  0,  1,  0,   0.3, 0.3, 0.3, 0,     16,   0,  1,  0,  0 ],
        [0, 1, 0,  0,  0,  1,  0,   0,   0,   0.5, 0.5,   5,    0,  0,  0,  0 ],
        [0, 1, 0,  0,  0,  0,  1,   0,   0,   0.5, 0.5,   8,    0,  0,  1,  0 ],
        [0, 0, 1,  1,  0,  0,  0,   0.5, 0,   0.5, 0,     9,    0,  0,  0,  0 ],
        [0, 0, 1,  0,  0,  1,  0,   0.5, 0,   0.5, 0,     12,   1,  0,  0,  0 ]
    ])
    target = [5, 3, 1, 4, 5, 1, 5]
    
    fm = pywFM.FM(task='regression', num_iter=5)
    
    # split features and target for train/test
    # first 5 are train, last 2 are test
    model = fm.run(features[:5], target[:5], features[5:], target[5:])
    

    `

    But the last line gives me an error:

    "C:\Users\[...]\Anaconda\Lib\site-packages\pywFMlibFM" non è riconosciuto come comando interno o esterno, 
     un programma eseguibile o un file batch. (it means that it's not recognized as a internal or external command, an executable or a file batch)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pywFM\__init__.py", line 230, in run
        rlog = pd.read_csv(rlog_path, sep='\t')
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 498, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 275, in _read
        parser = TextFileReader(filepath_or_buffer, **kwds)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 590, in __init__
        self._make_engine(self.engine)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 731, in _make_engine
        self._engine = CParserWrapper(self.f, **self.options)
      File "C:\Users\FedRo\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1103, in __init__
        self._reader = _parser.TextReader(src, **kwds)
      File "pandas\parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5030)
    ValueError: No columns to parse from file
    

    Could you please point me to where the error is? Thank you.

    opened by fedro86 2
  • Added quick fix which should allow the model to run only the single-i…

    If the user sets the number of two-way interactions to 0, the model output file used by the wrapper contains blanks under the pairwise interaction label. Under the current version, this raises a ValueError in the wrapper because the wrapper assumes there will always be a parseable float following the two-way interaction header.

    I added a quick fix which catches this ValueError and sets the two-way interactions to 0 if an incompatible (non-numerical) value is parsed. This allows users to set the number of two-way interactions (k2) to 0, which is useful for running the one-way or global-bias models as a baseline. This is more closely in line with the libFM binary's behavior, which, when the number of two-way interactions is set to 0, models only the global or one-way biases.

    There may be a way to handle the k2 input as 0 earlier in the setup phase of the wrapper, but this fix works. Catching the exception as a ValueError may create issues if for some reason the temporary model output file outputs other incompatible values which don't appear as a result of the user inputting 0 for k2.

    opened by bbradt 1
  • will it work for third order categorical features interaction

    Great code, thanks !

    Please help me understand:

    1. Will it work for third-order categorical feature interactions?
    2. Will it run on a Windows computer?
    3. Will it work for sparse data?

    opened by Sandy4321 0
  • Running error with __init__.py

    Hi, I am trying the example code but got some error message w.r.t. the __init__.py (see below). Is there anything I can do to avoid the error? Thanks!

    EmptyDataError                            Traceback (most recent call last)
    in ()
         20 # split features and target for train/test
         21 # first 5 are train, last 2 are test
    ---> 22 model = fm.run(features[:5], target[:5], features[5:], target[5:])
         23 print(model.predictions)
         24 # you can also get the model weights

    F:\Anaconda\lib\site-packages\pywFM\__init__.py in run(self, x_train, y_train, x_test, y_test, x_validation_set, y_validation_set)
        228 # parses rlog into
        229 import pandas as pd
    --> 230 rlog = pd.read_csv(rlog_path, sep='\t')
        231 os.close(rlog_fd)
        232 os.remove(rlog_path)

    opened by helenxu 8
  • Global bias is None, even k0 is True

    I was trying to calculate the prediction values myself with the weights output by the model and always got the wrong answer. Then I realized that the bias value is always None, even if I set k0=True manually.

    fm = pywFM.FM(task='regression', num_iter=5, k2=2, k0=True)

    I don't know whether the wrong answer came from the absence of bias, but it seems strange that the global_bias is always None.

    How can I fix it? Or, is there any possibility I can calculate the prediction myself?

    opened by snowsteper 2
  • Model cannot be saved when k2 = 0

    When the latent dimension is 0, libFM still performs training, but pywFM can't save the model.

    Code:

    fm = pywFM.FM(task='classification', num_iter=5, k2=0, rlog=False)
    

    Error:

    Traceback (most recent call last):
      File "fm_assist.py", line 34, in <module>
        model = fm.run(X_train, df_train['outcome'], X_test, df_test['outcome'])
      File "/Users/jilljenn/code/TF-recomm/venv/lib/python3.6/site-packages/pywFM/__init__.py", line 222, in run
        pairwise_interactions.append([float(x) for x in line.split(' ')])
      File "/Users/jilljenn/code/TF-recomm/venv/lib/python3.6/site-packages/pywFM/__init__.py", line 222, in <listcomp>
        pairwise_interactions.append([float(x) for x in line.split(' ')])
    ValueError: could not convert string to float: 
    
    opened by jilljenn 5