TensorFlow implementation of an arbitrary order Factorization Machine

Overview

This is a TensorFlow implementation of an arbitrary-order (>=2) Factorization Machine, based on the paper Factorization Machines with libFM.

It supports:

  • dense and sparse inputs
  • different (gradient-based) optimization methods
  • classification/regression via different loss functions (logistic and mse implemented)
  • logging via TensorBoard

The inference time is linear with respect to the number of features.
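For intuition, here is a minimal NumPy sketch of how linear-time inference works for order 2 (the power-sum identity, Lemma 3.1 of Rendle's FM paper, also discussed in the issues below). The names V and pairwise_term are illustrative, not tffm's actual code:

import numpy as np

# Illustrative sketch (not tffm's actual code): the order-2 pairwise term
#   sum_{i<j} <v_i, v_j> x_i x_j
# equals 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ],
# which needs only one pass over the features instead of a double loop.
def pairwise_term(x, V):
    linear = x @ V                   # shape (rank,): sum_i V[i,f] * x_i
    squares = (x ** 2) @ (V ** 2)    # shape (rank,): corrects for i == j terms
    return 0.5 * np.sum(linear ** 2 - squares)

x = np.array([1.0, 0.0, 2.0])        # toy sample
V = np.random.randn(3, 4)            # hypothetical (n_features, rank) factors
print(pairwise_term(x, V))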

Tested on Python 3.5, but should also work on Python 2.7.

This implementation is quite similar to the one described in the paper by Blondel et al. [https://arxiv.org/abs/1607.07195], but was developed independently, before the paper first appeared.

Dependencies

Installation

The stable version can be installed via pip install tffm.

Usage

The interface is similar to scikit-learn models. To train a 6th-order FM model with rank=10 for 100 epochs with learning_rate=0.01, use the following sample:

import tensorflow as tf
from tffm import TFFMClassifier

model = TFFMClassifier(
    order=6,
    rank=10,
    optimizer=tf.train.AdamOptimizer(learning_rate=0.01),
    n_epochs=100,
    batch_size=-1,
    init_std=0.001,
    input_type='dense'
)
model.fit(X_tr, y_tr, show_progress=True)
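
After fitting, prediction follows the scikit-learn convention; a short sketch (X_te is a hypothetical test matrix):

predictions = model.predict(X_te)
model.destroy()  # close the underlying TF session and free resources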

See example.ipynb and gpu_benchmark.ipynb for more details.

It's highly recommended to read tffm/core.py to understand the implementation details.

Testing

Just run python test.py in the terminal. nosetests works too, but you must pass the --logging-level=WARNING flag to avoid printing huge amounts of TensorFlow logs to the screen.

Citation

If you use this software in academic research, please cite it using the following BibTeX:

@misc{trofimov2016,
author = {Mikhail Trofimov and Alexander Novikov},
title = {tffm: TensorFlow implementation of an arbitrary order Factorization Machine},
year = {2016},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/geffy/tffm}},
}
Comments
  • Custom loss function

    Added support for custom loss functions in TFFMClassifier, and implemented cross-entropy in the utils module. Also added a class_weight parameter, which automatically uses weighted cross-entropy. One can pass class_weight = "balanced" to use the heuristic pos_weight = n_negative / n_positive, where n_positive is the number of positive samples in the training labels (see the sketch at the end of this comment).

    Note that custom loss functions are disallowed for TFFMRegressor at the moment. Adding them would be trivial, but I don't know of a use case where you'd want something other than MSE, so I left it alone.

    I will add a demo of this functionality to the example notebook sometime soon.
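
    In the meantime, a minimal TF1-style sketch of that heuristic (assuming TensorFlow 1.x; names are illustrative and this is not necessarily the exact code added to utils):

        import numpy as np
        import tensorflow as tf  # assumes TensorFlow 1.x

        y = np.array([1., 0., 0., 0.], dtype=np.float32)
        # class_weight="balanced" heuristic: pos_weight = n_negative / n_positive
        pos_weight = np.sum(y == 0) / np.sum(y == 1)

        loss = tf.nn.weighted_cross_entropy_with_logits(
            targets=tf.constant(y),
            logits=tf.constant([2.0, -1.0, 0.5, -3.0]),
            pos_weight=pos_weight)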

    opened by peterewills 8
  • Implement regression part

    Thanks for the great repo, which inspired me to learn TensorFlow; I am trying to understand the tffm code line by line.

    I am planning to implement the regression part. It's not easy for me, but I would like to give it a try. If anybody has finished this, please give me some hints.

    opened by Vimos 6
  • Incorrect code when order >= 3

    The code for computing predictions (https://github.com/geffy/tffm/blob/master/tffm.py#L212) is incorrect.

    You naively applied Lemma 3.1 from Rendle's original paper (http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf), but this is incorrect when order >= 3.

    If you compare with the predictions obtained by Equation (5) in the paper, you'll see that the predictions are not the same.
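
    To make the point concrete, a NumPy sketch of the order-3 case (illustrative, not tffm's code): the naive cube of the power sum also counts tuples with repeated indices, while the elementary-symmetric-polynomial identity removes them.

        import itertools
        import numpy as np

        # For scalars a_i:
        #   sum_{i<j<k} a_i a_j a_k = (p1^3 - 3*p1*p2 + 2*p3) / 6,
        # where p_m = sum_i a_i^m.  The naive p1^3 / 6 also counts
        # tuples with i == j etc., which is the bug described above.
        a = np.random.randn(5)
        p1, p2, p3 = a.sum(), (a ** 2).sum(), (a ** 3).sum()

        exact = sum(x * y * z for x, y, z in itertools.combinations(a, 3))
        naive = p1 ** 3 / 6
        corrected = (p1 ** 3 - 3 * p1 * p2 + 2 * p3) / 6
        print(np.isclose(naive, exact), np.isclose(corrected, exact))  # expect: False True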

    opened by mblondel 6
  • got NaN issue running on a sparse data

    Hi,

    I tried to run TFFMClassifier on sparse data (for example: https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt), loaded with load_svmlight_file, but got an error when fitting:

        tensorflow.python.framework.errors.InvalidArgumentError: NaN or Inf in target value : Tensor had NaN values
        [[Node: target/CheckNumerics = CheckNumerics[T=DT_FLOAT, _class=["loc:@add"], message="NaN or Inf in target value", _device="/job:localhost/replica:0/task:0/cpu:0"]]]
        Caused by op u'target/CheckNumerics', defined at:
          File "/usr/local/lib/python2.7/dist-packages/tffm/testtffm.py", line 51, in <module>
            model.fit(X_tr.toarray(), y_tr, show_progress=True)
          File "/usr/local/lib/python2.7/dist-packages/tffm/tffm/base.py", line 242, in fit
            self.core.build_graph()
          File "/usr/local/lib/python2.7/dist-packages/tffm/tffm/core.py", line 208, in build_graph
            self.init_target()
          File "/usr/local/lib/python2.7/dist-packages/tffm/tffm/core.py", line 191, in init_target
            msg='NaN or Inf in target value', name='target')
          File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/numerics.py", line 42, in verify_tensor_all_finite
            verify_input = array_ops.check_numerics(t, message=msg)

    It seems the problem is in self.loss = self.loss_function(self.outputs, self.train_y), which somehow generates NaN values.

    Can someone look at this issue? thanks.

    opened by VinceShieh 5
  • Make installable

    Was poking around and noticed this wasn't installable. TensorFlow isn't on PyPI AFAIK, so this will fail if it's not installed, but the error message is OK. Might want to add a note to the README. Not sure what you want to put for version and author information.

    opened by jseabold 4
  • introduced new parameter to have a different batch size for training and testing

    Hi geffy! I made a minor change to make it possible to run prediction with a different batch size than the one used for training.

    This is useful when you want to predict the scores for a given matrix (for example, the test-set matrix) in one go.
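
    Given the predict signature visible elsewhere on this page (predict(self, X, pred_batch_size=None)), usage presumably looks like this (X_te is a hypothetical test matrix):

        # score the whole test matrix in large chunks, independent of the
        # batch_size used for training
        predictions = model.predict(X_te, pred_batch_size=4096)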

    Best, Babak.

    opened by babakx 3
  • about l2 regularization

    Hello geffy, tffm is very useful. I have a small question: how do you deal with l2 regularization when the input is sparse? Looking at your code, the regularization term includes all of the parameters, but when the input data is very sparse, the factorization machine only uses a small part of the parameters (e.g. not all of the first-order weights w are used). Regularizing this way changes all of the parameters instead of only the active ones (see the sketch below). Do you think this is an issue?
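
    To illustrate the distinction, a tiny NumPy sketch (illustrative only, not tffm's code):

        import numpy as np

        w = np.random.randn(5)                    # first-order weights
        x = np.array([0., 1., 0., 0., 2.])        # sparse sample: features 1, 4 active

        full_l2 = np.sum(w ** 2)                  # penalizes every weight
        masked_l2 = np.sum((w * (x != 0)) ** 2)   # penalizes only active weights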

    opened by graytowne 3
  • is the output of model the same as original FM?

    Is self.outputs in core.py the same as the prediction equation from the original FM model paper,

        \hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j ?

    I have trained a model using tffm and I want to run inference from C++ using the saved model, so I need to re-implement the output computation myself. I find the output in your model a little confusing: does it follow an equation for the order-2 FM model like the one above, or is it the same equation?

    opened by kdqzzxxcc 2
  • Use of self.train_w in TFFMCore.init_loss()

    I am trying to understand how tffm works, but I can't figure out why self.loss is obtained by multiplying self.loss_function by self.train_w in the class TFFMCore. I would have thought that self.train_w shouldn't be there...

        def init_loss(self):
            with tf.name_scope('loss') as scope:
                # train_w holds per-sample weights (fit's sample_weight
                # argument), scaling each example's contribution to the loss
                self.loss = self.loss_function(self.outputs, self.train_y) * self.train_w
                self.reduced_loss = tf.reduce_mean(self.loss)
                tf.summary.scalar('loss', self.reduced_loss)
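
    For context, train_w is fed from fit's sample_weight argument (its signature, fit(self, X, y, sample_weight=None, ...), is visible in a traceback elsewhere on this page), so a hypothetical usage sketch is:

        import numpy as np

        # up-weight the positive class 2:1; each example's loss is scaled
        # by its weight before the mean is taken
        weights = np.where(y_tr == 1, 2.0, 1.0)
        model.fit(X_tr, y_tr, sample_weight=weights, show_progress=True)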
    
    opened by martincousi 2
  • Question about data format

    Hi! I just want to ask: do we need to transform every feature column in the dataset to a 0/1 representation? I know we need to transform the categorical variables, but what about the numerical variables (like price)? Do we also need to transform them?
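
    For reference, the common encoding scheme for FM inputs one-hot encodes the categoricals while numeric columns may stay real-valued; a toy sketch (hypothetical data; the string support and sparse= argument of OneHotEncoder assume scikit-learn >= 0.20):

        import numpy as np
        from sklearn.preprocessing import OneHotEncoder

        cities = np.array([['NY'], ['LA'], ['NY']])   # categorical column
        prices = np.array([[10.0], [12.5], [9.9]])    # numeric column, kept as-is

        X_cat = OneHotEncoder(sparse=False).fit_transform(cities)
        X = np.hstack([X_cat, prices])                # mixed design matrix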

    Besides, when I transformed all variables to 0/1 representations, I got 550+ columns, and I also have 100,000 rows. When I train the model, I always get this error: NaN or Inf in w[2]. : Tensor had NaN values. But I am pretty sure there are no numbers other than 0/1. How does this happen? However, when I only use 90,000 rows of my dataset, the problem disappears. I really don't know why and I really need your help!

    Thanks a lot! Weisi

    opened by BlaBlaPer 2
  • Errors while working with TensorFlow 1.3

    I noticed the README mentions TF 1.0, but thought I'd report this anyway; if it's easy, I can fix it. Running test.py on TF 1.3 results in the errors below, and it seems decision_function() has changed:

    ======================================================================
    ERROR: test_dense_FM (__main__.TestFM)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test.py", line 54, in test_dense_FM
        self.decision_function_order_4(input_type='dense', use_diag=False)
      File "test.py", line 48, in decision_function_order_4
        actual = model.decision_function(X)
    TypeError: decision_function() takes exactly 3 arguments (2 given)
    
    ======================================================================
    ERROR: test_dense_PN (__main__.TestFM)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test.py", line 57, in test_dense_PN
        self.decision_function_order_4(input_type='dense', use_diag=True)
      File "test.py", line 48, in decision_function_order_4
        actual = model.decision_function(X)
    TypeError: decision_function() takes exactly 3 arguments (2 given)
    
    ======================================================================
    ERROR: test_sparse_FM (__main__.TestFM)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test.py", line 60, in test_sparse_FM
        self.decision_function_order_4(input_type='sparse', use_diag=False)
      File "test.py", line 48, in decision_function_order_4
        actual = model.decision_function(X)
    TypeError: decision_function() takes exactly 3 arguments (2 given)
    
    ======================================================================
    ERROR: test_sparse_PN (__main__.TestFM)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test.py", line 63, in test_sparse_PN
        self.decision_function_order_4(input_type='sparse', use_diag=True)
      File "test.py", line 48, in decision_function_order_4
        actual = model.decision_function(X)
    TypeError: decision_function() takes exactly 3 arguments (2 given)
    
    ----------------------------------------------------------------------
    Ran 4 tests in 8.957s
    
    FAILED (errors=4)
    
    opened by aliostad 2
  • model.fit(X_tr, y_tr, show_progress=True)

    I used the sparse input type and the following error occurs:

        AttributeError                          Traceback (most recent call last)
        ----> 1 model.fit(X_tr, y_tr, show_progress=True)

        ~\anaconda3\lib\site-packages\tffm\models.py in fit(self, X, y, sample_weight, n_epochs, show_progress)
            124 def fit(self, X, y, sample_weight=None, n_epochs=None, show_progress=False):
            125     sample_weight = np.ones_like(y) if sample_weight is None else sample_weight
        --> 126     self.fit(X=X, y_=y, w_=sample_weight, n_epochs=n_epochs, show_progress=show_progress)
            127
            128 def predict(self, X, pred_batch_size=None):

        ~\anaconda3\lib\site-packages\tffm\base.py in fit(self, X, y_, w_, n_epochs, show_progress)
            224 # iterate over batches
            225 for bX, bY, bW in batcher(X_[perm], y_=y_[perm], w_=w_[perm], batch_size=self.batch_size):
        --> 226     fd = batch_to_feeddict(bX, bY, bW, core=self.core)
            227     ops_to_run = [self.core.trainer, self.core.target, self.core.summary_op]
            228     result = self.session.run(ops_to_run, feed_dict=fd)

        ~\anaconda3\lib\site-packages\tffm\base.py in batch_to_feeddict(X, y, w, core)
             79 # sparse case
             80 X_sparse = X.tocoo()
        ---> 81 fd[core.raw_indices] = np.hstack(
             82     (X_sparse.row[:, np.newaxis], X_sparse.col[:, np.newaxis])
             83 ).astype(np.int64)

        AttributeError: 'TFFMCore' object has no attribute 'raw_indices'

    opened by zenglongjin 0
  • ALS / MCMC

    This is not an issue but more of a question: is it the case that tffm supports SGD-style (gradient-based) optimizers but not the ALS and MCMC algorithms? That was my understanding from a quick look at the code and README.

    opened by juhoimmonen 1
  • Issue with tensorflow 2.0: module 'tensorflow_core._api.v2.train' has no attribute 'AdamOptimizer'

    With TensorFlow 2.0 installed (pip install tensorflow==2.0), running

        import numpy as np
        import tensorflow as tf

        from tffm import TFFMClassifier

    gives the error:

        AttributeError                          Traceback (most recent call last)
        in <module>
        ----> 1 from tffm import TFFMClassifier

        ~/anaconda3/lib/python3.7/site-packages/tffm/__init__.py in <module>
        ----> 1 from .models import TFFMClassifier, TFFMRegressor
              2
              3 __all__ = ['TFFMClassifier', 'TFFMRegressor']

        ~/anaconda3/lib/python3.7/site-packages/tffm/models.py in <module>
              2
              3 import numpy as np
        ----> 4 from .base import TFFMBaseModel
              5 from .utils import loss_logistic, loss_mse, sigmoid
              6

        ~/anaconda3/lib/python3.7/site-packages/tffm/base.py in <module>
              1 import tensorflow as tf
        ----> 2 from .core import TFFMCore
              3 from sklearn.base import BaseEstimator
              4 from abc import ABCMeta, abstractmethod
              5 import six

        ~/anaconda3/lib/python3.7/site-packages/tffm/core.py in <module>
              4
              5
        ----> 6 class TFFMCore():
              7     """This class implements underlying routines about creating computational graph.
              8

        ~/anaconda3/lib/python3.7/site-packages/tffm/core.py in TFFMCore()
             94     """
             95     def __init__(self, order=2, rank=2, input_type='dense', loss_function=None,
        ---> 96                  optimizer=tf.train.AdamOptimizer(learning_rate=0.01), reg=0,
             97                  init_std=0.01, use_diag=False, reweight_reg=False,
             98                  seed=None):

        AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'AdamOptimizer'
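
    The immediate cause is that TF 2.x removed the tf.train.AdamOptimizer alias (it now lives under tf.compat.v1.train). Since tffm itself references tf.train.AdamOptimizer at import time, pinning TensorFlow 1.x is the likely fix; the compat alias is sketched below for reference only:

        import tensorflow as tf

        # TF 2.x location of the TF1 optimizer; note this alone does not fix
        # tffm's own import-time reference to tf.train.AdamOptimizer
        optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.01)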

    opened by hugocool 2
  • Huge variations in predictions as the seed changes.

    I ran the model on my data and saw huge variations in predictions compared to previous training runs on the same data. How can I tackle that so that the predictions are as good as possible?

    opened by akshit96 0