A Python library for implementing a recommender system

Overview

python-recsys

A Python library for implementing a recommender system.

Installation

Dependencies

python-recsys is built on top of Divisi2, with csc-pysparse (Divisi2 also requires NumPy and uses NetworkX).

python-recsys also requires SciPy.

To install the dependencies on Ubuntu, run something like this:

sudo apt-get install python-scipy python-numpy
sudo apt-get install python-pip
sudo pip install csc-pysparse networkx divisi2

# If you don't have pip installed, use easy_install instead:
# sudo easy_install csc-pysparse
# sudo easy_install networkx
# sudo easy_install divisi2

Download

Download python-recsys from GitHub.

Install

tar xvfz python-recsys.tar.gz
cd python-recsys
sudo python setup.py install

Example

  1. Load Movielens dataset:
from recsys.algorithm.factorize import SVD
svd = SVD()
svd.load_data(filename='./data/movielens/ratings.dat',
            sep='::',
            format={'col':0, 'row':1, 'value':2, 'ids': int})
  1. Compute Singular Value Decomposition (SVD), M=U Sigma V^t:
k = 100
svd.compute(k=k,
            min_values=10,
            pre_normalize=None,
            mean_center=True,
            post_normalize=True,
            savefile='/tmp/movielens')
  1. Get similarity between two movies:
ITEMID1 = 1    # Toy Story (1995)
ITEMID2 = 2355 # A Bug's Life (1998)

svd.similarity(ITEMID1, ITEMID2)
# 0.67706936677315799
  1. Get movies similar to Toy Story:
svd.similar(ITEMID1)

# Returns: <ITEMID, Cosine Similarity Value>
[(1,    0.99999999999999978), # Toy Story
 (3114, 0.87060391051018071), # Toy Story 2
 (2355, 0.67706936677315799), # A Bug's Life
 (588,  0.5807351496754426),  # Aladdin
 (595,  0.46031829709743477), # Beauty and the Beast
 (1907, 0.44589398718134365), # Mulan
 (364,  0.42908159895574161), # The Lion King
 (2081, 0.42566581277820803), # The Little Mermaid
 (3396, 0.42474056361935913), # The Muppet Movie
 (2761, 0.40439361857585354)] # The Iron Giant
  1. Predict the rating a user (USERID) would give to a movie (ITEMID):
MIN_RATING = 0.0
MAX_RATING = 5.0
ITEMID = 1
USERID = 1

svd.predict(ITEMID, USERID, MIN_RATING, MAX_RATING)
# Predicted value 5.0

svd.get_matrix().value(ITEMID, USERID)
# Real value 5.0
  1. Recommend (non-rated) movies to a user:
svd.recommend(USERID, is_row=False) # cols are users and rows are items, so we set is_row=False

# Returns: <ITEMID, Predicted Rating>
[(2905, 5.2133848204673416), # Shaggy D.A., The
 (318,  5.2052108435956033), # Shawshank Redemption, The
 (2019, 5.1037438278755474), # Seven Samurai (The Magnificent Seven)
 (1178, 5.0962756861447023), # Paths of Glory (1957)
 (904,  5.0771405690055724), # Rear Window (1954)
 (1250, 5.0744156653222436), # Bridge on the River Kwai, The
 (858,  5.0650911066862907), # Godfather, The
 (922,  5.0605327279819408), # Sunset Blvd.
 (1198, 5.0554543765500419), # Raiders of the Lost Ark
 (1148, 5.0548789542105332)] # Wrong Trousers, The
  1. Which users should see Toy Story? (That is, which users who have not rated Toy Story would give it a high rating?)
svd.recommend(ITEMID)

# Returns: <USERID, Predicted Rating>
[(283,  5.716264440514446),
 (3604, 5.6471765418323141),
 (5056, 5.6218800339214496),
 (446,  5.5707524860615738),
 (3902, 5.5494529168484652),
 (4634, 5.51643364021289),
 (3324, 5.5138903299082802),
 (4801, 5.4947999354188548),
 (1131, 5.4941438045650068),
 (2339, 5.4916048051511659)]
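
The similarity scores above are cosine similarities, and svd.predict is given a rating range. As a rough illustration of those two ideas only (this is not python-recsys's own code), the sketch below computes a cosine similarity between two vectors and clamps a raw score into [MIN_RATING, MAX_RATING]. The function names cosine_similarity and clamp_rating and the toy vectors are assumptions made up for the example; in the library the vectors would come from the reduced SVD space.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_a * norm_b)


def clamp_rating(value, min_rating, max_rating):
    """Force a raw score into the closed range [min_rating, max_rating]."""
    return max(min_rating, min(max_rating, value))


# Toy item vectors (hypothetical, not taken from MovieLens).
item_a = [1.0, 2.0, 0.0]
item_b = [2.0, 4.0, 0.0]  # same direction as item_a
item_c = [0.0, 0.0, 3.0]  # orthogonal to item_a

print(cosine_similarity(item_a, item_b))  # very close to 1
print(cosine_similarity(item_a, item_c))  # exactly 0.0
print(clamp_rating(5.21, 0.0, 5.0))       # clamped to the maximum
```

Floating-point rounding explains why an item compared with itself can score 0.99999999999999978 rather than exactly 1, as in the Toy Story row above.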

Documentation

Documentation and examples are available here.

To build the HTML documentation files from doc/source, run:

cd doc
make html

HTML files are created here:

doc/build/html/index.html
Comments
  • factorize.py shows an indentation error on every line after line 15, even though there is none

    After step 1 (loading the data set), I get an error at line 15 of factorize.py and on every line after it. I am new to Python; is there something I am doing wrong? This is what I did:

    1) installed Python
    2) installed pip and the dependencies
    3) placed the MovieLens data set in the expected location
    4) installed python-recsys-master

    Now, when I run the first step, I get this error even though I have installed divisi2:

    Traceback (most recent call last):
      File "load.py", line 1, in <module>
        from recsys.algorithm.factorize import SVD
      File "/home/abhimanyu/Desktop/python-recsys-master/recsys/algorithm/factorize.py", line 15, in <module>
        from csc import divisi2
    ImportError: No module named csc

    opened by abhinabyu 10
  • No data set. Matrix is empty!

    Following the Algorithms section of the python-recsys v1.0 documentation, I put the MovieLens 1M ratings.dat file in /usr/local/python-recsys-master/recsys/data/movielens/ and then loaded the data:

    from recsys.algorithm.factorize import SVD
    filename = '/usr/local/python-recsys-master/recsys/data/movielens/ratings.dat'
    svd = SVD()
    svd.load_data(filename=filename, sep='::', format={'col':0, 'row':1, 'value':2, 'ids':int})
    
    from recsys.datamodel.data import Data
    from recsys.algorithm.factorize import SVD
    filename = '/usr/local/python-recsys-master/recsys/data/movielens/ratings.dat'
    data = Data()
    format = {'col':0, 'row':1, 'value':2, 'ids': int}
    data.load(filename, sep='::', format=format)
    train, test = data.split_train_test(percent=80)
    svd = SVD()
    svd.set_data(train)
    
    from recsys.utils.svdlibc import SVDLIBC
    svdlibc = SVDLIBC('/usr/local/python-recsys-master/recsys/data/movielens/ratings.dat')
    svdlibc.to_sparse_matrix(sep='::', format={'col':0, 'row':1, 'value':2, 'ids': int})
    svdlibc.compute(k=100)
    svd = svdlibc.export()
    

    But when I call compute, I get ValueError: No data set. Matrix is empty! I want to know why.

    K=100
    svd.compute(k=K, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True, 
    savefile=None)
    Traceback (most recent call last):
    File "", line 3, in 
    File "recsys/algorithm/factorize.py", line 244, in compute
    super(SVD, self).compute(min_values)
    File "recsys/algorithm/baseclass.py", line 126, in compute
    raise ValueError('No data set. Matrix is empty!')
    ValueError: No data set. Matrix is empty!
    
    opened by cycchao 6
  • KeyError in svd.recommend()

    I have a .dat file with 3705912 lines, which gives a 602x6156 matrix. When I call svd.recommend for an identifier that has only a few values, I get a KeyError.

    svd.get_matrix().get_col(50652)
    
    SparseVector (4 of 602 entries): [6840789=14, 6843925=100, 6843926=100, 6843927=16]
    
    
    svd.recommend(50652, is_row=False)
    
    Traceback (most recent call last):
      File "svd.py", line 251, in <module>
        print svd.recommend(50652, is_row=False)
      File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/recsys/algorithm/factorize.py", line 352, in recommend
        item = self._get_col_reconstructed(i, zeros)
      File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/recsys/algorithm/factorize.py", line 300, in _get_col_reconstructed
        return self._matrix_reconstructed.col_named(j)
      File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/divisi2/labels.py", line 65, in col_named
        return self[:,self.col_index(label)]
      File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/divisi2/labels.py", line 57, in col_index
        return self.col_labels.index(label)
    KeyError: 50652
    

    But for an identifier that has many values, it works:

    svd.get_matrix().get_col(10536)
    
    SparseVector (22 of 602 entries): [6840778=96, 6840779=100, 6840780=100, 6840781=100, 6840782=100, 6840783=100, 6840784=100, 6840785=100, 6840786=65, 6840818=83, 6840819=100, 6840820=100, 6840821=100, 6840822=100, 6840823=100, 6840824=100, 6840825=100, 6840826=100, 6840827=100, 6840828=21, ...]
    
    
    svd.recommend(10536, is_row=False)
    
    [(6900161, 100.00000000232269), (6840819, 100.00000000214945), (6840822, 100.00000000186564), (6840821, 100.00000000178625), (6840820, 100.0000000016603), (6840783, 100.00000000144556), (6840827, 100.00000000137024), (6840826, 100.00000000134551), (6840784, 100.00000000123218), (6840825, 100.00000000112296)]
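
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the model was computed with a min_values threshold (the README example uses min_values=10), columns with fewer entries may be dropped before factorization, so their labels no longer exist in the reconstructed matrix. The toy sketch below mimics that behaviour with plain dictionaries. drop_sparse_columns is a hypothetical helper, not part of python-recsys or Divisi2, and the values in the second column are invented.

```python
def drop_sparse_columns(columns, min_values):
    """Keep only the columns that have at least `min_values` entries."""
    return {label: entries
            for label, entries in columns.items()
            if len(entries) >= min_values}


# Toy matrix: column label -> {row label: value}.
columns = {
    # 4 entries, as in the failing call above
    50652: {6840789: 14, 6843925: 100, 6843926: 100, 6843927: 16},
    # 22 entries (values invented for the sketch)
    10536: {6840778 + i: 100 for i in range(22)},
}

kept = drop_sparse_columns(columns, min_values=10)

print(10536 in kept)  # True: enough entries to survive the filter
print(50652 in kept)  # False: dropped, so looking it up fails

try:
    kept[50652]
except KeyError as err:
    print("KeyError:", err)
```

If this is indeed the cause, lowering min_values in svd.compute(...) or checking that the label survived before calling svd.recommend(...) should avoid the exception.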
    
    opened by igor87z 5
  • RuntimeWarning: invalid value encountered in divide (special characters in user_id)

    Hi, I need to get recommendations for id_user="X3escB3aJ_rP1u5DaTN9cw", but it fails, apparently because of the special character in the ID. This is the output for id_user="X3escB3aJ_rP1u5DaTN9cw":

    X3escB3aJ_rP1u5DaTN9cw
    Creating matrix (426351 tuples)
    Matrix density is: 0.0237%
    Updating matrix: squish to at least 10 values
    Computing svd k=10, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True
    [WARNING] mean_center is True. svd.similar(...) might return nan's. If so, then do svd.compute(..., mean_center=False)
    /root/anaconda2/lib/python2.7/site-packages/divisi2/dense.py:269: RuntimeWarning: invalid value encountered in divide
      return self / norms
    Traceback (most recent call last):
      File "ejemplo.py", line 47, in <module>
        recomendaciones= svd.recommend(id_user, n=10,is_row=False)
      File "build/bdist.linux-x86_64/egg/recsys/algorithm/factorize.py", line 352, in recommend
      File "build/bdist.linux-x86_64/egg/recsys/algorithm/factorize.py", line 300, in _get_col_reconstructed
      File "/root/anaconda2/lib/python2.7/site-packages/divisi2/labels.py", line 65, in col_named
        return self[:,self.col_index(label)]
      File "/root/anaconda2/lib/python2.7/site-packages/divisi2/labels.py", line 57, in col_index
        return self.col_labels.index(label)
    KeyError: 'X3escB3aJ_rP1u5DaTN9cw'

    Thanks

    opened by FrancyPinedaB77 4
  • SVD.compute() kernel fail on Windows

    I used the MovieLens SVD example in my class tonight to show them a recommender system.

    People (including me) who were using a Mac went through the tutorial without issues.

    For everyone on Windows (7 and 8.1), the kernel fails at the "svd.compute" step.

    I believe everyone is using the 2.7+ Anaconda version of Python.

    Thoughts?

    opened by mylesg 4
  • ValueError: No data set, Matrix is empty!

    I have tried several data sets, but when I try to call svd.similarity() or svd.recommend(), the console output is:

    Traceback (most recent call last):
      File "recsys_data.py", line 20, in <module>
        svd.compute(k=k, min_values=5, pre_normalize=None, mean_center=True, post_normalize=True,savefile=None)
      File "/usr/local/lib/python2.7/dist-packages/python_recsys-0.2-py2.7.egg/recsys/algorithm/factorize.py", line 244, in compute
        super(SVD, self).compute(min_values)
      File "/usr/local/lib/python2.7/dist-packages/python_recsys-0.2-py2.7.egg/recsys/algorithm/baseclass.py", line 126, in compute
        raise ValueError('No data set. Matrix is empty!')
    ValueError: No data set. Matrix is empty!

    I want to know why.

    opened by yllions 3
  • Problem in load() method

    If you do this:

    rmse = RMSE()
    rmse.load(x, y)

    with x and y being one-dimensional NumPy arrays, you get this:

    in compute(self)
         98         Computes the evaluation using the loaded ground truth and test lists
         99         """
    --> 100         if not self._ground_truth:
        101             raise ValueError('Ground Truth dataset is empty!')
        102         if not self._test:

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

    I fixed it this way:

        if self._ground_truth is None:
            raise ValueError('Ground Truth dataset is empty!')
        if self._test is None:
            raise ValueError('Test dataset is empty!')
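
Note that an `is None` check only catches a missing data set; an empty list would no longer raise. A length-based check covers both cases without ever evaluating the truth value of the whole array. The sketch below illustrates this with a hypothetical helper, is_empty, which is not part of python-recsys. AmbiguousArray is a stand-in that mimics the NumPy behaviour so that the example runs without NumPy installed.

```python
class AmbiguousArray(list):
    """Stand-in for a NumPy array: truth-testing it raises ValueError."""

    def __bool__(self):
        raise ValueError("The truth value of an array with more than one "
                         "element is ambiguous.")

    __nonzero__ = __bool__  # Python 2 name of the same hook


def is_empty(data):
    """True if `data` is None or has no elements; never calls bool(data)."""
    return data is None or len(data) == 0


ground_truth = AmbiguousArray([3.0, 4.5, 5.0])

# The original `if not self._ground_truth:` pattern fails on such objects.
try:
    if not ground_truth:
        pass
except ValueError as err:
    print("truth test failed:", err)

# The length-based check works for arrays, lists and None alike.
print(is_empty(ground_truth))  # False
print(is_empty([]))            # True
print(is_empty(None))          # True
```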
    
    opened by matiasherranz 3
  • IndexError: Error creating second index list

    Hi,

    I have this:

    user=1
    stars=nn.get("rating")
    movie_id=nn.get("movieId")
    id_user=nn.get("userId")
    tupla=(stars,movie_id,id_user)  # e.g. tupla=(4.0, 1193, 2)
    data.add_tuple(tupla)

    and I get the following error when I call:

    recomendaciones= svd.recommend(user, n=10, only_unknowns=True, is_row=False)

    Error:

    File "/usr/local/lib/python2.7/dist-packages/pysparse/sparse/pysparseMatrix.py", line 149, in __getitem__
      m = self.matrix[index]
    IndexError: Error creating second index list

    opened by FrancyPinedaB77 2
  • Getting error while loading the data.

    I want to try out python-recsys, but it gives me an error while loading the movie ratings.

    >>> svd.load_data(filename='./movierates.dat', sep='::', format={'col':0,'row':1,'value':2,'ids': int})
    Error (ID is not int) while reading: [u'userId', u'movieId', u'rating', u'timestamp']
    Error while reading: [u'']
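
The two messages suggest that the file starts with a header row (userId, movieId, rating, timestamp) and contains an empty line, neither of which can be parsed as integer IDs. One way to sidestep this, sketched here under that assumption, is to filter such lines out before handing the file to svd.load_data. clean_rating_lines is a hypothetical helper, not part of python-recsys, and the sample lines are only illustrative.

```python
def clean_rating_lines(lines, sep="::"):
    """Yield only the lines whose first two fields are integer IDs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        fields = line.split(sep)
        try:
            int(fields[0])
            int(fields[1])
        except (ValueError, IndexError):
            continue  # skip the header row or any malformed line
        yield line


# Illustrative sample in the "user::item::rating::timestamp" layout.
raw = [
    "userId::movieId::rating::timestamp",
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "",
]

cleaned = list(clean_rating_lines(raw))
print(cleaned)
```

The cleaned lines can then be written to a new file and that file passed to svd.load_data(...) with the same sep and format arguments.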

    opened by dilipbobby 2
  • How to increase the number of similarity/recommending item results

    Hi, I am currently testing the recsys algorithm for finding similar users only. Each result contains 10 users with similarity scores (actually 9 similar users plus the target user itself). I want to increase the number of results from 10 to 50, but I couldn't find a parameter for this in the documentation. Could you please point me in the right direction?

    opened by ohnmar-htun 2
  • SVD.similar() user or item

    When calling SVD.similarity(i, j), how do I specify whether I'm comparing users or items?

    Thanks!

    http://ocelma.net/software/python-recsys/build/html/api.html#recsys.algorithm.baseclass.Algorithm.similarity

    opened by fedeisas 2
  • `recsys` package ownership on PyPI

    Hi @ocelma ,

    I currently "own" (but do not use) the recsys package name on PyPI. recsys was the name I wanted to use when I was developing Surprise, but I later realized that it would conflict with your own python-recsys package, whose import-time name is also recsys. (so I had to come up with a more far-fetched acronym! 😅).

    @jiwidi recently reached out to me, asking if I'd be open to transferring the PyPI recsys name to him. I don't use the namespace myself so I'm happy to transfer; that being said, @jiwidi and I both agree that it's best to reach out to you first.

    Is it fine with you for @jiwidi to own and use the recsys name on PyPI? (that means users will install and use the project with pip install recsys / import recsys). We also reached out to you via email a few weeks ago, so feel free to follow up there if you prefer.

    Thank you!

    opened by NicolasHug 0
  • docs: fix simple typo, relationshops -> relationships

    There is a small typo in recsys/datamodel/data.py.

    Should read relationships rather than relationshops.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 0
  • Added the capability of incremental SVD to python-recsys which is

    Added the capability of incremental SVD to python-recsys, that is, "folding-in" new users (or items) to the SVD model so that they can receive recommendations instantly, without rebuilding the model from scratch each time new users come to the system.

    • New users or items can now be added incrementally, instead of rebuilding the model from scratch, via the folding-in technique described in Sarwar et al.'s paper "Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems" (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3.7894&rep=rep1&type=pdf). This commit is simply an implementation of that technique for python-recsys.

    • A demonstration video is available (https://youtu.be/tIvQxBfa2d4). It shows a demo site built with the MEAN stack that uses the updated python-recsys as its recommender backend: the website's user is folded in to the SVD model and gets recommendations instantly, without rebuilding the model from scratch.

    • For those interested, there is also an accompanying bachelor thesis (https://drive.google.com/file/d/0BylQe2cRVWE_RmZoUTJYSGZNaXM/view), which outlines the background and architecture and discusses the "folding-in" approach.
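
As a minimal sketch of the general folding-in idea (not the code from this pull request), a new user's rating row r can be projected into an existing rank-k model as r · V_k · Σ_k⁻¹, assuming the ratings matrix is laid out as users × items and factorized as U_k Σ_k V_kᵀ. The helper name fold_in_user and the toy factors are assumptions made up for the example. The README example stores items as rows and users as columns, in which case the roles of U and V are swapped.

```python
def fold_in_user(ratings, v_k, sigma_k):
    """Project a new user's rating row into the k-dimensional SVD space.

    Computes ratings * V_k * inverse(Sigma_k), where `ratings` has one
    entry per item, `v_k` is an n_items x k matrix (a list of rows) and
    `sigma_k` holds the k singular values.
    """
    projected = []
    for f in range(len(sigma_k)):
        # Dot product of the rating row with column f of V_k ...
        dot = sum(r * row[f] for r, row in zip(ratings, v_k))
        # ... scaled by the inverse of the matching singular value.
        projected.append(dot / sigma_k[f])
    return projected


# Toy factors (assumed, not from a real model): 3 items, k = 2.
v_k = [[1.0, 0.0],
       [0.0, 1.0],
       [0.0, 0.0]]
sigma_k = [2.0, 1.0]

new_user = [4.0, 3.0, 5.0]  # the new user's ratings for the 3 items
print(fold_in_user(new_user, v_k, sigma_k))  # [2.0, 3.0]
```

The existing factors are left untouched, which is why a folded-in user can be scored immediately; the trade-off is that the model's quality can degrade as more users are folded in without recomputing the SVD.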

    opened by Ibrahim-AbouElseoud 0
  • AveragePrecision in recsys.evaluation.ranking

    Hi Oscar, the calculation of AveragePrecision in recsys.evaluation.ranking is not correct. The returned value should be sum(p_at_k) / number of relevant items, rather than sum(p_at_k) / hits.

    The corresponding part of the evaluation section of your documentation also needs to be changed.

    from recsys.evaluation.ranking import AveragePrecision

    ap = AveragePrecision()

    GT = [1,2,3,4,5]
    q = [1,3,5]
    ap.load(GT, q)
    ap.compute() # returns 1.0, should return 0.6

    GT = [1,2,3,4,5]
    q = [99,3,5]
    ap.load(GT, q)
    ap.compute() # returns 0.5833335, should return 0.23333

    Kind Regards, Siqi
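
The difference between the two normalizations can be reproduced with a small standalone sketch. This is an illustration written for this thread, not the code in recsys.evaluation.ranking; the function name average_precision and its normalize_by parameter are made up for the example.

```python
def average_precision(ground_truth, ranked, normalize_by="relevant"):
    """Average precision of `ranked` against `ground_truth`.

    normalize_by="hits"     divides by the number of hits in `ranked`
    normalize_by="relevant" divides by the number of relevant items
    """
    relevant = set(ground_truth)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / float(rank)  # precision at this rank
    denominator = hits if normalize_by == "hits" else len(relevant)
    return precision_sum / denominator if denominator else 0.0


GT = [1, 2, 3, 4, 5]
for q in ([1, 3, 5], [99, 3, 5]):
    print(q,
          average_precision(GT, q, normalize_by="hits"),
          average_precision(GT, q, normalize_by="relevant"))
```

With these inputs, dividing by hits gives about 1.0 and 0.5833, while dividing by the number of relevant items gives about 0.6 and 0.2333, matching the values quoted above.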

    opened by lisiqi 1
  • replace csc-pysparse with SciPy

    While trying to install csc-pysparse, I got this error:

    pysparse/sparse/src/spmatrixmodule.c:1:20: fatal error: Python.h: No such file or directory

    It seems that csc-pysparse cannot be installed on newer versions of Python (Python 3); check here.

    Could you please port python-recsys to SciPy as an alternative, or otherwise solve this problem?

    opened by molhaMaleh 3
Owner
Oscar Celma