A Python scikit for building and analyzing recommender systems

Overview

Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data.

Surprise was designed with the following purposes in mind:

The name SurPRISE (roughly :) ) stands for Simple Python RecommendatIon System Engine.

Please note that Surprise does not support implicit ratings or content-based information.

Getting started, example

Here is a simple example showing how you can (down)load a dataset, split it for 5-fold cross-validation, and compute the MAE and RMSE of the SVD algorithm.

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Output:

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

            Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std
RMSE        0.9311  0.9370  0.9320  0.9317  0.9391  0.9342  0.0032
MAE         0.7350  0.7375  0.7341  0.7342  0.7375  0.7357  0.0015
Fit time    6.53    7.11    7.23    7.15    3.99    6.40    1.23
Test time   0.26    0.26    0.25    0.15    0.13    0.21    0.06
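
For reference, the two accuracy measures reported in this table can be written out in a few lines of plain Python. This is only an illustrative sketch, independent of Surprise's own accuracy module; the rmse and mae helpers and the sample ratings are made up for the example.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(r - est) for r, est in pairs) / len(pairs)

# Hypothetical (true, estimated) ratings, not taken from any dataset.
pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0), (3.0, 3.0)]
print(f"RMSE: {rmse(pairs):.4f}")
print(f"MAE:  {mae(pairs):.4f}")
```

RMSE penalizes large errors more heavily than MAE, which is why the two columns are usually read together.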

Surprise can do much more (e.g., GridSearchCV)! You'll find more usage examples in the documentation.

Benchmarks

Here are the average RMSE, MAE and total execution time of various algorithms (with their default parameters) on a 5-fold cross-validation procedure. The datasets are the Movielens 100k and 1M datasets. The folds are the same for all the algorithms. All experiments were run on a laptop with a 7th-generation Intel Core i5 (2.5 GHz) and 8 GB of RAM. The code for generating these tables can be found in the benchmark example.

Movielens 100k   RMSE   MAE    Time
SVD              0.934  0.737  0:00:11
SVD++            0.92   0.722  0:09:03
NMF              0.963  0.758  0:00:15
Slope One        0.946  0.743  0:00:08
k-NN             0.98   0.774  0:00:10
Centered k-NN    0.951  0.749  0:00:10
k-NN Baseline    0.931  0.733  0:00:12
Co-Clustering    0.963  0.753  0:00:03
Baseline         0.944  0.748  0:00:01
Random           1.514  1.215  0:00:01

Movielens 1M     RMSE   MAE    Time
SVD              0.873  0.686  0:02:13
SVD++            0.862  0.673  2:54:19
NMF              0.916  0.724  0:02:31
Slope One        0.907  0.715  0:02:31
k-NN             0.923  0.727  0:05:27
Centered k-NN    0.929  0.738  0:05:43
k-NN Baseline    0.895  0.706  0:05:55
Co-Clustering    0.915  0.717  0:00:31
Baseline         0.909  0.719  0:00:19
Random           1.504  1.206  0:00:19

Installation

With pip (you'll need numpy and a C compiler; Windows users might prefer using conda):

$ pip install numpy
$ pip install scikit-surprise

With conda:

$ conda install -c conda-forge scikit-surprise

For the latest version, you can also clone the repo and build the source (you'll first need Cython and numpy):

$ pip install numpy cython
$ git clone https://github.com/NicolasHug/surprise.git
$ cd surprise
$ python setup.py install

License and reference

This project is licensed under the BSD 3-Clause license, so it can be used for pretty much everything, including commercial applications. Please let us know how Surprise is useful to you!

Please make sure to cite the paper if you use Surprise for your research:

@article{Hug2020,
  doi = {10.21105/joss.02174},
  url = {https://doi.org/10.21105/joss.02174},
  year = {2020},
  publisher = {The Open Journal},
  volume = {5},
  number = {52},
  pages = {2174},
  author = {Nicolas Hug},
  title = {Surprise: A Python library for recommender systems},
  journal = {Journal of Open Source Software}
}

Contributors

The following persons have contributed to Surprise:

ashtou, bobbyinfj, caoyi, Олег Демиденко, Charles-Emmanuel Dias, dmamylin, Lauriane Ducasse, Marc Feger, franckjay, Lukas Galke, Tim Gates, Pierre-François Gimenez, Zachary Glassman, Jeff Hale, Nicolas Hug, Janniks, jyesawtellrickson, Doruk Kilitcioglu, Ravi Raju Krishna, Hengji Liu, Maher Malaeb, Manoj K, James McNeilis, Naturale0, nju-luke, Jay Qi, Lucas Rebscher, Skywhat, David Stevens, TrWestdoor, Victor Wang, Mike Lee Williams, Jay Wong, Chenchen Xu, YaoZh1918.

Thanks a lot :) !

Development Status

Starting from version 1.1.0 (September 2019), we will only maintain the package and provide bugfixes. No new features will be considered.

For bugs, issues or questions about Surprise, please use the GitHub project page. Please don't send emails (we will not answer).

Comments
  • Can I get the latent Matrix?

    Description

    Hello. Here is my question: if I use SVD for recommendation, can I get the latent matrices? If so, please tell me how. Thanks.

    Versions

    Surprise v1.0.4

    opened by a6822342 38
  • Remove numpy dependency for installation

    Description

    Installation via pip should automatically install all requirements, including numpy. This is helpful in a continuous integration setting, where setting up the environment with pip install -r requirements.txt is quite common (and doesn't currently work for Surprise).

    Steps/Code to Reproduce

    I tested on both a CircleCI Docker container and an Ubuntu container. The easiest way to reproduce is in Docker, but I get the same result by publishing a simple project to CircleCI.

    ➜  ~ docker run -it circleci/python:3.6.5  # same results with ubuntu
    $ bash
    circleci@54577d1f4689:/$ cd tmp
    circleci@54577d1f4689:/tmp$ echo "joblib==0.12.0
    > numpy==1.14.5
    > scikit-surprise==1.0.6
    > scipy==1.1.0
    > six==1.11.0" > requirements.txt
    circleci@54577d1f4689:/tmp$ 
    circleci@54577d1f4689:/tmp$ pip install -r requirements.txt
    Collecting joblib==0.12.0 (from -r requirements.txt (line 1))
      Downloading https://files.pythonhosted.org/packages/f4/2f/66db8ecbfa71cd36146d894c867b2f595682d620329fd823c5c041687b5f/joblib-0.12.0-py2.py3-none-any.whl (261kB)
        100% |████████████████████████████████| 266kB 16.8MB/s 
    Collecting numpy==1.14.5 (from -r requirements.txt (line 2))
      Downloading https://files.pythonhosted.org/packages/68/1e/116ad560de97694e2d0c1843a7a0075cc9f49e922454d32f49a80eb6f1f2/numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
        100% |████████████████████████████████| 12.2MB 3.9MB/s 
    Collecting scikit-surprise==1.0.6 (from -r requirements.txt (line 3))
      Downloading https://files.pythonhosted.org/packages/4d/fc/cd4210b247d1dca421c25994740cbbf03c5e980e31881f10eaddf45fdab0/scikit-surprise-1.0.6.tar.gz (3.3MB)
        100% |████████████████████████████████| 3.3MB 16.7MB/s 
        Complete output from command python setup.py egg_info:
        Please install numpy>=1.11.2 first.
        
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-9bs19soh/scikit-surprise/
    circleci@54577d1f4689:/tmp$ 
    

    Likely cause

    This almost definitely happens because we try to import numpy in setup.py. Is there a way to avoid this? For example, https://stackoverflow.com/a/42163080 might be a way forward.

    Installation from requirements file works (on some environments) if numpy is in the pip cache

    On my environment, if numpy happens to be in the pip cache, then installation from requirements.txt succeeds even if numpy is not present in the virtualenv. (!)

    [user@machine tmp]$ virtualenv -p python3 env
    Running virtualenv with interpreter /usr/bin/python3
    Using base prefix '/usr'
    New python executable in /home/user/tmp/env/bin/python3
    Also creating executable in /home/user/tmp/env/bin/python
    Installing setuptools, pip, wheel...done.
    [user@machine tmp]$ source env/bin/activate
    (env) [user@machine tmp]$ python -c "import numpy"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'numpy'
    (env) [user@machine tmp]$ 
    (env) [user@machine tmp]$ pip install -r requirements.txt 
    Collecting joblib==0.12.0 (from -r requirements.txt (line 1))
      Using cached https://files.pythonhosted.org/packages/f4/2f/66db8ecbfa71cd36146d894c867b2f595682d620329fd823c5c041687b5f/joblib-0.12.0-py2.py3-none-any.whl
    Collecting numpy==1.14.5 (from -r requirements.txt (line 2))
      Using cached https://files.pythonhosted.org/packages/68/1e/116ad560de97694e2d0c1843a7a0075cc9f49e922454d32f49a80eb6f1f2/numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl
    Collecting scikit-surprise==1.0.6 (from -r requirements.txt (line 3))
    Collecting scipy==1.1.0 (from -r requirements.txt (line 4))
      Using cached https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl
    Collecting six==1.11.0 (from -r requirements.txt (line 5))
      Using cached https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
    Installing collected packages: joblib, numpy, six, scipy, scikit-surprise
    Successfully installed joblib-0.12.0 numpy-1.14.5 scikit-surprise-1.0.6 scipy-1.1.0 six-1.11.0
    (env) [user@machine tmp]$ 
    
    

    But if we rm -rf ~/.cache/pip and retry then the above installation will fail.

    This is true on my local machine, but not in the barebones docker environment.

    This behavior seems very strange, and I'm not familiar enough with pip and python packaging to try to explain it. But, it could explain why most regular users (who probably installed numpy some time in the past) don't run into this issue.

    Workaround

    My current workaround is to pip install numpy and then pip install -r requirements.txt in the environment setup. However, having to manually keep track of installation dependencies seems a little strange.

    Versions

    >>> import platform; print(platform.platform())
    Linux-4.17.2-1-ARCH-x86_64-with-debian-9.4
    >>> import sys; print("Python", sys.version)
    Python 3.6.5 (default, Jun  6 2018, 19:19:24) 
    [GCC 6.3.0 20170516
    
    opened by gbrova 20
  • The first sample script seems to run indefinitely.

    Hi! I am a newbie with Surprise.

    I have installed Surprise (1.0.5) in WinPython (WinPython-64bit-3.6.3.0Qt5) on Windows 10 x64. Apparently, the installation was flawless. However, when I test the first sample code:

    from surprise import SVD
    from surprise import Dataset
    from surprise.model_selection import cross_validate
    
    # Load the movielens-100k dataset (download it if needed),
    data = Dataset.load_builtin('ml-100k')
    
    # We'll use the famous SVD algorithm.
    algo = SVD()
    
    # Run 5-fold cross-validation and print results
    cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
    

    I never get a result. The script seems to run indefinitely. Just in case, I waited more than two hours, and I could clearly hear the CPU fan speeding up. (The first time, the dataset was downloaded without problems.)

    I have no problem with other samples. For instance:

    
    from surprise import SVD
    from surprise import Dataset
    from surprise import accuracy
    from surprise.model_selection import train_test_split
    
    # Load the movielens-100k dataset (download it if needed),
    data = Dataset.load_builtin('ml-100k')
    
    # sample random trainset and testset
    # test set is made of 25% of the ratings.
    trainset, testset = train_test_split(data, test_size=.25)
    
    # We'll use the famous SVD algorithm.
    algo = SVD()
    
    # Train the algorithm on the trainset, and predict ratings for the testset
    algo.fit(trainset)
    predictions = algo.test(testset)
    
    # Then compute RMSE
    accuracy.rmse(predictions)
    
    

    I get:

    RMSE: 0.9450

    I think my problem is related to cross_validate. Could you confirm that the sample script works properly in 1.0.5, and tell me in which configuration it has been tested successfully? Thanks!! And congrats for SurPRISE! 👍

    P.S.: I doubt that the problem is in the setup (WinPython, Windows 10 x64, ...). I have worked with scikit-learn in the same setup without problems, although it is fair to admit that scikit-learn comes preinstalled in WinPython.

    opened by DJuego 19
  • Prediction is fixed no matter what

    I have a dataset that looks like this

    keywordID  rating  userID
    5        774     1.0    2100
    5        914     1.0    2100
    5        965     1.0    2100
    5       4035     0.6    2100
    

    My first problem is that my data is not very diverse. A counter of my ratings looks like this (originally the ratings can range between -1 and 1):

    Counter({0.5: 15, 0.66666666666666663: 6, 1.0: 3274})

    I'm reading the data with a custom reader from a pandas dataframe.

    My problem now: when I run get_top_n as in the docs, I get different data with each run, and running the prediction always yields an estimate of -1, no matter what:

    algo.predict('2100', '774')

    So it's not an issue with Surprise itself, but any ideas what I am doing wrong?

    opened by AEzzatA 19
  • Simple example so difficult to understand

    Hi,

    I'm making a simple book recommender, so I'm using your "custom_dataset" in my example:

    import os
    
    from surprise import Dataset
    from surprise import KNNBasic
    from surprise import Reader
    
    reader = Reader(line_format='user item rating', sep=' ', skip_lines=3, rating_scale=(1, 5))
    
    custom_dataset_path = (os.path.dirname(os.path.realpath(__file__)) + '/custom_dataset')
    print("Using: " + custom_dataset_path)
    
    data = Dataset.load_from_file(file_path=custom_dataset_path, reader=reader)
    trainset = data.build_full_trainset()
    
    sim_options = {
        'name': 'cosine',
        'user_based': True  # compute similarities between users
    }
    
    algo = KNNBasic(sim_options=sim_options)
    algo.train(trainset)
    
    uid = "user0"
    
    pred = algo.predict(uid=uid, iid="", r_ui=0, verbose=True)
    print(pred)
    

    Results of execution:

    python process.py 
    
    Using: /Users/paulo/Developer/workspaces/python/ubook-recommender/custom_dataset
    Computing the cosine similarity matrix...
    Done computing similarity matrix.
    user: user0      item:            r_ui = 0.00   est = 3.00   {u'reason': 'User and/or item is unkown.', u'was_impossible': True}
    user: user0      item:            r_ui = 0.00   est = 3.00   {u'reason': 'User and/or item is unkown.', u'was_impossible': True}
    

    Can you help me?

    I need to understand how it works. I think it is simple, but I can't wrap my head around it. It is not as intuitive as this:

    https://www.librec.net/dokuwiki/doku.php?id=introduction

    I want to understand your library well enough to make it work at the same level.

    Can you help me?

    opened by paulocoutinhox 18
  • A bigger epoch brings worse results.

    Hello Nicolas, I have a problem: when I use Surprise to run SVD on my own split of the movielens-1m dataset, the results get worse as I increase the number of epochs. It seems that SVD overfits the training set from epoch 1 onward. More epochs bring worse results, and even with a single epoch I cannot get an RMSE below 1 (I have tried many learning rates, in vain). I can hardly tune my parameters to reach the good results reported in the docs (e.g. an RMSE of 0.8-something). Is there any way to solve the problem? Thanks for your time!

    opened by Cong-Zou 17
  • Unable to get the correct similarity Values

    I am using the code below with the dataset https://github.com/ngovind93/Recommender/blob/master/toydataset.txt

    This is the example dataset from C. Aggarwal's book, p. 34, Table 2.1. The Pearson similarity values do not match when I run the code.

    I made a slight modification in knns.py to view the similarity values, in def estimate(self, u, i):

    details = {'actual_k': actual_k,'neigh':k_neighbors}

    from surprise import KNNBasic
    from surprise import Dataset
    from surprise import Reader
    import os
    file_path = os.path.expanduser('toydataset.txt')
    reader = Reader(line_format='user item rating ', sep=' ',rating_scale=(1,10))
    data = Dataset.load_from_file(file_path, reader=reader)
    trainset = data.build_full_trainset()
    sim_options={'name':'pearson','min_support':1,'user_based':True}
    algo = KNNBasic(sim_options=sim_options,verbose=True)
    algo.fit(trainset)
    uid=str(3)
    iid=str(6)
    pred = algo.predict(uid,iid , r_ui=None, verbose=True)
    
    

    I am getting this result:

    Computing the pearson similarity matrix... Done computing similarity matrix.

    user: 3 item: 6 r_ui = None est = 4.00 {u'actual_k': 2, u'was_impossible': False, u'neigh': [(0.9707253433941511, 4.0), (0.8944271909999159, 4.0), (-0.8660254037844387, 3.0), (-1.0, 3.0)]}
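
Mismatches of this kind can come from differing conventions for the Pearson coefficient, for example whether each user's mean is taken over the co-rated items only or over all of that user's ratings. The following plain-Python sketch contrasts the two conventions; the pearson function, its common_only flag and the ratings are hypothetical, and nothing here is taken from the toy dataset or from Surprise's code.

```python
import math

def pearson(ru, rv, common_only=True):
    """Pearson correlation between two users given as {item: rating} dicts.

    The sums always run over co-rated items. With common_only=True the means
    are also taken over co-rated items; otherwise over all of each user's ratings.
    """
    common = sorted(set(ru) & set(rv))
    if not common:
        return 0.0
    if common_only:
        mu_u = sum(ru[i] for i in common) / len(common)
        mu_v = sum(rv[i] for i in common) / len(common)
    else:
        mu_u = sum(ru.values()) / len(ru)
        mu_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu_u) * (rv[i] - mu_v) for i in common)
    den_u = sum((ru[i] - mu_u) ** 2 for i in common)
    den_v = sum((rv[i] - mu_v) ** 2 for i in common)
    if den_u == 0 or den_v == 0:
        return 0.0  # no variation: the correlation is undefined
    return num / math.sqrt(den_u * den_v)

# Hypothetical ratings: user u rated one item that user v did not.
u = {"a": 5, "b": 3, "c": 1, "d": 4}
v = {"a": 4, "b": 2, "c": 2}
sim_common = pearson(u, v, common_only=True)
sim_all = pearson(u, v, common_only=False)
print(sim_common, sim_all)
```

The two values differ slightly here, so comparing against a textbook table requires knowing which convention the table uses.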

    opened by ngovind93 15
  • Problem: current kNNWithMeans implementation doesn't allow for classical item-item kNN approach

    The current implementation of kNNWithMeans computes the similarity measure on the original ratings, and then uses mean values to compute a prediction.

    According to:

    • "Recommender systems. the textbook" by Aggarwal, chapter: 2.3.2 Item-Based Neighborhood Models,
    • initial paper on item-item CF: Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms.

    For the purpose of item-item similarity computation, we should use the "adjusted cosine" similarity: instead of the raw ratings, it subtracts the average item-rating from each user's rating before computing item similarities.

    How to address the problem: a possible approach would be to add an "adjusted cosine" similarity measure. But if we compute it independently of kNNWithMeans, the mean values will be computed twice, once for the similarity measure and once for the predictions, which seems computationally inefficient. So I would rather suggest incorporating the mean adjustment in the prediction algorithm itself.
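
As an illustration of the measure under discussion, here is a minimal plain-Python sketch of adjusted cosine similarity. It assumes the definition from Sarwar et al., in which each rating is centered on the mean rating of the user who gave it and only users who rated both items contribute. The adjusted_cosine function and the ratings are hypothetical and are not part of Surprise.

```python
import math

def adjusted_cosine(ratings, item_a, item_b):
    """Adjusted cosine similarity between two items.

    `ratings` maps user -> {item: rating}. Each rating is centered on the
    mean of the user who gave it; only users who rated both items count.
    """
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            da = user_ratings[item_a] - mean
            db = user_ratings[item_b] - mean
            num += da * db
            den_a += da * da
            den_b += db * db
    if den_a == 0 or den_b == 0:
        return 0.0  # no common users, or no variation: similarity undefined
    return num / math.sqrt(den_a * den_b)

# Hypothetical ratings: i1 and i2 are liked by the same users, i3 by others.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 2},
    "u3": {"i1": 1, "i2": 2, "i3": 5},
}
print(adjusted_cosine(ratings, "i1", "i2"))  # positive
print(adjusted_cosine(ratings, "i1", "i3"))  # negative
```

Because the centered ratings are needed both here and at prediction time, caching the per-user means once would avoid the double computation the proposal worries about.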

    opened by ODemidenko 15
  • Own Prediction Algorithm: Adjusted KNN With Means

    Hi,

    I want to adjust the KNN algorithm a little bit. My idea is that, before selecting k_neighbours with heapq.nlargest, I want to ignore the similarities that are smaller than a threshold (min_s). Specifically:

            for (n,sim,r) in neighbors:
                if sim < min_s:
                    sim = 0
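
One way to express the same idea, sketched in plain Python under the assumption that neighbors is a list of (id, similarity, rating) tuples: rebinding sim inside a for loop does not change the tuples stored in the list, so the sketch builds a filtered list and then applies heapq.nlargest. The k_nearest_above helper and the sample neighbors are hypothetical.

```python
import heapq

def k_nearest_above(neighbors, k, min_s):
    """Keep the k most similar neighbors whose similarity is at least min_s.

    `neighbors` is a list of (neighbor_id, similarity, rating) tuples.
    """
    kept = [n for n in neighbors if n[1] >= min_s]
    return heapq.nlargest(k, kept, key=lambda n: n[1])

# Hypothetical neighbors: only "a" and "c" pass the 0.5 threshold.
neighbors = [("a", 0.9, 4.0), ("b", 0.2, 3.0), ("c", 0.6, 5.0), ("d", -0.4, 1.0)]
top = k_nearest_above(neighbors, k=3, min_s=0.5)
print(top)
```

Dropping weak neighbors (rather than zeroing their similarity) also keeps them out of the neighbor count used for the prediction.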
    

    Of course, I am trying to check the accuracy of this adjustment. However, when I built my own algorithm following your guidelines, I received the following error:

    TypeError: compute_similarities() got an unexpected keyword argument 'verbose'

    These are the libs I have imported. The code of the class Adjusted_KNNWithMeans is the same as your KNNWithMeans, except for the change I mentioned above.

    from surprise import Reader, Dataset, trainset
    from surprise import SVD, evaluate, SVDpp
    from surprise.model_selection import cross_validate, train_test_split
    from surprise.model_selection import train_test_split
    from surprise import accuracy
    from surprise import PredictionImpossible
    from surprise import AlgoBase
    from surprise.prediction_algorithms.knns import SymmetricAlgo
    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)
    import numpy as np
    from six import iteritems
    import heapq
    

    Thank you very much.

    opened by TamTSE2301 14
  • [Feature Request] Recommendation strategy to recommend the items with the 10 highest estimation

    Hi Nicolas, I have been using the Surprise package for a few days now. Great job! In your todo list, you have an item "Implement some recommendation strategy (like recommend the items with the 10 highest estimation)". Do you have an idea as to when it will be released, approximately? Any info in regards to this would be really helpful.

    What would be your basic strategy to implement this feature?

    I wrote a function for SVD to return a list of top K recommendations. I follow these steps to get the list,

    1. Iterate through all items for a specific user
    2. Run the predict function for items that the user has not rated
    3. Sort top K and return them as recommendations.
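
The three steps above can be sketched in plain Python as follows. The top_k_for_user function, the rated mapping and the predict callable are hypothetical stand-ins for a fitted model; with Surprise, predict could wrap something like algo.predict(uid, iid).est, which is an assumption about how the reporter's code is organized.

```python
import heapq

def top_k_for_user(user, all_items, rated, predict, k=10):
    """Return the k unrated items with the highest estimated rating.

    `rated` maps user -> set of items already rated; `predict(user, item)`
    returns an estimated rating.
    """
    seen = rated.get(user, set())
    # Steps 1 and 2: score every item the user has not rated yet.
    scored = ((predict(user, item), item) for item in all_items if item not in seen)
    # Step 3: keep the k best without sorting the full list.
    return [item for _, item in heapq.nlargest(k, scored)]

# Hypothetical data: u1 has already rated i1.
all_items = ["i1", "i2", "i3", "i4"]
rated = {"u1": {"i1"}}
estimates = {("u1", "i2"): 3.2, ("u1", "i3"): 4.7, ("u1", "i4"): 4.1}
top = top_k_for_user("u1", all_items, rated, lambda u, i: estimates[(u, i)], k=2)
print(top)
```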

    However, the function returns the same or nearly the same 10 to 20 recommendations regardless of the user.

    Is the above strategy wrong? Any thoughts on why I am getting nearly identical recommendations?

    Thanks :)

    opened by adideshp 14
  • Add mean squared error (MSE) to accuracy

    Hello,

    I noticed that the MSE measurement is not in the accuracy module.

    Of course you can get the MSE by squaring RMSE ^^. But I think it would be more enjoyable if MSE is directly available.
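
The relationship mentioned here is simply MSE = RMSE², as this small plain-Python sketch shows. The mse and rmse helpers and the sample pairs are made up for illustration and are not the functions from the pull request.

```python
import math

def mse(pairs):
    """Mean squared error over (true_rating, estimated_rating) pairs."""
    return sum((r - est) ** 2 for r, est in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error, defined as the square root of the MSE."""
    return math.sqrt(mse(pairs))

# Hypothetical (true, estimated) ratings.
pairs = [(4.0, 3.0), (2.0, 4.0), (5.0, 5.0)]
print(mse(pairs), rmse(pairs) ** 2)  # equal up to floating-point rounding
```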

    So I pulled MSE out of RMSE and adjusted the corresponding tests.

    I hope the request is justified.

    Yours sincerely Marc Feger

    opened by ghost 13
  • How do I apply ALS minimization in SVD?

    Hello, @NicolasHug

    According to the docs, the SVD model uses SGD as the minimization algorithm. Is there any way of using ALS instead? I need to compare the performance of the two on my data.

    opened by Shariar076 1
  • A bug when importing data from DataFrame

    Description

    When importing data from a DataFrame, all estimated ratings are equal to the mean value instead of being actual predictions. But when importing the same dataset from a file, it works as expected.

    Steps/Code to Reproduce

    import pandas as pd
    from surprise import SVD
    from surprise import Dataset
    from surprise import Reader
    
    # Creation of the dataframe. Column names are irrelevant.
    ratings_dict = {'itemID': [1, 1, 1, 2, 2],
                    'userID': [9, 32, 2, 45, 'user_foo'],
                    'rating': [3, 2, 4, 3, 1]}
    df = pd.DataFrame(ratings_dict)
    
    # A reader is still needed but only the rating_scale param is required.
    reader = Reader(rating_scale=(1, 5))
    
    # The columns must correspond to user id, item id and ratings (in that order).
    data = Dataset.load_from_df(df[['userID', 'itemID', 'rating']], reader)
    
    # We can now use this dataset as we please, e.g. calling cross_validate
    svd_model = SVD()
    svd_model.fit(trainset=data.build_full_trainset())
    test_case = svd_model.predict(str(1),str(2),verbose=True)
    
    

    Expected Results

    The predictions should differ across users and items.

    Actual Results

    But the actual result was that all predicted ratings were equal to the mean value (2.6).
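
One possible explanation, offered as an assumption rather than a confirmed diagnosis: ids taken from a DataFrame keep their original Python types, while the reproduction queries with str(1) and str(2). A string id never matches an integer id, so both look unknown and the estimate falls back to the global mean. The toy predictor below imitates that lookup with a plain dictionary; fit_item_means is a made-up stand-in, not Surprise's SVD.

```python
def fit_item_means(ratings):
    """Toy predictor: item mean if the item id is known, else the global mean."""
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    by_item = {}
    for _, item, r in ratings:
        by_item.setdefault(item, []).append(r)
    item_mean = {item: sum(rs) / len(rs) for item, rs in by_item.items()}

    def predict(user, item):
        # The lookup is type-sensitive: the string "2" is not the integer 2.
        # The user id is ignored by this toy model.
        return item_mean.get(item, global_mean)

    return predict

# Same (user, item, rating) triples as in the report, ids kept as given.
ratings = [(9, 1, 3), (32, 1, 2), (2, 1, 4), (45, 2, 3), ("user_foo", 2, 1)]
predict = fit_item_means(ratings)
print(predict("1", "2"))  # string ids are unknown here: global mean
print(predict(1, 2))      # integer item id 2 is known: its item mean
```

If this is the cause, querying with ids of the same type as in the DataFrame would be the thing to try.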

    Versions

    Windows-10-10.0.22621-SP0 Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] surprise 1.1.3

    opened by Alaskyed 1
  • What to do if the dataset I want to read has more than three parameters

    Hey! I've recently started working on research into new uses for matrix factorization techniques. The biggest challenge is that it has proven quite hard not to overflow while testing with different datasets. I tried using Surprise, but whenever I try to pass more than 3 columns to the Reader, I get an error (ValueError: line_format parameter is incorrect.)

    If there's any way to bypass that restriction, or any resource you could point me to, I'd be delighted! Thanks!

    opened by GustaMatos0 0
  • build_anti_testset() takes a long time and in the end does not work

    1- reader = Reader(rating_scale=(1, 5))
    2- data = Dataset.load_from_df(ratings[['userId', 'asin', 'rating']], reader)  # this is my own dataset
    3- svd = SVD(n_factors=30, n_epochs=20, lr_all=0.005, reg_all=0.02)
    4- real_trainset = data.build_full_trainset()
    5- svd.fit(real_trainset)
    6- real_testset = real_trainset.build_anti_testset()  # the code stops here after a long time and eventually raises a memory error
    7- predictions = svd.test(real_testset)
    8- top_n = get_top_n(predictions, n=20)

    When I run the program, it stops at line 6 because of build_anti_testset(), and it raises a memory error after a long time.

    However, when I replace build_anti_testset() with build_testset(), it works without any problem.

    But I need to use build_anti_testset() instead of build_testset(), because I need predictions for the items that the users have not rated yet.
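
build_anti_testset() returns a list with one entry for every (user, item) pair that has no rating, so with many users and items that list approaches users × items entries. One way to keep memory bounded, sketched here under the assumption that recommendations are produced one user at a time, is to generate the unrated items lazily. The unrated_items generator and the toy data are hypothetical and use plain Python rather than Surprise objects.

```python
def unrated_items(user, all_items, rated):
    """Yield the items `user` has not rated, one at a time, without storing them."""
    seen = rated.get(user, set())
    for item in all_items:
        if item not in seen:
            yield item

# Hypothetical data: which items each user has already rated.
rated = {"u1": {"i1", "i3"}, "u2": {"i2"}}
all_items = ["i1", "i2", "i3", "i4"]

for user in rated:
    # Only one user's candidates exist in memory at any time.
    print(user, list(unrated_items(user, all_items, rated)))
```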

    opened by bodymostafa123 1
  • Using Cross Validate with a Pandas Dataframe, running into datatype error

    Description

    from __future__ import (absolute_import, division, print_function, unicode_literals)

    import pandas as pd

    from surprise import SVD
    from surprise import dataset
    from surprise import Reader

    reader = Reader()
    model = SVD()
    ratings_dict = ratings.to_dict('records')
    ratings_dict

    Example output: [{'userId': 1, 'movieId': 31, 'rating': 2.5, 'timestamp': 1260759144}, {'userId': 1, 'movieId': 1029, 'rating': 3.0, 'timestamp': 1260759179}, {'userId': 1, 'movieId': 1061, 'rating': 3.0, 'timestamp': 1260759182}, {'userId': 1, 'movieId': 1129, 'rating': 2.0, 'timestamp': 1260759185},

    df = pd.DataFrame.from_dict(ratings_dict)

    # You'll need to create a dummy reader
    reader = Reader(line_format='user item rating', rating_scale=(1, 5))

    # Also, a dummy Dataset class
    class MyDataset(dataset.DatasetAutoFolds):

        def __init__(self, df, reader):

            self.raw_ratings = [(uid, iid, r, None) for (uid, iid, r) in
                                zip(df['userId'], df['movieId'], df['rating'])]
            self.reader = reader

    data = MyDataset(df, reader)

    cross_validate(model, data, measures=['userId', 'movieId', 'rating'], cv=3)

    Error Message: AttributeError: module 'surprise.accuracy' has no attribute 'userid' 1 cross_validate(model,data,measures=['userId', 'movieId', 'rating'],cv=3)

    Steps/Code to Reproduce

    Before running the above code:

    ratings = pd.read_csv('ratings_small.csv')
    ratings.head()

    !pip install scikit-surprise
    from surprise import Reader, Dataset, SVD
    from surprise.model_selection import cross_validate

    Expected Results

    Actual Results

    Versions

    opened by JackRaines 0
Owner
Nicolas Hug
ML engineer, Scikit-learn core-developer
Graph Neural Networks for Recommender Systems

This repository contains code to train and test GNN models for recommendation, mainly using the Deep Graph Library (DGL).

null 217 Jan 4, 2023
NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs.

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

null 420 Jan 4, 2023
Collaborative variational bandwidth auto-encoder (VBAE) for recommender systems.

Collaborative Variational Bandwidth Auto-encoder The codes are associated with the following paper: Collaborative Variational Bandwidth Auto-encoder f

Yaochen Zhu 14 Dec 11, 2022
E-Commerce recommender demo with real-time data and a graph database

E-Commerce recommender demo This is a simple stream setup that uses Memgraph to ingest real-time data from a simulated online store. Data is str

g-despot 3 Feb 23, 2022
Deep recommender models using PyTorch.

Spotlight uses PyTorch to build both deep and shallow recommender models. By providing both a slew of building blocks for loss functions (various poin

Maciej Kula 2.8k Dec 29, 2022
Recommender System Papers

Included Conferences: SIGIR 2020, SIGKDD 2020, RecSys 2020, CIKM 2020, AAAI 2021, WSDM 2021, WWW 2021

RUCAIBox 704 Jan 6, 2023
RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

RecSim NG, a probabilistic platform for multi-agent recommender systems simulation. RecSimNG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers: a powerful, general probabilistic programming language for agent-behavior specification;

Google Research 110 Dec 16, 2022
Movie Recommender System

Movie-Recommender-System Movie-Recommender-System is a web application using which a user can select his/her watched movie from list and system will r

null 1 Jul 14, 2022
Mutual Fund Recommender System. Tailor for fund transactions.

Explainable Mutual Fund Recommendation Data Please see 'DATA_DESCRIPTION.md' for more detail. Recommender System Methods Baseline Collaborative Fiilte

JHJu 2 May 19, 2022
Movies/TV Recommender

recommender Movies/TV Recommender. Recommends Movies, TV Shows, Actors, Directors, Writers. Setup Create file API_KEY and paste your TMDB API key in i

Aviem Zur 3 Apr 22, 2022
6002project-rl - An implementation of offline RL on recommender system

An implementation of offline RL on recommender system @author: misajie @update: 20

Tzay Lee 3 May 24, 2022
Plex-recommender - Get movie recommendations based on your current PleX library

plex-recommender Description: Get movie/tv recommendations based on your current

null 5 Jul 19, 2022
Persine is an automated tool to study and reverse-engineer algorithmic recommendation systems.

Persine, the Persona Engine Persine is an automated tool to study and reverse-engineer algorithmic recommendation systems. It has a simple interface a

Jonathan Soma 87 Nov 29, 2022
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Annoy Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given quer

Spotify 10.6k Jan 1, 2023
A TensorFlow recommendation algorithm and framework in Python.

TensorRec A TensorFlow recommendation algorithm and framework in Python. NOTE: TensorRec is not under active development TensorRec will not be receivi

James Kirk 1.2k Jan 4, 2023
Fast Python Collaborative Filtering for Implicit Feedback Datasets

Implicit Fast Python Collaborative Filtering for Implicit Datasets. This project provides fast Python implementations of several different popular rec

Ben Frederickson 3k Dec 31, 2022
A Python implementation of LightFM, a hybrid recommendation algorithm.

LightFM Build status Linux OSX (OpenMP disabled) Windows (OpenMP disabled) LightFM is a Python implementation of a number of popular recommendation al

Lyst 4.2k Jan 2, 2023
Example of an expert system for recommending TV series in Python

exemplo-de-sistema-especialista Example of an expert system for recommending TV series in Python. Summary: The goal of assisting the user in choosing

Josue Lopes 3 Aug 31, 2021
Books Recommendation With Python

Books-Recommendation Business Problem During the last few decades, with the rise

Çağrı Karadeniz 7 Mar 12, 2022