A Python scikit for building and analyzing recommender systems

Nicolas Hug

Last update: Jan 1, 2023

Overview

Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data.

Surprise was designed with the following purposes in mind:

Give users perfect control over their experiments. To this end, a strong emphasis is laid on documentation, which we have tried to make as clear and precise as possible by pointing out every detail of the algorithms.
Alleviate the pain of Dataset handling. Users can use both built-in datasets (Movielens, Jester), and their own custom datasets.
Provide various ready-to-use prediction algorithms such as baseline algorithms, neighborhood methods, matrix factorization-based methods (SVD, PMF, SVD++, NMF), and many others. Various similarity measures (cosine, MSD, Pearson...) are also built in.
Make it easy to implement new algorithm ideas.
Provide tools to evaluate, analyse and compare the algorithms' performance. Cross-validation procedures can be run very easily using powerful CV iterators (inspired by scikit-learn's excellent tools), as well as exhaustive search over a set of parameters.

The name SurPRISE (roughly :) ) stands for Simple Python RecommendatIon System Engine.

Please note that Surprise does not support implicit ratings or content-based information.

Getting started, example

Here is a simple example showing how you can (down)load a dataset, split it for 5-fold cross-validation, and compute the MAE and RMSE of the SVD algorithm.

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Output:

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

            Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std
RMSE        0.9311  0.9370  0.9320  0.9317  0.9391  0.9342  0.0032
MAE         0.7350  0.7375  0.7341  0.7342  0.7375  0.7357  0.0015
Fit time    6.53    7.11    7.23    7.15    3.99    6.40    1.23
Test time   0.26    0.26    0.25    0.15    0.13    0.21    0.06

Surprise can do much more (e.g., GridSearchCV)! You'll find more usage examples in the documentation.
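
As a rough illustration of what an exhaustive parameter search involves, the sketch below enumerates every combination in a parameter grid using only the standard library. The grid keys mirror SVD parameter names, but `expand_grid` and the values are illustrative assumptions, not part of Surprise; in practice GridSearchCV takes care of both the enumeration and the cross-validation.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of the values in param_grid."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical grid; the keys mirror SVD parameter names.
param_grid = {
    'n_epochs': [5, 10],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.4, 0.6],
}

combinations = list(expand_grid(param_grid))
print(len(combinations))  # 2 * 2 * 2 = 8 candidate settings
print(combinations[0])
```

Each candidate would then be scored by cross-validation, which is why the cost of a grid search grows multiplicatively with every parameter added.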

Benchmarks

Here are the average RMSE, MAE and total execution time of various algorithms (with their default parameters) on a 5-fold cross-validation procedure. The datasets are the Movielens 100k and 1M datasets. The folds are the same for all the algorithms. All experiments were run on a laptop with an Intel Core i5 7th gen (2.5 GHz) and 8 GB of RAM. The code for generating these tables can be found in the benchmark example.

Movielens 100k	RMSE	MAE	Time
SVD	0.934	0.737	0:00:11
SVD++	0.92	0.722	0:09:03
NMF	0.963	0.758	0:00:15
Slope One	0.946	0.743	0:00:08
k-NN	0.98	0.774	0:00:10
Centered k-NN	0.951	0.749	0:00:10
k-NN Baseline	0.931	0.733	0:00:12
Co-Clustering	0.963	0.753	0:00:03
Baseline	0.944	0.748	0:00:01
Random	1.514	1.215	0:00:01

Movielens 1M	RMSE	MAE	Time
SVD	0.873	0.686	0:02:13
SVD++	0.862	0.673	2:54:19
NMF	0.916	0.724	0:02:31
Slope One	0.907	0.715	0:02:31
k-NN	0.923	0.727	0:05:27
Centered k-NN	0.929	0.738	0:05:43
k-NN Baseline	0.895	0.706	0:05:55
Co-Clustering	0.915	0.717	0:00:31
Baseline	0.909	0.719	0:00:19
Random	1.504	1.206	0:00:19

Installation

With pip (you'll need numpy, and a C compiler. Windows users might prefer using conda):

$ pip install numpy
$ pip install scikit-surprise

With conda:

$ conda install -c conda-forge scikit-surprise

For the latest version, you can also clone the repo and build the source (you'll first need Cython and numpy):

$ pip install numpy cython
$ git clone https://github.com/NicolasHug/surprise.git
$ cd surprise
$ python setup.py install

License and reference

This project is licensed under the BSD 3-Clause license, so it can be used for pretty much everything, including commercial applications. Please let us know how Surprise is useful to you!

Please make sure to cite the paper if you use Surprise for your research:

@article{Hug2020,
  doi = {10.21105/joss.02174},
  url = {https://doi.org/10.21105/joss.02174},
  year = {2020},
  publisher = {The Open Journal},
  volume = {5},
  number = {52},
  pages = {2174},
  author = {Nicolas Hug},
  title = {Surprise: A Python library for recommender systems},
  journal = {Journal of Open Source Software}
}

Contributors

The following persons have contributed to Surprise:

ashtou, bobbyinfj, caoyi, Олег Демиденко, Charles-Emmanuel Dias, dmamylin, Lauriane Ducasse, Marc Feger, franckjay, Lukas Galke, Tim Gates, Pierre-François Gimenez, Zachary Glassman, Jeff Hale, Nicolas Hug, Janniks, jyesawtellrickson, Doruk Kilitcioglu, Ravi Raju Krishna, Hengji Liu, Maher Malaeb, Manoj K, James McNeilis, Naturale0, nju-luke, Jay Qi, Lucas Rebscher, Skywhat, David Stevens, TrWestdoor, Victor Wang, Mike Lee Williams, Jay Wong, Chenchen Xu, YaoZh1918.

Thanks a lot :) !

Development Status

Starting from version 1.1.0 (September 2019), we will only maintain the package and provide bugfixes. No new features will be considered.

For bugs, issues or questions about Surprise, please use the GitHub project page. Please don't send emails (we will not answer).

Comments

Can I get the latent Matrix?

Description

Hello. Here is my question: if I use SVD for recommendation, can I get the latent matrices? If I can, please tell me how. Thanks.

Versions

Surprise v1.0.4
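
For context, a biased matrix-factorization model combines a global mean, a user bias, an item bias, and the dot product of a user factor vector with an item factor vector. The sketch below shows that arithmetic with made-up numbers. To my knowledge, a fitted SVD in Surprise exposes the factor matrices as the `pu` and `qi` attributes (and the biases as `bu` and `bi`), indexed by inner ids; treat that as an assumption to check against the documentation for your version. `svd_estimate` is a hypothetical helper, not a Surprise function.

```python
def svd_estimate(global_mean, b_u, b_i, p_u, q_i):
    """Biased matrix-factorization estimate: mu + b_u + b_i + q_i . p_u."""
    return global_mean + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))


# Made-up biases and factors with 2 latent dimensions.
est = svd_estimate(global_mean=3.5, b_u=0.2, b_i=-0.1,
                   p_u=[0.5, -0.2], q_i=[0.4, 0.1])
print(round(est, 2))  # 3.78
```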

opened by a6822342 38

Remove numpy dependency for installation

Description

Installation via pip should automatically install all requirements, including numpy. This is helpful in a continuous integration setting, where setting up the environment with pip install -r requirements.txt is quite common (and doesn't currently work for Surprise).

Steps/Code to Reproduce

I tested on both a CircleCI Docker container and an Ubuntu container. The easiest way to reproduce is in Docker, but I get the same result by publishing a simple project to CircleCI.

➜  ~ docker run -it circleci/python:3.6.5  # same results with ubuntu
$ bash
circleci@54577d1f4689:/$ cd tmp
circleci@54577d1f4689:/tmp$ echo "joblib==0.12.0
> numpy==1.14.5
> scikit-surprise==1.0.6
> scipy==1.1.0
> six==1.11.0" > requirements.txt
circleci@54577d1f4689:/tmp$ 
circleci@54577d1f4689:/tmp$ pip install -r requirements.txt
Collecting joblib==0.12.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f4/2f/66db8ecbfa71cd36146d894c867b2f595682d620329fd823c5c041687b5f/joblib-0.12.0-py2.py3-none-any.whl (261kB)
    100% |████████████████████████████████| 266kB 16.8MB/s 
Collecting numpy==1.14.5 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/68/1e/116ad560de97694e2d0c1843a7a0075cc9f49e922454d32f49a80eb6f1f2/numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
    100% |████████████████████████████████| 12.2MB 3.9MB/s 
Collecting scikit-surprise==1.0.6 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/4d/fc/cd4210b247d1dca421c25994740cbbf03c5e980e31881f10eaddf45fdab0/scikit-surprise-1.0.6.tar.gz (3.3MB)
    100% |████████████████████████████████| 3.3MB 16.7MB/s 
    Complete output from command python setup.py egg_info:
    Please install numpy>=1.11.2 first.
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-9bs19soh/scikit-surprise/
circleci@54577d1f4689:/tmp$

Likely cause

This almost definitely happens because we try to import numpy in setup.py. Is there a way to avoid this? For example, https://stackoverflow.com/a/42163080 might be a way forward.

Installation from requirements file works (on some environments) if numpy is in the pip cache

On my environment, if numpy happens to be in the pip cache, then installation from requirements.txt succeeds even if numpy is not present in the virtualenv. (!)

[user@machine tmp]$ virtualenv -p python3 env
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/user/tmp/env/bin/python3
Also creating executable in /home/user/tmp/env/bin/python
Installing setuptools, pip, wheel...done.
[user@machine tmp]$ source env/bin/activate
(env) [user@machine tmp]$ python -c "import numpy"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'
(env) [user@machine tmp]$ 
(env) [user@machine tmp]$ pip install -r requirements.txt 
Collecting joblib==0.12.0 (from -r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/f4/2f/66db8ecbfa71cd36146d894c867b2f595682d620329fd823c5c041687b5f/joblib-0.12.0-py2.py3-none-any.whl
Collecting numpy==1.14.5 (from -r requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/68/1e/116ad560de97694e2d0c1843a7a0075cc9f49e922454d32f49a80eb6f1f2/numpy-1.14.5-cp36-cp36m-manylinux1_x86_64.whl
Collecting scikit-surprise==1.0.6 (from -r requirements.txt (line 3))
Collecting scipy==1.1.0 (from -r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl
Collecting six==1.11.0 (from -r requirements.txt (line 5))
  Using cached https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Installing collected packages: joblib, numpy, six, scipy, scikit-surprise
Successfully installed joblib-0.12.0 numpy-1.14.5 scikit-surprise-1.0.6 scipy-1.1.0 six-1.11.0
(env) [user@machine tmp]$

But if we rm -rf ~/.cache/pip and retry then the above installation will fail.

This is true on my local machine, but not in the barebones docker environment.

This behavior seems very strange, and I'm not familiar enough with pip and python packaging to try to explain it. But, it could explain why most regular users (who probably installed numpy some time in the past) don't run into this issue.

Workaround

My current workaround is to pip install numpy and then pip install -r requirements.txt in the environment setup. However, having to manually keep track of installation dependencies seems a little strange.

Versions

>>> import platform; print(platform.platform())
Linux-4.17.2-1-ARCH-x86_64-with-debian-9.4
>>> import sys; print("Python", sys.version)
Python 3.6.5 (default, Jun  6 2018, 19:19:24) 
[GCC 6.3.0 20170516

opened by gbrova 20

The first sample script seems to run indefinitely.
Hi! I am a newbie with Surprise.

I have installed Surprise (1.0.5) in WinPython (WinPython-64bit-3.6.3.0Qt5) on Windows 10 x64. Apparently, the installation was flawless. However, when I test the first sample code:

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed),
data = Dataset.load_builtin('ml-100k')

# We'll use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

I never get a result. The script seems to run indefinitely. Just in case, I waited more than two hours, and I could clearly hear the CPU fan speeding up. (The first time, the dataset was downloaded without problems.)

I have no problems with the other samples. For instance:

from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

# Load the movielens-100k dataset (download it if needed),
data = Dataset.load_builtin('ml-100k')

# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=.25)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

I get:

RMSE: 0.9450

I think my problem is related to cross_validate. Could you confirm that the sample script works properly in 1.0.5, and tell me in what configuration it has been tested successfully? Thanks!! And congrats on SurPRISE! 👍

P.S.: I doubt that the problem is in the setup (WinPython, Windows 10 x64, ...). I have worked with scikit-learn in the same setup without problems, although, to be fair, scikit-learn comes preinstalled in WinPython.
opened by DJuego 19
Prediction is fixed no matter what
I have a dataset that looks like this

   keywordID  rating  userID
5        774     1.0    2100
5        914     1.0    2100
5        965     1.0    2100
5       4035     0.6    2100

My first problem is that my data is not very diverse. A counter of my ratings looks like this (originally, the ratings can range between -1 and 1):

Counter({0.5: 15, 0.66666666666666663: 6, 1.0: 3274})

I'm reading the data with a custom reader from a pandas dataframe.

My problem now: when I run get_top_n as in the docs, I get different data with each run, and running the prediction always yields an estimate of -1, no matter what:

algo.predict('2100', '774')

So it's not an issue with Surprise itself, but any ideas what I am doing wrong?
opened by AEzzatA 19

Simple example so difficult to understand

Hi,

I'm making a simple book recommender, so I'm using your "custom_dataset" in my example:

import os

from surprise import Dataset
from surprise import KNNBasic
from surprise import Reader

reader = Reader(line_format='user item rating', sep=' ', skip_lines=3, rating_scale=(1, 5))

custom_dataset_path = (os.path.dirname(os.path.realpath(__file__)) + '/custom_dataset')
print("Using: " + custom_dataset_path)

data = Dataset.load_from_file(file_path=custom_dataset_path, reader=reader)
trainset = data.build_full_trainset()

sim_options = {
    'name': 'cosine',
    'user_based': True  # compute similarities between users
}

algo = KNNBasic(sim_options=sim_options)
algo.train(trainset)

uid = "user0"

pred = algo.predict(uid=uid, iid="", r_ui=0, verbose=True)
print(pred)

Results of execution:

python process.py 

Using: /Users/paulo/Developer/workspaces/python/ubook-recommender/custom_dataset
Computing the cosine similarity matrix...
Done computing similarity matrix.
user: user0      item:            r_ui = 0.00   est = 3.00   {u'reason': 'User and/or item is unkown.', u'was_impossible': True}
user: user0      item:            r_ui = 0.00   est = 3.00   {u'reason': 'User and/or item is unkown.', u'was_impossible': True}

Can you help me?

I need to understand how it works. I think it is simple, but I can't get my head around it. It is not as intuitive as this:

https://www.librec.net/dokuwiki/doku.php?id=introduction

I want to understand your library well enough to make it work at the same level.

Can you help me?

opened by paulocoutinhox 18

A bigger epoch brings worse results.

Hello Nicolas, I have a problem: when I use Surprise to run SVD on my own split of the movielens-1m dataset, the results get worse as I increase the number of epochs. It seems that SVD has overfitted the training set from epoch 1 onward. More epochs bring worse results, and even a single epoch cannot achieve an RMSE below 1 (I have tried many learning rates, in vain). I can hardly tune my parameters to reach the results reported in the docs (e.g. an RMSE of 0.8+). Is there any way to solve this problem? Thanks for your time!

opened by Cong-Zou 17
Unable to get the correct similarity Values
I am using the code below with the dataset https://github.com/ngovind93/Recommender/blob/master/toydataset.txt

This is the example dataset from C. Aggarwal's book, p. 34, Table 2.1. The Pearson similarity values do not match when I run the code.

I made a slight modification in knns.py to view the similarity values, in def estimate(self, u, i):

details = {'actual_k': actual_k,'neigh':k_neighbors}

from surprise import KNNBasic
from surprise import Dataset
from surprise import Reader
import os

file_path = os.path.expanduser('toydataset.txt')
reader = Reader(line_format='user item rating ', sep=' ', rating_scale=(1, 10))
data = Dataset.load_from_file(file_path, reader=reader)
trainset = data.build_full_trainset()
sim_options = {'name': 'pearson', 'min_support': 1, 'user_based': True}
algo = KNNBasic(sim_options=sim_options, verbose=True)
algo.fit(trainset)
uid = str(3)
iid = str(6)
pred = algo.predict(uid, iid, r_ui=None, verbose=True)

I am getting this result:

Computing the pearson similarity matrix... Done computing similarity matrix.

user: 3 item: 6 r_ui = None est = 4.00 {u'actual_k': 2, u'was_impossible': False, u'neigh': [(0.9707253433941511, 4.0), (0.8944271909999159, 4.0), (-0.8660254037844387, 3.0), (-1.0, 3.0)]}
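
One possible source of such mismatches, offered as an assumption rather than a diagnosis of this issue, is the set of ratings each user's mean is computed from: co-rated items only, or all of the user's ratings. The sketch below implements both conventions in plain Python so the difference can be seen on a small example. The `pearson` helper and the ratings are made up for illustration; they are neither Surprise code nor the toy dataset from the issue.

```python
from math import sqrt


def pearson(ratings_u, ratings_v, means_over_common=True):
    """Pearson correlation between two users given {item: rating} dicts.

    Deviations are always summed over co-rated items; the flag only changes
    which ratings each user's mean is computed from.
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0

    def mean(ratings):
        items = common if means_over_common else list(ratings)
        return sum(ratings[i] for i in items) / len(items)

    mu_u, mu_v = mean(ratings_u), mean(ratings_v)
    num = sum((ratings_u[i] - mu_u) * (ratings_v[i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


# Made-up ratings for two users who share three items.
alice = {'a': 5, 'b': 3, 'c': 4, 'd': 1}
bob = {'a': 4, 'b': 2, 'c': 5}

r_common = pearson(alice, bob, means_over_common=True)
r_all = pearson(alice, bob, means_over_common=False)
print(round(r_common, 4), round(r_all, 4))  # 0.6547 0.4821
```

The two conventions agree only when every user has rated exactly the same items, so comparing against a textbook table requires knowing which one the table uses.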
opened by ngovind93 15
Problem: current kNNWithMeans implementation doesn't allow for classical item-item kNN approach
The current implementation of kNNWithMeans computes the similarity measure on the original ratings, and then uses mean values to compute a prediction.

According to:

"Recommender systems. the textbook" by Aggarwal, chapter: 2.3.2 Item-Based Neighborhood Models,

initial paper on item-item CF: Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms.

For the purpose of item-item similarity computation, we should use the "adjusted cosine" similarity: rather than using raw ratings, subtract the average item rating from each user's rating before computing item similarities.

How to address the problem: a possible approach would be to add an "adjusted cosine" similarity measure. But if we compute it independently of kNNWithMeans, mean values will be computed twice, once for the similarity measure and once for the predictions, which seems computationally inefficient. So I would rather suggest incorporating the mean adjustment into the prediction algorithm itself.
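
To make the idea of mean-centering before a cosine concrete, here is a minimal plain-Python sketch of an adjusted cosine between two items. The centering choice is the assumption to watch: this sketch subtracts the mean rating of the user who gave each rating, which is how the adjusted cosine is defined in the Sarwar et al. paper; centering on item means instead would need a precomputed per-item average. `adjusted_cosine` and the ratings are hypothetical and not part of Surprise.

```python
from math import sqrt


def adjusted_cosine(ratings, item_i, item_j):
    """Adjusted cosine similarity between two items.

    ratings maps user -> {item: rating}. Each rating is centered on the
    mean of the user who gave it before the cosine is computed.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            dev_i = user_ratings[item_i] - mean
            dev_j = user_ratings[item_j] - mean
            num += dev_i * dev_j
            den_i += dev_i ** 2
            den_j += dev_j ** 2
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))


# Made-up ratings: x and y are liked by the same users, z by the others.
ratings = {
    'u1': {'x': 5, 'y': 4, 'z': 1},
    'u2': {'x': 4, 'y': 5, 'z': 2},
    'u3': {'x': 1, 'y': 2, 'z': 5},
}

sim_xy = adjusted_cosine(ratings, 'x', 'y')
sim_xz = adjusted_cosine(ratings, 'x', 'z')
print(round(sim_xy, 3), round(sim_xz, 3))
```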
opened by ODemidenko 15
Own Prediction Algorithm: Adjusted KNN With Means
Hi,

I want to adjust the KNN algorithm a little bit. My idea is that, before k_neighbors is selected by heapq.nlargest, I want to ignore the similarity results that are smaller than a threshold (min_s). Specifically:

for (n, sim, r) in neighbors:
    if sim < min_s:
        sim = 0
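
A standalone sketch of that thresholding step might look like the following. It builds a new list rather than reassigning the loop variable, since rebinding `sim` inside a `for` loop does not change the tuples stored in the list. `top_k_above_threshold` and the neighbor data are hypothetical and only illustrate the idea; they are not taken from Surprise's KNN code.

```python
import heapq


def top_k_above_threshold(neighbors, k, min_s):
    """Zero out similarities below min_s, then keep the k most similar.

    neighbors is a list of (neighbor_id, similarity, rating) tuples.
    """
    thresholded = [(n, sim if sim >= min_s else 0, r)
                   for (n, sim, r) in neighbors]
    return heapq.nlargest(k, thresholded, key=lambda t: t[1])


# Made-up neighbors for illustration.
neighbors = [('u1', 0.9, 4.0), ('u2', 0.2, 3.0),
             ('u3', 0.6, 5.0), ('u4', 0.1, 1.0)]
k_neighbors = top_k_above_threshold(neighbors, k=3, min_s=0.5)
print(k_neighbors)
```

Neighbors whose similarity was zeroed can still be selected when fewer than k pass the threshold, so a prediction step would typically skip entries with a similarity of 0.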

Of course, I am trying to check the accuracy of this adjustment. However, when I built my own algorithm following your guidelines, I received the following error:

TypeError: compute_similarities() got an unexpected keyword argument 'verbose'

These are the libs I have imported. The code of the class Adjusted_KNNWithMeans is the same as your KNNWithMeans, except for the change I mentioned above.

from surprise import Reader, Dataset, trainset
from surprise import SVD, evaluate, SVDpp
from surprise.model_selection import cross_validate, train_test_split
from surprise.model_selection import train_test_split
from surprise import accuracy
from surprise import PredictionImpossible
from surprise import AlgoBase
from surprise.prediction_algorithms.knns import SymmetricAlgo
from __future__ import (absolute_import, division, print_function, unicode_literals)
import numpy as np
from six import iteritems
import heapq

Thank you very much.
opened by TamTSE2301 14
[Feature Request] Recommendation strategy to recommend the items with the 10 highest estimation
Hi Nicolas, I have been using the Surprise package for a few days now. Great job 👍 In your todo list, you have an item "Implement some recommendation strategy (like recommend the items with the 10 highest estimation)". Do you have an idea as to when it will be released, approximately? Any info regarding this would be really helpful.

What would be your basic strategy to implement this feature?

I wrote a function for SVD to return a list of top K recommendations. I follow these steps to get the list,

Iterate through all items for a specific user

Run predict function for items that don't have a rating ( Item for a user without rating)

Sort top K and return them as recommendations.
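
The steps above can be sketched in a model-agnostic way as follows. `top_k_for_user` is a hypothetical helper that accepts any callable returning an estimate; with a fitted Surprise algorithm, something like `lambda u, i: algo.predict(u, i).est` could play that role. The table of estimates is made up for illustration.

```python
import heapq


def top_k_for_user(predict, user, all_items, rated_items, k=10):
    """Return the k unrated items with the highest predicted rating.

    predict is any callable (user, item) -> estimated rating.
    """
    candidates = (item for item in all_items if item not in rated_items)
    scored = ((predict(user, item), item) for item in candidates)
    return [item for _, item in heapq.nlargest(k, scored, key=lambda t: t[0])]


# Hypothetical stand-in for a trained model: a fixed table of estimates.
estimates = {('alice', 'i1'): 4.5, ('alice', 'i2'): 2.0,
             ('alice', 'i3'): 3.9, ('alice', 'i4'): 4.8}


def toy_predict(user, item):
    return estimates[(user, item)]


recommended = top_k_for_user(toy_predict, 'alice',
                             all_items=['i1', 'i2', 'i3', 'i4'],
                             rated_items={'i4'}, k=2)
print(recommended)  # ['i1', 'i3']
```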

However, the function returns the same, or almost the same, 10 / 20 recommendations irrespective of the user.

Is the above strategy wrong? Any thoughts on why I am getting almost identical recommendations?

Thanks :)
opened by adideshp 14
Add mean squared error (MSE) to accuracy

Hello,

I noticed that the MSE measurement is not in the accuracy module.

Of course you can get the MSE by squaring RMSE ^^. But I think it would be more enjoyable if MSE is directly available.
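
The relationship between the two measures is easy to check by hand. The plain-Python sketch below computes both from (true, estimated) pairs; the `mse` and `rmse` functions and the sample pairs are illustrative assumptions and are not the accuracy module's implementation.

```python
from math import sqrt


def mse(predictions):
    """Mean squared error over (true_rating, estimated_rating) pairs."""
    return sum((r - est) ** 2 for r, est in predictions) / len(predictions)


def rmse(predictions):
    """Root mean squared error: the square root of the MSE."""
    return sqrt(mse(predictions))


# Made-up (true, estimated) pairs.
predictions = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0)]
print(mse(predictions))        # 0.5
print(rmse(predictions) ** 2)  # approximately 0.5, up to rounding
```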

So I pulled MSE out of RMSE and adjusted the corresponding tests.

I hope the request is justified.

Yours sincerely Marc Feger

opened by ghost 13
How do I apply ALS minimization in SVD?

Hello, @NicolasHug

According to the docs, the SVD model uses SGD as the minimization algorithm. Is there any way of using ALS instead? I needed to compare the performance of these two in my data

opened by Shariar076 1

A bug when importing data from DataFrame

Description

When importing data from a DataFrame, all estimated ratings are equal to the mean value instead of being real predictions. But when importing the same dataset from a file, it works as normal.

Steps/Code to Reproduce

import pandas as pd
from surprise import SVD
from surprise import Dataset
from surprise import Reader

# Creation of the dataframe. Column names are irrelevant.
ratings_dict = {'itemID': [1, 1, 1, 2, 2],
                'userID': [9, 32, 2, 45, 'user_foo'],
                'rating': [3, 2, 4, 3, 1]}
df = pd.DataFrame(ratings_dict)

# A reader is still needed but only the rating_scale param is required.
reader = Reader(rating_scale=(1, 5))

# The columns must correspond to user id, item id and ratings (in that order).
data = Dataset.load_from_df(df[['userID', 'itemID', 'rating']], reader)

# We can now use this dataset as we please, e.g. calling cross_validate
svd_model = SVD()
svd_model.fit(trainset=data.build_full_trainset())
test_case = svd_model.predict(str(1),str(2),verbose=True)

Expected Results

The results should differ for each user and item.

Actual Results

But the actual result was that all predicted ratings were equal to the mean value (2.6).

Versions

Windows-10-10.0.22621-SP0
Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]
surprise 1.1.3
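
One possible explanation, offered as an assumption rather than a confirmed diagnosis: ids loaded from a DataFrame keep their original Python types, so `str(1)` and `str(2)` do not match the integer ids in the data (and no user with id 1 exists at all). A model that knows neither the user nor the item can then only fall back to a default such as the global mean. The plain-Python sketch below reproduces that lookup logic with the ratings from the report; `is_known` is a hypothetical helper, not a Surprise function.

```python
# Raw ids exactly as they appear in the DataFrame: mostly ints, one str.
ratings = [(9, 1, 3), (32, 1, 2), (2, 1, 4), (45, 2, 3), ('user_foo', 2, 1)]

known_users = {uid for uid, _, _ in ratings}
known_items = {iid for _, iid, _ in ratings}
global_mean = sum(r for _, _, r in ratings) / len(ratings)


def is_known(uid, iid):
    """True only if both raw ids match stored ids by value and by type."""
    return uid in known_users and iid in known_items


print(global_mean)         # 2.6
print(is_known('1', '2'))  # False: the str '2' is not the int 2
print(is_known(9, 2))      # True
```

If this is the cause, passing ids with the same types as in the DataFrame (for example `predict(9, 2)`) should yield estimates that differ from the mean.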

opened by Alaskyed 1

What to do if the dataset I want to read has more than three parameters

Hey! I've recently started working on research pertaining to finding new uses for matrix factorization techniques. The biggest challenge is that it has proven quite hard not to overflow while testing with different datasets. I tried using Surprise, but whenever I try to pass more than 3 columns to the Reader, I get an error (ValueError: line_format parameter is incorrect.)

If there's any way to bypass that restriction, or any resource you could point me to, I'd be delighted! Thanks!
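
One workaround that might fit, assuming the extra columns are not needed by the algorithm, is to project each record down to a (user, item, rating) triple before handing the data over, for example via Dataset.load_from_df on three selected columns. The sketch below does the projection with the standard library only; the column names and records are made up for illustration.

```python
import csv
import io

# Made-up records with five columns; only three describe a rating.
raw = io.StringIO(
    "user,item,rating,timestamp,device\n"
    "u1,i1,4.0,1260759144,mobile\n"
    "u2,i1,3.5,1260759179,desktop\n"
)

reader = csv.DictReader(raw)
triples = [(row['user'], row['item'], float(row['rating'])) for row in reader]
print(triples)  # [('u1', 'i1', 4.0), ('u2', 'i1', 3.5)]
```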

opened by GustaMatos0 0
build_anti_testset() takes a long time and in the end does not work

1 - reader = Reader(rating_scale=(1, 5))
2 - data = Dataset.load_from_df(ratings[['userId', 'asin', 'rating']], reader)  # this is my own dataset
3 - svd = SVD(n_factors=30, n_epochs=20, lr_all=0.005, reg_all=0.02)
4 - real_trainset = data.build_full_trainset()
5 - svd.fit(real_trainset)
6 - real_testset = real_trainset.build_anti_testset()  # the code stops here after a long time and finally returns a memory error
7 - predictions = svd.test(real_testset)
8 - top_n = get_top_n(predictions, n=20)

When I run the program, it stops at line number 6 because of build_anti_testset(), and it returns a memory error after a long time.

However, when I replace build_anti_testset() with build_testset(), it works without any problem.

But I need to use build_anti_testset() instead of build_testset(), because I need predictions for the items that the users have not rated yet.
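
The memory pressure is easier to see with some arithmetic: the set of unrated pairs has roughly n_users times n_items entries, minus the known ratings. The sketch below computes that size and shows a lazy alternative that yields unrated pairs one at a time instead of building the whole list. Both helpers and all the numbers are hypothetical, plain-Python illustrations, not Surprise functions.

```python
def anti_testset_size(n_users, n_items, n_ratings):
    """Number of (user, item) pairs that have no rating."""
    return n_users * n_items - n_ratings


def unrated_pairs(user_items, all_items):
    """Lazily yield (user, item) pairs the user has not rated."""
    for user, rated in user_items.items():
        for item in all_items:
            if item not in rated:
                yield user, item


# Made-up sizes: even a mid-sized dataset yields billions of pairs.
print(anti_testset_size(n_users=100_000, n_items=50_000, n_ratings=2_000_000))
# 4998000000

user_items = {'u1': {'i1', 'i2'}, 'u2': {'i3'}}
print(list(unrated_pairs(user_items, ['i1', 'i2', 'i3'])))
# [('u1', 'i3'), ('u2', 'i1'), ('u2', 'i2')]
```

Processing one user at a time and keeping only that user's top items keeps memory proportional to the number of items rather than to the full user-item grid.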

opened by bodymostafa123 1
Using Cross Validate with a Pandas Dataframe, running into datatype error
Description

from __future__ import (absolute_import, division, print_function, unicode_literals)

import pandas as pd

from surprise import SVD
from surprise import dataset
from surprise import Reader

reader = Reader()
model = SVD()
ratings_dict = ratings.to_dict('records')
ratings_dict

Example output: [{'userId': 1, 'movieId': 31, 'rating': 2.5, 'timestamp': 1260759144}, {'userId': 1, 'movieId': 1029, 'rating': 3.0, 'timestamp': 1260759179}, {'userId': 1, 'movieId': 1061, 'rating': 3.0, 'timestamp': 1260759182}, {'userId': 1, 'movieId': 1129, 'rating': 2.0, 'timestamp': 1260759185},

df = pd.DataFrame.from_dict(ratings_dict)

# You'll need to create a dummy reader
reader = Reader(line_format='user item rating', rating_scale=(1, 5))

# Also, a dummy Dataset class
class MyDataset(dataset.DatasetAutoFolds):

    def __init__(self, df, reader):
        self.raw_ratings = [(uid, iid, r, None) for (uid, iid, r) in
                            zip(df['userId'], df['movieId'], df['rating'])]
        self.reader = reader

data = MyDataset(df, reader)

cross_validate(model, data, measures=['userId', 'movieId', 'rating'], cv=3)

Error Message: AttributeError: module 'surprise.accuracy' has no attribute 'userid' 1 cross_validate(model,data,measures=['userId', 'movieId', 'rating'],cv=3)

Steps/Code to Reproduce

Before running the above code:

ratings = pd.read_csv('ratings_small.csv')
ratings.head()

!pip install scikit-surprise
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

Expected Results

Actual Results

Versions
opened by JackRaines 0

Owner

Nicolas Hug

ML engineer, Scikit-learn core-developer

GitHub http://surpriselib.com

Graph Neural Networks for Recommender Systems

This repository contains code to train and test GNN models for recommendation, mainly using the Deep Graph Library (DGL).

217 Jan 4, 2023

NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs.

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

420 Jan 4, 2023

Collaborative variational bandwidth auto-encoder (VBAE) for recommender systems.

Collaborative Variational Bandwidth Auto-encoder The codes are associated with the following paper: Collaborative Variational Bandwidth Auto-encoder f

14 Dec 11, 2022

E-Commerce recommender demo with real-time data and a graph database

E-Commerce recommender demo This is a simple stream setup that uses Memgraph to ingest real-time data from a simulated online store. Data is str

3 Feb 23, 2022

Deep recommender models using PyTorch.

Spotlight uses PyTorch to build both deep and shallow recommender models. By providing both a slew of building blocks for loss functions (various poin

2.8k Dec 29, 2022

Recommender System Papers

Included Conferences: SIGIR 2020, SIGKDD 2020, RecSys 2020, CIKM 2020, AAAI 2021, WSDM 2021, WWW 2021

704 Jan 6, 2023

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

RecSim NG, a probabilistic platform for multi-agent recommender systems simulation. RecSimNG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers: a powerful, general probabilistic programming language for agent-behavior specification;