RecList is an open source library providing behavioral, "black-box" testing for recommender systems.

Overview

RecList

Documentation Status Contributors License Downloads

RecList

Overview

RecList is an open source library providing behavioral, "black-box" testing for recommender systems. Inspired by the pioneering work of Ribeiro et al. 2020 in NLP, we introduce a general plug-and-play procedure to scale up behavioral testing, with an easy-to-extend interface for custom use cases.

RecList ships with some popular datasets and ready-made behavioral tests: check the paper for more details on the relevant literature and the philosophical motivations behind the project.

If you are not familiar with the library, we suggest first taking our small tour to get acquainted with the main abstractions through ready-made models and public datasets.

Quick Links

  • Our paper, with in-depth analysis, detailed use cases and scholarly references.
  • A colab notebook (WIP), showing how to train a cart recommender model from scratch and use the library to test it.
  • Our blog post (forthcoming), with examples and practical tips.

Project updates

Nov. 2021: the library is currently in alpha. We welcome contributions and feedback, but please be advised that the package may change substantially in the upcoming months.

As the project is in active development, come back often for updates.

Summary

This doc is structured as follows:

Quick Start

If you want to see RecList in action, clone the repository, create and activate a virtual env, and install the required packages from root. If you prefer to experiment in an interactive, no-installation-required fashion, try out our colab notebook.

Sample scripts are divided by use-cases: similar items, complementary items or session-based recommendations. When executing one, a suitable public dataset will be downloaded, and a baseline ML model trained: finally, the script will run a pre-made suite of behavioral tests to show typical results.

git clone https://github.com/jacopotagliabue/reclist
cd reclist
python3 -m venv venv
source venv/bin/activate
pip install -e .
python examples/coveo_complementary_rec.py

Running your model on one of the supported dataset, leveraging the pre-made tests, is as easy as implementing a simple interface, RecModel.

Once you've run successfully the sample script, take the guided tour below to learn more about the abstractions and the out-of-the-box capabilities of RecList.

A Guided Tour

An instance of RecList represents a suite of tests for recommender systems: given a dataset (more appropriately, an instance of RecDataset) and a model (an instance of RecModel), it will run the specified tests on the target dataset, using the supplied model.

For example, the following code instantiates a pre-made suite of tests that contains sensible defaults for a cart recommendation use case:

rec_list = CoveoCartRecList(
    model=model,
    dataset=coveo_dataset
)
# invoke rec_list to run tests
rec_list(verbose=True)

Our library pre-packages standard recSys KPIs and important behavioral tests, divided by use cases, but it is built with extensibility in mind: you can re-use tests in new suites, or you can write new domain-specific suites and tests.

Any suite must inherit the RecList interface, and then declare with Pytonic decorators its tests: in this case, the test re-uses a standard function:

class MyRecList(RecList):

    @rec_test(test_type='stats')
    def basic_stats(self):
        """
        Basic statistics on training, test and prediction data
        """
        from reclist.metrics.standard_metrics import statistics
        return statistics(self._x_train,
            self._y_train,
            self._x_test,
            self._y_test,
            self._y_preds)

Any model can be tested, as long as its predictions are wrapped in a RecModel. This allows for pure "black-box" testings, a SaaS provider can be tested just by wrapping the proper API call in the method:

class MyCartModel(RecModel):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def predict(self, prediction_input: list, *args, **kwargs):
        """
        Implement the abstract method, accepting a list of lists, each list being
        the content of a cart: the predictions returned by the model are the top K
        items suggested to complete the cart.
        """

        return

While many standard KPIs are available in the package, the philosophy behind RecList is that metrics like Hit Rate provide only a partial picture of the expected behavior of recommenders in the wild: two models with very similar accuracy can have very different behavior on, say, the long-tail, or model A can be better than model B overall, but at the expense of providing disastrous performance on a set of inputs that are particularly important in production.

RecList recognizes that outside of academic benchmarks, some mistakes are worse than others, and not all inputs are created equal: when possible, it tries to operationalize through scalable code behavioral insights for debugging and error analysis; it also provides extensible abstractions when domain knowledge and custom logic are needed.

Once you run a suite of tests, results are dumped automatically and versioned in a local folder, structured as follows (name of the suite, name of the model, run timestamp):

.reclist/
  myList/
    myModel/
      1637357392/
      1637357404/

We provide a simple (and very WIP) UI to easily compare runs and models. After you run two times one of the example scripts, you can do:

cd app
python app.py

to start a local web app that lets you explore test results:

https://github.com/jacopotagliabue/reclist/blob/main/images/explorer.png

If you select more than model, the app will automatically build comparison tables:

https://github.com/jacopotagliabue/reclist/blob/main/images/comparison.png

If you start using RecList as part of your standard testings - either for research or production purposes - you can use the JSON report for machine-to-machine communication with downstream system (e.g. you may want to automatically fail the model pipeline if certain behavioral tests are not passed).

Capabilities

RecList provides a dataset and model agnostic framework to scale up behavioral tests. As long as the proper abstractions are implemented, all the out-of-the-box components can be re-used. For example:

  • you can use a public dataset provided by RecList to train your new cart recommender model, and then use the RecTests we provide for that use case;
  • you can use some baseline model on your custom dataset, to establish a baseline for your project;
  • you can use a custom model, on a private dataset and define from scratch a new suite of tests, mixing existing methods and domain-specific tests.

We list below what we currently support out-of-the-box, with particular focus on datasets and tests, as the models we provide are convenient baselines, but they are not meant to be SOTA research models.

Datasets

RecList features convenient wrappers around popular datasets, to help test models over known benchmarks in a standardized way.

Behavioral Tests

Coming soon!

Roadmap

To do:

  • the app is just a stub: improve the report "contract" and extend the app capabilities, possibly including it in the library itself;
  • continue adding default RecTests by use cases, and test them on public datasets;
  • improving our test suites and refactor some abstractions;
  • adding Colab tutorials, extensive documentation and a blog-like write-up to explain the basic usage.

We maintain a small Trello board on the project which we plan on sharing with the community: more details coming soon!

Contributing

We will update this repo with some guidelines for contributions as soon as the codebase becomes more stable. Check back often for updates!

Acknowledgments

The main contributors are:

If you have questions or feedback, please reach out to: jacopo dot tagliabue at tooso dot ai.

License and Citation

All the code is released under an open MIT license. If you found RecList useful, or you are using it to benchmark/debug your model, please cite our pre-print (forhtcoming):

@inproceedings{Chia2021BeyondNB,
  title={Beyond NDCG: behavioral testing of recommender systems with RecList},
  author={Patrick John Chia and Jacopo Tagliabue and Federico Bianchi and Chloe He and Brian Ko},
  year={2021}
}

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Comments
  • Reclist tutorial - Hit Rate by Brand graph

    Reclist tutorial - Hit Rate by Brand graph

    Description

    I ran through the provided RecList Colab Notebook tutorial. I also read the blog on RecList.

    In the blog post, it shows the Hit Rate @ 10 by brand. Picture below.

    image

    I did not see how to generate the HR@10 by brand in the Colab Notebook example. Would you mind sharing the code on how to generate that please?

    Thank you

    opened by kmcmearty 1
  • How to pick use case clarification

    How to pick use case clarification

    Description

    When I pick a use case from Reclist and I want to build off of one of them, eg complementary items, does that mean I would instantiate the CoveoCartRecList class?

    From the Colab notebook tutorial:

    `In particular in this tutorial, we are targeting a "complementary items" use cases, such as for example a cart recommender:

    if a shopper added item X to the cart, what is she likely to add next? We train a simple, yet effective prod2vec baseline model, re-using for convenience a "training embedding" function already implemented by recsys. `

    For example, say I'm interested in the similar item or session based use cases from RecList, would I call MovieLensSimilarItemRecList or SpotifySessionRecList() to start my build from?

    I'm trying to get some clarification on the right way to use this library from the blog post and colab notebook when it says "Step #1 Pick a use case"

    Thank you

    opened by kmcmearty 1
  • may you share y_train  data for CoveoDataset  please

    may you share y_train data for CoveoDataset please

    may you share data for CoveoDataset please as we can see y_train is not provided https://github.com/jacopotagliabue/reclist/blob/main/reclist/datasets.py

        self._x_train = data["x_train"]
        self._y_train = None
        self._x_test = data["x_test"]
        self._y_test = data["y_test"]
        self._catalog = data["catalog"]
    

    x_train can be downloaded len(x_train) 927357 but y_train not

    opened by Sandy4321 1
  • Update precision_at_k

    Update precision_at_k

    Description

    Line 123 in 'reclist/reclist/metrics/standard_metrics.py'(url) Is precision@k is the percentage of the relevant documents that is viewed? Is that the definition followed here?

    def precision_at_k(y_preds, y_test, k=3):
        precision_ls = [len(set(_y).intersection(set(_p[:k]))) / len(_p) if _p else 1 for _p, _y in zip(y_preds, y_test)]
        return np.average(precision_ls)
    

    should not the len(_p) be len(_p[:k]) or the precision@k is defined is some other way? This is what I think should be precision@k if we follow the standard definition

    
    def precision_at_k(y_preds, y_test, k=3):
        precision_ls = [len(set(_y).intersection(set(_p[:k]))) / len(_p[:k]) if _p else 1 for _p, _y in zip(y_preds, y_test)]
        return np.average(precision_ls)
    
    opened by nsbits 0
  • Datalab 4426: Add BBC Sounds datasets and models

    Datalab 4426: Add BBC Sounds datasets and models

    This PR adds classes and methods to load a BBC Sounds dataset from GCP; it also adds a dummy lightFM model, which loads pre-generated predictions, and functions to generate metrics by custom (age and gender) groups).

    opened by alepiscopo 0
  • Feature Requests: Smaller Datasets as example.

    Feature Requests: Smaller Datasets as example.

    • RecList version: 0.3.1
    • Python version: 3.11
    • Operating System: Linux/Windows

    Description

    Example datasets are really large. They tend to make the system run out of memory. Can we have smaller example datasets to run the tests?

    enhancement 
    opened by unna97 0
  • Issue with Reclist package pip install.

    Issue with Reclist package pip install.

    • RecList version:
    • Python version: 3.10
    • Operating System: Windows

    Description

    Failed to install reclist on the local machine (windows) and GitHub codespaces (Linux). Runs into an issue with matplotlib and gensim.

    What I Did

    Upgrade the versions in requirements.txt

    !pip install reclist
    
    opened by unna97 1
  • Update popularity bias at k

    Update popularity bias at k

    Description

    Line 118 in 'reclist/reclist/metrics/standard_metrics.py'(url) The denominator should be len(p[:k)

    def popularity_bias_at_k(y_preds, x_train, k=3):
        # estimate popularity from training data
        pop_map = collections.defaultdict(lambda : 0)
        num_interactions = 0
        for session in x_train:
            for event in session:
                pop_map[event] += 1
                num_interactions += 1
        # normalize popularity
        pop_map = {k:v/num_interactions for k,v in pop_map.items()}
        all_popularity = []
        for p in y_preds:
            average_pop = sum(pop_map.get(_, 0.0) for _ in p[:k]) / len(p) if len(p) > 0 else 0
            all_popularity.append(average_pop)
        return sum(all_popularity) / len(y_preds)
    

    should not the len(_p) be len(_p[:k]) ? we will be looking till the kth slice This is what I think should be popularity_bias@k

    
    def popularity_bias_at_k(y_preds, x_train, k=3):
        # estimate popularity from training data
        pop_map = collections.defaultdict(lambda : 0)
        num_interactions = 0
        for session in x_train:
            for event in session:
                pop_map[event] += 1
                num_interactions += 1
        # normalize popularity
        pop_map = {k:v/num_interactions for k,v in pop_map.items()}
        all_popularity = []
        for p in y_preds:
            average_pop = sum(pop_map.get(_, 0.0) for _ in p[:k]) / len(p[:k]) if len(p) > 0 else 0
            all_popularity.append(average_pop)
        return sum(all_popularity) / len(y_preds)
    
    opened by nsbits 0
Owner
Jacopo Tagliabue
I failed the Turing Test once, but that was many friends ago.
Jacopo Tagliabue
Graph Neural Networks for Recommender Systems

This repository contains code to train and test GNN models for recommendation, mainly using the Deep Graph Library (DGL).

null 217 Jan 4, 2023
Collaborative variational bandwidth auto-encoder (VBAE) for recommender systems.

Collaborative Variational Bandwidth Auto-encoder The codes are associated with the following paper: Collaborative Variational Bandwidth Auto-encoder f

Yaochen Zhu 14 Dec 11, 2022
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and newly state-of-the-art recommendation models are implemented. QRec has a lightweight architecture and provides user-friendly interfaces. It can facilitate model implementation and evaluation.

Yu 1.4k Dec 27, 2022
Plex-recommender - Get movie recommendations based on your current PleX library

plex-recommender Description: Get movie/tv recommendations based on your current

null 5 Jul 19, 2022
Deep recommender models using PyTorch.

Spotlight uses PyTorch to build both deep and shallow recommender models. By providing both a slew of building blocks for loss functions (various poin

Maciej Kula 2.8k Dec 29, 2022
Recommender System Papers

Included Conferences: SIGIR 2020, SIGKDD 2020, RecSys 2020, CIKM 2020, AAAI 2021, WSDM 2021, WWW 2021

RUCAIBox 704 Jan 6, 2023
RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

RecSim NG, a probabilistic platform for multi-agent recommender systems simulation. RecSimNG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers: a powerful, general probabilistic programming language for agent-behavior specification;

Google Research 110 Dec 16, 2022
E-Commerce recommender demo with real-time data and a graph database

?? E-Commerce recommender demo ?? This is a simple stream setup that uses Memgraph to ingest real-time data from a simulated online store. Data is str

g-despot 3 Feb 23, 2022
Movie Recommender System

Movie-Recommender-System Movie-Recommender-System is a web application using which a user can select his/her watched movie from list and system will r

null 1 Jul 14, 2022
Mutual Fund Recommender System. Tailor for fund transactions.

Explainable Mutual Fund Recommendation Data Please see 'DATA_DESCRIPTION.md' for mode detail. Recommender System Methods Baseline Collabarative Fiilte

JHJu 2 May 19, 2022
Movies/TV Recommender

recommender Movies/TV Recommender. Recommends Movies, TV Shows, Actors, Directors, Writers. Setup Create file API_KEY and paste your TMDB API key in i

Aviem Zur 3 Apr 22, 2022
6002project-rl - An implemention of offline RL on recommender system

An implemention of offline RL on recommender system @author: misajie @update: 20

Tzay Lee 3 May 24, 2022
An open source movie recommendation WebApp build by movie buffs and mathematicians that uses cosine similarity on the backend.

Movie Pundit Find your next flick by asking the (almost) all-knowing Movie Pundit Jump to Project Source » View Demo · Report Bug · Request Feature Ta

Kapil Pramod Deshmukh 8 May 28, 2022
Persine is an automated tool to study and reverse-engineer algorithmic recommendation systems.

Persine, the Persona Engine Persine is an automated tool to study and reverse-engineer algorithmic recommendation systems. It has a simple interface a

Jonathan Soma 87 Nov 29, 2022
fastFM: A Library for Factorization Machines

Citing fastFM The library fastFM is an academic project. The time and resources spent developing fastFM are therefore justified by the number of citat

null 1k Dec 24, 2022
A Library for Field-aware Factorization Machines

Table of Contents ================= - What is LIBFFM - Overfitting and Early Stopping - Installation - Data Format - Command Line Usage - Examples -

null 1.6k Dec 5, 2022
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

null 419 Jan 3, 2023
Code for Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? (SDM 2022)

Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? (SDM 2022) We consider how a user of a web servi

joisino 20 Aug 21, 2022
Log4jScanner is a Log4j Related CVEs Scanner, Designed to Help Penetration Testers to Perform Black Box Testing on given subdomains.

Log4jScanner Log4jScanner is a Log4j Related CVEs Scanner, Designed to Help Penetration Testers to Perform Black Box Testing on given subdomains. Disc

Pushpender Singh 35 Dec 12, 2022
NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs.

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

null 420 Jan 4, 2023