LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.

Last update: Dec 28, 2022

Related tags

Machine Learning LibRerank

Overview

LibRerank

LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate. It also supports LambdaMART and DNN as initial ranker.

Get Started

Create virtual environment(optional)

pip install --user virtualenv
~/.local/bin/virtualenv -p python3 ./venv
source venv/bin/activate

Install LibRerank from source

git clone https://github.com/LibRerank-Community/LibRerank.git
cd LibRerank
make init

Run example

Run initial ranker

bash example/run_ranker.sh

Run re-ranker

bash example/run_reranker.sh

We can choose to enter a config file like example/run_reranker.sh via the parameter setting_path. The config files for the different models can be found in example/config. We can also set various parameters directly from the command line. A list of supported parameters can be found in run_init_ranker.py and run_reranker.py.

Structure

librerank

Initial rankers

DNN: a naive algorithm that directly train a multi-layer perceptron network with input labels (e.g., clicks).

LambdaMART: the implementation of the LambdaMART model in From RankNet to LambdaRank to LambdaMART: An Overview

Re-ranking algorithms

DLCM: the implementation of the Deep Listwise Context Model in Learning a Deep Listwise Context Model for Ranking Refinement.

PRM: the implementation of the Personalized Re-ranking Model in Personalized Re-ranking for Recommendation

GSF: the implementation of the Groupwise Scoring Function in Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks.

miDNN: the implementation of the miDNN model in Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search

SetRank: the implementation of the SetRank model in SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval.

Seq2Slate: the implementation of sequence-to-sequence model for re-ranking in Seq2Slate: Re-ranking and Slate Optimization with RNNs

EGRerank: the implementation of the Evaluator-Generator Reranking in AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online

Data

We process two datasets, Ad and PRM Public, containing user and item features with recommendation lists for the experimentation with personalized re-ranking. The details of processed datasets are summarized in the following table

Dataset	#item	#list	# user feature	# item feature
Ad	349,404	483,049	8	6
PRM Public	2,851,766	1,295,496	3	24

Depending on the length of the initial ranking, the maximum length of initial lists (re-ranking size n) is set to 10 and 30 for Ad and PRM Public, respectively.

The original Ad dataset records 1 million users and 26 million ad display/click logs, with 8 user profiles (e.g., id, age, and occupation), 6 item features (e.g., id, campaign, and brand). Following previous work, We transform records of each user into ranking lists according to the timestamp of the user browsing the advertisement. Items that have been interacted with within five minutes are sliced into a list and the processed data is avaliable here. The detailed process is here.

PRM public

The original PRM public dataset contains re-ranking lists from a real-world e-commerce RS. Each record is a recommendation list consisting of 3 user profile features, 5 categorical, and 19 dense item features. Due to the memory limitation, we randomly sample 10% of lists and remained data is avaliable here. The detailed process is here.

You might also like...

A toolkit for geo ML data processing and model evaluation (fork of solaris)

An open source ML toolkit for overhead imagery. This is a beta version of lunular which may continue to develop. Please report any bugs through issues

4 Nov 4, 2021

Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

SDK: Overview of the Kubeflow pipelines service Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on

3.1k Jan 6, 2023

A Powerful Serverless Analysis Toolkit That Takes Trial And Error Out of Machine Learning Projects

KXY: A Seemless API to 10x The Productivity of Machine Learning Engineers Documentation https://www.kxy.ai/reference/ Installation From PyPi: pip inst

35 Jan 2, 2023

Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

25 Dec 28, 2022

Comments

segmentation fault

LibRerank# bash example/run_ranker.sh num of item 349404 num of list 483049 profile num 8 spar num 5 dens num 1 Tree 0 example/run_ranker.sh: line 1: 2477587 Segmentation fault

opened by SpaceLearner 1

LibRerank is a toolkit for re-ranking algorithms. There are a number of re-ranking algorithms, such as PRM, DLCM, GSF, miDNN, SetRank, EGRerank, Seq2Slate.

Related tags

Overview

LibRerank

Get Started

Create virtual environment(optional)

Install LibRerank from source

Run example

Structure

librerank

Initial rankers

Re-ranking algorithms

Data

Ad

PRM public

You might also like...

A toolkit for geo ML data processing and model evaluation (fork of solaris)

Kubeflow is a machine learning (ML) toolkit that is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

A Powerful Serverless Analysis Toolkit That Takes Trial And Error Out of Machine Learning Projects

Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

Metric learning algorithms in Python

Uplift modeling and causal inference with machine learning algorithms

Python-based implementations of algorithms for learning on imbalanced data.

Distributed Evolutionary Algorithms in Python

An open-source library of algorithms to analyse time series in GPU and CPU.

Comments

segmentation fault

Owner

Price forecasting of SGB and IRFC Bonds and comparing there returns

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

A toolkit for making real world machine learning and data analysis applications in C++

Sequence learning toolkit for Python

A machine learning toolkit dedicated to time-series data

A machine learning toolkit dedicated to time-series data

A Python toolkit for rule-based/unsupervised anomaly detection in time series

Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.

ml4h is a toolkit for machine learning on clinical data of all kinds including genetics, labs, imaging, clinical notes, and more