# LibRerank
LibRerank is a toolkit for re-ranking algorithms. It implements a number of re-ranking algorithms, including PRM, DLCM, GSF, miDNN, SetRank, EGRerank, and Seq2Slate, and supports LambdaMART and DNN as initial rankers.
## Get Started
### Create a virtual environment (optional)

```shell
pip install --user virtualenv
~/.local/bin/virtualenv -p python3 ./venv
source venv/bin/activate
```
### Install LibRerank from source

```shell
git clone https://github.com/LibRerank-Community/LibRerank.git
cd LibRerank
make init
```
### Run examples

Run the initial ranker:

```shell
bash example/run_ranker.sh
```

Run the re-ranker:

```shell
bash example/run_reranker.sh
```
A config file can be passed via the parameter `setting_path`, as in `example/run_reranker.sh`. The config files for the different models can be found in `example/config`. Alternatively, parameters can be set directly on the command line; the lists of supported parameters are in `run_init_ranker.py` and `run_reranker.py`.
## Structure

### librerank

Initial rankers:
- DNN: a naive algorithm that directly trains a multi-layer perceptron with input labels (e.g., clicks).
- LambdaMART: the implementation of the LambdaMART model in *From RankNet to LambdaRank to LambdaMART: An Overview*.
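The DNN initial ranker above is essentially a multi-layer perceptron trained on click labels, whose predicted click probability then orders the candidates. The following is a self-contained numpy sketch of that idea on toy data, with a hand-rolled training loop; it is an illustration of the technique, not the toolkit's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 6-dimensional item features, binary click labels
# generated from a hidden linear rule.
X = rng.normal(size=(256, 6))
y = (X @ rng.normal(size=6) > 0).astype(float)

# One-hidden-layer MLP that scores an item's click probability.
W1 = rng.normal(scale=0.1, size=(6, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)                  # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2).ravel()))  # click probability
    return h, p

def log_loss(p, y):
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

history, lr = [], 0.5
for _ in range(500):
    h, p = forward(X)
    history.append(log_loss(p, y))
    g = ((p - y) / len(y))[:, None]     # gradient of the loss w.r.t. logits
    gh = (g @ W2.T) * (h > 0)           # backprop through the ReLU
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum(0)
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(0)

# At serving time, candidates are ordered by predicted click probability
# to form the initial ranking handed to the re-ranker.
scores = forward(X)[1]
initial_ranking = np.argsort(-scores)
```

The pointwise loss here is what makes the ranker "naive": each item is scored in isolation, which is exactly the limitation the re-ranking models below address.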
Re-ranking algorithms:
- DLCM: the implementation of the Deep Listwise Context Model in *Learning a Deep Listwise Context Model for Ranking Refinement*.
- PRM: the implementation of the Personalized Re-ranking Model in *Personalized Re-ranking for Recommendation*.
- GSF: the implementation of the Groupwise Scoring Function in *Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks*.
- miDNN: the implementation of the miDNN model in *Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search*.
- SetRank: the implementation of the SetRank model in *SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval*.
- Seq2Slate: the implementation of the sequence-to-sequence re-ranking model in *Seq2Slate: Re-ranking and Slate Optimization with RNNs*.
- EGRerank: the implementation of the Evaluator-Generator Reranking in *AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online*.
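What these re-rankers share is that an item's score depends on the whole candidate list, not on the item alone. As a minimal sketch of that idea, here is a PRM/SetRank-style self-attention scorer in numpy, with untrained random weights and no positional encoding; it is an illustration of the mechanism, not the library's code. Without positional encoding the scores are permutation-equivariant, which is the property SetRank builds on.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 8                       # list length (re-ranking size), feature dim
items = rng.normal(size=(n, d))    # item features from the initial ranking

# One attention head with untrained random projections (illustration only).
Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
w_out = rng.normal(scale=d ** -0.5, size=d)

def rerank_scores(items):
    """Score every item conditioned on all other items in the list."""
    Q, K, V = items @ Wq, items @ Wk, items @ Wv
    logits = Q @ K.T / np.sqrt(d)
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over the list
    ctx = attn @ V                            # list-aware item representations
    return ctx @ w_out

scores = rerank_scores(items)
new_order = np.argsort(-scores)    # the re-ranked permutation
```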
## Data

We process two datasets, Ad and PRM Public, which contain user and item features together with recommendation lists, for experiments on personalized re-ranking. The processed datasets are summarized in the following table.

| Dataset | #item | #list | #user feature | #item feature |
|---|---|---|---|---|
| Ad | 349,404 | 483,049 | 8 | 6 |
| PRM Public | 2,851,766 | 1,295,496 | 3 | 24 |
The maximum length of the initial lists (the re-ranking size n) is set according to the lengths of the initial rankings: 10 for Ad and 30 for PRM Public.
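Models that consume fixed-size lists need variable-length initial rankings brought to the re-ranking size n. The sketch below shows one common padding/truncation scheme with a validity mask; this is our assumption about the preprocessing, not necessarily the toolkit's exact scheme.

```python
def pad_or_truncate(items, n, pad_id=0):
    """Bring one initial list to fixed re-ranking size n; the mask marks
    which positions hold real items (1) versus padding (0)."""
    kept = items[:n]
    padded = kept + [pad_id] * (n - len(kept))
    mask = [1] * len(kept) + [0] * (n - len(kept))
    return padded, mask

pad_or_truncate([5, 3, 8], 5)     # ([5, 3, 8, 0, 0], [1, 1, 1, 0, 0])
pad_or_truncate([1, 2, 3, 4], 3)  # ([1, 2, 3], [1, 1, 1])
```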
### Ad
The original Ad dataset records 1 million users and 26 million ad display/click logs, with 8 user profile features (e.g., id, age, and occupation) and 6 item features (e.g., id, campaign, and brand). Following previous work, we transform the records of each user into ranking lists according to the timestamps at which the user browsed the advertisements. Items interacted with within five minutes are sliced into one list. The processed data is available here; the detailed processing is described here.
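One plausible reading of the five-minute slicing rule: walk a user's time-ordered log and start a new list whenever the gap between consecutive interactions exceeds five minutes. A sketch under that assumption (the function and data are illustrative, not from the repository):

```python
from datetime import datetime, timedelta

def slice_into_lists(events, gap=timedelta(minutes=5)):
    """Split one user's (timestamp, item) records into ranking lists,
    starting a new list when consecutive interactions are more than
    `gap` apart."""
    lists, current = [], []
    for ts, item in sorted(events):
        if current and ts - current[-1][0] > gap:
            lists.append([it for _, it in current])
            current = []
        current.append((ts, item))
    if current:
        lists.append([it for _, it in current])
    return lists

t0 = datetime(2021, 1, 1, 12, 0)
log = [(t0, "ad_a"),
       (t0 + timedelta(minutes=3), "ad_b"),
       (t0 + timedelta(minutes=20), "ad_c")]
slice_into_lists(log)   # [['ad_a', 'ad_b'], ['ad_c']]
```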
### PRM Public
The original PRM Public dataset contains re-ranking lists from a real-world e-commerce recommender system. Each record is a recommendation list with 3 user profile features and, per item, 5 categorical and 19 dense features. Due to memory limitations, we randomly sample 10% of the lists; the remaining data is available here. The detailed processing is described here.
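The 10% subsampling step amounts to drawing lists uniformly at random without replacement, e.g. (the list names here are placeholders, not the dataset's actual identifiers):

```python
import random

random.seed(0)  # fix the seed so the sample is reproducible
all_lists = [f"list_{i}" for i in range(1000)]             # stand-in for the full set
sample = random.sample(all_lists, k=len(all_lists) // 10)  # keep a random 10%
```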