Unbiased Learning To Rank Algorithms (ULTRA)

Last update: Dec 1, 2022

Overview

🔥 News: A TensorFlow version of this package can be found in ULTRA.

This is an Unbiased Learning To Rank Algorithms (ULTRA) toolbox that provides a codebase for experiments and research on learning to rank with human-annotated or noisy labels. With a unified data processing pipeline, ULTRA supports multiple unbiased learning-to-rank algorithms, online learning-to-rank algorithms, and neural learning-to-rank models, as well as different methods to use and simulate noisy labels (e.g., clicks) to train and test different algorithms and ranking models. User-friendly documentation can be found here.

Get Started

Create virtual environment (optional):

pip install --user virtualenv
~/.local/bin/virtualenv -p python3 ./venv
source venv/bin/activate

Install ULTRA from the source:

git clone https://github.com/ULTR-Community/ULTRA_pytorch.git
cd ULTRA_pytorch
make init

Run toy example:

bash example/toy/offline_exp_pipeline.sh

Structure

Input Layers

ClickSimulationFeed: the input layer that generates synthetic clicks on fixed ranked lists to feed the learning algorithm.
DeterministicOnlineSimulationFeed: the input layer that first creates ranked lists by sorting documents according to the current ranking model, and then generates synthetic clicks on the lists to feed the learning algorithm. It can do result interleaving if required by the learning algorithm.
StochasticOnlineSimulationFeed: the input layer that first creates ranked lists by sampling documents based on their scores in the current ranking model and the Plackett-Luce distribution, and then generates synthetic clicks on the lists to feed the learning algorithm. It can do result interleaving if required by the learning algorithm.
DirectLabelFeed: the input layer that directly feeds the true relevance labels of each document to the learning algorithm.
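
To make the difference between the deterministic and stochastic online feeds concrete, here is a minimal sketch of the two ways of building a ranked list from model scores. It is not ULTRA's implementation; the function names and the use of raw scores as Plackett-Luce log-weights are assumptions made for illustration.

```python
import math
import random

def deterministic_ranking(scores):
    """Rank document indices by descending score (deterministic feed idea)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def sample_plackett_luce(scores, rng=random):
    """Sample a ranking from a Plackett-Luce model (stochastic feed idea).

    Assumption: each score is used as a log-weight, so a document is picked
    for the next position with probability proportional to exp(score).
    """
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        # Shift by the max score for numerical stability before exponentiating.
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        # Draw the next document among those not yet placed.
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking
```

Sorting always returns the same list for the same scores, while repeated Plackett-Luce samples explore different orderings, with high-scoring documents placed first more often.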

Learning Algorithms

NA: an implementation of the naive algorithm that directly trains models with input labels (e.g., clicks).
DLA: an implementation of the Dual Learning Algorithm in Unbiased Learning to Rank with Unbiased Propensity Estimation.
IPW: an implementation of the Inverse Propensity Weighting algorithms in Learning to Rank with Selection Bias in Personal Search and Unbiased Learning-to-Rank with Biased Feedback.
REM: an implementation of the regression-based EM algorithm in Position bias estimation for unbiased learning to rank in personal search.
PD: an implementation of the pairwise debiasing algorithm in Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm.
DBGD: an implementation of the Dueling Bandit Gradient Descent algorithm in Interactively optimizing information retrieval systems as a dueling bandits problem.
MGD: an implementation of the Multileave Gradient Descent algorithm in Multileave Gradient Descent for Fast Online Learning to Rank.
NSGD: an implementation of the Null Space Gradient Descent algorithm in Efficient Exploration of Gradient Space for Online Learning to Rank.
PDGD: an implementation of the Pairwise Differentiable Gradient Descent algorithm in Differentiable unbiased online learning to rank.
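
As a rough illustration of the idea behind inverse propensity weighting, the sketch below computes a softmax cross-entropy loss for one ranked list in which each click is weighted by the inverse of the examination propensity at its position. This is not the code in ULTRA's IPW class; the function name, the choice of a softmax loss, and the plain-Python lists are assumptions made for illustration.

```python
import math

def ipw_softmax_loss(scores, clicks, propensities):
    """Softmax cross-entropy over one list, with clicks weighted by 1 / propensity.

    scores:       model scores, one per displayed document
    clicks:       0/1 click indicators, same order as scores
    propensities: assumed examination probability of each position (> 0)
    """
    # Log of the softmax normalizer, shifted by the max score for stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    loss = 0.0
    for s, c, p in zip(scores, clicks, propensities):
        # A click at a rarely examined position counts for more.
        loss += (c / p) * (log_norm - s)
    return loss
```

With all propensities equal to 1 this reduces to the naive loss on raw clicks; a click at a position with propensity 0.5 contributes twice as much.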

Ranking Models

Linear: a linear ranking model that computes ranking scores with a linear function.
DNN: a neural ranking model that computes ranking scores with a multi-layer perceptron network (with non-linear activation functions).
DLCM: this is an implementation of the Deep Listwise Context Model in Learning a Deep Listwise Context Model for Ranking Refinement (TODO).
GSF: this is an implementation of the Groupwise Scoring Function in Learning Groupwise Multivariate Scoring Functions Using Deep Neural Networks (TODO).
SetRank: this is an implementation of the SetRank model in SetRank: Learning a Permutation-Invariant Ranking Model for Information Retrieval (TODO).

Supported Evaluation Metrics

MRR: the Mean Reciprocal Rank.
ERR: the Expected Reciprocal Rank from Expected reciprocal rank for graded relevance.
ARP: the Average Relevance Position.
NDCG: the Normalized Discounted Cumulative Gain.
DCG: the Discounted Cumulative Gain.
Precision: the Precision.
MAP: the Mean Average Precision.
Ordered_Pair_Accuracy: the percentage of correctly ordered pairs.
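
For readers new to these metrics, here is a minimal sketch of a few of them for a single query. It assumes the common definitions (gain 2^label - 1, log2 position discount, a document counts as relevant when its label is above 0); ULTRA's own implementations in ultra/utils/metrics.py may differ in details such as tie handling, and the function names below are hypothetical.

```python
import math

def dcg_at_k(labels, k):
    """DCG@k for relevance labels listed in ranked order (top result first)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:k]))

def ndcg_at_k(labels, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

def reciprocal_rank(labels, k):
    """1 / rank of the first relevant document in the top k, or 0 if none."""
    for rank, rel in enumerate(labels[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

def ordered_pair_accuracy(labels):
    """Fraction of pairs with different labels that appear in the right order."""
    correct = total = 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] != labels[j]:
                total += 1
                correct += labels[i] > labels[j]
    return correct / total if total else 0.0
```

MRR is then the mean of `reciprocal_rank` over all queries, and the reported NDCG is the mean of `ndcg_at_k` over all queries.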

Click Simulation Example

Create click models for click simulations

python ultra/utils/click_models.py pbm 0.1 1 4 1.0 example/ClickModel

* The output is a json file containing the click model that can be used for click simulation. More details can be found in the code.
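
To give a feel for what a position-based click model (PBM) does, here is a minimal sketch. It reads the four numbers in the command above as negative click probability 0.1, positive click probability 1, maximum relevance grade 4, and position-bias exponent eta 1.0; that reading is an assumption, and so are the function names and the 1/rank examination curve, which may differ from what click_models.py actually uses.

```python
import random

def relevance_click_prob(label, neg_click_prob=0.1, pos_click_prob=1.0, max_label=4):
    """Probability that an examined document with this relevance grade is clicked."""
    gain = (2 ** label - 1) / (2 ** max_label - 1)
    return neg_click_prob + (pos_click_prob - neg_click_prob) * gain

def examination_prob(rank, eta=1.0):
    """Hypothetical position bias: (1 / rank) ** eta, with rank starting at 1."""
    return (1.0 / rank) ** eta

def simulate_clicks(labels, eta=1.0, rng=random):
    """Simulate one click vector for relevance labels given in ranked order."""
    clicks = []
    for rank, label in enumerate(labels, start=1):
        examined = rng.random() < examination_prob(rank, eta)
        relevant = rng.random() < relevance_click_prob(label)
        # In a PBM, a click requires both examination and perceived relevance.
        clicks.append(int(examined and relevant))
    return clicks
```

Calling `simulate_clicks([4, 0, 2])` repeatedly yields noisy click vectors in which the same label is clicked less often the lower it is placed, which is the bias that unbiased learning-to-rank algorithms try to correct.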

(Optional) Estimate examination propensity with result randomization

python ultra/utils/propensity_estimator.py example/ClickModel/pbm_0.1_1.0_4_1.0.json DATA_DIR example/PropensityEstimator/

* The output is a json file containing the estimated examination propensity (used for IPW). DATA_DIR is the directory for the prepared data created by ./libsvm_tools/prepare_exp_data_with_svmrank.py. More details can be found in the code.
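
The idea behind estimating propensity with result randomization can be sketched as follows. This is an illustration only, not the logic of propensity_estimator.py: it assumes click logs collected while result order was shuffled, so that average relevance is the same at every position, and it normalizes by the first position. The function name and input format are hypothetical.

```python
def estimate_propensity(click_logs):
    """Estimate relative examination propensity per position.

    click_logs: list of 0/1 click vectors of equal length, gathered while
    results were shown in random order (an assumption of this sketch).
    """
    list_size = len(click_logs[0])
    click_counts = [0] * list_size
    for clicks in click_logs:
        for pos, clicked in enumerate(clicks):
            click_counts[pos] += clicked
    if click_counts[0] == 0:
        raise ValueError("no clicks at the first position; cannot normalize")
    # With relevance averaged out by randomization, click-count ratios
    # reflect examination ratios relative to position 1.
    return [count / click_counts[0] for count in click_counts]
```

An IPW-style learner would then weight a click at position k by the inverse of the k-th estimated value.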

Citation

If you use ULTRA in your research, please use the following BibTeX entries.

@misc{tran2021ultra,
      title={ULTRA: An Unbiased Learning To Rank Algorithm Toolbox}, 
      author={Anh Tran and Tao Yang and Qingyao Ai},
      year={2021},
      eprint={2108.05073},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}

@article{10.1145/3439861,
author = {Ai, Qingyao and Yang, Tao and Wang, Huazheng and Mao, Jiaxin},
title = {Unbiased Learning to Rank: Online or Offline?},
year = {2021},
issue_date = {February 2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {39},
number = {2},
issn = {1046-8188},
url = {https://doi.org/10.1145/3439861},
doi = {10.1145/3439861},
journal = {ACM Trans. Inf. Syst.},
month = feb,
articleno = {21},
numpages = {29},
keywords = {unbiased learning, online learning, Learning to rank}
}

Development Team

Qingyao Ai

Core Dev
ASST PROF, Univ. of Utah

Anh Tran

Core Dev
Ph.D., Univ. of Utah

Tao Yang

Core Dev
Ph.D., Univ. of Utah

Huazheng Wang

Core Dev
Ph.D., Univ. of Virginia

Jiaxin Mao

Core Dev
ASST PROF, Renmin Univ.

Contribution

Please read the Contributing Guide before creating a pull request.

Project Organizers

Qingyao Ai
- School of Computing, University of Utah
- Homepage

License

Apache-2.0

Comments

KeyError: 'selection_bias_cutoff' in example/toy/offline_exp_pipeline.sh

Hi. Thanks for sharing the great project. I'm new to this and tried the first example script. Training seems fine but testing fails as follows. Any idea?

Ubuntu 20.04
Python 3.8.10

Thanks

(venv) me@DESKTOP-FCVGD1U:~/workspace/github/ULTRA_pytorch$ bash example/toy/offline_exp_pipeline.sh
Reading data in ./example/toy/data/
Read data from ./example/toy/data//train in libsvm format.
Reading finish: 0 lines
Remove 7 invalid queries.
Data reading finish!
Finished reading 20 queries with lists.
Read data from ./example/toy/data//valid in libsvm format.
Reading finish: 0 lines
Remove 3 invalid queries.
Data reading finish!
Finished reading 6 queries with lists.
Train Rank list size 9
Valid Rank list size 9
Users can only see the top 9 documents for each query in training.
Creating model...
Build DLA

Loss Function is softmax_loss
Created model with fresh parameters.
Create simluated clicks feed

Create direct label feed with list size 9 with feature size 136
max_train_iter:  10
 Loss 4.346205 at Global Step 0: 
 Loss 6.528031 at Global Step 1: 
 Loss 8.008065 at Global Step 2: 
 Loss 4.675643 at Global Step 3: 
 Loss 8.523232 at Global Step 4: 
 Loss 4.580126 at Global Step 5: 
 Loss 5.745630 at Global Step 6: 
 Loss 4.260286 at Global Step 7: 
 Loss 4.781512 at Global Step 8: 
 Loss 4.347456 at Global Step 9: 
 Loss 4.397479 at Global Step 10: 
 Loss 4.239964 at Global Step 11: 
 Loss 4.159795 at Global Step 12: 
 Loss 5.694969 at Global Step 13: 
 Loss 4.179561 at Global Step 14: 
 Loss 4.300393 at Global Step 15: 
 Loss 4.171987 at Global Step 16: 
 Loss 4.124520 at Global Step 17: 
 Loss 5.438332 at Global Step 18: 
 Loss 4.089553 at Global Step 19: 
 Loss 5.418541 at Global Step 20: 
 Loss 4.175595 at Global Step 21: 
 Loss 4.735269 at Global Step 22: 
 Loss 4.107638 at Global Step 23: 
 Loss 4.677073 at Global Step 24: 
 Loss 4.106503 at Global Step 25: 
 Loss 4.410158 at Global Step 26: 
 Loss 4.117153 at Global Step 27: 
 Loss 4.603172 at Global Step 28: 
 Loss 4.095118 at Global Step 29: 
 Loss 4.604589 at Global Step 30: 
 Loss 4.101539 at Global Step 31: 
 Loss 4.362749 at Global Step 32: 
 Loss 4.077655 at Global Step 33: 
 Loss 4.241923 at Global Step 34: 
 Loss 4.122931 at Global Step 35: 
 Loss 4.558557 at Global Step 36: 
 Loss 4.125799 at Global Step 37: 
 Loss 4.322569 at Global Step 38: 
 Loss 4.135751 at Global Step 39: 
 Loss 4.521419 at Global Step 40: 
 Loss 4.149001 at Global Step 41: 
 Loss 4.357186 at Global Step 42: 
 Loss 4.099861 at Global Step 43: 
 Loss 4.441446 at Global Step 44: 
 Loss 4.093635 at Global Step 45: 
 Loss 4.306568 at Global Step 46: 
 Loss 4.094622 at Global Step 47: 
 Loss 4.529568 at Global Step 48: 
 Loss 4.184571 at Global Step 49: 
global step 50 learning rate 0.0500 step-time 0.13 loss 4.6094
mrr_3 tensor(0.5972, device='cuda:0')
mrr_5 tensor(0.5972, device='cuda:0')
mrr_10 tensor(0.5972, device='cuda:0')
ndcg_3 tensor(0.4604, device='cuda:0')
ndcg_5 tensor(0.5386, device='cuda:0')
ndcg_10 tensor(0.6191, device='cuda:0')
Save model, valid ndcg_10:0.619
current_step:  50
Reading data in ./example/toy/data/
Read data from ./example/toy/data//test in libsvm format.
Reading finish: 0 lines
Remove 3 invalid queries.
Data reading finish!
Finished reading 6 queries with lists.
Build DLA

Traceback (most recent call last):
  File "main.py", line 291, in <module>
    main(argv)
  File "main.py", line 284, in main
    test(exp_settings)
  File "main.py", line 236, in test
    model = create_model(exp_settings, test_set)
  File "main.py", line 70, in create_model
    model = ultra.utils.find_class(exp_settings['learning_algorithm'])(data_set, exp_settings)
  File "/home/me/workspace/github/ULTRA_pytorch/ultra/learning_algorithm/dla.py", line 97, in __init__
    self.rank_list_size = exp_settings['selection_bias_cutoff']
KeyError: 'selection_bias_cutoff'

opened by hideojoho 4

Confusion for example in main.py

Hi,

Thanks for open-sourcing this amazing work.

I have a question about the current example in main.py, specifically the following line:

https://github.com/ULTR-Community/ULTRA_pytorch/blob/ec4fe329e4239b588a940cb4bcdd6a321aade679/main.py#L92

If I understand correctly, should this be ultra.utils.find_class(exp_settings['valid_input_feed']).preprocess_data(valid_set, exp_settings['valid_input_hparams'], exp_settings)?

For the validation set we want to evaluate against the human annotations for LTR, so the key should be valid_input_feed instead of train_input_feed, and for the same reason the key for the second parameter should change as well.

In addition, should the handling of test_set be altered correspondingly?

Thanks for your time and consideration.

opened by rowedenny 2

SetRank w/ DLA learning error

I'm trying to reproduce the algorithm on my local machine, with a different IDE. I have these settings:

exp_settings = {
    "train_input_feed":"ultra.input_layer.ClickSimulationFeed",
    "train_input_hparams":"",
    "valid_input_feed":"ultra.input_layer.DirectLabelFeed",
    "valid_input_hparams":"",
    "test_input_feed":"ultra.input_layer.DirectLabelFeed",
    "test_input_hparams":"",
    "ranking_model":"ultra.ranking_model.SetRank",
    "ranking_model_hparams":" ",
    "learning_algorithm": "ultra.learning_algorithm.dla.DLA",
    "learning_algorithm_hparams":"",
    "metrics": [
        "mrr", "ndcg"
    ],
    "metrics_topn" : [3,5,10],
    "objective_metric": "ndcg_10",
    "max_candidate_num": 10 #added after error in DLA default setting setrank*
}

This is the error I get when I execute model = create_model(exp_settings, train_set):

Reading finish: 0 lines
Remove 7 invalid queries.
Data reading finish!
Finished reading 20 queries with lists.
Read data from /ULTRA/example/toy/data//valid in libsvm format.
Reading finish: 0 lines
Remove 3 invalid queries.
Data reading finish!
Finished reading 6 queries with lists.
Train Rank list size 9
Valid Rank list size 9
Users can only see the top 9 documents for each query in training.
Creating model...
Build DLA
Traceback (most recent call last):

  File "/tmp/ipykernel_1350348/2504700013.py", line 151, in <module>
    model = create_model_main(exp_settings, train_set)

  File "/tmp/ipykernel_1350348/2504700013.py", line 99, in create_model_main
    model = ultra.utils.find_class(exp_settings['learning_algorithm'])(data_set, exp_settings)

  File "/ULTRA/ultra/learning_algorithm/dla.py", line 102, in __init__
    self.rank_model = self.create_model(self.feature_size)

  File "/ULTRA/ultra/learning_algorithm/base_algorithm.py", line 164, in create_model
    model = utils.find_class(

TypeError: 'module' object is not callable

There seems to be something wrong with the function or the nested parameters here:

def create_model(self, feature_size):
    if not hasattr(self, "model"):
        model = utils.find_class(
            self.exp_settings['ranking_model'])(
            self.exp_settings['ranking_model_hparams'], feature_size)
    return model

Debugging leads to the SetRank model inside the DLA learning algorithm, which gets stuck at this line of code:

self.exp_settings['ranking_model_hparams'], feature_size)
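
A "'module' object is not callable" error generally means a dotted path resolved to a module rather than a class. The sketch below shows that mechanism with the standard library only. It is not ULTRA's `find_class`, the helper names are hypothetical, and whether the ranking_model string in the settings above needs a different path is an assumption that the thread does not confirm.

```python
import importlib

def resolve_dotted_path(path):
    """Import and return the object named by 'package.module.attr'."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

def resolve_class(path):
    """Resolve a dotted path and fail early if the result cannot be called."""
    obj = resolve_dotted_path(path)
    if not callable(obj):
        # Calling a module raises TypeError: 'module' object is not callable,
        # so report the problem before the caller tries to instantiate it.
        raise TypeError(
            f"{path!r} resolved to a {type(obj).__name__}, not a class")
    return obj
```

For example, `resolve_class("collections.OrderedDict")` returns a class, while `resolve_class("os.path")` raises because `os.path` is a module. Printing the resolved object for the configured ranking_model string is one way to check which case applies.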

opened by sfquest 1

Memory Leakage in DLA implementation

https://github.com/ULTR-Community/ULTRA_pytorch/blob/ec4fe329e4239b588a940cb4bcdd6a321aade679/ultra/learning_algorithm/dla.py#L266

Refer to https://discuss.pytorch.org/t/cpu-ram-usage-increasing-for-every-epoch/24475/9

When I run DLA for 10k steps, the CPU memory gets used up. A simple fix is to change it to self.loss.item().

A similar issue also occurs in regression_EM, pdgd, nsgd, etc.

opened by rowedenny 0

Metrics computation issue

Hi, thanks for sharing this work.

I re-ran DLA on the Yahoo! data, with data preprocessing following example/Yahoo/offline_exp_pipeline.sh. However, I find that the following metrics trigger errors:

average_relevance_position fails because tensors are on different devices: the following tensor is on the CPU instead of the GPU. https://github.com/ULTR-Community/ULTRA_pytorch/blob/ec4fe329e4239b588a940cb4bcdd6a321aade679/ultra/utils/metrics.py#L363

precision is not callable. Since it is computed without repeat, it triggers TypeError: zip argument #2 must support iteration

opened by rowedenny 0

Is this a typo for variable --data_dir?

Thanks for this amazing work.

I am trying to follow the experiments on the Yahoo! Learning-to-Rank dataset and find that the path $Data_path/tmp_data/ is not specified earlier.

I assume it should be the folder containing the normalized train/test/valid data, so should this be Data_path/normalized/ instead of Data_path/tmp_data/?

https://github.com/ULTR-Community/ULTRA_pytorch/blob/d2d98b859ff49ad20335d0287d4de8906f9708b9/example/Yahoo/offline_exp_pipeline.sh#L82

And then the model will load train/valid/test, respectively for unbiased learning-to-rank.

Oops, sorry, I missed that this is the output of ./libsvm_tools/prepare_exp_data_with_svmrank.py.

Issue closed.

opened by rowedenny 0

Loss not reducing, high validation and test metric values

I tried to run the code with the DLA algorithm on the Yahoo dataset; the output is attached. I am not sure what to make of the following: the training loss stays almost constant at about 4 (with the rank loss and the exam loss each about 2), while the validation and test metric values are above 0.9. I checked the parameter values of the two models, and they are actually updating. The loss just fluctuates between 3.9 and 4.5. Is there something I should change in the hyperparameters? I have kept the default learning rate of 0.05 and selection_bias_cutoff = 10. This concerns the PyTorch implementation of the code.

opened by parth-shettiwar 10

Bug in regression_EM.py?

I'm a little rusty at this stuff, but I believe there might be a bug in the implementation of the position bias EM algorithm here for p_r1 = reshaped_train_labels + (1 - reshaped_train_labels) * p_e0_r1_c0. In particular, I think p_r1 for unclicked documents (the 1 - reshaped_train_labels case) is incorrect.

Using notation from the paper, page 614, we want to compute $P(R=1|C=0, q, k, d)$ by marginalizing $P(R=1, E|C=0, q, k, d)$ over $E$:

$$
\begin{aligned}
P(R=1|C=0, q, k, d) =& \underbrace{P(R=1, E=1|C=0, q, k, d)}_{=0} P(E=1|C=0, q, k, d) \\
& + P(R=1, E=0|C=0, q, k, d) P(E=0|C=0, q, k, d) \\
=& P(R=1, E=0|C=0, q, k, d) (1- P(E=1|C=0, q, k, d))
\end{aligned}
$$

The implementation just has $P(R=1|C=0, q, k, d) = P(R=1, E=0|C=0, q, k, d)$. That is, it's missing the $(1- P(E=1|C=0, q, k, d))$ term.

Am I missing something or is this a bug? Unfortunately, the paper just says "From this, we can compute the marginals P(E = 1|c, q, d, k) and P(R = 1|c, q, d, k)" without additional details and I can't find any other implementations of this paper to see how they're working. Like I said, I'm rusty at this so I may be missing something obvious.

Note, I think the same issue comes up here with $P(E=1|C=0, q,k, d)$, which is not explicitly named in the code like p_r1, but is computed as reshaped_train_labels + (1 - reshaped_train_labels) * p_e1_r0_c0 if I'm reading things correctly.

I think it should instead be

$$ P(E=1|C=0, q, k, d) = P(R=0, E=1|C=0, q, k, d) (1- P(R=1|C=0, q, k, d)) $$

Of course, these depend on each other, but with these two equations we can solve for $P(E=1|C=0, q, k, d)$ and $P(R=1|C=0, q, k, d)$ in terms of $P(R, E|C=0, q, k, d)$ which we can compute from gamma and self.propensity.
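
For anyone who wants to check these quantities numerically, here is a minimal sketch that enumerates the joint posterior of examination and relevance for an unclicked document. It assumes the position-based model in which a click happens exactly when a document is both examined and relevant (C = E * R), with E and R independent a priori. The names `theta` (examination propensity) and `gamma` (relevance probability) are hypothetical, and this is not the code in regression_EM.py.

```python
from itertools import product

def posterior_given_no_click(theta, gamma):
    """Joint posterior P(E=e, R=r | C=0), assuming C = E * R."""
    joint = {}
    for e, r in product((0, 1), repeat=2):
        p_e = theta if e else 1 - theta
        p_r = gamma if r else 1 - gamma
        # (E=1, R=1) always produces a click, so it is impossible given C=0.
        joint[(e, r)] = 0.0 if (e and r) else p_e * p_r
    norm = sum(joint.values())  # this is P(C=0) = 1 - theta * gamma
    return {outcome: p / norm for outcome, p in joint.items()}

def marginals_given_no_click(theta, gamma):
    """Return (P(E=1 | C=0), P(R=1 | C=0)) by summing the joint posterior."""
    post = posterior_given_no_click(theta, gamma)
    p_e1 = post[(1, 0)] + post[(1, 1)]  # sum over R
    p_r1 = post[(0, 1)] + post[(1, 1)]  # sum over E
    return p_e1, p_r1
```

Under these assumptions the marginal is obtained by adding the joint terms over the other variable, and because the (E=1, R=1) term is zero given no click, P(R=1|C=0) comes out equal to P(R=1, E=0|C=0) = (1 - theta) * gamma / (1 - theta * gamma).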

opened by lendle 0

Usage from Python

Hi,

Thanks for releasing this interesting platform.

I'm interested in how ULTRA could be used from another Python library - e.g. say I wanted to use an ULTRA model to re-rank results in PyTerrier.

With sklearn, xgboost, fastrank etc, I can give it an array of feature values for a given document, and it will return a score - see https://pyterrier.readthedocs.io/en/latest/ltr.html#learning for our integration.

Does ULTRA have a similar API?

opened by cmacdonald 5
