Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

H.R. Oosterhuis

Last update: Nov 29, 2022

Related tags

Deep Learning 2021-SIGIR-plackett-luce

Overview

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

This repository contains the code used for the experiments in "Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness" published at SIGIR 2021 (preprint available).

Citation

If you use this code to produce results for your scientific publication, or if you share a copy or fork, please refer to our SIGIR 2021 paper:

@inproceedings{oosterhuis2021plrank,
  Author = {Oosterhuis, Harrie},
  Booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR`21)},
  Organization = {ACM},
  Title = {Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness},
  Year = {2021}
}

License

The contents of this repository are licensed under the MIT license. If you modify its contents in any way, please link back to this repository.

Usage

This code makes use of Python 3, the numpy and the tensorflow packages, make sure they are installed.

A file is required that explains the location and details of the LTR datasets available on the system, for the Yahoo! Webscope, MSLR-Web30k, and Istella datasets an example file is available. Copy the file:

cp example_datasets_info.txt local_dataset_info.txt

Open this copy and edit the paths to the folders where the train/test/vali files are placed.

Here are some command-line examples that illustrate how the results in the paper can be replicated. First create a folder to store the resulting models:

mkdir local_output

To optimize NDCG use run.py with the --loss flag to indicate the loss to use (PL_rank_1/PL_rank_2/lambdaloss/pairwise/policygradient/placementpolicygradient); --cutoff indicates the top-k that is being optimized, e.g. 5 for NDCG@5; --num_samples the number of samples to use per gradient estimation (with dynamic for the dynamic strategy); --dataset indicates the dataset name, e.g. Webscope_C14_Set1. The following command optimizes NDCG@5 with PL-Rank-2 and the dynamic sampling strategy on the Yahoo! dataset:

python3 run.py local_output/yahoo_ndcg5_dynamic_plrank2.txt --num_samples dynamic --loss PL_rank_2 --cutoff 5 --dataset Webscope_C14_Set1

To optimize the disparity metric for exposure fairness use fairrun.py this has the additional flag --num_exposure_samples for the number of samples to use to estimate exposure (this must always be a greater number than --num_samples). The following command optimizes disparity with PL-Rank-2 and the dynamic sampling strategy on the Yahoo! dataset with 1000 samples for estimating exposure:

python3 fairrun.py local_output/yahoo_fairness_dynamic_plrank2.txt --num_samples dynamic --loss PL_rank_2 --cutoff 5 --num_exposure_samples 1000 --dataset Webscope_C14_Set1

Comments

Confusion about data set source file format

I read the code details and noticed that the code directly read .json files, but the official download link of these data sets are all .txt files. Politely, I wonder if you have cleaned source data or changed the format by yourself? Or it's just because I didn't find the right download link for data sets?

opened by AndyZZt 1
confusion about comment for ndcg

hi, I'm confused aboud the code below https://github.com/HarrieO/2021-SIGIR-plackett-luce/blob/8a59cdfe565e59e6f122ddd6dc807ee156d25b8c/run.py#L132

it says we should uncomment it for NDCG, but it says we could use command

python3run.py local_output/yahoo_ndcg5_dynamic_plrank2.txt --num_samples dynamic --loss PL_rank_2 --cutoff 5 --datasetWebscope_C14_Set1

to optimize NDCG@5 withoud any change to run.py

Is it by design or there is a bug here?

opened by we1559 1

Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Racing Line Optimization with PSO This repository contains a racing line optimization algorithm in python that uses Particle Swarm Optimization. Requi

6 Dec 14, 2022

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

8.4k Jan 1, 2023

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

6.9k Jan 4, 2023

5.7k Feb 12, 2021

XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale

XtremeDistilTransformers for Distilling Massive Multilingual Neural Networks ACL 2020 Microsoft Research [Paper] [Video] Releasing [XtremeDistilTransf

125 Jan 4, 2023

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

14.5k Jan 8, 2023

Learning embeddings for classification, retrieval and ranking.

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

Related tags

Overview

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

Citation

License

Usage

You might also like...

Racing line optimization algorithm in python that uses Particle Swarm Optimization.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Learning embeddings for classification, retrieval and ranking.

Fast, differentiable sorting and ranking in PyTorch

Code and data of the ACL 2021 paper: Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision

Comments

Confusion about data set source file format

confusion about comment for ndcg

Owner

H.R. Oosterhuis

An efficient and effective learning to rank algorithm by mining information across ranking candidates. This repository contains the tensorflow implementation of SERank model. The code is developed based on TF-Ranking.

The coda and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study.

Code for reproducing our analysis in the paper titled: Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

ICLR 2021, Fair Mixup: Fairness via Interpolation

Ranking Models in Unlabeled New Environments （iccv21）

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

This repository contains the code and models necessary to replicate the results of paper: How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective