Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Haruka Kiyohara

Last update: Dec 7, 2022

Related tags

Overview

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

About

This repository contains the code to replicate the synthetic experiment conducted in the paper "Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model" by Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, and Yasuo Yamamoto, which has been accepted to WSDM2022.

If you find this code useful in your research then please site:

@inproceedings{kiyohara2022doubly,
  author = {Kiyohara, Haruka and Saito, Yuta and Matsuhiro, Tatsuya and Narita, Yusuke and Shimizu, Nobuyuki and Yamamoto, Yasuo},
  title = {Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model},
  booktitle = {Proceedings of the 15th International Conference on Web Search and Data Mining},
  pages = {xxx--xxx},
  year = {2022},
}

Dependencies

This repository supports Python 3.7 or newer.

numpy==1.20.0
pandas==1.2.1
scikit-learn==0.24.1
matplotlib==3.4.3
obp==0.5.2
hydra-core==1.0.6

Note that the proposed Cascade-DR estimator is implemented in Open Bandit Pipeline (obp.ope.SlateCascadeDoublyRobust).

Running the code

To conduct the synthetic experiment, run the following commands.

(i) run OPE simulations with varying data size, with the fixed slate size.

python src/main.py setting=n_rounds

(ii), (iii) run OPE simulations with varying slate size and policy similarities, with the fixed data size.

python src/main.py

Once the code is finished executing, you can find the results (squared_error.csv, relative_ee.csv, configuration.csv) in the ./logs/ directory. Lower value is better for squared error and relative estimation error (relative-ee).

Visualize the results

To visualize the results, run the following commands. Make sure that you have executed the above two experiments (by running python src/main.py and python src/main.py setting=default) before visualizing the results.

python src/visualize.py

Then, you will find the following figures (slate size (standard/cascade/independent).png, evaluation policy similarity (standard/cascade/independent).png, data size (standard/cascade/independent).png) in the ./logs/ directory. Lower value is better for the relative-MSE (y-axis).

reward structure	Standard	Cascade	Independent
varying data size (n)
varying slate size (L)
varying evaluation policy similarity (λ)

You might also like...

PyTorch evaluation code for Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.

Out-of-distribution Generalization Investigation on Vision Transformers This repository contains PyTorch evaluation code for Delving Deep into the Gen

72 Dec 13, 2022

Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

Pytorch Pedestrian Attribute Recognition: A strong PyTorch baseline of pedestrian attribute recognition and multi-label classification.

79 Dec 18, 2022

MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

offline-MBPO This repository contains the code of a version of model-based RL algorithm MBPO, which is modified to perform in offline RL settings Pape

1 Oct 24, 2021

PRTR: Pose Recognition with Cascade Transformers

PRTR: Pose Recognition with Cascade Transformers Introduction This repository is the official implementation for Pose Recognition with Cascade Transfo

133 Dec 30, 2022

Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The original code is written in keras.

CasRel-pytorch-reimplement Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The o

170 Dec 1, 2022

一个目标检测的通用框架(不需要cuda编译)，支持Yolo全系列(v2~v5)、EfficientDet、RetinaNet、Cascade-RCNN等SOTA网络。

Comments

Fix README
I'm gonna update obp to v0.5.2 when Cascade-DR is merged

hydra-core?

you may also need matplotlib in requirements

you can add the link to our personal websites when you list our names, if you want

Please add a link to the paper and its bibtex when available (like the following)

=== If you find this code useful in your research then please cite:

@inproceedings{saito2020doubly, author = {Saito, Yuta}, title = {Doubly Robust Estimator for Ranking Metrics with Post-Click Conversions}, year = {2020}, booktitle = {Fourteenth ACM Conference on Recommender Systems}, pages = {92–100}, location = {Virtual Event, Brazil}, series = {RecSys '20} } ```
opened by usaito 0

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Related tags

Overview

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

About

Dependencies

Running the code

Visualize the results

You might also like...

PyTorch evaluation code for Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.

Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

PRTR: Pose Recognition with Cascade Transformers

Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The original code is written in keras.

一个目标检测的通用框架(不需要cuda编译)，支持Yolo全系列(v2~v5)、EfficientDet、RetinaNet、Cascade-RCNN等SOTA网络。

3D cascade RCNN for object detection on point cloud

Real-CUGAN - Real Cascade U-Nets for Anime Image Super Resolution

Neural Dynamic Policies for End-to-End Sensorimotor Learning

Comments

Fix README

Owner

Haruka Kiyohara

CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching（CVPR2021）

Our CIKM21 Paper "Incorporating Query Reformulating Behavior into Web Search Evaluation"

DRLib：A concise deep reinforcement learning library, integrating HER and PER for almost off policy RL algos.

Pytorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.

Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

Automatic labeling, conversion of different data set formats, sample size statistics, model cascade

Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

ReConsider is a re-ranking model that re-ranks the top-K (passage, answer-span) predictions of an Open-Domain QA Model like DPR (Karpukhin et al., 2020).