Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

About

This repository contains the code to replicate the synthetic experiment conducted in the paper "Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model" by Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, and Yasuo Yamamoto, which has been accepted to WSDM 2022.

If you find this code useful in your research, please cite:

@inproceedings{kiyohara2022doubly,
  author = {Kiyohara, Haruka and Saito, Yuta and Matsuhiro, Tatsuya and Narita, Yusuke and Shimizu, Nobuyuki and Yamamoto, Yasuo},
  title = {Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model},
  booktitle = {Proceedings of the 15th International Conference on Web Search and Data Mining},
  pages = {xxx--xxx},
  year = {2022},
}

Dependencies

This repository supports Python 3.7 or newer.

  • numpy==1.20.0
  • pandas==1.2.1
  • scikit-learn==0.24.1
  • matplotlib==3.4.3
  • obp==0.5.2
  • hydra-core==1.0.6

Note that the proposed Cascade-DR estimator is implemented in Open Bandit Pipeline (obp.ope.SlateCascadeDoublyRobust).

Running the code

To conduct the synthetic experiment, run the following commands.

(i) Run OPE simulations with varying data size and a fixed slate size.

python src/main.py setting=n_rounds

(ii), (iii) Run OPE simulations with varying slate size and varying evaluation policy similarity, with a fixed data size.

python src/main.py

Once the code has finished executing, you can find the results (squared_error.csv, relative_ee.csv, configuration.csv) in the ./logs/ directory. Lower values are better for both the squared error and the relative estimation error (relative-ee).
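For readers unfamiliar with the two metrics, the following is a minimal sketch of how they are commonly defined in the off-policy evaluation literature. It assumes the squared error is the squared difference between the estimated and the ground-truth policy value, and that relative-ee is the absolute difference divided by the ground-truth value. The function names and the numbers are hypothetical and chosen for illustration only; the exact computation in src/main.py may differ.

```python
def squared_error(estimated_value, true_value):
    """Squared difference between an OPE estimate and the ground-truth policy value."""
    return (estimated_value - true_value) ** 2


def relative_ee(estimated_value, true_value):
    """Absolute estimation error, normalized by the ground-truth policy value.

    Assumes true_value is non-zero.
    """
    return abs((estimated_value - true_value) / true_value)


if __name__ == "__main__":
    # Hypothetical values, not results from the paper.
    true_value = 1.0
    estimated_value = 1.5
    print("squared error:", squared_error(estimated_value, true_value))
    print("relative-ee:", relative_ee(estimated_value, true_value))
```

Both quantities are zero for a perfect estimate and grow as the estimate moves away from the ground truth, which is why lower is better.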

Visualize the results

To visualize the results, run the following command. Make sure that you have executed the above two experiments (by running python src/main.py setting=n_rounds and python src/main.py) before visualizing the results.

python src/visualize.py

You will then find the following figures (slate size (standard/cascade/independent).png, evaluation policy similarity (standard/cascade/independent).png, data size (standard/cascade/independent).png) in the ./logs/ directory. Lower values are better for the relative-MSE (y-axis).

[Figure grid: columns are the reward structures (Standard, Cascade, Independent); rows are varying data size (n), varying slate size (L), and varying evaluation policy similarity (λ).]
Owner
Haruka Kiyohara
Tokyo Tech undergrads / interested in (offline) reinforcement learning and off-policy evaluation / intern at negocia, Hanjuku-kaso, Yahoo! Japan Research