The implementation of the submitted paper "Deep Multi-Behaviors Graph Network for Voucher Redemption Rate Prediction" in SIGKDD 2021 Applied Data Science Track.

Overview

DMBGN: Deep Multi-Behaviors Graph Networks for Voucher Redemption Rate Prediction

The implementation of the accepted paper "Deep Multi-Behaviors Graph Networks for Voucher Redemption Rate Prediction" in SIGKDD 2021 Applied Data Science Track.

DMBGN utilizes a User-Behavior Voucher Graph (UVG) to extract complex user-voucher-item relationship and the attention mechanism to capture users' long-term voucher redemption preference. Experiments shows that DMBGN achieves 10%-16% relative AUC improvement over Deep Neural Networks (DNN), and 2% to 4% AUC improvement over Deep Interest Network (DIN).

Benchmark Dataset

A randomly desensitized sampled dataset from one of the large-scaled production dataset from from Lazada (Alibaba Group) is included. The dataset contains three dataframes corresponding users' voucher collection logs, related user behavior logs and related item features, a detailed description can be found in ./data/README.md file.

We hope this dataset could help to facilitate research in the voucher redemption rate prediction field.

DMBGN Performance

Compared Models:

  • LR: Logistic Regression [1], a shallow model.
  • GBDT: Gradient Boosting Decision Tree [2], a tree-based non deep-learning model.
  • DNN: Deep Neural Networks.
  • WDL: Wide and Deep model [3], a widely accepted model in real industrial applications with an additional linear model besides the deep model compared to DNN.
  • DIN: Deep Interest Network [4], an attention-based model in recommendation systems that has been proven successful in Alibaba.

The experimental results on the public sample dataset are as follows:

Model AUC RelaImpr(DNN) RelaImpr(DIN) Logloss
LR 0.7377 -9.22% -14.28% 0.3897
xgBoost 0.7759 5.40% -0.48% 0.3640
DNN 0.7618 0.00% -5.57% 0.3775
WDL 0.7716 3.73% -2.05% 0.3717
DIN 0.7773 5.90% 0.00% 0.3688
DMBGN_AvgPooling 0.7789 6.54% 0.61% 0.3684
DMBGN_Pretrained 0.7804 7.11% 1.14% 0.3680
DMBGN 0.7885 10.20% 4.06% 0.3616

Note that this dataset is a random sample from dataset Region-C and the performance is different as in the submitted paper due to the smaller sample size (especially xgBoost). However, the conclusion from the experiment results is consistent with the submitted paper, where DMBGN achieves 10.20% relative AUC improvement over DNN and 4.6% uplift over DIN.

image info

How To Use

All experiment codes are organized into the DMBGN_SIGKDD21-release.ipynb jupyter notebook including corresponding running logs, detail code implementation of each model (LR, GBDT, DNN, WDL, DIN, DMBGN) can be found in ./models folder.

To run the experiments, simply start a jupyter notebook and run all code cells in the DMBGN_SIGKDD21-release.ipynb file and check the output logs. Alternatively, you can refer to the existing log outputs in the notebook file. (If you encounter "Sorry, something went wrong. Roload?" error message, just click Reload and the notebook will show.)

To use the DMBGN model, please refer to the code implementation in ./models/DMBGN.py.

Minimum Requirement

python: 3.7.1
numpy: 1.19.5
pandas 1.2.1
pandasql 0.7.3
torch: 1.7.1
torch_geometric: 1.6.3
torch: 1.7.1
torch-cluster: 1.5.8
torch-geometric: 1.6.3
torch-scatter: 2.0.5
torch-sparse: 0.6.8
torch-spline-conv: 1.2.0
torchaudio: 0.7.2
torchvision: 0.8.2
deepctr-torch: 0.2.3
pickle: 4.0

What To Do

  • We are currently deploying DMBGN model online for Lazada voucher related business, the online A/B testing performance will be reported soon.
  • More detailed code comments are being added.

Acknowledgment

Our code implementation is developed based on the Deep Interest Network (DIN) codes from the DeepCTR package, with modification to fit DMBGN model architecture and multi-GPU usage.

We thanks the anonymous reviewers for their time and feedback.

Reference

  • [1] H Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner,Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, et al.2013. Ad click prediction: a view from the trenches. InProceedings of the 19thACM SIGKDD international conference on Knowledge discovery and data mining.1222–1230.
  • [2] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma,Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boostingdecision tree.Advances in neural information processing systems30 (2017), 3146–3154.
  • [3] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra,Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, RohanAnil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah.2016. Wide & Deep Learning for Recommender Systems.CoRRabs/1606.07792(2016). arXiv:1606.07792 http://arxiv.org/abs/1606.07792 .
  • [4] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, YanghuiYan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-throughrate prediction. InProceedings of the 24th ACM SIGKDD International Conferenceon Knowledge Discovery & Data Mining. 1059–1068.
You might also like...
Collects all accepted (partial and full scored) codes submitted within the given timeframe and saves them locally for plagiarism check.
Collects all accepted (partial and full scored) codes submitted within the given timeframe and saves them locally for plagiarism check.

Collects all accepted (partial and full scored) codes submitted within the given timeframe of any contest.

Handy Tool to check the availability of onion site and to extract the title of submitted onion links.

This tool helps is to quickly investigate a huge set of onion sites based by checking its availability which helps to filter out the inactive sites and collect the site title that might helps us to categories what site we are handling.

Official PyTorch implementation of the paper
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.

Multi-Modal Self-Supervision using GDT and StiCa This is an official pytorch implementation of papers: Multi-modal Self-Supervision from Generalized D

The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)
The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)

nvdiffmodeling [origin_code] Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Autom

Deep Learning applied to Integral data analysis

DeepIntegralCompton Deep Learning applied to Integral data analysis Module installation Move to the root directory of the project and execute : pip in

Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).
Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma This repo provi

Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.
Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.

Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a
Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium

Data science, Data manipulation and Machine learning package.
Data science, Data manipulation and Machine learning package.

duality Data science, Data manipulation and Machine learning package. Use permitted according to the terms of use and conditions set by the attached l

Data Version Control or DVC is an open-source tool for data science and machine learning projects
Data Version Control or DVC is an open-source tool for data science and machine learning projects

Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials
Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Explore-bikeshare-data - GitHub project as part of the Programming for Data Science with Python Nanodegree from Udacity

Date created February 10, 2022 Project Title Explore US Bikeshare Data Descripti

Tensorforce: a TensorFlow library for applied reinforcement learning

Tensorforce: a TensorFlow library for applied reinforcement learning Introduction Tensorforce is an open-source deep reinforcement learning framework,

Toy example of an applied ML pipeline for me to experiment with MLOps tools.

Toy Machine Learning Pipeline Table of Contents About Getting Started ML task description and evaluation procedure Dataset description Repository stru

Differentiable rasterization applied to 3D model simplification tasks
Differentiable rasterization applied to 3D model simplification tasks

nvdiffmodeling Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Automatic 3D Model

skweak: A software toolkit for weak supervision applied to NLP tasks
skweak: A software toolkit for weak supervision applied to NLP tasks

Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels without pre-existing datasets. The only available option is often to collect and annotate texts by hand, which is expensive and time-consuming.

Comments
  • 关于log_df和session_df时间排序rk的问题

    关于log_df和session_df时间排序rk的问题

    1. log_df时间排序问题: 代码:q1 = "select *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY voucher_collect_time ASC) as rk from {logdf}".format(logdf="log_df") log_df = ps.sqldf(q1, locals()) 使用的是voucher_collect_time的字符串排序,而非数字排序
    2. session_df中rk和action_time排序有差异 如选session_id=6777_314时: 983372,1,6777_314,314,199,30,99368,55161,order,aft,1,100722,4113,101441,1.0 983373,1,6777_314,314,199,30,99368,105402,order,aft,2,106784,6966,99361,4.0 983374,1,6777_314,314,199,30,99368,60529,order,aft,3,106806,4117,56645,3.0 983375,1,6777_314,314,199,30,99368,124268,order,aft,4,106807,8989,85140,2.0 983376,1,6777_314,314,199,30,99368,149035,order,aft,5,106807,14451,99361, 983377,1,6777_314,314,199,30,99368,105409,order,aft,6,106807,8975,99361,4.0 983378,1,6777_314,314,199,30,99368,147303,order,aft,7,107227,12284,56645,2.0 983379,1,6777_314,314,199,30,99368,41904,order,aft,8,108025,5891,126072,3.0 983380,1,6777_314,314,199,30,99368,19501,order,aft,9,109614,10100179,56645,3.0 983381,1,6777_314,314,199,30,99368,83839,order,aft,10,117892,12033,103322,1.0 983382,1,6777_314,314,199,30,99368,126274,order,aft,11,119693,13457,56645,4.0 983383,1,6777_314,314,199,30,99368,63166,order,aft,12,120140,13456,56645,2.0 983384,1,6777_314,314,199,30,99368,55415,order,aft,13,122646,12003,56645,2.0 983386,1,6777_314,314,199,30,99368,159734,order,aft,15,124508,7232,129454,5.0 983385,1,6777_314,314,199,30,99368,55148,order,aft,14,124527,4113,101441,1.0 983387,1,6777_314,314,199,30,99368,136254,order,aft,16,124553,14451,129454, 983388,1,6777_314,314,199,30,99368,159734,order,aft,17,124553,7232,129454,5.0 983389,1,6777_314,314,199,30,99368,119064,order,aft,18,124711,4307,267730,7.0 983390,1,6777_314,314,199,30,99368,90041,order,aft,19,124938,5493,56645,3.0 983391,1,6777_314,314,199,30,99368,119064,order,aft,20,124948,4307,267730,7.0
    opened by zjpf 1
Owner
null
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and newly state-of-the-art recommendation models are implemented. QRec has a lightweight architecture and provides user-friendly interfaces. It can facilitate model implementation and evaluation.

Yu 1.4k Dec 27, 2022
Implementation of a hadoop based movie recommendation system

Implementation-of-a-hadoop-based-movie-recommendation-system 通过编写代码,设计一个基于Hadoop的电影推荐系统,通过此推荐系统的编写,掌握在Hadoop平台上的文件操作,数据处理的技能。windows 10 hadoop 2.8.3 p

汝聪(Ricardo) 5 Oct 2, 2022
Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data

federated is the source code for the Bachelor's Thesis Privacy-Preserving Federated Learning Applied to Decentralized Data (Spring 2021, NTNU) Federat

Dilawar Mahmood 25 Nov 30, 2022
Object tracking and object detection is applied to track golf puts in real time and display stats/games.

Putting_Game Object tracking and object detection is applied to track golf puts in real time and display stats/games. Works best with the Perfect Prac

Max 1 Dec 29, 2021
Applied Machine Learning for Graduate Program in Computer Science (PPGCC)

Applied Machine Learning for Graduate Program in Computer Science (PPGCC) - Federal University of Santa Catarina

Jônatas Negri Grandini 1 Dec 22, 2021
The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022

DG-TrajGen The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022. Our Meth

Wang 25 Sep 26, 2022
The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition.

OverlapTransformer The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for

HAOMO.AI 136 Jan 3, 2023
This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC

LEAP Lab 2 Sep 15, 2022
ClusterMonitor - a very simple python script which monitors and records the CPU and RAM consumption of submitted cluster jobs

ClusterMonitor A very simple python script which monitors and records the CPU and RAM consumption of submitted cluster jobs. Usage To start recording

null 23 Oct 4, 2021