The implementation of the submitted paper "Deep Multi-Behaviors Graph Networks for Voucher Redemption Rate Prediction" in SIGKDD 2021 Applied Data Science Track.

Last update: Jul 12, 2022

Overview

DMBGN: Deep Multi-Behaviors Graph Networks for Voucher Redemption Rate Prediction

The implementation of the accepted paper "Deep Multi-Behaviors Graph Networks for Voucher Redemption Rate Prediction" in SIGKDD 2021 Applied Data Science Track.

DMBGN uses a User-Behavior Voucher Graph (UVG) to extract complex user-voucher-item relationships and an attention mechanism to capture users' long-term voucher redemption preferences. Experiments show that DMBGN achieves a 10%-16% relative AUC improvement over Deep Neural Networks (DNN) and a 2%-4% relative AUC improvement over Deep Interest Network (DIN).
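To illustrate the attention step, the sketch below pools a user's historical UVG embeddings into one vector, weighting each history entry by its similarity to the target voucher's embedding. It is a simplification, not the DMBGN code: the function name attention_pool is hypothetical, and a plain dot-product score with softmax normalization stands in for the learned activation unit that DIN-style models use. The actual model is in ./models/DMBGN.py.

```python
import math
from typing import List, Sequence


def attention_pool(query: Sequence[float], history: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of historical UVG embeddings (hypothetical simplification).

    query:   embedding of the target voucher's UVG
    history: embeddings of the user's past UVGs, all with len(query) dims
    Scores are dot products normalized with softmax; DIN/DMBGN use a learned
    activation unit instead.
    """
    if not history:
        return [0.0] * len(query)  # no history: return a zero vector
    scores = [sum(q * h for q, h in zip(query, past)) for past in history]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Each output dimension is the attention-weighted average over history.
    return [sum(w * past[d] for w, past in zip(weights, history)) for d in range(len(query))]


pooled = attention_pool([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print([round(x, 3) for x in pooled])  # [0.731, 0.269]
```

The history entry that resembles the target voucher receives most of the weight, which is the intuition behind modeling long-term redemption preference with attention rather than plain average pooling.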

Benchmark Dataset

A randomly sampled, desensitized dataset drawn from one of Lazada's (Alibaba Group) large-scale production datasets is included. It contains three dataframes corresponding to users' voucher collection logs, related user behavior logs, and related item features; a detailed description can be found in the ./data/README.md file.
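As a rough picture of how behavior logs could be turned into one graph per voucher collection, the sketch below builds a star-shaped toy graph: node 0 is the voucher, and each distinct behavior is a node linked to it. Everything here is hypothetical (the Behavior record, its field names, the "bef"/"aft" phase labels, and the star topology); the real UVG construction is defined in the paper and in ./models/DMBGN.py.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Behavior:
    """One behavior-log record (hypothetical schema)."""
    item_id: int
    action: str  # e.g. "click" or "order" (assumed labels)
    phase: str   # "bef" or "aft": before/after voucher collection (assumed)


def build_uvg(voucher_id: int, behaviors: Iterable[Behavior]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return (node labels, edge list) for a toy star graph around one voucher.

    Hypothetical simplification of a User-Behavior Voucher Graph.
    """
    nodes: List[str] = [f"voucher:{voucher_id}"]
    index: Dict[str, int] = {nodes[0]: 0}
    edges: List[Tuple[int, int]] = []
    for b in behaviors:
        key = f"{b.phase}:{b.action}:{b.item_id}"
        if key not in index:  # deduplicate repeated behaviors on the same item
            index[key] = len(nodes)
            nodes.append(key)
            edges.append((0, index[key]))  # voucher -> behavior edge
    return nodes, edges


nodes, edges = build_uvg(7, [
    Behavior(1, "click", "bef"),
    Behavior(1, "click", "bef"),
    Behavior(2, "order", "aft"),
])
print(nodes)  # ['voucher:7', 'bef:click:1', 'aft:order:2']
print(edges)  # [(0, 1), (0, 2)]
```

An edge list in this form could be transposed into the 2 x num_edges edge_index tensor that torch_geometric expects before being passed to a graph neural network.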

We hope this dataset could help to facilitate research in the voucher redemption rate prediction field.

DMBGN Performance

Compared Models:

LR: Logistic Regression [1], a shallow model.
GBDT: Gradient Boosting Decision Tree [2], a tree-based, non-deep-learning model.
DNN: Deep Neural Networks.
WDL: Wide and Deep model [3], a model widely adopted in industrial applications; compared to DNN, it adds a linear (wide) model alongside the deep model.
DIN: Deep Interest Network [4], an attention-based recommendation model that has been deployed successfully at Alibaba.

The experimental results on the public sample dataset are as follows:

Model	AUC	RelaImpr(DNN)	RelaImpr(DIN)	Logloss
LR	0.7377	-9.22%	-14.28%	0.3897
xgBoost	0.7759	5.40%	-0.48%	0.3640
DNN	0.7618	0.00%	-5.57%	0.3775
WDL	0.7716	3.73%	-2.05%	0.3717
DIN	0.7773	5.90%	0.00%	0.3688
DMBGN_AvgPooling	0.7789	6.54%	0.61%	0.3684
DMBGN_Pretrained	0.7804	7.11%	1.14%	0.3680
DMBGN	0.7885	10.20%	4.06%	0.3616
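The RelaImpr columns appear to follow the relative-improvement metric used in the DIN paper [4], RelaImpr = (AUC(model) - 0.5) / (AUC(base) - 0.5) - 1, where 0.5 is the AUC of random guessing. The sketch below recomputes two entries from the rounded AUCs in the table; the helper name rela_impr is ours. The DMBGN-vs-DNN value matches exactly, while the DMBGN-vs-DIN value comes out at 4.04% rather than 4.06%, presumably because the table was computed from unrounded AUCs.

```python
from typing import Dict


def rela_impr(auc_model: float, auc_base: float) -> float:
    """Relative AUC improvement of a model over a baseline.

    RelaImpr = (AUC(model) - 0.5) / (AUC(base) - 0.5) - 1; subtracting 0.5
    measures improvement relative to random guessing.
    """
    return (auc_model - 0.5) / (auc_base - 0.5) - 1.0


# AUC values copied from the table above (rounded to 4 decimals).
AUC: Dict[str, float] = {"LR": 0.7377, "DNN": 0.7618, "DIN": 0.7773, "DMBGN": 0.7885}

print(f"{rela_impr(AUC['DMBGN'], AUC['DNN']):.2%}")  # 10.20%
print(f"{rela_impr(AUC['DMBGN'], AUC['DIN']):.2%}")  # 4.04%
```

Measuring against 0.5 rather than 0 makes the metric more sensitive: DMBGN's absolute AUC gain over DIN is only about 0.011, but relative to the headroom above random guessing it is roughly 4%.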

Note that this dataset is a random sample of the Region-C dataset, so the results differ from those reported in the submitted paper because of the smaller sample size (most noticeably for xgBoost). Nevertheless, the conclusions are consistent with the submitted paper: DMBGN achieves a 10.20% relative AUC improvement over DNN and a 4.06% relative improvement over DIN.

How To Use

All experiment code is organized in the DMBGN_SIGKDD21-release.ipynb Jupyter notebook, together with the corresponding run logs. The detailed implementation of each model (LR, GBDT, DNN, WDL, DIN, DMBGN) can be found in the ./models folder.

To run the experiments, start Jupyter, run all code cells in the DMBGN_SIGKDD21-release.ipynb notebook, and check the output logs. Alternatively, refer to the log outputs already saved in the notebook. (If you encounter the error message "Sorry, something went wrong. Reload?" when viewing the notebook, click Reload and the notebook will be displayed.)

To use the DMBGN model, please refer to the code implementation in ./models/DMBGN.py.

Minimum Requirements

python: 3.7.1
numpy: 1.19.5
pandas: 1.2.1
pandasql: 0.7.3
torch: 1.7.1
torch-geometric: 1.6.3
torch-cluster: 1.5.8
torch-scatter: 2.0.5
torch-sparse: 0.6.8
torch-spline-conv: 1.2.0
torchaudio: 0.7.2
torchvision: 0.8.2
deepctr-torch: 0.2.3
pickle: 4.0
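A quick way to compare a local environment with the list above is a small version check. This is a hypothetical helper, not part of the repository: it treats the listed versions as lower bounds (as the heading suggests, although newer releases, especially of torch_geometric, are not guaranteed to be compatible), skips pickle because it ships with Python, and relies on importlib.metadata, which needs Python 3.8 or newer.

```python
import re
import sys
from importlib import metadata  # Python 3.8+
from typing import Dict, Iterable, Mapping, Tuple

# Versions copied from the requirements list, treated as lower bounds.
MINIMUM: Dict[str, str] = {
    "numpy": "1.19.5", "pandas": "1.2.1", "pandasql": "0.7.3",
    "torch": "1.7.1", "torch-geometric": "1.6.3", "torch-cluster": "1.5.8",
    "torch-scatter": "2.0.5", "torch-sparse": "0.6.8", "torch-spline-conv": "1.2.0",
    "torchaudio": "0.7.2", "torchvision": "0.8.2", "deepctr-torch": "0.2.3",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """'1.7.1+cu110' -> (1, 7, 1): keep the leading digits of each component."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # stop at a component with no leading digits
            break
        parts.append(int(match.group()))
    return tuple(parts)


def find_problems(minimum: Mapping[str, str], installed: Mapping[str, str]) -> Dict[str, str]:
    """Map each missing or too-old package to a short description."""
    problems: Dict[str, str] = {}
    for name, min_version in minimum.items():
        have = installed.get(name)
        if have is None:
            problems[name] = "not installed"
        elif version_tuple(have) < version_tuple(min_version):
            problems[name] = f"{have} < {min_version}"
    return problems


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Look up installed distribution versions, skipping missing ones."""
    found: Dict[str, str] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # find_problems reports it as "not installed"
    return found


if __name__ == "__main__":
    if sys.version_info < (3, 7, 1):
        print("python: 3.7.1 or newer expected")
    for name, problem in find_problems(MINIMUM, installed_versions(MINIMUM)).items():
        print(f"{name}: {problem}")
```

Stripping local build tags such as "+cu110" matters here because CUDA builds of torch report versions like 1.7.1+cu110, which should satisfy a 1.7.1 minimum.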

To Do

We are currently deploying the DMBGN model online for Lazada's voucher-related business; online A/B testing results will be reported soon.
More detailed code comments are being added.

Acknowledgment

Our implementation is based on the Deep Interest Network (DIN) code from the DeepCTR package, modified to fit the DMBGN model architecture and to support multi-GPU usage.

We thank the anonymous reviewers for their time and feedback.

Reference

[1] H Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, et al. 2013. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1222–1230.
[2] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems 30 (2017), 3146–3154.
[3] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. CoRR abs/1606.07792 (2016). arXiv:1606.07792 http://arxiv.org/abs/1606.07792
[4] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1059–1068.

You might also like...

Collects all accepted (partial and full scored) codes submitted within the given timeframe and saves them locally for plagiarism check.

Collects all accepted (partial and full scored) codes submitted within the given timeframe of any contest.

2 Dec 28, 2021

Handy Tool to check the availability of onion site and to extract the title of submitted onion links.

This tool helps to quickly investigate a large set of onion sites by checking their availability, which filters out inactive sites, and by collecting each site's title, which can help categorize the sites being handled.

13 Nov 25, 2022

Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

23 Nov 5, 2022

We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.

Multi-Modal Self-Supervision using GDT and StiCa This is an official pytorch implementation of papers: Multi-modal Self-Supervision from Generalized D

42 Dec 9, 2022

The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)

nvdiffmodeling [origin_code] Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Autom

2 Oct 31, 2022

Deep Learning applied to Integral data analysis

DeepIntegralCompton Deep Learning applied to Integral data analysis Module installation Move to the root directory of the project and execute : pip in

1 Dec 10, 2021

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

CKAN: The Open Source Data Portal Software CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work

3.6k Dec 27, 2022

Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma This repo provi

99 Dec 27, 2022

Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi

1.3k Jan 7, 2023

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.

791 Jan 4, 2023

Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium

26 Nov 15, 2022

Data science, Data manipulation and Machine learning package.

duality Data science, Data manipulation and Machine learning package. Use permitted according to the terms of use and conditions set by the attached l

3 Oct 19, 2022

Data Version Control or DVC is an open-source tool for data science and machine learning projects

Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj

2 Jul 29, 2021

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

27 Nov 1, 2022

Explore-bikeshare-data - GitHub project as part of the Programming for Data Science with Python Nanodegree from Udacity

Date created February 10, 2022 Project Title Explore US Bikeshare Data Descripti

1 Feb 14, 2022

Tensorforce: a TensorFlow library for applied reinforcement learning

Tensorforce: a TensorFlow library for applied reinforcement learning Introduction Tensorforce is an open-source deep reinforcement learning framework,

3.2k Jan 2, 2023

Toy example of an applied ML pipeline for me to experiment with MLOps tools.

Toy Machine Learning Pipeline Table of Contents About Getting Started ML task description and evaluation procedure Dataset description Repository stru

190 Dec 21, 2022

Differentiable rasterization applied to 3D model simplification tasks

nvdiffmodeling Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Automatic 3D Model

336 Dec 30, 2022

skweak: A software toolkit for weak supervision applied to NLP tasks

Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels without pre-existing datasets. The only available option is often to collect and annotate texts by hand, which is expensive and time-consuming.

Norsk Regnesentral (Norwegian Computing Center)

850 Dec 28, 2022

Comments

Issue regarding the time-ordering rank (rk) in log_df and session_df
Time-ordering problem in log_df. The code
q1 = "select *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY voucher_collect_time ASC) as rk from {logdf}".format(logdf="log_df")
log_df = ps.sqldf(q1, locals())
sorts voucher_collect_time as strings rather than as numbers.

In session_df, the rk values disagree with the action_time order, for example for session_id=6777_314:
983372,1,6777_314,314,199,30,99368,55161,order,aft,1,100722,4113,101441,1.0
983373,1,6777_314,314,199,30,99368,105402,order,aft,2,106784,6966,99361,4.0
983374,1,6777_314,314,199,30,99368,60529,order,aft,3,106806,4117,56645,3.0
983375,1,6777_314,314,199,30,99368,124268,order,aft,4,106807,8989,85140,2.0
983376,1,6777_314,314,199,30,99368,149035,order,aft,5,106807,14451,99361,
983377,1,6777_314,314,199,30,99368,105409,order,aft,6,106807,8975,99361,4.0
983378,1,6777_314,314,199,30,99368,147303,order,aft,7,107227,12284,56645,2.0
983379,1,6777_314,314,199,30,99368,41904,order,aft,8,108025,5891,126072,3.0
983380,1,6777_314,314,199,30,99368,19501,order,aft,9,109614,10100179,56645,3.0
983381,1,6777_314,314,199,30,99368,83839,order,aft,10,117892,12033,103322,1.0
983382,1,6777_314,314,199,30,99368,126274,order,aft,11,119693,13457,56645,4.0
983383,1,6777_314,314,199,30,99368,63166,order,aft,12,120140,13456,56645,2.0
983384,1,6777_314,314,199,30,99368,55415,order,aft,13,122646,12003,56645,2.0
983386,1,6777_314,314,199,30,99368,159734,order,aft,15,124508,7232,129454,5.0
983385,1,6777_314,314,199,30,99368,55148,order,aft,14,124527,4113,101441,1.0
983387,1,6777_314,314,199,30,99368,136254,order,aft,16,124553,14451,129454,
983388,1,6777_314,314,199,30,99368,159734,order,aft,17,124553,7232,129454,5.0
983389,1,6777_314,314,199,30,99368,119064,order,aft,18,124711,4307,267730,7.0
983390,1,6777_314,314,199,30,99368,90041,order,aft,19,124938,5493,56645,3.0
983391,1,6777_314,314,199,30,99368,119064,order,aft,20,124948,4307,267730,7.0
opened by zjpf 1
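The string-ordering problem reported above can be reproduced with Python's built-in sqlite3 module, since pandasql runs its queries on an in-memory SQLite database by default. The sketch assumes, as the report suggests, that voucher_collect_time holds integer-like values stored as text; the table contents and the helper name ranked_times are hypothetical, and ROW_NUMBER() requires SQLite 3.25 or newer.

```python
import sqlite3
from typing import List, Tuple

# Toy log (hypothetical values): collection times stored as TEXT, the way a
# string-typed pandas column ends up when written to SQLite.
ROWS: List[Tuple[int, str]] = [(1, "99368"), (1, "100722"), (1, "9500")]


def ranked_times(numeric: bool) -> List[str]:
    """Return voucher_collect_time values in ROW_NUMBER() order for user 1."""
    order_key = "CAST(voucher_collect_time AS INTEGER)" if numeric else "voucher_collect_time"
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log_df (user_id INTEGER, voucher_collect_time TEXT)")
        conn.executemany("INSERT INTO log_df VALUES (?, ?)", ROWS)
        query = (
            "SELECT voucher_collect_time, "
            f"ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY {order_key} ASC) AS rk "
            "FROM log_df ORDER BY rk"
        )
        return [time for time, _ in conn.execute(query)]
    finally:
        conn.close()


print(ranked_times(numeric=False))  # ['100722', '9500', '99368'] (lexicographic)
print(ranked_times(numeric=True))   # ['9500', '99368', '100722'] (numeric)
```

Under the same assumption, a possible fix is to write ORDER BY CAST(voucher_collect_time AS INTEGER) in the pandasql query, or to convert the column with pd.to_numeric before calling sqldf so that it is stored as an integer column.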