This repo includes some graph-based CTR prediction models and other representative baselines.

Related tags

Machine Learning GraphCTR

Overview

Graph-based CTR prediction

This is a repository designed for graph-based CTR prediction methods, it includes our graph-based CTR prediction methods:

Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction paper
GraphFM: Graph Factorization Machines for Feature Interaction Modeling paper

and some other representative baselines:

HoAFM: A High-order Attentive Factorization Machine for CTR Prediction paper
AutoInt: AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks paper
InterHAt: Interpretable Click-Through Rate Prediction through Hierarchical Attention paper

Requirements:

Tensorflow 1.5.0
Python 3.6
CUDA 9.0+ (For GPU)

Usage

Our code is based on AutoInt.

Input Format

The required input data is in the following format:

train_x: matrix with shape (num_sample, num_field). train_x[s][t] is the feature value of feature field t of sample s in the dataset. The default value for categorical feature is 1.
train_i: matrix with shape (num_sample, num_field). train_i[s][t] is the feature index of feature field t of sample s in the dataset. The maximal value of train_i is the feature size.
train_y: label of each sample in the dataset.

If you want to know how to preprocess the data, please refer to data/Dataprocess/Criteo/preprocess.py

Example

There are four public real-world datasets(Avazu, Criteo, KDD12, MovieLens-1M) that you can use. You can run the code on MovieLens-1M dataset directly in /movielens. The other three datasets are super huge, and they can not be fit into the memory as a whole. Therefore, we split the whole dataset into 10 parts and we use the first file as test set and the second file as valid set. We provide the codes for preprocessing these three datasets in data/Dataprocess. If you want to reuse these codes, you should first run preprocess.py to generate train_x.txt, train_i.txt, train_y.txt as described in Input Format. Then you should run data/Dataprocesss/Kfold_split/StratifiedKfold.py to split the whole dataset into ten folds. Finally you can run scale.py to scale the numerical value(optional).

To help test the correctness of the code and familarize yourself with the code, we upload the first 10000 samples of Criteo dataset in train_examples.txt. And we provide the scripts for preprocessing and training.(Please refer to data/sample_preprocess.sh and run_criteo.sh, you may need to modify the path in config.py and run_criteo.sh).

After you run the data/sample_preprocess.sh, you should get a folder named Criteo which contains part*, feature_size.npy, fold_index.npy, train_*.txt. feature_size.npy contains the number of total features which will be used to initialize the model. train_*.txt is the whole dataset.

Here's how to run the preprocessing.

cd data
mkdir Criteo
python ./Dataprocess/Criteo/preprocess.py
python ./Dataprocess/Kfold_split/stratifiedKfold.py
python ./Dataprocess/Criteo/scale.py

Here's how to train GraphFM on Criteo dataset.

CUDA_VISIBLE_DEVICES=$GPU python -m code.train \
--model_type GraphFM \
                        --data_path $YOUR_DATA_PATH --data Criteo \
                        --blocks 3 --heads 2 --block_shape "[64, 64, 64]" \
                        --ks "[39, 20, 5]" \
                        --is_save --has_residual \
                        --save_path ./models/GraphFM/Criteo/b3h2_64x64x64/ \
                        --field_size 39  --run_times 1 \
                        --epoch 2 --batch_size 1024 \

Here's how to train GraphFM on Avazu dataset.

CUDA_VISIBLE_DEVICES=$GPU python -m code.train \
--model_type GraphFM \
                        --data_path $YOUR_DATA_PATH --data Avazu \
                        --blocks 3 --heads 2 --block_shape "[64, 64, 64]" \
                        --ks "[23, 10, 2]" \
                        --is_save --has_residual \
                        --save_path ./models/GraphFM/Avazu/b3h2_64x64x64/ \
                        --field_size 23  --run_times 1 \
                        --epoch 2 --batch_size 1024 \

You can run the training on the relatively small MovieLens dataset in /movielens.

You should see the output like this:

...
train logs
...
start testing!...
restored from ./models/Criteo/b3h2_64x64x64/1/
test-result = 0.8088, test-logloss = 0.4430
test_auc [0.8088305055534442]
test_log_loss [0.44297631300399626]
avg_auc 0.8088305055534442
avg_log_loss 0.44297631300399626

Citation

If you find this repo useful for your research, please consider citing the following paper:

@inproceedings{li2019fi,
  title={Fi-gnn: Modeling feature interactions via graph neural networks for ctr prediction},
  author={Li, Zekun and Cui, Zeyu and Wu, Shu and Zhang, Xiaoyu and Wang, Liang},
  booktitle={Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
  pages={539--548},
  year={2019}
}

@article{li2021graphfm,
  title={GraphFM: Graph Factorization Machines for Feature Interaction Modeling},
  author={Li, Zekun and Wu, Shu and Cui, Zeyu and Zhang, Xiaoyu},
  journal={arXiv preprint arXiv:2105.11866},
  year={2021}
}

Contact information

You can contact Zekun Li ([email protected]), if there are questions related to the code.

Acknowledgement

This implementation is based on Weiping Song and Chence Shi's AutoInt. Thanks for their sharing and contribution.

Cryptocurrency price prediction and exceptions in python

Cryptocurrency price prediction and exceptions in python This is a coursework on foundations of computing module Through this coursework i worked on m

1 Nov 7, 2021

Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification

Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification Introduction. This package includes the pyth

5 Dec 6, 2022

A collection of interactive machine-learning experiments: 🏋️models training + 🎨models demo

🤖 Interactive Machine Learning experiments: 🏋️models training + 🎨models demo

1.4k Jan 6, 2023

In this Repo a simple Sklearn Model will be trained and pushed to MLFlow

SKlearn_to_MLFLow In this Repo a simple Sklearn Model will be trained and pushed to MLFlow Install This Repo is based on poetry python3 -m venv .venv

1 Dec 13, 2021

jaxfg - Factor graph-based nonlinear optimization library for JAX.

Factor graphs + nonlinear optimization in JAX

134 Dec 21, 2022

PLUR is a collection of source code datasets suitable for graph-based machine learning.

PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. We provide scripts for downloading, processing, and loading the datasets. This is done by offering a unified API and data structures for all datasets.

76 Nov 25, 2022

A single Python file with some tools for visualizing machine learning in the terminal.

Machine Learning Visualization Tools A single Python file with some tools for visualizing machine learning in the terminal. This demo is composed of t

35 Dec 29, 2022

2D fluid simulation implementation of Jos Stam paper on real-time fuild dynamics, including some suggested extensions.

Fluid Simulation Usage Download this repo and store it in your computer. Open a terminal and go to the root directory of this folder. Make sure you ha

5 Dec 2, 2022

pure-predict: Machine learning prediction in pure Python

pure-predict speeds up and slims down machine learning prediction applications. It is a foundational tool for serverless inference or small batch prediction with popular machine learning frameworks like scikit-learn and fasttext. It implements the predict methods of these frameworks in pure Python.

84 Dec 29, 2022

Comments

Weighted graph in fignn

Hi,

Thanks for the great work! I have a question regarding to the FiGNN implementation: is the weighted graph version implemented? By checking the code, it seems we only consider the fully connected (unweighted) graph, rather than weighted graph with edge weights learnt based on Equation (2) in the paper. I'm looking forward to your reply. Thanks in advance!

opened by cdeng1016 0

This repo includes some graph-based CTR prediction models and other representative baselines.

Related tags

Overview

Graph-based CTR prediction

Requirements:

Usage

Input Format

Example

Citation

Contact information

Acknowledgement

You might also like...

Cryptocurrency price prediction and exceptions in python

Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification

A collection of interactive machine-learning experiments: 🏋️models training + 🎨models demo

In this Repo a simple Sklearn Model will be trained and pushed to MLFlow

jaxfg - Factor graph-based nonlinear optimization library for JAX.

PLUR is a collection of source code datasets suitable for graph-based machine learning.

A single Python file with some tools for visualizing machine learning in the terminal.

2D fluid simulation implementation of Jos Stam paper on real-time fuild dynamics, including some suggested extensions.

pure-predict: Machine learning prediction in pure Python

Comments

Weighted graph in fignn

Owner

Big Data and Multi-modal Computing Group, CRIPAC

Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Mortality risk prediction for COVID-19 patients using XGBoost models

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

LibTraffic is a unified, flexible and comprehensive traffic prediction library based on PyTorch

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

customer churn prediction prevention in telecom industry using machine learning and survival analysis