pytorch-lifestream
pytorch-lifestream is a library built on PyTorch for building embeddings of discrete event sequences using self-supervision. It can process terabyte-scale volumes of raw events such as game history events, clickstream data, purchase history, or card transactions.
It supports various methods of self-supervised training, adapted for event sequences:
- Contrastive Learning for Event Sequences (CoLES)
- Contrastive Predictive Coding (CPC)
- Replaced Token Detection (RTD) from ELECTRA
- Next Sequence Prediction (NSP) from BERT
- Sequences Order Prediction (SOP) from ALBERT
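To illustrate the idea behind CoLES, the sketch below draws random contiguous sub-sequences from each client's event history; slices taken from the same history are treated as positive pairs and slices from different histories as negatives. This is a minimal plain-Python sketch, not the library's API: the function names `sample_slices` and `make_batch`, the toy event codes, and the slice-length bounds are all assumptions made for illustration.

```python
import random


def sample_slices(events, n_slices, min_len, max_len, rng):
    """Draw n_slices random contiguous sub-sequences from one event sequence.

    Assumes len(events) >= min_len (hypothetical helper, not part of the library).
    """
    slices = []
    for _ in range(n_slices):
        # Clamp the slice length so it never exceeds the sequence length.
        length = rng.randint(min_len, min(max_len, len(events)))
        start = rng.randint(0, len(events) - length)
        slices.append(events[start:start + length])
    return slices


def make_batch(sequences, n_slices, min_len, max_len, seed=0):
    """Return (slices, labels); slices sharing a label form positive pairs."""
    rng = random.Random(seed)
    batch, labels = [], []
    for seq_id, events in sequences.items():
        for s in sample_slices(events, n_slices, min_len, max_len, rng):
            batch.append(s)
            labels.append(seq_id)
    return batch, labels


# Toy data: each client has a list of made-up event codes.
sequences = {
    "client_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "client_b": [10, 20, 30, 40, 50, 60],
}
batch, labels = make_batch(sequences, n_slices=3, min_len=2, max_len=4)
```

In a real training loop the slices would be encoded by a sequence encoder, and the labels would tell a contrastive loss which embeddings to pull together and which to push apart.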
It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.
The following variants of the contrastive losses are supported:
- Contrastive loss (paper)
- Triplet loss (paper)
- Binomial deviance loss (paper)
- Histogram loss (paper)
- Margin loss (paper)
- VICReg loss (paper)
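As a point of reference for the first item in the list, one common form of the pairwise contrastive loss penalizes the squared distance of a positive pair and penalizes a negative pair only while it is closer than a margin. The sketch below writes that form in plain Python for a single pair of embeddings. It is an illustration under assumptions, not the library's implementation: the function name `contrastive_loss`, the default margin of 1.0, and the absence of a 1/2 scaling factor are choices made here.

```python
import math


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Pairwise contrastive loss for one pair of embeddings (illustrative form).

    same=True:  pull the pair together by penalizing squared distance.
    same=False: push the pair apart until it is at least `margin` away.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance (Python 3.8+)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# A distant positive pair is penalized heavily ...
loss_pos = contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True)
# ... a distant negative pair costs nothing ...
loss_neg_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False)
# ... and a negative pair inside the margin is penalized.
loss_neg_near = contrastive_loss([0.0, 0.0], [0.5, 0.0], same=False)
```

In practice the same computation would run on batches of PyTorch tensors and be averaged over all sampled pairs.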
Install from PyPI
pip install pytorch-lifestream
Install from source
# Ubuntu 20.04
sudo apt install python3.8 python3-venv
pip3 install pipenv
pipenv sync --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest
Demo notebooks
- Self-supervised training and embeddings for downstream task notebook
- Self-supervised embeddings in CatBoost notebook
- Self-supervised training and fine-tuning notebook
- PySpark and Parquet for data preprocessing notebook
Experiments on public datasets
pytorch-lifestream usage experiments on several public event datasets are available in a separate repo.