A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Dmitri Babaev

Last update: Dec 17, 2022

Overview

pytorch-lifestream is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

Contrastive Learning for Event Sequences (CoLES)
Contrastive Predictive Coding (CPC)
Replaced Token Detection (RTD) from ELECTRA
Next Sequence Prediction (NSP) from BERT
Sequences Order Prediction (SOP) from ALBERT
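The CoLES idea, as described in its paper, is that sub-sequences cut from the same event sequence should get close embeddings, while sub-sequences of different sequences should be far apart. Below is a minimal, framework-free sketch of how such a training batch could be assembled from random contiguous slices. `sample_random_slices` and `make_coles_batch` are hypothetical helpers, not the ptls API; the library's own sampling strategies and parameters may differ.

```python
import random
from typing import List, Sequence, Tuple


def sample_random_slices(events: Sequence, n_slices: int, min_len: int,
                         max_len: int, rng: random.Random) -> List[list]:
    """Cut `n_slices` random contiguous sub-sequences out of one event sequence."""
    seq_len = len(events)
    max_len = min(max_len, seq_len)
    if min_len < 1 or min_len > max_len:
        raise ValueError("need 1 <= min_len <= min(max_len, len(events))")
    slices = []
    for _ in range(n_slices):
        length = rng.randint(min_len, max_len)    # randint bounds are inclusive
        start = rng.randint(0, seq_len - length)  # the slice always fits in the sequence
        slices.append(list(events[start:start + length]))
    return slices


def make_coles_batch(sequences: List[Sequence], n_slices: int, min_len: int,
                     max_len: int, seed: int = 0) -> Tuple[List[list], List[int]]:
    """Flatten the slices of all sequences into one batch; label = index of the source sequence."""
    rng = random.Random(seed)
    batch, labels = [], []
    for label, seq in enumerate(sequences):
        for piece in sample_random_slices(seq, n_slices, min_len, max_len, rng):
            batch.append(piece)
            labels.append(label)  # equal labels -> positive pair, different labels -> negative pair
    return batch, labels


# Two toy "clients" with short event histories (e.g. MCC codes).
clients = [[5, 3, 8, 1, 9, 2], [7, 7, 4, 0, 6]]
batch, labels = make_coles_batch(clients, n_slices=3, min_len=2, max_len=4)
```

The labels are what a contrastive loss uses to tell positive pairs from negative ones; the slices themselves would be passed through the sequence encoder.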

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of contrastive loss are supported:

Contrastive loss (paper)
Triplet loss (paper)
Binomial deviance loss (paper)
Histogram loss (paper)
Margin loss (paper)
VICReg loss (paper)
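For reference, a plain-Python sketch of the classic margin-based contrastive loss: a positive pair at distance d contributes d², a negative pair contributes max(0, margin − d)², and the result is averaged over all pairs in the batch. This is a generic formulation with hypothetical function names; the library's implementation may select pairs and normalise the loss differently.

```python
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(embeddings: List[Sequence[float]], labels: List[int],
                     margin: float) -> float:
    """Mean margin-based contrastive loss over all unordered pairs in a batch."""
    pair_losses = []
    for i, j in combinations(range(len(embeddings)), 2):
        d = euclidean(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            pair_losses.append(d ** 2)                     # pull positives together
        else:
            pair_losses.append(max(0.0, margin - d) ** 2)  # push negatives beyond the margin
    return sum(pair_losses) / len(pair_losses)


# Two embeddings of slices from client 0 and one embedding from client 1.
emb = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
loss = contrastive_loss(emb, labels=[0, 0, 1], margin=2.0)  # (1**2 + 0 + 0) / 3 pairs
```

Both negative pairs are already farther apart than the margin, so only the positive pair contributes to the loss.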

Install from PyPI

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync --dev  # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

Self-supervised training and embeddings for downstream task notebook
Self-supervised embeddings in CatBoost notebook
Self-supervised training and fine-tuning notebook
PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

Experiments using pytorch-lifestream on several public event datasets are available in a separate repo.

Comments

torch.stack in def collate_feature_dict

ptls/data_load/utils.py

Hello!

If the dataloader has a feature called target and the length of the dataset is not a multiple of the batch size, an error is raised on the last batch: "Sizes of tensors must match except in dimension 0". It is caused by the use of torch.stack when processing features whose names start with 'target'.

opened by Ivanich-spb 11
Multi-GPU option of pytorch_lightning.Trainer is not supported

When I set Trainer(gpus=[0,1]) while using PtlsDataModule as the data module, I get the following error:

AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

opened by mazitovs 1
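The error itself comes from Python's pickle module: functions are pickled by reference to their qualified name, so a function defined inside `__init__` (whose qualified name contains `<locals>`) cannot be pickled. Multi-GPU strategies that start worker processes usually need to pickle the data module, which would plausibly explain why the error only appears with `gpus=[0,1]`; that link is an assumption, not something confirmed in the issue. The stdlib sketch uses two hypothetical classes to show the failing pattern and a picklable alternative.

```python
import pickle


class ClosureDataModule:
    """Hypothetical stand-in that builds its dataloader factory as a closure in __init__."""

    def __init__(self, data):
        def train_dataloader():  # qualname: 'ClosureDataModule.__init__.<locals>.train_dataloader'
            return iter(data)
        self.train_dataloader = train_dataloader


class MethodDataModule:
    """Same behaviour, but train_dataloader is a regular method of a module-level class."""

    def __init__(self, data):
        self.data = data

    def train_dataloader(self):
        return iter(self.data)


def is_picklable(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Older Python versions raise AttributeError("Can't pickle local object ..."),
        # newer ones may raise PicklingError for the same problem.
        return False


print(is_picklable(ClosureDataModule([1, 2, 3])))  # False
print(is_picklable(MethodDataModule([1, 2, 3])))   # True
```

Keeping dataloader factories as ordinary methods (or module-level functions) keeps the object picklable while behaving the same in a single process.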
Correct seq_len for feature dict
rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}

How do we get the correct seq_len? The true length is 4, but the possible lengths are 4 and 6. 'target_distribution' is the wrong field to take the length from: it is not a sequence, it is an array.
opened by ivkireev86 1
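One way to resolve the ambiguity, assuming the naming convention from the torch.stack issue above (features whose names start with 'target' hold targets rather than event sequences): derive seq_len only from the remaining array-valued fields. `infer_seq_len` is a hypothetical helper for plain lists or arrays, not part of ptls.

```python
from typing import Any, Mapping


def infer_seq_len(rec: Mapping[str, Any], target_prefix: str = "target") -> int:
    """Return the common length of the sequence features in a feature dict.

    Fields whose names start with `target_prefix` are skipped, as are scalars
    and strings, because they are not per-event sequences.
    """
    lengths = {
        len(value)
        for name, value in rec.items()
        if not name.startswith(target_prefix)
        and hasattr(value, "__len__")
        and not isinstance(value, str)
    }
    if len(lengths) != 1:
        raise ValueError(f"expected exactly one sequence length, got {sorted(lengths)}")
    return lengths.pop()


rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}
print(infer_seq_len(rec))  # 4
```

Raising on inconsistent lengths, instead of silently picking one field, surfaces malformed records early.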
Save categories encodings along with model weights in demos

The fitted preprocessor and the train/test split must be saved together with the trained model. Otherwise the category encodings may shift and the saved pretrained model will become useless.

opened by ivkireev86 1
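A minimal illustration of the problem and the fix, assuming categories are encoded by a fitted mapping from raw values to integer ids: if the mapping is refitted on different data, the same category gets a different id and the pretrained embeddings no longer line up. The stdlib sketch stores the fitted mapping in a JSON file next to the model artifacts. The file name, the `train_ids` field and both helper functions are hypothetical; a real pipeline would more likely serialize the fitted preprocessor object itself.

```python
import json
import os
import tempfile


def fit_category_encoding(values):
    # Sorted for determinism; id 0 is kept free for categories unseen at fit time.
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}


def encode(values, encoding):
    return [encoding.get(v, 0) for v in values]


train_mcc = ["5411", "5812", "6011", "5411"]
encoding = fit_category_encoding(train_mcc)

# Store the fitted encoding (and the train split ids) next to the model weights.
artifacts_dir = tempfile.mkdtemp()
path = os.path.join(artifacts_dir, "preprocessing.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"mcc_encoding": encoding, "train_ids": [101, 102]}, f)

with open(path, encoding="utf-8") as f:
    restored = json.load(f)["mcc_encoding"]

new_data = ["4829", "5411", "6011"]
ids_restored = encode(new_data, restored)                         # [0, 1, 3]
ids_refitted = encode(new_data, fit_category_encoding(new_data))  # [1, 2, 3]: ids have shifted
```

With the restored mapping, "5411" keeps id 1 as during pretraining; refitting silently moves it to id 2.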
Documentation index
A prototype of the documentation main page. Three sections:

a description of the library's models

a guide on how to use the library

how to write your own components

The page contains short descriptions and links to detailed ones (which we will write later).

The module description proposes a structure for the library. We plan to create these modules shortly and move the corresponding classes from the library into them. Old modules that end up empty will be removed. From then on we will follow the layout described in this document.

For review, please check the proposed library structure, the module names, and the descriptive text of the document itself.
opened by ivkireev86 1
KL cyclostationarity test tools

The test provides a histogram of self-sample similarity vs. random-sample similarity and shows compatibility with CoLES.

Think about tests for other frameworks.

opened by ivkireev86 0
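A rough sketch of the comparison the issue describes, under the assumption that self-sample similarity means the cosine similarity between embeddings of two sub-samples of the same entity, and random-sample similarity the same quantity for sub-samples of two different entities. The vectors are toy values, not model output; with a useful CoLES encoder one would expect the first distribution to lie clearly to the right of the second. The function names are hypothetical.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def self_vs_random_similarity(embeddings_by_entity, rng):
    """embeddings_by_entity[i] holds embeddings of several sub-samples of entity i."""
    self_sims, random_sims = [], []
    n = len(embeddings_by_entity)
    for i, samples in enumerate(embeddings_by_entity):
        a, b = rng.sample(samples, 2)  # two different sub-samples of the same entity
        self_sims.append(cosine(a, b))
        j = rng.choice([k for k in range(n) if k != i])  # some other entity
        random_sims.append(cosine(rng.choice(samples), rng.choice(embeddings_by_entity[j])))
    return self_sims, random_sims


# Toy embeddings: three entities, three sub-samples each.
toy = [
    [[1.0, 0.1], [1.0, -0.1], [0.9, 0.0]],
    [[0.1, 1.0], [-0.1, 1.0], [0.0, 0.9]],
    [[-1.0, 0.1], [-1.0, -0.1], [-0.9, 0.0]],
]
self_sims, random_sims = self_vs_random_similarity(toy, random.Random(0))
```

The two lists are what would be plotted as overlapping histograms, for example with matplotlib.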
Repair pyspark tests
def test_dt_to_timestamp():
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00'},
        {'dt': '2012-01-01 12:01:16'},
        {'dt': '2021-12-30 00:00:00'},
    ])

    df = df.withColumn('ts', dt_to_timestamp('dt'))
    ts = [rec.ts for rec in df.select('ts').collect()]

    assert ts == [0, 1325419276, 1640822400]

E   assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400]
E     At index 0 diff: -10800 != 0
E     Use -v to get more diff

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError

def test_datetime_to_timestamp():
    t = DatetimeToTimestamp(col_name_original='dt')
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(data=[
        {'dt': '1970-01-01 00:00:00', 'rn': 1},
        {'dt': '2012-01-01 12:01:16', 'rn': 2},
        {'dt': '2021-12-30 00:00:00', 'rn': 3},
    ])
    df = t.fit_transform(df)
    et = [rec.event_time for rec in df.select('event_time').collect()]

    assert et[0] == 0

E   assert -10800 == 0

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError
opened by ikretus 0
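The observed values are exactly 10800 s (3 hours) smaller than the expected ones, which is what you get if the strings are interpreted in a UTC+3 timezone rather than UTC. That suggests the tests depend on the local timezone of the machine; pinning the Spark session timezone to UTC in the tests (the `spark.sql.session.timeZone` setting) is one plausible fix, not verified here. The stdlib sketch only reproduces the arithmetic; `to_epoch` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def to_epoch(dt_str, tz):
    """Interpret a naive 'YYYY-MM-DD HH:MM:SS' string in timezone `tz`, return Unix seconds."""
    naive = datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=tz).timestamp())


dts = ["1970-01-01 00:00:00", "2012-01-01 12:01:16", "2021-12-30 00:00:00"]
utc = [to_epoch(s, timezone.utc) for s in dts]                          # [0, 1325419276, 1640822400]
utc_plus_3 = [to_epoch(s, timezone(timedelta(hours=3))) for s in dts]  # [-10800, 1325408476, 1640811600]
```

The UTC+3 results match the values reported by the failing assertions.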
docs. Development guide (for demo notebooks)
add current patterns

When model training starts, print the message "model training starts, please wait. See tensorboard to track progress"; use it together with enable_progress=False.

documentation user feedback
opened by ivkireev86 0

Releases (v0.5.1)

v0.5.1 (Dec 28, 2022)
What's Changed

fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90

add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87

MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88

fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

New Contributors

@ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90

@mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1
v0.5.0 (Nov 9, 2022)
What's Changed

Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72

Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73

fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74

ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75

tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76

Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78

spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80

Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81

tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83

Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

New Contributors

@blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0
v0.4.0 (Jul 27, 2022)
What's Changed

Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29

regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36

lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34

Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40

Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42

feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43

Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37

Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45

fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52

Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58

Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

New Contributors

@ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0
v0.3.0 (Jun 12, 2022)
More Pythonic Core API: constructor arguments instead of config objects

What's Changed

cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9

All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15

Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13

all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20

Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8

Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18

loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22

Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23

readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26

work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7

trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0