tsflex is a toolkit for flexible time series processing & feature extraction that is efficient and makes few assumptions about sequence data.
Installation
package manager | command |
---|---|
pip | pip install tsflex |
conda | conda install -c conda-forge tsflex |
Usage
tsflex is built to be intuitive, so we encourage you to copy-paste this code and toy with some parameters!
Feature extraction
import pandas as pd
import numpy as np
import scipy.stats as ss
from tsflex.features import MultipleFeatureDescriptors, FeatureCollection
from tsflex.utils.data import load_empatica_data

# 1. Load sequence-indexed data (in this case a time-index)
df_tmp, df_acc, df_ibi = load_empatica_data(['tmp', 'acc', 'ibi'])

# 2. Construct your feature extraction configuration
fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=[np.min, np.mean, np.std, ss.skew, ss.kurtosis],
        series_names=["TMP", "ACC_x", "ACC_y", "IBI"],
        windows=["15min", "30min"],
        strides="15min",
    )
)

# 3. Extract features
fc.calculate(data=[df_tmp, df_acc, df_ibi], approve_sparsity=True)
Note that the feature extraction is performed on multivariate data with varying sample rates.
signal | columns | sample rate |
---|---|---|
df_tmp | ["TMP"] | 4Hz |
df_acc | ["ACC_x", "ACC_y", "ACC_z"] | 32Hz |
df_ibi | ["IBI"] | irregularly sampled |
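To make the window and stride semantics concrete for irregularly sampled data, here is a minimal pandas-only sketch. It is not how tsflex is implemented: `strided_window_mean` is a hypothetical helper, the sample values are made up, and the left-closed, right-open window boundaries are an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd


def strided_window_mean(series: pd.Series, window: str, stride: str) -> pd.Series:
    """Mean over time-based windows [t, t + window), starting a new window every `stride`.

    Hypothetical helper for illustration only; not part of tsflex.
    """
    win, step = pd.Timedelta(window), pd.Timedelta(stride)
    t, end = series.index[0], series.index[-1]
    out = {}
    while t + win <= end:  # only keep windows that fit fully inside the data
        in_window = (series.index >= t) & (series.index < t + win)
        chunk = series[in_window]
        # Label each window by its end time; an empty window yields NaN.
        out[t + win] = chunk.mean() if len(chunk) else np.nan
        t += step
    return pd.Series(out, name=f"{series.name}_mean_{window}")


# Irregularly sampled series: the gaps between samples are not constant.
idx = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:04", "2024-01-01 00:11",
    "2024-01-01 00:20", "2024-01-01 00:29", "2024-01-01 00:30",
])
ibi = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx, name="IBI")

feats = strided_window_mean(ibi, window="15min", stride="15min")
print(feats)
```

Because the windows are defined in time rather than in number of samples, the same call works regardless of how densely each window happens to be sampled.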
Processing
Why tsflex? ✨
Flexible:
- handles multivariate/multimodal time series
- versatile function support => integrates with many packages for:
  - processing (e.g., scipy.signal, statsmodels.tsa)
  - feature extraction (e.g., numpy, scipy.stats, seglearn¹, tsfresh¹, tsfel¹)
- feature extraction handles multiple strides & window sizes
Efficient:
- view-based operations for processing & feature extraction => extremely low memory peak & fast execution time
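As a generic illustration of what "view-based" means, the NumPy sketch below builds strided windows over a signal without copying any samples. This only shows the general technique; tsflex's internals may differ, and the signal, window length, and stride are made-up values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signal = np.arange(10, dtype=np.float64)  # stand-in for one sensor channel
window, stride = 4, 2  # window length and stride, both in samples

# All length-4 windows as a read-only view; slicing with a step keeps it a view.
windows = sliding_window_view(signal, window_shape=window)[::stride]

# The windows share memory with the original signal, so no sample was copied.
print(np.shares_memory(windows, signal))

# One feature value (here: the mean) per window.
window_means = windows.mean(axis=1)
print(window_means)
```

Only the feature computation itself allocates new memory (one value per window), which is why this pattern keeps the memory peak low.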
Intuitive:
- maintains the sequence-index of the data
- feature extraction constructs interpretable output column names
- intuitive API
Few assumptions about the sequence data:
- no assumptions about sampling rate
- able to deal with multivariate asynchronous data, i.e., data with small time-offsets between the modalities
Advanced functionalities:
- apply FeatureCollection.reduce after feature selection for faster inference
- use function execution time logging to discover processing and feature extraction bottlenecks
- embedded SeriesPipeline & FeatureCollection serialization
- time series chunking
¹ These integrations are shown in integration-example notebooks.
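To illustrate the idea behind time series chunking, here is a minimal pandas-only sketch that splits a time-indexed series wherever two consecutive samples are too far apart. It is not the tsflex chunking API: `chunk_on_gaps` and its `max_gap` parameter are hypothetical names, and the sample data is made up.

```python
import pandas as pd


def chunk_on_gaps(series: pd.Series, max_gap: str) -> list[pd.Series]:
    """Split a time-indexed series wherever consecutive samples are more than `max_gap` apart.

    Hypothetical helper for illustration only; not part of tsflex.
    """
    # True at every sample that follows a gap larger than max_gap (first sample: False).
    gaps = series.index.to_series().diff() > pd.Timedelta(max_gap)
    # The running count of large gaps gives each sample a chunk id.
    chunk_ids = gaps.cumsum().to_numpy()
    return [chunk for _, chunk in series.groupby(chunk_ids)]


idx = pd.to_datetime([
    "2024-01-01 00:00:00", "2024-01-01 00:00:01", "2024-01-01 00:00:02",
    "2024-01-01 00:05:00", "2024-01-01 00:05:01",  # recording resumes after a gap
])
tmp = pd.Series([30.1, 30.2, 30.3, 31.0, 31.1], index=idx, name="TMP")

chunks = chunk_on_gaps(tmp, max_gap="10s")
print([len(chunk) for chunk in chunks])
```

Each chunk keeps its original time index, so downstream processing or feature extraction can run per chunk without spanning the gap.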
Future work 🔨
- scikit-learn integration for both processing and feature extraction
  note: this is actively being developed on the sklearn integration branch.
- Support time series segmentation (exposing the under-the-hood strided-rolling functionality); see this issue
- Support for multi-indexed dataframes
=> Also see the enhancement issues
Contributing 👪
We are thrilled to see your contributions to further enhance tsflex.
See this guide for more instructions on how to contribute.
Referencing our package
If you use tsflex in a scientific publication, we would highly appreciate it if you cite us as:
@article{vanderdonckt2021tsflex,
author = {Van Der Donckt, Jonas and Van Der Donckt, Jeroen and Deprost, Emiel and Van Hoecke, Sofie},
title = {tsflex: flexible time series processing \& feature extraction},
journal = {SoftwareX},
year = {2021},
url = {https://github.com/predict-idlab/tsflex},
publisher={Elsevier}
}
Link to the preprint paper: https://arxiv.org/abs/2111.12429