tsflex - feature-extraction benchmarking
This repository contains the benchmark results and visualization code of the tsflex
paper and toolkit.
Flow
The benchmark process follows these steps for each feature-extraction configuration:
- The corresponding feature-extraction Python script is called. This is done 20 times to average out the memory usage and to establish upper bounds on memory use. Note that because the script is (re)called sequentially, each execution runs in its own process, so no cache or memory is shared among the separate script executions.
- In this script:
  - Load the data and store it as a pd.DataFrame
  - VizTracer starts logging
  - Create the feature extraction configuration
  - Extract & store the features
  - VizTracer stops logging
  - Write the VizTracer results to a JSON file
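The outer loop of the first step can be sketched as follows. This is a minimal illustration, not the repository's actual runner: the function name `run_repeatedly` and the stand-in script `dummy_extraction.py` are hypothetical, and the real feature-extraction scripts (which load data and trace with VizTracer) are not reproduced here. The point it shows is that every run starts a fresh Python interpreter, so nothing held in memory by one run can be reused by the next.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_repeatedly(script: Path, n_runs: int = 20) -> List[int]:
    """Run `script` n_runs times, one after another, each in a new process.

    Returns the exit code of every run, in order.
    """
    exit_codes = []
    for run_idx in range(n_runs):
        # A new interpreter per run: no in-process cache or memory
        # survives from the previous execution.
        result = subprocess.run(
            [sys.executable, str(script), str(run_idx)],
            check=False,
        )
        exit_codes.append(result.returncode)
    return exit_codes


if __name__ == "__main__":
    # Hypothetical stand-in for a feature-extraction script.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "dummy_extraction.py"
        script.write_text("import sys\nprint('run', sys.argv[1])\n")
        print(run_repeatedly(script, n_runs=3))
```

Passing the run index as an argument is one possible way to let each execution write its results to a distinct JSON file; how the repository names its output files is not specified here.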
The existing benchmark JSONs were collected on a desktop with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz and SAMSUNG M393B1G73QH0-CMA DDR3 1600MT/s RAM, with Ubuntu 18.04.5 LTS x86_64 as the operating system. Other running processes were kept to a minimum.
Instructions
To install the required dependencies, just run:
pip install -r requirements.txt
If you want to re-run the benchmarks, use the run_scripts notebook to generate new benchmark JSONs and then visualize them with the benchmark visualization notebook.
We are open to new benchmark use cases via pull requests!
Examples of other interesting benchmarks are different sample rates, other feature extraction functions, other data properties, ...
Referencing our package
If you use tsflex in a scientific publication, we would highly appreciate it if you cite us as:
@article{vanderdonckt2021tsflex,
author = {Van Der Donckt, Jonas and Van Der Donckt, Jeroen and Deprost, Emiel and Van Hoecke, Sofie},
title = {tsflex: flexible time series processing \& feature extraction},
journal = {SoftwareX},
year = {2021},
url = {https://github.com/predict-idlab/tsflex},
publisher={Elsevier}
}