A Python package to preprocess time series

Maximilian Christ

Last update: Dec 17, 2022

Related tags

Machine Learning tspreprocess

Overview

Disclaimer: This package is WIP. Do not take any APIs for granted.

tspreprocess

Time series can contain noise, may be sampled at an unsuitable rate, or may simply need to be compressed. tspreprocess is a library for such preprocessing tasks. It contains tools to transform and clean time series data for better analysis.

In detail, we are planning to add methods for

Denoising
Compression
Resampling
...

Our goal is to make this the most comprehensive time series preprocessing library.
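To make the three planned task types concrete, here is a minimal sketch in plain Python. It does not use the tspreprocess API (which is still WIP); the function names `denoise`, `compress_by_bins` and `downsample` are hypothetical and chosen only for illustration.

```python
from statistics import mean


def denoise(values, window=3):
    """Smooth with a centered moving average; the edges use a shorter window."""
    half = window // 2
    return [
        mean(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def compress_by_bins(values, bin_size):
    """Replace each block of bin_size consecutive samples by its mean."""
    return [
        mean(values[i:i + bin_size])
        for i in range(0, len(values), bin_size)
    ]


def downsample(values, step):
    """Resample to a lower rate by keeping every step-th sample."""
    return values[::step]


# Made-up signal with one outlier at index 3.
signal = [1.0, 1.2, 0.9, 5.0, 1.1, 1.0, 0.8, 1.1]

print(denoise(signal))
print(compress_by_bins(signal, bin_size=4))
print(downsample(signal, step=2))
```

A real implementation would likely work on pandas DataFrames and offer several filters and aggregation functions; the sketch only shows the shape of each operation.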

Installation

Clone the repo, cd into it, and install it locally with pip:

git clone https://github.com/MaxBenChrist/tspreprocess
cd tspreprocess
pip install -e .

You can run the test suite with:

python setup.py test

Relation to tsfresh

This package will be based on the data formats of the Python feature extraction package tsfresh (https://github.com/blue-yonder/tsfresh), allowing seamless integration between both packages.
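As an illustration of what such a shared data format can look like, the sketch below builds a "flat" pandas DataFrame with one column identifying the series, one column giving the order of the samples, and one value column. The column names `id`, `time` and `value` and all numbers are assumptions made up for this example; which formats tspreprocess will actually accept is not specified here.

```python
import pandas as pd

# Two short series (id 0 and id 1), three samples each.
# Column names and values are made up for illustration.
df = pd.DataFrame({
    "id":    [0, 0, 0, 1, 1, 1],
    "time":  [0, 1, 2, 0, 1, 2],
    "value": [0.1, 0.4, 0.3, 1.2, 1.1, 1.5],
})

# Each id identifies one time series; "time" defines the order within it.
series_by_id = {
    series_id: group.sort_values("time")["value"].tolist()
    for series_id, group in df.groupby("id")
}
print(series_by_id)

# With tsfresh installed, a frame of this shape can be passed on like this:
# from tsfresh import extract_features
# features = extract_features(df, column_id="id", column_sort="time")
```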

Comments

Replace "time" with "bin"
In the prior setting, sorting is done lexicographically (i.e. bin_1, then bin_10, bin_11 and so on), which is not useful.

By using eval, the bin number is treated as a number and not as a string.

Replaced the prefix feature with map.
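The difference between the two orderings can be sketched in a few lines. This is not the code from the pull request: instead of eval, the sketch parses the number with float(), which is a safer swap-in for the same idea. The helper name `bin_number` is hypothetical.

```python
# Labels in the style produced by the compression step.
labels = ["bin_0.0", "bin_1.0", "bin_10.0", "bin_100.0", "bin_11.0", "bin_2.0"]


def bin_number(label):
    """Return the numeric part of a label such as 'bin_10.0'."""
    return float(label.split("_", 1)[1])


# Plain string sorting compares character by character,
# so "bin_10.0" and "bin_100.0" come before "bin_2.0".
lexicographic = sorted(labels)

# Sorting with a numeric key gives the intended bin order.
numeric = sorted(labels, key=bin_number)

print(lexicographic)
print(numeric)
```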
opened by nikhase 0
will it work for multivariate time series prediction both regression and classification
Great code, thanks. Could you clarify: will it work for multivariate time series prediction, both regression and classification?

1. Where all values are continuous:

   weight  height  age  target
1  56      160     34   1.2
2  77      170     54   3.5
3  87      167     43   0.7
4  55      198     72   0.5
5  88      176     32   2.3

2. Or will it even work for multivariate time series where the values are a mixture of continuous and categorical values, for example 2 dimensions with continuous values and 3 dimensions with categorical values?

   color   weight  gender  height  age  target
1  black   56      m       160     34   yes
2  white   77      f       170     54   no
3  yellow  87      m       167     43   yes
4  white   55      m       198     72   no
5  white   88      f       176     32   yes
opened by Sandy4321 0
Project status?

I am looking to add a robust preprocessing library to a library that I work on, matrixprofile. Is this library dead? Did someone else start working on it in a different fork?

opened by tylerwmarrs 0
Lexicographical sort of column "time" after compression

The "time" column shows bins and is encoded as bin_0.0. This makes it hard to sort by the column and to make plots. What about renaming "time" to "bin" and providing bin numbers?

In general, one would like to pass the dataframe to tsfresh, so the "time" column should be ordered accordingly.

id | feature_agg_autocorrelation_f_agg_"mean" | feature_agg_autocorrelation_f_agg_"median" | feature_agg_autocorrelation_f_agg_"var" | time
-- | -- | -- | -- | --
0 | -0.006695 | -0.031946 | 0.031041 | bin_0.0
0 | 0.003307 | 0.002723 | 0.015377 | bin_1.0
0 | -0.019875 | -0.020356 | 0.016519 | bin_10.0
0 | -0.010753 | -0.026369 | 0.021735 | bin_100.0
0 | 0.011816 | 0.019509 | 0.010336 | bin_101.0
0 | -0.012836 | -0.012418 | 0.038740 | bin_102.0
0 | -0.013034 | -0.008422 | 0.008983 | bin_103.0
0 | -0.015615 | -0.015442 | 0.022139 | bin_104.0
0 | -0.011075 | 0.006340 | 0.018839 | bin_105.0
0 | -0.012528 | -0.002204 | 0.014608 | bin_106.0
0 | 0.003264 | -0.012552 | 0.012001 | bin_107.0
0 | -0.008267 | -0.013056 | 0.031777 | bin_108.0
0 | -0.014031 | -0.026050 | 0.011954 | bin_109.0
0 | -0.027372 | -0.028189 | 0.012125 | bin_11.0
0 | -0.006538 | -0.016846 | 0.020991 | bin_110.0
0 | 0.028912 | -0.002320 | 0.018458 | bin_111.0
0 | -0.011757 | -0.021368 | 0.040606 | bin_112.0
0 | -0.014773 | -0.022101 | 0.013958 | bin_113.0
0 | -0.010944 | -0.001797 | 0.028481 | bin_114.0
0 | -0.016143 | -0.028406 | 0.007117 | bin_115.0
0 | -0.013865 | -0.021711 | 0.011233 | bin_116.0
0 | -0.009488 | 0.007354 | 0.008971 | bin_117.0
0 | -0.014187 | -0.017223 | 0.044131 | bin_118.0
0 | -0.013005 | -0.005250 | 0.011614 | bin_119.0
0 | -0.011601 | 0.010453 | 0.016970 | bin_12.0
0 | -0.012738 | -0.004333 | 0.012729 | bin_120.0
0 | -0.013266 | -0.016564 | 0.007020 | bin_121.0
0 | -0.015038 | -0.042097 | 0.024701 | bin_122.0
0 | -0.012776 | -0.004399 | 0.016492 | bin_123.0
0 | -0.012934 | -0.018298 | 0.017719 | bin_124.0
... | ... | ... | ... | ...
9 | -0.017292 | -0.010434 | 0.007727 | bin_72.0
9 | -0.009239 | 0.000410 | 0.007263 | bin_73.0
9 | -0.050343 | -0.035553 | 0.016307 | bin_74.0
9 | -0.016550 | -0.019668 | 0.007808 | bin_75.0
9 | -0.015879 | -0.034310 | 0.014253 | bin_76.0
9 | -0.019754 | -0.037949 | 0.018174 | bin_77.0
9 | -0.016839 | -0.005070 | 0.016695 | bin_78.0
9 | -0.015295 | -0.005584 | 0.012654 | bin_79.0
9 | -0.015647 | -0.016262 | 0.008907 | bin_8.0
9 | -0.010676 | -0.014450 | 0.010222 | bin_80.0
9 | -0.003566 | 0.010439 | 0.009648 | bin_81.0
9 | 0.008290 | 0.015121 | 0.009266 | bin_82.0
9 | -0.004448 | -0.014874 | 0.007668 | bin_83.0
9 | -0.012481 | -0.017615 | 0.012226 | bin_84.0
9 | -0.018334 | -0.007268 | 0.009883 | bin_85.0
9 | -0.017429 | -0.029421 | 0.009856 | bin_86.0
9 | -0.000159 | 0.010534 | 0.008968 | bin_87.0
9 | -0.003924 | -0.022100 | 0.018910 | bin_88.0
9 | 0.008415 | 0.019052 | 0.020014 | bin_89.0
9 | -0.012393 | -0.000086 | 0.010260 | bin_9.0
9 | 0.006285 | 0.020495 | 0.012573 | bin_90.0
9 | -0.010193 | -0.008106 | 0.008721 | bin_91.0
9 | -0.016792 | -0.009178 | 0.012188 | bin_92.0
9 | 0.008476 | 0.020195 | 0.010278 | bin_93.0
9 | 0.005893 | 0.007117 | 0.008789 | bin_94.0
9 | -0.008254 | -0.010829 | 0.017784 | bin_95.0
9 | 0.004660 | 0.014164 | 0.009694 | bin_96.0
9 | 0.011764 | -0.004501 | 0.010030 | bin_97.0
9 | -0.017136 | -0.026493 | 0.011077 | bin_98.0
9 | 0.013644 | 0.033041 | 0.008518 | bin_99.0
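One possible way to implement the renaming proposed in this issue is sketched below with pandas. It is an assumption about how a fix could look, not the package's actual behavior. The frame reuses five rows for id 0 from the table above; the short column name `autocorr_mean` is a hypothetical stand-in for the long feature name.

```python
import pandas as pd

# Five rows for id 0 taken from the table in this issue,
# in the lexicographic order in which they were reported.
df = pd.DataFrame({
    "id": [0, 0, 0, 0, 0],
    "autocorr_mean": [-0.006695, 0.003307, -0.019875, -0.010753, -0.027372],
    "time": ["bin_0.0", "bin_1.0", "bin_10.0", "bin_100.0", "bin_11.0"],
})

# Strip the "bin_" prefix and store the bin index as an integer column.
df["bin"] = (
    df["time"].str.replace("bin_", "", regex=False).astype(float).astype(int)
)

# Drop the string column and sort each series by its numeric bin index.
df = df.drop(columns="time").sort_values(["id", "bin"]).reset_index(drop=True)
print(df)
```

With an integer `bin` column, sorting and plotting follow the real bin order, and the column can serve as the sort column when the frame is handed to tsfresh.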

opened by nikhase 3
Change import of compress

from tspreprocess.compress.compress import compress looks very redundant.

I would prefer from tspreprocess.compress import compress. One could get rid of the intermediate folders and have all files on the 2nd level?

One could also think of renaming the compression function such that it does not coincide with the module name, e.g. to compress_ts.
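A common way to get the shorter import without moving files is to re-export the function in the subpackage's `__init__.py`. The sketch below demonstrates this on a throwaway package written to a temporary directory. The package name `demo_tsprep` and the function body are hypothetical; only the layout mirrors the one discussed in this issue.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the nested layout:
#   demo_tsprep/compress/compress.py  defines compress_ts
root = Path(tempfile.mkdtemp())
subpackage = root / "demo_tsprep" / "compress"
subpackage.mkdir(parents=True)
(root / "demo_tsprep" / "__init__.py").write_text("")
(subpackage / "compress.py").write_text(
    "def compress_ts(values, bin_size):\n"
    "    # Placeholder body: mean of each block of bin_size samples.\n"
    "    return [sum(values[i:i + bin_size]) / len(values[i:i + bin_size])\n"
    "            for i in range(0, len(values), bin_size)]\n"
)

# The re-export: one line in demo_tsprep/compress/__init__.py.
(subpackage / "__init__.py").write_text("from .compress import compress_ts\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The short import now works; the long one keeps working as well.
from demo_tsprep.compress import compress_ts

print(compress_ts([1, 2, 3, 4], 2))
```

Renaming the function to compress_ts, as suggested above, also avoids a name clash: if the function were still called compress, the re-export would make `demo_tsprep.compress.compress` refer to the function as an attribute of the package, shadowing the submodule of the same name.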

opened by nikhase 1

Owner

Maximilian Christ

Follow me on twitter: https://twitter.com/MaxBenChrist

GitHub

A data preprocessing package for time series data. Design for machine learning and deep learning.

152 Jan 7, 2023

Open source time series library for Python

PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible array

2k Jan 2, 2023

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

1.3k Dec 22, 2022

Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

3.3k Jan 3, 2023

A python library for easy manipulation and forecasting of time series.

Time Series Made Easy in Python darts is a python library for easy manipulation and forecasting of time series. It contains a variety of models, from

5.2k Jan 4, 2023

STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks

STUMPY STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of tim

2.5k Jan 6, 2023

Python module for machine learning time series:

seglearn Seglearn is a python package for machine learning time series or sequences. It provides an integrated pipeline for segmentation, feature extr

536 Dec 29, 2022

A Python toolkit for rule-based/unsupervised anomaly detection in time series

Anomaly Detection Toolkit (ADTK) Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection. As

888 Dec 30, 2022

AtsPy: Automated Time Series Models in Python (by @firmai)

Automated Time Series Models in Python (AtsPy) SSRN Report Easily develop state of the art time series models to forecast univariate data series. Simp

465 Jan 2, 2023

A python library for Bayesian time series modeling

PyDLM Welcome to pydlm, a flexible time series modeling library for python. This library is based on the Bayesian dynamic linear model (Harrison and W

438 Dec 17, 2022

A Python implementation of GRAIL, a generic framework to learn compact time series representations.

GRAIL A Python implementation of GRAIL, a generic framework to learn compact time series representations. Requirements Python 3.6+ numpy scipy tslearn

3 Nov 24, 2021

PyPOTS - A Python Toolbox for Data Mining on Partially-Observed Time Series

A python toolbox/library for data mining on partially-observed time series, supporting tasks of forecasting/imputation/classification/clustering on incomplete multivariate time series with missing values.

179 Dec 31, 2022

A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

2.3k Jan 5, 2023

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

15.4k Jan 7, 2023

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Auto_TS: Auto_TimeSeries Automatically build multiple Time Series models using a Single Line of Code. Now updated with Dask. Auto_timeseries is a comp

519 Jan 3, 2023
