A Python package to preprocess time series

Overview

Disclaimer: This package is WIP. Do not take any APIs for granted.

tspreprocess

Time series can contain noise, may be sampled at an unsuitable rate, or may simply need to be compressed. tspreprocess is a library for such preprocessing tasks. It contains tools to transform and clean time series data so that later analyses work better.

In detail, we plan to add methods for

  • Denoising
  • Compression
  • Resampling
  • ...

Our goal is to make this the most comprehensive time series preprocessing library.
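
Since the tspreprocess API is still in flux, the three tasks listed above are sketched here with plain pandas and NumPy rather than with tspreprocess functions. The synthetic signal, the window size, the target rate and the bin size are all assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic noisy sine wave sampled once per second (illustrative data only).
rng = np.random.default_rng(0)
index = pd.date_range("2023-01-01", periods=600, freq="s")
values = np.sin(np.linspace(0, 6 * np.pi, 600)) + rng.normal(0, 0.3, 600)
signal = pd.Series(values, index=index)

# Denoising: a centered rolling median suppresses short spikes.
denoised = signal.rolling(window=15, center=True, min_periods=1).median()

# Resampling: go from 1-second to 10-second sampling by averaging each interval.
resampled = denoised.resample("10s").mean()

# Compression: summarize consecutive bins of 60 samples by their mean and variance.
bin_ids = np.arange(len(signal)) // 60
compressed = signal.groupby(bin_ids).agg(["mean", "var"])

print(resampled.head())
print(compressed.head())
```

This is not how tspreprocess implements these steps; it only shows what each task means on a concrete series.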

Installation

Clone the repo, cd into it, and install it locally with pip:

git clone https://github.com/MaxBenChrist/tspreprocess
cd tspreprocess
pip install -e .

You can run the test suite with

python setup.py test

Relation to tsfresh

This package will be based on the data formats of the Python feature extraction package tsfresh (https://github.com/blue-yonder/tsfresh), allowing seamless integration between both packages.
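
As a rough illustration of what such a data format looks like, the sketch below builds a small DataFrame in the flat layout that tsfresh accepts (one id column, one sort column, one column per measured quantity) and converts it to the long layout with pandas. The column names "id", "time", "x", "y", "kind" and "value" are assumptions made for this example, not names required by tspreprocess.

```python
import pandas as pd

# Flat layout: two series (ids 0 and 1), three time steps each, two measured quantities.
flat_df = pd.DataFrame({
    "id":   [0, 0, 0, 1, 1, 1],
    "time": [0, 1, 2, 0, 1, 2],
    "x":    [0.1, 0.4, 0.3, 1.2, 1.1, 0.9],
    "y":    [5.0, 5.2, 4.9, 7.1, 7.0, 7.3],
})

# Long layout: one row per (id, time, kind) with a single value column.
long_df = flat_df.melt(id_vars=["id", "time"], var_name="kind", value_name="value")

# With tsfresh installed, features could then be extracted from the flat layout via:
#   from tsfresh import extract_features
#   features = extract_features(flat_df, column_id="id", column_sort="time")
print(long_df)
```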

Comments
  • Replace "time" with "bin"

    • In the prior setting, sorting was done lexicographically (i.e. bin_1, then bin_10, bin_11, and so on), which is not useful

    • With eval, the bin number is treated as a number and not as a string

    • Replaced the prefix feature with map

    opened by nikhase 0
  • Will it work for multivariate time series prediction, both regression and classification?

    Great code, thanks. Could you clarify: will it work for multivariate time series prediction, both regression and classification?

    1. Where all values are continuous:

           weight  height  age  target
        1  56      160     34   1.2
        2  77      170     54   3.5
        3  87      167     43   0.7
        4  55      198     72   0.5
        5  88      176     32   2.3

    2. Or will it even work for multivariate time series whose values are a mixture of continuous and categorical values, for example 2 dimensions with continuous values and 3 dimensions with categorical values:

           color   weight  gender  height  age  target
        1  black   56      m       160     34   yes
        2  white   77      f       170     54   no
        3  yellow  87      m       167     43   yes
        4  white   55      m       198     72   no
        5  white   88      f       176     32   yes

    opened by Sandy4321 0
  • Project status?

    I am looking to add a robust preprocessing library to a library that I work on, matrixprofile. Is this library dead? Did someone else start working on it in a different fork?

    opened by tylerwmarrs 0
  • Lexicographical sort of column "time" after compression

    The "time" column holds bins, encoded as labels such as bin_0.0. This makes it hard to sort by the column and to plot the result. What about renaming "time" to "bin" and providing bin numbers?

    In general, one would like to pass the dataframe to tsfresh, so the "time" column should be ordered accordingly.

    id | feature_agg_autocorrelation_f_agg_"mean" | feature_agg_autocorrelation_f_agg_"median" | feature_agg_autocorrelation_f_agg_"var" | time
    -- | -- | -- | -- | --
    0 | -0.006695 | -0.031946 | 0.031041 | bin_0.0
    0 | 0.003307 | 0.002723 | 0.015377 | bin_1.0
    0 | -0.019875 | -0.020356 | 0.016519 | bin_10.0
    0 | -0.010753 | -0.026369 | 0.021735 | bin_100.0
    0 | 0.011816 | 0.019509 | 0.010336 | bin_101.0
    0 | -0.012836 | -0.012418 | 0.038740 | bin_102.0
    0 | -0.013034 | -0.008422 | 0.008983 | bin_103.0
    0 | -0.015615 | -0.015442 | 0.022139 | bin_104.0
    0 | -0.011075 | 0.006340 | 0.018839 | bin_105.0
    0 | -0.012528 | -0.002204 | 0.014608 | bin_106.0
    0 | 0.003264 | -0.012552 | 0.012001 | bin_107.0
    0 | -0.008267 | -0.013056 | 0.031777 | bin_108.0
    0 | -0.014031 | -0.026050 | 0.011954 | bin_109.0
    0 | -0.027372 | -0.028189 | 0.012125 | bin_11.0
    0 | -0.006538 | -0.016846 | 0.020991 | bin_110.0
    0 | 0.028912 | -0.002320 | 0.018458 | bin_111.0
    0 | -0.011757 | -0.021368 | 0.040606 | bin_112.0
    0 | -0.014773 | -0.022101 | 0.013958 | bin_113.0
    0 | -0.010944 | -0.001797 | 0.028481 | bin_114.0
    0 | -0.016143 | -0.028406 | 0.007117 | bin_115.0
    0 | -0.013865 | -0.021711 | 0.011233 | bin_116.0
    0 | -0.009488 | 0.007354 | 0.008971 | bin_117.0
    0 | -0.014187 | -0.017223 | 0.044131 | bin_118.0
    0 | -0.013005 | -0.005250 | 0.011614 | bin_119.0
    0 | -0.011601 | 0.010453 | 0.016970 | bin_12.0
    0 | -0.012738 | -0.004333 | 0.012729 | bin_120.0
    0 | -0.013266 | -0.016564 | 0.007020 | bin_121.0
    0 | -0.015038 | -0.042097 | 0.024701 | bin_122.0
    0 | -0.012776 | -0.004399 | 0.016492 | bin_123.0
    0 | -0.012934 | -0.018298 | 0.017719 | bin_124.0
    ... | ... | ... | ... | ...
    9 | -0.017292 | -0.010434 | 0.007727 | bin_72.0
    9 | -0.009239 | 0.000410 | 0.007263 | bin_73.0
    9 | -0.050343 | -0.035553 | 0.016307 | bin_74.0
    9 | -0.016550 | -0.019668 | 0.007808 | bin_75.0
    9 | -0.015879 | -0.034310 | 0.014253 | bin_76.0
    9 | -0.019754 | -0.037949 | 0.018174 | bin_77.0
    9 | -0.016839 | -0.005070 | 0.016695 | bin_78.0
    9 | -0.015295 | -0.005584 | 0.012654 | bin_79.0
    9 | -0.015647 | -0.016262 | 0.008907 | bin_8.0
    9 | -0.010676 | -0.014450 | 0.010222 | bin_80.0
    9 | -0.003566 | 0.010439 | 0.009648 | bin_81.0
    9 | 0.008290 | 0.015121 | 0.009266 | bin_82.0
    9 | -0.004448 | -0.014874 | 0.007668 | bin_83.0
    9 | -0.012481 | -0.017615 | 0.012226 | bin_84.0
    9 | -0.018334 | -0.007268 | 0.009883 | bin_85.0
    9 | -0.017429 | -0.029421 | 0.009856 | bin_86.0
    9 | -0.000159 | 0.010534 | 0.008968 | bin_87.0
    9 | -0.003924 | -0.022100 | 0.018910 | bin_88.0
    9 | 0.008415 | 0.019052 | 0.020014 | bin_89.0
    9 | -0.012393 | -0.000086 | 0.010260 | bin_9.0
    9 | 0.006285 | 0.020495 | 0.012573 | bin_90.0
    9 | -0.010193 | -0.008106 | 0.008721 | bin_91.0
    9 | -0.016792 | -0.009178 | 0.012188 | bin_92.0
    9 | 0.008476 | 0.020195 | 0.010278 | bin_93.0
    9 | 0.005893 | 0.007117 | 0.008789 | bin_94.0
    9 | -0.008254 | -0.010829 | 0.017784 | bin_95.0
    9 | 0.004660 | 0.014164 | 0.009694 | bin_96.0
    9 | 0.011764 | -0.004501 | 0.010030 | bin_97.0
    9 | -0.017136 | -0.026493 | 0.011077 | bin_98.0
    9 | 0.013644 | 0.033041 | 0.008518 | bin_99.0

    opened by nikhase 3
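
The ordering problem reported in the issue above can be reproduced with a handful of labels. The sketch below contrasts the default string sort with a sort on the numeric part of each label; the helper `bin_number` is a hypothetical name, and parsing with `float` is an assumption used here in place of `eval`.

```python
# A few bin labels in the format shown in the issue (illustrative subset).
labels = ["bin_0.0", "bin_1.0", "bin_10.0", "bin_100.0", "bin_2.0", "bin_11.0"]


def bin_number(label: str) -> float:
    """Return the numeric part of a label such as 'bin_10.0'."""
    return float(label.split("_", 1)[1])


# Default sort compares strings character by character.
lexicographic = sorted(labels)

# Sorting on the parsed number restores the intended order.
numeric = sorted(labels, key=bin_number)

print(lexicographic)
# ['bin_0.0', 'bin_1.0', 'bin_10.0', 'bin_100.0', 'bin_11.0', 'bin_2.0']
print(numeric)
# ['bin_0.0', 'bin_1.0', 'bin_2.0', 'bin_10.0', 'bin_11.0', 'bin_100.0']
```

Storing the bin as a number in its own column, as the issue proposes, would make the key function unnecessary.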
  • Change import of compress

    from tspreprocess.compress.compress import compress looks very redundant.

    I would prefer from tspreprocess.compress import compress. One could get rid of the intermediate folders and have all files on the 2nd level?

    One could also think of renaming the compression function such that it does not coincide with the module name, e.g. to compress_ts.

    opened by nikhase 1
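
One common way to get the shorter import path requested in the last issue is to re-export the function from the subpackage's `__init__.py`. The sketch below builds a throwaway package on disk to demonstrate this; `demo_pkg` and the body of `compress_ts` are hypothetical stand-ins, not the actual tspreprocess layout or implementation.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary package: demo_pkg/compress/compress.py holds the implementation.
root = Path(tempfile.mkdtemp())
subpackage = root / "demo_pkg" / "compress"
subpackage.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(subpackage / "compress.py").write_text(
    "def compress_ts(values, bin_size):\n"
    "    # Average consecutive chunks of bin_size values (toy implementation).\n"
    "    chunks = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]\n"
    "    return [sum(chunk) / len(chunk) for chunk in chunks]\n"
)

# The subpackage __init__.py re-exports the function one level up.
(subpackage / "__init__.py").write_text("from .compress import compress_ts\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Short import path instead of demo_pkg.compress.compress.
from demo_pkg.compress import compress_ts

result = compress_ts([1, 2, 3, 4, 5, 6], bin_size=2)
print(result)  # [1.5, 3.5, 5.5]
```

Giving the function a name that differs from the module, such as `compress_ts`, also avoids the re-exported function shadowing the `compress` submodule on the package.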
Owner
Maximilian Christ
Follow me on twitter: https://twitter.com/MaxBenChrist
A data preprocessing package for time series data. Design for machine learning and deep learning.

Allen Chiang 152 Jan 7, 2023
Open source time series library for Python

PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible array

Ross Taylor 2k Jan 2, 2023
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Dec 22, 2022
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Amazon Web Services - Labs 3.3k Jan 3, 2023
A python library for easy manipulation and forecasting of time series.

Time Series Made Easy in Python darts is a python library for easy manipulation and forecasting of time series. It contains a variety of models, from

Unit8 5.2k Jan 4, 2023
STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks

STUMPY STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of tim

TD Ameritrade 2.5k Jan 6, 2023
Python module for machine learning time series:

seglearn Seglearn is a python package for machine learning time series or sequences. It provides an integrated pipeline for segmentation, feature extr

David Burns 536 Dec 29, 2022
A Python toolkit for rule-based/unsupervised anomaly detection in time series

Anomaly Detection Toolkit (ADTK) Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection. As

Arundo Analytics 888 Dec 30, 2022
AtsPy: Automated Time Series Models in Python (by @firmai)

Automated Time Series Models in Python (AtsPy) SSRN Report Easily develop state of the art time series models to forecast univariate data series. Simp

Derek Snow 465 Jan 2, 2023
A python library for Bayesian time series modeling

PyDLM Welcome to pydlm, a flexible time series modeling library for python. This library is based on the Bayesian dynamic linear model (Harrison and W

Sam 438 Dec 17, 2022
A Python implementation of GRAIL, a generic framework to learn compact time series representations.

GRAIL A Python implementation of GRAIL, a generic framework to learn compact time series representations. Requirements Python 3.6+ numpy scipy tslearn

null 3 Nov 24, 2021
PyPOTS - A Python Toolbox for Data Mining on Partially-Observed Time Series

A python toolbox/library for data mining on partially-observed time series, supporting tasks of forecasting/imputation/classification/clustering on incomplete multivariate time series with missing values.

Wenjie Du 179 Dec 31, 2022
A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

null 2.3k Jan 5, 2023
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Facebook 15.4k Jan 7, 2023
Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

Blue Yonder GmbH 7k Jan 6, 2023
A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

The Alan Turing Institute 6k Jan 6, 2023
Time series forecasting with PyTorch

Our article on Towards Data Science introduces the package and provides background information. Pytorch Forecasting aims to ease state-of-the-art time

Jan Beitner 2.5k Jan 2, 2023
Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.

Auto_TS: Auto_TimeSeries Automatically build multiple Time Series models using a Single Line of Code. Now updated with Dask. Auto_timeseries is a comp

AutoViz and Auto_ViML 519 Jan 3, 2023