A python library for time-series smoothing and outlier detection in a vectorized way.

Marco Cerliani

Last update: Dec 28, 2022

Related tags

Deep Learning bootstrap timeseries time-series smoothing outlier-detection bootstrapping-statistics outlier-removal series-smoothing

Overview

tsmoothie

tsmoothie computes, in a fast and efficient way, the smoothing of single or multiple time-series.

The smoothing techniques available are:

Exponential Smoothing
Convolutional Smoothing with various window types (constant, hanning, hamming, bartlett, blackman)
Spectral Smoothing with Fourier Transform
Polynomial Smoothing
Spline Smoothing of various kinds (linear, cubic, natural cubic)
Gaussian Smoothing
Binner Smoothing
LOWESS
Seasonal Decompose Smoothing of various kinds (convolution, lowess, natural cubic spline)
Kalman Smoothing with customizable components (level, trend, seasonality, long seasonality)
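
As a rough illustration of what "vectorized" means for one of the techniques above, the sketch below applies a constant-window convolution smoother to several series at once using only NumPy. The function name `convolution_smooth`, the odd window length, and the reflect padding at the edges are assumptions made for this example; it is not tsmoothie's implementation.

```python
import numpy as np

def convolution_smooth(data, window_len=5):
    """Smooth each row of `data` with a constant (moving-average) window.

    Hypothetical helper for illustration; assumes an odd `window_len`.
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))   # (n_series, timesteps)
    kernel = np.ones(window_len) / window_len
    pad = window_len // 2
    # Reflect-pad the edges so the output keeps the input length.
    padded = np.pad(data, ((0, 0), (pad, pad)), mode="reflect")
    # Gather every sliding window with index arithmetic, then average
    # all windows of all series with a single matrix product.
    idx = np.arange(data.shape[1])[:, None] + np.arange(window_len)[None, :]
    return padded[:, idx] @ kernel

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=(3, 200)), axis=1)     # 3 random walks
smoothed = convolution_smooth(series, window_len=7)       # shape (3, 200)
```

No Python loop runs over series or timesteps; the whole batch is handled by array operations.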

tsmoothie also computes intervals as a result of the smoothing process. These can be useful for identifying outliers and anomalies in time-series.

Depending on the smoothing method used, the interval types available are:

sigma intervals
confidence intervals
prediction intervals
kalman intervals
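
To make the link between intervals and outlier detection concrete, here is a minimal NumPy sketch of a sigma-style band. It assumes the band is the smoothed series plus or minus `n_sigma` times the per-series standard deviation of the residuals. That formula and the helper name `sigma_interval` are simplifying assumptions for illustration and may differ from what the library computes.

```python
import numpy as np

def sigma_interval(data, smooth, n_sigma=2.0):
    """Return (low, up) bands of n_sigma * residual std around `smooth`."""
    resid = data - smooth
    std = resid.std(axis=1, keepdims=True)      # one spread estimate per series
    return smooth - n_sigma * std, smooth + n_sigma * std

data = np.array([[1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 0.9, 1.1, 1.0, 1.0]])
smooth = np.full_like(data, np.median(data))    # stand-in for any smoother output
low, up = sigma_interval(data, smooth, n_sigma=2.0)

# A point is flagged when it falls outside the band.
outliers = (data < low) | (data > up)
```

Here only the spike at index 4 lies outside the band, so it is the only point flagged.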

tsmoothie can carry out a sliding smoothing approach to simulate online usage. This is done by splitting the time-series into equal-sized pieces and smoothing them independently. As always, this functionality is implemented in a vectorized way through the WindowWrapper class.
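
The splitting step can be pictured with plain NumPy indexing. The sketch below builds every fixed-length window of a series as a row of a 2-D array, so a row-wise smoother can process all windows in one call. `sliding_windows` is a hypothetical helper, and a per-window mean stands in for the smoother; this is not the WindowWrapper code.

```python
import numpy as np

def sliding_windows(series, window_len):
    """Return every length-`window_len` slice of a 1-D series as rows."""
    series = np.asarray(series)
    n_windows = series.shape[0] - window_len + 1
    # Row i holds the indices i, i+1, ..., i+window_len-1.
    idx = np.arange(n_windows)[:, None] + np.arange(window_len)[None, :]
    return series[idx]

series = np.arange(10.0)
windows = sliding_windows(series, window_len=4)   # shape (7, 4)

# Each row only sees the data available up to its last timestep,
# which is what simulates online usage.
online_estimate = windows.mean(axis=1)
```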

tsmoothie can perform time-series bootstrapping through the BootstrappingWrapper class.

The supported bootstrap algorithms are:

non-overlapping block bootstrap
moving block bootstrap
circular block bootstrap
stationary bootstrap
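
For readers unfamiliar with block bootstrapping, the following is a minimal NumPy sketch of the moving block variant. It assumes the usual recipe of resampling residuals around a smoothed series and adding them back; the helper name `moving_block_bootstrap` and the sine curve used as the smoothed series are illustrative assumptions, not the BootstrappingWrapper internals.

```python
import numpy as np

def moving_block_bootstrap(resid, block_length, n_samples, rng):
    """Resample a 1-D series by concatenating random overlapping blocks."""
    resid = np.asarray(resid)
    n = resid.shape[0]
    n_blocks = -(-n // block_length)                      # ceil division
    # Random start of each block; blocks may overlap in the original series.
    starts = rng.integers(0, n - block_length + 1, size=(n_samples, n_blocks))
    idx = starts[:, :, None] + np.arange(block_length)[None, None, :]
    # Flatten the blocks of each sample and trim to the original length.
    return resid[idx.reshape(n_samples, -1)[:, :n]]

rng = np.random.default_rng(123)
t = np.arange(300)
signal = np.sin(2 * np.pi * t / 24)            # stand-in for a smoothed series
data = signal + rng.normal(scale=0.3, size=t.size)

# Bootstrapped series = smooth component + resampled residual blocks.
samples = signal + moving_block_bootstrap(data - signal, 24, 100, rng)
```

Sampling whole blocks rather than single points keeps the short-range dependence of the residuals inside each block.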

Installation

pip install --upgrade tsmoothie

The module depends only on NumPy, SciPy and simdkalman. Python 3.6 or above is supported.

Usage: smoothing

Below are a couple of examples of how tsmoothie works. Full examples are available in the notebooks folder.

# import libraries
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.utils_func import sim_randomwalk
from tsmoothie.smoother import LowessSmoother

# generate 3 random walks of length 200
np.random.seed(123)
data = sim_randomwalk(n_series=3, timesteps=200, 
                      process_noise=10, measure_noise=30)

# operate smoothing
smoother = LowessSmoother(smooth_fraction=0.1, iterations=1)
smoother.smooth(data)

# generate intervals
low, up = smoother.get_intervals('prediction_interval')

# plot the smoothed timeseries with intervals
plt.figure(figsize=(18,5))

for i in range(3):
    
    plt.subplot(1,3,i+1)
    plt.plot(smoother.smooth_data[i], linewidth=3, color='blue')
    plt.plot(smoother.data[i], '.k')
    plt.title(f"timeseries {i+1}"); plt.xlabel('time')

    plt.fill_between(range(len(smoother.data[i])), low[i], up[i], alpha=0.3)

# import libraries
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.utils_func import sim_seasonal_data
from tsmoothie.smoother import DecomposeSmoother

# generate 3 periodic timeseries of length 300
np.random.seed(123)
data = sim_seasonal_data(n_series=3, timesteps=300, 
                         freq=24, measure_noise=30)

# operate smoothing
smoother = DecomposeSmoother(smooth_type='lowess', periods=24,
                             smooth_fraction=0.3)
smoother.smooth(data)

# generate intervals
low, up = smoother.get_intervals('sigma_interval')

# plot the smoothed timeseries with intervals
plt.figure(figsize=(18,5))

for i in range(3):
    
    plt.subplot(1,3,i+1)
    plt.plot(smoother.smooth_data[i], linewidth=3, color='blue')
    plt.plot(smoother.data[i], '.k')
    plt.title(f"timeseries {i+1}"); plt.xlabel('time')

    plt.fill_between(range(len(smoother.data[i])), low[i], up[i], alpha=0.3)

All the available smoothers are fully integrable with sklearn.

Usage: bootstrap

# import libraries
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.utils_func import sim_seasonal_data
from tsmoothie.smoother import ConvolutionSmoother
from tsmoothie.bootstrap import BootstrappingWrapper

# generate a periodic timeseries of length 300
np.random.seed(123)
data = sim_seasonal_data(n_series=1, timesteps=300, 
                         freq=24, measure_noise=15)

# operate bootstrap
bts = BootstrappingWrapper(ConvolutionSmoother(window_len=8, window_type='ones'), 
                           bootstrap_type='mbb', block_length=24)
bts_samples = bts.sample(data, n_samples=100)

# plot the bootstrapped timeseries
plt.figure(figsize=(13,5))
plt.plot(bts_samples.T, alpha=0.3, c='orange')
plt.plot(data[0], c='blue', linewidth=2)

References

Polynomial, Spline, Gaussian and Binner smoothing are carried out by building a regression on custom basis expansions. These implementations are based on the amazing intuitions of Matthew Drury.
Time Series Modelling with Unobserved Components, Matteo M. Pelagatti
Bootstrap Methods in Time Series Analysis, Fanny Bergström, Stockholms universitet
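
The idea of smoothing by regression on a basis expansion, mentioned in the first reference, can be sketched with a polynomial basis and ordinary least squares. The helper name `polynomial_smooth` and the rescaling of time to [0, 1] are assumptions for this example; other bases (splines, Gaussian bumps, bins) would only change how the `basis` matrix is built.

```python
import numpy as np

def polynomial_smooth(data, degree=3):
    """Fit one least-squares polynomial per row and return the fitted values."""
    data = np.atleast_2d(np.asarray(data, dtype=float))     # (n_series, timesteps)
    t = np.linspace(0.0, 1.0, data.shape[1])
    basis = np.vander(t, degree + 1, increasing=True)        # columns 1, t, t^2, ...
    # lstsq accepts several right-hand sides, so all series are fitted at once.
    coef, *_ = np.linalg.lstsq(basis, data.T, rcond=None)
    return (basis @ coef).T

t = np.linspace(0.0, 1.0, 50)
clean = np.vstack([1 + 2 * t, 3 * t**2 - t])   # two exactly quadratic series
fitted = polynomial_smooth(clean, degree=2)     # reproduces them up to rounding
```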

Comments

Question on KalmanSmoother usage
Hi, I have a time-series that has seasonality in certain time windows (let's call them sw) and no seasonality in other windows (let's call them nsw). I plan to pass random windows of this time-series into the smoother.

I am trying to use KalmanSmoother and am choosing between:

smoother1 = ts.smoother.KalmanSmoother(component='level_trend_season', component_noise={'level':0.1, 'trend':0.1, 'season':0.1})

vs

smoother2 = ts.smoother.KalmanSmoother(component='level_trend', component_noise={'level':0.1, 'trend':0.1})

If the random window slice is sw, smoother1 should work just fine, and in nsw cases smoother2 should work better. However, I can only use one smoother.

My question is: if I pass nsw into smoother1, will it degrade performance compared to passing nsw to smoother2? Is smoother1 smart enough to "ignore" the fact that nsw has no seasonality in its time-series?
opened by turmeric-blend 5
enhance for tsmoothie to be applicable for inputs with multiple dimensions

Hi, thanks for this library.

Is it possible to vectorize across multiple dimensions? So a generic N-dimensional input of shape (..., timesteps); currently it is limited to (series, timesteps). This would be useful for multivariate time-series problems as well as deep learning applications where there is a batch_size. This should be fairly straightforward using PyTorch (actually even doable with numpy). Would there be a computational limitation?
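
One possible workaround with the existing (series, timesteps) interface is to collapse the leading axes, smooth, and restore the shape. The sketch below assumes a smoother that returns an array of the same shape as its 2-D input; `apply_along_last_axis` and `running_mean` are hypothetical names used only for illustration.

```python
import numpy as np

def apply_along_last_axis(smooth_2d, data):
    """Apply a (series, timesteps) -> (series, timesteps) function to an N-D array."""
    data = np.asarray(data)
    flat = data.reshape(-1, data.shape[-1])     # collapse leading axes into "series"
    return smooth_2d(flat).reshape(data.shape)  # restore the original layout

def running_mean(x):
    # Hypothetical stand-in smoother: cumulative mean along the time axis.
    return np.cumsum(x, axis=1) / np.arange(1, x.shape[1] + 1)

batch = np.ones((4, 3, 2, 50))                  # e.g. (batch, channel, sensor, timesteps)
out = apply_along_last_axis(running_mean, batch)
```

Reshaping a contiguous array this way is cheap, so the cost is dominated by the smoother itself.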

opened by turmeric-blend 4
question

Hi Marco

First thank you for your python package !

Among all the smoothers of the package, which ones are causal? Or are they all non-causal?

Regards Ludo

opened by LinuxpowerLudo 4
Numpy rounding issue causes NaN array on Lowess prediction results

Marco, thanks for the excellent project! You've made a great effort combining all the smoothing theories in one single, easy-to-use library! I couldn't thank you enough!

I stumbled upon a rounding math problem today on the "prediction_interval" function. This problem is actually not on your code, but instead on how Numpy chooses to round floating numbers on the numpy.sum method:

For floating point numbers the numerical precision of sum (and np.add.reduce) is in general limited by directly adding each number individually to the result causing rounding errors in every step. However, often numpy will use a numerically better approach (partial pairwise summation) leading to improved precision in many use-cases. This improved precision is always provided when no axis is given. When axis is given, it will depend on which axis is summed. Technically, to provide the best speed possible, the improved precision is only used when the summation is along the fast axis in memory. Note that the exact precision may vary depending on other parameters. In contrast to NumPy, Python’s math.fsum function uses a slower but more precise approach to summation. Especially when summing a large number of lower precision floating point numbers, such as float32, numerical errors can become significant. In such cases it can be advisable to use dtype="float64" to use a higher precision for the output.

This only occurred with a very particular set of numbers while using the LowessSmoother, which ended up with a negative value that caused an exception on the square root later on:

mse = (np.square(resid).sum(axis=1, keepdims=True) / (N - d_free)).T
....
predstd = np.sqrt(predvar).T

tsmoothie\utils_func.py:306: RuntimeWarning: invalid value encountered in sqrt

Quick solution first:

mse = (np.square(resid).sum(axis=1, dtype="float64", keepdims=True) / (N - d_free)).T

Adding the dtype parameter solved the problem. This causes numpy to increase rounding precision (as stated above) which ended up giving me the correct result.

Quick observation: As yet, I'm not quite sure on how adding dtype might affect speed and performance on all the other smoother methods, but I will have to check on this eventually.
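
As a small, self-contained illustration of what the dtype argument changes, the sketch below sums a float32 array with and without a float64 accumulator. The input values are made up for the example, and it does not reproduce the sign flip reported here; it only shows that dtype="float64" makes numpy cast each element before adding.

```python
import numpy as np

# Made-up values: 2**24 followed by three ones, stored as float32.
resid = np.array([[16777216.0, 1.0, 1.0, 1.0]], dtype=np.float32)

# Accumulating in float32 can drop the small terms,
# because 2**24 + 1 is not representable in float32.
total32 = resid.sum(axis=1, keepdims=True)

# Requesting a float64 accumulator casts each element before adding,
# so the exact total 16777219 is preserved.
total64 = resid.sum(axis=1, dtype="float64", keepdims=True)
```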

Explanation and info:

While calling the LowessSmoother, setting the iterations parameter to any value greater than 1 caused the numpy rounding problem on the following set of data:

data = [6318, 36871, 39933, 22753, 9680, 6503, 4032, 2733, 2807, 2185, 1866, 1800, 1907, 1537, 1357, 1221, 1354, 1514, 2021, 11110, 17656, 17397, 24385, 22361, 18709, 20201, 20245, 25767, 21345, 18928, 20958, 20425, 23066, 20221, 18756, 17403, 17843, 21201, 25867, 17342, 16815, 5700, 25897, 20891, 20022, 22291, 24334, 21304, 25328, 22201, 20308, 21539, 29637, 22740, 19510, 18959, 21160, 23520, 20574, 16519, 18779]

Problem occurs at data[-3]. The problem doesn't occur when I cut the rest out:

data[0:len(data)-3]

At that point, the total sum causes numpy's rounding to go berserk, I imagine.

This ends up calling the square root exception above, which in turn causes your "prediction_interval" function to return an array of NaN results:

[[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]] [[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]]

Variable "mse" without dtype outputs: [[-35780673.18644068]]

And after including dtype: [[35780673.18644068]]

Other important parameters I used to help you check this were:

Prediction: prediction_interval
Confidence: 0.05
Smooth Fraction: 0.3
Batch Size: None
Didn't use a WindowWrapper

For this project I'm stuck with iterations between 5 and 6 and no Batch Size, as I need to smooth the entire data together.

By the way, I think there is something going on with the batch_size parameter as well, but I haven't had time to look at it yet.

Thanks again for the great project!! Keep up the good work!!

opened by brunom63 3
sklearn api

Would you consider the possibility of making it compatible with sklearn using fit and transform instead of smooth? Is there a specific reason why you save the transformed data as an instance attribute? (this would be against the sklearn API)

I am thinking of doing it myself for a project I am working on but I wanted to ask you first if I missed anything obvious that would make this difficult or not possible.

Many thanks

opened by gioxc88 3
Interoperability with sktime

Hi,

Rather a discussion point than issue: I just saw your post on https://github.com/MaxBenChrist/awesome_time_series_in_python/issues/31 and I'd love to make sktime easily inter-operable with tsmoothie. Would you be interested in working on that?

opened by mloning 3
About component_noise in Kalman filter

Hi Marco, I am new to tsmoothie and also to Kalman filters, and I am still in the process of understanding them. I have a question about component_noise. I have time series where daily seasonality is prominent, so component_noise with season=0.1 mostly works well (a low sigma value, as I am confident about the daily seasonality). But I have also tried values like 0.01 and 1 for the same component. Is there a valid range for the sigma values of component_noise, e.g. 0 to 1 (0 to 100%)?

opened by tawdes 2
Is there a way to extend the model past the data?

This is a great library, thanks a lot! I have a question, is there a way to extend the smooth/CI past the data domain? See the below plot, aesthetically I would like the smooth regions to go to the edges of the graph region....

opened by parksj10 2
Which smoother is the best to detect and remove outliers?
Hi Marco! Thank you for an awesome package!

I have a quick question for you. Since you're obviously well-rehearsed in time-series smoothing, which particular smoother will you recommend as a default option?

In particular, I have a training series y_train (which is potentially very short, <50 observations), and I use some univariate forecasting model to forecast H-periods ahead, resulting in an H-dim vector y_hat. Since my training vector is not always very long, some flexible methods give me crazy results for y_hat, which I want to reset to some sensible value.

I could do, for instance,

# Instantiate smoother
smoother = ConvolutionSmoother(window_len=0.1*len(y_train), window_type='ones')
smoother.smooth(pd.concat([y_train, y_hat], axis=0))

# Get threshold
threshold_lower, threshold_upper = smoother.get_intervals('sigma_interval', n_sigma=2)

# Subset to match length
threshold_lower = threshold_lower[0,-len(y_hat):]
threshold_upper = threshold_upper[0,-len(y_hat):]

and then use these thresholds. Do you have any recommendations in this setup?
opened by muhlbach 2
Anomaly inference from smoothed data

Thanks for developing this library. This is a pretty interesting one. I have a question when using tsmoothie as follows.

Currently I am using an (unsupervised) clustering method to create a model once on a large amount of data (that, assigns inlier and outlier labels) and then query the model repeatedly with small amounts of new data to predict the label (to infer anomaly).

I am planning to use tsmoothie for filtering the noise in the large input data which will be subject to clustering to assign inlier and outlier labels . Later when I use new data points for predicting the normal or anomaly label, I should smooth that also before prediction. Is that correct?

opened by nsankar 2
WindowWrapper behavior with ExponentialSmoother

When I use the WindowWrapper in combination with LowessSmoother, like in the notebook example, I obtain the desired output (an NxM numpy array, where N=samples and M=window size). However, when I use WindowWrapper with ExponentialSmoother I get an Nx1 numpy array.

Is this because ExponentialSmoother is an online-ready algorithm?

code: https://ibb.co/GdBtWJv

opened by meneghet 2

Releases (v1.0.4)

v1.0.4 (Jul 15, 2021)
v1.0.3 (Jul 8, 2021)
v1.0.2 (May 5, 2021): Code restructured
v1.0.1 (Jan 6, 2021): Added SpectralSmoother and introduced Bootstrap techniques
v0.2.0 (Oct 2, 2020): Added DecomposeSmoother and improved LowessSmoother
v0.1.6 (Sep 3, 2020)
v0.1.5 (Aug 28, 2020)
v0.1.4 (Aug 27, 2020)
v0.1.3 (Aug 25, 2020)

Owner

Marco Cerliani

Statistician Hacker & Data Scientist
