Quantify the difference between two arbitrary curves in space

Charles Jekel

Last update: Jan 8, 2023


Overview

similaritymeasures

Quantify the difference between two arbitrary curves

Curves in this case are:

discretized by individual data points
ordered from a beginning to an ending

Consider the following two curves. We want to quantify how different the Numerical curve is from the Experimental curve. Notice that the two curves share no concurrent Stress or Strain values. Additionally, one curve has more data points than the other.

In the ideal case the Numerical curve would match the Experimental curve exactly. This means that the two curves would appear directly on top of each other. Our measures of similarity would return a zero distance between two curves that were on top of each other.

Methods covered

This library includes the following methods to quantify the difference (or similarity) between two curves:

Partial Curve Mapping^x (PCM) method: Matches the area of a subset between the two curves [1]
Area method^x: An algorithm for calculating the Area between two curves in 2D space [2]
Discrete Frechet distance^y: The shortest distance between two curves, where you are allowed to vary the speed at which you travel along each curve independently (walking dog problem) [3, 4, 5, 6, 7, 8]
Curve Length^x method: Assumes that the only true independent variable of the curves is the arc-length distance along the curve from the origin [9, 10]
Dynamic Time Warping^y (DTW): A non-metric distance between two time-series curves that has been proven useful for a variety of applications [11, 12, 13, 14, 15, 16]

^x denotes methods created specifically for material parameter identification

^y denotes that the method implemented in this library supports N-D data!
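
The Curve Length method above treats arc length along the curve as the independent variable. A minimal sketch of that parameterisation in plain Python follows. `cumulative_arc_length` is a hypothetical helper written for illustration, not a function of this library, and it accepts points of any dimension.

```python
import math


def cumulative_arc_length(points):
    """Return the arc-length distance from the first point to every point."""
    lengths = [0.0]
    for previous, current in zip(points, points[1:]):
        # Add the straight-line length of each segment to the running total
        lengths.append(lengths[-1] + math.dist(previous, current))
    return lengths


curve = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(cumulative_arc_length(curve))  # [0.0, 5.0, 11.0]
```

Because the result depends only on the ordering of the points, two curves with different numbers of samples can both be described on the same arc-length axis.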

Installation

Install with pip

[sudo] pip install similaritymeasures

or clone and install from source.

git clone https://github.com/cjekel/similarity_measures
[sudo] pip install ./similarity_measures

Example usage

This shows you how to compute the various similarity measures:

import numpy as np
import similaritymeasures
import matplotlib.pyplot as plt

# Generate random experimental data
x = np.random.random(100)
y = np.random.random(100)
exp_data = np.zeros((100, 2))
exp_data[:, 0] = x
exp_data[:, 1] = y

# Generate random numerical data
x = np.random.random(100)
y = np.random.random(100)
num_data = np.zeros((100, 2))
num_data[:, 0] = x
num_data[:, 1] = y

# quantify the difference between the two curves using PCM
pcm = similaritymeasures.pcm(exp_data, num_data)

# quantify the difference between the two curves using
# Discrete Frechet distance
df = similaritymeasures.frechet_dist(exp_data, num_data)

# quantify the difference between the two curves using
# area between two curves
area = similaritymeasures.area_between_two_curves(exp_data, num_data)

# quantify the difference between the two curves using
# Curve Length based similarity measure
cl = similaritymeasures.curve_length_measure(exp_data, num_data)

# quantify the difference between the two curves using
# Dynamic Time Warping distance
dtw, d = similaritymeasures.dtw(exp_data, num_data)

# print the results
print(pcm, df, area, cl, dtw)

# plot the data
plt.figure()
plt.plot(exp_data[:, 0], exp_data[:, 1])
plt.plot(num_data[:, 0], num_data[:, 1])
plt.show()
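
For readers curious what a DTW distance computes, the sketch below implements the textbook dynamic-programming recurrence in plain Python. It is an illustration that assumes Euclidean point distances, not the library's implementation. `dtw_sketch` is a hypothetical name. It returns the total cost together with the cumulative cost matrix. Whether that matrix corresponds to the second value returned by `similaritymeasures.dtw` should be checked against the library's docstring.

```python
import math


def dtw_sketch(curve_a, curve_b):
    """Return (DTW cost, cumulative cost matrix) for two point sequences."""
    n, m = len(curve_a), len(curve_b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of curve_a[:i+1] with curve_b[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            step = math.dist(curve_a[i], curve_b[j])
            if i == 0 and j == 0:
                best_previous = 0.0
            else:
                best_previous = min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            cost[i][j] = step + best_previous
    return cost[-1][-1], cost


# Two curves with different numbers of points
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 1.0), (2.0, 1.0)]
total, matrix = dtw_sketch(a, b)
print(total)
```

The double loop makes the cost proportional to the product of the two curve lengths, which is why faster approximate variants exist for long series.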

If you are interested in setting up an optimization problem using these measures, check out this Jupyter Notebook which replicates Section 3.2 from [2].

Changelog

Version 0.3.0: Frechet distance now supports N-D data! See CHANGELOG.md for full details.

Documentation

Each function includes a descriptive docstring, which you can view online here.

References

[1] Katharina Witowski and Nielen Stander. Parameter Identification of Hysteretic Models Using Partial Curve Mapping. 12th AIAA Aviation Technology, Integration, and Operations (ATIO) Conference and 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, sep 2012. doi: 10.2514/6.2012-5580.

[2] Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

[3] M Maurice Frechet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884-1940), 22(1):1–72, 1906.

[4] Thomas Eiter and Heikki Mannila. Computing discrete Frechet distance. Technical report, 1994.

[5] Anne Driemel, Sariel Har-Peled, and Carola Wenk. Approximating the Frechet Distance for Realistic Curves in Near Linear Time. Discrete & Computational Geometry, 48(1): 94–127, 2012. ISSN 1432-0444. doi: 10.1007/s00454-012-9402-z. URL http://dx.doi.org/10.1007/s00454-012-9402-z.

[6] K Bringmann. Why Walking the Dog Takes Time: Frechet Distance Has No Strongly Subquadratic Algorithms Unless SETH Fails, 2014.

[7] Sean L Seyler, Avishek Kumar, M F Thorpe, and Oliver Beckstein. Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLOS Computational Biology, 11(10):1–37, 2015. doi: 10.1371/journal.pcbi.1004568. URL https://doi.org/10.1371/journal.pcbi.1004568.

[8] Helmut Alt and Michael Godau. Computing the Frechet Distance Between Two Polygonal Curves. International Journal of Computational Geometry & Applications, 05 (01n02):75–91, 1995. doi: 10.1142/S0218195995000064.

[9] A Andrade-Campos, R De-Carvalho, and R A F Valente. Novel criteria for determination of material model parameters. International Journal of Mechanical Sciences, 54 (1):294–305, 2012. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2011.11.010. URL http://www.sciencedirect.com/science/article/pii/S0020740311002451.

[10] J Cao and J Lin. A study on formulation of objective functions for determining material models. International Journal of Mechanical Sciences, 50(2):193–204, 2008. ISSN 0020-7403. doi: https://doi.org/10.1016/j.ijmecsci.2007.07.003. URL http://www.sciencedirect.com/science/article/pii/S0020740307001178.

[11] Donald J Berndt and James Clifford. Using Dynamic Time Warping to Find Patterns in Time Series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, AAAIWS’94, pages 359–370. AAAI Press, 1994. URL http://dl.acm.org/citation.cfm?id=3000850.3000887.

[12] François Petitjean, Alain Ketterlin, and Pierre Gançarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44 (3):678–693, 2011. ISSN 0031-3203. doi: https://doi.org/10.1016/j.patcog.2010.09.013. URL http://www.sciencedirect.com/science/article/pii/S003132031000453X.

[13] Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software; Vol 31, Issue 7 (2009), aug 2009. URL http://dx.doi.org/10.18637/jss.v031.i07.

[14] Stan Salvador and Philip Chan. Toward Accurate Dynamic Time Warping in Linear Time and Space. Intell. Data Anal., 11(5):561–580, oct 2007. ISSN 1088-467X. URL http://dl.acm.org/citation.cfm?id=1367985.1367993.

[15] Paolo Tormene, Toni Giorgino, Silvana Quaglini, and Mario Stefanelli. Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artificial Intelligence in Medicine, 45(1):11–34, 2009. ISSN 0933-3657. doi: https://doi.org/10.1016/j.artmed.2008.11.007. URL http://www.sciencedirect.com/science/article/pii/S0933365708001772.

[16] Senin, P., 2008. Dynamic time warping algorithm review. Information and Computer Science Department University of Hawaii at Manoa Honolulu, USA, 855, pp.1-23. http://seninp.github.io/assets/pubs/senin_dtw_litreview_2008.pdf

Contributions welcome!

This is by no means a complete list of all possible similarity measures. For instance the SciPy Hausdorff distance is an alternative similarity measure useful if you don't know the beginning and ending of each curve. There are many more possible functions out there. Feel free to send PRs for other functions in literature!

Requirements for adding a new method to this library:

all methods should be able to quantify the difference between two curves
method must support the case where each curve may have a different number of data points
follow the style of existing functions
reference to method details, or descriptive docstring of the method
include test(s) for your new method
minimum Python dependencies (try to stick to SciPy/numpy functions if possible)

Please cite

If you've found this information or library helpful please cite the following paper. You should also cite the papers of any methods that you have used.

Jekel, C. F., Venter, G., Venter, M. P., Stander, N., & Haftka, R. T. (2018). Similarity measures for identifying material parameters from hysteresis loops using inverse analysis. International Journal of Material Forming. https://doi.org/10.1007/s12289-018-1421-8

@article{Jekel2019,
author = {Jekel, Charles F and Venter, Gerhard and Venter, Martin P and Stander, Nielen and Haftka, Raphael T},
doi = {10.1007/s12289-018-1421-8},
issn = {1960-6214},
journal = {International Journal of Material Forming},
month = {may},
title = {{Similarity measures for identifying material parameters from hysteresis loops using inverse analysis}},
url = {https://doi.org/10.1007/s12289-018-1421-8},
year = {2019}
}

Comments

frechet_dist input size is bounded by maximum recursion depth
Consider the following:

max_len = 1000
a = [[1,2,3] for i in range(max_len)]
b = [[1,6,3] for i in range(max_len)]
frechet_dist(a,b)

Running this code on a machine with 32 GB of RAM raises a stack-overflow error. I would suggest switching from the recursion-based computation to an iterative one using queues.
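
One way to remove the recursion is sketched below. It uses a rolling-row dynamic program rather than a queue, so memory grows with the length of one curve only. This is an illustration of the general idea, not the library's code. `discrete_frechet_iterative` is a hypothetical name, and Euclidean point distances are assumed.

```python
import math


def discrete_frechet_iterative(curve_a, curve_b):
    """Discrete Frechet distance using loops instead of recursion."""
    m = len(curve_b)
    previous_row = [0.0] * m
    for i, point_a in enumerate(curve_a):
        current_row = [0.0] * m
        for j, point_b in enumerate(curve_b):
            d = math.dist(point_a, point_b)
            if i == 0 and j == 0:
                current_row[j] = d
            elif i == 0:
                current_row[j] = max(current_row[j - 1], d)
            elif j == 0:
                current_row[j] = max(previous_row[j], d)
            else:
                # Cheapest way to arrive at (i, j) from an allowed neighbour
                reachable = min(
                    previous_row[j], previous_row[j - 1], current_row[j - 1]
                )
                current_row[j] = max(reachable, d)
        previous_row = current_row
    return previous_row[-1]


max_len = 1000
a = [[1, 2, 3] for _ in range(max_len)]
b = [[1, 6, 3] for _ in range(max_len)]
print(discrete_frechet_iterative(a, b))  # 4.0
```

Only two rows of the table are alive at any time, so the input size is no longer limited by the interpreter's recursion depth.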

Is anyone currently working on optimizing the memory usage of frechet_dist ?

Thank you for your work, Arbel Amir
opened by ArbelAmir 8
discrete Frechet distance between lists or 1D arrays

My question might sound a little dumb, but is it possible to use similaritymeasures.frechet_dist() for lists or 1D arrays? I tried to calculate the similarity between a list and multiple other lists (which also include the first list), but the most similar output was not the first list, which I expected since they are exactly the same. It does work with real coordinates (lat, lon) such as trajectories, where the most similar output is the first given argument. I'm trying to use factors other than distance to calculate the similarity between two things, and those parameters are just numerical values, so I'm wondering how I can use frechet_dist for 1D lists or arrays.
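
One way to treat a plain list as a curve is to pair each value with its position, so that every sample becomes a 2D point. The sketch below shows only that reshaping step. `as_curve` is a hypothetical helper, and using the sample index as the first coordinate is an assumption that suits evenly spaced data.

```python
def as_curve(values):
    """Turn a 1D sequence into a list of (index, value) points."""
    return [[float(i), float(v)] for i, v in enumerate(values)]


signal = [3, 1, 4, 1, 5]
curve = as_curve(signal)
print(curve)  # [[0.0, 3.0], [1.0, 1.0], [2.0, 4.0], [3.0, 1.0], [4.0, 5.0]]
```

The result has one row per point and one column per coordinate, which is the layout used for `exp_data` and `num_data` in the example usage.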

opened by miladad8 6
Regarding code update in "is_simple_quad" function on Aug 18,2019
Dear Authors,

Thanks for your contribution in the form of the "similaritymeasures" library for quantifying the difference between curves. I have been using it to find the area between curves. But since your update to the code that checks whether the quadrilateral is simple (in the "is_simple_quad" function, on Aug 18, 2019), the output for the area between the curves is not correct. (If I use the previous code, the area returned is correct.) Specifically, the "if" condition that checks the number of cross products with the same sign should be: sum(crossTF) > 2 instead of sum(crossTF) == 2

This can be checked with the following code, which tries to find the area between two simple curves. Running it prints: area1 : 0.0

while the previous code gives the correct area (4 in this case).

import numpy as np
import matplotlib.pyplot as plt
import similaritymeasures

xaxis = [0, 1, 2, 3, 4]
curve1 = [0, 0, 0, 0, 0]
curve2 = [1, 1, 1, 1, 1]

# Build (N, 2) arrays of x, y points for each curve
exp_data = np.zeros((len(xaxis), 2))
num_data = np.zeros((len(xaxis), 2))
exp_data[:, 0] = xaxis
exp_data[:, 1] = curve1
num_data[:, 0] = xaxis
num_data[:, 1] = curve2

plt.figure()
plt.scatter(xaxis, curve1)
plt.scatter(xaxis, curve2)
plt.show()

area1 = similaritymeasures.area_between_two_curves(exp_data, num_data)
print("area1 : " + str(area1))
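
As an independent check of the expected value of 4, the area between two curves sampled at the same x positions can be integrated with the trapezoidal rule. This sketch is not the library's quadrilateral-based algorithm. `area_same_x` is a hypothetical helper, and it assumes the curves share x values and do not cross between samples.

```python
def area_same_x(x, y1, y2):
    """Trapezoidal integral of |y1 - y2| over shared x positions."""
    gaps = [abs(a - b) for a, b in zip(y1, y2)]
    area = 0.0
    for k in range(len(x) - 1):
        # Average gap on the interval times the interval width
        area += 0.5 * (gaps[k] + gaps[k + 1]) * (x[k + 1] - x[k])
    return area


xaxis = [0, 1, 2, 3, 4]
curve1 = [0, 0, 0, 0, 0]
curve2 = [1, 1, 1, 1, 1]
print(area_same_x(xaxis, curve1, curve2))  # 4.0
```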
opened by aanchalMongia 5

Problem during pip install: "UnicodeDecodeError: 'gbk' codec can't decode byte"

pip install similaritymeasures gives

(base) D:\repositories\joinTracks>pip install similaritymeasures
Collecting similaritymeasures
  Using cached similaritymeasures-0.4.3.tar.gz (397 kB)
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\s00557672\Anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';
__file__='"'"'C:\\Users\\S00557~1\\AppData\\Local\\Temp\\pip-install-d5tndazp\\similaritymeasures\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'
"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\pip-egg-info'
         cwd: C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\S00557~1\AppData\Local\Temp\pip-install-d5tndazp\similaritymeasures\setup.py", line 12, in <module>
        long_description=open('README.md').read(),
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 5204: illegal multibyte sequence
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

How can I fix it?
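
The traceback points at `open('README.md').read()`, which decodes the file with the platform's default codec (`gbk` here). A common remedy for this class of error is to pass an explicit encoding when reading a UTF-8 file. The sketch below is an assumption about what would help, not the fix adopted by the project. The temporary file stands in for `README.md`, and the en dash it contains is one character whose UTF-8 encoding includes the byte 0x93 reported in the error.

```python
import os
import tempfile

# Write a small UTF-8 file containing a non-ASCII character (an en dash)
text = "pages 1\u201372"
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "README.md")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Reading with an explicit encoding does not depend on the system locale
    with open(path, encoding="utf-8") as handle:
        long_description = handle.read()

print(long_description == text)  # True
```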

opened by sergorl 4

Added MAE and MSE

Added extra functions to find the Mean Absolute Distance (MAE) and the Mean Squared Distance (MSE) between the two curves. It works with all the distance measures in scipy.spatial.distance.cdist.

opened by HarshRaoD 2
Similarity between two curves which have different number of data points

Hello, I would like to know how to compute the similarity of two curves that have different numbers of data points, such as the ones you refer to on the main page. Also, which methods support this? And what is the meaning of the results? Looking forward to your reply.

opened by xiaobrnbrn 2
pcm may be wrong

A user has pointed out that I potentially have an incorrect pcm implementation because I divide the distances by a max value.

It is possible that it is a mistake on my part, where I was trying to combine code for the curve_length and pcm methods. It is also possible that I thought xmax and ymax would always be one, so it wouldn't matter. The curve_length method needs the max values because there is no other normalization.

The line in question: https://github.com/cjekel/similarity_measures/blob/master/similaritymeasures/similaritymeasures.py#L352

I'm pretty sure that line is correct for the curve_length_measure method.
bug help wanted

opened by cjekel 0
Improv perf

Good afternoon, I hope this message finds you well, and I compliment and thank you for the code.

I worked with frechet and dtw, and I found that the computational performance of the frechet function was subpar because it didn't use the cdist function from scipy (which I found to perform far better than minkowski_distance).

I took the liberty of changing the code and opening a pull request (I also modified the test code so that it does not import the already installed package).

On my day to day job I also use cython to improve performances of python programs, and I was thinking that maybe it could benefit some of the loops in the code.

Best of wishes for whatever!

Nuc

opened by nucccc 3
Similarity between two curves using PyTorch

Hi guys,

I want to implement in my trainer a measure of similarity between my predicted trajectory and the GT trajectory. Here is an example:

The GT is the red line, my observation is the yellow line (almost hidden by the other participants) and the green line is my prediction. The other agents are not used at this moment.

Now, in order to train my DL based Motion Prediction algorithm I am using the ADE, FDE and NLL losses w.r.t. the GT. Nevertheless, I think that if my prediction does not match exactly the GT but it is in the same centerline (but driving with a different velocity, for example) it will be better. E.g.

This prediction does not match the GT (until the red diamond at the bottom), but at least the shapes of both curves are more or less the same.

How could I do that?

opened by Cram3r95 20
Incorrect measurement of area between intersecting 2D curves

It appears that either my understanding of area between the curves or its calculation in the library is incorrect (the referenced paper is paywalled). In the following example, I have two plots, where grey line is original data, and there are two different blue splines. Visually, you can clearly see that the area between two lines on the left plot is several times larger than the area on the right plot, but the calculation with similaritymeasures.area_between_two_curves shows only a 2x difference.

Here is the GitHub gist where I present the calculation (the GDrive zip with airfoils is public, so the whole thing can be run in Colab or elsewhere if you modify the path in the 2nd cell): https://gist.github.com/rafalszulejko/2c9ff645b448d60d857975a8f7965045#file-wing-optimization2-ipynb

opened by rafalszulejko 11
Faster DTW

Hello,

Thanks for a really nice repo with an easy-to-use API for quickly generating some metrics on curve similarities. I just thought I would let you know that there is a much faster DTW implementation than the one used in this repo. If it covers your needs, you should consider replacing the current implementation with it:

Link to faster DTW implementation

Carry on the great work! :)

opened by vancromy 2
Add other interpolation methods to the area between curves method

Right now the area between curves method uses bisection of largest gap to add artificial data points. This method was used to minimize the number of artificial quads/points. However, this can have some negative effects in some cases, specifically when the sampling rate is artificial and does not match (e.g. one curve is just a straight line with few points).

A potential alternative is to use the arc length projection of one curve's points onto the other. This would preserve the sampling rate, and may make for more uniform quads. This is similar to what's done in the PCM method.

When another interpolation method is added, give users the choice of which interpolation method to use. Changing the interpolation method is anticipated to change the results.
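
The arc length projection described above can be sketched as follows. This is a hedged illustration of the idea, not code from the library. `project_by_arc_length` is a hypothetical helper, it interpolates linearly along segments, and it assumes both curves have at least two points and a non-zero total length.

```python
import math


def project_by_arc_length(source, target):
    """Place one point on `target` for every point of `source`,
    matching their normalised arc-length positions."""

    def fractions(points):
        total = [0.0]
        for p, q in zip(points, points[1:]):
            total.append(total[-1] + math.dist(p, q))
        return [t / total[-1] for t in total]

    src = fractions(source)
    tgt = fractions(target)
    projected = []
    k = 0
    for s in src:
        # Advance to the target segment that contains fraction s
        while k < len(tgt) - 2 and tgt[k + 1] < s:
            k += 1
        span = tgt[k + 1] - tgt[k]
        w = 0.0 if span == 0 else (s - tgt[k]) / span
        projected.append(
            [a + w * (b - a) for a, b in zip(target[k], target[k + 1])]
        )
    return projected


source = [(0.0, 1.0), (1.0, 1.0), (4.0, 1.0)]  # fractions 0, 0.25, 1
target = [(0.0, 0.0), (4.0, 0.0)]              # a straight line, two points
print(project_by_arc_length(source, target))
# [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
```

The projected curve has exactly as many points as the source, so the sparse straight line inherits the sampling of the denser curve instead of being bisected.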

opened by cjekel 0

Releases(0.6.0)

0.6.0(Oct 8, 2022)
similaritymeasures.pcm now produces different values! This was done to better follow the original algorithm. To get the same results from previous versions, set norm_seg_length=True. What this option does is scale each segment length by the maximum values of the curve (borrowed from the curve_length_measure). This scaling should not be needed with the PCM method because both curves are always scaled initially.

Fix docstring documentation for returns in similaritymeasures.dtw and similaritymeasures.curve_length_measure

v0.5.0(Aug 6, 2022)

Adds mean squared error (mse) and mean absolute error (mae) methods thanks to @HarshRaoD
