Machine learning evaluation metrics, implemented in Python, R, Haskell, and MATLAB / Octave

Overview

Note: the current releases of this toolbox are beta releases, meant to test working with Haskell's, Python's, and R's code repositories.

Metrics provides implementations of various supervised machine learning evaluation metrics in the following languages:

  • Python: easy_install ml_metrics
  • R: install.packages("Metrics") from the R prompt
  • Haskell: cabal install Metrics
  • MATLAB / Octave: clone the repo and run setup from the MATLAB command line

For more detailed installation instructions, see the README for each implementation.
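
For example, after installing the Python package, the metrics are exposed as plain functions over actual/predicted values. A minimal usage sketch, assuming the top-level function names rmse, mae, and mapk exposed by ml_metrics (see the Python README for the full API):

    import ml_metrics as metrics

    actual = [3.0, -0.5, 2.0, 7.0]
    predicted = [2.5, 0.0, 2.0, 8.0]

    # Regression-style metrics take parallel sequences of actual and predicted values.
    print(metrics.rmse(actual, predicted))  # root mean squared error
    print(metrics.mae(actual, predicted))   # mean absolute error

    # Ranking metrics take lists of relevant items and ranked predictions.
    print(metrics.mapk([[1, 2, 3]], [[1, 4, 2]], k=3))  # mean average precision at 3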

EVALUATION METRICS

The metrics below are implemented across the four languages (Python, R, Haskell, MATLAB / Octave); see each implementation's README for which metrics are available in a given language.

  • Absolute Error (AE)
  • Average Precision at K (APK, AP@K)
  • Area Under the ROC (AUC)
  • Classification Error (CE)
  • F1 Score (F1)
  • Gini
  • Levenshtein
  • Log Loss (LL)
  • Mean Log Loss (LogLoss)
  • Mean Absolute Error (MAE)
  • Mean Average Precision at K (MAPK, MAP@K)
  • Mean Quadratic Weighted Kappa
  • Mean Squared Error (MSE)
  • Mean Squared Log Error (MSLE)
  • Normalized Gini
  • Quadratic Weighted Kappa
  • Relative Absolute Error (RAE)
  • Root Mean Squared Error (RMSE)
  • Relative Squared Error (RSE)
  • Root Relative Squared Error (RRSE)
  • Root Mean Squared Log Error (RMSLE)
  • Squared Error (SE)
  • Squared Log Error (SLE)
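
Many of these are simple aggregations of the elementwise metrics: MAE, MSE, and MSLE are the means of AE, SE, and SLE over all samples, and RMSE and RMSLE are the square roots of those means. A rough numpy sketch of the standard definitions, for illustration only (the packages above ship their own implementations):

    import numpy as np

    actual = np.array([3.0, 0.5, 2.0, 7.0])
    predicted = np.array([2.5, 0.0, 2.0, 8.0])

    ae = np.abs(actual - predicted)                        # Absolute Error (elementwise)
    se = (actual - predicted) ** 2                         # Squared Error (elementwise)
    sle = (np.log1p(actual) - np.log1p(predicted)) ** 2    # Squared Log Error (elementwise)

    mae = ae.mean()        # Mean Absolute Error
    mse = se.mean()        # Mean Squared Error
    msle = sle.mean()      # Mean Squared Log Error
    rmse = np.sqrt(mse)    # Root Mean Squared Error
    rmsle = np.sqrt(msle)  # Root Mean Squared Log Error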

TO IMPLEMENT

  • F1 score
  • Multiclass log loss
  • Lift
  • Average Precision for binary classification
  • precision / recall break-even point
  • cross-entropy
  • True Pos / False Pos / True Neg / False Neg rates
  • precision / recall / sensitivity / specificity
  • mutual information
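
Several of the planned classification metrics above (true/false positive/negative rates, precision, recall, sensitivity, specificity, F1) reduce to simple arithmetic on confusion-matrix counts. A sketch of the standard definitions, not code from this repo:

    # Confusion-matrix counts for a binary problem (hypothetical values).
    tp, fp, tn, fn = 40, 10, 45, 5

    precision = tp / (tp + fp)            # fraction of predicted positives that are correct
    recall = tp / (tp + fn)               # a.k.a. sensitivity, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    fpr = fp / (fp + tn)                  # false positive rate
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall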

HIGHER LEVEL TRANSFORMATIONS TO HANDLE

  • GroupBy / Reduce
  • Weight individual samples or groups
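
As an illustration of the kind of higher-level handling meant here, a sketch of group-wise and sample-weighted evaluation using pandas (hypothetical column names; not part of the current API):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "group":     ["a", "a", "b", "b"],
        "weight":    [1.0, 2.0, 1.0, 1.0],
        "actual":    [3.0, 0.5, 2.0, 7.0],
        "predicted": [2.5, 0.0, 2.0, 8.0],
    })

    # GroupBy / Reduce: compute MAE separately within each group.
    per_group_mae = (df["actual"] - df["predicted"]).abs().groupby(df["group"]).mean()

    # Weighted samples: weighted mean absolute error over all rows.
    weighted_mae = np.average((df["actual"] - df["predicted"]).abs(), weights=df["weight"])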

PROPERTIES METRICS CAN HAVE

(Nonexhaustive and to be added in the future)

  • Min or Max (optimize through minimization or maximization)
  • Binary Classification
    • Scores predicted class labels
    • Scores predicted ranking (most likely to least likely for being in one class)
    • Scores predicted probabilities
  • Multiclass Classification
    • Scores predicted class labels
    • Scores predicted probabilities
  • Regression
  • Discrete Rater Comparison (confusion matrix)
Comments
  • Automatically run 2to3 when installing on Python 3

    I added a couple of lines to setup.py so that 2to3 is run automatically by Distribute on Python 3, because we were running into some issues installing this on 3.3. I just followed the recommended steps here.

    opened by dan-blanchard 7
  • Become maintainer of this package

    Hi

    Do you have any interest in being the maintainer of this package? If not, would you mind if I help revive its status on CRAN?

    Thanks, Michael Frasco

    opened by mfrasco 3
  • ml_metrics fails to install via pip

    $ pip --version
    pip 1.4.1 from /usr/local/lib/python2.7/site-packages/pip-1.4.1-py2.7.egg (python 2.7)

    $ pip install ml_metrics
    Downloading/unpacking ml-metrics
      Downloading ml_metrics-0.1.3.zip
      Running setup.py egg_info for package ml-metrics
        Traceback (most recent call last):
          File "<string>", line 16, in <module>
          File "/Users/ndronen/Source/dissertation/projects/iclr-2014/build/ml-metrics/setup.py", line 6, in <module>
            requirements = [x.strip() for x in open("requirements.txt")]
        IOError: [Errno 2] No such file or directory: 'requirements.txt'
    Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 16, in <module>
          File "/Users/ndronen/Source/build/ml-metrics/setup.py", line 6, in <module>
            requirements = [x.strip() for x in open("requirements.txt")]
        IOError: [Errno 2] No such file or directory: 'requirements.txt'

    opened by ndronen 3
  • I have used the kappa metric provided here and am getting near-zero values for complete disagreement instead of -1. Am I missing something?

    import numpy as np
    from ml_metrics import kappa

    x = np.array([0, 3, 2, 4, 0, 2, 0, 4, 3, 0, 2])
    y = np.array([2, 1, 3, 2, 3, 4, 2, 1, 4, 3, 1])

    # complete disagreement
    print(kappa(x, y, min_rating=0, max_rating=4))
    # -0.18627450980392224
    
    opened by prakashjayy 2
  • Suggested Metric: Mean Absolute Scaled Error

    Mean Absolute Scaled Error (MASE) is pretty widely used in econometrics and is one of the best-known metrics for forecasting. I wanted to suggest that it may be appropriate for this package as well.

    http://robjhyndman.com/papers/foresight.pdf
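
    For reference, a rough numpy sketch of MASE as commonly defined for one-step forecasts (scaling by the in-sample MAE of a naive forecast); this is an illustration, not code from this repo:

    import numpy as np

    def mase(actual, predicted, train):
        # Forecast MAE scaled by the MAE of a naive one-step forecast on the training series.
        naive_mae = np.mean(np.abs(np.diff(train)))
        return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))) / naive_mae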

    opened by SteveBronder 2
  • Removed extra "reduce" statements.

    Removed extra "reduce" statements.

    min supports lists directly, so reduce(min, rater_a) was verbose (and potentially less efficient).

    I also improved the PEP8 compliance by getting rid of TABs used for indentation and adding spaces around operators.

    opened by dan-blanchard 2
  • Metrics::auc fails due to integer overflow

    The auc function cannot support large datasets due to integer overflow. The algorithm it uses multiplies the number of positive cases by the number of negative cases, and if these counts are large enough, their product overflows the integer type.
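
    For context, the rank-sum formulation of AUC has the product of the positive and negative counts in its denominator; a Python/numpy sketch (not the R implementation) showing where that product appears and how computing it in floating point avoids the overflow:

    import numpy as np

    def auc_rank(actual, posterior):
        # Mann-Whitney (rank-sum) formulation of AUC, ignoring ties among the scores.
        actual = np.asarray(actual)
        ranks = np.argsort(np.argsort(posterior)) + 1       # 1-based ranks of the scores
        n_pos = float(np.sum(actual == 1))                   # cast counts to float so that
        n_neg = float(len(actual) - n_pos)                   # n_pos * n_neg cannot overflow
        pos_rank_sum = float(np.sum(ranks[actual == 1]))
        return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)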

    Would you be open to a pull request that fixed this bug?

    opened by mfrasco 1
  • Metrics R package has been orphaned on CRAN

    Hi Ben, I just noticed that the maintainer status of the Metrics R package was changed to "ORPHANED" on April 21, 2017. The CRAN maintainers must have sent you some emails about issues with the package and couldn't reach you, so after a certain amount of time they set the maintainer to "ORPHANED" and incremented the package version number to 0.1.2.

    I fixed the CRAN issues, made updates to the documentation, added examples to all the functions, and incremented the version number to 0.1.3. I've pushed the updates, which you can review on my fork here. Are you interested in re-establishing yourself as the maintainer? If so, I'll submit a PR with my changes and you can submit version 0.1.3 to CRAN directly. If not, let me know and I can help you find someone to take over as the maintainer and have them submit version 0.1.3 to CRAN.

    CRAN check output from running R CMD check --as-cran Metrics_0.1.3.tar.gz:

    * using log directory ‘/Users/me/code/github-myforks/Metrics/Metrics.Rcheck’
    * using R version 3.3.2 (2016-10-31)
    * using platform: x86_64-apple-darwin13.4.0 (64-bit)
    * using session charset: UTF-8
    * using option ‘--as-cran’
    * checking for file ‘Metrics/DESCRIPTION’ ... OK
    * this is package ‘Metrics’ version ‘0.1.3’
    * checking CRAN incoming feasibility ... NOTE
    Maintainer: ‘Ben Hamner <[email protected]>’
    
    Days since last update: 4
    
    New maintainer:
      Ben Hamner <[email protected]>
    Old maintainer(s):
      ORPHANED
    
    License components with restrictions and base license permitting such:
      BSD_3_clause + file LICENSE
    File 'LICENSE':
      YEAR: 2012-2017
      COPYRIGHT HOLDER: Ben Hamner
      ORGANIZATION: copyright holder
    
    CRAN repository db overrides:
      X-CRAN-Comment: Orphaned and corrected on 2017-04-21 as check errors
        were not corrected despite reminders.
      Maintainer: ORPHANED
    CRAN repository db conflicts: ‘Maintainer’
    * checking package namespace information ... OK
    * checking package dependencies ... OK
    * checking if this is a source package ... OK
    * checking if there is a namespace ... OK
    * checking for executable files ... OK
    * checking for hidden files and directories ... OK
    * checking for portable file names ... OK
    * checking for sufficient/correct file permissions ... OK
    * checking whether package ‘Metrics’ can be installed ... OK
    * checking installed package size ... OK
    * checking package directory ... OK
    * checking DESCRIPTION meta-information ... OK
    * checking top-level files ... OK
    * checking for left-over files ... OK
    * checking index information ... OK
    * checking package subdirectories ... OK
    * checking R files for non-ASCII characters ... OK
    * checking R files for syntax errors ... OK
    * checking whether the package can be loaded ... OK
    * checking whether the package can be loaded with stated dependencies ... OK
    * checking whether the package can be unloaded cleanly ... OK
    * checking whether the namespace can be loaded with stated dependencies ... OK
    * checking whether the namespace can be unloaded cleanly ... OK
    * checking use of S3 registration ... OK
    * checking dependencies in R code ... OK
    * checking S3 generic/method consistency ... OK
    * checking replacement functions ... OK
    * checking foreign function calls ... OK
    * checking R code for possible problems ... OK
    * checking Rd files ... OK
    * checking Rd metadata ... OK
    * checking Rd line widths ... OK
    * checking Rd cross-references ... OK
    * checking for missing documentation entries ... OK
    * checking for code/documentation mismatches ... OK
    * checking Rd \usage sections ... OK
    * checking Rd contents ... OK
    * checking for unstated dependencies in examples ... OK
    * checking examples ... OK
    * checking PDF version of manual ... OK
    * DONE
    
    Status: 1 NOTE
    See
      ‘/Users/me/code/github-myforks/Metrics/Metrics.Rcheck/00check.log’
    for details.
    
    opened by ledell 1
  • Bumped up version number in setup.py to make pip install latest version with Python 3 fixes.

    Currently, the Python ml_metrics package on PyPI is not Python 3 compatible because it is out of date. I've bumped the version number up to avoid any conflicts, so if you could please run "python setup.py register" and "python setup.py sdist upload" to push this latest version, that would be extremely helpful.

    We at ETS have recently released an ML package, SciKit-Learn Laboratory (SKLL), that relies on ml_metrics for kappa, and we don't want to have to repackage your code with ours unless absolutely necessary.

    opened by dan-blanchard 1
  • Forgot an `import sys` in `setup.py`?

    It's such a small thing I think I may be the one missing something here.

    I cloned the repo, ran python setup.py build, and ran into:

    Traceback (most recent call last):
      File "setup.py", line 9, in <module>
        if sys.version_info >= (3,):
    NameError: name 'sys' is not defined
    

    Of course, doing an import sys fixes it right up.

    Just thought I'd let you know!

    opened by vietjtnguyen 1
  • Something seems wrong with kappa

    Adding tests which currently fail -- so we have something to check against.

    Also -- I am not familiar with git, so let's hope I am pushing the right buttons.

    opened by OlexiyO 1
  • AP@K Calculate

    https://github.com/benhamner/Metrics/blob/9a637aea795dc6f2333f022b0863398de0a1ca77/Python/ml_metrics/average_precision.py#L32

    Hello: I notice that running apk([1, 1, 1], [1, 1, 1], 3) does not return 1. I wonder if it should be if p in actual and p in predicted[:i+1]? Thank you.

    opened by yanyijiang09 0
  • Installation problem

    When I tried to install with pip in a virtual environment, it threw the error below.

    $ pip install ml_metrics
    Collecting ml_metrics
      Using cached ml_metrics-0.1.4.tar.gz (5.0 kB)
      Preparing metadata (setup.py) ... error
      error: subprocess-exited-with-error

      × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [1 lines of output]
          error in ml_metrics setup command: use_2to3 is invalid.
          [end of output]

      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

    × Encountered error while generating package metadata.
    ╰─> See above for output.

    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.

    opened by chaitanya-kolliboyina 0
  • Fix average precision at k calculation

    This PR fixes #49. According to the Wikipedia page on Average Precision, the metric is defined as AP@K = (1 / number of relevant documents) * sum over k = 1..K of P(k) * rel(k), where rel(k) is an indicator function equaling 1 if the item at rank k is a relevant document and zero otherwise. Note that the average is over all relevant documents, and relevant documents that are not retrieved get a precision score of zero. Before, the average was calculated over the minimum of the length of the actual list and k, which doesn't seem right: as the length of the actual list or k increases, the AP@K decreases. I fixed and cleaned up the code. The current behavior could lead to many mistakes, so please consider merging this!
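
    A sketch of AP@K following the definition quoted above (dividing by the number of relevant documents rather than by min(len(actual), k)); this illustrates the PR's argument and is not the merged code:

    def apk_wiki(actual, predicted, k=10):
        # AP@K per the definition above: precision is averaged at each hit,
        # then divided by the number of relevant items.
        predicted = predicted[:k]
        hits, score = 0, 0.0
        for i, p in enumerate(predicted):
            if p in actual and p not in predicted[:i]:  # count each relevant item only once
                hits += 1
                score += hits / (i + 1.0)
        return score / len(actual) if actual else 0.0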

    opened by raminqaf 0
  • wrong ap@k

    After I run the following in my anaconda3 environment:

    $ pip install ml_metrics
    Collecting ml_metrics
    Requirement already satisfied: numpy in /home/westwood/anaconda3/lib/python3.7/site-packages (from ml_metrics) (1.15.1)
    Requirement already satisfied: pandas in /home/westwood/anaconda3/lib/python3.7/site-packages (from ml_metrics) (0.23.4)
    Requirement already satisfied: python-dateutil>=2.5.0 in /home/westwood/anaconda3/lib/python3.7/site-packages (from pandas->ml_metrics) (2.7.3)
    Requirement already satisfied: pytz>=2011k in /home/westwood/anaconda3/lib/python3.7/site-packages (from pandas->ml_metrics) (2018.5)
    Requirement already satisfied: six>=1.5 in /home/westwood/anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.5.0->pandas->ml_metrics) (1.11.0)
    Installing collected packages: ml-metrics
    Successfully installed ml-metrics-0.1.4

    In the file, [screenshot] is wrong, and differs from [screenshot].

    opened by MentalOmega 2
Owner: Ben Hamner, Co-founder and CTO of Kaggle