**Uncertainty Toolbox**

A Python toolbox for predictive uncertainty quantification, calibration, metrics, and visualization.

Also: a glossary of useful terms and a collection of relevant papers and references.

Many machine learning methods return predictions along with uncertainties of some form, such as distributions or confidence intervals. This raises the questions: How do we determine which predictive uncertainties are best? What does it mean to produce a *best* or *ideal* uncertainty? Are our uncertainties accurate and *well calibrated*?

Uncertainty Toolbox provides standard metrics to quantify and compare predictive uncertainty estimates, gives intuition for these metrics, produces visualizations of these metrics/uncertainties, and implements simple "re-calibration" procedures to improve these uncertainties. This toolbox currently focuses on regression tasks.

## Toolbox Contents

Uncertainty Toolbox contains:

- Glossary of terms related to predictive uncertainty quantification.
- Metrics for assessing quality of predictive uncertainty estimates.
- Visualizations for predictive uncertainty estimates and metrics.
- Recalibration methods for improving the calibration of a predictor.
- Paper list: publications and references on relevant methods and metrics.

## Installation

Uncertainty Toolbox requires Python 3.6+. For a lightweight installation of the package only, run:

`pip install git+https://github.com/uncertainty-toolbox/uncertainty-toolbox`

For a full installation with examples and tests, run:

```
git clone https://github.com/uncertainty-toolbox/uncertainty-toolbox.git
cd uncertainty-toolbox
pip install -e .
```

To verify correct installation, you can run the test suite via:

`source shell/run_all_tests.sh`

## Quick Start

```
import uncertainty_toolbox as uct
# Load an example dataset of 100 predictions, uncertainties, and ground truth values
predictions, predictions_std, y, x = uct.data.synthetic_sine_heteroscedastic(100)
# Compute all uncertainty metrics
metrics = uct.metrics.get_all_metrics(predictions, predictions_std, y)
```

This example computes metrics for a vector of predicted values (`predictions`) and associated uncertainties (`predictions_std`, a vector of standard deviations), taken with respect to a corresponding set of ground truth values `y`.

**Colab notebook:** You can also take a look at this Colab notebook, which walks through a use case of Uncertainty Toolbox.

## Metrics

Uncertainty Toolbox provides a number of metrics to quantify and compare predictive uncertainty estimates. For example, the `get_all_metrics` function will return:

- **average calibration**: *mean absolute calibration error, root mean squared calibration error, miscalibration area.*
- **adversarial group calibration**: *mean absolute adversarial group calibration error, root mean squared adversarial group calibration error.*
- **sharpness**: *expected standard deviation.*
- **proper scoring rules**: *negative log-likelihood, continuous ranked probability score, check score, interval score.*
- **accuracy**: *mean absolute error, root mean squared error, median absolute error, coefficient of determination, correlation.*
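
As a rough sketch of how these might be inspected in code (the exact layout of the dictionary returned by `get_all_metrics`, including its group keys, is an assumption here rather than something documented above):

```
import uncertainty_toolbox as uct

# Synthetic dataset of predictions, standard deviations, ground truth, and inputs
predictions, predictions_std, y, x = uct.data.synthetic_sine_heteroscedastic(100)

# Compute all metrics; we assume the result is a dictionary keyed by metric group
# (e.g. average calibration, sharpness, proper scoring rules, accuracy)
metrics = uct.metrics.get_all_metrics(predictions, predictions_std, y)

for group_name, group_values in metrics.items():
    print(group_name, group_values)
```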

## Visualizations

The following plots are a few of the visualizations provided by Uncertainty Toolbox. See this example for code to reproduce these plots.

**Overconfident** (*too little uncertainty*)

**Underconfident** (*too much uncertainty*)

**Well calibrated**

And here are a few of the calibration metrics for the above three cases:

| | Mean absolute calibration error (MACE) | Root mean squared calibration error (RMSCE) | Miscalibration area (MA) |
| --- | --- | --- | --- |
| Overconfident | 0.19429 | 0.21753 | 0.19625 |
| Underconfident | 0.20692 | 0.23003 | 0.20901 |
| Well calibrated | 0.00862 | 0.01040 | 0.00865 |
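
A minimal sketch of producing a calibration plot like the ones above (the `plot_calibration` function in the `viz` module, and its call signature, are assumptions about the library's API, not details confirmed in this README):

```
import matplotlib.pyplot as plt
import uncertainty_toolbox as uct

predictions, predictions_std, y, x = uct.data.synthetic_sine_heteroscedastic(100)

# Shrink the standard deviations to simulate an overconfident predictor
overconfident_std = 0.5 * predictions_std

# NOTE: uct.viz.plot_calibration is an assumed API; see the toolbox's viz module
# and examples for the actual plotting functions
uct.viz.plot_calibration(predictions, overconfident_std, y)
plt.show()
```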

## Recalibration

The following plots show the results of a recalibration procedure provided by Uncertainty Toolbox, which transforms a set of predictive uncertainties to improve average calibration. The algorithm is based on isotonic regression, as proposed by Kuleshov et al. (2018).

See this example for code to reproduce these plots.
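
As a rough sketch of how such a recalibration might be applied (the functions `get_proportion_lists_vectorized` and `iso_recal`, their module paths, and the `recal_model` keyword argument are assumptions about the toolbox's API rather than details confirmed in this README):

```
import uncertainty_toolbox as uct

predictions, predictions_std, y, x = uct.data.synthetic_sine_heteroscedastic(100)

# Simulate an overconfident predictor by shrinking the standard deviations
overconfident_std = 0.5 * predictions_std

# Fit an isotonic-regression recalibrator on expected vs. observed coverage
# (the function names and module paths here are assumptions about the toolbox API)
exp_props, obs_props = uct.metrics_calibration.get_proportion_lists_vectorized(
    predictions, overconfident_std, y
)
recal_model = uct.recalibration.iso_recal(exp_props, obs_props)

# Compare average calibration before and after recalibration
# (the recal_model keyword argument is likewise an assumption)
mace_before = uct.metrics_calibration.mean_absolute_calibration_error(
    predictions, overconfident_std, y
)
mace_after = uct.metrics_calibration.mean_absolute_calibration_error(
    predictions, overconfident_std, y, recal_model=recal_model
)
print(f"MACE before: {mace_before:.4f}, after: {mace_after:.4f}")
```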

**Recalibrating overconfident predictions**

| | Mean absolute calibration error (MACE) | Root mean squared calibration error (RMSCE) | Miscalibration area (MA) |
| --- | --- | --- | --- |
| Before Recalibration | 0.19429 | 0.21753 | 0.19625 |
| After Recalibration | 0.01124 | 0.02591 | 0.01117 |

**Recalibrating underconfident predictions**

| | Mean absolute calibration error (MACE) | Root mean squared calibration error (RMSCE) | Miscalibration area (MA) |
| --- | --- | --- | --- |
| Before Recalibration | 0.20692 | 0.23003 | 0.20901 |
| After Recalibration | 0.00157 | 0.00205 | 0.00132 |

## Contributing

We welcome and greatly appreciate contributions from the community! Please see our contributing guidelines for details on how to help out.

## Citation

If you found this toolbox helpful, please cite the following paper:

```
@article{chung2021uncertainty,
  title={Uncertainty Toolbox: an Open-Source Library for Assessing, Visualizing, and Improving Uncertainty Quantification},
  author={Chung, Youngseog and Char, Ian and Guo, Han and Schneider, Jeff and Neiswanger, Willie},
  journal={arXiv preprint arXiv:2109.10254},
  year={2021}
}
```

Additionally, here are papers that led to the development of the toolbox:

```
@article{chung2020beyond,
  title={Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification},
  author={Chung, Youngseog and Neiswanger, Willie and Char, Ian and Schneider, Jeff},
  journal={arXiv preprint arXiv:2011.09588},
  year={2020}
}

@article{tran2020methods,
  title={Methods for comparing uncertainty quantifications for material property predictions},
  author={Tran, Kevin and Neiswanger, Willie and Yoon, Junwoong and Zhang, Qingyang and Xing, Eric and Ulissi, Zachary W},
  journal={Machine Learning: Science and Technology},
  volume={1},
  number={2},
  pages={025006},
  year={2020},
  publisher={IOP Publishing}
}
```

## Acknowledgments

Development of Uncertainty Toolbox is supported by the following organizations.