Histogramming for analysis powered by boost-histogram

Scikit-HEP Project

Last update: Dec 25, 2022


Overview

Hist

Hist is an analyst-friendly front-end for boost-histogram, designed for Python 3.7+ (3.6 users get version 2.4).

Installation

You can install this library from PyPI with pip:

python3 -m pip install "hist[plot]"

If you do not need the plotting features, you can skip the [plot] extra.

Features

Hist currently provides everything boost-histogram provides, and the following enhancements:

Hist augments axes with names:
- name= is a unique label describing each axis.
- label= is an optional string that is used in plotting (defaults to name if not provided).
- Indexing, projection, and more support named axes.
- Experimental NamedHist is a Hist that disables most forms of positional access, forcing users to use only names.
The Hist class augments bh.Histogram with simpler construction:
- flow=False is a fast way to turn off flow for the axes on construction.
- Storages can be given by string.
- storage= can be omitted, strings and storages can be positional.
- data= can initialize a histogram with existing data.
- Hist.from_columns can be used to initialize with a DataFrame or dict.
- You can cast back and forth with boost-histogram (or any other extensions).
Hist supports QuickConstruct, an import-free construction system:
- Use Hist.new.<axis>().<axis>().<storage>().
- Axes names can be full (Regular) or short (Reg).
- Histogram arguments (like data=) can go in the storage.
Extended Histogram features:
- Direct support for .name and .label, like axes.
- .density() computes the density as an array.
- .profile(remove_ax) can convert an ND COUNT histogram into an (N-1)D MEAN histogram.
- .sort(axis) supports sorting a histogram by a categorical axis. Optionally takes a function to sort by.
Hist implements UHI+, an extension to the UHI (Unified Histogram Indexing) system designed for import-free interactivity:
- Uses j suffix to switch to data coordinates in access or slices.
- Uses j suffix on slices to rebin.
- Strings can be used directly to index into string category axes.
Quick plotting routines encourage exploration:
- .plot() provides 1D and 2D plots (or use plot1d(), plot2d())
- .plot2d_full() shows 1D projections around a 2D plot.
- .plot_ratio(...) makes a ratio plot between the histogram and another histogram or callable.
- .plot_pull(...) performs a pull plot.
- .plot_pie() makes a pie plot.
- .show() provides a nice str printout using Histoprint.
Stacks: work with groups of histograms with identical axes
- Stacks can be created with h.stack(axis), using index or name of an axis (StrCategory axes ideal).
- You can also create with hist.stacks.Stack(h1, h2, ...), or use from_iter or from_dict.
- You can index a stack, and set an entry with a matching histogram.
- Stacks support .plot() and .show(), with names (plot labels default to original axes info).
- Stacks pass through .project, *, +, and -.
New modules
- intervals supports frequentist coverage intervals.
Notebook ready: Hist has gorgeous in-notebook representation.
- No dependencies required
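The `.density()` and `.profile()` bullets above can be made concrete with a plain-NumPy sketch. It only illustrates the arithmetic and is not how Hist implements these methods: it assumes a 2D COUNT histogram held as an array indexed `[x_bin, y_bin]`, represents each y bin by its center, and ignores flow bins. The helper names `density` and `profile_over_y` are hypothetical.

```python
import numpy as np


def density(counts: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """1D density: counts / (total count * bin width), so the result integrates to 1."""
    widths = np.diff(edges)
    return counts / (counts.sum() * widths)


def profile_over_y(counts: np.ndarray, y_edges: np.ndarray) -> np.ndarray:
    """Collapse a 2D COUNT array indexed [x_bin, y_bin] into the mean y per x bin.

    Each y bin is represented by its center; x bins with no entries give NaN.
    """
    y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
    n_per_x = counts.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (counts * y_centers).sum(axis=1) / n_per_x
```

For example, two bins of width 0.5 holding 1 and 3 entries give densities of 0.5 and 1.5, which integrate to 1.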

Usage

from hist import Hist

# Quick construction, no other imports needed:
h = (
  Hist.new
  .Reg(10, 0, 1, name="x", label="x-axis")
  .Var(range(10), name="y", label="y-axis")
  .Int64()
)

# Filling by names is allowed:
h.fill(y=[1, 4, 6], x=[3, 5, 2])

# Names can be used to manipulate the histogram:
h.project("x")
h[{"y": 0.5j + 3, "x": 5j}]

# You can access data coordinates or rebin with a `j` suffix:
h[.3j:, ::2j] # x from .3 to the end, y is rebinned by 2

# Elegant plotting functions:
h.plot()
h.plot2d_full()
h.plot_pull(Callable)  # Callable stands for your fit function; pull plots need a 1D histogram
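To make the `j` suffix concrete, here is a plain-NumPy sketch of the two operations on a regular axis without flow bins: turning a data coordinate into a bin index (as in `.3j`) and merging adjacent bins (as in `::2j`). The helpers `value_to_index` and `rebin` are hypothetical, and this sketch simply drops a trailing partial group when the bin count is not divisible by the factor; Hist's own handling of that case may differ.

```python
import math

import numpy as np


def value_to_index(value: float, start: float, stop: float, nbins: int) -> int:
    """Index of the regular bin containing `value` (negative or >= nbins means out of range)."""
    # Scale to the unit interval first, then multiply by the number of bins
    return math.floor((value - start) / (stop - start) * nbins)


def rebin(counts: np.ndarray, factor: int) -> np.ndarray:
    """Merge each group of `factor` adjacent bins by summing; a leftover partial group is dropped."""
    usable = (len(counts) // factor) * factor
    return counts[:usable].reshape(-1, factor).sum(axis=1)
```

With the x axis above (10 bins on [0, 1]), `value_to_index(0.3, 0, 1, 10)` gives 3, so `.3j:` starts at the fourth bin.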

Development

From a git checkout, either use nox, or run:

python -m pip install -e ".[dev]"

See Contributing guidelines for information on setting up a development environment.

Contributors

We would like to acknowledge the contributors that made this project possible (emoji key: 🚧 maintenance, 💻 code, 📖 documentation):

- Henry Schreiner 🚧 💻 📖
- Nino Lau 🚧 💻 📖
- Chris Burr 💻
- Nick Amin 💻
- Eduardo Rodrigues 💻
- Andrzej Novak 💻
- Matthew Feickert 💻
- Kyle Cranmer 📖
- Daniel Antrim 💻
- Nicholas Smith 💻
- Michael Eliachevitch 💻
- Jonas Eschle 📖

This project follows the all-contributors specification.


Acknowledgements

This library was primarily developed by Henry Schreiner and Nino Lau.

Support for this work was provided by the National Science Foundation cooperative agreement OAC-1836650 (IRIS-HEP) and OAC-1450377 (DIANA/HEP). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Comments

feat: Add ratio plot support through .plot_ratio API
Resolves #148

This PR adds support for ratio plots to the .plot... API by adding .plot_ratio and refactoring .plot_pull to use .plot_ratio.

This PR is still very rough, but I thought I'd open it now that there is a minimal working example, and then revise it heavily from there.

As the commit log is huge and messy I'll rebase and squash things into more reasonable commit sections to make review more digestible.

Questions to come to consensus on

(Discussion can happen on Issue #148)

Should the interval functions be moved to their own stat_interval module?

Answer: Yes, this is PR #176.

TODO before requesting review

[x] Finish refactoring .plot_pull

[x] Add kwargs and kwargs filtering support

[x] Fix typing errors

[x] Add tests

[x] All tests are passing

[x] Docstrings added

[x] README updated

[x] Changelog updated.

Suggested squash and merge message

* Add .plot_ratio to BaseHist API
  - Adds ratio plot support for other Hists and callables
* Factor out plot_ratio and plot_pull to perform the final subplot plotting
  - Majority of logic is factored out of plot_pull into _plot_ratiolike
  - _plot_ratiolike then calls plot_ratio or plot_pull as needed
* Add plotting tests and test images for plot_ratio
* Update test image for plot_pull
* Update README and Changelog to reflect addition of plot_ratio

Co-authored-by: Henry Schreiner <[email protected]>
enhancement
opened by matthewfeickert
Hist.plot_pull: more suitable bands in the pull bands 1sigma, 2 sigma, etc.

I was playing with Hist.plot_pull and noticed that the range of the pulls is dynamic: it is calculated from max(np.abs(pulls)) / pp_num in the code. This is not ideal since (1) pull plots are basically always displayed between -5 and 5, and (2) several pull plots shown side by side would likely each end up with a different range.

This little PR fixes the issue. It also makes sure that the colour bands by default are set at 1/2/.../5 sigma.
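A minimal NumPy sketch of the behavior described here, assuming Poisson uncertainties (σ = √N on the observed counts): pulls are computed per bin, the y-range is fixed at ±5 rather than derived from max(np.abs(pulls)), and the bands sit at 1 to 5 σ. `pulls`, `band_limits`, `PULL_YLIM` and `BAND_SIGMAS` are hypothetical names, not hist's internals.

```python
import numpy as np

PULL_YLIM = (-5.0, 5.0)        # fixed y-range shared by every pull plot
BAND_SIGMAS = [1, 2, 3, 4, 5]  # shaded bands at 1σ, 2σ, ..., 5σ


def pulls(observed, expected) -> np.ndarray:
    """(observed - expected) / sqrt(observed); bins with no entries get a pull of 0."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    sigma = np.sqrt(observed)  # Poisson uncertainty on the observed counts
    out = np.zeros_like(observed)
    np.divide(observed - expected, sigma, out=out, where=sigma > 0)
    return out


def band_limits(sigmas=BAND_SIGMAS):
    """(low, high) y-limits of each band, e.g. (-1, 1) for the 1σ band."""
    return [(-s, s) for s in sigmas]
```

Because the limits no longer depend on the data, pull plots placed side by side share the same scale.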

I will post an issue with some other matters related to the function.

Next week I will be giving some lectures and will be showing Hist. If at all possible it would be great to have this in a release :-).

opened by eduardo-rodrigues
Should Every Axis Have a Name?

https://github.com/scikit-hep/hist/blob/be2c8380793807966f1ab58c4e50962fae7e45f9/src/hist/_internal/axis.py#L5-L17

Should every axis be forced to have a name? I see you made title a required argument (it cannot be omitted), so name should have the same status and be required, too. But in the unit tests, we are supposed to pass tests where axes have no names, e.g.

https://github.com/scikit-hep/hist/blob/be2c8380793807966f1ab58c4e50962fae7e45f9/tests/test_general.py#L5

and

https://github.com/scikit-hep/hist/blob/be2c8380793807966f1ab58c4e50962fae7e45f9/tests/test_named.py#L6

P.S. The current practice, i.e., #13, is to modify the tests so that all axes have names. An alternative plan is to make both title and name optional, which would cause problems for filling by name (that functionality would have to be disabled for histograms with anonymous axes).

opened by LovelyBuggies

More flexible fitting function, allow likelihood, remove uncertainties dependency

Based on the discussion in https://github.com/scikit-hep/hist/issues/146, I added some features to plot_pull. The theme is making fitting more streamlined for exploration.

Pull curve_fit initial guess (p0) from default arguments, if they exist
Allow a string as an alternative to a lambda function (plot_pull("a+b*x"))
Cosmetic change to the band, and embedding fit result into the legend
Likelihood fit (plot_pull(..., likelihood=True)) (chi2 by default, as before)
Remove the uncertainties.unumpy dependency and construct the band by resampling the covariance matrix
Introduce iminuit (gets the covariance matrix right, unlike scipy.optimize most of the time), but the initial guesses for iminuit are seeded from scipy
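The first bullet (taking curve_fit's p0 from default arguments) can be sketched with the standard library's `inspect` module. This is an illustration under assumptions, not the code from the PR: the first parameter is assumed to be x, and the hypothetical helper `p0_from_defaults` fills parameters that lack a default with 1.0, which is what scipy.optimize.curve_fit uses for every parameter when p0 is None.

```python
import inspect
from typing import Callable, List, Optional


def p0_from_defaults(func: Callable) -> Optional[List[float]]:
    """Initial guesses for curve_fit taken from func's default arguments.

    The first parameter is taken to be x and skipped. Returns None when no
    parameter has a default, so curve_fit keeps its own behavior (all ones).
    Parameters without a default are given 1.0 in this sketch.
    """
    params = list(inspect.signature(func).parameters.values())[1:]
    if all(p.default is inspect.Parameter.empty for p in params):
        return None
    return [
        1.0 if p.default is inspect.Parameter.empty else float(p.default)
        for p in params
    ]
```

The result can be passed straight through, e.g. `curve_fit(func, x, y, p0=p0_from_defaults(func))`.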

Setup

import numpy as np
from hist import Hist

np.random.seed(42)
hh = Hist.new.Reg(50, -5, 5).Double().fill(np.random.normal(0,1,int(1e5)))

Before (including a bug-fix for the variances from the above issue):

from uncertainties import unumpy as unp
def func(x, constant, mean, sigma):
    exp = unp.exp if constant.dtype == np.dtype("O") else np.exp
    return constant * exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))

hh.plot_pull(func)

After:

# as before, but no need for `uncertainties.unumpy`, as the error band comes
# from resampling the covariance matrix
def func(x, constant, mean, sigma):
    return constant * np.exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))
hh.plot_pull(func)

# `curve_fit` `p0` extracted from defaults, if any
def func(x, constant=80, mean=0., sigma=1.):
    return constant * np.exp(-((x - mean) ** 2.0) / (2 * sigma ** 2))
hh.plot_pull(func)

# strings are accepted, which is more compact than a lambda
# x is assumed to be the main variable
hh.plot_pull("constant*np.exp(-(x-mean)**2. / (2*sigma**2))")

# a Gaussian is a common special function, so this also works
# reasonable guesses are made for constant/mean/sigma
hh.plot_pull("gaus")

# chi2 puts `a` around 5, but likelihood puts `a` around 1e3/50 = 20
hh.plot_pull("a+b*x", likelihood=True)
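The error band in the "After" examples comes from resampling the covariance matrix. A minimal sketch of that idea, assuming a fit has already produced best-fit parameters `popt` and a covariance `pcov` (as scipy.optimize.curve_fit returns): draw parameter sets from a multivariate normal, evaluate the model for each, and take the central 68% of the curves at every x. The function name `band_from_covariance`, the sample count, and the percentile choice are assumptions of this sketch.

```python
import numpy as np


def band_from_covariance(func, x, popt, pcov, n_samples=1000, seed=0):
    """Approximate ±1σ band on func(x, *params) by resampling the fit parameters.

    Parameters are drawn from a multivariate normal with mean popt and covariance
    pcov; the band is the 15.865th / 84.135th percentile of the resulting curves.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(popt, pcov, size=n_samples)
    curves = np.array([func(x, *p) for p in samples])  # shape (n_samples, len(x))
    low, high = np.percentile(curves, [15.865, 84.135], axis=0)
    return low, high
```

Unlike linear error propagation, this works for any model that can be evaluated on NumPy arrays, at the cost of evaluating it `n_samples` times.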

opened by aminnj

[FEATURE] Typing Hints for `get_item` and `set_item`

Describe the problem, if any, that your feature request is related to

I don't know how to add type hints for these two functions.

https://github.com/scikit-hep/hist/blob/dc9b209dfa5d2fa934a803c9cb589b414314fc13/src/hist/named.py#L50

https://github.com/scikit-hep/hist/blob/dc9b209dfa5d2fa934a803c9cb589b414314fc13/src/hist/named.py#L66
enhancement

opened by LovelyBuggies
feat: axis sort
When completed, closes #222

To implement:

sort by passing index

sort by passing lambda to sorted

add tests

@henryiii could you take a look? I had to abuse some design decisions of hist to get this to work, so maybe you can recommend better workarounds.

AxisNamedTuple (and, I assume, axes as well) cannot be assigned to, so I had to recreate the sorted axis

I couldn't easily copy all metadata/traits from the old axis to the new one, hence the helper using inspect (name/label don't seem to be part of traits/metadata). I also couldn't create an axis and then edit its metadata.

Here are a couple of examples (images not reproduced): 2D sorted on the y axis, 2D sorted on the x axis, 1D, and 3D sorted on a non-projected axis.
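Conceptually, sorting a categorical axis means sorting its labels and applying the same permutation to the bin contents along that axis. The sketch below shows only that idea with plain arrays, assuming no flow bins; it does not rebuild a hist axis or copy its metadata, and `sort_category` is a hypothetical helper.

```python
from typing import Callable, List, Optional, Sequence, Tuple

import numpy as np


def sort_category(
    labels: Sequence[str],
    counts: np.ndarray,
    key: Optional[Callable[[str], object]] = None,
    axis: int = 0,
) -> Tuple[List[str], np.ndarray]:
    """Sort a categorical axis and permute the bin contents along `axis` to match.

    Without `key`, labels are sorted alphabetically; with `key`, by key(label).
    """
    sort_key = key if key is not None else (lambda label: label)
    order = sorted(range(len(labels)), key=lambda i: sort_key(labels[i]))
    return [labels[i] for i in order], np.take(counts, order, axis=axis)
```

Passing `axis=1` reorders the second dimension, which covers sorting on an axis other than the first.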
opened by andrzejnovak

feat: adding something like the classic hist

Addressing #35.

Currently it seems that axes without names are broken in Hist. If I do a new pip install -e .[dev], then try, from a command line:

hist
1
2
3
2

Then, after pressing Ctrl-D, I get:

Traceback (most recent call last):
  File "/Users/henryschreiner/git/scikit-hep/hist/.env/bin/hist", line 33, in <module>
    sys.exit(load_entry_point('hist', 'console_scripts', 'hist')())
  File "/Users/henryschreiner/git/scikit-hep/hist/src/hist/classic_hist.py", line 22, in main
    h = bh.numpy.histogram(values, bins=args.buckets, histogram=hist.Hist)
  File "/Users/henryschreiner/git/scikit-hep/hist/.env/lib/python3.8/site-packages/boost_histogram/numpy.py", line 111, in histogram
    return histogramdd((a,), (bins,), (range,), normed, weights, density, **kwargs)
  File "/Users/henryschreiner/git/scikit-hep/hist/.env/lib/python3.8/site-packages/boost_histogram/numpy.py", line 74, in histogramdd
    hist = cls(*axs, storage=bh_storage).fill(*a, weight=weights, threads=threads)
  File "/Users/henryschreiner/git/scikit-hep/hist/src/hist/core.py", line 38, in __init__
    if ax.name in self.names:
  File "/Users/henryschreiner/git/scikit-hep/hist/src/hist/_internal/axis.py", line 52, in name
    return self.metadata["name"]
TypeError: 'NoneType' object is not subscriptable

@LovelyBuggies, could you take a look? This is really just running bh.numpy.histogram([1,2,3,2], histogram=hist.Hist).

opened by henryiii

[SUPPORT] How to Check Test files by Mypy CI

Describe your questions

I have set up mypy following https://scikit-hep.org/developer/style#type-checking-new. My pre-commit config file looks like this:

repos:
- repo: https://github.com/psf/black
  rev: 19.10b0
  hooks:
  - id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v2.5.0
  hooks:
  - id: check-added-large-files
  - id: mixed-line-ending
  - id: trailing-whitespace
  - id: check-merge-conflict
  - id: check-case-conflict
  - id: check-symlinks
  - id: check-yaml
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.770
  hooks:
  - id: mypy
    files: all  # I tried tests and src, but none worked

I deliberately put a type error in a test file, but pre-commit did not catch it. How can I make the mypy check cover the unit tests?

question

opened by LovelyBuggies

feat: adding profile for COUNT -> MEAN conversion
Closes #156, adds a .profile method. Based heavily on @aminnj's example.

TODO:

[x] Needs tests

This could probably be a WeightedMean; maybe give a choice? I just got the basics working for now. (Can revisit later)

[x] Should hide or handle warnings (true for some of our tests, too)

Followup:

histoprint should handle showing kind=MEAN histograms (it doesn't like the NaNs, I think) CC @ast0815 (edit: done!)

mplhep should show a better plot style for kind=MEAN, and should also ensure it handles kind=MEAN's requirements (we could also dispatch differently, but the best fix would be in mplhep) CC @andrzejnovak

Boost-histogram should describe accumulators/storages a bit more in the docs, especially setting with a stack, including the correct order. CC Me

If kind=MEAN, you are only supposed to plot variances if counts() > 1, and values if counts() >= 1.
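The rule above (values need counts() >= 1, variances need counts() > 1) can be expressed as a small NumPy masking step. This sketch assumes the values, variances and counts of a MEAN-kind histogram are available as arrays; `mean_plot_arrays` is a hypothetical helper, not part of hist, histoprint, or mplhep.

```python
import numpy as np


def mean_plot_arrays(values, variances, counts):
    """Hide bins of a MEAN-kind histogram that should not be drawn.

    Values are kept where counts >= 1 and variances where counts > 1; all
    other bins become NaN so a plotting routine can skip them.
    """
    values = np.asarray(values, dtype=float)
    variances = np.asarray(variances, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return (
        np.where(counts >= 1, values, np.nan),
        np.where(counts > 1, variances, np.nan),
    )
```

A bin with exactly one entry therefore shows its mean but no error bar, since a sample variance is undefined for a single entry.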
opened by henryiii
Fix Hist and NamedHist
Does named indexing work on all Hists? I don't see the code for it in BaseHist, only in NamedHist. Something like h[{"x": 2}] should be converted into h[{0: 2}], assuming that axis 0 has the name "x".

NamedHist should simply verify that the index item is a dict, and that it has only string keys. Then it should just call BaseHist's indexing.

NamedHist's fill should simply be: def fill(self, **kwargs): return super().fill(**kwargs) (maybe with a few more keyword arguments listed to be nice to the user's inspection tools). BaseHist should be able to fill via kwargs or via position.

Overall, NamedHist should be pretty short; it just takes away functionality (position-based access) from BaseHist. I'm not sure how strict we want to be; we could also have an OnlyNamedAxesTuple.

Also we should make sure you can cast between Hist and NamedHist. Hist(NamedHist(...)), etc. I can do that one.
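The name-to-position conversion proposed above can be sketched in a few lines of plain Python. This assumes the axis names are available as an ordered sequence; `to_positional` is a hypothetical helper, and the real BaseHist code may differ.

```python
from typing import Any, Dict, Mapping, Sequence, Union


def to_positional(
    index: Mapping[Union[int, str], Any], names: Sequence[str]
) -> Dict[int, Any]:
    """Turn a name-keyed index such as {"x": 2} into {0: 2}.

    `names` lists the axis names in axis order; integer keys pass through unchanged.
    """
    positional: Dict[int, Any] = {}
    for key, item in index.items():
        if isinstance(key, str):
            if key not in names:
                raise KeyError(f"no axis named {key!r}")
            key = list(names).index(key)
        positional[key] = item
    return positional
```

A NamedHist-style wrapper could additionally reject any non-string key before calling this, which is all the extra strictness it would need.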
bug
opened by henryiii
[BUG] Uncertainty bands in efficiency ratios go above unity
Describe the bug

The error bars produced in the ratio of an efficiency style plot (hist.plot_ratio(..., rp_uncertainty_type = "poisson-ratio", ...)) can extend above unity. This is unexpected and not meaningful if the numerator is a true subset of the denominator.

Steps to reproduce

This is observed using hist==2.4.0.

Following the example from the docs,

import hist
import matplotlib.pyplot as plt
import numpy as np

hist0 = hist.Hist(hist.axis.Regular(50, -5, 5, underflow=False, overflow=False)).fill(
    np.random.normal(size=1700)
)
hist1 = hist0.copy() * 0.98
fig = plt.figure(figsize=(10, 8))
_ = hist1.plot_ratio(
    hist0,
    rp_ylabel="Efficiency",
    rp_num_label="hist1",
    rp_denom_label="hist0",
    rp_uncert_draw_type="line",
    rp_uncertainty_type="poisson-ratio",
    rp_ylim=[0.9, 1.1],
)

This produces a plot (image not reproduced here) in which the error bars extend above unity.

If I take the same data used in hist0 and hist1 above and produce a TEfficiency object in ROOT, I get a plot (image not reproduced here) that is closer to what I would expect, since I believe hist is using a Clopper-Pearson interval (the default in ROOT).
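For comparison, here is a standard-library sketch of the Wilson score interval for a binomial efficiency k/n. Note that this is a different method from the Clopper-Pearson interval mentioned in the report; it is swapped in here only because it needs no SciPy. Like Clopper-Pearson, it always stays within [0, 1] for 0 <= k <= n, which is the property expected of an efficiency band. `wilson_interval` is a hypothetical helper, not part of hist.

```python
import math
from typing import Tuple


def wilson_interval(k: float, n: float, z: float = 1.0) -> Tuple[float, float]:
    """Wilson score interval for an efficiency k/n; z=1 gives roughly 68% coverage.

    Returns (0, 1) when there are no trials (n <= 0).
    """
    if n <= 0:
        return 0.0, 1.0
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

Even for k = n the upper edge equals 1 (up to rounding), unlike a symmetric Poisson-style error on the ratio.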
bug
opened by dantrim
[FEATURE] dask histogram backend
Describe the problem, if any, that your feature request is related to

Right now the https://github.com/dask-contrib/dask-histogram package exposes an interface that is very nearly 1:1 with boost histogram, which I think is the correct scope for this package. However, many users of columnar analysis tools in HEP prefer hist since its UI is more intuitive and comfortable. It would be best to have a sub-module within hist that exposed a dask-histogram backed interface and suite of tools.

Describe the feature you'd like

I would like users to be able to do something like:

import hist.dask as hist

or

import hist
h = hist.Hist(**axes).as_dask()

and have access to the hist interface with (nearly) all of its features but backed by computation in dask.

This should be close to achievable in practice, since dask_histogram already very closely follows (and mostly directly uses) the boost-histogram interface, which means that most of what hist does should be easy to adapt.

This will require a few pieces of interface to be smoothed out in dask_histogram since it's not 1:1 with boost-histogram in some cases. (e.g. what is mentioned in the bottom of the comment here: https://github.com/dask-contrib/dask-histogram/issues/35#issuecomment-1368108690)

Describe alternatives, if any, you've considered

The alternative is using the dask histogram interface directly, which while effective, is significantly less widely used and less pleasant than hist. This would require significant readoption and generate attrition risk in our user base.
enhancement
opened by lgray
chore(deps): bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4
Bumps pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4.

Release notes

Sourced from pypa/gh-action-pypi-publish's releases.

v1.6.4

oh, boi! again?

This is the last one tonight, promise! It fixes this embarrassing bug that was actually caught by the CI but got overlooked due to the lack of sleep. TL;DR GH passed $HOME from the external env into the container and that tricked Python's site module into thinking that the home directory is elsewhere, adding non-existent paths to the env vars. See #115.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.3...v1.6.4

v1.6.3

Another Release!? Why?

In pypa/gh-action-pypi-publish#112, it was discovered that passing a $PATH variable even breaks the shebang. So this version adds more safeguards to make sure it keeps working with a fully broken $PATH.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.2...v1.6.3

v1.6.2

What's Fixed

Made the $PATH and $PYTHONPATH environment variables resilient to broken values passed from the host runner environment, which previously allowed the users to accidentally break the container's internal runtime as reported in pypa/gh-action-pypi-publish#112

Internal Maintenance Improvements

Added a devpi-based smoke-test GitHub Actions CI/CD workflow by @sesdaile-varmour in pypa/gh-action-pypi-publish#111

New Contributors

@sesdaile-varmour made their first contribution in pypa/gh-action-pypi-publish#111

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.1...v1.6.2

v1.6.1

What's happened?!

There was a sneaky bug in v1.6.0 which caused Twine to be outside the import path in the Python runtime. It is fixed in v1.6.1 by updating $PYTHONPATH to point to a correct location of the user-global site-packages/ directory.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.0...v1.6.1

v1.6.0

Anything's changed?

The only update is that the Python runtime has been upgraded from 3.9 to 3.11. There are no functional changes in this release.

Full Changelog: https://github.com/pypa/gh-action-pypi-publish/compare/v1.5.2...v1.6.0

v1.5.2

What's Improved

Implemented the Twine transitive dependency tree pinning using pip-tools-generated constraint files. See pypa/gh-action-pypi-publish#107 and pypa/gh-action-pypi-publish#101 for details.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.5.1...v1.5.2

Commits

c7f29f7 🐛 Override $HOME in the container with /root

644926c 🧪 Always run smoke testing in debug mode

e71a4a4 Add support for verbose bash execusion w/ $DEBUG

e56e821 🐛 Make id always available in twine-upload

c879b84 🐛 Use full path to bash in shebang

57e7d53 🐛Ensure the default $PATH value is pre-loaded

ce291dc 🎨🐛Fix the branch @ pre-commit.ci badge links

102d8ab 🐛 Rehardcode devpi port for GHA srv container

3a9eaef 🐛Use different ports in/out of GHA containers

a01fa74 🐛 Use localhost @ GHA outside the containers

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies
opened by dependabot[bot]

Make it easier to plot ratio of multiple histograms [FEATURE]

In:

import matplotlib.pyplot as plt
import numpy
from   hist import Hist

#------------------------------------------------------
def get_hist():
    h = (
      Hist.new
      .Reg(10, -2, 2, name='x', label='x-axis')
      .Int64()
    )
    data=numpy.random.normal(size=10000)
    h.fill(data)

    return h
#------------------------------------------------------
def main():
    h_1 = get_hist()
    h_2 = get_hist()
    h_3 = get_hist()

    fig, (ax, rax) = plt.subplots(nrows=2, gridspec_kw={"height_ratios": (3, 1)})
    axs = {"main_ax": ax, "ratio_ax": rax}

    h_2.plot_ratio(h_1,
        rp_ylabel          ='Ratio',
        rp_num_label       ='data',
        rp_denom_label     ='sim 1',
        rp_uncert_draw_type='bar',
        ax_dict = axs,
    )

    h_3.plot_ratio(h_1,
        rp_ylabel          ='Ratio',
        rp_num_label       ='data',
        rp_denom_label     ='sim 2',
        rp_uncert_draw_type='bar',
        ax_dict = axs,
    )

    plt.show()
#------------------------------------------------------
if __name__ == '__main__':
    main()

With the code above:

- There is no easy way to plot the ratio of data to both simulations; instead, I see repeated histograms.
- The axes are not aligned by default.
- I cannot find documentation (or at least it is not easy to find) for the rp_* arguments.

Describe the feature you'd like

The user should be able to do:

h_1 = get_hist()
h_2 = get_hist()
h_3 = get_hist()

h_1.plot_ratio([h_2, h_3])

plt.show()

and we should get a figure with two panels: the upper one with the h_* histograms overlaid, and the lower one with the two ratios, to h_2 and h_3. The axes should be aligned, the labels and legends should be taken from the histograms themselves, and we should not have to manipulate any artists.

Describe alternatives, if any, you've considered

The way the code is implemented is bad, it's too complicated, and I do not have time to make it work the way I need to, so I am moving back to pure matplotlib. The plots I need do not need to be perfect and matplotlib is good enough for me. It would be nice if hist can do quickly what we need though.

Cheers.

enhancement

opened by angelcampoverde

chore: update pre-commit hooks
updates:

github.com/psf/black: 22.10.0 → 22.12.0

github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.4.0

github.com/PyCQA/isort: 5.10.1 → 5.11.4

github.com/asottile/pyupgrade: v3.2.0 → v3.3.1

github.com/nbQA-dev/nbQA: 1.5.3 → 1.6.0

github.com/pycqa/flake8: 5.0.4 → 6.0.0

github.com/pre-commit/mirrors-mypy: v0.982 → v0.991

github.com/mgedmin/check-manifest: 0.48 → 0.49

github.com/shellcheck-py/shellcheck-py: v0.8.0.4 → v0.9.0.2
opened by pre-commit-ci[bot]

[BUG] Raise an error when adding hists of different storage types.

Describe the bug

Adding two hists of different storage types just results in having an empty hist without raising an error:

Steps to reproduce

import numpy as np
from hist import Hist
import hist
a_hist = hist.Hist(hist.axis.Regular(3, -1, 1), storage=hist.storage.Int64())
a_hist.sum()

0.0

b_hist = hist.Hist(hist.axis.Regular(3, -1, 1), storage=hist.storage.Double())
b_hist.fill(np.random.normal(size=1000))
b_hist.sum()

681.0

a_hist += b_hist
a_hist.sum()

0.0

opened by JohanWulff

[FEATURE] plot_ratio() to support Weighted histograms

Describe the problem, if any, that your feature request is related to

I believe the current plot_ratio() method does not take into account the weights of the histograms when calculating the errors.

Describe the feature you'd like

Propagate errors correctly into the ratio uncertainty, taking into account the weights

Describe alternatives, if any, you've considered

Using coffea.hist.plotratio instead
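A minimal sketch of first-order error propagation for the ratio of two weighted histograms, assuming the numerator and denominator are uncorrelated. The inputs are per-bin sums of weights and sums of squared weights; with a boost-histogram Weight storage these are what `.values()` and `.variances()` return. `ratio_with_errors` is a hypothetical helper, and this is not how hist's plot_ratio currently computes its band.

```python
import numpy as np


def ratio_with_errors(num_w, num_w2, den_w, den_w2):
    """Ratio of two weighted histograms with first-order error propagation.

    num_w / den_w are per-bin sums of weights and num_w2 / den_w2 the per-bin sums
    of squared weights. Numerator and denominator are treated as uncorrelated:
        var(r) = num_w2 / den_w**2 + num_w**2 * den_w2 / den_w**4
    Bins with den_w == 0 are returned as NaN.
    """
    num_w, num_w2, den_w, den_w2 = (
        np.asarray(a, dtype=float) for a in (num_w, num_w2, den_w, den_w2)
    )
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(den_w != 0, num_w / den_w, np.nan)
        var = num_w2 / den_w**2 + num_w**2 * den_w2 / den_w**4
        err = np.where(den_w != 0, np.sqrt(var), np.nan)
    return ratio, err
```

With unit weights the sums of squared weights equal the counts, so this reduces to the usual Poisson propagation; with non-unit weights the squared-weight sums make the band wider or narrower accordingly.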
enhancement

opened by andreypz

Releases (latest: v2.6.2)

v2.6.2 (Sep 20, 2022)
Nicer stacks repr #449

Backport storage_type if boost-histogram < 1.3.2 #447

Allow overwriting labels for plot/overlay #414

Use Hatching to build the package #418

Support git archival version numbers #441

v2.6.1 (Mar 10, 2022)
Fall back on normal repr when histogram is too large #388

Fix issue with no-axis histogram #388

Fix issue with empty axis causing segfault until fixed upstream #387

Only require SciPy if using SciPy #386

v2.6.0 (Feb 16, 2022)
Using boost-histogram 1.3

Fix runtime dependency on matplotlib when not plotting #353

Fix .plot shortcut failure #368

New nox sessions: regenerate and pylint

Update tests for latest matplotlib

v2.5.2 (Nov 18, 2021)
Remove more-itertools requirement #347

Fix missing pass-through for stack plot #339

v2.5.1 (Sep 21, 2021)
Support named stack indexing #325

Fix histoprint error with stacks #325

Better README

v2.5.0 (Sep 21, 2021)
Stacks support axes, math operations, projection, setting items, and iter/dict construction. They also support histogram titles in legends. Added histoprint support for Stacks. #291 #315 #317 #318

Added name= and label= to histograms, and included Hist arguments in QuickConstruct. #297

AxesTuple now supports bulk name setting, h.axes.name = ("a", "b", ...). #288

Added hist.new alias for hist.Hist.new. #296

Added "efficiency" uncertainty_type option for the plot_ratio API. #266 #278

Smaller features or fixes:

Dropped Python 3.6 support. #194

Uses boost-histogram 1.2.x series, includes all features and fixes, and Python 3.10 support.

No longer require scipy or iminuit unless actually needed. #316

Improve and clarify treatment of confidence intervals in intervals submodule. #281

Use NumPy 1.21 for static typing. #285

Support running tests without plotting requirements. #321

v2.4.0 (Jul 7, 2021)
Support .stack(axis) and stacked histograms. #244 #257 #258

Support selection lists (experimental with boost-histogram 1.1.0). #255

Support full names for QuickConstruct, and support mistaken usage in constructor. #256

Add .sort(axis) for quickly sorting a categorical axis. #243

Smaller features or fixes:

Support nox for easier contributor setup. #228

Better name axis error. #232

Fix for issue plotting size 0 axes. #238

Fix issues with repr information missing. #241

Fix issues with wrong plot shortcut being triggered by Integer axes. #247

Warn and better error if overlapping keyword used as axis name. #250

Along with lots of smaller docs updates.

v2.3.0 (Apr 12, 2021)
Add plot_ratio to the public API, which allows for making ratio plots between the histogram and either another histogram or a callable. #161

Add .profile to compute a (N-1)D profile histogram. #160

Support plot1d / plot on Histograms with a categorical axis. #174

Add frequentist coverage interval support in the intervals module. #176

Allow plot_pull to take a more generic callable or a string as a fitting function. Introduce an option to perform a likelihood fit. Write fit parameters' values and uncertainties in the legend. #149

Add fit_fmt= to plot_pull to control display of fit params. #168

Support <prefix>_kw arguments for setting each axis in plotting. #193

Cleaner IPython completion for Python 3.7+. #179

v2.2.1 (Mar 17, 2021)
Fix bug with plot_pull missing a sqrt. #150

Fix static typing with ellipses. #145

Require boost-histogram 1.0.1+, fixing typing-related issues, allowing subclassing Hist without a family, and including an important Mean/WeightedMean summing fix. #151

v2.2.0 (Mar 9, 2021)
Support boost-histogram 1.0. Better plain reprs. Full Static Typing. #137

Support data= when constructing a histogram to copy in initial data. #142

Support Hist.from_columns, for simple conversion of DataFrames and similar structures #140

Support .plot_pie for quick pie plots #140

v2.1.1 (Mar 4, 2021)
Fix density (and density-based previews) #134

v2.1.0 (Feb 20, 2021)
This version provides many new features from boost-histogram 0.12 and 0.13; see the changelog in boost-histogram for details.

Support shortcuts for setting storages by string or position #129

Updated dependencies:

boost-histogram 0.11.0 to 0.13.0.

Major new features, including PlottableProtocol

histoprint >=1.4 to >=1.6.

mplhep >=0.2.16 when [plot] given

v2.0.1 (Oct 11, 2020)
Hist version 2.0.1 is out. The following fixes are applied:

Make sum of bins explicit in notebook representations. #106

Fixed plot2d_full incorrectly mirroring the y-axis. #105

Hist.plot_pull: more suitable bands in the pull bands 1sigma, 2 sigma, etc. #102

Fixed classichist's usage of get_terminal_size to support not running in a terminal #99

v2.0.0 (Sep 28, 2020)
Final 2.0 release of Hist! Since beta 1, the following changes were made:

Based on boost-histogram 0.11; now supports two way conversion without metadata issues.

mplhep is now used for all plotting. Return types changed; fig dropped, new figures only created if needed.

QuickConstruct was rewritten, uses new.Reg(...).Double(); not as magical but clearer types and usage.

Plotting requirements are no longer required, use [plot] to request.

The following new features were added:

Jupyter HTML reprs were added.

flow=False shortcut added.

Static type checker support for dependent projects.

The following fixes were applied:

.fill was broken for WeightedMean storage.

v2.0.0b1 (Sep 8, 2020)

First beta release. title has been renamed to label. Significant improvements to documentation, and a bug fix for plotting (#87). Uses boost-histogram 0.10+.

v2.0.0a5 (Jul 17, 2020)

Release before the PyHEP preview.

v2.0.0a4 (Jul 16, 2020)

Bumps the version of boost-histogram to the 0.10 series.

v2.0.0a3 (Jul 16, 2020)

A release for the PyHEP 2020 boost-histogram talk.

v2.0.0a2 (Jul 12, 2020)

First release before PyHEP 2020.

v2.0.0a1 (Jul 12, 2020)

First alpha release, just before PyHEP 2020.

Owner

Scikit-HEP Project

GitHub https://hist.readthedocs.io