Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.

Overview

physt

P(i/y)thon h(i/y)stograms. Inspired by (and based on) numpy.histogram, but designed for humans(TM) on steroids(TM).

The goal is to unify the different concepts of histograms found in numpy, pandas, matplotlib, ROOT, etc., and to create one representation that is easy to manipulate from the data point of view while integrating nicely with the IPython notebook and offering various plotting options. In short, whatever you want to do with histograms, physt aims to be on your side.

Note: The bokeh plotting backend has been discontinued (because the external library was redesigned).

Join the chat at https://gitter.im/physt/Lobby

Versioning

  • Versions 0.3.x support Python 2.7 (no new releases in 2019)
  • Versions 0.4.x support Python 3.5+ while continuing the 0.3 API
  • Versions 0.4.9+ support only Python 3.6+ while continuing the 0.3 API
  • Versions 0.5.x slightly change the interpretation of *args in h1, h2, ...

Simple example

from physt import h1

# Create the sample
heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

hist = h1(heights, 10)           # <--- get the histogram data
hist << 190                      # <--- add a forgotten value
hist.plot()                      # <--- and plot it

Heights plot
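
To illustrate what "updateable" means in the example above, the following sketch reproduces the fill step with plain numpy. It is not physt's implementation: the helper name `fill_value` is hypothetical, and the sketch assumes fixed bin edges computed once from a shortened sample.

```python
import numpy as np

heights = [160, 155, 156, 198, 177, 168, 191, 183]

# Fixed bin edges and counts, computed once from the initial sample
counts, edges = np.histogram(heights, bins=10)

def fill_value(counts, edges, value):
    """Increment the bin containing `value` (hypothetical helper).

    Returns the bin index, or None if the value lies outside the edges.
    """
    if value < edges[0] or value > edges[-1]:
        return None
    # The right-most edge is inclusive, as in numpy.histogram
    index = min(np.searchsorted(edges, value, side="right") - 1, len(counts) - 1)
    counts[index] += 1
    return int(index)

total_before = counts.sum()
filled_bin = fill_value(counts, edges, 190)
```

The bin edges stay untouched; only one frequency changes, which is why adding a forgotten value does not require the original data.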

2D example

from physt import h2
import seaborn as sns

iris = sns.load_dataset('iris')
iris_hist = h2(iris["sepal_length"], iris["sepal_width"], "human", bin_count=[12, 7], name="Iris")
iris_hist.plot(show_zero=False, cmap="gray_r", show_values=True);

Iris 2D plot

3D directional example

import numpy as np
from physt import special_histograms

# Generate some sample data
data = np.empty((1000, 3))
data[:,0] = np.random.normal(0, 1, 1000)
data[:,1] = np.random.normal(0, 1.3, 1000)
data[:,2] = np.random.normal(1, .6, 1000)

# Get histogram data (in spherical coordinates)
h = special_histograms.spherical(data)                 

# And plot its projection on a globe
h.projection("theta", "phi").plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")   

Directional 3D plot

See more in the docstrings and notebooks.

Installation

Using pip:

pip install physt

Features

Implemented

  • 1D histograms
  • 2D histograms
  • ND histograms
  • Some special histograms
    • 2D polar coordinates (with plotting)
    • 3D spherical / cylindrical coordinates (beta)
  • Adaptive rebinning for on-line filling of unknown data (beta)
  • Non-consecutive bins
  • Memory-effective histogramming of dask arrays (beta)
  • Understands any numpy-array-like object
  • Keep underflow / overflow / missed bins
  • Basic numeric operations (* / + -)
  • Items / slice selection (including mask arrays)
  • Add new values (fill, fill_n)
  • Cumulative values, densities
  • Simple statistics for original data (mean, std, sem)
  • Plotting with several backends
    • matplotlib (static plots with many options)
    • vega (interactive plots, beta, help wanted!)
    • folium (experimental for geo-data)
    • plotly (very basic, help wanted!)
    • ascii (experimental)
  • Algorithms for optimized binning
    • human-friendly
    • mathematical
  • IO, conversions
    • I/O JSON
    • I/O xarray.Dataset (experimental)
    • O ROOT file (experimental)
    • O pandas.DataFrame (basic)
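
As a rough illustration of three items from the list above (underflow / overflow bookkeeping, densities, cumulative values), here is a minimal numpy sketch. It shows the concepts only and is not how physt implements them; defining density as frequency divided by bin width is an assumption about the intended meaning.

```python
import numpy as np

data = np.array([-3.0, 0.5, 1.5, 1.7, 2.5, 3.5, 9.0, 12.0])
edges = np.array([0.0, 1.0, 2.0, 4.0])   # three bins of unequal width

counts, _ = np.histogram(data, bins=edges)

# Values outside the binning are counted separately instead of being dropped
underflow = int(np.sum(data < edges[0]))
overflow = int(np.sum(data > edges[-1]))

# Density = frequency / bin width; cumulative = running sum of frequencies
densities = counts / np.diff(edges)
cumulative = np.cumsum(counts)
```

Keeping the out-of-range counts means the total number of filled values can always be recovered as `counts.sum() + underflow + overflow`.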

Planned

  • Rebinning
    • using reference to original data?
    • merging bins
  • Statistics (based on original data)?
  • Stacked histograms (with names)
  • Potentially holoviews plotting backend (instead of the discontinued bokeh one)

Not planned

  • Kernel density estimates - use your favourite statistics package (like seaborn)
  • Rebinning using interpolation - it should be trivial to use rebin (https://github.com/jhykes/rebin) with physt

Rationale (for both): physt is dumb, but precise.

Dependencies

  • Python 3.6+ (versions 0.4.9 and later; earlier 0.4.x releases support Python 3.5+)
  • numpy
  • (optional) matplotlib - simple output
  • (optional) xarray - I/O
  • (optional) protobuf - I/O
  • (optional) uproot - I/O
  • (optional) astropy - additional binning algorithms
  • (optional) folium - map plotting
  • (optional) vega3 - for vega in-line in IPython notebook (note that to generate vega JSON, this is not necessary)
  • (optional) asciiplotlib - for ASCII bar plots
  • (optional) xtermcolor - for ASCII color maps
  • (testing) py.test, pandas
  • (docs) sphinx, sphinx_rtd_theme, ipython

Publicity

Talk at PyData Berlin 2018:

Contribution

I am looking for anyone interested in using / developing physt. You can contribute by reporting errors, implementing missing features and suggesting new ones.

Thanks to:

Patches:

Alternatives and inspirations

Comments
  • python 2.7 plotting is not working

    When running the plot() function, I get the error below even though matplotlib is installed. Also, the algorithm is pretty slow when run on anything bigger than a toy example.

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 137, in __call__
        return plot(self.histogram, kind=kind, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 91, in plot
        backend_name, backend = _get_backend(backend)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 70, in _get_backend
        raise RuntimeError("No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).")
    RuntimeError: No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).
    
    bug 
    opened by romange 13
  • Smooth polar histograms?

    Thanks for writing this awesome library!

    I have a question regarding smoothing of polar 2D histograms. I am constructing a histogram as described on this page https://physt.readthedocs.io/en/latest/special_histograms.html#Polar-histogram and now I want to smooth it with a Gaussian kernel (like scipy.ndimage.gaussian_filter). What is the most elegant / correct method to do that?

    question 
    opened by horsto 7
  • Rebinning histograms related project

    Hi, I found a project on rebinning histograms at https://github.com/jhykes/rebin and opened an issue (jhykes/rebin#5) on that project's page asking about integrating its code into this project. I hope you will appreciate it.

    enhancement idea? 
    opened by DancingQuanta 7
  • Option to center labels on bins

    If you have a large dataset with a small number of values (such as consisting only of integers 1-10) then it would be nice to have the bin x-axis labels at the center under the respective bin instead of at the bin edges.

    I recognise this case is more of a 'histogram as bar plot' kind of thing, but it is a use-case I have often.

    opened by nzjrs 5
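
One way to picture the request above, sketched with plain numpy: if the bin edges are placed at half-integers, every integer value sits at the centre of its own bin, and the bin midpoints can serve as label positions. This is an illustration under that assumption, not an existing physt option.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 10])

# Edges at half-integers make every integer value the centre of its own bin
edges = np.arange(0.5, 11.5, 1.0)
counts, _ = np.histogram(data, bins=edges)

# Tick positions for labels: the midpoint of each bin rather than its edges
centers = (edges[:-1] + edges[1:]) / 2
```

With matplotlib, the `centers` array could then be passed to `ax.set_xticks` so that each label appears under the middle of its bar.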
  • Usage of spherical histogram

    Hi, I have tried the example of spherical histogram. After a small modification of the code (normalized the data as unit vectors),

    n = 100
    data = np.empty((n, 3))
    data[:,0] = np.random.normal(0, 1, n)
    data[:,1] = np.random.normal(0, 1, n)
    data[:,2] = np.random.normal(0, 1, n)
    for i in range(n):
        scale = np.sqrt(data[i,0]**2 + data[i,1]**2 + data[i,2]**2)
        data[i,0] = data[i,0]/scale
        data[i,1] = data[i,1]/scale
        data[i,2] = data[i,2]/scale

    h = special.spherical_histogram(data, theta_bins=20, phi_bins=20)
    ax.scatter(data[:,0], data[:,1], data[:,2])

    globe = h.projection("theta", "phi")
    globe.plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")

    plt.show()

    I got an error: “RuntimeError: Bins not in rising order.” What did I do wrong? Thank you for your support.

    question 
    opened by zhengpuchen 3
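
The normalisation loop in the question above can be written in vectorised form, and doing so makes one property of the data visible: after normalisation every point has radius 1, so the radial coordinate has essentially no spread. Whether this is what triggers the reported error is an assumption, not a confirmed diagnosis; it would be consistent with the zero-range case described in the issue "Be more explicit about bins too narrow for float representation".

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0, 1, size=(100, 3))

# Vectorised normalisation: divide each row by its Euclidean length
unit = data / np.linalg.norm(data, axis=1, keepdims=True)

# Every point now has radius ~1, so the spread of the radial coordinate is ~0
radii = np.linalg.norm(unit, axis=1)
radial_spread = radii.max() - radii.min()
```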
  • approximate histograms

    I'm following the paper (http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf) implemented by https://github.com/carsonfarmer/streamhist, and the notion of approximate histograms seems elegant and efficient.

    After seeing the internals of streamhist (trying to fix bugs) and reading the paper, I can imagine ways to make a better implementation: e.g. much more efficient discovery of bins to be joined, and avoiding temporary lists when possible. Also the code seems overly complex, partially due to features like "bin freezing" which try to workaround poor bin joining performance.

    Anyway since streamhist is defunct, I'm thinking about trying an implementation. I wonder if this kind of histogram would fit into physt (and if sortedcollections would be reasonable as a dependency).

    opened by belm0 3
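
For readers unfamiliar with the idea, the update step of such a streaming histogram can be sketched in a few lines of standard-library Python. This is a simplified illustration, not code from streamhist or physt: the class name `StreamingHistogram` is hypothetical, the closest pair is found by a plain linear scan, and repeated values are inserted as separate bins and merged later rather than incrementing an existing centroid.

```python
import bisect

class StreamingHistogram:
    """Approximate histogram with at most `max_bins` (centroid, count) pairs."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [centroid, count]

    def update(self, value):
        # Insert the new point as its own bin, keeping centroids sorted
        bisect.insort(self.bins, [float(value), 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Find the adjacent pair with the smallest centroid gap (linear scan)
        gaps = [b[0] - a[0] for a, b in zip(self.bins, self.bins[1:])]
        i = gaps.index(min(gaps))
        (q1, k1), (q2, k2) = self.bins[i], self.bins[i + 1]
        # Replace the pair by their count-weighted mean
        self.bins[i:i + 2] = [[(q1 * k1 + q2 * k2) / (k1 + k2), k1 + k2]]

hist = StreamingHistogram(max_bins=3)
for x in [1, 2, 10, 11, 20]:
    hist.update(x)
```

The linear scan over gaps is the part the comment above suggests could be made much more efficient, for example with a sorted container keyed on gap size.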
  • please make this library discoverable

    name: physt (?)
    github tag line: P(i/y)thon h(i/y)stograms (???)

    google search for "python streaming histogram"

    • top result is https://github.com/carsonfarmer/streamhist (unused / unmaintained)
    • physt not in initial 10 pages of results...

    For over a year I've wanted to find a Python library which supports efficient histogram updates without a bunch of ugly dependencies. I've searched many times. Today I happened to get lucky by seeing physt mentioned at the bottom of a SO question (https://stackoverflow.com/questions/40627274/).

    To improve discoverability by search, please consider updating the github tag line to concisely and accurately describe the library (... rather than be cute).

    opened by belm0 2
  • Warning in current numpy

    If you try to merge bins:

    from physt import h2
    from scipy.stats import multivariate_normal
    hist = h2(*multivariate_normal.rvs((0,0), size=100_000).T, bins=100)
    hist.merge_bins(2)
    

    You get a warning from numpy:

    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:572: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_frequencies[new_index] += old_frequencies[old_index]
    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:573: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_errors2[new_index] += old_errors2[old_index]
    
    opened by henryiii 2
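
The warning text itself names the remedy: convert the index sequence to a tuple before indexing. The sketch below shows that pattern on a small array while merging 2x2 blocks of bins. It is an independent illustration with made-up arrays, not the code from `histogram_base.py`.

```python
import numpy as np

old = np.arange(16).reshape(4, 4)
new = np.zeros((2, 2), dtype=old.dtype)

# Merge 2x2 blocks of bins: add each strided sub-grid of `old` into `new`
for i in range(2):
    for j in range(2):
        index = [slice(i, None, 2), slice(j, None, 2)]  # built as a list
        # Convert to a tuple so numpy treats it as one index per axis
        new += old[tuple(index)]
```

Indexing with the tuple selects one slice per axis, which is the behaviour the deprecated list form relied on.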
  • Add 2D & ND histograms

    • [x] Analogous data model to Histogram1D
    • [x] refactor HistogramBase class -> common behaviour of 1D and 2D
    • [x] revisit binning schemas
    • [x] histogram2D facade function to be compatible with numpy one
    • [x] plotting
    • [x] arithmetic operations
    • [x] documentation
    • [ ] stats
    enhancement 
    opened by janpipek 2
  • ImportError with newer plotly

    [SOMEDIR}\physt\physt\plotting\plotly.py in <module>
         12 
         13 import plotly.offline as pyo
    ---> 14 import plotly.plotly as pyp
         15 import plotly.graph_objs as go
         16 
    
    ~\Miniconda3\lib\site-packages\plotly\plotly\__init__.py in <module>
          2 from _plotly_future_ import _chart_studio_error
          3 
    ----> 4 _chart_studio_error("plotly")
    
    ~\Miniconda3\lib\site-packages\_plotly_future_\__init__.py in _chart_studio_error(submodule)
         41 
         42 def _chart_studio_error(submodule):
    ---> 43     raise ImportError(
         44         """
         45 The plotly.{submodule} module is deprecated,
    
    ImportError: 
    The plotly.plotly module is deprecated,
    please install the chart-studio package and use the
    chart_studio.plotly module instead. 
    
    bug visualization 
    opened by janpipek 1
  • Wrong bars center in polar_map

    I have found that the bars in polar_map are centered on the left edge of the phi bins instead of their center. Because of this, the representation of the histogram does not coincide with the data, as in the figure below: polarmap_wrong

    I think this can be easily solved by replacing

    bars = ax.bar(phipos[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    with

    bars = ax.bar(phipos[i] + 0.5*dphi[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    in the definition of polar_map.

    By the way, thank you for this amazing package!

    bug visualization 
    opened by ruhugu 1
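
The arithmetic behind the proposed fix can be checked in isolation. The sketch below reuses the names `phipos` and `dphi` from the report only by analogy; the four-bin layout is made up for illustration.

```python
import numpy as np

# Four equal phi bins covering the full circle
phi_edges = np.linspace(0, 2 * np.pi, 5)
phi_left = phi_edges[:-1]          # left edge of each bin
dphi = np.diff(phi_edges)          # angular width of each bin

# A centre-aligned bar must be positioned at the middle of its bin
phi_center = phi_left + 0.5 * dphi
```

Since matplotlib's `bar` aligns bars on their centre by default, passing the left edges shifts every bar by half a bin width. Passing `align="edge"` together with the left edges would presumably be an equivalent alternative to adding `0.5*dphi`.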
  • Be more explicit about bins too narrow for float representation

    If the computed range for the binning divided by the number of bins is lower than the minimum float difference at the scale, we receive an error [ValueError: Bins not in rising order.] which is not very informative.

    To reproduce:

    data = [1, np.nextafter(1, 2)]
    physt.h1(data)
    

    It also happens when the range is 0, like in:

    data = [1, 1]
    physt.h1(data)
    
    enhancement 
    opened by janpipek 1
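
Why the edges cannot be in rising order in these two cases can be seen with numpy alone. The sketch assumes ten equal-width bins, which is an assumption about the default; it does not call physt.

```python
import numpy as np

lo = 1.0
hi = np.nextafter(1.0, 2.0)    # the next representable float above 1.0

# Ten equal-width bins between two adjacent floats cannot be represented
edges = np.linspace(lo, hi, 11)
strictly_rising = bool(np.all(np.diff(edges) > 0))

# The zero-range case fails the same check
flat_edges = np.linspace(1.0, 1.0, 11)
flat_rising = bool(np.all(np.diff(flat_edges) > 0))
```

All eleven requested edges collapse onto the two representable values, so a check of this kind could be used to raise a more specific message before the generic ordering error.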
Releases(v0.5.2)
Owner
Jan Pipek
PyData Prague
Jan Pipek