Interactive plotting for Pandas using Vega-Lite

Overview

pdvega: Vega-Lite plotting for Pandas Dataframes

pdvega is a library for quickly creating interactive Vega-Lite plots from Pandas dataframes. Its API is nearly identical to Pandas' built-in visualization tools, and it is designed for easy use within the Jupyter notebook.

Pandas currently has some basic plotting capabilities based on matplotlib. So, for example, you can create a scatter plot this way:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
df.plot.scatter(x='x', y='y')

matplotlib scatter output

The goal of pdvega is that any time you use dataframe.plot, you'll be able to replace it with dataframe.vgplot and instead get a similar (but prettier and more interactive) visualization output in Vega-Lite that you can easily export to share or customize:

import pdvega  # import adds vgplot attribute to pandas

df.vgplot.scatter(x='x', y='y')

vega-lite scatter output

The above image is a static screenshot of the interactive output; please see the Documentation for a full set of live usage examples.

Installation

You can get started with pdvega using pip:

$ pip install jupyter pdvega
$ jupyter nbextension install --sys-prefix --py vega3

The first line installs pdvega and its dependencies; the second installs the Jupyter extension that allows plots to be displayed in the Jupyter notebook. For more information on installation and dependencies, see the Installation docs.

Why Vega-Lite?

When working with data, one of the biggest challenges is ensuring reproducibility of results. When you create a figure and export it to PNG or PDF, the data become baked into the rendering in a way that is difficult or impossible for others to extract. Vega and Vega-Lite change this: instead of packaging a figure by encoding its pixel values, they package a figure by describing, in a declarative manner, the relationship between data values and visual encodings through a JSON specification.

This means that the Vega-Lite figures produced by pdvega are portable: you can send someone the resulting JSON specification and they can choose whether to render it interactively online, convert it to a PNG or EPS for static publication, or even enhance and extend the figure to learn more about the data.
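To make the idea of a portable JSON specification concrete, here is a minimal sketch of a Vega-Lite scatter specification written as a plain Python dictionary and serialized with the standard library. The field names (`x`, `y`) and the three data rows are made up for illustration, and the specification pdvega actually emits may contain more detail than this.

```python
import json

# A hand-written Vega-Lite specification: the data values and the mapping
# from fields to visual channels travel together in one JSON document.
spec = {
    "data": {"values": [
        {"x": 0.1, "y": 1.2},
        {"x": 0.5, "y": -0.3},
        {"x": 0.9, "y": 0.7},
    ]},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

# Serialize to text that can be saved to a file or sent to someone else.
spec_json = json.dumps(spec, indent=2)

# Whoever receives the text can recover the original data values from it.
recovered = json.loads(spec_json)
print(recovered["data"]["values"][0])
```

Unlike a PNG, the serialized text still contains every data point, so the recipient can re-render or restyle the figure.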

pdvega is a step in bringing this vision of figure portability and reproducibility to the Python world.

Relationship to Altair

Altair is a project that seeks to design an intuitive declarative API for generating Vega-Lite and Vega visualizations, using Pandas dataframes as data sources.

By contrast, pdvega seeks not to design new visualization APIs, but to use the existing DataFrame.plot visualization API and output visualizations with Vega/Vega-Lite rather than with matplotlib.

In this respect, pdvega is quite similar in spirit to the now-defunct mpld3 project, though the scope is smaller and (hopefully) much more manageable.

Comments
  • rewrite using altair

    This is the first passing implementation. The main changes:

    • added altair as a dependency
    • dropped axis.py
    • dropped some altair-specific tests (perhaps should drop more)
    • every plot function now returns a "non-interactive" altair chart, so I dropped the interactive and ax arguments

    Some remarks

    • we should definitely try to "reenact" this in a notebook: https://pandas.pydata.org/pandas-docs/stable/visualization.html
    • it may be worth adding support for repeat in the API, for example for plotting a groupby object or something similar
    opened by Casyfill 20
  • jupyterlab support

    the current version of pdvega will not work in JupyterLab: the main reason is that the new MIME-based rendering used by JupyterLab is not yet supported in the vega3 library that pdvega depends on

    Just wanted to clarify that this is correct, even with the vega3 jupyterlab extension?

    If that is the case I guess this can be kept open to track any progress...

    opened by dhirschfeld 9
  • use new accessor extension api to register plotting attribute

    Hey @jakevdp!

    This is a really cool project!

    This is jumping the gun a bit -- but a new accessor extensions API landed in pandas (dev) (and AccessorProperty will be deprecated). It's really slick! It was designed especially for projects like pdvega.

    Here's a PR using the new extension api with backwards compatibility.

    opened by Zsailer 5
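For readers unfamiliar with the accessor extension API mentioned in this item, the following is a minimal sketch of how a plotting attribute can be registered with `pd.api.extensions.register_dataframe_accessor`. The accessor name `vgdemo` and the class `VgDemoAccessor` are hypothetical names chosen for illustration; this is not the code from the pull request, and it returns a plain dictionary rather than a rendered chart.

```python
import pandas as pd


# "vgdemo" is a hypothetical accessor name chosen for this sketch.
@pd.api.extensions.register_dataframe_accessor("vgdemo")
class VgDemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to.
        self._df = pandas_obj

    def scatter(self, x, y):
        """Return a Vega-Lite style spec (as a dict) for a scatter plot."""
        return {
            "data": {"values": self._df[[x, y]].to_dict(orient="records")},
            "mark": "point",
            "encoding": {
                "x": {"field": x, "type": "quantitative"},
                "y": {"field": y, "type": "quantitative"},
            },
        }


df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 8.0]})
spec = df.vgdemo.scatter(x="x", y="y")
print(spec["mark"], len(spec["data"]["values"]))
```

Once the decorator has run, every DataFrame exposes the new attribute, which is the same mechanism that lets `df.vgplot` sit alongside `df.plot`.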
  • support figsize

    Adding figsize support to core plotting functions - same approach as for pdvega.plotting functions.

    Also moved warn_if_keywords_unused to the end of each function.

    opened by Casyfill 4
  • Update README.md

    I think what's promising about this library is that people can use it to create plots that are easy to export, share, or further customize (or even blend with other plots).

    opened by kanitw 4
  • module 'pandas.core' has no attribute 'index'

    Whenever I try to use pdvega as shown in the documentation, I get the error message "module 'pandas.core' has no attribute 'index'", e.g.:

    import numpy as np
    import pandas as pd
    import pdvega
    from vega_datasets import data

    iris = data.iris()
    pdvega.andrews_curves(iris, 'species')

    I am using Python 3.8. I think it is because pandas deprecated pandas.core.index.

    opened by uli22 2
  • scatter doesn't accept already in-use columns

    This works fine.

    df = pd.DataFrame(np.random.rand(5, 3))
    df.vgplot.scatter(x=0, y=1, c=2)
    

    But if you want color and y to be based on the same column, i.e. column 1, it throws:

    df.vgplot.scatter(x=0, y=1, c=1)
    
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-40-d247c37c9696> in <module>()
          1 df = pd.DataFrame(np.random.rand(5, 3))
    ----> 2 df.vgplot.scatter(x=0, y=1, c=1)
    
    d:\apps\anaconda2\lib\site-packages\pdvega\_core.pyc in scatter(self, x, y, c, s, alpha, interactive, width, height, **kwds)
        504         spec = finalize_vegalite_spec(spec, interactive=interactive,
        505                                       width=width, height=height)
    --> 506         return Axes(spec, data=data[cols])
        507 
        508     def area(self, x=None, y=None, stacked=True, alpha=None,
    
    d:\apps\anaconda2\lib\site-packages\pdvega\_axes.pyc in __init__(self, spec, data)
          5     """Class representing a pdvega plot axes"""
          6     def __init__(self, spec=None, data=None):
    ----> 7         self.vlspec = VegaLite(spec, data)
          8 
          9     @property
    
    d:\apps\anaconda2\lib\site-packages\vega3\base.pyc in __init__(self, spec, data)
         21         """Initialize the visualization object."""
         22         spec = utils.nested_update(copy.deepcopy(self.DEFAULTS), spec)
    ---> 23         self.spec = self._prepare_spec(spec, data)
         24 
         25     def _prepare_spec(self, spec, data):
    
    d:\apps\anaconda2\lib\site-packages\vega3\vegalite.pyc in _prepare_spec(self, spec, data)
         15 
         16     def _prepare_spec(self, spec, data):
    ---> 17         return prepare_spec(spec, data)
         18 
         19 
    
    d:\apps\anaconda2\lib\site-packages\vega3\utils.pyc in prepare_spec(spec, data)
         86         # We have to do the isinstance test first because we can't
         87         # compare a DataFrame to None.
    ---> 88         data = sanitize_dataframe(data)
         89         spec['data'] = {'values': data.to_dict(orient='records')}
         90     elif data is None:
    
    d:\apps\anaconda2\lib\site-packages\vega3\utils.pyc in sanitize_dataframe(df)
         64             # For floats, convert nan->None: np.float is not JSON serializable
         65             col = df[col_name].astype(object)
    ---> 66             df[col_name] = col.where(col.notnull(), None)
         67         elif str(dtype).startswith('datetime'):
         68             # Convert datetimes to strings
    
    e:\github\pandas\pandas\core\frame.pyc in __setitem__(self, key, value)
       2547         else:
       2548             # set column
    -> 2549             self._set_item(key, value)
       2550 
       2551     def _setitem_slice(self, key, value):
    
    e:\github\pandas\pandas\core\frame.pyc in _set_item(self, key, value)
       2623         self._ensure_valid_index(value)
       2624         value = self._sanitize_column(key, value)
    -> 2625         NDFrame._set_item(self, key, value)
       2626 
       2627         # check if we are modifying a copy
    
    e:\github\pandas\pandas\core\generic.pyc in _set_item(self, key, value)
       2290 
       2291     def _set_item(self, key, value):
    -> 2292         self._data.set(key, value)
       2293         self._clear_item_cache()
       2294 
    
    e:\github\pandas\pandas\core\internals.pyc in set(self, item, value, check)
       3992         removed_blknos = []
       3993         for blkno, val_locs in _get_blkno_placements(blknos, len(self.blocks),
    -> 3994                                                      group=True):
       3995             blk = self.blocks[blkno]
       3996             blk_locs = blklocs[val_locs.indexer]
    
    e:\github\pandas\pandas\core\internals.pyc in _get_blkno_placements(blknos, blk_count, group)
       5020 
       5021     # FIXME: blk_count is unused, but it may avoid the use of dicts in cython
    -> 5022     for blkno, indexer in lib.get_blkno_indexers(blknos, group):
       5023         yield blkno, BlockPlacement(indexer)
       5024 
    
    e:\github\pandas\pandas\_libs\lib.pyx in pandas._libs.lib.get_blkno_indexers()
       1164 @cython.boundscheck(False)
       1165 @cython.wraparound(False)
    -> 1166 def get_blkno_indexers(int64_t[:] blknos, bint group=True):
       1167     """
       1168     Enumerate contiguous runs of integers in ndarray.
    
    ValueError: Buffer has wrong number of dimensions (expected 1, got 0)
    

    Is this expected? If not, happy to work on the patch.

    bug 
    opened by pratapvardhan 1
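The traceback above shows `scatter` passing `data[cols]` on to `sanitize_dataframe`. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that requesting column 1 for both `y` and `c` produces a selection with a repeated column label. A minimal sketch of an order-preserving de-duplication step follows; `unique_columns` is a hypothetical helper and not part of pdvega.

```python
def unique_columns(cols):
    """Drop repeated column labels while keeping first-seen order.

    Hypothetical helper for illustration; not part of pdvega.
    """
    # dict preserves insertion order, so the first occurrence of each label wins.
    return list(dict.fromkeys(cols))


# x=0, y=1, c=1 as in the report: column 1 is requested twice.
requested = [0, 1, 1]
print(unique_columns(requested))  # [0, 1]
```

Selecting `data[unique_columns(cols)]` would give a frame in which every label appears once, while the encodings can still refer to the same field more than once.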
  • Add data parameter to data setter

    flake8 testing of https://github.com/jakevdp/pdvega on Python 2.7.14

    $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

    ./pdvega/_axes.py:64:33: F821 undefined name 'data'
                self._vlspec.data = data
                                    ^
    ./pdvega/_axes.py:66:26: F821 undefined name 'data'
                self._data = data
                             ^
    2     F821 undefined name 'data'
    
    opened by cclauss 1
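The F821 warnings above point at a property setter whose body uses `data` without receiving it as a parameter. The sketch below shows the general shape of a setter that accepts the new value; the `Axes` class here is a simplified stand-in written for illustration, not pdvega's actual class.

```python
class Axes:
    """Minimal stand-in for a plot-axes object (not pdvega's actual class)."""

    def __init__(self, data=None):
        self._data = data

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, data):
        # The setter must take the new value as a parameter; without it,
        # any use of `data` in the body is an undefined name (flake8 F821).
        self._data = data


ax = Axes()
ax.data = [{"x": 1, "y": 2}]
print(ax.data)
```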
  • Add binder install scripts

    I wanted to try this out but didn't want to install it locally, so I made a binder instead.

    The README badge will start working once this PR is merged, till then: https://mybinder.org/v2/gh/betatim/pdvega/binder?filepath=examples%2Fpdvega_example.ipynb

    opened by betatim 1
  • pdvega on conda

    I recently brought pdvega to the Anaconda user community through the conda-forge channel. You can now install it with:

      conda install -c conda-forge pdvega
    

    Thanks for this awesome tool.

    -Eddie

    opened by adbedada 0
  • Register entry point for pandas backend

    pdvega could add an entrypoint to register itself with pandas: https://dev.pandas.io/development/extending.html#plotting-backends

    Something like

    # in setup.py
    setup(  # noqa: F821
        ...,
        entry_points={
            "pandas_plotting_backends": [
                "altair = pdvega.<module>",
            ],
        },
    )
    

    where <module> is whatever module has the plot top-level method with the right signature.

    opened by TomAugspurger 5
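As a rough illustration of the "plot top-level method" that the entry point above would refer to, the sketch below defines a module-level `plot(data, kind, **kwargs)` function that returns a Vega-Lite style dictionary. The function body, the `_MARKS` mapping, and the handled kinds are assumptions made for this example; the sketch calls `plot` directly instead of going through pandas' backend dispatch, and pdvega's real implementation may look quite different.

```python
import pandas as pd

# Map pandas plot kinds to Vega-Lite mark names (a small, illustrative subset).
_MARKS = {"scatter": "point", "line": "line", "bar": "bar"}


def plot(data, kind="line", x=None, y=None, **kwargs):
    """Hypothetical top-level backend function returning a Vega-Lite dict."""
    if kind not in _MARKS:
        raise NotImplementedError(f"kind={kind!r} is not handled in this sketch")
    return {
        "data": {"values": data[[x, y]].to_dict(orient="records")},
        "mark": _MARKS[kind],
        "encoding": {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
        },
    }


df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 1, 2]})
spec = plot(df, kind="scatter", x="a", y="b")
print(spec["mark"])
```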
  • Update altair code internally

    As a rule, I think we should use Altair code internally rather than dicts... it will make things easier to debug if and when Vega-Lite/Altair changes.

    e.g. {'maxbins': 10} should be alt.Bin(maxbins=10) etc.

    opened by jakevdp 0
  • Plotting data with datetimes

    The plotting library doesn't seem to work when I try to plot a datetime object. It can handle plain dates, but when there is an associated time the plot builds without error and no line is plotted.

    Code that doesn't work:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import pdvega

    rng = pd.date_range('1/1/2011', periods=72, freq='H')
    rng = [pd.Timestamp(r) for r in rng]
    ts = pd.Series(np.random.randn(len(rng)), index=rng)

    ts.vgplot.line()  # this doesn't throw any errors but no data is shown

    ts.plot()  # this, on the other hand, works
    plt.show()

    opened by StephanieWillis 1
  • Columns of all None treated differently than all np.nan

    Maybe a bit niche, but ran into this issue with lineplot: if there is a column of all np.nan, then it is ignored, but if there is a column of all None, then it makes the plot really wacky.

    Generate some data:

    import pandas as pd
    import numpy as np
    import pdvega
    %matplotlib inline
    
    # generate some data
    np.random.seed(111)
    df = pd.DataFrame(np.random.randn(50, 4), 
            index=pd.date_range('1/1/2000', periods=50),
                      columns=list('ABCD'))
    df = df.cumsum()
    
    # this plot is fine
    df.vgplot()
    

    image

    # this column is ignored in the plot
    df['nan'] = np.nan
    df.vgplot()
    

    (looks the same as above)

    # this column makes everything weird
    df['none'] = None
    df.vgplot()
    

    image

    Oddly enough this doesn't happen if the A and B columns are int:

    np.random.seed(111)
    df = pd.DataFrame(np.random.randint(low=0, high=5, size=[50, 2]), 
            index=pd.date_range('1/1/2000', periods=50),
                      columns=list('AB'))
    df = df.cumsum()
    
    # add a column of all nan
    df['nan'] = np.nan
    
    # add a column of all none
    df['none'] = None
    df.vgplot()
    

    image

    opened by alistairewj 0
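One difference between the two cases in the last report is how pandas stores the columns: an all-`np.nan` column is a float column, while an all-`None` column is typically stored with the generic object dtype. Whether this is what confuses the plot is an assumption, not a confirmed diagnosis. As a possible workaround under that assumption, columns that contain no values at all can be dropped before plotting; the small frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.5, 1.5, 2.5]})
df["nan"] = np.nan   # stored as a float column
df["none"] = None    # typically stored as an object column

# Inspect how each column is stored.
print(df.dtypes)

# Drop every column that contains only missing values (NaN or None).
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))
```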
Owner
Altair
Declarative visualization in Python