Interactive plotting for Pandas using Vega-Lite

Overview

pdvega: Vega-Lite plotting for Pandas Dataframes

pdvega is a library that allows you to quickly create interactive Vega-Lite plots from Pandas dataframes, using an API that is nearly identical to Pandas' built-in visualization tools, and designed for easy use within the Jupyter notebook.

Pandas currently has some basic plotting capabilities based on matplotlib. So, for example, you can create a scatter plot this way:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
df.plot.scatter(x='x', y='y')

matplotlib scatter output

The goal of pdvega is that any time you use dataframe.plot, you'll be able to replace it with dataframe.vgplot and instead get a similar (but prettier and more interactive) visualization output in Vega-Lite that you can easily export to share or customize:

import pdvega  # import adds vgplot attribute to pandas

df.vgplot.scatter(x='x', y='y')

vega-lite scatter output

The above image is a static screenshot of the interactive output; please see the Documentation for a full set of live usage examples.

Installation

You can get started with pdvega using pip:

$ pip install jupyter pdvega
$ jupyter nbextension install --sys-prefix --py vega3

The first line installs pdvega and its dependencies; the second installs the Jupyter extension that allows plots to be displayed in the Jupyter notebook. For more information on installation and dependencies, see the Installation docs.

Why Vega-Lite?

When working with data, one of the biggest challenges is ensuring reproducibility of results. When you create a figure and export it to PNG or PDF, the data become baked into the rendering in a way that is difficult or impossible for others to extract. Vega and Vega-Lite change this: instead of packaging a figure by encoding its pixel values, they package a figure by describing, in a declarative manner, the relationship between data values and visual encodings through a JSON specification.

This means that the Vega-Lite figures produced by pdvega are portable: you can send someone the resulting JSON specification and they can choose whether to render it interactively online, convert it to a PNG or EPS for static publication, or even enhance and extend the figure to learn more about the data.
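
To make the idea concrete, here is a minimal sketch of what such a specification can look like, written as a plain Python dictionary and serialized with the standard library. The data values are made up for illustration, and the schema version in the URL is an assumption; pdvega's actual output will contain more detail.

```python
import json

# Hypothetical data: three (x, y) points embedded directly in the spec.
spec = {
    # Assumed schema version; adjust to the Vega-Lite version you target.
    "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
    "data": {"values": [{"x": 1, "y": 2}, {"x": 2, "y": 5}, {"x": 3, "y": 3}]},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

# Serialize to JSON text, as you would before sending it to someone else.
text = json.dumps(spec, indent=2)

# The recipient can parse the text and recover the raw data values,
# which is not possible with a PNG or PDF export.
restored = json.loads(text)
points = restored["data"]["values"]
print(points)
```

Because the data travel inside the specification, anyone who receives the JSON can re-render, restyle, or re-analyze the figure.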

pdvega is a step in bringing this vision of figure portability and reproducibility to the Python world.

Relationship to Altair

Altair is a project that seeks to design an intuitive declarative API for generating Vega-Lite and Vega visualizations, using Pandas dataframes as data sources.

By contrast, pdvega seeks not to design new visualization APIs, but to use the existing DataFrame.plot visualization API and output visualizations with Vega/Vega-Lite rather than with matplotlib.

In this respect, pdvega is quite similar in spirit to the now-defunct mpld3 project, though the scope is smaller and (hopefully) much more manageable.

Issues
  • rewrite using altair

    The very first passing implementation. The main changes:

    • added altair as a dependency
    • dropped axis.py
    • dropped some altair-specific tests (perhaps should drop more)
    • every plot function now returns a "non-interactive" altair chart, so I dropped interactive and ax as arguments

    Some remarks

    • we should definitely try to reproduce the examples from https://pandas.pydata.org/pandas-docs/stable/visualization.html in a notebook
    • it may be worth adding support for repeat to the API, for example for plotting a groupby object or something similar
    opened by Casyfill 20
  • jupyterlab support

    the current version of pdvega will not work in JupyterLab: the main reason is that the new MIME-based rendering used by JupyterLab is not yet supported in the vega3 library that pdvega depends on

    Just wanted to clarify that this is correct, even with the vega3 jupyterlab extension?

    If that is the case I guess this can be kept open to track any progress...

    opened by dhirschfeld 9
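
For readers unfamiliar with the term, the following is a rough sketch of what MIME-based rendering means on the Python side: an object returns a bundle keyed by MIME type, and the frontend picks a renderer for a type it understands. The class name and the exact MIME type string (including its version suffix) are assumptions for illustration; this is not pdvega's or vega3's actual code.

```python
class VegaLiteChart:
    """Hypothetical wrapper that exposes a Vega-Lite spec as a MIME bundle."""

    # Assumed MIME type; the version suffix must match what the frontend supports.
    MIME_TYPE = "application/vnd.vegalite.v2+json"

    def __init__(self, spec):
        self.spec = spec

    def _repr_mimebundle_(self, include=None, exclude=None):
        # IPython's display machinery calls this hook when the object is shown.
        # The plain-text entry is a fallback for frontends without a renderer.
        return {
            self.MIME_TYPE: self.spec,
            "text/plain": "<VegaLiteChart>",
        }


chart = VegaLiteChart({"mark": "point", "data": {"values": [{"x": 1, "y": 2}]}})
bundle = chart._repr_mimebundle_()
print(sorted(bundle))
```

A frontend with a matching renderer draws the chart; any other frontend falls back to the plain-text entry.
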
  • use new accessor extension api to register plotting attribute

    Hey @jakevdp!

    This is a really cool project!

    This is jumping the gun a bit -- but a new accessor extensions API landed in pandas (dev) (and AccessorProperty will be deprecated). It's really slick! It was designed especially for projects like pdvega.

    Here's a PR using the new extension api with backwards compatibility.

    opened by Zsailer 5
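
As an illustration of the accessor extension API mentioned in this issue, here is a minimal sketch using pandas' `register_dataframe_accessor` decorator. The accessor name `demo_plot` and the class below are hypothetical and are not pdvega's implementation; the method returns a small Vega-Lite-style dictionary instead of rendering anything.

```python
import pandas as pd


# "demo_plot" is a made-up accessor name used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("demo_plot")
class DemoPlotAccessor:
    def __init__(self, pandas_obj):
        # pandas passes the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def scatter(self, x, y):
        # Build a minimal Vega-Lite-style spec from the two chosen columns.
        return {
            "data": {"values": self._obj[[x, y]].to_dict(orient="records")},
            "mark": "point",
            "encoding": {
                "x": {"field": x, "type": "quantitative"},
                "y": {"field": y, "type": "quantitative"},
            },
        }


df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
spec = df.demo_plot.scatter(x="x", y="y")
print(spec["encoding"])
```

After registration, every DataFrame gains the `demo_plot` attribute, which is the same mechanism that would let a library expose something like `df.vgplot`.
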
  • Add flake8 to find syntax errors & undefined names

    The error is fixed in #12

    opened by cclauss 5
  • drop color encoding in favor of default color where there is no need

    Currently, every vgplot call adds a color encoding, which creates a color legend even when there is only one category or color.

    opened by Casyfill 4
  • support figsize

    Adding figsize support to core plotting functions - same approach as for pdvega.plotting functions.

    also, moved warn_if_keywords_unused to the end of each function

    opened by Casyfill 4
  • Update README.md

    I think what's promising about this library is that people can use it to create plots that are easy to export, share, or further customize (or even blend with other plots).

    opened by kanitw 4
  • shouldn't vega3 be removed from requirements.txt ?

    vega3 is marked as deprecated, if I understood correctly.

    opened by stonebig 4
  • Update requirements.txt

    opened by domoritz 4
  • module 'pandas.core' has no attribute 'index'

    Whenever I try to use pdvega as shown in the documentation, I get the error message "module 'pandas.core' has no attribute 'index'", e.g.:

    import numpy as np
    import pandas as pd
    import pdvega
    from vega_datasets import data

    iris = data.iris()
    pdvega.andrews_curves(iris, 'species')

    I am using Python 3.8. I think it is because pandas deprecated index.

    opened by uli22 2
  • Register entry point for pandas backend

    pdvega could add an entrypoint to register itself with pandas: https://dev.pandas.io/development/extending.html#plotting-backends

    Something like

    # in setup.py
    setup(  # noqa: F821
        ...,
        entry_points={
            "pandas_plotting_backends": [
                "altair = pdvega.<module>",
            ],
        },
    )
    

    where <module> is whatever module has the plot top-level method with the right signature.

    opened by TomAugspurger 5
  • Binder broken

    It looks like binder isn't set up correctly - the environment seems to be missing the altair dependency:

    image

    opened by dhirschfeld 4
  • Examples on the website are broken

    http://altair-viz.github.io/pdvega/ uses @jakevdp's old repo and thus the examples don't work.

    opened by domoritz 4
  • V0.1.x

    advanced.rst

    opened by Casyfill 2
  • Update altair code internally

    As a rule, I think we should use Altair code internally rather than dicts... it will make things easier to debug if and when Vega-Lite/Altair changes.

    e.g. {'maxbins': 10} should be alt.Bin(maxbins=10) etc.

    opened by jakevdp 0
  • Plotting data with datetimes

    The plotting library doesn't seem to work when I try and plot a datetime object. It can handle just dates but when there is an associated time the plot builds without error but no line is plotted.

    Code that doesn't work:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import pdvega

    rng = pd.date_range('1/1/2011', periods=72, freq='H')
    rng = [pd.Timestamp(r) for r in rng]
    ts = pd.Series(np.random.randn(len(rng)), index=rng)

    ts.vgplot.line()  # this doesn't throw any errors, but no data is shown

    ts.plot()  # this, on the other hand, works
    plt.show()

    opened by StephanieWillis 1
  • Columns of all None treated differently than all np.nan

    Maybe a bit niche, but ran into this issue with lineplot: if there is a column of all np.nan, then it is ignored, but if there is a column of all None, then it makes the plot really wacky.

    Generate some data:

    import pandas as pd
    import numpy as np
    import pdvega
    %matplotlib inline
    
    # generate some data
    np.random.seed(111)
    df = pd.DataFrame(np.random.randn(50, 4),
                      index=pd.date_range('1/1/2000', periods=50),
                      columns=list('ABCD'))
    df = df.cumsum()
    
    # this plot is fine
    df.vgplot()
    

    image

    # this column is ignored in the plot
    df['nan'] = np.nan
    df.vgplot()
    

    (looks the same as above)

    # this column makes everything weird
    df['none'] = None
    df.vgplot()
    

    image

    Oddly enough this doesn't happen if the A and B columns are int:

    np.random.seed(111)
    df = pd.DataFrame(np.random.randint(low=0, high=5, size=[50, 2]),
                      index=pd.date_range('1/1/2000', periods=50),
                      columns=list('AB'))
    df = df.cumsum()
    
    # add a column of all NaN
    df['nan'] = np.nan
    
    # add a column of all none
    df['none'] = None
    df.vgplot()
    

    image

    opened by alistairewj 0
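
One possible contributing factor, not confirmed anywhere in this issue, is that pandas stores the two columns with different dtypes: a column assigned `np.nan` is typically `float64`, while a column assigned `None` is typically `object`. The sketch below only demonstrates that dtype difference and a way to select numeric columns before plotting; it does not reproduce the plotting behavior itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.1, 0.2, 0.3]})
df["nan"] = np.nan   # stored as a floating-point column
df["none"] = None    # stored as a generic object column

# Inspect how pandas stores each column.
print(df.dtypes)

# Keeping only numeric columns drops the all-None column.
numeric_only = df.select_dtypes(include="number")
print(list(numeric_only.columns))
```

If the dtype difference is indeed the cause, passing `numeric_only` to the plotting call would be one way to sidestep the problem.
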
  • How to add vertical and horizontal lines to figures?

    Please advise.

    opened by BlackArbsCEO 6
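
At the Vega-Lite level, one common way to draw reference lines is to layer `rule` marks on top of the main chart. The sketch below builds such a layered specification as a plain dictionary; the data, the field names `x_ref` and `y_ref`, and the line positions are all made up for illustration, and how best to combine this with pdvega's own output is not covered here.

```python
import json

# Hypothetical scatter data.
points = [{"x": 1, "y": 2}, {"x": 2, "y": 5}, {"x": 3, "y": 3}]

spec = {
    "layer": [
        {
            # Base layer: the scatter plot itself.
            "data": {"values": points},
            "mark": "point",
            "encoding": {
                "x": {"field": "x", "type": "quantitative"},
                "y": {"field": "y", "type": "quantitative"},
            },
        },
        {
            # Horizontal line: a rule with only a y encoding spans the full width.
            "data": {"values": [{"y_ref": 4}]},
            "mark": "rule",
            "encoding": {"y": {"field": "y_ref", "type": "quantitative"}},
        },
        {
            # Vertical line: a rule with only an x encoding spans the full height.
            "data": {"values": [{"x_ref": 2}]},
            "mark": "rule",
            "encoding": {"x": {"field": "x_ref", "type": "quantitative"}},
        },
    ]
}

marks = [layer["mark"] for layer in spec["layer"]]
print(json.dumps(spec, indent=2))
```

The layers share their x and y scales by default, so the rules line up with the plotted points.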
  • Is there ax object with two y-scales (twinx)

    My dataframe has two columns with different scales. I'd like something like Matplotlib's twinx function:

    import numpy as np
    import matplotlib.pyplot as plt
    
    fig, ax1 = plt.subplots()
    t = np.arange(0.01, 10.0, 0.01)
    s1 = np.exp(t)
    ax1.plot(t, s1, 'b-')
    ax1.set_xlabel('time (s)')
    # Make the y-axis label, ticks and tick labels match the line color.
    ax1.set_ylabel('exp', color='b')
    ax1.tick_params('y', colors='b')
    
    ax2 = ax1.twinx()
    s2 = np.sin(2 * np.pi * t)
    ax2.plot(t, s2, 'r.')
    ax2.set_ylabel('sin', color='r')
    ax2.tick_params('y', colors='r')
    
    fig.tight_layout()
    plt.show()
    
    opened by muxuezi 5
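
In Vega-Lite, the closest analogue to twinx is a layered chart whose y scales are resolved independently. The following sketch builds such a specification as a plain dictionary using data similar to the Matplotlib example; the field names and the helper function `make_layer` are hypothetical, and this is a description of the Vega-Lite feature rather than of anything pdvega exposes.

```python
import json
import math

# Hypothetical sample data with two columns on very different scales.
rows = [
    {"t": i / 10, "exp": math.exp(i / 10), "sin": math.sin(2 * math.pi * i / 10)}
    for i in range(1, 100)
]


def make_layer(mark_type, field, color):
    # One layer per column, each with its own y encoding.
    return {
        "mark": {"type": mark_type, "color": color},
        "encoding": {
            "x": {"field": "t", "type": "quantitative"},
            "y": {"field": field, "type": "quantitative"},
        },
    }


spec = {
    "data": {"values": rows},
    "layer": [make_layer("line", "exp", "blue"), make_layer("point", "sin", "red")],
    # Give each layer its own y scale and axis instead of a shared one.
    "resolve": {"scale": {"y": "independent"}},
}

text = json.dumps(spec)
print(len(text))
```

Without the `resolve` entry, both layers would share a single y scale and the sine curve would be flattened by the exponential's range.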
Owner
Altair
Declarative visualization in Python
Altair
IPython/Jupyter notebook module for Vega and Vega-Lite

IPython Vega IPython/Jupyter notebook module for Vega 5, and Vega-Lite 4. Notebooks with embedded visualizations can be viewed on GitHub and nbviewer.

Vega 315 Nov 24, 2021
A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?

HoloViz 475 Nov 23, 2021
Bokeh Plotting Backend for Pandas and GeoPandas

Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of

Patrik Hlobil 733 Dec 2, 2021
Plotting library for IPython/Jupyter notebooks

bqplot 2-D plotting library for Project Jupyter Introduction bqplot is a 2-D visualization system for Jupyter, based on the constructs of the Grammar

null 3.2k Dec 2, 2021
Simple plotting for Python. Python wrapper for D3xter - render charts in the browser with simple Python syntax.

PyDexter Simple plotting for Python. Python wrapper for D3xter - render charts in the browser with simple Python syntax. Setup $ pip install PyDexter

D3xter 31 Mar 6, 2021
An intuitive library to add plotting functionality to scikit-learn objects.

Welcome to Scikit-plot Single line functions for detailed visualizations The quickest and easiest way to go from analysis... ...to this. Scikit-plot i

Reiichiro Nakano 2.2k Nov 24, 2021
🎨 Python3 binding for `@AntV/G2Plot` Plotting Library .

PyG2Plot Python3 binding for @AntV/G2Plot which an interactive and responsive charting library. Based on the grammar of graphics, you can easily ma

hustcc 814 Nov 22, 2021
NorthPitch is a python soccer plotting library that sits on top of Matplotlib

NorthPitch is a python soccer plotting library that sits on top of Matplotlib.

Devin Pleuler 30 May 20, 2021
matplotlib: plotting with Python

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Check out our home page for more inform

Matplotlib Developers 14.6k Nov 24, 2021
🎨 Python Echarts Plotting Library

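pyecharts Python ❤️ ECharts = pyecharts English README Introduction: Apache ECharts (incubating) is a data visualization library open-sourced by Baidu that has won recognition from many developers for its good interactivity and elegant chart design. Python, meanwhile, is a highly expressive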

pyecharts 11.7k Nov 26, 2021
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)

PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization

PyVista 1k Dec 3, 2021
Ternary plotting library for python with matplotlib

python-ternary This is a plotting library for use with matplotlib to make ternary plots in the two dimensional simplex projected onto a two dime

Marc 494 Nov 26, 2021
An open-source plotting library for statistical data.

Lets-Plot Lets-Plot is an open-source plotting library for statistical data. It is implemented using the Kotlin programming language. The design of Le

JetBrains 699 Nov 26, 2021