Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

Overview



Turn even the largest data into images, accurately



What is it?

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data. Datashader breaks the creation of images of data into 3 main steps:

  1. Projection

    Each record is projected into zero or more bins of a nominal plotting grid shape, based on a specified glyph.

  2. Aggregation

    Reductions are computed for each bin, compressing the potentially large dataset into a much smaller aggregate array.

  3. Transformation

    These aggregates are then further processed, eventually creating an image.

Using this very general pipeline, many interesting data visualizations can be created in a performant and scalable way. Datashader contains tools for easily creating these pipelines in a composable manner, using only a few lines of code. Datashader can be used on its own, but it is also designed to work as a pre-processing stage in a plotting library, allowing that library to work with much larger datasets than it would otherwise.
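
For illustration, here is a minimal sketch of those three steps using datashader's Canvas and transfer_functions APIs (the random data is made up for the example):

import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

# Synthetic example data: 100,000 (x, y) points
df = pd.DataFrame(np.random.randn(100_000, 2), columns=['x', 'y'])

canvas = ds.Canvas(plot_width=400, plot_height=400)  # the target grid for projection
agg = canvas.points(df, 'x', 'y', agg=ds.count())    # steps 1-2: project and aggregate
img = tf.shade(agg)                                  # step 3: transform into an image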

Installation

Datashader supports Python 2.7, 3.6 and 3.7 on Linux, Windows, or Mac and can be installed with conda:

conda install datashader

or with pip:

pip install datashader

For the best performance, we recommend using conda so that you are sure to get numerical libraries optimized for your platform. The latest releases are available on the pyviz channel (conda install -c pyviz datashader) and the latest pre-release versions are available on the dev-labelled channel (conda install -c pyviz/label/dev datashader).

Fetching Examples

Once you've installed datashader as above, you can fetch the examples:

datashader examples
cd datashader-examples

This will create a new directory called datashader-examples with all the data needed to run the examples.

To run all the examples you will need some extra dependencies. If you installed datashader within a conda environment, with that environment active, run:

conda env update --file environment.yml

Otherwise create a new environment:

conda env create --name datashader --file environment.yml
conda activate datashader

Developer Instructions

  1. Install Python 3 miniconda or anaconda, if you don't already have it on your system.

  2. Clone the datashader git repository if you do not already have it:

    git clone https://github.com/holoviz/datashader.git
    
  3. Set up a new conda environment with all of the dependencies needed to run the examples:

    cd datashader
    conda env create --name datashader --file ./examples/environment.yml
    conda activate datashader
    
  4. Put the datashader directory into the Python path in this environment:

    pip install --no-deps -e .
    

Learning more

After working through the examples, you can find additional resources linked from the datashader documentation, including API documentation and papers and talks about the approach.

Some Examples

USA census

NYC races

NYC taxi

Comments
  • ENH: first draft of MPL artist

    Minimal datashader aware matplotlib artist.

    import datashader as ds
    from datashader.mpl_ext import DSArtist
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors
    
    # df is assumed to be a DataFrame of dropoff points (e.g. NYC taxi data)
    fig, ax = plt.subplots()
    da = DSArtist(ax, df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'),
                  norm=mcolors.LogNorm())
    ax.add_artist(da)
    ax.set_aspect('equal')
    
    fig.colorbar(da)
    
    

    So this is using DS to just do the binning and then re-using mpl's existing normalization and color-mapping tools.

    in progress 
    opened by tacaswell 55
  • ENH: updated draft of MPL artist

    Working on resolving issues with @tacaswell's #200 at Scipy 2020 sprints along with @manzt.

    The DSArtist now takes in a datashader.Pipeline object and so far can handle the case of a 2D raster with a quantitative colormap, but not yet the 3D categorical case where a color_key is used.

    We currently infer the colormap by applying the datashader pipeline's operations manually rather than using the callable itself. We use the aggregation part of the pipeline (agg, transform_fn) to get a vmin and vmax in order to build and set a matplotlib colormap and norm.
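
    Conceptually, the norm-building step amounts to something like the following sketch (agg stands for the aggregate produced by the pipeline's aggregation stage; this is not the PR's actual code):

    import numpy as np
    import matplotlib.colors as mcolors
    
    # Sketch only: derive vmin/vmax from the finite values of the aggregate array
    finite = agg.data[np.isfinite(agg.data)]
    norm = mcolors.Normalize(vmin=finite.min(), vmax=finite.max())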


    We'll keep working on the categorical case, but we wanted to share this now and also see if there is still interest in merging this into datashader.

    opened by nvictus 47
  • Add Polygon support

    Overview

    This PR adds support for rasterizing polygons.

    Closes https://github.com/pyviz/datashader/issues/181.

    For example usage, see notebook at https://anaconda.org/jonmmease/datashader_polygons_pr/notebook

    GeomArray ExtensionArrays

    In order to rasterize polygons efficiently, we need a data structure that can store an array of polygon definitions in a form that is directly accessible to numba.

    To accomplish this, I added a new RaggedArray (see https://github.com/pyviz/datashader/pull/687) subclass called PolygonsArray. Each element of this array can store one or more polygons with holes, so elements of a PolygonsArray are roughly equivalent to a shapely Polygon or MultiPolygon. The entire PolygonsArray is roughly equivalent to a geopandas GeometryArray of Polygon/MultiPolygon elements.

    The new PolygonsArray pandas extension array could eventually grow to support many of the operations supported by the geopandas GeoSeries. The advantage would be that these operations could be implemented in vectorized form using numba for potentially improved performance, and Dask DataFrame support would largely come for free.

    To demonstrate the potential, I added length and area properties to the PolygonsArray class. These operations are ~8x faster than the equivalent GeoSeries operations, and they could also be naturally parallelized using Dask for large datasets.

    Canvas methods

    A new Canvas.polygons() method has been added to rasterize polygons, and the Canvas.line() method has been updated to support these new geometry arrays, making it easy to draw polygon outlines.
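
    A hedged sketch of the intended usage based on this description (df and the 'polygons' column name are illustrative, not taken from the PR):

    import datashader as ds
    
    cvs = ds.Canvas(plot_width=800, plot_height=400)
    filled = cvs.polygons(df, geometry='polygons', agg=ds.count())  # rasterize filled polygons
    outlines = cvs.line(df, geometry='polygons', agg=ds.count())    # draw polygon outlines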

    Examples

    For code and timing, see https://anaconda.org/jonmmease/datashader_polygons_pr/notebook

    [Example images: texas, world, world_outline]

    cc @jbednar @philippjfr

    opened by jonmmease 44
  • Recommended file format for large files

    Datashader is agnostic about file formats, working with anything that can be loaded into a dataframe-like object (currently supporting Pandas and Dask dataframes). But because datashader focuses on having good performance for large datasets, the performance of the file format is a major factor in the usability of the library. Thus we should use examples that serve to guide users towards good solutions for their own problems, recommending and demonstrating approaches that we find to work well.

    Right now, our examples use CSV and castra or HDF5 formats. It is of course important to show a CSV example, since nearly every dataset can be obtained in CSV for import into the library. However, CSV is highly inefficient in both file size and reading speed, and it also truncates floating-point precision in ways that are problematic when zooming in closely to a dataset.

    Castra is a relatively high-performance binary format that works well for the large datasets in the examples, but it is not yet a mature project, and is not available on the main conda channel. Should we invest in making castra be more fully supported? If not, I think we should choose another binary format (HDF5?) to use for our examples.
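
    For what it's worth, a minimal sketch of the kind of binary-format workflow being discussed, using Apache Parquet via Dask (the path and column names are hypothetical):

    import dask.dataframe as dd
    
    # Read lazily, and only the columns needed for aggregation
    df = dd.read_parquet('data/census.parq', columns=['easting', 'northing'])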

    opened by jbednar 40
  • Add pandas ExtensionArray for storing homogeneous ragged arrays

    Overview

    This PR introduces a pandas ExtensionArray for storing a column of homogeneous ragged 1D arrays. The Datashader motivation for ragged arrays is to make it possible to store variable-length lines (fixing problems like https://github.com/pyviz/datashader/issues/464) and eventually polygons (https://github.com/pyviz/datashader/issues/181) as elements of a column in a DataFrame. Using one such shape per row makes it simpler to store associated columns of data for use with selections and filtering, hovering, etc.

    This PR currently contains only the extension array and associated testing.

    Implementation

    RaggedArray is a subclass of pandas.api.extensions.ExtensionArray with a RaggedDtype that is a subclass of pandas.api.extensions.ExtensionDtype. RaggedDtype takes advantage of the @register_extension_dtype decorator introduced in pandas 0.24rc1 to register itself with pandas as a datatype named 'ragged'.

    NOTE: This branch currently requires pandas 0.24rc1

    A ragged array of length n is represented by three numpy arrays:

    • mask: A boolean array of length n where values of True represent missing/NA values
    • flat_array: An array with the same datatype as the ragged array element and with a length equal to the sum of the length of all of the ragged array elements.
    • start_indices: An unsigned integer array of length n of indices into flat_array corresponding to the start of the ragged array element. For space efficiency, the precision of the unsigned integer is chosen to be the smallest available that is capable of indexing the last element in flat_array.

    Example Usage

    In[1]: import pandas as pd; from datashader.datatypes import RaggedArray
    In[2]: ra = RaggedArray([[1, 2], [], [10, 20], None, [11, 22, 33, 44]])
    In[3]: ra
    Out[3]: 
    <RaggedArray>
    [            array([1., 2.]),    array([], dtype=float64),
               array([10., 20.]),                        None,
     array([11., 22., 33., 44.])]
    Length: 5, dtype: <class 'datashader.datatypes.RaggedDtype'>
    
    In[4]: ra.flat_array
    Out[4]: array([ 1.,  2., 10., 20., 11., 22., 33., 44.])
    
    In[5]: ra.start_indices
    Out[5]: array([0, 2, 2, 4, 4], dtype=uint8)
    
    In[6]: ra.mask
    Out[6]: array([False, False, False,  True, False])
    
    In[7]: pd.array([[1, 2], [], [10, 20], None, [11, 22, 33, 44]], dtype='ragged')
    Out[7]: 
    <RaggedArray>
    [            array([1., 2.]),    array([], dtype=float64),
               array([10., 20.]),                        None,
     array([11., 22., 33., 44.])]
    Length: 5, dtype: <class 'datashader.datatypes.RaggedDtype'>
    
    In[8]: rs = pd.Series([[1, 2], [], [10, 20], None, [11, 22, 33, 44]], dtype='ragged')
    In[9]: rs
    Out[9]: 
    0              [1. 2.]
    1                   []
    2            [10. 20.]
    3                 None
    4    [11. 22. 33. 44.]
    dtype: ragged
    
    In[10]: ragged_subset = rs.loc[[0, 1, 4]]
    In[11]: ragged_subset
    Out[11]: 
    0              [1. 2.]
    1                   []
    4    [11. 22. 33. 44.]
    dtype: ragged
    
    In[12]: ragged_subset.array.mask
    Out[12]: array([False, False, False])
    
    In[13]: ragged_subset.array.flat_array
    Out[13]: array([ 1.,  2., 11., 22., 33., 44.])
    
    In[14]: ragged_subset.array.start_indices
    Out[14]: array([0, 2, 2], dtype=uint8)
    
    opened by jonmmease 39
  • Datashader crashes python with "noisy" timeseries data.

    It appears that datashader renders low-noise time series data, such as the following, with no problem:

    [image: low_noise]

    However, when attempting to zoom-in on high-noise data such as the following:

    [image: high_noise]

    python bombs out with an uncaught win32 exception:

    [image: python_crash_image]

    I have attached a python notebook which shows this behavior (at least on my machine).

    Environment:

    • Windows 7
    • new data shader environment (as of 2018-03-12)
      • datashader 0.6.5
      • python 3.6
      • jupyter 1.0.0

    The datashader environment was generated using the attached yml file from the datashader examples directory.

    opened by bardiche98 36
  • WIP: Tiling

    This PR provides tiling for datashader.

    Tiling currently assumes that data is provided in meters.

    The interface for tiling:

    • [x] implement tiles
    • [x] implement supertiles
    • [x] implement zoom level stats scan
    • [x] test stats scan
    • [x] add example notebook
    • [x] add esri metadata output issue (https://github.com/bokeh/datashader/issues/639)
    • [x] add ogc tileset metadata output issue (https://github.com/bokeh/datashader/issues/640)
    opened by brendancol 33
  • Extend InteractiveImage to work with bokeh server

    Awesome library, I really love it. I usually embed bokeh with flask and it seems to work with datashader as well. However, I noticed that it doesn't do very well when it tries to re-render the image when zooming in. It's because the js code needs the IPython.notebook.kernel for it to work. Is there a way to make it work without the use of the IPython kernel?

    opened by DavidVillero 33
  • import error in lidar example

    I'm trying to load the lidar example but I got an import error at:

    from holoviews.operation import datashade
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: cannot import name 'datashade'
    

    I'm using datashader and holoviews installed from their git master repositories.

    Any clue on what I'm missing?

    opened by epifanio 32
  • Add quadmesh glyph with rectilinear and curvilinear support

    This PR is an alternative implementation of quadmesh rasterization, initially based on the logic from https://github.com/pyviz/datashader/pull/769.

    Unlike that PR, this PR adds quadmesh glyph classes, and supports the standard datashader aggregation framework.

    Overall, the architecture fits well with that of the dataframe-based glyphs (points, line, and area). And relying on the datashader aggregation framework results in a lot less code duplication compared to https://github.com/pyviz/datashader/pull/769.

    Thanks to the variable argument expansion from https://github.com/pyviz/datashader/pull/780, the performance for rectilinear rendering is now on par with the prototype implementation.

    For curvilinear quadmesh aggregation, this PR uses a raycasting algorithm to determine which pixels to fill. I've found the performance of this approach to be ~1.5x slower than the prototype implementation, which uses an area-based point-inclusion approach. The algorithm isn't the only difference between the implementations, though, and I didn't exhaustively chase down the differences this time.

    I went with the raycasting algorithm because it handles concave and complex quads. It is also very straightforward to extend this algorithm to general polygons (with or without holes), so I think there's a good path here towards adding general polygon support to datashader as well.
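
    For reference, a sketch of the classic even-odd ray-casting test that this family of algorithms builds on (plain Python, not the PR's numba implementation):

    def point_in_polygon(px, py, xs, ys):
        """Even-odd rule: cast a ray from (px, py) to the right and count edge crossings."""
        inside = False
        n = len(xs)
        for i in range(n):
            x0, y0 = xs[i], ys[i]
            x1, y1 = xs[(i + 1) % n], ys[(i + 1) % n]
            if (y0 > py) != (y1 > py):  # edge straddles the ray's y level
                # x coordinate where the edge crosses the horizontal line y = py
                x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
                if x_cross > px:
                    inside = not inside
        return inside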

    For example usage and benchmarks, see https://anaconda.org/jonmmease/quadmeshcomparisons_pr/notebook (rendered at https://nbviewer.jupyter.org/urls/notebooks.anaconda.org/jonmmease/quadmeshcomparisons_pr/download)

    Future work: This PR does not include any parallelization support, so extending this to work in a multi-threaded or distributed context is left as future work.

    @jbednar @philippjfr


    Outdated performance observations from initial PR:

    But, it's an order of magnitude slower than the implementations in https://github.com/pyviz/datashader/pull/769. Here is a notebook showing some timing results: https://anaconda.org/jonmmease/rectquadmesh_examples/notebook.

    Roughly speaking, this PR is ~13x faster than representing a rectilinear quadmesh with a trimesh. But the specialized implementation from https://github.com/pyviz/datashader/pull/769 is ~13x faster than this PR. Note that I disabled numba parallelization for these tests for consistency.

    I did some performance debugging and found that nearly all of the extra overhead in this PR, compared to the specialized implementation, comes from the use of the aggregation framework. If, in the _extend function, I don't call append but instead implement a single aggregation inline, then the performance is comparable to the specialized implementations.

    So the bad news is that right now we need to choose between performance and consistency/maintainability for the quadmesh implementation. The good news is that there may be an order of magnitude speedup to be had across points, line, and area glyphs as well if we can work out how to optimize the aggregation framework.

    opened by jonmmease 31
  • Implements tree reduction in the dask layer

    Following discussions in https://github.com/ratt-ru/shadeMS/issues/29 (in particular, using small chunks in the dask layer would, counterintuitively, explode RAM usage), @sjperkins replaced the current chunk aggregation code in dask.py with a tree reduction.

    This has been tested extensively with https://github.com/ratt-ru/shadeMS, and was found to reduce RAM usage considerably. It has not been tested in a CUDA context at all though, so if somebody more knowledgeable than me can take a look at it, that'd be great.
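
    For intuition, a minimal sketch of the tree-reduction idea (not the actual dask.py code): per-chunk aggregates are merged pairwise, level by level, so intermediates can be released as soon as they are combined rather than accumulating in one long fold.

    def tree_reduce(combine, aggs):
        # Combine chunk aggregates pairwise, level by level, like a tournament bracket
        while len(aggs) > 1:
            merged = [combine(a, b) for a, b in zip(aggs[::2], aggs[1::2])]
            if len(aggs) % 2:          # odd chunk out: carry it to the next level
                merged.append(aggs[-1])
            aggs = merged
        return aggs[0]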

    opened by o-smirnov 30
  • Better validate canvas.line() coordinate lengths

    Fixes #1159.

    This adds extra checks in the validate() functions of the various line classes such as LinesAxis1XConstant to check that the supplied x and y coordinates have the same lengths.
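
    Conceptually the added check is simple; a hedged sketch (names are illustrative, not the PR's exact code):

    def validate(xs, ys):
        if len(xs) != len(ys):
            raise ValueError(
                f"x and y coordinates have mismatched lengths: {len(xs)} != {len(ys)}")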

    opened by ianthomas23 1
  • Insufficient validation of Canvas.line input lengths

    The sizes of the possible inputs to Canvas.line() are not sufficiently validated. Consider this example of LinesAxis1XConstant which accepts a numpy array for the x values and a number of columns of a pandas.DataFrame for the y values:

    import datashader as ds
    import numpy as np
    import pandas as pd
    
    # LinesAxis1XConstant
    x = np.arange(1)  # Incorrect size, should have length of 2.
    df = pd.DataFrame(dict(y_from = [0, 1, 0, 1, 0.0], y_to = [0, 1, 1, 0, 0.5]))
    
    canvas = ds.Canvas()
    agg = canvas.line(source=df, x=x, y=["y_from", "y_to"], axis=1, agg=ds.count())
    

    Here there is only a single x coordinate when there should be two to match the y coordinates. If this example is run normally, no error is reported. If it is run with numba jitting disabled, i.e. NUMBA_DISABLE_JIT=1 python test.py, then an out-of-bounds error is reported as follows:

    Traceback (most recent call last):
      File "/Users/iant/github_temp/datashader_temp/issue1159.py", line 13, in <module>
        agg = canvas.line(source=df, x=x, y=["y_from", "y_to"], axis=1, agg=ds.count())
      File "/Users/iant/github/datashader/datashader/core.py", line 450, in line
        return bypixel(source, self, glyph, agg, antialias=glyph.antialiased)
      File "/Users/iant/github/datashader/datashader/core.py", line 1258, in bypixel
        return bypixel.pipeline(source, schema, canvas, glyph, agg, antialias=antialias)
      File "/Users/iant/github/datashader/datashader/utils.py", line 109, in __call__
        return lk[typ](head, *rest, **kwargs)
      File "/Users/iant/github/datashader/datashader/data_libraries/pandas.py", line 17, in pandas_pipeline
        return glyph_dispatch(glyph, df, schema, canvas, summary, antialias=antialias)
      File "/Users/iant/github/datashader/datashader/utils.py", line 112, in __call__
        return lk[cls](head, *rest, **kwargs)
      File "/Users/iant/github/datashader/datashader/data_libraries/pandas.py", line 48, in default
        extend(bases, source, x_st + y_st, x_range + y_range)
      File "/Users/iant/github/datashader/datashader/glyphs/line.py", line 382, in extend
        do_extend(
      File "/Users/iant/github/datashader/datashader/glyphs/line.py", line 1306, in extend_cpu
        perform_extend_line(
      File "/Users/iant/github/datashader/datashader/glyphs/line.py", line 1278, in perform_extend_line
        x1 = xs[j + 1]
    IndexError: index 1 is out of bounds for axis 0 with size 1
    

    There should be an explicit check that the coordinates have compatible lengths and an appropriate error reported, regardless of whether numba jitting is enabled or not.

    This is using the latest commit (a1d9513915a) of the main branch.

    bug 
    opened by ianthomas23 0
  • Support numpy 1.24

    Prior to numpy 1.24, creating an array from ragged nested sequences produced a VisibleDeprecationWarning; with 1.24 it is now a ValueError. This is OK for now, as numba doesn't yet support numpy 1.24, but it needs to be fixed here before that happens, so it is quite urgent.
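
    The behavior change in question can be seen directly (a minimal sketch):

    import numpy as np
    
    np.array([[1, 2], [3]])                 # warning before numpy 1.24, ValueError from 1.24 on
    np.array([[1, 2], [3]], dtype=object)   # an explicit object dtype remains valid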

    Thanks to @Hoxbro for identifying this (https://github.com/holoviz/geoviews/pull/608).

    bug 
    opened by ianthomas23 0
  • Expose computed bounds after aggregation

    Currently the data bounds will be computed during aggregation if the x or y ranges are not provided explicitly:

    canvas = ds.Canvas(plot_width=300, plot_height=300, x_axis_type='linear', y_axis_type='linear')
    aggregation = canvas.points(df, 'x', 'y', agg=ds.count())
    

    However, those computed ranges do not seem to be exposed and are thus not really accessible, even though they are quite important when displaying the data. Thus I propose to utilize the attrs field of the xarray.DataArray (which can hold arbitrary metadata) to provide the (computed) ranges. Essentially I want to get the following:

    x_range = aggregation.attrs["x_range"]
    y_range = aggregation.attrs["y_range"]
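
    For comparison, the current workaround is to recompute the ranges that the aggregation already derived internally, duplicating that work (a sketch):

    x_range = (df['x'].min(), df['x'].max())
    y_range = (df['y'].min(), df['y'].max())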
    
    enhancement 
    opened by muendlein 2
  • Add new where reduction

    This partially implements issue #1126, adding a new where reduction that accepts either a max or min reduction. Best illustrated via an example:

    import datashader as ds
    import numpy as np
    import pandas as pd
    
    x = np.arange(2)
    df = pd.DataFrame(dict(
        y_from = [0.0, 1.0, 0.0, 1.0, 0.0],
        y_to   = [0.0, 1.0, 1.0, 0.0, 0.5],
        value  = [1.1, 3.3, 5.5, 2.2, 4.4],
        other  = [-55, -77, -99, -66, -88],
    ))
    
    canvas = ds.Canvas(plot_height=3, plot_width=5)
    agg = canvas.line(
        source=df, x=x, y=["y_from", "y_to"], axis=1,
        agg=ds.where(ds.max("value"), "other"),
    )
    
    print(agg)
    

    which outputs

    <xarray.DataArray (y: 3, x: 5)>
    array([[-99., -88., -55., -66., -66.],
           [ nan, -99., -99., -88., -88.],
           [-77., -77., -77., -99., -99.]])
    Coordinates:
      * x        (x) float64 0.1 0.3 0.5 0.7 0.9
      * y        (y) float64 0.1667 0.5 0.8333
    

    You can think of this as using the max('value') reduction as normal, but then returning the corresponding values from the 'other' column rather than the 'value' column.
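
    In plain numpy terms, the per-pixel selection amounts to the following sketch (values made up for illustration):

    import numpy as np
    
    # Candidate rows that fall into one pixel
    value = np.array([1.1, 3.3, 5.5])
    other = np.array([-55, -77, -99])
    pixel_result = other[np.argmax(value)]   # -99: the 'other' entry of the max-'value' row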

    What it currently supports:

    • where takes either a min or max selector reduction.
    • Works on CPU (not GPU), with or without dask.
    • Works with antialiased lines.
    • Cannot be used within a summary or categorical by reduction.

    Note that there is no support for use of first and last within a where, because there is no advantage in doing so; you can just use first or last directly on their own.

    Future improvements:

    • Support within categorical reductions.
    • Support for GPU.
    • If lookup_column is not specified, use the index of the row in the supplied DataFrame.
    • New max_n, min_n, first_n, last_n reductions.

    All of these are possible but fiddly to implement, so I would rather have partial functionality available for users to experiment with and I can add these improvements over time.

    Currently some combinations of lines and dask give different results depending on the number of dask partitions, but this has always been the situation and is no worse here.

    opened by ianthomas23 4
Releases (v0.14.3)
  • v0.14.3(Nov 17, 2022)

    This release fixes a bug related to spatial indexing of spatialpandas.GeoDataFrames, and introduces enhancements to antialiased lines, benchmarking and GPU support.

    Thanks to first-time contributors @eriknw and @raybellwaves, and also @ianthomas23 and @maximlt.

    Enhancements:

    • Improvements to antialiased lines:

      • Fit antialiased line code within usual numba/dask framework (#1142)
      • Refactor stage 2 aggregation for antialiased lines (#1145)
      • Support compound reductions for antialiased lines on the CPU (#1146)
    • New benchmark framework:

      • Add benchmarking framework using asv (#1120)
      • Add cudf, dask and dask-cudf Canvas.line benchmarks (#1140)
    • Improvements to GPU support:

      • Cupy implementation of eq_hist (#1129)
    • Improvements to documentation:

      • Fix markdown syntax for link (#1119)
      • Add text link to https://examples.pyviz.org/datashader_dashboard (#1123)
    • Improvements to dependency management (#1111, #1116)

    • Improvements to CI (#1132, #1135, #1136, #1137, #1143)

    Bug fixes:

    • Ensure spatial index _sindex is retained on dataframe copy (#1122)
    Source code(tar.gz)
    Source code(zip)
  • v0.14.2(Aug 10, 2022)

    This is a bug fix release to fix an important divide by zero bug in antialiased lines, along with improvements to documentation and handling of dependencies.

    Thanks to @ianthomas23 and @adamjhawley.

    Enhancements:

    • Improvements to documentation:

      • Fix links in docs when viewed in browser (#1102)
      • Add release notes (#1108)
    • Improvements to handling of dependencies:

      • Correct dask and bokeh dependencies (#1104)
      • Add requests as an install dependency (#1105)
      • Better handle returned dask npartitions in tests (#1107)

    Bug fixes:

    • Fix antialiased line divide by zero bug (#1099)
    Source code(tar.gz)
    Source code(zip)
  • v0.14.1(Jun 21, 2022)

    This release provides a number of important bug fixes and small enhancements from Ian Thomas along with infrastructure improvements from Maxime Liquet and new reductions from @tselea.

    Enhancements:

    • Improvements to antialiased lines:
      • Support antialiased lines for categorical aggregates (#1081,#1083)
      • Correctly handle NaNs in antialiased line coordinates (#1097)
    • Improvements to rescale_discrete_levels for how='eq_hist':
      • Correct implementation of rescale_discrete_levels (#1078)
      • Check before calling rescale_discrete_levels (#1085)
      • Remove empty histogram bins in eq_hist (#1094)
    • Implementation of first and last reduction (#1093) for data types other than raster.

    Bug fixes:

    • Do not snap trimesh vertices to pixel grid (#1092)
    • Correctly orient (y, x) arrays for xarray (#1095)
    • Infrastructure/build fixes (#1080, #1089, #1096)
    Source code(tar.gz)
    Source code(zip)
  • v0.14.0(Apr 25, 2022)

    This release has been nearly a year in the making, with major new contributions from Ian Thomas, Thuy Do Thi Minh, Simon Høxbro Hansen, Maxime Liquet, and James Bednar, and additional support from Andrii Oriekhov, Philipp Rudiger, and Ajay Thorve.

    Enhancements:

    • Full support for antialiased lines of specified width (#1048, #1072). Previous antialiasing support was limited to single-pixel lines and certain floating-point reduction functions. Now supports arbitrary widths and arbitrary reduction functions, making antialiasing fully supported. Performance ranges from 1.3x to 14x slower than the simplest zero-width implementation; see benchmarks.
    • Fixed an issue with visibility on zoomed-in points plots and on overlapping line plots that was first reported in 2017, with a new option rescale_discrete_levels for how='eq_hist' (#1055)
    • Added a categorical color_key for 2D (unstacked) aggregates (#1020), for producing plots where each pixel has at most one category value
    • Improved docs:
      • A brand new polygons guide (#1071)
      • A new guide to 3D aggregations using by, now documenting using categorizer objects to do 3D numerical binning (#1071)
      • Moved documentation for spreading to its own section so it can be presented at the right pipeline stage (was mixed up with colormapping before) (#1071)
      • Added rescale_discrete_levels example (#1071)
      • Other misc doc cleanup (#1035, #1037, #1058, #1074, #1077)

    Bugfixes:

    • Fixed details of the raster coordinate calculations to match other primitives, making it simpler to overlay separately rendered results (#959, #1046)
    • Various fixes and extensions for cupy/CUDA, e.g. to use cuda for category_binning, spread, and dynspread, including cupy.interp where appropriate (#1015, #1016, #1044, #1050, #1060)
    • Infrastructure/build/ecosystem fixes (#1022, #1025, #1027, #1036, #1045, #1049, #1050, #1057, #1061, #1062, #1063, #1064)

    Compatibility:

    • Canvas.line() option antialias=True is now deprecated; use line_width=1 (or another nonzero value) instead. (#1048)
    • Removed long-deprecated bokeh_ext.py (#1059)
    • Dropped support for Python 2.7 (actually already dropped from the tests in Datashader 0.12) and 3.6 (no longer supported by many downstream libraries like rioxarray, but several of them are not properly declaring that restriction, making 3.6 much more difficult to support.) (#1033)
    • Now tested on Python 3.7, 3.8, 3.9, and 3.10. (#1033)
    Source code(tar.gz)
    Source code(zip)
  • v0.13.0(Jun 9, 2021)

    Thanks to Jim Bednar, Nezar Abdennur, Philipp Rudiger, and Jean-Luc Stevens.

    Enhancements:

    • Defined new dynspread metric based on counting the fraction of non-empty pixels that have non-empty pixels within a given radius. The resulting dynspread behavior is much more intuitive than the old behavior, which counted already-spread pixels as if they were neighbors (#1001)
    • Added ds.count() as the default reduction for ds.by (#1004)

    Bugfixes:

    • Fixed array-bounds reading error in dynspread (#1001)
    • Fix color_key argument for dsshow (#986)
    • Added Matplotlib output to the 3_Interactivity getting started page. (#1009)
    • Misc docs fixes (#1007)
    • Fix nan assignment to integer array in RaggedArray (#1008)

    Compatibility:

    • Any usage of dynspread with datatypes other than points should be replaced with spread(), which will do what was probably intended by the original dynspread call, i.e. to make isolated lines and shapes visible. Strictly speaking, dynspread could still be useful for other glyph types if that glyph is contained entirely in a pixel, e.g. if a polygon or line segment is located within the pixel bounds, but that seems unlikely.
    • Dynspread may need to have the threshold or max_px arguments updated to achieve the same spreading as in previous releases, though the new behavior is normally going to be more useful than the old.
    Source code(tar.gz)
    Source code(zip)
  • v0.12.1(Mar 22, 2021)

    Major release with new features that should really be considered part of the upcoming 0.13 release; please treat all the new features as experimental in this release due to it being officially a minor release (unintentionally).

    Massive thanks to these contributors for substantial new functionality:

    • Nezar Abdennur (nvictus), Trevor Manz, and Thomas Caswell for their contributions to the new dsshow() support for using Datashader as a Matplotlib Artist, providing seamless interactive Matplotlib+Datashader plots.
    • Oleg Smirnov for category_modulo and category_binning for by(), making categorical plots vastly more powerful.
    • Jean-Luc Stevens for spread and dynspread support for numerical aggregate arrays and not just RGB images, allowing isolated datapoints to be made visible while still supporting hover, colorbars, and other plot features that depend on the numeric aggregate values.
    • Valentin Haenel for the initial anti-aliased line drawing support (still experimental).

    Thanks to Jim Bednar, Philipp Rudiger, Peter Roelants, Thuy Do Thi Minh, Chris Ball, and Jean-Luc Stevens for maintenance and other contributions.

    New features:

    • Expanded (and transposed) performance guide table (#961)
    • Add category_modulo and category_binning for grouping numerical values into categories using by() (#927)
    • Support spreading for numerical (non-RGB) aggregate arrays (#771, #954)
    • Xiaolin Wu anti-aliased line drawing, enabled by adding antialias=True to the Canvas.line() method call. Experimental; currently restricted to sum and max reductions and only supporting a single-pixel line width. (#916)
    • Improve Dask performance issue from #899 using a tree reduction (#926)

    Bugfixes:

    • Fix for xarray 0.17 raster files, supporting various nodata conventions (#991)
    • Fix RaggedArray tests to keep up with Pandas test suite changes (#982, #993)
    • Fix out-of-bounds error on Points aggregation (#981)
    • Fix CUDA issues (#973)
    • Fix Xarray handling (#971)
    • Disable the interactivity warning on the homepage (#983)

    Compatibility:

    • Drop deprecated modules ds.geo (moved to xarray_image) and ds.spatial (moved to SpatialPandas) (#955)
    Source code(tar.gz)
    Source code(zip)
  • v0.11.1(Aug 16, 2020)

    This release is primarily a compatibility release for newer versions of RAPIDS cuDF and Numba, along with a small number of bug fixes. With contributions from @jonmmease, @stuartarchibald, @AjayThorve, @kebowen730, @jbednar and @philippjfr.

    • Fixes support for cuDF 0.13 and Numba 0.48 (#933)
    • Fixes for cuDF support on Numba>=0.51 (#934, #947)
    • Fixes tile generation using aggregators with output of boolean dtype (#949)
    • Fixes for CI and build infrastructure (#935, #948, #951)
    • Updates to docstrings (b1349e3, #950)
    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(May 25, 2020)

    This release includes major contributions from @maihde (generalizing count_cat to by, span for colorize), @jonmmease (Dask quadmesh support), @philippjfr and @jbednar (count_cat/by/colorize/docs/bugfixes), and Barry Bragg, Jr. (TMS tileset speedups).

    New features (see getting_started/2_Pipeline.ipynb for examples):

    • New by() categorical aggregator, extending count_cat to work with other reduction functions, no longer just count. Allows binning of aggregates separately per category value, so that you can compare how that aggregate is affected by category value. (#875, #902, #904, #906). See example in the holoviews docs.
    • Support for negative and zero values in tf.shade for categorical aggregates. (#896, #909, #910, #908)
    • Support for span in _colorize(). (#875, #910)
    • Support for Dask-based quadmesh rendering for rectilinear and curvilinear mesh types (#885, #913)
    • Support for GPU-based raster mesh rendering (via Canvas.quadmesh) (#872)
    • Faster TMS tileset generation (#886)
    • Expanded performance guide (#868)

    Bugfixes:

    • Misc bugfixes and improvements (#874, #882, #888, #889, #890, #891)

    Compatibility (breaking changes and deprecations):

    • To allow negative-valued aggregates, count_cat now weights categories according to how far they are from the minimum aggregate value observed, while previously they were referenced to zero. Previous behavior can be restored by passing color_baseline=0 to count_cat or by.
    • count_cat is now deprecated and removed from the docs; use by(..., count()) instead.
    • Result of a count() aggregation is now uint32, not int32, to distinguish counts from other aggregation types (#910).
    • tf.shade now only treats zero values as missing for count aggregates (uint); zero is otherwise a valid value distinct from NaN (#910).
    • alpha is now respected as the upper end of the alpha range for both _colorize() and _interpolate() in tf.shade; previously only _interpolate respected it.
    • Added new nansum_missing utility for working with Numpy>1.9, where nansum no longer returns NaN for all-NaN values.
    • ds.geo and ds.spatial modules are now deprecated; their contents have moved to xarray_spatial and spatialpandas, respectively. (#894)

    Download and install: https://datashader.org/getting_started

    Source code(tar.gz)
    Source code(zip)
  • v0.10.0(Jan 21, 2020)

    This release includes major contributions from @jonmmease (polygon rendering, spatialpandas), along with contributions from @philippjfr and @brendancol (bugfixes), and @jbednar (docs, warnings, and import times).

    New features:

    • Polygon (and points and lines) rendering for spatialpandas extension arrays (#826, #853)
    • Quadmesh GPU support (#861)
    • Much faster import times (#863)
    • New table in docs listing glyphs supported for each data library (#864,#867)
    • Support for remote Parquet filesystems (#818,#866)

    Bugfixes and compatibility:

    • Misc bugfixes and improvements (#844, #860, #866)
    • Fix warnings and deprecations in tests (#859)
    • Fix Canvas.raster (padding, mode buffers, etc. #862)

    Download and install: https://datashader.org/getting_started

    Source code(tar.gz)
    Source code(zip)
  • v0.9.0(Dec 8, 2019)

    This release includes major contributions from @jonmmease (GPU support), along with contributions from @brendancol (viewshed speedups), @jbednar (docs), and @jsignell (examples, maintenance, website).

    New features:

    • Support for CUDA GPU dataframes (cudf and dask_cudf) (#794, #793, #821, #841, #842)
    • Documented new quadmesh support (renaming user guide section 5_Rasters to 5_Grids to reflect the more-general grid support) (#805)

    Bugfixes and compatibility:

    • Avoid double-counting line segments that fit entirely into a single rendered pixel (#839)
    • Improved geospatial toolbox, including 75X speedups to viewshed algorithm (#811, #824, #844)
    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Oct 8, 2019)

    This release includes major contributions from @jonmmease (quadmesh and filled-area support), @brendancol (geospatial toolbox, tile previewer), @philippjfr (distributed regridding, dask performance), and @jsignell (examples, maintenance, website).

    New features:

    • Native quadmesh (canvas.quadmesh()) support (for rectilinear and curvilinear grids -- 3X faster than approximating with a trimesh; #779)
    • Filled area (canvas.area()) support (#734)
    • Expanded geospatial toolbox, with support for:
      • Zonal statistics (#782)
      • Calculating viewshed (#781)
      • Calculating proximity (Euclidean and other distance metrics, #772)
    • Distributed raster regridding with Dask (#762)
    • Improved dask performance (#798, #801)
    • tile_previewer utility function (simple Bokeh-based plotting of local tile sources for debugging; #761)

    Bugfixes and compatibility:

    • Compatibility with latest Numba, Intake, Pandas, and Xarray (#763, #768, #791)
    • Improved datetime support (#803)
    • Simplified docs (now built on Travis, and no longer requiring GeoViews) and examples (now on examples.pyviz.org)
    • Skip rendering of empty tiles (#760)
    • Improved performance for point, area, and line glyphs (#780)
    • InteractiveImage and Pipeline are now deprecated; removed from examples (#751)
    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Apr 8, 2019)

    This release includes major contributions from @jonmmease (ragged array extension, SpatialPointsFrame, row-oriented line storage, dask trimesh support), @jsignell (maintenance, website), and @jbednar (Panel-based dashboard).

    New features:

    • Simplified Panel-based dashboard using new Param features; now only 48 lines with fewer new concepts (#707)
    • Added pandas ExtensionArray and Dask support for storing homogeneous ragged arrays (#687)
    • Added SpatialPointsFrame and updated census, osm-1billion, and osm examples to use it (#702, #706, #708)
    • Expanded 8_Geography.ipynb to document other geo-related functions
    • Added Dask support for trimesh rendering, though computing the mesh initially still requires vertices and simplices to fit into memory (#696)
    • Add zero-copy rendering of row-oriented line coordinates, using a new axis argument (#694)

    Bugfixes and compatibility:

    • Added lnglat_to_meters to geo module; new code should import it from there (#708)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.9(Jan 29, 2019)

    This release includes major contributions from @jonmmease (fixing several long-standing bugs), @jlstevens (updating all example notebooks to use current syntax, #685), @jbednar, @philippjfr, and @jsignell (Panel-based dashboard), and @brendancol (geo utilities).

    New features:

    • Replaced outdated 536-line Bokeh dashboard.py with 71-line Panel+HoloViews dashboard .ipynb (#676)
    • Allow aggregating xarray objects (in addition to Pandas and Dask DataFrames) (#675)
    • Create WMTS tiles from Datashader data (#636)
    • Added various geographic utility functions (ndvi, slope, aspect, hillshade, mean, bump map, Perlin noise) (#661)
    • Made OpenSky data public (#691)

    Bugfixes and compatibility:

    • Fix array bounds error on line glyph (#683)
    • Fixed the span argument to tf.shade (#680)
    • Fixed composite.add (for use in spreading) to clip colors rather than overflow (#689)
    • Fixed gerrymandering shape file (#688)
    • Updated to match Bokeh (#656), Dask (#681, #667), Pandas/Numpy (#697)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.8(Sep 11, 2018)

    Minor, mostly bugfix, release with some speed improvements.

    New features:

    • Added Strange Attractors example (#632)
    • Major speedup: optimized dask datashape detection (#634)

    Bugfixes and compatibility:

    • Silenced inappropriate warnings (#631)
    • Fixed various other bugs, including #644
    • Added handling for zero data and zero range (#612, #648)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.7(Jul 7, 2018)

  • v0.6.6(May 20, 2018)

    Minor bugfix release.

    • Now available to install using pip (pip install datashader) or conda defaults (conda install datashader)
    • InteractiveImage is now deprecated; please use the Datashader support in HoloViews instead.
    • Updated installation and example instructions to use new datashader command.
    • Made package building automatic, to allow more frequent releases
    • Ensured transparent (not black) image is returned when there is no data to plot (thanks to Nick Xie)
    • Simplified getting-started example (thanks to David Jones)
    • Various fixes and compatibility updates to examples
    Source code(tar.gz)
    Source code(zip)
  • 0.6.5(Feb 1, 2018)

    Major release with extensive support for triangular meshes and changes to the raster API.

    New features:

    • Trimesh support: Rendering of irregular triangular meshes using Canvas.trimesh() (see user guide) (#525,#552)
    • Added a new website at datashader.org, with new Getting Started pages and an extensive User Guide, with about 50% new material not previously in example notebooks. Built entirely from Jupyter notebooks, which can be run in the examples/ directory. Website is now complete except for sections on points (see the nyc_taxi example in the meantime).
    • Canvas.raster() now accepts xarray Dataset types, not just DataArrays, with the specific DataArray selectable from the Dataset using the column= argument of a supplied aggregation function.
    • tf.Images() now displays anything with an HTML representation, to allow laying out Pandas dataframes alongside datashader output.

    Bugfixes and compatibility:

    • Changed Raster API to match other glyph types:
      • Now accepts a reduction function via an agg= argument like Canvas.line(), Canvas.points(), etc. The previous downsample_method is still accepted for this release, but is now deprecated.
      • upsample_method is now interpolate, accepting linear=True or linear=False; the previous spelling is now deprecated.
      • The layer= argument previously accepted a 1-based integer index, which was confusing given the standard Python 0-based indexing elsewhere. Changed to accept an xarray coordinate, which can be a 1-based index if that's what is defined on the array, but also works with arbitrary floating-point coordinates (e.g. for a depth parameter in an image stack).
      • Now auto-ranges in x and y when not given explicit ranges, instead of raising an error.
    • Fixed various bugs, including one generating incorrect output in Canvas.raster(agg='mode')
    Source code(tar.gz)
    Source code(zip)
  • 0.6.4(Dec 5, 2017)

    Minor compatibility release to track changes in external packages.

    • Updated imports for bokeh 0.12.11 (fixes #535), though there are issues in 0.12.11 itself and so 0.12.12 should be used instead (to be released shortly).
    • Pinned pillow version on Windows (fixes #534).
    Source code(tar.gz)
    Source code(zip)
  • 0.6.3(Dec 1, 2017)

    Apart from the new website, this is a minor release primarily to catch up with changes in external libraries.

    New features:

    • Reorganized examples directory as the basis for a completely new website at https://bokeh.github.io/datashader-docs (#516).
    • Added tf.Images() class to format multiple labeled Datashader images as a table in a Jupyter notebook, now used extensively in the new website.
    • Added utility function dataframe_from_multiple_sequences(x_values, y_values) to convert large numbers of sequences stored as 2D numpy arrays to a NaN-separated pandas dataframe that can be displayed efficiently (see new example in tseries.ipynb) (#512).
    • Improved streaming support (#520).

    Bugfixes and compatibility:

    • Added support for Dask 0.15 and 0.16 and pandas 0.21 (#523,#529) and declared minimum required Numba version.
    • Improved and fixed issues with various example notebooks, primarily to update for changes in dependencies.
    • Changes in network graph support: ignore id field by default to avoid surprising dependence on column name, rename directly_connect_edges to connect_edges for accuracy and conciseness.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.2(Oct 25, 2017)

    Release with bugfixes, changes to match external libraries, and some new features.

    Backwards compatibility:

    • Minor changes to network graph API, e.g. to ignore weights by default in forceatlas2 (#488)
    • Fix upper-bound bin error for auto-ranged data (#459). Previously, points falling on the upper bound of the plotted area were excluded from the plot, which was consistent with the behavior for individual grid cells, but which was confusing and misleading for the outer boundaries. Points falling on the very outermost boundaries are now folded into the final grid cell, which should be the least surprising behavior.

    New or updated examples (.ipynb files in examples/):

    • streaming-aggregation.ipynb: Illustrates combining incoming streams of data for display (also see holoviews streaming).
    • landsat.ipynb: simplified using HoloViews; now includes plots of full spectrum for each point via hovering.
    • Updated and simplified census-hv-dask (now called census-congressional), census-hv, packet_capture_graph.

    New features and improvements

    • Updated Bokeh support to work with new bokeh 0.12.10 release (#505)
    • More options for network/graph plotting (configurable column names, control over weights usage; #488, #494)
    • For line plots (time series, trajectories, network graphs), switched the line-clipping algorithm from Cohen-Sutherland to Liang-Barsky. The performance gains for random lines range from 50-75% for a million lines. (#495)
    • Added tf.Images class to format a list of images as an HTML table (#492)
    • Faster resampling/regridding operations (#486)

    Known issues:

    • examples/dashboard has not yet been updated to match other libraries, and is thus missing functionality like hovering and legends.
    • A full website with documentation has been started but is not yet ready for deployment.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.1(Sep 13, 2017)

    Minor bugfix release, primarily updating example notebooks to match API changes in external packages.

    Backwards compatibility:

    • Made edge bundling retain edge order, to allow indexing, and absolute coordinates, to allow overlaying on external data.
    • Updated examples to show that xarray now requires dimension names to match before doing arithmetic or comparisons between arrays.

    Known issues:

    • If you use Jupyter notebook 5.0 (earlier or later versions should be ok), you will need to override a setting that prevents visualizations from appearing, e.g.: jupyter notebook --NotebookApp.iopub_data_rate_limit=100000000 census.ipynb &
    • The dashboard needs to be rewritten entirely to match current Bokeh and HoloViews releases, so that hover and legend support can be restored.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.0(Aug 19, 2017)

    New release of features that may still be in progress, but are already usable:

    • Added graph/network plotting support (still may be in flux) (#385, #390, #398, #408, #415, #418, #436)
    • Improved raster regridding based on gridtools and xarray (still may be in flux); no longer depends on rasterio and scikit-image (#383, #389, #423)
    • Significantly improved performance for dataframes with categorical fields

    New examples (.ipynb files in examples/):

    • osm-1billion: 1-billion-point OSM example, for in-core processing on a 16GB laptop.
    • edge_bundling: Plotting graphs using "edgehammer" bundling of edges to show structure.
    • packet_capture_graph: Laying out and visualizing network packets as a graph.

    Backwards compatibility:

    • Remove deprecated interpolate and colorize functions
    • Made raster processing consistently use bin centers to match xarray conventions (requires recent fixes to xarray; only available on a custom channel for now) (#422)
    • Fixed various limitations and quirks for NaN values
    • Made alpha scaling respect min_alpha consistently (#371)

    Known issues:

    • If you use Jupyter notebook 5.0 (earlier or later versions should be ok), you will need to override a setting that prevents visualizations from appearing, e.g.: jupyter notebook --NotebookApp.iopub_data_rate_limit=100000000 census.ipynb &
    • The dashboard needs updating to match current Bokeh releases; most parts other than hover and legends, should be functional but it needs a rewrite to use currently recommended approaches.
    Source code(tar.gz)
    Source code(zip)
  • 0.5.0(May 12, 2017)

    Major release with extensive optimizations and new plotting-library support, incorporating 9 months of development from 5 main contributors:

    • Extensive optimizations for speed and memory usage, providing at least 5X improvements in speed (using the latest Numba versions) and 2X improvements in peak memory requirements. Outlined in #313 and #129.
    • Added HoloViews support for flexible, composable, dynamic plotting, making it simple to switch between datashaded and non-datashaded versions of a Bokeh or Matplotlib plot.
    • Added examples/environment.yml to make it easy to install dependencies needed to run the examples.
    • Updated examples to use the now-recommended, well-supported, and fast Apache Parquet file format
    • Added support for variable alpha for non-categorical aggregates, by specifying a single color rather than a list or colormap #345
    • Added datashader.utils.lnglat_to_meters utility function for working in Web Mercator coordinates with Bokeh
    • Added discussion of why you should be using uniform colormaps, and examples of using uniform colormaps from the new colorcet package
    • Numerous bug fixes and updates, mostly in the examples and Bokeh extension
    • Updated reference manual and documentation

    New examples (.ipynb files in examples/):

    • holoviews_datashader: Using HoloViews to create dynamic Datashader plots easily
    • census-hv-dask: Using GeoViews for overlaying shape files, demonstrating gerrymandering by race
    • nyc_taxi-paramnb: Using ParamNB to make a simple dashboard
    • lidar: Visualizing point clouds
    • solar: Visualizing solar radiation data
    • Dynamic 1D histogram example (last code cell in examples/nyc_taxi-nongeo.ipynb)
    • dashboard: Now includes opensky example (python dashboard/dashboard.py -c dashboard/opensky.yml)

    Backwards compatibility:

    • To improve consistency with Numpy and Python data structures and eliminate issues with an empty column and row at the edge of the aggregated raster, the provided xrange,yrange bounds are now treated as upper exclusive. Results will thus differ between 0.5.0 and earlier versions. See #259 for discussion.

    Known issues:

    • If you use Jupyter notebook 5.0 (earlier or later versions should be ok), you will need to override a setting that prevents visualizations from appearing, e.g.: jupyter notebook --NotebookApp.iopub_data_rate_limit=100000000 census.ipynb &
    • Legend and hover support is currently disabled for the dashboard, due to ongoing development of a simpler approach.
    Source code(tar.gz)
    Source code(zip)
  • 0.4.0(Aug 18, 2016)

    Minor bugfix release to support Bokeh 0.12.1, with some API and defaults changes.

    • Added examples() function to obtain the notebooks and other examples corresponding to the installed datashader version; see examples/README.md.
    • Updated dashboard example to match changes in Bokeh
    • Added default color cycle with distinguishable colors for shading categorical data; now tf.shade(agg) with no other arguments should give a usable plot for both categorical and non-categorical data.

    Backwards compatibility:

    • Replaced confusing tf.interpolate() and tf.colorize() functions with a single shading function tf.shade(). The previous names are still supported, but give deprecation warnings. Calls to the previous functions using keyword arguments can simply be renamed to use tf.shade, as all the same keywords are accepted, but calls to colorize that used a positional argument for e.g. the color_key will now need to use a keyword when calling shade().
    • Increased default threshold for tf.dynspread() to improve visibility of sparse dots
    • Increased default min_alpha for tf.shade() (formerly tf.colorize()) to avoid undersaturation

    Known issues:

    • For Bokeh 0.12.1, some notebooks will give warnings for Bokeh plots when used with Jupyter's "Run All" command. Bokeh 0.12.2 will fix this problem when it is released, but for now you can either downgrade to 0.12.0 or use single-cell execution.
    • There are some Bokeh compatibility issues with the dashboard example that are still being investigated and may require a new Bokeh or datashader release in this series.
    Source code(tar.gz)
    Source code(zip)
  • 0.3.2(Jul 18, 2016)

    Minor bugfix release to support Bokeh 0.12:

    • Fixed InteractiveImage zooming to work with Bokeh 0.12.
    • Added more responsive event throttling for DynamicImage; throttle parameter no longer needed and is now deprecated
    • Fixed datashader-download-data command
    • Improved non-geo Taxi example
    • Temporarily disabled dashboard legends; will re-enable in future release
    Source code(tar.gz)
    Source code(zip)
  • 0.3.0(Jun 23, 2016)

    The major feature of this release is support for raster data via Canvas.raster. To use this feature, you must install the optional dependencies via conda install rasterio scikit-image. rasterio relies on gdal, whose conda package has some known bugs, including a missing dependency (install it via conda install krb5). InteractiveImage in this release requires bokeh 0.11.1 or earlier, and will not work with bokeh 0.12.

    • PR #160 #187 Improved example notebooks and dashboard
    • PR #186 #184 #178 Add datashader-download-data cli command for grabbing example datasets
    • PR #176 #177 Changed census example data to use HDF5 format (slower but more portable)
    • PR #156 #173 #174 Added Landsat8 and race/ethnicity vs. elevation example notebooks
    • PR #172 #159 #157 #149 Added support for images using Canvas.raster (requires rasterio and scikit-image)
    • PR #169 Added legends notebook demonstrating create_categorical_legend and create_ramp_legend
    • PR #162 Added notebook example for datashader.bokeh_ext.HoverLayer
    • PR #152 Added alpha arg to tf.interpolate
    • PR #151 #150, etc. Small bugfixes
    • PR #146 #145 #144 #143 Added streaming example
    • Added hold decorator to utils and a summarize_aggregate_values helper function
    • Added FAQ to docs

    Backwards compatibility:

    • Removed memoize_method
    • Renamed datashader.callbacks --> datashader.bokeh_ext
    • Renamed examples/plotting_problems.ipynb --> examples/plotting_pitfalls.ipynb
  • 0.2.0 (Apr 1, 2016)

    A major release with significant new functionality and some small backwards-incompatible changes.

    New features:

    • PR #124, census: New census notebook example, showing how to work with categorical data.
    • PR #79, tseries, trajectory: Added line glyph and .any() reduction, used in new time series and trajectory notebook examples (see the sketch after this list).
    • PR #76, #77, #131, etc.: Updated all of the other notebooks in examples/, including nyc_taxi.
    • PR #100, #125: Improved dashboard example: added categorical data support, census and osm datasets, legend and hover support, better performance, an out-of-core option, and more
    • PR #109, #111: Add full colormap support via a new cmap argument to interpolate and colorize; supports color ranges as lists, plus Bokeh palettes and matplotlib colormaps
    • PR #98: Added set_background to make it easier to work with images having a different background color than the default white of notebooks
    • PR #119, #121: Added eq_hist option for how in interpolate, performing histogram equalization on the data to reveal structure at every intensity level
    • PR #80, #83, #128: Greatly improved InteractiveImage performance and responsiveness
    • PR #74, #123: Added operators for spreading pixels (to make individual datapoints visible, as circles, squares, or arbitrary mask shapes) and compositing (for simple and flexible composition of images); several of these features appear in the sketch below.
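
    A combined sketch of several of these additions, using made-up trajectory data:

    ```python
    import datashader as ds
    import datashader.transfer_functions as tf
    import pandas as pd

    # Toy trajectory, purely illustrative
    df = pd.DataFrame({'x': [0.0, 0.3, 0.7, 1.0],
                       'y': [0.0, 0.8, 0.2, 1.0]})
    cvs = ds.Canvas(plot_width=300, plot_height=300)

    # New any() reduction: boolean aggregate marking pixels any line touches
    touched = cvs.line(df, 'x', 'y', agg=ds.any())
    counts = cvs.line(df, 'x', 'y', agg=ds.count())

    # cmap accepts a color list, Bokeh palette, or matplotlib colormap;
    # how='eq_hist' (now the default) histogram-equalizes the values
    img = tf.interpolate(counts, cmap=['lightblue', 'darkblue'],
                         how='eq_hist')

    img = tf.spread(img, px=1)             # make isolated points visible
    img = tf.set_background(img, 'black')  # non-white background
    ```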

    Backwards compatibility:

    • The low and high color options to interpolate and colorize are now deprecated and will be removed in the next release; use cmap=[low,high] instead.
    • The transfer function merge has been removed to avoid confusion. stack and others can be used instead, depending on the use case.
    • The default how for interpolate and colorize is now eq_hist, to reveal the structure automatically regardless of distribution.
    • Pipeline now has a default dynspread step, to make isolated points visible when zooming in, and the default sizes have changed.
  • 0.1.0 (Apr 1, 2016)
