Summary statistics of geospatial raster datasets based on vector geometries.

Overview

rasterstats

BuildStatus

rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal statistics and interpolated point queries. The command-line interface allows for easy interoperability with other GeoJSON tools.

Documentation

For details on installation and usage, visit the documentation at http://pythonhosted.org/rasterstats.

What does it do?

Given a vector layer and a raster band, calculate the summary statistics of each vector geometry. For example, with a polygon vector layer and a digital elevation model (DEM) raster, compute the mean elevation of each polygon.

zones elevation

Command Line Quick Start

The command line interfaces to zonalstats and point_query are rio subcommands which read and write geojson features

$ fio cat polygon.shp | rio zonalstats -r elevation.tif

$ fio cat points.shp | rio pointquery -r elevation.tif

See the CLI Docs. for more detail.

Python Quick Start

For zonal statistics

>>> from rasterstats import zonal_stats
>>> stats = zonal_stats("tests/data/polygons.shp", "tests/data/slope.tif")
>>> stats[0].keys()
dict_keys(['min', 'max', 'mean', 'count'])
>>> [f['mean'] for f in stats]
[14.660084635416666, 56.60576171875]

and for point queries

>>> from rasterstats import point_query
>>> point = {'type': 'Point', 'coordinates': (245309.0, 1000064.0)}
>>> point_query(point, "tests/data/slope.tif")
[74.09817594635244]

Issues

Find a bug? Report it via github issues by providing

  • a link to download the smallest possible raster and vector dataset necessary to reproduce the error
  • python code or command to reproduce the error
  • information on your environment: versions of python, gdal and numpy and system memory
Comments
  • Wrong mean/std stats

    Wrong mean/std stats

    Hi,

    I think there is an error in calculating mean and standard deviation in rasterstats 0.10.3 Here is the statistics calculated in ArcMap for my region:
    Band COUNT MIN MAX MEAN STD MEDIAN 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 51615 3675 52475 10 151057 14939 44531 37989 1722 38133 11 151057 10063 56708 48523 3707 49289 12 151057 13024 46146 34272 5463 34959 13 151057 23169 59005 46253 3292 46804

    And here is the same statistics calculated using rasterstats (have a look at mean and std values for bands 9-13):
    Band count min max mean std percentile_50 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 23182 28669 52475 10 151057 14939 44531 9556 28485 38133 11 151057 10063 56708 20090 28673 49289 12 151057 13024 46146 5839 28953 34959 13 151057 23169 59005 17820 28623 46804

    The code to reproduce the results using rasterstats: for i in range(1, 14): stats = zonal_stats('F:/test.shp', "F:/test_2.dat", stats=['count','min', 'max', 'mean', 'std', 'percentile_50'], all_touched=False, band_num=i, nodata=0)

    Here is the data: https://www.dropbox.com/sh/e8g0jwr1ci0vp2r/AAATlwU2eFjtbtMCO3R5SB1Fa?dl=0

    Can you please let me know what the problem could be?

    opened by yurithefury 17
  • Geopandas and Zonal Statistic error

    Geopandas and Zonal Statistic error

    I run this code

    import geopandas as gpd
    from rasterstats import zonal_stats
    
    geodf = gpd.read_file("SHP/shape.shp")
    zonal_stats(geodf,"raster.tif")
    

    but I get this error:

    ParseException: Unknown type: 'FID'
    ParseException: Unexpected EOF parsing WKB
    Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2820, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
    File "<ipython-input-43-3d15c4dddd4f>", line 1, in <module>
    zonal_stats(geodf,"raster.tif",stats='mean')
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 21, in zonal_stats
    return list(gen_zonal_stats(*args, **kwargs))
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 128, in gen_zonal_stats
    for i, feat in enumerate(features_iter):
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 107, in <genexpr>
    features_iter = (parse_feature(x) for x in obj)
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 70, in parse_feature
    raise ValueError("Can't parse %s as a geojson Feature object" % obj)
    ValueError: Can't parse FID as a geojson Feature object
    

    The data frame looks like this:

    In[43]: geodf
    Out[42]: 
         FID                                      geometry
    
    0     0  POLYGON ((7.662965420348624 45.05600185397847,...
    1     1  POLYGON ((7.665210146514162 45.05607590098681,...
    2     2  POLYGON ((7.663353281768154 45.05601113630467,...
    3     3  POLYGON ((7.660229137844505 45.0570570693801, ...
    4     4  POLYGON ((7.660047499767138 45.0572381357898, ...
    ....
    

    Any suggestion how to solve this problem?

    bug 
    opened by LorenzoBottaccioli 14
  • Add groupby arg to gen_zonal_stats

    Add groupby arg to gen_zonal_stats

    This PR adds a groupby argument to gen_zonal_stats inspired by the ArcGIS gp.ZonalStatistics zone_field argument. The effect is that stats are computed by group, instead of by each individual feature.

    The groupby arg takes a string or a function. If the value is a string, it is assumed to be a field present in each feature properties. If the value is a function, the function's return value will be the groupby key.

    Caveats:

    • The major caveat of this feature is that groups containing overlapping geometry will yield unexpected results as all features are rasterized together...not individually. There are currently no warnings given for this use-case.
    • zone_arrays will be larger as the unioned bounds of the grouped features will make potentially large intermediate rasters
    • All zones will need to be able to fit into memory

    Also included in this PR is an example shapefile for use with testing.

    opened by brendancol 11
  • Partial polygon overlap in raster stats

    Partial polygon overlap in raster stats

    I am using the raster_stats module in python to calculate the mean carbon value for different sets of polygons. Here is the code that I am using:

    africa_sat = 'E:/Research_Work/data/processing_data/carbon_satcchi/data/raw_data/africa_carbon_1km.tif'
    name_clipshp = 'data/grid_clipped/' + name_hol + '_grid_cl.shp' # create object to point to clipped grid file
    
    africa_stats = raster_stats(name_clipshp, africa_sat,stats=['mean'])
    

    I have approximately 260 grid surface areas that I am employing. The grid areas have between ~10,000 polygons and ~40,000 polygons in them. Here is the error message I get for some of grids:

    Traceback (most recent call last): File "", line 16, in File "C:\Python27\lib\site-packages\rasterstats\main.py", line 153, in raster_stats rvds.SetGeoTransform(new_gt) AttributeError: 'NoneType' object has no attribute 'SetGeoTransform'

    I do not get this error message, if the polygon overlaps completely with the raster image (tiff file) or if the polygon does not at all overlap with the polygon. However, if only a part of the polygon overlaps with the raster I get the error that is above.

    Here is a link to a raster and shapefile that created the problem for me:

    https://www.dropbox.com/sh/c80v8v576imd09f/7pSGN-p9lS

    opened by engelmannjens 10
  • Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Hi Matthew,

    I was trying to do the basic zonal statistics and I cannot get around this error. I tried the process with different shapefiles (with polygons). The result was the same. I am running it with Jupyter-Lab

    Here is my env configuration with anaconda: python 3.7.7 rasterio 1.1.4 py37h02db82b_0 conda-forge rasterstats 0.14.0 py_0 requests 2.23.0 pyh8c360ce_2 conda-forge rioxarray 0.0.26 py_0 conda-forge xarray 0.15.1

    Here is my code:

    shapefile=r'G:\gadm\version36 simplified\gadm36_2_simplified.shp' temp_tif=r'E:\weather\data\gpm\late_source\temp.tif' from rasterstats import zonal_stats print(shapefile, temp_tif) zonal_stats(shapefile,temp_tif)

    Using those files: shapefile.zip temp.zip

    Returning this error: error.txt

    Objective: I would like to use the gid_2 as the code to output the zonal statistics for each polygon

    Thanks in advance

    opened by marcello-moreira 9
  • AttributeError when running zonal_stats

    AttributeError when running zonal_stats

    I'm trying to run zonal_stats and I'm getting the following error in the _init_ function of io.py (line 242): AttributeError: 'GeneratorContextManager' object has no attribute 'transform'

    I've recreated the error with a variety of raster and vector data on a fresh virtual environment with rasterstats version 0.13.0. I've experimented with using different combinations of rasterstats/rasterio versions but I'm still seeing the issue. Any insight would be greatly appreciated.

    Thanks!

    opened by HenryW95 9
  • Exit code -1073741819 (0xC0000005)

    Exit code -1073741819 (0xC0000005)

    Hi all, In django console, a simple code as :

    from rasterstats import zonal_stats, point_query
    stats = zonal_stats('Good Raster/wetlands.json', 'Good Raster/tempRaster.tif')
    

    I received : Process finished with exit code -1073741819 (0xC0000005) (Python Crash)

    Here is the files, (geojson and raster): https://www.dropbox.com/sh/gacm4iu00aama4v/AADUm0RD0LYD_0fUV6Mxo3QFa?dl=0

    Version : Windows 10 pro 64 bit PyCharm 2016.3.2 Python 2.7.12 Django 1.10.4 gdal 2.1.3 numpy 1.12.0+mkl rasterio 0.36.0 rasterstats 0.12

    Update: The entire Zonal Statistics project, with OGR, OSR, Gdal, numpy, geojson that I code myself, work good but too dam slow. I need a efficient Zonal Statistic with geojson and a raster. This lib could save me but a critical error occurred when I use it without any other code or import.

    Work in a virtual environment created by Pycharm. All in 32 bit

    Update2 error django console

    When ran into Shell from terminal, I got some error details.

    FATAL: CPLMalloc(): Out of memory allocating 516 bytes. Python crash

    error shell terminal

    Alex

    opened by Neavrys 8
  • Geo raster mini rasters

    Geo raster mini rasters

    @perrygeo Matthew I have changed the file so as to allow an option to use GeoRasters as output if georasters is installed on the system . GeoRasters have a function to write GeoTiffs. They should make life in general easier for the user...let me know if you want to incorporate this version.

    opened by ozak 8
  • New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    This will allow users to generate stats using unprojected / WGS84 data while still accounting for cell area in the calculations used for the "mean" stat. It is just a weighted mean where every cell inside a feature is given a weight based on the latitude of the cells centroid using the haversine function.

    opened by sgoodm 7
  • Rasterstats install on Anaconda

    Rasterstats install on Anaconda

    I seem to be having trouble installing rasterstats successfully in the windows 64bit environment (Windows 7 or Server 2012). I am currently using Anaconda (python 2.7,also tried 3.4) to install, with either the rasterio install instructions (pip install GDAL-2.0.2-cp27-none-win32.whl and rasterio-0.34.0-cp27-cp27m-win32.whl) or via conda forge (https://github.com/conda-forge/rasterio-feedstock).

    In both cases gdal/rasterio will run as expected but when I try to install rasterstats (via rasterstats-0.10.3-py2-none-any.whl or pip install rasterstats) I receive an error "GeoVersion" or "GdalVersion" not found, accompanyied with a "python setup.py egg error code 1".

    I've checked that the corresponding package is installed and the version is up to date. Any ideas what may be causing the install issue?

    Thanks for your time. -Bob

    opened by ghost 7
  • memoryerror exception on raster file in specific projection

    memoryerror exception on raster file in specific projection

    I have a very simple test case, a shapefile, and a raster file in a latlon grid projection. When running stats function it generates an exception:

    python(36508,0x7fff7a4cc000) malloc: *** mach_vm_map(size=362320535494656) failed (error code=3)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "./create_stats.py", line 56, in <module>
        main()
      File "./create_stats.py", line 52, in main
        create_stats(r, p, date)
      File "./create_stats.py", line 22, in create_stats
        geojson = zonal_stats(shp_file, tif_file, stats='mean sum', geojson_out=True)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 21, in zonal_stats
        return list(gen_zonal_stats(*args, **kwargs))
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 136, in gen_zonal_stats
        fsrc = rast.read(bounds=geom_bounds)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/io.py", line 288, in read
        new_array = self.src.read(self.band, window=win, boundless=True, masked=masked)
      File "rasterio/_io.pyx", line 793, in rasterio._io.RasterReader.read (rasterio/_io.c:10053)
    MemoryError
    

    The test data are in http://oka.ags.io/scratch/

    failing code is

    stats = zonal_stats("RRD.shp", "RRD_rice_2014157.tif", stats='mean sum', geojson_out=True)
    

    working code on reprojected data works

    stats = zonal_stats("RRD_sinu.shp", "RRD_rice_2014157_sinu.tif", stats='mean sum', geojson_out=True)
    

    version is '0.10.0'

    opened by demiurg 7
  • ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    Welcome to the python-rasterstats issue tracker. Thanks for putting together a bug report! By following the template below, we'll be better able to reproduce the problem on our end.

    If you don't have a bug specifically but a general support question, please visit https://gis.stackexchange.com/

    Describe the bug

    /usr/local/lib/python3.8/dist-packages/rasterstats/main.py:156: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.

    The warning that appears in the provided code refers to a change in the way that Shapely, a geometry processing library, handles the 'type' attribute of geometries. Instead of using the 'type' attribute, it is recommended to use the 'geom_type' attribute. To fix this warning, the code can be modified to use the 'geom_type' attribute instead of the 'type' attribute when necessary.

    To Reproduce Steps to reproduce the behavior:

    1. How did you install rasterstats and its dependencies?
    2. What datasets are necessary to reproduce the bug? Please provide links to example data if necessary.
    3. What code is necessary to reproduce the bug? Provide the code directly below or provide links to it.
    # Code to reproduce the error
    zonal_stats = rasterstats.zonal_stats(
        output_geojson, vi_array, 
        affine=affine, nodata=nodata,
        stats="min max mean ",
        geojson_out=True,
    ) 
    

    Feature Requests

    python-rasterstats is not currently accepting any feature requests via the issue tracker. If you'd like to add a backwards-compatible feature, please open a pull request - it doesn't need to be 100% ready but should include a working proof-of-concept, tests, and should not break the existing API.

    opened by marianobonelli 0
  • segfault when importing rasterio

    segfault when importing rasterio

    I'm not sure what is happening but as soon as I import rasterio I get a segfault (using the test data)

    >>> import rasterio # this will cause the segfault
    >>> from rasterstats import zonal_stats
    >>> 
    >>> stats = zonal_stats(
    ...     "polygons.shp",
    ...     "slope.tif",
    ... )
    [1]    3980 segmentation fault  python3
    

    name = "rasterstats" version = "0.17.0"

    name = "rasterio" version = "1.3.4"

    opened by TimMcCauley 1
  • Performance optimizations

    Performance optimizations

    A small collection of performance enhancements aimed at reducing the number of times the data is traversed.

    Also improved percentile computation by computing them all at once instead of one at a time (and avoid recreating the compressed array each time).

    opened by groutr 1
  • Make int64 casting optional

    Make int64 casting optional

    With 'large' raster datasets of integer datatype, rasterstats wastefully casts the ndarray read from the raster up to int64. In the extreme case, this uses 8 times more memory than necessary if the underlying datatype of the raster dataset is Byte.

    At least for categorical rasters and when using categorical=True, it seems desirable to disable this behaviour to save memory.

    In a not particularly extreme case, this saved us on the order of 20G of memory:

    The raster datasets covers all of Scotland and is of the Byte type. The individual features in the vector dataset are the Scottish country boundaries and therefore cover a decent portion of the raster area (and their bounding box an even larger portion).

    The raster dataset is 46478 * 69365 pixels, 1 byte per pixel = 3.00GiB uncompressed in memory if stored in an ndarray of dtype int8. It would be 24GiB in int64.

    opened by maxfenv 3
  • Resolve shapely deprecation warnings; use pytest.importorskip

    Resolve shapely deprecation warnings; use pytest.importorskip

    This PR resolves a few shapely deprecation warnings:

    • Use shapely.errors.ShapelyError (ReadingError was also only from shapely.errors)
    • Use geom.geom_type instead of geom.type
    • Fix inconsistent dimensionality error with test polygon
    • Use pytest.importorskip feature
    opened by mwtoews 1
  • missing `click` dependency

    missing `click` dependency

    Hi, this is more a proposal than a bug report. Since python-rasterstats directly imports click: https://github.com/perrygeo/python-rasterstats/blob/5da3110230df44e228ba21f16a9d31ab7e84e0d5/src/rasterstats/cli.py#L6

    and since there's a known issue with click<7.1 (#254) , adding a click>=7.1 in requirements.txt would be great.

    The following is a real case scenario in which it would come in handy:

    on a centos8/rocky8 server:

    $ python3 -mvenv --system-site-packages env
    $ env/bin/pip install rasterstats
    # ... various logs ...
    
    $ env/bin/python -c "import rasterstats"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/__init__.py", line 4, in <module>
        from rasterstats import cli
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/cli.py", line 28, in <module>
        @cligj.use_rs_opt
      File "/usr/lib/python3.6/site-packages/click/decorators.py", line 170, in decorator
        _param_memo(f, OptionClass(param_decls, **attrs))
      File "/usr/lib/python3.6/site-packages/click/core.py", line 1510, in __init__
        raise TypeError('Got secondary option for non boolean flag.')
    TypeError: Got secondary option for non boolean flag.
    

    (then after some googling I saw #254 and forced the update of click)

    opened by brancomat 2
Releases(0.17.0)
Owner
Matthew Perry
Matthew Perry
Creates 3D geometries from 2D vector graphics, for use in geodynamic models

geomIO - creating 3D geometries from 2D input This is the Julia and Python version of geomIO, a free open source software to generate 3D volumes and s

null 3 Feb 1, 2022
🌐 Local tile server for viewing geospatial raster files with ipyleaflet

?? Local Tile Server for Geospatial Rasters Need to visualize a rather large raster (gigabytes) you have locally? This is for you. A Flask application

Bane Sullivan 192 Jan 4, 2023
🌐 Local tile server for viewing geospatial raster files with ipyleaflet or folium

?? Local Tile Server for Geospatial Rasters Need to visualize a rather large (gigabytes) raster you have locally? This is for you. A Flask application

Bane Sullivan 192 Jan 4, 2023
Specification for storing geospatial vector data (point, line, polygon) in Parquet

GeoParquet About This repository defines how to store geospatial vector data (point, lines, polygons) in Apache Parquet, a popular columnar storage fo

Open Geospatial Consortium 449 Dec 27, 2022
User friendly Rasterio plugin to read raster datasets.

rio-tiler User friendly Rasterio plugin to read raster datasets. Documentation: https://cogeotiff.github.io/rio-tiler/ Source Code: https://github.com

null 372 Dec 23, 2022
Evaluation of file formats in the context of geo-referenced 3D geometries.

Geo-referenced Geometry File Formats Classic geometry file formats as .obj, .off, .ply, .stl or .dae do not support the utilization of coordinate syst

Advanced Information Systems and Technology 11 Mar 2, 2022
Raster-based Spatial Analysis for Python

?? xarray-spatial: Raster-Based Spatial Analysis in Python ?? Fast, Accurate Python library for Raster Operations ⚡ Extensible with Numba ⏩ Scalable w

makepath 649 Jan 1, 2023
Advanced raster and geometry manipulations

buzzard In a nutshell, the buzzard library provides powerful abstractions to manipulate together images and geometries that come from different kind o

Earthcube Lab 30 Jun 20, 2022
h3-js provides a JavaScript version of H3, a hexagon-based geospatial indexing system.

h3-js The h3-js library provides a pure-JavaScript version of the H3 Core Library, a hexagon-based geographic grid system. It can be used either in No

Uber Open Source 648 Jan 7, 2023
WebGL2 powered geospatial visualization layers

deck.gl | Website WebGL2-powered, highly performant large-scale data visualization deck.gl is designed to simplify high-performance, WebGL-based visua

Vis.gl 10.5k Jan 8, 2023
gpdvega is a bridge between GeoPandas and Altair that allows to seamlessly chart geospatial data

gpdvega gpdvega is a bridge between GeoPandas a geospatial extension of Pandas and the declarative statistical visualization library Altair, which all

Ilia Timofeev 49 Jul 25, 2022
Geospatial Image Processing for Python

GIPPY Gippy is a Python library for image processing of geospatial raster data. The core of the library is implemented as a C++ library, libgip, with

GIPIT 83 Aug 19, 2022
leafmap - A Python package for geospatial analysis and interactive mapping in a Jupyter environment.

A Python package for geospatial analysis and interactive mapping with minimal coding in a Jupyter environment

Qiusheng Wu 1.4k Jan 2, 2023
GeoNode is an open source platform that facilitates the creation, sharing, and collaborative use of geospatial data.

Table of Contents What is GeoNode? Try out GeoNode Install Learn GeoNode Development Contributing Roadmap Showcase Most useful links Licensing What is

GeoNode Development Team 1.2k Dec 26, 2022
Minimum Bounding Box of Geospatial data

BBOX Problem definition: The spatial data users often are required to obtain the coordinates of the minimum bounding box of vector and raster data in

Ali Khosravi Kazazi 1 Sep 8, 2022
The geospatial toolkit for redistricting data.

maup maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts

Metric Geometry and Gerrymandering Group 60 Dec 5, 2022
A part of HyRiver software stack for handling geospatial data manipulations

Package Description Status PyNHD Navigate and subset NHDPlus (MR and HR) using web services Py3DEP Access topographic data through National Map's 3DEP

Taher Chegini 5 Dec 14, 2022
Enable geospatial data mining through Google Earth Engine in Grasshopper 3D, via its most recent Hops component.

AALU_Geo Mining This repository is produced for a masterclass at the Architectural Association Landscape Urbanism programme. Requirements Rhinoceros (

null 4 Nov 16, 2022
Spatial Interpolation Toolbox is a Python-based GUI that is able to interpolate spatial data in vector format.

Spatial Interpolation Toolbox This is the home to Spatial Interpolation Toolbox, a graphical user interface (GUI) for interpolating geographic vector

Michael Ward 2 Nov 1, 2021