Read and write rasters in parallel using Rasterio and Dask

Overview

dask-rasterio

Build Status codecov Join the chat at https://gitter.im/dymaxionlabs/dask-rasterio

dask-rasterio provides some methods for reading and writing rasters in parallel using Rasterio and Dask arrays.

Usage

Read a multiband raster

>>> from dask_rasterio import read_raster

>>> array = read_raster('tests/data/RGB.byte.tif')
>>> array
dask.array<stack, shape=(3, 718, 791), dtype=uint8, chunksize=(1, 3, 791)>

>>> array.mean()
dask.array<mean_agg-aggregate, shape=(), dtype=float64, chunksize=()>
>>> array.mean().compute()
40.858976977533935

Read a single band from a raster

>>> from dask_rasterio import read_raster

>>> array = read_raster('tests/data/RGB.byte.tif', band=3)
>>> array
dask.array<raster, shape=(718, 791), dtype=uint8, chunksize=(3, 791)>

Write a singleband or multiband raster

>>> from dask_rasterio import read_raster, write_raster

>>> array = read_raster('tests/data/RGB.byte.tif')

>>> new_array = array & (array > 100)
>>> new_array
dask.array<and_, shape=(3, 718, 791), dtype=uint8, chunksize=(1, 3, 791)>

>>> prof = ... # reuse profile from tests/data/RGB.byte.tif...
>>> write_raster('processed_image.tif', new_array, **prof)

Chunk size

Both read_raster and write_raster accept a block_size argument that acts as a multiplier to the block size of rasters. The default value is 1, which means the dask array chunk size will be the same as the block size of the raster file. You will have to adjust this value depending on the specification of your machine (how much memory do you have, and the block size of the raster).

Install

Install with pip:

pip install dask-rasterio

Development

This project is managed by Poetry. If you do not have it installed, please refer to Poetry instructions.

Now, clone the repository and run poetry install. This will create a virtual environment and install all required packages there.

Run poetry run pytest to run all tests.

Run poetry build to build package on dist/.

Issue tracker

Please report any bugs and enhancement ideas using the GitHub issue tracker:

https://github.com/dymaxionlabs/dask-rasterio/issues

Feel free to also ask questions on our Gitter channel, or by email.

Help wanted

Any help in testing, development, documentation and other tasks is highly appreciated and useful to the project.

For more details, see the file CONTRIBUTING.md.

License

Source code is released under a BSD-2 license. Please refer to LICENSE.md for more information.

You might also like...
Download and process satellite imagery in Python using Sentinel Hub services.

Description The sentinelhub Python package allows users to make OGC (WMS and WCS) web requests to download and process satellite images within your Py

 An API built to format given addresses using Python and Flask.
An API built to format given addresses using Python and Flask.

An API built to format given addresses using Python and Flask. About The API returns properly formatted data, i.e. removing duplicate fields, distingu

This is a simple python code to get IP address and its location using python

IP address & Location finder @DEV/ED : Pavan Ananth Sharma Dependencies: ip2geotools Note: use pip install ip2geotools to install this in your termin

A package built to support working with spatial data using open source python

EarthPy EarthPy makes it easier to plot and manipulate spatial data in Python. Why EarthPy? Python is a generic programming language designed to suppo

Using SQLAlchemy with spatial databases

GeoAlchemy GIS Support for SQLAlchemy. Introduction GeoAlchemy is an extension of SQLAlchemy. It provides support for Geospatial data types at the ORM

Solving the Traveling Salesman Problem using Self-Organizing Maps
Solving the Traveling Salesman Problem using Self-Organizing Maps

Solving the Traveling Salesman Problem using Self-Organizing Maps This repository contains an implementation of a Self Organizing Map that can be used

Example of animated maps in matplotlib + geopandas using entire time series of congressional district maps from UCLA archive. rendered, interactive version below
Example of animated maps in matplotlib + geopandas using entire time series of congressional district maps from UCLA archive. rendered, interactive version below

Example of animated maps in matplotlib + geopandas using entire time series of congressional district maps from UCLA archive. rendered, interactive version below

Hapi is a Python library for building Conceptual Distributed Model using HBV96 lumped model & Muskingum routing method
Hapi is a Python library for building Conceptual Distributed Model using HBV96 lumped model & Muskingum routing method

Current build status All platforms: Current release info Name Downloads Version Platforms Hapi - Hydrological library for Python Hapi is an open-sourc

Daily social mapping project in November 2021. Maps made using PyGMT whenever possible.
Daily social mapping project in November 2021. Maps made using PyGMT whenever possible.

Daily social mapping project in November 2021. Maps made using PyGMT whenever possible.

Comments
  • the time to write a dask array in tif is too long?

    the time to write a dask array in tif is too long?

    I have a image which have 40000cols and 40000 rows ,while I use data = read_raster(filename, band=1), and the use witer_raster(out_filename, data), it need about one minutes, I want to know can it be quickly

    opened by DeZhao-Zhang 2
  • Add a Gitter chat badge to README.md

    Add a Gitter chat badge to README.md

    dymaxionlabs/dask-rasterio now has a Chat Room on Gitter

    @munshkr has just created a chat room. You can visit it here: https://gitter.im/dymaxionlabs/dask-rasterio.

    This pull-request adds this badge to your README.md:

    Gitter

    If my aim is a little off, please let me know.

    Happy chatting.

    PS: Click here if you would prefer not to receive automatic pull-requests from Gitter in future.

    opened by gitter-badger 1
  • Does dask-rasterio support masked array?

    Does dask-rasterio support masked array?

    I'm working with dask masked array, and was wondering what would be the translation of these lines?

        import rasterio
        with rasterio.open(inputFile) as source:
            # this is a 3D numpy array, with dimensions [band, row, col]
            src_array = source.read(masked=True)
    

    Thank you for your cool lib!

    opened by Becheler 0
  • TypeError: self._hds cannot be converted to a Python object for pickling

    TypeError: self._hds cannot be converted to a Python object for pickling

    Seems that rasterio's _hds object is no more serializable

    distributed.protocol.pickle - INFO - Failed to serialize ("('filled-2f9fe0560be0502eda038fa941309294', 0, 0)", <dask_rasterio.write.RasterioDataset object at 0x7f8f9deac828>, (slice(0, 748, None), slice(0, 22415, None)), <unlocked _thread.lock object at 0x7f8f9cb2af58>, False). Exception: self._hds cannot be converted to a Python object for pickling
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    ~/miniconda3/envs/jupyter/lib/python3.6/site-packages/distributed/protocol/pickle.py in dumps(x)
         37     try:
    ---> 38         result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
         39         if len(result) < 1000:
    
    ~/miniconda3/envs/jupyter/lib/python3.6/site-packages/rasterio/_io.cpython-36m-x86_64-linux-gnu.so in rasterio._io.DatasetWriterBase.__reduce_cython__()
    
    TypeError: self._hds cannot be converted to a Python object for pickling
    
    opened by arkanoid87 3
Releases(0.2.1)
Owner
Dymaxion Labs
Creating new value from geospatial imagery with deep learning
Dymaxion Labs
Rasterio reads and writes geospatial raster datasets

Rasterio Rasterio reads and writes geospatial raster data. Geographic information systems use GeoTIFF and other formats to organize and store gridded,

Mapbox 1.9k Jan 7, 2023
Cloud Optimized GeoTIFF creation and validation plugin for rasterio

rio-cogeo Cloud Optimized GeoTIFF (COG) creation and validation plugin for Rasterio. Documentation: https://cogeotiff.github.io/rio-cogeo/ Source Code

null 216 Dec 31, 2022
A light-weight, versatile XYZ tile server, built with Flask and Rasterio :earth_africa:

Terracotta is a pure Python tile server that runs as a WSGI app on a dedicated webserver or as a serverless app on AWS Lambda. It is built on a modern

DHI GRAS 531 Dec 28, 2022
Histogram matching plugin for rasterio

rio-hist Histogram matching plugin for rasterio. Provides a CLI and python module for adjusting colors based on histogram matching in a variety of col

Mapbox 75 Sep 23, 2022
Color correction plugin for rasterio

rio-color A rasterio plugin for applying basic color-oriented image operations to geospatial rasters. Goals No heavy dependencies: rio-color is purpos

Mapbox 111 Nov 15, 2022
How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

How to use COG's (Cloud optimized GeoTIFFs) with Rasterio According to Cogeo.org: A Cloud Opdtimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at

Marvin Gabler 8 Jul 29, 2022
Read images to numpy arrays

mahotas-imread: Read Image Files IO with images and numpy arrays. Mahotas-imread is a simple module with a small number of functions: imread Reads an

Luis Pedro Coelho 67 Jan 7, 2023
A short term landscape evolution using a path sampling method to solve water and sediment flow continuity equations and model mass flows over complex topographies.

r.sim.terrain A short-term landscape evolution model that simulates topographic change for both steady state and dynamic flow regimes across a range o

Brendan Harmon 7 Oct 21, 2022
Using Global fishing watch's data to build a machine learning model that can identify illegal fishing and poaching activities through satellite and geo-location data.

Using Global fishing watch's data to build a machine learning model that can identify illegal fishing and poaching activities through satellite and geo-location data.

Ayush Mishra 3 May 6, 2022
Pandas Network Analysis: fast accessibility metrics and shortest paths, using contraction hierarchies :world_map:

Pandana Pandana is a Python library for network analysis that uses contraction hierarchies to calculate super-fast travel accessibility metrics and sh

Urban Data Science Toolkit 321 Jan 5, 2023