Python tools for geographic data

GeoPandas

Last update: Jan 3, 2023

Overview

Introduction

GeoPandas is a project to add support for geographic data to pandas objects. It currently implements GeoSeries and GeoDataFrame types which are subclasses of pandas.Series and pandas.DataFrame respectively. GeoPandas objects can act on shapely geometry objects and perform geometric operations.

GeoPandas geometry operations are cartesian. The coordinate reference system (crs) can be stored as an attribute on an object, and is automatically set when loading from a file. Objects may be transformed to new coordinate systems with the to_crs() method. There is currently no enforcement of like coordinates for operations, but that may change in the future.
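
Because operations are planar, a distance computed directly on longitude/latitude values comes out in degrees, not metres. The sketch below uses only the standard library (it is not GeoPandas code) to contrast a naive cartesian distance with a haversine great-circle distance. The sample coordinates and the spherical Earth radius are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def planar_distance(p, q):
    """Euclidean distance between two (x, y) pairs, in the units of the inputs."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) pairs given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude, at the equator and at 60 degrees north.
equator = ((0.0, 0.0), (1.0, 0.0))
north = ((0.0, 60.0), (1.0, 60.0))

planar_equator = planar_distance(*equator)  # 1.0, in degrees
planar_north = planar_distance(*north)      # also 1.0, in degrees
km_equator = haversine_km(*equator)         # roughly 111 km
km_north = haversine_km(*north)             # roughly half of that
```

The planar result is identical in both cases although the ground distance differs by about a factor of two, which is why reprojecting with to_crs() to a suitable projected CRS is usually the first step before measuring.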

Documentation is available at geopandas.org (current release) and Read the Docs (release and development versions).

Install

See the installation docs for all details. GeoPandas depends on the following packages:

pandas
shapely
fiona
pyproj

Further, matplotlib is an optional dependency, required for plotting, and rtree is an optional dependency, required for spatial joins. rtree requires the C library libspatialindex.

Those packages depend on several low-level libraries for geospatial analysis, which can be a challenge to install. Therefore, we recommend installing GeoPandas using the conda package manager. See the installation docs for more details.

Get in touch

Ask usage questions ("How do I?") on StackOverflow or GIS StackExchange.
Report bugs, suggest features or view the source code on GitHub.
For a quick question about a bug report or feature request, or Pull Request, head over to the gitter channel.
For less well defined questions or ideas, or to announce other projects of interest to GeoPandas users, ... use the mailing list.

Examples

>>> import geopandas
>>> from shapely.geometry import Polygon
>>> p1 = Polygon([(0, 0), (1, 0), (1, 1)])
>>> p2 = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
>>> p3 = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
>>> g = geopandas.GeoSeries([p1, p2, p3])
>>> g
0         POLYGON ((0 0, 1 0, 1 1, 0 0))
1    POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
2    POLYGON ((2 0, 3 0, 3 1, 2 1, 2 0))
dtype: geometry

Some geographic operations return normal pandas objects. The area property of a GeoSeries will return a pandas.Series containing the area of each item in the GeoSeries:

>>> print(g.area)
0    0.5
1    1.0
2    1.0
dtype: float64
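
As a cross-check of the values above, the planar area of a simple polygon can be computed by hand with the shoelace formula. This is a standalone sketch for illustration only; it makes no claim about how shapely or GEOS compute area internally, and the helper name shoelace_area is made up here.

```python
def shoelace_area(coords):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Same vertices as p1, p2 and p3 in the example above.
p1 = [(0, 0), (1, 0), (1, 1)]
p2 = [(0, 0), (1, 0), (1, 1), (0, 1)]
p3 = [(2, 0), (3, 0), (3, 1), (2, 1)]

areas = [shoelace_area(p) for p in (p1, p2, p3)]
print(areas)
```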

Other operations return GeoPandas objects:

>>> g.buffer(0.5)
0    POLYGON ((-0.3535533905932737 0.35355339059327...
1    POLYGON ((-0.5 0, -0.5 1, -0.4975923633360985 ...
2    POLYGON ((1.5 0, 1.5 1, 1.502407636663901 1.04...
dtype: geometry

GeoPandas objects also know how to plot themselves. GeoPandas uses matplotlib for plotting. To generate a plot of our GeoSeries, use:

>>> g.plot()

GeoPandas also implements alternate constructors that can read any data format recognized by fiona. To read a zip file containing an ESRI shapefile with the borough boundaries of New York City (GeoPandas includes this as an example dataset):

>>> nybb_path = geopandas.datasets.get_path('nybb')
>>> boros = geopandas.read_file(nybb_path)
>>> boros.set_index('BoroCode', inplace=True)
>>> boros.sort_index(inplace=True)
>>> boros
               BoroName     Shape_Leng    Shape_Area  \
BoroCode
1             Manhattan  359299.096471  6.364715e+08
2                 Bronx  464392.991824  1.186925e+09
3              Brooklyn  741080.523166  1.937479e+09
4                Queens  896344.047763  3.045213e+09
5         Staten Island  330470.010332  1.623820e+09

                                                   geometry
BoroCode
1         MULTIPOLYGON (((981219.0557861328 188655.31579...
2         MULTIPOLYGON (((1012821.805786133 229228.26458...
3         MULTIPOLYGON (((1021176.479003906 151374.79699...
4         MULTIPOLYGON (((1029606.076599121 156073.81420...
5         MULTIPOLYGON (((970217.0223999023 145643.33221...

>>> boros['geometry'].convex_hull
BoroCode
1    POLYGON ((977855.4451904297 188082.3223876953,...
2    POLYGON ((1017949.977600098 225426.8845825195,...
3    POLYGON ((988872.8212280273 146772.0317993164,...
4    POLYGON ((1000721.531799316 136681.776184082, ...
5    POLYGON ((915517.6877458114 120121.8812543372,...
dtype: geometry

Comments

WIP - Cythonize geometry series operations
This cythonizes a class of geometry operations within GeoSeries. It builds on @jorisvandenbossche's notebook in #430 by extending the set of operations and setting up a proper build environment.

Some things to note

There is a Cython build setup provided by @eriknw

Cython code is only used if available, falling back on the previous Python solution

@wmay has another attempt at https://github.com/Toblerity/Shapely/issues/501

I get around a 50-100x speedup on a simple comparison

There are some inconsistencies between this and what shapely does that I haven't yet tracked down

There are still plenty of operations to do. This just expands on @jorisvandenbossche 's work

geopandas-cython
opened by mrocklin 103

FYI: Installing geopandas with conda

Command for installing geopandas in conda is: conda install -c https://conda.anaconda.org/ioos geopandas

Copied from comment below: there is some extra complexity due to a broken fiona package. The below is a working recipe to get geopandas installed on Windows:

# need fiona with gdal < 2 (it's not yet compatible, although conda tries to install gdal 2)
conda install fiona "libgdal<2.0"
# basic dependencies -> this possibly needs to reinstall numpy to ensure it's the correct version that pandas needs (fiona downgrades numpy, leaving an already installed pandas possibly broken)
conda install pandas matplotlib
# install shapely and pyproj from ioos + their dependencies
conda install -c ioos shapely pyproj
# install geopandas and descartes from ioos without updating their dependencies -> tries to update fiona/pyproj -> see problems above
conda install -c ioos geopandas descartes --no-deps

opened by BKJackson 86

Overlay performance

This PR addresses #338, #400 and #343 (related also to #330, #233, #404)

I changed the overlay function but kept the old one as overlay_slow. I also added test_overlay2 and some QGIS-generated files in datasets for testing. The union and identity operations fail the tests in test_overlay, which uses the boroughs shapefile, but pass the same tests in test_overlay2, which builds basic polygons. The QGIS-generated union for the boroughs data does not seem to be clean (e.g., missing geometries), so test_overlay2 appears to be the correct reference and shows the function performing correctly.

Closes #338, closes #400, closes #666
ops:overlay

opened by ozak 70
Follow-up - Refactor cythonize geometry series operations
UPDATE: the cython effort has been shifted to the PyGEOS package (to be integrated into Shapely), and a PR has recently landed in master with optional support for that (will be released as GeoPandas 0.8). See https://github.com/geopandas/geopandas/pull/1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs.

I merged https://github.com/geopandas/geopandas/pull/467/ in https://github.com/geopandas/geopandas/pull/472

Status: an initial implementation of the refactor (https://github.com/geopandas/geopandas/pull/467) has been merged in the geopandas-cython branch, leaving master currently as the 'stable' branch. Further improvements can be done by PR, but targeting this geopandas-cython branch (when you open a PR, you can choose the base branch).

A bit more background on the new implementation we are trying out: we made a vectorized geometry object GeometryArray (array-like with vectorized operations) in cython in geopandas. This vectorized geometry object only holds the integer pointers as its data, and only boxes them into shapely objects when the user accesses, e.g., a single element, or iterates over it. This makes it faster and cheaper to construct.
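
The "store handles, box on access" idea can be sketched in plain Python. Everything here is hypothetical and only illustrates the pattern: HandleArray, Point and _NATIVE_STORE are invented names, and a dict stands in for the native memory that the real cython code would reference through integer pointers.

```python
class Point:
    """Stand-in for a geometry object (hypothetical, not a shapely class)."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Stand-in for native memory: integer handle -> raw coordinates.
_NATIVE_STORE = {}


def _allocate(x, y):
    """Store raw coordinates and return an integer handle to them."""
    handle = len(_NATIVE_STORE) + 1
    _NATIVE_STORE[handle] = (x, y)
    return handle


class HandleArray:
    """Array-like that keeps only integer handles as its data."""

    def __init__(self, handles):
        self._handles = list(handles)
        self.boxed_count = 0  # how many Python objects were created on access

    @classmethod
    def from_xy(cls, xs, ys):
        return cls(_allocate(x, y) for x, y in zip(xs, ys))

    def __len__(self):
        return len(self._handles)

    def __getitem__(self, i):
        # Boxing happens only here, when a single element is requested.
        self.boxed_count += 1
        return Point(*_NATIVE_STORE[self._handles[i]])

    def xs(self):
        """Bulk operation that works on the handles and creates no Point objects."""
        return [_NATIVE_STORE[h][0] for h in self._handles]


arr = HandleArray.from_xy([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
all_x = arr.xs()                 # no boxing needed
boxed_before = arr.boxed_count   # still 0
second = arr[1]                  # boxes exactly one element
boxed_after = arr.boxed_count
```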

To integrate this in the GeoDataFrame and GeoSeries, we implemented a new GeometryBlock ('blocks' are the internal building blocks of pandas for the different columns). We need a custom GeometryBlock because pandas has to know that the data stored in the dataframe are not normal integers (they are pointers to geometry objects) and cannot be manipulated as if they were integers.

Some known to do items:

[ ] fix remaining failing tests

[x] make installation / building easier (eg automatically finding geos location -> https://github.com/geopandas/geopandas/pull/489)

[ ] some changes will be needed to pandas (eg to support concat)

[ ] implement cythonized/vectorized io functionality (shapefiles, geojson, x/y from csv/df)

[x] create an asv benchmark suite to track progress / improvement over master (this should first be merged in master) -> https://github.com/geopandas/geopandas/pull/497

[ ] update conda recipe (maybe we can use conda-forge to provide 'beta' builds)

[ ] get appveyor working to test on windows

[ ] add a GeometryArray.unique method (then GeoSeries.unique will work automatically)

cc @mrocklin @sgillies @kjordahl @jdmcbr @kuanb @eriknw
geopandas-cython
opened by jorisvandenbossche 68
ENH: Add clip module

Hi Again @jorisvandenbossche

Addressing: https://github.com/geopandas/geopandas/issues/821

This PR is the beginning of adding the clip module to geopandas. As I began to copy things over, I noticed that the data used in our vignette is downloaded with an earthpy function, so I will need to refactor it to work here. @nkorinek is actually working on that here:

https://github.com/earthlab/earthpy/pull/414

So this is also a WIP PR. So far the tests for this module are passing, but I want to see what CI has to say as well. I noticed some of the other tests are failing locally, but they have nothing to do with what I am adding here. Excited to see this functionality in GeoPandas!! This PR complements #1127, where I'm updating docs as I contribute.

please say the word if something needs to be changed, fixed, etc!!

Closes #956, closes #821

opened by lwasser 59
DOC: add sphinx gallery for examples
This PR converts some of the documentation to examples using the sphinx-gallery plugin. This does a couple of things:

Converts many of the examples in .rst files (that were using .. ipython:: blocks) into proper python files with rst embedded as comments

Generates a gallery of images showing off example outputs

Converts python files with embedded rST into a rendered HTML output

Creates jupyter notebooks for each example as well, to go along with the python file

Embeds examples in documentation stub pages

Makes (some) methods/classes/etc clickable links in the rendered examples.

This PR is still somewhat WIP, but this is a general idea of what it'd look like:

http://predictablynoisy.com/geopandas/gallery/index.html

And here's an example of the examples listed in an API stub page:

http://predictablynoisy.com/geopandas/api/_as_gen/geopandas.datasets.get_path.html#geopandas.datasets.get_path

Let me know what folks think and/or if this is worth me spending more time to refine. I think it'd be a nice step towards making the documentation more user-friendly! Comments welcome!

Note: In general this does not change the content of any examples, it only converts them to python. I think that changing the example content itself should be done in another PR if folks want this.
opened by choldgraf 55
ENH: Add parquet IO support

This PR adds support for reading and writing GeoDataFrames to parquet files, based in part on initial development by @darcy-r (https://github.com/darcy-r/geoparquet-python).

In summary, any geometry columns present are converted to WKB format for serialization in parquet. We retain their original column names throughout.

We use metadata in the parquet file to store the CRS (JSONified dict), primary geometry column name, and a list of all geometry column names. This functionality supports GeoDataFrames with multiple geometry columns.

This metadata is stored in a geo key in the metadata, and should support interoperability with R.
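
A rough standard-library sketch of the two ideas above: geometries serialized as WKB bytes, and a JSON document stored under a geo metadata key. The key names inside the JSON (crs, primary_column, columns) and the sample CRS value are assumptions made for illustration; the PR text only states which pieces of information are stored. Only 2D points are handled, and no parquet file is actually written.

```python
import json
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB: byte order, type code 1, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data):
    """Decode the WKB produced by point_to_wkb back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled in this sketch")
    return x, y


# Hypothetical table with two geometry columns, both stored as WKB bytes
# under their original column names.
columns = {
    "name": ["a", "b"],
    "geometry": [point_to_wkb(0.0, 0.0), point_to_wkb(1.0, 2.0)],
    "centroid": [point_to_wkb(0.5, 0.5), point_to_wkb(1.5, 2.5)],
}

# Assumed layout of the JSON stored under the "geo" key.
geo_metadata = {
    "crs": {"init": "epsg:4326"},
    "primary_column": "geometry",
    "columns": ["geometry", "centroid"],
}
file_metadata = {b"geo": json.dumps(geo_metadata).encode("utf-8")}

# Reading side: recover the metadata, then decode the primary geometry column.
restored = json.loads(file_metadata[b"geo"].decode("utf-8"))
first_point = wkb_to_point(columns[restored["primary_column"]][0])
```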

This approach leverages the existing parquet support from pandas but overrides the read / write functions so that we can handle the geospatial specific stuff.

Note: I only provided an implementation using pyarrow, as it does not appear that fastparquet provides an API that allows us to read / write metadata. However, I used the same overall approach as is used in pandas to make it easier to add fastparquet support at a later time, if such functionality becomes available.

We may completely remove feather support from this PR and wait for better support from pyarrow for writing metadata with CRS and geometry column names, per comment in #651

(for now please ignore the feather implementation, it has not yet been standardized with the parquet approach)

I'd like to run some benchmarks comparing parquet to feather before we decide to keep / remove feather support.

resolves #651

opened by brendan-ward 51
ENH: sjoin_nearest
This is a really rough pass for now, not much in the way of tests or docs, just trying to get some general feedback on the shape of the implementation as well as timelines to incorporate it (eg. wait for a PyGEOS release).

This addresses the following:

Closes #1096

Closes #1271 (replaces)

Some questions I have:

Do we want to support the max_distance parameter via nearest_all? I think @brendan-ward can best weigh in on this.

Do we want to cook up a "reasonable" implementation for rtree, like we did with query_bulk? I'd say no, in that case we mainly did it because that logic already existed in sjoin so really we were reshuffling things to get a cleaner implementation, not inventing anything new. This relates to #1509

Is there any particular way we want to choose which is the index and which is the input geometries? I went with the simplest I could think of, but there's probably better choices for performance. Again, I think @brendan-ward is probably the best person for this.

How much testing do we want for this feature? My sense is that not much should be needed since _basic_checks and _frame_join are already extensively tested, and the actual nearest logic is tested in PyGEOS.
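
As a point of reference for the questions above, the semantics of a nearest join with a distance cut-off can be sketched by brute force on plain points. This is not the implementation in this PR (which relies on PyGEOS); the function name nearest_join and the tuple-based output are invented for the example, and max_distance is used here only in the sense discussed above.

```python
import math


def nearest_join(left, right, max_distance=None):
    """For each left point, return (left_idx, right_idx, distance) of the closest right point.

    Matches farther away than max_distance (when given) are dropped.
    """
    rows = []
    for i, (lx, ly) in enumerate(left):
        best_j, best_d = None, math.inf
        for j, (rx, ry) in enumerate(right):
            d = math.hypot(rx - lx, ry - ly)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and (max_distance is None or best_d <= max_distance):
            rows.append((i, best_j, best_d))
    return rows


left = [(0.0, 0.0), (5.0, 5.0), (100.0, 100.0)]
right = [(1.0, 0.0), (5.0, 6.0)]

all_matches = nearest_join(left, right)
close_matches = nearest_join(left, right, max_distance=2.0)
```

The scan costs one distance computation per pair of points, which is the cost a spatial index is meant to avoid; the sketch is useful mainly as a correctness oracle in tests.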
opened by adriangb 48
DRAFT: add 'nearest' option to sjoin
This PR is based on the discussion in issue #1096. The conversation seemed to die without a real consensus, so I tried to take a general idea and package it into something concrete.

The main things I think should be discussed are:

API. I'm not dead set on this one. It's simple (only the report_dist parameter is added, compatibility is preserved) but @martinfleis had proposed making nearest a pseudo-option when using op="intersects".

Using the rtree spatial index to pre-determine intersecting geometries. From my testing, this is not worth it, but I only tested with small datasets.

Use of how="right" in conjunction with op="nearest". Since nearest isn't as clean mathematically as intersection or within/contains, how="right" doesn't make as much sense as it does with those. I'm not a fan of having to add that warning/check during the final join operation.

Integration into sjoin. The way I structured things, it's basically saying:
if op == "nearest":
    # do new stuff
else:
    # do all of the old stuff

While nearest is a departure from the existing op (which map directly to binary predicates), it doesn't feel nicely integrated to have to split the logic up like this.

Thank you all for your patience on this PR, it's my first one for this project.
opened by adriangb 48

More on overlay performance

This is just a follow up to #338, but wanted to make sure someone sees my posts. I was trying to use overlay and noticed it is impossibly slow. So I ended up coding some functions to take care of this. Using the example in #338 I tested and the new functions are much faster, so I am wondering if there is interest and I could create a pull that improves performance. Here's the function (for now it only implements intersection and difference, but I could generalize it):

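from functools import reduce

import geopandas as gp


def spatial_overlays(df1, df2, how='intersection'):
    '''Compute the overlay of two GeoDataFrames df1 and df2.

    Only how='intersection' and how='difference' are implemented.
    '''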
    df1 = df1.copy()
    df2 = df2.copy()
    df1['geometry'] = df1.geometry.buffer(0)
    df2['geometry'] = df2.geometry.buffer(0)
    if how=='intersection':
        # Spatial Index to create intersections
        spatial_index = df2.sindex
        df1['bbox'] = df1.geometry.apply(lambda x: x.bounds)
        df1['histreg']=df1.bbox.apply(lambda x:list(spatial_index.intersection(x)))
        pairs = df1['histreg'].to_dict()
        nei = []
        for i,j in pairs.items():
            for k in j:
                nei.append([i,k])
        
        pairs = gp.GeoDataFrame(nei, columns=['idx1','idx2'], crs=df1.crs)
        pairs = pairs.merge(df1, left_on='idx1', right_index=True)
        pairs = pairs.merge(df2, left_on='idx2', right_index=True, suffixes=['_1','_2'])
        pairs['Intersection'] = pairs.apply(lambda x: (x['geometry_1'].intersection(x['geometry_2'])).buffer(0), axis=1)
        pairs = gp.GeoDataFrame(pairs, columns=pairs.columns, crs=df1.crs)
        cols = pairs.columns.tolist()
        cols.remove('geometry_1')
        cols.remove('geometry_2')
        cols.remove('histreg')
        cols.remove('bbox')
        cols.remove('Intersection')
        dfinter = pairs[cols+['Intersection']].copy()
        dfinter.rename(columns={'Intersection':'geometry'}, inplace=True)
        dfinter = gp.GeoDataFrame(dfinter, columns=dfinter.columns, crs=pairs.crs)
        dfinter = dfinter.loc[dfinter.geometry.is_empty==False]
        return dfinter
    elif how=='difference':
        spatial_index = df2.sindex
        df1['bbox'] = df1.geometry.apply(lambda x: x.bounds)
        df1['histreg']=df1.bbox.apply(lambda x:list(spatial_index.intersection(x)))
        df1['new_g'] = df1.apply(lambda x: reduce(lambda x, y: x.difference(y).buffer(0), [x.geometry]+list(df2.iloc[x.histreg].geometry)) , axis=1)
        df1.geometry = df1.new_g
        df1 = df1.loc[df1.geometry.is_empty==False].copy()
        df1.drop(['bbox', 'histreg', 'new_g'], axis=1, inplace=True)
        return df1

and the example

import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
capitals = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
countries = world[['geometry', 'name']]
countries = countries.to_crs('+init=epsg:3395')[countries.name!="Antarctica"]
capitals = capitals.to_crs('+init=epsg:3395')
capitals['geometry']= capitals.buffer(500000)

%time gpd.overlay(countries, capitals, how='intersection')
CPU times: user 36.5 s, sys: 357 ms, total: 36.8 s
Wall time: 38.8 s

%time spatial_overlays(countries, capitals, how='intersection')
CPU times: user 1.53 s, sys: 11.3 ms, total: 1.54 s
Wall time: 1.59 s

As you can see, there is a major improvement in performance due to the use of the spatial index.
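
The gain comes from discarding pairs whose bounding boxes cannot intersect before running the expensive geometric intersection. The sketch below shows only that filtering step on plain (minx, miny, maxx, maxy) tuples. The helper names are made up, and the nested loop still visits every pair, whereas an R-tree such as the one behind sindex finds the candidates without a full scan.

```python
def bounds_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(boxes1, boxes2):
    """Index pairs (i, j) whose bounding boxes intersect."""
    return [
        (i, j)
        for i, a in enumerate(boxes1)
        for j, b in enumerate(boxes2)
        if bounds_intersect(a, b)
    ]


boxes1 = [(0, 0, 1, 1), (10, 10, 11, 11)]
boxes2 = [(0.5, 0.5, 2, 2), (20, 20, 21, 21), (10.5, 9, 12, 10.5)]

pairs = candidate_pairs(boxes1, boxes2)
total_pairs = len(boxes1) * len(boxes2)
# Only the pairs in `pairs` need an exact geometry intersection afterwards.
```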

ops:overlay

opened by ozak 46

Implement plotting using matplotlib Collections
~~This doesn't use Collection.set_array() because I can't find a way to share the same colorscale across different collection types (i.e. geometry types). To keep different collections in sync with the same colorbar, we must resolve every geometry's colors globally.~~ (Fixed below in 17313dc) But we still gain the performance benefit from using collections.

~~This also adds a legend for noncategorical dataframes. However I deprecated facecolor kwarg because it conflicts with color; discussion for this is at #204.~~

Given that TestImageComparisons caused us so much trouble in Travis, and the new plotting mechanism requires re-generating these plots, I dropped this method of testing. If there are objections, let's discuss the options.

[x] ~~Fix plotting Polygon holes. (I've no idea how to do this in a PolygonPatch...)~~ (broken on master too, so deferring to a different PR)

[x] Replace a test for plot(figsize=(...)), which I removed when removing TestImageComparisons.

[x] Smoketest PySAL support

[x] Add tests for heterogeneous geometry types in a single GeoSeries.

[x] Compatibility with mpl < 1.5

Fixes #172 (except for #266 which existed even before this PR) Fixes #259 Fixes #204
opened by IamJeffG 43

Conflict when reading a file that has a "geometry" property

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of geopandas.
[ ] (optional) I have confirmed this bug exists on the main branch of geopandas.

Code Sample

Sorry but I don't know how to reproduce similar data.

I am working with data from AIRBUS GeoStore, which sadly has a "geometry" property (multiple in fact, but that doesn't matter here)

Data:

!wget "https://datadoors.intelligence-airbusds.com/export/v1/static/export/export-20230106-11165137.zip"
!unzip -o "export-20230106-11165137.zip" -d "geostore_products"

Code:

import geopandas as gpd

fpath = "geostore_products"
gdf = gpd.read_file(fpath, layer="Export", engine="fiona")  # ValueError

Traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 3
      1 import geopandas as gpd
----> 3 gdf = gpd.read_file(fpath, layer="Export", engine="fiona")

File ~/miniconda3/envs/geo/lib/python3.11/site-packages/geopandas/io/file.py:259, in _read_file(filename, bbox, mask, rows, engine, **kwargs)
    256     path_or_bytes = filename
    258 if engine == "fiona":
--> 259     return _read_file_fiona(
    260         path_or_bytes, from_bytes, bbox=bbox, mask=mask, rows=rows, **kwargs
    261     )
    262 elif engine == "pyogrio":
    263     return _read_file_pyogrio(
    264         path_or_bytes, bbox=bbox, mask=mask, rows=rows, **kwargs
    265     )

File ~/miniconda3/envs/geo/lib/python3.11/site-packages/geopandas/io/file.py:360, in _read_file_fiona(path_or_bytes, from_bytes, bbox, mask, rows, where, **kwargs)
    356     df = pd.DataFrame(
    357         [record["properties"] for record in f_filt], columns=columns
    358     )
    359 else:
--> 360     df = GeoDataFrame.from_features(
    361         f_filt, crs=crs, columns=columns + ["geometry"]
    362     )
    363 for k in datetime_fields:
    364     as_dt = pd.to_datetime(df[k], errors="ignore")

File ~/miniconda3/envs/geo/lib/python3.11/site-packages/geopandas/geodataframe.py:643, in GeoDataFrame.from_features(cls, features, crs, columns)
    641     row.update(properties)
    642     rows.append(row)
--> 643 return cls(rows, columns=columns, crs=crs)

File ~/miniconda3/envs/geo/lib/python3.11/site-packages/geopandas/geodataframe.py:159, in GeoDataFrame.__init__(self, data, geometry, crs, *args, **kwargs)
    150 if (
    151     geometry is None
    152     and self.columns.nlevels == 1
   (...)
    156     # self["geometry"] is a gdf and constructor gets recursively recalled
    157     # by pandas internals trying to access this
    158     if (self.columns == "geometry").sum() > 1:
--> 159         raise ValueError(
    160             "GeoDataFrame does not support multiple columns "
    161             "using the geometry column name 'geometry'."
    162         )
    164     # only if we have actual geometry values -> call set_geometry
    165     try:

ValueError: GeoDataFrame does not support multiple columns using the geometry column name 'geometry'.

Problem description

Reading the data as above raises a ValueError, since it has a "geometry" column from one of the layer properties, plus an additional "geometry" column derived from the following line during construction (geopandas/geodataframe.py::GeoDataFrame.from_features#L640):

row = {
    "geometry": shape(feature["geometry"]) if feature["geometry"] else None
}

We can confirm because if we use ignore_geometry=True there still will be a "geometry" column:

import geopandas as gpd

gdf = gpd.read_file(fpath, layer="Export", engine="fiona", ignore_geometry=True)
gdf.geometry  # returns gdf["geometry"] that does exist

A workaround could be one of:

use ignore_geometry=True, then gdf.set_geometry("geometry").
use ignore_fields=["geometry"], then the "geometry" property will be ignored and a geometry will be derived from the layer.

Both solutions should give identical results if the data is consistent between the layer geometry and the geometry property.

Expected Output

I would have expected a warning, or a note in the documentation.

The error also comes from the row being updated with properties after setting the derived geometry from the layer, and not the other way around, in geopandas/geodataframe.py::GeoDataFrame.from_features#L635-L649:

@classmethod
def from_features(cls, features, crs=None, columns=None):
    ...
    rows = []
    for feature in features_lst:
        # load geometry
        if hasattr(feature, "__geo_interface__"):
            feature = feature.__geo_interface__
        row = {
            "geometry": shape(feature["geometry"]) if feature["geometry"] else None
        }
        # load properties
        properties = feature["properties"]
        if properties is None:
            properties = {}
        row.update(properties)
        rows.append(row)
    return cls(rows, columns=columns, crs=crs)

If the "geometry": shape(feature["geometry"]) entry were added to the properties dict instead, it would be fine. It might also require a check so as not to add an additional "geometry" column in geopandas/io/file.py::_read_file_fiona#L360:

df = GeoDataFrame.from_features(
    f_filt, crs=crs, columns=columns + ["geometry"]
)
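
The two effects described in this report can be reproduced with plain dicts and lists, without GeoPandas. In this sketch the feature is invented and the layer geometry is kept as a raw dict instead of being passed through shape(); the guarded column list at the end is one possible form of the check suggested above, not an agreed fix.

```python
# Hypothetical feature whose properties include a field named "geometry".
feature = {
    "geometry": {"type": "Point", "coordinates": (1.0, 2.0)},
    "properties": {"id": 7, "geometry": "POINT (1 2)"},
}
schema_columns = list(feature["properties"])  # ["id", "geometry"]

# Current order: layer geometry first, then properties.
row = {"geometry": feature["geometry"]}
row.update(feature["properties"])  # the property overwrites the layer geometry

# Order suggested in the report: properties first, then layer geometry.
row_fixed = dict(feature["properties"])
row_fixed["geometry"] = feature["geometry"]

# Appending "geometry" unconditionally yields a duplicated column label.
columns = schema_columns + ["geometry"]

# A guarded version appends it only when it is missing.
columns_fixed = schema_columns + (
    [] if "geometry" in schema_columns else ["geometry"]
)
```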

Output of `geopandas.show_versions()`

SYSTEM INFO

python     : 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]
executable : /home/paul/miniconda3/envs/geo/bin/python
machine    : Linux-5.14.0-1055-oem-x86_64-with-glibc2.31

GEOS, GDAL, PROJ INFO

GEOS       : 3.11.1
GEOS lib   : None
GDAL       : 3.6.1
GDAL data dir: /home/paul/miniconda3/envs/geo/share/gdal
PROJ       : 9.1.0
PROJ data dir: /home/paul/miniconda3/envs/geo/share/proj

PYTHON DEPENDENCIES

geopandas  : 0.12.2
numpy      : 1.24.1
pandas     : 1.5.2
pyproj     : 3.4.1
shapely    : 2.0.0
fiona      : 1.8.22
geoalchemy2: None
geopy      : None
matplotlib : 3.6.2
mapclassify: 2.4.3
pygeos     : None
pyogrio    : None
psycopg2   : None
pyarrow    : None
rtree      : 1.0.1

opened by Paul-Aime 9

Deprecate geopandas.datasets module
After the recent discussions about the contents of the geopandas.datasets module, especially the naturalearth_lowres dataset (see #2382 or #1510 but also #1041), we have decided to deprecate the module in the next release and remove it prior to GeoPandas 1.0 (expected early 2024).

We know that removing the built-in datasets will cause a bit of friction, especially in creating tutorials and other study materials (like our own documentation), so we want to provide another easy solution to get some real-world data for illustration into GeoDataFrame. The current plan is to create a package similar to xyzservices or pysal.examples that would serve URLs to remote datasets served by existing open data providers. The idea is that instead of typing

geopandas.read_file(geopandas.datasets.get_path('nybb'))

where get_path returns a local path to a file installed with GeoPandas, we would do

geopandas.read_file(new_package.get_path('nybb'))

where new_package.get_path returns a URL that GeoPandas can read directly.

The contents of the new package will be curated and will not include any data reflecting international politics, among other potentially hurtful topics.

GeoPandas was always a software project, not a data project, and the included datasets were meant to easily showcase the functionality for illustrational purposes. We believe that the proposed solution will swiftly replace the built-in solution, minimise the friction caused by the change and remove unnecessary stress for the project maintainers.
opened by martinfleis 2
COMPAT: support fiona 1.9 (without warnings) + 2.0 for Feature model changes
Fiona 1.9 is adding deprecation warnings in advance of some changes coming to Fiona 2.0 (cfr https://github.com/Toblerity/Fiona/issues/758, https://github.com/Toblerity/fiona-rfc/blob/master/rfc/0001-fiona-2-0-changes.md)

In the dev build (where fiona 1.9b1 is installed), we can see those:

geopandas/io/tests/test_file.py: 2108 warnings
geopandas/io/tests/test_file_geom_types_drivers.py: 99 warnings
geopandas/tests/test_geoseries.py: 2 warnings
  /usr/share/miniconda3/envs/test/lib/python3.10/site-packages/fiona/collection.py:551: FionaDeprecationWarning: Support for feature and geometry dicts is deprecated. Instances of Feature and Geometry will be required in 2.0.
    self.session.writerecs(records, self)

This PR already starts using the new API if fiona >= 1.9
opened by jorisvandenbossche 0
CI: fix versioneer update action

Fixes #2690.

Manually remove __init__.py changes from the commit to avoid the duplication of those two lines like we have now in #2694.

I tested it on my fork, but it will probably need to be merged to test it properly.

opened by martinfleis 0
Update Versioneer

Automatic update of Versioneer by the versioneer.yml workflow.

Please review changes manually, especially if a duplicate from . import _version is added.
invalid

opened by github-actions[bot] 1
DOC: geoparquet docs page out of sync

https://geopandas.org/en/latest/docs/user_guide/io.html#apache-parquet-and-feather-file-formats page refers to the 0.1.0 geo-arrow-spec and not the newer geoparquet repo. I believe we also removed the corresponding warning text from the parquet io methods.
bug documentation

opened by m-richards 1

Releases(v0.12.2)

v0.12.2(Dec 10, 2022)
Bug fixes:

Correctly handle geometries with Z dimension in to_crs() when using PyGEOS or Shapely >= 2.0 (previously the z coordinates were lost) (#1345).

Assign Crimea to Ukraine in the naturalearth_lowres built-in dataset (#2670)

Source code(tar.gz)
Source code(zip)
geopandas-0.12.2.tar.gz(1.00 MB)
v0.12.1(Oct 29, 2022)

Small bug-fix release removing the shapely<2 pin in the installation requirements.
Source code(tar.gz)
Source code(zip)
geopandas-0.12.1.tar.gz(1.00 MB)
v0.12.0(Oct 24, 2022)
The highlight of this release is the support for Shapely 2.0. This makes it possible to test Shapely 2.0 (currently 2.0b1) alongside GeoPandas.

Note that if you also have PyGEOS installed, you need to set an environment variable (USE_PYGEOS=0) before importing geopandas to actually test Shapely 2.0 features instead of PyGEOS. See https://geopandas.org/en/latest/getting_started/install.html#using-the-optional-pygeos-dependency for more details.
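
A minimal sketch of the ordering described above: the variable has to be set in the process environment before geopandas is imported for the first time. The try/except is only there so the snippet also runs where geopandas is not installed.

```python
import os

# Must happen before the first "import geopandas" in this process.
os.environ["USE_PYGEOS"] = "0"

try:
    import geopandas  # noqa: F401  (imported only after the variable is set)
except ImportError:
    geopandas = None  # geopandas is not installed in this environment
```

Setting the variable in the shell before starting Python has the same effect.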

New features and improvements:

Added normalize() method from shapely to GeoSeries/GeoDataframe (#2537).

Added make_valid() method from shapely to GeoSeries/GeoDataframe (#2539).

Added where filter to read_file (#2552).

Updated the distributed natural earth datasets (naturalearth_lowres and naturalearth_cities) to version 5.1 (#2555).

Deprecations and compatibility notes:

Accessing the crs of a GeoDataFrame without active geometry column was deprecated and this now raises an AttributeError (#2578).

Resolved colormap-related warning in .explore() for recent Matplotlib versions (#2596).

Bug fixes:

Fix cryptic error message in geopandas.clip() when clipping with an empty geometry (#2589).

Accessing gdf.geometry where the active geometry column is missing, and a column named "geometry" is present will now raise an AttributeError, rather than returning gdf["geometry"] (#2575).

Combining GeoSeries/GeoDataFrames with pandas.concat will no longer silently override CRS information if not all inputs have the same CRS (#2056).

Acknowledgments

Thanks to everyone who contributed to this release! A total of 17 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Alan D. Snow

Alberto González Rosales +

Brendan Ward

Chris Arderne +

Clemens Korner +

Ewout ter Hoeven

Fred Bunt +

Giacomo Caria

James Gaboardi

Joris Van den Bossche

Martin Fleischmann

Matt Richards

Ray Bell

Shogo Hida +

Simone Parmeggiani +

keirayuki310 +

rraymondgh

v0.11.1(Jul 24, 2022)
Small bug-fix release:

Fix regression (RecursionError) in reshape methods such as unstack() and pivot() involving MultiIndex, or GeoDataFrame construction with MultiIndex (#2486).

Fix regression in GeoDataFrame.explode() with non-default geometry column name.

Fix regression in apply() causing row-wise all-NaN float columns to be cast to GeometryDtype (#2482).

Fix a crash in datetime column reading where the file contains mixed timezone offsets (#2479). These will be read as UTC localized values.

Fix a crash in datetime column reading where the file contains datetimes outside the range supported by [ns] precision (#2505).

Fix regression in passing the Parquet or Feather format version in to_parquet and to_feather. The keyword that selects the GeoParquet metadata version in these methods is now schema_version instead of version. The version keyword is passed directly to the underlying Feather or Parquet writer, and is only used to set schema_version if it is one of 0.1.0 or 0.4.0 (#2496).

v0.11.0(Jun 21, 2022)
Highlights of this release:

The geopandas.read_file() and GeoDataFrame.to_file() methods to read and write GIS file formats can now optionally use the pyogrio package under the hood through the engine="pyogrio" keyword. The pyogrio package implements vectorized IO for GDAL/OGR vector data sources, and is faster compared to the fiona-based engine (#2225).

GeoParquet support updated to implement v0.4.0 of the OpenGeospatial/GeoParquet specification (#2441). Backwards compatibility with v0.1.0 of the metadata spec (implemented in the previous releases of GeoPandas) is guaranteed, and reading and writing Parquet and Feather files will no longer produce a UserWarning (#2327).

New features and improvements:

Improved handling of GeoDataFrame when the active geometry column is lost from the GeoDataFrame. Previously, square bracket indexing gdf[[...]] returned a GeoDataFrame when the active geometry column was retained and a DataFrame was returned otherwise. Other pandas indexing methods (loc, iloc, etc) did not follow the same rules. The new behaviour for all indexing/reshaping operations is now as follows (#2329, #2060):

If operations produce a DataFrame containing the active geometry column, a GeoDataFrame is returned

If operations produce a DataFrame containing GeometryDtype columns, but not the active geometry column, a GeoDataFrame is returned, where the active geometry column is set to None (set the new geometry column with set_geometry())

If operations produce a DataFrame containing no GeometryDtype columns, a DataFrame is returned (this can be upcast again by calling set_geometry() or the GeoDataFrame constructor)

If operations produce a Series of GeometryDtype, a GeoSeries is returned, otherwise Series is returned.
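
The four rules above can be checked with a short sketch. The frame below, with a second geometry column named "other", is purely illustrative, and the behaviour shown assumes GeoPandas 0.11 or later.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]}, geometry=[Point(0, 0), Point(1, 1)]
)
gdf["other"] = gpd.GeoSeries([Point(5, 5), Point(6, 6)])

with_active = gdf[["name", "geometry"]]  # active geometry kept
without_active = gdf[["name", "other"]]  # geometry dtype, but not the active column
no_geometry = gdf[["name"]]              # no geometry columns at all
geom_series = gdf["other"]               # single geometry column
plain_series = gdf["name"]               # single non-geometry column

for obj in (with_active, without_active, no_geometry, geom_series, plain_series):
    print(type(obj).__name__)
```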

Error messages for having an invalid geometry column have been improved, indicating the name of the last valid active geometry column set and whether other geometry columns can be promoted to the active geometry column (#2329).

Datetime fields are now read and written correctly for GIS formats which support them (e.g. GPKG, GeoJSON) with fiona 1.8.14 or higher. Previously, datetimes were read as strings (#2202).

folium.Map keyword arguments can now be specified as the map_kwds argument to GeoDataFrame.explore() method (#2315).

Add a new parameter style_function to GeoDataFrame.explore() to enable plot styling based on GeoJSON properties (#2377).

It is now possible to write an empty GeoDataFrame to a file for supported formats (#2240). Attempting to do so will now emit a UserWarning instead of a ValueError.

Fast rectangle clipping has been exposed as GeoSeries/GeoDataFrame.clip_by_rect() (#1928).

The mask parameter of GeoSeries/GeoDataFrame.clip() now accepts a rectangular mask as a list-like to perform fast rectangle clipping using the new GeoSeries/GeoDataFrame.clip_by_rect() (#2414).

Bundled demo dataset naturalearth_lowres has been updated to version 5.0.1 of the source, with field ISO_A3 manually corrected for some cases (#2418).
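
A small sketch of the rectangle clipping mentioned above, using invented geometries. It contrasts clip_by_rect(), which keeps one (possibly empty) result per input row, with clip() given a list-like (xmin, ymin, xmax, ymax) mask, which returns only the rows that intersect the rectangle.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

s = gpd.GeoSeries([LineString([(0, 0), (10, 0)]), Point(5, 5)])

# Rectangle given as xmin, ymin, xmax, ymax.
rect = s.clip_by_rect(2, -1, 4, 1)

# Same rectangle passed as a list-like mask to clip().
fast = s.clip((2, -1, 4, 1))

print(rect.is_empty.tolist())
print(len(fast))
```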

Deprecations and compatibility notes:

The active development branch of geopandas on GitHub has been renamed from master to main (#2277).

Deprecated methods GeometryArray.equals_exact() and GeometryArray.almost_equals() have been removed. They should be replaced with GeometryArray.geom_equals_exact() and GeometryArray.geom_almost_equals() respectively (#2267).

Deprecated CRS functions explicit_crs_from_epsg(), epsg_from_crs() and get_epsg_file_contents() were removed (#2340).

Warning about the behaviour change to GeoSeries.isna() with empty geometries present has been removed (#2349).

Specifying a CRS in the GeoDataFrame/GeoSeries constructor which contradicted the underlying GeometryArray now raises a ValueError (#2100).

Specifying a CRS in the GeoDataFrame constructor when no geometry column is provided and calling GeoDataFrame.set_crs on a GeoDataFrame without an active geometry column now raise a ValueError (#2100).

Passing non-geometry data to the GeoSeries constructor is now fully deprecated and will raise a TypeError (#2314). Previously, a pandas.Series was returned for non-geometry data.

Deprecated GeoSeries/GeoDataFrame set operations __xor__(), __or__(), __and__() and __sub__(), geopandas.io.file.read_file/to_file and geopandas.io.sql.read_postgis now emit FutureWarning instead of DeprecationWarning and will be completely removed in a future release.

Accessing the crs of a GeoDataFrame without active geometry column is deprecated and will be removed in GeoPandas 0.12 (#2373).
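
As an illustration of the GeoSeries constructor change listed above, the following sketch (assuming GeoPandas 0.11 or later) shows the TypeError for non-geometry input and the explicit alternative of building a plain pandas.Series.

```python
import geopandas as gpd
import pandas as pd

values = [1, 2, 3]

# Non-geometry data is rejected rather than silently returned as a pandas Series.
try:
    gpd.GeoSeries(values)
    constructor_error = None
except TypeError as exc:
    constructor_error = exc

# For non-geometry data, construct a pandas Series explicitly.
plain = pd.Series(values)
print(type(constructor_error).__name__, plain.dtype)
```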

Bug fixes:

GeoSeries.to_frame now creates a GeoDataFrame with the geometry column name set correctly (#2296)

Fix pickle files created with pygeos installed not being readable when pygeos is not installed (#2237).

Fixed UnboundLocalError in GeoDataFrame.plot() using legend=True and missing_kwds (#2281).

Fix explode() incorrectly relating index to columns, including where the input index is not unique (#2292)

Fix GeoSeries.[xyz] raising an IndexError when the underlying GeoSeries contains empty points (#2335). Rows corresponding to empty points now contain np.nan.

Fix GeoDataFrame.iloc raising a TypeError when indexing a GeoDataFrame with only a single column of GeometryDtype (#1970).

Fix GeoDataFrame.iterfeatures() not returning features with the same field order as GeoDataFrame.columns (#2396).

Fix GeoDataFrame.from_features() to support reading GeoJSON with null properties (#2243).

Fix GeoDataFrame.to_parquet() not intercepting engine keyword argument, breaking consistency with pandas (#2227)

Fix GeoDataFrame.explore() producing an error when column is of boolean dtype (#2403).

Fix an issue where GeoDataFrame.to_postgis() output the wrong SRID for ESRI authority CRS (#2414).

Fix GeoDataFrame.from_dict/from_features classmethods using GeoDataFrame rather than cls as the constructor.

Fix GeoDataFrame.plot() producing incorrect colors with mixed geometry types when colors keyword is provided. (#2420)
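
The GeoSeries.[xyz] fix above can be seen in a tiny sketch with one regular and one empty point (values invented for the example): the coordinate accessors return NaN for the empty point instead of raising.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

s = gpd.GeoSeries([Point(1, 2), Point()])  # second entry is an empty point

xs = s.x
ys = s.y
print(xs.tolist(), ys.tolist())
```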

Notes on (optional) dependencies:

GeoPandas 0.11 drops support for Python 3.7 and pandas 0.25 (the minimum supported pandas version is now 1.0.5). Further, the minimum required versions for the listed dependencies have now changed to shapely 1.7, fiona 1.8.13.post1, pyproj 2.6.1.post1, matplotlib 3.2, mapclassify 2.4.0 (#2358, #2391)

Acknowledgments

Thanks to everyone who contributed to this release! A total of 31 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Akylzhan Sauranbay +

Alan D. Snow

Alyssa Ross +

Andreas Meier +

Andrii Oriekhov +

Brendan Ward

Ewout ter Hoeven +

Guillaume Lostis +

James McBride

Joris Van den Bossche

Karol Zlot +

Koshy Thomas +

Martin Fleischmann

Martina Oefelein +

Matt Richards

Mike Taves

Mjumbe Poe +

Nathan Lis +

Nicolò Lucchesi +

RadMagnus +

Ray Bell

Ryan +

Will Schlitzer

bstadlbauer +

clausmichele +

froast +

joooeey +

readthedocs-assistant +

rraymondgh +

ryanward-io +

simberaj

v0.10.2(Oct 16, 2021)
Small bug-fix release:

Fix regression in overlay() in case no geometries are intersecting (but have overlapping total bounds) (#2172).

Fix regression in overlay() with keep_geom_type=True in case the overlay of two geometries results in a GeometryCollection with other geometry types (#2177).

Fix overlay() to honor the keep_geom_type keyword for the op="difference" case (#2164).

Fix regression in plot() with a mapclassify scheme in case the formatted legend labels have duplicates (#2166).

Fix a bug in the explore() method ignoring the vmin and vmax keywords in case they are set to 0 (#2175).

Fix unary_union to correctly handle a GeoSeries with missing values (#2181).

Avoid internal deprecation warning in clip() (#2179).

v0.10.1(Oct 8, 2021)
Small bug-fix release:

Fix regression in overlay() with non-overlapping geometries and a non-default how (i.e. not "intersection") (#2157).

v0.10.0(Oct 3, 2021)
Highlights of this release:

A new sjoin_nearest() method to join based on proximity, with the ability to set a maximum search radius (#1865). In addition, the sindex attribute gained a new method for a "nearest" spatial index query (#1865, #2053).

A new explore() method on GeoDataFrame and GeoSeries with native support for interactive visualization based on folium / leaflet.js (#1953)

The geopandas.sjoin()/overlay()/clip() functions are now also available as methods on the GeoDataFrame (#2141, #1984, #2150).
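
A minimal sketch of a proximity join with a maximum search radius, using the method form on a GeoDataFrame. The shop and stop data are invented, no CRS is set (so distances are in plain coordinate units), and a spatial index backend is assumed to be available.

```python
import geopandas as gpd
from shapely.geometry import Point

shops = gpd.GeoDataFrame(
    {"shop": ["A", "B", "C"]},
    geometry=[Point(0, 0), Point(10, 0), Point(200, 0)],
)
stops = gpd.GeoDataFrame(
    {"stop": ["s1", "s2"]},
    geometry=[Point(1, 0), Point(50, 0)],
)

# Join each shop to its nearest stop, but only within 20 units.
# Shop "C" has no stop within that radius and drops out of the inner join.
nearest = shops.sjoin_nearest(
    stops, how="inner", max_distance=20, distance_col="dist"
).sort_index()

print(nearest[["shop", "stop", "dist"]])
```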

New features and improvements:

Add support for pandas' value_counts() method for geometry dtype (#2047).

The explode() method has a new ignore_index keyword (consistent with pandas' explode method) to reset the index in the result, and a new index_parts keyword to control whether a cumulative count indexing the parts of the exploded multi-geometries should be added (#1871).

points_from_xy() is now available as a GeoSeries method from_xy (#1936).

The to_file() method will now attempt to detect the driver (if not specified) based on the extension of the provided filename, instead of defaulting to ESRI Shapefile (#1609).

Support for the storage_options keyword in read_parquet() for specifying filesystem-specific options (e.g. for S3) based on fsspec (#2107).

The read/write functions now support ~ (user home directory) expansion (#1876).

Support the convert_dtypes() method from pandas to preserve the GeoDataFrame class (#2115).

Support WKB values in the hex format in GeoSeries.from_wkb() (#2106).

Update the estimate_utm_crs() method to handle crossing the antimeridian with pyproj 3.1+ (#2049).

Improved heuristic to decide how many decimals to show in the repr based on whether the CRS is projected or geographic (#1895).

Switched the default for geocode() from GeoCode.Farm to the Photon geocoding API (https://photon.komoot.io) (#2007).
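
To show how the explode() keywords and GeoSeries.from_xy from the list above behave, here is a short sketch on made-up geometries.

```python
import geopandas as gpd
from shapely.geometry import MultiPoint, Point

s = gpd.GeoSeries([MultiPoint([(0, 0), (1, 1)]), Point(2, 2)])

# index_parts=True adds an inner index level counting the parts of each row.
parts = s.explode(index_parts=True)

# ignore_index=True discards the original labels and numbers the parts 0..n-1.
flat = s.explode(ignore_index=True, index_parts=False)

# Build points directly from coordinate sequences.
pts = gpd.GeoSeries.from_xy([0, 1], [2, 3])

print(parts.index.tolist())
print(flat.index.tolist())
```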

Deprecations and compatibility notes:

The op= keyword of sjoin() to indicate which spatial predicate to use for joining is being deprecated and renamed in favor of a new predicate= keyword (#1626).

The cascaded_union attribute is deprecated, use unary_union instead (#2074).

Constructing a GeoDataFrame with a duplicated "geometry" column is now disallowed. This can also raise an error in the pd.concat(.., axis=1) function if this results in duplicated active geometry columns (#2046).

The explode() method currently returns a GeoSeries/GeoDataFrame with a MultiIndex, with an additional level with indices of the parts of the exploded multi-geometries. For consistency with pandas, this will change in the future and the new index_parts keyword is added to control this.
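
For the op= to predicate= rename noted above, a minimal sketch of the newer spelling might look like this. The point and zone data are hypothetical, and a spatial index backend is assumed to be installed.

```python
import geopandas as gpd
from shapely.geometry import Point, box

points = gpd.GeoDataFrame(
    {"pt": ["inside", "outside"]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
)
zones = gpd.GeoDataFrame({"zone": ["z1"]}, geometry=[box(0, 0, 1, 1)])

# predicate= names the spatial relationship used for matching rows.
joined = gpd.sjoin(points, zones, how="inner", predicate="within")
print(joined[["pt", "zone"]])
```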

Bug fixes:

Fix in the clip() function to correctly clip MultiPoints instead of leaving them intact when partly outside of the clip bounds (#2148).

Fix GeoSeries.isna() to correctly return a boolean Series in case of an empty GeoSeries (#2073).

Fix the GeoDataFrame constructor to preserve the geometry name when the argument is already a GeoDataFrame object (i.e. GeoDataFrame(gdf)) (#2138).

Fix loss of the values' CRS when setting those values as a column (GeoDataFrame.__setitem__) (#1963)

Fix in GeoDataFrame.apply() to preserve the active geometry column name (#1955).

Fix in sjoin() to not ignore the suffixes in case of a right-join (how="right") (#2065).

Fix GeoDataFrame.explode() with a MultiIndex (#1945).

Fix the handling of missing values in to/from_wkb and to/from_wkt (#1891).

Fix to_file() and to_json() when DataFrame has duplicate columns to raise an error (#1900).

Fix bug in the colors shown with user-defined classification scheme (#2019).

Fix handling of the path_effects keyword in plot() (#2127).

Fix GeoDataFrame.explode() to preserve attrs (#1935)

Notes on (optional) dependencies:

GeoPandas 0.10.0 dropped support for Python 3.6 and pandas 0.24. Further, the minimum required versions are numpy 1.18, shapely 1.6, fiona 1.8, matplotlib 3.1 and pyproj 2.2.

Plotting with a classification schema now requires mapclassify version >= 2.4 (#1737).

Compatibility fixes for the latest numpy in combination with Shapely 1.7 (#2072)

Compatibility fixes for the upcoming Shapely 1.8 (#2087).

Compatibility fixes for the latest PyGEOS (#1872, #2014) and matplotlib

Acknowledgments

Thanks to everyone who contributed to this release! A total of 29 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Adrian Garcia Badaracco

Alan D. Snow

Alison Hopkin +

Andreas Eliasson +

Ariel Núñez +

Brendan Ward

Daniel Mesejo-León +

Flavin

Imanol

James A. Bednar +

James McBride

James Myatt +

John Flavin +

Joris Van den Bossche

Martin Fleischmann

Matt Richards +

Matthew Law +

Mike Taves

Murat Can Üste +

Qiusheng Wu +

Ray Bell +

TLouf +

Tom Augspurger +

Tom Russell +

Zero +

danielpallen +

m-richards +

simberaj +

standakozak +

v0.9.0(Feb 28, 2021)
GeoPandas 0.9.0 features a long list of new features, enhancements and bug fixes, see the full list below. In addition, there are many documentation improvements and a restyled and restructured website with a new logo (#1564, #1579, #1617, #1668, #1731, #1750, #1757, #1759).

New features and improvements:

The geopandas.read_file function now accepts more general file-like objects (e.g. fsspec open file objects). It will now also automatically recognize zipped files (#1535).

The GeoDataFrame.plot() method now provides access to the pandas plotting functionality for the non-geometry columns, either using the kind keyword or the accessor method (e.g. gdf.plot(kind="bar") or gdf.plot.bar()) (#1465).

New from_wkt(), from_wkb(), to_wkt(), to_wkb() methods for GeoSeries to construct a GeoSeries from geometries in WKT or WKB representation, or to convert a GeoSeries to a pandas Series with WKT or WKB values (#1710).

New GeoSeries.z attribute to access the z-coordinates of Point geometries (similar to the existing .x and .y attributes) (#1773).

The to_crs() method now handles missing values (#1618).

Support for pandas' new .attrs functionality (#1658).

The dissolve() method now allows dissolving by no column (by=None) to create a union of all geometries (single-row GeoDataFrame) (#1568).

New estimate_utm_crs() method on GeoSeries/GeoDataFrame to determine the UTM CRS based on the bounds (#1646).

GeoDataFrame.from_dict() now accepts geometry and crs keywords (#1619).

GeoDataFrame.to_postgis() and geopandas.read_postgis() now support both sqlalchemy engine and connection objects (#1638).

The GeoDataFrame.explode() method now allows exploding based on a non-geometry column, using the pandas implementation (#1720).

Performance improvement in GeoDataFrame/GeoSeries.explode() when using the PyGEOS backend (#1693).

The binary operation and predicate methods (e.g. intersection(), intersects()) have a new align keyword which allows optionally not aligning on the index before performing the operation with align=False (#1668).

The GeoDataFrame.dissolve() method now supports all relevant keywords of groupby(), i.e. the level, sort, observed and dropna keywords (#1845).

The geopandas.overlay() function now accepts make_valid=False to skip the step to ensure the input geometries are valid using buffer(0) (#1802).

The GeoDataFrame.to_json() method gained a drop_id keyword to optionally not write the GeoDataFrame's index as the "id" field in the resulting JSON (#1637).

A new aspect keyword in the plotting methods to optionally allow retaining the original aspect (#1512)

A new interval keyword in the legend_kwds group of the plot() method to control the appearance of the legend labels when using a classification scheme (#1605).

The spatial index of a GeoSeries (accessed with the sindex attribute) is now stored on the underlying array. This ensures that the spatial index is preserved in more operations where possible, and that multiple geometry columns of a GeoDataFrame can each have a spatial index (#1444).

Addition of a has_sindex attribute on the GeoSeries/GeoDataFrame to check if a spatial index has already been initialized (#1627).

The geopandas.testing.assert_geoseries_equal() and assert_geodataframe_equal() testing utilities now have a normalize keyword (False by default) to normalize geometries before comparing for equality (#1826). Those functions now also give a more informative error message when failing (#1808).
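
The effect of the align keyword described above is easiest to see when two GeoSeries share index labels in a different order. The sketch below uses invented boxes and points. With alignment, rows are matched by label; with align=False, they are matched by position.

```python
import geopandas as gpd
from shapely.geometry import Point, box

boxes = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 2, 3, 3)], index=[0, 1])
points = gpd.GeoSeries([Point(2.5, 2.5), Point(0.5, 0.5)], index=[1, 0])

# Matched on index labels: each box is tested against the point with the same label.
by_label = boxes.contains(points, align=True)

# Matched on position: first box against first point, second against second.
by_position = boxes.contains(points, align=False)

print(by_label.tolist(), by_position.tolist())
```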

Deprecations and compatibility notes:

The is_ring attribute currently returns True for Polygons. In the future, this will be False (#1631). In addition, is_ring is now actually checked for LineStrings and LinearRings (instead of always returning False).

The deprecated objects keyword in the intersection() method of the GeoDataFrame/GeoSeries.sindex spatial index object has been removed (#1444).

Bug fixes:

Fix regression in the plot() method raising an error with empty geometries (#1702, #1828).

Fix geopandas.overlay() to preserve geometries of the correct type which are nested within a GeometryCollection as a result of the overlay operation (#1582). In addition, a warning will now be raised if geometries of different type are dropped from the result (#1554).

Fix the repr of an empty GeoSeries to not show spurious warnings (#1673).

Fix the .crs for empty GeoDataFrames (#1560).

Fix geopandas.clip to preserve the correct geometry column name (#1566).

Fix bug in plot() method when using legend_kwds with multiple subplots (#1583)

Fix spurious warning with missing_kwds keyword of the plot() method when there are no areas with missing data (#1600).

Fix the plot() method to correctly align values passed to the column keyword as a pandas Series (#1670).

Fix bug in plotting MultiPoints when passing values to determine the color (#1694)

The rename_geometry() method now raises a more informative error message when a duplicate column name is used (#1602).

Fix explode() method to preserve the CRS (#1655)

Fix the GeoSeries.apply() method to again accept the convert_dtype keyword to be consistent with pandas (#1636).

Fix GeoDataFrame.apply() to preserve the CRS when possible (#1848).

Fix bug in containment test as geom in geoseries (#1753).

The shift() method of a GeoSeries/GeoDataFrame now preserves the CRS (#1744).

The PostGIS IO functionality now quotes table names to ensure it works with case-sensitive names (#1825).

Fix the GeoSeries constructor without passing data but only an index (#1798).

Notes on (optional) dependencies:

GeoPandas 0.9.0 dropped support for Python 3.5. Further, the minimum required versions are pandas 0.24, numpy 1.15 and shapely 1.6 and fiona 1.8.

The descartes package is no longer required for plotting polygons. This functionality is now included by default in GeoPandas itself, when matplotlib is available (#1677).

Fiona is now only imported when used in read_file/to_file. This means you can now force geopandas to install without fiona installed (although it is still a default requirement) (#1775).

Compatibility with the upcoming Shapely 1.8 (#1659, #1662, #1819).

Acknowledgments

Thanks to everyone who contributed to this release! A total of 29 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Adam J. Stewart +

Adrian Garcia Badaracco

Alan D. Snow

Brendan Ward

Charlie +

Dave Rench McCauley +

Flavin +

Giacomo Caria +

Ian Rose

Imanol +

Isaac Boates +

Jacob Hayes +

Jake Clarke +

James McBride

Joris Van den Bossche

Martijn Visser +

Martin Fleischmann

Nick Hand +

Rowan Molony

Sergio Rey

Sönke Schmachtel +

Tim Gates +

WANG Aiyong +

Will Schlitzer +

abonte

bretttully +

donlo

sangarshanan

vangorade +

v0.8.2(Jan 25, 2021)

Small bug-fix release for compatibility with PyGEOS 0.9
v0.8.1(Jul 15, 2020)
Small bug-fix release:

Fix a regression in the plot() method when visualizing with a JenksCaspallSampled or FisherJenksSampled scheme (#1486).

Fix spurious warning in GeoDataFrame.to_postgis (#1497).

Fix the un-pickling with pd.read_pickle of files written with older GeoPandas versions (#1511).

Thanks to Ian Rose, Joris Van den Bossche and Martin Fleischmann for their contributions!
v0.8.0(Jun 24, 2020)
Experimental: optional use of PyGEOS to speed up spatial operations (#1155). PyGEOS is a faster alternative for Shapely (being contributed back to a future version of Shapely), and is used in element-wise spatial operations and for spatial index in e.g. sjoin (#1343, #1401, #1421, #1427, #1428). See the installation docs for more info and how to enable it.

New features and improvements:

IO enhancements:

New GeoDataFrame.to_postgis() method to write to PostGIS database (#1248).

New Apache Parquet and Feather file format support (#1180, #1435)

Allow appending to files with GeoDataFrame.to_file (#1229).

Add support for the ignore_geometry keyword in read_file to only read the attribute data. If set to True, a pandas DataFrame without geometry is returned (#1383).

geopandas.read_file now supports reading from file-like objects (#1329).

GeoDataFrame.to_file now supports specifying the CRS to write to the file (#802). By default it still uses the CRS of the GeoDataFrame.

New chunksize keyword in geopandas.read_postgis to read a query in chunks (#1123).

Improvements related to geometry columns and CRS:

Any column of the GeoDataFrame that has a "geometry" dtype is now returned as a GeoSeries. This means that when having multiple geometry columns, not only the "active" geometry column is returned as a GeoSeries, but also accessing another geometry column (gdf["other_geom_column"]) gives a GeoSeries (#1336).

Multiple geometry columns in a GeoDataFrame can now each have a different CRS. The global gdf.crs attribute continues to return the CRS of the "active" geometry column. The CRS of other geometry columns can be accessed from the column itself (e.g. gdf["other_geom_column"].crs) (#1339).

New set_crs() method on GeoDataFrame/GeoSeries to set the CRS of naive geometries (#747).
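
The geometry column and CRS changes above can be sketched as follows. The column name "projected" and the coordinates are made up. The example assigns a CRS to naive geometries with set_crs() and then stores a reprojected copy as a second geometry column with its own CRS.

```python
import geopandas as gpd
from shapely.geometry import Point

# Geometries without CRS information ("naive" geometries).
gdf = gpd.GeoDataFrame({"name": ["a"]}, geometry=[Point(4.9, 52.37)])

# Declare which CRS the existing coordinates are in (no reprojection happens).
gdf = gdf.set_crs("EPSG:4326")

# A second geometry column holding the same point in a different CRS.
gdf["projected"] = gdf.geometry.to_crs("EPSG:3857")

print(gdf.crs.to_epsg(), gdf["projected"].crs.to_epsg())
```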

Improvements related to plotting:

The y-axis is now scaled depending on the center of the plot when using a geographic CRS, instead of using an equal aspect ratio (#1290).

When passing a column of categorical dtype to the column= keyword of the GeoDataFrame plot(), we now honor all categories and their order (#1483). In addition, a new categories keyword allows specifying all categories and their order otherwise (#1173).

For choropleths using a classification scheme (using scheme=), the legend_kwds accept two new keywords to control the formatting of the legend: fmt with a format string for the bin edges (#1253), and labels to pass fully custom class labels (#1302).

New covers() and covered_by() methods on GeoSeries/GeoDataFrame for the equivalent spatial predicates (#1460, #1462).

GeoPandas now warns when using distance-based methods with data in a geographic projection (#1378).
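
As a brief illustration of the covers() and covered_by() predicates mentioned above, the sketch below uses a point lying exactly on a polygon boundary (an invented example), which is where they differ from contains().

```python
import geopandas as gpd
from shapely.geometry import Point, box

square = gpd.GeoSeries([box(0, 0, 2, 2)])
on_edge = gpd.GeoSeries([Point(0, 1)])  # lies on the boundary of the square

contains = square.contains(on_edge)      # boundary points are not "contained"
covers = square.covers(on_edge)          # but they are "covered"
covered_by = on_edge.covered_by(square)  # the inverse relationship

print(contains.tolist(), covers.tolist(), covered_by.tolist())
```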

Deprecations:

When constructing a GeoSeries or GeoDataFrame from data that already has a CRS, a deprecation warning is raised when the two CRS do not match, and in the future an error will be raised in such a case. You can use the new set_crs method to override an existing CRS. See the docs.

The helper functions in the geopandas.plotting module are deprecated for public usage (#656).

The geopandas.io functions are deprecated, use the top-level read_file and to_file instead (#1407).

The set operators (&, |, ^, -) are deprecated, use the intersection(), union(), symmetric_difference(), difference() methods instead (#1255).

The sindex for empty dataframe will in the future return an empty spatial index instead of None (#1438).

The objects keyword in the intersection method of the spatial index returned by the sindex attribute is deprecated and will be removed in the future (#1440).
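
The method equivalents of the deprecated set operators can be sketched on two overlapping squares (made-up data).

```python
import geopandas as gpd
from shapely.geometry import box

a = gpd.GeoSeries([box(0, 0, 2, 2)])
b = gpd.GeoSeries([box(1, 1, 3, 3)])

both = a.intersection(b)            # instead of a & b
either = a.union(b)                 # instead of a | b
only_one = a.symmetric_difference(b)  # instead of a ^ b
a_minus_b = a.difference(b)         # instead of a - b

print(both.area.iloc[0], either.area.iloc[0])
```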

Bug fixes:

Fix the total_bounds attribute to ignore missing and empty geometries (#1312).

Fix geopandas.clip when masking with non-overlapping area resulting in an empty GeoDataFrame (#1309, #1365).

Fix error in geopandas.sjoin when joining on an empty geometry column (#1318).

CRS related fixes: pandas.concat preserves CRS when concatenating GeoSeries objects (#1340), preserve the CRS in geopandas.clip (#1362) and in GeoDataFrame.astype (#1366).

Fix bug in GeoDataFrame.explode() when 'level_1' is one of the column names (#1445).

Better error message when rtree is not installed (#1425).

Fix bug in GeoSeries.equals() (#1451).

Fix plotting of multi-part geometries with additional style keywords (#1385).

And we now have a Code of Conduct!

GeoPandas 0.8.0 is the last release to support Python 3.5. The next release will require Python 3.6, pandas 0.24, numpy 1.15 and shapely 1.6 or higher.

Acknowledgments

Thanks to everyone who contributed to this release! A total of 28 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Adrian Garcia Badaracco +

Alan D. Snow

Bhavika Tekwani +

Bo-Deng +

Brendan Ward

Christopher Yeh +

Geir Arne Hjelle

Henrikki Tenkanen +

Ian Rose

James McBride

Joris Van den Bossche

Julia Signell +

Kyle Barron +

Martin Fleischmann

Martin Jul +

Mateusz Konieczny +

Mike Taves

Oliver Schillinger +

Rowan Molony +

Sergio Rey

SylvainLan +

TimothyLucas +

abonte +

harryposner +

pietro +

raphacosta27 +

rwijtvliet +

sangarshanan

v0.7.0(Feb 17, 2020)
Support for Python 2.7 has been dropped. GeoPandas now works with Python >= 3.5.

The important API change of this release is that GeoPandas now requires PROJ > 6 and pyproj > 2.2, and that the .crs attribute of a GeoSeries and GeoDataFrame no longer stores the CRS information as a proj4 string or dict, but as a pyproj.CRS object (#1101).

This gives a better user interface and integrates improvements from pyproj and PROJ 6, but might also require some changes in your code. Check the migration guide in the documentation.

Other API changes:

The GeoDataFrame.to_file method will now also write the GeoDataFrame index to the file, if the index is named and/or non-integer. You can use the index=True/False keyword to overwrite this default inference (#1059).

New features and improvements:

A new geopandas.clip function to clip a GeoDataFrame to the spatial extent of another shape (#1128).

The geopandas.overlay function now works for all geometry types, including points and linestrings in addition to polygons (#1110).

The plot() method gained support for missing values (in the column that determines the colors). By default it doesn't plot the corresponding geometries, but using the new missing_kwds argument you can specify how to style those geometries (#1156).

The plot() method now also supports plotting GeometryCollection and LinearRing objects (#1225).

Added support for filtering with a geometry or reading a subset of the rows in geopandas.read_file (#1160).

Added support for the new nullable integer data type of pandas in GeoDataFrame.to_file (#1220).
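
To illustrate overlay() working on geometry types other than polygons, here is a minimal sketch that intersects a line layer with a polygon layer. The "road" and "zone" columns and all coordinates are hypothetical, and a spatial index backend is assumed to be available.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

roads = gpd.GeoDataFrame(
    {"road": ["r1"]}, geometry=[LineString([(0, 0.5), (10, 0.5)])]
)
zones = gpd.GeoDataFrame({"zone": ["z1"]}, geometry=[box(2, 0, 4, 1)])

# Keep the part of each road that falls inside a zone,
# carrying over the attributes of both layers.
inside = gpd.overlay(roads, zones, how="intersection")

print(inside[["road", "zone"]])
print(inside.length.tolist())
```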

Bug fixes:

GeoSeries.reset_index() now correctly results in a GeoDataFrame instead of DataFrame (#1252).

Fixed the geopandas.sjoin function to handle MultiIndex correctly (#1159).

Fixed the geopandas.sjoin function to preserve the index name of the left GeoDataFrame (#1150).

Acknowledgments

Thanks to everyone who contributed to this release! A total of 12 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Alan D. Snow

Aleksey Bilogur

Ardie Orden +

Brendan Ward +

Brett Naul

James McBride

Joris Van den Bossche

Leah Wasser

Martin Fleischmann

Mike Taves

jess +

sangarshanan +

v0.6.3(Feb 6, 2020)
Small bug-fix release:

Compatibility with Shapely 1.7 and pandas 1.0 (#1244).

Fix GeoDataFrame.fillna to accept non-geometry values again when there are no missing values in the geometry column. This should make it easier to fill the numerical columns of the GeoDataFrame (#1279).

v0.6.2(Nov 18, 2019)
Small bug-fix release fixing a few regressions:

Fix a regression in passing an array of RGB(A) tuples to the .plot() method (#1178, #1211).

Fix the bounds and total_bounds attributes for empty GeoSeries, which also fixes the repr of an empty or all-NA GeoSeries (#1184, #1195).

Fix filtering of a GeoDataFrame to preserve the index type when ending up with an empty result (#1190).

v0.6.1(Oct 12, 2019)
Small bug-fix release fixing a few regressions:

Fix astype when converting to string with Multi geometries (#1145) or when converting a dataframe without geometries (#1144).

Fix GeoSeries.fillna to accept np.nan again (#1149).

v0.6.0(Sep 27, 2019)
GeoPandas 0.6.0 features a refactor of the internals based on the new pandas ExtensionArray interface, for better integration with pandas. Although this change should keep the user interface mostly stable, there are a few changes summarized below. Further, this release includes a nice set of other improvements and bug fixes.

Important note! This will be the last release to support Python 2.7 (#1031)

API changes:

A refactor of the internals based on the pandas ExtensionArray interface (#1000). Read more on that in this blogpost. The main user-visible changes are:

The .dtype of a GeoSeries is now a 'geometry' dtype (and no longer a numpy object dtype).

The .values of a GeoSeries now returns a custom GeometryArray, and no longer a numpy array. To get back a numpy array of Shapely scalars, you can convert explicitly using np.asarray(..).

The GeoSeries constructor now raises a warning when passed non-geometry data. Currently the constructor falls back to return a pandas Series, but in the future this will raise an error (#1085).

The missing value handling has been changed to now separate the concepts of missing geometries and empty geometries (#601, #1062). In practice this means that (see the docs for more details):

GeoSeries.isna now considers only missing values, and if you want to check for empty geometries, you can use GeoSeries.is_empty (GeoDataFrame.isna already only looked at missing values).

GeoSeries.dropna now actually drops missing values (previously it dropped neither missing nor empty geometries).

GeoSeries.fillna only fills missing values (behaviour unchanged).

GeoSeries.align uses missing values instead of empty geometries by default to fill non-matching index entries.
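The API changes listed above can be illustrated with a minimal sketch. It assumes GeoPandas 0.6.0 or later with Shapely installed; the sample geometries and variable names are made up for illustration.

```python
import numpy as np
import geopandas
from shapely.geometry import Point, Polygon

# One regular geometry, one missing value (None), one empty geometry.
s = geopandas.GeoSeries([Point(0, 0), None, Polygon()])

# The dtype is the dedicated 'geometry' extension dtype, not numpy object.
dtype_name = str(s.dtype)

# .values is a GeometryArray; convert explicitly to get a numpy object array.
as_numpy = np.asarray(s.values)

# isna() flags only the missing value; is_empty flags only the empty geometry.
missing = s.isna().tolist()
empty = s.is_empty.tolist()

# dropna() removes the missing value but keeps the empty polygon.
kept = s.dropna()

print(dtype_name, missing, empty, len(kept))
```

Keeping the two checks separate makes it possible to tell "no geometry recorded" apart from "a geometry that happens to be empty", for example after an intersection with no overlap.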

New features and improvements:

Addition of a GeoSeries.affine_transform method, equivalent of Shapely's function (#1008).

Addition of a GeoDataFrame.rename_geometry method to easily rename the active geometry column (#1053).

Addition of geopandas.show_versions() function, which can be used to give an overview of the installed libraries in bug reports (#899).

The legend_kwds keyword of the plot() method can now also be used to specify keywords for the color bar (#1102).

Performance improvement in the sjoin() operation by re-using existing spatial index of the input dataframes, if available (#789).

Updated documentation to work with latest version of geoplot and contextily (#1044, #1088).

A new geopandas.options configuration, currently with a single option to control the display precision of the coordinates (options.display_precision). The default is now to show fewer decimals (3 for projected and 5 for geographic coordinates), but the default can be overridden with the option.
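A few of the additions above can be tried together in a short sketch. The data, column names and coefficient values are illustrative assumptions; the coefficient list follows Shapely's 2D convention [a, b, d, e, xoff, yoff].

```python
import geopandas
from shapely.geometry import Point

gdf = geopandas.GeoDataFrame({"name": ["a"]}, geometry=[Point(1, 2)])

# affine_transform: x' = a*x + b*y + xoff, y' = d*x + e*y + yoff.
# Here: scale both axes by 2, then shift by (10, 20).
moved = gdf.geometry.affine_transform([2, 0, 0, 2, 10, 20])

# rename_geometry returns a new frame whose active geometry column is renamed.
renamed = gdf.rename_geometry("geom")

# display_precision only affects how coordinates are printed, not the data.
s = geopandas.GeoSeries([Point(1.123456789, 2.987654321)])
geopandas.options.display_precision = 2
short_repr = repr(s)
geopandas.options.display_precision = None  # back to the automatic default

print(moved.iloc[0], renamed.geometry.name)
print(short_repr)
```

Resetting the option to None at the end restores the automatic choice of precision, so the setting does not leak into later output.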

Bug fixes:

Also try to use pysal instead of mapclassify if available (#1082).

The GeoDataFrame.astype() method now correctly returns a GeoDataFrame if the geometry column is preserved (#1009).

The to_crs method now uses always_xy=True to ensure correct lon/lat order handling for pyproj>=2.2.0 (#1122).

Fixed passing list-like colors in the plot() method in case of "multi" geometries (#1119).

Fixed the coloring of shapes and colorbar when passing a custom norm in the plot() method (#1091, #1089).

Fixed GeoDataFrame.to_file to preserve VFS file paths (e.g. when a "s3://" path is specified) (#1124).

Fixed failing case in geopandas.sjoin with empty geometries (#1138).
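The lon/lat ordering that the to_crs fix above protects can be checked with a small sketch. It assumes a GeoPandas version that accepts CRS strings such as "EPSG:4326" (releases around 0.6.0 commonly used the dict form instead), and the point is an arbitrary example near Brussels.

```python
import math
import geopandas
from shapely.geometry import Point

# GeoPandas stores geographic coordinates as (x=longitude, y=latitude).
brussels = geopandas.GeoSeries([Point(4.35, 50.85)], crs="EPSG:4326")

# Reproject to Web Mercator (EPSG:3857).
projected = brussels.to_crs("EPSG:3857")

# In Web Mercator, x depends only on longitude: x = R * lon (in radians).
expected_x = 6378137.0 * math.radians(4.35)

print(projected.iloc[0].x, expected_x)
```

If the axis order were swapped, the projected x would be derived from the latitude instead and would not match expected_x.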

In addition, the minimum required versions of some dependencies have been increased: GeoPandas now requires pandas >=0.23.4 and matplotlib >=2.0.1 (#1002).

Acknowledgments

Thanks to everyone who contributed to this release! A total of 20 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Alan D. Snow +

Aleksey Bilogur

Archana Alva +

François Leblanc

Geir Arne Hjelle

Ian Rose +

James Gaboardi +

James McBride

Joris Van den Bossche

Joshua Wilson

Kushal Borkar +

Leah Wasser

Martin Fleischmann

Mike Taves +

René Buffat +

Sergio Rey +

Thomas Pinder +

awa5114 +

donlo +

jbrockmendel +

v0.6.0rc1(Aug 13, 2019)

v0.5.1(Jul 11, 2019)
Compatibility with latest mapclassify version 2.1.0 (#1025).

v0.5.0(Apr 25, 2019)
GeoPandas 0.5.0 includes some improvements for writing files with fiona (better performance, better support for data types and mixed geometry types), along with many other new features and bug fixes; see the full list below.

Improvements:

Significant performance improvement (around 10x) for GeoDataFrame.iterfeatures, which also improves GeoDataFrame.to_file (#864).

File IO enhancements based on Fiona 1.8:

Support for writing bool dtype (#855) and datetime dtype, if the file format supports it (#728).

Support for writing dataframes with multiple geometry types, if the file format allows it (e.g. GeoJSON for all types, or ESRI Shapefile for Polygon+MultiPolygon) (#827, #867, #870).

Compatibility with pyproj >= 2 (#962).

A new geopandas.points_from_xy() helper function to convert x and y coordinates to Point objects (#896).

The buffer and interpolate methods now accept an array-like to specify a variable distance for each geometry (#781).

Addition of a relate method, corresponding to the shapely method that returns the DE-9IM matrix (#853).
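The points_from_xy helper, the array-like buffer distance and the relate method listed above combine naturally. The following is a minimal sketch with made-up column names and values, assuming pandas and Shapely are installed alongside GeoPandas.

```python
import geopandas
import pandas as pd

df = pd.DataFrame({"x": [0.0, 3.0], "y": [0.0, 0.0], "radius": [1.0, 2.0]})

# Build Point geometries from coordinate columns.
points = geopandas.GeoSeries(geopandas.points_from_xy(df["x"], df["y"]))

# Pass one buffer distance per geometry instead of a single scalar.
circles = points.buffer(df["radius"])

# relate returns the DE-9IM intersection matrix as a 9-character string,
# here for each circle against the point at its centre.
codes = circles.relate(points)

print(circles.area.round(2).tolist())
print(codes.tolist())
```

The buffered circles are polygon approximations, so their areas come out slightly below pi times the radius squared.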

Plotting improvements:

Performance improvement in plotting by only flattening the geometries if there are actually 'Multi' geometries (#785).

Choropleths: access to all mapclassify classification schemes and addition of the classification_kwds keyword in the plot method to specify options for the scheme (#876).

Ability to specify a matplotlib axes object on which to plot the color bar with the cax keyword, in order to have more control over the color bar placement (#894).

Changed the default provider in geopandas.tools.geocode from Google (now requires an API key) to Geocode.Farm (#907, #975).

Bug fixes:

Remove the edge in the legend marker (#807).

Fix the align method to preserve the CRS (#829).

Fix geopandas.testing.assert_geodataframe_equal to correctly compare left and right dataframes (#810).

Fix in choropleth mapping when the values contain missing values (#877).

Better error message in sjoin if the input is not a GeoDataFrame (#842).

Fix in read_postgis to handle nullable (missing) geometries (#856).

Correctly passing through the parse_dates keyword in read_postgis to the underlying pandas method (#860).

Fixed the shape of Antarctica in the included demo dataset 'naturalearth_lowres' (by updating to the latest version) (#804).

Acknowledgments

Thanks to everyone who contributed to this release! A total of 33 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

Andy Garfield +

Benjamin Goldenberg +

Brett Naul +

Brian Lewis +

Dmitry Nikolaev +

Dr Martin Black +

Filipe

Geir Arne Hjelle

Géraud +

Henry Walshaw +

James McBride

Jesse Pisel +

Joris Van den Bossche

Joshua Wilson

Justin Shenk +

Kris Vanhoof +

Leah Wasser +

Levi John Wolf

Martin Fleischmann +

Matthieu Viry +

Philipp Kats +

Pratap Vardhan +

Pulkit Maloo +

Raphael Delhome +

Sean Gillies

Simon Andersson +

TimoRoth +

Yohann Rebattu +

YuichiNotoya +

byrman +

lmmarsano +

Émile Nadeau +

Ömer Özak

v0.4.1(Mar 6, 2019)
Small bug-fix release for compatibility with the latest Fiona and PySAL releases:

Compatibility with Fiona 1.8: fix deprecation warning (#854 and #916).

Compatibility with PySAL 2.0: switched to mapclassify instead of PySAL as dependency for choropleth mapping with the scheme keyword (#872).

Fix for new overlay implementation in case the intersection is empty (#800).

Acknowledgments

A total of 7 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

Filipe

Joris Van den Bossche

Kris Vanhoof +

Martin Fleischmann +

Simon Andersson +

TimoRoth +

Ömer Özak

v0.4.0(Jul 16, 2018)
GeoPandas 0.4.0 improves the overlay functionality (much better performance, and fixed behaviour for certain cases). This may change results you obtained before, but the new results are likely more correct. Given this change, please test your code; feedback is welcome! Further, there is a long list of other new features and bug fixes; see below.

GeoPandas can be installed with conda from the conda-forge channel (conda install -c conda-forge geopandas) or with pip assuming the dependencies are available for your platform (pip install geopandas).

Improvements:

Improved overlay function (better performance, several incorrect behaviours fixed) (#429)

Pass keywords to control legend behavior (legend_kwds) to plot (#434)

Add basic support for reading remote datasets in read_file (#531)

Pass kwargs for buffer operation on GeoSeries (#535)

Expose all geopy services as options in geocoding (#550)

Faster write speeds to GeoPackage (#605)

Permit read_file filtering with a bounding box from a GeoDataFrame (#613)

Set CRS on GeoDataFrame returned by read_postgis (#627)

Permit setting markersize for Point GeoSeries plots with column values (#633)

Started an example gallery (#463, #690, #717)

Support for plotting MultiPoints (#683)

Testing functionality (e.g. assert_geodataframe_equal) is now publicly exposed (#707)

Add explode method to GeoDataFrame (similar to the GeoSeries method) (#671)

Set equal aspect on active axis on multi-axis figures (#718)

Pass array of values to column argument in plot (#770)
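As an illustration of the GeoDataFrame explode method mentioned in the list above, here is a minimal sketch with made-up data. The shape of the resulting index differs between GeoPandas versions, so the example only relies on the rows and their attributes.

```python
import geopandas
from shapely.geometry import MultiPoint

# A single row holding a multi-part geometry.
gdf = geopandas.GeoDataFrame(
    {"name": ["pair"]},
    geometry=[MultiPoint([(0, 0), (1, 1)])],
)

# explode splits each multi-part geometry into one row per part,
# repeating the other column values for every part.
parts = gdf.explode()

print(len(parts), parts.geom_type.tolist(), parts["name"].tolist())
```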

Bug fixes:

Ensure that colorbars are plotted on the correct axis (#523)

Handle plotting empty GeoDataFrame (#571)

Save z-dimension when writing files (#652)

Handle reading empty shapefiles (#653)

Correct dtype for empty result of spatial operations (#685)

Fix empty sjoin handling for pandas>=0.23 (#762)

Acknowledgments

Thanks to everyone who contributed to this release! A total of 26 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

Aleksey Bilogur

Andrew Feierman +

Benjamin Root +

Chris Holdgraf

Christopher Ren +

Dani Arribas-Bel

Dmitri Lebedev +

Edward Betts +

Elliott Sales de Andrade +

Fabien Maussion +

Filipe +

François Leblanc +

Geir Arne Hjelle +

James McBride

Joris Van den Bossche

Joshua Wilson +

Levi John Wolf +

Ramiro Gómez +

Robert Gieseke +

Rutger Hofste +

Tim Tröndle

balmandhunter +

mrahim +

pinto531 +

robochat +

Ömer Özak +

v0.3.0(Aug 28, 2017)
Improvements:

Improve plotting performance using matplotlib.collections (#267)

Improve default plotting appearance. The defaults now follow the new matplotlib defaults (#318, #502, #510)

Provide access to x/y coordinates as attributes for Point GeoSeries (#383)

Make the NYBB dataset available through geopandas.datasets (#384)

Enable sjoin on non-integer-index GeoDataFrames (#422)

Add cx indexer to GeoDataFrame (#482)

GeoDataFrame.from_features now also accepts a Feature Collection (#225, #507)

Use index label instead of integer id in output of iterfeatures and to_json (#421)

Return an empty data frame rather than raising an error when performing a spatial join with non-overlapping GeoDataFrames (#335)
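Two of the improvements above, the x/y attributes on Point GeoSeries and the cx indexer on GeoDataFrame, can be sketched as follows. The points and the "id" column are illustrative assumptions.

```python
import geopandas
from shapely.geometry import Point

s = geopandas.GeoSeries([Point(0, 0), Point(2, 2), Point(5, 5)])

# Coordinates of Point geometries are available as plain attributes.
xs = s.x.tolist()
ys = s.y.tolist()

# cx selects by bounding box: slices are [xmin:xmax, ymin:ymax].
inside = s.cx[1:3, 1:3]

# The same indexer works on a GeoDataFrame and keeps the other columns.
gdf = geopandas.GeoDataFrame({"id": [1, 2, 3]}, geometry=s)
window = gdf.cx[1:3, 1:3]

print(xs, ys, window["id"].tolist())
```

The cx indexer keeps every geometry that intersects the box, so geometries only partly inside the window are also returned.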

Bug fixes:

Compatibility with shapely 1.6.0 (#512)

Fix fiona.filter results when bbox is not None (#372)

Fix dissolve to retain CRS (#389)

Fix cx behavior when using index of 0 (#478)

Fix display of lower bin in legend label of choropleth plots using a PySAL scheme (#450)

v0.2.1(Jul 30, 2016)
