read_dataframe with POINT EMPTY interprets geometry as None #436

Open
mwtoews opened this issue Jun 26, 2024 · 1 comment

mwtoews commented Jun 26, 2024

While preparing #435, I noticed that write_dataframe() will happily write a dataframe containing (e.g.) a "POINT EMPTY" geometry. However, read_dataframe() reads this geometry back as None, so the GeoDataFrame doesn't round-trip. E.g.:

from pyogrio.geopandas import read_dataframe, write_dataframe
import geopandas as gp
from geopandas.array import from_wkt

expected = gp.GeoDataFrame({"x": [0]}, geometry=from_wkt(["POINT EMPTY"]), crs=4326)
print(expected)
#    x     geometry
# 0  0  POINT EMPTY

filename = "/tmp/test.shp"
write_dataframe(expected, filename)
df = read_dataframe(filename)
print(df)
#    x geometry
# 0  0     None

Note that the fiona engine behaves the same way, e.g.:

gp.read_file(filename, engine="fiona")

returns the same result. Raw fiona doesn't do much better, except that it identifies the geometry type in the schema:

import fiona
with fiona.open(filename) as ds:
    print(ds.meta["schema"])
    print([(idx, feat.geometry) for idx, feat in ds.items()])
# {'properties': {'x': 'int:18'}, 'geometry': 'Point'}
# [(0, None)]

theroggy commented Jun 28, 2024

The shapefile format cannot distinguish between NULL/None and POINT EMPTY geometries, so the choice was made to return NULL/None when reading.

GeoPackage, for example, does support the distinction, so there you get proper round-tripping:

import tempfile
from pyogrio.geopandas import read_dataframe, write_dataframe
import geopandas as gp
import shapely

for geom in [shapely.from_wkt("POINT EMPTY"), None]:
    for suffix in [".shp", ".gpkg"]:
        gdf = gp.GeoDataFrame({"x": [0]}, geometry=[geom], crs=4326)

        filename = f"{tempfile.gettempdir()}/test{suffix}"
        write_dataframe(gdf, filename)
        df = read_dataframe(filename)
        print(f"{suffix=}, {geom=}:  {df.geometry.iloc[0]}")
        # suffix='.shp', geom=<POINT EMPTY>:  None
        # suffix='.gpkg', geom=<POINT EMPTY>:  POINT EMPTY
        # suffix='.shp', geom=None:  None
        # suffix='.gpkg', geom=None:  None
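
The comparison above can be generalized into a small check that reports which rows changed between "missing" (None) and "empty" during a round trip. This is only a sketch and not part of pyogrio: `classify_geometry` and `roundtrip_changes` are hypothetical helper names, and the code assumes only that geometry objects expose a boolean `is_empty` attribute, as shapely geometries do. The demo uses `SimpleNamespace` stand-ins so it runs without any geospatial libraries installed.

```python
from types import SimpleNamespace


def classify_geometry(geom):
    """Return 'missing', 'empty', or 'valued' for one geometry-like object."""
    if geom is None:
        return "missing"
    # Assumption: real geometries (e.g. shapely) expose a boolean `is_empty`.
    if getattr(geom, "is_empty", False):
        return "empty"
    return "valued"


def roundtrip_changes(before, after):
    """List (index, kind_before, kind_after) for rows whose kind changed."""
    changes = []
    for idx, (geom_before, geom_after) in enumerate(zip(before, after)):
        kind_before = classify_geometry(geom_before)
        kind_after = classify_geometry(geom_after)
        if kind_before != kind_after:
            changes.append((idx, kind_before, kind_after))
    return changes


# Stand-ins for geometries: only the `is_empty` attribute matters here.
empty_point = SimpleNamespace(is_empty=True)
real_point = SimpleNamespace(is_empty=False)

written = [empty_point, None, real_point]
read_back = [None, None, real_point]  # mimics the shapefile behaviour above

print(roundtrip_changes(written, read_back))
# [(0, 'empty', 'missing')]
```

With real data, something like `roundtrip_changes(gdf.geometry, df.geometry)` should work, since iterating a GeoSeries yields geometry objects or None; an empty result would mean the format preserved the distinction for that data.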
