DataLayerError when writing using fsspec #430

Open
DahnJ opened this issue Jun 24, 2024 · 2 comments


DahnJ commented Jun 24, 2024

Description

pyogrio raises DataLayerError: The layer name may not contain special characters or spaces when GeoDataFrame.to_file is given a file object opened with fsspec's fs.open.

Traceback
CPLE_AppDefinedError                      Traceback (most recent call last)
File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/pyogrio/_io.pyx:2183, in pyogrio._io.create_ogr_dataset_layer()

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/pyogrio/_err.pyx:183, in pyogrio._err.exc_wrap_pointer()

CPLE_AppDefinedError: The layer name may not contain special characters or spaces

During handling of the above exception, another exception occurred:

DataLayerError                            Traceback (most recent call last)
Cell In[8], line 10
      7 gdf = gpd.read_file(path)
      9 with fs.open('test.gpkg', mode='wb') as file:
---> 10     gdf.to_file(file, driver='GPKG')

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/geopandas/geodataframe.py:1536, in GeoDataFrame.to_file(self, filename, driver, schema, index, **kwargs)
   1441 """Write the ``GeoDataFrame`` to a file.
   1442 
   1443 By default, an ESRI shapefile is written, but any OGR data source
   (...)
   1532 
   1533 """
   1534 from geopandas.io.file import _to_file
-> 1536 _to_file(self, filename, driver, schema, index, **kwargs)

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/geopandas/io/file.py:686, in _to_file(df, filename, driver, schema, index, mode, crs, engine, metadata, **kwargs)
    683     raise ValueError(f"'mode' should be one of 'w' or 'a', got '{mode}' instead")
    685 if engine == "pyogrio":
--> 686     _to_file_pyogrio(df, filename, driver, schema, crs, mode, metadata, **kwargs)
    687 elif engine == "fiona":
    688     _to_file_fiona(df, filename, driver, schema, crs, mode, metadata, **kwargs)

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/geopandas/io/file.py:748, in _to_file_pyogrio(df, filename, driver, schema, crs, mode, metadata, **kwargs)
    745 if not df.columns.is_unique:
    746     raise ValueError("GeoDataFrame cannot contain duplicated column names.")
--> 748 pyogrio.write_dataframe(df, filename, driver=driver, metadata=metadata, **kwargs)

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/pyogrio/geopandas.py:654, in write_dataframe(df, path, layer, driver, encoding, geometry_type, promote_to_multi, nan_as_null, append, use_arrow, dataset_metadata, layer_metadata, metadata, dataset_options, layer_options, **kwargs)
    651 if geometry_column is not None:
    652     geometry = to_wkb(geometry.values)
--> 654 write(
    655     path,
    656     layer=layer,
    657     driver=driver,
    658     geometry=geometry,
    659     field_data=field_data,
    660     field_mask=field_mask,
    661     fields=fields,
    662     crs=crs,
    663     geometry_type=geometry_type,
    664     encoding=encoding,
    665     promote_to_multi=promote_to_multi,
    666     nan_as_null=nan_as_null,
    667     append=append,
    668     dataset_metadata=dataset_metadata,
    669     layer_metadata=layer_metadata,
    670     metadata=metadata,
    671     dataset_options=dataset_options,
    672     layer_options=layer_options,
    673     gdal_tz_offsets=gdal_tz_offsets,
    674     **kwargs,
    675 )

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/pyogrio/raw.py:709, in write(path, geometry, field_data, fields, field_mask, layer, driver, geometry_type, crs, encoding, promote_to_multi, nan_as_null, append, dataset_metadata, layer_metadata, metadata, dataset_options, layer_options, gdal_tz_offsets, **kwargs)
    704 # preprocess kwargs and split in dataset and layer creation options
    705 dataset_kwargs, layer_kwargs = _preprocess_options_kwargs(
    706     driver, dataset_options, layer_options, kwargs
    707 )
--> 709 ogr_write(
    710     path,
    711     layer=layer,
    712     driver=driver,
    713     geometry=geometry,
    714     geometry_type=geometry_type,
    715     field_data=field_data,
    716     field_mask=field_mask,
    717     fields=fields,
    718     crs=crs,
    719     encoding=encoding,
    720     promote_to_multi=promote_to_multi,
    721     nan_as_null=nan_as_null,
    722     append=append,
    723     dataset_metadata=dataset_metadata,
    724     layer_metadata=layer_metadata,
    725     dataset_kwargs=dataset_kwargs,
    726     layer_kwargs=layer_kwargs,
    727     gdal_tz_offsets=gdal_tz_offsets,
    728 )

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/pyogrio/_io.pyx:2298, in pyogrio._io.ogr_write()

File ~/miniconda3/envs/geo312/lib/python3.12/site-packages/pyogrio/_io.pyx:2197, in pyogrio._io.create_ogr_dataset_layer()

DataLayerError: The layer name may not contain special characters or spaces

MRE

import geopandas as gpd
import geodatasets
from fsspec.implementations.local import LocalFileSystem

fs = LocalFileSystem()
path = geodatasets.get_path("naturalearth land")
gdf = gpd.read_file(path)

with fs.open('test.gpkg', mode='wb') as file:
    gdf.to_file(file, driver='GPKG')

Environment

>>> gpd.show_versions()
SYSTEM INFO
-----------
python     : 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]
executable : /Users/danieljahn/miniconda3/envs/geo312/bin/python
machine    : macOS-14.5-arm64-arm-64bit

GEOS, GDAL, PROJ INFO
---------------------
GEOS       : 3.12.1
GEOS lib   : None
GDAL       : 3.9.0
GDAL data dir: /Users/danieljahn/miniconda3/envs/geo312/share/gdal/
PROJ       : 9.4.0
PROJ data dir: /Users/danieljahn/miniconda3/envs/geo312/share/proj

PYTHON DEPENDENCIES
-------------------
geopandas  : 1.0.0
numpy      : 2.0.0
pandas     : 2.2.2
pyproj     : 3.6.1
shapely    : 2.0.4
pyogrio    : 0.9.0
geoalchemy2: None
geopy      : None
matplotlib : 3.8.4
mapclassify: 2.6.1
fiona      : None
psycopg    : None
psycopg2   : None
pyarrow    : None

brendan-ward (Member) commented

In this case GDAL is not currently deriving the layer name from the open fsspec handle like it would if you were writing directly to a path passed in as a string / pathlib.Path.

The workaround should be to directly provide the layer name.

with fs.open('test.gpkg', mode='wb') as file:
    gdf.to_file(file, layer="test", driver='GPKG')

It is expected that you will get this warning bubbled up from GDAL:
RuntimeWarning: The filename extension should be 'gpkg' instead of '' to conform to the GPKG specification.

However, when I do this, I get an empty GPKG file. Not yet sure why.

brendan-ward (Member) commented

Ah, right, this currently isn't supported by Pyogrio, but we don't specifically attempt to catch and block this usage. Currently, the path parameter provided to write_dataframe (used by .to_file() when using pyogrio engine - now the default) is limited to either a BytesIO instance or something that can be coerced to a string.

In this case, coercing the fsspec file handle to a string produces bad inputs.
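
To illustrate the general problem, here is a small sketch using only the standard library. It uses a built-in binary file handle as a stand-in for the fsspec handle (an assumption; the exact string an fsspec handle produces may differ), and the helper name coerce_like_a_path is hypothetical, not part of Pyogrio. The point is only that str() on a file handle returns its repr, not the path it was opened with:

```python
import os
import tempfile


def coerce_like_a_path(obj):
    """Hypothetical stand-in for a writer that blindly coerces its target to str."""
    return str(obj)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test.gpkg")

    # A plain path string survives coercion unchanged.
    from_string = coerce_like_a_path(target)

    # An open binary handle does not: str() returns the object's repr,
    # which contains characters such as '<', '>' and spaces.
    with open(target, mode="wb") as handle:
        from_handle = coerce_like_a_path(handle)

print(from_string)
print(from_handle)
```

Anything downstream that treats the second string as a filename, or derives a layer name from it, is working from a value that was never a path.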

Instead, it looks like we should detect when the input has a .write() method and is not a BytesIO instance, and raise a NotImplementedError so that it is obvious this isn't going to work at the moment.
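
A minimal sketch of what such a check might look like, assuming it runs before the path is coerced to a string. The function name check_write_target and the error message are hypothetical and are not taken from the Pyogrio code base:

```python
from io import BytesIO


def check_write_target(path_or_buffer):
    """Reject writable file-like objects other than BytesIO.

    Hypothetical helper: returns the input unchanged when it is a BytesIO
    instance or has no .write() method (for example a str or pathlib.Path),
    and raises NotImplementedError for any other writable object.
    """
    if isinstance(path_or_buffer, BytesIO):
        return path_or_buffer

    if hasattr(path_or_buffer, "write"):
        raise NotImplementedError(
            "writing to an open file handle is not supported; "
            "pass a path or a BytesIO instance instead"
        )

    return path_or_buffer
```

With a guard like this, passing an fsspec handle would fail immediately with a clear message instead of surfacing later as a DataLayerError about the layer name.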

Unfortunately, this is a place where Fiona and Pyogrio differ (for now), because it works fine if you use the Fiona engine:

with fs.open('test.gpkg', mode='wb') as file:
    gdf.to_file(file, driver='GPKG', engine="fiona")

At the moment, Fiona's handling of alternative file interfaces is more advanced than in Pyogrio.

You can work around the issue by first writing to BytesIO:

from io import BytesIO

with fs.open('test.gpkg', mode='wb') as file:
    buffer = BytesIO()
    gdf.to_file(buffer, driver='GPKG')
    buffer.seek(0)
    file.write(buffer.getvalue())

Longer term, we plan to add better support for alternative file / filesystem interfaces, but those are a bit complex to integrate properly.
