Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Virtual Datasets] Local Storage ValueError: Filenames containing trailing '/#\d+/' are not supported: #279

Closed
norlandrhagen opened this issue Oct 16, 2024 · 7 comments

Comments

@norlandrhagen
Copy link

norlandrhagen commented Oct 16, 2024

Running into an error when trying to roundtrip a virtual dataset using the Local Storage config. 🤞 this is a StorageConfig setup issue on my part.

ValueError: store error: unsuccessful repository operation: `error when handling virtual reference error fetching virtual reference Generic LocalFileSystem error: Filenames containing trailing '/#\d+/' are not supported: `

MRE:

# mamba create --name mre_icechunk python=3.12 -y && \
# mamba activate mre_icechunk && \
# pip install git+https://github.com/mpiannucci/VirtualiZarr@icechunk && \
# pip install git+https://github.com/pydata/xarray@zarr-v3 && \
# pip install git+https://github.com/mpiannucci/kerchunk@v3 --no-deps && \
# pip install git+https://github.com/mpiannucci/numcodecs@zarr3-codecs-beta-sync --no-deps && \
# pip install xarray pooch netcdf4

import icechunk 
import xarray as xr 
from virtualizarr import open_virtual_dataset
from virtualizarr.writers.icechunk import dataset_to_icechunk

# write netcdf to local
ds = xr.tutorial.open_dataset('air_temperature')
ds.to_netcdf('air.nc', mode='w')

# local config
storage_config = icechunk.StorageConfig.filesystem("icechunk/air")
store = icechunk.IcechunkStore.create(storage_config)

# create virtualizarr dataset
vds = open_virtual_dataset('air.nc', indexes={})

# write vds to store
dataset_to_icechunk(vds, store)

# commit to store
store.commit(message="initial commit air_temp virtual dataset")

# round trip 
# ValueError happens here
ds = xr.open_zarr(store, zarr_version=3, consolidated=False)

Thanks for all the integration work so far @mpiannucci @TomNicholas 🎊

File ~/micromamba/envs/tmp_icechunk/lib/python3.12/site-packages/icechunk/__init__.py:441, in IcechunkStore.get(self, key, prototype, byte_range)
    428 """Retrieve the value associated with a given key.
    429 
    430 Parameters
   (...)
    437 Buffer
    438 """
    440 try:
--> 441     result = await self._store.get(key, byte_range)
    442 except KeyNotFound as _e:
    443     # Zarr python expects None to be returned if the key does not exist
    444     # but an IcechunkStore returns an error if the key does not exist
    445     return None

ValueError: store error: unsuccessful repository operation: `error when handling virtual reference error fetching virtual reference Generic LocalFileSystem error: Filenames containing trailing '/#\d+/' are not supported: `
@norlandrhagen
Copy link
Author

norlandrhagen commented Oct 16, 2024

Pointing to a NetCDF on s3 works fine, even with local config!

Seems like this might be more on the Virtualizarr side 🤔
related? zarr-developers/VirtualiZarr#242

storage_config = icechunk.StorageConfig.filesystem("icechunk/air")

store = icechunk.IcechunkStore.create(storage=storage_config, mode='w', config=StoreConfig(
    virtual_ref_config=VirtualRefConfig.s3_anonymous(region='us-west-2')))

vds = open_virtual_dataset('s3://carbonplan-scratch/air.nc', indexes={})

@mpiannucci
Copy link
Contributor

@norlandrhagen if you get a chance, can you try using an absolute path instead of relative? I think you may be right

@mpiannucci
Copy link
Contributor

Confirmed... running this with an absolute path is required at this time! Thanks for posting this

@rabernat
Copy link
Contributor

Hi @norlandrhagen - it was a deliberate choice to not allow relative paths in Icechunk. The hope is that this requirement makes the Icechunk stores more robust, since moving the store does not imply moving the files themselves. This is of course different to how Kerchunk references work. In my experience, it's easy to break Kerchunk references, and we want to avoid that here.

We'd love to hear a bit more about your use case and if relative paths are truly necessary. Can you get by with absolute paths here?


In any case, a better error message and docs are definitely needed about this point.

@TomNicholas
Copy link
Collaborator

Would zarr-developers/VirtualiZarr#243 fix this problem?

@mpiannucci
Copy link
Contributor

mpiannucci commented Oct 16, 2024

Would zarr-developers/VirtualiZarr#243 fix this problem?

yup! I think so because it currently works fine as an absolute path

@norlandrhagen
Copy link
Author

Thanks for clearing this up @mpiannucci and @rabernat! Totally fine with avoiding relative paths, I think making Icechunk stores more robust is well worth the tradeoff 🧮 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants