Compatibility for zarr-python 3.x #9552
This set of changes should be backwards compatible and work with zarr-python 2.x (so reading and writing zarr v2 data).
I'll work through zarr-python 3.x now. I think we might want to parametrize most of these tests by zarr_version=[2, 3] to confirm that we can read / write zarr v2 data with zarr-python 3.x.
xarray/backends/zarr.py (outdated)
    if _zarr_v3() and zarr_array.metadata.zarr_format == 3:
        encoding["codec_pipeline"] = [
            x.to_dict() for x in zarr_array.metadata.codecs
Maybe this instead?
    - x.to_dict() for x in zarr_array.metadata.codecs
    + zarr_array.metadata.to_dict()["codecs"]
A bit wasteful since everything has to be serialized, but presumably zarr knows better how to serialize the codec pipeline than we do here?
* removed open_consolidated workarounds
* removed _store_version check
* pass through zarr_version
- skip write_empty_chunks on 3.x
- update patch targets
Great progress here @TomAugspurger. I'm impressed by how little you've changed in the backend itself and I'm noting the pain around testing (I felt some of that w/ dask as well).
    if consolidated is None:
        try:
            zarr_group = zarr.open_consolidated(store, **open_kwargs)
    -   except KeyError:
    +   except (ValueError, KeyError):
On the Zarr side, it may be nice to raise a custom exception when consolidated metadata is not found. Something like:
    class ConsolidatedMetadataNotFound(FileNotFoundError):
        pass
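To illustrate how a caller could use such an exception, here is a minimal sketch. It does not use zarr at all: `fake_open_consolidated`, `fake_open_group`, and `open_with_fallback` are hypothetical stand-ins, and the store is modelled as a plain dict. The only thing borrowed from Zarr is the `.zmetadata` key that format 2 uses for consolidated metadata.

```python
class ConsolidatedMetadataNotFound(FileNotFoundError):
    """Hypothetical exception: the store has no consolidated metadata."""


def fake_open_consolidated(store):
    # Stand-in for a consolidated open; `store` is a plain dict here.
    # ".zmetadata" is the key Zarr format 2 uses for consolidated metadata.
    try:
        return store[".zmetadata"]
    except KeyError:
        raise ConsolidatedMetadataNotFound(
            "no consolidated metadata in store"
        ) from None


def fake_open_group(store):
    # Stand-in for a regular (non-consolidated) open.
    return {"consolidated": False, "keys": sorted(store)}


def open_with_fallback(store):
    # Try the consolidated path first and fall back on the dedicated
    # exception only, so unrelated KeyError/ValueError bugs still surface.
    try:
        return {"consolidated": True, "metadata": fake_open_consolidated(store)}
    except ConsolidatedMetadataNotFound:
        return fake_open_group(store)
```

Catching one dedicated exception type would let the backend avoid the broader `except (ValueError, KeyError)` clause, which can also swallow errors that have nothing to do with missing metadata.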
Currently, the only zarr store that supports storage options is
Let's skip this test with v3.
Giving this the 👍 ! Thanks @TomAugspurger for all the work here (and in Zarr-Python)! A major step forward for Xarray and Zarr.
@pydata/xarray - are there others that are hoping to review this before it goes in? Noting that most of the surface area here is in the tests, the actual code changes are quite minimal. I'll also note that the remaining failing CI checks are, as far as I can tell, unrelated to this PR.
This generally seems okay to me, as long as we're confident about changing those values in some of the tests...
My comments are just pointing out ToDos and commented-out lines that have maybe been forgotten about.
@@ -1307,6 +1346,8 @@ def test_explicitly_omit_fill_value(self) -> None:
        with self.roundtrip(ds) as actual:
            assert "_FillValue" not in actual.x.encoding

    # TODO: decide if this test is really necessary
    # TODO: decide if this test is really necessary

    "get": 16,  # TODO: fixme upstream (should be 8)
    "list_dir": 3,  # TODO: fixme upstream (should be 2)
Link to specific issues?
I just pushed a commit reverting the changes to avoid values equal to the I think this is ready to go once CI finishes. I expect upstream-ci to fail on the
There's one typing failure we might want to address:
I'll do some reading about how best to handle type annotations when the proper type depends on the version of a dependency. Edit: a complication here is that this is in
I don't see why the typing of
Good catch, this affects both. I was hoping something like this would work:
    from pathlib import Path
    try:
        from zarr.storage import StoreLike as _StoreLike
    except ImportError:
        _StoreLike = str | Path
    StoreLike = type[_StoreLike]
    def f(x: StoreLike) -> StoreLike:
        return x
but mypy doesn't like that
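One pattern sometimes used for version-dependent annotations is to give the static checker a single fixed alias under `typing.TYPE_CHECKING` and resolve the real type only at runtime. The sketch below assumes that a `str | Path` alias is an acceptable static approximation; whether that suits xarray's annotations is not settled in this thread. The `zarr.storage.StoreLike` import is taken from the snippet above and falls back when it is unavailable.

```python
from pathlib import Path
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Static checkers evaluate only this branch, so they see one
    # unconditional alias instead of a try/except redefinition.
    StoreLike = Union[str, Path]
else:
    try:
        # Import path taken from the snippet above.
        from zarr.storage import StoreLike
    except ImportError:
        # Fallback when zarr is missing or does not export StoreLike.
        StoreLike = Union[str, Path]


def f(x: StoreLike) -> StoreLike:
    return x
```

The trade-off is that the checker never sees the richer upstream type, so a store object that only the upstream alias accepts would be flagged at call sites.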
This PR begins the process of adding compatibility with zarr-python 3.x. It's intended to be run against zarr-python v3 + the open PRs referenced in #9515.
All of the zarr test cases should be parameterized by zarr_format=[2, 3] with zarr-python 3.x to exercise reading and writing both formats.
This is currently passing with zarr-python==2.18.3.
zarr-python 3.x has about 61 failures, all of which are related to data types that aren't yet implemented in zarr-python 3.x.
I'll also note that #5475 is going to become a larger issue once people start writing Zarr-V3 datasets.
whats-new.rst