Feat/store paths #2272

Draft
wants to merge 5 commits into
base: main

d-v-b
Contributor

@d-v-b d-v-b commented Sep 30, 2024

Adds a path attribute to all the store classes. Key operations such as get, set, list, and list_prefix are now scoped to the subtree rooted at the store's path attribute, which is how RemoteStore already works in the v3 branch. This change is motivated by the desire for a consistent, scalable store API.

Key changes:

  • renamed LocalStore.root to LocalStore.path, and changed the type of the attribute from pathlib.Path to str
  • added a path attribute to MemoryStore
  • changed the semantics of the path attribute of ZipStore, and introduced a new attribute file_path that does what ZipStore.path used to do (point to the location of the zip file in the parent file system).
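
A minimal sketch of the scoping behaviour described above, using a toy dict-backed class. `ScopedDictStore` and its `_resolve` helper are hypothetical names chosen for illustration; this is not the zarr-python implementation, and the real store methods are asynchronous and operate on Buffer objects rather than bytes.

```python
from collections.abc import MutableMapping
from typing import Optional


class ScopedDictStore:
    """Toy store (hypothetical): every key is resolved relative to ``path``."""

    def __init__(
        self,
        store_dict: Optional[MutableMapping[str, bytes]] = None,
        *,
        path: str = "",
    ) -> None:
        # Use ``is None`` so that an empty but shared mapping is kept as-is.
        self._store_dict = {} if store_dict is None else store_dict
        self.path = path.strip("/")

    def _resolve(self, key: str) -> str:
        # Join the store path and the key, skipping an empty path.
        return f"{self.path}/{key}" if self.path else key

    def set(self, key: str, value: bytes) -> None:
        self._store_dict[self._resolve(key)] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store_dict.get(self._resolve(key))

    def list(self) -> list[str]:
        # Only keys under ``path`` are visible, reported relative to it.
        prefix = f"{self.path}/" if self.path else ""
        return sorted(k[len(prefix):] for k in self._store_dict if k.startswith(prefix))


# Two views over the same underlying mapping, scoped to different subtrees.
shared: dict[str, bytes] = {}
root = ScopedDictStore(shared)
sub = ScopedDictStore(shared, path="group/array")
sub.set("zarr.json", b"{}")
```

Here `sub` reads and writes `zarr.json` relative to `group/array`, while `root` sees the same entry under its full key.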

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@TomAugspurger
Contributor

TomAugspurger commented Sep 30, 2024

Just a note, this fixes a failure in some xarray tests that write an array at the top level of the store, open a group at the root, and then get the array.

Edit: whoops, never mind. I missed that this only affects zarr_format=2 when testing. I'll make a new issue.

Member

@jhamman jhamman left a comment

Thanks for working on this @d-v-b. I'm not 100% convinced the with_mode approach is going to be substantially cleaner than our StorePath right now but I'm curious to see how this plays out.

@@ -320,6 +336,13 @@ async def _get_many(
        for req in requests:
            yield (req[0], await self.get(*req))

    def with_path(self, path: str) -> Self:

Member

Suggested change
-    def with_path(self, path: str) -> Self:
+    @abstractmethod
+    def with_path(self, path: str) -> Self:

Or do you think we can do this in a generic way that applies to all stores?

Contributor Author

I don't think we can make it generic, unfortunately, because each store class might have its own extra attributes. Maybe we could use __getstate__ to get around this?
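
One way the generic approach could look, as a rough sketch only: `copy.copy` carries over whatever attributes a subclass defines (and consults `__getstate__`/`__setstate__` when a class provides them), after which only `path` is overridden. `BaseStore` and `LocalLikeStore` are hypothetical stand-ins, not the classes in this PR.

```python
import copy
from typing import TypeVar

S = TypeVar("S", bound="BaseStore")


class BaseStore:
    """Hypothetical base class with a generic ``with_path``."""

    def __init__(self, path: str = "") -> None:
        self.path = path

    def with_path(self: S, path: str) -> S:
        # Shallow-copy so subclass-specific attributes carry over,
        # then override only the path on the copy.
        new = copy.copy(self)
        new.path = path
        return new


class LocalLikeStore(BaseStore):
    """Hypothetical subclass with an extra attribute of its own."""

    def __init__(self, path: str = "", *, read_only: bool = False) -> None:
        super().__init__(path)
        self.read_only = read_only


original = LocalLikeStore("data", read_only=True)
scoped = original.with_path("data/group")
```

The copy is shallow, so mutable state such as an underlying mapping, file handle, or lock would be shared between the two instances. Whether that is acceptable likely differs per store, which may be an argument for keeping the method abstract.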

        if path.is_dir():  # TODO: support deleting directories? shutil.rmtree?
            shutil.rmtree(path)
        else:
            await to_thread(path.unlink, True)  # Q: we may want to raise if path is missing

    async def exists(self, key: str) -> bool:
-        path = self.root / key
+        path = Path(self.path) / key

Member

You do this a lot. I wonder if the store could have a private attr like self._ppath = Path(self.path) that does this once.
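
A sketch of that idea, assuming a simplified synchronous class: `LocalLikeStore` is a hypothetical stand-in, and `_ppath` is the private name floated in the comment above. `functools.cached_property` builds the `Path` on first access and reuses it afterwards.

```python
from functools import cached_property
from pathlib import Path


class LocalLikeStore:
    """Hypothetical stand-in for a store whose public ``path`` is a str."""

    def __init__(self, path: str) -> None:
        self.path = path  # public attribute stays a plain string

    @cached_property
    def _ppath(self) -> Path:
        # Constructed once on first access, then served from the instance cache.
        return Path(self.path)

    def exists(self, key: str) -> bool:
        # Uses the cached Path instead of calling Path(self.path) each time.
        return (self._ppath / key).exists()


store = LocalLikeStore("no/such/dir")
```

One caveat with caching: if `path` were reassigned after the first access, `_ppath` would go stale, so this fits best when `path` is treated as immutable.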

Contributor Author

I had a similar thought when I found myself doing this over and over again 😅

@@ -28,12 +29,15 @@ def __init__(
        self,
        store_dict: MutableMapping[str, Buffer] | None = None,
        *,
        path: str = "",

Member

this store highlights the fact that not all stores have a meaningful path.

Contributor Author

I'm not sure I agree -- if store_dict really is an arbitrary mutable mapping, there's no guarantee that we want all IO operations to be scoped to the top-level key space, and that's exactly what the path attribute is for.

Comment on lines +55 to +57
        file_path: Path | str,
        *,
        path: str = "",

Member

This feels quite confusing. IIUC, you are allowing the caller to specify a path within the zipfile as opposed to putting the root of the store at the root of the zip, is that right?

Contributor Author

exactly! this is required to make ZipStore congruent to RemoteStore
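
To illustrate the distinction between the two arguments, here is a small read-only sketch built on the standard-library `zipfile` module. `ZipLikeStore` is a hypothetical toy, not the ZipStore in this PR; it also accepts a file-like object for `file_path` (an assumption made so the example can run in memory), whereas the signature above takes `Path | str`.

```python
import io
import zipfile
from pathlib import Path
from typing import IO, Optional, Union


class ZipLikeStore:
    """Toy read-only view (hypothetical).

    ``file_path`` locates the archive; ``path`` is a prefix inside the archive.
    """

    def __init__(self, file_path: Union[str, Path, IO[bytes]], *, path: str = "") -> None:
        self.file_path = file_path
        self.path = path.strip("/")

    def _resolve(self, key: str) -> str:
        # Keys are interpreted relative to the prefix inside the archive.
        return f"{self.path}/{key}" if self.path else key

    def get(self, key: str) -> Optional[bytes]:
        with zipfile.ZipFile(self.file_path, mode="r") as zf:
            try:
                return zf.read(self._resolve(key))
            except KeyError:
                # ZipFile.read raises KeyError for a missing member.
                return None


# Build a small archive in memory with one nested member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("group/array/zarr.json", b"{}")

root = ZipLikeStore(buffer)
sub = ZipLikeStore(buffer, path="group/array")
```

Both stores point at the same archive; only `sub` resolves `zarr.json` directly, because its `path` places the store root at `group/array` inside the zip.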

Member

though this may have some impact on (simplifying) zarr-developers/zarr-specs#311

@jhamman jhamman added the V3 Affects the v3 branch label Oct 11, 2024
@jhamman jhamman changed the base branch from v3 to main October 14, 2024 20:52
5 participants