Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return TypedDict instances from Metadata.to_dict() #2099

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ahmadjiha
Copy link

@ahmadjiha ahmadjiha commented Aug 19, 2024

Summary

This is a partial implementation for #1773.

So far, I have done the following:

  1. Make Metadata abc generic over Mapping[str, object]
  2. Define TypedDict classes for the metadata models that have a well-typed dict representation
  3. Set return value of the the model's to_dict() method to the relevant TypedDict
  4. Set type of metadata models from_dict() methods to the relevant TypedDict

Closes #1773

Notes

  1. Currently, the Metadata abc is generic over a type bound by Mapping[str, object]. I haven't been able to get Mapping[str, JSON] to work as a type bound
  2. The TypedDicts for the various Metadata classes were inferred from each respective classes' from_dict(), to_dict(), and initializer methods
  3. The codec metadata classes from_dict() methods utilize a variant of the overloaded parse_named_configuration function. This accepts a JSON data type, causing it to raise a mypy error when called w/ one of the TypedDict classes. This is the case even if we coerce the TypedDict back to a dict before calling it. I assume this method is performing runtime validation, so we would not want to substitute it w/ dev-time validation provided by the CodecDict classes
  4. Two calls in the ArrayV2Metadata.from_dict() method raise mypy errors due to the the ArrayV2MetadataDict being coerced to a dict to allow it to be mutated
  5. All tests w/ local stores pass

Questions

  1. In the codec metadata class from_dict() methods, would it be appropriate to ignore the mypy arg-type errors that are raised by parse_named_configuration()? Alternatively, parse_named_configuration()'s signature can be modified to accept a Mapping[str, Any] rather than JSON. This is not equivalent to JSON, but seems to be a sufficient substitute for each place that parse_named_configuration is currently called
  2. Similar to the previous question, would it be appropriate to ignore the mypy arg-type errors in the body of ArrayV2Metadata.from_dict()?
  3. There is a confusing mypy error:
    # src/zarr/core/group.py
    
    ...
    
    class GroupMetadataDict(TypedDict):
        """A dictionary representing a group metadata."""
    
        attributes: Mapping[str, Any]
        node_type: NotRequired[Literal["group"]]
    
    ...
    
    @dataclass(frozen=True)
    class AsyncGroup:
        ...
        @classmethod
        async def open(...):
            ...
    
            # src/zarr/core/group.py:218
            group_metadata = {**zgroup, "attributes": zattrs}  # Non-required key "node_type" not explicitly found in any ** item  [typeddict-item]
    
    From what I can understand, this is saying that node_type is never found when the dictionary zgroup is unpacked. However, node_type is defined w/ NotRequired. I searched around and was unable to find any documentation on this -- guidance on next steps would be appreciated!

Checklist

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@ahmadjiha
Copy link
Author

I experimented with making the Metadata base class generic and ran into issues with bounding the generic type with dict[str, JSON].

Making the Metadata class generic was easy enough:

# src/zarr/abc/metadata.py

...

T = TypeVar("T", bound=dict[str, JSON])

@dataclass(frozen=True)
class Metadata(Generic[T]):
    def to_dict(self) -> T:
        ...

However, I ran into issues with the TypedDict classes being outside of the bounds of the generic type. For example:

# src/zarr/core/chunk_key_encodings.py

...

class ChunkKeyEncodingDict(TypedDict):
    """A dictionary representing a chunk key encoding configuration."""

    name: str
    configuration: dict[Literal["separator"], SeparatorLiteral]


@dataclass(frozen=True)
class ChunkKeyEncoding(Metadata[ChunkKeyEncodingDict]):  # Type argument "ChunkKeyEncodingDict" of "Metadata" must be a subtype of "dict[str, JSON]"
    name: str
    separator: SeparatorLiteral = "."

It seems like we need a way to bound the TypedDict value types at the dictionary level. I've tried quite a few things -- I have yet to come across a solution.

Is it possible that this is not supported? Would appreciate any guidance @jhamman @d-v-b

@ahmadjiha ahmadjiha marked this pull request as ready for review September 2, 2024 00:17
@d-v-b
Copy link
Contributor

d-v-b commented Sep 2, 2024

T = TypeVar("T", bound=dict[str, JSON])

I think the problem is specifying the type bound to be dict[str, JSON]. First, typeddicts are supposed to be immutable, so they are not a subclass of dict. Mapping should be used instead.

Second, for some reason Mapping[str, JSON] won't work, but Mapping[str, object] does work as a type bound. I don't really know the answer to this one, hopefully someone can explain it.

Here's an example that I got working:

from abc import abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, TypeVar, Mapping, TypedDict

T = TypeVar('T', bound=Mapping[str, object])

class Meta(Generic[T]):

    @abstractmethod
    def to_dict(self) -> T:
        ...

class ExampleSpec(TypedDict):
    a: str
    b: str

@dataclass
class Example(Meta[ExampleSpec]):
    a: str
    b: str

    def to_dict(self) -> ExampleSpec:
        return {'a': self.a, 'b': self.b}
    
x = Example(a='10', b='10')
y = x.to_dict()
reveal_type(y)
"""
Revealed type is "TypedDict('tessel.ExampleSpec', {'a': builtins.str, 'b': builtins.str})"
"""

@jhamman jhamman added the V3 Affects the v3 branch label Sep 13, 2024
@d-v-b d-v-b mentioned this pull request Oct 8, 2024
6 tasks
Define TypedDict classes for metadata models that have a well-typed dict
representation and set the return value of the the model's to_dict() method
to the TypedDict
Redefine `Metadata` class to be generic and refactor downstream classes
@ahmadjiha
Copy link
Author

Hello @d-v-b!

I rebased and pushed commits to make the Metadata abstract base class generic + required changes to the downstream inheriting classes. I updated the PR descriptions w/ implementation notes + additional questions.

When you have a moment, a review would be super helpful!

@jhamman jhamman changed the base branch from v3 to main October 14, 2024 20:57
@jhamman jhamman added this to the After 3.0.0 milestone Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V3 Affects the v3 branch
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

[v3] Typing: use TypedDict for all dictionaries
3 participants