Add __contains__ method to KVStore #1454

cgohlke · 2023-07-10T03:04:01Z

Over at Bayer-Group/tiffslide#72, we noticed that reading from a tifffile.ZarrTiffStore calls the store's `__getitem__` member function twice for each chunk. This happens because the `k in self` membership test in the line below is routed through `KVStore.__getitem__`.

zarr-python/zarr/_storage/store.py

Line 160 in 8c98f45

return {k: self[k] for k in keys if k in self}

This patch adds a KVStore.__contains__ member function, which does not require reading and decoding chunks for the membership test.

With this patch, performance almost doubled in the benchmarks from the linked issue.
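A minimal sketch of the effect described above, not the actual zarr code: `CountingStore`, `WrapperWithoutContains`, `WrapperWithContains`, and `count_reads` are hypothetical names made up for illustration, and the wrappers only imitate a KVStore-like class built on `collections.abc.MutableMapping`. The inherited `Mapping.__contains__` tests membership by calling `self[key]` and catching `KeyError`, so every `k in self` costs one read of the backing store. Delegating `__contains__` to the wrapped mapping avoids that read.

```python
from collections.abc import MutableMapping


class CountingStore(dict):
    """Hypothetical backing store that counts __getitem__ calls."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # stands in for reading and decoding a chunk
        return super().__getitem__(key)


class WrapperWithoutContains(MutableMapping):
    """KVStore-like wrapper that relies on the Mapping fallback methods."""

    def __init__(self, mapping):
        self._mutable_mapping = mapping

    def __getitem__(self, key):
        return self._mutable_mapping[key]

    def __setitem__(self, key, value):
        self._mutable_mapping[key] = value

    def __delitem__(self, key):
        del self._mutable_mapping[key]

    def __iter__(self):
        return iter(self._mutable_mapping)

    def __len__(self):
        return len(self._mutable_mapping)

    def getitems(self, keys):
        # Same shape as the line quoted above: `k in self`, then `self[k]`.
        return {k: self[k] for k in keys if k in self}


class WrapperWithContains(WrapperWithoutContains):
    """Same wrapper, with the membership test delegated to the wrapped mapping."""

    def __contains__(self, key):
        return key in self._mutable_mapping


def count_reads(wrapper_cls):
    """Return how many backing-store reads one getitems() call triggers."""
    backing = CountingStore({"0.0": b"chunk-a", "0.1": b"chunk-b"})
    wrapper_cls(backing).getitems(["0.0", "0.1", "missing"])
    return backing.reads


print(count_reads(WrapperWithoutContains))  # 5: two reads per present key, one for the missing key
print(count_reads(WrapperWithContains))     # 2: one read per present key
```

In this toy setup the read count for present keys halves, which is consistent with the near doubling reported for the linked benchmarks, although the real speedup depends on how expensive the store's `__getitem__` is.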

TODO:

~~Add unit tests and/or doctests in docstrings~~
~~Add docstrings and API docs for any new/modified user-facing classes and functions~~
~~New/modified features documented in docs/tutorial.rst~~
Changes documented in docs/release.rst
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

rabernat

Thanks a lot for this @cgohlke!

Having also worked with the various store classes exposed by zarr-python, I have to say that I find their behavior pretty confusing. There are a lot of fallback methods like this one that are very expensive, and when developing a custom store, it's hard to know which methods you need to implement in order to get good performance. I think a broader refactor of this part of the code is needed.

In the meantime, this seems like a much-needed improvement.

codecov · 2023-07-10T12:57:18Z

Codecov Report

Merging #1454 (6a9a5af) into main (cc2bd41) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #1454   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           37        37           
  Lines        14866     14868    +2     
=========================================
+ Hits         14866     14868    +2

Impacted Files	Coverage Δ
zarr/storage.py	`100.00% <100.00%> (ø)`

rabernat · 2023-07-10T14:47:08Z

@cgohlke - if you can just update the release notes, we should be gtg here.

Add __contains__ method to KVStore

55f5c7e

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jul 10, 2023

rabernat approved these changes Jul 10, 2023

kaczmarj added a commit to SBU-BMI/wsinfer that referenced this pull request Jul 10, 2023

patch zarr to avoid decoding tiles in duplicate

5138727

This implements the change proposed in zarr-developers/zarr-python#1454

Update release notes

6a9a5af

github-actions bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jul 10, 2023

rabernat merged commit ac89782 into zarr-developers:main Jul 10, 2023
19 checks passed
