Add __contains__ method to KVStore #1454

Merged
merged 2 commits into from
Jul 10, 2023

Conversation

cgohlke (Contributor) commented Jul 10, 2023

Over at Bayer-Group/tiffslide#72, we noticed that reading from a tifffile.ZarrTiffStore calls the store's __getitem__ member function twice for each chunk. This happens because the membership test k in self in the line below is routed through KVStore.__getitem__.

return {k: self[k] for k in keys if k in self}

This patch adds a KVStore.__contains__ member function, which does not require reading and decoding chunks for the membership test.

With this patch, performance almost doubled on the benchmarks in the linked issue.
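
The effect described above can be reproduced without zarr. The following is a minimal sketch, not the actual zarr code: CountingStore, WrapperWithoutContains, WrapperWithContains and count_reads are hypothetical names made up for illustration. It assumes only that the wrapper derives from collections.abc.MutableMapping, whose inherited __contains__ falls back to calling __getitem__.

```python
from collections.abc import MutableMapping


class CountingStore(dict):
    """Hypothetical backing store that counts how often values are read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # stands in for an expensive read-and-decode of a chunk
        return super().__getitem__(key)


class WrapperWithoutContains(MutableMapping):
    """Wrapper relying on the inherited Mapping.__contains__ fallback."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __getitem__(self, key):
        return self._mapping[key]

    def __setitem__(self, key, value):
        self._mapping[key] = value

    def __delitem__(self, key):
        del self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def getitems(self, keys):
        # Same shape as the line quoted above: "k in self" runs first,
        # then "self[k]" reads the value again.
        return {k: self[k] for k in keys if k in self}


class WrapperWithContains(WrapperWithoutContains):
    """Same wrapper, but membership is delegated to the backing mapping."""

    def __contains__(self, key):
        # dict.__contains__ does not call __getitem__, so nothing is read.
        return key in self._mapping


def count_reads(wrapper_cls):
    """Return how many backing-store reads a two-key getitems() call costs."""
    backing = CountingStore({"0.0": b"a", "0.1": b"b"})
    wrapper_cls(backing).getitems(["0.0", "0.1"])
    return backing.reads


slow_reads = count_reads(WrapperWithoutContains)
fast_reads = count_reads(WrapperWithContains)
print(slow_reads, fast_reads)
```

In this sketch the wrapper without __contains__ reads each of the two values twice, while the wrapper with __contains__ reads each value once, which is consistent with the near-doubling reported above when reads dominate the run time.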

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jul 10, 2023

rabernat (Contributor) left a comment

Thanks a lot for this @cgohlke!

Having also worked with the various store classes exposed by zarr-python, I have to say that I find their behavior pretty confusing. There are a lot of fallback methods like this one which are very expensive, and when developing a custom store, it's hard to know which methods you should have to implement in order to get good performance. I think a broader refactor of this part of the code is needed.

In the meantime, this seems like a much needed improvement.

codecov bot commented Jul 10, 2023

Codecov Report

Merging #1454 (6a9a5af) into main (cc2bd41) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #1454   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           37        37           
  Lines        14866     14868    +2     
=========================================
+ Hits         14866     14868    +2     

Impacted Files     Coverage Δ
zarr/storage.py    100.00% <100.00%> (ø)

kaczmarj added a commit to SBU-BMI/wsinfer that referenced this pull request Jul 10, 2023

rabernat (Contributor) commented

@cgohlke - if you can just update the release notes, we should be gtg here.


github-actions bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jul 10, 2023
@rabernat rabernat merged commit ac89782 into zarr-developers:main Jul 10, 2023
19 checks passed
kaczmarj added a commit to SBU-BMI/wsinfer that referenced this pull request Jul 11, 2023
* replace openslide with tiffslide

* patch zarr to avoid decoding tiles in duplicate

This implements the change proposed in
zarr-developers/zarr-python#1454

* rm openslide-python and add tiffslide

* do not stitch because it imposes a performance penalty

* ignore types in vis_params

* add isort and tiffslide to dev deps

* add NoBackendException

* run isort

* use wsinfer.wsi module instead of slide_utils and add tiffslide and openslide backends

* use wsinfer.wsi.WSI as generic entrypoint for whole slides

* replace PathType with "str | Path"

* add logging and backend selection to cli

* add "from __future__ import annotations"
kaczmarj added a commit to SBU-BMI/wsinfer that referenced this pull request Jul 16, 2023
* use pytorch 2.0.0 as base image

* install g++

* do not remove gcc

* import torch to please jit compiler

* move custom model impls to custom_models namespace

* refactor to use wsinfer-zoo

* run isort and then black

* rm modeldefs + make modellib and patchlib public (no underscore)

* do not use torch.compile on torchscript models

* Fix/issue 131 (#133)

* use tifffile in lieu of large_image

* run isort

* make outputs float or None

* changes to please mypy

* add newline at end of document

* add openslide-python and tifffile to core deps

* add back roi support and mps device caps

* black formatting

* rm unused file

* add wsinfer-zoo to deps

* predownload registry JSON + install system deps in early layer

* scale step size and print info

Fixes #135

* add patchlib presets to package data and rm modeldefs

* set default step_size to None

* only allow step-size=patch-size

* allow custom step sizes

* update mpp print logs to slide mpp

* add tiff mpp via openslide

* resize patches to prescribed patch size and spacing

* add model config schema

* add schemas to package data

* fix error messages

Replace `--model-name` with `--model`.

* create OpenSlide obj in worker_init func

Fixes #137

The OpenSlide object is no longer created in `__init__`. Previously the
openslide object was shared across workers. Now each worker creates its
own OpenSlide object. I hypothesize that this will allow multi-worker
data loading on Windows.

* handle num_workers=0

* ADD choice of backends (tiffslide or openslide) (#139)

* replace openslide with tiffslide

* patch zarr to avoid decoding tiles in duplicate

This implements the change proposed in
zarr-developers/zarr-python#1454

* rm openslide-python and add tiffslide

* do not stitch because it imposes a performance penalty

* ignore types in vis_params

* add isort and tiffslide to dev deps

* add NoBackendException

* run isort

* use wsinfer.wsi module instead of slide_utils and add tiffslide and openslide backends

* use wsinfer.wsi.WSI as generic entrypoint for whole slides

* replace PathType with "str | Path"

* add logging and backend selection to cli

* add "from __future__ import annotations"

* TST: update tests for dev branch (#143)

* begin to update tests

* do not resize images prior to transform

This introduces subtle differences from the current stable version of
wsinfer.

* fix for issue #125

* do not save slide path in model outputs csv

* add test_cli_run_with_registered_models

* add reference model outputs

These reference outputs were created using a patched version of 0.3.6
wsinfer. The patches involved padding the patches from large-image to be
the expected patch size. Large image does not pad images by default,
whereas openslide and tiffslide pad with black.

* skip jit tests and cli with custom config

* deprecate python 3.7

* install openslide and tiffslide

* remove WSIType object

* remove dense grid creation

fixes #138

* remove timm and custom models

We will focus on using TorchScript models only. In the future, we can
also look into using ONNX as a backend.

fixes #140

* limit click versions to please mypy

related to pallets/click#2558

* satisfy mypy

* fix cli args for wsinfer run

* fail loudly with dev pytorch + fix jit compile tests

* fix test of issue 89

* move wsinfer imports to beginning of file

* add test of mutually exclusive cli args

* use -p shorthand for model-path

* mark that we support typing

* add py.typed to package data

* run test-package on windows, macos, and linux

* fix test of patching

* install openslide differently on different systems

* close the case statement

* fix the way we install openslide on different envs

* fix matrix.os test

* get line length with python for cross-platform

* test "wsinfer run" differently for unix and windows

* fix windows test

* fix path to csv

* skip windows tests for now because tissue segmentation is different

* run "wsinfer run" on windows but do not test file length

* add test of local model with config