basic pluggable cache implementation #115

Closed
wants to merge 4 commits into from

jhamman
Contributor

@jhamman jhamman commented Oct 22, 2022

Description of proposed changes

This PR adds a new cache feature to Xbatcher's BatchGenerator. As implemented, it requires Zarr to serialize Xarray datasets. The cache itself is entirely pluggable: it accepts any dict-like object as the backing store.

I'm putting this up to help foster discussion in #109. I'm still not sure it's the best path forward, but I'd like to get some feedback, and this felt like a tangible way to test the idea out.

If you want to try this out, you could do something like this:

In [1]: import xarray as xr

In [2]: import xbatcher

In [3]: import zarr

In [4]: cache = zarr.storage.DirectoryStore('/flash/fast/storage/cache')

In [5]: ds = xr.tutorial.open_dataset('air_temperature')

In [6]: gen = xbatcher.BatchGenerator(ds, input_dims={'lat': 10, 'lon': 10}, cache=cache)

In [7]: %%time
   ...: for b in gen:
   ...:     pass
   ...: 
CPU times: user 95 ms, sys: 40.8 ms, total: 136 ms
Wall time: 146 ms

In [8]: %%time
   ...: for b in gen:
   ...:     pass
   ...: 
CPU times: user 59.6 ms, sys: 11.4 ms, total: 70.9 ms
Wall time: 65.5 ms

Note that I used a directory store here, but this could be any Zarr-friendly store (e.g. S3 or Redis).
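
To make the "any dict-like object" idea concrete, here is a minimal sketch that uses only the standard library. It is not the code in this PR: `CachedBatches`, `make_batch`, and the `batch/{idx}` key scheme are hypothetical names chosen for illustration, and `pickle` stands in for the Zarr serialization the PR relies on.

```python
import pickle
from collections.abc import Iterator, MutableMapping
from typing import Any, Callable


class CachedBatches:
    """Iterate over batches, storing each one in a dict-like cache.

    Hypothetical illustration only; this is not xbatcher's implementation.
    pickle stands in for the Zarr serialization described in the PR.
    """

    def __init__(
        self,
        n_batches: int,
        make_batch: Callable[[int], Any],
        cache: MutableMapping[str, bytes],
    ) -> None:
        self.n_batches = n_batches
        self.make_batch = make_batch
        self.cache = cache

    def __getitem__(self, idx: int) -> Any:
        key = f"batch/{idx}"  # assumed key scheme
        if key in self.cache:
            # Cache hit: deserialize instead of recomputing the batch.
            return pickle.loads(self.cache[key])
        # Cache miss: build the batch, then store its serialized form.
        batch = self.make_batch(idx)
        self.cache[key] = pickle.dumps(batch)
        return batch

    def __iter__(self) -> Iterator[Any]:
        for idx in range(self.n_batches):
            yield self[idx]


calls = []  # records which batches were actually computed


def make_batch(idx: int) -> list:
    calls.append(idx)
    return list(range(idx * 3, idx * 3 + 3))


store = {}  # a plain dict; any MutableMapping would do
gen = CachedBatches(4, make_batch, cache=store)
first_pass = list(gen)   # computes and caches every batch
second_pass = list(gen)  # served entirely from the cache
```

Because the cache only needs `__contains__`, `__getitem__`, and `__setitem__`, swapping the plain dict for a store backed by local disk or a remote service would not require changing the generator in this sketch.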

@codecov-commenter

codecov-commenter commented Oct 22, 2022

Codecov Report

Merging #115 (ea7f128) into main (77c470b) will decrease coverage by 5.71%.
The diff coverage is 42.85%.

@@             Coverage Diff             @@
##              main     #115      +/-   ##
===========================================
- Coverage   100.00%   94.28%   -5.72%     
===========================================
  Files            5        5              
  Lines          192      210      +18     
  Branches        35       39       +4     
===========================================
+ Hits           192      198       +6     
- Misses           0        9       +9     
- Partials         0        3       +3     
| Impacted Files | Coverage Δ |
| --- | --- |
| xbatcher/generators.py | 88.34% <42.85%> (-11.66%) ⬇️ |


@jhamman jhamman closed this by deleting the head repository Dec 30, 2022
@maxrjones
Member

@jhamman, do you mind if I open a new PR based on your work here?

@jhamman
Contributor Author

jhamman commented Jan 5, 2023

Go for it!

3 participants