Rechunk method for uncompressed arrays #199

Open
wants to merge 12 commits into
base: main

@TomNicholas (Member) commented Jul 22, 2024

@rsignell this is inspired by your blog post (still a WIP for now)

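The idea is that you can simply do:

from virtualizarr import open_virtual_dataset

# open the file as virtual references, then split along time without touching the data
vds = open_virtual_dataset('uncompressed_netcdf3.nc')
subchunked_vds = vds.chunk(time=1)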
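
For readers new to the idea, here is a minimal sketch of the byte-range arithmetic that makes this possible for uncompressed data. It is not the code in this PR: `ChunkRef` and `split_leading_axis` are hypothetical names, and the sketch assumes each chunk is a single contiguous, C-ordered byte range and only splits along the leading axis.

```python
from typing import NamedTuple


class ChunkRef(NamedTuple):
    """Hypothetical stand-in for one manifest entry: a byte range in a file."""
    path: str
    offset: int
    length: int


def split_leading_axis(ref, chunk_shape, itemsize, new_len):
    """Split one uncompressed C-order chunk into sub-chunks along axis 0."""
    n_rows = chunk_shape[0]
    if new_len <= 0 or n_rows % new_len != 0:
        raise ValueError("new_len must be a positive divisor of chunk_shape[0]")

    # Bytes occupied by a single index along axis 0 (all trailing axes together).
    row_bytes = itemsize
    for dim in chunk_shape[1:]:
        row_bytes *= dim

    # If the stored length disagrees with the shape, the chunk is probably
    # compressed or padded, and plain offset arithmetic is not valid.
    if row_bytes * n_rows != ref.length:
        raise ValueError("byte length does not match chunk shape and itemsize")

    step = row_bytes * new_len
    return [
        ChunkRef(ref.path, ref.offset + i * step, step)
        for i in range(n_rows // new_len)
    ]


# One chunk of shape (time=4, y=3, x=5) of 8-byte values, split to time=1.
refs = split_leading_axis(
    ChunkRef("uncompressed_netcdf3.nc", offset=1000, length=4 * 3 * 5 * 8),
    chunk_shape=(4, 3, 5),
    itemsize=8,
    new_len=1,
)
```

No data is read or copied here: only the (path, offset, length) references change, which is why this only works when the bytes on disk are uncompressed.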

@TomNicholas added the enhancement label Jul 22, 2024
@TomNicholas (Member, Author)

This now works in the sense that the .rechunk method on the ManifestArray class passes dedicated tests (and we can rechunk in however many dimensions we want!), but the new integration test fails because Dataset.chunk is dispatching to dask's version of .rechunk somewhere. This may require a change to xarray's ChunkManagerEntrypoint upstream to fix.

@TomNicholas (Member, Author)

Note to self: we should add more validation to the ZArray class to check that the chunks attribute is a tuple of positive integers, and move the zarray.replace call to the start of the method to catch invalid input early.
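
A rough sketch of what that validation could look like, written as a standalone function. `validate_chunks` is a hypothetical helper, not an existing method of the ZArray class, and the check that `chunks` has one entry per dimension of `shape` is an extra assumption beyond what the note asks for.

```python
import numbers


def validate_chunks(chunks, shape):
    """Return chunks unchanged if it is a tuple of positive integers, else raise."""
    if not isinstance(chunks, tuple):
        raise TypeError(f"chunks must be a tuple, got {type(chunks).__name__}")

    # Assumption: one chunk length per array dimension.
    if len(chunks) != len(shape):
        raise ValueError(
            f"chunks has {len(chunks)} entries but shape has {len(shape)} dimensions"
        )

    for axis, c in enumerate(chunks):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(c, bool) or not isinstance(c, numbers.Integral):
            raise TypeError(f"chunk length on axis {axis} must be an integer, got {c!r}")
        if c <= 0:
            raise ValueError(f"chunk length on axis {axis} must be positive, got {c}")

    return chunks


# Valid input passes through untouched.
checked = validate_chunks((1, 3, 5), shape=(4, 3, 5))
```

Running a check like this before any other work in the method means bad input fails immediately with a clear message.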

@rsignell (Collaborator)

> we can rechunk in however many dimensions we want!

I love that aspect!

@TomNicholas (Member, Author) commented Jul 30, 2024

> but the new integration test fails because Dataset.chunk is dispatching to dask's version of .rechunk somewhere. This may require a change to xarray's ChunkManagerEntrypoint upstream to fix.

pydata/xarray#9286 has now progressed far enough that this PR works for me at least (when using that xarray branch)! Passing all tests locally 🟢

@TomNicholas TomNicholas marked this pull request as ready for review July 30, 2024 16:33
Successfully merging this pull request may close these issues.

Arbitrary chunking of uncompressed files (e.g. netCDF3)