[Feature]: Define partial chunk shape for GenericDataChunkIterator #995

bendichter · 2023-11-08T18:37:52Z

What would you like to see added to HDMF?

Right now, the GenericDataChunkIterator lets you define either chunk_mb or chunk_shape. I would like to enable a hybrid approach, where a user could input chunk_mb=10.0, chunk_shape=(None, 64), and the GenericDataChunkIterator would determine the length of the remaining (None) dimension that brings the chunk close to the target chunk size.
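
To make the arithmetic concrete, here is a minimal sketch of the single-free-axis case. `fill_free_axis` is a hypothetical helper, not part of HDMF. It assumes 1 MB = 10^6 bytes and that the dataset's full shape (`maxshape`) and dtype are known. For an int16 dataset with a fixed axis of 64, a 10 MB target leaves 10^7 / (64 × 2) = 78,125 elements along the free axis.

```python
import math

import numpy as np


def fill_free_axis(chunk_mb, chunk_shape, maxshape, dtype):
    """Hypothetical helper: resolve a single None axis in chunk_shape.

    The free axis gets the largest length that keeps the chunk at or below
    chunk_mb megabytes (1 MB = 1e6 bytes is an assumption of this sketch),
    capped at the dataset's extent along that axis.
    """
    free_axes = [i for i, n in enumerate(chunk_shape) if n is None]
    if len(free_axes) != 1:
        raise ValueError("this sketch handles exactly one None axis")
    axis = free_axes[0]

    itemsize = np.dtype(dtype).itemsize
    # Bytes occupied by one "slice" along the free axis (all fixed axes).
    fixed_bytes = itemsize * math.prod(n for n in chunk_shape if n is not None)
    budget_bytes = chunk_mb * 1e6

    length = int(budget_bytes // fixed_bytes)
    # Never exceed the data extent, and never return a zero-length axis.
    length = max(1, min(length, maxshape[axis]))

    resolved = list(chunk_shape)
    resolved[axis] = length
    return tuple(resolved)
```

For example, `fill_free_axis(10.0, (None, 64), (1_000_000, 384), "int16")` gives `(78125, 64)`, exactly 10,000,000 bytes per chunk.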

Is your feature request related to a problem?

It is pretty common for users to have some insight into the likely read patterns of a dataset.

What solution would you like?

I would like GenericDataChunkIterator to find the chunk shape whose size (the product of its dimensions) is as large as possible while staying <= the target size. I would also like the chunk to be as cube-like as possible, so among those shapes I would minimize the sum of the chunk's dimensions. Previously, we tried building chunks that were scaled-down versions of the data shape, similar to h5py, but experience with Jeremy has shown that this approach is poorly suited to common data-reading routines, and I think a better naive assumption is that (hyper-)cube chunks are a good default.
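
A sketch of the cube-like criterion for several free axes follows. It is a greedy "water-filling" heuristic, not an exact solver for the max-product/min-sum objective: short axes are visited first and capped at their full extent, and the remaining element budget is split as evenly as possible across the longer axes using an integer k-th root. `cube_like_chunk_shape` and `int_root` are hypothetical names, and 1 MB = 10^6 bytes is assumed.

```python
import math

import numpy as np


def int_root(value, k):
    """Largest integer s with s**k <= value (value >= 0, k >= 1)."""
    s = int(round(value ** (1.0 / k)))
    # Correct for floating-point error in the k-th root.
    while s > 0 and s**k > value:
        s -= 1
    while (s + 1) ** k <= value:
        s += 1
    return s


def cube_like_chunk_shape(chunk_mb, chunk_shape, maxshape, dtype):
    """Hypothetical sketch: fill every None axis of chunk_shape so the chunk is
    at most chunk_mb megabytes and the free axes are as equal as the data shape
    allows. Greedy heuristic; it does not guarantee the exact maximum product.
    """
    itemsize = np.dtype(dtype).itemsize
    fixed = math.prod(n for n in chunk_shape if n is not None)
    # Budget for the free axes, counted in elements rather than bytes.
    budget = int(chunk_mb * 1e6 // (itemsize * fixed))

    free_axes = [i for i, n in enumerate(chunk_shape) if n is None]
    # Visit short axes first: they get capped at their full extent, and the
    # unused budget is then shared among the longer axes.
    free_axes.sort(key=lambda i: maxshape[i])

    resolved = list(chunk_shape)
    for position, axis in enumerate(free_axes):
        remaining_axes = len(free_axes) - position
        side = int_root(budget, remaining_axes)
        length = max(1, min(side, maxshape[axis]))
        resolved[axis] = length
        # Integer division keeps the running product within the budget.
        budget //= length
    return tuple(resolved)
```

For a float64 dataset of shape (10000, 10000, 10000) with `chunk_mb=10.0`, this returns `(107, 108, 108)`, about 9.98 MB. With `maxshape=(1_000_000, 32)`, an int16 dtype, and `chunk_shape=(None, None)`, the short axis is capped at 32 and the long axis absorbs the rest: `(156250, 32)`.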

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you checked the Contributing document?
Have you ensured this change was not already requested?


oruebel · 2023-11-08T18:50:48Z

@CodyCBakerPhD is this an issue you could help with, since you are most familiar with GenericDataChunkIterator?

bendichter · 2023-11-08T18:59:24Z

Above is a proposed solution. Obviously this needs tests, but I wanted to run it by the group before moving forward.

oruebel · 2023-11-08T19:22:34Z

To be honest, the functionality you describe sounds to me more like a utility function that would be more broadly useful for DataChunk iterators. I.e., this could be a function (e.g., max_chunk_shape) that a user would call to get a suggested chunk shape that they would then hand to the iterator, e.g.:

GenericDataChunkIterator(chunk_shape=max_chunk_shape(chunk_mb=10.0, chunk_shape=(None, 64)), ...)

This function could either live on its own in the same module as GenericDataChunkIterator or be a static method on AbstractDataChunkIterator or DataChunk (but I think a standalone function may make sense). The main reason I think this would be useful as a utility function is that it:

Makes the logic reusable
Makes the logic explicit: the user calls the function, rather than relying on additional "magic" inside the constructor of GenericDataChunkIterator

oruebel · 2023-11-08T19:23:55Z

Sorry, I didn't see that you made a draft PR, I was referring to the solution you suggested in the issue. Let me take a look at the PR.

oruebel added this to the 4.0 milestone Nov 8, 2023

oruebel added the priority: medium non-critical problem and/or affecting only a small set of users label Nov 8, 2023

oruebel added the category: enhancement improvements of code or code behavior label Nov 8, 2023

bendichter linked a pull request Nov 8, 2023 that will close this issue

propose alternative chunk shape algorithm #996

Draft

6 tasks

CodyCBakerPhD linked a pull request Nov 8, 2023 that will close this issue

Constrained chunk shape estimation #997

Draft

6 tasks

mavaylon1 removed this from the 4.0 milestone Mar 3, 2024
