VectorData Expand by Default via write_dataset #1093
Conversation
Notes and Questions:
Are you referring to `hdmf/src/hdmf/backends/hdf5/h5tools.py`, line 1319 in d85d0cb? If so, this function is used to write scalar datasets, i.e., datasets with a single value.
Could you point to the case you are referring to?
This would mean making all datasets expandable by enabling chunking for all datasets. That is a somewhat broader approach than making this the default just for VectorData, but it would make it the default behavior for all (non-scalar) datasets. If that is the approach we want to take, then I would suggest adding a parameter.
You mean an enable-chunking parameter to give the user the option to turn off the expandable default? If so, would there be a reason they would want to?
In my experience it is best to make choices explicit and provide useful defaults rather than hiding configuration. For example, a user may not want chunking if they want to use NumPy memory mapping to read contiguous datasets (see the sketch below).
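To illustrate that trade-off, here is a minimal sketch of memory-mapping a contiguous HDF5 dataset with NumPy, which is not possible once a dataset is chunked (the file and dataset names are illustrative):

```python
import h5py
import numpy as np

with h5py.File("data.h5", "r") as f:
    dset = f["timeseries"]
    # get_offset() returns the dataset's byte offset in the file for
    # contiguous storage, and None when the dataset is chunked
    offset = dset.id.get_offset()
    shape, dtype = dset.shape, dset.dtype

if offset is not None:
    # Map the raw contiguous block directly, bypassing h5py
    arr = np.memmap("data.h5", mode="r", dtype=dtype, shape=shape, offset=offset)
```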
@rly I will shoot to have this done by next week (previously Friday, May 3).
Dev Notes:
From my understanding, we only need to modify the input parameter options for `list_fill`. Oliver also mentioned being more explicit with a switch.
Yes, I believe that is correct. I think only the logic in `list_fill` needs to change.
Tests:
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@           Coverage Diff            @@
##              dev    #1093    +/-  ##
========================================
  Coverage   88.70%   88.70%
========================================
  Files          45       45
  Lines        9745     9748     +3
  Branches     2767     2769     +2
========================================
+ Hits         8644     8647     +3
  Misses        779      779
  Partials      322      322
@oruebel This is mostly done. I need to check/update or write a test that my changes do not interfere with existing maxshape settings. (And do another pass to make sure the logic is efficient.) However, the main point I want to bring up is your idea of having a parameter for turning expandability on and off. This would mean adding the parameter to `HDMFIO` as well.
I don't think the parameter needs to be in `HDMFIO`. I think it's OK to just add it as a parameter in `HDF5IO`.
`HDF5IO.write` needs to call `write_builder`. It does that by calling `super().write(**kwargs)`. This then gets us to `HDMFIO.write`, which calls `write_builder`.
Yes, but see lines 80 to 81 in 126bdb1: the keyword arguments there are passed through to line 99 in 126bdb1. So you can add custom keyword arguments without having to add them in `HDMFIO`.
Well, isn't that just right in front of my face.
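In other words, the pattern being pointed out is roughly the following (a simplified sketch, not the actual hdmf classes; the `expandable` parameter name is hypothetical):

```python
class HDMFIO:
    def write(self, container, **kwargs):
        builder = object()  # placeholder for the real build step
        self.write_builder(builder, **kwargs)  # extra kwargs flow through

    def write_builder(self, builder, **kwargs):
        raise NotImplementedError


class HDF5IO(HDMFIO):
    def write(self, container, expandable=True, **kwargs):
        # the backend-specific keyword rides along via super().write()
        # without HDMFIO having to declare it
        super().write(container, expandable=expandable, **kwargs)

    def write_builder(self, builder, expandable=True, **kwargs):
        print(f"writing builder with expandable={expandable}")
```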
Notes:
Co-authored-by: Oliver Ruebel <[email protected]>
I added some minor suggestions, but otherwise this looks good to me.
Thanks for the quick review. I will make the docstring more detailed, but take a look at my comments for the other changes. The `pass` was deliberate (vs. a leftover from a draft), and I like the warning.
Looks good to me. Thanks!
@@ -7,6 +7,7 @@
- Added `TypeConfigurator` to automatically wrap fields with `TermSetWrapper` according to a configuration file. @mavaylon1 [#1016](https://github.com/hdmf-dev/hdmf/pull/1016)
- Updated `TermSetWrapper` to support validating a single field within a compound array. @mavaylon1 [#1061](https://github.com/hdmf-dev/hdmf/pull/1061)
- Updated testing to not install in editable mode and not run `coverage` by default. @rly [#1107](https://github.com/hdmf-dev/hdmf/pull/1107)
- Updated the default behavior for writing HDF5 datasets to be expandandable datasets with chunking enabled by default. This does not override user set chunking parameters. @mavaylon1 [#1093](https://github.com/hdmf-dev/hdmf/pull/1093)
expandandable -> expandable
Could you add documentation on how to expand a `VectorData`? It looks like creation of a dataset of references is not modified here. Some tables in NWB contain columns that are all references; e.g., the electrode table has a column with references to the `ElectrodeGroup`. I think such datasets should be expandable as well.
Yeah, the lack of support for datasets of references was just to keep the scope of this idea smaller. I agree it makes a lot of sense to have, and I will make an issue ticket for it. As for documentation on expanding a `VectorData`, I thought we had that; maybe I am thinking of the HDF5 documentation, but I will look. If it does not exist, I will fold that into the ticket for datasets of references.
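For reference, expanding a column that was written with an unlimited `maxshape` looks roughly like this in plain h5py (the file and dataset paths are hypothetical):

```python
import h5py
import numpy as np

with h5py.File("example.h5", "a") as f:
    dset = f["units/spike_times"]  # hypothetical dataset path
    new_values = np.array([10.0, 11.5])
    # resize() only succeeds if the dataset was created with
    # maxshape=(None,), i.e., with the expandable default
    dset.resize(dset.shape[0] + len(new_values), axis=0)
    dset[-len(new_values):] = new_values
```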
Motivation
What was the reasoning behind this change? Please explain the changes briefly.
This change makes expandable datasets the default behavior when writing `VectorData` data. We do this by providing a `maxshape` in the dataset settings whenever the user has not already defined one (see the sketch below).
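A minimal sketch of that idea in plain h5py (the function and settings names are illustrative, not the actual hdmf internals):

```python
import h5py
import numpy as np

def write_expandable(group: h5py.Group, name: str, data, io_settings: dict):
    """Write data as an HDF5 dataset, defaulting to an expandable
    layout unless the caller already supplied a maxshape."""
    data = np.asarray(data)
    if io_settings.get("maxshape") is None:
        # None in every dimension means "unlimited" growth in HDF5
        io_settings["maxshape"] = tuple(None for _ in data.shape)
        # an unlimited maxshape requires chunking; let h5py auto-chunk if unset
        io_settings.setdefault("chunks", True)
    return group.create_dataset(name, data=data, **io_settings)
```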
How to test the behavior?
Tests
Checklist
- Did you update CHANGELOG.md with your changes?