Fix bug when exporting HDF5 datasets with unlimited dimension #155

Merged 5 commits on Jan 16, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,9 @@
 ### Enhancements
 * Enhanced `ZarrIO` and `ZarrDataIO` to infer io settings (e.g., chunking and compression) from HDF5 datasets to preserve storage settings on export if possible @oruebel [#153](https://github.com/hdmf-dev/hdmf-zarr/pull/153)

+### Bug Fixes
+* Fixed bug when converting HDF5 datasets with unlimited dimensions @oruebel [#155](https://github.com/hdmf-dev/hdmf-zarr/pull/155)
+
 ## 0.5.0 (December 8, 2023)

 ### Enhancements
16 changes: 11 additions & 5 deletions src/hdmf_zarr/backend.py
@@ -1174,9 +1174,8 @@ def __list_fill__(self, parent, name, data, options=None):  # noqa: C901
         io_settings = dict()
         if options is not None:
             dtype = options.get('dtype')
-            io_settings = options.get('io_settings')
-            if io_settings is None:
-                io_settings = dict()
+            if options.get('io_settings') is not None:
+                io_settings = options.get('io_settings')
         # Determine the dtype
         if not isinstance(dtype, type):
             try:
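
The `io_settings` change in this hunk reads as a refactor that keeps the defaulting behaviour unchanged. A minimal sketch comparing the two variants, assuming the surrounding code matches what the hunk shows; `settings_before`, `settings_after`, and the `chunks` key are hypothetical names used only for illustration:

```python
def settings_before(options=None):
    """Old variant: assign first, then replace a None result with an empty dict."""
    io_settings = dict()
    if options is not None:
        io_settings = options.get('io_settings')
        if io_settings is None:
            io_settings = dict()
    return io_settings


def settings_after(options=None):
    """New variant: only overwrite the default when a value is actually present."""
    io_settings = dict()
    if options is not None:
        if options.get('io_settings') is not None:
            io_settings = options.get('io_settings')
    return io_settings


# Cover: no options, missing key, explicit None, and a real settings dict
cases = [None, {}, {'io_settings': None}, {'io_settings': {'chunks': (2,)}}]
results = [(settings_before(case), settings_after(case)) for case in cases]
```

For all four cases both variants return the same dictionary, so the functional fix of this pull request sits in the shape handling of the next hunk.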
@@ -1191,9 +1190,16 @@ def __list_fill__(self, parent, name, data, options=None):  # noqa: C901
         # Determine the shape and update the dtype if necessary when dtype==object
         if 'shape' in io_settings:  # Use the shape set by the user
             data_shape = io_settings.pop('shape')
-        # If we have a numeric numpy array then use its shape
+        # If we have a numeric numpy-like array (e.g., numpy.array or h5py.Dataset) then use its shape
         elif isinstance(dtype, np.dtype) and np.issubdtype(dtype, np.number) or dtype == np.bool_:
-            data_shape = get_data_shape(data)
+            # HDMF's get_data_shape may return the maxshape of an HDF5 dataset which can include None values
+            # which Zarr does not allow for dataset shape. Check for the shape attribute first before falling
+            # back on get_data_shape
+            if hasattr(data, 'shape') and data.shape is not None:
+                data_shape = data.shape
+            # This is a fall-back just in case. However this should not happen for standard numpy and h5py arrays
+            else:  # pragma: no cover
+                data_shape = get_data_shape(data)  # pragma: no cover
         # Deal with object dtype
         elif isinstance(dtype, np.dtype):
             data = data[:]  # load the data in case we come from HDF5 or another on-disk data source we don't know
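
The shape handling in this hunk can be illustrated with a small standalone sketch. `FakeDataset` is a hypothetical stand-in for an `h5py.Dataset` created with `maxshape=(None,)`, and `shape_via_maxshape` is a simplified imitation of the behaviour that the code comment attributes to HDMF's `get_data_shape`; it is not HDMF's actual implementation.

```python
class FakeDataset:
    """Hypothetical stand-in for an h5py.Dataset with an unlimited dimension."""

    def __init__(self, values, maxshape):
        self._values = list(values)
        self.shape = (len(self._values),)  # current extent: concrete integers
        self.maxshape = maxshape           # None marks an unlimited dimension


def shape_via_maxshape(data):
    """Simplified imitation (assumption): prefer maxshape when it is available."""
    if getattr(data, 'maxshape', None) is not None:
        return tuple(data.maxshape)
    return (len(data),)


def resolve_shape(data):
    """Mirror the patched branch: use .shape first, fall back only without it."""
    if hasattr(data, 'shape') and data.shape is not None:
        return tuple(data.shape)
    return shape_via_maxshape(data)


dset = FakeDataset(range(5), maxshape=(None,))
old_shape = shape_via_maxshape(dset)  # contains None, unusable as a Zarr shape
new_shape = resolve_shape(dset)       # concrete current shape
```

Under these assumptions `old_shape` is `(None,)` while `new_shape` is `(5,)`, which is the difference the patch relies on when it creates the Zarr dataset.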
6 changes: 6 additions & 0 deletions tests/unit/test_io_convert.py
@@ -868,6 +868,12 @@ def __get_data_array(self, foo_container):
         """For a container created by __roundtrip_data return the data array"""
         return foo_container.buckets['bucket1'].foos['foo1'].my_data

+    def test_maxshape(self):
+        """test when maxshape is set for the dataset"""
+        data = H5DataIO(data=list(range(5)), maxshape=(None,))
+        self.__roundtrip_data(data=data)
+        self.assertContainerEqual(self.out_container, self.read_container, ignore_hdmf_attrs=True)
+
     def test_nofilters(self):
         """basic test that export without any options specified is working as expected"""
         data = list(range(5))