Allow arbitrary precision for generic numeric types #1631

bendichter · 2023-01-10T12:50:22Z

What would you like to see added to PyNWB?

I just got this question from @urut

We're exporting things with matnwb and running through pynwb validator and it gives error below. question is, what is meant by 'int' here -- int32? single?

The answer is int32, but I think this brings up a good point, and this is a common hurdle users producing NWB files outside of PyNWB.

IMO, specifying type precision this way has two problems:

It's unclear. I get that single precision is often the default, so int can be shorthand for int32, but that isn't always the case: MATLAB does not have a default integer precision (int64(5) works but int(5) does not) and the default precision for floats is double (64). Of course when you are creating a variable you must choose a precision, but when used in a schema to me it's more natural to interpret this an int of any precision in this case.
It is inflexible. I would also prefer the "any precision" interpretation because it would allow the user to decide what precision they need. Enforcing a higher precision than necessary is wasteful and honestly I don't really see an upside.

Is your feature request related to a problem?

No response

What solution would you like?

My most preferred solution would be to map "int" to "int8", "uint" to "uint8", and "float" to "float8". I'm pretty sure this would not invalidate any currently valid NWB files and would not break any tools.

If there is some reason why keeping single-precision is important, then my next favorite solution would be to change the schema to make the data requirement explicit, changing all instances of "int" to "int32".

If that is too big of a lift, then another solution that would at least move us in the right direction would be to enhance the messages produced by the validator to be explicit about the precision requirement.

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

I agree to follow this project's Code of Conduct
Have you checked the Contributing document?
Have you ensured this change was not already requested?

The text was updated successfully, but these errors were encountered:

urut · 2023-01-10T15:43:54Z

Makes sense to me if the validation for "int" would be "any precision". Often these numbers are small like a TTL so int8 would be sufficient. Or, at very least, the error should be specific and say what is required (int32) because "int" is ambiguous (and does not exist in Matlab, it has int8, int16, int32, single etc).

t-b · 2023-01-10T17:35:50Z

Makes sense to me if the validation for "int" would be "any precision".

Yes, that is my preference as well. I can't find the reference right now, the closest I could find was #698 (comment). But we have that defined somewhere in the schema.

@bendichter Does the file really contain int8 data or is just the YAML/JSON datatype int8?

bendichter · 2023-01-10T17:45:59Z

Decision from meeting: map "int" to "int8" in the language and in the HDMF library, effectively allowing any precision of integer for anything labeled "int". It will also require a change to the PyNWB validator.

change the hdmf schema language docs
change the pynwb schema language docs
change the hdmf repo to map int to int8, etc.
change matnwb to map to int to int8, etc.
update pynwb validator

@rly am I missing anything?

rly · 2024-11-07T23:03:58Z

As a note, if the id columns of a DynamicTable (type: ElementIdentifiers) is initialized to be an int8 dataset and is set to be expandable, then you cannot add more than 128 rows to the table. You would have to delete the dataset and re-create it with a different dtype. Similarly, if a DynamicTableRegion is initialized to be an int8 dataset, referenced row indices could not be larger than 128. This is rare and may never come up, but is why @oruebel and I would prefer to keep the dtype of ElementIdentifiers and DynamicTableRegion to be int32 instead of int8.

I am OK with changing the meaning of "int" to int8, implemented here: hdmf-dev/hdmf-schema-language#38

But note that the only areas where "int" is used nowadays is in ElementIdentifiers and DynamicTableRegion and if we change the dtype of these types to be int32, then the only places "int" would be observed is in extensions.

bendichter assigned rly and oruebel Jan 10, 2023

bendichter unassigned oruebel Jan 10, 2023

oruebel added this to the Next Release milestone Jan 10, 2023

oruebel added priority: high impacts proper operation or use of feature important to most users category: proposal proposed enhancements or new features topic: validator issues related to validation of files labels Jan 10, 2023

bendichter changed the title ~~[Discussion]: validation of generic data types, e.g. "int" and "float"~~ Allow arbitrary precision for generic numeric types Jan 11, 2023

stephprince modified the milestones: 2.7.0, 2.8.0 May 9, 2024

stephprince modified the milestones: 2.8.0, 2.9.0 Jul 23, 2024

rly modified the milestones: 2.9.0, Next Major Release - 3.0 Sep 5, 2024

rly mentioned this issue Oct 3, 2024

Change "int" meaning to "int8", reorder table hdmf-dev/hdmf-schema-language#38

Open

rly mentioned this issue Nov 7, 2024

Change dtype of ElementIdentifiers and DynamicTableRegion to int32 hdmf-dev/hdmf-common-schema#85

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow arbitrary precision for generic numeric types #1631

Allow arbitrary precision for generic numeric types #1631

bendichter commented Jan 10, 2023 •

edited

Loading

urut commented Jan 10, 2023

t-b commented Jan 10, 2023

bendichter commented Jan 10, 2023 •

edited

Loading

rly commented Nov 7, 2024

Allow arbitrary precision for generic numeric types #1631

Allow arbitrary precision for generic numeric types #1631

Comments

bendichter commented Jan 10, 2023 • edited Loading

What would you like to see added to PyNWB?

Is your feature request related to a problem?

What solution would you like?

Do you have any interest in helping implement the feature?

Code of Conduct

urut commented Jan 10, 2023

t-b commented Jan 10, 2023

bendichter commented Jan 10, 2023 • edited Loading

rly commented Nov 7, 2024

bendichter commented Jan 10, 2023 •

edited

Loading

bendichter commented Jan 10, 2023 •

edited

Loading