Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow arbitrary precision for generic numeric types #1631

Open
3 tasks done
bendichter opened this issue Jan 10, 2023 · 4 comments
Open
3 tasks done

Allow arbitrary precision for generic numeric types #1631

bendichter opened this issue Jan 10, 2023 · 4 comments
Assignees
Labels
category: proposal proposed enhancements or new features priority: high impacts proper operation or use of feature important to most users topic: validator issues related to validation of files
Milestone

Comments

@bendichter
Copy link
Contributor

bendichter commented Jan 10, 2023

What would you like to see added to PyNWB?

I just got this question from @urut

We're exporting things with matnwb and running through pynwb validator and it gives error below. question is, what is meant by 'int' here -- int32? single?

image (25)

The answer is int32, but I think this brings up a good point, and this is a common hurdle users producing NWB files outside of PyNWB.

IMO, specifying type precision this way has two problems:

  1. It's unclear. I get that single precision is often the default, so int can be shorthand for int32, but that isn't always the case: MATLAB does not have a default integer precision (int64(5) works but int(5) does not) and the default precision for floats is double (64). Of course when you are creating a variable you must choose a precision, but when used in a schema to me it's more natural to interpret this an int of any precision in this case.
  2. It is inflexible. I would also prefer the "any precision" interpretation because it would allow the user to decide what precision they need. Enforcing a higher precision than necessary is wasteful and honestly I don't really see an upside.

Is your feature request related to a problem?

No response

What solution would you like?

My most preferred solution would be to map "int" to "int8", "uint" to "uint8", and "float" to "float8". I'm pretty sure this would not invalidate any currently valid NWB files and would not break any tools.

If there is some reason why keeping single-precision is important, then my next favorite solution would be to change the schema to make the data requirement explicit, changing all instances of "int" to "int32".

If that is too big of a lift, then another solution that would at least move us in the right direction would be to enhance the messages produced by the validator to be explicit about the precision requirement.

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

@urut
Copy link

urut commented Jan 10, 2023

Makes sense to me if the validation for "int" would be "any precision". Often these numbers are small like a TTL so int8 would be sufficient. Or, at very least, the error should be specific and say what is required (int32) because "int" is ambiguous (and does not exist in Matlab, it has int8, int16, int32, single etc).

@t-b
Copy link
Collaborator

t-b commented Jan 10, 2023

Makes sense to me if the validation for "int" would be "any precision".

Yes, that is my preference as well. I can't find the reference right now, the closest I could find was #698 (comment). But we have that defined somewhere in the schema.

@bendichter Does the file really contain int8 data or is just the YAML/JSON datatype int8?

@bendichter
Copy link
Contributor Author

bendichter commented Jan 10, 2023

Decision from meeting: map "int" to "int8" in the language and in the HDMF library, effectively allowing any precision of integer for anything labeled "int". It will also require a change to the PyNWB validator.

  • change the hdmf schema language docs
  • change the pynwb schema language docs
  • change the hdmf repo to map int to int8, etc.
  • change matnwb to map to int to int8, etc.
  • update pynwb validator

@rly am I missing anything?

@oruebel oruebel added this to the Next Release milestone Jan 10, 2023
@oruebel oruebel added priority: high impacts proper operation or use of feature important to most users category: proposal proposed enhancements or new features topic: validator issues related to validation of files labels Jan 10, 2023
@bendichter bendichter changed the title [Discussion]: validation of generic data types, e.g. "int" and "float" Allow arbitrary precision for generic numeric types Jan 11, 2023
@stephprince stephprince modified the milestones: 2.7.0, 2.8.0 May 9, 2024
@stephprince stephprince modified the milestones: 2.8.0, 2.9.0 Jul 23, 2024
@rly rly modified the milestones: 2.9.0, Next Major Release - 3.0 Sep 5, 2024
@rly
Copy link
Contributor

rly commented Nov 7, 2024

As a note, if the id columns of a DynamicTable (type: ElementIdentifiers) is initialized to be an int8 dataset and is set to be expandable, then you cannot add more than 128 rows to the table. You would have to delete the dataset and re-create it with a different dtype. Similarly, if a DynamicTableRegion is initialized to be an int8 dataset, referenced row indices could not be larger than 128. This is rare and may never come up, but is why @oruebel and I would prefer to keep the dtype of ElementIdentifiers and DynamicTableRegion to be int32 instead of int8.

I am OK with changing the meaning of "int" to int8, implemented here: hdmf-dev/hdmf-schema-language#38

But note that the only areas where "int" is used nowadays is in ElementIdentifiers and DynamicTableRegion and if we change the dtype of these types to be int32, then the only places "int" would be observed is in extensions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: proposal proposed enhancements or new features priority: high impacts proper operation or use of feature important to most users topic: validator issues related to validation of files
Projects
None yet
Development

No branches or pull requests

6 participants