improve performance of read_object with idx by reading in whole objec… #33

Closed
wants to merge 1 commit into from

lvarriano
Contributor

…t, then slicing

Significantly improves the performance of read_object when using the idx parameter (see #29), at the cost of slightly increased memory use: up to about 150 MB for one channel of a 15 GB raw cal file, and typically (if not always) only a single channel is loaded at a time. Depending on the specifics of the file, such as the number of events, the number of idx entries, and the length of the gaps between them, this can improve performance by up to ~100x.
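
For readers who have not seen the change itself, the idea of "read the whole object, then slice" can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names are hypothetical, and a NumPy array stands in for an `h5py.Dataset`. With a real h5py dataset, `dataset[idx]` asks HDF5 to select the rows (assuming `idx` is sorted in increasing order), while `dataset[...]` reads the full dataset into memory first.

```python
import numpy as np


def read_with_h5_index(dataset, idx):
    """Let the storage layer select the rows (hypothetical helper).

    With an h5py.Dataset this is a fancy-index read performed by HDF5.
    """
    return dataset[idx]


def read_then_slice(dataset, idx):
    """Read the whole dataset into memory, then select rows with NumPy
    (hypothetical helper). Uses more memory, but the row selection itself
    happens in memory rather than in the file.
    """
    whole = np.asarray(dataset[...])  # full read into a NumPy array
    return whole[idx]


# A NumPy array used as a stand-in for an h5py.Dataset (assumption).
data = np.arange(20).reshape(10, 2)
idx = np.array([1, 4, 7])

# Both strategies return the same rows; only the access pattern differs.
assert np.array_equal(read_with_h5_index(data, idx), read_then_slice(data, idx))
```

The trade-off is the one described above: the second function holds the full dataset in memory for a moment in exchange for a faster selection.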

@jasondet
Contributor

jasondet commented Nov 6, 2023

  • add a flag like use_h5idx to read the original way in case the user cares more about limiting memory than speed
  • add blurbs in the documentation about this issue, linking to the issues where this is discussed, and comment on the fact that the "fast" version can't be used with obj_buf
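
One way the suggested flag could look, as a rough sketch only: `read_rows` is a hypothetical name, `use_h5idx` is taken from the suggestion above, and none of this reflects the actual signature of read_object. The interaction with obj_buf is deliberately left out.

```python
import numpy as np


def read_rows(dataset, idx, use_h5idx=False):
    """Return the rows of `dataset` selected by `idx` (hypothetical sketch).

    use_h5idx=True  : index the dataset directly, limiting memory use.
    use_h5idx=False : read everything, then slice in memory (faster,
                      but temporarily holds the full dataset).
    """
    idx = np.asarray(idx)
    if use_h5idx:
        # Original behaviour: the storage layer performs the selection.
        return dataset[idx]
    # "Fast" behaviour: full read followed by an in-memory selection.
    return np.asarray(dataset[...])[idx]


# A NumPy array used as a stand-in for an h5py.Dataset (assumption).
values = np.arange(10) * 10
fast = read_rows(values, [2, 5])
low_memory = read_rows(values, [2, 5], use_h5idx=True)
```

Defaulting the flag to the fast path matches the intent of the PR, while users who care more about memory than speed can opt back into the original read.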

@lvarriano
Contributor Author

This works for a single file without a problem, but once the unit tests try to read multiple files, some issues with array views and indexing crop up. Handling this turns out to be significantly more complicated because of how things are structured when reading multiple files. I will close this pull request for now until a better implementation can be worked out.

@lvarriano lvarriano closed this Nov 6, 2023
2 participants