Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize calculate_data and redact_data() #361

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

Lash-L
Copy link
Contributor

@Lash-L Lash-L commented Nov 7, 2024

redact data likely needs more work. Gonna let my instance run for a while and then try it. With limited data it does show some pretty large improvements.

There were some merge conflicts for calculate_data and update_advertisement. So if you could just give those a extra lookover that would probably be good!

Copy link
Owner

@agittins agittins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great stuff :D

I'm also thinking that we should:

  • in redaction_list_update():

    • store a (errr... static local?) stamp of MONOTONIC_TIME(), and if the last one was more than... maybe several hours? Then re-initialise list so we don't end up iterating over fixes for addresses (and associated matches) that have long since been purged from the system (ie, when a user does two diags but quite some time apart). There's probably a trade-off between cache miss/size cost and regeneration-time.
    • (maybe/or) schedule a job in hass to purge the redactions[] after a certain time, to free up the memory I assume they'd take up. Maybe it's not big enough to matter.
    • never apply .upper() to address (since we always normalise the address to lowercase) and always lower() any non-known-lowered address when adding/searching keys in redactions.
  • Then in redact_data() we can remove the case-checking variants, and always look for redactions[thing.lower()] / if thing.lower() in redactions or if find in data.lower().

Also, maybe we can ditch the re.sub altogether and use python's string.replace() since it might be a lot faster and we're not using the power of regex at all, really. Notes should be added to the update_redactions info to explain that the keys are not regex, but lower() forms of literal strings to match.

Also...

This comment is an interesting idea that might apply to our use-case:

If the replacements do not change the string size, you can speed up things a lot by using bytearray / buffers or even mmap and modify the data in-place. (re.sub() and string.replace and endstring += line cause that a lot of memory is copyied around.)

custom_components/bermuda/bermuda_device_scanner.py Outdated Show resolved Hide resolved
@@ -384,12 +397,16 @@ def calculate_data(self):
)
# Discard the bogus reading by duplicating the last.
self.hist_distance_by_interval.insert(0, self.hist_distance_by_interval[0])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a bug, just reported in #362 - might be worth including in this PR as a quickie? Suggestion is un-tested.

Suggested change
self.hist_distance_by_interval.insert(0, self.hist_distance_by_interval[0])
if len(self.hist_distance_by_interval) == 0:
self.hist_distance_by_interval = [ self.rssi_distance_raw ]
else:
self.hist_distance_by_interval.insert(0, self.hist_distance_by_interval[0])

custom_components/bermuda/bermuda_device_scanner.py Outdated Show resolved Hide resolved
custom_components/bermuda/bermuda_device_scanner.py Outdated Show resolved Hide resolved
custom_components/bermuda/coordinator.py Outdated Show resolved Hide resolved
@Lash-L
Copy link
Contributor Author

Lash-L commented Nov 11, 2024

I'm also thinking that we should:

  • in redaction_list_update():

    • store a (errr... static local?) stamp of MONOTONIC_TIME(), and if the last one was more than... maybe several hours? Then re-initialise list so we don't end up iterating over fixes for addresses (and associated matches) that have long since been purged from the system (ie, when a user does two diags but quite some time apart). There's probably a trade-off between cache miss/size cost and regeneration-time.
    • (maybe/or) schedule a job in hass to purge the redactions[] after a certain time, to free up the memory I assume they'd take up. Maybe it's not big enough to matter.
    • never apply .upper() to address (since we always normalise the address to lowercase) and always lower() any non-known-lowered address when adding/searching keys in redactions.
  • Then in redact_data() we can remove the case-checking variants, and always look for redactions[thing.lower()] / if thing.lower() in redactions or if find in data.lower().

Also, maybe we can ditch the re.sub altogether and use python's string.replace() since it might be a lot faster and we're not using the power of regex at all, really. Notes should be added to the update_redactions info to explain that the keys are not regex, but lower() forms of literal strings to match.

  • in redaction_list_update():

    • store a (errr... static local?) stamp of MONOTONIC_TIME(), and if the last one was more than... maybe several hours? Then re-initialise list so we don't end up iterating over fixes for addresses (and associated matches) that have long since been purged from the system (ie, when a user does two diags but quite some time apart). There's probably a trade-off between cache miss/size cost and regeneration-time.
    • (maybe/or) schedule a job in hass to purge the redactions[] after a certain time, to free up the memory I assume they'd take up. Maybe it's not big enough to matter.
    • never apply .upper() to address (since we always normalise the address to lowercase) and always lower() any non-known-lowered address when adding/searching keys in redactions.
  • Then in redact_data() we can remove the case-checking variants, and always look for redactions[thing.lower()] / if thing.lower() in redactions or if find in data.lower().

  • ditch the re.sub

@Lash-L
Copy link
Contributor Author

Lash-L commented Nov 11, 2024

Ready for another lookover @agittins

Redactions are deleted every 8 hours.

@agittins
Copy link
Owner

Just a heads-up that I've had to make some changes to get some bugfixes out, so I've rebased your PR against main, I hope I've done that right :-/ There might be some areas (particularly in coordinator) where I've made changes to main that might need careful merging - I'm not 100% sure, I'm still getting to grips with handling merge conflicts (only been mucking around with various VCSs since the early 2000's, can't go too hard on myself! 😅 )

Re the purging, I think there are job/task types that should cancel automatically when the config entry is blatted - I think the ones you attach via the config entry itself might fit that - but I didn't find any that let you schedule future jobs when I was altering the device registry refresh thingie, so just used a timestamp.

@Lash-L
Copy link
Contributor Author

Lash-L commented Nov 13, 2024

Your merge looks fine!

Yeah there is probably a better way to handle it, but honestly this seems to be fine imo. I think the tests were just being annoying

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants