
Storage efficiency #33

Closed
skeating opened this issue Jul 28, 2024 · 1 comment

Comments

@skeating
Contributor

My initial tests have assumed that there will be 30 patients generating data from one 50Hz and one 300Hz waveform source at all times.

At this rate of data flow, my very naive DB implementation generates ~100 GB of backend Postgres disk usage per day. That is clearly far too much if we're aiming for the UDS to stay under 1 TB, although that figure may be quite out of date!

You can calculate a theoretical minimum:
30 patients × 86,400 seconds/day × 350 samples/second ≈ 907 million data points per day.
I don't know what numerical type the original data uses, but assuming 8 bytes per data point, that's ~7 GB per day. If we keep only the last 7 days of data, that caps it at ~51 GB overall. We'll need to allow for some row metadata and for the underlying DB padding/aligning rows, so the real figure will be somewhat higher. I'm assuming compression is impossible.
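The arithmetic above can be checked with a short script. The patient count, sample rates, 8-byte sample size and 7-day retention are the assumptions stated in this issue, not measured values:

```python
# Back-of-the-envelope check of the storage figures above.
# Assumptions (from the issue text): 30 patients, one 50 Hz and one
# 300 Hz waveform each, 8 bytes (float64) per sample, 7-day retention,
# no compression and no per-row overhead.

PATIENTS = 30
SAMPLES_PER_SEC = 50 + 300   # combined rate per patient
SECONDS_PER_DAY = 86_400
BYTES_PER_SAMPLE = 8
RETENTION_DAYS = 7

points_per_day = PATIENTS * SECONDS_PER_DAY * SAMPLES_PER_SEC
gb_per_day = points_per_day * BYTES_PER_SAMPLE / 1e9
gb_retained = gb_per_day * RETENTION_DAYS

print(f"{points_per_day:,} points/day")         # 907,200,000
print(f"{gb_per_day:.1f} GB/day raw payload")   # 7.3
print(f"{gb_retained:.1f} GB over 7 days")      # 50.8
```

Any real schema will add per-row and index overhead on top of this raw payload.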

Using SQL arrays (i.e. batching many consecutive samples into a single row) is likely to reduce the storage needed significantly compared with the naive one-row-per-sample implementation.

@jeremyestein
Collaborator

As an epic, this is part of #27
For design discussion see https://github.com/UCLH-DHCT/emap/blob/jeremy/hf-data/docs/dev/features/waveform_hf_data.md

@jeremyestein closed this as not planned on Jul 29, 2024