Data quality metrics #57

Open
jeremyestein opened this issue Sep 25, 2024 · 0 comments

Part of epic: #28

Design some metrics that measure data quality, likely in the form of SQL statements.
These would be run as part of pre-release validation, or as part of routine monitoring of Emap.

Gaps

(Definition: data is missing)
The collation algorithm tries to minimise temporary gaps by waiting for more data to arrive before performing collation.
However, after some time (~10 seconds) it collates whatever it has, gaps included, and sends the result to the Emap core processor. The delayed data will hopefully arrive later, but that is beyond our control.

We can test for gaps in the DB by looking at each row's observation date and cardinality of the values array (rows will be of varying cardinality - see fragmentation below).
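As a rough illustration of that check, here is a minimal Python sketch rather than the eventual SQL. The `Row` shape, its field names, the 300 Hz rate in the usage example and the 1 ms tolerance are all assumptions made for the example, not the real Emap schema. For each row it works out when the next sample should arrive (observation time plus cardinality divided by sampling rate) and reports a gap if the next row starts later than that.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Row:
    # Hypothetical row shape, not the actual Emap table definition.
    observation_time: datetime  # time of the first sample in the row
    num_samples: int            # cardinality of the values array
    sampling_rate_hz: int


def find_gaps(rows, tolerance=timedelta(milliseconds=1)):
    """Return (expected_start, actual_start) pairs where data is missing.

    Assumes all rows belong to a single waveform stream.
    """
    gaps = []
    ordered = sorted(rows, key=lambda r: r.observation_time)
    for prev, nxt in zip(ordered, ordered[1:]):
        # Where the next row should start if no samples were lost.
        expected = prev.observation_time + timedelta(
            seconds=prev.num_samples / prev.sampling_rate_hz
        )
        if nxt.observation_time - expected > tolerance:
            gaps.append((expected, nxt.observation_time))
    return gaps


# Usage: the second row is short (fragmented) and is followed by a 5 s gap.
t0 = datetime(2024, 9, 25, 12, 0, 0)
example_rows = [
    Row(t0, 3000, 300),
    Row(t0 + timedelta(seconds=10), 1500, 300),
    Row(t0 + timedelta(seconds=20), 3000, 300),
]
example_gaps = find_gaps(example_rows)
```

A short row on its own does not count as a gap here; only a mismatch between a row's end and the next row's start does, which keeps this metric separate from fragmentation.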

Fragmentation

(Definition: Emap interchange messages are received containing fewer than the target 3000 samples, so more DB rows are used than would have been ideal, but no data is actually missing)

It's almost inevitable that messages will occasionally be delayed or out of order, so some fragmentation is to be expected. But excessive fragmentation might be a sign of a bug in the collation algorithm, or an unusual set of messages that was not considered.
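One possible way to put a number on "excessive", sketched in Python under assumptions: compare the rows actually used with the minimum needed to hold the same samples at the 3000-sample target. The function name and the idea of using this ratio as the metric are illustrative only; what threshold counts as excessive is left open.

```python
import math

TARGET_SAMPLES_PER_ROW = 3000  # target from the definition above


def fragmentation_ratio(row_sample_counts, target=TARGET_SAMPLES_PER_ROW):
    """Actual row count divided by the minimum row count for the same samples.

    1.0 means no fragmentation; larger values mean more rows than ideal.
    Hypothetical metric: assumes the counts come from one gap-free stream.
    """
    total_samples = sum(row_sample_counts)
    if total_samples == 0:
        return 1.0  # nothing stored, so nothing is fragmented
    ideal_rows = math.ceil(total_samples / target)
    return len(row_sample_counts) / ideal_rows


# Usage: 9000 samples would ideally fit in 3 rows but occupy 4.
example_ratio = fragmentation_ratio([3000, 1500, 1500, 3000])
```

A real gap also forces a new row, so in practice the ideal row count would probably need to be computed per contiguous run of data rather than over the whole stream.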

Completeness

If we're running validation from a known waveform test source, then we know how many data points it should produce and can compare that against the number actually stored.
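A minimal sketch of such a comparison in Python, assuming the test source can be described by a duration and a sampling rate (both hypothetical parameters here, as is the function name):

```python
def completeness(row_sample_counts, source_duration_secs, sampling_rate_hz):
    """Fraction of the expected data points that are present in the DB.

    Assumption: the known test source emits a constant-rate stream, so the
    expected count is simply duration * rate.
    """
    expected = source_duration_secs * sampling_rate_hz
    if expected <= 0:
        raise ValueError("expected sample count must be positive")
    return sum(row_sample_counts) / expected


# Usage: a 60 s source at 300 Hz should yield 18000 points; 17100 were stored.
example_completeness = completeness(
    [3000, 3000, 3000, 3000, 3000, 2100],
    source_duration_secs=60,
    sampling_rate_hz=300,
)
```

A value below 1.0 would indicate missing data, while a value above 1.0 would suggest duplicated samples.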
