You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Design some metrics that measure the data quality, likely in the form of SQL statements.
To be run as part of pre-release validation, or as part of routine monitoring of Emap.
Gaps
(Definition: data is missing)
The collation algorithm tries to minimise temporary gaps by waiting for more data to arrive before performing collation.
However at some point (~10 secs) it decides to collate what it has regardless of gaps and send to the Emap core proc. Hopefully the delayed data will come in later but this is beyond our control.
We can test for gaps in the DB by looking at each row's observation date and cardinality of the values array (rows will be of varying cardinality - see fragmentation below).
Fragmentation
(Definition: Emap interchange messages are received containing less than the target 3000 samples, thus more rows are used in the DB than would have been ideal, but data is not actually missing)
It's almost inevitable that messages will occasionally be delayed or out of order, so some fragmentation is to be expected. But excessive fragmentation might be a sign of a bug in the collation algorithm, or an unusual set of messages that was not considered.
Completeness
If we're running validation from a known waveform test source, then we should know how many data points it should have
The text was updated successfully, but these errors were encountered:
Part of epic: #28
Design some metrics that measure the data quality, likely in the form of SQL statements.
To be run as part of pre-release validation, or as part of routine monitoring of Emap.
Gaps
(Definition: data is missing)
The collation algorithm tries to minimise temporary gaps by waiting for more data to arrive before performing collation.
However at some point (~10 secs) it decides to collate what it has regardless of gaps and send to the Emap core proc. Hopefully the delayed data will come in later but this is beyond our control.
We can test for gaps in the DB by looking at each row's observation date and cardinality of the values array (rows will be of varying cardinality - see fragmentation below).
Fragmentation
(Definition: Emap interchange messages are received containing less than the target 3000 samples, thus more rows are used in the DB than would have been ideal, but data is not actually missing)
It's almost inevitable that messages will occasionally be delayed or out of order, so some fragmentation is to be expected. But excessive fragmentation might be a sign of a bug in the collation algorithm, or an unusual set of messages that was not considered.
Completeness
If we're running validation from a known waveform test source, then we should know how many data points it should have
The text was updated successfully, but these errors were encountered: