DQA_Results
The results of the DQA output are organized in several sections, which are introduced below.
The descriptive results provide an overview of metadata information, completeness counts, frequency counts / statistics, and conformance checks for each dataelement with two databases being displayed side by side for better comparison.
For each dataelement, a description is provided if defined in the MDR.
The metadata section summarizes some informative metadata from the MDR, such as the dataelement's variable name and table name in the database, the variable type, and more.
The completeness overview provides four numbers for a direct comparison between the source database and the target database. These numbers are also compared automatically between the two databases, and the outcome is displayed at the beginning of the report under 'ETL Checks (Validation)'.
- n: the total number of available rows in the database for the selected dataelement
- valid values: the number of rows of the dataelement that are not missing (NULL, NA, etc.)
- missing values: the number of rows with missing values ($n = \text{missing values} + \text{valid values}$)
- distinct values: the number of distinct / unique values of the dataelement
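The four numbers above can be illustrated with a minimal Python sketch. It is not the DQA Tool's implementation: the function names `is_missing` and `completeness_counts`, the rule for what counts as missing, and the choice to count distinct values over valid values only are all assumptions made for illustration.

```python
import math
from typing import Any, Dict, Sequence


def is_missing(value: Any) -> bool:
    """Assumption: None, NaN and blank strings count as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def completeness_counts(values: Sequence[Any]) -> Dict[str, int]:
    """Return n, valid, missing and distinct counts for one dataelement."""
    n = len(values)
    valid = [v for v in values if not is_missing(v)]
    return {
        "n": n,
        "valid_values": len(valid),
        "missing_values": n - len(valid),  # so n = missing + valid holds
        # Assumption: distinct values are counted among valid values only.
        "distinct_values": len(set(valid)),
    }


# Hypothetical example data for one dataelement in both databases.
source_counts = completeness_counts(["m", "f", None, "f", ""])
target_counts = completeness_counts(["m", "f", "f", "f", None])

# A side-by-side comparison flags every number that differs.
differences = {
    key: (source_counts[key], target_counts[key])
    for key in source_counts
    if source_counts[key] != target_counts[key]
}
```

Comparing the two dictionaries key by key mirrors the idea of the automatic comparison between source and target database.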
The formatting of the results section depends on the variable type:
- enumerated / string: a maximum of 25 categories are displayed by default along with their frequency counts
- float / integer: statistical dispersion parameters are displayed (min, median, mean, max, SD, and more ...)
- datetime: a simple summary statistic is displayed (min, q25, median, mean, q75, max)
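The type-dependent formatting can be sketched as a small dispatch function. This is an illustration only: the function name `summarize`, the type labels used as strings, and the quantile method are assumptions, and the sketch expects missing values to have been removed beforehand.

```python
import statistics
from collections import Counter
from datetime import datetime, timezone


def _to_ts(value: datetime) -> float:
    # Assumption: naive datetimes are interpreted as UTC.
    return value.replace(tzinfo=timezone.utc).timestamp()


def _from_ts(ts: float) -> datetime:
    return datetime.fromtimestamp(ts, tz=timezone.utc).replace(tzinfo=None)


def summarize(values, var_type, max_categories=25):
    """Summarize non-missing values according to the variable type."""
    if var_type in ("enumerated", "string"):
        # Frequency counts, limited to the most common categories.
        return Counter(values).most_common(max_categories)
    if var_type in ("float", "integer"):
        return {
            "min": min(values),
            "median": statistics.median(values),
            "mean": statistics.fmean(values),
            "max": max(values),
            "sd": statistics.stdev(values),  # needs at least two values
        }
    if var_type == "datetime":
        ts = sorted(_to_ts(v) for v in values)
        q25, median, q75 = statistics.quantiles(ts, n=4, method="inclusive")
        stats = {
            "min": ts[0], "q25": q25, "median": median,
            "mean": statistics.fmean(ts), "q75": q75, "max": ts[-1],
        }
        return {name: _from_ts(value) for name, value in stats.items()}
    raise ValueError(f"unsupported variable type: {var_type}")


# Hypothetical example values for each variable type.
category_summary = summarize(["a", "b", "a"], "string")
numeric_summary = summarize([1, 2, 3, 4], "integer")
date_summary = summarize(
    [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    "datetime",
)
```

Datetimes are converted to numeric timestamps so that mean and quartiles are well defined, and then converted back for display.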
If value conformance checks are defined in the MDR for the respective dataelement, their results are also displayed along with the constraining values / rules. The report states clearly whether the checks passed or failed against these constraints. If the status is failed, the values that do not conform to the rules are also displayed. Similar to the completeness checks, these results are also compared automatically between both databases and displayed at the beginning of the report under 'Value Conformance Checks (Verification)'.
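A value conformance check of this kind can be sketched as follows. The constraint layout (a dictionary with either a `value_set` list or a `range` with `min` and `max`) and the function name `check_conformance` are hypothetical and chosen only for illustration; the actual rule format is whatever the MDR defines.

```python
from typing import Any, Dict, Sequence


def check_conformance(values: Sequence[Any], constraint: Dict[str, Any]) -> Dict[str, Any]:
    """Check non-missing values against a hypothetical constraint definition."""
    if "value_set" in constraint:
        allowed = set(constraint["value_set"])
        violations = sorted({v for v in values if v not in allowed})
    elif "range" in constraint:
        lower = constraint["range"]["min"]
        upper = constraint["range"]["max"]
        violations = sorted({v for v in values if not lower <= v <= upper})
    else:
        raise ValueError("unknown constraint type")
    return {
        "status": "failed" if violations else "passed",
        # Non-conforming values are listed only when the check fails.
        "violations": violations,
    }


# Hypothetical constraints and data.
sex_check = check_conformance(["m", "f", "x", "f"], {"value_set": ["m", "f"]})
age_check = check_conformance([34, 51, 78], {"range": {"min": 0, "max": 120}})
```

Running the same check on the source and the target database and comparing the two `status` values corresponds to the automatic comparison described above.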
💡 If the analyzed database is an SQL database, the SQL statement for retrieving the data for the respective dataelement can be accessed by clicking a button, which shows up on the descriptive results page.
The plausibility checks are organized in the same manner as the descriptive results, but they cover only the plausibility statements (if any were defined in the MDR).
The completeness section consists of a table that provides the absolute and relative number of missing values for each dataelement, again with the results of the source and target databases presented side by side for better comparison.
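As a rough sketch of how such a table could be assembled, the following Python snippet computes absolute and relative missing counts per dataelement. The function name `completeness_table`, the column names, and the use of `None` as the only missing marker are assumptions for illustration.

```python
from typing import Any, Dict, List, Sequence


def completeness_table(
    source: Dict[str, Sequence[Any]],
    target: Dict[str, Sequence[Any]],
) -> List[Dict[str, Any]]:
    """Build one row per dataelement that exists in both databases."""
    rows = []
    for name in sorted(set(source) & set(target)):
        row: Dict[str, Any] = {"dataelement": name}
        for label, data in (("source", source[name]), ("target", target[name])):
            n = len(data)
            missing = sum(value is None for value in data)
            row[f"{label}_missing_abs"] = missing
            # Relative missing values in percent; undefined for empty columns.
            row[f"{label}_missing_rel"] = round(100 * missing / n, 2) if n else None
        rows.append(row)
    return rows


# Hypothetical example data.
table = completeness_table(
    source={"age": [1, None, 3, None]},
    target={"age": [1, 2, 3, None]},
)
```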
If a TIMESTAMP column is provided in both the source and target databases, the tool compares the resource counts for each timestamp. If the available resources differ for a given TIMESTAMP, it indicates a potential missing resource in either the source or the target database. In this section, a summary table is provided, displaying the timestamps with differences along with their corresponding counts. This table is included in the PDF report. Additionally, the GUI offers detailed tables presenting information on the affected resources. Furthermore, users have the option to download the difference results as .csv or .rds files for further analysis outside of the DQA Tool.
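The timestamp comparison can be sketched in a few lines of Python. This is not the tool's code: the function names, the column names, and the use of ISO date strings as timestamps are assumptions, and only the summary table and a CSV export are shown.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Sequence


def timestamp_differences(
    source_ts: Sequence[str], target_ts: Sequence[str]
) -> List[Dict[str, object]]:
    """List the timestamps whose resource counts differ between the databases."""
    src, tgt = Counter(source_ts), Counter(target_ts)
    rows = []
    for ts in sorted(set(src) | set(tgt)):
        # A Counter returns 0 for timestamps missing from one side.
        if src[ts] != tgt[ts]:
            rows.append(
                {"timestamp": ts, "source_count": src[ts], "target_count": tgt[ts]}
            )
    return rows


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize the summary table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["timestamp", "source_count", "target_count"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example: one resource from 2021-03-02 is missing in the target.
diff_rows = timestamp_differences(
    ["2021-03-01", "2021-03-02", "2021-03-02"],
    ["2021-03-01", "2021-03-02"],
)
diff_csv = to_csv(diff_rows)
```

A count that is higher in the source than in the target points to a resource that may have been lost on the way to the target, and the reverse pattern points to a resource that is absent from the source.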