

Helene edited this page May 10, 2024 · 4 revisions

DQA Results

The DQA output is organized into several sections, which are introduced below.

Descriptive Results

For each dataelement, the descriptive results provide an overview of its metadata, completeness counts, frequency counts / statistics, and conformance checks. The source and target databases are displayed side by side for easier comparison.

Description

For each dataelement, a description is provided if one is defined in the MDR.

Metadata

The metadata section summarizes selected metadata from the MDR, such as the dataelement's variable name and table name in the database, its variable type, and more.

Completeness Overview

The completeness overview provides four numbers for a direct comparison between the source and target databases. These numbers are also compared automatically between the two databases, and the results are displayed at the beginning of the report under 'ETL Checks (Validation)'.

  • n: the total number of available rows in the database for the selected dataelement
  • valid values: the number of rows of the dataelement that are not missing (NULL, NA, etc.)
  • missing values: the number of rows with missing values ($n = \text{valid values} + \text{missing values}$)
  • distinct values: the number of distinct / unique values of the dataelement
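The four counts, and the automatic source/target comparison behind 'ETL Checks (Validation)', can be illustrated with a minimal Python sketch. It assumes that a dataelement's values have already been loaded as a list in which `None` or a float NaN marks a missing value, and that distinct values are counted among valid values only; both are assumptions, as is every function name here (`is_missing`, `completeness_overview`, `compare_overviews`), none of which belongs to the DQA Tool.

```python
import math
from typing import Any, Dict, List


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (assumption; the tool may recognize more)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def completeness_overview(values: List[Any]) -> Dict[str, int]:
    """Compute n, valid, missing, and distinct counts for one dataelement (hypothetical helper)."""
    valid = [v for v in values if not is_missing(v)]
    return {
        "n": len(values),
        "valid": len(valid),
        "missing": len(values) - len(valid),  # n = valid + missing by construction
        "distinct": len(set(valid)),  # assumption: distinct counted among valid values only
    }


def compare_overviews(source: List[Any], target: List[Any]) -> Dict[str, bool]:
    """For each count, report whether source and target agree (ETL check sketch)."""
    src = completeness_overview(source)
    tgt = completeness_overview(target)
    return {key: src[key] == tgt[key] for key in src}


# Example: one row was lost between source and target.
source = ["m", "f", None, "f", "d"]
target = ["m", "f", None, "f"]
print(completeness_overview(source))  # {'n': 5, 'valid': 4, 'missing': 1, 'distinct': 3}
print(compare_overviews(source, target))  # {'n': False, 'valid': False, 'missing': True, 'distinct': False}
```

Note that matching missing counts alone do not prove a correct transfer: in the example, `missing` agrees while `n`, `valid`, and `distinct` do not, which is why all four numbers are compared.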

Results (Frequency Counts / Statistics)

The formatting of the results section depends on the variable type:

  • enumerated / string: by default, up to 25 categories are displayed along with their frequency counts.

  • float / integer: summary statistics of location and dispersion are displayed (min, median, mean, max, SD, and more).

  • datetime: simple summary statistics are displayed (min, q25, median, mean, q75, max).
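The type-dependent summaries can be sketched with Python's standard library. This is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, the numeric summary covers only the statistics named above (the tool's "and more" is not reproduced), and `method="inclusive"` is chosen for the quartiles because it corresponds to the common linear-interpolation definition; whether the DQA Tool uses the same quantile definition is an assumption.

```python
import statistics
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def categorical_summary(values: List[str], max_categories: int = 25) -> List[Tuple[str, int]]:
    """Frequency counts of the most common categories, capped at max_categories."""
    return Counter(values).most_common(max_categories)


def numeric_summary(values: List[float]) -> Dict[str, float]:
    """Location and dispersion statistics for float / integer dataelements."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation; needs at least 2 values
    }


def datetime_summary(values: List[datetime]) -> Dict[str, datetime]:
    """min, q25, median, mean, q75, and max for datetime dataelements."""
    start = min(values)
    # Convert to seconds after the earliest value so averaging is well defined.
    seconds = [(v - start).total_seconds() for v in values]
    # Assumed quantile definition (linear interpolation); needs at least 2 values.
    q25, median, q75 = statistics.quantiles(seconds, n=4, method="inclusive")

    def to_datetime(offset: float) -> datetime:
        return start + timedelta(seconds=offset)

    return {
        "min": start,
        "q25": to_datetime(q25),
        "median": to_datetime(median),
        "mean": to_datetime(statistics.fmean(seconds)),
        "q75": to_datetime(q75),
        "max": max(values),
    }


print(categorical_summary(["a", "b", "a", "c", "a", "b"]))  # [('a', 3), ('b', 2), ('c', 1)]
print(numeric_summary([1, 2, 3, 4]))
print(datetime_summary([datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 5)]))
```

Datetimes are converted to offsets from the earliest value because Python cannot add two `datetime` objects, which a mean or an interpolated quantile would otherwise require.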

Value Conformance Checks

If value conformance checks are defined in the MDR for the respective dataelement, their results are displayed along with the constraining values / rules. The report clearly indicates whether each check passed or failed according to these constraints. If a check failed, the values that do not conform to the rules are also displayed. As with the completeness checks, these results are compared automatically between the two databases and displayed at the beginning of the report under 'Value Conformance Checks (Verification)'.
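The pass/fail logic of a value conformance check can be illustrated with a small Python sketch. The constraint format is a hypothetical stand-in, since the MDR's actual representation of constraining values / rules is not described on this page: the `Constraint` class below supports either a set of permitted values or a numeric range. Missing values are assumed to be `None` and are skipped, and the non-conforming values are reported once each, in order of first appearance.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Optional, Set


@dataclass
class Constraint:
    """Hypothetical constraint: permitted values and/or a numeric range."""
    allowed: Optional[Set[Any]] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass
class ConformanceResult:
    passed: bool
    violations: List[Any] = field(default_factory=list)


def conforms(value: Any, rule: Constraint) -> bool:
    """Return True if a single value satisfies every part of the rule that is set."""
    if rule.allowed is not None and value not in rule.allowed:
        return False
    if rule.min_value is not None and value < rule.min_value:
        return False
    if rule.max_value is not None and value > rule.max_value:
        return False
    return True


def check_value_conformance(values: Iterable[Any], rule: Constraint) -> ConformanceResult:
    """Check non-missing values against the rule and collect distinct violations."""
    # dict.fromkeys de-duplicates while keeping the order of first appearance.
    violations = list(dict.fromkeys(v for v in values if v is not None and not conforms(v, rule)))
    return ConformanceResult(passed=not violations, violations=violations)


print(check_value_conformance(["m", "f", "x", None], Constraint(allowed={"m", "f", "d"})))
# ConformanceResult(passed=False, violations=['x'])
print(check_value_conformance([0, 45, 130, -1], Constraint(min_value=0, max_value=120)))
# ConformanceResult(passed=False, violations=[130, -1])
```

Running the same check on source and target values and comparing the two results mirrors the automatic comparison shown under 'Value Conformance Checks (Verification)'.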

💡 If the analyzed database is an SQL database, the SQL statement used to retrieve the data for the respective dataelement can be viewed via a button on the descriptive results page.

Plausibility Checks

The plausibility checks are organized in the same manner as the descriptive results, but cover the plausibility statements (if they are defined in the MDR).

Completeness Checks

The completeness section consists of a table listing the absolute and relative number of missing values for each dataelement, again with the results of the source and target databases side by side for easier comparison.
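A minimal Python sketch of how such a side-by-side table could be assembled. It assumes that each database has been loaded as a mapping from dataelement name to a list of values, with `None` marking a missing value; the function names and the data layout are hypothetical. The relative value is expressed as a percentage of all rows for that dataelement and rounded to two decimals.

```python
from typing import Any, Dict, List, Tuple


def missing_stats(values: List[Any]) -> Tuple[int, float]:
    """Absolute number and percentage of missing values (None assumed to mark missing)."""
    absolute = sum(1 for v in values if v is None)
    relative = 100.0 * absolute / len(values) if values else 0.0
    return absolute, relative


def completeness_table(
    source: Dict[str, List[Any]], target: Dict[str, List[Any]]
) -> List[Tuple[str, int, float, int, float]]:
    """One row per dataelement: (name, source abs, source %, target abs, target %)."""
    rows = []
    for name in sorted(set(source) | set(target)):
        src_abs, src_rel = missing_stats(source.get(name, []))
        tgt_abs, tgt_rel = missing_stats(target.get(name, []))
        rows.append((name, src_abs, round(src_rel, 2), tgt_abs, round(tgt_rel, 2)))
    return rows


source = {"age": [34, None, 51, 20], "sex": ["f", "m", None, None]}
target = {"age": [34, None, 51, None], "sex": ["f", "m", None, None]}
print(f"{'dataelement':<12}{'src abs':>9}{'src %':>8}{'tgt abs':>9}{'tgt %':>8}")
for name, src_abs, src_rel, tgt_abs, tgt_rel in completeness_table(source, target):
    print(f"{name:<12}{src_abs:>9}{src_rel:>8.2f}{tgt_abs:>9}{tgt_rel:>8.2f}")
```

Here `age` shows 25.00 % missing in the source but 50.00 % in the target, the kind of discrepancy the side-by-side layout is meant to surface.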

Difference Checks

If a TIMESTAMP column is provided in both the source and target databases, the tool compares the number of resources for each timestamp. A difference in these counts for a given TIMESTAMP indicates that resources may be missing from either the source or the target database. This section provides a summary table listing the timestamps with differences along with their corresponding counts; this table is included in the PDF report. In addition, the GUI offers detailed tables with information on the affected resources, and the difference results can be downloaded as .csv or .rds files for further analysis outside the DQA Tool.
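The core of the difference check, counting resources per timestamp in each database and keeping only the timestamps where the counts disagree, can be sketched in Python. The sketch assumes one timestamp value per resource, given as ISO-formatted strings so that sorting is chronological; it does not reproduce the DQA Tool's own implementation, and the function names and CSV column names are hypothetical. The CSV export only mirrors the idea of the .csv download; the .rds format is not covered.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def timestamp_differences(
    source_timestamps: Iterable[str], target_timestamps: Iterable[str]
) -> List[Tuple[str, int, int]]:
    """Return (timestamp, source count, target count) where the counts differ."""
    src = Counter(source_timestamps)
    tgt = Counter(target_timestamps)
    rows = []
    # Counter returns 0 for absent keys, so timestamps found in only one database are included.
    for ts in sorted(set(src) | set(tgt)):
        if src[ts] != tgt[ts]:
            rows.append((ts, src[ts], tgt[ts]))
    return rows


def to_csv(rows: List[Tuple[str, int, int]]) -> str:
    """Serialize the summary table as CSV text (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp", "source_count", "target_count"])
    writer.writerows(rows)
    return buffer.getvalue()


source = ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"]
target = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03"]
diffs = timestamp_differences(source, target)
print(diffs)  # [('2024-01-01', 2, 1), ('2024-01-03', 1, 2)]
print(to_csv(diffs))
```

Because the comparison works on counts only, a resource missing from the target and an extra resource in the target at the same timestamp would cancel out; finding such cases requires comparing the resources themselves, as the detailed tables in the GUI do.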