Split data checks based on agency #201

Open
elimillera opened this issue Dec 12, 2023 · 1 comment
@elimillera
Member

Feature Idea

This was brought up in the Dec 12, 2023 meeting. Different agencies have different rules. For example, the FDA doesn't allow underscores or non-ASCII characters in filenames. We could add a flag to strict_checks in write_xpt to check agency-specific rules.
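As a rough illustration of the idea, the sketch below checks a file name against the FDA rule mentioned above (no underscores, no non-ASCII characters). The function name `check_file_name` and its `agency` argument are hypothetical and are not part of the package; how such a switch would actually be wired into write_xpt is still open.

```r
# Hypothetical helper for illustration only; not an existing package function.
# Returns a character vector describing every rule the file name breaks.
check_file_name <- function(path, agency = c("none", "FDA")) {
  agency <- match.arg(agency)
  name <- tools::file_path_sans_ext(basename(path))
  problems <- character(0)

  if (agency == "FDA") {
    # FDA rule from this issue: no underscores in file names
    if (grepl("_", name, fixed = TRUE)) {
      problems <- c(problems, "File name contains an underscore.")
    }
    # FDA rule from this issue: ASCII only (any byte above 127 is non-ASCII)
    if (any(as.integer(charToRaw(name)) > 127L)) {
      problems <- c(problems, "File name contains non-ASCII characters.")
    }
  }

  problems
}

check_file_name("ad_sl.xpt", agency = "FDA")   # one problem reported
check_file_name("ad_sl.xpt", agency = "none")  # no agency rules applied
```

Returning the problems instead of aborting leaves it to the caller to decide whether an agency rule should be an error, a warning, or ignored.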

@cpiraux Feel free to add in anything I missed or misstated.

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response

@elimillera elimillera added enhancement New feature or request programming labels Dec 12, 2023
@cpiraux
Collaborator

cpiraux commented Dec 13, 2023

I am adding an example for more clarification.

The XPT requirements and those from regulatory agencies can differ. For instance, let's examine the distinct requirements for dataset and variable labels:

| XPT | FDA | NMPA |
| --- | --- | --- |
| No restriction on characters; maximum length is 40 bytes. | Variable names, as well as variable and dataset labels, should include American Standard Code for Information Interchange (ASCII) text codes only. Maximum length in characters = 40. | For eSubmission in China, one of the requirements is to translate the foreign language data package (e.g., English) to Chinese. Variable labels, dataset labels, MedDRA, WHO Drug terms, primary endpoint-related code lists, etc., need to be translated from English to Chinese. |

Currently, in df_label.R, the function fails if the label does not meet the following requirements:

label_len <- nchar(label)

if (label_len > 40) {
  abort("Length of dataset label must be 40 characters or less.")
}

if (stringr::str_detect(label, "[^[:ascii:]]")) {
  abort("`label` cannot contain any non-ASCII, symbol, or special characters.")
}

The first check represents an XPT requirement, while the second one aligns with FDA specifications. I suggest moving agency-specific checks to xpt_validate so that they can be ignored if necessary.
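One possible shape for that split is sketched below. All three function names are hypothetical and only illustrate the separation; they are not the package's API. The XPT check counts bytes, following the 40-byte limit in the table above, which is an assumption that differs from the current character-based `nchar(label)` check.

```r
# Hypothetical helpers for illustration only; names are not from the package.

# XPT format rule: always enforced, so a violation is an error.
check_label_xpt <- function(label) {
  if (nchar(label, type = "bytes") > 40) {
    stop("Length of dataset label must be 40 bytes or less.", call. = FALSE)
  }
  invisible(label)
}

# FDA rule: reported as a message so the caller can ignore or downgrade it.
check_label_fda <- function(label) {
  if (any(as.integer(charToRaw(label)) > 127L)) {
    return("`label` contains non-ASCII characters.")
  }
  character(0)
}

# Runs the format check first, then only the agency checks that were requested.
validate_label <- function(label, agency = c("none", "FDA")) {
  agency <- match.arg(agency)
  check_label_xpt(label)
  if (agency == "FDA") check_label_fda(label) else character(0)
}

# A Chinese label passes when no agency is selected, but is flagged for FDA.
validate_label("\u4e0d\u826f\u4e8b\u4ef6", agency = "none")
validate_label("\u4e0d\u826f\u4e8b\u4ef6", agency = "FDA")
```

With this layout, a translated label for an NMPA submission would still have to satisfy the XPT length limit, while the ASCII restriction applies only when FDA rules are requested.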
