-
Notifications
You must be signed in to change notification settings - Fork 11
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* #73 initial setup of data and imputation post * #73 date functions and imputations blog post * #73 updates following review * chore: #73 date stamp --------- Co-authored-by: Ben Straub <[email protected]>
- Loading branch information
1 parent
0d30076
commit 8d7f46f
Showing
4 changed files
with
358 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Empty file.
327 changes: 327 additions & 0 deletions
327
posts/2023-09-26_date_functions_and_imputation/date_functions_and_imputation.qmd
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,327 @@ | ||
--- | ||
title: "Date/Time Functions and Imputation in {admiral} " | ||
author: | ||
- name: Edoardo Mancini | ||
description: "Dates, times and imputation can be a frustrating facet of any programming language. Here's how {admiral} makes all of this easy!" | ||
date: "2023-09-26" | ||
# please do not use any non-default categories. | ||
# You can find the default categories in the repository README.md | ||
categories: [admiral, ADaM, Date/Time] | ||
# feel free to change the image | ||
image: "admiral.png" | ||
|
||
--- | ||
|
||
<!--------------- typical setup -----------------> | ||
|
||
```{r setup, include=FALSE} | ||
long_slug <- "2023-08-21_date_functions_and_imputation" | ||
# renv::use(lockfile = "renv.lock") | ||
library(admiraldev) | ||
``` | ||
|
||
<!--------------- post begins here -----------------> | ||
|
||
# Introduction | ||
|
||
Date and time is collected in SDTM as character values using the extended [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format `yyyy-dd-mmThh:mm:ss`. This universal format allows missing parts date or time - e.g. the string`"2019-10"` represents a date where the day and the time are unknown. In contrast, ADaM timing variables like `ADTM` (Analysis Datetime) or `ADY` (Analysis Relative Day) are numeric variables, which can be derived only if the date or datetime is complete. | ||
|
||
Most ADaM programmers have, at one point or another, encountered situations where missing dates, unexpected formats or confusing imputation functions rendered derivations of timing variables frustrating and time consuming. `{admiral}` aims to mitigate this (where possible!) by providing functions which automatically derive date/datetime variables for you, and fill in missing date or time parts according to well-defined imputation rules. | ||
|
||
In this article, we first examine the arsenal of functions provided by`{admiral}` to aid in datetime imputation and timing variable derivation. We then observe everything in action through a number of selected typical examples. | ||
|
||
# Date/Datetime Derivation and Imputation Functions | ||
|
||
`{admiral}` provides the following functions for date/datetime imputation: | ||
|
||
- Derivations for adding variables | ||
- [derive_vars_dt()](https://pharmaverse.github.io/admiral/cran-release/reference/derive_vars_dt.html): Adds a date variable and a date imputation flag variable (optional) based on a --DTC variable and imputation rules. | ||
- [derive_vars_dtm()](https://pharmaverse.github.io/admiral/cran-release/reference/derive_vars_dtm.html): Adds a datetime variable, a date imputation flag variable, and a time imputation flag variable (both optional) based on a --DTC variable and imputation rules. | ||
- Computation functions | ||
- [impute_dtc_dtm()](https://pharmaverse.github.io/admiral/cran-release/reference/impute_dtc_dtm.html): Returns a complete ISO 8601 datetime or `NA` based on a partial ISO 8601 datetime and imputation rules. | ||
- [impute_dtc_dt()](https://pharmaverse.github.io/admiral/cran-release/reference/impute_dtc_dt.html): Returns a complete ISO 8601 date (without time) or `NA` based on a partial ISO 8601 date(time) and imputation rules. | ||
- [convert_dtc_to_dt()](https://pharmaverse.github.io/admiral/cran-release/reference/convert_dtc_to_dt.html): Returns a date if the input ISO 8601 date is complete. Otherwise, `NA` is returned. | ||
- [convert_dtc_to_dtm()](https://pharmaverse.github.io/admiral/cran-release/reference/convert_dtc_to_dtm.html): Returns a datetime if the input ISO 8601 date is complete (with missing time replaced by `"00:00:00"` as default). Otherwise, NA is returned. | ||
- [compute_dtf()](https://pharmaverse.github.io/admiral/cran-release/reference/compute_dtf.html): Returns the date imputation flag. | ||
- [compute_tmf()](https://pharmaverse.github.io/admiral/cran-release/reference/compute_tmf.html): Returns the time imputation flag. | ||
|
||
From the point of view of a typical ADaM programmer, the functions `impute_*`, `convert_*` and `compute_*` above can be viewed as utilities for treating dates and/or imputation within any custom code. In contrast, their `derive_*` find their use in directly deriving new timing variables and/or carrying out imputation at an ADaM dataset scale. | ||
|
||
For a detailed look at the Imputation rules applied by these `{admiral}` functions, please visit [this vignette](https://pharmaverse.github.io/admiral/cran-release/articles/imputation.html#imputation-rules) on the documentation website. | ||
|
||
# Simple Examples with Vectors | ||
|
||
In the examples below, one can observe how some members of the class of utilities `impute_*()` and `convert_*()` can be employed to do the date-related heavy lifting. | ||
|
||
## Imputing a Partial Date Portion | ||
|
||
It is easy impute dates to the first day/month if they are partial just by using the `highest_imputation` argument: | ||
|
||
```{r warning = FALSE, message = FALSE} | ||
library(admiral) | ||
library(lubridate) | ||
library(tibble) | ||
library(dplyr, warn.conflicts = FALSE) | ||
dates <- c( | ||
"2019-07-18T15:25:40", | ||
"2019-07-18T15:25", | ||
"2019-07-18T15", | ||
"2019-07-18", | ||
"2019-02", | ||
"2019", | ||
"2019", | ||
"2019---07", | ||
"" | ||
) | ||
impute_dtc_dt( | ||
dtc = dates, | ||
highest_imputation = "M" | ||
) | ||
``` | ||
|
||
A simple modification using `date_imputation = "mid"` or `date_imputation = "last"` or enables the imputation to be made using the middle or last day/month instead: | ||
|
||
```{r} | ||
# Impute to last day/month if date is partial | ||
impute_dtc_dt( | ||
dtc = dates, | ||
highest_imputation = "M", | ||
date_imputation = "last", | ||
) | ||
# Impute to mid day/month if date is partial | ||
impute_dtc_dt( | ||
dtc = dates, | ||
highest_imputation = "M", | ||
date_imputation = "mid" | ||
) | ||
``` | ||
|
||
But what if there exist minimum dates that the imputed date cannot exceed? Here, the `min_date` argument comes to the rescue: | ||
|
||
```{r} | ||
impute_dtc_dt( | ||
"2020-12", | ||
min_dates = list( | ||
ymd("2020-12-06"), | ||
ymd("2020-11-11") | ||
), | ||
highest_imputation = "M" | ||
) | ||
``` | ||
|
||
## Computing Date Imputation Flags | ||
|
||
When it comes to carrying out an imputation, the twin task is to flag the type of imputation that was executed. Here, functions like `compute_dtf()` make this straightforward. For this function, all that needs to be done is to pass a date character date to the `dtc` argument, and the resulting imputed date to the `dt` argument. This will then return the right date imputation flag - see the examples below for some possible behaviors: | ||
|
||
```{r} | ||
compute_dtf(dtc = "2019-07", dt = as.Date("2019-07-18")) | ||
compute_dtf(dtc = "2019", dt = as.Date("2019-07-18")) | ||
compute_dtf(dtc = "--06-01T00:00", dt = as.Date("2022-06-01")) | ||
``` | ||
|
||
|
||
# Action Examples | ||
|
||
The `derive_*()` functions are essentially wrappers around the aforementioned `impute_*()` and `compute_*()` functions. In the following section, we explore examples where ADaM variables can be derived using this class of functions. | ||
|
||
## Creating an Imputed Datetime and Date Variable and Imputation Flag Variables | ||
|
||
As described previously, `derive_vars_dtm()` derives an imputed datetime variable and the corresponding date and time imputation flags. The imputed date variable can then be derived by using `derive_vars_dtm_to_dt()`. It is not necessary and advisable to perform the imputation for the date variable if it was already done for the datetime variable. CDISC considers the datetime and the date variable as two representations of the same date. Thus the imputation must be the same and the imputation flags are valid for both the datetime and the date variable. | ||
|
||
```{r} | ||
ae <- tribble( | ||
~AESTDTC, | ||
"2019-08-09T12:34:56", | ||
"2019-04-12", | ||
"2010-09", | ||
NA_character_ | ||
) %>% | ||
derive_vars_dtm( | ||
dtc = AESTDTC, | ||
new_vars_prefix = "AST", | ||
highest_imputation = "M", | ||
date_imputation = "first", | ||
time_imputation = "first" | ||
) %>% | ||
derive_vars_dtm_to_dt(exprs(ASTDTM)) | ||
``` | ||
```{r, echo=FALSE} | ||
dataset_vignette(ae) | ||
``` | ||
|
||
## Creating an Imputed Date Variable and Imputation Flag Variable | ||
|
||
If an imputed date variable without a corresponding datetime variable is required, it can be derived by the `derive_vars_dt()` function. | ||
|
||
```{r} | ||
ae <- tribble( | ||
~AESTDTC, | ||
"2019-08-09T12:34:56", | ||
"2019-04-12", | ||
"2010-09", | ||
NA_character_ | ||
) %>% | ||
derive_vars_dt( | ||
dtc = AESTDTC, | ||
new_vars_prefix = "AST", | ||
highest_imputation = "M", | ||
date_imputation = "first" | ||
) | ||
``` | ||
```{r, echo=FALSE} | ||
dataset_vignette(ae) | ||
``` | ||
|
||
## Imputing Time Without Imputing Date | ||
|
||
If the time should be imputed but not the date, the `highest_imputation` argument should be set to `"h"`. This results in `NA` if the date is partial. As no date is imputed the date imputation flag is not created. | ||
|
||
```{r} | ||
ae <- tribble( | ||
~AESTDTC, | ||
"2019-08-09T12:34:56", | ||
"2019-04-12", | ||
"2010-09", | ||
NA_character_ | ||
) %>% | ||
derive_vars_dtm( | ||
dtc = AESTDTC, | ||
new_vars_prefix = "AST", | ||
highest_imputation = "h", | ||
time_imputation = "first" | ||
) | ||
``` | ||
```{r, echo=FALSE} | ||
dataset_vignette(ae) | ||
``` | ||
|
||
## Avoiding Imputed Dates Before a Particular Date | ||
Usually an adverse event start date is imputed as the earliest date of all possible dates when filling the missing parts. The result may be a date before treatment start date. This is not desirable because the adverse event would not be considered as treatment emergent and excluded from the adverse event summaries. This can be avoided by specifying the treatment start date variable (`TRTSDTM`) for the `min_dates` argument. | ||
|
||
Importantly, `TRTSDTM` is used as imputed date only if the non missing date and time parts of `AESTDTC` coincide with those of `TRTSDTM`. Therefore `2019-10` is not imputed as `2019-11-11 12:34:56`. This ensures that collected information is not changed by the imputation. | ||
|
||
```{r} | ||
ae <- tribble( | ||
~AESTDTC, ~TRTSDTM, | ||
"2019-08-09T12:34:56", ymd_hms("2019-11-11T12:34:56"), | ||
"2019-10", ymd_hms("2019-11-11T12:34:56"), | ||
"2019-11", ymd_hms("2019-11-11T12:34:56"), | ||
"2019-12-04", ymd_hms("2019-11-11T12:34:56") | ||
) %>% | ||
derive_vars_dtm( | ||
dtc = AESTDTC, | ||
new_vars_prefix = "AST", | ||
highest_imputation = "M", | ||
date_imputation = "first", | ||
time_imputation = "first", | ||
min_dates = exprs(TRTSDTM) | ||
) | ||
``` | ||
```{r, echo=FALSE} | ||
dataset_vignette(ae) | ||
``` | ||
|
||
## Avoiding Imputed Dates After a Particular Date | ||
|
||
If a date is imputed as the latest date of all possible dates when filling the missing parts, it should not result in dates after data cut off or death. This can be achieved by specifying the dates for the `max_dates` argument. | ||
|
||
Importantly, non missing date parts are not changed. Thus `2019-12-04` is imputed as `2019-12-04 23:59:59` although it is after the data cut off date. It may make sense to replace it by the data cut off date but this is not part of the imputation. It should be done in a separate data cleaning or data cut off step. | ||
```{r} | ||
ae <- tribble( | ||
~AEENDTC, ~DTHDT, ~DCUTDT, | ||
"2019-08-09T12:34:56", ymd("2019-11-11"), ymd("2019-12-02"), | ||
"2019-11", ymd("2019-11-11"), ymd("2019-12-02"), | ||
"2019-12", NA, ymd("2019-12-02"), | ||
"2019-12-04", NA, ymd("2019-12-02") | ||
) %>% | ||
derive_vars_dtm( | ||
dtc = AEENDTC, | ||
new_vars_prefix = "AEN", | ||
highest_imputation = "M", | ||
date_imputation = "last", | ||
time_imputation = "last", | ||
max_dates = exprs(DTHDT, DCUTDT) | ||
) | ||
``` | ||
```{r, echo=FALSE} | ||
dataset_vignette(ae) | ||
``` | ||
|
||
## Imputation Without Creating a New Variable | ||
|
||
If imputation is required without creating a new variable the `convert_dtc_to_dt()` function can be called to obtain a vector of imputed dates. It can be used for example here: | ||
|
||
```{r} | ||
mh <- tribble( | ||
~MHSTDTC, ~TRTSDT, | ||
"2019-04", ymd("2019-04-15"), | ||
"2019-04-01", ymd("2019-04-15"), | ||
"2019-05", ymd("2019-04-15"), | ||
"2019-06-21", ymd("2019-04-15") | ||
) %>% | ||
filter( | ||
convert_dtc_to_dt( | ||
MHSTDTC, | ||
highest_imputation = "M", | ||
date_imputation = "first" | ||
) < TRTSDT | ||
) | ||
``` | ||
```{r, echo=FALSE} | ||
dataset_vignette(mh) | ||
``` | ||
|
||
## Using More Than One Imputation Rule for a Variable | ||
|
||
Using different imputation rules depending on the observation can be done by using the higher-order function `slice_derivation()`, which applies a derivation function differently (by varying its arguments) in different subsections of a dataset. For example, consider this Vital Signs case where pre-dose records require a different treatment to other records: | ||
|
||
```{r} | ||
vs <- tribble( | ||
~VSDTC, ~VSTPT, | ||
"2019-08-09T12:34:56", NA, | ||
"2019-10-12", "PRE-DOSE", | ||
"2019-11-10", NA, | ||
"2019-12-04", NA | ||
) %>% | ||
slice_derivation( | ||
derivation = derive_vars_dtm, | ||
args = params( | ||
dtc = VSDTC, | ||
new_vars_prefix = "A" | ||
), | ||
derivation_slice( | ||
filter = VSTPT == "PRE-DOSE", | ||
args = params(time_imputation = "first") | ||
), | ||
derivation_slice( | ||
filter = TRUE, | ||
args = params(time_imputation = "last") | ||
) | ||
) | ||
``` | ||
```{r, echo=FALSE} | ||
dataset_vignette(vs) | ||
``` | ||
|
||
# Conclusion | ||
|
||
Deriving timing variables and carrying out imputations is tricky at the best of times, but hopefully this blog post can shed some light on how make this all easier using the `{admiral}` package! As `{admiral}` developers we are always interested in knowing how users are employing the package for their ADaM needs, so if you have any comments or feedback related to this topic, don't be afraid to leave a comment on our [Slack channel](https://app.slack.com/client/T028PB489D3/C02M8KN8269) or on the [Github repository](https://github.com/pharmaverse/admiral/), either as an issue or as a discussion. | ||
|
||
For an even more detailed treatment of this topic, users are once again invited to read the corresponding [vignette](https://pharmaverse.github.io/admiral/cran-release/articles/imputation.html) on the documentation website, from which this article was adapted. | ||
|
||
<!--------------- appendices go here -----------------> | ||
|
||
```{r, echo=FALSE} | ||
#| eval: false | ||
source("appendix.R") | ||
insert_appendix( | ||
repo_spec = "pharmaverse/blog", | ||
name = long_slug | ||
) | ||
``` |