Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #71 rounding #72

Merged
merged 10 commits into from
Aug 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .lycheeignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@ https://www.linkedin.com/*
https://cosa.cdisc.org/events/Admiral
///home/runner/work/blog/blog/posts/2023-06-28_welcome/pharmaverse.slack.com
https://pharmaverse.github.io/blog/posts/2023-06-27_hackathon_app/
https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
40 changes: 40 additions & 0 deletions inst/WORDLIST.txt
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,43 @@ tlg
TLG
TLGs
xpt
anderson
atorus
CAMIS
ceil
dn
doesn
edu
eps
gp
Gregor
htm
IBMRounding
iml
ji
lexjansen
lhdjung
lrcon
phuse
psiaims
rdocumentation
rounde
Roundings
roundSAS
rw
SAS's
sfirke
stackoverflow
StackOverflow
thm
tidytlg
tplyr
Tplyr
trunc
ucla
unv
ut
zMachine
admiraldev
admiralRoche
apache
Expand Down Expand Up @@ -284,3 +321,6 @@ dy
lubridate
TRTSDTM
ymd
behaviour
Farrugia
ethz
78 changes: 78 additions & 0 deletions posts/2023-07-24_rounding/appendix.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# markdown helpers --------------------------------------------------------

markdown_appendix <- function(name, content) {
paste(paste("##", name, "{.appendix}"), " ", content, sep = "\n")
}
markdown_link <- function(text, path) {
paste0("[", text, "](", path, ")")
}



# worker functions --------------------------------------------------------

insert_source <- function(repo_spec, name,
collection = "posts",
branch = "main",
host = "https://github.com",
text = "source code") {
path <- paste(
host,
repo_spec,
"tree",
branch,
collection,
name,
"index.qmd",
sep = "/"
)
return(markdown_link(text, path))
}

insert_timestamp <- function(tzone = Sys.timezone()) {
time <- lubridate::now(tzone = tzone)
stamp <- as.character(time, tz = tzone, usetz = TRUE)
return(stamp)
}

insert_lockfile <- function(repo_spec, name,
collection = "posts",
branch = "main",
host = "https://github.com",
text = "R environment") {
path <- paste(
host,
repo_spec,
"tree",
branch,
collection,
name,
"renv.lock",
sep = "/"
)
return(markdown_link(text, path))
}



# top level function ------------------------------------------------------

insert_appendix <- function(repo_spec, name, collection = "posts") {
appendices <- paste(
markdown_appendix(
name = "Last updated",
content = insert_timestamp()
),
" ",
markdown_appendix(
name = "Details",
content = paste(
insert_source(repo_spec, name, collection),
insert_lockfile(repo_spec, name, collection),
sep = ", "
)
),
sep = "\n"
)
knitr::asis_output(appendices)
}
Binary file added posts/2023-07-24_rounding/rounding.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
217 changes: 217 additions & 0 deletions posts/2023-07-24_rounding/rounding.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,217 @@
---
title: "Rounding"
author:
- name: Kangjie Zhang
description: "Exploration of some commonly used rounding methods and their corresponding functions in SAS and R, with a focus on 'round half up' and reliable solutions for numerical precision challenges"
date: "2023-07-24"
# please do not use any non-default categories.
# You can find the default categories in the repository README.md
categories: [rounding]
# feel free to change the image
image: "rounding.png"
---

<!--------------- typical setup ----------------->

```{r setup, include=FALSE}
long_slug <- "2023-07-24_rounding"
# renv::use(lockfile = "renv.lock")
```

<!--------------- post begins here ----------------->

## Rounding methods

Both SAS and base R have the function `round()`, which rounds the input to the specified number of decimal places.
However, they use different approaches when rounding off a 5:

- SAS `round()` [rounds half up](https://en.wikipedia.org/wiki/Rounding#Rounding_half_up).
This is the most common method of rounding.

- base R `round()` [rounds to the nearest even](https://en.m.wikipedia.org/wiki/IEEE_754#Roundings_to_nearest), which matches SAS's `rounde()` function.

Although base R does not have the option for "round half up", there are functions available in other R packages (e.g., `janitor`, `tidytlg`).

In general, there are many often used rounding methods.
In the table below, you can find examples of them applied to the number 1.45.

+---------------+----------------------------+----------------------------+----------+------------+--------------------+
| | round half up | round to even | round up | round down | round towards zero |
+:=============:+:==========================:+:==========================:+:========:+:==========:+:==================:+
| Example: 1.45 | 1.5 | 1.4 | 2 | 1 | 1 |
| | | | | | |
| | (round to 1 decimal place) | (round to 1 decimal place) | | | |
+---------------+----------------------------+----------------------------+----------+------------+--------------------+

Here are the corresponding ways to implement these methods in SAS and R.

+---------+----------------------------------------+-----------------+-------------------+-----------------+--------------------+
| | round half up | round to even | round up | round down | round towards zero |
+:=======:+:======================================:+:===============:+:=================:+:===============:+:==================:+
| SAS | `round()` | `rounde()` | `ceil()` | `floor()` | `int()` |
+---------+----------------------------------------+-----------------+-------------------+-----------------+--------------------+
| R | ::: {style="background-color: yellow"} | <div> | <div> | <div> | <div> |
| | `janitor::round_half_up()` | | | | |
| | | `base::round()` | `base::ceiling()` | `base::floor()` | `base::trunc()` |
| | ::: {style="background-color: yellow"} | | | | |
| | <div> | </div> | </div> | </div> | </div> |
| | | | | | |
| | `tidytlg::roundSAS()` | | | | |
| | | | | | |
| | </div> | | | | |
| | ::: | | | | |
| | ::: | | | | |
+---------+----------------------------------------+-----------------+-------------------+-----------------+--------------------+

This table is summarized from links below, where more detailed discussions can be found -

- Two SAS blogs about [round-to-even](https://blogs.sas.com/content/iml/2019/11/11/round-to-even.html) and [rounding-up-rounding-down](https://blogs.sas.com/content/iml/2011/10/03/rounding-up-rounding-down.html)

- R documentation: Base R [Round](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Round.html), [`janitor::round_half_up()`](https://sfirke.github.io/janitor/reference/round_half_up.html), [`tidytlg::roundSAS()`](https://pharmaverse.github.io/tidytlg/main/reference/roundSAS.html)

- [CAMIS](https://psiaims.github.io/CAMIS/) (Comparing Analysis Method Implementations in Software): A cross-industry initiative to document discrepant results between software.
[Rounding](https://psiaims.github.io/CAMIS/Comp/r-sas_rounding.html) is one of the comparisons, and there are much more [on this page](https://psiaims.github.io/CAMIS/)!

## Round half up in R

The motivation for having a 'round half up' function is clear: it's a widely used rounding method, but there are no such options available in base R.

There are multiple forums that have discussed this topic, and quite a few functions already available.
But which ones to choose?
Are they safe options?

The first time I needed to round half up in R, I chose the function from a [PHUSE paper](https://www.lexjansen.com/phuse-us/2020/ct/CT05.pdf) and applied it to my study.
It works fine for a while until I encountered the following precision issue when double programming in R for TLGs made in SAS.

### Numerical precision issue

Example of rounding half up for 2436.845, with 2 decimal places:

```{r, message = FALSE}
# a function that rounds half up
# exact copy from: https://www.lexjansen.com/phuse-us/2020/ct/CT05.pdf
ut_round <- function(x, n = 0) {
# x is the value to be rounded
# n is the precision of the rounding
scale <- 10^n
y <- trunc(x * scale + sign(x) * 0.5) / scale
# Return the rounded number
return(y)
}
# round half up for 2436.845, with 2 decimal places
ut_round(2436.845, 2)
```

The expected result is 2436.85, but the output rounds it down.
Thanks to the community effort, there are already discussions and resolution available in a [StackOverflow post](https://stackoverflow.com/questions/12688717/round-up-from-5#comment110611119_12688836) -

> There are numerical precision issues, e.g., `round2(2436.845, 2)` returns `2436.84.` Changing `z + 0.5 to z + 0.5 + sqrt(.Machine$double.eps)` seems to work for me. -- Gregor Thomas Jun 24, 2020 at 2:16

- `.Machine$double.eps` is a built-in constant in R that represents the smallest positive floating-point number that can be represented on the system (reference: [Machine Characteristics](https://www.math.ucla.edu/~anderson/rw1001/library/base/html/zMachine.html))

- The expression `+ sqrt(.Machine$double.eps)` is used to add a very small value to mitigate floating-point precision issues.

- For more information about computational precision and floating-point, see the following links -

- R: [Why doesn't R think these numbers are equal?](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f)
- SAS: [Numerical Accuracy in SAS Software](https://documentation.sas.com/doc/en/lrcon/9.4/p0ji1unv6thm0dn1gp4t01a1u0g6.htm)

After the fix:

```{r, message = FALSE}
# revised rounds half up

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, I don't think we should do this. this will change a lot of behaviors, like ut_round1(2436.84499999999999, 2) gives 2436.85 while round(2436.84499999999999, 2) gives 2436.84 (which is correct)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkliming Thanks for the review! your example is a similar case to the one mentioned above: a <- 1.5 - 0.5*sqrt(.Machine$double.eps)

The reason for this behavior is that 2436.845 - 2436.84499999999999 is less than sqrt(.Machine$double.eps), the adjustment in the function is not big enough to round it up. When we encounter this situation, maybe we could handle it by the following options -

  1. accept this value and explain: 2436.845 and 2436.84499999999999 are nearly equal by using all.equal() function from base R with default tolerance value (default tolerance = sqrt(.Machine$double.eps)

    • the output from SAS round(2436.84499999999999, 0.01) is also 2436.85
  2. choose a smaller value than sqrt(.Machine$double.eps) or remove it to adjust this function

How would you do "round half up" in R if we don't choose those functions (ut_round1, janitor::round_half_up, ...)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this function will be used deeply inside other functions and large-scale used, so controlling the tolerance manually is not likely(which means that users can only apply one tolerance to all the numeric results obtained).

I will split your question into two: one is how do we deal with round half up, the other is how do we deal with rounding error.

My response to these two questions is the same: no action is needed.

For question 1, R is following IEEE 754, with default bankers rounding. What SAS provided seems reasonable, however, R is not wrong. This is just a different method. R does not need to copy SAS.

For question 2, about the accuracy loss in computation, I will also do nothing. This is an issue by design, and not possible to perfectly solve at the moment.

Both issues won't affect the interpretation of the result

So for me, introducing the fact that R and SAS are different in rounding should be sufficient.

Come back to the issue. The function ut_round1 tries to solve the rounding error arising from the design, but it did not. So it is important to also mention this point. (also, this issue is not possible to solve in the current design, this fix only fixes one part of the issue)

Copy link
Collaborator Author

@kaz462 kaz462 Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not trying to claim that base R's round function is wrong or we should strictly mimic SAS's behavior, but to provide round half up options in R. Similar as the intention of providing rounding option in Tplyr, offering users more flexibility - there may be contexts where 'round half up' is required.

What do you think of using the following examples to highlight those round_half_up functions do not offer the same level of precision and accuracy as the base R round function?
Please feel free to update this post directly, or let me know how to clarify this point :)

> a <- 1.5 - 0.5*sqrt(.Machine$double.eps)
> b <- 1.5 - 0.5*.Machine$double.eps
> janitor::round_half_up(a)
[1] 2
> janitor::round_half_up(b)
[1] 2
> round(a)
[1] 1
# base round reaches the precision limit:
> round(b)
[1] 2

ut_round1 <- function(x, n = 0) {
# x is the value to be rounded
# n is the precision of the rounding
scale <- 10^n
y <- trunc(x * scale + sign(x) * 0.5 + sqrt(.Machine$double.eps)) / scale
# Return the rounded number
return(y)
}
# round half up for 2436.845, with 2 decimal places
ut_round1(2436.845, 2)
```

### We are not alone

The same issue occurred in the following functions/options as well, and has been raised by users:

- `janitor::round_half_up()`: [issue](https://github.com/sfirke/janitor/issues/396) was raised and fixed in v2.1.0

- `Tplyr`: `options(tplyr.IBMRounding = TRUE)`, [issue](https://github.com/atorus-research/Tplyr/issues/124) was raised

- `scrutiny::round_up_from()/round_up()`: [issue](https://github.com/lhdjung/scrutiny/issues/43) was raised and fixed

- \...
and many others!

### Which ones to use?

The following functions have the precision issue mentioned above fixed, they all share the same logic from this [StackOverflow post](https://stackoverflow.com/questions/12688717/round-up-from-5).

- [`janitor::round_half_up()`](https://sfirke.github.io/janitor/reference/round_half_up.html) **version \>= 2.1.0**
- [`tidytlg::roundSAS()`](https://pharmaverse.github.io/tidytlg/main/reference/roundSAS.html)
- this function has two more arguments that can convert the result to character and allow a character string to indicate missing values
- [`scrutiny::round_up_from()/round_up()`](https://lhdjung.github.io/scrutiny/reference/rounding-common.html) **version \>= 0.2.5**
- `round_up_from()` has a `threshold` argument for rounding up, which adds flexibility for rounding up

- `round_up()` rounds up from 5, which is a special case of `round_up_from()`

### Are they safe options?

Those "round half up" functions do not offer the same level of precision and accuracy as the base R round function.

For example, let's consider a value `a` that is slightly less than `1.5`.
If we choose round half up approach to round `a` to 0 decimal places, an output of `1` is expected.
However, those functions yield a result of `2` because `1.5 - a` is less than `sqrt(.Machine$double.eps)`.

```{r, message = FALSE}
a <- 1.5 - 0.5 * sqrt(.Machine$double.eps)
ut_round1(a, 0)
janitor::round_half_up(a, digits = 0)
```

This behavior aligns the floating point number comparison functions `all.equal()` and `dplyr::near()` with default tolerance `.Machine$double.eps^0.5`, where 1.5 and `a` are treated as equal.

```{r, message = FALSE}
all.equal(a, 1.5)
dplyr::near(a, 1.5)
```

We can get the expected results from base R `round` as it provides greater accuracy.

```{r, message = FALSE}
round(a)
```

Here is an example when base R `round` reaches the precision limit:

```{r, message=FALSE}
# b is slightly less than 1.5
b <- 1.5 - 0.5 * .Machine$double.eps
# 1 is expected but the result is 2
round(b)
```

The precision and accuracy requirements can vary depending on the application.
Therefore, it is essential to be aware each function's performance in your specific context before making a choice.

## Conclusion

> With the differences in default behaviour across languages, you could consider your QC strategy and whether an acceptable level of fuzz in the electronic comparisons could be allowed for cases such as rounding when making comparisons between 2 codes written in different languages as long as this is documented.
> Alternatively you could document the exact rounding approach to be used in the SAP and then match this regardless of programming language used.
> - Ross Farrugia

Thanks Ross Farrugia, Ben Straub, Edoardo Mancini and Liming for reviewing this blog post and providing valuable feedback!

If you spot an issue or have different opinions, please don't hesitate to raise them through [pharmaverse/blog](https://github.com/pharmaverse/blog)!

<!--------------- appendices go here ----------------->

```{r, echo=FALSE}
source("appendix.R")
insert_appendix(
repo_spec = "pharmaverse/blog",
name = long_slug
)
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor thing, and perhaps you don't want to do this, but I thought I would mention anyway - would it be worth adding a section looking a the bigger picture, and whether how far we should be trying to go to match SAS rounding? I'm thinking about the fact that if we are trying to move to R-based submissions, then some of this stuff should in a way be expected and accepted. My idea comes from this comment by @rossfarrugia .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when moving from SAS to R submission, what kind of discrepancy are expected/accepted for rounding?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to add context to my comment before... when using same language for double programming QC often teams would strive for a 100% perfect comparison match, whereas when working in a multi-lingual world maybe it's ok to not perfectly match if differences can be explained in audit-ready QC evidence to explain differences are down to the use of different languages. e.g. a % in first line output with R is 1% and in QC with SAS it is 2%, and upon inspection the QC'er sees that the actual value is 1.5% rounded, hence just add a comment in QC evidence to explain such differences are considered acceptable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. a % in first line output with R is 1% and in QC with SAS it is 2%, and upon inspection the QC'er sees that the actual value is 1.5% rounded, hence just add a comment in QC evidence to explain such differences are considered acceptable.

This discrepancy is not caused by SAS/R, but by different rounding approaches, which I think should be based on the study SAP instead of the software. Should the rounding method be clarified first, then choose the corresponding functions in SAS/R?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd doubt most SAPs would go into such detail as nobody really ever questioned rounding approaches when we only used single language. It's a fair point though. I feel like a closing note such as "Note: with the differences in default behaviour across languages, you could consider your QC strategy and whether an acceptable level of fuzz in the electronic comparisons could be allowed for cases such as rounding when making comparisons between 2 codes written in different languages as long as this is documented. Alternatively you could document the exact rounding approach to be used in the SAP and then match this regardless of programming language used."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good summary, I will add it next week after the holiday.
The reason I prefer to align with the rounding method during QC is that when we accept the difference, it's easy to overlook the differences caused by other reasons., e.g., RConsortium/submissions-pilot3-adam#99

Thanks @rossfarrugia @clarkliming for your comments and discussion!