You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In remove_empty(), the documentation of the cutoff argument states:
What fraction (>0 to <=1) of rows or columns must be empty to be removed?
The bold on "empty" is mine. This suggests that the highercutoff is, more rows are supposed to be kept, since the condition to remove them is stricter.
Nevertheless, when you use different values of cutoff, the opposite behavior is observed; the highercutoff is, less rows are kept. Please see the reprex for evidence.
I see two optional solutions:
Change the behavior of cutoff, so that it would behave as described in the documentation. This has the major drawback of being a drastically breaking change. The source for this behavior lies in rows 60 and 74 of remove_empty.R, and a possible solution might be to drop ! from the calculation.
Change the documentation of cutoff, so it would be clear what behavior is expected.
Happy to hear your thoughts on this!
library(tidyr)
library(janitor)
#> #> Attaching package: 'janitor'#> The following objects are masked from 'package:stats':#> #> chisq.test, fisher.testbillboard|>
nrow()
#> [1] 317billboard|>janitor::remove_empty("rows", cutoff=0.9) |>
nrow()
#> [1] 0billboard|>janitor::remove_empty("rows", cutoff=0.1) |>
nrow()
#> [1] 293
Thanks for the report, and I see what you mean. In general, we have a strong preference to avoid breaking changes. When documentation is out of line with behavior, let's fix the documentation (unless it's a honest-to-goodness bug or the behavior is catastrophically bad or ...).
Can you please suggest revised wording that would be clearer? (Feel free to either do it here in the issue or via PR.)
matanhakim
added a commit
to matanhakim/janitor
that referenced
this issue
Mar 2, 2024
In
remove_empty()
, the documentation of thecutoff
argument states:The bold on "empty" is mine. This suggests that the higher
cutoff
is, more rows are supposed to be kept, since the condition to remove them is stricter.Nevertheless, when you use different values of
cutoff
, the opposite behavior is observed; the highercutoff
is, less rows are kept. Please see the reprex for evidence.I see two optional solutions:
cutoff
, so that it would behave as described in the documentation. This has the major drawback of being a drastically breaking change. The source for this behavior lies in rows 60 and 74 ofremove_empty.R
, and a possible solution might be to drop!
from the calculation.cutoff
, so it would be clear what behavior is expected.Happy to hear your thoughts on this!
Created on 2024-02-26 with reprex v2.0.2
The text was updated successfully, but these errors were encountered: