sfirke#563 add set_labels to clean names
jospueyo committed Jan 21, 2024
1 parent 2583cf0 commit b596932
Showing 6 changed files with 55 additions and 9 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
```diff
@@ -8,7 +8,8 @@ Authors@R: c(
     person("Ryan", "Knight", , "[email protected]", role = "ctb"),
     person("Malte", "Grosser", , "[email protected]", role = "ctb"),
     person("Jonathan", "Zadra", , "[email protected]", role = "ctb"),
-    person("Olivier", "Roy", role = "ctb")
+    person("Olivier", "Roy", role = "ctb"),
+    person("Josep", "Pueyo-Ros", "[email protected]", role = "ctb")
   )
 Description: The main janitor functions can: perfectly format data.frame
     column names; provide quick counts of variable combinations (i.e.,
```

4 changes: 3 additions & 1 deletion NEWS.md
```diff
@@ -14,6 +14,8 @@ These are all minor breaking changes resulting from enhancements and are not exp
 
 * The new function `excel_time_to_numeric()` converts times from Excel that do not have accompanying dates into a number of seconds. (#245, thanks to **@billdenney** for the feature.)
 
+* A new argument `set_labels` to `clean_names()` stores the old names as labels in each column. Variable labels are visualized in Rstudio's data viewer or used by default by some packages such as `gt` instead of variable names. Labels can also be used in ggplot labels thanks to the function `easy_labs()` in the `ggeasy` package. Read this wonderful [post](https://www.pipinghotdata.com/posts/2022-09-13-the-case-for-variable-labels-in-r/) for more info about column labels. (#563, thanks to **@jospueyo** for the feature).
+
 ## Bug fixes
 
 * `adorn_totals("row")` now succeeds if the new `name` of the totals row is already a factor level of the input data.frame (#529, thanks @egozoglu for reporting).
@@ -22,7 +24,7 @@ These are all minor breaking changes resulting from enhancements and are not exp
 
 * `get_one_to_one()` no longer errors with near-equal values that become identical factor levels (fix #543, thanks to @olivroy for reporting)
 
-# Refactoring
+## Refactoring
 
 * Remove dplyr verbs superseded in dplyr 1.0.0 (#547, @olivroy)
 
```

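The NEWS entry says that viewers and packages such as `gt` can pick up these labels. Because `set_labels = TRUE` stores them as ordinary `"label"` attributes, base R can read them back as well. A minimal sketch, where `column_labels()` is a hypothetical helper (not part of janitor) that falls back to the column name for unlabeled columns, and the labels are set by hand to mimic what `clean_names(dat, set_labels = TRUE)` would leave behind:

```r
# Hypothetical helper, not part of janitor: return each column's "label"
# attribute, falling back to the column name when no label is set.
column_labels <- function(dat) {
  vapply(names(dat), function(nm) {
    lab <- attr(dat[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else as.character(lab)
  }, character(1))
}

# Hand-made stand-in for the output of clean_names(dat, set_labels = TRUE):
# clean column names, with the original names stored as labels.
df <- data.frame(a_a = c(11, 22), b_b = c(2, 3), z = 1:2)
attr(df$a_a, "label") <- "a a"
attr(df$b_b, "label") <- "b b"

column_labels(df)  # c(a_a = "a a", b_b = "b b", z = "z")
```
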
17 changes: 14 additions & 3 deletions R/clean_names.R
```diff
@@ -24,6 +24,7 @@
 #' (characters) to "u".
 #'
 #' @param dat The input `data.frame`.
+#' @param set_labels If set to `TRUE`, old names are stored as labels in each column of `dat`.
 #' @inheritDotParams make_clean_names -string
 #' @return A `data.frame` with clean names.
 #'
@@ -65,13 +66,13 @@
 #' x %>%
 #'   clean_names(case = "upper_camel", abbreviations = c("ID", "DOB"))
 #'
-clean_names <- function(dat, ...) {
+clean_names <- function(dat, ..., set_labels = FALSE) {
   UseMethod("clean_names")
 }
 
 #' @rdname clean_names
 #' @export
-clean_names.default <- function(dat, ...) {
+clean_names.default <- function(dat, ..., set_labels = FALSE) {
   if (is.null(names(dat)) && is.null(dimnames(dat))) {
     stop(
       "`clean_names()` requires that either names or dimnames be non-null.",
@@ -81,14 +82,19 @@ clean_names.default <- function(dat, ...) {
   if (is.null(names(dat))) {
     dimnames(dat) <- lapply(dimnames(dat), make_clean_names, ...)
   } else {
+    if (set_labels){
+      old_names <- names(dat)
+      for (i in seq_along(old_names)) attr(dat[[i]], "label") <- old_names[[i]]
+    }
     names(dat) <- make_clean_names(names(dat), ...)
+
   }
   dat
 }
 
 #' @rdname clean_names
 #' @export
-clean_names.sf <- function(dat, ...) {
+clean_names.sf <- function(dat, ..., set_labels = FALSE) {
   if (!requireNamespace("sf", quietly = TRUE)) { # nocov start
     stop(
       "Package 'sf' needed for this function to work. Please install it.",
@@ -103,6 +109,10 @@ clean_names.sf <- function(dat, ...) {
   sf_cleaned <- make_clean_names(sf_names[1:n_cols], ...)
   # rename original df
   names(dat)[1:n_cols] <- sf_cleaned
+
+  if(set_labels){
+    for (i in seq_along(sf_names[1:n_cols])) attr(dat[[i]], "label") <- sf_names[[i]]
+  }
 
   return(dat)
 }
@@ -116,6 +126,7 @@ clean_names.tbl_graph <- function(dat, ...) {
       call. = FALSE
     )
   } # nocov end
+
   dplyr::rename_all(dat, .funs = make_clean_names, ...)
 }
 
```

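The new branch in `clean_names.default()` stores each old name as a `"label"` attribute on its column and then renames the columns. A base-R sketch of the same idea, assuming a deliberately simplified, hypothetical `simple_clean()` in place of janitor's `make_clean_names()` (which does considerably more); `clean_with_labels()` is likewise a hypothetical name used only for illustration:

```r
# Hypothetical stand-in for janitor::make_clean_names(): lower-cases names,
# turns runs of non-alphanumeric characters into "_" and trims edge "_".
# janitor's real function does considerably more.
simple_clean <- function(x) {
  x <- gsub("[^a-z0-9]+", "_", tolower(x))
  gsub("^_|_$", "", x)
}

# Sketch of the set_labels logic: store each old name in its column's
# "label" attribute, then replace the names with cleaned ones.
clean_with_labels <- function(dat, set_labels = FALSE) {
  old_names <- names(dat)
  if (set_labels) {
    for (i in seq_along(old_names)) attr(dat[[i]], "label") <- old_names[[i]]
  }
  names(dat) <- simple_clean(old_names)
  dat
}

messy <- data.frame(`a a` = c(11, 22), `b b` = c(2, 3), check.names = FALSE)
cleaned <- clean_with_labels(messy, set_labels = TRUE)

names(cleaned)                          # c("a_a", "b_b")
sapply(cleaned, attr, which = "label")  # c(a_a = "a a", b_b = "b b")
```

Capturing `old_names` before renaming keeps the loop independent of the cleaned names. In the committed code the loop sits only in the branch where `names(dat)` is non-null, so the `dimnames` path for inputs without names sets no labels.
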
8 changes: 5 additions & 3 deletions man/clean_names.Rd

1 change: 1 addition & 0 deletions man/janitor-package.Rd

31 changes: 30 additions & 1 deletion tests/testthat/test-clean-names.R
```diff
@@ -186,6 +186,35 @@ test_that("do not create duplicates (fix #251)", {
   )
 })
 
+test_that("labels are created in default and sf methods (feature request #563)", {
+  dat_df <- dplyr::tibble(`a a` = c(11, 22), `b b` = c(2, 3))
+  dat_df_clean_labels <- clean_names(dat_df, set_labels = TRUE)
+  dat_df_clean <- clean_names(dat_df)
+
+  dat_sf <- dat_df
+  dat_sf$x <- c(1,2)
+  dat_sf$y = c(1,2)
+  dat_sf <- sf::st_as_sf(dat_sf, coords = c("x", "y"))
+  dat_sf_clean_labels <- clean_names(dat_sf, set_labels = TRUE)
+  dat_sf_clean <- clean_names(dat_sf)
+
+  for (i in seq_along(names(dat_df))){
+    # check that old names are saved as labels when set_labels is TRUE
+    expect_equal(attr(dat_df_clean_labels[[i]], "label"), names(dat_df)[[i]])
+    expect_equal(attr(dat_sf_clean_labels[[i]], "label"), names(dat_sf)[[i]])
+
+    # check that old names are not stored if set_labels is not TRUE
+    expect_null(attr(dat_df_clean[[i]], "label"))
+    expect_null(attr(dat_sf_clean[[i]], "label"))
+  }
+
+  # expect names are always cleaned
+  expect_equal(names(dat_df_clean), c("a_a", "b_b"))
+  expect_equal(names(dat_df_clean_labels), c("a_a", "b_b"))
+  expect_equal(names(dat_sf_clean), c("a_a", "b_b", "geometry"))
+  expect_equal(names(dat_sf_clean_labels), c("a_a", "b_b", "geometry"))
+})
+
 
 test_that("allow for duplicates (fix #495)", {
   expect_equal(
@@ -589,7 +618,7 @@ test_that("tbl_graph/tidygraph", {
     tidygraph::play_erdos_renyi(10, 0.5) %>%
       # create nodes wi
       tidygraph::bind_nodes(test_df) %>%
-      dplyr::mutate(dplyr::across(dplyr::where(is.numeric), ~ dplyr::coalesce(x, 1)))
+      dplyr::mutate(dplyr::across(dplyr::where(is.numeric), \(x) dplyr::coalesce(x, 1)))
 
   # create a graph with clean names
   # warning due to unhandled mu
```

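The last hunk swaps a purrr-style formula for R's native anonymous-function shorthand. In a formula such as `~ dplyr::coalesce(x, 1)`, the current column is available as `.x` (or `.`), not as `x`, whereas `\(x) dplyr::coalesce(x, 1)` names its argument `x` explicitly. A dependency-free base-R analogue of the updated line, assuming `replace()` as a stand-in for `dplyr::coalesce()` and an illustrative data frame `tbl` (the `\(x)` syntax needs R 4.1.0 or later):

```r
# Base-R analogue of
#   dplyr::mutate(dplyr::across(dplyr::where(is.numeric), \(x) dplyr::coalesce(x, 1)))
# replace() stands in for dplyr::coalesce(): NAs in numeric columns become 1.
tbl <- data.frame(a = c(1, NA), b = c("p", NA), c = c(NA, 3))

# Pick the numeric columns, as dplyr::where(is.numeric) would.
is_num <- vapply(tbl, is.numeric, logical(1))

# `\(x)` is shorthand for `function(x)`; the argument is explicitly named x.
tbl[is_num] <- lapply(tbl[is_num], \(x) replace(x, is.na(x), 1))
tbl
```

The character column `b` is left untouched, including its `NA`, because only columns selected by `is_num` are rewritten.
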
