Commit
Apply our faster implementation of word throughout (#220)
As part of #196, we found that stringr::word was quite slow, so we implemented a faster version. This PR makes the new word function a private function, accessible via APCalign:::word; adds tests for the new function; and extends use of the new function throughout the package.

Co-authored-by: ehwenk <[email protected]>
Showing 8 changed files with 118 additions and 50 deletions.
@@ -0,0 +1,47 @@
#' Extract words from a sentence. Intended as a faster
#' replacement for stringr::word
#'
#' @param string A character vector
#'
#' @param start,end Pair of integers giving the range of words (inclusive)
#'   to extract. The default value selects the first word.
#' @param sep Separator between words. Defaults to a single space.
#' @return A character vector with the same length as `string`.
#'
#' @examples
#' spp <- c("Banksia serrata", "Actinotus helanthii")
#' APCalign:::word(spp, 1)
#' APCalign:::word(spp, 2)
word <- function(string, start = 1L, end = start, sep = " ") {
  if (end == start) {
    # Single word: split on `sep` and keep the start-th piece (NA if absent)
    stringr::str_split_i(string, sep, start)
  } else if (end == start + 1) {
    # Two words: extract each one separately, then rejoin with `sep`
    w1 <- stringr::str_split_i(string, sep, start)
    w2 <- stringr::str_split_i(string, sep, start + 1)

    out <- paste(w1, w2, sep = sep)
    # Match stringr::word: NA when the string has too few words
    out[is.na(w2)] <- NA_character_

    return(out)
  } else if (end == start + 2) {
    # Three words: same approach as the two-word case
    w1 <- stringr::str_split_i(string, sep, start)
    w2 <- stringr::str_split_i(string, sep, start + 1)
    w3 <- stringr::str_split_i(string, sep, start + 2)

    out <- paste(w1, w2, w3, sep = sep)
    out[is.na(w2) | is.na(w3)] <- NA_character_

    return(out)
  } else {
    # General case: split each string once, then paste the requested range
    i <- seq(start, end)

    txt <- stringr::str_split(string, sep)
    # map_chr (not map) so the result is a character vector, not a list
    out <- purrr::map_chr(txt, ~ paste(.x[i], collapse = sep))

    # Strings with fewer than `end` words get NA, as in stringr::word
    lngth <- purrr::map_int(txt, length)
    out[lngth < end] <- NA_character_

    return(out)
  }
}
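For readers who want the contract of word() spelled out without stringr or purrr, here is a minimal base-R sketch. It is not the package's implementation: the name word_base() is hypothetical, it splits with strsplit() on a literal separator (fixed = TRUE) rather than the regex pattern that str_split_i() takes, and it assumes scalar start and end, as word() does.

```r
# Hypothetical dependency-free sketch (not APCalign code): return words
# start..end (inclusive) of each string, joined by `sep`, or NA when the
# string is NA or has fewer than `end` words.
word_base <- function(string, start = 1L, end = start, sep = " ") {
  # Split every string once on the literal separator
  pieces <- strsplit(string, sep, fixed = TRUE)
  vapply(pieces, function(w) {
    # strsplit(NA) yields a length-1 NA, so NA input is caught by anyNA()
    if (length(w) < end || anyNA(w)) {
      return(NA_character_)
    }
    paste(w[start:end], collapse = sep)
  }, character(1))
}

spp <- c("Banksia serrata", "Hibbertia", NA, "Hibbertia sericea var silliafolius")
word_base(spp, 1)     # c("Banksia", "Hibbertia", NA, "Hibbertia")
word_base(spp, 1, 2)  # c("Banksia serrata", NA, NA, "Hibbertia sericea")
word_base(spp, 2, 4)  # c(NA, NA, NA, "sericea var silliafolius")
```

One caveat of this sketch: strsplit() drops a trailing empty field, so a string that ends in the separator can give NA here where stringr::word returns an empty string.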
@@ -0,0 +1,24 @@
test_that("Word", {

  taxa <-
    c(
      NA,
      "Banksia integrifolia",
      "Acacia longifolia",
      "Commersonia rosea",
      "Thelymitra pauciflora",
      "Justicia procumbens",
      "Hibbertia",
      "Rostellularia long leaves",
      "Hibbertia sericea var silliafolius",
      "Hibbertia sp.",
      "x Cynochloris macivorii",
      "(Dockrillia pugioniformis x Dockrillia striolata) x Dockrillia pugioniformis"
    )

  expect_equal(APCalign:::word(taxa, 1), stringr::word(taxa, 1))
  expect_equal(APCalign:::word(taxa, 2), stringr::word(taxa, 2))
  expect_equal(APCalign:::word(taxa, 3), stringr::word(taxa, 3))
  expect_equal(APCalign:::word(taxa, 1, 2), stringr::word(taxa, 1, 2))
  expect_equal(APCalign:::word(taxa, 1, 3), stringr::word(taxa, 1, 3))
})
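The test above covers single words and ranges of two and three words, which exercise the specialised branches of word(); the general branch for longer ranges is not compared against stringr::word. The base-R sketch below (illustrative only, not package code; the example strings are just two names from the test data) shows the two things that branch has to get right: masking strings that are too short, and returning a character vector rather than a list.

```r
# Two illustrative names, each split once on a literal space
txt <- strsplit(c("Hibbertia sericea var silliafolius", "Banksia serrata"),
                " ", fixed = TRUE)
i <- 1:4  # request words 1 to 4 (inclusive)

# 1. Indexing past the end gives NA, and paste() renders NA as the text "NA",
#    so before masking the second element is "Banksia serrata NA NA"
joined <- vapply(txt, function(w) paste(w[i], collapse = " "), character(1))

# Strings with fewer than max(i) words therefore have to be masked explicitly
joined[lengths(txt) < max(i)] <- NA_character_

# 2. lapply() (like purrr::map()) returns a list, whereas vapply()
#    (like purrr::map_chr()) returns a character vector, as stringr::word() does
as_list <- lapply(txt, function(w) paste(w[i], collapse = " "))
is.list(as_list)      # TRUE
is.character(joined)  # TRUE
```

A test comparing APCalign:::word(taxa, 1, 4) with stringr::word(taxa, 1, 4) would be one way to cover this branch.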