Skip to content

Commit

Permalink
GH-43440: [R] Unable to filter a factor column with %in% (#43446)
Browse files Browse the repository at this point in the history
### Rationale for this change

Fixes #43440

### What changes are included in this PR?

The binding for `%in%` sends the DictionaryType's `value_type` to
`cast_or_parse()`. It's possible that it would be better to handle this
in `cast_or_parse()`, but it is used in lots of places and I wasn't sure
that was correct everywhere. We could certainly find out, but that's a
bigger testing exercise than I wanted to take on this afternoon.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

The bug is fixed.
* GitHub Issue: #43440
  • Loading branch information
nealrichardson authored Sep 22, 2024
1 parent 37f62d0 commit 81b94dc
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 2 deletions.
11 changes: 9 additions & 2 deletions r/R/dplyr-funcs-conditional.R
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,16 @@ register_bindings_conditional <- function() {
value_set <- Array$create(table)
# If possible, `table` should be the same type as `x`
# Try downcasting here; otherwise Acero may upcast x to table's type
x_type <- x$type()
# GH-43440: `is_in` doesn't want a DictionaryType in the value_set,
# so we'll cast to its value_type
# TODO: should this be pushed into cast_or_parse? Is this a bigger issue?
if (inherits(x_type, "DictionaryType")) {
x_type <- x_type$value_type
}
try(
value_set <- cast_or_parse(value_set, x$type()),
silent = TRUE
value_set <- cast_or_parse(value_set, x_type),
silent = !getOption("arrow.debug", FALSE)
)

expr <- Expression$create("is_in", x,
Expand Down
10 changes: 10 additions & 0 deletions r/tests/testthat/test-dplyr-funcs-conditional.R
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,16 @@ tbl <- example_data
tbl$verses <- verses[[1]]
tbl$another_chr <- tail(letters, 10)

test_that("%in% handles dictionary type", {
df <- tibble::tibble(x = factor(c("a", "b", "c")))
compare_dplyr_binding(
.input %>%
filter(x %in% "a") %>%
collect(),
df
)
})

test_that("if_else and ifelse", {
compare_dplyr_binding(
.input %>%
Expand Down

0 comments on commit 81b94dc

Please sign in to comment.