feat: `match_name` gains `join_id` col, allowing for an initial matching override based on some unique ID column #460
@jacobvjk requesting your review here for some input already
and cc @cjyetman and @AlexAxthelm for visibility
Allowing a named character vector, e.g.
Yup, that's what I was thinking too!
I think it may make sense to only allow a single join column for now.
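To make the "single named character vector" idea concrete, here is a minimal base-R sketch of how such an argument could be validated. `check_join_id()` is a hypothetical helper written for illustration, not a function from r2dii.match, and treating an unnamed value as "same column name in both datasets" is an assumption rather than confirmed behaviour of this PR.

```r
# Hypothetical helper (not part of r2dii.match): accept exactly one join
# column, given either as "col" or as c(loanbook_col = "abcd_col").
check_join_id <- function(join_id) {
  if (!is.character(join_id) || length(join_id) != 1L || is.na(join_id)) {
    stop("`join_id` must be a character vector of length 1.", call. = FALSE)
  }
  # Assumption: an unnamed value means the column has the same name
  # in both the loanbook and the abcd dataset.
  if (is.null(names(join_id)) || names(join_id) == "") {
    names(join_id) <- join_id
  }
  join_id
}

check_join_id("lei")
check_join_id(c(lei_direct_loantaker = "lei"))
```

Restricting the input to length 1 keeps the error handling simple; multiple join columns could be allowed later without changing the named-vector shape.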
particularly unclear about removing the `join_id` after the `inner_join` and before the `filter`.
the rest seems quite reasonable
So there's a bunch of failing checks that I will look into later (I have a feeling it's because of changes in But the core functionality seems to be there :-)
This seems to function now as expected, with
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)
# 441 points to the company Russo, Russo e Russo Group
abcd_demo <- dplyr::mutate(
  abcd_demo,
  lei = dplyr::case_when(
    company_id == "441" ~ "LEI123",
    TRUE ~ lei
  )
)
# C267 points to the company Russo s.r.l.
loanbook_demo <- dplyr::mutate(
  loanbook_demo,
  lei_direct_loantaker = dplyr::case_when(
    id_direct_loantaker == "C267" ~ "LEI123",
    TRUE ~ lei_direct_loantaker
  )
)
# Match on the shared LEI first: loanbook column name = abcd column name
out <- match_name(
  loanbook_demo,
  abcd_demo,
  join_id = c(lei_direct_loantaker = "lei")
)
prioritized <- prioritize(out)
prioritized |>
  filter(id_direct_loantaker == "C267") |>
  select(id_direct_loantaker, lei_direct_loantaker, level, source, score, name, name_abcd)
#> # A tibble: 1 × 7
#>   id_direct_loantaker lei_direct_loantaker level    source score name  name_abcd
#>   <chr>               <chr>                <chr>    <chr>  <dbl> <chr> <chr>
#> 1 C267                LEI123               lei_dir… id jo…     1 Russ… Russo, R…
Created on 2024-03-12 with reprex v2.1.0
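For readers who want the idea behind the ID override without the package internals, the following is a rough base-R sketch of a two-stage match: join on the ID column first, then leave only the unmatched loans for name matching. The toy data, the column `name_direct_loantaker`, and the variable names are illustrative assumptions; this is not the implementation in this PR.

```r
# Toy data; column names mirror the reprex, values are made up.
loanbook <- data.frame(
  id_direct_loantaker = c("C1", "C2"),
  name_direct_loantaker = c("Alpha Ltd", "Beta GmbH"),
  lei_direct_loantaker = c("LEI123", NA),
  stringsAsFactors = FALSE
)
abcd <- data.frame(
  name_company = c("Alpha Limited", "Beta GmbH"),
  lei = c("LEI123", NA),
  stringsAsFactors = FALSE
)

# Step 1: exact join on the ID column. Rows with a missing ID are dropped
# first, because base merge() would otherwise match NA to NA.
lbk_with_id <- loanbook[!is.na(loanbook$lei_direct_loantaker), ]
abcd_with_id <- abcd[!is.na(abcd$lei), ]
id_matched <- merge(
  lbk_with_id, abcd_with_id,
  by.x = "lei_direct_loantaker", by.y = "lei"
)
id_matched$score <- 1  # an ID match is treated as a perfect match

# Step 2: only loans not matched by ID go on to (fuzzy) name matching.
needs_name_match <- loanbook[
  !loanbook$id_direct_loantaker %in% id_matched$id_direct_loantaker,
]
```

In this sketch "C1" is matched through its LEI even though the names differ, while "C2" has no LEI and falls through to name matching.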
makes sense to me
None of the comments are show stoppers, but I would like to hear your thoughts before I approve. I think the error messages are a bit confusing.
Co-authored-by: CJ Yetman <[email protected]>
still two instances mentioning list inputs
Co-authored-by: Jacob Kastl <[email protected]>
lgtm
This ID column could be, for example, `lei` or `isin`.
Some open questions I have before marking this as ready for review:
- `loanbook` column name (e.g. `lei_direct_loantaker`), and another column indicating the `abcd` column name (e.g. `lei`)
- `loanbook` with multiple identical LEIs, and an `abcd` with several `name_company` values that have the same LEI. Need to consider what to do in that case
See reprex:
Created on 2024-03-07 with reprex v2.1.0
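The open question about repeated LEIs can be illustrated with a small base-R sketch. It uses made-up data and plain `merge()` rather than `match_name()`, so it only shows why the case needs a decision, not how this PR resolves it.

```r
# Two loans share one LEI, and abcd lists two company names for that LEI.
loanbook <- data.frame(
  id_direct_loantaker = c("C1", "C2"),
  lei_direct_loantaker = c("LEI123", "LEI123"),
  stringsAsFactors = FALSE
)
abcd <- data.frame(
  name_company = c("Alpha Ltd", "Alpha Holding"),
  lei = c("LEI123", "LEI123"),
  stringsAsFactors = FALSE
)

# A plain join is many-to-many here: every loan pairs with every company name.
joined <- merge(loanbook, abcd, by.x = "lei_direct_loantaker", by.y = "lei")
nrow(joined)  # 2 loans x 2 company names = 4 rows

# One possible guard: flag IDs that map to more than one distinct company name.
names_per_id <- tapply(abcd$name_company, abcd$lei, function(x) length(unique(x)))
ambiguous_ids <- names(names_per_id)[names_per_id > 1]
ambiguous_ids
```

Whether such IDs should raise an error, raise a warning, or simply pass all candidate rows on to `prioritize()` is exactly the design choice left open above.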
Closes #135