Add filterPrecursorMzValue to support filtering by multiple m/z values #232
- Add the `filterPrecursorMzValues` method to enable filtering based on multiple target precursor m/z values.
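To make the intended behaviour concrete, here is a naive base R sketch of matching values against several target m/z values at once. The helper `match_any_mz` is hypothetical and not part of Spectra or MsCoreUtils, and the acceptance window (`tolerance` plus `ppm` of the target value) is an assumption made for illustration; the actual implementation discussed in this PR relies on `closest`/`common` on sorted input.

```r
# Hypothetical helper for illustration only (not the PR's implementation).
# Returns the indices of `x` that lie within the accepted window of at
# least one target value in `mz`. Runs in O(length(x) * length(mz)).
match_any_mz <- function(x, mz, ppm = 20, tolerance = 0) {
    hits <- vapply(x, function(xi) {
        if (is.na(xi))
            return(FALSE)                  # missing values never match
        # Assumption: window = absolute tolerance + ppm of the target.
        any(abs(xi - mz) <= tolerance + mz * ppm / 1e6)
    }, logical(1))
    which(hits)
}

pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)
res <- match_any_mz(pmz, c(200, 12.4, 3))
res
```

With the example vector used later in this thread, the sketch selects every element equal to 12.4 or 3 and skips the `NA`.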
Fine! I found a typo and would like to suggest a minor change: using `common` instead of `closest`.
R/functions-util.R
Outdated
mtch <- closest(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
                duplicates = "keep", .check = FALSE)
keep[!is.na(mtch[order(idx)])]
Isn't the order irrelevant here?
- mtch <- closest(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
-                 duplicates = "keep", .check = FALSE)
- keep[!is.na(mtch[order(idx)])]
+ cmn <- common(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
+               duplicates = "keep", .check = FALSE)
+ keep[cmn]
EDIT: or `keep[idx][cmn]`? It's already late and I am a bit confused (it's hard to tell without trying it).
The `order` and `idx` were important (otherwise I wouldn't do such costly operations), but I have to check it again. Thanks for the `common` suggestion; I'll give that a try and see how it works/compares.
Update: I get the same results with:
.values_match_mz2 <- function(x, mz, ppm = 20, tolerance = 0) {
    keep <- which(!is.na(x))
    idx <- order(x[keep])
    cmn <- common(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
                  duplicates = "keep", .check = FALSE)
    sort(keep[idx][cmn])
}
(i.e. returning `sort(keep[idx][cmn])`)
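A small base R sketch of why the result has to be mapped back through `idx`: the logical vector refers to positions in the sorted, NA-free values, not to positions in `keep`. To keep the example free of dependencies, exact matching with `%in%` is used as a stand-in for `common`; that substitution is an assumption for illustration only.

```r
x <- c(12.4, 15, 3, NA, 3)
mz <- c(12.4, 3)

keep <- which(!is.na(x))      # positions of non-missing values in x
idx <- order(x[keep])         # ordering of the non-missing values

# Stand-in for common(): TRUE where a sorted value matches a target.
cmn <- x[keep][idx] %in% mz

wrong <- keep[cmn]            # treats sorted positions as positions in keep
right <- sort(keep[idx][cmn]) # maps sorted positions back to positions in x

wrong
right
```

Here `wrong` picks up the position of 15 and misses the last 3, while `right` agrees with `which(x %in% mz)`.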
And in fact your implementation is still (slightly) faster:
> pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)
> mz <- c(200, 12.4, 3)
> library(microbenchmark)
> microbenchmark(.values_match_mz(pmz, mz), .values_match_mz2(pmz, mz))
Unit: microseconds
expr min lq mean median uq max
.values_match_mz(pmz, mz) 235.827 272.2785 292.6092 284.0745 317.7635 416.735
.values_match_mz2(pmz, mz) 178.463 205.5580 221.5553 212.7210 224.7145 446.566
neval cld
100 b
100 a
> pmz <- rep(pmz, 100)
> mz <- rep(mz, 100)
> microbenchmark(.values_match_mz(pmz, mz), .values_match_mz2(pmz, mz))
Unit: microseconds
expr min lq mean median uq max
.values_match_mz(pmz, mz) 318.139 349.4490 394.0592 376.8095 417.4760 704.230
.values_match_mz2(pmz, mz) 267.281 298.0505 330.7795 320.6225 349.9465 572.168
neval cld
100 b
100 a
I've now changed it to your implementation.
Unfortunately I don't have an idea how to remove the last `sort` step, but you could avoid the `which(!is.na(x))` call and gain a little more speed:
library("microbenchmark")
library("MsCoreUtils")
#>
#> Attaching package: 'MsCoreUtils'
#> The following object is masked from 'package:stats':
#>
#> smooth
pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)
mz <- c(200, 12.4, 3)
.values_match_mz2 <- function(x, mz, ppm = 20, tolerance = 0) {
    keep <- which(!is.na(x))
    idx <- order(x[keep])
    cmn <- common(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
                  duplicates = "keep", .check = FALSE)
    sort(keep[idx][cmn])
}
.values_match_mz3 <- function(x, mz, ppm = 20, tolerance = 0) {
    o <- order(x, na.last = NA) # na.last = NA will remove NA
    cmn <- common(x[o], sort(mz), tolerance = tolerance, ppm = ppm,
                  duplicates = "keep", .check = FALSE)
    sort(o[cmn])
}
microbenchmark(.values_match_mz2(pmz, mz), .values_match_mz3(pmz, mz), check = "identical")
#> Unit: microseconds
#> expr min lq mean median uq
#> .values_match_mz2(pmz, mz) 114.598 117.081 185.6789 118.6825 122.5355
#> .values_match_mz3(pmz, mz) 111.121 113.789 324.7867 114.7715 120.4950
#> max neval
#> 6300.412 100
#> 20524.152 100
Created on 2022-01-16 by the reprex package (v2.0.1)
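The equivalence behind this shortcut can be checked in base R alone. The sketch below only compares the index vectors produced by the two approaches and does not call `common`; the variable names `two_step` and `one_step` are made up for this illustration.

```r
pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)

# Two-step variant: drop NA positions first, then order what is left
# and translate back to positions in the original vector.
keep <- which(!is.na(pmz))
two_step <- keep[order(pmz[keep])]

# One-step variant: na.last = NA makes order() drop the NA positions.
one_step <- order(pmz, na.last = NA)

identical(two_step, one_step)
```

Both vectors index the original `pmz`, so the matches can be mapped back with a single subsetting step (`o[cmn]`) instead of `keep[idx][cmn]`.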
Thanks! I learned something new again :)
Co-authored-by: Sebastian Gibb <[email protected]>
Code-wise, looks good to me, but I am wondering whether we aren't making things too complicated with that many functions. Wouldn't it be easier to have a single vectorised
Yes, we're adding yet another filter function, but IMHO it is justified here, since the original
Ok, indeed, thanks. I totally agree that different appropriately named functions are better than many arguments. What about renaming And re vectorisation, it's easy enough to pipe multiple
I've now added a
This PR adds the `filterPrecursorMzValue` function discussed in issue #230.