Add filterPrecursorMzValue to support filtering by multiple m/z values #232

jorainer · 2022-01-13T11:49:23Z

This PR adds the `filterPrecursorMzValues` function discussed in issue #230.

- Add the `filterPrecursorMzValues` method to enable filtering based on multiple target precursor m/z values.

sgibb

Fine! I found a typo and would like to suggest a minor change: using `common` instead of `closest`.

R/MsBackend.R

sgibb · 2022-01-13T19:36:18Z

R/functions-util.R

+    mtch <- closest(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
+                    duplicates = "keep", .check = FALSE)
+    keep[!is.na(mtch[order(idx)])]


Isn't the order irrelevant here?

Suggested change

-    mtch <- closest(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
-                    duplicates = "keep", .check = FALSE)
-    keep[!is.na(mtch[order(idx)])]
+    cmn <- common(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
+                  duplicates = "keep", .check = FALSE)
+    keep[cmn]

EDIT: or `keep[idx][cmn]`? It's already late and I am a bit confused (it is hard to tell without trying it).

The `order` and `idx` were important (otherwise I wouldn't do such costly operations), but I have to check it again. Thanks for the `common` suggestion. I'll try it and see how it works and compares.

Update: I get the same results with:

    .values_match_mz2 <- function(x, mz, ppm = 20, tolerance = 0) {
        keep <- which(!is.na(x))
        idx <- order(x[keep])
        cmn <- common(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
                      duplicates = "keep", .check = FALSE)
        sort(keep[idx][cmn])
    }

(i.e. returning sort(keep[idx][cmn]))
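The index bookkeeping discussed above can be checked with a minimal base R sketch. It assumes a hypothetical stand-in matcher, `matches_any()`, which is not the `MsCoreUtils::common()` implementation; it only mimics the idea of flagging values that lie within `tolerance` plus a ppm-scaled window of any target. The point is that the flags come back in sorted order, so they have to be mapped through `idx` before indexing `keep`.

```r
# Hypothetical stand-in for a tolerance-aware matcher; NOT MsCoreUtils::common().
# Returns TRUE for each element of `x` that lies within
# `tolerance + ppm * x * 1e-6` of any value in `table`.
matches_any <- function(x, table, tolerance = 0, ppm = 20) {
    vapply(x, function(v) any(abs(v - table) <= tolerance + ppm * v * 1e-6),
           logical(1))
}

pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)
mz <- c(200, 12.4, 3)

keep <- which(!is.na(pmz))                    # positions of non-NA values in `pmz`
idx <- order(pmz[keep])                       # sorting permutation of those values
cmn <- matches_any(pmz[keep][idx], sort(mz))  # flags, in *sorted* order

wrong <- keep[cmn]             # applies sorted-order flags to unsorted positions
right <- sort(keep[idx][cmn])  # maps the flags back through the permutation first
```

In this sketch `wrong` picks up positions 2 and 6 (values 15 and 1234), which match none of the targets, whereas `right` contains only positions holding 12.4 or 3.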

And in fact your implementation is still (slightly) faster:

    > pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)
    > mz <- c(200, 12.4, 3)
    > library(microbenchmark)
    > microbenchmark(.values_match_mz(pmz, mz), .values_match_mz2(pmz, mz))
    Unit: microseconds
                           expr     min       lq     mean   median       uq     max neval cld
      .values_match_mz(pmz, mz) 235.827 272.2785 292.6092 284.0745 317.7635 416.735   100   b
     .values_match_mz2(pmz, mz) 178.463 205.5580 221.5553 212.7210 224.7145 446.566   100  a
    > pmz <- rep(pmz, 100)
    > mz <- rep(mz, 100)
    > microbenchmark(.values_match_mz(pmz, mz), .values_match_mz2(pmz, mz))
    Unit: microseconds
                           expr     min       lq     mean   median       uq     max neval cld
      .values_match_mz(pmz, mz) 318.139 349.4490 394.0592 376.8095 417.4760 704.230   100   b
     .values_match_mz2(pmz, mz) 267.281 298.0505 330.7795 320.6225 349.9465 572.168   100  a

I've now changed it to your implementation.

Unfortunately I have no idea how to remove the final `sort` step, but you could avoid the `which(!is.na(x))` call and gain a little more speed:

    library("microbenchmark")
    library("MsCoreUtils")
    #>
    #> Attaching package: 'MsCoreUtils'
    #> The following object is masked from 'package:stats':
    #>
    #>     smooth

    pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)
    mz <- c(200, 12.4, 3)

    .values_match_mz2 <- function(x, mz, ppm = 20, tolerance = 0) {
        keep <- which(!is.na(x))
        idx <- order(x[keep])
        cmn <- common(x[keep][idx], sort(mz), tolerance = tolerance, ppm = ppm,
                      duplicates = "keep", .check = FALSE)
        sort(keep[idx][cmn])
    }

    .values_match_mz3 <- function(x, mz, ppm = 20, tolerance = 0) {
        o <- order(x, na.last = NA) # na.last = NA will remove NA
        cmn <- common(x[o], sort(mz), tolerance = tolerance, ppm = ppm,
                      duplicates = "keep", .check = FALSE)
        sort(o[cmn])
    }

    microbenchmark(.values_match_mz2(pmz, mz), .values_match_mz3(pmz, mz),
                   check = "identical")
    #> Unit: microseconds
    #>                        expr     min      lq     mean   median       uq
    #>  .values_match_mz2(pmz, mz) 114.598 117.081 185.6789 118.6825 122.5355
    #>  .values_match_mz3(pmz, mz) 111.121 113.789 324.7867 114.7715 120.4950
    #>        max neval
    #>   6300.412   100
    #>  20524.152   100

Created on 2022-01-16 by the reprex package (v2.0.1)
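The `na.last = NA` shortcut can be checked in isolation with base R only. The small sketch below uses the same example vector as the thread and compares the two-step index (`which()` followed by `order()`) with the one-step call; the variable names `two_step` and `one_step` are illustrative and do not appear in the package.

```r
pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)

# Two-step version: find the non-NA positions, then order within them.
keep <- which(!is.na(pmz))
two_step <- keep[order(pmz[keep])]

# One-step version: na.last = NA makes order() drop the NA positions itself.
one_step <- order(pmz, na.last = NA)

# Both should give positions into the original vector, sorted by value.
same_index <- identical(one_step, two_step)
```

Because both index vectors point into the original `pmz`, the rest of the function (matching on the sorted values, then `sort()` on the selected positions) can stay unchanged.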

Thanks! I learned something new again :)

Co-authored-by: Sebastian Gibb <[email protected]>

lgatto · 2022-01-14T15:20:57Z

Code-wise, looks good to me, but I am wondering whether we aren't making things too complicated with that many functions. Wouldn't it be easier to have a single vectorised filterPrecursorMz() function? Same applies to other cases.

jorainer · 2022-01-18T12:42:41Z

Yes, we're adding yet another filter function, but IMHO it is justified here, since the original `filterPrecursorMz` and the newly added `filterPrecursorMzValues` work differently: the former is based on m/z ranges, the latter on individual m/z values. This is similar to the discussion in #133. I believe it is easier for the user to have dedicated functions instead of having to set multiple additional parameters. Also, changing `filterPrecursorMz` to a vectorized version might break backward compatibility (also given what users are used to from the MSnbase package).
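To make the range-versus-values distinction concrete, here is a minimal base R sketch on a plain numeric vector of precursor m/z values. The helpers `in_mz_range()` and `matches_mz_values()` are hypothetical and are not the Spectra methods; the inclusive range bounds and the `tolerance + ppm` window are assumptions made for illustration.

```r
# Hypothetical helper: TRUE where a precursor m/z lies inside [mz[1], mz[2]].
# Inclusive bounds are an assumption of this sketch.
in_mz_range <- function(x, mz) {
    !is.na(x) & x >= mz[1L] & x <= mz[2L]
}

# Hypothetical helper: TRUE where a precursor m/z matches any target value
# within `tolerance + ppm * x * 1e-6`.
matches_mz_values <- function(x, mz, ppm = 20, tolerance = 0) {
    vapply(x, function(v) {
        !is.na(v) && any(abs(v - mz) <= tolerance + ppm * v * 1e-6)
    }, logical(1))
}

pmz <- c(12.4, 15, 3, 12.4, 3, 1234, 23, 5, 12.4, NA, 3)

by_range <- which(in_mz_range(pmz, c(3, 13)))         # everything between 3 and 13
by_values <- which(matches_mz_values(pmz, c(12.4, 3)))  # only the listed targets
```

In this example the range filter also keeps position 8 (m/z 5), which lies between the two targets but matches neither of them; the value-based filter does not.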

But obviously that's open for discussion @lgatto @sgibb

lgatto · 2022-01-19T07:36:52Z

Ok, indeed, thanks. I totally agree that different, appropriately named functions are better than many arguments.

What about renaming `filterPrecursorMz()` to `filterPrecursorMzRange()` to clarify that it works on ranges rather than values, and to be consistent with #133?

And re vectorisation, it's easy enough to pipe multiple filterPrecursorMz[Range]() into each other to filter multiple ranges.

jorainer · 2022-01-19T09:52:29Z

I've now added a filterPrecursorMzRange method and deprecated the filterPrecursorMz method - I wouldn't like to remove it yet, because we have many workflows depending on that method.
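A rename that keeps the old entry point working could look roughly like the base R sketch below. The functions `filter_range()` and `filter_old()` are hypothetical stand-ins operating on a numeric vector; how the actual Spectra method signals its deprecation is not shown in this thread, and `.Deprecated()` is used here only as the standard base R mechanism.

```r
# Hypothetical new function: keep values inside the (inclusive) m/z range.
filter_range <- function(x, mz) {
    x[!is.na(x) & x >= mz[1L] & x <= mz[2L]]
}

# Hypothetical old name: still works, but warns and delegates to the new one.
filter_old <- function(x, mz) {
    .Deprecated("filter_range")
    filter_range(x, mz)
}

# Existing callers keep getting the same result; the warning is silenced here
# only to keep the example output quiet.
res <- suppressWarnings(filter_old(c(3, 12.4, 200, NA), c(1, 20)))
```

Existing workflows keep running under this pattern, and the warning tells users which function to switch to before the old name is eventually removed.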

jorainer added 3 commits December 10, 2021 14:19

feat: add filterPrecursorMzValues (issue #230)

4687a57

- Add the `filterPrecursorMzValues` method to enable filtering based on multiple target precursor m/z values.

Merge branch 'master' into filterPrecursorMzValue

1e1791e

Bump version

3e9b991

jorainer requested review from sgibb and lgatto January 13, 2022 11:49

sgibb requested changes Jan 13, 2022


jorainer and others added 2 commits January 14, 2022 08:56

Update R/MsBackend.R

f165cbd

Co-authored-by: Sebastian Gibb <[email protected]>

fix: address Sebastian's comments

b58e240

jorainer requested a review from sgibb January 14, 2022 10:57

refactor: adapt to Sebastian's suggestions

6e97171

refactor: add filterPrecursorMzRange and depracate filterPrecursorMz

a956c3e

lgatto approved these changes Jan 19, 2022


fix: add missing bracket and fix unit test suite

bc6d4ea

jorainer merged commit 627df30 into master Jan 19, 2022

jorainer deleted the filterPrecursorMzValue branch January 19, 2022 11:12
