Question about using propr with multi-omics data #9

handibles · 2020-04-03T14:02:32Z

Thanks and regards as ever to the devs.

I'm considering several sets of 'omic data generated from the same cohort of FASTQ files (i.e. taxonomic, pathways, etc). The way of the CoDa seems a good choice for integrating these values, but I've not seen it mentioned / suggested / gainsaid anywhere.

Was planning on subsetting the sets independently (but at similar levels) using propr(... select = ) and then combining them, but propr has no native function I can see for this. It could just be an unaddressed use case, but I would be interested to hear a sane thought on the issue.

Hope all are well.


tpq · 2020-04-03T20:54:15Z

G'day, and thanks for your interest in propr.

Are you suggesting that you have multiple data sets which come from the same samples? For example, 16S data, pathway data, etc., each from the same N samples? I assume the pathway data then are derived from the 16S data (and so forth). In that case, it may make more sense to treat each data set separately, and apply propr K times for K data sets.

I'm not exactly sure what you mean by "combine". What propr can do is measure gene-gene (or microbe-microbe or pathway-pathway) associations for a single data set. If you want to integrate data sets in a multivariate analysis, I'd recommend the mixOmics package. They have an option in their software to apply a CLR and do it the CoDa way too.

Anyways, we've recently put together a (hopefully) easy-to-follow workflow for compositional data analysis that covers differential abundance and association testing. You might find it helpful!

https://academic.oup.com/gigascience/article/8/9/giz107/5572529

Feel free to describe your data in more detail and I can advise.

Otherwise, let me know if you have some more questions.

handibles · 2020-04-04T10:53:17Z

Love the link! Super useful, thanks.

Your take on the X sets * N samples is close, but in this case it's shotgun:taxonomy and shotgun:metabolic assignments, generated independently through different pipelines but from the same FASTQ files. To me this seems a reasonable approximation of multiomics (thanks also for the mixOmics recommendation), but thoughts welcome.

By "combine", simply meant concatenate datasets column-wise ( y <- cbind(A, B, C) ) and then investigate proportions between those various features (propr(y, ...)). Was wondering if there was perhaps a smoother way to cat propr objects together. Current method is, as you say, K CLR transforms of K sets, combine, then propr.

Ultimate goal is to be able to determine proportionality between taxonomic and metabolic features generated from the same data/sample/FASTQ set. Had thought that CoDa would operate well between multiomics sets, assuming each set had an appropriately chosen denominator.

This is perhaps edging into a broader question, but in what ways are intercomparisons between CoDa sets (e.g. propr's rho values) restricted once all data has been appropriately transformed out of the dread simplex?

edit: checked out the publication provided above, which for reference of others seems to cover this issue fairly exactly under the heading "Vertical data integration". From Quinn et al., 2020:

For proportionality and differential proportionality analysis, we would need to log-ratio transform each -omics source independently, then column join them with cbind. Here, any proportionality occurring between features from different sources would be with respect to 2 references and must get interpreted accordingly

tpq · 2020-04-04T20:42:02Z

Ah yes! Now we're on the same page!

Had thought that CoDa would operate well between multiomics sets, assuming each set had an appropriately chosen denominator.

This is correct. Unfortunately, I haven't yet written a nice API for this (though it's probably overdue). But there is a rough work-around that can get you started. Something like this...

# For 2 -omics data sets called `met.rel` and `mic.rel`...
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))
pr.r <- propr:::lr2rho(as.matrix(REL))
colnames(pr.r) <- colnames(REL)
rownames(pr.r) <- colnames(REL)
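
For readers who want to see what the number means, here is a minimal base-R sketch of rho written out from its published definition, rho(a, b) = 1 - var(a - b) / (var(a) + var(b)), applied to the column-joined CLR matrix. It assumes the internal `lr2rho` follows that definition; `rho_pair` is a hypothetical helper, and the iris columns are only toy stand-ins for two -omics blocks.

```r
# Toy stand-ins for two -omics blocks measured on the same samples (rows).
met.rel <- as.matrix(iris[, 1:2])
mic.rel <- as.matrix(iris[, 3:4])

# CLR per block: each block is centred on its own per-sample mean log,
# so each block keeps its own reference.
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))

# rho for one pair of log-ratio transformed features (hypothetical helper).
rho_pair <- function(a, b) 1 - var(a - b) / (var(a) + var(b))

# Fill the full feature-by-feature matrix.
p <- ncol(REL)
rho <- matrix(NA_real_, p, p, dimnames = list(colnames(REL), colnames(REL)))
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    rho[i, j] <- rho_pair(REL[, i], REL[, j])
  }
}
round(rho, 2)
```

Entries that pair a feature from one block with a feature from the other are the ones measured against two references, so those are the ones to interpret with care.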

Unfortunately, you won't be able to use any of the other propr functions, including FDR estimation. Though, I might be able to hack an update together this week that allows the user to do their own transformation before running the rest of the guts of the program. For example, something like

# Not yet implemented
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))
pr.r <- propr(REL, ivar = NA)

I'll look into it Tuesday to see if this is doable and update you either way.

FYI we've also written a small commentary on compositional multi-omics analysis. It elaborates on the vertical integration approach in more detail. It sounds like you already understand why each data set needs its own reference, but not everyone gets this point...

https://www.biorxiv.org/content/10.1101/847475v1

Enjoy your weekend!

tpq · 2020-04-07T00:58:16Z

If you perform your own (multi-omic) transformation, you can now pass it through propr, and access all of the helper/wrapper functions. Here is a reproducible example for you.

devtools::install_github("tpq/propr")
library(propr)
data(iris)
met.rel <- iris[,1:2]
mic.rel <- iris[,3:4]
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))
pr <- propr(REL, ivar = NA)

I've done a few tests and everything looks OK. Please let me know if something strange happens and I'll give it a fix.
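
Since the goal in this thread is proportionality between features from different sources, a rough base-R sketch of pulling out only the cross-omics pairs may help. It recomputes rho directly as 2 * cov(a, b) / (var(a) + var(b)) so that it runs without propr installed. With a propr object you would presumably start from its stored matrix instead (assumed here to live in the `matrix` slot, which is worth checking against your installed version). The names `cross` and `pairs` are illustrative only.

```r
# Toy stand-ins for two -omics blocks measured on the same samples (rows).
met.rel <- as.matrix(iris[, 1:2])
mic.rel <- as.matrix(iris[, 3:4])

# CLR each block against its own reference, then column-join.
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
met.clr <- clr(met.rel)
mic.clr <- clr(mic.rel)
REL <- cbind(met.clr, mic.clr)

# rho for all pairs at once: 2 * cov(a, b) / (var(a) + var(b)).
V <- cov(REL)
rho <- 2 * V / outer(diag(V), diag(V), "+")

# Keep only pairs that span the two blocks (rows: met, columns: mic).
cross <- rho[colnames(met.clr), colnames(mic.clr), drop = FALSE]

# Long format, strongest cross-omics associations first.
pairs <- data.frame(
  met = rep(rownames(cross), times = ncol(cross)),
  mic = rep(colnames(cross), each = nrow(cross)),
  rho = as.vector(cross)
)
pairs <- pairs[order(-abs(pairs$rho)), ]
pairs
```

Subsetting by column names rather than by position keeps the split robust if the blocks are later reordered or filtered.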

handibles · 2020-04-08T08:27:35Z

That's class! Was initially hoping for simple guidance on the propriety of doing this, so this is well beyond. I'll start cramming bugs into pipes and let you know how it goes.

handibles closed this as completed Apr 8, 2020

taylorreiter mentioned this issue Sep 1, 2020

Question about how to interpret rho and FDR #17

Open

tpq changed the title ~~combining propr sets~~ Question about using propr for multi-omics data Sep 1, 2020

tpq reopened this Sep 1, 2020

tpq added the "question" and "helpful" labels Sep 1, 2020

tpq changed the title ~~Question about using propr for multi-omics data~~ Question about using propr with multi-omics data Sep 1, 2020

tpq mentioned this issue Jun 12, 2021

Question about scaling propr to very large data sets (Part 2) #24

Open

tpq mentioned this issue Nov 24, 2021

Question about when to use CLR or ILR #11

Open

mirpie mentioned this issue Sep 17, 2023

[Question] Revisiting Multi-omic Analysis #38

Open
