Question about using propr with multi-omics data #9

Open
handibles opened this issue Apr 3, 2020 · 5 comments
Labels
helpful This question has been marked as potentially helpful to others. question

Comments

@handibles

Thanks and regards as ever to the devs.

I'm considering several sets of 'omic data generated from the same cohort of FASTQ files (i.e. taxonomic, pathways, etc). The way of the CoDa seems a good choice for integrating these values, but I've not seen it mentioned / suggested / gainsaid anywhere.

Was planning on subsetting the sets independently (but at similar levels) using propr(... select = ) and then combining them, but propr has no native function I can see for this. It could just be an unaddressed use case, but I would be interested to hear a sane thought on the issue.

Hope all are well.

@tpq
Owner

tpq commented Apr 3, 2020

G'day, and thanks for your interest in propr.

Are you suggesting that you have multiple data sets which come from the same samples? For example, 16s data, pathway data, etc., each from the same N samples? I assume the pathway data then are derived from the 16s data (and so forth). In that case, it may make more sense to treat each data set separately, and apply propr K times for K data sets.

I'm not exactly sure what you mean by "combine". What propr can do is measure gene-gene (or microbe-microbe or pathway-pathway) associations for a single data set. If you want to integrate data sets in a multivariate analysis, I'd recommend the mixOmics package. They have an option in their software to apply a CLR and do it the CoDa way too.

Anyways, we've recently put together a (hopefully) easy-to-follow workflow for compositional data analysis that covers differential abundance and association testing. You might find it helpful!

https://academic.oup.com/gigascience/article/8/9/giz107/5572529

Feel free to describe your data in more detail and I can advise.

Otherwise, let me know if you have some more questions.

@handibles
Author

handibles commented Apr 4, 2020

Love the link! Super useful, thanks.

Your take on the X sets * N samples is close, but in this case it's shotgun:taxonomy and shotgun:metabolic assignments, generated independently through different pipelines but from the same fastq. To me this seems a reasonable approximation of multiomics (thanks also for the mixOmics recommendation), but thoughts welcome.

By "combine", simply meant concatenate datasets ( y <- c(A,B,C) ) and then investigate proportions between those various features (propr(y, ...)). Was wondering if there was perhaps a smoother way to cat propr objects together. Current method is, as you say, K CLR transforms of K sets, combine, then propr.

Ultimate goal is to be able to determine proportionality between taxonomic and metabolic features generated from the same data/sample/fastq set. Had thought that CoDa would operate well between multiomics sets, assuming each set had an appropriately chosen denominator.

This is perhaps edging into a broader question, but in what ways are intercomparisons between CoDa sets (e.g. propr's rho values) restricted once all data has been appropriately transformed out of the dread simplex?


edit: checked out the publication provided above, which for reference of others seems to cover this issue fairly exactly under the heading "Vertical data integration". From Quinn et al., 2020:

For proportionality and differential proportionality analysis, we would need to log-ratio transform each -omics source independently, then column join them with cbind. Here, any proportionality occurring between features from different sources would be with respect to 2 references and must get interpreted accordingly
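
To make the "2 references" point in the quoted passage concrete, here is a minimal base-R sketch. The toy matrices `tax` and `pwy` and the helper `g()` are hypothetical and exist only for illustration; `clr()` is the same one-liner used elsewhere in this thread. It shows that each block is centred on its own geometric mean, and that a cross-block difference therefore contains the ratio of the two references.

```r
# Toy data (hypothetical): 4 samples, two -omics blocks from the same samples.
set.seed(1)
tax <- matrix(runif(12, 1, 100), nrow = 4,
              dimnames = list(NULL, c("taxA", "taxB", "taxC")))
pwy <- matrix(runif(8, 1, 100), nrow = 4,
              dimnames = list(NULL, c("pwyX", "pwyY")))

# CLR: subtract each sample's mean log value (the log geometric mean).
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")

# Transform each block on its own, then column-join.
REL <- cbind(clr(tax), clr(pwy))

# Each block is centred on its own reference, so its rows sum to ~0 separately.
rowSums(REL[, colnames(tax)])
rowSums(REL[, colnames(pwy)])

# A cross-block difference carries both references:
# clr(taxA) - clr(pwyX) = log(taxA / pwyX) - log(g(tax) / g(pwy)),
# where g() is the per-sample geometric mean of a block.
g <- function(x) exp(rowMeans(log(x)))
lhs <- REL[, "taxA"] - REL[, "pwyX"]
rhs <- log(tax[, "taxA"] / pwy[, "pwyX"]) - log(g(tax) / g(pwy))
all.equal(lhs, rhs)
```

The extra `log(g(tax) / g(pwy))` term is why a cross-source association has to be read relative to both references rather than as a plain log-ratio of the two features.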

@tpq
Owner

tpq commented Apr 4, 2020

Ah yes! Now we're on the same page!

Had thought that CoDa would operate well between multiomics sets, assuming each set had an appropriately chosen denominator.

This is correct. Unfortunately, I haven't yet written a nice API for this (though it's probably overdue). But there is a rough work-around that can get you started. Something like this...

# For 2 -omics data sets called `met.rel` and `mic.rel`...
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))
pr.r <- propr:::lr2rho(as.matrix(REL))
colnames(pr.r) <- colnames(REL)
rownames(pr.r) <- colnames(REL)

Unfortunately, you won't be able to use any of the other propr functions, including FDR estimation. Though, I might be able to hack an update together this week that allows the user to do their own transformation before running the rest of the guts of the program. For example, something like

# Not yet implemented
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))
pr.r <- propr(REL, ivar = NA)

I'll look into it Tuesday to see if this is doable and update you either way.

FYI we've also written a small commentary on compositional multi-omics analysis. It elaborates on the vertical integration approach in more detail. It sounds like you already understand why each data set needs its own reference, but not everyone gets this point...

https://www.biorxiv.org/content/10.1101/847475v1

Enjoy your weekend!

@tpq
Owner

tpq commented Apr 7, 2020

If you perform your own (multi-omic) transformation, you can now pass it through propr, and access all of the helper/wrapper functions. Here is a reproducible example for you.

devtools::install_github("tpq/propr")
library(propr)
data(iris)
met.rel <- iris[,1:2]
mic.rel <- iris[,3:4]
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))
pr <- propr(REL, ivar = NA)

I've done a few tests and everything looks OK. Please let me know if something strange happens and I'll give it a fix.
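
For readers who mainly want the taxon-vs-pathway pairs, the sketch below shows one way to isolate the cross-block part of a rho matrix. It is a base-R illustration only, not propr's own code: `rho_matrix()` is a hypothetical helper that applies the usual definition rho(x, y) = 1 - var(x - y) / (var(x) + var(y)) to every pair of columns, and the iris columns again stand in for two -omics blocks. With a real propr object you would take the matrix from the object instead of recomputing it.

```r
# Hypothetical stand-ins for two -omics blocks on the same samples.
data(iris)
met.rel <- as.matrix(iris[, 1:2])
mic.rel <- as.matrix(iris[, 3:4])

# CLR each block independently, then column-join.
clr <- function(x) sweep(log(x), 1, rowMeans(log(x)), "-")
REL <- cbind(clr(met.rel), clr(mic.rel))

# Illustrative helper (not part of propr): pairwise rho on log-ratio data.
rho_matrix <- function(lr) {
  v <- apply(lr, 2, var)
  p <- ncol(lr)
  out <- matrix(1, p, p, dimnames = list(colnames(lr), colnames(lr)))
  for (i in seq_len(p)) for (j in seq_len(p)) {
    out[i, j] <- 1 - var(lr[, i] - lr[, j]) / (v[i] + v[j])
  }
  out
}
rho <- rho_matrix(REL)

# Keep only cross-block pairs (rows: first block, columns: second block).
cross <- rho[colnames(met.rel), colnames(mic.rel), drop = FALSE]
cross
```

Pairs within `cross` are the ones whose interpretation depends on both references; the within-block entries of `rho` are ordinary single-reference proportionality values.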

@handibles
Author

That's class! Was initially hoping for simple guidance on the propriety of doing this, so this is well beyond. I'll start cramming bugs into pipes and let you know how it goes.

@tpq tpq changed the title combining propr sets Question about using propr for multi-omics data Sep 1, 2020
@tpq tpq reopened this Sep 1, 2020
@tpq tpq added question helpful This question has been marked as potentially helpful to others. labels Sep 1, 2020
@tpq tpq changed the title Question about using propr for multi-omics data Question about using propr with multi-omics data Sep 1, 2020