Question about significant proportionalities #25

Open
alopgar opened this issue Jun 23, 2021 · 3 comments
Labels: helpful (This question has been marked as potentially helpful to others.), question


alopgar commented Jun 23, 2021

Hello, @tpq! I am writing because I am having trouble relating the p-values of traditional correlations to the significance of rho proportionality.
One reviewer asked me for the p-values of my correlations between microbial abundances. As I used the rho coefficient, I am not sure whether it is possible to calculate p-values at all. I read this vignette and other issues in this forum, which helped me understand how to interpret the cutoff and the FDR, but I still don't know whether selecting a threshold at which my FDR is 0.0001 is equivalent to claiming that all my proportionalities above that cutoff are statistically significant.
Could you help me?
Thanks!

Owner

tpq commented Jun 28, 2021

Hey, thanks for your interest in propr.

This is a tough one. Calculating an exact p-value requires knowing how the data are distributed under the null condition. It is possible to calculate p-values for correlations because Fisher showed that the sample correlation coefficient can be mapped to an approximately normal distribution via the inverse hyperbolic tangent function (https://en.wikipedia.org/wiki/Fisher_transformation). I am not aware of any similar derivation for proportionality, which is why exact p-values are unavailable.
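For readers who want to see what that transformation buys you for an ordinary correlation, here is a minimal sketch in Python (standard library only). The function name `fisher_pvalue` and the example numbers are made up for illustration and are not part of propr; the sketch assumes roughly bivariate-normal data and more than 3 samples, and it applies to Pearson correlation only, not to rho proportionality.

```python
import math
from statistics import NormalDist


def fisher_pvalue(r, n):
    """Approximate two-sided p-value for H0: true correlation = 0.

    r: sample Pearson correlation (strictly between -1 and 1)
    n: number of samples (must be > 3)
    """
    if n <= 3:
        raise ValueError("Fisher's approximation needs more than 3 samples")
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3),
    # so this z-score is approximately standard normal under H0.
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical example: r = 0.5 observed across 30 samples.
p = fisher_pvalue(0.5, 30)
print(p)
```

No comparable closed-form null distribution is known for rho, which is the gap the permutation-based FDR is meant to fill.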

This is why we provide FDR as an alternative (kudos to Ionas Erb for the idea). For an FDR of 5%, we would expect about 5% of the proportionalities above the cutoff to be false discoveries. When using p-values with Benjamini-Hochberg adjustment and alpha = 0.05, you would likewise expect about 5% of the results called significant to be false discoveries. So there is a similarity here. The difference is that the p-values are derived theoretically based on a null distribution, while our FDR is derived empirically based on permutations.
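To make the p-value side of that comparison concrete, here is a small sketch of the Benjamini-Hochberg adjustment in plain Python. The function name `benjamini_hochberg` and the p-values are invented for illustration; this is the textbook step-up procedure, not code taken from propr.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(pvals)
# Tests kept at alpha = 0.05; about 5% of these are expected to be false.
discoveries = [p for p, q in zip(pvals, adj) if q <= 0.05]
print(adj, discoveries)
```

In both approaches the quantity being controlled is the expected share of false results among those you report, not the chance that any single result is false.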

(Correction: false DISCOVERIES not false POSITIVES!)

Author

alopgar commented Jul 1, 2021

Huge thanks for your answer!
So, in summary, what I did with cutoffs should be enough. If selecting rho >= 0.4 gives an FDR = 0.00001, that should mean that among the proportionalities with rho >= 0.4 we expect only 0.001% to be false discoveries, right?

Owner

tpq commented Jul 2, 2021

Yup, that's the idea!

In propr, FDR = {# of rho >= cutoff for null condition} / {# of rho >= cutoff for true condition}.

If we get FDR=0, that means no rho are above the cutoff once we shuffle the data. In other words, we don't expect any false discoveries. If we get FDR=1, that means just as many rho are above the cutoff after shuffling as before. In other words, we expect pretty much all of them to be false discoveries. In your case, you see way more {rho >= cutoff for true condition} than {rho >= cutoff for null condition}, so it's safe to assume most of those rho are real.
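The ratio above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not propr's actual implementation: the function name `empirical_fdr`, the toy rho values, the averaging over several shuffles, and the cap at 1 are all assumptions made for the example.

```python
def empirical_fdr(observed, permuted, cutoff):
    """Permutation-based FDR estimate at a given rho cutoff.

    observed: rho values computed on the real data
    permuted: list of lists, one list of rho values per shuffled data set
    """
    n_true = sum(1 for r in observed if r >= cutoff)
    if n_true == 0:
        return float("nan")  # nothing passes the cutoff, so FDR is undefined
    # Assumption: average the null counts across shuffles.
    n_null = sum(
        sum(1 for r in perm if r >= cutoff) for perm in permuted
    ) / len(permuted)
    # Assumption: cap at 1, since a proportion cannot exceed 100%.
    return min(1.0, n_null / n_true)


# Toy values, invented for illustration only.
observed = [0.9, 0.7, 0.5, 0.45, 0.1, -0.2]
permuted = [
    [0.42, 0.1, 0.0, -0.1, 0.2, 0.3],
    [0.1, 0.05, -0.3, 0.2, 0.15, 0.0],
]
fdr_04 = empirical_fdr(observed, permuted, cutoff=0.4)  # 0.5 null vs 4 true
fdr_08 = empirical_fdr(observed, permuted, cutoff=0.8)  # 0 null vs 1 true
print(fdr_04, fdr_08)
```

Raising the cutoff usually lowers the estimate because fewer shuffled rho values survive, at the cost of keeping fewer real pairs.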

tpq added the helpful and question labels on Nov 22, 2021