Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error: posthoc reports finding negative measurements where there is none #12

Open
tormolle opened this issue May 4, 2020 · 7 comments
Labels

Comments

@tormolle
Copy link

tormolle commented May 4, 2020

Heya, Thom

I've been using your package for a while and recently tried the newly implemented posthoc function, but it threw me an error:

Error in if (any(counts < 0)) stop("Data may not contain negative measurements.") :
missing value where TRUE/FALSE is needed

The raw data table used in the analysis is too large to post here, but I believe the procedure is straightforward. The data table is an OTU table of counts, with no negative, infinite or NA values, and only numeric columns. The following snippet describes my pipeline:
otus.cmR <- zCompositions::cmultRepl(otus) # stores counts as proportions
otus.propd <- propd(otus.cmR, group = map$diag_zone) # diag zone is a vector of character strings indicating the diagenetic zone of each sample, with six different states.
otus.propd <- setActive(otus.propd)
otus.propd <- updateCutoffs.propd(otus.propd, cutoff = c(0.05, 0.95, 0.3))
otus.propd <- posthoc(otus.propd) # This is where the error arises.

I checked the output of the cmultRepl function, and there are neither negative nor invalid values; I guess the successful execution of the code lines above the post hoc commant all testify to this.

Best,
Tor Einar

@tpq
Copy link
Owner

tpq commented May 4, 2020

Thanks Tor. I have a few questions...

  1. Could you run: updateF(otus.propd) and let me know if this reproduces the error?

  2. Is map$diag_zone a factor or a character? (I don't know why one would work vs. the other, but make sure it is a character so I can rule out that as the cause)

  3. Could you confirm that the following snippet works?

data(iris)
x <- iris[,1:4]
y <- iris[,5]

library(propr)
otus.propd <- propd(x, group = y)
otus.propd <- setActive(otus.propd)
otus.propd <- updateCutoffs.propd(otus.propd, cutoff = c(0.05, 0.95, 0.3))
otus.propd <- posthoc(otus.propd)

If not, update with devtools::install_github("tpq/propr")

@tpq tpq added the bug label May 4, 2020
@tormolle
Copy link
Author

tormolle commented May 5, 2020

  1. updateF(otus.propd) works.
  2. class(map$diag_zone) gives "character".
  3. The snippet works.

One thing I forgot to add in the initial post, which I believe is insignificant, is that the propd object that's created is stored as a list element as follows:
cmp$propd <- propd(cmp$otus, group = cmp$map$diag_zone)
I'm sorry if leaving this information out is what has caused the trouble.

Edit: I updated the package with devtools::install_github("tpq/propr"). Previously, the function message on ANOVA being run appeared before the error, but this is not the case anymore. The error persists.

@tpq
Copy link
Owner

tpq commented May 5, 2020

I admit I'm a bit stumped without de-bugging the data first hand. Do you feel comfortable sharing the OTU and labels (as separate CSVs) via private message?

@tormolle
Copy link
Author

tormolle commented May 5, 2020

I did another check of the mapping file that went into propd. I see that it contains some NAs. I don't know if that could help by pointing towards possible NA occurrences in the resulting propd object, for instance if samples labelled with NA are somehow treated as a group or if the mapping vector is coerced into character for the analysis but saved in its initial state beforehand, and this initial state is then re-used in posthoc? Just thinking out loud. Nevertheless, it seems weird to me that this would pass through all the other functions without problem.

I'll send you the current data via private message.

@tormolle
Copy link
Author

tormolle commented May 5, 2020

I did an additional check for my data and saw that all theta values were 1. Weird. I tried rerunning the initial snippet, except for the updateCutoffs.propd function, with the updated mapping vector, now without NA values, and it's currently not throwing me any errors where it used to. I'll update this thread again when posthoc has finished running - i don't know how long that's expected to take for my data.

In summary, it seems the mistake is on me for feeding propd NAs in the group vector. I'd suggest letting the propd function throw a warning or error in the future if this turns out to be is the case, though.

Update: It worked without error.

@tpq
Copy link
Owner

tpq commented May 5, 2020

posthoc() does do something different than all other functions... it tries to parse the group vector into a set of pairwise contrasts. Something about the NAs must cause this to fail loudly, while the NAs might cause the other methods to fail silently? I'll be sure to add an explicit check in the next version!!

@tpq
Copy link
Owner

tpq commented May 5, 2020

(By the way, when working with propd, you can use updateF to get exact p-values. This is much, much faster than the updateCutoffs approach.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants