generateSignatures error #1
Comments
Hi Anna,
Thanks for your interest in our work! Let me try to unpack your situation with a few questions:
1) You have 74 shallow WGS (sWGS) samples. Are they ovarian cancer? To calculate the exposures you would need to run three functions:
This assumes that your working directory is the folder of the CNsignatures package. If not, you would need to specify the
=> If not, derivation of new signatures could be done given a few constraints.
2) What does your copy number look like? What tool do you use to generate the segmentation? Do you use unrounded copy number?
=> For sWGS, we currently recommend QDNAseq:
=> Since you talk about using Poisson models for the changepoint distribution, I guess you use rounded copy number (i.e. integers: 1, 2, 3, ...). Unrounded copy number (i.e. floats: 1.23, 2.49, ...) is important to us because rounding hides interesting information: 1.51 and 2.49 would both be rounded to 2, masking potential subclonal gains or losses.
Let me know how you get along.
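A quick illustration of the rounding point above, as a minimal base R sketch. The variable names are made up for this example and are not part of the CNsignatures package.

```r
# Two unrounded segment values (the example figures from the comment above).
unrounded <- c(1.51, 2.49)

# Rounding maps both segments to the same integer state.
rounded <- round(unrounded)
print(rounded)

# The part that rounding throws away: the deviation from the integer state,
# which is where a potential subclonal gain or loss would show up.
deviation <- unrounded - rounded
print(deviation)
```

After rounding, the two segments are indistinguishable, although one sits about half a copy below 2 and the other about half a copy above 1.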
Hi Ruben,
Thank you so much for your prompt and thoughtful reply! Our samples are from prostate cancer. I am using ichorCNA off-target to call copy number (a targeted sequencing panel was used for these tumors, and ichorCNA off-target treats the off-target reads like sWGS; https://github.com/GavinHaLab/ichorCNA_offtarget).
Thank you for your note about rounded vs. unrounded copy number. I was previously using rounded copy number, but switching to unrounded copy number does indeed fix the changepoint distribution problem I was having. CNSignatures with the 7 pre-defined signatures now runs as expected.
When trying to derive new signatures, however, I am still getting the same rowSums error as before. This time I ran with chosen_num_signatures <- 6 (I don't fully understand how to identify the "point of stability in the cophenetic, dispersion and silhouette coefficients", but picked 6 as it looks to be the "maximum sparsity achievable above the null model for the basis matrix"; see https://www.nature.com/articles/s41588-018-0179-8/figures/8).
Here is my chooseNumberSignatures plot this time:
When I try to inspect the component_by_signature variable that I am passing in to generateSignatures, I see:
component_by_signature
Thank you so much for your help,
Hi Anna,
As far as I see it, there are two challenges in your analysis: prostate as a cancer type and the computational aspect of deriving signatures. At this point it might be important to discuss your sample cohort, because most downstream problems are alleviated once you have a good grip on your samples. Here are a few recommendations to tackle both challenges:
2) Segmentation algorithms. But, and this is a huge problem, segmentation algorithms differ wildly in their results. This again has downstream effects on signature generation. This problem is becoming so important that our lab is developing its own segmentation algorithm for shallow WGS. That doesn't help you right now, but it should give you an indication of how important proper segmentation is.
3) Choose signatures. As you already mentioned, and as the paper briefly describes, a good start is to look at the sparseness plot and see at which factorisation rank (K) the randomness overtakes the observed sparseness: that is the dashed red line versus the solid red line. This gives you an upper bound on how many signatures you might have. The intuition behind this plot is that if the basis matrix (the signature definitions) carries more sparseness than expected by chance, then we capture biological signal. 6 is quite close to, but still more sparse than, the random matrices, so 5 might be a good choice as well. Given the other plots, there is little difference between 5 and 6 signatures. To me both solutions would make sense from a computational point of view. NMF per se is a computational exercise; it will always give you an answer. Whether it is biologically useful and tells you something about prostate cancer biology depends on the data quality and the interpretation of the results.
4) Rowsums error code.
Hope that helps.
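The sparseness rule of thumb from point 3 can be written down as a small base R sketch. The numbers below are invented purely for illustration (they are not read off the plot in this thread), and the variable names are hypothetical rather than anything returned by chooseNumberSignatures.

```r
# Candidate numbers of signatures (factorisation ranks).
k <- 2:8

# HYPOTHETICAL basis sparseness values: observed data vs. randomised matrices.
observed_sparseness <- c(0.62, 0.60, 0.58, 0.57, 0.55, 0.50, 0.47)
random_sparseness   <- c(0.40, 0.44, 0.48, 0.51, 0.54, 0.55, 0.56)

# Ranks at which the observed basis is still sparser than expected by chance.
above_null <- observed_sparseness > random_sparseness

# The largest such rank serves as an upper bound on the number of signatures.
upper_bound_k <- max(k[above_null])
print(upper_bound_k)
```

This only gives an upper bound. Choosing between that value and a slightly smaller one still relies on the cophenetic, dispersion and silhouette curves, and on whether the resulting signatures make biological sense.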
Hello, and thank you for your tool!
I am able to run CNSignatures with default signatures for a dataset of 74 ultra low-pass WGS (albeit only when I change the distribution from Gaussian to Poisson for the "changepoint" feature mixture model--otherwise I get an error).
When I try to generate my own signatures, however, the step generateSignatures gives the error "Error in rowSums(t_df) : 'x' must be an array of at least two dimensions". Additionally, chooseNumberSignatures uses >5,000% CPU even though it looks to me like it is set to use one core.
The steps I used were:
cn_features <- extractCopynumberFeatures(list_of_segment_tables)
generated_components <- fitMixtureModels(cn_features)
sample_by_generated_component_matrix <- generateSampleByComponentMatrix(cn_features, generated_components)
number_signatures <- chooseNumberSignatures(sample_by_generated_component_matrix)
chosen_num_signatures <- 7
component_by_signature <- generateSignatures(sample_by_generated_component_matrix, chosen_num_signatures)
Here is what my chooseNumberSignatures plot looks like:
Do you have any insight into how to get generateSignatures to work? And how to make chooseNumberSignatures use less CPU? Please let me know what other information would be helpful to you.
Thank you so much,
Anna
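For readers who run into the same message: in base R, rowSums raises this error whenever it receives a plain vector instead of a matrix or data frame. One common way to end up there is a single-column subset that silently drops its dimensions. The sketch below reproduces that generic situation on a toy matrix. Whether this is what happens inside generateSignatures is an assumption and is not established in this thread.

```r
# Toy component-by-signature matrix (values are made up for illustration).
m <- matrix(1:6, nrow = 3,
            dimnames = list(paste0("component", 1:3), c("sig1", "sig2")))

# Selecting a single column with `[` drops the matrix to a plain vector ...
one_col <- m[, 1]
print(is.null(dim(one_col)))

# ... and rowSums() refuses anything with fewer than two dimensions.
err <- tryCatch(rowSums(one_col), error = function(e) conditionMessage(e))
print(err)

# drop = FALSE keeps the 3 x 1 matrix shape, so rowSums() works again.
kept <- m[, 1, drop = FALSE]
row_totals <- rowSums(kept)
print(row_totals)
```

Checking dim() on the object that reaches rowSums is usually the quickest way to confirm or rule out this cause.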