Issue with the loop in the code #513

Open
khujithrajueni opened this issue Aug 1, 2024 · 3 comments

khujithrajueni commented Aug 1, 2024

Hello Florian,

Issue 1:

I am trying to compute a PRS, but RStudio aborts every time I start running the loop below.
This is the code I am using:

for (chr in 1:22) {
  ind.chr <- which(info_snp$chr == chr)
  ind.chr2 <- info_snp$`_NUM_ID_`[ind.chr]
  corr0 <- snp_cor(
    genotype,
    ind.col = ind.chr2,
    ncores = NCORES,
    infos.pos = POS2[ind.chr2],
    size = 3 / 1000
  )
  if (chr == 1) {
    ld <- Matrix::colSums(corr0^2)
    corr <- as_SFBM(corr0, tmp, compact = TRUE)
  } else {
    ld <- c(ld, Matrix::colSums(corr0^2))
    corr$add_columns(corr0, nrow(corr))
  }
}

(I have also followed this note: "To use the “compact” format for SFBMs, you need packageVersion("bigsparser") >= package_version("0.5"). Make sure to reinstall {bigsnpr} after updating {bigsparser} to this new version (to avoid crashes)." Even though I have a version >= 0.5, the R session still gets aborted.)

Issue 2:
When I ran the code a second time, R reported that tmp does not exist, even though tmp was in the environment, and the loop continued to run. This time the session was not aborted, but the loop just kept running all night. (In your experience, how long does this loop usually take?)

Owner

privefl commented Aug 1, 2024

  • At which step does the R session crash?
  • Where is tmp defined?
  • What are the dimensions of genotype?

privefl transferred this issue from privefl/bigparallelr Aug 1, 2024
Author

khujithrajueni commented Aug 1, 2024

a) At which step does the R session crash?
ans) I am not sure at which step in the loop R crashes, because I run the whole loop at once. However, it definitely does not crash before the loop, since I run each of those steps individually.

b) Where is tmp defined?
ans) In the environment pane, tmp is listed under Values. In the code, it is defined just before the loop.

c) What are the dimensions of genotype?
ans) 5402 x 10928335

Owner

privefl commented Aug 2, 2024

I guess you're using too many variants for computing the LD.
This will take a lot of time + a lot of memory (which can cause crashes).

I don't know what you want to do with these very large LD matrices, but if you want to use these to run LDpred2 on 11M variants, there are several issues here discussing why this is not recommended for now.

In case you really want to compute these matrices for something else, I would recommend computing and storing all these corr0 separately (e.g. with runonce::save_run() or simply saveRDS()). You can do this on multiple nodes, and check whether they all finish in e.g. 12 hours, or at least how much time and memory the smallest chromosomes already take.
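To illustrate the "compute and store each corr0" suggestion, here is a minimal base-R sketch. The helper `compute_and_cache()`, the stand-in `toy_corr()`, and the file naming are hypothetical and not part of {bigsnpr}. In a real run, the function passed as `compute_chr` would wrap the `snp_cor()` call from the loop in the original post.

```r
# Hypothetical helper (not part of {bigsnpr}): compute one result per chromosome,
# save it with saveRDS(), and skip chromosomes whose file already exists,
# so a crash only loses the chromosome that was being processed.
compute_and_cache <- function(chrs, compute_chr, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  vapply(chrs, function(chr) {
    file <- file.path(out_dir, paste0("corr_chr", chr, ".rds"))
    if (!file.exists(file)) {
      # Time each chromosome to see how the cost grows with chromosome size
      elapsed <- system.time(res <- compute_chr(chr))[["elapsed"]]
      saveRDS(res, file)
      message("chr ", chr, ": ", round(elapsed, 1), " s -> ", file)
    }
    file
  }, character(1))
}

# Toy stand-in so the sketch runs without genotype data (assumption: the real
# function would return the corr0 matrix computed by snp_cor() for this chr).
toy_corr <- function(chr) diag(chr + 1)

files <- compute_and_cache(1:3, toy_corr, file.path(tempdir(), "corr-cache"))
```

Because each chromosome is written to its own file, the calls can be spread over several nodes (one or a few chromosomes each), and the SFBM can be assembled later by reading the saved files back with readRDS().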
