LDpred2.R: redirect temporary data during runs (#204)
* `LDpred2.R`: redirect temporary data
Fixes #203

* Check if tempdir exists before creating tempfile

* updated changelog
espenhgn authored Oct 17, 2023
1 parent deed453 commit 7e15724
Showing 4 changed files with 37 additions and 5 deletions.
CHANGELOG.md: 11 changes (11 additions, 0 deletions)
@@ -38,6 +38,13 @@ If MD5 sum is not listed for a certain release then it means that the container
### Misc

* Miscellaneous goes here

## [1.3.9] - 2023-10-17

### Added

* User-set directory option for temporary files during LDpred2 runs, by default `base::tempdir()`

## [1.3.8] - 2023-10-17

### Fixed
@@ -50,6 +57,10 @@ If MD5 sum is not listed for a certain release then it means that the container

* Added a feature to read and convert BGEN (.bgen) files to ``scripts/pgs/LDpred2/createBackingFile.R``

## [1.3.7] - 2023-10-12

* User-set directory for temporary files during LDpred2 runs, by default `base::tempdir()`

## [1.3.6] - 2023-08-17

### Fixed
scripts/pgs/LDpred2/README.md: 19 changes (18 additions, 1 deletion)
@@ -43,7 +43,7 @@ usage: ldpred2.R [--] [--help] [--out-merge] [--geno-impute-zero]
N-CONTROLS] [--name-score NAME-SCORE] [--hyper-p-length
HYPER-P-LENGTH] [--hyper-p-max HYPER-P-MAX] [--ldpred-mode
LDPRED-MODE] [--cores CORES] [--set-seed SET-SEED]
-[--genomic-build GENOMIC-BUILD]
+[--genomic-build GENOMIC-BUILD] [--tmp-dir TMP-DIR]
Calculate polygenic scores using ldpred2
@@ -336,3 +336,20 @@ As above, ``<path/to/containers`` should point to the cloned ``containers`` repo
Entries like ``--partition=normal`` may also be adapted for different HPC resources.
Then, the job can be submitted to the queue by issuing ``sbatch run_ldpred2_slurm.job``.
The status of running jobs can usually be enquired by issuing ``squeue -u $USER``.


### Redirect temporary file output

By default, `ldpred2.R` writes its large temporary file(s), such as the on-disk LD matrix (`.sbk` file), to the system temporary directory returned by `base::tempdir()`.
On HPC resources, using the designated `$SCRATCH`, `$LOCALTMP`, or `$TMPDIR` directory instead is recommended, to avoid
filling up the system temporary directory.

To redirect the temporary file output, set the `TMPDIR` environment variable inside the container to a directory bind-mounted from the HPC resource,
for example by adding the following lines to the job script:

```
export SINGULARITY_BIND=$REFERENCE:/REF,${LDPRED2_REF}:/ldpred2_ref,$SCRATCH:/scratch
export SINGULARITYENV_TMPDIR=/scratch
```

Alternatively, the location of temporary files can be set with the `--tmp-dir` argument of `ldpred2.R`. The directory must already exist; otherwise the script stops with an error.
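As a sketch of how a job script might choose the directory for `--tmp-dir`, the POSIX shell fragment below picks the first of `$SCRATCH`, `$LOCALTMP`, and `$TMPDIR` that is an existing directory, and falls back to `/tmp`. The helper `pick_tmp_dir` and this fallback order are assumptions for illustration and are not part of the repository. When running inside Singularity, the chosen directory must also be bind-mounted into the container (as with `SINGULARITY_BIND` above), and the container-side path is what should be passed to `--tmp-dir`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the first candidate
# that is an existing directory, checked in the order
# $SCRATCH, $LOCALTMP, $TMPDIR, then /tmp.
pick_tmp_dir() {
    for candidate in "${SCRATCH:-}" "${LOCALTMP:-}" "${TMPDIR:-}" /tmp; do
        # Skip unset/empty variables and paths that are not directories
        if [ -n "$candidate" ] && [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no usable temporary directory found" >&2
    return 1
}

# Stop the job early if no usable directory exists
TMP_DIR=$(pick_tmp_dir) || exit 1

# Host-side path; inside a container, pass the bind-mounted path instead, e.g.
#   ... ldpred2.R <other arguments> --tmp-dir <container path>
echo "Temporary files will go to: $TMP_DIR"
```

Checking for an existing directory in the job script mirrors the check `ldpred2.R` itself performs, but fails before the container is started.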
scripts/pgs/LDpred2/ldpred2.R: 10 changes (7 additions, 3 deletions)
@@ -54,6 +54,7 @@ par <- add_argument(par, "--cores", help="Number of CPU cores to use, otherwise
par <- add_argument(par, '--set-seed', help="Set a seed for reproducibility", nargs=1)
par <- add_argument(par, "--merge-by-rsid", help="Merge using rsid (the default is to merge by chr:bp:a1:a2 codes).", flag=TRUE)
par <- add_argument(par, "--genomic-build", help="Genomic build to use. Either hg19, hg18 or hg38", default="hg19", nargs=1)
par <- add_argument(par, "--tmp-dir", help="Directory to store temporary files. Default is output of base::tempdir()", default=tempdir())

parsed <- parse_args(par)

@@ -135,6 +136,9 @@ if (fileOutputMerge) {
verifyScoreOutputFile(fileOutput, nameScore, fileOutputMergeIDs)
}

# check if tmp dir exists
if (!dir.exists(parsed$tmp_dir)) stop("Temporary directory ", parsed$tmp_dir, " does not exist")

cat('Loading backingfile:', fileGeno ,'\n')
obj.bigSNP <- snp_attach(fileGeno)

@@ -249,7 +253,7 @@ drops <- c("_NUM_ID_.ss", "rsid.ss", 'block_id', 'pos_hg18', 'pos_hg38')
df_beta <- df_beta[ , !(names(df_beta) %in% drops)]

cat('\n### Loading LD reference from ', fileLD, '\n')
-tmp <- tempfile(tmpdir = "tmp-data")
+tmp_file <- tempfile(tmpdir=parsed$tmp_dir)
ld_size <- 0; corr <- NULL
for (chr in chr2use) {
## indices in 'df_beta' corresponding to a particular 'chr'
@@ -269,7 +273,7 @@ for (chr in chr2use) {
corr_chr <- readRDS(fileLD_chr)[ind.chr3, ind.chr3]

if (is.null(corr)) {
-corr <- as_SFBM(corr_chr, tmp, compact = TRUE)
+corr <- as_SFBM(corr_chr, tmp_file, compact = TRUE)
} else {
corr$add_columns(corr_chr, nrow(corr))
}
@@ -335,4 +339,4 @@ if (fileOutputMerge) cat('Merging by', paste0(fileOutputMergeIDs, collapse=', ')
writeScore(obj.bigSNP$fam, fileOutput, nameScore, fileOutputMerge, fileOutputMergeIDs)
cat('Scores written to', fileOutput, '\n')
# Drop temporary file
-fileRemoved <- file.remove(paste0(tmp, '.sbk'))
+fileRemoved <- file.remove(paste0(tmp_file, '.sbk'))
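In `ldpred2.R`, the `.sbk` backing file is removed only on the last line of the script, so a run that fails part-way can leave a large file behind in the temporary directory. The POSIX shell sketch below shows one way a job script could guard against this, assuming the script is free to choose `--tmp-dir`. It creates a private per-job directory with `mktemp -d` and deletes that directory from an `EXIT` trap. The `ldpred2.XXXXXX` template and the `run_ldpred2` function are hypothetical. `run_ldpred2` only writes a dummy `.sbk` file in place of the real call.

```shell
#!/bin/sh
# Base directory for scratch data; falls back to /tmp if TMPDIR is unset
BASE_DIR=${TMPDIR:-/tmp}

# Private per-job directory, so cleanup cannot touch other users' files
JOB_TMP=$(mktemp -d "$BASE_DIR/ldpred2.XXXXXX") || exit 1

# Remove the per-job directory (and any leftover .sbk files) when the
# shell exits, including when the run below fails
cleanup() {
    rm -rf "$JOB_TMP"
}
trap cleanup EXIT

# Hypothetical placeholder for the real call, e.g.
#   ... ldpred2.R <other arguments> --tmp-dir "$JOB_TMP"
# Here it only creates an empty stand-in for the backing file.
run_ldpred2() {
    : > "$1/file1234.sbk"
}

run_ldpred2 "$JOB_TMP"
ls "$JOB_TMP"
```

Using a dedicated subdirectory instead of deleting `*.sbk` files directly keeps the cleanup limited to files created by this job.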
version/version.py: 2 changes (1 addition, 1 deletion)
@@ -2,7 +2,7 @@
_MINOR = "3"
# On main and in a nightly release the patch should be one ahead of the last
# released build.
-_PATCH = "8"
+_PATCH = "9"
# This is mainly for nightly builds which have the suffix ".dev$DATE". See
# https://semver.org/#is-v123-a-semantic-version for the semantics.
_SUFFIX = ""