Error executing process > 'mod:dss' #27

Open
SilviaMariaMacri opened this issue Jun 19, 2024 · 12 comments

@SilviaMariaMacri

Operating System

Other Linux (please specify below)

Other Linux

Red Hat Enterprise Linux release 8.6

Workflow Version

v.1.2.1

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

/hpcshare/genomics/ASL_ONC/NextFlow_RunningDir/nextflow-23.10.0-all run epi2me-labs/wf-somatic-variation -profile singularity -resume -process.executor pbspro -process.memory 256.GB -work-dir /archive/s2/genomics/onco_nanopore/test_som_var/work -with-timeline --snv --sv --mod --sample_name OHU0002HI --bam_normal /archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002HTNDN/OHU0002HTNDN_dx0_dx-1_new.bam --bam_tumor /archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002ITTDN/OHU0002ITTDN_dx0_dx-1_new.bam --ref /archive/s1/sconsRequirements/databases/reference/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta --out_dir /archive/s2/genomics/onco_nanopore/test_som_var --basecaller_cfg [email protected] --phase_normal --classify_insert --force_strand --normal_min_coverage 0 --tumor_min_coverage 0 --haplotype_filter_threads 32 --severus_threads 32 --dss_threads 4 --modkit_threads 32 -process.cpus 32 -process.queue fatnodes

Workflow Execution - CLI Execution Profile

singularity

What happened?

The pipeline failed at its last step, mod:dss.

Replicating the issue as suggested by the error message (running `bash .command.run` in the work directory) printed more information:

System errno 22 unmapping file: Invalid argument
Error in fread("normal.bed", sep = "\t", header = T) :
Opened 15.96GB (17139453993 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.
Execution halted

Relevant log output

Error executing process > 'mod:dss (3)'

Caused by:
  Process `mod:dss (3)` terminated with an error exit status (137)

Command executed:

  #!/usr/bin/env Rscript
  library(DSS)
  require(bsseq)
  require(data.table)
  # Disable scientific notation
  options(scipen=999)
  
  # Import data
  tumor = fread("tumor.bed", sep = '	', header = T)
  normal = fread("normal.bed", sep = '	', header = T)
  # Create BSobject
  BSobj = makeBSseqData( list(tumor, normal),
      c("Tumor", "Normal") )
  # DML testing
  dmlTest = DMLtest(BSobj, 
      group1=c("Tumor"), 
      group2=c("Normal"),
      equal.disp = FALSE,
      smoothing=TRUE,
      smoothing.span=500,
      ncores=4)
  # Compute DMLs
  dmls = callDML(dmlTest,
      delta=0.25,
      p.threshold=0.001)
  # Compute DMRs
  dmrs = callDMR(dmlTest,
      delta=0.25,
      p.threshold=0.001,
      minlen=100,
      minCG=5,
      dis.merge=1500,
      pct.sig=0.5)
  # Write output files
  write.table(dmls, 'OHU0002HI.6mA_+.dml.tsv', sep='\t', quote=F, col.names=T, row.names=F)
  write.table(dmrs, 'OHU0002HI.6mA_+.dmr.tsv', sep='\t', quote=F, col.names=T, row.names=F)

Command exit status:
  137

Command output:
  (empty)

Command error:
  
      anyMissing, rowMedians
  
  
  Attaching package: 'MatrixGenerics'
  
  The following objects are masked from 'package:matrixStats':
  
      colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
      colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
      colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
      colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
      colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
      colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
      colWeightedMeans, colWeightedMedians, colWeightedSds,
      colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
      rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
      rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
      rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
      rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
      rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
      rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
      rowWeightedSds, rowWeightedVars
  
  The following object is masked from 'package:Biobase':
  
      rowMedians
  
  Loading required package: parallel
  Loading required package: data.table
  
  Attaching package: 'data.table'
  
  The following object is masked from 'package:SummarizedExperiment':
  
      shift
  
  The following object is masked from 'package:GenomicRanges':
  
      shift
  
  The following object is masked from 'package:IRanges':
  
      shift
  
  The following objects are masked from 'package:S4Vectors':
  
      first, second
  
  .command.run: line 164:    35 Killed                  /usr/bin/env Rscript .command.sh

Work dir:
  /archive/s2/genomics/onco_nanopore/test_som_var/work/20/a4581d28e28dd29ec5e3e0e78d757f

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

no

Other demo data information

no attempt
@RenzoTale88
Contributor

@SilviaMariaMacri the process is running out of memory (exit status 137). You can try reducing the number of threads for the DSS process to --dss_threads 2, which should reduce the amount of memory required. For example:
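
Resuming with a lower thread count (all other options as in your original command; the placeholder is illustrative):

nextflow run epi2me-labs/wf-somatic-variation -resume --dss_threads 2 <other options as before>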

@SilviaMariaMacri
Author

@RenzoTale88 thank you for your reply.
I reduced --dss_threads first to 2 and then to 1, but I got the same error.

@RenzoTale88
Contributor

Then you can try increasing the memory provided to the DSS process. Simply save the following block of code in a separate file:

process {
    withName: dss {
        memory = X.GB
    }
}

Where X is the amount of memory, in GB, that the process should use. Save the file as a custom configuration with a name ending in .config and provide it to nextflow with the -c option:

nextflow run epi2me-labs/wf-somatic-variation -c <path to custom config file> <options here>
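
For instance, with a hypothetical file named dss_mem.config (the file name and the 64 GB value are only illustrative; size the memory to your data):

process {
    withName: dss {
        memory = 64.GB
    }
}

nextflow run epi2me-labs/wf-somatic-variation -c dss_mem.config <options here>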

@RenzoTale88
Contributor

@SilviaMariaMacri did you try providing a custom configuration file as mentioned above?

@SilviaMariaMacri
Author

@RenzoTale88
yes, but after setting the memory to 256 GB it kept giving me the same error. I then manually edited the .command.run file to set the memory to 380 GB and launched the job outside the pipeline; it completed successfully after almost 70 hours of running time.
The pipeline is now running with the new memory setting, and I expect it to finish without errors since the standalone job did.
What do you think is the reason for such a long running time and such high memory use? Can it be avoided?

@RenzoTale88
Contributor

@SilviaMariaMacri it is quite difficult to say. The DSS process, as the name suggests, relies on the DSS R package to identify differentially modified regions/loci. Its memory use depends on the size of the dataset and the number of cores used for the analysis, which makes it difficult to predict for every use case.

@SilviaMariaMacri
Author

Hi @RenzoTale88,

I'm using two whole-genome sequencing BAM files produced with dorado, with two modification types called (5mC_5hmC and 6mA). The BAM files are 87 GB and 120 GB for the normal and tumor tissue respectively.
Six mod:dss processes are submitted to PBS: three of them finish successfully, while the fourth reaches the maximum time limit of 100 hours (each job is configured with 1 CPU and 750 GB of memory).
So by increasing the number of CPUs I get a memory error, and with only 1 CPU I get a time-limit error.

Are there any plans to address this, perhaps by splitting the input into more than one file (i.e. one per chromosome) and launching a separate job for each, along the lines of the sketch below? Alternatively, do you have any suggestions for my case?
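
(A rough, hypothetical sketch of what I mean, adapting the DSS script from the log above. It assumes tumor.bed/normal.bed keep the chr/pos/N/X columns that makeBSseqData expects, and that recomputing a genome-wide FDR after concatenating the per-chromosome tests is acceptable; the output file names are placeholders.)

#!/usr/bin/env Rscript
# Hypothetical per-chromosome variant of the workflow's DSS step
library(DSS)
require(bsseq)
require(data.table)
# Disable scientific notation
options(scipen=999)

tumor = fread("tumor.bed", sep = '\t', header = T)
normal = fread("normal.bed", sep = '\t', header = T)

# Test one chromosome at a time to cap peak memory
results = list()
for (chrom in intersect(unique(tumor$chr), unique(normal$chr))) {
    BSobj = makeBSseqData(
        list(tumor[chr == chrom], normal[chr == chrom]),
        c("Tumor", "Normal"))
    results[[chrom]] = DMLtest(BSobj,
        group1=c("Tumor"),
        group2=c("Normal"),
        equal.disp = FALSE,
        smoothing=TRUE,
        smoothing.span=500,
        ncores=1)
    rm(BSobj); gc()  # release memory before the next chromosome
}

# DMLtest returns a data frame, so the per-chromosome results can be
# concatenated; the fdr column was computed per chromosome, so recompute
# it genome-wide (my assumption about how it should be handled)
dmlTest = do.call(rbind, results)
dmlTest$fdr = p.adjust(dmlTest$pval, method = "fdr")

dmls = callDML(dmlTest, delta=0.25, p.threshold=0.001)
dmrs = callDMR(dmlTest, delta=0.25, p.threshold=0.001,
    minlen=100, minCG=5, dis.merge=1500, pct.sig=0.5)
write.table(dmls, 'per_chrom.dml.tsv', sep='\t', quote=F, col.names=T, row.names=F)
write.table(dmrs, 'per_chrom.dmr.tsv', sep='\t', quote=F, col.names=T, row.names=F)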

Thanks

@RenzoTale88
Contributor

Hi @SilviaMariaMacri sorry to hear this is giving you issues. Do you have access to the logs of the failing processes (i.e. do you have access to the work directory)? That might help us figure out what is going wrong.

@SilviaMariaMacri
Author

Thank you for your answer @RenzoTale88.
Yes, here are two log files (with exit statuses 143 and 130), but I can't get much information from them:
.command.log_exitcode130.log
.command.log_exitcode143.log

@RenzoTale88
Contributor

@SilviaMariaMacri thanks for sharing. I'll see if there is a way to reduce the memory usage of the process and will keep you updated on any progress. Thanks in advance for your patience!

@SilviaMariaMacri
Author

Hi @RenzoTale88,
do you have any updates on this? Thank you

@RenzoTale88
Contributor

@SilviaMariaMacri sorry for the long silence. We have been running a number of tests to figure out how to improve the situation, and are still working on a longer-term solution for the memory issue.
In the meantime, we released v1.3.1, which adds the option --diff_mod; setting it to false disables DSS. This should allow the workflow to run to completion and emit the outputs, which you can then analyse manually. For example:
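
nextflow run epi2me-labs/wf-somatic-variation --mod --diff_mod false <other options as before>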
We realise this is not a full solution, and I apologise for the inconvenience.

Andrea
