Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sarek v3.1.2 -- error for ascat #977

Open
PyGuan opened this issue Mar 23, 2023 · 8 comments
Open

sarek v3.1.2 -- error for ascat #977

PyGuan opened this issue Mar 23, 2023 · 8 comments

Comments

@PyGuan
Copy link

PyGuan commented Mar 23, 2023

I have encountered the following error. I have tried both with out without --save_output_as_bam. It didn't help. I also ensured sufficient memory (256G) is available.

Thank you!

_-[nf-core/sarek] Pipeline completed with errors-
[27/e13d07] NOTE: Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_ASCAT:ASCAT (CGMH-FH-RCC1T_vs_CGMH-FH-RCC1N)` terminated with an error exit status (1) -- Execution is retried (7)
Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_ASCAT:ASCAT (CGMH-FH-RCC1T_vs_CGMH-FH-RCC1N)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_ASCAT:ASCAT (CGMH-FH-RCC1T_vs_CGMH-FH-RCC1N)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env Rscript
  library(RColorBrewer)
  library(ASCAT)
  options(bitmapType='cairo')
  
  #build prefixes: <abspath_to_files/prefix_chr>
  allele_path = normalizePath("G1000_alleles_hg19")
  allele_prefix = paste0(allele_path, "/", "G1000_alleles_hg19", "_chr")
  
  loci_path = normalizePath("G1000_loci_hg19")
  loci_prefix = paste0(loci_path, "/", "G1000_loci_hg19", "_chr")
  
  #prepare from BAM files
  ascat.prepareHTS(
      tumourseqfile = "T.converted.cram",
      normalseqfile = "N.converted.cram",
      tumourname = paste0("T_vs_N", ".tumour"),
      normalname = paste0("T_vs_N", ".normal"),
      allelecounter_exe = "alleleCounter",
      alleles.prefix = allele_prefix,
      loci.prefix = loci_prefix,
      gender = "XY",
      genomeVersion = "hg19",
      nthreads = 16
      ,minCounts = 10
      ,BED_file = 'wgs_calling_regions_Sarek.list'
      ,chrom_names = c(1:22, 'X', 'Y')
      ,min_base_qual = 20
      ,min_map_qual = 35
      ,ref.fasta = 'human_g1k_v37_decoy.fasta'
  
  
  )
  
  
  #Load the data
  ascat.bc = ascat.loadData(
      Tumor_LogR_file = paste0("T_vs_N", ".tumour_tumourLogR.txt"),
      Tumor_BAF_file = paste0("T_vs_N", ".tumour_tumourBAF.txt"),
      Germline_LogR_file = paste0("T_vs_N", ".tumour_normalLogR.txt"),
      Germline_BAF_file = paste0("T_vs_N", ".tumour_normalBAF.txt"),
      genomeVersion = "hg19",
      gender = "XY"
  )
  
  #Plot the raw data
  ascat.plotRawData(ascat.bc, img.prefix = paste0("T_vs_N", ".before_correction."))
  
  # optional LogRCorrection
  if("GC_G1000_hg19" != "NULL") {
      gc_input = paste0(normalizePath("GC_G1000_hg19"), "/", "GC_G1000_hg19", ".txt")
  
      if("RT_G1000_hg19" != "NULL"){
          rt_input = paste0(normalizePath("RT_G1000_hg19"), "/", "RT_G1000_hg19", ".txt")
          ascat.bc = ascat.correctLogR(ascat.bc, GCcontentfile = gc_input, replictimingfile = rt_input)
          #Plot raw data after correction
          ascat.plotRawData(ascat.bc, img.prefix = paste0("T_vs_N", ".after_correction_gc_rt."))
      }
      else {
          ascat.bc = ascat.correctLogR(ascat.bc, GCcontentfile = gc_input, replictimingfile = RT_G1000_hg19)
          #Plot raw data after correction
          ascat.plotRawData(ascat.bc, img.prefix = paste0("T_vs_N", ".after_correction_gc."))
      }
  }
  
  #Segment the data
  ascat.bc = ascat.aspcf(ascat.bc)
  
  #Plot the segmented data
  ascat.plotSegmentedData(ascat.bc)
  
  #Run ASCAT to fit every tumor to a model, inferring ploidy, normal cell contamination, and discrete copy numbers
  #If psi and rho are manually set:
  if (!is.null(NULL) && !is.null(NULL)){
      ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=NULL, psi_manual=NULL)
  } else if(!is.null(NULL) && is.null(NULL)){
      ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=NULL)
  } else if(!is.null(NULL) && is.null(NULL)){
      ascat.output <- ascat.runAscat(ascat.bc, gamma=1, psi_manual=NULL)
  } else {
      ascat.output <- ascat.runAscat(ascat.bc, gamma=1)
  }
  
  #Extract metrics from ASCAT profiles
  QC = ascat.metrics(ascat.bc,ascat.output)
  
  #Write out segmented regions (including regions with one copy of each allele)
  write.table(ascat.output[["segments"]], file=paste0("T_vs_N", ".segments.txt"), sep=" ", quote=F, row.names=F)
  
  #Write out CNVs in bed format
  cnvs=ascat.output[["segments"]][2:6]
  write.table(cnvs, file=paste0("T_vs_N",".cnvs.txt"), sep="    ", quote=F, row.names=F, col.names=T)
  
  #Write out purity and ploidy info
  summary <- tryCatch({
          matrix(c(ascat.output[["aberrantcellfraction"]], ascat.output[["ploidy"]]), ncol=2, byrow=TRUE)}, error = function(err) {
              # error handler picks up where error was generated
              print(paste("Could not find optimal solution:  ",err))
              return(matrix(c(0,0),nrow=1,ncol=2,byrow = TRUE))
      }
  )
  colnames(summary) <- c("AberrantCellFraction","Ploidy")
  write.table(summary, file=paste0("T_vs_N",".purityploidy.txt"), sep=" ", quote=F, row.names=F, col.names=T)
  
  write.table(QC, file=paste0("T_vs_N", ".metrics.txt"), sep="  ", quote=F, row.names=F)
  
  # version export
  f <- file("versions.yml","w")
  alleleCounter_version = system(paste("alleleCounter --version"), intern = T)
  ascat_version = sessionInfo()$otherPkgs$ASCAT$Version
  writeLines(paste0('"', "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_ASCAT:ASCAT", '"', ":"), f)
  writeLines(paste("    alleleCounter:", alleleCounter_version), f)
  writeLines(paste("    ascat:", ascat_version), f)
  close(f)

Command exit status:
  1

Command output:
  (empty)

Command error:
  Done reading locis
  Multi pos start:
  Reading locis
  [E::sam_itr_next] Null iterator
  [ERROR] (src/bam_access.c: bam_access_get_multi_position_base_counts:379 errno: No such file or directory) Error detected (-2) when trying to iterate through region
  [ERROR] (./src/alleleCounter.c: main:432 errno: None) Error scanning through bam file for loci list with dense snps.
  munmap_chunk(): invalid pointer
  Done reading locis
  Multi pos start:
  [E::sam_itr_next] Null iterator
  [ERROR] (src/bam_access.c: bam_access_get_multi_position_base_counts:379 errno: No such file or directory) Error detected (-2) when trying to iterate through region
  [ERROR] (./src/alleleCounter.c: main:432 errno: None) Error scanning through bam file for loci list with dense snps.
  Reading locis
  Done reading locis
  Multi pos start:
  munmap_chunk(): invalid pointer
  [E::sam_itr_next] Null iterator
  [ERROR] (src/bam_access.c: bam_access_get_multi_position_base_counts:379 errno: No such file or directory) Error   <<..............>>

Work dir:
  <<..............>>

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`_
@PyGuan
Copy link
Author

PyGuan commented Mar 23, 2023

I ran it with GRCh37 and other default settings.

@FriederikeHanssen
Copy link
Contributor

Hey! This looks like an error with the ASCAT tool. Can you attach the .command.log, .command.out, and .command.errfor the process. You can find it in the linked work dir.

The parameter --save_output_bam doesn't do anything for the variant calling tools. It is used to convert the output of the preprocessing processes (markduplicates and applybqsr) to BAM in the results folder.

@PyGuan
Copy link
Author

PyGuan commented Mar 23, 2023

Here you go. Take note I edited the file to mask my actual working directory.

command.err.txt
command.log.txt
command.sh.txt
command.run.txt

@PyGuan
Copy link
Author

PyGuan commented Mar 24, 2023

Just for you information, I also encountered issues with msisensorpro and deepvariant, but only for GRCh37. It worked okay with GRCh38.
So, I am not sure whether it is related to the reference genome I am using. I downloaded the references from here: s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/

@SusiJo
Copy link
Contributor

SusiJo commented Apr 4, 2023

Actually, this sounds more like an issue with cancerit/alleleCount than Ascat. There are some issues mentioned on their github repo. Can you check out if any of those are of help?

@PyGuan
Copy link
Author

PyGuan commented Apr 6, 2023

Just an update, the error can be due to this "connection time out" error:

[W::find_file_url] [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/96e414eace405d8c27a6d35ba19df56f": Connection timed outFailed to ope
n reference "https://www.ebi.ac.uk/ena/cram/md5/6c198acf68b5af7b9d676dfdd531b5de": Connection timed out

I'm not sure exactly what's going on. The connection timeout seems to be intermittent. I tried running the pipeline a few times, and each time some samples were cleared.

It could be that our server is blocking too-frequent access to external data, or EBI is denying my connection.

For now, I guess, when this happens, I'll just rerun the pipeline.

@PyGuan
Copy link
Author

PyGuan commented Apr 6, 2023

Just an update, the error can be due to this "connection time out" error:

[W::find_file_url] [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/96e414eace405d8c27a6d35ba19df56f": Connection timed outFailed to ope n reference "https://www.ebi.ac.uk/ena/cram/md5/6c198acf68b5af7b9d676dfdd531b5de": Connection timed out

I'm not sure exactly what's going on. The connection timeout seems to be intermittent. I tried running the pipeline a few times, and each time some samples were cleared.

It could be that our server is blocking too-frequent access to external data, or EBI is denying my connection.

For now, I guess, when this happens, I'll just rerun the pipeline.

I was referring to GRCh38. For GRCh37, the issue persisted.

@oghzzang
Copy link

Dear user,

Hello, I resolved it!
This issue was related to the reference for ascat.

Check if the chromosomes in your BAM file have the "chr" prefix.
There's a high chance that the "chr" prefix is missing.

In that case, you need to modify the "ascat_loci" in the igenome.config file.
Currently, the default configuration in Sarek downloads "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/ASCAT/G1000_loci_hg19.zip", which includes the "chr" prefix.
You can either download a version without "chr" for your local file or simply remove "chr" using a quick sed command.

Please refer to the bottom of the following page for information on "'chr'-based versus non 'chr'-based reference":
Reference: VanLoo-lab ascat ReferenceFiles for WGS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants