deAnalysis requires sample_id in sample_sheet - How to restart the run? #123

KatrinMoller · 2024-10-11T11:05:30Z

Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

v1.4.0

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

./nextflow pull epi2me-labs/wf-transcriptomes

OUTPUT=~/output;
./nextflow run epi2me-labs/wf-transcriptomes \
-profile singularity \
--bam /merged_output \
--de_analysis \
--transcriptome_source precomputed \
--ref_genome /genomes/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz \
--ref_annotation /genomes/Mus_musculus.GRCm38.98.gtf.gz \
--ref_transcriptome Mus_musculus.GRCm38.cdna.all.fa.gz \
--sample_sheet /sample_sheets/sample_sheet1.csv \
--cdna_kit "SQK-PCS114" \
--isoform_table_nrows 10000 \
--out_dir /analysis/outdir1 -w /analysis/workspace_dir1 \
--threads 64

Workflow Execution - CLI Execution Profile

None

What happened?

The pipeline ran successfully until deAnalysis, where it stopped, apparently because of a missing sample_id in the sample_sheet. The README states only that barcode, alias and condition are required. However, I assume this error appears because Salmon requires a "sample_id" to run?
My questions about this error: Can I add a sample_id to the sample_sheet in the out_dir? Do I need to restart the entire run after that, or can the process continue from where it halted? How would I do that?

Relevant log output

Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'pipeline:differential_expression:deAnalysis (1)'

Caused by:
  Process `pipeline:differential_expression:deAnalysis (1)` terminated with an error exit status (1)


Command executed:

  mkdir merged
  mkdir de_analysis
  de_analysis.R annotation.gtf 3 1 10 3 "sample_sheet.csv"

Command exit status:
  1

Command output:
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids do not include versions so also strip versions from the counts df.
  Loading annotation database.
  Filtering counts using DRIMSeq.

Command error:
  Warning message:
  package 'DRIMSeq' was built under R version 4.3.2 
  Warning messages:
  1: package 'GenomicFeatures' was built under R version 4.3.2 
  2: package 'BiocGenerics' was built under R version 4.3.2 
  3: package 'S4Vectors' was built under R version 4.3.3 
  4: package 'IRanges' was built under R version 4.3.3 
  5: package 'GenomeInfoDb' was built under R version 4.3.2 
  6: package 'GenomicRanges' was built under R version 4.3.3 
  7: package 'AnnotationDbi' was built under R version 4.3.2 
  8: package 'Biobase' was built under R version 4.3.3 
  Warning messages:
  1: package 'edgeR' was built under R version 4.3.3 
  2: package 'limma' was built under R version 4.3.3 
  Loading counts, conditions and parameters.
  Checking annotation file type.
  Annotation file type is gtf.
  Checking annotation file for presence of transcript_id versions.
  Annotation file transcript_ids do not include versions so also strip versions from the counts df.
  Loading annotation database.
  Import genomic features from the file as a GRanges object ... OK
  Prepare the 'metadata' data frame ... OK
  Make the TxDb object ... OK
  Warning message:
  In .get_cds_IDX(mcols0$type, mcols0$phase) :
    The "phase" metadata column contains non-NA values for features of type
    stop_codon. This information was ignored.
  'select()' returned 1:many mapping between keys and columns
  Filtering counts using DRIMSeq.
  Error in dmDSdata(counts = counts, samples = coldata) : 
    all(samples$sample_id %in% colnames(counts)) is not TRUE
  Calls: dmDSdata -> stopifnot
  Execution halted

Work dir:
  /analysis/workspace_dir1/95/dba2add8f6897273fb62f3eb453425

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

nrhorner · 2024-10-11T13:14:12Z

Hi @KatrinMoller

You can update your sample sheet and restart the workflow by adding -resume to your command. I'm not sure whether the workflow will then start at the process in question, but it should reduce the number of processes that need to be rerun.
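A possible check to run before resuming, sketched under assumptions: the failing assertion in the log, all(samples$sample_id %in% colnames(counts)), requires every sample identifier to appear as a column name in the counts table. Assuming the workflow takes those identifiers from the alias column of the sample sheet (not confirmed in this thread), the hypothetical helper below lists aliases that have no matching column. The function name missing_aliases and the counts file layout (tab-separated, sample names in the header row) are illustrative and not part of the workflow.

```shell
# Hypothetical helper: print every sample-sheet alias that is NOT a column
# name in the header of a tab-separated counts table.
# Assumes a simple CSV without quoted fields or embedded commas.
missing_aliases() {
    local sample_sheet="$1"   # CSV whose header contains an "alias" column
    local counts_table="$2"   # TSV whose first line holds the sample names
    local header
    header="$(head -n 1 "$counts_table")"
    awk -F',' -v header="$header" '
        BEGIN {
            sub(/\r$/, "", header)              # tolerate Windows line endings
            n = split(header, cols, "\t")
            for (i = 1; i <= n; i++) known[cols[i]] = 1
        }
        { sub(/\r$/, "") }                      # strip a trailing CR on each row
        NR == 1 {                               # locate the alias column
            for (i = 1; i <= NF; i++) if ($i == "alias") a = i
            next
        }
        a && NF { if (!($a in known)) print $a }
    ' "$sample_sheet"
}

# Example call (the counts path is a placeholder, not a known workflow output):
# missing_aliases /sample_sheets/sample_sheet1.csv counts.tsv
```

If the function prints nothing, every alias has a matching counts column and the mismatch lies elsewhere.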

KatrinMoller · 2024-10-11T13:49:14Z

Hi @nrhorner
Thanks for the suggestion. I tried this: I replaced the sample sheet, then ran the same command with -resume added at the end, like so:
[kmoller@compute-71 km127_RNAseq]$ ./nextflow run epi2me-labs/wf-transcriptomes \
-profile singularity \
--bam /merged_output \
--de_analysis \
--transcriptome_source precomputed \
--ref_genome /genomes/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz \
--ref_annotation /genomes/Mus_musculus.GRCm38.98.gtf.gz \
--ref_transcriptome Mus_musculus.GRCm38.cdna.all.fa.gz \
--sample_sheet /sample_sheets/sample_sheet1.csv \
--cdna_kit "SQK-PCS114" \
--isoform_table_nrows 10000 \
--out_dir /analysis/outdir1 -w /analysis/workspace_dir1 \
--threads 64 \
-resume

This resulted in the following error:
ERROR ~ Unable to acquire lock on session with ID 03a8b907-ebfa-429f-ab6c-07b93313172f

Common reasons for this error are:

You are trying to resume the execution of an already running pipeline
A previous execution was abruptly interrupted, leaving the session open

You can see which process is holding the lock file by using the following command:

lsof /proj/hpcdata/Mimir/shared/kmoller/km127_RNAseq/.nextflow/cache/03a8b907-ebfa-429f-ab6c-07b93313172f/db/LOCK

-- Check '.nextflow.log' file for details

I can indeed find this particular LOCK file, but it is empty and I do not know how to resume. Please help :)

nrhorner · 2024-11-06T07:31:14Z

Hi @KatrinMoller

Sorry for the late reply. The lock file is created by Nextflow; see nextflow-io/nextflow#3987 (comment).

You should be able to delete it and then resume the workflow.
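The deletion step can be sketched as a small shell helper. This is a cautious sketch, not an official Nextflow procedure: the function name clear_stale_lock is hypothetical, and it assumes that lsof (the tool suggested in the error message above) is a reasonable way to tell whether another process still holds the lock.

```shell
# Hypothetical helper: remove a session LOCK file only when no running
# process still has it open.
clear_stale_lock() {
    local lock_file="$1"
    if [ ! -e "$lock_file" ]; then
        echo "no lock file at $lock_file"
        return 0
    fi
    # lsof exits with status 0 when at least one process has the file open.
    # If lsof is not installed, this check is skipped.
    if command -v lsof >/dev/null 2>&1 && lsof "$lock_file" >/dev/null 2>&1; then
        echo "lock is still held, not removing $lock_file" >&2
        return 1
    fi
    rm -f -- "$lock_file"
    echo "removed $lock_file"
}
```

For the session above, the function would be called with the LOCK path printed in the error message, and the original ./nextflow run command would then be repeated with -resume.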
