Ensure data for negative controls is delivered to customers #2871

Closed
karlnyr opened this issue Jan 26, 2024 · 13 comments · Fixed by #3682 or #3684

Comments

@karlnyr
Contributor

karlnyr commented Jan 26, 2024

Description

Negative controls need all of their reads linked to their housekeeper bundle, and their read counts added to the sample in statusdb and LIMS.

Suggested solution

When we parse the demultiplexing stats we already get the number of reads for the sample. The next step would be to (a) check if a sample is a negative control or (b) find the metagenomic negative control on a flow cell, and handle it differently.

This can be closed when

Negative controls have their reads added to the sample bundle, statusdb, and LIMS disregarding read quality.

Clarification

Negative controls by definition won't pass our QC after sequencing, so their fastq files won't be linked to them. Negative controls should always have their data transferred, since the customer can use it to filter out noise from the prep. They are also needed in the new metagenomic analyses.

  1. Negative controls do not pass sequencing QC
  2. Negative control fastq files are not stored in their respective housekeeper sample bundle
  3. Negative control samples are not added or updated in status db in the IlluminaSampleSequencingMetrics table.
  4. Negative control samples need to be added to LIMS.

The above four steps are required to pass.

Second Clarification

  • Customers want ALL the data produced, and we are not providing the data generated for negative controls
  • The new taxprofiler workflow needs the data from the negative control to run
@karlnyr
Contributor Author

karlnyr commented Jan 31, 2024

Discussion

  • Can we use bad-quality reads?
  • Add a property to the sample model, something like "sample_is_metagenomic"
  • Should we do this for all negative controls, and deal with the repercussions later?

Decision

  • Implement this for all negative controls
    • If a sample is a negative control, add all of its reads regardless of quality
  • Check that the pipelines this could affect (microsalt, mutant) are not negatively impacted
    • Create a ProdBioInfo task to start a microsalt case including reads with poor quality. @karlnyr

@karlnyr
Contributor Author

karlnyr commented Feb 16, 2024

The plan is to analyse eternaltahr together with its negative control ACC9713A25, which has reads, though of low quality. Requesting flow cell HVNHGDSX2 to fetch the data back to hasta so that we can add it to ACC9713A25 temporarily.

@beatrizsavinhas
Contributor

beatrizsavinhas commented Jul 19, 2024

  • Demultiplexed flowcell
  • Created housekeeper bundle and added fastq files for negative control sample ACC9713A25
  • Retrieved archived files for all samples
  • Decompressed all files
  • Started microsalt analysis for case eternaltahr

@karlnyr
Contributor Author

karlnyr commented Jul 22, 2024

Analysis completed without any issues. This means that we can now begin the process of adding data for all negative controls to hk as well! @Clinical-Genomics/sysdev, could you prioritize this? The current situation is very prone to mistakes, and the metagenomic samples are quite time-sensitive.

@diitaz93
Contributor

Technical refinement

Questions

  • Does this concern only microSALT, or other pipelines as well? @karlnyr

@karlnyr
Contributor Author

karlnyr commented Jul 25, 2024

It is not pipeline-specific. This is on an organisation level: we would like to store all reads for samples that are negative controls. @diitaz93

@Vince-janv Vince-janv removed the on hold label Aug 7, 2024
@Vince-janv
Contributor

@karlnyr What issue would this resolve? Is it a requirement for taxprofiler being implemented in production?

@karlnyr
Contributor Author

karlnyr commented Aug 8, 2024

This would resolve two things:

  • Automatic transfer of fastq files for negative controls
  • Enable taxprofiler to use negative control reads in the future

@karlnyr karlnyr changed the title Add sample reads to statusdb, LIMS and housekeeper for metagenomic negative controls Add sample reads to statusdb, LIMS and housekeeper for negative controls Aug 8, 2024
@karlnyr
Contributor Author

karlnyr commented Aug 8, 2024

Edited issue, as this would be changed for all negative controls, not specifically for metagenomic ones.

@Vince-janv
Contributor

Technical Refinement

  • Use sample.control to determine if low-quality fastq files should be added to housekeeper (in post-processing flow)
  • Also update the sample.reads for negative controls
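
A minimal sketch of the rule described in this refinement. The `Sample` dataclass, the `"negative"` value for `control`, and the function names are illustrative assumptions, not the actual cg models or post-processing code. It also assumes that non-control samples only count reads passing QC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for the statusdb sample model."""

    internal_id: str
    control: Optional[str] = None  # assumed values: None, "positive", "negative"
    reads: int = 0


def is_negative_control(sample: Sample) -> bool:
    """True when the sample is flagged as a negative control."""
    return sample.control == "negative"


def should_store_fastq(sample: Sample, passed_qc: bool) -> bool:
    """Store fastq files if QC passed, or unconditionally for negative controls."""
    return passed_qc or is_negative_control(sample)


def update_reads(sample: Sample, passing_reads: int, total_reads: int) -> None:
    """Count all reads for negative controls, otherwise only reads passing QC."""
    sample.reads = total_reads if is_negative_control(sample) else passing_reads
```

With this shape, the quality check stays in one place and the negative control exception is a single, explicit condition.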

@seallard seallard changed the title Add sample reads to statusdb, LIMS and housekeeper for negative controls Ensure data for negative controls is stored Aug 12, 2024
@seallard seallard changed the title Ensure data for negative controls is stored Ensure data for negative controls is stored and delivered to customers Aug 12, 2024
@seallard seallard changed the title Ensure data for negative controls is stored and delivered to customers Ensure data for negative controls is delivered to customers Aug 12, 2024
@beatrizsavinhas
Contributor

Changes reverted.

Blocked until the changes can be implemented in LIMS.

@Karl-Svard
Contributor

The reverted changes have now been re-reverted after the LIMS update (Clinical-Genomics/cg_lims#531). See #3769

@RasmusBurge-CG
Contributor

I was unable to start case: cg workflow microsalt start preparedmartin

Error was:


[14:26] [hiseq.clinical@hasta:~] [P_base]  $ cg workflow microsalt start preparedmartin
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Starting Microsalt workflow for preparedmartin
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L004_R2_001.fastq.gz
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L004_R1_001.fastq.gz
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L002_R2_001.fastq.gz
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L001_R2_001.fastq.gz
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L003_R2_001.fastq.gz
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L003_R1_001.fastq.gz
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L002_R1_001.fastq.gz
Found file ACC15747A1/2024-09-29/22C3K5LT4_ACC15747A1_S153_L001_R1_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L004_R2_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L003_R2_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L004_R1_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L001_R1_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L002_R1_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L001_R2_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L002_R2_001.fastq.gz
Found file ACC15747A2/2024-09-29/22C3K5LT4_ACC15747A2_S154_L003_R1_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L002_R1_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L004_R1_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L001_R2_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L002_R2_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L003_R2_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L001_R1_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L003_R1_001.fastq.gz
Found file ACC15747A3/2024-09-29/22C3K5LT4_ACC15747A3_S155_L004_R2_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L003_R2_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L002_R2_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L002_R1_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L003_R1_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L001_R1_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L004_R2_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L001_R2_001.fastq.gz
Found file ACC15747A4/2024-09-29/22C3K5LT4_ACC15747A4_S156_L004_R1_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L004_R1_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L003_R1_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L003_R2_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L002_R1_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L001_R1_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L001_R2_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L004_R2_001.fastq.gz
Found file ACC15747A5/2024-09-29/22C3K5LT4_ACC15747A5_S157_L002_R2_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L003_R2_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L004_R2_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L001_R2_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L003_R1_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L002_R2_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L004_R1_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L002_R1_001.fastq.gz
Found file ACC15747A6/2024-09-29/22C3K5LT4_ACC15747A6_S158_L001_R1_001.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L1_1.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L1_2.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L2_1.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L2_2.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L3_1.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L3_2.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L4_1.fastq.gz
Destination path already exists: /home/proj/production/microbial/fastq/preparedmartin/ACC15747A1/ACC15747A1_22C3K5LT4_L4_2.fastq.gz
Could not parse header format for header:
Traceback (most recent call last):
  File "/home/proj/production/bin/miniconda3/envs/P_cg/bin/cg", line 8, in <module>
    sys.exit(base())
             ^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/cli/workflow/microsalt/base.py", line 186, in start
    context.invoke(link, ticket=ticket, sample=sample, unique_id=unique_id, dry_run=dry_run)
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/click/decorators.py", line 45, in new_func
    return f(get_current_context().obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/cli/workflow/microsalt/base.py", line 65, in link
    analysis_api.link_fastq_files(
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/meta/workflow/microsalt/microsalt.py", line 133, in link_fastq_files
    self.link_fastq_files_for_sample(case=case_obj, sample=sample_obj)
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/meta/workflow/analysis.py", line 419, in link_fastq_files_for_sample
    fastq_files_meta: list[FastqFileMeta] = self.gather_file_metadata_for_sample(sample=sample)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/meta/workflow/analysis.py", line 395, in gather_file_metadata_for_sample
    return [
           ^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/meta/workflow/analysis.py", line 396, in <listcomp>
    self.fastq_handler.parse_file_data(hk_file.full_path)
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/meta/workflow/fastq.py", line 119, in parse_file_data
    fastq_file_meta: FastqFileMeta = FastqHandler.parse_fastq_header(header_line)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/meta/workflow/fastq.py", line 114, in parse_fastq_header
    raise exception
  File "/home/proj/production/bin/miniconda3/envs/P_cg/lib/python3.11/site-packages/cg/meta/workflow/fastq.py", line 111, in parse_fastq_header
    return GetFastqFileMeta.header_format.get(len(parts))(parts=parts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable

The issue was empty files in the bundle for the NTC, i.e. some files were missing the fastq header.
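
The `TypeError` in the traceback comes from calling the result of a dictionary `.get()` that returned `None` because no parser matched the header. A hedged sketch of a more explicit failure is shown here. `HEADER_PARSERS`, `FastqHeaderError`, the ten-part layout, and the returned fields are assumptions for illustration, not the actual cg implementation.

```python
from typing import Callable


class FastqHeaderError(ValueError):
    """Raised when a fastq header does not match any known format."""


def _parse_ten_part_header(parts: list[str]) -> dict:
    # Assumed layout: @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    return {
        "flow_cell": parts[2],
        "lane": int(parts[3]),
        "read": int(parts[6].split()[-1]),
    }


# Hypothetical dispatch table keyed on the number of colon-separated parts.
HEADER_PARSERS: dict[int, Callable[[list[str]], dict]] = {10: _parse_ten_part_header}


def parse_fastq_header(line: str) -> dict:
    """Parse a header line, raising a descriptive error for unknown formats."""
    parts = line.strip().split(":")
    parser = HEADER_PARSERS.get(len(parts))
    if parser is None:
        # An empty file yields an empty header, which lands here.
        raise FastqHeaderError(
            f"Unsupported fastq header ({len(parts)} parts): {line!r}"
        )
    return parser(parts)
```

An empty header then produces an error that names the offending line instead of `'NoneType' object is not callable`.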

[16:25] [hiseq.clinical@hasta:/home/proj/production/housekeeper-bundles/ACC15747A2/2024-09-29] [P_base]  $ ll
total 0
-rw-rw-rw-+ 2 hiseq.clinical users 316 Sep 29 16:11 22C3K5LT4_ACC15747A2_S154_L001_R1_001.fastq.gz
-rw-rw-rw-+ 2 hiseq.clinical users 287 Sep 29 16:12 22C3K5LT4_ACC15747A2_S154_L001_R2_001.fastq.gz
-rw-rw-rw-+ 2 hiseq.clinical users  23 Sep 29 17:06 22C3K5LT4_ACC15747A2_S154_L002_R1_001.fastq.gz
-rw-rw-rw-+ 2 hiseq.clinical users  23 Sep 29 17:06 22C3K5LT4_ACC15747A2_S154_L002_R2_001.fastq.gz
-rw-rw-rw-+ 2 hiseq.clinical users 296 Sep 29 17:59 22C3K5LT4_ACC15747A2_S154_L003_R1_001.fastq.gz
-rw-rw-rw-+ 2 hiseq.clinical users 294 Sep 29 17:59 22C3K5LT4_ACC15747A2_S154_L003_R2_001.fastq.gz
-rw-rw-rw-+ 2 hiseq.clinical users  23 Sep 29 18:58 22C3K5LT4_ACC15747A2_S154_L004_R1_001.fastq.gz
-rw-rw-rw-+ 2 hiseq.clinical users  23 Sep 29 18:58 22C3K5LT4_ACC15747A2_S154_L004_R2_001.fastq.gz
[16:30] [hiseq.clinical@hasta:/home/proj/production/housekeeper-bundles/ACC15747A2/2024-09-29] [P_base]  $ cat 22C3K5LT4_ACC15747A2_S154_L002_R1_001.fastq.gz
���[16:30] [hiseq.clinical@hasta:/home/proj/production/housekeeper-bundles/ACC15747A2/2024-09-29] [P_base]  $

The issue was manually patched by removing the empty files from the sample bundle for the NTC.
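
The manual check above could in principle be automated before files are added to a bundle or linked for analysis. The following is only a sketch using the standard library. `has_reads` and `split_empty_fastqs` are hypothetical helper names, and where such a check would belong in cg is left open.

```python
import gzip
from pathlib import Path


def has_reads(fastq_path: Path) -> bool:
    """Return True if the gzipped fastq file decompresses to at least one byte."""
    with gzip.open(fastq_path, "rb") as handle:
        return bool(handle.read(1))


def split_empty_fastqs(paths: list[Path]) -> tuple[list[Path], list[Path]]:
    """Split paths into (with_reads, empty) so empty files can be skipped or reported."""
    with_reads: list[Path] = []
    empty: list[Path] = []
    for path in paths:
        (with_reads if has_reads(path) else empty).append(path)
    return with_reads, empty
```

Checking the decompressed content rather than the file size avoids relying on how large an empty gzip archive happens to be.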
