From 46d8412473f903f7052b39a72f7a1d2c74c2336d Mon Sep 17 00:00:00 2001
From: codemeleon
Date: Sat, 15 Jul 2023 09:35:35 +0100
Subject: [PATCH] Minor text correction

---
 manuscript/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/manuscript/paper.md b/manuscript/paper.md
index f550b64..439bbf2 100644
--- a/manuscript/paper.md
+++ b/manuscript/paper.md
@@ -65,7 +65,7 @@ Next-Generation Sequencing (NGS) platforms underlie an exciting era that facilit
 A high-quality and accurate genome assembly provides insight into the occurrence and associated consequences of genetic changes within pathogenic microorganisms. Analyses to extract this information primarily involve the identification of variations such as single nucleotide polymorphisms (SNPs), and insertions and deletions (INDELs) which may be associated with enhanced pathogen phenotypes, such as increased transmissibility, immune and vaccine escape, increased infectivity, and disease severity [@Harvey_2021]. For SARS-CoV-2, and other pathogens of public health interest, several pipelines exist to map sequenced raw reads to a reference genome and generate the consensus sequence for downstream analyses [@Lam_2015; @Vilsker_2018; @Alvarez_Narvaez_2022]. However, often the consensus deviates from what is expected, requiring additional quality control (QC) and refinements prior to publication. Additionally, to track microbial and viral diversity, genomic sequences are classified into lineages comprising a constellation of mutations exclusive to the lineage [@Rambaut_2020]. The absence of one or more lineage-defining mutations, some of which may have been implicated in the altering of viral phenotypes, calls for further investigation. This includes re-examining the sequenced and mapped reads to establish the underlying reasons for its absence, and occasionally, and if warranted restore it.
-Wrong and/or uncalled mutations, representing false positives and negatives, could arise due to several factors that negatively affect the sequencing, mapping, and assembly outcomes. These include primer drop-outs and algorithmic issues [@Li_2018]. Algorithmic issues occur when expected parameter values significantly differ from the values encountered, for example, when depth is lower than the expected minimum due to low viral loads [@Lam_2021]. Low viral loads are typical of samples collected at the later stages of infection or tail ends of an outbreak commonly characterized by high cycle threshold (Ct) values (> 30) uences from these are usually defined by large numbers of frameshifts, INDELs, clustered and private mutations, i.e. mutations that are unique to a strain compared to their nearest neighbor in the global phylogeny for supported pathogens [@nextstrain_2020]. Primer drop-outs, on the other hand, are often caused by hypermutation in primer binding regions, reducing the amplification and sequencing for the targeted regions. This results in sections of the genome with low or no coverage. Primer drop-outs were commonly reported throughout the SARS-CoV-2 pandemic, especially in the variants of concern (VOCs)[@nextstrain_2020; @Davis_2021; @Sutton_2022]. Mutations that have resulted in primer drop-outs for VOCs include the G142D (Delta and Omicron) in the 2_Right primer, the 241/243del (Beta) that occurs in the 74_Left primer, and the K417N (Beta) or K417T (Gamma) which occurs in the 76_Left primer [@Davis_2021; @Ahmed_2022]. The Delta/B.1.1.672 variant has also been associated with ARTIC v3 drop-out of primers 72R and 73L [@Borcard_2022].
+Wrong and/or uncalled mutations, representing false positives and negatives, could arise due to several factors that negatively affect the sequencing, mapping, and assembly outcomes. These include primer drop-outs and algorithmic issues [@Li_2018]. Algorithmic issues occur when expected parameter values significantly differ from the values encountered, for example, when depth is lower than the expected minimum due to low viral loads [@Lam_2021]. Low viral loads are typical of samples collected at the later stages of infection or tail ends of an outbreak commonly characterized by high cycle threshold (Ct) values (> 30) or low viral loads [@Sutton_2022]. Sequences from these are usually defined by large numbers of frameshifts, INDELs, clustered and private mutations, i.e. mutations that are unique to a strain compared to their nearest neighbor in the global phylogeny for supported pathogens [@nextstrain_2020]. Primer drop-outs, on the other hand, are often caused by hypermutation in primer binding regions, reducing the amplification and sequencing for the targeted regions. This results in sections of the genome with low or no coverage. Primer drop-outs were commonly reported throughout the SARS-CoV-2 pandemic, especially in the variants of concern (VOCs)[@nextstrain_2020; @Davis_2021; @Sutton_2022]. Mutations that have resulted in primer drop-outs for VOCs include the G142D (Delta and Omicron) in the 2_Right primer, the 241/243del (Beta) that occurs in the 74_Left primer, and the K417N (Beta) or K417T (Gamma) which occurs in the 76_Left primer [@Davis_2021; @Ahmed_2022]. The Delta/B.1.1.672 variant has also been associated with ARTIC v3 drop-out of primers 72R and 73L [@Borcard_2022].
 Tools such as Nextclade [@Aksamentov_2021] can capture and report these sequence anomalies. Reports generated through Nextclade include detailed information on the excess number of gaps, mixed bases, private mutations, and frameshifts. However, there is no easy-to-use bioinformatics QC tool to further explore and report codon-affecting alterations (INDELs and substitutions) in the mapped short reads from a mixed bacterial/viral population or batch update of consensus sequences.
Moreover, such tools are often taxonomically limited and remain optimized for a select set of reference genomes.