From 94500cf76f049e898dec7af23097d877fde5894e Mon Sep 17 00:00:00 2001
From: James Bonfield
Date: Tue, 5 Nov 2024 16:17:47 +0000
Subject: [PATCH] Add a footnote on case-insensitivity of RG PL (PR #684)

This is not changing what is valid / permitted, and indeed this
hopefully clarifies it further. However the practicality of dealing
with wide-spread non-compliant data with lowercase PL values is that
tools may wish to be lenient and use case-insensitive matching.

Also removes test/sam/failed/hdr.RG6.sam due to explicitly testing
against the use of lower-case PL fields. While strictly not
conforming, it's overly harsh if we are advocating a more
spec-tolerant testing regime for PL.

Fixes #679.
---
 SAMv1.tex                   | 4 +++-
 test/sam/failed/hdr.RG6.sam | 1 -
 2 files changed, 3 insertions(+), 2 deletions(-)
 delete mode 100644 test/sam/failed/hdr.RG6.sam

diff --git a/SAMv1.tex b/SAMv1.tex
index 4dd87570e..7b0b4c7a0 100644
--- a/SAMv1.tex
+++ b/SAMv1.tex
@@ -329,7 +329,9 @@ \subsection{The header section}
 & {\tt PG} & Programs used for processing the read group.\\\cline{2-3}
 & {\tt PI} & Predicted median insert size, rounded to the nearest integer.\\\cline{2-3}
 & {\tt PL} & Platform/technology used to produce the reads. \emph{Valid values}:
- {\tt CAPILLARY}, {\tt DNBSEQ} (MGI/BGI), {\tt ELEMENT}, {\tt HELICOS}, {\tt ILLUMINA}, {\tt IONTORRENT}, {\tt LS454}, {\tt ONT} (Oxford Nanopore), {\tt PACBIO} (Pacific Biosciences), {\tt SINGULAR}, {\tt SOLID}, and {\tt ULTIMA}.
+ {\tt CAPILLARY}, {\tt DNBSEQ} (MGI/BGI), {\tt ELEMENT}, {\tt HELICOS}, {\tt ILLUMINA}, {\tt IONTORRENT}, {\tt LS454}, {\tt ONT} (Oxford Nanopore), {\tt PACBIO} (Pacific Biosciences), {\tt SINGULAR}, {\tt SOLID}, and {\tt ULTIMA}.%
+\footnote{The {\tt PL} value should be written in uppercase exactly as shown in this list of valid values.
+Tools should also accept lowercase when reading the {\tt @RG PL} field, due to the existence of public data files with lowercase {\tt PL} values.}
 This field should be omitted when the technology is not in this list (though the {\tt PM} field may still be present in this case) or is unknown.\\\cline{2-3}
 & {\tt PM} & Platform model. Free-form text providing further details of the platform/technology used.\\\cline{2-3}
 & {\tt PU} & Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.\\\cline{2-3}
diff --git a/test/sam/failed/hdr.RG6.sam b/test/sam/failed/hdr.RG6.sam
deleted file mode 100644
index 229863580..000000000
--- a/test/sam/failed/hdr.RG6.sam
+++ /dev/null
@@ -1 +0,0 @@
-@RG ID:1 PL:illumina
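
For illustration only, the lenient reading behaviour described in the footnote could look roughly like the sketch below. It is not taken from any existing tool: the names VALID_PLATFORMS, normalise_pl and read_group_platform are hypothetical, and treating mixed-case input the same as lowercase is an assumption based on the commit message's mention of case-insensitive matching. Writers would still emit the uppercase form.

```python
# Valid PL values, copied from the list in the @RG table of SAMv1.tex.
VALID_PLATFORMS = frozenset({
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
})


def normalise_pl(value):
    """Return the canonical uppercase PL value, or None if unrecognised.

    Hypothetical helper: input is matched case-insensitively, and the
    result is always the uppercase spelling the specification asks
    writers to use.
    """
    canonical = value.upper()
    return canonical if canonical in VALID_PLATFORMS else None


def read_group_platform(header_line):
    """Extract and normalise PL from a tab-separated @RG header line.

    Returns None when PL is absent (the field is optional) or when its
    value is not in the list of valid platforms.
    """
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    for field in fields[1:]:
        tag, sep, value = field.partition(":")
        if sep and tag == "PL":
            return normalise_pl(value)
    return None
```

With this approach a header such as the one in the removed hdr.RG6.sam test (lowercase "illumina") is read as ILLUMINA rather than rejected, while a value outside the list is still reported as unrecognised.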