diff --git a/SAMtags.tex b/SAMtags.tex
index e19ec290b..37287f1b0 100644
--- a/SAMtags.tex
+++ b/SAMtags.tex
@@ -92,6 +92,7 @@ \section{Standard tags}
  {\tt MI} & Z & Molecular identifier; a string that uniquely identifies the molecule from which the record was derived \\
  {\tt ML} & B,C & Base modification probabilities \\
  {\tt MM} & Z & Base modifications / methylation \\
+ {\tt MN} & i & Length of sequence at the time {\tt MM} and {\tt ML} were produced \\
  {\tt MQ} & i & Mapping quality of the mate/next segment \\
  {\tt NH} & i & Number of reported alignments that contain the query in the current record \\
  {\tt NM} & i & Edit distance to the reference \\
@@ -621,6 +622,16 @@ \subsection{Base modifications}
 {\tt ML} values for ambiguity codes give the probability that the modification is one of the possible codes compatible with that ambiguity code.
 For example {\tt MM:Z:C+C,10; ML:B:C,229} indicates a C call with a probability of 90\% of having some form of unspecified modification.
 
+\item[MN:i:\tagvalue{length}]
+\hfill\\
+Tools may edit the {\sf SEQ} sequence data, such as modifying the alignment with hard-clipping.
+If the sequence is shrunk in this manner then the base offsets in {\tt MM} and {\tt ML} become invalid unless they are also updated accordingly.
+
+There may be hard-clipping tools which update {\tt MM} and tools which do not, so the {\tt MN} tag offers a simple sanity check.
+It holds the length of the sequence at the time {\tt MM} was last written.
+Tools that wish to validate {\tt MM} should compare the length of the {\sf SEQ} field with the contents of the {\tt MN} tag.
+The tag is optional, but recommended, and if it is absent then there is an implicit assumption that the {\tt MM} data is valid unless evidence implies otherwise (such as having coordinates beyond the end of the sequence).
+
 \end{description}
 
 \section{Draft tags}
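
The sanity check described in the added MN text could be sketched roughly as follows. This is only an illustration, not part of the patch or of any existing library: the function name `mm_length_status`, the status strings, and the representation of a record's optional tags as a plain dictionary are all assumptions made for the example. It also assumes SEQ is present as a string of bases rather than `*`.

```python
from typing import Mapping


def mm_length_status(seq: str, tags: Mapping[str, object]) -> str:
    """Classify how far the MM data of one record can be trusted.

    Hypothetical helper: `seq` is the record's SEQ field and `tags`
    maps two-letter tag names to their already-parsed values.
    """
    if "MM" not in tags:
        # Nothing to validate: the record carries no modification data.
        return "no-mm"

    mn = tags.get("MN")
    if mn is None:
        # MN is optional; without it MM is implicitly assumed valid,
        # so the length check cannot confirm or refute anything.
        return "unchecked"

    # MN holds the SEQ length at the time MM was last written.
    # A mismatch suggests SEQ was edited (for example hard-clipped)
    # without MM and ML being updated.
    return "ok" if mn == len(seq) else "stale"
```

A caller might treat "stale" as a reason to ignore or recompute the modification data, and treat "unchecked" as valid unless other evidence, such as MM coordinates running past the end of the sequence, says otherwise.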