From c151a3b7410897c33451890c52a9a6721ae96703 Mon Sep 17 00:00:00 2001 From: SalemBajjali Date: Mon, 15 Apr 2024 10:26:13 -0500 Subject: [PATCH] Fixed Spelling Error (#12) * Update index.rst corrected spelling * Update terms_and_model.rst corrected spelling * Update introduction.rst corrected spelling * Update terms_and_model.rst fix the syntax to correctly reference the "Introduction" section --- docs/source/index.rst | 2 +- docs/source/introduction.rst | 14 +++++++------- docs/source/terms_and_model.rst | 4 ++-- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 4f16cc0..745a658 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -2,7 +2,7 @@ Categorical Variation Representation Specification !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -The Categorical Variation Representation Specification (Cat-VRS, pronounced "cat verse") is a specification developped by the Global Alliance for Genomics and Health (GA4GH) to provide a standard for the representation of categorical variant concepts in genomics knowledgebases, and improve genomic knowledge search, curation, and harmonization. The specification consists of a JSON Schema for representing classes of categorical variation, conventions to maximize the utility of the schema, and a python implementation that promotes adoption of the standard. +The Categorical Variation Representation Specification (Cat-VRS, pronounced "cat verse") is a specification developed by the Global Alliance for Genomics and Health (GA4GH) to provide a standard for the representation of categorical variant concepts in genomics knowledgebases, and improve genomic knowledge search, curation, and harmonization. The specification consists of a JSON Schema for representing classes of categorical variation, conventions to maximize the utility of the schema, and a python implementation that promotes adoption of the standard. 
diff --git a/docs/source/introduction.rst b/docs/source/introduction.rst index 2b612c8..80e6533 100644 --- a/docs/source/introduction.rst +++ b/docs/source/introduction.rst @@ -48,16 +48,16 @@ Challenges to Unifying the Representation of Categorical Variants .. CatVars are hard to pin down .. Why they arise -Categorical variants arise organically and continuously in the course of genomics research. When clinical studies are run and journal papers published, the results are typically not charactorized in terms of an exhaustive list of assayed variants to which the conclusions apply. Rather, the domain of the conclusions are currently characterized in terms of a chategorical variant, all of the individual assayed variants that fall into the same biological bucket. Like all scientific abstractions, these models have several useful properties. They describe insightful conclusions related to the biological events that underly a function common to a class of variants. They also make useful predictions, namely that the same conclusions should apply to variants that weren't explicitly tested but ought to function in a similar way to those explicitly tested. They thus allow us to generalize genomic knowledge. +Categorical variants arise organically and continuously in the course of genomics research. When clinical studies are run and journal papers published, the results are typically not characterized in terms of an exhaustive list of assayed variants to which the conclusions apply. Rather, the domain of the conclusions is currently characterized in terms of a categorical variant: the set of all individual assayed variants that fall into the same biological bucket. Like all scientific abstractions, these models have several useful properties. They describe insightful conclusions related to the biological events that underlie a function common to a class of variants. 
They also make useful predictions, namely that the same conclusions should apply to variants that weren't explicitly tested but ought to function in a similar way to those explicitly tested. They thus allow us to generalize genomic knowledge. -To return to the running example, the BRAF V600E categorical variant inlcudes as its members any of 2 single-nucleotide substitutions and 6 double-nucleotide substitions that convert a Valine codon into one coding for Glutamic acid. The Valine to Glutamic Acid amino acid substitution variant is also a member of that set. Any other variant or series of variants that would have the net effect of substituting glutamic acid for valine in the same location of the resulting polypeptide chain is also a member of the same categorical variant. +To return to the running example, the BRAF V600E categorical variant includes as its members any of 2 single-nucleotide substitutions and 6 double-nucleotide substitutions that convert a Valine codon into one coding for Glutamic acid. The Valine to Glutamic acid amino acid substitution variant is also a member of that set. Any other variant or series of variants that would have the net effect of substituting Glutamic acid for Valine in the same location of the resulting polypeptide chain is also a member of the same categorical variant. .. CatVars have complicated relationships with each other -While a single categorical variant may have many assayed variant members, the same is true in the other direction. +While a single categorical variant may have many assayed variant members, the same is true in the other direction. 
A single assayed variant is a member of many possible categorical variants simultaneously. While NC_000007.13:g.140453136A>T is a member of the BRAF V600E categorical variant, it is also a Change-of-Function variant, a protein missense variant, and a chromosome 7 variant, among other categorical variants. .. image:: images/relations-between-assayed-and-CatVars-and-CatVars-to-other-CatVars.png @@ -78,7 +78,7 @@ Because a single categorical variant may have many assayed variants as members, .. CatVar labels do not always denote the same thing across different KBs, and may even be redundant-specified -To make categoricla variant matching even more complicated, it is often the case that identical labels across different resiuorces in fact describe different categroical variants, as seen in the figure below where an ACT sequence has been inserted directly 3' of a ACTG sequence. While this would not be considered a duplication variant in the HGVS nomenclature due to the intervening G base pair, it could appear in other resources as a duplication of the preceeding ACT sequence. This implies that the catgorical variant descriptor "duplication" has different meanings across different resources. +To make categorical variant matching even more complicated, it is often the case that identical labels across different resources in fact describe different categorical variants, as seen in the figure below where an ACT sequence has been inserted directly 3' of an ACTG sequence. While this would not be considered a duplication variant in the HGVS nomenclature due to the intervening G base pair, it could appear in other resources as a duplication of the preceding ACT sequence. This implies that the categorical variant descriptor "duplication" has different meanings across different resources. 
.. image:: images/CatVar-CatVar-matching.png @@ -87,13 +87,13 @@ To make categoricla variant matching even more complicated, it is often the case :alt: The figure depicts a hypothetical variant where an ACT sequence has been inserted directly 3' of a ACTG sequence. While this would not be considered a duplication variant in the HGVS nomenclature due to the intervening G base pair, it could appear in other resources as a duplication of the preceeding ACT sequence, or alternately simply as an insertion of ACT. This implies that the catgorical variant descriptor "duplication" has different meanings across different resources. -On the other hand, it is also often the case that spurious ambiguity exists within resources. The figure depicts a hypothetical case where compared to a reference sequence ACT, the variant sequence is ACCCCCT. In HVGS, this variant could either validly be described as an insertion of 4 C nucleotides, or else a five repetitions of the single nucleotide sequence C. This demonstrates spurious ambiguity of categorical variant descriptors, as both categorical variants desribe two sets with all and only the same member variants. +On the other hand, it is also often the case that spurious ambiguity exists within resources. The figure depicts a hypothetical case where, compared to a reference sequence ACT, the variant sequence is ACCCCCT. In HGVS, this variant could validly be described either as an insertion of 4 C nucleotides or as five repetitions of the single-nucleotide sequence C. This demonstrates spurious ambiguity of categorical variant descriptors, as both categorical variants describe sets with all and only the same member variants. .. image:: images/CatVar-CatVar-spurious-ambiguity.png :width: 40% :align: center - :alt: The figure depicts a hypothetical case where compared to a reference sequence ACT, the variant sequence is ACCCCCT. 
In HVGS, this variant could either be described as an insertion of 4 C nucleotides, or else a five repetitions of the single nucleotide sequence C. This demonstrates spurious ambiguity of categorical variant descriptors, as both categorical variants desribe two sets with all and only the same member variants. + :alt: The figure depicts a hypothetical case where, compared to a reference sequence ACT, the variant sequence is ACCCCCT. In HGVS, this variant could be described either as an insertion of 4 C nucleotides or as five repetitions of the single-nucleotide sequence C. This demonstrates spurious ambiguity of categorical variant descriptors, as both categorical variants describe sets with all and only the same member variants. @@ -101,7 +101,7 @@ Discussion @@@@@@@@@@ -In summary, a crucial step in the course of genomic variant interpretation is assayed-categorical variant matching, where one determines all and only those categorical variants to whoch the assayed variant in question is a member. Successful assayed-categorical variant matching makes it possible to connect evidence to support or refute determinations of pathogenicity and/or oncogenicity of the assayed variants. In a different but related use case, categorical-categorical variant matching is crucial to the process of data harmonization and knowledgebase curation. +In summary, a crucial step in the course of genomic variant interpretation is assayed-categorical variant matching, where one determines all and only those categorical variants of which the assayed variant in question is a member. Successful assayed-categorical variant matching makes it possible to connect evidence to support or refute determinations of pathogenicity and/or oncogenicity of the assayed variants. In a different but related use case, categorical-categorical variant matching is crucial to the process of data harmonization and knowledgebase curation. 
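The introduction.rst hunks above describe two label problems: an ACT insertion directly 3' of ACTG that HGVS would not call a duplication, and ACT to ACCCCCT, which admits both an insertion and a repeat description. The Python sketch below is a toy illustration of those two points only. The helper names `apply_insertion`, `looks_like_duplication`, and `apply_repeat` are hypothetical (not part of Cat-VRS, VRS, or any HGVS tool), coordinates are assumed to be 0-based interbase, and the duplication test is deliberately simplified: it compares the inserted bases only with the bases immediately 5' of the insertion and does not implement HGVS 3' shifting.

```python
"""Toy sequence edits illustrating categorical label ambiguity.

Hypothetical helpers for illustration; not part of Cat-VRS, VRS, or any HGVS library.
Coordinates are 0-based interbase: position i sits between ref[i-1] and ref[i].
"""


def apply_insertion(ref: str, pos: int, inserted: str) -> str:
    """Insert `inserted` at interbase position `pos` of `ref`."""
    return ref[:pos] + inserted + ref[pos:]


def looks_like_duplication(ref: str, pos: int, inserted: str) -> bool:
    """Simplified HGVS-style test: an insertion counts as a duplication only if
    the inserted bases equal the bases immediately 5' of the insertion point.
    (Real HGVS normalization also applies the 3' rule; this sketch does not.)"""
    if len(inserted) > pos:
        # Not enough upstream sequence to match; also avoids negative slicing.
        return False
    return ref[pos - len(inserted):pos] == inserted


def apply_repeat(ref: str, start: int, end: int, unit: str, copies: int) -> str:
    """Replace the repeat run ref[start:end] with `copies` copies of `unit`."""
    run = ref[start:end]
    if not unit or len(run) % len(unit) or run != unit * (len(run) // len(unit)):
        raise ValueError("ref[start:end] is not a whole number of repeat units")
    return ref[:start] + unit * copies + ref[end:]


if __name__ == "__main__":
    # First figure: ACT inserted directly 3' of ACTG is not a duplication here,
    # because the three bases preceding the insertion are CTG, not ACT.
    print(apply_insertion("ACTG", 4, "ACT"), looks_like_duplication("ACTG", 4, "ACT"))
    # prints: ACTGACT False

    # Second figure: two different descriptions of ACT -> ACCCCCT.
    as_insertion = apply_insertion("ACT", 2, "CCCC")  # insertion of 4 C
    as_repeat = apply_repeat("ACT", 1, 2, "C", 5)     # C repeated 5 times
    print(as_insertion, as_repeat, as_insertion == as_repeat)
    # prints: ACCCCCT ACCCCCT True
```

Both descriptions of the second case yield the same sequence, which is the sense in which the two categorical labels denote all and only the same member variants.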
diff --git a/docs/source/terms_and_model.rst b/docs/source/terms_and_model.rst index 7817e72..171c35b 100644 --- a/docs/source/terms_and_model.rst +++ b/docs/source/terms_and_model.rst @@ -11,7 +11,7 @@ correctly reflecting uncertainty of our understanding at the time. Unfortunately, such terms are not readily translatable into an unambiguous representation of knowledge. -As discussed in the :ref:'Introduction', categorical variation labels are homophonous, ambiguous, and vague, often all three simultanously. This poses a great difficulty to the precise repreentation of categorical variation. In contrast, **the computational representation of categorical variation concepts requires +As discussed in the :ref:`Introduction`, categorical variation labels are homophonous, ambiguous, and vague, often all three simultaneously. This poses great difficulty for the precise representation of categorical variation. In contrast, **the computational representation of categorical variation concepts requires translating precise categorical definitions into information models and data structures that may be used in software.** This translation should result in a representation of information that is consistent @@ -25,7 +25,7 @@ Accordingly, for each term we define below, we begin by describing the term as used by the genetics and/or bioinformatics communities as available. When a term has multiple such definitions, we explicitly choose one of them for the purposes of computational -modelling. We then define the **computational definition** that +modeling. We then define the **computational definition** that reformulates the community definition in terms of information content. Finally, we translate each of these computational definitions into precise specifications for the (**information model**).
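The BRAF V600E sentence in the introduction.rst hunk counts 2 single-nucleotide and 6 double-nucleotide substitutions that turn a Valine codon into a Glutamic acid codon. One way to reproduce those numbers, assuming the count is taken over all four Valine codons (GTT, GTC, GTA, GTG) and both Glutamic acid codons (GAA, GAG) of the standard genetic code rather than only the reference BRAF codon, is to enumerate every codon pair and group the pairs by Hamming distance. This reading of the count is an assumption, and the function names below are illustrative only.

```python
from itertools import product

# Standard genetic code codons (DNA coding strand) for the two amino acids
# in the BRAF V600E example.
VALINE_CODONS = ["GTT", "GTC", "GTA", "GTG"]
GLUTAMATE_CODONS = ["GAA", "GAG"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def val_to_glu_substitutions() -> dict[int, list[tuple[str, str]]]:
    """Group every Valine -> Glutamic acid codon change by how many
    nucleotides it alters (1 = single-, 2 = double-nucleotide substitution)."""
    by_distance: dict[int, list[tuple[str, str]]] = {}
    for val, glu in product(VALINE_CODONS, GLUTAMATE_CODONS):
        by_distance.setdefault(hamming(val, glu), []).append((val, glu))
    return by_distance


if __name__ == "__main__":
    groups = val_to_glu_substitutions()
    for distance in sorted(groups):
        print(distance, len(groups[distance]), groups[distance])
```

Under that reading, the single-nucleotide pairs are GTA to GAA and GTG to GAG, and the remaining six pairs differ at two positions. No pair needs three changes, because every codon involved shares G at the first position.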