Merge pull request #111 from vojtech-kovar/master
URI construction for DMLex fragments
michmech authored May 20, 2024
2 parents 127679e + 9adafad commit b310afc
Showing 4 changed files with 86 additions and 5 deletions.
65 changes: 61 additions & 4 deletions dmlex-v1.0/specification/core/specification.xml
@@ -41,16 +41,16 @@
<para><literal><olink targetptr="core_example">example</olink></literal></para>
</listitem>
</itemizedlist>
<simplesect id="optionalroots">
<section id="optionalroots">
<title>Optional roots</title>
<para>
When exchanging data encoded in a DMLex serialization
which has the concept of a "root" or top-level object, such as XML, JSON or NVH,
the object types <literal>lexicographicResource</literal> and <literal>entry</literal>
can serve as such roots.
</para>
</simplesect>
<simplesect id="fragid">
</section>
<section id="fragid">
<title>Fragment identification</title>
<para>
Incomplete parts of DMLex objects represent valid fragments as long as it is possible to identify their complete source DMLex object.
@@ -64,7 +64,64 @@
</listitem>
</itemizedlist>
</para>
</simplesect>
<section id="frag_iri">
<title>DMLex fragment identification strings</title>
<para>DMLex provides a recommended method for addressing DMLex objects present on-line, useful for linking (cf. <xref linkend="linking"/>) and general interoperability. Implementing this method is not <glossterm>required</glossterm> for conformance.</para>

<para>Every fragment <glossterm>should</glossterm> be assigned a unique fragment identification string. It is composed of <literal>lexicographicResource.uri</literal>, with the protocol identification prefix (such as <literal>http://</literal> or <literal>https://</literal>) removed, followed by a sequence of identifiers that uniquely determines the path in the DMLex tree structure. The DMLex fragment identification string of the root object <literal>lexicographicResource</literal> is therefore the value of its attribute <literal>lexicographicResource.uri</literal> with the protocol identification prefix removed. The fragment identification strings of its direct children are constructed as follows:</para>

<para><literal>lexicographicResource.uri/objectTypeName/objectID</literal></para>

<para>(We define below how object IDs are created.)</para>

<para>The DMLex fragment identification strings of descendant objects are constructed by appending the children's type names and IDs to the fragment identification strings of their direct parents, using “/” as the delimiter. In other words, the full template for a fragment identification string looks as follows:</para>

<para><literal>lexicographicResource.uri/objectTypeName/objectID/child1TypeName/child1ID/child2TypeName/child2ID/…</literal></para>

<para>For example, a particular <literal><olink targetptr="core_sense">sense</olink></literal> (which is a property of <literal><olink targetptr="core_entry">entry</olink></literal>) is assigned the following fragment identification string:</para>

<para><literal>lexicographicResource.uri/entry/entryID/sense/senseID</literal></para>

<para>A fragment identification string of an <literal><olink targetptr="core_example">example</olink></literal> (which is a property of <literal><olink targetptr="core_sense">sense</olink></literal>, which is a property of <literal><olink targetptr="core_entry">entry</olink></literal>) has the following structure:</para>

<para><literal>lexicographicResource.uri/entry/entryID/sense/senseID/example/exampleID</literal></para>

<section id="objectids">
<title>Object IDs</title>

<para>For the purpose of creating DMLex fragment identification strings, each object is assigned a unique ID relative to its parent, based on values of its properties declared as <glossterm>unique</glossterm>. Multiple situations can occur:</para>

<orderedlist>
<listitem>The object type has a single <glossterm>unique</glossterm> property with an arity of “exactly one”, and the value of the property is a string or a number. In this case, the object ID is the string or the number, with the following modifications applied in the order listed:
<itemizedlist>
<listitem>every “\” (ASCII character 5C) is replaced by “\\”</listitem>
<listitem>every “~” (ASCII character 7E) is replaced by “\~”</listitem>
<listitem>every “_” (ASCII character 5F) is replaced by “\_”</listitem>
<listitem>every “0” (zero, ASCII character 30) is replaced by “\0”</listitem>
<listitem>all IRI-unsafe characters (outside the <literal>iunreserved</literal> class according to [<link linkend="bib_rfc3987">RFC 3987</link>]) are percent-encoded according to [<link linkend="bib_rfc3986">RFC 3986</link>]</listitem>
</itemizedlist>
</listitem>
<listitem>The object type has a single <glossterm>unique</glossterm> property with an arity of “exactly one”, and the value of the property is a child DMLex object. In this case, the object ID is the same as the object ID of the child object. (Note: this case actually does not occur in the specification as such; we list it here to streamline the description of the following cases.)</listitem>
<listitem>The object type has a single <glossterm>unique</glossterm> property with an arbitrary arity. In this case, all the partial single values or child object IDs are constructed according to steps 1 and 2, and the resulting object ID is their concatenation using “_” (ASCII character 5F) as a separator. The order of the partial values is driven by the <literal>listingOrder</literal> of the respective objects. If this procedure returns an empty string (which can happen with <glossterm>unique</glossterm> attributes that allow an arity of zero), the string “0” (zero, ASCII character 30) is used instead of the empty string.</listitem>
<listitem>The object type has multiple <glossterm>unique</glossterm> properties. In this case, all the partial values or child object IDs are constructed according to steps 1, 2 and 3, and the resulting object ID is their concatenation using “~” (ASCII character 7E) as a separator. The order of the partial values is driven by the order of the properties as given in this specification. (Note: all attributes marked as <glossterm>unique</glossterm> need to be represented in the ID, as empty values are replaced by “0” according to step 3. No empty IDs are allowed.)</listitem>
<listitem>In specific situations, several different objects may have all their <glossterm>unique</glossterm> properties empty, so that step 4 produces duplicate IDs (the same sequence of zeros) for them. One example of such a situation is multiple senses without <literal>indicator</literal>s or <literal>definition</literal>s, but with different translations. In that case, and only in that case, the value of <literal>listingOrder</literal> is appended to the sequence of zeros to distinguish between the duplicate IDs. If there is only one such object, <literal>listingOrder</literal> is not appended to the sequence of zeros.</listitem>
</orderedlist>

<para>DMLex does not define the structure of DMLex fragment identification strings for object types without <glossterm>unique</glossterm> properties.</para>
</section>
<section id="iri_examples">
<title>DMLex fragment identification string examples</title>
<para>Particular examples of DMLex fragment identification strings can then look as follows:</para>
<itemizedlist>
<listitem><literal>www.example.com/lexicon/entry/cat~1~noun</literal></listitem>
<listitem><literal>www.example.com/lexicon/entry/cat~1~noun/sense/0~small%20furry%20animal</literal> (Here we assume that the sense's <literal>indicator</literal> is empty and it has one <literal>definition</literal> which says “small furry animal”).</listitem>
<listitem><literal>www.example.com/lexicon/entry/cat~1~noun/sense/0~small%20furry%20animal/example/I%20have%20two%20dogs%20and%20a%20cat.</literal></listitem>
<listitem><literal>www.example.com/lexicon/entry/cat~1~noun/sense/0~0</literal> (Here we assume that both the sense's <literal>definition</literal> and its <literal>indicator</literal> are empty, and there is only one such sense.)</listitem>
<listitem><literal>www.example.com/lexicon/entry/cat~1~noun/sense/0~02</literal> (Here we assume that both the sense's <literal>definition</literal> and its <literal>indicator</literal> are empty, there are multiple such senses, and this is sense number 2 among all of this entry's senses.)</listitem>
</itemizedlist>
</section>
</section>
</section>
<xi:include href="objectTypes/lexicographicResource.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
<xi:include href="objectTypes/entry.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
<xi:include href="objectTypes/partOfSpeech.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
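The object ID rules and the path template added in this hunk can be sketched in Python as follows. This is only an illustrative reading of the text, not part of the specification: the function names (`escape_value`, `property_id`, `object_id`, `fragment_string`) and the `duplicate_listing_order` parameter are hypothetical, the caller is assumed to pass property values already sorted by `listingOrder` and in the property order of the specification, and every non-ASCII character from U+00A0 upward is treated as `iunreserved`, which only approximates the `ucschar` ranges of RFC 3987. Read literally, the backslash introduced by the escapes in step 1 is itself outside `iunreserved`, so the sketch percent-encodes it as `%5C`; the source does not say whether that is intended. Step 2 (a child object as the unique value) is omitted.

```python
import re
from urllib.parse import quote


def escape_value(value):
    """Step 1: escape a single string or number, then percent-encode it."""
    text = str(value)
    # Order matters: the backslash is escaped first so that the
    # backslashes added by the later replacements are not doubled.
    for ch in ("\\", "~", "_", "0"):
        text = text.replace(ch, "\\" + ch)
    # Assumption: non-ASCII characters from U+00A0 upward are kept as-is
    # (an approximation of ucschar); everything else outside
    # ALPHA / DIGIT / "-" / "." / "_" / "~" is percent-encoded.
    return "".join(
        ch if ord(ch) >= 0xA0 else quote(ch, safe="") for ch in text
    )


def property_id(values):
    """Step 3: join the values of one unique property with "_".

    An empty result is replaced by the placeholder "0".
    """
    return "_".join(escape_value(v) for v in values) or "0"


def object_id(properties, duplicate_listing_order=None):
    """Steps 4 and 5: join the per-property IDs with "~".

    `properties` is a list of value lists, one per unique property.
    `duplicate_listing_order` is passed only when several sibling objects
    have all unique properties empty (step 5).
    """
    parts = [property_id(values) for values in properties]
    oid = "~".join(parts)
    if duplicate_listing_order is not None and all(p == "0" for p in parts):
        oid += str(duplicate_listing_order)
    return oid


def fragment_string(resource_uri, path):
    """Strip the protocol prefix and append typeName/objectID pairs."""
    base = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", resource_uri)
    return "/".join([base] + [f"{name}/{oid}" for name, oid in path])


# Reproduces the third example string from the "iri_examples" section.
entry = object_id([["cat"], [1], ["noun"]])
sense = object_id([[], ["small furry animal"]])
example = object_id([["I have two dogs and a cat."]])
print(fragment_string(
    "https://www.example.com/lexicon",
    [("entry", entry), ("sense", sense), ("example", example)],
))
```

Keeping the escaping, the per-property join and the per-object join in separate functions mirrors the numbered cases in the text, so each rule can be checked against the examples on its own.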
18 changes: 18 additions & 0 deletions dmlex-v1.0/specification/dmlex.xml
@@ -61,6 +61,15 @@
</affiliation>
<email>[email protected]</email>
</editor>
<editor>
<firstname>Vojtěch</firstname>
<surname>Kovář</surname>
<affiliation>
<orgname><ulink url="https://www.muni.cz/">Masaryk University</ulink></orgname>
<address format="linespecific"><email>[email protected]</email></address>
</affiliation>
<email>[email protected]</email>
</editor>
<editor>
<firstname>Simon</firstname>
<surname>Krek</surname>
@@ -645,6 +654,15 @@
<title/>
<bibliomixed id="bcp14">
<abbrev>BCP 14</abbrev> is a concatenation of [RFC 2119] and [RFC 8174] </bibliomixed>
<bibliomixed id="bib_rfc3986">
<abbrev>RFC 3986</abbrev>
Tim Berners-Lee, Roy T. Fielding, Larry M. Masinter,
<title>Uniform Resource Identifier (URI): Generic Syntax</title>,
<citetitle>
<ulink url="https://datatracker.ietf.org/doc/rfc3986/">https://datatracker.ietf.org/doc/rfc3986/</ulink>
</citetitle>
IETF (Internet Engineering Task Force) RFC 3986, January 2005.
</bibliomixed>
<bibliomixed id="bib_rfc3987">
<abbrev>RFC 3987</abbrev>
Martin J. Dürst, Michel Suignard,
@@ -24,7 +24,8 @@
<para><literal>ref</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm> (in
combination with other unique properties if present). Reference to an object, such as an
entry or a sense.</para>
entry or a sense. The IRI addressing mechanism described in <xref linkend="frag_iri"/>
can be used (but is not <glossterm>required</glossterm>).</para>
</listitem>
<listitem>
<para><literal>role</literal>
5 changes: 5 additions & 0 deletions dmlex-v1.0/specification/modules/linking/specification.xml
@@ -21,6 +21,11 @@
of the <code>member</code> datatype.</para>
<para>The Linking Module can be used to set up relations between objects inside the same
lexicographic resource, or between objects residing in different lexicographic resources.</para>
<para>Linking requires reference IDs for the linked objects (cf. the
<literal>ref</literal> property in <xref linkend="linking_member"/>). DMLex does not prescribe
the exact form of these IDs; however, a recommended method for creating unique IRIs for
DMLex objects is available in <xref linkend="frag_iri"/>, which may be especially useful
when linking objects from different lexicographic resources on the Web.</para>
<para>Examples: <xref linkend="ex12"/>, <xref linkend="ex13"/>, <xref linkend="ex14"/>, <xref
linkend="ex15"/>, <xref linkend="ex16"/>, <xref linkend="ex17"/>, <xref linkend="ex18"/>. </para>
<xi:include href="extensions/lexicographicResource.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>