diff --git a/dmlex-v1.0/specification/core/specification.xml b/dmlex-v1.0/specification/core/specification.xml index a496c67..27ee257 100644 --- a/dmlex-v1.0/specification/core/specification.xml +++ b/dmlex-v1.0/specification/core/specification.xml @@ -41,7 +41,7 @@ example - +
Optional roots When exchanging data encoded in a DMLex serialization @@ -49,8 +49,8 @@ the object types lexicographicResource and entry can serve as such roots. - - +
+
Fragment identification Incomplete parts of DMLex objects represent valid fragments as long as it is possible to identify their complete source DMLex object. @@ -64,7 +64,64 @@ - +
+ DMLex fragment identification strings + DMLex provides a recommended method for addressing DMLex objects available online, useful for linking (cf. ) and general interoperability. Implementing this method is not required for conformance. + + Every fragment should be assigned a unique fragment identification string, composed of lexicographicResource.uri, with the protocol identification prefix (such as http:// or https://) removed, and a sequence of identifiers that uniquely determines the path in the DMLex tree structure. The DMLex fragment identification string of the root object lexicographicResource is the value of its attribute lexicographicResource.uri, with the protocol identification prefix (such as http:// or https://) removed. The fragment identification strings of its direct children are constructed as follows: + + lexicographicResource.uri/objectTypeName/objectID + + (We define below how object IDs are created.) + + The DMLex fragment identification strings of descendant objects are constructed by appending the children's type names and IDs to the fragment identification strings of their direct parents, using “/” as the delimiter. In other words, the full template for a fragment identification string looks as follows: + + lexicographicResource.uri/objectTypeName/objectID/child1TypeName/child1ID/child2TypeName/child2ID/… + + For example, a particular sense (which is a property of entry) is assigned the following fragment identification string: + + lexicographicResource.uri/entry/entryID/sense/senseID + + A fragment identification string of an example (which is a property of sense, which is a property of entry) has the following structure: + + lexicographicResource.uri/entry/entryID/sense/senseID/example/exampleID + +
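As an illustration only (not part of the specification), the path construction described above could be sketched in Python as follows. The function names `strip_protocol_prefix` and `fragment_identification_string` are hypothetical, the scheme-matching regular expression is an assumption about what counts as a "protocol identification prefix", and the object IDs passed in are assumed to be already constructed and escaped.

```python
import re


def strip_protocol_prefix(uri: str) -> str:
    """Remove a leading protocol prefix such as http:// or https://.

    Assumption: any 'scheme://' prefix is treated as a protocol prefix.
    """
    return re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", uri)


def fragment_identification_string(resource_uri: str, path) -> str:
    """Build a fragment identification string.

    `path` is a list of (object_type_name, object_id) pairs ordered from
    the direct child of lexicographicResource down to the target object.
    Object IDs are assumed to be already escaped.
    """
    parts = [strip_protocol_prefix(resource_uri)]
    for type_name, object_id in path:
        parts.extend([type_name, object_id])
    # "/" is the delimiter between all components.
    return "/".join(parts)
```

For instance, calling `fragment_identification_string("https://www.example.com/lexicon", [("entry", "cat~1~noun")])` yields `www.example.com/lexicon/entry/cat~1~noun`; an empty `path` yields the string for the root object itself.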
+ Object IDs + + For the purpose of creating DMLex fragment identification strings, each object is assigned a unique ID relative to its parent, based on values of its properties declared as unique. Multiple situations can occur: + + + The object type has a single unique property with an arity of “exactly one”, and the value of the property is a string or a number. In this case, the object ID is the string or the number, with the following modifications performed in that particular order: + + every “\” (ASCII character 5C) is replaced by “\\” + every “~” (ASCII character 7E) is replaced by “\~” + every “_” (ASCII character 5F) is replaced by “\_” + every “0” (zero, ASCII character 30) is replaced by “\0” + all IRI-unsafe characters (outside the iunreserved class according to [RFC 3987]) are percent-encoded according to [RFC 3986] + + + The object type has a single unique property with an arity of “exactly one”, and the value of the property is a child DMLex object. In this case, the object ID is the same as the object ID of the child object. (Note: this case actually does not occur in the specification as such; we list it here to streamline the description of the following cases.) + The object type has a single unique property with an arbitrary arity. In this case, all the partial single values or child object IDs are constructed according to the steps 1. and 2., and the resulting object ID is their concatenation using “_” (ASCII character 5F) as a separator. The order of the partial values is driven by the listingOrder of the respective objects. If this procedure returns an empty string (which can happen in case of unique attributes that allow the arity of zero), the string “0” (zero, ASCII character 30) is used instead of the empty string. + The object type has multiple unique properties. In this case, all the partial values or child object IDs are constructed according to the steps 1., 2. 
and 3., and the resulting object ID is their concatenation using “~” (ASCII character 7E) as a separator. The order of the partial values is driven by the order of the properties as given in this specification. (Note: all attributes marked as unique need to be represented in the ID, as empty values are replaced by “0” according to step 3. No empty IDs are allowed.) + In specific situations it may happen that there are multiple different objects with all the unique properties empty, i.e. multiple objects with duplicate IDs (the same sequence of zeros) emerge as the result of step 4. One example of such a situation is multiple senses without indicators or definitions, but with different translations. In that case, and only in that case, the value of listingOrder is concatenated to the sequence of zeros, to distinguish between the duplicate IDs. If there is only one such object, listingOrder is not concatenated to the sequence of zeros. + + + DMLex does not define the structure of DMLex fragment identification strings for object types without unique properties. +
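The object ID rules above can be sketched in Python roughly as follows. This is a non-normative illustration under stated assumptions: all function names are hypothetical, values are assumed to be plain strings or numbers already sorted by listingOrder (child objects are assumed to be resolved to their own IDs beforehand), and the percent-encoding follows a literal reading of the steps, under which the escaping backslash itself is percent-encoded as `%5C` because it lies outside iunreserved.

```python
# iunreserved per RFC 3987: ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
_UCSCHAR = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
_UCSCHAR += [(p * 0x10000, p * 0x10000 + 0xFFFD) for p in range(1, 14)]
_UCSCHAR.append((0xE1000, 0xEFFFD))


def _is_iunreserved(ch: str) -> bool:
    if ch.isascii():
        return ch.isalnum() or ch in "-._~"
    return any(lo <= ord(ch) <= hi for lo, hi in _UCSCHAR)


def escape_value(value) -> str:
    """Step 1: escape a single string or number, in the prescribed order."""
    text = str(value)
    text = text.replace("\\", "\\\\")
    text = text.replace("~", "\\~")
    text = text.replace("_", "\\_")
    text = text.replace("0", "\\0")
    # Percent-encode everything outside iunreserved (UTF-8 bytes, uppercase hex).
    return "".join(
        ch if _is_iunreserved(ch)
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in text
    )


def property_id(values) -> str:
    """Step 3: join the values of one unique property with "_"; empty gives "0"."""
    return "_".join(escape_value(v) for v in values) or "0"


def object_id(unique_properties, listing_order=None) -> str:
    """Steps 4 and 5: join the per-property IDs with "~".

    `unique_properties` lists the values of each unique property, in the
    order given by the specification. Pass `listing_order` only when several
    sibling objects would otherwise share the same all-zero ID.
    """
    parts = [property_id(values) for values in unique_properties]
    result = "~".join(parts)
    if listing_order is not None and all(p == "0" for p in parts):
        result += str(listing_order)
    return result
```

Under these assumptions, `object_id([["cat"], [1], ["noun"]])` gives `cat~1~noun` and `object_id([[], []], listing_order=2)` gives `0~02`. The caller is responsible for detecting duplicates, since that requires looking at all siblings of the object.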
+
+ DMLex fragment identification string examples + Concrete examples of DMLex fragment identification strings can then look as follows: + + www.example.com/lexicon/entry/cat~1~noun + www.example.com/lexicon/entry/cat~1~noun/sense/0~small%20furry%20animal (Here we assume that the sense's indicator is empty and that it has one definition, which says “small furry animal”.) + www.example.com/lexicon/entry/cat~1~noun/sense/0~small%20furry%20animal/example/I%20have%20two%20dogs%20and%20a%20cat. + www.example.com/lexicon/entry/cat~1~noun/sense/0~0 (Here we assume that both the sense's definition and its indicator are empty, and there is only one such sense.) + www.example.com/lexicon/entry/cat~1~noun/sense/0~02 (Here we assume that both the sense's definition and its indicator are empty, there are multiple such senses, and this is sense number 2 among all of this entry's senses.) + +
+
+
diff --git a/dmlex-v1.0/specification/dmlex.xml b/dmlex-v1.0/specification/dmlex.xml index 2d4c457..54a0db1 100644 --- a/dmlex-v1.0/specification/dmlex.xml +++ b/dmlex-v1.0/specification/dmlex.xml @@ -61,6 +61,15 @@ milos.jakubicek@sketchengine.eu + + Vojtěch + Kovář + + Masaryk University +
vojcek@mail.muni.cz
+
+ vojcek@mail.muni.cz +
Simon Krek @@ -645,6 +654,15 @@ <bibliomixed id="bcp14"> <abbrev>BCP 14</abbrev> is a concatenation of [RFC 2119] and [RFC 8174] </bibliomixed> + <bibliomixed id="bib_rfc3986"> + <abbrev>RFC 3986</abbrev> + Tim Berners-Lee, Roy T. Fielding, Larry M. Masinter + <title>Uniform Resource Identifier (URI): Generic Syntax, + + https://datatracker.ietf.org/doc/rfc3986/ + + IETF (Internet Engineering Task Force) RFC 3986, January 2005. + RFC 3987 Martin J. Dürst, Michel Suignard, diff --git a/dmlex-v1.0/specification/modules/linking/objectTypes/member.xml b/dmlex-v1.0/specification/modules/linking/objectTypes/member.xml index 6bfdcca..03330d3 100644 --- a/dmlex-v1.0/specification/modules/linking/objectTypes/member.xml +++ b/dmlex-v1.0/specification/modules/linking/objectTypes/member.xml @@ -24,7 +24,8 @@ ref required (exactly one) and unique (in combination with other unique properties if present). Reference to an object, such as an - entry or a sense. + entry or a sense. The IRI addressing mechanism described in + can be used (but is not required). role diff --git a/dmlex-v1.0/specification/modules/linking/specification.xml b/dmlex-v1.0/specification/modules/linking/specification.xml index e6bce3d..1ccabb2 100644 --- a/dmlex-v1.0/specification/modules/linking/specification.xml +++ b/dmlex-v1.0/specification/modules/linking/specification.xml @@ -21,6 +21,11 @@ of the member datatype. The Linking Module can be used to set up relations between objects inside the same lexicographic resource, or between objects residing in different lexicographic resources. + For linking, some kind of reference ID is needed for each linked object (cf. the + ref property in ). DMLex does not prescribe + the exact form of these IDs; however, a recommended method for creating unique IRIs for + DMLex objects is available in , which may be especially useful + when linking objects from different lexicographic resources on the Web. Examples: , , , , , , .