% -*- TeX-master: "main"; fill-column: 72 -*-
\section{Proposed syntax and semantics}
\label{syntax}
In this section, we define the syntax and semantics of the COMBINE
archive. We expound on the various data types and constructs defined,
then in \sect{examples}, we provide complete examples of archives.
\subsection{The archive format}
The COMBINE archive is encoded using the ``{O}pen
{M}odeling {EX}change format'' (OMEX). The archive itself is a ``zip'' file \citep{zipFile}. Zip is a file format used for data compression and archiving. A zip file contains one or more files that have either been compressed to reduce their size or stored as is. The technical specification of the ZIP format is available from the PKWARE website \citep{zipSpec}.
\subsection{COMBINE archive extensions}
\label{combine-archive-extensions}
The extension for the COMBINE archive is \token{.omex}.
Additional extensions are available to indicate the main standard format used within the archive. This helps users to choose between different archives and to select appropriate software tools, in particular parsers. As such, the purpose of the extensions is different from that of the \token{master} attribute of the manifest file, described in section \ref{content-class}, which is read by the processing software once the archive is loaded and parsed. Currently, the following extensions are specified:
\begin{itemize}
\item {\token{.sedx} - SED-ML archive} \citep{Waltemath:2011}
\item {\token{.sbex} - SBML archive} \citep{hucka:2003}
\item {\token{.cmex} - CellML archive} \citep{Cuellar:2003}
\item {\token{.neux} - NeuroML archive} \citep{Gleeson:2010}
\item {\token{.phex} - PharmML archive} \citep{Moodie:2013}
\item {\token{.sbox} - SBOL archive} \citep{Galdzicki:2014}
\end{itemize}
Note that a COMBINE archive may contain files in several different standard formats. Therefore the specific extension is only an indication. For instance, a database of models could distribute archives of the same models, encoded in SBML with the extension \token{.sbex} and encoded in CellML with the extension \token{.cmex}. However, archives containing models in both formats could also be distributed and the extension \token{.omex} used. If the archives contain SED-ML files, the extension \token{.sedx} could be used and the selection between the model formats be devolved to the SED-ML file.
\subsection{Content of the archive}
The archive contains:
\begin{enumerate}
\item {
a mandatory manifest file, called \token{manifest.xml}, always located at the
root of the archive, that describes the location and the type of each
data file contained in the archive plus an entry describing
the archive itself.
The location of those files is defined by a relative path. In the current
version of the COMBINE archive, all the files described must be included
in the archive itself. It is envisioned that in the future the manifest
might list files located elsewhere, using valid and resolvable HTTP
URIs.
}
\item {
metadata files containing clerical information about the
various files contained in the archive, and the archive itself. A best practice is to include only one metadata file per metadata format, named \token{metadata.*} (where \token{*} stands for the
suitable file extension).
}
\item {all remaining files necessary to the model and simulation project. }
\end{enumerate}
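As an illustration only, the listing below sketches how the three kinds of content might be laid out inside an archive. All file names other than \token{manifest.xml} are hypothetical, and the \token{.rdf} extension of the metadata file is an assumption made for this sketch.
\begin{example}
manifest.xml          (mandatory, at the root of the archive)
metadata.rdf          (metadata about the archive and its files)
model/model.xml       (a model file)
simulation.xml        (a simulation description)
\end{example}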
\subsection{Namespace URI and other declarations necessary}
\label{xml-namespace}
In order to uniquely identify the manifest file and the whole archive, OMEX defines two namespaces.
The namespace URI for OMEX:
\begin{quote}
\uri{http://identifiers.org/combine.specifications/omex}
\end{quote}
The namespace URI for the COMBINE archive manifest:
\begin{quote}
\uri{http://identifiers.org/combine.specifications/omex-manifest/version-1.1}
\end{quote}
Versioned URIs will be generated as needed, following the practices of COMBINE standards.
\subsection{Primitive data types}
\label{primtypes}
The COMBINE archive uses the XML Schema 1.0 data types~\citep{biron:2000}.
More specifically we make use of \primtype{string} and \primtype{boolean}.
\subsection{The \token{manifest.xml} file and the \class{OmexManifest} class}
\label{manifest-class}
One file must be present at the root of any COMBINE archive, named \token{manifest.xml}. This file contains an instantiation of the \OmexManifest class.
It contains a number of \Content children, one of which represents the COMBINE archive itself.
A valid manifest needs to have at least one entry, declaring the archive itself, but may
contain as many entries as needed. All the files present in the archive, with one exception, must be listed in the manifest. The only optional entry is the one describing the \token{manifest.xml} file itself: the presence of \token{manifest.xml} is mandatory, and its declaration is not necessary for parsing (since the software would have already parsed it to access the declaration!).
\begin{figure}[h!]
\centering
% Requires \usepackage{graphicx}
\includegraphics[width=12cm]{images/OmexManifest.pdf}\\
\caption{A UML representation of the Manifest. Each manifest contains a number of Content elements.}
\label{fig:combine_uml}
\end{figure}
\subsection{The \class{Content} class}
\label{content-class}
An entry in the \OmexManifest is represented by the \Content class. It declares a file in the \emph{COMBINE archive}. A content element possesses two required attributes, \token{location} and \token{format}, as well as an optional one, \token{master}.
\paragraph{The \token{location} attribute}
The \token{location} attribute of type
\token{string} is required. It represents the relative location of an entry within the
archive. The archive is represented by a dot \token{'.'}.
\paragraph{The \token{format} attribute}
The \token{format} attribute of type \token{string} is required. It
indicates the file type of the \Content element. The allowed values fall into two categories. Either the format
denotes one of the COMBINE standards, in which case the \token{format}
is the corresponding \token{Identifiers.org} URI, or the
\token{format} represents an Internet Media type \citep{rfc2046} (previously known as ``MIME type''), in which case the \token{format} indicates this Media type.
Using an \token{Identifiers.org} URI makes it possible to define a COMBINE format unambiguously, including the precise version used. For example, the identifier \token{http://identifiers.org/combine.specifications/sbml} denotes that the file declared by the \Content
element is encoded in the SBML format. That might be sufficient, as tools supporting one version of SBML often
support others as well. However, if the software creating the COMBINE archive wants to be more precise, it can specify that the document is encoded in SBML Level 2:
\begin{quote}
\token{http://identifiers.org/combine.specifications/sbml.level-2}
\end{quote}
or even:
\begin{quote}
\token{http://identifiers.org/combine.specifications/sbml.level-2.version-4}.
\end{quote}
The Internet Media type \token{purl.org} URI should be of the form \token{http://purl.org/NET/mediatypes/}
followed by the Media type name. Here are a few examples:
\begin{example}
for png (Portable Network Graphics) http://purl.org/NET/mediatypes/image/png
for pdf (Portable Document Format) http://purl.org/NET/mediatypes/application/pdf
for sbml http://purl.org/NET/mediatypes/application/sbml+xml
\end{example}
If both an \token{Identifiers.org} URI and a Media type are available for a given format, the
\token{Identifiers.org} URI must be used. Therefore the last example above should not be used. However, a new \token{Identifiers.org} URI could be created for a format for which a Media type is already in use.
When creating a new COMBINE archive, the URI form should always be used for Media types. However, you may encounter some
old COMBINE archives that use the Media type name directly in the \token{format} attribute, rather than the URI form.
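As a minimal sketch of how the two categories of \token{format} values can coexist in one manifest, the fragment below declares an SBML file with an \token{Identifiers.org} URI and two supporting files with Media type URIs. The file locations are hypothetical, and the manifest namespace is simply copied from the other examples in this section.
\begin{example}
<?xml version="1.0" encoding="utf-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
<content location="." format="http://identifiers.org/combine.specifications/omex"/>
<content location="model.xml"
format="http://identifiers.org/combine.specifications/sbml"/>
<content location="figures/diagram.png"
format="http://purl.org/NET/mediatypes/image/png"/>
<content location="docs/report.pdf"
format="http://purl.org/NET/mediatypes/application/pdf"/>
</omexManifest>
\end{example}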
\paragraph{The \token{master} attribute}
\label{active_document}
The \token{master} attribute of type \token{boolean} is optional. When set to \token{true}, it
indicates that the file declared by the content element should be used first when
processing the content of the archive. The file can be, for instance, the description of a top-level model in a
composed model, itself declaring the various submodels, or a simulation description
declaring the different model descriptions and data sources used in the
experiment. In most cases, one content element per archive will have its \token{master}
attribute set to \token{true}.
For example, when reading the snippet below, a software program should first present to the user the file declared with \token{location="simulation.xml"}.
\begin{example}
<?xml version="1.0" encoding="utf-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
<content location="." format="http://identifiers.org/combine.specifications/omex"/>
<content location="model/model.xml"
format="http://identifiers.org/combine.specifications/sbml"/>
<content location="simulation.xml" master="true"
format="http://identifiers.org/combine.specifications/sed-ml"/>
</omexManifest>
\end{example}
To simplify the identification, by a user or software, of the formats used by the documents inside an archive, file extensions
(see also \ref{combine-archive-extensions}) can also be used. For instance, the archive containing the manifest file shown in the example above could have the extension \token{.sbex}, to indicate that the software reading the archive needs
to support the \token{SBML} format to be able to interpret the content of the archive. If the software
supports SED-ML as well as SBML, it should load the SED-ML file, as the creator of the archive indicated
it to be the \token{master} file. If the software does not support SED-ML, it can still open the SBML document, ignoring
the \token{master} attribute.
Alternatively, if the software supports neither SBML nor SED-ML, it can choose to disregard the \token{master} attribute and the archive extension, and present to the user the files in the archive that it does support, such as a PDF document or a diagram describing the model.
In some cases, the \token{master} attribute is more precise than the information given by the archive file
extension.
\begin{example}
<?xml version="1.0" encoding="utf-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
<content location="."
format="http://identifiers.org/combine.specifications/omex"/>
<content location="main-model.xml" master="true"
format="http://identifiers.org/combine.specifications/cellml.1.1"/>
<content location="submodel1.xml"
format="http://identifiers.org/combine.specifications/cellml.1.1"/>
<content location="submodel2.xml"
format="http://identifiers.org/combine.specifications/cellml.1.1"/>
</omexManifest>
\end{example}
In this case, the archive contains a modular CellML model. The file extension \token{.cmex} is not sufficient to know which file should be processed first, since there are several CellML files. The \token{master} attribute allows the software
to know it should process \token{main-model.xml} first.
Ideally, COMBINE archives should contain only one model and should have only one \token{master} attribute set to \token{true}. However, in some cases, the creator of the archive might want to set several \token{master} attributes to \token{true}, for instance to indicate alternative renditions of a model in different formats, or to bundle together several models described in the same publication. In that case the reading software is free to open a specific file, to open all the files tagged as master, or to display a dialog box offering the user a choice. This might create some confusion, as opening the same archive several times in the same software might result in a different model being loaded. One option would be to not set any \token{master} attribute to \token{true}, being aware that proper interpretation of the archive might then be difficult, in particular for modular models.
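The case of several \token{master} attributes could look like the following sketch, in which the same model is assumed to be provided in both SBML and CellML. The file names are hypothetical, and how a given tool chooses between the two master files is left open, as discussed above.
\begin{example}
<?xml version="1.0" encoding="utf-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
<content location="."
format="http://identifiers.org/combine.specifications/omex"/>
<content location="model-sbml.xml" master="true"
format="http://identifiers.org/combine.specifications/sbml"/>
<content location="model-cellml.xml" master="true"
format="http://identifiers.org/combine.specifications/cellml.1.1"/>
</omexManifest>
\end{example}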
\subsection{Advised format for the archive metadata}
Any type of file can be included in a COMBINE archive, and therefore any
type of metadata format. However, in the interest of interoperability,
and to ease the development of software support for metadata, a
recommended format is provided as part of the specification of the
archive.
The recommended format is based on several standards developed by other
organizations:
\begin{itemize}
\item {
The Resource Description Framework \citep{RDF} of the W3C, in particular its terms:
\begin{itemize}
\item \token{RDF} (\url{http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#coreSyntaxTerms})
\item \token{Description} (\url{http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#syntaxTerms})
\end{itemize}
}
\item {
vCard Ontology \citep{vCardOnto}, the RDF form of a file format standard for electronic business
cards, in particular its terms:
\begin{itemize}
\item \token{hasName} (\url{http://www.w3.org/2006/vcard/ns#hasName}),
\item \token{family-name} (\url{http://www.w3.org/2006/vcard/ns#family-name}),
\item \token{given-name} (\url{http://www.w3.org/2006/vcard/ns#given-name}),
\item \token{hasEmail} (\url{http://www.w3.org/2006/vcard/ns#hasEmail}),
\item \token{organization-name} (\url{http://www.w3.org/2006/vcard/ns#organization-name}).
\end{itemize}
More information on how to use vCard in RDF can be found
on the W3C website\footnote{\url{http://www.w3.org/TR/vcard-rdf/}}.
}
\item {
Metadata terms\footnote{\url{http://dublincore.org/documents/dcmi-terms/}} of
the Dublin Core Metadata Initiative, in particular the terms:
\begin{itemize}
\item \token{description} (\url{http://purl.org/dc/terms/description}),
\item \token{creator} (\url{http://purl.org/dc/terms/creator}),
\item \token{created} (\url{http://purl.org/dc/terms/created}),
\item \token{modified} (\url{http://purl.org/dc/terms/modified}),
\item \token{W3CDTF} (\url{http://purl.org/dc/terms/W3CDTF}).
\end{itemize}
More information on the use of Dublin Core in RDF can be found
on the Dublin Core website{\footnote{\url{http://dublincore.org/documents/dc-rdf/}}}.
More information about the definition of the date format used within \token{dcterms:created} and
\token{dcterms:modified} elements is available on the W3C website{\footnote{\url{http://www.w3.org/TR/NOTE-datetime}}}.
}
\item {The BioModels qualifiers\footnote{\url{http://co.mbine.org/standards/qualifiers}}
}
\end{itemize}
Users of the COMBINE standards may already be familiar with this
approach, as it is also taken by SBML, SED-ML and SBGN-ML\footnote{Note: these terms differ slightly
from the terms used in the SBML Level 3 Version 1 specification.}.
A \textit{COMBINE archive} can include multiple metadata elements adding information about different content files. To
identify the file a metadata element refers to, the \token{rdf:about} attribute of the relevant metadata structure should use the same value as used in the \token{location} attribute of the respective \Content element, e.g.
\begin{example}
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:vCard="http://www.w3.org/2006/vcard/ns#"
xmlns:bqmodel="http://biomodels.net/models-qualifiers">
<rdf:Description rdf:about="simulation.xml">
...
</rdf:Description>
</rdf:RDF>
\end{example}
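Since the archive itself is declared with the location \token{.}, the same convention can presumably be used to attach metadata to the archive as a whole. The fragment below is a sketch under that assumption: it contains one description for the archive and one for a content file. The file location and the description texts are hypothetical.
\begin{example}
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcterms="http://purl.org/dc/terms/">
<rdf:Description rdf:about=".">
<dcterms:description>Hypothetical archive bundling a model and its simulation.</dcterms:description>
</rdf:Description>
<rdf:Description rdf:about="model/model.xml">
<dcterms:description>Hypothetical SBML model file.</dcterms:description>
</rdf:Description>
</rdf:RDF>
\end{example}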
A complete example of the metadata related to a simulation description contained in a COMBINE archive is given in \sect{examples}.