Indices spec
For each index, certain information is required for proper display to the user. This page details the requirements for each index and the specifications that derive from them.
Each index requires its items to have, for each instance:
- A link to the inscription
- The identifier of the inscription
- The number of the text part containing the instance
- The line number containing the instance
- An indicator of whether it is partially or completely restored
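The per-instance requirements above could be modelled roughly as follows. This is only a sketch: the class name, field names, and the three restoration states are assumptions made for illustration and are not taken from the EFES code. Text part and line numbers are kept as strings on the assumption that they are not always plain integers.

```python
from dataclasses import dataclass
from enum import Enum


class Restoration(Enum):
    """Hypothetical restoration states for an instance."""
    NONE = "none"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass(frozen=True)
class IndexInstance:
    """One occurrence of an indexed item (all names are illustrative)."""
    link: str            # link to the inscription
    identifier: str      # identifier of the inscription
    text_part: str       # number of the text part containing the instance
    line: str            # line number containing the instance
    restoration: Restoration  # partially or completely restored, or not at all


# Example with made-up values.
example = IndexInstance(
    link="/inscriptions/example-1.html",
    identifier="example-1",
    text_part="1",
    line="3",
    restoration=Restoration.PARTIAL,
)
```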
Do specific indices have further requirements? IOSPE indexes tei:num together with an indication of whether it is an exact value, an "at least" value, or an "at most" value, but I don't see any use made of this on the front end.
Each item needs to have its language indexed.
In addition to the actual index, each index needs some or all of: a title, an introduction/preamble, notes, etc. This metadata is stored in a TEI file in content/xml/indices/. Information about indices of EpiDoc content lives in the file epidoc.xml; similarly, information about indices of TEI content goes in tei.xml. Each index goes into a top-level div, with its title represented as a heading. Any notes go into a sub-div that can be further structured. By default, the notes are rendered at the top of the index.
IOSPE's Solr index is large and cumbersome to work with (indexing takes a long time and needs a special script), and EFES should try to avoid that. One possibility is to make the documents harvested for the user indices more efficient by grouping every instance of an item as multiple values within a single Solr doc. This requires encoding all of the information for an instance into a single value (easily doable for identifier, text part number, line number, and restoration state). It does, however, preclude faceting on these indices.
This approach also requires operating on all of the inscriptions at once.
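A minimal sketch of the grouping idea, to make the trade-off concrete. The separator, the function names, and the `item`/`instances` field names are all assumptions for illustration; they are not the actual EFES or IOSPE Solr schema. Note that grouping needs every instance in hand before any doc can be emitted, which is why all inscriptions must be processed together.

```python
from collections import defaultdict

SEP = "|"  # assumed separator; must not occur in any encoded part


def encode_instance(identifier, text_part, line, restoration):
    """Pack one instance's details into a single string value."""
    parts = [identifier, text_part, line, restoration]
    if any(SEP in part for part in parts):
        raise ValueError("separator found inside a field value")
    return SEP.join(parts)


def decode_instance(value):
    """Unpack a value produced by encode_instance."""
    identifier, text_part, line, restoration = value.split(SEP)
    return {
        "identifier": identifier,
        "text_part": text_part,
        "line": line,
        "restoration": restoration,
    }


def group_instances(rows):
    """Build one doc per item, holding all of its instances as multiple values.

    rows is an iterable of (item, identifier, text_part, line, restoration)
    tuples gathered from all inscriptions.
    """
    grouped = defaultdict(list)
    for item, identifier, text_part, line, restoration in rows:
        grouped[item].append(
            encode_instance(identifier, text_part, line, restoration)
        )
    return [
        {"item": item, "instances": values}
        for item, values in grouped.items()
    ]


# Made-up sample data: two items across two inscriptions.
rows = [
    ("Zeus", "insc-1", "1", "3", "none"),
    ("Zeus", "insc-2", "2", "7", "partial"),
    ("Apollo", "insc-1", "1", "4", "complete"),
]
docs = group_instances(rows)
```

Because each instance is an opaque string inside a multi-valued field, the search engine cannot treat identifier, line, or restoration state as separate fields, which is what rules out faceting on them.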
See https://github.com/EpiDoc/EFES/issues/32
As noted in the section above, faceting is not available if the Solr index is made more compact by grouping all instances of the same term in a single doc.