This repository has been archived by the owner on Jul 24, 2024. It is now read-only.
As part of the word segmentation process, we will have to correct or add some `<w>` markup by hand to inscriptions with complicated features. Also, in the future, inscriptions may be amended or corrected, so that the segmented `<div>` will change in order to mirror the changes in the transcription div. It would be very useful to be able to run a separate script to (re)generate the `@xml:id` attributes.
Files and Folders
The original word segmentation script that does this is here (line 216):
https://github.com/lukehollis/iip-word-lists/blob/master/word_segmentation/word_segmentation.py
Folder that has files with word segmentation
I'm not sure this is worth copying, but this is how it's done now.
Input and Results
This new script should read in an inscription that has a `<div type="edition" subtype="transcription_segmented">`.
It should take the content of the `<div type="edition" subtype="transcription_segmented">` and add an `@xml:id` to each element in the div. These elements are likely to be `<w>`, `<num>`, `<orig>` and `<g>`.
The `@xml:id` should be in the form `@xml:id="IIPID-001"`, where IIPID is the IIP number of the file, e.g. beth0345 (don't include the `.xml` extension), followed by the number of the element in sequence in the div.
Ex: `<w xml:id="beth0100-004">` would be the 4th element in the div for inscription beth0100.xml.
Note that most inscriptions have names like caes0002.xml, but they can also appear in the form idum0003a.xml.
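Here is a minimal Python sketch of the ID format. It assumes the IIP ID is simply the file name without its `.xml` extension, which covers both caes0002.xml and idum0003a.xml. It also assumes the sequence number is zero-padded to three digits, as in `IIPID-001`. `make_xml_id` is a hypothetical helper name and is not taken from the existing script.

```python
from pathlib import Path


def make_xml_id(filename: str, position: int) -> str:
    """Build an @xml:id such as 'beth0100-004' from a file name and a 1-based position.

    Assumption: the IIP ID is the file name minus '.xml' (so 'idum0003a.xml'
    gives 'idum0003a'), and the position is zero-padded to three digits.
    """
    if position < 1:
        raise ValueError("position is 1-based")
    iip_id = Path(filename).stem  # drops any directory and the '.xml' extension
    return f"{iip_id}-{position:03d}"


print(make_xml_id("beth0100.xml", 4))    # beth0100-004
print(make_xml_id("idum0003a.xml", 12))  # idum0003a-012
```

Positions above 999 print with four digits (e.g. `beth0100-1000`) rather than failing.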
If this script is written in XSLT, it will be easier to run in Oxygen. If, however, it is written in Python, it can become part of a pipeline that is run on the command line.
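For the Python route, here is a rough command-line sketch that uses only the standard library's `xml.etree.ElementTree`. It rests on several assumptions:

- The files use the TEI namespace `http://www.tei-c.org/ns/1.0`.
- Only descendant `<w>`, `<num>`, `<orig>` and `<g>` elements of the segmented div are numbered, in document order, including nested ones.
- Any existing `@xml:id` on those elements is overwritten, so re-running the script after hand edits regenerates a clean sequence.

The names `number_segmented_div` and `main` are hypothetical and are not taken from the existing `word_segmentation.py`.

```python
import argparse
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: files are TEI-namespaced
NS = {"tei": TEI_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # @xml:id in Clark notation
# Assumption: the element types that should receive an @xml:id.
DEFAULT_TAGS = ("w", "num", "orig", "g")


def number_segmented_div(root, iip_id, tags=DEFAULT_TAGS):
    """(Re)assign @xml:id values inside the segmented edition div.

    Matching elements are numbered in document order as IIPID-001, IIPID-002, ...
    Any existing @xml:id on them is overwritten. Returns how many ids were set.
    """
    div = root.find(
        ".//tei:div[@type='edition'][@subtype='transcription_segmented']", NS
    )
    if div is None:
        raise ValueError(f"{iip_id}: no transcription_segmented div found")
    wanted = {f"{{{TEI_NS}}}{tag}" for tag in tags}
    count = 0
    for elem in div.iter():  # depth-first traversal, i.e. document order
        if elem.tag in wanted:
            count += 1
            elem.set(XML_ID, f"{iip_id}-{count:03d}")
    return count


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Regenerate @xml:id values in the segmented transcription div."
    )
    parser.add_argument("files", nargs="*", type=Path,
                        help="inscription files, e.g. beth0100.xml")
    args = parser.parse_args(argv)
    if not args.files:
        parser.print_usage()
        return
    ET.register_namespace("", TEI_NS)  # write TEI as the default namespace, not ns0:
    for path in args.files:
        tree = ET.parse(path)
        # The IIP ID is the file name minus ".xml", e.g. idum0003a.xml -> idum0003a.
        count = number_segmented_div(tree.getroot(), path.stem)
        tree.write(path, encoding="utf-8", xml_declaration=True)  # overwrite in place
        print(f"{path.name}: {count} ids assigned")


if __name__ == "__main__":
    main()
```

Usage, with a hypothetical file name: `python assign_xml_ids.py beth0100.xml idum0003a.xml`. Files are rewritten in place. By default ElementTree discards comments and processing instructions (such as `<?xml-model?>` declarations) and may re-prefix other namespaces. For production files, lxml could therefore be a safer choice.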