This repository has been archived by the owner on Jul 24, 2024. It is now read-only.
The Python script that does word segmentation currently looks for //div[@subtype="transcription"]/p and applies the word segmentation rules to the text and element nodes inside that <p> element.
However, some inscriptions carry multiple texts, or have text on more than one part of the object. In these cases, the structure of the transcription div is as follows:
//div[@subtype="transcription"]/div[@type="textPart"]/p where there is more than one textPart.
For example, caes0509.xml:
The script currently locates and segments the contents of the <p> in the first textPart. It either converts or ignores any subsequent ones, but it writes only the first one to the segmented output.
The script should convert and output each of the textPart divs.
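A minimal sketch of the desired lookup, not the actual script: it collects one <p> per textPart when textPart divs are present and falls back to the single direct <p> otherwise. The function name `find_transcription_paragraphs`, the sample document, and the use of the TEI namespace are assumptions made for illustration; the real script may use a different XML library or namespace handling.

```python
import xml.etree.ElementTree as ET

# Assumption: the inscription files use the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def find_transcription_paragraphs(root):
    """Return every <p> to segment: one per textPart, or the single direct <p>."""
    paragraphs = []
    for div in root.iterfind(".//tei:div[@subtype='transcription']", TEI_NS):
        parts = div.findall("tei:div[@type='textPart']/tei:p", TEI_NS)
        # Fall back to the simple layout when there are no textPart divs.
        paragraphs.extend(parts or div.findall("tei:p", TEI_NS))
    return paragraphs


# Hypothetical sample with two textParts (not taken from caes0509.xml).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="edition" subtype="transcription">
  <div type="textPart" n="a"><p>first text</p></div>
  <div type="textPart" n="b"><p>second text</p></div>
</div>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)
texts = ["".join(p.itertext()) for p in find_transcription_paragraphs(root)]
print(texts)  # ['first text', 'second text']
```

Each returned <p> could then be passed through the existing segmentation rules and written back in place, so that the output keeps every textPart div rather than only the first.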
Other examples: jeru0522.xml, mare0437
Python script
folder with output files
Will add example output - current and desired