EDH and Pelagios NER
Participants: Orla Murphy, Sarah Middle, Simona Stoyanova, Núria Garcia Casacuberta
The aim of this group was to use Named Entity Recognition (NER) on the text of inscriptions from the Epigraphic Database Heidelberg (EDH) to identify placenames, which could then be linked to their equivalent terms in the Pleiades gazetteer and thereby integrated with Pelagios Commons.
-
Work to be done on XML documents:
a. Method 1: Strip the XML markup from the content of div[type=edition]/ab and run NER on the resulting plain text. Script to extract the inscription text from the XML and strip out the tags: https://github.com/EpiDoc/OEDUc/blob/master/ExtractInscriptionTextFromEDHXML_V1_20170515.py
b. Method 2: Tokenize everything and run the NER process on the tokens.
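The two preparation steps can be sketched in Python as below. This is not the linked script: it is a minimal illustration that assumes the EDH files follow the usual EpiDoc layout (TEI namespace, edition text in div[type=edition]/ab). The tokenizer is a deliberately simple regular-expression split standing in for whatever tokenizer Method 2 would actually use, and the sample inscription is invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# TEI namespace used by EpiDoc files (assumed to apply to the EDH exports).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_edition_text(xml_string: str) -> str:
    """Method 1: return the plain text of div[type=edition]/ab with all tags stripped."""
    root = ET.fromstring(xml_string)
    parts = []
    for ab in root.iterfind(".//tei:div[@type='edition']/tei:ab", TEI_NS):
        # itertext() yields the text and tails of the element and all its
        # descendants in document order, so markup such as <expan>/<abbr>/<ex>
        # is dropped while its content is kept ("Man" + "ibus" -> "Manibus").
        parts.append("".join(ab.itertext()))
    # Collapse whitespace and newlines left behind by the removed tags.
    return re.sub(r"\s+", " ", " ".join(parts)).strip()


def tokenize(text: str) -> list[str]:
    """Method 2 (simplified stand-in): split text into word tokens."""
    return re.findall(r"\w+", text)


# Hypothetical sample inscription, not taken from EDH.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="edition"><ab><lb n="1"/>Dis <expan><abbr>Man</abbr><ex>ibus</ex></expan>
<lb n="2"/><name>Romae</name></ab></div></body></text></TEI>"""

if __name__ == "__main__":
    plain = extract_edition_text(SAMPLE)
    print(plain)
    print(tokenize(plain))
```

One caveat of joining the text this way: a word split across a line break (lb with break="no") will come out as two tokens if the XML source has whitespace around the lb, so real files may need extra handling.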
-
Run the NER process for Method 1:
a. Sunoikisis NER I class (https://github.com/SunoikisisDC/SunoikisisDC-2016-2017/wiki/Named-Entity-Extraction-I)
b. Sunoikisis NER II class (https://github.com/SunoikisisDC/SunoikisisDC-2016-2017/wiki/Named-Entity-Extraction-II)
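The Sunoikisis classes cover NER proper. As a much simpler stand-in, not their method, the sketch below shows how recognised place names could be matched against a gazetteer and linked to identifiers. The gazetteer and its values are hypothetical placeholders: in practice the values would be Pleiades place URIs, and the inflected Latin forms would need a fuller list or lemmatisation.

```python
import re

# Hypothetical mini-gazetteer mapping inflected Latin forms to a place key.
# The values are placeholders, NOT real Pleiades identifiers.
GAZETTEER = {
    "roma": "PLEIADES_ID_FOR_ROMA",
    "romae": "PLEIADES_ID_FOR_ROMA",
    "romam": "PLEIADES_ID_FOR_ROMA",
    "carthago": "PLEIADES_ID_FOR_CARTHAGO",
    "carthagine": "PLEIADES_ID_FOR_CARTHAGO",
}


def tag_places(text: str) -> list[tuple[str, str]]:
    """Return (token, gazetteer_id) pairs for tokens found in GAZETTEER."""
    matches = []
    for token in re.findall(r"\w+", text):
        # Case-insensitive dictionary lookup; unmatched tokens are skipped.
        place_id = GAZETTEER.get(token.lower())
        if place_id is not None:
            matches.append((token, place_id))
    return matches


if __name__ == "__main__":
    print(tag_places("Dis Manibus Romae natus Carthagine obiit"))
```

A dictionary lookup like this only finds names already in the list, which is exactly the gap a trained NER model is meant to close; it is shown here for the linking step rather than as a replacement for recognition.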
-
Checking and refining the results; further training of the NER model
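One common way to check NER output, assumed here rather than stated by the group, is to compare the predicted place names with a manually checked set and compute precision, recall and F1. The sketch below represents each annotation as a hypothetical (inscription_id, token) pair; the identifiers and data are invented for illustration.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compare predicted place-name annotations with a manually checked gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (inscription_id, token) pairs.
predicted = {("inscr-1", "Romae"), ("inscr-1", "Manibus")}
gold = {("inscr-1", "Romae"), ("inscr-2", "Carthagine")}

if __name__ == "__main__":
    print(precision_recall_f1(predicted, gold))
```

The false positives (predicted but not in the gold set) and false negatives (in the gold set but missed) are also the natural material to feed back into refining the gazetteer or retraining the model.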