[conll2folia] add original untokenised text if present #53
proycon committed Aug 17, 2018
1 parent 7d26f8d commit 6af7d17
Showing 1 changed file with 4 additions and 0 deletions.
foliatools/conllu2folia.py
```diff
@@ -69,6 +69,10 @@ def main():
         else:
             sent_id = doc_id + '.s.' + str(i+1)
         sentence = folia.Sentence(doc, id=sent_id)
+        if 'text' in tokenlist.metadata:
+            sentence.add(folia.TextContent, tokenlist.metadata['text'], cls="original")
+        elif 'text_en' in tokenlist.metadata:
+            sentence.add(folia.TextContent, tokenlist.metadata['text_en'], cls="text_en")
         wordindex = {} #quick lookup index for this sentence
         for token in tokenlist:
             if token['misc'] and 'SpaceAfter' in token['misc'] and token['misc']['SpaceAfter'].lower() == 'no':
```
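The precedence implemented by the four added lines can be sketched in plain Python, without the `conllu` or FoLiA libraries. This sketch assumes the CoNLL-U convention that sentence-level metadata is written as `# key = value` comment lines, which is where `tokenlist.metadata` gets its `text` and `text_en` keys. The helpers `parse_metadata` and `select_text` are hypothetical illustrations, not part of foliatools, and the parser is a simplified stand-in for what the `conllu` library does.

```python
from typing import Dict, Optional, Tuple


def parse_metadata(sentence_block: str) -> Dict[str, str]:
    """Collect '# key = value' comment lines of one CoNLL-U sentence.

    Hypothetical, simplified helper: comments without '=' are ignored,
    and only the first '=' separates key from value.
    """
    metadata = {}
    for line in sentence_block.splitlines():
        if line.startswith('#') and '=' in line:
            key, _, value = line[1:].partition('=')
            metadata[key.strip()] = value.strip()
    return metadata


def select_text(metadata: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (text, cls) using the same precedence as the commit, else None.

    'text' (the original untokenised sentence) wins and gets class "original";
    otherwise 'text_en' is used with class "text_en".
    """
    if 'text' in metadata:
        return metadata['text'], 'original'
    if 'text_en' in metadata:
        return metadata['text_en'], 'text_en'
    return None


# Token lines do not start with '#', so they are skipped by parse_metadata.
block = (
    "# sent_id = s1\n"
    "# text = Hello, world!\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "2\t,\t,\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

print(select_text(parse_metadata(block)))  # ('Hello, world!', 'original')
```

When neither key is present the sketch returns `None`, which corresponds to the commit's behaviour of adding no `TextContent` to the sentence in that case.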
