
Commit

fix: support spaces around newlines in brat export
TheooJ authored and percevalw committed Aug 7, 2023
1 parent 23bb576 commit caaa603
Showing 3 changed files with 12 additions and 8 deletions.
5 changes: 5 additions & 0 deletions changelog.md
@@ -1,5 +1,10 @@
# Changelog

+## Pending
+
+### Fixed
+- `export_to_brat` issue with spans of entities on multiple lines.
+
## v0.8.1 (2023-05-31)

Fix release to allow installation from source
13 changes: 6 additions & 7 deletions edsnlp/connectors/brat.py
@@ -226,18 +226,17 @@ def export_to_brat(doc, txt_filename, overwrite_txt=False, overwrite_ann=False):
 ):
     idx = fragment["begin"]
     entity_text = doc["text"][fragment["begin"] : fragment["end"]]
-    for part in entity_text.split("\n"):
-        begin = idx
-        end = idx + len(part)
-        idx = end + 1
-        if begin != end:
-            spans.append((begin, end))
+    # eg: "mon entité \n est problématique"
+    for match in re.finditer(
+        r"\s*(.+?)(?:( *\n+)+ *|$)", entity_text, flags=re.DOTALL
+    ):
+        spans.append((idx + match.start(1), idx + match.end(1)))
 print(
     "{}\t{} {}\t{}".format(
         brat_entity_id,
         str(entity["label"]),
         ";".join(" ".join(map(str, span)) for span in spans),
-        entity_text.replace("\n", " "),
+        " ".join(doc["text"][begin:end] for begin, end in spans),
     ),
     file=f,
 )
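
The added lines in this hunk can be tried in isolation. The sketch below copies the regular expression from the diff into a standalone function; the name `split_fragment` and the sample text are hypothetical and are not part of edsnlp. It is meant only to show how whitespace around a newline is excluded from the emitted spans.

```python
import re


# Hypothetical helper (not an edsnlp function): isolates the span-splitting
# step introduced by this commit so it can be run on its own.
def split_fragment(text, begin, end):
    entity_text = text[begin:end]
    spans = []
    # Same pattern as in the diff: group 1 captures one piece of the entity,
    # and the surrounding spaces/newlines are matched outside the group.
    for match in re.finditer(
        r"\s*(.+?)(?:( *\n+)+ *|$)", entity_text, flags=re.DOTALL
    ):
        spans.append((begin + match.start(1), begin + match.end(1)))
    return spans


# Made-up sample: the entity starts at offset 3 and contains " \n".
text = "un problème \nde locomotion"
spans = split_fragment(text, 3, len(text))
joined = " ".join(text[b:e] for b, e in spans)
print(spans)   # [(3, 11), (13, 26)]
print(joined)  # problème de locomotion
```

The printed text is rebuilt from the spans themselves, mirroring the last changed line of the hunk, so the text column and the offsets stay in agreement.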
2 changes: 1 addition & 1 deletion tests/connectors/test_brat.py
@@ -193,7 +193,7 @@ def test_brat(
A1 etat T1 test
T2 localisation 39 57 dans le bras droit
T3 anatomie 47 57 bras droit
-T4 pathologie 75 84;85 98 problème de locomotion
+T4 pathologie 75 83;85 98 problème de locomotion
A2 assertion T4 absent
T5 pathologie 114 117 AVC
A3 etat T5 passé
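
The change in the expected offsets for `T4` (from `75 84` to `75 83`) is consistent with the behaviour of the removed newline-based split. The sketch below reproduces that removed logic in a hypothetical function `old_spans`; the padded sample text is an assumption chosen so the entity starts at offset 75, and it is not the actual test fixture.

```python
# Hypothetical reproduction of the removed logic (not an edsnlp function).
def old_spans(text, begin, end):
    spans = []
    idx = begin
    # Splitting on "\n" alone keeps any space that precedes the newline.
    for part in text[begin:end].split("\n"):
        part_begin = idx
        part_end = idx + len(part)
        idx = part_end + 1
        if part_begin != part_end:
            spans.append((part_begin, part_end))
    return spans


# Assumed sample: 75 filler characters, then an entity containing " \n".
text = "x" * 75 + "problème \nde locomotion"
spans = old_spans(text, 75, len(text))
print(spans)             # [(75, 84), (85, 98)]
print(repr(text[75:84])) # 'problème '
```

Under this assumption the first span ends at 84 because it still covers the trailing space, which is the character the new expectation of 83 leaves out.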
