feat: add patterns for metastasis ct scan
aricohen93 committed Oct 10, 2024
1 parent 2e227e9 commit d672b2c
Showing 24 changed files with 128 additions and 74 deletions.
2 changes: 2 additions & 0 deletions changelog.md
Original file line number Diff line number Diff line change
@@ -6,6 +6,8 @@

- `eds.tables` accepts a minimum_table_size (default 2) argument to reduce pollution
- `RuleBasedQualifier` now exposes a `process` method that only returns qualified entities and tokens without actually tagging them, deferring this task to the `__call__` method.
- Added new patterns for metastasis detection. Developed on CT-Scan reports.
- Added citation of articles

### Fixed

3 changes: 3 additions & 0 deletions docs/pipes/ner/disorders/index.md
@@ -3,6 +3,9 @@
## Presentation

The following components extract 16 different conditions from the [Charlson Comorbidity Index](https://www.rdplf.org/calculateurs/pages/charlson/charlson.html). Each component is based on the ContextualMatcher component.

The components were developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by [@petitjean_2024].

Some general considerations about those components:

- Extracted entities are stored in `doc.ents` and `doc.spans`. For instance, the `eds.tobacco` component stores matches in `doc.spans["tobacco"]`.
16 changes: 16 additions & 0 deletions docs/references.bib
@@ -145,3 +145,19 @@ @misc{terminologie-adicap
AUTHOR = {Agence du numérique en santé},
DETAILS = {https://smt.esante.gouv.fr/wp-json/ans/terminologies/document?terminologyId=terminologie-adicap&fileName=cgts_sem_adicap_fiche-detaillee.pdf},
}

@article{petitjean_2024,
author = {Petit-Jean, Thomas and Gérardin, Christel and Berthelot, Emmanuelle and Chatellier, Gilles and Frank, Marie and Tannier, Xavier and Kempf, Emmanuelle and Bey, Romain},
title = "{Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions}",
journal = {Journal of the American Medical Informatics Association},
volume = {31},
number = {6},
pages = {1280-1290},
year = {2024},
month = {04},
abstract = "{To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow. The detection pipeline relied both on rule-based and machine learning algorithms, respectively, for named entity recognition and entity qualification, respectively. We used a large language model pre-trained on millions of clinical notes along with annotated clinical notes in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. We estimated the added values of the advanced technologies and of the collaborative setting. The pipeline reached macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95\\%CI 94.5-96.3), 95.4 (95\\%CI 94.0-96.3), 96.0 (95\\%CI 94.0-96.7), and 99.2 (95\\%CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry. We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes.}",
issn = {1527-974X},
doi = {10.1093/jamia/ocae069},
url = {https://doi.org/10.1093/jamia/ocae069},
eprint = {https://academic.oup.com/jamia/article-pdf/31/6/1280/57769016/ocae069.pdf},
}
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/behaviors/alcohol/alcohol.py
@@ -81,9 +81,9 @@ class AlcoholMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.alcohol` component was developed by AP-HP's Data Science team with a team
of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
The `eds.alcohol` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/behaviors/tobacco/tobacco.py
@@ -89,9 +89,9 @@ class TobaccoMatcher(AlcoholMatcher):
Authors and citation
--------------------
The `eds.tobacco` component was developed by AP-HP's Data Science team with a team
of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
The `eds.tobacco` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/disorders/aids/aids.py
@@ -91,9 +91,9 @@ class AIDSMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.aids` component was developed by AP-HP's Data Science team with a team of
medical experts. A paper describing in details the development of those components
is being drafted and will soon be available.
The `eds.aids` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
Original file line number Diff line number Diff line change
@@ -78,10 +78,10 @@ class CerebrovascularAccidentMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.cerebrovascular_accident` component was developed by AP-HP's Data Science
team with a team of medical experts. A paper describing in details the development
of those components is being drafted and will soon be available.
"""
The `eds.cerebrovascular_accident` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
""" # noqa: E501

def __init__(
self,
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/disorders/ckd/ckd.py
@@ -91,9 +91,9 @@ class CKDMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.CKD` component was developed by AP-HP's Data Science team with a team of
medical experts. A paper describing in details the development of those components
is being drafted and will soon be available.
The `eds.ckd` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""`eds.congestive_heart_failure` pipeline"""

from typing import Any, Dict, List, Optional, Union

from edsnlp.core import PipelineProtocol
@@ -71,10 +72,10 @@ class CongestiveHeartFailureMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.congestive_heart_failure` component was developed by AP-HP's Data Science
team with a team of medical experts. A paper describing in details the development
of those components is being drafted and will soon be available.
"""
The `eds.congestive_heart_failure` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
""" # noqa: E501

def __init__(
self,
Original file line number Diff line number Diff line change
@@ -75,10 +75,10 @@ class ConnectiveTissueDiseaseMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.connective_tissue_disease` component was developed by AP-HP's Data Science
team with a team of medical experts. A paper describing in details the development
of those components is being drafted and will soon be available.
"""
The `eds.connective_tissue_disease` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
""" # noqa: E501

def __init__(
self,
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/disorders/copd/copd.py
@@ -78,9 +78,9 @@ class COPDMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.copd` component was developed by AP-HP's Data Science team with a team of
medical experts. A paper describing in details the development of those components
is being drafted and will soon be available.
The `eds.copd` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/disorders/dementia/dementia.py
@@ -72,9 +72,9 @@ class DementiaMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.dementia` component was developed by AP-HP's Data Science team with a team
of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
The `eds.dementia` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
10 changes: 4 additions & 6 deletions edsnlp/pipes/ner/disorders/diabetes/diabetes.py
@@ -8,9 +8,7 @@
from edsnlp.matchers.regex import RegexMatcher
from edsnlp.matchers.utils import get_text
from edsnlp.pipes.base import SpanSetterArg
from edsnlp.pipes.core.contextual_matcher.contextual_matcher import (
get_window,
)
from edsnlp.pipes.core.contextual_matcher.contextual_matcher import get_window

from ..base import DisorderMatcher
from .patterns import COMPLICATIONS, default_patterns
@@ -86,9 +84,9 @@ class DiabetesMatcher(DisorderMatcher):
# Authors and citation
The `eds.diabetes` component was developed by AP-HP's Data Science team with a team
of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
The `eds.diabetes` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
4 changes: 2 additions & 2 deletions edsnlp/pipes/ner/disorders/hemiplegia/hemiplegia.py
@@ -73,8 +73,8 @@ class HemiplegiaMatcher(DisorderMatcher):
# Authors and citation
The `eds.hemiplegia` component was developed by AP-HP's Data Science team with a
team of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/disorders/leukemia/leukemia.py
@@ -72,9 +72,9 @@ class LeukemiaMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.leukemia` component was developed by AP-HP's Data Science team with a team
of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
The `eds.leukemia` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
4 changes: 2 additions & 2 deletions edsnlp/pipes/ner/disorders/liver_disease/liver_disease.py
@@ -77,8 +77,8 @@ class LiverDiseaseMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.liver_disease` component was developed by AP-HP's Data Science team with a
team of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
6 changes: 3 additions & 3 deletions edsnlp/pipes/ner/disorders/lymphoma/lymphoma.py
@@ -76,9 +76,9 @@ class LymphomaMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.lymphoma` component was developed by AP-HP's Data Science team with a team
of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
The `eds.lymphoma` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
"""

def __init__(
Original file line number Diff line number Diff line change
@@ -80,10 +80,10 @@ class MyocardialInfarctionMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.myocardial_infarction` component was developed by AP-HP's Data Science
team with a team of medical experts. A paper describing in details the development
of those components is being drafted and will soon be available.
"""
The `eds.myocardial_infarction` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
""" # noqa: E501

def __init__(
self,
Original file line number Diff line number Diff line change
@@ -75,10 +75,10 @@ class PepticUlcerDiseaseMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.peptic_ulcer_disease` component was developed by AP-HP's Data Science team
with a team of medical experts. A paper describing in details the development of
those components is being drafted and will soon be available.
"""
The `eds.peptic_ulcer_disease` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
""" # noqa: E501

def __init__(
self,
Original file line number Diff line number Diff line change
@@ -76,10 +76,10 @@ class PeripheralVascularDiseaseMatcher(DisorderMatcher):
Authors and citation
--------------------
The `eds.peripheral_vascular_disease` component was developed by AP-HP's Data
Science team with a team of medical experts. A paper describing in details the
development of those components is being drafted and will soon be available.
"""
The `eds.peripheral_vascular_disease` component was developed by AP-HP's Data Science team with a
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024].
""" # noqa: E501

def __init__(
self,
24 changes: 24 additions & 0 deletions edsnlp/pipes/ner/disorders/solid_tumor/patterns.py
@@ -83,6 +83,30 @@
),
)

# Patterns developed for CT-Scan reports
metastasis_ct_scan = dict(
    source="metastasis_ct_scan",
    regex=[
        r"(?i)(m[ée]tasta(se|tique)s?)",
        r"(diss[ée]min[ée]e?s?)",
        r"(carcinose)",
        r"(((allure|l[ée]sion|localisation|progression)s?\s)(suspecte?s?)?.{0,50}(secondaire)s?)",
        r"(l(a|â)ch(é|e|er)\sde\sballons?)",
        r"(l[ée]sions?\s(non\s)?cibles?)",
        r"(rupture.{1,20}corticale)",
        r"(envahissement.{0,15}parties\smolles)",
        r"((l[i,y]se).{1,20}os)|ost[eé]ol[i,y]|rupture.{1,20}corticale|envahissement.{1,20}parties\smolles|ost[eé]ocondensa.{1,20}(suspect|secondaire|[ée]volutive)",
        r"(l[ée]sion|anomalie|image).{1,20}os.{1,30}(suspect|secondaire|[ée]volutive)",
        r"os.{1,30}(l[ée]sion|anomalie|image).{1,20}(suspect|secondaire|[ée]volutive)",
        r"(l[ée]sion|anomalie|image).{1,20}l[i,y]tique",
        r"(l[ée]sion|anomalie|image).{1,20}condensant.{1,20}(suspect|secondaire|[ée]volutive)",
        r"fracture.{1,30}(suspect|secondaire|[ée]volutive)",
        r"((l[ée]sion|anomalie|image|nodule).{1,80}(secondaire))",
        r"((l[ée]sion|anomalie|image|nodule)s.{1,40}suspec?ts?)",
    ],
    regex_attr="NORM",
)

default_patterns = [
main_pattern,
metastasis_pattern,
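
The patterns above are declared with `regex_attr="NORM"`, so they are matched against normalized text rather than the raw report. A minimal sketch of that behaviour with the standard `re` module follows. It assumes normalization amounts to lowercasing and accent stripping, which only approximates what the pipeline's normalizer does. `approx_norm` and `matches_ct_scan` are hypothetical helpers written for illustration and are not part of EDS-NLP. The regexes are a subset copied verbatim from the hunk, and the sentences come from the test file in this commit.

```python
import re
import unicodedata

# Subset of the `metastasis_ct_scan` regexes, copied verbatim from the diff.
CT_SCAN_REGEXES = [
    r"(?i)(m[ée]tasta(se|tique)s?)",
    r"(carcinose)",
    r"(l[ée]sions?\s(non\s)?cibles?)",
    r"fracture.{1,30}(suspect|secondaire|[ée]volutive)",
    r"((l[ée]sion|anomalie|image|nodule).{1,80}(secondaire))",
]


def approx_norm(text: str) -> str:
    """Rough stand-in for the NORM attribute: lowercase and strip accents.

    Assumption: this only approximates the pipeline's real normalization.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_ct_scan(text: str) -> bool:
    """Return True if any sampled regex matches the normalized text."""
    normalized = approx_norm(text)
    return any(re.search(pattern, normalized) for pattern in CT_SCAN_REGEXES)


# Sentences taken from tests/pipelines/ner/disorders/solid_tumor.py
print(matches_ct_scan("Patient avec fracture abcddd secondaire. Cancer de"))  # True
print(matches_ct_scan("Patient avec lesions non ciblées"))  # True
print(matches_ct_scan("Cancer du poumon au stade 2"))  # False
```

On the raw string `"lesions non ciblées"`, the regex `cibles?` does not match because of the accented `é`, so the second test sentence is only detected once accents are stripped. Separately, inside a character class the comma is a literal, so `[i,y]` in the bone-lesion regexes also accepts `,`; `[iy]` may have been the intent.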
15 changes: 11 additions & 4 deletions edsnlp/pipes/ner/disorders/solid_tumor/solid_tumor.py
@@ -9,7 +9,7 @@
from edsnlp.utils.numbers import parse_digit

from ..base import DisorderMatcher
from .patterns import default_patterns
from .patterns import default_patterns, metastasis_ct_scan


class SolidTumorMatcher(DisorderMatcher):
@@ -79,12 +79,15 @@ class SolidTumorMatcher(DisorderMatcher):
How to set matches on the doc
use_tnm : bool
Whether to use TNM scores matching as well
use_patterns_metastasis_ct_scan : bool
Whether to use the metastasis patterns developed for the CT-Scans
Authors and citation
--------------------
The `eds.solid_tumor` component was developed by AP-HP's Data Science team with a
team of medical experts. A paper describing in details the development of those
components is being drafted and will soon be available.
team of medical experts, following the insights of the algorithm proposed
by [@petitjean_2024] and [@kempf:hal-03519085].
"""

def __init__(
@@ -94,9 +97,13 @@ def __init__(
        *,
        patterns: Union[Dict[str, Any], List[Dict[str, Any]]] = default_patterns,
        use_tnm: bool = False,
        use_patterns_metastasis_ct_scan: bool = False,
        label: str = "solid_tumor",
        span_setter: SpanSetterArg = {"ents": True, "solid_tumor": True},
    ):
        if use_patterns_metastasis_ct_scan:
            patterns.append(metastasis_ct_scan)

        super().__init__(
            nlp=nlp,
            name=name,
@@ -130,7 +137,7 @@ def process_tnm(self, doc):

def process(self, doc: Doc) -> List[Span]:
for span in super().process(doc):
if (span._.source == "metastasis") or (
if (span._.source in ["metastasis", "metastasis_ct_scan"]) or (
"metastasis" in span._.assigned.keys()
):
span._.status = 2
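
One point worth checking in the `__init__` hunk of this file: `patterns.append(metastasis_ct_scan)` modifies its argument in place. Python evaluates a default argument once, so when `patterns` is left at its default the call extends the module-level `default_patterns` list itself, and each further instantiation with the flag set would append another copy. A `dict` argument, which the type hint allows, has no `append` method. A minimal sketch of a non-mutating alternative is given below. The helper `resolve_patterns` and the placeholder pattern dicts are hypothetical, written for illustration only, and are not part of EDS-NLP.

```python
from typing import Any, Dict, List, Union

# Placeholder stand-ins for the module-level objects in patterns.py (assumption).
default_patterns: List[Dict[str, Any]] = [
    {"source": "main"},
    {"source": "metastasis"},
]
metastasis_ct_scan: Dict[str, Any] = {"source": "metastasis_ct_scan"}


def resolve_patterns(
    patterns: Union[Dict[str, Any], List[Dict[str, Any]]] = default_patterns,
    use_patterns_metastasis_ct_scan: bool = False,
) -> List[Dict[str, Any]]:
    """Build the effective pattern list without touching the caller's object."""
    # Wrap a single dict in a list, otherwise take a shallow copy of the list.
    resolved = [patterns] if isinstance(patterns, dict) else list(patterns)
    if use_patterns_metastasis_ct_scan:
        resolved.append(metastasis_ct_scan)
    return resolved


first = resolve_patterns(use_patterns_metastasis_ct_scan=True)
second = resolve_patterns(use_patterns_metastasis_ct_scan=True)
print([p["source"] for p in first])  # ['main', 'metastasis', 'metastasis_ct_scan']
print(len(second))  # 3
print(len(default_patterns))  # 2, the shared default is left unchanged
```

Copying before appending keeps repeated instantiations independent of each other, at the cost of one shallow list copy per call.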
16 changes: 7 additions & 9 deletions tests/pipelines/ner/disorders/solid_tumor.py
@@ -1,13 +1,5 @@
results_solid_tumor = dict(
has_match=[
True,
True,
False,
True,
True,
True,
True,
],
has_match=[True, True, False, True, True, True, True, True, True],
detailled_status=[
"LOCALIZED",
"LOCALIZED",
@@ -16,6 +8,8 @@
"METASTASIS",
"LOCALIZED",
"METASTASIS",
"METASTASIS",
"METASTASIS",
],
assign=None,
texts=[
@@ -26,5 +20,9 @@
"Cancer du poumon au stade 4",
"Cancer du poumon au stade 2",
"Présence de nombreuses lésions secondaires",
"Patient avec fracture abcddd secondaire. Cancer de",
"Patient avec lesions non ciblées",
],
)

solid_tumor_config = dict(use_patterns_metastasis_ct_scan=True)