docs: add training tutorial, update docs & run pre-commit
percevalw committed Oct 11, 2023
1 parent e19f23a commit daed632
Showing 34 changed files with 607 additions and 234 deletions.
20 changes: 9 additions & 11 deletions README.md
@@ -5,13 +5,11 @@
[![Codecov](https://img.shields.io/codecov/c/github/aphp/edsnlp?logo=codecov&style=flat-square)](https://codecov.io/gh/aphp/edsnlp)
[![DOI](https://zenodo.org/badge/467585436.svg)](https://zenodo.org/badge/latestdoi/467585436)

# EDS-NLP
EDS-NLP is a collaborative NLP framework that aims to extract information from French clinical notes.
At its core, it is a collection of components, or pipes, that are either rule-based functions or
[deep learning modules](https://aphp.github.io/concepts/torch-component). These components are organized into a novel, efficient and modular [pipeline system](https://aphp.github.io/concepts/pipeline), built for hybrid and multi-task models. We use [spaCy](https://spacy.io) to represent documents and their annotations, and [PyTorch](https://pytorch.org/) as a deep-learning backend for trainable components.

EDS-NLP provides a set of spaCy components that are used to extract information from clinical notes written in French.

Check out the interactive [demo](https://aphp.github.io/edsnlp/demo/)!

If it's your first time with spaCy, we recommend you familiarise yourself with some of their key concepts by looking at the "[spaCy 101](https://aphp.github.io/edsnlp/latest/tutorials/spacy101/)" page in the documentation.
Although initially designed for French clinical notes, the architecture of EDS-NLP is versatile and can be used on any document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa, which makes it easy to integrate and extend with other NLP tools. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities. Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) to see EDS-NLP in action.

## Quick start

@@ -34,29 +32,29 @@ pip install edsnlp==0.9.1
Once you've installed the library, let's begin with a very simple example that extracts mentions of COVID19 in a text, and detects whether they are negated.

```python
import spacy
import edsnlp

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")

terms = dict(
covid=["covid", "coronavirus"],
)

# Sentencizer component, needed for negation detection
# Split the documents into sentences; this is needed for negation detection
nlp.add_pipe("eds.sentences")
# Matcher component
nlp.add_pipe("eds.matcher", config=dict(terms=terms))
# Negation detection
nlp.add_pipe("eds.negation")

# Process your text in one call!
doc = nlp("Le patient est atteint de covid")
doc = nlp("Le patient n'est pas atteint de covid")

doc.ents
# Out: (covid,)

doc.ents[0]._.negation
# Out: False
# Out: True
```

## Documentation
4 changes: 2 additions & 2 deletions docs/advanced-tutorials/fastapi.md
@@ -12,9 +12,9 @@ Let's create a simple NLP model, that can:
You know the drill:

```python title="pipeline.py"
import spacy
import edsnlp

nlp = spacy.blank('fr')
nlp = edsnlp.blank('fr')

nlp.add_pipe("eds.sentences")

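Since the diff hides the rest of `pipeline.py`, here is a minimal, hedged sketch of how such a model might be exposed behind FastAPI. It is not the tutorial's actual code: the `/process` route, the `Query` model and the response shape are illustrative assumptions.

```python
# Hedged sketch only — not the tutorial's code. Assumes pipeline.py exposes the
# `nlp` object built above; the route and response schema are illustrative.
from fastapi import FastAPI
from pydantic import BaseModel

from pipeline import nlp  # assumption: pipeline.py defines `nlp` as shown above

app = FastAPI(title="EDS-NLP demo API")


class Query(BaseModel):
    text: str


@app.post("/process")
def process(query: Query):
    # Run the EDS-NLP pipeline on the submitted text
    doc = nlp(query.text)
    # Return one record per extracted entity
    return {"ents": [{"label": ent.label_, "text": ent.text} for ent in doc.ents]}
```

Saving this as `app.py` (an assumed filename) and running `uvicorn app:app --reload` would let you POST a JSON payload such as `{"text": "Le patient est atteint de covid"}` to the endpoint.
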
20 changes: 10 additions & 10 deletions docs/concepts/torch-component.md
@@ -17,31 +17,31 @@ In the trainable pipes of EDS-NLP, preprocessing and postprocessing are decouple
??? details "Methods of a trainable component"

### `preprocess` {: #edsnlp.core.torch_component.TorchComponent.preprocess }

::: edsnlp.core.torch_component.TorchComponent.preprocess
options:
heading_level: 4
show_source: false
show_toc: false

### `collate` {: #edsnlp.core.torch_component.TorchComponent.collate }

::: edsnlp.core.torch_component.TorchComponent.collate
options:
heading_level: 4
show_source: false
show_toc: false

### `forward` {: #edsnlp.core.torch_component.TorchComponent.forward }

::: edsnlp.core.torch_component.TorchComponent.forward
options:
heading_level: 4
show_source: false
show_toc: false

### `postprocess` {: #edsnlp.core.torch_component.TorchComponent.postprocess }

::: edsnlp.core.torch_component.TorchComponent.postprocess
options:
heading_level: 4
@@ -50,10 +50,10 @@ In the trainable pipes of EDS-NLP, preprocessing and postprocessing are decouple


Additionally, there is a fifth method:


### `post_init` {: #edsnlp.core.torch_component.TorchComponent.post_init }

::: edsnlp.core.torch_component.TorchComponent.post_init
options:
heading_level: 3
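
To make the decoupled design above more concrete, here is a self-contained toy illustration — not EDS-NLP's actual `TorchComponent` base class, whose real signatures are documented above — showing how the four methods cooperate: `preprocess` extracts features from one document, `collate` batches them, `forward` runs the network, and `postprocess` maps predictions back onto the documents. Every name and shape below is an illustrative assumption.

```python
# Toy illustration of the preprocess -> collate -> forward -> postprocess flow.
# This is NOT EDS-NLP's TorchComponent; all names here are illustrative.
import torch


class ToyTrainableComponent(torch.nn.Module):
    def __init__(self, vocab_size: int = 256, dim: int = 8):
        super().__init__()
        self.embedding = torch.nn.Embedding(vocab_size, dim)
        self.classifier = torch.nn.Linear(dim, 2)

    def preprocess(self, doc: str) -> dict:
        # One integer per character, standing in for real feature extraction
        return {"tokens": torch.tensor([ord(c) % 256 for c in doc])}

    def collate(self, samples: list) -> dict:
        # Pad the per-document tensors into a single batch tensor
        tokens = torch.nn.utils.rnn.pad_sequence(
            [s["tokens"] for s in samples], batch_first=True
        )
        return {"tokens": tokens}

    def forward(self, inputs: dict) -> dict:
        # Mean-pool the embeddings and score each document
        scores = self.classifier(self.embedding(inputs["tokens"]).mean(dim=1))
        return {"scores": scores}

    def postprocess(self, docs: list, outputs: dict) -> list:
        # Attach the predicted class to each input document
        labels = outputs["scores"].argmax(dim=-1).tolist()
        return list(zip(docs, labels))


docs = ["Le patient est atteint de covid", "RAS"]
component = ToyTrainableComponent()
batch = component.collate([component.preprocess(d) for d in docs])
print(component.postprocess(docs, component(batch)))
```
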
24 changes: 16 additions & 8 deletions docs/index.md
@@ -1,8 +1,10 @@
# Getting started

EDS-NLP provides a set of spaCy components that are used to extract information from clinical notes written in French.
EDS-NLP is a collaborative NLP framework that aims to extract information from French clinical notes.
At its core, it is a collection of components, or pipes, that are either rule-based functions or
[deep learning modules](https://aphp.github.io/concepts/torch-component). These components are organized into a novel, efficient and modular [pipeline system](https://aphp.github.io/concepts/pipeline), built for hybrid and multi-task models. We use [spaCy](https://spacy.io) to represent documents and their annotations, and [PyTorch](https://pytorch.org/) as a deep-learning backend for trainable components.

If it's your first time with spaCy, we recommend you familiarise yourself with some of their key concepts by looking at the "[spaCy 101](tutorials/spacy101.md)" page.
Although initially designed for French clinical notes, the architecture of EDS-NLP is versatile and can be used on any document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa, which makes it easy to integrate and extend with other NLP tools. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities. Check out our interactive [demo](https://aphp.github.io/edsnlp/demo/) to see EDS-NLP in action.

## Quick start

@@ -31,9 +33,9 @@ pip install edsnlp==0.9.1
Once you've installed the library, let's begin with a very simple example that extracts mentions of COVID19 in a text, and detects whether they are negated.

```python
import spacy
import edsnlp

nlp = spacy.blank("eds") # (1)
nlp = edsnlp.blank("eds") # (1)

terms = dict(
covid=["covid", "coronavirus"], # (2)
@@ -47,23 +49,29 @@ nlp.add_pipe("eds.matcher", config=dict(terms=terms)) # (4)
nlp.add_pipe("eds.negation")

# Process your text in one call!
doc = nlp("Le patient est atteint de covid")
doc = nlp("Le patient n'est pas atteint de covid")

doc.ents # (5)
# Out: (covid,)

doc.ents[0]._.negation # (6)
# Out: False
# Out: True
```

1. We only need spaCy's French tokenizer.
1. 'eds' is the name of the language, which defines the [tokenizer](/tokenizers).
2. This example terminology provides a very simple, and by no means exhaustive, list of synonyms for COVID19.
3. In spaCy, pipelines are added via the [`nlp.add_pipe` method](https://spacy.io/api/language#add_pipe). EDS-NLP pipelines are automatically discovered by spaCy.
4. See the [matching tutorial](tutorials/matching-a-terminology.md) for more details.
5. spaCy stores extracted entities in the [`Doc.ents` attribute](https://spacy.io/api/doc#ents).
6. The `eds.negation` component adds a `negation` custom attribute.

This example is complete, it should run as-is. Check out the [spaCy 101 page](tutorials/spacy101.md) if you're not familiar with spaCy.
This example is complete; it should run as-is.
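
As a small, hedged follow-up (not part of the original page), you can loop over the extracted entities and read the flag set by `eds.negation`:

```python
# Inspect each extracted entity and its negation status
for ent in doc.ents:
    print(ent, ent._.negation)
# Out: covid True
```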

## Tutorials

To learn more about EDS-NLP, we have prepared a series of tutorials that should cover the main features of the library.

--8<-- "docs/tutorials/overview.md:tutorials"

## Available pipeline components

4 changes: 2 additions & 2 deletions docs/pipelines/core/contextual-matcher.md
@@ -145,9 +145,9 @@ This parameter can be set to `True` **only for a single assign key per dictionary
## Examples

```python
import spacy
import edsnlp

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")

nlp.add_pipe("sentences")
nlp.add_pipe("normalizer")
32 changes: 16 additions & 16 deletions docs/pipelines/core/normalizer.md
@@ -34,10 +34,10 @@ The normaliser can act on the input text in five dimensions:
The normalisation is handled by the single `eds.normalizer` pipeline. The following code snippet is complete, and should run as is.

```python
import spacy
import edsnlp
from edsnlp.matchers.utils import get_text

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer")

# Notice the special character used for the apostrophe and the quotes
@@ -74,7 +74,7 @@ The `eds.lowercase` pipeline transforms every token to lowercase. It is not conf
Consider the following example:

```python
import spacy
import edsnlp
from edsnlp.matchers.utils import get_text

config = dict(
@@ -85,7 +85,7 @@ config = dict(
pollution=False,
)

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer", config=config)

text = "Pneumopathie à NBNbWbWbNbWbNBNbNbWbW `coronavirus'"
@@ -105,7 +105,7 @@ making it more predictable than using a library such as `unidecode`.
Consider the following example:

```python
import spacy
import edsnlp
from edsnlp.matchers.utils import get_text

config = dict(
@@ -116,7 +116,7 @@ config = dict(
pollution=False,
)

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer", config=config)

text = "Pneumopathie à NBNbWbWbNbWbNBNbNbWbW `coronavirus'"
@@ -135,7 +135,7 @@ Apostrophes and quotation marks can be encoded using unpredictable special chara
Consider the following example:

```python
import spacy
import edsnlp
from edsnlp.matchers.utils import get_text

config = dict(
@@ -146,7 +146,7 @@ config = dict(
pollution=False,
)

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer", config=config)

text = "Pneumopathie à NBNbWbWbNbWbNBNbNbWbW `coronavirus'"
@@ -169,7 +169,7 @@ matching.
`ignore_space_tokens` parameter token to True in a downstream component.

```python
import spacy
import edsnlp

config = dict(
lowercase=False,
@@ -179,7 +179,7 @@ config = dict(
pollution=False,
)

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer", config=config)

doc = nlp("Phrase avec des espaces \n et un retour à la ligne")
@@ -194,7 +194,7 @@ The pollution pipeline uses a set of regular expressions to detect pollutions (i
Consider the following example:

```python
import spacy
import edsnlp
from edsnlp.matchers.utils import get_text

config = dict(
@@ -205,7 +205,7 @@ config = dict(
pollution=True,
)

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer", config=config)

text = "Pneumopathie à NBNbWbWbNbWbNBNbNbWbW `coronavirus'"
@@ -231,9 +231,9 @@ Pollution can come in various forms in clinical texts. We provide a small set of
For instance, if we consider biology tables as pollution, we only need to instantiate the `normalizer` pipe as follows:

```python
import spacy
import edsnlp

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe(
"eds.normalizer",
config=dict(
@@ -260,9 +260,9 @@ If you want to exclude specific patterns, you can provide them as a RegEx (or a
For instance, to consider text between "AAA" and "ZZZ" as pollution you might use:

```python
import spacy
import edsnlp

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe(
"eds.normalizer",
config=dict(
4 changes: 2 additions & 2 deletions docs/pipelines/ner/behaviors/overview.md
@@ -40,9 +40,9 @@ Some general considerations about those components:
## Usage

```{ .python .no-check }
import spacy
import edsnlp
nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
"eds.normalizer",
4 changes: 2 additions & 2 deletions docs/pipelines/overview.md
@@ -41,9 +41,9 @@ EDS-NLP provides easy-to-use pipeline components (aka pipes).
You can add them to your pipeline by simply calling `add_pipe`, for instance:

```python
import spacy
import edsnlp

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.tnm")
8 changes: 8 additions & 0 deletions docs/pipelines/trainable/embeddings/span_pooler.md
@@ -0,0 +1,8 @@
# Span Pooler {: #edsnlp.pipelines.trainable.embeddings.span_pooler.factory.create_component }

::: edsnlp.pipelines.trainable.embeddings.span_pooler.factory.create_component
options:
heading_level: 2
show_bases: false
show_source: false
only_class_level: true
1 change: 1 addition & 0 deletions docs/pipelines/trainable/overview.md
@@ -12,6 +12,7 @@ All trainable components implement the [`TorchComponent`][edsnlp.core.torch_comp
|----------------------|----------------------------------------------------------------------|
| `eds.transformer` | Embed text with a transformer model |
| `eds.text_cnn` | Contextualize embeddings with a CNN |
| `eds.span_pooler` | A span embedding component that aggregates word embeddings |
| `eds.ner_crf` | A trainable component to extract entities |
| `eds.span_qualifier` | A trainable component for multi-class multi-label span qualification |

3 changes: 2 additions & 1 deletion docs/scripts/griffe_ext.py
@@ -68,7 +68,8 @@ def on_instance(self, node: Union[ast.AST, ObjectNode], obj: Object) -> None:
return

callee = (
runtime_obj.__init__ if hasattr(runtime_obj, "__init__")
runtime_obj.__init__
if hasattr(runtime_obj, "__init__")
else runtime_obj
)
spec = inspect.getfullargspec(callee)
8 changes: 4 additions & 4 deletions docs/tokenizers.md
@@ -21,15 +21,15 @@ To instantiate one of the two languages, you can call the `spacy.blank` method.
=== "EDSLanguage"

```python
import spacy
import edsnlp

nlp = spacy.blank("eds")
nlp = edsnlp.blank("eds")
```

=== "FrenchLanguage"

```python
import spacy
import edsnlp

nlp = spacy.blank("fr")
nlp = edsnlp.blank("fr")
```