❗ Most of the functionality in this project is now available in the library clinlp: production-ready NLP pipelines for Dutch clinical text. Although the code here might still benefit some projects, this project itself is no longer maintained (and has therefore been archived).
This package bundles some functionality for applying NLP (preprocessing) techniques to clinical text in psychiatry. Specifically, it contains the following submodules:

- `preprocessing` -- Preprocessing text
- `spelling` -- Spelling correction
- `entity` -- Entity matching
- `context` -- Detecting properties of entities (e.g. negation, plausibility) based on context
These submodules are further documented in their respective READMEs, which you can find by following the links above.
Since some paths need to be initialized, installation is most easily done by downloading the source, modifying the paths in `psynlp/utils.py` (see Requirements below), and running:

```
pip install -r requirements.txt
python setup.py install
```
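As an illustration, the path configuration could look something like the snippet below. The variable names here are hypothetical placeholders, not necessarily the names `psynlp/utils.py` actually defines; check the file itself for the exact names it expects.

```python
# Hypothetical excerpt of psynlp/utils.py -- the variable names below are
# placeholders for illustration only; use the names the actual file defines.
SPACY_MODEL_NAME = "nl_core_news_sm"                       # spacy model (see Requirements)
WORD2VEC_PATH = "/path/to/word2vec.model"                  # gensim Word2Vec model
TOKEN_FREQUENCIES_PATH = "/path/to/token_frequencies.csv"  # ;-separated token/frequency csv
```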
The `psynlp` package has the following dependencies (automatically installed when using the commands above):

- `doublemetaphone`
- `gensim`
- `nltk`
- `pandas`
- `spacy`
Some functionality requires specific models, which are not included in the repository because of their privacy-sensitive nature. Their paths should be specified in `psynlp/utils.py`:

- A `spacy` model, which can be obtained here (e.g. `python -m spacy download nl_core_news_sm` for the standard Dutch model)
- A `gensim`-trained Word2Vec model, used for the `EmbeddingRanker` in the `spelling` module
- Token frequencies in the specific corpus, required for the `NoisyRanker`, in a `csv` file (`;`-separated, with a `token` and a `frequency` column); see the sketch below for one way to produce these resources
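For readers who need to produce the latter two resources themselves, the sketch below shows one way to do so with `gensim` and `pandas`. The corpus file name, tokenization, and model parameters are assumptions for illustration, not prescribed by `psynlp`.

```python
# Minimal sketch of preparing the corpus-specific resources listed above.
# Assumes a plain-text corpus at "corpus.txt" (one document per line); the
# file names and tokenization are placeholders, not part of psynlp itself.
from collections import Counter

import pandas as pd
from gensim.models import Word2Vec

# Tokenize the corpus (a whitespace split stands in for a real tokenizer).
with open("corpus.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f if line.strip()]

# Train and save a Word2Vec model, as used by the EmbeddingRanker.
w2v = Word2Vec(sentences=sentences, vector_size=100, min_count=2)
w2v.save("word2vec.model")

# Write the ;-separated token/frequency csv required by the NoisyRanker.
counts = Counter(token for sentence in sentences for token in sentence)
freqs = pd.DataFrame(sorted(counts.items()), columns=["token", "frequency"])
freqs.to_csv("token_frequencies.csv", sep=";", index=False)
```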
`psynlp` follows an object-oriented paradigm, much like the `sklearn` library for machine learning. To use the spelling correction from the `spelling` submodule, for instance, the following code can be used:
```python
from psynlp.spelling import SpellChecker

c = SpellChecker(spacy_model="your_spacy_model_name")
c.correct("Dit is een tekst met daarin een splefout")
>>> "Dit is een tekst met daarin een spelfout"
```
Usage is further documented in detail in the respective submodule READMEs.
Basic usage and the API of each submodule are documented in the submodule README. Additionally, some use cases are documented in the following notebooks (also referenced in the relevant submodule READMEs):

- `preprocessing.ipynb` -- Example code for preprocessing
- `spelling.ipynb` -- Example code for spelling correction
- `entity.ipynb` -- Example code for entity recognition
- `context.ipynb` -- Example code for context matching
- `example_pipeline.ipynb` -- Example code for extracting variables from text, using all four submodules
- Vincent Menger -- Conceptualization, developing code
- Nick Ermers -- Improving context detection