This repository has been archived by the owner on Jun 3, 2024. It is now read-only.

Functionality for NLP on psychiatric clinical text

vmenger/psynlp

psynlp --- NLP functionality for psychiatric text

❗ Most of the functionality in this project is now available in the library clinlp: production-ready NLP pipelines for Dutch clinical text. Although the code here might still benefit some projects, the project itself is no longer maintained and has therefore been archived.

This package bundles some functionality for applying NLP (preprocessing) techniques to clinical text in psychiatry. Specifically, it contains the following submodules:

  • preprocessing -- Preprocessing text
  • spelling -- Spelling correction
  • entity -- Entity matching
  • context -- Detecting properties of entities (e.g. negation, plausibility) based on context

These submodules are further documented in their respective readmes, which you will find by following the links above.

Installation

Since some paths need to be initialized, installation is most easily done by downloading the source, modifying the paths in psynlp/utils.py (see Requirements below), and running:

pip install -r requirements.txt
python setup.py install 

Dependencies

The psynlp package has the following dependencies (automatically installed when using the commands above):

  • doublemetaphone
  • gensim
  • nltk
  • pandas
  • spacy

Requirements

Some functionality requires specific models, which are not included in the repository because of their privacy-sensitive nature. Their paths should be specified in psynlp/utils.py.

  • A spaCy model, e.g. the standard Dutch model, which can be installed with python -m spacy download nl_core_news_sm.
  • A gensim trained Word2Vec model, used for the EmbeddingRanker in the spelling module.
  • Token frequencies in the specific corpus required for the NoisyRanker, in a csv file (;-separated with a token and a frequency column).
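
The token frequency file described above can be produced in several ways. The following is a minimal sketch using only the Python standard library. It assumes that the header names are literally token and frequency and that lowercased, whitespace-separated tokens are adequate. Neither assumption is confirmed here, and the tokenization should match whatever the spelling module uses. The function names write_token_frequencies and read_token_frequencies are hypothetical helpers and are not part of psynlp.

```python
import csv
from collections import Counter


def write_token_frequencies(texts, path):
    """Count tokens in an iterable of texts and write them to a ;-separated csv.

    Assumption: tokens are lowercased and split on whitespace; adapt this
    to the tokenization used in your own pipeline.
    """
    counts = Counter(token for text in texts for token in text.lower().split())
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        # Assumed header names, based on the column description in this README
        writer.writerow(["token", "frequency"])
        for token, frequency in counts.most_common():
            writer.writerow([token, frequency])
    return counts


def read_token_frequencies(path):
    """Read the csv back into a dict that maps each token to its frequency."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=";")
        return {row["token"]: int(row["frequency"]) for row in reader}
```

Calling write_token_frequencies(corpus_texts, "token_frequencies.csv") on a list of strings writes the file; its path can then be specified in psynlp/utils.py.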

Usage

psynlp follows an object-oriented paradigm, much like the sklearn library for machine learning. For instance, to use the spelling correction from the spelling submodule, the following code can be used:

from psynlp.spelling import SpellChecker

# Replace the placeholder with the name of an installed spaCy model
c = SpellChecker(spacy_model="your_spacy_model_name")
c.correct("Dit is een tekst met daarin een splefout")
# returns "Dit is een tekst met daarin een spelfout"

Usage is further documented in detail in the respective submodule READMEs.

Examples

Basic usage and API of each submodule is documented in the submodule README. Additionally, some use cases are documented in the following notebooks (also referenced in the relevant submodule READMEs):

Contributors

Vincent Menger -- Conceptualization, developing code

Nick Ermers -- Improving context detection
