Architecture choice on custom extensions #47
Comments
@percevalw, @Thomzoy, @Aremaki, I'd love to get your thoughts on this!
Great idea, I second you on this. Here are a few points from our IRL discussion: As most end users would either only use the existing attributes (
As the Following your
Sounds good! 🎉
As discussed, here is a potential solution that standardizes the current architecture for custom extensions @Thomzoy @aricohen93
A specific This way, we can keep a consistent typing of each extension ( This does not prevent defining other extensions if needed, or keeping the old entity extensions and deprecating them in future versions.
Following the discussion with @Thomzoy, we carry on with the approach commented above:
For instance, the following (non-exhaustive) modifications should be made:
These suggestions have been integrated in #213
Description
The way we've handled spaCy extensions in EDS-NLP has been erratic at best, with each pipeline declaring its own set of new extensions, cluttering spaCy's Underscore object. For instance, the pipelines eds.dates, eds.measures and eds.emergency.priority all include a parsing component, and each pipeline saves its result in a different attribute. We can clearly do better than that by adopting a more holistic and uniform approach.
Proposition
To start off the discussion, here are a few ideas/questions:
- Object._.edsnlp master attribute. This would avoid cluttering the extensions, and leave more room for the end users.
- value or parsed.
- norm key, that contains the normalised variant for a given entity (e.g. stripped of pollution, accents, etc.). A reasonable idea is to provide the text used for matching. The normalised variant should perhaps be computed on the fly, using a getter function?
- edsnlp extensions, we could use an Underscore object. Not sure if that is overkill and/or incurs significant added complexity.
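To make the idea of a single master attribute with an on-the-fly norm more concrete, here is a minimal, library-free sketch. It does not use spaCy: FakeSpan, EdsnlpNamespace and normalise are hypothetical names invented for illustration, and the normalisation (lowercasing plus accent stripping) is only a stand-in for whatever EDS-NLP would actually apply.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Any


def normalise(text: str) -> str:
    """Lowercase the text and strip accents (stand-in for real normalisation)."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(
        char for char in decomposed if not unicodedata.combining(char)
    )
    return without_accents.lower()


@dataclass
class EdsnlpNamespace:
    """Hypothetical container sitting behind a single `edsnlp` attribute."""

    text: str
    value: Any = None  # parsed result, whatever the pipeline produces

    @property
    def norm(self) -> str:
        # Computed on access, getter-style, so it is never stored or stale.
        return normalise(self.text)


@dataclass
class FakeSpan:
    """Toy stand-in for a spaCy Span; an assumption, not the real class."""

    text: str
    edsnlp: EdsnlpNamespace = field(init=False)

    def __post_init__(self) -> None:
        self.edsnlp = EdsnlpNamespace(text=self.text)


span = FakeSpan("Fièvre")
span.edsnlp.value = 38.5  # every pipeline writes to the same place
print(span.edsnlp.norm)
```

With spaCy itself, the same behaviour would presumably go through a getter registered with Span.set_extension, so that the value is read as span._.edsnlp; whether the namespace should be a plain object like this or something closer to spaCy's own Underscore is exactly the open question above.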