Feature request: relieve constraints on non edsnlp custom attributes #220

cvinot · 2023-10-03T10:01:13Z

Feature type

Enhance compatibility of EDSnlp custom attributes with potential external pipelines.

Description

in the BaseComponent class, in this commit you added this line :

        Span.set_extension(
            "value",
            getter=lambda span: span._.get(span.label_)
            if span._.has(span.label_)
            else None,
            force=True,
        )`

Doing this, you are enforcing (overwriting if already defined) an attribute which is non edsnlp specific. i.e not named specificaly for your use case, and could be very easily required by anyone using your package as part of a broader pipeline.(see spacy good practices regarding naming components/attributes)

Forcing a getter function means if later on a component were to try to set a value, it would be ignored.

Example of potential conflict:

import spacy
# Setting up a first pipeline
random_nlp = spacy.blank("fr")
random_nlp.add_pipe(
    "eds.terminology", # This component is enforcing the "value" custom attribute
    name="test",
    config=dict(label="Any",terms={}),
)

# Setting up another custom pipeline somewhere else in the code
nlp = spacy.blank("fr")
text = "hello this is a test"
doc = nlp(text)
my_span = doc[0:3]
my_span._.value = "CustomValue" # This raises no error.

assert my_span._.value == "CustomValue" # Error: my_span._.value is None.

WIth my modest experience I would suggest avoiding enforcing attributes in general, but if necessary, renaming the attribute to avoid conflicts. If not possible allowing renaming of the attribute, and/or make sure you let the user know what attributes you are enforcing. (I believe this should be made very clear in the doc specifically in the case of attributes such as "value").

I ran into this conflict upgrading edsnlp from 0.7.4 to 0.9 I might be missing something here and would love to hear what the reasons are if this is absolutely needed.

The text was updated successfully, but these errors were encountered:

percevalw · 2023-10-04T14:53:49Z

Thank you for letting us know! Indeed, this is problematic. This change was made to provide a uniform, generic, access to the normalized entity value (see #47), useful for printing results, exporting as a dataframe, etc, regardless of the span label.
I feel we can have the best of both worlds by querying the user_data dict first, and running the getter if the search failed.

In any case, you're right:

force should be False
this should be better documented
a warning can be displayed when the user sets the extension directly to remind them that its intended to be used as a getter

cvinot · 2023-10-05T08:35:25Z

Thanks for the PR link, it does make more sense understanding your approach.

However I still find dubious the use of the "value" extension you are suggesting, as a label_ getter. It seems you are basically erasing this extension by linking it to "label_" which is basically limited to span categories, when it could simply be used to different use cases. I do understand the will to store normalized values.

Why not get directly the "label_" attribute when needed ?

In my use case I see the value extension more like a free entry value which could be set and overridden by different process, sort of how you use the value of your measurements. I run various pipelines, which store information in their respective specific attributes, defined in specifically named attributes, and "decisions" pipeline which decide for a value given the information of the spans. That means the value is not specifically normalized and could be of many shapes including custom models.

I think attributes with strong constraints should be isolated in very specific names to avoid overlapping with more general attributes if that makes sense. I'm personnaly not scared of verbose, we could imagine "span._.label_getter" with the same usecase.

Anyway thanks for explaining your view, I can understand that multiple approaches can be considered.

percevalw · 2023-10-05T13:08:18Z

In my use case I see the value extension more like a free entry value which could be set and overridden by different process

Agreed, we will soon allow this. But in the case a value has not been set by the user, we will keep the getter behavior.

Why not get directly the "label_" attribute when needed ?

It's true that just typing span._.get(span.label_) would often be sufficient. However, we have multiple use cases in which we list the extensions that we want to see exported / learned / evaluated, etc: with the value extension, just listing it in my_func(extensions=["value", "negation"]) solves this.

That means the value is not specifically normalized and could be of many shapes including custom models.

In our case, a normalized value can also be an instance of a specific class (for instance an AbsoluteDate object or a SimpleMeasurement), or a str if it is enough to semantically describe the extraction.

cvinot changed the title ~~Feature request: [feature]~~ Feature request: relieve constraints on non edsnlp custom attributes Oct 3, 2023

percevalw mentioned this issue Dec 4, 2023

Compatible Span._.value extension #228

Merged

3 tasks

percevalw closed this as completed in #228 Dec 15, 2023

percevalw mentioned this issue Dec 15, 2023

Bump version to 0.10.1 #237

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: relieve constraints on non edsnlp custom attributes #220

Feature request: relieve constraints on non edsnlp custom attributes #220

cvinot commented Oct 3, 2023

percevalw commented Oct 4, 2023

cvinot commented Oct 5, 2023 •

edited

Loading

percevalw commented Oct 5, 2023

Feature request: relieve constraints on non edsnlp custom attributes #220

Feature request: relieve constraints on non edsnlp custom attributes #220

Comments

cvinot commented Oct 3, 2023

Feature type

Description

percevalw commented Oct 4, 2023

cvinot commented Oct 5, 2023 • edited Loading

percevalw commented Oct 5, 2023

cvinot commented Oct 5, 2023 •

edited

Loading