Refactor evaluation to allow span based metrics #71
base: feature/new-datagen-and-eval
Changes from all commits
9018e6c
c7c8f30
f6c3840
c6fb0e4
bcd142a
d74aba1
c64d2c2
b0b7dcb
9bf20e5
d3c8caa
106aeeb
abc9d5d
e285ef4
e3acf9c
c92b543
d123cca
@@ -4,6 +4,7 @@
from collections import Counter

import pandas as pd
import numpy as np
import spacy
from spacy import Language
from spacy.tokens import Doc, DocBin

@@ -73,6 +74,14 @@ def intersect(self, other, ignore_entity_type: bool):
        return min(self.end_position, other.end_position) - max(
            self.start_position, other.start_position
        )

    def get_overlap_ratio(self, other):
Review comment (suggested change): I know we don't have type hints across the entire codebase, but let's at least add them to the methods we introduce, so the codebase modernizes as we go.

        """
        Calculates the ratio as: ratio = 2.0*M / T , where M = matches , T = total number of elements in both sequences
        """
        nb_matches = self.intersect(other, ignore_entity_type = True)
Review comment: Will we always want to ignore the entity type? Perhaps we should pass it as an argument to the function.

        total_characters = (self.end_position - self.start_position) + (other.end_position - other.start_position)
        return np.round((2*nb_matches/total_characters), 2)
Review comment: Is there any theoretical chance that …
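
For discussion, here is a minimal sketch of how `get_overlap_ratio` might look with the review feedback folded in: type hints and an `ignore_entity_type` argument instead of a hard-coded `True`. The `Span` class below is a simplified stand-in, not the class from this PR. Its fields, the entity-type check in `intersect`, and the clamping of negative overlaps are assumptions, since the diff only shows the `return` statement of `intersect`. The zero-denominator guard is likewise an added assumption and not something the diff specifies. Built-in `round` replaces `np.round` only to keep the sketch dependency-free.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Simplified stand-in for the PR's Span class (fields are assumed)."""

    entity_type: str
    start_position: int
    end_position: int

    def intersect(self, other: "Span", ignore_entity_type: bool) -> int:
        # Assumption: spans with different entity types do not intersect
        # unless ignore_entity_type is set. The diff shows only the return.
        if not ignore_entity_type and self.entity_type != other.entity_type:
            return 0
        overlap = min(self.end_position, other.end_position) - max(
            self.start_position, other.start_position
        )
        # Assumption: clamp disjoint spans to 0 so the ratio is never negative.
        return max(overlap, 0)

    def get_overlap_ratio(
        self, other: "Span", ignore_entity_type: bool = True
    ) -> float:
        """
        ratio = 2.0 * M / T, where M is the number of overlapping characters
        and T is the total number of characters in both spans.
        """
        nb_matches = self.intersect(other, ignore_entity_type=ignore_entity_type)
        total_characters = (self.end_position - self.start_position) + (
            other.end_position - other.start_position
        )
        if total_characters == 0:
            # Both spans are empty: return 0.0 rather than dividing by zero.
            return 0.0
        return round(2 * nb_matches / total_characters, 2)
```

With this sketch, `Span("PERSON", 0, 10).get_overlap_ratio(Span("PERSON", 5, 15))` overlaps on 5 of 20 total characters and returns 0.5. Defaulting `ignore_entity_type` to `True` keeps the behaviour of the current diff for existing callers.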

    @classmethod
    def from_faker_span(cls, faker_span: FakerSpan) -> "Span":

@@ -1,5 +1,6 @@
from .model_error import ModelError
from .evaluator_objects import SpanOutput, TokenOutput, ModelPrediction
from .sample_error import SampleError
from .evaluation_result import EvaluationResult
from .evaluator import Evaluator

__all__ = ["ModelError", "EvaluationResult", "Evaluator"]
__all__ = ["SpanOutput", "TokenOutput", "ModelPrediction", "SampleError", "EvaluationResult", "Evaluator"]