
Refactor evaluation to allow span based metrics #71

Open
wants to merge 16 commits into base: feature/new-datagen-and-eval
Conversation

tranguyen221 (Collaborator)

  • Implement the compare_span function in the Evaluator class, which takes lists of annotated_span and predicted_span and produces the numeric evaluation output (see the usage sketch below)
  • Implement the get_overlap_ratio function in the Span class to calculate the overlap ratio between annotated and predicted offsets
  • Add unit tests for these functions
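
A rough usage sketch of the new API. The Span constructor fields, import paths, and the compare_span signature here are assumptions for illustration, not necessarily the PR's exact interface:

from presidio_evaluator.data_objects import Span  # import path assumed
from presidio_evaluator.evaluation import Evaluator

# one annotated (gold) span and one predicted span over the same offsets
annotated = [Span(entity_type="PERSON", entity_value="Dana", start_position=0, end_position=4)]
predicted = [Span(entity_type="PERSON", entity_value="Dana", start_position=0, end_position=4)]

evaluator = Evaluator()
span_metrics = evaluator.compare_span(annotated, predicted)  # returns the numeric span metrics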

@melmatlis (Collaborator)

A general comment regarding the data_object.py file, unrelated to this PR: the file is really long and contains two classes. Perhaps we can add a separate task to create a "data objects" folder and split the classes into separate files?
@omri374 @tranguyen221 what are your thoughts?

@omri374 (Contributor) commented Feb 5, 2023

@melmatlis I agree, but I suggest waiting on this until we finalize all the changes, to avoid unnecessary merge conflicts.

@microsoft microsoft deleted a comment from azure-pipelines bot Feb 5, 2023
@microsoft microsoft deleted a comment from azure-pipelines bot Feb 5, 2023
@omri374 omri374 changed the base branch from feature/refactor-evaluator to master February 5, 2023 14:56
@omri374 omri374 changed the base branch from master to data-generator-2.1 February 5, 2023 14:56
@omri374 omri374 changed the base branch from data-generator-2.1 to master February 5, 2023 19:58
@omri374 (Contributor) commented Feb 5, 2023

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@omri374 omri374 changed the base branch from master to feature/new-datagen-and-eval February 6, 2023 09:28
@omri374 (Contributor) left a comment

Publishing initial comments as discussed in our code review



class Evaluator:
    def __init__(
        self,
        verbose: bool = False,
        compare_by_io=True,
        entities_to_keep: Optional[List[str]] = None,
        entities_to_keep=True,
Contributor:

Shouldn't this be a list of entities? If the logic has changed, please update the name of the argument and its docstring

@@ -37,6 +40,25 @@ def __init__(
        self.entities_to_keep = entities_to_keep
        self.span_overlap_threshold = span_overlap_threshold

        # set up a dict for storing the span metrics
        self.span_model_metrics = {
Contributor:

What are your thoughts on having this as a class/dataclass?
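
One possible shape if the dict were replaced by a dataclass. The field names below are assumptions based on typical span metrics, not the PR's actual keys:

from dataclasses import dataclass

@dataclass
class SpanModelMetrics:
    correct: int = 0    # exact boundary and entity match
    incorrect: int = 0  # boundary match, wrong entity type
    partial: int = 0    # partial boundary overlap
    missed: int = 0     # annotated span with no matching prediction
    spurious: int = 0   # predicted span with no matching annotation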

@@ -0,0 +1,156 @@
import numpy as np
Contributor:

Can this file move into the evaluation folder?

"""
Given a predicted_span, get the best matchest annotated_span based on the overlap_threshold.
Return a SpanOutput
:param sample: InputSample
Contributor:

Please align docstring with function signature

from presidio_evaluator.evaluation import SpanOutput


def get_matched_gold(predicted_span: Span,
Contributor:

Is this method used? If not, consider removing. If yes, is the logic aligned with the docstring?

"""Find the overlap between two ranges
Find the overlap between two ranges. Return the overlapping values if
present, else return an empty set().
Examples:
Contributor:

Please add params to docstring, change the example format (:Example:) and add type hints
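
For illustration, the requested docstring style might look like the sketch below, assuming find_overlap operates on sets (or ranges) of character indices:

def find_overlap(true_range: set, pred_range: set) -> set:
    """Find the overlap between two ranges.

    Return the overlapping values if present, else an empty set.

    :param true_range: set of annotated character indices
    :param pred_range: set of predicted character indices
    :return: the intersection of the two ranges

    :Example:
        >>> find_overlap({1, 2, 3}, {2, 3, 4})
        {2, 3}
    """
    return set(true_range).intersection(pred_range)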

@@ -73,6 +74,14 @@ def intersect(self, other, ignore_entity_type: bool):
        return min(self.end_position, other.end_position) - max(
            self.start_position, other.start_position
        )

    def get_overlap_ratio(self, other):
Contributor:

Suggested change:
- def get_overlap_ratio(self, other):
+ def get_overlap_ratio(self, other: "Span") -> float:

I know we don't have type hints across the entire codebase, but let's try to update at least the methods we add to modernize the codebase.

from pathlib import Path
from copy import deepcopy
from difflib import SequenceMatcher
Contributor:

Please remove unused imports.

@melmatlis (Collaborator) left a comment

Hey Trang, thanks for the PR. A lot of work went into these complicated modifications; thank you. May we please have a peer review session on the evaluator.py file? I need some guidance on the code readability. Thank you.

@@ -73,6 +74,14 @@ def intersect(self, other, ignore_entity_type: bool):
        return min(self.end_position, other.end_position) - max(
            self.start_position, other.start_position
        )

    def get_overlap_ratio(self, other):
Collaborator:

Suggested change:
- def get_overlap_ratio(self, other):
+ def get_overlap_ratio(self, other: Span):

"""
Calculates the ratio as: ratio = 2.0*M / T , where M = matches , T = total number of elements in both sequences
"""
nb_matches = self.intersect(other, ignore_entity_type = True)
Collaborator:

Will we always want to ignore the entity type? Perhaps we should pass it as an argument to the function?

"""
nb_matches = self.intersect(other, ignore_entity_type = True)
total_characters = (self.end_position - self.start_position) + (other.end_position - other.start_position)
return np.round((2*nb_matches/total_characters), 2)
Collaborator:

Is there any theoretical chance that total_characters will be equal to 0?
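
Pulling these review points together, one possible sketch of the Span method with type hints, a configurable ignore_entity_type, and a guard against a zero-length total — an illustration, not the final PR code:

import numpy as np

def get_overlap_ratio(self, other: "Span", ignore_entity_type: bool = True) -> float:
    """Return 2.0 * M / T, where M is the number of overlapping characters
    and T is the total number of characters in both spans.

    Example: spans [5, 10) and [8, 12) give M = 2, T = 9, ratio = 0.44.
    """
    nb_matches = self.intersect(other, ignore_entity_type=ignore_entity_type)
    total_characters = (self.end_position - self.start_position) + (
        other.end_position - other.start_position
    )
    if total_characters == 0:
        # two zero-length spans cannot overlap; avoid division by zero
        return 0.0
    return float(np.round(2 * nb_matches / total_characters, 2))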

[
(150, 153, "123", "A", 150, 153, "123", "A", True),
(150, 153, "123", "B", 150, 153, "123", "A", False),
(150, 153, "123", "A", 150, 153, "345", "A", False),
Collaborator:

Is it possible that the same range will have different entity values?

@@ -1,27 +1,30 @@
from collections import Counter
from typing import List, Optional, Dict, Tuple
from pathlib import Path
from copy import deepcopy
Collaborator:

@tranguyen221 May we please do peer review on this file?

    def __eq__(self, other):
        return (
            self.output_type == other.output_type
            and self.overlap_score == other.overlap_score
Collaborator:

Perhaps we should compare the floats using math.isclose or some other alternative in order to avoid floating point comparison errors?
[screenshot: a floating-point equality pitfall]
This is one alternative:
[screenshot: a math.isclose-based comparison]
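
A minimal sketch of the math.isclose approach on a simplified stand-in for the PR's SpanOutput (only the two compared fields are shown):

import math

class SpanOutput:  # simplified stand-in for the PR's SpanOutput
    def __init__(self, output_type: str, overlap_score: float):
        self.output_type = output_type
        self.overlap_score = overlap_score

    def __eq__(self, other):
        # compare floats with a tolerance instead of == to avoid
        # floating-point errors such as 0.1 + 0.2 != 0.3
        return (
            self.output_type == other.output_type
            and math.isclose(self.overlap_score, other.overlap_score, rel_tol=1e-9)
        )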

@@ -22,57 +22,79 @@ def get_matched_gold(predicted_span: Span,
        overlap_score=0
    )

def find_overlap(true_range, pred_range):
Collaborator:

Should we move this file to be part of the evaluation directory as well? What is the logic for what is included in or excluded from that directory?

def span_compute_actual_possible(results: dict) -> dict:
    """
    Takes a result dict that has been output by compute metrics.
    Returns the results dict with actual and possible populated.
Collaborator:

I would propose updating the docstring further to explain what "actual" and "possible" refer to, and adding the formulas to the docstring as well.
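
For reference, in this style of span evaluation "possible" is usually the number of annotated (gold) spans and "actual" the number of predicted spans, so that precision = correct / actual and recall = correct / possible. A sketch with assumed key names:

def span_precision_recall(results: dict) -> dict:
    # actual:   number of predicted spans  -> denominator of precision
    # possible: number of annotated spans  -> denominator of recall
    actual = results["actual"]
    possible = results["possible"]
    correct = results["correct"]
    results["precision"] = correct / actual if actual > 0 else 0.0
    results["recall"] = correct / possible if possible > 0 else 0.0
    return results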

    calculating precision and recall.
    """

    actual = results["actual"]
Collaborator:

Based on the schema you created, we should have a class named EvaluationResult. Are the results of type dict as an intermediate solution?

@omri374 omri374 changed the title Tranguyen/implement compare span Refactor evaluation to allow span based metrics Oct 30, 2023
@omri374 omri374 reopened this Oct 30, 2023