Issue on overlap entities in the task-1 training set #8

jplu · 2015-03-27T22:41:33Z

Hi,

I found a new bug in the training set. This one is about two overlapping entities:

<http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-15#char=12,28>
        a                     nif:String , nif:RFC5147String ;
        nif:anchorOf          "Auburn, New York"@en ;
        nif:beginIndex        "12"^^xsd:int ;
        nif:endIndex          "28"^^xsd:int ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-15#char=0,145> ;
        itsrdf:taIdentRef     <http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/Auburn,_New_York> .

And

<http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-15#char=2,27>
        a                     nif:String , nif:RFC5147String ;
        nif:anchorOf          "native of Auburn, New Yor"@en ;
        nif:beginIndex        "2"^^xsd:int ;
        nif:endIndex          "27"^^xsd:int ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/sentence-15#char=0,145> ;
        itsrdf:taIdentRef     <http://www.ontologydesignpatterns.org/data/oke-challenge/task-1/Native_of_Auburn,_New_York_1> .

I think the second one is incorrect.
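For reference, here is a minimal Python sketch of how the two offset pairs above relate to each other. It assumes that `nif:endIndex` is exclusive, which matches the `nif:anchorOf` lengths in both annotations (28 - 12 = 16 and 27 - 2 = 25 characters). The names `Span` and `relation` are hypothetical helpers, not part of NIF or of the challenge tooling.

```python
from typing import NamedTuple


class Span(NamedTuple):
    begin: int  # nif:beginIndex
    end: int    # nif:endIndex, assumed exclusive


def relation(a: Span, b: Span) -> str:
    """Classify two character spans as disjoint, nested, or crossing."""
    if a.end <= b.begin or b.end <= a.begin:
        return "disjoint"
    a_contains_b = a.begin <= b.begin and b.end <= a.end
    b_contains_a = b.begin <= a.begin and a.end <= b.end
    if a_contains_b or b_contains_a:
        return "nested"
    # The spans share characters, but neither fully contains the other.
    return "crossing"


place = Span(12, 28)   # "Auburn, New York"
person = Span(2, 27)   # "native of Auburn, New Yor"
print(relation(place, person))
```

With the offsets as given, the two spans cross rather than nest, because the second annotation ends at 27 while the first ends at 28.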

Cheers.


anuzzolese · 2015-03-31T10:25:43Z

Hi Julien,

this overlap identifies two different entities:

oke:Auburn,_New_York, which is a place;
oke:Native_of_Auburn,_New_York_1, which is a person.

This distinction is correct, hence the overlap is correct as well.

giusepperizzo · 2015-03-31T11:10:23Z

Well, if so, why not tag/link New_York as well?

Would you mind detailing a bit more how you handled nested entities when creating the GS?

rtroncy · 2015-03-31T13:30:39Z

@anuzzolese Can we please re-open this issue? This is serious, since handling nested entities is a very _hard_ problem for the community. It is fine that the organizers of the challenge want to consider it, but then you need to communicate the clear guidelines that were provided to the annotators. For example, @giusepperizzo just gave you an example that raises the question of why not all possible nested entities have been annotated. Next, you need to guarantee that these guidelines are applied consistently between the training and the test sets.

Warning: you are really opening a can of worms by considering nested entities. You are likely to face a long adjudication phase in which all the systems that participated in the challenge will come back, complain, and ask to re-compute the figures once they discover inconsistencies.
Are you sure you want this?

anuzzolese · 2015-03-31T13:52:37Z

@giusepperizzo and @rtroncy I see your point and I agree that overlapping entities are very hard to address.

I asked the annotators to report any distinct entities in case of overlaps. In this case the annotator found two distinct entities and considered New_York as a characterisation (a way of disambiguating) of Auburn.
However, the comment is highly pertinent, and this way of generating entities might introduce problems in the evaluation. In fact, someone could say that New York is a mention of another entity.

Hence, in my opinion there are two possibilities:

1. take into account all the nested entities (I will personally take care of updating the training set accordingly);
2. remove the identification of nested entities.

The issue is reopened.
WDYT?

rtroncy · 2015-03-31T18:21:48Z

Thanks for re-opening the issue. For the purposes of the challenge, I think you should go for your second option, i.e. remove all identification of nested entities, in both the training and test datasets, and only consider the "largest" entity (this is often the one with the longest surface form).
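A minimal Python sketch of what "only consider the largest entity" could look like on `(beginIndex, endIndex)` pairs. It assumes exclusive end offsets and a greedy longest-first strategy, which is only one possible reading of "largest"; `keep_largest` is a hypothetical helper, not something used by the challenge.

```python
def keep_largest(spans):
    """Keep the longest span in each group of overlapping spans.

    Greedy strategy (an assumption): visit spans from longest to
    shortest and keep a span only if it overlaps none already kept.
    """
    kept = []
    by_length = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    for begin, end in by_length:
        # Two spans are disjoint when one ends before the other begins.
        if all(end <= kb or ke <= begin for kb, ke in kept):
            kept.append((begin, end))
    return sorted(kept)


# The two annotations from sentence-15 in this issue.
print(keep_largest([(12, 28), (2, 27)]))
```

On the two sentence-15 annotations this keeps only the 2-27 span, since it is the longer of the two and the 12-28 span overlaps it.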

Annotating the dataset in terms of nested entities is also a very valuable effort and, if you're willing to do it, it might be of great benefit for the community. This resource will be useful post-challenge for performing additional experiments. For example, TAC 2014 treated nested entities as optional (for the systems that wanted to run some trials), but this was not part of the official competition, since the community is still learning how this complex problem should be scored and evaluated.

jplu · 2015-05-28T08:02:45Z

Related to this issue, there are two more cases:

In sentence 64: the two entities "Methodist Episcopal" and "clergyman" directly follow each other. They are both extracted and typed as a "Role", whereas the entity could be "Methodist Episcopal clergyman".
In sentence 31: the entity "Wiveliscombe" is extracted, whereas the correct entity might be "a native of Wiveliscombe".

anuzzolese closed this as completed Mar 31, 2015

anuzzolese reopened this Mar 31, 2015

jplu mentioned this issue Apr 5, 2015

Maybe mention extraction errors in task-1 training set #20

Closed

rtroncy mentioned this issue Jul 19, 2015

Clean evaluation test for task1 #29

Open
