Issue with overlapping entities in the task-1 training set #8
Hi Julien, this overlap identifies two different entities:
This distinction is correct; hence the overlap is correct as well.
Well, if so, why not tag/link New_York as well? Would you mind explaining in a bit more detail how you handled nested entities when creating the gold standard (GS)?
@anuzzolese Can we please reopen this issue? This is serious, since nested entities are a very _hard_ problem for the community. It is fine that the challenge organizers want to consider them, but then you need to communicate the clear guidelines that were provided to the annotators. For example, @giusepperizzo just gave you an example that raises the question of why not all possible nested entities have been annotated. Next, you need to guarantee that the annotations are consistent between the training and the test sets. Warning: by considering nested entities, you are really opening a can of worms. You are likely to face a long adjudication phase in which every system that participated in the challenge comes back, complains about the inconsistencies it has discovered, and asks for the figures to be recomputed.
@giusepperizzo and @rtroncy I see your point, and I agree that overlapping entities are a very hard task to address. I asked annotators to report the distinct entities involved whenever spans overlap. In this case, the annotator found two distinct entities and considered New_York a characterisation (a way of disambiguating) of Auburn. Hence, in my opinion, there are two possibilities:
The issue has been reopened.
Thanks for reopening the issue. For the purposes of the challenge, I think you should go with your second option, i.e. remove all annotations of nested entities in both the training and test datasets and consider only the "largest" entity (usually the one with the longest surface form). Annotating the dataset with nested entities is also a very valuable effort and, if you are willing to do it, it could greatly benefit the community: such a resource would be useful after the challenge for running additional experiments. For example, TAC 2014 treated nested entities as optional (for systems that wanted to experiment with them), but they were not part of the official competition, since the community is still learning how this complex problem should be scored and evaluated.
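To make the "keep only the largest entity" option concrete, here is a minimal sketch of how nested (and, more generally, overlapping) annotations could be filtered. It assumes a hypothetical annotation format of character offsets `[start, end)` plus a link target; the `Span` record, the `keep_largest` helper, and the example targets are illustrative names, not part of the challenge dataset or its tooling. Spans are visited longest first, and a span is kept only if it shares no character with an already-kept span, so a nested entity such as New_York inside a longer mention is dropped.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical annotation record: character offsets [start, end) plus a link target."""
    start: int
    end: int
    target: str


def keep_largest(spans: List[Span]) -> List[Span]:
    """Drop every span that overlaps a longer one, keeping only the outermost entity.

    Spans are visited longest first (ties broken by earlier start); a span is kept
    only if it shares no character with a span that has already been kept.
    """
    kept: List[Span] = []
    for span in sorted(spans, key=lambda s: (-(s.end - s.start), s.start)):
        overlaps = any(span.start < k.end and k.start < span.end for k in kept)
        if not overlaps:
            kept.append(span)
    # Return the surviving spans in document order.
    return sorted(kept, key=lambda s: s.start)


if __name__ == "__main__":
    text = "Auburn, New York"
    annotations = [
        Span(0, 16, "Auburn,_New_York"),  # hypothetical outer entity
        Span(8, 16, "New_York"),          # hypothetical nested entity
    ]
    for s in keep_largest(annotations):
        print(text[s.start:s.end], "->", s.target)
```

Note that this greedy rule also resolves partial (non-nested) overlaps in favour of the longer span; whether that is the desired behaviour for such cases would need to be stated in the annotation guidelines.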
Related to this issue, there are two more cases:
Hi,
I found a new bug in the training set; this one concerns the overlap of two entities:
And
I think the second one is incorrect.
Cheers.