Add implementation of ChemDisGene data set #918

Merged: 1 commit, Jun 4, 2024

mariosaenger
Collaborator

Closes #917

@leonweber leonweber left a comment

Hi, thanks for your contribution! It seems like the dataset statistics reported by the unit tests do not match those reported in the source paper (https://aclanthology.org/2022.lrec-1.116.pdf):

train
==========
id: 523
document_id: 523
passages: 1046
entities: 14248
normalized: 13686
events: 0
coreferences: 0
relations: 59414

Is this expected? Am I missing something?

@mariosaenger
Collaborator Author

Thanks for checking the implementation. There are two aspects to keep in mind. First, the dataset consists of a curated and a non-curated part; this implementation covers only the curated part. Second, the dataset annotates relations only at the abstract level (using knowledge base identifiers). Following default practice in BigBio, I unrolled the document-level relations to mention level. Note, however, that the document-level annotations are available in the source schema. These aspects complicate a direct comparison of the numbers :-/
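
To illustrate why unrolling inflates the relation count, here is a minimal sketch of one way document-level relations could be expanded to mention level. It assumes the expansion pairs every mention of the first argument with every mention of the second; the function name `unroll_relations`, the field names, and the identifiers `CHEM:1` and `GENE:7` are hypothetical and simplified, not taken from the actual loader or the BigBio schema.

```python
from itertools import product
from typing import Dict, List


def unroll_relations(entities: List[dict], doc_relations: List[dict]) -> List[dict]:
    """Expand document-level relations (between knowledge base identifiers)
    into mention-level relations (between entity mention ids).

    Assumption: every mention of the first argument is paired with every
    mention of the second argument.
    """
    # Index mention ids by the knowledge base identifier they are normalized to.
    mentions_by_kb_id: Dict[str, List[str]] = {}
    for entity in entities:
        for kb_id in entity["normalized"]:
            mentions_by_kb_id.setdefault(kb_id, []).append(entity["id"])

    mention_relations = []
    for rel in doc_relations:
        heads = mentions_by_kb_id.get(rel["arg1_kb_id"], [])
        tails = mentions_by_kb_id.get(rel["arg2_kb_id"], [])
        # One mention-level relation per (head mention, tail mention) pair.
        for head, tail in product(heads, tails):
            mention_relations.append(
                {"type": rel["type"], "arg1_id": head, "arg2_id": tail}
            )
    return mention_relations


# Toy abstract: a chemical mentioned twice and a gene mentioned three times.
# All identifiers below are made up for illustration.
entities = [
    {"id": "e1", "normalized": ["CHEM:1"]},
    {"id": "e2", "normalized": ["CHEM:1"]},
    {"id": "e3", "normalized": ["GENE:7"]},
    {"id": "e4", "normalized": ["GENE:7"]},
    {"id": "e5", "normalized": ["GENE:7"]},
]
doc_relations = [
    {"type": "example_relation", "arg1_kb_id": "CHEM:1", "arg2_kb_id": "GENE:7"}
]

unrolled = unroll_relations(entities, doc_relations)
print(len(doc_relations), "document-level ->", len(unrolled), "mention-level")
```

In this toy case a single document-level relation becomes six mention-level relations (2 chemical mentions × 3 gene mentions), which is the kind of multiplication that makes mention-level counts hard to compare with document-level counts.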

@leonweber
Collaborator

Ah, thanks for pointing this out. Then let's merge this : )

@leonweber leonweber left a comment

LGTM

@leonweber leonweber merged commit 8335363 into bigscience-workshop:main Jun 4, 2024
@leonweber leonweber deleted the chem_dis_gene branch June 4, 2024 20:10
phlobo pushed a commit to davidkartchner/biomedical that referenced this pull request Oct 21, 2024

Successfully merging this pull request may close these issues.

Add ChemDisGene data set