Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: metacat can predict on spans in arbitrary spangroups #391

Merged
merged 4 commits into from
Jan 30, 2024

Conversation

jkgenser
Copy link
Contributor

Hello - I opened this PR request to start a discussion about adding a new feature that allows a trained MetaCAT model to run on arbitrary spangroup. This will will allow configuring a MetaCAT model such that it can be pointed at an arbitrary spacy span group and set meta_anns on it.

Currently, it is allowed to run on the two set of list[spacy.token.Span]: doc.ents or doc._.ents.

However, there are use cases where we may store other spans as part of a pipeline under doc.spans["my_spangroup_name"] that are distinct from doc.ents in which we want to call predict on the meta cat model.

I don't think it's yet necessary to be able to train on arbitrary spangroups since that would get into changing the input format or extract format from medcat trainer.

One workaround is to either zero out the ents object and set the spangroup to those ents and call predict however this starts to feel hacky.

Another workaround is to create a duplicate spacy.token.Doc object and using doc.ents for the separate spangroup. However this is inefficient as it requires running parts of the spacy pipeline again and means we can't use this spangroup in other pipeline components.

@jkgenser jkgenser changed the title add: ability to predict on other spangroups add: metacat can predict on spans in arbitrary spangroups Jan 22, 2024
Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting in the effort! It's much appreciated!

Some changes for GHA to run.
Some changes for clarity.

medcat/config_meta_cat.py Show resolved Hide resolved
medcat/meta_cat.py Outdated Show resolved Hide resolved
tests/test_meta_cat.py Outdated Show resolved Hide resolved
tests/test_meta_cat.py Outdated Show resolved Hide resolved
@jkgenser
Copy link
Contributor Author

Thanks for putting in the effort! It's much appreciated!

Some changes for GHA to run. Some changes for clarity.

Thank you for your comments! I will address them.

@jkgenser
Copy link
Contributor Author

@mart-r I have iterated on your PR comments. I also added a more informative warning in case the span group is not set no the doc and a test for it.

@jkgenser jkgenser requested a review from mart-r January 23, 2024 13:59
@jkgenser
Copy link
Contributor Author

Thanks for putting in the effort! It's much appreciated!

Some changes for GHA to run. Some changes for clarity.

I've done my best to address all of your changes. Let me know if you have further comments!

Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

But please address the GitHub Actions issues (linting so far):

medcat/meta_cat.py:359:1: W293 blank line contains whitespace

@jkgenser
Copy link
Contributor Author

Looks good overall.

But please address the GitHub Actions issues (linting so far):

medcat/meta_cat.py:359:1: W293 blank line contains whitespace

@mart-r Got it, I fixed the linting and flake8 passes locally.

@jkgenser jkgenser requested a review from mart-r January 29, 2024 20:06
@mart-r mart-r merged commit 85cbe77 into CogStack:master Jan 30, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants