Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fill_value for SimpleImputer with string data #1123

Merged
merged 1 commit into from
Sep 13, 2024

Conversation

bodnarbm
Copy link
Contributor

@bodnarbm bodnarbm commented Aug 7, 2024

Adds support for setting the fill_value of a SimpleImputer using the constant strategy on string data. This is done by moving the existing guard statement that limits string values of fill_value to be only the value "nan".

Based on some testing, likely this guard statement can be removed completely due to sci-kit learn doing it's own casting checks when fitting data to the SimpleImputer (see https://github.com/scikit-learn/scikit-learn/blob/5600faab452b0adb1f4090219012aa3ae6208c66/sklearn/impute/_base.py#L375-L398). I moved the statement in this PR, in case there was some edge case I missed while doing my own checks and there is some numeric SimpleImputer that allows fill_value=="nan". But let me know if you rather I remove it outright.

Addresses #1122

@bodnarbm bodnarbm force-pushed the support-constant-simple-imputer-strings branch from cd6b54d to e1716a3 Compare August 7, 2024 23:40
@bodnarbm
Copy link
Contributor Author

@xadupre Sorry for the direct mention, but looking at the git history, you might be the best person to ask. Is there anything I can do to help with review on this pull request?

@xadupre
Copy link
Collaborator

xadupre commented Sep 2, 2024

Sorry for the delay, I was off. Your PR looks good.

tests/test_sklearn_imputer_converter.py Fixed Show fixed Hide fixed
tests/test_sklearn_imputer_converter.py Fixed Show fixed Hide fixed
tests/test_sklearn_imputer_converter.py Fixed Show fixed Hide fixed
tests/test_sklearn_imputer_converter.py Fixed Show fixed Hide fixed
tests/test_sklearn_imputer_converter.py Fixed Show fixed Hide fixed
tests/test_sklearn_imputer_converter.py Fixed Show fixed Hide fixed
@bodnarbm
Copy link
Contributor Author

bodnarbm commented Sep 3, 2024

No worries.

I will try to clean up the CodeQL reported issues in the PR tomorrow. The CI failures look to be related to other unchanged areas of code, but I am also willing to dive into those too as I have time this week.

@bodnarbm
Copy link
Contributor Author

bodnarbm commented Sep 9, 2024

@xadupre I went ahead and re-ran the CI jobs on my own fork. It looks like the latest merge from main that I pulled into this branch addressed the remaining test flakiness. I believe the workflows should succeed now for this branch. Thank you for fixing those tests / dependencies in the other PRs you did!

@bodnarbm bodnarbm force-pushed the support-constant-simple-imputer-strings branch from d4f5318 to 634a06e Compare September 9, 2024 04:13
@bodnarbm bodnarbm force-pushed the support-constant-simple-imputer-strings branch from 634a06e to e457593 Compare September 9, 2024 04:28
@xadupre xadupre merged commit 0394ccd into onnx:main Sep 13, 2024
20 checks passed
@bodnarbm bodnarbm deleted the support-constant-simple-imputer-strings branch September 14, 2024 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants