Use spectral decomposition as more accurate fallback from Cholesky #2930

Merged: david-cortes-intel merged 17 commits into oneapi-src:main from david-cortes-intel:overdetermined_minimum_norm on Oct 16, 2024.
Commits (17, all by david-cortes-intel):

38bab8b  Use spectral decomposition as more accurate fallback from Cholesky
9deb825  correct buffer size
53dd077  add optional pragma (has no effect though)
12cbd13  add test with an overdetermined system
eec237e  Update cpp/daal/src/algorithms/service_kernel_math.h
c8fadb7  PR comments
b062696  use syevr which is faster than syev
c2deb3c  avoid too eager threshold for aborting fp32 cholesky
97f6f8a  missing commas
7d49a14  fix declaration types
c859340  fix incorrect diagonal indexing
119edcb  add test with multi-output and overdetermined
9a97161  missing dispatch
81fca31  skip recently introduced test on GPU
74dce1c  missing file
4c1271e  rename GPU check and move to correct file
d21da3c  Merge branch 'main' into overdetermined_minimum_norm
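For orientation, the change can be pictured roughly as below: a minimal, hypothetical sketch (editor's illustration, not oneDAL's actual code) of a normal-equations solve that tries Cholesky first and falls back to a spectral decomposition via `syevr` when Cholesky fails. It assumes a standard LAPACKE installation; the function name, layout choices, and eigenvalue cutoff are invented for the example.

```cpp
#include <lapacke.h>
#include <vector>
#include <cmath>

// Hypothetical sketch: solve A * beta = b, where A = X^T X (n x n,
// symmetric, row-major) and b = X^T y. Try Cholesky first; if the matrix
// is not numerically positive-definite, fall back to an eigendecomposition,
// in the spirit of this PR's syevr-based fallback.
bool solve_normal_equations(std::vector<double> A, const std::vector<double>& b,
                            lapack_int n, std::vector<double>& beta)
{
    // --- Fast path: Cholesky (dpotrf + dpotrs) ---
    std::vector<double> L = A; // dpotrf overwrites its input
    if (LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'L', n, L.data(), n) == 0)
    {
        beta = b;
        return LAPACKE_dpotrs(LAPACK_ROW_MAJOR, 'L', n, 1, L.data(), n,
                              beta.data(), 1) == 0;
    }

    // --- Fallback: spectral decomposition A = Q diag(w) Q^T via dsyevr ---
    std::vector<double> w(n), Q(static_cast<size_t>(n) * n);
    std::vector<lapack_int> isuppz(2 * n);
    lapack_int m = 0;
    if (LAPACKE_dsyevr(LAPACK_ROW_MAJOR, 'V', 'A', 'L', n, A.data(), n,
                       0.0, 0.0, 0, 0, 0.0, &m, w.data(), Q.data(), n,
                       isuppz.data()) != 0)
        return false;

    // beta = Q diag(1/w) Q^T b, discarding near-zero and negative
    // eigenvalues (eigenvalues come back in ascending order, so w[m-1]
    // is the largest). The cutoff below is an invented threshold.
    const double cutoff = 1e-10 * std::abs(w[m - 1]);
    beta.assign(n, 0.0);
    for (lapack_int k = 0; k < m; ++k)
    {
        if (w[k] <= cutoff) continue;
        double proj = 0.0; // (Q^T b)[k]
        for (lapack_int i = 0; i < n; ++i) proj += Q[i * n + k] * b[i];
        proj /= w[k];
        for (lapack_int i = 0; i < n; ++i) beta[i] += Q[i * n + k] * proj;
    }
    return true;
}
```

Dropping the near-zero components, rather than dividing by them, is what yields the minimum-norm solution for rank-deficient (e.g. overdetermined-in-features) systems.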
Conversations

Reviewer:
What if the symmetric matrix decomposition fails to converge, for example due to rounding errors? Isn't it then required to fall back to a general SVD (like `gesvd`, which cannot fail)? Or maybe it would be an even more stable approach to use `gesvd` instead of `syev` for the spectral decomposition from the beginning?

Author (david-cortes-intel):
Yes, `gesvd` on $\mathbf{X}$ would be more accurate than `syev` on $\mathbf{X}^T \mathbf{X}$, but it would be slower. However, the `normEq` solver is based on $\mathbf{X}^T \mathbf{X}$ (I guess that's what the name implies, but not 100% sure). `gesvd` could potentially be a fallback from QR, but that'd be a different PR. I think `scikit-learn-intelex` only calls `oneDAL`'s solver with `normEq`, BTW.

Reviewer:
I meant not to apply `gesvd` to $X$, but rather to $X^T X$. Due to factors like rounding errors, $X^T X$ might be not positive-semidefinite, and `syev` might fail to converge. And we will come to the same issue as we have now with Cholesky. With SVD we can do:

$X^T X \beta = X^T y$

$X^T X = U \Sigma V^T$

$\beta = V \Sigma^{-1} U^T X^T y$

And there should be no possibility of failure. Or maybe this is too precautious.
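(Editor's aside) The route proposed above can be sketched with LAPACKE's `dgesvd`. This is a hypothetical illustration, not code from the PR; the function name and the relative cutoff are invented:

```cpp
#include <lapacke.h>
#include <vector>

// Hypothetical sketch of the reviewer's proposal: solve X^T X beta = X^T y
// through the SVD  X^T X = U Sigma V^T,  beta = V Sigma^{-1} U^T (X^T y).
// A holds X^T X (n x n, row-major), b holds X^T y.
bool solve_via_gesvd(std::vector<double> A, const std::vector<double>& b,
                     lapack_int n, std::vector<double>& beta)
{
    std::vector<double> s(n), U(static_cast<size_t>(n) * n),
                        Vt(static_cast<size_t>(n) * n),
                        superb(n > 1 ? n - 1 : 1);
    if (LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A', n, n, A.data(), n,
                       s.data(), U.data(), n, Vt.data(), n,
                       superb.data()) != 0)
        return false;

    // Singular values come back in descending order; drop the tiny ones
    // instead of dividing by them (invented relative threshold).
    const double cutoff = 1e-10 * s[0];
    beta.assign(n, 0.0);
    for (lapack_int k = 0; k < n; ++k)
    {
        if (s[k] <= cutoff) continue;
        double proj = 0.0; // (U^T b)[k]
        for (lapack_int i = 0; i < n; ++i) proj += U[i * n + k] * b[i];
        proj /= s[k];
        // accumulate V[:, k] * proj; row k of Vt is the k-th column of V
        for (lapack_int i = 0; i < n; ++i) beta[i] += Vt[k * n + i] * proj;
    }
    return true;
}
```

For a symmetric positive-semidefinite $X^T X$ the SVD coincides with the eigendecomposition (with $U \approx V$), so this computes the same minimum-norm solution as the `syevr` route, just via a routine with different failure characteristics.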
Reviewer:
Yes, it seems that `sklearnex` only uses the normal equations method now. We have the QR method implemented, but on CPUs only. And it is rather inefficient in terms of performance to use normal equations first and, if they fail, apply QR to the whole $X$ matrix -> double the cost in terms of performance.
Author (david-cortes-intel):
I don't follow.

The eigendecomposition part doesn't require positiveness - it would succeed even if the input had negative entries in the main diagonal. If, due to numerical inaccuracies, some eigenvalue that is close to zero in theory comes out with a negative sign in practice (negative-definite matrices would have negative eigenvalues), it would anyway be removed as part of the procedure, so the end result would not take that component.

These "sy" routines take values only from the lower triangle of the input matrix, so there shouldn't be any issues with symmetry either, provided the matrix is filled in both corners. I understand they should also be more accurate than `gesvd` when the input matrix is actually symmetric, but I'm not 100% sure about it.

About the QR part: sorry, to clarify - what I meant was that `gesvd` on $\mathbf{X}$ could be a fallback for when `geqrf` on $\mathbf{X}$ fails. In those cases, $\mathbf{X}^T \mathbf{X}$ would not get computed.
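(Editor's aside) The "sy" routines' lower-triangle behaviour mentioned above is easy to check with a tiny self-contained program; this is a sketch with arbitrary values, not test code from the PR. With `uplo = 'L'`, `dsyevr` never reads the strictly upper triangle, so corrupting it does not change the result:

```cpp
#include <lapacke.h>
#include <cstdio>

int main()
{
    // 2x2 symmetric matrix [[2, 1], [1, 2]] in row-major order, with the
    // strictly upper triangle deliberately corrupted: the 999.0 entry is
    // never read when uplo = 'L'.
    double a[4] = { 2.0, 999.0,
                    1.0, 2.0 };
    double w[2], z[4];
    lapack_int m, isuppz[4];
    lapack_int info = LAPACKE_dsyevr(LAPACK_ROW_MAJOR, 'V', 'A', 'L', 2,
                                     a, 2, 0.0, 0.0, 0, 0, 0.0, &m,
                                     w, z, 2, isuppz);
    // Expected eigenvalues of [[2, 1], [1, 2]]: 1 and 3 (ascending order).
    std::printf("info=%d eigenvalues: %f %f\n", (int)info, w[0], w[1]);
    return 0;
}
```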
Reviewer:
Yes, right. I forgot that positiveness is not a problem for eigenvalue decomposition, only for Cholesky. Sorry for the misleading info.

My comment was based on experience with eigenvalue decomposition methods, but I forgot what exactly led to the failures. So what I mean is that those methods can fail, and I am trying to avoid that. Probably it was not positiveness but numerical stability issues. If the condition number of $X^T X$ is too small, i.e. all the eigenvalues are close, some eigenvalue decomposition methods can fail. Similar things might happen when the condition number is too big.

But I am not too familiar with which MKL decomposition routine is better in the case of small or big condition numbers [thought `gesvd` was the best, but it looks like it's not]. Anyway, from your comment I see that for symmetric matrices the `sy` methods should be better than the `ge` ones. So the choice of `syevr` looks reasonable now.

I also understand the point about QR and `gesvd` for $X$ decomposition, but this is not in the scope of this PR.

Author (david-cortes-intel):
Got it. It looks from the `syevd` docs that it is indeed possible for this routine to fail due to the values in the inputs, although the docs don't explain when that could happen.

That being said, in case it makes this look less bad: if the eigendecomposition fails, it follows that other methods like SciPy's `lstsq` or scikit-learn's `LinearRegression` would also fail, since they are likewise based on a spectral decomposition.