Use spectral decomposition as more accurate fallback from Cholesky #2930
Conversation
The code looks good to me.
The biggest concern is that it is possible that syev also fails to converge for some reason. What should we do then? Wouldn't it be safer to use some more general form of spectral decomposition, like gesvd?
Another comment is about the testing. I think the case of RHS > 1 also needs to be covered. It would also be better to have a test that covers a wider variety of input data sizes, as is done for the run_and_check_linear case.
LapackInst<FPType, cpu>::xsyev(&jobz, &uplo, (DAAL_INT *)&n, a, (DAAL_INT *)&n, eigenvalues.get(), work_buffer.get(), &work_buffer_size,
                               &info);
}
if (info) return false;
What if the symmetric matrix decomposition fails to converge, for example due to some rounding errors? Isn't it required to fall back to a general SVD decomposition (like gesvd, which cannot fail)?
Or maybe it would be an even more stable approach to use gesvd instead of syev for the spectral decomposition from the beginning?
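For illustration only, a minimal standalone sketch of what a gesvd-based solve could look like; the helper name, the tolerance constant, and the use of plain LAPACKE (rather than oneDAL's internal LapackInst wrappers) are assumptions for the example, not what this PR implements:

```cpp
// Hypothetical helper: solve A*x = b for a symmetric n-by-n matrix A through a
// gesvd-based pseudo-inverse, dropping singular values below a relative tolerance.
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>
#include <lapacke.h>  // with MKL this is available as mkl_lapacke.h

bool solve_via_gesvd(std::vector<double> A,            // n*n, column-major; overwritten
                     const std::vector<double>& b,     // n
                     std::vector<double>& x,           // n, output
                     int n)
{
    std::vector<double> s(n), u(n * n), vt(n * n), superb(std::max(n - 1, 1));
    // gesvd can also report a failure through 'info', although convergence
    // problems are much rarer than for Cholesky on near-singular inputs.
    int info = LAPACKE_dgesvd(LAPACK_COL_MAJOR, 'A', 'A', n, n, A.data(), n,
                              s.data(), u.data(), n, vt.data(), n, superb.data());
    if (info) return false;

    // Relative tolerance in the spirit of gelsd: proportional to the largest
    // singular value (the exact constant here is just an illustrative choice).
    const double tol = std::numeric_limits<double>::epsilon() * n * s[0];

    // x = V * diag(1/s_i) * U^T * b, skipping singular values below 'tol'.
    std::vector<double> tmp(n, 0.0);
    for (int i = 0; i < n; ++i) {
        if (s[i] <= tol) continue;            // discard small components
        double dot = 0.0;
        for (int j = 0; j < n; ++j) dot += u[i * n + j] * b[j];  // (U^T b)_i
        tmp[i] = dot / s[i];
    }
    x.assign(n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            x[j] += vt[j * n + i] * tmp[i];   // V = (V^T)^T, column-major access
    return true;
}
```

The point of interest for the discussion is that nothing in this path depends on the matrix being positive-definite.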
Yes, gesvd on the data matrix could potentially be a fallback from QR, but that'd be a different PR. The fallback here applies syev to the normal-equations matrix (normEq). I think scikit-learn-intelex only calls oneDAL's solver with normEq, BTW.
I meant not to apply gesvd to the data matrix, but to the same matrix that syev operates on here.
Due to factors like rounding errors, syev might fail to converge, and then we would come to the same issue as we have now with Cholesky. With SVD we can do the pseudo-inversion directly (see the sketch below), and there should be no possibility of failure.
Or maybe this is too precautious.
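For concreteness, a sketch of the SVD-based solve presumably being described here, using the standard truncated pseudo-inverse (the threshold $\tau$ and its exact value are assumptions for the illustration):

$$
\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T, \qquad
\mathbf{x} = \mathbf{V} \boldsymbol{\Sigma}^{+} \mathbf{U}^T \mathbf{b}, \qquad
\left(\boldsymbol{\Sigma}^{+}\right)_{ii} = \begin{cases} 1/\sigma_i, & \sigma_i > \tau \\ 0, & \text{otherwise} \end{cases}
$$

Since small singular values are dropped rather than inverted, the reconstruction step itself has no convergence criterion that can fail, which is the point being made above.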
Yes, it seems that sklearnex only uses the normal equations method now. We have the QR method implemented, but on CPUs only.
And it is rather inefficient in terms of performance to use normal equations at first and, if that fails, then apply QR to the whole data matrix.
I don't follow.
The Eigendecomposition part doesn't require positiveness - it would succeed even if the input had negative entries in the main diagonal. If, due to numerical inaccuracies, some eigenvalue that should be close to zero in theory comes out with a negative sign in practice (negative-indefinite matrices would have negative eigenvalues), it would anyway be removed as part of the procedure, so the end result would not take that component.
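For reference, a sketch of the procedure being described, with $\tau$ standing in for whatever tolerance the implementation uses:

$$
\mathbf{A} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^T, \qquad
\mathbf{x} = \mathbf{Q} \boldsymbol{\Lambda}^{+} \mathbf{Q}^T \mathbf{b}, \qquad
\left(\boldsymbol{\Lambda}^{+}\right)_{ii} = \begin{cases} 1/\lambda_i, & \lambda_i > \tau \\ 0, & \text{otherwise} \end{cases}
$$

An eigenvalue that is close to zero in theory but comes out slightly negative in practice falls below $\tau$ and is simply excluded, so positiveness is never required.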
These "sy" routines takes the values only from the lower triangle of the input matrix, so there shouldn't be any issues with symmetry either if the matrix is getting filled in both corners. I understand they should also be more accurate than gesvd
when the input matrix is actually symmetric, but I'm not 100% sure about it.
About the QR part: sorry, to clarify - what I meant was that gesvd applied to the data matrix could potentially be a fallback from geqrf (the QR path), but that'd be a different PR.
Yes, right. I forgot that positiveness is not a problem for eigenvalue decomposition, only for Cholesky. Sorry for the misleading info.
My comment was based on experience with eigenvalue decomposition methods, but I forgot what exactly led to the failures. So, I mean that those methods can fail, and I am trying to avoid that. Probably it was not positiveness, but numerical stability issues.
If the condition number of the matrix is large, such numerical issues become more likely. But I am not too familiar with which MKL decomposition routine is better in the case of small or large condition numbers (I thought gesvd was the best, but it looks like it's not).
Anyway, from your comment I see that for symmetric matrices the sy methods should be better than the ge ones. So, the choice of syevr looks reasonable now.
I also understand the point about QR and gesvd for the data matrix.
Got it. It looks from the syevd docs that it is indeed possible for this routine to fail due to the values in the inputs, although they don't explain when that could happen.
That being said, in case it makes this look less bad: if the eigendecomposition fails, it follows that other methods like SciPy's lstsq or scikit-learn's LinearRegression would also fail, since they are also based on spectral decomposition.
Co-authored-by: Victoriya Fedotova <[email protected]>
Yes, I was thinking about adding such a test with more than one regression target, but unfortunately I cannot find the signature for it.
Testing with more input sizes would be trickier though - the logic for the current tests would not generalize to non-positive systems, and here we're looking for the minimum-norm solution instead of just any solution. Ideas would be welcome, but the only potential approach I see would be to just compare against LAPACK's own least-squares routines.
Yes, I really cannot find any information about support for multiple right-hand sides in the oneDAL linear regression docs. So, it should be possible to test multiple RHS just by providing responses with multiple columns.
I will also try to provide ideas about testing with wide matrices.
/intelci: run
So @Vika-F, I'm leaning towards asking for benchmarks on it, just to be careful. To be honest, I'm not sure if this situation occurs in the datasets of the benchmarks, but it's worth checking (since it's so core). What do you think? Could be a bit of a pain, sorry @david-cortes-intel.
Do you mean running the existing benchmarks after this PR, or generating some other kind of benchmark comparing the eigenvalue solvers? I don't think re-running the existing benchmarks would make much sense - there are three possible cases:
Actually, I was wrong about the second point - there are many cases even in the tests where Cholesky fails but PLU succeeds, which would now fall back to spectral decomposition. I need to investigate what characteristics they have...
Adding tests for multi-output regression seems to be quite tricky, as there's one issue that's currently a blocker: the tests here assume that the coefficients come in one shape, whereas the current solver produces them in the transposed layout.
There do seem to be some tests currently running that execute multi-output regressions, but they use the coefficients assuming the first shape. Hence, if I add a test with the expected results like it was done in previous commits, it will end up erroring out due to shape mismatches. So I'd say adding tests for multi-output regression would first require fixing that issue with the mismatched dimensions, which maybe could be left for a different PR.
Sorry, I actually had some mistakes in my code:
/intelci: run
/intelci: run
Thanks for addressing the feedback.
The changes look good to me. I am almost ready to approve; I have only one comment regarding gesvd use. Please see below. What do you think?
Looks like my earlier comment somehow got lost, so reposting. I don't follow. The Eigendecomposition part doesn't require positiveness - it would succeed even if the input had negative entries in the main diagonal. If, due to numerical inaccuracies, some eigenvalue that should be close to zero in theory comes out with a negative sign in practice (negative-indefinite matrices would have negative eigenvalues), it would anyway be removed as part of the procedure, so the end result would not take that component.
These "sy" routines take values only from the lower triangle of the input matrix, so there shouldn't be any issues with symmetry either if the matrix is getting filled in both corners. I understand they should also be more accurate than gesvd when the input matrix is actually symmetric, but I'm not 100% sure about that.
About the QR part: sorry, to clarify - what I meant was that gesvd applied to the data matrix could potentially be a fallback from geqrf (the QR path), but that'd be a different PR.
/intelci: run
/intelci: run
/intelci: run
@@ -89,6 +89,10 @@ class float_algo_fixture : public algo_fixture {
        return is_double && !this->get_policy().has_native_float64();
    }

    bool running_on_gpu() {
Please move this function into the linear_regression fixture: https://github.com/david-cortes-intel/oneDAL/blob/overdetermined_minimum_norm/cpp/oneapi/dal/algo/linear_regression/test/fixture.hpp
Also, it would be better to rename it to something more descriptive, like "non_psd_system_not_supported_on_device".
Moved to that file and renamed it.
@icfaust it seems I've missed your comment. Or do you mean that David needs to check that the performance of the algorithm, as we measure it now, has not degraded after these changes?
I just wonder if we need to verify that performance has not degraded with these changes. What do you think, @Vika-F?
/intelci: run
LGTM
Description
This PR changes the logic in solving non-positive linear systems to use spectral decomposition as a fallback instead of PLU.
Motivation
Linear systems of the form $\mathbf{Ax} = \mathbf{b}$ are currently solved through Cholesky factorization of $\mathbf{A}$.
When the matrix $\mathbf{A}$ is not positive-definite, it falls back to performing PLU factorization. This factorization might be able to produce a solution to the linear system if the matrix is close to positive, in the sense that the solution will satisfy $\lVert \mathbf{Ax} - \mathbf{b} \rVert \approx 0$.
However, an overdetermined system (e.g. when $\mathbf{A}$ is the cross product of a matrix $\mathbf{X}$ that has more columns than rows) might have an infinite number of solutions, of which either the one with minimum rank or the one with minimum norm is typically preferred, as is done by LAPACK's own least-squares routines (which prefer minimum norm). In such cases, Cholesky and PLU will not produce the most desirable solution, assuming that they are able to produce a solution at all, which isn't always the case.
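For reference, the minimum-norm solution mentioned here is

$$
\mathbf{x}^{*} = \underset{\mathbf{x} \in \mathcal{S}}{\arg\min} \ \lVert \mathbf{x} \rVert_2, \qquad \mathcal{S} = \left\{ \mathbf{x} : \lVert \mathbf{Ax} - \mathbf{b} \rVert_2 \ \text{is minimal} \right\},
$$

which is what LAPACK's least-squares drivers such as gelsd return for rank-deficient systems.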
Solution
This PR changes the logic to instead fall back to inversion based on spectral decomposition, discarding singular values that are too small. It follows the same logic for discarding small singular values as LAPACK's gelsd with the default tolerance.
This has the added bonus that, in such cases, the results will match those produced by Python software such as Scikit-Learn, SciPy, or StatsModels, which is not currently the case for linear regression problems in which there are more columns than rows.
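As a sketch of the gelsd-style criterion, with $\sigma_1$ the largest singular value and rcond the relative tolerance (the default value of rcond is an implementation detail of the routine):

$$
\sigma_i \le \text{rcond} \cdot \sigma_1 \quad \Longrightarrow \quad \sigma_i \ \text{is treated as zero and its component is excluded from the solution.}
$$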
It additionally adds an extra check in the Cholesky approach for whether the diagonal elements might be too small, which could be indicative of numerical instability in the results that is also better handled by the fallback added here.
It doesn't guarantee 100% agreement in results with Python libraries, since there isn't a 1-to-1 correspondence between the values of the Cholesky diagonal and the singular values; hence there can be edge cases in which the criterion here ends up using Cholesky in situations where SciPy would discard singular values. But it should make mismatches much less frequent.
Other changes
This PR also fixes an issue by which calling the function to solve linear systems with more than 1 RHS would produce an incorrect solution when it falls back to PLU.
Additionally, it adds a few comments in the LAPACK wrappers that might help other people when looking at the code and function signatures.
Next steps
There are cases in which one can know beforehand that Cholesky would fail, and in those situations, it'd be better to go straight away for the spectral decomposition. A potential next step would be to add it as an additional solver option, and make wrappers such as scikit-learn-intelex request that solver when there are more columns than rows in regression scenarios.
Another next step would be to allow scikit-learn-intelex to use oneDAL's routines in cases in which there are more columns than rows, which is currently disabled there.
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance