Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgbn_mont_sqr(r, a, modulus) returns result > modulus #15

Open
sethtroisi opened this issue Aug 18, 2021 · 4 comments
Open

cgbn_mont_sqr(r, a, modulus) returns result > modulus #15

sethtroisi opened this issue Aug 18, 2021 · 4 comments

Comments

@sethtroisi
Copy link
Contributor

sethtroisi commented Aug 18, 2021

I have found a case with N=971 bits where cgbn_mont_sqr(r, a, n) returns (a^2 % n) + n

I made a minimal repro branch here

Would you be willing to assist in debugging? Any ideas what might be wrong?

@sethtroisi
Copy link
Contributor Author

sethtroisi commented Aug 18, 2021

I increased the reproducibility by setting

a = n - bn2mont(1)

With BITS=1024 TPI={8,16,32}, this fails for all 2^n-1, n >= 683
With BITS=512 TPI={8,16}, this fails for all 2^n-1, n>=342
With BITS=256 TPI=8, this fails for all 2^n-1 >= 171
This is probably n = 2/3 BITS

With BITS=512, TPI=32, seems to work on the tested inputs
With BITS=256, TPI={16,32}, seems to work on the tested inputs
With BITS=128, TPI={8,16,32}, seems to work on the tested inputs

@sethtroisi
Copy link
Contributor Author

This error is present if I use core_mul_xmad.cu or core_mul_imad.cu

@sethtroisi
Copy link
Contributor Author

sethtroisi commented Aug 19, 2021

I looked at testing this with unit_test but I found that experience hard. Is it expected that make takes ~90 seconds for tester?
Is there a faster way to write a single unit test that tests cpu(mpz) vs gpu(CGBN)?

@nemmart
Copy link

nemmart commented Aug 19, 2021

Hi Seth,

Thanks for reporting. You have found a problem here, but it's more of a documentation issue than a coding issue. The mont_mul and mont_sqr routines use a redundant representation, and only subtract N if there was a carry out -- this is faster because it saves a comparison on each mont_sqr and mont_mul. The mont2bn routine is the final step and ensures that the result is normalized to N. I think the implementation is right, but it should be documented better about what's going on under the covers.

I'll leave this issue open until I update the docs.

Thank you,
Niall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants