Hypergeometric incorrect samples #1508
Comments
It looks like there is something wrong with the rejection-acceptance sampling method. For various input parameters, I generated 10000000 samples, and compared the frequencies of the output values to the theoretical frequencies. ("Compare" means I just printed the observed and expected frequencies, and looked for big discrepancies. I used a Python script for that, and used the |
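For readers who want to reproduce this kind of comparison, here is a minimal sketch of how the expected frequencies could be computed using only the Rust standard library. It is not the script used in this comment. The helper names (`ln_factorial`, `ln_choose`, `hypergeometric_pmf`) are made up for illustration, and the parameter order (population size, number of marked items, number of draws) is assumed to match `Hypergeometric::new(100, 50, 49)` from the report.

```rust
/// ln(n!) by direct summation; adequate for the small arguments used here.
fn ln_factorial(n: u64) -> f64 {
    (2..=n).map(|i| (i as f64).ln()).sum()
}

/// ln of the binomial coefficient C(n, k); the caller must ensure k <= n.
fn ln_choose(n: u64, k: u64) -> f64 {
    ln_factorial(n) - ln_factorial(k) - ln_factorial(n - k)
}

/// P(X = k) when drawing `draws` items without replacement from a
/// population of `total` items, of which `successes` are marked.
/// (Hypothetical helper, not part of rand_distr.)
fn hypergeometric_pmf(total: u64, successes: u64, draws: u64, k: u64) -> f64 {
    let failures = total - successes;
    // Values outside the support have probability zero.
    if k > successes || k > draws || draws - k > failures {
        return 0.0;
    }
    (ln_choose(successes, k) + ln_choose(failures, draws - k)
        - ln_choose(total, draws))
    .exp()
}

fn main() {
    // Parameters from the report; the sample count matches this comment.
    let (total, successes, draws) = (100u64, 50u64, 49u64);
    let n_samples = 10_000_000.0;
    for k in 0..=draws {
        let expected = hypergeometric_pmf(total, successes, draws, k) * n_samples;
        println!("{k:3} {expected:14.2}");
    }
}
```

Printing these expected counts next to the observed counts from the sampler gives the side-by-side view described above.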
Interesting. That's not what I observe. With the master branch, when I call rand/rand_distr/src/hypergeometric.rs Lines 196 to 198 in 0fba940
With those inputs, |
I was mistaken, I had it in a debugger from the KS tests, but I think I forgot to comment out the other hyperparameter |
I would wait a bit if someone with experience with the algorithm (maybe @teryror ?) wants to investigate. Otherwise I will try myself. |
It turns out the problem is a bug in the original algorithm. R discovered this years ago: https://bugs.r-project.org/show_bug.cgi?id=7314 The fix is to change this line rand/rand_distr/src/hypergeometric.rs Line 362 in f5185d9
to
There is a separate (but apparently not so significant) bug: in |
Really good catch :) I actually tried to see if fixing the Stirling helps, but it did not have any measurable effect on the KS statistic (but this was with the bug still there). It would be good to know what would be the minimal values it can be called with. |
Hypergeometric::new(100,50,49)
produces samples which are very likely not from this distribution. The distribution is not very extreme, so I would expect this to sample correctly.
One piece of evidence is in the failed KS test (see #1504)
I also did a chi-squared test, which gives a p-value of 0.0 for a million samples:
The frequencies I sample:
the theoretical ones:
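As a rough illustration of the chi-squared comparison mentioned above (not the code used for this report), Pearson's statistic can be computed from observed counts and expected counts as sketched below. The function name `chi_squared_statistic` and the toy counts in `main` are hypothetical. Skipping bins with a zero expected count is a simplifying assumption, and the p-value step (comparing against a chi-squared distribution with bins minus one degrees of freedom) is not shown.

```rust
/// Pearson's chi-squared statistic: the sum of (observed - expected)^2 / expected.
/// Bins whose expected count is zero are skipped (a simplifying assumption;
/// in practice sparse bins are often pooled instead).
fn chi_squared_statistic(observed: &[u64], expected: &[f64]) -> f64 {
    assert_eq!(observed.len(), expected.len(), "bin counts must match");
    let mut stat = 0.0;
    for (&o, &e) in observed.iter().zip(expected.iter()) {
        if e > 0.0 {
            let diff = o as f64 - e;
            stat += diff * diff / e;
        }
    }
    stat
}

fn main() {
    // Toy numbers for illustration only; these are not the counts from this issue.
    let observed = [48u64, 52];
    let expected = [50.0, 50.0];
    let stat = chi_squared_statistic(&observed, &expected);
    println!("chi-squared statistic: {stat}");
}
```

A statistic that is very large relative to the number of bins corresponds to a p-value near zero, which is the situation reported here.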