Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(p/demo): PCG pseudo-random number generator #1993

Closed
wants to merge 3 commits into from

Conversation

notJoon
Copy link
Member

@notJoon notJoon commented Apr 29, 2024

Description

Random number generation package using the PCG algorithm

Overview

I used the PCG (Permuted Congruential Generator) algorithm to support most of the methods provided in math/rand. This algorithm is a pseudo-random number generator (PRNG) that generates the same random number sequence whenever the initial seed (state) is given, exhibiting fully deterministic characteristics. It also makes patterns and long period1 difficult to predict through the permutation process.

The implementation basically uses a linear congruential method but modifies the results through additional bitwise operations such as XOR. This approach improves the statistical quality of random numbers and reduces predictability.

Compatibility

I tried to match the methods provided in the math/rand package as much as possible, but there are a few differences. The table below compares the main functions between each implementation function name and includes descriptions.

math/rand PCG description
Seed Seed Sets the seed value (initial state) of the PRNG.
Int Uint32
Uint64
Returns a random integer. PCG32 returns 32 bits, and PCG64 returns 64 bits.
Intn Uint32n
Uint64n
Returns a random integer within the given range.
Float64 Float64 Returns a floating-point number between 0.0 and 1.0.
Perm Perm Returns a randomly shuffled array of integers from 0 to n-1.
Read Read Fills a byte slice with random data.
Shuffle Shuffle Randomly shuffles the elements of a given slice.
NormFloat64 X Returns a floating-point number following a standard normal distribution. (Not provided in PCG)
ExpFloat64 X Returns a floating-point number following an exponential distribution. (Not provided in PCG)

Performance and Benchmarks

Currently, direct benchmarking is not possible in Gno, so I measured using Go. The Go implementation is named MathRand, and PCG is named PCG in the benchmark identifiers. Please refer to the following link for detailed benchmark code.

Based on the benchmarks I conducted using Go implementations, the PCG algorithm generally outperforms the standard math/rand package in terms of generating single random numbers. PCG32 and PCG64 are approximately twice as fast as their math/rand counterparts. When generating random numbers within a given range, PCG also demonstrates superior performance, with PCG32 and PCG64 being nearly 3 times faster than math/rand's Intn method. In the case of shuffling, PCG32 and PCG64 exhibit better performance compared to math/rand, with PCG32 being the fastest.

However, when it comes to generating random floating-point numbers, PCG implementation is slightly slower than Go's math/rand package. Overall, the PCG algorithm provides improved performance in most scenarios while maintaining high-quality random number generation.

Generate Single Random Number

Benchmark # of Iterations Elapsed Time (ns/op) # Allocated Bytes (B/op) Allocations Per Operation
BenchmarkPCG32Rand 441127071 2.764 0 0
BenchmarkMathRand32 201966975 6.017 0 0
Benchmark # of Iterations Elapsed Time (ns/op) # Allocated Bytes (B/op) Allocations Per Operation
BenchmarkPCG64Rand 398531355 3.037 0 0
BenchmarkMathRand64 184649094 6.665 0 0

Generate Singe Number In Given Range

Benchmark # of Iterations Elapsed Time (ns/op) # Allocated Bytes (B/op) Allocations Per Operation
BenchmarkPCG32_Uintn32 440600071 2.820 0 0
BenchmarkPCG64_Uintn64 386100385 3.201 0 0
Benchmark_MathRanIntn 151258676 7.877 0 0

Read

Benchmark # of Iterations Elapsed Time (ns/op) # Allocated Bytes (B/op) Allocations Per Operation
BenchmarkPCG32Read 2884942 416.1 0 0
BenchmarkPCG64Read 2970010 404.9 0 0
BenchmarkMathRandRead 1403140 857.8 0 0

Shuffle

Benchmark # of Iterations Elapsed Time (ns/op) # Allocated Bytes (B/op) Allocations Per Operation
BenchmarkPCG32Shuffle 386566 3120 0 0
BenchmarkPCG64Shuffle 289429 4131 0 0
BenchmarkMathRandShuffle 270508 4389 0 0

Float

Generating random floating-point numbers is slightly slower than Go's implementation.

Benchmark # of Iterations Elapsed Time (ns/op) # Allocated Bytes (B/op) Allocations Per Operation
BenchmarkPCGFloat64 359199702 3.046 0 0
BenchmarkPCGFloat64Full 400288152 3.002 0 0
BenchmarkMathRandFloat64 495593946 2.412 0 0

Limitations

However, the current PCG implementation does not guarantee a completely uniform distribution. This is because when applying chi-square2 test, the p-value3 is smaller than the significance level, leading to the rejection of the null hypothesis. Nevertheless, it exhibits a certain level of uniformity, so it is believed that this will not pose a significant problem for a simple purpose (.

Below are the distributions of random numbers generated by PCG and math/rand under the same conditions. The blue color (top) is PCG. See the link for the code to generate the distribution graph.

pcg32_histogram
rand_histogram


Footnotes

  1. In a PRNG, the priod refers to the number of generated values before the sequence starts repeating itself. It is a crucial property that determines the quality and effectiveness of the PRNG algorithm. A longer period indicates that the generated sequence will have more randome and less predictable pattern. The PCG is designed to have an extremely long period, typically in the order of 2^64 or higher ref.

  2. The chi-square test is a stastical hypothesis test that measures the goodness of fit between the observed and expected frequencies under the assumption of a specific distribution. In this context, it is used to assess the uniformity of the generated random numbers by comparing their distribution to the expected uniform distribution.

  3. The p-value is the probability of observing a test stastic as extreme as the one calculated from the sample data. assuming the null hypothesis(H0) is true. In hypothesis testing, if the p-value is smaller than the predetermined significance level (e.g., 0.05 mostly), the H0 is rejected, indicating that the observed data is unlikely to have occurred by chance alone. The alternative hypothesis(H1), in this case, would state the generated random numbers do not follow a uniform distribution.

@notJoon notJoon requested review from a team as code owners April 29, 2024 10:24
@github-actions github-actions bot added the 🧾 package/realm Tag used for new Realms or Packages. label Apr 29, 2024
@notJoon notJoon changed the title feat(p/demo): PCG pseudo-number generator feat(p/demo): PCG pseudo-random number generator Apr 29, 2024
@moul
Copy link
Member

moul commented Apr 29, 2024

That's great! Ideally, we should complete porting math/rand to ensure that most Go code using basic functions like pickRandom(choices) can still function properly.

@notJoon
Copy link
Member Author

notJoon commented Apr 30, 2024

That's great! Ideally, we should complete porting math/rand to ensure that most Go code using basic functions like pickRandom(choices) can still function properly.

Thanks! I'm continuing to work on supporting the full functionality of math/rand as much as possible.

@notJoon notJoon marked this pull request as draft April 30, 2024 09:53
@leohhhn
Copy link
Contributor

leohhhn commented Apr 30, 2024

@notJoon even though its still in draft, amazing effort 👏

I'm not the expert on this but, since we are talking blockchains and determinism - I assume you plan to use this package just as a demo package, since the values should be guessable with enough effort from an attacker's perspective. Is this correct?

@notJoon
Copy link
Member Author

notJoon commented Apr 30, 2024

I'm not the expert on this but, since we are talking blockchains and determinism - I assume you plan to use this package just as a demo package, since the values should be guessable with enough effort from an attacker's perspective. Is this correct?

It is almost impossible to predict random values without knowing the initial state and seed value. However, I have not yet conducted the NIST SP 800-22 tests (well known test suite for PRNG), and a lot of verification is needed to ensure complete security. Therefore, I made it at the demo level.

@moul
Copy link
Member

moul commented May 1, 2024

cc @kristovatlas

@moul moul requested a review from kristovatlas May 1, 2024 21:02
Copy link

codecov bot commented May 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.72%. Comparing base (dd68d61) to head (a93fc36).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1993      +/-   ##
==========================================
- Coverage   48.44%   46.72%   -1.72%     
==========================================
  Files         409      492      +83     
  Lines       61965    69614    +7649     
==========================================
+ Hits        30019    32530    +2511     
- Misses      29446    34371    +4925     
- Partials     2500     2713     +213     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@notJoon
Copy link
Member Author

notJoon commented May 7, 2024

Over the past few days, I created a test framework based on the NIST SP-800-22 paper to verify the random numbers generated in the implementation and performed the verification. I set the significance level to 0.01 (If the result is greater than this value, it means it is random), and for the remaining values, I used the recommended input values from the paper and ran the tests. The results are as follows: (The universal test was run separately as it requires a large input size)

+---------------------------------------------+---------------------------------------------------------------------------------------------+--------+
| NIST STATISTICAL TEST SUITE                 | P-VALUE                                                                                     | RESULT |
+---------------------------------------------+---------------------------------------------------------------------------------------------+--------+
| Frequency (Monobit) Test                    | 0.53                                                                                        | Pass   |
| Frequency Test within a Block               | 0.24                                                                                        | Pass   |
| Runs Test                                   | 0.20                                                                                        | Pass   |
| Test for the Longest Run of Ones in a Block | 0.20                                                                                        | Pass   |
| Binary Matrix Rank Test                     | 0.33                                                                                        | Pass   |
| Discrete Fourier Transform (Spectral) Test  | 0.53                                                                                        | Pass   |
| Non-overlapping Template Matching Test      | 1.00                                                                                        | Pass   |
| Overlapping Template Matching Test          | 0.85                                                                                        | Pass   |
| Linear Complexity Test                      | 0.81                                                                                        | Pass   |
| Serial Test                                 | [0.82 0.95]                                                                                 | Pass   |
| Approximate Entropy Test                    | 1.00                                                                                        | Pass   |
| Cumulative Sums Test                        | 0.63                                                                                        | Pass   |
| Random Excursions Test                      | [0.29 0.76 0.09 0.53 0.58 0.87 0.39 0.83]                                                   | Pass   |
| Maurer's Universal Statistical Test         | 1.00                                                                                        | Pass   |
| Random Excursions Variant Test              | [0.32 0.44 0.42 0.30 0.49 0.67 0.61 0.72 0.92 0.35 0.51 0.64 0.78 0.76 0.58 0.77 0.89 0.98] | Pass   |
+---------------------------------------------+---------------------------------------------------------------------------------------------+--------+
|                                             | TOTAL TESTS                                                                                 | 15     |
|                                             | PASS                                                                                        | 15     |
|                                             | FAIL                                                                                        | 0      |
+---------------------------------------------+---------------------------------------------------------------------------------------------+--------+

While these tests can demonstrate that the generated random numbers have statistical randomness, they do not guarantee cryptographic security. Although it is difficult to reverse-engineer the seed using only the generated values without knowing the initial state or seed, there is still a possibility, however small.

Nevertheless, Go's math/rand package recommends using it only for parts that do not require security and suggests using crypto/rand (partially implemented in Gno) when security is crucial. Therefore, there seems to be no significant issue with providing random function support itself.

Currently, the initial seed value is set to be specified by the user, as hiding it remains a challenge. If it becomes possible to utilize the operating system's entropy source in the future, this aspect could be improved.

@thehowl
Copy link
Member

thehowl commented Jun 28, 2024

Hey @notJoon, apologies for the late reply on this. And thank you for the great PR as always.

We have a requirement for a random number generator in Gno for an initiative at Gophercon, hence why I created #2455. Because it ports over math/rand/v2, which uses PCG, do you think we need this now?

@notJoon
Copy link
Member Author

notJoon commented Jul 1, 2024

Hey @notJoon, apologies for the late reply on this. And thank you for the great PR as always.

We have a requirement for a random number generator in Gno for an initiative at Gophercon, hence why I created #2455. Because it ports over math/rand/v2, which uses PCG, do you think we need this now?

Thanks for reply @thehowl. I implemented this as a temporary measure when the random package was not available, so I think it may not have much significance now. Also, there was still some ambiguity because still couldn't access the entropy source.

@notJoon notJoon closed this Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🧾 package/realm Tag used for new Realms or Packages.
Projects
Status: Done
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants