Zipf: let n have type F #1518

dhardy · 2024-10-24T17:13:03Z

Added a CHANGELOG.md entry

Summary

Change the parameter type of Zipf's n to F

Motivation

Details

The CDF test fails:

---- zipf stdout ----
KS statistic: 0.13359213049244018
Critical value: 0.00195
thread 'zipf' panicked at rand_distr/tests/cdf.rs:119:5:
assertion failed: ks_statistic < critical_value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The value stability tests (only 4 samples; bottom of zipf.rs) did not fail.

benjamin-lieser · 2024-10-24T17:24:28Z

You cannot use the continuous test, because the CDF is not continuous. It does work if you replace
test_continuous(seed as u64, dist, |k| cdf(k, n, x));
with
test_discrete(seed as u64, dist, |k| cdf(k as f64, n, x));
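To illustrate why a continuous test rejects a discrete distribution, here is a minimal sketch of the finite Zipf CDF as a step function. The helper names (`harmonic`, `zipf_cdf`, `jump_at`) are hypothetical and are not taken from `rand_distr/tests/cdf.rs`; the CDF is computed naively from generalized harmonic numbers. A continuous KS comparison cannot do better than the largest jump, and for n = 1000, s = 1 the jump at 1 is 1/H_1000 ≈ 0.1336, which appears to match the KS statistic reported in the failure above.

```rust
/// Generalized harmonic number H_{k,s} = sum_{i=1}^{k} i^(-s).
fn harmonic(k: u64, s: f64) -> f64 {
    (1..=k).map(|i| (i as f64).powf(-s)).sum()
}

/// CDF of the finite Zipf distribution on {1, ..., n}, evaluated at a real x.
/// It is a step function: constant between integers, jumping at each integer.
fn zipf_cdf(x: f64, n: u64, s: f64) -> f64 {
    if x < 1.0 {
        return 0.0;
    }
    let k = (x.floor() as u64).min(n);
    harmonic(k, s) / harmonic(n, s)
}

/// Size of the jump of the CDF at the integer k, i.e. P(X = k).
fn jump_at(k: u64, n: u64, s: f64) -> f64 {
    zipf_cdf(k as f64, n, s) - zipf_cdf(k as f64 - 0.5, n, s)
}

fn main() {
    // Parameters of the first test case in the thread: n = 1000, s = 1.0.
    let (n, s) = (1000, 1.0);
    println!("P(X = 1) = {:.5}", jump_at(1, n, s));
    println!("P(X = 2) = {:.5}", jump_at(2, n, s));
}
```

A discrete test compares the empirical and theoretical CDFs only at the support points, so these jumps do not count as deviations.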

dhardy · 2024-10-25T07:20:34Z

This should be mostly ready, but do we want to make this change?

Also note: #1517 added a note about casting results to ints being safe as a result of the input bounding the output; the input can now be larger too.
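As a hedged illustration of the casting concern: Rust's `as` cast from a float to an integer saturates, so a sample above `u64::MAX` would silently become `u64::MAX` rather than wrap or cause undefined behaviour. The sketch below uses hypothetical helpers (`to_u64`, `fits_in_u64`) and does not reproduce the wording of the note from #1517.

```rust
/// 2^64 as an f64; this is also what `u64::MAX as f64` rounds to.
const TWO_POW_64: f64 = 18446744073709551616.0;

/// Cast a sampled float to `u64` using Rust's saturating `as` semantics.
fn to_u64(sample: f64) -> u64 {
    sample as u64
}

/// Check whether a non-negative float is strictly below 2^64, i.e. whether
/// the cast keeps its integer part instead of saturating.
fn fits_in_u64(sample: f64) -> bool {
    sample >= 0.0 && sample < TWO_POW_64
}

fn main() {
    // With an integer-typed n <= u64::MAX the output was bounded by n.
    // With a float n the output may exceed that range (assumed values below).
    for sample in [1000.0, 1e30] {
        println!(
            "{sample:e} -> {} (fits: {})",
            to_u64(sample),
            fits_in_u64(sample)
        );
    }
}
```

Callers who cast samples to an integer type may therefore want an explicit range check when n can be larger than the target type's maximum.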

benjamin-lieser · 2024-10-25T08:12:17Z

I am not sure we should do this. I have no quantitative proof, but I doubt that for values of n larger than u64::MAX there is a measurable difference from the Zeta distribution. So if such large values are desired, there is already a solution, albeit a less elegant one, because the user has to handle the case distinction themselves.
I guess most users will use Zipf with smaller integers, but to be honest I have never needed it myself, so this is a very vague guess.

Another point might be the name. If you search for Zipf, you will mostly find the Zeta distribution; SciPy calls our Zipf Zipfian. Maybe this would be a better name.

Edit: I guess there is still a difference from Zeta, because even for 2**64 there is significant mass in the tail of the harmonic series. Also, Zeta does not support s=1. So there is definitely a potential use case for this.
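A rough numerical sketch of this point, under stated assumptions: the tail sum above n is estimated by the integral n^(1-s)/(s-1), and zeta(s) is approximated by a partial sum with the first Euler-Maclaurin correction terms. The function names are hypothetical and the exponents (1.01 and 2.0) are chosen only for illustration. For s = 1 itself the series diverges, so the Zeta distribution is undefined while the finite Zipf normalizer stays finite.

```rust
/// Approximate the Riemann zeta function for s > 1 using a partial sum plus
/// the first Euler-Maclaurin correction terms.
fn zeta_approx(s: f64) -> f64 {
    let m = 1000u32;
    let head: f64 = (1..m).map(|k| (k as f64).powf(-s)).sum();
    let mf = m as f64;
    head + mf.powf(1.0 - s) / (s - 1.0) + 0.5 * mf.powf(-s)
}

/// Approximate fraction of the Zeta(s) probability mass lying above n,
/// using the integral estimate n^(1-s) / (s-1) for the tail sum.
fn zeta_tail_fraction(n: f64, s: f64) -> f64 {
    n.powf(1.0 - s) / (s - 1.0) / zeta_approx(s)
}

fn main() {
    let n = 2f64.powi(64);
    // s close to 1: a large share of the Zeta mass sits above 2^64.
    println!("s = 1.01: {:.3}", zeta_tail_fraction(n, 1.01));
    // s = 2: the mass above 2^64 is negligible.
    println!("s = 2.00: {:e}", zeta_tail_fraction(n, 2.0));
}
```

Under this estimate, truncating at 2^64 changes the distribution noticeably when s is close to 1, and hardly at all for larger s.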

dhardy · 2024-10-25T09:16:57Z

I'll wait for @vks to comment.

vks

Looks good, but I'm a bit confused why there is a test failure.

vks · 2024-10-25T09:54:10Z

rand_distr/tests/cdf.rs

@@ -385,7 +385,7 @@ fn zipf() {
    let parameters = [(1000, 1.0), (500, 2.0), (1000, 0.5)];

    for (seed, (n, x)) in parameters.into_iter().enumerate() {
-        let dist = rand_distr::Zipf::new(n, x).unwrap();
+        let dist = rand_distr::Zipf::new(n as f64, x).unwrap();


You might as well change the values in parameters to floats.

benjamin-lieser · 2024-10-25T10:01:46Z

> Looks good, but I'm a bit confused why there is a test failure.

Which test failure?

Do you have an opinion on the name? Zipf vs Zipfian

Zipf: let n have type F

af3251c

dhardy requested review from vks and benjamin-lieser October 24, 2024 17:13

dhardy added 4 commits October 24, 2024 18:49

Revert CDF test

6e8f8d8

Fix benchmark

8e709c3

CHANGELOG

242b32b

Merge branch 'master' into zeta-cts

163f1b3

dhardy marked this pull request as ready for review October 25, 2024 07:19

vks approved these changes Oct 25, 2024
