Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add truncated distributions to rand / rand_distr / new rand-crate #1189

Open
CGMossa opened this issue Oct 6, 2021 · 10 comments
Open

Add truncated distributions to rand / rand_distr / new rand-crate #1189

CGMossa opened this issue Oct 6, 2021 · 10 comments
Labels
X-stale Outdated or abandoned work

Comments

@CGMossa
Copy link
Contributor

CGMossa commented Oct 6, 2021

Background

What is your motivation?
I'm writing a simulation and the truncated distributions is a great convenience for sampling
parameters.

What type of application is this? (E.g. cryptography, game, numerical simulation)
Statistics, numerical simulation, etc.

Feature request

I would like to have truncated distributions as part of (perhaps) rand_distr or it could be a new
crate rand_trunc maybe. I'm asking for some guidance here to figure out the best course of action.

@vks
Copy link
Collaborator

vks commented Oct 6, 2021

Could you please be a bit more specific about what you would like to see implemented? Is it just a few distributions or something more complex?

@CGMossa
Copy link
Contributor Author

CGMossa commented Oct 6, 2021

Personally, I would like to have a truncnorm implemented. But I think there is a general approach that can be used to implemented, in a way that you define a truncation (a,b) and then apply that to whatever distribution you want to be truncated.

@dhardy
Copy link
Member

dhardy commented Oct 6, 2021

If it's just that we could add a support function:

pub trait Distribution<T> {
    // ...
    
    fn sample_truncated<R: Rng + ?Sized>(&self, rng: &mut R, min: T, max: T) -> T where T: PartialOrd {
        loop {
            let x = self.sample(rng);
            if min <= x && x <= max {
                return x;
            }
        }
    }
}

@dhardy
Copy link
Member

dhardy commented Oct 6, 2021

Or even a wrapper:

pub trait Distribution<T> {
    // ....

    fn truncated(self, min: T, max: T) -> Truncated<Self> where T: PartialOrd { .. }
}

// parameters: (distr, min, max)
pub struct Truncated<T, D: Distribution<T>>(D, T, T);
impl<T, D: Distribution<T>> Distribution<T> for Truncated<T, D> { .. }

Edit: added bounds to Truncated.

@CGMossa
Copy link
Contributor Author

CGMossa commented Oct 6, 2021

I think either will work fine for now.
Maybe the wrapper-type is better; I believe that there are efficient sampling algorithms written for truncated-normal,
and maybe a few others, that would serve better if they were used, rather than the rejection-sampling scheme you've presented.
But they are a bit tricky, and I don't want to go through them right now.

I look at Julia's implementation of these, see here.

@dhardy
Copy link
Member

dhardy commented Oct 6, 2021

Ah, so this is Julia's implementation for truncated normal with finite μ. It does some computation on the bounds (which could in theory be done earlier when the bounds are available at construction time) then calls a different sampling algorithm.

Any implementing distribution could override the sample_trunctated method allowing use of a different algorithm, but precomputation on the bounds is not possible.

The Truncated wrapper should allow the same but only when Rust supports specialisation, which it currently doesn't, therefore it is worse (for now). It should also allow overriding the truncated constructor method, therefore allowing precomputation on the bounds, with the limitation that the same Truncated type must be returned.

A third option would be to make the Truncated type an associated type of the trait, which allows more flexibility, but since we don't wish to add that to Distribution (traits with associated types are not object safe), it would require a new trait:

pub trait TruncateDistribution<T: PartialOrd>: Distribution<T> {
    type Truncated;
    fn truncated(self, min: T, max: T) -> Self::Truncated;
}

However, we can't have a default implementation of this trait without specialisation, thus it's arguable that this is any better than simply constructing a new distribution for TruncatedNormal. In fact, we could add TruncatedNormal now and the TruncateDistribution trait later (once Rust supports specialisation).

Which brings us back to the question of what we want to do: add a convenience function for sampling-with-rejection or implement new specific truncated-distribution algorithms? If the latter, then I think the first move would be a PR adding TruncatedNormal into rand_distr, if you'd like to contribute.

@vks I think since we would likely never have many truncation-specific variants they should just live in rand_distr alongside the existing implementations?

@vks
Copy link
Collaborator

vks commented Oct 7, 2021

I agree that adding TruncatedNormal for now seems the way to go. I'm not worried about having too many truncated distributions in the root namespace, we can move them to a module (as a breaking change) should it be necessary.

I'm a bit skeptical about the rejection sampling approach for generic distributions, it seems very easy to get high rejection rates. Wouldn't inverse transform sampling work better here?

@dhardy dhardy added the X-stale Outdated or abandoned work label Jul 20, 2024
@dhardy
Copy link
Member

dhardy commented Jul 20, 2024

Conclusion: we will accept a PR adding a TruncatedNormal distribution to rand_distr, should someone wish to contribute.

We may revisit the idea of adding a generic support for truncated distributions later, but in general there isn't a whole lot of interest. (There's also the question of how this should work. The trivial answer is rejection sampling. @vks suggested inverse (CDF) transform sampling, but we don't have general representations of (inverse) CDF functions, so can't simply implement this in a wrapper..)

@dhardy dhardy closed this as completed Jul 20, 2024
@CGMossa
Copy link
Contributor Author

CGMossa commented Jul 20, 2024

Uhm.. I'm still interested in contributing here (even though it has been years). Was there maybe a list of tasks needed to do this, you mind sharing with me please.

@dhardy
Copy link
Member

dhardy commented Jul 20, 2024

Great! So, the first step should be to implement a truncated normal distribution. That article includes notes on multiple methods of sampling from the distribution.

We should start only with one distribution to focus on how best to implement this. Simply rejecting samples outside the acceptance range is the easiest option, but will run into performance issues if the acceptance region is small or only covers low-probability parts of the distribution, so we should look into alternatives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-stale Outdated or abandoned work
Projects
None yet
Development

No branches or pull requests

3 participants