
Dictionaries-based implementation appears 45x faster for rand #64

Closed
JockLawrie opened this issue Oct 11, 2023 · 5 comments


JockLawrie commented Oct 11, 2023

Hi there,

I need a categorical distribution with a fast rand method to be used in simulations.
I found the bare-bones implementation below to be about 45x faster for rand on my machine.
It is based on Dictionaries.jl.

Hopefully I've done the benchmarking correctly.
If so, is a Dictionaries.jl-based implementation suitable for CategoricalDistributions.jl?
Can it cover the use cases that CategoricalDistributions.jl satisfies?

```julia
using Dictionaries
using Distributions
using Random

struct CatDist{K} <: Distribution{Univariate, Discrete}
    dict::Dictionary{K, Float64}  # TODO: constructor that checks probs, unique levels, etc.
end

CatDist(levels, probs) = CatDist(Dictionary(levels, probs))

Base.eltype(d::CatDist{K}) where {K} = K

function Base.rand(rng::Random.AbstractRNG, d::CatDist{K}) where {K}
    # Inverse-CDF sampling via a linear scan over the (level, probability) pairs.
    u = rand(rng)
    total = 0.0
    local level
    for (lev, prob) in pairs(d.dict)
        level = lev
        total += prob
        u <= total && return level
    end
    return level  # guard: round-off can leave `total` marginally below `u`
end

################################################################
using BenchmarkTools

d1 = Categorical([0.5, 0.4, 0.1])
@benchmark rand($(d1))  # 16 ns

d2 = CatDist(["maybe", "no", "yes"], [0.5, 0.4, 0.1])
@benchmark rand($(d2))  # 16 ns

using CategoricalDistributions
d3 = UnivariateFinite(["maybe", "no", "yes"], [0.5, 0.4, 0.1]; pool=missing)
@benchmark rand($(d3))  # 730 ns
```
ablaom commented Oct 15, 2023

Thanks @JockLawrie for the insights here.

Started playing around with this but tripped on #65. I'll return to this after addressing that.

ablaom commented Oct 18, 2023

Okay, I've looked into this a bit more.

**Choice of container**

Currently the implementation already uses dictionaries, but:

  1. We use `LittleDict` from OrderedCollections.jl.
  2. The dicts are keyed on CategoricalArrays' internal reference integers, not on the raw labels.

If we swap the `Dictionaries.Dictionary` in your implementation for a `LittleDict`, there is no significant change in performance (benchmarks attached below), so the container is not the issue here. (Side note: we currently use the `Vector`-backed version of `LittleDict`, which might be slightly improved by using the `Tuple`-backed version, i.e., by freezing the dicts as sketched below, but I doubt the benefit would be much.)
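
For illustration, here is what freezing would look like (a minimal sketch using the `freeze` helper that OrderedCollections.jl exports; I haven't benchmarked this):

```julia
using OrderedCollections

# Vector-backed LittleDict, as currently used:
d = LittleDict(["maybe", "no", "yes"], [0.5, 0.4, 0.1])

# `freeze` returns an immutable LittleDict backed by Tuples; for small,
# fixed-size dictionaries this can let the compiler specialize lookups.
frozen = freeze(d)
```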

**Precomputation of the CDF**

In the current implementation, a cumulative probability vector (CDF) is precomputed, following the implementation of `Distributions.Categorical` (which may have changed since we adapted it). This avoids repeating arithmetic when drawing multiple samples at once; the idea is sketched below. So, if we repeat your experiment for 10_000 samples, but use `UnivariateFinite`'s implementation for multiple samples, then the performance is basically the same. If we increase the number of classes from 3 to 100, the `UnivariateFinite` implementation is faster (but still the same order of magnitude). Multiple sampling may be even faster if we switch to the alias method and precompute the alias tables.
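
To illustrate the technique (this is only a sketch of the general idea, not the package's actual code; `sample_many`, `levels`, and `probs` are made up for this example):

```julia
using Random

# Precompute the CDF once, then each draw is a binary search:
# O(k) setup and O(log k) per sample, instead of an O(k) scan per sample.
function sample_many(rng::Random.AbstractRNG, levels, probs::Vector{Float64}, n::Integer)
    cdf = cumsum(probs)
    cdf[end] = 1.0  # absorb floating-point round-off at the top
    return [levels[searchsortedfirst(cdf, rand(rng))] for _ in 1:n]
end

sample_many(Random.default_rng(), ["maybe", "no", "yes"], [0.5, 0.4, 0.1], 10_000)
```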

**Next steps for the single-sample case**

It doesn't seem likely to me that the slowdown you have observed is mostly due to the precomputation of the CDF, but I haven't tested this. My guess is that the slowdown is somewhere else. Perhaps reimplementing `UnivariateFinite` using dictionaries keyed directly on the raw labels would be faster, but doing so might slow other methods down; I'm not sure. More investigation is needed to justify the substantial refactor that this would entail.

Needs CategoricalDistributions 0.1.12:
```julia
using Dictionaries
using OrderedCollections
using Distributions
using Random
using CategoricalDistributions

struct CatDist{K} <: Distribution{Univariate, Discrete}
    dict::Dictionary{K, Float64}  # TODO: constructor that checks probs, unique levels, etc.
end

CatDist(levels, probs) = CatDist(Dictionary(levels, probs))

Base.eltype(d::CatDist{K}) where {K} = K

function Base.rand(rng::Random.AbstractRNG, d::CatDist{K}) where {K}
    u = rand(rng)
    total = 0.0
    local level
    for (lev, prob) in pairs(d.dict)
        level = lev
        total += prob
        u <= total && return level
    end
    return level  # guard: round-off can leave `total` marginally below `u`
end

struct LittleCatDist{K} <: Distribution{Univariate, Discrete}
    dict::LittleDict{K, Float64, Vector{K}, Vector{Float64}}
end

LittleCatDist(levels, probs) = LittleCatDist(LittleDict(levels, probs))

Base.eltype(d::LittleCatDist{K}) where {K} = K

function Base.rand(rng::Random.AbstractRNG, d::LittleCatDist{K}) where {K}
    u = rand(rng)
    total = 0.0
    local level
    for (lev, prob) in pairs(d.dict)
        level = lev
        total += prob
        u <= total && return level
    end
    return level  # guard: round-off can leave `total` marginally below `u`
end

## Benchmarks for a single sample

using BenchmarkTools

categorical = Categorical([0.5, 0.4, 0.1])
catdist = CatDist(["maybe", "no", "yes"], [0.5, 0.4, 0.1])
littlecatdist = LittleCatDist(["maybe", "no", "yes"], [0.5, 0.4, 0.1])
d = UnivariateFinite(["maybe", "no", "yes"], [0.5, 0.4, 0.1]; pool=missing)

@btime rand($categorical)    # 11.6 ns
@btime rand($catdist)        # 13.7 ns
@btime rand($littlecatdist)  # 12.1 ns
@btime rand($d)              # 855 ns

## Benchmarks for multiple samples

N = 10_000
@btime rand($categorical, $N);                # 141 μs
@btime [rand($catdist) for i in 1:$N];        # 136 μs
@btime [rand($littlecatdist) for i in 1:$N];  # 128 μs
@btime rand($d, $N);                          # 137 μs

## Benchmarks for a larger number of classes

N = 100_000
M = 100
p = rand(M)
p = p ./ sum(p)
labels = 1:M

categorical = Categorical(p)
catdist = CatDist(labels, p)
littlecatdist = LittleCatDist(labels, p)
d = UnivariateFinite(labels, p; pool=missing)

@btime rand($categorical, $N);                # 1.41 ms
@btime [rand($catdist) for i in 1:$N];        # 6.50 ms
@btime [rand($littlecatdist) for i in 1:$N];  # 6.40 ms
@btime rand($d, $N);                          # 4.62 ms
```

ablaom commented Oct 18, 2023

@JockLawrie For your use case, do you actually need a fast call of `rand(d)`, or is it enough that `rand(d, n)` is fast for `n` much bigger than 1?

JockLawrie (author) commented:

Thanks for looking into this, Anthony; much appreciated.

I do need a fast `rand(d)` for an agent-based model with millions of agents, where the distribution `d` changes for each individual over time and across replications. That is, I'm calling `rand(d)` for billions of different `d`s in sequence.

Looks like the `LittleDict` implementation is just as fast as the `Dictionary` implementation, so there's no need to switch from the former to the latter.

Perhaps a pre-computed CDF for `rand(d, N)` but not for `rand(d)`?
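
A minimal sketch of that suggestion, reusing the `CatDist` type from above (the helper name `rand_many` is made up, chosen so as not to collide with the generic `rand(rng, d, n)` methods that Distributions.jl already supplies):

```julia
# Hypothetical helper, not part of any package: precompute the CDF once per
# call and binary-search per draw, as in the sketch above. Single draws keep
# using the scan-based `rand(rng, d)` method, so they pay no setup cost.
function rand_many(rng::Random.AbstractRNG, d::CatDist{K}, n::Integer) where {K}
    levels = collect(keys(d.dict))  # Dictionaries.jl keys, in iteration order
    cdf = cumsum(collect(d.dict))   # a `Dictionary` iterates its values
    cdf[end] = 1.0                  # absorb floating-point round-off
    return K[levels[searchsortedfirst(cdf, rand(rng))] for _ in 1:n]
end
```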

ablaom commented Oct 23, 2023

Closed by #68
