
Dictionaries-based implementation appears 45x faster for rand #64

Closed
JockLawrie opened this issue Oct 11, 2023 · 5 comments


JockLawrie commented Oct 11, 2023

Hi there,

I need a categorical distribution with a fast rand method to be used in simulations.
I found the bare-bones implementation below to be about 45x faster for rand on my machine.
It is based on Dictionaries.jl.

Hopefully I've done the benchmarking correctly.
If so, is a Dictionaries.jl-based implementation suitable for CategoricalDistributions.jl?
Can it cover the use cases that CategoricalDistributions.jl satisfies?

```julia
using Dictionaries
using Distributions
using Random

struct CatDist{K} <: Distribution{Univariate, Discrete}
    dict::Dictionary{K, Float64}  # TODO: constructor that checks probs, unique levels, etc.
end

CatDist(levels, probs) = CatDist(Dictionary(levels, probs))

Base.eltype(d::CatDist{K}) where {K} = K

function Base.rand(rng::Random.AbstractRNG, d::CatDist{K}) where {K}
    # Inverse-CDF sampling via a linear scan over the (level, probability) pairs.
    u = rand(rng)
    total = 0.0
    local level
    for (lev, prob) in pairs(d.dict)
        level = lev
        total += prob
        u <= total && return level
    end
    return level  # guard: round-off can leave `total` marginally below `u`
end

################################################################
using BenchmarkTools

d1 = Categorical([0.5, 0.4, 0.1])
@benchmark rand($(d1))  # 16 ns

d2 = CatDist(["maybe", "no", "yes"], [0.5, 0.4, 0.1])
@benchmark rand($(d2))  # 16 ns

using CategoricalDistributions
d3 = UnivariateFinite(["maybe", "no", "yes"], [0.5, 0.4, 0.1]; pool=missing)
@benchmark rand($(d3))  # 730 ns
```
ablaom commented Oct 15, 2023

Thanks @JockLawrie for the insights here.

Started playing around with this but tripped on #65. I'll return to this after addressing that.

ablaom commented Oct 18, 2023

Okay, I've looked into this a bit more.

**Choice of container**

Currently the implementation already uses dictionaries, but:

  1. We use `LittleDict` from OrderedCollections.jl.
  2. The dicts are keyed on CategoricalArrays' internal reference integers, not on the raw labels.

If we swap the `Dictionaries.Dictionary` in your implementation for a `LittleDict`, there is no significant change in performance (benchmarks attached below), so the container is not the issue here. (Side note: we currently use the `Vector`-backed version of `LittleDict`, which might be slightly improved by using the `Tuple`-backed version, i.e., by freezing the dicts as sketched below, but I doubt the benefit would be much.)
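
For illustration, here is what freezing would look like (a minimal sketch using the `freeze` helper that OrderedCollections.jl exports; I haven't benchmarked this):

```julia
using OrderedCollections

# Vector-backed LittleDict, as currently used:
d = LittleDict(["maybe", "no", "yes"], [0.5, 0.4, 0.1])

# `freeze` returns an immutable LittleDict backed by Tuples; for small,
# fixed-size dictionaries this can let the compiler specialize lookups.
frozen = freeze(d)
```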

**Precomputation of the CDF**

In the current implementation, a cumulative probability vector (CDF) is precomputed, following the implementation of `Distributions.Categorical` (which may have changed since we adapted it). This avoids repeating arithmetic when drawing multiple samples at once; the idea is sketched below. So, if we repeat your experiment for 10_000 samples, but use `UnivariateFinite`'s implementation for multiple samples, then the performance is basically the same. If we increase the number of classes from 3 to 100, the `UnivariateFinite` implementation is faster (but still the same order of magnitude). Multiple sampling may be even faster if we switch to the alias method and precompute the alias tables.
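
To illustrate the technique (this is only a sketch of the general idea, not the package's actual code; `sample_many`, `levels`, and `probs` are made up for this example):

```julia
using Random

# Precompute the CDF once, then each draw is a binary search:
# O(k) setup and O(log k) per sample, instead of an O(k) scan per sample.
function sample_many(rng::Random.AbstractRNG, levels, probs::Vector{Float64}, n::Integer)
    cdf = cumsum(probs)
    cdf[end] = 1.0  # absorb floating-point round-off at the top
    return [levels[searchsortedfirst(cdf, rand(rng))] for _ in 1:n]
end

sample_many(Random.default_rng(), ["maybe", "no", "yes"], [0.5, 0.4, 0.1], 10_000)
```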

**Next steps for the single-sample case**

It doesn't seem likely to me that the slowdown you have observed is mostly due to the precomputation of the CDF, but I haven't tested this. My guess is that the slowdown is somewhere else. Perhaps reimplementing `UnivariateFinite` using dictionaries keyed directly on the raw labels would be faster, but doing so might slow other methods down; I'm not sure. More investigation is needed to justify the substantial refactor that this would entail.

Needs CategoricalDistributions 0.1.12:
```julia
using Dictionaries
using OrderedCollections
using Distributions
using Random
using CategoricalDistributions

struct CatDist{K} <: Distribution{Univariate, Discrete}
    dict::Dictionary{K, Float64}  # TODO: constructor that checks probs, unique levels, etc.
end

CatDist(levels, probs) = CatDist(Dictionary(levels, probs))

Base.eltype(d::CatDist{K}) where {K} = K

function Base.rand(rng::Random.AbstractRNG, d::CatDist{K}) where {K}
    u = rand(rng)
    total = 0.0
    local level
    for (lev, prob) in pairs(d.dict)
        level = lev
        total += prob
        u <= total && return level
    end
    return level  # guard: round-off can leave `total` marginally below `u`
end

struct LittleCatDist{K} <: Distribution{Univariate, Discrete}
    dict::LittleDict{K, Float64, Vector{K}, Vector{Float64}}
end

LittleCatDist(levels, probs) = LittleCatDist(LittleDict(levels, probs))

Base.eltype(d::LittleCatDist{K}) where {K} = K

function Base.rand(rng::Random.AbstractRNG, d::LittleCatDist{K}) where {K}
    u = rand(rng)
    total = 0.0
    local level
    for (lev, prob) in pairs(d.dict)
        level = lev
        total += prob
        u <= total && return level
    end
    return level  # guard: round-off can leave `total` marginally below `u`
end

## Benchmarks for a single sample

using BenchmarkTools

categorical = Categorical([0.5, 0.4, 0.1])
catdist = CatDist(["maybe", "no", "yes"], [0.5, 0.4, 0.1])
littlecatdist = LittleCatDist(["maybe", "no", "yes"], [0.5, 0.4, 0.1])
d = UnivariateFinite(["maybe", "no", "yes"], [0.5, 0.4, 0.1]; pool=missing)

@btime rand($categorical)    # 11.6 ns
@btime rand($catdist)        # 13.7 ns
@btime rand($littlecatdist)  # 12.1 ns
@btime rand($d)              # 855 ns

## Benchmarks for multiple samples

N = 10_000
@btime rand($categorical, $N);                # 141 μs
@btime [rand($catdist) for i in 1:$N];        # 136 μs
@btime [rand($littlecatdist) for i in 1:$N];  # 128 μs
@btime rand($d, $N);                          # 137 μs

## Benchmarks for a larger number of classes

N = 100_000
M = 100
p = rand(M)
p = p ./ sum(p)
labels = 1:M

categorical = Categorical(p)
catdist = CatDist(labels, p)
littlecatdist = LittleCatDist(labels, p)
d = UnivariateFinite(labels, p; pool=missing)

@btime rand($categorical, $N);                # 1.41 ms
@btime [rand($catdist) for i in 1:$N];        # 6.50 ms
@btime [rand($littlecatdist) for i in 1:$N];  # 6.40 ms
@btime rand($d, $N);                          # 4.62 ms
```

ablaom commented Oct 18, 2023

@JockLawrie For your use case, do you actually need a fast call of `rand(d)`, or is it enough that `rand(d, n)` is fast for `n` much bigger than 1?

JockLawrie (author) commented:

Thanks for looking into this, Anthony; much appreciated.

I do need a fast `rand(d)` for an agent-based model with millions of agents, where the distribution `d` changes for each individual over time and across replications. That is, I'm calling `rand(d)` for billions of different `d`s in sequence.

Looks like the `LittleDict` implementation is just as fast as the `Dictionary` implementation, so there's no need to switch from the former to the latter.

Perhaps a pre-computed CDF for `rand(d, N)` but not for `rand(d)`?
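
A minimal sketch of that suggestion, reusing the `CatDist` type from above (the helper name `rand_many` is made up, chosen so as not to collide with the generic `rand(rng, d, n)` methods that Distributions.jl already supplies):

```julia
# Hypothetical helper, not part of any package: precompute the CDF once per
# call and binary-search per draw, as in the sketch above. Single draws keep
# using the scan-based `rand(rng, d)` method, so they pay no setup cost.
function rand_many(rng::Random.AbstractRNG, d::CatDist{K}, n::Integer) where {K}
    levels = collect(keys(d.dict))  # Dictionaries.jl keys, in iteration order
    cdf = cumsum(collect(d.dict))   # a `Dictionary` iterates its values
    cdf[end] = 1.0                  # absorb floating-point round-off
    return K[levels[searchsortedfirst(cdf, rand(rng))] for _ in 1:n]
end
```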

ablaom commented Oct 23, 2023

Closed by #68
