-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance enhancement: freeze
the internal LittleDicts
#56
Comments
Hello @ablaom sir, I would like to work for this issue. CategoricalDistributions.jl/src/types.jl Line 174 in 5c19ce3
Can you help me what further I am supposed to do , I will be more than happy to contribute ! |
Thanks @Jay-sanjay for the offer of help. Please understand that there's no guarantee that this will substantially speed things up, but I thought it might be interesting to investigate. First thing is to establish some benchmarks, to make sure we don't make important things significantly worse. I've attached some suggested benchmarks at the end. So, have a look at those, get a little familiar with what is being benchmarked, and run the benchmarks locally on your machine. Adding new type parameter
|
return UnivariateFiniteArray(S, parent_decoder, LittleDict(pairs...)) |
Later, we could try some further optimisations to construct the tuple-based dicts more directly from the keys and values of the supplied dictionary, but let's leave that for later.
Maybe some unit tests will need fixing but hopefully that's it.
Benchmarking code
using CategoricalArrays
using CategoricalDistributions
import Random
import BenchmarkTools as B
import Distributions as D
using Pkg
# for benchmark reproducability:
Random.seed!(123)
# number of samples to take in benchmarks; increase to lower σ in benchmarks, decrease if
# impatient:
S = 100
# size of data for fitting:
N = 10000
# size of multi-samples and `UnivariateFiniteVector`s
K = 10_000
pool = collect("abcde") |> categorical;
data = rand(pool, N);
bench(b) = B.run(b, samples=S, seconds=60)
println()
Main.versioninfo()
println()
Pkg.status()
println()
b = B.@benchmarkable D.fit(UnivariateFinite, $data)
@info "`fit`" bench(b)
true_probs = rand(K);
b = B.@benchmarkable UnivariateFinite($true_probs, augment=true, pool=missing)
@info "constructors - binary case, missing pool" bench(b)
binary_pool = [false, true] |> categorical
b = B.@benchmarkable UnivariateFinite($binary_pool, $true_probs, augment=true)
@info "constructors - binary case, labels specified" bench(b)
probs = rand(K, 5);
probs = probs ./ sum(probs, dims=2);
b = B.@benchmarkable UnivariateFinite($probs, pool=missing)
@info "constructors - multiclass case, missing pool" bench(b)
b = B.@benchmarkable UnivariateFinite($pool, $probs)
@info "constructors - binary case, labels specified" bench(b)
d = UnivariateFinite(pool, fill(0.2, 5));
b = B.@benchmarkable rand($d)
@info "`rand` - single sample" bench(b)
b = B.@benchmarkable rand($d, K)
@info "`rand` - multiple samples" bench(b)
x = 'a'
b = B.@benchmarkable pdf($d, $x)
@info "`pdf` - single distribution" bench(b)
d_vector = UnivariateFinite(pool, probs);
b = B.@benchmarkable pdf.($d_vector, $x)
@info "`pdf` - multi-distribution" bench(b)
b = B.@benchmarkable mode($d)
@info "`mode` - single distribution" bench(b)
b = B.@benchmarkable mode.($d_vector)
@info "`mode` - multi-distribution" bench(b)
OrderedCollections.jl has a
freeze
command to make aLittleDict
immutable but faster.@OkonSamuel
The text was updated successfully, but these errors were encountered: