
Integration with MeasureTheory.jl #8

Open
ablaom opened this issue Oct 30, 2021 · 5 comments

ablaom commented Oct 30, 2021

The atomic objects defined in this package are just non-negative measures over a labelled sample space (see here) and so ought to fit into the MeasureTheory.jl framework.

@cscherrer It would be great if you could give a run-down of what's required. This package is a port of functionality still in MLJBase, with plans to replace it there. While it's now publicly available, I've not promoted it at all, and there's scope for fixing things you may not like.

@davibarreira

Very nice package! I've been looking for a proper way to deal with finite discrete measures for a while, since there is no such type of distribution in Distributions.jl. Now, am I misled by the name, or is this implementation not well suited for genuinely multivariate distributions, i.e. when instead of unordered labels we have something like many samples from R^n?

@cscherrer

Hi @ablaom, thanks for the ping :)

Generally, for some d::D to be a Measure requires two things:

  1. At least one method of MeasureBase.logdensity(d::D, x) or MeasureBase.density(d::D, x), returning a float
  2. A method basemeasure(d::D), returning another value satisfying the same interface.

It's valid for a measure to have itself as a base measure, which would make that measure "primitive". In this case we should get a log-density of 0.0.
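For concreteness, here is a minimal sketch (mine, not from this thread) of what that two-method interface could look like for a toy labelled measure. It assumes the MeasureBase.logdensity / MeasureBase.basemeasure names described above; LabelCounting and FiniteLabelled are hypothetical types invented purely for illustration.

```julia
using MeasureBase

# Hypothetical "primitive" base measure over labels: it is its own base
# measure, and its log-density with respect to itself is 0.0.
struct LabelCounting end
MeasureBase.logdensity(::LabelCounting, x) = 0.0
MeasureBase.basemeasure(b::LabelCounting) = b

# Hypothetical finite measure over labels, defined relative to LabelCounting.
struct FiniteLabelled{L}
    logweights::Dict{L,Float64}   # label => log-weight
end
MeasureBase.logdensity(d::FiniteLabelled, x) = d.logweights[x]
MeasureBase.basemeasure(::FiniteLabelled) = LabelCounting()

# Usage: a two-label example
d = FiniteLabelled(Dict(:a => log(0.2), :b => log(0.8)))
MeasureBase.logdensity(d, :b)   # ≈ log(0.8)
```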

The interface is still in development, and I'd welcome collaboration. "Primitiveness" should probably be trait-based.

If A → B means "A.jl has B.jl as a dependency", I can imagine three possible setups:

  1. MeasureTheory → CategoricalDistributions. This could make sense if CD is light-weight, flexible, and fairly general, with good performance.
  2. CategoricalDistributions → MeasureBase. This would require a little more for things to work well, in that you'd need to define a basemeasure instance. That should be easy, but it also makes your package a bit heavier (but still not bad IMO, MB is pretty light-weight)
  3. MeasureBase ← CategoricalMeasures → CategoricalDistributions. Here the middle package would be a new glue package extending the other two to work well together.

In any case, in MeasureTheory we usually have a logdensity that depends only on the data, with other terms pushed into the base measure. This may not be an issue here; IIRC there's no normalization factor in this case. But it may come up if non-normalized weights are included. Anyway, if there is a normalization, I'd suggest having a logpdf method that's something like

logpdf(cd::CategoricalDistribution, x) = datadependentterms(cd, x) + normalizationterms(cd)

though probably with different names ;)

Splitting things up in this way makes it easy to optimize product measures, pulling the normalization terms out of the loop.
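As a rough illustration of that optimization (my sketch, not from the thread; datadependentterms and normalizationterms are the hypothetical helpers from the snippet above):

```julia
# Sketch: log-density of n i.i.d. observations under the same distribution.
# Per-observation terms are summed in the loop, while the normalization term
# is computed once and multiplied by n, i.e. pulled out of the loop.
function logpdf_iid(cd, xs)
    s = sum(x -> datadependentterms(cd, x), xs)
    return s + length(xs) * normalizationterms(cd)
end
```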

ablaom commented Oct 31, 2021

@cscherrer Thanks for that.

@davibarreira No, you are not misled. This is not yet multivariate.

@davibarreira

Thanks for the answer @ablaom.
