Add unconstrained/transformed variables to InferenceData #230

Closed
ColCarroll opened this issue Sep 12, 2018 · 12 comments · Fixed by #2086
Labels
Feature Request (New functionality requests from users)

Comments

@ColCarroll
Member

@ahartikainen suggested adding unconstrained variables to the schema somewhere. There are a few places they could go:

  1. Make a new unconstrained_posterior group (see the sketch after this list)
     Good:
     • Explicit is better than implicit: it is clear what this group is for
     Bad:
     • Should there also be an unconstrained_prior?
  2. Add them to the existing posterior group
     Good:
     • Can reuse existing coords/dims
     • Can add to prior in the same way
     Bad:
     • Would need a naming convention, which we've managed to avoid so far
     • PyMC3 currently does this, and it is kind of a pain to have a flag everywhere to include or exclude these variables
  3. Don't include them at all
     Good:
     • No work :)
     Bad:
     • Bounds and domain information are not captured anywhere currently
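
A minimal sketch of what option 1 could look like from the converter side. The unconstrained_posterior group name, the sigma_log__ variable name, and the data are all hypothetical; InferenceData.add_groups accepts arbitrary extra groups, so this only illustrates the shape of the idea, not a committed API:

import numpy as np
import arviz as az

# Hypothetical posterior: 4 chains x 500 draws of a positive-constrained variable.
sigma = np.exp(np.random.randn(4, 500))
idata = az.from_dict(posterior={"sigma": sigma})

# Option 1 sketch: store the values the sampler actually worked with
# (here a simple log transform) in a separate, explicitly named group.
idata.add_groups({"unconstrained_posterior": {"sigma_log__": np.log(sigma)}})
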
@aloctavodia
Contributor

I like option 3! :-)

If the main purpose of including unconstrained variables is to explore things like divergences, then option 1 becomes the best option as there is no motivation (other than consistency) to add unconstrained_prior.

BTW, shouldn't the bounds and domain information be part of a variable type (or something similar)? Currently we only have this for discrete and continuous variables (because we have integer and float types). This would be super useful for circular variables, since we could automatically compute proper statistics and plots for them.
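
As a small illustration of the kind of automatic dispatch this could enable, here is a sketch assuming hypothetical domain metadata attached to a variable; the circular summary uses scipy.stats directly and is not an existing ArviZ feature:

import numpy as np
from scipy import stats

# Hypothetical metadata a richer variable type could carry.
var_meta = {"name": "wind_direction", "domain": "circular", "bounds": (-np.pi, np.pi)}

samples = np.random.vonmises(mu=1.0, kappa=2.0, size=1000)

# With domain information available, summaries could be picked automatically:
if var_meta["domain"] == "circular":
    mean = stats.circmean(samples, low=-np.pi, high=np.pi)
    sd = stats.circstd(samples, low=-np.pi, high=np.pi)
else:
    mean, sd = samples.mean(), samples.std()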

@ahartikainen
Contributor

I think option 2 could be the easiest: add an option to transform everything to unconstrained space if the user wants it. That way we do not really need to add anything to the schema, unless there is a notion of metadata somewhere.

I don't think users normally would need these values.

@junpenglao
Contributor

Vote for option 2 for the convenience of backward compatibility with pymc3 😬

@OriolAbril added the Feature Request label on Jan 15, 2020
@dfm

dfm commented Mar 13, 2021

I just came across this because I often find myself wanting access to the transformed variables in a PyMC3 trace. I often need this in order to evaluate model components after sampling (for example, I often want to plot the model prediction on a fine grid that is too large/memory intensive to save as a deterministic, but is fine to compute at a few points in a thinned chain). Right now, it looks like it would be easy enough to allow opt-in support for PyMC3 here:

var_names = self.pymc3.util.get_default_varnames(
    self.trace.varnames, include_transformed=False
)

since include_transformed is hard-coded to False. I'd be keen to make this a parameter of the PyMC3Converter to at least have it as an option (this would be option 2 from above). Any feelings about this?
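
For illustration, the opt-in could look roughly like this from the user side; the include_transformed keyword on az.from_pymc3 is hypothetical at this point, not an existing parameter:

import arviz as az

# Hypothetical opt-in flag on the PyMC3 converter (option 2 from above):
# transformed variables such as a_log__ would then show up in idata.posterior
# next to the constrained ones.
idata = az.from_pymc3(trace, include_transformed=True)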

@ColCarroll
Member Author

hey dan! I think I'm generally on team option 3 still, but also on team "let's help out science". So I'm totally on board with you adding a little backdoor option to allow this on pymc3, and if you do, maybe I can open a second issue requesting help to add it to various other backends (or even rethinking the design)?

@ahartikainen
Contributor

Do we want transformed vars in the main idata or in a new idata containing the transformed data?

Or new groups?

@OriolAbril
Member

I think if present they should be in the posterior. That is what I have done so far whenever I need to transform variables, and we have some examples in InferenceData methods like map that transform variables post sampling and store them in the same group as the original variable (some transforms also rescale observations + posterior predictive at the same time, not only posterior variables).

@aseyboldt
Contributor

I would also like to see a way to get the transformed variables in the trace. The fact that this isn't possible right now is pretty much the only reason I still use the original trace object of pymc3 quite a bit. I think I might even prefer if one of option 1 or 2 was the default behavior.
The most common use cases I can see would be:

  • When diagnosing convergence issues the untransformed space often doesn't tell you much. If the transformation is just a log transform, that is easy to fix, but say for a Dirichlet this isn't the case anymore. I think we should always think in the transformed space when making scatter plots of the posteriors, looking for funnels, correlations etc.
  • Anything that uses the logp has to also take into account which space we are talking about. The logp that is stored in the trace is with respect to the unconstrained space, so it doesn't even make much sense when you don't know what this is.
  • If we interact with the model a bit more, because for example we want to also look at the gradients of the logp (this can tell us something about correlations), or if we want to compute sensitivities with respect to some other variable, we need to interact with the aesara graph of the logp function. This is defined in terms of the transformed variables though. Getting at the gradients is easy with the original trace:
# Evaluate the model logp and its gradient at each draw of the original
# PyMC3 trace (the trace points are already in the transformed space).
func = model.logp_dlogp_function()
func.set_extra_values({})
for point in trace.points():
    logp, grad = func(func.dict_to_array(point))

This is a lot more work if we do not have the pymc3 trace, because we first have to manually map the values to the transformed space.
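
To make the extra work concrete, a minimal sketch of mapping a positive-constrained variable from an InferenceData posterior back to the space the sampler used; it assumes the variable uses PyMC3's default log transform, and the data here are stand-ins:

import numpy as np
import arviz as az

# Stand-in posterior with one positive-constrained variable (4 chains x 500 draws).
idata = az.from_dict(posterior={"sigma": np.exp(np.random.randn(4, 500))})

# For a log-transformed variable the mapping back to the sampled space is trivial...
sigma_log__ = np.log(idata.posterior["sigma"])

# ...but for e.g. a Dirichlet (stick-breaking transform) this has to be redone
# by hand for every variable, which is the extra work described above.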

@OriolAbril
Member

I don't generally use unconstrained variables, but I think we should reach a decision on this. IMO option 3, which is what we defaulted to after not reaching a consensus, is generating too much friction and extra work (like the complications in #2040, cc @karink520) to be a viable option.

I think someone should write a proposal and call for a vote. I offer to write the proposal at some point next week if nobody can do it before then.

If I do end up writing the proposal, since I am not very familiar with the issue, it would be great to know what PyMC and Stan do here and what the general expectations are:

  • Do both Stan and PyMC allow returning transformed variables in the output with some "addendum" to the variable name?
  • I think most users don't care about this. So we'd want to exclude those by default. Is that something that you generally do on a model/run basis (i.e. we'd default to exclude those in the conversion directly unless requested) or something you generally decide on a plot/summary basis?

also cc @sethaxen and @mitzimorris who might be interested in the discussion

@mitzimorris
Contributor

mitzimorris commented Jun 7, 2022

Regarding the question of whether Stan returns variables on the unconstrained scale: in CmdStan, the sampler argument 'diagnostic_file' is used to report the model variables on the unconstrained scale.
Because this goes to its own CSV file, the variable names remain the same.

Does this answer the above question?

Having access to the parameters on the unconstrained scale is useful if you're interested in the posterior geometry and how well an algorithm can deal with it.

@ahartikainen
Contributor

One can access the unconstrained samples with pystan, but it needs to be done explicitly.
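
For reference, a sketch of how this explicit access looks with PyStan 2 (method names as in PyStan 2; the toy model is only for illustration):

import pystan

# Toy model with a positive-constrained parameter, which Stan log-transforms
# internally.
code = """
parameters { real<lower=0> sigma; }
model { sigma ~ normal(0, 1); }
"""
fit = pystan.StanModel(model_code=code).sampling(iter=200, chains=1)

# Explicitly map one constrained draw to the unconstrained space:
upars = fit.unconstrain_pars({"sigma": fit.extract()["sigma"][0]})
lp = fit.log_prob(upars)  # the logp is defined on the unconstrained scale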

@mitzimorris
Contributor

The underlying mechanism here is calling the Stan model class member function log_prob, which works with the unconstrained parameter values.
