Council vote: unconstrained variables #2056

OriolAbril · 2022-06-21T16:53:00Z

#230 has been open for too long without any clear consensus. Following https://www.arviz.org/en/latest/governance/processes.html#council-decision-making-process I am opening this issue with a proposal on how to solve this and calling a vote of the RV council. cc @AlexAndorra @sethaxen @ColCarroll @ahartikainen @canyon289 @aloctavodia.

Proposal

Add a new unconstrained_posterior group (and other unconstrained_<group> as needed). I propose that the unconstrained representation of variables with a transformation (no need to duplicate those that are the same) are added to this group with the exact same name as they have in the posterior. The transform can be added as metadata either at variable level or as metadata of the unconstrained_... group and is left to the PPLs to add.

The whole unconstrained posterior can therefore be obtained by taking the variables in unconstrained_posterior and adding those missing using their values form the posterior group.

The text was updated successfully, but these errors were encountered:

OriolAbril · 2022-06-21T16:56:55Z

Note: We now enter the process of asking for clarifications on the proposal's wording, I am locking to prevent people not familiar with the governance process to join the discussion. We should remove the discussion label in 3 days and vote.

Ref: https://www.arviz.org/en/latest/governance/processes.html#call-for-a-vote

Extra note: being this the first "council vote" issue, we should probably also take notes on how it goes and update the process if we think it can be improved now that we'll have tried it

sethaxen · 2022-06-21T17:28:12Z

The transformation will be formulated as a transformation from the unconstrained to constrained variable, correct?

OriolAbril · 2022-06-21T17:43:37Z

The proposal is to store transformed and untransformed samples and make this part of the inferencedata schema.

The formulation of the transformation and how to do it is left completely on the hands of the PPLs and the people writing the converters. It can be the actual function in either direction stored as metadata (probably won't work saving to netcdf or zarr then but metadata can be excluded from the writing process, and would work while using the results after sampling); it can be a keyword that identifies the transform in that PPL, it might also be possible to combine both and use a function like turing_transforms_to/from_metadata...

sethaxen · 2022-06-21T17:50:45Z

Cool, then I vote yes on this proposal.

ahartikainen · 2022-06-21T19:29:53Z

What groups might have this property?

ColCarroll · 2022-06-21T20:43:52Z

I think the idea is that this would be one or two new groups:

posterior_transformed
prior_transformed (maybe -- I don't know why I would want that, or to what end I would use it)

I agree that we should leave this mostly up to the PPLs to implement, but as a suggestion, we might have a posterior:

sd ~ HalfNormal(...)
mu ~ Normal(...)
likelihood ~ Normal(data | mu, sd)

then I would suggest that

idata.posterior has variables sd and mu
idata.posterior_transformed has variable sd
idata.posterior_transformed has metadata including

"transforms": {"sd": "exp"}

ColCarroll · 2022-06-21T20:47:21Z

Note that this path also means it'd be nice if we provide a utility function that "merges" posterior_transformed with posterior, which, if xr.DataSet was like a dictionary, would be just

def get_full_unconstrained_posterior(idata):
  return idata.posterior.update(idata.posterior_transformed)

ahartikainen · 2022-06-21T20:52:58Z

Going to route of transforming only the subset of variables is ok.

We currently don't have functionality which could use this information, but in future we would need both posterior and transformed posterior for this functionality?

ColCarroll · 2022-06-21T21:51:09Z

I think right now only #2040 would use this.

OriolAbril · 2022-06-21T22:17:05Z

Please do not discuss alternatives to the proposal here. This issue is for voting the proposal only and it is locked to everyone but core contributors. Any discussion should happen (or should have already happened) in #230 which is open to all the ArviZ community.

I think right now only #2040 would use this.

Yes, this is currently blocking/dificulting #2040 which is why I called for a vote now. But why the vote was called doesn't really matter. From the governance:

Core Contributors can call for a vote to resolve a target issue they feel has been stale for too long and for which informal consensus appears unlikely. For a vote to be called, the target issue must be at least 2 months old.

The idea now is to make sure the proposal is clear and vote in order to unlock the stalemate.

What groups might have this property?

For now I only see uses to this for posterior and prior (assuming the priors have potentials or something that requires sampling them with hmc too in which case having unconstrained samples can help diagnose any issues that might arise). But the proposal is for the exact same pattern to apply to any other group if anyone finds a use for it.

I think the idea is that this would be one or two new groups:

Yes, there would be new groups called unconstrained_posterior, unconstrained_prior...

then I would suggest that

idata.posterior has variables sd and mu

idata.unconstrained_posterior has variable sd

This is what the proposal covers. All variables will have samples in the posterior group and only those with transformations will have the transformed samples also stored (optionally) in this new group. Yes, PPLs can add metadata as they want. But that won't be part of the inferencedata schema, only the storage of samples in both spaces when available.

Note that this path also means it'd be nice if we provide a utility function that "merges" posterior_transformed with posterior, which, if xr.DataSet was like a dictionary, would be just ...

This is already possible. Datasets have a dict like interface. But outside of the scope of the proposal. Once this is unlocked, the default will be to continue deciding things on a consensus basis in issues and PRs without the council voting.

ahartikainen · 2022-06-21T23:37:06Z

I vote for yes

canyon289 · 2022-06-22T02:07:21Z

Quick question

Oriols comment above says

Note: We now enter the process of asking for clarifications on the proposal's wording, I am locking to prevent people not familiar with the governance process to join the discussion. We should remove the discussion label in 3 days and vote.

Is issue for rewording an accepted idea, or voting on the implementing the idea itself?

OriolAbril · 2022-06-22T07:42:40Z

The main goal of the issue is voting on the proposal described. However, to make sure we are all on the same page and are voting on the same thing. There is a 3 day period where clarifications about what the proposal entails can (and should) be asked. We are currently in this part of the process:

Before voting starts, at least 3 days will be left for Core Contributors to raise doubts about the proposal’s phrasing, no extra discussion will take place in the proposal issue. Proposal issues should be locked from creation to prevent attracting discussion from people not familiar with the decision process.

The idea is definitely not accepted yet, but we are using the issue to "reword" (but not modify) the proposal until is is phrased in a clear and unambiguous manner for everyone

canyon289 · 2022-06-26T00:56:46Z

Ok I vote yes,

I dont see any downsides as it doesnt hurt to have extra groups

aloctavodia · 2022-06-27T16:06:26Z

Yes

OriolAbril · 2022-06-27T20:37:12Z

I also vote yes

AlexAndorra · 2022-07-01T16:32:00Z

Yep, love this!

ColCarroll · 2022-07-04T23:47:26Z

yes

OriolAbril · 2022-07-06T17:09:58Z

The vote has passed 🎉 , I'll close the issue and send a PR with the changes to the schema soonish. Updating of the converters will come later

OriolAbril added Discussion Issue open for discussion, still not ready for a PR on it Governance labels Jun 21, 2022

arviz-devs locked as resolved and limited conversation to collaborators Jun 21, 2022

OriolAbril removed the Discussion Issue open for discussion, still not ready for a PR on it label Jun 26, 2022

OriolAbril closed this as completed Jul 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Council vote: unconstrained variables #2056

Council vote: unconstrained variables #2056

OriolAbril commented Jun 21, 2022

OriolAbril commented Jun 21, 2022 •

edited

Loading

sethaxen commented Jun 21, 2022

OriolAbril commented Jun 21, 2022

sethaxen commented Jun 21, 2022

ahartikainen commented Jun 21, 2022 •

edited

Loading

ColCarroll commented Jun 21, 2022

ColCarroll commented Jun 21, 2022

ahartikainen commented Jun 21, 2022

ColCarroll commented Jun 21, 2022

OriolAbril commented Jun 21, 2022

ahartikainen commented Jun 21, 2022

canyon289 commented Jun 22, 2022

OriolAbril commented Jun 22, 2022

canyon289 commented Jun 26, 2022

aloctavodia commented Jun 27, 2022

OriolAbril commented Jun 27, 2022

AlexAndorra commented Jul 1, 2022

ColCarroll commented Jul 4, 2022

OriolAbril commented Jul 6, 2022

Council vote: unconstrained variables #2056

Council vote: unconstrained variables #2056

Comments

OriolAbril commented Jun 21, 2022

Proposal

OriolAbril commented Jun 21, 2022 • edited Loading

sethaxen commented Jun 21, 2022

OriolAbril commented Jun 21, 2022

sethaxen commented Jun 21, 2022

ahartikainen commented Jun 21, 2022 • edited Loading

ColCarroll commented Jun 21, 2022

ColCarroll commented Jun 21, 2022

ahartikainen commented Jun 21, 2022

ColCarroll commented Jun 21, 2022

OriolAbril commented Jun 21, 2022

ahartikainen commented Jun 21, 2022

canyon289 commented Jun 22, 2022

OriolAbril commented Jun 22, 2022

canyon289 commented Jun 26, 2022

aloctavodia commented Jun 27, 2022

OriolAbril commented Jun 27, 2022

AlexAndorra commented Jul 1, 2022

ColCarroll commented Jul 4, 2022

OriolAbril commented Jul 6, 2022

OriolAbril commented Jun 21, 2022 •

edited

Loading

ahartikainen commented Jun 21, 2022 •

edited

Loading