loo_compare for crps and loo_crps #220

bnicenboim · 2023-03-31T10:52:49Z

I just discovered crps! This is great, I was actually asking about how to compare probabilistic vs non-probabilistic predictions https://discourse.mc-stan.org/t/comparison-of-machine-learning-models-and-bayesian-models/30833 and this is exactly what I needed.

I just noticed that loo_compare and print are not implemented. If it's ok, I volunteer to implement it. I just want to know how would you call the predictions, "elpd" doesn't make much sense. Maybe "ps"? (predictive score)?

The text was updated successfully, but these errors were encountered:

jgabry · 2023-04-01T20:13:51Z

Seems like a good idea, thanks @bnicenboim. @avehtari what do you think?

Maybe "ps"? (predictive score)?

Maybe just "score"?

bnicenboim · 2023-04-02T08:22:33Z

On Sat, Apr 1, 2023, 10:14 PM Jonah Gabry ***@***.***> wrote: Seems like a good idea, thanks @bnicenboim <https://github.com/bnicenboim>. @avehtari <https://github.com/avehtari> what do you think? Maybe "ps"? (predictive score)? Maybe just "score"?

The problem is that the Brier score is kind of related and a higher score indicates a worst prediction. I think it's important to indicate that this is a predictive score.

…

— Reply to this email directly, view it on GitHub <#220 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABNUQ6TDLLCPIAGLG4RZMVLW7CEAVANCNFSM6AAAAAAWOPVNTY> . You are receiving this because you were mentioned.Message ID: ***@***.***>

jgabry · 2023-04-02T23:02:18Z

The problem is that the Brier score is kind of related and a higher score indicates a worst prediction

Good point. Let's see if @avehtari or any other loo developers (or users) have an opinion on the name.

avehtari · 2023-04-03T18:06:59Z

I just want to know how would you call the predictions, "elpd" doesn't make much sense

Can you be more specific? Are you talking about what is printed or what are the names in the data structure storing the needed information. elpd is computed using predictions and data, so it's not a prediction either.

We have also loo_predictive_metric(), so it would be useful to think that at the same time. ((s)crps have their own functions as they need the extra argument).

When printing in loo_compare, it would be good to show e.g. crps_diff, rmse_diff etc.

The problem is that the Brier score is kind of related and a higher score
indicates a worst prediction. I think it's important to indicate that this
is a predictive score.

Yeah, it's annoying that Brier is in that direction, but Brier score can be a predictive score, too.

bnicenboim · 2023-04-04T09:19:27Z

I just want to know how would you call the predictions, "elpd" doesn't make much sense

Can you be more specific? Are you talking about what is printed or what are the names in the data structure storing the needed information. elpd is computed using predictions and data, so it's not a prediction either.

Sorry, I was being too sloppy, I mean the following, this is the output when you print a loo object, and the structure of the object:

suppressMessages(library(loo))
suppressMessages(library(rstanarm))
data("kidiq")
fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq, refresh=0)
(l <- loo(fit))
#> 
#> Computed from 4000 by 434 log-likelihood matrix
#> 
#>          Estimate   SE
#> elpd_loo  -1876.2 14.2
#> p_loo         4.1  0.4
#> looic      3752.3 28.4
#> ------
#> Monte Carlo SE of elpd_loo is 0.0.
#> 
#> All Pareto k estimates are good (k < 0.5).
#> See help('pareto-k-diagnostic') for details.
l$pointwise |> head()
#>       elpd_loo mcse_elpd_loo       p_loo     looic influence_pareto_k
#> [1,] -5.696742  0.0026537646 0.034212237 11.393485         0.10213248
#> [2,] -4.208480  0.0009781805 0.004128595  8.416959        -0.02607779
#> [3,] -4.032512  0.0007344157 0.002412484  8.065024         0.02448749
#> [4,] -3.855000  0.0005129338 0.001242125  7.710000        -0.09151790
#> [5,] -5.290719  0.0019263523 0.016870922 10.581438         0.09749258
#> [6,] -4.027275  0.0012510816 0.006134031  8.054551         0.05941903

^{Created on 2023-04-04 with reprex v2.0.2}

And this is for crps

suppressMessages(library(loo))
suppressMessages(library(rstanarm))
data("kidiq")
fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq, refresh=0)
ypred1 <- posterior_predict(fit)
ypred2 <- posterior_predict(fit)
(ps <- crps(ypred1, ypred2, y = fit$y))
#> $estimates
#>    Estimate          SE 
#> -10.2131439   0.3440731 
#> 
#> $pointwise
#>          1          2          3          4          5          6          7 
#> -25.441469  -9.469547  -7.159489  -4.931622 -21.550196  -7.035573 -30.444535 
#>          8          9         10         11         12         13         14 
#>  ...

I meant how to organize $pointwise as a matrix and how to call the estimates. If maybe ps, or score (I don't think it's a good idea), or just crps. I vote for ps, because then if someone (I can do it) implements rps, it will have the same name. Then loo_compare could show ps_diff.

We have also loo_predictive_metric(), so it would be useful to think that at the same time. ((s)crps have their own functions as they need the extra argument).

I also asked about this here #221, why do they need an extra argument, one could split the predictions array, and use half for the first argument and half for the second one:

If you just divide the predictions in two, the estimates are virtually the same:

(ps2 <- crps(ypred1[1:2000,], ypred1[2001:4000,], y = fit$y))

avehtari · 2023-04-11T16:05:03Z

I think ps is too short. Why not use the actual acronym for each scrps, crps, rps? If it just says ps, how would you know which has been used? Furthermore, for consistency, there should be _loo ending. Like this:

#> 
#> Computed from 4000 by 434 log-likelihood matrix
#> 
#>          Estimate   SE
#> scrps_loo  -1876.2 14.2
#> ------
#> Monte Carlo SE of scrps_loo is 0.0.
#> 
#> All Pareto k estimates are good (k < 0.5).
#> See help('pareto-k-diagnostic') for details.

#>       scrps_loo mcse_scrps_loo influence_pareto_k
#> [1,] -5.696742  0.0026537646 0.10213248
#> [2,] -4.208480  0.0009781805 -0.02607779
#> [3,] -4.032512  0.0007344157 0.02448749
#> [4,] -3.855000  0.0005129338 -0.09151790
#> [5,] -5.290719  0.0019263523 0.09749258
#> [6,] -4.027275  0.0012510816 0.05941903

This would also allow that that matrix could include both columns scrps_loo and crps_loo.

I also asked about this here #221, why do they need an extra argument, one could split the predictions array, and use half for the first argument and half for the second one:

If you just divide the predictions in two, the estimates are virtually the same:

I think it would be clearer to have this discussed in a separate issue (and tag @LeeviLindgren). I don't expect there is much difference in the easy case, but if the variability of the importance weights is big then a) the importance weights are different for the two halves (while in the current implementation they are the same), b) the number of draws for the importance weighting is halved, which increases Monte Carlo error. It is possible that in practice a + b are not a problem, but this should be investigated more thoroughly.

bnicenboim · 2023-04-12T12:25:10Z

I think ps is too short. Why not use the actual acronym for each scrps, crps, rps? If it just says ps, how would you know which has been used?

ok, sure, it makes sense

Furthermore, for consistency, there should be _loo ending.

But what if one uses crps, it won't (necessarily) be leave one out, it could be kfold or held out data.

And also this: Computed from 4000 by 434 log-likelihood matrix is only relevant for the loo_crps/ loo_scrps, right? Should I just leave it empty in the other cases?

avehtari · 2023-04-12T15:19:24Z

But what if one uses crps, it won't (necessarily) be leave one out, it could be kfold or held out data.

For kfold and held out data, we have https://mc-stan.org/loo/articles/loo2-elpd.html, and could follow that or do complete redesign.

And also this: Computed from 4000 by 434 log-likelihood matrix is only relevant for the loo_crps/ loo_scrps, right? Should I just leave it empty in the other cases?

For non-loo, it could still report the size of the yrep and y

avehtari · 2023-11-29T17:19:32Z

@bnicenboim should we have a video call to discuss this?

bnicenboim · 2023-11-29T20:39:28Z

I'm sorry, as I mentioned in the other issue, I won't have time until mid January (hopefully).

bnicenboim mentioned this issue Mar 31, 2023

crps for non Bayesian models #221

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

loo_compare for crps and loo_crps #220

loo_compare for crps and loo_crps #220

bnicenboim commented Mar 31, 2023

jgabry commented Apr 1, 2023

bnicenboim commented Apr 2, 2023 via email

jgabry commented Apr 2, 2023

avehtari commented Apr 3, 2023

bnicenboim commented Apr 4, 2023 •

edited

Loading

avehtari commented Apr 11, 2023

bnicenboim commented Apr 12, 2023

avehtari commented Apr 12, 2023

avehtari commented Nov 29, 2023

bnicenboim commented Nov 29, 2023

loo_compare for crps and loo_crps #220

loo_compare for crps and loo_crps #220

Comments

bnicenboim commented Mar 31, 2023

jgabry commented Apr 1, 2023

bnicenboim commented Apr 2, 2023 via email

jgabry commented Apr 2, 2023

avehtari commented Apr 3, 2023

bnicenboim commented Apr 4, 2023 • edited Loading

avehtari commented Apr 11, 2023

bnicenboim commented Apr 12, 2023

avehtari commented Apr 12, 2023

avehtari commented Nov 29, 2023

bnicenboim commented Nov 29, 2023

bnicenboim commented Apr 4, 2023 •

edited

Loading