evaluation for ordinal predictions #846

Open
elray1 opened this issue Jun 27, 2024 · 1 comment

Comments


elray1 commented Jun 27, 2024

This is a request for evaluation of ordinal categorical predictions.

An example of predictions for an ordinal target can be found here. For that target, the category levels are "low", "moderate", "high", "very high".

To align with what's been proposed for nominal forecasts, we could have an input format as follows, where the predicted_label and observed columns are ordered factors:

                 model location reference_date horizon target_end_date                    target output_type predicted_label    predicted observed
                <char>   <char>         <Date>   <int>          <Date>                    <char>      <char>           <ord>        <num>    <ord>
  1: Flusight-baseline       25     2022-11-19       0      2022-11-19 wk flu hosp rate category         pmf             low 9.999997e-01      low
  2: Flusight-baseline       25     2022-11-19       0      2022-11-19 wk flu hosp rate category         pmf        moderate 2.677124e-07      low
  3: Flusight-baseline       25     2022-11-19       0      2022-11-19 wk flu hosp rate category         pmf            high 0.000000e+00      low
  4: Flusight-baseline       25     2022-11-19       0      2022-11-19 wk flu hosp rate category         pmf       very high 0.000000e+00      low
  5: Flusight-baseline       25     2022-11-19       1      2022-11-26 wk flu hosp rate category         pmf             low 9.999983e-01 moderate
 ---                                                                                                                                              
188:          PSI-DICE       48     2022-12-17       2      2022-12-31 wk flu hosp rate category         pmf       very high 7.108169e-05 moderate
189:          PSI-DICE       48     2022-12-17       3      2023-01-07 wk flu hosp rate category         pmf             low 8.184334e-02 moderate
190:          PSI-DICE       48     2022-12-17       3      2023-01-07 wk flu hosp rate category         pmf        moderate 8.705084e-01 moderate
191:          PSI-DICE       48     2022-12-17       3      2023-01-07 wk flu hosp rate category         pmf            high 4.764736e-02 moderate
192:          PSI-DICE       48     2022-12-17       3      2023-01-07 wk flu hosp rate category         pmf       very high 8.894322e-07 moderate

Setting notation, let $f(k)$ and $F(k)$ be the submitted predictive pmf and the implied predictive cdf obtained via $F(k) = \sum_{j \leq k} f(j)$, with $K$ total categories so that $k \in \{1, \ldots, K\}$ and the observed value $y \in \{1, \ldots, K\}$. Additionally, adopt the convention that $f(0) = F(0) = 0$. Some scores/metrics that it would be nice to support for ordinal forecasts include:

  • log score: $\log[f(y)]$
  • ranked probability score: $\sum_j [F(j) - 1(y \leq j)]^2$, i.e. the sum across ordered category levels of the squared difference between the predictive cdf and the empirical cdf corresponding to a point mass at the observed category level
  • PIT values, with randomization: $F(y - 1) + U \cdot f(y)$ where $U \sim \mathrm{Unif}(0, 1)$
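
For illustration, the three scores above can be sketched numerically. This is plain Python/NumPy with 0-based category indices, not scoringutils code (which is an R package); function names are chosen for the sketch only.

```python
import numpy as np

def log_score(pmf, y):
    """Log score log[f(y)], with y the 0-based index of the observed category."""
    return np.log(pmf[y])

def rps(pmf, y):
    """Ranked probability score: sum_j [F(j) - 1(y <= j)]^2."""
    cdf = np.cumsum(pmf)
    obs_cdf = (np.arange(len(pmf)) >= y).astype(float)  # cdf of a point mass at y
    return np.sum((cdf - obs_cdf) ** 2)

def pit(pmf, y, rng=None):
    """Randomized PIT value: F(y - 1) + U * f(y), with U ~ Unif(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    cdf_below = np.sum(pmf[:y])  # F(y - 1), using the convention F(0) = 0
    return cdf_below + rng.uniform() * pmf[y]

# Example: four ordered categories low < moderate < high < very high,
# observed category "moderate" (index 1)
pmf = np.array([0.1, 0.6, 0.25, 0.05])
print(log_score(pmf, 1))  # log(0.6) ≈ -0.511
print(rps(pmf, 1))        # 0.1025
```

Unlike the log score, the RPS actually uses the category ordering: probability mass placed far from the observed category contributes to more terms of the sum, so it is penalized more heavily.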

For both nominal and ordinal categorical forecasts, there are also all sorts of things based on summaries of confusion matrices, e.g. precision, recall, and F scores. For me personally those are less of a priority.
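
For reference, those confusion-matrix summaries could be sketched as follows, assuming the point prediction is taken as the modal (highest-probability) category. Again, this is illustrative Python, not an existing scoringutils API.

```python
import numpy as np

def confusion_matrix(observed, predicted, n_classes):
    """counts[i, j] = number of cases observed as class i and predicted as class j."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for i, j in zip(observed, predicted):
        counts[i, j] += 1
    return counts

def precision_recall(counts, k):
    """Precision and recall for class k (0 when the denominator is empty)."""
    tp = counts[k, k]
    pred_k = counts[:, k].sum()   # cases predicted as class k
    obs_k = counts[k, :].sum()    # cases observed as class k
    return (tp / pred_k if pred_k else 0.0,
            tp / obs_k if obs_k else 0.0)

# Derive point predictions as the modal category of each pmf
pmfs = np.array([[0.7, 0.2, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.1, 0.3, 0.6]])
predicted = pmfs.argmax(axis=1)   # [0, 1, 2]
observed = np.array([0, 2, 2])
cm = confusion_matrix(observed, predicted, 3)
```

Note these summaries discard the ordering entirely, which is one reason they may be a lower priority for ordinal targets than the RPS.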

nikosbosse added this to the scoringutils-2.x milestone Jul 14, 2024

seabbs commented Jul 30, 2024

With #837 in place, this should be readily doable. @nikosbosse have you had any thoughts on whether there is anything we can do here to reduce code duplication across nominal and ordinal predictions?

One option is just to use internal functions fairly heavily; the other, more complicated, option is some kind of S3 class hierarchy, but I am not totally sure that is worth it.
