reported_cases_opts() #346

seabbs · 2023-01-23T17:16:40Z

Currently, all inputs except for reported_cases are managed via a helper function (e.g. delay_opts()) that enables specifying options. It would make sense to standardise reported_cases to be in line with this approach. This would also make sense because several data processing steps (e.g. dealing with missing dates) currently occur inside package code and are not well surfaced to the user. Putting these in a reported_cases_opts() function would help resolve this and bring the user closer to the data being used for model fitting, which is generally a good idea.

sbfnk · 2024-01-29T15:21:08Z

Perhaps data_opts() would be better if it's to have preprocessing etc.? I think "reported cases" is a bit too specific a name for the argument anyway (could be hospitalisations etc.).

sbfnk · 2024-03-06T17:15:25Z

A downside of this approach is that, if we want a common data_opts() for all three models contained in the package (which I think we do), then we can't use it to document the requirements of the data (estimate_infections and estimate_secondary: data frames with specific columns; estimate_truncation: list) and will instead have to point to the specific function documentation.

An alternative solution would be to keep the reported_cases/primary/obs argument(s) as first argument and add preprocessing_opts() for filtering zeros etc. This would also have a potential benefit of being slightly easier to integrate in a pipe.

sbfnk · 2024-03-11T09:57:43Z

Trying to separate the steps involved in closing the issue if going with the approach suggested in the previous comment:

- Add filter_leading_zeros and zero_threshold to estimate_secondary and estimate_truncation as explicit arguments
- Put these two into a common preprocessing_opts() (or a shorter name?) function and call this from all models
- Review if any other preprocessing steps should be made explicit / optional
- Rename first argument of all three models to data

Once this issue is closed we can then make the data column argument more flexible, addressing #505

jamesmbaazam · 2024-03-11T10:57:45Z

rename first argument of all three models to data

Makes sense to me. Would bullet 4 require us to deprecate the current names?

sbfnk · 2024-03-11T10:58:44Z

Yes, I think so.

jamesmbaazam · 2024-03-11T11:40:22Z

I can work on this if it's good to go.

sbfnk · 2024-03-11T11:45:54Z

Yes, it should be good to go - I think ideally addressing each of the bullet points in sequence using separate PRs.

jamesmbaazam · 2024-04-25T15:25:57Z

I might be wrong but the proposed preprocessing_opts() function is just create_clean_reported_cases(), right?

sbfnk · 2024-04-26T09:00:44Z

I might be wrong but the proposed preprocessing_opts() function is just create_clean_reported_cases(), right?

Good question. There are two options:

1. we create a preprocessing function that would create clean data that can then be passed to the relevant functions
2. we create a preprocessing_opts() function that collates the arguments (as with the other ..._opts() functions), which are passed to create_clean_reported_cases() within the relevant functions. This would add a preprocessing (or other?) argument to those functions.

Option (2) might be the easier one as it doesn't require updating any internal logic (e.g. where some internal processing is done before calling create_clean_reported_cases()). It's perhaps also more in line with the existing function/argument structure. But I'm open to arguments for (1).
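To make option (2) concrete, here is a minimal sketch of what a settings-collating function could look like. The argument names come from the bullet points earlier in the thread; the defaults, the validation and the class name are assumptions for illustration, not the package's API.

```r
# Hypothetical sketch: collect preprocessing settings in a classed list,
# mirroring the ..._opts() pattern. Defaults and class name are assumptions.
preprocessing_opts <- function(filter_leading_zeros = TRUE,
                               zero_threshold = Inf) {
  stopifnot(
    is.logical(filter_leading_zeros), length(filter_leading_zeros) == 1,
    is.numeric(zero_threshold), length(zero_threshold) == 1
  )
  opts <- list(
    filter_leading_zeros = filter_leading_zeros,
    zero_threshold = zero_threshold
  )
  class(opts) <- c("preprocessing_opts", "list")
  opts
}

# The model function would then read these settings internally
opts <- preprocessing_opts(zero_threshold = 50)
```

Under this design the data never pass through preprocessing_opts(); the function only carries settings, which is why no internal logic has to move.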

seabbs · 2024-04-29T10:48:05Z

Option (2) is definitely more in line with other bits of the logic, but I much prefer the idea of making the data preprocessing apparent and accessible to people (i.e. option 1).

sbfnk · 2024-04-29T12:12:27Z

If going down that route we probably want to rename it to create_clean_data() or preprocess_data() in line with the other changes triggered by this issue.

jamesmbaazam · 2024-04-29T12:12:45Z

Option (2) is definitely more in line with other bits of the logic, but I much prefer the idea of making the data preprocessing apparent and accessible to people (i.e. option 1).

Two options seem to be apparent here:

- The data argument can only accept objects run through preprocessing_opts()
- If users want to pipe the data, they must do raw_data %>% preprocessing_opts() %>% estimate_*().

jamesmbaazam · 2024-04-29T12:20:53Z

If going down that route we probably want to rename it to create_clean_data() or preprocess_data() in line with the other changes triggered by this issue.

Yes, I noticed from #607 (Deprecate obs, reports, and reported_cases in favour of data) that we need a PR run to rename data objects and functions to more neutral names.
Additionally, I would suggest splitting up create_clean_reported_cases() into:
- add_horizon() - completes the dates and adds the horizon window
- add_breakpoints()
- filter_leading_zeros()
- handle_zero_threshold()

In particular, the zero_threshold cleaning step (https://github.com/epiforecasts/EpiNow2/blob/main/R/create.R#L60-L77) deserves to be in a separate function.
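As a rough illustration of the split, two of the proposed helpers might look like the sketch below. The function names are the ones suggested above, but the bodies are assumptions: in particular, the threshold rule used here (a zero becomes NA when the mean of the preceding 7 days exceeds the threshold) is a guess at the intent, and the linked lines in create.R remain the reference for the actual behaviour. The column name confirm and the window argument are also assumed.

```r
# Hypothetical helper: drop rows before the first non-zero count.
filter_leading_zeros <- function(data, col = "confirm") {
  nonzero <- which(!is.na(data[[col]]) & data[[col]] > 0)
  if (length(nonzero) == 0) {
    return(data[0, , drop = FALSE])
  }
  data[seq(nonzero[1], nrow(data)), , drop = FALSE]
}

# Hypothetical helper: set a zero to NA when the mean of the preceding
# `window` days exceeds `threshold` (assumed rule, not the package's code).
handle_zero_threshold <- function(data, threshold = Inf, window = 7,
                                  col = "confirm") {
  x <- data[[col]]
  out <- x
  for (i in seq_along(x)) {
    if (i > window && !is.na(x[i]) && x[i] == 0) {
      prev_mean <- mean(x[(i - window):(i - 1)], na.rm = TRUE)
      if (!is.nan(prev_mean) && prev_mean > threshold) out[i] <- NA
    }
  }
  data[[col]] <- out
  data
}

cases <- data.frame(
  date = as.Date("2024-01-01") + 0:10,
  confirm = c(0, 0, 20, 22, 25, 21, 24, 23, 26, 0, 27)
)
cleaned <- handle_zero_threshold(filter_leading_zeros(cases), threshold = 10)
```

Each step takes and returns a data frame, so the helpers can be chained, tested and documented one at a time.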

sbfnk · 2024-04-30T08:09:26Z

I really like those suggestions. I think they might take a bit more thinking though so would suggest to push them into a future release in order to get 1.5.0 out asap.

jamesmbaazam · 2024-04-30T08:12:06Z

I really like those suggestions. I think they might take a bit more thinking though so would suggest to push them into a future release in order to get 1.5.0 out asap.

Agreed. They're not user-facing so not a priority for this release.

sbfnk · 2024-04-30T08:26:52Z

The suggestions here could also help address #547 and #640

sbfnk · 2024-05-21T08:19:29Z

Proposed set up for data handling / cleaning would be to

- distinguish between NA (no value) and missing dates (can be accumulated), see Distinguish NA (missing) from NA (accumulated) #547
- remove the horizon argument from estimate_infections etc.
- add a forecast argument with function forecast_opts with arguments horizon, frequency, and maybe future (from rt_opts()) - internally this will add future dates to the passed data frame with NA values at a given frequency (which will ensure it works with the set up for accumulating incidence if desired)
- add a clean_data() or similar function that does the other steps mentioned above, i.e. filtering leading zeroes and handling the zero threshold. This would be in line with #346 (comment)

In principle the horizon stuff could also be separate in an add_horizon() function but perhaps making settings about forecasts a data manipulating function is confusing? At the same time there's a certain elegance to it, i.e. future dates are just another form of missing data and the model doesn't actually need to know when the present is except for plotting and the future argument in rt_opts(). The present could be inferred as the last known data point.
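The "future dates are just missing data" idea can be sketched in a few lines. This is only an illustration under assumptions: add_horizon() does not exist yet, the signature is made up, daily data are assumed, and the input is assumed to have only a date column and one value column named confirm.

```r
# Hypothetical add_horizon(): append `horizon` future dates with NA counts.
# Assumes daily data with exactly two columns: `date` and `col`.
add_horizon <- function(data, horizon = 7, col = "confirm") {
  last_date <- max(data$date)
  future <- data.frame(date = last_date + seq_len(horizon))
  future[[col]] <- rep(NA_real_, horizon)
  rbind(data, future)
}

obs <- data.frame(
  date = as.Date("2024-01-01") + 0:4,
  confirm = c(3, 5, 8, 13, 21)
)
with_future <- add_horizon(obs, horizon = 3)

# The "present" can be recovered as the last date with an observed value
present <- max(with_future$date[!is.na(with_future$confirm)])
```

Because the present is inferred from the data, nothing about the forecast window needs to be passed to the model separately in this set up.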

sbfnk · 2024-05-21T08:53:51Z

We probably still want a data_opts() function where users can specify which column in the data frame indicates the date and which the data to fit to (if not the standard names), see #505

seabbs · 2024-05-21T22:44:28Z

an add_horizon() fun

I like this idea a lot

sbfnk · 2024-05-22T07:52:11Z

@jamesmbaazam what do you think?

jamesmbaazam · 2024-05-22T14:15:27Z

I like all the points.

In principle the horizon stuff could also be separate in an add_horizon() function but perhaps making settings about forecasts a data manipulating function is confusing? At the same time there's a certain elegance to it, i.e. future dates are just another form of missing data and the model doesn't actually need to know when the present is except for plotting and the future argument in rt_opts(). The present could be inferred as the last known data point.

I think the forecasting stuff should probably be done internally using forecast_opts() rather than being a preprocessing step. It removes one step in the model setup.

Maybe, frequency -> interval?? (Unless I don't understand what frequency means here).

add a clean_data() or similar function that does the other steps mentioned above, i.e. filtering leading zeroes and handling the zero threshold. This would be in line with #346 (comment)

Maybe, more specifically, handle_zero_cases()?

jamesmbaazam · 2024-05-22T14:32:34Z

We probably still want a data_opts() function where users can specify which column in the data frame indicates the date and which the data to fit to (if not the standard names), see #505

This seems to suit the use case for linelist, but I'm not suggesting we take it on as a dependency.

seabbs · 2024-05-22T14:35:40Z

We probably still want a data_opts() function where users can specify which column in the data frame indicates the date and which the data to fit to (if not the standard names), see

I'm generally pretty sceptical of this as an idea. If users are going to get useful things out of the package they likely have thoughts on how to change the name of a variable already.

Or is the proposal you then track their column name through the code base? Again I am a bit sceptical of the value added here to most users.

sbfnk · 2024-05-22T14:41:56Z

I'm generally pretty sceptical of this as an idea. If users are going to get useful things out of the package they likely have thoughts on how to change the name of a variable already.

I generally agree, just flagging that if ever going ahead with the suggestion in #371 (comment) we might need a way to point out which column in a passed data frame corresponds to which observation model (though that'll likely look different from a data_opts() option).

Somewhere we might also want users to specify what the data represent, i.e. #505

sbfnk · 2024-08-02T12:03:33Z

It seems we broadly have two options here:

obs |>
  rename(value = confirm) |>
  filter_leading_zeroes() |>
  apply_zero_threshold(threshold = 10) |>
  add_horizon(n = 3, frequency = 7) |>
  estimate_infections()

or

obs |>
  estimate_infections(
    data = data_opts(col = "confirm", zero_threshold = 10),
    forecast = forecast_opts(n = 3, frequency = 7)
  )

jamesmbaazam · 2024-08-02T15:21:18Z

I think the second cannot be piped that way because, by definition, the pipe passes the data to the first argument. So it would rather be:

estimate_infections(
  data = data_opts(obs, col = "confirm", zero_threshold = 10),
  forecast = forecast_opts(n = 3, frequency = 7)
)

Until now, users have not had to do any data cleaning using exported functions here, so I am more inclined to vote for the second option.

sbfnk · 2024-08-02T15:31:45Z

If the case data set becomes part of data_opts yes, but I don't think it has to be (though of course it might make sense for it to do so).

sbfnk · 2024-08-02T15:54:34Z

Until now, users have not had to do any data cleaning using exported functions here, so I am more inclined to vote for the second option.

That is a valid point. A counterpoint would be that with the explicit functions users can actually see what happens (e.g. which values get filtered out / changed, or which dates will be forecast) whereas if it's all internal to estimate_infections() they have no way of accessing that information.

jamesmbaazam · 2024-08-02T16:56:04Z

If the case data set becomes part of data_opts yes, but I don't think it has to be (though of course it might make sense for it to do so).

We already have a data argument so if we are going by the original second option, it will have to be named something else, then the piping will work.

That is a valid point. A counterpoint would be that with the explicit functions users can actually see what happens (e.g. which values get filtered out / changed, or which dates will be forecast) whereas if it's all internal to estimate_infections() they have no way of accessing that information.

Valid point. Alternatively, we could improve the logging and messaging in the current setup to report all of this.
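For what it's worth, the messaging route could look roughly like the sketch below. Everything here is hypothetical: with_cleaning_message() and drop_leading_zeros() are illustrative names, not package functions, and the column names date and confirm are assumed.

```r
# Hypothetical cleaning step: keep rows from the first non-zero count onwards.
drop_leading_zeros <- function(data, col = "confirm") {
  keep <- cumsum(!is.na(data[[col]]) & data[[col]] > 0) > 0
  data[keep, , drop = FALSE]
}

# Hypothetical wrapper: apply a cleaning step and report which dates it dropped.
with_cleaning_message <- function(data, step, ...) {
  cleaned <- step(data, ...)
  dropped <- setdiff(as.character(data$date), as.character(cleaned$date))
  if (length(dropped) > 0) {
    message("Dropped ", length(dropped), " date(s): ", toString(dropped))
  }
  cleaned
}

obs <- data.frame(
  date = as.Date("2024-01-01") + 0:4,
  confirm = c(0, 0, 4, 0, 6)
)
cleaned <- with_cleaning_message(obs, drop_leading_zeros)
```

This keeps the cleaning internal while still telling users what changed, although they would only see a message rather than get an object they can inspect.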

seabbs added the enhancement and help wanted labels Jan 23, 2023

sbfnk mentioned this issue Jul 18, 2023

EpiNow2 2.0.0 roadmap #423

Closed

18 tasks

sbfnk mentioned this issue Nov 13, 2023

Remove references to "confirmed" cases #505

Open

sbfnk mentioned this issue Feb 21, 2024

create gt_opts as an alias for generation_time_opts #564

Closed

This was referenced Mar 11, 2024

Add filter_leading_zeros and zero_threshold arguments to estimate_secondary() and estimate_truncation() #605

Closed

Add preprocessing_opts() #606

Open

Deprecate obs, reports, and reported_cases in favour of data #607

Closed

sbfnk mentioned this issue Apr 30, 2024

accumulate for observed reports #643

Draft

7 tasks

sbfnk added this to the CRAN v1.6 release milestone May 1, 2024

jamesmbaazam mentioned this issue May 2, 2024

Pre-release cleanup #649

Merged

7 tasks

avallecam mentioned this issue Jun 7, 2024

replace cases by confirm for API consistency epiverse-trace/cfr#131

Open

sbfnk modified the milestones: CRAN v1.6 release, CRAN v1.7 release Sep 20, 2024
