Skip to content
This repository has been archived by the owner on Sep 29, 2023. It is now read-only.

Update help menus, fix typos (in documentation and variable names), add more explanation of day offsets #16

Open
alisoncallahan opened this issue May 14, 2021 · 1 comment

Comments

@alisoncallahan
Copy link

2. CLMBR Pre-training.ipynb

--clmbr_create_info

  • rename --extract_dir to --input_data_dir (or something similar), and explain what it is in the help menu - right now the description in help is the name of the command ('Extract dir') which isn't very informative

--clmbr_train_model

  • complete the help menu descriptions, i.e. add English language descriptions of all the options
  • make the required options actually required (the help menu indicates that they are all optional arguments, but then the notebook says there are two required arguments)
  • the description of --model_dir is kind of confusing - 'override' implies that there's a default output location, but then the notebook says this option is required, and that it can't already exist (what if I want to overwrite the content of an existing folder? I have to delete the folder first? sure, the benefit is that it is a stopgap to prevent accidental overwriting, but also one extra step)
  • also, it seems like the logic for the directory overwriting is just in the next cell, not actually built into the clmbr_train_model function, which isn't consistent with the documentation of the notebook

3a. Model Training with Custom Labeler.ipynb

  • "Set up the labeler for the downstream task we're interested in"

    • the comment for the DiabetesLabeler class could use a little more detail - the description below this cell makes it sound like the labeler is labeling patient days based on whether there is a diabetes code present on that day or not. But the comment talks about a prediction task with a time horizon (also, how does one specify the time point that the horizon is relative to?).

    • why do timelines have dictionaries? need documentation in timeline.pyi

    • "this randomly selects on label per patient " should be "this randomly selects one label per patient"

    • for output of ehr_ml.clmbr.featurize_patients_w_labels (features, labels, patient_ids, day_offsets) -- what is day_offsets relative to? DOB of patient?

  • "Using the trained model"

3b. Model Training with Patient List.ipynb

  • explanation of day offsets could include more information, for truly new users - e.g. is this date the date that a prediction should be made? Is this date somehow used as a cutoff for generating features? The notebook seems to assume a lot of knowledge/intution about the day offsets that users may not have.
@woffett
Copy link
Contributor

woffett commented May 21, 2021

Update to reflect changes in the newest PR:

2. CLMBR Pre-training

  • clmbr_create_info
    • the flag has been renamed to --input_data_dir
  • clmbr_train_model
    • help menu descriptions have been added for all options
    • the required options have been specified as positional arguments instead of optional arguments
    • description of --model_dir has been updated
    • the logic in the cell exists to prevent folks from overwriting the existing sample documents; it is redundant, as a check also exists in the code for the command, but this makes it clear for people who are just running the notebook and don't need to look at the code
    • the typo has been fixed

3a. Model Training with Custom Labeler

  • This first point is still open and is to be addressed.
  • I think this comment refers to the line in the labeler class diabetes_code = timelines.get_dictionary().map("ICD10CM/E11.9"). I'm not too sure myself, @lalaland do you have any clarification on this?
  • My understanding was that day_offsets indexes into an array of all days where a patient observation is recorded in the dataset. TODO: the documentation still needs to be updated to reflect this, once it's updated we can link to it in the notebook for clarity.
  • the variable should be renamed in the next PR

3b. Model Training with Patient List

  • That's a good point. TODO: add explanation in 3b. about what convert_patient_data is doing

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants