You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Sep 29, 2023. It is now read-only.
rename --extract_dir to --input_data_dir (or something similar), and explain what it is in the help menu - right now the description in help is the name of the command ('Extract dir') which isn't very informative
--clmbr_train_model
complete the help menu descriptions, i.e. add English language descriptions of all the options
make the required options actually required (the help menu indicates that they are all optional arguments, but then the notebook says there are two required arguments)
the description of --model_dir is kind of confusing - 'override' implies that there's a default output location, but then the notebook says this option is required, and that it can't already exist (what if I want to overwrite the content of an existing folder? I have to delete the folder first? sure, the benefit is that it is a stopgap to prevent accidental overwriting, but also one extra step)
also, it seems like the logic for the directory overwriting is just in the next cell, not actually built into the clmbr_train_model function, which isn't consistent with the documentation of the notebook
3a. Model Training with Custom Labeler.ipynb
"Set up the labeler for the downstream task we're interested in"
the comment for the DiabetesLabeler class could use a little more detail - the description below this cell makes it sound like the labeler is labeling patient days based on whether there is a diabetes code present on that day or not. But the comment talks about a prediction task with a time horizon (also, how does one specify the time point that the horizon is relative to?).
why do timelines have dictionaries? need documentation in timeline.pyi
"this randomly selects on label per patient " should be "this randomly selects one label per patient"
for output of ehr_ml.clmbr.featurize_patients_w_labels (features, labels, patient_ids, day_offsets) -- what is day_offsets relative to? DOB of patient?
explanation of day offsets could include more information, for truly new users - e.g. is this date the date that a prediction should be made? Is this date somehow used as a cutoff for generating features? The notebook seems to assume a lot of knowledge/intution about the day offsets that users may not have.
The text was updated successfully, but these errors were encountered:
help menu descriptions have been added for all options
the required options have been specified as positional arguments instead of optional arguments
description of --model_dir has been updated
the logic in the cell exists to prevent folks from overwriting the existing sample documents; it is redundant, as a check also exists in the code for the command, but this makes it clear for people who are just running the notebook and don't need to look at the code
the typo has been fixed
3a. Model Training with Custom Labeler
This first point is still open and is to be addressed.
I think this comment refers to the line in the labeler class diabetes_code = timelines.get_dictionary().map("ICD10CM/E11.9"). I'm not too sure myself, @lalaland do you have any clarification on this?
My understanding was that day_offsets indexes into an array of all days where a patient observation is recorded in the dataset. TODO: the documentation still needs to be updated to reflect this, once it's updated we can link to it in the notebook for clarity.
the variable should be renamed in the next PR
3b. Model Training with Patient List
That's a good point. TODO: add explanation in 3b. about what convert_patient_data is doing
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
2. CLMBR Pre-training.ipynb
--clmbr_create_info
--clmbr_train_model
clmbr_train_model
function, which isn't consistent with the documentation of the notebook3a. Model Training with Custom Labeler.ipynb
"Set up the labeler for the downstream task we're interested in"
the comment for the DiabetesLabeler class could use a little more detail - the description below this cell makes it sound like the labeler is labeling patient days based on whether there is a diabetes code present on that day or not. But the comment talks about a prediction task with a time horizon (also, how does one specify the time point that the horizon is relative to?).
why do timelines have dictionaries? need documentation in timeline.pyi
"this randomly selects on label per patient " should be "this randomly selects one label per patient"
for output of
ehr_ml.clmbr.featurize_patients_w_labels
(features, labels, patient_ids, day_offsets
) -- what isday_offsets
relative to? DOB of patient?"Using the trained model"
patient_indices
should beday_offsets
(based on featurize_patients_w_labels in https://github.com/som-shahlab/ehr_ml/blob/245dd3436a5dcddada41222611e8129be96cd85b/ehr_ml/clmbr/__init__.py)3b. Model Training with Patient List.ipynb
The text was updated successfully, but these errors were encountered: