Ktokunaga/30/add marketingyears to distribs #60

Open
wants to merge 46 commits into
base: main
Conversation

kristentaytok
Contributor

Fixes #30

Explanation

Carrying the FDA marketing date steps over the finish line:

  1. Cleaned up Rob's code and added it to database.py. This creates two SQL tables: ingredient_rxcui_years and product_rxcui_years. Each table contains its respective rxcui and a column with every year between MIN(startmarketingyear) and MAX(endmarketingyear); a NULL endmarketingyear is filled with the current year.
  2. Added the FDA marketing years as a column to the ingredient_distrib dfs/CSVs and the product_distrib dfs/CSVs.
  3. Fixed the minor typo in the default_probability for Transition 2 [Product Name Distribution]: changed idx == 0 --> idx == 1 per our discussion today, to set the first product's probability to 1.
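
The year expansion in step 1 can be sketched roughly as below. This is a minimal illustration, not the code in database.py: the function name `expand_marketing_years`, the one-row-per-(rxcui, year) layout, and the sample rows are all assumptions made for the example.

```python
import sqlite3
from datetime import date


def expand_marketing_years(rows, current_year=None):
    """Expand (rxcui, start_year, end_year_or_None) rows into (rxcui, year) pairs.

    Hypothetical helper: covers every year from the earliest start year to the
    latest end year per rxcui, treating a missing end year as the current year.
    """
    if current_year is None:
        current_year = date.today().year
    spans = {}
    for rxcui, start, end in rows:
        end = current_year if end is None else end  # NULL end -> still marketed
        lo, hi = spans.get(rxcui, (start, end))
        spans[rxcui] = (min(lo, start), max(hi, end))
    return [
        (rxcui, year)
        for rxcui, (lo, hi) in sorted(spans.items())
        for year in range(lo, hi + 1)
    ]


# Made-up sample input; the rxcui values are placeholders, not real data.
sample = [("rxcui_a", 2013, None), ("rxcui_a", 2015, 2016), ("rxcui_b", 2001, 2003)]
pairs = expand_marketing_years(sample, current_year=2021)

# Load the result into an in-memory table named after the one in the PR.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ingredient_rxcui_years (rxcui TEXT, year INTEGER)")
conn.executemany("INSERT INTO ingredient_rxcui_years VALUES (?, ?)", pairs)
```

Passing `current_year` explicitly keeps the result reproducible, which is convenient when comparing CSV output between runs.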

Rationale

See issue #30

Tests

  • Execute python -m mdt.run_mdt D007037 may_treat
testing logs
**mdt output** 

Payload built with base URL: https://rxnav.nlm.nih.gov/REST/rxclass/class/byId.json? and parameters: classId=D001249
GET Request sent to URL: https://rxnav.nlm.nih.gov/REST/rxclass/class/byId.json?classId=D001249
Response HTTP Code: 200
['Asthma']
Payload built with base URL: https://rxnav.nlm.nih.gov/REST/rxclass/classMembers.json? and parameters: classId=D001249&relaSource=MEDRT&ttys=IN+MIN&rela=may_prevent
GET Request sent to URL: https://rxnav.nlm.nih.gov/REST/rxclass/classMembers.json?classId=D001249&relaSource=MEDRT&ttys=IN+MIN&rela=may_prevent
Response HTTP Code: 200
['435', '1347', '19499', '19831', '42612', '3264', '4333', '25120', '41126', '25255', '7688', '31563', '36117', '10759', '114970', '40575']
RXCUI list matched on 1966 NDCs
RXCUI list matched on 2609 NDCs

**resulting files**

- the run of the asthma may_treat files resulted in 1 ingredient file + 17 product files, each containing a year column and distributions summing to ~1 (slight variation due to rounding to 3 decimal places)  
- Breo Ellipta is a particularly useful example to examine (FDA approved in 2013), and the year range in the fluticasone-vilanterol_product_distrib file and fluticasone_vilanterol column in the ingredient_distrib file is 2013-2021 (yay!)
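
A quick way to confirm the "sums to ~1" observation is a check like the one below. It is only a sketch: the function name, the column names, the tolerance, and the sample CSV text are invented for illustration, and it assumes every column other than `year` holds a probability.

```python
import csv
import io


def rows_summing_off(csv_text, skip=("year",), tol=0.01):
    """Return (row_index, total) for rows whose probabilities stray from 1.

    Hypothetical helper: columns listed in `skip` are treated as attributes,
    every other column is assumed to hold a probability.
    """
    bad = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        total = sum(float(v) for k, v in row.items() if k not in skip)
        if abs(total - 1.0) > tol:
            bad.append((i, total))
    return bad


# Made-up rows: the first is off only by 3-decimal rounding, the second is not.
sample_csv = "year,drug_a,drug_b,drug_c\n2013,0.333,0.333,0.333\n2014,0.5,0.2,0.1\n"
problems = rows_summing_off(sample_csv)
```

The tolerance needs to grow with the number of probability columns, since each value rounded to 3 decimal places can be off by up to 0.0005.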

yevgenybulochnik and others added 30 commits May 16, 2021 13:59
Comment on lines +211 to +212
del medication_ingredient_rxcui_years
del medication_product_rxcui_years
Member

@kristentaytok - I get the error below because you are deleting a df that doesn't exist yet. Maybe you meant a different df? Please take a look. This only errors out if you are initially loading the DB, but doesn't affect the actual DB load.

UnboundLocalError: local variable 'medication_ingredient_rxcui_years' referenced before assignment

Contributor Author

agh, I had that error at first, and oddly it was fine after running the exact same code once in a Jupyter notebook. But I think I figured it out, in case we decide to use this (or some variation of it). I was supposed to write it this way:

del med_marketing_year_dict['medication_ingredient_rxcui_years']
del med_marketing_year_dict['medication_product_rxcui_years']

I created the dataframes as dictionary key-value pairs, since that's the only option (that I'm aware of) to dynamically create a variable name in the for loop (i.e., create a variable called "medication_ingredient_df" in the first loop, and another variable called "medication_product_df" in the next loop).
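
The difference between the two forms can be sketched as follows. This is an assumed reconstruction, not the actual code in database.py: the function names and the `first_load` flag are hypothetical, and placeholder lists stand in for the dataframes.

```python
def delete_bare_name(first_load):
    """Mimics the failing pattern: `del` on a local that may never be bound."""
    if not first_load:
        medication_ingredient_rxcui_years = ["placeholder rows"]
    # Raises UnboundLocalError when the branch above did not run.
    del medication_ingredient_rxcui_years


def build_year_tables(levels=("ingredient", "product")):
    """Builds one entry per level under a dynamically constructed key."""
    med_marketing_year_dict = {}
    for level in levels:
        key = f"medication_{level}_rxcui_years"
        med_marketing_year_dict[key] = [f"placeholder rows for {level}"]
    return med_marketing_year_dict


med_marketing_year_dict = build_year_tables()
# Deleting by key works because the objects live in the dict, not in bare names.
del med_marketing_year_dict["medication_ingredient_rxcui_years"]
# pop() with a default is a safer variant: it does not fail if the key is absent.
med_marketing_year_dict.pop("medication_product_rxcui_years", None)
```

Using `pop(key, None)` instead of `del` avoids a KeyError in any code path where the entry was never created.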

@jrlegrand
Member

Uh oh @kristentaytok - I'm getting the error when I run Synthea that I was worried about:

java.lang.RuntimeException: LOOKUP TABLE ERROR: Attribute 'year' in CSV table 'Hypothyroidism_ingredient_distrib.csv' does not exist as one of this person's attributes.

What this means is that I don't think we can build year into the CSV distribution because the current year isn't technically a patient attribute... which complicates things... I think this means I will have to build the conditional logic into the JSON to check current year and if it's too early, go back to the beginning of the submodule to find a different match that works... I dunno...
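
One possible shape for that conditional logic is sketched below as a small Python helper that emits a module fragment. It relies on the `Date` condition type from Synthea's Generic Module Framework; the helper name and the state names are hypothetical, and nothing here has been run against Synthea.

```python
import json


def year_gated_transition(first_year, available_state, fallback_state):
    """Build a conditional_transition that only reaches `available_state`
    once the simulated year is at or past `first_year`.

    Hypothetical helper; both state names are placeholders.
    """
    return {
        "conditional_transition": [
            {
                "condition": {
                    "condition_type": "Date",
                    "operator": ">=",
                    "year": first_year,
                },
                "transition": available_state,
            },
            # No condition on the last entry, so it acts as the fallback.
            {"transition": fallback_state},
        ]
    }


fragment = year_gated_transition(2013, "Prescribe_Product", "Pick_Another_Product")
print(json.dumps(fragment, indent=2))
```

A guard like this would have to be generated per product, using that product's first marketing year, which is the extra complexity the comment above is worried about.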

@jrlegrand
Member

I also tried replacing year with date as the column for the csv files, but that did not work.

https://github.com/synthetichealth/synthea/wiki/Generic-Module-Framework%3A-Logic#date

@jrlegrand
Member

One hacky thing we could try would be to assign a current_year attribute to the patient at the beginning of the submodule and then clear it out at the end of the submodule... though I don't even know if that's technically possible with Synthea.

@jrlegrand
Member

@kristentaytok - I just emailed the Synthea devs asking whether either of my ideas is technically possible. For now, let's hold off on this. As we discussed before, not having the FDA year thing figured out should not impede validation testing. Thanks for doing all this - I did not expect you to turn this all around in a day, but other than Synthea not playing nice, it seems to work very well! I really hope I didn't waste your time with this one.

@kristentaytok
Contributor Author

@jrlegrand thanks for the update and testing out the year column!

and no worries - I figured that like the validation/chi square testing, it was something we should "fail fast" to see if it was possible and more of the time spent on the issue would be figuring out what to do if it doesn't lol.

Successfully merging this pull request may close these issues.

Figure out an automated way to find first available date of a medication product
4 participants