DailyMed NDC to Label Image Mart #326

Merged: 49 commits into main from jrlegrand/dailymed on Oct 30, 2024

Conversation

@jrlegrand (Member) commented Oct 24, 2024

Resolves #309
Resolves #322

Explanation

This PR uses FTP to download the desired DailyMed SPL zip files. The DAG lets you choose which sets to pull: all human Rx, just one of the five human Rx files, OTC, and so on. By default it pulls all human Rx.
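
A minimal sketch of the download step, using only the standard library. The host, directory, and file-name pattern below are assumptions about the DailyMed FTP layout (they are not stated in this PR), and `spl_zip_names` / `download_spl_zips` are hypothetical helper names, not the functions in the DAG.

```python
from ftplib import FTP
from pathlib import Path
from typing import List, Optional

# Assumptions: verify these against the DailyMed download page before use.
FTP_HOST = "public.nlm.nih.gov"
FTP_DIR = "nlmdata/.dailymed"


def spl_zip_names(kind: str = "human_rx", total_parts: int = 5,
                  only_part: Optional[int] = None) -> List[str]:
    """Build the outer zip names for one SPL category (all parts or one part)."""
    parts = [only_part] if only_part is not None else range(1, total_parts + 1)
    return [f"dm_spl_release_{kind}_part{p}.zip" for p in parts]


def download_spl_zips(names: List[str], dest: Path) -> List[Path]:
    """Download each named zip over anonymous FTP into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    with FTP(FTP_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(FTP_DIR)
        for name in names:
            target = dest / name
            with open(target, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            saved.append(target)
    return saved
```

Calling `spl_zip_names()` mirrors the default of pulling all human Rx files, while `spl_zip_names(only_part=1)` corresponds to pulling one of the five.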

Extract:
The DAG unzips all outer zip files into one folder, leaving a folder (e.g. data/dailymed/prescription) full of thousands of inner zip files.
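
The extract step could look roughly like the sketch below. `extract_inner_zips` is a hypothetical name, and flattening every inner zip into a single destination folder is an assumption based on the description above rather than a copy of the DAG code.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_inner_zips(outer_zip: Path, dest: Path) -> int:
    """Copy every inner .zip member of outer_zip into dest, flattening paths."""
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    with zipfile.ZipFile(outer_zip) as outer:
        for info in outer.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".zip"):
                continue  # skip folders and anything that is not an inner zip
            # Keep only the base name so all inner zips land in one folder.
            target = dest / PurePosixPath(info.filename).name
            with outer.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            count += 1
    return count
```

The inner zips are copied as-is and not opened here; that is left to the load step.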

Load:
The DAG peeks inside each zip file and unzips only the XML document into the folder. It then parses the XML document with a custom XSLT template (dags/dailymed/template.xsl), deletes the XML file, and moves on to the next zip file. The XML produced by the template is stored in a list of dicts; when all files are processed, the list is converted to a Pandas dataframe and loaded into Postgres. For now, any change to the initial XML parsing has to be made in the template.xsl file. A future optimization is to modularize this so that different parts live in different XSL files.
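
A rough sketch of the per-zip loop, under some stated assumptions: it reads the XML member into memory instead of extracting and deleting it, the `transform` callable stands in for the XSLT step (the PR does not name the XSLT library), and the dict keys are hypothetical column names. Converting the list to a dataframe and loading it into Postgres is left out.

```python
import zipfile
from pathlib import Path
from typing import Callable, Dict, List


def load_spl_folder(folder: Path,
                    transform: Callable[[bytes], str]) -> List[Dict[str, str]]:
    """Read only the XML member of each zip and collect the transformed output."""
    rows = []
    for zip_path in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            xml_names = [n for n in zf.namelist() if n.lower().endswith(".xml")]
            if not xml_names:
                continue  # no SPL document in this archive
            raw = zf.read(xml_names[0])  # images stay inside the zip
        rows.append({
            "zip_name": zip_path.name,
            "xml_name": xml_names[0],
            "xml_content": transform(raw),  # placeholder for the XSLT output
        })
    return rows
```

Keeping the transform as a parameter makes it easy to swap in a different template, or several, without touching the loop.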

NOTE: see example XML template output at the bottom of this PR.

Transform:
This is probably the most unusual part, but the part I like the best. Instead of using Python or other methods to transform the resulting smaller XML document, I use dbt data models (using PostgreSQL XML functions) to transform the XML in the data lake in a stepwise manner using staging and intermediate tables that can be checked along the way for troubleshooting purposes.

Rough transform workflow:

  1. stg_dailymed__ndcs - get all valid NDCs for the SPL (at the SPL level, not at the package label section level). Used to validate the NDCs parsed by RegEx.
  2. stg_dailymed__package_label_sections - get each package label section for each SPL.
  3. stg_dailymed__package_label_section_images - get each image from MediaList/Media.
  4. stg_dailymed__package_label_section_ndcs - use RegEx to parse all potential NDCs from the Text of the section.
  5. int_dailymed_validated_package_label_ndcs - compare parsed NDCs against the valid NDCs for the SPL (to filter out noise). Maintain the order in which the NDCs appear in the text using ranking.
  6. int_dailymed_ranked_package_label_ndcs - re-rank the filtered NDCs (e.g. if 3 NDCs were found but only the 1st and 3rd were valid, the new ranks are 1, 2 instead of 1, 3).
  7. int_dailymed_ranked_package_label_images - rank the order in which the image files appear in the package label section.
  8. int_dailymed_image_xml_ndcs - map validated NDCs to images based on the order in which they appear. The assumption is that NDCs and images appear in the same order, so they can be paired by rank.
  9. int_dailymed_image_name_ndcs - using the image names found in the package label sections, try to RegEx NDCs out of the name and validate the match against the valid NDCs for the SPL, converting both to NDC11 first to avoid formatting issues. Store the NDC11 version of the matched NDC from the image file name.
  10. MART: ndcs_to_label_images - pull everything together. This is essentially a union of the last two intermediate models (matches from XML and matches from image file names), plus string concatenation to build links to images and DailyMed SPL pages.
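
The logic of steps 4 through 8 can be sketched in Python for illustration. The PR implements these steps as dbt models with PostgreSQL functions, so this is a stand-in, not the actual code; the regular expression and the function names are assumptions.

```python
import re
from typing import List, Tuple

# Assumed pattern for hyphenated 10-digit NDCs (4-4-2, 5-3-2, or 5-4-1 style).
NDC_PATTERN = re.compile(r"\b\d{4,5}-\d{3,4}-\d{1,2}\b")


def ranked_valid_ndcs(section_text: str, valid_ndcs: List[str]) -> List[str]:
    """Steps 4-6: parse candidates in text order and keep only NDCs listed for the SPL.

    The list position after filtering is the new rank (0-based here).
    """
    valid = set(valid_ndcs)
    return [c for c in NDC_PATTERN.findall(section_text) if c in valid]


def map_ndcs_to_images(ndcs: List[str], images: List[str]) -> List[Tuple[str, str]]:
    """Steps 7-8: pair the n-th valid NDC with the n-th image in the section."""
    return list(zip(ndcs, images))  # zip stops at the shorter list
```

For example, with the text `"99999-999-99 then 71205-215-30 and 71205-215-90"` and the four NDCs from the sample SPL as the valid list, the first candidate is dropped as noise and the remaining two are paired with the first and second images.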

Rationale

DailyMed SPLs have label images for many drug products, but they are not at the NDC level - they are at the SPL level. To get to NDC-level images, you need to do something along the lines of what we've done here.

NDC-level images are useful for drug purchasing or basic drug information about what an NDC looks like.

Tests

Ran DAG to completion and built marts using dbt run --select ndcs_to_label_images

This produced around 57k NDC -> label image matches at the time of writing this PR

I had run this DAG several times before and each time, I compared outputs manually to try to validate that I wasn't breaking anything that worked before and was actually adding new matches. I feel like I'm at a stable point where enough is working well that this should finally be merged to main.

Future Enhancements

NOTE: every time this DAG is run, we currently need to manually DROP ... CASCADE the sagerx_lake.dailymed table first to avoid duplicate rows. This needs to be addressed.

ALSO NOTE: I think that if we expand from all human Rx to all human Rx plus OTC, something goes wrong with the folders during extract or load. This needs to be fixed before expanding to human Rx and OTC.

General optimizations:

Example XML template output

NOTE: the important parts for this work are everything inside <PackageLabels />.

<MediaList /> contains a list of all images found directly within or referenced from the package label section. We try to associate these images with any NDCs parsed out of the text of the section, and we also try to parse NDCs directly from the image name (e.g. sometimes images are named "12345-456-2.jpg").
<Text /> is the raw text of the section that we parse for NDCs using RegEx in a dbt data model
<ID /> is the ID of the section
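
As an illustration of the image-name approach, here is a small Python sketch that pulls a hyphenated NDC out of a file name, converts it and the SPL's valid NDCs to NDC11, and keeps the match only if it is valid. The PR does this in a dbt model; the regular expression, the padding rule (labeler to 5 digits, product to 4, package to 2), and the function names are assumptions.

```python
import re
from typing import List, Optional

IMAGE_NDC = re.compile(r"(\d{4,5})-(\d{3,4})-(\d{1,2})")


def to_ndc11(labeler: str, product: str, package: str) -> str:
    """Zero-pad each segment to the 5-4-2 layout and drop the hyphens."""
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


def ndc11_from_image_name(image_name: str, valid_ndcs: List[str]) -> Optional[str]:
    """Return the NDC11 found in image_name if it is valid for the SPL, else None."""
    match = IMAGE_NDC.search(image_name)
    if match is None:
        return None
    candidate = to_ndc11(*match.groups())
    # valid_ndcs are expected in hyphenated three-segment form.
    valid11 = {to_ndc11(*ndc.split("-")) for ndc in valid_ndcs}
    return candidate if candidate in valid11 else None
```

A name such as `8c496175-figure-02.jpg` yields no match, so that image would only be linked through the order-based mapping.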

There can be multiple package label sections. In this example, there is only one.

<NDCList /> is also relevant to this work. It contains the list of NDCs represented by the SPL overall. It is used to validate any NDCs we parse out of the text of the package label section.

<dailymed>
  <documentId>057302f7-9a50-42f4-8f96-ce23f409bc4c</documentId>
  <SetId>8aa48212-19b0-4304-9c61-dcf4db2b19ea</SetId>
  <VersionNumber>5</VersionNumber>
  <EffectiveDate>20220601</EffectiveDate>
  <MarketStatus>ANDA</MarketStatus>
  <ApplicationNumber>ANDA091240</ApplicationNumber>
  <PackageLabels>
    <PackageLabel>
      <MediaList>
        <Media>
          <ID>EB654092-98B7-485A-927B-56AB5CA4B2B8</ID>
          <Image>8c496175-figure-02.jpg</Image>
        </Media>
      </MediaList>
      <ID>2bfd3ca6-c29b-4bc6-96c1-6017e64b59d2</ID>
      <Text>
               
               
               PACKAGE LABEL.PRINCIPAL DISPLAY PANEL 
               
                  
               
               
               
                  
                     71205-215-30
                     
                        
                     
                  
               
            </Text>
    </PackageLabel>
  </PackageLabels>
  <NDCList>
    <NDC>71205-215-20</NDC>
    <NDC>71205-215-30</NDC>
    <NDC>71205-215-60</NDC>
    <NDC>71205-215-90</NDC>
  </NDCList>
  <InteractionText/>
  <Organizations>
    <establishment>
      <DUN>079196022</DUN>
      <name>Proficient Rx LP</name>
      <type>Repacker</type>
      <source_list>
        <source>31722-542</source>
      </source_list>
    </establishment>
    <establishment>
      <DUN>079196022</DUN>
      <name>Proficient Rx LP</name>
      <type>Functioner</type>
      <function>
        <name>REPACK</name>
        <item_list>
          <item>71205-215</item>
        </item_list>
      </function>
      <function>
        <name>RELABEL</name>
        <item_list>
          <item>71205-215</item>
        </item_list>
      </function>
    </establishment>
    <OrganizationsText>
                  Indomethacin Capsules, USP are available containing either 25 mg of Indomethacin, USP.
                  The 25 mg capsules are size &#8216;3&#8217; hard gelatin capsules, with opaque light green cap imprinted with &#8216;H&#8217; and opaque light green body imprinted with &#8216;103&#8217;, containing white to off-white powder.
                  Bottles of 20 capsules NDC 71205-215-20
                  Bottles of 30 capsules NDC 71205-215-30
                  Bottles of 60 capsules NDC 71205-215-60
                  Bottles of 90 capsules NDC 71205-215-90
                  
                     Store at 20&#176; to 25&#176;C (68&#176; to 77&#176;F) [see USP Controlled Room Temperature]. 
                  
                  
                     Protect from light. 
                  
                  Dispense in a tight, light-resistant container as defined in the USP using a child-resistant closure.
                  
                     PHARMACIST: Dispense a Medication Guide with each prescription.
                  Manufactured for:Camber Pharmaceuticals, Inc&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; 2012729Piscataway, NJ 08854
                  By: Hetero Labs LimitedJeedimetla, Hyderabad-500 055, India.
                  Repackaged by:Proficient Rx LPThousand Oaks, CA 91320
               </OrganizationsText>
    <OrganizationsText>
                  
                     
                        INDOMETHACIN CAPSULES USP
                     
                  
                  
                     Medication Guide for Non-Steroidal Anti-Inflammatory Drugs (NSAIDs)
                  
                  
                     (See the end of this Medication Guide for a list of prescription NSAID medicines.) 
                  
                  
                     What is the most important information I should know about medicines called Non-Steroidal Anti-Inflammatory Drugs (NSAIDs)? 
                  
                  
                     NSAID medicines may increase the chance of a heart attack or stroke that can lead to death. This chance increases:
                  
                  
                     
                        &#8226;with longer use of NSAID medicines
                     
                        &#8226;in people who have heart disease
                  
                  
                     NSAID medicines should never be used right before or after a heart surgery called a "coronary artery bypass graft (CABG)." 
                  
                  
                     NSAID medicines can cause ulcers and bleeding in the stomach and intestines at any time during treatment. Ulcers and bleeding:
                  
                  
                     
                        &#8226;can happen without warning symptoms
                     
                        &#8226;may cause death
                  
                  
                     The chance of a person getting an ulcer or bleeding increases with: 
                  
                  
                     
                        &#8226;taking medicines called "corticosteroids" and "anticoagulants"
                     
                        &#8226;longer use
                     
                        &#8226;smoking
                     
                        &#8226;drinking alcohol
                     
                        &#8226;older age
                     
                        &#8226;having poor health
                  
                  
                     NSAID medicines should only be used: 
                  
                  
                     
                        &#8226;exactly as prescribed
                     
                        &#8226;at the lowest dose possible for your treatment
                     
                        &#8226;for the shortest time needed
                  
                  
                     What are Non-Steroidal Anti-Inflammatory Drugs (NSAIDs)? 
                  
                  NSAID medicines are used to treat pain and redness, swelling, and heat (inflammation) from medical conditions such as:
                  
                     
                        &#8226;different types of arthritis
                     
                        &#8226;menstrual cramps and other types of short-term pain
                  
                  
                     Who should not take a Non-Steroidal Anti-Inflammatory Drug (NSAID)? 
                  
                  
                     Do not take an NSAID medicine: 
                  
                  
                     
                        &#8226;if you had an asthma attack, hives, or other allergic reaction with aspirin or any other NSAID medicine
                     
                        &#8226;for pain right before or after heart bypass surgery
                  
                  
                     Tell your healthcare provider: 
                  
                  
                     
                        &#8226;about all of your medical conditions.
                     
                        &#8226;about all of the medicines you take. NSAIDs and some other medicines can interact with each other and cause serious side effects. Keep a list of your medicines to show to your healthcare provider and pharmacist.
                     
                     
                        &#8226;if you are pregnant. NSAID medicines should not be used by pregnant women late in their pregnancy. 
                     
                     
                        &#8226;if you are breastfeeding. Talk to your doctor. 
                     
                  
                  
                     What are the possible side effects of Non-Steroidal Anti-Inflammatory Drugs (NSAIDs)? 
                  
                  
                     
                     
                     
                        
                           
                              
                                 &#160;&#160; serious side effects include:
                              
                           
                           
                              
                                  Other side effects include:&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; 
                              
                           
                        
                        
                           
                              
                                 
                                    &#8226;heart attack
                                 
                                    &#8226;stroke
                                 
                                    &#8226;high blood pressure
                                 
                                    &#8226;heart failure from body swelling (fluid Retension)
                                 
                                    &#8226;kidney problems including kidney failure
                                 
                                    &#8226;bleeding and ulcers in the stomach and intestine
                                 
                                    &#8226;low red blood cells (anemia)
                                 
                                    &#8226;life-threatening skin reactions
                                 
                                    &#8226;life-threatening allergic reactions
                                 
                                    &#8226;liver problems including liver failure
                                 
                                    &#8226;asthma attacks in people who have asthma
                              
                           
                           
                              
                                 
                                    &#8226;stomach pain
                                 
                                    &#8226;constipation
                                 
                                    &#8226;diarrhea
                                 
                                    &#8226;gas
                                 
                                    &#8226;heartburn
                                 
                                    &#8226;nausea
                                 
                                    &#8226;vomiting
                                 
                                    &#8226;dizziness
                              
                           
                        
                     
                  
                  
                     Get emergency help right away if you have any of the following symptoms: 
                  
                  
                     
                        &#8226;shortness of breath or trouble breathing
                     
                        &#8226;chest pain
                     
                        &#8226;weakness in one part or side of your body
                     
                        &#8226;slurred speech
                     
                        &#8226;swelling of the face or throat
                  
                  
                     Stop your NSAID medicine and call your healthcare provider right away if you have any of the following symptoms: 
                  
                  
                     
                        &#8226;nausea
                     
                        &#8226;more tired or weaker than usual
                     
                        &#8226;itching
                     
                        &#8226;your skin or eyes look yellow
                     
                        &#8226;stomach pain
                     
                        &#8226;flu-like symptoms
                     
                        &#8226;vomit blood
                     
                        &#8226;there is blood in your bowel movement or it is black and sticky like tar
                     
                        &#8226;unusual weight gain
                     
                        &#8226;skin rash or blisters with fever
                     
                        &#8226;swelling of the arms and legs, hands and feet
                  
                  These are not all the side effects with NSAID medicines. Talk to your healthcare provider or pharmacist for more information about NSAID medicines. Call your doctor for medical advice about side effects. You may report side effects to FDA at 1-800-FDA-1088.
                  
                  
                     Other information about Non-Steroidal Anti-Inflammatory Drugs (NSAIDs) 
                  
                  Aspirin is an NSAID medicine but it does not increase the chance of a heart attack.Aspirin can cause bleeding in the brain, stomach, and intestines. Aspirin can also cause ulcers in the stomach and intestines.
                  Some of these NSAID medicines are sold in lower doses without a prescription (over-the-counter). Talk to your healthcare provider before using over-the-counter NSAIDs for more than 10 days.
                  
                     
                        NSAID medicines that need a prescription
                     
                  
                  
                     
                     
                     
                        
                           
                              
                                 Celecoxib
                              
                           
                           
                              
                                 &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Celebrex
                              
                           
                        
                        
                           
                              Diclofenac
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Cataflam, Voltaren, Arthrotec (combined with&#160; misoprostol)
                           
                        
                        
                           
                              Diflunisal
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Dolobid
                           
                        
                        
                           
                              Etodolac
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Lodine, Lodine XL
                           
                        
                        
                           
                              Fenoprofen
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Nalfon, Nalfon 200
                           
                        
                        
                           
                              Flurbiprofen
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Ansaid
                           
                        
                        
                           
                              Ibuprofen
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Motrin, Tab-Profen, Vicoprofen( (combined with&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; hydrocodone), Combunox (combined with oxycodone)
                           
                        
                        
                           
                              Indomethacin
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Indocin, Indocin SR, Indo-Lemmon, Indomethagan
                           
                        
                        
                           
                              Ketoprofen
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Oruvail
                           
                        
                        
                           
                              Ketorolac
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Toradol
                           
                        
                        
                           
                              MefenamicAcid
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Ponstel
                           
                        
                        
                           
                              Meloxicam
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Mobic
                           
                        
                        
                           
                              Nabumetone
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Relafen
                           
                        
                        
                           
                              Naproxen
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Naprosyn, Anaprox, Anaprox DS, EC-Naprosyn, Naprelan,&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Naprapac (copackaged with lansoprazole)
                           
                        
                        
                           
                              Oxaprozin
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Daypro
                           
                        
                        
                           
                              Piroxicam
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Feldene
                           
                        
                        
                           
                              Sulindac
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Clinoril
                           
                        
                        
                           
                              Tolmetin
                           
                           
                              &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Tolectin, Tolectin DS, Tolectin 600
                           
                        
                     
                  
                  
                     *Vicoprofen contains the same dose of ibuprofen as over-the-counter (OTC) NSAIDs, and is usually used for less than 10 days to treat pain. The OTC NSAIDS label warns that long term continuous use may increase the risk of heart attack or stroke.
                  
                     This Medication Guide has been approved by the U.S. Food and Drug Administration. 
                  
                  Manufactured for:Camber Pharmaceuticals, IncPiscataway, NJ 08854
                  By: Hetero Labs LimitedJeedimetla, Hyderabad-500 055, India.
                  Repackaged by:Proficient Rx LPThousand Oaks, CA 91320
               </OrganizationsText>
  </Organizations>
</dailymed>
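
To make the structure of the template output concrete, the sketch below reads a trimmed copy of the example above with Python's standard-library ElementTree. In the PR this extraction is done with PostgreSQL XML functions in dbt staging models, so this is only an illustrative equivalent; `package_label_rows` and the dict keys are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example output above (only the parts used here).
SAMPLE = """<dailymed>
  <SetId>8aa48212-19b0-4304-9c61-dcf4db2b19ea</SetId>
  <PackageLabels>
    <PackageLabel>
      <MediaList>
        <Media>
          <ID>EB654092-98B7-485A-927B-56AB5CA4B2B8</ID>
          <Image>8c496175-figure-02.jpg</Image>
        </Media>
      </MediaList>
      <ID>2bfd3ca6-c29b-4bc6-96c1-6017e64b59d2</ID>
      <Text>
        PACKAGE LABEL.PRINCIPAL DISPLAY PANEL
          71205-215-30
      </Text>
    </PackageLabel>
  </PackageLabels>
  <NDCList>
    <NDC>71205-215-20</NDC>
    <NDC>71205-215-30</NDC>
  </NDCList>
</dailymed>"""


def package_label_rows(xml_text: str) -> list:
    """Return one dict per package label section, plus the SPL-level NDC list."""
    root = ET.fromstring(xml_text)
    valid_ndcs = [n.text for n in root.findall("NDCList/NDC")]
    rows = []
    for label in root.findall("PackageLabels/PackageLabel"):
        rows.append({
            "set_id": root.findtext("SetId"),
            "section_id": label.findtext("ID"),
            "images": [m.findtext("Image") for m in label.findall("MediaList/Media")],
            # Collapse the whitespace left over from the original SPL markup.
            "text": " ".join((label.findtext("Text") or "").split()),
            "valid_ndcs": valid_ndcs,
        })
    return rows
```

Each row carries everything the later steps need: the section text to search for NDCs, the images in document order, and the SPL-level NDC list used for validation.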

Created an XPath for Media that looks for ObservationMedia
and then grabs an image file name (if one exists; a test for
existence would probably help reduce the noise) and also
the entire text of the component.

Next step is to build a dbt staging model to RegEx the NDC
out of the <Text/> element, since XPath doesn't natively
support that.

Changed the template to look specifically at the package label
display panel section(s) in the SPL for images. Also updated
the staging table to use nested XMLTABLE commands (thanks,
ChatGPT).
@jrlegrand jrlegrand changed the title DailyMed NDC to Image Mart DailyMed NDC to Label Image Mart Oct 24, 2024
@lprzychodzien (Collaborator) left a comment

LGTM

@jrlegrand jrlegrand merged commit 8cd1504 into main Oct 30, 2024
@jrlegrand jrlegrand deleted the jrlegrand/dailymed branch October 30, 2024 14:23
Successfully merging this pull request may close these issues.

Image referenced but not directly mentioned in packaging section DailyMed XML processing for NDC -> image