Hard coded SNOMED release file bug. #467

Open
antsh3k opened this issue Jul 24, 2024 · 4 comments

@antsh3k
Collaborator

antsh3k commented Jul 24, 2024

Due to changes in the naming convention of SNOMED CT release files, the date/release no longer fits this exact character range:

self.release = data_path[-16:-8]

Needs to be checked with different releases and extensions before merging.

@mart-r
Collaborator

mart-r commented Jul 25, 2024

I've got these ones locally from June and they seem to still produce the release date reliably:

SnomedCT_InternationalRF2_PRODUCTION_20240201T120000Z
SnomedCT_InternationalRF2_PRODUCTION_20240601T120000Z
SnomedCT_UKClinicalRF2_PRODUCTION_20240410T000001Z
SnomedCT_UKClinicalRefsetsRF2_PRODUCTION_20240410T000001Z
SnomedCT_UKDrugRF2_PRODUCTION_20240508T000001Z
SnomedCT_UKEditionRF2_PRODUCTION_20240410T000001Z
SnomedCT_UKEditionRF2_PRODUCTION_20240508T000001Z
SnomedCT_Release_AU1000036_20240630T120000Z

Which versions does the new naming convention start with? And what does it look like?
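
As a quick check of the claim above, here is a minimal sketch that applies the hard-coded slice to the folder names listed. The helper name `release_from_slice` is made up for illustration, and the sketch assumes `data_path` is the bare folder name with no trailing slash.

```python
# Folder names copied from the list above.
FOLDER_NAMES = [
    "SnomedCT_InternationalRF2_PRODUCTION_20240201T120000Z",
    "SnomedCT_InternationalRF2_PRODUCTION_20240601T120000Z",
    "SnomedCT_UKClinicalRF2_PRODUCTION_20240410T000001Z",
    "SnomedCT_UKClinicalRefsetsRF2_PRODUCTION_20240410T000001Z",
    "SnomedCT_UKDrugRF2_PRODUCTION_20240508T000001Z",
    "SnomedCT_UKEditionRF2_PRODUCTION_20240410T000001Z",
    "SnomedCT_UKEditionRF2_PRODUCTION_20240508T000001Z",
    "SnomedCT_Release_AU1000036_20240630T120000Z",
]


def release_from_slice(data_path: str) -> str:
    """Hypothetical helper mirroring the slice from the issue: data_path[-16:-8]."""
    return data_path[-16:-8]


for name in FOLDER_NAMES:
    # The trailing timestamp (YYYYMMDDTHHMMSSZ) is 16 characters long,
    # so the slice keeps its first 8 characters, i.e. the date.
    print(name, "->", release_from_slice(name))
```

All eight names end in a 16-character timestamp, which is why the slice happens to land on the date for each of them.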

@antsh3k
Collaborator Author

antsh3k commented Jul 25, 2024

You are right, nothing has changed. I think what I was alluding to was using the folder one level up. I could be mistaken in using the wrong level, in which case we need to throw an error, as it can currently pass through without one.

For example, this convention does not work for the following names:

uk_sct2cl_38.2.0_20240605000001Z
uk_sct2cl_32.6.0_20211027000001Z
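
To illustrate why these names slip through, here is a small sketch applying the same slice to them. The helper name `release_from_slice` is hypothetical. The timestamp in these names has no `T` separator, so it is 15 characters rather than 16, and the slice picks up the preceding underscore without raising anything.

```python
def release_from_slice(data_path: str) -> str:
    """Hypothetical helper mirroring the slice from the issue: data_path[-16:-8]."""
    return data_path[-16:-8]


NAMES = [
    "uk_sct2cl_38.2.0_20240605000001Z",
    "uk_sct2cl_32.6.0_20211027000001Z",
]

for name in NAMES:
    release = release_from_slice(name)
    # No exception is raised; the result is simply not an 8-digit date.
    print(f"{name} -> {release!r} (all digits: {release.isdigit()})")
```

The results are `'_2024060'` and `'_2021102'`, which is the silent failure described above.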

@tomolopolis
Member

so the above code is brittle anyway. data_path.split('_')[3] would make more sense? with different split indices tried out?
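
A rough sketch of what the split-based idea could look like. The function name `release_from_split`, the use of the folder basename, and the `[:8]` truncation to the date are assumptions added for illustration, not part of the suggestion itself.

```python
import os


def release_from_split(data_path: str, index: int = 3) -> str:
    """Hypothetical helper: take one underscore-separated part of the folder name.

    Only the basename is split, so underscores in parent directories
    do not shift the index (an assumption beyond the original suggestion).
    """
    basename = os.path.basename(os.path.normpath(data_path))
    parts = basename.split("_")
    # Raises IndexError if the name has fewer than index + 1 parts.
    return parts[index][:8]


print(release_from_split("SnomedCT_InternationalRF2_PRODUCTION_20240201T120000Z"))
print(release_from_split("uk_sct2cl_38.2.0_20240605000001Z"))
```

With index 3, both the `SnomedCT_...` names and the `uk_sct2cl_...` names yield an 8-digit date, since each has exactly four underscore-separated parts.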

@mart-r
Collaborator

mart-r commented Jul 25, 2024

> so the above code is brittle anyway. data_path.split('_')[3] would make more sense? with different split indices tried out?

Yeah, I think we should be able to match the folder basename with regex and pull the third group:

^SnomedCT_([A-Za-z0-9]+)_([A-Za-z0-9]+)_(\d{8}T\d{6}Z$)

If there's no match, we can raise an exception.
If there's a match, we can take the first 8 characters of the third group.
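
A minimal sketch of the regex approach described above, using the pattern as written. The function name `release_from_regex` and the choice of `ValueError` are assumptions, and this is not the code that exists in the project.

```python
import os
import re

# Pattern copied from the comment above.
RELEASE_PATTERN = re.compile(
    r"^SnomedCT_([A-Za-z0-9]+)_([A-Za-z0-9]+)_(\d{8}T\d{6}Z$)"
)


def release_from_regex(data_path: str) -> str:
    """Hypothetical helper: extract the YYYYMMDD release date or raise."""
    basename = os.path.basename(os.path.normpath(data_path))
    match = RELEASE_PATTERN.match(basename)
    if match is None:
        # No match: fail loudly instead of returning a garbage release string.
        raise ValueError(f"Unrecognised SNOMED CT release folder name: {basename!r}")
    # Match: the third group is the full timestamp; keep its first 8 characters.
    return match.group(3)[:8]


print(release_from_regex("SnomedCT_Release_AU1000036_20240630T120000Z"))
```

Note that with this pattern a name such as `uk_sct2cl_38.2.0_20240605000001Z` raises rather than passing through, so supporting that layout would need a second pattern.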

EDIT:
Just to comment on the splitting - it wouldn't necessarily catch a weirdly named folder. It would return something for any name with at least 3 underscores, whether or not the fourth part is a date.
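
A short sketch of that point. The helper `fourth_part` and the folder names `my_old_backup_folder` and `too_few` are made up for illustration.

```python
def fourth_part(name: str) -> str:
    """Hypothetical helper mirroring name.split('_')[3]."""
    return name.split("_")[3]


NAMES = [
    "SnomedCT_UKDrugRF2_PRODUCTION_20240508T000001Z",  # real release name
    "my_old_backup_folder",                            # made-up, 3 underscores
    "too_few",                                         # made-up, 1 underscore
]

for name in NAMES:
    try:
        print(f"{name!r} -> {fourth_part(name)!r}")
    except IndexError:
        # Only names with fewer than 3 underscores are rejected.
        print(f"{name!r} -> IndexError")
```

The made-up `my_old_backup_folder` returns `'folder'` without complaint, so a split on its own would still need a separate check that the result looks like a date.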
