Update the regex pattern to capture non-word, non-digit ...
We need to capture scope expressions such as ensembl.gene
and uniprot.Swiss-Prot, which the default pattern did not
match because \w+ stops at characters like "." and "-".

<Per regex101>
Named Capture Group scope (?P<scope>[\w\W]+)
  Match a single character present in the list below [\w\W]
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \w matches any word character (equivalent to [a-zA-Z0-9_])
    \W matches any non-word character (equivalent to [^a-zA-Z0-9_])

Taken together, [\w\W] matches any character at all. The order inside the
class does not matter, so the [\W\w] used in the code is equivalent.
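To make the difference concrete, here is a minimal sketch comparing the two patterns. It assumes the whole query string must match (`fullmatch`); how the biothings.api library actually applies the pattern is not shown in this commit, and the helper name `split_query` and the example identifiers are illustrative only.

```python
import re

# Pattern before the commit: the scope is limited to word characters.
OLD_PATTERN = re.compile(r"(?P<scope>\w+):(?P<term>[^:]+)")
# Pattern after the commit: the scope may contain any character.
NEW_PATTERN = re.compile(r"(?P<scope>[\W\w]+):(?P<term>[^:]+)")


def split_query(pattern, query):
    """Return (scope, term) if the whole query matches, else None.

    Hypothetical helper for illustration; not part of the repository.
    """
    match = pattern.fullmatch(query)
    if match is None:
        return None
    return match.group("scope"), match.group("term")


# Example queries chosen for illustration.
for query in (
    "entrezgene:1017",
    "ensembl.gene:ENSG00000123374",
    "uniprot.Swiss-Prot:P24941",
):
    print(query, split_query(OLD_PATTERN, query), split_query(NEW_PATTERN, query))
```

Under these assumptions the old pattern rejects the dotted and hyphenated scopes, while the new pattern splits each query at the colon.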
Johnathan Schaff committed Feb 7, 2024
1 parent 355b755 commit b8b245d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/config_web.py
@@ -188,7 +188,7 @@
# CURIE ID prefixes match the pattern. This pattern matches the default
# presented in the ESQueryBuilder in the biothings.api library.
# Infers based off empty scopes
-default_pattern = (re.compile(r"(?P<scope>\w+):(?P<term>[^:]+)"), [])
+default_pattern = (re.compile(r"(?P<scope>[\W\w]+):(?P<term>[^:]+)"), [])

ANNOTATION_ID_REGEX_LIST = [
*biolink_curie_regex_list,
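One consequence of the new character class is worth noting: because `[\W\w]` matches every character, the scope group can also absorb colons and whitespace, and greedy matching then splits at the last colon. The sketch below illustrates this, again assuming `fullmatch` semantics. `NARROW_PATTERN` is a hypothetical alternative shown only for comparison; it is not part of this commit or of biothings.api.

```python
import re

# Pattern introduced by the commit.
NEW_PATTERN = re.compile(r"(?P<scope>[\W\w]+):(?P<term>[^:]+)")
# Hypothetical narrower alternative (assumption, not from the commit):
# word characters plus "." and "-" only.
NARROW_PATTERN = re.compile(r"(?P<scope>[\w.-]+):(?P<term>[^:]+)")


def scope_of(pattern, query):
    """Return the captured scope if the whole query matches, else None."""
    match = pattern.fullmatch(query)
    return match.group("scope") if match else None


# Greedy matching backtracks to the last colon, so the scope keeps the first one.
print(scope_of(NEW_PATTERN, "a:b:c"))
print(scope_of(NARROW_PATTERN, "a:b:c"))

# Both patterns accept the scopes named in the commit message.
print(scope_of(NEW_PATTERN, "uniprot.Swiss-Prot:P24941"))
print(scope_of(NARROW_PATTERN, "uniprot.Swiss-Prot:P24941"))
```

Whether the broader class is preferable depends on which scope names the service needs to accept; the commit message only names ensembl.gene and uniprot.Swiss-Prot.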
