Support one-letter unicode classes like \p{L}.
These are found in some of the newer syntax definitions.
I assume `\p{L}` matches anything in Lu, Ll, Lt, Lm, or Lo, and
similarly for `\p{M}` etc.
jgm committed Dec 2, 2023
1 parent 969a555 commit 37a2e98
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions skylighting-core/src/Regex/KDE/Compile.hs
@@ -270,31 +270,48 @@ pRegexCharClass = do
"Lt" -> (== TitlecaseLetter)
"Lm" -> (== ModifierLetter)
"Lo" -> (== OtherLetter)
"L" -> (\c -> c == UppercaseLetter || c == LowercaseLetter ||
c == TitlecaseLetter || c == ModifierLetter ||
c == OtherLetter)
"Mn" -> (== NonSpacingMark)
"Mc" -> (== SpacingCombiningMark)
"Me" -> (== EnclosingMark)
"M" -> (\c -> c == NonSpacingMark || c == SpacingCombiningMark ||
c == EnclosingMark)
"Nd" -> (== DecimalNumber)
"Nl" -> (== LetterNumber)
"No" -> (== OtherNumber)
"N" -> (\c -> c == DecimalNumber || c == LetterNumber ||
c == OtherNumber)
"Pc" -> (== ConnectorPunctuation)
"Pd" -> (== DashPunctuation)
"Ps" -> (== OpenPunctuation)
"Pe" -> (== ClosePunctuation)
"Pi" -> (== InitialQuote)
"Pf" -> (== FinalQuote)
"Po" -> (== OtherPunctuation)
"P" -> (\c -> c == ConnectorPunctuation || c == DashPunctuation ||
c == OpenPunctuation || c == ClosePunctuation ||
c == InitialQuote || c == FinalQuote ||
c == OtherPunctuation)
"Sm" -> (== MathSymbol)
"Sc" -> (== CurrencySymbol)
"Sk" -> (== ModifierSymbol)
"So" -> (== OtherSymbol)
"S" -> (\c -> c == MathSymbol || c == CurrencySymbol ||
c == ModifierSymbol || c == OtherSymbol)
"Zs" -> (== Space)
"Zl" -> (== LineSeparator)
"Zp" -> (== ParagraphSeparator)
"Z" -> (\c -> c == Space || c == LineSeparator ||
c == ParagraphSeparator)
"Cc" -> (== Control)
"Cf" -> (== Format)
"Cs" -> (== Surrogate)
"Co" -> (== PrivateUse)
"Cn" -> (== NotAssigned)
"C" -> (\c -> c == Control || c == Format || c == Surrogate ||
c == PrivateUse || c == NotAssigned)
_ -> (const False)) . generalCategory
brack <- option [] $ [(==']')] <$ char ']'
fs <- many (getEscapedClass <|> getPosixClass <|> getCRange <|> getCClass)
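
The mapping this hunk adds can be tried outside the parser. What follows is a minimal sketch, not the code in `Compile.hs`: `majorClass` and `matchesClass` are hypothetical names introduced here for illustration, and the lists simply restate the sub-categories enumerated in the diff above. It relies only on `generalCategory` and the `GeneralCategory` constructors from `Data.Char` in `base`.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper (not part of skylighting): the general categories
-- covered by a one-letter Unicode class name, as enumerated in the diff.
majorClass :: String -> [GeneralCategory]
majorClass "L" = [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
                 , ModifierLetter, OtherLetter ]
majorClass "M" = [NonSpacingMark, SpacingCombiningMark, EnclosingMark]
majorClass "N" = [DecimalNumber, LetterNumber, OtherNumber]
majorClass "P" = [ ConnectorPunctuation, DashPunctuation, OpenPunctuation
                 , ClosePunctuation, InitialQuote, FinalQuote
                 , OtherPunctuation ]
majorClass "S" = [MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol]
majorClass "Z" = [Space, LineSeparator, ParagraphSeparator]
majorClass "C" = [Control, Format, Surrogate, PrivateUse, NotAssigned]
majorClass _   = []  -- unknown names match nothing, like `const False` above

-- Hypothetical helper: does the character fall in the named one-letter class?
matchesClass :: String -> Char -> Bool
matchesClass name c = generalCategory c `elem` majorClass name
```

Loaded in GHCi, `matchesClass "L" 'a'` and `matchesClass "N" '1'` should both hold, while `matchesClass "L" '1'` should not. The list-based form is only a compact way to read the mapping; the commit itself spells each class out as a chain of `==` comparisons.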