Is your feature request related to a problem? Please describe.
Pomsky currently does not remove unnecessary elements in character classes. E.g. [ w "abc" ] compiles to [\wabc] (Java). However, the abc is unnecessary because [\wabc] == \w.
Describe the solution you'd like
Remove unnecessary elements in character classes to optimize and simplify them.
Additional context
This requires knowing the precise set of characters accepted by each character class element. For an example implementation of this, check out the regexp/no-dupe-characters-character-class rule.
Thanks for your feature request! This is already on my to-do list, but is tricky to get right.
Another reason why we need this is to prevent the following:
![w !d]
A negated character set matching neither \w nor \D matches nothing, which is forbidden in Rust. So I'm working on a way to determine whether two character classes overlap, are disjunct, or one is a subset of the other.
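To make the overlap/subset question concrete, here is a minimal sketch that is restricted to ASCII so a whole class fits in one `u128` bitmask. The names `AsciiSet`, `Relation`, `word` and `digit` are hypothetical and are not taken from Pomsky; real classes are Unicode-aware and would need a different representation.

```rust
/// ASCII-only stand-in for a character class: bit `i` is set when the
/// character with code `i` is matched. Assumption: only code points
/// 0..=127 exist, which is not true for real Pomsky classes.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct AsciiSet(u128);

#[derive(PartialEq, Eq, Debug)]
enum Relation {
    Equal,
    Subset,   // every character of `self` is also in `other`
    Superset, // every character of `other` is also in `self`
    Disjoint, // no character in common
    Overlap,  // some characters in common, neither contains the other
}

impl AsciiSet {
    /// Build a set from a predicate over the 128 ASCII characters.
    fn from_pred(pred: impl Fn(char) -> bool) -> Self {
        let mut bits = 0u128;
        for code in 0u8..128 {
            if pred(code as char) {
                bits |= 1u128 << code;
            }
        }
        AsciiSet(bits)
    }

    fn negate(self) -> Self {
        AsciiSet(!self.0)
    }

    fn union(self, other: Self) -> Self {
        AsciiSet(self.0 | other.0)
    }

    fn is_empty(self) -> bool {
        self.0 == 0
    }

    /// Classify how two sets relate. An empty set compared with a
    /// non-empty one is reported as `Disjoint`.
    fn relation(self, other: Self) -> Relation {
        let common = self.0 & other.0;
        if self.0 == other.0 {
            Relation::Equal
        } else if common == 0 {
            Relation::Disjoint
        } else if common == self.0 {
            Relation::Subset
        } else if common == other.0 {
            Relation::Superset
        } else {
            Relation::Overlap
        }
    }
}

/// ASCII approximation of `\w`.
fn word() -> AsciiSet {
    AsciiSet::from_pred(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// ASCII approximation of `\d`.
fn digit() -> AsciiSet {
    AsciiSet::from_pred(|c| c.is_ascii_digit())
}
```

With these helpers, `word().union(digit().negate()).negate().is_empty()` is true, which is the ASCII analogue of `![w !d]` matching nothing.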
> So I'm working on a way to determine whether two character classes overlap, are disjunct, or one is a subset of the other.
The exact set of characters matched by each character set is defined in pomsky, right? Then couldn't you parse them into an interval set? These interval sets can be efficiently unioned, intersected, and compared (equal, subset, disjoint).
That's what the regex crate also does under the hood. We also do this for eslint-plugin-regexp. Having this representation for characters, character sets, and character classes makes it pretty easy to implement some optimizations.
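As a rough illustration of the interval-set idea, the sketch below stores a set as sorted, non-overlapping, non-adjacent inclusive ranges of code points. `IntervalSet` and its methods are hypothetical names written for this example; this is not the API of the regex crate or of eslint-plugin-regexp, and it assumes every input range satisfies `lo <= hi`.

```rust
/// A set of code points stored as sorted, non-overlapping,
/// non-adjacent inclusive ranges.
#[derive(Clone, Debug, PartialEq, Eq)]
struct IntervalSet {
    ranges: Vec<(u32, u32)>,
}

impl IntervalSet {
    /// Normalize arbitrary ranges (assumed `lo <= hi`) into canonical form.
    fn new(mut ranges: Vec<(u32, u32)>) -> Self {
        ranges.sort();
        let mut merged: Vec<(u32, u32)> = Vec::new();
        for (lo, hi) in ranges {
            if let Some(last) = merged.last_mut() {
                // Merge when the new range overlaps or touches the previous one.
                if lo <= last.1.saturating_add(1) {
                    last.1 = last.1.max(hi);
                    continue;
                }
            }
            merged.push((lo, hi));
        }
        IntervalSet { ranges: merged }
    }

    fn union(&self, other: &Self) -> Self {
        let mut all = self.ranges.clone();
        all.extend_from_slice(&other.ranges);
        IntervalSet::new(all)
    }

    /// Linear-time intersection of two canonical sets.
    fn intersect(&self, other: &Self) -> Self {
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < self.ranges.len() && j < other.ranges.len() {
            let (a, b) = (self.ranges[i], other.ranges[j]);
            let (lo, hi) = (a.0.max(b.0), a.1.min(b.1));
            if lo <= hi {
                out.push((lo, hi));
            }
            // Advance whichever range ends first.
            if a.1 < b.1 {
                i += 1;
            } else {
                j += 1;
            }
        }
        IntervalSet { ranges: out }
    }

    fn is_subset_of(&self, other: &Self) -> bool {
        self.intersect(other) == *self
    }

    fn is_disjoint_from(&self, other: &Self) -> bool {
        self.intersect(other).ranges.is_empty()
    }
}

/// Convenience constructor for an inclusive character range.
fn range(lo: char, hi: char) -> (u32, u32) {
    (lo as u32, hi as u32)
}
```

Because both inputs are canonical, equality of the range lists is equality of the sets, so the subset check reduces to one intersection and one comparison.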
Yes, except that we want to preserve \w, \d, \s, \p{Greek}, \p{Separator}, etc. rather than lowering them to a lot of ranges, so we can emit the smallest possible output.
Preserving character sets and Unicode properties is not mutually exclusive with using interval sets. It's of course true that interval sets do not preserve the elements that created them, but that's also not really a problem. I meant to suggest that the optimizer should have a way to get the interval set from character elements, not that character classes should be represented by interval sets.
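One way to read this suggestion, sketched under loud assumptions: the class keeps its original elements, and the optimizer only asks each element for its intervals when deciding what is redundant. `Element`, `remove_redundant` and `emit` are hypothetical names, the interval tables are ASCII-only, and `emit` does no escaping. The greedy pass is not guaranteed to find the smallest possible class.

```rust
/// Symbolic class elements, kept as written so the output can stay short.
#[derive(Clone, Debug, PartialEq)]
enum Element {
    Word,
    Digit,
    Range(char, char),
}

impl Element {
    /// Intervals matched by this element. Assumption: ASCII-only tables.
    fn intervals(&self) -> Vec<(u32, u32)> {
        let r = |lo: char, hi: char| (lo as u32, hi as u32);
        match self {
            Element::Word => vec![r('0', '9'), r('A', 'Z'), r('_', '_'), r('a', 'z')],
            Element::Digit => vec![r('0', '9')],
            Element::Range(lo, hi) => vec![r(*lo, *hi)],
        }
    }
}

/// Sort and merge overlapping or adjacent ranges.
fn merge(mut ranges: Vec<(u32, u32)>) -> Vec<(u32, u32)> {
    ranges.sort();
    let mut merged: Vec<(u32, u32)> = Vec::new();
    for (lo, hi) in ranges {
        if let Some(last) = merged.last_mut() {
            if lo <= last.1.saturating_add(1) {
                last.1 = last.1.max(hi);
                continue;
            }
        }
        merged.push((lo, hi));
    }
    merged
}

/// In a merged set, a range is covered iff one merged range contains it.
fn covers(set: &[(u32, u32)], r: (u32, u32)) -> bool {
    set.iter().any(|&(lo, hi)| lo <= r.0 && r.1 <= hi)
}

/// Greedily drop every element whose characters are already matched
/// by the remaining elements.
fn remove_redundant(elements: &[Element]) -> Vec<Element> {
    let mut kept: Vec<Element> = elements.to_vec();
    let mut i = 0;
    while i < kept.len() {
        let others = merge(
            kept.iter()
                .enumerate()
                .filter(|(j, _)| *j != i)
                .flat_map(|(_, e)| e.intervals())
                .collect(),
        );
        let redundant = kept[i].intervals().iter().all(|&r| covers(&others, r));
        if redundant {
            kept.remove(i);
        } else {
            i += 1;
        }
    }
    kept
}

/// Emit a regex character class from the symbolic elements (no escaping).
fn emit(elements: &[Element]) -> String {
    let mut out = String::from("[");
    for e in elements {
        match e {
            Element::Word => out.push_str("\\w"),
            Element::Digit => out.push_str("\\d"),
            Element::Range(lo, hi) if lo == hi => out.push(*lo),
            Element::Range(lo, hi) => {
                out.push(*lo);
                out.push('-');
                out.push(*hi);
            }
        }
    }
    out.push(']');
    out
}
```

For the class from the original report, `remove_redundant(&[Element::Word, Element::Range('a', 'c')])` keeps only `Element::Word`, and `emit` still produces `\w` inside the brackets rather than a list of ranges.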