Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations for assertions and lookarounds #56

Open
RunDevelopment opened this issue Dec 24, 2022 · 0 comments
Open

Optimizations for assertions and lookarounds #56

RunDevelopment opened this issue Dec 24, 2022 · 0 comments
Labels
C-optimize Issue or feature request for an optimization enhancement New feature or request

Comments

@RunDevelopment
Copy link

RunDevelopment commented Dec 24, 2022

Is your feature request related to a problem? Please describe.

Lookarounds are useful for many things and present interesting optimization opportunities. There are a few optimizations that I would like to request.

Describe the solution you'd like

I would like to see the following optimizations:

  1. Inlining lookarounds at the end of the expression tree of a lookaround. E.g. >> "a" (>> "b")>> "a" "b". For this optimization to be valid, the inner and outer lookarounds have to look in the same direction, the inner lookaround must be not be negated, and the inner lookaround must be at the end of the expression tree of the one.
  2. Outline assertions at the start of non-negated lookarounds. E.g. >> ^ "a"^ (>> "a"). In general, any element that does not consume characters at the start of a non-negated lookaround can be outlined.
  3. Removing trivially accepting assertions. E.g. (>> [w]) "foo""foo". Assertions that can be statically determined to always accept because of the surrounding pattern and do not have any side effects (e.g. capturing groups) can be removed (= replaced with the empty string "", the neutral element of concatenation).
  4. Removing trivially rejecting assertions. E.g. (>> "a") "foo" | "bar"∅ "foo" | "bar" (I'm using to denote the empty set). Assertions that can be statically determined to always accept because of the surrounding pattern and do not have any side effects (e.g. capturing groups) can be removed (= replaced with the empty set, the absorbing element of concatenation).
  5. Applying assertions. E.g. (>> [w]) C[w], (!>> [w]) C![w]. Single-character lookarounds can be removed by applying them to the character before/after them. This is not only an optimization (these optimized regexes are around 4x faster in JavaScript), but it can also be used as a method to achieve character class intersection and subtraction.

Notes about 3 and 4:

  • These optimizations should work on branches of lookarounds, not the whole lookaround. E.g. (>> "a" | "-") [w]+(>> "a") [w]+.
  • Boundary and start/end assertions should also be included.
  • Determining whether an assertion always accepts or rejects is quite difficult, but there are some fast approximations that can be used. One idea is to only consider the first character that comes before/after an assertion. Knowing the next character before/after an assertion is enough to optimize start, end, and boundary assertions as well as a lot of lookarounds. We implement this approach in many optimization-related rules in eslint-plugin-regexp.
  • Optimizing rejecting assertions is especially useful for optimizing custom boundary assertions. E.g. using JS syntax (?:(?<=\w)(?!\w)|(?<!\w)(?=\w))foo can be optimized to (?:(?<=\w)∅|(?<!\w)(?=\w))foo(?:(?<!\w)(?=\w))foo(?:(?<!\w))foo(?<!\w)foo.
@RunDevelopment RunDevelopment added the enhancement New feature or request label Dec 24, 2022
@Aloso Aloso added the C-optimize Issue or feature request for an optimization label Dec 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-optimize Issue or feature request for an optimization enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants