Drop unicode aware regexes #216

Merged
dburgener merged 1 commit into main from dburgener/no-unicode
Sep 7, 2023

Conversation

dburgener
Owner

The regex tokens "\s" and "\d" are Unicode-aware: they match whitespace and digit characters outside the ASCII range, not just the ASCII ones. Including them in our grammar means that we require regex support for the entirety of Unicode, which builds a large table of Unicode characters into our binary.

Replace \s and \d with their ASCII-only equivalents, [[:space:]] and [[:digit:]].
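
To illustrate the difference in scope, here is a minimal sketch using only the Rust standard library. It is not the Cascade grammar or the LALRPOP lexer: the std predicates `char::is_whitespace` and `char::is_numeric` stand in for the Unicode-aware classes (they are close analogues, not the exact sets the regex engine uses), and the helpers `is_ascii_space` and `is_ascii_digit` are hypothetical names that mirror [[:space:]] and [[:digit:]].

```rust
/// ASCII-only whitespace, mirroring the POSIX class [[:space:]]:
/// space, tab, newline, vertical tab, form feed, carriage return.
/// (Hypothetical helper for illustration only.)
fn is_ascii_space(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r')
}

/// ASCII-only digit, mirroring [[:digit:]] (0 through 9).
/// (Hypothetical helper for illustration only.)
fn is_ascii_digit(c: char) -> bool {
    c.is_ascii_digit()
}

fn main() {
    // A mix of ASCII and non-ASCII whitespace and digit characters:
    // space, tab, NO-BREAK SPACE, EM SPACE, '7', ARABIC-INDIC DIGIT THREE.
    let samples = [' ', '\t', '\u{00A0}', '\u{2003}', '7', '\u{0663}'];

    for &c in samples.iter() {
        println!(
            "U+{:04X}: unicode space={} ascii space={} unicode numeric={} ascii digit={}",
            c as u32,
            c.is_whitespace(),  // Unicode White_Space property
            is_ascii_space(c),  // six ASCII characters only
            c.is_numeric(),     // Unicode numeric categories
            is_ascii_digit(c),  // '0'..='9' only
        );
    }
}
```

The ASCII-only checks reduce to a handful of comparisons, while the Unicode-aware ones need lookup tables covering the whole character set, which is the binary-size cost described above.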

Note that this does not remove our Unicode dependency by itself. LALRPOP 0.19.12 builds in Unicode support by default because it doesn't yet expose a configuration knob for enabling or disabling it. Once LALRPOP 0.20 is released, users who don't require Unicode support will be able to disable it using a feature. This change moves Cascade into that "users that don't require unicode" group, so that we can drop Unicode support once LALRPOP exposes the knob.

@dburgener dburgener marked this pull request as ready for review September 1, 2023 21:07
@dburgener
Owner Author

Rebased and published this. Since 0.1 is delayed, I think it makes sense to take this for 0.1.

@github-actions

github-actions bot commented Sep 1, 2023

Stable clippy has failed for this run: nightly. A maintainer should check the logs. If known good clippy passed, this is non-fatal to the PR.

@dburgener dburgener merged commit 67dbb65 into main Sep 7, 2023
11 checks passed
@dburgener dburgener deleted the dburgener/no-unicode branch September 7, 2023 17:17