Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Convert Go regex to Rust regex #76

Merged
merged 1 commit into from
Nov 7, 2023
Merged

Conversation

fbs
Copy link
Contributor

@fbs fbs commented Oct 30, 2023

Go and Rust have some differences in their regex implementation. In Go the following is valid:

uri=~"/v[1-9]/.*/{gid}/{uid}"

Rust sees the {gid} as a bad repetition, it expects a range like {1,4}

regex parse error:
    /v[1-9]/.*/{gid}/{uid}
                ^
error: repetition quantifier expects a valid decimal

For it to be a valid Rust regex the opening { has to be escaped.

This change adds that escaping.

The implementation is simple, iteration + writing in a result buffer. It is probably not the best of nicest way of doing it, but it straight forward.

Fixes #75

@fbs fbs mentioned this pull request Oct 30, 2023
@fbs fbs changed the title Convert Go regex to Rust regex fix: Convert Go regex to Rust regex Oct 30, 2023
@codecov
Copy link

codecov bot commented Oct 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6cf0736) 98.88% compared to head (8ec4d87) 98.90%.

Changes have been made to critical files, which contain lines commonly executed in production. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
+ Coverage   98.88%   98.90%   +0.01%     
==========================================
  Files          14       14              
  Lines        6126     6215      +89     
==========================================
+ Hits         6058     6147      +89     
  Misses         68       68              
Files Coverage Δ
src/label/matcher.rs 100.00% <100.00%> (ø)
src/parser/parse.rs Critical 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yuanbohan
Copy link
Contributor

yuanbohan commented Oct 31, 2023

Hi, thanks for your contribution 👍
Could you fix the clippy error? You can refer #77, or you can rebase the main branch after the PR #77 is merged

src/label/matcher.rs Show resolved Hide resolved
src/label/matcher.rs Show resolved Hide resolved
src/label/matcher.rs Outdated Show resolved Hide resolved
src/label/matcher.rs Outdated Show resolved Hide resolved
Go and Rust have some differences in their regex implementation. In Go
the following is valid:

```
uri=~"/v[1-9]/.*/{gid}/{uid}"
```

Rust sees the `{gid}` as a bad repetition, it expects a range like {1,4}

```
regex parse error:
    /v[1-9]/.*/{gid}/{uid}
                ^
error: repetition quantifier expects a valid decimal
```

For it to be a valid Rust regex the opening { has to be escaped.

This change adds that escaping.

The implementation is simple, iteration + writing in a result buffer. It
is probably not the best of nicest way of doing it, but it straight
forward
@yuanbohan
Copy link
Contributor

@waynexia @evenyag please take a look

@evenyag evenyag requested a review from v0y4g3r November 1, 2023 04:04
@v0y4g3r
Copy link

v0y4g3r commented Nov 3, 2023

Looks like we only need to convert the { characters without preceding backslash (\) to \{ while remove the preceding \ from \{ in Golang regex to make it work in Rust regex.

The problem is the difference between Go and Rust in handling repetition sequence and bracket literal. Go treats { as
bracket literal while Rust treats { as the start of repetition sequence.

For example, for /v[1-9]/.*/{gid}/{uid} in Golang, the corresponding Rust regex would be /v[1-9]/.*/\{gid\}/\{uid\}: https://regex101.com/r/k7b6N8/1

@fbs
Copy link
Contributor Author

fbs commented Nov 3, 2023

Yes thats basically what this MR does, Go 'auto escapes' the sequence while Rust doesn't. So the conversion is a{aa} -> a\{aa} but a{1,2} stays a{1,2}

@waynexia
Copy link
Member

waynexia commented Nov 6, 2023

Yes thats basically what this MR does, Go 'auto escapes' the sequence while Rust doesn't. So the conversion is a{aa} -> a\{aa} but a{1,2} stays a{1,2}

Should a{aa} goes to a\{aa\}? } needs escaping as well.

@yuanbohan
Copy link
Contributor

Yes thats basically what this MR does, Go 'auto escapes' the sequence while Rust doesn't. So the conversion is a{aa} -> a\{aa} but a{1,2} stays a{1,2}

Should a{aa} goes to a\{aa\}? } needs escaping as well.

I'm interested in the escaping of }, maybe it does not affect the result if the regex does not find the starting of repetition pattern?

@fbs
Copy link
Contributor Author

fbs commented Nov 6, 2023

I'm not entirely sure what you mean. In my testing just escaping the { is enough. A } without leading (or with escaped) { is just seen as a literal

$ cat src/main.rs
use regex::Regex;

fn main() {
    let re = Regex::new(r"a}a\{a}").unwrap();
    let Some(caps) = re.captures("a}a{a}") else {
        println!("no match!");
        return;
    };
    println!("The name is: {:?}", caps);
}

/tmp/regex master[?] $ cargo r
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/regexxx`
The name is: Captures({0: 0..6/"a}a{a}"})

@waynexia waynexia merged commit 6d752e7 into GreptimeTeam:main Nov 7, 2023
9 checks passed
@waynexia
Copy link
Member

waynexia commented Nov 7, 2023

Thanks for your contribution @fbs ❤️

@fbs
Copy link
Contributor Author

fbs commented Nov 7, 2023

Thanks for the quick review team! <3.

I wonder if there are more regex incompatibility issues in this.

@yuanbohan
Copy link
Contributor

Thanks for the quick review team! <3.

I wonder if there are more regex incompatibility issues in this.

I will watch on this, thanks for your contribution 😀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failing to parse valid regex
4 participants