Reimplement parser #450
CodSpeed Performance Report: Merging #450 will not alter performance.
Thanks for this heroic effort! This is amazing! Two thoughts:
These two commits speed up the slowest part of the combinator approach: the large multi-way choices. They do this by branching on the name of the command/action/schedule, which lets the parser skip the alternatives that cannot match instead of attempting each one.
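To make the difference concrete, here is a minimal sketch of the two strategies. The `Command` enum and all function names are hypothetical and chosen for illustration only; they are not egglog's actual types or the code in these commits.

```rust
/// Hypothetical command AST, for illustration only (not egglog's real types).
#[derive(Debug, PartialEq)]
enum Command {
    Push(usize),
    Pop(usize),
    Check(String),
}

// One small parser per alternative, as a combinator-style parser would have.
fn try_push(head: &str, arg: &str) -> Option<Command> {
    if head == "push" { arg.parse::<usize>().ok().map(Command::Push) } else { None }
}

fn try_pop(head: &str, arg: &str) -> Option<Command> {
    if head == "pop" { arg.parse::<usize>().ok().map(Command::Pop) } else { None }
}

fn try_check(head: &str, arg: &str) -> Option<Command> {
    if head == "check" { Some(Command::Check(arg.to_string())) } else { None }
}

/// Multi-way choice: try every alternative in order until one succeeds.
/// The cost grows with the number of alternatives.
fn parse_by_choice(head: &str, arg: &str) -> Option<Command> {
    let alternatives: [fn(&str, &str) -> Option<Command>; 3] =
        [try_push, try_pop, try_check];
    alternatives.iter().find_map(|alt| alt(head, arg))
}

/// Dispatch: branch once on the head name, then run only the matching parser.
fn parse_by_dispatch(head: &str, arg: &str) -> Option<Command> {
    match head {
        "push" => arg.parse::<usize>().ok().map(Command::Push),
        "pop" => arg.parse::<usize>().ok().map(Command::Pop),
        "check" => Some(Command::Check(arg.to_string())),
        _ => None,
    }
}
```

Both functions accept the same inputs and return the same results; the dispatch version simply never runs the parsers whose name does not match.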
This parser still doesn't have great error messages, but they are better in some cases. I'm marking as ready for review to get more eyes. It's at the point where I'd like to merge and incrementally improve error messages as bad cases come up (which we can do now because it's all in Rust).
It seems to me that this would be vastly simplified by parsing to a general S-expression structure, then resolving commands according to their identifiers. Parsing and semantic analysis would be decoupled and both would become more consistent.

For parsing: The grammar for a single S-expression would be something like

For semantic analysis: Each command would register its signature (e.g., arity, types, optional arguments, or whatever is needed), then they'd be resolved by identifier. Since they'd be parsed to a single structure with spans, shared across all commands, error reporting has far fewer cases. This would allow more invalid programs to be parsed, where, for example, the only problem is in arities or names of optional arguments, and the correct signature could be reported in the error.

If there is no lookahead, each top-level command could be parsed independently with allocations recycled, so the AST for the entire program does not need to be generated.
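A rough sketch of that two-phase design, under assumptions: `Span`, `Sexp`, `Signature`, `parse_sexp`, and `resolve` are hypothetical names invented here, the signature only tracks arity, and none of this is taken from egglog or from the toy parser mentioned in the thread.

```rust
use std::collections::HashMap;

/// Byte range of a node in the source text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span { start: usize, end: usize }

/// Generic S-expression shared by all commands; every node carries a span.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String, Span),
    List(Vec<Sexp>, Span),
}

/// Hypothetical signature: only an arity range in this sketch.
struct Signature { min_args: usize, max_args: usize }

fn skip_ws(bytes: &[u8], mut pos: usize) -> usize {
    while pos < bytes.len() && bytes[pos].is_ascii_whitespace() { pos += 1; }
    pos
}

/// Phase 1: parse one S-expression starting at byte `pos`.
/// Returns the node and the offset just past it.
fn parse_sexp(src: &str, pos: usize) -> Result<(Sexp, usize), String> {
    let bytes = src.as_bytes();
    let start = skip_ws(bytes, pos);
    match bytes.get(start) {
        None => Err(format!("unexpected end of input at byte {}", start)),
        Some(b')') => Err(format!("unexpected ')' at byte {}", start)),
        Some(b'(') => {
            let mut items = Vec::new();
            let mut pos = skip_ws(bytes, start + 1);
            while bytes.get(pos) != Some(&b')') {
                if pos >= bytes.len() {
                    return Err(format!("unclosed '(' opened at byte {}", start));
                }
                let (item, next) = parse_sexp(src, pos)?;
                items.push(item);
                pos = skip_ws(bytes, next);
            }
            Ok((Sexp::List(items, Span { start, end: pos + 1 }), pos + 1))
        }
        Some(_) => {
            // An atom runs until whitespace or a parenthesis.
            let mut end = start;
            while end < bytes.len()
                && !bytes[end].is_ascii_whitespace()
                && bytes[end] != b'('
                && bytes[end] != b')'
            {
                end += 1;
            }
            Ok((Sexp::Atom(src[start..end].to_string(), Span { start, end }), end))
        }
    }
}

/// Phase 2: look the command up by its head identifier and check its arity.
fn resolve(sexp: &Sexp, sigs: &HashMap<&'static str, Signature>) -> Result<(), String> {
    let (items, span) = match sexp {
        Sexp::List(items, span) => (items, span),
        Sexp::Atom(_, span) => {
            return Err(format!("{}..{}: expected a command list", span.start, span.end))
        }
    };
    let head = match items.first() {
        Some(Sexp::Atom(name, _)) => name.as_str(),
        _ => return Err(format!("{}..{}: command must start with a name", span.start, span.end)),
    };
    let sig = sigs
        .get(head)
        .ok_or_else(|| format!("{}..{}: unknown command `{}`", span.start, span.end, head))?;
    let n = items.len() - 1;
    if n < sig.min_args || n > sig.max_args {
        return Err(format!(
            "{}..{}: `{}` takes {} to {} arguments, got {}",
            span.start, span.end, head, sig.min_args, sig.max_args, n
        ));
    }
    Ok(())
}
```

Because `parse_sexp` returns the offset just past the node it read, a caller could loop over top-level commands one at a time and resolve each before parsing the next, which is the independence property described above. An input such as `(push 1 2)` parses successfully even when `push` is registered with at most one argument; the arity problem is reported by `resolve`, with the span and the expected range in the message.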
I agree that parsing to S-expressions first almost certainly improves error reporting somewhat, although I'm not sure how much of a win it actually is; currently error reporting is pretty simple. I'm not motivated enough to write two parsers for the same language in one week, but if someone does want to take a shot at it, you can use my toy parser from a year ago as a reference/baseline/starting point. It parses a subset of egglog, but the basic ideas are there. (I do think that merging this PR makes that job slightly easier, since this PR consolidates all of the parsing in one place and does some related cleanup (like removing
An additional related Zach thought to put on people's radar: an egglog fuzzer would be nice, but it's the sort of thing that's perpetually low priority until it gets done.
This PR implements a fully-Rust parser, which adds better error messages. This has the additional benefits of reducing dependencies and reducing variance in small benchmarks.