Reimplement parser #450

Alex-Fischman · 2024-10-21T00:42:30Z

This PR implements a fully-Rust parser, which adds better error messages. This has the additional benefits of reducing dependencies and reducing variance in small benchmarks.

codspeed-hq · 2024-10-21T00:51:32Z

CodSpeed Performance Report

Merging #450 will not alter performance

Comparing Alex-Fischman:fast-parser (4d70999) with main (af49ae2)

Summary

✅ 8 untouched benchmarks

yihozhang · 2024-10-21T18:08:55Z

Thanks for this heroic effort! This is amazing! Two thoughts:

Parser combinators are known to be asymptotically slower than bottom-up parsers like the ones lalrpop generates. However, here we see speedups for several small to medium programs. Have you compared the two parsers on larger programs to see how they perform?
The parser lalrpop generates is very, very large, something like 50k lines of code, so I imagine this PR can greatly reduce the compilation time (especially when one just tweaks the parser a little bit). We can maybe try to get some concrete numbers on this once your PR is ready.

Alex-Fischman · 2024-10-21T18:18:06Z

> Parser combinators are known to be asymptotically slower than bottom-up parsers like the ones lalrpop generates. However, here we see speedups for several small to medium programs. Have you compared the two parsers on larger programs to see how they perform?

These two commits speed up the slowest part of the combinator approach, which is the large multi-way choices. They do this by branching on the name of the command/action/schedule, which lets us skip attempting the alternatives that cannot match.
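For readers following along, here is a minimal sketch of the difference between a sequential choice and branching on the head name. It is not the code in this PR: the `Command` enum, the three toy commands, and the signature of `ident_after_paren` are all assumptions made for illustration (only the name `ident_after_paren` appears in the commit list).

```rust
// Hypothetical miniature; these types and functions are NOT egglog's.
#[derive(Debug, PartialEq)]
enum Command {
    Run(u64),
    Push,
    Pop,
}

type ParseResult = Result<Command, String>;

/// Return the identifier that follows the opening paren, if any.
fn ident_after_paren(src: &str) -> Option<&str> {
    let rest = src.trim_start().strip_prefix('(')?.trim_start();
    let end = rest
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(rest.len());
    if end == 0 { None } else { Some(&rest[..end]) }
}

/// Split `(head a b ...)` into its arguments, failing if the head differs.
fn args<'a>(src: &'a str, head: &str) -> Result<Vec<&'a str>, String> {
    let inner = src
        .trim()
        .strip_prefix('(')
        .and_then(|s| s.strip_suffix(')'))
        .ok_or_else(|| "expected parenthesized command".to_string())?;
    let mut words = inner.split_whitespace();
    match words.next() {
        Some(w) if w == head => Ok(words.collect()),
        _ => Err(format!("expected `{head}`")),
    }
}

fn parse_run(src: &str) -> ParseResult {
    match args(src, "run")?.as_slice() {
        [n] => n
            .parse::<u64>()
            .map(Command::Run)
            .map_err(|e| format!("bad limit: {e}")),
        other => Err(format!("run expects 1 argument, got {}", other.len())),
    }
}

fn parse_nullary(src: &str, head: &str, cmd: Command) -> ParseResult {
    if args(src, head)?.is_empty() {
        Ok(cmd)
    } else {
        Err(format!("`{head}` takes no arguments"))
    }
}

/// Sequential choice: try every alternative in order until one succeeds.
fn command_by_choice(src: &str) -> ParseResult {
    parse_run(src)
        .or_else(|_| parse_nullary(src, "push", Command::Push))
        .or_else(|_| parse_nullary(src, "pop", Command::Pop))
}

/// Dispatch: read the head identifier once, then run only the matching parser.
fn command_by_dispatch(src: &str) -> ParseResult {
    match ident_after_paren(src) {
        Some("run") => parse_run(src),
        Some("push") => parse_nullary(src, "push", Command::Push),
        Some("pop") => parse_nullary(src, "pop", Command::Pop),
        Some(other) => Err(format!("unknown command `{other}`")),
        None => Err("expected `(` followed by a command name".to_string()),
    }
}
```

In this toy version the dispatching variant also tends to give a more relevant error: for `(run)` it reports the arity problem from `parse_run`, whereas the sequential choice returns whatever the last failed alternative said.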

Alex-Fischman · 2024-10-22T06:50:59Z

This parser still doesn't have great error messages, but they are better in some cases. I'm marking as ready for review to get more eyes. It's at the point where I'd like to merge and incrementally improve error messages as bad cases come up (which we can do now because it's all in Rust).

thaliaarchi · 2024-10-25T04:34:40Z

It seems to me that this would be vastly simplified by parsing to a general S-expression structure, then resolving commands according to their identifiers. Parsing and semantic analysis would be decoupled and both would become more consistent.

For parsing: The grammar for a single S-expression would be something like sexp ::= "(" WS* IDENT (WS+ (":" IDENT WS+)? sexp)* ")" | literal (but with proper handling of whitespace; e.g., required between ident and integer literal, but not ident and paren). The code duplication for similar patterns is then eliminated and combinators wouldn't be necessary.
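A rough sketch of what such a reader could look like, under simplifying assumptions: every non-paren, non-whitespace run is one atom (so `:keyword` arguments and literals are not distinguished yet, and string literals containing spaces are not handled), and spans are byte ranges. The names `Sexp`, `Reader`, and `ParseError` are hypothetical and not taken from egglog.

```rust
use std::ops::Range;

// Hypothetical types for illustration only.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String, Range<usize>),
    List(Vec<Sexp>, Range<usize>),
}

#[derive(Debug, PartialEq)]
enum ParseError {
    UnexpectedEof(usize),
    UnexpectedCloseParen(usize),
}

struct Reader<'a> {
    src: &'a str,
    pos: usize, // byte offset into `src`
}

impl<'a> Reader<'a> {
    fn new(src: &'a str) -> Self {
        Reader { src, pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.src[self.pos..].chars().next()
    }

    fn skip_ws(&mut self) {
        while let Some(c) = self.peek() {
            if !c.is_whitespace() {
                break;
            }
            self.pos += c.len_utf8();
        }
    }

    /// Parse one S-expression starting at the current position.
    fn sexp(&mut self) -> Result<Sexp, ParseError> {
        self.skip_ws();
        let start = self.pos;
        match self.peek() {
            None => Err(ParseError::UnexpectedEof(start)),
            Some(')') => Err(ParseError::UnexpectedCloseParen(start)),
            Some('(') => {
                self.pos += 1; // consume `(`
                let mut items = Vec::new();
                loop {
                    self.skip_ws();
                    match self.peek() {
                        None => return Err(ParseError::UnexpectedEof(self.pos)),
                        Some(')') => {
                            self.pos += 1; // consume `)`
                            return Ok(Sexp::List(items, start..self.pos));
                        }
                        Some(_) => items.push(self.sexp()?),
                    }
                }
            }
            Some(_) => {
                // An atom runs until whitespace or a paren.
                while let Some(c) = self.peek() {
                    if c.is_whitespace() || c == '(' || c == ')' {
                        break;
                    }
                    self.pos += c.len_utf8();
                }
                let span = start..self.pos;
                Ok(Sexp::Atom(self.src[span.clone()].to_string(), span))
            }
        }
    }
}
```

Because parens need no surrounding whitespace while adjacent atoms do, the whitespace rule mentioned above falls out of the atom loop rather than being repeated per command.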

For semantic analysis: Each command would register its signature (e.g., arity, types, optional arguments, or whatever needed), then they'd be resolved by identifier. Since they'd be parsed to a single structure with spans, shared across all commands, then error reporting has far fewer cases. This would allow more invalid programs to be parsed, where, for example, the only problem is in arities or names of optional arguments, and the correct signature could be reported in the error.
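To make the registration idea concrete, here is a small sketch that checks only arity. Everything in it is an assumption for illustration: the `Signature` and `Diagnostic` types, the `registry` contents, and the usage strings are made up and do not reflect egglog's real command signatures.

```rust
use std::collections::HashMap;
use std::ops::Range;

// Minimal stand-in for a parsed S-expression with byte spans (hypothetical).
#[derive(Debug)]
enum Sexp {
    Atom(String, Range<usize>),
    List(Vec<Sexp>, Range<usize>),
}

/// Hypothetical signature: a usage string plus an allowed arity range.
struct Signature {
    usage: &'static str,
    min_args: usize,
    max_args: usize,
}

#[derive(Debug, PartialEq)]
struct Diagnostic {
    span: Range<usize>,
    message: String,
}

/// Illustrative registry; these entries are not egglog's actual signatures.
fn registry() -> HashMap<&'static str, Signature> {
    HashMap::from([
        ("run", Signature { usage: "(run <limit>)", min_args: 1, max_args: 1 }),
        ("union", Signature { usage: "(union <a> <b>)", min_args: 2, max_args: 2 }),
        ("push", Signature { usage: "(push <n>?)", min_args: 0, max_args: 1 }),
    ])
}

/// Resolve a form by its head identifier and check it against the signature.
fn check(form: &Sexp, sigs: &HashMap<&'static str, Signature>) -> Result<(), Diagnostic> {
    let (items, span) = match form {
        Sexp::List(items, span) => (items, span),
        Sexp::Atom(_, span) => {
            return Err(Diagnostic {
                span: span.clone(),
                message: "expected a parenthesized command".to_string(),
            })
        }
    };
    let (name, name_span) = match items.first() {
        Some(Sexp::Atom(name, s)) => (name.as_str(), s),
        _ => {
            return Err(Diagnostic {
                span: span.clone(),
                message: "expected a command name after `(`".to_string(),
            })
        }
    };
    let sig = sigs.get(name).ok_or_else(|| Diagnostic {
        span: name_span.clone(),
        message: format!("unknown command `{name}`"),
    })?;
    let argc = items.len() - 1;
    if argc < sig.min_args || argc > sig.max_args {
        // The same code path reports the correct usage for every command.
        return Err(Diagnostic {
            span: span.clone(),
            message: format!("`{name}` got {argc} argument(s); usage: {}", sig.usage),
        });
    }
    Ok(())
}
```

A fuller version would presumably also record argument types and named optional arguments in `Signature`; the sketch stops at arity to stay short.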

If there is no lookahead, each top-level command could be parsed independently with allocations recycled, so the AST for the entire program does not need to be generated.
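As a rough illustration of the streaming idea, the sketch below hands each top-level form to a callback and reuses a single buffer between forms. It is deliberately simplified: a flat token list stands in for a real per-command AST, and the function name `for_each_command` is hypothetical.

```rust
/// Call `handle` once per top-level form, reusing one token buffer across forms.
/// Tokens are `(`, `)`, and atoms; a real implementation would build a tree.
fn for_each_command<'a>(
    src: &'a str,
    mut handle: impl FnMut(&[&'a str]),
) -> Result<(), String> {
    let mut tokens: Vec<&'a str> = Vec::new(); // recycled between commands
    let mut depth = 0usize;
    let mut tok_start: Option<usize> = None;

    for (i, c) in src.char_indices() {
        let is_delim = c.is_whitespace() || c == '(' || c == ')';
        if is_delim {
            // A delimiter ends any atom in progress.
            if let Some(s) = tok_start.take() {
                if depth == 0 {
                    return Err(format!("expected `(` at byte {s}"));
                }
                tokens.push(&src[s..i]);
            }
        } else if tok_start.is_none() {
            tok_start = Some(i);
        }
        match c {
            '(' => {
                depth += 1;
                tokens.push("(");
            }
            ')' => {
                if depth == 0 {
                    return Err(format!("unmatched `)` at byte {i}"));
                }
                depth -= 1;
                tokens.push(")");
                if depth == 0 {
                    // One complete top-level command: process it, then recycle.
                    handle(tokens.as_slice());
                    tokens.clear(); // drops contents, keeps capacity
                }
            }
            _ => {}
        }
    }

    if depth != 0 {
        return Err("unexpected end of input: missing `)`".to_string());
    }
    if let Some(s) = tok_start {
        return Err(format!("expected `(` at byte {s}"));
    }
    Ok(())
}
```

Since `handle` sees only one command at a time, nothing for the whole program is ever held in memory beyond the source text itself.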

Alex-Fischman · 2024-10-25T05:59:40Z

I agree that parsing to S-expressions first would almost certainly improve error reporting somewhat, although I'm not sure how much of a win it would actually be; error reporting is currently pretty simple.

I'm not motivated enough to write two parsers for the same language in one week, but if someone does want to take a shot at it you can use my toy parser from a year ago as a reference/baseline/starting point. It parses a subset of egglog but the basic ideas are there. (Slice is just Span.)

I do think that merging this PR makes that job slightly easier, since this PR consolidates all of the parsing in one place and does some related cleanup (like removing Quote).

Alex-Fischman · 2024-10-25T06:00:49Z

An additional, related thought from Zach to put on people's radar: an egglog fuzzer would be nice, but it's the sort of thing that stays perpetually low priority until it gets done.

Alex-Fischman changed the title ~~Fast parser~~ Reimplement parser Oct 21, 2024

Alex-Fischman marked this pull request as ready for review October 22, 2024 06:50

Alex-Fischman requested a review from a team as a code owner October 22, 2024 06:50

Alex-Fischman requested review from saulshanabrook and removed request for a team October 22, 2024 06:50

saulshanabrook requested review from yihozhang and removed request for saulshanabrook October 22, 2024 15:18

Alex-Fischman force-pushed the fast-parser branch from 3a71e3d to 03aa96c Compare October 23, 2024 02:05

oflatt requested a review from ezrosent October 24, 2024 16:41

Alex-Fischman added 17 commits October 24, 2024 13:57

Implement broken parser

19d6379

Cleanup, still broken (spans are 0..0?)

140bec3

Improve errors, still broken

dede1e0

Fix bug in text()

ba49128

Undo ident_after_parens

8e3549b

Fix choice error returns

faf1624

Make Parser return Span

cf5519e

Clean up, making progress on failing tests

a621683

Make progress by reordering choices

7971345

Fix (run) being greedy

9ffb7e3

Passing all tests

fcce178

Optimize command() with ident_after_paren()

9455886

Also optimize schedule() and non_let_action()

4acf789

cargo fmt

e02bad4

Improve error messages with repeat_until()

e602088

Distinguish EndOfFile errors, fix Unicode get_location() bug

5b649af

Don't print negative ranges

297139b

Remove unused line

4d70999

Alex-Fischman force-pushed the fast-parser branch from 03aa96c to 4d70999 Compare October 24, 2024 21:00

Alex-Fischman removed the request for review from yihozhang October 24, 2024 21:05
