Default to re2 parser if available #184
Merged
The benchmark results are in, at least for the current sample file:
First, re2 is ridiculously faster than the basic parser, even with heavy caching. re2 does benefit from caching, but it is so fast that the cache only has a real impact at very high hit rates, which means a very large cache. At low hit rates (small cache sizes) the cache visibly slows re2 parsing down, which is not the case for the basic parser.
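To make the hit-rate argument concrete, here is a rough cost model, not something measured in the benchmark. It assumes every request pays a cache lookup, and every miss additionally pays a parse and an insert. The function name and all the cost figures are made up for illustration.

```python
def break_even_hit_rate(lookup_cost: float, insert_cost: float, parse_cost: float) -> float:
    """Hit rate above which caching beats always parsing.

    Expected cost per request with a cache at hit rate h:
        lookup + (1 - h) * (parse + insert)
    Setting that equal to `parse` and solving for h gives the ratio below.
    """
    return (lookup_cost + insert_cost) / (parse_cost + insert_cost)


# Hypothetical costs in arbitrary units, NOT benchmark numbers.
slow_parser = break_even_hit_rate(lookup_cost=1, insert_cost=1, parse_cost=100)
fast_parser = break_even_hit_rate(lookup_cost=1, insert_cost=1, parse_cost=3)

print(f"slow parser breaks even at {slow_parser:.1%} hits")
print(f"fast parser breaks even at {fast_parser:.1%} hits")
```

Under this model, the cheaper the parse is relative to the cache bookkeeping, the higher the hit rate has to be before the cache stops being a net loss. That matches the observation that a small cache slows re2 down but not the basic parser.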
Second, LRU is confirmed to be a better cache replacement policy than clearing (which... duh). The difference is not very noticeable at very low sizes, but at 100 entries it really starts pulling ahead, so it is definitely the better default at 200, where even with the overhead of the more layered approach it is ahead of the legacy parser and its fixed 20-entry clearing cache.
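For readers unfamiliar with the two policies, here is a minimal sketch of each. The class names, the `get`/`put` interface, and the toy workload are assumptions for illustration and are not taken from this PR's code.

```python
from collections import OrderedDict


class ClearingCache:
    """Drops every entry once the cache is full (hypothetical sketch)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.maxsize:
            self.data.clear()  # full: throw everything away
        self.data[key] = value


class LRUCache:
    """Evicts only the least recently used entry (hypothetical sketch)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)  # drop the oldest entry


def hit_rate(cache, keys):
    """Replay `keys` through `cache`; str.upper stands in for a real parse."""
    hits = 0
    for key in keys:
        if cache.get(key) is not None:
            hits += 1
        else:
            cache.put(key, key.upper())
    return hits / len(keys)


workload = ["a", "b", "a", "c", "a", "b"]
print("clearing:", hit_rate(ClearingCache(2), workload))
print("lru:     ", hit_rate(LRUCache(2), workload))
```

On this toy sequence the LRU keeps the hot key `a` across the eviction and gets two hits, while the clearing cache loses it and gets one.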
The locking doesn't seem to have much impact without contention, and even under contention the LRU still seems to behave far better than the clearing cache. So fall back to a locked LRU if re2 is not available.
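The fallback decision could be sketched roughly as follows. This is not the PR's actual code: `Locking`, `DictCache`, `pick_default`, and the returned labels are hypothetical, and the sketch assumes the re2 bindings are importable under the module name `re2`.

```python
import importlib.util
import threading


class DictCache:
    """Tiny stand-in cache so the sketch is self-contained (hypothetical)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


class Locking:
    """Serialises access to a cache that is not thread-safe on its own."""

    def __init__(self, cache):
        self.cache = cache
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            return self.cache.get(key)

    def put(self, key, value):
        with self.lock:
            self.cache.put(key, value)


def pick_default(re2_available=None):
    """Return a label for the default parser setup.

    Assumption: re2 availability is probed by looking for a module named
    `re2`, without importing it.
    """
    if re2_available is None:
        re2_available = importlib.util.find_spec("re2") is not None
    return "re2" if re2_available else "basic+locked-lru"


cache = Locking(DictCache())
cache.put("ua-string", "parsed")
print(pick_default(), cache.get("ua-string"))
```

An uncontended `threading.Lock` is cheap to acquire, which is consistent with the locking having little measurable impact without contention.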