Build status | Documentation |
---|---|
A simple straightforward implementation of PikaParser in pure Julia, following the specification by Luke A. D. Hutchison (see https://github.com/lukehutch/pikaparser).
Pika parsers are pretty fast, they are easy to specify, carry the ability to unambiguously match all PEG grammars including the left-recursive ones, and provide great mechanisms for parsing error recovery.
import PikaParser as P
All grammar clauses are subtype of a Clause
. The types are indexed by the
labels for your grammar rules -- Julia symbols are a natural choice, but you
are free to use integers, strings, or anything else.
rules = Dict(
# match a sequence of characters that satisfies `isdigit`
:digits => P.some(:digit => P.satisfy(isdigit)),
# expression in parentheses
:parens => P.seq(
P.token('('),
# you can name the rules in nested contexts
:expr => P.first(:plusexpr, :minusexpr, :digits, :parens),
P.token(')'),
),
# some random operators
:plusexpr => P.seq(:expr, P.token('+'), :expr),
:minusexpr => P.seq(:expr, P.token('-'), :expr),
)
g = P.make_grammar(
[:expr], # the top-level rule
P.flatten(rules, Char), # process the rules into a single level and specialize them for crunching Chars
)
The grammar is now prepared for parsing.
Parsing is executed simply by running your grammar on any indexable input using
parse
.
(Notably, PikaParsers require frequent indexing of inputs, and incremental
parsing of streams is thus complicated. To improve the performance, it is also
advisable to lex your input into a vector of more complex tokens, using e.g.
parse_lex
.)
input = "12-(34+567-8)"
p = P.parse(g, input)
You can find if an expression was matched at a certain position:
P.find_match_at!(p, :expr, 1)
...which returns an index in the match table (if found), such as 45
.
You can have a look at the match: p.matches[45]
should return:
PikaParser.Match(10, 1, 13, 2, 52, 0, 41, 0)
where 10
is the renumbered rule ID for :expr
, 1
is the starting position
of the match in the input, 13
is the last position of the match (here, that
means the whole input); 2
is the option index (in this case, it points to
:expr
option 2, which is :minusexpr
). The rest of the Match
structure is
used for internal values that organize the match tree and submatches.
You can use traverse_match
to recursively walk the parse trees, to produce
ASTs, and translate, interpret or evaluate the expressions:
P.traverse_match(p, P.find_match_at!(p, :expr, 1))
By default, this runs through the whole match tree and transcodes the matches
to Julia Expr
AST. In this case, if you pipe the output through
JuliaFormatter, you will get something like:
expr(
minusexpr(
expr(digits(digit("1"), digit("2"))),
var"minusexpr-2"("-"),
expr(
parens(
var"parens-1"("("),
expr(
plusexpr(
expr(digits(digit("3"), digit("4"))),
var"plusexpr-2"("+"),
expr(
minusexpr(
expr(digits(digit("5"), digit("6"), digit("7"))),
var"minusexpr-2"("-"),
expr(digits(digit("8"))),
),
),
),
),
var"parens-3"(")"),
),
),
),
)
It is straightforward to specify your own method of evaluating the parses by supplying the matchtree opening and folding functions. For example, you can evaluate the expression as follows:
P.traverse_match(p, P.find_match_at!(p, :expr, 1),
fold = (m, p, subvals) ->
m.rule == :digits ? parse(Int, m.view) :
m.rule == :expr ? subvals[1] :
m.rule == :parens ? subvals[2] :
m.rule == :plusexpr ? subvals[1] + subvals[3] :
m.rule == :minusexpr ? subvals[1] - subvals[3] : nothing,
)
You should get the expectable result (-581
).
PikaParser.jl
was developed at the Luxembourg Centre for Systems
Biomedicine of the University of Luxembourg
(uni.lu/lcsb).
The development was supported by European Union's Horizon 2020 Programme under
PerMedCoE project (permedcoe.eu),
agreement no. 951773.