Esrap is maintained courtesy of Steel Bank Studio Ltd by Nikodemus Siivola.
Esrap is maintained in Git:
git clone git://github.com/nikodemus/esrap.git
will get you a local copy.
http://github.com/nikodemus/esrap
is the GitHub project page.
Esrap is licenced under an MIT-style licence.
For more on packrat parsing, see http://pdos.csail.mit.edu/~baford/packrat/thesis/ for Bryan Ford's 2002 thesis: “Packrat Parsing: a Practical Linear Time Algorithm with Backtracking”.
Parsing proceeds by matching text against parsing expressions. Matching has three components: success vs failure, consumption of input, and associated production.
Parsing expressions that fail never consume input. Parsing expressions that succeed may or may not consume input.
A parsing expression can be:
A terminal is a character or a string of length one, which succeeds and consumes a single character if that character matches the terminal.
Additionally, Esrap supports some pseudoterminals.
character
Succeeds unless at the end of input, consuming and producing a single character.
(character-ranges range ...)
Matches a single character from the given range(s), consuming and producing that character. A range can be either a list of the form (#\start_char #\stop_char) or a single character.
"foo"
Succeeds and consumes input as if (and #\f #\o #\o). Produces the consumed string.
(string length)
Can be used to specify sequences of arbitrary characters: (string 2) succeeds and consumes input as if (and character character). Produces the consumed string.
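To make the pseudoterminal semantics concrete, here is a toy model in portable Common Lisp. It is not Esrap's implementation: toy-character-ranges and toy-string are hypothetical names, and each returns three values (a success flag, the end position, and the production), with a failed match leaving the position unchanged. Calling toy-character-ranges with the list ((#\a #\z) #\_) mirrors the expression (character-ranges (#\a #\z) #\_).

```lisp
;;; Toy model of two pseudoterminals (not Esrap's implementation).
;;; Each matcher returns (values SUCCESS-P END PRODUCTION); on failure
;;; END is the starting position, so nothing is consumed.

(defun toy-in-range-p (ch range)
  "True if CH is in RANGE, which is a list (START STOP) or a character."
  (if (consp range)
      (char<= (first range) ch (second range))
      (char= range ch)))

(defun toy-character-ranges (ranges text pos)
  "Model of (character-ranges . RANGES): match one character at POS."
  (if (and (< pos (length text))
           (some (lambda (range) (toy-in-range-p (char text pos) range))
                 ranges))
      (values t (1+ pos) (char text pos))
      (values nil pos nil)))

(defun toy-string (n text pos)
  "Model of (string N): consume N arbitrary characters at POS."
  (if (<= (+ pos n) (length text))
      (values t (+ pos n) (subseq text pos (+ pos n)))
      (values nil pos nil)))
```

Note that a range list is inclusive at both ends, so ((#\a #\z)) accepts #\a and #\z themselves.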
Nonterminals are specified using symbols. A nonterminal symbol succeeds if the parsing expression associated with it succeeds, and consumes whatever input that expression consumes.
The production of a nonterminal depends on the associated expression and an optional transformation rule.
Nonterminals are defined using defrule.
Note: Currently all rules share the same namespace, so you should not use symbols in the COMMON-LISP package or other shared packages to name your rules unless you are certain there are no other Esrap-using components in your Lisp image. In a future version of Esrap, grammar objects will be introduced to allow multiple definitions of nonterminals. Symbols in the COMMON-LISP package are specifically reserved for use by Esrap.
(and subexpression ...)
A sequence succeeds if all subexpressions succeed, and consumes all input consumed by the subexpressions. A sequence produces the productions of its subexpressions as a list.
(or subexpression ...)
An ordered choice succeeds if any of the subexpressions succeeds, and consumes all the input consumed by the successful subexpression. An ordered choice produces whatever the successful subexpression produces.
Subexpressions are checked strictly in the specified order, and once a subexpression succeeds no further ones will be tried.
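Sequences and ordered choice can be sketched with a few toy combinators in plain Common Lisp. The names toy-char, toy-and and toy-or are hypothetical; this models the semantics described above and is not Esrap's code. Each parser is a function of the text and a position that returns a success flag, an end position and a production.

```lisp
;;; Toy PEG combinators; the names are hypothetical, not Esrap's API.
;;; A parser is a function of (TEXT POS) returning
;;; (values SUCCESS-P END PRODUCTION); failure leaves END = POS.

(defun toy-char (ch)
  "Terminal: match the single character CH."
  (lambda (text pos)
    (if (and (< pos (length text)) (char= (char text pos) ch))
        (values t (1+ pos) ch)
        (values nil pos nil))))

(defun toy-and (&rest parsers)
  "Sequence: every parser must succeed in turn; produces a list."
  (lambda (text pos)
    (let ((start pos)
          (productions '()))
      (dolist (parser parsers (values t pos (nreverse productions)))
        (multiple-value-bind (ok end production) (funcall parser text pos)
          (unless ok
            ;; One failure fails the whole sequence, consuming nothing.
            (return (values nil start nil)))
          (push production productions)
          (setf pos end))))))

(defun toy-or (&rest parsers)
  "Ordered choice: try PARSERS strictly in order; the first success wins."
  (lambda (text pos)
    (dolist (parser parsers (values nil pos nil))
      (multiple-value-bind (ok end production) (funcall parser text pos)
        (when ok
          (return (values t end production)))))))
```

The last test case shows the effect of strict ordering: in (and (or #\a "ab") #\c) applied to "abc", the first alternative #\a wins, #\c is then tried against #\b, and the whole sequence fails; the second alternative is never revisited.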
(not subexpression)
A negation succeeds if the subexpression fails, and consumes one character of input. A negation produces the character it consumes.
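A toy model of (not subexpression), using the same hypothetical conventions as the other sketches (toy-char and toy-not are not Esrap functions). It treats the end of input as a failure, since there is no character left to consume; that detail is an assumption of the sketch. A typical use is (not #\"), which matches any character except a double quote.

```lisp
;;; Toy model of negation; hypothetical names, not Esrap's code.

(defun toy-char (ch)
  "Terminal: match the single character CH."
  (lambda (text pos)
    (if (and (< pos (length text)) (char= (char text pos) ch))
        (values t (1+ pos) ch)
        (values nil pos nil))))

(defun toy-not (parser)
  "Negation: succeed if PARSER fails at POS, consuming and producing
the character at POS."
  (lambda (text pos)
    (if (and (< pos (length text))        ; assumption: end of input fails
             (not (funcall parser text pos)))
        (values t (1+ pos) (char text pos))
        (values nil pos nil))))
```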
(* subexpression)
A greedy repetition always succeeds, consuming all input consumed by applying subexpression repeatedly as long as it succeeds.
A greedy repetition produces the productions of the subexpression as a list.
(+ subexpression)
A greedy positive repetition succeeds if subexpression succeeds at least once, and consumes all input consumed by applying subexpression repeatedly as long as it succeeds. A greedy positive repetition produces the productions of the subexpression as a list.
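Greedy repetition can be modelled the same way. toy-star and toy-plus are hypothetical names, and the guard that stops after a success that consumed no input is a choice made for this sketch so that the loop terminates; it is not documented Esrap behaviour.

```lisp
;;; Toy model of (* sub) and (+ sub); hypothetical names.

(defun toy-char (ch)
  "Terminal: match the single character CH."
  (lambda (text pos)
    (if (and (< pos (length text)) (char= (char text pos) ch))
        (values t (1+ pos) ch)
        (values nil pos nil))))

(defun toy-star (parser)
  "Greedy repetition: always succeeds; produces a list of productions."
  (lambda (text pos)
    (let ((productions '()))
      (loop
        (multiple-value-bind (ok end production) (funcall parser text pos)
          (unless ok
            (return (values t pos (nreverse productions))))
          (push production productions)
          ;; Guard added for this sketch: stop after a success that
          ;; consumed nothing, which would otherwise repeat forever.
          (when (= end pos)
            (return (values t pos (nreverse productions))))
          (setf pos end))))))

(defun toy-plus (parser)
  "Greedy positive repetition: like TOY-STAR, but needs one match."
  (lambda (text pos)
    (multiple-value-bind (ok end productions)
        (funcall (toy-star parser) text pos)
      (declare (ignore ok))               ; TOY-STAR always succeeds
      (if productions
          (values t end productions)
          (values nil pos nil)))))
```

Because repetition is greedy and never gives input back, a sequence such as (and (* #\a) #\a) can never succeed: the repetition consumes every a, leaving none for the final terminal.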
(? subexpression)
Optionals always succeed, and consume whatever input the subexpression consumes. An optional produces whatever the subexpression produces, or nil if the subexpression does not succeed.
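A toy model of the optional operator, under the same hypothetical conventions as the other sketches (toy-char and toy-optional are not Esrap functions).

```lisp
;;; Toy model of (? sub); hypothetical names.

(defun toy-char (ch)
  "Terminal: match the single character CH."
  (lambda (text pos)
    (if (and (< pos (length text)) (char= (char text pos) ch))
        (values t (1+ pos) ch)
        (values nil pos nil))))

(defun toy-optional (parser)
  "Optional: always succeeds; produces NIL and consumes nothing when
PARSER fails."
  (lambda (text pos)
    (multiple-value-bind (ok end production) (funcall parser text pos)
      (if ok
          (values t end production)
          (values t pos nil)))))
```

Since failure also produces nil, the production alone cannot distinguish a failed subexpression from one that succeeded and itself produced nil.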
(& subexpression)
A followed-by predicate succeeds if the subexpression succeeds, and consumes no input. A followed-by predicate produces whatever the subexpression produces.
(! subexpression)
A not-followed-by predicate succeeds if the subexpression does not succeed, and consumes no input. A not-followed-by predicate produces nil.
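The two lookahead predicates only inspect the input. In this toy model (toy-followed-by and toy-not-followed-by are hypothetical names, not Esrap functions) both return the starting position unchanged, whatever the subexpression consumed.

```lisp
;;; Toy model of (& sub) and (! sub); hypothetical names.

(defun toy-char (ch)
  "Terminal: match the single character CH."
  (lambda (text pos)
    (if (and (< pos (length text)) (char= (char text pos) ch))
        (values t (1+ pos) ch)
        (values nil pos nil))))

(defun toy-followed-by (parser)
  "(& SUB): succeed if PARSER succeeds, but consume nothing."
  (lambda (text pos)
    (multiple-value-bind (ok end production) (funcall parser text pos)
      (declare (ignore end))              ; input is never consumed
      (if ok
          (values t pos production)
          (values nil pos nil)))))

(defun toy-not-followed-by (parser)
  "(! SUB): succeed if PARSER fails; consume nothing, produce NIL."
  (lambda (text pos)
    (if (funcall parser text pos)
        (values nil pos nil)
        (values t pos nil))))
```

Unlike the negation (not subexpression), (! subexpression) consumes nothing, so (! character) succeeds exactly at the end of input.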
(predicate-name subexpression)
The predicate-name is a symbol naming a global function. A semantic predicate succeeds if subexpression succeeds and the named function returns true for the production of the subexpression. A semantic predicate produces whatever the subexpression produces.
Note: semantic predicates may change in the future to produce whatever the predicate function returns.
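A toy model of a semantic predicate. toy-any-character stands in for the character pseudoterminal and toy-predicate for (predicate-name subexpression); both names are hypothetical. The combination exercised below corresponds to the Esrap expression (digit-char-p character).

```lisp
;;; Toy model of (predicate-name sub); hypothetical names.

(defun toy-any-character ()
  "Pseudoterminal CHARACTER: consume and produce one character."
  (lambda (text pos)
    (if (< pos (length text))
        (values t (1+ pos) (char text pos))
        (values nil pos nil))))

(defun toy-predicate (predicate parser)
  "Succeed if PARSER succeeds and PREDICATE returns true for its
production; the production itself is passed through unchanged."
  (lambda (text pos)
    (multiple-value-bind (ok end production) (funcall parser text pos)
      (if (and ok (funcall predicate production))
          (values t end production)
          (values nil pos nil)))))
```

digit-char-p returns the digit's weight (7 for #\7), yet the production stays the character #\7, matching the current rule that a semantic predicate produces whatever the subexpression produces.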
Define symbol as a nonterminal, using expression as the associated parsing expression.
The following options can be specified:
(:when test)
The rule is active only when test evaluates to true. This can be used to specify optional extensions to a grammar.
(:constant constant)
No matter what input is consumed or what expression produces, the production of the rule is always constant.
(:function function)
If provided, the production of the expression is transformed using function. function can be a function name or a lambda-expression.
(:identity boolean)
If true, the production of expression is used as-is, as if (:function identity) had been specified. If no production option is specified, this is the default.
(:text boolean)
If true, the production of expression is flattened and concatenated into a string, as if (:function text) had been specified.
(:lambda lambda-list &body body)
If provided, same as using the corresponding lambda-expression with :function.
As an extension of the standard lambda list syntax, lambda-list accepts the optional pseudo lambda-list keyword esrap:&bounds, which (1) must appear after all standard lambda list keywords, and (2) can be followed by one or two variables to which the bounding indexes of the matching substring are bound.
Therefore:
lambda-list ::= (standard-lambda-list-elements [&bounds start [end]])
(:destructure destructuring-lambda-list &body body)
If provided, same as using, with :function, a lambda-expression that destructures its argument using destructuring-bind and the provided destructuring-lambda-list.
destructuring-lambda-list can use esrap:&bounds in the same way as described for :lambda.
(:around ([&bounds start [end]]) &body body)
If provided, execute body around the construction of the production of the rule. body has to call esrap:call-transform to trigger the computation of the production. Any transformation provided via :lambda, :function or :destructure is executed inside the call to esrap:call-transform. As a result, modifications to the dynamic state are visible within the transform.
esrap:&bounds can be used in the same way as described for :lambda and :destructure.
This option can be used to safely track nesting depth, manage symbol tables or for other stack-like operations.
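The point of :around is that the transform runs in the dynamic extent of its body. The following plain Common Lisp sketch models that with a thunk standing in for esrap:call-transform; toy-run-rule, depth-tracking-around, tag-with-depth and *depth* are hypothetical names, and the sketch does not use Esrap itself.

```lisp
;;; Toy model of :around; all names here are hypothetical.

(defvar *depth* 0
  "Nesting depth, dynamically rebound by the around function.")

(defun toy-run-rule (around transform production)
  "Call AROUND with a thunk standing in for ESRAP:CALL-TRANSFORM.
TRANSFORM only runs when AROUND calls that thunk."
  (funcall around (lambda () (funcall transform production))))

(defun depth-tracking-around (call-transform)
  "Rebind *DEPTH* one level deeper, then compute the production."
  (let ((*depth* (1+ *depth*)))
    (funcall call-transform)))

(defun tag-with-depth (production)
  "A transform that records the depth at which it ran."
  (list *depth* production))
```

The nested test shows why this is safe for depth tracking: each rebinding of *depth* is undone when the around function returns, so the global value is 0 again afterwards.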
Parses text using expression from start to end. Incomplete parses are allowed only if junk-allowed is true.
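A toy model of the start, end and junk-allowed behaviour described above. The condition class toy-parse-failure imitates the shape of esrap-error (a parse-error carrying the text and a position), but it, toy-parse and toy-digits are all hypothetical, and the reported position is simply where the toy parser stopped.

```lisp
;;; Toy model of whole-input vs. junk-allowed parsing; hypothetical names.

(define-condition toy-parse-failure (parse-error)
  ((text :initarg :text :reader toy-parse-failure-text)
   (position :initarg :position :reader toy-parse-failure-position))
  (:report (lambda (c stream)
             (format stream "Parse failed at position ~D in ~S"
                     (toy-parse-failure-position c)
                     (toy-parse-failure-text c)))))

(defun toy-digits (text pos)
  "Hypothetical parser: one or more decimal digits, as a string."
  (let ((stop (or (position-if-not #'digit-char-p text :start pos)
                  (length text))))
    (if (> stop pos)
        (values t stop (subseq text pos stop))
        (values nil pos nil))))

(defun toy-parse (parser text &key (start 0) (end (length text)) junk-allowed)
  "Run PARSER on TEXT from START to END. Unless JUNK-ALLOWED is true,
the parser must consume everything up to END."
  (let ((region (subseq text 0 end)))   ; hide any input beyond END
    (multiple-value-bind (ok stop production) (funcall parser region start)
      (if (and ok (or junk-allowed (= stop end)))
          production
          (error 'toy-parse-failure :text text :position stop)))))
```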
Prints the grammar tree rooted at nonterminal symbol to stream for human inspection.
Arguments must be strings, or lists whose leaves are strings. Catenates all the strings in arguments into a single string.
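A sketch of the flatten-and-catenate behaviour in plain Common Lisp. toy-text is a hypothetical stand-in for text; accepting characters as leaves, in addition to strings, is an assumption of the sketch that makes it usable on lists of single-character productions.

```lisp
;;; Toy model of TEXT; hypothetical name, not Esrap's function.

(defun toy-text (&rest arguments)
  "Concatenate every string (or character) leaf of ARGUMENTS, which may
be nested lists, into one fresh string."
  (with-output-to-string (out)
    (labels ((walk (x)
               (etypecase x
                 (string (write-string x out))
                 (character (write-char x out))   ; assumption, see above
                 (list (mapc #'walk x)))))
      (walk arguments))))
```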
Associates rule with the nonterminal symbol. Signals an error if the rule is already associated with a nonterminal. If the symbol is already associated with a rule, the old rule is removed first.
Modifies the nonterminal symbol to use expression instead. Temporarily removes the rule while it is being modified.
Returns the rule designated by symbol, if any. Symbol must be a nonterminal symbol.
Makes the nonterminal symbol undefined. If the nonterminal is defined and already referred to by other rules, an error is signalled unless :force is true.
Returns the dependencies of the rule: the primary value is a list of defined nonterminal symbols, and the secondary value is a list of undefined nonterminal symbols.
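A toy model of what the two return values mean. It represents a grammar as a hash table from nonterminal symbols to parsing expressions, which is an assumption of this sketch (Esrap keeps its own rule objects), and it only looks at references appearing directly in the rule's expression. toy-rule-dependencies and toy-referenced-nonterminals are hypothetical names.

```lisp
;;; Toy model of rule dependencies; hypothetical names and representation.

(defun toy-referenced-nonterminals (expression)
  "Symbols used as nonterminals directly in EXPRESSION."
  (cond ((null expression) '())
        ((eq expression 'character) '())          ; pseudoterminal
        ((symbolp expression) (list expression))
        ((consp expression)
         ;; Skip the operator (AND, OR, *, a predicate name, ...) and
         ;; collect from the subexpressions only.
         (remove-duplicates
          (loop for sub in (rest expression)
                append (toy-referenced-nonterminals sub))))
        (t '())))                                 ; characters, strings, numbers

(defun toy-rule-dependencies (symbol grammar)
  "Return two values: the nonterminals referenced by SYMBOL's expression
that are defined in GRAMMAR, and those that are not."
  (let ((defined '())
        (undefined '()))
    (dolist (s (toy-referenced-nonterminals (gethash symbol grammar)))
      (if (nth-value 1 (gethash s grammar))
          (push s defined)
          (push s undefined)))
    (values (nreverse defined) (nreverse undefined))))
```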
Modify rule to use expression as the parsing expression. The rule must be detached beforehand.
Returns the nonterminal associated with the rule, or nil if the rule is not attached to any nonterminal.
Turn on tracing of nonterminal symbol. If recursive is true, turn on tracing for the whole grammar rooted at symbol. If break is true, break is entered when the rule is invoked.
Turn off tracing of nonterminal symbol. If recursive is true, untrace the whole grammar rooted at symbol. break is ignored, and is provided only for symmetry with trace-rule.
Class precedence list:
esrap-error, parse-error, error, serious-condition, condition, t
Signaled when an Esrap parse fails. Use esrap-error-text to obtain the string that was being parsed, and esrap-error-position to obtain the position at which the error occurred.
Class precedence list:
left-recursion, esrap-error, parse-error, error, serious-condition, condition, t
Signaled when left recursion is detected during Esrap parsing.
left-recursion-nonterminal names the symbol for which left recursion was detected, and left-recursion-path lists the nonterminals that make up the left-recursion cycle.