Title: LL(*) Parsing
1. LL(*) Parsing
- Terence Parr
- University of San Francisco
2. Topics
- Research goals
- Background
- Problem definition
- Solution overview
- What is LL(*)?
- How much more powerful is it?
- Limitations
- Nondeterminism detection
- LL(*) Algorithm
- Generated code
3. Research goals
- Make top-down LL-based parsers as powerful as possible
  - allows more natural grammars
  - makes language tools more accessible
- My research is constrained by what programmers can/will use
  - recursive-descent parsers must be the base
  - k>1 fixed lookahead
  - semantic predicates
  - syntactic predicates: controlled backtracking and a means of specifying ambiguity resolution
- And for my next trick: LL(*)
4. Background: parsers
- Building a parser generator is easy except for the lookahead analysis
  - rule ref -> rule()
  - token ref -> match(t)
  - rule def ->
    void rule() {
        if ( lookahead-expr-alt 1 ) { match alt 1 }
        else if ( lookahead-expr-alt 2 ) { match alt 2 }
        else error;
    }
- The nature of the lookahead expressions dictates the strength of your parser generator
5. LL(2) parser example
    void a() {
        if ( LA(1)==A && LA(2)==X ) {
            match(A); match(X); match(R);
        }
        else if ( LA(1)==A && LA(2)==Y ) {
            match(A); match(Y); match(S);
        }
        else error;
    }

    a : A X R
      | A Y S
      ;
Lookahead is the set of 2-sequences that indicate which alternative will ultimately succeed
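As a rough, self-contained illustration of the LL(2) decision above, the following Java sketch parses a : A X R | A Y S over a list of token names. The class name LL2Parser, the string-based tokens, the "EOF" marker, and the int return value of a() are assumptions made for this example; they are not the generator's actual output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written parser for the rule  a : A X R | A Y S ;
class LL2Parser {
    private final List<String> tokens; // token type names, e.g. "A", "X", "R"
    private int p = 0;                 // index of the next token to consume

    LL2Parser(List<String> tokens) { this.tokens = tokens; }

    // LA(i): type of the i-th lookahead token (1-based); "EOF" past the end
    String LA(int i) {
        int idx = p + i - 1;
        return idx < tokens.size() ? tokens.get(idx) : "EOF";
    }

    // Consume one token of the expected type or fail
    void match(String expected) {
        if (!LA(1).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + LA(1));
        }
        p++;
    }

    // Rule a: each alternative is guarded by a 2-token lookahead test.
    // Returns the number of the alternative taken (an assumption, for testing).
    int a() {
        if (LA(1).equals("A") && LA(2).equals("X")) {
            match("A"); match("X"); match("R");
            return 1;
        } else if (LA(1).equals("A") && LA(2).equals("Y")) {
            match("A"); match("Y"); match("S");
            return 2;
        }
        throw new IllegalStateException("no viable alternative at " + LA(1) + " " + LA(2));
    }

    public static void main(String[] args) {
        System.out.println(new LL2Parser(Arrays.asList("A", "X", "R")).a()); // 1
        System.out.println(new LL2Parser(Arrays.asList("A", "Y", "S")).a()); // 2
    }
}
```

Note that LA(1) is A in both tests, so only the second token actually separates the alternatives.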
6. Lookahead as DFA
    void a() {
        int alt = 0;
        if ( LA(1)==A ) {
            if ( LA(2)==X ) alt = 1;
            if ( LA(2)==Y ) alt = 2;
        }
        switch (alt) {
            case 1 : match(A); match(X); match(R); break;
            case 2 : match(A); match(Y); match(S); break;
            default : error;
        }
    }
7. Linear approximate lookahead
- Note that LA(1) doesn't help distinguish the alternatives
- Often it's the depth, not the sequence of tokens, that matters
- Reduces O(|T|^k) to O(|T| x k) space for lookahead sequences
- Collapse all tokens at depth d
- Only slightly weaker than LL(k)
    void a() {
        int alt = 0;
        if ( LA(2)==X ) alt = 1;
        if ( LA(2)==Y ) alt = 2;
        switch (alt) {
            case 1 : ...
            case 2 : ...
            default : error;
        }
    }
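To make "collapse all tokens at depth d" concrete, here is a small Java sketch that compares full LL(k) sequence sets with their per-depth approximation. The class and method names are hypothetical, and the second decision in main ({A X, B Y} versus {A Y, B X}) is an invented example chosen only to show where the approximation is weaker; it does not come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper comparing full LL(k) lookahead with linear approximate lookahead
class LinearApprox {
    static List<String> seq(String... tokens) { return Arrays.asList(tokens); }

    // Collapse k-token sequences into one token set per depth: O(|T| x k) space
    static List<Set<String>> collapse(List<List<String>> seqs, int k) {
        List<Set<String>> sets = new ArrayList<>();
        for (int d = 0; d < k; d++) sets.add(new TreeSet<>());
        for (List<String> s : seqs) {
            for (int d = 0; d < k; d++) sets.get(d).add(s.get(d));
        }
        return sets;
    }

    // Full LL(k): alternatives are distinguishable if they share no k-sequence
    static boolean fullSeparates(List<List<String>> alt1, List<List<String>> alt2) {
        return Collections.disjoint(alt1, alt2);
    }

    // Approximate: distinguishable only if the token sets at some depth are disjoint
    static boolean approxSeparates(List<List<String>> alt1, List<List<String>> alt2, int k) {
        List<Set<String>> s1 = collapse(alt1, k);
        List<Set<String>> s2 = collapse(alt2, k);
        for (int d = 0; d < k; d++) {
            if (Collections.disjoint(s1.get(d), s2.get(d))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The slide's decision: depth 2 alone ({X} vs {Y}) is enough
        List<List<String>> a1 = Arrays.asList(seq("A", "X"));
        List<List<String>> a2 = Arrays.asList(seq("A", "Y"));
        System.out.println(fullSeparates(a1, a2) + " " + approxSeparates(a1, a2, 2));

        // Invented decision: LL(2) works, but both depths collapse to overlapping sets
        List<List<String>> b1 = Arrays.asList(seq("A", "X"), seq("B", "Y"));
        List<List<String>> b2 = Arrays.asList(seq("A", "Y"), seq("B", "X"));
        System.out.println(fullSeparates(b1, b2) + " " + approxSeparates(b1, b2, 2));
    }
}
```

The first line printed is "true true" and the second is "true false": the approximation handles the slide's decision but not the invented one, which is the sense in which it is weaker than full LL(k).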
8. Problem: what can't LL(k) do?
- Can't see past arbitrarily long constructs from the left edge
- For example, can't see past the A+ here:
    a : A+ X R
      | A+ Y S
      ;
- Could left-factor, but that's not always possible and it's unnatural!
    a : A+ (X R | Y S) ;
9. Solution overview
- Natural extension to the LL(k) lookahead DFA: allow a cyclic DFA that can skip ahead past the As to X or Y
- Don't approximate the entire CFG with a regex, i.e., don't include R or S
- Just predict and proceed normally with the LL parse
- DFA yields the predicted alt number
- Grammar actions are not sucked into DFAs and aren't executed during prediction
10. LL(*) code
- Arbitrary cyclic graphs can't be encoded w/o gotos in Java, but here a simple while is ok
    void a() {
        int alt = 0;
        if ( LA(1)==A ) consume(); else error;
        while ( LA(1)==A ) consume();
        if ( LA(1)==X ) alt = 1;
        if ( LA(1)==Y ) alt = 2;
        switch (alt) {
            case 1 : ...
            case 2 : ...
            default : error;
        }
    }
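The prediction loop above can be tried out in isolation. The sketch below is an assumption-laden stand-in, not generated code: it scans a list of token names by index instead of calling consume(), so the input is left untouched for the real parse, and it returns 0 where the slide says error. The class name CyclicPredict is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical cyclic-DFA prediction for  a : A+ X R | A+ Y S ;
class CyclicPredict {
    // Returns 1 or 2 for the predicted alternative, or 0 if neither is viable.
    // Only looks far enough to see X or Y; R and S are never examined.
    static int predict(List<String> tokens) {
        int i = 0;
        // At least one A is required before the cycle
        if (i >= tokens.size() || !tokens.get(i).equals("A")) return 0;
        // The cycle: skip over arbitrarily many As
        while (i < tokens.size() && tokens.get(i).equals("A")) i++;
        if (i >= tokens.size()) return 0;
        if (tokens.get(i).equals("X")) return 1;
        if (tokens.get(i).equals("Y")) return 2;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(predict(Arrays.asList("A", "A", "A", "X", "R"))); // 1
        System.out.println(predict(Arrays.asList("A", "Y", "S")));           // 2
    }
}
```

Because the amount of lookahead depends on the input rather than on a fixed k, the same loop handles one A or a thousand.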
11. Isn't that just backtracking?
- No. For example, if I can guarantee you will never look ahead more than 10 symbols, it's just LL(10), right?
- Not backtracking with the parser! The DFA is smaller and faster, e.g., a DFA predicting expr does not follow the deep call chain the parser does
- Don't have to avoid or unroll actions in the grammar!
- The DFAs are efficiently coded and automatically throttle down when less lookahead is needed
12. Do we need LL(*) in practice?
- Natural grammars are sometimes not LL(k), e.g., C function decl vs def
- From the left edge, lookahead is not fixed to see the ';' vs '{'. We need arbitrary lookahead because of the arg*
- If you have actions at ID, can't easily refactor
- Lookahead will be 5 < k < 10 usually for this decision
    func : type ID '(' arg* ')' ';'
         | type ID '(' arg* ')' body
         ;
13. Can we classify LL(*) strength?
- Obviously stronger than LL(k) for fixed k
- Weaker than syntactic predicates + LL(k), but it's automatic and faster
- ANTLR v3 will have LL(*) + syntactic predicates
- What about LL(k)'s traditional foe LR(k) and its nefarious minion LALR(1) (yacc)?
- No strict ordering! (see next slide)
- Weaker than GLR or any other system that handles all context-free grammars
14. LL(*) vs LR(k)
- LR(k), even with k=1, is generally more powerful than LL(*), or at least more efficient for the same grammar, but there is no strict ordering: add epsilon rule refs to the left edge of our grammar and it's not LR(k) for any fixed k (such epsilon rules are derived from adding actions)
    a : b A+ X R
      | c A+ Y S
      ;
    b : ;
    c : ;
LL(*) but not LR(k) due to reduce-reduce conflict
15. LL(*) Strength Limitations
- Limited to regular approximation
- Creating a regular covering approximation to the lookahead language of a context-free grammar fragment
- Can't distinguish between context-free fragments
- Can't see past recursive structures
- Still deterministic: can't deal with ambiguous grammars; must pick one interpretation
16. Can't see past recursion
- LL(*) DFA construction takes the LL stack into consideration, but the resulting DFA will not have a stack; it uses sequence instead
- Example weakness (same language, different grammar)
    // works
    a : b X | b Y ;
    b : A+ ;

    // doesn't work
    a : b X | b Y ;
    b : A | A b ; // tail recursion

    t.g:2:5: Alternative 1: after matching input such as A A A A decision cannot predict what comes next due to recursion overflow to b from b
    t.g:2:5: Alternative 2
17. LL(*) Static Analysis Problems
- Sometimes LL(*) creates a giant DFA looking for more lookahead to distinguish alternatives
  - most often due to true ambiguity
  - won't ever succeed, but it keeps trying
  - w/o throttle would be hideous in time/space
- Workarounds
  - can manually set fixed k lookahead
  - refactor grammar if ambiguous or to reduce lookahead requirements
- Algorithm O() constant is critical: got java.g processing to drop from 20 minutes to 10s
18. LL(*) Analysis Benefits
- LL(*) analysis and the resulting prediction DFAs are paradoxically simpler sometimes
- LL(k) must compute all possible sequences with fixed k length using an acyclic DFA
- LL(3) lookahead of (A|B)* is AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
- LL(*) lookahead of (A|B)* is simply a small cyclic DFA (diagram not preserved)
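The growth of fixed-k lookahead can be reproduced with a few lines of Java. This is only an illustration of the counting argument, under the simplifying assumption that every k-sequence over the vocabulary {A, B} is possible; the class name FixedKSequences is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of all k-token lookahead sequences over a vocabulary
class FixedKSequences {
    static List<String> sequences(String[] vocab, int k) {
        List<String> current = new ArrayList<>();
        current.add(""); // start from the empty sequence
        for (int depth = 0; depth < k; depth++) {
            List<String> next = new ArrayList<>();
            for (String prefix : current) {
                for (String t : vocab) next.add(prefix + t); // extend by one token
            }
            current = next;
        }
        return current; // |vocab|^k entries
    }

    public static void main(String[] args) {
        // Prints [AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB]
        System.out.println(sequences(new String[] {"A", "B"}, 3));
    }
}
```

With two tokens and k=3 there are 8 sequences; each extra token of depth doubles the list, whereas a cyclic DFA for the same loop does not grow with depth.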
19. LL(*) Algorithm Outline
- build RTN-like NFA from grammar (similar to LR machine construction actually)
- modified classical NFA-to-DFA conversion (subset construction algorithm)
- DFA state encodes the configurations the NFA could be in after having seen the input sequence, including the call invocation stack
- NFA configuration (s|alt|context) tracks state, predicted alt, and the rule invocation stack to get to that state
- terminate algorithm when a state uniquely predicts an alternative or a nondeterminism is found: (s|i|ctx) and (s|j|ctx) for the same state s but different alts i,j and same/similar context
- verify DFA is reduced and all alternatives have a predict state
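The two termination tests in the outline can be sketched directly from the (s|alt|context) description. Everything named here is an assumption for illustration: NfaConfig and DfaStateAnalysis are hypothetical classes, the invocation stack is encoded as a plain string, alternatives are numbered from 1, and "same/similar context" is simplified to exact equality.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical NFA configuration: (state | predicted alt | invocation context)
class NfaConfig {
    final int state;
    final int alt;
    final String context; // rule invocation stack, encoded as a string here

    NfaConfig(int state, int alt, String context) {
        this.state = state;
        this.alt = alt;
        this.context = context;
    }
}

// Hypothetical checks applied to the configurations inside one DFA state
class DfaStateAnalysis {
    // Returns the alternative if every configuration predicts the same one, else 0
    static int uniqueAlt(List<NfaConfig> configs) {
        int alt = 0;
        for (NfaConfig c : configs) {
            if (alt == 0) alt = c.alt;
            else if (alt != c.alt) return 0;
        }
        return alt;
    }

    // True if two configurations share NFA state and context but predict different alts
    static boolean hasNondeterminism(List<NfaConfig> configs) {
        Map<String, Integer> seen = new HashMap<>();
        for (NfaConfig c : configs) {
            String key = c.state + "|" + c.context;
            Integer previousAlt = seen.putIfAbsent(key, c.alt);
            if (previousAlt != null && previousAlt != c.alt) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<NfaConfig> d = Arrays.asList(new NfaConfig(5, 1, "[]"), new NfaConfig(5, 2, "[]"));
        System.out.println(uniqueAlt(d) + " " + hasNondeterminism(d)); // 0 true
    }
}
```

A DFA state for which uniqueAlt is nonzero becomes a predict state; one for which hasNondeterminism is true stops the search because no further lookahead can separate those two configurations.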
20. Example difference from classical conversion
    a : A X R | A Y S ;
    a : (A|A) B ;
[Figure: for each grammar, the classical DFA alongside the LL(*) DFA]
Stops at nondeterminism or unique prediction
21. Generated Code
- acyclic DFA generated inline as above
- cyclic DFA dumped as state objects and walked at parse-time with
    int predict(IntStream input, State start)

    class DFA3 extends DFA {
        DFA.State s2 = new DFA.State() {{ alt = 1; }};
        DFA.State s1 = new DFA.State() {
            public DFA.State transition(IntStream input) {
                switch ( input.LA(1) ) {
                    case X : return s2;
                    ...
                }
            }
        };
        ...
    }
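For readers who want to run something like this, here is a minimal stand-alone sketch of state objects walked by a predict loop, built for a : A+ X R | A+ Y S. The classes SketchInput, SketchState, and SketchDFA are hypothetical stand-ins written for this example; they are not the ANTLR runtime's DFA, DFA.State, or IntStream, and the mark/rewind handling is an assumption about how prediction avoids consuming input.

```java
// Hypothetical token stream: integer token types with mark/rewind
class SketchInput {
    private final int[] types;
    private int p = 0;
    SketchInput(int... types) { this.types = types; }
    int LA(int i) { int idx = p + i - 1; return idx < types.length ? types[idx] : -1; }
    void consume() { p++; }
    int mark() { return p; }
    void rewind(int marker) { p = marker; }
}

// Hypothetical DFA state: alt != 0 marks an accept state predicting that alternative
class SketchState {
    int alt = 0;
    SketchState transition(SketchInput input) { return null; } // no edge by default
}

class SketchDFA {
    static final int A = 1, X = 2, Y = 3;

    final SketchState s2 = new SketchState() {{ alt = 1; }};
    final SketchState s3 = new SketchState() {{ alt = 2; }};
    final SketchState s1 = new SketchState() {
        @Override SketchState transition(SketchInput input) {
            switch (input.LA(1)) {
                case A: return this; // the cycle: stay here while we see As
                case X: return s2;
                case Y: return s3;
                default: return null;
            }
        }
    };
    final SketchState s0 = new SketchState() {
        @Override SketchState transition(SketchInput input) {
            return input.LA(1) == A ? s1 : null; // need at least one A
        }
    };

    // Walk the states from start; return the predicted alt, or 0 if no alt is viable.
    // The input position is restored before returning.
    static int predict(SketchInput input, SketchState start) {
        int marker = input.mark();
        try {
            SketchState s = start;
            while (s.alt == 0) {
                SketchState next = s.transition(input);
                if (next == null) return 0;
                input.consume();
                s = next;
            }
            return s.alt;
        } finally {
            input.rewind(marker);
        }
    }

    public static void main(String[] args) {
        SketchDFA dfa = new SketchDFA();
        System.out.println(predict(new SketchInput(A, A, A, X), dfa.s0)); // 1
        System.out.println(predict(new SketchInput(A, Y), dfa.s0));       // 2
    }
}
```

Representing states as objects sidesteps the missing goto: the cycle is just a state returning itself from transition().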
22. Summary and Conclusions
- LL(*) + syntactic predicates is the most powerful parsing strategy accessible and attractive to the average programmer
- LL(*) has all the benefits of LL but is much stronger; results in natural grammars
- Doesn't alter the recursive-descent parser itself at all; just enhances the predictive capabilities
- Basic algorithm is not that complicated, but making it real and useful is interesting; it has taken 2.5 years to fully understand
- Pre-release: http://www.antlr.org/download/