LL Parsing - PowerPoint PPT Presentation

Provided by: terenc2

Learn more at: https://www.antlr.org

Transcript and Presenter's Notes

1
LL(*) Parsing

Terence Parr
University of San Francisco

2
Topics

Research goals
Background
Problem definition
Solution overview
What is LL(*)?
How much more powerful is it?
Limitations
Nondeterminism detection
LL(*) algorithm
Generated code

3
Research goals

Make top-down LL-based parsers as powerful as possible:
allows more natural grammars
makes language tools more accessible
My research is constrained by what programmers can/will use:
recursive-descent parsers must be the base
k>1 fixed lookahead
semantic predicates
syntactic predicates: controlled backtracking and a means of specifying ambiguity resolution
And for my next trick: LL(*)

4
Background: parsers

Building a parser generator is easy except for the lookahead analysis:
rule ref → rule()
token ref → match(t)
rule def →
    void rule() {
        if ( lookahead-expr-alt 1 ) match alt 1;
        else if ( lookahead-expr-alt 2 ) match alt 2;
        else error;
    }
The nature of the lookahead expressions dictates the strength of your parser generator

5
LL(2) parser example
    void a() {
        if ( LA(1)==A && LA(2)==X ) {
            match(A); match(X); match(R);
        }
        else if ( LA(1)==A && LA(2)==Y ) {
            match(A); match(Y); match(S);
        }
        else error;
    }

    a : A X R
      | A Y S
      ;

Lookahead is the set of 2-sequences that indicate which alternative will ultimately succeed
6
Lookahead as DFA
    void a() {
        int alt=0;
        if ( LA(1)==A ) {
            if ( LA(2)==X ) alt=1;
            if ( LA(2)==Y ) alt=2;
        }
        switch (alt) {
            case 1 : match(A); match(X); match(R); break;
            case 2 : match(A); match(Y); match(S); break;
            default : error;
        }
    }
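
The slide's a() relies on LA, match and error, which the transcript does not define. A minimal runnable sketch, assuming tokens are plain strings and using hypothetical names (Ll2Sketch, predictA) that are not part of ANTLR, might look like this. Prediction is kept separate from matching, as on the slide:

```java
import java.util.List;

// Sketch of an LL(2) recursive-descent parser for:  a : A X R | A Y S ;
// The class name, the String token representation and the helpers are
// assumptions made for illustration; they are not ANTLR's runtime API.
class Ll2Sketch {
    private final List<String> tokens;
    private int p = 0; // index of the next unconsumed token

    Ll2Sketch(List<String> tokens) { this.tokens = tokens; }

    // LA(i): the i-th token of lookahead (1-based), or "EOF" past the end.
    String LA(int i) {
        int idx = p + i - 1;
        return idx < tokens.size() ? tokens.get(idx) : "EOF";
    }

    // Consume the next token if it is the expected one, otherwise fail.
    void match(String expected) {
        if (!LA(1).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + LA(1));
        }
        p++;
    }

    // The lookahead decision as an acyclic DFA: returns the alt, or 0 on error.
    int predictA() {
        if (LA(1).equals("A")) {
            if (LA(2).equals("X")) return 1;
            if (LA(2).equals("Y")) return 2;
        }
        return 0;
    }

    // Rule a: predict first, then match the chosen alternative.
    // Returns the alternative taken so callers can inspect it.
    int a() {
        int alt = predictA();
        switch (alt) {
            case 1: match("A"); match("X"); match("R"); break;
            case 2: match("A"); match("Y"); match("S"); break;
            default: throw new IllegalStateException("no viable alternative at " + LA(1));
        }
        return alt;
    }

    public static void main(String[] args) {
        System.out.println(new Ll2Sketch(List.of("A", "X", "R")).a());
        System.out.println(new Ll2Sketch(List.of("A", "Y", "S")).a());
    }
}
```

Because predictA() never consumes input, the parser position is unchanged when matching starts.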
7
Linear approximate lookahead

Note that LA(1) doesn't help distinguish the alternatives
Often it's the depth, not the sequence of tokens, that matters
Reduces O(|T|^k) to O(|T| x k) space for lookahead sequences
Collapse all tokens at depth d
Only slightly weaker than LL(k)

    void a() {
        int alt=0;
        if ( LA(2)==X ) alt=1;
        if ( LA(2)==Y ) alt=2;
        switch (alt) {
            case 1 : ...
            case 2 : ...
            default : error;
        }
    }
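
To make the trade-off concrete, the following sketch compares full LL(2) lookahead, which stores whole token sequences, with linear approximate lookahead, which stores one token set per depth. The two alternatives used in main ({A B, B A} versus {A A, B B}) are a hypothetical example, not taken from the talk. They were chosen because their full sequences are disjoint while their per-depth sets collide, which is the "slightly weaker" case. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: full LL(k) sequences versus per-depth (linear approximate) sets.
class ApproxLookaheadSketch {
    // Full LL(k): the next k tokens must be exactly one of the alt's sequences.
    static boolean fullMatches(Set<List<String>> sequences, List<String> lookahead) {
        return sequences.contains(lookahead);
    }

    // Collapse the sequences into one token set per depth 1..k.
    static List<Set<String>> collapse(Set<List<String>> sequences, int k) {
        List<Set<String>> perDepth = new ArrayList<>();
        for (int d = 0; d < k; d++) perDepth.add(new TreeSet<>());
        for (List<String> seq : sequences) {
            for (int d = 0; d < k; d++) perDepth.get(d).add(seq.get(d));
        }
        return perDepth;
    }

    // Approximate: the token at each depth only has to be in that depth's set.
    static boolean approxMatches(List<Set<String>> perDepth, List<String> lookahead) {
        for (int d = 0; d < perDepth.size(); d++) {
            if (!perDepth.get(d).contains(lookahead.get(d))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<List<String>> alt1 = Set.of(List.of("A", "B"), List.of("B", "A"));
        Set<List<String>> alt2 = Set.of(List.of("A", "A"), List.of("B", "B"));
        List<String> input = List.of("A", "B");
        // Full LL(2) picks alt 1 only.
        System.out.println(fullMatches(alt1, input) + " " + fullMatches(alt2, input));
        // The approximation accepts both alternatives: a conflict.
        System.out.println(approxMatches(collapse(alt1, 2), input) + " "
                + approxMatches(collapse(alt2, 2), input));
    }
}
```

For the rule on this slide the collapse loses nothing: depth 1 is {A} for both alternatives, and depth 2 is {X} versus {Y}, so testing LA(2) alone is enough.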
8
Problem: what can't LL(k) do?

Can't see past arbitrarily long constructs from the left edge
For example, can't see past A+ here:

    a : A+ X R
      | A+ Y S
      ;

Could left-factor, but that is not always possible and it's unnatural!

    a : A+ (X R | Y S) ;
9
Solution overview

Natural extension to the LL(k) lookahead DFA: allow a cyclic DFA that can skip ahead past the A's to X or Y
Don't approximate the entire CFG with a regex; i.e., don't include R or S
Just predict and proceed normally with the LL parse
DFA yields the predicted alt number
Grammar actions are not sucked into DFAs and aren't executed during prediction

10
LL(*) code

Arbitrary cyclic graphs can't be encoded w/o gotos in Java, but here a simple while loop is OK

    void a() {
        int alt=0;
        if ( LA(1)==A ) consume(); else error;
        while ( LA(1)==A ) consume();
        if ( LA(1)==X ) alt=1;
        if ( LA(1)==Y ) alt=2;
        switch (alt) {
            case 1 : ...
            case 2 : ...
            default : error;
        }
    }
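
A runnable version of the same idea is sketched below, assuming rule a is a : A+ X R | A+ Y S ; and using hypothetical helpers (LlStarSketch, predictA) that are not ANTLR's API. Unlike the slide, this sketch scans with a lookahead offset instead of calling consume(), so the parser position is untouched when prediction finishes. R and S are never examined during prediction.

```java
import java.util.List;

// Sketch of LL(*)-style prediction for:  a : A+ X R | A+ Y S ;
// Names and the String token representation are assumptions for illustration.
class LlStarSketch {
    private final List<String> tokens;
    private int p = 0; // index of the next unconsumed token

    LlStarSketch(List<String> tokens) { this.tokens = tokens; }

    // LA(i): the i-th token of lookahead (1-based), or "EOF" past the end.
    String LA(int i) {
        int idx = p + i - 1;
        return idx < tokens.size() ? tokens.get(idx) : "EOF";
    }

    void match(String expected) {
        if (!LA(1).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + LA(1));
        }
        p++;
    }

    // Cyclic lookahead DFA, hand-coded as a loop. Returns the alt, or 0 on error.
    int predictA() {
        int i = 1;
        if (!LA(i).equals("A")) return 0;  // A+ needs at least one A
        i++;
        while (LA(i).equals("A")) i++;      // the cycle: skip any further A's
        if (LA(i).equals("X")) return 1;
        if (LA(i).equals("Y")) return 2;
        return 0;
    }

    // Rule a: predict, then parse normally from the original position.
    int a() {
        int alt = predictA();
        if (alt == 0) throw new IllegalStateException("no viable alternative");
        match("A");
        while (LA(1).equals("A")) match("A");
        if (alt == 1) { match("X"); match("R"); }
        else          { match("Y"); match("S"); }
        return alt;
    }

    public static void main(String[] args) {
        System.out.println(new LlStarSketch(List.of("A", "A", "A", "X", "R")).a());
        System.out.println(new LlStarSketch(List.of("A", "Y", "S")).a());
    }
}
```

An input such as A A X S is still predicted as alternative 1; the mismatch on S is reported later by match, during the normal parse.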
11
Isn't that just backtracking?

No. For example, if I can guarantee you will never look ahead more than 10 symbols, it's just LL(10), right?
Not backtracking with the parser! The DFA is smaller and faster; e.g., a DFA predicting expr does not follow the deep call chain the parser does
Don't have to avoid or unroll actions in the grammar!
The DFAs are efficiently coded and automatically throttle down when less lookahead is needed

12
Do we need LL(*) in practice?

Natural grammars are sometimes not LL(k); e.g., C function declaration vs definition
From the left edge, lookahead is not fixed to see the ';' vs the '{'. We need arbitrary lookahead because of the arg*
If you have actions at ID, can't easily refactor
Lookahead will usually be 5 < k < 10 for this decision

    func : type ID '(' arg* ')' ';'
         | type ID '(' arg* ')' body
         ;
13
Can we classify LL(*) strength?

Obviously stronger than LL(k) for fixed k
Weaker than syntactic predicates plus LL(k), but it's automatic and faster
ANTLR v3 will have LL(*) plus syntactic predicates
What about LL(k)'s traditional foe LR(k) and its nefarious minion LALR(1) (yacc)?
No strict ordering! (see next slide)
Weaker than GLR or any other system that handles all context-free grammars

14
LL(*) vs LR(k)

LR(k), even with k=1, is generally more powerful than LL(*), or at least more efficient for the same grammar, but there is no strict ordering: add epsilon rule refs to the left edge of our grammar and it's not LR(k) for any fixed k (such rules are derived from adding actions)

    a : b A+ X R
      | c A+ Y S
      ;
    b : ;
    c : ;

LL(*) but not LR(k), due to a reduce-reduce conflict
15
LL(*) Strength Limitations

Limited to regular approximation
Creating a regular covering approximation to the lookahead language of a context-free grammar fragment
Can't distinguish between context-free fragments
Can't see past recursive structures
Still deterministic: can't deal with ambiguous grammars; must pick one interpretation

16
Can't see past recursion

LL(*) DFA construction takes the LL stack into consideration, but the resulting DFA will not have a stack; it uses sequence instead
Example weakness (same language, different grammar):

    // works
    a : b X | b Y ;
    b : A+ ;

    // doesn't work
    a : b X | b Y ;
    b : A | A b ; // tail recursion

    t.g:2:5: Alternative 1: after matching input such as A A A A decision cannot predict what comes next due to recursion overflow to b from b
    t.g:2:5: Alternative 2
17
LL(*) Static Analysis Problems

Sometimes LL(*) creates a giant DFA looking for more lookahead to distinguish alternatives:
most often due to true ambiguity
won't ever succeed, but it keeps trying
w/o a throttle this would be hideous in time/space
Workarounds:
can manually set fixed-k lookahead
refactor the grammar if ambiguous or to reduce lookahead requirements
The algorithm's O() constant is critical: got java.g processing to drop from 20 minutes to 10s

18
LL(*) Analysis Benefits

LL(*) analysis and the resulting prediction DFAs are, paradoxically, sometimes simpler
LL(k) must compute all possible sequences of fixed length k using an acyclic DFA
LL(3) lookahead of an (A|B) loop is AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB
LL(*) lookahead of the same (A|B) loop is simply [DFA diagram not preserved in the transcript]

19
LL(*) Algorithm Outline

build an RTN-like NFA from the grammar (similar to LR machine construction, actually)
modified classical NFA-to-DFA conversion (subset construction algorithm)
a DFA state encodes the configurations the NFA could be in after having seen the input sequence, including the call invocation stack
an NFA configuration (s|alt|context) tracks the state, the predicted alt, and the rule invocation stack used to get to that state
terminate the algorithm when a state uniquely predicts an alternative or a nondeterminism is found: (s|i|ctx) and (s|j|ctx) for the same state s but different alts i, j and the same/similar context
verify the DFA is reduced and all alternatives have a predict state
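
The outline above can be sketched for a single decision, assuming the grammar a : A+ X R | A+ Y S ;. This is a deliberately reduced illustration and not ANTLR's implementation: configurations carry only (state, alt), the rule invocation stack is left out because this grammar has no rule references, and nondeterminism reporting is omitted. The NFA is written by hand, and every name and state number is made up for the example. It requires Java 16 or later for records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: subset construction that stops expanding a DFA state as soon as
// all of its NFA configurations predict the same alternative.
class PredictionDfaSketch {
    record Config(int state, int alt) {}
    record Edge(int from, String token, int to) {}

    // Hand-built NFA: states 0..3 belong to alt 1, states 10..13 to alt 2.
    static final List<Edge> NFA = List.of(
        new Edge(0, "A", 1), new Edge(1, "A", 1), new Edge(1, "X", 2), new Edge(2, "R", 3),
        new Edge(10, "A", 11), new Edge(11, "A", 11), new Edge(11, "Y", 12), new Edge(12, "S", 13));

    final List<Map<String, Integer>> edges = new ArrayList<>(); // DFA transitions per state
    final List<Integer> predicts = new ArrayList<>();           // predicted alt per state, 0 if none

    // Returns the alt shared by every configuration, or 0 if they disagree.
    static int uniqueAlt(Set<Config> configs) {
        int alt = 0;
        for (Config c : configs) {
            if (alt != 0 && alt != c.alt()) return 0;
            alt = c.alt();
        }
        return alt;
    }

    PredictionDfaSketch() {
        Map<Set<Config>, Integer> numbering = new HashMap<>();
        Deque<Set<Config>> work = new ArrayDeque<>();
        Set<Config> start = Set.of(new Config(0, 1), new Config(10, 2));
        numbering.put(start, addState(0));
        work.add(start);
        while (!work.isEmpty()) {
            Set<Config> d = work.remove();
            int from = numbering.get(d);
            // Group the configurations reachable from d by input token.
            Map<String, Set<Config>> targets = new TreeMap<>();
            for (Config c : d) {
                for (Edge e : NFA) {
                    if (e.from() == c.state()) {
                        targets.computeIfAbsent(e.token(), t -> new HashSet<>())
                               .add(new Config(e.to(), c.alt()));
                    }
                }
            }
            for (Map.Entry<String, Set<Config>> entry : targets.entrySet()) {
                Set<Config> target = entry.getValue();
                Integer to = numbering.get(target);
                if (to == null) {
                    int alt = uniqueAlt(target);
                    to = addState(alt);
                    numbering.put(target, to);
                    if (alt == 0) work.add(target); // only undecided states are expanded
                }
                edges.get(from).put(entry.getKey(), to);
            }
        }
    }

    private int addState(int alt) {
        edges.add(new HashMap<>());
        predicts.add(alt);
        return edges.size() - 1;
    }

    // Walk the DFA over the input; returns the predicted alt, or 0 if none.
    int predict(List<String> input) {
        int s = 0;
        for (String token : input) {
            if (predicts.get(s) != 0) break; // reached a predict state
            Integer next = edges.get(s).get(token);
            if (next == null) return 0;
            s = next;
        }
        return predicts.get(s);
    }

    public static void main(String[] args) {
        PredictionDfaSketch dfa = new PredictionDfaSketch();
        System.out.println(dfa.edges.size() + " DFA states");
        System.out.println(dfa.predict(List.of("A", "A", "A", "X", "R")));
    }
}
```

The resulting DFA has a start state, one cyclic state reached on A, and two predict states reached on X and Y. Since expansion stops at the predict states, no transition on R or S is ever created.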

20
Example difference from classical conversion
a : A+ X R | A+ Y S ;
a (AA) B
[Figure: for each grammar, the DFA from classical conversion shown next to the LL(*) DFA]
LL(*) stops at nondeterminism or unique prediction
21
Generated Code

acyclic DFA generated inline as above
cyclic DFA dumped as state objects and walked at parse-time with
    int predict(IntStream input, State start)

    class DFA3 extends DFA {
        DFA.State s2 = new DFA.State() {{alt=1;}};
        DFA.State s1 = new DFA.State() {
            public DFA.State transition(IntStream input) {
                switch ( input.LA(1) ) {
                    case X : return s2;
                    ...
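
The fragment above is cut off in the transcript. As a self-contained illustration of the state-object approach, here is a sketch that assumes the decision a : A+ X R | A+ Y S ;. The TokenStream and State classes are simplified stand-ins written for this example; they are not ANTLR's IntStream or DFA.State, and the mark/rewind handling is an assumption about how input could be restored after prediction.

```java
import java.util.List;

// Sketch: a cyclic DFA stored as state objects and walked by predict().
class StateObjectDfaSketch {
    // Minimal stand-in for an input stream with lookahead, mark and rewind.
    static class TokenStream {
        final List<String> tokens;
        int p = 0;
        TokenStream(List<String> tokens) { this.tokens = tokens; }
        String LA(int i) {
            int idx = p + i - 1;
            return idx < tokens.size() ? tokens.get(idx) : "EOF";
        }
        void consume() { p++; }
        int mark() { return p; }
        void rewind(int marker) { p = marker; }
    }

    // A state either predicts an alt (alt > 0) or moves on LA(1).
    static class State {
        int alt = 0;
        State transition(TokenStream input) { return null; } // null: no viable alternative
    }

    static final State s2 = new State() {{ alt = 1; }};
    static final State s3 = new State() {{ alt = 2; }};
    static final State s1 = new State() {
        @Override State transition(TokenStream input) {
            switch (input.LA(1)) {
                case "A": return this; // the cycle over further A's
                case "X": return s2;
                case "Y": return s3;
                default:  return null;
            }
        }
    };
    static final State s0 = new State() {
        @Override State transition(TokenStream input) {
            return input.LA(1).equals("A") ? s1 : null;
        }
    };

    // Walk the states until one predicts an alt; always restore the input.
    static int predict(TokenStream input, State start) {
        int marker = input.mark();
        try {
            State s = start;
            while (s != null && s.alt == 0) {
                State next = s.transition(input);
                if (next != null) input.consume();
                s = next;
            }
            return s == null ? 0 : s.alt;
        } finally {
            input.rewind(marker);
        }
    }

    public static void main(String[] args) {
        TokenStream in = new TokenStream(List.of("A", "A", "Y", "S"));
        System.out.println(predict(in, s0) + " (position after predict: " + in.p + ")");
    }
}
```

Keeping the walk in one generic predict method means each decision only contributes data (its states); a cycle is just a state returning itself, which sidesteps the lack of goto noted earlier.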

22
Summary and Conclusions

LL(*) plus syntactic predicates is the most powerful parsing strategy that is accessible and attractive to the average programmer
LL(*) has all the benefits of LL but is much stronger; results in natural grammars
Doesn't alter the recursive-descent parser itself at all; just enhances its predictive capabilities
The basic algorithm is not that complicated, but making it real and useful is interesting; it has taken 2.5 years to fully understand
Pre-release: http://www.antlr.org/download/